
The world's largest dataset of profanity.


mod-tc/profanity


The Obscenity List

by Surge AI

Ever wish you had a ready-made list of profanity? Maybe you want to remove NSFW comments, filter offensive usernames, or build content moderation tools, and you can't dream up enough obscenities on your own.

We're creating the world's largest profanity dataset, in 20+ languages.

Dataset

This repo contains 1600+ popular English profanities and their variations.

Columns

  • text: the profanity
  • canonical_form_1: the profanity's canonical form
  • canonical_form_2: an additional canonical form, if applicable
  • canonical_form_3: an additional canonical form, if applicable
  • category_1: the profanity's primary category (see below for list of categories)
  • category_2: the profanity's secondary category, if applicable
  • category_3: the profanity's tertiary category, if applicable
  • severity_rating: We asked 5 Surge AI data labelers to rate how severe they believed each profanity to be, on a scale of 1 to 3. This is the mean of those 5 ratings.
  • severity_description: We rounded severity_rating to the nearest integer. Mild corresponds to a rounded mean rating of 1, Strong to 2, and Severe to 3.
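
As a rough sketch of how these columns fit together, the snippet below parses a few rows with Python's standard csv module and re-derives severity_description from severity_rating. The column names come from the list above; the sample rating values, the category assignments, and the helper names load_rows and label_for are made up for illustration and are not part of the dataset. Because each rating is the mean of 5 integer scores, it is always a multiple of 0.2, so it never lands exactly halfway between two labels.

```python
import csv
import io

# Mapping described in the column list: rounded mean rating -> label.
SEVERITY_LABELS = {1: "Mild", 2: "Strong", 3: "Severe"}

# Hypothetical sample rows. The header matches the documented columns,
# but the ratings below are invented for this example.
SAMPLE_CSV = """text,canonical_form_1,canonical_form_2,canonical_form_3,category_1,category_2,category_3,severity_rating,severity_description
goddamn,goddamn,,,religious offense,,,1.2,Mild
jackass,jackass,,,animal references,,,1.4,Mild
dumbass,dumbass,,,mental disability,,,1.6,Strong
"""


def load_rows(handle):
    """Read rows into dicts and convert severity_rating to a float."""
    rows = []
    for row in csv.DictReader(handle):
        row["severity_rating"] = float(row["severity_rating"])
        rows.append(row)
    return rows


def label_for(rating):
    """Round a mean rating to the nearest integer and look up its label."""
    return SEVERITY_LABELS[round(rating)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["text"], row["severity_rating"], label_for(row["severity_rating"]))
```

To run this against a real file, pass an open file handle to load_rows in place of the in-memory sample.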

Categories

We organized the profanity into the following categories:

  • sexual anatomy / sexual acts (ass kisser, dick, pigfucker)
  • bodily fluids / excrement (shit, cum)
  • sexual orientation / gender (faggot, tranny, bitch, whore)
  • racial / ethnic (chink, n3gro)
  • mental disability (retard, dumbass)
  • physical disability (quadriplegic bitch)
  • physical attributes (fatass, ugly whore)
  • animal references (pigfucker, jackass)
  • religious offense (goddamn)
  • political (China virus)
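
Since an entry can belong to more than one category (pigfucker appears under two categories above), a lookup by category needs to check all three category columns. The following is a minimal sketch of one way to do that. The rows are hand-written stand-ins for dataset rows, the choice of which category is primary for each entry is an assumption, and index_by_category is a hypothetical helper name.

```python
from collections import defaultdict

CATEGORY_COLUMNS = ("category_1", "category_2", "category_3")

# Hand-written stand-ins for dataset rows. Which category is listed
# first for each entry is assumed here, not taken from the dataset.
rows = [
    {"text": "pigfucker",
     "category_1": "sexual anatomy / sexual acts",
     "category_2": "animal references",
     "category_3": ""},
    {"text": "jackass",
     "category_1": "animal references",
     "category_2": "",
     "category_3": ""},
    {"text": "goddamn",
     "category_1": "religious offense",
     "category_2": "",
     "category_3": ""},
]


def index_by_category(rows):
    """Map each category name to the set of entries tagged with it."""
    index = defaultdict(set)
    for row in rows:
        for column in CATEGORY_COLUMNS:
            category = (row.get(column) or "").strip()
            if category:  # skip empty secondary/tertiary slots
                index[category].add(row["text"])
    return index


index = index_by_category(rows)
print(sorted(index["animal references"]))  # ['jackass', 'pigfucker']
```

An index like this makes it straightforward to moderate some categories more strictly than others.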

Future

We'll be adding more languages and profanity annotations (e.g., augmenting each profanity with its severity level, type, and other variations) over time.

Contact

Need a larger set of expletives and slurs, or a list of swear words in other languages (Spanish, French, German, Japanese, Portuguese, etc.)? We love feedback. Post an issue or reach out to profanity@surgehq.ai!


Follow us on Twitter at @HelloSurgeAI.
