Uli for Trust and Safety Teams

What is the Uli slur list?

The Uli slur list is a dataset of slurs and coded language (dog whistles) in Indian languages. Although we call it a slur list, it is more accurately a dataset of words and phrases together with metadata about each entry, such as what makes it problematic, whether it has been reclaimed, and which identity groups it targets.
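As a rough illustration of the kind of record described above, the sketch below models a single entry in Python. The class name `SlurEntry`, its field names (`term`, `language`, `why_problematic`, `reclaimed`, `targeted_groups`) and the placeholder values are hypothetical. They do not reflect the actual schema of the Uli dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlurEntry:
    """One hypothetical slur-list entry with its metadata.

    Field names are illustrative assumptions, not the Uli schema.
    """

    term: str                             # the word or phrase itself
    language: str                         # e.g. "Hindi", "Tamil"
    why_problematic: str                  # annotator's explanation
    reclaimed: bool = False               # whether the term has been reclaimed
    targeted_groups: tuple[str, ...] = () # identity groups the term targets

    def targets(self, group: str) -> bool:
        """Return True if this entry is annotated as targeting `group`
        (case-insensitive)."""
        return group.lower() in (g.lower() for g in self.targeted_groups)


# Placeholder data only; this is not a real entry from the list.
entry = SlurEntry(
    term="example-term",
    language="Hindi",
    why_problematic="demeaning reference to gender",
    reclaimed=False,
    targeted_groups=("women",),
)
print(entry.targets("Women"))  # True
```

Keeping the metadata alongside the term lets a reviewer see why a word was flagged and whether it has been reclaimed, instead of treating every match the same way.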

Why was the Uli slur list created?

The Uli slur list was originally crowdsourced with researchers and activists to build a robust dataset for the Uli plugin. It has since become a stand-alone resource that supports Trust and Safety teams and researchers.

How is it created?

The slur list is crowdsourced with the help of researchers and activists in the gender and feminist rights sector, who have so far contributed slurs in Indian English, Hindi, Tamil and Malayalam. Contributions are collected through online annotation sessions conducted according to our annotation guidelines.

What are the future plans?

We are continuing to run crowdsourcing sessions to expand the slur list to more languages and to enrich its metadata, which deepens our understanding of the slurs. We are also iterating on the plugin to add social features that will let people contribute to the slur list directly through it.

Separately, we are working on a framework to guide how we compensate the annotators who contribute to Uli's datasets. As part of Mozilla's Data Futures Lab cohort, we wrote a white paper that examines the different ways in which projects value the contributions of expert annotators. If you'd like a copy, please reach out to us.

I want to use this list. How should I do that?

Some versions of the slur list are open access and available on our GitHub repo. However, the list is continuously updated: we detect spelling variations, add metadata, and expand to new languages. Please reach out to kaustubha@tattle.co.in and tarunima@tattle.co.in if you would like access to the most up-to-date version.
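For teams working with an open-access version, a minimal sketch of loading a CSV export and flagging listed terms in a piece of text might look like the following. It assumes the file has `term` and `language` columns. The real file format, column names and location in the repository may differ, so check the version you download. The inline CSV with placeholder terms stands in for the downloaded file, and the helper names `load_terms` and `find_terms` are hypothetical.

```python
import csv
import io
import re

# Stand-in for a downloaded CSV; column names and terms are hypothetical.
SAMPLE_CSV = """term,language
example-term,Hindi
another phrase,Indian English
"""


def load_terms(fh) -> dict[str, str]:
    """Read a CSV with (assumed) `term` and `language` columns.

    Returns a mapping from lower-cased term to its language.
    """
    reader = csv.DictReader(fh)
    return {row["term"].strip().lower(): row["language"] for row in reader}


def find_terms(text: str, terms: dict[str, str]) -> list[str]:
    """Return listed terms that occur in `text` as whole words or phrases.

    Matching is exact and case-insensitive, so it will miss spelling
    variations unless the list itself includes them.
    """
    lowered = text.lower()
    hits = []
    for term in terms:
        # Require a non-word character (or string edge) on both sides
        # so a term is not matched inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, lowered):
            hits.append(term)
    return hits


terms = load_terms(io.StringIO(SAMPLE_CSV))
print(find_terms("This post contains Example-Term.", terms))
# ['example-term']
```

Exact matching is deliberately simple here. Because the maintained list keeps adding spelling variations, refreshing the downloaded copy from the most recent version usually matters more than making the matcher cleverer.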