Understand Galileo's Sexism Metric
Definition: Flags whether a response contains sexist content. The output is a binary classification: sexist or not sexist.
Calculation: We use a RoBERTa model fine-tuned on the Explainable Detection of Online Sexism (EDOS) dataset to flag whether a statement is sexist. The model achieves 83% accuracy on the validation set.
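To illustrate how a two-class classifier can yield the binary output described above, the following minimal sketch converts a pair of logits into a probability and then a flag. The function names, the [not sexist, sexist] logit ordering, and the 0.5 threshold are assumptions for illustration only; the metric's actual post-processing and threshold are not documented here.

```python
import math


def sexism_probability(logits):
    """Return P(sexist) from a pair of logits ordered [not_sexist, sexist].

    The label ordering is an assumption made for this sketch.
    """
    not_sexist, sexist = logits
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(not_sexist, sexist)
    e_not = math.exp(not_sexist - m)
    e_sexist = math.exp(sexist - m)
    return e_sexist / (e_not + e_sexist)


def is_sexist(logits, threshold=0.5):
    """Binary flag: True when P(sexist) meets the (assumed) threshold."""
    return sexism_probability(logits) >= threshold
```

With this sketch, a response whose "sexist" logit is the larger of the two is flagged, and raising the threshold would trade recall for precision.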
Usefulness: Identify responses that contain sexist comments and take preventive measures, such as fine-tuning the model or implementing guardrails that flag responses before they are served, to prevent future occurrences.
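A guardrail of the kind mentioned above might look like the sketch below, which checks each response before it is served and substitutes a fallback message when it is flagged. Everything here is hypothetical: `guarded_response`, `FALLBACK_MESSAGE`, and the keyword-based `toy_classifier` are stand-ins for illustration, not part of Galileo's API. In practice the classifier argument would wrap the real sexism metric.

```python
from typing import Callable

# Hypothetical fallback text; choose wording appropriate to your product.
FALLBACK_MESSAGE = "Sorry, I can't provide that response."


def guarded_response(
    response: str,
    classifier: Callable[[str], bool],
    fallback: str = FALLBACK_MESSAGE,
) -> str:
    """Return the response unless the classifier flags it, else the fallback."""
    return fallback if classifier(response) else response


def toy_classifier(text: str) -> bool:
    """Stand-in classifier for this sketch: flags text containing a marker.

    A real deployment would call the sexism metric here instead.
    """
    return "FLAGGED_EXAMPLE" in text
```

Passing the classifier in as a callable keeps the guardrail independent of how the flag is computed, so the same wrapper can be reused if the underlying model changes.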
