Studies have shown that biased labels can produce biased models. For example, the average annotator is more likely to label phrases in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic, leading AI toxicity detectors trained on the labels to see AAVE as disproportionately toxic.
Other experts don’t buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.