“[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding,” Xiao wrote in a post on X. “[F]or expert labor availability and cost reasons, many of these data providers are based in China.”
Labels, also known as tags or annotations, help models understand and interpret data during training. For example, labels used to train an image recognition model might take the form of markings around objects or captions referring to each person, place, or object depicted in an image.
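To make that concrete, here is a minimal sketch of what one labeled image record could look like in Python. The class names, field names, file path, and coordinates are all hypothetical and chosen for illustration; real labeling providers each use their own formats.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoxLabel:
    """One marking around an object: a category plus a pixel-coordinate box."""
    category: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        # Box area in pixels, often used to filter out tiny or degenerate boxes.
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


@dataclass
class LabeledImage:
    """An image paired with the annotations a human labeler attached to it."""
    image_path: str
    caption: str
    boxes: List[BoxLabel] = field(default_factory=list)

    def categories(self) -> List[str]:
        # Distinct object categories marked in this image, sorted alphabetically.
        return sorted({box.category for box in self.boxes})


# Hypothetical example record: one caption and two object markings.
example = LabeledImage(
    image_path="photos/park.jpg",
    caption="A person walking a dog in a park.",
    boxes=[
        BoxLabel("person", 40, 30, 120, 260),
        BoxLabel("dog", 130, 180, 220, 260),
    ],
)
```

During training, a model would be shown the image and asked to predict the boxes or the caption, with the human-provided labels serving as the reference answer.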