Part 6/12:
The advent of large language models (LLMs) like GPT raised hopes of overcoming translation barriers, but the speaker notes that most LLMs are predominantly trained on English data—up to 95%. Consequently, they perform poorly with Indian languages and other regional dialects, as they tend to force-fit non-English content into English structures, leading to inaccurate or unnatural translations.
Example: Translating "I am going to the hospital" into Hindi might result in a sentence structured in English syntax, which feels unnatural and can distort meaning.