Part 9/13:
A technical but crucial aspect discussed is the use of embeddings: vector representations of data that let AI models measure semantic similarity. Harish points out that public data, such as product information or general rules (e.g., insurance claim procedures), is well suited to embedding, while personal data, being highly unique, should be tokenized or anonymized to mitigate privacy risks.
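The similarity idea behind embeddings can be sketched with a toy example. The vectors and texts below are illustrative values, not output from a real embedding model (which would produce hundreds of dimensions); the point is only that nearby vectors mean semantically related content:

```python
import math

# Toy 4-dimensional "embeddings" (made-up values for illustration; in
# practice these come from an embedding model).
embeddings = {
    "claim filing procedure": [0.9, 0.1, 0.3, 0.0],
    "how to submit an insurance claim": [0.85, 0.15, 0.35, 0.05],
    "product catalog pricing": [0.1, 0.9, 0.0, 0.2],
}

def cosine_similarity(a, b):
    """Cosine similarity: close to 1.0 for same direction, near 0.0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = embeddings["claim filing procedure"]
for text, vec in embeddings.items():
    print(f"{text}: {cosine_similarity(query, vec):.3f}")
```

Here the two claim-related texts score much higher against each other than against the pricing text, which is how a model retrieves semantically relevant public content such as claim procedures.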
He underscores that embedding only non-sensitive, publicly available data reduces compliance obligations and security exposure, whereas personal and sensitive information should be handled under strict tokenization and security protocols.
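The tokenization approach can be sketched as follows. This is an assumed minimal design, not a specific vendor's product: personally identifiable values are replaced with opaque tokens before data reaches any model, and only a secured vault (here a simple in-memory mapping) can reverse the substitution:

```python
import secrets

class TokenVault:
    """Minimal PII tokenization sketch: swap raw values for opaque tokens;
    only the vault can map a token back to the original value."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same person is mapped consistently.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = f"TOK_{secrets.token_hex(8)}"
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]

vault = TokenVault()
record = {"name": "Jane Doe", "note": "claim filed under flood cover"}
safe_record = {
    "name": vault.tokenize(record["name"]),  # PII replaced with a token
    "note": record["note"],                  # non-sensitive text stays as-is
}
print(safe_record)
```

In a production system the vault would be an encrypted, access-controlled store; the key property is that the AI pipeline only ever sees tokens, never the underlying personal data.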