Part 6/12:
- RAG System (LLM 3 + Database): An 8-billion-parameter Llama 3 model combined with a vector database containing medication information, user reviews, and drug data sourced from reputable platforms like WebMD. The database retrieves specific drug details, and the model uses them to generate relevant responses.
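The retrieve-then-generate flow of the RAG system can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `DOCS` entries are made-up stand-ins for the vector database, `embed` is a toy word-count embedding rather than a learned one, and `build_prompt` only assembles the text that would be passed to the LLM.

```python
import math
from collections import Counter

# Hypothetical stand-in for the vector database: a few illustrative drug notes.
DOCS = {
    "ibuprofen": "ibuprofen is a pain reliever that may cause stomach upset",
    "cetirizine": "cetirizine is an antihistamine used for allergy symptoms",
    "metformin": "metformin is used to manage blood sugar in type 2 diabetes",
}

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda name: cosine(q, embed(DOCS[name])), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Prepend the retrieved text to the question before it is sent to the LLM."""
    context = "\n".join(DOCS[name] for name in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("what helps with allergy symptoms"))
```

The point of the design is that the model answers from retrieved text rather than from memory alone, so drug details can be updated by changing the database instead of retraining.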
Technical Approach:
Fine-tuning is conducted on accessible hardware, such as a MacBook with an M3 Max chip and 48GB of memory, showing that training can be both low-cost and effective.
After training, the models are compressed through quantization (Q5_K_M with llama.cpp) so they can run real-time inference without a noticeable loss in quality.
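To give a sense of why quantization matters on a laptop, here is a back-of-the-envelope size estimate. The figures are assumptions, not values from the source: the model is taken to have 8 billion parameters, and Q5_K_M is taken to average roughly 5.7 bits per weight (the exact figure varies by model).

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 8e9        # assumed parameter count
FP16_BITS = 16        # unquantized half-precision weights
Q5_K_M_BITS = 5.7     # assumed approximate average for Q5_K_M

fp16_gb = model_size_gb(N_PARAMS, FP16_BITS)
q5_gb = model_size_gb(N_PARAMS, Q5_K_M_BITS)

print(f"FP16:   ~{fp16_gb:.1f} GB")
print(f"Q5_K_M: ~{q5_gb:.1f} GB ({q5_gb / fp16_gb:.0%} of FP16)")
```

Under these assumptions the quantized weights take roughly a third of the half-precision footprint, which leaves more memory free when several models have to be loaded on the same machine.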
A dynamic routing system directs each user query to the appropriate LLM, or to a combination of LLMs, so that responses stay contextually relevant.
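The source does not say how the router decides, so the following is only a sketch of one simple possibility: keyword matching. The model names (`rag_model`, `model_a`, `model_b`) and the keyword sets are hypothetical placeholders; a real router might instead use a classifier or an LLM to pick the target.

```python
import re

# Hypothetical routing rules: model name -> keywords that trigger it.
RULES = {
    "rag_model": {"drug", "medication", "dose", "dosage"},
    "model_a": {"symptom", "symptoms", "pain"},
}
DEFAULT_MODEL = "model_b"  # hypothetical fallback when no rule matches

def route(query: str) -> list[str]:
    """Return every model whose keywords appear in the query, or the fallback."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    targets = [name for name, keywords in RULES.items() if words & keywords]
    return targets or [DEFAULT_MODEL]

for q in (
    "What is the usual dosage of this medication?",
    "I have pain, which drug helps?",
    "Hello there",
):
    print(q, "->", route(q))
```

Returning a list rather than a single name is what allows a query to be sent to a combination of models when it touches more than one area.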