RE: LeoThread 2024-10-05 09:19

Llama 3.2 introduces significant advancements over Llama 3.1, particularly by adding multimodal capabilities and optimizing performance for various deployment environments.

Multimodal Capabilities: One of the major differences is that Llama 3.2 introduces models that accept both text and images as input and respond with text. The 11B and 90B vision variants add image understanding, which lets them handle visual reasoning, image captioning, and questions that depend on how text and images relate. Llama 3.1, by contrast, is strictly text-based.