Meta has released Llama 3.3, a multilingual large language model (LLM) with openly available weights. Llama 3.3 is designed to deliver high performance while being cheaper and easier to run than Meta's larger earlier models.

Key Features of Llama 3.3
- Smaller Size, Similar Performance: Despite having only 70 billion parameters, Llama 3.3 delivers performance comparable to Meta’s 405 billion parameter Llama 3.1 model. This means users can achieve high-quality results with significantly lower computational overhead and cost.
- Cost-Effectiveness and Efficiency: Llama 3.3 is optimized for cost-effective inference, with token generation reported to cost as little as $0.01 per million tokens. This makes it much more affordable than models like GPT-4 and Claude 3.5. It also requires significantly less GPU memory than larger models, potentially saving users hundreds of thousands of dollars in upfront GPU costs and ongoing power costs.
- Multilingual Capabilities: Llama 3.3 excels in multilingual tasks, achieving 91.1% accuracy on the MGSM benchmark. It officially supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Advanced Features: Llama 3.3 has a 128k-token context window, allowing it to process long-form content. It uses Grouped Query Attention (GQA), in which several query heads share each key/value head, to reduce memory use and improve inference scalability. Supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) are used to align the model with human preferences for helpfulness and safety.
- Environmental Responsibility: Meta reports matching the electricity used for the intensive training process with renewable energy, resulting in net-zero market-based greenhouse gas emissions for the training phase.
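
To make the link between GQA and the 128k-token context window concrete, here is a rough sizing sketch for the key/value cache of a single sequence. The architecture values (80 layers, 64 query heads, 8 key/value heads, head dimension 128) are assumptions taken from the published Llama 3 70B configuration and are not stated in this article; the function name and the 16-bit cache precision are likewise illustrative choices, so treat the result as an estimate rather than a measurement.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one key vector and one value vector
    # per layer, per KV head, per token.
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed architecture values (published Llama 3 70B configuration,
# not given in this article).
NUM_LAYERS = 80
NUM_QUERY_HEADS = 64   # what full multi-head attention would have to cache
NUM_KV_HEADS = 8       # what GQA actually caches
HEAD_DIM = 128
CONTEXT_TOKENS = 128 * 1024  # assuming "128k" means 131,072 tokens

GIB = 1024 ** 3

gqa_gib = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT_TOKENS) / GIB
mha_gib = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, CONTEXT_TOKENS) / GIB

print(f"KV cache with GQA (8 KV heads): {gqa_gib:.0f} GiB")
print(f"KV cache without GQA (64 KV heads): {mha_gib:.0f} GiB")
```

Under these assumptions, a full 128k-token context needs about 40 GiB of cache with GQA, versus about 320 GiB if every query head kept its own keys and values. The eightfold reduction comes directly from the ratio of query heads to key/value heads.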

Llama 3.3 vs. Its Predecessors
- Llama 3.1-405B: While achieving similar performance, Llama 3.3 significantly reduces the hardware requirements. Llama 3.1-405B requires a massive amount of GPU memory (between 243 GB and 1944 GB, depending on the numeric precision used), making it inaccessible for many users. Llama 3.3, with its smaller size, is much more affordable to run and deploy.
- Llama 3.1-70B: Llama 3.3 outperforms the identically sized Llama 3.1-70B on several benchmarks, with notable gains in instruction following, reasoning, and multilingual dialogue.
- Llama 3.2: Llama 3.3 boasts numerous enhancements over Llama 3.2, including improved fine-tuning, expanded safety features, support for more languages, a longer context window, stronger benchmark performance, tool-use capabilities, improved energy efficiency, and a more robust responsible AI framework.
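
The 243 GB to 1944 GB range quoted above for Llama 3.1-405B can be reproduced with a simple rule of thumb: parameter count times bytes per parameter, plus roughly 20% overhead. The sketch below applies the same estimate to a 70-billion-parameter model. The 1.2 overhead factor and the precision table are assumptions chosen because they match the quoted range; real deployments also need memory for the key/value cache and activations, which this estimate ignores.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Assumed rule-of-thumb overhead (about 20%) on top of the raw weights.
OVERHEAD = 1.2


def estimated_gpu_memory_gb(num_params, precision, overhead=OVERHEAD):
    """Rough GPU memory (in GB, 1 GB = 1e9 bytes) needed to load the weights."""
    return num_params * BYTES_PER_PARAM[precision] * overhead / 1e9


MODELS = [("Llama 3.1-405B", 405e9), ("Llama 3.3-70B", 70e9)]

for name, params in MODELS:
    estimates = ", ".join(
        f"{precision}: {estimated_gpu_memory_gb(params, precision):.0f} GB"
        for precision in BYTES_PER_PARAM
    )
    print(f"{name} -> {estimates}")
```

With these assumptions the 405B model spans about 243 GB (4-bit) to 1944 GB (32-bit), while the 70B model spans about 42 GB to 336 GB, which is where the hardware savings come from.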

Llama 3.3 Use Cases
Llama 3.3’s balance of performance and efficiency makes it suitable for a variety of applications:
- Multilingual Chatbots and Assistants: Llama 3.3 is ideal for building chatbots and virtual assistants that can handle multiple languages.
- Coding Support and Software Development: With high scores on coding benchmarks, Llama 3.3 can assist with code generation, debugging, and other programming tasks.
- Synthetic Data Generation: Llama 3.3 can generate high-quality synthetic datasets for various purposes, including training other AI models.
- Multilingual Content Creation and Localization: Llama 3.3 can be used for tasks such as generating marketing materials, translating documents, and creating multilingual content.
- Research and Experimentation: Researchers can use Llama 3.3 to explore language modeling, alignment techniques, and other areas of AI research.
- Knowledge-Based Applications: Llama 3.3 is effective for tasks involving large amounts of text, like question answering, summarization, and report generation.
- Flexible Deployment for Small Teams: Llama 3.3’s ability to run on standard developer hardware makes it a practical and affordable solution for smaller teams.

Availability and Access
For users and organizations with under 700 million monthly active users: Meta offers Llama 3.3 free of charge under the Llama 3.3 Community License Agreement. This agreement grants a non-exclusive, royalty-free license to use, reproduce, distribute, and modify the model and its outputs. Developers who integrate Llama 3.3 into products or services are required to:
- Include proper attribution, such as “Built with Llama”
- Adhere to an Acceptable Use Policy that prohibits activities like generating harmful content, breaking laws, or enabling cyberattacks
For organizations with over 700 million monthly active users: A commercial license must be obtained directly from Meta. The terms and cost of this license are not publicly specified.