AI Agents and Agentic Systems: From Chat to Action
Chatbots produce text. Agents produce outcomes. The conceptual shift is simple: instead of stopping at an answer, an AI agent […]
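To make that shift concrete, here is a toy sketch of an agent loop in Python; `call_llm` and the calculator tool are made-up stand-ins for a real model and real tools, not the article's implementation:

```python
# Toy agent loop: instead of returning the model's first answer, keep asking
# the model what to do next, run the requested tool, and feed the result back.
# call_llm is a stub; a real agent would query an actual LLM here.

def call_llm(history: list) -> str:
    """Pretend LLM: asks for a tool call first, then gives a final answer."""
    if not any(line.startswith("observation:") for line in history):
        return "tool: add 2 3"
    return "final: 2 + 3 = 5"

def run_tool(command: str) -> str:
    """Handle a command of the form 'tool: add A B'."""
    _, op, a, b = command.split()
    return str(int(a) + int(b)) if op == "add" else "unknown tool"

history = ["user: what is 2 + 3?"]
while True:
    step = call_llm(history)
    if step.startswith("final:"):
        print(step)
        break
    history.append("observation: " + run_tool(step))
```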
Retrieval-Augmented Generation (RAG): A Practical Guide
Retrieval-Augmented Generation (RAG) is a technique that acts as an open-book exam for Large Language Models (LLMs). It allows a […]
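As a rough illustration of the retrieval step, the sketch below uses a toy keyword-overlap scorer in place of a real embedding model and vector index; the notes and query are invented for the example:

```python
# Toy retrieval-augmented generation: retrieve the most relevant note,
# then splice it into the prompt so the model can answer "open book".

notes = {
    "kv-cache": "KV caching stores past attention keys/values so each new token is cheap.",
    "rope": "RoPE encodes position by rotating query/key vectors.",
    "alibi": "ALiBi adds a linear distance penalty to attention scores.",
}

def score(query: str, text: str) -> int:
    """Count how many query words appear in the candidate text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def build_prompt(query: str) -> str:
    """Retrieve the best-matching note and prepend it as context."""
    best = max(notes.values(), key=lambda text: score(query, text))
    return f"Context:\n{best}\n\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("How does KV caching speed up generation?"))
```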
FLAN-T5: Instruction Tuning for a Stronger “Do What I Mean” Model
Imagine a student who has memorized an entire textbook, but only answers questions when they are phrased exactly like the […]
Mixture of Experts (MoE): Scaling Model Capacity Without Proportional Compute
Imagine you are building a house. You could hire one master builder who knows everything about construction, from plumbing and […]
Understanding Diffusion Models: How AI Generates Images from Noise
Imagine standing in an art gallery, looking at a detailed photograph of a landscape. Now imagine a thick fog slowly […]
DeepSeek V3.2: Architecture, Training, and Practical Capabilities
DeepSeek V3.2 is one of the open-weight models that consistently compete with frontier proprietary systems (for example, GPT‑5‑class and Gemini […]
ALiBi: Attention with Linear Biases
Imagine you are reading a mystery novel. The clue you find on page 10 is crucial for understanding the twist […]
RoPE Made Easy: Understanding Rotary Positional Embeddings Step by Step
Rotary Positional Embeddings represent a shift from viewing position as a static label to viewing it as a geometric relationship. By treating tokens as vectors rotating in high-dimensional space, we allow a neural network to understand how “King” relates to “Queen” not just through their semantic meaning, but through their relative placement in the text.
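A minimal sketch of the rotation itself, assuming a single attention head and the common rotate-half formulation; the shapes and base constant below are illustrative, not taken from the article:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Pairs of channels are rotated by an angle proportional to the token's
    position, so dot products between rotated queries and keys depend only
    on their relative distance.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One frequency per channel pair, one angle per (position, pair).
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate-half trick: (x1, x2) -> (x1*cos - x2*sin, x2*cos + x1*sin)
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)

q = rope(torch.randn(8, 64))  # 8 tokens, 64-dim head; sizes are arbitrary
```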
KV Caching Made Simple: The Key to Efficient LLM Inference
For Large Language Models (LLMs), inference speed and efficiency are paramount. One of the most critical optimizations for speeding up text generation is KV-Caching (Key-Value Caching).
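A bare-bones sketch of the idea in PyTorch, with made-up projection matrices and a single head; it only shows that keys and values for past tokens are computed once, cached, and reused for every new token:

```python
import torch

torch.manual_seed(0)
dim = 16
Wq, Wk, Wv = (torch.randn(dim, dim) for _ in range(3))

# The cache grows by one key/value row per generated token, so attention for
# a new token never recomputes projections for the tokens already seen.
k_cache, v_cache = [], []

def attend_next(x_new: torch.Tensor) -> torch.Tensor:
    """Attention output for one new token against all cached keys/values."""
    q = x_new @ Wq
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    K = torch.stack(k_cache)            # (t, dim)
    V = torch.stack(v_cache)            # (t, dim)
    scores = (q @ K.T) / dim ** 0.5     # (t,)
    return torch.softmax(scores, dim=-1) @ V

for _ in range(5):                      # pretend we decode 5 tokens
    out = attend_next(torch.randn(dim))
print(out.shape)                        # torch.Size([16])
```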
How Language Model Architectures Have Evolved Over Time
Introduction: The Quest to Understand Language. Imagine a machine that could read, understand, and write text just like a human.
A Guide to Positional Embeddings: Absolute (APE) vs. Relative (RPE)
Think of a bookshelf versus a long hallway: absolute positional embeddings (APE) assign each token a fixed “slot” on the shelf, while relative positional embeddings (RPE) care only about the distance between tokens, like how far two people stand in a hallway. This article first builds intuition with simple analogies and visual descriptions, then dives into the math: deriving sinusoidal APE, showing how sin–cos interactions yield purely relative terms, and explaining how RPE is injected into attention (including T5-style relative bias). Practical PyTorch examples are provided so the reader can implement APE and RPE, understand their trade‑offs (simplicity and extrapolation vs. relational power), and choose the right approach for real-world sequence tasks.
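As a taste of the kind of PyTorch example the article promises, here is a minimal sketch of sinusoidal absolute positional embeddings following the standard Transformer recipe; the sequence length and dimension below are arbitrary:

```python
import torch

def sinusoidal_ape(seq_len: int, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Sinusoidal absolute positional embeddings of shape (seq_len, dim).

    Even channels get sin, odd channels get cos, at wavelengths that grow
    geometrically with the channel index (assumes dim is even).
    """
    pos = torch.arange(seq_len, dtype=torch.float32)[:, None]    # (seq_len, 1)
    idx = torch.arange(0, dim, 2, dtype=torch.float32)[None, :]  # (1, dim/2)
    angles = pos / base ** (idx / dim)
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

emb = sinusoidal_ape(seq_len=128, dim=64)
print(emb.shape)  # torch.Size([128, 64])
```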
How Large Language Model Architectures Have Evolved Since 2017
Imagine building a city: at first, you lay simple roads and bridges, but as the population grows and needs diversify, […]