SUMANTA KABIRAJ's blog

How Does ChatGPT Actually Understand Your Questions?

SUMANTA KABIRAJ — Wed, 01 Jul 2026 06:53:18 GMT

You type a question, hit enter, and a few seconds later you get an answer that actually makes sense. It feels a little like magic. It isn't — it's math, a lot of it, running very fast. Here's what's really going on under the hood.

1. What Is an LLM, Anyway?

LLM stands for Large Language Model. Strip away the buzzwords and it's a program trained on enormous amounts of text — books, articles, code, conversations — to learn the patterns of how language works. Not facts memorized like a database, but patterns: which words tend to follow which, how ideas connect, how a question relates to an answer.

LLMs exist to solve a stubbornly hard problem: getting a computer to work with human language, which is messy, ambiguous, and full of context. "Can you get that thing from the store?" makes perfect sense to a person and means almost nothing to a traditional program.

You've probably already met a few LLMs by name: GPT-4/GPT-5 (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta). And you're using them more than you might realize — in chatbots, in your email's "smart reply" suggestions, in search engines that now summarize instead of just linking, in coding assistants, in customer support widgets.

2. What Actually Happens When You Send a Message

The journey from "you typing" to "ChatGPT replying" has three simple stages:

You type a prompt. Could be a question, an instruction, a half-formed thought — doesn't matter.
The message gets processed. The model breaks your text down, feeds it through its network, and works out what's likely to come next, word by word (more precisely, piece by piece — more on that below).
A response gets generated. Not copy-pasted, not looked up — built, one piece at a time, based on probabilities the model learned during training.

That last point matters: ChatGPT isn't Googling your question and pasting the top result. It's generating a fresh sequence of text based on patterns, which is why it can write a poem about your cat's opinion on Mondays — that sentence never existed anywhere on the internet before you asked for it.

3. Why Computers Don't "Understand" Language the Way We Do

Here's the inconvenient truth: computers don't understand words. At all. They understand numbers — specifically, arrays of numbers being multiplied and added together very quickly.

So before a model can do anything with "What's the capital of France?", that sentence has to become numbers. Every model works this way: text always gets converted into numeric form first, and the unit of that conversion is called a token.

4. Tokenization: Chopping Language Into Bite-Sized Pieces

Tokenization is the process of breaking text into smaller chunks — tokens — that the model can turn into numbers and actually process.

Why not just use whole words? Two reasons: there are too many possible words (including typos, made-up words, and words in every language), and some words are really made of meaningful smaller parts. So models split text into a mix of whole words, word-fragments, and even single characters, depending on what's efficient.

A rough example — the sentence:

"Tokenization is tricky!"

might get split into something like:

["Token", "ization", " is", " tricky", "!"]

Five tokens, three words (plus an exclamation mark). Having more tokens than words is normal: as a rule of thumb, 100 tokens is roughly 75 words of English. Every one of those tokens gets mapped to a number, and that number is what the model actually operates on.
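Here is a minimal JavaScript sketch of the idea. It assumes a tiny, hypothetical vocabulary (`VOCAB`, where each entry's position is its token ID) and uses greedy longest-match splitting, the strategy WordPiece-style tokenizers use. GPT models use byte-pair encoding with a vocabulary learned from data, so treat this as an illustration of the mechanics rather than how any real tokenizer is built.

```javascript
// Hypothetical toy vocabulary: each entry's index is its token ID.
// Real tokenizers learn tens of thousands of entries from data.
const VOCAB = ["Token", "ization", " is", " tricky", "!", "T", "o", "k", "e", "n", " "];
const TOKEN_ID = new Map(VOCAB.map((tok, id) => [tok, id]));

// Greedy longest-match: at each position, take the longest vocabulary entry that fits.
function tokenize(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let best = null;
    for (const tok of VOCAB) {
      if (text.startsWith(tok, pos) && (best === null || tok.length > best.length)) {
        best = tok;
      }
    }
    // Unknown character: fall back to a single-character token.
    if (best === null) best = text[pos];
    tokens.push(best);
    pos += best.length;
  }
  return tokens;
}

// Map each token to its ID; -1 marks a character missing from the toy vocabulary.
const toIds = (tokens) => tokens.map((tok) => TOKEN_ID.get(tok) ?? -1);

const tokens = tokenize("Tokenization is tricky!");
console.log(JSON.stringify(tokens)); // ["Token","ization"," is"," tricky","!"]
console.log(JSON.stringify(toIds(tokens))); // [0,1,2,3,4]
```

Notice that the spaces travel with the words (`" is"`, `" tricky"`), just like in the split above.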

Tokens are also the unit of an LLM's context window: the maximum number of tokens it can "see" at once, like a limited-size whiteboard. Once a long conversation fills it up, the oldest messages have to fall off the edge.
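One simple way a chat app might respect that limit is a sliding window: keep the newest messages that fit and drop the rest. The sketch below assumes exactly that strategy and estimates tokens from the 100-tokens-per-75-words rule of thumb. `estimateTokens` and `fitToContext` are hypothetical helpers; a real app would count with the model's own tokenizer, and some apps summarize old messages instead of dropping them.

```javascript
// Rough token estimate from the rule of thumb: about 100 tokens per 75 English words.
// A real application would count with the model's own tokenizer instead.
function estimateTokens(text) {
  const words = text.split(/\s+/).filter(Boolean).length;
  return Math.ceil((words * 100) / 75);
}

// Keep the newest messages that fit in the token budget; older ones fall off the whiteboard.
function fitToContext(messages, maxTokens) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > maxTokens) break; // this message and everything older is dropped
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const chat = ["Hi there", "What is a token?", "How big is your context window?"];
console.log(fitToContext(chat, 12)); // only the newest message fits
```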

5. Transformers: The Architecture That Changed Everything

Tokens alone don't get you understanding — you need something to actually process the relationships between them. That something, for basically every serious LLM today, is called a Transformer.

Introduced in a 2017 paper with the almost cocky title "Attention Is All You Need," the Transformer solved a problem older models struggled with: figuring out which words in a sentence matter to each other, even when they're far apart.

Take: "The trophy didn't fit in the suitcase because it was too big." Is "it" the trophy or the suitcase? A human resolves this instantly using context. Transformers do it using a mechanism called self-attention — for every word, the model calculates how much "attention" to pay to every other word in the sentence when figuring out its meaning. Do this across many layers, at massive scale, and you get a system that can track meaning across entire paragraphs, not just neighboring words.
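A stripped-down version of self-attention fits in a few lines. This sketch makes two simplifying assumptions: the vectors are made up (chosen so that "it" already resembles "trophy"), and the queries, keys, and values are the raw vectors themselves, whereas a real Transformer first multiplies them by learned matrices and, in GPT-style models, also masks out later tokens. What it does show faithfully is the core recipe: score every pair with a scaled dot product, softmax the scores into weights, and blend the vectors by those weights.

```javascript
// Softmax: turn raw scores into positive weights that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtracting the max keeps Math.exp from overflowing
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

// Simplified scaled dot-product self-attention: queries, keys and values are the raw
// input vectors (real models first multiply them by learned matrices, per layer).
function selfAttention(vectors) {
  const scale = Math.sqrt(vectors[0].length);
  // weights[i][j]: how much token i attends to token j.
  const weights = vectors.map((q) => softmax(vectors.map((k) => dot(q, k) / scale)));
  // Each output is an attention-weighted blend of all the input vectors.
  const outputs = weights.map((row) =>
    vectors[0].map((_, d) => row.reduce((sum, w, j) => sum + w * vectors[j][d], 0))
  );
  return { weights, outputs };
}

// Made-up 2-number vectors, chosen so "it" already resembles "trophy".
const words = ["trophy", "suitcase", "it"];
const { weights } = selfAttention([[1, 0], [0, 1], [0.9, 0.1]]);
words.forEach((w, j) => console.log(`"it" -> ${w}: ${weights[2][j].toFixed(2)}`));
```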

That's the "T" in GPT — Generative Pre-trained Transformer. It's why this architecture took over: it's parallelizable (fast to train on modern hardware), scales well with more data, and is genuinely good at capturing long-range context — the exact thing older approaches were bad at.

Putting It All Together

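The full pipeline, start to finish:

Your prompt → tokens → numbers (token IDs) → Transformer layers with self-attention → probabilities for the next token → one token is picked and appended → repeat until the reply is done → tokens turned back into text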
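At the heart of generation is a loop: predict the next token, append it, and repeat until the reply is finished. The sketch below assumes a hypothetical hand-written table (`NEXT_TOKEN_PROBS`) in place of a real Transformer. It looks only at the last token, where a real model looks at all of them, and it always picks the most likely option (greedy decoding).

```javascript
// Hypothetical stand-in for the whole model: given the last token, made-up
// probabilities for the next one. A real Transformer looks at every earlier token.
const NEXT_TOKEN_PROBS = {
  "<start>": { "The": 0.7, "Paris": 0.3 },
  "The": { " capital": 0.8, " answer": 0.2 },
  " capital": { " is": 0.9, " of": 0.1 },
  " is": { " Paris": 0.95, " Lyon": 0.05 },
  " Paris": { ".": 1.0 },
  ".": { "<end>": 1.0 },
};

// Pick the most probable next token (greedy decoding).
function mostLikely(probs) {
  return Object.entries(probs).reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
}

// The generation loop: predict one token, append it, repeat until "<end>" or a length cap.
function generate(maxTokens = 20) {
  const output = [];
  let last = "<start>";
  while (output.length < maxTokens) {
    const probs = NEXT_TOKEN_PROBS[last];
    if (!probs) break; // the toy table has nothing to say after this token
    const next = mostLikely(probs);
    if (next === "<end>") break;
    output.push(next);
    last = next;
  }
  return output.join(""); // turn the tokens back into text
}

console.log(generate()); // The capital is Paris.
```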

One more knob worth knowing about: temperature. It controls how "safe" vs. "creative" the model's word choices are.

Low temperature → the model almost always picks the most probable next word. Output is focused, consistent, a bit predictable. Good for factual answers, code, summaries.
High temperature → the model is more willing to pick less-likely words. Output is more varied, surprising, and creative — but also more prone to going off the rails. Good for brainstorming, poetry, casual chat.
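Under the hood, temperature is usually applied by dividing the model's raw scores (logits) by the temperature before softmax turns them into probabilities, then sampling from the result. This sketch assumes made-up logits for three candidate words; `softmaxWithTemperature` and `sample` are illustrative helpers, not any particular library's API.

```javascript
// Divide the raw scores (logits) by the temperature, then softmax them into probabilities.
// Low temperature sharpens the distribution; high temperature flattens it.
function softmaxWithTemperature(logits, temperature) {
  if (temperature <= 0) throw new RangeError("temperature must be positive");
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max so Math.exp never overflows
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Pick an index at random, in proportion to its probability.
function sample(probs, random = Math.random) {
  let r = random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

// Hypothetical scores for three candidate next words after "The capital of France is".
const candidates = [" Paris", " Lyon", " banana"];
const logits = [4.0, 2.0, 0.5];

for (const t of [0.2, 1.0, 2.0]) {
  const probs = softmaxWithTemperature(logits, t);
  const summary = candidates.map((w, i) => `${w.trim()} ${(probs[i] * 100).toFixed(1)}%`);
  console.log(`temperature ${t}: ${summary.join(", ")}`);
}
console.log("sampled:", candidates[sample(softmaxWithTemperature(logits, 1.0))].trim());
```

At temperature 0.2 nearly all the probability lands on "Paris"; at 2.0 the long shots get a real chance, which is the safe-versus-creative trade-off in action.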

So, Is It "Understanding"?

Not in the human sense — there's no consciousness, no beliefs, no lived experience behind the response. But functionally, by learning statistical patterns over trillions of tokens and using attention to track context, the model produces responses that behave like understanding: relevant, coherent, and tailored to what you actually asked.

It's not magic. It's tokens, numbers, and attention — just at a scale big enough to feel like magic.

Decoding AI 🤖 Jargon with Chai ☕

SUMANTA KABIRAJ — Tue, 08 Apr 2025 11:56:49 GMT

Brewing Machine Learning Concepts One Cup at a Time

"AI is like chai. You need the right ingredients, in the right order, with just the right amount of attention." – Hitesh Choudhary

Every evening at 5 PM, a ritual unfolds across the streets of India—cups clink, kettles hiss, and the air fills with the spicy scent of chai. It's not just a drink. It’s an experience.

Now imagine decoding artificial intelligence concepts the same way—one flavorful sip at a time.

This blog is for you if AI jargon like "self-attention" or "tokenization" makes your head spin faster than a boiling kettle. Let's walk through these terms using the humble, glorious cup of chai. And we’ll have company—two of the best teachers in tech: Hitesh Choudhary (founder of Chai Code) and Piyush Garg, both known for making complex concepts digestible.

🥄 1. Tokenization – Chopping the Words

Before brewing, you prep ingredients—chop ginger, measure sugar. Tokenization is the AI’s version of chopping language into pieces it can understand.

Example:

Sentence: “Chai is life.”
Tokens (one possible split): [“Ch”, “ai”, “is”, “life”, “.”]

Modern tokenizers prefer subwords so models can handle unseen words gracefully: a brand-new word simply gets split into familiar pieces.

📦 2. Vocab Size – Size of the Spice Rack

The bigger your spice rack, the more variations of chai you can make. Vocab size in AI is the number of unique tokens a model knows.

GPT-3 has a vocabulary of ~50,000 tokens (50,257, to be exact).
Too small: rare words get chopped into many tiny pieces, making inputs longer.
Too big: the model gets bloated, since every token needs its own learned vector.
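A quick calculation shows why a huge spice rack bloats the model: each token needs its own learned vector, so the embedding table alone holds vocabulary size × vector length numbers. The figures assume GPT-3's published sizes (50,257 tokens, 12,288 numbers per vector in the 175B model), and `embeddingParams` is a made-up helper.

```javascript
// Every token in the vocabulary gets its own row in the embedding table:
// one vector of `dim` numbers. So the table alone holds vocabSize * dim parameters.
function embeddingParams(vocabSize, dim) {
  return vocabSize * dim;
}

// GPT-3 (175B): vocabulary of 50,257 tokens, 12,288 numbers per token vector.
const gpt3 = embeddingParams(50257, 12288);
console.log(gpt3.toLocaleString("en-US")); // 617,558,016

// Doubling the vocabulary doubles this table, before the model learns anything else.
console.log(embeddingParams(2 * 50257, 12288).toLocaleString("en-US"));
```

That is roughly 618 million parameters before a single Transformer layer, and the table grows linearly with the vocabulary.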

🏗️ 3. Transformers – The Master Chaiwala of AI

If AI were a tea stall, the transformer would be the all-knowing chaiwala. Just like a skilled chaiwala mixes ingredients to craft that perfect flavor, a transformer learns which words to focus on, which to ignore, and how they all mix together.

Imagine you're making masala chai for four friends—each with a unique preference. One wants more ginger, another less sugar. You adapt the recipe. Transformers do the same with language.

🍵 Chai Code Tip by Hitesh:

“Transformers changed the AI game. Before them, models forgot long-term context like bad waiters forgetting orders. Now they remember, relate, and generate.”

↪️ 4. Encoder – Crushing the Spices

Before making chai, we start by crushing cardamom, clove, and ginger. That's the encoder—it takes raw ingredients (words), processes them, and preps them into a meaningful mixture.

It doesn’t make chai—it just prepares the flavor base.

For example: “I love chai” gets turned into a context-rich internal representation of what each word means and how they connect.

↩️ 5. Decoder – Pouring the Final Cup

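Now that we’ve brewed the base, we strain and pour the chai into cups. That's the decoder's job: it takes all that encoded meaning and produces the final output, such as a translated sentence, a reply, or a line of poetry. (GPT-style chat models actually skip the separate encoder and use a decoder-only stack; the encoder-decoder pairing comes from the original Transformer, which was built for translation.)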

Without the decoder, all you’ve got is a spicy mess in the pot. No drink.

🧮 6. Vectors – Ingredient Flavor Profiles

Ever noticed how ginger tastes warming and cardamom tastes floral? In AI, words are converted into vectors: lists of numbers that represent their “flavor.”

Like:

"chai" => [0.4, 0.8, -0.3]
"tea" => [0.41, 0.79, -0.28]

Words with similar meanings have similar vectors. (Real models use hundreds or thousands of numbers per token, not three.)
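How do we measure "similar"? A common choice is cosine similarity, which compares the direction of two vectors. This sketch uses the two vectors from above plus a hypothetical `laptop` vector for contrast.

```javascript
// Cosine similarity: 1 means the vectors point the same way, 0 unrelated, -1 opposite.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const vectors = {
  chai: [0.4, 0.8, -0.3],
  tea: [0.41, 0.79, -0.28],
  laptop: [-0.7, 0.1, 0.6], // hypothetical vector for an unrelated word
};

console.log("chai vs tea:", cosineSimilarity(vectors.chai, vectors.tea).toFixed(4));
console.log("chai vs laptop:", cosineSimilarity(vectors.chai, vectors.laptop).toFixed(4));
```

"chai" and "tea" score about 0.9997, almost identical, while "laptop" comes out negative.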

🌊 7. Embeddings – Soaking the Spices

When you let chai simmer, the spices soak into the liquid. In AI, an embedding is the learned vector for a word, and inside a Transformer it soaks up flavor from the surrounding words, giving each word a context-aware meaning.

Example from Piyush Garg: "The word 'bank' means something different in 'river bank' vs 'bank account.' Context changes flavor. Embeddings capture that nuance."

🔢 8. Positional Encoding – The Recipe Order

Chai tastes awful if you boil milk before water or add sugar too early. Order matters.

But transformers don’t naturally understand sequence. That’s where positional encoding comes in—like writing step numbers on your recipe.

“I love chai” ≠ “Chai love I”
Positional encodings let the model know who came first.
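One way to write those step numbers is the sinusoidal encoding from the 2017 "Attention Is All You Need" paper, sketched below; many newer models instead learn positions or use other schemes such as rotary embeddings. The 4-number `chai` vector and the `addPosition` helper are hypothetical.

```javascript
// Sinusoidal positional encoding from "Attention Is All You Need":
//   PE(pos, 2k)   = sin(pos / 10000^(2k / d))
//   PE(pos, 2k+1) = cos(pos / 10000^(2k / d))
// Each position gets a unique pattern of waves that is added to the token's vector.
function positionalEncoding(pos, d) {
  const pe = new Array(d);
  for (let i = 0; i < d; i += 2) {
    const angle = pos / Math.pow(10000, i / d); // i is the even index 2k
    pe[i] = Math.sin(angle);
    if (i + 1 < d) pe[i + 1] = Math.cos(angle);
  }
  return pe;
}

// Same word, different slot: adding the position pattern makes "chai" at
// position 0 and "chai" at position 2 look different to the model.
const chai = [0.4, 0.8, -0.3, 0.1]; // hypothetical 4-number vector for "chai"
const addPosition = (vec, pos) => positionalEncoding(pos, vec.length).map((p, j) => vec[j] + p);
console.log(addPosition(chai, 0));
console.log(addPosition(chai, 2));
```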

🧠 9. Semantic Meaning – The Taste of a Word

Just like "masala chai" and "spiced tea" taste similar, semantic meaning is about understanding similarity even when words differ.

Examples:
“Large” ≈ “Big”
“Hot” → temperature (“hot chai”) or trend (“hot topic”), depending on context

Semantic understanding helps models answer:

“What’s the opposite of cold?”
“What does ‘burning out’ mean in work context?”

👁️ 10. Self-Attention – The Tasting Spoon

While brewing, the chaiwala constantly tastes and adjusts. Ginger too strong? More milk. That’s self-attention—a mechanism where each word checks in on others in the sentence.

Example:
“The tea that Hitesh made was perfect.”
→ “tea” should relate more to “perfect” than “Hitesh.”

☕ 11. Softmax – Deciding on the Flavor

Now all ingredients are in. But which flavor dominates? Is the cardamom strongest? That's softmax: it turns the model's raw scores for every possible next word into probabilities that add up to 100%, so the most likely option stands out.

Output:

“chai” – 70%
“coffee” – 20%
“milk” – 10%

Chai wins.
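A minimal softmax sketch that reproduces the percentages above. The raw scores (2.0, 0.75, 0.05) are made-up assumptions picked for the example; softmax exponentiates them and normalizes so they sum to 1.

```javascript
// Softmax: exponentiate each raw score, then divide by the total so the results sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max first so Math.exp cannot overflow
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Hypothetical raw scores (logits) the model might give each candidate word.
const candidates = ["chai", "coffee", "milk"];
const probs = softmax([2.0, 0.75, 0.05]);

candidates.forEach((word, i) => console.log(`${word} – ${Math.round(probs[i] * 100)}%`));
// logs: chai – 70%, coffee – 20%, milk – 10% (one line each)

// The winner is the candidate with the highest probability.
const winner = candidates[probs.indexOf(Math.max(...probs))];
console.log(`${winner} wins.`);
```

Here the winner is simply the most probable word; with temperature sampling the model could occasionally pick "coffee" instead.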

🧠🧠 12. Multi-Head Attention – The Multi-Sensory Taste Test

You know those chaiwalas who taste, sniff, and swirl the pot? They use multi-head attention—looking at multiple aspects of flavor at once.

AI does the same. Each “head” can learn to focus on a different kind of relationship (one might track grammar, another topic), and all the heads run simultaneously.
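Here is a simplified multi-head sketch built around the "tea that Hitesh made was perfect" example. It assumes made-up 4-number vectors and lets each of two heads read a fixed half of every vector; real models give each head its own learned projections and mix the heads' results with one more learned matrix. Which head tracks what is learned in training, not chosen by hand as it is here.

```javascript
// Softmax: turn raw scores into positive weights that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max to keep Math.exp from overflowing
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

// One attention head: weights[i][j] says how much token i looks at token j.
function attentionWeights(vectors) {
  const scale = Math.sqrt(vectors[0].length);
  return vectors.map((q) => softmax(vectors.map((k) => dot(q, k) / scale)));
}

// Simplified multi-head attention: each head reads its own slice of every vector
// (assumes numHeads divides the vector length evenly).
function multiHeadAttention(vectors, numHeads) {
  const headDim = vectors[0].length / numHeads;
  const heads = [];
  for (let h = 0; h < numHeads; h++) {
    const slice = vectors.map((v) => v.slice(h * headDim, (h + 1) * headDim));
    const weights = attentionWeights(slice);
    // Each token's output for this head: an attention-weighted blend of the slices.
    const outputs = weights.map((row) =>
      slice[0].map((_, d) => row.reduce((sum, w, j) => sum + w * slice[j][d], 0))
    );
    heads.push({ weights, outputs });
  }
  // Concatenate the heads' outputs back into one full-length vector per token.
  const combined = vectors.map((_, t) => heads.flatMap((head) => head.outputs[t]));
  return { heads, combined };
}

// Made-up 4-number vectors: the first two numbers carry one "aspect", the last two another.
const tokens = ["tea", "Hitesh", "perfect"];
const { heads } = multiHeadAttention([[1, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]], 2);
heads.forEach((head, h) => {
  const row = tokens.map((t, j) => `${t} ${head.weights[0][j].toFixed(2)}`);
  console.log(`head ${h + 1}, "tea" looks at: ${row.join(", ")}`);
});
```

Head 1 pairs "tea" with "Hitesh", while head 2 pairs it with "perfect": two different views of the same sentence at once.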

🔥 13. Temperature (temp) – Spontaneity Level

In AI, temperature controls creativity.

Low temperature (0.2): Safe, predictable text → “Chai is a beverage.”
High temperature (0.9): Creative, poetic → “Chai is liquid nostalgia in a clay cup.”

Hitesh once demoed this in a video where the AI generated a Bollywood chai poem at temp = 1.0!

📆 14. Knowledge Cutoff – The Last Chai Update

If your chaiwala moved out of town in 2021, he won’t know what “rose chai” is. That’s knowledge cutoff.

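GPT-3.5 (the original ChatGPT): September 2021
GPT-4 at launch: September 2021; GPT-4 Turbo: 2023
Anything newer is unknown to the model unless it is retrained or handed the new information in its prompt.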

🫶 Final Sips: Brewing Language with Love

Just like every cup of chai is a delicate balance of spices, temperature, and timing, so is every sentence generated by an AI. The terms may sound technical, but underneath it all, they’re about relationships, understanding, and context—just like great conversation over tea.

🍵 Bonus Chai Code Example – With Hitesh and Piyush

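// Playful, hypothetical stand-ins: these two classes are not a real library.
class PiyushGarg { // Spice grinder: chops the prompt into word "tokens"
  encode(prompt) { return prompt.split(/\s+/); }
}
class HiteshChoudhary { // Master chaiwala: a toy rule where high temp picks the poetic line
  brew(context, temp) {
    return temp > 0.5 ? "Chai whispers secrets of sun-drenched afternoons into clay cups of joy." : "Chai is a beverage.";
  }
}
const chaiAI = (prompt, temp = 0.7) => {
  const transformer = new HiteshChoudhary(); // Master chaiwala
  const encoder = new PiyushGarg(); // Spice grinder
  const context = encoder.encode(prompt);
  const output = transformer.brew(context, temp);
  return output; // Perfectly brewed sentence
};

console.log(chaiAI("Tell me a poetic line about chai", 0.9));
// Output: Chai whispers secrets of sun-drenched afternoons into clay cups of joy.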

🙌 Final Thoughts: Learn with Chai & Code

Whether you're watching Hitesh Choudhary break down transformers or Piyush Garg explain embeddings with whiteboard magic, just know—you’re learning from the best.

And just like chai, it’s not about rushing. Let the concepts steep, savor the flavor, and share a cup of code with your fellow learners.

“AI isn’t magic—it’s just math with the right masala.”
— Piyush Garg

Wrapping Up With Chai 🍵

All this AI stuff sounds complex, but when you hear Hitesh explain it in Hindi, with chai in one hand and a whiteboard in the other, it just clicks. And when Piyush Garg shows you how to use it in real projects with React or Node.js, you start building cool stuff immediately.

So next time someone says:

“Self-attention with multi-head transformers uses vector embeddings and softmax over tokenized vocab to preserve semantic meaning...”

You just sip your chai and say:

“Bhai, Hitesh ne padhaya hai. Sab samajh gaya.” (“Bro, Hitesh taught me this. I got all of it.”) 😎