Chapter 49: The rustc Compilation Pipeline
Prerequisites
You will understand
- Lexing → parsing → HIR → MIR → LLVM IR → machine code
- Where borrow checking happens in the pipeline
- Why MIR matters for optimization and analysis
Reading time
Source Code Through Rustc's Internal Stages
Why the Compiler Sees More Structure Than You Wrote
Step 1 - The Problem
Without a mental model of the compiler pipeline, many advanced Rust phenomena feel disconnected:
- why borrow checking sees code differently from surface syntax
- why macro expansion changes what later phases operate on
- why generics are zero-cost at runtime yet expensive for compile time
- why diagnostics often refer to desugared or inferred structure
The pipeline view turns these from isolated facts into one story.
Step 2 - Rust’s Design Decision
Rust compiles through a sequence of increasingly semantic representations instead of one monolithic pass:
- parsing and expansion
- lowering to internal representations
- type and trait reasoning
- borrow checking and MIR optimizations
- codegen through LLVM
Rust accepted:
- a sophisticated compiler architecture
- many internal representations
Rust refused:
- trying to do all semantic work on raw syntax
- collapsing high-level language guarantees into ad hoc backend heuristics
Step 3 - The Mental Model
Plain English rule: each compiler stage removes one kind of ambiguity and adds one kind of meaning.
Surface Rust is for humans. HIR is for semantic analysis. MIR is for control-flow and ownership analysis. LLVM IR is for low-level optimization and machine-code generation.
Step 4 - Minimal Code Example
Take this source:
#![allow(unused)]
fn main() {
for x in values {
println!("{x}");
}
}
This is not how the compiler reasons about it in later stages. By HIR/MIR time, it has been desugared into iterator and control-flow machinery.
Step 5 - Walkthrough
High-level pipeline:
Source
-> tokens
-> AST
-> expanded AST
-> HIR
-> MIR
-> LLVM IR
-> machine code
What each stage is really doing:
- Parsing turns text into syntax structure.
- Macro expansion rewrites macro-driven syntax into ordinary syntax trees.
- Name resolution ties names to definitions.
- HIR lowers away much syntactic sugar and becomes a better substrate for type checking.
- Trait solving and type checking operate on this more semantic form.
- MIR makes control flow, temporaries, and drops explicit.
- Borrow checking and many mid-level optimizations operate on MIR.
- Monomorphization creates concrete instantiations of generic code.
- LLVM handles low-level optimization and machine-code emission.
The invariant is:
ownership, typing, and dispatch semantics must become explicit enough before the compiler can check or optimize them soundly.
Step 6 - Three-Level Explanation
The compiler does not check your program only as written. It gradually turns your code into simpler internal forms that are easier to analyze.
HIR matters because it is where a lot of semantic reasoning becomes clearer after desugaring.
MIR matters because:
- control flow is explicit
- drops are explicit
- temporary lifetimes are clearer
- borrow checking is more precise there than on raw syntax
Monomorphization matters because it explains why generic code is fast but can grow compile time and binary size.
The pipeline is also a design boundary system:
- macro system before semantic analysis
- HIR for language-level meaning
- MIR for ownership/control-flow reasoning
- backend IR for machine-level optimization
This separation lets Rust pursue strong source-level guarantees without forcing the backend to reconstruct ownership and borrow semantics from machine-ish code.
HIR, MIR, and Borrow Checking
HIR is where many surface conveniences have already been normalized.
MIR is closer to a control-flow graph with explicit temporaries, assignments, and drops. That is why borrow checking happens there: the compiler can see where values are live, where borrows start and end, and how control flow really branches.
This is also why some borrow-checker errors make more sense once you imagine the desugared form rather than the prettified source.
Trait Solving
Trait solving answers questions like:
- which method implementation applies here?
- does this type satisfy the required bound?
- which associated type flows from this impl?
This is deeper than method lookup in many OO languages because traits interact with generics, blanket impls, associated types, and coherence.
For the handbook reader, the important point is not every internal algorithm detail. It is:
many “trait bound not satisfied” errors are the surface symptom of the compiler failing to prove a capability relationship in the current type environment.
Monomorphization and LLVM
Monomorphization turns:
#![allow(unused)]
fn main() {
fn max<T: Ord>(a: T, b: T) -> T { ... }
}
into concrete instances like:
max_i32max_String
That is why generics can be zero-cost at runtime.
LLVM then optimizes the resulting concrete IR and emits machine code. Rust hands LLVM low-level work, but not the job of rediscovering Rust’s ownership or lifetime story. Those semantics were handled earlier.
Incremental Compilation
Large Rust builds would be intolerable without reuse. Incremental compilation lets the compiler avoid rebuilding every query result from scratch when only some inputs changed.
For practitioners, the practical takeaway is simple:
- architectural boundaries matter for compile times too
- generic-heavy and macro-heavy designs can shift compilation cost significantly
Step 7 - Common Misconceptions
Wrong model 1: “Borrow checking operates directly on my source text.”
Correction: it works on MIR after substantial lowering and explicit control-flow modeling.
Wrong model 2: “LLVM is responsible for all of Rust’s intelligence.”
Correction: LLVM is crucial for low-level optimization, but Rust’s safety and ownership reasoning happens earlier.
Wrong model 3: “Generics are fast because LLVM is magical.”
Correction: monomorphization gives LLVM concrete code to optimize.
Wrong model 4: “HIR and MIR are too internal to matter.”
Correction: understanding them makes compiler diagnostics and language behavior far more legible.
Step 8 - Real-World Pattern
This understanding pays off when:
- reading compiler errors
- debugging macro-heavy code
- reasoning about generic performance
- browsing
rust-lang/rust - understanding why certain language proposals affect compiler complexity
Step 9 - Practice Block
Code Exercise
Take one for loop and manually explain what iterator and control-flow machinery it desugars into conceptually.
Code Reading Drill
Explain why borrow checking becomes easier on a control-flow graph than on raw source syntax.
Spot the Bug
Why is this misunderstanding wrong?
"LLVM handles borrow checking because it sees the low-level code."
Refactoring Drill
Take one confusing borrow error and restate it in MIR-style terms: owner, temporary, drop point, last use, and conflicting access.
Compiler Error Interpretation
If an error seems odd on the original source, ask: “what desugared or lowered form is the compiler probably reasoning about instead?”
Step 10 - Contribution Connection
After this chapter, you can:
- read compiler docs with less intimidation
- interpret borrow and trait errors with deeper structure
- approach
rust-lang/rustwith a phase-based map - reason about compile-time versus runtime tradeoffs more clearly
Good first PRs include:
- docs clarifying compiler-stage behavior
- small diagnostic improvements
- tests capturing confusing desugaring or MIR-visible behavior
In Plain English
The Rust compiler does not jump straight from your source code to machine code. It gradually translates the program into forms that make typing, borrowing, and optimization easier to reason about. That matters because many advanced Rust behaviors only make sense once you know which form the compiler is actually looking at.
What Invariant Is Rust Protecting Here?
Semantic meaning, ownership behavior, and dispatch rules must be made explicit enough at each stage for later analyses and optimizations to remain sound and effective.
If You Remember Only 3 Things
- HIR is where surface syntax has already been cleaned up for semantic analysis.
- MIR is where control flow and ownership become explicit enough for borrow checking.
- Monomorphization explains why generics are fast and why they cost compile time.
Memory Hook
The compiler pipeline is a series of increasingly disciplined blueprints: marketing sketch, architectural plan, wiring diagram, then machine-shop instructions.
Flashcard Deck
| Question | Answer |
|---|---|
| Why does Rust use multiple internal representations? | Different phases need different levels of semantic explicitness. |
| What stage does macro expansion affect before later analysis? | The syntax tree before later semantic stages operate on the expanded program. |
| What is HIR for in practice? | A desugared, analysis-friendly form for semantic checking. |
| What is MIR for in practice? | Explicit control flow, temporaries, drops, and borrow analysis. |
| Why does borrow checking happen on MIR? | Ownership and liveness are clearer on an explicit control-flow representation. |
| What is monomorphization? | Generating concrete instances of generic code for each used type. |
| What does LLVM mainly contribute? | Low-level optimization and machine-code generation. |
| Why does pipeline knowledge help with diagnostics? | It explains why compiler reasoning may differ from surface syntax intuition. |
Chapter Cheat Sheet
| Stage | Main job | Why it matters to you |
|---|---|---|
| parsing/expansion | turn syntax into expanded program | macro behavior |
| HIR | semantic-friendly lowered form | type and trait reasoning |
| MIR | explicit control flow and drops | borrow-checking intuition |
| monomorphization | concrete generic instances | performance and code size |
| LLVM/codegen | low-level optimization | final runtime shape |