Chapter 49: The `rustc` Compilation Pipeline

Prerequisites

Ch 21: Borrow Checker

You will understand

Lexing → parsing → HIR → MIR → LLVM IR → machine code
Where borrow checking happens in the pipeline
Why MIR matters for optimization and analysis

Reading time

35 min + 15 min exercises

Step 1 - The Problem

Without a mental model of the compiler pipeline, many advanced Rust phenomena feel disconnected:

why borrow checking sees code differently from surface syntax
why macro expansion changes what later phases operate on
why generics are zero-cost at runtime yet expensive at compile time
why diagnostics often refer to desugared or inferred structure

The pipeline view turns these from isolated facts into one story.

Step 2 - Rust’s Design Decision

Rust compiles through a sequence of progressively lower-level, more explicit representations instead of one monolithic pass:

parsing and expansion
lowering to internal representations
type and trait reasoning
borrow checking and MIR optimizations
codegen through LLVM

Rust accepted:

a sophisticated compiler architecture
many internal representations

Rust refused:

trying to do all semantic work on raw syntax
collapsing high-level language guarantees into ad hoc backend heuristics

Step 3 - The Mental Model

Plain English rule: each compiler stage removes one kind of ambiguity and adds one kind of meaning.

Surface Rust is for humans. HIR is for semantic analysis. MIR is for control-flow and ownership analysis. LLVM IR is for low-level optimization and machine-code generation.

Step 4 - Minimal Code Example

Take this source:

fn main() {
    let values = vec![1, 2, 3];
    for x in values {
        println!("{x}");
    }
}

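This is not how the compiler reasons about it in later stages. During lowering to HIR, the for loop is desugared into a call to `IntoIterator::into_iter`, a plain `loop`, and a `match` on `Iterator::next`. MIR then turns that into explicit basic blocks with branches between them.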
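As a rough illustration of that lowering, the two functions below should behave identically. The second is a hand-written approximation of what a for loop means, not the compiler's exact output (the real expansion differs in details such as how the iterator binding is introduced). The names `sum_with_for` and `sum_desugared` are made up for this sketch.

```rust
/// The surface form: a plain `for` loop.
fn sum_with_for(values: Vec<i32>) -> i32 {
    let mut total = 0;
    for x in values {
        total += x;
    }
    total
}

/// Approximately what the loop above means after lowering.
fn sum_desugared(values: Vec<i32>) -> i32 {
    let mut total = 0;
    // `for` first converts its operand into an iterator...
    let mut iter = IntoIterator::into_iter(values);
    // ...then loops, asking for the next item until the iterator is exhausted.
    loop {
        match Iterator::next(&mut iter) {
            Some(x) => total += x,
            None => break,
        }
    }
    total
}
```

Calling both with `vec![1, 2, 3]` returns the same sum. The point is that later stages only ever see the second shape: an iterator value, a loop, and a match.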

Step 5 - Walkthrough

High-level pipeline:

Source
  -> tokens
  -> AST
  -> expanded AST
  -> HIR
  -> MIR
  -> LLVM IR
  -> machine code

What each stage is really doing:

Parsing turns text into syntax structure.
Macro expansion rewrites macro-driven syntax into ordinary syntax trees.
Name resolution ties names to definitions.
Lowering to HIR removes much syntactic sugar and produces a better substrate for type checking.
Trait solving and type checking operate on this more semantic form.
MIR makes control flow, temporaries, and drops explicit.
Borrow checking and many mid-level optimizations operate on MIR.
Monomorphization creates concrete instantiations of generic code.
LLVM handles low-level optimization and machine-code emission.

The invariant is:

ownership, typing, and dispatch semantics must become explicit enough before the compiler can check or optimize them soundly.
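To make "explicit enough" concrete, here is a small sketch that writes out by hand three pieces of sugar: method-call syntax, the `?` operator, and the `+` operator. It is an approximation for intuition only; the compiler's real lowering of `?` goes through different internal traits. The function names are hypothetical.

```rust
use std::num::ParseIntError;
use std::ops::Add;

/// Surface form: `?`, an operator, and method-call syntax.
fn parse_and_add_sugar(s: &str) -> Result<i32, ParseIntError> {
    let n = s.trim().parse::<i32>()?;
    Ok(n + 1)
}

/// Roughly the same logic with the sugar written out by hand.
fn parse_and_add_explicit(s: &str) -> Result<i32, ParseIntError> {
    // Method calls become fully qualified paths with an explicit receiver.
    let parsed = str::parse::<i32>(str::trim(s));
    // `?` becomes a match with an early return and an error conversion.
    let n = match parsed {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    // `+` becomes a call to the `Add` trait method.
    Ok(Add::add(n, 1))
}
```

Both functions accept `" 41 "` and reject `"x"` in the same way. In the explicit version, every dispatch decision and every early exit is visible in the code, which is the property later analyses rely on.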

The flagship tour: follow three lines of code through every representation the compiler builds on the way to machine code.

[Interactive figure, not reproduced here: a for loop traced through the full rustc pipeline, from token stream and AST, through HIR desugaring with type checking and the MIR control-flow graph where the borrow checker runs, to LLVM IR optimization and the final linked binary.]

Step 6 - Three-Level Explanation

The compiler does not check your program exactly as you wrote it. It gradually turns your code into simpler internal forms that are easier to analyze.

HIR matters because it is where a lot of semantic reasoning becomes clearer after desugaring.

MIR matters because:

control flow is explicit
drops are explicit
temporary lifetimes are clearer
borrow checking is more precise there than on raw syntax
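The drop points that MIR spells out are invisible in source code, but they can be observed at runtime. The sketch below uses a hypothetical `Noisy` type that records its name in a shared log when it is dropped. It shows two rules that lowering makes explicit: a temporary that is never bound is dropped at the end of its statement, and locals are dropped in reverse declaration order at the end of their block.

```rust
use std::cell::RefCell;

/// Records its name in a shared log when it is dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn observed_drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _a = Noisy { name: "a", log: &log };
        let _b = Noisy { name: "b", log: &log };
        // Never bound to a variable: dropped at the end of this statement.
        Noisy { name: "temp", log: &log };
        log.borrow_mut().push("end of block");
        // Locals are dropped here in reverse declaration order: `_b`, then `_a`.
    }
    log.into_inner()
}
```

`observed_drop_order()` returns `["temp", "end of block", "b", "a"]`. None of those drop calls appear in the source; they exist as explicit operations only after lowering.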

Monomorphization matters because it explains why generic code is fast but can grow compile time and binary size.

The pipeline is also a design boundary system:

macro system before semantic analysis
HIR for language-level meaning
MIR for ownership/control-flow reasoning
backend IR for machine-level optimization

This separation lets Rust pursue strong source-level guarantees without forcing the backend to reconstruct ownership and borrow semantics from machine-ish code.

HIR, MIR, and Borrow Checking

HIR is where many surface conveniences have already been normalized.

MIR is closer to a control-flow graph with explicit temporaries, assignments, and drops. That is why borrow checking happens there: the compiler can see where values are live, where borrows start and end, and how control flow really branches.
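A minimal sketch of what "where borrows start and end" means in practice follows. The function name is hypothetical. The shared borrow held by `first` is considered finished after its last use, not at the end of the enclosing block, so the later mutation is accepted.

```rust
fn push_after_last_use() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow of `v` starts here
    let copy = *first; // last use of `first`, so the borrow ends here
    v.push(copy); // accepted: no shared borrow is live at this point
    // Reading `*first` again after the push would be rejected (E0502),
    // because the shared borrow would then still be live across `push`.
    v
}
```

The function returns `[1, 2, 3, 1]`. Judged purely by lexical scope, `first` is still in scope at the `push`; judged by liveness on the control-flow graph, it is already dead, and that is the view the borrow checker takes.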

This is also why some borrow-checker errors make more sense once you imagine the desugared form rather than the prettified source.

Trait Solving

Trait solving answers questions like:

which method implementation applies here?
does this type satisfy the required bound?
which associated type flows from this impl?

This is deeper than method lookup in many OO languages because traits interact with generics, blanket impls, associated types, and coherence.

For the handbook reader, the important point is not every internal algorithm detail. It is:

many “trait bound not satisfied” errors are the surface symptom of the compiler failing to prove a capability relationship in the current type environment.
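The three questions above can be seen in one small sketch. `Describe`, `first_item`, and `describe_all` are hypothetical names invented for illustration; only `Display` and `IntoIterator` come from the standard library.

```rust
use std::fmt::Display;

/// Hypothetical capability trait for this sketch.
trait Describe {
    fn describe(&self) -> String;
}

/// Blanket impl: any type that implements `Display` also gets `Describe`.
impl<T: Display> Describe for T {
    fn describe(&self) -> String {
        format!("<{}>", self)
    }
}

/// The bound `I: IntoIterator` is what lets the solver resolve `I::Item`.
fn first_item<I: IntoIterator>(items: I) -> Option<I::Item> {
    items.into_iter().next()
}

/// Compiles for a given `T` only if the solver can prove `T: Describe`.
fn describe_all<T: Describe>(items: &[T]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}
```

Calling `describe_all(&[1.5, 2.5])` works because `f64: Display` lets the solver prove `f64: Describe` through the blanket impl. Passing a slice of a struct with no `Display` impl fails with a trait-bound error (E0277): the compiler could not prove the capability, which is exactly the situation described above.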

Monomorphization and LLVM

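Monomorphization turns:

fn max<T: Ord>(a: T, b: T) -> T {
    // Return the larger argument; on a tie, return `b`.
    if a > b { a } else { b }
}

into one concrete instance per type the function is actually called with. Conceptually (the names are illustrative; real symbol names are mangled):

max_i32
max_String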

That is why generics can be zero-cost at runtime.
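One way to picture the result is to write the instances by hand. In the sketch below, `max_i32` and `max_string` are hand-written stand-ins for what the compiler generates. They are not the actual generated symbols, and the generic body is one plausible implementation.

```rust
/// Generic source: one definition.
fn max<T: Ord>(a: T, b: T) -> T {
    if a > b { a } else { b }
}

/// Hand-written stand-in for the instance generated for `T = i32`.
fn max_i32(a: i32, b: i32) -> i32 {
    if a > b { a } else { b }
}

/// Hand-written stand-in for the instance generated for `T = String`.
fn max_string(a: String, b: String) -> String {
    if a > b { a } else { b }
}
```

`max(3, 7)` and `max_i32(3, 7)` give the same answer, and after monomorphization they are the same kind of code: a direct comparison on a known type, with no lookup at runtime. The cost is that every additional type used with `max` adds another copy to compile and to carry in the binary.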

LLVM then optimizes the resulting concrete IR and emits machine code. Rust hands LLVM low-level work, but not the job of rediscovering Rust’s ownership or lifetime story. Those semantics were handled earlier.

Incremental Compilation

Large Rust builds would be intolerable without reuse. rustc structures much of its work as queries whose results can be cached, and incremental compilation reuses the cached results whose inputs have not changed instead of recomputing everything from scratch.

For practitioners, the practical takeaway is simple:

architectural boundaries matter for compile times too
generic-heavy and macro-heavy designs can shift compilation cost significantly
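One commonly used way to keep a generic API while limiting how much code is duplicated per instantiation is a thin generic wrapper around a non-generic inner function. The sketch below is an illustration under that assumption, not a measured optimization; `extension_of` is a hypothetical helper.

```rust
use std::path::Path;

/// Generic shell: instantiated once per caller-supplied `P`, but tiny.
pub fn extension_of<P: AsRef<Path>>(path: P) -> Option<String> {
    /// Non-generic body: compiled once, shared by every instantiation.
    fn inner(path: &Path) -> Option<String> {
        path.extension()
            .and_then(|ext| ext.to_str())
            .map(str::to_owned)
    }
    inner(path.as_ref())
}
```

Callers can pass a `&str`, a `String`, or a `PathBuf`, and each of those creates a new instance of the outer function. Only the one-line shell is duplicated; the real work in `inner` is compiled a single time.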

Step 7 - Common Misconceptions

Wrong model 1: “Borrow checking operates directly on my source text.”

Correction: it works on MIR after substantial lowering and explicit control-flow modeling.

Wrong model 2: “LLVM is responsible for all of Rust’s intelligence.”

Correction: LLVM is crucial for low-level optimization, but Rust’s safety and ownership reasoning happens earlier.

Wrong model 3: “Generics are fast because LLVM is magical.”

Correction: monomorphization gives LLVM concrete code to optimize.

Wrong model 4: “HIR and MIR are too internal to matter.”

Correction: understanding them makes compiler diagnostics and language behavior far more legible.

Step 8 - Real-World Pattern

This understanding pays off when:

reading compiler errors
debugging macro-heavy code
reasoning about generic performance
browsing rust-lang/rust
understanding why certain language proposals affect compiler complexity

Step 9 - Practice Block

Code Exercise

Take one for loop and manually explain what iterator and control-flow machinery it desugars into conceptually.

Code Reading Drill

Explain why borrow checking becomes easier on a control-flow graph than on raw source syntax.

Spot the Bug

Why is this misunderstanding wrong?

"LLVM handles borrow checking because it sees the low-level code."

Refactoring Drill

Take one confusing borrow error and restate it in MIR-style terms: owner, temporary, drop point, last use, and conflicting access.

Compiler Error Interpretation

If an error seems odd on the original source, ask: “what desugared or lowered form is the compiler probably reasoning about instead?”

Step 10 - Contribution Connection

After this chapter, you can:

read compiler docs with less intimidation
interpret borrow and trait errors with deeper structure
approach rust-lang/rust with a phase-based map
reason about compile-time versus runtime tradeoffs more clearly

Good first PRs include:

docs clarifying compiler-stage behavior
small diagnostic improvements
tests capturing confusing desugaring or MIR-visible behavior

In Plain English

The Rust compiler does not jump straight from your source code to machine code. It gradually translates the program into forms that make typing, borrowing, and optimization easier to reason about. That matters because many advanced Rust behaviors only make sense once you know which form the compiler is actually looking at.

What Invariant Is Rust Protecting Here?

Semantic meaning, ownership behavior, and dispatch rules must be made explicit enough at each stage for later analyses and optimizations to remain sound and effective.

If You Remember Only 3 Things

HIR is where surface syntax has already been cleaned up for semantic analysis.
MIR is where control flow and ownership become explicit enough for borrow checking.
Monomorphization explains why generics are fast and why they cost compile time.

Memory Hook

The compiler pipeline is a series of increasingly disciplined blueprints: marketing sketch, architectural plan, wiring diagram, then machine-shop instructions.

Flashcard Deck

Question	Answer
Why does Rust use multiple internal representations?	Different phases need different levels of semantic explicitness.
What does macro expansion change before later analysis runs?	The syntax tree: later semantic stages operate on the already-expanded program.
What is HIR for in practice?	A desugared, analysis-friendly form for semantic checking.
What is MIR for in practice?	Explicit control flow, temporaries, drops, and borrow analysis.
Why does borrow checking happen on MIR?	Ownership and liveness are clearer on an explicit control-flow representation.
What is monomorphization?	Generating concrete instances of generic code for each used type.
What does LLVM mainly contribute?	Low-level optimization and machine-code generation.
Why does pipeline knowledge help with diagnostics?	It explains why compiler reasoning may differ from surface syntax intuition.

Chapter Cheat Sheet

Stage	Main job	Why it matters to you
parsing/expansion	turn syntax into expanded program	macro behavior
HIR	semantic-friendly lowered form	type and trait reasoning
MIR	explicit control flow and drops	borrow-checking intuition
monomorphization	concrete generic instances	performance and code size
LLVM/codegen	low-level optimization	final runtime shape

The Rust Mastery Handbook
