Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 49: The rustc Compilation Pipeline

Prerequisites

You will understand

  • Lexing → parsing → HIR → MIR → LLVM IR → machine code
  • Where borrow checking happens in the pipeline
  • Why MIR matters for optimization and analysis

Reading time

35 min
+ 15 min exercises
Compilation Flow

Source Code Through Rustc's Internal Stages

source code AST and expansion HIR MIR borrow checking and control-flow analysis LLVM IR -> machine code
Desugaring Lens

Why the Compiler Sees More Structure Than You Wrote

source for x in values { println!(\"{x}\"); } conceptual lowered form let mut iter = IntoIterator::into_iter(values); loop { match iter.next() { Some(x) => ... None => break } } later stages reason about explicit temporaries, matches, and drops

Step 1 - The Problem

Without a mental model of the compiler pipeline, many advanced Rust phenomena feel disconnected:

  • why borrow checking sees code differently from surface syntax
  • why macro expansion changes what later phases operate on
  • why generics are zero-cost at runtime yet expensive for compile time
  • why diagnostics often refer to desugared or inferred structure

The pipeline view turns these from isolated facts into one story.

Step 2 - Rust’s Design Decision

Rust compiles through a sequence of increasingly semantic representations instead of one monolithic pass:

  • parsing and expansion
  • lowering to internal representations
  • type and trait reasoning
  • borrow checking and MIR optimizations
  • codegen through LLVM

Rust accepted:

  • a sophisticated compiler architecture
  • many internal representations

Rust refused:

  • trying to do all semantic work on raw syntax
  • collapsing high-level language guarantees into ad hoc backend heuristics

Step 3 - The Mental Model

Plain English rule: each compiler stage removes one kind of ambiguity and adds one kind of meaning.

Surface Rust is for humans. HIR is for semantic analysis. MIR is for control-flow and ownership analysis. LLVM IR is for low-level optimization and machine-code generation.

Step 4 - Minimal Code Example

Take this source:

#![allow(unused)]
fn main() {
for x in values {
    println!("{x}");
}
}

This is not how the compiler reasons about it in later stages. By HIR/MIR time, it has been desugared into iterator and control-flow machinery.

Step 5 - Walkthrough

High-level pipeline:

Source
  -> tokens
  -> AST
  -> expanded AST
  -> HIR
  -> MIR
  -> LLVM IR
  -> machine code

What each stage is really doing:

  1. Parsing turns text into syntax structure.
  2. Macro expansion rewrites macro-driven syntax into ordinary syntax trees.
  3. Name resolution ties names to definitions.
  4. HIR lowers away much syntactic sugar and becomes a better substrate for type checking.
  5. Trait solving and type checking operate on this more semantic form.
  6. MIR makes control flow, temporaries, and drops explicit.
  7. Borrow checking and many mid-level optimizations operate on MIR.
  8. Monomorphization creates concrete instantiations of generic code.
  9. LLVM handles low-level optimization and machine-code emission.

The invariant is:

ownership, typing, and dispatch semantics must become explicit enough before the compiler can check or optimize them soundly.

Step 6 - Three-Level Explanation

The compiler does not check your program only as written. It gradually turns your code into simpler internal forms that are easier to analyze.

HIR matters because it is where a lot of semantic reasoning becomes clearer after desugaring.

MIR matters because:

  • control flow is explicit
  • drops are explicit
  • temporary lifetimes are clearer
  • borrow checking is more precise there than on raw syntax

Monomorphization matters because it explains why generic code is fast but can grow compile time and binary size.

The pipeline is also a design boundary system:

  • macro system before semantic analysis
  • HIR for language-level meaning
  • MIR for ownership/control-flow reasoning
  • backend IR for machine-level optimization

This separation lets Rust pursue strong source-level guarantees without forcing the backend to reconstruct ownership and borrow semantics from machine-ish code.

HIR, MIR, and Borrow Checking

HIR is where many surface conveniences have already been normalized.

MIR is closer to a control-flow graph with explicit temporaries, assignments, and drops. That is why borrow checking happens there: the compiler can see where values are live, where borrows start and end, and how control flow really branches.

This is also why some borrow-checker errors make more sense once you imagine the desugared form rather than the prettified source.

Trait Solving

Trait solving answers questions like:

  • which method implementation applies here?
  • does this type satisfy the required bound?
  • which associated type flows from this impl?

This is deeper than method lookup in many OO languages because traits interact with generics, blanket impls, associated types, and coherence.

For the handbook reader, the important point is not every internal algorithm detail. It is:

many “trait bound not satisfied” errors are the surface symptom of the compiler failing to prove a capability relationship in the current type environment.

Monomorphization and LLVM

Monomorphization turns:

#![allow(unused)]
fn main() {
fn max<T: Ord>(a: T, b: T) -> T { ... }
}

into concrete instances like:

  • max_i32
  • max_String

That is why generics can be zero-cost at runtime.

LLVM then optimizes the resulting concrete IR and emits machine code. Rust hands LLVM low-level work, but not the job of rediscovering Rust’s ownership or lifetime story. Those semantics were handled earlier.

Incremental Compilation

Large Rust builds would be intolerable without reuse. Incremental compilation lets the compiler avoid rebuilding every query result from scratch when only some inputs changed.

For practitioners, the practical takeaway is simple:

  • architectural boundaries matter for compile times too
  • generic-heavy and macro-heavy designs can shift compilation cost significantly

Step 7 - Common Misconceptions

Wrong model 1: “Borrow checking operates directly on my source text.”

Correction: it works on MIR after substantial lowering and explicit control-flow modeling.

Wrong model 2: “LLVM is responsible for all of Rust’s intelligence.”

Correction: LLVM is crucial for low-level optimization, but Rust’s safety and ownership reasoning happens earlier.

Wrong model 3: “Generics are fast because LLVM is magical.”

Correction: monomorphization gives LLVM concrete code to optimize.

Wrong model 4: “HIR and MIR are too internal to matter.”

Correction: understanding them makes compiler diagnostics and language behavior far more legible.

Step 8 - Real-World Pattern

This understanding pays off when:

  • reading compiler errors
  • debugging macro-heavy code
  • reasoning about generic performance
  • browsing rust-lang/rust
  • understanding why certain language proposals affect compiler complexity

Step 9 - Practice Block

Code Exercise

Take one for loop and manually explain what iterator and control-flow machinery it desugars into conceptually.

Code Reading Drill

Explain why borrow checking becomes easier on a control-flow graph than on raw source syntax.

Spot the Bug

Why is this misunderstanding wrong?

"LLVM handles borrow checking because it sees the low-level code."

Refactoring Drill

Take one confusing borrow error and restate it in MIR-style terms: owner, temporary, drop point, last use, and conflicting access.

Compiler Error Interpretation

If an error seems odd on the original source, ask: “what desugared or lowered form is the compiler probably reasoning about instead?”

Step 10 - Contribution Connection

After this chapter, you can:

  • read compiler docs with less intimidation
  • interpret borrow and trait errors with deeper structure
  • approach rust-lang/rust with a phase-based map
  • reason about compile-time versus runtime tradeoffs more clearly

Good first PRs include:

  • docs clarifying compiler-stage behavior
  • small diagnostic improvements
  • tests capturing confusing desugaring or MIR-visible behavior

In Plain English

The Rust compiler does not jump straight from your source code to machine code. It gradually translates the program into forms that make typing, borrowing, and optimization easier to reason about. That matters because many advanced Rust behaviors only make sense once you know which form the compiler is actually looking at.

What Invariant Is Rust Protecting Here?

Semantic meaning, ownership behavior, and dispatch rules must be made explicit enough at each stage for later analyses and optimizations to remain sound and effective.

If You Remember Only 3 Things

  • HIR is where surface syntax has already been cleaned up for semantic analysis.
  • MIR is where control flow and ownership become explicit enough for borrow checking.
  • Monomorphization explains why generics are fast and why they cost compile time.

Memory Hook

The compiler pipeline is a series of increasingly disciplined blueprints: marketing sketch, architectural plan, wiring diagram, then machine-shop instructions.

Flashcard Deck

QuestionAnswer
Why does Rust use multiple internal representations?Different phases need different levels of semantic explicitness.
What stage does macro expansion affect before later analysis?The syntax tree before later semantic stages operate on the expanded program.
What is HIR for in practice?A desugared, analysis-friendly form for semantic checking.
What is MIR for in practice?Explicit control flow, temporaries, drops, and borrow analysis.
Why does borrow checking happen on MIR?Ownership and liveness are clearer on an explicit control-flow representation.
What is monomorphization?Generating concrete instances of generic code for each used type.
What does LLVM mainly contribute?Low-level optimization and machine-code generation.
Why does pipeline knowledge help with diagnostics?It explains why compiler reasoning may differ from surface syntax intuition.

Chapter Cheat Sheet

StageMain jobWhy it matters to you
parsing/expansionturn syntax into expanded programmacro behavior
HIRsemantic-friendly lowered formtype and trait reasoning
MIRexplicit control flow and dropsborrow-checking intuition
monomorphizationconcrete generic instancesperformance and code size
LLVM/codegenlow-level optimizationfinal runtime shape