Zero-Copy Parsing in Rust

Zero-Copy Parsing in Rust

3 min readrustperformancesystems

One of Rust's most powerful patterns is zero-copy parsing: analyzing structured data by borrowing slices of the original input rather than allocating new strings. The lifetime system makes this both safe and ergonomic in a way no other mainstream language can match.

The Core Idea

Traditional parsing reads input bytes, allocates new strings for each field, and returns owned data. Zero-copy parsing returns references into the original buffer.

// Traditional: allocates a new String for each field
struct HeaderOwned {
    method: String,      // 24 bytes + heap allocation
    path: String,        // 24 bytes + heap allocation
    version: String,     // 24 bytes + heap allocation
}
 
// Zero-copy: borrows slices from the input buffer
struct Header<'a> {
    method: &'a str,     // 16 bytes, no allocation
    path: &'a str,       // 16 bytes, no allocation
    version: &'a str,    // 16 bytes, no allocation
}

The 'a lifetime tells the compiler: "this Header cannot outlive the buffer it was parsed from." This is enforced at compile time with zero runtime cost.

Zero-copy vs owned allocation memory layout
Zero-copy vs owned allocation memory layout

Parsing with nom

The nom crate is the standard library for parser combinators in Rust. It works naturally with zero-copy parsing:

use nom::{
    bytes::complete::{tag, take_until, take_while1},
    character::complete::{char, space1},
    sequence::{terminated, tuple},
    IResult,
};
 
fn parse_request_line(input: &str) -> IResult<&str, Header<'_>> {
    let (input, (method, _, path, _, version)) = tuple((
        take_while1(|c: char| c.is_ascii_uppercase()),
        space1,
        take_until(" "),
        space1,
        terminated(take_until("\r"), tag("\r\n")),
    ))(input)?;
 
    Ok((input, Header { method, path, version }))
}
 
fn parse_header_field(input: &str) -> IResult<&str, (&str, &str)> {
    let (input, name) = take_until(":")(input)?;
    let (input, _) = tag(": ")(input)?;
    let (input, value) = terminated(take_until("\r"), tag("\r\n"))(input)?;
    Ok((input, (name, value)))
}

Every parsed value is a &str — a pointer and length into the original input. No heap allocations at all.

Benchmarks

I benchmarked parsing a 1KB HTTP request with headers using three strategies on an M3 MacBook Pro:

StrategyThroughputAllocations per parseMemory per parse
Owned (String)2.1M ops/sec12847 bytes
Zero-copy (nom)8.7M ops/sec00 bytes
Regex0.4M ops/sec82,104 bytes

Zero-copy parsing is 4x faster than allocating owned strings and 21x faster than regex-based parsing. For high-throughput services parsing millions of requests, this is the difference between needing 4 servers and needing 1.

Parser throughput benchmark comparison
Parser throughput benchmark comparison

When Zero-Copy Shines

Zero-copy parsing is most valuable when:

  • Input data is large and you only need small slices of it (log parsing, protocol headers)
  • Throughput is critical and allocation overhead is measurable
  • Parsed data is short-lived — you process it and discard it within the same scope
  • The input buffer is contiguous in memory
// Perfect use case: parse log line, extract fields, aggregate, discard
fn process_log_batch(raw: &str) -> Stats {
    let mut stats = Stats::default();
    for line in raw.lines() {
        if let Ok((_, entry)) = parse_log_entry(line) {
            stats.record(entry.level, entry.latency_ms);
            // entry borrows from line, which borrows from raw
            // everything is freed when this iteration ends
        }
    }
    stats
}

When to Use Owned Data Instead

Zero-copy isn't always the answer. Use owned data when:

  • Parsed data needs to outlive the input (storing results in a database or cache)
  • You need to modify the parsed values (case normalization, trimming)
  • The input arrives in chunks (streaming protocols where you can't hold the full buffer)
// When you need owned data, convert explicitly at the boundary
impl<'a> Header<'a> {
    fn to_owned(&self) -> HeaderOwned {
        HeaderOwned {
            method: self.method.to_string(),
            path: self.path.to_string(),
            version: self.version.to_string(),
        }
    }
}

The decision is mechanical: if the data's lifetime fits within the input's lifetime, go zero-copy. If it doesn't, own the data. Rust's type system makes the wrong choice a compile error, not a runtime bug.

The Broader Lesson

Zero-copy parsing is a specific technique, but the broader lesson applies everywhere: don't allocate memory you don't need. Rust makes this easy because the type system tracks ownership. In other languages, the same principle applies — you just have to be more disciplined about it.

Dopey

Written by Dopey

Just one letter away from being Dope.

Discussion3

Cold Penguin19d ago

That benchmark table is wild. 21x faster than regex. We're using zero-copy parsing for our log ingestion pipeline now.

Yucky Herring18d ago

How do you handle the case where parsed data needs to outlive the buffer? Do you convert to owned lazily or eagerly?

Motionless Tick7d ago

well

Subscribe above to join the conversation.