Idiomatic Rust code guidance based on Apollo GraphQL's best practices handbook for ownership, errors, and performance.
references/chapter_03.md
# Chapter 3 - Performance Mindset

The **golden rule** of performance work:

> Don't guess, measure.

Rust code is often already pretty fast - don't "optimize" without evidence. Optimize only after finding bottlenecks.

### Good first steps
* Use the `--release` flag on your builds. It might sound silly, but it is quite common to hear people complain that their Rust code is slower than their X language code, and 99% of the time it is because they didn't use the `--release` flag.
* `$ cargo clippy -- -D clippy::perf` gives you important tips on best practices for performance.
* [`cargo bench`](https://doc.rust-lang.org/cargo/commands/cargo-bench.html) is a cargo tool to create micro-benchmarks and test different code solutions. Write a test scenario and bench your solution against the original code; if the improvement is larger than 5%, it is probably a worthwhile performance improvement.
* [`cargo flamegraph`](https://github.com/flamegraph-rs/flamegraph) is a powerful profiler for Rust code. On macOS, [samply](https://github.com/mstange/samply) might offer a better developer experience.

> #### Further reading on Benchmarking:
> - [How to build a Custom Benchmarking Harness in Rust](https://bencher.dev/learn/benchmarking/rust/custom-harness/)


## 3.1 Flamegraph

A flamegraph helps you visualize how much CPU time was spent on each task.

```shell
# Installing flamegraph
cargo install flamegraph

# cargo support provided through the cargo-flamegraph binary!
# defaults to profiling cargo run --release
cargo flamegraph

# by default, `--release` profile is used,
# but you can override this:
cargo flamegraph --dev

# if you'd like to profile a specific binary:
cargo flamegraph --bin=stress2

# Profile unit tests.
# Note that a separating `--` is necessary if `--unit-test` is the last flag.
cargo flamegraph --unit-test -- test::in::package::with::single::crate
cargo flamegraph --unit-test crate_name -- test::in::package::with::multiple::crate

# Profile integration tests.
cargo flamegraph --test test_name

# Run criterion benchmark
# Note that the last --bench is required for `criterion 0.3` to run in benchmark mode, instead of test mode.
cargo flamegraph --bench some_benchmark --features some_features -- --bench

# Run workspace example
cargo flamegraph --example some_example --features some_features
```

> ❗ Always run your profiles with `--release` enabled. The `--dev` flag isn't realistic, as it doesn't have optimizations enabled.

The result will look like a flame graph where:

* The `y-axis` shows the **stack depth**. When looking at a flamegraph, the main function of your program will be closer to the bottom, and the called functions will be stacked on top, with the functions that they call stacked on top of them.

* The `width of each box` shows the **total time that the function** is on the CPU or is part of the call stack. If a function's box is wider than others, it either consumes more CPU per execution than other functions or is called more often than other functions.

> ❗ The **color of each box** isn't significant, and **is chosen at random**.

### 🚨 Remember
* Thick stacks: heavy CPU usage
* Thin stacks: low intensity (cheap)

## 3.2 Avoid Redundant Cloning

> Cloning is cheap... **until it isn't**

In sections [Borrowing over Cloning](./chapter_01.md#11-borrowing-over-cloning) and [Important Clippy lints to respect](./chapter_02.md#23-important-clippy-lints-to-respect) we mentioned the impact of cloning and the relevant clippy lint [`redundant_clone`](https://rust-lang.github.io/rust-clippy/master/#redundant_clone), so this section explores when to pass ownership.

* 🚨 If you really need to clone, leave it to the last moment.

### When to pass ownership?

* Only `.clone()` if you truly need a new owned copy. A few examples:
  * The crate's API design requires owned data.
  * You use an overloaded `std::ops` operator, which takes its operands by value, but still need the old data afterwards:
    ```rust
    use std::ops::Add;

    #[derive(Debug, Copy, Clone, PartialEq)]
    struct Point {
        x: i32,
        y: i32,
    }

    // `add` takes `self` by value. `Point` is `Copy`, so the operands stay usable;
    // a non-`Copy` type would need `.clone()` to keep the original.
    impl Add for Point {
        type Output = Self;

        fn add(self, other: Self) -> Self {
            Self {
                x: self.x + other.x,
                y: self.y + other.y,
            }
        }
    }

    assert_eq!(Point { x: 1, y: 0 } + Point { x: 2, y: 3 },
               Point { x: 3, y: 3 });
    ```
  * You need comparison snapshots, or an API requires multiple owned instances of the data.
    ```rust
    use std::ops::Sub;

    fn snapshot(a: &MyValue, b: &MyValue) -> MyValueDiff {
        a - b
    }

    // Implemented for `&MyValue` so that `a - b` works on references.
    impl Sub for &MyValue {
        type Output = MyValueDiff;

        fn sub(self, other: Self) -> MyValueDiff {
            ...
        }
    }

    fn main() {
        let mut a = MyValue::default();
        // Keep an owned copy of the state before the update.
        let b = a.clone();

        a.magical_update();
        println!("{:?}", snapshot(&a, &b));
    }
    ```
  * You have reference counted pointers (`Arc`, `Rc`).
  * You have small structs that are too big to be `Copy` but far cheaper to clone than the `std::collections` types. An example is an HTTP client like `hyper_util::client::legacy::Client`, where cloning lets you share the connection pool.
  * You have a chained struct modifier that needs owned mutation. Some **builders** require owned mutation, but most custom builders can be written with `pub fn with_xyz(&mut self, value: Xyz) -> &mut Self`.
    ```rust
    // Inline `HashMap` insertion extension

    fn insert_owned(mut self, key: K, value: V) -> Self {
        self.insert(key, value);
        self
    }
    ```
* Ownership can also be a good way to model business logic / state. For example:
  ```rust
  let not_validated: String = ...; // some user source
  let validated = Validate::try_from(not_validated)?;
  // Technically that `try_from` maybe didn't need ownership, but taking it lets us model intent
  ```

### When **NOT** to pass ownership?

* Prefer API designs that take a reference (`fn process(values: &[T])`) instead of ownership (`fn process(values: Vec<T>)`).
* If you only need read access to elements, prefer `.iter()` or slices:
  ```rust
  for item in &some_vec {
      ...
  }
  ```
* When you need to mutate data held by another owner (for example, another thread), borrow it as `&mut MyStruct` rather than taking ownership.

### Use `Cow` for `Maybe Owned` data

Sometimes it is not clear at the API boundary whether owned data will actually be needed. [`std::borrow::Cow`](https://doc.rust-lang.org/std/borrow/enum.Cow.html) addresses this case efficiently: it holds either a borrowed or an owned value, so callers can pass whichever they already have.

```rust
use std::borrow::Cow;

fn hello_greet(name: Cow<'_, str>) {
    println!("Hello {name}");
}

hello_greet(Cow::Borrowed("Julia"));
hello_greet(Cow::Owned("Naomi".to_string()));
```

## 3.3 Stack vs Heap: Be size-smart!

### ✅ Good Practices

* Keep small types (`impl Copy`, `usize`, `bool`, etc.) **on the stack**.
* Avoid passing huge types (`> 512 bytes`) by value or transferring their ownership. Prefer passing by reference (e.g. `&T` and `&mut T`).
* Heap allocate recursive data structures:
  ```rust
  enum OctreeNode<T> {
      Node(T),
      // The recursive variant is boxed so the enum has a finite size.
      Children(Box<[OctreeNode<T>; 8]>),
  }
  ```
* Return small types by value. Types that implement `Copy` or are cheap to clone are efficient to return by value (e.g. `struct Vector2 {x: f32, y: f32}`).

### ❗ Be Mindful

* Only use `#[inline]` when a benchmark proves it beneficial. Rust is already pretty good at inlining **without** hints.
* Avoid massive stack allocations; box them. For example, `let buffer: Box<[u8; 65536]> = Box::new(..)` can first allocate the `[u8; 65536]` on the stack and then move it into the box. A non-const solution is `let buffer: Box<[u8]> = vec![0; 65536].into_boxed_slice()`, which allocates directly on the heap.
* For large `const` arrays, consider using the [smallvec crate](https://docs.rs/smallvec/latest/smallvec/): it behaves like an array while the data fits its inline capacity and moves to the heap once that capacity is exceeded.

## 3.4 Iterators and Zero-Cost Abstractions

Rust iterators are lazy: they do no work until they are consumed, and they are usually compiled down to very efficient tight loops. Chaining `.filter()`, `.map()`, `.rev()`, `.skip()`, `.take()`, `.collect()` usually doesn't cost extra, and the compiler can reason well enough about how to optimize them.
* Prefer `iterators` over manual `for` loops when working with collections; the compiler can often optimize them better than a hand-written loop.
* Calling `.iter()` only **borrows** the original collection, which allows you to hold multiple iterators over the same collection.

#### ❗ Avoid creating intermediate collections unless it is really needed:

* Consider that `process` accepts an `iterator`.
* ❌ BAD - useless intermediate collection:
  ```rust
  let doubled: Vec<_> = items.iter().map(|x| x * 2).collect();
  process(doubled);
  ```
* ✅ GOOD - pass the iterator (`fn process(arg: impl Iterator<Item = T>)`):
  ```rust
  let doubled_iter = items.iter().map(|x| x * 2);
  process(doubled_iter);
  ```
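
To make the iterator-passing advice concrete, here is a minimal, self-contained sketch. The handbook does not define `process`, so the body below (summing the items) and the helper `doubled_sum` are hypothetical and exist only for illustration:

```rust
// Hypothetical `process`: the handbook leaves it undefined, so here it
// simply sums whatever the iterator yields.
fn process(values: impl Iterator<Item = i32>) -> i32 {
    values.sum()
}

// Hypothetical helper: doubles each item lazily and hands the iterator
// straight to `process`, without collecting into a `Vec` first.
fn doubled_sum(items: &[i32]) -> i32 {
    process(items.iter().map(|x| x * 2))
}

fn main() {
    let items = vec![1, 2, 3];
    // `items` is only borrowed, so it is still usable afterwards.
    println!("sum of doubled items: {}", doubled_sum(&items));
    println!("original length: {}", items.len());
}
```

Because `process` takes `impl Iterator`, the caller decides whether to pass a lazy adapter chain or the iterator of an existing collection (for example `vec.into_iter()`), and no allocation is forced on either side.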