Welcome to Rust

(Adapted from Comprehensive Rust) 🩀

The goal of the course is to teach you Rust. We assume you don’t know anything about Rust and hope to:

  • Give you a comprehensive understanding of the Rust syntax and language.
  • Enable you to modify existing programs
  • Write new programs in Rust for embedded systems
  • Show interoperability with C

Assumptions

The course assumes that you already know how to program. Rust is a statically typed language and we will sometimes make comparisons with C and C++ to better explain or contrast the Rust approach.

If you know how to program in a dynamically typed language such as Python or JavaScript, then you will be able to follow along just fine too.

Course Structure

The course is fast paced and covers a lot of ground:

  • Course 1: Basic Rust.
  • Course 2: Ownership and the borrow checker.
  • Course 3: Compound data types, pattern matching.
  • Course 4: The standard library.
  • Course 5: Traits and generics, error handling
  • Course 6: Testing, unsafe Rust.
  • Course 7: Concurrency in Rust and interoperability with other languages

Keyboard Shortcuts

There are several useful keyboard shortcuts in mdBook:

  • Arrow-Left: Navigate to the previous page.
  • Arrow-Right: Navigate to the next page.
  • Ctrl + Enter: Execute the code sample that has focus.
  • s: Activate the search bar.

Using Cargo

When you start reading about Rust, you will soon meet Cargo, the standard tool used in the Rust ecosystem to build and run Rust applications. Here we want to give a brief overview of what Cargo is and how it fits into the wider ecosystem and how it fits into this training.

Installation

You can follow the instructions to install cargo and rust compiler, among other standard ecosystem tools with the rustup tool, which is maintained by the Rust Foundation.

Along with cargo and rustc, Rustup will install itself as a command line utility that you can use to install/switch toolchains, setup cross compilation, etc.

Package Managers

Debian

On Debian/Ubuntu, you can install Cargo, the Rust source and the Rust formatter with

$ sudo apt install cargo rust-src rustfmt

This will allow rust-analyzer to jump to the definitions. We suggest using VS Code to edit the code (but any LSP compatible editor works).

Some folks also like to use the JetBrains family of IDEs, which do their own analysis but have their own tradeoffs. If you prefer them, you can install the Rust Plugin. Please take note that as of January 2023 debugging only works on the CLion version of the JetBrains IDEA suite.

The Rust Ecosystem

The Rust ecosystem consists of a number of tools, of which the main ones are:

  • rustc: the Rust compiler which turns .rs files into binaries and other intermediate formats.

  • cargo: the Rust dependency manager and build tool. Cargo knows how to download dependencies hosted on https://crates.io and it will pass them to rustc when building your project. Cargo also comes with a built-in test runner which is used to execute unit tests.

  • rustup: the Rust toolchain installer and updater. This tool is used to install and update rustc and cargo when new versions of Rust is released. In addition, rustup can also download documentation for the standard library. You can have multiple versions of Rust installed at once and rustup will let you switch between them as needed.

Key points:

  • Rust has a rapid release schedule with a new release coming out every six weeks. New releases maintain backwards compatibility with old releases — plus they enable new functionality.

  • There are three release channels: “stable”, “beta”, and “nightly”.

  • New features are being tested on “nightly”, “beta” is what becomes “stable” every six weeks.

  • Rust also has editions: the current edition is Rust 2021. Previous editions were Rust 2015 and Rust 2018.

    • The editions are allowed to make backwards incompatible changes to the language.

    • To prevent breaking code, editions are opt-in: you select the edition for your crate via the Cargo.toml file.

    • To avoid splitting the ecosystem, Rust compilers can mix code written for different editions.

    • Mention that it is quite rare to ever use the compiler directly not through cargo (most users never do).

    • It might be worth alluding that Cargo itself is an extremely powerful and comprehensive tool. It is capable of many advanced features including but not limited to:

    • Read more from the official Cargo Book

Code Samples in This Training

For this training, we will mostly explore the Rust language through examples which can be executed through your browser. This makes the setup much easier and ensures a consistent experience for everyone.

Installing Cargo is still encouraged: it will make it easier for you to do the exercises. On the last day, we will do a larger exercise which shows you how to work with dependencies and for that you need Cargo.

The code blocks in this course are fully interactive:

fn main() {
    println!("Edit me!");
}

You can use Ctrl + Enter to execute the code when focus is in the text box.

Most code samples are editable like shown above. A few code samples are not editable for various reasons:

  • The embedded playgrounds cannot execute unit tests. Copy-paste the code and open it in the real Playground to demonstrate unit tests.

  • The embedded playgrounds lose their state the moment you navigate away from the page! This is the reason that the students should solve the exercises using a local Rust installation or via the Playground.

Running Code Locally with Cargo

If you want to experiment with the code on your own system, then you will need to first install Rust. Do this by following the instructions in the Rust Book. This should give you a working rustc and cargo. At the time of writing, the latest stable Rust release has these version numbers:

% rustc --version
rustc 1.61.0 (fe5b13d68 2022-05-18)
% cargo --version
cargo 1.61.0 (a028ae4 2022-04-29)

With this is in place, then follow these steps to build a Rust binary from one of the examples in this training:

  1. Click the “Copy to clipboard” button on the example you want to copy.

  2. Use cargo new exercise to create a new exercise/ directory for your code:

    $ cargo new exercise
         Created binary (application) `exercise` package
    
  3. Navigate into exercise/ and use cargo run to build and run your binary:

    $ cd exercise
    $ cargo run
       Compiling exercise v0.1.0 (/home/mgeisler/tmp/exercise)
        Finished dev [unoptimized + debuginfo] target(s) in 0.75s
         Running `target/debug/exercise`
    Hello, world!
    
  4. Replace the boiler-plate code in src/main.rs with your own code. For example, using the example on the previous page, make src/main.rs look like

    fn main() {
        println!("Edit me!");
    }
  5. Use cargo run to build and run your updated binary:

    $ cargo run
       Compiling exercise v0.1.0 (/home/mgeisler/tmp/exercise)
        Finished dev [unoptimized + debuginfo] target(s) in 0.24s
         Running `target/debug/exercise`
    Edit me!
    
  6. Use cargo check to quickly check your project for errors, use cargo build to compile it without running it. You will find the output in target/debug/ for a normal debug build. Use cargo build --release to produce an optimized release build in target/release/.

  7. You can add dependencies for your project by editing Cargo.toml. When you run cargo commands, it will automatically download and compile missing dependencies for you.

Try to encourage the class participants to install Cargo and use a local editor. It will make their life easier since they will have a normal development environment.

Welcome to course 1

In this first session we will cover:

  • Basic Rust syntax: variables, scalar and compound types, enums, structs, references, functions, and methods.

What is Rust?

Rust is a new programming language which had its 1.0 release in 2015:

  • Rust is a statically compiled language in a similar role as C++
    • rustc uses LLVM as its backend.
  • Rust supports many platforms and architectures:
    • x86, ARM, WebAssembly, 

    • Linux, Mac, Windows, 

  • Rust is used for a wide range of devices:
    • firmware and boot loaders,
    • smart displays,
    • mobile phones,
    • desktops,
    • servers.

Rust fits in the same area as C++:

  • High flexibility.
  • High level of control.
  • Can be scaled down to very constrained devices like mobile phones.
  • Has no runtime or garbage collection.
  • Focuses on reliability and safety without sacrificing performance.

Hello World!

Let us jump into the simplest possible Rust program, a classic Hello World program:

fn main() {
    println!("Hello 🌍!");
}

What you see:

  • Functions are introduced with fn.
  • Blocks are delimited by curly braces like in C and C++.
  • The main function is the entry point of the program.
  • Rust has hygienic macros, println! is an example of this.
  • Rust strings are UTF-8 encoded and can contain any Unicode character.

This slide tries to make the students comfortable with Rust code. They will see a ton of it over the next four days so we start small with something familiar.

Key points:

  • Rust is very much like other languages in the C/C++/Java tradition. It is imperative (not functional) and it doesn’t try to reinvent things unless absolutely necessary.

  • Rust is modern with full support for things like Unicode.

  • Rust uses macros for situations where you want to have a variable number of arguments (no function overloading).

Small Example

Here is a small example program in Rust:

fn main() {              // Program entry point
    let mut x: i32 = 6;  // Mutable variable binding
    print!("{x}");       // Macro for printing, like printf
    while x != 1 {       // No parenthesis around expression
        if x % 2 == 0 {  // Math like in other languages
            x = x / 2;
        } else {
            x = 3 * x + 1;
        }
        print!(" -> {x}");
    }
    println!();
}

The code implements the Collatz conjecture: it is believed that the loop will always end, but this is not yet proved. Edit the code and play with different inputs.

Key points:

  • Explain that all variables are statically typed. Try removing i32 to trigger type inference. Try with i8 instead and trigger a runtime integer overflow.

  • Change let mut x to let x, discuss the compiler error.

  • Show how print! gives a compilation error if the arguments don’t match the format string.

  • Show how you need to use {} as a placeholder if you want to print an expression which is more complex than just a single variable.

  • Show the students the standard library, show them how to search for std::fmt which has the rules of the formatting mini-language. It’s important that the students become familiar with searching in the standard library.

Why Rust?

Some unique selling points of Rust:

  • Compile time memory safety.
  • Lack of undefined runtime behavior.
  • Modern language features.

Make sure to ask the class which languages they have experience with. Depending on the answer you can highlight different features of Rust:

  • Experience with C or C++: Rust eliminates a whole class of runtime errors via the borrow checker. You get performance like in C and C++, but you don’t have the memory unsafety issues. In addition, you get a modern language with constructs like pattern matching and built-in dependency management.

  • Experience with Java, Go, Python, JavaScript
: You get the same memory safety as in those languages, plus a similar high-level language feeling. In addition you get fast and predictable performance like C and C++ (no garbage collector) as well as access to low-level hardware (should you need it)

Compile Time Guarantees

Static memory management at compile time:

  • No uninitialized variables.
  • No memory leaks (mostly, see notes).
  • No double-frees.
  • No use-after-free.
  • No NULL pointers.
  • No forgotten locked mutexes.
  • No data races between threads.
  • No iterator invalidation.

For the purpose of this course, “No memory leaks” should be understood as “Pretty much no accidental memory leaks”.

It is possible to produce memory leaks in (safe) Rust. Some examples are:

  • You can for example use Box::leak to leak a pointer. A use of this could be to get runtime-initialized and runtime-sized static variables
  • You can use std::mem::forget to make the compiler “forget” about a value (meaning the destructor is never run).
  • You can also accidentally create a reference cycle with Rc or Arc.
  • In fact, some will consider infinitely populating a collection a memory leak and Rust does not protect from those.

Runtime Guarantees

No undefined behavior at runtime:

  • Array access is bounds checked.
  • Integer overflow is defined.

Key points:

  • Integer overflow is defined via a compile-time flag. The options are either a panic (a controlled crash of the program) or wrap-around semantics. By default, you get panics in debug mode (cargo build) and wrap-around in release mode (cargo build --release).

  • Bounds checking cannot be disabled with a compiler flag. It can also not be disabled directly with the unsafe keyword. However, unsafe allows you to call functions such as slice::get_unchecked which does not do bounds checking.

Modern Features

Rust is built with all the experience gained in the last 40 years.

Tooling

  • Great compiler errors.
  • Built-in dependency manager.
  • Built-in support for testing.
  • Excellent Language Server Protocol support.

Key points:

  • Remind people to read the errors — many developers have gotten used to ignore lengthy compiler output. The Rust compiler is significantly more talkative than other compilers. It will often provide you with actionable feedback, ready to copy-paste into your code.

  • The Rust standard library is small compared to languages like Java, Python, and Go. Rust does not come with several things you might consider standard and essential:

    • a random number generator, but see rand.
    • support for SSL or TLS, but see rusttls.
    • support for JSON, but see serde_json. The reasoning behind this is that functionality in the standard library cannot go away, so it has to be very stable. For the examples above, the Rust community is still working on finding the best solution — and perhaps there isn’t a single “best solution” for some of these things.

    Rust comes with a built-in package manager in the form of Cargo and this makes it trivial to download and compile third-party crates. A consequence of this is that the standard library can be smaller.

    Discovering good third-party crates can be a problem. Sites like https://lib.rs/ help with this by letting you compare health metrics for crates to find a good and trusted one.

  • rust-analyzer is a well supported LSP implementation used in major IDEs and text editors.

Basic Syntax

Much of the Rust syntax will be familiar to you from C, C++ or Java:

  • Blocks and scopes are delimited by curly braces.
  • Line comments are started with //, block comments are delimited by /* ... */.
  • Keywords like if and while work the same.
  • Variable assignment is done with =, comparison is done with ==.

If expressions

You use if expressions exactly like if statements in other languages:

fn main() {
    let x = 10;
    if x < 20 {
        println!("small");
    } else if x < 100 {
        println!("biggish");
    } else {
        println!("huge");
    }
}

In addition, you can use if as an expression. The last expression of each block becomes the value of the if expression:

fn main() {
    let x = 10;
    let size = if x < 20 { "small" } else { "large" };
    println!("number size: {}", size);
}

Because if is an expression and must have a particular type, both of its branch blocks must have the same type. Show what happens if you add ; after "small" in the second example.

When if is used in an expression, the expression must have a ; to separate it from the next statement. Remove the ; before println! to see the compiler error.

Scalar Types

TypesLiterals
Signed integersi8, i16, i32, i64, i128, isize-10, 0, 1_000, 123i64
Unsigned integersu8, u16, u32, u64, u128, usize0, 123, 10u16
Floating point numbersf32, f643.14, -10.0e20, 2f32
Strings&str"foo", r#"\\"#
Unicode scalar valueschar'a', 'α', '∞'
Byte strings&[u8]b"abc", br#" " "#
Booleansbooltrue, false

The types have widths as follows:

  • iN, uN, and fN are N bits wide,
  • isize and usize are the width of a pointer,
  • char is 32 bit wide,
  • bool is 8 bit wide.

r is used to denote raw string literals. Raw string literals do not process any escapes. It is followed by (#)+, then ", the litteral, " and (#)+, where + means one or more occurences.

r#""foo""# stands for "foo".

r##"foo #"# bar"## stands for foo #"# bar. Each character in a raw string literal is represented as a Unicode scalar value. r#"Hello, "Rust"!"# is a string that includes the characters H, e, l, l, o, ,, , ", R, u, s, t, ", !.

b#"..."# denotes a byte string, ie a sequence of bytes. b"hello" is equivalent to [104, 101, 108, 108, 111] (ASCII values for ‘h’, ‘e’, ‘l’, ‘l’, ‘o’).

br#"..."# is for a raw byte string.

Compound Types

TypesLiterals
Arrays[T; N][20, 30, 40], [0; 3]
Tuples(), (T,), (T1, T2), 
(), ('x',), ('x', 1.2), 


Array assignment and access:

fn main() {
    let mut a: [i8; 10] = [42; 10];
    a[5] = 0;
    println!("a: {:?}", a);
}

Tuple assignment and access:

fn main() {
    let t: (i8, bool) = (7, true);
    println!("1st index: {}", t.0);
    println!("2nd index: {}", t.1);
}

Key points:

Arrays:

  • Arrays have elements of the same type, T, and length, N, which is a compile-time constant. Note that the length of the array is part of its type, which means that [u8; 3] and [u8; 4] are considered two different types.

  • We can use literals to assign values to arrays.

  • In the main function, the print statement asks for the debug implementation with the ? format parameter: {} gives the default output, {:?} gives the debug output. We could also have used {a} and {a:?} without specifying the value after the format string.

  • Adding #, eg {a:#?}, invokes a “pretty printing” format, which can be easier to read.

Tuples:

  • Like arrays, tuples have a fixed length.

  • Tuples group together values of different types into a compound type.

  • Fields of a tuple can be accessed by the period and the index of the value, e.g. t.0, t.1.

  • The empty tuple () is also known as the “unit type”. It is both a type, and the only valid value of that type - that is to say both the type and its value are expressed as (). It is used to indicate, for example, that a function or expression has no return value, as we’ll see in a future slide.

    • You can think of it as void that can be familiar to you from other programming languages.

T is a generic type

References

Like C++, Rust has references:

fn main() {
    let mut x: i32 = 10;
    let ref_x: &mut i32 = &mut x;
    *ref_x = 20;
    println!("x: {x}");
}

Some notes:

  • We must dereference ref_x when assigning to it, similar to C and C++ pointers.
  • Rust will auto-dereference in some cases, in particular when invoking methods (try ref_x.count_ones()).
  • References that are declared as mut can be bound to different values over their lifetime.
Key points:
  • Be sure to note the difference between let mut ref_x: &i32 and let ref_x: &mut i32. The first one represents a mutable reference which can be bound to different values, while the second represents a reference to a mutable value.

Dangling References

Rust will statically forbid dangling references:

fn main() {
    let ref_x: &i32;
    {
        let x: i32 = 10;
        ref_x = &x;
    }
    println!("ref_x: {ref_x}");
}
  • A reference is said to “borrow” the value it refers to.
  • Rust is tracking the lifetimes of all references to ensure they live long enough.
  • We will talk more about borrowing when we get to ownership.

Slices

A slice gives you a view into a larger collection:

fn main() {
    let a: [i32; 6] = [10, 20, 30, 40, 50, 60];
    println!("a: {a:?}");

    let s: &[i32] = &a[2..4];
    println!("s: {s:?}");
}
  • Slices borrow data from the sliced type.
  • Question: What happens if you modify a[3]?
  • We create a slice by borrowing a and specifying the starting and ending indexes in brackets.

  • If the slice starts at index 0, Rust’s range syntax allows us to drop the starting index, meaning that &a[0..a.len()] and &a[..a.len()] are identical.

  • The same is true for the last index, so &a[2..a.len()] and &a[2..] are identical.

  • To easily create a slice of the full array, we can therefore use &a[..].

  • s is a reference to a slice of i32s. Notice that the type of s (&[i32]) no longer mentions the array length. This allows us to perform computation on slices of different sizes.

  • Slices always borrow from another object. In this example, a has to remain ‘alive’ (in scope) for at least as long as our slice.

  • The question about modifying a[3] can spark an interesting discussion, but the answer is that for memory safety reasons you cannot do it through a after you created a slice, but you can read the data from both a and s safely. More details will be explained in the borrow checker section.

    // We cannot change a[3] after it has been borrowed by s.
    fn main() {
      let mut a: [i32; 6] = [10, 20, 30, 40, 50, 60];
      println!("a: {a:?}");
    
      let s: &mut [i32] = &mut a[2..4];
      s[0] = 1;
      println!("s: {s:?}");
    }
    
    // but we can still read the content of a
    fn main() {
      let mut a: [i32; 6] = [10, 20, 30, 40, 50, 60];
      println!("a: {a:?}");
    
      let s: &mut [i32] = &mut a[2..4];
      s[0] = 1;
      println!("a: {a:?}");
    }
    

String vs str

We can now understand the two string types in Rust:

fn main() {
    let s1: &str = "World";
    println!("s1: {s1}");

    let mut s2: String = String::from("Hello ");
    println!("s2: {s2}");
    s2.push_str(s1);
    println!("s2: {s2}");
    
    let s3: &str = &s2[6..];
    println!("s3: {s3}");
}

Rust terminology:

  • &str an immutable reference to a string slice.
  • String a mutable string buffer.
  • &str introduces a string slice, which is an immutable reference to UTF-8 encoded string data stored in a block of memory. String literals (”Hello”), are stored in the program’s binary.

  • Rust’s String type is a wrapper around a vector of bytes. As with a Vec<T>, it is owned. Vec<T> is an array that changes size dynamically.

  • As with many other types String::from() creates a string from a string literal; String::new() creates a new empty string, to which string data can be added using the push() and push_str() methods.

  • The format!() macro is a convenient way to generate an owned string from dynamic values. It accepts the same format specification as println!().

  • You can borrow &str slices from String via & and optionally range selection.

  • For C++ programmers: think of &str as const char* from C++, but the one that always points to a valid string in memory. Rust String is a rough equivalent of std::string from C++ (main difference: it can only contain UTF-8 encoded bytes and will never use a small-string optimization).

  • How do I modify s1? Here is an example built by iteratively following the guidelines of the compiler.

    fn main() {
        let mut s1: &str = "World";
        println!("s1: {s1}");
        let binding  = &(s1.to_owned() + " hobbit");
        s1 = &binding;
    
        let mut s2: String = String::from("Hello ");
        println!("s2: {s2}");
        s2.push_str(s1);
        println!("s2: {s2}");
      
        let s3: &str = &s2[6..];
        println!("s3: {s3}");
    }
    fn to_owned(&self) -> Self::Owned
    

    Creates owned data from borrowed data, usually by cloning.

Functions

A Rust version of the famous FizzBuzz interview question:

fn main() {
    fizzbuzz_to(20);   // Defined below, no forward declaration needed
}

fn is_divisible_by(lhs: u32, rhs: u32) -> bool {
    if rhs == 0 {
        return false;  // Corner case, early return
    }
    lhs % rhs == 0     // The last expression in a block is the return value
}

fn fizzbuzz(n: u32) -> () {  // No return value means returning the unit type `()`
    match (is_divisible_by(n, 3), is_divisible_by(n, 5)) {
        (true,  true)  => println!("fizzbuzz"),
        (true,  false) => println!("fizz"),
        (false, true)  => println!("buzz"),
        (false, false) => println!("{n}"),
    }
}

fn fizzbuzz_to(n: u32) {  // `-> ()` is normally omitted
    for i in 1..=n {
        fizzbuzz(i);
    }
}

In FizzBuzz, players take turns to count incrementally, replacing any number divisible by three with the word “fizz”, and any number divisible by five with the word “buzz”, and any number divisible by both 3 and 5 with the word “fizzbuzz”.

  • We refer in main to a function written below. Neither forward declarations nor headers are necessary.

  • Declaration parameters are followed by a type (the reverse of some programming languages), then a return type.

  • The last expression in a function body (or any block) becomes the return value. Simply omit the ; at the end of the expression.

  • Some functions have no return value, and return the ‘unit type’, (). The compiler will infer this if the -> () return type is omitted.

  • The range expression in the for loop in fizzbuzz_to() contains =n, which causes it to include the upper bound.

  • The match expression in fizzbuzz() is doing a lot of work. It is expanded below to show what is happening.

    (Type annotations added for clarity, but they can be elided.)

    let by_3: bool = is_divisible_by(n, 3);
    let by_5: bool = is_divisible_by(n, 5);
    let by_35: (bool, bool) = (by_3, by_5);
    match by_35 {
      // ...

Methods

Rust has methods. They are simply functions that are associated with a particular type. The first argument of a method is an instance of the type it is associated with:

struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    fn area(&self) -> u32 {
        self.width * self.height
    }

    fn inc_width(&mut self, delta: u32) {
        self.width += delta;
    }
}

fn main() {
    let mut rect = Rectangle { width: 10, height: 5 };
    println!("old area: {}", rect.area());
    rect.inc_width(5);
    println!("new area: {}", rect.area());
}
  • We will look much more at methods in this class’ exercise and in the next class.

Function Overloading

Overloading is not supported:

  • Each function has a single implementation:
    • Always takes a fixed number of parameters.
    • Always takes a single set of parameter types.
  • Default values are not supported:
    • All call sites have the same number of arguments.
    • Macros are sometimes used as an alternative.

However, function parameters can be generic:

fn pick_one<T>(a: T, b: T) -> T {
    if std::process::id() % 2 == 0 { a } else { b }
}

fn main() {
    println!("coin toss: {}", pick_one("heads", "tails"));
    println!("cash prize: {}", pick_one(500, 1000));
}
  • When using generics, the standard library’s Into<T> can provide a kind of limited polymorphism on argument types. We will see more details in a later section.

  • std::process::id() returns the pid of the current process

Course 1: Exercises

In these exercises, we will explore two parts of Rust:

  • Implicit conversions between types.

  • Arrays and for loops.

A few things to consider while solving the exercises:

  • Use a local Rust installation, if possible. This way you can get auto-completion in your editor. See the page about Using Cargo for details on installing Rust.

  • Alternatively, use the Rust Playground.

The code snippets are not editable on purpose: the inline code snippets lose their state if you navigate away from the page.

Implicit Conversions

Rust will not automatically apply implicit conversions between types (unlike C++). You can see this in a program like this:

fn multiply(x: i16, y: i16) -> i16 {
    x * y
}

fn main() {
    let x: i8 = 15;
    let y: i16 = 1000;

    println!("{x} * {y} = {}", multiply(x, y));
}

The Rust integer types all implement the From<T> and Into<T> traits to let us convert between them. The From<T> trait has a single from() method and similarly, the Into<T> trait has a single into() method. Implementing these traits is how a type expresses that it can be converted into another type.

The standard library has an implementation of From<i8> for i16, which means that we can convert a variable x of type i8 to an i16 by calling i16::from(x). Or, simpler, with x.into(), because From<i8> for i16 implementation automatically create an implementation of Into<i16> for i8.

The same applies for your own From implementations for your own types, so it is sufficient to only implement From to get a respective Into implementation automatically.

  1. Execute the above program and look at the compiler error.

  2. Update the code above to use into() to do the conversion.

  3. Change the types of x and y to other things (such as f32, bool, i128) to see which types you can convert to which other types. Try converting small types to big types and the other way around. Check the standard library documentation to see if From<T> is implemented for the pairs you check.

Matrix multiplication

Your task is to write a function that performs matrix multiplication in Rust. The function signature should look like this:

fn multiply_matrices(a: &Vec<Vec<i32>>, b: &Vec<Vec<i32>>) -> Vec<Vec<i32>> {
    // Your code
}

Your code should check that the matrices have the right dimensions.

There are different ways to represent 2-D arrays in Rust, here we chose the type &Vec<Vec<i32>>. When you finish this exercice, feel free to explore other possible representations (e.g ndarray::arr2).

Usage example:

fn main() {
    let matrix_a = vec![
        vec![1, 2, 3],
        vec![4, 5, 6],
    ];

    let matrix_b = vec![
        vec![7, 8],
        vec![9, 10],
        vec![11, 12],
    ];

    let result_matrix = multiply_matrices(&matrix_a, &matrix_b);

    // Print the result
}

Arrays and for Loops

We saw that an array can be declared like this:

#![allow(unused)]
fn main() {
let array = [10, 20, 30];
}

You can print such an array by asking for its debug representation with {:?}:

fn main() {
    let array = [10, 20, 30];
    println!("array: {array:?}");
}

Rust lets you iterate over things like arrays and ranges using the for keyword:

fn main() {
    let array = [10, 20, 30];
    print!("Iterating over array:");
    for n in array {
        print!(" {n}");
    }
    println!();

    print!("Iterating over range:");
    for i in 0..3 {
        print!(" {}", array[i]);
    }
    println!();
}

Use the above to write a function pretty_print which pretty-print a matrix and a function transpose which will transpose a matrix (turn rows into columns):

2584567⎀8⎄9⎊transpose==⎛⎡1⎜⎱4⎝⎣73⎀⎞6⎄⎟9⎩⎠⎡1⎱2⎣3

Hard-code both functions to operate on 3 × 3 matrices.

Copy the code below to https://play.rust-lang.org/ and implement the functions:

// TODO: remove this when you're done with your implementation.
#![allow(unused_variables, dead_code)]

fn transpose(matrix: [[i32; 3]; 3]) -> [[i32; 3]; 3] {
    unimplemented!()
}

fn pretty_print(matrix: &[[i32; 3]; 3]) {
    unimplemented!()
}

fn main() {
    let matrix = [
        [101, 102, 103], // <-- the comment makes rustfmt add a newline
        [201, 202, 203],
        [301, 302, 303],
    ];

    println!("matrix:");
    pretty_print(&matrix);

    let transposed = transpose(matrix);
    println!("transposed:");
    pretty_print(&transposed);
}

Bonus Question

Could you use &[i32] slices instead of hard-coded 3 × 3 matrices for your argument and return types? Something like &[&[i32]] for a two-dimensional slice-of-slices. Why or why not?

See the ndarray crate for a production quality implementation.

Palindrome

Implement a Rust function that determines whether a given string is a palindrome.

A palindrome is a sequence of characters that reads the same forward and backward (e.g. kayak, madam, racecar, 
).

In this exercices we ask you to ignore spaces, punctuation, and case.

Have a look at the std::str documentation if needed.

Function signature:

fn is_palindrome(s: &str) -> bool{
    // Your code
}

Usage example:

fn main() {
    let palindrome1 = "A man, a plan, a canal, Panama";
    let palindrome2 = "Madam, in Eden, I'm Adam";
    let non_palindrome = "Having class 8:30am is fun!!";

    println!("Is '{}' a palindrome? {}", palindrome1, is_palindrome(palindrome1)); // True
    println!("Is '{}' a palindrome? {}", palindrome2, is_palindrome(palindrome2)); // True
    println!("Is '{}' a palindrome? {}", non_palindrome, is_palindrome(non_palindrome)); // False
}

Welcome to course 2

We will cover:

  • Memory management: stack vs heap, manual memory management, scope-based memory management, and garbage collection.

  • Ownership: move semantics, copying and cloning, borrowing, and lifetimes.

Variables

Rust provides type safety via static typing. Variable bindings are immutable by default:

fn main() {
    let x: i32 = 10;
    println!("x: {x}");
    // x = 20;
    // println!("x: {x}");
}
  • Due to type inference the i32 is optional. We will gradually show the types less and less as the course progresses.
  • Note that since println! is a macro, x is not moved, even using the function like syntax of println!("x: {}", x)

Type Inference

Rust will look at how the variable is used to determine the type:

fn takes_u32(x: u32) {
    println!("u32: {x}");
}

fn takes_i8(y: i8) {
    println!("i8: {y}");
}

fn main() {
    let x = 10;
    let y = 20;

    takes_u32(x);
    takes_i8(y);
    // takes_u32(y);
}

This slide demonstrates how the Rust compiler infers types based on constraints given by variable declarations and usages.

It is very important to emphasize that variables declared like this are not of some sort of dynamic “any type” that can hold any data. The machine code generated by such declaration is identical to the explicit declaration of a type. The compiler does the job for us and helps us to write a more concise code.

The following code tells the compiler to copy into a certain generic container without the code ever explicitly specifying the contained type, using _ as a placeholder:

fn main() {
    let mut v = Vec::new();
    v.push((10, false));
    v.push((20, true));
    println!("v: {v:?}");

    let vv = v.iter().collect::<std::collections::HashSet<_>>();
    println!("vv: {vv:?}");
}

collect relies on FromIterator, which HashSet implements.

Option

Option comes from std::option. It is used to represent an optional value. Option is either Some or None.

fn divide(numerator: f64, denominator: f64) -> Option<f64> {
    if denominator == 0.0 {
        None
    } else {
        Some(numerator / denominator)
    }
}

fn main () {
    // The return value of the function is an option
    let result = divide(2.0, 3.0);

    // Pattern match to retrieve the value
    match result {
        // The division was valid
        Some(x) => println!("Result: {x}"),
        // The division was invalid
        None    => println!("Cannot divide by 0"),
    }
}

Static and Constant Variables

Global state is managed with static and constant variables.

const

You can declare compile-time constants:

const DIGEST_SIZE: usize = 3;
const ZERO: Option<u8> = Some(42);

fn compute_digest(text: &str) -> [u8; DIGEST_SIZE] {

    // println!("{:?}", text.as_bytes());
    // println!("{}", ZERO.unwrap_or(0));

    let mut digest = [ZERO.unwrap_or(0); DIGEST_SIZE];

    // println!("{:?}", digest);

    for (idx, &b) in text.as_bytes().iter().enumerate() {

        // println!("[{:?}]: {:?}", idx, &b);
        
        digest[idx % DIGEST_SIZE] = digest[idx % DIGEST_SIZE].wrapping_add(b);
    }
    digest
}

fn main() {
    let digest = compute_digest("Hello");
    println!("Digest: {digest:?}");
}

According the the Rust RFC Book these are inlined upon use.

static

You can also declare static variables:

static BANNER: &str = "Welcome to RustOS 3.14";

fn main() {
    println!("{BANNER}");
}

As noted in the Rust RFC Book, these are not inlined upon use and have an actual associated memory location. This is useful for unsafe and embedded code, and the variable lives through the entirety of the program execution.

We will look at mutating static data in the chapter on Unsafe Rust.

  • pub fn unwrap_or(self, default: T) -> T. Returns the contained Some value or a provided default. Here, ZERO.unwrap_or(0) is equal to 42. It would return 0 if ZERO was equal to None.
  • const is used to define a constant value that is inlined wherever it is used. Inlining means that the compiler replaces all instances of the constant with its value.
  • static on the other hand have a fixed memory location and can be mutated.

Scopes and Shadowing

You can shadow variables, both those from outer scopes and variables from the same scope:

fn main() {
    let a = 10;
    println!("before: {a}");

    {
        let a = "hello";
        println!("inner scope: {a}");

        let a = true;
        println!("shadowed in inner scope: {a}");
    }

    println!("after: {a}");
}
  • Definition: Shadowing is different from mutation, because after shadowing both variable’s memory locations exist at the same time. Both are available under the same name, depending where you use it in the code.
  • A shadowing variable can have a different type.
  • Shadowing looks obscure at first, but is convenient for holding on to values after .unwrap().
  • The following code demonstrates why the compiler can’t simply reuse memory locations when shadowing an immutable variable in a scope, even if the type does not change.
fn main() {
    let a = 1;
    let b = &a;
    let a = a + 1;
    println!("{a} {b}");
}

Unwrap

In Rust, .unwrap() is a method used on types like Option<T> and Result<T, E> to extract the value inside when you are certain it exists (i.e., it is Some(value) or Ok(value)).

However, if the value is None or Err, .unwrap() will panic (crash your program).

fn main() {
    let some_value: Option<i32> = Some(42);
    let x = some_value.unwrap(); // x = 42
    println!("x: {x}");

    let none_value: Option<i32> = None;
    let y = none_value.unwrap(); // PANIC!
    println!("y: {y}");
}
fn main() {
    let result: Result<i32, &str> = Ok(42);
    let x = result.unwrap(); // x = 42
    println!("x: {x}");

    let err_result: Result<i32, &str> = Err("Something went wrong");
    let y = err_result.unwrap(); // PANIC: "Something went wrong"
    println!("y: {y}");
}

This is how cloudflare “crashed” 20% of the Internet.

Prefer explicit error handling in real-world code to avoid crashes.

  • Use .expect("message") to provide a custom panic message.
  • Use pattern matching (match or if let) for explicit error handling.
  • Use methods like .unwrap_or(default) or ? (the “try” operator) for safer handling.
  • Result is either Ok or Err.
  • We’ll see match and if let later.

Memory Management

Traditionally, languages have fallen into two broad categories:

  • Full control via manual memory management: C, C++, Pascal, 

  • Full safety via automatic memory management at runtime: Java, Python, Go, Haskell, 


Rust offers a new mix:

Full control and safety via compile time enforcement of correct memory management.

It does this with an explicit ownership concept.

First, let’s refresh how memory management works.

The Stack vs The Heap

  • Stack: Continuous area of memory for local variables.

    • Values have fixed sizes known at compile time.
    • Extremely fast: just move a stack pointer.
    • Easy to manage: follows function calls.
    • Great memory locality.
  • Heap: Storage of values outside of function calls.

    • Values have dynamic sizes determined at runtime.
    • Slightly slower than the stack: some book-keeping needed.
    • No guarantee of memory locality.
  • The stack is used to store the program and the variables of fixed size.
  • The heap is used for dynamically allocating memory.

Stack Memory

Creating a String puts fixed-sized data on the stack and dynamically sized data on the heap:

fn main() {
    let s1 = String::from("Hello");
}
StackHeaps1ptrHellolen5capacity5
  • Mention that a String is backed by a Vec, so it has a capacity and length and can grow if mutable via reallocation on the heap.

Manual Memory Management

You allocate and deallocate heap memory yourself.

If not done with care, this can lead to crashes, bugs, security vulnerabilities, and memory leaks.

C Example

You must call free on every pointer you allocate with malloc:

void foo(size_t n) {
    int* int_array = (int*)malloc(n * sizeof(int));
    //
    // ... lots of code
    //
    free(int_array);
}

Memory is leaked if the function returns early between malloc and free: the pointer is lost and we cannot deallocate the memory.

Scope-Based Memory Management

Constructors and destructors let you hook into the lifetime of an object.

By wrapping a pointer in an object, you can free memory when the object is destroyed. The compiler guarantees that this happens, even if an exception is raised.

This is often called resource acquisition is initialization (RAII) and gives you smart pointers.

C++ Example

void say_hello(std::unique_ptr<Person> person) {
  std::cout << "Hello " << person->name << std::endl;
}
  • The std::unique_ptr object is allocated on the stack, and points to memory allocated on the heap.
  • At the end of say_hello, the std::unique_ptr destructor will run.
  • The destructor frees the Person object it points to.

Special move constructors are used when passing ownership to a function:

std::unique_ptr<Person> person = find_person("Carla");
say_hello(std::move(person));

Automatic Memory Management

An alternative to manual and scope-based memory management is automatic memory management:

  • The programmer never allocates or deallocates memory explicitly.
  • A garbage collector finds unused memory and deallocates it for the programmer.

Java Example

The person object is not deallocated after sayHello returns:

void sayHello(Person person) {
  System.out.println("Hello " + person.getName());
}

Memory Management in Rust

Memory management in Rust is a mix:

  • Safe and correct like Java, but without a garbage collector.
  • Depending on which abstraction (or combination of abstractions) you choose, can be a single unique pointer, reference counted, or atomically reference counted.
  • Scope-based like C++, but the compiler enforces full adherence.
  • A Rust user can choose the right abstraction for the situation, some even have no cost at runtime like C.

It achieves this by modeling ownership explicitly.

  • If asked how at this point, you can mention that in Rust this is usually handled by RAII wrapper types such as Box, Vec, Rc, or Arc. These encapsulate ownership and memory allocation via various means, and prevent the potential errors in C.

  • You may be asked about destructors here, the Drop trait is the Rust equivalent.

Comparison

Here is a rough comparison of the memory management techniques.

Pros of Different Memory Management Techniques

  • Manual like C:
    • No runtime overhead.
  • Automatic like Java:
    • Fully automatic.
    • Safe and correct.
  • Scope-based like C++:
    • Partially automatic.
    • No runtime overhead.
  • Compiler-enforced scope-based like Rust:
    • Enforced by compiler.
    • No runtime overhead.
    • Safe and correct.

Cons of Different Memory Management Techniques

  • Manual like C:
    • Use-after-free.
    • Double-frees.
    • Memory leaks.
  • Automatic like Java:
    • Garbage collection pauses.
    • Destructor delays.
  • Scope-based like C++:
    • Complex, opt-in by programmer.
    • Potential for use-after-free.
  • Compiler-enforced and scope-based like Rust:
    • Some upfront complexity.
    • Can reject valid programs.

Ownership

All variable bindings have a scope where they are valid and it is an error to use a variable outside its scope:

struct Point(i32, i32);

fn main() {
    {
        let p = Point(3, 4);
        println!("x: {}", p.0);
    }
    println!("y: {}", p.1);
}
  • At the end of the scope, the variable is dropped and the data is freed.
  • A destructor can run here to free up resources.
  • We say that the variable owns the value.

Move Semantics

An assignment will transfer ownership between variables:

fn main() {
    let s1: String = String::from("Hello!");
    let s2: String = s1;
    println!("s2: {s2}");
    // println!("s1: {s1}");
}
  • The assignment of s1 to s2 transfers ownership.
  • The data was moved from s1 and s1 is no longer accessible.
  • When s1 goes out of scope, nothing happens: it has no ownership.
  • When s2 goes out of scope, the string data is freed.
  • There is always exactly one variable binding which owns a value.
  • Mention that this is the opposite of the defaults in C++, which copies by value unless you use std::move (and the move constructor is defined!).

  • In Rust, your clones are explicit (by using clone).

Moved Strings in Rust

fn main() {
    let s1: String = String::from("Rust");
    let s2: String = s1;
}
  • The heap data from s1 is reused for s2.
  • When s1 goes out of scope, nothing happens (it has been moved from).

Before move to s2:

StackHeaps1ptrRustlen4capacity4

After move to s2:

StackHeaps1ptrRustlen4capacity4s2ptrlen4capacity4(inaccessible)

Moves in Function Calls

When you pass a value to a function, the value is assigned to the function parameter. This transfers ownership:

fn say_hello(name: String) {
    println!("Hello {name}")
}

fn main() {
    let name = String::from("Alice");
    say_hello(name);
    // say_hello(name);
}
  • With the first call to say_hello, main gives up ownership of name. Afterwards, name cannot be used anymore within main.
  • The heap memory allocated for name will be freed at the end of the say_hello function.
  • main can retain ownership if it passes name as a reference (&name) and if say_hello accepts a reference as a parameter.
fn say_hello(name: &String) {
    println!("Hello {name}")
}

fn main() {
    let name = String::from("Alice");
    say_hello(&name);
    say_hello(&name);
}
  • Alternatively, main can pass a clone of name in the first call (name.clone()).
fn say_hello(name: String) {
    println!("Hello {name}")
}

fn main() {
    let name = String::from("Alice");
    say_hello(name.clone());
    say_hello(name);
}
  • Rust makes it harder than C++ to inadvertently create copies by making move semantics the default, and by forcing programmers to make clones explicit.

Copying and Cloning

While move semantics are the default, certain types are copied by default:

fn main() {
    let x = 42;
    let y = x;
    println!("x: {x}");
    println!("y: {y}");
}

These types implement the Copy trait.

You can opt-in your own types to use copy semantics:

#[derive(Copy, Clone, Debug)]
struct Point(i32, i32);

fn main() {
    let p1 = Point(3, 4);
    let p2 = p1;
    println!("p1: {p1:?}");
    println!("p2: {p2:?}");
}
  • After the assignment, both p1 and p2 own their own data.
  • We can also use p1.clone() to explicitly copy the data.

Copying and cloning are not the same thing:

  • Copying refers to bitwise copies of memory regions and does not work on arbitrary objects.
  • Copying does not allow for custom logic (unlike copy constructors in C++).
  • Cloning is a more general operation and also allows for custom behavior by implementing the Clone trait.
  • Copying does not work on types that implement the Drop trait.

In the above example, try the following:

  • Add a String field to struct Point. It will not compile because String is not a Copy type.
#[derive(Copy, Clone, Debug)]
struct Point(i32, i32, String);

fn main() {
    let p1 = Point(3, 4, "Pokemon");
    let p2 = p1;
    println!("p1: {p1:?}");
    println!("p2: {p2:?}");
}
  • Remove Copy from the derive attribute. The compiler error is now in the println! for p1.
#[derive(Clone, Debug)]
struct Point(i32, i32, String);

fn main() {
    let p1 = Point(3, 4, String::from("Pokemon"));
    let p2 = p1;
    println!("p1: {p1:?}");
    println!("p2: {p2:?}");
}
  • Show that it works if you clone p1 instead.
#[derive(Clone, Debug)]
struct Point(i32, i32, String);

fn main() {
    let p1 = Point(3, 4, String::from("Pokemon"));
    let p2 = p1.clone();
    println!("p1: {p1:?}");
    println!("p2: {p2:?}");
}

If students ask about derive, it is sufficient to say that this is a way to generate code in Rust at compile time. In this case the default implementations of Copy and Clone traits are generated.

Borrowing

Instead of transferring ownership when calling a function, you can let a function borrow the value:

#[derive(Debug)]
struct Point(i32, i32);

fn add(p1: &Point, p2: &Point) -> Point {
    Point(p1.0 + p2.0, p1.1 + p2.1)
}

fn main() {
    let p1 = Point(3, 4);
    let p2 = Point(10, 20);
    let p3 = add(&p1, &p2);
    println!("{p1:?} + {p2:?} = {p3:?}");
}
  • The add function borrows two points and returns a new point.
  • The caller retains ownership of the inputs.

Notes on stack returns:

  • Demonstrate that the return from add is cheap because the compiler can eliminate the copy operation. Change the above code to print stack addresses and run it on the Playground. In the “DEBUG” optimization level, the addresses should change, while they stay the same when changing to the “RELEASE” setting:

    #[derive(Debug)]
    struct Point(i32, i32);
    
    fn add(p1: &Point, p2: &Point) -> Point {
        let p = Point(p1.0 + p2.0, p1.1 + p2.1);
        println!("&p.0: {:p}", &p.0);
        p
    }
    
    fn main() {
        let p1 = Point(3, 4);
        let p2 = Point(10, 20);
        let p3 = add(&p1, &p2);
        println!("&p3.0: {:p}", &p3.0);
        println!("{p1:?} + {p2:?} = {p3:?}");
    }
  • The Rust compiler can do return value optimization (RVO).

See Grapham King’s “Return Value Optimization in Rust” post

  • In C++, copy elision has to be defined in the language specification because constructors can have side effects. In Rust, this is not an issue at all. If RVO did not happen, Rust will always performs a simple and efficient memcpy copy.

Elision means dropping, omitting, cutting. It is used for a sound that is not pronounced in a work, for example.

Shared and Unique Borrows

Rust puts constraints on the ways you can borrow values:

  • You can have one or more &T values at any given time, or
  • You can have exactly one &mut T value.
fn main() {
    let mut a: i32 = 10;
    let b: &i32 = &a;

    {
        let c: &mut i32 = &mut a;
        *c = 20;
    }

    println!("a: {a}");
    println!("b: {b}");
}
  • The above code does not compile because a is borrowed as mutable (through c) and as immutable (through b) at the same time.
  • Move the println! statement for b before the scope that introduces c to make the code compile.
fn main() {
    let mut a: i32 = 10;
    let b: &i32 = &a;
    println!("b: {b}");
    {
        let c: &mut i32 = &mut a;
        *c = 20;
    }

    println!("a: {a}");

}
  • After that change, the compiler realizes that b is only ever used before the new mutable borrow of a through c. This is a feature of the borrow checker called “non-lexical lifetimes”.

Lifetimes

A borrowed value has a lifetime:

  • The lifetime can be elided: add(p1: &Point, p2: &Point) -> Point.
  • Lifetimes can also be explicit: &'a Point, &'document str.
  • Read &'a Point as “a borrowed Point which is valid for at least the lifetime a”.
  • Lifetimes are always inferred by the compiler: you cannot assign a lifetime yourself.
    • Lifetime annotations create constraints; the compiler verifies that there is a valid solution.

Lifetimes in Function Calls

In addition to borrowing its arguments, a function can return a borrowed value:

#[derive(Debug)]
struct Point(i32, i32);

fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {
    if p1.0 < p2.0 { p1 } else { p2 }
}

fn main() {
    let p1: Point = Point(10, 10);
    let p2: Point = Point(20, 20);
    let p3: &Point = left_most(&p1, &p2);
    println!("left-most point: {:?}", p3);
}
  • 'a is a generic parameter, it is inferred by the compiler.
  • Lifetimes start with ' and 'a is a typical default name.
  • Read &'a Point as “a borrowed Point which is valid for at least the lifetime a”.
    • The at least part is important when parameters are in different scopes.

In the above example, try the following:

  • Move the declaration of p2 and p3 into a a new scope ({ ... }), resulting in the following code:

    #[derive(Debug)]
    struct Point(i32, i32);
    
    fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {
        if p1.0 < p2.0 { p1 } else { p2 }
    }
    
    fn main() {
        let p1: Point = Point(10, 10);
        let p3: &Point;
        {
            let p2: Point = Point(20, 20);
            p3 = left_most(&p1, &p2);
        }
        println!("left-most point: {:?}", p3);
    }

    Note how this does not compile since p3 outlives p2.

  • Reset the workspace and change the function signature to fn left_most<'a, 'b>(p1: &'a Point, p2: &'a Point) -> &'b Point. This will not compile because the relationship between the lifetimes 'a and 'b is unclear.

  • Another way to explain it:

    • Two references to two values are borrowed by a function and the function returns another reference.
    • It must have come from one of those two inputs (or from a global variable).
    • Which one is it? The compiler needs to know, so at the call site the returned reference is not used for longer than a variable from where the reference came from.

Lifetimes in Data Structures

If a data type stores borrowed data, it must be annotated with a lifetime:

#[derive(Debug)]
struct Highlight<'doc>(&'doc str);

fn erase(text: String) {
    println!("Bye {text}!");
}

fn main() {
    let text = String::from("The quick brown fox jumps over the lazy dog.");
    let fox = Highlight(&text[4..19]);
    let dog = Highlight(&text[35..43]);
    // erase(text);
    println!("{fox:?}");
    println!("{dog:?}");
}
  • In the above example, the annotation on Highlight enforces that the data underlying the contained &str lives at least as long as any instance of Highlight that uses that data.
  • If text is consumed before the end of the lifetime of fox (or dog), the borrow checker throws an error.
  • Types with borrowed data force users to hold on to the original data. This can be useful for creating lightweight views, but it generally makes them somewhat harder to use.
  • When possible, make data structures own their data directly.
  • Some structs with multiple references inside can have more than one lifetime annotation. This can be necessary if there is a need to describe lifetime relationships between the references themselves, in addition to the lifetime of the struct itself. Those are very advanced use cases.

Course 2: Exercises

We will look at two things:

  • A small book library,

  • Iterators and ownership (hard).

Designing a Library

We will learn much more about structs and the Vec<T> type later. For now, you just need to know part of its API:

fn main() {
    let mut vec = vec![10, 20];
    vec.push(30);
    println!("middle value: {}", vec[vec.len() / 2]);
    for item in vec.iter() {
        println!("item: {item}");
    }
}

Use this to create a library application. Copy the code below to https://play.rust-lang.org/ and update the types to make it compile:

// TODO: remove this when you're done with your implementation.
#![allow(unused_variables, dead_code)]

struct Library {
    books: Vec<Book>,
}

struct Book {
    title: String,
    year: u16,
}

impl Book {
    // This is a constructor, used below.
    fn new(title: &str, year: u16) -> Book {
        Book {
            title: String::from(title),
            year,
        }
    }
}

// This makes it possible to print Book values with {}.
impl std::fmt::Display for Book {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{} ({})", self.title, self.year)
    }
}

impl Library {
    fn new() -> Library {
        unimplemented!()
    }

    //fn len(self) -> usize {
    //    unimplemented!()
    //}

    //fn is_empty(self) -> bool {
    //    unimplemented!()
    //}

    //fn add_book(self, book: Book) {
    //    unimplemented!()
    //}

    //fn print_books(self) {
    //    unimplemented!()
    //}

    //fn oldest_book(self) -> Option<&Book> {
    //    unimplemented!()
    //}
}

// This shows the desired behavior. Uncomment the code below and
// implement the missing methods. You will need to update the
// method signatures, including the "self" parameter! You may
// also need to update the variable bindings within main.
fn main() {
    let library = Library::new();

    //println!("Our library is empty: {}", library.is_empty());
    //
    //library.add_book(Book::new("Lord of the Rings", 1954));
    //library.add_book(Book::new("Alice's Adventures in Wonderland", 1865));
    //
    //library.print_books();
    //
    //match library.oldest_book() {
    //    Some(book) => println!("My oldest book is {book}"),
    //    None => println!("My library is empty!"),
    //}
    //
    //println!("Our library has {} books", library.len());
}

Iterators and Ownership

The ownership model of Rust affects many APIs. An example of this is the Iterator and IntoIterator traits.

Iterator

Traits are like interfaces: they describe behavior (methods) for a type. The Iterator trait simply says that you can call next until you get None back:

#![allow(unused)]
fn main() {
pub trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}
}

You use this trait like this:

fn main() {
    let v: Vec<i8> = vec![10, 20, 30];
    let mut iter = v.iter();

    println!("v[0]: {:?}", iter.next());
    println!("v[1]: {:?}", iter.next());
    println!("v[2]: {:?}", iter.next());
    println!("No more items: {:?}", iter.next());
}

What is the type returned by the iterator? Test your answer here:

fn main() {
    let v: Vec<i8> = vec![10, 20, 30];
    let mut iter = v.iter();

    let v0: Option<..> = iter.next();
    println!("v0: {v0:?}");
}

Why is this type used?

IntoIterator

The Iterator trait tells you how to iterate once you have created an iterator. The related trait IntoIterator tells you how to create the iterator:

#![allow(unused)]
fn main() {
pub trait IntoIterator {
    type Item;
    type IntoIter: Iterator<Item = Self::Item>;

    fn into_iter(self) -> Self::IntoIter;
}
}

The syntax here means that every implementation of IntoIterator must declare two types:

  • Item: the type we iterate over, such as i8,
  • IntoIter: the Iterator type returned by the into_iter method.

Note that IntoIter and Item are linked: the iterator must have the same Item type, which means that it returns Option<Item>

Like before, what is the type returned by the iterator?

fn main() {
    let v: Vec<String> = vec![String::from("foo"), String::from("bar")];
    let mut iter = v.into_iter();

    let v0: Option<..> = iter.next();
    println!("v0: {v0:?}");
}

for Loops

Now that we know both Iterator and IntoIterator, we can build for loops. They call into_iter() on an expression and iterates over the resulting iterator:

fn main() {
    let v: Vec<String> = vec![String::from("foo"), String::from("bar")];

    for word in &v {
        println!("word: {word}");
    }

    for word in v {
        println!("word: {word}");
    }
}

What is the type of word in each loop?

Experiment with the code above and then consult the documentation for impl IntoIterator for &Vec<T> and impl IntoIterator for Vec<T> to check your answers.

Welcome to course 3

Now that we have seen a fair amount of Rust, we will continue with:

  • Structs, enums, methods.

  • Pattern matching: destructuring enums, structs, and arrays.

Structs

Like C and C++, Rust has support for custom structs:

struct Person {
    name: String,
    age: u8,
}

fn main() {
    let mut peter = Person {
        name: String::from("Peter"),
        age: 27,
    };
    println!("{} is {} years old", peter.name, peter.age);
    
    peter.age = 28;
    println!("{} is {} years old", peter.name, peter.age);
    
    let jackie = Person {
        name: String::from("Jackie"),
        ..peter
    };
    println!("{} is {} years old", jackie.name, jackie.age);
}
Key Points:
  • Structs work like in C or C++.
    • Like in C++, and unlike in C, no typedef is needed to define a type.
    • Unlike in C++, there is no inheritance between structs.
  • Methods are defined in an impl block, which we will see in following slides.
  • There are different types of structs.
    • Zero-sized structs e.g., struct Foo; `is a structure that does not require any memory. It might be used when implementing a trait on some type but don’t have any data that you want to store in the value itself.
    • The next slide will introduce Tuple structs.
  • .. is an operator that is used for pattern binding here. It performs a “and the rest” pattern binding.

Tuple Structs

If the field names are unimportant, you can use a tuple struct:

struct Point(i32, i32);

fn main() {
    let p = Point(17, 23);
    println!("({}, {})", p.0, p.1);
}

This is often used for single-field wrappers (called newtypes):

struct PoundOfForce(f64);
struct Newtons(f64);

fn compute_thruster_force() -> PoundOfForce {
    todo!("Ask a rocket scientist at NASA")
}

fn set_thruster_force(force: Newtons) {
    // ...
}

fn main() {
    let force = compute_thruster_force();
    set_thruster_force(force);
}

Newtypes are a great way to encode additional information about the value in a primitive type, for example:

  • The number is measured in some units: Newtons in the example above.
  • The value passed some validation when it was created, so you no longer have to validate it again at every use: ’PhoneNumber(String)orOddNumber(u32)`.

Field Shorthand Syntax

If you already have variables with the right names, then you can create the struct using a shorthand:

#[derive(Debug)]
struct Person {
    name: String,
    age: u8,
}

impl Person {
    fn new(name: String, age: u8) -> Person {
        Person { name, age }
    }
}

fn main() {
    let peter = Person::new(String::from("Peter"), 27);
    println!("{peter:?}");
}

name and age are field names of the structure and variable names.

The new function could be written using Self as a type, as it is interchangeable with the struct type name

impl Person {
    fn new(name: String, age: u8) -> Self {
        Self { name, age }
    }
}

Enums

The enum keyword allows the creation of a type which has a few different variants:

fn generate_random_number() -> i32 {
    4  // Chosen by fair dice roll. Guaranteed to be random.
}

#[derive(Debug)]
enum CoinFlip {
    Heads,
    Tails,
}

fn flip_coin() -> CoinFlip {
    let random_number = generate_random_number();
    if random_number % 2 == 0 {
        return CoinFlip::Heads;
    } else {
        return CoinFlip::Tails;
    }
}

fn main() {
    println!("You got: {:?}", flip_coin());
}

Key Points:

  • Enumerations allow you to collect a set of values under one type
  • This page offers an enum type CoinFlip with two variants Heads and Tail. You might note the namespace when using variants.
  • This might be a good time to compare Structs and Enums:
    • In both, you can have a simple version without fields (unit struct) or one with different types of fields (variant payloads).
    • In both, associated functions are defined within an impl block.
    • You could even implement the different variants of an enum with separate structs but then they wouldn’t be the same type as they would if they were all defined in an enum.

Variant Payloads

You can define richer enums where the variants carry data. You can then use the match statement to extract the data from each variant:

enum WebEvent {
    PageLoad,                 // Variant without payload
    KeyPress(char),           // Tuple struct variant
    Click { x: i64, y: i64 }, // Full struct variant
}

#[rustfmt::skip]
fn inspect(event: WebEvent) {
    match event {
        WebEvent::PageLoad       => println!("page loaded"),
        WebEvent::KeyPress(c)    => println!("pressed '{c}'"),
        WebEvent::Click { x, y } => println!("clicked at x={x}, y={y}"),
    }
}

fn main() {
    let load = WebEvent::PageLoad;
    let press = WebEvent::KeyPress('x');
    let click = WebEvent::Click { x: 20, y: 80 };

    inspect(load);
    inspect(press);
    inspect(click);
}
  • In the above example, accessing the char in KeyPress, or x and y in Click only works within a match statement.
  • match inspects a hidden discriminant field in the enum.

Enum Sizes

Rust enums are packed tightly, taking constraints due to alignment into account:

use std::mem::{align_of, size_of};

macro_rules! dbg_size {
    ($t:ty) => {
        println!("{}: size {} bytes, align: {} bytes",
                 stringify!($t), size_of::<$t>(), align_of::<$t>());
    };
}

enum Foo {
    A,
    B,
}

#[repr(u32)]
enum Bar {
    A,  // 0
    B = 10000,
    C,  // 10001
}

fn main() {
    dbg_size!(Foo);
    dbg_size!(Bar);
    dbg_size!(bool);
    dbg_size!(Option<bool>);
    dbg_size!(&i32);
    dbg_size!(Option<&i32>);
    // dbg_size!(Option<&Foo>);
    // dbg_size!(&Foo);
}

Key Points:

  • Internally Rust is using a field (discriminant) to keep track of the enum variant.
  • Bar enum demonstrates that there is a way to control the discriminant value and type. If repr is removed, the discriminant type takes 2 bytes, becuase 10001 fits 2 bytes.
  • As a niche optimization an enum discriminant is merged with the pointer so that Option<&Foo> is the same size as &Foo.
  • Option<bool> is another example of tight packing.
  • For some types, Rust guarantees that size_of::<T>() equals size_of::<Option<T>>().
  • Zero-sized types allow for efficient implementation of HashSet using HashMap with () as the value.
  • The ty module defines how rustc represents types internally. https://rustc-dev-guide.rust-lang.org/ty.html

Methods

Rust allows you to associate functions with your new types. You do this with an impl block:

#[derive(Debug)]
struct Person {
    name: String,
    age: u8,
}

impl Person {
    fn say_hello(&self) {
        println!("Hello, my name is {}", self.name);
    }
}

fn main() {
    let peter = Person {
        name: String::from("Peter"),
        age: 27,
    };
    peter.say_hello();
}

Key Points:

  • Methods versus functions.
    • Methods are called on an instance of a type (such as a struct or enum), the first parameter represents the instance as self.
    • Developers may choose to use methods to take advantage of method receiver syntax and to help keep them more organized. By using methods we can keep all the implementation code in one predictable place.
  • Point out the use of the keyword self, a method receiver.
    • Explain that Self is a type alias for the type the impl block is in and can be used elsewhere in the block. –>

    • Note how self is used like other structs and dot notation can be used to refer to individual fields.

    • This might be a good time to demonstrate how the &self differs from self by modifying the code and trying to run say_hello twice.

  • We describe the distinction between method receivers next.

Method Receiver

The &self above indicates that the method borrows the object immutably. There are other possible receivers for a method:

  • &self: borrows the object from the caller using a shared and immutable reference. The object can be used again afterwards.
  • &mut self: borrows the object from the caller using a unique and mutable reference. The object can be used again afterwards.
  • self: takes ownership of the object and moves it away from the caller. The method becomes the owner of the object. The object will be dropped (deallocated) when the method returns, unless its ownership is explicitly transmitted.
  • mut self: same as above, but while the method owns the object, it can mutate it too. Complete ownership does not automatically mean mutability.
  • No receiver: this becomes a static method on the struct. Typically used to create constructors which are called new by convention.

Beyond variants on self, there are also special wrapper types allowed to be receiver types, such as Box<Self>.

Consider emphasizing on “shared and immutable” and “unique and mutable”. These constraints always come together in Rust due to borrow checker rules, and self is no exception. It won’t be possible to reference a struct from multiple locations and call a mutating (&mut self) method on it.

Example

#[derive(Debug)]
struct Race {
    name: String,
    laps: Vec<i32>,
}

impl Race {
    fn new(name: &str) -> Race {  // No receiver, a static method
        Race { name: String::from(name), laps: Vec::new() }
    }

    fn add_lap(&mut self, lap: i32) {  // Exclusive borrowed read-write access to self
        self.laps.push(lap);
    }

    fn print_laps(&self) {  // Shared and read-only borrowed access to self
        println!("Recorded {} laps for {}:", self.laps.len(), self.name);
        for (idx, lap) in self.laps.iter().enumerate() {
            println!("Lap {idx}: {lap} sec");
        }
    }

    fn finish(self) {  // Exclusive ownership of self
        let total = self.laps.iter().sum::<i32>();
        println!("Race {} is finished, total lap time: {}", self.name, total);
    }
}

fn main() {
    let mut race = Race::new("Monaco Grand Prix");
    race.add_lap(70);
    race.add_lap(68);
    race.print_laps();
    race.add_lap(71);
    race.print_laps();
    race.finish();
    // race.add_lap(42);
}

Key Points:

  • All four methods here use a different method receiver.
    • You can point out how that changes what the function can do with the variable values and if/how it can be used again in main.
    • You can showcase the error that appears when trying to call finish twice.
  • Note, that although the method receivers are different, the non-static functions are called the same way in the main body. Rust enables automatic referencing and dereferencing when calling methods. Rust automatically adds in the &, *, muts so that that object matches the method signature.
  • You might point out that print_laps is using a vector that is iterated over. We describe vectors in more detail in course 4.

Pattern Matching

The match keyword let you match a value against one or more patterns. The comparisons are done from top to bottom and the first match wins.

The patterns can be simple values, similarly to switch in C and C++:

fn main() {
    let input = 'x';

    match input {
        'q'                   => println!("Quitting"),
        'a' | 's' | 'w' | 'd' => println!("Moving around"),
        '0'..='9'             => println!("Number input"),
        _                     => println!("Something else"),
    }
}

The _ pattern is a wildcard pattern which matches any value.

Key Points:

  • You might point out how some specific characters are being used when in a patten
    • | as an or
    • .. can expand as much as it needs to be
    • 1..=5 represents an inclusive range
    • _ is a wild card
  • It can be useful to show how binding works, by for instance replacing a wildcard character with a variable, or removing the quotes around q.
  • You can demonstrate matching on a reference.
  • This might be a good time to bring up the concept of irrefutable patterns, as the term can show up in error messages.

From Rust Book

“Patterns that will match for any possible value passed are irrefutable. An example would be x in the statement let x = 5; because x matches anything and therefore cannot fail to match. Patterns that can fail to match for some possible value are refutable. An example would be Some(x) in the expression if let Some(x) = a_value because if the value in the a_value variable is None rather than Some, the Some(x) pattern will not match.”

Destructuring Enums

Patterns can also be used to bind variables to parts of your values. This is how you inspect the structure of your types. Let us start with a simple enum type:

enum Result {
    Ok(i32),
    Err(String),
}

fn divide_in_two(n: i32) -> Result {
    if n % 2 == 0 {
        Result::Ok(n / 2)
    } else {
        Result::Err(format!("cannot divide {n} into two equal parts"))
    }
}

fn main() {
    let n = 100;
    match divide_in_two(n) {
        Result::Ok(half) => println!("{n} divided in two is {half}"),
        Result::Err(msg) => println!("sorry, an error happened: {msg}"),
    }
}

Here we have used the arms to destructure the Result value. In the first arm, half is bound to the value inside the Ok variant. In the second arm, msg is bound to the error message.

Key points:

  • The if/else expression is returning an enum that is later unpacked with a match.
  • You can try adding a third variant to the enum definition and displaying the errors when running the code. Point out the places where your code is now inexhaustive and how the compiler tries to give you hints.
enum Result {
    Ok(i32),
    Err(String),
    Add(i32),
}

fn divide_in_two(n: i32) -> Result {
    if n % 2 == 0 {
        Result::Ok(n / 2)
    } else {
        Result::Err(format!("cannot divide {n} into two equal parts"))
    }
}

fn main() {
    let n = 99;
    match divide_in_two(n) {
        Result::Ok(half) => println!("{n} divided in two is {half}"),
        Result::Err(msg) => println!("sorry, an error happened: {msg}"),
        Result::Add(a) => println!("{}",a)
    }
}

Destructuring Structs

You can also destructure structs:

struct Foo {
    x: (u32, u32),
    y: u32,
}

#[rustfmt::skip]
fn main() {
    let foo = Foo { x: (1, 2), y: 3 };
    match foo {
        Foo { x: (1, b), y } => println!("x.0 = 1, b = {b}, y = {y}"),
        Foo { y: 2, x: i }   => println!("y = 2, i = {i:?}"),
        Foo { y, .. }        => println!("y = {y}, other fields were ignored"),
    }
}
  • #[rustfmt::skip] disables rust formatting for the following block of code

Destructuring Arrays

You can destructure arrays, tuples, and slices by matching on their elements:

#[rustfmt::skip]
fn main() {
    let triple = [0, -2, 3];
    println!("Tell me about {triple:?}");
    match triple {
        [0, y, z] => println!("First is 0, y = {y}, and z = {z}"),
        [1, ..]   => println!("First is 1 and the rest were ignored"),
        _         => println!("All elements were ignored"),
    }
}

Match Guards

When matching, you can add a guard to a pattern. This is an arbitrary Boolean expression which will be executed if the pattern matches:

#[rustfmt::skip]
fn main() {
    let pair = (2, -2);
    println!("Tell me about {pair:?}");
    match pair {
        (x, y) if x == y     => println!("These are twins"),
        (x, y) if x + y == 0 => println!("Antimatter, kaboom!"),
        (x, _) if x % 2 == 1 => println!("The first one is odd"),
        _                    => println!("No correlation..."),
    }
}

Key Points:

  • Match guards as a separate syntax feature are important and necessary.
  • They are not the same as separate if expression inside of the match arm. An if expression inside of the branch block (after =>) happens after the match arm is selected. Failing the if condition inside of that block won’t result in other arms of the original match expression being considered.
  • You can use the variables defined in the pattern in your if expression.
  • The condition defined in the guard applies to every expression in a pattern with an |.
#[rustfmt::skip]
fn main() {
    let pair = (1, -2);
    println!("Tell me about {pair:?}");
    match pair {
        (x, y) if x == y     => println!("These are twins"),
        (x, y) if x + y == 0 => println!("Antimatter, kaboom!"),
        (x, _) | (x, -2) if x % 2 == 1 => println!("The first one is odd"),
        _                    => println!("No correlation..."),
    }
}

Course 3: Exercises

We will look at implementing methods in two contexts:

  • Simple struct which tracks health statistics.

  • Multiple structs and enums for a drawing library.

Health Statistics

You’re working on implementing a health-monitoring system. As part of that, you need to keep track of users’ health statistics.

You’ll start with some stubbed functions in an impl block as well as a User struct definition. Your goal is to implement the stubbed out methods on the User struct defined in the impl block.

Copy the code below to https://play.rust-lang.org/ and fill in the missing methods:

// TODO: remove this when you're done with your implementation.
#![allow(unused_variables, dead_code)]

struct User {
    name: String,
    age: u32,
    weight: f32,
}

impl User {
    pub fn new(name: String, age: u32, weight: f32) -> Self {
        unimplemented!()
    }

    pub fn name(&self) -> &str {
        unimplemented!()
    }

    pub fn age(&self) -> u32 {
        unimplemented!()
    }

    pub fn weight(&self) -> f32 {
        unimplemented!()
    }

    pub fn set_age(&mut self, new_age: u32) {
        unimplemented!()
    }

    pub fn set_weight(&mut self, new_weight: f32) {
        unimplemented!()
    }
}

fn main() {
    let bob = User::new(String::from("Bob"), 32, 155.2);
    println!("I'm {} and my age is {}", bob.name(), bob.age());
}

#[test]
fn test_weight() {
    let bob = User::new(String::from("Bob"), 32, 155.2);
    assert_eq!(bob.weight(), 155.2);
}

#[test]
fn test_set_age() {
    let mut bob = User::new(String::from("Bob"), 32, 155.2);
    assert_eq!(bob.age(), 32);
    bob.set_age(33);
    assert_eq!(bob.age(), 33);
}

Polygon Struct

We will create a Polygon struct which contain some points. Copy the code below to https://play.rust-lang.org/ and fill in the missing methods to make the tests pass:

// TODO: remove this when you're done with your implementation.
#![allow(unused_variables, dead_code)]

pub struct Point {
    // add fields
}

impl Point {
    // add methods
}

pub struct Polygon {
    // add fields
}

impl Polygon {
    // add methods
}

pub struct Circle {
    // add fields
}

impl Circle {
    // add methods
}

pub enum Shape {
    Polygon(Polygon),
    Circle(Circle),
}

#[cfg(test)]
mod tests {
    use super::*;

    fn round_two_digits(x: f64) -> f64 {
        (x * 100.0).round() / 100.0
    }

    #[test]
    fn test_point_magnitude() {
        let p1 = Point::new(12, 13);
        assert_eq!(round_two_digits(p1.magnitude()), 17.69);
    }

    #[test]
    fn test_point_dist() {
        let p1 = Point::new(10, 10);
        let p2 = Point::new(14, 13);
        assert_eq!(round_two_digits(p1.dist(p2)), 5.00);
    }

    #[test]
    fn test_point_add() {
        let p1 = Point::new(16, 16);
        let p2 = p1 + Point::new(-4, 3);
        assert_eq!(p2, Point::new(12, 19));
    }

    #[test]
    fn test_polygon_left_most_point() {
        let p1 = Point::new(12, 13);
        let p2 = Point::new(16, 16);

        let mut poly = Polygon::new();
        poly.add_point(p1);
        poly.add_point(p2);
        assert_eq!(poly.left_most_point(), Some(p1));
    }

    #[test]
    fn test_polygon_iter() {
        let p1 = Point::new(12, 13);
        let p2 = Point::new(16, 16);

        let mut poly = Polygon::new();
        poly.add_point(p1);
        poly.add_point(p2);

        let points = poly.iter().cloned().collect::<Vec<_>>();
        assert_eq!(points, vec![Point::new(12, 13), Point::new(16, 16)]);
    }

    #[test]
    fn test_shape_perimeters() {
        let mut poly = Polygon::new();
        poly.add_point(Point::new(12, 13));
        poly.add_point(Point::new(17, 11));
        poly.add_point(Point::new(16, 16));
        let shapes = vec![
            Shape::from(poly),
            Shape::from(Circle::new(Point::new(10, 20), 5)),
        ];
        let perimeters = shapes
            .iter()
            .map(Shape::perimeter)
            .map(round_two_digits)
            .collect::<Vec<_>>();
        assert_eq!(perimeters, vec![15.48, 31.42]);
    }
}

#[allow(dead_code)]
fn main() {}

Since the method signatures are missing from the problem statements, the key part of the exercise is to specify those correctly.

Other interesting parts of the exercise:

  • Derive a Copy trait for some structs, as in tests the methods sometimes don’t borrow their arguments.
  • Discover that Add trait must be implemented for two objects to be addable via “+”.

Welcome to course 4

Now that we have seen a fair amount of Rust, we will continue with:

  • Control flow constructs: if, if let, while, while let, break, and continue.

  • The Standard Library: String, Option and Result, Vec, HashMap, Rc and Arc.

  • Modules: visibility, paths, and filesystem hierarchy.

Control Flow

As we have seen, if is an expression in Rust. It is used to conditionally evaluate one of two blocks, but the blocks can have a value which then becomes the value of the if expression. Other control flow expressions work similarly in Rust.

if let expressions

If you want to match a value against a pattern, you can use if let:

fn main() {
    let arg = std::env::args().next();
    if let Some(value) = arg {
        println!("Program name: {value}");
    } else {
        println!("Missing name?");
    }
}

See pattern matching for more details on patterns in Rust.

  • if let can be more concise than match, e.g., when only one case is interesting. In contrast, match requires all branches to be covered.
  • Unlike match, if let does not support guard clauses for pattern matching.

while expressions

The while keyword works very similar to other languages:

fn main() {
    let mut x = 10;
    while x != 1 {
        x = if x % 2 == 0 {
            x / 2
        } else {
            3 * x + 1
        };
    }
    println!("Final x: {x}");
}

while let expressions

Like with if, there is a while let variant which repeatedly tests a value against a pattern:

fn main() {
    let v = vec![10, 20, 30];
    let mut iter = v.into_iter();

    while let Some(x) = iter.next() {
        println!("x: {x}");
    }
}

Here the iterator returned by v.iter() will return a Option<i32> on every call to next(). It returns Some(x) until it is done, after which it will return None. The while let lets us keep iterating through all items.

See pattern matching for more details on patterns in Rust.

  • Point out that the while let loop will keep going as long as the value matches the pattern.
  • You could rewrite the while let loop as an infinite loop with an if statement that breaks when there is no value to unwrap for iter.next(). The while let provides syntactic sugar for the above scenario.

for expressions

The for expression is closely related to the while let expression. It will automatically call into_iter() on the expression and then iterate over it:

fn main() {
    let v = vec![10, 20, 30];

    for x in v {
        println!("x: {x}");
    }
    
    for i in (0..10).step_by(2) {
        println!("i: {i}");
    }
}

You can use break and continue here as usual.

  • Index iteration is not a special syntax in Rust for just that case.
  • (0..10) is a range that implements an Iterator trait.
  • step_by is a method that returns another Iterator that skips every other element.

loop expressions

Finally, there is a loop keyword which creates an endless loop. Here you must either break or return to stop the loop:

fn main() {
    let mut x = 10;
    loop {
        x = if x % 2 == 0 {
            x / 2
        } else {
            3 * x + 1
        };
        if x == 1 {
            break;
        }
    }
    println!("Final x: {x}");
}

match expressions

The match keyword is used to match a value against one or more patterns. In that sense, it works like a series of if let expressions:

fn main() {
    match std::env::args().next().as_deref() {
        Some("cat") => println!("Will do cat things"),
        Some("ls")  => println!("Will ls some files"),
        Some("mv")  => println!("Let's move some files"),
        Some("rm")  => println!("Uh, dangerous!"),
        None        => println!("Hmm, no program name?"),
        _           => println!("Unknown program name!"),
    }
}

Like if let, each match arm must have the same type. The type is the last expression of the block, if any. In the example above, the type is ().

See pattern matching for more details on patterns in Rust.

break and continue

If you want to exit a loop early, use break, if you want to immediately start the next iteration use continue. Both continue and break can optionally take a label argument which is used to break out of nested loops:

fn main() {
    let v = vec![10, 20, 30];
    let mut iter = v.into_iter();
    'outer: while let Some(x) = iter.next() {
        println!("x: {x}");
        let mut i = 0;
        while i < x {
            println!("x: {x}, i: {i}");
            i += 1;
            if i == 3 {
                break 'outer;
            }
        }
    }
}

In this case we break the outer loop after 3 iterations of the inner loop.

Standard Library

Rust comes with a standard library which helps establish a set of common types used by Rust library and programs. This way, two libraries can work together smoothly because they both use the same String type.

The common vocabulary types include:

  • Option and Result types: used for optional values and error handling.

  • String: the default string type used for owned data.

  • Vec: a standard extensible vector.

  • HashMap: a hash map type with a configurable hashing algorithm.

  • Box: an owned pointer for heap-allocated data.

  • Rc: a shared reference-counted pointer for heap-allocated data.

  • In fact, Rust contains several layers of the Standard Library: core, alloc and std.
  • core includes the most basic types and functions that don’t depend on libc, allocator or even the presence of an operating system.
  • alloc includes types which require a global heap allocator, such as Vec, Box and Arc.
  • Embedded Rust applications often only use core, and sometimes alloc.

Option and Result

The types represent optional data:

fn main() {
    let numbers = vec![10, 20, 30];
    let first: Option<&i8> = numbers.first();
    println!("first: {first:?}");

    let idx: Result<usize, usize> = numbers.binary_search(&10);
    println!("idx: {idx:?}");
}
  • Option and Result are widely used not just in the standard library.
  • Option<&T> has zero space overhead compared to &T.
  • Result is the standard type to implement error handling as we will see in a later course.
  • binary_search returns Result<usize, usize>.
    • If found, Result::Ok holds the index where the element is found.
    • Otherwise, Result::Err contains the index where such an element should be inserted.

The Result enum is defined as follows:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

String

String is the standard heap-allocated growable UTF-8 string buffer:

fn main() {
    let mut s1 = String::new();
    s1.push_str("Hello");
    println!("s1: len = {}, capacity = {}", s1.len(), s1.capacity());

    let mut s2 = String::with_capacity(s1.len() + 1);
    s2.push_str(&s1);
    s2.push('!');
    println!("s2: len = {}, capacity = {}", s2.len(), s2.capacity());

    let s3 = String::from("🇹🇭");
    println!("s3: len = {}, number of chars = {}", s3.len(),
             s3.chars().count());
}

String implements Deref<Target = str>, which means that you can call all str methods on a String.

  • len returns the size of the String in bytes, not its length in characters.
  • chars returns an iterator over the actual characters.
  • String implements Deref<Target = str> which transparently gives it access to str’s methods.

Vec

Vec is the standard resizable heap-allocated buffer:

fn main() {
    let mut v1 = Vec::new();
    v1.push(42);
    println!("v1: len = {}, capacity = {}", v1.len(), v1.capacity());

    let mut v2 = Vec::with_capacity(v1.len() + 1);
    v2.extend(v1.iter());
    v2.push(9999);
    println!("v2: len = {}, capacity = {}", v2.len(), v2.capacity());
}

Vec implements Deref<Target = [T]>, which means that you can call slice methods on a Vec.

fn main() {
 
    let numbers = vec![10, 20, 30, 40, 50];

    // Slice from index 1 to 3 (exclusive)
    let slice = &numbers[1..3];
    println!("{:?}", slice); 
}
  • Vec is a type of collection, along with String and HashMap. The data it contains is stored on the heap. This means the amount of data doesn’t need to be known at compile time. It can grow or shrink at runtime.

  • Notice how Vec<T> is a generic type too, but you don’t have to specify T explicitly. As always with Rust type inference, the T was established during the first push call.

  • vec![...] is a canonical macro to use instead of Vec::new() and it supports adding initial elements to the vector.

  • To index the vector you use [ ], but they will panic if out of bounds. Alternatively, using get will return an Option. The pop function will remove the last element.

    println!("v2: len = {}, capacity = {}, {:?}", v2.len(), v2.capacity(), v2.get(0)); println!("{:?}",v2.pop());

  • Show iterating over a vector and mutating the value: for e in &mut v { *e += 50; }

    Example code:

    let mut v = vec![1, 2];
    for e in &mut v { *e += 50; }
    println!("{:?}", v);
    

HashMap

Standard hash map with protection against HashDoS attacks:

use std::collections::HashMap;

fn main() {
    let mut page_counts = HashMap::new();
    page_counts.insert("Adventures of Huckleberry Finn".to_string(), 207);
    page_counts.insert("Grimms' Fairy Tales".to_string(), 751);
    page_counts.insert("Pride and Prejudice".to_string(), 303);

    if !page_counts.contains_key("Les Misérables") {
        println!("We've know about {} books, but not Les Misérables.",
                 page_counts.len());
    }

    for book in ["Pride and Prejudice", "Alice's Adventure in Wonderland"] {
        match page_counts.get(book) {
            Some(count) => println!("{book}: {count} pages"),
            None => println!("{book} is unknown.")
        }
    }
}

Box

Box is an owned pointer to data on the heap:

fn main() {
    let five = Box::new(5);
    println!("five: {}", *five);
}
5StackHeapfive

Box<T> implements Deref<Target = T>, which means that you can call methods from T directly on a Box<T>.

  • Box is like std::unique_ptr in C++.

    • Ownership: owns the value/object it points to.
    • Only one Box, one unique_ptr can own the data at a time.
    • Automatically deallocates memory when dropped/destroyed.
  • In the above example, you can even leave out the * in the println! statement thanks to Deref.

Box with Recursive Data Structures

Recursive data types or data types with dynamic sizes need to use a Box:

#[derive(Debug)]
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

fn main() {
    let list: List<i32> = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{list:?}");
}
StackHeaplistTagConsTagConsTagNil010211

If the Box was not used here and we attempted to embed a List directly into the List, the compiler would not compute a fixed size of the struct in memory, it would look infinite.

Box solves this problem as it has the same size as a regular pointer and just points at the next element of the List in the heap.

Niche Optimization

#[derive(Debug)]
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

fn main() {
    let list: List<i32> = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{list:?}");
}

A Box cannot be empty, so the pointer is always valid and non-null. This allows the compiler to optimize the memory layout:

StackHeaplist0102null1/Tag1/Tag1/Tag

Rc

Rc is a reference-counted shared pointer. Use this when you need to refer to the same data from multiple places:

use std::rc::Rc;

fn main() {
    let mut a = Rc::new(10);
    let mut b = a.clone();

    println!("a: {a}");
    println!("b: {b}");
}

Shared references in Rust disallow mutation by default. If you need to mutate the data inside an Rc, you will need to wrap the data in a type such as Cell or RefCell, in single-threaded cases. See Arc if you are in a multi-threaded context. Arc stands for “Atomically Reference counted”. If you need to mutate through Arc use Mutex, RwLock, or one of the Atomic types.

  • Like C++’s std::shared_ptr.
  • clone is cheap: creates a pointer to the same allocation and increases the reference count.
  • make_mut actually clones the inner value if necessary (“clone-on-write”) and returns a mutable reference.

Modules

We have seen how impl blocks let us namespace functions to a type.

mod is about organization: how your code is structured and accessed.

mod foo {
    pub fn do_something() {
        println!("In the foo module");
    }
}

mod bar {
    pub fn do_something() {
        println!("In the bar module");
    }
}

fn main() {
    foo::do_something();
    bar::do_something();
}

Visibility

Modules are a privacy boundary:

  • Module items are private by default (hides implementation details).
  • Parent and sibling items are always visible.
mod outer {
    fn private() {
        println!("outer::private");
    }

    pub fn public() {
        println!("outer::public");
    }

    mod inner {
        fn private() {
            println!("outer::inner::private");
        }

        pub fn public() {
            println!("outer::inner::public");
            super::private();
        }
    }
}

fn main() {
    outer::public();
}
mod outer {
    fn private() {
        println!("outer::private");
    }

    pub fn public() {
        inner::public();
    }

    mod inner {
        fn private() {
            println!("outer::inner::private");
        }

        pub fn public() {
            println!("outer::inner::public");
            super::private();
        }
    }
}

fn main() {
    outer::public();
}

Paths

Paths are resolved as follows:

  1. As a relative path:

    • foo or self::foo refers to foo in the current module,
    • super::foo refers to foo in the parent module.
  2. As an absolute path:

    • crate::foo refers to foo in the root of the current crate,
    • bar::foo refers to foo in the bar crate.

Filesystem Hierarchy

The module content can be omitted:

mod garden;

The garden module content is found at:

  • src/garden.rs (modern Rust 2018 style)
  • src/garden/mod.rs (older Rust 2015 style)

Similarly, a garden::vegetables module can be found at:

  • src/garden/vegetables.rs (modern Rust 2018 style)
  • src/garden/vegetables/mod.rs (older Rust 2015 style)

The crate root is in:

  • src/lib.rs (for a library crate)
  • src/main.rs (for a binary crate)

Course 4: Exercises

In course 4, the exercices will focus on strings and iterators.

Luhn Algorithm

The Luhn algorithm is used to validate credit card numbers. The algorithm takes a string as input and does the following to validate the credit card number:

  • Ignore all spaces. Reject number with less than two digits.

  • Moving from right to left, double every second digit: for the number 1234, we double 3 and 1.

  • After doubling a digit, sum the digits. So doubling 7 becomes 14 which becomes 5.

  • Sum all the undoubled and doubled digits.

  • The credit card number is valid if the sum ends with 0.

Copy the following code to https://play.rust-lang.org/ and implement the function:

// TODO: remove this when you're done with your implementation.
#![allow(unused_variables, dead_code)]

pub fn luhn(cc_number: &str) -> bool {
    unimplemented!()
}

#[test]
fn test_non_digit_cc_number() {
    assert!(!luhn("foo"));
}

#[test]
fn test_empty_cc_number() {
    assert!(!luhn(""));
    assert!(!luhn(" "));
    assert!(!luhn("  "));
    assert!(!luhn("    "));
}

#[test]
fn test_single_digit_cc_number() {
    assert!(!luhn("0"));
}

#[test]
fn test_two_digit_cc_number() {
    assert!(luhn(" 0 0 "));
}

#[test]
fn test_valid_cc_number() {
    assert!(luhn("4263 9826 4026 9299"));
    assert!(luhn("4539 3195 0343 6467"));
    assert!(luhn("7992 7398 713"));
}

#[test]
fn test_invalid_cc_number() {
    assert!(!luhn("4223 9826 4026 9299"));
    assert!(!luhn("4539 3195 0343 6476"));
    assert!(!luhn("8273 1232 7352 0569"));
}

#[allow(dead_code)]
fn main() {}

Strings and Iterators

In this exercise, you are implementing a routing component of a web server. The server is configured with a number of path prefixes which are matched against request paths. The path prefixes can contain a wildcard character which matches a full segment. See the unit tests below.

Copy the following code to https://play.rust-lang.org/ and make the tests pass. Try avoiding allocating a Vec for your intermediate results:

#![allow(unused)]
fn main() {
// TODO: remove this when you're done with your implementation.
#![allow(unused_variables, dead_code)]

pub fn prefix_matches(prefix: &str, request_path: &str) -> bool {
    unimplemented!()
}

#[test]
fn test_matches_without_wildcard() {
    assert!(prefix_matches("/v1/publishers", "/v1/publishers"));
    assert!(prefix_matches("/v1/publishers", "/v1/publishers/abc-123"));
    assert!(prefix_matches("/v1/publishers", "/v1/publishers/abc/books"));

    assert!(!prefix_matches("/v1/publishers", "/v1"));
    assert!(!prefix_matches("/v1/publishers", "/v1/publishersBooks"));
    assert!(!prefix_matches("/v1/publishers", "/v1/parent/publishers"));
}

#[test]
fn test_matches_with_wildcard() {
    assert!(prefix_matches(
        "/v1/publishers/*/books",
        "/v1/publishers/foo/books"
    ));
    assert!(prefix_matches(
        "/v1/publishers/*/books",
        "/v1/publishers/bar/books"
    ));
    assert!(prefix_matches(
        "/v1/publishers/*/books",
        "/v1/publishers/foo/books/book1"
    ));

    assert!(!prefix_matches("/v1/publishers/*/books", "/v1/publishers"));
    assert!(!prefix_matches(
        "/v1/publishers/*/books",
        "/v1/publishers/foo/booksByAuthor"
    ));
}
}

Welcome to course 5

In this course, we will cover some more advanced topics of Rust:

  • Traits: deriving traits, default methods, and important standard library traits.

  • Generics: generic data types, generic methods, monomorphization, and trait objects.

Traits

Rust lets you abstract over types with traits. They’re similar to interfaces:

trait Greet {
    fn say_hello(&self);
}

struct Dog {
    name: String,
}

struct Cat;  // No name, cats won't respond to it anyway.

impl Greet for Dog {
    fn say_hello(&self) {
        println!("Wuf, my name is {}!", self.name);
    }
}

impl Greet for Cat {
    fn say_hello(&self) {
        println!("Miau!");
    }
}

fn main() {
    let pets: Vec<Box<dyn Greet>> = vec![
        Box::new(Dog { name: String::from("Fido") }),
        Box::new(Cat),
    ];
    for pet in pets {
        pet.say_hello();
    }
}
  • Traits may specify pre-implemented (default) methods and methods that users are required to implement themselves. Methods with default implementations can rely on required methods.
  • Types that implement a given trait may be of different sizes. This makes it impossible to have things like Vec<Greet> in the example above.
  • dyn Greet is a way to tell the compiler about a dynamically sized type that implements Greet. The compiler does not know the concrete type that is being passed.
  • In the example, pets holds Fat Pointers to objects that implement Greet. The Fat Pointer consists of two components, a pointer to the actual object and a pointer to the virtual method table for the Greet implementation of that particular object.

From rust-lang, “a dyn Trait reference contains two pointers. One pointer goes to the data (e.g., an instance of a struct). Another pointer goes to a map of method call names to function pointers (known as a virtual method table or vtable). At run-time, when a method needs to be called on the dyn Trait, the vtable is consulted to get the function pointer and then that function pointer is called.”

Compare these outputs in the above example:

    println!("{} {}", std::mem::size_of::<Dog>(), std::mem::size_of::<Cat>());
    println!("{} {}", std::mem::size_of::<&Dog>(), std::mem::size_of::<&Cat>());
    println!("{}", std::mem::size_of::<&dyn Greet>());
    println!("{}", std::mem::size_of::<Box<dyn Greet>>());

This gives the output:

24 0
8 8
16
16

Deriving Traits

You can let the compiler derive a number of traits:

#[derive(Debug, Clone, PartialEq, Eq, Default)]
struct Player {
    name: String,
    strength: u8,
    hit_points: u8,
}

fn main() {
    let p1 = Player::default();
    let p2 = p1.clone();
    println!("Is {:?}\nequal to {:?}?\nThe answer is {}!", &p1, &p2,
             if p1 == p2 { "yes" } else { "no" });
}

Default Methods

Traits can implement behavior in terms of other trait methods.

This means that when defining a trait, you can:

Declare a set of methods. Provide default implementations for some methods, which can call other methods in the same trait. This allows you to:

Define a “core” set of methods that must be implemented by types. Provide additional methods that build on top of the core methods, reducing boilerplate for implementors.

trait Equals {
    fn equal(&self, other: &Self) -> bool;
    fn not_equal(&self, other: &Self) -> bool {
        !self.equal(other)
    }
}

#[derive(Debug)]
struct Centimeter(i16);

impl Equals for Centimeter {
    fn equal(&self, other: &Centimeter) -> bool {
        self.0 == other.0
    }
}

fn main() {
    let a = Centimeter(10);
    let b = Centimeter(20);
    println!("{a:?} equals {b:?}: {}", a.equal(&b));
    println!("{a:?} not_equals {b:?}: {}", a.not_equal(&b));
}

Centimeters reuses the behavior of not_equal from the Equals trait. It overwrites equal, however.

Important Traits

We will now look at some of the most common traits of the Rust standard library:

  • Iterator and IntoIterator used in for loops,
  • From and Into used to convert values,
  • Read and Write used for IO,
  • Add, Mul, 
 used for operator overloading, and
  • Drop used for defining destructors.

Iterators

You can implement the Iterator trait on your own types:

struct Fibonacci {
    curr: u32,
    next: u32,
}

impl Iterator for Fibonacci {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        let new_next = self.curr + self.next;
        self.curr = self.next;
        self.next = new_next;
        Some(self.curr)
    }
}

fn main() {
    let fib = Fibonacci { curr: 0, next: 1 };
    for (i, n) in fib.enumerate().take(5) {
        println!("fib({i}): {n}");
    }
}
  • enumerate().take() is a powerful combination of two iterator methods:

    • enumerate(): Converts an iterator into an iterator of tuples (index, value), where index is the position of the value in the original iterator.
    • take(n): Limits the iterator to the first n elements.
  • IntoIterator is the trait that makes for loops work. It is implemented by collection types such as Vec<T> and references to them such as &Vec<T> and &[T]. Ranges also implement it. &[T] is the shared slice type.

  • The Iterator trait implements many common functional programming operations over collections (e.g. map, filter, reduce, etc). This is the trait where you can find all the documentation about them. In Rust these functions should produce the code as efficient as equivalent imperative implementations.

FromIterator

FromIterator lets you build a collection from an Iterator.

fn main() {
    let primes = vec![2, 3, 5, 7];
    let prime_squares = primes
        .into_iter()
        .map(|prime| prime * prime)
        .collect::<Vec<_>>();
}

Iterator implements fn collect<B>(self) -> B where B: FromIterator<Self::Item>, Self: Sized collect enables to convert an interator into a collection. Here it enables to reconstruct a vector.

There are also implementations which let you do cool things like convert an Iterator<Item = Result<V, E>> into a Result<Vec<V>, E>.

Add this line to print the computed squares: println!("{:?}", prime_squares)

From and Into

Types implement From and Into to facilitate type conversions:

fn main() {
    let s = String::from("hello");
    let addr = std::net::Ipv4Addr::from([127, 0, 0, 1]);
    let one = i16::from(true);
    let bigger = i32::from(123i16);
    println!("{s}, {addr}, {one}, {bigger}");
}

Into is automatically implemented when From is implemented:

fn main() {
    let s: String = "hello".into();
    let addr: std::net::Ipv4Addr = [127, 0, 0, 1].into();
    let one: i16 = true.into();
    let bigger: i32 = 123i16.into();
    println!("{s}, {addr}, {one}, {bigger}");
}
  • That’s why it is common to only implement From, as your type will get Into implementation too.
  • When declaring a function argument input type like “anything that can be converted into a String”, the rule is opposite, you should use Into. Your function will accept types that implement From and those that only implement Into.

Read and Write

Using Read and BufRead, you can abstract over u8 sources:

use std::io::{BufRead, BufReader, Read, Result};

fn count_lines<R: Read>(reader: R) -> usize {
    let buf_reader = BufReader::new(reader);
    buf_reader.lines().count()
}

fn main() -> Result<()> {
    let slice: &[u8] = b"foo\nbar\nbaz\n";
    println!("lines in slice: {}", count_lines(slice));

    let file = std::fs::File::open(std::env::current_exe()?)?;
    println!("lines in file: {}", count_lines(file));
    Ok(())
}

Similarly, Write lets you abstract over u8 sinks:

use std::io::{Result, Write};

fn log<W: Write>(writer: &mut W, msg: &str) -> Result<()> {
    writer.write_all(msg.as_bytes())?;
    writer.write_all("\n".as_bytes())
}

fn main() -> Result<()> {
    let mut buffer = Vec::new();
    log(&mut buffer, "Hello")?;
    log(&mut buffer, "World")?;
    println!("Logged: {:?}", buffer);
    Ok(())
}
`std::env::current_exe()` Returns the full filesystem path of the current running executable.

Add, Mul, 


Operator overloading is implemented via traits in std::ops:

#[derive(Debug, Copy, Clone)]
struct Point { x: i32, y: i32 }

impl std::ops::Add for Point {
    type Output = Self;

    fn add(self, other: Self) -> Self {
        Self {x: self.x + other.x, y: self.y + other.y}
    }
}

fn main() {
    let p1 = Point { x: 10, y: 20 };
    let p2 = Point { x: 100, y: 200 };
    println!("{:?} + {:?} = {:?}", p1, p2, p1 + p2);
}

Discussion points:

  • You could implement Add for &Point. In which situations is that useful?
    • Answer: Add:add consumes self. If type T for which you are overloading the operator is not Copy, you should consider overloading the operator for &T as well. This avoids unnecessary cloning on the call site.
  • Why is Output an associated type? Could it be made a type parameter?
    • Short answer: Type parameters are controlled by the caller, but associated types (like Output) are controlled by the implementor of a trait.

The Drop Trait

Values which implement Drop can specify code to run when they go out of scope:

struct Droppable {
    name: &'static str,
}

impl Drop for Droppable {
    fn drop(&mut self) {
        println!("Dropping {}", self.name);
    }
}

fn main() {
    let a = Droppable { name: "a" };
    {
        let b = Droppable { name: "b" };
        {
            let c = Droppable { name: "c" };
            let d = Droppable { name: "d" };
            println!("Exiting block B");
        }
        println!("Exiting block A");
    }
    drop(a);
    println!("Exiting main");
}

Generics

Rust support generics, which lets you abstract an algorithm (such as sorting) over the types used in the algorithm.

Generic Data Types

You can use generics to abstract over the concrete field type:

#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    let integer = Point { x: 5, y: 10 };
    let float = Point { x: 1.0, y: 4.0 };
    println!("{integer:?} and {float:?}");
}

Generic Methods

You can declare a generic type on your impl block:

#[derive(Debug)]
struct Point<T>(T, T);

impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.0  // + 10
    }

    // fn set_x(&mut self, x: T)
}

fn main() {
    let p = Point(5, 10);
    println!("p.x = {}", p.x());
}
  • Q: Why T is specified twice in impl<T> Point<T> {}? Isn’t that redundant?
    • This is because it is a generic implementation section for generic type. They are independently generic.
    • It means these methods are defined for any T.
    • It is possible to write impl Point<u32> { .. }.
      • Point is still generic and you can use Point<f64>, but methods in this block will only be available for Point<u32>.

Trait Bounds

When working with generics, you often want to limit the types. You can do this with T: Trait or impl Trait:

fn duplicate<T: Clone>(a: T) -> (T, T) {
    (a.clone(), a.clone())
}

// struct NotClonable;

fn main() {
    let foo = String::from("foo");
    let pair = duplicate(foo);
    println!("{pair:?}");
}

Here traits are used as bounds to stipulate what functionality a type implements. For example, above, T must implement Clone.

Show a where clause, students will encounter it when reading code.

fn duplicate<T>(a: T) -> (T, T)
where
    T: Clone,
{
    (a.clone(), a.clone())
}
  • It declutters the function signature if you have many parameters.
  • It has additional features making it more powerful.
    • If someone asks, the extra feature is that the type on the left of “:” can be arbitrary, like Option<T>.

impl Trait

Similar to trait bounds, an impl Trait syntax can be used in function arguments and return values:

use std::fmt::Display;

fn get_x(name: impl Display) -> impl Display {
    format!("Hello {name}")
}

fn main() {
    let x = get_x("foo");
    println!("{x}");
}
  • impl Trait cannot be used with the ::<> turbo fish syntax.
  • impl Trait allows you to work with types which you cannot name.

The meaning of impl Trait is a bit different in the different positions.

  • For a parameter, impl Trait is like an anonymous generic parameter with a trait bound.
  • For a return type, it means that the return type is some concrete type that implements the trait, without naming the type. This can be useful when you don’t want to expose the concrete type in a public API.

This example is great, because it uses impl Display twice. It helps to explain that nothing here enforces that it is the same impl Display type. If we used a single T: Display, it would enforce the constraint that input T and return T type are the same type. It would not work for this particular function, as the type we expect as input is likely not what format! returns. If we wanted to do the same via : Display syntax, we’d need two independent generic parameters.

Closures

Closures or lambda expressions have types which cannot be named. However, they implement special Fn, FnMut, and FnOnce traits:

fn apply_with_log(func: impl FnOnce(i32) -> i32, input: i32) -> i32 {
    println!("Calling function on {input}");
    func(input)
}

fn main() {
    let add_3 = |x| x + 3;
    let mul_5 = |x| x * 5;

    println!("add_3: {}", apply_with_log(add_3, 10));
    println!("mul_5: {}", apply_with_log(mul_5, 20));
}

If you have an FnOnce, you may only call it once. It might consume captured values.

An FnMut might mutate captured values, so you can call it multiple times but not concurrently.

An Fn neither consumes nor mutates captured values, or perhaps captures nothing at all, so it can be called multiple times concurrently.

FnMut is a subtype of FnOnce. Fn is a subtype of FnMut and FnOnce. I.e. you can use an FnMut wherever an FnOnce is called for, and you can use an Fn wherever an FnMut or FnOnce is called for.

move closures only implement FnOnce.

Monomorphization

Generic code is turned into non-generic code based on the call sites:

fn main() {
    let integer = Some(5);
    let float = Some(5.0);
}

behaves as if you wrote

enum Option_i32 {
    Some(i32),
    None,
}

enum Option_f64 {
    Some(f64),
    None,
}

fn main() {
    let integer = Option_i32::Some(5);
    let float = Option_f64::Some(5.0);
}

This is a zero-cost abstraction: you get exactly the same result as if you had hand-coded the data structures without the abstraction.

Monomorphization

“In Rust, the compiler stamps out a different copy of the code of a generic function for each concrete type needed. For example, if I use a Vec<u64> and a Vec<String> in my code, then the generated binary will have two copies of the generated code for Vec: one for Vec<u64> and another for Vec<String>. The result is fast programs, but it comes at the cost of compile time (creating all those copies can take a while) and binary size (all those copies might take a lot of space).” This is different from Java, where the precise type is known at run-time and (almost) all variables are reference values (ie pointer to a heap allocated object). This has performance cost because of the dereference for each access.

Trait Objects

We’ve seen how a function can take arguments which implement a trait:

use std::fmt::Display;

fn print<T: Display>(x: T) {
    println!("Your value: {}", x);
}

fn main() {
    print(123);
    print("Hello");
}

However, how can we store a collection of mixed types which implement Display?

fn main() {
    let xs = vec![123, "Hello"];
}

For this, we need trait objects:

use std::fmt::Display;

fn main() {
    let xs: Vec<Box<dyn Display>> = vec![Box::new(123), Box::new("Hello")];
    for x in xs {
        println!("x: {x}");
    }
}

Memory layout after allocating xs:

<str as Display>::fmt<i32 as Display>::fmtStackHeapxsptrlen2capacity2Hello7b000000

Similarly, you need a trait object if you want to return different values implementing a trait:

fn numbers(n: i32) -> Box<dyn Iterator<Item=i32>> {
    if n > 0 {
        Box::new(0..n)
    } else {
        Box::new((n..0).rev())
    }
}

fn main() {
    println!("{:?}", numbers(-5).collect::<Vec<_>>());
    println!("{:?}", numbers(5).collect::<Vec<_>>());
}

Course 5 Exercises

We will design a classical GUI library traits and trait objects.

A Simple GUI Library

Let us design a classical GUI library using our new knowledge of traits and trait objects.

We will have a number of widgets in our library:

  • Window: has a title and contains other widgets.
  • Button: has a label and a callback function which is invoked when the button is pressed.
  • Label: has a label.

The widgets will implement a Widget trait, see below.

Copy the code below to https://play.rust-lang.org/, fill in the missing draw_into methods so that you implement the Widget trait:

// TODO: remove this when you're done with your implementation.
#![allow(unused_imports, unused_variables, dead_code)]

pub trait Widget {
    /// Natural width of `self`.
    fn width(&self) -> usize;

    /// Draw the widget into a buffer.
    fn draw_into(&self, buffer: &mut dyn std::fmt::Write);

    /// Draw the widget on standard output.
    fn draw(&self) {
        let mut buffer = String::new();
        self.draw_into(&mut buffer);
        println!("{}", &buffer);
    }
}

pub struct Label {
    label: String,
}

impl Label {
    fn new(label: &str) -> Label {
        Label {
            label: label.to_owned(),
        }
    }
}

pub struct Button {
    label: Label,
    callback: Box<dyn FnMut()>,
}

impl Button {
    fn new(label: &str, callback: Box<dyn FnMut()>) -> Button {
        Button {
            label: Label::new(label),
            callback,
        }
    }
}

pub struct Window {
    title: String,
    widgets: Vec<Box<dyn Widget>>,
}

impl Window {
    fn new(title: &str) -> Window {
        Window {
            title: title.to_owned(),
            widgets: Vec::new(),
        }
    }

    fn add_widget(&mut self, widget: Box<dyn Widget>) {
        self.widgets.push(widget);
    }
}


impl Widget for Label {
    fn width(&self) -> usize {
        unimplemented!()
    }

    fn draw_into(&self, buffer: &mut dyn std::fmt::Write) {
        unimplemented!()
    }
}

impl Widget for Button {
    fn width(&self) -> usize {
        unimplemented!()
    }

    fn draw_into(&self, buffer: &mut dyn std::fmt::Write) {
        unimplemented!()
    }
}

impl Widget for Window {
    fn width(&self) -> usize {
        unimplemented!()
    }

    fn draw_into(&self, buffer: &mut dyn std::fmt::Write) {
        unimplemented!()
    }
}

fn main() {
    let mut window = Window::new("Rust GUI Demo 1.23");
    window.add_widget(Box::new(Label::new("This is a small text GUI demo.")));
    window.add_widget(Box::new(Button::new(
        "Click me!",
        Box::new(|| println!("You clicked the button!")),
    )));
    window.draw();
}

The output of the above program can be something simple like this:

========
Rust GUI Demo 1.23
========

This is a small text GUI demo.

| Click me! |

If you want to draw aligned text, you can use the fill/alignment formatting operators. In particular, notice how you can pad with different characters (here a '/') and how you can control alignment:

fn main() {
    let width = 10;
    println!("left aligned:  |{:/<width$}|", "foo");
    println!("centered:      |{:/^width$}|", "foo");
    println!("right aligned: |{:/>width$}|", "foo");
}

Using such alignment tricks, you can for example produce output like this:

+--------------------------------+
|       Rust GUI Demo 1.23       |
+================================+
| This is a small text GUI demo. |
| +-----------+                  |
| | Click me! |                  |
| +-----------+                  |
+--------------------------------+

In this 6th course, we will discover

  • Error handling: panics, Result, and the try operator ?.

  • Testing: unit tests, documentation tests, and integration tests.

  • Unsafe Rust: raw pointers, static variables, unsafe functions, and extern functions.

Error Handling

Error handling in Rust is done using explicit control flow:

  • Functions that can have errors list this in their return type,
  • There are no exceptions.

Doc on handling errors

Panics

Rust will trigger a panic if a fatal error happens at runtime:

fn main() {
    let v = vec![10, 20, 30];
    println!("v[100]: {}", v[100]);
}
  • Panics are for unrecoverable and unexpected errors.
    • Panics are symptoms of bugs in the program.
  • Use non-panicking APIs (such as Vec::get) if crashing is not acceptable.

Catching the Stack Unwinding

By default, a panic will cause the stack to unwind. The unwinding can be caught:

#![allow(unused)]
fn main() {
use std::panic;

let result = panic::catch_unwind(|| {
    println!("hello!");
});
assert!(result.is_ok());

let result = panic::catch_unwind(|| {
    panic!("oh no!");
});
assert!(result.is_err());
}
  • Catching the stack unwinding can be useful in servers which should keep running even if a single request crashes.
  • This does not work if panic = 'abort' is set in your Cargo.toml.

The panic! macro can be used to generate a panic and start unwinding its stack. While unwinding, the runtime will take care of freeing all the resources owned by the thread by calling the destructor of all its objects.

assert! asserts that a boolean expression is true at runtime. This will invoke the panic! macro if the provided expression cannot be evaluated to true at runtime.

Structured Error Handling with Result

We have already seen the Result enum. This is used pervasively when errors are expected as part of normal operation:

use std::fs::File;
use std::io::Read;

fn main() {
    let file = File::open("diary.txt");
    match file {
        Ok(mut file) => {
            let mut contents = String::new();
            file.read_to_string(&mut contents);
            println!("Dear diary: {contents}");
        },
        Err(err) => {
            println!("The diary could not be opened: {err}");
        }
    }
}
  • As with Option, the successful value sits inside of Result, forcing the developer to explicitly extract it. This encourages error checking. In the case where an error should never happen, unwrap() or expect() can be called, and this is a signal of the developer intent too.
  • Result documentation is a recommended read. Not during the course, but it is worth mentioning. It contains a lot of convenience methods and functions that help functional-style programming.

Propagating Errors with ?

The try-operator ? is used to return errors to the caller. It lets you turn the common

match some_expression {
    Ok(value) => value,
    Err(err) => return Err(err),
}

into the much simpler

some_expression?

We can use this to simplify our error handing code:

use std::fs;
use std::io::{self, Read};

fn read_username(path: &str) -> Result<String, io::Error> {
    let username_file_result = fs::File::open(path);

    let mut username_file = match username_file_result {
        Ok(file) => file,
        Err(e) => return Err(e),
    };
    //let mut username_file = username_file_result?;

    let mut username = String::new();

    match username_file.read_to_string(&mut username) {
        Ok(_) => Ok(username),
        Err(e) => Err(e),
    }
}

fn main() {
    //fs::write("config.dat", "alice").unwrap();
    let username = read_username("config.dat");
    println!("username or error: {username:?}");
}

Key points:

  • The username variable can be either Ok(string) or Err(error).
  • Use the fs::write call to test out the different scenarios: no file, empty file, file with username.

Converting Error Types

The effective expansion of ? is a little more complicated than previously indicated:

expression?

works the same as

match expression {
    Ok(value) => value,
    Err(err)  => return Err(From::from(err)),
}

The From::from call here means we attempt to convert the error type to the type returned by the function:

Converting Error Types

use std::error::Error;
use std::fmt::{self, Display, Formatter};
use std::fs::{self, File};
use std::io::{self, Read};

#[derive(Debug)]
enum ReadUsernameError {
    IoError(io::Error),
    EmptyUsername(String),
}

impl Error for ReadUsernameError {}

impl Display for ReadUsernameError {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        match self {
            Self::IoError(e) => write!(f, "IO error: {}", e),
            Self::EmptyUsername(filename) => write!(f, "Found no username in {}", filename),
        }
    }
}

impl From<io::Error> for ReadUsernameError {
    fn from(err: io::Error) -> ReadUsernameError {
        ReadUsernameError::IoError(err)
    }
}

fn read_username(path: &str) -> Result<String, ReadUsernameError> {
    let mut username = String::with_capacity(100);
    File::open(path)?.read_to_string(&mut username)?;
    if username.is_empty() {
        return Err(ReadUsernameError::EmptyUsername(String::from(path)));
    }
    Ok(username)
}

fn main() {
    //fs::write("config.dat", "").unwrap();
    let username = read_username("config.dat");
    println!("username or error: {username:?}");
}

Key points:

  • The username variable can be either Ok(string) or Err(error).
  • Use the fs::write call to test out the different scenarios: no file, empty file, file with username.

It is good practice for all error types to implement std::error::Error, which requires Debug and Display. It’s generally helpful for them to implement Clone and Eq too where possible, to make life easier for tests and consumers of your library. In this case we can’t easily do so, because io::Error doesn’t implement them.

Deriving Error Enums

The thiserror crate is a popular way to create an error enum like we did on the previous page:

use std::{fs, io};
use std::io::Read;
use thiserror::Error;

#[derive(Debug, Error)]
enum ReadUsernameError {
    #[error("Could not read: {0}")]
    IoError(#[from] io::Error),
    #[error("Found no username in {0}")]
    EmptyUsername(String),
}

fn read_username(path: &str) -> Result<String, ReadUsernameError> {
    let mut username = String::with_capacity(100);
    fs::File::open(path)?.read_to_string(&mut username)?;
    if username.is_empty() {
        return Err(ReadUsernameError::EmptyUsername(String::from(path)));
    }
    Ok(username)
}

fn main() {
    //fs::write("config.dat", "").unwrap();
    match read_username("config.dat") {
        Ok(username) => println!("Username: {username}"),
        Err(err)     => println!("Error: {err}"),
    }
}

thiserror’s derive macro automatically implements std::error::Error, and optionally Display (if the #[error(...)] attributes are provided) and From (if the #[from] attribute is added). It also works for structs.

It doesn’t affect your public API, which makes it good for libraries.

Dynamic Error Types

Sometimes we want to allow any type of error to be returned without writing our own enum covering all the different possibilities. std::error::Error makes this easy.

use std::fs::{self, File};
use std::io::Read;
use thiserror::Error;
use std::error::Error;

#[derive(Clone, Debug, Eq, Error, PartialEq)]
#[error("Found no username in {0}")]
struct EmptyUsernameError(String);

fn read_username(path: &str) -> Result<String, Box<dyn Error>> {
    let mut username = String::with_capacity(100);
    File::open(path)?.read_to_string(&mut username)?;
    if username.is_empty() {
        return Err(EmptyUsernameError(String::from(path)).into());
    }
    Ok(username)
}

fn main() {
    //fs::write("config.dat", "").unwrap();
    match read_username("config.dat") {
        Ok(username) => println!("Username: {username}"),
        Err(err)     => println!("Error: {err}"),
    }
}

This saves on code, but gives up the ability to cleanly handle different error cases differently in the program. As such it’s generally not a good idea to use Box<dyn Error> in the public API of a library, but it can be a good option in a program where you just want to display the error message somewhere.

Adding Context to Errors

The widely used anyhow crate can help you add contextual information to your errors and allows you to have fewer custom error types:

use std::{fs, io};
use std::io::Read;
use anyhow::{Context, Result, bail};

fn read_username(path: &str) -> Result<String> {
    let mut username = String::with_capacity(100);
    fs::File::open(path)
        .context(format!("Failed to open {path}"))?
        .read_to_string(&mut username)
        .context("Failed to read")?;
    if username.is_empty() {
        bail!("Found no username in {path}");
    }
    Ok(username)
}

fn main() {
    //fs::write("config.dat", "").unwrap();
    match read_username("config.dat") {
        Ok(username) => println!("Username: {username}"),
        Err(err)     => println!("Error: {err:?}"),
    }
}
  • anyhow::Result<V> is a type alias for Result<V, anyhow::Error>.
  • anyhow::Error is essentially a wrapper around Box<dyn Error>. As such it’s again generally not a good choice for the public API of a library, but is widely used in applications.
  • Actual error type inside of it can be extracted for examination if necessary.
  • Functionality provided by anyhow::Result<T> may be familiar to Go developers, as it provides similar usage patterns and ergonomics to (T, error) from Go.

Testing

Rust and Cargo come with a simple unit test framework:

  • Unit tests are supported throughout your code.

  • Integration tests are supported via the tests/ directory.

Unit Tests

Mark unit tests with #[test]:

fn first_word(text: &str) -> &str {
    match text.find(' ') {
        Some(idx) => &text[..idx],
        None => &text,
    }
}

#[test]
fn test_empty() {
    assert_eq!(first_word(""), "");
}

#[test]
fn test_single_word() {
    assert_eq!(first_word("Hello"), "Hello");
}

#[test]
fn test_multiple_words() {
    assert_eq!(first_word("Hello World"), "Hello");
}

Use cargo test to find and run the unit tests.

Test Modules

Unit tests are often put in a nested module (run tests on the Playground):

fn helper(a: &str, b: &str) -> String {
    format!("{a} {b}")
}

pub fn main() {
    println!("{}", helper("Hello", "World"));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_helper() {
        assert_eq!(helper("foo", "bar"), "foo bar");
    }
}
  • This lets you unit test private helpers.
  • The #[cfg(test)] attribute is only active when you run cargo test.

Documentation Tests

Rust has built-in support for documentation tests:

#![allow(unused)]
fn main() {
/// Shortens a string to the given length.
///
/// ```
/// use playground::shorten_string;
/// assert_eq!(shorten_string("Hello World", 5), "Hello");
/// assert_eq!(shorten_string("Hello World", 20), "Hello World");
/// ```
pub fn shorten_string(s: &str, length: usize) -> &str {
    &s[..std::cmp::min(length, s.len())]
}
}
  • Code blocks in /// comments are automatically seen as Rust code.
  • The code will be compiled and executed as part of cargo test.
  • Test the above code on the Rust Playground.
pub fn shorten_string(s: &str, length: usize) -> &str {
    &s[..std::cmp::min(length, s.len())]
}

#[test]
fn test_single_word() {
    assert_eq!(shorten_string("Hello World",5), "Hello");
}

#[test]
fn test_multiple_words() {
    assert_eq!(shorten_string("Hello World",20), "Hello World");
}

Integration Tests

If you want to test your library as a client, use an integration test.

Create a .rs file under tests/:

use my_library::init;

#[test]
fn test_init() {
    assert!(init().is_ok());
}

These tests only have access to the public API of your crate.

Unsafe Rust

The Rust language has two parts:

  • Safe Rust: memory safe, no undefined behavior possible.
  • Unsafe Rust: can trigger undefined behavior if preconditions are violated.

We will be seeing mostly safe Rust in this course, but it’s important to know what Unsafe Rust is.

Unsafe code is usually small and isolated, and its correctness should be carefully documented. It is usually wrapped in a safe abstraction layer.

Unsafe Rust gives you access to five new capabilities:

  • Dereference raw pointers.
  • Access or modify mutable static variables.
  • Access union fields.
  • Call unsafe functions, including extern functions.
  • Implement unsafe traits.

We will briefly cover unsafe capabilities next. For full details, please see Chapter 19.1 in the Rust Book and the Rustonomicon.

Unsafe Rust does not mean the code is incorrect. It means that developers have turned off the compiler safety features and have to write correct code by themselves. It means the compiler no longer enforces Rust’s memory-safety rules.

Dereferencing Raw Pointers

Creating pointers is safe, but dereferencing them requires unsafe:

fn main() {
    let mut num = 5;

    let r1 = &mut num as *mut i32;
    let r2 = &num as *const i32;

    // Safe because r1 and r2 were obtained from references and so are guaranteed to be non-null and
    // properly aligned, the objects underlying the references from which they were obtained are
    // live throughout the whole unsafe block, and they are not accessed either through the
    // references or concurrently through any other pointers.
    unsafe {
        println!("r1 is: {}", *r1);
        *r1 = 10;
        println!("r2 is: {}", *r2);
    }
}

It is good practice (and required by the Android Rust style guide) to write a comment for each unsafe block explaining how the code inside it satisfies the safety requirements of the unsafe operations it is doing.

In the case of pointer dereferences, this means that the pointers must be valid, i.e.:

  • The pointer must be non-null.
  • The pointer must be dereferenceable (within the bounds of a single allocated object).
  • The object must not have been deallocated.
  • There must not be concurrent accesses to the same location.
  • If the pointer was obtained by casting a reference, the underlying object must be live and no reference may be used to access the memory.

In most cases the pointer must also be properly aligned.

Mutable Static Variables

It is safe to read an immutable static variable:

static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("HELLO_WORLD: {}", HELLO_WORLD);
}

However, since data races can occur, it is unsafe to read and write mutable static variables:

static mut COUNTER: u32 = 0;

fn add_to_counter(inc: u32) {
    unsafe { COUNTER += inc; }  // Potential data race!
}

fn main() {
    add_to_counter(42);

    unsafe { println!("COUNTER: {}", COUNTER); }  // Potential data race!
}

Using a mutable static is generally a bad idea, but there are some cases where it might make sense in low-level no_std code, such as implementing a heap allocator or working with some C APIs.

no_std is used to prevent Rust from loading the standard library. This is used for bare metal development for example. no_std in the Embedded Rust Book

Unions

Unions are like enums, but you need to track the active field yourself:

#[repr(C)]
union MyUnion {
    i: u8,
    b: bool,
}

fn main() {
    let u = MyUnion { i: 42 };
    println!("int: {}", unsafe { u.i });
    println!("bool: {}", unsafe { u.b });  // Undefined behavior!
}

Unions are very rarely needed in Rust as you can usually use an enum. They are occasionally needed for interacting with C library APIs.

If you just want to reinterpret bytes as a different type, you probably want std::mem::transmute or a safe wrapper such as the zerocopy crate.

Calling Unsafe Functions

A function or method can be marked unsafe if it has extra preconditions you must uphold to avoid undefined behaviour:

fn main() {
    let emojis = "đŸ—»âˆˆđŸŒ";

    // Safe because the indices are in the correct order, within the bounds of
    // the string slice, and lie on UTF-8 sequence boundaries.
    unsafe {
        println!("{}", emojis.get_unchecked(0..4));
        println!("{}", emojis.get_unchecked(4..7));
        println!("{}", emojis.get_unchecked(7..11));
    }
}

Writing Unsafe Functions

You can mark your own functions as unsafe if they require particular conditions to avoid undefined behaviour.

/// Swaps the values pointed to by the given pointers.
///
/// # Safety
///
/// The pointers must be valid and properly aligned.
unsafe fn swap(a: *mut u8, b: *mut u8) {
    let temp = *a;
    *a = *b;
    *b = temp;
}

fn main() {
    let mut a = 42;
    let mut b = 66;

    // Safe because ...
    unsafe {
        swap(&mut a, &mut b);
    }

    println!("a = {}, b = {}", a, b);
}

We wouldn’t actually use pointers for this because it can be done safely with references.

Note that unsafe code is allowed within an unsafe function without an unsafe block. We can prohibit this with #[deny(unsafe_op_in_unsafe_fn)]. Try adding it and see what happens.

Calling External Code

Functions from other languages might violate the guarantees of Rust. Calling them is thus unsafe:

extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        // Undefined behavior if abs misbehaves.
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}

This is usually only a problem for extern functions which do things with pointers which might violate Rust’s memory model, but in general any C function might have undefined behaviour under any arbitrary circumstances.

The "C" in this example is the ABI; other ABIs are available too.

Implementing Unsafe Traits

Like with functions, you can mark a trait as unsafe if the implementation must guarantee particular conditions to avoid undefined behaviour.

For example, the zerocopy crate has an unsafe trait that looks something like this:

use std::mem::size_of_val;
use std::slice;

/// ...
/// # Safety
/// The type must have a defined representation and no padding.
pub unsafe trait AsBytes {
    fn as_bytes(&self) -> &[u8] {
        unsafe {
            slice::from_raw_parts(self as *const Self as *const u8, size_of_val(self))
        }
    }
}

// Safe because u32 has a defined representation and no padding.
unsafe impl AsBytes for u32 {}

There should be a # Safety section on the Rustdoc for the trait explaining the requirements for the trait to be safely implemented.

The actual safety section for AsBytes is rather longer and more complicated.

The built-in Send and Sync traits are unsafe.

Course 6 Exercises

Let us build a safe wrapper for reading directory content!

Safe FFI Wrapper

Rust has great support for calling functions through a foreign function interface (FFI). We will use this to build a safe wrapper for the libc functions you would use from C to read the filenames of a directory.

You will want to consult the manual pages:

You will also want to browse the std::ffi module, particular for CStr and CString types which are used to hold NUL-terminated strings coming from C. The Nomicon also has a very useful chapter about FFI.

Copy the code below to https://play.rust-lang.org/ and fill in the missing functions and methods:

// TODO: remove this when you're done with your implementation.
#![allow(unused_imports, unused_variables, dead_code)]

mod ffi {
    use std::os::raw::{c_char, c_int, c_long, c_ulong, c_ushort};

    // Opaque type. See https://doc.rust-lang.org/nomicon/ffi.html.
    #[repr(C)]
    pub struct DIR {
        _data: [u8; 0],
        _marker: core::marker::PhantomData<(*mut u8, core::marker::PhantomPinned)>,
    }

    // Layout as per readdir(3) and definitions in /usr/include/x86_64-linux-gnu.
    #[repr(C)]
    pub struct dirent {
        pub d_ino: c_long,
        pub d_off: c_ulong,
        pub d_reclen: c_ushort,
        pub d_type: c_char,
        pub d_name: [c_char; 256],
    }

    extern "C" {
        pub fn opendir(s: *const c_char) -> *mut DIR;
        pub fn readdir(s: *mut DIR) -> *const dirent;
        pub fn closedir(s: *mut DIR) -> c_int;
    }
}

use std::ffi::{CStr, CString, OsStr, OsString};
use std::os::unix::ffi::OsStrExt;

#[derive(Debug)]
struct DirectoryIterator {
    path: CString,
    dir: *mut ffi::DIR,
}

impl DirectoryIterator {
    fn new(path: &str) -> Result<DirectoryIterator, String> {
        // Call opendir and return a Ok value if that worked,
        // otherwise return Err with a message.
        unimplemented!()
    }
}

impl Iterator for DirectoryIterator {
    type Item = OsString;
    fn next(&mut self) -> Option<OsString> {
        // Keep calling readdir until we get a NULL pointer back.
        unimplemented!()
    }
}

impl Drop for DirectoryIterator {
    fn drop(&mut self) {
        // Call closedir as needed.
        unimplemented!()
    }
}

fn main() -> Result<(), String> {
    let iter = DirectoryIterator::new(".")?;
    println!("files: {:#?}", iter.collect::<Vec<_>>());
    Ok(())
}

Welcome to course 7

Today we will look at:

  • Concurrency: threads, channels, shared state, Send and Sync.

Fearless Concurrency

Rust has full support for concurrency using OS threads with mutexes and channels.

The Rust type system plays an important role in making many concurrency bugs compile time bugs. This is often referred to as fearless concurrency since you can rely on the compiler to ensure correctness at runtime.

Threads

Rust threads work similarly to threads in other languages:

use std::thread;
use std::time::Duration;

fn main() {
    thread::spawn(|| {
        for i in 1..10 {
            println!("Count in thread: {i}!");
            thread::sleep(Duration::from_millis(5));
        }
    });

    for i in 1..5 {
        println!("Main thread: {i}");
        thread::sleep(Duration::from_millis(5));
    }
}
  • Threads are all daemon threads, the main thread does not wait for them.
  • Thread panics are independent of each other.
    • Panics can carry a payload, which can be unpacked with downcast_ref.

Key points:

  • Notice that the thread is stopped before it reaches 10 — the main thread is not waiting.

  • Use let handle = thread::spawn(...) and later handle.join() to wait for the thread to finish.

  • Trigger a panic in the thread, notice how this doesn’t affect main.

  • Use the Result return value from handle.join() to get access to the panic payload. This is a good time to talk about Any.

use std::thread;
use std::time::Duration;

fn main() {
    let handler = thread::spawn(|| {
        for i in 1..10 {
            println!("Count in thread: {i}!");
            thread::sleep(Duration::from_millis(5));
        }
    });

    for i in 1..5 {
        println!("Main thread: {i}");
        thread::sleep(Duration::from_millis(5));
    }
    handler.join().unwrap();
}

Scoped Threads

Normal threads cannot borrow from their environment:

use std::thread;

fn main() {
    let s = String::from("Hello");

    thread::spawn(|| {
        println!("Length: {}", s.len());
    });
}

However, you can use a scoped thread for this:

use std::thread;

fn main() {
    let s = String::from("Hello");

    thread::scope(|scope| {
        scope.spawn(|| {
            println!("Length: {}", s.len());
        });
    });
}
  • The reason for that is that when the thread::scope function completes, all the threads are guaranteed to be joined, so they can return borrowed data.
  • Normal Rust borrowing rules apply: you can either borrow mutably by one thread, or immutably by any number of threads.

Channels

Rust channels have two parts: a Sender<T> and a Receiver<T>. The two parts are connected via the channel, but you only see the end-points.

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    tx.send(10).unwrap();
    tx.send(20).unwrap();

    println!("Received: {:?}", rx.recv());
    println!("Received: {:?}", rx.recv());

    let tx2 = tx.clone();
    tx2.send(30).unwrap();
    println!("Received: {:?}", rx.recv());
}
  • mpsc stands for Multi-Producer, Single-Consumer. Sender and SyncSender implement Clone (so you can make multiple producers) but Receiver does not.
  • send() and recv() return Result. If they return Err, it means the counterpart Sender or Receiver is dropped and the channel is closed.

Unbounded Channels

You get an unbounded and asynchronous channel with mpsc::channel():

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let thread_id = thread::current().id();
        for i in 1..10 {
            tx.send(format!("Message {i}")).unwrap();
            println!("{thread_id:?}: sent Message {i}");
        }
        println!("{thread_id:?}: done");
    });
    thread::sleep(Duration::from_millis(100));

    for msg in rx.iter() {
        println!("Main: got {}", msg);
    }
}

Bounded Channels

Bounded and synchronous channels make send block the current thread:

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::sync_channel(3);

    thread::spawn(move || {
        let thread_id = thread::current().id();
        for i in 1..10 {
            tx.send(format!("Message {i}")).unwrap();
            println!("{thread_id:?}: sent Message {i}");
        }
        println!("{thread_id:?}: done");
    });
    thread::sleep(Duration::from_millis(100));

    for msg in rx.iter() {
        println!("Main: got {}", msg);
    }
}

Shared State

Rust uses the type system to enforce synchronization of shared data. This is primarily done via two types:

  • Arc<T>, atomic reference counted T: handles sharing between threads and takes care to deallocate T when the last reference is dropped,
  • Mutex<T>: ensures mutually exclusive access to the T value.

Arc

Arc<T> allows shared read-only access via its clone method:

use std::thread;
use std::sync::Arc;

fn main() {
    let v = Arc::new(vec![10, 20, 30]);
    let mut handles = Vec::new();
    for _ in 1..5 {
        let v = v.clone();
        handles.push(thread::spawn(move || {
            let thread_id = thread::current().id();
            println!("{thread_id:?}: {v:?}");
        }));
    }

    handles.into_iter().for_each(|h| h.join().unwrap());
    println!("v: {v:?}");
}
  • Arc stands for “Atomic Reference Counted”, a thread safe version of Rc that uses atomic operations.

    “The type Arc provides shared ownership of a value of type T, allocated in the heap. Invoking clone on Arc produces a new Arc instance, which points to the same allocation on the heap as the source Arc, while increasing a reference count. When the last Arc pointer to a given allocation is destroyed, the value stored in that allocation (often referred to as “inner value”) is also dropped.” from 1.

    Also see How arc works in rust

  • Arc<T> implements Clone whether or not T does. It implements Send and Sync iff T implements them both.

  • Arc::clone() has the cost of atomic operations that get executed, but after that the use of the T is free.

  • Beware of reference cycles, Arc does not use a garbage collector to detect them.

    • std::sync::Weak can help.

Mutex

Mutex<T> ensures mutual exclusion and allows mutable access to T behind a read-only interface:

use std::sync::Mutex;

fn main() {
    let v: Mutex<Vec<i32>> = Mutex::new(vec![10, 20, 30]);
    println!("v: {:?}", v.lock().unwrap());

    {
        let v: &Mutex<Vec<i32>> = &v;
        let mut guard = v.lock().unwrap();
        guard.push(40);
    }

    println!("v: {:?}", v.lock().unwrap());
}

Notice how we have a impl<T: Send> Sync for Mutex<T> blanket implementation.

  • Mutex in Rust looks like a collection with just one element - the protected data.
    • It is not possible to forget to acquire the mutex before accessing the protected data.
  • You can get an &mut T from an &Mutex<T> by taking the lock. The MutexGuard ensures that the &mut T doesn’t outlive the lock being held.
  • Mutex<T> implements both Send and Sync iff T implements Send.
  • A read-write lock counterpart - RwLock.
  • Why does lock() return a Result?
    • If the thread that held the Mutex panicked, the Mutex becomes “poisoned”. The error signals that the data it protected might be in an inconsistent state. Calling lock() on a poisoned mutex fails with a PoisonError. You can call into_inner() on the error to recover the data regardless.

Example

Let us see Arc and Mutex in action:

use std::thread;
// use std::sync::{Arc, Mutex};

fn main() {
    let mut v = vec![10, 20, 30];
    let handle = thread::spawn(|| {
        v.push(10);
    });
    v.push(1000);

    handle.join().unwrap();
    println!("v: {v:?}");
}

Possible solution:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let v = Arc::new(Mutex::new(vec![10, 20, 30]));

    let v2 = v.clone();
    let handle = thread::spawn(move || {
        let mut v2 = v2.lock().unwrap();
        v2.push(10);
    });

    {
        let mut v = v.lock().unwrap();
        v.push(1000);
    }

    handle.join().unwrap();

    {
        let v = v.lock().unwrap();
        println!("v: {v:?}");
    }
}

Notable parts:

  • v is wrapped in both Arc and Mutex, because their concerns are orthogonal.
    • Wrapping a Mutex in an Arc is a common pattern to share mutable state between threads.
  • v: Arc<_> needs to be cloned as v2 before it can be moved into another thread. Note move was added to the lambda signature.
  • Blocks are introduced to narrow the scope of the LockGuard as much as possible.
  • We still need to acquire the Mutex to print our Vec.

Below is code to slowdown the main thread to invert the pushes into the vector.

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    let v = Arc::new(Mutex::new(vec![10, 20, 30]));

    let v2 = v.clone();
    let handle = thread::spawn(move || {
        let mut v2 = v2.lock().unwrap();
        v2.push(10);
    });

    thread::sleep(Duration::from_millis(10));

    {
        let mut v = v.lock().unwrap();
        v.push(1000);
    }

    handle.join().unwrap();

    {
        let v = v.lock().unwrap();
        println!("v: {v:?}");
    }
}

Send and Sync

How does Rust know to forbid shared access across thread? The answer is in two traits:

  • Send: a type T is Send if it is safe to move a T across a thread boundary.
  • Sync: a type T is Sync if it is safe to move a &T across a thread boundary.

Send and Sync are unsafe traits. The compiler will automatically derive them for your types as long as they only contain Send and Sync types. You can also implement them manually when you know it is valid.

  • One can think of these traits as markers that the type has certain thread-safety properties.
  • They can be used in the generic constraints as normal traits.

Send

A type T is Send if it is safe to move a T value to another thread.

The effect of moving ownership to another thread is that destructors will run in that thread. So the question is when you can allocate a value in one thread and deallocate it in another.

Sync

A type T is Sync if it is safe to access a T value from multiple threads at the same time.

More precisely, the definition is:

T is Sync if and only if &T is Send

This statement is essentially a shorthand way of saying that if a type is thread-safe for shared use, it is also thread-safe to pass references of it across threads.

This is because if a type is Sync it means that it can be shared across multiple threads without the risk of data races or other synchronization issues, so it is safe to move it to another thread. A reference to the type is also safe to move to another thread, because the data it references can be accessed from any thread safely.

Examples

Send + Sync

Most types you come across are Send + Sync:

  • i8, f32, bool, char, &str, 

  • (T1, T2), [T; N], &[T], struct { x: T }, 

  • String, Option<T>, Vec<T>, Box<T>, 

  • Arc<T>: Explicitly thread-safe via atomic reference count.
  • Mutex<T>: Explicitly thread-safe via internal locking.
  • AtomicBool, AtomicU8, 
: Uses special atomic instructions.

The generic types are typically Send + Sync when the type parameters are Send + Sync.

Send + !Sync

These types can be moved to other threads, but they’re not thread-safe. Typically because of interior mutability:

  • mpsc::Sender<T>
  • mpsc::Receiver<T>
  • Cell<T>
  • RefCell<T>

!Send + Sync

These types are thread-safe, but they cannot be moved to another thread:

  • MutexGuard<T>: Uses OS level primitives which must be deallocated on the thread which created them.

!Send + !Sync

These types are not thread-safe and cannot be moved to other threads:

  • Rc<T>: each Rc<T> has a reference to an RcBox<T>, which contains a non-atomic reference count.
  • *const T, *mut T: Rust assumes raw pointers may have special concurrency considerations.

Exercises

Let us practice our new concurrency skills with

  • Dining philosophers: a classic problem in concurrency.

  • Multi-threaded link checker: a larger project where you’ll use Cargo to download dependencies and then check links in parallel.

Dining Philosophers

The dining philosophers problem is a classic problem in concurrency:

Five philosophers dine together at the same table. Each philosopher has their own place at the table. There is a fork between each plate. The dish served is a kind of spaghetti which has to be eaten with two forks. Each philosopher can only alternately think and eat. Moreover, a philosopher can only eat their spaghetti when they have both a left and right fork. Thus two forks will only be available when their two nearest neighbors are thinking, not eating. After an individual philosopher finishes eating, they will put down both forks.

You will need a local Cargo installation for this exercise. Copy the code below to src/main.rs file, fill out the blanks, and test that cargo run does not deadlock:

use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

struct Fork;

struct Philosopher {
    name: String,
    // left_fork: ...
    // right_fork: ...
    // thoughts: ...
}

impl Philosopher {
    fn think(&self) {
        self.thoughts
            .send(format!("Eureka! {} has a new idea!", &self.name))
            .unwrap();
    }

    fn eat(&self) {
        // Pick up forks...
        println!("{} is eating...", &self.name);
        thread::sleep(Duration::from_millis(10));
    }
}

static PHILOSOPHERS: &[&str] =
    &["Socrates", "Plato", "Aristotle", "Thales", "Pythagoras"];

fn main() {
    // Create forks

    // Create philosophers

    // Make them think and eat

    // Output their thoughts
}

Multi-threaded Link Checker

Let us use our new knowledge to create a multi-threaded link checker. It should start at a webpage and check that links on the page are valid. It should recursively check other pages on the same domain and keep doing this until all pages have been validated.

For this, you will need an HTTP client such as reqwest. Create a new Cargo project and reqwest it as a dependency with:

$ cargo new link-checker
$ cd link-checker
$ cargo add --features blocking,rustls-tls reqwest

If cargo add fails with error: no such subcommand, then please edit the Cargo.toml file by hand. Add the dependencies listed below.

You will also need a way to find links. We can use scraper for that:

$ cargo add scraper

Finally, we’ll need some way of handling errors. We use thiserror for that:

$ cargo add thiserror

The cargo add calls will update the Cargo.toml file to look like this:

[dependencies]
reqwest = { version = "0.11.12", features = ["blocking", "rustls-tls"] }
scraper = "0.13.0"
thiserror = "1.0.37"

You can now download the start page. Try with a small site such as https://www.google.org/.

Your src/main.rs file should look something like this:

use reqwest::blocking::{get, Response};
use reqwest::Url;
use scraper::{Html, Selector};
use thiserror::Error;

#[derive(Error, Debug)]
enum Error {
    #[error("request error: {0}")]
    ReqwestError(#[from] reqwest::Error),
}

fn extract_links(response: Response) -> Result<Vec<Url>, Error> {
    let base_url = response.url().to_owned();
    let document = response.text()?;
    let html = Html::parse_document(&document);
    let selector = Selector::parse("a").unwrap();

    let mut valid_urls = Vec::new();
    for element in html.select(&selector) {
        if let Some(href) = element.value().attr("href") {
            match base_url.join(href) {
                Ok(url) => valid_urls.push(url),
                Err(err) => {
                    println!("On {base_url}: could not parse {href:?}: {err} (ignored)",);
                }
            }
        }
    }

    Ok(valid_urls)
}

fn main() {
    let start_url = Url::parse("https://www.google.org").unwrap();
    let response = get(start_url).unwrap();
    match extract_links(response) {
        Ok(links) => println!("Links: {links:#?}"),
        Err(err) => println!("Could not extract links: {err:#}"),
    }
}

Run the code in src/main.rs with

$ cargo run

Tasks

  • Use threads to check the links in parallel: send the URLs to be checked to a channel and let a few threads check the URLs in parallel.
  • Extend this to recursively extract links from all pages on the www.google.org domain. Put an upper limit of 100 pages or so so that you don’t end up being blocked by the site.