Chenyo's org-static-blog

28 Dec 2024

Interesting Rust snippets

1. References

2. Reference and lifetime

let mut x;
x = 42;
let y = &x;
x = 43;
assert_eq!(*y, 42);

The above code will cause compiler error at x=43 since x has a shared reference which is still active after x=43. If we comment the assert_eq!, the code compiles.

let mut x = Vec::new();
let y = &mut x;
let z = &mut x;
y.push(42);
z.push(13);
assert_eq!(x, [42, 13]);

Similarly, the code above will cause compiler error at let z = &mut x since at this point there exist two mutable references for x. If we swap this line with y.push(42), the error is resolved because y is never used after let z = &mut x now.

The compiler accepts the following code. In *x = 84, the borrow checker takes out a mutable reference to x, while there is also a shared reference r = &x in the scope at the same time, since it is not accessed later, it does not create a conflict flow. Similarly, when println!("{z}"), the borrow checker walks back the path and finds no conflict until the r us created.

let mut x = Box::new(42);
let r = &x; // lifetime 'a starts
if rand() > 0.5 {
    // deref a Box returns a shared or a mutable reference
    *x = 84;
} else {
    println!("{z}");
}
// println!("{z}");  // this will cause the compiler error

The following code also compiles, even when the lifetime is intermittently invalid:

let mut x = Box::new(42);
let mut z = &x;  // liefttime 'a starts
for i in 0..100 {
    println!({"z"});  // lifetime 'a ends here
    x = Box::new(i);  // x is moved
    z = &x;  // restart lifetime 'a
}
println!("{z}");  // 'a ends here

In the following code, we have to use two generic lifetime annotations:

struct MutStr<'a, 'b> {
    _s: &'a mut &'b str
}

let mut s = "hello";  // creates a 'static lifetime to "hello"
*MutStr {_s: &mut s}._s = "world";  // creates a mutable reference to s
println!("{s}");  // creates a shared reference to s, now s is "world"

To println("{s}"), the borrow checker has to assume that the lifetime of the mutable reference to s 'a ends at the line before, and 'b is always 'static as a reference to a string slice. If we use 'a only, then the borrow checker tries to shorten 'static to 'a, but since the lifetime is behind &mut which is invariant (see “Rust for Rustaceans” Ch 1), the conversion will fail.

Lifetimes are required when there exists mutable references in the function parameters, even if there is no output reference. In the following code, we need 'contents to make sure the new inserted value lives for at least as long as the vector content.

fn insert_value<'vector, 'content>(my_vec: &'vector mut Vec<&'content i32>, value: &'content i32) {
    my_vec.push(value); // value needs to live for as long as the contents of my_vec, aka my_vec
}

fn main() {
    let x = 1;
    let my_vec = vec![&x];
    {
        let y = 2;
        insert_value(&mut my_vec, &y);
    }
    println!("{my_vec:?}");
}

It’s important to note that the single lifetime does not work, see the usage below:

fn insert_value<'a>(my_vec: &'a mut Vec<&'a i32>, value: &'a i32) {
    my_vec.push(value);
}
fn main() {
    let mut my_vec: Vec<&i32> = vec![];
    let val1 = 1;
    let val2 = 2;
    insert_value(&mut my_vec, &val1); // &val1 needs to live until the vector is dropped
                                      // so is &mut my_vec now
    insert_value(&mut my_vec, &val2); // the last &mut my_vec is still valid!
    println!("{my_vec:?}");
}

We need to explicitly annotate the lifetime in the structs and its implementation. In the following code, next_word uses a second lifetime annotation to decouple the lifetime of the mutable reference to WordIterator and the next world, otherwise we have to drop the previous next word before getting a new next word.

struct WordIterator<'s> {
    position: usize,
    string: &'s str,
}

// impl<'a> defines a lifetime
// WordIterator<'a> says the references in WordIterator must live for 'a
impl<'a> WordIterator<'a> {
    fn new(string: &'a str) -> WordIterator<'a> {
        WordIterator {
            position: 0,
            string,
        }
    }

    // Gives the next word, None if there is no word left
    fn next_word(&'b mut self) -> Option<&'a str> {
        todo!()
    }
}

We can also use lifetime bounds to specify 'a should outlives, i.e., live for at least as long as 'b with 'a: 'b:

fn f<'a, 'b>(x: &'a i32, mut y: &'b i32)
where
    'a: 'b,
{
    y = x; // &'a i32 is a subtype of &'b
    let r: &'b &'a i32 = &&0; // &'b is never dangling
}

3. Traits

3.1. Associated types

One should use associated types instead of generic type parameters in a trait if there should only exist one trait implementation for any type. For instance, for any given type, the Item type should be unambiguous even if type contains generic parameters.

/* standard trait
trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}
*/

#[derive(Default)]
struct MyVec<T> {  // we cannot directly implement Iterator for MyVec<T> as it does not store internal states.
    vec: Vec<T>
}

struct MyVecIter<'a, T> {
    vec: &'a Vec<T>,
    position: usize,
}

impl<'a, T> Iterator for MyVecIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        if self.position < self.vec.len() {
            let item = &self.vec[self.position];
            self.position += 1;
            Some(item)
        } else {
            None
        }
    }
}

impl<T> MyVec<T> {
    fn iter(&self) -> MyVecIter<'_, T> {  // this does not consume MyIter
        MyVecIter {
            vec: &self.vec,
            position: 0,
        }
    }
}

impl<'a, T> IntoIterator for &'a MyVec<T> {  // this consumes MyIter
    type Item = &'a T;
    type IntoIter = MyVecIter<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        self.iter()
    }
}

// usage
let my_vec = MyVec::<i32>::default();
let mut my_vec_iter = my_vec.into_iter();
my_vec_iter.next();

On the other hands, sometimes we want multiple trait implementations for the given type, e.g., convert a type to both String and i32, or when the type has different generic bounds:

impl From<i32> for MyType {}

impl From<String> for MyType {}

impl<T> MyTrait for Vec<T> where T: Display {}

impl<T> MyTrait for Vec<T> where T: Debug {}

3.2. Static/Dynamic dispatch

There are two ways to pass into a function a type that implements a trait: static dispatch and dynamic dispatch:

fn static_dispatch(item: impl MyTrait) {}  // or static_dispatch<T: MyTrait>(item: T) {}

fn dynamic_dispatch(item: &dyn MyTrait) {}

Static dispatch tells the compiler to make a copy of the function for each T and replace each generic parameter with the concrete type provided, which is called monomorphization. In this way the compiler can optimize the generic code just at non-generic code. To avoid generating duplicated machine code, one can declare a non-generic inner function inside the generic function.

When calling dynamic_dispatch, the caller passes a trait object, which is the combination of a type that implements the trait and a pointer to its virtual method table (vtable) which holds all trait method implementations of this type. Since dyn is not resolved during the compile time, it is !Sized and hence it needs to be a wide pointer (&dyn) holder, e.g., &mut, Box or Arc.

Only object-safe traits can allow dynamic dispatch. An object-safe trait must satisfy:

  1. All methods are object-safe:
    1. No generic parameters,
    2. No Self as return type, otherwise the memory layout cannot be determined during the compile time.
    3. No static methods, i.e., must take &self or &mut self, otherwise the vtable is not available.
  2. Trait cannot be Sized bound, otherwise it’s against the nature of dynamic dispatch.

Broadly speaking, use static dispatch in a library, and dynamic dispatch in a binary (no other users):

pub struct DynamicLogger {
    writer: Box<dyn Write>
}

pub struct StaticLogger<W: Write> {
    writer: W
}

// usage
let file = File::create("dynamic_log.txt")?;
let dynamic_logger = DynamicLogger { writer: Box::new(file) };  // Users are forced to use dynamic dispatch

// Both works for static dispatch
let file = File::create("static_log.txt")?;
let logger = Logger { writer: file };

let file: Box<dyn Write> = Box::new(File::create("static_log.txt")?);
let logger = Logger { writer: file };

3.3. Trait bounds

fn collect_to_map<T, S>(items: impl IntoIterator<Item = T>) -> HashMap<T, usize, S>
where
   HashMap<T, usize, S>: FromIterator<(T, usize)>
{
   items
       .into_iter()
       .enumerate()
       .map(|(i, item)| (item, i))
       .collect()
}

The trait above is better than explicitly writing T: Hash + Eq and S: BuildHasher + Default since they are implicitly required for HashMap to implement FromIterator.

impl Debug for AnyIterable
where
    // the reference to Self implements IntoIterator for any lifetime 'a
    for<'a> &'a Self: IntoIterator,
    // the item (associated type in Self) produced by iterating over the reference to Self implement Debug
    // We cannnot write <&'a Self>::Item because the associated type Item only exists in IntoIterator
    for<'a> <&'a Self as IntoIterator>::Item: Debug,
{
    // since fmt signature does not allow lifetime annotation,
    // we need to specify "the reference implements the trait for any lifetime"
    fn fmt(&self, f: &mut Formatter) -> Result<(), Error> {
        f.debug_list()  // formats items, e.g., [item1, item2]
            .entries(self) // iterates over self and debug-formatted each item
            .finish()
    }
}

The code above adds trait bounds for both outer type and the inner associated type. It implement Debug for any type that can be iterated over and whose elements are Debug.

3.4. Existential types

#![feature(impl_trait_in_assoc_type)] // require nightly

#[derive(Default)]
struct MyVec<T> {
    vec: Vec<T>,
}

impl<T> IntoIterator for MyVec<T> {
    type Item = T;
    type IntoIter = impl Iterator<Item = Self::Item>; // existential type

    fn into_iter(self) -> Self::IntoIter {
        let mut vec = self.vec;
        let position = 0;
        std::iter::from_fn(move || {
            if position < vec.len() {
                let item = vec.remove(position); // consume vec
                Some(item)
            } else {
                None
            }
        })
    }
}

fn main() {
    let mut my_vec = MyVec::<i32>::default().into_iter();
    println!("{:?}", my_vec.next());
}

The code above uses existential types in the associated types to avoid creating new type MyVecIter, hence perform zero-cost type erasure.

4. Linked-list

4.1. Leetcode 146: LRU cache

Below is my personal solution for the problem, while it was accepted, there is space to improve it and I may come back when I have another idea.

use std::{cell::RefCell, collections::HashMap, rc::Rc};

type List = Option<Rc<RefCell<ListNode>>>;

struct ListNode {
    key: i32,
    value: i32,
    prev: List,
    next: List,
}

struct LRUCache {
    head: List,
    tail: List,
    data: HashMap<i32, Rc<RefCell<ListNode>>>,
    capacity: usize,
}

impl LRUCache {
    fn new(capacity: i32) -> Self {
        Self {
            head: None,
            tail: None,
            data: HashMap::new(),
            capacity: capacity as usize,
        }
    }

    fn move_to_front(&mut self, node: Rc<RefCell<ListNode>>) {
        // early return if the node is already the head
        if let Some(head) = &self.head {
            if Rc::ptr_eq(head, &node) {
                return;
            }
        }

        let prev = node.borrow().prev.clone();
        let next = node.borrow().next.clone();

        match (prev.as_ref(), next.as_ref()) {
            (None, None) => {
                // insert a new node only modifies the head,
                // the rest of the list is untouched.
            }
            (Some(prev), None) => {
                // get the tail node
                prev.borrow_mut().next = None;
                self.tail = Some(Rc::clone(prev));
            }
            (None, Some(_)) => {
                // get the head node is already handled
                return;
            }
            (Some(prev), Some(next)) => {
                // get a node in the middle of a list,
                // connect its prev and next
                prev.borrow_mut().next = Some(Rc::clone(next));
                next.borrow_mut().prev = Some(Rc::clone(prev));
            }
        }

        // update node to be the new head
        if let Some(old_head) = &self.head {
            node.borrow_mut().next = Some(Rc::clone(old_head));
            old_head.borrow_mut().prev = Some(Rc::clone(&node));
        } else {
            // insert a node to the empty list
            self.tail = Some(Rc::clone(&node));
        }
        node.borrow_mut().prev = None;
        self.head = Some(node);
    }

    fn get(&mut self, key: i32) -> i32 {
        let mut result = -1;
        if let Some(node) = self.data.get(&key) {
            result = node.borrow().value;
            self.move_to_front(Rc::clone(node));
        }
        result
    }

    fn put(&mut self, key: i32, value: i32) {
        if let Some(node) = self.data.get(&key) {
            let node = Rc::clone(node);
            node.borrow_mut().value = value;
            self.move_to_front(node);
        } else {
            let new_node = Rc::new(RefCell::new(ListNode {
                key,
                value,
                prev: None,
                next: None,
            }));
            // evict the tail node
            if self.data.len() >= self.capacity {
                if let Some(tail) = self.tail.take() {
                    // cannot simply take the prev as it breaks the list?
                    let prev = tail.borrow().prev.clone();
                    self.data.remove(&tail.borrow().key);
                    if let Some(prev) = prev {
                        prev.borrow_mut().next = None;
                        self.tail = Some(prev);
                    } else {
                        self.head = None;
                        self.tail = None;
                    }
                }
            }
            self.data.insert(key, Rc::clone(&new_node));
            self.move_to_front(new_node);
        }
    }
}

Tags: rust program