Chenyo's org-static-blog

Posts tagged "rust":

04 Aug 2025

CS110L Notes

Notes for CS110L Safety in Systems Programming. Mostly follow Spring 2020, also include Spring 2021.

Not complete, only started taking notes since Lec 10 (2020).

1. Channel

  • Golang: Do not communicate by sharing memory; instead, share memory by communicating.
  • Message passing: each thread has its own memory, they communicate only by message exchanging.
    • In theory, this implies: no shared memory -> no data races -> each thread needs to copy the data -> expensive
    • In practice, goroutines share heap memory and only make shallow copies into channels -> pass pointers to channel is dangerous!
  • In Rust, passing pointers, e.g., Box is always safe due to ownership transfer
  • An ideal channel: MCMC (multi-producer, multi-consumer).

1.1. Mutex-based channel implementation

pub struct SemaPlusPlus<T> {
    queue_and_cv: Arc<(Mutex<VecDeque<T>>, Condvar)>,
}

impl<T> SemaPlusPlus<T> {
    pub fn send(&self, message: T) {
        let (queue_lock, cv) = &*self.queue_and_cv;
        let mut queue = queue_lock.lock().unwrap();
        let queue_was_empty = queue.is_empty();
        queue.push_back(message);
        // if a queue is not empty, then another thread must have notified and main()
        // must be awake
        if queue_was_empty {
            cv.notify_all();
        }
    }

    pub fn recv(&self) -> T {
        let (queue_lock, cv) = &*self.queue_and_cv;
        // wait_while dereferences the guard and pass the protected data to the closure,
        // when the condition is true, it releases the lock and sleep until another thread
        // notify_all
        let mut queue = cv
            .wait_while(queue_lock.lock().unwrap(), |queue| queue.is_empty())
            .unwrap();
        queue.pop_front().unwrap()
    }
}

fn main() {
    let sem: SemaPlusPlus<String> = SemaPlusPlus::new();
    let mut handlers = Vec::new();
    for i in 0..NUM_THREADS {
        let sem_clone = sem.clone();
        let handle = thread::spawn(move || {
            rand_sleep();
            sem_clone.send(format!("thread {} just finished!", i))
        });
        handlers.push(handle);
    }
    // main is the only consumer of the queue,
    // it wakes up when some thread notify_all, locks the queue, consume the message
    // and release the lock until it is notified again
    for _ in 0..NUM_THREADS {
        println!("{}", sem.recv())
    }

    for handle in handlers {
        handle.join().unwrap();
    }
}
  • Mutex-based channel is inefficient:
    • it does not allow the queue producer and consumer modify the queue simultaneously.
    • It involves system calls.
  • Go channels are MCMC, but it essentially implement Mutex<VecDeque<>> with fuxtex.

1.2. Rust crossbeam channel implementation

  • Minimal lock usage.
  • Channels are not the best choice for global values -> lock still required.
/// Factor the number while receiving numbers from stdin
fn main() {
    let (sender, receiver) = crossbeam_channel::unbounded();
    let mut threads = Vec::new();
    for _ in 0..num_cpus::get() {
        // The channel maintains a reference counter for senders and for receivers
        // when receiver.clone(), the receiver counter increments.
        let receiver = receiver.clone();
        threads.push(thread::spawn(move || {
            while let Ok(next_num) = receiver.recv() {
                factor_number(next_num);
            }
        }));
    }

    let stdin = std::io::stdin();
    for line in stdin.lock().lines() {
        let num = line.unwrap().parse::<u32>().unwrap();
        sender.send(num).expect("Cannot send to channel when there is no receiver");
    }

    // decrement the sender counter, when no sender, the channel is closed
    drop(sender);

    for thread in threads {
        thread.join().expect("panic");
    }
}
Tags: study rust stanford
28 Dec 2024

Interesting Rust snippets

1. References

2. Reference and lifetime

let mut x;
x = 42;
let y = &x;
x = 43;
assert_eq!(*y, 42);

The above code will cause compiler error at x=43 since x has a shared reference which is still active after x=43. If we comment the assert_eq!, the code compiles.

let mut x = Vec::new();
let y = &mut x;
let z = &mut x;
y.push(42);
z.push(13);
assert_eq!(x, [42, 13]);

Similarly, the code above will cause compiler error at let z = &mut x since at this point there exist two mutable references for x. If we swap this line with y.push(42), the error is resolved because y is never used after let z = &mut x now.

The compiler accepts the following code. In *x = 84, the borrow checker takes out a mutable reference to x, while there is also a shared reference r = &x in the scope at the same time, since it is not accessed later, it does not create a conflict flow. Similarly, when println!("{z}"), the borrow checker walks back the path and finds no conflict until the r us created.

let mut x = Box::new(42);
let r = &x; // lifetime 'a starts
if rand() > 0.5 {
    // deref a Box returns a shared or a mutable reference
    *x = 84;
} else {
    println!("{z}");
}
// println!("{z}");  // this will cause the compiler error

The following code also compiles, even when the lifetime is intermittently invalid:

let mut x = Box::new(42);
let mut z = &x;  // liefttime 'a starts
for i in 0..100 {
    println!({"z"});  // lifetime 'a ends here
    x = Box::new(i);  // x is moved
    z = &x;  // restart lifetime 'a
}
println!("{z}");  // 'a ends here

In the following code, we have to use two generic lifetime annotations:

struct MutStr<'a, 'b> {
    _s: &'a mut &'b str
}

let mut s = "hello";  // creates a 'static lifetime to "hello"
*MutStr {_s: &mut s}._s = "world";  // creates a mutable reference to s
println!("{s}");  // creates a shared reference to s, now s is "world"

To println("{s}"), the borrow checker has to assume that the lifetime of the mutable reference to s 'a ends at the line before, and 'b is always 'static as a reference to a string slice. If we use 'a only, then the borrow checker tries to shorten 'static to 'a, but since the lifetime is behind &mut which is invariant (see “Rust for Rustaceans” Ch 1), the conversion will fail.

Lifetimes are required when there exists mutable references in the function parameters, even if there is no output reference. In the following code, we need 'contents to make sure the new inserted value lives for at least as long as the vector content.

fn insert_value<'vector, 'content>(my_vec: &'vector mut Vec<&'content i32>, value: &'content i32) {
    my_vec.push(value); // value needs to live for as long as the contents of my_vec, aka my_vec
}

fn main() {
    let x = 1;
    let my_vec = vec![&x];
    {
        let y = 2;
        insert_value(&mut my_vec, &y);
    }
    println!("{my_vec:?}");
}

It’s important to note that the single lifetime does not work, see the usage below:

fn insert_value<'a>(my_vec: &'a mut Vec<&'a i32>, value: &'a i32) {
    my_vec.push(value);
}
fn main() {
    let mut my_vec: Vec<&i32> = vec![];
    let val1 = 1;
    let val2 = 2;
    insert_value(&mut my_vec, &val1); // &val1 needs to live until the vector is dropped
                                      // so is &mut my_vec now
    insert_value(&mut my_vec, &val2); // the last &mut my_vec is still valid!
    println!("{my_vec:?}");
}

We need to explicitly annotate the lifetime in the structs and its implementation. In the following code, next_word uses a second lifetime annotation to decouple the lifetime of the mutable reference to WordIterator and the next world, otherwise we have to drop the previous next word before getting a new next word.

struct WordIterator<'s> {
    position: usize,
    string: &'s str,
}

// impl<'a> defines a lifetime
// WordIterator<'a> says the references in WordIterator must live for 'a
impl<'a> WordIterator<'a> {
    fn new(string: &'a str) -> WordIterator<'a> {
        WordIterator {
            position: 0,
            string,
        }
    }

    // Gives the next word, None if there is no word left
    fn next_word(&'b mut self) -> Option<&'a str> {
        todo!()
    }
}

We can also use lifetime bounds to specify 'a should outlives, i.e., live for at least as long as 'b with 'a: 'b:

fn f<'a, 'b>(x: &'a i32, mut y: &'b i32)
where
    'a: 'b,
{
    y = x; // &'a i32 is a subtype of &'b
    let r: &'b &'a i32 = &&0; // &'b is never dangling
}

3. Traits

3.1. Associated types

One should use associated types instead of generic type parameters in a trait if there should only exist one trait implementation for any type. For instance, for any given type, the Item type should be unambiguous even if type contains generic parameters.

/* standard trait
trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}
*/

#[derive(Default)]
struct MyVec<T> {  // we cannot directly implement Iterator for MyVec<T> as it does not store internal states.
    vec: Vec<T>
}

struct MyVecIter<'a, T> {
    vec: &'a Vec<T>,
    position: usize,
}

impl<'a, T> Iterator for MyVecIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        if self.position < self.vec.len() {
            let item = &self.vec[self.position];
            self.position += 1;
            Some(item)
        } else {
            None
        }
    }
}

impl<T> MyVec<T> {
    fn iter(&self) -> MyVecIter<'_, T> {  // this does not consume MyIter
        MyVecIter {
            vec: &self.vec,
            position: 0,
        }
    }
}

impl<'a, T> IntoIterator for &'a MyVec<T> {  // this consumes MyIter
    type Item = &'a T;
    type IntoIter = MyVecIter<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        self.iter()
    }
}

// usage
let my_vec = MyVec::<i32>::default();
let mut my_vec_iter = my_vec.into_iter();
my_vec_iter.next();

On the other hands, sometimes we want multiple trait implementations for the given type, e.g., convert a type to both String and i32, or when the type has different generic bounds:

impl From<i32> for MyType {}

impl From<String> for MyType {}

impl<T> MyTrait for Vec<T> where T: Display {}

impl<T> MyTrait for Vec<T> where T: Debug {}

3.2. Static/Dynamic dispatch

There are two ways to pass into a function a type that implements a trait: static dispatch and dynamic dispatch:

fn static_dispatch(item: impl MyTrait) {}  // or static_dispatch<T: MyTrait>(item: T) {}

fn dynamic_dispatch(item: &dyn MyTrait) {}

Static dispatch tells the compiler to make a copy of the function for each T and replace each generic parameter with the concrete type provided, which is called monomorphization. In this way the compiler can optimize the generic code just at non-generic code. To avoid generating duplicated machine code, one can declare a non-generic inner function inside the generic function.

When calling dynamic_dispatch, the caller passes a trait object, which is the combination of a type that implements the trait and a pointer to its virtual method table (vtable) which holds all trait method implementations of this type. Since dyn is not resolved during the compile time, it is !Sized and hence it needs to be a wide pointer (&dyn) holder, e.g., &mut, Box or Arc.

Only object-safe traits can allow dynamic dispatch. An object-safe trait must satisfy:

  1. All methods are object-safe:
    1. No generic parameters,
    2. No Self as return type, otherwise the memory layout cannot be determined during the compile time.
    3. No static methods, i.e., must take &self or &mut self, otherwise the vtable is not available.
  2. Trait cannot be Sized bound, otherwise it’s against the nature of dynamic dispatch.

Broadly speaking, use static dispatch in a library, and dynamic dispatch in a binary (no other users):

pub struct DynamicLogger {
    writer: Box<dyn Write>
}

pub struct StaticLogger<W: Write> {
    writer: W
}

// usage
let file = File::create("dynamic_log.txt")?;
let dynamic_logger = DynamicLogger { writer: Box::new(file) };  // Users are forced to use dynamic dispatch

// Both works for static dispatch
let file = File::create("static_log.txt")?;
let logger = Logger { writer: file };

let file: Box<dyn Write> = Box::new(File::create("static_log.txt")?);
let logger = Logger { writer: file };

3.3. Trait bounds

fn collect_to_map<T, S>(items: impl IntoIterator<Item = T>) -> HashMap<T, usize, S>
where
   HashMap<T, usize, S>: FromIterator<(T, usize)>
{
   items
       .into_iter()
       .enumerate()
       .map(|(i, item)| (item, i))
       .collect()
}

The trait above is better than explicitly writing T: Hash + Eq and S: BuildHasher + Default since they are implicitly required for HashMap to implement FromIterator.

impl Debug for AnyIterable
where
    // the reference to Self implements IntoIterator for any lifetime 'a
    for<'a> &'a Self: IntoIterator,
    // the item (associated type in Self) produced by iterating over the reference to Self implement Debug
    // We cannnot write <&'a Self>::Item because the associated type Item only exists in IntoIterator
    for<'a> <&'a Self as IntoIterator>::Item: Debug,
{
    // since fmt signature does not allow lifetime annotation,
    // we need to specify "the reference implements the trait for any lifetime"
    fn fmt(&self, f: &mut Formatter) -> Result<(), Error> {
        f.debug_list()  // formats items, e.g., [item1, item2]
            .entries(self) // iterates over self and debug-formatted each item
            .finish()
    }
}

The code above adds trait bounds for both outer type and the inner associated type. It implement Debug for any type that can be iterated over and whose elements are Debug.

3.4. Existential types

#![feature(impl_trait_in_assoc_type)] // require nightly

#[derive(Default)]
struct MyVec<T> {
    vec: Vec<T>,
}

impl<T> IntoIterator for MyVec<T> {
    type Item = T;
    type IntoIter = impl Iterator<Item = Self::Item>; // existential type

    fn into_iter(self) -> Self::IntoIter {
        let mut vec = self.vec;
        let position = 0;
        std::iter::from_fn(move || {
            if position < vec.len() {
                let item = vec.remove(position); // consume vec
                Some(item)
            } else {
                None
            }
        })
    }
}

fn main() {
    let mut my_vec = MyVec::<i32>::default().into_iter();
    println!("{:?}", my_vec.next());
}

The code above uses existential types in the associated types to avoid creating new type MyVecIter, hence perform zero-cost type erasure.

4. Linked-list

4.1. Leetcode 146: LRU cache

Below is my personal solution for the problem, while it was accepted, there is space to improve it and I may come back when I have another idea.

use std::{cell::RefCell, collections::HashMap, rc::Rc};

type List = Option<Rc<RefCell<ListNode>>>;

struct ListNode {
    key: i32,
    value: i32,
    prev: List,
    next: List,
}

struct LRUCache {
    head: List,
    tail: List,
    data: HashMap<i32, Rc<RefCell<ListNode>>>,
    capacity: usize,
}

impl LRUCache {
    fn new(capacity: i32) -> Self {
        Self {
            head: None,
            tail: None,
            data: HashMap::new(),
            capacity: capacity as usize,
        }
    }

    fn move_to_front(&mut self, node: Rc<RefCell<ListNode>>) {
        // early return if the node is already the head
        if let Some(head) = &self.head {
            if Rc::ptr_eq(head, &node) {
                return;
            }
        }

        let prev = node.borrow().prev.clone();
        let next = node.borrow().next.clone();

        match (prev.as_ref(), next.as_ref()) {
            (None, None) => {
                // insert a new node only modifies the head,
                // the rest of the list is untouched.
            }
            (Some(prev), None) => {
                // get the tail node
                prev.borrow_mut().next = None;
                self.tail = Some(Rc::clone(prev));
            }
            (None, Some(_)) => {
                // get the head node is already handled
                return;
            }
            (Some(prev), Some(next)) => {
                // get a node in the middle of a list,
                // connect its prev and next
                prev.borrow_mut().next = Some(Rc::clone(next));
                next.borrow_mut().prev = Some(Rc::clone(prev));
            }
        }

        // update node to be the new head
        if let Some(old_head) = &self.head {
            node.borrow_mut().next = Some(Rc::clone(old_head));
            old_head.borrow_mut().prev = Some(Rc::clone(&node));
        } else {
            // insert a node to the empty list
            self.tail = Some(Rc::clone(&node));
        }
        node.borrow_mut().prev = None;
        self.head = Some(node);
    }

    fn get(&mut self, key: i32) -> i32 {
        let mut result = -1;
        if let Some(node) = self.data.get(&key) {
            result = node.borrow().value;
            self.move_to_front(Rc::clone(node));
        }
        result
    }

    fn put(&mut self, key: i32, value: i32) {
        if let Some(node) = self.data.get(&key) {
            let node = Rc::clone(node);
            node.borrow_mut().value = value;
            self.move_to_front(node);
        } else {
            let new_node = Rc::new(RefCell::new(ListNode {
                key,
                value,
                prev: None,
                next: None,
            }));
            // evict the tail node
            if self.data.len() >= self.capacity {
                if let Some(tail) = self.tail.take() {
                    // cannot simply take the prev as it breaks the list?
                    let prev = tail.borrow().prev.clone();
                    self.data.remove(&tail.borrow().key);
                    if let Some(prev) = prev {
                        prev.borrow_mut().next = None;
                        self.tail = Some(prev);
                    } else {
                        self.head = None;
                        self.tail = None;
                    }
                }
            }
            self.data.insert(key, Rc::clone(&new_node));
            self.move_to_front(new_node);
        }
    }
}

Tags: rust program
Other posts