04 Aug 2025
CS110L Notes
Notes for CS110L: Safety in Systems Programming. They mostly follow the Spring 2020 offering, with some material from Spring 2021.
The notes are incomplete; I only started taking them at Lecture 10 (2020).
1. Channel
- Golang: Do not communicate by sharing memory; instead, share memory by communicating.
- Message passing: each thread has its own memory; threads communicate only by exchanging messages.
- In theory this implies: no shared memory -> no data races, but each thread needs its own copy of the data -> expensive.
- In practice, goroutines share heap memory and only make shallow copies into channels -> passing pointers through a channel is dangerous!
- In Rust, passing a pointer type such as `Box` through a channel is always safe, because ownership is transferred to the receiver (see the sketch below).
- An ideal channel is MPMC (multi-producer, multi-consumer).
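A minimal sketch of this ownership transfer, using the standard library's mpsc channel: once the `Box` is sent, the sender can no longer use it, so there is no aliasing and no data race.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let data = Box::new(vec![1, 2, 3]);
        // Ownership of the heap allocation moves into the channel;
        // using `data` after this point would be a compile error.
        tx.send(data).unwrap();
    });
    // The receiver is now the sole owner of the allocation.
    let received = rx.recv().unwrap();
    println!("{:?}", received);
}
```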
1.1. Mutex-based channel implementation
```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Arbitrary number of worker threads for the demo.
const NUM_THREADS: usize = 10;

// Stand-in for the lecture's random-sleep helper: a fixed short sleep keeps
// this example self-contained without the rand crate.
fn rand_sleep() {
    thread::sleep(Duration::from_millis(100));
}

pub struct SemaPlusPlus<T> {
    queue_and_cv: Arc<(Mutex<VecDeque<T>>, Condvar)>,
}

impl<T> Clone for SemaPlusPlus<T> {
    fn clone(&self) -> Self {
        SemaPlusPlus { queue_and_cv: self.queue_and_cv.clone() }
    }
}

impl<T> SemaPlusPlus<T> {
    pub fn new() -> Self {
        SemaPlusPlus {
            queue_and_cv: Arc::new((Mutex::new(VecDeque::new()), Condvar::new())),
        }
    }

    pub fn send(&self, message: T) {
        let (queue_lock, cv) = &*self.queue_and_cv;
        let mut queue = queue_lock.lock().unwrap();
        let queue_was_empty = queue.is_empty();
        queue.push_back(message);
        // If the queue was not empty, an earlier send already notified and
        // main() must be awake, so there is no need to notify again.
        if queue_was_empty {
            cv.notify_all();
        }
    }

    pub fn recv(&self) -> T {
        let (queue_lock, cv) = &*self.queue_and_cv;
        // wait_while dereferences the guard and passes the protected data to
        // the closure; while the condition is true, it releases the lock and
        // sleeps until another thread calls notify_all.
        let mut queue = cv
            .wait_while(queue_lock.lock().unwrap(), |queue| queue.is_empty())
            .unwrap();
        queue.pop_front().unwrap()
    }
}

fn main() {
    let sem: SemaPlusPlus<String> = SemaPlusPlus::new();
    let mut handlers = Vec::new();
    for i in 0..NUM_THREADS {
        let sem_clone = sem.clone();
        let handle = thread::spawn(move || {
            rand_sleep();
            sem_clone.send(format!("thread {} just finished!", i))
        });
        handlers.push(handle);
    }
    // main is the only consumer of the queue: it wakes up when some thread
    // calls notify_all, locks the queue, consumes the message, and releases
    // the lock until it is notified again.
    for _ in 0..NUM_THREADS {
        println!("{}", sem.recv())
    }
    for handle in handlers {
        handle.join().unwrap();
    }
}
```
- A mutex-based channel is inefficient:
  - It does not allow the producer and the consumer to modify the queue simultaneously.
  - It involves system calls (to sleep and wake threads).
- Go channels are MPMC, but they essentially implement a `Mutex<VecDeque<T>>` using futexes.
1.2. Rust crossbeam channel implementation
- Minimal lock usage.
- Channels are not the best choice for shared global values; a lock is still required for those.
```rust
use std::io::BufRead;
use std::thread;

/// Factor numbers while receiving them from stdin
fn main() {
    let (sender, receiver) = crossbeam_channel::unbounded();
    let mut threads = Vec::new();
    for _ in 0..num_cpus::get() {
        // The channel maintains a reference count for senders and one for
        // receivers; receiver.clone() increments the receiver count.
        let receiver = receiver.clone();
        threads.push(thread::spawn(move || {
            while let Ok(next_num) = receiver.recv() {
                factor_number(next_num);
            }
        }));
    }
    let stdin = std::io::stdin();
    for line in stdin.lock().lines() {
        let num = line.unwrap().parse::<u32>().unwrap();
        sender
            .send(num)
            .expect("Cannot send to channel when there is no receiver");
    }
    // Dropping the sender decrements the sender count; when no senders
    // remain, the channel is closed and recv() returns Err.
    drop(sender);
    for thread in threads {
        thread.join().expect("panic");
    }
}

// Stand-in for the lecture's factoring helper (assumed): simple trial
// division that prints the prime factors of num.
fn factor_number(num: u32) {
    let mut n = num;
    let mut factors = Vec::new();
    let mut divisor = 2;
    while divisor * divisor <= n {
        while n % divisor == 0 {
            factors.push(divisor);
            n /= divisor;
        }
        divisor += 1;
    }
    if n > 1 {
        factors.push(n);
    }
    println!("{} = {:?}", num, factors);
}
```
2. Networking systems
2.1. System
2.1.1. File system
- Open file (OF) table: system-wide table, stores file position, access mode (read/write) and reference count.
- File descriptor (FD) table: process-specific table, maps file descriptor integers to references into the OF table; multiple FD entries can point to the same OF entry, e.g., parent and child processes share file positions (see the sketch after this list).
- Vnode table: system-wide table, stores metadata for each file (e.g., regular file, terminal, socket): size, permissions, timestamps, disk block locations; multiple OF entries can reference the same vnode entry, indicating the same file has been opened multiple times.
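A small illustration of the FD/OF split, assuming a file named example.txt with at least 8 bytes exists: `File::try_clone` duplicates the file descriptor (like `dup`), so both FD-table entries refer to the same OF entry and therefore share one file position.

```rust
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    // Both handles share a single open-file-table entry (and thus one offset).
    let mut f1 = File::open("example.txt")?;
    let mut f2 = f1.try_clone()?;

    let mut buf = [0u8; 4];
    f1.read_exact(&mut buf)?; // advances the shared file position by 4
    f2.read_exact(&mut buf)?; // continues at offset 4, not at offset 0
    Ok(())
}
```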
2.1.2. Start a server
- A process creates a socket file descriptor with a queue for connection requests.
- The process binds the socket with specific IP and port.
- The process listens and accepts connections in a loop (see the sketch below).
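A minimal sketch of these steps using std::net; the address and the echo behavior are made up for illustration. `TcpListener::bind` performs socket creation, binding, and listening in one call, and each iteration of the loop accepts one connection.

```rust
use std::io::{Read, Write};
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Create a socket, bind it to an IP and port, and start listening.
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    // Accept connections in a loop; each accept yields a new
    // client-specific socket.
    for stream in listener.incoming() {
        let mut stream = stream?;
        let mut buf = [0u8; 1024];
        let n = stream.read(&mut buf)?;
        stream.write_all(&buf[..n])?; // echo the bytes back
    }
    Ok(())
}
```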
2.1.3. Connect to a server (ignore networking details)
- The client creates a socket and sends the socket address in the SYN packet.
- The server accepts the request and starts a new client-specific socket to establish a bidirectional communication channel (see the client sketch below).
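The matching client side, sketched with `TcpStream` (address and payload are made up for illustration): `connect` performs the TCP handshake and returns a bidirectional stream.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // connect() creates the socket and performs the SYN/SYN-ACK/ACK handshake.
    let mut stream = TcpStream::connect("127.0.0.1:8080")?;
    stream.write_all(b"hello")?;
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    println!("server replied: {}", String::from_utf8_lossy(&buf[..n]));
    Ok(())
}
```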
2.2. Scalability and availability
- Internet traffic has grown faster than hardware has increased in power.
- Two machines with commodity performance are much cheaper than one machine with 2x performance.
- Need to distribute a system’s functionality over servers using networking.
- Scale out: separate stateless computation and data storage.
- Chaos engineering: intentionally introduce failures in production, e.g., randomly terminating an entire datacenter or even a region.
2.2.1. Load balancers
- Relay client requests to one compute node; health-check each compute node (a toy sketch appears after this list).
- Load balancing the load balancers:
  - Round-robin DNS: the DNS server returns multiple IPs for the same hostname.
    - Hard to consistently rotate IPs if the DNS responses are cached.
    - DNS has no knowledge of which IPs are healthy.
  - Geographic routing: DNS servers respond with the load balancer IP closest to the client.
    - If the closest data center is down, availability is still broken.
  - IP anycast: multiple computers announce they own the same IP (via BGP).
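A toy round-robin relay to make the idea concrete; the listen address and backend addresses are made up, and unlike a real load balancer it does not health-check backends.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical backend pool for illustration.
const BACKENDS: &[&str] = &["127.0.0.1:9001", "127.0.0.1:9002"];
static NEXT: AtomicUsize = AtomicUsize::new(0);

fn main() -> io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for client in listener.incoming() {
        let client = client?;
        // Pick the next backend in round-robin order.
        let idx = NEXT.fetch_add(1, Ordering::Relaxed) % BACKENDS.len();
        let backend = TcpStream::connect(BACKENDS[idx])?;
        // Relay bytes in both directions on separate threads.
        let (mut client_r, mut client_w) = (client.try_clone()?, client);
        let (mut backend_r, mut backend_w) = (backend.try_clone()?, backend);
        thread::spawn(move || {
            let _ = io::copy(&mut client_r, &mut backend_w);
        });
        thread::spawn(move || {
            let _ = io::copy(&mut backend_r, &mut client_w);
        });
    }
    Ok(())
}
```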
2.3. Security
- Authentication: who are you? Authorization: are you allowed?
- Prevent confidential data leakage: use a framework that handles every request and checks authentication/authorization; use type systems (see the sketch at the end of this section).
- Elasticsearch: distributed, open source search and analytics engine for all types of data; involved in multiple data breach incidents.
- Why: bad default settings, e.g., a default username and password, and accepting all connections by default.
- Design principle: it needs to be harder to do things wrong than to do things right.
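A minimal sketch of the "use type systems" idea: a handler that returns confidential data requires a value that can only be constructed by passing the authentication check, so forgetting the check becomes a compile error. All names here are hypothetical.

```rust
// Hypothetical example: the only way to construct an AuthenticatedUser
// is through login(), so any function that demands one is guaranteed
// (at compile time) to run after a successful credential check.
struct AuthenticatedUser {
    username: String,
}

impl AuthenticatedUser {
    fn login(username: &str, password: &str) -> Option<AuthenticatedUser> {
        if check_password(username, password) {
            Some(AuthenticatedUser { username: username.to_string() })
        } else {
            None
        }
    }
}

fn check_password(_username: &str, password: &str) -> bool {
    // Placeholder; a real system would verify a salted hash.
    !password.is_empty()
}

// Cannot be called without proof of authentication.
fn read_confidential_data(user: &AuthenticatedUser) -> String {
    format!("confidential report for {}", user.username)
}

fn main() {
    if let Some(user) = AuthenticatedUser::login("alice", "hunter2") {
        println!("{}", read_confidential_data(&user));
    }
}
```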