Chenyo's org-static-blog

13 Aug 2024

CMU 15-445 notes: Memory Management

This is a personal note on Lecture 6 of CMU 15-445, with some explanations from Claude.ai.

1. Goals

  • Manage DBMS memory and move data between memory and disk, so that the execution engine does not need to worry about fetching data itself.
  • Spatial control: keep relevant pages physically together on disk.
  • Temporal control: minimize the number of stalls to read data from the disk.

2. Locks & Latches

  • Both are used to protect the DBMS's internal elements.

2.1. Locks

  • A high-level logical primitive to allow for transaction atomicity.
  • Exposed to users when queries are run.
  • Need to be able to roll back changes.

2.2. Latches

  • A low-level protection primitive a DBMS uses in its internal data structures, e.g., hash tables.
  • Only held for the duration of the operation being performed, like a mutex.
  • Do not need to be exposed to users, and are not used to roll back changes (a minimal sketch follows this list).
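
A minimal C++ sketch of a latch guarding a hypothetical internal page table (the class and field names are made up for illustration): the latch is just a mutex held only while one operation runs, and is never visible to users.

    #include <mutex>
    #include <unordered_map>

    // Hypothetical internal structure: page id -> frame index.
    // The latch guards only this in-memory structure; it knows nothing
    // about transactions and is released as soon as the operation ends
    // (unlike a lock, which is held for a whole transaction).
    class PageTable {
     public:
      void Insert(int page_id, int frame_id) {
        std::lock_guard<std::mutex> guard(latch_);  // acquire latch
        table_[page_id] = frame_id;
      }                                             // latch released here

      bool Lookup(int page_id, int *frame_id) {
        std::lock_guard<std::mutex> guard(latch_);
        auto it = table_.find(page_id);
        if (it == table_.end()) return false;
        *frame_id = it->second;
        return true;
      }

     private:
      std::mutex latch_;  // low-level latch, never exposed to users
      std::unordered_map<int, int> table_;
    };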

3. Buffer pool

  • An in-memory cache of pages read from the disk.
  • The buffer pool’s region of memory is organized as an array of fixed-size pages; each array entry is called a frame.
  • When the DBMS requests a page, the buffer pool first searches its cache; if the page is not found, it copies the page from disk into one of its frames.
  • Dirty pages (i.e., modified pages) are buffered rather than written back to disk immediately.

3.1. Metadata

  • Page table: an in-memory hash table that maps page IDs to frame locations in the buffer pool.
  • Dirty flag: set by a thread whenever it modifies a page, to indicate that the page must be written back to disk.
  • Pin/Reference counter: tracks the number of threads currently accessing the page; the storage manager is not allowed to evict a page whose pin count is greater than 0 (see the sketch below).
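
A minimal sketch of this metadata in C++, assuming a 4 KiB page size; all names are hypothetical, and a real system would update the pin count atomically or under a latch.

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Per-frame metadata, assuming 4 KiB pages.
    struct Frame {
      int64_t page_id = -1;   // which page currently occupies this frame
      bool is_dirty = false;  // set when a thread modifies the page
      int pin_count = 0;      // # of threads accessing the page right now
                              // (atomic or latched in a real system)
      char data[4096];        // the cached page itself
    };

    struct BufferPool {
      std::vector<Frame> frames;                       // fixed-size frame array
      std::unordered_map<int64_t, size_t> page_table;  // page id -> frame index

      // A frame may only be evicted when no thread has it pinned.
      bool Evictable(size_t i) const { return frames[i].pin_count == 0; }
    };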

3.2. Memory allocation policies

  • The buffer pool decides when to allocate a frame for a page.
  • Global policies: decisions based on the overall workload, e.g., least recently used (LRU) or clock algorithm.
  • Local policies: decisions that apply to a specific query, e.g., priority-based page replacement.

4. Buffer pool optimizations

4.1. Multiple buffer pools

  • Each database or page type can have its own buffer pool and adopt local policies, which reduces latch contention.
  • To map a page to a buffer pool, the DBMS can hash object IDs or page IDs to pick a pool (sketched after this list).
    • Record ID: a unique identifier for a row in a table (cf. Tuple layout post).
    • Object ID: a unique identifier for an object, used to reference a user-defined type.
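
A minimal sketch of the mapping, assuming the DBMS simply hashes the page ID over a fixed number of pools (all names are hypothetical):

    #include <cstdint>
    #include <vector>

    struct BufferPool { /* frames, page table, latch ... */ };

    class BufferPoolManager {
     public:
      explicit BufferPoolManager(size_t n) : pools_(n) {}

      // Each page always maps to the same pool, so two threads touching
      // unrelated pages usually contend on different latches.
      BufferPool &PoolFor(uint64_t page_id) {
        return pools_[page_id % pools_.size()];
      }

     private:
      std::vector<BufferPool> pools_;
    };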

4.2. Pre-fetching

  • While the first set of pages is being processed, the DBMS can pre-fetch the second set into the buffer pool based on the dependency between pages (a sketch follows this list).
    • E.g., if pages are index-organized, the sibling pages can be pre-fetched.
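
A minimal sketch of sequential pre-fetching, assuming hypothetical ReadPageFromDisk/ProcessPage helpers: while page p is being processed, the read of page p + 1 is already in flight.

    #include <cstdint>
    #include <cstdio>
    #include <future>

    // Placeholder page I/O and processing (real versions would hit disk).
    void ReadPageFromDisk(uint64_t p) { std::printf("read %llu\n", (unsigned long long)p); }
    void ProcessPage(uint64_t p)      { std::printf("process %llu\n", (unsigned long long)p); }

    void SequentialScan(uint64_t first, uint64_t last) {
      for (uint64_t p = first; p <= last; ++p) {
        std::future<void> prefetch;
        if (p + 1 <= last) {
          // kick off the read of the next page in the background
          prefetch = std::async(std::launch::async, ReadPageFromDisk, p + 1);
        }
        ProcessPage(p);  // overlaps with the prefetch above
        if (prefetch.valid()) prefetch.wait();
      }
    }

    int main() { SequentialScan(0, 3); }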

4.3. Scan sharing

  • Query cursors can reuse retrieved data.
  • When a new query arrives while an earlier query is already scanning the table, the new query can attach its scanning cursor to the first query’s cursor.
  • The DBMS keeps track of where the second query joined, so that it can finish the pages it missed once the shared scan reaches the end (a rough sketch follows).
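
A rough sketch of the bookkeeping, assuming a single shared cursor over the table's pages: an attaching query remembers where it joined and wraps around to cover the pages it missed.

    #include <cstddef>
    #include <vector>

    struct SharedScan {
      size_t num_pages;   // pages in the table being scanned
      size_t cursor = 0;  // next page the shared scan will read
    };

    // The order in which an attaching query sees the pages: from the
    // join point to the end, then wrapping around to what it missed.
    std::vector<size_t> AttachOrder(const SharedScan &scan) {
      std::vector<size_t> order;
      for (size_t i = 0; i < scan.num_pages; ++i) {
        order.push_back((scan.cursor + i) % scan.num_pages);
      }
      return order;
    }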

4.4. Buffer pool bypass

  • Scanned pages are not stored in the buffer pool, which avoids the overhead of polluting it; the memory used is local to the running query.
  • Use cases: a query that reads a large sequence of contiguous pages, or temporary data such as sorting and join intermediates.

5. Buffer replacement policies

  • The DBMS decides which page to evict from the buffer pool to free up a frame.

5.1. Least recently used (LRU)

  • LRU maintains a timestamp of when each page was last accessed, and evicts the page with the oldest timestamp.
    • The pages can be kept in a queue ordered by timestamp, so the eviction victim is found without explicit sorting.
  • Susceptible to sequential flooding, where the buffer pool is polluted by a sequential scan whose pages are read only once.
    • Under LRU the oldest pages are evicted first, but in a repeated sequential scan those are exactly the pages most likely to be scanned again soon (a minimal sketch of LRU follows).
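
A minimal LRU sketch in C++: instead of raw timestamps, a doubly linked list keeps pages ordered from most- to least-recently used, with a hash map for O(1) lookup (class and method names are made up).

    #include <cstdint>
    #include <list>
    #include <unordered_map>

    class LruReplacer {
     public:
      // Record an access: move the page to the most-recently-used end.
      void Access(int64_t page_id) {
        auto it = pos_.find(page_id);
        if (it != pos_.end()) order_.erase(it->second);
        order_.push_front(page_id);
        pos_[page_id] = order_.begin();
      }

      // Evict the least-recently-used page; returns false if empty.
      bool Evict(int64_t *victim) {
        if (order_.empty()) return false;
        *victim = order_.back();
        pos_.erase(*victim);
        order_.pop_back();
        return true;
      }

     private:
      std::list<int64_t> order_;  // front = most recent, back = least recent
      std::unordered_map<int64_t, std::list<int64_t>::iterator> pos_;
    };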

5.2. Clock

  • An approximation of LRU that replaces the timestamp with a reference bit, which is set to 1 whenever the page is accessed.
  • A clock hand regularly sweeps over all pages in circular order: if a page's bit is 1, reset it to 0; if the bit is 0, evict the page (sketched below).
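
A minimal clock sketch over n frames, with one reference bit per frame (names are hypothetical):

    #include <cstddef>
    #include <vector>

    class ClockReplacer {
     public:
      explicit ClockReplacer(size_t n) : ref_(n, false) {}

      void Access(size_t frame) { ref_[frame] = true; }  // set bit on access

      // Sweep until a frame with bit 0 is found; frames with bit 1 get a
      // "second chance": their bit is cleared and the hand moves on.
      size_t Evict() {
        for (;;) {
          if (ref_[hand_]) {
            ref_[hand_] = false;                    // second chance
            hand_ = (hand_ + 1) % ref_.size();
          } else {
            size_t victim = hand_;
            hand_ = (hand_ + 1) % ref_.size();
            return victim;                          // bit is 0: evict
          }
        }
      }

     private:
      std::vector<bool> ref_;  // one reference bit per frame
      size_t hand_ = 0;        // current clock-hand position
    };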

5.3. LRU-K

  • Tracks the timestamps of each page's last K accesses to predict when it will next be accessed, which avoids the sequential-flooding issue: a single scan access does not make a page look hot (a sketch follows).
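
A minimal LRU-K sketch, assuming access times are supplied by the caller: each page keeps its last K access timestamps, and the victim is the page whose K-th most recent access is oldest (pages with fewer than K accesses are treated as infinitely far back and evicted first).

    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <unordered_map>

    class LruKReplacer {
     public:
      explicit LruKReplacer(size_t k) : k_(k) {}

      void Access(int64_t page_id, uint64_t now) {
        auto &h = history_[page_id];
        h.push_back(now);
        if (h.size() > k_) h.pop_front();  // keep only the last K accesses
      }

      bool Evict(int64_t *victim) {
        bool found = false;
        bool victim_has_k = true;
        uint64_t oldest_kth = 0;
        for (auto &[page, h] : history_) {
          bool has_k = (h.size() == k_);
          uint64_t kth = h.front();  // K-th most recent access (or oldest known)
          // Prefer pages without K accesses; among equals, the oldest K-th access.
          if (!found || (!has_k && victim_has_k) ||
              (has_k == victim_has_k && kth < oldest_kth)) {
            found = true;
            victim_has_k = has_k;
            oldest_kth = kth;
            *victim = page;
          }
        }
        if (found) history_.erase(*victim);
        return found;
      }

     private:
      size_t k_;
      std::unordered_map<int64_t, std::deque<uint64_t>> history_;
    };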

5.3.1. MySQL approximate LRU-K

  • Use a linked list with two entry points: “old” and “young”.
  • New pages are always inserted at the head of “old”.
  • If a page in “old” is accessed again, it is moved to the head of “young” (sketched below).
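
A rough sketch of the two-region idea using two separate lists (MySQL/InnoDB actually uses one list with a midpoint; this is a simplification). Insert is called when a page is first read in, Access on later touches.

    #include <cstdint>
    #include <list>

    class YoungOldList {
     public:
      // New pages enter the head of "old", so one-shot scan pages
      // never displace the hot "young" region.
      void Insert(int64_t page_id) { old_.push_front(page_id); }

      // A second access promotes the page to the head of "young".
      void Access(int64_t page_id) {
        old_.remove(page_id);
        young_.remove(page_id);
        young_.push_front(page_id);
      }

      // Evict from the tail of "old" first.
      bool Evict(int64_t *victim) {
        if (!old_.empty())   { *victim = old_.back();   old_.pop_back();   return true; }
        if (!young_.empty()) { *victim = young_.back(); young_.pop_back(); return true; }
        return false;
      }

     private:
      std::list<int64_t> young_;  // hot pages, accessed at least twice
      std::list<int64_t> old_;    // newly read pages, evicted first
    };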

5.4. Localization

  • Instead of using a global replacement policy, the DBMS makes eviction decisions on a per-query basis.
  • Pages brought in by one query are less likely to evict pages that are important for other ongoing queries.
  • The DBMS can predict more accurately which pages to keep or evict once a query completes, so the buffer pool is less polluted with pages of little future use.

5.5. Priority hint

  • Transactions tell the buffer pool whether pages are important based on the context of each page (e.g., the root page of an index should stay in memory).

5.6. Dirty pages

  • Two ways to handle dirty pages in the buffer pool:
    • Fast: only drop clean pages.
    • Slow: write back dirty pages to ensure the changes are persisted, and then evict them (if they will not be read again).
  • The DBMS can periodically walk through the page table and write back dirty pages in the background (a sketch follows).
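
A minimal sketch of such a background flusher, reusing the hypothetical Frame fields from section 3.1: dirty, unpinned pages are written back so that later evictions can take the fast path.

    #include <vector>

    struct Frame {
      bool is_dirty = false;
      int pin_count = 0;
    };

    // Hypothetical disk write; a real one would issue page I/O.
    void WriteBackToDisk(Frame &f) { f.is_dirty = false; }

    // Walk the frames and write back dirty pages that nobody is using,
    // so a later eviction can simply drop a clean page.
    void FlushDirtyPages(std::vector<Frame> &frames) {
      for (auto &f : frames) {
        if (f.is_dirty && f.pin_count == 0) {
          WriteBackToDisk(f);
        }
      }
    }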

6. Other memory pools

  • A DBMS also maintains other pools to store:
    • query caches,
    • logs,
    • temporary tables, e.g., for sorting and joins,
    • dictionary caches.

7. OS cache bypass

  • Most DBMSs bypass the OS page cache with direct I/O (e.g., opening files with O_DIRECT and calling fsync for durability, rather than relying on buffered fwrite) to avoid redundant page copies and to manage eviction policies more intelligently themselves (cf. Why not OS post). A minimal sketch follows.
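
A minimal Linux-specific sketch of direct I/O, assuming a hypothetical pages.db file: O_DIRECT bypasses the OS page cache (and requires block-aligned buffers), while fsync forces the write to stable storage.

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>

    int main() {
      // O_DIRECT: reads/writes bypass the OS page cache (Linux-specific).
      int fd = open("pages.db", O_RDWR | O_DIRECT);
      if (fd < 0) return 1;

      // O_DIRECT requires a block-aligned buffer; 4096 bytes assumed.
      void *page = nullptr;
      if (posix_memalign(&page, 4096, 4096) != 0) return 1;

      pread(fd, page, 4096, 0);   // read page 0 straight from disk
      // ... modify the page in our own buffer pool ...
      pwrite(fd, page, 4096, 0);  // write it back, again uncached
      fsync(fd);                  // force it to stable storage

      free(page);
      close(fd);
      return 0;
    }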

8. I/O scheduling

  • The DBMS maintains internal queues to track page reads and writes.
  • Priorities are determined by multiple factors, e.g., whether a task is on the critical path, or service-level agreements (SLAs). A sketch follows.
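
A minimal sketch of such a queue, assuming a plain integer priority where higher means more urgent:

    #include <cstdint>
    #include <queue>
    #include <vector>

    // An internal I/O request: critical-path reads would carry a higher
    // priority than, say, background flushes (convention assumed here).
    struct IoRequest {
      int priority;
      bool is_write;
      int64_t page_id;
    };

    struct ByPriority {
      bool operator()(const IoRequest &a, const IoRequest &b) const {
        return a.priority < b.priority;  // max-heap: highest priority first
      }
    };

    using IoQueue = std::priority_queue<IoRequest, std::vector<IoRequest>, ByPriority>;

    // Usage: queue.push({/*priority=*/10, /*is_write=*/false, /*page_id=*/42});
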
Tags: study database cmu