Chenyo's Blog

Build a free Telegram sticker tag bot

Sun, 08 Sep 2024 17:01:00 +0200

1. What happened

When I started to use Telegram, I really missed being able to search stickers by text, like I can in WeChat. Sadly, Telegram only lets you search emojis by text, or find stickers using emojis.

After digging around, I discovered the easiest way to add this cool feature was through Telegram bots. The idea? Store a tag-to-sticker map in the bot, then fetch all matching stickers when given a tag in the bot’s inline mode. There are a few sticker tag bots on Telegram already, but they’re either dead or can’t handle Unicode input like Chinese characters. Plus, I’m not super keen on trusting my data to someone else’s database. Moreover, I might want to use a bot for other personal stuff later.

So, I decided to build my own Telegram bot.

2. What do I need

My goal was to create a bot using only free services, including cloud storage for key-value pairs and a hosting platform to keep the bot running.

I came across Render through a recommendation on X. Its free tier offers 750 hours of deployment per month (a little over 31 days, so enough to keep one service running all month), so I deployed my bot there once I got it running locally. But then I found out that Render’s free tier doesn’t offer persistent storage and shuts down services after 15 minutes of inactivity.

A sticker tag bot without memory isn’t much use to anyone, so I went hunting for free cloud storage. With some help from Claude and Perplexity, I discovered Firebase Realtime Database, which offers 1GB of storage and 10GB of throughput per month on its free tier.

Even with cloud storage, a bot that conks out every 15 minutes just won’t cut it - I need my stickers now! So my next quest was finding a way to keep the bot awake, which led me to UptimeRobot. It’s actually a web monitoring tool, but I can use it to ping the bot regularly, tricking it into staying “active” instead of dozing off.

So, to sum it up, building this sticker tag bot required:

a Telegram bot from BotFather,
a free Render deployment,
Firebase’s free key-value storage, and
UptimeRobot’s free monitoring service.

However, these services do not work together automatically. Gluing them together required additional effort.

3. How to build a bot

The first step in building any bot is asking BotFather for a new bot and keeping the bot token secure. Telegram offers a helpful tutorial that explains the process using Java. Examples in other languages can be found in their GitLab repo. In my opinion, the most challenging part here is coming up with a unique bot username that is still available.

The next step involves working with the Telegram Bot API in the chosen programming language, in particular learning how to handle incoming messages. For example, I used tgbotapi (Golang).

3.1. How to build a sticker tag bot

A sticker tag bot needs two main functionalities:

Handle direct messages to add new sticker tags.
Handle inline queries to search for stickers using a given tag.

To implement the first functionality, I created a simple state machine with two states:

The initial state waits for a sticker input and then moves to the tag state.
The tag state waits for any text input to use as the tag for the previous sticker.
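The two states above can be sketched as follows. This is a minimal illustration in Python rather than the Go used for the bot; the class name, method names, reply strings and the in-memory `store` dictionary are all hypothetical stand-ins, not the bot's actual code.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_STICKER = auto()  # initial state: expecting a sticker
    WAIT_TAG = auto()      # tag state: expecting text for the last sticker


class TagSession:
    """Tracks one user's progress through the two states."""

    def __init__(self, store):
        self.state = State.WAIT_STICKER
        self.pending_sticker = None
        self.store = store  # hypothetical map: tag -> list of sticker file IDs

    def on_sticker(self, file_id):
        # A sticker always (re)starts the flow and moves to the tag state.
        self.pending_sticker = file_id
        self.state = State.WAIT_TAG
        return "Now send a tag for this sticker."

    def on_text(self, text):
        if self.state is not State.WAIT_TAG:
            return "Please send a sticker first."
        # Any text is accepted as the tag, including Unicode such as Chinese.
        self.store.setdefault(text, []).append(self.pending_sticker)
        self.pending_sticker = None
        self.state = State.WAIT_STICKER
        return f"Saved tag {text!r}."


store = {}
session = TagSession(store)
session.on_sticker("file-id-1")
session.on_text("猫")
print(store)
```

In a real bot one such session would be kept per chat or per user, so that two people tagging at the same time do not overwrite each other's pending sticker.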

To implement the second functionality, one needs to answer each inline query with a list of InlineQueryResultCachedSticker results, one per matching sticker.

For local testing, one can use a lightweight local key-value store to save and search sticker tags. I used BadgerDB (Golang), for example.

I noticed that the generated file ID for the same sticker is different each time, making it hard to check for duplicates when adding new tags. To address this, I added a /delete command to remove tags when needed.

3.2. How to make the bot private

I couldn’t find an official way to make a bot visible only to me. At Claude’s suggestion, I predefined a list of authorized users and check every sender against it before handling any message.
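A minimal sketch of such an allowlist check, in Python and working on an update already decoded into a dictionary. The `message`, `inline_query`, `from` and `id` fields follow the Bot API's update layout; the user ID, function names and return values are placeholders I made up for illustration.

```python
# Placeholder ID; a real deployment would load this from configuration.
AUTHORIZED_USER_IDS = {111111111}


def is_authorized(update: dict) -> bool:
    """Return True only if the sender of the update is on the allowlist."""
    # Both direct messages and inline queries carry the sender under "from".
    payload = update.get("message") or update.get("inline_query") or {}
    sender = payload.get("from") or {}
    return sender.get("id") in AUTHORIZED_USER_IDS


def handle_update(update: dict):
    if not is_authorized(update):
        return None  # silently ignore strangers
    return "handled"


print(handle_update({"message": {"from": {"id": 111111111}, "text": "hi"}}))
```

Running the check before any other handler keeps both the tagging flow and the inline search closed to unknown users.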

4. How to deploy the bot on Render

Deploying a service on Render with a free account is challenging due to the lack of shell access, the lack of disk access, and the not-so-live logs. Making everything work at this stage was time-consuming. I even contacted Render’s technical support, although they only responded “Ok I will close the ticket” after I had resolved the issue myself.

Three main steps are required here:

start an HTTP server with several endpoints at the bot,
configure the web service and environment variables on Render’s dashboard, and
configure the Telegram webhook at the bot.

In step 1, starting an HTTP server at 0.0.0.0: is necessary. One should also enable GET methods for the root and a health check endpoint to allow Render to scan the service regularly.
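As a rough sketch of step 1 (in Python, not the Go used for the bot), the server below answers GET and HEAD requests on a root path and a health check path. The `/health` path, the response bodies and the function names are assumptions chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical endpoints: the root and a health check path.
ROUTES = {"/": b"bot is running\n", "/health": b"ok\n"}


def route(path):
    """Map a request path to (status code, body)."""
    if path in ROUTES:
        return 200, ROUTES[path]
    return 404, b""


class Handler(BaseHTTPRequestHandler):
    def _reply(self, send_body):
        status, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if send_body:
            self.wfile.write(body)

    def do_GET(self):
        self._reply(send_body=True)

    def do_HEAD(self):
        # HEAD returns the same headers as GET but no body.
        self._reply(send_body=False)


def serve(port):
    # Bind to all interfaces so the platform can reach the service.
    ThreadingHTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve(port)` blocks and serves requests until the process stops; the port would come from whatever the hosting platform's configuration provides.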

In step 2, one needs to fill in the service configuration and environment variables in different boxes. This includes settings such as the port, build command, and health check endpoint. The issue I encountered was that Render could not detect any open port even though I had triple-checked that everything worked fine locally. In the end, I solved this issue by adding the Golang build tag -tags netgo to the build command. This flag was actually configured by default, but I had initially replaced it with a simpler build command.

In step 3, one needs to register the webhook with the Bot API and enable the POST method for the webhook path on the HTTP server (the Bot API library can also handle this). The webhook can be https: //.onrender.com/ (or another unique URL). This URL tells Telegram where to deliver all updates for the bot.

5. How to connect to Firebase

The Firebase Realtime Database stores key-value pairs in JSON format. Connecting the bot to the database requires the following steps:

Create the app and the database on Firebase’s dashboard. Specifically, one needs to store the following 3 values for interaction:
- The database URL, which looks like https: //-default-rtdb.*.firebasedatabase.app.
- The credential file, which can be downloaded at Project settings->Service accounts->Firebase Admin SDK (and should also be added to Render).
Import the language-specific Firebase API to configure the database in the bot. For example, I use firebase (Golang).
Update the database rules in the Firebase dashboard to only allow authorized writes under a specific path, e.g., the one name/path that holds those key-value pairs.

It’s worth noting that connecting to the database on Render may take some time after a fresh start. During this initialization period, the log may display a 502 Bad Gateway error to the database.

6. How to configure UptimeRobot

Before configuring UptimeRobot, I tried to make the bot ping itself, but this approach did not keep the service awake on Render.

Using UptimeRobot to maintain the bot’s active status involves two primary steps:

Enable the HEAD method (the only method available for a free account) for any endpoint on the HTTP server.
Configure an HTTP(S) monitor for that endpoint, which appears as .onrender.com/, and set the monitoring interval to less than 15 minutes.

7. Conclusion

This post isn’t meant to be a step-by-step guide for building a Telegram bot. It skips some steps and doesn’t include screenshots. But don’t worry, most of the missing bits can be figured out using AI language models these days. The rest really depends on each specific situation. The main point here is to show how to set up a free small web service, even when there’s no single platform that does it all.

When I first wrote this, my bot had been up and running for 10 days. It only had 30 minutes of downtime, which I think happened because UptimeRobot couldn’t reach Render’s IP address during that time.

Right now, the repository is private since I plan to add a second functionality to the bot soon.

Install Doom Emacs with Lisp native compilation in WSL2

Sat, 07 Sep 2024 12:38:00 +0200

Today I installed Doom Emacs for my husband on his WSL2. Although the entire process was guided by Claude, there was some back-and-forth during the interaction. Therefore I would like to record the full sequence of commands I used here for future reference.

1. Assumptions

This installation guide assumes a fresh installation of WSL2 Ubuntu 22.04 on Windows 11 in September 2024.

2. Install prerequisite packages

According to the Doom Emacs documentation, the following packages are recommended:

Git 2.23+: this is already installed by default.
Emacs 29.4 with Lisp native compilation: this is finicky and is explained later.
ripgrep 14.0+: the documentation says 11.0+ suffices, but doom doctor still complains that the latest version installed from apt (13.0) is not sufficient, so we need to install it from its GitHub release package later.
GNU find: also installed already.
fd: sudo apt install fd-find suffices.

3. Install Emacs 29.4

3.1. Before building

First, let’s install some build dependencies:

sudo apt update
sudo apt upgrade  # update packages
sudo apt install build-essential libjansson4 libjansson-dev \
    libgnutls28-dev libtree-sitter-dev libsqlite3-dev
sudo apt install texinfo  # to generate documentation

build-essential should install the tools needed to build C/C++ programs, such as gcc, g++, make and dpkg-dev. The remaining packages install pre-compiled libraries.

Besides these packages, there are two important packages to support Lisp native compilation:

sudo apt install libgccjit0 libgccjit-11-dev  # 11 is my gcc version

After installing them, make sure to export the library path in the current session; otherwise the build may not find libgccjit.

export LD_LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/11:$LD_LIBRARY_PATH

The last thing to do is install a bunch of X and GTK-3 development libraries for Emacs GUI, and another bunch of image processing libraries.

sudo apt install libx11-dev libtiff-dev libgtk-3-dev libncurses-dev
sudo apt install libtiff5-dev libgif-dev libjpeg-dev libpng-dev libxpm-dev

Without the above packages, one may encounter the following error when configuring the Emacs build:

You seem to be running X, but no X development libraries were found. You should install the relevant development files for X and for the toolkit you want, such as Gtk+ or Motif. Also make sure you have development files for image handling, i.e. tiff, gif, jpeg, png and xpm.

3.2. Build Emacs 29.4 with native-comp

At this moment, we can start to download Emacs source code:

wget https://ftp.gnu.org/gnu/emacs/emacs-29.4.tar.xz
tar xvf emacs-29.4.tar.xz
cd emacs-29.4

Then we can configure the build (i.e., generate Makefile) with the following command.

./configure --with-native-compilation --with-x-toolkit=gtk3 --without-pop

--with-native-compilation: with this flag the Emacs source code is compiled to native machine code to achieve better performance.
- Otherwise it is compiled to bytecode and then interpreted by Emacs virtual machine during runtime.
--with-x-toolkit=gtk3: this is recommended by Claude.
--without-pop: if we are not using Emacs as an email client, we don’t need to bother configuring the protocol.

If everything goes well, one should see the following line in the output. If not, make sure libgccjit has been installed and exported.

Does Emacs have native lisp compiler? yes

Now we can finally start compiling the Emacs:

make -j$(nproc)

If an error occurs, we may want to start the build again. To do this:

sudo apt install autoconf automake
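rm -f Makefile
./autogen.sh  # regenerate the configure script
# re-run configure with the same flags as before, then rebuild
./configure --with-native-compilation --with-x-toolkit=gtk3 --without-pop
make -j$(nproc)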

Finally, install Emacs globally:

sudo make install

To confirm the Emacs indeed used the native Lisp compiler, one can evaluate inside the vanilla Emacs with M-: (M is Alt in WSL2):

(native-comp-available-p) ;; should return t

Congratulations! You have now installed the latest and fastest Emacs on WSL2.

4. Install `ripgrep`

As mentioned in the ripgrep documentation, Debian/Ubuntu users can install the latest ripgrep 14.0+ with the following commands.

curl -LO https://github.com/BurntSushi/ripgrep/releases/download/14.1.0/ripgrep_14.1.0-1_amd64.deb  # check the latest version on its documentation
sudo dpkg -i ripgrep_14.1.0-1_amd64.deb  # dpkg has been installed before

Instead, if one installs it with apt, a 13.0+ version is installed and running doom doctor later returns the warning:

The installed ripgrep binary was not built with support for PCRE lookaheads.

5. Install Doom Emacs

Installing Doom Emacs is straightforward, but before that, one should first remove the default Emacs configuration folder:

rm -rf ~/.emacs.d

Then, clone and install Doom Emacs. This could take a while.

git clone --depth 1 https://github.com/doomemacs/doomemacs ~/.config/emacs
~/.config/emacs/bin/doom install

Don’t forget to export ~/.config/emacs/bin to PATH:

echo 'export PATH="$HOME/.config/emacs/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Now one can run doom doctor to check any missing dependencies, e.g., shellcheck. One common issue is the Nerd font is not installed by default so that some icons are not properly displayed. To fix that, run M-x nerd-icons-install-font inside the Emacs, then update the font cache with:

fc-cache -fv
# fc-list | grep Nerd  # to verify the font is installed

6. Some issues with running Emacs in WSL2

The first issue is that I cannot reload the configuration with M-x doom/reload. Running this command always gives me the following error message, so I need to restart Emacs every time the configuration changes.

%s sync -B -e /bin/bash: line 1: fg: no job control

The second issue is that I really dislike the white border that surrounds any application launched by WSL!

Hiked: Stoos

Fri, 06 Sep 2024 21:32:00 +0200

1. What happened?

About three weeks ago, I hiked Stoos and wrote a Redbook post. Unfortunately, the post was only visible to me, apparently violating community guidelines.

I spent most of that day trying to identify the problematic content, repeatedly editing and reposting. Later, I learned this behavior is discouraged on Redbook, i.e., I am not really free to edit my own content.

Frustrated, I abandoned my efforts, leaving my Redbook homepage cluttered with articles like “10 Behaviors That Harm Your Redbook Account” and “100 Forbidden Words on Redbook”.

Though brief, this experience instilled in me a tendency towards self-censorship, a feeling I deeply resent. I also noticed the lack of competitive alternatives to this centralized platform where users share authentic life experiences through multimedia.

Well played, Redbook!

2. What now?

Anyway, given that I invested considerable effort in creating post figures, I will post them here instead.

Figure 1: The Stoos roadmap

Figure 2: The uphill view

Figure 3: The top view

Figure 4: The ridge roadmap

Figure 5: The ridge view

Figure 6: My trace

Figure 7: Recommended apps

Figure 8: My favorite hike shoots

Figure 9: The Redbook post

CMU 15-445 notes: Hash Tables

Thu, 05 Sep 2024 12:26:00 +0200

This is a personal note for the CMU 15-445 L7 notes as well as some explanation from Claude.ai.

1. DBMS data structure application

Internal meta-data: page tables, page directories.
Tuple storage on disk.
Table indices: easy to find specific tuples.

1.1. Design decisions

Data layout for efficient access.
Concurrent access to data structures.

2. Hash tables

Implements an associative array that maps keys to values.
On average $O(1)$ operation complexity with the worst case $O(n)$; $O(n)$ storage complexity.
- Optimization for constant complexity is important in real world.

2.1. Where are hash tables used

For tuple indexing. While tuples are stored on pages with NSM or DSM, during a query the DBMS needs to quickly locate the page that stores specific tuples. It can achieve this with separately-stored hash tables, where each key can be a hash of a tuple id, and the value points to the location.

3. Hash function

Maps a large key space into a smaller domain.
Takes in any key as input, and returns a deterministic integer representation.
Needs to consider the trade-off between fast execution and collision rate.
- Does not need to be cryptographically secure.
- The state-of-the-art (Fall 2023) hash function is XXHash3.

4. Hashing scheme

Handles key collision after hashing.
Needs to consider the trade-off between large table allocation (to avoid collision) and additional instruction execution when a collision occurs.

5. Static hashing scheme

The hash table size is fixed; the DBMS has to rebuild a larger hash table (e.g., twice the size) from scratch when it runs out of space.

5.1. Linear probe hashing

Insertion: when a collision occurs, linearly search the adjacent slots in a circular buffer until an open one is found.
Lookup: search linearly from the first hashed slot until the desired entry or an empty slot is reached, or every slot has been visited.
- Requires storing both the key and the value in the slot.
Deletion: simply emptying the slot breaks future lookups, since a lookup stops at the first empty slot; two solutions:
- Replace the deleted entry with a dummy entry (a tombstone) to tell future lookups to keep scanning.
- Shift the adjacent entries which were originally shifted, i.e., those that were originally hashed to the same slot.
  - Very expensive and rarely implemented in practice.
The state-of-the-art linear probe hashing implementation is Google’s absl::flat_hash_map.
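The insertion, lookup and tombstone-based deletion described above can be sketched as follows. This is a toy fixed-size table in Python for illustration only; the class and sentinel names are my own, and a real DBMS would manage slots inside pages rather than a Python list.

```python
EMPTY = object()      # slot has never held an entry
TOMBSTONE = object()  # slot held an entry that was deleted


class LinearProbeTable:
    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        # Visit every slot once, starting at the hashed slot and wrapping around.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break  # the key cannot be stored beyond an empty slot
            if slot is TOMBSTONE:
                if first_free is None:
                    first_free = i  # reusable, but keep scanning for the key
                continue
            if slot[0] == key:
                self.slots[i] = (key, value)  # overwrite an existing key
                return
        if first_free is None:
            raise RuntimeError("table full: a static scheme must rebuild")
        self.slots[first_free] = (key, value)

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return default  # an empty slot ends the search
            if slot is not TOMBSTONE and slot[0] == key:
                return slot[1]
        return default

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return False
            if slot is not TOMBSTONE and slot[0] == key:
                self.slots[i] = TOMBSTONE  # keep later lookups scanning
                return True
        return False


table = LinearProbeTable(capacity=8)
for key, value in [(1, "a"), (9, "b"), (17, "c")]:  # all hash to slot 1
    table.insert(key, value)
table.delete(9)
print(table.get(17))
```

Keys 1, 9 and 17 all land on slot 1 in a table of 8 slots, so deleting 9 would cut 17 off from its home slot if the slot were simply emptied.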

5.1.1. Non-unique keys

The same key may be associated with multiple different values.
Separate linked list: each value is a pointer to a linked list of all values, which may overflow to multiple pages.
Redundant keys: store the same key multiple times with different values.
- A more common approach.
- Linear probing still works, if it is fine to get one value.

5.1.2. Optimization

Specialized hash table based on key types and size; e.g., for long string keys, one can store the pointer or the hash of the long string in the hash table.
Store metadata in a separate array, e.g., store empty slot in a bitmap to avoid looking up deleted keys.
Version control of hash tables: invalidate entries by incrementing the version counter rather than explicitly marking the deletion.

5.2. Cuckoo hashing

Maintains multiple hash tables with different hash functions to generate different hashes for the same key using different seeds.
Insertion: check each table and choose one with a free slot; if no table has a free slot, choose and evict an old entry, then re-insert the evicted entry into another table.
- If a rare cycle happens, rebuild all hash tables with new seeds or with larger tables.
$O(1)$ lookups and deletion (also needs to store keys), but insertion is more expensive.
Practical implementation maps a key to different slots in a single hash table.
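A small sketch of the scheme with two tables, assuming seeded variants of Python's built-in `hash` as the two hash functions. The kick limit, the doubling on rebuild and all names are illustrative choices, not something prescribed by the lecture.

```python
import random


class CuckooTable:
    def __init__(self, capacity=8, max_kicks=16, seed=0):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self.seeds = [self.rng.getrandbits(32) for _ in range(2)]
        self.tables = [[None] * capacity for _ in range(2)]

    def _slot(self, which, key):
        # Same key, different seed: a different slot in each table.
        return hash((self.seeds[which], key)) % self.capacity

    def get(self, key, default=None):
        for which in range(2):  # at most two probes, hence O(1)
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def delete(self, key):
        for which in range(2):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = None
                return True
        return False

    def insert(self, key, value):
        for which in range(2):  # update in place if the key exists
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        for which in range(2):  # otherwise take any free candidate slot
            i = self._slot(which, key)
            if self.tables[which][i] is None:
                self.tables[which][i] = (key, value)
                return
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):  # evict and relocate
            i = self._slot(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            which = 1 - which  # the evicted entry goes to the other table
        self._rebuild(entry)  # likely a cycle: start over

    def _rebuild(self, pending):
        entries = [e for t in self.tables for e in t if e is not None]
        entries.append(pending)
        self.capacity *= 2  # larger tables and new seeds
        self.seeds = [self.rng.getrandbits(32) for _ in range(2)]
        self.tables = [[None] * self.capacity for _ in range(2)]
        for k, v in entries:
            self.insert(k, v)


table = CuckooTable()
for k in range(100):
    table.insert(k, k * k)
print(table.get(7))
```

Lookups and deletions only ever touch one slot per table, which is where the constant-time bound comes from; all the extra work is pushed into insertion.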

6. Dynamic hashing schemes

Resize the hash table on demand without rebuilding the entire table.

6.1. Chained Hashing

Maintains a linked list of buckets for each slot in the hash table; keys hashed to the same slot are inserted into the linked list.
Lookup: hash to the key’s bucket and scan for it.
Optimization: store a bloom filter in the bucket pointer list to tell whether a key exists in the linked list.

6.2. Extendible hashing

Improve chained hashing to avoid letting chains grow forever.
Allow multiple slot locations in the hash table to point to the same chain.

Figure 1: Extendible hashing example

6.3. Linear hashing

Maintains a split pointer to keep track of next bucket to split, even if the pointed bucket is not overflowed.

Figure 2: Linear hashing example

There are always only 2 hash functions: $(key\ mod\ n)$ and $(key\ mod\ 2n)$, where $n$ is the number of buckets when the split pointer is at index 0 (i.e., the number of buckets at any time is $n + index(sp)$).

Figure 3: Linear hashing deletion example

Why does $k\ mod\ 2n < n + sp$ hold?
- A key is only taken mod $2n$ if the bucket at $(k\ mod\ n)$ lies above the split pointer, i.e., it has already been split and $0 \leq k\ mod\ n < sp$.
- Let $r = k\ mod\ n$, then $k = pn + r$ and $0 \leq r < sp$.
- Let $r' = k\ mod\ 2n$, then $k = q(2n) + r'$.
- If $p = 2m$, then we also have $k = m(2n) + r = q(2n) + r'$, in this case $0 \leq r = r' < sp$.
- If $p = 2m + 1$, then we have $k = m(2n) + r + n = q(2n) + r'$, in this case $n \leq r' = n + r < n + sp$.
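The argument above can also be checked numerically. The sketch below assumes integer keys and uses a hypothetical helper `bucket_index` that applies the two hash functions exactly as described.

```python
def bucket_index(key, n, sp):
    """Bucket for `key` with n base buckets and split pointer sp (0 <= sp < n)."""
    i = key % n
    if i < sp:
        # Bucket i has already been split in this round: use the second function.
        i = key % (2 * n)
    return i


# Exhaustively check the bound k mod 2n < n + sp for small n and every sp.
for n in range(1, 9):
    for sp in range(n):
        assert all(bucket_index(k, n, sp) < n + sp for k in range(10 * n))

print(bucket_index(5, 4, 2), bucket_index(7, 4, 2), bucket_index(9, 4, 2))
```

With $n = 4$ and $sp = 2$, key 5 falls into the new bucket 5, key 7 stays in the unsplit bucket 3, and key 9 stays in bucket 1 after the split.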

My blog search function

Thu, 29 Aug 2024 13:59:00 +0200

1. What happened

Two months ago, I started to use the current blog to keep track of my study and personal notes.

This blog uses bastibe/org-static-blog, which is a convenient Emacs package to publish a static blog from Org-mode files.

When I first started, this blog had no styling at all. With the help of Claude, I gradually decorated it with the theme and fonts of my choice.

During this process, my husband made a feature request: a blog search function.

2. What is not enough with existing search functions?

Most default search tools provided by popular static website frameworks do not return complete search results and their context.

For instance, I am learning database design. Assume I cannot remember what a “slotted page” is but I know I have noted it before. If I enter “slotted” in the search box of a Jekyll-based blog, it will provide me with 2 post links implying these posts contain this keyword. In the best case, it also highlights the first occurrence of “slotted” in each post with its context.

This does not really meet my needs. I would like to see all notes I have made for “slotted” to better refresh my memory, but the current results force me to click each post and perform a local search myself to achieve this goal.

I understand that modern search tools consider scalability and intelligence. However, for a personal blog, we can achieve greater breadth.

3. What does the search function do in this blog?

Figure 1: A blog search example

The current search function searches for all occurrences of the given search input and highlights each occurrence in an item surrounded by its context, e.g., 50 characters before and after the input. If multiple occurrences are close to each other, they are displayed once in the same item. Clicking each item opens a new tab and jumps to the closest header before the input.

In this way, I can immediately see all results and their context on one page, and I only click an item when the short context is not sufficient.

4. How is it implemented?

I implemented the search function along with all styling with the help of Cursor. The full logic can be found in search.js.

In short, every time I build the blog, a list of all post links is stored in post-list.json. When the page is loaded, for each post link, the blog caches the full post content and all header indices. When a search input is detected, the search function iterates through each post to find each occurrence and the closest header. It then wraps the occurrence with certain context and prepares the link by appending the header tag to the post link.
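The core of that logic might look like the sketch below. It is written in Python rather than the JavaScript of search.js, and it is not the actual implementation: the function names, the 50-character default and the `(position, anchor)` header representation are assumptions based on the description above.

```python
import re
from bisect import bisect_right


def find_snippets(text, query, context=50):
    """Return (snippet, first_match_position) pairs, merging nearby matches."""
    if not query:
        return []
    spans = []  # each item: [start, end, position of the first match]
    for match in re.finditer(re.escape(query), text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        if spans and start <= spans[-1][1]:
            spans[-1][1] = end  # overlaps the previous item: extend it
        else:
            spans.append([start, end, match.start()])
    return [(text[s:e], pos) for s, e, pos in spans]


def closest_header_before(headers, position):
    """headers: list of (position, anchor) sorted by position."""
    index = bisect_right([p for p, _ in headers], position) - 1
    return headers[index][1] if index >= 0 else None


text = "a" * 100 + "slotted" + "b" * 10 + "slotted" + "c" * 200 + "slotted"
headers = [(0, "intro"), (90, "pages"), (300, "notes")]
for snippet, pos in find_snippets(text, "slotted"):
    print(len(snippet), closest_header_before(headers, pos))
```

The first two occurrences are only 10 characters apart, so they end up in a single result item, while the third one gets its own item linked to the later header.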

5. How can it be improved?

The current search function is simple and sub-optimal. It always re-fetches and re-caches all posts when the page is reloaded and is not lazy at all. The search is also neither fuzzy nor particularly efficient.

It just works well so far for a personal blog with about 30 posts.

In addition, after I finished the implementation, my husband made another feature request: can you also highlight all occurrences in the opened link?

Web learning in practice

Thu, 15 Aug 2024 19:26:00 +0200

This post records the basic web development knowledge I have learned in practice.

1. Basic html structure

<!DOCTYPE html>
<html lang="en">

    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>A website</title>
        <link rel="stylesheet" href="styles.css">
    </head>

    <body>
    <header>
        <h1>Welcome</h1>
    </header>

    <nav>
        <ul>
        <li><a href="index.html">Home</a></li>
        </ul>
    </nav>

    <main>
        <section id="home">
        <h2>Home</h2>
        <p>Welcome!</p>
        </section>
    </main>

    <footer>
        <p>© All rights reserved.</p>
    </footer>
    </body>

</html>

2. Tags

<a>: contains a link; has the following attributes:
- target="_blank": open the link in a new tab.
- title="Go to the link": the tooltip message, i.e., the floating message shown when a user hovers over the link.
<span>: an inline container; it does not add line breaks before or after it.
<hr>: horizontal rule.

3. Attributes

id="home":
- allow specific styling of the element, e.g., #home ul {...} only styles the ul in the block with the same id.

4. Javascript

4.1. Fetch from an HTML URL

await fetch(url): returns a Response object.
- await waits for the returned Promise to resolve.
await response.text(): returns the HTML string.
new DOMParser().parseFromString(postHtml, "text/html"): returns a Document object, which is a complete DOM tree built from the HTML string; DOMParser is a built-in browser API.
postDoc.getElementById("content"): returns the Element whose id is content.
content.querySelector(".post-title a"): returns the first <a> (anchor) Element inside an Element with the post-title class.
- . for classes, # for IDs (IDs should be unique within a page)

4.2. Modify a DOM

content.querySelector(".taglist").remove(): removes the element from the DOM, i.e., it modifies content.

4.3. Syntax

const escapedQuery = query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"): adds a backslash before every regex special character in the query; $& is a special replacement pattern in JavaScript’s replace method that stands for the matched substring.
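For comparison, here is a rough Python equivalent of the same escaping step. Python's `re.sub` writes the whole match as `\g<0>` instead of `$&`, and the helper name `escape_query` is made up for this sketch; in practice the standard `re.escape` covers the same need.

```python
import re

# The same character class as in the JavaScript version above.
_SPECIAL = re.compile(r"[.*+?^${}()|\[\]\\]")


def escape_query(query: str) -> str:
    # r"\\" emits one literal backslash; \g<0> re-inserts the matched character.
    return _SPECIAL.sub(r"\\\g<0>", query)


escaped = escape_query("a.b*c")
print(escaped)
print(bool(re.search(escape_query("1+1"), "so 1+1=2")))
```

After escaping, the query is matched literally, so a search for "1+1" finds the text "1+1" instead of being read as "one or more 1s followed by a 1".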

CMU 15-445 notes: Memory Management

Tue, 13 Aug 2024 07:46:00 +0200

This is a personal note for the CMU 15-445 L6 notes as well as some explanation from Claude.ai.

1. Goals

Manage DBMS memory and move data between the memory and the disk, such that the execution engine does not need to worry about data fetching.
Spatial control: keep relevant pages physically together on disk.
Temporal control: minimize the number of stalls to read data from the disk.

2. Locks & Latches

Both used to protect internal elements.

2.1. Locks

A high-level logical primitive to allow for transaction atomicity.
Exposed to users when queries are run.
Need to be able to rollback changes.

2.2. Latches

A low-level protection primitive a DBMS uses in its internal data structures, e.g., hash tables.
Only held when the operation is being made, like a mutex.
Not exposed to users, and do not need to support rolling back changes.

3. Buffer pool

An in-memory cache of pages read from the disk.
The buffer pool’s region of memory is organized as an array of fixed size pages; each array entry is called a frame.
When the DBMS requests a page, the buffer pool first searches its cache; if the page is not found, it copies the page from the disk into one of its frames.
Dirty pages (i.e., modified pages) are buffered rather than written back immediately.

3.1. Metadata

Page table: an in-memory hash table mapping page ids to frame locations in the buffer pool.
Dirty flag: set by a thread whenever it modifies a page to indicate the page must be written back to disk.
Pin/Reference counter: tracks the number of threads that are currently accessing the page; the storage manager is not allowed to evict a page if its pin count is greater than 0.
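To see how the three pieces of metadata interact, here is a deliberately small single-threaded sketch in Python. The `disk` dictionary stands in for real storage, the victim choice is simply the first unpinned frame rather than a real replacement policy, and all names are hypothetical.

```python
class BufferPool:
    def __init__(self, num_frames, disk):
        self.disk = disk                        # stand-in for storage: page id -> data
        self.frames = [None] * num_frames       # cached page contents
        self.frame_page = [None] * num_frames   # which page each frame holds
        self.pin_count = [0] * num_frames
        self.dirty = [False] * num_frames
        self.page_table = {}                    # page id -> frame index

    def fetch(self, page_id):
        frame = self.page_table.get(page_id)
        if frame is None:  # cache miss: claim a frame and read from disk
            frame = self._pick_victim()
            self._evict(frame)
            self.frames[frame] = self.disk[page_id]
            self.frame_page[frame] = page_id
            self.page_table[page_id] = frame
        self.pin_count[frame] += 1  # the caller now uses the page
        return self.frames[frame]

    def write(self, page_id, data):
        frame = self.page_table[page_id]
        self.frames[frame] = data
        self.dirty[frame] = True  # must reach the disk before eviction

    def unpin(self, page_id):
        frame = self.page_table[page_id]
        if self.pin_count[frame] > 0:
            self.pin_count[frame] -= 1

    def _pick_victim(self):
        for frame, page in enumerate(self.frame_page):
            if page is None:
                return frame  # prefer a free frame
        for frame, pins in enumerate(self.pin_count):
            if pins == 0:
                return frame  # otherwise the first unpinned frame
        raise RuntimeError("all frames are pinned")

    def _evict(self, frame):
        old_page = self.frame_page[frame]
        if old_page is None:
            return
        if self.dirty[frame]:  # write back a modified page first
            self.disk[old_page] = self.frames[frame]
            self.dirty[frame] = False
        del self.page_table[old_page]
        self.frame_page[frame] = None


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(num_frames=2, disk=disk)
pool.fetch(1)
pool.write(1, "a2")
pool.unpin(1)
pool.fetch(2)  # stays pinned
pool.fetch(3)  # must evict page 1, which is dirty and unpinned
print(disk[1])
```

Page 2 survives the last fetch because its pin count is still 1, and the modification to page 1 only reaches the disk at the moment the page is evicted.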

3.2. Memory allocation policies

The buffer pool decides when to allocate a frame for a page.
Global policies: decisions based on the overall workload, e.g., least recently used (LRU) or clock algorithm.
Local policies: decisions applying to specific query, e.g., priority-based page replacement.

4. Buffer pool optimizations

4.1. Multiple buffer pools

Each database or page can have its own buffer pool with its own local policies, which also reduces latch contention.
To map a page to a buffer pool, the DBMS can use object IDs or page IDs as the key.
- Record ID: a unique identifier for a row in a table (cf. Tuple layout post).
- Object ID: a unique identifier for an object, used to reference a user-defined type.

4.2. Pre-fetching

While the first set of pages is being processed, the DBMS can pre-fetch the second set into the buffer pool based on the dependency between pages.
- E.g., if pages are index-organized, the sibling pages can be pre-fetched.

4.3. Scan sharing

Query cursors can reuse retrieved data.
When a query comes while a previous query is being processed by scanning the table, the new query can attach its scanning cursor to the first query’s cursor.
The DBMS keeps track of where the second query joined to make sure it also completes the scan.

4.4. Buffer pool bypass

Scanned pages do not have to be stored in the buffer pool to avoid the overhead.
Use cases: a query needs to read a large sequence of contiguous pages; temporary pages for operations like sorting or joins.

5. Buffer replacement policies

The DBMS decides which page to evict from the buffer pool to free up a frame.

5.1. Least recently used (LRU)

LRU maintains a timestamp of when each page was last accessed, and evicts the page with the oldest timestamp.
- The timestamp can be stored in a queue for efficient sorting.
Susceptible to sequential flooding, where the buffer pool is polluted by a sequential scan that pushes out pages that are still useful.
- With the LRU policy the oldest pages are evicted, but they are more likely to be scanned soon.
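A compact way to see both LRU and sequential flooding is the simulation below. It is a sketch that tracks only page ids and ignores pinning; the `OrderedDict` plays the role of the timestamp queue, and the function names are illustrative.

```python
from collections import OrderedDict


class LRUReplacer:
    def __init__(self):
        self._order = OrderedDict()  # oldest access first

    def record_access(self, page_id):
        self._order.pop(page_id, None)
        self._order[page_id] = True  # re-insert as the most recent

    def evict(self):
        if not self._order:
            return None
        page_id, _ = self._order.popitem(last=False)
        return page_id


def count_misses(capacity, accesses):
    """Replay page accesses against a pool of `capacity` frames."""
    lru, resident, misses = LRUReplacer(), set(), 0
    for page_id in accesses:
        if page_id not in resident:
            misses += 1
            if len(resident) >= capacity:
                resident.discard(lru.evict())
            resident.add(page_id)
        lru.record_access(page_id)
    return misses


hot = [1, 2, 3]
scan = [10, 11, 12]
print(count_misses(3, hot + hot))         # the hot set fits: 3 misses
print(count_misses(3, hot + scan + hot))  # the scan floods the pool: 9 misses
```

With three frames, the hot pages are only read from disk once if nothing interferes, but a three-page scan in between evicts all of them, so every later access misses again.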

5.2. Clock

An approximation of LRU that replaces the timestamp with a reference bit, which is set to 1 when a page is accessed.
Regularly sweep all pages: if a page’s bit is 1, reset it to 0; if the bit is 0, evict the page.
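The sweep can be sketched with a list of reference bits and a clock hand. This toy version in Python works on frame indices and ignores pin counts; the class and method names are illustrative.

```python
class ClockReplacer:
    def __init__(self, num_frames):
        self.ref = [0] * num_frames  # one reference bit per frame
        self.hand = 0                # current position of the clock hand

    def record_access(self, frame):
        self.ref[frame] = 1

    def evict(self):
        # Terminates within two sweeps: every bit it passes is cleared.
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.ref)
            if self.ref[frame]:
                self.ref[frame] = 0  # second chance: clear the bit and move on
            else:
                return frame         # bit already 0: evict this frame


clock = ClockReplacer(num_frames=3)
clock.record_access(0)
clock.record_access(2)
first = clock.evict()
second = clock.evict()
print(first, second)
```

Frame 1 was never accessed, so it is evicted first; frame 0 lost its bit during that sweep and is the next victim once frame 2 has used its second chance.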

5.3. LRU-K

Tracks the timestamps of the last K accesses to predict the next access time, and hence avoids the sequential flooding issue.

5.3.1. MySQL approximate LRU-K

Use a linked list with two entry points: “old” and “young”.
The new pages are always inserted to the head of “old”.
If a page in the “old” is accessed again, it is then inserted to the head of “young”.

5.4. Localization

Instead of using a global replacement policy, the DBMS makes eviction decisions based on each query.
Pages brought in by one query are less likely to evict pages that are important for other ongoing queries.
The DBMS can predict more accurately which pages should stay or be evicted once the query is complete, so that the buffer pool is less polluted with less useful pages.

5.5. Priority hint

Transactions tell the buffer pool which pages are important based on the context of each page.

5.6. Dirty pages

Two ways to handle dirty pages in the buffer pool:
- Fast: only drop clean pages.
- Slow: write back dirty pages to ensure the changes persist, and then evict them (if they will not be read again).
Can periodically walk through the page table and write back dirty pages in the background.

6. Other memory pools

A DBMS also maintains other pools to store:
- query caches,
- logs,
- temporary tables, e.g., sorting, join,
- dictionary caches.

7. OS cache bypass

Most DBMS use direct I/O (e.g., with fsync instead of fwrite) to bypass the OS cache to avoid redundant page copy and to manage eviction policies more intelligently (cf. Why not OS post).

8. I/O scheduling

The DBMS maintains internal queues to track page read/write requests.
The priorities are determined by multiple factors, e.g., critical path tasks, SLAs.

CMU 15-445 notes: Storage Models & Compression

Thu, 08 Aug 2024 17:48:00 +0200

This is a personal note for the CMU 15-445 L5 notes.

1. Database workloads

1.1. OLTP (Online Transaction Processing)

Characterized by fast, repetitive, simple queries that operate on a small amount of data, e.g., a user adds an item to their Amazon cart and pays.
Usually more writes than reads.

1.2. OLAP (Online Analytical Processing)

Characterized by complex read queries on large data.
E.g., compute the most popular item in a period.

1.3. HTAP (Hybrid)

OLTP + OLAP.

2. Storage models

Different ways to store tuples in pages.

2.1. N-ary Storage Model (NSM)

Store all attributes for a single tuple contiguously in a single page, e.g., slotted pages.
Pros: good for queries that need the entire tuple, e.g., OLTP.
Cons: inefficient for scanning large data with a few attributes, e.g., OLAP.

2.2. Decomposition Storage Model (DSM)

Store each attribute for all tuples contiguously in a block of data, i.e., column store.
Pros: save I/O; better data compression; ideal for bulk single attribute queries like OLAP.
Cons: slow for point queries due to tuple splitting, e.g., OLTP.
2 common ways to put back tuples:
- Most common: use fixed-length offsets, e.g., the value in a given column belongs to the same tuple as the value in another column at the same offset.
- Less common: use embedded tuple ids, e.g., each attribute is associated with the tuple id, and the DBMS stores a mapping to jump to any attribute with the given tuple id.
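The fixed-length offset approach can be sketched as follows. This is a minimal illustration, not how any particular DBMS lays out its columns; the `columns` dictionary and both helper names are made up for the example.

```python
# Hypothetical column store: one list per attribute, aligned by position.
columns = {
    "id": [1, 2, 3],
    "name": ["ann", "bob", "cat"],
    "age": [31, 25, 40],
}


def stitch_tuple(columns, offset):
    """Rebuild one logical tuple by reading the same offset in every column."""
    return tuple(col[offset] for col in columns.values())


def scan_column(columns, name):
    """An OLAP-style scan touches only the requested column."""
    return columns[name]


row = stitch_tuple(columns, 1)               # point query: touches every column
total_age = sum(scan_column(columns, "age"))  # bulk query: touches one column
```

The point query has to visit every column to put the tuple back together, while the aggregate reads a single contiguous list, which mirrors the pros and cons listed above.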

Figure 1: DSM storage model (Source)

2.3. Partition Attributes Across (PAX)

Rows are horizontally partitioned into groups of rows; each row group uses a column store.
A PAX file has a global header containing a directory with each row group’s offset; each row group maintains its own header with content metadata.

Figure 2: PAX storage model (Source)

3. Database compression

Disk I/O is always the main bottleneck; read-only analytical workloads are popular; compression in advance allows for more I/O throughput.
Real-world data sets have the following properties for compression:
- Highly skewed distributions for attribute values.
- High correlation between attributes of the same tuple, e.g., zip code to city.
Requirements on the database compression:
- Fixed-length values to follow word-alignment; variable length data stored in separate mappings.
- Postpone decompression as long as possible during query execution, i.e., late materialization.
- Lossless; any lossy compression can only be performed at the application level.

3.1. Compression granularity

Block level: compress all tuples for the same table.
Tuple level: compress each tuple (NSM only).
Attribute level: compress one or multiple values within one tuple.
Columnar level: compress one or multiple columns across multiple tuples (DSM only).

3.2. Naive compression

Engineers often use a general purpose compression algorithm with lower compression ratio in exchange for faster compression/decompression.
E.g., compress disk pages by padding them to a power of 2KBs and storing them in the buffer pool.
- Why small chunk: must decompress before reading/writing the data every time, hence need to limit the compression scope.
Does not consider the high-level data semantics, thus cannot utilize late materialization.

4. Columnar compression

Works best with OLAP, may need additional support for writes.

4.1. Dictionary encoding

The most common database compression scheme; supports late materialization.
Replace frequent value patterns with smaller codes, and use a dictionary to map codes to their original values.
Need to support both fast encoding and decoding, so a one-way hash function cannot be used (codes must map back to the original values).
Need to support order-preserving encodings, i.e., sorting codes in the same order as original values, to support range queries.
- E.g., when SELECT DISTINCT with pattern-matching, the DBMS only needs to scan the encoding dictionary (but without DISTINCT it still needs to scan the whole column).
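A small sketch of an order-preserving dictionary, assuming the dictionary is simply the sorted list of distinct values (real systems use more elaborate structures). All function names here are illustrative.

```python
from bisect import bisect_left, bisect_right


def build_dictionary(values):
    """Sort distinct values so that code order matches value order."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, code_of


def encode(values, code_of):
    return [code_of[v] for v in values]


def decode(codes, dictionary):
    return [dictionary[c] for c in codes]


def range_to_codes(dictionary, low, high):
    """Translate the value range [low, high] into the code range [lo, hi)."""
    return bisect_left(dictionary, low), bisect_right(dictionary, high)


names = ["bob", "ann", "dan", "ann", "cat", "bob"]
dictionary, code_of = build_dictionary(names)
codes = encode(names, code_of)

# Range predicate evaluated on integer codes only, without decoding the column.
lo, hi = range_to_codes(dictionary, "b", "cz")
matches = [i for i, c in enumerate(codes) if lo <= c < hi]
```

Because codes sort in the same order as the values, the range predicate becomes an integer comparison on the encoded column, and decoding can be postponed until the result is produced.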

4.2. Run-Length encoding (RLE)

Compress runs (consecutive instances) of the same value in a column into triplets (value, offset, length).
Need to cluster same column values to maximize the compression.
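A minimal sketch of the (value, offset, length) triplets, with hypothetical function names. It also shows why sorting the column first helps.

```python
def rle_encode(column):
    """Compress consecutive equal values into (value, offset, length) triplets."""
    runs = []
    for offset, value in enumerate(column):
        if runs and runs[-1][0] == value:
            v, start, length = runs[-1]
            runs[-1] = (v, start, length + 1)  # extend the current run
        else:
            runs.append((value, offset, 1))    # start a new run
    return runs


def rle_decode(runs):
    out = []
    for value, _offset, length in runs:
        out.extend([value] * length)
    return out


unsorted_col = ["M", "F", "M", "M", "F"]
unsorted_runs = rle_encode(unsorted_col)         # many short runs
sorted_runs = rle_encode(sorted(unsorted_col))   # one run per distinct value
```

On the unsorted column the encoding produces four runs for five values; after clustering equal values it produces only two.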

Figure 3: Run-length encoding (Source)

4.3. Bit-packing encoding

Use fewer bits to store an attribute when its values do not need the full width of the declared type.

Figure 4: Bit-packing encoding (Source)

4.4. Mostly encoding

Use a special marker to indicate values that exceed the bit size, and maintain a look-up table to store them.
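A rough sketch of the idea with 8-bit codes. The choice of 255 as the marker and the dictionary used as the look-up table are assumptions made for this example only.

```python
MARKER = 255  # assumption: reserve the largest 8-bit code to mean "see look-up table"


def mostly8_encode(values):
    """Store small values in one byte each; spill the rest to an exception table."""
    packed = []
    exceptions = {}  # offset -> original value
    for offset, v in enumerate(values):
        if 0 <= v < MARKER:
            packed.append(v)
        else:
            packed.append(MARKER)
            exceptions[offset] = v
    return bytes(packed), exceptions


def mostly8_decode(packed, exceptions):
    return [exceptions[i] if b == MARKER else b for i, b in enumerate(packed)]


values = [13, 191, 99999999, 92, 81]
packed, exceptions = mostly8_encode(values)
```

Five values take five bytes plus one look-up entry, instead of five full-width integers.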

Figure 5: Mostly encoding (Source)

4.5. Bitmap (One-hot) encoding

Only practical if the value cardinality is low.

Figure 6: Bitmap encoding (Source)

4.6. Delta encoding

Record the difference between values; the base value can be stored in-line or in a separate look-up table.
Can be combined with RLE encoding.
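A minimal sketch with the base value stored in-line as the first element (function names are illustrative):

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value as the base, then store successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """A running sum over the deltas restores the original values."""
    return list(accumulate(deltas))


timestamps = [1000, 1001, 1002, 1003]
deltas = delta_encode(timestamps)
```

For regularly spaced values the deltas form a long run of the same small number, which is why the result compresses well with RLE.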

Figure 7: Delta encoding (Source)

4.7. Incremental encoding

Common prefixes or suffixes and their lengths are recorded to avoid duplication.
Need to sort the data first.
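A sketch of the prefix variant, where each entry stores the length of the prefix shared with the previous value plus the remaining suffix. The function names are made up for illustration.

```python
def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def incremental_encode(sorted_words):
    """Encode each word as (shared prefix length with previous word, suffix)."""
    encoded, prev = [], ""
    for word in sorted_words:
        n = common_prefix_len(prev, word)
        encoded.append((n, word[n:]))
        prev = word
    return encoded


def incremental_decode(encoded):
    words, prev = [], ""
    for n, suffix in encoded:
        prev = prev[:n] + suffix
        words.append(prev)
    return words


words = ["rob", "robbed", "robbing", "robot"]
encoded = incremental_encode(words)
```

Sorting matters because neighbouring values then share the longest possible prefixes.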

Parallel EVM: Blockworks news (Sei, Monad, Solana)

Thu, 08 Aug 2024 09:19:00 +0200

This is a personal note for Blockworks news (12.01.2024) as well as some terminology explained online, e.g., Coindesk and GPT-4o.

1. Terminology

1.1. Ethereum sharding

The Ethereum mainnet is divided into smaller interconnected networks called shards.
Each shard processes and validates its own transactions parallel to others.
Pros: increase scalability and participation.
Cons: a single unit can be compromised; lead to centralization.

1.2. Blob

Rather than storing each transaction data directly in the blockchain, the data is aggregated into a blob (binary object).
Each blob is erasure-coded to divide it into multiple smaller pieces with redundancy.
Encoded pieces are stored separately; the block header contains pointers to the piece locations without storing the actual data.
Transactions in a block may be distributed across multiple blobs.

1.3. Erasure coding

Allows one to encode blobs such that if at least half of the data in the blob is published, anyone in the network can reconstruct and re-publish the rest of the data.

1.4. Data availability sampling (DAS)

Validators randomly sample blob pieces to confirm the data can be reconstructed.
If a client cannot get enough pieces to verify the blob availability, or the blob fails the integrity check, or transactions within the blob are invalid or inconsistent with the blockchain state, the blob is rejected.

1.5. Danksharding (L2 optimization)

A specific sharding implementation proposal.
Require data availability sampling and proposer-builder separation.
Can support hundreds of individual rollups.

1.6. Relations between L1 and L2 scaling

L1 scaling: optimizations directly to the Ethereum mainnet and core infrastructure, e.g., parallel EVM.
L2 scaling: building secondary rollup layers, e.g., optimistic rollups and ZK rollups, to offload mainnet computation and storage.

1.7. Double spending prevention

Bitcoin: uses UTXOs to track which inputs have been spent (no need to go through the entire chain).
Ethereum: uses a nonce to track the number of transactions sent from an account. The nonce is included in the transaction and is incremented by 1 for every new transaction, and all transactions from an account must be executed in nonce order.
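The account-nonce check can be sketched as below. This is a simplified model, not client code: the `account_nonces` dictionary and `apply_tx` function are hypothetical, and signatures, balances and gas are ignored.

```python
def apply_tx(account_nonces, sender, tx_nonce):
    """Accept a transaction only if its nonce equals the sender's next expected nonce."""
    expected = account_nonces.get(sender, 0)
    if tx_nonce != expected:
        return False  # replayed (too low) or out-of-order (too high)
    account_nonces[sender] = expected + 1
    return True


account_nonces = {}
first = apply_tx(account_nonces, "alice", 0)   # accepted
replay = apply_tx(account_nonces, "alice", 0)  # same transaction again: rejected
gap = apply_tx(account_nonces, "alice", 5)     # skips ahead: rejected
second = apply_tx(account_nonces, "alice", 1)  # accepted
```

Re-submitting an already executed transaction fails because the stored counter has moved past its nonce.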

1.8. Sealevel (Solana)

Solana’s parallel smart contract runtime to process thousands of contracts in parallel.
Solana transactions describe all states a transaction accesses to efficiently recognize transaction dependency and to schedule parallel execution without accessing full blockchain state.

2. Ways to achieve parallel processing

Process independent transactions in parallel.
Sharding.

3. Production-ready parallelized EVM projects (Jan 2024)

Sei: optimistic parallel execution.
Monad: custom EVM implementation, optimistic parallel execution, custom state database.
- Commodity databases are not optimized for Merkle tree data read/write with SSD.
Neon (Solana): transactions pre-specify dependencies.
See BNB chain post for more solutions.

My Exam Organization Experience

Wed, 07 Aug 2024 20:16:00 +0200

This post was co-authored with the assistance of Cursor and Claude AI.

1. What is it about?

Over the past years, I’ve been involved in organizing and monitoring four exams. Yesterday marked my fourth and likely final exam monitoring session for the foreseeable future. Now seems like an opportune moment to reflect on my exam organization journey.

2. My first exam

My first exam experience was relatively straightforward. As I was unfamiliar with the exam content, my primary responsibility was taking students to the restrooms. I recall accompanying nearly 10 students during the session. For the remainder of the time, I sat at the back of the room, occupying myself with some drawing.

3. My second exam

This exam was identical to the first, but I was tasked with designing a significant portion of the questions. Initially, formulating questions seemed straightforward, but I later realized my inability to create effective ones. My first draft incorporated connections between various questions. During the review process, I was advised to maximize information density in questions, as stressed students prefer concise text. Additional challenges emerged when I began grading the exams. For example, open-ended questions consistently yielded unexpected answers, requiring me to modify the grading scheme while maintaining consistency with previous assessments.

The exam monitoring this time was more demanding as I began responding to student inquiries. My limited knowledge of the entire exam often required me to seek assistance from colleagues, and occasionally I struggled to comprehend the questions.

Grading the exam proved to be the most challenging aspect of the process. Evaluating over 100 exams, deciphering handwriting, determining fair point allocations, and recording scores was an incredibly tedious task. Unlike traditional mathematics exams that focus primarily on calculations, our exam encouraged creative thinking. Each answer contained implicit assumptions about the question, requiring careful interpretation to avoid excessive point deductions. The process was extremely time-consuming and offered minimal personal benefit or satisfaction.

4. My third exam

This was a different exam, where I again designed one part and monitored its administration. Fortunately, due to its advanced nature, few students registered for this exam. This time, I had the privilege of collaborating with exceptional colleagues. Despite these advantages, the design process still demanded considerable effort on my part.

A unique aspect of this exam was that I also developed the course material, as it was a newly introduced subject. The topic proved challenging, and the exercises were demanding. Students expressed concerns about the complexity of the material. Consequently, we faced the task of devising ways to simplify the exam questions without compromising their effectiveness.

5. My fourth exam

Now comes the most recent exam, where I took on the role of a coordinator. This position entailed not only designing my portion of the tasks but also finding colleagues to design other tasks, managing the timeline, reviewing drafts, printing exams, coordinating exam monitoring, and overseeing the grading process. Fortunately, I had the support of proactive and helpful colleagues, which allowed everything to proceed smoothly.

The monitoring process proved particularly demanding this time. As I was now familiar with the entire exam, I was responsible for reading instructions, making announcements, and answering as many questions as possible. I had to arrive at the exam rooms by 8 AM, forgoing breakfast, and remained standing and focused until noon. In retrospect, I may have been overly diligent due to inexperience and could have perhaps allowed myself to relax more. However, I’ll never know for certain, as this marks the conclusion of my exam organization duties.

Well, not quite the end, as I still need to complete the grading and submit the results.

6. My feeling

After four instances of exam organization experience, I must admit that I do not recommend it. This opinion is, of course, highly personal, as I am not particularly inclined towards teaching. From my perspective, it is a job that demands significant effort while offering minimal returns. I can confidently say that I invest more time in the exam process than any individual student taking it.

The process of designing and grading exams presents a complex optimization challenge. There’s an inverse relationship between the time spent on design and the time required for grading. The less effort put into crafting the exam, the more time-consuming the grading process becomes. Moreover, it’s disheartening to spend two full days meticulously designing and reviewing a question, only to have it attempted by a mere handful of students.

CMU 15-445 notes: Database storage

Wed, 31 Jul 2024 18:26:00 +0200

This is a personal note for the CMU 15-445 L3 notes and CMU 15-445 L4 notes.

1. Data storage

1.1. Volatile device

The data is lost once the power is off.
Support fast random access with byte-addressable locations, i.e., can jump to any byte address and access the data.
A.k.a memory, e.g., DRAM.

1.2. Non-volatile device

The data is retained after the power is off.
Block/Page addressable, i.e., in order to read a value at a particular offset, first need to load 4KB page into memory that holds the value.
Perform better for sequential access, i.e., contiguous chunks.
A.k.a disk, e.g., SSD (solid-state storage) and HDD (spinning hard drives).

1.3. Storage hierarchy

Close to CPU: faster, smaller, more expensive.

1.4. Persistent memory

As fast as DRAM, with the persistence of disk.
Not in widespread production use.
A.k.a, non-volatile memory.

1.5. NVMe (non-volatile memory express)

NAND flash drives that connect over an improved hardware interface to allow faster transfer.

2. DBMS architecture

Primary storage location of the database is on disks.
The DBMS is responsible for data movement between disk and memory with a buffer pool.
The data is organized into pages by the storage manager; the first page is the directory page.
To execute the query, the execution engine asks the buffer pool for a page; the buffer pool brings the page to the memory, gives the execution engine the page pointer, and ensures the page is retained in the memory while being executed.

2.1. Why not OS

The architecture is like virtual memory: a large address space and a place for the OS to bring the pages from the disk.
The OS way to achieve virtual memory is to use mmap to map the contents of a file in a process address space, and the OS is responsible for the data movement.
If mmap hits a page fault, the process is blocked; however a DBMS should be able to still process other queries.
A DBMS knows more about the data being processed (the OS cannot decode the file contents) and can do better than OS.
Can still use some OS operations:
- madvise: tell the OS when DBMS is planning on reading some page.
- mlock: tell the OS not to swap memory ranges out to disk.
- msync: tell the OS to flush memory ranges out to disk, i.e., write.

3. Database pages

Usually fixed-sized blocks of data.
Can contain different data types, e.g., tuples, indexes; data of different types are usually not mixed within the same page.
Some DBMSs require each page to be self-contained, i.e., a tuple does not point to another page.
Each page is given a unique id, which can be mapped to the file path and offset to find the page.

3.1. Hardware page

The largest block of data that a storage device guarantees to write atomically, i.e., if the hardware page is 4KB and the DBMS tries to write 4KB to the disk, either all 4KB is written or none is.
If the database page is larger than the hardware page, the DBMS requires extra measures to ensure the writing atomicity itself.

4. Database heap

A heap file (e.g., a table) is an unordered collection of pages where tuples are stored in random order.
To locate a page in a heap file, a DBMS can use either a linked list or a page directory.
- Linked list: the header page holds a pointer to a list of data and free pages; require a sequential scan when finding a specific page.
- Page directory: a DBMS uses special pages to track the location of each data page and the free space in database files.
  - All changes to the page directory must be recorded on disk to allow the DBMS to find the pages on restart.

5. Page layout

Each page includes a header to record the page meta-data, e.g., page size, checksum, version.
Two main approaches to laying out data in pages: slotted-pages and log-structured.

5.1. Slotted-pages

The header keeps track of the number of used slots and the starting offset of each slot.
When adding a tuple, the slot array grows from the beginning to the end, the tuple data grows from the end to the beginning; the page is full when they meet.
Problems associated with this layout are:
- Fragmentation: tuple deletions leave gaps in the pages.
- Inefficient disk I/O: need to fetch the entire block to update a tuple; users could randomly jump to multiple different pages to update a tuple.
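A toy slotted page, to illustrate how the slot array and the tuple data grow towards each other. The sizes (4-byte header, 4-byte slot entries) are assumptions for the sketch, and the slot array is kept as a Python list rather than serialized into the page bytes.

```python
class SlottedPage:
    HEADER_SIZE = 4  # assumed fixed header size in bytes
    SLOT_SIZE = 4    # assumed size of one slot entry (offset + length)

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []       # (offset, length) per tuple; grows from the front
        self.free_end = size  # tuple data grows backwards from the end

    def free_space(self):
        slots_end = self.HEADER_SIZE + self.SLOT_SIZE * len(self.slots)
        return self.free_end - slots_end

    def insert(self, payload):
        """Return the new slot id, or None when slot array and data would meet."""
        if len(payload) + self.SLOT_SIZE > self.free_space():
            return None
        self.free_end -= len(payload)
        self.data[self.free_end:self.free_end + len(payload)] = payload
        self.slots.append((self.free_end, len(payload)))
        return len(self.slots) - 1

    def read(self, slot_id):
        offset, length = self.slots[slot_id]
        return bytes(self.data[offset:offset + length])


page = SlottedPage(size=32)
slot_a = page.insert(b"alice")
slot_b = page.insert(b"bob")
too_big = page.insert(b"x" * 20)  # does not fit any more
```

A tuple is addressed by its slot id, so the DBMS can move tuple bytes inside the page (e.g., to defragment) by updating only the slot entry.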

Figure 1: Slotted pages (Source)

5.2. Log-structured

Only allows creations of new pages and no overwrites.
Stores the log records of changes to the tuples; the DBMS appends new log entries to an in-memory buffer without checking previous records -> fast writes.
Potentially slow reads; can be optimized by bookkeeping the latest write of each tuple.

5.2.1. Log compaction

Take only the most recent change for each tuple across several pages.
There is only one entry for each tuple after the compaction, and can be easily sorted by id for faster lookup -> called Sorted String Table (SSTable).
Universal compaction: any log files can be compacted.
Level compaction: level 0 (smallest) files can be compacted to create a level 1 file.
Write amplification issue: for each logical write, there could be multiple physical writes.
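Compaction can be sketched as a single pass that keeps only the latest change per tuple. The `(op, key, value)` record format is an assumption for this example, and deleted keys are simply dropped, whereas real systems may keep tombstones for a while.

```python
def compact(log_records):
    """Collapse an oldest-to-newest log into one sorted entry per live key."""
    latest = {}
    for op, key, value in log_records:
        if op == "PUT":
            latest[key] = value    # newer write overrides older ones
        elif op == "DEL":
            latest.pop(key, None)  # the key no longer exists
    return sorted(latest.items())  # sorted by key, like an SSTable


log = [
    ("PUT", 103, "a1"),
    ("PUT", 104, "b1"),
    ("DEL", 102, None),
    ("PUT", 103, "a2"),
    ("DEL", 104, None),
    ("PUT", 105, "c1"),
]
sstable = compact(log)
```

Six log records shrink to two entries, and the sorted output allows binary search on the key.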

5.3. Index-organized storage

Both page-oriented and log-structured storage rely on additional index to find a tuple since tables are inherently unsorted.
In an index-organized storage scheme, the DBMS stores tuples as the value of an index data structure.
E.g., in a B-tree indexed DBMS, the index keys (i.e., primary keys) are stored in the intermediate nodes, and the data is stored in the leaf nodes.

Figure 2: Index-organized storage (Source)

6. Tuple layout

Tuple: a sequence of bytes for a DBMS to decode.
Tuple header: contains tuple meta-data, e.g., visibility information (which transactions write the tuple).
Tuple data: cannot exceed the size of a page.
Unique id: usually page id + offset/slot; an application cannot rely on it to mean anything.

6.1. Denormalized tuple data

If two tables are related, a DBMS can “pre-join” them so that the tables are on the same page.
The read is faster since only one page needs to be loaded, but the write is more expensive since a tuple needs more space (no free lunch in DB systems!).

7. Data representation

A data representation scheme specifies how a DBMS stores the bytes of a tuple.
Tuples can be word-aligned via padding or attribute reordering to make sure the CPU can access a tuple without unexpected behavior.
5 high level data types stored in a tuple: integer, variable-precision numbers, fixed-point precision numbers, variable length values, dates/times.

7.1. Integers

Fixed length, usually stored using the DBMS native C/C++ types.
E.g., INTEGER.

7.2. Variable precision numbers

Inexact, variable-precision numeric types; faster than arbitrary precision numbers.
Could have rounding errors.
E.g., REAL.

7.3. Fixed-point precision numbers

Arbitrary precision data type stored in exact, variable-length binary representation (almost like a string) with additional meta-data (e.g., length, decimal position).
E.g., DECIMAL.
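The difference between the two numeric families can be seen with Python's built-in `float` (an IEEE 754 double, playing the role of REAL) and the standard `decimal` module (playing the role of DECIMAL). This only illustrates the rounding behaviour, not how a DBMS stores either type.

```python
from decimal import Decimal

# Variable-precision: fast hardware arithmetic, but 0.1 and 0.2 are not exactly representable.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# Fixed-point: exact decimal arithmetic, at the cost of a software representation.
exact_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = exact_sum == Decimal("0.3")
```

The floating-point comparison fails while the decimal one succeeds, which is why exact types are preferred when rounding errors are unacceptable, e.g., for money.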

7.4. Variable-length data

Represent data of arbitrary length, usually stored with a header to keep track of the length and the checksum.
Overflowed data is stored on a special overflow page referenced by the tuple; the overflow page can also contain pointers to the next overflow pages.
Some DBMSs allow files (e.g., photos) to be stored externally, but the DBMS cannot modify them.
E.g., BLOB.

7.5. Dates/Times

Usually represented as a count of time units (e.g., micro/milliseconds) since the Unix epoch.
E.g., TIMESTAMP.

7.6. Null

3 common approaches to represent nulls:
- Most common: store a bitmap in a centralized header to specify which attributes are null.
- Designate a value, e.g., INT32_MIN.
- Not recommended: store a flag per attribute to mark a value is null; may need more bits to ensure word alignment.
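The bitmap approach can be sketched as follows, using a Python integer as the bitmap. The function names and the row format are illustrative only.

```python
def encode_row(values):
    """Return (null_bitmap, non_null_values); bit i set means attribute i is NULL."""
    bitmap = 0
    present = []
    for i, v in enumerate(values):
        if v is None:
            bitmap |= 1 << i
        else:
            present.append(v)
    return bitmap, present


def decode_row(bitmap, present, n_attrs):
    """Re-insert NULLs at the positions recorded in the bitmap."""
    it = iter(present)
    return [None if (bitmap >> i) & 1 else next(it) for i in range(n_attrs)]


row = [7, None, "x", None]
bitmap, present = encode_row(row)
```

The bitmap costs one bit per attribute, and NULL attributes take no space in the tuple data.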

8. System catalogs

A DBMS maintains an internal catalog table for the table meta-data, e.g., tables/columns, user permissions, table statistics.
Bootstrapped by special code.

Ethereum Merkle Patricia Trie

Sun, 28 Jul 2024 09:34:00 +0200

This is a personal note of Ethereum Merkle Patricia Trie (MPT), resources are from:

1. Blockchain fundamentals

1.1. RLP (Recursive Length Prefix)

A serialization method to encode arbitrarily nested arrays of binary data.
RLP provides a simple (e.g., no type), space-efficient and deterministic encoding.

1.2. Merkle tree

Used in Bitcoin to simplify proof of inclusion (PoI) of a transaction.
If one instead computes a single hash over an array of $N$ items:
- Construction complexity: $O(N)$ time and space.
- PoI complexity: $O(N)$ time and space (needs all other items).

1.2.1. Complexity for $N$ items.

Construction: $O(2N)$ time and space.
PoI complexity:
- $O(\log N)$ space: PoI requires one hash from each level from the leaf to the root (the Merkle tree is binary).
- $O(\log N)$ time: $O(\log N)$ to collect all hashes, and $O(\log N)$ to generate the proof.
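A compact sketch of a binary Merkle tree with proof generation and verification. For simplicity it uses a single SHA-256 per node (Bitcoin applies SHA-256 twice) and duplicates the last node on odd-sized levels; the function names are made up for the example.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def build_levels(items):
    """Return all tree levels, leaves first and root last."""
    level = [h(x) for x in items]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect one sibling hash per level: O(log N) hashes."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(item, proof, root):
    acc = h(item)
    for sibling, sibling_is_left in proof:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc == root


items = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
levels = build_levels(items)
root = levels[-1][0]
proof = make_proof(levels, 4)
```

The verifier needs only the item, the proof and the root hash; it never sees the other items.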

Figure 1: Bitcoin Merkle Tree (source)

1.3. Patricia tree

Trie: a data structure that stores key-value pair in a key’s prefix tree.
Patricia tree: compress trie by merging nodes on the same path.
The structure of the Patricia tree is independent of the item insertion order.
The time complexity for add, query and deletion is $O(K)$, where $K$ is the key length.

Figure 2: Patricia Tree (source)

1.4. Merkle Patricia Tree (MPT)

MPT is a hex-ary Merkle tree with an additional DB for hash lookup.
There are 4 types of nodes:
- Empty node: the null node the root points to when first creating the tree.
- Leaf node: stores the real data, e.g., account balance.
- Branch node: stores the pointers to at most 16 other nodes, e.g., they share a common prefix (nibbles) up to this point and differ at the current nibble (4 bits, 0-f).
- Extension node: record the compressed common prefix for a branch node.
Each pointer in the tree is the hash value of the child node; the real node data is stored in a separate DB that maps from a node hash to its data.
If the child node is small, the parent node could also directly store the node data rather than the hash pointer.
In practical implementation, the entire tree is typically stored in a KV DB, and each node is stored with its hash as the key.

Figure 3: MPT DB storage (source)

1.4.1. Prefix byte

Identify both the node type and the parity of the stored nibbles.
Leaf node: 20 if the key-end has an even number of nibbles, e.g., the compressed ending of an account; 3X if the number is odd (the first nibble of the key-end is stored as X in the low half of the prefix byte).
Extension: 00 if the shared nibbles have an even count; 1X if the count is odd.
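The prefix rule can be sketched as a small function, following the hex-prefix encoding described in the Ethereum documentation: the high nibble of the first byte carries the flag (0/1 for extension, 2/3 for leaf), and an even-length path gets a zero padding nibble. The function name and argument format are chosen for this example.

```python
def hex_prefix(nibbles, is_leaf):
    """Pack a nibble path into bytes, prefixed with node type and parity flags."""
    flag = 2 if is_leaf else 0
    if len(nibbles) % 2:
        # Odd: the flag nibble and the first path nibble share the prefix byte.
        padded = [flag + 1] + list(nibbles)
    else:
        # Even: the flag nibble is followed by a zero padding nibble.
        padded = [flag, 0] + list(nibbles)
    return bytes(16 * hi + lo for hi, lo in zip(padded[::2], padded[1::2]))


ext_odd = hex_prefix([1, 2, 3, 4, 5], is_leaf=False)
ext_even = hex_prefix([0, 1, 2, 3, 4, 5], is_leaf=False)
leaf_even = hex_prefix([0x0, 0xF, 0x1, 0xC, 0xB, 0x8], is_leaf=True)
leaf_odd = hex_prefix([0xF, 0x1, 0xC, 0xB, 0x8], is_leaf=True)
```

The padding guarantees that the encoded path always occupies a whole number of bytes.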

1.4.2. Complexity for $N$ items and key length $K$

Construction:
- Time: worst $O(NK)$; average: $O(N\log_{16}N)$.
- Space: $O(N)$.
Indexing (e.g., query an account balance):
- Time: tree traversal worst $O(K)$, average $O(\log_{16}N)$; each traversal equals a DB query.
PoI: $O(16\log_{16}N)$ time and space.
- Calculating the hash of a branch node requires the hash of all 16 child nodes.

Figure 4: Merkle Patricia Tree (source)

1.5. Rollup state tree

Rollup has a higher performance requirement for PoI.
Separate the indexing and PoI with a sorted key-value array and a (binary) Merkle tree.
- MPT: {Addr0: State0, Addr1: State1,...}.
- Rollup: map: {Addr0: Id0, Addr1: Id1,...} + array: [(Addr0, State0), (Addr1, State1),...].
When a client wants to query an account, it first gets the key id from the map, then gets the state from the array.
When a node wants to generate PoI, it follows the Merkle path and collects hashes (more hashes than MPT).

1.6. PoI for Verkle tree (see MegaETH post for details)

Stateless light nodes get a witness along with the new block, the witness is a PoI for the state change in the block.
Light nodes download related state information, e.g., changed account from other full nodes, or from the portal network.

1.7. Polynomial/KZG commitment

In MPT, PoI for a branch node requires the hash values of all branches.
KZG commitment reduces the proof size by adding a polynomial formula $f(x)$ in the branch node, and each branch has a point $(x, y)$ such that $y = f(x)$.
In this way, the proof no longer requires hashes of other branches, so the proof space complexity is $O(\log_{16}N)$ (no factor of 16).

2. Ethereum MPT data structure

Essentially is a key-value mapping; it provides Get, Put and Del functions.
Ethereum has 3 MPTs: transaction trie; receipt trie and state trie, each trie root hash is included in the block header.
- transactionTrie: all transactions included in the block.
  - The keys are the RLP encodings of an unsigned integer starting from 0.
  - The values are the RLP encodings of the transaction.
- stateTrie: all account states in the network.
- receiptTrie: the outcomes of all transaction executions in the block, e.g., gas used, transaction status.

3. Ethereum MPT Functionality

Allows to verify data integrity with the Hash function to compute the Merkle root hash.
Allows to verify the inclusion of a key-value pair without the access to the entire key-value pairs.
- A full node provides a Merkle proof Proof for a key-value pair (e.g., an account and its balance).
- A light node can verify a proof only against the root hash with VerifyProof(rootHash, key, proof); if the proof does not match the hash (e.g., the balance mismatches), an error is thrown.
Why would a light node trust the root hash: it trusts the consensus mechanism, e.g., other benign full nodes verify the hash, and acting honestly is more profitable.

4. Proof of inclusion

Proof: the path from the root to the leaf node.
Verification: start from the root and decode each node to match the nibbles until finding the node that matches all the remaining nibbles; if it is not found, the proof is invalid.

End of my first German course

Sat, 27 Jul 2024 00:28:00 +0200

It’s been a while since I wanted to document my experience in my German course. Now, having completed the first month, there couldn’t be a better time to do so.

Initially, I approached the course with some dissatisfaction. The pace felt too slow, and I was critical of my teacher’s methods. Boredom and impatience set in as I waited for others to catch up.

But then I began interacting with my classmates, and I had the chance to hear a different world.

One of the first people I met was a talkative Ukrainian woman. When she is not satisfied with something, she interrupts and shouts. She knows a lot about healthcare, and she told me this and that about refugee policies. I admire her enviable energy.

When I first walked in the classroom, I noticed a young man who already spoke rapid German. I asked him how he did it and he said since he started learning German four months ago, he only spoke German. He is smart and also speaks English, Turkish and Afghanistan. His experience reminds me a lot of the book “The New Odyssey”, and I am curious about more of his stories. He is only 19 years old, I believe a bright future awaits him.

Figure 1: The New Odyssey

The man sitting next to me has a political asylum visa. He was a political journalist not welcomed by the government. He said that was why he lived at the border for many years before he came here. He is no longer young, and it was his second time taking the same course. He has beautiful handwriting.

Another young man I know a bit comes from Latin America; he often dozes off in class due to his Uber delivery job. I also met a young lady from Turkey; she is so charming, with a wonderful personality. There are also two resilient mothers who combine childcare with their studies.

In the last week I finally got to know my teacher a bit. He told us he currently works 220 hours a month so that he can pay the expenses for himself and his girlfriend, and he used to work even more. He said he had done different trainings in different countries, and now he has finally got a new passport. He shared with us various pieces of information that help foreigners maintain a basic life here.

In the end, I still think the course is too easy for me, having spent twenty years in schools. But I wish I had talked more with everyone about their lives. So many people come here, and each finds their own way to stay and live. Life is always hard but also incredible, and I hope I never forget it.

Parallel EVM: Reth scaling plan

Wed, 24 Jul 2024 15:54:00 +0200

This is a personal note for Reth-performance-blog as well as some terminology explained online, e.g., Reth-repo and Claude.ai.

1. Blockchain fundamentals

1.1. Ethereum engine API

A collection of JSON-RPC methods that all execution clients implement.
Specify the interfaces between consensus and execution layers.

1.2. Foundry

A Rust-written toolkit for Ethereum application development.
Consists of an Ethereum testing framework Forge; a framework to interact with the chain Cast; a local Ethereum node Anvil; and a Solidity REPL (Read-Eval-Print-Loop: an interactive environment) Chisel.

1.3. Revm

A Rust-written EVM; responsible for executing transactions and contracts.

1.4. Alloy

A library to interact with Ethereum and other EVM-based chains.

1.5. Erigon & Staged sync

Erigon: a Go-written Ethereum client implementation (execution layer).
Staged sync: break the chain synchronization process into distinct stages in order to achieve better efficiency.

1.6. Storage engines

1.6.1. ACID

A set of properties for database transactions: atomicity, consistency, isolation, durability.
Atomicity: a transaction is treated as an indivisible unit; if any part of the transaction fails, the entire transaction is rolled back.
Consistency: a transaction brings the database from one valid state to another.
Isolation: concurrent transaction execution leaves the database in the same state as if the transactions were executed sequentially.
Durability: a committed transaction remains committed even when the system fails.

1.6.2. MVCC (Multi-version concurrency control)

A concurrency control model used in DBMS.
MVCC keeps multiple versions of data simultaneously; each transaction sees a snapshot of the database.
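A toy illustration of snapshot reads over multiple versions. The `MVCCStore` class and its logical clock are invented for this sketch; real engines also handle aborts, garbage collection of old versions and write conflicts.

```python
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit timestamp

    def write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        """A reader remembers the timestamp at which it started."""
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("balance", 100)
old_snapshot = store.snapshot()
store.write("balance", 80)  # a later writer does not disturb the old reader

seen_by_old_reader = store.read("balance", old_snapshot)
seen_by_new_reader = store.read("balance", store.snapshot())
```

Readers never block writers here, because each reader only looks at versions that existed when its snapshot was taken.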

1.6.3. Common database models

Relational model, e.g., SQL.
Document model.
Network model.
key-value, e.g., NoSQL.

1.6.4. Common storage engines

MDBX: Ultra-fast key-value embedded database with ACID and MVCC supported.
LevelDB: Google-developed key-value store using log-structured merge-tree for high write throughput.
RocksDB: Meta’s fork of LevelDB, optimized for fast storage.
LSM-based DBs, e.g., BadgerDB: optimized for write-heavy workloads with log-structured merge-tree.
BoltDB: Go-written key-value database with optimized B+ tree, ACID supported.
LMDB: memory-mapped key-value store with ACID and MVCC supported.

1.7. Reth

A Rust implementation of an Ethereum full node; allows users to interact with the Ethereum blockchain.
An execution layer that implements all Ethereum engine APIs.
Modularity: every component is built as a library.
Performance: uses Erigon staged-sync node architecture and other Rust libraries (e.g., Alloy, revm); tests and optimizes on Foundry.
Database/Storage engine: MDBX.

1.8. Why gas per second as the performance metric

More nuanced than TPS.
Allows for a clear understanding for the capacity and efficiency.
Helps assess the cost implications, e.g., DoS attacks.

1.9. EVM cost models

Determines the computational and storage costs for the execution.
Key aspects: gas, gas cost (for each operation), gas price (in Wei), gas limit.
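How these quantities combine can be shown with a simplified fee calculation (a flat gas price, ignoring the base fee and tip split introduced by EIP-1559). The function name is hypothetical; 21,000 gas is the cost of a plain ETH transfer.

```python
GWEI = 10**9  # 1 gwei = 10^9 wei


def tx_fee_wei(gas_used, gas_price_wei, gas_limit):
    """Fee = gas used x gas price; execution cannot use more gas than the limit."""
    if gas_used > gas_limit:
        raise ValueError("out of gas: execution needs more gas than the limit allows")
    return gas_used * gas_price_wei


# A plain transfer at an assumed gas price of 20 gwei with a 30,000 gas limit.
fee = tx_fee_wei(gas_used=21_000, gas_price_wei=20 * GWEI, gas_limit=30_000)
fee_in_gwei = fee // GWEI
```

Gas measures the work, the gas price converts that work into a cost, and the gas limit caps how much work a transaction may consume.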

1.10. TPC benchmark

Standardized performance tests for transaction processing and databases, e.g., how many transactions a system can process in a given period.
Offer benchmarks for different scenarios, e.g., TPC-C for online transaction processing.

1.11. State growth

State: the set of data for building and validating new Ethereum blocks.
State growth: the accumulation of new account and new contract storage.

1.12. JIT (Just-In-Time) and AOT (Ahead-of-Time) EVM

JIT: convert bytecode to native machine code just before execution to bypass the VM’s interpretative process.
AOT: compile the highest demand contracts and store them on disk, to avoid untrusted bytecode abusing native-code compilation.

1.13. Actor model

A paradigm/framework for designing distributed systems.
Actor: each actor is an independent entity that can receive, process and send messages, create new actors, or modify its own state.
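A minimal sketch of the idea, assuming a single-process setting where a thread plus a queue stands in for an actor and its mailbox. The class and message names are made up for illustration; real actor frameworks add supervision, addressing and distribution.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, a mailbox, and one thread that processes messages."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self._count = 0  # only the actor's own thread touches this state
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message, reply_to=None):
        self.mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self.mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)  # reply by sending a message back

    def join(self):
        self._thread.join()


actor = CounterActor()
for _ in range(3):
    actor.send("increment")
reply = queue.Queue()
actor.send("get", reply_to=reply)
count = reply.get()
actor.send("stop")
actor.join()
print(count)
```

Because messages are processed one at a time from the mailbox, the counter needs no lock even though several senders could call `send` concurrently.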

1.14. Storage trie

Each contract account has its own storage trie, which is usually stored in a KV database.

1.15. Serverless database

Allow developers to focus on writing queries without managing database servers.
Automatically scales up or down based on the workload.
Pay-per-use pricing.

2. Reth scaling plan

Current status (April 2024): achieves 100-200 MGas/s (megagas per second) during live sync, including sender recovery, transaction execution and block trie calculation.
The scaling plan does not involve solving state growth.

2.1. Vertical scaling (2024)

Optimize how each system handles transactions and data.

2.1.1. JIT/AOT EVM

Reduce EVM interpreter overhead to speed up single-threaded transaction processing.
The interpreter overhead accounts for $\approx$ 50% of EVM time.
Released in June 2024.

Figure 1: The JIT/AOT compiler (source)

2.1.2. Parallel EVM

Utilize multiple cores during EVM execution.
<80% of historical transactions have non-conflicting dependencies.
Historical sync: can calculate the best parallelization schedule offline; an early attempt is available.
Live sync: combine serial and parallel execution based on static analysis, since Block STM has poor performance during heavy state contention periods; an early attempt is available.

2.1.3. Optimized state commitment

Traditional EVM implementations couple transaction execution and state root computation: the state root is updated whenever a transaction updates a trie. This is slow, since the state root computation has to proceed sequentially from the updated node to the root.
Reth decouples the process: raw state data is stored in KV databases, and each trie is re-built for each block from the databases in the end.
- Pro: can use more efficient databases.
- Con: need to re-calculate the entire trie, which costs >75% of end-to-end block production time.
Optimizations:
- Already re-calculates the storage trie for each updated contract in parallel.
- Can also calculate the account trie when the storage tries are computed.
- Pre-fetch cached trie nodes (cached by the state root computation) by tracking updated accounts and storage, e.g., a part of the trie may remain the same hash.
Going beyond:
- Only calculate the state root every $T$ blocks.
- Lag the state root computation a few blocks behind to advance executions.
- Use a cheaper encoder and hash function (Blake3).
- Use wider branch nodes.

2.2. Horizontal scaling (2025)

Spread the workload across multiple systems.

2.2.1. Multi-Rollup (?)

Reduce operational overhead of running multiple rollups.

2.2.2. Cloud-Native nodes.

Deploy the heavy node (e.g., sequencer) as a service stack that can autoscale with compute demand and use cloud storage for persistence.
Similar to serverless database projects, e.g., NeonDB.

2.3. Open questions

Second order effects of above changes, e.g., on light clients.
What are the best, average and worst case scenarios for each optimization?

CMU 15-445 notes: Modern SQL

Tue, 23 Jul 2024 13:05:00 +0200

This is a personal note for the CMU 15-445 L2 notes, along with some SQL commands explained by Claude.ai.

1. Terminology

1.1. SQL and relational algebra

Relational algebra is based on sets (unordered, no duplicates); SQL is based on bags (unordered, allows duplicates).
SQL is a declarative query language; users use SQL to specify the desired result, each DBMS determines the most efficient strategy to produce the answer.

1.2. SQL commands

Data manipulation language (DML): SELECT, INSERT, UPDATE, DELETE.

Data definition language (DDL): CREATE.

CREATE TABLE student (
    sid INT PRIMARY KEY,
    name VARCHAR(16),
    login VARCHAR(32) UNIQUE,
    age SMALLINT,
    gpa FLOAT
);

Data control language (DCL): security, access control.

2. SQL syntax

2.1. Join

Combines columns from one or more tables and produces a new table.

-- All students that get an A in 15-721
SELECT s.name
    FROM enrolled AS e, student AS s
WHERE e.grade = 'A' AND e.cid = '15-721'
    AND e.sid = s.sid

2.2. Aggregation function

AVG(COL), MIN(COL), MAX(COL), COUNT(COL).

Take as input a bag of tuples and produce a single scalar value.

-- Get number of students and their average GPA with a '@cs' login
SELECT AVG(gpa), COUNT(sid) FROM student WHERE login LIKE '%@cs';
-- Get the unique students
SELECT COUNT(DISTINCT login) FROM student WHERE login LIKE '%@cs';

Non-aggregated values in SELECT output must appear in GROUP BY.

-- Get the average GPA in each course
SELECT AVG(s.gpa), e.cid
    FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid;

HAVING: filter output results based on aggregation computation.

SELECT AVG(s.gpa), e.cid
    FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid
HAVING AVG(s.gpa) > 3.9;
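The aggregation queries above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the sample rows and the `enrolled` schema are invented for illustration, the HAVING threshold is lowered to 3.6 so the toy data produces a result, and other DBMSs may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT PRIMARY KEY, name VARCHAR(16), login VARCHAR(32) UNIQUE, age SMALLINT, gpa FLOAT);
CREATE TABLE enrolled (sid INT, cid VARCHAR(16), grade CHAR(1));
INSERT INTO student VALUES
    (1, 'Ann', 'ann@cs', 20, 4.0), (2, 'Bob', 'bob@cs', 21, 3.0), (3, 'Cat', 'cat@ece', 22, 3.5);
INSERT INTO enrolled VALUES
    (1, '15-445', 'A'), (2, '15-445', 'B'), (1, '15-721', 'A'), (3, '15-721', 'A');
""")

# Aggregates collapse a bag of tuples into one scalar per aggregate.
cs_stats = conn.execute(
    "SELECT AVG(gpa), COUNT(sid) FROM student WHERE login LIKE '%@cs'"
).fetchone()

# One output row per group; e.cid may appear in SELECT because it is in GROUP BY.
avg_per_course = conn.execute("""
    SELECT e.cid, AVG(s.gpa)
    FROM enrolled AS e, student AS s
    WHERE e.sid = s.sid
    GROUP BY e.cid
    ORDER BY e.cid
""").fetchall()

# HAVING filters groups after aggregation (WHERE cannot reference AVG).
top_courses = conn.execute("""
    SELECT e.cid, AVG(s.gpa)
    FROM enrolled AS e, student AS s
    WHERE e.sid = s.sid
    GROUP BY e.cid
    HAVING AVG(s.gpa) > 3.6
""").fetchall()

print(cs_stats, avg_per_course, top_courses)
```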

2.3. String operation

Strings are case sensitive and single-quotes only in the SQL standard.
Use LIKE for string pattern matching:
- % matches any sub-string,
- _ matches any one character
Standard string functions: UPPER(S), SUBSTRING(S, B, E).
||: string concatenation.

2.4. Date and time

Attributes: DATE, TIME.
Different DBMS have different date/time operations.

2.5. Output redirection

One can store the results into another table

-- output to a non-existing table
SELECT DISTINCT cid INTO CourseIds FROM enrolled;
-- output to an existing table with the same number of columns and column type
-- but the names do not matter
INSERT INTO CourseIds (SELECT DISTINCT cid FROM enrolled);

2.6. Output control

Use ORDER BY with ASC or DESC to sort the output tuples; otherwise the output could have a different order every time.

Use LIMIT, OFFSET to restrict the output number.

SELECT sid FROM enrolled WHERE cid = '15-721'
ORDER BY UPPER(grade) DESC, sid + 1 ASC
    LIMIT 10 OFFSET 10;  -- output 10 tuples, starting from the 11th tuple
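A small sketch of ORDER BY with LIMIT and OFFSET using Python's sqlite3 module. The seven sample rows and their grades are generated purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolled (sid INT, cid VARCHAR(16), grade CHAR(1))")
# Assumed sample data: sids 1..7 with grades cycling through B, C, A.
conn.executemany(
    "INSERT INTO enrolled VALUES (?, '15-721', ?)",
    [(sid, "ABC"[sid % 3]) for sid in range(1, 8)],
)

# Sort by grade (descending), break ties by sid, then skip 2 rows and return 3.
page = conn.execute("""
    SELECT sid, grade FROM enrolled WHERE cid = '15-721'
    ORDER BY grade DESC, sid ASC
    LIMIT 3 OFFSET 2
""").fetchall()
print(page)
```

Without the ORDER BY clause the same LIMIT/OFFSET could return different rows on each run, which is why paging queries normally sort first.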

2.7. Nested queries

Nested queries are often difficult to optimize.
The inner query can access attributes defined in the outer query.

Inner queries can appear anywhere.

-- Output a column 'one' with 1s, the number of 1s
-- equals to the number of rows in 'student'
SELECT (SELECT 1) AS one FROM student;

-- Get the names of students that are enrolled in '15-445'
SELECT name FROM student
    WHERE sid IN (
        SELECT sid FROM enrolled
        WHERE cid = '15-445'
);

-- Get student record with the highest id
-- that is enrolled in at least one course.
SELECT student.sid, name
    FROM student
    -- the intermediate output is aliased as max_e
    JOIN (SELECT MAX(sid) AS sid FROM enrolled) AS max_e
    -- only select student who has the max_e
    ON student.sid = max_e.sid;

-- the above is the same as below, but the `join` syntax is preferred
SELECT s.sid, name
FROM student AS s, (SELECT MAX(sid) AS sid FROM enrolled) AS max_e
WHERE s.sid = max_e.sid;

Nested query results expression:

ALL: must satisfy expression for all rows in sub-query.
ANY, IN: must satisfy expression for at least one row in sub-query.

EXISTS: at least one row is returned.

-- Get all courses with no students enrolled in
SELECT * FROM course
    WHERE NOT EXISTS(
        SELECT * FROM enrolled
            WHERE course.cid = enrolled.cid
)

-- Get students whose gpa is larger than the highest score in '15-712'
-- and the login has a level > 3
SELECT s.sid, name
    FROM student AS s
WHERE s.gpa > ALL (
    SELECT course.score FROM course
        WHERE course.cid = '15-712'
)
AND s.login IN (
    SELECT login FROM enrolled
    WHERE level > 3
);

2.8. Window functions

Perform sliding calculation across a set of tuples.
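As a hedged illustration, the sketch below ranks students within each course using a window function, run through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (window functions were added then); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrolled (sid INT, cid VARCHAR(16), grade CHAR(1));
INSERT INTO enrolled VALUES
    (1, '15-445', 'B'), (2, '15-445', 'A'), (3, '15-721', 'C'), (4, '15-721', 'A');
""")

# Unlike GROUP BY, a window function keeps every input row and adds a computed
# column: here the rank of each row's grade within its course (partition).
ranked = conn.execute("""
    SELECT cid, sid, grade,
           RANK() OVER (PARTITION BY cid ORDER BY grade ASC) AS grade_rank
    FROM enrolled
    ORDER BY cid, grade_rank
""").fetchall()
print(ranked)
```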

2.9. Common Table Expressions (CTE)

An alternative to windows or nested queries when writing more complex queries.

CTEs use WITH to bind the output of an inner query to a temporary table.

WITH cteName (col1, col2) AS (
    SELECT 1, 2
)
SELECT col1 + col2 FROM cteName;

CMU 15-445 notes: Relational Model & Algebra

Wed, 17 Jul 2024 17:05:00 +0200

This is a personal note for the CMU 15-445 L1 video and CMU 15-445 L1 notes, along with some terminology explained by Claude.ai.

1. Terminology

1.1. Database

An organized collection of inter-related data that models some aspect of the real-world.

1.2. Database design consideration

Data integrity: protect against invalid writes.
Implementation: query complexity, concurrent query.
Durability: replication, fault tolerance.

1.3. Database management system (DBMS)

Software that manages a database.
Allows the definition, creation, query, update and administration of databases.

1.4. Data model

A conceptual, high-level representation of how data is structured.
Defines entities, attributes, relationships between entities and constraints.

1.5. Schema

A concrete implementation of a data model.
Defines tables, fields, data types, keys and rules.
Typically represented by a specific database language.

1.6. Entities and Tables

Entities: conceptual representations of objects in the logical data model.
Tables: physical storage structures in the physical data model.

1.7. Attributes and Fields

Attributes: properties of an entity.
Fields: columns in a database table.

1.8. Logical layer

The entities and attributes the database has.

1.9. Physical layer

How entities and attributes are stored in the database.

1.10. Data manipulation languages (DMLs)

Methods to store and retrieve information from a database.
Procedural: the query specifies the (high-level) strategy the DBMS should use to get the results, e.g., with relational algebra.
Declarative: the query specifies only what data is desired but not how to get it, e.g., with relational calculus (a formal language).

1.11. SQL (Structured Query Language) and relational model

SQL implements the relational model in DBMS and provides a standard way to create, manipulate and query relational databases.
SQL implementations vary and do not strictly adhere to the relational model, e.g., they allow duplicate rows.

2. Relational model

A data model that defines a database abstraction to avoid maintenance overhead when changing the physical layer.
Data is stored as relations/tables.
Physical layer implementation and execution strategy depend on the DBMS implementation.

Figure 1: Relational model concepts (Source)

2.1. A relation

An unordered set that contains the relationship of attributes that represent entities.
Relationships are unordered in the relation.

2.2. A domain

A named set of allowable values for a specific attribute.

2.3. A tuple

A set of attribute values in the relation.
Values can also be lists or nested data structures.
Null: a special value in any attribute which means the attribute in a tuple is undefined.
$n$-ary: a relation with $n$ attributes.

2.4. Keys

Primary key: uniquely identifies a single tuple.
Foreign key: specifies that an attribute (e.g., CustomerID) in one relation (e.g., OrderTable) has to map to a tuple (e.g., the tuple with the same CustomerID) in another relation (e.g., CustomerTable).
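A minimal sketch of both key types using Python's sqlite3 module. The table names follow the example in the text and the rows are invented. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other DBMSs typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE CustomerTable (CustomerID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE OrderTable (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES CustomerTable(CustomerID)
);
INSERT INTO CustomerTable VALUES (1, 'Ann');
INSERT INTO OrderTable VALUES (100, 1);
""")


def try_insert(sql: str) -> str:
    """Run one INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_pk = try_insert("INSERT INTO CustomerTable VALUES (1, 'Bob')")  # primary key already used
dangling_fk = try_insert("INSERT INTO OrderTable VALUES (101, 42)")       # no customer with id 42
valid_order = try_insert("INSERT INTO OrderTable VALUES (102, 1)")        # customer 1 exists
print(duplicate_pk, dangling_fk, valid_order)
```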

3. Relational Algebra

A set of fundamental operations to retrieve and manipulate tuples in a relation.
Each operator takes in one or more relations as inputs, and outputs a new relation; operators can be chained.
It is a procedural language, meaning the execution always follows the query as written, even if a more efficient way exists to get the same result; a better approach is to be more declarative, e.g., SQL's WHERE syntax.
Common relational algebra.
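To illustrate how operators take relations in and produce relations out, here is a toy sketch that models a relation as a Python set of tuples and chains selection, projection and join. The helper names, the positional attribute indexing and the sample data are assumptions for illustration only.

```python
# A relation is modelled as a set of tuples (unordered, no duplicates).
student = {(1, "Ann", 4.0), (2, "Bob", 3.0)}               # (sid, name, gpa)
enrolled = {(1, "15-445"), (2, "15-445"), (1, "15-721")}   # (sid, cid)


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, *indexes):
    """Projection: keep only the attributes at the given positions."""
    return {tuple(row[i] for i in indexes) for row in relation}


def join(left, right, left_idx, right_idx):
    """Join: concatenate tuples whose chosen attributes are equal."""
    return {l + r for l in left for r in right if l[left_idx] == r[right_idx]}


# Names of students enrolled in 15-721. The joined row layout is
# (sid, name, gpa, sid, cid), so cid sits at position 4 and name at position 1.
names_in_721 = project(
    select(join(student, enrolled, 0, 0), lambda row: row[4] == "15-721"),
    1,
)
print(names_in_721)
```

The expression is evaluated exactly in the order written (join, then select, then project), which is the procedural flavor described above; filtering `enrolled` before joining would give the same result with less work.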

Parallel EVM: BEP-130

Sun, 14 Jul 2024 12:38:00 +0200

This is a personal note for BEP-130. BEP-130 is a proposal that introduces a parallel transaction execution mechanism on the BNB Smart Chain (BSC).

1. Blockchain fundamentals

1.1. System contract

Built-in contracts to perform system level operations, e.g., gas fee reward, cross chain communication.
Cannot be executed concurrently since they depend on the execution results of other transactions, e.g., the number of transactions made by an account at some timestamp.

1.2. Transaction execution phases

Block mining phase: transactions are received from the P2P transaction pool and could be invalid.
Block sync phase: the block is confirmed.

2. Design principle

Should always produce the same result as the current sequential execution.
Should be decoupled into existing or new modules with no circular dependency.
Should be configurable based on node hardware resources.
Keep it simple and smart.

3. Workflow

3.1. Dispatch factors

Is the slot idle or occupied?
Is there a same address contract running or pending in this slot?
Has the slot’s pending transactions size reached the max transactions queue size limitation?
Is there a big transaction index gap between the slot’s head transaction and the dispatched transaction?
Is the transaction contract likely to have high gas cost or a conflict rate?

3.2. Slot execution stages

Execute the transaction $Tx_i$ based on a specific worldstate, e.g., the state when the execution starts.
Wait for the finalization of the previous transaction $Tx_{i-1}$.
Detect if there is any conflict between the state read by $Tx_i$ and the state change after the execution of $Tx_i$ starts.
If a conflict is detected, re-execute $Tx_{i}$ based on the latest finalized worldstate.
Finalize the state changed by $Tx_i$ to the latest worldstate.
The state changes are kept within each slot, and are merged to the main StateDB once the execution is done.
The first transaction in a block can be immediately finalized.
If $Tx_i$ and $Tx_{i-1}$ are in the same slot, $Tx_i$ can immediately start conflict detection.
Re-executed transaction can be immediately finalized as it reads the latest worldstate.

3.3. Conflict detection

Detection items: storage key/value pair; account balance; contract content and status.
Overlapping reads without writes, or hard-coded writes without reads, are not conflicts.
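The execution stages and the conflict rule can be sketched as a small single-threaded simulation. This is a simplification under assumptions, not the BEP-130 implementation: speculative execution is simulated by running every transaction against the block-start snapshot, conflicts are detected by comparing the values a transaction read with the latest finalized state, and all names are hypothetical.

```python
def execute(tx, state):
    """Run tx against a state view and record what it read and wrote."""
    reads, writes = {}, {}

    def read(key):
        if key in writes:            # a transaction sees its own writes
            return writes[key]
        reads[key] = state.get(key, 0)
        return reads[key]

    def write(key, value):
        writes[key] = value          # kept local until finalization

    tx(read, write)
    return reads, writes


def run_block(txs, initial_state):
    state = dict(initial_state)
    snapshot = dict(state)
    # Stage 1: speculative execution on the block-start worldstate
    # (a real node would do this concurrently in several slots).
    results = [execute(tx, snapshot) for tx in txs]
    re_executed = []
    # Remaining stages, in transaction order: detect conflicts, re-execute, finalize.
    for i, tx in enumerate(txs):
        reads, writes = results[i]
        if any(state.get(key, 0) != value for key, value in reads.items()):
            reads, writes = execute(tx, state)   # re-run on the latest finalized state
            re_executed.append(i)
        state.update(writes)                     # merge into the main state
    return state, re_executed


def transfer(src, dst, amount):
    def tx(read, write):
        write(src, read(src) - amount)
        write(dst, read(dst) + amount)
    return tx


start = {"alice": 10, "bob": 0, "carol": 5, "dave": 0}
txs = [transfer("alice", "bob", 3), transfer("carol", "dave", 1), transfer("bob", "carol", 2)]
final_state, re_executed = run_block(txs, start)
print(final_state, re_executed)
```

Only the third transfer is re-executed, because it read `bob` and `carol` before the first two transfers finalized their writes; the final state matches sequential execution.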

Parallel EVM: BNB chain

Sun, 07 Jul 2024 22:19:00 +0200

This is a personal note for BNB chain-blog.

1. Blockchain fundamentals

1.1. Why is parallel EVM not easy

Lack of visibility of potential transaction conflict.
Blockchains experience transaction bursts, e.g., >70M transactions per day.

1.2. Parallel EVM ideas

Run multiple EVM instances concurrently on different threads.
Execute transactions independently on each thread and later merge a final state update.
Parallel EVM scheme

1.3. Block STM algorithm

Optimistic parallelism: assigns transactions to various threads.
Software transactional memory (STM): detect conflicts when transactions try to modify the same shared state simultaneously.
Conflict resolution: when conflicts are detected, the offending transactions are discarded without affecting the blockchain state and are re-executed.

2. BNB Parallel EVM 1.0: Infrastructure

Proposal: BEP-130 (2022)
Dispatcher: distributes transactions across threads to optimize throughput.
Parallel execution engine: execute transactions independently on each thread.
Local stateDB: each thread maintains a local stateDB to record state access.
Conflict detection: detect conflicts and re-execute conflicting transactions.
State commit: the finalized results are committed to the global state DB.

3. BNB Parallel EVM 2.0: Performance enhancement

Dispatcher: combine both static and dynamic dispatch strategies.
Execution engine: streaming pipeline to enable smooth transaction processing.
Conflict detection: ensure data integrity while minimizing unnecessary re-execution.
Memory: shared memory pools and light copying techniques to reduce memory footprint.
The overall performance improvement ranges from 20% to 50%.

4. BNB Parallel EVM 3.0: Production

4.1. Hint-based dispatcher

Leverages external hint providers to analyze transactions and generate predictions about potential state access conflicts.
Simple hints include read/write state sets; advanced hints incorporate weak/strong ordering for optimal parallelism.
Conflicting transactions are assigned to the same slot.
Transactions with no conflicts are distributed across different slots.
Conflict detector remains as a backup for handling unforeseen conflicts.
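A rough sketch of how simple read/write-set hints could drive slot assignment. Everything here is an assumption for illustration (the hint format, the conflict rule and the round-robin placement); the actual BNB Chain dispatcher is not described at this level of detail in the source.

```python
def conflicts(a, b):
    """Two transactions conflict if one writes a key the other reads or writes."""
    return bool(a["writes"] & (b["reads"] | b["writes"]) or b["writes"] & a["reads"])


def dispatch(hints, num_slots):
    """Return a slot index per transaction; predicted conflicts share a slot."""
    parent = list(range(len(hints)))  # union-find over transaction indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(hints)):
        for j in range(i + 1, len(hints)):
            if conflicts(hints[i], hints[j]):
                parent[find(j)] = find(i)

    slot_of_group, assignment = {}, []
    for i in range(len(hints)):
        root = find(i)
        if root not in slot_of_group:
            # Spread independent groups across slots in round-robin order.
            slot_of_group[root] = len(slot_of_group) % num_slots
        assignment.append(slot_of_group[root])
    return assignment


hints = [
    {"reads": {"A"}, "writes": {"B"}},
    {"reads": {"C"}, "writes": {"D"}},
    {"reads": {"B"}, "writes": {"E"}},  # reads what transaction 0 writes
    {"reads": {"A"}, "writes": {"F"}},  # shares only a read with transaction 0
]
slots = dispatch(hints, num_slots=3)
print(slots)
```

Transactions 0 and 2 land in the same slot and therefore run in order, while the read-only overlap between 0 and 3 does not force them together. Since hints are only predictions, a conflict detector would still be needed as a fallback.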

4.2. Seamless BNB chain ecosystem integration

Modularization and reconstructing.
Thorough testing and validation.

5. Comparison with other solutions

Solutions	TX dependency check	Conflict resolution	StateDB optimization
BlockSTM	tracks at execution	re-execution	N/A
Polygon	minimal metadata solution	reduced re-execution	N/A
Monad	static analysis	reduced re-execution	Monad DB
Sei	tracks at execution	re-execution	SeiDB
Neon EVM and Solana Sealevel	contract provided	reduced re-execution	depends on Solana
BNBChain	hint info	reduced or eliminated re-execution	Thread local DB

6. Other optimizations

Opcode-level optimization: fine-tuning individual EVM instructions for maximum efficiency.
Compilation optimization: JIT/AOT compilation paradigms; instruction-level parallelism (SIMD).
Database sharding: distribute data across multiple databases.
Concurrent node execution.

Parallel EVM: MegaETH

Thu, 04 Jul 2024 15:34:00 +0200

This is a personal note for MegaETH-blog as well as some terminology explained online, e.g., ethereum.org.

In summary, this blog proposes many challenges when designing a high-performance EVM chain, but does not include any design details of MegaETH itself.

1. Blockchain fundamentals

1.1. Conduit chain

Allows one to deploy a rollup through its Rollups-as-a-service platform within minutes.

1.2. Gas per second

Reflects the amount of computation the blockchain can handle per second.
Different EVM operations cost different amounts of gas, e.g., ADD costs 3 gas.
Block gas limit: ensures that any node can reliably keep up with the rest of the network.

1.3. Target gas per block

Dynamically regulate the amount of computation a block can include.
Gas per second = Target Gas per block / Block time.
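As a worked example of the formula, the sketch below plugs in assumed Ethereum mainnet figures from around mid-2024 (a 15M target gas per block and a 12-second block time). These numbers are assumptions that are not taken from the blog post, and they change with protocol upgrades.

```python
def gas_per_second(target_gas_per_block: int, block_time_seconds: float) -> float:
    """Gas per second = target gas per block / block time."""
    return target_gas_per_block / block_time_seconds


# Assumed figures: 15,000,000 target gas per block, one block every 12 seconds.
ethereum = gas_per_second(15_000_000, 12)
print(f"{ethereum / 1e6:.2f} MGas/s")
```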

1.4. Current blockchain scalability

Figure 1: 2024 blockchain scalability comparison

Throughput: 100MGas/s ($\approx$ 3700 ERC-20 transfers per second) cannot compare to Web2 databases with >1M transactions per second.
Capacity: Complex applications cannot be on-chain, e.g., computing a large Fibonacci number (e.g., $n = 10^8$) would take 55 seconds on opBNB, but just 30 milliseconds in C on a single core.
Delay: Applications that require a fast feedback loop, e.g., high-frequency trading, are not feasible with long block times, e.g., 1s.

1.5. Blockchain node hardware requirements

Lower hardware requirements for full nodes increase decentralization.
Higher requirements increase performance and security.

1.6. L1 and L2 nodes

L1 nodes are homogeneous; each node performs identical tasks, i.e., transaction consensus and execution without specialization.
L2 nodes are heterogeneous; different nodes perform specific tasks, e.g., sequencer node determines the transaction order, prover nodes rely on accelerators to enhance proof generation.

1.7. Verifying a block

Re-execute the transactions in the block.
Apply the changes to the Ethereum state trie.
Calculate the new root hash and compare it with the root hash provided by the block.

1.8. Maximum extractable value (MEV)

Validators maximize their profitability by favorably ordering transactions.

1.9. Proposer-builder separation (PBS)

Block builders are responsible for creating blocks and offering them to the block proposer in each slot.
Block proposers cannot see the contents, but simply choose the most profitable one and pay a fee to the block builder before broadcasting the block.
PBS makes it harder for block builders to censor transactions, and to outperform individuals at MEV extraction.

1.10. Live and historical sync

Live (online): continuously update a node with the latest data.
Historical (offline): synchronize a node by downloading and processing data up to a certain point.
Historical sync has much higher TPS than live sync, e.g., 10x, since historical sync can perform batch processing and does not have network latency.

1.11. Portal Network

An in-development p2p network for serving historical data where each node stores a small piece of Ethereum’s history.
Light nodes do not need to trust full nodes.
The entire history exists distributed across the network.

1.12. Verkle tree

Stateless clients rely on a witness that arrives with the block for PoI, rather than maintaining their own local trie.
Witness: the minimal set of data that prove the values of the state that are being changed by the transactions in a block.

A Merkle tree witness is too large to be broadcast between peers: the witness is a path connecting the data from the leaves to the root, and verifying the data also requires the hashes of all sibling nodes (to compute the parent hashes).
Verkle trees reduce the witness size by shortening the distance between leaves and eliminating the need to provide sibling nodes; using a polynomial commitment scheme (see Ethereum MPT post for explanation) allows the witness to have a fixed size.
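To show why a Merkle witness needs the sibling hashes, here is a toy sketch with a binary SHA-256 Merkle tree. It is an assumption-laden simplification: Ethereum uses a hexary Merkle Patricia Trie with Keccak-256, and the function names below are made up.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from hashed leaves up to the root (leaf count must be a power of two)."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def make_witness(levels, index):
    """Collect the sibling hash at every level on the path from leaf `index` to the root."""
    witness = []
    for level in levels[:-1]:
        witness.append(level[index ^ 1])  # the sibling differs in the lowest bit
        index //= 2
    return witness


def verify(leaf, index, witness, root):
    """Recompute the parent hashes along the path and compare with the known root."""
    node = h(leaf)
    for sibling in witness:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


leaves = [b"a", b"b", b"c", b"d"]
levels = build_levels(leaves)
root = levels[-1][0]
witness = make_witness(levels, 2)
print(len(witness), verify(b"c", 2, witness, root), verify(b"x", 2, witness, root))
```

The witness holds one sibling hash per tree level, so it grows with the tree depth (and, in a hexary trie, with the number of siblings per node), which is the cost Verkle trees aim to remove.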

1.13. Node storage

High disk space is the main barrier to a full node access, due to the need to store large chunks of Ethereum state data to process new transactions.
Using cheap hard drives to store old data cannot keep up with new blocks.
Clients should find new ways to verify transactions without relying on looking up local databases.

1.13.1. History expiry

Nodes discard state data older than X blocks with weak subjectivity checkpoints, i.e., a genesis block close to the present.
Nodes can request historical data from peers with Portal Network, e.g., altruistic nodes that are willing to maintain and serve historical archives, e.g., DAO.
Does not fundamentally change how Ethereum node handles data.
Controversial because it could introduce new censorship risks if centralized organizations are providing historical data.
EIP-4444 is under active discussion regarding community management.

1.13.2. State expiry

Remove state from individual nodes if it has not been accessed recently.
Inactive accounts are not deleted, but stored separately from the active state and can be resurrected.
A leading approach requires adding timestamps to the account address.
The responsibility of storing old data may also be moved to centralized providers.

1.13.3. Statelessness

Weak statelessness: only block producers need access to full state data.
Weak statelessness requires Verkle trees and proposer-builder separation.
Strong statelessness: no nodes need access to the full state data.
In strong statelessness, witnesses are generated by users to declare accounts related to the transaction; not a part of Ethereum’s roadmap.

1.14. Software transactional memory (STM)

A concurrency control mechanism to control access to shared memory in software.
A transaction refers to a piece of code executing a series of reads and writes to the shared memory.
Transactions are isolated; changes made by one transaction are not visible to others until the transaction commits.
When a conflict is detected, e.g., two transactions try to modify the same memory, one transaction is rolled back.

1.15. Block-STM

A parallel execution engine to schedule smart contract transactions based on STM.
Transactions are grouped in blocks; every execution of the block must yield a deterministic and consistent outcome.

2. What is MegaETH

An EVM-compatible L2 blockchain with Web2-level real-time processing and publishing, i.e., millisecond-level response times under heavy load.
Main idea: delegate security and censorship resistance to base layers, e.g., Ethereum, to make room for L2 optimization.

2.1. Node specialization

sequencer: only one active sequencer at any time to eliminate the consensus overhead.
full node: receives state diffs from the sequencer via a p2p network and applies them to update local states; does not re-execute transactions, and only validates the block indirectly using proofs provided by the provers.
provers: validate the block asynchronously using the stateless validation scheme.
Endgame, Vitalik 2021: Node specialization ensures trustless and highly decentralized block validation (more provers), even though block production becomes more centralized (one sequencer).

2.2. Design philosophy

Reth (Rust implementation of the Ethereum protocol) is bottlenecked by the MPT update in a live sync setup, even with a powerful sequencer.
Measure, then build: first get insights from real problems, then design techniques to address all problems simultaneously.
Prefer clean-slate, as addressing any bottleneck in isolation rarely results in significant end-to-end performance improvement.

3. MegaETH challenges

Figure 2: A transaction life-cycle.

State synchronization requires high data compression given limited network bandwidth.
Updating the hash root requires intensive disk I/O operations, which cannot be sped up effectively by optimized smart-contract compilers.
Cannot easily raise block gas limit without properly repricing opcodes that do not benefit from optimized compilation.
Parallelism is low for long dependency chains.
The actual user experience depends heavily on the infrastructure, e.g., RPC nodes, indexers.
Support transaction priorities, e.g., critical transactions should be processed without queuing delays.

My German course

Thu, 04 Jul 2024 13:36:00 +0200

1. What happened

Since this Monday I have been taking an intensive German course. The course runs from 8:30 to 11:45 every day. Because I need almost an hour to get from my apartment to the school, I have to get up before 7 and run to the bus stop. After the course I take the bus to the office and work until dinner. Then I go back home and do some housework. I go to bed at around 11.

2. I cannot write good English

As I write these sentences here, I notice that I can no longer write complex sentences in English.

This is interesting: I am still in the starting phase, I cannot speak fluently at all and I make grammar mistakes everywhere. However, now when I try to type something in English, the German translation pops up in my mind first. If I don't know the German, I have a problem writing it down in English. As a result, my written English falls back to almost the same level as my German.

3. I cannot introduce myself

All my classmates come from diverse backgrounds. At least 5 are Ukrainian, and at least 2 are from countries that I have never heard of before. English is no longer a universal language, but with my broken German I can barely chat.

Even with English, I have struggled a lot to introduce myself. In the past decade, I never had this issue since everyone around me shared a similar experience. I could easily position myself: which education I had, which university I went to, what work I am doing, what issues I have, etc. But this coordinate system no longer works in this class. I only confuse others if I mention I am a PhD student. So you are a student, they ask, why would they pay a student?

Once I asked a classmate what she does on weekends. She paused, and said: because you are working, the weekend is different for you, but I don't work, so there is no difference for me. Before this I had never thought of such a bias in my question.

More and more often recently, I realize how limited my perspective is. I did not know what life with a social worker is like. I did not know that Afghans can often speak Turkish. I did not know what my lecturer thinks when he works as a security guard at night. I have been so caught up in my own world that I am unable to ask a meaningful question of people I don't understand. I want to know more about the differences in this world, but I don't know how.

Ethereum Virtual Machine

Sun, 30 Jun 2024 11:29:00 +0200

This is a personal note of EVM, resources are from:

1. Terminology

EVM: a decentralized virtual environment that executes code consistently and securely across all Ethereum nodes.
Gas: used to measure the computational effort required to execute smart contracts.
Ether(ETH): the native cryptocurrency in Ethereum; used to incentivize computation.
Wei: the smallest subdenomination of Ether; 1 Ether = $10^{18}$ Wei.
State: A modified Merkle Patricia Trie to keep all accounts linked by hashes and reducible to a single root hash stored on the blockchain.
State transition function: Y(S, T)=S': produces a deterministic new valid state (S') given an old valid state (S) and a new set of valid transactions (T).
Transactions: signed instructions from accounts, includes
- a contract creation to create a new contract account containing compiled contract bytecode, or
- a message call to a contract to execute the bytecode.
Proof of work: a spam deterrence mechanism; demonstrate the potential for a basic data channel to carry a strong economic signal without relying on trust.
Fork: a disagreement between nodes as to which root-to-leaf path down the block tree is the best blockchain.
Chain Id ($\beta$): distinguish between diverged blockchains (EIP-155).

Figure 1: The EVM structure

1.1. Motivation

To facilitate transactions between individuals who would otherwise have no means to trust one another.
To enforce a rich and unambiguous agreement autonomously (crypto-law system).

1.2. Blockchain paradigm

\begin{aligned} \sigma_{t+1} & \equiv \Pi\left(\boldsymbol{\sigma}_t, B\right) \\ B & \equiv\left(\ldots,\left(T_0, T_1, \ldots\right), \ldots\right) \\ \Pi(\boldsymbol{\sigma}, B) & \equiv \Upsilon\left(\Upsilon\left(\boldsymbol{\sigma}, T_0\right), T_1\right) \ldots \end{aligned}

$\sigma$: a valid state between two transactions.
$B$: a block including a series of transactions.
$\Upsilon$: the Ethereum state transition function.
$\Pi$: the block-level state transition function.
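The definitions above say that the block-level function $\Pi$ is just $\Upsilon$ folded over the block's transactions in order. A toy sketch, assuming the state is a plain balance mapping and a transaction is a (sender, receiver, value) transfer; both are simplifications for illustration, not the real Ethereum data structures.

```python
from functools import reduce


def upsilon(state: dict, tx: tuple) -> dict:
    """Toy state transition: apply one transfer and return a new state."""
    sender, receiver, value = tx
    if state.get(sender, 0) < value:
        raise ValueError("invalid transaction: insufficient balance")
    new_state = dict(state)  # the old state is left untouched
    new_state[sender] = new_state.get(sender, 0) - value
    new_state[receiver] = new_state.get(receiver, 0) + value
    return new_state


def pi(state: dict, block: list) -> dict:
    """Toy block-level transition: upsilon(upsilon(state, T_0), T_1) ..."""
    return reduce(upsilon, block, state)


sigma_t = {"alice": 10, "bob": 0}
block = [("alice", "bob", 4), ("bob", "carol", 1)]
sigma_next = pi(sigma_t, block)
print(sigma_next)
```

The second transfer is only valid because it is applied to the state produced by the first one, which is why the order of transactions inside a block matters.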

1.3. Ethereum Transaction execution

A user (an externally owned account, EOA) signs a transaction, including the sender, receiver (the contract address), Ether value, Gas limit and Gas price.
The transaction is broadcast to the Ethereum network.
Once a validator receives the transaction, it first performs sanity checks, e.g., signature validation, balance check.
Upon passing the validation, a transaction is included in a block and executed.
1. Initialization: PC set to the start of the contract code; Gas limit; empty stack, memory; contract state trie loaded to the storage.
2. Execution: locally executes each bytecode and modifies stack (PUSH), memory (MSTORE) and storage (SSTORE); modifies the global state tree (CALL).
3. Abortion: if the gas is used up, all state changes during the execution are reverted.
After the execution is finished, the validator assembles the block and proposes the new block.
If a consensus is reached, the block is appended to the blockchain, and other nodes verify the block and update their global states accordingly.
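The initialization, execution and abortion steps can be sketched with a tiny stack machine. This is a toy under stated assumptions: only three opcodes are modelled, instructions are Python tuples rather than bytecode, and the SSTORE cost of 100 is invented (PUSH and ADD use 3 gas as in the EVM).

```python
class OutOfGas(Exception):
    pass


# PUSH and ADD cost 3 gas as in the EVM; the SSTORE cost is an illustrative placeholder.
GAS_COST = {"PUSH": 3, "ADD": 3, "SSTORE": 100}


def run(code, storage, gas_limit):
    """Execute a list of (opcode, *args); return (storage, success)."""
    stack, gas = [], gas_limit       # initialization: empty stack, full gas
    working = dict(storage)          # buffer changes so they can be discarded
    try:
        for op, *args in code:
            gas -= GAS_COST[op]
            if gas < 0:
                raise OutOfGas(op)
            if op == "PUSH":
                stack.append(args[0])
            elif op == "ADD":
                stack.append(stack.pop() + stack.pop())
            elif op == "SSTORE":
                key, value = stack.pop(), stack.pop()  # key is on top of the stack
                working[key] = value
    except OutOfGas:
        return storage, False        # abortion: all state changes are reverted
    return working, True


# Store 2 + 3 at storage key 0; the program needs 3 + 3 + 3 + 3 + 100 = 112 gas.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 0), ("SSTORE",)]
enough_gas = run(program, {}, gas_limit=200)
too_little_gas = run(program, {}, gas_limit=50)
print(enough_gas, too_little_gas)
```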

1.4. World state $\sigma$

A mapping between 160-bit addresses and account states, maintained in a modified Merkle Patricia tree (MPT), serialized as RLP, stored in an off-chain database backend.
MPT benefits: the root node depends on all internal data; allows any previous state with known root hash to be recalled as the tree is immutable.
Merkle Patricia Trie representation of state data across blocks

1.4.1. The account state $\sigma[a]$

nonce: the number of transactions the address has sent, or the number of contracts the address has created.
balance: the number of Wei owned by the address.
storageRoot: a 256-bit hash of the root node of an MPT which encodes the account storage, i.e., the contract storage.
codeHash: the hash of the EVM code, i.e., the contract bytecode, which is executed if the address receives a message call.

Viewed: Interlaken

Sat, 29 Jun 2024 23:52:00 +0200

1. Where did we go

We rented a car and drove to Interlaken.
We visited Aareschlucht and Brienz, and did some short strolls around.

Figure 1: The view outside the car

2. Why did we go

According to the unpredictable MeteoSwiss forecast, this Saturday was supposed to be the only weekend day in 2 or 3 weeks with good weather, in limited locations, for hiking.
But this was no longer the case when we arrived at the Rothorn Kulm restaurant via the steam train.

Figure 2: The weather at the Rothorn Kulm

3. What did we do

We first went on a tour of Aareschlucht, where we experienced both strong cold wind and gentle warm wind.
We then did a short half-hour hike back to our starting point.
Then we drove to Brienz and had a relaxing lunch by the lake, where I ate a piece of watermelon.
We took the 1-hour steam train to the Rothorn Kulm; when we arrived, it started to get foggy, windy and rainy.
We recharged with hot drinks and snacks in the restaurant, and took a short stroll around it.
We went back to the restaurant to escape the cold (yes, it is the end of June) and waited for the steam train to go down.
On the train ride down, the sun came out a bit, and the view was clear and nice again!

Figure 3: The water in Aareschlucht

Figure 4: The view on the train

Parallel EVM: Sei-v2

Fri, 28 Jun 2024 21:25:00 +0200

This is a personal note for Sei-v2-blog as well as some terminology explained by EVM GPT.

1. Blockchain fundamentals

1.1. Mainnet and Testnet

Mainnet: real transactions occur and have real-world value; any operation is final and irreversible.
Testnet: a sandbox environment with test cryptocurrencies without real-world value.

1.2. Mainnet Alpha and Beta

Alpha: tests core functionality and gathers initial feedback in a live environment.
Beta: more stable and feature-complete, but still needs more testing.

1.3. Layer 1 and Layer 2

L1: the main network where all transactions are processed and the primary chain is maintained.
L2: secondary frameworks on top of L1 chain, aimed to enhance scalability without compromising the security of the L1.

1.4. Rollups

Processes transactions off-chain and periodically submits a summary (rollup) to L1.
Optimistic rollups: assume transactions are valid by default and use a challenge period to allow disputes.
- Examples: Optimism, Arbitrum.
ZK-rollups: use zero-knowledge proofs to validate transactions.
- Examples: zkSync, StarkNet.

1.5. State channels

Allow participants to conduct numerous off-chain transactions, with only the final state recorded on the L1.
Examples: Bitcoin Lightning Network.

1.6. Sidechains

Independent blockchains running parallel to the main chain, with their own consensus and security.
Examples: Polygon.

1.7. Ethereum and EVM

Ethereum: a blockchain ecosystem, includes the blockchain, consensus mechanism, smart contracts, native cryptocurrency.
EVM: the runtime environment to execute smart contracts in Ethereum.

1.8. IAVL tree

AVL tree: a self-balancing binary search tree where the heights of the left and right subtrees of any node differ by at most one.
IAVL tree: immutable AVL tree; node cannot be changed once it is added.

1.9. CosmWasm contract

Allows developers to write fast and portable smart contracts in WebAssembly.
Designed to be interoperable within the Cosmos ecosystem, a network of independent blockchains connected via the Inter-Blockchain Communication (IBC) protocol.

1.10. Cosmos Ecosystem

A network of independent, interoperable blockchains designed to create an Internet of Blockchains.
Decouples the consensus (BFT consensus engine) and networking (IBC protocol) layers from the application layer.
Cosmos SDK: a modular framework for building application-specific blockchains efficiently.
Cosmos Hub: the first blockchain in the Cosmos network, serves as a central hub to connect multiple blockchains via IBC.

1.11. Blockchain layers

Infrastructure layer: the physical devices that support the network; and the underlying communication protocols for data transfer between nodes.
Data layer: the distributed ledger and the storage methods.
Consensus layer: the protocols, validators and miners; determines the transaction order.
Execution layer: smart contracts and virtual machines; determines the transaction update.
Application layer: dApps to provide service and user interfaces.
Governance layer: the community decision-making process and proposals.
Security layer: cryptographic primitives and security protocols to avoid attacks.

1.12. Optimistic parallelization

Multiple transactions are processed in parallel under the assumption that they will not conflict with each other; necessary corrections are made afterwards if conflicts are detected.
Increases throughput if conflicts are handled well.

1.13. Integrated and Modular blockchain

Integrated: all components, e.g., execution layer, consensus mechanism, networking are tightly coupled; faster internal communication but lower flexibility and scalability.
Modular: allows independent upgrades of different components; enhances scalability.

1.14. EVM Execution and storage layer

Execution: responsible for running smart contracts and processing transactions.
Storage: store all blockchain data, e.g., accounts, smart contract states, transaction history.

1.15. Block time and finalize time

Block: the average time for a new block to be added.
Finalize: the period after which a block is considered irreversible.
Faster block times often imply cheaper transaction fees due to increased transaction throughput and less block competition.

1.16. Blockchain audit

A review of a blockchain to ensure its security and functionality.

2. What is Sei

On mainnet beta since August 2023.
Consistently finalizes blocks at 390ms; the fastest chain in existence.
Consistently sees activity of >45 TPS (transactions per second); the second highest number of successful transactions per second.
Allows for CosmWasm smart contracts written in Rust; more execution environments, like the EVM, are the biggest request.

3. What is Sei v2

The first fully parallelized EVM.
Backwards compatibility of EVM smart contracts.
Optimistic parallelization; supports parallelization without requiring developers to define any dependencies.
Improves the storage layer to mitigate state bloat and to speed up reads/writes and state sync for new nodes.
Seamless composability between different execution environments.
Offers 28,300 batched transactions per second of throughput; 390ms block times and 390ms finality; far cheaper per-transaction costs.
Once audits are complete, the upgrade will be released on a public testnet in Q1 2024 and deployed to mainnet in H1 2024.

3.1. Backwards compatibility

Ethereum contracts can be seamlessly deployed on Sei v2 with no code changes.
Users can send an Eth transaction to the Ethereum contract on Sei v2 via the same interface, e.g., MetaMask, Hardhat.
Sei v2 imports Geth (a Go EVM implementation) to process the Eth transaction, and converts the result to Sei storage.

3.2. Optimistic parallelization

Sei lets smart contract developers optionally declare the state their contracts use; Sei v2 removes this need.
The Sei v2 chain optimistically runs all transactions in parallel; when it hits conflicts, i.e., transactions touching the same state, it tracks which parts of storage each transaction touches.
Transactions touching different parts are rerun in parallel; transactions touching the same state are rerun sequentially.
This continues recursively until there are no more conflicts.
Since the transactions are ordered in a block, this process is deterministic.
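
To make the idea concrete, here is a minimal sketch of optimistic execution followed by in-order validation. It is not Sei's implementation: the `Tx` transfer type, `execute` and `RunBlock` are hypothetical names, senders and receivers are assumed to differ, and conflicting transactions are simply rerun one by one instead of recursively in parallel. It does show why the result is deterministic: commits always happen in block order.

```go
package main

import (
	"fmt"
	"sync"
)

// Tx is a hypothetical transaction that moves Amount from From to To.
type Tx struct {
	From, To string
	Amount   int
}

// result records what one execution read and the writes it wants to make.
type result struct {
	reads  map[string]bool
	writes map[string]int
}

// execute runs tx against a read-only view of state and buffers its writes.
func execute(tx Tx, state map[string]int) result {
	r := result{
		reads:  map[string]bool{tx.From: true, tx.To: true},
		writes: map[string]int{},
	}
	if state[tx.From] >= tx.Amount { // insufficient balance: no writes
		r.writes[tx.From] = state[tx.From] - tx.Amount
		r.writes[tx.To] = state[tx.To] + tx.Amount
	}
	return r
}

// RunBlock executes all txs in parallel on the same snapshot, then commits
// them in block order, rerunning any tx whose reads were overwritten by an
// earlier tx. It returns the number of reruns.
func RunBlock(txs []Tx, state map[string]int) (reruns int) {
	results := make([]result, len(txs))
	var wg sync.WaitGroup
	for i, tx := range txs {
		wg.Add(1)
		go func(i int, tx Tx) {
			defer wg.Done()
			results[i] = execute(tx, state) // state is only read here
		}(i, tx)
	}
	wg.Wait()

	dirty := map[string]bool{} // keys written by earlier txs in this block
	for i, tx := range txs {
		res := results[i]
		conflict := false
		for k := range res.reads {
			if dirty[k] {
				conflict = true
				break
			}
		}
		if conflict {
			res = execute(tx, state) // rerun on the up-to-date state
			reruns++
		}
		for k, v := range res.writes {
			state[k] = v
			dirty[k] = true
		}
	}
	return reruns
}

func main() {
	state := map[string]int{"a": 10, "b": 0, "c": 5, "d": 0}
	txs := []Tx{{"a", "b", 3}, {"c", "d", 2}, {"b", "a", 1}}
	fmt.Println(RunBlock(txs, state), state) // 1 map[a:8 b:2 c:3 d:2]
}
```

The first two transfers touch disjoint accounts and commit as executed; the third reads accounts the first one wrote, so it is rerun.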

3.3. SeiDB

Sei uses a vanilla database layer composed of an IAVL tree, which is less efficient in terms of storage and latency.
Sei v2 breaks the single IAVL tree into 2 components:
- state store: provide low latency direct access to raw key-value pairs to remove the overhead of redundant metadata and disk usage; uses a write-ahead log to help event recovery.
- state commitment: use an in-memory IAVL tree to help validators reach consensus faster.
After benchmarking, Sei v2 replaces GoLevelDB with PebbleDB for better read/write in multi-threaded access.

3.4. Interoperability

Sei v2 processes different transactions, e.g., CosmWasm, EVM, in a uniform way, and then forwards them to different storage sections.

Watched: Argo (2012)

Wed, 26 Jun 2024 21:37:00 +0200

Figure 1: Argo poster

1. Why did I watch it

This week, I found myself craving a movie with a “Homeland” theme. Then the name of Argo came to me. I saw its name a decade ago (I am too old!) on some movie magazine cover.

2. What is it about

It has little to do with “Homeland”. Instead, it blends action, comedy and drama based on a true story, although none of the dramatic conflicts really happened in history, afaik. It is about how a selfless agent, backed by an entire intelligence system that behaved incredibly efficiently and an ally that behaved incredibly generously, rescued a group of compatriots, who also behaved incredibly bravely, from a dangerous place. My boyfriend said it is quite “Hitchcock”. Every tension resolves safely in the end: the phone will be picked up at the last second, the plane will take off at the last second, even the plane tickets will magically show up at the last second (Swissair yeah!).

3. Did I like it

I guess? The filming is good, as are the narration and the performances. There were some moments where I was not fully convinced, but I accept them if that was the history. What really amazed me was finding out that the actor who played Tony was also the director, and also co-wrote “Good Will Hunting (1997)”, what a genius! But I guess I preferred “Civil War (2024)” (my last movie) to this one. The reason? I felt “Argo” could have delved deeper into the absurdities of war rather than solely focusing on a heroic narrative.

Figure 2: Good Will Hunting poster

Figure 3: Civil War poster

Go patterns

Tue, 25 Jun 2024 21:25:00 +0200

This is a personal note for the Russ Cox guest lecture.

1. Concurrency vs Parallelism

Concurrency: write a program to handle a lot of things at once
- not necessarily faster
Parallelism: the program itself can do a lot of computations at once

2. Use goroutines for states

2.1. Matching a regex

return whether a given string matches a regex: it starts with ", contains arbitrary escape sequences and ends with "

unclear logic: store states in the data

state := 0
for {
    c := read()
    switch state {
    case 0:
        // first char must be "
        if c != '"' {
            return false
        }
        state = 1 // match the next char
    case 1:
        // a closing " completes the match
        if c == '"' {
            return true
        }
        if c == '\\' {
            state = 2
        } else {
            // stay in state 1 to match the next char
            state = 1
        }
    case 2:
        // read the escaped char, discard it and go back to state 1
        state = 1
    }
}

clear logic: store states in the code

// no variable to store state
if read() != '"' {
    return false
}
var c rune // c is a Unicode code point; rune is an alias for int32
for c != '"' {
    c = read()
    if c == '\\' {
        read() // skip the next char
    }
}
return true
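
For reference, the state-in-code version can be turned into a small runnable program. This is a sketch with assumptions of my own: the `matchQuoted` name, reading from a `strings.Reader`, rejecting input that ends early, and requiring the closing quote to be the last character are not from the lecture.

```go
package main

import (
	"fmt"
	"strings"
)

// matchQuoted reports whether s is a double-quoted string in which a
// backslash escapes the following char. The control flow is the state.
func matchQuoted(s string) bool {
	r := strings.NewReader(s)
	// read returns the next rune, and false once the input is exhausted.
	read := func() (rune, bool) {
		c, _, err := r.ReadRune()
		return c, err == nil
	}
	if c, ok := read(); !ok || c != '"' {
		return false
	}
	for {
		c, ok := read()
		if !ok {
			return false // input ended before the closing quote
		}
		switch c {
		case '"':
			return r.Len() == 0 // the closing quote must be the last char
		case '\\':
			if _, ok := read(); !ok {
				return false // dangling backslash
			}
		}
	}
}

func main() {
	inputs := []string{`"Hello, \"World\""`, `"你好"`, `"unterminated`, `"a\"`}
	for _, s := range inputs {
		fmt.Println(s, matchQuoted(s))
	}
}
```

The first two inputs match; the last two do not, because the closing quote is either missing or escaped.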

2.2. When the state variable cannot be avoided

the function needs to return the state

type quoter struct {
    state int
}

func (q *quoter) Init() {
    q.state = 0
}

// process each char based on the current state
func (q *quoter) Write(c rune) Status {
    switch q.state {
    case 0:
        if c != '"' {
            return BadInput
        }
        q.state = 1
    case 1:
        if c == '"' {
            return Success
        }
        if c == '\\' {
            q.state = 2
        } else {
            q.state = 1
        }
    case 2:
        q.state = 1
    }
    return NeedMoreInput
}

use additional goroutines to hold states

type quoter struct {
    char   chan rune
    status chan Status
}

func (q *quoter) Init() {
    q.char = make(chan rune)
    q.status = make(chan Status)
    // need to make sure why and when the goroutine will exit
    go q.parse()
    // blocks until it receives an initial status from parse(),
    // i.e., NeedMoreInput, to ensure that parse() is ready
    // before Write() is called
    <-q.status
}

// Write sends the next char to q.char, which will be received by parse();
// the returned status is the public state visible to the user
func (q *quoter) Write(c rune) Status {
    q.char <- c
    // wait for the result
    return <-q.status
}

func (q *quoter) parse() {
    if q.read() != '"' {
        q.status <- SyntaxError
        return
    }
    var c rune
    for c != '"' {
        c = q.read()
        if c == '\\' {
            q.read()
        }
    }
    q.status <- Done
}

// a helper function used in parse() to return the next char in q.char
func (q *quoter) read() rune {
    q.status <- NeedMoreInput
    return <-q.char
}

func main() {
    q := &quoter{}
    q.Init()

    input := `"Hello, \"World\""`
    for _, c := range input {
        // stop at the first final status: parse() has exited by then,
        // so a further Write() would block forever
        if status := q.Write(c); status != NeedMoreInput {
            fmt.Println(status)
            break
        }
    }
}

check goroutine blockage
- Ctrl-\ sends SIGQUIT
- use the HTTP server’s /debug/pprof/goroutine if importing net/http/pprof
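
A third option, not from the lecture, is to dump the goroutine stacks from inside the program with the standard `runtime/pprof` package. The sketch below assumes a hypothetical helper named `goroutineDump`; the blocked goroutine in `main` is only there to have something to look at.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// goroutineDump returns the stack traces of all current goroutines.
func goroutineDump() string {
	var buf bytes.Buffer
	// debug=2 prints the stacks in the same form as an unrecovered panic.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "goroutine dump failed: " + err.Error()
	}
	return buf.String()
}

func main() {
	stuck := make(chan int)
	go func() { <-stuck }()            // blocks forever: nobody sends on stuck
	time.Sleep(100 * time.Millisecond) // give the goroutine time to block
	fmt.Print(goroutineDump())
}
```

In the dump, the blocked goroutine shows up with a wait reason such as chan receive, which points at the line where it hangs.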

3. Pattern 1: publish/subscribe server

the information goes one way: server -> client
close a channel to signal no new values will be sent

prefer defer when unlocking the mutex

type Server struct {
    mu  sync.Mutex            // protects sub
    sub map[chan<- Event]bool // the set of subscribed channels
}

func (s *Server) Init() {
    s.sub = make(map[chan<- Event]bool)
}

// publish an event to all subscribed channels
func (s *Server) Publish(e Event) {
    s.mu.Lock() // each method could be called by many clients
    defer s.mu.Unlock()
    // need mutex here since it needs to read s.sub state
    for c := range s.sub {
        // if a goroutine consumes the channel events too slowly,
        // then a new event publish has to wait
        // before it can send to the channel;
        // can add a channel buffer to mitigate this
        c <- e
    }
}

// a channel starts to subscribe
func (s *Server) Subscribe(c chan<- Event) {
    s.mu.Lock()
    defer s.mu.Unlock()
    if s.sub[c] {
        // the mutex will also be unlocked by the defer
        panic("pubsub: already subscribed")
    }
    s.sub[c] = true
}

// a channel cancels the subscription
func (s *Server) Cancel(c chan<- Event) {
    s.mu.Lock()
    defer s.mu.Unlock()
    if !s.sub[c] {
        panic("pubsub: not subscribed")
    }
    close(c)
    delete(s.sub, c)
}
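
On the receiving side, closing the channel is what lets a subscriber stop cleanly. A minimal sketch of that, with a hypothetical `drain` helper and plain strings standing in for events:

```go
package main

import "fmt"

// drain receives until the channel is closed and returns everything it saw.
func drain(c <-chan string) []string {
	var got []string
	for e := range c { // the loop ends once c is closed and empty
		got = append(got, e)
	}
	return got
}

func main() {
	c := make(chan string, 2)
	c <- "a"
	c <- "b"
	close(c)              // signal that no new values will be sent
	fmt.Println(drain(c)) // [a b]
	_, ok := <-c
	fmt.Println(ok) // false: receiving from a closed, empty channel
}
```

Values sent before the close are still delivered; only after they are consumed does the receive report that the channel is closed.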

3.1. Options for slow goroutines

slow down event generation
drop events if they cannot be sent, e.g., os/signal, runtime/pprof

queue events, e.g., add a helper between the server and each client, which also separates the concerns

func helper(in <-chan Event, out chan<- Event) {
    var q []Event
    // if in is closed, flush out the pending events in q
    // and close out
    for in != nil || len(q) > 0 {
        // decide whether and what to send
        var sendOut chan<- Event
        var next Event
        if len(q) > 0 {
            sendOut = out
            next = q[0]
        }
        select {
        case e, ok := <-in: // never reaches here after in = nil
            // ok tells whether in is closed
            if !ok {
                in = nil
                break
            }
            q = append(q, e)
        case sendOut <- next: // if len(q) == 0, sendOut = nil
            q = q[1:]
        }
    }
    close(out)
}

convert mutexes into goroutines, not suitable for Raft where state transition is complex

type Server struct {
    publish   chan Event
    subscribe chan subReq // a channel to queue unhandled subscriptions
    cancel    chan subReq
}

type subReq struct {
    c chan<- Event
    // a signal of whether an operation succeeds
    ok chan bool
}

func (s *Server) Init() {
    s.publish = make(chan Event)
    s.subscribe = make(chan subReq)
    s.cancel = make(chan subReq)
    go s.loop()
}

func (s *Server) Publish(e Event) {
    // no mutex is required here
    // as it does not read state
    s.publish <- e
}

func (s *Server) Subscribe(c chan<- Event) {
    r := subReq{c: c, ok: make(chan bool)}
    s.subscribe <- r
    if !<-r.ok { // wait for loop() to handle the request
        panic("pubsub: already subscribed")
    }
}

func (s *Server) Cancel(c chan<- Event) {
    r := subReq{c: c, ok: make(chan bool)}
    s.cancel <- r
    if !<-r.ok {
        panic("pubsub: not subscribed")
    }
}

func (s *Server) loop() {
    // now sub is a local variable, no lock is needed
    // sub maps from a subscribed channel to a helper channel
    sub := make(map[chan<- Event]chan<- Event)
    for {
        select {
        case e := <-s.publish:
            for _, h := range sub {
                // the event is published to a helper channel
                h <- e
            }
        case r := <-s.subscribe:
            // the helper channel exists,
            // meaning the subscriber has been handled before
            if sub[r.c] != nil {
                r.ok <- false
                break
            }
            h := make(chan Event)
            go helper(h, r.c)
            sub[r.c] = h
            r.ok <- true
        case r := <-s.cancel:
            if sub[r.c] == nil {
                r.ok <- false
                break
            }
            // close the helper channel
            close(sub[r.c])
            delete(sub, r.c)
            r.ok <- true
        }
    }
}
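
The "drop events" option listed at the start of this section comes down to a non-blocking send. A minimal sketch, where `trySend` is a hypothetical helper and ints stand in for events:

```go
package main

import "fmt"

// trySend attempts a non-blocking send on c.
// It reports false if the channel was not ready and the event was dropped.
func trySend(c chan<- int, e int) bool {
	select {
	case c <- e:
		return true
	default: // receiver too slow and buffer full: drop the event
		return false
	}
}

func main() {
	c := make(chan int, 2) // a slow subscriber with room for 2 events
	for e := 1; e <= 4; e++ {
		fmt.Println(e, trySend(c, e))
	}
}
```

Events 1 and 2 fit into the buffer; 3 and 4 are dropped instead of blocking the publisher.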

4. Pattern 2: work scheduler

$M$ tasks assigned to $N$ servers/workers, $M \gg N$.

func Schedule(servers []string, numTask int,
    call func(srv string, task int)) {

    idle := make(chan string, len(servers))
    // initialize a channel of idle servers
    for _, srv := range servers {
        idle <- srv
    }

    for task := 0; task < numTask; task++ {
        // if the goroutine used the loop variable rather than a local copy,
        // there is a race: the loop goes on before the goroutine starts,
        // so that some tasks are skipped (needed before Go 1.22).
        task := task
        // if moving srv := <-idle inside the goroutine,
        // a lot of goroutines are created simultaneously and hang
        // waiting for an idle server;
        // leaving it outside so that a goroutine is only created when
        // there is an idle server (but it slows down the main loop)
        srv := <-idle
        go func() {
            call(srv, task) // server does the task
            // server finishes the task and becomes idle again
            idle <- srv
        }()
    }

    // determine when all tasks are done / all servers are idle;
    // this is used to prevent early exit when all tasks have been assigned
    // but the last servers have not finished
    for i := 0; i < len(servers); i++ {
        <-idle
    }
}

Optimization for the above code: while the task loop creates goroutines $M$ times, actually there are only at most $N$ active goroutines at any time.

Better to spin off a goroutine for each server.
The number of servers can be dynamic.

func Schedule(servers chan string, numTask int,
    call func(srv string, task int) bool) {

    work := make(chan int)  // a queue of all tasks yet to be done
    done := make(chan bool) // a queue of all done tasks
    exit := make(chan bool) // signals when to stop pulling new servers

    runTasks := func(srv string) {
        // keep polling until work is closed
        for task := range work {
            if call(srv, task) {
                done <- true
            } else {
                // repush the task if it failed
                work <- task
            }
        }
    }

    // use a goroutine to avoid hanging when
    // no server is available
    go func() {
        for {
            select {
            case srv := <-servers:
                go runTasks(srv)
            case <-exit:
                return
            }
        }
    }()

    // The following code has a deadlock!
    // In runTasks, the server pushes to the done channel when a task is done.
    // However, the done channel is only pulled after the main routine has
    // pushed all tasks and closed the work channel.
    // Therefore any server hangs when trying to push its second done task.
    // for task := 0; task < numTask; task++ {
    //  work <- task
    // }
    // // signal no more tasks so that servers know
    // // when to terminate
    // close(work)

    // // wait until all tasks are done
    // for i := 0; i < numTask; i++ {
    //  <-done
    // }

    // fix 1: switch between the work and done channels
    i := 0
WorkLoop:
    for task := 0; task < numTask; task++ {
        for {
            select {
            case work <- task:
                continue WorkLoop
            case <-done:
                i++
            }
        }
    }

    // wait for the last assigned tasks to be done
    for ; i < numTask; i++ {
        <-done
    }

    // only close the work channel in the end,
    // in case some tasks failed and need to be redone
    close(work)
    exit <- true // stop pulling new servers

    // fix 2: move the work assignment to a separate goroutine
    // go func() {
    //  for task := 0; task < numTask; task++ {
    //      work <- task
    //  }
    //  close(work)
    // }()

    // fix 3: increase the buffer of the work channel
    // work := make(chan int, numTask)
}

5. Pattern 3: replicated service client

A client replicates its requests to multiple servers, waits for the first reply and changes its preferred server.

func (c *Client) Call(args Args) Reply {
    type result struct {
        serverID int
        reply Reply
    }

    const timeout = 1 * time.Second
    t := time.NewTimer(timeout)
    defer t.Stop()

    // a channel for all servers to send reply
    // so that even if the client has received a reply
    // other later replies don't hang
    done := make(chan result, len(c.servers))

    c.mu.Lock()
    prefer := c.prefer
    c.mu.Unlock()

    var r result
    for off := 0; off < len(c.servers); off++ {
        // start from the preferred server
        id := (prefer + off) % len(c.servers)
        go func() {
            done <- result{id, c.callOne(c.servers[id], args)}
        }()

        // now wait for either a done signal or a timeout
        // if it is done, don't send to other servers
        // otherwise, reset the timer and sends to the next server
        select {
        case r = <-done:
            goto Done  // use a goto if it makes code clear
        case <-t.C:
            // timeout
            t.Reset(timeout)
        }
    }

    r = <-done  // wait for the first reply even if it is an RPC timeout

Done:
    c.mu.Lock()
    c.prefer = r.serverID // update preference
    c.mu.Unlock()
    return r.reply
}

6. Pattern 4: Protocol multiplexer

A multiplexer sits in front of a service and forwards messages between multiple clients and the service, e.g., an RPC.

type ProtocolMux interface {
    // A mux is bound to a specific service
    Init(Service)
    // A client uses this method to send a message to the service
    // and wait for the service reply
    Call(Msg) Msg
}

// Methods that a service exposes for a mux to use
// The underlying message processing is in the implementation
// of the actual service struct
type Service interface {
    // A tag is a muxing identifier in the request or reply message,
    // e.g., which client channel to send the reply to
    ReadTag(Msg) int64
    // Sends a request message to the service;
    // multiple sends cannot be called concurrently,
    // probably due to only a single channel between
    // mux and the service (serialization)
    Send(Msg)
    // Waits for and returns the reply message;
    // multiple recvs cannot be called concurrently
    Recv() Msg
}

The mux maintains a channel to queue unsent requests and a channel to queue unsent replies.

type Mux struct {
    srv Service
    // stores unsent requests
    send chan Msg
    mu   sync.Mutex
    // maps channel tag to channel
    // whose replies have not been sent out
    pending map[int64]chan<- Msg
}

func (m *Mux) Init(srv Service) {
    m.srv = srv
    m.send = make(chan Msg)
    m.pending = make(map[int64]chan<- Msg)
    go m.sendLoop()
    go m.recvLoop()
}

// sending out queued requests
func (m *Mux) sendLoop() {
    for args := range m.send {
        m.srv.Send(args)
    }
}

func (m *Mux) recvLoop() {
    for {
        reply := m.srv.Recv()
        tag := m.srv.ReadTag(reply)
        m.mu.Lock()
        // get the reply channel
        done := m.pending[tag]
        // clear the channel since the message loop
        // is complete
        delete(m.pending, tag)
        m.mu.Unlock()

        if done == nil {
            panic("unexpected reply")
        }
        done <- reply
    }
}

// Clients call this method concurrently
func (m *Mux) Call(args Msg) (reply Msg) {
    tag := m.srv.ReadTag(args)
    // to record which message should reply
    // to which client
    done := make(chan Msg, 1)
    m.mu.Lock()
    if m.pending[tag] != nil {
        m.mu.Unlock()
        panic("duplicate request")
    }
    m.pending[tag] = done
    m.mu.Unlock()
    m.send <- args
    return <-done // hang until a reply is received
}

A stupid debugging experience

Mon, 24 Jun 2024 15:06:00 +0200

1. What happened

Servers SA and SB have the same docker installation and run the same containers, CA and CB, respectively.
A Go file G can be built on CA, but on CB it reports this error:

runtime: failed to create new OS thread (have 2 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

2. What did I do

I compared all related configurations between SA and SB, and between CA and CB, e.g., ulimit -a, /etc/security/limits.conf. They all looked the same.
I created a new container CN on SA with the same docker image; CN could compile G.
I looked into the (complex) docker run script for CA/CB and figured out it was due to the resource constraint --pids-limit 100.
- Increasing this limit to 200 seemed to resolve the issue, but I had no idea why the Go compiler needed so many resources (perhaps due to the packages I imported).
At this point I realized: since the container did not support the compilation, why not just transfer the compiled binary?
- How silly that I didn’t even try this in the beginning!
Since the program imports the net package, and there is a known issue with running a Go binary on an Alpine image, I followed the post and disabled CGO on SA, then used docker cp to copy the binary to CA, and it worked.

3. Another issue of running RPC in docker

The other day, I also spent hours debugging a route unreachable error when I wanted to send a request from CA to SA.
CA uses the bridge network, so it should talk to SA via SA’s interface docker0 within the subnet 172.17.0.0/16.
However, in my case, packets from any container were rejected by default, as shown in SA’s tcpdump result:

172.17.0.1->172.17.0.3 ICMP host unreachable- admin prohibited, length 68
By checking SA’s iptables, I found this rule:
```
  -A INPUT -j REJECT --reject-with icmp-host-prohibited
```
- Strangely, the ping still works with this rule.

In the end, I needed to insert a new rule to make the RPC work.

  iptables -I INPUT 1 -i docker0 -p tcp --dport  -s 172.17.0.0/16 -j ACCEPT

Linux use tips

Mon, 24 Jun 2024 09:17:00 +0200

1. i3

1.1. Move specific workspaces between different monitors (ref)

Adjust the monitor relative positions.
Use i3-msg -- move workspace to output right to move the current workspace to the monitor on the right

2. Org

2.1. Format code blocks

Use the shortcut < s to create a code block.
Use C-c ' to enter the code environment to use the language major mode.

3. Emacs

3.1. Prefix argument

When the function includes (interactive "P"), it means one can add a prefix argument C-u 1 (or any integer) to trigger some effect.

3.2. Help functions

Ctrl-h K get the function name given a shortcut

4. Vim

4.1. Delete a word backwards

diw or daw

5. Firefox

5.1. Cache bypass refresh

Ctrl+Shift+R

Weblab notes: React route

Sun, 23 Jun 2024 18:38:00 +0200

This is a personal note for the web.lab lectures.

1. Router

use the Reach Router library

URL -> Router -> render different components

1: 
2:   // conditional rendering based on the current URL
3:   
4:     "/" /> // root path
5:     "dashboard" /> // relative to the current URL
6:     "/team" /> // absolute path: root path + "/team"
7:     default />
8:   
9: ;

2. Link

relative: <Link to="newpage">Click me</Link>
absolute: <Link to="/newpage">Click me</Link>

3. Workshop 3

3.1. Structure

Figure 1: The Catbook structure in workshop 3

3.2. States

name	states
Feed	`stories`: a list of stories
Card	`comments`: a list of comments for a story id

3.3. Props

index	props
1	a function to update `stories`
2	all attributes in a story
3	the attributes used to display a story
4	a story id; a list of comments under the story; a function to update `comments`
5	all attributes in a comment
6	a comment id; the function to update `comments`

3.4. Why pass down the update function in props 1, 4, 6?

To share the parent states, i.e., stories and comments, with the child components. Since the post action happens in a child component, the child needs a way to update the parent states so that new contents show up immediately.

Weblab notes: React hooks

Sun, 23 Jun 2024 18:21:00 +0200

This is a personal note for the web.lab lectures.

1. What is a React hook

Special functions to access parts of the component lifecycle.
e.g., useState

1.1. `useState` is not enough

const [persons, setPersons] = useState([]);

const testingStuff = () => {
    /* assume persons is empty before */
    setPersons([...persons, "me"]);
    // still logs the old value of persons
    console.log(persons);
};

The output of console.log is [] instead of ["me"] because setting a state is async!
To do something immediately after a state is changed, use useEffect hook!

1.2. `useEffect` runs after specific variable change

useEffect(() => {
    console.log(persons);
}, [persons]);

useEffect(() => {
    /* do something, e.g., interact with an external service */

    return () => {
        /* cleanup function on unmount, e.g., disconnect from external service */
    };
}, [/* dependencies */]);

useEffect(myFunction, [var1, var2]) calls myFunction every time var1 or var2 changes
useEffect(myFunction, []) calls myFunction only once, when the component is rendered for the first time (on mount)
useEffect(myFunction) calls myFunction at every render

2. React hook patterns

2.1. Fetch and send data

/* fetch data on mount */
useEffect(() => {
    get("/api/packages").then((packageList) => {
        setPackages(packageList);
    });
}, []);

/* send data then toggle admin state */
const handleToggleAdmin = () => {
    // .then(): do something once the promise is fulfilled
    post("/api/user/admin", { admin: !admin }).then(() => {
        setAdmin(!admin);
    });
};

2.2. Conditional rendering

// JSX is a way of writing HTML in js
let content = loading ? <p>Loading...</p> : <p>Loaded</p>;
return (
    <div>
        <h1>Title</h1>
        {content}
    </div>
);

2.3. Render an array of Data

const data = [
    { id: 0, text: "Text 1" },
    { id: 1, text: "Text 2" },
];
// render a component for each data item
return data.map((item) => (
    <div key={item.id}>{item.text}</div>
));

key is a special prop in React; it is used to identify efficiently which item has changed
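A small sketch of why a stable key helps. This is not React's actual reconciliation algorithm, just a plain JavaScript illustration: with an id per item, two versions of a list can be matched item by item even when items are inserted at the front. The function `diffByKey` is a hypothetical helper.

```javascript
// Compare two lists of { id, text } items by their key (the id).
function diffByKey(oldItems, newItems) {
  const oldByKey = new Map(oldItems.map((item) => [item.id, item]));
  const newKeys = new Set(newItems.map((item) => item.id));
  const added = [];
  const changed = [];
  for (const item of newItems) {
    const previous = oldByKey.get(item.id);
    if (previous === undefined) {
      added.push(item.id); // key not seen before
    } else if (previous.text !== item.text) {
      changed.push(item.id); // same key, different content
    }
  }
  const removed = oldItems
    .filter((item) => !newKeys.has(item.id))
    .map((item) => item.id);
  return { added, changed, removed };
}

const before = [
  { id: 0, text: "Text 1" },
  { id: 1, text: "Text 2" },
];
const after = [
  { id: 2, text: "Text 0" }, // inserted at the front
  { id: 0, text: "Text 1" },
  { id: 1, text: "Text 2 (edited)" },
];

const result = diffByKey(before, after);
console.log(result);
```

Matching by position instead would report every item as changed here, because the insertion at the front shifts all indices.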

3. Example: Stopwatch

const Stopwatch = () => {
    const [time, setTime] = useState(0);

    useEffect(() => {
        const timer = setInterval(() => {
            // setTime accepts either a new state value,
            // or a function that takes the previous state (oldTime) as an argument and returns the new state
            setTime((oldTime) => oldTime + 1);
        }, 1000);
        // without proper cleanup on unmount,
        // the timer keeps running even though the state no longer exists
        return () => clearInterval(timer);
    }, []);
    return <>Time: {time}</>;
};
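Why the function form matters here: the effect runs only on mount, so the interval callback would keep seeing the `time` from the first render. A toy reduction in plain JavaScript shows the difference between passing values and passing updater functions; `applyQueuedUpdates` is an invented helper, not a React function.

```javascript
// Apply a queue of updates in order: a plain value replaces the state,
// a function receives the latest state and returns the next one.
function applyQueuedUpdates(currentState, queue) {
  return queue.reduce(
    (state, update) => (typeof update === "function" ? update(state) : update),
    currentState
  );
}

const time = 0; // the value captured by the closure created on mount

// three ticks using the captured value: every tick computes 0 + 1
const withValues = applyQueuedUpdates(time, [time + 1, time + 1, time + 1]);

// three ticks using updater functions: every tick sees the latest state
const withUpdaters = applyQueuedUpdates(time, [
  (oldTime) => oldTime + 1,
  (oldTime) => oldTime + 1,
  (oldTime) => oldTime + 1,
]);

console.log(withValues, withUpdaters);
```

With plain values the stopwatch would be stuck at 1; with updater functions it counts up as expected.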

4. DOM and component mounting

DOM (Document Object Model): a programming interface for web documents; it represents the structure of a document, e.g., HTML, as a tree of objects, where each object corresponds to a part of the document; scripts can update the document contents dynamically through it
- React is a framework that manipulates the DOM
A React component is unmounted when:
- conditional rendering stops rendering it
- routing navigates from one route to another
- its parent component is unmounted

Hello world

Sun, 23 Jun 2024 18:04:00 +0200

1. Hallo!

This is the first blog with org-static-blog!

4. Install `ripgrep`