Damon's Blog
The magic you are looking for is in the work you're avoiding.
Gems
2026
February
looping through the tasks
looping through the tasks with priority
looping through the tasks with quota and heuristic priority
CFS: Completely Fair Scheduler
Pick the task with the smallest virtual runtime, run it for a small time slice, then update its virtual runtime and put it back into the red-black tree.
Use the nice value to scale how fast a task's virtual runtime grows, thus adjusting its priority.
I/O-bound tasks sleep often and accumulate virtual runtime slowly, so they get higher priority; CPU-bound tasks accumulate it quickly, so they get lower priority.
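A minimal sketch of that loop, not the kernel's implementation. It assumes a binary heap stands in for the red-black tree, and that each nice step changes a task's weight by roughly 1.25x around the nice-0 weight of 1024. The names `weight_for_nice` and `simulate` are made up for illustration.

```python
import heapq

NICE_0_WEIGHT = 1024  # weight the kernel assigns to a nice-0 task


def weight_for_nice(nice):
    # Assumption: approximate the kernel's weight table with a 1.25x step per nice level.
    return NICE_0_WEIGHT / (1.25 ** nice)


def simulate(tasks, slice_ms, steps):
    """tasks maps a task name to its nice value; returns CPU time (ms) per task."""
    # The heap plays the role of the red-black tree: smallest vruntime comes out first.
    queue = [(0.0, name) for name in sorted(tasks)]
    heapq.heapify(queue)
    cpu_time = {name: 0 for name in tasks}
    for _ in range(steps):
        vruntime, name = heapq.heappop(queue)
        cpu_time[name] += slice_ms
        # A heavier task (lower nice) sees its vruntime grow more slowly.
        vruntime += slice_ms * NICE_0_WEIGHT / weight_for_nice(tasks[name])
        heapq.heappush(queue, (vruntime, name))
    return cpu_time
```

With two tasks at the same nice value the loop simply alternates between them; giving one task a higher nice value makes its virtual runtime grow faster, so it is picked less often.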
TODO
January
25: Get the classpath through mvn:
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
23: Java Service Provider Interface
Java SPI -> api + impl with meta-info + app
Annotation processing is one usage of Java SPI. API is the Processor interface, impl is the annotation processor implementation, meta-info is the file in META-INF/services/ that lists the implementation class, app is the Java compiler that will load the annotation processor and execute it during compilation.
Java Instrumentation: can be used to transform existing bytecode / add extra-opens/exports to existing modules
An annotation processor can only generate new files; it cannot change existing ones.
The notable exception is the Lombok library, which uses annotation processing as a bootstrapping mechanism to include itself in the compilation process and modify the AST via some internal compiler APIs.
Google AutoService: generates the META-INF/services entry for an SPI implementation. Here Processor is the interface, and the annotation processor is the implementation that ServiceLoader needs to find at compile time.
generic programming
memory allocation
interface design
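20: How was the linker invented, step by step?
The compiler processes each module without having to care about cross-module references.
The linker uses the information provided by each module to determine the final memory address of every symbol, then merges all modules into one final executable.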
Relocation Table
Symbol Table
0x400000
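A toy sketch of how a symbol table and a relocation table fit together. The module format here is hypothetical (dicts with `code`, `symbols`, and `relocs`, and one address per list entry); it is not a real object file layout. The base address 0x400000 is the one noted above.

```python
BASE_ADDRESS = 0x400000  # load address taken from the notes


def link(modules):
    """Merge modules into one image and resolve cross-module references.

    Each module is a dict with:
      code    - list of entries, one address each (simplification)
      symbols - name -> offset of the definition inside this module
      relocs  - list of (offset, symbol) slots that need a final address
    """
    # Pass 1: place modules back to back and record each symbol's final address.
    symbol_table = {}
    address = BASE_ADDRESS
    for module in modules:
        for name, offset in module["symbols"].items():
            if name in symbol_table:
                raise ValueError(f"duplicate symbol: {name}")
            symbol_table[name] = address + offset
        address += len(module["code"])

    # Pass 2: copy the code and patch every relocation slot.
    image = []
    for module in modules:
        code = list(module["code"])
        for offset, name in module["relocs"]:
            if name not in symbol_table:
                raise ValueError(f"undefined symbol: {name}")
            code[offset] = symbol_table[name]
        image.extend(code)
    return image, symbol_table
```

The compiler side only has to emit the placeholder slot and a relocation entry; the address is filled in once every module's position is known.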
zero copy
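When a program accesses a mapped region and the required page is not in memory, the virtual memory subsystem automatically triggers a page fault and loads that page from disk into memory.
Frequently modifying small, scattered blocks of data (such as hashed writes) can cause many page faults and TLB (Translation Lookaside Buffer) misses, and performance may be lower than with traditional read/write. mmap suits high-performance scenarios that need zero-copy access, random reads of large files, or shared memory (such as in-memory databases and image processing).
In a real-time system, operations have a strict upper bound on latency, and the page-fault and disk I/O latency of mmap is unpredictable.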
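A small sketch of random reads through a mapping, using Python's standard `mmap` module. The helper name `read_slice_via_mmap` is made up. Note that slicing the mapping copies the requested bytes into a new `bytes` object, so this shows on-demand paging rather than a strictly zero-copy path.

```python
import mmap


def read_slice_via_mmap(path, offset, length):
    """Return `length` bytes starting at `offset`, reading through a memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Only the pages backing this range need to be faulted in.
            return mapped[offset:offset + length]
```

The first access to a page that is not resident pays the page-fault cost described above; later accesses to the same page do not.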
Chronicle Queue mitigates mmap page‑fault unpredictability with design choices aimed at determinism:
Preallocation and pre‑touching: files are pre‑created and pages are proactively touched via a pretoucher, so writes hit resident pages instead of triggering page faults.
Sequential, append‑only I/O: linear writes avoid random access, letting the OS keep pages hot and leverage readahead efficiently.
Off‑heap/direct memory: minimizes GC pauses and heap contention, keeping latency stable.
Bounded, fixed‑size blocks and rolling cycles: predictable allocation patterns reduce surprise faults and fragmentation.
Warm‑up on startup: touches mappings and populates TLB/page cache before latency‑sensitive work.
Optional mlock/pinning (where allowed): keeps critical pages resident to avoid paging.
Controlled durability: fsync can be tuned (or batched) to match real‑time latency budgets.
Pretouch (or pre-touching) means proactively accessing mmapped pages before latency-sensitive I/O so they are resident in memory and in the TLB/cache, avoiding on-demand page faults.
In Chronicle Queue, a pretoucher thread walks ahead in the append path and touches upcoming pages/blocks (often by writing small dummy bytes or reading headers). This:
Faults pages in early, populating the page cache and TLB.
Ensures sequential, append-only writes hit hot pages.
Reduces first-touch latency spikes during real-time operation.
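The idea can be sketched in a few lines. This is not Chronicle Queue's pretoucher, only a single-threaded illustration that reads one byte per page of a mapping; the function name `pretouch` is an assumption.

```python
import mmap


def pretouch(mapped, start, length):
    """Touch one byte per page in [start, start + length); return pages touched."""
    end = min(start + length, len(mapped))
    touched = 0
    for pos in range(start, end, mmap.PAGESIZE):
        _ = mapped[pos]  # reading a byte faults the page in if it is not resident
        touched += 1
    return touched
```

A real pretoucher would run on its own thread and stay a bounded distance ahead of the writer, so the latency-sensitive thread never takes the first-touch fault itself.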
18: A roundup of Linux performance analysis tools
vmstat
iostat
dstat
iotop
pidstat
top
htop
mpstat
netstat
ps
strace
uptime
lsof
perf
17: What Is a TLAB or Thread-Local Allocation Buffer in Java?
The JVM addresses this concern using Thread-Local Allocation Buffers, or TLABs. These are areas of heap memory that are reserved for a given thread and are used only by that thread to allocate memory.
launch the JVM with the -Xlog:gc+tlab=trace flag to see this information
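A toy model of the idea, not how HotSpot implements it: each thread owns a buffer carved out of a shared heap and allocates inside it by bumping a pointer, touching the shared heap (and its lock) only when the buffer runs out. The class names `SharedHeap` and `Tlab` and all sizes are made up.

```python
import threading


class SharedHeap:
    """Shared region; every reservation takes a lock."""

    def __init__(self, size):
        self.size = size
        self.top = 0
        self.lock = threading.Lock()

    def reserve(self, nbytes):
        with self.lock:
            if self.top + nbytes > self.size:
                raise MemoryError("heap exhausted")
            start = self.top
            self.top += nbytes
            return start


class Tlab:
    """Per-thread buffer: allocation is a pointer bump with no synchronization."""

    def __init__(self, heap, tlab_size):
        self.heap = heap
        self.tlab_size = tlab_size
        self.top = self.end = 0
        self.refills = 0

    def allocate(self, nbytes):
        if nbytes > self.tlab_size:
            # Too large for a buffer: allocate directly from the shared heap.
            return self.heap.reserve(nbytes)
        if self.top + nbytes > self.end:
            # Buffer exhausted: get a fresh one from the shared heap.
            self.top = self.heap.reserve(self.tlab_size)
            self.end = self.top + self.tlab_size
            self.refills += 1
        address = self.top
        self.top += nbytes
        return address
```

Most allocations take the fast path and never contend with other threads; only the occasional refill touches shared state.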
16: Java 25
vectorization
scoped values
structured concurrency
15: Guest Post: How I Scanned all of GitHub’s “Oops Commits” for Leaked Secrets
GitHub Archive logs every public commit, even the ones developers try to delete. Force pushes often cover up mistakes like leaked credentials by rewriting Git history. GitHub keeps these dangling commits, from what we can tell, forever. In the archive, they show up as “zero-commit” PushEvents.
Static Initialization Order Fiasco (SIOF)
Solution: Meyers Singleton
C++11 guarantees that the initialization of local static variables is thread-safe.
13: Improving FAST protocol decoding performance
Stop Bit Encoded
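VByte is a "byte-oriented" scheme. In each byte, the most significant bit is reserved as a control bit: it is set to 0 when the byte is the last one of the encoded integer, and to 1 otherwise.
Gathering a group's control bits at the front reduces the branch mispredictions that VByte incurs by checking the control bit of every byte.
vint-G8IU is fast because it uses SIMD instructions. vint-GB can also be processed with SIMD, but the length of the data that follows its group control byte is not fixed (4 to 16 bytes), so the CPU stalls from time to time during data prefetching. In vint-G8IU every block is a fixed 9 bytes, which makes prefetching more predictable.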
VByte -> vint-GB -> vint-G8IU -> Stream VByte
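A minimal sketch of plain VByte using the control-bit convention described above (high bit 0 on the last byte, 1 otherwise). The byte order, least significant 7-bit group first, is an assumption; FAST's stop-bit encoding uses a different convention. The function names are made up.

```python
def vbyte_encode(value):
    """Encode a non-negative integer, 7 data bits per byte."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # control bit 1: more bytes follow
        value >>= 7
    out.append(value)  # control bit 0: last byte
    return bytes(out)


def vbyte_decode(data, pos=0):
    """Decode one integer starting at `pos`; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # this per-byte branch is what the grouped variants avoid
            return value, pos
        shift += 7
```

The data-dependent branch inside the decode loop is the cost that the grouped formats in the chain above try to remove.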
12: Building a full-text search engine in 150 lines of Python code
Bart de Goede: a full-text search engine consists of three main components:
analyze
tokenize
lower case normalization
punctuation removal
stop word removal
stem
inverted index
search
query parsing
precision search
relevancy search - Understanding TF-IDF (Term Frequency-Inverse Document Frequency)
Term frequency
Inverse Document Frequency
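The outline above can be sketched compactly. This is not the article's code: stemming is left out, the stop-word list is a tiny made-up sample, and the class and method names are assumptions. Scoring uses term frequency times log(N / document frequency).

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}  # sample list only


def analyze(text):
    """Tokenize, lowercase, strip punctuation, and drop stop words (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class Index:
    def __init__(self):
        self.postings = defaultdict(set)  # inverted index: term -> document ids
        self.term_counts = {}             # document id -> Counter of its terms

    def add(self, doc_id, text):
        tokens = analyze(text)
        self.term_counts[doc_id] = Counter(tokens)
        for token in tokens:
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term, best score first."""
        terms = analyze(query)
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        total_docs = len(self.term_counts)

        def score(doc_id):
            return sum(
                self.term_counts[doc_id][t] * math.log(total_docs / len(self.postings[t]))
                for t in terms
            )

        return sorted(matches, key=lambda d: (-score(d), d))
```

The set intersection is the precision search; the TF-IDF sort on top of it is the relevancy part.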
11: Lessons Learned Shipping 500 Units of my First Hardware Product
From software engineer to hardware engineer.
10: How to Scale a System from 0 to 10 million+ Users
A re-warm up.
9: What is the purpose of std::launder?
Money laundering is used to prevent people from tracing where you got your money from. Memory laundering is used to prevent the compiler from tracing where you got your object from, thus forcing it to avoid any optimizations that may no longer apply.
The placement new operator in C++ allows you to construct an object in a pre-allocated memory buffer.
7: std::jthread: A safer and more capable way of concurrency in C++20
The destructor of a std::thread object with an associated thread calls std::terminate if join() has not been called.
upper_bound: Returns the first iterator iter in [first, last) where bool(value < *iter) is true, or last if no such iter exists.
lower_bound: Returns the first iterator iter in [first, last) where bool(*iter < value) is false, or last if no such iter exists.
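For comparison, Python's standard `bisect` module provides the same two searches on a sorted list: `bisect_left` corresponds to lower_bound and `bisect_right` to upper_bound. The wrapper name `equal_range` is borrowed from C++ for illustration.

```python
from bisect import bisect_left, bisect_right


def equal_range(sorted_values, value):
    """Return (lower, upper) indices bounding the run of elements equal to value."""
    lower = bisect_left(sorted_values, value)   # first index where element < value is false
    upper = bisect_right(sorted_values, value)  # first index where value < element is true
    return lower, upper
```

When the value is absent both indices coincide at the insertion point, and `upper - lower` is always the number of matching elements.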
Compiling in debug mode will then include debug symbols (-g), disable optimisation (-O0), and enable assert() by omitting -DNDEBUG.
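When reads dominate, the wait-free version performs extremely well, because the "helping" behavior of reader threads spreads out the contention pressure on writer threads, and reads themselves do not have to fight over cache lines repeatedly the way a CAS loop does.
When writes dominate, however, lock-free is faster. Why? The bit manipulation, state checks, and atomic-exchange handshakes of the wait-free version are all real CPU instruction overhead, whereas lock-free under light contention is just a simple atomic add or subtract, which is extremely lightweight.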
TODO
1: Transforming Uniform Random Variables to Normal
Central Limit Theorem (CLT) Method (12 Uniforms): Generate Gaussian samples by central limit theorem
Inverse Transform Sampling: How to generate Gaussian samples
Box-Muller Transform: Box-Muller
Beasley-Springer-Moro Algorithm
linear congruential generator
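A rough sketch of three of the methods listed above, fed by a linear congruential generator. The LCG constants are the widely used Numerical Recipes ones, chosen here as an assumption. The inverse transform uses the standard library's `NormalDist().inv_cdf` in place of the Beasley-Springer-Moro approximation. All function names are made up.

```python
import math
from statistics import NormalDist


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator yielding uniforms strictly inside (0, 1)."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield (state + 0.5) / m  # the half-step offset avoids returning exactly 0


def normal_clt(uniform):
    """Sum of 12 uniforms minus 6: mean 0, variance 1, only approximately normal."""
    return sum(uniform() for _ in range(12)) - 6.0


def normal_box_muller(uniform):
    """Turn two uniforms into two independent standard normal samples."""
    u1, u2 = uniform(), uniform()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_inverse_transform(uniform):
    """Push one uniform through the inverse normal CDF."""
    return NormalDist().inv_cdf(uniform())
```

Each function takes a zero-argument callable, for example `gen = lcg(42)` followed by `uniform = lambda: next(gen)`. The CLT method cannot produce values beyond 6 standard deviations, which is one reason the other two are usually preferred.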
Transforming Uniform Random Variables to Normal
Box-Muller Transform
Beasley-Springer-Moro Algorithm
Inverse Transform Method