Damon's Blog

The magic you are looking for is in the work you're avoiding.

Gems

2025

December

November

  • 15: History of users modifying a file in Linux

    1. Use the stat command

    2. Find the Modify time

    3. Use the last command to see the login history

    4. Compare the log-in/log-out times with the file's Modify timestamp
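    A minimal sketch of steps 2 and 4 in Python. The session list here is synthetic; in practice you would parse it from `last` output:

```python
import os

def file_mtime(path):
    """Step 2: return the file's Modify timestamp as epoch seconds."""
    return os.stat(path).st_mtime

def users_logged_in_at(ts, sessions):
    """Step 4: return users whose login/logout window covers the
    Modify timestamp. `sessions` is a list of (user, login, logout)
    epoch-second tuples, e.g. parsed from `last` output."""
    return [u for (u, start, end) in sessions if start <= ts <= end]

# Synthetic example: two sessions, only bob's covers t=250.
sessions = [("alice", 100, 200), ("bob", 150, 300)]
print(users_logged_in_at(250, sessions))  # ['bob']
```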

  • 14: Aeron spy subscription

    The NOT_CONNECTED isn't an error; it is an indication that there are no subscribers. If you want to discard data when there are no subscribers, then don't offer the message again, but drop it instead.

  • 13: The Write-Ahead Log: The Underrated Reliability Foundation for Databases and Distributed Systems

    1. PostgreSQL: WAL for ACID Transactions and Replication

    2. Kafka: Logs as the System

    3. MongoDB: The Oplog for Replication

  • 11: Using Python with C++

  • 10: Booting my Raspberry Pi over my network made a huge performance difference

    One of the most noticeable improvements was speed. Even though the Pi boots over Ethernet, read and write performance is far better than I’ve ever seen with SD cards. System updates apply faster, logs write more consistently, and services start without delay. It feels smoother, especially when working with heavier tasks like database-driven apps or media servers.

  • 9: A Short Survey of Compiler Targets

    • Most modern compilers don’t actually emit machine code or assembly directly. They first lower the source code to a language-agnostic intermediate representation (IR), and then generate machine code for the major architectures (x86-64, ARM64, etc.) from it.

    • Sometimes you are okay with letting other compilers/runtimes take care of the heavy lifting. You can transpile your code to another established high-level language and leverage that language’s existing compiler/runtime and toolchain.

    • Meta-tracing and metacompilation frameworks are a more complex category. These are not targets for your compiler backend; instead, you use them to build a custom JIT compiler for your language by specifying an interpreter for it.
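    As a toy illustration of the transpilation route, here is a sketch (the AST shape is invented) that lowers a tiny arithmetic AST to Python source and reuses Python's own parser and runtime as the target toolchain:

```python
import ast  # only used to sanity-check the generated source

def emit(node):
    """Recursively turn ('add'|'mul', lhs, rhs) tuples into Python source."""
    if isinstance(node, int):
        return str(node)
    op, lhs, rhs = node
    sym = {"add": "+", "mul": "*"}[op]
    return f"({emit(lhs)} {sym} {emit(rhs)})"

tree = ("add", 2, ("mul", 3, 4))   # represents 2 + 3 * 4
src = emit(tree)                   # "(2 + (3 * 4))"
ast.parse(src)                     # the target language's parser accepts it
print(src, "=", eval(src))         # reuse the host runtime: (2 + (3 * 4)) = 14
```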

  • 7: The 36 NVIDIA people who report to Jensen Huang

    Second place is just the first loser.

  • 6: Asyncio Event Loops Tutorial

    • A Conceptual Overview of asyncio

      • The terms “coroutine function” and “coroutine object” are often conflated as coroutine. That can be confusing!

      • Similar to a coroutine function, calling a generator function does not run it. Instead, it creates a generator object

      • In practice, it’s recommended to use (and common to see) asyncio.run(), which takes care of managing the event loop and ensuring the provided coroutine finishes before advancing.

      • It’s important to be aware that the task itself is not added to the event loop, only a callback to the task is. This matters if the task object you created is garbage collected before it’s called by the event loop.

      • When the coroutine exits, local variables go out of scope and may be subject to garbage collection.

      • Unlike tasks, awaiting a coroutine does not hand control back to the event loop! The behavior of await coroutine is effectively the same as invoking a regular, synchronous Python function.

      • Each time a task is awaited, control needs to be passed all the way up the call stack to the event loop. That might sound minor, but in a large program with many await statements and a deep call stack, that overhead can add up to a meaningful performance drag.

      • The only way to yield (or effectively cede control) from a coroutine is to await an object that yields in its __await__ method.
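    The distinction drawn above between awaiting a bare coroutine and yielding to the event loop can be demonstrated directly:

```python
import asyncio

order = []

async def record_task():
    order.append("task")

async def child():
    order.append("child")

async def main():
    task = asyncio.create_task(record_task())  # scheduled, not yet run
    await child()           # behaves like a plain call; does NOT cede control
    order.append("after-await")
    await asyncio.sleep(0)  # this DOES yield, letting the pending task run
    order.append("end")

asyncio.run(main())
print(order)  # ['child', 'after-await', 'task', 'end']
```

    Note that `record_task` was created first but only runs once the loop regains control at `asyncio.sleep(0)`.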

  • 3: TCP Socket Listen: A Tale of Two Queues

    • SYN Queue: tcp_max_syn_backlog

    • Accept Queue: backlog <= somaxconn

    • If a SYN+ACK is lost, the server is responsible for retransmitting it: net.ipv4.tcp_synack_retries = 5

    • We can indirectly get the status by counting the number of sockets in the SYN_RECV state for a listening socket:

      • sudo netstat -antp | grep SYN_RECV | wc -l

      • ss -n state syn-recv sport :80 | wc -l

      • netstat -s | grep -i listen

        • 701 times the listen queue of a socket overflowed # accept queue overflow

        • 1246 SYNs to LISTEN sockets dropped # SYN queue overflow

      • SYN cookies can be used to alleviate the attack: net.ipv4.tcp_syncookies = 1

      • TCP Fast Open: sysctl net.ipv4.tcp_fastopen

        • In computer networking, TCP Fast Open (TFO) is an extension to speed up the opening of successive Transmission Control Protocol (TCP) connections between two endpoints. It works by using a TFO cookie (a TCP option), which is a cryptographic cookie stored on the client and set upon the initial connection with the server. When the client later reconnects, it sends the initial SYN packet along with the TFO cookie data to authenticate itself. If successful, the server may start sending data to the client even before the reception of the final ACK packet of the three-way handshake, thus skipping a round-trip delay and lowering the latency in the start of data transmission.

        • With TFO enabled, clients use sendto() instead of connect(); SYN packets carry data directly.
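    A minimal sketch of the accept queue in action, using standard Python sockets (the backlog value is illustrative): a completed handshake parks the connection in the accept queue until the application calls accept().

```python
import socket

# The backlog passed to listen() sizes the accept queue; on Linux the
# kernel caps it at net.core.somaxconn.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # ephemeral port
server.listen(2)                # small accept queue, for illustration
host, port = server.getsockname()

client = socket.create_connection((host, port))  # handshake completes here,
conn, _ = server.accept()       # ...then accept() dequeues the connection
client.sendall(b"ping")
received = conn.recv(4)
print(received)                 # b'ping'
for s in (conn, client, server):
    s.close()
```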

October

  • 30: Microservices in the Chronicle world

    • Microservices in the Chronicle world - Part 1

      • Microservices in the Chronicle world are designed around:

        • Simplicity - simple is fast, flexible and easier to maintain.

        • Transparency - you can’t control what you don’t understand.

        • Reproducibility - this must be in your design to ensure a quality solution.

      • An asynchronous method call is one which:

        • doesn't return anything

        • doesn't alter its arguments

        • doesn't throw any exceptions (although the underlying transport could)

    • Microservices in the Chronicle world - Part 2

      • In this part we look at turning a component into a service.

    • Microservices in the Chronicle World - Part 3

      • One of the problems with using microservices is performance. Latencies can be higher due to the cost of serialization, messaging and deserialization, and this reduces throughput.

      • JMH Benchmark on microservices

    • Microservices in the Chronicle world - Part 4

      • A common issue we cover in our workshops is how to restart a queue reader after a failure.

    • Microservices in the Chronicle World - Part 5

      • In this part we look at putting a micro service together as a collection of services, and consider how we can evaluate the performance of these services. We introduce JLBH (Java Latency Benchmark Harness) to test these services.

  • 29: Lock-free Algorithms: Introduction

    • With lock-free algorithms, a thread that can make forward progress is always one of the currently running threads, and thus it actually makes forward progress. With mutex-based algorithms there is also usually a thread that can make forward progress, however it may be a currently non-running thread, and thus no actual forward progress happens (at least until, for example, a page is loaded from disk and/or several context switches happen and/or some amount of active spinning happens).

    • For example, it's generally unsafe to use locks in signal handlers, because the lock can be currently acquired by the preempted thread, and it instantly leads to a deadlock.

      • The thread cannot proceed because the signal handler is executed in the context of the thread that was interrupted by the signal. If the signal handler tries to acquire a lock that the thread already holds, the signal handler will block, waiting for the lock to be released. However the thread itself cannot release the lock because it is effectively paused while the signal handler is running. This creates a deadlock situation where neither the thread nor the signal handler can make progress.

    • Lock-free Algorithms: First things first

      • First, if there is write sharing, the system degrades ungracefully: the more threads we add, the slower it becomes.

      • Second, if there is no write sharing, the system scales linearly. Yes, atomic RMW operations are slower than plain stores and loads, but they do scale linearly in themselves.

      • Third, loads are always scalable. Several threads are able to read a memory location simultaneously. Read-only accesses are your best friends in a concurrent environment.

    • Lock-free Algorithms: Your Arsenal

      • Compare-And-Swap

      • Fetch-And-Add

      • Exchange

      • Atomic loads and stores

      • Mutexes and the company

        • Why not? The most foolish thing one can do is try to implement everything in a non-blocking style (unless, of course, you are writing an infantile research paper, or betting money on it). Generally it's perfectly OK to use mutexes/condition variables/semaphores/etc. on cold paths. For example, during process or thread startup/shutdown, mutexes and condition variables are the way to go.
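    The signal-handler self-deadlock described earlier in this entry can be simulated without a real signal by letting the same thread try to re-acquire a non-reentrant lock (a timeout is added so the sketch terminates instead of hanging):

```python
import threading

# The handler runs in the context of the interrupted thread, so if that
# thread already holds the lock, the handler can never acquire it.
lock = threading.Lock()

def handler_body():
    # Stands in for the signal handler's critical section; without the
    # timeout this acquire would block forever.
    return lock.acquire(timeout=0.1)

lock.acquire()              # the "interrupted thread" holds the lock...
got_it = handler_body()     # ...so the "handler" cannot acquire it
lock.release()
print(got_it)  # False: the classic self-deadlock condition
```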

  • 8: How to Leverage Method Chaining to Add Smart Message Routing

    This article has shown how it is possible to use method chaining to route messages, but this is not the only use case for method chaining. The technique can also associate other types of metadata with business events, such as setting a message priority for a priority queue or recording access history. Dispatching events with associated metadata over an event-driven architecture (EDA) framework then allows custom lightweight microservices to read and act upon that metadata.
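    A hedged sketch of the chaining idea (class and method names are invented, not from the article): each method records metadata and returns the builder, so routing hints accumulate fluently before dispatch.

```python
class EventBuilder:
    def __init__(self, payload):
        self.payload = payload
        self.meta = {}

    def priority(self, level):
        self.meta["priority"] = level
        return self  # returning self is what enables chaining

    def route_to(self, destination):
        self.meta["route"] = destination
        return self

    def dispatch(self):
        # A real EDA framework would publish here; we just return the
        # event so a downstream service can act on the metadata.
        return {"payload": self.payload, **self.meta}

event = EventBuilder("order-created").priority(1).route_to("billing").dispatch()
print(event)  # {'payload': 'order-created', 'priority': 1, 'route': 'billing'}
```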

  • 7: tunnel using websockets using cranker-connector

    Another idea other than ssh -R <remote_port>:localhost:<local_port> user@remote_host

  • 5: TEXMAKER

    • Texmaker is a free, modern and cross-platform LaTeX editor for Linux, macOS and Windows systems that integrates many tools needed to develop documents with LaTeX, in just one application.

  • 4: Simple Binary Encoding

  • 3: Why are Aeron's log buffers divided into three sections?

    The main points are that it enables an algorithm which is wait-free for concurrent publication and supports retransmits on the network in the event of loss. "Aeron: Open-source high-performance messaging" by Martin Thompson

    • Composable Design

    • OSI layer 4 Transport for message oriented streams

      • Connection Oriented Communication

      • Reliability

      • Flow Control

      • Congestion Avoidance/Control

      • Multiplexing

        • avoid head of line blocking

    • Design Principles

      • Clear separation of concerns

      • Garbage free in steady state running

      • Lock-free, wait-free, and copy-free in data structures in the message path

      • Respect the Single Writer Principle

      • Major data structures are not shared

      • Don't burden the main path with exceptional cases

      • Non-blocking in the message path

    • Putting a Disruptor in front of the network is not necessary as there is Zero Copy from the application to the network.

    • How the skip list is used to build the messaging system from the point of view of contiguity of streaming data? TODO

  • 2: Function Pointer to Member Function in C++

    Dereferencing the member function pointer from the class for the current object/pointer.

    // A minimal MyClass so the example is self-contained
    struct MyClass {
        int add(int a, int b) { return a + b; }
    };
    MyClass obj;

    // Declare a pointer to the member function 'add' of MyClass
    int (MyClass::*ptrToMemberFunc)(int, int) = &MyClass::add;

    // Call the member function 'add' through the pointer on 'obj'
    int result = (obj.*ptrToMemberFunc)(20, 30);  // result == 50
  • 1: simple-binary-encoding design principles

    1. Copy-Free: The principle of copy-free is to not employ any intermediate buffers for the encoding or decoding of messages.

    2. Native Type Mapping: For example, a 64-bit integer can be encoded directly to the underlying buffer as a single x86_64 MOV assembly instruction.

    3. Allocate-Free: The SBE codecs are allocation-free by design, employing the flyweight pattern. The flyweight is a window over the underlying buffer for direct encoding and decoding of messages.

      • Flyweight Pattern in Java: Here the flyweight pattern is used to minimize memory usage or computational expenses by sharing as much as possible with similar objects, which is different from the SBE flyweight pattern.

    4. Streaming Access: It is possible to backtrack to a degree within messages but this is highly discouraged from a performance and latency perspective.

      • Memory Access Patterns Are Important

        • Basically three major bets are taken on memory access patterns:

          • Temporal: Memory accessed recently will likely be required again soon.

          • Spatial: Adjacent memory is likely to be required soon.

            • For an Intel processor these cache-lines are typically 64-bytes, that is 8 words on a 64-bit machine. This plays to the spatial bet that adjacent memory is likely to be required soon, which is typically the case if we think of arrays or fields of an object.

          • Striding: Memory access is likely to follow a predictable pattern.

            • Hardware will try to predict the next memory access our programs will make and speculatively load that memory into fill buffers. At its simplest level this is done by pre-loading adjacent cache-lines for the spatial bet, or by recognising regular stride-based access patterns, typically less than 2KB in stride length.

        • By moving to larger pages, a TLB cache can cover a larger address range for the same number of entries.

        • Cache-Oblivious Algorithms and Cache-oblivious algorithm wiki

          • The idea behind cache-oblivious algorithms is efficient usage of processor caches and reduction of memory bandwidth requirements.

          • Cache-oblivious algorithms work by recursively dividing a problem's dataset into smaller parts and then doing as much computation on each part as possible. Eventually a subproblem's dataset fits into the cache, and we can do a significant amount of computation on it without accessing memory.

        • When designing algorithms and data structures, it is now vitally important to consider cache-misses, probably even more so than counting steps in the algorithm.

        • The last decade has seen some fundamental changes in technology. For me the two most significant are the rise of multi-core, and now big-memory systems with 64-bit address spaces.

    5. Word Aligned Access: It is assumed the messages are encapsulated within a framing protocol on 8 byte boundaries. To achieve compact and efficient messages the fields should be sorted in order by type and descending size.

    6. Backward Compatibility: An extension mechanism is designed into SBE which allows for the introduction of new optional fields within a message that the new systems can use while the older systems ignore them until upgrade.

September

  • 21: Aerospace "double five zeroing" (双五归零)

    Find it, understand it, reproduce it, fix it, and finally wipe out everything of its kind.

  • 20: Agents & Idle Strategies

    A typical duty cycle will poll the doWork function of an agent until it returns zero. Once zero is returned, the idle strategy will be called.
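    The duty cycle described above can be sketched as follows (names are invented; a real agent framework would loop forever and use a back-off idle strategy):

```python
import time

class CountingAgent:
    """A toy agent whose do_work() drains a fixed amount of work."""
    def __init__(self, work_items):
        self.pending = work_items

    def do_work(self):
        if self.pending:
            self.pending -= 1
            return 1   # one item of work done
        return 0       # nothing to do -> caller should idle

def duty_cycle(agent, idle=lambda: time.sleep(0), max_idles=3):
    idles = 0
    while idles < max_idles:          # bounded here so the sketch ends
        if agent.do_work() == 0:
            idles += 1
            idle()                    # idle strategy kicks in on zero work
        else:
            idles = 0
    return agent.pending

remaining = duty_cycle(CountingAgent(5))
print(remaining)  # 0: all work drained before the loop went idle
```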

  • 19: The Problem with Threads

    Threads, as a model of computation, are wildly nondeterministic, and the job of the programmer becomes one of pruning that nondeterminism. Although many research techniques improve the model by offering more effective pruning, I argue that this is approaching the problem backwards. Rather than pruning nondeterminism, we should build from essentially deterministic, composable components. Nondeterminism should be explicitly and judiciously introduced where needed, rather than removed where not needed.

  • 18: Billions of Messages Per Minute Over TCP/IP

  • 17: The Unix Philosophy for Low Latency

    Much of Unix’s success can be attributed to the “Unix Philosophy” which can be very briefly summarised as:

    • Write programs that do one thing and do it well

    • Write programs to work together

    • Write programs to handle text streams, because that is a universal interface

  • 16: Base85 encoding

    • Like Base64, the goal of Base85 encoding is to encode binary data as printable ASCII characters. But it uses a larger set of characters, and so it can be a little more efficient. Specifically, it can encode 4 bytes (32 bits) in 5 characters.

    • Base 32 and base 64 encoding

      • There are around 100 possible characters on a keyboard, and 64 is the largest power of 2 less than 100, and so base 64 is the most dense encoding using common characters in a base that is a power of 2.

    • Base 58 encoding and Bitcoin addresses

      • Base58 is nearly as efficient as base64, but more concerned about confusing letters and numbers. The number 1, the lower case letter l, and the upper case letter I all look similar, so base58 retains the digit 1 and does not use the lower case letter l or the capital letter I.

      • It may take up to 35 characters to represent a Bitcoin address in base58. Using base64 would have taken up to 34 characters, so base58 pays a very small price for preventing a class of errors relative to base64.

    • How UTF-8 works

      • Since the first bit of ASCII characters is set to zero, bytes with the first bit set to 1 are unused and can be used specially.

      • Unicode initially wanted to use two bytes instead of one byte to represent characters, which would allow for 2^16 = 65,536 possibilities, enough to capture a lot of the world’s writing systems. But not all, and so Unicode expanded to four bytes.

      • Although a Unicode character is ostensibly a 32-bit number, it actually takes at most 21 bits to encode a Unicode character for reasons explained here. How many possible Unicode characters there are and why

      • UTF-8 lets you take an ordinary ASCII file and consider it a Unicode file encoded with UTF-8. So UTF-8 is as efficient as ASCII in terms of space. But not in terms of time. If software knows that a file is in fact ASCII, it can take each byte at face value, not having to check whether it is the first byte of a multibyte sequence.

      • And while plain ASCII is legal UTF-8, extended ASCII is not. So extended ASCII characters would now take two bytes where they used to take one.
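    The size claims above are easy to verify with Python's standard library:

```python
import base64

# Base85 packs 4 bytes into 5 ASCII characters (vs. 3-into-4 for Base64).
raw = b"\x00\x01\x02\x03"
print(len(base64.b85encode(raw)))   # 5 characters for 4 bytes
print(len(base64.b64encode(raw)))   # 8 characters (two 3-byte groups, padded)

# UTF-8: plain ASCII stays 1 byte; other code points grow.
print(len("A".encode("utf-8")))     # 1 byte, identical to ASCII
print(len("é".encode("utf-8")))     # 2 bytes: extended ASCII doubles
print(len("€".encode("utf-8")))     # 3 bytes
```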

  • 15: Liquidity Models

    • Check the informative Coding Example – Liquidity Model in Trading

  • 14: Creating Mappers Without Creating Underlying Objects in Java

    A HashMap with int keys and long values might, for each entry, create a wrapped Integer, a wrapped Long object, and a Node that holds the former values together with a hash value and a link to other potential Node objects sharing the same hash bucket. Perhaps even more tantalizing is that a wrapped Integer might be created each time the Map is queried! For example, using the Map::get operation.

  • 13: Java Memory Management

    • Phantom Reference: Used to schedule post-mortem cleanup actions, since we know for sure that objects are no longer alive. Used only with a reference queue, since the .get() method of such references will always return null. These types of references are considered preferable to finalizers.

    • -XX:+HeapDumpOnOutOfMemoryError

    • -verbose:gc

    • -Xms512m -Xmx1024m -Xss1m -Xmn256m

    • -Xlog:gc*:file=gc.log:time,uptime,level,tags -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime

  • 12: Java: Creating Terabyte Sized Queues with Low-Latency

    • The ConcurrentLinkedQueue will create a wrapping Node for each element added to the queue. This will effectively double the number of objects created.

    • Objects are placed on the Java heap, contributing to heap memory pressure and garbage collection problems. On my machine, this led to my entire JVM becoming unresponsive and the only way forward was to kill it forcibly using “kill -9”.

    • The queue cannot be read from other processes (i.e. other JVMs).

    • Once the JVM terminates, the content of the queue is lost. Hence, the queue is not durable.

    • A single MarketData instance can be reused over and over again because Chronicle Queue will flatten out the content of the current object onto the memory-mapped file, allowing object reuse.

  • 11: Java: How Object Reuse Can Reduce Latency and Improve Performance

    • Hence, contrary to many beliefs, creating a POJO, setting some values in one thread, and handing that POJO off to another thread will simply not work. The receiving thread might see no updates, might see partial updates (such as the lower four bits of a long being updated but not the upper ones), or all updates. To make things worse, the changes might be seen 100 nanoseconds later, one second later, or they might never be seen at all. There is simply no way to know.

      • One way to avoid the POJO problem is to declare primitive fields (such as int and long fields) volatile and use atomic variants for reference fields. Declaring an array as volatile means only the reference itself is volatile and does not provide volatile semantics to the elements.

      • Another way to reuse objects is by means of ThreadLocal variables which will provide distinct and time-invariant instances for each thread.

      • It should be noted that there are other ways to ensure memory consistency. For example, using the perhaps less known Java class Exchanger.

      • Yet another way is to use open-source Chronicle Queue which provides an efficient, thread-safe, object creation-free means of exchanging messages between threads.

    • jmap -histo 8536

    • As can be seen, Chronicle Queue spends most of its time accessing field values in the POJO to be written to the queue using Java reflection. Even though it is a good indicator that the intended action (i.e. copying values from a POJO to a Queue) appears somewhere near the top, there are ways to improve performance even more by providing hand-crafted methods for serialization substantially reducing execution time. (instead of SelfDescribingMarshallable)

  • 10: Chronicle JLBH

    Java Latency Benchmark Harness is a tool that allows you to benchmark your code running in context, rather than in a microbenchmark.

  • 9: Chronicle Wire: Object Marshalling

    • Chronicle Wire is able to find a middle ground between compacting data formatting (storing more data into the same space) versus compressing data (reducing the amount of storage required).

    • Typically, when we talk about a byte, a byte can represent one of 256 different characters. Yet, rather than being able to represent one of 256 characters, because we used Base64LongConverter we are saying that the 8-bit byte can only represent one of 64 characters. By limiting the number of characters that can be represented in a byte, we are able to compress more characters into a long.

    • Chronicle-Wire: Acts as a serialization library that abstracts over various wire formats (e.g., YAML, JSON, binary). It handles marshalling (serialization) and unmarshalling (deserialization) of Java objects into/from these formats, emphasizing performance, schema evolution, and cross-platform compatibility.

    • Chronicle-Bytes: Focuses on low-level memory management and byte manipulation. It provides wrappers around byte arrays, ByteBuffers, and off-heap memory, offering thread-safe operations, elastic resizing, and deterministic resource release. It is similar to Java NIO's ByteBuffer but with extended features.

    • Did You Know the Fastest Way of Serializing a Java Field Is Not Serializing It at All?

      • Many JVMs will sort primitive class fields in descending field size order and lay them out in succession. This has the advantage that read and write operations can be performed on even primitive type boundaries.

      • Well, as it turns out, it is possible to access an object’s field memory region directly via Unsafe and use memcpy to directly copy the fields in one single sweep to memory or to a memory-mapped file.

    • High-Performance Java Serialization to Different Formats

      • The encoding will affect the number of bytes used to store the data: the more compact the format, the fewer bytes used. Chronicle Wire balances the compactness of the format without going to the extreme of compressing the data, which would use valuable CPU time. Chronicle Wire aims to be flexible and backwards compatible, but also very performant.

      • Some encodings are more performant, for example by not encoding the field names in order to reduce the size of the encoded data; this can be achieved using Chronicle Wire's Field Less Binary format. However, this is a trade-off: sometimes it is better to sacrifice a little performance and include the field names, since that gives us both forwards and backwards compatibility.

  • 8: Chronicle-Map

    When deciding between on-heap and off-heap you are trading the extra memory required by the on-heap implementation against the extra latency of fetching an item from the map in the off-heap implementation. The general rule is to favour on-heap unless you have very large maps. Another consideration is that on-heap maps will update faster than off-heap maps, as there is no serialisation.

    • Java: ChronicleMap, Part 1: Go Off-Heap

      • jmap -histo 34366 | head to check the number of objects created.

      • -XX:NativeMemoryTracking=summary: we can retrieve the amount of off-heap memory being used by issuing the following command: jcmd 34413 VM.native_memory | grep Internal

      • Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of the number of objects that exist on the heap.

      • The mediator between heap and off-heap memory is often called a serializer.

        • Memory Layout of Objects in Java

          • For normal objects in Java, represented as instanceOop, the object header consists of mark and klass words plus possible alignment paddings. After the object header, there may be zero or more references to instance fields. So, that’s at least 16 bytes in 64-bit architectures because of 8 bytes of the mark, 4 bytes of klass, and another 4 bytes for padding.

          • For arrays, represented as arrayOop, the object header contains a 4-byte array length in addition to mark, klass, and paddings. Again, that would be at least 16 bytes because of 8 bytes of the mark, 4 bytes of klass, and another 4 bytes for the array length.

        • When you want to store a Java object (from the heap) into off-heap memory, the serializer's job is to convert that complex, structured object into a simple, flat sequence of bytes.

    • Java: ChronicleMap, Part 2: Super RAM Maps

      • Needless to say, you should make sure that the file you are mapping to is located on a file system with high random access performance. For example, a filesystem located on a local SSD.

  • 7: Improving Putty settings on Windows

    Make PuTTY more developer-friendly.

  • 6: log4j2: Garbage-free logging

    How to configure garbage-free logging with Log4j2.

  • 5: How to keep the trading system from being "crushed"?

    • The queue must be bounded, and that bound is your congestion window.

    • Maximum latency ≈ (per-request processing time / concurrency) × window size -> Little's Law: W = (1/λ) × L

    • The core of congestion control is a feedback loop: sense congestion, then adjust the window.

      • Window occupancy; TCP ECN

      • Per-request processing time; monitoring the P99 of single-transaction latency is just as important as monitoring queue depth.

      • Network routers drop packets under heavy load

    • With a congestion window and congestion signals, you can build a control algorithm. This is much the same idea as TCP's AIMD (additive increase, multiplicative decrease).

      • Reject outright at the gateway layer.

      • Detect matching-engine congestion at the gateway layer.

      • Let backpressure block naturally inside the service.
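    A worked example of the Little's Law bound above (all numbers are illustrative):

```python
# Little's Law: W = (1/λ) × L, with λ = concurrency / per-request time.
service_time_s = 50e-6        # assumed: 50 µs to process one request
concurrency = 1               # assumed: a single matching thread
window = 1000                 # max in-flight requests (the bounded queue)

throughput = concurrency / service_time_s   # λ = 20,000 requests/second
max_latency = window / throughput           # W = L / λ
print(max_latency)  # 0.05 s: a full window adds 50 ms of queueing delay
```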

  • 3: How Can AI ID a Cat? An Illustrated Guide.

    A neuron with two inputs has three parameters. Two of them, called weights, determine how much each input affects the output. The third parameter, called the bias, determines the neuron’s overall preference for putting out 0 or 1.
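    A minimal sketch of such a neuron; the weights, bias, and sigmoid squashing here are illustrative choices, not taken from the article:

```python
import math

def neuron(x1, x2, w1=2.0, w2=-1.0, bias=-0.5):
    # The two weights scale each input's influence; the bias shifts the
    # neuron's overall preference for putting out 0 or 1.
    z = w1 * x1 + w2 * x2 + bias
    return 1 / (1 + math.exp(-z))   # squash toward 0 or 1

print(neuron(1.0, 0.0))  # ≈ 0.82: the positive weight dominates
print(neuron(0.0, 1.0))  # ≈ 0.18: the negative weight dominates
```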

  • 2: PerfectScramble

    This searches all possible arrangements of a 3x3 Rubik's Cube to find a scramble that is very difficult to solve.

August

July

  • 6: Bloom Filters by Example

    Your false positive rate will be approximately (1 - e^(-kn/m))^k, so you can just plug in the number n of elements you expect to insert and try various values of k and m to configure your filter for your application. So, to choose the size of a bloom filter, we:

    • Check the value range of n.

    • Choose the number of bits m.

    • Calculate the optimal value of the number of hash functions k = (m/n)ln(2).

    • Calculate the error rate, if it's unacceptable, return to step 2 and try again.
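    The recipe above in code, using the stated formulas (the chosen n and bits-per-element are illustrative):

```python
import math

def optimal_k(m, n):
    """Step 3: k = (m/n) ln 2, rounded to a whole number of hashes."""
    return max(1, round((m / n) * math.log(2)))

def false_positive_rate(m, n, k):
    """Step 4: the approximate error rate (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

n = 1_000_000          # expected insertions
m = 10 * n             # step 2: try 10 bits per element
k = optimal_k(m, n)    # 7 hash functions
print(k, false_positive_rate(m, n, k))  # ≈ 0.008, i.e. under 1%
```

    If the rate were unacceptable, you would return to step 2 with a larger m and recompute.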

  • 4: Memory Consistency Models: A Tutorial

    One common ordering challenge is memory consistency, which is the problem of defining how parallel threads can observe their shared memory state.

    • Low-latency trading design: huge-page memory

      • L1 data cache: usually small, around 32 KB, with 64-byte cache lines.

      • L2 cache: larger, typically 256 KB to 1 MB; cache lines are also 64 bytes.

      • L3 cache (LLC): larger still, typically several MB to tens of MB, shared by all cores.

      • D_critical = L2_Size / N ≈ 128 bytes; when D exceeds 128 bytes, the N cache lines you access still total 8192 × 64 B = 512 KB, but their spread in memory exceeds the size of the L2 cache.

        • Although the useful data is only 512 KB, the total span of the array is 8 MB. The CPU's prefetcher cannot effectively predict such an extremely sparse access pattern.

      • When D = 256, the array size is exactly 8K (N) × 256 (D) × 4 B (int) = 8 MB, which is precisely the limit of the L2 TLB. Past this threshold, every memory access has to walk through multiple levels of page tables: like hunting for an address page by page in a phone book instead of glancing at a shorthand notebook.
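    The arithmetic in the stride example above checks out (units in bytes):

```python
N = 8192          # number of elements touched
line = 64         # cache line size in bytes

touched = N * line             # cache-line bytes actually pulled in
print(touched // 1024)         # 512 KB: an L2-sized working set

D = 256                        # stride, in 4-byte ints
span = N * D * 4               # total array footprint
print(span // 2**20)           # 8 MB: beyond the L2 TLB's reach
```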

  • 3: Include Guards and their Optimizations

    This article discusses the purpose and importance of include guards in C/C++ projects. It also explores the optimizations that compilers perform around include guards to improve build times, and how easy it is to unintentionally disable these optimizations!

  • 2: What is "stdafx.h" used for in Visual Studio?

    The trick consists of designating a special header file as the starting point of all compilation chains, the so called 'precompiled header' file, which is commonly a file named stdafx.h simply for historical reasons.

    Simply list all your big huge headers for your APIs in your stdafx.h file, in the appropriate order, and then start each of your CPP files at the very top with an #include "stdafx.h", before any meaningful content (just about the only thing allowed before is comments).

    Under those conditions, instead of starting from scratch, the compiler starts compiling from the already saved results of compiling everything in stdafx.h.

  • 1: Java with ANTLR

    ANTLR is a powerful parser generator that can be used to read, process, execute, or translate structured text or binary files. It is widely used for building languages, tools, and frameworks.

June

May

  • 30: Templating Maven Plugin

    The templating maven plugin handles copying files from a source to a given output directory, while filtering them. This plugin is useful to filter Java Source Code if you need for example to have things in that code replaced with some properties values.

  • 29: Beginner’s Guide To Bash getopts

    A beginner's guide to using getopts in bash scripts for parsing command-line options and arguments. Also How to Use Bash Getopts With Examples.

  • 24: Plain Vanilla

    An explainer for doing web development using only vanilla techniques. No tools, no frameworks — just HTML, CSS, and JavaScript. TODO

  • 23: A concise guide to Jane Street-style anti-jitter tuning

    How to avoid jitter (System Jitter and Where to Find It: A Whack-a-Mole Experience, and magic-trace on GitHub)

    • Round 1: get rid of the virtual machines!

    • Round 2: stop the "harassment" from interrupts

    • Round 3: isolate the CPUs

    • Round 4: quiet down the timer tick too

    • Round 5: turn off the CPU's "automatic transmission"

    • Round 6: give the CPU a "pause" hint

    • Round 7: throw money at it

  • 22: Gall's Law

    Gall's Law is often quoted: "A complex system that works has invariably evolved from a simple system that worked."

    But its corollary is rarely quoted: "A complex system designed from scratch never works; you must start from a simple system that runs."

    There's More To That Nugget of Wisdom

  • 16: How Core Git Developers Configure Git

    What git config settings should be defaults by now? Here are some settings that even the core developers change.

    Why is Git Autocorrect too fast for Formula One drivers?

    • It's based on a fairly simple, modified Levenshtein distance algorithm - which is basically a way to figure out how expensive it is to change one string into a second string given single-character edits, with some operations being more expensive than others.

    Experiment on your code freely with Git worktree

  • 15: The Unreasonable Effectiveness of an LLM Agent Loop with Tool Use

    With just that one very general purpose tool, the current models (we use Claude 3.7 Sonnet extensively) can nail many problems, some of them in "one shot."

  • 14: Ports that are blocked by browsers

    A list of the ports blocked by Firefox.

  • 13: Pick the right clock

    • Choosing which timer to use is very simple and depends on how long the thing is that you want to measure. If you measure something over a very small time period, the TSC will give you better accuracy. Conversely, it's pointless to use the TSC to measure a program that runs for hours. Unless you really need cycle accuracy, the system timer should be enough for a large proportion of cases. It's important to keep in mind that accessing the system timer usually has higher latency than accessing the TSC. Making a clock_gettime system call can easily be ten times slower than executing the RDTSC instruction, which takes 20+ CPU cycles. This may become important for minimizing measurement overhead, especially in a production environment. A performance comparison of different APIs for accessing timers on various platforms is available on the wiki page of the CppPerformanceBenchmarks repository. "Performance Analysis and Tuning on Modern CPUs"

    • Check /sys/devices/system/clocksource/clocksource0/current_clocksource to see whether tsc is the active clock source

    • the clock_gettime() function from <time.h> can use the TSC (Time Stamp Counter), but it depends on:

      • The clock source (e.g., CLOCK_MONOTONIC, CLOCK_REALTIME).

      • The underlying system configuration (VDSO acceleration, TSC stability).

  • 12: Templating Maven Plugin

    The templating maven plugin handles copying files from a source to a given output directory, filtering them along the way. This plugin is useful for filtering Java source code when you need, for example, to have placeholders in that code replaced with property values.

  • 10: Concatenating kdb Columns

    • Suppose in a query you need to concatenate two kdb columns into one; for example, to join date and time into one field - kdb has nifty features to do it easily.

  • 9: vTable And vPtr in C++ and Understanding Virtual Tables in C++

    • how to design C++ classes similar to interfaces in Java

      1. runtime polymorphism vs compile time generics / templates

      2. runtime polymorphism with virtual methods and always with override keyword

      3. pure virtual function

      4. a base contract class should always have a virtual destructor to prevent memory leaks

    • Whenever a class contains a virtual function, the compiler creates a Vtable for that class. Each object of the class is then provided with a hidden pointer to this table, known as Vptr.

    • It's important to note that vptr is created only if a class has or inherits a virtual function.

    • For non-virtual functions, the compiler knows which routine to execute during compilation. This process is known as static dispatch or early binding.

    • given that virtual functions can be redefined in subclasses, calls via pointers (or references) to a base type can not be dispatched at compile time. The compiler has to find the right function definition (i.e. the most specific one) at runtime. This process is called dynamic dispatch or late method binding.

    • Since derived classes are often handled via base class references, a non-virtual destructor will be dispatched statically, so the destructor of the derived class is never invoked.

  • 8: Latency percentiles are not additive

    Latency percentiles are simply not additive. Adding latency percentiles from multiple requests is indicative but not conclusive, and their sum is often too pessimistic and may trigger unnecessary overreaction.
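
    A tiny numeric illustration (the latencies are made up): when the tail spikes of two sequential stages hit different requests, the sum of the per-stage p90s is far more pessimistic than the true end-to-end p90.

```java
import java.util.Arrays;

public class PercentileDemo {
    // nearest-rank percentile: value at index ceil(p/100 * n) - 1 of the sorted data
    static long percentile(long[] xs, double p) {
        long[] s = xs.clone();
        Arrays.sort(s);
        int idx = (int) Math.ceil(p / 100.0 * s.length) - 1;
        return s[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // the same 10 requests pass through two stages; each stage spikes on
        // DIFFERENT requests
        long[] stageA = {100, 100, 1, 1, 1, 1, 1, 1, 1, 1};
        long[] stageB = {1, 1, 1, 1, 1, 1, 1, 1, 100, 100};
        long[] total = new long[stageA.length];
        for (int i = 0; i < total.length; i++) total[i] = stageA[i] + stageB[i];

        // p90(A) + p90(B) = 100 + 100 = 200, but the real end-to-end p90 is 101
        System.out.println(percentile(stageA, 90) + percentile(stageB, 90)); // 200
        System.out.println(percentile(total, 90));                           // 101
    }
}
```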

  • 7: C++: C-Style arrays vs. std::array vs. std::vector and std::vector versus std::array in C++

    • std::array is a very thin wrapper around C-style arrays that live on the stack (to put it simply, it does not use operator new). Like arrays that go on the stack, its size must be known at compile time

    • You should use std::array when the array size is known at compile time, and std::vector when it is not, or when the array can grow.

  • 6: Beej's Guide to Network Programming

    A good site for all kinds of guides, including network programming.

  • 5: Solve a Hard Problem (Tinder). Chapter 8 of my upcoming book, The Cold Start Problem

    • What people are doing on their nights and weekends represents all the underutilized time and energy in the world that if put to good use, can become the basis of the hard side of an atomic network.

    • If there is no network in your product, add it by building an atomic network.

  • 4: A Candidate For the “Most Important const”

    The "const" is important. The first line is an error and the code won’t compile portably with this reference to non-const, because f() returns a temporary object (i.e., rvalue) and only lvalues can be bound to references to non-const.

April

  • 26: An Introduction to Epsilon GC: A No-Op Experimental Garbage Collector

    JEP 318 explains that “[Epsilon] … handles memory allocation but does not implement any actual memory reclamation mechanism. Once the available Java heap is exhausted, the JVM will shut down.”

  • 25: Proof Engineering: The Message Bus

    Every input into the system is assigned a globally unique monotonic sequence number and timestamp by a central component known as a sequencer. This sequenced stream of events is disseminated to all nodes/applications in the system, which only operate on these sequenced inputs, and never on any other external inputs that have not been sequenced. Any outputs from the applications must also first be sequenced before they can be consumed by other applications or the external world. Since all nodes in the distributed system are presented with the exact same sequence of events, it is relatively straightforward for them to arrive at the same logical state after each event, without incurring any overhead or issues related to inter-node communication.

  • 19: Finding Memory Leak through MAT

    The following 4-step approach proved to be most efficient to detect memory issues:

    1. Get an overview of the heap dump. See: Overview

    2. Find big memory chunks (single objects or groups of objects).

    3. Inspect the content of this memory chunk.

    4. If the content of the memory chunk is too big, check who keeps this memory chunk alive. This sequence of actions is automated in Memory Analyzer by the Leak Suspects Report.

  • 18: Suffering-oriented programming

    First make it possible. Then make it beautiful. Then make it fast.

  • 17: Proof Engineering: The Algorithmic Trading Platform

    • The best way to avoid GC is to not create garbage in the first place. This topic could fill a book, but the primary ways to do that are: (a) Do not create new objects in the critical path of processing. Create all the objects you’ll need upfront and cache them in object pools. (b) Do not use Java strings. Java strings are immutable objects that are a common source of garbage. We use pooled custom strings that are based on java.lang.StringBuilder (c) Do not use standard Java collections. More on this below (d) Careful about boxing/unboxing of primitive types, which can happen when using standard collections or during logging. (e) Consider using off-heap memory buffers where appropriate (we use some of the utilities available in chronicle-core).

    • Avoid standard Java collections. Most standard Java collections use a companion Entry or Node object, that is created and destroyed as items are added/removed. Also, every iteration through these collections creates a new Iterator object, which contributes to garbage. Lastly, when used with primitive data types (e.g. a map of long → Object), garbage will be produced with almost every operation due to boxing/unboxing. When possible, we use collections from agrona and fastutil (and rarely, guava).

    • Write deterministic code. We’ve alluded to determinism above, but it deserves elaboration, as this is key to making the system work. By deterministic code, we mean that the code should produce the exact same output each time it is presented with a given sequenced stream, down to even the timestamps. This is easier said than done, because it means that the code may not use constructs such as external threads, or timers, or even the local system clock. The very passage of time must be derived from timestamps seen on the sequenced stream. And it gets weirder from there — like, did you know that the iteration order of some collections (e.g. java.util.HashMap) is non-deterministic because it relies on the hashCode of the entry keys?!

    • but our changes enable us to integrate QuickFIX/J with the sequenced stream architecture in such a way that we no longer rely on disk logs for recovery (which is how most FIX sessions recover).

    • Our FIX spec is available in either the PDF format or the ATDL format (Algorithmic Trading Definition Language).
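
    The pooling idea in point (a) above can be sketched as follows. This is a minimal single-threaded sketch with hypothetical names (Order, OrderPool), not the article's actual code: all objects are created upfront, so the acquire/release cycle on the hot path allocates nothing once the pool is warm.

```java
import java.util.ArrayDeque;

public class ObjectPoolDemo {
    // A mutable, reusable message object; stands in for the kinds of objects
    // a trading system would pre-allocate.
    static final class Order {
        long price;
        long quantity;
        Order reset() { price = 0; quantity = 0; return this; }
    }

    // Minimal single-threaded pool: acquire from the free list, release back.
    static final class OrderPool {
        private final ArrayDeque<Order> free = new ArrayDeque<>();

        OrderPool(int size) {
            for (int i = 0; i < size; i++) free.push(new Order()); // upfront allocation
        }

        Order acquire() {
            Order o = free.poll();
            // fall back to allocation here; a real system might fail fast instead
            return (o != null) ? o.reset() : new Order();
        }

        void release(Order o) { free.push(o); }

        int available() { return free.size(); }
    }

    public static void main(String[] args) {
        OrderPool pool = new OrderPool(4);
        Order o = pool.acquire();
        o.price = 101;
        o.quantity = 5;
        pool.release(o);
        System.out.println(pool.available()); // back to 4
    }
}
```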

  • 13: The Escape of ArrayList.iterator()

    Escape Analysis works, at least for some trivial cases. It is not as powerful as we'd like it, and code that is not hot enough will not enjoy it, but for hot code it will happen. I'd be happier if the flags for tracking when it happens were not debug only.

  • 12: What is the meaning of SO_REUSEADDR (setsockopt option) - Linux?

    This socket option tells the kernel to go ahead and reuse the port even if it is busy in the TIME_WAIT state. If it is busy in any other state, you will still get an "address already in use" error. It is useful if your server has been shut down and then restarted right away while sockets are still active on its port.

  • 11: Single Writer Principle

    If a system is decomposed into components that keep their own relevant state model, without a central shared model, and all communication is achieved via message passing, then you naturally have a system without contention. Such a system obeys the single writer principle, provided the message passing sub-system is not implemented as queues. If you cannot move straight to a model like this, but are finding scalability issues related to contention, then start by asking the question: "How do I change this code to preserve the Single Writer Principle and thus avoid the contention?"

    LMAX - How to Do 100K TPS at Less than 1ms Latency: the head and the tail compete with each other quite often, since the queue is normally either full or empty, and when it's empty they tend to point at the same cache line.

    Why is a queue not a good data structure for low latency?

    • Contention & Locking Overhead: locks / cache coherence traffic

    • Memory Allocation & Garbage Collection (GC): LMAX avoids this by using pre-allocated, garbage-free data structures.

    • Pointer Chasing & Cache Misses: LMAX uses a pre-allocated ring buffer (Disruptor) that is cache-friendly (sequential memory access).

    • Batching & False Sharing: Queues often process items one at a time, missing opportunities for batching (which improves throughput). Little's law

  • 10: Double Buffer

    Efficient pattern for single writer and single reader case. To ensure thread-safety, ReadWriteLock / Semaphore could be used. Parallel C++: Double Buffering
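
    A minimal sketch of the pattern in Java (an assumed layout, not the article's code): the writer fills the back buffer without holding any lock, and the ReadWriteLock guards only the brief pointer swap, so readers always see a complete snapshot.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Double buffering: the single writer mutates the back buffer while readers
// see a stable front buffer; a quick swap publishes the new data.
public class DoubleBuffer {
    private long[] front = new long[4];
    private long[] back = new long[4];
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // single writer: fill the back buffer freely (unlocked), then swap
    public void publish(long[] values) {
        System.arraycopy(values, 0, back, 0, values.length);
        lock.writeLock().lock();
        try {
            long[] tmp = front;
            front = back;
            back = tmp;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // readers always observe a complete, consistent snapshot
    public long read(int i) {
        lock.readLock().lock();
        try {
            return front[i];
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        DoubleBuffer db = new DoubleBuffer();
        db.publish(new long[]{1, 2, 3, 4});
        System.out.println(db.read(0)); // 1
    }
}
```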

  • 9: PERFORMANCE NINJA CLASS

    Performance Ninja Class is a FREE self-paced online course for developers who want to master software performance tuning. easyperf is the author's amazing blog.

  • 8: The update-alternatives Command in Linux

    Linux systems allow easily switching between programs of similar functionality or goal. So we can set a given version of a utility program or development tool for all users. Moreover, the change applies not only to the program itself but to its configuration or documentation as well.

  • 7: Is a write to a volatile a memory-barrier in Java

    All writes that occur before a volatile store are visible to any other thread, provided that the other thread loads this new store. However, writes that occur before a volatile load may or may not be seen by other threads if they do not load the new value.

    In Java, the semantics of volatile are defined to ensure visibility and ordering of variables across threads.

    • A volatile write in Java means that a StoreStore barrier and a LoadStore barrier are inserted. This ensures that

      1. All previous writes (stores) are visible before the volatile write.

      2. The volatile write is visible before any subsequent writes (stores).

    • A volatile read in Java means that a LoadLoad barrier and a LoadStore barrier are inserted. This ensures that

      1. The volatile read is visible before any subsequent reads (loads).

      2. The volatile read is visible before any subsequent writes (stores).
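
    The visibility rules above enable the classic safe-publication idiom. A minimal sketch: the volatile store to ready makes the preceding plain store to data visible to any thread that subsequently observes ready == true via a volatile load.

```java
// Safe publication via a volatile flag: the Java Memory Model guarantees the
// reader observes data == 42 once it sees ready == true.
public class SafePublication {
    static int data;               // plain field
    static volatile boolean ready; // volatile flag

    public static int demo() {
        data = 0;
        ready = false;
        Thread writer = new Thread(() -> {
            data = 42;    // plain store
            ready = true; // volatile store: happens-before the reader's volatile load
        });
        writer.start();
        while (!ready) {
            Thread.onSpinWait(); // volatile load in the loop condition
        }
        int seen = data; // guaranteed to be 42
        try {
            writer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 42
    }
}
```

    Without the volatile modifier on ready, the reader could spin forever or observe a stale data value.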

  • 6: Linux Default Route

    Commands:

    • list routes: ip route or ip route list

    • show interfaces: ifconfig

    • add a route: ip route add 192.168.1.0/24 via 10.217.245.129 dev bond1

    • show gateways: route -n

    • check the interfaces assigned to a bonded interface: ip link show bond0 or cat /proc/net/bonding/bond0

    Linux setup default gateway with route command Route internet traffic through a specific interface in Linux Servers – CentOS / RHEL

  • 4: InheritableThreadLocal explained in detail

    InheritableThreadLocal provides exactly this capability: it lets a child thread inherit the ThreadLocal values already set in the parent thread.
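
    A minimal demonstration: the child thread sees the value the parent had set at the moment the child was created, which a plain ThreadLocal would not provide.

```java
// InheritableThreadLocal: the child thread's initial value is copied from the
// parent at Thread construction time.
public class InheritDemo {
    static final InheritableThreadLocal<String> CTX = new InheritableThreadLocal<>();

    public static String childSees() {
        CTX.set("parent-value");
        final String[] seen = new String[1];
        Thread child = new Thread(() -> seen[0] = CTX.get());
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(childSees()); // parent-value
    }
}
```

    Note the copy happens once, at thread creation; later changes in the parent are not reflected in an already-running child.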

  • 3: Design of the Shutdown Hooks API

    Why are shutdown hooks run concurrently? Wouldn't it make more sense to run them in reverse order of registration?

    Invoking shutdown hooks in their reverse order of registration is certainly intuitive, and is in fact how the C runtime library's atexit procedure works. This technique really only makes sense, however, in a single-threaded system. In a multi-threaded system such as Java platform the order in which hooks are registered is in general undetermined and therefore implies nothing about which hooks ought to be run before which other hooks. Invoking hooks in any particular sequential order also increases the possibility of deadlocks. Note that if a particular subsystem needs to invoke shutdown actions in a particular order then it is free to synchronize them internally.

  • 2: XOR swap algorithm

    In computer programming, the exclusive or swap (sometimes shortened to XOR swap) is an algorithm that uses the exclusive or bitwise operation to swap the values of two variables without using the temporary variable which is normally required.
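
    A minimal sketch in Java (passing the values in and out, since Java has no pointers). The classic caveat applies: applied to aliased storage, x ^ x is 0, so real implementations must guard against swapping a variable with itself.

```java
public class XorSwap {
    // Returns {b, a}: the two values exchanged without a temporary variable.
    static int[] swap(int a, int b) {
        a = a ^ b;
        b = a ^ b; // now holds the original a
        a = a ^ b; // now holds the original b
        return new int[]{a, b};
    }

    public static void main(String[] args) {
        int[] r = swap(3, 7);
        System.out.println(r[0] + " " + r[1]); // 7 3
    }
}
```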

March

  • 31: Using Pausers in Event Loops

    • sleep requests of ~1ms and ~1us reduce CPU usage to ~1% and ~10% respectively compared with busy waiting (100%)

    • Here again, there is no single answer as to how the system will behave. The key is to bias the situation as much as possible to avoid the thread being switched from a core; the use of thread affinity (to avoid the thread being moved to another core) and CPU isolation (to avoid another process/thread contending with the thread) can be very effective in this case [1]. Careful use of affinity, isolation, and short sleep periods can result in responsive, low-jitter environments, which use considerably fewer CPU resources compared with busy waiting.

    • [1] Other options include running with real-time priorities; however, we want to keep the focus of this document on standard setups as much as possible.

    • Why the Cool Kids Use Event Loops Below are some of the key points to consider when choosing to use event Loops:

      1. Lock Free

      2. Testing and Evolving Requirements

      3. Shared Mutable State

      4. CPU Isolation and Thread Affinity

      5. Event Driven Architecture

    • Building Fast Trading Engines: Chronicle’s Approach to Low-Latency Trading

      • Challenges in Low-Latency Trading

        1. Threading and Core Utilisation

        2. Serialisation and Deserialisation

        3. Message Passing and Data Persistence

      • Addressing Low-Latency Trading Pain Points

        1. Thread Affinity and Event Loop Optimisation

        2. Efficient Message Passing

        3. Minimising Garbage Collection

        4. Performance Tuning for High-Throughput Trading

      • Real-World Example: A High-Performance Trading Engine in Action

        1. Accepting Market Data

        2. Making Trading Decisions

        3. Chronicle Queue Enterprise for Communication

        4. Keeping Latency Stable

  • 30: github useful scripts

    • show-busy-java-threads; how to find the thread that uses the most CPU

      1. Use top to find the Java process and the threads consuming the most CPU

        1. Enable thread display mode (top -H, or press H inside top)

        2. Sort by CPU usage (top already sorts by descending CPU usage by default, which is what we want; press P inside top to request that ordering explicitly)

        3. Note the Java process id and the ids of the high-CPU threads

      2. Inspect the stacks of the high-CPU threads:

        1. Run jstack on the problematic Java process id; see the jstack command explained

        2. Convert the thread id to hexadecimal by hand (e.g. printf %x 1234)

        3. Search the jstack output for the hexadecimal thread id (with vim's search /0x1234, or grep 0x1234 -A 20)

      3. Examine the corresponding thread stacks and analyze the problem; you will usually repeat the steps above a few times to pin it down

    • tcp-connection-state-counter

  • 29: How operating systems invented the interrupt mechanism, step by step

    When an interrupt occurs, the CPU uses the interrupt number as an index into the interrupt vector table to fetch the entry address of the corresponding interrupt handler. How operating systems invented processes and threads, step by step:

    1. To support this, a program must be able to pause and later resume execution, and to give programs that ability you must save the CPU context.

    2. Design a new abstraction that isolates running programs from one another, giving each one its own memory space. With segmented memory management, every segment of a running program gets its own memory region. You have now designed struct context and struct memory_map, and both clearly belong to some particular running program. A "running program" is a new concept, so you give it a name: process. The process context and the memory map can now both live in the process structure.

    Each thread is an independent unit of execution within a process. Threads:

    1. Share the process's address space, which means all threads can directly access the same memory regions

    2. Share open file descriptors, avoiding the overhead of repeatedly opening and closing files

    3. Share other system resources, such as signal handlers and the working directory

    4. Maintain only their own execution stack and register state, so each thread can execute independently

  • 28: Java Annotation Processing and Creating a Builder

    An important thing to note is the limitation of the annotation processing API — it can only be used to generate new files, not to change existing ones. If you use Maven to build this jar and try to put this file directly into the src/main/resources/META-INF/services directory, you’ll encounter the following error:

    [ERROR] Bad service configuration file, or exception thrown while 
    constructing Processor object: javax.annotation.processing.Processor: 
    Provider com.baeldung.annotation.processor.BuilderProcessor not found

    This is because the compiler tries to use this file during the source-processing stage of the module itself when the BuilderProcessor file is not yet compiled. The file has to be either put inside another resource directory and copied to the META-INF/services directory during the resource copying stage of the Maven build, or (even better) generated during the build. The Google auto-service library, discussed in the following section, allows generating this file using a simple annotation.

  • 27: Blocking Sockets

    This means that accept blocks the calling thread until a new connection is available from the OS, but the reverse is not true. The underlying OS will establish TCP connections for the application even if the program is not currently blocked at accept. In other words, accept asks the OS for the first ready-to-use connection, but the OS does not wait for the application to accept connections in order to establish new ones. It might establish many more.

  • 26: hatch

    Hatch is a modern, extensible Python project manager.

  • 24: Building a (T1D) Smartwatch from Scratch

    Learn how a hardware engineer works.

  • 23: Booleans Are a Trap

    Enum may be a better option.

  • 22: On inheritance and subtyping

    Explicit Inheritance vs Implicit Inheritance

  • 21: Server-Sent Events (SSE) Are Underrated

    LLM and content-type: text/event-stream

  • 19: toArray with pre sized array

    • Bottom line: toArray(new T[0]) seems faster, safer, and contractually cleaner, and therefore should be the default choice now.

  • 18: AOP in JDK、CGLIB

    JDK-based AOP leverages dynamic proxies and reflection, which brings a performance cost, while CGLIB uses ASM to generate a subclass of the original class at runtime and intercepts method calls there.
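
    The JDK-proxy half of the comparison can be sketched with only the standard library (the Greeter interface and logged helper below are hypothetical names). The proxy implements the interface and routes every call through invoke(), whose reflective method.invoke is where the overhead comes from; it also shows why JDK proxies require an interface:

```java
import java.lang.reflect.Proxy;

public class JdkProxyDemo {
    interface Greeter {
        String greet(String name);
    }

    static class GreeterImpl implements Greeter {
        public String greet(String name) { return "hello " + name; }
    }

    // Wrap a target in a logging proxy: the "advice" runs around the
    // reflective call to the real method.
    static Greeter logged(Greeter target, StringBuilder log) {
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[]{Greeter.class},
                (proxy, method, args) -> {
                    log.append("before ").append(method.getName()).append("; ");
                    Object result = method.invoke(target, args); // reflective dispatch
                    log.append("after; ");
                    return result;
                });
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        Greeter g = logged(new GreeterImpl(), log);
        System.out.println(g.greet("world")); // hello world
        System.out.println(log);              // before greet; after;
    }
}
```

    CGLIB sidesteps the interface requirement by subclassing the target class itself, at the cost of pulling in bytecode generation.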

  • 17: A minimal CMake project template

    Learn how to use CMake properly, and note that CMake is a generator for build systems; it is not itself a build system.

  • 16: A Guide to CompletableFuture

    The key difference between CompletableFuture and Future is chaining: CompletableFuture lets you compose dependent asynchronous steps without blocking between them.
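
    A small sketch of what chaining looks like in practice; the only blocking call is the final join():

```java
import java.util.concurrent.CompletableFuture;

public class ChainDemo {
    public static String pipeline() {
        return CompletableFuture.supplyAsync(() -> "42")
                .thenApply(Integer::parseInt)           // transform the result
                .thenApply(n -> n * 2)                  // chain another transform
                .thenCombine(CompletableFuture.completedFuture(" answers"),
                             (n, suffix) -> n + suffix) // combine two futures
                .join();                                // block only at the very end
    }

    public static void main(String[] args) {
        System.out.println(pipeline()); // 84 answers
    }
}
```

    With a plain Future, each of these steps would require a blocking get() before the next could start.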

  • 15: Writing Compilers

February

  • 27: The concept behind C++ concepts

    Concepts are an extension for templates.

    • They can be used to perform compile-time validation of template arguments through boolean predicates.

    • They can also be used to perform function dispatch based on properties of types.

  • 24: A mental model for Linux file, hard and soft links

    • A mental model for understanding inodes, hard links, and soft links in Linux.

    • a soft link links a link file to a target file. This is in contrast to a hard link, which links a pathname to an inode.

    • The content of a soft link is the pathname of the target file it points to.

    • a hard link exists as a directory entry that links a pathname to an inode, while a soft link exists as a file that links its own pathname to another pathname.

    • symlinks, hardlinks and reflinks explained: Note a file can be held open by a process while all hardlinks are subsequently unlinked, leaving the data accessible until the file is closed. The main use for multiply hardlinked files is to create efficient backups.

  • 23: Gradle Tutorial

    • Running Gradle Builds

    • Authoring Gradle Builds

    • Optimizing Gradle Builds

    • Dependency Management

  • 22: Stackoverflow: toArray with pre sized array

    Bottom line: toArray(new T[0]) seems faster, safer, and contractually cleaner, and therefore should be the default choice now. Future VM optimizations may close this performance gap for toArray(new T[size]), rendering the current "believed to be optimal" usages on par with an actually optimal one. Further improvements in toArray APIs would follow the same logic as toArray(new T[0]): the collection itself should create the appropriate storage.
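
    The two forms side by side; both are contractually equivalent, but the empty-array form lets the collection allocate the correctly sized array itself:

```java
import java.util.Arrays;
import java.util.List;

public class ToArrayDemo {
    public static String[] viaEmpty(List<String> list) {
        return list.toArray(new String[0]);           // recommended default
    }

    public static String[] viaPresized(List<String> list) {
        return list.toArray(new String[list.size()]); // "believed to be optimal", measurably slower
    }

    public static void main(String[] args) {
        List<String> xs = Arrays.asList("a", "b", "c");
        System.out.println(Arrays.toString(viaEmpty(xs))); // [a, b, c]
        System.out.println(viaEmpty(xs).length);           // 3, sized exactly
    }
}
```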

  • 21: 1000x more efficient than printf! How to precisely catch C/C++ wild pointers

    In GDB you can add a watchpoint to observe a region of memory; when that memory is modified, the program stops, and at that point we know exactly which line of code modified it.

  • 20: Overview of cross-architecture portability problems

    This blog post provides an overview of common cross-architecture portability problems encountered in software development, particularly focusing on the challenges when targeting 32-bit systems. It discusses issues related to integer type sizes, address space limitations, large file support, the Y2K38 problem, byte order (endianness), and char signedness. While many of these issues are often discussed in the context of C programming, the author highlights that some, like address space limitations, can affect programs written in higher-level languages such as Python. The post emphasizes that achieving true cross-architecture portability requires careful consideration of these low-level details and can be challenging, especially when dealing with legacy or proprietary software.

  • 6: The Impact of 25% Tariffs on Canadian GDP

    Learn how to think like a master from DeepSeek.

  • 5: isd – interactive systemd

    isd (interactive systemd) – a better way to work with systemd units

  • 4: changedetection.io

    The best and simplest free open source web page change detection, website watcher, restock monitor and notification service.

  • 2: Writing Compilers

    "Writing a Compiler in Go"

  • 1: Guava Splitter vs StringUtils

    Still I was surprised by the result, and if you're splitting lots of Strings and performance is an issue, it might be worth considering switching back to Commons StringUtils.

January

Contacts

LinkedIn
