2025

December

November

October

  • 30: Microservices in the Chronicle world

  • 29: Lock-free Algorithms: Introduction

    • With lock-free algorithms, a thread that can make forward progress is always one of the currently running threads, and thus it actually makes forward progress. With mutex-based algorithms there is also usually a thread that can make forward progress, but it may be a currently non-running thread, in which case no actual forward progress happens (at least until, for example, a page is loaded from disk and/or several context switches happen and/or some amount of active spinning happens).

    • For example, it's generally unsafe to use locks in signal handlers, because the lock may currently be held by the preempted thread, which instantly leads to a deadlock.

      • The thread cannot proceed because the signal handler is executed in the context of the thread that was interrupted by the signal. If the signal handler tries to acquire a lock that the thread already holds, the signal handler will block, waiting for the lock to be released. However the thread itself cannot release the lock because it is effectively paused while the signal handler is running. This creates a deadlock situation where neither the thread nor the signal handler can make progress.

    • Lock-free Algorithms: First things first

      • First, if there is write sharing, the system degrades ungracefully: the more threads we add, the slower it becomes.

      • Second, if there is no write sharing, the system scales linearly. Yes, atomic RMW operations are slower than plain stores and loads, but by themselves they do scale linearly.

      • Third, loads are always scalable. Several threads are able to read a memory location simultaneously. Read-only accesses are your best friends in a concurrent environment.

    • Lock-free Algorithms: Your Arsenal

      • Compare-And-Swap

      • Fetch-And-Add

      • Exchange

      • Atomic loads and stores

      • Mutexes and company

        • Why not? The most foolish thing one can do is to try to implement everything in a non-blocking style (unless, of course, you are writing a naive research paper, or have money riding on it). Generally it's perfectly OK to use mutexes/condition variables/semaphores/etc. on cold paths. For example, during process or thread startup/shutdown, mutexes and condition variables are the way to go.
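    To make the "arsenal" concrete, here is a minimal sketch (mine, not from the articles) of fetch-and-add built from compare-and-swap: if the CAS fails, another thread has made progress in the meantime, which is exactly the lock-free guarantee described above.

      import java.util.concurrent.atomic.AtomicLong;

      // Fetch-and-add built from a CAS retry loop.
      public final class CasCounter {
          private final AtomicLong value = new AtomicLong();

          public long fetchAndAdd(long delta) {
              for (;;) {
                  long current = value.get();
                  if (value.compareAndSet(current, current + delta)) {
                      return current;   // this thread made progress
                  }
                  // CAS failed: some other thread updated the value; retry.
              }
          }
      }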

  • 8: How to Leverage Method Chaining to Add Smart Message Routing

    This article has shown how it is possible to use method chaining to route messages, but this is not the only use case for method chaining. The technique can also associate other types of metadata with business events, such as setting a message priority for a priority queue or recording access history. Dispatching events with associated metadata over an event-driven architecture (EDA) framework then allows custom lightweight microservices to read and act upon that metadata.
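    As a hypothetical sketch of that idea (the names Message, route, and priority are illustrative, not the article's API), method chaining lets the metadata read as a fluent sentence:

      // Illustrative only: chaining attaches routing metadata to an event.
      final class Message {
          private final String payload;
          private String destination = "default";
          private int priority = 0;

          Message(String payload) { this.payload = payload; }

          Message route(String destination) { this.destination = destination; return this; }
          Message priority(int priority)    { this.priority = priority;       return this; }
      }

      // Usage: new Message("order-created").route("risk-engine").priority(7);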

  • 7: Tunnel over WebSockets using cranker-connector

    An alternative to ssh -R <remote_port>:localhost:<local_port> user@remote_host

  • 5: TEXMAKER

  • 4: Simple Binary Encoding

  • 3: Why are Aeron's log buffers divided into three sections?

    The main points are that it enables an algorithm which is wait-free for concurrent publication and supports retransmits on the network in the event of loss. "Aeron: Open-source high-performance messaging" by Martin Thompson

    • Composable Design

    • OSI layer 4 Transport for message oriented streams

      • Connection Oriented Communication

      • Reliability

      • Flow Control

      • Congestion Avoidance/Control

      • Multiplexing

        • avoid head of line blocking

    • Design Principles

      • Clear separation of concerns

      • Garbage free in steady state running

      • Lock-free, wait-free, and copy-free data structures in the message path

      • Respect the Single Writer Principle

      • Major data structures are not shared

      • Don't burden the main path with exceptional cases

      • Non-blocking in the message path

    • Putting a Disruptor in front of the network is not necessary as there is Zero Copy from the application to the network.

    • How is the skip list used to build the messaging system, from the point of view of contiguity of streaming data? TODO

  • 2: Function Pointer to Member Function in C++

    Dereferencing the member function pointer from the class for the current object/pointer.

  • 1: simple-binary-encoding design principles

    1. Copy-Free: The principle of copy-free is to not employ any intermediate buffers for the encoding or decoding of messages.

    2. Native Type Mapping: For example, a 64-bit integer can be encoded directly to the underlying buffer as a single x86_64 MOV assembly instruction.

    3. Allocate-Free: The design of SBE codecs is allocation-free, employing the flyweight pattern. The flyweight provides a window over the underlying buffer for direct encoding and decoding of messages (a minimal sketch follows this list).

      • Flyweight Pattern in Java: Here the flyweight pattern is used to minimize memory usage or computational expenses by sharing as much as possible with similar objects, which is different from the SBE flyweight pattern.

    4. Streaming Access: It is possible to backtrack to a degree within messages but this is highly discouraged from a performance and latency perspective.

      • Memory Access Patterns Are Important

        • Basically three major bets are taken on memory access patterns:

          • Temporal: Memory accessed recently will likely be required again soon.

          • Spatial: Adjacent memory is likely to be required soon.

            • For an Intel processor these cache lines are typically 64 bytes, that is, 8 words on a 64-bit machine. This plays to the spatial bet that adjacent memory is likely to be required soon, which is typically the case if we think of arrays or fields of an object.

          • Striding: Memory access is likely to follow a predictable pattern.

            • Hardware will try and predict the next memory access our programs will make and speculatively load that memory into fill buffers. This is done at its simplest level by pre-loading adjacent cache lines for the spatial bet, or by recognising regular stride-based access patterns, typically less than 2KB in stride length.

        • By moving to larger pages, a TLB cache can cover a larger address range for the same number of entries.

        • Cache-Oblivious Algorithms and Cache-oblivious algorithm wiki

          • The idea behind cache-oblivious algorithms is efficient usage of processor caches and reduction of memory bandwidth requirements.

          • Cache-oblivious algorithms work by recursively dividing a problem's dataset into smaller parts and then doing as much computation on each part as possible. Eventually a subproblem's dataset fits into cache, and we can do a significant amount of computation on it without accessing memory.

        • When designing algorithms and data structures, it is now vitally important to consider cache-misses, probably even more so than counting steps in the algorithm.

        • The last decade has seen some fundamental changes in technology. For me the two most significant are the rise of multi-core, and now big-memory systems with 64-bit address spaces.

    5. Word Aligned Access: It is assumed the messages are encapsulated within a framing protocol on 8 byte boundaries. To achieve compact and efficient messages the fields should be sorted in order by type and descending size.

    6. Backward Compatibility: An extension mechanism is designed into SBE which allows for the introduction of new optional fields within a message that the new systems can use while the older systems ignore them until upgrade.
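    A minimal sketch of the flyweight idea from principle 3 (the names and field layout are illustrative, not SBE's generated code): the codec owns no data, only an offset into the underlying buffer, so decoding allocates nothing.

      import java.nio.ByteBuffer;
      import java.nio.ByteOrder;

      // Assumed layout for this sketch: [0..7] price (long), [8..11] qty (int).
      final class QuoteFlyweight {
          private ByteBuffer buffer;
          private int offset;

          QuoteFlyweight wrap(ByteBuffer buffer, int offset) {
              this.buffer = buffer.order(ByteOrder.LITTLE_ENDIAN);
              this.offset = offset;
              return this;   // the same instance is reused for every message
          }

          long price() { return buffer.getLong(offset); }
          int  qty()   { return buffer.getInt(offset + 8); }
      }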

September

  • 21: Aerospace "double five return-to-zero" (双五归零)

    Find it, understand it, reproduce it, fix it, and finally eliminate everything of its kind.

  • 20: Agents & Idle Strategies

    A typical duty cycle will poll the doWork function of an agent until it returns zero. Once zero is returned, the idle strategy will be called.
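    A minimal sketch of that duty cycle, loosely shaped after Agrona's Agent and IdleStrategy interfaces:

      interface Agent { int doWork() throws Exception; }
      interface IdleStrategy { void idle(int workCount); }

      final class DutyCycle implements Runnable {
          private final Agent agent;
          private final IdleStrategy idleStrategy;
          private volatile boolean running = true;

          DutyCycle(Agent agent, IdleStrategy idleStrategy) {
              this.agent = agent;
              this.idleStrategy = idleStrategy;
          }

          @Override public void run() {
              while (running) {
                  int workCount = 0;
                  try {
                      workCount = agent.doWork();
                  } catch (Exception e) {
                      running = false;
                  }
                  idleStrategy.idle(workCount);   // backs off only when workCount == 0
              }
          }
      }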

  • 19: The Problem with Threads

    Threads, as a model of computation, are wildly nondeterministic, and the job of the programmer becomes one of pruning that nondeterminism. Although many research techniques improve the model by offering more effective pruning, I argue that this is approaching the problem backwards. Rather than pruning nondeterminism, we should build from essentially deterministic, composable components. Nondeterminism should be explicitly and judiciously introduced where needed, rather than removed where not needed.

  • 18: Billions of Messages Per Minute Over TCP/IP

  • 17: The Unix Philosophy for Low Latency

    Much of Unix’s success can be attributed to the “Unix Philosophy” which can be very briefly summarised as:

    • Write programs that do one thing and do it well

    • Write programs to work together

    • Write programs to handle text streams, because that is a universal interface

  • 16: Base85 encoding

    • Like Base64, the goal of Base85 encoding is to encode binary data as printable ASCII characters. But it uses a larger set of characters, and so it can be a little more efficient. Specifically, it can encode 4 bytes (32 bits) in 5 characters (a worked sketch of the arithmetic follows these notes).

    • Base 32 and base 64 encoding

      • There are around 100 possible characters on a keyboard, and 64 is the largest power of 2 less than 100, and so base 64 is the most dense encoding using common characters in a base that is a power of 2.

    • Base 58 encoding and Bitcoin addresses

      • Base58 is nearly as efficient as base64, but is more concerned about easily confused letters and numbers. The number 1, the lower case letter l, and the upper case letter I all look similar, so base58 retains the digit 1 and does not use the lower case letter l or the capital letter I.

      • It may take up to 35 characters to represent a Bitcoin address in base58. Using base64 would have taken up to 34 characters, so base58 pays a very small price for preventing a class of errors relative to base64.

    • How UTF-8 works

      • Since the first bit of ASCII characters is set to zero, bytes with the first bit set to 1 are unused and can be used specially.

      • Unicode initially wanted to use two bytes instead of one byte to represent characters, which would allow for 2^16 = 65,536 possibilities, enough to capture a lot of the world’s writing systems. But not all, and so Unicode expanded to four bytes.

      • Although a Unicode character is ostensibly a 32-bit number, it actually takes at most 21 bits to encode a Unicode character for reasons explained here. How many possible Unicode characters there are and why

      • UTF-8 lets you take an ordinary ASCII file and consider it a Unicode file encoded with UTF-8. So UTF-8 is as efficient as ASCII in terms of space. But not in terms of time. If software knows that a file is in fact ASCII, it can take each byte at face value, not having to check whether it is the first byte of a multibyte sequence.

      • And while plain ASCII is legal UTF-8, extended ASCII is not. So extended ASCII characters would now take two bytes where they used to take one.
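    As promised above, a worked sketch of the 4-bytes-to-5-characters arithmetic, using the Ascii85 alphabet ('!' upwards); other Base85 variants differ only in the character set. The key fact: 85^5 = 4,437,053,125 >= 2^32, so five base-85 digits always cover a 32-bit group.

      // Encode one 4-byte group as 5 Ascii85 characters (big-endian base 85).
      static char[] encodeGroup(int word) {
          long value = word & 0xFFFF_FFFFL;   // treat the 4 bytes as unsigned
          char[] out = new char[5];
          for (int i = 4; i >= 0; i--) {
              out[i] = (char) ('!' + (value % 85));
              value /= 85;
          }
          return out;
      }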

  • 15: Liquidity Models

  • 14: Creating Mappers Without Creating Underlying Objects in Java

    A HashMap with int keys and long values might, for each entry, create a wrapped Integer, a wrapped Long object, and a Node that holds the former values together with a hash value and a link to other potential Node objects sharing the same hash bucket. Perhaps even more tantalizing is that a wrapped Integer might be created each time the Map is queried! For example, using the Map::get operation.
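    A small illustration of the boxing described above; the compiler rewrites the int argument as Integer.valueOf(i), so large keys allocate on every query (small values come from the Integer cache):

      import java.util.HashMap;
      import java.util.Map;

      Map<Integer, Long> prices = new HashMap<>();
      prices.put(1_000_000, 100L);      // boxes the key and the value, plus a Node
      long p = prices.get(1_000_000);   // boxes the key again, then unboxes the Long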

  • 13: Java Memory Management

    • Phantom Reference: Used to schedule post-mortem cleanup actions, since we know for sure that the object is no longer alive. Used only with a reference queue, since the .get() method of such references always returns null. These references are considered preferable to finalizers (a sketch follows this list).

    • -XX:+HeapDumpOnOutOfMemoryError

    • -verbose:gc

    • -Xms512m -Xmx1024m -Xss1m -Xmn256m

    • -Xlog:gc*:file=gc.log:time,uptime,level,tags -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime
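    A minimal sketch of the phantom-reference pattern from the first bullet: get() always returns null, so the reference queue is the only way to learn that the referent has been collected.

      import java.lang.ref.PhantomReference;
      import java.lang.ref.ReferenceQueue;

      ReferenceQueue<Object> queue = new ReferenceQueue<>();
      Object resource = new Object();
      PhantomReference<Object> ref = new PhantomReference<>(resource, queue);

      resource = null;   // drop the strong reference
      System.gc();       // a hint only; collection may happen later

      if (queue.poll() == ref) {
          // the referent is gone: run the post-mortem cleanup action here
      }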

  • 12: Java: Creating Terabyte Sized Queues with Low-Latency

    • The ConcurrentLinkedQueue will create a wrapping Node for each element added to the queue. This will effectively double the number of objects created.

    • Objects are placed on the Java heap, contributing to heap memory pressure and garbage collection problems. On my machine, this led to my entire JVM becoming unresponsive and the only way forward was to kill it forcibly using “kill -9”.

    • The queue cannot be read from other processes (i.e. other JVMs).

    • Once the JVM terminates, the content of the queue is lost. Hence, the queue is not durable.

    • A single MarketData instance can be reused over and over again, because Chronicle Queue will flatten out the content of the current object onto the memory-mapped file, allowing object reuse.

  • 11: Java: How Object Reuse Can Reduce Latency and Improve Performance

    • Hence, contrary to many beliefs, creating a POJO, setting some values in one thread, and handing that POJO off to another thread will simply not work. The receiving thread might see no updates, might see partial updates (such as the lower four bytes of a long were updated but not the upper ones), or might see all updates. To make things worse, the changes might be seen 100 nanoseconds later, one second later, or they might never be seen at all. There is simply no way to know.

      • One way to avoid the POJO problem is to declare primitive fields (such as int and long fields) volatile and use atomic variants for reference fields. Declaring an array as volatile means only the reference itself is volatile and does not provide volatile semantics to the elements.

      • Another way to reuse objects is by means of ThreadLocal variables, which provide distinct and time-invariant instances for each thread (see the sketch at the end of this entry).

      • It should be noted that there are other ways to ensure memory consistency. For example, using the perhaps less known Java class Exchanger.

      • Yet another way is to use open-source Chronicle Queue which provides an efficient, thread-safe, object creation-free means of exchanging messages between threads.

    • jmap -histo 8536

    • As can be seen, Chronicle Queue spends most of its time accessing field values in the POJO to be written to the queue using Java reflection. Even though it is a good indicator that the intended action (i.e. copying values from a POJO to a queue) appears somewhere near the top, there are ways to improve performance even more by providing hand-crafted serialization methods (instead of SelfDescribingMarshallable), substantially reducing execution time.
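    A minimal sketch of the ThreadLocal reuse mentioned above: each thread gets its own time-invariant StringBuilder, reset rather than reallocated, so the hot path creates no objects.

      static final ThreadLocal<StringBuilder> BUFFER =
              ThreadLocal.withInitial(() -> new StringBuilder(256));

      static String format(long price, long qty) {
          StringBuilder sb = BUFFER.get();
          sb.setLength(0);   // reset instead of allocating a new builder
          return sb.append(price).append('x').append(qty).toString();
      }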

  • 10: Chronicle JLBH

    Java Latency Benchmark Harness is a tool that allows you to benchmark your code running in context, rather than in a microbenchmark.

  • 9: Chronicle Wire: Object Marshalling

    • Chronicle Wire is able to find a middle ground between compacting data formatting (storing more data into the same space) versus compressing data (reducing the amount of storage required).

    • Typically, a byte can represent one of 256 different characters. By using Base64LongConverter we restrict each character to one of 64 values, so each character needs only 6 bits rather than 8, and more characters can be packed into a single long (a packing sketch follows this entry).

    • Chronicle-Wire: Acts as a serialization library that abstracts over various wire formats (e.g., YAML, JSON, binary). It handles marshalling (serialization) and unmarshalling (deserialization) of Java objects into/from these formats, emphasizing performance, schema evolution, and cross-platform compatibility.

    • Chronicle-Bytes: Focuses on low-level memory management and byte manipulation. It provides wrappers around byte arrays, ByteBuffers, and off-heap memory, offering thread-safe operations, elastic resizing, and deterministic resource release. It is similar to Java NIO's ByteBuffer but with extended features.

    • Did You Know the Fastest Way of Serializing a Java Field Is Not Serializing It at All?

      • Many JVMs will sort primitive class fields in descending field size order and lay them out in succession. This has the advantage that read and write operations can be performed on even primitive type boundaries.

      • Well, as it turns out, it is possible to access an object’s field memory region directly via Unsafe and use memcpy to directly copy the fields in one single sweep to memory or to a memory-mapped file.

    • High-Performance Java Serialization to Different Formats

      • The encoding affects the number of bytes used to store the data: the more compact the format, the fewer bytes used. Chronicle Wire balances the compactness of the format without going to the extreme of compressing the data, which would use valuable CPU time; it aims to be flexible and backwards compatible, but also very performant.

      • Some encodings are more performant, perhaps by not encoding the field names to reduce the size of the encoded data; this can be achieved by using Chronicle Wire's Field Less Binary. However, this is a trade-off: sometimes it is better to sacrifice a bit of performance and add the field names, since that gives us both forwards and backwards compatibility.
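    As referenced earlier in this entry, a sketch of the 6-bits-per-character packing idea (the alphabet and methods here are illustrative, not Chronicle Wire's actual converter): with 64 possible characters each one needs only 6 bits, so up to 10 characters fit in the lower 60 bits of a long.

      static final String ALPHABET =
              "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

      static long pack(String s) {              // assumes s.length() <= 10
          long packed = 0;
          for (int i = 0; i < s.length(); i++) {
              packed = (packed << 6) | ALPHABET.indexOf(s.charAt(i));
          }
          return packed;
      }

      static String unpack(long packed, int length) {
          char[] out = new char[length];
          for (int i = length - 1; i >= 0; i--) {
              out[i] = ALPHABET.charAt((int) (packed & 63));
              packed >>>= 6;
          }
          return new String(out);
      }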

  • 8: Chronicle-Map

    When deciding between on-heap and off-heap you are trading the extra memory required by the on-heap implementation against the extra latency of fetching an item from the map in the off-heap implementation. The general rule is to favour on-heap unless you have very large maps. Another consideration is that on-heap maps will update faster than off-heap maps, as no serialisation is involved.

    • Java: ChronicleMap, Part 1: Go Off-Heap

      • jmap -histo 34366 | head to check the number of objects created.

      • -XX:NativeMemoryTracking=summary; we can retrieve the amount of off-heap memory being used by issuing the following command: jcmd 34413 VM.native_memory | grep Internal

      • Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of the number of objects that exist on the heap.

      • The mediator between heap and off-heap memory is often called a serializer.

        • Memory Layout of Objects in Java

          • For normal objects in Java, represented as instanceOop, the object header consists of mark and klass words plus possible alignment paddings. After the object header, there may be zero or more references to instance fields. So, that’s at least 16 bytes in 64-bit architectures because of 8 bytes of the mark, 4 bytes of klass, and another 4 bytes for padding.

          • For arrays, represented as arrayOop, the object header contains a 4-byte array length in addition to mark, klass, and paddings. Again, that would be at least 16 bytes because of 8 bytes of the mark, 4 bytes of klass, and another 4 bytes for the array length.

        • When you want to store a Java object (from the heap) into off-heap memory, the serializer's job is to convert that complex, structured object into a simple, flat sequence of bytes.

    • Java: ChronicleMap, Part 2: Super RAM Maps

      • Needless to say, you should make sure that the file you are mapping to is located on a file system with high random access performance. For example, a filesystem located on a local SSD.

  • 7: Improving Putty settings on Windows

    Make PuTTY more developer-friendly.

  • 6: log4j2: Garbage-free logging

    How to configure garbage-free logging with Log4j2.

  • 5: How to keep a trading system from being "crushed"?

    • Queues must be bounded, and that bound is your congestion window.

    • Maximum latency ≈ (per-request processing time / concurrency) × window size. This is Little's Law: W = L / λ.

    • The core of congestion control is a feedback loop: sense congestion, then adjust the window.

      • Window occupancy (cf. TCP ECN)

      • Per-request processing time; monitoring the P99 of single-transaction latency is as important as monitoring queue depth.

      • Network routers drop packets under high load.

    • With a congestion window and congestion signals, you can build a control algorithm. This mirrors TCP's AIMD (additive increase, multiplicative decrease); a sketch follows this list.

      • At the gateway layer, reject outright.

      • At the gateway layer, sense matching-engine congestion.

      • Within the service, block naturally.
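    A hypothetical sketch of that AIMD loop (all names are illustrative): admissions are bounded by the window, which grows by one on success and halves on a congestion signal. For simplicity it assumes a single thread adjusts the window.

      import java.util.concurrent.atomic.AtomicInteger;

      final class CongestionWindow {
          private final AtomicInteger inFlight = new AtomicInteger();
          private volatile int window = 32;   // the queue bound

          boolean tryAcquire() {              // admit or reject at the gateway
              for (;;) {
                  int current = inFlight.get();
                  if (current >= window) return false;   // window full: reject
                  if (inFlight.compareAndSet(current, current + 1)) return true;
              }
          }

          void onSuccess() {                  // additive increase
              inFlight.decrementAndGet();
              window = window + 1;
          }

          void onCongestion() {               // multiplicative decrease
              inFlight.decrementAndGet();
              window = Math.max(1, window / 2);
          }
      }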

  • 3: How Can AI ID a Cat? An Illustrated Guide.

    A neuron with two inputs has three parameters. Two of them, called weights, determine how much each input affects the output. The third parameter, called the bias, determines the neuron’s overall preference for putting out 0 or 1.
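    The same neuron as arithmetic (the weights here are arbitrary, chosen only to make the sketch concrete):

      // Two inputs, three parameters: two weights plus a bias,
      // squashed to a 0/1 output by a step function.
      static int neuron(double x1, double x2) {
          double w1 = 0.6, w2 = -0.4, bias = 0.1;
          double sum = w1 * x1 + w2 * x2 + bias;
          return sum > 0 ? 1 : 0;
      }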

  • 2: PerfectScramble

    This searches all possible arrangements of a 3x3 Rubik's Cube to find a scramble that is very difficult to solve.

August

July

June

May

April

  • 26: An Introduction to Epsilon GC: A No-Op Experimental Garbage Collector

    JEP 318 explains that “[Epsilon] … handles memory allocation but does not implement any actual memory reclamation mechanism. Once the available Java heap is exhausted, the JVM will shut down.”

  • 25: Proof Engineering: The Message Bus

    Every input into the system is assigned a globally unique monotonic sequence number and timestamp by a central component known as a sequencer. This sequenced stream of events is disseminated to all nodes/applications in the system, which only operate on these sequenced inputs, and never on any other external inputs that have not been sequenced. Any outputs from the applications must also first be sequenced before they can be consumed by other applications or the external world. Since all nodes in the distributed system are presented with the exact same sequence of events, it is relatively straightforward for them to arrive at the same logical state after each event, without incurring any overhead or issues related to inter-node communication.
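    A minimal sketch of the sequencer idea (the names are illustrative): every input is stamped with a globally unique, monotonic sequence number and a timestamp before being broadcast, and consumers only ever see the sequenced stream.

      import java.util.concurrent.atomic.AtomicLong;

      final class Sequencer {
          private final AtomicLong nextSeq = new AtomicLong(1);

          SequencedEvent sequence(byte[] payload) {
              return new SequencedEvent(nextSeq.getAndIncrement(),
                                        System.nanoTime(), payload);
          }

          record SequencedEvent(long seq, long timestamp, byte[] payload) {}
      }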

  • 19: Finding Memory Leak through MAT

    The following 4-step approach proved to be most efficient to detect memory issues:

    1. Get an overview of the heap dump. See: Overview

    2. Find big memory chunks (single objects or groups of objects).

    3. Inspect the content of this memory chunk.

    4. If the content of the memory chunk is too big, check what keeps this memory chunk alive. This sequence of actions is automated in Memory Analyzer by the Leak Suspects Report.

  • 18: Suffering-oriented programming

    First make it possible. Then make it beautiful. Then make it fast.

  • 17: Proof Engineering: The Algorithmic Trading Platform

    • The best way to avoid GC is to not create garbage in the first place. This topic could fill a book, but the primary ways to do that are: (a) Do not create new objects in the critical path of processing. Create all the objects you'll need upfront and cache them in object pools (a minimal pool sketch follows this entry). (b) Do not use Java strings. Java strings are immutable objects that are a common source of garbage. We use pooled custom strings that are based on java.lang.StringBuilder. (c) Do not use standard Java collections. More on this below. (d) Be careful about boxing/unboxing of primitive types, which can happen when using standard collections or during logging. (e) Consider using off-heap memory buffers where appropriate (we use some of the utilities available in chronicle-core).

    • Avoid standard Java collections. Most standard Java collections use a companion Entry or Node object, that is created and destroyed as items are added/removed. Also, every iteration through these collections creates a new Iterator object, which contributes to garbage. Lastly, when used with primitive data types (e.g. a map of long → Object), garbage will be produced with almost every operation due to boxing/unboxing. When possible, we use collections from agrona and fastutil (and rarely, guava).

    • Write deterministic code. We’ve alluded to determinism above, but it deserves elaboration, as this is key to making the system work. By deterministic code, we mean that the code should produce the exact same output each time it is presented with a given sequenced stream, down to even the timestamps. This is easier said than done, because it means that the code may not use constructs such as external threads, or timers, or even the local system clock. The very passage of time must be derived from timestamps seen on the sequenced stream. And it gets weirder from there — like, did you know that the iteration order of some collections (e.g. java.util.HashMap) is non-deterministic because it relies on the hashCode of the entry keys?!

    • Our changes enable us to integrate QuickFIX/J with the sequenced stream architecture in such a way that we no longer rely on disk logs for recovery (which is how most FIX sessions recover).

    • Our FIX spec is available in either the PDF format or the ATDL format (Algorithmic Trading Definition Language).
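    As referenced in point (a) above, a minimal object-pool sketch: allocate upfront, then acquire/release instead of new/garbage on the critical path. It is deliberately single-threaded, in keeping with the single writer principle.

      import java.util.ArrayDeque;
      import java.util.function.Supplier;

      final class Pool<T> {
          private final ArrayDeque<T> free = new ArrayDeque<>();

          Pool(Supplier<T> factory, int size) {
              for (int i = 0; i < size; i++) free.push(factory.get());
          }

          T acquire()       { return free.pop(); }   // an empty pool throws: size it generously
          void release(T t) { free.push(t); }
      }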

  • 13: The Escape of ArrayList.iterator()

    Escape Analysis works, at least for some trivial cases. It is not as powerful as we'd like, and code that is not hot enough will not enjoy it, but for hot code it does happen. I'd be happier if the flags for tracking when it happens were not debug-only.

  • 12: What is the meaning of SO_REUSEADDR (setsockopt option) - Linux?

    This socket option tells the kernel that even if this port is busy (in the TIME_WAIT state), go ahead and reuse it anyway. If it is busy, but with another state, you will still get an address already in use error. It is useful if your server has been shut down, and then restarted right away while sockets are still active on its port.

  • 11: Single Writer Principle

    If a system is decomposed into components that keep their own relevant state model, without a central shared model, and all communication is achieved via message passing, then you have a system that is naturally free of contention. Such a system obeys the single writer principle if the message-passing subsystem is not implemented as queues. If you cannot move straight to such a model, but are finding scalability issues related to contention, start by asking: "How do I change this code to preserve the Single Writer Principle and thus avoid the contention?" LMAX - How to Do 100K TPS at Less than 1ms Latency: the head and the tail compete with each other quite often, since the queue is normally either full or empty, and when it is empty they usually point to the same cache line. Why is a queue not a good data structure for low latency?

    • Contention & Locking Overhead: locks / cache coherence traffic

    • Memory Allocation & Garbage Collection (GC): LMAX avoids this by using pre-allocated, garbage-free data structures.

    • Pointer Chasing & Cache Misses: LMAX uses a pre-allocated ring buffer (Disruptor) that is cache-friendly (sequential memory access).

    • Batching & False Sharing: Queues often process items one at a time, missing opportunities for batching (which improves throughput). Little's law

  • 10: Double Buffer

    Efficient pattern for the single-writer, single-reader case. To ensure thread safety, a ReadWriteLock / Semaphore could be used. Parallel C++: Double Buffering
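    A minimal double-buffer sketch for that case: the writer fills the back buffer, then the buffers are swapped so the reader always sees a complete snapshot. The swap itself must be safely published, e.g. with the ReadWriteLock or Semaphore mentioned above.

      final class DoubleBuffer {
          private long[] front = new long[1024];   // the reader reads this
          private long[] back  = new long[1024];   // the writer writes this

          long[] writeBuffer() { return back; }
          long[] readBuffer()  { return front; }

          synchronized void swap() {
              long[] tmp = front;
              front = back;
              back = tmp;
          }
      }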

  • 9: PERFORMANCE NINJA CLASS

    Performance Ninja Class is a FREE self-paced online course for developers who want to master software performance tuning. easyperf is the author's excellent blog.

  • 8: The update-alternatives Command in Linux

    Linux systems allow easily switching between programs of similar functionality or goal. So we can set a given version of a utility program or development tool for all users. Moreover, the change applies not only to the program itself but to its configuration or documentation as well.

  • 7: Is a write to a volatile a memory barrier in Java?

    All writes that occur before a volatile store are visible to any other thread, provided that the other thread loads this new store. However, writes that occur before a volatile load may or may not be seen by other threads if they do not load the new value.

    In Java, the semantics of volatile are defined to ensure visibility and ordering of variables across threads.

    • A volatile write in Java means that a StoreStore barrier and a LoadStore barrier are inserted. This ensures that

      1. All previous writes (stores) are visible before the volatile write.

      2. All previous reads (loads) complete before the volatile write (the LoadStore part).

    • A volatile read in Java means that a LoadLoad barrier and a LoadStore barrier are inserted. This ensures that

      1. The volatile read is visible before any subsequent reads (loads).

      2. The volatile read is visible before any subsequent writes (stores).
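    These guarantees are what make the classic publication idiom work; a minimal example:

      final class Publisher {
          private int payload;                // plain field
          private volatile boolean ready;     // volatile flag

          void publish(int value) {
              payload = value;   // plain store, ordered before the volatile store
              ready = true;      // volatile store (StoreStore barrier before it)
          }

          Integer tryRead() {
              if (!ready) return null;   // volatile load (LoadLoad/LoadStore after it)
              return payload;            // guaranteed to see the published value
          }
      }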

  • 6: Linux Default Route

    Commands:

    • List routes: ip route or ip route list

    • Show interfaces: ifconfig

    • Add a route: ip route add 192.168.1.0/24 via 10.217.245.129 dev bond1

    • Show gateways: route -n

    • Check the interfaces assigned to a bonded interface: ip link show bond0 or cat /proc/net/bonding/bond0

    Linux setup default gateway with route command; Route internet traffic through a specific interface in Linux Servers – CentOS / RHEL

  • 4: InheritableThreadLocal usage explained in detail

    InheritableThreadLocal provides exactly this capability: the class lets a child thread inherit the ThreadLocal values already set in its parent thread.

  • 3: Design of the Shutdown Hooks API

    Why are shutdown hooks run concurrently? Wouldn't it make more sense to run them in reverse order of registration?

    Invoking shutdown hooks in their reverse order of registration is certainly intuitive, and is in fact how the C runtime library's atexit procedure works. This technique really only makes sense, however, in a single-threaded system. In a multi-threaded system such as the Java platform, the order in which hooks are registered is in general undetermined and therefore implies nothing about which hooks ought to run before which others. Invoking hooks in any particular sequential order also increases the possibility of deadlocks. Note that if a particular subsystem needs to invoke shutdown actions in a particular order, it is free to synchronize them internally.

  • 2: XOR swap algorithm

    In computer programming, the exclusive or swap (sometimes shortened to XOR swap) is an algorithm that uses the exclusive or bitwise operation to swap the values of two variables without using the temporary variable which is normally required.
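    The algorithm in three lines, plus the classic caveat: if the two variables alias the same storage (e.g. the same array slot), XOR swapping zeroes the value instead of swapping it.

      static void xorSwap(int[] a, int i, int j) {
          if (i == j) return;   // guard against the aliasing pitfall
          a[i] ^= a[j];
          a[j] ^= a[i];
          a[i] ^= a[j];
      }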

March

  • 31: Using Pausers in Event Loops

    • sleep requests of ~1ms and ~1us reduce CPU usage to ~1% and ~10% respectively compared with busy waiting (100%)

    • Here again, there is no single answer as to how the system will behave. The key is to bias the situation as much as possible to avoid the thread being switched from a core; the use of thread affinity (to avoid the thread being moved to another core) and CPU isolation (to avoid another process/thread contending with the thread) can be very effective in this case [1]. Careful use of affinity, isolation, and short sleep periods can result in responsive, low-jitter environments, which use considerably fewer CPU resources compared with busy waiting.

    • [1] Other options include running with real-time priorities; however, we want to keep the focus of this document on standard setups as much as possible.

    • Why the Cool Kids Use Event Loops. Below are some of the key points to consider when choosing to use event loops:

      1. Lock Free

      2. Testing and Evolving Requirements

      3. Shared Mutable State

      4. CPU Isolation and Thread Affinity

      5. Event Driven Architecture

    • Building Fast Trading Engines: Chronicle’s Approach to Low-Latency Trading

      • Challenges in Low-Latency Trading

        1. Threading and Core Utilisation

        2. Serialisation and Deserialisation

        3. Message Passing and Data Persistence

      • Addressing Low-Latency Trading Pain Points

        1. Thread Affinity and Event Loop Optimisation

        2. Efficient Message Passing

        3. Minimising Garbage Collection

        4. Performance Tuning for High-Throughput Trading

      • Real-World Example: A High-Performance Trading Engine in Action

        1. Accepting Market Data

        2. Making Trading Decisions

        3. Chronicle Queue Enterprise for Communication

        4. Keeping Latency Stable

  • 30: github useful scripts

    • show-busy-java-threads: how to find the thread that uses the most CPU

      1. Use the top command to find the Java process and thread IDs with high CPU usage

        1. Enable per-thread display mode (top -H, or press H inside top)

        2. Sort by CPU usage (top sorts by CPU usage in descending order by default, which is what we want; press P inside top to explicitly sort by CPU usage descending)

        3. Note down the Java process ID and the IDs of the high-CPU threads

      2. Inspect the stack of the high-CPU thread:

        1. Run jstack on the problematic Java process, using the process ID as the argument; jstack command explained

        2. Convert the thread ID to hexadecimal by hand (e.g. printf %x 1234)

        3. Search the jstack output for the hex thread ID (e.g. /0x1234 in vim, or grep 0x1234 -A 20)

      3. Examine the corresponding thread stack and analyse the problem; in practice, the steps above are repeated several times to pin the problem down

    • tcp-connection-state-counter

  • 29: How operating systems invented the interrupt mechanism, step by step

    When an interrupt occurs, the CPU uses the interrupt number as an index into the interrupt vector table to obtain the entry address of the corresponding interrupt handler. See also: How operating systems invented processes and threads, step by step

    1. To achieve this, a program must be able to pause and then resume execution; to give a program that ability, the CPU context must be saved.

    2. Design a new abstraction that isolates running programs from one another, giving each program its own memory space. Using segment-based memory management, each segment of each running program gets its own memory region. Having designed struct context and struct memory_map, both of which clearly belong to some particular running program, "a running program" becomes a new concept of its own, which you name a process; the process context and memory map can now live inside the process struct.

    Each thread is an independent unit of execution within a process. Threads:

    1. Share the process's address space, meaning all threads can directly access the same memory regions

    2. Share open file descriptors, avoiding the overhead of repeatedly opening and closing files

    3. Share other system resources, such as signal handlers and the process working directory

    4. Maintain only their own execution stack and register state, so that each thread can execute independently

  • 28: Java Annotation Processing and Creating a Builder

    An important thing to note is the limitation of the annotation processing API — it can only be used to generate new files, not to change existing ones. If you use Maven to build this jar and try to put this file directly into the src/main/resources/META-INF/services directory, you’ll encounter the following error:

    This is because the compiler tries to use this file during the source-processing stage of the module itself when the BuilderProcessor file is not yet compiled. The file has to be either put inside another resource directory and copied to the META-INF/services directory during the resource copying stage of the Maven build, or (even better) generated during the build. The Google auto-service library, discussed in the following section, allows generating this file using a simple annotation.

  • 27: Blocking Sockets

    This means that accept blocks the calling thread until a new connection is available from the OS, but the reverse is not true. The underlying OS will establish TCP connections for the application even if the program is not currently blocked at accept. In other words, accept asks the OS for the first ready-to-use connection, but the OS does not wait for the application to accept connections in order to establish new ones. It might establish many more.
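    A small illustration (the port, backlog value, and handle method are arbitrary): accept() merely dequeues the next already-established connection from the listen backlog, which the OS keeps filling while the application is busy elsewhere.

      import java.net.ServerSocket;
      import java.net.Socket;

      try (ServerSocket server = new ServerSocket(8080, 50)) {   // backlog of 50
          while (true) {
              Socket client = server.accept();   // blocks only if the backlog is empty
              handle(client);                    // hypothetical handler; meanwhile the
          }                                      // OS keeps establishing new connections
      }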

  • 26: hatch

    Hatch is a modern, extensible Python project manager.

  • 24: Building a (T1D) Smartwatch from Scratch

    Learn how a hardware engineer works.

  • 23: Booleans Are a Trap

    Enum may be a better option.

  • 22: On inheritance and subtyping

    Explicit Inheritance vs Implicit Inheritance

  • 21: Server-Sent Events (SSE) Are Underrated

    LLM and content-type: text/event-stream

  • 19: toArray with a pre-sized array

    • Bottom line: toArray(new T[0]) seems faster, safer, and contractually cleaner, and therefore should be the default choice now.

  • 18: AOP in JDK and CGLIB

    JDK-based AOP leverages reflection, which brings a performance cost, while CGLIB uses ASM to generate a subclass of the original class at runtime that intercepts method calls.
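    A sketch of the JDK flavour (interface-based, every call funnelled through a reflective invoke(), which is where the per-call cost comes from); the CGLIB flavour would instead generate a subclass at runtime.

      import java.lang.reflect.InvocationHandler;
      import java.lang.reflect.Proxy;

      interface Service { String greet(String name); }

      InvocationHandler handler = (proxy, method, args) -> {
          System.out.println("before " + method.getName());   // the "advice"
          return "hi " + args[0];
      };

      Service service = (Service) Proxy.newProxyInstance(
              Service.class.getClassLoader(),
              new Class<?>[] { Service.class },
              handler);

      service.greet("world");   // routed through handler.invoke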

  • 17: A minimal CMake project template

    Learn how to use CMake properly, and note that CMake is a generator for build systems; it is not itself a build system.

  • 16: A Guide to CompletableFuture

    The key difference between CompletableFuture and Future is chaining.
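    A small example of that chaining: each stage runs when the previous completes, with no blocking get() in between.

      import java.util.concurrent.CompletableFuture;

      CompletableFuture.supplyAsync(() -> "42")
              .thenApply(Integer::parseInt)      // transform the result
              .thenApply(n -> n * 2)
              .thenAccept(System.out::println)   // consume it
              .join();                           // wait only at the very end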

  • 15: Writing Compilers

February

January
