2025

December

November

October

  • 30: Microservices in the Chronicle world

  • 29: Lock-free Algorithms: Introduction

    • With lock-free algorithms, a thread that can make forward progress is always one of the currently running threads, and thus it actually makes forward progress. With mutex-based algorithms there is also usually a thread that can make forward progress, but it may be a currently non-running thread, in which case no actual forward progress happens (at least until, for example, a page is loaded from disk and/or several context switches happen and/or some amount of active spinning happens).

    • For example, it's generally unsafe to use locks in signal handlers, because the lock may currently be held by the preempted thread, which instantly leads to a deadlock.

      • The thread cannot proceed because the signal handler is executed in the context of the thread that was interrupted by the signal. If the signal handler tries to acquire a lock that the thread already holds, the signal handler will block, waiting for the lock to be released. However the thread itself cannot release the lock because it is effectively paused while the signal handler is running. This creates a deadlock situation where neither the thread nor the signal handler can make progress.

    • Lock-free Algorithms: First things first

      • First, if there is write sharing, the system degrades ungracefully: the more threads we add, the slower it becomes.

      • Second, if there is no write sharing, the system scales linearly. Yes, atomic RMW operations are slower than plain stores and loads, but by themselves they do scale linearly.

      • Third, loads are always scalable. Several threads are able to read a memory location simultaneously. Read-only accesses are your best friends in a concurrent environment.

    • Lock-free Algorithms: Your Arsenal

      • Compare-And-Swap

      • Fetch-And-Add

      • Exchange

      • Atomic loads and stores

      • Mutexes and company

        • Why not? The most foolish thing one can do is to try to implement everything in a non-blocking style (unless, of course, you are writing a naive research paper, or have money riding on it). Generally it's perfectly OK to use mutexes/condition variables/semaphores/etc. on cold paths. For example, during process or thread startup/shutdown, mutexes and condition variables are the way to go.
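    To make the "arsenal" concrete, here is a minimal sketch (mine, not from the articles) of fetch-and-add built from compare-and-swap: if the CAS fails, another thread has made progress in the meantime, which is exactly the lock-free guarantee described above.

      import java.util.concurrent.atomic.AtomicLong;

      // Fetch-and-add built from a CAS retry loop.
      public final class CasCounter {
          private final AtomicLong value = new AtomicLong();

          public long fetchAndAdd(long delta) {
              for (;;) {
                  long current = value.get();
                  if (value.compareAndSet(current, current + delta)) {
                      return current;   // this thread made progress
                  }
                  // CAS failed: some other thread updated the value; retry.
              }
          }
      }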

  • 8: How to Leverage Method Chaining to Add Smart Message Routing

    This article has shown how it is possible to use method chaining to route messages, but this is not the only use case for method chaining. The technique can also associate other types of metadata with business events, such as setting a message priority for a priority queue or recording access history. Dispatching events with associated metadata over an event-driven architecture (EDA) framework then allows custom lightweight microservices to read and act upon that metadata.
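    As a hypothetical sketch of that idea (the names Message, route, and priority are illustrative, not the article's API), method chaining lets the metadata read as a fluent sentence:

      // Illustrative only: chaining attaches routing metadata to an event.
      final class Message {
          private final String payload;
          private String destination = "default";
          private int priority = 0;

          Message(String payload) { this.payload = payload; }

          Message route(String destination) { this.destination = destination; return this; }
          Message priority(int priority)    { this.priority = priority;       return this; }
      }

      // Usage: new Message("order-created").route("risk-engine").priority(7);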

  • 7: Tunnel over WebSockets using cranker-connector

    An alternative to ssh -R <remote_port>:localhost:<local_port> user@remote_host

  • 5: TEXMAKER

  • 4: Simple Binary Encoding

  • 3: Why are Aeron's log buffers divided into three sections?

    The main points are that it enables an algorithm which is wait-free for concurrent publication and supports retransmits on the network in the event of loss. "Aeron: Open-source high-performance messaging" by Martin Thompson

    • Composable Design

    • OSI layer 4 Transport for message oriented streams

      • Connection Oriented Communication

      • Reliability

      • Flow Control

      • Congestion Avoidance/Control

      • Multiplexing

        • avoid head of line blocking

    • Design Principles

      • Clear separation of concerns

      • Garbage free in steady state running

      • Lock-free, wait-free, and copy-free data structures in the message path

      • Respect the Single Writer Principle

      • Major data structures are not shared

      • Don't burden the main path with exceptional cases

      • Non-blocking in the message path

    • Putting a Disruptor in front of the network is not necessary as there is Zero Copy from the application to the network.

    • How is the skip list used to build the messaging system, from the point of view of contiguity of streaming data? TODO

  • 2: Function Pointer to Member Function in C++

    Dereferencing the member function pointer from the class for the current object/pointer.

  • 1: simple-binary-encoding design principles

    1. Copy-Free: The principle of copy-free is to not employ any intermediate buffers for the encoding or decoding of messages.

    2. Native Type Mapping: For example, a 64-bit integer can be encoded directly to the underlying buffer as a single x86_64 MOV assembly instruction.

    3. Allocate-Free: The design of SBE codecs is allocation-free, employing the flyweight pattern. The flyweight provides a window over the underlying buffer for direct encoding and decoding of messages (a minimal sketch follows this list).

      • Flyweight Pattern in Java: Here the flyweight pattern is used to minimize memory usage or computational expenses by sharing as much as possible with similar objects, which is different from the SBE flyweight pattern.

    4. Streaming Access: It is possible to backtrack to a degree within messages but this is highly discouraged from a performance and latency perspective.

      • Memory Access Patterns Are Important

        • Basically three major bets are taken on memory access patterns:

          • Temporal: Memory accessed recently will likely be required again soon.

          • Spatial: Adjacent memory is likely to be required soon.

            • For an Intel processor these cache lines are typically 64 bytes, that is, 8 words on a 64-bit machine. This plays to the spatial bet that adjacent memory is likely to be required soon, which is typically the case if we think of arrays or fields of an object.

          • Striding: Memory access is likely to follow a predictable pattern.

            • Hardware will try and predict the next memory access our programs will make and speculatively load that memory into fill buffers. This is done at its simplest level by pre-loading adjacent cache lines for the spatial bet, or by recognising regular stride-based access patterns, typically less than 2KB in stride length.

        • By moving to larger pages, a TLB cache can cover a larger address range for the same number of entries.

        • Cache-Oblivious Algorithms and Cache-oblivious algorithm wiki

          • The idea behind cache-oblivious algorithms is efficient usage of processor caches and reduction of memory bandwidth requirements.

          • Cache-oblivious algorithms work by recursively dividing a problem's dataset into smaller parts and then doing as much computation on each part as possible. Eventually a subproblem's dataset fits into cache, and we can do a significant amount of computation on it without accessing memory.

        • When designing algorithms and data structures, it is now vitally important to consider cache-misses, probably even more so than counting steps in the algorithm.

        • The last decade has seen some fundamental changes in technology. For me the two most significant are the rise of multi-core, and now big-memory systems with 64-bit address spaces.

    5. Word Aligned Access: It is assumed the messages are encapsulated within a framing protocol on 8 byte boundaries. To achieve compact and efficient messages the fields should be sorted in order by type and descending size.

    6. Backward Compatibility: An extension mechanism is designed into SBE which allows for the introduction of new optional fields within a message that the new systems can use while the older systems ignore them until upgrade.
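    A minimal sketch of the flyweight idea from principle 3 (the names and field layout are illustrative, not SBE's generated code): the codec owns no data, only an offset into the underlying buffer, so decoding allocates nothing.

      import java.nio.ByteBuffer;
      import java.nio.ByteOrder;

      // Assumed layout for this sketch: [0..7] price (long), [8..11] qty (int).
      final class QuoteFlyweight {
          private ByteBuffer buffer;
          private int offset;

          QuoteFlyweight wrap(ByteBuffer buffer, int offset) {
              this.buffer = buffer.order(ByteOrder.LITTLE_ENDIAN);
              this.offset = offset;
              return this;   // the same instance is reused for every message
          }

          long price() { return buffer.getLong(offset); }
          int  qty()   { return buffer.getInt(offset + 8); }
      }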

September

  • 21: Aerospace "double five return-to-zero" (双五归零)

    Find it, understand it, reproduce it, fix it, and finally eliminate everything of its kind.

  • 20: Agents & Idle Strategies

    A typical duty cycle will poll the doWork function of an agent until it returns zero. Once zero is returned, the idle strategy will be called.
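    A minimal sketch of that duty cycle, loosely shaped after Agrona's Agent and IdleStrategy interfaces:

      interface Agent { int doWork() throws Exception; }
      interface IdleStrategy { void idle(int workCount); }

      final class DutyCycle implements Runnable {
          private final Agent agent;
          private final IdleStrategy idleStrategy;
          private volatile boolean running = true;

          DutyCycle(Agent agent, IdleStrategy idleStrategy) {
              this.agent = agent;
              this.idleStrategy = idleStrategy;
          }

          @Override public void run() {
              while (running) {
                  int workCount = 0;
                  try {
                      workCount = agent.doWork();
                  } catch (Exception e) {
                      running = false;
                  }
                  idleStrategy.idle(workCount);   // backs off only when workCount == 0
              }
          }
      }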

  • 19: The Problem with Threads

    Threads, as a model of computation, are wildly nondeterministic, and the job of the programmer becomes one of pruning that nondeterminism. Although many research techniques improve the model by offering more effective pruning, I argue that this is approaching the problem backwards. Rather than pruning nondeterminism, we should build from essentially deterministic, composable components. Nondeterminism should be explicitly and judiciously introduced where needed, rather than removed where not needed.

  • 18: Billions of Messages Per Minute Over TCP/IP

  • 17: The Unix Philosophy for Low Latency

    Much of Unix’s success can be attributed to the “Unix Philosophy” which can be very briefly summarised as:

    • Write programs that do one thing and do it well

    • Write programs to work together

    • Write programs to handle text streams, because that is a universal interface

  • 16: Base85 encoding

    • Like Base64, the goal of Base85 encoding is to encode binary data as printable ASCII characters. But it uses a larger set of characters, and so it can be a little more efficient. Specifically, it can encode 4 bytes (32 bits) in 5 characters (a worked sketch of the arithmetic follows these notes).

    • Base 32 and base 64 encoding

      • There are around 100 possible characters on a keyboard, and 64 is the largest power of 2 less than 100, and so base 64 is the most dense encoding using common characters in a base that is a power of 2.

    • Base 58 encoding and Bitcoin addresses

      • Base58 is nearly as efficient as base64, but is more concerned about easily confused letters and numbers. The number 1, the lower case letter l, and the upper case letter I all look similar, so base58 retains the digit 1 and does not use the lower case letter l or the capital letter I.

      • It may take up to 35 characters to represent a Bitcoin address in base58. Using base64 would have taken up to 34 characters, so base58 pays a very small price for preventing a class of errors relative to base64.

    • How UTF-8 works

      • Since the first bit of ASCII characters is set to zero, bytes with the first bit set to 1 are unused and can be used specially.

      • Unicode initially wanted to use two bytes instead of one byte to represent characters, which would allow for 2^16 = 65,536 possibilities, enough to capture a lot of the world’s writing systems. But not all, and so Unicode expanded to four bytes.

      • Although a Unicode character is ostensibly a 32-bit number, it actually takes at most 21 bits to encode a Unicode character for reasons explained here. How many possible Unicode characters there are and why

      • UTF-8 lets you take an ordinary ASCII file and consider it a Unicode file encoded with UTF-8. So UTF-8 is as efficient as ASCII in terms of space. But not in terms of time. If software knows that a file is in fact ASCII, it can take each byte at face value, not having to check whether it is the first byte of a multibyte sequence.

      • And while plain ASCII is legal UTF-8, extended ASCII is not. So extended ASCII characters would now take two bytes where they used to take one.
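    As promised above, a worked sketch of the 4-bytes-to-5-characters arithmetic, using the Ascii85 alphabet ('!' upwards); other Base85 variants differ only in the character set. The key fact: 85^5 = 4,437,053,125 >= 2^32, so five base-85 digits always cover a 32-bit group.

      // Encode one 4-byte group as 5 Ascii85 characters (big-endian base 85).
      static char[] encodeGroup(int word) {
          long value = word & 0xFFFF_FFFFL;   // treat the 4 bytes as unsigned
          char[] out = new char[5];
          for (int i = 4; i >= 0; i--) {
              out[i] = (char) ('!' + (value % 85));
              value /= 85;
          }
          return out;
      }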

  • 15: Liquidity Models

  • 14: Creating Mappers Without Creating Underlying Objects in Java

    A HashMap with int keys and long values might, for each entry, create a wrapped Integer, a wrapped Long object, and a Node that holds the former values together with a hash value and a link to other potential Node objects sharing the same hash bucket. Perhaps even more tantalizing is that a wrapped Integer might be created each time the Map is queried! For example, using the Map::get operation.
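    A small illustration of the boxing described above; the compiler rewrites the int argument as Integer.valueOf(i), so large keys allocate on every query (small values come from the Integer cache):

      import java.util.HashMap;
      import java.util.Map;

      Map<Integer, Long> prices = new HashMap<>();
      prices.put(1_000_000, 100L);      // boxes the key and the value, plus a Node
      long p = prices.get(1_000_000);   // boxes the key again, then unboxes the Long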

  • 13: Java Memory Management

    • Phantom Reference: Used to schedule post-mortem cleanup actions, since we know for sure that the object is no longer alive. Used only with a reference queue, since the .get() method of such references always returns null. These references are considered preferable to finalizers (a sketch follows this list).

    • -XX:+HeapDumpOnOutOfMemoryError

    • -verbose:gc

    • -Xms512m -Xmx1024m -Xss1m -Xmn256m

    • -Xlog:gc*:file=gc.log:time,uptime,level,tags -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime
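    A minimal sketch of the phantom-reference pattern from the first bullet: get() always returns null, so the reference queue is the only way to learn that the referent has been collected.

      import java.lang.ref.PhantomReference;
      import java.lang.ref.ReferenceQueue;

      ReferenceQueue<Object> queue = new ReferenceQueue<>();
      Object resource = new Object();
      PhantomReference<Object> ref = new PhantomReference<>(resource, queue);

      resource = null;   // drop the strong reference
      System.gc();       // a hint only; collection may happen later

      if (queue.poll() == ref) {
          // the referent is gone: run the post-mortem cleanup action here
      }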

  • 12: Java: Creating Terabyte Sized Queues with Low-Latency

    • The ConcurrentLinkedQueue will create a wrapping Node for each element added to the queue. This will effectively double the number of objects created.

    • Objects are placed on the Java heap, contributing to heap memory pressure and garbage collection problems. On my machine, this led to my entire JVM becoming unresponsive and the only way forward was to kill it forcibly using “kill -9”.

    • The queue cannot be read from other processes (i.e. other JVMs).

    • Once the JVM terminates, the content of the queue is lost. Hence, the queue is not durable.

    • A single MarketData instance can be reused over and over again, because Chronicle Queue will flatten out the content of the current object onto the memory-mapped file, allowing object reuse.

  • 11: Java: How Object Reuse Can Reduce Latency and Improve Performance

    • Hence, contrary to many beliefs, creating a POJO, setting some values in one thread, and handing that POJO off to another thread will simply not work. The receiving thread might see no updates, might see partial updates (such as the lower four bytes of a long were updated but not the upper ones), or might see all updates. To make things worse, the changes might be seen 100 nanoseconds later, one second later, or they might never be seen at all. There is simply no way to know.

      • One way to avoid the POJO problem is to declare primitive fields (such as int and long fields) volatile and use atomic variants for reference fields. Declaring an array as volatile means only the reference itself is volatile and does not provide volatile semantics to the elements.

      • Another way to reuse objects is by means of ThreadLocal variables, which provide distinct and time-invariant instances for each thread (see the sketch at the end of this entry).

      • It should be noted that there are other ways to ensure memory consistency. For example, using the perhaps less known Java class Exchanger.

      • Yet another way is to use open-source Chronicle Queue which provides an efficient, thread-safe, object creation-free means of exchanging messages between threads.

    • jmap -histo 8536

    • As can be seen, Chronicle Queue spends most of its time accessing field values in the POJO to be written to the queue using Java reflection. Even though it is a good indicator that the intended action (i.e. copying values from a POJO to a queue) appears somewhere near the top, there are ways to improve performance even more by providing hand-crafted serialization methods (instead of SelfDescribingMarshallable), substantially reducing execution time.
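    A minimal sketch of the ThreadLocal reuse mentioned above: each thread gets its own time-invariant StringBuilder, reset rather than reallocated, so the hot path creates no objects.

      static final ThreadLocal<StringBuilder> BUFFER =
              ThreadLocal.withInitial(() -> new StringBuilder(256));

      static String format(long price, long qty) {
          StringBuilder sb = BUFFER.get();
          sb.setLength(0);   // reset instead of allocating a new builder
          return sb.append(price).append('x').append(qty).toString();
      }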

  • 10: Chronicle JLBH

    Java Latency Benchmark Harness is a tool that allows you to benchmark your code running in context, rather than in a microbenchmark.

  • 9: Chronicle Wire: Object Marshalling

    • Chronicle Wire is able to find a middle ground between compacting data formatting (storing more data into the same space) versus compressing data (reducing the amount of storage required).

    • Typically, a byte can represent one of 256 different characters. By using Base64LongConverter we restrict each character to one of 64 values, so each character needs only 6 bits rather than 8, and more characters can be packed into a single long (a packing sketch follows this entry).

    • Chronicle-Wire: Acts as a serialization library that abstracts over various wire formats (e.g., YAML, JSON, binary). It handles marshalling (serialization) and unmarshalling (deserialization) of Java objects into/from these formats, emphasizing performance, schema evolution, and cross-platform compatibility.

    • Chronicle-Bytes: Focuses on low-level memory management and byte manipulation. It provides wrappers around byte arrays, ByteBuffers, and off-heap memory, offering thread-safe operations, elastic resizing, and deterministic resource release. It is similar to Java NIO's ByteBuffer but with extended features.

    • Did You Know the Fastest Way of Serializing a Java Field Is Not Serializing It at All?

      • Many JVMs will sort primitive class fields in descending field size order and lay them out in succession. This has the advantage that read and write operations can be performed on even primitive type boundaries.

      • Well, as it turns out, it is possible to access an object’s field memory region directly via Unsafe and use memcpy to directly copy the fields in one single sweep to memory or to a memory-mapped file.

    • High-Performance Java Serialization to Different Formats

      • The encoding affects the number of bytes used to store the data: the more compact the format, the fewer bytes used. Chronicle Wire balances the compactness of the format without going to the extreme of compressing the data, which would use valuable CPU time; it aims to be flexible and backwards compatible, but also very performant.

      • Some encodings are more performant, perhaps by not encoding the field names to reduce the size of the encoded data; this can be achieved by using Chronicle Wire's Field Less Binary. However, this is a trade-off: sometimes it is better to sacrifice a bit of performance and add the field names, since that gives us both forwards and backwards compatibility.
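    As referenced earlier in this entry, a sketch of the 6-bits-per-character packing idea (the alphabet and methods here are illustrative, not Chronicle Wire's actual converter): with 64 possible characters each one needs only 6 bits, so up to 10 characters fit in the lower 60 bits of a long.

      static final String ALPHABET =
              "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

      static long pack(String s) {              // assumes s.length() <= 10
          long packed = 0;
          for (int i = 0; i < s.length(); i++) {
              packed = (packed << 6) | ALPHABET.indexOf(s.charAt(i));
          }
          return packed;
      }

      static String unpack(long packed, int length) {
          char[] out = new char[length];
          for (int i = length - 1; i >= 0; i--) {
              out[i] = ALPHABET.charAt((int) (packed & 63));
              packed >>>= 6;
          }
          return new String(out);
      }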

  • 8: Chronicle-Map

    When deciding between on-heap and off-heap you are trading the extra memory required by the on-heap implementation against the extra latency of fetching an item from the map in the off-heap implementation. The general rule is to favour on-heap unless you have very large maps. Another consideration is that on-heap maps will update faster than off-heap maps, as no serialisation is involved.

    • Java: ChronicleMap, Part 1: Go Off-Heap

      • jmap -histo 34366 | head to check the number of objects created.

      • -XX:NativeMemoryTracking=summary; we can retrieve the amount of off-heap memory being used by issuing the following command: jcmd 34413 VM.native_memory | grep Internal

      • Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of the number of objects that exist on the heap.

      • The mediator between heap and off-heap memory is often called a serializer.

        • Memory Layout of Objects in Java

          • For normal objects in Java, represented as instanceOop, the object header consists of mark and klass words plus possible alignment paddings. After the object header, there may be zero or more references to instance fields. So, that’s at least 16 bytes in 64-bit architectures because of 8 bytes of the mark, 4 bytes of klass, and another 4 bytes for padding.

          • For arrays, represented as arrayOop, the object header contains a 4-byte array length in addition to mark, klass, and paddings. Again, that would be at least 16 bytes because of 8 bytes of the mark, 4 bytes of klass, and another 4 bytes for the array length.

        • When you want to store a Java object (from the heap) into off-heap memory, the serializer's job is to convert that complex, structured object into a simple, flat sequence of bytes.

    • Java: ChronicleMap, Part 2: Super RAM Maps

      • Needless to say, you should make sure that the file you are mapping to is located on a file system with high random access performance. For example, a filesystem located on a local SSD.

  • 7: Improving Putty settings on Windows

    Make PuTTY more developer-friendly.

  • 6: log4j2: Garbage-free logging

    How to configure garbage-free logging with Log4j2.

  • 5: How to keep a trading system from being "crushed"?

    • Queues must be bounded, and that bound is your congestion window.

    • Maximum latency ≈ (per-request processing time / concurrency) × window size. This is Little's Law: W = L / λ.

    • The core of congestion control is a feedback loop: sense congestion, then adjust the window.

      • Window occupancy (cf. TCP ECN)

      • Per-request processing time; monitoring the P99 of single-transaction latency is as important as monitoring queue depth.

      • Network routers drop packets under high load.

    • With a congestion window and congestion signals, you can build a control algorithm. This mirrors TCP's AIMD (additive increase, multiplicative decrease); a sketch follows this list.

      • At the gateway layer, reject outright.

      • At the gateway layer, sense matching-engine congestion.

      • Within the service, block naturally.
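    A hypothetical sketch of that AIMD loop (all names are illustrative): admissions are bounded by the window, which grows by one on success and halves on a congestion signal. For simplicity it assumes a single thread adjusts the window.

      import java.util.concurrent.atomic.AtomicInteger;

      final class CongestionWindow {
          private final AtomicInteger inFlight = new AtomicInteger();
          private volatile int window = 32;   // the queue bound

          boolean tryAcquire() {              // admit or reject at the gateway
              for (;;) {
                  int current = inFlight.get();
                  if (current >= window) return false;   // window full: reject
                  if (inFlight.compareAndSet(current, current + 1)) return true;
              }
          }

          void onSuccess() {                  // additive increase
              inFlight.decrementAndGet();
              window = window + 1;
          }

          void onCongestion() {               // multiplicative decrease
              inFlight.decrementAndGet();
              window = Math.max(1, window / 2);
          }
      }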

  • 3: How Can AI ID a Cat? An Illustrated Guide.

    A neuron with two inputs has three parameters. Two of them, called weights, determine how much each input affects the output. The third parameter, called the bias, determines the neuron’s overall preference for putting out 0 or 1.
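    The same neuron as arithmetic (the weights here are arbitrary, chosen only to make the sketch concrete):

      // Two inputs, three parameters: two weights plus a bias,
      // squashed to a 0/1 output by a step function.
      static int neuron(double x1, double x2) {
          double w1 = 0.6, w2 = -0.4, bias = 0.1;
          double sum = w1 * x1 + w2 * x2 + bias;
          return sum > 0 ? 1 : 0;
      }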

  • 2: PerfectScramble

    This searches all possible arrangements of a 3x3 Rubik's Cube to find a scramble that is very difficult to solve.

August

July

June

May

April

  • 26: An Introduction to Epsilon GC: A No-Op Experimental Garbage Collector

    JEP 318 explains that “[Epsilon] … handles memory allocation but does not implement any actual memory reclamation mechanism. Once the available Java heap is exhausted, the JVM will shut down.”

  • 25: Proof Engineering: The Message Bus

    Every input into the system is assigned a globally unique monotonic sequence number and timestamp by a central component known as a sequencer. This sequenced stream of events is disseminated to all nodes/applications in the system, which only operate on these sequenced inputs, and never on any other external inputs that have not been sequenced. Any outputs from the applications must also first be sequenced before they can be consumed by other applications or the external world. Since all nodes in the distributed system are presented with the exact same sequence of events, it is relatively straightforward for them to arrive at the same logical state after each event, without incurring any overhead or issues related to inter-node communication.
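    A minimal sketch of the sequencer idea (the names are illustrative): every input is stamped with a globally unique, monotonic sequence number and a timestamp before being broadcast, and consumers only ever see the sequenced stream.

      import java.util.concurrent.atomic.AtomicLong;

      final class Sequencer {
          private final AtomicLong nextSeq = new AtomicLong(1);

          SequencedEvent sequence(byte[] payload) {
              return new SequencedEvent(nextSeq.getAndIncrement(),
                                        System.nanoTime(), payload);
          }

          record SequencedEvent(long seq, long timestamp, byte[] payload) {}
      }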

  • 19: Finding Memory Leak through MAT

    The following 4-step approach proved to be most efficient to detect memory issues:

    1. Get an overview of the heap dump. See: Overview

    2. Find big memory chunks (single objects or groups of objects).

    3. Inspect the content of this memory chunk.

    4. If the content of the memory chunk is too big, check what keeps this memory chunk alive. This sequence of actions is automated in Memory Analyzer by the Leak Suspects Report.

  • 18: Suffering-oriented programming

    First make it possible. Then make it beautiful. Then make it fast.

  • 17: Proof Engineering: The Algorithmic Trading Platform

    • The best way to avoid GC is to not create garbage in the first place. This topic could fill a book, but the primary ways to do that are: (a) Do not create new objects in the critical path of processing. Create all the objects you'll need upfront and cache them in object pools (a minimal pool sketch follows this entry). (b) Do not use Java strings. Java strings are immutable objects that are a common source of garbage. We use pooled custom strings that are based on java.lang.StringBuilder. (c) Do not use standard Java collections. More on this below. (d) Be careful about boxing/unboxing of primitive types, which can happen when using standard collections or during logging. (e) Consider using off-heap memory buffers where appropriate (we use some of the utilities available in chronicle-core).

    • Avoid standard Java collections. Most standard Java collections use a companion Entry or Node object, that is created and destroyed as items are added/removed. Also, every iteration through these collections creates a new Iterator object, which contributes to garbage. Lastly, when used with primitive data types (e.g. a map of long → Object), garbage will be produced with almost every operation due to boxing/unboxing. When possible, we use collections from agrona and fastutil (and rarely, guava).

    • Write deterministic code. We’ve alluded to determinism above, but it deserves elaboration, as this is key to making the system work. By deterministic code, we mean that the code should produce the exact same output each time it is presented with a given sequenced stream, down to even the timestamps. This is easier said than done, because it means that the code may not use constructs such as external threads, or timers, or even the local system clock. The very passage of time must be derived from timestamps seen on the sequenced stream. And it gets weirder from there — like, did you know that the iteration order of some collections (e.g. java.util.HashMap) is non-deterministic because it relies on the hashCode of the entry keys?!

    • Our changes enable us to integrate QuickFIX/J with the sequenced stream architecture in such a way that we no longer rely on disk logs for recovery (which is how most FIX sessions recover).

    • Our FIX spec is available in either the PDF format or the ATDL format (Algorithmic Trading Definition Language).
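    As referenced in point (a) above, a minimal object-pool sketch: allocate upfront, then acquire/release instead of new/garbage on the critical path. It is deliberately single-threaded, in keeping with the single writer principle.

      import java.util.ArrayDeque;
      import java.util.function.Supplier;

      final class Pool<T> {
          private final ArrayDeque<T> free = new ArrayDeque<>();

          Pool(Supplier<T> factory, int size) {
              for (int i = 0; i < size; i++) free.push(factory.get());
          }

          T acquire()       { return free.pop(); }   // an empty pool throws: size it generously
          void release(T t) { free.push(t); }
      }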

  • 13: The Escape of ArrayList.iterator()

    Escape Analysis works, at least for some trivial cases. It is not as powerful as we'd like, and code that is not hot enough will not enjoy it, but for hot code it does happen. I'd be happier if the flags for tracking when it happens were not debug-only.

  • 12: What is the meaning of SO_REUSEADDR (setsockopt option) - Linux?

    This socket option tells the kernel that even if this port is busy (in the TIME_WAIT state), go ahead and reuse it anyway. If it is busy, but with another state, you will still get an address already in use error. It is useful if your server has been shut down, and then restarted right away while sockets are still active on its port.

  • 11: Single Writer Principle

    If a system is decomposed into components that keep their own relevant state model, without a central shared model, and all communication is achieved via message passing, then you have a system that is naturally free of contention. Such a system obeys the single writer principle if the message-passing subsystem is not implemented as queues. If you cannot move straight to such a model, but are finding scalability issues related to contention, start by asking: "How do I change this code to preserve the Single Writer Principle and thus avoid the contention?" LMAX - How to Do 100K TPS at Less than 1ms Latency: the head and the tail compete with each other quite often, since the queue is normally either full or empty, and when it is empty they usually point to the same cache line. Why is a queue not a good data structure for low latency?

    • Contention & Locking Overhead: locks / cache coherence traffic

    • Memory Allocation & Garbage Collection (GC): LMAX avoids this by using pre-allocated, garbage-free data structures.

    • Pointer Chasing & Cache Misses: LMAX uses a pre-allocated ring buffer (Disruptor) that is cache-friendly (sequential memory access).

    • Batching & False Sharing: Queues often process items one at a time, missing opportunities for batching (which improves throughput). Little's law

  • 10: Double Buffer

    Efficient pattern for the single-writer, single-reader case. To ensure thread safety, a ReadWriteLock / Semaphore could be used. Parallel C++: Double Buffering
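    A minimal double-buffer sketch for that case: the writer fills the back buffer, then the buffers are swapped so the reader always sees a complete snapshot. The swap itself must be safely published, e.g. with the ReadWriteLock or Semaphore mentioned above.

      final class DoubleBuffer {
          private long[] front = new long[1024];   // the reader reads this
          private long[] back  = new long[1024];   // the writer writes this

          long[] writeBuffer() { return back; }
          long[] readBuffer()  { return front; }

          synchronized void swap() {
              long[] tmp = front;
              front = back;
              back = tmp;
          }
      }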

  • 9: PERFORMANCE NINJA CLASS

    Performance Ninja Class is a FREE self-paced online course for developers who want to master software performance tuning. easyperf is the author's excellent blog.

  • 8: The update-alternatives Command in Linux

    Linux systems allow easily switching between programs of similar functionality or goal. So we can set a given version of a utility program or development tool for all users. Moreover, the change applies not only to the program itself but to its configuration or documentation as well.

  • 7: Is a write to a volatile a memory barrier in Java?

    All writes that occur before a volatile store are visible to any other thread, provided that the other thread loads this new store. However, writes that occur before a volatile load may or may not be seen by other threads if they do not load the new value.

    In Java, the semantics of volatile are defined to ensure visibility and ordering of variables across threads.

    • A volatile write in Java means that a StoreStore barrier and a LoadStore barrier are inserted. This ensures that

      1. All previous writes (stores) are visible before the volatile write.

      2. All previous reads (loads) complete before the volatile write (the LoadStore part).

    • A volatile read in Java means that a LoadLoad barrier and a LoadStore barrier are inserted. This ensures that

      1. The volatile read is visible before any subsequent reads (loads).

      2. The volatile read is visible before any subsequent writes (stores).
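    These guarantees are what make the classic publication idiom work; a minimal example:

      final class Publisher {
          private int payload;                // plain field
          private volatile boolean ready;     // volatile flag

          void publish(int value) {
              payload = value;   // plain store, ordered before the volatile store
              ready = true;      // volatile store (StoreStore barrier before it)
          }

          Integer tryRead() {
              if (!ready) return null;   // volatile load (LoadLoad/LoadStore after it)
              return payload;            // guaranteed to see the published value
          }
      }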

  • 6: Linux Default Route

    Commands:

    • List routes: ip route or ip route list

    • Show interfaces: ifconfig

    • Add a route: ip route add 192.168.1.0/24 via 10.217.245.129 dev bond1

    • Show gateways: route -n

    • Check the interfaces assigned to a bonded interface: ip link show bond0 or cat /proc/net/bonding/bond0

    Linux setup default gateway with route command; Route internet traffic through a specific interface in Linux Servers – CentOS / RHEL

  • 4: InheritableThreadLocal usage explained in detail

    InheritableThreadLocal provides exactly this capability: the class lets a child thread inherit the ThreadLocal values already set in its parent thread.

  • 3: Design of the Shutdown Hooks API

    Why are shutdown hooks run concurrently? Wouldn't it make more sense to run them in reverse order of registration?

    Invoking shutdown hooks in their reverse order of registration is certainly intuitive, and is in fact how the C runtime library's atexit procedure works. This technique really only makes sense, however, in a single-threaded system. In a multi-threaded system such as the Java platform, the order in which hooks are registered is in general undetermined and therefore implies nothing about which hooks ought to run before which others. Invoking hooks in any particular sequential order also increases the possibility of deadlocks. Note that if a particular subsystem needs to invoke shutdown actions in a particular order, it is free to synchronize them internally.

  • 2: XOR swap algorithm

    In computer programming, the exclusive or swap (sometimes shortened to XOR swap) is an algorithm that uses the exclusive or bitwise operation to swap the values of two variables without using the temporary variable which is normally required.
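    The algorithm in three lines, plus the classic caveat: if the two variables alias the same storage (e.g. the same array slot), XOR swapping zeroes the value instead of swapping it.

      static void xorSwap(int[] a, int i, int j) {
          if (i == j) return;   // guard against the aliasing pitfall
          a[i] ^= a[j];
          a[j] ^= a[i];
          a[i] ^= a[j];
      }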

March

  • 31: Using Pausers in Event Loops

    • sleep requests of ~1ms and ~1us reduce CPU usage to ~1% and ~10% respectively compared with busy waiting (100%)

    • Here again, there is no single answer as to how the system will behave. The key is to bias the situation as much as possible to avoid the thread being switched from a core; the use of thread affinity (to avoid the thread being moved to another core) and CPU isolation (to avoid another process/thread contending with the thread) can be very effective in this case [1]. Careful use of affinity, isolation, and short sleep periods can result in responsive, low-jitter environments, which use considerably fewer CPU resources compared with busy waiting.

    • [1] Other options include running with real-time priorities; however, we want to keep the focus of this document on standard setups as much as possible.

    • Why the Cool Kids Use Event Loops. Below are some of the key points to consider when choosing to use event loops:

      1. Lock Free

      2. Testing and Evolving Requirements

      3. Shared Mutable State

      4. CPU Isolation and Thread Affinity

      5. Event Driven Architecture

    • Building Fast Trading Engines: Chronicle’s Approach to Low-Latency Trading

      • Challenges in Low-Latency Trading

        1. Threading and Core Utilisation

        2. Serialisation and Deserialisation

        3. Message Passing and Data Persistence

      • Addressing Low-Latency Trading Pain Points

        1. Thread Affinity and Event Loop Optimisation

        2. Efficient Message Passing

        3. Minimising Garbage Collection

        4. Performance Tuning for High-Throughput Trading

      • Real-World Example: A High-Performance Trading Engine in Action

        1. Accepting Market Data

        2. Making Trading Decisions

        3. Chronicle Queue Enterprise for Communication

        4. Keeping Latency Stable

  • 30: github useful scripts

    • show-busy-java-threads: how to find the thread that uses the most CPU

      1. Use the top command to find the Java process and thread IDs with high CPU usage

        1. Enable per-thread display mode (top -H, or press H inside top)

        2. Sort by CPU usage (top sorts by CPU usage in descending order by default, which is what we want; press P inside top to explicitly sort by CPU usage descending)

        3. Note down the Java process ID and the IDs of the high-CPU threads

      2. Inspect the stack of the high-CPU thread:

        1. Run jstack on the problematic Java process, using the process ID as the argument; jstack command explained

        2. Convert the thread ID to hexadecimal by hand (e.g. printf %x 1234)

        3. Search the jstack output for the hex thread ID (e.g. /0x1234 in vim, or grep 0x1234 -A 20)

      3. Examine the corresponding thread stack and analyse the problem; in practice, the steps above are repeated several times to pin the problem down

    • tcp-connection-state-counter

  • 29: How operating systems invented the interrupt mechanism, step by step

    When an interrupt occurs, the CPU uses the interrupt number as an index into the interrupt vector table to obtain the entry address of the corresponding interrupt handler. See also: How operating systems invented processes and threads, step by step

    1. To achieve this, a program must be able to pause and then resume execution; to give a program that ability, the CPU context must be saved.

    2. Design a new abstraction that isolates running programs from one another, giving each program its own memory space. Using segment-based memory management, each segment of each running program gets its own memory region. Having designed struct context and struct memory_map, both of which clearly belong to some particular running program, "a running program" becomes a new concept of its own, which you name a process; the process context and memory map can now live inside the process struct.

    Each thread is an independent unit of execution within a process. Threads:

    1. Share the process's address space, meaning all threads can directly access the same memory regions

    2. Share open file descriptors, avoiding the overhead of repeatedly opening and closing files

    3. Share other system resources, such as signal handlers and the process working directory

    4. Maintain only their own execution stack and register state, so that each thread can execute independently

  • 28: Java Annotation Processing and Creating a Builder

    An important thing to note is the limitation of the annotation processing API — it can only be used to generate new files, not to change existing ones. If you use Maven to build this jar and try to put this file directly into the src/main/resources/META-INF/services directory, you’ll encounter the following error:

    This is because the compiler tries to use this file during the source-processing stage of the module itself when the BuilderProcessor file is not yet compiled. The file has to be either put inside another resource directory and copied to the META-INF/services directory during the resource copying stage of the Maven build, or (even better) generated during the build. The Google auto-service library, discussed in the following section, allows generating this file using a simple annotation.

  • 27: Blocking Sockets

    This means that accept blocks the calling thread until a new connection is available from the OS, but the reverse is not true. The underlying OS will establish TCP connections for the application even if the program is not currently blocked at accept. In other words, accept asks the OS for the first ready-to-use connection, but the OS does not wait for the application to accept connections in order to establish new ones. It might establish many more.
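    A small illustration (the port, backlog value, and handle method are arbitrary): accept() merely dequeues the next already-established connection from the listen backlog, which the OS keeps filling while the application is busy elsewhere.

      import java.net.ServerSocket;
      import java.net.Socket;

      try (ServerSocket server = new ServerSocket(8080, 50)) {   // backlog of 50
          while (true) {
              Socket client = server.accept();   // blocks only if the backlog is empty
              handle(client);                    // hypothetical handler; meanwhile the
          }                                      // OS keeps establishing new connections
      }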

  • 26: hatch

    Hatch is a modern, extensible Python project manager.

  • 24: Building a (T1D) Smartwatch from Scratch

    Learn how a hardware engineer works.

  • 23: Booleans Are a Trap

    Enum may be a better option.

  • 22: On inheritance and subtyping

    Explicit Inheritance vs Implicit Inheritance

  • 21: Server-Sent Events (SSE) Are Underrated

    LLM and content-type: text/event-stream

  • 19: toArray with a pre-sized array

    • Bottom line: toArray(new T[0]) seems faster, safer, and contractually cleaner, and therefore should be the default choice now.

  • 18: AOP in JDK and CGLIB

    JDK-based AOP leverages reflection, which brings a performance cost, while CGLIB uses ASM to generate a subclass of the original class at runtime that intercepts method calls.
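    A sketch of the JDK flavour (interface-based, every call funnelled through a reflective invoke(), which is where the per-call cost comes from); the CGLIB flavour would instead generate a subclass at runtime.

      import java.lang.reflect.InvocationHandler;
      import java.lang.reflect.Proxy;

      interface Service { String greet(String name); }

      InvocationHandler handler = (proxy, method, args) -> {
          System.out.println("before " + method.getName());   // the "advice"
          return "hi " + args[0];
      };

      Service service = (Service) Proxy.newProxyInstance(
              Service.class.getClassLoader(),
              new Class<?>[] { Service.class },
              handler);

      service.greet("world");   // routed through handler.invoke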

  • 17: A minimal CMake project template

    Learn how to use CMake properly, and note that CMake is a generator for build systems; it is not itself a build system.

  • 16: A Guide to CompletableFuture

    The key difference between CompletableFuture and Future is chaining.
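    A small example of that chaining: each stage runs when the previous completes, with no blocking get() in between.

      import java.util.concurrent.CompletableFuture;

      CompletableFuture.supplyAsync(() -> "42")
              .thenApply(Integer::parseInt)      // transform the result
              .thenApply(n -> n * 2)
              .thenAccept(System.out::println)   // consume it
              .join();                           // wait only at the very end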

  • 15: Writing Compilers

February

January
