Java HFT Toolbox

Aeron

Chronicle

Articles

Write Combining
- Loop unswitching
- Loop unrolling

Memory-mapped files need direct access to the underlying file system so that a file's contents can be mapped into a process's memory. Shared drives, such as network drives, generally do not provide the low-level access and guarantees that this requires, for the following reasons:

Performance: Memory-mapped files rely on fast, low-latency access to the file system, which a network cannot guarantee.
Consistency: Keeping mapped data consistent and coherent across machines on a network is complex, and network file systems generally do not guarantee it for memory-mapped regions.
File System Control: Memory mapping depends on file system operations that the network protocols behind shared drives may not fully support.

For these reasons, memory-mapped files are typically restricted to local file systems.

The file system operations that memory-mapped files depend on include:

File Mapping: The ability to map a file's contents directly into the virtual address space of a process. This involves creating a mapping between the file and the memory.
Direct I/O Access: Low-level access to the file system to read and write data directly to and from memory without intermediate buffering.
Locking Mechanisms: The ability to lock portions of the file to ensure data consistency and prevent concurrent modifications.
Synchronization: Ensuring that changes made to the memory-mapped region are written back to the underlying file on disk.

These operations are typically supported by local file systems but are not reliably available over network file systems because of performance, consistency, and control limitations.
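
As a rough illustration of the mapping and synchronization steps described above, the sketch below maps a local file with the JDK's FileChannel.map and flushes the change with MappedByteBuffer.force. The class name MappedFileSketch, the temp-file prefix, and the 4096-byte mapping size are assumptions made for this example; it is not how Chronicle or Aeron manage their files.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileSketch {

    // Maps the first `size` bytes of `path` read-write, stores `value` at offset 0,
    // and asks the OS to write the dirty page back to the file.
    static void writeLong(Path path, int size, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping past the current end of the file grows the file to `size` bytes.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.putLong(0, value);
            buf.force(); // synchronization step: flush the mapped region to storage
        }
    }

    // Maps the same region read-only and reads the value back.
    static long readLong(Path path, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            return buf.getLong(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".dat");
        file.toFile().deleteOnExit();
        writeLong(file, 4096, 42L);
        System.out.println(readLong(file, 4096));
    }
}
```

The mapping stays valid after the channel is closed; the JDK releases it only when the buffer is garbage collected, which is one reason production libraries tend to manage mapped regions explicitly.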

Chronicle Queue

Chronicle Threads

Simple event loop in Python for understanding

Java-Thread-Affinity

AffinityLock.java

Chronicle-Wire

A Low Garbage Java Serialisation Library that supports multiple formats.

LMAX-Exchange

disruptor

lock-free coordination using volatile / AtomicLong sequences
padding to prevent false sharing
batching for disk/network writes (e.g., to match a 4 KB block size)
allows a zero-garbage hot path, e.g. by carrying data in byte arrays or by using immutable objects
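
The ideas listed above can be sketched in a few lines. The following is a simplified single-producer, single-consumer ring of primitive long values, not the Disruptor's actual implementation: it uses volatile sequence counters instead of locks, pads each counter with unused fields, drains in batches, and allocates nothing per message. All class names (SpscLongRing, LeftPad, SeqValue, PaddedSeq) are hypothetical, and whether the padding really isolates the counters depends on the JVM's field layout.

```java
import java.util.function.LongConsumer;

// Assumption: exactly one producer thread calls offer() and one consumer thread calls drain().
final class SpscLongRing {
    private final long[] slots;
    private final int mask;
    private final PaddedSeq head = new PaddedSeq(); // next index to read; written by the consumer only
    private final PaddedSeq tail = new PaddedSeq(); // next index to write; written by the producer only

    SpscLongRing(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo < 1 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Returns false when the ring is full instead of blocking.
    boolean offer(long v) {
        long t = tail.value;
        if (t - head.value == slots.length) return false;
        slots[(int) (t & mask)] = v;
        tail.value = t + 1; // volatile write publishes the slot to the consumer
        return true;
    }

    // Batching: hands up to `limit` available entries to `sink` in one pass.
    int drain(LongConsumer sink, int limit) {
        long h = head.value;
        int n = (int) Math.min(tail.value - h, limit);
        for (int i = 0; i < n; i++) {
            sink.accept(slots[(int) ((h + i) & mask)]);
        }
        head.value = h + n; // volatile write hands the slots back to the producer
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        final int count = 1_000_000;
        SpscLongRing ring = new SpscLongRing(1024);
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= count; i++) {
                while (!ring.offer(i)) Thread.onSpinWait();
            }
        });
        producer.start();
        long[] sum = {0};
        long received = 0;
        while (received < count) {
            received += ring.drain(v -> sum[0] += v, 256);
        }
        producer.join();
        System.out.println("received=" + received + " sum=" + sum[0]);
    }
}

// Padding via a class hierarchy: unused longs on both sides of the hot field.
class LeftPad { long l1, l2, l3, l4, l5, l6, l7; }
class SeqValue extends LeftPad { volatile long value; }
final class PaddedSeq extends SeqValue { long r1, r2, r3, r4, r5, r6, r7; }
```

Because each counter has a single writer, a plain volatile store is enough here; a multi-producer variant would need compare-and-set on an AtomicLong or similar.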

JDK

loom

Project Loom

How Programmers Should Understand Coroutines in High-Concurrency Systems (程序员应如何理解高并发中的协程)

SynchronousQueue.java

SynchronousQueue

Exchanger

LinkedTransferQueue

Performance Testing & Analysis

Java Microbenchmark Harness (JMH)

What false sharing is and how JVM prevents it

arthas

Runtime Information Collector

Java Flight Recorder and JMC / JVisualVM

Memory Analyzer

MAT

Books

Java Concurrency in Practice

Most Java (and Scala) programmers know about Java Concurrency in Practice. It is indeed an essential book, and anyone serious about concurrency should read it cover-to-cover. After that, it's muddier. There is no single book that complements that one. You have to read many, though not necessarily cover-to-cover.

Here are 6 additional concurrency books that I personally have on my bookshelf and that will considerably deepen your knowledge.

This is not a comprehensive list, but if you read all these, do the exercises, and then try to implement these concepts on hobby projects (for example, building an HTTP server, an HTTP client, or an actor framework), it will put you above most developers.

TCP/IP Sockets in Java, Kenneth Calvert et al.

At less than 180 pages, this is the shortest book on the list. Don't let the size fool you: it is comprehensive and packed with clear, succinct explanations of TCP and UDP sockets in Java. It covers both "plain" and non-blocking sockets, but not asynchronous sockets. It contains practical explanations, but where it shines is in its insights into low-level network details. I particularly liked the last chapter, with its brief yet clear explanation of how TCP works under the hood.

Programming with POSIX threads, David R. Butenhof

Not Java-specific. An old book that is a reference in the field. It's an easy and insightful read, but there is no point in reading it before "Java Concurrency in Practice". It details what a thread is and how to use one. It also covers concurrency primitives like mutexes and condition variables. This is all at the OS level (for UNIX systems), but it is very relevant for the Java programmer: JVM implementations on Linux and macOS use POSIX threads, so reading it gives you great insight.

Learning Concurrent Programming in Scala, Aleksandar Prokopec

Scala shares the same memory model as Java. It relies on the same primitives provided by the JVM. Even if you are a Java programmer, it is worth reading some chapters of this book, as it explains some topics with a bit more detail than Java Concurrency in Practice. If you are a Java developer, read chapters 1 to 3. If Scala is relevant to you, read until chapter 4. The remaining chapters are less important, and some are already out of date.

The Art of Multiprocessor Programming, Maurice Herlihy et al.

The most advanced book in the list. Language agnostic, but the practical examples are in Java, whilst the lower-level concepts are in C++. The first chapters are heavy on theory, and will likely demoralise you, if you don't already have a strong grasp of concurrency. The second part is more practical and it details how to actually construct some data-structures to be concurrent.

The Little Book of Semaphores, Allen Downey

A fantastic book. Language agnostic and very compact. It consists of a collection of exercises around classical concurrency problems like the Dining Philosophers problem. As the name suggests, the objective is to solve every problem using one or several semaphores. Code snippets are in Python, but resemble pseudo-code, and should be no problem for the Java/Scala developer.

Effective Java, Joshua Bloch

Although this book does not target HFT, the knowledge it contains is important for writing robust, high-performance applications.

Others

JCTools

MpmcArrayQueue.java
- False Sharing
- Bounded MPMC queue

log4j2

Unbox.java
- Utility for preventing primitive parameter values from being auto-boxed. Auto-boxing creates temporary objects which contribute to pressure on the garbage collector. With this utility users can convert primitive values directly into text without allocating temporary objects.
- Java Performance Notes: Autoboxing / Unboxing
- private final ThreadLocal<int[]> current = new ThreadLocal<>();: By using ThreadLocal<int[]>, you can store multiple integer values in a single thread-local variable without the overhead of boxing.
- “Item 61: Prefer primitive types to boxed primitives”, Effective Java, Third Edition
some optimizations around rendering of timestamps if a built-in format is used
tries its best to provide overloads that avoid varargs (“Item 53: Use varargs judiciously”, Effective Java, Third Edition)
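
A minimal sketch of the two allocation-avoidance ideas noted above, under stated assumptions: a per-thread scratch StringBuilder that renders a primitive without boxing, and a ThreadLocal<int[]> used as a mutable per-thread int. The class NoBoxSketch and its methods are hypothetical and only mimic the idea; they are not log4j2's Unbox API.

```java
final class NoBoxSketch {

    // One reusable buffer per thread, so rendering a value allocates nothing after warm-up.
    private static final ThreadLocal<StringBuilder> SCRATCH =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    // A one-element int[] is a mutable primitive holder; ThreadLocal<Integer> would box on every update.
    private static final ThreadLocal<int[]> COUNTER =
            ThreadLocal.withInitial(() -> new int[1]);

    // For contrast: the varargs Object... parameter forces `value` to be boxed into an Integer.
    static String boxed(String label, int value) {
        return String.format("%s=%d", label, value);
    }

    // StringBuilder.append(int) takes the primitive directly, so no Integer is created.
    // The returned buffer is reused: consume it before the next call on the same thread.
    static StringBuilder unboxed(String label, int value) {
        StringBuilder sb = SCRATCH.get();
        sb.setLength(0);
        return sb.append(label).append('=').append(value);
    }

    // Returns 0, 1, 2, ... independently for each thread, without boxing.
    static int nextIndex() {
        int[] c = COUNTER.get();
        return c[0]++;
    }

    public static void main(String[] args) {
        System.out.println(boxed("qty", 42));
        System.out.println(unboxed("qty", 42));
        System.out.println(nextIndex() + " " + nextIndex());
    }
}
```

The trade-off is the usual one for thread-local scratch space: the returned buffer is only valid until the same thread asks for it again.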

GNU Trove

The GNU Trove library has two objectives:

Provide "free" (as in "free speech" and "free beer"), fast, lightweight implementations of the java.util Collections API. These implementations are designed to be pluggable replacements for their JDK equivalents.
Whenever possible, provide the same collections support for primitive types. This gap in the JDK is often addressed by using the "wrapper" classes (java.lang.Integer, java.lang.Float, etc.) with Object-based collections. For most applications, however, collections which store primitives directly will require less space and yield significant performance gains.
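
To show what "collections which store primitives directly" means in practice, here is a minimal open-addressing int-to-int map backed by plain arrays. It is a teaching sketch under simplifying assumptions (fixed power-of-two capacity, no removal, no resizing); the class IntIntMapSketch is hypothetical and is not Trove's or fastutil's implementation.

```java
final class IntIntMapSketch {
    private final int[] keys;
    private final int[] values;
    private final boolean[] used; // marks occupied slots, so any int is a legal key
    private final int mask;
    private int size;

    IntIntMapSketch(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo < 2 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new int[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        used = new boolean[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Inserts or overwrites. One slot is always kept free so that probing terminates.
    void put(int key, int value) {
        int i = slot(key);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        if (!used[i]) {
            if (size == mask) throw new IllegalStateException("map is full");
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    // Returns `missing` when the key is absent; no Integer is allocated on any path.
    int get(int key, int missing) {
        int i = slot(key);
        while (used[i]) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return missing;
    }

    int size() {
        return size;
    }

    // Multiplicative hash to spread clustered keys across the table.
    private int slot(int key) {
        int h = key * 0x9E3779B9;
        return (h ^ (h >>> 16)) & mask;
    }

    public static void main(String[] args) {
        IntIntMapSketch m = new IntIntMapSketch(16);
        m.put(1001, 5);
        m.put(1002, 7);
        System.out.println(m.get(1001, -1) + " " + m.get(9999, -1));
    }
}
```

A HashMap<Integer, Integer> holding the same data needs an entry object per mapping plus boxed keys and values, which is the space and GC cost the primitive collections avoid.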

HikariCP

netty

NIO Buffer
HashedWheelTimer
MPSC Queue
FastThreadLocal
Netty Core Principles and RPC Practice (Netty核心原理剖析与RPC实践)
- EventLoop
https://learn.lianglianglee.com/专栏/Netty%20核心原理剖析与%20RPC%20实践-完/00%20学好%20Netty，是你修炼%20Java%20内功的必经之路.md

seqlock
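
A hand-rolled seqlock is easy to get wrong under the Java memory model, so the sketch below uses the JDK's StampedLock instead, whose optimistic reads follow the same pattern: read a version stamp, copy the fields, validate the stamp, and retry under a real lock if a writer intervened. The QuoteSketch class and its bid/ask fields are assumptions chosen for illustration.

```java
import java.util.concurrent.locks.StampedLock;

final class QuoteSketch {
    private final StampedLock lock = new StampedLock();
    private long bid;
    private long ask;

    // Writers take the exclusive lock, which invalidates outstanding optimistic stamps.
    void update(long newBid, long newAsk) {
        long stamp = lock.writeLock();
        try {
            bid = newBid;
            ask = newAsk;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Readers first try without blocking; they fall back to a read lock only on conflict.
    long spread() {
        long stamp = lock.tryOptimisticRead();
        long b = bid;
        long a = ask;
        if (!lock.validate(stamp)) {
            stamp = lock.readLock();
            try {
                b = bid;
                a = ask;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return a - b;
    }

    public static void main(String[] args) {
        QuoteSketch q = new QuoteSketch();
        q.update(100, 103);
        System.out.println(q.spread());
    }
}
```

The key rule is the same as for any seqlock: values read optimistically must be copied to locals and only used after validation succeeds.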

NUMA vs SMP

JEP 345: NUMA-Aware Memory Allocation for G1 GC
Azul C4 is designed to take advantage of NUMA architectures to enhance performance through effective memory management strategies that minimize access latency.

Kernel Bypass

Kernel bypass

Object Pool

Apache Commons Pool

Caveats:

Conversely, avoiding object creation by maintaining your own object pool is a bad idea unless the objects in the pool are extremely heavyweight. The classic example of an object that does justify an object pool is a database connection. The cost of establishing the connection is sufficiently high that it makes sense to reuse these objects. Generally speaking, however, maintaining your own object pools clutters your code, increases memory footprint, and harms performance. Modern JVM implementations have highly optimized garbage collectors that easily outperform such object pools on lightweight objects.
(Effective Java, Item 6: Avoid creating unnecessary objects)
In early JVM versions, object allocation and garbage collection were slow, but their performance has improved substantially since then. In fact, allocation in Java is now faster than malloc is in C: the common code path for new Object in HotSpot 1.4.x and 5.0 is approximately ten machine instructions.
In concurrent applications, pooling fares even worse. When threads allocate new objects, very little inter-thread coordination is required, as allocators typically use thread-local allocation blocks to eliminate most synchronization on heap data structures. But if those threads instead request an object from a pool, some synchronization is necessary to coordinate access to the pool data structure, creating the possibility that a thread will block. Because blocking a thread due to lock contention is hundreds of times more expensive than an allocation, even a small amount of pool-induced contention would be a scalability bottleneck. (Even an uncontended synchronization is usually more expensive than allocating an object.)
In addition to being a loss in terms of CPU cycles, object pooling has a number of other problems, among them the challenge of setting pool sizes correctly (too small, and pooling has no effect; too large, and it puts pressure on the garbage collector, retaining memory that could be used more effectively for something else); the risk that an object will not be properly reset to its newly allocated state, introducing subtle bugs; the risk that a thread will return an object to the pool but continue using it; and that it makes more work for generational garbage collectors by encouraging a pattern of old-to-young references.
(Java Concurrency in Practice, 11.4.7 Just say no to object pooling)
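
To make these caveats concrete, here is a minimal single-threaded pool sketch. The SimplePool class, its constructor arguments, and the maxIdle cap are hypothetical and unrelated to the Apache Commons Pool API. The comments mark where the quoted risks show up: the reset callback, the pool size, and use after release.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Assumption: used from one thread only. Sharing it would need the synchronization
// that the passages above warn about.
final class SimplePool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> reset;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, Consumer<T> reset, int maxIdle) {
        this.factory = factory;
        this.reset = reset;
        this.maxIdle = maxIdle;
    }

    // Reuses an idle instance when one exists, otherwise allocates a new one.
    T acquire() {
        T t = idle.pollFirst();
        return t != null ? t : factory.get();
    }

    // Risk 1: if `reset` misses a field, the next user sees stale state.
    // Risk 2: maxIdle too small makes the pool useless; too large retains memory.
    // Risk 3: nothing stops the caller from using `t` after releasing it.
    void release(T t) {
        reset.accept(t);
        if (idle.size() < maxIdle) {
            idle.addFirst(t);
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool =
                new SimplePool<>(() -> new StringBuilder(256), sb -> sb.setLength(0), 8);
        StringBuilder sb = pool.acquire();
        sb.append("order-1");
        pool.release(sb);
        System.out.println(pool.acquire() == sb);
    }
}
```

In low-latency code, pools like this are usually confined to a single thread and preallocated at startup, which sidesteps the contention problem but not the reset and lifetime risks.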

fastutil

fastutil: Fast & compact type-specific collections for Java™
