Damon's Blog
On this page
  • Aeron
  • aeron
  • Agrona
  • SBE
  • Chronicle
  • Articles
  • Chronicle Queue
  • Chronicle Threads
  • Java-Thread-Affinity
  • Chronicle-Wire
  • LMAX-Exchange
  • disruptor
  • JDK
  • loom
  • SynchronousQueue.java
  • Exchanger
  • LinkedTransferQueue
  • Performance Testing & Analysis
  • Java Microbenchmark Harness (JMH)
  • arthas
  • Runtime Information Collector
  • Memory Analyzer
  • Books
  • Java Concurrency in Practice
  • TCP/IP Sockets in Java, Kenneth Calvert et al.
  • Programming with POSIX threads, David R. Butenhof
  • Learning Concurrent Programming in Scala, Aleksandar Prokopec
  • The Art of Multiprocessor Programming, Maurice Herlihy et al
  • The Little Book of Semaphores, Allen Downey
  • Effective Java: Bloch, Joshua
  • Others
  • JCTools
  • log4j2
  • GNU Trove
  • HikariCP
  • netty
  • seqlock
  • NUMA vs SMP
  • Kernel Bypass
  • Object Pool
  • fastutil


Java HFT Toolbox


Last updated 6 months ago


Aeron

Chronicle

Articles

Memory-mapped files require direct access to the underlying file system to map a file's contents into memory. Shared drives, such as network drives, do not provide the necessary low-level access and control over the file system required for memory mapping. This limitation is due to the following reasons:

  1. Performance: Memory-mapped files rely on fast, low-latency access to the file system, which is not guaranteed over a network.

  2. Consistency: Ensuring data consistency and coherency across a network is complex and not feasible for memory-mapped files.

  3. File System Control: Memory mapping requires specific file system operations that are not supported by the network protocols used for shared drives.

For these reasons, memory-mapped files are typically restricted to local file systems.

Memory-mapped files require specific file system operations that include:

  1. File Mapping: The ability to map a file's contents directly into the virtual address space of a process. This involves creating a mapping between the file and the memory.

  2. Direct I/O Access: Low-level access to the file system to read and write data directly to and from memory without intermediate buffering.

  3. Locking Mechanisms: The ability to lock portions of the file to ensure data consistency and prevent concurrent modifications.

  4. Synchronization: Ensuring that changes made to the memory-mapped region are synchronized with the underlying file on disk.

These operations are typically supported by local file systems but are not feasible over network file systems due to performance, consistency, and control limitations.
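The four operations listed above can be sketched with the JDK's own `FileChannel` API. This is a minimal illustration on a local temporary file, not how Chronicle does it internally; the class name `MappedFileSketch`, its helper methods, and the 4 KiB mapping size are assumptions made for the example. Note that the JDK documentation warns that combining file locks with mappings may fail on some platforms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileSketch {

    /** Maps a local file, stores a long at offset 0, flushes it, and reads it back through a second mapping. */
    static long writeAndReadBack(Path file, long value) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // 1. File mapping: map the first 4 KiB of the file into this process's address space.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            // 3. Locking: take an exclusive lock on the bytes about to be modified.
            try (FileLock lock = ch.lock(0, Long.BYTES, false)) {
                // 2. Direct access: a plain memory write, with no read()/write() call per access.
                buf.putLong(0, value);
                // 4. Synchronization: ask the OS to write the dirty pages back to the file.
                buf.force();
            }
            // A second, read-only mapping of the same region observes the stored value.
            MappedByteBuffer view = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
            return view.getLong(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: runs the round trip on a temporary local file and cleans up afterwards. */
    static long roundTripOnTempFile(long value) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".dat");
            try {
                return writeAndReadBack(file, value);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripOnTempFile(42L));
    }
}
```

On a local file system each step maps onto a cheap kernel operation; over a network share the same calls may be unsupported, slow, or give no coherence guarantee between machines.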

A Low Garbage Java Serialisation Library that supports multiple formats.

LMAX-Exchange

  1. lock-free using volatile / AtomicLong

  2. padding to prevent false sharing

  3. batching for disk/network writing (e.g., the size of a block is 4k)

  4. supports a zero-garbage path, using either byte arrays or immutable objects
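The four points above can be illustrated with a small single-producer, single-consumer ring. This is a sketch written for these notes, not the Disruptor's API: `SpscRingSketch`, `PaddedSequence`, `offer`, and `drain` are hypothetical names, and the padding relies on the JVM laying out superclass fields before subclass fields, which is typical of HotSpot but not guaranteed.

```java
import java.util.function.LongConsumer;

public class SpscRingSketch {

    // Padding on both sides of the hot field, so two sequences are unlikely to share a cache line.
    static class LhsPadding { protected long p1, p2, p3, p4, p5, p6, p7; }
    static class Value extends LhsPadding { protected volatile long value; }
    static final class PaddedSequence extends Value {
        protected long p9, p10, p11, p12, p13, p14, p15;
        PaddedSequence(long initial) { value = initial; }
        long get() { return value; }
        void set(long v) { value = v; }
    }

    private final long[] slots;   // preallocated and reused, so publishing creates no garbage
    private final int mask;
    private final PaddedSequence published = new PaddedSequence(-1); // written only by the producer
    private final PaddedSequence consumed = new PaddedSequence(-1);  // written only by the consumer

    SpscRingSketch(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    /** Producer thread only. Returns false when the ring is full. No locks are taken. */
    boolean offer(long v) {
        long next = published.get() + 1;
        if (next - consumed.get() > slots.length) {
            return false; // would overwrite an entry the consumer has not read yet
        }
        slots[(int) (next & mask)] = v;
        published.set(next); // the volatile write makes the slot visible to the consumer
        return true;
    }

    /** Consumer thread only. Handles everything published so far as one batch. */
    int drain(LongConsumer handler) {
        long available = published.get(); // a single volatile read covers the whole batch
        int n = 0;
        for (long next = consumed.get() + 1; next <= available; next++, n++) {
            handler.accept(slots[(int) (next & mask)]);
        }
        consumed.set(available);
        return n;
    }

    public static void main(String[] args) {
        SpscRingSketch ring = new SpscRingSketch(8);
        for (long i = 1; i <= 3; i++) {
            ring.offer(i);
        }
        long[] sum = {0};
        int n = ring.drain(v -> sum[0] += v);
        System.out.println(n + " entries, sum " + sum[0]);
    }
}
```

Because each sequence has exactly one writer, a `volatile` field is enough here; with several producers the claim step would need an atomic compare-and-set instead.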

JDK

Performance Testing & Analysis

Runtime Information Collector

Memory Analyzer

Books

Java Concurrency in Practice

Most Java (and Scala) programmers know about Java Concurrency in Practice. It is indeed an essential book, and anyone serious about concurrency should read it cover-to-cover. After that, it's muddier. There is no single book that complements that one. You have to read many, though not necessarily cover-to-cover.

Here are 6 additional concurrency books that I personally have on my bookshelf and that will considerably deepen your knowledge.

This is not a comprehensive list, but if you read all these, do the exercises, and then try to implement these concepts on hobby projects - for example, building an HTTP server, an HTTP client, or an actor framework - it will put you above most developers.

TCP/IP Sockets in Java, Kenneth Calvert et al.

At less than 180 pages, this is the shortest of the list. Don't let the size fool you: it is comprehensive and packed with clear, succinct explanations of TCP and UDP sockets in Java. It covers both "plain" and non-blocking sockets, but not asynchronous sockets. It contains practical explanations, but where it shines is in its insights into low-level network details. I particularly liked the last chapter, with its brief yet clear explanation of how TCP works under the hood.

Programming with POSIX threads, David R. Butenhof

Not Java-specific. An old book that is a reference in the field. It's an easy and insightful read, but there is no point in reading it before "Java Concurrency in Practice". It details what a thread is and how to use one. It also covers concurrency primitives like mutexes and condition variables. This is all at the OS level (for UNIX systems), but it is very relevant for the Java programmer, since JVM implementations on Linux and macOS use POSIX threads, so reading it gives you great insights.

Learning Concurrent Programming in Scala, Aleksandar Prokopec

Scala shares the same memory model as Java. It relies on the same primitives provided by the JVM. Even if you are a Java programmer, it is worth reading some chapters of this book, as it explains some topics with a bit more detail than Java Concurrency in Practice. If you are a Java developer, read chapters 1 to 3. If Scala is relevant to you, read until chapter 4. The remaining chapters are less important, and some are already out of date.

The Art of Multiprocessor Programming, Maurice Herlihy et al

The most advanced book in the list. Language agnostic, but the practical examples are in Java, whilst the lower-level concepts are in C++. The first chapters are heavy on theory, and will likely demoralise you, if you don't already have a strong grasp of concurrency. The second part is more practical and it details how to actually construct some data-structures to be concurrent.

The Little Book of Semaphores, Allen Downey

A fantastic book. Language agnostic and very compact. It consists of a collection of exercises around classical concurrency problems like the Dining Philosophers problem. As the name suggests, the objective is to solve every problem using one or several semaphores. Code snippets are in Python, but resemble pseudo-code, and should be no problem for the Java/Scala developer.

Effective Java: Bloch, Joshua

Although this book does not target HFT, the knowledge it contains is important for writing robust, high-performance applications.

Others

    • Utility for preventing primitive parameter values from being auto-boxed. Auto-boxing creates temporary objects which contribute to pressure on the garbage collector. With this utility users can convert primitive values directly into text without allocating temporary objects.

    • private final ThreadLocal<int[]> current = new ThreadLocal<>();: By using ThreadLocal<int[]>, you can store multiple integer values in a single thread-local variable without the overhead of boxing.

    • “Item 61: Prefer primitive types to boxed primitives”, Effective Java, Third Edition

  • some optimizations around rendering of timestamps if a built-in format is used

  • tries its best to provide overloads that avoid varargs (“Item 53: Use varargs judiciously”, Effective Java, Third Edition)
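The boxing-avoidance ideas in the notes above can be sketched with plain JDK classes. This is not log4j2's `Unbox` implementation; `NoBoxingSketch`, `format`, and `nextCount` are hypothetical names, and the per-thread `StringBuilder` is only an assumed stand-in for the kind of buffer such a utility might reuse.

```java
public class NoBoxingSketch {

    // One reusable buffer per thread, so formatting does not allocate a new builder per call.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    // ThreadLocal<int[]>: slot 0 is a mutable per-thread int, with no Integer objects involved.
    private static final ThreadLocal<int[]> COUNTER =
            ThreadLocal.withInitial(() -> new int[1]);

    /** Renders prefix + value into the thread's buffer. The returned view is only valid until the next call. */
    static CharSequence format(String prefix, long value) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0);                 // reuse the buffer instead of creating a new one
        sb.append(prefix).append(value); // append(long) writes the digits without creating a Long
        return sb;
    }

    /** Increments and returns this thread's counter without boxing. */
    static int nextCount() {
        int[] c = COUNTER.get();
        return ++c[0];
    }

    public static void main(String[] args) {
        System.out.println(format("qty=", 42L));
        System.out.println(nextCount());
    }
}
```

By contrast, a `ThreadLocal<Integer>` counter would box on every update, and passing a `long` to a method that takes `Object` or `Object...` boxes it and, for varargs, also allocates an array.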

The GNU Trove library has two objectives:

  • Provide "free" (as in "free speech" and "free beer"), fast, lightweight implementations of the java.util Collections API. These implementations are designed to be pluggable replacements for their JDK equivalents.

  • Whenever possible, provide the same collections support for primitive types. This gap in the JDK is often addressed by using the "wrapper" classes (java.lang.Integer, java.lang.Float, etc.) with Object-based collections. For most applications, however, collections which store primitives directly will require less space and yield significant performance gains.
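To make the second objective concrete, here is a minimal sketch of a list that stores `int` values directly in an `int[]`. It is not Trove's API; `IntListSketch` and its methods are hypothetical and exist only to show why no wrapper objects are needed.

```java
import java.util.Arrays;

public class IntListSketch {

    private int[] data = new int[8]; // 4 bytes per element, no per-element object header
    private int size;

    /** Appends a value, doubling the backing array when it is full. */
    void add(int v) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = v;
    }

    /** Returns the element at index i as a primitive, so no Integer is created. */
    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return data[i];
    }

    int size() {
        return size;
    }

    /** Sums the elements with a plain loop over the primitive array. */
    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        IntListSketch list = new IntListSketch();
        for (int i = 1; i <= 10; i++) {
            list.add(i);
        }
        System.out.println(list.size() + " elements, sum " + list.sum());
    }
}
```

An `ArrayList<Integer>` holding the same values stores a reference per element plus a separate `Integer` object for each value outside the small-value cache, which is where the space and speed difference comes from.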

  • https://learn.lianglianglee.com/专栏/Netty%20核心原理剖析与%20RPC%20实践-完/00%20学好%20Netty,是你修炼%20Java%20内功的必经之路.md

seqlock

NUMA vs SMP

  • JEP 345: NUMA-Aware Memory Allocation for G1 GC

  • Azul C4 is designed to take advantage of NUMA architectures to enhance performance through effective memory management strategies that minimize access latency.

Kernel Bypass

Object Pool

Caveats:

  • Conversely, avoiding object creation by maintaining your own object pool is a bad idea unless the objects in the pool are extremely heavyweight. The classic example of an object that does justify an object pool is a database connection. The cost of establishing the connection is sufficiently high that it makes sense to reuse these objects. Generally speaking, however, maintaining your own object pools clutters your code, increases memory footprint, and harms performance. Modern JVM implementations have highly optimized garbage collectors that easily outperform such object pools on lightweight objects.

    (Effective Java, Item 6: Avoid creating unnecessary objects)

  • In early JVM versions, object allocation and garbage collection were slow, but their performance has improved substantially since then. In fact, allocation in Java is now faster than malloc is in C: the common code path for new Object in HotSpot 1.4.x and 5.0 is approximately ten machine instructions.

    In concurrent applications, pooling fares even worse. When threads allocate new objects, very little inter-thread coordination is required, as allocators typically use thread-local allocation blocks to eliminate most synchronization on heap data structures. But if those threads instead request an object from a pool, some synchronization is necessary to coordinate access to the pool data structure, creating the possibility that a thread will block. Because blocking a thread due to lock contention is hundreds of times more expensive than an allocation, even a small amount of pool-induced contention would be a scalability bottleneck. (Even an uncontended synchronization is usually more expensive than allocating an object.)

    In addition to being a loss in terms of CPU cycles, object pooling has a number of other problems, among them the challenge of setting pool sizes correctly (too small, and pooling has no effect; too large, and it puts pressure on the garbage collector, retaining memory that could be used more effectively for something else); the risk that an object will not be properly reset to its newly allocated state, introducing subtle bugs; the risk that a thread will return an object to the pool but continue using it; and that it makes more work for generational garbage collectors by encouraging a pattern of old-to-young references.

    (Java Concurrency in Practice, 11.4.7 Just say no to object pooling)
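To see where the costs described in these caveats come from, here is a deliberately small pool sketch. It is not Apache Commons Pool's API; `PoolSketch`, `acquire`, and `release` are hypothetical names. The sketch is confined to a single thread so it needs no locking; a pool shared between threads would need the synchronization that the quoted passage warns about.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class PoolSketch<T> {

    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> reset;
    private final int maxSize;

    PoolSketch(Supplier<T> factory, Consumer<T> reset, int maxSize) {
        this.factory = factory;
        this.reset = reset;
        this.maxSize = maxSize;
    }

    /** Returns a pooled instance if one is free, otherwise creates a new one. Not thread-safe. */
    T acquire() {
        T t = free.pollFirst();
        return t != null ? t : factory.get();
    }

    /** Resets the instance and keeps it if there is room. The caller must stop using it afterwards. */
    void release(T t) {
        reset.accept(t); // forgetting this step is the "not properly reset" bug from the quote
        if (free.size() < maxSize) {
            free.addFirst(t); // beyond maxSize the instance is simply dropped for the GC
        }
    }

    public static void main(String[] args) {
        PoolSketch<StringBuilder> pool =
                new PoolSketch<>(StringBuilder::new, sb -> sb.setLength(0), 2);
        StringBuilder first = pool.acquire();
        first.append("order-1");
        pool.release(first);
        StringBuilder second = pool.acquire();
        System.out.println(second == first); // the same instance is handed out again
    }
}
```

Even in this tiny form the caveats are visible: `maxSize` has to be chosen, `reset` has to be complete, and nothing stops a caller from using `first` after releasing it.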

fastutil

aeron
Java Aeron Framework: A Beginner’s Guide to Unicast Networking with UDP
Exploring the Java Aeron Framework: A Comprehensive Introduction to IPC
Agrona
IntArrayList.java
CachedNanoClock
SBE
Write Combining
Loop unswitching
Loop unrolling
Chronicle Queue
Chronicle Threads
simple event loop in python for understanding
Java-Thread-Affinity
AffinityLock.java
Chronicle-Wire
disruptor
loom
Project Loom
How Programmers Should Understand Coroutines in High Concurrency
SynchronousQueue.java
SynchronousQueue
Exchanger
LinkedTransferQueue
Java Microbenchmark Harness (JMH)
What false sharing is and how JVM prevents it
arthas
Java Flight Recorder
JMC
JVisualVM
MAT
JCTools
MpmcArrayQueue.java
False Sharing
Bounded MPMC queue
log4j2
Unbox.java
Java Performance Notes: Autoboxing / Unboxing
GNU Trove
HikariCP
ConcurrentBag.java
Understanding HikariCP
A Comprehensive Guide to HikariCP Usage and Source Code
netty
NIO Buffer
HashedWheelTimer
MPSC Queue
FastThreadLocal
Netty Core Principles Analysis and RPC Practice
EventLoop
Is it possible to efficiently implement a seqlock in Java?
c++ seqlock
Trading at light speed: designing low latency systems in C++ - David Gross - Meeting C++ 2022
Kernel bypass
Apache Commons Pool
fastutil: Fast & compact type-specific collections for Java™