2024

December

November

October

  • 31: Replace your switch statement and multiple "if and else", using Object Literals

    Lesson: treat the branches of a switch statement as if they were data.
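The linked article uses JavaScript object literals; the same idea can be sketched in Java with a map from keys to behaviour. This is only an illustration: the class name `OperatorTable`, the operation names, and the choice of 0 as the fallback result are all assumptions made for the example.

```java
import java.util.Map;
import java.util.function.IntBinaryOperator;

final class OperatorTable {
    // The "cases" live in a map: each key is what a case label would match,
    // each value is the behaviour that branch would run.
    private static final Map<String, IntBinaryOperator> OPERATIONS = Map.of(
        "add", (a, b) -> a + b,
        "sub", (a, b) -> a - b,
        "mul", (a, b) -> a * b
    );

    // Hypothetical helper: looks up the branch and falls back to a default
    // operation, playing the role of the switch statement's default case.
    static int apply(String name, int a, int b) {
        IntBinaryOperator op = OPERATIONS.getOrDefault(name, (x, y) -> 0);
        return op.applyAsInt(a, b);
    }
}
```

Adding a new branch then means adding one map entry, with no change to the control flow in `apply`.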

  • 30: Java Convert Bytes to Unsigned Bytes

    • Java represents signed numbers in 2's complement. In 2's complement the leftmost bit carries the sign: 0 denotes a non-negative value and 1 a negative one. A byte has 8 bits, so only 7 bits are left for the magnitude and the range is -128 to 127. Values from 128 to 255 do not fit into a single signed byte, so we widen the byte to a 32-bit int, which has enough bits to hold them.

    • Note that Java does not provide an unsigned byte type. To read a byte as an unsigned value (widening 1 byte to 4 bytes), cast the byte to int and mask (&) the result with 0xff. The mask keeps only the last 8 bits, which undoes the sign extension performed by the widening cast.

    • Java 8 provides the built-in static method Byte.toUnsignedInt(), which converts a signed byte into an int holding its unsigned value.

    • Many external systems (e.g., databases, network protocols) use unsigned types. Java's lack of a native unsigned byte complicates integration with them: it requires extra conversion logic or larger data types (like int or long) to hold values that would fit in an unsigned byte. This can lead to performance overhead and potential bugs if developers do not handle these conversions carefully.

    • Frequent conversions between signed and unsigned representations can impact performance, especially in applications that require high throughput or low latency, such as video processing or real-time data analysis.

    • The lack of an unsigned byte type in Java complicates data handling and interoperability while increasing the risk of errors in code. Developers must implement additional logic to work around these limitations, which can lead to more complex and error-prone applications.
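The two conversions described in these notes can be put side by side in a short sketch. The class and method names (`UnsignedBytes`, `maskToUnsigned`, `viaJdk`) are made up for the example; `Byte.toUnsignedInt` is the JDK method mentioned above.

```java
final class UnsignedBytes {
    // Widening a byte to int sign-extends it; the 0xff mask clears the
    // upper 24 bits so only the original 8 bits remain.
    static int maskToUnsigned(byte b) {
        return b & 0xff;
    }

    // Same result via the JDK helper available since Java 8.
    static int viaJdk(byte b) {
        return Byte.toUnsignedInt(b);
    }
}
```

For example, `(byte) 200` is stored as -56, and both methods map it back to 200.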

  • 26: Using files for shared memory IPC

    • If another process attempts to load the same file (while it is still resident in the cache) the kernel detects this and doesn't need to reload the file. If the page cache gets full, pages will get evicted - dirty ones being written back out to the disk.

    • By contrast, with IPC implemented using shared memory, there are no read and write syscalls, and no extra copy step. Each "channel" can simply use a separate area of the mapped buffer. A thread in one process writes data into the shared memory and it is almost immediately visible to the second process.

    • Can shared memory IPC be implemented without memory-mapped files?

    • A practical way would be to create a memory-mapped file for a file that lives in a memory-only file system; e.g. a "tmpfs" in Linux.

    • You could in theory implement a shared segment between two processes

    • Note that both Aeron IPC and Chronicle Queue (CQ) support tmpfs to further improve performance.

    • When setting up Aeron for IPC, the media driver can be configured to operate with a term buffer located on a tmpfs mount point. This setup minimizes disk I/O latency since all operations occur in memory. The configuration involves pointing the Aeron media driver's directory at a tmpfs mount, ensuring that all IPC messages are handled in memory.

    • For even lower latencies, Chronicle Queue can be backed by tmpfs, a temporary filesystem that resides in RAM. This configuration significantly reduces delays caused by disk operations, provided that the queue size is managed appropriately.
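A minimal sketch of the memory-mapped approach from these notes, using the standard `FileChannel.map` call. The class name `SharedMemoryRegion`, the 4096-byte size, and the file location are assumptions for illustration; this is not how Aeron or Chronicle Queue lay out their buffers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SharedMemoryRegion {
    static final int SIZE = 4096; // assumed size of the shared region

    // Maps the file read-write. Every process that maps the same file shares
    // the same page-cache pages, so exchanging data needs no read/write
    // syscall per message.
    static MappedByteBuffer map(Path file) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Placing the file under a tmpfs mount (on many Linux systems `/dev/shm` is one) keeps the pages from being written back to disk. A real channel would also need memory-ordering guarantees between writer and reader, which this sketch leaves out.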

  • 25: Aeron: Open-source high-performance messaging

    The video discusses Aeron, a messaging system focused on high performance and reliability, particularly in scenarios where traditional protocols like TCP and UDP may fall short. The speaker, Martin Thompson, emphasizes the need for consistent latency and the challenges of reliable message delivery over UDP.

    • Transport Media: multicast, IPC, InfiniBand, RDMA, PCI-e 3.0

    • OSI Layer 4 (Transport) Services

    • Connection Oriented Communication

    • Reliability

    • Flow Control: counters are the key to flow control and monitoring; pluggable in Aeron

    • Congestion Avoidance/Control: TCP is not suitable for HFT partly because of this; pluggable in Aeron

    • Multiplexing: HOL Blocking

    • Design Principles

    from

    1. clear segregation of control

    2. garbage free in steady state running

    3. lock-free, wait-free and copy-free data structures in the message path

    4. respect the Single Writer Principle

    5. major data structures are not shared

    6. don't burden the main paths with exceptional cases

    7. non-blocking in the message path

    8. ...

    into 3 basic things

    • system architecture

    • data structure

    • protocol of interactions

    • Data Structure

    • Maps: dealing with primitives

    • IPC Ring/Broadcast Buffer: between Conductors

    • ITC Queues: between Sender/Receiver and Conductors

    • Dynamic Arrays

    • Log Buffer: IPC for messaging, creates a replicated persistent log of messages

      • mmap

      • tail is being moved atomically

      • No big file: page fault; page cache churn; VM pressure; clean/dirty/active

      • receiver side: High Water Mark + Completed; pointer chasing is really bad (pointer chasing means following a chain of references through memory, where each load depends on the result of the previous one; this defeats hardware prefetching and tends to cause cache misses)

    • Monitoring and Debugging should be designed on day 1

    • Loss, throughput and buffer size are strongly related

    • Java

    • Bad:

      • No Unsigned Type

      • NIO - Locks, off-heap, PAUSE, Signals, etc

      • String Encoding - 3 buffer copy

      • External Resources

      • Selectors - GC

      • converting bytes into int

    • Good:

      • Tooling: IDEs, Gradle, HdrHistogram

      • Bytecode Instrumentation: good for debugging

      • Unsafe

      • The Optimizer

      • Garbage Collectors

    • Kernel Module and FPGAs possible
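The Log Buffer note above ("tail is being moved atomically") can be illustrated with a toy append-only buffer in which a writer claims space with a single atomic add. This is not Aeron's code: `TinyLogBuffer`, its length-prefixed record layout, and the -1 return for a full buffer are assumptions for the sketch, and the ordered writes a real reader would rely on are omitted.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

final class TinyLogBuffer {
    private final ByteBuffer buffer;
    private final AtomicLong tail = new AtomicLong();

    TinyLogBuffer(int capacity) {
        this.buffer = ByteBuffer.allocateDirect(capacity);
    }

    // Claims space by atomically advancing the tail, then copies the payload
    // into the claimed region. Returns the record offset, or -1 if it does not fit.
    long append(byte[] payload) {
        int length = Integer.BYTES + payload.length;
        long offset = tail.getAndAdd(length);
        if (offset + length > buffer.capacity()) {
            return -1; // a real log would rotate to a fresh buffer here
        }
        int base = (int) offset;
        buffer.putInt(base, payload.length); // length prefix
        for (int i = 0; i < payload.length; i++) {
            buffer.put(base + Integer.BYTES + i, payload[i]);
        }
        return offset;
    }

    // Reads back the record that starts at the given offset.
    byte[] read(long offset) {
        int base = (int) offset;
        byte[] out = new byte[buffer.getInt(base)];
        for (int i = 0; i < out.length; i++) {
            out[i] = buffer.get(base + Integer.BYTES + i);
        }
        return out;
    }
}
```

Because each writer receives a distinct offset from `getAndAdd`, writers never overlap and no lock is needed to reserve space.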

  • 24: Evolution of Financial Exchange Architectures

    The video features Martin Thompson discussing the evolution of financial exchanges, focusing on advancements in design, resilience, performance and deployment over the past decade.

    • Design

    • State Machine -> Replicated State Machine: ordered input + deterministic execution

    • Distributed Event Log: event sourcing

    • Rich Domain Model (DDD) and specific data structure designed from scratch

    • Time & Timers: atomic clock + gps synchronizer; how a timer cancels an order

    • Resilience

    • Fairness: multiple gateways -> 1

    • Gateway: classification of customers

    • Matching Engine: sharding by symbol/fungible...

    • Primary Secondary vs Consensus: Raft

    • Code Quality and Model Fidelity: Model fidelity refers to the degree to which a model accurately represents the real-world system or phenomenon it is intended to simulate or predict. High fidelity means that the model closely matches the actual behavior or characteristics of the system, capturing important details and dynamics. Low fidelity indicates a more simplified or abstract representation that may overlook critical factors.

    • Performance: Transaction throughput has increased significantly, with some exchanges reaching millions of transactions per second and achieving latencies below 100 microseconds.

    • Latency: average latency is misleading, we need percentiles

    • Throughput: burst scenario

    • JVM:

      • CMS full GC

      • G1

      • Azul C4: Continuously Concurrent Compacting Collector, high allocation rate without nasty gc pauses with Amdahl's law

      • ZGC: not generational at the time of the talk (a generational mode has been available since JDK 21)

      • Shenandoah: better at smaller heaps

    • Memory Access Patterns: Java is still catching up here; C lets you get close to the machine and control the memory layout, which is why the fastest matching engines are written in C

    • Data Structure: check all kinds of libs or even implement your own one; prevent cache misses;

    • Binary Codecs: SBE; the FIX protocol is encoded in ASCII

    • Preventing Costs: system calls; disk calls; page faults; context switching. A page fault is an exception raised by the memory management unit (MMU) when a program accesses a page of its virtual address space that is not currently mapped to physical memory (RAM). Handling it traps into the kernel, which can make access to an mmap'd file horrendously more expensive all of a sudden. Set up huge pages to mitigate it.

    • Hardware

      • Disks: from milliseconds to tens/hundreds of microseconds

      • Network: financial organizations are good at that

      • CPU: not too much improvement, throughput is abundant, but the latency is not getting better

      • IO: socket is not good; new API for IO and please use asynchronous API; DPDK

    • Language: polyglot

    • Deployment

    • CI/CD

    • Flexible Scaling: dev env in your local machine; using IPC if the machine has 100 cores
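To make the Latency note above concrete (averages mislead, percentiles are needed), here is a small sketch comparing the mean with nearest-rank percentiles. The class name `LatencyStats` and the sample values are invented for illustration; production systems typically use a dedicated tool such as HdrHistogram instead.

```java
import java.util.Arrays;

final class LatencyStats {
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With nine samples of 100 microseconds and one of 10,000 microseconds, the mean is 1,090 microseconds, a value no request actually experienced, while the median is 100 and the 99th percentile is 10,000.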

  • 23: An oral history of Bank Python

    Dagger, a directed, acyclic graph of financial instruments. Also refer to tradebook for this concept.

  • 21: HTTP/3 From A To Z: Core Concepts

    QUIC’s faster connection set-up with 0-RTT is really more of a micro-optimization than a revolutionary new feature. Compared to a state-of-the art TCP + TLS 1.3 set-up, it would save a maximum of one round trip. The amount of data that can actually be sent in the first round trip is additionally limited by a number of security considerations.

  • 20: The Gamma Of Levered ETFs

    Levered ETFs are trading tools that are not suitable for investing. They do a good job of matching the levered return of an underlying index intraday. The sum of all the negative gamma trading is expensive as the mechanical re-balancing gets front-run and “arbed” by traders. This creates significant drag on the levered ETF’s assets. In fact, if the borrowing costs to short levered ETFs were not punitive, a popular strategy would be to short both the long and short versions of the same ETF, allowing the neutral arbitrageur to harvest both the expense ratios and negative gamma costs from tracking the index!
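The mechanical re-balancing mentioned above can be worked through with a little arithmetic. The sketch below assumes a fund that resets its exposure to a constant multiple of NAV once per period and ignores fees and financing; the class and parameter names are made up for the example and the formula is derived from that assumption, not taken from the article.

```java
final class LeveredEtfRebalance {
    // Exposure the fund must add (positive) or shed (negative) so that its
    // exposure is again leverage * NAV after the index moved by indexReturn.
    static double rebalanceTrade(double nav, double leverage, double indexReturn) {
        double exposureBefore = leverage * nav * (1 + indexReturn);
        double navAfter = nav * (1 + leverage * indexReturn);
        double exposureTarget = leverage * navAfter;
        return exposureTarget - exposureBefore; // equals leverage * (leverage - 1) * nav * indexReturn
    }
}
```

With a NAV of 100 and a 10% rally, both a 2x fund and a -1x fund have to buy 20 of exposure; after a 10% drop the 2x fund has to sell 20. Buying after rises and selling after falls is the negative gamma the note refers to.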

  • 18: Server Setup Basics

    The same author's [Yet another full-node guide](https://becomesovran.com/blog/yet-another-full-node-guide.html) is quite good too. And here is another blog about the mentioned btop.

  • 15: What is an Equivalent Martingale Measure, and why should a bookie care?

  • 7: You call this piece of junk TCP? (你管这破玩意儿叫 TCP)

    window size = min(cwnd, rwnd)
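As a small sketch of how this formula is applied: the sender may have at most min(cwnd, rwnd) unacknowledged bytes outstanding, so what it can still send is that window minus the bytes already in flight. The `bytesInFlight` parameter and the class name are additions for the example and are not part of the note.

```java
final class TcpSendWindow {
    // Bytes the sender may still put on the wire: the effective window
    // min(cwnd, rwnd) minus what is already sent but not yet acknowledged.
    static long usableWindow(long cwnd, long rwnd, long bytesInFlight) {
        long window = Math.min(cwnd, rwnd);
        return Math.max(0, window - bytesInFlight);
    }
}
```

With cwnd = 10,000, rwnd = 4,000 and 1,000 bytes in flight, the receiver's window is the limit and 3,000 more bytes may be sent.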

  • 2: String Length vs Character Length in Different Languages

    In Java, the length method of String objects is not the length of that String in characters. Instead, it only gives the number of 16-bit code units used to encode a string. This is not (always) the number of Unicode characters (code points) in the string.
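The difference shows up as soon as a string contains a character outside the Basic Multilingual Plane. The sketch below uses the standard `String.length()` and `String.codePointCount()` methods; the wrapper class `StringLengths` exists only for the example.

```java
final class StringLengths {
    // Number of 16-bit UTF-16 code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

For `"a\uD83D\uDE00"` (the letter a followed by U+1F600, which needs a surrogate pair), `codeUnits` returns 3 while `codePoints` returns 2.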

September

August

July

June
