Damon's Blog
  • Damon's Blog
  • 💻Tech
    • Understand C++ Special Member Function Generation
    • Understand C++ Template Type Deduction
    • Using Windows Keymap in Mac and Intellij
    • Setup C++ Dev Env in MacOS using Docker
    • Floating-point Arithmetic
    • Java HFT Toolbox
    • Interesting Bitwise Operation
    • Python Metaclass
    • Memory Order
    • Grind 75+
      • Array
      • String
      • Matrix
      • Binary Search
      • Graph
      • Tree
      • Math
      • Hash Table
      • Recursion
      • Linked List
      • Stack
      • Heap
      • Trie
      • Dynamic Programming
      • Binary Operation
    • C++ for Java Developers
    • Typical Domain Driven Design Application Architecture
    • How to Deploy Applications in Linux Properly
    • Design Patterns with Java Examples
    • Tools for Reliability
    • MVCC in Database
    • Two Microservice Tech Stacks
    • The Difference between 127.0.0.1 and 0.0.0.0
  • ➗Math
    • Local Volatility in Terms of Implied Volatility
    • Mean and Variance of Squared Gaussian
    • A Simple Explanation of Information Gain
  • 💲Trading
    • Understanding Risk-Neutral Density Functions
    • The Deriving of Black-Scholes Equation
    • Quant Questions
  • 💎Gems
    • 2024
  • 📖Books
    • Performance Analysis and Tuning on Modern CPUs
Powered by GitBook
On this page
  • 2024
  • December
  • November
  • October
  • September
  • August
  • July
  • June

Was this helpful?

  1. Gems

2024

PreviousQuant QuestionsNextPerformance Analysis and Tuning on Modern CPUs

Last updated 2 months ago

Was this helpful?

2024

December

  • 31:

  • 20:

    • Open Addressing, aka., Closed Hashing

    • Closed Addressing, aka., Open Hashing

  • 19:

  • 18:

  • 17:

    • The first process in the process group becomes the process group leader and the first process in the session becomes the session leader. Every session can have one TTY associated with it. Only a session leader can take control of a TTY. For a process to be truly daemonized (ran in the background) we should ensure that the session leader is killed so that there is no possibility of the session ever taking control of the TTY.

  • 6: Some Python knowledge reviewed

  • 3:

    • There's no difference between shm and tmpfs (actually, tmpfs is only the new name of former shmfs). hugetlbfs is a tmpfs-based filesystem that allocates its space from kernel huge pages and needs some additional configuration afford.

    • Block devices use SCSI (Small Computer System Interface) to talk to processes on one side and hardware on the other side. Hence, block devices have device names (since everything on Linux is a file, so device files) starting with sd*, where s = SCSI and d = device. During boot time, the kernel detects block devices one by one and name them in the form sd[a-z]. So in my case, my vmware disk got detected first and hence, it was named sda. Next, after boot, I attached my USB stick and the kernel named it sdb. As I mentioned earlier, all disks have one or more partitions. In the above output, sdb1, sda1, etc. are names of partitions.

    • Remember that file systems are the managers of every partition? When we created a partition on mylv1, it's size was 1 GB. Just increasing the volume size doesn't mean that the filesystem would also be synchronized.

  • 1:

    functools.wraps, an easy-to-use interface for functools.update_wrapper, is a decorator that automatically transfers the key metadata from a callable (generally a function or class) to its wrapper. Typically, this wrapper is another function, but it can be any callable object such as a class.

November

  • It is equivalent but not identical. Simply said, if a lambda expression does not capture values, it will be a singleton that is re-used on every invocation.

  • Decorator Functions with Decorator Arguments

  • I hope I know it earlier... Note there is a bash.exe and use it instead of git-bash.ext. Use .bashrc instead of .profile or .bash_profile to setup env var because by default the git-bash won't login like bash -l automatically.

  • The token bucket limits the average inflow rate and allows sudden increase in traffic. The leaky bucket limits the constant outflow rate, which is set to a fixed value.

    Some other implementations,

    • ring buffer based solution

    • queue based solution

    • Implementing a Java Timer uses native Java API.

    • Comparing it with a Hashed Timer Wheel Solution.

    • from Doug Lea, the famous Java Concurrency Module author.

    • "Container types, including collections, maps, streams, arrays, and optionals should not be wrapped in optionals."

    • "The ProcessHandle class does have the arguments method, which returns Optional<String[]>, but this method should be regarded as an anomaly that is not to be emulated."

    • Using String as a key in a Java HashMap is generally preferred over CharSequence. This is because String is immutable, ensuring that its hash code remains constant throughout its lifetime, which is crucial for maintaining the integrity of the HashMap. In contrast, CharSequence is an interface that includes both mutable (like StringBuilder) and immutable implementations, leading to potential inconsistencies in equality and hash code behavior across different implementations. Thus, to avoid unexpected behavior, it’s advisable to use String as the key.

    • The hashCode() implementation for StringBuilder in Java is not explicitly defined in the same way it is for String. Instead, StringBuilder inherits the hashCode() method from the Object class, which returns a hash code based on the object's memory address. This means that each instance of StringBuilder will have a unique hash code that does not take into account the contents of the builder.

  • String length method for Java and other languages.

    • due to the nature of threaded execution and system timing, this test could potentially fail if the system is under heavy load or experiencing other issues that cause significant delays.

    • Alternatives to Hashed Wheel Timers:

    • Heap-based timers

    • List-based timers

  • The default implementation has better performance.

October

  • Lesson: treat the switch statement as if they were the data.

    • Note that Java does not provide unsigned byte. If we need to represent a number as unsigned byte (1 byte -> 4 bytes), we must cast byte to int and mask (&) the new int with a &0xff (get the last 8 bits). It gives the last 8-bits or prevents sign extension.

    • Java 8 provides the built-in method toUnsignedInt() that is defined in the Byte class. It supports unsigned operations. The method converts a signed byte into an unsigned integer.

    • Many external systems (e.g., databases, network protocols) utilize unsigned types. The lack of native support for unsigned bytes in Java complicates integration with these systems, requiring additional conversion logic or the use of larger data types (like int or long) to represent values that should fit in an unsigned byte24. This can lead to performance overhead and potential bugs if developers do not handle these conversions carefully.

    • Frequent conversions between signed and unsigned representations can impact performance, especially in applications that require high throughput or low latency, such as video processing or real-time data analysis.

    • the lack of an unsigned byte type in Java complicates data handling and interoperability while increasing the risk of errors in code. Developers must implement additional logic to work around these limitations, which can lead to more complex and error-prone applications.

  • git bisect run ./test_for_bug.sh

    • If another process attempts to load the same file (while it is still resident in the cache) the kernel detects this and doesn't need to reload the file. If the page cache gets full, pages will get evicted - dirty ones being written back out to the disk.

    • By contrast, with IPC implemented using shared memory, there are no read and write syscalls, and no extra copy step. Each "channel" can simply use a separate area of the mapped buffer. A thread in one process writes data into the shared memory and it is almost immediately visible to the second process.

    • if shared memory IPC can be implemented without memory mapped files?

    • A practical way would be to create a memory-mapped file for a file that lives in a memory-only file system; e.g. a "tmpfs" in Linux.

    • You could in theory implement a shared segment between two processes

    • Note that both Aeron IPC and CQ support tmpfs to further improve the performance

    • When setting up Aeron for IPC, the media driver can be configured to operate with a term buffer located on a tmpfs mount point. This setup minimizes disk I/O latency since all operations occur in memory. The configuration involves specifying the directory for the Aeron media driver to point to a tmpfs mount, ensuring that all IPC messages are handled in-memory

    • For even lower latencies, Chronicle Queue can be backed by tmpfs, a temporary filesystem that resides in RAM. This configuration significantly reduces delays caused by disk operations, provided that the queue size is managed appropriately.

    • Transportation Media: multicast, IPC, InfiniBand, RDMA, PCI-e 3.0

    • OSI Layer 4 (Transport) Services

    • Connection Oriented Communication

    • Reliability

    • Flow Control: counters are the key to flow control and monitoring; pluggable in Aeron

    • Congestion Avoidance/Control: TCP is not suitable for HFT partially because of it; pluggable in Aeron

    • Multiplexing: HOL Blocking

    • Design Principles

    from

    1. clear segregation of control

    2. garbage free in steady state running

    3. lock-free, wait-free and copy-free in data structure in the messaging path

    4. respect the Single Writer Principle

    5. major data structures are not shared

    6. don't burden the main paths with exceptional cases

    7. non-blocking in the message path

    8. ...

    into 3 basic things

    • system architecture

    • data structure

    • protocol of interactions

    • Data Structure

    • Maps: dealing with primitives

    • IPC Ring/Broadcast Buffer: between Conductors

    • ITC Queues: between Sender/Receiver and Conductors

    • Dynamic Arrays

    • Log Buffer: IPC for messaging, creates a replicated persistent log of messages

      • mmap

      • tail is being moved atomically

      • No big file: page fault; page cache churn; VM pressure; clean/dirty/active

      • receiver side: High Water Mark + Completed; point chasing is really bad (In the context of messaging systems, point chasing refers to a method where a sender strategically prioritizes and sends messages to maximize engagement or response rates.)

    • Monitoring and Debugging should be designed on day 1

    • Loss, throughput and buffer size are strongly related

    • Java

    • Bad:

      • No Unsigned Type

      • NIO - Locks, off-heap, PAUSE, Signals, etc

      • String Encoding - 3 buffer copy

      • External Resources

      • Selectors - GC

      • converting bytes into int

    • Good:

      • Tooling: IDEs, Gradle, HdrHistogram

      • Bytecode Instrumentation: good to debugging

      • Unsafe

      • The Optimizer

      • Garbage Collectors

    • Kernel Module and FPGAs possible

  • The video features Martin Thompson discussing the evolution of financial exchanges, focusing on advancements in design, resilience, performance and deployment over the past decade.

    • Design

    • State Machine -> Replicated State Machine: ordered input + deterministic execution

    • Distributed Event Log: event sourcing

    • Rich Domain Model (DDD) and specific data structure designed from scratch

    • Time & Timers: atomic clock + gps synchronizer; how a timer cancels an order

    • Resilience

    • Fairness: multiple gateways -> 1

    • Gateway: classification of customers

    • Matching Engine: sharding by symbol/fungible...

    • Primary Secondary vs Consensus: Raft

    • Code Quality and Model Fidelity: Model fidelity refers to the degree to which a model accurately represents the real-world system or phenomenon it is intended to simulate or predict. High fidelity means that the model closely matches the actual behavior or characteristics of the system, capturing important details and dynamics. Low fidelity indicates a more simplified or abstract representation that may overlook critical factors.

    • Performance: Transaction throughput has increased significantly, with some exchanges reaching millions of transactions per second and achieving latencies below 100 microseconds.

    • Latency: average latency is misleading, we need percentile

    • Throughput: burst scenario

    • JVM:

      • CMS full GC

      • G1

      • Azul C4: Continuously Concurrent Compacting Collector, high allocation rate without nasty gc pauses with Amdahl's law

      • ZGC: not generational (but we can turn it on now?)

      • Shanadoah: better at smaller heaps

    • Memory Access Patterns: Java is still catching up with that, c can get the close to the machine about the memory layout so that's why the fastest matching engine is written is c

    • Data Structure: check all kinds of libs or even implement your own one; prevent cache misses;

    • Binary Codecs: SBE; the FIX protocol is encoded in ASCII

    • Preventing Costs: system calls; disk calls; page fault is going to interrupt the kernel - A page fault is an exception raised by the memory management unit (MMU) when a program attempts to access a memory page that is not currently mapped to its virtual address space. This situation typically arises when the required page is not loaded into physical memory (RAM), which make the mmap file horrendously more expensive all of a sudden. Setup huge pages to fix it; context switching

    • Hardware

      • Disks: from milliseconds to tens/hundreds of microseconds

      • Network: financial organization is good at that

      • CPU: not too much improvement, throughput is abundant, but the latency is not getting better

      • IO: socket is not good; new API for IO and please use asynchronous API; DPDK

    • Language: polyglot

    • Deployment

    • CI/CD

    • Flexible Scaling: dev env in your local machine; using IPC if the machine has 100 cores

    • The Epochalypse problem.

  • QUIC’s faster connection set-up with 0-RTT is really more of a micro-optimization than a revolutionary new feature. Compared to a state-of-the art TCP + TLS 1.3 set-up, it would save a maximum of one round trip. The amount of data that can actually be sent in the first round trip is additionally limited by a number of security considerations.

  • Levered ETFs are trading tools that are not suitable for investing. They do a good job of matching the levered return of an underlying index intraday. The sum of all the negative gamma trading is expensive as the mechanical re-balancing gets front-run and “arbed” by traders. This creates significant drag on the levered ETF’s assets. In fact, if the borrowing costs to short levered ETFs were not punitive, a popular strategy would be to short both the long and short versions of the same ETF, allowing the neutral arbitrageur to harvest both the expense ratios and negative gamma costs from tracking the index!

    • If there is an arbitrage possibility, then there is no EMM.

    • If there are no arbitrage possibilities, then there is at least one EMM.

    • If every payoff is replicable, then there is exactly one EMM.

  • 窗口大小 = min(cwnd, rwnd)

  • In Java, the length method of String objects is not the length of that String in characters. Instead, it only gives the number of 16-bit code units used to encode a string. This is not (always) the number of Unicode characters (code points) in the string.

September

  • TCP Fast Open (TFO) 是在传统的三次握手基础上进行优化,允许在握手过程中发送数据,从而减少首次发送数据的延迟,提升网络应用性能。

  • TODO

  • This repo using RAFT to ensure the Availability, and based on CAP Theory,

    • Consistency: no, only eventual consistency

    • Availability: yes

    • Partition Tolerance: yes

  • TODO

    • I wish I can understand more when I watch it again later.

    • back to the basics.

    • The goal of VarHandle is to define a standard for invoking equivalents of java.util.concurrent.atomic and sun.misc.Unsafe operations on fields and array elements.

    • In the worst case we evaluated, non-generational ZGC caused 36% more CPU utilization than G1 for the same workload. That became a nearly 10% improvement with generational ZGC.

  • When we don’t declare a hashCode() method for a class, Java will use the identity hash code for it.

    • JVM raises a flag in the monitor object that some thread acquires the lock, so reacquiring and releasing the lock by the same thread is lightweight. But the lock must be revoked when another thread tries to acquire the bias lock. And the revocation is a costly operation.

August

    • Use the keyword final when you want the compiler to prevent a variable from being re-assigned to a different object.

    • The default type of reference in Java is a strong reference.

    • If JVM sees an object that has only a weak reference during GC, it will be collected.

    • A soft reference is garbage-collected only if there is not enough memory left in the heap.

    • Unlike weak and soft references whereby we can control how objects garbage-collected, a phantom reference is used for pre-mortem clean-up actions before the GC removes the object.

  • Simple but useful.

    • Put simply, the Contended annotation adds some paddings around each annotated field to isolate each field on its own cache line. Consequently, this will impact the memory layout. Please note that the Contended annotation is JDK internal, therefore we should avoid using it. Also, we should run our code with the -XX:-RestrictContended tuning flag; otherwise, the annotation wouldn’t take effect.

    • the JVM adds padding to the objects so that their size is a multiple of 8 bytes. With these paddings, the last three bits in oops are always zero. This is because numbers that are a multiple of 8 always end in 000 in binary.

    • Since ZGC needs to use 64-bit colored pointers, it does not support compressed references. So, using an ultra-low latency GC like ZGC has to be weighed against using more memory.

    • On modern x86 CPUs, the typical cache line size is 64 bytes. However, some CPUs may have larger cache lines, such as 128 bytes.

    • To ensure compatibility with various CPU architectures and avoid false sharing on CPUs with larger cache lines, the ConcurrentCircularArrayQueueL0Pad class is padded with 128 bytes.

    • We can make “message processors” are faster

    • We can make deserialize and serialize process faster

    • We can make the network transfer faster

    • Usually when optimising algorithms, you wanna find redundant or unnecessary work. Then find a way to consolidate that redundancy.

    • The sweep-and-prune algorithm is also known as sort-and-sweep.

    • Pairs flagged in all dimensions would be considered intersecting.

    • Insertion sort has a running time of O(n) at best when the list is already sorted or nearly-sorted, and O(n2) at worst when the list is in reverse.

    • FAST: Primarily designed to reduce the amount of data sent over the network and improve the speed of message delivery, particularly for market data. It utilizes compression techniques to minimize the size of messages, which can include both binary and character-based data.

    • FAST: Focuses on the transmission layer and may be used in conjunction with other encoding methods, including SBE.

    • Mark you move constructors and move assignment operators with noexcept

    • Further optimizations and stronger exception safety with copy-and-swap idiom

    • Perfect forwarding

    • A possible mitigation strategy to this is, if for example 95% of the callers of the function will include 3 or less parameters you can create three functions that take 1, 2, and 3 parameters of the argument type to handle the 95% and a fourth that takes three parameters and a vararg parameter to handle the rest of the 5%.

    • Utility for preventing primitive parameter values from being auto-boxed. Auto-boxing creates temporary objects which contribute to pressure on the garbage collector. With this utility users can convert primitive values directly into text without allocating temporary objects.

    • private final ThreadLocal<int[]> current = new ThreadLocal<>();: By using ThreadLocal<int[]>, you can store multiple integer values in a single thread-local variable without the overhead of boxing.

  • TODO

  • You will always get the same bean from the context if calling a @Bean annotated method in Spring java configuration. Same for @Component.

  • TODO

July

    1. Anonymous subclass of TypeToken

    2. The annotation of the getGenericSuperClass() method of the Class class is: Returns the Type representing the direct superclass of the entity (class, interface, primitive type or void) represented by thisClass. If the superclass is a parameterized type, the Type object returned must accurately reflect the actual type parameters used in the source code. The parameterized type representing the superclass is created if it had not been created before. See the declaration of ParameterizedType for the semantics of the creation process for parameterized types. If thisClass represents either theObject class, an interface, a primitive type, or void, then null is returned. If this object represents an array class then theClass object representing theObject class is returned.

    1. VISUAL BLOCK: I - insert before; A - append after; c - replace

    2. macro

    3. substitute

    • Exchanger vs SynchronousQueue vs LinkedTransferQueue

    • public interface RunnableFuture<V> extends Runnable, Future<V>

    • public class FutureTask<V> implements RunnableFuture<V>

    • There are four types of references in Java: Strong, Weak, Soft, and Phantom.

    • Unlike weak and soft references whereby we can control how objects garbage-collected, a phantom reference is used for pre-mortem clean-up actions before the GC removes the object.

  • It explains the origin of complex number and the geometrical meaning of complex domain.

    • How big can a certificate be? For curl to work is 100kB and for a web browser like Firefox is ~60kB though Firefox TLS library works different so results may vary.

    • How long can it last? From Jan 1, 1950 to Dec 31, 9999. About 8050 years.

    • Blocking I/O model (Blocking / Synchronous): recvfrom

    • Signal-driven I/O (Blocking / Synchronous): SIGIO

    • Non-Blocking I/O (Non-Blocking / Synchronous): recvfrom (O_NONBLOCK)

    • I/O Multiplexing (Blocking / Asynchronous): select, poll, epoll

    • Asynchronous I/O (Non-Blocking / Asynchronous): AIO

    • Power Set of X is the set of all possible subsets of X.

    • Definition of sigma algebra

    • what is a trade and how can it be modelled

    • what you need before booking a trade

    • what happens when a trade is booked

  • “Bootstrapping is a statistical procedure that resamples a single data set to create many simulated samples.”

  • 30 lines.

    • CRDT stands for “Conflict-free Replicated Data Type”.

    • It’s a kind of data structure that can be stored on different computers (peers). Each peer can update its own state instantly, without a network request to check with other peers.

    • Peers may have different states at different points in time, but are guaranteed to eventually converge on a single agreed-upon state. That makes CRDTs great for building rich collaborative apps, like Google Docs and Figma — without requiring a central server to sync changes.

June

    • Acquire semantics -> LoadLoad + LoadStore

    • Release semantics -> LoadStore + StoreStore

  • Types of Memory Barrier/Reordering: LoadLoad, StoreStore, LoadStore, StoreLoad

  • Mintomic (short for “minimal atomic”) is an API for low-level lock-free programming in C and C++.

  • Sequential consistency means that all threads agree on the order in which memory operations occurred, and that order is consistent with the order of operations in the program source code.

  • Kiss Linux™ is a meta-distribution for the x86_64 architecture with a focus on simplicity, sustainability and user freedom.

  • The truth is that pairs of memory barriers provide conditional ordering guarantees.

    • This is a shell script template generator (i.e. a script that writes scripts).

    • Maybe the thread.yield() should be used instead of the LockSupport.parkNanos(1);

    • Java supports to use “memory-mapped file” we can see this feature has been implemented in some popular frameworks and are successful by using this capability of Java

    • Chronicle queue

    • QuestDB

    • KDB

    • kafka -> sendfile

  • TODO

    1. lock-free using volatile / AtomicLong

    2. padding to prevent false sharing

    3. batching for disk/network writing (e.g., the size of a block is 4k)

    4. enable the ability to zero garbage route using byte array or using the immutable object Reference,

  • Latency numbers every programmer should know,

    • L1 cache reference 0.5 ns

    • Branch mispredict 5 ns

    • L2 cache reference 7 ns

    • Mutex lock/unlock 25 ns

    • Main memory reference 100 ns

    • ...

27:

26:

check the

25:

20:

19:

13:

worth a reading.

7: /

The algorithm behind the check is based on Determine if a word has a zero byte:

6:

5:

4:

3:

2:

1:

31:

30:

When we need to represent signed numbers in Java, we find .In 2's complements the left most bit represent the sign (+ ive or - ive). The bit 0 denotes positive and 1 denotes negative. The rest of the bits denotes the value from -128 to 127. Therefore, it is called 8-bit byte only have 7 bits to store the values. other extra values range from 128 to 255 are unable to fit into a single byte. So, we can cast it to 32-bit unsigned integer for more spaces (bits).

29:

28:

27:

26:

25:

The video discusses Aeron, a messaging system focused on high performance and reliability, particularly in scenarios where traditional protocols like TCP and UDP may fall short. The speaker, Martin Thompson, emphasizes the need for consistent latency and the challenges of reliable message delivery over UDP.

Lambda &

24:

23:

Dagger, a directed, acyclic graph of financial instruments. Also refer to for this concept.

22:

21:

20:

19:

18:

and his [Yet another full-node guide](https://becomesovran.com/blog/yet-another-full-node-guide.html) is quite good too. And here is another blog about the mentioned .

15:

13:

12:

11:

7:

2:

30:

29:

28:

27:

26:

25:

24:

23:

13:

12:

11:

10:

9:

7:

: Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff.

: Loop unswitching is a compiler optimization. It moves a conditional inside a loop outside of it by duplicating the loop's body, and placing a version of it inside each of the if and else clauses of the conditional.[1] This can improve the parallelization of the loop. Since modern processors can operate quickly on vectors, this improvement increases the speed of the program.

6: -> , which means Apple chips are not supported yet.

29:

28:

27:

26:

24:

23:

22:

to inspect the memory layout of objects in the JVM.

21:

20:

19:

18:

17:

16:

15:

: The body length is the character count starting at tag 35 (included) all the way to tag 10 (excluded), including trailing SOH delimiters.

Simple Binary Encoding (SBE) is a codec format specified by the Financial Information eXchange (FIX) Trading Community for use in high-performance trading systems. ()

14:

13:

more,

12:

11:

10:

9:

8:

7:

6:

5:

TODO

4:

3:

2:

1:

31:

30:

29:

28:

27:

26:

24:

25:

: Mark-Sweep, Copying, Mark-Compact, Generation-Collection

24:

: TreeMap (Red-Black Tree) vs Priority Queue (Heap)

18:

15:

12:

11:

08:

04:

02: and

28:

27:

26:

24:

23:

21:

20:

"For antirez, programming was a way to express himself, a form of art. Every character and line break had to be meticulously crafted, akin to the art form of writing. Software development was like writing a book — it had to be beautiful, elegant, and easy to comprehend. If that software happened to be useful to others, that was just a side effect."

19:

18:

17:

16:

15:

14:

13:

12:

11:

10:

9:

8:

7:

6:

5:

4:

3:

💎
MarkItDown is a utility for converting various files to Markdown
Open Addressing vs Closed Addressing in HashMap
Java 9 引入的 Memory Order
Java 9 VarHandles Best practices, and why? youtube
What are memory fences used for in Java?
A friendly introduction to assembly for high-level programmers
What is the reason for performing a double fork when creating a daemon?
Use and meaning of session and process group in Unix?
Async IO in Python: A Complete Walkthrough
Understanding Object Instantiation and Metaclasses in Python
Implementing Interfaces in Python
On system memory... specifically the difference between tmpfs, shm, and hugepages...
JVM Anatomy Quark #2: Transparent Huge Pages
hugetlbfs support comes to Java Chronicle Queue
Huge Pages and Transparent Huge Pages
Understanding Huge Pages
Logical Volume Manager: A Beginner's Guide
A Deep Dive Into Python's functools.wraps Decorator
Does a lambda expression create an object on the heap every time it's executed?
A Guide to BitSet in Java
Affinity.java
Using Pausers in Event Loops
Python Decorators II: Decorator Arguments by Bruce Eckel
Adding Git-Bash to Windows Terminal
Detailed Explanation of Guava RateLimiter's Throttling Mechanism
Java Timer
Concurrency Deep Dives
DelayedQueue used by ScheduledThreadPoolExecutor
HashedWheelTimer vs ScheduledThreadPoolExecutor for higher performance
How the glibc strlen() implementation works
strlen performance implementation
Bit Twiddling Hacks
Scalable IO in Java
《Scalable IO in Java》译文
Introduction to Java ProcessHandle
Why String is popular HashMap key in Java?
String length
Hashed Wheel Timers
DeadlineTimerWheel.java
Java Enums Are Inherently Serializable
Replace your switch statement and multiple "if and else", using Object Literals
Java Convert Bytes to Unsigned Bytes
2's complement
Debugging Till Dawn: How Git Bisect Saved My Demo
Nginx Logging: A Comprehensive Guide
A virtual DOM in 200 lines of JavaScript
Using files for shared memory IPC
What is the purpose of MAP_ANONYMOUS flag in mmap system call?
Aeron: Open-source high-performance messaging
Method Handlers
Aeron: Do we really need another messaging system?
Evolution of Financial Exchange Architectures
An oral history of Bank Python
tradebook
Datetimes versus timestamps in MySQL
Using Unix Timestamps in MySQL Mini-Course
HTTP/3 From A To Z: Core Concepts
The Gamma Of Levered ETFs
How the Guinness Brewery Invented the Most Important Statistical Method in Science
Server Setup Basics
README.md
btop
What is an Equivalent Martingale Measure, and why should a bookie care?
Changing Probability Measures
Girsanov’s Theorem
Introduction to Spliterator in Java
Writing a custom spliterator in Java 8
NIO Buffer
Guide to Java groupingBy Collector
你管这破玩意儿叫 TCP
String Length vs Character Length in Different Languages
TCP Fast Open
Essence of linear algebra
High-availability matching engine of a stock exchange
Linear Algebra 101 for AI/ML
Trading at light speed: designing low latency systems in C++ - David Gross - Meeting C++ 2022
如果让你来设计网络
Java Objects.hash() vs Objects.hashCode()
Java Variable Handles Demystified
Java G1GC - Card Table (CT) vs Remembered Set (RS)
Understanding JVM Garbage Collection – Part 9 Garbage First (G1) Garbage Collector (GC)
垃圾回收之CardTable和Remembered Set
Bending pause times to your will with Generational ZGC
Introducing Generational ZGC
Demystifying ZGC: Concurrent Garbage Collection and Colored Pointers
Memory Address of Objects in Java
Java Variable Handles Demystified
Correct way to use VarHandle in Java 9?
Java 17 migration: bias locks performance regression
Loop unrolling
Loop unswitching
Remote C++ Development with Docker and CLion (with X11)
linux ARM is not supported yet
Debugging C++ in a Docker Container with CLion
Use containers for C++ development
Docker toolchain
Using Docker with CLion
Why should I use the keyword "final" on a method parameter in Java?
8 Kinds of Variables
Types of References in Java
Where are generic types stored in java class files?
详解Gson的TypeToken原理
Move Your Mac's Home Folder to a New Location
The Synchronizes-With Relation
Java Objects Inside Out
Java Memory Layour
JOL
Compressed OOPs in the JVM
jvm-堆内存不要超过32G
ZGC: A Scalable Low-Latency Garbage Collector
An Introduction to ZGC
Shenandoah: A Low-Pause-Time Garbage Collector
JC Tools
MpmcArrayQueue
Java Concurrency Utility with JCTools
False Sharing
From the Circle to Epicycles (Part 1) - An animated introduction to Fourier Series
JMH: Benchmark Reactive vs Disruptor
Microbenchmarking with Java
JSON is incredibly slow: Here’s What’s Faster!
Protocol Buffers & Simple Binary Encoding
Json,Protobuf, SBE: Benchmark the byte story
Sort, sweep, and prune: Collision detection algorithms
Deribit volatility (DVOL)
How to Implement a FIX Trading Engine in Python
Financial Information eXchange
The Unreasonable Effectiveness of Simple Binary Encoding (SBE)
Bounded MPMC queue
Understanding the meaning of lvalues and rvalues in C++
C++: lvalue/rvalue for Complete Dummies
Understanding lvalues, rvalues and their references
C++ rvalue references and move semantics for beginners
Resource acquisition is initialization
Effective Java! Use Varargs Judiciously
var args constructors/methods vs lists
Thread-Local Allocation Buffers in JVM
https://www.baeldung.com/java-jvm-tlab
Introduction to Thread Local Allocation Buffers (TLAB)
org.apache.logging.log4j.util.Unbox.java
Java Performance Notes: Autoboxing / Unboxing
An exploration of vector search
Calling a @Bean annotated method in Spring java configuration
Exact difference between CharSequence and String in java
Maps in Java
Embeddings: What they are and why they matter
How does cosine similarity work?
Fiber in C++: Understanding the Basics
Kernel bypass
How to implement a hash table (in C)
Linear Algebra Done Right
TypeToken in Gson
Vim: insert the same characters across multiple lines
Java Collections Framework
FutureTask
Types of References in Java
为什么数学不允许除以0,却定义了根号- 1?
Exploring TLS certificates and their limits
GCs of JVM tuning: PS+PO VS CMS VS G1
GC Algos
JVM and GC
JVM Garbage Collectors
Simulating a financial exchange in Scala
Order Matching Engine Design
matching engine
IO Mode
Measure Theory
covariance of two squared (not zero mean) random variables
How to compute expectation for the product of squared jointly normal random variables
Risk System Concepts - Trade Booking & Pricing
What Is Bootstrapping Statistics?
String Matching KMP
An Interactive Intro to CRDTs
Building a collaborative text editor in Go
Acquire and Release Semantics
Memory Barriers Are Like Source Control Operations
Memory Reordering Caught in the Act
mintomic
An Introduction to Lock-Free Programming
Kiss Linux
Story: Redis and its creator antirez
Why is single threaded Redis so fast
User-space RCU: Memory-barrier menagerie
new_script
Advanced Shell Scripting Techniques: Automating Complex Tasks with Bash
Arthas for Java Debugging
How Arthas Works Underneath
How to Get the Size of an Object in Java
Unleashing the Power of Java Instrumentation: A Deep Dive into Runtime Metamorphosis
https://www.baeldung.com/java-instrumentation
core-java-jvm
Java Aeron Framework: A Beginner’s Guide to Unicast Networking with UDP
Exploring the Java Aeron Framework: A Comprehensive Introduction to IPC
Introduction to Lock-Free Algorithms 101 in Java
Java Microbenchmark Harness (JMH)
Java NIO: Using Memory-mapped file to load big data into applications
mmap
Netty核心原理剖析与RPC实践
Build Your Own Redis with C/C++
LMAX Disruptor
Disruptor++
disruptor C++ 用法浅析
c++ disruptor 无锁消息队列
Low Latency Java with the Disruptor
Scrambling Eggs for Spotify with Knuth's Fibonacci Hashing
Capturing a Java Thread Dump
Manipulation with ASM
A Simple Explanation of How Shares Move Around the Securities Settlement System
A Simple Explanation of Balance Sheets
A simple explanation of how money moves around the banking system
https://github.com/donnemartin/system-design-primer
Happens-Before
String Types in Java 8
Maps in Java