2024

December

31: MarkItDown is a utility for converting various files to Markdown
20: Open Addressing vs Closed Addressing in HashMap
- Open Addressing, a.k.a. Closed Hashing
- Closed Addressing, a.k.a. Open Hashing
19: Memory Order introduced in Java 9
- Java 9 VarHandles Best practices, and why? youtube
- What are memory fences used for in Java?
18: A friendly introduction to assembly for high-level programmers
17: What is the reason for performing a double fork when creating a daemon?
- The first process in the process group becomes the process group leader and the first process in the session becomes the session leader. Every session can have one TTY associated with it. Only a session leader can take control of a TTY. For a process to be truly daemonized (run in the background), we should ensure that the session leader is killed so that there is no possibility of the session ever taking control of the TTY.
- Use and meaning of session and process group in Unix?
6: Some Python knowledge reviewed
3: On system memory... specifically the difference between tmpfs, shm, and hugepages...
- There's no difference between shm and tmpfs (actually, tmpfs is only the new name of former shmfs). hugetlbfs is a tmpfs-based filesystem that allocates its space from kernel huge pages and needs some additional configuration effort.
- JVM Anatomy Quark #2: Transparent Huge Pages
- hugetlbfs support comes to Java Chronicle Queue
- Huge Pages and Transparent Huge Pages
- Understanding Huge Pages
- Logical Volume Manager: A Beginner's Guide
- Block devices use SCSI (Small Computer System Interface) to talk to processes on one side and hardware on the other side. Hence, block devices have device names (since everything on Linux is a file, so device files) starting with sd*, where s = SCSI and d = device. During boot time, the kernel detects block devices one by one and names them in the form sd[a-z]. So in my case, my vmware disk got detected first and hence, it was named sda. Next, after boot, I attached my USB stick and the kernel named it sdb. As I mentioned earlier, all disks have one or more partitions. In the above output, sdb1, sda1, etc. are names of partitions.
- Remember that file systems are the managers of every partition? When we created a partition on mylv1, its size was 1 GB. Just increasing the volume size doesn't mean that the filesystem would also be synchronized.
1: A Deep Dive Into Python's functools.wraps Decorator
functools.wraps, an easy-to-use interface for functools.update_wrapper, is a decorator that automatically transfers the key metadata from a callable (generally a function or class) to its wrapper. Typically, this wrapper is another function, but it can be any callable object such as a class.
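A minimal sketch of the metadata transfer described above. The decorator name `log_calls` and the function `add` are hypothetical, chosen only for illustration.

```python
import functools


def log_calls(func):
    """Hypothetical decorator that records the arguments of each call."""
    calls = []

    @functools.wraps(func)  # copies __name__, __doc__, etc. and sets __wrapped__
    def wrapper(*args, **kwargs):
        calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper


@log_calls
def add(a, b):
    """Return a + b."""
    return a + b


result = add(1, 2)
```

Without `functools.wraps`, `add.__name__` would be `"wrapper"` and the docstring would be lost. The original function stays reachable through `add.__wrapped__`.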

November

27: Does a lambda expression create an object on the heap every time it's executed?
It is equivalent but not identical. Simply said, if a lambda expression does not capture values, it will be a singleton that is re-used on every invocation.
26: A Guide to BitSet in Java
Check Affinity.java and Using Pausers in Event Loops.
25: Python Decorators II: Decorator Arguments by Bruce Eckel
Decorator Functions with Decorator Arguments
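A minimal sketch of a decorator function that takes arguments. It needs three levels: the outer call receives the arguments, the middle function receives the decorated function, and the innermost function runs on each call. The names `repeat` and `greet` are hypothetical.

```python
import functools


def repeat(times):
    """Decorator factory: called first, with the decorator arguments."""
    def decorator(func):
        # Called once, with the function being decorated.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Called on every invocation of the decorated function.
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@repeat(times=3)
def greet(name):
    return f"hello {name}"


greetings = greet("world")
```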
20: Adding Git-Bash to Windows Terminal
I wish I had known this earlier. Note that there is a bash.exe; use it instead of git-bash.exe. Use .bashrc instead of .profile or .bash_profile to set up environment variables, because by default Git Bash does not start as a login shell the way bash -l does.
19: Detailed Explanation of Guava RateLimiter's Throttling Mechanism
The token bucket limits the average inflow rate while still allowing short bursts of traffic. The leaky bucket enforces a constant outflow rate, which is set to a fixed value.
Some other implementations:
- ring buffer based solution
- queue based solution
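A minimal token bucket sketch to make the "average rate plus bursts" behaviour concrete. This is not Guava's RateLimiter implementation. The class name, its parameters, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions made for illustration.

```python
class TokenBucket:
    """Hypothetical token bucket driven by an explicit clock value."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens added per second (average rate)
        self.capacity = capacity  # maximum number of stored tokens (burst size)
        self.tokens = capacity    # start full so an initial burst is allowed
        self.last = now

    def try_acquire(self, now, permits=1):
        # Refill according to the elapsed time, capped at the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= permits:
            self.tokens -= permits
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=3)
burst = [bucket.try_acquire(now=0.0) for _ in range(4)]
later = [bucket.try_acquire(now=0.5), bucket.try_acquire(now=0.5)]
```

The first three requests pass as a burst, the fourth is rejected, and half a second later exactly one more token has been refilled.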
13: Java Timer
- Concurrency Deep Dives is worth reading.
- Implementing a Java Timer uses only the native Java API.
- Comparing it with a Hashed Timer Wheel Solution.
- DelayedWorkQueue, used by ScheduledThreadPoolExecutor
- HashedWheelTimer vs ScheduledThreadPoolExecutor for higher performance
7: How the glibc strlen() implementation works / strlen performance implementation
- The algorithm behind the check is based on Determine if a word has a zero byte: Bit Twiddling Hacks
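A sketch of the zero-byte check from Bit Twiddling Hacks, widened to 64-bit words. This is not glibc's code; it only illustrates the idea of testing eight bytes at once. The function names are hypothetical, and Python integers stand in for machine words.

```python
ONES = 0x0101010101010101   # 0x01 in every byte
HIGHS = 0x8080808080808080  # 0x80 in every byte


def has_zero_byte(word):
    """True if any of the 8 bytes of a 64-bit word is zero."""
    # Subtracting 1 from a zero byte sets its high bit through the borrow,
    # and "& ~word" filters out bytes whose high bit was already set.
    return ((word - ONES) & ~word & HIGHS) != 0


def strlen_wordwise(buf):
    """Number of bytes before the first NUL, scanning 8 bytes at a time."""
    i = 0
    while i + 8 <= len(buf):
        word = int.from_bytes(buf[i:i + 8], "little")
        if has_zero_byte(word):
            break  # the NUL is somewhere in this word
        i += 8
    # Finish byte by byte inside the word (or the tail) that holds the NUL.
    while i < len(buf) and buf[i] != 0:
        i += 1
    return i
```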
6: Scalable IO in Java
- By Doug Lea, the lead author of java.util.concurrent.
- Chinese translation of Scalable IO in Java
5: Introduction to Java ProcessHandle
- "Container types, including collections, maps, streams, arrays, and optionals should not be wrapped in optionals."
- "The ProcessHandle class does have the arguments method, which returns Optional<String[]>, but this method should be regarded as an anomaly that is not to be emulated."
4: Why String is popular HashMap key in Java?
- Using String as a key in a Java HashMap is generally preferred over CharSequence. This is because String is immutable, ensuring that its hash code remains constant throughout its lifetime, which is crucial for maintaining the integrity of the HashMap. In contrast, CharSequence is an interface that includes both mutable (like StringBuilder) and immutable implementations, leading to potential inconsistencies in equality and hash code behavior across different implementations. Thus, to avoid unexpected behavior, it’s advisable to use String as the key.
- StringBuilder does not override hashCode(). It inherits the method from Object, which returns an identity hash code that ignores the contents of the builder. Two StringBuilder instances holding the same characters will therefore generally have different hash codes, and StringBuilder does not override equals() either.
3: String length
String length method for Java and other languages.
2: Hashed Wheel Timers
- due to the nature of threaded execution and system timing, this test could potentially fail if the system is under heavy load or experiencing other issues that cause significant delays.
- Alternatives to Hashed Wheel Timers:
- Heap-based timers
- List-based timers
- DeadlineTimerWheel.java
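A minimal single-wheel sketch of the hashed wheel timer idea: a timeout is hashed into a slot by its deadline, and a per-entry round counter handles delays longer than one rotation. The class and method names are hypothetical, and ticks are advanced by hand rather than by a background thread, which also sidesteps the timing flakiness noted above.

```python
class HashedWheelTimer:
    """Hypothetical wheel timer advanced manually, one tick at a time."""

    def __init__(self, wheel_size=8):
        self.wheel = [[] for _ in range(wheel_size)]
        self.current_tick = 0

    def schedule(self, delay_ticks, callback):
        if delay_ticks < 1:
            raise ValueError("delay_ticks must be at least 1")
        deadline = self.current_tick + delay_ticks
        slot = deadline % len(self.wheel)
        # Number of times the slot is visited before the entry is due.
        rounds = (delay_ticks - 1) // len(self.wheel)
        self.wheel[slot].append([rounds, callback])

    def tick(self):
        self.current_tick += 1
        slot = self.current_tick % len(self.wheel)
        # Detach the bucket first so callbacks may schedule new timeouts.
        entries, self.wheel[slot] = self.wheel[slot], []
        for entry in entries:
            if entry[0] == 0:
                entry[1]()  # due now
            else:
                entry[0] -= 1  # due in a later rotation
                self.wheel[slot].append(entry)


fired = []
timer = HashedWheelTimer(wheel_size=4)
timer.schedule(2, lambda: fired.append((timer.current_tick, "a")))
timer.schedule(6, lambda: fired.append((timer.current_tick, "b")))
timer.schedule(4, lambda: fired.append((timer.current_tick, "c")))
for _ in range(6):
    timer.tick()
```

Scheduling and cancelling are O(1) per timeout, at the cost of firing only on tick boundaries.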
1: Java Enums Are Inherently Serializable
The default implementation has better performance.

October

31: Replace your switch statement and multiple "if and else", using Object Literals
Lesson: treat the branches of a switch statement as data.
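The article uses JavaScript object literals; the same idea can be sketched in Python with a dictionary that maps keys to handlers. The names `HANDLERS` and `calculate` are hypothetical.

```python
# The "branches" live in a table instead of in control flow.
HANDLERS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def calculate(op, a, b):
    """Look up the handler for op and apply it; unknown ops are an error."""
    try:
        handler = HANDLERS[op]
    except KeyError:
        raise ValueError(f"unknown operation: {op}") from None
    return handler(a, b)
```

Adding a new case means adding a table entry, and the table itself can be inspected, tested, or extended at runtime.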
30: Java Convert Bytes to Unsigned Bytes
- Java represents signed numbers in two's complement. The leftmost bit carries the sign: 0 denotes a non-negative value and 1 a negative one. An 8-bit byte therefore has only 7 bits for the magnitude and covers the range -128 to 127. Values from 128 to 255 do not fit into a single byte, so we widen the byte to a 32-bit int to get more bits.
- Note that Java does not provide an unsigned byte type. To read a byte as unsigned (1 byte -> 4 bytes), widen it to int and mask it with & 0xff, which keeps only the low 8 bits and discards the sign extension.
- Java 8 provides the built-in method toUnsignedInt() that is defined in the Byte class. It supports unsigned operations. The method converts a signed byte into an unsigned integer.
- Many external systems (e.g., databases, network protocols) utilize unsigned types. The lack of native support for unsigned bytes in Java complicates integration with these systems, requiring additional conversion logic or the use of larger data types (like int or long) to represent values that should fit in an unsigned byte. This can lead to performance overhead and potential bugs if developers do not handle these conversions carefully.
- Frequent conversions between signed and unsigned representations can impact performance, especially in applications that require high throughput or low latency, such as video processing or real-time data analysis.
- The lack of an unsigned byte type in Java complicates data handling and interoperability while increasing the risk of errors in code. Developers must implement additional logic to work around these limitations, which can lead to more complex and error-prone applications.
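The masking step can be tried out in Python, whose integers behave like two's complement values under `&`. This is only an illustration of the arithmetic behind Java's `b & 0xff` and `Byte.toUnsignedInt`; the helper names are hypothetical.

```python
def to_signed_byte(raw):
    """Interpret a value 0..255 as a signed 8-bit number, like a Java byte."""
    return int.from_bytes(bytes([raw]), "big", signed=True)


def to_unsigned_int(signed_byte):
    """Keep only the low 8 bits, which undoes the sign extension."""
    return signed_byte & 0xFF


signed = to_signed_byte(0xF0)
unsigned = to_unsigned_int(signed)
```

Here the bit pattern `0xF0` reads as -16 when treated as a signed byte and as 240 after masking.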
29: Debugging Till Dawn: How Git Bisect Saved My Demo
git bisect run ./test_for_bug.sh
28: Nginx Logging: A Comprehensive Guide
27: A virtual DOM in 200 lines of JavaScript
26: Using files for shared memory IPC
- If another process attempts to load the same file (while it is still resident in the cache) the kernel detects this and doesn't need to reload the file. If the page cache gets full, pages will get evicted - dirty ones being written back out to the disk.
- By contrast, with IPC implemented using shared memory, there are no read and write syscalls, and no extra copy step. Each "channel" can simply use a separate area of the mapped buffer. A thread in one process writes data into the shared memory and it is almost immediately visible to the second process.
- Can shared memory IPC be implemented without memory-mapped files?
- A practical way would be to create a memory-mapped file for a file that lives in a memory-only file system; e.g. a "tmpfs" in Linux.
- You could in theory implement a shared segment between two processes
- What is the purpose of MAP_ANONYMOUS flag in mmap system call?
- Note that both Aeron IPC and Chronicle Queue (CQ) support tmpfs to further improve performance.
- When setting up Aeron for IPC, the media driver can be configured to operate with a term buffer located on a tmpfs mount point. This setup minimizes disk I/O latency since all operations occur in memory. The configuration involves specifying the directory for the Aeron media driver to point to a tmpfs mount, ensuring that all IPC messages are handled in-memory
- For even lower latencies, Chronicle Queue can be backed by tmpfs, a temporary filesystem that resides in RAM. This configuration significantly reduces delays caused by disk operations, provided that the queue size is managed appropriately.
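A rough sketch of file-backed shared memory using Python's `mmap` module. Two independent mappings of the same file stand in for two processes; the file name `channel.dat` and the tiny length-prefixed layout are assumptions for illustration. On Linux the file could be placed on a tmpfs mount to avoid disk I/O. A real IPC channel also needs memory-ordering guarantees between writer and reader, which this sketch ignores.

```python
import mmap
import os
import struct
import tempfile

SIZE = 4096  # one page is enough for the demonstration

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "channel.dat")  # hypothetical file name
    with open(path, "wb") as f:
        f.truncate(SIZE)  # the file must already have the mapped length

    with open(path, "r+b") as wf, open(path, "r+b") as rf:
        writer = mmap.mmap(wf.fileno(), SIZE)  # shared mapping by default
        reader = mmap.mmap(rf.fileno(), SIZE)

        payload = b"hello"
        writer[4:4 + len(payload)] = payload             # write the body first
        struct.pack_into("<I", writer, 0, len(payload))  # then publish its length

        (length,) = struct.unpack_from("<I", reader, 0)
        received = bytes(reader[4:4 + length])

        writer.close()
        reader.close()
```

No read or write system call is involved in the exchange itself: the data becomes visible through the second mapping because both map the same pages.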
25: Aeron: Open-source high-performance messaging
The video discusses Aeron, a messaging system focused on high performance and reliability, particularly in scenarios where traditional protocols like TCP and UDP may fall short. The speaker, Martin Thompson, emphasizes the need for consistent latency and the challenges of reliable message delivery over UDP.
- Transportation Media: multicast, IPC, InfiniBand, RDMA, PCI-e 3.0
- OSI Layer 4 (Transport) Services
- Connection Oriented Communication
- Reliability
- Flow Control: counters are the key to flow control and monitoring; pluggable in Aeron
- Congestion Avoidance/Control: TCP is not suitable for HFT partially because of it; pluggable in Aeron
- Multiplexing: HOL Blocking
- Design Principles
from these principles:
1. clear segregation of control
2. garbage free in steady state running
3. lock-free, wait-free and copy-free in data structure in the messaging path
4. respect the Single Writer Principle
5. major data structures are not shared
6. don't burden the main paths with exceptional cases
7. non-blocking in the message path
8. ...
into three basic things:
- system architecture
- data structure
- protocol of interactions
- Data Structure
- Maps: dealing with primitives
- IPC Ring/Broadcast Buffer: between Conductors
- ITC Queues: between Sender/Receiver and Conductors
- Dynamic Arrays
- Log Buffer: IPC for messaging, creates a replicated persistent log of messages
  mmap
  tail is being moved atomically
  No big file: page fault; page cache churn; VM pressure; clean/dirty/active
  receiver side: High Water Mark + Completed; pointer chasing is really bad (pointer chasing means following a chain of references where each load depends on the result of the previous one, which defeats hardware prefetching and causes cache misses)
- Monitoring and Debugging should be designed on day 1
- Loss, throughput and buffer size are strongly related
- Java
- Bad:
  No Unsigned Type
  NIO - Locks, off-heap, PAUSE, Signals, etc
  String Encoding - 3 buffer copy
  External Resources
  Selectors - GC
  converting bytes into int
- Good:
  Tooling: IDEs, Gradle, HdrHistogram
  Lambdas & Method Handles
  Bytecode Instrumentation: good for debugging
  Unsafe
  The Optimizer
  Garbage Collectors
- Kernel Module and FPGAs possible
- Aeron: Do we really need another messaging system?
24: Evolution of Financial Exchange Architectures
The video features Martin Thompson discussing the evolution of financial exchanges, focusing on advancements in design, resilience, performance and deployment over the past decade.
- Design
- State Machine -> Replicated State Machine: ordered input + deterministic execution
- Distributed Event Log: event sourcing
- Rich Domain Model (DDD) and specific data structure designed from scratch
- Time & Timers: atomic clock + gps synchronizer; how a timer cancels an order
- Resilience
- Fairness: multiple gateways -> 1
- Gateway: classification of customers
- Matching Engine: sharding by symbol/fungible...
- Primary Secondary vs Consensus: Raft
- Code Quality and Model Fidelity: Model fidelity refers to the degree to which a model accurately represents the real-world system or phenomenon it is intended to simulate or predict. High fidelity means that the model closely matches the actual behavior or characteristics of the system, capturing important details and dynamics. Low fidelity indicates a more simplified or abstract representation that may overlook critical factors.
- Performance: Transaction throughput has increased significantly, with some exchanges reaching millions of transactions per second and achieving latencies below 100 microseconds.
- Latency: average latency is misleading, we need percentile
- Throughput: burst scenario
- JVM:
  CMS full GC
  G1
  Azul C4: Continuously Concurrent Compacting Collector, high allocation rate without nasty gc pauses with Amdahl's law
  ZGC: originally not generational (a generational mode has been available since JDK 21 via -XX:+ZGenerational)
  Shenandoah: better at smaller heaps
- Memory Access Patterns: Java is still catching up here. C lets you get close to the machine and control the memory layout, which is why the fastest matching engines are written in C.
- Data Structure: check all kinds of libs or even implement your own one; prevent cache misses;
- Binary Codecs: SBE; the FIX protocol is encoded in ASCII
- Preventing Costs: system calls; disk calls; context switching; page faults. A page fault is an exception raised by the memory management unit (MMU) when a program accesses a page of its virtual address space that is not currently backed by physical memory (RAM). The kernel has to interrupt the program to resolve the fault, which makes access to an mmap-ed file horrendously more expensive all of a sudden. Set up huge pages to mitigate it.
- Hardware
  Disks: from milliseconds to tens/hundreds of microseconds
  Network: financial organization is good at that
  CPU: not too much improvement, throughput is abundant, but the latency is not getting better
  IO: socket is not good; new API for IO and please use asynchronous API; DPDK
- Language: polyglot
- Deployment
- CI/CD
- Flexible Scaling: dev env in your local machine; using IPC if the machine has 100 cores
23: An oral history of Bank Python
Dagger, a directed, acyclic graph of financial instruments. Also refer to tradebook for this concept.
22: Datetimes versus timestamps in MySQL
- The Epochalypse problem.
- Using Unix Timestamps in MySQL Mini-Course
21: HTTP/3 From A To Z: Core Concepts
QUIC’s faster connection set-up with 0-RTT is really more of a micro-optimization than a revolutionary new feature. Compared to a state-of-the art TCP + TLS 1.3 set-up, it would save a maximum of one round trip. The amount of data that can actually be sent in the first round trip is additionally limited by a number of security considerations.
20: The Gamma Of Levered ETFs
Levered ETFs are trading tools that are not suitable for investing. They do a good job of matching the levered return of an underlying index intraday. The sum of all the negative gamma trading is expensive as the mechanical re-balancing gets front-run and “arbed” by traders. This creates significant drag on the levered ETF’s assets. In fact, if the borrowing costs to short levered ETFs were not punitive, a popular strategy would be to short both the long and short versions of the same ETF, allowing the neutral arbitrageur to harvest both the expense ratios and negative gamma costs from tracking the index!
19: How the Guinness Brewery Invented the Most Important Statistical Method in Science
18: Server Setup Basics
His [Yet another full-node guide](https://becomesovran.com/blog/yet-another-full-node-guide.html) is quite good too. Here is another blog post about btop, which the article mentions.
15: What is an Equivalent Martingale Measure, and why should a bookie care?
- If there is an arbitrage possibility, then there is no EMM.
- If there are no arbitrage possibilities, then there is at least one EMM.
- If every payoff is replicable, then there is exactly one EMM.
- Changing Probability Measures
- Girsanov’s Theorem
13: Introduction to Spliterator in Java
- Writing a custom spliterator in Java 8
12: NIO Buffer
11: Guide to Java groupingBy Collector
7: You Call This Piece of Junk TCP? (你管这破玩意儿叫 TCP)
window size = min(cwnd, rwnd)
2: String Length vs Character Length in Different Languages
In Java, the length method of String objects is not the length of that String in characters. Instead, it only gives the number of 16-bit code units used to encode a string. This is not (always) the number of Unicode characters (code points) in the string.
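The gap between code units and code points can be reproduced in Python, where `len` counts code points and the UTF-16 encoding reveals what Java's `length()` would count. The helper names are hypothetical.

```python
def utf16_code_units(text):
    """Number of 16-bit code units, which is what Java's String.length() reports."""
    return len(text.encode("utf-16-le")) // 2


def code_points(text):
    """Number of Unicode code points; Python's len() counts these directly."""
    return len(text)


sample = "a\U0001F600"  # the letter a followed by an emoji outside the BMP
units = utf16_code_units(sample)
points = code_points(sample)
```

The emoji needs a surrogate pair in UTF-16, so the two-character string occupies three code units.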

September

30: TCP Fast Open
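TCP Fast Open (TFO) is an optimization of the traditional three-way handshake that allows data to be sent during the handshake, which reduces the latency of the first data transmission and improves the performance of network applications.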
29: Essence of linear algebra
TODO
28: High-availability matching engine of a stock exchange
This repo uses Raft to ensure availability. In terms of the CAP theorem:
- Consistency: no, only eventual consistency
- Availability: yes
- Partition Tolerance: yes
27: Linear Algebra 101 for AI/ML
TODO
26: Trading at light speed: designing low latency systems in C++ - David Gross - Meeting C++ 2022
- I hope to understand more when I watch it again later.
25: If You Were Asked to Design the Network (如果让你来设计网络)
24: Java Objects.hash() vs Objects.hashCode()
- back to the basics.
23: Java Variable Handles Demystified
- The goal of VarHandle is to define a standard for invoking equivalents of java.util.concurrent.atomic and sun.misc.Unsafe operations on fields and array elements.
13: Java G1GC - Card Table (CT) vs Remembered Set (RS)
12: Bending pause times to your will with Generational ZGC
- In the worst case we evaluated, non-generational ZGC caused 36% more CPU utilization than G1 for the same workload. That became a nearly 10% improvement with generational ZGC.
- Introducing Generational ZGC
11: Demystifying ZGC: Concurrent Garbage Collection and Colored Pointers
10: Memory Address of Objects in Java
When we don’t declare a hashCode() method for a class, Java will use the identity hash code for it.
9: Java Variable Handles Demystified
- Correct way to use VarHandle in Java 9?
7: Java 17 migration: bias locks performance regression
- JVM raises a flag in the monitor object that some thread acquires the lock, so reacquiring and releasing the lock by the same thread is lightweight. But the lock must be revoked when another thread tries to acquire the bias lock. And the revocation is a costly operation.
- Loop unrolling: Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff.
- Loop unswitching: Loop unswitching is a compiler optimization. It moves a conditional inside a loop outside of it by duplicating the loop's body, and placing a version of it inside each of the if and else clauses of the conditional. This can improve the parallelization of the loop. Since modern processors can operate quickly on vectors, this improvement increases the speed of the program.
6: Remote C++ Development with Docker and CLion (with X11) -> linux ARM is not supported yet, which means Apple chips are not supported yet.

August

29: Why should I use the keyword "final" on a method parameter in Java?
- Use the keyword final when you want the compiler to prevent a variable from being re-assigned to a different object.
- 8 Kinds of Variables
28: Types of References in Java
- The default type of reference in Java is a strong reference.
- If JVM sees an object that has only a weak reference during GC, it will be collected.
- A soft reference is garbage-collected only if there is not enough memory left in the heap.
- Unlike weak and soft references whereby we can control how objects are garbage-collected, a phantom reference is used for pre-mortem clean-up actions before the GC removes the object.
27: Where are generic types stored in java class files?
- A detailed explanation of how Gson's TypeToken works (详解Gson的TypeToken原理)
26: Move Your Mac's Home Folder to a New Location
Simple but useful.
24: The Synchronizes-With Relation
23: Java Objects Inside Out
22: Java Memory Layout
- JOL to inspect the memory layout of objects in the JVM.
- Put simply, the Contended annotation adds some paddings around each annotated field to isolate each field on its own cache line. Consequently, this will impact the memory layout. Please note that the Contended annotation is JDK internal, therefore we should avoid using it. Also, we should run our code with the -XX:-RestrictContended tuning flag; otherwise, the annotation wouldn’t take effect.
- Compressed OOPs in the JVM
- the JVM adds padding to the objects so that their size is a multiple of 8 bytes. With these paddings, the last three bits in oops are always zero. This is because numbers that are a multiple of 8 always end in 000 in binary.
- Since ZGC needs to use 64-bit colored pointers, it does not support compressed references. So, using an ultra-low latency GC like ZGC has to be weighed against using more memory.
- JVM: keep the heap below 32 GB (jvm-堆内存不要超过32G)
- ZGC: A Scalable Low-Latency Garbage Collector
- An Introduction to ZGC
- Shenandoah: A Low-Pause-Time Garbage Collector
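The 8-byte alignment and the 32 GB heap limit mentioned in this entry are linked by simple arithmetic, sketched below. The sketch assumes a heap base address of zero and uses hypothetical helper names; it is not how HotSpot is implemented, only the shift idea behind compressed oops.

```python
OBJECT_ALIGNMENT = 8                        # objects are padded to a multiple of 8 bytes
SHIFT = OBJECT_ALIGNMENT.bit_length() - 1   # 3, because 8 == 2**3


def compress(address):
    """Drop the three low bits, which are always zero for aligned addresses."""
    if address % OBJECT_ALIGNMENT != 0:
        raise ValueError("address is not 8-byte aligned")
    return address >> SHIFT


def decompress(narrow):
    """Restore the full address by shifting the low zero bits back in."""
    return narrow << SHIFT


# A 32-bit compressed reference can name 2**32 aligned slots of 8 bytes each.
max_addressable = (2 ** 32) * OBJECT_ALIGNMENT
```

`max_addressable` works out to 32 GiB, which is why heaps above that size lose compressed references.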
21: JC Tools
- MpmcArrayQueue
- Java Concurrency Utility with JCTools
- False Sharing
- On modern x86 CPUs, the typical cache line size is 64 bytes. However, some CPUs may have larger cache lines, such as 128 bytes.
- To ensure compatibility with various CPU architectures and avoid false sharing on CPUs with larger cache lines, the ConcurrentCircularArrayQueueL0Pad class is padded with 128 bytes.
20: From the Circle to Epicycles (Part 1) - An animated introduction to Fourier Series
19: JMH: Benchmark Reactive vs Disruptor
- Microbenchmarking with Java
18: JSON is incredibly slow: Here’s What’s Faster!
- Protocol Buffers & Simple Binary Encoding
- JSON, Protobuf, SBE: Benchmark the byte story
- We can make “message processors” faster
- We can make serialization and deserialization faster
- We can make the network transfer faster
17: Sort, sweep, and prune: Collision detection algorithms
- Usually when optimising algorithms, you wanna find redundant or unnecessary work. Then find a way to consolidate that redundancy.
- The sweep-and-prune algorithm is also known as sort-and-sweep.
- Pairs flagged in all dimensions would be considered intersecting.
- Insertion sort has a running time of O(n) at best when the list is already sorted or nearly-sorted, and O(n²) at worst when the list is in reverse.
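The best-case and worst-case behaviour of insertion sort can be observed by counting element shifts. This is a small sketch; the function name and the shift counter are added for illustration.

```python
def insertion_sort(items):
    """Return a sorted copy of items and the number of element shifts performed."""
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Move larger elements one position right until key fits.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts


sorted_result, sorted_shifts = insertion_sort([1, 2, 3, 4, 5, 6])
reversed_result, reversed_shifts = insertion_sort([6, 5, 4, 3, 2, 1])
```

An already sorted list needs no shifts, while a reversed list of n items needs n(n-1)/2. This is why sweep-and-prune benefits from insertion sort when object order changes little between frames.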
16: Deribit volatility (DVOL)
15: How to Implement a FIX Trading Engine in Python
- Financial Information eXchange: The body length is the character count starting at tag 35 (included) all the way to tag 10 (excluded), including trailing SOH delimiters.
- Simple Binary Encoding (SBE) is a codec format specified by the Financial Information eXchange (FIX) Trading Community for use in high-performance trading systems. (The Unreasonable Effectiveness of Simple Binary Encoding (SBE))
- FAST: Primarily designed to reduce the amount of data sent over the network and improve the speed of message delivery, particularly for market data. It utilizes compression techniques to minimize the size of messages, which can include both binary and character-based data.
- FAST: Focuses on the transmission layer and may be used in conjunction with other encoding methods, including SBE.
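The body length rule above can be sketched as follows. The helper `build_fix_message` and the sample field values are hypothetical; the checksum rule used here (sum of all bytes before tag 10, modulo 256, three digits) is the standard FIX trailer rule.

```python
SOH = "\x01"  # FIX field delimiter


def build_fix_message(msg_type, fields, begin_string="FIX.4.2"):
    """Assemble a FIX message and fill in BodyLength (9) and CheckSum (10)."""
    # The body starts at tag 35 and ends with the SOH just before tag 10.
    body = f"35={msg_type}{SOH}" + "".join(
        f"{tag}={value}{SOH}" for tag, value in fields
    )
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    # The checksum covers every byte before the 10= field.
    checksum = sum((head + body).encode("ascii")) % 256
    return f"{head}{body}10={checksum:03d}{SOH}"


message = build_fix_message("0", [(49, "SENDER"), (56, "TARGET")])
```

For this heartbeat the body is `35=0`, `49=SENDER` and `56=TARGET`, each followed by one SOH, which gives a body length of 25.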
14: Bounded MPMC queue
13: Understanding the meaning of lvalues and rvalues in C++
- C++: lvalue/rvalue for Complete Dummies
- Understanding lvalues, rvalues and their references
- C++ rvalue references and move semantics for beginners more,
- Resource acquisition is initialization
- Mark your move constructors and move assignment operators with noexcept
- Further optimizations and stronger exception safety with copy-and-swap idiom
- Perfect forwarding
12: Effective Java! Use Varargs Judiciously
- A possible mitigation strategy to this is, if for example 95% of the callers of the function will include 3 or fewer parameters you can create three functions that take 1, 2, and 3 parameters of the argument type to handle the 95% and a fourth that takes three parameters and a vararg parameter to handle the rest of the 5%.
- var args constructors/methods vs lists
11: Thread-Local Allocation Buffers in JVM
- https://www.baeldung.com/java-jvm-tlab
- Introduction to Thread Local Allocation Buffers (TLAB)
10: org.apache.logging.log4j.util.Unbox.java
- Utility for preventing primitive parameter values from being auto-boxed. Auto-boxing creates temporary objects which contribute to pressure on the garbage collector. With this utility users can convert primitive values directly into text without allocating temporary objects.
- Java Performance Notes: Autoboxing / Unboxing
- private final ThreadLocal<int[]> current = new ThreadLocal<>();: By using ThreadLocal<int[]>, you can store multiple integer values in a single thread-local variable without the overhead of boxing.
9: An exploration of vector search
TODO
8: Calling a @Bean annotated method in Spring java configuration
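Calling a @Bean-annotated method inside a Spring @Configuration class always returns the same bean from the context (for the default singleton scope), because the configuration class is proxied. Note that @Bean methods declared in a plain @Component run in lite mode, where a direct call is an ordinary Java method call.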
7: Exact difference between CharSequence and String in java
6: Maps in Java
5: Embeddings: What they are and why they matter
- How does cosine similarity work? TODO
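As a starting point for the TODO above: cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures the angle between them and ignores magnitude. A minimal sketch with a hypothetical function name:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their lengths.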
4: Fiber in C++: Understanding the Basics
TODO
3: Kernel bypass
2: How to implement a hash table (in C)
1: Linear Algebra Done Right

July

31: TypeToken in Gson
1. Anonymous subclass of TypeToken
2. The Javadoc of the getGenericSuperclass() method of the Class class reads: Returns the Type representing the direct superclass of the entity (class, interface, primitive type or void) represented by this Class. If the superclass is a parameterized type, the Type object returned must accurately reflect the actual type parameters used in the source code. The parameterized type representing the superclass is created if it had not been created before. See the declaration of ParameterizedType for the semantics of the creation process for parameterized types. If this Class represents either the Object class, an interface, a primitive type, or void, then null is returned. If this object represents an array class then the Class object representing the Object class is returned.
30: Vim: insert the same characters across multiple lines
1. VISUAL BLOCK: I - insert before; A - append after; c - replace
2. macro
3. substitute
29: Java Collections Framework
- Exchanger vs SynchronousQueue vs LinkedTransferQueue
28: FutureTask
- public interface RunnableFuture<V> extends Runnable, Future<V>
- public class FutureTask<V> implements RunnableFuture<V>
27: Types of References in Java
- There are four types of references in Java: Strong, Weak, Soft, and Phantom.
- Unlike weak and soft references whereby we can control how objects are garbage-collected, a phantom reference is used for pre-mortem clean-up actions before the GC removes the object.
26: Why does mathematics forbid division by 0 but define the square root of -1? (为什么数学不允许除以0，却定义了根号- 1？)
It explains the origin of complex numbers and the geometric meaning of the complex plane.
24: Exploring TLS certificates and their limits
- How big can a certificate be? curl works with certificates up to about 100 kB, and a web browser like Firefox up to ~60 kB, though Firefox's TLS library works differently, so results may vary.
- How long can it last? From Jan 1, 1950 to Dec 31, 9999. About 8050 years.
25: GCs of JVM tuning: PS+PO VS CMS VS G1
- GC Algos: Mark-Sweep, Copying, Mark-Compact, Generation-Collection
- JVM and GC
- JVM Garbage Collectors
24: Simulating a financial exchange in Scala
- Order Matching Engine Design: TreeMap (Red-Black Tree) vs Priority Queue (Heap)
- matching engine
18: IO Mode
- Blocking I/O model (Blocking / Synchronous): recvfrom
- Signal-driven I/O (Blocking / Synchronous): SIGIO
- Non-Blocking I/O (Non-Blocking / Synchronous): recvfrom (O_NONBLOCK)
- I/O Multiplexing (Blocking / Synchronous): select, poll, epoll
- Asynchronous I/O (Non-Blocking / Asynchronous): AIO
15: Measure Theory
- Power Set of X is the set of all possible subsets of X.
- Definition of sigma algebra
12: covariance of two squared (not zero mean) random variables
How to compute expectation for the product of squared jointly normal random variables
11: Risk System Concepts - Trade Booking & Pricing
- what is a trade and how can it be modelled
- what you need before booking a trade
- what happens when a trade is booked
08: What Is Bootstrapping Statistics?
“Bootstrapping is a statistical procedure that resamples a single data set to create many simulated samples.”
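A small sketch of the resampling procedure in the quote, using only the standard library. The function names, the sample data, and the simple percentile interval are assumptions for illustration, not the article's code.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample data with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(samples, alpha=0.05):
    """Crude (1 - alpha) interval taken from the sorted bootstrap statistics."""
    ordered = sorted(samples)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper


data = [2, 4, 4, 5, 7, 9, 10, 12]  # made-up sample
means = bootstrap_means(data)
low, high = percentile_interval(means)
```

Each simulated sample has the same size as the original and is drawn from it with replacement; the spread of the resulting means estimates the uncertainty of the sample mean.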
04: String Matching KMP
30 lines.
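For reference, a compact KMP sketch that fits in roughly that many lines. The function names are hypothetical; `build_failure` computes, for each prefix of the pattern, the length of its longest proper prefix that is also a suffix.

```python
def build_failure(pattern):
    """Failure table: longest proper prefix that is also a suffix, per position."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Index of the first occurrence of pattern in text, or -1 if absent."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # reuse the matched prefix instead of rescanning text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

The text index never moves backwards, which gives the O(n + m) running time.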
02: An Interactive Intro to CRDTs and Building a collaborative text editor in Go
- CRDT stands for “Conflict-free Replicated Data Type”.
- It’s a kind of data structure that can be stored on different computers (peers). Each peer can update its own state instantly, without a network request to check with other peers.
- Peers may have different states at different points in time, but are guaranteed to eventually converge on a single agreed-upon state. That makes CRDTs great for building rich collaborative apps, like Google Docs and Figma — without requiring a central server to sync changes.
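One of the simplest CRDTs, a grow-only counter, shows the convergence property described above. This sketch is not taken from either article; the class name and peer ids are hypothetical.

```python
class GCounter:
    """Grow-only counter: each peer only ever increments its own slot."""

    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.counts = {}  # peer id -> highest count seen for that peer

    def increment(self, amount=1):
        # A local update, applied instantly without contacting other peers.
        self.counts[self.peer_id] = self.counts.get(self.peer_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so peers converge regardless of merge order or repetition.
        for peer, count in other.counts.items():
            self.counts[peer] = max(self.counts.get(peer, 0), count)


alice, bob = GCounter("alice"), GCounter("bob")
alice.increment(2)
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
bob.merge(alice)  # merging twice changes nothing
```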

June

28: Acquire and Release Semantics
- Acquire semantics -> LoadLoad + LoadStore
- Release semantics -> LoadStore + StoreStore
27: Memory Barriers Are Like Source Control Operations
Types of Memory Barrier/Reordering: LoadLoad, StoreStore, LoadStore, StoreLoad
26: Memory Reordering Caught in the Act
24: mintomic
Mintomic (short for “minimal atomic”) is an API for low-level lock-free programming in C and C++.
23: An Introduction to Lock-Free Programming
Sequential consistency means that all threads agree on the order in which memory operations occurred, and that order is consistent with the order of operations in the program source code.
21: Kiss Linux
Kiss Linux™ is a meta-distribution for the x86_64 architecture with a focus on simplicity, sustainability and user freedom.
20: Story: Redis and its creator antirez
"For antirez, programming was a way to express himself, a form of art. Every character and line break had to be meticulously crafted, akin to the art form of writing. Software development was like writing a book — it had to be beautiful, elegant, and easy to comprehend. If that software happened to be useful to others, that was just a side effect."
- Why is single threaded Redis so fast
19: User-space RCU: Memory-barrier menagerie
The truth is that pairs of memory barriers provide conditional ordering guarantees.
18: new_script
- This is a shell script template generator (i.e. a script that writes scripts).
- Advanced Shell Scripting Techniques: Automating Complex Tasks with Bash
17: Arthas for Java Debugging
16: Java Aeron Framework: A Beginner’s Guide to Unicast Networking with UDP
15: Exploring the Java Aeron Framework: A Comprehensive Introduction to IPC
14: Introduction to Lock-Free Algorithms 101 in Java
- Java Microbenchmark Harness (JMH)
- Maybe Thread.yield() should be used instead of LockSupport.parkNanos(1).
13: Java NIO: Using Memory-mapped file to load big data into applications
- Java supports memory-mapped files, and several popular frameworks use this capability successfully:
- Chronicle queue
- QuestDB
- KDB
- mmap
- kafka -> sendfile
12: Netty Core Principles and RPC in Practice (Netty核心原理剖析与RPC实践)
TODO
11: Build Your Own Redis with C/C++
10: LMAX Disruptor
1. lock-free using volatile / AtomicLong
2. padding to prevent false sharing
3. batching for disk/network writing (e.g., the size of a block is 4k)
4. a zero-garbage path, using byte arrays or immutable object references
5. Disruptor++
6. A brief look at using Disruptor in C++ (disruptor C++ 用法浅析)
7. C++ Disruptor lock-free message queue (c++ disruptor 无锁消息队列)
8. Low Latency Java with the Disruptor
9: Scrambling Eggs for Spotify with Knuth's Fibonacci Hashing
8: Capturing a Java Thread Dump
7: Manipulation with ASM
6: A Simple Explanation of How Shares Move Around the Securities Settlement System
5: A Simple Explanation of Balance Sheets
4: A simple explanation of how money moves around the banking system
3: https://github.com/donnemartin/system-design-primer
Latency numbers every programmer should know:
- L1 cache reference 0.5 ns
- Branch mispredict 5 ns
- L2 cache reference 7 ns
- Mutex lock/unlock 25 ns
- Main memory reference 100 ns
- ...


Last updated 4 months ago
