NIC Tuning for HFT

This page covers four interrelated techniques that Linux uses to distribute network packet processing across multiple CPU cores. The goal is to avoid overloading a single CPU, increase throughput, and reduce latency.

1. NIC Interrupt Binding (IRQ Affinity)

What it does: Assigns the hardware interrupt (IRQ) of a network interface card (NIC) to specific CPU cores.

How it works:

When a packet arrives, the NIC raises an interrupt.
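The CPU that handles that interrupt schedules NAPI polling (a softirq), which processes the packet on that CPU or, with RPS, hands it off to another.
By default, an IRQ's affinity mask usually allows any core, and irqbalance (if running) may move interrupts between cores at runtime, which causes cache bouncing and imbalance.
Binding the interrupt to a dedicated core (or set of cores) improves cache locality and makes performance more predictable.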
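Binding an interrupt starts with finding its number. Below is a minimal Python sketch for pulling a NIC's queue IRQs out of the text of /proc/interrupts. It assumes the common line layout: IRQ number, per-CPU counters, controller and trigger columns, and the handler name last. It also assumes a driver that names its vectors <ifname> or <ifname>-<suffix>, such as eth0-TxRx-0. Vector naming is driver-specific, and some drivers name vectors after the PCI device instead, so the matching rule is an assumption. The helper name nic_irqs is hypothetical.

```python
from __future__ import annotations


def nic_irqs(proc_interrupts: str, ifname: str) -> dict[int, str]:
    """Map IRQ number -> handler name for vectors named after ifname.

    Assumes each IRQ line looks like "<irq>: <per-CPU counts> <type...> <name>"
    and that the driver names its vectors "<ifname>" or "<ifname>-<suffix>".
    """
    irqs = {}
    for line in proc_interrupts.splitlines():
        fields = line.split()
        if not fields:
            continue
        irq = fields[0].rstrip(":")
        if not irq.isdigit():
            continue  # CPU header row, or named rows such as NMI and LOC
        name = fields[-1]
        if name == ifname or name.startswith(ifname + "-"):
            irqs[int(irq)] = name
    return irqs
```

On a live system, nic_irqs(open("/proc/interrupts").read(), "eth0") would return the IRQ numbers to pin.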

Configuration:

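IRQ affinity is set manually in one of two ways. You can write a hexadecimal CPU bitmask to /proc/irq/<irq_num>/smp_affinity, or a CPU list such as 2-3 to /proc/irq/<irq_num>/smp_affinity_list. Alternatively, the irqbalance service manages it automatically. For manual pinning, stop irqbalance or exclude those IRQs from it (for example with its --banirq option); otherwise it may overwrite the setting.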
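smp_affinity expects a hexadecimal CPU bitmask, while smp_affinity_list accepts CPU numbers and ranges. The sketch below converts a set of CPU ids into both forms. It assumes the kernel's cpumask syntax of comma-separated 32-bit hex words, most significant word first. The helper names are hypothetical.

```python
from __future__ import annotations


def cpus_to_mask(cpus) -> str:
    """Hex bitmask for smp_affinity: 32-bit words, most significant first, comma-separated."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU id: {cpu}")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")
    words = []
    while mask:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
    words.reverse()
    # The leading word is unpadded; the following words are zero-padded to 8 hex digits.
    return ",".join([f"{words[0]:x}"] + [f"{w:08x}" for w in words[1:]])


def cpus_to_list(cpus) -> str:
    """CPU list for smp_affinity_list, collapsing consecutive ids into ranges ("0-3,8")."""
    ids = sorted(set(cpus))
    if not ids:
        raise ValueError("at least one CPU is required")
    parts = []
    start = prev = ids[0]
    for cpu in ids[1:] + [None]:  # None flushes the final range
        if cpu is not None and cpu == prev + 1:
            prev = cpu
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        if cpu is not None:
            start = prev = cpu
    return ",".join(parts)
```

For example, cpus_to_mask([2]) returns "4". Running echo 4 > /proc/irq/104/smp_affinity as root would therefore pin IRQ 104 (an example number) to CPU 2. Similarly, cpus_to_list([0, 1, 2, 3, 8]) returns "0-3,8" for smp_affinity_list.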

2. RSS – Receive Side Scaling

What it does: A hardware feature of modern NICs that distributes incoming packets among multiple receive queues, each with its own interrupt.

How it works:

The NIC computes a hash, typically a Toeplitz hash over source/destination IP addresses and ports. Its low-order bits index an indirection table that maps the flow to a queue.
Each queue can be bound to a different CPU core (via interrupt binding).
Packets of the same flow always go to the same queue → avoids reordering and preserves per‑flow processing locality.
Balancing is per flow, not per packet: a single heavy flow always lands on one queue and therefore on one CPU.
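The flow-to-queue mapping can be modelled in a few lines of Python. This sketch makes several assumptions:

- the Toeplitz hash from Microsoft's RSS specification (implemented by many NICs);
- the widely published default 40-byte key;
- an IPv4 TCP/UDP 4-tuple as the hash input;
- a 128-entry round-robin indirection table.

A real NIC may use a different key, table size, or hash function. All names are hypothetical.

```python
from __future__ import annotations

import ipaddress

# Widely published default RSS key (40 bytes); NICs are often programmed with another key.
DEFAULT_RSS_KEY = bytes([
    0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2,
    0x41, 0x67, 0x25, 0x3D, 0x43, 0xA3, 0x8F, 0xB0,
    0xD0, 0xCA, 0x2B, 0xCB, 0xAE, 0x7B, 0x30, 0xB4,
    0x77, 0xCB, 0x2D, 0xA3, 0x80, 0x30, 0xF2, 0x0C,
    0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,
])


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set input bit, XOR in the 32-bit key window at that bit."""
    key_bits = len(key) * 8
    if len(data) * 8 > key_bits - 32:
        raise ValueError("input is too long for this key")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def ipv4_l4_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Hash input for IPv4 TCP/UDP: source IP, destination IP, source port, destination port."""
    return (ipaddress.IPv4Address(src_ip).packed + ipaddress.IPv4Address(dst_ip).packed
            + src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big"))


def round_robin_table(num_queues: int, size: int = 128) -> list[int]:
    """Indirection table spreading entries evenly over queues (size assumed a power of two)."""
    return [i % num_queues for i in range(size)]


def rss_queue(flow: tuple[str, str, int, int], table: list[int],
              key: bytes = DEFAULT_RSS_KEY) -> int:
    """Queue for a flow: the low-order bits of the hash index the indirection table."""
    h = toeplitz_hash(key, ipv4_l4_input(*flow))
    return table[h & (len(table) - 1)]
```

The queue depends only on the hash of addresses and ports. Every packet of a flow therefore maps to the same queue, which is also why one heavy flow cannot be split across queues.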

Benefits:

Parallel packet reception, starting in the NIC itself.
Low CPU overhead (hash done in hardware).

Configuration:

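Set the number of queues with ethtool -L eth0 combined <num_queues>. RSS is normally active whenever the NIC uses more than one RX queue.
Inspect the indirection table with ethtool -x eth0 and change it with ethtool -X eth0 (e.g. ethtool -X eth0 equal <num_queues>).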
Queue‑to‑CPU mapping via interrupt affinity or irqbalance.

3. RPS – Receive Packet Steering

What it does: A software implementation of RSS. It is intended for NICs that have only one queue, or for spreading load across more CPUs than there are hardware queues.

How it works:

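Runs in the kernel's receive path, when the driver's NAPI poll passes the packet up the stack (netif_receive_skb or netif_rx), i.e. after the NIC has already DMA'd it into memory.
For each incoming packet, the kernel obtains a flow hash and uses it to pick a CPU from the queue's rps_cpus set. The NIC's RSS hash is reused when the driver provides it; otherwise the kernel computes one over addresses and ports.
The packet is placed on the backlog queue of the target CPU. If that CPU is remote, an inter-processor interrupt (IPI) wakes it to process the backlog in softirq context.
RPS is disabled by default. It needs a kernel built with CONFIG_RPS (the default on SMP systems) and takes effect only once a nonzero CPU mask is written for a receive queue.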
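The CPU-selection step can be sketched as follows. The sketch mirrors the way the Linux RPS code scales a 32-bit flow hash onto the CPU list derived from rps_cpus: index = (hash * len) >> 32. Each CPU's backlog is modelled as a Python deque, and the IPI is only recorded, not sent. It is a simplified model, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque


def rps_map_from_mask(mask: int) -> list[int]:
    """CPU ids whose bits are set in an rps_cpus mask, in ascending order."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def rps_target_cpu(flow_hash: int, rps_map: list[int]) -> int | None:
    """Scale a 32-bit flow hash onto the CPU list: index = (hash * len) >> 32."""
    if not rps_map:
        return None  # RPS not configured for this queue
    return rps_map[((flow_hash & 0xFFFFFFFF) * len(rps_map)) >> 32]


def steer(packets, rps_map: list[int], irq_cpu: int):
    """Distribute (flow_hash, payload) pairs into per-CPU backlogs.

    Returns (backlogs, ipi_targets). Packets kept on irq_cpu need no IPI;
    every remote CPU that received work is recorded as needing one.
    """
    backlogs = defaultdict(deque)
    ipi_targets = set()
    for flow_hash, payload in packets:
        cpu = rps_target_cpu(flow_hash, rps_map)
        if cpu is None:
            cpu = irq_cpu  # no RPS: processed where the NAPI poll runs
        backlogs[cpu].append(payload)
        if cpu != irq_cpu:
            ipi_targets.add(cpu)
    return backlogs, ipi_targets
```

Scaling with a multiply and shift rather than a modulo keeps the selection cheap and spreads hashes evenly over the CPU list.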

Benefits:

Distributes receive processing among CPUs even with a single‑queue NIC.
Can be used together with RSS to spread traffic from each hardware queue to multiple CPUs.

Configuration:

/sys/class/net/<dev>/queues/rx-<n>/rps_cpus – hexadecimal bitmask of the CPUs allowed to process packets received on queue <n> (0, the default, disables RPS for that queue).
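Below is a sketch of applying one mask to every receive queue of a device. It assumes the sysfs layout shown above. The sysfs root is a parameter so the code can be exercised against a scratch directory; writing the real files requires root. set_rps_cpus and cpus_to_mask are hypothetical helpers.

```python
from __future__ import annotations

import os


def cpus_to_mask(cpus) -> str:
    """Comma-separated 32-bit hex words, most significant first (e.g. CPUs 2,3 -> "c")."""
    mask = sum(1 << cpu for cpu in set(cpus))
    if mask == 0:
        raise ValueError("at least one CPU is required")
    words = []
    while mask:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
    words.reverse()
    return ",".join([f"{words[0]:x}"] + [f"{w:08x}" for w in words[1:]])


def set_rps_cpus(dev: str, cpus, sysfs_root: str = "/sys/class/net") -> list[str]:
    """Write the same rps_cpus mask to every rx-<n> queue of dev; return the paths written."""
    mask = cpus_to_mask(cpus)
    queues_dir = os.path.join(sysfs_root, dev, "queues")
    rx_queues = [e for e in os.listdir(queues_dir) if e.startswith("rx-")]
    written = []
    for entry in sorted(rx_queues, key=lambda e: int(e.split("-", 1)[1])):
        path = os.path.join(queues_dir, entry, "rps_cpus")
        with open(path, "w") as f:
            f.write(mask + "\n")
        written.append(path)
    return written
```

For example, set_rps_cpus("eth0", [2, 3]) would write c to every rx-<n>/rps_cpus file of eth0.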

4. RFS – Receive Flow Steering

What it does: Extends RPS by steering packets to the same CPU that is running the application consuming the flow.

How it works:

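RPS alone picks a CPU from the flow hash, which is usually not the CPU where the consuming application runs.
RFS keeps a global flow table (rps_sock_flow_table) that records the CPU on which the application last accessed the socket. The table is updated on calls such as recvmsg() and sendmsg().
Incoming packets for that flow are directed to that CPU, increasing cache hit rates. A per‑queue flow table delays the switch to a new CPU until packets already queued on the old one have been processed, so the flow is not reordered.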
Falls back to RPS if no flow information is available.
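The steering decision can be modelled with a toy table. This Python sketch uses a single global socket-flow table indexed by the low bits of the flow hash, with plain RPS as the fallback. Each slot stores the full hash to detect collisions, a simplification of the kernel's own check. The sketch deliberately omits the per-queue table that prevents reordering when a flow moves between CPUs. RfsTable and its methods are hypothetical names.

```python
from __future__ import annotations


class RfsTable:
    """Toy RFS model: remember where each flow's socket was last read, else fall back to RPS."""

    def __init__(self, sock_flow_entries: int, rps_map: list[int]):
        if sock_flow_entries <= 0:
            raise ValueError("RFS is off when the table has no entries")
        size = 1 << (sock_flow_entries - 1).bit_length()  # round up to a power of two
        self.mask = size - 1
        # Each slot holds (flow_hash, cpu) or None; the full hash detects collisions.
        self.slots = [None] * size
        self.rps_map = rps_map

    def record_read(self, flow_hash: int, app_cpu: int) -> None:
        """Called when the application reads the flow's socket while running on app_cpu."""
        self.slots[flow_hash & self.mask] = (flow_hash, app_cpu)

    def target_cpu(self, flow_hash: int, irq_cpu: int) -> int:
        slot = self.slots[flow_hash & self.mask]
        if slot is not None and slot[0] == flow_hash:
            return slot[1]  # RFS: steer to the consuming application's CPU
        if self.rps_map:  # no flow information: plain RPS hash-based choice
            return self.rps_map[((flow_hash & 0xFFFFFFFF) * len(self.rps_map)) >> 32]
        return irq_cpu  # neither RFS data nor RPS: stay on the IRQ CPU
```

A flow whose socket has not been read yet, or whose slot was taken by another flow, is handled exactly as plain RPS would handle it.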

Benefits:

Better CPU cache utilization → lower latency, higher throughput for CPU‑bound workloads.

Configuration:

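/proc/sys/net/core/rps_sock_flow_entries – size of the global socket flow table (0 disables RFS)
/sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt – size of the per‑queue flow table (0 disables RFS for that queue)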
Also requires RPS to be enabled.
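Sizing the two tables can follow the rule of thumb in the kernel's networking scaling documentation:

- about one global entry per expected concurrent flow, with 32768 suggested for a moderately loaded server;
- rps_sock_flow_entries / N per queue for N receive queues.

The sketch below assumes both values are rounded up to a power of two, as the kernel does when they are written. rfs_sizes is a hypothetical helper.

```python
from __future__ import annotations


def next_pow2(n: int) -> int:
    """Smallest power of two >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def rfs_sizes(expected_flows: int, num_rx_queues: int) -> tuple[int, int]:
    """Return (rps_sock_flow_entries, rps_flow_cnt per queue) for a rule-of-thumb setup."""
    if expected_flows <= 0 or num_rx_queues <= 0:
        raise ValueError("expected_flows and num_rx_queues must be positive")
    entries = next_pow2(expected_flows)  # global socket flow table
    per_queue = next_pow2(max(1, entries // num_rx_queues))  # split across queues
    return entries, per_queue
```

For example, rfs_sizes(32768, 16) gives (32768, 2048). These values could then be applied with sysctl -w net.core.rps_sock_flow_entries=32768 and by writing 2048 to each queue's rps_flow_cnt.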

Summary Comparison

        Technique    | Scope              | Mechanism                                  | Dependency
        -------------+--------------------+--------------------------------------------+---------------------------------
        IRQ Affinity | Hardware interrupt | Bind IRQ to CPU                            | NIC (any)
        RSS          | Hardware           | NIC distributes flows to queues            | NIC must support multiple queues
        RPS          | Software           | Kernel distributes packets after reception | Any NIC, works with single queue
        RFS          | Software           | RPS + steer to application CPU             | RPS enabled, flow table

All four can be used together. RSS spreads packets across queues, and each queue's interrupt is bound to a CPU. RPS then further spreads the workload from those queues, and RFS fine‑tunes steering to the application's CPU.
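Putting the pieces together, the Python sketch below builds the set of file writes that would combine IRQ pinning, RPS, and RFS for one device, without writing anything. RSS is assumed to be already configured with ethtool. The IRQ numbers, CPU choices, and the plan_rx_tuning name are illustrative assumptions.

```python
from __future__ import annotations


def _hex_mask(cpus) -> str:
    """cpumask syntax: comma-separated 32-bit hex words, most significant first."""
    mask = sum(1 << cpu for cpu in set(cpus))
    words = [mask & 0xFFFFFFFF]
    mask >>= 32
    while mask:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
    words.reverse()
    return ",".join([f"{words[0]:x}"] + [f"{w:08x}" for w in words[1:]])


def plan_rx_tuning(dev: str, queue_irqs: list[int], irq_cpus: list[int],
                   rps_cpus: list[int], sock_flow_entries: int = 32768) -> dict[str, str]:
    """Return {path: value} combining IRQ pinning, RPS and RFS; nothing is written.

    queue_irqs[q] is the IRQ of RX queue q (the queues themselves come from RSS,
    configured separately with ethtool). Queue q is pinned to irq_cpus[q % len(irq_cpus)].
    """
    plan = {"/proc/sys/net/core/rps_sock_flow_entries": str(sock_flow_entries)}
    per_queue_flows = max(1, sock_flow_entries // len(queue_irqs))
    for q, irq in enumerate(queue_irqs):
        plan[f"/proc/irq/{irq}/smp_affinity_list"] = str(irq_cpus[q % len(irq_cpus)])
        qdir = f"/sys/class/net/{dev}/queues/rx-{q}"
        plan[f"{qdir}/rps_cpus"] = _hex_mask(rps_cpus)
        plan[f"{qdir}/rps_flow_cnt"] = str(per_queue_flows)
    return plan
```

Keeping the plan separate from the writes makes the intended configuration easy to review before applying it as root. Whether the RPS CPUs should overlap the IRQ CPUs depends on the workload; the example keeps them apart only for readability.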

Below is an ASCII diagram that traces a single network packet from the wire to the application, assuming all four techniques are configured. It highlights where each one (RSS, Interrupt Binding, RPS, RFS) intervenes and what it contributes.

        +---------------------------------------+
        |          1.  INCOMING PACKET          |
        |         (Ethernet frame)              |
        +------------------+--------------------+
                           |
                           v
        +---------------------------------------+
        |  2.  NIC HARDWARE (with RSS)          |
        |      +------------+------------+      |
        |      | Queue 0    | Queue 1    | ...  | <--- RSS: hash(IPs, ports)
        |      | (IRQ 104)  | (IRQ 105)  |      |      → indirection table → queue
        |      +------------+------------+      |
        +------------------+--------------------+
                           | (DMA into memory)
                           v
        +---------------------------------------+
        |  3.  INTERRUPT CONTROLLER             |
        |      (delivers IRQ to CPU)            |
        +------------------+--------------------+
                           | (IRQ)
                           v
        +---------------------------------------+
        |  4.  CPU CORES                        |
        |      +------------+------------+      |
        |      | CPU 0      | CPU 1      | ...  | <--- INTERRUPT BINDING
        |      | handles    | handles    |      |      (smp_affinity)
        |      | IRQ 104    | IRQ 105    |      |
        |      +------------+------------+      |
        +------------------+--------------------+
                           |
                           v
        +---------------------------------------+
        |  5.  DRIVER / NAPI POLL               |
        |      (allocate skb, fetch packet)     |
        +------------------+--------------------+
                           |
                           v
        +---------------------------------------+
        |  6.  RPS (Receive Packet Steering)    |
        |      - reuse NIC hash (or compute it) | <--- RPS: spread load
        |      - hash → candidate CPU X         |      from single queue
        |        (from rps_cpus mask)           |
        +------------------+--------------------+
                           |
                           v
        +---------------------------------------+
        |  7.  RFS (Receive Flow Steering)      |
        |      - consult flow table             |
        |      - override CPU X → CPU Y         | <--- RFS: follow application
        |        (where socket is read)         |      (better cache)
        +------------------+--------------------+
                           |
                           v
        +---------------------------------------+
        |  8.  TARGET CPU (softirq)             |
        |      - enqueue to its backlog         |
        |      - IPI wakes backlog processing   |
        |      - IP stack (IP, TCP/UDP)         |
        +------------------+--------------------+
                           |
                           v
        +---------------------------------------+
        |  9.  SOCKET / APPLICATION             |
        |      (ideally on the same CPU as      |
        |       step 8, via RFS)                |
        +---------------------------------------+

How Each Technique Impacts the Flow

        Technique    | Location                   | Role
        -------------+----------------------------+------------------------------------------------------
        RSS          | NIC hardware               | Splits incoming flows into separate hardware queues.
                     |                            | Enables parallel DMA + IRQs.
        IRQ Binding  | Interrupt controller / CPU | Pins each queue’s IRQ to a specific core. Prevents
                     |                            | interrupts from bouncing between CPUs.
        RPS          | Kernel (driver RX path)    | Software‑based spreading: moves packets from the IRQ
                     |                            | CPU to another CPU’s backlog.
        RFS          | Kernel (flow table)        | Refines RPS by steering packets to the same CPU that
                     |                            | is running the consuming app.

Together they allow a modern system to:

Absorb high packet rates, e.g. from a 100 GbE NIC, with fewer drops (RSS + IRQ binding).
Distribute software processing evenly (RPS).
Keep data hot in the CPU cache (RFS).

Note: RPS and RFS are only active when explicitly configured; otherwise packets are processed entirely on the CPU that handled the IRQ.
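To check whether RPS and RFS are actually configured, one can read the same files back. This sketch treats RPS as active for a queue when its rps_cpus mask is nonzero. It treats RFS as active when RPS is on and both rps_flow_cnt and rps_sock_flow_entries are nonzero, matching the dependencies described above. The function name and the root-directory parameters are assumptions added so the code can be tested outside a live system.

```python
from __future__ import annotations

import os


def rx_steering_status(dev: str, sysfs_root: str = "/sys/class/net",
                       procfs_root: str = "/proc") -> dict[str, dict[str, bool]]:
    """Per RX queue, report whether RPS and RFS are configured.

    RPS counts as on when rps_cpus is nonzero; this sketch counts RFS as on only
    when RPS is on and both rps_flow_cnt and rps_sock_flow_entries are nonzero.
    """
    def read(path: str, default: str = "0") -> str:
        try:
            with open(path) as f:
                return f.read().strip() or default
        except FileNotFoundError:
            return default  # e.g. kernel built without CONFIG_RPS

    sock_entries = int(read(os.path.join(procfs_root, "sys/net/core/rps_sock_flow_entries")))
    queues_dir = os.path.join(sysfs_root, dev, "queues")
    status = {}
    for entry in sorted(os.listdir(queues_dir)):
        if not entry.startswith("rx-"):
            continue
        qdir = os.path.join(queues_dir, entry)
        rps_mask = int(read(os.path.join(qdir, "rps_cpus")).replace(",", ""), 16)
        flow_cnt = int(read(os.path.join(qdir, "rps_flow_cnt")))
        rps_on = rps_mask != 0
        status[entry] = {"rps": rps_on, "rfs": rps_on and flow_cnt > 0 and sock_entries > 0}
    return status
```

On a live system, rx_steering_status("eth0") reads the real sysfs and procfs files, which are world-readable, so no root access is needed.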
