NIC Tuning for HFT
This page covers four interrelated Linux techniques for distributing network packet processing across multiple CPU cores. The goals are to avoid overloading a single core, increase throughput, and reduce latency.
1. NIC Interrupt Binding (IRQ Affinity)
What it does: Assigns the hardware interrupt (IRQ) of a network interface card (NIC) to specific CPU cores.
How it works:
When a packet arrives, the NIC raises an interrupt.
The CPU that handles that interrupt then processes the packet (or hands it off).
By default, interrupts may be handled by any core, causing cache bouncing and imbalance.
Binding the interrupt to a dedicated core (or set of cores) improves cache locality and predictable performance.
Configuration:
IRQ affinity is set via /proc/irq/<irq_num>/smp_affinity (a CPU bitmask) or managed by the irqbalance service.
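A minimal sketch of manual pinning, assuming the queue's interrupt appears in /proc/interrupts under the name eth0-TxRx-0 and that CPU 2 is the target core (both the interface and queue names vary by driver):

    # Find the IRQ number of the queue (name is an assumption; check /proc/interrupts)
    IRQ=$(grep 'eth0-TxRx-0' /proc/interrupts | awk -F: '{print $1}' | tr -d ' ')
    # Pin it to CPU 2: bit 2 set -> hex bitmask 0x4
    echo 4 > /proc/irq/$IRQ/smp_affinity
    # Equivalent, using the human-readable CPU-list interface
    echo 2 > /proc/irq/$IRQ/smp_affinity_list
    # A running irqbalance daemon may rewrite these masks, so stop it
    # (or configure it to ignore the NIC's IRQs) for manual pinning to stick
    systemctl stop irqbalance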
2. RSS – Receive Side Scaling
What it does: A hardware feature of modern NICs that distributes incoming packets among multiple receive queues, each with its own interrupt.
How it works:
The NIC uses a hash (typically over IP addresses and ports) to map each flow to a queue.
Each queue can be bound to a different CPU core (via interrupt binding).
Packets of the same flow always go to the same queue → avoids reordering and preserves per‑flow processing locality.
Only flows are balanced, not individual packets.
Benefits:
Parallel packet reception from the NIC directly.
Low CPU overhead (hash done in hardware).
Configuration:
Enabled via ethtool:
ethtool -L eth0 combined <num_queues>
Queue-to-CPU mapping is then done via interrupt affinity or irqbalance.
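For example, assuming an interface named eth0 and a target of 8 combined queues (check the hardware maximum first):

    # Show supported and currently configured channel (queue) counts
    ethtool -l eth0
    # Use 8 combined RX/TX queues
    ethtool -L eth0 combined 8
    # Inspect the RSS indirection table, then spread it evenly across the 8 queues
    ethtool -x eth0
    ethtool -X eth0 equal 8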
3. RPS – Receive Packet Steering
What it does: A software equivalent of RSS, useful when the NIC has only a single queue or to spread load beyond what the hardware queues provide.
How it works:
Works at the driver layer, after the NIC has received the packet.
For each incoming packet, a hash is computed (similar to RSS) to decide which CPU should process it.
The packet is placed on the backlog queue of the target CPU, and a softirq is raised on that CPU.
Requires RPS to be enabled and CPU masks to be defined.
Benefits:
Distributes receive processing among CPUs even with a single‑queue NIC.
Can be used together with RSS to spread traffic from each hardware queue to multiple CPUs.
Configuration:
/sys/class/net/<eth0>/queues/rx-<n>/rps_cpus – CPU mask of cores allowed to process packets arriving on this queue.
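A minimal sketch, assuming a single-queue NIC named eth0 and CPUs 0-3 as the processing set:

    # Allow CPUs 0-3 (mask 0xf) to process packets from RX queue 0 of eth0
    echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
    # A mask of 0 (the default) means RPS is disabled for the queue
    cat /sys/class/net/eth0/queues/rx-0/rps_cpus
    # Optionally enlarge the per-CPU backlog if drops show up in /proc/net/softnet_stat
    sysctl -w net.core.netdev_max_backlog=16384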
4. RFS – Receive Flow Steering
What it does: Extends RPS by steering packets to the same CPU that is running the application consuming the flow.
How it works:
RPS alone assigns each flow to a CPU based purely on the hash, regardless of where the consuming application runs.
RFS tracks the CPU on which a socket is being read (via the kernel’s flow table).
Incoming packets for that flow are directed to that CPU, increasing cache hit rates.
Falls back to RPS if no flow information is available.
Benefits:
Better CPU cache utilization → lower latency, higher throughput for CPU‑bound workloads.
Configuration:
/proc/sys/net/core/rps_sock_flow_entries – global flow table size.
/sys/class/net/<eth0>/queues/rx-<n>/rps_flow_cnt – per-queue flow count.
RFS also requires RPS to be enabled.
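A minimal sketch, again assuming an interface named eth0 with 8 RX queues; the sizes below are common starting points, not required values:

    # Global socket flow table: 32768 entries
    echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
    # Per-queue flow count: global entries divided by the number of queues (32768 / 8)
    for q in /sys/class/net/eth0/queues/rx-*; do
        echo 4096 > "$q/rps_flow_cnt"
    done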
Summary Comparison
| Technique | Level | Mechanism | Requirements |
| --- | --- | --- | --- |
| IRQ Affinity | Hardware interrupt | Bind IRQ to CPU | Any NIC |
| RSS | Hardware | NIC distributes flows to queues | NIC must support multiple queues |
| RPS | Software | Kernel distributes packets after reception | Any NIC, works with a single queue |
| RFS | Software | RPS + steer to application CPU | RPS enabled, flow table |
All four can be used together: RSS spreads packets across queues, each queue interrupt is bound to a CPU, RPS further spreads the workload from those queues, and RFS fine‑tunes steering to the application’s CPU.
Below is an ASCII diagram that traces a single network packet from the wire to the application. It highlights where each of the four techniques (RSS, Interrupt Binding, RPS, RFS) intervenes and what they contribute.
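    Wire
      |
      v
    +---------------------------------+
    | NIC                             |
    |  RSS hash(flow) -> RX queue N   |   (1) RSS: flow -> hardware queue
    +----------------+----------------+
                     |  DMA + IRQ of queue N
                     v
    +---------------------------------+
    | CPU bound to queue N's IRQ      |   (2) IRQ binding: queue IRQ -> fixed core
    |  driver / NAPI poll             |
    +----------------+----------------+
                     |  optional software steering
                     v
    +---------------------------------+
    | Target CPU: backlog + softirq   |   (3) RPS: hash -> chosen CPU
    |  (RFS overrides the choice with |   (4) RFS: prefer the CPU reading
    |   the consuming app's CPU)      |       the consuming socket
    +----------------+----------------+
                     |
                     v
    Socket receive queue -> Application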
How Each Technique Impacts the Flow
| Technique | Where it acts | Contribution |
| --- | --- | --- |
| RSS | NIC hardware | Splits incoming flows into separate hardware queues; enables parallel DMA + IRQs. |
| IRQ Binding | Interrupt controller / CPU | Pins each queue's IRQ to a specific core; prevents interrupts from bouncing between CPUs. |
| RPS | Kernel (driver RX path) | Software-based spreading: moves packets from the IRQ CPU to another CPU's backlog. |
| RFS | Kernel (flow table) | Refines RPS by steering packets to the same CPU that is running the consuming app. |
Together they allow a modern system to:
Receive packets from a 100 GbE NIC without dropping (RSS + IRQ binding).
Distribute software processing evenly (RPS).
Keep data hot in the CPU cache (RFS).
Note: RPS and RFS are only active when explicitly configured; otherwise packets are processed entirely on the CPU that handled the IRQ.
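A quick way to verify where the work actually lands (the interface name eth0 is an assumption):

    # Per-CPU interrupt counts for the NIC's queues; the pinned cores should grow, others stay flat
    watch -n 1 'grep eth0 /proc/interrupts'
    # Per-CPU softirq statistics: one row per CPU, hex counters (processed, dropped, time_squeeze, ...)
    cat /proc/net/softnet_stat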