Tricks for faster Cpp

1. Memory Alignment

Misaligned memory access can lead to performance degradation due to additional CPU cycles required to access data. To improve performance, ensure that your data structures are properly aligned in memory. This can be achieved by reordering the members of a struct or using compiler-specific attributes to control alignment. Basically reorder the members of a struct to ensure that larger data types are aligned on their natural boundaries, and smaller data types are placed together to minimize padding. This can reduce the overall size of the struct and improve cache performance.

#include <cstdio>
#include <cstdint>
#include <cstddef>

struct PoorlyAlignedData {
  char c;
  uint16_t u;
  double d;
  int16_t i;
};

struct WellAlignedData {
  double d;
  uint16_t u;
  int16_t i;
  char c;
};

#pragma pack(push, 1)
struct PackedData {
  double d;
  uint16_t u;
  int16_t i;
  char c;
};
#pragma pack(pop)

int main() {
  printf("PoorlyAlignedData c:%lu u:%lu d:%lu i:%lu size:%lu\n",
         offsetof(struct PoorlyAlignedData,c), offsetof(struct PoorlyAlignedData,u), offsetof(struct PoorlyAlignedData,d), offsetof(struct PoorlyAlignedData,i), sizeof(PoorlyAlignedData));
  printf("WellAlignedData d:%lu u:%lu i:%lu c:%lu size:%lu\n",
         offsetof(struct WellAlignedData,d), offsetof(struct WellAlignedData,u), offsetof(struct WellAlignedData,i), offsetof(struct WellAlignedData,c), sizeof(WellAlignedData));
  printf("PackedData d:%lu u:%lu i:%lu c:%lu size:%lu\n",
         offsetof(struct PackedData,d), offsetof(struct PackedData,u), offsetof(struct PackedData,i), offsetof(struct PackedData,c), sizeof(PackedData));
}

2. branchless programming

Avoid mispredicted branches by using arithmetic and bitwise operations instead of conditional statements. This can improve performance, especially in tight loops or performance-critical code because modern CPUs can execute instructions in parallel and avoid pipeline stalls caused by branch mispredictions.

and branchless binary search

3. Composition over Inheritance

Composition can often lead to better performance than inheritance because it allows for more flexible and efficient code. Inheritance can introduce additional overhead due to virtual function calls and the need to maintain a vtable, which can lead to slower execution. Composition, on the other hand, allows you to create complex objects by combining simpler ones without the overhead of inheritance. This can lead to faster execution and better memory usage, especially in cases where the behavior of the composed objects is not tightly coupled.

4. CRTP (Curiously Recurring Template Pattern)

CRTP is a C++ programming technique where a class template takes a derived class as a template parameter. This allows for static polymorphism, which can be more efficient than dynamic polymorphism (using virtual functions) because it avoids the overhead of virtual function calls. CRTP can be used to implement features like static interfaces, mixins, and compile-time polymorphism, which can lead to faster code execution and better optimization opportunities for the compiler.

5. Induction Variable

Induction variable is a variable that is incremented or decremented in a loop. It can be used to optimize loops by reducing the number of instructions executed in each iteration. For example, instead of calculating the value of i * 10 + 12 in each iteration of the loop, we can use an induction variable to keep track of the current value and update it in each iteration. This can lead to faster execution and better performance, especially in cases where the loop is executed many times.

6. Loop Unrolling

Loop unrolling is a technique where the body of a loop is replicated multiple times to reduce the overhead of loop control and increase instruction-level parallelism. By unrolling a loop, you can decrease the number of iterations and the number of branch instructions, which can lead to improved performance. However, it can also increase the code size, so it should be used judiciously.

7. Pointer Aliasing and Strict Alias

Pointer aliasing occurs when two or more pointers refer to the same memory location.

This is because, while it may be obvious to the user, at compile time, the compiler cannot guarantee that two pointer variables in the code will never point to the same memory address. Under those theoretical possible cases of pointer aliasing, some compiler optimizations would change the outcome of code; hence, those optimizations are disabled. For instance, the following code would prevent the compiler from applying loop-invariant code motion.

This can lead to performance degradation because the compiler may not be able to optimize code effectively due to potential side effects. To avoid pointer aliasing, you can use the restrict qualifier in C++ (available in C++11 and later) to indicate that a pointer is the only reference to a particular memory location. This allows the compiler to optimize code more aggressively, leading to improved performance.

An example of type-casting a pointer to be a different object is presented here. This example uses a conversion from double * into uint64_t * and modifies the sign bit using the uint64_t pointer.

Type-casting a pointer to an object to a pointer to a different object violates the strict aliasing rule stating that two pointers of different types cannot point to the same memory location, which really means that it is possible the compiler might not use the same register to store the two different pointers, even though they point to the same address. Remember that the CPU registers are the fastest form of storage available to the processor but are extremely limited in storage capacity. So, when an extra register gets used to store the same variable, it is an inefficient use of the registers and negatively impacts performance overall.

This is nothing more than a convoluted but more efficient method of achieving x = -std::abs(x), but it is to demonstrates how this violates the strict aliasing rule.

8. RVO (Return Value Optimization)

If a function returns an object by value, the compiler can optimize the code by eliminating the need to create a temporary object and directly constructing the return value in the caller's memory. This optimization is known as Return Value Optimization (RVO). By using RVO, you can avoid unnecessary copy or move operations, which can lead to improved performance, especially when returning large objects.

And if a local variable is returned, if

  1. the type of the local object is the same as that returned by the function and

  2. the local object is what’s being returned.

RVO will be applied here too. Check "Item 25: Use std::move on rvalue references, std::forward on universal references." for more details.

9. Strength Reduction

Strength reduction is a term used to describe compiler optimizations where complex operations that are quite expensive are replaced by instructions that are simpler and cheaper to improve performance. A classic example is one in which the compiler replaces operations involving division by some value with multiplication by the reciprocal of that value. Another example would be replacing multiplication by a loop index with an addition operation.

10. Tail Call Optimization

Tail call optimization refers to compiler optimization techniques in which recursive function calls are replaced by loops.

Last updated