Optimization

C++

Tools

An interesting tool to count CPU cycles and see the resulting assembly code with different compilers and optimization levels.

http://gcc.godbolt.org

Intel Compilers

Compiler Auto Vectorization

The Intel compiler has several options for vectorization, selected with the -x flag. To determine what is supported, examine the /proc/cpuinfo file and look for avx or sse entries in the flags field. Note that the Intel compiler will try to vectorize code with SSE2 instructions at optimization levels of -O2 or higher. Disable this by specifying -no-vec.

Multiple levels of vectorization are selected with the -ax flag. This flag generates run-time checks to determine the level of vectorization support on the processor and then chooses the optimal execution path for that processor. It also generates a baseline execution path that is taken if the -ax level of vectorization specified is not supported. The baseline can be defined with the -x flag, with -xSSE4.2 recommended. Multiple -ax flags can be specified to create several options. For example, compile with -axAVX -xSSE4.2. In this case, when run on ww8-node1, the baseline SSE4.2 execution path will be taken; on all other nodes, the AVX execution path will be taken.
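As a rough illustration of the kind of loop the auto-vectorizer handles well, here is a minimal sketch. The function name saxpy and the file name in the comment are hypothetical and chosen only for this example; the flags are the ones mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example kernel: y[i] = a * x[i] + y[i].
// Iterations are independent and memory access is unit-stride,
// which is the pattern auto-vectorizers usually handle best.
//
// Possible compile line with the flags discussed above
// (saxpy.cpp is an assumed file name):
//   icpc -O2 -axAVX -xSSE4.2 -c saxpy.cpp
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether the loop is actually vectorized depends on the compiler version and the flags, so it is worth checking the generated assembly or the vectorization report.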

Optimizing the code

The -vec-report flag generates diagnostic information regarding vectorization to stdout. It takes an optional parameter from 0 to 5 (e.g., -vec-report0), with 0 disabling diagnostics and 5 providing the most detailed diagnostics. You can see which loops are optimized, which are not, and why. The output can be useful to identify possible strategies to get a loop to vectorize.
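To make the report easier to interpret, the sketch below contrasts two loops. The function names are hypothetical. The first loop has a loop-carried dependency and is typically reported as not vectorized; the second has independent iterations and is a good candidate. The exact wording of the diagnostics depends on the compiler version.

```cpp
#include <cstddef>
#include <vector>

// In-place running sum: iteration i reads the value written by
// iteration i - 1, so the iterations cannot simply run in SIMD lanes.
void prefix_sum(std::vector<double>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        v[i] += v[i - 1];
    }
}

// Element-wise scaling: every iteration is independent of the others,
// so the compiler is free to process several elements per instruction.
void scale(std::vector<double>& v, double factor) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= factor;
    }
}
```

When a loop of the first kind shows up in the report, one common strategy is to restructure the algorithm so that the dependency is removed or isolated.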

Guided Auto Parallelization

The GAP feature can help analyze source code and generate advice on how to obtain better performance. In particular, GAP will suggest code changes or compiler options that will lead to better vectorized code. GAP may optionally allow the user to take advantage of the auto-parallelization capability that can generate multithreaded code for independent loop iterations; however, developers are encouraged to use explicit thread parallelism through mechanisms like OpenMP.

The GAP feature can be accessed by adding the -guide flag (optional parameter =1 … =4). The report is printed to stderr, or it can be redirected to a file with the -guide-file=filename option or appended to a file with -guide-file-append=filename. The GAP analysis can be targeted to a specific file, function, or source line with the -guide-opt=specification option.
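The explicit thread parallelism recommended above can be sketched with OpenMP as follows. The function name dot is hypothetical. The pragma takes effect only when OpenMP is enabled at compile time (for example -fopenmp with GCC; recent Intel compilers use -qopenmp); otherwise it is ignored and the loop runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Dot product with an explicitly parallel loop.
// reduction(+:sum) gives each thread a private partial sum and
// combines the partial sums when the loop finishes.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    const long n = static_cast<long>(a.size() < b.size() ? a.size() : b.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += a[static_cast<std::size_t>(i)] * b[static_cast<std::size_t>(i)];
    }
    return sum;
}
```

A signed loop index is used because older OpenMP versions require it. With floating-point data the summation order differs between the serial and parallel runs, so results may differ in the last bits.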

C++ 11

Watch out for mixing ABIs when linking against libraries that are compiled with GCC <5.x, as they don't have the modern ABI (C++11 ABI). Beginning with GCC 5.x, the modern ABI is the default. More info: https://gcc.gnu.org/onlinedocs/libst..._dual_abi.html The modern ABI forbids copy-on-write for std::string and requires std::list to keep track of its size. If your app is multithreaded and you do lots of string manipulation, then -D_GLIBCXX_USE_CXX11_ABI=0 is a good approach for winning back some lost performance.

When upgrading compilers in horrendous legacy applications, -D_GLIBCXX_USE_CXX11_ABI=0 is often a must to begin with.
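To check which ABI a translation unit is actually built with, a small helper like the following can be useful. This is a sketch: the function name libstdcxx_abi is hypothetical, and it relies on libstdc++ defining the _GLIBCXX_USE_CXX11_ABI macro once any standard header has been included.

```cpp
#include <string>  // any libstdc++ header makes the macro visible

// Reports which std::string/std::list ABI this translation unit uses.
// The macro exists only in libstdc++ (GCC 5 and later); with other
// standard libraries the fallback branch is taken.
const char* libstdcxx_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
#  if _GLIBCXX_USE_CXX11_ABI
    return "libstdc++ modern (C++11) ABI";
#  else
    return "libstdc++ old (pre-C++11) ABI";
#  endif
#else
    return "dual-ABI macro not defined (not libstdc++, or GCC older than 5)";
#endif
}
```

Building the same file once with and once without -D_GLIBCXX_USE_CXX11_ABI=0 and printing the returned string should show the two settings. All objects and libraries that exchange std::string or std::list values need to agree on it.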