Programming and Scripting :: Compiler magic



I thought I'd gather some tips here, along with explanations.

Optimization levels:
-O3     Highest
-Os     Like -O2, but also optimizes for size. This is a very good option; smaller programs take less RAM too. The Linux kernel is a good example: the recent 2.6 kernels have an experimental option to turn -Os on. For me it produced a kernel that took an entire MB less RAM! And because this was for a 16 MB machine, it really showed: free RAM right after boot increased from 12 MB to 13 MB (with 3 gettys running).
-O2     The usual level
-O      Lowest optimization level (same as -O1)
-O0     That's O zero. No optimization at all.
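
To see what the levels do to binary size on your own box, something like this rough sketch could be used. It assumes gcc (or whatever $CC points to) is installed. hello.c and the scratch directory are made up for the demo, and the numbers will differ from system to system.

```shell
# Scratch directory and hello.c are hypothetical, just for this demo.
workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("hello"); return 0; }
EOF

CC=${CC:-gcc}
for level in -O0 -O2 -Os; do
    out="$workdir/hello$level"
    if "$CC" "$level" -o "$out" "$workdir/hello.c" 2>/dev/null; then
        # wc -c reports the size of the resulting binary in bytes
        echo "$level: $(wc -c < "$out") bytes"
    else
        echo "$level: build failed (is $CC installed?)"
    fi
done
```

On a program this tiny the differences are small; the gap tends to show up on bigger code bases like the kernel example above.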

Optimizing for a certain processor: tests have shown that this (with -O3) makes programs about 15% faster when run on the CPU they were optimized for. The difference is biggest with i586 chips: because of their unique design, they gain about 20%.
-march=pentium2     Sets minimum processor to pentium2. Also optimizes for it.
-mcpu=pentium2     Optimizes for the P2, but doesn't touch the minimum; i.e. the result still runs on an i386. (On x86, newer GCC versions spell this -mtune and treat -mcpu as deprecated.)
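
As a rough sketch, the two variants could be set up like this before a build. The -mtune spelling assumes a reasonably recent GCC (older releases only know -mcpu on x86). CFLAGS_COMPAT is a made-up variable name, and the ./configure && make line is just a typical build, not taken from any specific project.

```shell
# Variant 1: needs at least a Pentium II to run, and is optimized for it.
CFLAGS="-O2 -march=pentium2"

# Variant 2: still runs on an i386, but is tuned for the P2.
# CFLAGS_COMPAT is a hypothetical name, only here to show the alternative.
CFLAGS_COMPAT="-O2 -march=i386 -mtune=pentium2"

# C++ sources are built with CXXFLAGS, so keep the two in sync.
CXXFLAGS="$CFLAGS"
export CFLAGS CXXFLAGS

echo "CFLAGS=$CFLAGS"
echo "CXXFLAGS=$CXXFLAGS"
# A typical autotools build would pick these up from the environment:
#   ./configure && make
```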

All the above go into CFLAGS/CXXFLAGS. Everything below goes into LDFLAGS.

-Wl,-rpath -Wl,/opt/prog/lib    A linker option that adds /opt/prog/lib to the runtime library search path. This saves you from writing a wrapper script that sets LD_LIBRARY_PATH to that dir just so the app finds its libraries.
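
A minimal sketch of how this might look in practice, plus a way to check afterwards that the path really ended up in the binary. show_rpath is a made-up helper name, /opt/prog/bin/prog is a placeholder, and readelf is assumed to be available (it ships with binutils).

```shell
# /opt/prog/lib is the example path from the post.
LDFLAGS="-Wl,-rpath -Wl,/opt/prog/lib"
export LDFLAGS
echo "LDFLAGS=$LDFLAGS"

# Hypothetical helper: print the run path recorded in a binary.
# Depending on the linker, the entry is tagged RPATH or RUNPATH.
show_rpath() {
    readelf -d "$1" | grep -E 'RPATH|RUNPATH'
}
# Usage after the build (placeholder path):
#   show_rpath /opt/prog/bin/prog
```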

-Wl,-as-needed     This is a really interesting option (more commonly written -Wl,--as-needed). Let's say we have a gtk1 app that only uses libgdk. But gtk-config --libs (which the makefile uses) makes it link against all the gtk1 libs, and startup is slow because all of these libraries get loaded into memory. With this flag, the program is only linked against the libraries it actually uses, resulting in a faster startup and cleaner-looking ldd output. *Warning* this requires binutils 2.17.
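
Here is a small sketch to watch the effect, using libm instead of gtk1 so it needs nothing special installed. It assumes gcc (or $CC) and readelf are present. t.c, needed_libs and the scratch directory are made up for the demo. Some toolchains may already turn the flag on by default, in which case both listings come out the same.

```shell
workdir=$(mktemp -d)
cat > "$workdir/t.c" <<'EOF'
int main(void) { return 0; }
EOF
CC=${CC:-gcc}

# Hypothetical helper: list the libraries a binary asks the loader for.
needed_libs() {
    readelf -d "$1" 2>/dev/null | grep NEEDED || echo "(no NEEDED entries found)"
}

# libm is named on the command line, but the program never calls into it.
# The flag only affects libraries that come after it on the command line.
if "$CC" -o "$workdir/plain" "$workdir/t.c" -lm 2>/dev/null &&
   "$CC" -Wl,--as-needed -o "$workdir/lean" "$workdir/t.c" -lm 2>/dev/null; then
    echo "without the flag:"; needed_libs "$workdir/plain"
    echo "with the flag:";    needed_libs "$workdir/lean"
else
    echo "build failed (is $CC installed?)"
fi
```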

-s     Strips the binary during linking, saving you the time of running strip separately.
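
Roughly, this replaces the two-step build-then-strip routine. A sketch, where prog and prog.c are placeholder names and is_stripped is a made-up helper that relies on how file(1) usually describes ELF binaries:

```shell
# With the flag in LDFLAGS, the linker drops the symbol table itself.
LDFLAGS="-s"
export LDFLAGS
echo "LDFLAGS=$LDFLAGS"

# The two-step equivalent would be (placeholder file names):
#   gcc -o prog prog.c && strip prog
# versus the one-step form:
#   gcc -s -o prog prog.c

# Hypothetical helper: succeeds if file(1) reports the binary as stripped.
# file prints ", stripped" or ", not stripped" for ELF binaries.
is_stripped() {
    file "$1" | grep -q ', stripped'
}
# Usage:  is_stripped prog && echo "prog is stripped"
```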



PS: for compile speed: if you ever build GCC yourself, be sure to do it with profiling optimization. It speeds up C compilation by up to 9%.
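
I assume the profiling optimization here means GCC's profiledbootstrap make target. If so, a build could be sketched like this. build_profiled_gcc is a made-up function name, both arguments are placeholders for your own paths, and restricting the build to C is just to keep it short.

```shell
# Hypothetical wrapper; runs in a subshell so the caller's directory is untouched.
# $1 = unpacked GCC source directory, $2 = install prefix (both placeholders).
build_profiled_gcc() (
    srcdir=$1
    prefix=$2
    # GCC is normally built in a separate directory, not in the source tree.
    mkdir -p gcc-build && cd gcc-build || exit 1
    "$srcdir/configure" --prefix="$prefix" --enable-languages=c || exit 1
    # Builds the compiler, gathers profile data while it compiles itself,
    # then rebuilds using that data.
    make profiledbootstrap || exit 1
    make install
)
# Usage (placeholder paths):
#   build_profiled_gcc /path/to/gcc-source /opt/gcc
```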

Hi Curaga, nice summary.

Quote
Tests have shown


Do you have a ref for that?  Conventional wisdom has it that performance differences using optimizations within the x86 family would usually be much less than this, more like <=4% for most progs, or so I had been led to believe.  But you might have a better source.

http://wnd.katei.fi/gccopt/

He got a 10-16% speed increase in Vorbis enc/dec on 32-bit, and going 64-bit added about 30% on top of that (32-bit unoptimized vs 64-bit optimized).


My own experiences also support a figure of about 15%.

Ok, that ref is consistent with the conventional wisdom.  He got that big improvement with multimedia encoding, which is CPU-intensive.  CPU-intensive progs can be expected to benefit more from optimization than most other progs.

Quote
Optimizing for i686 instead of i386 may speed up processing multimedia contents, but ordinary user never notices any difference.

That doesn't make your post any less handy a summary, btw.