GCC compiler size benchmarks

Intro

Compilers, compilers, compilers…

The black magic behind producing executable binaries for different kinds of processors. All programmers use them, but most don’t care about their internals and their differences. This post is not about compiler internals, though, but about how different versions compare in the size of the binary they produce.

I made another benchmark a few months ago, but that one used different compilers (GCC and clang) and different C libraries. This time I’m using only GCC, but different versions.

Size doesn’t matter!

Well, don’t get me wrong here, but sometimes it does. A typical scenario is when you have a small microcontroller with a small flash and your firmware is getting bigger and bigger. Another scenario is that you need to sacrifice some flash space for the DFU bootloader and then you realize that 4-12K are gone without writing a single line of code for your actual app.

Therefore, size does matter.

Compiler Flags

Compilers come with different optimization flags; for example, the -Os flag configures the compiler to optimize specifically for size.

“OK, so the binary size matters only when you use -Os!”

No, no, no. The binary size matters whatever optimization flag you use. For example, your main need may be to optimize for performance, such as when you’re toggling a GPIO fast to implement a custom bit-banging bus that programs and interfaces an FPGA (like Xilinx’s SelectMAP). In this case you may need the -O1/2/3 optimizations more than -Os, but the size still matters because you’re limited in flash space. Two different compiler versions may differ by as much as 1KB at the same optimization level, and that 1KB may be critical someday to one of your projects!

And don’t forget about -flto! This is an important flag if you need size optimization; therefore, all the benchmarks are done both with and without it.
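To make the benchmark matrix concrete, here is a minimal Python sketch that enumerates the ten flag combinations measured in this post: five optimization levels, each with and without -flto. The name flag_matrix is only an illustrative choice and is not part of the project. Keep in mind that GCC expects -flto at both the compile and the link step.

```python
from itertools import product

# The five optimization levels benchmarked in this post.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]


def flag_matrix():
    """Return every (optimization level, LTO on/off) combination as a flag list."""
    combos = []
    for opt, lto in product(OPT_LEVELS, (False, True)):
        flags = [opt]
        if lto:
            # -flto has to be passed when compiling and again when linking.
            flags.append("-flto")
        combos.append(flags)
    return combos


for flags in flag_matrix():
    print(" ".join(flags))
```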

Benchmarking

I’ve benchmarked the following 9 different GCC compiler versions:

  • gcc-arm-none-eabi-4_8-2013q4
  • gcc-arm-none-eabi-4_9-2014q4
  • gcc-arm-none-eabi-5_3-2016q1
  • gcc-arm-none-eabi-5_4-2016q2
  • gcc-arm-none-eabi-5_4-2016q3
  • gcc-arm-none-eabi-6_2-2016q4
  • gcc-arm-none-eabi-6-2017-q1-update
  • gcc-arm-none-eabi-6-2017-q2-update
  • gcc-arm-none-eabi-7-2017-q4-major

It turned out that all the GCC6 compilers performed exactly the same; therefore, without having read the release notes, I assume that the changes between them are fixes rather than optimizations.

The code I’ve used for the benchmarks is here:
https://github.com/dimtass/stm32f103-usb-periph-expander

This is my next stupid project and it’s not completed yet, but it still compiles and, without optimizations, creates a ~50KB binary. To use your own toolchain, just change the toolchain path in the TOOLCHAIN_DIR variable in the cmake/TOOLCHAIN_arm_none_eabi_cortex_m3.cmake file and run ./build.bash on Linux or build.cmd on Windows.
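The post doesn’t say how the sizes were measured. One common approach, assumed here, is the toolchain’s size tool (arm-none-eabi-size), whose default Berkeley-style output has the columns text, data, bss, dec, hex and filename. The sketch below parses one such report and returns text + data as the flash footprint. The function name and the sample numbers are made up for illustration and are not results from this benchmark.

```python
def flash_bytes(size_output: str) -> int:
    """Return text + data from a Berkeley-format `size` report for one file."""
    lines = [ln for ln in size_output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    values = lines[1].split()
    row = dict(zip(header, values))
    # .text lives in flash, and the initial values of .data are stored there too.
    # .bss is zero-initialized RAM, so it takes no flash space.
    return int(row["text"]) + int(row["data"])


# Hypothetical sample report, not a measurement from this post.
SAMPLE = """\
   text    data     bss     dec     hex filename
  20000     480    1024   21504    5400 firmware.elf
"""

print(flash_bytes(SAMPLE))
```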

Results

These are the results from compiling the code with different compilers and optimization flags.

gcc-arm-none-eabi-4_8-2013q4

flag    size in bytes    size in bytes (-flto)
-O0     51908            n/a
-O1     32656            n/a
-O2     31612            n/a
-O3     39360            n/a
-Os     27704            n/a

gcc-arm-none-eabi-4_9-2014q4

flag    size in bytes    size in bytes (-flto)
-O0     52216            56940
-O1     32692            23984
-O2     31496            22988
-O3     39672            31268
-Os     27563            19748

gcc-arm-none-eabi-5_3-2016q1

flag    size in bytes    size in bytes (-flto)
-O0     51696            55684
-O1     32656            24032
-O2     31124            23272
-O3     39732            30956
-Os     27260            19684

gcc-arm-none-eabi-5_4-2016q2

flag    size in bytes    size in bytes (-flto)
-O0     51736            55724
-O1     32672            24060
-O2     31144            23292
-O3     39744            30932
-Os     27292            19692

gcc-arm-none-eabi-5_4-2016q3

flag    size in bytes    size in bytes (-flto)
-O0     51920            55888
-O1     32684            24060
-O2     31144            23300
-O3     39740            30948
-Os     27292            19692

gcc-arm-none-eabi-6_2-2016q4, gcc-arm-none-eabi-6-2017-q1-update, gcc-arm-none-eabi-6-2017-q2-update

flag    size in bytes    size in bytes (-flto)
-O0     51632            55596
-O1     32712            24284
-O2     31056            22868
-O3     40140            30488
-Os     27128            19468

gcc-arm-none-eabi-7-2017-q4-major

flag    size in bytes    size in bytes (-flto)
-O0     51500            55420
-O1     32488            24016
-O2     30672            22080
-O3     40648            29544
-Os     26744            18920

Conclusion

From the results it’s pretty obvious that the -flto flag makes a huge difference in all versions except GCC4.8, where the code failed to compile at all with this flag enabled.
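To put a number on that difference, here is a small Python sketch that computes how much -flto shrinks the -Os binary for each version. The sizes are copied from the results tables above; the helper name lto_saving is just an illustrative choice.

```python
# Sizes in bytes at -Os, copied from the results tables:
# version -> (without -flto, with -flto). GCC 4.8 is omitted because
# it could not build the code with -flto.
OS_SIZES = {
    "4.9-2014q4": (27563, 19748),
    "5.3-2016q1": (27260, 19684),
    "5.4-2016q2": (27292, 19692),
    "5.4-2016q3": (27292, 19692),
    "6.x": (27128, 19468),
    "7-2017-q4": (26744, 18920),
}


def lto_saving(plain: int, lto: int) -> float:
    """Percentage by which -flto shrinks the binary (negative means it grew)."""
    return 100.0 * (plain - lto) / plain


for version, (plain, lto) in OS_SIZES.items():
    saved = plain - lto
    print(f"GCC {version}: {saved} bytes saved ({lto_saving(plain, lto):.1f}%)")
```

For every version listed, the saving at -Os works out to somewhere between 27% and 30% of the binary.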

Also, it seems that when no optimizations are applied (-O0), -flto actually creates a larger binary instead of a smaller one. I have no explanation for that, but it doesn’t really matter anyway, because there’s no point in using -flto at all in such cases.

OK, so now let’s get to the point. Is there any difference between GCC versions? Yes, there is, but you need to look at it from different angles. For the -Os flag, GCC7-2017-q4-major produces a binary which is ~380 bytes smaller without -flto and ~550 bytes smaller with -flto than the second-best GCC version (GCC6). That means GCC7 will save you from switching to a part with a bigger flash only if your firmware built with GCC6 overflows the flash by less than those amounts. But what are the chances, right? We’re not talking about an 8051 here…

But wait… let’s see what happens with -O3, though. In this case, with the -flto flag GCC7 creates a binary which is about 1KB (944 bytes) smaller than the GCC6 one. That’s big enough that it may save you from changing to a larger part! Therefore, size also matters for other optimization levels like -O3. This also means that if your code is getting larger and you need the maximum performance optimization, then the compiler version may be significant.
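The GCC6 versus GCC7 arithmetic can be checked with a few lines of Python. The sizes are copied from the two results tables; the function name savings is only illustrative.

```python
# Sizes in bytes from the results tables: flag -> (without -flto, with -flto).
GCC6 = {"-O0": (51632, 55596), "-O1": (32712, 24284), "-O2": (31056, 22868),
        "-O3": (40140, 30488), "-Os": (27128, 19468)}
GCC7 = {"-O0": (51500, 55420), "-O1": (32488, 24016), "-O2": (30672, 22080),
        "-O3": (40648, 29544), "-Os": (26744, 18920)}


def savings(old, new):
    """Bytes saved by `new` relative to `old` per flag (negative = larger)."""
    return {flag: (old[flag][0] - new[flag][0], old[flag][1] - new[flag][1])
            for flag in old}


for flag, (plain, lto) in savings(GCC6, GCC7).items():
    print(f"{flag}: {plain:+d} bytes without -flto, {lto:+d} bytes with -flto")
```

One value comes out negative: at -O3 without -flto, the GCC7 binary is 508 bytes larger than the GCC6 one, so the newer compiler is not smaller in every configuration.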

“So, why not use always the latest GCC version?”

That’s a good question. Well, if you’re writing your software from scratch now, then you probably should. But if you have an old project which compiles with an old GCC version, that doesn’t mean it will also compile with -Wall in the newer version. Between those two versions there might be new warnings and errors that break the build, so you need to edit your code and correct all of them. If the code is not that big, then the effort may not be that much; but if the code is large, you may need to spend a lot of time on it. It’s even worse if you’re porting code that is not yours.

Therefore, the compiler version does matter for the binary size at all the available optimization levels, and depending on your code size and processor you might need to choose between those versions according to your needs.

Have fun!

This post is licensed under CC BY 4.0 by the author.