Intro
Compilers, compiliers, compilers…
The black magic behind producing executable binaries for different kind of processors. All programmers use them, but most of them don’t care about the internals and their differences. Anyway, this post is not about the compiler’s internals though, but how the different versions perform regarding the size that is produced.
I’ve made another benchmark few months ago here, but that was using different compilers (GCC and clang) and different C libraries. Now I’m using only GCC, but different versions.
Size doesn’t matter!
Well, don’t get me wrong here, but sometimes it does. Typical scenario is when you have a small microcontroller with a small flash size and your firmware is getting bigger and bigger. Another scenario is that you need to sacrifice some flash space for the DFU bootloader and then you realize that 4-12K are gone without writing any line of code for you actual app.
Therefore, size does matter.
Compiler Flags
Compilers come with different optimization flags for example the -Os
flag configures the compiler to optimize specifically for size.
“OK, so the binary size matters only when you the -Os!”
No, no, no. The binary size matters whatever optimization flag you use. For example your main need may be to optimize for performance. An example is if you’re using a fast toggle gpio, e.g. implementing a custom bit-banging bus to program and interface an FPGA (like the Xilinx’s selectmap). In this case you may need the -O1/2/3
optimization more than -Os
, but still the size matters because you’re limited in flash space. So, two different compiler versions may have even 1KB difference for the same optimization level and that 1KB may be critical someday to one of your projects!
And don’t forget about the -flto
! This is an important flag if you need size optimization; therefore, all the benchmarks are done with and without this flag also.
Benchmarking
I’ve benchmarked the following 9 different GCC compiler versions:
- gcc-arm-none-eabi-4_8-2013q4
- gcc-arm-none-eabi-4_9-2014q4
- gcc-arm-none-eabi-5_3-2016q1
- gcc-arm-none-eabi-5_4-2016q2
- gcc-arm-none-eabi-5_4-2016q3
- gcc-arm-none-eabi-6_2-2016q4
- gcc-arm-none-eabi-6-2017-q1-update
- gcc-arm-none-eabi-6-2017-q2-update
- gcc-arm-none-eabi-7-2017-q4-major
It turned out that all the GCC6 compilers performed exactly the same; therefore, without reading the release notes I assume that the changes have to do with fixes rather optimizations.
The code I’ve used for the benchmarks is here:
https://github.com/dimtass/stm32f103-usb-periph-expander
This is my next stupid project and it’s not completed yet, but still it compiles and without optimizations creates a ~50KB binary. To use your toolchain, just change the toolchain path in the TOOLCHAIN_DIR
variable in the cmake/TOOLCHAIN_arm_none_eabi_cortex_m3.cmake
file and run ./build.bash
on Linux or build.cmd
on Windows.
Results
These are the results from compiling the code with different compilers and optimization flags.
gcc-arm-none-eabi-4_8-2013q4
flag | size in bytes | size in bytes (-flto) |
---|---|---|
-O0 | 51908 | – |
-O1 | 32656 | – |
-O2 | 31612 | – |
-O3 | 39360 | – |
-Os | 27704 | – |
gcc-arm-none-eabi-4_9-2014q4
flag | size in bytes | size in bytes (-flto) |
---|---|---|
-O0 | 52216 | 56940 |
-O1 | 32692 | 23984 |
-O2 | 31496 | 22988 |
-O3 | 39672 | 31268 |
-Os | 27563 | 19748 |
gcc-arm-none-eabi-5_3-2016q1
flag | size in bytes | size in bytes (-flto) |
---|---|---|
-O0 | 51696 | 55684 |
-O1 | 32656 | 24032 |
-O2 | 31124 | 23272 |
-O3 | 39732 | 30956 |
-Os | 27260 | 19684 |
gcc-arm-none-eabi-5_4-2016q2
flag | size in bytes | size in bytes (-flto) |
---|---|---|
-O0 | 51736 | 55724 |
-O1 | 32672 | 24060 |
-O2 | 31144 | 23292 |
-O3 | 39744 | 30932 |
-Os | 27292 | 19692 |
gcc-arm-none-eabi-5_4-2016q3
flag | size in bytes | size in bytes (-flto) |
---|---|---|
-O0 | 51920 | 55888 |
-O1 | 32684 | 24060 |
-O2 | 31144 | 23300 |
-O3 | 39740 | 30948 |
-Os | 27292 | 19692 |
gcc-arm-none-eabi-6_2-2016q4,gcc-arm-none-eabi-6-2017-q1-update,gcc-arm-none-eabi-6-2017-q2-update
flag | size in bytes | size in bytes (-flto) |
---|---|---|
-O0 | 51632 | 55596 |
-O1 | 32712 | 24284 |
-O2 | 31056 | 22868 |
-O3 | 40140 | 30488 |
-Os | 27128 | 19468 |
gcc-arm-none-eabi-7-2017-q4-major
flag | size in bytes | size in bytes (-flto) |
---|---|---|
-O0 | 51500 | 55420 |
-O1 | 32488 | 24016 |
-O2 | 30672 | 22080 |
-O3 | 40648 | 29544 |
-Os | 26744 | 18920 |
Conclusion
From the results it’s pretty obvious that the -flto
flag makes a huge difference in all versions except GCC4.8 where the code failed to compile at all with this flag enabled.
Also it seems that when no optimizations are applied with -O0
then the -flto
instead of doing size optimization, actually created a larger binary. I have no explain for that, but anyways it doesn’t really matter, because there’s no point in using -flto
at all in such cases.
OK, so now let’s get to the point. Is there any difference between GCC versions? Yes, there is, but you need to see that in different angles. So, for the -Os
flag it seems that the GCC7-2017-q4-major
produces a binary which is ~380 bytes smaller without -flto
and ~550 bytes with -flto
from the second better GCC version (GCC6). That means that GCC7 will save you from changing part to another one with a bigger flash, only if your firmware exceeds the size by those sizes with GCC6. But, what are the changes, right? We’re not talking about 8051 here…
But wait… let’s see what happens with the -O3
though. In this case using the -flto
flag GCC7 creates a binary which is 1KB smaller compared to the GCC6 version. That’s big enough and that may save you from changing to a larger part! Therefore, the size matters also for other optimization levels like the -O3.
This also means that if your code size getting larger and you need the max performance optimization, then the compiler version may be significant.
“So, why not use always the latest GCC version?”
That’s a good question. Well, if you’re writing your software from the scratch now, then probably you should. But if you have an old project which is compiling with an old GCC version, then this doesn’t mean that it will also compile with -Wall
in the newer version. That’s because between those two versions there might be some new warnings and errors that doesn’t allow the build. Hence, you need to edit your code and correct all the warnings and errors. If the code is not that big, then the effort may not be that much; but if the code is large then it means that you may need to spend much time on it. It’s even worse if you’re porting code that is not yours.
Therefore, the compiler version does matter for the binary size for all the available optimization flags and depending your code size and processor you might need to choose between those versions depending your needs.
Have fun!
Comments powered by Disqus.