Monday, June 4, 2012

Building Gentoo Linux with GCC 4.7 and LTO

GCC 4.7 was recently added to portage. It's still hard masked and not keyworded, but it works fairly well. As with almost every new major release of GCC, a few packages fail to build, but the community is already working on it, and most of the reported packages have been fixed. I fixed the ones I ran into myself (kdocker and zsnes) and submitted patches to Gentoo's Bugzilla. Still, if you're using a package that hasn't been fixed yet and no workaround has been posted, you might want to hold off on switching your system to GCC 4.7 as its main compiler. (Just installing 4.7 will have no impact on your system, since the currently active version will remain the default.)

There's one feature in this release that stands out: the much improved link time optimization (LTO). What this does is hold off most of the optimization work until the final link step, so that the optimizer can work across the whole program rather than on individual object files. LTO is activated using the -flto and -fuse-linker-plugin options.

As you can guess, this isn't your usual Gentoo ricer option (see -funroll-loops for that). It's a useful feature that helps the optimizer produce much better results. Most commercial compilers have had this for a long time, but GCC was lagging behind. Well, for now it is a ricer option, since it's unsupported by the Gentoo devs, but I fully expect it to become a supported option at some point as GCC keeps improving it. Building your whole system with LTO was still a hopeless endeavor with GCC 4.6. But in 4.7 it has improved to the point where you can actually emerge -e world with it. And I did. And it works, even.

My Gentoo box is a KDE desktop system, with 1043 packages installed. Out of those 1043, only 33 could not be built with LTO. Quite a step up from GCC 4.6! Last time I experimented with it on 4.6, I gave up after about the 100th package failing to build. Worse, some packages would build but not run. So LTO in GCC 4.7 seems to really have improved a lot. There was only one package that would build but not run correctly (dev-libs/glib).

How to disable LTO

So here's what you do if you want to try this yourself. The most important thing to set up is a way to disable LTO for packages that don't work well with it. Fortunately, package.env makes this easy. First, create the file:

/etc/portage/env/no-lto.conf

With the following contents:

CFLAGS="${CFLAGS} -fno-lto -fno-use-linker-plugin"
CXXFLAGS="${CXXFLAGS} -fno-lto -fno-use-linker-plugin"
LDFLAGS="${LDFLAGS} -fno-lto -fno-use-linker-plugin"
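If you'd rather create that file from a script, something along these lines should do. This is only a sketch: write_no_lto_conf is a name I made up for illustration, and it takes the target directory as an argument so you can try it on a scratch directory first.

```shell
# Hypothetical helper (not part of portage): writes no-lto.conf into a directory.
write_no_lto_conf() {
    # $1: directory that holds package.env snippets (normally /etc/portage/env)
    env_dir="$1"
    mkdir -p "$env_dir" || return 1
    # The quoted heredoc delimiter keeps ${CFLAGS} and friends literal,
    # so portage expands them at build time rather than the shell doing it now.
    cat > "$env_dir/no-lto.conf" <<'EOF'
CFLAGS="${CFLAGS} -fno-lto -fno-use-linker-plugin"
CXXFLAGS="${CXXFLAGS} -fno-lto -fno-use-linker-plugin"
LDFLAGS="${LDFLAGS} -fno-lto -fno-use-linker-plugin"
EOF
}
```

On a real system you would run write_no_lto_conf /etc/portage/env as root.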


You can now use that file for every package that breaks with LTO. You do that by listing the appropriate package atoms, followed by no-lto.conf in /etc/portage/package.env. Here's how that looks on my system:

sys-apps/sysvinit no-lto.conf
dev-lang/perl no-lto.conf
sys-libs/gpm no-lto.conf
dev-libs/elfutils no-lto.conf
>=dev-lang/python-3 no-lto.conf
sys-fs/e2fsprogs no-lto.conf
sys-apps/hdparm no-lto.conf
sys-apps/pciutils no-lto.conf
media-sound/wavpack no-lto.conf
media-libs/libpostproc no-lto.conf
dev-vcs/cvs no-lto.conf
x11-libs/qt-script no-lto.conf
media-libs/alsa-lib no-lto.conf
dev-util/dialog no-lto.conf
sys-apps/hwinfo no-lto.conf
dev-util/valgrind no-lto.conf
app-cdr/cdrtools no-lto.conf
dev-libs/boost no-lto.conf
media-video/libav no-lto.conf
app-text/aspell no-lto.conf
net-misc/nx no-lto.conf
app-text/dvisvgm no-lto.conf
x11-libs/wxGTK no-lto.conf
media-video/mplayer2 no-lto.conf
x11-libs/qt-webkit no-lto.conf
x11-libs/qt-declarative no-lto.conf
x11-base/xorg-server no-lto.conf
dev-tex/luatex no-lto.conf
app-misc/strigi no-lto.conf
kde-base/kdelibs no-lto.conf
kde-base/okular no-lto.conf
net-im/amsn no-lto.conf
sys-devel/llvm no-lto.conf
sys-devel/clang no-lto.conf
app-office/lyx no-lto.conf
dev-libs/glib no-lto.conf
sys-auth/polkit no-lto.conf
net-analyzer/nmap no-lto.conf

Now all of the above packages will be built without LTO. Go ahead and copy the above into your own package.env.

Update: This list is somewhat old now; I suspect many of the above packages have already been fixed or current GCC versions are able to build them without errors.

How to enable LTO

To actually enable LTO, you need to change your make.conf and add -flto -fuse-linker-plugin to your CFLAGS/CXXFLAGS and LDFLAGS. Yes, it's also needed in LDFLAGS. Do not omit it!

In order to speed up LTO-enabled builds, you can pass the number of concurrent jobs to be performed by the linker to the -flto option. So on a quad core CPU, you'd use -flto=4 (with a BFS kernel) or -flto=5 (with a vanilla kernel). In general, use the same value as in your MAKEOPTS variable.
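To keep the two values in sync, you could derive the -flto job count from a MAKEOPTS-style string. The function below is a rough sketch with a made-up name (lto_flag_from_makeopts). It only understands the short -jN / -j N form and falls back to a plain -flto when no job count is found; the long --jobs=N spelling is not handled.

```shell
# Hypothetical helper: turn "-j5" (or "-j 5") into "-flto=5".
lto_flag_from_makeopts() {
    # Pull the number that follows -j out of the string passed as $1
    jobs=$(printf '%s\n' "$1" | sed -n 's/.*-j[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    if [ -n "$jobs" ]; then
        printf '%s\n' "-flto=$jobs"
    else
        # No job count found: plain -flto, no parallel link jobs requested
        printf '%s\n' "-flto"
    fi
}
```

For example, lto_flag_from_makeopts "-j5" prints the flag to paste into CFLAGS and LDFLAGS.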

Note that with LTO, you need to include your optimization flags in LDFLAGS as well. You can use the same optimizations you have in your CFLAGS, though you can use different ones if you really want to.
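Since forgetting -flto in LDFLAGS is an easy mistake to make, a small sanity check can help. This is a minimal sketch, not anything portage provides: has_flag and check_lto_flags are hypothetical names, and the check assumes flags are separated by single spaces.

```shell
# Hypothetical helpers: warn when -flto is in CFLAGS but not in LDFLAGS.
has_flag() {
    # True when flag $2, or "$2=value", appears as a whole word in string $1
    case " $1 " in
        *" $2 "*|*" $2="*) return 0 ;;
        *) return 1 ;;
    esac
}

check_lto_flags() {
    # $1: CFLAGS string, $2: LDFLAGS string
    cflags="$1"
    ldflags="$2"
    if has_flag "$cflags" -flto && ! has_flag "$ldflags" -flto; then
        echo "warning: -flto is in CFLAGS but missing from LDFLAGS" >&2
        return 1
    fi
    return 0
}
```

You would call it with the two strings from your make.conf, for example check_lto_flags "$CFLAGS" "$LDFLAGS".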

Keep in mind that you'll need at least sys-devel/binutils-2.21 for LTO to work correctly. If (for some weird reason) you're still on 2.20, it will not work, since GNU ld versions prior to 2.21 do not support linker plugins (-fuse-linker-plugin).
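If you want to script that check, a version comparison along these lines might be enough. It's a sketch that relies on sort -V from GNU coreutils; version_at_least is a hypothetical name, and getting the installed version (for instance from the first line of ld --version) is left to you.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
version_at_least() {
    # sort -V orders version strings; if the required minimum sorts first
    # (or both are equal), the installed version is new enough.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

So version_at_least 2.22 2.21 succeeds, while version_at_least 2.20.1 2.21 fails.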

For reference, here are the relevant entries from my own make.conf:

CFLAGS="-pipe -mtune=native -march=native -O2 -flto=4 -fuse-linker-plugin -fomit-frame-pointer -floop-interchange -floop-strip-mine -floop-block"
CXXFLAGS="${CFLAGS}"
LDFLAGS="-Wl,--as-needed -Wl,-O1 -Wl,--hash-style=gnu -Wl,--sort-common ${CFLAGS}"

You're now ready to emerge -e @system followed by emerge -e @world. Of course you need to make GCC 4.7 the default compiler first. You do that by using the gcc-config tool.

Dealing with breakage

Since you most probably will have a different set of packages installed on your system compared to me, you might encounter build failures. Those can either be caused by LTO or by GCC 4.7 in general. So if a package fails to emerge, add it to package.env with no-lto.conf and emerge --resume. If that fails again for the same package, then it's probably a GCC 4.7 problem. In that case, you can skip building that particular package with emerge --resume --skipfirst. Don't worry, the package will continue to work even with the rest of the system being built with GCC 4.7. GCC 4.7 is compatible with 4.6 and binaries (including libraries) can be built with one and work OK with the other.
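Adding packages to package.env by hand gets tedious after the fifth failure, so here's a small helper you could use instead. It's a sketch under a few assumptions: no_lto is a made-up name, package.env is a plain file (as in this article) rather than a directory, and the optional second argument lets you point it at a test file.

```shell
# Hypothetical helper: list a package atom with no-lto.conf, once.
no_lto() {
    atom="$1"
    # Defaults to the real file; pass another path to experiment safely
    file="${2:-/etc/portage/package.env}"
    line="$atom no-lto.conf"
    # Do nothing if this exact line is already present
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

After a failed build you could then run something like no_lto sys-apps/gawk followed by emerge --resume (as root).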

It would be nice if you filed a bug for packages that don't build with 4.7 and made that bug block bug 390247. Note: only file bugs for packages that fail with 4.7 without LTO. If a package only fails when LTO is enabled, don't file a bug; the Gentoo devs are not interested in LTO-related bugs at the moment.

Final thoughts

If you want to ask whether the system runs any faster now, I'm not going to answer that. I didn't run any benchmarks, and statements like "this and that program feel faster now" are subjective to begin with and very prone to placebo. Keep in mind that I didn't rebuild the whole system in order to get a faster system. The system was already fast enough. I mainly wanted to test whether LTO is ready for prime time. And it's very close to being ready. The next GCC version will surely improve compatibility even further. One day, package.env might be empty, even :-)

35 comments:

  1. Rather than speed, I am interested in HD space. Have you observed a reduction in the space occupied by the binaries?

    Replies
    1. df -h showed no change, which means that if there was one, it was well under 1GB. (I had 21GB free before, I still have 21GB free.)

    2. I have only compiled gcc so far, and it shows 250 before and 175 after. For me, even 1 GB is a LOT of space. My root partition has only 8 GB (this was a terrible mistake :P)

    3. Installing GCC 4.7 alongside 4.6 will of course take more space.

      You can quickpkg gcc:4.6 with a PKGDIR that points to another partition and then unmerge it.

      Note that GCC itself is always compiled without LTO, regardless of the CFLAGS in your make.conf. Other packages will be somewhat smaller in size when built with LTO.

  2. Hi, nice information! Thanks!

    I would like to ask you something. In a couple of days I will receive a laptop with Ivy Bridge, so I thought I would install Gentoo using gcc-4.7, to optimize it for this CPU. In your explanation above you were talking about emerging the whole system from an existing installation, so if one package failed there would still be another version in the system. What to do if it is a fresh Gentoo install?

    Replies
    1. The only reliable way is to emerge a working base system using an older GCC. After you get that working, switch to GCC 4.7 and "emerge -e --keep-going @world". This is only a bare base system that shouldn't take more than 40 minutes to 1 hour at most to rebuild. After that's done, install the rest (like your desktop environment and all that stuff). For every package that fails, switch to the older GCC (with gcc-config), "emerge -1" that single package, then switch to 4.7 again and continue to emerge your stuff.

      For example, imagine you're trying to emerge "kde-base/kdebase-meta". If one of its deps fails along the way (say, "kwrite"), you do:

      gcc-config 1
      source /etc/profile
      emerge -1 kwrite
      gcc-config 2
      source /etc/profile
      emerge kdebase-meta

      That should work. Downside: it requires you to monitor the emerge process. You can't just leave the machine alone in the corner for 6 hours.

    2. This is the hard way to do it. But you can just add "--keep-going" to your emerge command and get a nice list of the packages that failed to emerge once it finishes. You can then switch compilers, emerge them and switch back.

    3. "--keep-going" won't really work for a first install, because when a dependency is skipped, all packages depending on it can't be emerged anymore.

    4. Do I really have to do the first basic install with an older gcc? Or can I skip rebuilding it with the newer gcc by installing gcc-4.7.1 first and building the basic install with it from the start?

    5. You should be able to do that. But note that you will be building everything on an older base system. The usual practice in Gentoo is to rebuild the whole system toolchain after you've set up your make.conf so that everything will be consistent.

      But again, the base system shouldn't take long to rebuild, especially on an IB CPU :-) It should really be a matter of minutes rather than hours.

    6. Thanks a lot for the comments! I will try all these when the laptop comes next week. Good luck with your blog, so far I find it very interesting. Nine inches... :)

  3. The sad part: one of the reasons -flto needs to be added to LDFLAGS is that a massive number of packages use $(CC) $(LDFLAGS) when linking, omitting $(CFLAGS). This has been discouraged by the upstream GCC folks in their documentation for years, mostly because "For predictable results, you must also specify the same set of options used for compilation when you specify this linker option."

    Hardened has struggled with this for years, since -fPIC/-fPIE is one set of these flags, and because of that also the -nopie. But that is not the only flag I have gotten strange results with when not passed during linking as well.

    Replies
    1. This is the job of the build system used by the package though. I use automake or qmake for my projects, and I always assumed they do the right thing. I suppose this whole situation would improve if the build systems themselves did it correctly.

  4. I just rebuilt my new stage 3 Gentoo with gcc-4.7.1, with just a few problems:

    I needed to add "sys-process/procps no-lto.conf" and "sys-apps/gawk no-lto.conf" to /etc/portage/package.env

    I also had to create a file named "/etc/portage/env/gcc45.conf" with the following contents:

    CC="x86_64-pc-linux-gnu-gcc-4.5.4"
    CXX="x86_64-pc-linux-gnu-g++-4.5.4"
    CFLAGS="-O2 -pipe"
    CXXFLAGS="-O2 -pipe"
    LDFLAGS=""

    and then add to /etc/portage/package.env the following line:

    sys-boot/grub gcc45.conf

    Then the system has compiled and runs ok.

    Right now I am emerging kde, firefox and libreoffice, and there are some other packages that need to be "lto-disabled".

    Once everything has finished successfully and is running, I will post the full list.

    Thank you for the guide.

  5. XBMC needs to have LTO disabled to build - I just reported that issue at http://trac.xbmc.org/ticket/13332

  6. I AM STILL WAITING FOR SABAYON/GENTOO TO RELEASE A BUILT COMPILED WITH GCC 4.7.1 :(

    Can't wait to be officially allowed to create C11 and C++11 Programs!! :) :)

  7. JUST 7 Bugs left cry :*( can't wait...

    https://bugs.gentoo.org/buglist.cgi?quicksearch=+sys-devel%2Fgcc-4.7.1

  8. Boost is about the only thing I can't build with 4.7.2.
    Now, I don't really understand... LTO makes binaries bigger and clunkier, and practically does the same thing that --as-needed does for linking... Or am I just getting it wrong?

    Replies
    1. LTO has nothing in common with --as-needed. Not even conceptually. What LTO does is allow the compiler's optimizer to work across object files, thus performing interprocedural optimization:

      http://en.wikipedia.org/wiki/Interprocedural_optimization

    2. You're right, I was totally off the track. I've been reading up and rebuilding my system. Besides adding awk and removing xorg-server, I'm finding it impossible (gcc 4.7.2 fault) to build libX11.
      Although I'm on Arch, and the error is related, I guess, to -O3 and -ffast-math (yeah, I know, I'm asking for it).

  9. AFAIK you could build most of these packages, if you have enough memory. Even if you don't have 64Gb, setting ulimit -v 16384000 would probably do it. AFAIK GCC checks for limits.

  10. Hello.

    Nice experiment. I just finished rebuilding my Gentoo world with -flto too. I have about 1436 packages on my system, among them kde, texlive, firefox, clang, a few IDEs, a few games, etc. Most of them built successfully; only 27 packages were built without lto. But some large packages that I'd like to optimize failed:

    kde-base/kdelibs
    dev-lang/perl
    www-client/firefox
    dev-lang/spidermonkey
    media-gfx/inkscape
    app-text/texlive-core
    media-video/mplayer
    x11-base/xorg-server

    and others.

    Also, I noticed a package that had -fno-lto in its Makefile, so the number of packages built without lto is probably a bit higher.

    I even had a successful build of the kernel, but it broke my wifi, so my kernel is without lto too :(

    The main problem was that compilation was extremely slow. It took more than twice as long. On my Core2 Duo, a normal rebuild of the system takes about 1.5 days; with lto it took more than 3 days!

    So the increase in compile time was the only effect of enabling lto that I can see with my own eyes ;)

  11. from all i've read in the manpage and experimented with, you should build CFLAGS with -O0 and save the massive optimization steps for the LDFLAGS.

    another detail is that -flto by itself is mostly harmless (it just emits gimple) until you introduce -fwhole-program or -fuse-linker-plugin

    thus, for performance with graphite working hard and lto, here's what I've cobbled together to play with

    CFLAGS         = -flto -O0
    CXXFLAGS     = -flto -O0
    LDFLAGS        = -march=native -flto -fwhole-program -O6 -fuse-linker-plugin -fvariable-expansion-in-unroller -fsplit-ivs-in-unroller -funroll-loops -ftracer -fvect-cost-model -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-distribution -ftree-loop-if-convert -floop-parallelize-all -floop-flatten -fgraphite-identity -floop-block -floop-strip-mine -floop-interchange -fipa-matrix-reorg -fipa-pta -fipa-struct-reorg -fsel-sched-pipelining-outer-loops -fsel-sched-pipelining -fselective-scheduling2 -fselective-scheduling -freschedule-modulo-scheduled-loops -fsched-stalled-insns -fgcse-after-reload -fgcse-las -fgcse-sm -fmodulo-sched -fmerge-all-constants

    Replies
    1. I don't think that's correct. Optimization flags have to match in the compile and link steps. In the compile step, the optimization flags affect GIMPLE generation, which is then used in the link step. The manpage actually shows an example on how to use LTO:

      gcc -c -O2 -flto foo.c
      gcc -c -O2 -flto bar.c
      gcc -o myprog -flto -O2 foo.o bar.o

      -fwhole-program on the other hand is described as a way to use LTO for large programs whose call graphs don't fit in memory (http://gcc.gnu.org/wiki/LinkTimeOptimization, http://gcc.gnu.org/projects/lto/whopr.pdf).

    2. You're right though that without -fuse-linker-plugin the results aren't that great. I updated the article to include that option (and also mention the concurrent jobs option of -flto).

    3. In theory you shouldn't need to use that option as it is enabled by default, provided that you are using LD 2.21+ or the gold linker:

      "This option is enabled by default when LTO support in GCC is enabled and GCC was configured for use with a linker supporting plugins (GNU ld 2.21 or newer or gold)" -GCC 4.7.2 manpage

  12. Does dev-libs/glib still not work correctly with lto enabled? How can I verify whether it works or not?

    Thanks by the way for your article.

    Replies
    1. I'm finished with rebuilding. I'm using ~amd64 with cinnamon and a total of 1086 packages. The following packages won't build with lto:
      app-admin/sudo
      app-emulation/virtualbox
      app-text/rarian
      dev-lang/perl
      dev-lang/python
      dev-lang/ruby
      dev-lang/spidermonkey
      dev-lang/v8
      dev-libs/elfutils
      dev-python/notify-python
      dev-qt/qtopengl
      dev-qt/qtscript
      dev-util/dialog
      dev-util/valgrind
      dev-tex/luatex
      media-libs/alsa-lib
      media-libs/mesa
      media-sound/pulseaudio
      media-video/ffmpeg
      media-video/mplayer
      net-analyzer/nmap
      net-libs/webkit-gtk
      sys-apps/hdparm
      sys-apps/pciutils
      sys-apps/sysvinit
      sys-devel/llvm
      sys-fs/e2fsprogs
      sys-fs/mtools
      sys-libs/gpm
      x11-base/xorg-server
      x11-libs/wxGTK

      And finally gnome-extra/evolution-data-server and/or mail-client/evolution built but don't work correctly with lto.

    2. You'll know if glib doesn't work since your desktop won't start (KDE too, since Qt also makes use of glib.) IIRC, applications using it just segfaulted.

  13. Thank you so much for this article! I've been following many of the GCC improvements over the last year, but it wasn't until I googled "Gentoo GCC 4.7" and found your article that I decided to take the final step and gain the C++11 extensions system wide.

    My list is very similar to Anonymous' list above, so if anyone else decides to go through with this and they actually read the comments, between that list and mine, you shouldn't have to emerge --resume at all. :P

    dev-libs/glib no-lto.conf
    sys-apps/sysvinit no-lto.conf
    dev-lang/perl no-lto.conf
    dev-libs/elfutils no-lto.conf
    sys-apps/pciutils no-lto.conf
    app-admin/sudo no-lto.conf
    dev-lang/v8 no-lto.conf
    net-libs/nodejs no-lto.conf
    dev-libs/atk no-lto.conf
    sys-libs/gpm no-lto.conf
    media-libs/alsa-lib no-lto.conf
    dev-util/dialog no-lto.conf
    dev-lang/spidermonkey no-lto.conf
    app-text/rarian no-lto.conf
    xfce-base/xfconf no-lto.conf
    sys-boot/grub no-lto.conf
    >=dev-lang/python-3 no-lto.conf
    media-video/ffmpeg no-lto.conf
    media-video/mplayer2 no-lto.conf
    dev-python/notify-python no-lto.conf
    sys-devel/llvm no-lto.conf
    x11-base/xorg-server no-lto.conf

    I wanted to mention a number of things that caught me off guard in your article: it isn't made clear that -flto takes the number of threads as an argument. It also isn't obvious that LDFLAGS should take the same CFLAGS used for compilation. I also think it would be beneficial for you and others to have links to the GCC documentation for the flags used (e.g. you have both -mtune and -march, but it's very clear in the GCC docs that march implies mtune, so it's a redundant flag that clutters your CFLAGS var).

    The following link describes several of the flags I see on this page (under "General Optimizer Improvements"): http://gcc.gnu.org/gcc-4.4/changes.html

    Replies
    1. Oops, I thought I was already putting the optimization flags into LDFLAGS as well. Will fix.

      The article does already mention that the -flto option takes the number of jobs as a parameter though.

      Using both -mtune and -march is not redundant from a Gentoo perspective. Some ebuilds strip out the -march option. When that happens, the -mtune option will still apply.

    2. Aha! Fair point! Darn those ebuilds! :P

      Thanks for the great article! I'm finally about to enjoy my new build. I'll let you know if anything crops up. One lingering thought: I haven't run grub-install... and I'm a bit afraid to.. I built grub using LTO and it seems to have worked, but just like the glib bugs.. I don't want to have to reboot and find out I need to pull out my RIP disk. lol

  14. dev-libs/libaio will actually build w/ LTO, but then causes errors:
    /lib64/libaio.so.1: undefined reference to `io_getevents'

    dev-libs/libaio no-lto.conf

  15. Just had to add the following for Chromium:

    www-client/chromium no-lto.conf

  16. Nikos, well done on your article. I am trying to compile my Gentoo with LTO and GCC-4.9.1. I am only confused about the C/LDFLAGS. As I can see above, I should set CFLAGS="-flto=n -fuse-linker-plugin etc" and again LDFLAGS="-flto=n -fuse-linker-plugin". According to http://yuguangzhang.com/blog/enabling-gcc-graphite-and-lto-on-gentoo/ -fuse-linker-plugin is only written in LDFLAGS. So what should I do?
    Should I use both -flto and -fuse-linker-plugin in C/LDFLAGS? Currently everything I try to compile either stops with a message that gcc cannot produce executables (wrong C/LDFLAGS) or stops at linking, ranting about collect2: ld return 2.

    CFLAGS="-march=native -O2 -m3dnow -mtls-dialect=gnu2 -mglibc -m64 -pipe -ftree-parallelize-loops=6 -flto=3 -frename-registers -fomit-frame-pointer "
    CXXFLAGS="${CFLAGS}"
    LDFLAGS="-Wl,--as-needed -Wl,-O1 -Wl,--hash-style=gnu -Wl,--sort-common -Wl,-zcombreloc -fuse-linker-plugin"

    Also, what about "-flto-partition"? Does it speed up compilation or the final code?
