Struggling with Linux’ OOM killer when building RPMs

Over the last two weeks, I have created a lot of updated packages for Mandriva 2007.1. I packaged Gnome 2.18.1 and also updated Subversion snapshots of kdepim and kdegraphics: kdepim because it has received a lot of bug-fixing love over the last two months, and kdegraphics because it contains kpdf using new xpdf code, which should be compatible with the PDF 1.6 and 1.7 specifications. In kdepim, they also removed the KitchenSync tool, which offered synchronization with external devices. Apparently it was too buggy to be really useful. I have the impression that KMail is indeed also more stable than in 3.5.6: I have not yet been able to reproduce the hangs I sometimes experienced with 3.5.6.

Compiling kdepim on an AMD64 system seems to require a huge amount of memory (much more than on 32-bit x86). 1 GB of RAM and about 250 MB of swap did not prevent g++ from eating up all of my memory (even when no other services were active!). Unfortunately, this made Linux completely unstable: the hard drive thrashed continuously, and the system was completely unresponsive. I could only stop it by doing a hard reset. I am clearly not the only one who hates this stupid Linux behaviour.

In the Mandriva Cooker channel (irc.freenode.org, #mandriva-cooker), couriousous suggested running:

# echo 2 > /proc/sys/vm/overcommit_memory

The default value is 0: when an application asks for more memory than is available, Linux will still try to allocate it, even if there is a chance that it won’t be available. This seems to be done because some applications ask for more memory than they will really use. If, in the end, no more memory or swap is available, Linux’ OOM killer will kill some (seemingly random) processes to free up memory. By setting overcommit_memory to 2, an allocation fails immediately once the total committed memory would exceed swap plus a percentage of RAM (the overcommit_ratio, 50 by default). The application can then react itself to the fact that not enough memory is available. The result was that instead of bringing my system to its knees, the g++ compiler just exited with the message that I was out of memory. Much nicer! I found a complete technical explanation about memory allocation in Linux on the web. To make this setting the default, I put vm.overcommit_memory = 2 in my /etc/sysctl.conf. I do not understand why it is not the default value: making a system unresponsive for several (tens of) minutes does not seem very friendly…
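To check which mode a machine is currently in before changing anything, something like the following minimal sketch could be used. The function name describe_overcommit_mode and the PROC_VM override are my own illustrative additions, not part of any tool; only the /proc/sys/vm/overcommit_memory file itself comes from the kernel.

```shell
#!/bin/sh
# Describe an overcommit_memory mode number (0, 1 or 2) in words.
# Hypothetical helper, written for this post only.
describe_overcommit_mode() {
    case "$1" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting, no overcommit" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

# PROC_VM is an assumed override so the script can be pointed at a copy
# of the directory; by default it reads the live kernel setting.
PROC_VM="${PROC_VM:-/proc/sys/vm}"
if [ -r "$PROC_VM/overcommit_memory" ]; then
    mode=$(cat "$PROC_VM/overcommit_memory")
    echo "overcommit_memory = $mode: $(describe_overcommit_mode "$mode")"
fi
```

After editing /etc/sysctl.conf, running sysctl -p as root should load the new value without a reboot.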

Update 16 April 2007: It seems that this setting has severe problems as well. On my system with 384 MB RAM, which I use as server and desktop, several applications (Evolution, Tilda, …) randomly crashed because they did not get the memory they wanted. Changing the setting back to 0 made these applications work correctly again. I suppose playing with the overcommit_ratio value, as explained in the article I mentioned above, can improve this behaviour. But anyway, it sucks that such things are so difficult to get right. This should really be working nicely out of the box.
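How much room strict mode leaves can be estimated with a bit of arithmetic. The sketch below assumes the usual rule for overcommit_memory = 2 (limit = RAM × overcommit_ratio / 100 + swap, huge pages ignored) and uses the build machine from this post as the example; the function name commit_limit_mb and the ratio of 80 are illustrative choices, not recommendations.

```shell
#!/bin/sh
# Commit limit in strict mode (overcommit_memory = 2), huge pages ignored:
#   limit = RAM * overcommit_ratio / 100 + swap
# All sizes are in MB; integer arithmetic rounds down.
commit_limit_mb() {
    ram_mb=$1
    swap_mb=$2
    ratio=${3:-50}   # 50 is the kernel default for vm.overcommit_ratio
    echo $(( ram_mb * ratio / 100 + swap_mb ))
}

# The build machine from this post: 1 GB RAM, about 250 MB swap.
echo "ratio 50: $(commit_limit_mb 1024 250) MB"
echo "ratio 80: $(commit_limit_mb 1024 250 80) MB"

# What the running kernel itself reports (Linux only).
if [ -r /proc/meminfo ]; then
    grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo || true
fi
```

With the default ratio, the 1 GB machine could commit only about 762 MB in total, which is less than its physical RAM. That goes some way to explaining why a small desktop runs into failed allocations so quickly in this mode.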

Anyway, in the end I got kdepim compiled on AMD64 by adding some more swap. From now on, I’ll be creating bigger swap partitions in Linux; they can really be useful, even when you think you have enough memory…
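When repartitioning is not an option, a swap file is a common way to add swap temporarily. This is not necessarily what I did here; it is a sketch of the alternative. The helper below only prints the commands so they can be reviewed first. The name swapfile_commands, the path /swapfile and the 1024 MB size are all hypothetical.

```shell
#!/bin/sh
# Print (do not run) the commands for adding a swap file.
# Path and size are illustrative arguments, not recommendations.
swapfile_commands() {
    path=$1
    size_mb=$2
    echo "dd if=/dev/zero of=$path bs=1M count=$size_mb"   # allocate the file
    echo "chmod 600 $path"                                 # keep it private to root
    echo "mkswap $path"                                    # write the swap signature
    echo "swapon $path"                                    # start using it
}

swapfile_commands /swapfile 1024
```

The printed commands have to be run as root. The extra swap disappears at the next reboot unless a matching line is added to /etc/fstab.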

I also built a freetype 2.3.4 RPM for Mandriva. Font rendering on my flat panel is now much nicer compared with freetype 2.3.1, which is included in Mandriva 2007.1! Not that it was ugly before, but it’s a nice surprise to still see such big improvements, especially from a minor update.

8 Replies to “Struggling with Linux’ OOM killer when building RPMs”

  1. Hi Frederik,

    For your specific unresponsive computer, I found that switching from the as scheduler (currently the Mandriva default) to cfq (boot option elevator=cfq) changes this behaviour dramatically. The system is a lot more responsive during heavy disk I/O and lets you do more while a process eats up the memory.

    Maybe we can ask (again!) for this setting, and yours, for the 2008.0 release.

  2. Thanks for your comment.

    Actually I was already using the CFQ IO scheduler, because I use the tmb kernel on most of my machines. It indeed helps to make the system more responsive in “normal” heavy IO situations, but it does not make any difference at all when you are out of memory. The kernel just tries too hard to find free memory and swap, to no avail, and the system becomes totally unresponsive in about 30 seconds.

    CFQ became the default IO scheduler in Linux 2.6.18, so Mandriva 2008 should definitely have it as default :-)

    I also hope Mandriva 2008 will have a new task scheduler, based on the recent work by Ingo Molnar and Con Kolivas: http://lkml.org/lkml/2007/4/13/180

  3. Hi Reinout,

    Thanks for your interest! I already thought this question would come sooner or later.

    Actually my packages are not real backports, as Gnome 2.18.1 has only landed in cooker today. Mine are updated packages based on the 2.18.0 RPMs in 2007.1 and the Gnome 2.18.1 sources, so they are not eligible for backports, and anyway I don’t have a maintainer account which would permit me to post such packages.

    The only alternative I have would be to publish the RPMs in a custom repository. But I’m afraid the world is not waiting for yet another third-party repository which is potentially incompatible with official and other third-party updates and backports. And it would probably cost a lot of bandwidth, and I’m not sure my employer would be happy if I hosted that on this machine :-)

    So I’m not sure. Having updated packages in backports maintained in collaboration with Mandriva maintainers indeed seems to be the best solution. Maybe we should bring the question to the mailing list, whether any Gnome 2.18.x backports are planned.

  4. Just today I read on planet GNOME that 2.18.1 will be made available for OpenSUSE… we don’t want to be left behind do we? :)

  5. Frederik: what font and context did you see the improvements in for freetype 2.3.4 vs. 2.3.1? I just checked it out following your praise, but I didn’t see any difference (I even took a screenshot to compare). This is on a 1440×900 laptop screen.

    On the other hand, it was a Web page that used DejaVu in Firefox (this very blog entry of yours on Planet Mandriva), and I know that might affect the results (DejaVu being optimized for freetype w/o hinting and Firefox’s font rendering apparently being different than other apps’ from what I’d heard before).

    Since I’m already very impressed with font rendering under 2007.1/freetype 2.3.1, I’d be very interested in seeing what improvements were made. Looks like I picked the right time to move to using it for my every-day work!

  6. Hi John,

    I saw it with the standard fonts (so these should be the DejaVu fonts) everywhere in KDE’s interface: in the menus, etc…

    I’m using anti-aliasing with full hinting style and sub-pixel hinting enabled on a TFT panel. On a classic monitor, the difference was almost invisible.

  7. Huh… I use Gnome, but I also have full hinting and sub-pixel turned on, using my laptop’s TFT. I suppose I should check with menus and other native elements (I use DejaVu all over, too) instead of looking at rendering in Firefox.

    Thanks for the info. I’m definitely going to look closely the next time I’m in my cooker setup…
