• Linux,  Work

    Enabling jumbo frames on your network

    Jumbo frames are Ethernet frames with up to 9000 bytes of payload, in contrast to normal frames, which carry up to 1500 bytes of payload. They are useful on fast (Gigabit Ethernet and faster) networks because they reduce the per-frame overhead. Not only do they result in higher throughput, they also reduce CPU usage. To use jumbo frames, your whole network needs to support them. That means that your switch needs to support jumbo frames (they might need to be enabled by hand), and all connected hosts need to support jumbo frames as well. Jumbo frames should also only be used on reliable networks, as the higher payload will make…
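
As a rough illustration of the overhead reduction described above, the following sketch compares wire efficiency for 1500-byte and 9000-byte payloads. It assumes 38 bytes of fixed per-frame overhead (preamble and start delimiter 8, Ethernet header 14, frame check sequence 4, inter-frame gap 12) and no VLAN tag. The function names are made up for this example.

```python
# Per-frame overhead on the wire, assuming no VLAN tag:
# preamble + start delimiter (8) + Ethernet header (14) + FCS (4) + inter-frame gap (12)
FRAME_OVERHEAD = 8 + 14 + 4 + 12  # 38 bytes


def wire_efficiency(payload_bytes: int) -> float:
    """Fraction of wire time that carries payload when every frame is full."""
    return payload_bytes / (payload_bytes + FRAME_OVERHEAD)


def frames_needed(total_bytes: int, payload_bytes: int) -> int:
    """Number of frames needed to carry total_bytes of payload."""
    return -(-total_bytes // payload_bytes)  # ceiling division


if __name__ == "__main__":
    for payload in (1500, 9000):
        print(f"{payload}-byte payload: {wire_efficiency(payload):.2%} efficient, "
              f"{frames_needed(10**9, payload)} frames per 10^9 bytes")
```

Under these assumptions the efficiency gain is only about two percentage points, but the same amount of data needs roughly six times fewer frames, which is probably where most of the CPU savings come from.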

  • Linux,  Work

    FS-CACHE for NFS clients

    FS-CACHE is a system which caches files from remote network mounts on the local disk. It is an easy way to improve performance on NFS clients. I strongly recommend a recent kernel if you want to use FS-CACHE though. I tried this with the 4.9-based Debian Stretch kernel a year ago, and this resulted in a kernel oops from time to time, so I had to disable it again. I’m currently using it again with a 4.19-based kernel, and I have not encountered any stability issues so far. First of all, you will need a dedicated file system where you will store the…

  • Linux,  Work

    Debian Stretch on AMD EPYC (ZEN) with an NVIDIA GPU for HPC

    Recently at work we bought a new Dell PowerEdge R7425 server for our HPC cluster. These are some of the specifications: 2 AMD EPYC 7351 16-core processors; 128 GB RAM (16 DIMMs of 8 GB); Tesla V100 GPU. Our FAI configuration automatically installed Debian Stretch on it without any problem. All hardware was recognized and working. The installation of the basic operating system took less than 20 minutes. FAI also sets up Puppet on the machine. After booting the system, Puppet continues setting up the system: installing all needed software, setting up the Slurm daemon (part of the job scheduler), mounting the NFS4 shared directories, etc. Everything together, the system…

  • Linux,  Work

    Leap second causing ksoftirqd and java to use lots of cpu time

    Today there was a leap second at 23:59:60 UTC. On one of my systems, this caused a high CPU load starting from around 02h00 GMT+2 (which corresponds with the time of the leap second). ksoftirqd and some java (glassfish) processes were using lots of CPU time. This system was running Debian Squeeze with kernel 2.6.32-45. The problem is very easy to fix: just run # date -s "`date`" and everything will be fine again. I found this solution on the Linux Kernel Mailing List: http://marc.info/?l=linux-kernel&m=134113389621450&w=2. Apparently a similar problem can happen with Firefox, Thunderbird, Chrome/Chromium, Java, MySQL, VirtualBox and probably other processes. I was a bit surprised that this problem…

  • Linux,  Work

    MegaCLI: useful commands

    Recently I installed a server with a Supermicro SMC2108 RAID adapter, which is actually an LSI MegaRAID SAS 9260. LSI created a command line utility called MegaCLI for Linux to manage this adapter. You can download it from their support pages. The downloaded archive contains an RPM file. I installed mc and rpm on Debian with apt-get, and then extracted the MegaCli64 binary (for x86_64) to /usr/local/sbin, and the libsysfs.so.2.0.2 from the Lib_utils RPM to /opt/lsi/3rdpartylibs/x86_64/ (that’s the location where MegaCli64 looks for this library). Here are some useful commands: View information about the RAID adapter For checking the firmware version, battery back-up unit presence, installed cache memory and the…

  • Linux,  Work

    Fixing grub-probe error: Couldn’t find PV, check your device.map.

    Today I was getting this error when installing a new kernel on a server running Debian: /usr/sbin/grub-probe: error: Couldn't find PV pv2. Check your device.map. The error can be reproduced by running the update-grub command. The day before, a new RAID disk was added to this server, so I suspected this could be the cause. The file /boot/grub/device.map contained a reference to the first RAID disk as (hd0) but did not contain a reference to the new RAID disk. I ran # ls -l /dev/disk/by-id/ to find out which SCSI ID referred to sdb (the new RAID disk), and then added the following line to device.map: (hd1) /dev/disk/by-id/scsi-3600304800087c4f015fb4f2e4cc7a8e5 Now installing…
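
The lookup done by hand with ls -l /dev/disk/by-id/ can also be scripted. This is only a minimal sketch: the helper name ids_for_device is hypothetical, and it assumes the by-id entries are ordinary symlinks to device nodes, as they are on a typical udev-based system.

```python
import os


def ids_for_device(device: str, by_id_dir: str = "/dev/disk/by-id") -> list[str]:
    """Return the by-id symlink paths that resolve to the given device node."""
    if not os.path.isdir(by_id_dir):
        return []  # no by-id directory on this system
    target = os.path.realpath(device)
    matches = []
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # Keep only symlinks whose final target is the requested device.
        if os.path.islink(link) and os.path.realpath(link) == target:
            matches.append(link)
    return matches


if __name__ == "__main__":
    # /dev/sdb is the new RAID disk in the excerpt above; adjust as needed.
    for link in ids_for_device("/dev/sdb"):
        print(link)
```

A disk usually has several by-id names (and partitions have their own), so the output still needs a human to pick the entry that goes into device.map.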

  • Linux,  Work

    Linux performance improvements

    Two years ago I wrote an article presenting some Linux performance improvements. These performance improvements are still valid, but it is time to talk about some new improvements available. As I am using Debian now, I will focus on that distribution, but you should be able to easily implement these things on other distributions too. Some of these improvements are best suited for desktop systems, others for server systems, and some are useful for both.

  • Linux,  Work

    Improving Mediawiki performance

    Now that I am on the subject of improving performance, I configured some performance improvements for a Mediawiki installation here: Make sure you run the latest Mediawiki version. Mediawiki 1.16 introduced a new localisation caching system which is supposed to improve performance, so you definitely want this to get the best performance. Create a directory where Mediawiki can store the localisation cache (make sure it is writable by your web server). By preference store it on a tmpfs (at least if you are sure it will be big enough to store the cache), and configure it in LocalSettings.php: $wgCacheDirectory = "/tmp/mediawiki"; If /tmp is on a tmpfs, you might add…

  • Linux,  Work

    Improving performance by using tmpfs

    Today I analyzed disk reads and writes on a server with iotop and strace and found some interesting possible optimizations. With iotop you can check which processes are reading from and writing to the disks. I always press the o, p and a keys in iotop so that it only shows me processes doing I/O and so that it will show accumulated I/O instead of the bandwidth. With the left and right arrows I select on which column to sort the list. Once you have identified the processes which are doing a lot of I/O, you can check what they are reading or writing with strace, for example # strace -f -p $PID …

  • Linux,  Work

    DHCPd failover

    Last week, I set up two dhcpd servers in a fail-over configuration. The goal is that when one DHCP server goes down, the other one takes over so that clients don’t lose their network connection. I read different tutorials on the web, such as this one by a fellow blogger and this documentation published by IBM.