Leap second causing ksoftirqd and java to use lots of cpu time

Today there was a leap second at 23:59:60 UTC. On one of my systems, this caused a high CPU load starting from around 02h00 GMT+2 (which corresponds with the time of the leap second). ksoftirqd and some java (glassfish) process where using lots of CPU time. This system was running Debian Squeeze with kernel 2.6.32-45. The problem is very easy to fix: just run

# date -s "`date`"

and everything will be fine again. I found this solution on the Linux Kernel Mailing List: http://marc.info/?l=linux-kernel&m=134113389621450&w=2. Apparently a similar problem can happen with Firefox, Thunderbird, Chrome/Chromium, Java, Mysql, Virtualbox and probably other processes.

I was a bit suprised that this problem only happened on this particular machine, because I have several other servers running similar kernel versions.

MegaCLI: useful commands

Recently I installed a server with a Supermicro SMC2108 RAID adapter, which is actually a LSI MegaRAID SAS 9260. LSI created a command line utility called MegaCLI for Linux to manage this adapter. You can download it from their support pages. The downloaded archive contains an RPM file. I installed mc and rpm on Debian with apt-get, and then extracted the MegaCli64 binary (for x86_64) to /usr/local/sbin, and the libsysfs.so.2.0.2 from the Lib_utils RPM to /opt/lsi/3rdpartylibs/x86_64/ (that’s the location where MegaCli64 looks for this library).

Here are some useful commands:

View information about the RAID adapter

For checking the firmware version, battery back-up unit presence, installed cache memory and the capabilities of the adapter:

# MegaCli64 -AdpAllInfo -aAll

View information about the battery backup-up unit state

# MegaCli64 -AdpBbuCmd -aAll

View information about virtual disks

Useful for checking RAID level, stripe size, cache policy and RAID state:

# MegaCli64 -LDInfo -Lall -aALL

View information about physical drives

# MegaCli64 -PDList -aALL

Patrol read

Patrol read is a feature which tries to discover disk error before it is too late and data is lost. By default it is done automatically (with a delay of 168 hours between different patrol reads) and will take up to 30% of IO resources.

To see information about the patrol read state and the delay between patrol read runs:
# MegaCli64 -AdpPR -Info -aALL

To find out the current patrol read rate, execute
# MegaCli64 -AdpGetProp PatrolReadRate -aALL

To reduce patrol read resource usage to 2% in order to minimize the performance impact:
# MegaCli64 -AdpSetProp PatrolReadRate 2 -aALL

To disable automatic patrol read:
# MegaCli64 -AdpPR -Dsbl -aALL

To start a manual patrol read scan:
# MegaCli64 -AdpPR -Start -aALL

To stop a patrol read scan:
# MegaCli64 -AdpPR -Stop -aALL

You could use the above commands to run patrol read in off-peak times.

Migrate from one RAID level to another

In this example, I migrate the virtual disk 0 from RAID level 6 to RAID 5, so that the disk space of one additional disk becomes available. The second command is used to make Linux detect the new size of the RAID disk.

# /usr/local/sbin/MegaCli64 -LDRecon -Start -r5 -L0 -a0
# echo 1 > /sys/block/sda/device/rescan

Extending an existing RAID array with a new disk

./MegaCli64 -LDRecon -Start -r5 -Add -PhysDrv[32:3] -L0 -a0

Create a new RAID 5 virtual disk from a set of new hard drives

First we need to now the enclosure and slot number of the hard drives we want to use for the new RAID disk. You can find them out by the first command. Then I add a virtual disk using RAID level 5, followed by the list of drives I want to use, specified by enclosure:slot syntax.

# MegaCli64 -PDList -aALL | egrep 'Adapter|Enclosure|Slot|Inquiry'
# MegaCli64 -CfgLdAdd -r5'[252:5,252:6,252:7]' -a0

Extending an existing RAID array with a new disk

First check the enclosure device ID and the slot number of the newly added disk with the command above. Then we reconstruct the logical drive, adding the new drive. For a RAID 5 array this command is used:

# MegaCli64 -LDRecon -Start -r5 -Add -PhysDrv[32:3] -L0 -a0

View reconstruction progress

When reconstructing a RAID array, you can check its progress with this command.
# MegaCli64 -LDRecon ShowProg L0 -a0

(replace L0 by L1 for the second virtual disk, and so on)

Configure write-cache to be disabled when battery is broken

# MegaCli64 -LDSetProp NoCachedBadBBU -LALL -aALL

Change physical disk cache policy

If your system is not connected to a UPS, you should disable the physical disk cache in order to prevent data loss.

# MegaCli -LDGetProp -DskCache -LAll -aALL

To enable it (only do this if you have a UPS and redundant power supplies):

# MegaCli -LDGetProp -DskCache -LAll -aALL

More information

http://ftzdomino.blogspot.com/2009/03/some-useful-megacli-commands.html
https://twiki.cern.ch/twiki/bin/view/FIOgroup/DiskRefPerc
http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS
http://kb.lsi.com/KnowledgebaseArticle16516.aspx

Linux performance improvements

Two years ago I wrote an article presenting some Linux performance improvements. These performance improvements are still valid, but it is time to talk about some new improvements available. As I am using Debian now, I will focus on that distribution, but you should be able to easily implement these things on other distributions too. Some of these improvements are best suited for desktop systems, other for server systems and some are useful for both. Continue reading “Linux performance improvements”

Server migration to Debian

Since this afternoon, this server is now running Debian GNU/Linux Squeeze. Just like the previous system, this is a KVM virtual machine running on a HP Proliant DL185G5 host. The host server has always been running Debian. This was my last production system still running Mandriva. I might have forgotten to move over a few things or there might be some breakage somewhere, so let me know if you encounter a problem.

Migration to virtual machine

I just finished migrating this website to a new KVM virtual machine. The virtual machine is running Mandriva Cooker (2010.0) now with kernel 2.6.31-rc9 and the virtio drivers. The host machine is a HP DL185 G5 running Debian Lenny (kernel 2.6.26) and kvm 85.

During the next days, there might be some short downtime now and then while I continue configuring things

Server migration

Since two days, I have merged the main servers used by two research laboratories at work. One server was an old Linux server which really needed a hardware upgrade, and the other one was a Mac Pro machine running a flaky OS X Leopard. The new server is of course running Linux: Debian Lenny.

It was a very interesting experience: working out procedures to migrate the mailboxes (from Dovecot on the Linux server and Cyrus on the Mac server to Cyrus on the new server), finding out how to set up one NIC in two different subnets (especially the routing is a little bit tricky), getting all services hooked up to LDAP and managed by GOSA, getting dhcpd to do exactly what we want in a shared-network set up, and much more.

The new server is a HP DL185 G5 with an AMD Opteron quad core CPU and 8 GB of RAM and hosts two KVM virtual machines, one for public services and another one running internal services. You can visit the two websites, which are also hosted on this machine of course, of the concerned research labs:

Maybe in the not too far away future, I should try to move the services hosted on the underpowered desktop machine running this website, also to a virtual machine…