Like many Linux users, I'm fairly paranoid about backups. I back up all of the important data on my desktop and laptop to the RAID on my main fileserver. I have several offline backup disks which I keep unplugged most of the time. Once or twice a week, I insert one into the trayless hot-swappable SATA bay on my server and run a backup to it.
Linux has always had great support for hot-swapping drives. However, there is one problem: if a new drive is inserted too quickly after the old one is removed, the error handler may limit the link speed to 1.5 Gbps:
[7270695.660961] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[7270695.661416] ata4: irq_stat 0x00000040, connection status changed
[7270695.662327] ata4: SError: { RecovComm PHYRdyChg CommWake DevExch }
[7270695.663518] ata4: limiting SATA link speed to 1.5 Gbps
[7270695.663524] ata4: hard resetting link
[7270705.684030] ata4: softreset failed (device not ready)
[7270705.684498] ata4: hard resetting link
[7270712.632035] ata4: SATA link up 1.5 Gbps (SStatus 123 SControl 300)
[7270712.640555] ata4.00: ATA-7: Hitachi HDS721010KLA330, GKAOAB0A, max UDMA/133
[7270712.640560] ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[7270712.641499] ata4.00: configured for UDMA/133
[7270712.656024] ata4: EH complete
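If you want to check whether your own system has hit this, one option is to scan the kernel log for the same two messages shown above. The following is only a rough sketch, not part of the module: the function name find_limited_links and the script name used in the usage note are made up for illustration, and the regular expressions assume the message wording matches the log excerpt above, which may differ between kernel versions.

```python
import re
import sys

# Patterns based on the messages in the log excerpt above (assumed wording).
LIMIT_RE = re.compile(r"(ata\d+): limiting SATA link speed to ([\d.]+) Gbps")
LINK_UP_RE = re.compile(r"(ata\d+): SATA link up ([\d.]+) Gbps")


def find_limited_links(log_text):
    """Return a dict keyed by port name (e.g. "ata4") for every port whose
    speed was limited by the error handler. Each value records the limit
    and the speed reported by the most recent "SATA link up" message seen
    after the limit (None if the link has not come back up yet)."""
    limited = {}
    for line in log_text.splitlines():
        match = LIMIT_RE.search(line)
        if match:
            limited[match.group(1)] = {
                "limit": float(match.group(2)),
                "link_up": None,
            }
            continue
        match = LINK_UP_RE.search(line)
        if match and match.group(1) in limited:
            limited[match.group(1)]["link_up"] = float(match.group(2))
    return limited


if __name__ == "__main__":
    # Reads the kernel log from standard input.
    for port, info in find_limited_links(sys.stdin.read()).items():
        print(f"{port}: limited to {info['limit']} Gbps, "
              f"link up at {info['link_up']} Gbps")
```

Saved as, say, check_sata_limit.py (a hypothetical name), it could be run as `dmesg | python3 check_sata_limit.py`. Ports that never logged the "limiting SATA link speed" message are not reported.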
For the longest time, the only solution was to unplug the drive, wait 5 minutes, then plug it back in. I found the function responsible for limiting the speed (sata_down_spd_limit), but since Ubuntu kernels build libata into the base kernel image rather than as a loadable module, changing it would mean patching and recompiling the whole kernel every time it updates.
Today, I discovered that Linux has a feature called kretprobe, which lets a loadable module hook the entry and return of a function in a running kernel, with no recompile or reboot. Using this, I created a module which patches libata to prevent it from limiting SATA link speeds after link errors occur. I've tested it on all three of my systems with no problems.
If you have problems with link speeds after hot-swapping, you can download the module from here.
1 comment:
You didn't say which kernel version you were running. I am experiencing the same problem. Since you mentioned a five-minute wait, I figured I would try it. At least on my 2.6.32.60 kernel, it didn't work, even with a delay of about 15 minutes.