Latest news
Downtime for disk replacement
Published by Georgi Sotirov at 2017-12-09 04:01:15 UTC
Last night, while all students were celebrating, I spent my time replacing a
failing Samsung 830 SSD on the server. The disk was in such bad shape that
copying it with dd took about 5 hours,
which is why the server was offline somewhere between 2017-12-08 22:00 EET and
2017-12-09 05:00 EET. The disk started failing at the beginning of September,
but recently the number of reallocated sectors became extremely high and I
started detecting bad sectors in some system files. The read performance had
also dropped, and during the copy it fell to 5 MB/s (!), which explains the
aforementioned slow copy of just 64 GB between the old and the new SSD. The
disk failed after only about 24 000 power-on hours (i.e. about 2 years and
9 months), which is rather strange, but maybe this is the normal life span of
consumer SSDs?
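For anyone who wants to sanity-check the numbers above, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB), that the whole 64 GB was read within the quoted 5 hours, and a 365.25-day year; the function names are made up for this illustration.

```python
def avg_throughput_mb_s(size_gb: float, hours: float) -> float:
    """Average copy speed in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (hours * 3600)


def hours_to_years_months(power_on_hours: float) -> tuple[int, int]:
    """Convert power-on hours to (years, months), rounded to the nearest month."""
    total_months = round(power_on_hours / 24 / 365.25 * 12)
    return divmod(total_months, 12)


if __name__ == "__main__":
    # Figures taken from the post: 64 GB copied in about 5 hours.
    print(f"average copy speed: {avg_throughput_mb_s(64, 5):.1f} MB/s")

    # Figures taken from the post: about 24 000 power-on hours.
    years, months = hours_to_years_months(24_000)
    print(f"power-on time: about {years} years and {months} months")
```

The average over the whole copy comes out at a few MB/s, which is the same order of magnitude as the 5 MB/s mentioned above, and the power-on hours do work out to roughly 2 years and 9 months of continuous operation.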
Anyway, the drive is now replaced with a brand new ADATA SU800 128 GB, which
unfortunately is not yet in the smartctl database (see ticket 954).
The server is back online and fully operational.