Hard disk health check
Sometimes, hard disks fail, often inexplicably. A hard disk drive is full of moving parts, and mechanical components like these have a limited lifespan. Unfortunately, the life expectancy of a hard disk drive is not as clear-cut as that of a carton of milk with an expiry date on the side. At the time of manufacture, the failure date of a hard disk cannot be predicted to the day, or even the year. It is inevitable that a hard disk will fail at some point within the next zero to one hundred years. That’s not a particularly helpful timeframe, so it’s important to check the health of a hard disk regularly, to predict failures and be prepared for them.
There are many factors that might cause a hard disk to fail. A failure could be caused by a power surge or outage, overheating, sudden rapid vibration, corrupt files or malware.
Data centre protection
The Fasthosts data centres are designed and built to combat these external factors. The climate of our data centres is controlled by multiple redundant computer room air conditioning (CRAC) units that keep the temperature optimal at all times. Over six thousand dedicated servers running 24/7 expel a lot of heat, so a lot of design and technology goes into controlling the temperature and keeping the servers cool. Humidity is monitored constantly so that the servers are always running in the right conditions.
Uninterruptible power supply
The data centres are supported by an uninterruptible power supply. A series of batteries and diesel generators ensure that in the freak case of a power outage, the power to the data centre is switched to the backup supplies with absolutely no interruption. Power and heat conditions are managed and monitored 24/7 by technology systems and a team of expert engineers.
The data centres are meticulously designed and managed to ensure our dedicated servers are running in optimal conditions, and the dedicated servers themselves are also designed with hard disk health in mind. Inside the chassis of our dedicated servers, hard disks are installed with rubber dampeners that absorb vibrations and movements that could be harmful to the hard disk. There are also small, powerful fans inside the chassis that keep a constant flow of cold air passing over the hard disk and other vital components, cooling them down.
Health check software
We do a lot to ensure that the conditions are perfect for the dedicated servers and their hard disks, but there are extra measures that customers can take.
As part of regular server management, customers should utilise tools like HDSentinel to perform consistent health checks on their dedicated servers. A hard disk failure can sometimes be predicted if a machine starts to run significantly slower, or if data transfer speed is impacted, but often there are no warning signs that a hard disk failure is imminent.
In mechanics and hardware things can break. As a basic example, you may think your car’s engine is working fine right up until the second you end up broken down on the hard shoulder at the side of the M1 at 11 o’clock at night. That’s why cars need regular services and MOTs. In a way, hard disks and servers work similarly. It’s important to regularly check on the health of a hard disk, or you’ll end up on a metaphorical hard shoulder wondering where the metaphorical smoke is coming from.
Percentage health and reports
Hard disk monitoring software like HDSentinel gives you a better insight into the health of your hard disk. Health is displayed as an intuitive percentage out of 100. This figure is based on the estimated remaining lifetime and on SMART (Self-Monitoring, Analysis and Reporting Technology) data, which is recorded by monitoring built into the hard disk at the time of manufacture. HDSentinel presents this data visually and textually, and shows an estimated remaining lifetime in days and hours. At Fasthosts, if a customer tells us that their hard disk is under 40% health then we will exchange it for a new one.
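As a rough illustration of the raw SMART data that monitoring tools draw on, here is a minimal Python sketch. It assumes the open-source smartmontools package (version 7.0 or later, for JSON output) rather than HDSentinel, and it does not reproduce HDSentinel’s health percentage, which is that product’s own calculation. The function names and the choice of watched attributes are illustrative assumptions.

```python
import json
import subprocess


def read_smart_report(device: str) -> dict:
    """Run smartctl and return its JSON report (needs smartmontools 7.0+ and root)."""
    result = subprocess.run(
        ["smartctl", "--json", "-H", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit bits to flag disk problems
    )
    return json.loads(result.stdout)


def summarise(report: dict, watched=(5, 197, 198)) -> dict:
    """Pick out the overall verdict and a few commonly watched attributes.

    IDs 5, 197 and 198 are reallocated, pending and uncorrectable sector
    counts; which attributes matter most is an assumption of this sketch.
    """
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {
        "passed": report.get("smart_status", {}).get("passed"),
        "attributes": {
            row["name"]: row["raw"]["value"] for row in table if row["id"] in watched
        },
    }
```

On a server with smartmontools installed, calling summarise(read_smart_report("/dev/sda")) would report on the first disk; the device path is an example and will differ between systems.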
HDSentinel can provide regular, automated reports on the health of a hard disk, and it can be set up to send alerts when the hard disk health reaches certain thresholds. For example, it can send the server owner an email when hard disk health reaches a user-defined percentage. These reports and alerts can run in the background, and are an important part of server management.
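The threshold alert described above can be sketched in a few lines of Python. This is not how HDSentinel is configured; it is only an illustration of the logic, and the function name, parameters and message wording are hypothetical. It assumes the health percentage has already been obtained from a monitoring tool.

```python
from email.message import EmailMessage
from typing import Optional


def build_health_alert(disk: str, health_percent: float, threshold: float,
                       recipient: str) -> Optional[EmailMessage]:
    """Return an alert email if health is at or below the threshold, else None."""
    if health_percent > threshold:
        return None  # still above the user-defined threshold, nothing to send
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = f"Disk {disk} health at {health_percent:.0f}%"
    message.set_content(
        f"Health of {disk} has reached {health_percent:.0f}% "
        f"(alert threshold: {threshold:.0f}%). Check your backups."
    )
    return message
```

The returned message could then be handed to smtplib.SMTP(...).send_message(), assuming a mail server is available to the script.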
The HDSentinel software can run a variety of tests to check disk health. Short tests of the major hard disk components (read/write heads, electronics, internal memory, etc.) take only a couple of minutes, while more extended, in-depth tests that scan the whole disk for problematic areas can take up to a couple of hours.
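Most drives also offer built-in short and extended SMART self-tests, which are a comparable mechanism but not necessarily the same tests that HDSentinel runs. A minimal Python sketch for starting one, assuming the smartmontools smartctl command is installed (the function names are hypothetical):

```python
import subprocess
from typing import List


def self_test_command(device: str, extended: bool = False) -> List[str]:
    """Build the smartctl command that starts a short or extended self-test."""
    return ["smartctl", "-t", "long" if extended else "short", device]


def start_self_test(device: str, extended: bool = False) -> int:
    """Start the test (needs root); the drive runs it in the background."""
    return subprocess.run(self_test_command(device, extended), check=False).returncode
```

Once a test has finished, its outcome can be read from the drive’s self-test log with smartctl -l selftest followed by the device path.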
Preventing and predicting hard disk failure is crucial because a failure could result in lost data. If your car breaks down an engineer can come out, fix it, and send you on your way. If a hard disk fails the consequences could be more catastrophic and immediate. That’s why it’s so important to use tools like HDSentinel to predict health and prevent failure. We replace any hard disks under 40% health because this figure is high enough to give customers time to make those crucial final backups before replacement.
If a hard disk does fail, it’s important to have damage limitation in place. It’s likely that all data on the disk will be lost, so steps should already have been taken to back that data up somewhere else. All Fasthosts dedicated servers are in a RAID-redundant configuration. In a RAID 1 setup of two mirrored drives, if one disk fails, the data is safe on the other. Replace the failed drive and, depending on the operating system, data can be retrieved from the other drive. However, if both disk drives fail then there’s nothing to boot from, and data is lost. Again, this is why it’s important to regularly check the health of hard disk drives. When hard disk health gets below 40% and we replace the disk, the data will have to be restored to the disk from a backup.
An alternative solution is to keep consistent onsite backups. Backing data up in multiple locations is vital, and can prevent catastrophic data loss.
For more information on managing your dedicated server visit the Fasthosts support site.