How ECC Memory Improves Workstation Reliability

For high-performance computing, error-correcting code (ECC) memory is essential. Here's how it works and why it helps.


Although computer memory ultimately only has to store binary information (ones and zeros), it doesn't do so infallibly. Hardware faults can cause the wrong bit value to be recorded. Radiation, such as cosmic rays, is also thought to cause random bit flips, particularly as memory density increases, though the extent of its role is still a subject of ongoing research.

Those (relatively rare) errors might be acceptable on cheaper desktop computers and laptops -- repeated crashing because of memory problems is annoying, but modern operating systems have a number of mechanisms that help limit the damage.

However, on higher-performance workstations, tolerance for errors is usually lower. After all, if someone is designing a bridge, you don't want the stress calculations to be thrown off by a memory fault. In finance applications, a tiny error can easily be magnified because of the scale of the figures involved.

Want to upgrade your own office workstation? We've got $13,000 of gear up for grabs in our Dell workstation giveaway. Enter here!

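ECC memory deals with this issue in a simple but effective way: it stores extra check bits alongside each word of data (typically 8 check bits for every 64 data bits), so the memory controller can detect when data has been altered and correct single-bit errors when the data is read back. A single parity bit, by contrast, can only detect an error, not fix it.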
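To make the idea of check bits concrete, here is a minimal Python sketch of a Hamming(7,4) code, which protects 4 data bits with 3 check bits. This is an illustration only: real ECC modules use wider codes implemented in hardware, and the function names below are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # check bit covering positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # check bit covering positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # check bit covering positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome gives the 1-based position of the bad bit
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate a single-bit memory fault at position 5
print(hamming74_decode(stored))  # ([1, 0, 1, 1], 5)
```

The three check bits together pinpoint which bit flipped, which is what lets the error be corrected rather than merely noticed. This basic code handles one flipped bit per codeword; the codes used in ECC memory add a further check bit so that double-bit errors can at least be detected.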

This isn't a new concept -- using parity and related check-bit schemes to verify data accuracy is a technique that predates the advent of the personal computer in the early 1980s. However, most general-purpose PCs haven't adopted it because the extra memory chips and supporting hardware add cost.
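For comparison, a lone parity bit can be sketched in a few lines of Python. The helper names are hypothetical and the example assumes even parity over an 8-bit value; it shows that parity notices one flipped bit, cannot say which bit it was, and misses two flips entirely.

```python
def parity_bit(bits):
    """Even parity: the extra bit makes the total number of 1s even."""
    return sum(bits) % 2


def parity_ok(bits, stored_parity):
    """True if the data still matches the parity bit recorded at write time."""
    return parity_bit(bits) == stored_parity


data = [1, 0, 1, 1, 0, 0, 1, 0]
stored_parity = parity_bit(data)

one_flip = list(data)
one_flip[3] ^= 1  # a single-bit fault
two_flips = list(one_flip)
two_flips[6] ^= 1  # a second fault in the same byte

print(parity_ok(one_flip, stored_parity))   # False: error detected, location unknown
print(parity_ok(two_flips, stored_parity))  # True: two flips cancel out and go unnoticed
```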

Because workstations aren't such a price-sensitive market, implementing ECC to check for and correct simple data errors is a worthwhile investment. Indeed, for any workstation-class machine, it's one of the most basic requirements. As well as correcting isolated errors, ECC can reveal when a memory component is starting to fail: a steadily rising count of corrected errors is an early warning sign.
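As a rough sketch of how those warning signs can be watched, the Python snippet below reads the corrected and uncorrected error counters that Linux exposes through its EDAC subsystem. It assumes a Linux machine with an EDAC driver loaded for the memory controller; on other systems, or without ECC, it simply reports nothing. The function name is hypothetical.

```python
from pathlib import Path

# Assumption: Linux with an EDAC driver loaded; this directory is absent otherwise.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)}; empty if EDAC is unavailable."""
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text())
            uncorrected = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers whose counters can't be read
        counts[mc.name] = (corrected, uncorrected)
    return counts


for name, (corrected, uncorrected) in read_ecc_counts().items():
    print(f"{name}: {corrected} corrected, {uncorrected} uncorrected")
```

Logging these numbers over time is one simple way to spot a module whose corrected-error count keeps climbing.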

Evolve is a weekly column at Lifehacker looking at trends and technologies IT workers need to know about to stay employed and improve their careers.


Comments

    Angus, it does KIND of look like this article was specifically written to get that Dell ad (competition) smack bang in the middle of it... pinchie above makes a (somewhat too angry, but still) salient point!

    Poor article - confuses ECC and parity checking. ECC does not use a single parity bit to check; it uses more than one bit to check and correct.
