How Are You Going To Permanently Delete A Zettabyte Of Data?

Much of the discussion around big data centres on how to maximise the analytic value of existing business data sources. But what happens when you actually have to get rid of that data?


I'm attending the Navigating Big Data conference hosted by the Australian Information Industry Association (AIIA) in Canberra today. Opening keynote speaker Parviz Peiravi from Intel raised the question of data destruction.

"How are you going to delete a zetabyte of data?" he asked. If you work in a sensitive industry such as government where data deletion requires multiple overwrites, that can be a very time-consuming process with current data volumes, he noted. It's also a fruitful area for potential future research.

The problem isn't likely to disappear, with data volumes continuing to increase. With zettabytes rapidly becoming the norm, we'll soon be in the yottabyte era. "It sounds like a Star Wars movie and I don't know what is after the yottabyte," Peiravi said.


    You delete it the same way you delete any amount of data. It's not like it's on one drive where the overwrites are synchronous, it's spread out across hundreds or thousands of drives which can all overwrite in parallel. Wiping data takes no longer for a thousand drives in parallel than it does for one.
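The parallelism argument can be sketched in code: with one worker per drive, wall-clock time tracks the slowest single drive, not the size of the fleet. A minimal Python sketch, using throwaway files as stand-in "drives" (file names and sizes here are illustrative assumptions):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def wipe(path):
    """Overwrite a file in place with zeros and sync the result to disk."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(b"\x00" * size)
        f.flush()
        os.fsync(f.fileno())

# Create a handful of stand-in "drives" holding dummy data.
drives = []
for _ in range(8):
    fd, path = tempfile.mkstemp()
    os.write(fd, os.urandom(1024))
    os.close(fd)
    drives.append(path)

# Wipe them all in parallel -- one worker per drive, so total time
# is roughly the time to wipe the single largest drive.
with ThreadPoolExecutor(max_workers=len(drives)) as pool:
    list(pool.map(wipe, drives))
```

The same pattern scales to any number of drives as long as each has its own controller bandwidth; the serial bottleneck only appears if one controller has to do all the writing.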

They probably are talking about data spread across thousands upon thousands of drives, but it's still going to take an extremely long time even if it's stored on petabyte HDDs. A zettabyte would be 1 million petabyte drives; I can't imagine how long that would take.
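The arithmetic in that comment checks out, assuming decimal SI prefixes:

```python
PB = 10**15  # petabyte, decimal SI
ZB = 10**21  # zettabyte

drives_needed = ZB // PB
print(drives_needed)  # → 1000000 one-petabyte drives per zettabyte
```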

Another problem is that just deleting isn't really enough. Especially in such a distributed infrastructure, it's entirely possible that 99% of the file would remain intact for months or even years after deletion (depending purely on luck and distribution).

        Not really good enough for sensitive business data.

Yeah, they cover that in the article: "data deletion requires multiple overwrites". So I presume it's not just a case of wiping it once, but chucking dummy data over the top and then wiping that, rinse and repeat.
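That "dummy data over the top, then wipe" loop can be sketched as follows. This is a toy, file-level illustration only (real sanitisation tools operate on raw block devices, and the seven-pass count is an assumption borrowed from the rule quoted elsewhere in this thread):

```python
import os
import tempfile

def multi_pass_wipe(path, passes=7):
    """Overwrite a file with random data repeatedly, ending with a zero pass."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for i in range(passes):
            f.seek(0)
            # Random "dummy data" passes first, then a final wipe to zeros.
            f.write(os.urandom(size) if i < passes - 1 else b"\x00" * size)
            f.flush()
            os.fsync(f.fileno())

# Demo on a throwaway file standing in for a disk.
fd, path = tempfile.mkstemp()
os.write(fd, b"sensitive" * 100)
os.close(fd)
multi_pass_wipe(path)
```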

We don't have single-unit hard drives anywhere near that size though, at least not in commercial use. I never worked directly in a datacenter myself, but a friend of mine suggested not long ago that drive sizes in datacenters really aren't that big, even if they're merged with software or hardware solutions. Assuming 5TB drives, whether you're wiping one or a thousand won't change the speed involved, because you wipe them in parallel; you're only paying the time cost of 5TB of data rather than 5000TB.
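Back-of-the-envelope, taking the 5TB figure above and assuming a sustained sequential write speed of 200 MB/s (an assumption; real drives vary widely):

```python
DRIVE_TB = 5
WRITE_MB_S = 200  # assumed sustained sequential write speed

seconds_per_pass = (DRIVE_TB * 10**12) / (WRITE_MB_S * 10**6)
hours_per_pass = seconds_per_pass / 3600
print(round(hours_per_pass, 1))  # → 6.9 hours per overwrite pass
```

So a single pass over every drive in parallel is an overnight job, not a multi-year one; multiply by the number of mandated passes for the full cost.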

        Unless, of course, you were using a single controller to wipe it all synchronously, which would be a terrible idea.

    CTRL+A, Shift+Delete

    Give it to Facebook. I'm sure they have a great method of deleting data?

    The real problem here is that the government has defined a strict rule saying "data must be overwritten 7 times" rather than "the data must be deleted in a way that cannot be recovered by any means".

    If you make sure that all of your data is stored encrypted on disk (all major databases support this at the application level, and many file systems, disks and filers support it at a lower level), then all you need to do is overwrite the encryption key and destroy all backups.
    Maybe also overwrite enough disks that parity+1 stripes are destroyed in every mirror, so that it can't be brute-forced over a thousand years. If the data is not encrypted, this is less useful, as you could still recover things like images and text files (albeit with corruption).

    Just do what Defense does. Physically DESTROY the drives, then bury them on base.

    Some poor soul is going to have to spend months sitting in front of a blue and white screen watching DBAN (Darik's Boot and Nuke) for a very, very long time, haha.

