It's a common practice to blur or pixelate sensitive information like account numbers when you share an image online, but your info might not be as secure as you think. It takes some work, but there are ways to uncover that sensitive text.
Undoubtedly you have all seen photographs on TV and online in which people's faces have been blurred to hide their identity. For example, here's one of Bill Gates.
For the most part this is a fine way to censor people's faces, as there isn't a convenient way to reverse the blur back into a photo detailed enough to be recognisable. So that's good, if that's what you intended. However, many people also resort to blurring sensitive numbers and text. I'll illustrate why that is a bad idea.
Suppose someone posted a photo of their cheque or credit card online for whatever reason (proving to someone that they earned a million dollars, showing something funny about a cheque, comparing the size of something to a credit card, etc.), censoring the image with the far-too-common mosaic effect to hide the numbers:
Seems secure because nobody can read the numbers anymore? Wrong. Here's a way to attack this scheme:
Step 1: Get a Blank Cheque Image
There are two ways of doing this. You can either Photoshop out the numbers in your existing image, or in the case of credit cards, you can get an account with the same organisation and take a photo of your own card from the same angle, and match the white balancing and contrast levels. Then, use your own high resolution photo to Photoshop out your numbers.
This is easy in these example images, of course:
Step 2: Iterate the Possibilities
Use a script to iterate through all the possible account numbers and generate a cheque image for each. Where the digits are printed in groups, you can treat each group as a separate section. For example, for a Visa card, the digits are grouped by 4, so you can do each section individually, thus requiring only 4*10000 = 40000 images to generate, which is easy with a script.
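To make the arithmetic concrete, here is a minimal Python sketch of the enumeration only. The function name `group_candidates` is made up for illustration, and actually drawing each candidate onto the blank image is left out, since it depends on the font and layout of the card or cheque.

```python
from itertools import product

def group_candidates(digits_per_group=4):
    """Yield every zero-padded digit string for one group: '0000' .. '9999'."""
    for combo in product("0123456789", repeat=digits_per_group):
        yield "".join(combo)

groups = 4                                   # a 16-digit card printed as 4 groups
per_group = sum(1 for _ in group_candidates(4))
images_needed = groups * per_group           # each group is attacked on its own

print(per_group)       # 10000
print(images_needed)   # 40000
print(10 ** 16)        # size of the search if all 16 digits were tried jointly
```

Attacking each group separately is what keeps the job small: 40000 images instead of 10^16.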
Step 3: Blur Each Image in an Identical Manner to the Original Image
Identify the exact size and offset, in pixels, of the mosaic tiles used to blur the original image, and then apply the same mosaic to each of your generated images. In this case, we see that the blurred image has 8x8 pixel mosaic units, and the offset is determined by counting from the top of the image (not shown):
Now we iterate through all the images, blurring them in the same way as the original image to obtain something like this:
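As a rough sketch of this step, the following Python applies a mosaic with a given tile size and offset to a greyscale image. It assumes the image is held as a plain list of rows of 0-255 integers and that each tile is replaced by its mean brightness; the names `tile_edges` and `mosaic` are illustrative, and a real image editor's mosaic filter may round or align tiles differently.

```python
def tile_edges(length, tile, offset):
    """(start, end) pairs of tiles along one axis, with the tile grid
    shifted by `offset` pixels and clipped to the image bounds."""
    shift = offset % tile
    start = shift - tile if shift else 0
    return [(max(lo, 0), min(lo + tile, length))
            for lo in range(start, length, tile)]

def mosaic(img, tile=8, off_x=0, off_y=0):
    """Return a copy of `img` with every tile replaced by its mean brightness."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y0, y1 in tile_edges(h, tile, off_y):
        for x0, x1 in tile_edges(w, tile, off_x):
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = round(sum(block) / len(block))
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = mean
    return out

# Toy 2x4 image, 2x2 tiles, no offset.
demo = [[0, 100, 10, 10],
        [200, 100, 10, 30]]
print(mosaic(demo, tile=2))  # [[100, 100, 15, 15], [100, 100, 15, 15]]
```

The tile size and offset must match the original exactly; if they are off by even one pixel, the averages are taken over different pixels and the comparison in the later steps becomes unreliable.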
Step 4: Identify the Mosaic Brightness Vector of Each Blurred Image
What does this mean? Well, let's take the mosaic version of 0000001 (zoomed in):
...and identify the brightness level (0-255) of each mosaic region, indexing them in some consistent fashion as a=[a_1,a_2...,a_n]:
In this case, the account number 0000001 creates mosaic brightness vector a(0000001)=[213,201,190,...]. We find the mosaic brightness vector for every account number in a similar fashion, using a script to blur each image and read off the brightnesses. Let a(x) be the mosaic brightness vector obtained from account number x, and let a(x)_i denote its ith component. Above, a(0000001)_1 = 213.
We now do the same for the original cheque image we found online or wherever, obtaining a vector we hereby call z=[z_1,z_2,...z_n]:
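Reading off the vector can be sketched as below. This assumes the image has already been mosaicked, so every pixel in a tile has the same value and it is enough to sample one pixel per tile in row-major order. The function name `brightness_vector` and the toy values are illustrative only. Partial tiles that lie before the offset are skipped for simplicity, which is harmless as long as the candidates and the original are indexed the same way.

```python
def brightness_vector(mosaic_img, tile=8, off_x=0, off_y=0):
    """Sample the top-left pixel of every full-grid tile, row by row."""
    h, w = len(mosaic_img), len(mosaic_img[0])
    ys = range(off_y % tile, h, tile)
    xs = range(off_x % tile, w, tile)
    return [mosaic_img[y][x] for y in ys for x in xs]

# Toy 4x4 mosaic made of 2x2 tiles (made-up brightness values).
toy = [[213, 213, 201, 201],
       [213, 213, 201, 201],
       [190, 190, 180, 180],
       [190, 190, 180, 180]]
print(brightness_vector(toy, tile=2))  # [213, 201, 190, 180]
```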
Step 5: Find the Iteration with the Closest Distance to the Original Image
With z, the mosaic brightness vector of the original image, in hand, simply compute the distance between it and the mosaic brightness vector a(x) of each account number x, normalizing each vector first:
d(x) = sqrt((a(x)_1/N(a(x)) - z_1/N(z))^2 + (a(x)_2/N(a(x)) - z_2/N(z))^2 + ...)
where N(a(x)) and N(z) are the normalization constants given by
N(a(x)) = sqrt(a(x)_1^2 + a(x)_2^2 + ...)
N(z) = sqrt(z_1^2 + z_2^2 + ...)
We then simply find the lowest d(x). For credit cards, only a small fraction of possible numbers validate to hypothetically possible credit card numbers, so that is an easy check as well.
In the above case, we compute, for example,
N(z) = sqrt(206^2+211^2+...) = 844.78459
N(a(0000001)) = 907.47837
N(a(0000002)) = 909.20647
and then proceed to calculate the distances:
d(0000001) = 1.9363
d(0000002) = 1.9373
d(1124587) = 0.12566
d(1124588) = 0.00000
Might the account number just be 1124588?
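The comparison can be sketched in a few lines of Python. The vectors below are short made-up examples rather than the values from the images, and the names `norm`, `distance` and `best_matches` are illustrative. The normalization constant is taken to be the Euclidean length of the vector, as in the numerical example for N(z).

```python
import math

def norm(v):
    """Euclidean length of a brightness vector (assumed non-zero)."""
    return math.sqrt(sum(c * c for c in v))

def distance(a, z):
    """Distance between two brightness vectors after scaling each to unit length."""
    na, nz = norm(a), norm(z)
    return math.sqrt(sum((ai / na - zi / nz) ** 2 for ai, zi in zip(a, z)))

def best_matches(candidates, z, top=3):
    """candidates maps account number -> brightness vector.
    Returns the `top` (distance, account number) pairs with the lowest distance."""
    return sorted((distance(a, z), x) for x, a in candidates.items())[:top]

# Made-up three-tile vectors, for illustration only.
candidates = {
    "0000001": [213, 201, 190],
    "0000002": [213, 205, 190],
    "1124588": [206, 211, 180],
}
z = [206, 211, 180]   # vector read from the "original" blurred image

best = best_matches(candidates, z)
print(best[0][1])     # 1124588
```

Returning the few lowest distances rather than a single winner is deliberate: on a real photo the true number may not score exactly zero, so the top results are better treated as a shortlist.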
"But you used your own crafted easy-to-decipher image!"
In the real world we have photos, not fictitious cheques made in Photoshop. We have distortions of the text because of the camera angle, imperfect alignment, and so on. But that doesn't stop a human from determining exactly what these distortions are and creating a script to apply them! Either way, the lowest few distances determined can be considered as candidates. And especially in the world of credit cards -- where numbers are nicely chunked out in groups of four, and only one in ten numbers is actually valid -- it is easy to pick the most likely candidate from among your few lowest distances.
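The "one in ten" figure matches the Luhn checksum used on payment card numbers, where the last digit is determined by the others. Assuming that is the validity rule meant here, a shortlist of low-distance candidates could be filtered with a sketch like this. The name `luhn_valid` and the shortlist are illustrative; 4111111111111111 is a widely used test number, not a real account.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Hypothetical shortlist of candidates with the lowest distances.
shortlist = ["4111111111111112", "4111111111111111", "4111111111111117"]
print([n for n in shortlist if luhn_valid(n)])  # ['4111111111111111']
```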
One important thing that you would need to do in order to implement this on real photos is to improve the distance algorithm. For example, you could rewrite the distance formula above to normalize the standard deviations in addition to the means to improve performance. You could also do the RGB or HSV values independently for each mosaic region, and you can also use scripting to distort the text by a few pixels in each direction and compare as well (which still leaves you with a feasible number of comparisons on a fast PC). You can also employ algorithms similar to existing nearest-shape algorithms to help improve the reliability of this on real photos.
So yes, I used an image against itself and designed it to work here. But the algorithm can surely be improved to work on real world photos. This is just a proof of concept. But one thing is for sure: it's a very easy situation to fix. Don't use simple mosaics to blur your image. All a mosaic does is reduce the amount of visual information in an image that contains only log(10^N)/log(2) effective bits of account data, where N is the number of digits in the account number. When you distribute such images, you want to eliminate personal information, not obscure it by reducing the amount of visual information in the image.
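To put numbers on that formula, here is a small worked calculation. It assumes every N-digit number is equally likely, and the name `account_bits` is illustrative.

```python
import math

def account_bits(n_digits):
    """log(10^N)/log(2): bits needed to pick one of 10**N equally likely numbers."""
    return n_digits * math.log2(10)

print(round(account_bits(7), 2))    # 23.25 (a 7-digit account number)
print(round(account_bits(16), 2))   # 53.15 (a 16-digit card number)
```

A few dozen mosaic tiles at 8 bits each can easily carry more than that, which is why the blurred image can still pin down the number.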
Think about creating a 100x100 8-bit greyscale graphic on the screen. Now let's say you averaged out the entire graphic and replaced every pixel with that average (i.e. turned it into a single-tile "mosaic"). You have just created a function that starts with 256^(10000) possibilities and hashes them to 256 possibilities. There is obviously no way to reverse the resulting 8 bits of information back into the original image. However, if you know that the original image was one of 10 possibilities, you can easily determine which of them was used just from the resulting 8-bit number.
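That thought experiment can be run directly. In the sketch below, the ten candidate images are contrived so that their averages are all different; the names `average_hash` and `make_candidate` are illustrative. With real images some averages could collide, in which case the lookup would give a shortlist rather than a single answer.

```python
def average_hash(img):
    """Collapse a greyscale image to one 8-bit value: its mean brightness."""
    pixels = [p for row in img for p in row]
    return sum(pixels) // len(pixels)

def make_candidate(k, size=100):
    """Contrived 100x100 image: the top 10*k rows are white, the rest black."""
    return [[255] * size if y < 10 * k else [0] * size for y in range(size)]

# The attacker knows the original is one of these ten images.
candidates = {k: make_candidate(k) for k in range(10)}
lookup = {average_hash(img): k for k, img in candidates.items()}

observed = average_hash(candidates[7])   # all the attacker gets to see
print(observed)          # 178
print(lookup[observed])  # 7
```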
How a Dictionary Password Attack is Similar
Most UNIX/Linux system administrators know that /etc/passwd or /etc/shadow store passwords as salted one-way hashes, produced by a function such as DES-based crypt or MD5. This is reasonably secure since the password cannot be computed directly from its stored hash. Authentication occurs by applying the same one-way function to the password entered by the user logging in, and comparing that result to the stored one. If the two match, the user has successfully authenticated.
It is well known that this one-way scheme is easily broken when the user picks a dictionary word as their password. All an attacker then has to do is hash every word in the English dictionary, compare each result to the hash stored in /etc/passwd, and pick out the matching word as the password. As such, users are commonly advised to pick more complex passwords that are not words. The dictionary attack can be illustrated like this:
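As a minimal sketch of the idea, the following uses SHA-256 from Python's hashlib as a stand-in one-way function. That is an assumption made for illustration and is not the scheme a real /etc/shadow uses; the salt, word list and function names are likewise made up.

```python
import hashlib

def one_way(password, salt):
    """Stand-in one-way function: SHA-256 over salt + password."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

def dictionary_attack(stored_hash, salt, wordlist):
    """Hash every candidate word and return the one that matches, if any."""
    for word in wordlist:
        if one_way(word, salt) == stored_hash:
            return word
    return None

salt = "ab"
stored = one_way("sunshine", salt)     # what the attacker reads from the file
words = ["password", "letmein", "sunshine", "dragon"]

print(dictionary_attack(stored, salt, words))                   # sunshine
print(dictionary_attack(one_way("xK9#q", salt), salt, words))   # None
```

The function is never inverted. The attacker only runs it forwards over a small set of guesses, which is exactly what the blurred-image attack does with account numbers.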
The similarity between the dictionary attack and the blurred image attack lies in the fact that blurring an image is also a one-way transformation. You are converting the image you have into another image designed to be unreadable. However, since account numbers typically only go up to the millions, we can assemble a "dictionary" of possible account numbers: all the numbers from 0000001 to 9999999, for example. We then use an automated image processor to Photoshop each of those numbers onto a photo of a blank cheque, and blur each image. At that point, one can simply compare the blurred pixels to see what most closely matches the original blurred photo.
The solution is simple: don't blur your images! Instead, just colour over them:
Remember, you want to leave the viewer of the image with no information, not blurred information. This may seem like a complicated method to recover text, but the real point is that it's entirely possible.
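A solid fill is also simple to script. The sketch below assumes a greyscale image stored as a list of rows of 0-255 integers and a hypothetical `redact` helper. It shows why this is safe: two images that differ only inside the covered rectangle become identical afterwards, so nothing about the covered digits survives.

```python
def redact(img, x0, y0, x1, y1, fill=0):
    """Return a copy of `img` with the rectangle [x0, x1) x [y0, y1) set to `fill`."""
    out = [row[:] for row in img]
    for y in range(max(y0, 0), min(y1, len(out))):
        for x in range(max(x0, 0), min(x1, len(out[y]))):
            out[y][x] = fill
    return out

# Two toy images that differ only inside the region to be covered.
card_a = [[1, 20, 30, 4], [5, 60, 70, 8], [9, 10, 11, 12]]
card_b = [[1, 99, 98, 4], [5, 97, 96, 8], [9, 10, 11, 12]]

print(redact(card_a, 1, 0, 3, 2))  # [[1, 0, 0, 4], [5, 0, 0, 8], [9, 10, 11, 12]]
print(redact(card_a, 1, 0, 3, 2) == redact(card_b, 1, 0, 3, 2))  # True
```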
Why Blurring Sensitive Information Is a Bad Idea [dheera.net]
Dheera Venkatraman is a graduate student in the Electrical Engineering and Computer Science department at the Massachusetts Institute of Technology, Cambridge, MA, USA.
Lifehacker's Evil Week highlights the dark side of life hacking. How you use that knowledge is up to you.