“If you rename a .docx file to .zip you can open it”, tweets SwiftOnSecurity, the security professional/Taylor Swift parody account. Then you can grab images and video that were embedded in the document, all as separate files. Swift’s followers have more great tips for rescuing data from different file formats. For example, you can use Swift’s method to open a corrupted DOCX file:
If your .docx file has some corrupted images in it and won’t open, you can rename it to a .zip file, delete those images out of the “archive”, and then rename it back to a .docx and it will just show them as broken links.
— BenMS (@BenMS) July 1, 2018
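If you'd rather not rename anything, a short script can treat the .docx as the ZIP archive it is. The following is a minimal Python sketch, not an official tool: the function names `extract_media` and `copy_without` are made up for illustration, and it assumes the embedded files sit under `word/media/`, which is where Word normally puts them.

```python
import zipfile
from pathlib import Path

MEDIA_PREFIX = "word/media/"  # assumed location of embedded images and video


def extract_media(docx_path, out_dir):
    """Copy every file under word/media/ out of the .docx into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith(MEDIA_PREFIX):
                continue
            # Keep only the base name so nothing is written outside out_dir.
            target = out_dir / Path(info.filename).name
            target.write_bytes(archive.read(info))
            saved.append(target)
    return saved


def copy_without(docx_path, out_path, unwanted):
    """Write a copy of the .docx that leaves out the named archive members."""
    unwanted = set(unwanted)
    with zipfile.ZipFile(docx_path) as src, \
         zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename not in unwanted:
                dst.writestr(info, src.read(info))
```

Something like `copy_without("report.docx", "report-fixed.docx", ["word/media/image3.png"])` would mimic BenMS's tip. The file names here are placeholders, and the copy is written to a new file so the original stays untouched.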
You can even use it to shrink a large DOCX file by downsizing its embedded images:
You can also write a script that unzips them, goes through the images, converts them to a more sensible format, changes the links to images and rezip the whole thing for a much smaller package.
— Patrik Hirvinen (@hirvinen) June 30, 2018
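A script along those lines might look like the sketch below. It is only a skeleton under a few assumptions: `rewrite_media` is a hypothetical name, the images are assumed to live under `word/media/`, and the actual image shrinking is left to a `shrink(name, data)` function you supply (for example one built on an image library). To keep things simple it keeps every file name the same, so no links need changing. Converting to a different format, as the tweet suggests, would also mean updating the references inside the document.

```python
import zipfile


def rewrite_media(docx_path, out_path, shrink):
    """Copy a .docx, passing each embedded media file through shrink().

    shrink(name, data) must return bytes; the result is used only if it
    is smaller than the original. Returns the number of bytes saved
    before ZIP compression.
    """
    saved = 0
    with zipfile.ZipFile(docx_path) as src, \
         zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info)
            if info.filename.startswith("word/media/"):  # assumed media folder
                smaller = shrink(info.filename, data)
                if len(smaller) < len(data):
                    saved += len(data) - len(smaller)
                    data = smaller
            # Same member name as before, so existing links still resolve.
            dst.writestr(info, data)
    return saved
```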
Here are some more interchangeable file formats:
I did this years ago and had a single file that was a valid docx, pptx, epub, jar, and png simultaneously.
— ▉▊▋▍▎▏ (@willkirkby) July 1, 2018
This is all kind of dizzying.
If you save a .html file as .docx you can open it and it’ll hit embedded images. Unless that’s changed in the last couple years.
— Willa Riggins (@willasaywhat) June 30, 2018
Unfortunately, you can’t daisy-chain an HTML file into a ZIP, says a reply. You also can’t open a password-protected DOCX this way, says another reply.
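If you're not sure whether a particular file will cooperate, you can ask before renaming anything. This small Python sketch uses the standard library's `zipfile.is_zipfile`; the helper name `can_open_as_zip` is invented for illustration. A `False` result means the rename trick won't work on that file.

```python
import zipfile


def can_open_as_zip(path):
    """Return True if the file at path looks like a ZIP archive."""
    return zipfile.is_zipfile(path)
```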
So your computer is only slightly magic.
Comments
One response to “How To Rescue Embedded Images From A DOCX File”
It’s also a great way to rescue damaged Excel workbooks where one of the sheets has become bloated or corrupted.
Had a 40MB workbook at work recently and was able to identify where cell formatting was being saved separately for all 1 million+ rows. Fixing that brought the same file back down to 900KB.
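For this kind of detective work, it helps to see which files inside the workbook are taking up the space. Here is a rough Python sketch, assuming the .xlsx opens as a ZIP archive like a .docx does; `largest_parts` is a hypothetical helper name.

```python
import zipfile


def largest_parts(path, top=5):
    """List the biggest members of an Office file.

    Returns up to `top` tuples of (uncompressed size, compressed size,
    member name), largest first.
    """
    with zipfile.ZipFile(path) as archive:
        parts = [(info.file_size, info.compress_size, info.filename)
                 for info in archive.infolist()]
    return sorted(parts, reverse=True)[:top]
```

An unusually large entry in the output is a reasonable place to start looking.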