Convert Word Documents to Cruft-free HTML
Posted by Gina Trapani at 5:00 AM on April 29, 2008
Anyone who's tried saving a Word document as a web page knows the result carries far more HTML and CSS than you bargained for. The Productivity Portfolio blog offers two alternatives when you want to turn a .DOC into a .HTML file in a jiffy without all the cruft: using the online Word HTML Cleaner at Textism (files up to 20K only), or sending yourself the document via Gmail and hitting the "View as HTML" link. Handy.
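
For a rough idea of what "cleaning" Word HTML involves, here is a minimal Python sketch using only the standard library. It is not how Textism's cleaner or Gmail's viewer works; the tag and attribute lists, the `WordHtmlCleaner` class, and the `clean_word_html` function are all assumptions made up for illustration. It drops style blocks, comments, Office-namespaced tags such as `<o:p>`, and presentational attributes such as `class` and `style`, while keeping the text and basic structure.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative choices, not an official list of Word's output.
DROP_WITH_CONTENT = {"style", "script", "xml"}   # removed along with their contents
UNWRAP = {"span", "font", "meta", "link"}        # tag removed, inner text kept
KEEP_ATTRS = {"href", "src", "alt"}              # every other attribute is dropped
VOID = {"br", "hr", "img"}                       # elements that take no closing tag


class WordHtmlCleaner(HTMLParser):
    """Rebuilds the document, leaving out Word-specific markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT element

    def _unwanted(self, tag):
        # Office-namespaced tags look like "o:p" or "v:shape".
        return tag in UNWRAP or ":" in tag

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or self._unwanted(tag):
            return
        kept = ""
        for name, value in attrs:
            if name in KEEP_ATTRS:
                kept += ' {}="{}"'.format(name, escape(value or "", quote=True))
        self.out.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or self._unwanted(tag) or tag in VOID:
            return
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments (including Word's conditional comments) are ignored by
    # HTMLParser's default handle_comment, so they never reach the output.


def clean_word_html(html_text):
    parser = WordHtmlCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = ('<p class=MsoNormal style="margin:0in"><span lang=EN-US '
             'style="font-family:Arial">Hello <b>world</b><o:p></o:p></span></p>')
    print(clean_word_html(dirty))  # prints: <p>Hello <b>world</b></p>
```

A real document will have cases this sketch ignores (nested lists, tables, smart quotes), which is why hand cleanup is often still needed afterward.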

Comments
skwirl
Posted 5:15 AM 29/4/08
Wow. I really needed this tip a couple of weeks ago. I think I ended up using Word's built-in Save as Web Page -> Web Page, Filtered option, but I still had to go back in with Notepad and hand-delete some of the cruft that was messing up my email mail merge.
ww2db.com
Posted 7:18 AM 29/4/08
Too bad there is a size limit, but a useful tool nevertheless.
A few years ago, at my last job, I had access to a Dreamweaver license. It had a special "Clean Up Word HTML" function, which did a pretty good job as far as I recall. Perhaps the most recent release has something similar as well?
Fras
Posted 7:09 AM 29/4/08
Smart idea. I like the e-mail-to-Gmail idea. I use Any2FB, which likes HTML as a source, to create FB-format ebooks for my Nokia N800. This tip will allow me to convert Word and PDF files to simple HTML ready for conversion.
Reilaos~
Posted 8:12 AM 29/4/08
Speaking of Google and documents...
Here's an interesting fix. More times than I'd be comfortable with, I've had a USB memory stick on me with something I needed to print out in an OpenDocument format... and a public Windows machine with no idea what I was trying to do. So I upload the file to Google Docs, and then download it right back, but as a .doc instead of an .odf. Fun.
DillyTonto
Posted 6:12 AM 29/4/08
There are several utilities out there that replace the ribbon with the classic menus. Here's a review of a couple of them.
How to get around the time-sucking Ribbon in MS Word
[www.midmarket.eweek.com]
mdrisser
Posted 9:35 AM 29/4/08
HTML Tidy is another option. It's been around for years, and has been cleaning Word HTML for nearly as long:
[tidy.sourceforge.net]
There are also a couple of online versions:
[infohound.net]
valet.htmlhelp.com/tidy/
A Windows GUI:
[pagesperso-orange.fr]
JohnD65
Posted 7:28 AM 29/4/08
I wonder if there is an API for this thing... anyone?
graham.reeds
Posted 3:27 PM 29/4/08
A Word document spewed out as HTML in less than 20 KB? My resume is only 57 KB, and that spewed out to 500 KB the last time I tried to get an HTML version of it.
muteboy
Posted 12:20 AM 30/4/08
@mdrisser: Seconded. In fact, I used HTML Tidy plus the GUI as my primary HTML editor for a long time. The tidying features would give me instant validation, and prettify the code at the same time.
Jared
Posted 12:59 AM 30/4/08
Any suggestions for going the other way, from HTML to Word?
jrizzo
Posted 7:10 AM 29/4/08
Textism does fairly well, and I've been using it for a while. Still have to clean it up by hand sometimes. Opening the doc in OpenOffice and saving it from there helps too!
joeyd
Posted 6:55 AM 30/4/08
Webmaster Sherpa has a nifty tool to clean up MS Word documents as well, with the notable difference being that it also handles list items (numbered and bulleted). It is based on the open source FCKeditor and can be used online, or you can download the PHP code to incorporate into your own CMS or other application. Check it out at:
[www.webmastersherpa.com]
dsevil
Posted 12:59 AM 2/5/08
@Jared: Microsoft Word. It opens HTML documents.