Keeping a firm grip on the written word, the Microsoft way!
by Gus Mueller
This one is reproduced without permission, but really cannot be missed!
Formats are essential for the storage and propagation of information. A
lot of a format's usefulness has to do with its long-term stability and the
number of entities (people, computers, mitochondria, etc.) that understand it.
For example, were it not for the stability and ubiquity of the three or
four billion year old DNA data format, none of us would have the language
skills necessary to make babies or catch colds. Another, lesser factor in a
format's utility is its efficiency. A format that requires bulky storage
methods or excessive redundancy is less likely to stand the test of time
than one that is more efficient. This is why few people have the equipment
or software to read such old, bulky storage schemes as punched cards,
ferrite core arrays and eight-inch floppy disks.
While it's true that DNA tends to be used wastefully and redundantly in
multicellular animals (with huge expanses of media coding useless noise),
the density of this format minimizes the burden of such waste while
providing nearly unlimited room for genome expansion, much like blank space
on a computer hard drive.
Sometimes, of course, formats aren't designed to last for the ages. Often
manmade formats, particularly those concocted by individual corporations,
are designed to maximize market share. When you make use of such
formats for your creative output, you are trusting a corporation to stow
your information in a proprietary bottle of their own design, to be
manipulated in ways that they alone can fully control.
Some of these formats are truly horrendous, both in terms of inefficiency
and in terms of permanence. The most widely used impermanent, inefficient
format is whatever happens to be the latest one for Microsoft Word.
Whatever that happens to be, it is scheduled to expire the moment the next
version is released, and that version's format is in turn scheduled to
expire when its own successor is released.
Supposedly later versions of Microsoft Word can read documents created by
earlier versions, but I know for a fact that this isn't always true. I
recently found myself spending hours trying to tease the text out of
Microsoft Word documents saved on various Macintoshes in the early to
mid-1990s. This shouldn't have been a problem since I am equipped with Word
2000, but it was. Furthermore, these old formats even crashed my text
editors, the most bullet-proof of document tools. The implication is that
there will come a day in the not too distant future when documents being
saved today in the absolutely latest version of Microsoft Word will not be
readable by any software then in use.
But wait, it gets worse. Today one of my clients emailed me some Microsoft
Word documents that had been saved in the file format of the most recent
version of the program. Using Microsoft Word 2000, I was unable to read
them as anything but text documents filled with page after page of
extraneous and sometimes garbled data (as well as several copies of the
operative text, at least one of which was in Unicode).
I could have written to the client and had him resend it in a more timeless
format, but since the text was only about a paragraph or two in length and
came to only two or three kilobytes, I figured I'd extract it from the
exotic new format manually. It was at this point that I made a couple of
alarming discoveries. First of all, the size of this tiny document when
saved in the latest whiz-bang Microsoft Word format came to 130K - or about
the text size of a small novel.
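For the curious, here is a rough sketch in Python of what that kind of
manual extraction amounts to. It pulls runs of printable text out of a
binary file, looking both for one-byte-per-character runs and for the same
characters stored as UTF-16LE (a common on-disk form of Unicode text). The
function names, the eight-character minimum and the restriction to plain
ASCII-range characters are my own assumptions for illustration; the code
knows nothing about the actual structure of a Word file.

```python
import re


def extract_text_runs(data: bytes, min_len: int = 8) -> list[str]:
    """Return runs of printable text found in a binary blob.

    Hypothetical helper: it scans for byte patterns only and does not
    parse any real document format.
    """
    runs = []
    # Printable ASCII stored one byte per character.
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in ascii_pat.finditer(data):
        runs.append(match.group().decode("ascii"))
    # The same characters stored as UTF-16LE: each one followed by a zero byte.
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in utf16_pat.finditer(data):
        runs.append(match.group().decode("utf-16-le"))
    return runs


def overhead_ratio(file_size: int, text_size: int) -> float:
    """How many bytes of file are shipped per byte of actual text."""
    return file_size / text_size
```

With the figures above, overhead_ratio(130 * 1024, 3 * 1024) comes to a
little over 43, which is to say roughly 43 bytes sent down the wire for
every byte of actual writing.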
Instead of downloading the text equivalent of a smallish personal letter,
my email software had been forced, using my slow dialup connection, to
download the equivalent of an average book by Steinbeck. And what did all
the format padding consist of? Well, as already mentioned, there were
several copies of the full text of the document. But then there were other
things, things that perhaps the average Microsoft Word user wouldn't be
pleased to be sending out with every document they email.
I was astounded to see the file contained a huge list of proper names. It
looked like the complete content of an address book with all the email
addresses excised. What, I wonder, is the justification for including all
this with even the tiniest of documents? More importantly, why do people
continue to put up with this disaster of a format as the default standard
of document exchange?
If you are saving your creative output in such proprietary formats, now
would be the time to consider exporting your documents to some open,
efficient standard that doesn't violate your privacy in every file. The
one I happen to use is HTML, but there are others as well.
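To give a sense of how little such an export requires, here is a minimal
sketch in Python that wraps plain text in a bare-bones HTML page. The
function name and the convention that a blank line separates paragraphs are
assumptions of mine, and this is only one of many ways to go about it.

```python
from html import escape


def text_to_html(title: str, text: str) -> str:
    """Wrap plain text in a minimal HTML page (hypothetical helper).

    Assumes paragraphs are separated by blank lines.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Escape &, < and > so the text cannot be mistaken for markup.
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n<meta charset='utf-8'>\n"
        f"<title>{escape(title)}</title>\n</head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )
```

The result carries the text, a title and a handful of tags, and nothing
else: no spare copies of the document and nobody's address book.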