Alive 7

Keeping a firm grip on the written word, the Microsoft way!
               by Gus Mueller

This one is reproduced without permission, but really cannot be missed!

Formats  are essential for the storage and propagation of  information.   A
lot of a format's usefulness has to do with its long-term stability and the
number of entities (people, computers, mitochondria, etc.) that understand
it.

For  example,  were  it not for the stability and ubiquity of the three  or
four  billion year old DNA data format,  none of us would have the language
skills necessary to make babies or catch colds.  Another lesser factor in a
format's  utility is its efficiency.   A format that requires bulky storage
methods  or excessive redundancy is less likely to stand the test  of  time
than one that is more efficient.  This is why few people have the equipment
or  software  to  read such old bulky storage  schemes  as  punched  cards,
ferrite core arrays and eight inch floppy disks.

While  it's  true that DNA tends to be used wastefully and  redundantly  in
multicellular  animals (with huge expanses of media coding useless  noise),
the  density  of  this  format minimizes the burden  of  such  waste  while
providing nearly unlimited room for genome expansion, much like blank space 
on a computer hard drive.

Sometimes,  of course, formats aren't designed to last for the ages.  Often
manmade  formats,  particularly those concocted by individual corporations,
are designed  to maximize market share dominance. When you make use of such
formats  for your creative output,  you are trusting a corporation to  stow
your  information  in  a  proprietary bottle of their  own  design,  to  be
manipulated in ways that they alone can fully control.

Some  of these formats are truly horrendous,  both in terms of inefficiency
and in terms of permanence.   The most widely-used impermanent, inefficient 
format  is  whatever  happens  to be the latest  one  for  Microsoft  Word.
Whatever that happens to be,  it is scheduled to expire the moment the next
version is released,  and that in turn is scheduled to expire when the next
version is released.

Supposedly,  later versions of Microsoft Word can read documents created by
earlier  versions,  but  I know for a fact that this isn't always true.   I
recently  found  myself  spending hours trying to tease  the  text  out  of
Microsoft  Word  documents saved on various Macintoshes in  the  early  to
mid-1990s.  This shouldn't have been a problem since I am equipped with Word
2000,  but  it  was.  Furthermore,  these  old formats even crashed my text
editors,  the most bullet-proof of document tools.  The implication is that
there  will come a day in the not too distant future when  documents  being
saved today in the absolutely latest version of Microsoft Word will not  be
readable by any contemporary software.

But wait, it gets worse.  Today one of my clients emailed me some Microsoft
Word  documents that had been saved in the file format of the  most  recent
version of the  program.   Using Microsoft Word 2000,  I was unable to read
them  as  anything  but  text documents filled  with  page  after  page  of 
extraneous  and  sometimes garbled data (as well as several copies  of  the
operative text, at least one of which was in Unicode).

I could have written to the client and had him resend it in a more timeless
format,  but since the text was only about a paragraph or two in length and
came  to  only two or three kilobytes,  I figured I'd extract it  from  the
exotic  new format manually.   It was at this point that I made a couple of
alarming  discoveries.   First of all,  the size of this tiny document when
saved in the latest whiz-bang Microsoft Word format came to 130K - or about 
the text size of a small novel.
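
The kind of manual extraction described above can be roughly automated. What
follows is only a minimal sketch, not a parser for any real Word format: it
assumes the readable text sits in the file either as plain ASCII bytes or as
UTF-16LE (each character followed by a zero byte), and it simply collects
runs of printable characters. The function name extract_text_runs and the
MIN_RUN threshold are made up for illustration.

```python
import re

# Assumed threshold: runs shorter than this are treated as binary noise.
MIN_RUN = 4

# Runs of printable ASCII bytes (space through tilde).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_RUN)
# The same characters stored as UTF-16LE: printable byte, then a zero byte.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_RUN)


def extract_text_runs(data: bytes) -> list:
    """Return readable runs found in raw bytes, ordered by file position."""
    found = []
    for match in UTF16_RUN.finditer(data):
        found.append((match.start(), match.group().decode("utf-16-le")))
    for match in ASCII_RUN.finditer(data):
        found.append((match.start(), match.group().decode("ascii")))
    found.sort(key=lambda item: item[0])
    return [text for _, text in found]


# A tiny stand-in for a bloated document: the same text stored twice,
# once as ASCII and once as UTF-16LE, surrounded by binary junk.
sample = (
    b"\x00\x01Dear Bob\x02\xff"
    + "Dear Bob".encode("utf-16-le")
    + b"\x03\x04ab\x05"
)
runs = extract_text_runs(sample)
```

On a real file one would read the bytes first, for example with
open("letter.doc", "rb").read() (the file name is hypothetical), and then
expect to wade through duplicate copies and leftover junk by hand.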

Instead  of downloading the text equivalent of a smallish personal  letter,
my  email  software had been forced,  using my slow dialup  connection,  to
download the equivalent of an average book by Steinbeck.   And what was all 
the format padding composed of?   Well,  as already mentioned,  there  were
several copies of the full text of the document.  But then there were other
things,  things  that  perhaps the average Microsoft Word user wouldn't  be 
pleased to be sending out with every document they email.

I was astounded to see the file contained a huge list of proper names.   It
looked  like  the complete content of an address book with  all  the  email
addresses excised.  What,  I wonder, is the justification for including all
this with even the tiniest of documents?   More importantly,  why do people
continue  to put up with this disaster of a format as the default  standard
of document exchange?

If  you  are saving your creative output in such proprietary  formats,  now
would  be  the  time to consider exporting your  documents  to  some  open,
efficient  standard that doesn't violate your privacy in every  file.   The
one I happen to use is HTML, but there are others as well.
