From the Desktop: Opening Up the Rest of Linux - page 2

Stretching the Bounds of "Open"

  • July 14, 2000
  • By Brian Proffitt

The time has come, my friends, to look towards and start using universal file formats. And before you all start groaning and muttering "here's another ASCII freak," I will pointedly remind you about the last time you tried to open a heavily formatted Word document without a Windows machine around. Even StarOffice and Applixware can choke on these babies.

Not so an ASCII document. This one simple encoding is understood and used by almost every PC out there. The problem is that ASCII is just that: simple. No formatting is allowed, at least in the conventional sense, because font information is not managed in ASCII. Fonts, spacing, and the rest of the formatting are handled by binary file formats.

Except for (and this is the good part) Standard Generalized Markup Language, or SGML for short.

SGML is, as the name indicates, a markup language. Passages of text are surrounded by delimiters known as tags. Tags identify what the text they contain is, and the look of the text follows from that. Tags, in turn, are defined by a Document Type Definition (DTD). The DTD lists what the tags are and how they may nest; what each one will do to the text is then specified once per tag, usually in a stylesheet that accompanies the DTD.
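To make the DTD's role as a list of permitted tags concrete, here is a minimal Python sketch. It is not a real SGML parser: the `ALLOWED_CHILDREN` table is a hypothetical stand-in for a DTD, the element names are invented for illustration, and XML syntax (parsed with Python's standard library) stands in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: each element name maps to the set of
# child elements it may contain. All names are invented for this sketch.
ALLOWED_CHILDREN = {
    "document": {"section"},
    "section": {"number", "title", "para"},
    "number": set(),
    "title": set(),
    "para": set(),
}


def find_violations(element, path=""):
    """Return messages for tags that the table does not permit."""
    here = f"{path}/{element.tag}"
    if element.tag not in ALLOWED_CHILDREN:
        return [f"{here}: undeclared tag"]
    problems = []
    for child in element:
        declared = child.tag in ALLOWED_CHILDREN
        if declared and child.tag not in ALLOWED_CHILDREN[element.tag]:
            problems.append(
                f"{here}/{child.tag}: not allowed inside <{element.tag}>"
            )
        else:
            # Either the child is permitted here, or it is undeclared and
            # the recursive call reports it.
            problems.extend(find_violations(child, here))
    return problems


good = ET.fromstring(
    "<document><section><number>2.0</number>"
    "<title>The Joys of SGML</title></section></document>"
)
bad = ET.fromstring(
    "<document><title>Stray</title>"
    "<section><blink>Hi</blink></section></document>"
)
print(find_violations(good))
print(find_violations(bad))
```

The first document uses only declared tags in permitted places, so nothing is reported. The second is flagged twice: once for a title outside any section and once for a tag the table never declared.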

The most famous example of an SGML DTD is HTML, the very format this document appears in. HTML makes up a huge majority of the content of the Web. This is no newsflash, but it does help set the stage for my next point: if HTML is doing so well on the Web, why can't we use it in the workplace for day-to-day documents?

The answer lies in the way markup languages are supposed to work. When you write a document (and I do this too), you tend to think of the document as a whole entity. You want to say something and the document is the expression of that statement. If you are writing a book, you might start thinking of the document in terms of chapters, if only to keep from going insane. But that is about as far as most people would break it down.

The real power of SGML is in breaking down the document to its component parts: headings, footnotes, paragraphs in a chapter, paragraphs in the introduction, and so on. Every element of the document is categorized and given its own tag.
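As a rough illustration of that breakdown, the following Python sketch walks a small tagged document and lists every component with its position in the structure. The sample document and its element names are invented for this example, and XML syntax, parsed with Python's standard library, stands in for SGML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample document; the element names are assumptions.
SAMPLE = """
<document>
  <introduction><para>Why markup matters.</para></introduction>
  <chapter>
    <heading>The Joys of SGML</heading>
    <para>First paragraph.</para>
    <para>Second paragraph.<footnote>An aside.</footnote></para>
  </chapter>
</document>
"""


def list_components(source):
    """Return (path, text) pairs, one per element, in document order."""
    root = ET.fromstring(source)
    found = []

    def walk(element, path):
        here = f"{path}/{element.tag}" if path else element.tag
        found.append((here, (element.text or "").strip()))
        for child in element:
            walk(child, here)

    walk(root, "")
    return found


components = list_components(SAMPLE)
for path, text in components:
    print(f"{path}: {text!r}")

# Count how many elements of each kind the document contains.
counts = Counter(path.rsplit("/", 1)[-1] for path, _ in components)
print(counts)
```

Because each component carries its full path, a paragraph in the introduction (`document/introduction/para`) can be told apart from a paragraph in a chapter (`document/chapter/para`), which is exactly the distinction that lets each one be treated differently later.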

Say, for instance, you had a document with a number and a title at the beginning of each section. In a regular document, you would highlight the number and apply a new font and font size to it, such as 20 pt. bold Helvetica. You would then do the same with the title. In an SGML document, you would do something like this:

<section number>2.0</section number>

<section title>The Joys of SGML</section title>

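Then you would go back to the DTD for the document (or, strictly speaking, to the stylesheet that accompanies it) and define the <section number> and <section title> tags to make the text within them 20 pt. bold Helvetica.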

At first, this seems like a lot of extra work for the same result. But what happens when a new publisher or supervisor comes in and decrees that, because of a horrible incident she had in Switzerland, no Helvetica font will ever be used in the organization's documents?

Now, I know that you can open up each document and search and replace all instances of Helvetica. But that means opening every document. That is okay in a small office, but what if you are dealing with hundreds of thousands of files?

This is where SGML really shines. If you know all of your documents are linked to the same DTD, just go to the definitions that accompany it and redefine the appropriate tags to exclude Helvetica. Instead of hundreds of thousands of files, you just edit one.
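The one-edit idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `STYLES` table is a hypothetical stand-in for the shared definitions every document is linked to, tag names are written with underscores instead of spaces to keep the pattern simple, and "Times" is an arbitrary replacement font.

```python
import re

# Hypothetical shared style table, defined once for all documents.
STYLES = {
    "section_number": {"font": "Helvetica", "size_pt": 20, "bold": True},
    "section_title": {"font": "Helvetica", "size_pt": 20, "bold": True},
}

# Stand-ins for a large document collection. None of them names a font.
DOCUMENTS = [
    "<section_number>2.0</section_number>"
    "<section_title>The Joys of SGML</section_title>",
    "<section_number>3.0</section_number>"
    "<section_title>Editing Tools</section_title>",
]

# Matches <tag>text</tag>; the backreference requires a matching close tag.
TAGGED = re.compile(r"<(\w+)>(.*?)</\1>")


def render(document, styles):
    """Return (text, font, size_pt, bold) for each tagged passage."""
    rendered = []
    for tag, text in TAGGED.findall(document):
        style = styles[tag]
        rendered.append((text, style["font"], style["size_pt"], style["bold"]))
    return rendered


def fonts_in_use(documents, styles):
    """Return the set of fonts that appear when the documents are rendered."""
    return {font for doc in documents for _, font, _, _ in render(doc, styles)}


before = fonts_in_use(DOCUMENTS, STYLES)

# The single edit: redefine the shared styles, leaving the documents alone.
for style in STYLES.values():
    if style["font"] == "Helvetica":
        style["font"] = "Times"

after = fonts_in_use(DOCUMENTS, STYLES)
print(before, after)
```

The documents themselves are never opened or modified. Only the shared table changes, and every document picks up the new font the next time it is rendered.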

This is just one very simplified example of the benefits of SGML. There are many others. There are some arguments against using SGML as well, not the least of which is the lack of good editing tools out there. Tags and DTDs are powerful, but they are cumbersome to use, especially for beginners. You can type everything in by hand in emacs, kwrite, or whatever else you prefer, but this gets really tiresome in a hurry.

Adobe FrameMaker 5.5.6 for Linux is close to providing an SGML solution, in that it treats content primarily for what it is and not for what it looks like. Applixware also tends to be SGML-like in its handling of content.

Probably the best-known SGML application is DocBook, a DTD that is used to create documentation for a variety of Linux applications and environments. I could go on about DocBook at great length, but instead I recommend you peruse some of the DocBook information yourself, particularly if you have KDE.

As good as DocBook is, it still has a big learning curve to overcome. Ideally, what is needed is a more streamlined entry-level application that the average person can get a handle on without his/her head exploding from comprehending DTDs and the like.

Linux itself, so far, is a model of open source success. It is ironic that many documents created with Linux applications still use proprietary file formats. We need to fix this in a hurry, because the better we can communicate, the stronger the Linux community will be.
