Giving Voice to Linux with ViaVoice

Open the Pod Bay Door Please, HAL

  • December 26, 2000
  • By Scott Courtney

Let's face it: Each and every one of us geeks, from the moment we first saw 2001: A Space Odyssey in 1968, has dreamt of owning HAL9000. For 32 years we've crept steadily closer to the elusive goal of a self-aware computer with virtually infinite capacity, and with which we can converse as naturally as with any human. Admittedly, we'd like to leave out that part about suffocating us in our sleep and locking us out of our vehicle when we're a couple billion miles from home. But, after all, that was just the beta version!

The phenomenal growth in processing power since then has, alas, failed to bring us the magic of HAL9000. One of the most intractable problems has turned out to be the recognition of human speech by a computer. Merely increasing processing speed isn't enough, because the fundamental difficulties lie in duplicating the contextual ways in which human beings understand speech. Raymond Kurzweil has written an overview article describing the problems, for those who are interested.

Yet there has been much progress. Many of us can remember the tantalizing glimpse of the future when IBM released OS/2 Warp Version 4.0 with VoiceType Dictation integrated into the Workplace Shell user interface. Warp 4 shipped in 1996, and the full integration of speech recognition with a commercial operating system is something that has still not been achieved in Redmond. I was an OS/2 user at the time, and like most others who had the necessary hardware, I couldn't wait to get my hands on VoiceType. It seemed too good to be true.

To a large extent, unfortunately, it was. Not that it was a bad product -- it was quite advanced for its day and did basically everything IBM claimed it could do. The integration with Workplace Shell wasn't quite HAL9000, but it was as seamless as one could hope to achieve without major redesign of applications and the Workplace Shell itself. The problems were resource consumption -- VoiceType was, to put it mildly, a pig -- and the need to extensively adapt one's speech pattern to the software. To--use--VoiceType--you--had--to--put--a--break--between--each--word. It felt unnatural, and for those of us who are fast touch-typists, it was actually less productive than simply typing the text.

So what happened to VoiceType? Well, we OS/2 users trotted it out for demonstration to our friends and co-workers, showing off its tight integration with Workplace Shell along with its sheer gee-whiz factor. This, we said, was proof that OS/2 was a superior operating system! Then, as soon as we were alone with the computer again, we shut down the speech option to save memory, and we reached for the keyboard so we could get some real work done. VoiceType, as good as it was, remained on the shelf as an amusing toy. As OS/2 Warp has faded from the marketplace, VoiceType Dictation joins Workplace Shell in the fond memory albums of its former users (many of whom now run Linux).

IBM has not, however, been idle in the years since Warp 4 was released. Always known for its research-and-development leadership, IBM has spent millions quietly developing and improving its speech recognition tools. The latest embodiment of that technology is its ViaVoice Dictation product. ViaVoice was originally available only for Windows, but a Linux port was released late this summer. I was fortunate enough to receive a review copy, and I began my arduous but rewarding journey from installation through to productive use.

