
Programming Guide Gets Down to the Metal - page 3

Nothing gawk-y About It

  • July 31, 2005
  • By Martin C. Brown

LinuxPlanet: You are the current maintainer of gawk, the GNU implementation of awk. What does that involve, exactly?

Arnold Robbins: My work these days focuses on bug fixes, portability improvements, standards compliance, and performance improvements. I am trying to avoid adding new features at the language level; there are already too many, some of which I now think are of questionable value.

One example of the kind of thing I do currently is that in the 3.1.5 release, gawk's character-related functions (index(), substr(), match(), length()) now work entirely in terms of characters, not bytes, even in multibyte locales.
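
To make the character-versus-byte distinction concrete, here is a minimal sketch. It uses Python as a stand-in rather than gawk itself and is not taken from the gawk sources; the helper names `awk_index` and `awk_substr` are hypothetical and only mimic awk's 1-based conventions for the simple case where the start position is at least 1.

```python
# Illustration only: Python stands in for awk; nothing here is gawk code.
# The sample string contains two non-ASCII characters (i-diaeresis, e-acute),
# each of which takes two bytes in UTF-8.
text = "na\u00efve caf\u00e9"
raw = text.encode("utf-8")          # the same text viewed as bytes

char_len = len(text)                # what a character-aware length() counts
byte_len = len(raw)                 # what a byte-oriented length() counts


def awk_index(s, t):
    """Hypothetical helper: 1-based position of t in s, 0 if absent."""
    return s.find(t) + 1


def awk_substr(s, start, length):
    """Hypothetical helper: 1-based substring, assuming start >= 1."""
    return s[start - 1:start - 1 + length]


word = "caf\u00e9"
char_pos = awk_index(text, word)                  # position in characters
byte_pos = raw.find(word.encode("utf-8")) + 1     # position in bytes

print(char_len, byte_len)           # lengths differ by the two extra bytes
print(char_pos, byte_pos)           # positions differ after the first accent
print(awk_substr(text, char_pos, 4))
```

The two views agree only for pure ASCII text; once a multibyte character appears, byte-based lengths and offsets drift away from what a reader would count by eye, which is the behavior the 3.1.5 change addresses.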

Although I consider gawk to be mature and stable, problems do still crop up once in a while, admittedly in corner cases where the corners have been getting progressively darker, smaller, and more remote. (:-)

LP: How do you feel Gawk stands up to--and indeed cooperates with--the functionality provided by Perl, Python or newer languages like Ruby?

Robbins: Gawk has the advantage of being based on a standard. It's also a huge advantage that there are other implementations; it gives me something to measure myself against in many ways. In the cases you mention, The Language and The Implementation Are One. I think that's not good: You don't get a really good shakedown of a language definition until there are multiple implementations. This was true of both C and C++, and is true now of awk.

Gawk isn't as ambitious as the other languages, which is also good. Gawk programs are refreshingly small, easy to read and write, usually fast enough, and usually capable enough. I occasionally have people tell me that gawk is faster than Perl on their particular tasks, which is always fun to hear.

I do think Python is an elegant language, and wish I had time to work with it and learn it well.

LP: What do you see as the future direction of gawk? Is it something that continues to grow?

Robbins: Surprisingly, yes. There are two independent major pieces of work going on that I hope will one day contribute significantly to the main gawk release. One project affects the language, the other the implementation. I don't have the time I'd like to give these the attention they deserve. One of the projects is publicly available; see xmlgawk.sourceforge.net.

One of these days I want to revamp the way loadable extensions work. So I haven't given up entirely on new features. :-)

LP: Gawk runs on a wide range of platforms. How do you coordinate the efforts of all the maintainers (myself included!) into the gawk project?

Robbins: It actually isn't that hard. I have a person or team+team leader for the different non-Unix platforms. As a result, patches for specific platforms don't overlap with each other, making it easy to apply them.

When problems are found, sometimes the reporter will also submit a patch, which is great. Otherwise, they can usually provide me ssh access to the failing system so that I can use gdb and figure out what the problem is.

The debugging is what's time consuming, especially if it's on a system "across the pond" as you English folk like to say. In that case, my old-fashioned skills with the `ed' editor can actually pay off!

The GNU Translation Project is pretty automated; I occasionally email the URL of a test release to the Translation Robot, and magically, within a few days, I get email with the URL of new translations. I download them and drop them into the directory; it doesn't take 3 minutes. What's nice is that the number of translations continues to grow, too.

Updating infrastructure (automake, autoconf, gettext) isn't always as easy as I'd like, but I don't have to do it that often, which is good. And about once a year I update files from other projects, such as the regular expression and argument parsing libraries from GLIBC. That seems to be enough; it isn't necessary to track every little change to those routines.

My release "spirals" take longer than I'd like. I suppose if I released more often, it wouldn't be quite so difficult. I think of them as spirals because my initial test releases usually don't compile or run everywhere, and thus I begin a process of "spiralling in" to the point where things compile and pass the test suite on as many compiler+platform combinations as my testers have access to.
