Apples and Oranges, Part II: A Linux DBMS Comparison

By: Matthias Warkus
Thursday, November 18, 1999 01:25:46 PM EST
URL: http://www.linuxplanet.com/linuxplanet/tutorials/1251/1/

The Database Design

Editor's Note: This is Part II of a series covering freely available Linux DBMSes. You can find the first part of the series here.

To test and compare three database management systems, I naturally needed a database to manage with them. After having read SQL for Dummies, I had some basic knowledge of how realistic databases are designed, so I picked a particularly unimaginative real-world case (an Internet bookstore, what else?) and wrote up a little database design exploiting a couple of SQL's capabilities.

There are five tables in this database. BOOK stores the available books with bookkeeping information; CUSTOMER contains the registered customers. For every order, a row in BOOKORDER is created, referencing its customer. For every item ordered, a row in ORDER_POSITION references its order number. As a kind of bonus, I've added a RATING table for book ratings.

The following is SQL code that can be fed into a DBMS's SQL monitor and should be accepted without any problem. All the tables are dropped before creation so the script won't bail out if some of them already exist.

DROP TABLE BOOK;
CREATE TABLE BOOK (
    ARTICLE_NO            INTEGER        PRIMARY KEY,
	AUTHOR_FIRST_NAMES    CHARACTER(30),
	AUTHOR_LAST_NAMES     CHARACTER(30),
	TITLE                 CHARACTER(30),
	ISBN                  CHARACTER(13)  UNIQUE, 
	WHOLESALE_PRICE       NUMERIC(4,2),
	RETAIL_PRICE          NUMERIC(4,2),
	COPIES_AVAILABLE      INTEGER
);

DROP TABLE CUSTOMER;
CREATE TABLE CUSTOMER (
    CUSTOMER_NO           INTEGER        PRIMARY KEY,
	FIRST_NAMES           CHARACTER(30),
	LAST_NAMES            CHARACTER(30),
	STREET                CHARACTER(30),
	HOUSE_NO              SMALLINT,
	POSTCODE              CHARACTER(7),
	TOWN                  CHARACTER(30),
	ISO_COUNTRY_CODE      CHARACTER(2)
);

DROP TABLE BOOKORDER;
CREATE TABLE BOOKORDER (
    ORDER_NO              INTEGER        PRIMARY KEY,
	CUSTOMER_NO           INTEGER        NOT NULL,
	ORDERED               DATE,
	DELIVERY              DATE,
	STATUS                CHARACTER(8)
	    CHECK (STATUS IN ('ACCEPTED',
		                  'DELAYED',
						  'SHIPPED',
						  'RETURNED',
						  'PAID')),
	CONSTRAINT CUSTOMER_FK FOREIGN KEY (CUSTOMER_NO)
	    REFERENCES KUNDE (KUNDENNAME)
);

DROP TABLE ORDER_POSITION;
CREATE TABLE ORDER_POSITION (
    POSITION_NO           INTEGER        PRIMARY KEY,
    ORDER_NO              INTEGER        NOT NULL,
	ARTICLE_NO            INTEGER        NOT NULL,
	NUMBER                SMALLINT,
	CONSTRAINT ORDER_FK FOREIGN KEY (ORDER_NO)
	    REFERENCES BOOKORDER (ORDER_NO),
	CONSTRAINT BOOK_FK FOREIGN KEY (ARTICLE_NO)
	    REFERENCES BOOK (ARTICLE_NO)
);

DROP TABLE RATING;
CREATE TABLE RATING (
    RATING_NO             INTEGER        PRIMARY KEY,
    ARTICLE_NO            INTEGER        NOT NULL,
	SCORE                 NUMERIC(1,0),
	COMMENT               CHARACTER VARYING(300),
	CONSTRAINT BOOK_FK FOREIGN KEY (ARTICLE_NO)
	    REFERENCES BOOK (ARTICLE_NO)
);

Of course, this is an extremely simple-minded database. It tries to look realistic, but it's not suited for real-world application. It doesn't save customer records or anything, and it hasn't even got a column for a book's publisher. It's just a test case.

Note that I didn't want to undertake the effort of constraining CUSTOMER.ISO_COUNTRY_CODE to the codes that are actually valid today. I've got a couple of constraints in the code just to test whether the database systems they will accept them; I did not try to make the database bullet-proof.

Adapting the Design to PostgreSQL

The difficulties I encountered when I fed the ANSI-compliant CREATE TABLE statements into PostgreSQL's psql monitor program were only minor. I received some warnings that foreign key constraints were accepted but not yet implemented, and I had to cut the COMMENT field in RATING back to 255 characters since this is the maximum field width for a CHARACTER VARYING field in PostgreSQL. The system provides BLOB data types for storing larger amounts of data, but those aren't in the standard version so I decided not to use them. Another issue is rather ugly--since I couldn't find information about how PostgreSQL casts NUMERICs from and into C data types and since I didn't want to use floats anyway to avoid round-off errors, I decided to make the monetary field integer numbers of cents.

I ended up with this slightly different script:

DROP TABLE BOOK;
CREATE TABLE BOOK (
    ARTICLE_NO            INTEGER        PRIMARY KEY,
	AUTHOR_FIRST_NAMES    CHARACTER(30),
	AUTHOR_LAST_NAMES     CHARACTER(30),
	TITLE                 CHARACTER(30),
	ISBN                  CHARACTER(13)  UNIQUE, 
	WHOLESALE_PRICE       INTEGER,
	RETAIL_PRICE          INTEGER,
	COPIES_AVAILABLE      INTEGER
);

DROP TABLE CUSTOMER;
CREATE TABLE CUSTOMER (
    CUSTOMER_NO           INTEGER        PRIMARY KEY,
	FIRST_NAMES           CHARACTER(30),
	LAST_NAMES            CHARACTER(30),
	STREET                CHARACTER(30),
	HOUSE_NO              SMALLINT,
	POSTCODE              CHARACTER(7),
	TOWN                  CHARACTER(30),
	ISO_COUNTRY_CODE      CHARACTER(2)
);

DROP TABLE BOOKORDER;
CREATE TABLE BOOKORDER (
    ORDER_NO              INTEGER        PRIMARY KEY,
	CUSTOMER_NO           INTEGER        NOT NULL,
	ORDERED               DATE,
	DELIVERY              DATE,
	STATUS                CHARACTER(8)
	    CHECK (STATUS IN ('ACCEPTED',
		                  'DELAYED',
						  'SHIPPED',
						  'RETURNED',
						  'PAID')),
	CONSTRAINT CUSTOMER_FK FOREIGN KEY (CUSTOMER_NO)
	    REFERENCES KUNDE (KUNDENNAME)
);

DROP TABLE ORDER_POSITION;
CREATE TABLE ORDER_POSITION (
    POSITION_NO           INTEGER        PRIMARY KEY,
    ORDER_NO              INTEGER        NOT NULL,
	ARTICLE_NO            INTEGER        NOT NULL,
	NUMBER                SMALLINT,
	CONSTRAINT ORDER_FK FOREIGN KEY (ORDER_NO)
	    REFERENCES BOOKORDER (ORDER_NO),
	CONSTRAINT BOOK_FK FOREIGN KEY (ARTICLE_NO)
	    REFERENCES BOOK (ARTICLE_NO)
);

DROP TABLE RATING;
CREATE TABLE RATING (
    RATING_NO             INTEGER        PRIMARY KEY,
    ARTICLE_NO            INTEGER        NOT NULL,
	SCORE                 SMALLINT,
	COMMENT               CHARACTER VARYING(255),
	CONSTRAINT BOOK_FK FOREIGN KEY (ARTICLE_NO)
	    REFERENCES BOOK (ARTICLE_NO)
);

Adapting the Design to MySQL

MySQL ignores foreign key constraints like PostgreSQL, but it will barf on UNIQUE constraints. The resulting script is not much different from the PostgreSQL script:

DROP TABLE BOOK;
CREATE TABLE BOOK (
    ARTICLE_NO            INTEGER        PRIMARY KEY,
	AUTHOR_FIRST_NAMES    CHARACTER(30),
	AUTHOR_LAST_NAMES     CHARACTER(30),
	TITLE                 CHARACTER(30),
	ISBN                  CHARACTER(13), 
	WHOLESALE_PRICE       INTEGER,
	RETAIL_PRICE          INTEGER,
	COPIES_AVAILABLE      INTEGER
);

DROP TABLE CUSTOMER;
CREATE TABLE CUSTOMER (
    CUSTOMER_NO           INTEGER        PRIMARY KEY,
	FIRST_NAMES           CHARACTER(30),
	LAST_NAMES            CHARACTER(30),
	STREET                CHARACTER(30),
	HOUSE_NO              SMALLINT,
	POSTCODE              CHARACTER(7),
	TOWN                  CHARACTER(30),
	ISO_COUNTRY_CODE      CHARACTER(2)
);

DROP TABLE BOOKORDER;
CREATE TABLE BOOKORDER (
    ORDER_NO              INTEGER        PRIMARY KEY,
	CUSTOMER_NO           INTEGER        NOT NULL,
	ORDERED               DATE,
	DELIVERY              DATE,
	STATUS                CHARACTER(8),
	CONSTRAINT CUSTOMER_FK FOREIGN KEY (CUSTOMER_NO)
	    REFERENCES KUNDE (KUNDENNAME)
);

DROP TABLE ORDER_POSITION;
CREATE TABLE ORDER_POSITION (
    POSITION_NO           INTEGER        PRIMARY KEY,
    ORDER_NO              INTEGER        NOT NULL,
	ARTICLE_NO            INTEGER        NOT NULL,
	NUMBER                SMALLINT,
	CONSTRAINT ORDER_FK FOREIGN KEY (ORDER_NO)
	    REFERENCES BOOKORDER (ORDER_NO),
	CONSTRAINT BOOK_FK FOREIGN KEY (ARTICLE_NO)
	    REFERENCES BOOK (ARTICLE_NO)
);

DROP TABLE RATING;
CREATE TABLE RATING (
    RATING_NO             INTEGER        PRIMARY KEY,
    ARTICLE_NO            INTEGER        NOT NULL,
	SCORE                 NUMERIC(1,0),
	COMMENT               CHARACTER VARYING(255),
	CONSTRAINT BOOK_FK FOREIGN KEY (ARTICLE_NO)
	    REFERENCES BOOK (ARTICLE_NO)
);

Adapting the Design to mSQL

As mSQL is a scaled-down database manager (indeed, some may question whether MySQL and mSQL are database-management systems at all), it has jettisoned most of SQL's functionality and accepts only a very restricted subset of SQL. Thus, mSQL's script looks a lot different:

DROP TABLE BOOK
CREATE TABLE BOOK (
    ARTICLE_NO            INTEGER    NOT NULL,
	AUTHOR_FIRST_NAMES    CHARACTER(30),
	AUTHOR_LAST_NAMES     CHARACTER(30),
	TITLE                 CHARACTER(30),
	ISBN                  CHARACTER(13), 
	WHOLESALE_PRICE       MONEY,
	RETAIL_PRICE          MONEY,
	COPIES_AVAILABLE      INTEGER
)

DROP TABLE CUSTOMER
CREATE TABLE CUSTOMER (
    CUSTOMER_NO           INTEGER    NOT NULL,
	FIRST_NAMES           CHARACTER(30),
	LAST_NAMES            CHARACTER(30),
	STREET                CHARACTER(30),
	HOUSE_NO              SMALLINT,
	POSTCODE              CHARACTER(7),
	TOWN                  CHARACTER(30),
	ISO_COUNTRY_CODE      CHARACTER(2)
)

DROP TABLE BOOKORDER
CREATE TABLE BOOKORDER (
    ORDER_NO              INTEGER    NOT NULL,
	CUSTOMER_NO           INTEGER    NOT NULL,
	ORDERED               DATE,
	DELIVERY              DATE,
	STATUS                CHARACTER(1)
)

DROP TABLE ORDER_POSITION
CREATE TABLE ORDER_POSITION (
    POSITION_NO           INTEGER    NOT NULL,
    ORDER_NO              INTEGER    NOT NULL,
	ARTICLE_NO            INTEGER    NOT NULL,
	NUMBER                SMALLINT
)

DROP TABLE RATING
CREATE TABLE RATING (
    RATING_NO             INTEGER    NOT NULL,
    ARTICLE_NO            INTEGER    NOT NULL,
	SCORE                 SMALLINT,
	COMMENT               TEXT(255)
)

Nearly all of the constraints have disappeared, and NUMERIC and CHARACTER VARYING have been replaced by MONEY and TEXT, respectively.

There is an annoying deficiency in mSQL's monitor program: it doesn't seem to accept SQL scripts fed into it from standard input. Thus, one needs to cut and paste the code. mSQL will complain about semicolons, too; I ended up feeding the commands into it one by one and terminating each with \g (the "go" 'slash command').

Implementing the Test Client

To compare the three database managers, I decided to write a test client in C for the purpose of performing transactions on the bookstore database. As a result, I implemented a couple of actions to be able to compare the APIs. For the performance comparisons, I then implemented them fully and added a noninteractive mode to the client so it could run on its own, generating random data and randomly executing transactions.

I decided to implement the following actions on the sample database:

  • Adding a new book: INSERT INTO BOOK (...) VALUES (...);
  • Deleting an existing book: DELETE FROM BOOK WHERE ARTICLE_NO=...;
  • Adding a customer: INSERT INTO CUSTOMER (...) VALUES (...);
  • Deleting a customer: DELETE FROM CUSTOMER WHERE CUSTOMER_NO=...;
  • A customer ordering books: INSERT INTO BOOKORDER (...) VALUES (...); INSERT INTO ORDER_POSITION (...) VALUES (...);;
  • A customer rating a book: INSERT INTO RATING (...) VALUES (...);
  • Changing an order's status: UPDATE BOOKORDER SET STATUS=... WHERE ORDER_NO=...;

Then, the following reports can be generated:

  • List of books: SELECT * FROM BOOK;
  • List of customers: SELECT * FROM CUSTOMER;
  • List of deliveries underway, sorted by status: SELECT * FROM BOOKORDER ORDER BY STATUS;
  • Profit margin for the books, with final average: SELECT RETAIL_PRICE-WHOLESALE_PRICE FROM BOOK; SELECT AVG(RETAIL_PRICE-WHOLESALE_PRICE) FROM BOOK;
  • Comments, ratings and average rating for a book: SELECT * FROM RATING WHERE ARTICLE_NO=...; SELECT AVG(SCORE) FROM RATING WHERE ARTICLE_NO=...;

Bringing the Client to Life on PostgreSQL

The nice thing about programming PostgreSQL in C is that you can use Embedded SQL. (Well, at least I like it.) It's not very well-documented, but the ESQL preprocessor ecpg works and generates code interfacing to PostgreSQL just fine. SQL's fixed-width way of thinking sometimes got in my way; but, otherwise, it wasn't hard to develop the client.

Did I say "not very well-documented"? That's an understatement. The otherwise complete HTML documentation of PostgreSQL is sorely lacking in this area. My knowledge of ESQL from books was rudimentary, and the online documentation didn't help much, so I had to work out all by myself exactly how C variables are cast into NUMERIC values by ecpg--among other things. Furthermore, the ESQL preprocessor is not very verbose and always seems to bail out completely whenever it encounters any trivial error. This will be a hassle for anyone working on projects with long turn-around times.

(Side note: I am fortunate enough to never have written any COBOL, but I know what it's about, and SQL is very reminiscent of it. The NUMERIC syntax is one example, but then, so is all of SQL's. Not that I would appreciate it any more if the syntax was C-like.)

I encountered some minor weirdnesses when programming the client with PostgreSQL. For example, while it is documented as being possible, ecpg would not accept a FOR READ ONLY clause when declaring a cursor. The ORDER BY clause isn't even implemented. The problems I've encountered are mostly related to the ecpg preprocessor. Postgres has a C API (after all, ESQL needs to be compiled into something), which is probably excellent, but I didn't use it (that's life). When there's ESQL, I want to use ESQL.

This is the list_books() function from postgres-client.pgc:

void list_books(void)
{
  EXEC SQL BEGIN DECLARE SECTION;
  int article_no;
  char author_first_names[30];
  char author_last_names[30];
  char title[30];
  char isbn[14];
  int wholesale_price;
  int retail_price;
  int copies_available;
  EXEC SQL END DECLARE SECTION;

  EXEC SQL DECLARE book_cursor CURSOR FOR
    SELECT ARTICLE_NO, AUTHOR_FIRST_NAMES, AUTHOR_LAST_NAMES,
           TITLE, ISBN, WHOLESALE_PRICE, RETAIL_PRICE,
	       COPIES_AVAILABLE FROM BOOK;

  EXEC SQL OPEN book_cursor;

  while (1)
	{
	  EXEC SQL FETCH NEXT FROM book_cursor
		INTO :article_no, :author_first_names, :author_last_names,
		:title, :isbn, :wholesale_price, :retail_price,
		:copies_available;
	  if (sqlca.sqlcode == 100) /* 100 == NOT FOUND */
		break; /* bail out */

	  printf("\nArticle no. %d\n", article_no);
	  printf("%s, %s:\n", author_last_names, author_first_names);
	  printf("    %s (%s)\n", title, isbn);
	  printf("Bought at %d; selling at %d; %d copies available\n\n",
			 wholesale_price, retail_price, copies_available);
	};

  EXEC SQL CLOSE book_cursor;
}

The code is pretty straightforward. It declares a couple of host variables, wrapping the declaration in a BEGIN/END DECLARE SECTION construct, opens a SELECT query with a cursor, and then fetches the rows one by one into the host variables. The cursor is then closed.

I have used the older, deprecated sqlcode variable instead of the more modern sqlstate, because it is far easier to check for a NOT FOUND state this way.

Bringing the Client to Life on MySQL

MySQL's C API is pretty easy to use. The central element is a structure containing information about, and state of, the database connection, which is initialized by connecting to the MySQL server. A pointer to this structure must be passed to all MySQL client functions.

Queries are submitted as strings; this means that one must deal with C's string conversion facilities. Should data containing null bytes (\0) be used, the situation becomes more complicated, because a counted string is then passed instead of a C string.

To fetch a query's result, a pointer to a MYSQL_RES structure and a count variable are initialized with appropriate API functions, and then rows are fetched into a MYSQL_ROW variable, which is an array of strings. Directly putting results into integer variables, as PostgreSQL's implementation of ESQL could do, is not possible. The result buffer is then freed. As you can see, the semantics are almost the same as using a cursor in ESQL.

Here is list_books(), written using MySQL's C API:

list_books(void)
{
  int count;
  MYSQL_RES *result;

  mysql_query(&bookstore, "SELECT ARTICLE_NO, AUTHOR_FIRST_NAMES,\
AUTHOR_LAST_NAMES, TITLE, ISBN, WHOLESALE_PRICE, RETAIL_PRICE,\
COPIES_AVAILABLE FROM BOOK");
  result = mysql_store_result(&bookstore);

  for(count = mysql_num_rows(result); count > 0; count--)
	{
	  MYSQL_ROW record;
	  record = mysql_fetch_row(result);

	  printf("\nArticle no. %s\n", record[0]);
	  printf("%s, %s:\n", record[2], record[1]);
	  printf("    %s (%s)\n", record[3], record[4]);
	  printf("Bought at %s; selling at %s; %s copies available\n\n",
			 record[5], record[6], record[7]);
	};

  mysql_free_result(result);
}

The API functions are concisely, but sufficiently, documented in the Texinfo file that comes as MySQL's main source of documentation.

Bringing the Client to Life on mSQL

The differences between mSQL's and MySQL's C API are very, very small. Thus, there is even an automatic converter. The principal differences are:

  • mSQL does not store a connection data block, only a number (int bookstore)
  • some mSQL functions don't take the connection as an argument
  • mSQL function names are Pascalish (they use StudlyCaps instead of underscores)

The handy MONEY data type is a fixed-precision decimal fraction type with two decimals. To make mSQL correctly store integer numbers of cents in MONEY columns, I needed to convert them, casting them into floats, dividing them and formatting them in the sprintf statement of the add_new_book() function.

This is list_books(), ported to mSQL:

void
list_books(void)
{
  int count;
  m_result *result;

  msqlQuery(bookstore, "SELECT ARTICLE_NO, AUTHOR_FIRST_NAMES,\
AUTHOR_LAST_NAMES, TITLE, ISBN, WHOLESALE_PRICE, RETAIL_PRICE,\
COPIES_AVAILABLE FROM BOOK");
  result = msqlStoreResult();

  for(count = msqlNumRows(result); count > 0; count--)
	{
	  m_row record;
	  record = msqlFetchRow(result);

	  printf("\nArticle no. %s\n", record[0]);
	  printf("%s, %s:\n", record[2], record[1]);
	  printf("    %s (%s)\n", record[3], record[4]);
	  printf("Bought at %s; selling at %s; %s copies available\n\n",
			 record[5], record[6], record[7]);
	};

  msqlFreeResult(result);
}

Documentation for mSQL's C API can be found in the mSQL manual that ships with mSQL as PostScript and as a large HTML file.

Some Early Conclusions

All of the three discussed DBMSes are pretty easy to install, set up and program. The client libraries implementing the C API are tiny; compared to, say, state-of-the-art GUI toolkits, their size is negligible, and there are no big differences in binary size or the memory footprints of the clients.

The increased verbosity and longer turn-around times of PostgreSQL's ESQL API are compensated for by less effort spent converting strings to non-string data types and vice versa.

So far, I have said nothing about performance. I'll delve into that in the next part of this series.

Copyright Jupitermedia Corp. All Rights Reserved.