November 26, 2014
 
 
RSSRSS feed

Using Temporary Tables to Speed Up MySQL

When to use Temporary Tables

  • October 30, 2009
  • By Rob Gravelle

Knowing how to use temporary tables in your stored procedures is a great way to save time and effort, particularly when dealing with multiple queries that return very large and/or similar resultsets. If you're fetching several resultsets that contain a lot of common data, it can make sense to obtain a single resultset that has all the data that's required, store it in a temporary table, and then select from this temporary table as needed. Similarly, when repeatedly referring to the same subset of table data, it's often faster and more economical to store results in a temporary table, rather than repeating a complex join several times. In today's article, we'll review just how easily temporary tables can be utilized to make your stored procedures more efficient.

When to use Temporary Tables

Temporary Tables, or temp tables for short, allow you to create a short-term storage place within the database for a set of data that you need to use several times in a single series of operations. Temp tables come into play when it isn't possible to retrieve all the data that you require using one SELECT statement or when you want to work with subsets of the same, larger resultset over several successive operations. Temp tables are supported in MySQL 3.23 and later.

To create one, all you need to do is include the TEMPORARY keyword in a table creation statement:

CREATE TEMPORARY TABLE temp_table (count int);

You can now use the table in a query:

INSERT INTO temp_table VALUES((1),(2),(3));SELECT * FROM temp_table;Returns:Count
-----
1
2
3

You don't have to worry about removing the table when you're done because it will be deleted once you log out. Attempting to refer to it in a later session will result in an error such as:

Table '.temp_table' doesn't exist.

If you need to, you can delete temp tables using the same command as for regular ones:

DROP TABLE temp_table;

Using a Temp Table to Combine Data

One very common use of temp tables is to combine data from different sources together to produce a single result set. In the following example, the "temp_table_article" database contains three tables: one that holds client data, one that stores citizenship information, and a relationship table that links them:

CREATE TABLE 'temp_table_article'.'client' (
	'client_id' INT UNSIGNED NOT NULL AUTO_INCREMENT, 
	'date_of_birth' DATE, 
	'gender' CHAR (1), 
	'logical_delete_indicator' TINYINT (1) DEFAULT '0' NOT NULL, 
	PRIMARY KEY('client_id'), 
	UNIQUE('client_id')
);INSERT INTO temp_table_article.client VALUES(1, '1976-02-12', 'M', 1);
INSERT INTO temp_table_article.client VALUES(2, '1944-01-15', 'F', 0);
INSERT INTO temp_table_article.client VALUES(3, '1956-06-04', 'M', 1);
INSERT INTO temp_table_article.client VALUES(4, '1938-11-19', 'F', 0); CREATE TABLE 'temp_table_article'.'citizenship' (
	'citizenship_id' INT UNSIGNED AUTO_INCREMENT, 
	'country_code' INT UNSIGNED, 
	'primary_citizenship' TINYINT UNSIGNED DEFAULT '1' NOT NULL, 
	'logical_delete_indicator' TINYINT UNSIGNED DEFAULT '0' NOT NULL, 
	PRIMARY KEY('citizenship_id'), 
	UNIQUE('citizenship_id')
); INSERT INTO temp_table_article.citizenship VALUES(1, 23, 1, 0);
INSERT INTO temp_table_article.citizenship VALUES(2, 22, 0, 1);
INSERT INTO temp_table_article.citizenship VALUES(3, 19, 1, 1);
INSERT INTO temp_table_article.citizenship VALUES(4, 20, 0, 0);
INSERT INTO temp_table_article.citizenship VALUES(5, 19, 1, 1);
INSERT INTO temp_table_article.citizenship VALUES(6, 56, 1, 0);
INSERT INTO temp_table_article.citizenship VALUES(7, 77, 1, 0); CREATE TABLE 'temp_table_article'.'client_citizenship_rel' (
	'relationship_id' INT UNSIGNED NOT NULL AUTO_INCREMENT, 
	'client_id' INT UNSIGNED NOT NULL, 
	'citizenship_id' INT UNSIGNED NOT NULL, PRIMARY KEY('relationship_id'), 
	UNIQUE('relationship_id')
);INSERT INTO temp_table_article.client_citizenship_rel VALUES(1, 1, 1);
INSERT INTO temp_table_article.client_citizenship_rel VALUES(2, 1, 2);
INSERT INTO temp_table_article.client_citizenship_rel VALUES(3, 1, 3);
INSERT INTO temp_table_article.client_citizenship_rel VALUES(4, 2, 4);
INSERT INTO temp_table_article.client_citizenship_rel VALUES(5, 2, 5);
INSERT INTO temp_table_article.client_citizenship_rel VALUES(6, 3, 6);
INSERT INTO temp_table_article.client_citizenship_rel VALUES(7, 3, 7);
INSERT INTO temp_table_article.client_citizenship_rel VALUES(8, 2, 7);

The preceding tables' relationship can be represented graphically as follows:

The preceding tables' relationship can be represented graphically

I have stated before that normalized structures such as the one above are great for efficiently storing data, but are actually less-than-ideal for querying. We can see for ourselves as soon as we try to obtain a list of active clients and their related citizenships. Assuming that citizenships are not mandatory, we would want all of the records in the client table whose logical_delete_indicator are equal to 0. Joining the tables using common fields, we could write the following SELECT statement:

SELECT cl.*,   
       cit.*  
FROM   temp_table_article.client AS cl,   
       temp_table_article.citizenship AS cit,   
       temp_table_article.client_citizenship_rel AS rel  
WHERE  cl.client_id = rel.client_id 
AND    cit.citizenship_id = rel.citizenship_id
AND    cl.logical_delete_indicator = 0
AND    cit.logical_delete_indicator = 0;

It does limit the results to active clients, but it also misses those who do not possess any associated citizenships. Here are the first three fields of the preceding query's results:

client_id    date_of_birth    gender    ...		
2            1944-01-15       F		
2            1944-01-15       F					

We could try an OUTER JOIN to include all of the client records, regardless of whether or not they possess any related citizenships:

SELECT cl.*,   
       cit.*  
FROM   temp_table_article.client AS cl,   
       temp_table_article.citizenship AS cit     
LEFT OUTER JOIN temp_table_article.client_citizenship_rel rel ON rel.client_id  
WHERE cit.citizenship_id = rel.citizenship_id
AND   cl.logical_delete_indicator = 0
AND   cit.logical_delete_indicator = 0;

This solution is even more inadequate than the first, as it produces a cartesian product, which is a dataset of two relations whose join is not restricted by any criteria, resulting in every tuple of the first relation being matched with every tuple of the second relation:

client_id    date_of_birth      gender		...
2            1944-01-15         F		
4            1938-11-19         F		
2            1944-01-15         F		
4            1938-11-19         F		
2            1944-01-15         F		
4            1938-11-19         F		
2            1944-01-15         F		
4            1938-11-19         F		
2            1944-01-15         F		
4            1938-11-19         F					

What we'd like to do is append the extra records to the first query to include the orphan client rows. One way to do that is by using a temporary table. Here's the code to store the fields that we are interested in:

CREATE TEMPORARY TABLE client_citizenship (
	client_id INT NOT NULL, 
	date_of_birth DATE, 
	gender CHAR (1), 
   	citizenship_id INT, 
	country_code INT , 
	primary_citizenship TINYINT
);

To populate the table, we'll use an INSERT INTO statement. It supports values or the results of a SELECT query, as we'll be doing:

INSERT INTO client_citizenship
SELECT 	cl.client_id,
	cl.date_of_birth, 
	cl.gender,    
	cit.citizenship_id, 
	cit.country_code, 
	cit.primary_citizenship
FROM  	temp_table_article.client AS cl,   
	temp_table_article.citizenship AS cit,   
	temp_table_article.client_citizenship_rel AS rel  
WHERE cl.client_id  		= rel.client_id 
AND cit.citizenship_id 	= rel.citizenship_id
AND cit.logical_delete_indicator = 0
AND cl.logical_delete_indicator = 0;

We can now perform a "SELECT ALL" on our temp table to see how we're coming along:

SELECT * FROM client_citizenship;Returns:client_id, date_of_birth, gender, citizenship_id, country_code, primary_citizenship
2,         1944-01-15,    F,      4,              20,           0
2,         1944-01-15,    F,      7,              77,           1

Now all that's missing are client records who do not possess associated citizenships. To fetch those, we can perform a second INSERT which includes a "NOT IN" clause. The idea is to find those client records that do not appear in the relationship table. To do that, we can perform a subquery that retrieves just the client_ids. Unfortunately, you cannot refer to a temporary table more than once in the same query in MySQL, so we can't compare the client_ids directly to those which are already in the temp table. For example, the following won't work:

INSERT INTO  client_citizenship
SELECT 
FROM   temp_table_article.client AS cl,
       client_citizenship AS cc
WHERE  cl.client_id <> cc.client_id
AND cl.logical_delete_indicator = 0;

It results in the following error:

ERROR 1137: Can't reopen table: 'client_citizenship'

Instead of reading from the temp table, we can retrieve the client_ids from the relationship table:

INSERT INTO client_citizenship
SELECT cl.client_id,
       cl.date_of_birth,
       cl.gender,    
       NULL, 
       NULL, 
       NULL
FROM   temp_table_article.client AS cl
WHERE  cl.client_id NOT IN 
(SELECT rel.client_id
FROM    temp_table_article.client_citizenship_rel rel)
AND cl.logical_delete_indicator = 0;

Even though we aren't looking for citizenship fields, we still have to insert something in every column of the temp table. Nulls are always a good choice, as they work for any data type. Like last time, we want to check that the row has not been logically deleted.

Now, we can select our final resultset from the temp table:

SELECT * FROM client_citizenship;Returns:client_id, date_of_birth, gender, citizenship_id, country_code, primary_citizenship
2,         1944-01-15,    F,      1,              23,           1
2,         1944-01-15,    F,      7,              77,           1
4,         1938-11-19,    F,      NULL,           NULL,         NULL

In this example, I used codes where ever possible, just like in a real world database. To pick up the text descriptions, we would include subqueries to a code table for each field in our final SELECT statement. We'll be looking at subqueries next time, so I'll elaborate on this point then.

See All Articles by Columnist Rob Gravelle

Article courtesy of Database Journal

Sitemap | Contact Us