A subquery is a SELECT statement inside another statement. For example:
SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);
In the above example, SELECT * FROM t1 ... is the outer query (or outer statement), and (SELECT column1 FROM t2) is the subquery. We say that the subquery is nested in the outer query, and in fact it's possible to nest subqueries within other subqueries, to a great depth. A subquery must always be inside parentheses.
Starting with version 4.1, MySQL supports all subquery forms and operations which the SQL standard requires, as well as a few features which are MySQL-specific. The main advantages of subqueries are:
They allow queries which are structured so that it's possible to isolate each part of a statement.
They provide alternative ways to perform operations which would otherwise require complex joins and unions.
They are, in many people's opinion, readable. Indeed, it was the innovation of subqueries which gave people the original idea of calling the early SQL ``Structured Query Language''.
With earlier MySQL versions it was necessary to work around or avoid subqueries, but people starting to write code now will find that subqueries are a very useful part of the toolkit.
Here is an example statement which shows the major points about subquery syntax as specified by the SQL standard and supported in MySQL.
DELETE FROM t1 WHERE s11 > ANY (SELECT COUNT(*) /* no hint */ FROM t2 WHERE NOT EXISTS (SELECT * FROM t3 WHERE ROW(5*t2.s1,77)= (SELECT 50,11*s1 FROM t4 UNION SELECT 50,77 FROM (SELECT * FROM t5) AS t5)));
For MySQL versions prior to 4.1, most subqueries can be successfully rewritten using joins and and other methods. See Rewriting subqueries.
In its simplest form (the scalar subquery as opposed to the row or table subqueries which will be discussed later), a subquery is a simple operand. Thus you can use it wherever a column value or literal is legal, and you can expect it to have those characteristics that all operands have: a data type, a length, an indication whether it can be NULL, and so on. For example:
CREATE TABLE t1 (s1 INT, s2 CHAR(5) NOT NULL); SELECT (SELECT s2 FROM t1);
The subquery in the above SELECT has a data type of CHAR, a length of 5, a character set and collation equal to the defaults in effect at CREATE TABLE time, and an indication that the value in the column can be NULL. In fact almost all subqueries can be NULL, because if the table is empty -- as in the example -- then the value of the subquery will be NULL. There are few restrictions.
A subquery's outer statement can be any one of: SELECT, INSERT, UPDATE, DELETE, SET, or DO.
A subquery can contain any of the keywords or clauses that an ordinary SELECT can contain: DISTINCT, GROUP BY, ORDER BY, LIMIT, joins, hints, UNION constructs, comments, functions, and so on.
So, when you see examples in the following sections that contain the rather Spartan construct (SELECT column1 FROM t1), imagine that your own code will contain much more diverse and complex constructions.
For example, suppose we make two tables:
CREATE TABLE t1 (s1 INT); INSERT INTO t1 VALUES (1); CREATE TABLE t2 (s1 INT); INSERT INTO t2 VALUES (2);
Then perform a SELECT:
SELECT (SELECT s1 FROM t2) FROM t1;
The result will be 2 because there is a row in t2, with a column s1, with a value of 2.
The subquery may be part of an expression. If it is an operand for a function, don't forget the parentheses. For example:
SELECT UPPER((SELECT s1 FROM t1)) FROM t2;
The most common use of a subquery is in the form:
<non-subquery operand> <comparison operator> (<subquery>)
Where <comparison operator> is one of:
= > < >= <= <>
For example:
... 'a' = (SELECT column1 FROM t1)
At one time the only legal place for a subquery was on the right side of a comparison, and you might still find some old DBMSs which insist on that.
Here is an example of a common-form subquery comparison which you can't do with a join: find all the values in table t1 which are equal to a maximum value in table t2.
SELECT column1 FROM t1 WHERE column1 = (SELECT MAX(column2) FROM t2);
Here is another example, which again is impossible with a join because it involves aggregating for one of the tables: find all rows in table t1 which contain a value which occurs twice.
SELECT * FROM t1 WHERE 2 = (SELECT COUNT(column1) FROM t1);
Syntax:
<operand> <comparison operator> ANY (<subquery>) <operand> IN (<subquery>) <operand> <comparison operator> SOME (<subquery>)
The word ANY, which must follow a comparison operator, means ``return TRUE if the comparison is TRUE for ANY of the rows that the subquery returns.'' For example,
SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);
Suppose that there is a row in table t1 containing {10}. The expression is TRUE if table t2 contains {21,14,7} because there is a value in t2 -- 7 -- which is less than 10. The expression is FALSE if table t2 contains {20,10}, or if table t2 is empty. The expression is UNKNOWN if table t2 contains {NULL,NULL,NULL}.
The word IN is an alias for = ANY. Thus these two statements are the same:
SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2); SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);
The word SOME is an alias for ANY. Thus these two statements are the same:
SELECT s1 FROM t1 WHERE s1 <> ANY (SELECT s1 FROM t2); SELECT s1 FROM t1 WHERE s1 <> SOME (SELECT s1 FROM t2);
Use of the word SOME is rare, but the above example shows why it might be useful. The English phrase ``a is not equal to any b'' means, to most people's ears, ``there is no b which is equal to a'' -- which isn't what is meant by the SQL syntax. By using <> SOME instead, you ensure that everyone understands the true meaning of the query.
Syntax:
<operand> <comparison operator> ALL (<subquery>)
The word ALL, which must follow a comparison operator, means ``return TRUE if the comparison is TRUE for ALL of the rows that the subquery returns''. For example,
SELECT s1 FROM t1 WHERE s1 > ALL (SELECT s1 FROM t2);
Suppose that there is a row in table t1 containing {10}. The expression is TRUE if table t2 contains {-5,0,+5} because all three values in t2 are less than 10. The expression is FALSE if table t2 contains {12,6,NULL,-100} because there is a single value in table t2 -- 12 -- which is greater than 10. The expression is UNKNOWN if table t2 contains {0,NULL,1}.
Finally, if table t2 is empty, the result is TRUE. You might think the result should be UNKNOWN, but sorry, it's TRUE. So, rather oddly,
SELECT * FROM t1 WHERE 1 > ALL (SELECT s1 FROM t2);
is TRUE when table t2 is empty, but
SELECT * FROM t1 WHERE 1 > (SELECT s1 FROM t2);
is UNKNOWN when table t2 is empty. In addition,
SELECT * FROM t1 WHERE 1 > ALL (SELECT MAX(s1) FROM t2);
is UNKNOWN when table t2 is empty. In general, tables with NULLs and empty tables are edge cases -- when writing subquery code, always consider whether you have taken those two possibilities into account.
A correlated subquery is a subquery which contains a reference to a column which is also in the outer query. For example:
SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2 WHERE t2.column2 = t1.column2);
Notice, in the example, that the subquery contains a reference to a column of t1, even though the subquery's FROM clause doesn't mention a table t1. So MySQL looks outside the subquery, and finds t1 in the outer query.
Suppose that table t1 contains a row where column1 = 5 and column2 = 6; meanwhile table t2 contains a row where column1 = 5 and column2 = 7. The simple expression ... WHERE column1 = ANY (SELECT column1 FROM t2) would be TRUE, but in this example the WHERE clause within the subquery is FALSE (because 7 <> 5), so the subquery as a whole is FALSE.
Scoping rule: MySQL evaluates from inside to outside. For example:
SELECT column1 FROM t1 AS x WHERE x.column1 = (SELECT column1 FROM t2 AS x WHERE x.column1 = (SELECT column1 FROM t3 WHERE x.column2 = t3.column1));
In the above, x.column2 must be a column in table t2 because SELECT column1 FROM t2 AS x ... renames t2. It is not a column in table t1 because SELECT column1 FROM t1 ... is an outer query which is further out.
For subqueries in HAVING or ORDER BY clauses, MySQL also looks for column names in the outer select list.
MySQL's unofficial recommendation is: avoid correlation because it makes your queries look more complex, and run more slowly.
If a subquery returns any values at all, then EXISTS <subquery> is TRUE, and NOT EXISTS <subquery> is FALSE. For example:
SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);
Traditionally an EXISTS subquery starts with SELECT * but it could begin with SELECT 5 or SELECT column1 or anything at all -- MySQL ignores the SELECT list in such a subquery, so it doesn't matter.
For the above example, if t2 contains any rows, even rows with nothing but NULL values, then the EXISTS condition is TRUE. This is actually an unlikely example, since almost always a [NOT] EXISTS subquery will contain correlations. Here are some more realistic examples.
Example: What kind of store is present in one or more cities?
SELECT DISTINCT store_type FROM Stores WHERE EXISTS (SELECT * FROM Cities_Stores WHERE Cities_Stores.store_type = Stores.store_type);
Example: What kind of store is present in no cities?
SELECT DISTINCT store_type FROM Stores WHERE NOT EXISTS (SELECT * FROM Cities_Stores WHERE Cities_Stores.store_type = Stores.store_type);
Example: What kind of store is present in all cities?
SELECT DISTINCT store_type FROM Stores S1 WHERE NOT EXISTS ( SELECT * FROM Cities WHERE NOT EXISTS ( SELECT * FROM Cities_Stores WHERE Cities_Stores.city = Cities.city AND Cities_Stores.store_type = Stores.store_type));
The last example is a double-nested NOT EXISTS query -- it has a NOT EXISTS clause within a NOT EXISTS clause. Formally, it answers the question ``does a city exist with a store which is not in Stores?''. But it's easier to say that a nested NOT EXISTS answers the question ``is x TRUE for all y?''.
The discussion to this point has been of column (or scalar) subqueries -- subqueries which return a single column value. A row subquery is a subquery variant that returns a single row value -- and may thus return more than one column value. Here are two examples:
SELECT * FROM t1 WHERE (1,2) = (SELECT column1, column2 FROM t2); SELECT * FROM t1 WHERE ROW(1,2) = (SELECT column1, column2 FROM t2);
The queries above are both TRUE if table t2 has a row where column1 = 1 and column2 = 2.
The expression (1,2) is sometimes called a row constructor and is legal in other contexts too. For example
SELECT * FROM t1 WHERE (column1,column2) = (1,1);
is equivalent to
SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1;
The normal use of row constructors, though, is for comparisons with subqueries that return two or more columns. For example, this query answers the request: ``find all rows in table t1 which are duplicated in table t2'':
SELECT column1,column2,column3 FROM t1 WHERE (column1,column2,column3) IN (SELECT column1,column2,column3 FROM t2);
Subqueries are legal in a SELECT statement's FROM clause. The syntax that you'll actually see is:
SELECT ... FROM (<subquery>) AS <name> ...
The AS <name> clause is mandatory, because any table in a FROM clause must have a name. Any columns in the <subquery> select list must have unique names. You may find this syntax described elsewhere in this manual, where the term used is ``derived tables''.
For illustration, assume you have this table:
CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);
Here's how to use the Subqueries in the FROM clause feature, using the example table:
INSERT INTO t1 VALUES (1,'1',1.0); INSERT INTO t1 VALUES (2,'2',2.0); SELECT sb1,sb2,sb3 FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb WHERE sb1 > 1;
Result: 2, '2', 4.0.
Here's another example: Suppose you want to know the average of the sum for a grouped table. This won't work:
SELECT AVG(SUM(column1)) FROM t1 GROUP BY column1;
But this query will provide the desired information:
SELECT AVG(sum_column1) FROM (SELECT SUM(column1) AS sum_column1 FROM t1 GROUP BY column1) AS t1;
Notice that the column name used within the subquery (sum_column1) is recognized in the outer query.
At the moment, subqueries in the FROM clause cannot be correlated subqueries.
There are some new error returns which apply only to subqueries. This section groups them together because reviewing them will help remind you of some points.
ERROR 1235 (ER_NOT_SUPPORTED_YET) SQLSTATE = 42000 Message = "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"
This means that
SELECT * FROM t1 WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1)
will not work, but only in some early versions, such as MySQL 4.1.1.
ERROR 1240 (ER_CARDINALITY_COL) SQLSTATE = 21000 Message = "Operand should contain 1 column(s)"
This error will occur in cases like this:
SELECT (SELECT column1, column2 FROM t2) FROM t1;
It's okay to use a subquery that returns multiple columns, if the purpose is comparison. See Row subqueries. But in other contexts the subquery must be a scalar operand.
ERROR 1241 (ER_SUBSELECT_NO_1_ROW) SQLSTATE = 21000 Message = "Subquery returns more than 1 row"
This error will occur in cases like this:
SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);
but only when there is more than one row in t2. That means this error might occur in code that has been working for years, because somebody happened to make a change which affected the number of rows that the subquery can return. Remember that if the object is to find any number of rows, not just one, then the correct statement would look like this:
SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2);
Error 1093 (ER_UPDATE_TABLE_USED) SQLSTATE = HY000 Message = "You can't specify target table 'x' for update in FROM clause"
This error will occur in cases like this:
UPDATE t1 SET column2 = (SELECT MAX(column1) FROM t1);
It's okay to use a subquery for assignment within an UPDATE statement, since subqueries are legal in UPDATE and in DELETE statements as well as in SELECT statements. However, you cannot use the same table, in this case table t1, for both the subquery's FROM clause and the update target.
Usually, failure of the subquery causes the entire statement to fail.
Development is ongoing, so no optimization tip is reliable for the long term. Some interesting tricks that you might want to play with are:
Using subquery clauses which affect the number or order of the rows in the subquery, for example
SELECT * FROM t1 WHERE t1.column1 IN (SELECT column1 FROM t2 ORDER BY column1); SELECT * FROM t1 WHERE t1.column1 IN (SELECT DISTINCT column1 FROM t2); SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 LIMIT 1);
Replacing a join with a subquery, for example
SELECT DISTINCT column1 FROM t1 WHERE t1.column1 IN ( SELECT column1 FROM t2);
instead of
SELECT DISTINCT t1.column1 FROM t1, t2 WHERE t1.column1 = t2.column1;
Moving clauses from outside to inside the subquery, for example:
SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1 UNION ALL SELECT s1 FROM t2);
instead of
SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1) OR s1 IN (SELECT s1 FROM t2);
For another example:
SELECT (SELECT column1 + 5 FROM t1) FROM t2;
instead of
SELECT (SELECT column1 FROM t1) + 5 FROM t2;
Using a row subquery instead of a correlated subquery, for example:
SELECT * FROM t1 WHERE (column1,column2) IN (SELECT column1,column2 FROM t2);
instead of
SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1=t1.column1 AND t2.column2=t1.column2);
Using NOT (a = ANY (...)) rather than a <> ALL (...).
Using x = ANY (table containing {1,2}) rather than x=1 OR x=2.
Using = ANY rather than EXISTS
The above tricks may cause programs to go faster or slower. Using MySQL facilities like the BENCHMARK() function, you can get an idea about what helps in your own situation. Don't worry too much about transforming to joins except for compatibility with older versions.
Some optimizations that MySQL itself will make are:
MySQL will execute non-correlated subqueries only once, (use EXPLAIN to make sure that a given subquery really is non-correlated),
MySQL will rewrite IN/ALL/ANY/SOME subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed,
MySQL will replace subqueries of the form
... IN (SELECT indexed_column FROM single_table ...)
with an index-lookup function, which EXPLAIN will describe as a special join type,
MySQL will enhance expressions of the form
value {ALL|ANY|SOME} {> | < | >= | <=} (non-correlated subquery)
with an expression involving MIN or MAX (unless NULL values or empty sets are involved). For example,
WHERE 5 > ALL (SELECT x FROM t)
might be treated as
WHERE 5 > (SELECT MAX(x) FROM t)
There is a chapter titled ``How MySQL Transforms Subqueries'' in the MySQL Internals Manual, which you can find by downloading the MySQL source package and looking for a file named internals.texi.
Up to version 4.0, only nested queries of the form INSERT ... SELECT ... and REPLACE ... SELECT ... are supported. The IN() construct can be used in other contexts.
It is often possible to rewrite a query without a subquery:
SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);
This can be rewritten as:
SELECT t1.* FROM t1,t2 WHERE t1.id=t2.id;
The queries:
SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2); SELECT * FROM t1 WHERE NOT EXISTS (SELECT id FROM t2 WHERE t1.id=t2.id);
Can be rewritten as:
SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id IS NULL;
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better -- a fact that is not specific to MySQL Server alone. Prior to SQL-92, outer joins did not exist, so subqueries were the only way to do certain things in those bygone days. Today, MySQL Server and many other modern database systems offer a whole range of outer joins types.
For more complicated subqueries you can often create temporary tables to hold the subquery. In some cases, however, this option will not work. The most frequently encountered of these cases arises with DELETE statements, for which standard SQL does not support joins (except in subqueries). For this situation there are three options available:
The first option is to upgrade to MySQL version 4.1.
The second option is to use a procedural programming language (such as Perl or PHP) to submit a SELECT query to obtain the primary keys for the records to be deleted, and then use these values to construct the DELETE statement (DELETE FROM ... WHERE ... IN (key1, key2, ...)).
The third option is to use interactive SQL to construct a set of DELETE statements automatically, using the MySQL extension CONCAT() (in lieu of the standard || operator). For example:
SELECT CONCAT('DELETE FROM tab1 WHERE pkid = ', "'", tab1.pkid, "'", ';') FROM tab1, tab2 WHERE tab1.col1 = tab2.col2;
You can place this query in a script file and redirect input from it to the mysql command-line interpreter, piping its output back to a second instance of the interpreter:
shell> mysql --skip-column-names mydb < myscript.sql | mysql mydb
MySQL Server 4.0 supports multiple-table DELETE statements that can be used to efficiently delete rows based on information from one table or even from many tables at the same time. Multiple-table UPDATE statements are also supported from version 4.0.