Table of Contents
First, one thing that affects all queries: The more complex permission system setup you have, the more overhead you get.
If you do not have any GRANT statements done, MySQL will optimize the permission checking somewhat. So if you have a very high volume it may be worth the time to avoid grants. Otherwise, more permission check results in a larger overhead.
If your problem is with some explicit MySQL function, you can always time this in the MySQL client:
mysql> SELECT BENCHMARK(1000000,1+1); +------------------------+ | BENCHMARK(1000000,1+1) | +------------------------+ | 0 | +------------------------+ 1 row in set (0.32 sec)
The above shows that MySQL can execute 1,000,000 + expressions in 0.32 seconds on a PentiumII 400MHz.
All MySQL functions should be very optimized, but there may be some exceptions, and the BENCHMARK(loop_count,expression) is a great tool to find out if this is a problem with your query.
EXPLAIN tbl_name or EXPLAIN SELECT select_options
EXPLAIN tbl_name is a synonym for DESCRIBE tbl_name or SHOW COLUMNS FROM tbl_name.
When you precede a SELECT statement with the keyword EXPLAIN, MySQL explains how it would process the SELECT, providing information about how tables are joined and in which order.
With the help of EXPLAIN, you can see when you must add indexes to tables to get a faster SELECT that uses indexes to find the records.
You should frequently run ANALYZE TABLE to update table statistics such as cardinality of keys which can affect the choices the optimizer makes. See ANALYZE TABLE.
You can also see if the optimizer joins the tables in an optimal order. To force the optimizer to use a specific join order for a SELECT statement, add a STRAIGHT_JOIN clause.
For non-simple joins, EXPLAIN returns a row of information for each table used in the SELECT statement. The tables are listed in the order they would be read. MySQL resolves all joins using a single-sweep multi-join method. This means that MySQL reads a row from the first table, then finds a matching row in the second table, then in the third table and so on. When all tables are processed, it outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.
In MySQL version 4.1 the EXPLAIN output was changed to work better with constructs like UNION statements, subqueries and derived tables. Most notable is the addition of two new columns: id and select_type.
Output from EXPLAIN consists of the following columns:
id | SELECT identifier, the sequential number of this SELECT within the query. | |||||||||||||||||||||
select_type | Type of SELECT clause, which can be any of the following: |
| ||||||||||||||||||||
table | The table to which the row of output refers. | |||||||||||||||||||||
type | The join type. The different join types are listed here, ordered from best to worst type: |
| ||||||||||||||||||||
possible_keys | The possible_keys column indicates which indexes MySQL could use to find the rows in this table. Note that this column is totally independent of the order of the tables. That means that some of the keys in possible_keys may not be usable in practice with the generated table order. If this column is NULL, there are no relevant indexes. In this case, you may be able to improve the performance of your query by examining the WHERE clause to see if it refers to some column or columns that would be suitable for indexing. If so, create an appropriate index and check the query with EXPLAIN again. See ALTER TABLE. To see what indexes a table has, use SHOW INDEX FROM tbl_name. | |||||||||||||||||||||
key | The key column indicates the key (index) that MySQL actually decided to use. The key is NULL if no index was chosen. To force MySQL to use an key listed in the possible_keys column, use USE KEY/IGNORE KEY in your query. See SELECT. Also, running myisamchk --analyze (see myismchk syntax) or ANALYZE TABLE (see ANALYZE TABLE) on the table will help the optimizer choose better indexes. | |||||||||||||||||||||
key_len | The key_len column indicates the length of the key that MySQL decided to use. The length is NULL if the key is NULL. Note that this tells us how many parts of a multi-part key MySQL will actually use. | |||||||||||||||||||||
ref | The ref column shows which columns or constants are used with the key to select rows from the table. | |||||||||||||||||||||
rows | The rows column indicates the number of rows MySQL believes it must examine to execute the query. | |||||||||||||||||||||
Extra | This column contains additional information of how MySQL will resolve the query. Here is an explanation of the different text strings that can be found in this column: |
|
You can get a good indication of how good a join is by multiplying all values in the rows column of the EXPLAIN output. This should tell you roughly how many rows MySQL must examine to execute the query. This number is also used when you restrict queries with the max_join_size variable. See Server parameters.
The following example shows how a JOIN can be optimized progressively using the information provided by EXPLAIN.
Suppose you have the SELECT statement shown here, that you examine using EXPLAIN:
EXPLAIN SELECT tt.TicketNumber, tt.TimeIn, tt.ProjectReference, tt.EstimatedShipDate, tt.ActualShipDate, tt.ClientID, tt.ServiceCodes, tt.RepetitiveID, tt.CurrentProcess, tt.CurrentDPPerson, tt.RecordVolume, tt.DPPrinted, et.COUNTRY, et_1.COUNTRY, do.CUSTNAME FROM tt, et, et AS et_1, do WHERE tt.SubmitTime IS NULL AND tt.ActualPC = et.EMPLOYID AND tt.AssignedPC = et_1.EMPLOYID AND tt.ClientID = do.CUSTNMBR;
For this example, assume that:
The columns being compared have been declared as follows:
Table | Column | Column type |
tt | ActualPC | CHAR(10) |
tt | AssignedPC | CHAR(10) |
tt | ClientID | CHAR(10) |
et | EMPLOYID | CHAR(15) |
do | CUSTNMBR | CHAR(15) |
The tables have the indexes shown here:
Table | Index |
tt | ActualPC |
tt | AssignedPC |
tt | ClientID |
et | EMPLOYID (primary key) |
do | CUSTNMBR (primary key) |
The tt.ActualPC values aren't evenly distributed.
Initially, before any optimizations have been performed, the EXPLAIN statement produces the following information:
table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 do ALL PRIMARY NULL NULL NULL 2135 et_1 ALL PRIMARY NULL NULL NULL 74 tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 range checked for each record (key map: 35)
Because type is ALL for each table, this output indicates that MySQL is generating a Cartesian product of all the tables! This will take quite a long time, as the product of the number of rows in each table must be examined! For the case at hand, this is 74 * 2135 * 74 * 3872 = 45,268,558,720 rows. If the tables were bigger, you can only imagine how long it would take.
One problem here is that MySQL can't (yet) use indexes on columns efficiently if they are declared differently. In this context, VARCHAR and CHAR are the same unless they are declared as different lengths. Because tt.ActualPC is declared as CHAR(10) and et.EMPLOYID is declared as CHAR(15), there is a length mismatch.
To fix this disparity between column lengths, use ALTER TABLE to lengthen ActualPC from 10 characters to 15 characters:
mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);
Now tt.ActualPC and et.EMPLOYID are both VARCHAR(15). Executing the EXPLAIN statement again produces this result:
table type possible_keys key key_len ref rows Extra tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 Using where do ALL PRIMARY NULL NULL NULL 2135 range checked for each record (key map: 1) et_1 ALL PRIMARY NULL NULL NULL 74 range checked for each record (key map: 1) et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1
This is not perfect, but is much better (the product of the rows values is now less by a factor of 74). This version is executed in a couple of seconds.
A second alteration can be made to eliminate the column length mismatches for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID = do.CUSTNMBR comparisons:
mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15), -> MODIFY ClientID VARCHAR(15);
Now EXPLAIN produces the output shown here:
table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 tt ref AssignedPC, ActualPC 15 et.EMPLOYID 52 Using where ClientID, ActualPC et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
This is almost as good as it can get.
The remaining problem is that, by default, MySQL assumes that values in the tt.ActualPC column are evenly distributed, and that isn't the case for the tt table. Fortunately, it is easy to tell MySQL about this:
shell> myisamchk --analyze PATH_TO_MYSQL_DATABASE/tt shell> mysqladmin refresh
Now the join is perfect, and EXPLAIN produces this result:
table type possible_keys key key_len ref rows Extra tt ALL AssignedPC NULL NULL NULL 3872 Using where ClientID, ActualPC et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
Note that the rows column in the output from EXPLAIN is an educated guess from the MySQL join optimizer. To optimize a query, you should check if the numbers are even close to the truth. If not, you may get better performance by using STRAIGHT_JOIN in your SELECT statement and trying to list the tables in a different order in the FROM clause.