2.6.1.9. Database optimization
Using examples of errors encountered by hosting users, let's explore how to optimize a database.
Here is an example of a table containing more than 40,000 entries, which receives a large number of queries that place a significant load on the server:
CREATE TABLE `links` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`source` INT(11) UNSIGNED NOT NULL,
`category` INT(11) UNSIGNED NOT NULL,
`title` text NOT NULL,
`description` text NOT NULL,
`text` text NOT NULL,
`link` text NOT NULL,
`publication` datetime NOT NULL,
`scan` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique` (`id`) USING BTREE,
KEY `normal` (`source`,`category`,`title`(100),`publication`,`description`(100),`text`(100),`scan`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8
Examples of incoming queries:
SELECT COUNT(*)
FROM links
WHERE link = 'http://www.......com.ua/31-08-2014/trolleybook-podgotovil-prazdnichnuyu-programmu-k-limba-noastr-/n96349/'
AND UNIX_TIMESTAMP(publication) = 1409472019
When we run the query with the EXPLAIN prefix, we see that the entire table — about 40,000 rows — is scanned to return the results. To optimize the table's performance, we add an index on the publication field. This alone will be enough to speed up the query and reduce the load on the MySQL server's CPU.
Adding an index
You can add an index in any of the following ways:- Using the SQL query:
ALTER TABLE `table_name` ADD INDEX(`column_name`);In the query:
table_nameis the table name, andcolumn_nameis the name of the column for which you want to add an index.
We also recommend:
- Change the data type of the
title,description, andlinkfields fromtexttovarcharwith the appropriate length. - Remove the
UNIQindex on theidcolumn. It is unnecessary because thePRIMARYindex on the same column already serves this purpose. - An index named
normalraises even more questions, since the server has to work quite hard to create it. It is simply enormous, and it is highly unlikely that searches are performed across all the fields listed in it. Most likely, this is a "dead" index that does more harm than good. Developers should be aware that the index will only be used if the query contains fields that appear in it from left to right. For example, this index will be used if theWHEREclause specifies the fieldscategory,title, andsource. If thesourcefield is not specified in the query, then the index will not be used.
Finally, there is a question regarding the necessity of using the COUNT(*) clause itself. It seems very likely that the author intends to use this to check whether a entry exists in the table, rather than to obtain the total number of entries in the table with the specified URL for the specified date. In that case, it is correct to use a query without the COUNT(*) clause but with the LIMIT 1 condition:
SELECT id
FROM links
WHERE
link = 'http://www.......com.ua/31-08-2014/trolleybook-podgotovil-prazdnichnuyu-programmu-k-limba-noastr-/n96349/'
AND UNIX_TIMESTAMP(publication) = 1409472019
LIMIT 1
In this case, the database will stop the search as soon as the first record is found, whereas with the COUNT clause, it will scan the entire table.
P.S. Although the SQL query searches across two columns, publication and link, the link field was not added to the index. This is because, for this table, the publication field — which includes the date — is quite unique and will contain at most a few rows with different link values. Therefore, the overhead of creating an index for the link field significantly exceeds the cost of scanning a few rows using a single publication index.