2.6.1.9. Database optimization

Using examples of errors encountered by hosting users, let's explore how to optimize a database.

Here is an example of a table containing more than 40,000 entries, which receives a large number of queries that place a significant load on the server:

CREATE TABLE `links` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  `source` INT(11) UNSIGNED NOT NULL,
  `category` INT(11) UNSIGNED NOT NULL,
  `title` text NOT NULL,
  `description` text NOT NULL,
  `text` text NOT NULL,
  `link` text NOT NULL,
  `publication` datetime NOT NULL,
  `scan` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique` (`id`) USING BTREE,
  KEY `normal` (`source`,`category`,`title`(100),`publication`,`description`(100),`text`(100),`scan`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8

Examples of incoming queries:

SELECT COUNT(*)
FROM links
WHERE link = 'http://www.......com.ua/31-08-2014/trolleybook-podgotovil-prazdnichnuyu-programmu-k-limba-noastr-/n96349/'
AND UNIX_TIMESTAMP(publication) = 1409472019

When we run the query with the EXPLAIN prefix, we see that the entire table — about 40,000 rows — is scanned to return the results. To optimize the table's performance, we add an index on the publication field. This alone will be enough to speed up the query and reduce the load on the MySQL server's CPU.

Adding an index

You can add an index in any of the following ways:
  • Using the SQL query:
    ALTER TABLE `table_name` ADD INDEX(`column_name`);

    In the query: table_name is the table name, and column_name is the name of the column for which you want to add an index.

We also recommend:

  1. Change the data type of the title, description, and link fields from text to varchar with the appropriate length.
  2. Remove the UNIQ index on the id column. It is unnecessary because the PRIMARY index on the same column already serves this purpose.
  3. An index named normal raises even more questions, since the server has to work quite hard to create it. It is simply enormous, and it is highly unlikely that searches are performed across all the fields listed in it. Most likely, this is a "dead" index that does more harm than good. Developers should be aware that the index will only be used if the query contains fields that appear in it from left to right. For example, this index will be used if the WHERE clause specifies the fields category, title, and source. If the source field is not specified in the query, then the index will not be used.

Finally, there is a question regarding the necessity of using the COUNT(*) clause itself. It seems very likely that the author intends to use this to check whether a entry exists in the table, rather than to obtain the total number of entries in the table with the specified URL for the specified date. In that case, it is correct to use a query without the COUNT(*) clause but with the LIMIT 1 condition:

SELECT id
FROM links
WHERE
link =  'http://www.......com.ua/31-08-2014/trolleybook-podgotovil-prazdnichnuyu-programmu-k-limba-noastr-/n96349/'
AND UNIX_TIMESTAMP(publication) = 1409472019
LIMIT 1

In this case, the database will stop the search as soon as the first record is found, whereas with the COUNT clause, it will scan the entire table.

P.S. Although the SQL query searches across two columns, publication and link, the link field was not added to the index. This is because, for this table, the publication field — which includes the date — is quite unique and will contain at most a few rows with different link values. Therefore, the overhead of creating an index for the link field significantly exceeds the cost of scanning a few rows using a single publication index.

Content

    (3)