2.6.1.10. Database query optimization

This article explains the logic behind optimizing database queries, since most programmers test their programs with only a small number of records in the tables, while the site owner's problems begin later, once the product catalogs are populated.

For example, here is a query:

SELECT 
   p.product_id, 
  (SELECT AVG(rating) AS total FROM mc_review r1 WHERE r1.product_id = p.product_id AND r1.STATUS ='1' GROUP BY r1.product_id) AS rating 
FROM mc_product p 
LEFT JOIN mc_product_description pd ON (p.product_id = pd.product_id) 
LEFT JOIN mc_product_to_store p2s ON (p.product_id = p2s.product_id) 
WHERE 
   pd.language_id = '2' AND 
   p.STATUS = '1' AND 
   p.date_available <= NOW() AND 
   p2s.store_id = '0' AND 
   p.product_id IN (SELECT pt.product_id FROM mc_product_tag pt WHERE pt.language_id = '2' AND LOWER(pt.tag) LIKE '%roxolana%') 
ORDER BY rating ASC 
LIMIT 0,20

If you run the query with the EXPLAIN clause at the beginning, you will get the query execution plan:

Select type Table Type Possible keys Key Key len Ref Rows Extra
PRIMARY p ALL PRIMARY 2907 Using where;
Using filesort
PRIMARY pd eq_ref PRIMARY PRIMARY 8 mebelnyc_db.p.product_id, const 1 Using where;
Using index
PRIMARY p2s eq_ref PRIMARY PRIMARY 8 mebelnyc_db.p.product_id, const 1 Using where;
Using index
DEPENDENT SUBQUERY pt ALL 6803 Using where
DEPENDENT SUBQUERY r1 ref product_id product_id 4 mebelnyc_db.p.product_id 1 Using where
  1. If you remove the LIMIT condition from this query, it will return 2,907 records. The SELECT query nested within the condition will be executed exactly 2907 times. If you move this part of the query into a separate query, it will reduce the load on the database by a factor of 2907/20 = 145. However, judging by the query's name, one can conclude that the program's author is attempting, in this interesting way, to count product statistics every time a visitor accesses the site — statistics that could instead be recalculated, for example, once a day, or better yet, when a review is added to a product and added to a separate column in the mc_product table, which would eliminate the need for this nested query.
  2. In the WHERE condition, we see a subquery that is executed within the IN condition. If the programmer had specified static values in the IN condition instead of a subquery — for example, IN (121, 1235, 43554) — then MySQL would have used an index and executed the query quickly. But with nested queries, the situation is quite different — MySQL executes them without using indexes, or more precisely, as follows: FIN_IN_SET(p.product_id, '121,1235,43554'). In such cases, you need to write the query separately and then substitute the result of its execution into the IN condition.
Content

    (12)