MySQL SELECT MAX vs ORDER BY LIMIT 1 on WHERE clause

SELECT MAX vs ORDER BY LIMIT 1 question has been answered here several times, but if I add a WHERE clause, things change dramatically

Here is my table:

FieldTypeNullKeyDefaultExtra
idb’int’‘NO’‘PRI’None‘auto_increment’
‘open_time’b’bigint’‘NO’‘UNI’None

Note, that both columns are indexed.

And here are the requests:

SELECT id from table
WHERE open_time > 0
ORDER BY id DESC LIMIT 1

SELECT MAX(id) from BTCUSDT1mHist
WHERE open_time > 0

EXPLAIN ANALYZE shows the following:
ORDER BY:

-> Limit: 1 row(s)  (cost=0.10 rows=1) (actual time=0.038..0.038 rows=1 loops=1)
    -> Filter: (table.open_time > 0)  (cost=0.10 rows=1) (actual time=0.037..0.037 rows=1 loops=1)
        -> Index scan on table using PRIMARY (reverse)  (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)

MAX():

-> Aggregate: max(table.id)  (cost=325890.06 rows=1081033) (actual time=1025.181..1025.181 rows=1 loops=1)
    -> Filter: (table.open_time > 0)  (cost=217786.76 rows=1081033) (actual time=0.032..866.890 rows=2180645 loops=1)
        -> Index range scan on table using open_time  (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)

ORDER BY finishes in 0.0012 seconds, while MAX() does in 1.026 seconds

I have read this question also, but it doesn’t seem to cover my situation

The question is: why does MAX() takes so much longer than ORDER BY LIMIT?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Compare in the analyze:

    -> Index scan on table using PRIMARY (reverse)  (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)

Versus:

    -> Index range scan on table using open_time  (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)

Examining 2 rows must be a lot quicker than examining 2,180,645 rows.

In the query with ORDER BY id DESC LIMIT 1, it uses the primary key index. It starts at the end because it’s a reverse order. Then it just iterates down the leaf nodes of the index (in descending order), until it examines the first row that also matches open_time > 0. Then the LIMIT optimization allows the query execution to finish. Based on its statistics, it estimates this will happen after examining 2 rows.

In the query with MAX(id), it uses the index on open_time. But because it’s a range condition open_time > 0, it can’t assume the maximum id is found at the start or end of that range. So it must examine every matching entry in the open_time index, searching for the greatest value of id (primary keys are implicitly part of a secondary index). There’s no chance of early-termination as there is in the query with LIMIT.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x