MySQL table partition by month

I have a huge table that stores many tracked events, such as a user click.

The table already holds tens of millions of rows, and it grows larger every day.
The queries are starting to get slower when I try to fetch events from a large timeframe, and after reading quite a bit on the subject I understand that partitioning the table may boost the performance.

What I want to do is partition the table on a per month basis.

I have only found guides that show how to create each month's partition manually. Is there a way to just tell MySQL to partition by month and have it do that automatically?

If not, what is the command to do it manually, considering that my partitioning column is a datetime?

Answers:

Method 1

As explained by the manual: http://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html

This is easily done by hash partitioning on the output of MONTH().

CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE)
    ENGINE=INNODB
    PARTITION BY HASH( MONTH(tr_date) )
    PARTITIONS 6;

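Do note that this partitions only by month and not by year, so the same month of different years lands in the same partition. Also, there are only 6 partitions in this example: rows are assigned by MOD(MONTH(tr_date), 6), so each partition holds two calendar months. Use PARTITIONS 12 to give every calendar month its own partition.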
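To see which months end up sharing a partition, the mapping can be computed directly. This is a small sketch that assumes the documented rule that HASH partitioning places a row in partition MOD(expr, number_of_partitions); the sample dates are made up for illustration.

```sql
-- Hypothetical sample dates; partition_no is MOD(MONTH(d), 6),
-- which matches PARTITIONS 6 in the example above.
SELECT d,
       MONTH(d)         AS month_no,
       MOD(MONTH(d), 6) AS partition_no
FROM (
    SELECT DATE '2019-03-15' AS d
    UNION ALL SELECT DATE '2019-09-15'
    UNION ALL SELECT DATE '2020-03-15'
    UNION ALL SELECT DATE '2019-06-15'
) AS sample_dates;
```

March and September of any year both map to 3, and June maps to 0, so a query for one month still reads the rows of the other month (and of every other year) stored in the same partition.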

And for partitioning an existing table (manual: https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html):

ALTER TABLE ti
    PARTITION BY HASH( MONTH(tr_date) )
    PARTITIONS 6;

Querying can be done both from the entire table:

SELECT * from ti;

Or from specific partitions:

-- HASH partitions are named p0 ... p5 by default; p3 holds rows where MOD(MONTH(tr_date), 6) = 3.
SELECT * FROM ti PARTITION (p3);
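The PARTITION (...) clause takes partition names, so it helps to look them up first. The sketch below assumes the table ti from the example lives in the current schema; the date in the WHERE clause is an arbitrary sample value.

```sql
-- List the partitions of ti in the current schema with approximate row counts.
SELECT PARTITION_NAME, PARTITION_ORDINAL_POSITION, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'ti'
ORDER BY PARTITION_ORDINAL_POSITION;

-- Show which partitions a query would read (see the "partitions" column).
-- On MySQL 5.6, use EXPLAIN PARTITIONS SELECT ... instead.
EXPLAIN SELECT * FROM ti WHERE tr_date = '2019-03-15';
```

If the EXPLAIN output lists every partition, the query is not being pruned and partitioning is unlikely to help that query.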

Method 2

CREATE TABLE `mytable` (
  `post_id` int DEFAULT NULL,
  `viewid` int DEFAULT NULL,
  `user_id` int DEFAULT NULL,
  `post_Date` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PARTITION BY RANGE (extract(year_month from `post_Date`))
(PARTITION P0 VALUES LESS THAN (202012) ENGINE = InnoDB,
 PARTITION P1 VALUES LESS THAN (202104) ENGINE = InnoDB,
 PARTITION P2 VALUES LESS THAN (202108) ENGINE = InnoDB,
 PARTITION P3 VALUES LESS THAN (202112) ENGINE = InnoDB,
 PARTITION P4 VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
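Note that the boundaries above span four months each, not one. MySQL also has no option that creates new range partitions automatically, so monthly partitions have to be listed explicitly and extended over time, for example from a scheduled job. The following is a sketch of one way to do that, not a definitive setup: the table name mytable_monthly, the partition names, and the dates are hypothetical. It uses TO_DAYS() because the MySQL manual documents partition pruning on DATE and DATETIME columns for YEAR(), TO_DAYS(), and TO_SECONDS().

```sql
-- Hypothetical monthly layout; each partition holds the rows before its boundary date.
CREATE TABLE mytable_monthly (
  post_id   INT DEFAULT NULL,
  viewid    INT DEFAULT NULL,
  user_id   INT DEFAULT NULL,
  post_Date DATETIME DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(post_Date)) (
  PARTITION p202101 VALUES LESS THAN (TO_DAYS('2021-02-01')),
  PARTITION p202102 VALUES LESS THAN (TO_DAYS('2021-03-01')),
  PARTITION p202103 VALUES LESS THAN (TO_DAYS('2021-04-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Add the next month by splitting the catch-all partition.
ALTER TABLE mytable_monthly
  REORGANIZE PARTITION pmax INTO (
    PARTITION p202104 VALUES LESS THAN (TO_DAYS('2021-05-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
  );
```

Keeping the pmax partition empty (by adding months ahead of time) makes the REORGANIZE step cheap, since there are no rows to move.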

Method 3

Be aware of the “lazy” effect of partitioning by hash.

As the docs say:

You should also keep in mind that this expression is evaluated each time a row is inserted or updated (or possibly deleted); this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.

The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value, as this allows for “pruning” on ranges of partitions. That is, the more closely that the expression varies with the value of the column on which it is based, the more efficiently MySQL can use the expression for hash partitioning.

For example, where date_col is a column of type DATE, then the expression TO_DAYS(date_col) is said to vary directly with the value of date_col, because for every change in the value of date_col, the value of the expression changes in a consistent manner. The variance of the expression YEAR(date_col) with respect to date_col is not quite as direct as that of TO_DAYS(date_col), because not every possible change in date_col produces an equivalent change in YEAR(date_col).
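The difference the manual describes can be observed with a quick query. This is only an illustration; the three sample dates are made up.

```sql
-- TO_DAYS() changes with every row; YEAR() changes only at the year boundary.
SELECT d,
       TO_DAYS(d) AS to_days_value,
       YEAR(d)    AS year_value
FROM (
    SELECT DATE '2019-12-30' AS d
    UNION ALL SELECT DATE '2019-12-31'
    UNION ALL SELECT DATE '2020-01-01'
) AS sample_dates;
```

to_days_value increases by one on each row, while year_value stays the same for the first two rows and changes only on the third.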

Method 4

HASHing by month with 6 partitions means that two months a year will land in the same partition. What good is that?

Don’t bother partitioning; index the table instead.

Assuming these are the only two queries you use:

SELECT * from ti;
SELECT * FROM ti PARTITION (p3);

then start the PRIMARY KEY with the_date.

The first query simply reads the entire table; no change between partitioned and not.

The second query, assuming you want a single month, not all the months that map into the same partition, would need to be

SELECT * FROM ti  WHERE the_date >= '2019-03-01'
                    AND the_date  < '2019-03-01' + INTERVAL 1 MONTH;
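The indexing advice can be sketched as follows, assuming that the_date corresponds to the tr_date column of ti from the earlier example. The sketch uses a secondary index rather than the primary-key change the answer recommends, because it is the less invasive variant; the index name idx_tr_date is hypothetical.

```sql
-- Assumption: the_date in the answer is ti.tr_date.
ALTER TABLE ti ADD INDEX idx_tr_date (tr_date);

-- Variant closer to the answer's advice; only valid if (tr_date, id)
-- is unique and neither column is ever NULL:
-- ALTER TABLE ti ADD PRIMARY KEY (tr_date, id);

-- Check that the range query uses the index (see the "key" column).
EXPLAIN
SELECT * FROM ti
WHERE tr_date >= '2019-03-01'
  AND tr_date <  '2019-03-01' + INTERVAL 1 MONTH;
```

With InnoDB the primary key is the clustered index, so starting it with the date stores each month's rows next to each other, which is the effect the answer is after.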

If you have other queries, let’s see them.

(I have not found any performance justification for ever using PARTITION BY HASH.)


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
