I’ve inherited some MySQL table that was designed without correct constraint so it gets filled with some duplicate rows which I need to remove. The problem that across duplicate rows data isn’t usually consistent, see example below:
id | request_id | guid_id | details | flag |
---|---|---|---|---|
1 | 10 | fh82EN | help me | 1 |
2 | 11 | fh82EN | ||
3 | 12 | fh82EN | assistance required | 1 |
4 | 12 | fh82EN | assistance required | 1 |
5 | 13 | fh82EN | ||
6 | 13 | fh82EN | assist me. | 1 |
7 | 13 | fh82EN | ||
8 | 14 | fh82EN | ||
Records with id: 1,2,8 perfectly fine. For duplicate records with id 3, 4 I have designed the query below which works fine and removes all duplicates without an issue:
DELETE IR.* FROM platform.temp IR WHERE id IN ( SELECT maxId AS id FROM ( SELECT MAX(id) as maxId, request_id, guid_id FROM platform.temp GROUP BY request_id, guid_id HAVING COUNT(*) > 1 ) AS T );
The problem is records with id 5,6,7. You can see that the same record by (guid_id and request_id) is not consistent. So, my previous query will delete records with content too because of MAX(id)
. I have designed a query that fixes these records, but we are talking about a huge database and this query is painfully slow:
UPDATE platform.temp AS DEST_T INNER JOIN ( SELECT request_id, guid_id, details, flag FROM platform.temp WHERE details IS NOT NULL AND details != '' ) AS SOURCE_T SET DEST_T.details = SOURCE_T.details, DEST_T.flag = SOURCE_T.flag WHERE DEST_T.guid_id = SOURCE_T.guid_id AND DEST_T.request_id = SOURCE_T.request_id;
How can I change my delete query that it will order my subgroup by field details
and will select not MAX(id)
but first id, so I will be sure that last row in subgroup will always be populated with value and will left?
MySQL version: 5.6.40-log
UPDATE1:
The desired outcome after cleaning the table should be as follow:
id | request_id | guid_id | details | flag |
---|---|---|---|---|
1 | 10 | fh82EN | help me | 1 |
2 | 11 | fh82EN | ||
3 | 12 | fh82EN | assistance required | 1 |
6 | 13 | fh82EN | assist me. | 1 |
8 | 14 | fh82EN | ||
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Use a self join of the table:
DELETE t1 FROM tablename t1 INNER JOIN tablename t2 ON t2.request_id = t1.request_id AND t2.guid_id = t1.guid_id WHERE (t2.id < t1.id AND COALESCE(t1.details, '') = '') OR (t2.id > t1.id AND COALESCE(t2.details, '') <> '');
This will keep 1 row for each request_id
and guid_id
combination, not necessarily the one with the min id
.
See the demo.
Another way to do it, with conditional aggregation:
DELETE t1 FROM tablename t1 INNER JOIN ( SELECT request_id, guid_id, MIN(id) min_id, MIN(CASE WHEN COALESCE(details, '') <> '' THEN id END) min_id_not_null FROM tablename GROUP BY request_id, guid_id ) t2 ON t2.request_id = t1.request_id AND t2.guid_id = t1.guid_id WHERE t1.id <> COALESCE(t2.min_id_not_null, t2.min_id);
This will keep the row with the min id
under your conditions, but maybe its performance would not be that good compared to the 1st query.
See the demo.
Method 2
Another way is to emulate the ROW_NUMBER ad then perform the delete operation.
DELETE FROM test WHERE id NOT IN (select id from (SELECT id, @row_number := CASE WHEN @last_request_id <> x.request_id + x.guid_id THEN 1 ELSE @row_number + 1 END AS row_num, @last_request_id := x.request_id + x.guid_id FROM test x CROSS JOIN (SELECT @row_number := 0, @last_request_id := null, @last_guid_id := null) y ORDER BY request_id, guid_id, details DESC) temp where row_num = 1);
Method 3
As i said in the comments i would use it with row_numbers, which in mysql 8 would look much more nicer
CREATE TABLE temp (`id` varchar(4), `request_id` varchar(12), `guid_id` varchar(9), `details` varchar(21), `flag` varchar(6)) ; INSERT INTO temp (`id`, `request_id`, `guid_id`, `details`, `flag`) VALUES ('1', '10', 'fh82EN', 'help me', '1'), ('2', '11', 'fh82EN', NULL, NULL), ('3', '12', 'fh82EN', 'assistance required', '1'), ('4', '12', 'fh82EN', 'assistance required', '1'), ('5', '13', 'fh82EN', NULL, NULL), ('6', '13', 'fh82EN', 'assistance required', '1'), ('7', '13', 'fh82EN', NULL, NULL), ('8', '14', 'fh82EN', NULL, NULL) ;
DELETE t1 FROM temp t1 INNER JOIN (SELECT `id` , IF(@request = `request_id` AND @guid = guid_id, @rn:= @rn+1,@rn := 1) rn ,@request := `request_id` as request_id ,@guid := guid_id as guid_id fROM temp,(SELECT @request := 0, @guid := '',@rn := 0) t1 ORDER BY `guid_id`,`request_id`,`details` DESC, id) t2 ON t1.`id` = t2.`id` AND rn > 1
SELECT * FROM tempid | request_id | guid_id | details | flag :- | :--------- | :------ | :------------------ | :--- 1 | 10 | fh82EN | help me | 1 2 | 11 | fh82EN | null | null 3 | 12 | fh82EN | assistance required | 1 6 | 13 | fh82EN | assistance required | 1 8 | 14 | fh82EN | null | null
db<>fiddle here
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0