I have a PHP
script (deleteAndReInsert.php
) that deletes all rows where name = 'Bob'
, and then inserts 1000 new rows with name = 'Bob'
. This works correctly, and the initially empty table ends up with 1000 total rows as expected.
$query = $pdo->prepare("DELETE FROM table WHERE name=?"); $query->execute(['Bob']); $query = $pdo->prepare("INSERT INTO table (name, age) VALUES (?,?)"); for ($i = 0; $i < 1000; $i++) { $query->execute([ 'name' => 'Bob', 'age' => 34 ]); }
The problem is if I run deleteAndReInsert.php
twice (almost at the exact same time), the final table contains more than 1000 rows.
What seems to be happening is that the DELETE query from the first run finishes, and then many (but not all) of the 1000 INSERTs get executed.
Then the second DELETE query starts and finishes before the first run's 1000 INSERTs finish (say 350 of the 1000 INSERTs complete). Now the second run's 1000 INSERTs execute, and we end up with 1650 total rows instead of 1000, because there are still 1000 - 350 = 650 INSERTs remaining from the first run after the second DELETE is executed.
What’s the correct way to prevent this problem from happening? Should I wrap everything in a transaction, or should I make 1 batch insert call instead of 1000 individual inserts? Obviously I can implement both of these solutions, but I’m curious as to which one is guaranteed to prevent this problem.
Answers:
Method 1
Using transactions + batch insert
I think the correct way to solve the problem is to use a transaction. We are going to do a DELETE plus a batch INSERT; here is the code:
$pdo->beginTransaction();
$query = $pdo->prepare("DELETE FROM table WHERE name=?");
$query->execute(['Bob']);
// Build one multi-row INSERT: 1000 groups of positional placeholders,
// flattening the values into a single parameter array
$sql = "INSERT INTO table (name, age) VALUES " . implode(', ', array_fill(0, 1000, '(?, ?)'));
$params = [];
for ($i = 0; $i < 1000; $i++) {
    array_push($params, 'Bob', 34);
}
$query = $pdo->prepare($sql);
$query->execute($params);
$pdo->commit();
Using just a batch insert (Won’t work)
Why doesn't a batch insert alone solve the problem? Imagine the following scenario:
- The first script runs its delete and removes the 1000 existing rows. ==> From 1000 rows to 0.
- The second script tries to delete, but there are no rows. ==> From 0 rows to 0.
- The first (or second) script runs its 1000-row batch insert. ==> From 0 rows to 1000.
- The second (or first) script runs its 1000-row batch insert. ==> From 1000 rows to 2000.
This happens because the processes are asynchronous, so the second script may read the table before the first script finishes its insert.
Using a secondary table to simulate a lock (Not recommended)
And if we didn't have transactions, how would we solve this problem? I think it is an interesting exercise.
This is a classic concurrency problem where two or more processes modify the same data. To solve it, I propose using a second auxiliary table to simulate a lock and control concurrent access to the main table.
CREATE TABLE `access_table` (
`access` TINYINT(1) NOT NULL DEFAULT 1
)
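Note (not part of the original answer): the loop in the script below only exits once it flips a row from access = 1 to access = 0, so the table presumably has to be seeded with a single row once, for example:
INSERT INTO access_table (access) VALUES (1);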
And in the script
// Here we control the concurrency
do {
    $query = $pdo->prepare('UPDATE access_table SET access = 0 WHERE access = 1');
    $query->execute();
    $count = $query->rowCount();
    // You should put a random sleep here when $count === 0
} while ($count === 0);

// Here we know that only we are modifying the table
$query = $pdo->prepare("DELETE FROM table WHERE name=?");
$query->execute(['Bob']);

$query = $pdo->prepare("INSERT INTO table (name, age) VALUES (?,?)");
for ($i = 0; $i < 1000; $i++) {
    $query->execute(['Bob', 34]);
}

// And finally we reopen the table for the other processes
$query = $pdo->prepare('UPDATE access_table SET access = 1 WHERE access = 0');
$query->execute();
You can adapt the table to your problem; for example, if the INSERTs/DELETEs are per name, you can use a varchar(XX) column for the name.
CREATE TABLE `access_table` (
`name` VARCHAR(50) NOT NULL,
`access` TINYINT(1) NOT NULL DEFAULT 1
)
With this scenario:
- The first script changes the access value to 0.
- The second script can't change the value, so it stays in the loop.
- The first script performs the DELETEs/INSERTs.
- The first script sets the access value back to 1.
- The second script changes the access value to 0 and breaks out of the loop.
- The second script performs the DELETEs/INSERTs.
- The second script sets the access value back to 1.
This works because UPDATEs are atomic: two processes cannot update the same row at the same time, so when the first script updates the value, the second script cannot modify it.
I hope to have helped you.
Method 2
An alternative to the other solutions is to create an actual lock file when the script launches and check if it exists before running.
while (file_exists("isrunning.lock")) {
    sleep(1);
}
// create the file isrunning.lock
$myfile = fopen("isrunning.lock", "w");

// deleteAndInsert code

// delete the lock file when finished
fclose($myfile);
unlink("isrunning.lock");
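One caveat: the file_exists() check and the fopen() call above are two separate steps, so two runs that hit the check at the same moment can both proceed. A minimal sketch of the same idea using flock(), which acquires the lock atomically (the file name is illustrative):
$fp = fopen("isrunning.lock", "c"); // create if missing, don't truncate
if ($fp === false || !flock($fp, LOCK_EX)) {
    exit("Could not acquire the lock.\n");
}

// deleteAndInsert code here

flock($fp, LOCK_UN);
fclose($fp);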
Method 3
You have to lock the operation and not release the lock until the insertion ends.
You can use a file on the filesystem but, as @chris Hass suggested, you can use Symfony's Lock package like this:
install symfony lock:
composer require symfony/lock
You should include Composer's autoload:
require __DIR__.'/vendor/autoload.php';
then in your deleteAndReInsert.php :
use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Lock\Store\FlockStore;

// if you are on Windows, or if for any reason this store (FlockStore) doesn't work,
// you can use one of the other stores available here: https://symfony.com/doc/current/components/lock.html#available-stores
$store = new FlockStore();
$factory = new LockFactory($store);
$lock = $factory->createLock('bob-recreation');
$lock->acquire(true);
$query = $pdo->prepare("DELETE FROM table WHERE name=?");
$query->execute(['Bob']);
$query = $pdo->prepare("INSERT INTO table (name, age) VALUES (?,?)");
for ($i = 0; $i < 1000; $i++)
{
$query->execute([ 'name' => 'Bob', 'age' => 34 ]);
}
$lock->release();
What Happened
As you mentioned, what happened is a race condition: two concurrent processes were accessing a shared resource. That resembles the classic critical section, which may need to be protected with locks.
Method 4
The count is an approximation
SHOW TABLE STATUS
(and many variants on such) provide only an estimate of the number of rows. (Please say how you are getting the “1650”.)
The precise way to count is
SELECT COUNT(*) FROM table;
Further discussion
There are 2 main ways to do “transaction locking”. Both protect against other connections interfering.
- Autocommit:

SET autocommit = ON;  -- probably this is the default
-- Now each SQL statement is a separate "transaction"
- BEGIN…COMMIT:

BEGIN;  -- (this is performed in a variety of ways by the db layer)
delete ...
insert ...
COMMIT;  -- everything above either entirely happens or is entirely ROLLBACK'd
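With PDO, a minimal sketch of how the BEGIN…COMMIT pattern is usually expressed (the rollback-on-failure handling is an assumption, not something spelled out above):
$pdo->beginTransaction();   // BEGIN
try {
    // delete ...
    // insert ...
    $pdo->commit();         // COMMIT: everything above becomes visible at once
} catch (Throwable $e) {
    $pdo->rollBack();       // or none of it does
    throw $e;
}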
Performance:
- DELETE --> TRUNCATE
- Batched insert (1000 rows in a single INSERT)
- BEGIN...COMMIT
- LOAD DATA instead of INSERT
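For illustration, a minimal sketch of the LOAD DATA route, assuming the 1000 rows have been written to a CSV file first (the file name bobs.csv is hypothetical, and LOCAL requires local_infile to be enabled on both client and server):
LOAD DATA LOCAL INFILE 'bobs.csv'
INTO TABLE `table`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(name, age);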
But none of the performance techniques will change the problem you are encountering — except “coincidentally”.
Why 1650?
(or some other number) InnoDB’s transactional nature requires that it hang onto previous copies of rows that are being deleted or inserted until the COMMIT
(whether explicit or autocommit’d). This clutters the database with “rows” that might go away. Hence any attempt to dead-reckon the exact number of rows is not practical.
That leads to using a different technique to estimate the row count. It goes something like this: The table takes this much disk, and we have an estimate that the average row is this many bytes. Divide those to get the row count.
That leads to your theory about the Delete not being finished. As far as any SQL goes, the Delete is finished. However the temporarily saved copies of the 1000 rows have not been thoroughly cleaned out of the table. Hence, the imprecise computation of the number of rows.
Locking?
No locking technique will “fix” the 1650. Locking is needed if you don’t want other threads inserting/deleting rows while running your Delete+Insert experiment. You should use locking for that purpose.
Meanwhile, you must use COUNT(*)
if you want the precise count.
Method 5
What’s the correct way to prevent this problem from happening?
This is not a problem and is expected behaviour for two pages accessing the same table on a database.
Should I wrap everything in a transaction, or should I make 1 batch insert call instead of 1000 individual inserts? Obviously I can implement both of these solutions, but I’m curious as to which one is guaranteed to prevent this problem.
It won't make a blind bit of difference, other than capping the number of inserted rows at 1000 multiplied by the number of pages you run.
Scenario 1 – do nothing
There are two pages running one after the other, or at similar times. You are seeing 1650 records because of the implicit transaction around each execute() call, which allows other processes (pages, in your case) to access the data in the table between statements.
| Action | Page a | Page b | Table Row count |
| --- | --- | --- | --- |
| 1 | Deletes all Bobs | | 0 |
| … | Insert a row | | 1 |
| 351 | Insert a row | Deletes all Bobs | 0 |
| 352 | Insert a row | Insert a row | 2 |
| … | Insert a row | Insert a row | 4 |
| 1001 | Insert a row | Insert a row | 1298 |
| 1002 | | Insert a row | 1299 |
| … | | Insert a row | … |
| 1352 | | Insert a row | 1650 |
Thus 1650 Bobs are inserted.
Scenario 2 – use explicit transactions (optimistic)
| Action | Page a | Page b | Table Row count | Transactions |
| --- | --- | --- | --- | --- |
| 1 | starts | | 0 | |
| 2 | Deletes all Bobs | starts | 0 | (a-d0) |
| 3 | Inserts 1000 rows | Deletes all Bobs | 0 | (a-d0-i1000)(b-d1000) |
| 4 | commits | Inserts 1000 rows | 1000 | (b-d1000-i1000) |
| 5 | | commits | 1000 | |
Scenario 3 – add locking
| Action | Page a | Page b | Table Row count |
| --- | --- | --- | --- |
| 1 | AQ lock | | 0 |
| 2 | starts | | 0 |
| 3 | Deletes all Bobs | AQ lock | 0 |
| 4 | Inserts 1000 rows | no lock | 0 |
| 5 | commits | no lock | 1000 |
| 6 | | AQ lock | 1000 |
| 7 | | starts | 1000 |
| 8 | | Deletes all Bobs | 1000 (0) |
| 9 | | Inserts 1000 rows | 1000 (1000) |
| 10 | | unlock | 1000 |
Method 6
You can check the process list on the server and prevent your script from executing if there is another instance.
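A minimal sketch of that idea with PDO, assuming the MySQL user has the PROCESS privilege and that a concurrent run can be recognized by the statement text it is executing (the matched string here is illustrative):
$myId = (int) $pdo->query("SELECT CONNECTION_ID()")->fetchColumn();
foreach ($pdo->query("SHOW FULL PROCESSLIST", PDO::FETCH_ASSOC) as $proc) {
    // Skip our own connection; look for another session running the same job
    if ((int) $proc['Id'] !== $myId
        && stripos((string) $proc['Info'], "INSERT INTO table (name, age)") !== false) {
        exit("Another instance appears to be running.\n");
    }
}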
Method 7
You hit deleteAndReInsert.php twice, and each run issues 1001 statements: the first deletes all rows where name = 'Bob', and the remaining 1000 insert Bob again.
So in total you have 2002 statements, and since you don't tell MySQL that you want them executed synchronously, the two runs execute concurrently, which leads to the unexpected result (more than 1000 rows with name = 'Bob'). The process could be described like this:
-> delete `name = bob` (clears the table, count = 0)
-> insert `name = bob`
-> insert `name = bob`
-> insert `name = bob`
-> insert `name = bob`
....
-> insert `name = bob`
-> delete `name = bob` (the second run of deleteAndReInsert.php deletes after the first run has done 300 inserts -> count = 0)
-> insert `name = bob`
-> insert `name = bob`
-> insert `name = bob`
....
-> insert `name = bob` (now it could be more than 1000 rows)
So if you want the result to be 1000 rows, you must make MySQL understand that you want deleteAndReInsert.php to run synchronously, step by step. To achieve that, you can use one of these solutions:
- Use a LOCK TABLES statement to lock the table, and UNLOCK TABLES when finished, so that the second script cannot do anything with the table until the first script is done.
- Wrap everything in a transaction (BEGIN ... COMMIT); then MySQL will run it as an atomic action. (Good)
- Simulate a LOCK with Redis (Redlock), a file, etc., to make your actions run synchronously. (Good)
Hope that helps you solve the problem.
Method 8
What you want to do is issue a LOCK TABLES ... WRITE as the first statement of your job, and UNLOCK TABLES as its last.
Then the thousand rows will be deleted, then inserted, then deleted, then inserted again.
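A minimal sketch of that sequence with PDO (the table name `table` is kept from the question; note that while LOCK TABLES is in effect, the session may only touch the tables it has locked):
$pdo->exec("LOCK TABLES `table` WRITE");

$query = $pdo->prepare("DELETE FROM `table` WHERE name = ?");
$query->execute(['Bob']);

$query = $pdo->prepare("INSERT INTO `table` (name, age) VALUES (?, ?)");
for ($i = 0; $i < 1000; $i++) {
    $query->execute(['Bob', 34]);
}

$pdo->exec("UNLOCK TABLES");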
But the whole procedure smells like an XY problem to me. What is it that you really need to do?
Because I have often needed to do something like this that you describe (“refreshing” some summaries for example), and the best way to do this is, in that scenario and in my opinion, neither LOCK nor DELETE/INSERT, but rather
INSERT INTO table ON DUPLICATE KEY UPDATE ...
In my case, if I only need to add or refresh records, that is enough.
Otherwise, I usually add a “time” field that allows me to recognize all the records that were “left out” from the refresh cycle; those – and only those – get deleted upon finish.
For example, say that I need to calculate, with a complex PHP calculation, the maximum financial exposure for many customers, and then insert into a table for ease of use. Each night every customer has its values refreshed, then the next day the “cache” table is used. Truncating the table and reinserting everything is a pain.
Instead, I calculate all values and build a very large multiple INSERT query (I could split it into X smaller multiple queries if needed):
SELECT @barrier := NOW();

INSERT INTO `financial_exposures` ( ..., amount, customer_id, last_update )
VALUES
    ( ..., 172035.12, 12345, NOW()),
    ( ..., 123456.78, 12346, NOW()),
    ...
    ( ..., 450111.00, 99999, NOW())
ON DUPLICATE KEY UPDATE amount=VALUES(amount), last_update=VALUES(last_update);

DELETE FROM financial_exposures WHERE last_update < @barrier;
New customers get inserted, old customers get updated unless their values are unchanged (in that case MySQL skips the update, saving time), and at each instant, a record is always present – the one before the update, or the one after the update. Removed customers get removed as the last step.
This works better when you have a table that you need to frequently use and update. You can add a transaction (SET autocommit = 0
before the INSERT
, COMMIT WORK
after the DELETE
) with no locks to ensure that all clients see the whole update as if it happened instantly.
Method 9
@Pericodes’ answer is correct, but there’s an error in the code snippet.
You can avoid duplicates by wrapping the code in a transaction (batch insert isn’t required to stop duplicates).
It’s better to use 1 batch insert instead of 1000 separate inserts even though it’s not required.
You can test by running this code twice (almost at the same time), and the table ends up with exactly 1000 records.
<?php
$pdo->beginTransaction();

$query = $pdo->prepare("DELETE FROM t1 WHERE name=?");
$query->execute(['Bob']);

$query = $pdo->prepare("INSERT INTO t1 (name, age) VALUES (:name, :age)");
for ($i = 0; $i < 1000; $i++) {
    $query->execute(['name' => 'Bob', 'age' => 34]);
}

$pdo->commit();
Several of the answers mention locks (db-level and code-level), but those aren’t necessary for this problem, and are overkill imo.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.