Transaction vs Batch Query to Avoid Duplicate MySQL Inserts

I have a PHP script (deleteAndReInsert.php) that deletes all rows where name = 'Bob', and then inserts 1000 new rows with name = 'Bob'. This works correctly, and the initially empty table ends up with 1000 total rows as expected.

$query = $pdo->prepare("DELETE FROM table WHERE name=?");
$query->execute(['Bob']);

$query = $pdo->prepare("INSERT INTO table (name, age) VALUES (?,?)");
for ($i = 0; $i < 1000; $i++)
    $query->execute(['Bob', 34]);

The problem is that if I run deleteAndReInsert.php twice at almost exactly the same time, the final table contains more than 1000 rows.

What seems to be happening is that the DELETE query from the first run finishes, and then many (but not all) of the 1000 INSERTS get called.

Then the second DELETE query starts and finishes before the first 1000 INSERTS finishes (say 350 of the 1000 INSERTS complete). Now the second 1000 INSERTS runs, and we end up with 1650 total rows instead of 1000 total rows because there are still 1000 - 350 = 650 INSERTS remaining after the second DELETE gets called.

What’s the correct way to prevent this problem from happening? Should I wrap everything in a transaction, or should I make 1 batch insert call instead of 1000 individual inserts? Obviously I can implement both of these solutions, but I’m curious as to which one is guaranteed to prevent this problem.



Method 1

Using transactions + batch insert

I think the correct way to solve the problem is to use a transaction. We are going to do a DELETE plus a batch INSERT inside one transaction; here is the code:

$pdo->beginTransaction();

$query = $pdo->prepare("DELETE FROM table WHERE name=?");
$query->execute(['Bob']);

$sql = "INSERT INTO table (name, age) VALUES ".implode(', ', array_fill(0, 1000, '(?, ?)'));
$query = $pdo->prepare($sql);
$query->execute(array_merge(...array_fill(0, 1000, ['Bob', 34])));

$pdo->commit();

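To make the batch insert concrete, here is a small standalone sketch (my own illustration, not part of the original answer) showing how the placeholder string and the flat parameter array line up: each of the 1000 rows contributes one `(?, ?)` group and exactly two values, in the same order.

```php
<?php
// Build a multi-row INSERT with one (?, ?) group per row.
$rows = 1000;
$sql = "INSERT INTO `table` (name, age) VALUES "
     . implode(', ', array_fill(0, $rows, '(?, ?)'));

// The flat parameter array must hold exactly 2 values per row,
// matching the placeholders left to right.
$params = array_merge(...array_fill(0, $rows, ['Bob', 34]));

// With a live PDO connection this would then be executed as:
//   $pdo->beginTransaction();
//   $pdo->prepare("DELETE FROM `table` WHERE name = ?")->execute(['Bob']);
//   $pdo->prepare($sql)->execute($params);
//   $pdo->commit();
```

Positional `?` placeholders are used because PDO does not allow the same named placeholder (`:name`) to appear more than once in a statement unless emulated prepares are enabled.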

Using just a batch insert (Won’t work)

Why doesn't a batch insert alone solve the problem? Imagine the following scenario:

  1. The first script runs its DELETE and removes the first 1000 rows. ==> From 1000 rows to 0.
  2. The second script runs its DELETE, but there are no rows. ==> From 0 rows to 0.
  3. The first (or second) script runs its 1000-row batch insert. ==> From 0 rows to 1000.
  4. The second (or first) script runs its 1000-row batch insert. ==> From 1000 rows to 2000.

This happens because the processes run concurrently, so the second script may read the table before the first script finishes its insert.

Using a secondary table to simulate a lock (Not recommended)

And if we didn't have transactions, how would we solve this problem? I think it is an interesting exercise.

This is a classic concurrency problem, where two or more processes modify the same data. To solve it, I propose using a second auxiliary table to simulate a lock and control concurrent access to the main table.

CREATE TABLE `access_table` (
  `access` TINYINT NOT NULL DEFAULT 1
);

And in the script

// Here we control the concurrency
do {
    $query = $pdo->prepare('UPDATE access_table SET access = 0 WHERE access = 1');
    $query->execute();
    $count = $query->rowCount();
    // You should put a random sleep here
} while ($count === 0);

// Here we know that only we are modifying the table
$query = $pdo->prepare("DELETE FROM table WHERE name=?");
$query->execute(['Bob']);

$query = $pdo->prepare("INSERT INTO table (name, age) VALUES (?,?)");
for ($i = 0; $i < 1000; $i++)
    $query->execute(['Bob', 34]);

// And finally we open the table for other processes
$query = $pdo->prepare('UPDATE access_table SET access = 1 WHERE access = 0');
$query->execute();

You can adapt the table to your problem; for example, if the INSERTs/DELETEs are per name, you can add a VARCHAR(XX) column for the name.

CREATE TABLE `access_table` (
  `name` VARCHAR(50) NOT NULL,
  `access` TINYINT NOT NULL DEFAULT 1
);

With this scenario:

  1. The first script changes the access value to 0.
  2. The second script can't change the value, so it stays in the loop.
  3. The first script makes the DELETEs/INSERTs.
  4. The first script changes the access value back to 1.
  5. The second script changes the access value to 0 and breaks out of the loop.
  6. The second script makes the DELETEs/INSERTs.
  7. The second script changes the access value back to 1.

This works because updates are atomic: two processes cannot update the same data at the same time, so when the first script updates the value, the second script cannot modify it.
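The atomic-UPDATE claim can be sketched with PDO. For a self-contained, runnable illustration this uses an in-memory SQLite database (my own substitution; the SQL is the same on MySQL): only the first of two identical conditional UPDATEs reports an affected row, so only one "process" acquires the simulated lock.

```php
<?php
// Simulate the access_table lock: the conditional UPDATE only
// "succeeds" (affects a row) for whichever process runs it first.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE access_table (access INTEGER NOT NULL)');
$pdo->exec('INSERT INTO access_table (access) VALUES (1)');

// First "process" grabs the lock: 1 row affected.
$first = $pdo->exec('UPDATE access_table SET access = 0 WHERE access = 1');

// Second "process" finds it taken: 0 rows affected, so it must loop.
$second = $pdo->exec('UPDATE access_table SET access = 0 WHERE access = 1');
```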

I hope to have helped you.

Method 2

An alternative to the other solutions is to create an actual lock file when the script launches and check if it exists before running.

// wait while another instance is running
while (file_exists("isrunning.lock")) {
    usleep(250000); // wait a bit before checking again
}

// create file isrunning.lock
$myfile = fopen("isrunning.lock", "w");

// deleteAndInsert code

// delete lock file when finished
fclose($myfile);
unlink("isrunning.lock");
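Note that the check-then-create pattern has a small race of its own: two processes can both pass file_exists() before either one creates the file. A sketch of a race-free variant using flock() (the lock-file path is my own choice for illustration):

```php
<?php
// Acquire an exclusive advisory lock. flock() is atomic, so two
// processes cannot both hold it, unlike file_exists() + fopen().
$path = sys_get_temp_dir() . '/deleteAndReInsert.lock';
$fp = fopen($path, 'c');           // create if missing, never truncate
if ($fp === false || !flock($fp, LOCK_EX)) {
    throw new RuntimeException('could not acquire lock');
}

$didWork = true;                   // ... deleteAndInsert code goes here ...

flock($fp, LOCK_UN);               // release for the next process
fclose($fp);
```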

Method 3

You have to lock the operation and not release the lock until the insertion ends.

You can use a file on the filesystem but, as @chris Hass suggested, you can use Symfony's lock package like this:

Install symfony/lock:

composer require symfony/lock

You should include Composer's autoloader:

require __DIR__.'/vendor/autoload.php';

then in your deleteAndReInsert.php :

use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Lock\Store\FlockStore;

// if you are on Windows, or for any reason this store (FlockStore) doesn't work,
// you can use one of the other available stores
$store = new FlockStore();
$factory = new LockFactory($store);
$lock = $factory->createLock('bob-recreation');
$lock->acquire(true); // block until the lock is acquired

$query = $pdo->prepare("DELETE FROM table WHERE name=?");
$query->execute(['Bob']);

$query = $pdo->prepare("INSERT INTO table (name, age) VALUES (?,?)");
for ($i = 0; $i < 1000; $i++)
    $query->execute(['Bob', 34]);

$lock->release();

What Happened

As you mentioned, what happened is a race condition:

If two concurrent processes are accessing a shared resource,
that resembles a critical section, which may need to be protected with locks.

Method 4

The count is an approximation

SHOW TABLE STATUS (and many variants on such) provide only an estimate of the number of rows. (Please say how you are getting the “1650”.)

The precise way to count is

SELECT COUNT(*) FROM tbl WHERE name = 'Bob';

Further discussion

There are 2 main ways to do “transaction locking”. Both protect against other connections interfering.

  • Autocommit:
    SET autocommit = ON;   -- probably this is the default
    -- Now each SQL statement is a separate "transaction"

  • Explicit transactions:
    BEGIN;   -- (this is performed in a variety of ways by the db layer)
    ...      -- your statements
    COMMIT;  -- everything above either entirely happens or is entirely ROLLBACK'd

There are also performance techniques:

  • Batched insert (1000 rows in a single INSERT)
  • LOAD DATA instead of INSERT
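The BEGIN ... COMMIT pattern above translates directly to PDO. A runnable sketch (an in-memory SQLite database stands in for MySQL here, my own substitution so the snippet is self-contained; the PDO calls are identical against MySQL):

```php
<?php
// Wrap the DELETE and all INSERTs in one explicit transaction so they
// either all happen or are all rolled back.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE t1 (name TEXT, age INTEGER)');

$pdo->beginTransaction();
try {
    $pdo->prepare('DELETE FROM t1 WHERE name = ?')->execute(['Bob']);

    $ins = $pdo->prepare('INSERT INTO t1 (name, age) VALUES (?, ?)');
    for ($i = 0; $i < 1000; $i++) {
        $ins->execute(['Bob', 34]);
    }
    $pdo->commit();      // the whole batch becomes visible at once
} catch (Throwable $e) {
    $pdo->rollBack();    // on any failure, nothing is applied
    throw $e;
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM t1')->fetchColumn();
```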

But none of the performance techniques will change the problem you are encountering — except “coincidentally”.

Why 1650 (or some other number)?

InnoDB’s transactional nature requires that it hang onto previous copies of rows that are being deleted or inserted until the COMMIT (whether explicit or autocommit’d). This clutters the database with “rows” that might go away. Hence any attempt to dead-reckon the exact number of rows is not practical.

That leads to using a different technique to estimate the row count. It goes something like this: The table takes this much disk, and we have an estimate that the average row is this many bytes. Divide those to get the row count.

That leads to your theory about the Delete not being finished. As far as any SQL goes, the Delete is finished. However the temporarily saved copies of the 1000 rows have not been thoroughly cleaned out of the table. Hence, the imprecise computation of the number of rows.


No locking technique will “fix” the 1650. Locking is needed if you don’t want other threads inserting/deleting rows while running your Delete+Insert experiment. You should use locking for that purpose.

Meanwhile, you must use COUNT(*) if you want the precise count.

Method 5

What’s the correct way to prevent this problem from happening?

This is not a problem and is expected behaviour for two pages accessing the same table on a database.

Should I wrap everything in a transaction, or should I make 1 batch insert call instead of 1000 individual inserts? Obviously I can implement both of these solutions, but I’m curious as to which one is guaranteed to prevent this problem.

Neither will make a blind bit of difference, other than capping the number of inserts at n × 1000, where n is the number of pages you run.

Scenario 1 – do nothing

There are two pages running one after the other, or at similar times. You are seeing 1650 records because of the implicit transaction around each execute call, which allows other processes (pages, in your case) to access the data in the table between statements.

| Action | Page a | Page b | Table Row count |
| ----- | ------ | ------ | --- |
| 1 | Deletes all Bobs | | 0 |
| 2 | Insert a row | | 1 |
| ... | Insert a row | | ... |
| 351 | Insert a row | Deletes all Bobs | 0 |
| 352 | Insert a row | Insert a row | 2 |
| ... | Insert a row | Insert a row | ... |
| 1001 | Insert a row | Insert a row | 1298 |
| 1002 | | Insert a row | 1299 |
| ... | | Insert a row | ... |
| 1352 | | Insert a row | 1650 |

Thus 1650 Bobs are inserted.

Scenario 2 – use explicit transactions (optimistic)
| Action | Page a | Page b | Table Row count | Transactions |
| ----- | ------ | ------ | --- | --- |
| 1 | starts | | 0 | |
| 2 | Deletes all Bobs | starts | 0 | (a-d0) |
| 3 | Inserts 1000 rows | Deletes all Bobs | 0 | (a-d0-i1000)(b-d1000) |
| 4 | commits | Inserts 1000 rows | 1000 | (b-d1000-i1000) |
| 5 | | commits | 1000 | |

Scenario 3 – add locking
| Action | Page a | Page b | Table Row count |
| ----- | ------ | ------ | --- |
| 1 | AQ lock | | 0 |
| 2 | starts | | 0 |
| 3 | Deletes all Bobs | AQ lock | 0 |
| 4 | Inserts 1000 rows | no lock | 0 |
| 5 | commits | no lock | 1000 |
| 6 | | AQ lock | 1000 |
| 7 | | starts | 1000 |
| 8 | | Deletes all Bobs | 1000 (0) |
| 9 | | Inserts 1000 rows | 1000 (1000) |
| 10 | | unlock | 1000 |

Method 6

You can check the process list on the server and prevent your script from executing if another instance is already running.

Method 7

You hit deleteAndReInsert.php twice, and each run of the script issues 1001 commands: the first deletes all name = 'Bob' rows, and the remaining 1000 insert 'Bob' again.
So in total you have 2002 commands, and you have not declared anything that makes MySQL understand you want them executed synchronously. Your 2002 commands run concurrently, which leads to the unexpected result (more than 1000 rows with name = 'Bob'). The process could be described like this:

->delete `name= bob` (clear count = 0)
->insert `name = bob`
->insert `name = bob`
->insert `name = bob`
->insert `name = bob`
->insert `name = bob`
->delete `name = bob` (the second run of deleteAndReInsert.php
  deletes after 300 inserts of `name = bob` from the first
  run -> clears row count to 0)
->insert `name = bob`
->insert `name = bob`
->insert `name = bob`
-> insert `name = bob` (now it could be more than 1000 rows)

So if you want the result to be 1000 rows, you must make MySQL understand that you want deleteAndReInsert.php to run synchronously, step by step. To achieve that, you can use one of these solutions:

  1. Use a LOCK TABLES statement to lock the table and UNLOCK TABLES when finished, so the second script cannot do anything with the table until the first script is done.
  2. Wrap everything in a transaction (BEGIN ... COMMIT); MySQL will then run it as an atomic action. (Good)
  3. Simulate a lock with Redis (Redlock), a file, etc., to make your actions run synchronously. (Good)

Hope that helps you solve the problem.

Method 8

What you want to do is issue a LOCK TABLES ... WRITE as the first statement of your job, and UNLOCK TABLES as its last.

Then the thousand rows will be deleted, then inserted, then deleted, then inserted again.

But the whole procedure smells like a XY problem to me. What is it that you really need to do?

Because I have often needed to do something like what you describe (“refreshing” some summaries, for example), and the best way to do this, in that scenario and in my opinion, is neither LOCK nor DELETE/INSERT, but rather

INSERT ... ON DUPLICATE KEY UPDATE ...
In my case, if I only need to add or refresh records, that is enough.

Otherwise, I usually add a “time” field that allows me to recognize all the records that were “left out” from the refresh cycle; those – and only those – get deleted upon finish.

For example, say that I need to calculate, with a complex PHP calculation, the maximum financial exposure for many customers, and then insert into a table for ease of use. Each night every customer has its values refreshed, then the next day the “cache” table is used. Truncating the table and reinserting everything is a pain.

Instead, I calculate all values and build a very large multiple INSERT query (I could split it into X smaller multiple queries if needed):

SELECT @barrier := NOW();
INSERT INTO `financial_exposures` ( ..., amount, customer_id, last_update )
VALUES
    ( ..., 172035.12, 12345, NOW()),
    ( ..., 123456.78, 12346, NOW()),
    ( ..., 450111.00, 99999, NOW())
ON DUPLICATE KEY UPDATE
    amount = VALUES(amount), last_update = VALUES(last_update);
DELETE FROM financial_exposures WHERE last_update < @barrier;

New customers get inserted, old customers get updated unless their values are unchanged (in that case MySQL skips the update, saving time), and at each instant, a record is always present – the one before the update, or the one after the update. Removed customers get removed as the last step.

This works better when you have a table that you need to frequently use and update. You can add a transaction (SET autocommit = 0 before the INSERT, COMMIT WORK after the DELETE) with no locks to ensure that all clients see the whole update as if it happened instantly.

Method 9

@Pericodes’ answer is correct, but there’s an error in the code snippet.

You can avoid duplicates by wrapping the code in a transaction (batch insert isn’t required to stop duplicates).

It’s better to use 1 batch insert instead of 1000 separate inserts even though it’s not required.

You can test by running this code twice (almost at the same time), and the table ends up with exactly 1000 records.


$pdo->beginTransaction();

$query = $pdo->prepare("DELETE FROM t1 WHERE name=?");
$query->execute(['Bob']);

$query = $pdo->prepare("INSERT INTO t1 (name, age) VALUES (:name,:age)");
for ($i = 0; $i < 1000; $i++)
    $query->execute([ 'name' => 'Bob', 'age' => 34 ]);

$pdo->commit();


Several of the answers mention locks (db-level and code-level), but those aren’t necessary for this problem, and are overkill imo.

All methods were sourced from their original answers and are licensed under cc by-sa 2.5, cc by-sa 3.0, or cc by-sa 4.0.
