How to Ignore Duplicate Key Errors Safely Using insert_many

I need to ignore duplicate inserts when using insert_many with pymongo, where the duplicates are based on the index. I’ve seen this question asked on stackoverflow, but I haven’t seen a useful answer.

Here’s my code snippet:

try:
    results = mongo_connection[db][collection].insert_many(documents, ordered=False, bypass_document_validation=True)
except pymongo.errors.BulkWriteError as e:
    logger.error(e)

I would like the insert_many to ignore duplicates and not throw an exception (which fills up my error logs). Alternatively, is there a separate exception handler I could use, so that I can just ignore the errors. I miss “w=0″…

Thanks

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You can deal with this by inspecting the errors produced with BulkWriteError. This is actually an “object” which has several properties. The interesting parts are in details:

import pymongo
from bson.json_util import dumps
from pymongo import MongoClient
client = MongoClient()
db = client.test

collection = db.duptest

docs = [{ '_id': 1 }, { '_id': 1 },{ '_id': 2 }]


try:
  result = collection.insert_many(docs,ordered=False)

except pymongo.errors.BulkWriteError as e:
  print e.details['writeErrors']

On a first run, this will give the list of errors under e.details['writeErrors']:

[
  { 
    'index': 1,
    'code': 11000, 
    'errmsg': u'E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }', 
    'op': {'_id': 1}
  }
]

On a second run, you see three errors because all items existed:

[
  {
    "index": 0,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }", 
    "op": {"_id": 1}
   }, 
   {
     "index": 1,
     "code": 11000,
     "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
     "op": {"_id": 1}
   },
   {
     "index": 2,
     "code": 11000,
     "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 2 }",
     "op": {"_id": 2}
   }
]

So all you need do is filter the array for entries with "code": 11000 and then only “panic” when something else is in there

panic = filter(lambda x: x['code'] != 11000, e.details['writeErrors'])

if len(panic) > 0:
  print "really panic"

That gives you a mechanism for ignoring the duplicate key errors but of course paying attention to something that is actually a problem.

Method 2

Adding more to Neil’s solution.

Having ‘ordered=False, bypass_document_validation=True’ params allows new pending insertion to occur even on duplicate exception.

from pymongo import MongoClient, errors

DB_CLIENT = MongoClient()
MY_DB = DB_CLIENT['my_db']
TEST_COLL = MY_DB.dup_test_coll

doc_list = [
    {
        "_id": "82aced0eeab2467c93d04a9f72bf91e1",
        "name": "shakeel"
    },
    {
        "_id": "82aced0eeab2467c93d04a9f72bf91e1",  # duplicate error: 11000
        "name": "shakeel"
    },
    {
        "_id": "fab9816677774ca6ab6d86fc7b40dc62",  # this new doc gets inserted
        "name": "abc"
    }
]

try:
    # inserts new documents even on error
    TEST_COLL.insert_many(doc_list, ordered=False, bypass_document_validation=True)
except errors.BulkWriteError as e:
    print(f"Articles bulk insertion error {e}")

    panic_list = list(filter(lambda x: x['code'] != 11000, e.details['writeErrors']))
    if len(panic_list) > 0:
        print(f"these are not duplicate errors {panic_list}")

And since we are talking about duplicates its worth checking this solution as well.

Method 3

The correct solution is to use a WriteConcern with w=0 and ordered=False:

import pymongo
from pymongo.write_concern import WriteConcern


mongodb_connection[db][collection].with_options(write_concern=WriteConcern(w=0)).insert_many(messages, ordered=False)


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x