Unpickling a Python 2 object with Python 3

I’m wondering if there is a way to load an object that was pickled in Python 2.4, with Python 3.4.

I’ve been running 2to3 on a large amount of company legacy code to get it up to date.

Having done this, when running the file I get the following error:

  File "H:fixers - 3.4addressfixer - 3.4trunklibaddressaddress_generic.py", line 382, in read_ref_files
    d = pickle.load(open(mshelffile, 'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

Looking at the pickled object in question, it’s a dict in a dict, containing keys and values of type str.

So my question is: Is there a way to load an object, originally pickled in Python 2.4, with Python 3.4?

Answers:


Method 1

You’ll have to tell pickle.load() how to convert Python 2 bytestring data to Python 3 strings, or you can tell pickle to leave them as bytes.

The default is to try and decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
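As a rough illustration of the three behaviours described in that passage, here is a minimal sketch. The byte string PY2_STREAM is a hand-written protocol 0 stream standing in for a real Python 2 pickle (an assumption for the sake of a self-contained example, not data from the question); it carries a single 8-bit string holding the bytes e2 80 99.

```python
import pickle

# Hand-written protocol 0 stream (hypothetical stand-in for a Python 2 pickle):
# the "S" opcode carries an 8-bit string, "." ends the stream.
PY2_STREAM = b"S'\\xe2\\x80\\x99'\n."

# Default settings (encoding='ASCII', errors='strict') reject the non-ASCII bytes.
try:
    pickle.loads(PY2_STREAM)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# Latin-1 maps each byte to the code point with the same number.
as_latin1 = pickle.loads(PY2_STREAM, encoding="latin1")

# 'bytes' skips decoding and returns a bytes object.
as_bytes = pickle.loads(PY2_STREAM, encoding="bytes")
```

The same keyword arguments apply to pickle.load() when reading from a file opened in binary mode.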

Setting the encoding to latin1 allows you to import the data directly:

import pickle

# mshelffile is the path to the Python 2 pickle, as in the question
with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1')

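However, you’ll need to verify that none of your strings were decoded with the wrong codec. Latin-1 accepts any input, because it maps the byte values 0-255 directly to the first 256 Unicode code points, so data stored in another encoding loads without error but comes out as the wrong characters.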
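One way to check and repair such strings is to reverse the Latin-1 decode and try the codec the data was probably written in. The sketch below assumes that codec is UTF-8; the byte 0xe2 in the traceback is consistent with a UTF-8 multi-byte sequence but does not prove it. The helper name repair_latin1 and the sample dict are hypothetical.

```python
def repair_latin1(text, real_encoding="utf-8"):
    """Undo a Latin-1 decode and redecode with the assumed real codec.

    Returns the text unchanged if it does not round-trip, for example
    when it really was Latin-1 or contains characters above U+00FF.
    """
    try:
        return text.encode("latin1").decode(real_encoding)
    except UnicodeError:
        return text


# Stand-in for the result of pickle.load(f, encoding='latin1'):
# a dict in a dict with str keys and values, as in the question.
loaded = {"name": {"quote": "\u00e2\u0080\u0099"}}

fixed = {
    repair_latin1(key): {
        repair_latin1(k): repair_latin1(v) for k, v in inner.items()
    }
    for key, inner in loaded.items()
}
```

Pure ASCII strings pass through unchanged. A genuine Latin-1 string whose bytes happen to form valid UTF-8 would be altered, so it is worth spot-checking the output against values you know.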

The alternative would be to load the data with encoding='bytes', and decode all bytes keys and values afterwards.
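A minimal sketch of that approach, assuming the stored strings are UTF-8 and that the containers are plain dicts, lists and tuples. The helper decode_bytes and the sample stream PY2_PICKLE are hypothetical; the stream is hand-written in protocol 0 to stand in for a real Python 2 pickle of a dict in a dict.

```python
import pickle


def decode_bytes(obj, encoding="utf-8"):
    """Recursively convert bytes inside dicts, lists and tuples to str."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {
            decode_bytes(k, encoding): decode_bytes(v, encoding)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return type(obj)(decode_bytes(item, encoding) for item in obj)
    return obj  # numbers, None, and other types are left untouched


# Hand-written protocol 0 stream (hypothetical): {'name': {'city': <UTF-8 bytes>}}
PY2_PICKLE = b"(dS'name'\n(dS'city'\nS'Z\\xc3\\xbcrich'\nss."

raw = pickle.loads(PY2_PICKLE, encoding="bytes")  # keys and values are bytes
decoded = decode_bytes(raw)                       # keys and values are str
```

Decoding explicitly like this raises UnicodeDecodeError on data that is not valid in the chosen codec, instead of silently producing the wrong characters.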

Note that in Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling of Python 2 datetime object data is broken unless you use encoding='bytes'.

Method 2

Using encoding='latin1' can cause issues when your object contains numpy arrays.

In that case, encoding='bytes' may work better.

Please see this answer for a complete explanation of using encoding='bytes'.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
