How to convert a string of bytes into an int?

How can I convert a string of bytes into an int in Python?

Say like this: 'y\xcc\xa6\xbb'

I came up with a clever/stupid way of doing it:

sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))

I know there has to be something builtin or in the standard library that does this more simply…

This is different from converting a string of hex digits, for which you can use int(xxx, 16). I want to convert a string of actual byte values.
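
To make the distinction concrete, here is a small sketch in Python 3 syntax (an assumption, since the question itself uses Python 2 strings). The variable names are illustrative only. It contrasts a string of hex digits with the raw bytes those digits describe, and redoes the shift-and-sum approach on the raw bytes:

```python
hex_digits = "79cca6bb"        # eight ASCII characters naming hex digits
raw_bytes = b"y\xcc\xa6\xbb"   # four raw byte values: 0x79 0xcc 0xa6 0xbb

# Hex digits can be parsed directly.
from_hex_digits = int(hex_digits, 16)

# Iterating over bytes in Python 3 yields ints, so ord() is not needed.
from_raw_bytes = sum(b << (i * 8) for i, b in enumerate(reversed(raw_bytes)))

print(from_hex_digits, from_raw_bytes)
```

Both values come out equal because the raw bytes are read as big-endian.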

UPDATE:

I kind of like James’ answer a little better because it doesn’t require importing another module, but Greg’s method is faster:

>>> from timeit import Timer
>>> Timer('struct.unpack("<L", "y\xcc\xa6\xbb")[0]', 'import struct').timeit()
0.36242198944091797
>>> Timer("int('y\xcc\xa6\xbb'.encode('hex'), 16)").timeit()
1.1432669162750244

My hacky method:

>>> Timer("sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))").timeit()
2.8819329738616943

FURTHER UPDATE:

Someone asked in the comments what the problem is with importing another module. Importing a module isn’t necessarily cheap. Take a look:

>>> Timer("""import struct\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""").timeit()
0.98822188377380371

Including the cost of importing the module negates almost all of the advantage that this method has. I believe that this will only include the expense of importing it once for the entire benchmark run; look what happens when I force it to reload every time:

>>> Timer("""reload(struct)\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""", 'import struct').timeit()
68.474128007888794

Needless to say, if you’re doing a lot of executions of this method per import, then this becomes proportionally less of an issue. It’s also probably I/O cost rather than CPU cost, so it may depend on the capacity and load characteristics of the particular machine.
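
The point about the import being paid only once can be checked with a rough sketch like the one below. It is written for Python 3 (an assumption, the timings above are from Python 2), the variable names are illustrative, and the numbers it produces will differ per machine, so none are quoted here:

```python
import sys
import timeit

import struct  # the first import does the real work (or finds it already loaded)

# A repeated import statement is only a lookup in sys.modules.
import struct as struct_again
same_object = struct_again is sys.modules["struct"]

# Cost of the cached import statement alone, over 100000 runs.
cached_import_seconds = timeit.timeit("import struct", number=100000)

# Cost of the unpack call alone, with the import moved into setup.
unpack_seconds = timeit.timeit(
    r'struct.unpack(">L", b"y\xcc\xa6\xbb")[0]',
    setup="import struct",
    number=100000,
)

print(same_object, cached_import_seconds, unpack_seconds)
```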

Answers:


Method 1

In Python 3.2 and later, use

>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='big')
2043455163

or

>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='little')
3148270713

according to the endianness of your byte-string.

This also works for bytestring-integers of arbitrary length, and for two’s-complement signed integers by specifying signed=True. See the docs for from_bytes.
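
A short sketch of the signed and arbitrary-length cases mentioned above, for Python 3.2 or later. The sample byte values and variable names are made up for illustration:

```python
data = b"\xff\xfe"

# The same two bytes read as unsigned and as two's-complement signed.
unsigned = int.from_bytes(data, byteorder="big")
signed = int.from_bytes(data, byteorder="big", signed=True)

# Length is not limited to 4 or 8 bytes.
long_value = int.from_bytes(b"A" * 15, byteorder="big")

# int.to_bytes is the inverse; the length must be given explicitly.
round_trip = signed.to_bytes(2, byteorder="big", signed=True)

print(unsigned, signed, long_value, round_trip)
```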

Method 2

You can also use the struct module to do this:

>>> import struct
>>> struct.unpack("<L", "y\xcc\xa6\xbb")[0]
3148270713L

Method 3

As Greg said, you can use struct if you are dealing with binary values, but if you just have a “hex number” but in byte format you might want to just convert it like:

s = 'y\xcc\xa6\xbb'
num = int(s.encode('hex'), 16)

…this is the same as:

num = struct.unpack(">L", s)[0]

…except it’ll work for any number of bytes.
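
The 'hex' codec on str is Python 2 only. As a sketch of the same idea in Python 3 (assuming 3.5 or later for bytes.hex()), the following compares the hex route with struct and shows where struct stops working. The names are illustrative:

```python
import struct

s = b"y\xcc\xa6\xbb"
via_hex = int(s.hex(), 16)              # bytes.hex() exists in Python 3.5+
via_struct = struct.unpack(">L", s)[0]  # ">L" means exactly 4 bytes, big-endian

longer = b"\x01" + s                    # 5 bytes
longer_via_hex = int(longer.hex(), 16)  # the hex route accepts any length

try:
    struct.unpack(">L", longer)
    struct_accepts_five_bytes = True
except struct.error:
    # struct insists that the buffer size matches the format exactly.
    struct_accepts_five_bytes = False

print(via_hex, via_struct, longer_via_hex, struct_accepts_five_bytes)
```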

Method 4

I use the following functions to convert data between int, hex and bytes (Python 2, since they rely on the 'hex' codec of str).

def bytes2int(str):
    return int(str.encode('hex'), 16)

def bytes2hex(str):
    return '0x'+str.encode('hex')

def int2bytes(i):
    h = int2hex(i)
    return hex2bytes(h)

def int2hex(i):
    return hex(i)

def hex2int(h):
    if len(h) > 1 and h[0:2] == '0x':
        h = h[2:]

    if len(h) % 2:
        h = "0" + h

    return int(h, 16)

def hex2bytes(h):
    if len(h) > 1 and h[0:2] == '0x':
        h = h[2:]

    if len(h) % 2:
        h = "0" + h

    return h.decode('hex')

Source: http://opentechnotes.blogspot.com.au/2014/04/convert-values-to-from-integer-hex.html

Method 5

import array
integerValue = array.array("I", 'y\xcc\xa6\xbb')[0]

Warning: the above is strongly platform-specific. Both the “I” specifier and the endianness of the string->int conversion are dependent on your particular Python implementation. But if you want to convert many integers/strings at once, then the array module does it quickly.
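
One way to contain the platform dependence is to check the item size and the native byte order explicitly. The helper below is a hypothetical sketch for Python 3 (unpack_big_endian_uint32 is not part of the array module) that reads a run of big-endian 32-bit unsigned integers:

```python
import array
import sys

def unpack_big_endian_uint32(data):
    """Read big-endian 32-bit unsigned ints from a bytes object."""
    values = array.array("I")
    if values.itemsize != 4:
        # The C type behind "I" is only guaranteed to be at least 2 bytes.
        raise RuntimeError("array type code 'I' is not 4 bytes on this platform")
    values.frombytes(data)  # raises ValueError if len(data) is not a multiple of 4
    if sys.byteorder == "little":
        values.byteswap()   # the array holds native order, so swap to read big-endian
    return list(values)

print(unpack_big_endian_uint32(b"y\xcc\xa6\xbb" * 2))
```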

Method 6

In Python 2.x, you could use the format specifiers <B for unsigned bytes, and <b for signed bytes with struct.unpack/struct.pack.

E.g:

Let x = '\xff\x10\x11'

data_ints = struct.unpack('<' + 'B'*len(x), x) # (255, 16, 17)

And:

data_bytes = struct.pack('<' + 'B'*len(data_ints), *data_ints) # '\xff\x10\x11'

That * is required!

See https://docs.python.org/2/library/struct.html#format-characters for a list of the format specifiers.
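
The same format characters exist in Python 3, where the input has to be a bytes object. A minimal sketch, using a repeat count in the format string instead of repeating the letter (the variable names are illustrative):

```python
import struct

x = b"\xff\x10\x11"

fmt = "<%dB" % len(x)                            # "<3B": three unsigned bytes
data_ints = struct.unpack(fmt, x)                # unpack always returns a tuple
data_bytes = struct.pack(fmt, *data_ints)        # * spreads the tuple into arguments
signed_ints = struct.unpack("<%db" % len(x), x)  # lowercase b reads signed bytes

print(data_ints, data_bytes, signed_ints)
```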

Method 7

>>> reduce(lambda s, x: s*256 + x, bytearray("y\xcc\xa6\xbb"))
2043455163

Test 1: inverse:

>>> hex(2043455163)
'0x79cca6bb'

Test 2: Number of bytes > 8:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAA"))
338822822454978555838225329091068225L

Test 3: Increment by one:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAB"))
338822822454978555838225329091068226L

Test 4: Append one byte, say ‘A’:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))
86738642548474510294585684247313465921L

Test 5: Divide by 256:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))/256
338822822454978555838225329091068226L

Result equals the result of Test 3, as expected, because dividing by 256 drops the byte appended in Test 4.
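
In Python 3, reduce lives in functools and / is true division, so the tests above need small changes. A sketch under those assumptions (bytes_to_int is an illustrative name, not a library function):

```python
from functools import reduce

def bytes_to_int(data):
    # Fold left to right: each step shifts the accumulator up by one byte.
    # The initial value 0 makes an empty input return 0 instead of raising.
    return reduce(lambda acc, b: acc * 256 + b, bytearray(data), 0)

value = bytes_to_int(b"y\xcc\xa6\xbb")

# Use // here: in Python 3, / would return a float and lose precision.
dropped_last_byte = bytes_to_int(b"AAAAAAAAAAAAAABA") // 256

print(value, dropped_last_byte)
```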

Method 8

I was struggling to find a solution for arbitrary-length byte sequences that would work under Python 2.x. Finally I wrote this one. It’s a bit hacky because it performs a string conversion, but it works.

Function for Python 2.x, arbitrary length

def signedbytes(data):
    """Convert a bytearray into an integer, considering the first bit as
    sign. The data must be big-endian."""
    negative = data[0] & 0x80 > 0

    if negative:
        inverted = bytearray(~d % 256 for d in data)
        return -signedbytes(inverted) - 1

    encoded = str(data).encode('hex')
    return int(encoded, 16)

This function has two requirements:

  • The input data needs to be a bytearray, so wrap a plain string before calling the function:
    s = 'y\xcc\xa6\xbb'
    n = signedbytes(bytearray(s))
  • The data needs to be big-endian. In case you have a little-endian value, you should reverse it first:
    n = signedbytes(bytearray(s[::-1]))

Of course, this should be used only if arbitrary length is needed. Otherwise, stick with more standard ways (e.g. struct).
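
As an alternative sketch that avoids the string conversion, the sign can be applied arithmetically: read the bytes as unsigned, then subtract 2**(8*n) when the top bit is set. The function name is hypothetical and the input is assumed to be big-endian:

```python
def signed_from_big_endian(data):
    """Interpret big-endian bytes as a two's-complement signed integer."""
    raw = bytearray(data)          # gives ints when iterated, whatever the input type
    value = 0
    for byte in raw:
        value = (value << 8) | byte
    if raw and raw[0] & 0x80:      # top bit set means the number is negative
        value -= 1 << (8 * len(raw))
    return value

print(signed_from_big_endian(b"\xff\xfe"), signed_from_big_endian(b"y\xcc\xa6\xbb"))
```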

Method 9

int.from_bytes is the best solution if you are at version >=3.2.
The “struct.unpack” solution requires a string so it will not apply to arrays of bytes.
Here is another solution:

def bytes2int( tb, order='big'):
    if order == 'big': seq=[0,1,2,3]
    elif order == 'little': seq=[3,2,1,0]
    i = 0
    for j in seq: i = (i<<8)+tb[j]
    return i

hex( bytes2int( [0x87, 0x65, 0x43, 0x21])) returns ‘0x87654321’.

It handles big and little endianness and is easily modifiable for 8 bytes
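
One possible way to drop the fixed four-byte index list is to iterate over the sequence itself, reversed for little-endian input. This is a sketch only, and bytes2int_any is a made-up name:

```python
def bytes2int_any(tb, order="big"):
    """Combine a sequence of byte values (ints 0..255) of any length."""
    if order == "big":
        seq = tb
    elif order == "little":
        seq = reversed(tb)
    else:
        raise ValueError("order must be 'big' or 'little'")
    i = 0
    for b in seq:
        i = (i << 8) + b
    return i

print(hex(bytes2int_any([0x87, 0x65, 0x43, 0x21])))
print(hex(bytes2int_any([0x87, 0x65, 0x43, 0x21], order="little")))
```

Raising on an unknown order avoids the NameError the fixed-index version would hit when seq is never assigned.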

Method 10

As mentioned above, using the unpack function of struct is a good way. If you want to implement your own function, here is another solution:

def bytes_to_int(bytes):
    result = 0
    for b in bytes:
        result = result * 256 + int(b)
    return result

Method 11

In Python 3 you can easily convert a byte string into a list of integers (0..255) by

>>> list(b'y\xcc\xa6\xbb')
[121, 204, 166, 187]

Method 12

A decently speedy method utilizing array.array I’ve been using for some time:

predefined variables:

from array import array

offset = 0
size = 4
big = True # endian
arr = array('B')
arr.fromstring("\x00\x00\xff\x00") # 5 bytes (encoding issues) [0, 0, 195, 191, 0]

to int: (read)

val = 0
for v in arr[offset:offset+size][::pow(-1,not big)]: val = (val<<8)|v

from int: (write)

val = 16384
# the generator yields the least significant byte first, so reverse it for big-endian
arr[offset:offset+size] = \
    array('B',((val>>(i<<3))&255 for i in range(size)))[::pow(-1,big)]

It’s possible these could be faster though.
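
For readability, the read and write steps can be wrapped in two small functions. This is a sketch for Python 3 rather than the author's exact code: read_int and write_int are hypothetical names, frombytes replaces the older fromstring, and the endianness flip is written as an explicit reversal instead of the pow trick:

```python
from array import array

def read_int(arr, offset, size, big=True):
    chunk = arr[offset:offset + size]
    if not big:
        chunk = chunk[::-1]        # little-endian: most significant byte is last
    val = 0
    for v in chunk:
        val = (val << 8) | v
    return val

def write_int(arr, offset, size, val, big=True):
    # The generator produces the least significant byte first.
    chunk = array("B", ((val >> (i << 3)) & 255 for i in range(size)))
    if big:
        chunk = chunk[::-1]        # big-endian: most significant byte goes first
    arr[offset:offset + size] = chunk

arr = array("B")
arr.frombytes(b"\x00\x00\xff\x00")
before = read_int(arr, 0, 4)       # bytes 00 00 ff 00 read big-endian
write_int(arr, 0, 4, 16384)
after = read_int(arr, 0, 4)

print(before, list(arr), after)
```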

EDIT:
For some numbers, here’s a performance test (Anaconda 2.3.0) showing stable averages on read in comparison to reduce():

========================= byte array to int.py =========================
5000 iterations; threshold of min + 5000ns:
______________________________________code___|_______min______|_______max______|_______avg______|_efficiency
⣿⠀⠀⠀⠀⡇⢀⡀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⡀⠀⢰⠀⠀⠀⢰⠀⠀⠀⢸⠀⠀⢀⡇⠀⢀⠀⠀⠀⠀⢠⠀⠀⠀⠀⢰⠀⠀⠀⢸⡀⠀⠀⠀⢸⠀⡇⠀⠀⢠⠀⢰⠀⢸⠀
⣿⣦⣴⣰⣦⣿⣾⣧⣤⣷⣦⣤⣶⣾⣿⣦⣼⣶⣷⣶⣸⣴⣤⣀⣾⣾⣄⣤⣾⡆⣾⣿⣿⣶⣾⣾⣶⣿⣤⣾⣤⣤⣴⣼⣾⣼⣴⣤⣼⣷⣆⣴⣴⣿⣾⣷⣧⣶⣼⣴⣿⣶⣿⣶
   val = 0 \nfor v in arr: val = (val<<8)|v |     5373.848ns |   850009.965ns |     ~8649.64ns |  62.128%
⡇⠀⠀⢀⠀⠀⠀⡇⠀⡇⠀⠀⣠⠀⣿⠀⠀⠀⠀⡀⠀⠀⡆⠀⡆⢰⠀⠀⡆⠀⡄⠀⠀⠀⢠⢀⣼⠀⠀⡇⣠⣸⣤⡇⠀⡆⢸⠀⠀⠀⠀⢠⠀⢠⣿⠀⠀⢠⠀⠀⢸⢠⠀⡀
⣧⣶⣶⣾⣶⣷⣴⣿⣾⡇⣤⣶⣿⣸⣿⣶⣶⣶⣶⣧⣷⣼⣷⣷⣷⣿⣦⣴⣧⣄⣷⣠⣷⣶⣾⣸⣿⣶⣶⣷⣿⣿⣿⣷⣧⣷⣼⣦⣶⣾⣿⣾⣼⣿⣿⣶⣶⣼⣦⣼⣾⣿⣶⣷
                  val = reduce( shift, arr ) |     6489.921ns |  5094212.014ns |   ~12040.269ns |  53.902%

This is a raw performance test, so the endian pow-flip is left out.
The shift function shown applies the same shift-oring operation as the for loop, and arr is just array.array('B',[0,0,255,0]) as it has the fastest iterative performance next to dict.

I should probably also note efficiency is measured by accuracy to the average time.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
