Adding a BOM (Unicode signature) when saving a file in Python

How can I add a BOM (Unicode signature) when saving a file in Python?

file_old = open('old.txt', mode='r', encoding='utf-8')
file_new = open('new.txt', mode='w', encoding='utf-16-le')
file_new.write(file_old.read())

I need to convert the file to UTF-16-LE with a BOM. The script works, except that no BOM is written.

Answers:

Method 1

Write it directly at the beginning of the file:

file_new.write('\ufeff')
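Combined with the code from the question, this could look like the sketch below. The function name `convert_to_utf16le_with_bom` is only illustrative; it relies on the `utf-16-le` codec encoding the character U+FEFF as the bytes FF FE.

```python
def convert_to_utf16le_with_bom(src_path, dst_path):
    """Re-encode a UTF-8 text file as UTF-16-LE with a leading BOM."""
    with open(src_path, mode='r', encoding='utf-8') as file_old, \
         open(dst_path, mode='w', encoding='utf-16-le') as file_new:
        # U+FEFF is encoded by the utf-16-le codec as b'\xff\xfe', i.e. the BOM.
        file_new.write('\ufeff')
        file_new.write(file_old.read())
```

Calling `convert_to_utf16le_with_bom('old.txt', 'new.txt')` would then mirror the original script, with the BOM written first.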

Method 2

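It’s better to use the constants from the ‘codecs’ module.

import codecs
# codecs.BOM_UTF16_LE is a bytes object (b'\xff\xfe'), so in Python 3
# the file f must be opened in binary mode for this write to succeed.
f.write(codecs.BOM_UTF16_LE)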
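In Python 3 the constant is a `bytes` object, so a text-mode file will not accept it. A minimal sketch that opens the file in binary mode and encodes the text by hand (the helper name `write_utf16le_with_bom` is an assumption made for illustration):

```python
import codecs

def write_utf16le_with_bom(path, text):
    """Write text as UTF-16-LE, preceded by the UTF-16-LE BOM."""
    # Binary mode: both the BOM constant and the encoded text are bytes.
    with open(path, 'wb') as f:
        f.write(codecs.BOM_UTF16_LE)
        f.write(text.encode('utf-16-le'))
```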

Method 3

Why do you think you need to specifically make it UTF16LE? Just use ‘utf16’ as the encoding, Python will write it in your endianness with the appropriate BOM, and all the consumer needs to be told is that the file is UTF-16 … that’s the whole point of having a BOM.

If the consumer is insisting that the file must be encoded in UTF16LE, then you don’t need a BOM.

If the file is written the way that you specify, and the consumer opens it with UTF16LE encoding, they will get a \ufeff at the start of the file, which is a nuisance, and needs to be ignored.
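The difference can be checked with a short experiment. This is only a sketch: the temporary file and the variable names are made up for the demonstration.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'utf-16' writes a BOM in the machine's native byte order.
with open(path, 'w', encoding='utf-16') as f:
    f.write('A')

with open(path, 'rb') as f:
    raw = f.read()

# Reading back with 'utf-16' consumes the BOM.
with open(path, 'r', encoding='utf-16') as f:
    text_auto = f.read()

# Decoding with an explicit byte order keeps the BOM as a character.
explicit = 'utf-16-le' if raw.startswith(codecs.BOM_UTF16_LE) else 'utf-16-be'
text_explicit = raw.decode(explicit)

print(repr(text_auto))      # 'A'
print(repr(text_explicit))  # '\ufeffA'
```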

Method 4

Just choose the encoding with BOM:

import codecs

with codecs.open('outputfile.csv', 'w', 'utf-8-sig') as f:
    f.write('a,é')

(In Python 3 you can drop codecs and pass encoding='utf-8-sig' to the built-in open().)
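A Python 3 version without `codecs` might look like the sketch below. The two helper names are hypothetical; `utf-8-sig` adds the BOM on writing and skips it on reading.

```python
def write_utf8_with_bom(path, text):
    """Write text as UTF-8 with a leading BOM (EF BB BF)."""
    with open(path, 'w', encoding='utf-8-sig') as f:
        f.write(text)

def read_utf8_maybe_bom(path):
    """Read UTF-8 text, dropping a leading BOM if one is present."""
    with open(path, 'r', encoding='utf-8-sig') as f:
        return f.read()
```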

Method 5

I had a similar situation where a 3rd party app did not accept the file I generated unless it had a BOM.

For some reason in Python 2.7 the following does not work for me

write('\ufeff')

I had to substitute it with

write('\xff\xfe')

and that is the same as

write(codecs.BOM_UTF16_LE)

My final output file was written with the following code:

import codecs
mytext = "Help me"

# Raw string so that \t in the path is not read as a tab;
# binary mode so that the bytes are written unchanged.
with open(r"c:\temp\myFile.txt", 'wb') as f:
    f.write(codecs.BOM_UTF16_LE)
    f.write(mytext.encode('utf-16-le'))

This answer may be useless for the original asker, but it may help someone like me who stumbles upon this issue.

Method 6

For UTF-8 with BOM you can use:

import codecs

def addUTF8Bom(filename):
  f = codecs.open(filename, 'r', 'utf-8')
  content = f.read()
  f.close()
  f2 = codecs.open(filename, 'w', 'utf-8')
  f2.write(u'\ufeff')
  f2.write(content)
  f2.close()

Method 7

vitperov’s answer for Python 3:

import codecs

def add_utf8_bom(filename):
    with codecs.open(filename, 'r', 'utf-8') as f:
        content = f.read()
    with codecs.open(filename, 'w', 'utf-8') as f2:
        f2.write('\ufeff')
        f2.write(content)

Method 8

TRY IT:

def add_bom(file, bom: bytes):
    with open(file, 'r+b') as f:
        org_contents = f.read()
        f.seek(0)
        f.write(bom + org_contents)

USAGE:

import codecs

...

file = 'test.txt'
with open(file, 'w', encoding='utf-16-le') as f:  # without BOM
    f.write('A')

add_bom(file, codecs.BOM_UTF16_LE)  # the BOM should match the file's encoding

# TEST
with open(file, 'rb') as f:
    print(f.read())  # b'\xff\xfeA\x00'
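Calling a function like this twice would prepend the BOM twice. A possible guard, sketched here under the assumption that a hypothetical `add_bom_once` variant is acceptable, checks the first bytes before rewriting:

```python
import codecs

def add_bom_once(path, bom: bytes = codecs.BOM_UTF8) -> bool:
    """Prepend bom to the file unless it already starts with it.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, 'r+b') as f:
        contents = f.read()
        if contents.startswith(bom):
            return False  # already has this BOM; leave the file alone
        f.seek(0)
        f.write(bom + contents)
    return True
```

The function does not inspect the file's actual encoding, so the caller still has to pass the BOM that matches it.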

Method 9

My method of adding a BOM is to write the ANSI characters ï»¿ at the beginning of the file, then reopen the file as UTF-8 and write the desired data:

# Create the file with ANSI encoding (the "ansi" codec exists only on Windows)
file = open("file.txt", "a", encoding="ansi", errors='ignore')
# Add the UTF-8 BOM (0xEFBBBF) at the beginning of the file;
# in the Windows-1252 code page these three bytes are the characters ï»¿
file.write("ï»¿")
# Close the file
file.close()
# Reopen the file as UTF-8 and write the data
file = open("file.txt", "a", encoding="utf-8", errors='ignore')
file.write("Write your data here, Enjoy!!")
file.close()
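A variant of the same two-step idea that does not depend on a Windows code page could write the BOM bytes directly. This is a sketch with made-up helper names. Unlike the code above, the first helper truncates an existing file instead of appending to it.

```python
import codecs

def start_utf8_file_with_bom(path):
    """Create (or truncate) the file so it holds only the UTF-8 BOM."""
    with open(path, 'wb') as f:
        f.write(codecs.BOM_UTF8)  # b'\xef\xbb\xbf'

def append_text(path, text):
    """Append text to the file, encoded as plain UTF-8."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(text)
```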


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
