Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:
f = open('c:file.doc', "w")
f.write(text)
f.close()
but Word will read it as an HTML file not a native .doc file.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
See python-docx, its official documentation is available here.
This has worked very well for me.
Method 2
If you only what to read, it is simplest to use the linux soffice command to convert it to text, and then load the text into python:
Method 3
I’d look into IronPython which intrinsically has access to windows/office APIs because it runs on .NET runtime.
Method 4
doc (Word 2003 in this case) and docx (Word 2007) are different formats, where the latter is usually just an archive of xml and image files. I would imagine that it is very possible to write to docx files by manipulating the contents of those xml files. However I don’t see how you could read and write to a doc file without some type of COM component interface.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0