Specify encoding with libreoffice --convert-to csv

Excel files can be converted to CSV using:

$ libreoffice --convert-to csv --headless --outdir dir file.xlsx

Everything appears to work fine, but the encoding is set to something wonky. Instead of the UTF-8 em dash (—) that I get when I do a manual “Save As” from LibreOffice Calc, the output gives me a 227 (�). Running file on the CSV reports “Non-ISO extended-ASCII text, with very long lines”. So, two questions:

  1. What on earth is happening here?
  2. How do I tell libreoffice to convert to UTF-8?

The specific file that I’m trying to convert is here.
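
Rather than relying on the heuristic that file uses, one way to check whether a converted CSV really is UTF-8 is to ask iconv to decode it: GNU (glibc) iconv exits with a non-zero status when it hits an invalid byte sequence. A minimal sketch, assuming GNU iconv is installed; the function name is_utf8 is hypothetical:

```shell
#!/usr/bin/env bash
# is_utf8 FILE: succeeds (exit 0) only if FILE decodes cleanly as UTF-8.
# Assumption: an iconv that exits non-zero on invalid input (GNU/glibc iconv does).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Report on each file passed as an argument (does nothing if there are none).
for f in "$@"; do
    if is_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: NOT valid UTF-8"
    fi
done
```

Saved as a script and run against the output directory (for example with dir/*.csv as the arguments), it prints one line per file.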

Answers:


Method 1

Apparently LibreOffice uses ISO-8859-1 by default, which is what causes the problem.
In response to this bug report, a new --infilter parameter was added. The following command produces a proper U+2014 em dash:

$ libreoffice --convert-to csv --infilter=CSV:44,34,76,1 --headless --outdir dir file.xlsx

I tested this with LO 5.0.3.2. From the bug report, it looks like the earliest version containing this option is LO 4.4.

See also: https://ask.libreoffice.org/en/question/13008/how-do-i-specify-an-input-character-coding-for-a-convert-to-command-line-usage/
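
To apply the Method 1 command to a whole directory of workbooks, a small loop is enough. This is a sketch that reuses the --infilter=CSV:44,34,76,1 option exactly as given above; the function name convert_all and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# convert_all SRC_DIR OUT_DIR
# Converts every *.xlsx file in SRC_DIR to CSV in OUT_DIR with the Method 1
# options (the answer above reports --infilter needs LibreOffice 4.4 or later).
convert_all() {
    local src_dir=$1 out_dir=$2 f
    mkdir -p "$out_dir"
    for f in "$src_dir"/*.xlsx; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        libreoffice --convert-to csv --infilter=CSV:44,34,76,1 \
            --headless --outdir "$out_dir" "$f"
    done
}
```

For example, convert_all ./workbooks ./csv. LibreOffice also accepts several input files in one invocation, so passing them all at once avoids starting it once per file; the loop form simply makes the empty-directory case easy to handle.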

Method 2

You could try:

    $ libreoffice --convert-to csv:"Text - txt - csv (StarCalc)":"44,34,0,1,,0" \
        --headless --outdir dir file.xlsx

There is very detailed help about these filter options here.
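
The filter option string is a comma-separated list of tokens. In LibreOffice's CSV filter options, token 1 is the field separator and token 2 the text delimiter, both given as ASCII codes (44 is "," and 34 is the double quote), token 3 is the character set (76 selects UTF-8; the command above passes 0 there), and token 4 is the number of the first line. The sketch below builds just those four tokens from readable values; the helper name csv_filter_opts is hypothetical, and the later tokens (such as the trailing ",,0" above) are not covered.

```shell
#!/usr/bin/env bash
# csv_filter_opts SEP QUOTE CHARSET FIRST_LINE
# Prints the first four CSV filter-option tokens, e.g. "44,34,76,1".
# SEP and QUOTE are single characters and are converted to their ASCII codes.
csv_filter_opts() {
    local sep_code quote_code
    # printf with an argument starting with a quote prints the code of the next character.
    sep_code=$(printf '%d' "'$1")
    quote_code=$(printf '%d' "'$2")
    printf '%s,%s,%s,%s\n' "$sep_code" "$quote_code" "$3" "$4"
}
```

Plugged into the command above with UTF-8 requested explicitly, it would look like:

    $ libreoffice --convert-to "csv:Text - txt - csv (StarCalc):$(csv_filter_opts , '"' 76 1)" \
        --headless --outdir dir file.xlsx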


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
