Excel files can be converted to CSV using:
$ libreoffice --convert-to csv --headless --outdir dir file.xlsx
Everything appears to work just fine. The encoding, though, is set to something wonky. Instead of the UTF-8 em dash (—) that I get if I do a “save as” manually from LibreOffice Calc, it gives me a 227 (�). Running file on the CSV gives me “Non-ISO extended-ASCII text, with very long lines”. So, two questions:
- What on earth is happening here?
- How do I tell libreoffice to convert to UTF-8?
The specific file that I’m trying to convert is here.
Answers:
Method 1
Apparently LibreOffice tries to use ISO-8859-1 by default, which is causing the problem.
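As a rough illustration of the symptom in the question: octal 227 is the single byte 0x97, which is the em dash in Windows-1252 (a close relative of ISO-8859-1), whereas UTF-8 encodes U+2014 as the three bytes e2 80 94. The sketch below assumes the CSV was written in Windows-1252, which the source does not confirm, and that iconv and od are installed.

```shell
# Assumption: the converted CSV is Windows-1252, where byte 0x97 is the em dash.
# printf '\227' emits that single byte (227 is octal for 0x97).
legacy_hex=$(printf '\227' | od -An -tx1 | tr -d ' \n')

# Re-encode the same byte as UTF-8 and dump the resulting bytes as hex.
utf8_hex=$(printf '\227' | iconv -f WINDOWS-1252 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "legacy byte: $legacy_hex"
echo "as UTF-8:    $utf8_hex"
```

If the assumption holds, the same iconv invocation applied to the whole file would be a possible after-the-fact repair, though asking LibreOffice for UTF-8 directly is cleaner.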
In response to this bug report, a new parameter --infilter has been added. The following command produces U+2014 em dash:
libreoffice --convert-to csv --infilter=CSV:44,34,76,1 --headless --outdir dir file.xlsx
I tested this with LO 5.0.3.2. From the bug report, it looks like the earliest version containing this option is LO 4.4.
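For reference, the option string is a comma-separated list of tokens. Going by the LibreOffice CSV filter options, the first four are the field separator (44, the ASCII code for a comma), the text delimiter (34, a double quote), the character set (76, LibreOffice's ID for UTF-8) and the first line to process (1). The sketch below only assembles and prints the command from named variables; the variable names are illustrative, not part of LibreOffice.

```shell
# Hypothetical variable names; the values are the ones used in the command above.
sep=44        # field separator, ASCII code for ","
quote=34      # text delimiter, ASCII code for '"'
charset=76    # LibreOffice character set ID for UTF-8
first_line=1  # first line to process

# Turn the ASCII codes back into characters to show what they stand for.
sep_char=$(printf "\\$(printf '%03o' "$sep")")
quote_char=$(printf "\\$(printf '%03o' "$quote")")
echo "separator: $sep_char  delimiter: $quote_char"

# Assemble the option string and print (not run) the full command.
filter_opts="${sep},${quote},${charset},${first_line}"
echo "libreoffice --convert-to csv --infilter=CSV:${filter_opts} --headless --outdir dir file.xlsx"
```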
Method 2
You could try:
$ libreoffice --convert-to \
> csv:"Text - txt - csv (StarCalc)":"44,34,0,1,,0" \
> --headless --outdir dir file.xlsx
Very detailed help on these filter options is available here.
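Method 1 passes 76 as the third token of its option string, and that is the value that selects UTF-8 there. A hedged variation on the Method 2 command therefore swaps the third token from 0 to 76. This sketch only rewrites the string with awk and prints the resulting command; it has not been tested against LibreOffice, and the variable names are illustrative.

```shell
# Option string from the Method 2 command; the third token is the character set.
opts="44,34,0,1,,0"

# Replace the third comma-separated token with 76 (assumed here to mean UTF-8,
# as in Method 1), keeping the empty fifth token intact.
utf8_opts=$(printf '%s\n' "$opts" | awk -F, 'BEGIN { OFS = "," } { $3 = 76; print }')

# Print (not run) the resulting command.
echo "libreoffice --convert-to csv:\"Text - txt - csv (StarCalc)\":\"${utf8_opts}\" --headless --outdir dir file.xlsx"
```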
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.