I am experiencing the issue of unproper reading of cyrillic letters from MySQL table.
I use the following code:
library(RMySQL) library(keyring) mydb = dbConnect(MySQL(), ...) dbReadTable(mydb, 'tregions2')
The table is read but cyrillic letters are substituted with question marks:
id regionname iSOID administrativeCenter 1 1 ????????? ???? RU-ALT ???????
I started investigating into the issue.
The result of the query show variables like 'character_set_%';
in MySQL Workbench for the same user logged in on the same PC returns:
character_set_client utf8mb4 character_set_connection utf8mb4 character_set_database utf8 character_set_filesystem binary character_set_results utf8mb4 character_set_server utf8mb4 character_set_system utf8 character_sets_dir C:Program FilesMySQLMySQL Server 8.0sharecharsets
But result of the query returned by R is different:
> dbGetQuery(mydb, "show variables like 'character_set_%'") Variable_name Value 1 character_set_client latin1 2 character_set_connection latin1 3 character_set_database utf8 4 character_set_filesystem binary 5 character_set_results latin1 6 character_set_server utf8mb4 7 character_set_system utf8 8 character_sets_dir C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\
The locale variables of R are the following:
> Sys.getlocale() [1] "LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251
I tried to change character set and collation of the table in DB. Earlier setting cp1251 character set helped me to properly write the data into the database. But not now. I tried utf8/koi8r/cp1251 without any effect.
Attempt to execute Sys.setlocale(,"ru_RU")
aborted with an error that it could not be executed.
I am stuck. Could anyone give me an advise what else I should do?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
After several hours of investigation I finaly figured out the solution. Hope it will help someone encountering the same problem:
> dbExecute(mydb, "SET NAMES cp1251") [1] 0 > dbGetQuery(mydb, "show variables like 'character_set_%'") Variable_name Value 1 character_set_client cp1251 2 character_set_connection cp1251 3 character_set_database utf8 4 character_set_filesystem binary 5 character_set_results cp1251 6 character_set_server utf8mb4 7 character_set_system utf8 8 character_sets_dir C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ > > TrTMP <- dbReadTable(mydb, 'tregions') > TrTMP[1,c(1,2,6,14)] id regionname iSOID administrativeCenter 1 1 Алтайский край RU-ALT Барнаул
Method 2
Tool -> Global Options -> Code -> Saving and put UTF-8 rs <- dbSendQuery(con, 'set character set "utf8"') rs <- dbSendQuery(con, 'SET NAMES utf8')
options(encoding = “UTF-8”) at the top of my main script from which I call my package seems to fix the issue with having non-ascii characters in my package code.
read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8")) (also file())
For more flexibility, you should use utf8mb4 instead of cp1251. If you have data coming into the client in cp1251, then you probably have to stick with that charset.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0