Why are Chinese characters in query results garbled in DBeaver?
I am using the latest version of DBeaver (23.2) to connect to HeavyDB 6.4. Why are the Chinese characters in the query results garbled, and how can I fix it? Thank you!
-
Hi,
I believe it depends on how the data was encoded when it was ingested into the database. If it was encoded as UTF-8, everything should be displayed correctly. However, if a legacy encoding was used, you may encounter incorrect output. You can adjust the encoding settings in DBeaver. In my installation, you can select UTF-8, UTF-16, and a few others.
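To illustrate why the ingest encoding matters, here is a minimal Python sketch. It assumes a hypothetical sample string stored as GB2312 bytes and a client that reads everything as UTF-8; it does not reproduce what HeavyDB or DBeaver do internally.

```python
# Hypothetical sample text; any Chinese string behaves the same way.
text = "中文"

# The bytes as a legacy GB2312 export would store them.
raw = text.encode("gb2312")

# A UTF-8 reader cannot decode these bytes and substitutes U+FFFD for them.
garbled = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the data was actually written in recovers the text.
recovered = raw.decode("gb2312")

print(repr(garbled))
print(recovered)
```

The replacement characters in `garbled` are the same kind of symbol that shows up in a terminal or SQL client when legacy-encoded data is read as UTF-8.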
Here's a small example with some UTF-encoded data inserted into the database:
If you have a file with a legacy encoding like BIG5, you can use the iconv utility to convert it into UTF-8.
When you load the original file with the legacy encoding into the database, you may encounter incorrect output due to the source file's encoding. For example:
head -n 5 adminbk1.txt
"id","title (WG)","title (Pinyin)","title (English)","title (Chinese)","author","Boundary","Name","Code","Other (specify)","Period","# of Pages","Pub_Info","Location","Call #","ISBN","Language","Description"
1,"Ching tai ti li yen ko piao","Qing dai di li yan ge biao",,"�� �� �� �� �� �� ��","Zhao, Quan-cheng ( �� Ȫ �� )",1,1,0,,"Qing Dynasty","204","�� �� �� �� �� 1940 �� �� �� �� ��, 1979 ��","UW East Asian Library","DS755 .S532 v.628",,"Chinese","China - historical - geography; China - Administrative - and - political -divisions. It contains descriptions and charts about the administrative boundary changes in Qing Danasty."
2,"Chung-kuo shih hsien shou tse","Zhong guo shi xian shou ce",,"�� �� �� �� �� ��","Wang, Yueh",0,1,0,,"-1986","641","�� �� ʡ �� �� �� �� �� 1987","UW East Asian Library","JS7351 A3 C59 1987",,"Chinese","China - administrative - and - political - divisions. It contains the name, geography, and other information about Chinese cities and counties up to 1986."
3,"Ko sheng chu yu yen ko i lan piao","Ge sheng qu yu yan ge yi lan biao",,"�� ʡ �� �� �� �� һ �� ��",,0,1,0,,,"47","�� �� �� �� ӡ �� �� 1914","UW East Asian Library","DS737 .H7",,"Chinese","Names - geographical - China; China - administrative - and - political - divisions. Colophon title, errata slip inserted. Changes of county names and provincial names."
4,,"Zhong hua ren min gong he guo xing zheng qu hua jian ce","Simplified handbook on administrative divisions of the People's Republic of China,1977","�� �� �� �� �� �� �� �� �� �� �� �� ��","�� �� ��",0,1,0,,"-1976","154","Arlington, Va: Joint Publications Research Services. Sold by NTIS, 1978.","UW East Asian Library","JS7351 .A3 1978",,"Chinese/English","The report contains a breakdown of all administrative divisions of the PRC at county level and above throughout the country. It is a translation of the Chinese version."
However, by using iconv or similar utilities to convert the file to UTF-8 and then loading the converted file into the database, the data will be correctly displayed when queried. The conversion process ensures that the data is in the correct character encoding for proper display.
mapd@zion-tr:~$ head -n 5 adminbk1.txt | iconv -f BIG-5 -t UTF-8
"id","title (WG)","title (Pinyin)","title (English)","title (Chinese)","author","Boundary","Name","Code","Other (specify)","Period","# of Pages","Pub_Info","Location","Call #","ISBN","Language","Description"
1,"Ching tai ti li yen ko piao","Qing dai di li yan ge biao",," 測 華 燴 朓 賂 桶","Zhao, Quan-cheng ( 梊 割 )",1,1,0,,"Qing Dynasty","204","恅 漆 堤 唳 扦 1940 爛 唳 腔 婬 唳, 1979 爛","UW East Asian Library","DS755 .S532 v.628",,"Chinese","China - historical - geography; China - Administrative - and - political -divisions. It contains descriptions and charts about the administrative boundary changes in Qing Danasty."
2,"Chung-kuo shih hsien shou tse","Zhong guo shi xian shou ce",,"笢 弊 庈 瓮 忒 聊","Wang, Yueh",0,1,0,,"-1986","641","涳 蔬 吽 諒 郤 堤 唳 扦 1987","UW East Asian Library","JS7351 A3 C59 1987",,"Chinese","China - administrative - and - political - divisions. It contains the name, geography, and other information about Chinese cities and counties up to 1986."
3,"Ko sheng chu yu yen ko i lan piao","Ge sheng qu yu yan ge yi lan biao",,"跪 吽 郖 朓 賂 珨 擬 桶",,0,1,0,,,"47","奻 漆 妀 昢 荂 抎 奩 1914","UW East Asian Library","DS737 .H7",,"Chinese","Names - geographical - China; China - administrative - and - political - divisions. Colophon title, errata slip inserted. Changes of county names and provincial names."
4,,"Zhong hua ren min gong he guo xing zheng qu hua jian ce","Simplified handbook on administrative divisions of the People's Republic of China,1977","笢 貌 鏍 僕 睿 弊 俴 淉 赫 潠 聊","囀 昢 窒",0,1,0,,"-1976","154","Arlington, Va: Joint Publications Research Services. Sold by NTIS, 1978.","UW East Asian Library","JS7351 .A3 1978",,"Chinese/English","The report contains a breakdown of all administrative divisions of the PRC at county level and above throughout the country. It is a translation of the Chinese version."
If you have any further questions or need assistance, please feel free to ask.
Regards,
Candido
-
I have solved the problem. Thanks a lot! You need to make sure that the exported CSV file is in UTF-8 format.
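A minimal Python sketch of such a check-and-convert step, for cases where a command-line tool is not at hand. The helper names `is_utf8` and `convert_to_utf8` are hypothetical, and the GB2312 default is an assumption about the export; the source encoding has to be known or detected separately.

```python
from pathlib import Path


def is_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src, dst, source_encoding="gb2312"):
    """Re-encode src (assumed to be in source_encoding) as UTF-8 into dst.

    Raises UnicodeDecodeError if src contains bytes that are not valid
    in source_encoding, so a wrong guess often fails loudly.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("sid_latn1.csv", "sid_latn2.csv")` followed by `is_utf8("sid_latn2.csv")` would be the rough equivalent of a detect, convert, and re-check sequence. Note that a pure-ASCII file also passes `is_utf8`, since ASCII is a subset of UTF-8.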
heavyai@node13:/var/lib/heavyai/storage/import/sample_datasets$ enca -L zh_CN sid_latn1.csv
Simplified Chinese National Standard; GB2312
heavyai@node13:/var/lib/heavyai/storage/import/sample_datasets$ enca -L zh_CN -x UTF-8 < sid_latn1.csv > sid_latn2.csv
heavyai@node13:/var/lib/heavyai/storage/import/sample_datasets$ enca -L zh_CN sid_latn2.csv
Universal transformation format 8 bits; UTF-8
-
Hi @jjieguo,
What matters is not what is in your environment, but the encoding used by your terminal.
So if I set my terminal this way, the characters are displayed as expected.
However, almost every tool uses UTF-8 as its default, so it is safer to load data encoded in UTF-8 rather than in legacy encodings like BIG5 or GB2312. You can convert the data with tools like iconv, e.g.:
mapd@zion-tr:~$ iconv -c -f GB2312 -t utf-8 -o adminbk1_202309151038_utf.csv adminbk1_202309151038.csv
mapd@zion-tr:~$ /opt/heavyai/bin/heavysql -p HyperInteractive
User admin connected to database heavyai
heavysql> truncate table adminbk1;
heavysql> copy adminbk1 from '/home/mapd/adminbk1_202309151038_utf.csv' with (header='true');
Result
Loaded: 73 recs, Rejected: 0 recs in 0.156000 secs
Now, using Squirrel, DBeaver, or other tools, you'll get the proper output.
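One caveat when converting: the source encoding given to the converter has to match the encoding the file was really written in. The Python sketch below uses a hypothetical two-character sample, with Python's built-in codecs standing in for iconv. It shows that decoding GB2312 bytes as Big5 raises no error but produces different characters, so the result has to be checked by eye.

```python
# Hypothetical sample text, stored as a GB2312 export would store it.
text = "中国"
raw = text.encode("gb2312")

# Wrong source encoding: the bytes happen to be valid Big5 too,
# so decoding succeeds silently but yields unrelated characters.
wrong = raw.decode("big5")

# Correct source encoding: the original text comes back.
right = raw.decode("gb2312")

print(wrong, right)
```

Because the wrong guess does not necessarily fail, it is worth detecting the source encoding first, or comparing a few converted rows against a known value, before loading the file.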
AFAIK, there isn't an option to adjust the output encoding in DBeaver or SquirrelSQL. I'm not a developer or maintainer of those tools, so I may be wrong; if you don't want to convert your data, you can ask the developers of those tools whether there is a way to do it.
(I also tried other tools, and I wasn't able to change the charset encoding of the output there either.)
Regards,
Candido