Friday, May 7, 2021

MySQL DB: mixed charset problem


The problem:

Client application A writes character ¢ into MySQL DB.

Client application B reads it from DB and gets ¢.

If we look inside the DB to see how it is stored using function hex() -- select hex(fieldname) from tablename, we can see it is stored as \xC2\xA2 in hex form.

 

The reason:

The MySQL DB server is using ISO-8859-1 (latin1).

Client A writes characters to MySQL DB server in UTF-8. But the server incorrectly thinks the connection is using latin1. So one character \xA2 is stored as \xC2\xA2 (UTF-encoding).

Client B reads characters from the server in UTF-8. And the server correctly identifies the connection is using UTF-8. So the sever will encode the 2 hex characters into \xC3\x82\xC2\xA2 and send back 4 characters as the UTF-2 encoding for the 2 characters stored in the table.

 

To fix:

Make sure the charset settings of the connection and the client match the what the client actually uses.

The SQL command to check the charsets is:

SHOW SESSION VARIABLES LIKE 'character\_set\_%';

The SQL command to change the charsets to UTF-8 is:

SET NAMES utf8


No comments:

 
Get This <