Archive for the ‘unicode’ tag
Handling Unicode with PHP
Unicode characters and web services always cause me one problem or another.
Working on the PlannerX backend was no different. I spent a good few hours fixing Unicode / UTF-8 related issues.
And while I was searching for some solutions, I found an excellent “PHP UTF-8 Cheat Sheet” by Nick Nettleton of DropSend. I highly recommend it if you are going to do anything with PHP and Unicode!
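And BTW, be careful when combining Base64 encoding with UTF-8. It did not work for me! (Base64 itself only sees bytes, so the likely culprit is the two sides disagreeing about the character set.)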
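For anyone debugging a similar problem, it may help to confirm the Base64 step in isolation. Base64 works on raw bytes, so UTF-8 text should come back unchanged from a round trip within PHP. If it does not arrive intact at the other end, the receiver is probably interpreting the decoded bytes in a different charset (that is a guess about the cause, not something verified here). A minimal sketch, with a made-up sample string and PHP 7+ escape syntax:

```php
<?php
// Hypothetical sample text: "Cafe" with an accented e, wrapped in curly quotes.
$original = "Caf\u{00E9} \u{201C}quoted\u{201D}";

$encoded = base64_encode($original);       // ASCII-only transport form
$decoded = base64_decode($encoded, true);  // strict mode: false on invalid input

// The decoded bytes are identical to the original UTF-8 bytes.
var_dump($decoded === $original); // bool(true)
```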
Removing Word special characters from text / XML in PHP
Microsoft Word converts certain characters into "smart characters": double quotes, dashes (em dash / en dash), bullets and so on.
These characters break PHP's XML handling (or at least they broke it for me, using simplexml_load_string!).
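One plausible explanation (an assumption, not something checked against the original data) is that the pasted text reaches PHP as Windows-1252 bytes, which are not well-formed UTF-8, and an XML parser expecting UTF-8 rejects them. The sketch below shows a quick way to test a string for that. `isValidUtf8` is a hypothetical helper name, and both sample strings are made up for illustration.

```php
<?php
// Returns true when $text is well-formed UTF-8.
// preg_match() with the /u modifier returns false for invalid UTF-8 subjects.
function isValidUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

$fromWord = "\x93Hello\x94";          // curly quotes as Windows-1252 bytes (assumed input)
$asUtf8   = "\u{201C}Hello\u{201D}";  // the same quotes encoded as UTF-8 (PHP 7+ syntax)

var_dump(isValidUtf8($fromWord)); // bool(false)
var_dump(isValidUtf8($asUtf8));   // bool(true)
```

If the check fails, converting the input to UTF-8 before parsing is probably a better first step than deleting characters.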
How do you clean them?
There is an old post that suggests using ereg_replace on a set of characters, essentially converting them to HTML entities. (ereg_replace was deprecated in PHP 5.3 and removed in PHP 7.)
Unfortunately, that did not work for me. Since the text is UTF-8 and the replace logic works on bytes rather than characters, it mangled ordinary letters too.
I tried hard to find a solution, but could not find anything that worked. In the end, I just stripped out all non-printing characters except line breaks and tabs.
private function cleanWordSpecialCharacters($body)
{
    return $body;
}
This too breaks with non-English characters. Any suggestions?
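One possible direction, offered as a sketch rather than a tested fix: if the text is already valid UTF-8, replace the smart characters as whole UTF-8 strings and strip control characters with a Unicode-aware pattern, so that accented and non-Latin letters pass through untouched. The function name `cleanWordText`, the character map, and the choice of ASCII replacements are all assumptions for illustration. This is not the method from the post above.

```php
<?php
// Hypothetical helper: tidy Word-style text that is already UTF-8 (PHP 7+).
function cleanWordText(string $body): string
{
    // 1. Swap common "smart" characters for plain ASCII equivalents.
    //    strtr() with an array replaces whole substrings, so other
    //    multi-byte characters are left alone.
    $map = [
        "\u{2018}" => "'",    // left single quote
        "\u{2019}" => "'",    // right single quote
        "\u{201C}" => '"',    // left double quote
        "\u{201D}" => '"',    // right double quote
        "\u{2013}" => '-',    // en dash
        "\u{2014}" => '-',    // em dash
        "\u{2026}" => '...',  // ellipsis
        "\u{2022}" => '*',    // bullet
    ];
    $body = strtr($body, $map);

    // 2. Strip C0/C1 control characters except tab, LF and CR.
    //    The /u modifier makes the class match code points, not bytes.
    $clean = preg_replace(
        '/[\x{00}-\x{08}\x{0B}\x{0C}\x{0E}-\x{1F}\x{7F}-\x{9F}]/u',
        '',
        $body
    );

    // preg_replace() returns null when $body is not valid UTF-8;
    // in that case hand back the text from step 1 unchanged.
    return $clean ?? $body;
}

// Curly quotes and an en dash are replaced, the BEL byte is dropped,
// and the accented letters, tab and newline are kept.
echo cleanWordText("\u{201C}Caf\u{00E9}\u{201D} \u{2013} na\u{00EF}ve\x07\ttext\n");
```

Because the pattern only targets control characters, text in other scripts should survive intact. If the input is not valid UTF-8 to begin with, it needs to be converted first.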
