In gamedev there is a simple rule: don't try to do any of that.
It is issues like this that made me give up on C++. There are so many ways to do something and every way is freaking wrong!
The key takeaway here is that you can't correctly process a string if you don't know what language it's in. That includes variants of the same language with different rules, e.g. en-US and en-GB or es-MX and es-ES.
> From the article: "I find it quaint that Unicode character names are ALL IN CAPITAL LETTERS, in case you need to put them in a Baudot telegram or something."
SleepyMyroslav ·74 days ago
If it is text the game needs to show to the user, then every version of that text that is needed is translated text. The programmer will never know whether a context or locale will need word order changes or anything more complicated. Just trust the translation team.
If the text is coming from the user, then change the design until it no longer needs to be 'converted'. There are major issues just in showing users back what they entered, because the font for editing and the font for displayed text could be different. Not to mention RTL and other issues.
Once people learn about localization, questions like why a programming language does not do this 'simple text operation' are just a newcomer detector. :)
blenderob ·74 days ago
An acceptable solution is given at the end of the article:
> If you use the International Components for Unicode (ICU) library, you can use u_strToUpper and u_strToLower.
Makes you wonder why this isn't part of the C++ standard library itself. Every revision of the C++ standard brings with it more syntax and more complexity in the language. But as a user of C++ I don't need more syntax and more complexity in the language. What I do need is more standard library functions that solve these ordinary real-world programming problems.
appointment ·74 days ago
If you are handling multilingual text, the locale is mandatory metadata.
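To make that concrete, here is a toy sketch of why the locale has to travel with the string. It is not a real case-mapping implementation: `uses_dotted_capital_i` and `toy_to_upper` are hypothetical names, and the function only knows about ASCII letters plus the one Turkish/Azerbaijani rule that lowercase `i` uppercases to `İ` (U+0130). Everything else passes through untouched.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: true when the language subtag of a locale tag such as
// "tr-TR" or "az" is Turkish or Azerbaijani.
bool uses_dotted_capital_i(std::string_view locale) {
    std::string_view lang = locale.substr(0, locale.find_first_of("-_"));
    return lang == "tr" || lang == "az";
}

// Toy uppercase over UTF-8 bytes. Assumption: only ASCII letters and the
// Turkic 'i' rule are handled; a real implementation needs full Unicode data.
std::string toy_to_upper(std::string_view utf8, std::string_view locale) {
    const bool turkic = uses_dotted_capital_i(locale);
    std::string out;
    out.reserve(utf8.size());
    for (char c : utf8) {
        if (c == 'i' && turkic) {
            out += "\xC4\xB0";  // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
        } else if (c >= 'a' && c <= 'z') {
            out += static_cast<char>(c - 'a' + 'A');
        } else {
            out += c;  // non-ASCII bytes and non-letters are left alone
        }
    }
    return out;
}
```

The same input `"title"` gives `TITLE` with `"en-US"` and `TİTLE` with `"tr-TR"`, so the function cannot be written correctly without the second argument.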
vardump ·74 days ago
That said, 99% of the time when you do an upper- or lowercase operation, you're interested in just the 7-bit ASCII range of characters.
For the remaining 1%, there's the ICU library, just like Raymond Chen mentioned.
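For the ASCII-only case, a minimal sketch could look like the following. The names `ascii_upper_char` and `ascii_upper` are made up for illustration. The point is that it avoids the locale-dependent `std::toupper` entirely and only touches the bytes `a` through `z`, so the bytes of multi-byte UTF-8 sequences pass through unchanged.

```cpp
#include <string>
#include <string_view>

// Uppercase a single byte if it is an ASCII lowercase letter; otherwise
// return it unchanged. Independent of the C locale.
constexpr char ascii_upper_char(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Uppercase only the 7-bit ASCII letters of a string. Bytes >= 0x80
// (for example the UTF-8 encoding of a non-ASCII letter) are not modified.
std::string ascii_upper(std::string_view s) {
    std::string out(s);
    for (char& c : out) {
        c = ascii_upper_char(c);
    }
    return out;
}
```

This is fine for things like protocol keywords or header names. It is deliberately wrong for human-readable text: `ascii_upper("straße")` leaves the `ß` as it is.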
Animats ·73 days ago
I had to do that. When we had our steampunk telegraph office at steampunk conventions [1], people could text in a message via SMS, it would be printed on a Model 14 or 15 Teletype, put in an envelope, and hand-delivered. People would use emoji in messages, and the device could only print Baudot, or International Telegraphic Alphabet #2, which is upper case only with some symbols.
Emoji translation would cause the machine to hammer out
or whatever emoji description was needed. I used the emoji list at [2], an older version.
[1] https://vimeo.com/124065314
[2] http://unicode.org/emoji/charts-beta/full-emoji-list.html
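A rough sketch of how that kind of emoji-to-text translation could work, not the code actually used for the telegraph office. The comment above does not show the output format, so wrapping the name in parentheses is an assumption, and `kNames`, `next_code_point` and `to_teletype_text` are hypothetical names. The table holds only three entries for illustration, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Tiny illustrative table of code points and their Unicode character names.
// A real table would be generated from the Unicode emoji list.
struct NamedCodePoint { char32_t cp; const char* name; };
constexpr NamedCodePoint kNames[] = {
    {0x1F600, "GRINNING FACE"},
    {0x1F634, "SLEEPING FACE"},
    {0x1F44D, "THUMBS UP SIGN"},
};

// Decode one code point starting at s[i] and advance i past it.
// Assumption: s is well-formed UTF-8; no validation is done.
char32_t next_code_point(std::string_view s, std::size_t& i) {
    unsigned char b = static_cast<unsigned char>(s[i++]);
    int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
    char32_t cp = extra == 0 ? b : b & (0x3F >> extra);
    while (extra-- > 0 && i < s.size()) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Uppercase ASCII letters, pass other ASCII through, and replace every
// non-ASCII code point with its name in parentheses, or "(?)" if unknown.
std::string to_teletype_text(std::string_view utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = next_code_point(utf8, i);
        if (cp >= 'a' && cp <= 'z') {
            out += static_cast<char>(cp - 'a' + 'A');
        } else if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            const char* name = "?";
            for (const auto& entry : kNames) {
                if (entry.cp == cp) name = entry.name;
            }
            out += '(';
            out += name;
            out += ')';
        }
    }
    return out;
}
```

A real version would also have to drop or substitute ASCII characters that ITA2 cannot print, and decide what to do with variation selectors and multi-code-point emoji sequences, which this sketch would print as a run of `(?)`.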