Chuckellania for November 3rd, 2021

Published on Thursday, November 11, 2021

unicode is complicated because languages are complicated because people are complicated

going "ehh these complexities don't matter in the context i'm working in" is the same as going "ehh these people don't matter in the context i'm working in"

I go, somewhat controversially, further than that. If you inadvertently do that, and your software incidentally doesn't support other cultures well, that's one thing. Happens to all of us. But if you repeatedly brush off the problems Unicode was trying to solve because you think they're not important to you, that's a form of racism.

Almost invariably, it's people who write in English, or in another language using the Latin script, who make assertions of the "why is Unicode so complicated?" or "it could easily fit in 2 bytes if you left out all the needless complexity" kind, where "needless" means needless for them, not for the people that complexity was designed for. Really, when you interrogate that any further, you realize what they're asking is "why won't those people just write in English instead?"

I sympathize that getting Unicode right is hard, so hard that most languages and frameworks don't even implement something like a string.getLength() API "correctly". They look at the number of bytes, or the number of code points, rather than what you probably want (but is much harder to compute): the number of visible glyphs on screen.
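
To make the three notions of "length" concrete, here is a minimal Python sketch. The function name length_report and its result keys are made up for illustration. The "visible" count is only a rough approximation that skips combining marks; it is an assumption-laden stand-in, not real grapheme cluster segmentation, and it will miscount things like emoji joined with zero-width joiners.

```python
import unicodedata


def length_report(text: str) -> dict:
    """Return three different 'lengths' of the same string (illustrative helper)."""
    # Number of bytes once the text is encoded as UTF-8.
    utf8_bytes = len(text.encode("utf-8"))
    # Number of Unicode code points; this is what Python's len() reports.
    code_points = len(text)
    # ASSUMPTION: approximate user-visible characters by skipping combining
    # marks. Proper segmentation follows the Unicode grapheme cluster rules,
    # which this sketch does not implement.
    approx_visible = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return {
        "utf8_bytes": utf8_bytes,
        "code_points": code_points,
        "approx_visible": approx_visible,
    }


# "café" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
sample = "cafe\u0301"
print(length_report(sample))
# {'utf8_bytes': 6, 'code_points': 5, 'approx_visible': 4}
```

Three different answers for a four-letter word, and that's still the easy case.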

But just because it's hard doesn't mean we shouldn't strive to do it.


These pseudo-C# annotations to assembler look nice. For example, add r8d, 64h is commented with r8d += 100, which far more closely matches the way I think.
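
For the curious, the translation for that one example is simple enough to sketch. This is a toy in Python, not how any real disassembler produces its annotations: the names annotate, parse_immediate, and OPERATORS are invented here, and it assumes two-operand instructions with hex immediates written with an h suffix and a leading digit.

```python
import re

# Hypothetical mapping from a few two-operand mnemonics to C#-style operators.
OPERATORS = {"add": "+=", "sub": "-=", "and": "&=", "or": "|=", "xor": "^="}


def parse_immediate(token: str) -> int:
    """Parse an immediate such as '64h' (hex) or '100' (decimal)."""
    token = token.strip().lower()
    # ASSUMPTION: immediates start with a digit, so registers such as 'bh'
    # are not mistaken for hex literals.
    if not token or not token[0].isdigit():
        raise ValueError(f"not an immediate: {token!r}")
    if token.endswith("h"):
        return int(token[:-1], 16)
    return int(token, 10)


def annotate(instruction: str) -> str:
    """Turn e.g. 'add r8d, 64h' into the pseudo-C# form 'r8d += 100'."""
    match = re.fullmatch(r"\s*(\w+)\s+(\w+)\s*,\s*(\w+)\s*", instruction)
    if match is None:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    mnemonic, dest, src = match.groups()
    operator = OPERATORS[mnemonic.lower()]
    try:
        rhs = str(parse_immediate(src))
    except ValueError:
        rhs = src  # not an immediate, so keep the register name as-is
    return f"{dest} {operator} {rhs}"


print(annotate("add r8d, 64h"))  # r8d += 100
```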


The feeling that others have used your app far more than you ever will is quite amazing.