
So far, if I understand correctly, nobody knows of a sequence of characters you could write down in any language that would trigger a crash when encoded in Unicode in a straightforward way. That suggests that the invariant being assumed may come, ironically, from a deep understanding of the scripts rather than ignorance.


No, the crashy Bengali sequences are reasonable to have; as I mentioned, ZWNJ has semantic meaning with Bengali vowels.


I am judging by this statement in the blog post:

In Bengali and Oriya specifically, a ZWNJ can be used to force a different vowel form when used before a vowel (e.g. রু vs র‌ু), however this bug seems to apply to vowels for which there is only one form

This seems to say that the ZWNJ has a meaning before vowels that have different forms, but the crash happens with vowels that only have one form, where the ZWNJ has no effect. Maybe I am misreading?
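
For anyone who wants to see the difference between the two strings in that quote, here is a minimal Python sketch. It assumes the pair is RA + vowel sign U with and without a ZWNJ in between; the variable names and the `describe` helper are my own, purely illustrative, and it only lists code points. It does not try to reproduce the crash.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Assumed code points for the pair quoted from the blog post.
plain = "\u09b0\u09c1"                   # RA + VOWEL SIGN U
with_zwnj = "\u09b0" + ZWNJ + "\u09c1"   # RA + ZWNJ + VOWEL SIGN U


def describe(s):
    """Return one 'U+XXXX NAME' string per code point in s (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


if __name__ == "__main__":
    for label, s in (("plain", plain), ("with ZWNJ", with_zwnj)):
        print(label, len(s), describe(s))
```

The two strings differ only by the invisible ZWNJ, so stripping it from the second one gives back the first.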


Yeah, you are.

I'm saying that this crash _also_ applies to vowels with one form.

র‌ু was the original Bengali crash, and that has two forms. I'm saying it's less likely to be related to the zwnj-vowel interaction because it also occurs for vowels where such interaction doesn't exist.


What do you mean by "straightforward"?



