
You're right that "code unit" is the correct Unicode term for the parts of an encoded Unicode string, which is exactly why it's the wrong term here: the whole point is that the filenames don't have to be Unicode.

So I can take the byte string [0xFF, 0xFF, 0xFF] and use that as a filename on Linux, or I can take the sequence of 16-bit values [0xD800, 0xD800, 0xD800] and use that as a filename on NTFS. They're not made of code units because they're not Unicode strings.
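A quick way to check that neither sequence is well-formed Unicode, as a rough Python sketch. It only tests the raw values against strict decoders and never touches a filesystem. The helper name `is_valid` is made up for illustration, and packing the 16-bit values as little-endian is an assumption made so they can be fed to a UTF-16 decoder.

```python
import struct

# The byte string from the comment: usable as a Linux filename, per the text above.
linux_name = bytes([0xFF, 0xFF, 0xFF])

# The 16-bit values from the comment, packed little-endian (an assumption for
# this sketch) so a UTF-16 decoder can inspect them.
ntfs_units = [0xD800, 0xD800, 0xD800]
ntfs_name = struct.pack("<3H", *ntfs_units)


def is_valid(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes strictly under the given encoding."""
    try:
        raw.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# 0xFF can never appear in UTF-8, and 0xD800 is a high surrogate that must be
# followed by a low surrogate in UTF-16, so both checks fail.
linux_ok = is_valid(linux_name, "utf-8")
ntfs_ok = is_valid(ntfs_name, "utf-16-le")

# Python's surrogateescape handler carries the raw bytes through a str and
# back without claiming they are valid Unicode text.
round_tripped = linux_name.decode("utf-8", "surrogateescape").encode(
    "utf-8", "surrogateescape"
)

print(linux_ok, ntfs_ok, round_tripped == linux_name)
```

Both checks come back false, and the round trip through `surrogateescape` gets the original bytes back, which is how Python itself copes with filenames like this on Linux.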


