
> It's not - you get the worst of both worlds: indexing is not a constant-time operation

You can't usefully index a unicode stream in constant time and do correct and useful textual stuff anyway. Combining codepoints may not have precombined forms (if only because there is no defined limit to the number of combining codepoints tacked onto the base), so normalization will not save you. There are also codepoints which are not visible to the user and which you may or may not want to see depending on the work you're doing.

People really need to come to terms with the fact that a unicode stream is exactly that: a stream.
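
To make the combining-codepoint point concrete, here is a rough Go sketch. The helper name `countBaseRunes` is made up for illustration, and skipping category-M codepoints is only an approximation of "what the user sees" (real grapheme segmentation follows UAX #29). The sample string is a `q` with two combining dots, a sequence that, as far as I know, has no single precomposed codepoint.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// countBaseRunes is a hypothetical helper: it counts codepoints that are
// NOT combining marks (Unicode category M). This only approximates
// user-perceived characters; it is not full grapheme segmentation.
func countBaseRunes(s string) int {
	n := 0
	for _, r := range s {
		if !unicode.Is(unicode.M, r) {
			n++
		}
	}
	return n
}

func main() {
	// "q" followed by U+0307 (combining dot above) and U+0323 (combining dot below).
	s := "q\u0307\u0323"

	fmt.Println(len(s))                    // number of bytes
	fmt.Println(utf8.RuneCountInString(s)) // number of codepoints
	fmt.Println(countBaseRunes(s))         // approximate visible characters
}
```

The three numbers differ, so neither a byte index nor a codepoint index lines up with what the user perceives as one character. Either way you end up scanning.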



> You can't usefully index a unicode stream in constant time and do correct and useful textual stuff anyway

To find an index of a substring you need to scan the string, right. But once you have the byte index you can quickly jump to its position in the string, e.g. when you do a slice operation based on that index: s[i:]. If strings.Index() returned a code point index instead of a byte index, you would have to scan the string a second time to turn that index back into a position before slicing.
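
A small sketch of the byte-index argument, in Go. `tailFrom` and `runeIndex` are names invented for this example; only `strings.Index` and `utf8.RuneCountInString` come from the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// tailFrom (hypothetical helper) returns s starting at the first occurrence
// of sub, or "" if sub is absent. The slice reuses the byte offset directly,
// so no second scan of s is needed.
func tailFrom(s, sub string) string {
	i := strings.Index(s, sub)
	if i < 0 {
		return ""
	}
	return s[i:]
}

// runeIndex (hypothetical helper) converts a byte offset into a code point
// index. It has to walk the prefix, so it costs O(n).
func runeIndex(s string, byteOff int) int {
	return utf8.RuneCountInString(s[:byteOff])
}

func main() {
	s := "h\u00e9llo, w\u00f6rld" // "héllo, wörld"; é and ö take 2 bytes each in UTF-8
	i := strings.Index(s, "w\u00f6rld")

	fmt.Println(i)                    // byte offset of the match
	fmt.Println(runeIndex(s, i))      // code point index of the same match
	fmt.Println(tailFrom(s, "w\u00f6rld")) // slice taken straight from the byte offset
}
```

The byte offset and the code point index differ as soon as a multi-byte character precedes the match, and only the byte offset can be used for slicing without another pass.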


> To find an index of a substring you need to scan the string, right. But once you have the byte index you can quickly jump to its position in the string

Stop doing that and just get the bit of string you want in the first place?
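
One possible reading of this in Go terms: use an API that hands back the pieces directly, so the caller never holds an index at all. `strings.Cut` (standard library, Go 1.18 and later) works that way. `afterSep` is a made-up wrapper for illustration, not something from the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// afterSep (hypothetical helper) returns the text following the first
// occurrence of sep. No byte or code point index is exposed to the caller.
func afterSep(s, sep string) (string, bool) {
	_, after, found := strings.Cut(s, sep)
	return after, found
}

func main() {
	rest, ok := afterSep("key=v\u00e4lue", "=")
	fmt.Println(rest, ok)
}
```

Internally this still scans for the separator and slices at a byte offset. The difference is only that the index stays an implementation detail.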



