> But doesn't appending to a string buffer still mean copying a lot of bytes as you build up the final string?
Look at the article and see for yourself. They do the string-builder measurement and show results. It's in figure 5.
The "GEOMETRIC" approach to string building from small fragments is slightly slightly faster than the rope approach. That's where characters are copied to a contiguous buffer, and the buffer is reallocated by doubling in size whenever it is filled.
The slowest by far is the "UNBUFFERED" [cord] version, which creates a new cord node for each append, instead of copying the characters.
"BUFFERED" [cord] is quite good. It's still a bit slower than "GEOMETRIC", but good enough.
In a nutshell, the paper shows that copying bytes when the number of bytes at a time is small is essential to all fast methods of string building.
That's because creating nodes and changing pointers is slower than copying contiguous bytes, up to some number of bytes. That number is probably larger now than it was when the paper was written due to modern memory architecture.
Hm that's interesting, actually the paper is sort of wishy-washy about the contribution? It doesn't claim that ropes are faster.
It claims that they make systems "more robust". I think they basically mean that certain apps that use long strings are less likely to run out of memory.
(Also, I would expect penalties of pointer chasing and thus ropes to be much more severe in 2020 vs. 1995, like maybe 10x more?)
We claim that the traditional implementations ofstrings, and often the supported functionality, are not well suited to such general-purpose use.They should be confined to applications with specific, and unusual, performance requirements.We present ‘ropes’ or ‘heavyweight’ strings as an alternative that, in our experience leads to systems that are more robust, both in functionality and in performance.
Conclusion:
It is not clear whether the implementation we suggested above is in any senseoptimal. However, it performs sufficiently well for our purposes, and sufficientlywell that for most purposes it is preferred over traditional ‘flat’ string representations.Our experience has been that it results in software that is noticeably more robust,particularly in that it scales correctly to unexpectedly large inputs.
edit: I guess they are making a computational complexity claim then? But again I would say it's a common idiom in Python and JS to append to a list and join(). At least now it is, maybe in 1995 programmers weren't used to that idiom.
Look at the article and see for yourself. They do the string-builder measurement and show results. It's in figure 5.
The "GEOMETRIC" approach to string building from small fragments is slightly slightly faster than the rope approach. That's where characters are copied to a contiguous buffer, and the buffer is reallocated by doubling in size whenever it is filled.
The slowest by far is the "UNBUFFERED" [cord] version, which creates a new cord node for each append, instead of copying the characters.
"BUFFERED" [cord] is quite good. It's still a bit slower than "GEOMETRIC", but good enough.
In a nutshell, the paper shows that copying bytes when the number of bytes at a time is small is essential to all fast methods of string building.
That's because creating nodes and changing pointers is slower than copying contiguous bytes, up to some number of bytes. That number is probably larger now than it was when the paper was written due to modern memory architecture.