I will retract that you didn't mention memory-level parallelism back then, I lazily looked at the git blame and assumed it was introduced then. That was sloppy of me, sorry.
> I described what made it fast
I still partially disagree. Memory-level parallelism is one aspect but I think instruction-level parallelism and hiding data dependencies from iteration to iteration are the bigger factors. That's mainly what I chose to fully exploit in glidesort by actively splitting merges into smaller merges, and implementing bidirectional quicksort. That was the main takeaway from my glidesort talk.
> and it was obvious there was general applicability. If I understanding your reasoning correctly, because you took inspiration from my more advanced unguarded two directional merge, you feel like a guarded two directional merge is some new novel discovery.
The novelty is pointing out that also the hiding of data dependencies and instruction-level parallelism is making it fast, and to then go on and exploit this to the maximum. E.g. if I turn off the merge splitting in glidesort (reducing the amount of parallel merges) performance degrades 17.5% on my machine.
Your README explicitly called them parity merges, and that the arrays have to be of equal length. You might find the extension to unbalanced merges and using it for all merges instead of only the smallsort obvious, I disagree. If it is obvious... why didn't you do it? Correct me if I'm wrong but I was under the impression quadsort only uses the bidirectional merges for the small base case where arrays have statically known (equal) sizes.
> Subsequently you then proceed to give me zero credit for branchless merging in your Youtube presentation, instead referencing a branchless merge that gave a 10-20% speed-up instead of quadsort's 150% speed-up.
Because I was already aware of it, from the earlier work. That they implemented it poorly is besides the point. You and I both know that a bad implementation of a good idea in sorting can easily destroy all performance gains.
The part that was novel to me from your work on quadsort was eliminating the bounds check on perfectly balanced merges and the original idea of merging from both ends. That I credit you for.
> You also claimed in the presentation there were no bidirectional branchless algorithms written to date, this is simply not true.
>
> This does upset me. It would be a simple matter, and the right thing, to point out that quadsort introduced bidirectional branchless merging.
I don't think I claimed that. And I have modified the slides now to explicitly mention quadsort on the slide for bidirectional merging.
As I wrote previously, thank you for fully addressing all my concerns in your recent FOSDEM presentation.
While it would have been outside the scope of the presentation, and time being short, quadsort does present an interesting alternative to handling the merge length problem powsort tries to solve.
I've honestly been unable to detect any notable instruction-level parallelism on my own hardware. I suspect there may not be any? It'll be interesting to get to the bottom of this.
As for quadsort, it does contain a guarded bidirectional merge. I published it after you started working on glidesort however. It was always on my todo list, but I do get tired of programming from time to time.
As for unguarded parity merges, it was indeed one of my brighter moments when I came up with that.
Anyways, once again, thanks for addressing my concerns in the FOSDEM presentation, and if it wasn't clear, I think you did some really excellent work on glidesort.
> I described what made it fast
I still partially disagree. Memory-level parallelism is one aspect but I think instruction-level parallelism and hiding data dependencies from iteration to iteration are the bigger factors. That's mainly what I chose to fully exploit in glidesort by actively splitting merges into smaller merges, and implementing bidirectional quicksort. That was the main takeaway from my glidesort talk.
> and it was obvious there was general applicability. If I understanding your reasoning correctly, because you took inspiration from my more advanced unguarded two directional merge, you feel like a guarded two directional merge is some new novel discovery.
The novelty is pointing out that also the hiding of data dependencies and instruction-level parallelism is making it fast, and to then go on and exploit this to the maximum. E.g. if I turn off the merge splitting in glidesort (reducing the amount of parallel merges) performance degrades 17.5% on my machine.
Your README explicitly called them parity merges, and that the arrays have to be of equal length. You might find the extension to unbalanced merges and using it for all merges instead of only the smallsort obvious, I disagree. If it is obvious... why didn't you do it? Correct me if I'm wrong but I was under the impression quadsort only uses the bidirectional merges for the small base case where arrays have statically known (equal) sizes.
> Subsequently you then proceed to give me zero credit for branchless merging in your Youtube presentation, instead referencing a branchless merge that gave a 10-20% speed-up instead of quadsort's 150% speed-up.
Because I was already aware of it, from the earlier work. That they implemented it poorly is besides the point. You and I both know that a bad implementation of a good idea in sorting can easily destroy all performance gains.
The part that was novel to me from your work on quadsort was eliminating the bounds check on perfectly balanced merges and the original idea of merging from both ends. That I credit you for.
> You also claimed in the presentation there were no bidirectional branchless algorithms written to date, this is simply not true. > > This does upset me. It would be a simple matter, and the right thing, to point out that quadsort introduced bidirectional branchless merging.
I don't think I claimed that. And I have modified the slides now to explicitly mention quadsort on the slide for bidirectional merging.