Isn't this more hardware-dependent? A 2012 Intel notebook and a 2022 AMD laptop would yield different results running the same code. Also, well-tuned assembly could do this way faster. Python wrapping assembly functions, running on a 6 GHz Intel machine with a RAM disk, would also outperform this. I always feel this kind of claim isn't helpful in day-to-day usage.