this post was submitted on 08 Jun 2023
13 points (100.0% liked)
Programming
13368 readers
2 users here now
All things programming and coding related. Subcommunity of Technology.
This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
That is 70% faster on sequences of five elements. Using this for handling the end of the recursion in LLVMs current std::sort implementation results in a 1.7% speedup on sequences exceeding 250k elements.
I wonder if their paper has a plot of the speedups against number of elements. Did they just stop measuring at 250k? What was the shape of the curve?
I didn't bother reading, not so interested in AI stuff. But since they highlight these results, it likely does not get better for higher numbers.
I'll bet it just ends up being the limitations of memory bandwidth to stuff things into registers for the optimized algorithm. Or, something like Mojo's autotuning finds the best way to partition the work for the hardware.