Parallel Sorting on GPU: Part 2
This is the closing sequel to the first part and dedicated to
the derivation of an iterative form of the Batcher’s Odd-Even Mergesort algorithm; implementing it with a GPU shader; a little of benchmarks for fun. Theory preparation Reminder from the first post The GPU’s computation model drastically differs from the CPU one. In simplest terms, we have magnitutes more workers (cores) organized in multiple disjoint local groups. The best (but pretty utopian) way of utilizing this model is to move the data to all of them and let them process it independently (or as locally as possible), because communication between workers from different groups is quite expensive and is able to offset the entire potential speedup from using GPU compared to CPU.