Mind wandering For some time I had an idea of getting some old PC/laptop, the motivation being: play old games, revisit the operating system of my childhood, put the machine in order, mess with it as I please. Basically, the only specific requirement was the presence of a CPU from Ivy Bridge generation for a full support of Windows XP (yeah I am not that old so it makes me nostalgic).

This is the closing sequel to the first part and dedicated to
the derivation of an iterative form of the Batcher’s Odd-Even Mergesort algorithm; implementing it with a GPU shader; a little of benchmarks for fun. Theory preparation Reminder from the first post The GPU’s computation model drastically differs from the CPU one. In simplest terms, we have magnitutes more workers (cores) organized in multiple disjoint local groups. The best (but pretty utopian) way of utilizing this model is to move the data to all of them and let them process it independently (or as locally as possible), because communication between workers from different groups is quite expensive and is able to offset the entire potential speedup from using GPU compared to CPU.

This post series is dedicated to the description and implementation of sorting an array with a graphics processing unit (GPU). It’ll be split into two posts.
In this part, we’ll first try to derive one of the earliest comparably efficient parallel sorting algorithm from the sorting networks class. What’s remarkable for this class of sorting algorithms, is that before performing the algorithm, we already know the whole sequence of pairs of indices to be compared and swapped (if needed).

I want to use it to write down some of the insights I gain in exploring different things. Additionally, to describe solutions to some problems I encounter.