When sorting data, the NoChop algorithm runs in two phases. The build phase creates an STree of all the keys (a task at which NoChop excels); the read phase then extracts them in order, from lowest to highest, using a recursive read. The STree data structure is slightly amended to allow for duplicate keys (a count is kept per key).
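The STree's internals are not published, so the following is only a minimal sketch of the two-phase idea, under the assumption that an STree behaves like a byte-wise trie with a per-key duplicate count (the `STreeNode` class and `stree_sort` function are hypothetical names, not part of NoChop itself):

```python
class STreeNode:
    """Hypothetical stand-in for an STree node: a byte-wise trie node."""
    def __init__(self):
        self.children = {}  # next byte -> child node
        self.count = 0      # number of duplicate keys ending at this node

def stree_sort(keys):
    root = STreeNode()
    # Build phase: insert every key into the trie, counting duplicates.
    for key in keys:
        node = root
        for b in key.encode("utf-8"):
            node = node.children.setdefault(b, STreeNode())
        node.count += 1
    # Read phase: a recursive in-order walk emits keys lowest to highest.
    out = []
    def walk(node, prefix):
        out.extend([bytes(prefix).decode("utf-8")] * node.count)
        for b in sorted(node.children):
            walk(node.children[b], prefix + [b])
    walk(root, [])
    return out
```

Because the walk visits child bytes in ascending order, the output is sorted without any comparisons between whole keys.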
QuickSort was chosen to race against NoChop because it is widely viewed as the fastest current sort algorithm. AGD Research also considers it a worthy competitor to NoChop; its simplicity and elegance are awe-inspiring. It runs to fewer than 30 lines of code and uses minuscule memory (a necessity in 1960, when it was discovered).
QuickSort works by swapping keys in an array when they sit on the "wrong side" of a pivot value. The array is then split into two and the process is repeated, then into four, and so on recursively (the white paper detailing NoChop and QuickSort can be downloaded from the Downloads web page).
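The swap-then-split process above can be sketched in a few lines (this is a standard in-place QuickSort with a middle-element pivot, not the exact variant used in the benchmark):

```python
def quicksort(a, lo=0, hi=None):
    """In-place QuickSort: swap keys on the wrong side of a pivot,
    then split the array in two and recurse on each half."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:   # already on the correct (left) side
            i += 1
        while a[j] > pivot:   # already on the correct (right) side
            j -= 1
        if i <= j:            # both on the wrong side: swap them
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort(a, lo, j)       # repeat on the left part
    quicksort(a, i, hi)       # repeat on the right part
    return a
```

Each level of recursion doubles the number of partitions (2, then 4, and so on), which is where the name of the split-and-repeat description comes from.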
The accompanying spreadsheet shows the results of the two algorithms sorting 23k world cities. Other tabs sort 10k passwords and 58k words, and new tabs and data sets can easily be created. NoChop's lead is already at +50% (1.5 times faster, i.e. Mach 1.5), and this is with first-generation, non-optimized code (QuickSort has benefited from over 60 years of "tweaking").
The NoChop/STree combination is ideal for massive parallelization, and the flow diagram on the right shows one approach. Unsorted keys are first fed into a fast triage that sends each one to its appropriate processor, where an STree is built. A simple triage would send keys to 1 of 256 processors based on their first ASCII character; a smart one would balance the load across a configurable number of processors.
As soon as the last key has been input into an STree, the sorted keys can be output with zero latency. NoChop then reads the sorted keys in ascending order across processors and STrees. This approach lends itself to augmenting or replacing the existing sort facilities of platforms and operating systems that require fast, efficient sorting.
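A minimal sketch of the simple first-byte triage and the cross-processor read, with a plain dict of lists standing in for the 256 processors and Python's built-in `sorted` standing in for each processor's STree read (the function names here are illustrative, not NoChop's):

```python
from collections import defaultdict

def simple_triage(keys):
    """Route each key to 1 of 256 buckets by its first byte.
    In a real deployment each bucket would be a separate processor
    building its own STree; a dict of lists models that here."""
    buckets = defaultdict(list)
    for key in keys:
        first = key.encode("utf-8")[0] if key else 0
        buckets[first].append(key)
    return buckets

def read_across_buckets(buckets):
    """Read buckets in ascending first-byte order. Because triage already
    grouped keys by leading byte, concatenating each bucket's locally
    sorted keys yields the globally sorted output with no merge step."""
    out = []
    for b in sorted(buckets):
        out.extend(sorted(buckets[b]))  # stand-in for the STree read
    return out
```

The key property is that the partitions are range-disjoint, so the final read is a simple concatenation rather than a k-way merge.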
A final speed improvement is that both the input pipe (the unsorted keys, as above) and the output pipe (the sorted keys, as above) can themselves be massively parallelized. Data held in a single input pipe could be chopped up and fed into separate triage processes simultaneously (where the context allows). Data held in a single output pipe could likewise be chopped up and fed to separate output destinations (again, where the context allows).