New Fast Algorithms & Data-Structures

the leanest cleanest & greenest discoveries in IT
explore & exploit these prize-winning ideas from www.agdresearch.com

In R&D terms (research-and-development), AGD Research is a pure R organization (business and network). This site showcases its' current-findings to software-developers, and all who have an interest. The site-content is free-source IP, is in the public domain, and there to be utilized by any-and-all, however they wish.

The web-pages are designed to convey a summary understanding. For a detailed understanding there are downloads available on the Downloads web-page. These contain working examples, the computer code (MS VBA), and white-papers.

Of particular interest is the NoChop algorithm and its' STree (sparse-tree) data-structure. This was a major prizewinner in the Blue-Yonder and its' parent company Panasonic's 2022 "Crystal-Ball" competition, coming ahead of over 100 competing ideas in AI and IT from around the world.

The accompanying diagram positions NoChop within the IT "stack" (a ubiquitous hierarchy representation of software dependencies). This deep-stack algorithm is theoretically up to 8 times faster sorting, retrieving and maintaining data than current-ones. In-practice it already has a demonstrable-speed-supremacy over QuickSort (+50%), BinaryChop (+350%), and BTree (+100%).

These algorithms form necessary-and-critical components of databases (relational and NoSql/distributed in-memory), platforms, and operating systems, as well as certain sophisticated applications such as virtual-machines. They are fundamental building-blocks supporting the entire stack-structure of IT. NoChop/STree is a new-and-better approach to an existing requirement, which in IT "speak"/vernacular is termed a "new-old" solution.

The NoChop/STree combination was recently discovered by AGD Research as a bi-product of research into symbolic-processors. These form a putative new application-type which possesses a universal data-model (the spectral-data-model), that is super-set of the STree data-structure. A symbolic-processor is a new approach to a new requirement, which in IT "speak"/vernacular is termed a "new-new" solution.

UK/Netherlands based, AGD Research (founded in 1999) is a niche provider of services to the major IT consultancies and fortune-500 companies in the US, Europe and APAC. These services have focused on implementing sophisticated optimization algorithms for supply-chains and value-chains.

AGD Research is also an informal network of consultants and technicians who have worked closely with each other over many years on multiple projects, and whose expertise has been combined and maximized to deliver significant business benefits.
About imageAbout image
The recently discovered NoChop algorithm, with its' STree (sparse-tree) data-structure, uniquely enables this logic-chain:

Goal: Sort, retrieve and maintain data significantly faster and efficiently than now
Problem: A speed-limit existed, controlled by n log(n) where n is the number of keys
Solution: Break this speed-limit, and set a new one controlled by n/8 log(n)
Mechanism: Exploit hidden structure in the keys to achieve this 8-fold improvement

Breaking the sound-barrier is analogous to breaking the n log(n) limit. A new engine and airframe were needed to break the sound-barrier. Similarly, a new algorithm (NoChop) and data-structure (STree) is needed to break the n log(n) barrier.

The critical understanding to take from the accompanying diagram is that legacy algorithms (BTree etc.) all have a maximum of 2 branches per processing-node, compared to NoChop's 256. This is overhead-free (no "chop" or partitioning is needed) and results in up to 8 times less processing-levels (2 ^ 8 = 256 = the number of ascii characters), with a corresponding 8-fold speed-limit increase.

A complimentary way of understanding the NoChop algorithm is that it first builds an index (the STree) of a set of records in super-fast time. For sorting records it then reads the entire index in ascending order to deliver the sorted-records, again in super-fast time. For retrieving and maintaining data it only needs to read a single index entry.

STree stores nodes as row-pointers in a sparse-matrix of n+ rows and 256 columns, and branches (or "junk-DNA") in a string-array of n+ rows. This "junk-DNA" usually comes at the end of keys (e.g. "ffalo" in "Buffalo"), but can occur anywhere (e.g. "est " in "West Carson" and "West Chester"). If a row-pointer is non-zero its column-position represents the ascii code concerned (e.g. a 123 in column 102 means "B"s are located in row 123, and a 4567 in column 165 means "u"s are located in row 4567).

This STree dual data-structure (unlike hashing) enables searching for both exact record matches (e.g. = "Buffalo") and next/previous record matches (e.g. > "Buffalo", >= "Buffalo", < "Buffalo" and <= "Buffalo").

As every algorithm has a different processing-overhead at each level, the best comparison method is to "race" them (i.e. follow an empirical approach to re-enforce the theoretical one). This is exactly what the subsequent web-pages explore, with a NoChop supremacy +50% over QuickSort, +350% over BinaryChop and +100% over BTree.

There are detailed white-papers, working examples with test-data, and the computer-code can be examined. Additional test-data can easily be added by the user. Smoothing is used to average-out background-processing. Access to all these materials is via the Downloads web-page.

Developers and researchers are encouraged to build on the above. The initial coding and the comparison class-library algorithms are written in the VBA programming language due to its' ability to easily-integrate with front-end tools such as spreadsheets which helps aid knowledge-transfer and education. The next steps would be to replicate-and-improve upon these results in other more "industrial" languages such as Python and C, and then to expand into faster implementation ones such as Go, and eventually to assembler and machine-code. New class-libraries for NoChop/STree could then be built using these advanced languages.

The STree data-structure is very amenable to massive parallelization, the simplest design having 256 processors. Simple triaging would assign each key to its appropriate core based on its first ascii character. For sorting data, the result could be read-off with zero latency as soon as the last key was processed. For retrieving and maintaining data an in-memory model would provide a near instantaneous response.

NoChop is technically "stable" with no conditions that make it run slowly. This is unlike other current algorithms (e.g. QuickSort pivots and BTree re-balancing).

As NoChop achieves its' supremacy by performing less operations than its' legacy competitors, not more, it should generate a green-dividend in-line with the performance-dividend (e.g. significant reductions in electricity-consumption, improved environmental "footprints" of data-centres, etc.). Existing pools of DRAM and flash-memory can be sequestered by NoChop to build the required STrees in-memory.

This then is the potential of the NoChop algorithm: simultaneously delivering significant performance increases with commensurate reductions in electricity consumption. From a game-theory analysis with 3 "players", the end-user wins (quicker response times), the developer wins (increased competitiveness), and the planet wins (less greenhouse gasses).


NoChop/STree imageNoChop/STree image
When sorting data the NoChop algorithm is used in 2 phases. The build-phase creates an STree of all the keys (NoChop excels at this), then the read-phase extracts them all, from lowest to highest (a recursive-read is used).  The STree data-structure is slightly-amended to allow for duplicate keys (a count-by-key is kept).

QuickSort was chosen to race against NoChop as it is viewed as being the quickest current sort algorithm. AGD Research also considers it to be a worthy competitor of NoChop; its' simplicity-and-elegance are awe-inspiring. It has less than 30 lines of code and uses miniscule memory (necessary for 1960 when it was discovered).

QuickSort works by swopping keys in an array if they are the "wrong-side" of a pivot-value. The array is then split into 2 and the process is repeated, then into 4, and so-on recursively (download the white-paper for details of NoChop and QuickSort from the Downloads web-page).

The accompanying spreadsheet shows the results of the 2 algorithms sorting 23k world-cites. Other tabs sort 10k passwords and 58k words, and new tabs and data-sets can easily be created. NoChop's supremacy is already at +50% (1.5 times faster = Mach 1.5). and this is with first-generation, non-optimized code (QuickSort has benefitted from over 60 years of "tweaking").

The NoChop/STree combination is ideal for massive parallelization, and the flow-diagram on the right shows one approach. Unsorted keys are first fed into a fast triage that sends them to their appropriate processor, where an STree is built. A simple triage would send them to 1 of 256 processors based on their first ascii-character. A smart one would balance load on a configurable number of processors.

As soon as the last key has been input into an STree, the sorted keys can be output, with zero latency. Then NoChp reads the sorted-keys in ascending order across processors and STrees. This approach lends itself to augmenting and/or replacing existing ones used to support platforms and operating-systems that require efficient and fast sorting.

A final speed improvement is that both the input-pipe (the unsorted keys as above) and the output-pipe (the sorted keys as above) can also be massively parallelized. Data held in a single input-pipe could be chopped-up and fed into separate triaging processes simultaneously (if possible within the context). Data held in a single output-pipe could also be chopped-up and fed to separate output destinations (again, if possible within the context).




Sort imageSort image
Retrieving data from file-systems or databases requires 2 modes; exact-match and near-match. Exact-match retrieves data based on a key. An example is locating a word in a dictionary (e.g. "buffage"; it either exists or it does not). Near-match optionally includes the previous or next key (e.g. "buff" or "buffalo"). NoChop and BinaryChop support both modes.

BinaryChop (aka BinarySearch) was chosen to race against NoChop as it is viewed as being the quickest current data-retrieval algorithm. AGD Research also considers it to be a worthy competitor of NoChop; its' simplicity-and-elegance are awe-inspiring. It has less than 30 lines of code and uses miniscule memory.

BinaryChop works by chopping a sorted list of keys into 2 equal-sized half-lists and establishing if the search key is in the first or second half-list, and retains that half before discarding the other. It then keeps repeating this process until the search-key is located or not (download the white-paper for details of NoChop and QuickSort from the Downloads web-page).

The accompanying spreadsheet shows the results of the 2 algorithms retrieving 10k passwords and 23k world-cities, and new tabs and data-sets can easily be created. NoChop's supremacy is already at +350% (4.5 times faster = Mach 4.5). and this is with first-generation, non-optimized code (BinaryChop has benefitted from over 60 years of "tweaking").

The NoChop/STree combination is ideal for massive parallelization, and the accompanying flow-diagram on shows one approach. A set of 256 STrees is pre-built containing the data and a simple triage would interrogate the appropriate one based on the first ascii-character of the search-key. A smart one would balance load on a configurable number of processors.

This approach lends itself to augmenting and/or replacing existing ones used to support the back-end of online-and-mobile retail and similar applications. Both SQL (relational) and NoSQL (distributed in-memory) databases can easily incorporate the NoChop/STree combination in their existing architectures.





Retrieve imageRetrieve image
Maintaining data in file-systems or databases requires 2 modes; exact-match and near-match. Exact-match retrieves data based on a key. An example is locating a word in a dictionary (e.g. "buffage"; it either exists or it does not). Near-match optionally includes the previous or next key (e.g. "buff" or "buffalo"). NoChop and BTree support both modes.

BTree was chosen to race against NoChop as it is viewed as being the quickest current data-maintenance algorithm. AGD Research considers it to be less of a competitor of NoChop; its' over 600 lines long and requires complex rebalancing logic.

BTree works by building-and-maintaining a tree structure (a BTree). Each node in the tree (which is also a key) connects to 2 others; 1 lesser and 1 greater. BTree then navigates through these nodes until the search-key is located or not (download the white-paper for details of NoChop and BTree from the Downloads web-page).

The accompanying spreadsheet shows the results of the 2 algorithms retrieving 10k passwords, and new tabs and data-sets can easily be created. NoChop's supremacy is already at +100% (2.0 times faster = Mach 2.0). and this is with first-generation, non-optimized code (BTree has benefitted from over 60 years of "tweaking").

The NoChop/STree combination is ideal for massive parallelization, and the accompanying flow-diagram shows one approach. A set of 256 STrees is built, maintained and read. A simple triage would assign a key to a processor based on the first ascii-character of the search-key, for both inputs (adds, updates and deletes) and outputs (reads). A smart one would balance load on a configurable number of processors.

This approach lends itself to augmenting and/or replacing existing ones used to support the back-end of online-and-mobile retail and similar applications. Both SQL (relational) and NoSQL (distributed in-memory) databases can easily incorporate the NoChop/STree combination in their existing architectures.





Maintain imageMaintain image
A symbolic-processor (the Downloads web-page contains a proof-of-concept) is a putative new application-type that interprets SPL (symbolic-processing-language) and has been designed to enable this logic-chain:

Goal: A new, simple, single, & universal way of processing data (so commoditizing it)
Problem: A myriad of existing, competing and complex computer-languages
Solution: Extend the universal notation of arithmetic into computer-programming
Mechanism: Adopt the 4 principles below:

No Mathematical-Platonism
Traditional computer-languages, and indeed mathematics itself, use symbols to represent underlying objects that exist "out-there", such as integers, reals. bytes etc. These objects can then participate in various operations that again exist "out-there", such as addition, multiplication and concatenation etc.

Unlike this mathematical-Platonism, a symbolic-processor just processes symbols; no pretense is made of any underlying spooky "out-there" reality that they refer to; e.g. the statement abc + def is equally as valid as 123 + 456; it just returns a symbolic-string of zero length (nothing) as compared to the 3 symbol-string (579).

Spectral Data-Model

The spectral data-model maps symbolic-strings to physical data-structures and data-stores, whether these be in an application's fast-memory, an underlying file-system or database, on the web, etc. This allows a common way of maintaining data in any application, anywhere, whether it be a spreadsheet, a word-processor, an ERP system, accounting software, B2C retail, or whatever. The accompanying fractal tree-diagram (the pattern repeats itself in ever increasing granularity/resolution) represents the spectral data-model.

Data within a string has left-to-right asymmetry; leading symbols are more "well-known" and represent its key (the how-to-find) and are represented by colours towards the blue-end of the spectrum, while trailing symbols are less "well-known" and represent the record itself (the what-to-find) and are represented by colours towards the red-end of the spectrum.

Combinatory Binary-Operations

With the exception of read-operations which utilize wild-cards and filters, all other operations in a symbolic-processor are binary; they have 1 operator, 2 input operands and 1 output, as in 1 + 2 which returns 3. These operations can be combined in various combinations across symbolic-strings to produce bijections and various Cartesian-products:
1 2 + 3 4 5 returns 1 5 4 5 (example standard)
1 2 ++ 3 4 5 returns 4 6 (example bijection)
1 2 +++ 3 4 5 returns 13 14 (example Cartesian-accumulate)
1 2 ++++ 3 4 5 returns 4 8 13 5 9 14 (example Cartesian-carry)
1 2 +++++ 3 4 5 returns 4 5 6 5 6 7 (example Cartesian-classic)

No Explicit Loops, Routines or If/Then Statements
The functionality above almost-entirely removes the need for a symbolic-processing-language to have explicit loops, routines or if/then statements. More fundamental operations (comparison and replication) can be used to provide this implicit functionality if required.






Symbolic-Processor imageSymbolic-Processor image

Download (to .zip compressed-format), unzip (to .xslm Excel/VBA-fpormat), allow security if required (right-click on the file in file-editor, select Properties, then click the Security-Unblock box), open the file and respond to the "Enable" prompt(s). The user can first view the overview, tabs and white-paper safely (decline "Enable"), before enabling algorithms/VBA (accept "Enable")

NoChop-vs-QuickSort

Theory (white-paper) & Practice (algorithms raced empirically)

NoChop-vs-BinaryChop

Theory (white-paper) & Practice (algorithms raced empirically)

NoChop-vs-BTree

Theory (white-paper) & Practice (algorithms raced empirically)

SymbolicProcessor

Theory (white-paper) & Proof-of-Concept (command-line)

AGD Research exists as both a network of IT consultants and as a provider of professional services to customers via associate organizations EDO and Peningo. Services include consulting, implementation, design, education-and-training, coding, and documentation etc.

To join the conversations surrounding NoChop/STree, Symbolic-Processors and supply-chain/value-chain optimization please contact AGD Research directly via the Contact web-page. To explore resourcing options please contact the above associate-organizations directly.






Services image
To contact AGD Research (the network and business) about any aspect of NoChop, STree,  Symbolic-Processors or supply-chain/value-chain optimization please email novella.foster@gmail.com or aguthriedow@yahoo.com







Contact image