New Ideas in IT

Explore & exploit these new ideas and approaches from www.agdresearch.com

In R&D (research-and-development) terms, AGD Research is a pure "R" organization (business and network). This site showcases its current findings to software-developers and to anyone else with an interest. The site-content is free-source IP in the public domain, there to be used by anyone, however they wish.

The web-pages are designed to convey a summary understanding. For a detailed understanding there are downloads available on the Downloads web-page. These contain working examples, the computer code (MS VBA), and white-papers.

Symbolic-processors form a putative new application-type which possesses a universal data-model (the spectral data-model). A symbolic-processor is a new approach to a new requirement, which in IT vernacular is termed a "new-new" solution.

UK/Netherlands-based AGD Research (founded in 1999) is a niche provider of services to the major IT consultancies and Fortune 500 companies in the US, Europe and APAC. These services have focused on implementing sophisticated optimization algorithms for supply-chains and value-chains.

AGD Research is also an informal network of consultants and technicians who have worked closely with each other over many years on multiple projects, and whose expertise has been combined and maximized to deliver significant business benefits.
A symbolic-processor (the Downloads web-page contains a proof-of-concept) is a putative new application-type that interprets SPL (symbolic-processing-language) and has been designed to enable this logic-chain:

Goal: A new, simple, single, & universal way of processing data (so commoditizing it)
Problem: A myriad of existing, competing and complex computer-languages
Solution: Extend the universal notation of arithmetic into computer-programming
Mechanism: Adopt the 4 principles below:

1. No Mathematical-Platonism
Traditional computer-languages, and indeed mathematics itself, use symbols to represent underlying objects that exist "out-there", such as integers, reals, bytes, etc. These objects can then participate in various operations that again exist "out-there", such as addition, multiplication and concatenation.

Unlike this mathematical-Platonism, a symbolic-processor just processes symbols; no pretense is made of any underlying spooky "out-there" reality that they refer to. For example, the statement abc + def is just as valid as 123 + 456; it simply returns a symbolic-string of zero length (nothing), whereas 123 + 456 returns the 3-symbol string 579.
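The behaviour described above can be illustrated with a minimal Python sketch. It is not the site's MS VBA proof-of-concept; the function name symbolic_add and the rule "only strings of decimal digits are addable" are assumptions made for illustration.

```python
def symbolic_add(left: str, right: str) -> str:
    """Add two symbol-strings, returning a zero-length string when no sum exists."""
    # Assumption of this sketch: only unsigned ASCII digit strings can be added.
    def is_number(symbols: str) -> bool:
        return symbols.isascii() and symbols.isdigit()

    if is_number(left) and is_number(right):
        return str(int(left) + int(right))
    return ""  # still a valid result: a symbolic-string of zero length


print(repr(symbolic_add("123", "456")))  # '579'
print(repr(symbolic_add("abc", "def")))  # ''
```

Neither call raises an error; the second simply yields nothing, which is the point being made about processing symbols rather than "out-there" objects.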

2. Spectral Data-Model

The spectral data-model maps symbolic-strings to physical data-structures and data-stores, whether these be in an application's fast-memory, an underlying file-system or database, on the web, etc. This allows a common way of maintaining data in any application, anywhere, whether it be a spreadsheet, a word-processor, an ERP system, accounting software, B2C retail, or whatever. The accompanying fractal tree-diagram (the pattern repeats itself in ever increasing granularity/resolution) represents the spectral data-model.

Data within a string has left-to-right asymmetry; leading symbols are more "well-known" and represent its key (the how-to-find) and are represented by colours towards the blue-end of the spectrum, while trailing symbols are less "well-known" and represent the record itself (the what-to-find) and are represented by colours towards the red-end of the spectrum.

The spectral data-model is implemented via 2 approaches. Firstly, a thin processing layer is introduced at the application level to translate the local data-model (e.g. relational, file-system, spread-sheet, etc.) into and out of the spectral data-model. Secondly, data within the spectral model is held as a "Trie" data-structure. This data-structure has been optimized via "go-faster-striping" meta-data to maximize performance (see the "Striped-Trie" web-page for details).
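As a rough illustration of the "thin processing layer" idea, the Python sketch below flattens a relational-style row into symbolic-strings whose leading symbols form the key and whose trailing symbols form the record. The string layout (table, key, field, value separated by spaces), the function names, and the use of a sorted list in place of a trie are all assumptions of this sketch, not the site's actual design.

```python
import bisect

store = []  # sorted symbolic-strings; a stand-in for the trie (assumption)


def row_to_strings(table, key, row):
    """Translate one local (relational-style) row into symbolic-strings."""
    return [f"{table} {key} {field} {value}" for field, value in row.items()]


def write(strings):
    for s in strings:
        bisect.insort(store, s)  # keep the store sorted on every write


def read(prefix):
    """Return every string that starts with the given leading symbols."""
    start = bisect.bisect_left(store, prefix)
    matches = []
    for s in store[start:]:
        if not s.startswith(prefix):
            break  # matching strings are contiguous in sorted order
        matches.append(s)
    return matches


write(row_to_strings("customer", "0042", {"name": "Ada", "city": "Leeds"}))
write(row_to_strings("order", "9001", {"customer": "0042"}))
print(read("customer 0042 "))
# ['customer 0042 city Leeds', 'customer 0042 name Ada']
```

The "well-known" leading symbols act as the how-to-find part, and the remainder of each returned string is the what-to-find part.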

3. Combinatory Binary-Operations

With the exception of read-operations, which utilize wild-cards and filters, all operations in a symbolic-processor are binary; they have 1 operator, 2 input operands and 1 output, as in 1 + 2 which returns 3. Repeating the operator symbol selects how the operation is applied across the two symbolic-strings, producing bijections and various Cartesian-products:
1 2 + 3 4 5 returns 1 5 4 5 (example standard)
1 2 ++ 3 4 5 returns 4 6 (example bijection)
1 2 +++ 3 4 5 returns 13 14 (example Cartesian-accumulate)
1 2 ++++ 3 4 5 returns 4 8 13 5 9 14 (example Cartesian-carry)
1 2 +++++ 3 4 5 returns 4 5 6 5 6 7 (example Cartesian-classic)
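
The five examples above can be reproduced with the Python sketch below. The semantics are inferred solely from those five input/output pairs, so the function names and the handling of any other inputs (e.g. empty strings, which are not supported here) are assumptions rather than the SPL definition.

```python
from itertools import product


def standard(a, b):
    # "+": join the strings, adding the last symbol of a to the first of b.
    return a[:-1] + [a[-1] + b[0]] + b[1:]


def bijection(a, b):
    # "++": pair symbols position by position; unpaired symbols are dropped.
    return [x + y for x, y in zip(a, b)]


def cartesian_accumulate(a, b):
    # "+++": each symbol of a plus the total of every symbol in b.
    return [x + sum(b) for x in a]


def cartesian_carry(a, b):
    # "++++": for each symbol of a, a running total carried across b.
    out = []
    for x in a:
        running = x
        for y in b:
            running += y
            out.append(running)
    return out


def cartesian_classic(a, b):
    # "+++++": every symbol of a added to every symbol of b.
    return [x + y for x, y in product(a, b)]


left, right = [1, 2], [3, 4, 5]
print(standard(left, right))              # [1, 5, 4, 5]
print(bijection(left, right))             # [4, 6]
print(cartesian_accumulate(left, right))  # [13, 14]
print(cartesian_carry(left, right))       # [4, 8, 13, 5, 9, 14]
print(cartesian_classic(left, right))     # [4, 5, 6, 5, 6, 7]
```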

4. No Explicit Loops, Routines or If/Then Statements
The functionality above almost-entirely removes the need for a symbolic-processing-language to have explicit loops, routines or if/then statements. More fundamental operations (comparison and replication) can be used to provide this implicit functionality if required.

Details
For a detailed understanding there are downloads available on the Downloads web-page. These contain working examples, the computer code (MS VBA), and white-papers.






A striped-trie (an "STrie" for short - the Downloads web-page contains examples and white-papers) is a data-structure optimized for input-output performance for near key/entry matches (i.e. for next and previous key/entry location and maintenance - not just exact matches as in other algorithms such as hashing).

Goal: A trie data-structure optimized for use in the spectral data-model & other applications (e.g. data sorting, retrieval & maintenance etc.)
Problem: Standard tries are inefficient at data-storage and sub-optimal in near key/entry matching
Solution: Minimize data-storage and maximize near key/entry location performance
Mechanism: A 256-entry ASCII pointer vector per node (one entry per 8-bit character code) in combination with meta-data striping/clustering
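
To make the term "near key/entry match" concrete, the Python sketch below finds the previous and next entries around a probe key, whether or not the probe itself is stored. It uses the standard-library bisect module on a sorted list purely as an illustration of the requirement; it is not the STrie algorithm, and the function name neighbours and the sample keys are assumptions.

```python
import bisect

keys = sorted(["Magenta", "Magnolia", "Marigold", "Mauve"])


def neighbours(sorted_keys, probe):
    """Return (previous, next) entries around probe; None where there is none."""
    i = bisect.bisect_left(sorted_keys, probe)
    exact = i < len(sorted_keys) and sorted_keys[i] == probe
    previous = sorted_keys[i - 1] if i > 0 else None
    j = i + 1 if exact else i  # skip over the probe itself if it is stored
    following = sorted_keys[j] if j < len(sorted_keys) else None
    return previous, following


print(neighbours(keys, "Maroon"))    # ('Marigold', 'Mauve')
print(neighbours(keys, "Magnolia"))  # ('Magenta', 'Marigold')
```

A hash table can answer "is Maroon stored?" but cannot answer "what comes just before or after Maroon?" without a full scan, which is why ordered structures matter here.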

The accompanying trie diagram of 4 colours has 3 nodes and 6 branches (one of which is empty). Navigating from the root-node outwards through all the branches and sub-nodes returns the 4 colours (the 4 keys/entries) as a sorted sequence: "Magenta, Magnolia, Marigold and Mauve".
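
The sorted-traversal property can be seen in a minimal Python sketch of an ordinary (uncompressed) trie built from nested dictionaries. Unlike the diagram, this sketch uses one node per character rather than 3 nodes with multi-character branches, and the END marker and function names are assumptions made for illustration.

```python
END = ""  # marker meaning "a key/entry ends at this node" (assumption)


def insert(root, key):
    node = root
    for ch in key:
        node = node.setdefault(ch, {})  # create the child node on first use
    node[END] = True


def walk(node, prefix=""):
    """Yield every stored key by visiting children in character order."""
    for ch in sorted(node):
        if ch == END:
            yield prefix
        else:
            yield from walk(node[ch], prefix + ch)


root = {}
for colour in ["Marigold", "Magenta", "Magnolia", "Mauve"]:
    insert(root, colour)

print(list(walk(root)))  # ['Magenta', 'Magnolia', 'Marigold', 'Mauve']
```

The keys were inserted unsorted, yet the outward walk from the root returns them in order; no separate sort step is needed.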

In general, if n is the number of keys/entries and n+ the number of nodes, each node is represented as a pointer vector with 256 entries, one per 8-bit ASCII code ("IT-speak"!). In perhaps simpler-but-equivalent "math-speak", the nodes are represented as a sparse 2-dimensional integer matrix of n+ rows (the nodes) and 256 columns (the ASCII character-set). This forms a "DNA" sequence of characters/symbols that navigates from the root-node outwards and reconstructs each individual key/entry.
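
The matrix view of the nodes can be sketched in Python as follows. This is a simplified illustration, not the STrie itself: it stores each row densely (a real implementation would exploit sparsity), creates one row per character with no compressed branches, and the class name MatrixTrie and the is_end flags are assumptions.

```python
class MatrixTrie:
    """Nodes as rows of a matrix with 256 columns, one per character code."""

    def __init__(self):
        self.rows = [[0] * 256]  # row 0 is the root; pointer 0 means "no child"
        self.is_end = [False]    # whether a key/entry ends at each row

    def insert(self, key):
        row = 0
        for code in key.encode("latin-1"):  # assumes 8-bit characters only
            child = self.rows[row][code]
            if child == 0:
                self.rows.append([0] * 256)
                self.is_end.append(False)
                child = len(self.rows) - 1
                self.rows[row][code] = child  # row-pointer stored by column
            row = child
        self.is_end[row] = True

    def keys(self):
        """Rebuild every key by following non-zero pointers, column by column."""
        def walk(row, prefix):
            if self.is_end[row]:
                yield prefix.decode("latin-1")
            for code, child in enumerate(self.rows[row]):
                if child:
                    yield from walk(child, prefix + bytes([code]))
        return list(walk(0, b""))


trie = MatrixTrie()
for city in ["Buffalo", "Boston", "Albany"]:
    trie.insert(city)

print(trie.rows[0][ord("B")])  # 1
print(trie.keys())             # ['Albany', 'Boston', 'Buffalo']
```

The first print shows a row-pointer read by column position: in the root row, the column for "B" (code 66) holds 1, the row where keys beginning with "B" continue.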

Branches are represented as a 1-dimensional string-array of n+ rows. This forms a "junk-DNA" sequence of characters/symbols which do not affect the "node-navigation", but do contribute to the reconstruction of the key/entry. This "junk-DNA" usually comes at and/or near the end of keys (e.g. "ffalo" in "Buffalo"), but can occur anywhere (e.g. "est " in "West Carson" and "West Chester"). Within the node matrix, if a row-pointer is non-zero its column-position represents the ASCII code concerned (e.g. a 123 in column 102 means "B"s are located in row 123, and a 4567 in column 165 means "u"s are located in row 4567; the column numbers here are octal, i.e. 66 and 117 in decimal).

The accompanying striping/clustering diagram shows how the sparse integer matrix is refined by meta-data which indicates where non-zero row-pointers occur within the nodes (i.e. "where the action is"). The vertical stripes partition the ASCII character-set into like-bands where data tends to cluster (e.g. alphabetics, numbers, etc.). The horizontal lines and points demarcate where the data actually clusters within each node, which minimizes the number of linear-scans required. In this implementation, if there are 3 or fewer row-pointers they are represented as point-vectors (e.g. "r", "g" and "u"); otherwise they are represented as a range-vector and a single point-vector (e.g. "a to y" and "z").
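
The benefit of such meta-data can be sketched in Python. This is a deliberately simplified illustration: it records either up to 3 points or a single range per node, rather than the range-plus-point form described above, and the function names and dictionary layout are assumptions rather than the STrie's actual format.

```python
def stripe_metadata(row, max_points=3):
    """Summarize where the non-zero row-pointers sit within one node."""
    columns = [c for c, pointer in enumerate(row) if pointer != 0]
    if len(columns) <= max_points:
        return {"points": columns, "range": None}
    return {"points": [], "range": (columns[0], columns[-1])}


def next_pointer(row, meta, start):
    """Return the first column >= start holding a pointer, or None."""
    if meta["range"] is None:
        for column in meta["points"]:  # at most 3 candidates to check
            if column >= start:
                return column
        return None
    low, high = meta["range"]
    for column in range(max(start, low), high + 1):  # scan inside the band only
        if row[column] != 0:
            return column
    return None


sparse = [0] * 256
for ch in "rgu":
    sparse[ord(ch)] = 1  # dummy non-zero pointers
meta = stripe_metadata(sparse)
print(chr(next_pointer(sparse, meta, ord("h"))))  # r
```

Without the meta-data, finding the next entry after "h" would mean scanning columns one at a time; with it, only the recorded points (or the recorded band) are examined.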

This approach is essential to maximize near key/entry location performance, which is required to out-compete current algorithms that sort, retrieve and maintain data (e.g. QuickSort, BinarySearch, BTree, etc.). Such algorithms are needed by applications such as symbolic-processors, sort-engines, spell-checkers and text-auto-fillers, dictionaries, relational and NoSQL databases, file-systems, etc.

The Downloads web-page contains working-examples and white-papers comparing STrie with QuickSort (50+% faster), STrie with BinarySearch (350% faster) and STrie with BTree (100+% faster). The proof-of-concept code is written in MS VBA, which was chosen for its ability to integrate seamlessly with Excel.

Although these gains are significant, the compelling differentiator of the algorithm is its suitability for massive parallelization. Both data and processing can easily be spread across CPUs. The simplest way of doing this would be to front-end the algorithm with a triaging routine that distributes data (and hence processing) to 1 of 256 CPUs, based on the initial character of each key/entry. A more sophisticated triaging routine would better balance loads based on usage criteria.
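
The simplest triaging routine described above might look like the following Python sketch. It only decides which of the 256 partitions a key belongs to; the actual dispatch to CPUs is left out, and the function names and the choice to send empty keys to partition 0 are assumptions.

```python
def triage(key, partitions=256):
    """Pick a partition from the initial character of the key."""
    data = key.encode("latin-1")  # assumes 8-bit characters only
    return data[0] % partitions if data else 0


def distribute(keys, partitions=256):
    """Group keys by partition so each group can be handled independently."""
    buckets = {}
    for key in keys:
        buckets.setdefault(triage(key, partitions), []).append(key)
    return buckets


print(distribute(["Buffalo", "Boston", "Utica"]))
# {66: ['Buffalo', 'Boston'], 85: ['Utica']}
```

Real key sets are rarely spread evenly over initial characters, which is why a load-balancing triage would likely be needed in practice.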








Download (in .zip compressed-format), unzip (to .xlsm Excel/VBA-format), unblock the file if required (right-click on the file in File Explorer, select Properties, then tick the Security-Unblock box), open the file and respond to the "Enable" prompt(s). The user can first view the overview, tabs and white-paper safely (decline "Enable"), before enabling the algorithms/VBA (accept "Enable"). Compatible with both 32-bit and 64-bit PCs.

SymbolicProcessor

Theory (white-paper) & Proof-of-Concept (command-line). Version 13.

AGD Research exists as both a network of IT consultants and as a provider of professional services to customers via the associate-organizations EDO and Peningo. Services include consulting, implementation, design, education-and-training, coding, and documentation.

To join the conversations surrounding Symbolic-Processors and supply-chain/value-chain optimization please contact AGD Research directly via the Contact web-page. To explore resourcing options please contact the above associate-organizations directly.






To contact AGD Research (the network and business) about any aspect of Symbolic-Processors or supply-chain/value-chain optimization please email aguthriedow@yahoo.com or novella.foster@gmail.com to receive additional information. This includes copious additional implicit knowledge (i.e. that which is not recorded on any media, such as what didn't work and why, and the underlying sub-cultural influences that helped drive these discoveries and developments). This augments-and-complements the explicit knowledge detailed in these web-pages and download-files.








