Moving forward with fixed vocabularies aimed at the general, symbolic, and medical domains.
The current plan is for each vocabulary node to house five rotations of translated sparse pentachora as anatomical rotation blueprints. These are meant to be very sparse and fully frozen, but they are also built so that the anchored views and their rotation offsets solidify representations down the rope. Standard vocabulary tends to drift the longer you traverse a rope, and that drift causes cascade failures and problems, even with experts in charge of the tokens.
Each node also carries a component twin, curated specifically for the dataset. The twins are tuned at training time and are meant to associate semantics with the symbolic layer and become generalizable. This has shown high-yield potential in the image classification systems.
MLP and expert tokens assist with this, but as you know they come with byproducts. This solution, using CONV buffer registration, is meant to solidify the anchoring. Each elemental depth of the five pentachoron shapes occupies one of five slots, so each dimensional structure is rotated uniquely from the others, which lets information imposed from the actual training corpus guide and rapidly shortcut between internal structures.
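For concreteness, here is a minimal sketch of what one such node could look like in PyTorch, assuming "CONV buffer registration" maps onto PyTorch's register_buffer; the class name, the component_twin attribute, the rotation planes, the sparsity, and every shape here are illustrative assumptions rather than the actual implementation.

```python
# Minimal sketch, not the actual implementation: one vocabulary node holding five
# frozen, uniquely rotated views of a sparse pentachoron (5 vertices embedded in a
# D-dim space) plus a trainable "component twin".
import torch
import torch.nn as nn


def planar_rotation(dim: int, i: int, j: int, theta: float) -> torch.Tensor:
    """Rotation matrix acting only in the (i, j) plane of a dim-dimensional space."""
    rot = torch.eye(dim)
    c, s = torch.cos(torch.tensor(theta)), torch.sin(torch.tensor(theta))
    rot[i, i], rot[j, j] = c, c
    rot[i, j], rot[j, i] = -s, s
    return rot


class PentachoronVocabNode(nn.Module):
    def __init__(self, dim: int = 512, sparsity: float = 0.9):
        super().__init__()
        base = torch.randn(5, dim)                         # 5 vertices of one pentachoron
        base = base * (torch.rand(5, dim) > sparsity)      # keep the anchor very sparse
        views = []
        for k in range(5):                                 # five uniquely rotated views,
            rot = planar_rotation(dim, k, k + 5, 0.7 * k)  # each in a different plane
            views.append(base @ rot.T)
        # Frozen anchors: a buffer never receives gradients, so the geometry cannot drift.
        self.register_buffer("anchor_views", torch.stack(views))    # (5, 5, dim)
        # Trainable twin: curated and tuned per dataset at training time.
        self.component_twin = nn.Parameter(torch.zeros(5, 5, dim))

    def forward(self) -> torch.Tensor:
        # The twin learns around the frozen geometry without ever replacing it.
        return self.anchor_views + self.component_twin
```

The buffer/parameter split is the whole point of the sketch: the anchors stay put while the twin absorbs dataset-specific structure.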
Untested at scale, but tested on smaller systems with high success
This is fairly untested on long chains, but it HAS shown serious promise in smaller crystallization systems: David can use standard features to classify ImageNet-1000 at a higher accuracy than the original CLIP model, just from the very features CLIP spits out.
That kind of result isn't to be dismissed. This conv-style pattern recognition can be normalized and applied downstream in usable, trajectory-based ways, creating legitimate output shifts and legitimate change through the core of a system. I believe this with the utmost of my being, because I've seen similar systems work.
David-ImageNet-1000 handles 512-, 768-, and 1280-dim features. It simply handles them, and the classification system treats those features as very close to the expected patch or image features when run against unseen validation sets. The classification system works, and therefore the gate works. That much is fact, but the gate is not 100% accurate, which I believe will be solved through correct geometric tuning and expanded vocabulary access directly into a larger LLM rather than just a little CLIP model and its pictures.
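As a rough illustration of the multi-width handling and the gate, here is a sketch under my own assumptions; the MultiWidthGate name, the per-width linear projections, and the sigmoid gate are stand-ins, not David's actual mechanics.

```python
# Hedged sketch: accept CLIP features of several widths by projecting each into one
# shared latent space, then classify and gate in that space.
import torch
import torch.nn as nn


class MultiWidthGate(nn.Module):
    def __init__(self, widths=(512, 768, 1280), latent_dim=512, num_classes=1000):
        super().__init__()
        # One projection per supported CLIP feature width.
        self.proj = nn.ModuleDict({str(w): nn.Linear(w, latent_dim) for w in widths})
        self.classifier = nn.Linear(latent_dim, num_classes)
        # Gate: "does this look like an expected patch/image feature?"
        self.gate = nn.Linear(latent_dim, 1)

    def forward(self, feats: torch.Tensor):
        z = self.proj[str(feats.shape[-1])](feats)   # route by incoming feature width
        logits = self.classifier(z)
        gate_score = torch.sigmoid(self.gate(z))     # the part that is not yet 100% accurate
        return logits, gate_score
```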
These systems share a latent space; I already worked out the logistics for this when I built a full standalone ViT using geometric systems. The ViTs defaulted to cosine almost exclusively for learning, which means the ViTs themselves weren't actually using the geometry; the classification was. The classification found those patterns, so those patterns can be directly associated with the classification structures. There is no doubt in my mind: this will work, and it will extend the rope indefinitely if carefully organized.
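That division of labor is worth spelling out: the backbone can train on plain cosine similarity while only the classification step measures against the geometric anchors. A hedged sketch, with an invented geometric_logits helper and assumed anchor shapes:

```python
# Sketch: cosine similarity of backbone features against per-class pentachoron
# vertices; the geometry lives entirely in the classification step.
import torch
import torch.nn.functional as F


def geometric_logits(features: torch.Tensor, class_anchors: torch.Tensor) -> torch.Tensor:
    """
    features:      (batch, dim)            backbone output
    class_anchors: (num_classes, 5, dim)   five pentachoron vertices per class
    Returns the cosine similarity to the best-matching vertex of each class.
    """
    f = F.normalize(features, dim=-1)             # (B, D)
    a = F.normalize(class_anchors, dim=-1)        # (C, 5, D)
    sims = torch.einsum("bd,cvd->bcv", f, a)      # (B, C, 5)
    return sims.max(dim=-1).values                # best vertex per class -> (B, C)
```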
This approach will reduce drift, reduce hallucination, increase accuracy, increase access, reduce token cost, improve performance, reduce VRAM requirements, reduce disk requirements, and even introduce new elements that could not be trained before, such as adding or removing entire harmony chains in the future.
This is possible and doable, and we are at the stage where it must be done. It MUST be handled and curated correctly.
Pushing into medical and why
There are a few reasons for this. Primarily, I believe systems like MRI, scanning, imaging, and diagnosis will benefit heavily from this kind of structural awareness when tied into handheld medical tools that guide diagnosis. These systems must be as accurate as possible, and I believe mine, when correctly tuned, will find exactly the avenues required to diagnose and save lives.
This is a no-brainer to me. By saving people, these systems not only prove their worth, they also prove that novel systems can be applied with careful consideration toward optimization, without requiring massive structural systems to conjoin or contort the output in various directions. Instead, the systems form cohesive generalization in highly compact spaces and pass those diagnostics to a much more intelligent variant downstream.
Run a thousand cases and get a set of options, then run those options through highly specialized forms and get much more accurate diagnostics much faster. I believe this is a tremendous shift for medical: more rapid diagnostics, potentially preventing many deaths and many problems from developing in the future.
These little systems are already more accurate than many of the larger competitors, and they can be fine-tuned specifically with curative potential in mind: their outputs are captured as feature maps for more specific models to check for correctness, while smaller models can be DIRECTLY fine-tuned on individual patients using their specific information at runtime.
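Both mechanisms are ordinary engineering once named: capturing the small model's outputs as feature maps for a downstream checker, and fine-tuning only a tiny per-patient adapter at runtime. A hedged sketch with placeholder names and no real clinical pipeline behind it:

```python
# Sketch: (1) hook the backbone so its feature maps are saved for a downstream
# correctness checker, and (2) train only a small per-patient adapter at runtime
# while the shared model stays frozen.
import torch
import torch.nn as nn


def capture_and_adapt(model: nn.Module, adapter: nn.Module,
                      patient_inputs: torch.Tensor, patient_labels: torch.Tensor):
    captured = []
    # (1) Save the backbone's outputs as feature maps for the downstream verifier.
    handle = model.register_forward_hook(lambda mod, inp, out: captured.append(out.detach()))

    # (2) Freeze the shared model; only the per-patient adapter trains at runtime.
    for p in model.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)
    for _ in range(10):                            # a handful of quick runtime steps
        logits = adapter(model(patient_inputs))
        loss = nn.functional.cross_entropy(logits, patient_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

    handle.remove()
    return captured                                # feature maps handed to the verifier
```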
Generalization and the importance of generalizing
Generalization is useful on many fronts, and it is particularly important for LLMs. Generalized sets of information can be pooled better and gated differently: passing a series of "could be" candidates down a pipeline for other small AIs to treat as "maybe is" is akin to gathering MANY divergent opinions formed from many different variants of the same information, all running as multi-opinion solidification structures.
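A toy sketch of that pooling, assuming each small AI emits a probability vector; the top-k "could be" sets and the agreement threshold are invented for illustration.

```python
# Sketch of "could be" -> "maybe is": several independently trained small models each
# emit a probability vector, and the downstream stage keeps only the classes that
# enough of them agree on.
import torch


def pool_opinions(opinions: list[torch.Tensor], keep_top: int = 5, agree: int = 2) -> torch.Tensor:
    """opinions: list of (num_classes,) probability vectors, one per small model."""
    votes = torch.zeros_like(opinions[0])
    for probs in opinions:
        top = probs.topk(keep_top).indices     # this model's "could be" set
        votes[top] += 1
    # Classes that at least `agree` models proposed become the pooled "maybe is" set.
    return (votes >= agree).nonzero(as_tuple=True)[0]
```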
Group bias and forethought
The potential for large-count group bias is always present, which is why vocabulary tuning is better performed directly against the corpus. In this case, I believe the corpus is substantially important, with sequential lexical coverage, n-gram expansion, and a large, expansive subset of inflated informational access throughout much larger models.
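A tiny sketch of what corpus-driven n-gram expansion could look like; the function name and thresholds are mine, and the real curation step is not shown.

```python
# Sketch: count the corpus's own n-grams and seed the vocabulary from the most
# frequent ones, so node tuning follows the corpus rather than a generic token list.
from collections import Counter


def ngram_vocab(corpus: list[str], n: int = 2, top_k: int = 1000) -> list[tuple[str, ...]]:
    counts = Counter()
    for doc in corpus:
        tokens = doc.split()
        counts.update(zip(*(tokens[i:] for i in range(n))))   # sliding n-grams
    return [gram for gram, _ in counts.most_common(top_k)]
```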
The more general the system, the more bias is likely to form that is completely unrelated to the valuations. Specifics, and the specific experts, are to be trained independently in experimentation, and those expert opinions are passed downstream for other experts to process.
These are completely independent AIs and must be treated as such, meaning they can be trained in parallel without the full contrastive learning a full model requires today. Whatever they are trained with is better off frozen at first and then expanded into a direction of deviance.
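A minimal sketch of that recipe, freeze first and then selectively unfreeze, with hypothetical helper names.

```python
# Sketch: each expert is an independent model, trained in parallel on its own slice,
# frozen at first, then unfrozen only in chosen sub-blocks ("expanded into a
# direction of deviance").
import torch.nn as nn


def freeze_expert(expert: nn.Module) -> None:
    for p in expert.parameters():
        p.requires_grad_(False)


def expand_expert(expert: nn.Module, trainable_submodules: list[str]) -> None:
    # Unfreeze only the named sub-blocks so the expert deviates in one chosen
    # direction instead of drifting wholesale.
    for name, module in expert.named_modules():
        if name in trainable_submodules:
            for p in module.parameters():
                p.requires_grad_(True)
```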
That's what the pentachoron and geometric AI training experiments have yielded the most information about.