The Bonsai AI Revolution
Amid the clamour of the AI world, which increasingly resembles a chilli-eating contest—who is the spiciest, the biggest, the most expensive—a “village kid” named PrismML suddenly appears. It arrives without much fanfare. It does not bring mountains of GPUs or server farms the size of football pitches. It brings only one thing that sounds like a joke: 1-bit AI.

And that night, on Fahd Mirza’s YouTube screen, the joke turned into a harsh slap. The AI model named Bonsai answered mathematics questions neatly. It proved clever at writing deep-sea simulation code, complete with glowing jellyfish. Even when psychologically tricked and manipulated, it refused—without hallucinating. Not only fast, but also sane. Fahd could only say, “Wow.” A simple word that usually emerges when we run out of vocabulary in the face of something beyond expectations.

The name “Bonsai” is not just an artistic label. It is a very apt metaphor. A bonsai tree is not a random miniature. It is shaped, pruned, and optimised with precision, so its small form still carries the structure and identity of a large tree. Small, but whole. Concise, but still complete. And there lies the silent message: this is not just about shrinking, but about redesigning so that the small remains meaningful.

So, what exactly is this “bit” we keep hearing about? Let us lower our ego a little, accustomed as it is to big numbers. In the AI world, a “bit” is like a life choice. 32-bit AI means full colour: billions of possible values, high precision, like a Michelin chef weighing salt to fractions of a gram. 16-bit (fp16) AI is the stage of starting to economise: still sophisticated, not too verbose, precise enough for most modern AI needs. Then come 8-bit, 4-bit, and 2-bit AI. This is student dorm life: what matters is that it is enough, simple, not necessarily perfect. And 1-bit AI? It is a world that knows only two answers: yes or no. No grey areas. No drama. Just black and white.
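To make the bit arithmetic concrete, here is a toy sketch (not PrismML’s actual scheme) of how many distinct values each precision level can represent, and of what happens when a small weight vector is collapsed into the 1-bit world: only the signs survive, plus one shared scale.

```python
import numpy as np

# How many distinct bit patterns each precision level can represent.
for bits in (32, 16, 8, 4, 2, 1):
    print(f"{bits}-bit: {2**bits:,} possible values")

# A toy full-precision weight vector...
w = np.array([0.73, -0.12, 0.05, -0.88], dtype=np.float32)

# ...collapsed into the 1-bit world: only the sign of each weight
# survives, with a single shared scale (the mean magnitude) so the
# overall signal strength is roughly preserved.
scale = np.abs(w).mean()      # 0.445
w_1bit = np.sign(w) * scale   # [ 0.445, -0.445,  0.445, -0.445]
```

The shared-scale trick mirrors what binary-network research commonly does; whether Bonsai uses exactly this recipe is not stated here.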
Technically, if 32-bit can store numbers with billions of variations, 1-bit stores only two possibilities: 0 or 1. And AI machines only know numbers, not words. In AI models, this means every weight—usually a complex value, like a long fraction—is forced into a simple decision: active or not, up or down.

The problem is that we have long believed the simpler the representation, the dumber the result. That is an unwritten law in computing: reduce precision, and intelligence drops. But here, the PrismML team plays like magicians defying the laws of physics. They do not merely “compress” the AI model. They redesign its way of thinking. The entire network—from the embeddings, the attention, and the multilayer perceptrons to the output layers—is built entirely in 1-bit. No back doors. No hidden tricks with high precision. No secret compromises. This is not a diet. It is a total transformation.

Behind that near-reckless simplicity, PrismML’s scientific work stands on long research that goes against the current. Over the past decade, almost all major labs have moved on one conviction: the bigger the model, the smarter the result. Parameters added, data expanded, computation enlarged. Intelligence is treated like nasi Padang: just add portions.

The PrismML team chose a quieter path: not enlarging the brain, but densifying the mind. They call it intelligence density—the density of intelligence per unit of model size. If the old approach asks, “How smart is this model?”, the new approach asks, “How efficiently is this smartness packaged?” In this framework, intelligence is no longer standalone but always linked to size and cost. It is even formulated as the relationship between the model’s error rate and its size: the smaller the size at a consistently low error, the higher the density. This is not just a new metric. It is a change of perspective on intelligence itself.
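The text gives the spirit of the metric but not its exact formula, so the following is only one plausible reading: capability (one minus error rate) delivered per unit of model size. The function name and all the numbers are illustrative, not PrismML’s.

```python
def intelligence_density(error_rate: float, size_gb: float) -> float:
    """Hypothetical form of the metric: capability per gigabyte of model.

    Low error AND small size both push the density up, matching the
    article's description; the real formulation may differ.
    """
    return (1.0 - error_rate) / size_gb

# A large full-precision model vs. a tiny 1-bit model with similar error:
big_model  = intelligence_density(error_rate=0.10, size_gb=16.0)  # 0.05625
tiny_model = intelligence_density(error_rate=0.12, size_gb=0.5)   # 1.76
print(tiny_model > big_model)  # True: far more smartness per gigabyte
```

Under this reading, a model thirty times smaller can absorb a slightly worse error rate and still dominate on density, which is exactly the trade the article describes.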
Unlike older techniques such as post-training quantisation, which only shrink the model after training—like downscaling a photo that has already been taken—the new approach builds the model from the start to live within extreme constraints. The 1-bit world is not a world of compromise, but of redesign.

The challenges are certainly not small. How do you maintain reasoning ability when every decision has only two choices? How do you keep information flowing intact through a highly discrete network? The answer lies in architectural engineering and training methods that keep signals strong and stable. Even though each element chooses between only two states, the overall arrangement can still form patterns complex enough to support reasoning.

And the results are starting to show when set against the history of previous small models. We have seen DistilBERT, MobileBERT, and various other small models try to make AI lighter. But the recurring pattern is compromise: the smaller the model, the faster it loses complex thinking ability. Simple tasks might still work, but such models falter when facing multi-step mathematics, layered logic, and programming.
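The contrast between the two routes can be sketched in a few lines. Part (a) snaps already-trained weights to ±1 after the fact, and the output visibly shifts. Part (b) shows the usual trick from binary-network research for training under the constraint, the straight-through estimator: the forward pass runs through binarised weights, but the gradient is copied onto the latent float weights as if sign() were the identity. The numbers, and the straight-through estimator itself, are illustrative assumptions; the article does not specify PrismML’s training method.

```python
import numpy as np

# Hypothetical trained full-precision weights and one input vector.
w_fp = np.array([0.9, -0.4, 0.05, -0.7])
x    = np.array([1.0, 1.0, 1.0, 1.0])

# (a) Post-training quantisation: binarise AFTER training.
scale  = np.abs(w_fp).mean()     # 0.5125
w_post = np.sign(w_fp) * scale
print(w_fp @ x)                  # ~ -0.15  (original output)
print(w_post @ x)                #    0.0   (output has shifted)

# (b) Quantisation-aware training: the forward pass already uses the
# binarised weights, and the straight-through estimator passes the
# gradient to the latent floats as if sign() were the identity.
target = -0.15
y      = np.sign(w_fp) @ x       # forward through binary weights: 0.0
grad_w = 2 * (y - target) * x    # STE gradient of squared error: 0.3 each
w_fp  -= 0.1 * grad_w            # the latent weights learn to cope
```

In (a) the output jumps from about −0.15 to 0.0 with no chance to adapt; in (b) the same mismatch becomes a training signal, which is the sense in which such a model is built from the start to live in 1-bit.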