Advancing LLMs using 3 capabilities that have led to humanity’s success
There are interesting parallels between how we are developing machine intelligence and the 3 human capabilities that have, throughout history, contributed to our success as a species: collaboration, an imagined set of rules, and storytelling.
As an AI enthusiast avidly following GenAI’s public-facing development and fame, I have been thinking about the parallels between how we humans achieve our success and intelligence and how we are experimenting with and crafting the intelligence of our AI machines and tools.
The 3 distinct human capabilities, as highlighted in Yuval Noah Harari’s works, that have allowed humans to advance beyond other primates and animals are also, in part, the techniques employed by generative artificial intelligence (GenAI) and large language models (LLMs) to deliver value. They are as follows:
Collaboration
Imagining a set of rules
Storytelling
Collaboration
For us humans, collaboration and networking are based on our ability to scale as a ‘collective’ over and above what an ‘individual’ can accomplish alone.
Humans have consistently proven that we are incredibly effective at achieving monumental tasks when we collaborate and work together, whether in ancient times, when organising multiple people into a pyramid formation to fool a large animal into believing humans were stronger and fiercer in battle, or when mobilising thousands of people to build the Egyptian pyramids. “Cooperation, coordination, and relationships became key to our survival and dominance over other species” (ref).
More than 30,000 years ago, our ability to think and communicate through language gave us a competitive edge (ref). It is therefore not surprising that the ability of LLMs to respond to language and share information with us triggered an innate excitement in users.
Collaboration is all about scale and size. This has led me to think lately about size and the circumstances under which big versus small is a competitive advantage and a benefit to society when it comes to organising groups of people and organisations. What are the trade-offs in terms of productivity, efficiency, regulation and competitiveness?
What is the ideal team size, and why?
In trying to understand the optimal size for collaboration, one of the most interesting figures is Dunbar’s Number, which identifies the largest group size at which a cohesive community can stay together.
Robin Dunbar, a British anthropologist, found that people can only ‘handle’ up to about 150 relationships, whether in early hunter-gatherer societies or in the modern workplace (ref).
Dunbar’s Number is usually defined as 150 but in reality it can range from 100 to 230 (it’s not very precise). “For a social group of this size to work people need to know one another well enough to effectively gossip about each other. Beyond this limit, things start breaking down” (ref).
Dunbar’s extensive research identified nested layers of relationships. The tightest circle has just five people (loved ones). Successive layers include 15 (good friends), 50 (friends), 150 (meaningful contacts), 500 (acquaintances) and 1,500 (people you can recognise).
“According to the theory, people migrate in and out of these layers and space needs to be carved out for any new entrants” (ref).
Does Dunbar’s theory change in a digital age?
Apparently not. Dunbar’s Number still limits our sociality in the digital world. Recent data on digital relationships suggests that the strength of social interactions peaks between 100 and 200 friends, in agreement with Dunbar’s prediction. While modern social networks allow us to ‘log’ the people we meet and interact with, these digital networks cannot overcome the biological and physical constraints that limit stable social relations (ref).
So, what about companies, is there an ideal size, and why?
Dunbar’s Number, commonly set at 150 (ref), is referred to in Malcolm Gladwell’s book The Tipping Point: How Little Things Can Make a Big Difference as the magical number below which social groups function well without the need for formal managerial or communication overhead.
In reality, the size of corporations is limited by a range of factors including market demand, competition, government regulations, and the ability of the corporation to effectively manage and sustain its operations.
There are examples across industries, whether in food, tech, banking or retail, of mega corporates becoming extremely large through successful business strategies, acquisitions and global expansion.
Some of the downsides of becoming larger include challenges in maintaining agility, innovation, and efficient decision-making. In most countries, antitrust laws prevent the formation of monopolies and promote fair competition, which can also serve as a limit to the size of corporations.
An example of this is in the United States, where a company with more than 500 shareholders automatically becomes a public company and is required to make all of the filings and disclosures that this classification entails. The New York Times covered this topic at a time when Facebook, Twitter, Zynga and LinkedIn were not listed on a public stock exchange and each was privately held (ref). Unlike publicly-held companies, private companies are not subject to as many regulations (ref).
When it comes to business, size is measured not only by the number of employees but also by annual revenue.
Collaboration and Large Language Models
Similar to how humans have made progress through collaboration, LLMs have made leaps in progress through the technique of networking: connecting many simple units together.
An LLM is a type of AI algorithm that uses deep learning techniques, and deep learning uses multi-layered neural networks. Many people are aware that this technique was inspired by the human brain, but it also reflects the power of connecting one to many and coordinating many computing units to achieve outcomes at scale and at speed.
Every neural network consists of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node connects to others and has its own associated weight and threshold. A node activates when its output exceeds the specified threshold value, sending data to the next layer of the network.
A neural network with more than three layers is considered a deep learning algorithm.
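To make the idea concrete, here is a minimal sketch (in Python with NumPy) of a forward pass through a tiny network with one hidden layer. The dimensions, weights and activation choice are purely illustrative and do not represent any particular LLM’s architecture.

```python
# A minimal sketch of a feed-forward pass through a tiny neural network.
# The sizes and weights are illustrative assumptions, not a real model.
import numpy as np

def relu(x):
    # A node "activates" only when its weighted input exceeds zero,
    # a simple stand-in for the threshold behaviour described above.
    return np.maximum(0.0, x)

# Illustrative dimensions: 3 inputs, 4 hidden nodes, 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)  # input -> hidden weights and biases
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)  # hidden -> output weights and biases

x = np.array([0.5, -1.0, 2.0])                 # one example input
hidden = relu(x @ W1 + b1)                     # hidden layer passes its activations forward
output = hidden @ W2 + b2                      # output layer produces the final values
print(output)
```

Stacking more hidden layers of this kind, and learning the weights from data, is what turns this toy structure into the deep networks behind LLMs.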
Since LLMs were released onto the world stage, experts and the general public alike have been quick to weigh in on both the dazzling capabilities and the silly errors of these shiny new tools.
Understandably, LLMs’ impressive ability to generate text that mimics human-like responses created a surge of interest and hype.
Unrealistic fears of machines taking over have mostly dissipated for two reasons: first, GenAI is already proving useful in almost every field; second, people are becoming more aware that LLMs are not without their challenges and problems, and that further development is required before GenAI meets people’s expectations.
While current LLMs can generate responses to questions, their effectiveness is often limited by the sub-optimal quality of their answers (ref).
As succinctly summarised by Gabriel Scali, LLMs are limited by a lack of common sense, insufficient contextual understanding, and poor explainability and interpretability. LLMs are also prone to biases and ethical concerns, require high computational resources, have a high environmental impact, and raise concerns regarding data privacy and security.
There is also growing concern that current methods for LLMs are approaching their limits.
Recent advancements in LLMs are undeniably impressive; however, performance gains appear to be getting smaller and smaller, and increasing model size and complexity does not translate into proportionally better results (ref).
As people continue to share the results of experiments and approaches for optimising the outputs of these large and, at times, uncanny beasts of data, there has been a growing realisation that big is not necessarily better.
In fact — “small is the new big.”
It has been found that combining fine-tuning of an LLM with a process known as Retrieval Augmented Generation (RAG) helps generate responses with improved accuracy (ref).
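The sketch below illustrates the RAG idea in a self-contained way: retrieve the snippet most relevant to a question, then prepend it to the prompt the model sees. The documents, the bag-of-words similarity, and the `call_llm` placeholder are all illustrative assumptions, not a real retrieval stack or API.

```python
# A toy sketch of Retrieval Augmented Generation: retrieve, augment, generate.
import math
import re
from collections import Counter

documents = [
    "Dunbar's number suggests humans maintain about 150 stable relationships.",
    "Quantization compresses model weights so LLMs can run on smaller hardware.",
    "BLOOM's training was estimated to emit roughly 50 metric tons of CO2.",
]

def tokens(text: str) -> Counter:
    # Crude bag-of-words representation, purely for illustration.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def similarity(a: str, b: str) -> float:
    ca, cb = tokens(a), tokens(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder; a real system would call a (possibly fine-tuned) LLM here.
    return f"[model response to a prompt of {len(prompt)} characters]"

question = "How many relationships can a person maintain?"
best_doc = max(documents, key=lambda d: similarity(question, d))  # retrieval step
prompt = f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"  # augmentation step
print(call_llm(prompt))                                           # generation step
```

In practice the bag-of-words similarity would be replaced by embeddings and a vector database, but the shape of the pipeline is the same: ground the model’s answer in retrieved context rather than relying on its parameters alone.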
There has been a shift in focus from trying to improve model performance from simply increasing model size to exploring more efficient and specialised architectures (ref).
Another technique that has been advocated by many, including the well-known Andrew Ng, is quantization.
Quantization is a method that can allow models to run faster and use less memory. “LLMs can take gigabytes of memory to store, which limits what can be run on consumer hardware” (ref). Quantization aims to compress models while maintaining reasonable performance.
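As a toy illustration of the idea (not any specific library’s quantization scheme), the sketch below maps float32 weights to int8 values plus a single scale factor, then measures the memory saving and the reconstruction error.

```python
# A toy sketch of post-training weight quantization: float32 -> int8 + scale.
# Real schemes (per-channel scales, zero points, 4-bit formats) are more involved.
import numpy as np

weights = np.random.default_rng(1).normal(size=1000).astype(np.float32)

scale = np.abs(weights).max() / 127.0                               # one scale for the whole tensor
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)   # 4x smaller than float32
dequantized = q.astype(np.float32) * scale                          # approximate reconstruction at inference time

print("memory:", weights.nbytes, "->", q.nbytes, "bytes")
print("max reconstruction error:", np.abs(weights - dequantized).max())
```

The model keeps working with approximately the same weights while the memory footprint drops, which is what allows larger models to fit on consumer hardware.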
Small models can also be beneficial in terms of energy and costs. LLMs require vast amounts of energy to train and run.
AI startup Hugging Face estimated the overall emissions for its own large language model, BLOOM, by estimating emissions produced during the model’s whole life cycle rather than just during training (ref).
When Hugging Face examined the impact of their LLM, BLOOM, they found that training led to 50 metric tons of carbon dioxide emissions, the equivalent of around 60 flights between London and New York (ref).
After BLOOM was launched, the model emitted around 19 kilograms of carbon dioxide per day, which is similar to the emissions produced by driving around 54 miles in an average new car.
LLMs have revolutionised a host of applications including text generation, language translation and creative content generation.
GenAI, a set of algorithms built on top of foundation models, generates text, images, or audio from its training data and has massive implications for businesses that need to generate text from natural language instructions to power chatbots, condense lengthy documents into brief summaries, perform data entry, write software, find common bugs in code and much more (ref).
The emerging small LLMs are so far proving to offer similar functionality to larger LLMs (e.g. text generation and language translation) but with greater efficiency (such as quicker training and more cost-effective deployment) and accessibility (including more flexible hardware requirements and lower data requirements) (ref).
Rules
Humans love rules — we love making them and we love breaking them.
Rules are not only the set of instructions that help us organise and govern our lives and the societies that we shape, but rules include our values and principles.
So, does size matter when it comes to making, disseminating and enforcing rules?
In his book 21 Lessons for the 21st Century, the bestselling author Yuval Noah Harari declares that:
“Homo sapiens is a post-truth species, whose power depends on creating and believing fictions. Ever since the stone age, self-reinforcing myths have served to unite human collectives. Indeed, Homo sapiens conquered this planet thanks above all to the unique human ability to create and spread fictions. As long as everybody believes in the same fictions, we all obey the same laws, and can thereby cooperate effectively” (ref).
The last point is such a critical component of protecting and nurturing society’s fabric and values — maintaining a common set of rules that everyone follows.
Algorithms are basically sets of instructions — written by humans and translated into a sequence of operations that computer hardware can execute (ref).
What is fascinating is that the history of AI has been dominated by two main paradigms: connectionism and symbolic AI (ref).
The connectionist paradigm is associated with processing vast amounts of data and learning a statistical model from it. Recently, the progress in GenAI and LLMs has taken this approach.
The symbolic paradigm is founded on knowledge representation, reasoning, and logic. Symbolic AI relies on explicit rules and logical inference to solve problems and perform higher-level cognitive tasks (ref).
Recent advances in AI products being built from GenAI improvements under the connectionist paradigm are encouraging and are inciting increased interest in both causal AI and symbolic AI.
As described earlier in this piece, connectionism, which focuses on creating adaptive networks that learn and recognise patterns from vast amounts of data, has led to remarkable breakthroughs but is constrained by limitations, most notably a levelling off of its improvement curve even with exponential increases in computing power and vast quantities of data. Critically, connectionism (a.k.a. predictive AI) also lacks explainability (see my earlier post on the importance of causal AI for this reason).
The scientific community is increasingly looking for ways to design AI systems that combine the power of connectionism with the explainability of symbolism.
As noted above, symbolic AI relies on explicit rules and logical inference, focusing on the processing and manipulation of symbols or concepts rather than numerical data (ref).
The difficulties encountered by symbolic AI are significant, including the common-sense knowledge problem and the enormous number of rules and facts that need to be specified to power these systems (ref). A possible approach to tackling this challenge is hybrid AI, which combines connectionist AI and symbolic AI. Hybrid AI aims to use statistical models to recognise patterns, correlations and relationships in data and to extract the symbolic rules emerging from that data (ref); a toy sketch of the rule-based side is shown below.
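To illustrate the symbolic side, here is a minimal forward-chaining sketch: explicit if-then rules applied repeatedly to a set of known facts until nothing new can be derived. The facts and rules are invented for illustration; the hybrid idea is that such rules or facts might be extracted statistically from data and then reasoned over in exactly this way.

```python
# A minimal sketch of rule-based (symbolic) inference via forward chaining.
# Facts and rules are illustrative assumptions, not from any real system.
facts = {"socrates_is_human"}

rules = [
    ({"socrates_is_human"}, "socrates_is_mortal"),               # if human, then mortal
    ({"socrates_is_mortal"}, "socrates_will_not_live_forever"),  # if mortal, then not immortal
]

changed = True
while changed:                       # keep applying rules until no new facts appear
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)    # logical inference adds a derived fact
            changed = True

print(facts)
```

The appeal of this style is that every derived conclusion can be traced back through the rules that produced it, which is exactly the explainability that purely connectionist models lack.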
Storytelling
If humans cooperating with one another is akin to neural networks and the rules or instructions that we design are similar to AI algorithms that power the outputs generated, then perhaps storytelling is analogous to our visions of what AI is capable of solving in the future.
Storytelling is the context, the glue that holds it all together: the magical, imaginative purpose that translates two inputs into an output.
Human Collaboration + Rules/Instructions = Outcome (Our Story).
The biggest idea in Sapiens was the intersubjective — shared fictions like gods, nations, money, or human rights that let strangers cooperate and dominate both the objective and subjective worlds (Corey B, ref).
“The power of human cooperation networks depends on a delicate balance between truth and fiction. If you distort reality too much, it will weaken you, and you will not be able to compete against more clear-sighted rivals. On the other hand, you cannot organise masses of people effectively without relying on some fictional myths. So if you stick to unalloyed reality, without mixing any fiction with it, few people will follow you.”
We can choose what story we want to tell and make a reality. Unfortunately, storytelling can lead to negative consequences when it is distorted and manipulated. There have been ample examples in the digital age, with the expansion of social media, where storytelling has led to polarisation, propaganda, fraud and serious harm. The scale and speed of transmission in the digital world mean information can go viral and spread to wider corners of the earth faster than in previous decades.
We are at an interesting inflexion point with AI advancements. We can choose the story that our collaborative efforts and rules/values develop and build. It is our story to tell. Let’s hope we make it a good one for all.
Thanks for reading!