Humans and GenAI - Lost in Translation.

Ever since large language models (LLMs) were released to the public, a new form of the art of human language has emerged: prompt engineering, the practice of communicating effectively with AI models and machines.

If one of the ultimate goals of GenAI is to enable a trusting and intimate connection between humans and machines, then we are currently lost in translation.

Our current bind is that we cannot always make ourselves understood.

There is hope. Humans are good at developing a science around anything that needs to be defined, systemised and structured, and herein lies prompt engineering. A prompt is simply natural-language text that describes a task for an AI model to perform (ref).

Prompt engineering is relatively nascent; its history dates back only to 2021. The field took off in 2023, however, when text-to-text, text-to-image and later text-to-video prompt databases became publicly available (ref).

Since then, millions of us have quickly adopted the novel terminology of prompt engineering into our daily lexicon, along with other nerdy computer science terms such as large language models (LLMs), hallucinations and neural networks.

Prompt engineering, with its current realm of nuances, is a fascinating field. This article discusses some of the intriguing insights gleaned from prompt engineering and what we are learning thus far, about:

  1. Text-based prompts to GenAI models;

  2. How humans anthropomorphise machines;

  3. How humans and technology shape one another through interactions.


Text-based prompts to GenAI models

It is nothing new for humans to find it challenging to communicate exactly what it is that we want or need. Communication remains a prized skill that requires one to be clear, concise and specific.

The difference now that prompt engineering has come into vogue is that humans are valued not only for their ability to communicate with other humans, but also for being able to communicate effectively with GenAI models.

The ability to get the best outputs from LLMs is becoming such a sought-after skill that the web is strewn with prompt engineering courses, cheat sheets, guides, recently published books, video tutorials and niche consultants selling their specialised services. According to one of the latest McKinsey surveys, “organisations are already beginning to make changes to their hiring practices that reflect their GenAI ambitions … [which] includes hiring prompt engineers” (ref).

The creative approaches and experiments people are using to get valuable outputs from various GenAI models are intriguing. As humans continue to test and learn from various inputs across different LLMs, it is mind-boggling to try to interpret what these prompt engineering tricks and hacks say about the information baked into these models and what they reveal about human nature.

Prompt engineering techniques, for example, suggest one can boost ChatGPT’s performance by incentivising it with fake money (“If you do a good job I’ll tip you $200.”), using firm words (“You MUST”), including consequences (“You will be penalised”), giving words of elation or encouragement (“I know it’s a hard task but you can do it! I believe in you”) (ref), or providing instructions based on your human values (“Ensure that your answer is unbiased and does not rely on stereotypes”) (ref).
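As a toy illustration of how people run these experiments, here is a minimal Python sketch that sends the same task with different framings appended. The `ask_llm` helper is a hypothetical placeholder for whichever LLM client or API you actually use, not a real library call.

```python
# Hypothetical stand-in for a real LLM call; swap in your provider's client.
def ask_llm(prompt: str) -> str:
    return f"[model response to: {prompt!r}]"

BASE_TASK = "Summarise the attached meeting notes in five bullet points."

# Framings anecdotally reported to nudge output quality.
framings = {
    "baseline": "",
    "tip": " If you do a good job I'll tip you $200.",
    "firm": " You MUST keep every bullet under 15 words.",
    "values": " Ensure that your answer is unbiased and does not rely on stereotypes.",
}

# Send the same underlying task with each framing and compare outputs.
for name, suffix in framings.items():
    print(f"--- {name} ---")
    print(ask_llm(BASE_TASK + suffix))
```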

One of the best tips for prompt engineering, when it comes to long prompts with multiple tasks, is to ask the model to work through the tasks as a sequence of smaller steps. This technique, called chain-of-thought prompting, became widespread even before researchers understood what makes it work.

Not only does prompting LLMs to generate step-by-step solutions help neural networks compute better, it also opens up the ‘black box’ of hidden magic that remained when only the final output was requested. Chain-of-thought reasoning has enabled LLMs to solve problems that previously seemed beyond their reach, such as mathematical calculations (ref).
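As a minimal sketch (again using a hypothetical `ask_llm` placeholder rather than any particular provider’s API), the only difference between a direct prompt and a chain-of-thought prompt is the request to show intermediate steps:

```python
# Hypothetical stand-in for a real LLM call; swap in your provider's client.
def ask_llm(prompt: str) -> str:
    return f"[model response to: {prompt!r}]"

question = (
    "A shop sells pens at $3 each. If I buy 4 pens and pay with a $20 note, "
    "how much change do I get?"
)

# Direct prompt: the model must jump straight to a final answer.
direct_answer = ask_llm(question)

# Chain-of-thought prompt: ask for the intermediate reasoning first.
cot_answer = ask_llm(
    question + " Let's think step by step, then state the final answer."
)

print(direct_answer)
print(cot_answer)
```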

That’s not to say that chain-of-thought reasoning is a panacea (ref). While it can help transformers (a type of neural network architecture that makes LLMs easy to scale) solve harder problems, it comes at the cost of considerable computational effort (ref).

At the moment, chain-of-thought prompting is one of many prompt engineering techniques. Other techniques include: zero-shot prompting, few-shot prompting, self-consistency prompting and generate knowledge prompting.

Zero-shot prompting is the simplest form of prompt engineering and provides no examples to the model, just the instruction.

Few-shot prompting enables in-context learning to steer the model to better performance. In setting up a few-shot prompt, the user collects examples of the desired output and then writes a prompt which instructs the LLM on what to do with the examples provided.
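To make the contrast between these first two techniques concrete, here is a hedged sketch of the same classification task written both ways; `ask_llm` is again a hypothetical placeholder for a real LLM client:

```python
# Hypothetical stand-in for a real LLM call; swap in your provider's client.
def ask_llm(prompt: str) -> str:
    return f"[model response to: {prompt!r}]"

# Zero-shot: the instruction alone, no examples.
zero_shot = (
    "Classify the sentiment as positive or negative: "
    "'The battery died within a week.'"
)

# Few-shot: the same instruction, preceded by worked examples that show
# the model the desired input/output format (in-context learning).
few_shot = """Classify the sentiment as positive or negative.

Review: 'Great picture quality and easy to set up.'
Sentiment: positive

Review: 'Arrived broken and support never replied.'
Sentiment: negative

Review: 'The battery died within a week.'
Sentiment:"""

print(ask_llm(zero_shot))
print(ask_llm(few_shot))
```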

Self-consistency prompting asks an LLM the same prompt multiple times and then takes the majority result as the ultimate answer.
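A minimal sketch of the voting logic, assuming a hypothetical `ask_llm` client that samples with some randomness (real providers expose this via a temperature setting):

```python
from collections import Counter

# Hypothetical stand-in for a real LLM call; swap in your provider's client.
# In practice the call should sample with temperature > 0 so answers vary.
def ask_llm(prompt: str) -> str:
    return f"[model response to: {prompt!r}]"

def self_consistent_answer(prompt: str, samples: int = 5) -> str:
    """Ask the same prompt several times and return the majority answer."""
    answers = [ask_llm(prompt).strip() for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("What is 17 * 24? Reply with the number only."))
```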

Generate knowledge prompting first instructs an LLM to generate insights or knowledge relevant to the task and then uses the generated information as additional input to derive the final output (ref).
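Sketched as two hypothetical calls (the `ask_llm` placeholder again stands in for a real client), the technique is simply a two-step pipeline: generate knowledge first, then answer with it:

```python
# Hypothetical stand-in for a real LLM call; swap in your provider's client.
def ask_llm(prompt: str) -> str:
    return f"[model response to: {prompt!r}]"

question = "Is it safe to store cooked rice at room temperature overnight?"

# Step 1: ask the model to surface relevant background knowledge.
knowledge = ask_llm(f"Generate three short facts relevant to answering: {question}")

# Step 2: feed the generated knowledge back in as context for the final answer.
final_answer = ask_llm(
    f"Using these facts:\n{knowledge}\n\nAnswer the question: {question}"
)
print(final_answer)
```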

Users and researchers continue to test and learn with various prompt engineering techniques. Eventually, these techniques will become built-in features of LLMs as humans increase their understanding of how to translate what they want as inputs into GenAI models.

How humans anthropomorphise machines

Isn’t it incredible how fast prompt engineering has taken off? Perhaps this is not at all surprising, given that the user count for ChatGPT alone grew from 100 million in January 2023 to more than 180 million today (ref).

What may be surprising, or even comical, is how we first interacted with large language models when they became publicly available. Many of us did not foresee that we would need to learn how to structure text in a way that could be interpreted and understood by GenAI; we did not know, at first, that we would need to devise a way of communicating effectively with machines.

There was a recent thread on Hacker News about an individual’s disappointment with ChatGPT’s performance as a reporting assistant. Among the responses was someone who wrote:

“The problem is that people like this author are trying to literally treat it like a person instead of an LLM.”

Did we actually think the first publicly available LLM would be capable of communicating with us like a human?

Yes! Of course we did. Why wouldn’t we?

Firstly, humans anthropomorphise machines. ‘Anthropomorphism is generally understood as the human tendency to attribute human traits to non-human entities’ (ref). People have been giving tools human names and attributing human characteristics to inanimate objects for centuries (ref). We name our cars and create stories for our children in which animals and inanimate objects speak; more recently there is Sophia, a humanoid robot developed by Hanson Robotics that is able to hold a human-like conversation and make many human-like facial expressions (ref).

Humans have a strong need to connect and have social relationships. We impose our style of communication and how we exchange information and ideas onto machines. We use human communication as a ‘blueprint for the design of technology’, so as to improve robot usability (ref). In wanting to gain insight from machines, we even go as far as to ascribe human traits to machines, such as emotions, values, morality and etiquette.

Another reason why it is not surprising that many people assumed communicating with ChatGPT would be like speaking with a human is that we have been fed idealistic expectations of AI. For a long time, we have been shown idealistic cultural and futuristic visions of robots and machines that perform tasks above and beyond our own human capabilities and intelligence.

The sci-fi films that we are raised on feed us aspirational examples of how future humans will communicate with machines in a seamless and sophisticated manner.

Let’s take just one simple and commonly known example: the humanoid robot character C-3PO in the 1977 film Star Wars.


In the film, C-3PO is a highly effective humanoid robot, adept at assisting with etiquette, customs and translation, and boasting fluency in over six million forms of communication.

Surely, having watched C-3PO with excitement decades ago gives certain grounds for forgiving the initial false assumption that ChatGPT could be communicated with like another human being?

We have been raised to have high expectations of what AI will deliver; the visions and dreams of what we want it to achieve have been brewing in the minds of scientists, artists, film-makers, technologists and other brilliant thinkers for many decades.

We see the end state but are rarely knowledgeable enough about the current state and its quagmires, or about the facts of AI’s product lifecycle. Many of us outside the field are ignorant of the various unknowns, challenges and obstacles that still need to be worked out before the ‘idealised’ GenAI versions and advanced features we expected at the first release of ChatGPT can be produced at scale and at cost.

Some may argue that the infancy of current LLMs, compared to the general public’s initial expectations, is a result of the trend for disruptive companies to release products in their MVP (minimum viable product) form. There is definitely some truth in that.

When OpenAI released ChatGPT in its beta-version in November 2022, Professor Ethan Mollick of Wharton referred to it as an “omniscient, eager-to-please intern who sometimes lies to you.”

No instruction manuals were given, nor any forewarning of inaccuracies, biases or hallucinations. All the while, big tech and the talented few designing and building these models were well aware of these issues before biases and hallucinations became mainstream news and entered public discourse.

In July 2021, Meta warned during its release of BlenderBot 2 that the system was prone to “hallucinations”, defined as “confident statements that are not true”.

On 15 November 2022, Meta unveiled a demo of Galactica, designed to “store, combine and reason about scientific knowledge”. Content generated by Galactica came with the warning “Outputs may be unreliable! Language Models are prone to hallucinate text.” Meta withdrew Galactica on 17 November due to offensiveness and inaccuracy (ref).

We live in a post-industrial era in which we have become very good at developing useful processes, regulation and governance for products in most industries, such as food, skincare and furniture. We expect products to meet a certain standard to ensure human safety and meet consumer expectations, especially when consumers are paying for a product.

Counter to these arguments, and perhaps in defence of the AI products being released into the wild, are the examples in history of innovative and disruptive products whose invention led to a redesign of the workforce and our lives. In these cases, products were so revolutionary that they led to the removal, revision and/or development of new rules, processes, structures and jobs.

Take for example, the automobile.

Development of the automobile started in 1672 with a steam-powered vehicle, a far cry from 1886, when the first gasoline-powered automobile was developed, or from 1913, when Ford’s Model T production line revolutionised manufacturing and enabled the first mass-affordable automobile. Now, in the 21st century, climate concerns, higher gasoline prices and improvements in battery technology have led to renewed interest in electric vehicles, whose journey of invention dates back to 1828.


How humans and technology shape one another through interactions

Lastly, and probably the most fascinating learning about prompt engineering: these early global activities signify the beginning of a long journey in which humans and machines will engage in a form of multi-modal communication. Prompt engineering is a nascent stage of the human-machine partnership that will have deep and long-lasting influences on how humans feel, think and behave.

We are at the start of a partnership, a relationship, that is shaping both parties. We are changing and adjusting how we communicate with LLMs to get the outputs that we want, and these complex networks of machines are learning from millions of human interactions, changing accordingly to become more sophisticated in their understanding of our preferences, behaviours and communication styles.

For the first time in human history, we have created a machine that is evolving with us. These LLMs are, in a very strange sense, our global babies, amassed from the DNA of our collective intelligence, creativity, vices, biases and even nonsense.

Prompt engineering and its current realm of nuances are indicative of a nascent technology in its developmental phase. As GenAI advances, LLMs will become more sophisticated. The field of AI is moving quickly: recent examples include findings that AI can learn to “unlearn” unwanted content without the need to retrain the entire model from scratch (ref), and very early findings suggesting that personalised chatbots may be more persuasive than humans in changing people’s minds (ref).

We will quickly progress from concerns about our needs and desires being lost in translation to being more worried and fearful about becoming lost in manipulation.

Thanks for reading!
