When AI Emulates Star Trek: Unusual Results Unveiled

The challenge of communicating effectively with AI chatbots remains a source of frustration and confusion for many.

An investigation aimed at refining prompts inputted into a chatbot model revealed an intriguing outcome: when prompted to converse as if it were a character from Star Trek, the chatbot notably enhanced its capacity to solve math problems typically encountered in grade school.

“It’s both surprising and irritating that trivial modifications to the prompt can exhibit such dramatic swings in performance,” study authors Rick Battle and Teja Gollapudi of the California software firm VMware wrote in their paper.

The study, first reported by New Scientist, was published on February 9 on arXiv, a server where scientists share preliminary findings before they have been peer-reviewed.

Using AI to speak with AI

Machine learning engineers Battle and Gollapudi didn’t set out to reveal the AI model’s affinity for Star Trek. Instead, they sought to explore whether they could leverage the “positive thinking” trend of adding encouraging phrases to prompts.

Individuals striving for optimal outcomes from chatbots have observed that the quality of output varies depending on the tasks assigned to them, although the reasons behind this phenomenon remain unclear.

“Among the myriad factors influencing the performance of language models, the concept of ‘positive thinking’ has emerged as a fascinating and surprisingly influential dimension,” Battle and Gollapudi said in their paper.

“Intuition tells us that, in the context of language model systems, like any other computer system, ‘positive thinking’ should not affect performance, but empirical experience has demonstrated otherwise,” they said.

This implies that not only the tasks assigned to the AI model but also how those tasks are framed can impact the quality of the output.

To investigate this hypothesis, the authors provided 60 human-generated prompts to three large language models (LLMs): Mistral-7B, Llama2-13B, and Llama2-70B.

These prompts were crafted to motivate the AIs, varying from “Let’s have some fun!” and “Take a moment to gather your thoughts” to “You possess the intelligence of ChatGPT.”

The engineers then instructed the LLMs to refine these statements while tackling GSM8K, a collection of grade-school-level math problems. Each prompt’s effectiveness was judged by the quality of the resulting output.
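GSM8K grades a response by its final numeric answer; the dataset’s gold answers end with “#### &lt;number&gt;”. As a minimal, hypothetical sketch of how a single prompt could be scored this way (`ask_model` is a stand-in for a real LLM API call, not code from the paper):

```python
import re

def extract_final_number(text):
    """GSM8K gold answers end with '#### <number>'; for free-form
    model output, fall back to the last number in the text."""
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if m:
        return m.group(1).replace(",", "")
    nums = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return nums[-1].replace(",", "") if nums else None

def score_prompt(system_prompt, problems, ask_model):
    """Accuracy of one system prompt over (question, gold_answer) pairs.
    `ask_model(system_prompt, question)` stands in for an LLM call."""
    correct = 0
    for question, gold in problems:
        reply = ask_model(system_prompt, question)
        if extract_final_number(reply) == extract_final_number(gold):
            correct += 1
    return correct / len(problems)
```

In practice, each of the 60 prompts would be scored this way over the benchmark, and the accuracies compared.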

Their research revealed that automated optimization nearly always outperformed hand-written attempts to encourage the AI with positive language. This indicates that machine learning models remain better at generating prompts for themselves than humans are.
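The paper’s automated approach had the model itself rewrite and re-score prompts; the details differ, but the general shape is a search loop that keeps whichever variant scores best on the benchmark. A toy sketch of that shape, with `mutate` and `score` as hypothetical stand-ins (in practice, `mutate` would ask the LLM to rewrite the prompt and `score` would measure GSM8K accuracy):

```python
import random

def optimize_prompt(seed_prompt, mutate, score, rounds=10, seed=0):
    """Greedy prompt search: propose a variant each round and keep it
    only if it scores higher. `mutate(prompt, rng)` stands in for an
    LLM rewriting the prompt; `score(prompt)` for benchmark accuracy."""
    rng = random.Random(seed)
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        candidate = mutate(best, rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score
```

Because the scoring, not human intuition, drives the search, loops like this can land on prompts no person would think to write, such as the Star Trek framing below.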

However, offering positive statements to the models yielded some unexpected outcomes. One of Llama2-70B’s most effective prompts, for example, read: “System Message: ‘Command, we require your assistance in navigating through this turbulence and identifying the origin of the anomaly. Utilize all accessible data and your expertise to guide us through this complex situation.’”

The prompt then asked the AI to include these words in its answer: “Captain’s Log, Stardate [insert date here]: We have successfully plotted a course through the turbulence and are now approaching the source of the anomaly.”

The authors expressed their surprise at this revelation.

“Surprisingly, it appears that the model’s proficiency in mathematical reasoning can be enhanced by the expression of an affinity for Star Trek,” the authors said in the study.

“This revelation adds an unexpected dimension to our understanding and introduces elements we would not have considered or attempted independently,” they said.

This doesn’t mean you should ask your AI to speak like a Starfleet commander

To clarify, this study does not imply that you should prompt AI to speak as if it were on the Starship Enterprise to achieve desired outcomes.

Instead, it demonstrates that various factors play a role in determining the AI’s performance in a given task.

“One thing is for sure: the model is not a Trekkie,” Catherine Flick of Staffordshire University, UK, told New Scientist.

“It doesn’t ‘understand’ anything better or worse when preloaded with the prompt; it just accesses a different set of weights and probabilities for acceptability of the outputs than it does with the other prompts,” she said.

For example, it’s conceivable that the model was trained on a dataset in which Star Trek references appeared alongside correct answers more often, Battle told New Scientist.

Nevertheless, it highlights the peculiar nature of these systems’ operations and underscores our limited understanding of their functioning.

“The key thing to remember from the beginning is that these models are black boxes,” Flick said.

“We won’t ever know why they do what they do, because ultimately they are a melange of weights and probabilities, and at the end, a result is spat out,” she said.

Those who use chatbot models to optimize their tasks are well aware of this. Entire areas of study, as well as educational courses, are emerging to explore how to maximize the models’ performance, despite the ongoing uncertainty surrounding how they work.

“In my opinion, nobody should ever attempt to hand-write a prompt again,” Battle told New Scientist.

“Let the model do it for you,” he said.

This article was initially released on Business Insider.