Digital Product Studio

OpenAI GPT-4 Passes the Turing Test According to New Research, Outsmarting Humans

A recent study conducted by researchers from the University of Michigan found that GPT-4, the large language model created by OpenAI, was able to pass a modified Turing test: in strategic interactions, its behaviour was statistically hard to tell apart from that of humans. The research team developed a new approach to examining the behavioural characteristics of AI and comparing them with those of humans. Their findings show how far language models have come and open up new discussions around the potential of conversational AI.

What is the Turing Test?

The Turing test was proposed by Alan Turing in 1950 as a way to assess whether artificial intelligence (AI) could emulate human behaviour. In the test, a human interrogator interacts with both an AI system and a human responder and tries to determine which is the AI. If the interrogator cannot reliably distinguish the AI from the human, the AI is said to have passed the Turing test.

The Research: Testing GPT-3.5 and GPT-4 Behavior via Turing Test

Researchers at the University of Michigan conducted a Turing test on AI chatbots, specifically different versions of ChatGPT (GPT-3.5 and GPT-4). Instead of just evaluating if an AI can hold a conversation like a human, the researchers developed a novel methodology to examine the behaviours behind an AI’s responses. They had the chatbots take a Big Five personality questionnaire and play various behavioural games designed to assess cooperation, trust, fairness, etc. Researchers compared the chatbot behaviours to a dataset of over 100,000 human respondents who took the same personality quiz and played the same games.

Performance Evaluation

1. Personality Profiles Similar to Humans

When given the widely used Big Five personality test, which measures traits such as extraversion and agreeableness, both AI models scored within the human range. GPT-4's responses were almost indistinguishable from typical human answers across all five dimensions. GPT-3.5 scored slightly lower on openness, but its profile also aligned closely with the human data. This suggests that modern language models can exhibit believable personality traits during conversations.

2. The Games

Next, the researchers had ChatGPT engage in a variety of economic games designed to elicit traits like trust, fairness, cooperation and risk aversion. These included Dictator, Ultimatum, Trust, Prisoner’s Dilemma, and other games that involve strategic decision-making. Both AI versions played multiple rounds of each game so that their strategies and responses could be analyzed. Their choices were then compared to the decisions of thousands of human players facing the same scenarios. Critically, this went beyond simply responding to prompts: it required logical thinking and adapting to another player’s unknown choices.

The Turing Test Results

To formally assess whether ChatGPT was indistinguishable from a human, the researchers conducted a Turing test comparison. They randomly sampled actions from both the GPT-4 data and the actual humans who played the same games. An evaluator then had to determine which action was more likely to have been performed by a human, based on how common that behaviour is among people. Averaged across all games, the evaluator picked GPT-4's action as the human one, or called a tie, significantly more than 50% of the time. By this criterion, GPT-4 passed the Turing test, with behaviours effectively matching those of real people.
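The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's code: the function name `turing_comparison`, the toy data, and the choice to score "more likely human" by how often an action appears in the human pool are all assumptions made for this example.

```python
import random
from collections import Counter

def turing_comparison(ai_actions, human_actions, trials=10_000, seed=0):
    """Repeatedly draw one AI action and one human action at random and
    ask which is more common among humans (an assumed stand-in for the
    evaluator's judgement). Returns the share of each outcome."""
    rng = random.Random(seed)
    human_counts = Counter(human_actions)  # frequency of each action among humans
    wins = ties = losses = 0
    for _ in range(trials):
        ai_pick = rng.choice(ai_actions)
        human_pick = rng.choice(human_actions)
        ai_score = human_counts[ai_pick]        # how typical the AI action is for humans
        human_score = human_counts[human_pick]  # how typical the human action is for humans
        if ai_score > human_score:
            wins += 1
        elif ai_score == human_score:
            ties += 1
        else:
            losses += 1
    return {
        "ai_more_humanlike": wins / trials,
        "tie": ties / trials,
        "human_more_humanlike": losses / trials,
    }

# Toy example (made-up numbers): amounts given away in a Dictator game.
humans = [5] * 6 + [0] * 2 + [3] * 2   # most humans give 5
chatbot = [5, 5, 5, 4]                 # hypothetical chatbot choices
print(turing_comparison(chatbot, humans, trials=1000, seed=1))
```

In this sketch the AI "passes" when the combined share of `ai_more_humanlike` and `tie` exceeds 50%, which mirrors the threshold mentioned in the article.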

GPT-4 More Altruistic and Fair Than Humans

A closer look at GPT-4's choices revealed some clear tendencies. It acted more generously than typical humans in the Dictator, Ultimatum and Trust games, which involve distributing resources. As a responder in the Ultimatum game, it was also more likely to accept lower but still fair offers. This points to a tendency towards altruism and fairness that goes beyond simply maximizing its own rewards. Both are promising attributes for cooperating with and being helpful to people.

Learned Strategies and Contextual Sensitivity

The researchers also observed ChatGPT’s behaviour adapting dynamically based on its role-playing experiences. For example, it adopted a reciprocal “tit-for-tat” strategy after encountering defection in a Prisoner’s Dilemma game. Its generosity also changed when it was given different contextual framings before making a decision. This learning and sensitivity to social cues suggest sophisticated, human-like reasoning that develops over multiple strategic interactions.
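For readers unfamiliar with tit-for-tat, the strategy is simple to state: cooperate first, then copy whatever the other player did in the previous round. A minimal sketch follows. The function names and the "C"/"D" encoding for cooperate and defect are assumptions for illustration, and the code shows the textbook strategy rather than the chatbot's actual decision process.

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first round, then mirror the opponent's last move."""
    if not opponent_history:
        return "C"
    return opponent_history[-1]

def play(opponent_moves):
    """Return the tit-for-tat player's move in each round, given the
    opponent's full sequence of moves ("C" = cooperate, "D" = defect)."""
    my_moves = []
    for round_index in range(len(opponent_moves)):
        # Only the opponent's moves from earlier rounds are visible.
        my_moves.append(tit_for_tat(opponent_moves[:round_index]))
    return my_moves

# The opponent defects in rounds 2 and 3; the response lags by one round.
print(play(["C", "D", "D", "C"]))  # ['C', 'C', 'D', 'D']
```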

Conclusion

This University of Michigan study offers a rigorous evaluation of whether AI exhibits human-like behaviours. Using a new methodology that analyzes personality profiles and performance in interactive games, the researchers found that GPT-4 could pass a modern version of the Turing test: across the games, its choices were statistically hard to distinguish from those of humans. For full details, please visit https://www.pnas.org/doi/10.1073/pnas.2313925121.

Faizan Ali Naqvi

Research is my hobby and I love to learn new skills. I make sure that every piece of content that you read on this blog is easy to understand and fact checked!
