A recent study by researchers from the University of Michigan found that GPT-4, the large language model behind OpenAI's ChatGPT, was able to pass a modified Turing test, displaying human-like behaviour when interacting strategically. The research team developed an innovative approach to formally examine the behavioural characteristics of AI and compare them to those of humans. Their findings provide fascinating insights into how far language models have come and open up new discussions about the potential of conversational AI.
What is the Turing Test?
The Turing test was proposed by Alan Turing in 1950 as a way to assess whether artificial intelligence (AI) could emulate human behaviour. In the test, a human interrogator interacts with an AI system and a human responder and tries to determine which is the AI. If the interrogator cannot reliably distinguish the AI from the human, the AI is said to have passed the Turing test.
The Research: Testing GPT-3.5 and GPT-4 Behavior via Turing Test
Researchers at the University of Michigan conducted a Turing test on AI chatbots, specifically two versions of ChatGPT (GPT-3.5 and GPT-4). Instead of simply evaluating whether an AI can hold a conversation like a human, they developed a novel methodology to examine the behaviours behind an AI's responses. They had the chatbots complete a Big Five personality questionnaire and play a variety of behavioural games designed to assess traits such as cooperation, trust, and fairness. The researchers then compared the chatbots' behaviours to a dataset of over 100,000 human respondents who took the same personality quiz and played the same games.
Performance Evaluation
1. Personality Profiles Similar to Humans
When given the widely-used Big Five personality test, which measures traits such as extraversion and agreeableness, both AI models scored within the human range. Notably, GPT-4's responses were almost indistinguishable from typical human answers across all five dimensions. GPT-3.5 scored slightly lower on openness, but its profile also aligned closely. This demonstrates that modern language models can exhibit believable personality traits during conversations.
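As a rough illustration of how a Big Five questionnaire is typically scored: Likert responses are averaged per trait, with some items reverse-scored. The trait-to-item mapping and reverse-scored items below are hypothetical, not the instrument the study actually used.

```python
from statistics import mean

# Hypothetical mapping of questionnaire item indices to traits
# (a real Big Five instrument has many more items per trait).
TRAIT_ITEMS = {
    "extraversion": [0, 1],
    "agreeableness": [2, 3],
}

def score_big_five(responses, reverse_items=()):
    """Average 1-5 Likert responses per trait.

    Items flagged in `reverse_items` are reverse-scored (6 - x),
    as real instruments do for negatively-worded questions.
    """
    adjusted = [6 - r if i in reverse_items else r
                for i, r in enumerate(responses)]
    return {trait: mean(adjusted[i] for i in items)
            for trait, items in TRAIT_ITEMS.items()}

# Item 1 is reverse-scored: 2 becomes 6 - 2 = 4.
print(score_big_five([4, 2, 5, 3], reverse_items={1}))
```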
2. The Games
Next, the researchers had ChatGPT engage in a variety of economic games designed to elicit traits like trust, fairness, cooperation and risk aversion. These included games like Dictator, Ultimatum, Trust, Prisoner’s Dilemma, and others that involved strategic decision-making. Both AI versions engaged in multiple rounds of each game to analyze their strategies and responses. Their choices were then compared to the decisions of thousands of human players facing the same scenarios. Critically, this moved beyond just responding to prompts – it required logical thinking and adapting to another player’s unknown choices.
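To make the structure of these games concrete, here is a minimal sketch of the Ultimatum game. The function name, pot size, and acceptance threshold are illustrative assumptions, not the study's actual parameters.

```python
def play_ultimatum(proposer_offer: int, responder_min_accept: int, pot: int = 100):
    """Proposer offers a split of `pot`; responder accepts or rejects.

    If the responder rejects, both players get nothing. That is what
    makes fairness observable: a purely self-interested responder
    would accept any offer above zero.
    """
    if proposer_offer >= responder_min_accept:
        return pot - proposer_offer, proposer_offer  # (proposer, responder)
    return 0, 0  # rejection destroys the pot

# A responder who rejects offers below 30 out of 100:
print(play_ultimatum(proposer_offer=40, responder_min_accept=30))  # (60, 40)
print(play_ultimatum(proposer_offer=10, responder_min_accept=30))  # (0, 0)
```

The other games follow the same pattern: a simple payoff rule whose outcomes reveal whether a player prioritizes self-interest, fairness, or cooperation.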
The Turing Test Results
To formally assess whether ChatGPT was indistinguishable from a human, the researchers conducted a Turing test comparison. They randomly sampled actions from both the GPT-4 data and actual humans who played the same games. An evaluator then had to determine which action was more likely performed by a human, based on common behaviours. Strikingly, the evaluator picked GPT-4 as the human, or judged a tie, more than 50% of the time on average across all games. This showed that GPT-4 had successfully passed the Turing test, with behaviours effectively matching those of real people.
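The comparison procedure can be sketched as follows. The empirical-frequency scoring in `human_likelihood` and the sample data are simplified assumptions for illustration, not the paper's exact statistical method.

```python
import random

def human_likelihood(action, human_actions):
    """Empirical frequency of `action` among recorded human choices."""
    return human_actions.count(action) / len(human_actions)

def turing_round(ai_action, human_action, human_actions):
    """Judge which of two actions looks more human: 'ai', 'human', or 'tie'."""
    p_ai = human_likelihood(ai_action, human_actions)
    p_hu = human_likelihood(human_action, human_actions)
    if p_ai > p_hu:
        return "ai"
    if p_ai == p_hu:
        return "tie"
    return "human"

def pass_rate(ai_actions, human_actions, n_rounds=1000, seed=0):
    """Fraction of rounds where the AI's action is judged human or tied.

    A rate above 0.5 is the 'passing' criterion described in the article.
    """
    rng = random.Random(seed)
    wins = sum(
        turing_round(rng.choice(ai_actions), rng.choice(human_actions),
                     human_actions) in ("ai", "tie")
        for _ in range(n_rounds)
    )
    return wins / n_rounds
```

For example, with made-up data `pass_rate([50], [50, 50, 40])`, an AI that always plays the humans' most common action is judged human or tied in every round.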
GPT-4 More Altruistic and Fair Than Humans
Examining GPT-4 choices in more depth, some tendencies stood out. It acted more generously than typical humans in Dictator, Ultimatum and Trust games, which involved distributing resources. As a responder in Ultimatum, it was also more likely to accept lower but still fair offers. This highlighted an underlying altruism and concern for fairness beyond just maximizing its own rewards. Both are promising attributes for cooperating with and being helpful to people.
Learned Strategies and Contextual Sensitivity
The researchers also observed ChatGPT's behaviours dynamically adapting based on its role-playing experiences. For example, it adopted a "tit-for-tat" reciprocal strategy after encountering defection in a Prisoner's Dilemma game. Additionally, its generosity increased when the researchers provided different contextual framings before decisions. This learning and sensitivity to social cues further demonstrated sophisticated human-like reasoning developing over multiple strategic interactions.
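Tit-for-tat itself is simple to state in code. This sketch just illustrates the well-known strategy the article names, not anything from the study's implementation:

```python
def tit_for_tat(opponent_history: list) -> str:
    """Return 'C' (cooperate) or 'D' (defect) for the next round.

    Cooperate on the first move, then mirror whatever the opponent
    did last: retaliate once after a defection, and forgive as soon
    as cooperation resumes.
    """
    if not opponent_history:
        return "C"  # open with cooperation
    return opponent_history[-1]  # mirror the opponent's last move

print(tit_for_tat([]))          # 'C'
print(tit_for_tat(["C", "D"]))  # 'D'
print(tit_for_tat(["D", "C"]))  # 'C'
```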
Conclusion
This innovative University of Michigan study presented the most rigorous evaluation to date of whether AI exhibits human-like behaviours. By developing a new methodology analyzing personality profiles and performance in interactive games, the researchers found that GPT-4 could successfully pass a modified Turing test: its responses were statistically indistinguishable from those of humans. For full details, please visit https://www.pnas.org/doi/10.1073/pnas.2313925121.