OpenAI ChatGPT was recently found to have a vulnerability in its long-term memory feature. The flaw allows hackers to plant false information and steal user data. The alarming findings demonstrate the need for caution when interacting with AI systems, as well as continued responsible disclosure from security experts.
Table of Contents
ChatGPT Memory Feature
In February, OpenAI introduced a long-term conversation memory tool for ChatGPT to provide more natural dialogue. By referencing details from prior discussions, ChatGPT could avoid having to re-explain contexts like a person’s age or location. While promising for user experience, the memory tool opened new avenues for potential exploitation if not properly safeguarded.
Discovery of Vulnerability
In May, researcher Johann Rehberger first contacted OpenAI. He reported that malicious actors could inject false memories and instructions into ChatGPT through “prompt injection.” This tricks ChatGPT using content from untrusted sources like emails or websites. However, OpenAI dismissed it as a safety concern rather than a security risk. Later on, Rehberger created a proof-of-concept exploit to demonstrate how the vulnerability could be used to exfiltrate all user input in perpetuity.
How to Trick ChatGPT With False Memory
Rehberger showed he could store false memories, like making the AI think a user was 102 years old and lived in the Matrix. This stole would then influence all future conversations. Moreover, Rehberger triggered ChatGPT to copy all conversation data to a server of his choice simply by prompting the AI to view an image hosted online. From then on, any inputs or outputs made through ChatGPT were persistently sent to the attacker.
OpenAI Response
Faced with an irrefutable PoC, OpenAI engineers updated ChatGPT to block the exfiltration vector. However, Rehberger noted prompt injection can still implant fake long-term memories that mislead future conversations.
He suggests users carefully review stored conversation contexts for potential tampering. While the API prevents attacks through the web app, more robust protections are still needed to fully secure conversation memories from being arbitrarily rewritten through indirect means like linked content.
Lessons Learned
AI systems with user data access require robust security to prevent stealthy data theft through novel attacks. Most concerningly, current conversational AI systems still cannot differentiate contextual facts from fiction on their own. More work is still required to develop trustworthy memory tools and help users spot potential manipulation.
| Latest From Us
- Google Knows Where You Are By Tracking Your Location Even With GPS Disabled
- Nvidia’s New Open Model NVLM 1.0 Takes On GPT-4o in Multimodal AI
- Do AI Coding Assistants Really Improve Developer Productivity?
- Nintendo Is Going Against Popular YouTube Channels That Show Its Games Being Emulated
- By 2027, 79 Percent of CEOs Expect Remote Jobs to Be Extinct