Site icon DigiAlps LTD

Hacker Tricks ChatGPT With False Memories to Steal User Data in Perpetuity

Hacker Tricks ChatGPT With False Memories to Steal User Data in Perpetuity

Hacker Tricks ChatGPT With False Memories to Steal User Data in Perpetuity

OpenAI ChatGPT was recently found to have a vulnerability in its long-term memory feature. The flaw allows hackers to plant false information and steal user data. The alarming findings demonstrate the need for caution when interacting with AI systems, as well as continued responsible disclosure from security experts.

ChatGPT Memory Feature

In February, OpenAI introduced a long-term conversation memory tool for ChatGPT to provide more natural dialogue. By referencing details from prior discussions, ChatGPT could avoid having to re-explain contexts like a person’s age or location. While promising for user experience, the memory tool opened new avenues for potential exploitation if not properly safeguarded.

Discovery of Vulnerability

In May, researcher Johann Rehberger first contacted OpenAI. He reported that malicious actors could inject false memories and instructions into ChatGPT through “prompt injection.” This tricks ChatGPT using content from untrusted sources like emails or websites. However, OpenAI dismissed it as a safety concern rather than a security risk. Later on, Rehberger created a proof-of-concept exploit to demonstrate how the vulnerability could be used to exfiltrate all user input in perpetuity.

How to Trick ChatGPT With False Memory

Rehberger showed he could store false memories, like making the AI think a user was 102 years old and lived in the Matrix. This stole would then influence all future conversations. Moreover, Rehberger triggered ChatGPT to copy all conversation data to a server of his choice simply by prompting the AI to view an image hosted online. From then on, any inputs or outputs made through ChatGPT were persistently sent to the attacker.

OpenAI Response

Faced with an irrefutable PoC, OpenAI engineers updated ChatGPT to block the exfiltration vector. However, Rehberger noted prompt injection can still implant fake long-term memories that mislead future conversations.

He suggests users carefully review stored conversation contexts for potential tampering. While the API prevents attacks through the web app, more robust protections are still needed to fully secure conversation memories from being arbitrarily rewritten through indirect means like linked content.

Lessons Learned

AI systems with user data access require robust security to prevent stealthy data theft through novel attacks. Most concerningly, current conversational AI systems still cannot differentiate contextual facts from fiction on their own. More work is still required to develop trustworthy memory tools and help users spot potential manipulation.

| Latest From Us

Exit mobile version