OpenAI weighs risks of releasing ChatGPT text watermarking tool
OpenAI has developed a tool that could detect when text, such as a student's assignment, was written by its AI chatbot, ChatGPT.
However, the company is currently debating whether to release it, citing complexities and potential impacts on the broader ecosystem.
In a statement to TechCrunch, an OpenAI spokesperson confirmed that the company is researching a text watermarking method.
This method, described in a recent Wall Street Journal article, involves embedding an invisible watermark in the AI-generated text, detectable by a separate tool.
The spokesperson emphasized that OpenAI is taking a “deliberate approach” due to “the complexities involved and its likely impact on the broader ecosystem beyond OpenAI.”
“The text watermarking method we’re developing is technically promising,” the spokesperson stated. “But it has important risks we’re weighing while we research alternatives, including susceptibility to circumvention by bad actors and the potential to disproportionately impact groups like non-English speakers.”
This cautious approach contrasts with previous, largely ineffective attempts to detect AI-generated text. OpenAI itself discontinued its earlier AI text detector last year due to its “low rate of accuracy.”
Unlike prior methods, the proposed text watermarking would focus solely on detecting writing from ChatGPT, not from other companies' AI models. It would work by subtly adjusting how ChatGPT selects words, embedding an invisible statistical pattern that a dedicated detection tool could then recognize.
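OpenAI has not published how its watermark works, but the general idea of biasing word selection toward a keyed, pseudorandom subset of the vocabulary is a known technique in the research literature. The toy sketch below is purely illustrative and assumes a hypothetical shared key and a tiny vocabulary: the generator prefers "green" words chosen deterministically from the previous word, and the detector, knowing the key, flags text whose green-word fraction is improbably high.

```python
import hashlib
import random

# Toy sketch of a statistical text watermark. This is NOT OpenAI's actual
# method (which is unpublished); it illustrates the general "bias word
# selection" idea the article describes.

VOCAB = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
         "golf", "hotel", "india", "juliet", "kilo", "lima"]
KEY = "secret-watermark-key"  # hypothetical key shared by generator and detector

def green_list(prev_word: str) -> set:
    """Deterministically mark half the vocabulary as 'green', keyed on context."""
    seed = int(hashlib.sha256((KEY + prev_word).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, len(VOCAB) // 2))

def generate(n_words: int, watermark: bool, seed: int = 0) -> list:
    """Emit random text; when watermarking, sample only from the green list."""
    rng = random.Random(seed)
    words, prev = [], ""
    for _ in range(n_words):
        pool = sorted(green_list(prev)) if watermark else VOCAB
        prev = rng.choice(pool)
        words.append(prev)
    return words

def green_fraction(words: list) -> float:
    """Detector: share of words that fall in the green list for their context."""
    hits, prev = 0, ""
    for w in words:
        if w in green_list(prev):
            hits += 1
        prev = w
    return hits / len(words)

marked = generate(200, watermark=True)
plain = generate(200, watermark=False)
# Watermarked text scores near 1.0; ordinary text hovers around 0.5.
print(round(green_fraction(marked), 2), round(green_fraction(plain), 2))
```

This also makes the article's robustness caveats concrete: paraphrasing a few words barely moves the green fraction, but rewriting the whole text with another model (or round-tripping it through translation) resamples every word and erases the signal.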
Following the Wall Street Journal’s publication, OpenAI updated a May blog post on its research into detecting AI-generated content. The update highlighted that text watermarking has shown “highly accurate” results and effectiveness against localized tampering, such as paraphrasing. However, it has proven “less robust against globalized tampering,” such as using translation systems, rewording with another generative model, or inserting and then deleting special characters between words.
OpenAI admitted that the method is “trivial to circumvent by bad actors.” The update also reiterated concerns about the potential negative impact on non-English speakers, stating that text watermarking could “stigmatize the use of AI as a useful writing tool for non-native English speakers.”