OpenAI clashes with NYT over user data in AI copyright battle
OpenAI pushes back on court order to hand over 20M ChatGPT convos, citing privacy risks
A new twist in the escalating copyright war between OpenAI and The New York Times is raising alarm bells about privacy, data access, and how far publishers are willing to go to challenge AI training methods.
This week, OpenAI asked a US federal judge to block an order requiring it to hand over 20 million anonymized ChatGPT conversations to The New York Times and other publishers. The court had previously ruled in favor of the data handover as part of a high-profile copyright lawsuit accusing OpenAI of illegally training its models on publisher-owned content.
OpenAI’s defense? Most of those conversations have nothing to do with the case. Releasing them could also jeopardize user privacy on a massive scale. This article explores what’s behind the court battle, why marketers should pay attention, and what this means for data ethics in AI.
Short on time?
Here’s a table of contents for quick access:
- What’s happening in the OpenAI vs. NYT lawsuit
- Why ChatGPT data is the new legal battleground
- What marketers should know

What’s happening in the OpenAI vs. NYT lawsuit
OpenAI is facing mounting legal pressure over how it sourced content to train ChatGPT, with The New York Times and other publishers arguing that the company scraped and reused copyrighted articles without permission.
On November 13, OpenAI filed a motion opposing a court order that compels it to hand over 20 million anonymized chat logs from regular ChatGPT users. The company claims that “99.99%” of those chats are irrelevant to the allegations and that fulfilling the request would expose private user data, even if anonymized.
In its statement, OpenAI warned that the court’s order could affect anyone who has used ChatGPT over the last three years. Dane Stuckey, the company’s Chief Information Security Officer, wrote in a blog post that the request “disregards long-standing privacy protections” and could compromise millions of conversations from individuals unrelated to the lawsuit.
Previously, The New York Times had requested access to as many as 1.4 billion conversations and reportedly rejected OpenAI’s proposed compromise: running targeted keyword searches over the logs instead of disclosing the data wholesale.
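To make the difference in scope concrete, here’s a minimal Python sketch of the kind of targeted keyword filtering OpenAI proposed, as opposed to turning over every conversation. The file layout, field names, and search terms below are hypothetical illustrations, not details from the court filings.

```python
import json

# Hypothetical search terms tied to the claims at issue; the actual
# terms negotiated in discovery are not public.
KEYWORDS = {"new york times", "nytimes.com"}

def matching_conversations(path):
    """Yield only conversations that mention a search term,
    so everything else never has to be disclosed."""
    with open(path, encoding="utf-8") as f:
        for line in f:  # assumed layout: one JSON conversation per line
            convo = json.loads(line)
            text = " ".join(msg["text"] for msg in convo["messages"]).lower()
            if any(term in text for term in KEYWORDS):
                yield convo

if __name__ == "__main__":
    hits = list(matching_conversations("chat_logs.jsonl"))  # hypothetical file
    print(f"{len(hits)} conversations matched; the rest stay undisclosed")
```

The appeal of a filter like this, from OpenAI’s side, is that everything outside the match set never leaves the producing party’s hands, which is exactly what the dispute is about.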
Despite OpenAI’s pushback, Magistrate Judge Ona Wang, who initially approved the order, emphasized that the dataset would be thoroughly de-identified under legal safeguards.
The deadline for OpenAI to comply is Friday.
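What does “de-identified” look like in practice? Nobody outside the case has published the actual protocol, but a first-pass redaction layer typically resembles the sketch below. The regex patterns and hashing scheme here are illustrative assumptions, not the court-approved safeguards.

```python
import hashlib
import re

# Assumed patterns for two common direct identifiers; a real
# de-identification protocol would cover far more (names, addresses,
# account numbers, and so on).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def deidentify(text: str) -> str:
    """Swap direct identifiers for placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def pseudonymize(user_id: str) -> str:
    """Replace a stable account ID with a one-way hash."""
    return hashlib.sha256(user_id.encode()).hexdigest()[:12]

print(deidentify("Reach me at jane@example.com or +1 212 555 0100."))
# -> Reach me at [EMAIL] or [PHONE].
print(pseudonymize("user-42"))  # stable but opaque 12-character token
```

OpenAI’s objection, as noted above, is that this kind of scrubbing isn’t airtight: a conversation can identify someone through its content, not just through explicit fields like emails and phone numbers.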
Why ChatGPT data is the new legal battleground
This isn’t just about chat logs or training data. It’s about precedent. If courts require generative AI companies to reveal user-level interactions as part of copyright discovery, it could reshape how AI firms collect, use, and defend their data sources going forward.
The case touches on several unresolved legal questions:
- Does training a model on copyrighted content constitute infringement?
- Can AI outputs that replicate protected material be considered derivative works?
- And most urgently: how should courts balance copyright enforcement with user privacy in generative AI contexts?
Earlier this month, Getty Images lost most of its case against Stability AI in London, where a judge found insufficient grounds for broad copyright violations tied to AI training. That decision may signal that courts are still cautious about labeling AI training as outright infringement. Still, the matter is far from settled.
This makes the OpenAI vs. NYT lawsuit a bellwether for marketers, publishers, and technologists alike. The outcome could influence how AI products are built, what datasets are considered “fair use,” and how public trust around data handling evolves.
What marketers should know
Marketers aren’t in the courtroom, but they’re certainly in the splash zone. Here’s why this case matters for your strategy:
1. Expect tighter scrutiny on AI data sources
Regulatory conversations around generative AI are heating up. If OpenAI or other firms are forced to open their training datasets to discovery or public scrutiny, you can expect tighter compliance requirements around data origin, licensing, and usage disclosure, especially for enterprise-facing AI tools.
2. Privacy expectations will shape brand trust
OpenAI’s stance hinges on protecting user trust. Whether that message sticks could influence how customers view AI interactions in your own products or services. Brands using generative tools should be ready to articulate where their data goes, who sees it, and how it’s protected.
3. AI-generated content policies may shift
Depending on the court’s ruling, publishers may become more aggressive about policing AI-generated outputs that resemble their articles. This could affect how marketing teams deploy AI tools for content generation, especially if they rely on models trained on publicly scraped content.
4. Legal gray areas still rule
This lawsuit underscores how unsettled AI copyright law remains. Marketers working with AI vendors need to build flexibility into contracts and compliance language, especially as courts and lawmakers start catching up with the tech.
As AI lawsuits move from theory to precedent, brands should stay plugged into how legal frameworks around training data, privacy, and output liability evolve. OpenAI’s legal drama isn’t just about tech. It’s about trust, ownership, and how businesses will navigate a future where AI touches every part of the marketing stack.


