Who is the pot and who is the kettle: OpenAI “Investigating theft” by DeepSeek

Last week (end of January 2025) the stock market was rattled when DeepSeek, a Chinese AI company, rolled out a new open-source reasoning model that some say is as good as ChatGPT. (Apparently OpenAI thinks it’s so good that it’s the same.) DeepSeek’s low development costs are what caused Nvidia stock to decline sharply, and DeepSeek became the talk of the internet. I know you don’t care about the impacts on the financial markets, but it’s what brought all of this to the world’s attention. It’s also relevant to the blog… Read on.

Essentially, DeepSeek is able to create its own “ChatGPT” for much less, and perhaps without the level of investment that companies like OpenAI and Nvidia have required. It’s a big deal. It’s also piqued the interest of OpenAI, who (and I really love this) is ‘investigating’ whether DeepSeek improperly took data from OpenAI’s models to build the new AI assistant. OpenAI believes that DeepSeek may have “inappropriately distilled” its models.

What is distillation? (I had to look it up too…) It’s when you take the knowledge of a larger model and condense it into a smaller model. Hmmm… sort of like taking all of the information in a book, or a catalog of music, or content scraped from websites, and using it to train an AI? Maybe, but I digress.
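For the technically curious, here’s a rough sketch of the idea in code. This is a minimal illustration in Python (using the PyTorch library); the function and variable names are mine, not anything from OpenAI or DeepSeek, and real distillation pipelines are far more involved. A small “student” model is trained to mimic the output probabilities of a big “teacher” model:

```python
# Minimal, illustrative sketch of knowledge distillation (all names hypothetical).
# The student never sees the teacher's training data -- only the teacher's answers.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, inputs, temperature=2.0):
    # Ask the big teacher model for its predictions ("soft labels").
    with torch.no_grad():
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)
    # Nudge the student's probability distribution toward the teacher's.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point for our purposes: the student never needs the teacher’s original training data. It only needs the teacher’s answers. Which is exactly why OpenAI is upset.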

In an article from The Hill (link below), a spokesperson from OpenAI actually said the following: “We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here.” Yes, they are serious.

OpenAI investigating whether DeepSeek improperly obtained data

Why do I find this at least ironic, if not completely hilarious? OpenAI has been sued… well, a lot… for essentially ripping off the intellectual property of others. I found a great summary of the cases and their status here:

Generative AI Lawsuits Timeline: Legal Cases vs. OpenAI, Microsoft, Anthropic, Nvidia, Perplexity, Intel and More - Sustainable Tech Partner for Green IT Service Providers

The article notes, “Multiple Canadian media companies have filed suit against OpenAI, alleging that ‘OpenAI regularly breaches copyright and online terms of use by scraping large swaths of content from Canadian media to help develop its products, such as ChatGPT. OpenAI is capitalizing and profiting from the use of this content, without getting permission or compensating content owners.’” In other words, OpenAI is taking the property of others to develop its own products. Sound familiar? See above: OpenAI accuses DeepSeek of taking its property to develop its own products.

But the lawsuits by the Canadian companies, and many others against the various AI companies, speak to the core of what concerns the music and artist community about AI. For an artificial intelligence to learn, the information has to come from somewhere. AI works by processing and interpreting data; it can learn to recognize patterns, make predictions, and make decisions. The problem with AI is that, for now at least, it lacks the spark of humanity that keeps most of us from doing bad shit. Let me explain.

If you are an artist and you take a sample from someone else’s song, you know that you need a license. You may not want to get the license. You may not know how to go about getting it, but you know that if you sample another song, you are likely going to get sued for copyright infringement (and you will probably lose). You may ultimately decide to roll the dice and hope that you don’t get caught, but you are going to have that bad feeling in your chest because you know you are doing something wrong. Let me insert a shameless self-promotion: if you need a license, call me and let’s get it done correctly.

AI doesn’t care. AI isn’t up all night sweating it out. And AI doesn’t stop and think: should I get a license? AI is the Terminator, blowing up cars whether or not someone is in them and trying to kill Sarah Connor, because that is its programming.

The best example of this is the case Getty Images (US) Inc v. Stability AI Inc, U.S. District Court for the District of Delaware, No. 1:23-cv-00135. Essentially, Stability AI takes user prompts and creates an image. The problem is that its model was trained, at least in part, on Getty’s images available online, and in some of its ‘creations’ it simply copied parts of those Getty images. (I have way oversimplified the allegations.) Like, you can see the Getty Images watermark in some of the Stability AI-generated works. Copyright law is pretty clear: you can’t just copy an image without getting permission and possibly paying for its use. But AI companies seem to be relying on a combination of arguments like “we are using the information for educational purposes” or “this is fair use.” (You can read that last sentence with a bit of disdain, since I personally don’t buy either argument.)

What struck me about the claims of OpenAI, and apparently the other companies now jumping on the bandwagon to take down DeepSeek, is that DeepSeek isn’t substantively doing anything different from any of the other AI companies! DeepSeek used other AI to train/educate its own AI. DeepSeek is absolutely the kettle here, but should the other AI companies really want to be the pot?

Let me add the following: I absolutely believe that there are wonderful uses for AI. Perhaps less so in music, art, and other creative ventures. But suggesting phrases while you’re typing, or taking large quantities of data and organizing or synthesizing it: yeah, that’s great. But for these companies to seriously suggest that any of them is doing anything other than ripping off someone else is absurd.

Amy is an intellectual property attorney handling matters for artists, creators, and small businesses in the music and entertainment industry. Book time with her here: Film, Music & Entertainment Contract Attorney | MadJam Music & Entertainment Law, LLC.

