Meta Rips Off The Author And Passes The Savings On To Skynet

It turns out that Meta, AKA Facebook, used a giant database of pirated books known as “Books3” for their generative AI training efforts.

Indeed, you can now search an index to see who was ripped off.

Did they rip me off? Not by name, as I have no published novels, but they did rip off Mike Ashley’s The Mammoth Book of Extreme Science Fiction, which has my story “Crucifixion Variations” in it, so yeah.

They ripped off Howard Waldrop:

Dream Factories and Radio Pictures

Going Home Again: Stories

Horse of a Different Color

Other Worlds, Better Lives

Things Will Never Be the Same

They ripped off a whole lot of Joe R. Lansdale.

They ripped off a whole lot of George R. R. Martin (in multiple languages).

There’s already been a lawsuit filed against Meta by Richard Kadrey, Sarah Silverman and Christopher Golden over using their material for training AIs, but there seems to be no mention of pirated books or Books3.

The fact that Meta is not only training AI on authors’ works without their permission, but using pirated copies to do so, adds insult to injury.

And probably additional monetary damages from the resulting lawsuits.

I expect the latest piracy revelations to lead to a whole host of new lawsuits…

Tags: AI, Crime, Facebook, Joe R. Lansdale, Media Watch, Richard Kadrey, Sarah Silverman, technology

This entry was posted on Wednesday, September 27th, 2023 at 4:46 PM and is filed under Crime, Media Watch.

12 Responses to “Meta Rips Off The Author And Passes The Savings On To Skynet”

Meatwood Flack says:

September 27, 2023 at 6:24 PM

In other words, Mark “serial IP thief” Zuckerberg has struck again. He’s lucky too, what with CA’s repeal of the 3 strikes law.
10x25mm says:

September 28, 2023 at 9:18 AM

You have to wonder whether FakeBook’s ripoff occurred in the United States or another country. The Marshall Islands appears to offer no copyright protection whatsoever.

Disney really screwed legitimate AI training in the United States with their 95 / 120 year Copyright Act enhancement now enshrined in Title 17 of the United States Code. Since most other countries have much shorter copyright protection periods, legitimate AI development is likely to migrate to foreign countries.
Instapundit » Blog Archive » OUCH: Meta Rips Off The Author And Passes The Savings On To Skynet. “The fact that Meta is not only says:

September 28, 2023 at 9:41 AM

[…] Meta Rips Off The Author And Passes The Savings On To Skynet. “The fact that Meta is not only training AI on author’s works without their permission, but […]
Georg Felis says:

September 28, 2023 at 10:01 AM

Interesting how they picked real writers and their best books to train the AI rather than ‘award-winning’ trash like “If you were a dinosaur, my love”
Sabrina Chase says:

September 28, 2023 at 10:24 AM

Books3 and pirated books *are* mentioned in the class action lawsuit brought against OpenAI by the Authors Guild: https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/

One of my books is in that list.
CardanoCrusader says:

September 28, 2023 at 11:01 AM

Given the vast input an AI LLM needs to do its job, it would be REALLY hard to argue that the AI is merely derivative, especially given the proprietary algorithms required to produce the output. If anything qualifies for “fair use”, certainly transforming 500 pages as part of a 20 million page data set would qualify. Not only do the proprietary statistical algorithms add value and differentiate the output, but even the original word-to-number conversion is proprietary.

The conversion of the words of the original work into numbers is already a differentiation. It is an add-on value given to the work by the people who assign the numbers, the weightings and the numerical categorizations to the words. Arguably, the number string derived from a given work is its own entity, unique in value from the original work, and that’s BEFORE it is fed into the algorithms. At that point, the original author of the original work arguably no longer has a copyright claim.

So, then this proprietary number string, with its unique weightings and categorizations, is fed through proprietary algorithms. The output is unique to the algorithms, the weighting and the original number conversion. So, what’s left to copyright? The output stream? How?
jabrwok says:

September 28, 2023 at 12:11 PM

Larry Correia shows up in that searchable index. I wonder if he’s contemplating a lawsuit. Or maybe Baen Books could do so on behalf of all its authors.
Paul says:

September 28, 2023 at 1:07 PM

No, 500 pages or a 500 page book is not fair use. Fair use is based in part on how much of a work is being used compared to the totality of that particular work. So if I grab half a page of a book and footnote it and use it in my work, that’s fair use. Or 2 paragraphs of a 6 paragraph blog post. Since Facebook is using *all* 500 pages of a 500 page book, that would *not* be fair use. That’s using the whole bloody thing. That’s blatant theft of intellectual property.
They are stealing all 20 million pages.
Now, if they grabbed page 10 from 5 thousand books, you *might* have a fair use argument, but that would only get them 5,000 pages, not 20 million.

As for “derivative”… does the LLM exist and function without the 20 million pages of input? No, it does not. It requires the input. Without it, the LLM does not exist. It becomes a tLM (tiny Language Model)(@copyright me). So it is obviously 90% derivative from the body of text it is being fed. Some programmers worked on the algorithms for what? a hundred man-years? The books they are stealing comprise the work of probably hundreds of thousands of man-years of labor and love.

Next, are these companies making money from the use of the entirety of multiple authors’ works? Yes, they are. So they should be completely screwed and bankrupted when this goes to court.

Additionally, LLMs in science, current events, news, pop culture, etc., will quickly age if they don’t continually get up-to-date input. So how is that going to work? You can’t have an AI write an article on the Israeli election without input from somewhere. Where are they receiving that information?

But companies could build LLMs on their own data. Microsoft Press owns the copyright to hundreds or thousands of technical manuals. They are free to use those to build a technical LLM. Similarly, the US military has thousands of manuals and textbooks for machinery, ships, vehicles, weapons, military tactics, strategy, and logistics. DoD should build an internal LLM using all of that plus selected non-DoD works that are either out of copyright or for which they compensate the authors appropriately.

If someone were to digitize every novel and book printed, say, pre-1900, and build an LLM from that, then go for it. Those copyrights should all be expired by now (see Project Gutenberg).

That they don’t consider an AI built on that information viable because it doesn’t have current information shows that the current authors’ works are critical to the success of their systems.
JBalconi says:

September 28, 2023 at 1:33 PM

Dean Koontz could destroy them. They pirated not only his own novels but collaborations.
Paxton Takes On Big Data « Lawrence Person's BattleSwarm Blog says:

June 5, 2024 at 12:05 PM

[…] place to start: Joining in a lawsuit where Facebook’s parent company Meta actually used stolen data to train AI, namely using a giant database of pirated books without paying authors. Paxton’s office could […]
Paxton Wrings $1.4 Billion Settlement From Facebook « Lawrence Person's BattleSwarm Blog says:

July 31, 2024 at 10:47 AM

[…] Illegally stealing information to train AI seems to be a habit with Meta, which is why they’re being sued for using pirated books to train their AI. […]
How odd. The site telling you if your books were ripped off by Meta AI trainer no longer functions. – Moe Lane says:

November 17, 2024 at 11:18 PM

[…] was working fine this morning, and now it’s not. It’s not that Battle Swarm‘s link is bad, either. I had gone earlier to see if my books were in there*, and the link […]

Lawrence Person's BattleSwarm Blog
