145 comments
rockemsockem · 13 hours ago
It seemed obvious to me, long before modern LLM training, that any sort of training of machine intelligence would have to rely on pirated content. There's just no other viable way to efficiently acquire large quantities of text data. Buying millions of ebooks online would take a lot of effort, and downloading data from publishers isn't something that can be done efficiently (even assuming tech companies negotiated and threw money at them). The only efficient way to access large volumes of media is piracy. The media ecosystem doesn't allow anything else.

loeg · 11 hours ago
> “By downloading through the bit torrent protocol, Meta knew it was facilitating further copyright infringement by acting as a distribution point for other users of pirated books,” the amended complaint notes.

> “Put another way, by opting to use a bit torrent system to download LibGen’s voluminous collection of pirated books, Meta ‘seeded’ pirated books to other users worldwide.”

It is possible to (ab)use the BitTorrent ecosystem and download without sharing at all. I don't know whether that is what Meta did.

svl7 · 6 hours ago
While Meta's use of copyrighted material might actually fall under fair use, I wonder about the implications of having to use the whole source material for training purposes...

Let's say I quote some key parts of a copyrighted book in a way that complies with fair use for a work of mine. In order to find the quoted parts, I have to read the whole book first. To read the book, I need to acquire it. If it was simply pirated, wouldn't that technically be the main issue, rather than the fair use part in their service? I am an absolute layman when it comes to law and am just thinking out loud. It seems to me that admitting to using pirated works could be more problematic in itself, regardless of the resulting fair use, when it is clear that the whole content had to be consumed / processed to get to the result.

crmd · 10 hours ago
I am trying to imagine the legal contortions required for the US Supreme Court to relieve Meta of copyright infringement liability for participating in a BitTorrent swarm (and thereby facilitating "piracy" by others) in this case, while upholding liability for ordinary people using BitTorrent.

Would love if any lawyers here can speculate.
