An anonymous hacker has claimed to have leaked 270 GB of internal data and source code from The New York Times (NYT) on the controversial image board 4chan.
The leak, reportedly containing over 5,000 repositories and 3.6 million files, was published on June 6, 2024. It has since raised widespread concern and speculation about the potential implications for the historic news organization.
The hacker, who has not been identified, posted a magnet link to the files on 4chan, encouraging users to download and share the data. According to the hacker, the leaked collection consists of uncompressed tar files, of which fewer than 30 repositories are encrypted.
The leaked data reportedly contains a variety of material, including the source code for well-known games such as Wordle, along with files on email marketing campaigns and ad reports. The hacker's message was signed "With love from /aicg/," a nod to a 4chan community.
While the leak’s legitimacy has not been independently verified, cybersecurity experts and media outlets have expressed serious concerns. The Register reported that it had seen a list of files in the purported leak but had not confirmed their authenticity.
Bryan Lunduke of The Lunduke Journal (who's covered leaked/hacked material like this before) downloaded the files. He says they come to 334 GB (the gap from the claimed 270 GB may be compressed vs. uncompressed size) and thinks they're real.
This dropped June 6.
“We are talking about a 334 gigabyte archive containing supposedly 3.6 million and some change files, individual source code files. Massive. Off-the-charts massive.”
He thought it might just be every New York Times story ever published, but it doesn't appear to be. Nor does it look like an email server dump.
“This is massive. It almost is making my brain hurt simply going through all of this.”
“I went through it. I read a bunch of it in depth. When I say a bunch of it, I mean I spent a long time on it and barely made a dent.”
“It truly does look to be over 3 something million source code files.”
"The first things I looked through were tremendously boring. It was just stupid JavaScript files dealing with Markdown." JavaScript is a programming language best known for running in your browser, where it performs a huge variety of front-end tasks. Markdown is a lightweight text markup language that gets converted into HTML and other formats, so a single document can be rendered as a standard web page, a mobile web page, a PDF, online help, etc.
A lot of it appears to be internal website documents.
“It’s from a wide variety of stuff. I mean it’s all over the map. We’re talking onboarding documents and technical documents, hiring documents, switchboard documents, user attribute documents, a huge amount of documentation.”
Plus actual source code for iOS and Android applications.
Lunduke explains the legal doctrine on leaked materials and reporting, saying he didn't commit any crime to obtain the material, which should legally put him in the clear for discussing anything in it that is relevant to the public interest. Normally I'd point out "Hacking is wrong, mkay," but The New York Times has itself published hacked/leaked/stolen material at least as far back as the Pentagon Papers, so this is a case of the biter bit.
"There's a reasonable assumption that publishing some of this leaked material would be of the public interest… There are a number of policies and other interesting things in place documented within this material that could be of the public interest."
“This does appear to be real. I cannot fathom how all of this could have been created if it wasn’t real.” I am inclined to agree. But! It’s important to note that a real archive can be salted with false information for a variety of nefarious purposes, so caveat lector.
"It is an absolutely monstrous amount. Simply searching through it and scanning it is insane. There are over 5,000 individual mini-archives within this link. Each one appears to represent an individual source code repository, or at least a folder or subfolder within source code repositories." He says it appears to be just the latest snapshot of each repository, not the full version history you would find in a Git repository hosted on a service like GitHub.
The timestamps on the files look recent.
“Man, there’s some funky things going on here.”
I am most interested in how internal policies codify/enforce woke social justice priorities, if there are any special instructions for covering Donald Trump (or other Republicans), racial preferences in hiring policies, etc.