NYTimes Hacked, Source Code Stolen

This seems like a story that should be getting a lot more coverage: The New York Times was evidently hacked and hundreds of gigabytes of their source code released.

An anonymous hacker has claimed to have leaked 270 GB of internal data and source code from The New York Times (NYT) on the controversial image board 4chan.

The leak, reportedly containing over 5,000 repositories and 3.6 million files, was published on June 6, 2024. It has since raised widespread concern and speculation about the potential implications for the historic news organization.

The hacker, who has not been identified, posted a magnet link to the files on 4chan, encouraging users to download and share the data. According to the hacker, the leaked collection comprises uncompressed tar files with fewer than 30 encrypted repositories.

The leaked data reportedly contains a variety of source code, including the blueprints of well-known games like Wordle, email marketing campaigns, and ad reports. The hacker’s message was signed “With love from /aicg/,” a nod to a 4chan community.

While the leak’s legitimacy has not been independently verified, cybersecurity experts and media outlets have expressed serious concerns. The Register reported that it had seen a list of files in the purported leak but had not confirmed their authenticity.

Bryan Lunduke of The Lunduke Journal (who’s covered leaked/hacked material like this before) downloaded the files. He says they add up to 334 GB (maybe the discrepancy with the claimed 270 GB is zipped vs. unzipped size) and thinks they’re real.

  • This dropped June 6.
  • “We are talking about a 334 gigabyte archive containing supposedly 3.6 million and some change files, individual source code files. Massive. Off-the-charts massive.”
  • He thought it might just be every New York Times story ever published, but it doesn’t appear to be. Nor does it look like an email server dump.
  • “This is massive. It almost is making my brain hurt simply going through all of this.”
  • “I went through it. I read a bunch of it in depth. When I say a bunch of it, I mean I spent a long time on it and barely made a dent.”
  • “It truly does look to be over 3 something million source code files.”
  • “The first things I looked through were tremendously boring. It was just stupid JavaScript files dealing with Markdown.” JavaScript is a programming language best known for running in your browser, where it performs a huge variety of front-end tasks. Markdown is a lightweight plain-text markup language, typically converted to HTML, used as a basis for rendering documents in a variety of different formats (standard web page, phone webpage, PDF, online help, etc.).
  • A lot of it appears to be internal website documents.
  • “It’s from a wide variety of stuff. I mean it’s all over the map. We’re talking onboarding documents and technical documents, hiring documents, switchboard documents, user attribute documents, a huge amount of documentation.”
  • Plus actual source code for iOS and Android applications.
  • Lunduke explains legal doctrine on leaked materials and reporting, saying he didn’t commit any crime to obtain the material, which should legally put him in the clear for talking about material therein relevant to the public interest. Normally I’d point out “Hacking is wrong, mkay,” but The New York Times has itself published hacked/leaked/stolen material at least as far back as The Pentagon Papers, so this is a case of the biter bit.
  • “There’s a reasonable assumption that publishing some of this leaked material would be of the public interest… There are a number of policies and other interesting things in place documented within this material that could be of the public interest.”
  • “This does appear to be real. I cannot fathom how all of this could have been created if it wasn’t real.” I am inclined to agree. But! It’s important to note that a real archive can be salted with false information for a variety of nefarious purposes, so caveat lector.
  • “It is an absolutely monstrous amount. Simply searching through it and scanning it is insane. There are over 5,000 individual mini-archives within this link each one appears to represent an individual source code repository, or at least a folder or subfolder within source code repositories.” He says it appears to be just the latest snapshot of each, not the full version history you would find in a source code repository hosted on a service like GitHub.

  • The time stamps on the files look recent.
  • “Man, there’s some funky things going on here.”
  • I am most interested in how internal policies codify/enforce woke social justice priorities, if there are any special instructions for covering Donald Trump (or other Republicans), racial preferences in hiring policies, etc.

    I’m hoping for some juicy revelations…


    7 Responses to “NYTimes Hacked, Source Code Stolen”

    1. Meatwood Flack says:

      Good. I hope it ruins them. Couldn’t have happened to a finer group of muckrakers.

    2. cthulhu says:

      That xkcd is completely accurate about Git. Hands-down the worst version control system I’ve ever used, and I’ve used a bunch, including version control for things like CDC Cyber 175 NOS/BE, which was barely an OS.

      If this is real, it couldn’t happen to a more deserving bunch.

    3. Greg the Class Traitor says:

      270 GB of internal data and source code from The New York Times

      Seriously, WTF? That’s got to be a lot of internal data, because there’s nothing they’re doing that should take more than a couple GB of source code

    4. Greg the Class Traitor says:

      Git’s merge feature totally sucks. I’ve got far better tools on my computer than Git for that, so when I have to do a merge, I tell Git to just take all the other person’s code, and do the merge on my own.

      But other than that it’s a perfectly functional source code control system

    5. Lubert Das says:

      I wouldn’t be surprised if there are some references to how Trump and the GOP in general should be reported upon, but probably not as much as you’d think. The Progressive mind virus is strong with this lot, and I don’t think we’re going to find a lot of evidence of Two-Minute Hates going on.

    6. […] WONDER WHY THIS ISN’T A BIGGER STORY: NYTimes Hacked, Source Code Stolen. “We are talking about a 334 gigabyte archive containing supposedly 3.6 million and some change […]

    7. Malthus says:

      “I don’t think we’re going to find a lot of evidence of Two-Minute Hates going on.”

      4chan is Emmanuel Goldstein’s alter ego. 4chan keeps the hoi polloi in a continuous state of nervous apprehension. What diabolical plot will 4chan unveil next? Will 4chan hack the access code to the nuclear football and direct a thermonuclear attack on Trump’s enemies?

      If the FBI can track down and arrest anyone even tangentially related to the J6 protests, how is it that 4chan eludes detection? Occam’s Razor suggests that 4chan is a disinformation operation. If we were to rendition John Owen Brennan to El Salvador, he may be persuaded to tell us more about his role in this.
