Thursday, June 4, 2015

How Imgur saved Feminist Hacker Barbie behind the scenes

Back in November 2014, the internet erupted in rage over discovering Mattel's/Barbie's "I can be a computer engineer!" book. Spoilers: the book basically tells girls they need boys to code and fix their computer problems for them.

Inspired by the amazing Kate Compton (master of the pink and glittery Crystal Code Palace of hardcore AI tutorials, including the JavaScript grammar expansion library Tracery), I whipped up a new meme-generator website called Feminist Hacker Barbie to stoke the flames/give folks on the internet a creative and constructive outlet to rewrite pages of the book themselves.

A new #FeministHackerBarbie page:
Why ELSE would Barbie be hacking on so many computers at once??
Actually there were many great potential explanations for this glorious scene.


This post isn't about #FeministHackerBarbie itself. NPR, Wired, The Verge, and other such coverage exists already.

This post is meant to be a brief glimpse behind the scenes: one particular design choice I made out of love and laziness, and how that totally saved my ass.

Imgur is great

I'm a lurker on Imgur. I don't post, I just observe and absorb the weird community memes. I probably check it more than Facebook because it's more interesting and clever overall. Anyway, Imgur has an API.

Since I was making a website that let people create their own new images, I needed a place to host these images. I have made many a system to accept and process user images (PhotoCity, Sketch-a-bit, Picard) but it's a hassle and there's a risk of running out of disk space or people uploading terrible things. I heard that Imgur had an API, so I used my Barbie hacking as an excuse to learn it and try it out. It was super easy. You post some image data to Imgur, and you get back an image key.

The flow of my website was something like this:

  1. user creates a new image by typing some text that gets rendered into an image using HTML5 canvas stuff
  2. canvas gets converted to a base64-encoded data URL with canvas.toDataURL()
  3. base64-encoded image data gets sent to imgur, imgur returns a key
  4. imgur key saved by being posted back to feministhackerbarbie site
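
Steps 2 and 3 can be sketched outside the browser. The real site did this in client-side JavaScript; what follows is only a Python approximation of the same idea. It assumes Imgur's v3 upload endpoint (https://api.imgur.com/3/image) with a "Client-ID" authorization header, and the helper names and the client id are placeholders I made up for illustration, not anything from the original site.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Imgur's v3 anonymous image upload.
IMGUR_UPLOAD_URL = "https://api.imgur.com/3/image"


def payload_from_data_url(data_url):
    """Return the base64 payload of a data URL (what the canvas step produces)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return payload


def build_upload_request(data_url, client_id):
    """Build (but do not send) the POST request that uploads the image."""
    form = urllib.parse.urlencode({
        "image": payload_from_data_url(data_url),
        "type": "base64",
    }).encode("ascii")
    return urllib.request.Request(
        IMGUR_UPLOAD_URL,
        data=form,
        headers={"Authorization": "Client-ID " + client_id},
        method="POST",
    )


def image_id_from_response(body):
    """Pull the image key out of the JSON response body."""
    parsed = json.loads(body)
    if not parsed.get("success"):
        raise RuntimeError("Imgur upload failed")
    return parsed["data"]["id"]
```

To actually upload, you would pass the request to urllib.request.urlopen(), read the response body, and hand it to image_id_from_response(). The key that comes back is the only thing the site itself needs to store.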


The TTD metric

What happens when you open up a hole on the internet where users can submit their own content? The internet fills that hole with dicks.

There's a metric called "time to dicks" that is basically how long it takes before users submit penises. They could be photos, drawings, or even penis-shaped SPORE creatures.

For FeministHackerBarbie, there was actually (luckily?) a wave of Richard Stallman pictures before the straight-up porn. Still pretty creepy, though.


When the site got Stallman'd

Wait, I thought the site just allowed you to re-write Barbie pages. What are these other photos doing in there??? It got hacked, kids. The "learning how the internet works" kind of hacking, I think.

Django's CSRF hacked

Let's let Barbie explain how the site was hacked:

Conveniently, someone shared the script they were using to hack the site.

  • someone wrote a really simple python script that took an imgur image id as a command line argument
  • the script generated a random number to serve as the csrf (cross site request forgery protection) token
  • then the script posted some data, including the imgur id and the fake csrf token, to a URL on my site called /save_page
  • the result: the imgur image appearing as a recent submission to my website


I was just looking into WHY this happened. How the hell could a random CSRF token bypass whatever magical security I thought Django was promising me?? It turns out that while Django complains if you DON'T pass CSRF tokens around, it doesn't actually care if the CSRF-checking mechanism is turned on or not. I realize now I should have either:


Back to why I love Imgur


So, hacking was bound to happen. I'm not surprised and I learned a lot from the experience. When it did happen, I didn't actually try to fix it (or enter into an arms race with the hackers), I completely shut down the ability to upload content.

People could still use the site to generate the images and then post them themselves, and it was really cool to see the meme live on on Twitter for a while.

But while the hacking was happening, relying on Imgur to host the images was beneficial in many ways:

  1. No risk of me running out of disk space (the site was hosted on Heroku, which doesn't even have free persistent storage)
  2. I spent all of 0.5 seconds looking at the website when there was porn on it and took refuge in the MySQL command line instead. Since people were posting the same pornographic imgur ids over and over, it was easy to find and delete the duplicate ids. 
  3. A backchannel chatroom claimed they'd successfully posted a variety of terrible, exploitative images, including ones that would be illegal to have on one's computer. But because images went straight to Imgur, none of them ever went through my server. 
  4. Imgur probably has means of dealing with terrible user-submitted content, since that is the basis of their entire existence.
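
The cleanup in item 2 boils down to a GROUP BY query. Here is a rough sketch of the idea, using Python's built-in sqlite3 as a stand-in for the MySQL command line I actually used. The table name "pages" and column name "imgur_id" are made up for the example, and it assumes that any id submitted more than once is spam, so every copy gets deleted.

```python
import sqlite3


def delete_repeated_ids(conn, min_count=2):
    """Delete every row whose imgur_id appears at least min_count times.

    Returns the offending ids so they can be reviewed afterwards.
    """
    rows = conn.execute(
        "SELECT imgur_id FROM pages GROUP BY imgur_id HAVING COUNT(*) >= ?",
        (min_count,),
    ).fetchall()
    repeated = [row[0] for row in rows]
    # Remove all copies of each repeated id, not just the extras.
    conn.executemany(
        "DELETE FROM pages WHERE imgur_id = ?",
        [(imgur_id,) for imgur_id in repeated],
    )
    conn.commit()
    return repeated
```

The nice part is that this works entirely on ids, so you never have to look at the images themselves.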

Lessons

I'm trying to come up with some good take-aways here. When you make things on the internet that let people submit their own content, you need to be at least a little bit careful. Here are some ways to be careful:

  1. Imgur is a wonderful home for internet memes including #FeministHackerBarbie, and it might be appropriate for whatever you're trying to do!
  2. Showing unmoderated user-submitted content on the front page (like I originally did) is not a great idea. It helped the site take off in the first ~18 hours, but then it was like a siren call to the hacker trolls. "Put porn here to shock and upset the general internet!!"
  3. Check that your CSRF tokens are actually working and protecting your Django site. Here's an ever so slightly modified version of the Barbie hack code if you wanted to try it on your own Django site: http://pastie.org/10224586
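
If you'd rather not copy a hack script, the same check can be written as a small self-test. This is a minimal sketch, not the script from the pastie: it posts a made-up token to a URL on your own site and looks for the 403 Forbidden that Django returns when its CSRF check fails. The field name "csrfmiddlewaretoken" is the one Django forms use; the function name, the "imgur_id" field, and the localhost URL in the usage note are hypothetical.

```python
import secrets
import urllib.error
import urllib.parse
import urllib.request


def csrf_is_enforced(url, fields, send=urllib.request.urlopen):
    """Return True if a POST carrying a made-up CSRF token is rejected with 403."""
    form = dict(fields)
    # A random token the site never issued, like the one the hack script used.
    form["csrfmiddlewaretoken"] = secrets.token_hex(16)
    request = urllib.request.Request(
        url,
        data=urllib.parse.urlencode(form).encode("ascii"),
        method="POST",
    )
    try:
        send(request)
    except urllib.error.HTTPError as error:
        if error.code == 403:
            return True
        raise  # some other failure (404, 500, ...): not a CSRF verdict either way
    return False
```

Something like csrf_is_enforced("http://localhost:8000/save_page", {"imgur_id": "abc"}) should come back True on a protected site. If it comes back False, the fake submission went through, which is exactly what happened to me. Only point this at sites you run yourself.
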
And that's about it!
