Tuesday, June 10, 2014

In which I battle MTurk External Hits a second time

I wrote a blog post over two years ago about my experiences with external hits on Mechanical Turk. It was rough. The tools were bad, I used them in a way that drove me crazy, and only after much trial and error did I figure things out.

Since then, I've become more learned and wise about many things, especially certain libraries and Amazon tools. This is my attempt to write a more coherent MTurk External Hit tutorial with less swearing. I haven't actually used external hits since then, so this will be a learning adventure for all of us.

Step 1: Get the python library Boto

Boto is "one of the fancy tools built on top of the [crappy Amazon] command line tools" that I referred to in my last post. I've recently been using it to transfer data to/from S3. Get it here! https://github.com/boto/boto (or install with pip)

A wild error appears! 

The specified claims are invalid.   Based on your request, your signature should be generated using the following string: AWSAccessKeyIdXXXXXXXOperationGetAccountBalanceSignatureVersion1Timestamp2014-06-10T22:04:46ZVersion2012-03-25.  Check to make sure your system clock and timezone is not incorrect.  Our current system time: 2014-06-10T22:04:46Z.
I spent at least an hour just now convinced boto was broken... but rather than being related to this bug, it looks like my problem was extra characters typed in the secret access key. I am using boto 2.29.1, which I just installed/upgraded with pip.
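
Since stray characters in the secret key were the culprit here, a quick sanity check before handing credentials to boto might save that hour. Here's a minimal sketch. `check_aws_credentials` is a hypothetical helper (it is not part of boto), and the expected lengths (20 characters for the access key ID, 40 for the secret key) are assumptions based on what AWS keys usually look like.

```python
def check_aws_credentials(access_key_id, secret_access_key):
    """Return a list of problems found; an empty list means the keys look plausible."""
    problems = []
    # The expected lengths are assumptions, not something boto or AWS enforces here.
    for name, value, expected_len in (
        ("access key id", access_key_id, 20),
        ("secret access key", secret_access_key, 40),
    ):
        stripped = value.strip()
        if value != stripped:
            problems.append("%s has leading or trailing whitespace" % name)
        if any(ch.isspace() for ch in stripped):
            problems.append("%s contains whitespace in the middle" % name)
        if len(stripped) != expected_len:
            problems.append(
                "%s is %d characters long, expected %d"
                % (name, len(stripped), expected_len)
            )
    return problems
```

If the returned list isn't empty, it's probably worth re-pasting the keys before blaming boto or your system clock.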

Step 2: Try something simple, like checking your account balance

import boto.mturk.connection

sandbox_host = 'mechanicalturk.sandbox.amazonaws.com'
real_host = 'mechanicalturk.amazonaws.com'

# Point at the sandbox while testing; swap in real_host to spend real money.
mturk = boto.mturk.connection.MTurkConnection(
    aws_access_key_id = 'XXX',
    aws_secret_access_key = 'XXX',
    host = sandbox_host,
    debug = 1 # debug = 2 prints out all requests.
)

print(boto.Version) # 2.29.1
print(mturk.get_account_balance()) # [$10,000.00]
(Gist: https://gist.github.com/ktuite/0cdaca2d574f358bdcd3#file-mturk_boto_intro-py)

Step 3: Try to actually post an external hit

An external hit is a webpage that MTurk loads inside of an iframe so that requesters can design custom tasks that don't fit Amazon's provided templates.

To use boto to do this, you just set up a bunch of details about the hit, like the URL, frame height (how tall the iframe on turk will be), title, description, keywords, and amount paid.

import boto.mturk.question
import boto.mturk.price

# Assumes 'mturk' is the MTurkConnection created in Step 2.
url = "https://the-url-of-my-external-hit"
title = "A special hit!"
description = "The more verbose description of the job!"
keywords = ["cats", "dogs", "rabbits"]
frame_height = 500 # the height of the iframe holding the external hit
amount = .05 # reward per assignment, in dollars

questionform = boto.mturk.question.ExternalQuestion( url, frame_height )

create_hit_result = mturk.create_hit(
    title = title,
    description = description,
    keywords = keywords,
    question = questionform,
    reward = boto.mturk.price.Price( amount = amount),
    response_groups = ( 'Minimal', 'HITDetail' ), # I don't know what response groups are
)
(Gist: https://gist.github.com/ktuite/0cdaca2d574f358bdcd3#file-mturk_external_hit-py)
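
If you're curious what that ExternalQuestion object boils down to, it's a tiny XML document holding the URL and the frame height. Here's a sketch that builds roughly the same thing by hand. `external_question_xml` is a hypothetical helper, not boto's own code, and the exact string boto sends may differ in whitespace or details.

```python
from xml.sax.saxutils import escape

# Namespace of the ExternalQuestion schema.
EXTERNAL_QUESTION_XMLNS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)

def external_question_xml(url, frame_height):
    """Build an ExternalQuestion XML string for the given URL and iframe height."""
    return (
        '<ExternalQuestion xmlns="%s">'
        "<ExternalURL>%s</ExternalURL>"
        "<FrameHeight>%d</FrameHeight>"
        "</ExternalQuestion>"
    ) % (EXTERNAL_QUESTION_XMLNS, escape(url), frame_height)
```

Note the `escape` call: a URL containing `&` has to be XML-escaped, which is one more thing boto takes care of for you.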

Then you can look at your hit:

  • Go to the requester console (or sandbox version requestersandbox.mturk.com)
    • Click Manage
    • Click "Manage HITs Individually" on the upper right
    • Click on the name/title of your hit to expand the detail panel about it
  • You can also log into the worker sandbox (workersandbox.mturk.com) and search for your requester name or your job's title to find it "in the wild".
[Screenshot: Managing MTurk HITs in the requester sandbox]

Aargg! Another error! 

I did all that, and all I saw was an empty box. Where's my webpage?? So I opened up the debugging console in my browser (Chrome: View->Developer->Javascript Console). This time, the error was:

[blocked] The page at '[url]' was loaded over HTTPS, but ran insecure content from '[url]': this content should also be loaded over HTTPS.

Good job, Chrome... you caught me not using HTTPS.  Luckily, my site is written in Django and is hosted on Heroku, which lets you plop an https in front of your app url. So I don't have to go sign up for my own SSL certificate right now. However, this led to a second error:

Refused to display '[https-url]' in a frame because it set 'X-Frame-Options' to 'SAMEORIGIN'.

Around the time of this https error, I had brought Adam over to help me out. He's the one who told me Heroku does https. He also said it was probably my server sending that X-Frame-Options header. This Stack Overflow Q&A also said it was probably the web server's fault. And indeed it was. To get around this problem, I read this Django documentation and added the '@xframe_options_exempt' decorator to the view method serving up my external hit. Sure enough, this fixed it.

Step 4: Posting answers back to Mturk

When the worker is done with the external hit, the external hit needs to phone home/notify Mturk. The submit url is either https://www.mturk.com/mturk/externalSubmit or https://workersandbox.mturk.com/mturk/externalSubmit according to this fine documentation. It turns out that Amazon attaches a GET parameter 'turkSubmitTo' to the URL it loads into the frame, so you can look up the right host without hardcoding whether or not you're using the sandbox. Be sure to add /mturk/externalSubmit to the end of that value.
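
Here's a small sketch of that lookup, assuming you build the submit URL on the server side from the full URL MTurk requested. `external_submit_url` is a hypothetical helper name, and the example URL in the usage note below is made up.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2

def external_submit_url(hit_page_url):
    """Build the form's action URL from the turkSubmitTo GET parameter."""
    params = parse_qs(urlparse(hit_page_url).query)
    # Raises KeyError if the page was not loaded by MTurk (no turkSubmitTo).
    submit_to = params["turkSubmitTo"][0]
    return submit_to.rstrip("/") + "/mturk/externalSubmit"
```

For example, a page loaded with `?assignmentId=ABC&turkSubmitTo=https%3A%2F%2Fworkersandbox.mturk.com` would post back to the sandbox, with no hardcoding needed.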


But what kind of stuff do you submit? Make an HTML form (maybe even with hidden values that your javascript fills in) and post it to that submit url. Check out an example here.

I was getting this error...

There was a problem submitting your results for this HIT. This HIT is still assigned to you. To try this HIT again, click "HITs Assigned To You" in the navigation.

And it turned out I needed to include the assignmentId (passed in through the URL as a GET parameter) as a field in the form I submitted back.
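
To make that concrete, here's a sketch of a server-side helper that renders such a form with the assignmentId tucked into a hidden field. `submit_form_html` and the answer field names are hypothetical. The sketch also handles preview mode, where MTurk passes the placeholder value ASSIGNMENT_ID_NOT_AVAILABLE instead of a real assignmentId.

```python
try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape  # Python 2

# MTurk sends this value while a worker is only previewing the HIT.
PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"

def submit_form_html(submit_url, assignment_id, answer_fields):
    """Render a form that posts assignmentId plus empty answer fields to MTurk."""
    if assignment_id == PREVIEW_ASSIGNMENT_ID:
        # Nothing can be submitted until the worker accepts the HIT.
        return "<p>Accept the HIT to begin.</p>"
    lines = ['<form method="POST" action="%s">' % escape(submit_url, True)]
    lines.append(
        '<input type="hidden" name="assignmentId" value="%s">'
        % escape(assignment_id, True)
    )
    # Hidden answer fields, to be filled in by the page's javascript.
    for name in answer_fields:
        lines.append('<input type="hidden" name="%s" value="">' % escape(name, True))
    lines.append('<input type="submit" value="Submit">')
    lines.append("</form>")
    return "\n".join(lines)
```

Leaving out the assignmentId field is what produces the "There was a problem submitting your results" error above.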

Step 5: Including template variables in your hits (or use Javascript to change your task on the fly)

When I wrote my first blog post, the MTurk Command Line Tools (and the web-based HIT creation tool) required you to submit a file with comma-separated parameters that outlined your hit. Figuring out how to use that correctly was a big headache (ultimately a learning experience) (it's all just HTML forms! all the way down). Since then, I've crafted Mturk tasks with other real live humans who seem to use an entirely different pattern.

The pattern: Use Javascript to set up (and possibly randomize) your task

Say you're running some experiment, and you want your worker to experience one of three experimental conditions, but not accidentally be in two different conditions because they clicked on different tasks. The solution is to have your webpage randomly assign the worker to one of the three tasks when it loads... and only let the worker have access to a single task. 
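
One way to keep a worker in a single condition, sketched below, is to derive the condition from the workerId GET parameter (which MTurk adds to your URL once the HIT is accepted) instead of rolling the dice on every page load. To be clear, this is a different technique from the random-in-Javascript pattern described above, and it runs on the server. The condition names and `condition_for_worker` are hypothetical.

```python
import hashlib

CONDITIONS = ["condition_a", "condition_b", "condition_c"]  # hypothetical names

def condition_for_worker(worker_id, conditions=CONDITIONS):
    """Map a workerId to one condition, always the same one for that worker."""
    digest = hashlib.sha256(worker_id.encode("utf-8")).hexdigest()
    # Treat the hash as a big integer and pick a slot from it.
    return conditions[int(digest, 16) % len(conditions)]
```

Because the choice depends only on the workerId, a worker who reloads the page or opens a second copy of the task lands in the same condition. The split across conditions should be roughly even but is not guaranteed to be exactly balanced.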

Helpful Note: When you set 'max_assignments' in 'create_hit', that's the number of different workers who can do your task... and each of them can do it only once.

*ponder ponder* Maybe that's why those batch files defining your task are important... for the cases when you DO want a worker to have access to a bunch of different variations of your task. At the moment, I'm not sure how to do this with boto, other than just manually creating copies of the task a few more times.
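
Here's a sketch of what "manually creating copies" could look like in a loop: generate one URL per variation, then create one hit per URL. `variant_urls` and the `variant` query parameter are hypothetical, and this assumes MTurk correctly appends its own parameters (assignmentId and friends) to a URL that already has a query string.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

def variant_urls(base_url, variants):
    """Return one external hit URL per task variation."""
    return ["%s?%s" % (base_url, urlencode({"variant": v})) for v in variants]

# Each URL would then get its own hit, reusing the names from Step 3:
#
# for url in variant_urls("https://the-url-of-my-external-hit", ["a", "b", "c"]):
#     question = boto.mturk.question.ExternalQuestion(url, frame_height)
#     mturk.create_hit(title=title, description=description, keywords=keywords,
#                      question=question,
#                      reward=boto.mturk.price.Price(amount=amount))
```

Your page would then read the `variant` parameter to decide which version of the task to show.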

To conclude...

This was definitely a headache the second time around, too, but a teensy bit less mysterious. Hopefully if you embark on a similar journey and get stuck at similar places, this post can help you get unstuck faster.

A couple little code samples can be found here: https://gist.github.com/ktuite/0cdaca2d574f358bdcd3