Monday, May 14, 2012

In which I battle MTurk External Hits and (eventually) WIN

Here are some lessons I learned recently about Mechanical Turk and launching external hits. Painful, painful lessons. Makes me not want to play around with MTurk too much in the future.
  1. The only way to officially make an "external hit" is through the command line tools or maybe one of the fancy tools built on top of the command line tools. You can't do it through the mturk requester website unless by "external" you mean a link to another website and maybe asking your workers to copy and paste some field back into the mturk website.
  2. The command line tools are scarily old. Even though the download page (http://aws.amazon.com/developertools/Amazon-Mechanical-Turk/694) was last updated OVER FOUR YEARS AGO, it seems to be the way to go. Here are brief instructions for getting set up:
    1. If you're not on Windows, don't click the "DOWNLOAD" button, because that will lead you astray and fetch you a .exe! Look further down the page for Unix Downloads / Command Line Tools without JRE.
    2. No need to compile or really install anything because it's all shell scripts and java code. That's nice, at least.
    3. Go into bin/mturk.properties and put in your AWS access key and secret key. ALSO CHANGE http to https in the service URL, because Amazon is fancy and secure these days but doesn't update their mturk developer code!
    4. Do this: export JAVA_HOME=/usr (this assumes your java lives at /usr/bin/java; adjust if yours is somewhere else)
    5. Okay, great! Now you can run things like 'getBalance.sh'... 
    6. Also note you can play around in the sandbox by switching which service URL line is commented out in mturk.properties. The sandbox balance should be something like $10,000... Ooooh Ahh!
  3. The examples are okay, but not informative enough to show you how to, like, make an external hit with more than one variable.
  4. Don't be a fool, like me, and keep running ./run.sh to add more hits and then wonder why the things you see on the webpage appear to have no bearing on reality/the things you just changed. Adding new hits via run.sh literally adds new hits and doesn't replace old ones, even in magical sandbox land. So when you refresh to look at a particular hit, you'll get a random one from all the ones you've created.
    1. Reset account is your friend! cd ../bin/ && ./resetAccount.sh -force
  5. FU#$KING ampersands!!!! url=${helper.urlencode($urls)} means nothing to me when it says 'url' in three different places AND refreshing the page gives me random unintelligible crap that may or may not also still say 'url' somewhere. 
  6. THIS IS THE MAGIC KEY/POTION/FORMULA/INCANTATION:
    1. When it says... [Fatal Error] :6:117: The reference to entity "centerLong" must end with the ';' delimiter.
      [ERROR] Error creating HIT 1 (44.5628547668457): [6,117] The reference to entity "centerLong" must end with the ';' delimiter. 
    2. You say... &amp; like so: <ExternalURL>http://mypage.com/external_thing.html?centerLat=$centerLats&amp;centerLong=$centerLongs&amp;bundleId=$bundleIds</ExternalURL>
  7. The hit batch input file seems to need tabs between fields; commas don't seem to be okay. That is pretty annoying because I made my vim not use tabs AGES AGO.
The end. The rest, like the code for pulling out your variables from the URL and then submitting your external hit form data, seems pretty self-explanatory.
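Here is a minimal sketch of why the raw ampersand from items 5 and 6 blows up and why &amp; fixes it. It is plain Python, not part of the command line tools, and the stripped-down <ExternalQuestion> wrapper (no namespace, no frame height) is an assumption made just to have something to parse. The URL is the one from item 6.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The URL exactly as you would naturally type it, with raw ampersands.
raw_url = (
    "http://mypage.com/external_thing.html"
    "?centerLat=$centerLats&centerLong=$centerLongs&bundleId=$bundleIds"
)


def question_xml(url):
    """Wrap a URL in a simplified stand-in for the .question file."""
    return "<ExternalQuestion><ExternalURL>" + url + "</ExternalURL></ExternalQuestion>"


def parses(xml_text):
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


broken = question_xml(raw_url)         # "&centerLong" looks like an unfinished entity
fixed = question_xml(escape(raw_url))  # escape() turns every & into &amp;

print(parses(broken))  # False
print(parses(fixed))   # True
print(fixed)
```

Once the XML is parsed, the &amp; turns back into a plain &, so the page still receives an ordinary query string.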
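For what it's worth, here is a rough sketch of the "pulling out your variables from the URL" part. On the real external page this would be JavaScript; the Python below only shows the same logic. The coordinate values are made up, and the assignmentId parameter with its ASSIGNMENT_ID_NOT_AVAILABLE preview value comes from the MTurk ExternalQuestion documentation, not from anything above.

```python
from urllib.parse import parse_qs, urlsplit


def hit_params(url):
    """Return the query-string variables of a hit URL as a plain dict."""
    query = urlsplit(url).query
    # parse_qs gives a list per key; each variable appears once here,
    # so keep only the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical URL, as the page would see it once the variables are filled in.
example = (
    "http://mypage.com/external_thing.html"
    "?centerLat=44.56&centerLong=-123.26&bundleId=7"
    "&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE"
)

params = hit_params(example)
print(params["centerLat"], params["centerLong"], params["bundleId"])

# A worker who is only previewing the hit has no real assignment yet,
# so the page should not let them submit.
is_preview = params.get("assignmentId") == "ASSIGNMENT_ID_NOT_AVAILABLE"
print(is_preview)
```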

4 comments:

  1. Thank you
    Thank You
    Thaaaaaank yooou

    number 4 was what was getting me

  2. Hello, at the end you say everything else is self-explanatory. Without a link to another guide I'm not sure what you mean by this. Can you possibly drop a link to show what information you were referring to as self-explanatory?

    1. Hi Sam,

      Sorry about that... let's see, I think I was mainly working from the external_hit example from http://aws.amazon.com/developertools/Amazon-Mechanical-Turk/694 (see instructions above about downloading the unix command line tools or just get it from the link below). There's also some (kind of confusing) documentation as part of that download.

      Inside http://mturk.s3.amazonaws.com/CLTSource/aws-mturk-clt.tar.gz, once you open it up, there is a 'samples' folder and an 'external_hit' folder within that.

      external_hit.input has a list of the different variables you want to use in different versions of your hit. external_hit.question is the template for the hit: a bit of XML whose <ExternalURL> points at your own web page (which then shows up inside an iframe in the amazon mturk page).

      You define your variables in external_hit.input and then mention them in the question template. For example, external_hit.input has a variable called "urls", and in external_hit.question there's a $urls that gets filled in automatically with 'google.com' or 'yahoo.com' or something else from the list in the .input file.

      My biggest trouble with it (when I wrote this post) was that I wanted TWO variables. And I couldn't tell whether the different things I tried worked or not because of point #4 above.
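A rough sketch of the multi-variable case, under a few assumptions: the .input file is taken to be a tab-separated table with the variable names in the first row, the coordinate values are made up, and Python's string.Template is only a stand-in for the substitution the command line tools do themselves (it happens to use the same $name syntax). The variable names and URL are the ones from the post.

```python
import csv
import io
from string import Template

fields = ["centerLats", "centerLongs", "bundleIds"]
rows = [
    {"centerLats": "44.56", "centerLongs": "-123.26", "bundleIds": "1"},
    {"centerLats": "45.52", "centerLongs": "-122.68", "bundleIds": "2"},
]

# Write the input table with real tab characters, no editor settings involved.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
input_text = buf.getvalue()

# The URL as it has to appear in the question XML, ampersands already escaped.
url_template = Template(
    "http://mypage.com/external_thing.html"
    "?centerLat=$centerLats&amp;centerLong=$centerLongs&amp;bundleId=$bundleIds"
)

# One hit per data row: read the table back and fill in the template.
reader = csv.DictReader(io.StringIO(input_text), delimiter="\t")
urls = [url_template.substitute(row) for row in reader]

for url in urls:
    print(url)
```

To get an actual file, write input_text out to something like external_hit.input instead of keeping it in memory.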

      Since then, I've actually watched people use Boto to post external hits: http://code.google.com/p/boto/ It might be way easier than what's described above.
