Filling in the gaps with PhotoCity

My research in grad school has revolved mainly around games as a means of crowdsourcing the collection of diverse and useful data. I'm embarking on a new project that, like my original game PhotoCity, also combines computer vision and games. It will become a big chunk of my thesis. My original plan for this post was to write about that new project, after providing context by explaining PhotoCity. It turned out that I just wrote about PhotoCity.

Sidenote: Some game designer folks might read this and be concerned about how I'm using games for the wrong reasons. Gamification is a sin! Games with a purpose are questionable! I have more to say about how I think I'm using games for the right reasons (feedback! learning! mastery!) but I'm not going to go into it now.

This is the finish line graphic for The Big Race 3D.


Around 2006 (while I was working on The Big Race, actually) a project called Photo Tourism came out of UW and was then licensed by Microsoft as Photosynth. It had 3 parts:
  1. Use large photo collections from Flickr. It turns out people have photographed and shared enough images from enough different vantage points to reconstruct popular locations like Trevi Fountain and Notre Dame in 3D. 
  2. Perform magic (I can tell you more about the specifics of this, too, if you want!) to turn that hodge-podge of photos into a 3D scene. Release code online, thus enabling many hobbyists and many grad students to base their research off of this work.
  3. Make a slick viewer that lets you navigate the photos in 3D, because you've already recovered their 3D poses relative to each other and the scene. This is the part that Photosynth mainly focused on.  
A big idea that carried over from this work was that there is so much information already out there on the internet! Look! Every photo of the Colosseum you'd ever want to take has already been taken! (Rome in a Day is the scaled-up, even more impressive evolution of Photo Tourism. It attempted to reconstruct Rome from 100,000 tagged Flickr photos.)

But if I use Rome in a Day to make my point, the data is not all there already. 100,000 photos went in and something like 10,000 were incorporated into models. Trevi Fountain was one of the largest clusters with 1,900 images. Most "rome" photos on Flickr are of people and food and interiors... not images that you could use to reconstruct a city. Even if you take a different source of data, like aerial images or street view images, it's going to be missing information along SOME dimensions (time, point of view, level of detail): information that a normal person with a camera phone could conceivably provide.
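To put rough numbers on that gap, here's a tiny Python calculation using only the figures quoted above. One assumption is mine: I'm treating the 1,900 Trevi Fountain images as a subset of the roughly 10,000 photos that made it into models.

```python
# Figures quoted in the post (the 10,000 is "something like", so treat it as approximate).
photos_in = 100_000      # tagged Flickr photos fed into the reconstruction
photos_used = 10_000     # photos actually incorporated into models
trevi_cluster = 1_900    # images in the Trevi Fountain cluster

# Fraction of the input that was usable at all.
used_fraction = photos_used / photos_in

# Assumption: the Trevi cluster is counted among the incorporated photos.
trevi_share = trevi_cluster / photos_used

print(f"{used_fraction:.0%} of input photos ended up in a model")
print(f"{trevi_share:.0%} of those were in the Trevi Fountain cluster")
```

So roughly nine out of ten photos contributed nothing, and under that assumption a single landmark accounts for about a fifth of what did.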

hello, trevi fountain. hello, adam and kathleen and bunny ears.
Adam and Kathleen are not all that impressed.
PhotoCity was this awesome project to crowdsource photos that could be used to reconstruct a location in 3D. Would people be able to rebuild, say, the entire UW campus, knowing that that was their goal and getting feedback along the way? Surprise answer: Yes!! Here's a video of the final UW model.

It was part game, in that your progress was measured by points (real 3D points that your photos had added to the 3D model) and flags (denoting territory on a map that you had contributed to). The game elements weren't as key as the feedback, though. PhotoCity was really a collaborative, interactive version of the original Photo Tourism that showed how every photo was incorporated into a model, what new points it added, where it fit in spatially. And except for the part that already existed to do the 3D reconstruction, I built it all.
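To make the "points" idea concrete, here's a toy Python sketch. It is not the real PhotoCity code: the `Model` class, the `add_photo` method, and the integer point IDs are all made up for illustration, and the actual 3D reconstruction step is skipped entirely. The only thing it tries to show is that a photo scores by the 3D points it adds that the model didn't already have.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for a 3D model: just a set of 3D point IDs."""

    point_ids: set[int] = field(default_factory=set)
    scores: dict[str, int] = field(default_factory=dict)

    def add_photo(self, player: str, photo_points: set[int]) -> int:
        """Merge a photo's points into the model and return how many were new.

        `photo_points` stands in for whatever points a real reconstruction
        pipeline would recover from the photo (an assumption of this sketch).
        """
        new_points = photo_points - self.point_ids
        self.point_ids |= new_points
        # The player's score grows only by points nobody had added yet.
        self.scores[player] = self.scores.get(player, 0) + len(new_points)
        return len(new_points)


model = Model()
first = model.add_photo("alice", {1, 2, 3})   # all three points are new
second = model.add_photo("bob", {3, 4})       # only point 4 is new
third = model.add_photo("alice", {1, 2})      # nothing new, so no points
print(first, second, third, model.scores)
```

The third photo is the interesting one: it scores zero because it covers ground the model already has, which is the kind of feedback that nudges people toward places that still need data.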

Contributing in a way that adds up

The original motivation for PhotoCity was incentivization and accomplishing a task that depended on real people with real cameras walking around in the real world.

The more general motivation that rolls through my head these days is this:
  • Does the data exist for {x}? 
  • If not, can I go out and collect that data myself? Or augment existing data with my own?
  • How do I know I'm collecting useful data? 

For PhotoCity, {x} was photos that could be used to reconstruct locations around the world. Obviously there was a lot of the world not reconstructed within PhotoCity, but it was fun to travel to new places and try to start a new model, or in some rare cases, travel to a new country and add to someone else's existing model.

It wasn't just a game; it was an interactive system built on some really neat technology that people all over could contribute to and help grow.

Human-computer symbiosis 

That's a phrase one of my advisors uses to... sound impressive, I think. It evolved from him talking about Foldit but I don't think it's specific to Foldit anymore.

From my recent blog post on design patterns of crowdsourced art and the last little section, it probably sounds like my main goal is enabling creative collaboration and outlets for people to make useful contributions. I do build things in that direction, but there's more...

Wikipedia is the obvious example of an incredibly useful collaborative ecosystem. You write an article on something you're knowledgeable about, or edit or update other people's articles. It's now an amazing resource (also driving lots and lots of computer science research). In PhotoCity, you'd take a picture of some place you happened to be, or you'd seek out nearby locations that needed more data and take pictures there.

The difference between Wikipedia and PhotoCity (besides text info about the world vs. photographic imagery) is that in PhotoCity, a computer processed your photo to decide where it fit in and how much novel information it provided. It's not just an ecosystem of human writers, editors, and browsers; it's an ecosystem with some structure from motion and a big, fat bundle adjustment at its core. When I put on my grad student hat, I care about humans contributing and collaborating with each other, but also with some underlying technology/system/algorithm/computer process. And sometimes the computer is wrong!

Up next (coming soon in another post)

I had the best intentions of writing this here... it would have helped me figure out what I'm doing for the rest of the week. For now, though, you just get a bulleted list:
  • New domain (photos of facial expressions and appearance variations) 
  • Contribute face photos that fill in the facial frontier and make face detection, face recognition, and various facial classifiers better
  • Get feedback on how your face photos are advancing that frontier
  • Give feedback on how the computer should correct or question some of its faulty assumptions
I'd like to have more conversations about this. If you have things you want to say, say them here in the comments or via email or whatever!