OREANDA-NEWS. July 16, 2015.

Flickr, one of the world’s largest photo-sharing services, sees it all. And now Flickr’s image recognition technology can categorize more than 11 billion photos. Automatically.

This comic captures how many experts once viewed image recognition tasks. (XKCD; some rights reserved)

It’s called “Magic View.” And if it seems like magic, you’re not alone. Categorizing photographs is tough. So hard that, until recently, many believed computers just couldn’t do it. Now, with the help of GPUs, Flickr is doing just that. Instantaneously. And on a massive scale.

The Magic of Deep Learning

The magic behind “Magic View” is a fast-growing technology known as “deep learning.” Deep learning trains neural networks on large numbers of examples, teaching computers to deliver near-human-level accuracy on tasks like recognizing what’s in a photo.

Businesses are now using deep neural networks for tasks that millions of people rely on every day, such as image classification, voice recognition, and natural language processing.

Flickr offers a great example. It trains its neural networks on NVIDIA GPUs to recognize key visual concepts. Our GPUs are ideal for this task. Because they’re built with hundreds of computing cores that work in parallel, GPUs can cut training that would otherwise take months down to weeks. Or even days.

Flickr’s model training process now involves around 15 million images. But that’s a fraction of the corpus of images that Flickr manages — and could be training on. So what you’re seeing now is just the start.

Flickr’s deep learning effort began in 2013, when Yahoo acquired LookFlow, a six-person image recognition startup. Launched four years earlier by Simon Osindero and Bobby Jaros, LookFlow created the technology Flickr now uses to auto-tag its photos.

“Magic View” at the Center of Flickr Redesign

Simon is now the AI Architect at Flickr, and Bobby leads deep learning research at Yahoo Labs. Last month, LookFlow’s product – now known as “Magic View” – was one of the key features unveiled in Flickr’s redesign.

“GPU-powered machine learning plays a big role here,” Simon explains. “Particularly in terms of being able to train large models and explore the space of potential model architectures in a reasonable time. We’re leaning very heavily on GPUs to train the neural nets we use in auto-tagging, as well as for several new projects that we have in the works.”

“Magic View” uses Flickr’s image recognition technology to identify the content of your photos. It then sorts them into more than 60 categories. With more than 11 billion photos, that’s an enormous task. New uploads from Flickr’s mobile apps (iOS and Android), desktop uploaders, and website are also tagged automatically. You can get a sense of how this works in the animated GIF below.

It’s sophisticated stuff. And remarkably accurate. But sometimes it makes mistakes. One challenge: striking a balance between precision and recall. There’s a trade-off between failing to tag an image with a label it deserves (which lowers recall) and applying a label that doesn’t fit (which lowers precision).
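To make the trade-off concrete, here is a toy sketch. It is not Flickr’s system: it assumes a hypothetical tagger that gives each photo a confidence score for one label (“sunset”) and applies the tag only when the score clears a threshold. The scores, the ground-truth flags, and the `precision_recall` helper are all made up for illustration.

```python
# Hypothetical confidence scores for the label "sunset" on six photos,
# paired with whether each photo really shows a sunset. Illustrative data only.
SCORES = [0.95, 0.85, 0.70, 0.55, 0.40, 0.20]
TRUTH = [True, True, False, True, False, True]


def precision_recall(scores, truth, threshold):
    """Apply the tag when score >= threshold; return (precision, recall)."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(predicted, truth))      # correct tags
    fp = sum(p and not t for p, t in zip(predicted, truth))  # wrong tags
    fn = sum(t and not p for p, t in zip(predicted, truth))  # missed tags
    # Convention (an assumption): score 1.0 when the denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.9, 0.5, 0.1):
        p, r = precision_recall(SCORES, TRUTH, threshold)
        print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this made-up data, a strict threshold of 0.9 tags only one photo: every tag is right (precision 1.00), but three of the four sunsets are missed (recall 0.25). A loose threshold of 0.1 catches every sunset (recall 1.00) but mislabels two photos (precision 0.67). Picking the threshold means deciding which kind of mistake is more acceptable.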

When errors occur, Flickr users can delete the inaccurate tags. That manual feedback helps the algorithm become more accurate over time.

An example of auto-tagging output for an image of a seaside sunset.

Simon can’t share details about upcoming features. But he says Flickr’s team is bringing more machine learning smarts to their mobile platforms, training more sophisticated models on much bigger sets of images – and using GPUs to do it.

More Magic Coming

“We do have some exciting new image intelligence features — beyond simple auto-tagging — that we should be rolling out later this year,” he says. “And for the auto-tagging system, we are continually expanding the repertoire of concepts our models are able to handle, as well as working to improve the accuracy and coverage of the concepts that we already use.”