A Google team has put a deep-learning machine to work on determining where nearly any photo was taken.
Google’s machine is better than you
Humans are, depending on the person, surprisingly good at working out where a photo was taken even when no obvious landmark is present. Clearly a photo with the Eiffel Tower in it puts the photo in Paris, or Las Vegas I guess. A photo of a leaning tower likely puts the location in Pisa, Italy. But people, especially the well-traveled, have an uncanny knack for identifying the likely location of a photo from knowledge accrued over a lifetime.
The side of the road that a car is driving on can narrow down the possible location quickly and effectively. Road signs, vegetation, architecture and the people in the photo also go a long way toward helping you make an informed guess. Whether you realize it or not, you build up these skills, often without conscious thought.
Tobias Weyand, a computer vision specialist at the search giant, along with a couple of others, has “trained” a machine to beat you at working out locations.
While it’s far from perfect, it’s quite good. First, the team programmed the machine to divide the world into a grid of over 26,000 squares of varying size. Cities get a finer grid, because the size of each square depends on the number of images taken there. This is especially true of cities that are both heavily populated and enjoy a steady tourist trade since, well, tourists take a disproportionate number of photos.
The team essentially threw out oceans and polar regions, as people take so few photos there and a machine has little to go on when pinpointing where you are underwater.
The team then loaded a database of nearly 130 million geolocated images from the internet. Each of these images carried the coordinates of where it was taken. The team used 91 million of these images to teach the neural network to work out the grid location. The network was then validated with the remaining 34 million images before the team began testing its creation, which they are calling PlaNet.
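To get a feel for how a grid can end up fine in cities and coarse (or missing) elsewhere, here is a minimal sketch. It assumes a plain latitude/longitude quadtree, which is a simplification and not necessarily the partitioning the team used. The names `Cell` and `build_grid` and the thresholds `max_photos` and `min_photos` are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A latitude/longitude rectangle (hypothetical helper, not from PlaNet)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        # Half-open bounds, so a photo on a shared edge lands in exactly one cell.
        # Points at exactly 90 N or 180 E are ignored by this simple sketch.
        return self.south <= lat < self.north and self.west <= lon < self.east

    def quadrants(self):
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2
        return [
            Cell(self.south, self.west, mid_lat, mid_lon),  # south-west
            Cell(self.south, mid_lon, mid_lat, self.east),  # south-east
            Cell(mid_lat, self.west, self.north, mid_lon),  # north-west
            Cell(mid_lat, mid_lon, self.north, self.east),  # north-east
        ]


def build_grid(photos, cell, max_photos, min_photos, max_depth=12):
    """Split crowded cells, drop near-empty ones, return the surviving cells.

    photos is a list of (lat, lon) pairs; both thresholds are assumptions.
    """
    inside = [(lat, lon) for lat, lon in photos if cell.contains(lat, lon)]
    if len(inside) < min_photos:
        return []        # too few photos (open ocean, polar regions): discard
    if len(inside) <= max_photos or max_depth == 0:
        return [cell]    # sparse enough to keep as a single square
    leaves = []
    for quadrant in cell.quadrants():
        leaves.extend(
            build_grid(inside, quadrant, max_photos, min_photos, max_depth - 1)
        )
    return leaves


WORLD = Cell(-90.0, -180.0, 90.0, 180.0)
```

Calling `build_grid(photos, WORLD, max_photos=10, min_photos=2)` on a photo list dominated by one city returns small squares around that city, large squares for lightly photographed areas, and nothing at all for places with almost no photos.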
Google’s PlaNet is far from perfect
One of those tests involved feeding PlaNet 2.3 million images with geotags from Flickr to see how well it could work out locations.
“PlaNet is able to localize 3.6 percent of the images at street-level accuracy and 10.1 percent at city-level accuracy,” says Weyand. PlaNet was able to determine the country 28.4 percent of the time and the continent 48 percent of the time.
And then the team made it fun by pitting PlaNet against human players in the online game at www.geoguessr.com, which anyone can play. Be warned: you would likely get beaten by the machine. “In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km,” says Weyand. “[This] small-scale experiment shows that PlaNet reaches superhuman performance at the task of geolocating Street View scenes.”
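Percentages like these come from measuring how far each guess lands from the photo's real coordinates and counting the guesses that fall within some cutoff. The sketch below shows one way to do that with the haversine great-circle formula. The article does not give the cutoffs, so the distances in `THRESHOLDS_KM` are illustrative assumptions, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumed cutoffs for illustration only; the article does not state them.
THRESHOLDS_KM = {"street": 1, "city": 25, "country": 750, "continent": 2500}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_by_level(predicted, actual, thresholds=THRESHOLDS_KM):
    """Fraction of guesses that land within each distance cutoff."""
    errors = [haversine_km(*p, *a) for p, a in zip(predicted, actual)]
    return {
        level: sum(err <= limit for err in errors) / len(errors)
        for level, limit in thresholds.items()
    }
```

For example, a guess of London for a photo taken in Paris misses by roughly 340 km, so it would fail the street and city cutoffs above but pass the country and continent ones.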
“We think PlaNet has an advantage over humans because it has seen many more places than any human can ever visit and has learned subtle cues of different scenes that are even hard for a well-traveled human to distinguish.”
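The machine-versus-human numbers quoted above boil down to two simple statistics: who guessed closer in each round, and the median miss distance for each side. Here is a minimal sketch of that tally, assuming you already have each side's error in kilometres per round. The function name `compare_rounds` and its output keys are hypothetical.

```python
from statistics import median


def compare_rounds(machine_errors_km, human_errors_km):
    """Tally rounds won by the machine and each side's median error.

    Both arguments are per-round localization errors in kilometres,
    in the same round order.
    """
    if len(machine_errors_km) != len(human_errors_km):
        raise ValueError("both sides need one error per round")
    # A round counts as a machine win only if its guess was strictly closer.
    machine_wins = sum(
        m < h for m, h in zip(machine_errors_km, human_errors_km)
    )
    return {
        "rounds": len(machine_errors_km),
        "machine_wins": machine_wins,
        "machine_median_km": median(machine_errors_km),
        "human_median_km": median(human_errors_km),
    }
```

The median is a sensible summary here because a single wildly wrong guess, say the wrong hemisphere, would drag a mean far more than it moves a median.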
Perhaps most impressive is that PlaNet is only 377 megabytes, not the gigs and gigs you might expect. That makes it small enough to put the power of a neural network on your phone, but you should probably use a Wi-Fi connection to download it if the team ever makes it public.