A new artificial intelligence program can automatically produce an accurate caption based on the content of an image. The program, being developed by Google Inc (NASDAQ:GOOG, NASDAQ:GOOGL) and researchers at Stanford University, would, for example, analyze a scanned image of the Mona Lisa and describe (caption) it as “a plump woman in dark clothes with an enigmatic smile.”
A photo recognition system like this could prove tremendously useful in helping visually impaired people comprehend pictures, in providing text instead of images in areas where mobile connections are slow, and in making it a good bit easier to search for images in large databases such as Google’s.
Google photo recognition based on neural network technology
The photo recognition system is based on two neural networks working in tandem: one that analyzes the image pixel by pixel, and another that generates an accurate description of the image in natural language.
The concept was derived from advances in machine translation, where a Recurrent Neural Network (RNN) transforms, for example, a sentence in Spanish into a vector representation, and a second RNN uses that vector representation to generate a target sentence in German.
Google and Stanford researchers decided to replace the first RNN and its input words with a deep Convolutional Neural Network (CNN) trained to classify objects in images. This means the system can feed the CNN’s detailed encoding of the image into an RNN designed to create descriptive phrases. The entire system is then trained on images and their captions, with the goal of maximizing the probability that the descriptions it produces match the training descriptions for each image.
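To make the encoder-decoder idea concrete, here is a minimal toy sketch in plain Python. It is not the researchers' code: the tiny vocabulary, the `encode_image` stand-in for the CNN, the weight layout, and the random hill-climbing used in place of real gradient-based training are all assumptions made for illustration. What it does show is the shape of the pipeline described above: an image encoding seeds a recurrent decoder, and training tries to raise the log-probability of the known captions.

```python
import math
import random

VOCAB = ["<start>", "snow", "trees", "falling", "<end>"]  # hypothetical toy vocabulary
HIDDEN = 2  # size of the image encoding and of the recurrent state

def encode_image(pixels):
    """Stand-in for the CNN (assumption): squash an image into a small feature vector."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]

def init_weights(rng):
    """Random word embeddings and output weights (hypothetical layout)."""
    return {
        "embed": {w: [rng.uniform(-1, 1) for _ in range(HIDDEN)] for w in VOCAB},
        "out": {w: [rng.uniform(-1, 1) for _ in range(HIDDEN)] for w in VOCAB},
    }

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def step(state, prev_word, weights):
    """One recurrent step: mix the previous word into the state, then score every word."""
    emb = weights["embed"][prev_word]
    new_state = [math.tanh(s + e) for s, e in zip(state, emb)]
    scores = [sum(o * h for o, h in zip(weights["out"][w], new_state)) for w in VOCAB]
    return new_state, softmax(scores)

def caption_log_prob(pixels, caption, weights):
    """Log-probability that the decoder emits `caption` followed by <end> for this image."""
    state = encode_image(pixels)  # the image encoding seeds the decoder state
    prev, total = "<start>", 0.0
    for word in caption + ["<end>"]:
        state, probs = step(state, prev, weights)
        total += math.log(probs[VOCAB.index(word)])
        prev = word
    return total

def objective(dataset, weights):
    """Training objective: summed log-likelihood of the reference captions."""
    return sum(caption_log_prob(p, c, weights) for p, c in dataset)

def hill_climb(dataset, weights, rounds, rng):
    """Crude stand-in for gradient training: keep random tweaks that raise the objective."""
    best = objective(dataset, weights)
    for _ in range(rounds):
        candidate = {
            name: {w: [v + rng.gauss(0, 0.3) for v in vec] for w, vec in table.items()}
            for name, table in weights.items()
        }
        score = objective(dataset, candidate)
        if score > best:
            weights, best = candidate, score
    return weights, best

def greedy_caption(pixels, weights, max_len=5):
    """Generate a caption by picking the most probable word at each step."""
    state, prev, words = encode_image(pixels), "<start>", []
    for _ in range(max_len):
        state, probs = step(state, prev, weights)
        prev = VOCAB[probs.index(max(probs))]
        if prev == "<end>":
            break
        words.append(prev)
    return words

rng = random.Random(0)
dataset = [  # made-up "images" (pixel lists) paired with made-up captions
    ([0.9, 0.8, 1.0, 0.7], ["snow", "falling"]),
    ([0.2, 0.6, 0.1, 0.5], ["trees"]),
]
start = init_weights(rng)
before = objective(dataset, start)
trained, after = hill_climb(dataset, start, rounds=200, rng=rng)
print(f"log-likelihood before: {before:.3f}, after: {after:.3f}")
print(greedy_caption(dataset[0][0], trained))
```

A real system would swap the two-number encoding for the output of a deep CNN, use a much larger recurrent network, and adjust the weights with gradient-based optimization rather than random tweaks. The objective, however, has the same form: the summed log-probability of the training captions given their images.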
System learns from each image
The researchers report that they can ‘teach’ the program to accurately recognize what’s in your photos by introducing a large series of images that are already captioned.
The program takes the descriptions that researchers have attached to the already-captioned images, analyzes them, and stores what it learns for use in captioning future images.
This new image captioning system is much more intelligent than a simple tagging system. It picks out the relevant details in an image, such as colors or objects, and can also understand the “scene” of the image in context. Google’s new photo recognition system can not only note that an image has ‘snow’ and ‘trees’ in it, but can also determine that “the snow is falling in front of the line of trees.”