Chinese search engine giant Baidu has built the most accurate computer vision system in the world. Nicknamed Deep Image, the system runs on a supercomputer optimized for deep learning algorithms. The Beijing-based company says Deep Image has a 5.98% error rate on the ImageNet benchmark. For comparison, Google scientists won last year’s ImageNet competition with a 6.66% error rate.

Minwa is the star of Deep Image

Of course, Deep Image could not have achieved such a high level of accuracy without Baidu’s latest supercomputer, Minwa. The Chinese company built it to house the computer vision system. According to GigaOm, deep learning scientists have long used GPUs to handle the computing intensity of training their models. But nobody has ever built a system like Baidu’s, which is dedicated to computer vision using deep learning.

Minwa has 36 server nodes, each with two hexa-core Intel Xeon E5-2620 processors. Each server also has one FDR InfiniBand (56Gb/s) adapter and four Nvidia Tesla K40m GPUs. Each GPU has 12GB of memory. Overall, the Minwa supercomputer has 1.7TB of device memory and 6.9TB of host memory. Baidu made Minwa to overcome issues linked to the types of algorithms that Deep Image was trained on.
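The totals follow from the per-node figures, and the arithmetic can be checked with a short Python sketch. The constant names are illustrative, the conversion assumes decimal units (1TB = 1000GB), and the per-node host memory is inferred from the 6.9TB total rather than stated by Baidu.

```python
# Figures quoted in the article.
NODES = 36
CPUS_PER_NODE = 2
CORES_PER_CPU = 6       # "hexa-core" Xeon E5-2620
GPUS_PER_NODE = 4
GPU_MEMORY_GB = 12      # per Tesla K40m
HOST_MEMORY_TB = 6.9    # only the system-wide total is given

total_gpus = NODES * GPUS_PER_NODE
total_cpu_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
device_memory_gb = total_gpus * GPU_MEMORY_GB

# Assumption: decimal units (1 TB = 1000 GB), which matches the quoted 1.7TB.
device_memory_tb = device_memory_gb / 1000

# Inferred, not stated in the article: host memory split evenly across nodes.
host_memory_per_node_gb = HOST_MEMORY_TB * 1000 / NODES

print(total_gpus)                      # 144
print(total_cpu_cores)                 # 432
print(device_memory_gb)                # 1728
print(round(device_memory_tb, 1))      # 1.7
print(round(host_memory_per_node_gb))  # 192
```

In other words, the quoted 1.7TB of device memory is simply 144 GPUs at 12GB each, and the host figure works out to roughly 192GB per server.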

Last month, Baidu unveiled Deep Speech

Andrew Ng, chief scientist at Baidu Research, said that ultra-low-latency, very high-bandwidth interconnects were necessary to minimize communication costs. Such a powerful supercomputer allowed scientists to work with better training data than most other deep learning projects. For instance, Baidu used high-resolution 512 x 512-pixel images rather than the usual 256 x 256-pixel images, which means four times as many pixels per image.

The company further augmented these images with various effects such as lens distortion, color casting and vignetting. This helped the system capture more features of smaller objects and learn what various objects look like under different conditions. Baidu has been investing aggressively in deep learning. Deep Image comes less than a month after the company unveiled its speech-recognition system, Deep Speech.
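To give a rough sense of what two of these effects do, here is a minimal Python sketch of color casting and vignetting on a tiny image stored as nested lists of RGB tuples. It is not Baidu's pipeline: the function names, gain ranges and vignette formula are all assumptions chosen for illustration, lens distortion is left out, and only the standard library is used.

```python
import random

def color_cast(image, gains):
    """Scale each RGB channel by its own gain, clamping to 0..255."""
    return [[tuple(min(255, max(0, round(c * g))) for c, g in zip(px, gains))
             for px in row] for row in image]

def vignette(image, strength=0.5):
    """Darken pixels in proportion to squared distance from the centre."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    max_d2 = (cy * cy + cx * cx) or 1.0   # avoid dividing by zero on 1x1
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, px in enumerate(row):
            d2 = (y - cy) ** 2 + (x - cx) ** 2
            factor = 1.0 - strength * d2 / max_d2
            new_row.append(tuple(round(c * factor) for c in px))
        out.append(new_row)
    return out

def augment(image, rng):
    """Apply a random color cast and a random vignette (hypothetical ranges)."""
    gains = tuple(rng.uniform(0.9, 1.1) for _ in range(3))
    return vignette(color_cast(image, gains), strength=rng.uniform(0.0, 0.5))

# A flat grey 5x5 test image makes the effects easy to see.
grey = [[(200, 200, 200)] * 5 for _ in range(5)]
shaded = vignette(grey, strength=0.5)
print(shaded[2][2])   # centre is untouched: (200, 200, 200)
print(shaded[0][0])   # corner is darkened:  (100, 100, 100)

variant = augment(grey, random.Random(0))  # one randomly perturbed copy
```

Each call to `augment` produces a slightly different version of the same picture, which is the general idea behind enlarging a training set this way.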

