IBM has beaten the record for artificial intelligence (AI) image recognition by developing distributed deep learning (DDL) software.
Both accuracy and speed improved. According to IBM's report, the trained AI model recognized images from a set of 7.5 million with an accuracy of 33.8%, beating the previous record of 29.8% set by Microsoft.
The research team argues that its technology will make AI models for specialized tasks, such as detecting cancer cells in medical images, much more accurate and able to be trained in hours.
The overall aim of the research was to ‘reduce the wait-time associated with deep learning training from days or hours to minutes or seconds, and enable improved accuracy of these AI models’, according to Hillery Hunter, who led the team.
IBM has met this objective by creating DDL software that helps GPUs communicate with one another. This was necessary because the system spreads training across GPUs housed in many separate servers, whereas many earlier efforts ran on multiple GPUs within a single server.
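The article does not say how IBM's DDL software moves data between GPUs. As a rough illustration of what ‘talking’ means in distributed training: each GPU computes gradients (the model updates it has learned) from its own share of the data, and the GPUs must combine them before the next step. The sketch below simulates a generic ring all-reduce, a common textbook technique that is not necessarily the one IBM uses, in which each GPU repeatedly passes part of its gradients to a neighbour until every GPU holds the same average. The GPUs are simulated as Python lists, and the function name ring_allreduce is hypothetical.

```python
# Generic ring all-reduce, simulated in plain Python for illustration.
# Assumption: this is a textbook technique, not IBM's DDL implementation.
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Average gradient vectors across n simulated GPUs arranged in a ring."""
    n = len(grads)
    size, rem = divmod(len(grads[0]), n)
    assert rem == 0, "vector length must split evenly into n chunks"
    buf = [list(g) for g in grads]  # each GPU's working copy

    def chunk(c: int) -> range:
        return range(c * size, (c + 1) * size)

    # Phase 1 (reduce-scatter): at step s, GPU i passes chunk (i - s) % n to
    # GPU i + 1, which adds it to its own copy. Outgoing data is copied first
    # so that every GPU transmits "simultaneously".
    for s in range(n - 1):
        msgs = [(i, (i - s) % n) for i in range(n)]
        data = [[buf[i][k] for k in chunk(c)] for i, c in msgs]
        for (i, c), vals in zip(msgs, data):
            for k, v in zip(chunk(c), vals):
                buf[(i + 1) % n][k] += v
    # Now GPU i holds the complete sum for chunk (i + 1) % n.

    # Phase 2 (all-gather): pass the completed chunks around the ring so
    # every GPU ends up with every summed chunk.
    for s in range(n - 1):
        msgs = [(i, (i + 1 - s) % n) for i in range(n)]
        data = [[buf[i][k] for k in chunk(c)] for i, c in msgs]
        for (i, c), vals in zip(msgs, data):
            for k, v in zip(chunk(c), vals):
                buf[(i + 1) % n][k] = v

    # Divide the sums by n to get the average gradient on every GPU.
    return [[v / n for v in b] for b in buf]


grads = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # two simulated GPUs
print(ring_allreduce(grads))  # every GPU now holds the same averaged vector
```

In this scheme each GPU sends only 2(N-1)/N of its gradient data in total, however many GPUs are in the ring, which is one reason ring-style exchanges are popular when training is spread across many servers.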
In the report, Hunter uses the parable of the ‘Blind men and the elephant’ to explain the problem. In the story, each blind man feels a different part of the elephant, such as the tusk or the trunk, and so each reaches a different conclusion about what an elephant looks like. Given more time and discussion, however, they are able to piece together a reasonably accurate picture of the whole animal.
Hunter argues the same is true of GPUs: each one learns from only part of the data, so they need to ‘talk’ to each other to build an accurate picture. As a result, adding more GPUs, or faster ones, does not automatically shorten training. The more GPUs there are and the faster they learn, the more they have to ‘talk’ about, and this communication can become the bottleneck that slows training down.
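Hunter's point about communication can be made concrete with a toy cost model. The sketch assumes a fixed global batch split evenly across GPUs, gradients exchanged with a ring all-reduce after every step, and made-up hardware figures (100 MB of gradients, 10 GB/s links, 10 microseconds of latency per message). None of these numbers come from IBM's report, and the function names are hypothetical.

```python
# Toy cost model of one synchronous data-parallel training step.
# All hardware figures below are hypothetical, chosen only for illustration.


def step_time(n_gpus: int, compute_s: float, grad_bytes: float,
              link_bytes_per_s: float, latency_s: float) -> float:
    """Seconds per step: compute on each GPU's share, then a ring all-reduce."""
    compute = compute_s / n_gpus  # fixed global batch split evenly
    if n_gpus == 1:
        return compute  # a single GPU has no one to talk to
    # Ring all-reduce: 2(N-1) sequential messages, and each GPU sends
    # 2(N-1)/N of the gradient in total over its link.
    latency = 2 * (n_gpus - 1) * latency_s
    transfer = 2 * (n_gpus - 1) / n_gpus * grad_bytes / link_bytes_per_s
    return compute + latency + transfer


def comm_share(n_gpus: int, compute_s: float, **net: float) -> float:
    """Fraction of each step spent communicating rather than computing."""
    total = step_time(n_gpus, compute_s, **net)
    return 1 - (compute_s / n_gpus) / total


# Hypothetical network: 100 MB of gradients, 10 GB/s links, 10 us per message.
NET = dict(grad_bytes=100e6, link_bytes_per_s=10e9, latency_s=10e-6)

for n in (1, 64, 256, 1024):
    print(f"{n:5d} GPUs: {step_time(n, 2.0, **NET) * 1000:8.2f} ms per step")

for compute in (2.0, 1.0):  # 1.0 models a GPU twice as fast
    share = comm_share(256, compute, **NET)
    print(f"{compute} s of compute: {share:.0%} of each 256-GPU step is communication")
```

With these assumed figures, going from 64 to 256 GPUs shortens each step, but going on to 1,024 makes it longer again, and the twice-as-fast GPU spends a larger share of each step communicating. That is the effect Hunter describes, and why software that makes the ‘talking’ cheaper matters.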
This is where scaling efficiency comes in: it measures how close a system gets to the ideal of training N times faster on N GPUs. According to IBM’s report, the best scaling previously demonstrated on 256 GPUs came from a team at Facebook AI Research (FAIR), which achieved 89% scaling efficiency on a cluster of 256 NVIDIA P100 GPUs using the Caffe2 deep learning software.
Training a ResNet-50 model on the same dataset as Facebook, the IBM Research DDL software achieved 95% scaling efficiency. The team also set a record for absolute training time, 50 minutes, beating Facebook’s previous record of one hour.
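Scaling efficiency is commonly defined as the speedup over a single GPU divided by the number of GPUs, so 100% would mean 256 GPUs train exactly 256 times faster than one. Assuming that definition (the report's exact measurement method is not described here), the short calculation below translates the two reported efficiencies into ‘useful’ GPUs and compares the two reported training times. The function names are hypothetical, and the time comparison may also reflect other differences between the two setups, not scaling efficiency alone.

```python
# Interpreting the reported scaling-efficiency figures.
# Assumption: efficiency = (speedup over one GPU) / (number of GPUs).


def scaling_efficiency(t_one_gpu: float, t_n_gpus: float, n: int) -> float:
    """1.0 means perfectly linear scaling: n GPUs are exactly n times faster."""
    return (t_one_gpu / t_n_gpus) / n


def effective_gpus(efficiency: float, n: int) -> float:
    """Number of perfectly used GPUs the cluster's throughput is equivalent to."""
    return efficiency * n


N = 256
for team, eff in (("FAIR (Caffe2)", 0.89), ("IBM DDL", 0.95)):
    print(f"{team}: {eff:.0%} efficiency ~ {effective_gpus(eff, N):.1f} of {N} GPUs doing useful work")

# Hypothetical timings: 256 hours on one GPU versus 1.05 hours on 256 GPUs.
print(f"example efficiency: {scaling_efficiency(256.0, 1.05, N):.1%}")

# Reported ResNet-50 training times, in minutes.
fair_minutes, ibm_minutes = 60, 50
print(f"IBM's time is {(fair_minutes - ibm_minutes) / fair_minutes:.1%} shorter")
```

Seen this way, the jump from 89% to 95% means roughly 15 more of the 256 GPUs' worth of work goes into learning rather than being lost to overhead.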