Speech recognition systems are assessed using a standard test called Switchboard, a collection of recorded telephone conversations. Microsoft has achieved a 5.1 percent word error rate on this test using new machine learning techniques, putting its software on par with human transcribers who are concentrating hard on recognising and transcribing audio.
Last year, Microsoft’s system produced a 5.9 percent word error rate, which was regarded as the accepted level of accuracy for most people. Microsoft Research says the improved accuracy of 5.1 percent, the level researchers have found human transcribers achieve, comes from improved machine learning algorithms that are better at predicting words in conversation, as well as from better recognition.
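For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal Python illustration of that standard definition, not Microsoft's evaluation code. The function name `word_error_rate` and the sample sentences are made up for illustration, and it assumes both transcripts have already been normalised (case, punctuation) and can be split on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only (hypothetical name); assumes whitespace-separated,
    already-normalised transcripts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one wrong word out of six in the reference.
    rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{rate:.1%}")
```

On this scale, a 5.1 percent word error rate corresponds to roughly one word in twenty being wrong, missing or spurious relative to the reference transcript.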
Much of the work is powered by Azure GPUs.
The systems developed by Microsoft Research will become part of services such as Cortana and Microsoft Cognitive Services.