Xiongguo Lei, Vice President, AISpeech
With the boom in artificial intelligence, intelligent voice technology, the most natural method of interaction, has come into focus. Many companies, including Nuance, Amazon, and Microsoft, are devoted to developing voice technology.
Among the various speech technologies, automatic speech recognition (ASR) has always attracted the most attention. I believe that ASR accuracy can be increased by new techniques in acoustic modeling and signal processing. For example, by using more advanced microphone array techniques, we can significantly reduce noise and side talk and thus improve recognition accuracy under these conditions. We can also generate or collect more training data for far-field microphones and thus improve performance when similar microphones are used.
The microphone array has become a necessary tool in speech interaction. A typical example is Amazon Echo, which employs a circular array. AISpeech released a “7-Microphone Circular Array” solution in December 2015. In this module, six microphones form a ring and one microphone sits in the center for sound pickup. It supports far-field voice recognition, with accuracy above 92% within 5 meters. It covers a full 360 degrees with a margin of error of ±10 degrees. Through denoising algorithms and speech enhancement, it can identify environmental noise and improve recognition accuracy. This solution is suitable for smart home devices and robots that need to pick up sound without blind spots, such as smart speakers. This technique forms a solid foundation for voice interaction.
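To make the geometry concrete, here is a minimal sketch of how a seven-microphone circular layout (six on a ring, one in the center) relates a sound's direction to the arrival-time differences between microphones. It is an illustration only, not AISpeech's implementation: the 4 cm ring radius, the speed of sound of 343 m/s, the far-field (plane wave) assumption, and the function names mic_positions, steering_delays, and estimate_azimuth are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air (assumption)


def mic_positions(radius=0.04):
    """Return (x, y) of 7 mics: index 0 is the center, 1..6 lie on the ring.

    The 0.04 m radius is a hypothetical value chosen for illustration.
    """
    ring = [
        (radius * math.cos(math.radians(60 * k)),
         radius * math.sin(math.radians(60 * k)))
        for k in range(6)
    ]
    return [(0.0, 0.0)] + ring


def steering_delays(azimuth_deg, radius=0.04):
    """Far-field arrival delay (seconds) at each mic, relative to the center.

    A mic closer to the source hears the wavefront earlier (negative delay).
    """
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    return [-(x * ux + y * uy) / SPEED_OF_SOUND
            for x, y in mic_positions(radius)]


def estimate_azimuth(observed_delays, radius=0.04, step_deg=1):
    """Grid-search the azimuth whose ideal delays best match the observed ones."""
    def squared_error(azimuth):
        ideal = steering_delays(azimuth, radius)
        return sum((a - b) ** 2 for a, b in zip(ideal, observed_delays))

    return min(range(0, 360, step_deg), key=squared_error)


if __name__ == "__main__":
    delays = steering_delays(75.0)
    print(estimate_azimuth(delays))
```

In a real device the delays would have to be estimated from noisy audio, and shifting each channel by its delay before summing (delay-and-sum beamforming) is one common way to emphasize sound from the chosen direction; the sketch above covers only the underlying geometry.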
Besides ASR, a voice solution also has to strengthen its dialogue interaction ability. When we talk about artificially intelligent interaction, what we are actually talking about is the backend information resources behind the interaction; voice, vision, and action are only the methods of interaction. Providing the resources needed to satisfy users’ needs is therefore the key to interaction.
Although voice technology is already widely used, it still has a long way to go. Many capabilities are so far out of reach for current deep learning technology and require “looking outside” into other fields, such as cognitive science, computational linguistics, and neuroscience, to assist the development of voice technology. As voice technology spreads, human-machine interaction will become more practical and more interesting in everyday life.