Voice Interaction: Making Life More Practical and Interesting

By Xiongguo Lei, Vice President, AISpeech

With the boom in artificial intelligence, intelligent voice technology, the most natural method of interaction, has come into focus. Many companies, such as Nuance, Amazon, and Microsoft, are devoted to developing voice technology.

Among the various speech technologies, automatic speech recognition (ASR) has always attracted the most attention. I believe that ASR accuracy can be increased by new technologies in acoustic modeling and signal processing. For example, by using more advanced microphone array techniques, we can significantly reduce noise and side talk and thus improve recognition accuracy under these conditions. We may also generate or collect more training data for far-field microphones and thus improve performance when similar microphones are used.

The microphone array has become a necessary tool in speech interaction. A typical example is Amazon Echo, which employs a circular array. AISpeech released a “7-Microphone Circular Array” solution in December 2015. In this module, six microphones form a ring, with one microphone in the center for sound pickup. It supports far-field voice recognition, with accuracy above 92% within 5 meters. It covers 360 degrees with a margin of error of ±10 degrees. Through denoising algorithms and speech enhancement, it can identify environmental noise and improve recognition accuracy. This solution is suitable for smart home devices and robots that need to pick up sound without dead angles, such as speakers. This technique forms a solid foundation for voice interaction.
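To make the geometry concrete, here is a minimal sketch of how a six-plus-one circular layout lets software tell where a sound comes from. It is not AISpeech's algorithm: it uses the textbook far-field plane-wave model, and the 4 cm ring radius, the function names, and the simple grid search are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def circular_array_positions(num_ring_mics=6, radius_m=0.04):
    """Return (x, y) positions: one center mic, then evenly spaced ring mics.

    The 0.04 m radius is an illustrative assumption, not a published spec.
    """
    positions = [(0.0, 0.0)]
    for k in range(num_ring_mics):
        angle = 2.0 * math.pi * k / num_ring_mics
        positions.append((radius_m * math.cos(angle), radius_m * math.sin(angle)))
    return positions


def steering_delays(positions, azimuth_deg):
    """Arrival-time offset (seconds) of each mic relative to the center mic
    for a far-field plane wave arriving from azimuth_deg."""
    theta = math.radians(azimuth_deg)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector toward the source
    # A mic displaced toward the source hears the wavefront earlier,
    # so its offset is negative.
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in positions]


def estimate_azimuth(observed_delays, positions, step_deg=1):
    """Grid-search the azimuth whose predicted delays best match the
    observed ones in the least-squares sense."""
    best_angle, best_err = 0, float("inf")
    for angle in range(0, 360, step_deg):
        predicted = steering_delays(positions, angle)
        err = sum((p - o) ** 2 for p, o in zip(predicted, observed_delays))
        if err < best_err:
            best_angle, best_err = angle, err
    return best_angle


if __name__ == "__main__":
    mics = circular_array_positions()
    delays = steering_delays(mics, 75)  # simulate a talker at 75 degrees
    print(estimate_azimuth(delays, mics))
```

Because the ring microphones are spread evenly around the circle, every direction produces a distinct pattern of arrival-time offsets, which is why a circular array can cover a full 360 degrees. A real system would estimate the offsets from noisy audio and then align and sum the channels to enhance the target speaker.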

Besides ASR, a voice solution also has to strengthen its dialogue interaction capability. When we talk about artificial intelligence interaction, what we are actually talking about is the backend information resources for interaction; voice, vision, and action are only the methods of interaction. So providing the necessary resources to satisfy users’ needs is the key to interaction.

Although current voice technology is widely used, it still has a long way to go. Many capabilities are so far not reachable with current deep learning technology and require “looking outside” into other fields, such as cognitive science, computational linguistics, and neuroscience, to assist in the development of voice technology. With the widespread use of voice technology, human-machine interaction will become more practical and interesting in our real lives.
