Understanding Speech Recognition

Command and Control versus Large Vocabulary Systems

Because speech recognition essentially uses statistical analysis to match spoken input to phonemes and then to words and phrases, it will be obvious that recognising only a small set of fairly distinctive (in terms of statistical properties) words is simpler and can be more accurate than identifying thousands or tens of thousands of words in unconstrained speech. Voice control systems not only need to recognise a relatively small set of words; the permitted combinations of words in command sentences further limit the domain for any given phrase. Because of this, command and control style recognition can work even in less than optimal acoustic circumstances and without user-specific training of the system. It is even possible to create voice control systems for users with severe dysarthria (although user training will often be required in that case).
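
To get a feel for how much a command grammar narrows the domain, the following minimal Python sketch counts the phrases a recogniser would have to tell apart in each case. The two-slot grammar, its words and the vocabulary size of 20,000 are illustrative assumptions, not figures from any real system.

```python
from typing import List


def unconstrained_sequences(vocab_size: int, length: int) -> int:
    """Count word sequences of a given length when any word may follow any other."""
    return vocab_size ** length


def grammar_sequences(slots: List[List[str]]) -> int:
    """Count the phrases allowed by a slot grammar (one choice per slot)."""
    total = 1
    for choices in slots:
        total *= len(choices)
    return total


# Hypothetical command grammar of the form <action> <target>.
command_slots = [
    ["play", "pause", "next", "previous"],
    ["track", "station", "disc"],
]

print(grammar_sequences(command_slots))      # 12
print(unconstrained_sequences(20_000, 2))    # 400000000
```

Even for two-word phrases the unconstrained case has hundreds of millions of candidates, while the grammar allows only a dozen. That gap is why the constrained recogniser can afford noisier input.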

Close-up of the Linguatronic lever in my car

A good illustration of a command and control application is the voice control system I have in my car. It is called Linguatronic, a factory-fitted system operated by pulling a lever and speaking commands such as "listen to phonebook", "dial number", "next track", etc. It allows me to operate the multimedia and sat-nav functions without taking my eyes off the road, and I find it very useful indeed. Because the system only needs to recognise a few hundred words, and only in a fairly limited number of combinations, it can work well even in a car driving at high speed and without speaker training (I speak English with quite an accent as I am not a native speaker, but that does not bother the Linguatronic system).
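
The benefit of a closed command set can be sketched in a few lines of Python. The example below is only an analogy: it works on text rather than audio, and it uses the standard library's `difflib` for fuzzy string matching, whereas a real recogniser applies the constraint while decoding the acoustic signal. The command list contains just the three commands quoted above; the function name `snap_to_command` and the 0.6 similarity cutoff are assumptions made for illustration and say nothing about how Linguatronic is actually implemented.

```python
import difflib

# The three commands quoted in the text; a real system would know a few hundred.
COMMANDS = ["listen to phonebook", "dial number", "next track"]


def snap_to_command(heard, cutoff=0.6):
    """Return the closest valid command, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(heard.lower(), COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A badly garbled input still lands on the only plausible command.
print(snap_to_command("nex trek"))                  # next track

# Input that resembles no command is rejected rather than guessed.
print(snap_to_command("what is the weather like"))  # None
```

Because only a handful of outcomes are possible, even a poor guess at what was said is usually closest to the right command. With a vocabulary of tens of thousands of words there would be many near neighbours, and the same amount of distortion would produce errors.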

By contrast, recognising large vocabularies across almost any topic is a much harder job. The differences between utterances can be very subtle, and words can appear in almost any context (especially given that people seldom talk in a way that rigorously adheres to grammar and other formal language rules). Moreover, living language usage evolves all the time. But the true weakness of speech recognition systems is that they do not understand language in the way humans do. This means that for large vocabulary applications, the output will inevitably be less than fully accurate.
