Understanding Speech Recognition

Real-world applications

Despite its limitations, present-day speech recognition technology can be a very useful tool for a variety of applications, as long as designers and users fully understand the boundaries and weaknesses of such systems. Regrettably, the desire to hype a new product or a new generation of speech recognition engine sometimes leads to blatantly misleading statements about, or misrepresentation of, the realities of speech recognition and its role in real-world delivery. Notable examples in this category are the Lernout & Hauspie debacle and, more recently, the SpinVox saga.

Speech recognition is already used for live subtitling on television, as a dictation tool in the medical and legal professions, and for off-line speech-to-text conversion or note-taking systems. For all these applications, human editing of the output is needed to achieve really good levels of accuracy. In addition, and as already mentioned, there is an increasing number of small-vocabulary or specialised command-and-control applications, from sat-nav systems and voice commands in smartphones to home automation.

What is not feasible with the current state of science and technology is a system that converts free, natural speech into text in a fully reliable manner, or with at least human-level accuracy. In my previous role as a director of technology at an organisation for people with hearing loss, I was often asked why we didn't produce an app for a mobile handset that people with hearing loss could use to convert speech into text on the fly while going about their daily business. As should be clear from this article, that is well beyond the capabilities of current science and technology. Apart from the fundamental problem, covered earlier, that such a device would not truly "understand" the speech, the real-world situations in which it would need to function unfortunately make it quite unfeasible: the places where it would be most needed (ticket counters in a post office or the underground, meetings, receptions and other noisy environments, out and about in the city) are acoustically totally unsuitable for this application.

There are obviously other problems with the concept of large-vocabulary speech recognition on the move (not to be confused with the small-vocabulary command systems on mobiles already referred to), such as the limited processing power and memory of mobile devices compared to PCs. However, those limitations should ultimately disappear under the influence of Moore's Law. Also, with mobile broadband now well advanced, it would be feasible to put the recognition function in the network and use the mobile device only to capture the audio, stream it to a network-based recogniser and get the text back. However, the fundamental problem of non-understanding remains, as do the problems of background noise. Microphones in mobile handsets also often produce an inferior signal compared to a high-quality microphone for desktop use.
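The network-based arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any real product: the function names are hypothetical, the recogniser is a stand-in that merely reports how many bytes it received, and the chunk size assumes 16 kHz, 16-bit mono audio (3200 bytes per 100 ms).

```python
from typing import Callable, Iterator, List


def frames(audio: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Split captured audio into fixed-size chunks for streaming."""
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


def stream_to_recogniser(
    audio: bytes,
    recognise_chunk: Callable[[bytes], str],
    frame_bytes: int = 3200,  # assumption: 100 ms of 16 kHz, 16-bit mono audio
) -> str:
    """Pass each chunk to the recogniser and join the returned text.

    `recognise_chunk` is a hypothetical callable standing in for the
    round trip to a network-based recogniser.
    """
    parts: List[str] = []
    for chunk in frames(audio, frame_bytes):
        text = recognise_chunk(chunk)
        if text:  # skip chunks for which nothing was recognised
            parts.append(text)
    return " ".join(parts)


def fake_recogniser(chunk: bytes) -> str:
    """Stand-in for the network service: reports only the chunk size."""
    return f"<{len(chunk)} bytes>"
```

The handset's job here is reduced to capturing and chunking audio, which is why limited processing power stops being the bottleneck. Nothing in this arrangement addresses background noise or the lack of understanding: a poor signal captured on the handset is still a poor signal when it reaches the recogniser.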

In conclusion: speech recognition offers real potential, but it also comes with fairly rigid and significant limitations. Unless we make real progress in the field of Artificial Intelligence, these limitations will broadly remain. Designers and engineers building products and services based on speech recognition must take these limitations into account, and user expectations must be managed appropriately.

 
