Voice Recognition Options

I’ve been having fun with SayIt, a voice recognition module available from Parallax (http://www.parallax.com) for about $60. The module is an affordable and painless means of adding voice recognition to your robot or other microprocessor-based application.

SayIt supports 23 built-in commands which can be overridden by 32 custom commands of your choosing, and comes with an easy-to-use interface for the BASIC Stamp 2, as well as a training application. You can use the module as-is with other microprocessors, but you’ll need a Stamp 2 if you want to add custom commands.

The voice recognition module brings back memories of my first voice recognition card from over 20 years ago. I forget the brand, but a card with functionality and vocabulary similar to that of SayIt was available for about $60 for the PC. Although the card required most of the resources of the PC for operation, training it was simpler than training the SayIt because of a screen-based menu system. Today, the same functionality is available on a thumbnail-sized chip running on an inexpensive microcontroller, with a total footprint smaller than that early voice recognition card.

Voice recognition isn’t new, and it has become commonplace on the PC. For example, I’m dictating this editorial using the latest version of Dragon Dictate for Windows. If you’re a Macintosh user, then you already have voice recognition software installed as part of your operating system. Unfortunately, the recognition accuracy of this ‘free’ software is less than stellar. You’d do better with an add-on product. In either case, dedicated hardware cards for speech recognition are a thing of the past, thanks to standard audio input/output hardware.

An advantage of using voice recognition to control your robot is that it frees your hands to do other things. Another advantage is that it enables those not versed in the nuances of their remote to operate their robot. It’s one thing to have a robot that only responds to your voice, but you can get a lot of mileage out of a robot that will respond to the verbal commands of your friends and family, as well.

Voice recognition really shines when it’s teamed with voice synthesis. A popular text-to-speech chip is Speakjet. If you look online, you can find numerous examples of interfacing the Speakjet to just about every microprocessor on the market. The output is distinctively ‘robotic,’ but in a robotics application, it’s fitting.

The trick to working with speech as a user interface for both feedback and commands is to pick short commands that are both readily recognizable by the system and readily learned by you and other users. For example, if you use simple one-syllable commands (such as “right” and “go”), then recognition accuracy will be poor even though the commands are intuitive and easy to remember. If you use multisyllable words that sound significantly different from each other, then recognition accuracy should increase considerably.
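One rough way to screen a candidate vocabulary is to flag pairs of commands that look alike. The sketch below compares spellings with Python’s standard difflib module, which is only a crude stand-in for how alike two words sound to a recognizer. The function name confusable_pairs and the 0.7 threshold are assumptions made for illustration, not part of SayIt or any other product.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(commands, threshold=0.7):
    """Return (word_a, word_b, score) for command pairs with similar spelling.

    Spelling similarity is an assumption-laden proxy for acoustic
    similarity; treat the result as a hint, not a measurement.
    """
    pairs = []
    for a, b in combinations(sorted(commands), 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


if __name__ == "__main__":
    vocabulary = ["right", "light", "forward", "reverse", "rotate"]
    for a, b, score in confusable_pairs(vocabulary):
        print(f"'{a}' and '{b}' may be confused (spelling similarity {score})")
```

Any pair this flags is worth testing aloud with the actual module before you commit to it, since the recognizer hears sounds and not letters.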

As a benchmark, ‘good’ accuracy in the current generation of inexpensive, discrete-word, limited-vocabulary voice recognition hardware is 90% or greater. At 90%, that’s one error in every 10 words. The occasional error isn’t a big deal if you’re steering a carpet rover with voice commands. However, the error can become a serious matter for someone controlling a surgical robot or a weapon system. Even if you’re just controlling a desktop robotic arm, it’s a good idea to have a manual override for when the occasional misunderstanding occurs.
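To get a feel for what 90% accuracy means over a string of commands, here is a small arithmetic sketch. It assumes each recognition is independent of the others, which is a simplification, and the function names are hypothetical.

```python
def sequence_success_probability(per_word_accuracy, num_commands):
    """Chance that every command in a sequence is recognized correctly,
    assuming independent recognitions (a simplifying assumption)."""
    if not 0.0 <= per_word_accuracy <= 1.0:
        raise ValueError("per_word_accuracy must be between 0 and 1")
    if num_commands < 0:
        raise ValueError("num_commands must not be negative")
    return per_word_accuracy ** num_commands


def expected_errors(per_word_accuracy, num_commands):
    """Average number of misrecognized commands in a sequence."""
    return (1.0 - per_word_accuracy) * num_commands


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        p = sequence_success_probability(0.90, n)
        print(f"{n:2d} commands: {p:.1%} chance of no errors, "
              f"about {expected_errors(0.90, n):.1f} errors expected")
```

Under that assumption, the chance of getting through 10 commands without a single misrecognition is only about 35%, which is why a manual override is worth having.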

A more sophisticated workaround is to have the system repeat your command and wait for an acknowledgement. If you don’t affirm the command within a few seconds, the system ignores the command. Depending on your robotic application and the complexity of commands, the tedium of acknowledging every command may be unbearable. I like the kill switch approach. It’s immediate, and I can usually tell within a second or two whether my voice command has been properly interpreted.
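The repeat-and-acknowledge approach can be sketched as follows. The speak, listen, and execute callables are hypothetical placeholders for whatever speech output, recognizer, and motor code your robot uses, and the acknowledgement words and three-second timeout are assumptions.

```python
from typing import Callable, Optional

# Hypothetical acknowledgement words; choose ones your recognizer handles well.
AFFIRMATIVE = {"yes", "confirm"}


def confirm_command(command: str,
                    speak: Callable[[str], None],
                    listen: Callable[[float], Optional[str]],
                    timeout_s: float = 3.0) -> bool:
    """Repeat the command back; return True only if the user affirms in time.

    listen(timeout_s) is assumed to return the recognized word, or None
    if nothing was heard before the timeout expired.
    """
    speak(f"Did you say {command}?")
    reply = listen(timeout_s)
    return reply is not None and reply.lower() in AFFIRMATIVE


def handle_command(command: str,
                   speak: Callable[[str], None],
                   listen: Callable[[float], Optional[str]],
                   execute: Callable[[str], None],
                   timeout_s: float = 3.0) -> bool:
    """Execute the command only after confirmation; otherwise ignore it."""
    if confirm_command(command, speak, listen, timeout_s):
        execute(command)
        return True
    return False


if __name__ == "__main__":
    # Stand-ins for real hardware: print for speech, a canned reply for listening.
    handle_command("forward", print, lambda timeout: "yes",
                   lambda cmd: print(f"executing {cmd}"))
```

Passing the hardware functions in as arguments keeps the confirmation logic testable on a desktop before it goes anywhere near the robot.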

The overhead imposed by speech recognition and, to a lesser extent, text-to-speech can be significant on a robot tasked with monitoring multiple sensors and controlling several motors. A workaround is to dedicate a microcontroller to the speech interface. This frees your main microcontroller to deal with navigation and other primary tasks.

If funds aren’t a consideration, then the way to go is to handle the voice recognition and synthesis remotely on your PC. By sending the audio from your robot to a PC or laptop and having the audio processed there, you’ll be able to use large vocabulary continuous speech to control your robot. Instead of single, low level commands, you can issue complex, high level commands such as “pick up the red ball.” Of course, you have to have other resources (such as image recognition) to identify the red ball.

You’ll also need a macro language to translate “pick up” into a series of commands that are tied into feedback from sensors on your robot. Obviously, developing a sophisticated human-robot interface based on verbal communications is no small undertaking. However, if you’re new to this arena, experimenting with an inexpensive voice recognition chip is a great place to start. If you’re using voice recognition and speech synthesis in your robotic project, please consider sharing your experiences with other readers. SV
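As a minimal sketch of the macro idea, the table below maps a spoken phrase to a list of low-level steps, and each step reports success or failure so that sensor feedback can halt the sequence. The step names, the MACROS table, and the functions are all hypothetical; a real robot would supply its own.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical macro table: high-level phrase -> ordered low-level steps.
MACROS: Dict[str, List[str]] = {
    "pick up": ["locate_target", "approach_target", "open_gripper",
                "close_gripper", "raise_arm"],
    "put down": ["lower_arm", "open_gripper", "raise_arm"],
}


def expand(utterance: str) -> Tuple[List[str], str]:
    """Split an utterance into (steps, target) using the macro table."""
    text = " ".join(utterance.lower().split())
    # Try longer phrases first so a longer macro wins over a shorter prefix.
    for phrase in sorted(MACROS, key=len, reverse=True):
        if text == phrase or text.startswith(phrase + " "):
            return MACROS[phrase], text[len(phrase):].strip()
    raise KeyError(f"no macro matches {utterance!r}")


def run(steps: List[str], target: str,
        actions: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Run steps in order, stopping at the first one that reports failure.

    Each action is assumed to return True or False based on sensor feedback.
    Returns the names of the steps that succeeded.
    """
    done = []
    for name in steps:
        if not actions[name](target):
            break
        done.append(name)
    return done


if __name__ == "__main__":
    steps, target = expand("pick up the red ball")
    # Stub actions that always succeed, standing in for real robot code.
    stubs = {name: (lambda t: True) for name in steps}
    print(target, run(steps, target, stubs))
```

Identifying the target itself (here, the red ball) is left to whatever image recognition or other sensing the robot has.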

Posted by Michael Kaudze on 07/06 at 10:35 AM

