I just finished listening to book one of Kevin Anderson’s Saga of Seven Suns, in which robots play a central role. In the story, the Klikiss robots are highly intelligent, multi-limbed bug-like creatures that communicate with other robots using digital data streams and with humans via speech. The tale reminded me that at least one perception of intelligent robots revolves around the power of speech.
Unfortunately, progress in robotic speech is relatively stagnant. Speech synthesis has been a mature technology for decades, and advances in large-vocabulary, continuous speech recognition seem to have hit a wall in the late 1990s. This is in part because the projected multibillion-dollar market for PC-based speech recognition document processing products never materialized. Today, few people even take notice of the speech recognition software available for the PC and Mac – and most hate the speech recognition systems used by the automated attendants employed by the airlines and credit card industries.
Despite the mystique of “AI” surrounding speech recognition, speech recognition software that you can purchase for your PC/Mac works by simply matching spectral templates of sounds and using tables of likely word sequences to build sentences. For example, if you say “ball,” the speech recognition software would identify likely candidates such as “ball,” “fall,” and “gall.” Now, if the previous three words are “Johnny hit the,” the algorithm will likely rank “ball” as the most probable word. Accuracy currently tops out at about 97%, even with individual training, and adding processing power or memory doesn’t improve it.
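To make the “Johnny hit the ball” example concrete, here is a minimal Python sketch of that ranking step. It is not how any particular commercial product works; the scores, the table contents, and the names `acoustic_scores`, `sequence_table`, and `rank_candidates` are all made up for illustration.

```python
# Hypothetical acoustic scores: how well each candidate matches the spoken sound.
acoustic_scores = {"ball": 0.40, "fall": 0.35, "gall": 0.25}

# Hypothetical table of how likely each word is to follow a three-word context.
sequence_table = {
    ("johnny", "hit", "the"): {"ball": 0.60, "fall": 0.01, "gall": 0.001},
}


def rank_candidates(previous_words, acoustic, table, floor=1e-6):
    """Rank candidates by acoustic score times word-sequence likelihood."""
    # Use the last three words as the lookup key, ignoring case.
    context = tuple(word.lower() for word in previous_words[-3:])
    likelihoods = table.get(context, {})
    # Words missing from the table get a small floor value instead of zero.
    combined = {
        word: score * likelihoods.get(word, floor)
        for word, score in acoustic.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


ranking = rank_candidates(["Johnny", "hit", "the"], acoustic_scores, sequence_table)
print(ranking)  # ['ball', 'fall', 'gall']
```

Notice that nothing in this sketch understands what a ball is. The ranking comes entirely from the numbers in the two tables.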
The obvious limitation of current speech recognition software is that it’s simply a replacement for the keyboard and video display. There is no underlying intelligence or reasoning capability. Of course, prototype systems capable of reasoning have been developed in academia, but these demonstration projects have been limited to highly constrained domains.
What we need in robotics is a system that not only recognizes the phrase, “Johnny hit the ball,” but that can also infer what he hit it with. If Johnny is playing soccer, we might infer he hit the ball with his head. If the sport is baseball, then we might infer he used a bat. Returning to our needs in robotics, the owner of a service bot should be able to say, “Please bring me the paper,” and the robot should be able to infer that the owner is referring to the newspaper. There are also issues of image recognition, mobility, and grasping the paper, but they all depend on the robot understanding the need of the owner.
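One very simple way to picture this kind of inference is a lookup keyed on context, as in the Python sketch below. This is only a toy under my own assumptions: the tables `INSTRUMENTS` and `REFERENTS`, the “home” setting label, and both function names are hypothetical, and a real system would need far richer knowledge than two small dictionaries.

```python
# Hypothetical knowledge: what Johnny most likely hit the ball with, by activity.
INSTRUMENTS = {
    ("hit", "ball", "soccer"): "head",
    ("hit", "ball", "baseball"): "bat",
}

# Hypothetical knowledge: what an ambiguous word most likely means in a setting.
REFERENTS = {
    ("paper", "home"): "newspaper",
}


def infer_instrument(verb, obj, activity):
    """Return the likely instrument, or None if the context is unknown."""
    return INSTRUMENTS.get((verb, obj, activity))


def resolve_referent(word, setting):
    """Return the likely meaning of a word, or the word itself if unknown."""
    return REFERENTS.get((word, setting), word)


print(infer_instrument("hit", "ball", "soccer"))    # head
print(infer_instrument("hit", "ball", "baseball"))  # bat
print(resolve_referent("paper", "home"))            # newspaper
```

The recognized words are identical in each call. Only the context changes the answer, which is exactly the part that today’s speech recognition software leaves out.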
The limitation of speech recognition in robotics, then, isn’t in transforming utterances into machine-readable form, but in how the computational elements of the robot turn those machine-readable words and phrases into actionable commands. So, how do you go about accomplishing this?
It’s a non-trivial task, as a search of the IEEE literature on Natural Language Processing will illustrate. The traditional techniques — such as Hidden Markov Modeling — might be a bit intimidating if you don’t have a degree in computer science. However, you can get a feel for the tools used to map out the contextual meanings of words and phrases by working with Personal Brain. You can download the free, fully-functional personal version at www.thebrain.com.
You can use the Brain to build context maps that show, for example, inheritance and the relationship between various objects in your home (see Figure 1). For your robot to bring you the newspaper, it would have to first locate the paper, and it would help to know where in the home the paper is likely to be found. It would be inefficient, for example, if the robot began digging through your clothes closet in search of the newspaper, instead of looking on the table in your kitchen.
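If you would rather experiment in code, a context map of this sort can be approximated with a small graph. The Python sketch below is my own illustration, not a representation of how Personal Brain stores its data; the entries in `context_map`, the `is_a` and `found_in` link names, and the `likely_locations` function are all hypothetical.

```python
# Hypothetical context map: "is_a" links give inheritance,
# "found_in" links give the places worth searching, best guess first.
context_map = {
    "newspaper": {"is_a": "reading material"},
    "reading material": {"found_in": ["kitchen table", "living room sofa"]},
    "shirt": {"is_a": "clothing"},
    "clothing": {"found_in": ["clothes closet"]},
}


def likely_locations(item, graph):
    """Follow is_a links upward, collecting found_in locations along the way."""
    locations = []
    seen = set()  # guards against accidental loops in the map
    while item is not None and item not in seen:
        seen.add(item)
        node = graph.get(item, {})
        for place in node.get("found_in", []):
            if place not in locations:
                locations.append(place)
        item = node.get("is_a")
    return locations


print(likely_locations("newspaper", context_map))
# ['kitchen table', 'living room sofa']
```

Because the newspaper inherits its locations from “reading material,” the robot would start at the kitchen table and never consider the clothes closet.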
Once you get used to working with Personal Brain, you might want to explore other uses in robotics. For example, I keep track of my various robotic projects – parts, suppliers, references, etc. – by creating networks with the program. In fact, the best way to learn to build context maps is to create explicit, detailed maps that actually help you in everyday tasks. SV