Controlling Robots Using Amazon’s Alexa
If you have ever used an Alexa (Amazon’s Echo device), you may have thought how great it would be if she could be used to control your robots. The custom Alexa skill described in this article makes that possible.
My previous write-up discussed TCP communication. The capabilities explained in that article allow you to send messages and thus control devices like robots over the Internet. Another great thing about understanding how to send and receive messages over the ‘net is that it makes it possible to utilize a unique Alexa skill called ROBOT BASIC to control your robot with voice commands. While the skill was designed to work seamlessly with the RobotBASIC language (free from RobotBASIC.org), it can be used with any system capable of TCP communication.
Alexa is a fantastic piece of hardware with extraordinary voice recognition capabilities due to multiple microphones and noise-cancelling algorithms. Voice commands can be whispered to Alexa if she is close by or yelled from another room. Recognition is generally not greatly affected by many of the problems typically associated with voice control products.
It’s not always perfect, of course, but I found it much better than other systems I have tried. Strategically placing two or three Alexa devices throughout your home can often make voice control a reality from almost anywhere in the house.
The first time I used an Alexa, I immediately wanted to interface it with my robotics projects. The possibility of controlling robots like my Arlo was intriguing, to say the least (see Figure 1 and a series of SERVO articles on Arlo starting back in January 2015).
I signed up with Amazon as an Alexa developer, but quickly found that Amazon has placed many restrictions, not only on what can be said but on exactly how things must be stated. While this certainly makes sense for typical Alexa applications, I needed a less confining approach. I needed someone who truly understood the options that were available.
I sought out a developer with Alexa experience and we used my ideas and his skills to create a workable system. Because of Amazon’s requirements and restrictions, there are some things I’m not totally happy with, but overall, the ROBOT BASIC skill provides a speech recognition system for robotics with capabilities beyond anything most hobbyists have experienced.
Alexa is a cloud-based system. When you speak, she simply sends the converted text to one of many programs running on an Amazon server (these programs effectively implement the Alexa skills).
The programs can be written in languages such as Java or Python, but Node.js (a JavaScript runtime that Amazon supports for skill code) is usually the best choice for Alexa developers.
The need to get the spoken text sent to the right program is a major source of many of Amazon’s restrictions on the syntax of verbal commands.
For my purposes, Alexa needed to send the text to the ROBOT BASIC skill in the cloud.
For that reason, the name ROBOT BASIC itself had to be distinctive, so that the various commands would be sent to the right program.
Once you get Alexa’s attention using the wake word (I use Alexa), you must say something that lets her know that the spoken information should be sent to the ROBOT BASIC program. You can use phrases like OPEN ROBOT BASIC, LAUNCH ROBOT BASIC, TELL ROBOT BASIC, ASK ROBOT BASIC, etc.
The words you can use are tightly controlled by Amazon and very little leeway is allowed in many cases.
Normally, the cloud-based program will handle all the details of implementing a skill. Generally, this involves parsing the text, determining what is being requested, and, in some way, completing the requested task.
This lets Alexa respond properly to phrases like “Alexa what is the weather today” or “Alexa turn on my floor lamp.”
It’s worth mentioning that there are a few control-a-robot skills available for Alexa, but to my knowledge, all are very restrictive in the commands allowed because the skill itself performs the actions (like typical Alexa skills). Sometimes, they even require a special phone app to utilize the application. I wanted something much more versatile.
My original goal was to have the cloud-based skill send the text of a spoken command over the Internet to a specific IP address and port (see my last article for more information on Internet communication). Node.js is capable of communicating using TCP, so this was feasible.
As mentioned earlier, this process is totally different from how Amazon expects skill programs to work. In this case, the skill serves only as a speech-to-text converter. Processing the command and performing the actions associated with it now become the responsibility of the program receiving the TCP message containing the spoken information.
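To make the receiving side concrete, here is a minimal sketch in Python rather than RobotBASIC. It assumes the skill delivers each command as a short UTF-8 string over a single TCP connection, and it answers with GOT IT as the sample program's comments suggest. The function names and the 1024-byte read size are illustrative assumptions, not part of the skill.

```python
import socket

def open_server(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Create a listening TCP socket on the given port (hypothetical helper)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server

def read_command(server: socket.socket) -> str:
    """Wait for one connection, read the command text, and acknowledge it."""
    conn, _addr = server.accept()
    with conn:
        # Assumption: a spoken command is short enough to arrive in one read.
        data = conn.recv(1024)
        conn.sendall(b"GOT IT")
    return data.decode("utf-8", errors="replace").strip()

# Typical use (blocks until a message arrives):
#   server = open_server(42001)
#   print(read_command(server))
```

Everything after this point, deciding what the text means and what the robot should do, belongs to the receiving program rather than to the skill.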
Structuring the Commands
I had hoped that once Alexa determined I wanted to use the ROBOT BASIC skill, she could send everything said to the cloud so that it could be relayed with TCP to my intended destination. Unfortunately, that’s not how Alexa works.
For a custom skill like I wanted, the user must use specific transition words in a dictated sequence (words like tell, ask, to, for, and if, for example) to connect the skill’s invocation name (ROBOT BASIC) to special, predefined command words. While this makes total sense for the situations Alexa was designed to handle, it complicates things severely for what I wanted to accomplish.
In the end, I came up with some specific phrases that satisfy Amazon’s requirements while still providing the functionality I wanted. The phrases are not ideal, but at least with Amazon’s current restrictions, I think they’re a good compromise. Figure 2 shows some sample commands that work. Brackets are used in these examples to indicate the text that will be relayed using TCP communication. Note: You can also use the word “ask” instead of “tell” for all of these commands.
Alexa tell robot basic to make <command text>
Alexa tell robot basic it should <command text>
Alexa tell robot basic to move <command text>
Alexa tell robot basic the robot <command text>
This means you could say “Alexa tell robot basic to move Arlo to the kitchen.” The word “Alexa” wakes up the machine, “tell robot basic” controls which cloud-based program will get the information, and “to move” provides a required transition to the actual text (Arlo to the kitchen) to be sent to the cloud. Once the skill receives this text, it’s easy to relay it using TCP to a program that’s controlling your robot.
An important point here is that the skill receives only the text inside the brackets, so that is the only text that can be sent to a TCP server that controls your robot.
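The split between the fixed part of a phrase and the relayed part can be illustrated with a short Python sketch. This is not how Alexa does it internally; it only mimics the four sample patterns above so you can see which words would reach your server. The function name and the lowercase normalization are assumptions.

```python
from typing import Optional

# Transition phrases taken from the sample commands above.
TRANSITIONS = ("to make", "it should", "to move", "the robot")

def relayed_text(utterance: str) -> Optional[str]:
    """Return the part of a spoken phrase that would be relayed, or None."""
    words = utterance.lower().split()
    # Expected prefix: "alexa", then "tell" or "ask", then "robot basic".
    if len(words) < 4:
        return None
    if words[0] != "alexa" or words[1] not in ("tell", "ask"):
        return None
    if words[2:4] != ["robot", "basic"]:
        return None
    rest = " ".join(words[4:])
    for transition in TRANSITIONS:
        if rest.startswith(transition + " "):
            return rest[len(transition) + 1:]
    return None

# Example: relayed_text("Alexa tell robot basic to move Arlo to the kitchen")
# yields "arlo to the kitchen" (lowercased by this sketch).
```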
I’m using RobotBASIC to create a sample TCP server (see my last article) to receive this text (and to act upon it as I see fit), but you can use any programming language capable of TCP communication. It’s the responsibility of the server program to parse the text and look for specific words such as Arlo, kitchen, etc., and then perform the desired action.
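As a rough illustration of that parsing step, the Python sketch below scans the received text for known words and ignores everything else. The vocabulary lists and the returned dictionary layout are hypothetical; a real program would use whatever names and places make sense for its robot.

```python
import re

# Hypothetical vocabulary; replace with the words your own robot understands.
KNOWN_ROBOTS = ("arlo",)
KNOWN_PLACES = ("kitchen", "bedroom", "garage")

def interpret(command_text: str) -> dict:
    """Pick out the robot name and destination from the relayed text."""
    words = re.findall(r"[a-z]+", command_text.lower())
    robot = next((w for w in words if w in KNOWN_ROBOTS), None)
    place = next((w for w in words if w in KNOWN_PLACES), None)
    return {"robot": robot, "place": place}

# Example: interpret("Arlo to the kitchen")
# returns {"robot": "arlo", "place": "kitchen"}
```

Looking for keywords instead of matching whole sentences keeps the program tolerant of filler words and small recognition errors.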
If this is confusing, refer to my previous Arlo articles in SERVO (or my book, Arlo: The Robot You’ve Always Wanted). The articles explain how to utilize Windows’ speech-to-text capability to control Arlo. Since Windows and Alexa both simply provide the spoken command as a string, it’s easy to modify a program designed for one to work with the other. Take a look at Figure 2.
Using Your IP Address and Port Number
Of course, when you give commands to your Alexa, it’s essential that the translated text be sent to the computer that controls your robot. This means that each user of the ROBOT BASIC skill must have a stored IP address and port number that are used just for them. Fortunately, Amazon provides the ability for skill programs to store information in a special database file. Linking your IP and port information to commands given to your Alexa is all handled internally by the skill itself.
You can set up your personal IP and port number by using verbal commands such as:
Alexa tell robot basic the new I P address is 1 9 2 dot 1 6 8 dot 0 dot 1 2 0
Alexa tell robot basic the new port number is 4 2 0 0 1
Since Alexa has fantastic recognition capabilities, you can even say things like one hundred ninety two instead of the individual digits 1 9 2 if you wish. Of course, if the IP address of your machine changes or if you have a conflict with the port number you’re using, you’ll need to specify new values. Generally, this should seldom be a problem.
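The skill handles this conversion for you, and its internals are not shown here. Purely as an illustration of the idea, the Python sketch below turns digit-by-digit text like the examples above into a usable address and port. It assumes the text arrives as space-separated digits with the word "dot" between groups, and it does not handle number words such as "one hundred ninety two".

```python
import ipaddress

def spoken_ip_to_dotted(spoken: str) -> str:
    """Convert '1 9 2 dot 1 6 8 dot 0 dot 1 2 0' to '192.168.0.120'."""
    groups = spoken.lower().split(" dot ")
    dotted = ".".join("".join(group.split()) for group in groups)
    # Raises ValueError if the result is not a valid IPv4 address.
    ipaddress.IPv4Address(dotted)
    return dotted

def spoken_port_to_int(spoken: str) -> int:
    """Convert '4 2 0 0 1' to 42001, rejecting out-of-range values."""
    port = int("".join(spoken.split()))
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return port
```

Validating the values before storing them means a misheard digit is caught immediately instead of causing silent connection failures later.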
Controlling a Robot
The details of how your program controls your robot will generally be unique to your situation. In order to demonstrate how to perform such control, my example program will allow Alexa commands to move RobotBASIC’s integrated simulated robot around the screen. One of the great things about this option is that this very same program can move any real robots that are built using the RobotBASIC RROS system. Refer to my articles in the December 2017, January 2018, and March 2018 issues of SERVO for more information on this approach.
The program in Figure 3 is a modified version of the TCP server program discussed in the Sept/Oct issue.
// initialize TCP
lcdPort = 42000
n = tcps_Serve(lcdPort)
// wait for message and send back GOT IT
rBuff = tcps_Read()
vp+=15 // move text down by 15
xyString 585,5,"Text Received"
vp = 25 //vert pos of text
// create obstacles
// create the robot
// rBuff contains the text
n = 60 // amount to move
dir = 1 //direction of turns
dir = -1
if not rFeel()&13 then rForward n
When run, the program produces an environment for the robot to use when executing Alexa commands (see Figure 4).
There are three obstacles on the screen that can hamper the robot’s movements. To prevent collisions, the program uses the robot’s rFeel() function to determine whether objects are in the robot’s way, and it refuses to move the robot forward when something is detected within range.
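The guard in the listing, which tests rFeel()&13 before calling rForward, is a bitmask check. The Python sketch below shows the same idea with a plain integer standing in for the sensor reading. It assumes the listing means "move only if none of the bits selected by 13 are set"; what each individual bit represents is not stated here, so the sketch treats 13 simply as a mask.

```python
FORWARD_MASK = 13  # mask value taken from the listing (binary 1101)

def may_move_forward(feel_value: int, mask: int = FORWARD_MASK) -> bool:
    """Return True only when none of the masked sensor bits are set."""
    return (feel_value & mask) == 0

# A reading of 0 (nothing detected) allows movement.
# A reading of 4 sets a bit that the mask covers, so movement is refused.
# A reading of 2 sets only a bit outside the mask, so movement is allowed.
```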
Notice also that the text received from Alexa is displayed on the right side of the output screen. The robot leaves a trail indicating its movements, so it’s easy for you to connect the commands to the movements performed.
Using RobotBASIC’s robot simulator makes it possible to easily experiment with the ROBOT BASIC skill. You should be able to activate the skill on your Echo device by simply saying “Alexa open robot basic.” Use the information in my last article to determine your IP and port information and enter those into the skill as discussed earlier. If Alexa understands the IP and port commands you give, she will repeat the numbers back to you so that you can confirm she understood them correctly.
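Before involving Alexa at all, it can help to confirm that your server is reachable at the IP address and port you plan to give the skill. The Python sketch below sends a test string the same way the skill is assumed to (one short UTF-8 message over TCP) and returns whatever the server replies. The function name and the five-second timeout are illustrative choices.

```python
import socket

def send_test_command(host: str, port: int, text: str,
                      timeout: float = 5.0) -> str:
    """Send one command string to the robot server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8"))
        # Assumption: the server answers with a short acknowledgment.
        reply = conn.recv(1024)
    return reply.decode("utf-8", errors="replace")

# Typical use from another machine on your network:
#   print(send_test_command("192.168.0.120", 42001, "turn left"))
```

If this works from inside your network but the skill's messages never arrive, the usual suspects are the router's port forwarding and the difference between your local and public IP addresses.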
Customization, Flexibility, and Additional Examples
I tried to create control words and phrases for the skill that could be used for a variety of situations. Hopefully, the skill can meet most people’s needs as written. If you require some customization to handle a specific need though, I recommend you contact the Alexa developer I used: Carter Kwon (https://carterkwon.com/).
If there is enough interest, I’m considering writing a book with more examples and details of how the ROBOT BASIC skill can be used in education and hobby robotics.