Add Voice Control and Speech to the Game
You can use the Alexa Web API for Games to add Alexa speech and voice commands to your web-based game. For more details about the Alexa Web API for Games, see About Alexa Web API for Games.
Add Alexa interactions to your game
You can add speech to your web-based game so that Alexa talks to the user while they interact with the game. Alexa might speak to respond to user interactions or to share information about what's happening in the game. Alexa can also prompt the user for a spoken response, as described later.
For example:
User touches the "Fire" button on the screen.
Alexa: Firing the torpedoes…. (sound effects)…. Sorry, looks like you missed. You'll have to wait until your next turn to try again! (As Alexa speaks, the display on the web app changes.)
Web app presents new graphics and waits for the user's touch input.
To make Alexa speak to the user
- In the web app, call `alexa.skill.sendMessage()` to send the skill a message.
- In your skill code, create a handler for the `Alexa.Presentation.HTML.Message` request generated by the `sendMessage()` call (see the sketch after this list). This handler returns a response with:
  - The `outputSpeech` Alexa should say.
  - The `shouldEndSession` property left undefined (not set).

  This response tells Alexa to speak the text and leave the session open without opening the microphone.
- In the web app, register listener functions to respond to Alexa events. Alexa notifies your app when speech starts and stops.
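Here's a minimal sketch of the skill-side handler, using the ASK SDK for Node.js (`ask-sdk-core`) in TypeScript. The speech text and the idea of a "fire" message are illustrative; your skill defines its own message contract with the web app.

```typescript
import { HandlerInput, RequestHandler } from 'ask-sdk-core';
import { Response } from 'ask-sdk-model';

// Handles messages sent from the web app with alexa.skill.sendMessage().
const HtmlMessageHandler: RequestHandler = {
  canHandle(handlerInput: HandlerInput): boolean {
    return handlerInput.requestEnvelope.request.type === 'Alexa.Presentation.HTML.Message';
  },
  handle(handlerInput: HandlerInput): Response {
    // speak() sets the outputSpeech. shouldEndSession is deliberately not
    // set: Alexa speaks, the session stays open, and the microphone stays
    // closed.
    return handlerInput.responseBuilder
      .speak("Firing the torpedoes. Sorry, looks like you missed. You'll have to wait until your next turn!")
      .getResponse();
  },
};
```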
Prompt the user for voice input
Your web app can make Alexa prompt the user for voice input during the game, such as in response to a button press in the game. For example:
User touches the "Fire" button on the screen.
Alexa: Firing the torpedoes… (sound effects)…. Sorry, looks like you missed. Do you want to try that again?
Alexa opens the microphone to listen to the user's response.
User: Yes (The skill gets a normal intent from the interaction model, such as `AMAZON.YesIntent`.)
Game continues….
When you decide how to prompt the user for voice input, consider how they can initiate speech on their own. If the device has a button that gives the user push-to-talk functionality, it might be more natural for them to start a conversation by pressing or pressing-and-holding that button. Alternatively, if the device supports wake-word activation, the user might be able to play your game hands free by using the wake word. You can determine which methods the device supports with the `capabilities` interface, as in the sketch below.
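For example, a sketch of a capability check in the web app. The `alexa.capabilities.microphone` object and its `supportsWakeWord` and `supportsPushToTalk` fields follow the Alexa JavaScript API; treat the exact shape as an assumption to verify against the Alexa HTML API reference.

```typescript
// Sketch: choose a voice-input hint based on the device's microphone
// capabilities. `alexa` is the client returned by Alexa.create().
declare const alexa: any;

const microphone = alexa.capabilities?.microphone;
if (microphone?.supportsWakeWord) {
  // Hands-free play: the user can say the wake word at any time.
  showHint('Say "Alexa, fire!" to shoot.');
} else if (microphone?.supportsPushToTalk) {
  // The user must press (or press and hold) a button to talk.
  showHint('Press and hold the action button to give an order.');
}

function showHint(text: string): void {
  console.log(text); // replace with your game's UI
}
```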
You can't use `Dialog` directives (such as `Dialog.Delegate`) during this conversation. Returning a directive from any interface other than `Alexa.Presentation.HTML` closes the web app.

To prompt the user for voice input
- In the web app, call `alexa.skill.sendMessage()` to send the skill a message.
- In your skill code, create a handler for the `Alexa.Presentation.HTML.Message` request. This handler returns a response with:
  - The `outputSpeech` Alexa should say.
  - A `reprompt` to use if the user doesn't respond.
  - The `shouldEndSession` property set to `false`.

  This response tells Alexa to speak the text, and then open the microphone for the user's response.
- In the web app, register listener functions to respond to Alexa events. Alexa notifies your app when speech starts/stops and when the microphone opens/closes, as shown in the sketch after this list.
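Here's a minimal web-app sketch of the first and last steps, assuming the Alexa JavaScript API has already been loaded by the `Alexa.Presentation.HTML.Start` directive. The `{ action: 'fire' }` payload, the `fire` button ID, and the game helper functions are hypothetical.

```typescript
declare const Alexa: any; // global provided by the Alexa JavaScript API

Alexa.create({ version: '1.1' })
  .then(({ alexa }: any) => {
    // Step 1: send the skill a message when the player presses "Fire".
    document.getElementById('fire')?.addEventListener('click', () => {
      alexa.skill.sendMessage({ action: 'fire' }); // hypothetical payload
    });

    // Step 3: register listeners for Alexa speech and microphone events.
    alexa.speech.onStarted(() => duckGameAudio());
    alexa.speech.onStopped(() => restoreGameAudio());
    alexa.voice.onMicrophoneOpened(() => showListeningIndicator());
    alexa.voice.onMicrophoneClosed(() => hideListeningIndicator());
  })
  .catch((error: Error) => console.error('Failed to create Alexa client', error));

// Hypothetical game helpers.
function duckGameAudio(): void { /* lower game audio while Alexa speaks */ }
function restoreGameAudio(): void { /* restore game audio */ }
function showListeningIndicator(): void { /* show a microphone icon */ }
function hideListeningIndicator(): void { /* hide the microphone icon */ }
```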
These steps trigger a normal Alexa skill interaction. Alexa speaks the `outputSpeech`, and then opens the microphone for a few seconds to listen for the user's response. If the user's response isn't understood, Alexa speaks the `reprompt`, and then opens the microphone again. If the user still doesn't respond, Alexa closes the microphone, but keeps the session open because the web app is still displayed on the screen.
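A sketch of the skill-side handler that produces this behavior, again using `ask-sdk-core` in TypeScript (the speech strings are illustrative):

```typescript
import { HandlerInput, RequestHandler } from 'ask-sdk-core';
import { Response } from 'ask-sdk-model';

const PromptingMessageHandler: RequestHandler = {
  canHandle(handlerInput: HandlerInput): boolean {
    return handlerInput.requestEnvelope.request.type === 'Alexa.Presentation.HTML.Message';
  },
  handle(handlerInput: HandlerInput): Response {
    // reprompt() supplies the speech Alexa uses if the user doesn't answer;
    // shouldEndSession: false tells Alexa to open the microphone.
    return handlerInput.responseBuilder
      .speak('Sorry, looks like you missed. Do you want to try that again?')
      .reprompt('Do you want to fire again?')
      .withShouldEndSession(false)
      .getResponse();
  },
};
```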
After the user responds to the prompt with an utterance that resolves to an intent in your model, your skill gets an `IntentRequest`. An intent handler in your skill should handle this request. For example, your intent handler might return a response that contains the following (a sketch follows this list):

- An `Alexa.Presentation.HTML.HandleMessage` directive to tell the web app relevant information from the user's spoken response.
- (Optional) `outputSpeech` if you want Alexa to say something to the user.
- The `shouldEndSession` property set to either undefined (when you don't need to open the microphone for another response) or `false` (when you do want to open the microphone for additional spoken input).
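For example, a sketch of a handler for `AMAZON.YesIntent`; the `HandleMessage` payload is a hypothetical contract between this skill and its web app:

```typescript
import { getIntentName, getRequestType, HandlerInput, RequestHandler } from 'ask-sdk-core';
import { Response } from 'ask-sdk-model';

const YesIntentHandler: RequestHandler = {
  canHandle(handlerInput: HandlerInput): boolean {
    return getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && getIntentName(handlerInput.requestEnvelope) === 'AMAZON.YesIntent';
  },
  handle(handlerInput: HandlerInput): Response {
    return handlerInput.responseBuilder
      .addDirective({
        type: 'Alexa.Presentation.HTML.HandleMessage',
        message: { action: 'retry' }, // hypothetical payload for the web app
      })
      .speak('Reloading. Take aim!')
      // shouldEndSession left undefined: no microphone, session stays open.
      .getResponse();
  },
};
```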
Finally, in your web app, call `alexa.skill.onMessage()` to register a callback to respond to the incoming message.
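On the web-app side, the callback might look like this sketch (the `action` field matches the hypothetical payload above):

```typescript
declare const alexa: any; // the client returned by Alexa.create()

// Dispatch incoming skill messages to game logic.
alexa.skill.onMessage((message: { action?: string }) => {
  switch (message.action) {
    case 'retry':
      resetTargetingControls();
      break;
    default:
      console.warn('Unhandled skill message:', message);
  }
});

// Hypothetical game helper.
function resetTargetingControls(): void {
  /* redraw the targeting UI for another shot */
}
```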
Get user-initiated voice input
When your web app is on the screen, the user can use the wake word to speak to Alexa at any time. Your skill should expect user-initiated voice input while the web app is active. For example:
User touches the screen to select several targets. Web app responds with normal sound effects and graphics.
User: Alexa, fire at the targets! (Because the skill session is open, the user can invoke an intent in your skill with just the wake word and an utterance.)
Skill receives an `IntentRequest` corresponding to the "fire at the targets" utterance.
Alexa: Roger. Firing the torpedoes now!
Your web app responds with sound effects and graphics.
To get user-initiated voice input
- In your skill's interaction model, add intents with sample utterances that users might speak when playing your game.
- In your intent handlers for these intents, return the following (a sketch follows this list):
  - An `Alexa.Presentation.HTML.HandleMessage` directive to tell the web app relevant information from the user's spoken request.
  - (Optional) `outputSpeech` if you want Alexa to say something to the user.
  - The `shouldEndSession` property set to either undefined (when you don't need to open the microphone for another response) or `false` (when you do want to open the microphone for additional spoken input).
- In your web app, call `alexa.skill.onMessage()` to register a callback to respond to the incoming message.
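For example, a sketch of an intent handler for a hypothetical `FireAtTargetsIntent` whose sample utterances include "fire at the targets":

```typescript
import { getIntentName, getRequestType, HandlerInput, RequestHandler } from 'ask-sdk-core';
import { Response } from 'ask-sdk-model';

const FireAtTargetsIntentHandler: RequestHandler = {
  canHandle(handlerInput: HandlerInput): boolean {
    return getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && getIntentName(handlerInput.requestEnvelope) === 'FireAtTargetsIntent';
  },
  handle(handlerInput: HandlerInput): Response {
    return handlerInput.responseBuilder
      .addDirective({
        type: 'Alexa.Presentation.HTML.HandleMessage',
        message: { action: 'fireAtTargets' }, // hypothetical payload
      })
      .speak('Roger. Firing the torpedoes now!')
      .getResponse();
  },
};
```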
Use transformers to render voice natively in HTML
The `Alexa.Presentation.HTML.Start` and `Alexa.Presentation.HTML.HandleMessage` directives take an optional `transformers` array. A transformer converts either Speech Synthesis Markup Language (SSML) or plain text into an audio stream and provides a URL to that stream to your web app. You can use the `fetchAndDemuxMP3` method in your web app to extract the `audioBuffer` and `speechMarks` from the output speech. You can synchronize visuals and web audio with the `speechMarks`, which gives you finer control over the synchronization than is possible with just the speech callbacks.
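For example, a sketch of playing transformed speech with the Web Audio API. It assumes the transformed speech URL reaches the web app in a message payload (the `speechUrl` field is hypothetical), and that `fetchAndDemuxMP3`, reached here through `alexa.utils.speech`, resolves with an `audioBuffer` of raw bytes to decode plus `speechMarks`; verify the exact shape against the Alexa HTML API reference.

```typescript
declare const alexa: any; // the client returned by Alexa.create()

const audioContext = new AudioContext();

alexa.skill.onMessage(async (message: { speechUrl?: string }) => {
  if (!message.speechUrl) {
    return;
  }
  // Split the transformer output into decodable audio and speech marks.
  const { audioBuffer, speechMarks } =
    await alexa.utils.speech.fetchAndDemuxMP3(message.speechUrl);

  // Play the audio through the Web Audio API.
  const source = audioContext.createBufferSource();
  source.buffer = await audioContext.decodeAudioData(audioBuffer);
  source.connect(audioContext.destination);
  source.start();

  // Use the speech marks to drive captions, lip sync, or animations.
  console.log('Speech marks:', speechMarks);
});
```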
Related topics
- Build Your Web App with Web API for Games
- Alexa.Presentation.HTML Interface
- Alexa HTML API
- Create the Interaction Model for Your Skill
- Speech Synthesis Markup Language (SSML) Reference
Last updated: May 01, 2024