Integrate Visual and Audio Responses
With Alexa Presentation Language (APL), you can build a visual response for devices with screens. With APL for audio, you can build a rich audio response that mixes speech with sound effects and other audio. Combine your visual response and your audio response to provide an engaging user experience.
About integrating a visual and audio response
You can combine a visual response and an audio response in two different ways:
- Return two RenderDocument directives in the same response – Your skill returns two separate directives in the same response: Alexa.Presentation.APL.RenderDocument with the APL document for the visual response, and Alexa.Presentation.APLA.RenderDocument with the APL for audio document for the audio response.
- Embed the audio in the visual response – Your skill returns a single Alexa.Presentation.APL.RenderDocument directive that includes both the APL document for the visual response and the APL for audio document for the audio response. You then manually invoke the audio with an APL command.
The following sections describe each option in detail.
Return two RenderDocument directives in the same response
When you provide the RenderDocument directives for both visual and audio responses in a single response, Alexa displays the content on the screen, speaks any provided output speech, and then plays the APL for audio response. The normal speech output and the audio response never overlap.
To return the RenderDocument directive with both an audio and visual response
- Build the APL document for your visual response and the APL for audio document for your audio response.
- In your skill code, add both the Alexa.Presentation.APL.RenderDocument and Alexa.Presentation.APLA.RenderDocument directives to the directives array. The order of the directives in the directives array doesn't matter.
For example, the following APL document and data source display the AlexaHeadline template with a "hello world" welcome message.
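The following is a minimal sketch of such a document and data source. The import versions, data source name (helloWorldData), and property values are placeholders; adjust them to match your skill.
{
  "type": "APL",
  "version": "2023.2",
  "import": [
    {
      "name": "alexa-layouts",
      "version": "1.7.0"
    }
  ],
  "mainTemplate": {
    "parameters": [
      "payload"
    ],
    "items": [
      {
        "type": "AlexaHeadline",
        "headerTitle": "${payload.helloWorldData.headerTitle}",
        "primaryText": "${payload.helloWorldData.primaryText}"
      }
    ]
  }
}
A matching data source for this sketch might look like the following.
{
  "helloWorldData": {
    "headerTitle": "Example: Two RenderDocument directives",
    "primaryText": "Hello world! Welcome to APL."
  }
}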
Assume you wanted to display this content and also play audio that combines speech and background sound effects. The following APL for audio document and data source provide this audio.
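One possible sketch of that APL for audio document mixes spoken SSML with a background bird sound from the Alexa sound library. The data source name (helloWorldAudioData), the welcomeSsml property, and the specific sound effect are assumptions for illustration.
{
  "type": "APLA",
  "version": "0.91",
  "mainTemplate": {
    "parameters": [
      "payload"
    ],
    "item": {
      "type": "Mixer",
      "items": [
        {
          "type": "Speech",
          "contentType": "SSML",
          "content": "${payload.helloWorldAudioData.welcomeSsml}"
        },
        {
          "type": "Audio",
          "source": "soundbank://soundlibrary/animals/amzn_sfx_bird_forest_01"
        }
      ]
    }
  }
}
The matching data source provides the SSML for the Speech component.
{
  "helloWorldAudioData": {
    "welcomeSsml": "<speak>Hello world! Welcome to A P L for audio.</speak>"
  }
}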
To send both of these examples at the same time, you include RenderDocument from both the APL and the APLA interfaces in your skill response.
The following example shows a response that adds the two RenderDocument directives to the directives array. The response also sets the outputSpeech property. Alexa displays the content on the screen, speaks the outputSpeech, and then plays the APL for audio response.
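The response body might look like the following sketch. The token values and outputSpeech text are assumptions, and the document links assume the saved-document names mentioned below.
{
  "outputSpeech": {
    "type": "SSML",
    "ssml": "<speak>Here is the hello world example.</speak>"
  },
  "directives": [
    {
      "type": "Alexa.Presentation.APL.RenderDocument",
      "token": "helloWorldVisualToken",
      "document": {
        "type": "Link",
        "src": "doc://alexa/apl/documents/helloWorldRenderExample"
      },
      "datasources": {
        "helloWorldData": {
          "headerTitle": "Example: Two RenderDocument directives",
          "primaryText": "Hello world! Welcome to APL."
        }
      }
    },
    {
      "type": "Alexa.Presentation.APLA.RenderDocument",
      "token": "helloWorldAudioToken",
      "document": {
        "type": "Link",
        "src": "doc://alexa/apla/documents/helloWorldRenderExampleAudio"
      },
      "datasources": {
        "helloWorldAudioData": {
          "welcomeSsml": "<speak>Hello world! Welcome to A P L for audio.</speak>"
        }
      }
    }
  ]
}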
This example assumes you saved the APL document in the authoring tool with the name helloWorldRenderExample and the APL for audio document with the name helloWorldRenderExampleAudio. For details about linking to an APL document, see Link to an APL document saved in the authoring tool. For details about linking to an APLA document, see Link to an APLA document saved in the authoring tool.
Embed the audio in the visual response
You can embed the audio response in the APL document for the visual response. Your skill returns a single Alexa.Presentation.APL.RenderDocument directive, and you manually invoke the audio with the APL SpeakItem or SpeakList command.
This option lets you more precisely synchronize the audio with your visual content. For example, Alexa can play your audio for each item in a list and automatically scroll and highlight each list item. Embedding the audio is also useful if you want to play the audio in response to a user event, such as the user tapping a button on the screen.
Because the audio plays in response to the command, it's possible for the audio to start playing before Alexa finishes saying the normal output speech. You can use different approaches to avoid this. For example, you can include all the relevant speech for the response in the APL for audio document, and then don't include the outputSpeech property.
Embedding the audio requires multiple steps. You use the aplAudioToSpeech transformer to convert the APL for audio document to an audio file, and then you bind a reference to that audio file to a component in your APL document.
To embed the audio in the visual response and invoke it from the document
- Build an APL document for your visual response and an APL for audio document for your audio response.
- In the data source for the APL document, configure the aplAudioToSpeech transformer. The transformer converts the APL for audio document to an audio file you can invoke with an APL command. For details, see Configure the aplAudioToSpeech transformer.
- Bind the transformer output to the speech property on a component in your APL document. For details, see Bind the transformer output to a component.
- Run either the SpeakItem or SpeakList APL command and target the component with the speech property set. You can start these commands in several different ways, such as in response to the user tapping the screen, or when the document initially displays. For details, see Run the SpeakItem or SpeakList command to invoke the audio.
- In your skill code, return the Alexa.Presentation.APL.RenderDocument directive. Include the APL for audio document in the same directive in the sources property. For details, see Include the audio response as part of the RenderDocument directive.
The following sections provide more details about each of these steps.
Build the APL and APL for audio documents
Build the APL and APL for audio documents. You can save the documents in the authoring tool and then link to them from your code when you send the RenderDocument directive, or you can include the full JSON of each document in your skill code.
Configure the aplAudioToSpeech transformer
The aplAudioToSpeech transformer converts your APL for audio document into an audio file you can reference within your APL document. You include the transformer in a transformers array in the data source for your APL document.
A transformer converts data you provide in your data source and then writes the output to a property in the same data source.
For the aplAudioToSpeech transformer, you provide:
- template – The name of an APL for audio document to convert.
- outputName – The name of the property where the transformer stores the URL to the converted audio file.
- inputPath – (Optional) A property in the data source that contains data to use in the APL for audio document. Use this property to create audio for each item in an array. For an example, see Play the audio for each item in a list.
{
"transformers": [
{
"template": "helloWorldEmbedAPLAudioExampleAudio",
"transformer": "aplAudioToSpeech",
"outputName": "welcomeSpeech"
}
]
}
To use a transformer, you must define the data source as an object data source by setting the type property to object. Define any properties you want to convert with the transformer within a properties object.
The following example shows a valid data source that you could use with the "hello world" document shown earlier. The aplAudioToSpeech transformer converts the APL for audio document called helloWorldEmbedAPLAudioExampleAudio to an audio file and stores the URL for this audio file in the property helloWorldData.properties.welcomeSpeech.url.
{
"helloWorldData": {
"type": "object",
"objectId": "helloWorldSample",
"properties": {
"headerTitle": "Example: Invoke an audio response from the visual response",
"primaryText": {
"type": "PlainText",
"text": "Welcome to APL, with an audio response!"
},
"secondaryText": {
"type": "PlainText",
"text": "This example embeds the APL for audio response in an APL document."
},
"welcomeText": {
"contentType": "SSML",
"textToSpeak": "<speak><amazon:emotion name='excited' intensity='medium'>Welcome to APL!</amazon:emotion> This example integrates the APL for audio response in the APL document. To do this, use the APL audio to speech transformer to create the audio clip. Next, bind the output of the transformer to the speech property on a component in the APL document. Finally, invoke the audio with the SpeakItem command. This example runs the command from the onMount handler, so the command runs when the document displays.</speak>"
}
},
"transformers": [
{
"template": "helloWorldEmbedAPLAudioExampleAudio",
"transformer": "aplAudioToSpeech",
"outputName": "welcomeSpeech"
}
]
}
}
The following example shows the data source after the aplAudioToSpeech transformer runs. The properties object now has an additional property, welcomeSpeech, with the results.
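The transformed data source might look like the following trimmed sketch (the welcomeText property and transformers array are omitted here for brevity). The generated URL is a placeholder; Alexa hosts the actual audio file and supplies the real URL at runtime.
{
  "helloWorldData": {
    "type": "object",
    "objectId": "helloWorldSample",
    "properties": {
      "headerTitle": "Example: Invoke an audio response from the visual response",
      "primaryText": {
        "type": "PlainText",
        "text": "Welcome to APL, with an audio response!"
      },
      "secondaryText": {
        "type": "PlainText",
        "text": "This example embeds the APL for audio response in an APL document."
      },
      "welcomeSpeech": {
        "url": "https://tinyaudio.amazon.com/ext/v1/apl/audio/AYADeIg.../resource.mp3"
      }
    }
  }
}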
Bind the transformer output to a component
To use the transformer output in your APL document, you use an APL data binding expression to bind the output to the speech property of a component. Bind to the outputName.url property.
{
"type": "AlexaHeadline",
"id": "helloWorldHeadline",
"headerTitle": "${payload.helloWorldData.properties.headerTitle}",
"primaryText": "${payload.helloWorldData.properties.primaryText.text}",
"secondaryText": "${payload.helloWorldData.properties.secondaryText.text}",
"speech": "${payload.helloWorldData.properties.welcomeSpeech.url}"
}
Run the SpeakItem or SpeakList command to invoke the audio
To play the audio, run the SpeakItem or SpeakList command and target the component with the speech property.
| To play the audio… | Do this… |
| --- | --- |
| When the document displays | Invoke the SpeakItem or SpeakList command from the onMount handler. In this scenario, the audio begins to play when the document displays, even if Alexa is still speaking the outputSpeech for the response. |
| When the user selects a component on the screen | Invoke the SpeakItem or SpeakList command from a touch event handler, such as onPress on a touchable component. For an example, see Play the audio in response to user interactions. For a good user experience, you should also let users select buttons and other touchable items by voice. |
| When the user makes a request by voice | Create an intent in your interaction model to capture the request. In the handler for this intent, return the Alexa.Presentation.APL.ExecuteCommands directive with the SpeakItem or SpeakList command. For details, see Play the audio in response to user interactions. |
The following example plays the speech bound to the speech property on the component with the ID helloWorldHeadline when the document displays.
{
"onMount": [
{
"type": "SpeakItem",
"componentId": "helloWorldHeadline"
}
]
}
Include the audio response as part of the RenderDocument directive
To use the embedded audio, return the Alexa.Presentation.APL.RenderDocument directive that includes both the APL document and the APL for audio document:
- Set the document property to the APL document. You can set document to either a link to the document saved in the authoring tool, or to the full JSON for the document.
- Set the sources property to a string/object map. Within this map, set the string to a name for the APL for audio document and set the object to the document.
  - The name must match the string you used for the template property in the aplAudioToSpeech transformer.
  - You can provide the document as either a link to the document saved in the authoring tool, or the full JSON for the document.
  The following example shows the sources map with one source called helloWorldEmbedAPLAudioExampleAudio. This example assumes you saved the APL for audio document in the authoring tool with the name "helloWorldEmbedAPLAudioExampleAudio".
  {
    "sources": {
      "helloWorldEmbedAPLAudioExampleAudio": {
        "src": "doc://alexa/apla/documents/helloWorldEmbedAPLAudioExampleAudio",
        "type": "Link"
      }
    }
  }
- Set the datasources property to a string/object map containing the data sources you use in both the APL and APL for audio documents.
The following example shows a response that sends a single RenderDocument directive that includes both the APL document and the APL for audio document. Note that the response doesn't include the outputSpeech because the APL document uses the onMount handler to start the audio. If you did include outputSpeech, the speech and audio would overlap.
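A sketch of such a response follows. The token and the APL document name (helloWorldEmbedAPLAudioExample) are assumptions, and the datasources map shows the data source from earlier trimmed for brevity.
{
  "directives": [
    {
      "type": "Alexa.Presentation.APL.RenderDocument",
      "token": "helloWorldEmbedAudioToken",
      "document": {
        "type": "Link",
        "src": "doc://alexa/apl/documents/helloWorldEmbedAPLAudioExample"
      },
      "sources": {
        "helloWorldEmbedAPLAudioExampleAudio": {
          "type": "Link",
          "src": "doc://alexa/apla/documents/helloWorldEmbedAPLAudioExampleAudio"
        }
      },
      "datasources": {
        "helloWorldData": {
          "type": "object",
          "objectId": "helloWorldSample",
          "properties": {
            "headerTitle": "Example: Invoke an audio response from the visual response",
            "welcomeText": {
              "contentType": "SSML",
              "textToSpeak": "<speak>Welcome to APL!</speak>"
            }
          },
          "transformers": [
            {
              "template": "helloWorldEmbedAPLAudioExampleAudio",
              "transformer": "aplAudioToSpeech",
              "outputName": "welcomeSpeech"
            }
          ]
        }
      }
    }
  ]
}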
Play audio for each item in a list
The APL SpeakList command plays speech associated with each item in a list, such as a Sequence. The list automatically scrolls each item into view and can highlight the item by changing the item appearance for a karaoke effect. You can use SpeakList with APL for audio to play an audio clip for each item in a list.
The overall steps are the same as described in Embed the audio in the visual response. You configure the aplAudioToSpeech transformer, bind the transformer output to the list items, and then invoke the SpeakList command. When you configure the transformer, you configure it to process an array of items and create a clip for each item instead of generating a single clip.
To play audio for each item in the list
- Build an APL document that uses a multi-child component, such as Sequence or GridSequence, to display your list. Include the id property on the component. For an example, see Build an APL document and data source to display a list.
- Create a data source with an array of the list items to display.
  - Use an object data source, and put the data array within the properties object.
  - In your document, bind the data array to the data property of the Sequence or GridSequence.
  For an example, see Build an APL document and data source to display a list.
- Set the inputPath property on the aplAudioToSpeech transformer to refer to the path to your data array with your list items. For details, see Configure the aplAudioToSpeech transformer to process an array of items.
- Build an APL for audio document that plays the audio you want for a single list item. Use data binding to access the data from the array of list items. For details, see Build the APL for audio document.
- Bind the output of the transformer to the child component of the Sequence or GridSequence. For details, see Bind the transformer output to the Sequence child component.
- Run the SpeakList command and target the Sequence or GridSequence. For details, see Run the SpeakList command and configure the karaoke style.
- In your skill code, return the Alexa.Presentation.APL.RenderDocument directive. Include the APL for audio document in the same directive in the sources property.
Build an APL document and data source to display a list
The following examples show a Sequence that displays a list of sound names. The data to display in the list comes from the array listOfSounds in the listOfSoundsData data source. When this document displays, the Text component in the Sequence displays one time for each item in data.
{
"type": "Sequence",
"height": "100%",
"width": "100%",
"id": "listOfSoundsSequence",
"numbered": true,
"padding": [
"@marginHorizontal",
"@spacingLarge"
],
"items": [
{
"type": "Text",
"text": "${ordinal}. ${data.name}",
"spacing": "@spacingLarge"
}
],
"data": "${payload.listOfSoundsData.properties.listOfSounds}"
}
Use an object type data source and put the array within the properties object in the data source so that the transformer can access the array later. The following example shows this data source with the first four list items.
{
"listOfSoundsData": {
"type": "object",
"objectId": "speakListAplForAudioExample",
"properties": {
"listOfSounds": [
{
"audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_bird_chickadee_chirp_1x_01",
"duration": 0.63,
"name": "Bird Chickadee Chirp 1x (1)"
},
{
"audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_bird_forest_01",
"duration": 2.9,
"name": "Bird Forest (1)"
},
{
"audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_bird_robin_chirp_1x_01",
"duration": 0.67,
"name": "Bird Robin Chirp 1x (1)"
},
{
"audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_raven_caw_1x_01",
"duration": 0.63,
"name": "Raven Caw 1x (1)"
}
]
}
}
}
Configure the aplAudioToSpeech transformer to process an array of items
To create an audio clip for each item in an array, you use the inputPath property on the aplAudioToSpeech transformer. The inputPath property specifies the path to an object in the data source that contains data to use in the APL for audio document.
Set the inputPath to the same array you're displaying in the Sequence. Set the template to the name of the APL for audio document, and the outputName to the property to hold the transformer output.
In the following example, Alexa uses the provided template (soundItemToPlay) to generate an audio clip for each item in the listOfSounds array.
{
"transformers": [
{
"transformer": "aplAudioToSpeech",
"template": "soundItemToPlay",
"outputName": "speech",
"inputPath": "listOfSounds.*"
}
]
}
Build the APL for audio document
The APL for audio document that you build should play the audio for a single item in the list. The transformer generates a separate audio clip for each item in the inputPath array using the APL for audio document as a template.
In the APL for audio document, you can use data binding to access the data for an array item. Use the expression ${payload.data} to access this data.
The following example shows a document with a Sequencer component that speaks the sound name, followed by audio of the sound effect. The values for ${payload.data.name} and ${payload.data.audioUrl} come from the array referenced in the inputPath property (in this example, the listOfSounds property shown earlier).
{
"type": "APLA",
"version": "0.91",
"mainTemplate": {
"parameters": [
"payload"
],
"item": {
"type": "Sequencer",
"items": [
{
"type": "Speech",
"contentType": "SSML",
"content": "${payload.data.name}"
},
{
"type": "Silence",
"duration": 500
},
{
"type": "Audio",
"source": "${payload.data.audioUrl}"
}
]
}
}
}
When the aplAudioToSpeech transformer runs, it does the following for each item in the listOfSounds array:
- Replaces the expressions ${payload.data.name} and ${payload.data.audioUrl} with the values from the item in listOfSounds.
- Creates an audio clip based on the APL for audio document. In this example, the clip speaks the name of the sound, followed by a sample of the sound itself.
- Adds a new object to the item in the array in the property named by outputName. This object has a url property with the URL of the generated sound clip. The following example shows the transformer output for the first item in the list:
  {
    "duration": 0.63,
    "audioUrl": "soundbank://soundlibrary/animals/amzn_sfx_bird_chickadee_chirp_1x_01",
    "speech": {
      "url": "https://tinyaudio.amazon.com/ext/v1/apl/audio/AYADeIg.../resource.mp3"
    },
    "name": "Bird Chickadee Chirp 1x (1)"
  }
Bind the transformer output to the Sequence child component
In the APL document, bind the speech property for the Sequence or GridSequence child component to the transformer output. Make sure you bind the speech property on the child component, not on the Sequence itself.
The following example shows the earlier Sequence component. The data property is bound to the listOfSounds array in the data source. The speech property of the Text component is bound to the output of the aplAudioToSpeech transformer.
{
"type": "Sequence",
"height": "100%",
"width": "100%",
"id": "listOfSoundsSequence",
"numbered": true,
"padding": [
"@marginHorizontal",
"@spacingLarge"
],
"items": [
{
"type": "Text",
"text": "${ordinal}. ${data.name}",
"spacing": "@spacingLarge",
"speech": "${data.speech.url}"
}
],
"data": "${payload.listOfSoundsData.properties.listOfSounds}"
}
Run the SpeakList command and configure the karaoke style
To play the audio, run the SpeakList command and target the Sequence or GridSequence component. As described in Run the SpeakItem or SpeakList command to invoke the audio, you can invoke the SpeakList command in several different ways. Make sure you target the component for your list (the Sequence or GridSequence).
The following example defines a button that plays the audio for each item in the Sequence.
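A minimal sketch of such a button follows, using the AlexaButton responsive component. The button text is a placeholder, and the count expression assumes the listOfSounds array from the data source shown earlier.
{
  "type": "AlexaButton",
  "buttonText": "Play all sounds",
  "primaryAction": [
    {
      "type": "SpeakList",
      "componentId": "listOfSoundsSequence",
      "start": 0,
      "count": "${payload.listOfSoundsData.properties.listOfSounds.length}"
    }
  ]
}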
Users expect to be able to select individual items within a list. To play the audio for a single list item when the user selects that item, wrap the Text component in a TouchWrapper and set the onPress property to the SpeakItem command. Because the speech property must be on the Sequence child component, set the property on the TouchWrapper instead of the Text component. In this example, the SpeakItem command doesn't need the componentId property because the command targets its own component, the TouchWrapper itself.
{
"type": "TouchWrapper",
"spacing": "@spacingLarge",
"speech": "${data.speech.url}",
"items": [
{
"type": "Text",
"text": "${ordinal}. ${data.name}"
}
],
"onPress": [
{
"type": "SpeakItem"
}
]
}
The SpeakList command can also highlight each item during the audio. To enable this, add a style that changes the visual appearance of the Sequence child component based on the karaoke state. A component has the karaoke state during the time Alexa plays its speech. Assign the style to the style property on the component.
For example, the following style changes the color of a component to blue when Alexa plays the speech for the component:
{
"styles": {
"textStyleListItem": {
"values": [
{
"when": "${state.karaoke}",
"color": "blue"
}
]
}
}
}
The following example shows the TouchWrapper with the Text component for a list item, now with the text style. The karaoke state applies to the item with the speech property, which is the TouchWrapper in this example. To apply this state to the Text component, set inheritParentState to true.
{
"type": "TouchWrapper",
"spacing": "@spacingLarge",
"speech": "${data.speech.url}",
"items": [
{
"type": "Text",
"text": "${ordinal}. ${data.name}",
"style": "textStyleListItem",
"inheritParentState": true
}
],
"onPress": [
{
"type": "SpeakItem"
}
]
}
The following examples show the complete APL document and data source that displays a list of items. The user can select the button to hear all items on the list, or select an individual item to hear a single item.
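The complete document might look like the following sketch, which combines the button and Sequence shown earlier with the karaoke style. It assumes the listOfSoundsData data source from Build an APL document and data source to display a list; the import versions and button text are placeholders.
{
  "type": "APL",
  "version": "2023.2",
  "import": [
    {
      "name": "alexa-layouts",
      "version": "1.7.0"
    }
  ],
  "styles": {
    "textStyleListItem": {
      "values": [
        {
          "when": "${state.karaoke}",
          "color": "blue"
        }
      ]
    }
  },
  "mainTemplate": {
    "parameters": [
      "payload"
    ],
    "items": [
      {
        "type": "Container",
        "height": "100%",
        "width": "100%",
        "items": [
          {
            "type": "AlexaButton",
            "buttonText": "Play all sounds",
            "primaryAction": [
              {
                "type": "SpeakList",
                "componentId": "listOfSoundsSequence",
                "start": 0,
                "count": "${payload.listOfSoundsData.properties.listOfSounds.length}"
              }
            ]
          },
          {
            "type": "Sequence",
            "grow": 1,
            "width": "100%",
            "id": "listOfSoundsSequence",
            "numbered": true,
            "padding": [
              "@marginHorizontal",
              "@spacingLarge"
            ],
            "data": "${payload.listOfSoundsData.properties.listOfSounds}",
            "items": [
              {
                "type": "TouchWrapper",
                "spacing": "@spacingLarge",
                "speech": "${data.speech.url}",
                "items": [
                  {
                    "type": "Text",
                    "text": "${ordinal}. ${data.name}",
                    "style": "textStyleListItem",
                    "inheritParentState": true
                  }
                ],
                "onPress": [
                  {
                    "type": "SpeakItem"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}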
Return the RenderDocument directive
To use the embedded audio, return the Alexa.Presentation.APL.RenderDocument directive that includes both the APL document and the APL for audio document, as described in Include the audio response as part of the RenderDocument directive.
Play the audio in response to user interactions
You can run the SpeakItem or SpeakList commands in response to user interactions, such as when the user taps the screen or makes a request by voice.
The overall steps are the same as described in Embed the audio in the visual response:
- Configure the aplAudioToSpeech transformer and bind the transformer output to the speech property of a component.
- For a tap event, such as tapping a button, run the SpeakItem or SpeakList command from a handler on the component the user can tap.
- For a voice request, create an intent in your interaction model to capture the request. In the handler for this intent, return the ExecuteCommands directive with the SpeakItem or SpeakList command and the ID of the component with the speech property set.
For the best user experience, make your skill respond to both tap events and voice requests. The user can then choose how to interact with your skill.
Respond to tap events with the audio
The following APL document displays a list of sounds, similar to the example in Play audio for each item in a list. This example improves the overall look of the content by using the AlexaTextListItem responsive component instead of a custom component. The primaryAction property on an AlexaTextListItem specifies the command to run when the user taps the list item.
Respond to spoken requests with the audio
Users expect to interact with Alexa by voice, even when viewing content on the screen. For example, when viewing a list of items, the user might want to ask for a specific item with an utterance like "select the second one."
Create intents to capture the voice requests relevant to your APL content. To respond to an intent with audio embedded in the document, invoke SpeakItem or SpeakList with the Alexa.Presentation.APL.ExecuteCommands directive. When your skill returns the ExecuteCommands directive, Alexa speaks any outputSpeech in the response first, and then invokes the specified commands.
Continuing the previous example with the list of sounds, add multiple intents to fully voice-enable the visual content:
- Use the built-in intent AMAZON.SelectIntent to let the user ask for a particular item on the list. The user can say phrases like "select the fourth item." The AMAZON.SelectIntent sends an IntentRequest to your skill that includes information about the selected item.
- Create a custom intent to let the user ask to read the entire list. For example, this intent might include the utterances "listen to these sample sounds" and "read me this list."
- Create a custom intent with a slot to let the user ask for an item by name. This lets the user say utterances like "play the bird forest one sound." Use a slot on this intent to collect the name of the sound to play.
For each of these intents, your skill responds with the ExecuteCommands directive and either the SpeakList or SpeakItem command.
Example: Respond to AMAZON.SelectIntent with audio for a single list item
The built-in AMAZON.SelectIntent intent includes a slot called ListPosition, which captures the ordinal position of the list item and converts it to a number. When the user says "select the third one," the ListPosition slot value is 3.
The ListPosition slot can also provide the component ID of the list item when both of the following are true:
- The item the user asks about is visible on the screen.
- The component for the item has an ID defined.
You can use this intent and slot to capture the specific item the user requests and respond with ExecuteCommands to play the speech associated with that item.
The following handler runs when both of the following conditions are true:
- The device is displaying the document shown in Respond to tap events with the audio.
- The user invokes AMAZON.SelectIntent.
The handler attempts to get the item the user asked about from the ListPosition slot and responds with ExecuteCommands to play the audio associated with the selected item.
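For illustration, the ExecuteCommands directive such a handler might return could look like the following sketch. The componentId value stands in for the ID your handler resolves from the ListPosition slot, and the token must match the token you used in the RenderDocument directive that displayed the list.
{
  "directives": [
    {
      "type": "Alexa.Presentation.APL.ExecuteCommands",
      "token": "listOfSoundsToken",
      "commands": [
        {
          "type": "SpeakItem",
          "componentId": "soundItem3"
        }
      ]
    }
  ]
}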
Related topics
- Alexa.Presentation.APL Interface Reference
- Alexa.Presentation.APLA Interface Reference
- Standard Commands
- Multi-child Component Properties
- Touchable Component Properties
Last updated: Nov 28, 2023