Synchronize Spoken Text with Text on the Screen
Your skill response can associate speech with an APL Text component and issue a command that highlights each line of text as the speech audio plays. This creates a "karaoke" effect that shows which lines of a block of text are currently in focus.
To use this feature, you must provide speech data as plain text or as marked-up text that uses Speech Synthesis Markup Language (SSML) expressions. Before an Alexa-enabled device can consume this data, it must be transformed into speech. To enable this transformation, use the ssmlToSpeech transformer to transform the text to speech, and the ssmlToText transformer to strip SSML tags from an SSML expression. These transformers cannot be used with the SSML audio tag.
ssmlToSpeech and ssmlToText transformers
Property | Type | Required | Description |
---|---|---|---|
transformer | enum: ssmlToSpeech, ssmlToText | Yes | The type of transformation required. Two transformers are available: ssmlToSpeech converts a data source value to a text-to-speech URL, and ssmlToText converts an SSML expression to plain text by stripping out any SSML tags. |
inputPath | string | Yes | The path of the data source value to transform. |
outputName | string | No | The name of the data source property where the transformed output is stored. This output property is always a sibling of the input property. If an outputName isn't provided, the value at the inputPath is replaced with the output of the transformer. |
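To illustrate the outputName behavior, the following sketch shows a hypothetical data source (the names myData, greetingSsml, and greetingSpeech are illustrative placeholders, not part of the feature).

A data source with an ssmlToSpeech transformer that writes to a sibling property

{
  "datasources": {
    "myData": {
      "type": "object",
      "properties": {
        "greetingSsml": "<speak>Hello there.</speak>"
      },
      "transformers": [{
        "inputPath": "greetingSsml",
        "outputName": "greetingSpeech",
        "transformer": "ssmlToSpeech"
      }]
    }
  }
}

After transformation, properties contains greetingSsml unchanged plus a new sibling greetingSpeech property holding the text-to-speech URL. If outputName were omitted, greetingSsml itself would be overwritten with that URL.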
The following sample APL document shows a version of a "Cat Facts" skill that associates speech with a Text component bound to a cat fact. The Text component is wrapped in a ScrollView component, so the device automatically scrolls to the parts of the cat fact that aren't visible on screen as they're spoken.
Part of an APL document that shows a Text component that binds to speech
{
"type": "ScrollView",
"item": {
"type": "Text",
"id": "catFactText",
"text": "${catFactData.properties.catFact}",
"speech": "${catFactData.properties.catFactSpeech}"
}
}
The following sample shows the corresponding object data source and transformers sent by the skill.
Object data source and transformer bound to the APL document
{
"datasources": {
"catFactData": {
"type": "object",
"properties": {
"backgroundImage": "https://.../catfacts.png",
"title": "Cat Fact #9",
"logoUrl": "https://.../logo.png",
"image": "https://.../catfact9.png",
"catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>"
},
"transformers": [{
"inputPath": "catFactSsml",
"outputName": "catFactSpeech",
"transformer": "ssmlToSpeech"
},
{
"inputPath": "catFactSsml",
"outputName": "catFact",
"transformer": "ssmlToText"
}
]
}
}
}
After transformation, the data source sent to the device looks like the following snippet.
Transformed data source received by the device
{
"datasources": {
"catFactData": {
"type": "object",
"properties": {
"backgroundImage": "https://.../catfacts.png",
"title": "Cat Fact #9",
"logoUrl": "https://.../logo.png",
"image": "https://.../catfact9.png",
"catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>",
"catFactSpeech": "https://tinyurl.amazon.com/aaaaaa/catfact.mp3",
"catFact": "Not all cats like catnip."**
}
}
}
}
To read the cat fact, use the Alexa.Presentation.APL.ExecuteCommands directive with the SpeakItem command, as shown in the next snippet. The token supplied in the ExecuteCommands directive is required and must match the token that the skill provided in the RenderDocument directive used to render the APL document.
An Alexa.Presentation.APL.ExecuteCommands skill directive with a SpeakItem command
{
"type" : "Alexa.Presentation.APL.ExecuteCommands",
"token": "[SkillProvidedToken]",
"commands": [{
"type": "SpeakItem",
"componentId" : "catFactText"
}]
}
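For context, the two directives fit together in a single skill response as sketched below. This is a minimal outline, not a complete response: the token value catFactToken is a placeholder, and the document and datasources objects (left empty here) stand in for the APL document and data source shown earlier. The key point is that both directives carry the same token.

A sketch of a skill response that renders the document and then speaks the cat fact

{
  "version": "1.0",
  "response": {
    "directives": [
      {
        "type": "Alexa.Presentation.APL.RenderDocument",
        "token": "catFactToken",
        "document": {},
        "datasources": {}
      },
      {
        "type": "Alexa.Presentation.APL.ExecuteCommands",
        "token": "catFactToken",
        "commands": [{
          "type": "SpeakItem",
          "componentId": "catFactText"
        }]
      }
    ]
  }
}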
Last updated: Nov 28, 2023