APL Visual Context in the Skill Request
The Alexa Presentation Language (APL) visual context provides your skill with information about the content displayed on the screen when the user invokes an intent or triggers a user event. Your skill can use the context to determine the state of on-screen elements, such as which parts of a list are visible on the screen.
About the visual context
The APL Visual context is information sent to your skill about the content the user sees on the device. The context provides both structural and semantic information about the content the user sees:
- Structural: How visual components appear on the screen – For example, there's a picture on the left, some scrolling text on the right, and two buttons under the text.
- Semantic: What the components represent – For example, the picture is a picture of a box of a particular brand of protein bars, the text describes these protein bars, the left button is a "more information" button and the right button is a "buy now" button.
The APL runtime constructs and reports the structural context. To make the semantic context useful, you provide information in your APL document that describes the meaning of your components. You can define the semantic data for a component in two places:
- Component id property – The visual context includes the id you provide for a component.
- Component entities property – The visual context includes the entities you provide for a component.
In the earlier example, the entities for the protein bar picture might contain an identifier for the product. The component id for the two buttons might be buttonTellMeMore and buttonBuyNow to identify the buttons.
Visual context in the skill request
A request sent to your skill includes the visual context when the user's device has a screen and the screen is displaying an APL document your skill sent with the RenderDocument directive.
The visual context is available in the Alexa.Presentation.APL property within the top-level context property in the request. The Alexa.Presentation.APL object has the properties shown in the following table.
Property | Type | Description |
---|---|---|
token | String | The token identifying the document displayed on the device. You define the |
version | String | The version of the APL runtime that reported the visual context. |
componentsVisibleOnScreen | Array | Contains the elements that were visible on the screen when the user triggered the request to your skill. For details about the properties within an element, see Context core properties. |
The following example shows the visual context when the screen was displaying a touchable element with the ID fadeHelloTextButton.
{
"version": "1.0",
"session": {},
"context": {
"Viewports": [],
"Alexa.DataStore.PackageManager": {
"installedPackages": []
},
"Viewport": {},
"Extensions": {},
"System": {},
"Alexa.Presentation.APL": {
"token": "helloworldWithButtonToken",
"version": "AriaRuntimeLibrary-2023.2.449.0",
"componentsVisibleOnScreen": [
{
"uid": ":1000",
"position": "960x480+0+0:0",
"type": "text",
"tags": {
"viewport": {}
},
"children": [
{
"id": "fadeHelloTextButton",
"uid": ":1002",
"position": "273x76+344+360:0",
"type": "text",
"tags": {
"focused": false,
"clickable": true
},
"entities": []
}
],
"entities": []
}
]
}
},
"request": {}
}
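As a rough illustration, a skill could pull the visual context out of the request as follows. This is a minimal Python sketch that treats the request as a plain dictionary. The helper name get_visual_context and the trimmed-down request are assumptions for illustration, not part of any SDK.

```python
from typing import Any, Dict, Optional


def get_visual_context(request_envelope: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the Alexa.Presentation.APL object, or None when it is absent.

    Hypothetical helper: the request is handled as a plain dictionary.
    """
    context = request_envelope.get("context") or {}
    return context.get("Alexa.Presentation.APL")


# Trimmed-down request modeled on the example above.
example_request = {
    "version": "1.0",
    "context": {
        "Alexa.Presentation.APL": {
            "token": "helloworldWithButtonToken",
            "componentsVisibleOnScreen": [
                {"uid": ":1000", "type": "text", "tags": {"viewport": {}}}
            ],
        }
    },
}

visual_context = get_visual_context(example_request)
if visual_context is not None:
    print(visual_context["token"])
```

Returning None when the property is missing lets the caller handle devices without a screen, or requests sent when no APL document is displayed.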
Context core properties
The componentsVisibleOnScreen array organizes the visual elements into a hierarchy. A given element can contain child elements. For example, an element representing a scrolling region of the screen might include one or more child elements that represent the list items displayed within the scrolling region.
Each element in the hierarchy corresponds to a single component in your APL document. However, the context doesn't include all components defined in your APL document. For details, see Rules for generating the element hierarchy.
The following table defines the core properties for an element in the context.
Property | Type | Default | Source | Description |
---|---|---|---|---|
children | Array | [] | Calculated | Array of child elements |
entities | Array | [] | Component | Array of entity data copied from the component |
id | String | "" | Component | The id of the component |
position | String | REQUIRED | Calculated | Global position of the element, as seen by a user. |
tags | Map | REQUIRED | Calculated | Any number of valid element tags |
transform | Transform | [] | Component | A visual transformation applied against the position. |
type | One of: empty, graphic, mixed, text, video | REQUIRED | Calculated | Describes the visual appearance of the element |
uid | String | REQUIRED | Generated by the runtime | The unique runtime-generated id for the component |
visibility | Number | 1.0 | Calculated | Relative visibility of the element. |
To save space, the context omits properties that contain default values. For example, assume the device is displaying the following TouchWrapper component that contains a Text component.
{
"type": "TouchWrapper",
"id": "idForTheTouchWrapper",
"item": [
{
"type": "Text",
"id": "idForTextWrappedInTouchWrapper",
"text": "Text component wrapped in the TouchWrapper",
"inheritParentState": true,
"style": "touchableText"
}
],
"onPress": []
}
The TouchWrapper is fully visible on the screen when the user invokes an intent that sends a request to your skill. The context reports the element as shown in the following example.
{
"id": "idForTheTouchWrapper",
"uid": ":1022",
"position": "952x51+35+224:0",
"type": "text",
"tags": {
"focused": false,
"clickable": true
},
"entities": []
}
Because the TouchWrapper is fully visible, the visibility property contains the default value 1. Alexa therefore omits this property. The TouchWrapper has no values for the transform property, so Alexa omits this property as well. The Text child component for this TouchWrapper doesn't meet any of the requirements to be included in the context. Therefore, Alexa omits the children property.
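Because omitted properties carry their defaults, a skill might fill them back in before processing an element. The following Python sketch does this with the defaults from the core properties table. The names ELEMENT_DEFAULTS and with_defaults are hypothetical.

```python
import copy

# Defaults from the core properties table. REQUIRED properties have no default.
ELEMENT_DEFAULTS = {
    "children": [],
    "entities": [],
    "id": "",
    "transform": [],
    "visibility": 1.0,
}


def with_defaults(element: dict) -> dict:
    """Return a copy of an element with omitted optional properties filled in."""
    filled = copy.deepcopy(ELEMENT_DEFAULTS)
    filled.update(element)
    return filled


# The TouchWrapper element as reported in the example above.
touch_wrapper = {
    "id": "idForTheTouchWrapper",
    "uid": ":1022",
    "position": "952x51+35+224:0",
    "type": "text",
    "tags": {"focused": False, "clickable": True},
    "entities": [],
}

element = with_defaults(touch_wrapper)
print(element["visibility"], element["children"], element["transform"])
```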
children
An array containing the elements that fall logically under this element. For example, a scrolling list might contain multiple child elements in the list. The element defines the order of the children in the array.
An element in the context omits the children property when the element has no reported children.
When generating the element hierarchy, the children of a component that isn't reported are attached to the parent of the component. For details on when a component isn't reported, see Rules for generating the element hierarchy.
For example, a document might have the following hierarchy of components.
Container "A" - reported
  Container "B" - not reported
    Text "C" - reported
    Image "D" - not reported
  Container "F" - not reported
    Text "G" - reported
  Text "H" - reported
This set of components produces the following element hierarchy in the context.
Element "A", type "mixed"
  Element "C", type "text"
  Element "G", type "text"
  Element "H", type "text"
In this hierarchy, the children of "Container B" and "Container F" are within "A" because the context doesn't include "B" or "F."
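The reparenting rule can be sketched in Python as follows. This is an illustrative model, not the runtime's implementation: the Component class, the reported flag, and the exact nesting of "B", "F", and "H" under "A" are assumptions chosen to match the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical stand-in for an APL component."""
    name: str
    reported: bool
    children: List["Component"] = field(default_factory=list)


def build_elements(component: Component) -> List[dict]:
    """Return the elements this component contributes to its nearest reported ancestor."""
    child_elements: List[dict] = []
    for child in component.children:
        child_elements.extend(build_elements(child))
    if component.reported:
        return [{"name": component.name, "children": child_elements}]
    # An unreported component is skipped and its reported descendants move up.
    return child_elements


tree = Component("A", True, [
    Component("B", False, [Component("C", True), Component("D", False)]),
    Component("F", False, [Component("G", True)]),
    Component("H", True),
])

elements = build_elements(tree)
print([child["name"] for child in elements[0]["children"]])
```

In this model, "C", "G", and "H" all end up as direct children of "A", which matches the element hierarchy shown above.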
entities
An array of entity data copied from the component. This data is opaque. You can provide data in the entities property for the component to describe the meaning of the component.
When you set the entities property for a component, provide an array of objects. Each object can have the properties id, type, and value. Any other properties aren't included in the visual context.
id
The id property for the component as specified in the APL document. An element in the context omits the id property when the corresponding component doesn't have an id property.
uid
An identifier generated by the APL runtime. Each component is assigned a uid. The value is an opaque string that is guaranteed to be unique in the scope of the document and not to clash with any assigned id value. Each element in the context always includes the uid property.
position
Specifies the position of the element on the screen, in the form of a 5-tuple of width, height, x-position, y-position, and layer. These values are in global coordinates, and aren't relative to the parent element. The values are the default or resting position of the items before applying any transformations. For details about transformations, see the component transform property.
For compactness and interpretation, the position is a single string:
"position": "<WIDTH>x<HEIGHT>[+-]<XPOSITION>[+-]<YPOSITION>:<LAYER>"
The numeric values reported are dimensionless non-negative integers. The x and y-positions are measured from the top-left of the viewport. The layer value must meet the following requirements:
- No two elements have the same layer.
- When two elements overlap, the element with the larger layer value is drawn on top of the element with the lower layer value.
The reported position is always in global coordinates. Using global coordinates ensures that the position uses the perspective of the user. You can also compare the relative position of any two elements.
The following example shows how the position value represents different positions on the screen.
1280x800+0+0:0 // Top-level element on a 1280x800 dp screen
620x780+10+10:1 // The left column of the above top-level element
620x780+650+10:2 // The right column of the above top-level element
Each element in the context always includes the position property.
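A skill that needs the individual values can split the position string with a regular expression. The following Python sketch assumes the string follows the format shown earlier. The Position type and the parse_position function are hypothetical names.

```python
import re
from typing import NamedTuple


class Position(NamedTuple):
    width: int
    height: int
    x: int
    y: int
    layer: int


# <WIDTH>x<HEIGHT>[+-]<XPOSITION>[+-]<YPOSITION>:<LAYER>
_POSITION_RE = re.compile(r"^(\d+)x(\d+)([+-]\d+)([+-]\d+):(\d+)$")


def parse_position(value: str) -> Position:
    """Parse a position string into its five integer parts."""
    match = _POSITION_RE.match(value)
    if match is None:
        raise ValueError(f"Unrecognized position string: {value!r}")
    width, height, x, y, layer = (int(group) for group in match.groups())
    return Position(width, height, x, y, layer)


right_column = parse_position("620x780+650+10:2")
print(right_column.x, right_column.y, right_column.layer)
```

The sign is captured together with each offset, so an element scrolled above the viewport, such as "832x1410+64-1002:1", yields a negative y value.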
tags
A map of attributes and data about those attributes. An element in the context includes the tags property when the element has at least one tag.
For details about the possible tags, see Element tags.
transform
A 6-element array containing the 2D homogeneous transformation matrix applied against this element. The center of the transformation coordinate system is the center of the component. The transformation array is ordered as [A,B,C,D,Tx,Ty].
The transform property is reported if the transformation isn't the identity transformation.
type
An enumerated value that describes how the user perceives the element. The following table shows the valid type values.
Type | Description |
---|---|
empty | An empty component with no visible content. |
graphic | A bitmap image or vector graphic |
mixed | A blend of graphics, video, and text |
text | Human-readable text |
video | A video player |
Alexa uses the rules shown in the following table to generate the type for an element.
Component | Rules |
---|---|
The combination of all visible children. For example, if all the visible child components map to | |
| |
The child type. | |
The combination of all visible children. | |
| |
The child type of the current page. | |
The combination of all visible children. | |
| |
The child type. | |
The combination of all visible children. | |
| |
|
The combination of any two of text, graphic, or video is mixed. The type property defaults to empty if the component has no valid content and has no children. An element in the context always includes the type property.
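The combination rule can be sketched as a small Python function. This sketch assumes that children of type empty contribute nothing to the result, which the text doesn't state explicitly. The function name combine_types is hypothetical.

```python
from typing import Iterable


def combine_types(child_types: Iterable[str]) -> str:
    """Combine the types of the visible children into a single type."""
    # Assumption: "empty" children don't affect the combined type.
    distinct = {child_type for child_type in child_types if child_type != "empty"}
    if not distinct:
        return "empty"
    if len(distinct) == 1:
        return next(iter(distinct))
    # Any two different types among text, graphic, and video combine to mixed.
    return "mixed"


print(combine_types(["text", "text"]))
print(combine_types(["text", "graphic"]))
```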
visibility
The visibility property is an approximate calculation of how well the user can see the object. The visibility is defined as the percentage of the bounding box of the element that's visible in its parent multiplied by the opacity of the element.
For example, assume a vertically scrolling list where the last item in the list is 50 percent off the screen and has an 80 percent opacity. The visibility for this item is 40%, which is reported as 0.4.
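The calculation in this example can be written out as a short Python sketch. The function name and the choice to express both inputs as fractions between 0 and 1 are assumptions for illustration.

```python
def visibility(visible_fraction: float, opacity: float) -> float:
    """Approximate visibility: visible share of the bounding box times opacity."""
    if not (0.0 <= visible_fraction <= 1.0 and 0.0 <= opacity <= 1.0):
        raise ValueError("Both inputs must be between 0 and 1.")
    return visible_fraction * opacity


# Last list item: half of it is on the screen, with 80 percent opacity.
last_item = visibility(0.5, 0.8)
print(round(last_item, 2))
```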
The visibility calculations don't consider applied component transform values. The visibility calculation also doesn't consider that a component might be obscured by a child component on top of it.
Components with a display property of invisible or none have zero visibility.
An element in the context includes the visibility property when the visibility is greater than zero and less than 1. The element omits the visibility property when the value is 1 (the default, fully visible). In certain circumstances, the element also includes the visibility property when the value is zero. For details on when items with zero visibility are reported, see Rules for generating the element hierarchy.
Element tags
An element tag provides additional information about the element. Most elements included in the context contain at least one tag. An element can have multiple tags.
The following table lists the available tags.
Tag | Type | Description | Created By |
---|---|---|---|
checked | Boolean | The checked state of a component that has two states. | Any component with the |
clickable | Boolean | A button or item that the user can press. | |
disabled | Boolean | True when this component is disabled. | Component with the |
focused | Boolean | The focused state of a component that can take focus | The following components: |
list | Object | An ordered list of items | |
listItem | Object | Information about a | |
media | Object | Media player | |
ordinal | Integer | A visibly numbered element | |
pager | Object | A collection of objects displayed one at a time. | |
scrollable | Object | A region of the screen that can scroll. | |
spoken | Boolean | A region of the screen that can be read by text-to-speech | Component with the |
viewport | Object | The entire screen in which a document is rendered | Top-level component |
Each tag is either a basic data type (Boolean, String, or Integer) or an Object data type containing more granular information.
checked
Boolean tag indicating that the checked state for the component is true. Because all components can have a checked state, any type of component might report the checked tag.
{
"id": "XXXX",
"uid": ":1234",
"position": "10x10+0+0:0",
"type": "graphic",
"tags": {
"checked": true
}
}
A component with the inheritParentState property set to true doesn't report the checked tag. To save space, an element in the context reports the checked tag only when its value is true.
clickable
Boolean tag indicating that this component can be "clicked." This means that the user can activate the component by touch, from a keyboard, or with a remote. All touchable components are clickable.
{
"id": "XXXX",
"uid": ":1234",
"position": "10x10+0+0:0",
"type": "mixed",
"tags": {
"clickable": true,
"focused": false
}
}
An element in the context includes the clickable tag if it's a touchable component. The tag reports true even for touchable components in the disabled state.
disabled
Boolean tag indicating that the user can't interact with this component. All components can set the disabled state, including components that don't receive clicks or focus. The following example shows the element reported for a disabled Text component with the checked state.
{
"id": "XYZZY",
"uid": ":1235",
"position": "100x50+10+10:5",
"type": "text",
"tags": {
"disabled": true,
"checked": true
}
}
Unlike the checked state, the disabled state is reported for components that have the inheritParentState property set. To save space, the disabled tag is reported only when it's true.
focused
Boolean tag indicating that a component can take keyboard focus. The value of the tag indicates the current state of the control. For example, a touchable item that doesn't have focus reports the focused tag as false.
{
"id": "XXXX",
"uid": ":1236",
"position": "10x10+0+0:0",
"type": "text",
"tags": {
"focused": false,
"clickable": true
}
}
The context includes the focused tag when the component can take focus. The following components can take focus:
The focused tag reports true if the component has focus and false if the component doesn't have focus.
list
An object with a collection of properties reported for a Sequence or GridSequence. The list tag object contains the properties shown in the following table.
Property | Type | Description |
---|---|---|
itemCount | Integer | Total number of items in the list. |
highestIndexSeen | Integer | The index of the highest item seen |
highestOrdinalSeen | Integer | The ordinal of the highest ordinal-equipped item seen |
lowestIndexSeen | Integer | The index of the lowest item seen |
lowestOrdinalSeen | Integer | The ordinal of the lowest ordinal-equipped item seen |
Lists track the lowest and highest index and ordinal seen so you can make informed inferences about what the user might have observed on the screen. For example, if a new list contains ordinals 10 through 20, but only 10 through 12 have been visible on the screen, it's reasonable to disallow the user from saying "pick number 18" because the user doesn't know what item 18 contains.
The following example shows a list tag.
{
"id": "myListOfDogs",
"uid": ":138",
"position": "1280x800+0+0:0",
"type": "mixed",
"tags": {
"list": {
"itemCount": 190,
"lowestIndexSeen": 0,
"highestIndexSeen": 3,
"lowestOrdinalSeen": 1,
"highestOrdinalSeen": 4
},
"scrollable": {
"direction": "vertical",
"allowForward": true,
"allowBackwards": true
},
"focused": false
},
"children": [
{
"position": "800x600+20+33:0",
"uid": ":2352",
"type": "mixed",
"tags": {
"clickable": true,
"ordinal": 2,
"listItem": {
"index": 2
}
}
},
{
"position": "800x600+20+633:0",
"uid": ":23112",
"visibility": 0.16,
"type": "mixed",
"tags": {
"clickable": true,
"ordinal": 3,
"listItem": {
"index": 3
}
}
}
]
}
The list tag isn't reported for an empty Sequence or GridSequence.
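A skill could use the ordinal range in the list tag to decide whether to accept a spoken selection such as "pick number 18". The following Python sketch shows one way to do this. The function name ordinal_was_seen is hypothetical, and treating a missing range as "not seen" is an assumption.

```python
def ordinal_was_seen(list_tag: dict, ordinal: int) -> bool:
    """Return True when the ordinal falls inside the range the user has seen."""
    lowest = list_tag.get("lowestOrdinalSeen")
    highest = list_tag.get("highestOrdinalSeen")
    if lowest is None or highest is None:
        # No ordinal-equipped child has been seen yet.
        return False
    return lowest <= ordinal <= highest


# The list tag from the example above.
list_tag = {
    "itemCount": 190,
    "lowestIndexSeen": 0,
    "highestIndexSeen": 3,
    "lowestOrdinalSeen": 1,
    "highestOrdinalSeen": 4,
}

print(ordinal_was_seen(list_tag, 3))
print(ordinal_was_seen(list_tag, 18))
```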
itemCount
The total number of items in the list. If the length of the list is unknown, the itemCount is -1.
highestIndexSeen
The highest index of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels.
The highestIndexSeen value is zero-based. For example, if a list contains three items and all are displayed on the screen, the highestIndexSeen is 2.
highestOrdinalSeen
The highest ordinal value of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels. This tag applies to list items with an ordinal value. A list item has an ordinal value when the component for the item has the ordinal property set.
The highestOrdinalSeen value is reported when at least one child with an ordinal value has been seen, either currently or in the past.
lowestIndexSeen
The lowest index of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels.
The lowestIndexSeen value is zero-based. An APL list is commonly first displayed with the lowest item in the list visible on the screen. Therefore, lowestIndexSeen usually returns zero.
lowestOrdinalSeen
The lowest ordinal value of any child seen for this list. An item is "seen" if any part of the item was displayed on the screen, even if it was just a few pixels. This tag applies to list items with an ordinal value. A list item has an ordinal value when the component for the item has the ordinal property set.
The lowestOrdinalSeen value is reported when at least one child with an ordinal value has been seen, either currently or in the past.
listItem
Information about the child of a Sequence or GridSequence. The listItem tag object has the properties shown in the following table.
Property | Type | Description |
---|---|---|
index | Integer | Zero-based index of this element in its parent |
The listItem is reported as an object to reserve space for reporting the row and column of a list item displayed in a grid.
index
The index of this list item in its parent. The index is zero-based. You can compare the index with the lowestIndexSeen and highestIndexSeen values in the list tag.
media
An object with a collection of properties reported for a media player, such as a video player. The media tag describes the current state of the media player and what operations are possible on the media player. The media tag object has the properties shown in the following table.
Property | Type | Description |
---|---|---|
allowAdjustSeekPositionForward | Boolean | Can seek forward relative to the current position. |
allowAdjustSeekPositionBackwards | Boolean | Can seek backwards relative to the current position. |
allowNext | Boolean | Can move forward to the next track. |
allowPrevious | Boolean | Can move backward to the previous track. |
entities | Array (default []) | Current track entity data |
positionInMilliseconds | Integer | Current position of the play head from the start of the track. |
state | One of: idle, paused, playing | The current operating state. |
url | String | Current track source URL |
The media tag is reported if there is at least one media track available for playing.
The following example shows the media tag.
{
"id": "myVideoPreview",
"uid": ":1138",
"position": "1024x600+0+0:0",
"type": "video",
"tags": {
"media": {
"allowAdjustSeekPositionForward": true,
"allowAdjustSeekPositionBackwards": true,
"allowNext": true,
"allowPrevious": false,
"entities": [
"MY_ENTITY_DATA"
],
"positionInMilliseconds": 34214,
"state": "playing",
"url": "https://myvideolocationhere"
}
}
}
allowAdjustSeekPositionForward
When true, this media track supports seeking forward in time to a new position. Live media streams normally report false for this property.
allowAdjustSeekPositionBackwards
When true, this media track supports seeking backwards in time to a new position. Live media streams normally report false for this property.
allowNext
When true, the media player can advance forward to the next media track. The allowNext property is false if the media player is on the final track.
allowPrevious
When true, the media player can move back to the previous media track. The allowPrevious property is false if the media player is on the first track.
entities
Entity data associated with the current media track (see the entities property of the video source). The media object omits the entities property when the media track doesn't have any entity data associated with it.
positionInMilliseconds
The media player head position within the current media track, measured in milliseconds.
state
The current playing state of the media track. The playing state is one of the following values:
Name | Description |
---|---|
idle | The media player hasn't played any content. |
paused | The media player has played some content, but is now paused. |
playing | The media player is actively playing content. |
url
The URL of the current media track.
ordinal
Reported if the element has a defined ordinal value. The ordinal value is a natural number (a positive integer). The ordinal tag is assigned to children of a multi-child component with the numbered property set to true.
{
"id": "myListItem8",
"uid": ":1231",
"position": "200x100+23+26:1",
"type": "text",
"tags": {
"ordinal": 6, // Ordinal is not always equal to index + 1.
"clickable": true,
"focused": false
}
}
For an element with a listItem tag, you can also compare its ordinal value against the highestOrdinalSeen and lowestOrdinalSeen values of the parent list tag.
pager
An object with a collection of properties reported for a Pager component if it has at least two pages. The pager tag object has the properties shown in the following table.
Property | Type | Description |
---|---|---|
index | Integer | Index of the current page. The index for a |
pageCount | Integer | Total number of pages |
allowForward | Boolean | When true, the user can move the pager forward. |
allowBackwards | Boolean | When true, the user can move the pager backwards. |
The allowForward and allowBackwards properties indicate what the user can do, based on the navigation property for the Pager and the current page. These properties don't consider what you can do programmatically with the SetPage command.
For example, assume navigation is normal, which lets the user navigate freely back and forth in the Pager.
- When the Pager is on the first page, allowForward reports true and allowBackwards reports false.
- When the Pager is on a page that's neither the first nor the last page, both properties report true.
- When the Pager is on the last page, allowForward reports false and allowBackwards reports true.
When navigation is none, the user can't navigate the Pager at all. In this scenario, both properties report false, regardless of the page displayed.
When navigation is wrap, the user can always navigate forward or backwards. When the user is on the last page, navigating forward wraps back to the first page. In this scenario, both properties report true, regardless of the page displayed.
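The three navigation scenarios above can be summarized in a short Python sketch. It covers only the normal, none, and wrap values described here and rejects anything else. The function name pager_navigation_flags is hypothetical, and the sketch models the described behavior rather than the runtime's own logic.

```python
def pager_navigation_flags(navigation: str, index: int, page_count: int) -> dict:
    """Return the allowForward and allowBackwards values for a pager tag."""
    if navigation == "none":
        forward = backwards = False
    elif navigation == "wrap":
        forward = backwards = True
    elif navigation == "normal":
        forward = index < page_count - 1  # Not yet on the last page.
        backwards = index > 0  # Not on the first page.
    else:
        raise ValueError(f"Navigation value not covered by this sketch: {navigation!r}")
    return {"allowForward": forward, "allowBackwards": backwards}


# First page of a four-page Pager with normal navigation.
print(pager_navigation_flags("normal", 0, 4))
```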
The following example shows the pager tag.
{
"id": "weatherPager",
"uid": ":111",
"position": "1024x600+0+0:0",
"type": "mixed",
"tags": {
"pager": {
"index": 0,
"pageCount": 4,
"allowForward": true,
"allowBackwards": false
},
"focused": false
}
}
scrollable
An object indicating that a region can scroll forward or backwards. The following components can scroll content:
These components report the scrollable tag when the component contains enough content to require scrolling. When all the content within the component is fully visible, the visual context doesn't include the scrollable tag.
The scrollable tag object has the properties shown in the following table.
Property | Type | Description |
---|---|---|
direction | One of: horizontal, vertical | Direction of scrolling. |
allowForward | Boolean | When true, the content in the scrolling area can scroll forward. |
allowBackwards | Boolean | When true, the content in the scrolling area can scroll backwards. |
For example, assume a Sequence contains 10 items and is large enough to display 5 items at a time.
- When the component shows the items with the index zero through four, allowForward is true and allowBackwards is false.
- When the user scrolls down to show items with the index two through six, both allowForward and allowBackwards are true.
- When the user scrolls all the way to the end of the list, allowForward is false and allowBackwards is true.
The scrollable tag is reported when either allowForward or allowBackwards is true. When both properties are false, the tag isn't included.
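The Sequence example above can be modeled with a short Python sketch. It works in whole items rather than pixels, which is a simplification. The function name scrollable_tag and its parameters are hypothetical.

```python
from typing import Optional


def scrollable_tag(first_visible: int, visible_count: int, item_count: int,
                   direction: str = "vertical") -> Optional[dict]:
    """Return the scrollable tag for a list, or None when nothing can scroll."""
    allow_backwards = first_visible > 0
    allow_forward = first_visible + visible_count < item_count
    if not (allow_forward or allow_backwards):
        # All content fits, so the tag isn't reported.
        return None
    return {
        "direction": direction,
        "allowForward": allow_forward,
        "allowBackwards": allow_backwards,
    }


# 10 items, 5 visible at a time, currently showing items 2 through 6.
print(scrollable_tag(first_visible=2, visible_count=5, item_count=10))
```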
The following example shows a scrollable tag.
{
"id": "todoList",
"uid": ":211",
"position": "1024x550+0+50:0",
"type": "mixed",
"tags": {
"scrollable": {
"direction": "horizontal",
"allowForward": true,
"allowBackwards": false
},
"focused": false
}
}
spoken
Boolean tag indicating that this element has content that Alexa can read out loud. Any component that sets the speech property returns the spoken tag.
The following example shows the spoken tag.
{
"id": "myListItem",
"uid": ":444",
"position": "800x80+72+437:0",
"type": "text",
"tags": {
"clickable": true,
"focused": true,
"spoken": true
}
}
viewport
The viewport tag is reserved for the top-level element on a screen. The viewport tag has no defined properties.
Example:
{
"id": "top",
"uid": ":101",
"position": "480x480+0+0:0",
"type": "mixed",
"tags": {
"viewport": {}
}
}
Rules for generating the element hierarchy
The following rules apply when generating the element hierarchy for the visual context:
- The top-level component always generates an element with the viewport tag.
- A component is reported as an element if it's visible on the screen and has at least one of the following attributes:
  - Non-empty entities property.
  - True clickable tag.
  - A media tag.
  - A pager tag.
  - A scrollable tag.
  - A spoken tag.
- A component that isn't visible might still be reported as an element if both of the following conditions are true:
  - The component has a non-empty entities property.
  - The component was visible on the screen at some time in the past.
The intent of reporting non-visible components is to allow the user to refer to an item that was visible on the screen and might have scrolled out of view. The context reporting system doesn't guarantee that every previously visible item is reported. The system keeps a window of recently visible items and reports back the most recently seen elements.
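The rules above can be expressed as a predicate. The following Python sketch is a simplified model: the dictionary keys is_top_level, visible, and was_visible are hypothetical, and the sketch treats every previously visible component with entities as eligible, even though the system reports only a window of recently visible items.

```python
# Tags that make a visible component eligible, in addition to a true clickable tag.
REPORTING_TAGS = ("media", "pager", "scrollable", "spoken")


def is_reportable(component: dict) -> bool:
    """Return True when the rules allow the component to appear as an element."""
    if component.get("is_top_level"):
        return True
    has_entities = bool(component.get("entities"))
    tags = component.get("tags", {})
    if component.get("visible"):
        return (
            has_entities
            or tags.get("clickable") is True
            or any(name in tags for name in REPORTING_TAGS)
        )
    # Not visible: eligible only with entities and a past appearance on screen.
    return has_entities and bool(component.get("was_visible"))


background_image = {"visible": True, "tags": {}}
article_text = {"visible": True, "tags": {"spoken": True}}
print(is_reportable(background_image), is_reportable(article_text))
```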
Nested element example
The following example shows how components nest in normal reporting. The example shows an APL document that displays a background image, a text element corresponding to the title of an article, and a scrolling element holding the content. The content is further divided into a labeled image element and a text element.
The top-level Container has a single entity, and the text content has the speech property set.
For this document, the rules to generate the hierarchy therefore produce the following:
- The top-level Container is reported because it has entity data assigned.
- The background image isn't reported because it has no entity data and no other tags apply.
- The title isn't reported because it has no entity data and no other tags apply.
- The scrolling region is reported because it has a valid scrollable tag.
- The small picture isn't reported because it doesn't have entity data and no other tags apply.
- The large text is reported because it has a spoken tag.
The resulting element hierarchy might look like the following.
{
"id": "top-level",
"uid": ":9549",
"position": "960x480+0+0:0",
"type": "mixed",
"tags": {
"viewport": {}
},
"children": [
{
"id": "scrollingRegion",
"uid": ":9552",
"position": "960x349+0+131:1",
"type": "mixed",
"tags": {
"focused": false,
"scrollable": {
"direction": "vertical",
"allowForward": false,
"allowBackwards": true
}
},
"children": [
{
"id": "articleId",
"uid": ":9555",
"position": "832x1410+64-1002:1",
"type": "text",
"tags": {
"spoken": true
},
"visibility": 0.20000000298023224,
"entities": []
}
],
"entities": []
}
],
"entities": [
{
"id": "mainPage"
}
]
}
The example shows allowForward as false because the user scrolled to the bottom of the content and then made an utterance that sent a request to the skill.
Last updated: Nov 28, 2023