About the Real-Time Communication Interface
The Web Real-Time Communication (WebRTC) standard supports sending real-time video, audio, and arbitrary data between two peers. Amazon supports WebRTC to enable real-time streaming of audio, video, and (optionally) arbitrary data between Alexa and your smart home device. Alexa communicates commands over the real-time communication (RTC) data channel to your device, and then your device responds and reports state back over the data channel. To enable real-time streaming of audio and video, you implement the Alexa.RTCSessionController interface in your Alexa skill. If you want to include data channel support, you also implement the Alexa.RangeController interface for cameras that support pan, tilt, and zoom.
Users can communicate remotely with your devices by using a Fire TV or any Echo device, such as an Echo Dot, Echo Plus, Echo Show, or Echo Spot. Users can also view live feeds from a camera in the Alexa app.
WebRTC signaling
The following sequence diagram shows the WebRTC signaling protocol between Alexa and your smart home skill.
Session Description Protocol offer/answer format
The RTCSessionController interface uses the Session Description Protocol (SDP) to negotiate session capabilities between peers.
a=sendonly attribute in your m-line for audio. If your device strictly sends video, don't include an m-line for audio.
Offer/answer exchange example
Each media track has a set of Interactive Connectivity Establishment (ICE) candidates. The example shows ICE candidates of type host. If your devices aren't routed through a public gateway, also include either server-reflexive candidates by using STUN or relay candidates by using TURN.
v=0
o=- 3747690900 3747690900 IN IP4 0.0.0.0
s=a 2 z
c=IN IP4 0.0.0.0
t=0 0
a=group:BUNDLE audio0 video0
m=audio 1 RTP/SAVPF 96 0
a=candidate:1 1 UDP 2013266430 xxx.xxx.xxx.xxx 8620 typ host
a=candidate:2 1 TCP 1010827775 xxx.xxx.xxx.xxx 45351 typ host tcptype passive
a=candidate:3 2 UDP 2013266429 xxx.xxx.xxx.xxx 50066 typ host
a=candidate:4 2 TCP 1010827774 xxx.xxx.xxx.xxx 65157 typ host tcptype passive
a=candidate:5 2 TCP 1015022078 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:6 1 TCP 1015022079 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=setup:actpass
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=rtpmap:96 opus/48000/2
a=rtcp:9 IN IP4 0.0.0.0
a=rtcp-mux
a=sendrecv
a=mid:audio0
a=ssrc:118039096 cname:user2571875795@host-433aaf59
a=ice-ufrag:AGVf
a=ice-pwd:h3JAYGhIaQ/Nvyaz9dLoz9
a=fingerprint:sha-256 34:D4:54:17:0C:95:2A:79:FF:72:10:21:E9:6E:F3:77:86:2F:8D:6C:33:45:BA:14:1D:43:01:D7:CD:0A:1A:84
m=video 1 RTP/SAVPF 99
a=candidate:4 1 UDP 2013266430 xxx.xxx.xxx.xxx 8620 typ host
a=candidate:5 1 TCP 1015022079 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:4 2 UDP 2013266429 xxx.xxx.xxx.xxx 50066 typ host
a=candidate:6 1 TCP 1010827775 xxx.xxx.xxx.xxx 45351 typ host tcptype passive
a=candidate:5 2 TCP 1015022078 xxx.xxx.xxx.xxx 9 typ host tcptype active
a=candidate:6 2 TCP 1010827774 xxx.xxx.xxx.xxx 65157 typ host tcptype passive
b=AS:500
a=setup:actpass
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=rtpmap:99 H264/90000
a=rtcp:9 IN IP4 0.0.0.0
a=rtcp-mux
a=sendrecv
a=mid:video0
a=rtcp-fb:99 nack
a=rtcp-fb:99 nack pli
a=rtcp-fb:99 ccm fir
a=ssrc:3643559644 cname:user2571875795@host-433aaf59
a=ice-ufrag:AGVf
a=ice-pwd:h3JAYGhIaQ/Nvyaz9dLoz9
a=fingerprint:sha-256 34:D4:54:17:0C:95:2A:79:FF:72:10:21:E9:6E:F3:77:86:2F:8D:6C:33:45:BA:14:1D:43:01:D7:CD:0A:1A:84
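To see which ICE candidate types an SDP blob such as the one above carries, you can scan its a=candidate lines. The following Python sketch is illustrative only. The function names candidate_types and has_only_host_candidates, and the sample addresses from the documentation range 192.0.2.0/24, are assumptions made for this example and aren't part of the Alexa interface.

```python
from collections import Counter


def candidate_types(sdp: str) -> Counter:
    """Count ICE candidates per type (host, srflx, relay, ...) in an SDP blob."""
    counts = Counter()
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=candidate:"):
            continue
        fields = line.split()
        # The candidate type is the token that follows the literal "typ".
        if "typ" in fields:
            idx = fields.index("typ")
            if idx + 1 < len(fields):
                counts[fields[idx + 1]] += 1
    return counts


def has_only_host_candidates(sdp: str) -> bool:
    """True when the SDP has candidates and every one of them is of type host."""
    counts = candidate_types(sdp)
    return bool(counts) and set(counts) == {"host"}


sample = "\n".join([
    "m=audio 1 RTP/SAVPF 96 0",
    "a=candidate:1 1 UDP 2013266430 192.0.2.10 8620 typ host",
    "a=candidate:2 1 TCP 1010827775 192.0.2.10 45351 typ host tcptype passive",
])
print(candidate_types(sample))
print(has_only_host_candidates(sample))
```

If has_only_host_candidates returns True for a device that sits behind NAT, that is a hint to add server-reflexive or relay candidates.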
Supported communication types
The RTCSessionController interface supports one-way (half-duplex) or two-way (full-duplex) communication. For an audio-only scenario, such as an Echo Plus connecting to a front door intercom, you must have two-way communication. For an audio and video scenario, such as an Echo Show connecting to a front door camera, Alexa supports one-way video communication and one-way or two-way audio communication.
- Half-duplex communication allows users to communicate in two directions, but not simultaneously. For example:
- A walkie-talkie
- A push-to-talk door intercom
If your device doesn't have acoustic echo cancellation support, choose half duplex.
- Full-duplex communication allows users to communicate in two directions simultaneously. For example:
- A telephone
- A telephone door intercom
Supported resolutions
The supported resolutions are 480p to 1080p.
Prerequisites and service-level requirements
Low latency is critical to an optimal user experience. To use the RTCSessionController, you must meet the following requirements:
- Alexa requires your device to support live streaming for at least one minute.
- When your skill receives an offer, you must respond with an SDP answer within six seconds.
- Your device or platform must be WebRTC-compliant, or support the suite of protocols used by WebRTC and all the resiliency mechanisms that WebRTC uses. Specifically, you must support the following protocols and mechanisms:
- For resource considerations, you must support BUNDLE and rtcp-mux. You use a bundle to send audio and video over the same connection to reduce the number of open sockets.
- To support full-duplex communication, your device must employ effective algorithms for acoustic echo cancellation (AEC) and noise suppression.
- To support half-duplex communication, you can use the push-to-talk (PTT) feature through the typical live view scenario. Declare isFullDuplexAudioSupported as false in the discovery response.
- To support video, you must use the following video codec:
- H.264 (up to profile high, level 4.1)
- To support audio, you must use one of the following audio codecs:
- Opus (preferred codec)
- PCMU/PCMA
- For ICE candidates, you can use either UDP or TCP, but you must use IPv4. Alexa doesn't support trickle ICE. You must gather all ICE candidates up front and send them in the SDP answer.
- Minimize the number of ICE candidates to allow your device to respond within six seconds.
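Several of the requirements above can be checked mechanically before you send an SDP answer. The following Python sketch shows one possible pre-flight check. The function name check_sdp_answer, the problem messages, and the sample SDP are assumptions made for this example. The check covers only BUNDLE, rtcp-mux, codec names, candidate transport, and IPv4 addressing. It doesn't validate H.264 profile or level, and it doesn't measure the six-second deadline.

```python
import ipaddress

ALLOWED_CODECS = {"audio": {"opus", "pcmu", "pcma"}, "video": {"h264"}}
# Static RTP payload types that need no a=rtpmap line (RFC 3551).
STATIC_PAYLOAD_TYPES = {"0": "pcmu", "8": "pcma"}


def check_sdp_answer(sdp: str) -> list:
    """Return a list of detected problems; an empty list means none were found."""
    lines = [ln.strip() for ln in sdp.splitlines() if ln.strip()]
    problems = []

    if not any(ln.startswith("a=group:BUNDLE") for ln in lines):
        problems.append("missing a=group:BUNDLE")

    # Split the SDP into media sections; each one starts at an m= line.
    sections = []
    for ln in lines:
        if ln.startswith("m="):
            sections.append([ln])
        elif sections:
            sections[-1].append(ln)

    candidate_count = 0
    for section in sections:
        m_fields = section[0][2:].split()  # e.g. ["audio", "1", "RTP/SAVPF", "96", "0"]
        kind = m_fields[0]
        allowed = ALLOWED_CODECS.get(kind, set())
        if allowed and "a=rtcp-mux" not in section:
            problems.append(f"{kind}: missing a=rtcp-mux")

        codecs = {STATIC_PAYLOAD_TYPES[pt] for pt in m_fields[3:] if pt in STATIC_PAYLOAD_TYPES}
        for ln in section:
            fields = ln.split()
            if ln.startswith("a=rtpmap:") and len(fields) >= 2:
                # a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
                codecs.add(fields[1].split("/")[0].lower())
            elif ln.startswith("a=candidate:"):
                candidate_count += 1
                if len(fields) < 6:
                    problems.append(f"{kind}: malformed candidate line")
                    continue
                transport, address = fields[2].upper(), fields[4]
                if transport not in ("UDP", "TCP"):
                    problems.append(f"{kind}: unsupported transport {transport}")
                try:
                    if ipaddress.ip_address(address).version != 4:
                        problems.append(f"{kind}: candidate {address} is not IPv4")
                except ValueError:
                    problems.append(f"{kind}: candidate address {address} is not an IP literal")

        if allowed and not (codecs & allowed):
            problems.append(f"{kind}: no supported codec found")

    if candidate_count == 0:
        problems.append("no ICE candidates in the SDP (trickle ICE isn't supported)")
    return problems


GOOD_ANSWER = "\n".join([
    "v=0",
    "a=group:BUNDLE audio0 video0",
    "m=audio 1 RTP/SAVPF 96 0",
    "a=candidate:1 1 UDP 2013266430 192.0.2.10 8620 typ host",
    "a=rtpmap:96 opus/48000/2",
    "a=rtcp-mux",
    "m=video 1 RTP/SAVPF 99",
    "a=candidate:4 1 UDP 2013266430 192.0.2.10 8620 typ host",
    "a=rtpmap:99 H264/90000",
    "a=rtcp-mux",
])
print(check_sdp_answer(GOOD_ANSWER))
```

Returning a list of messages instead of raising on the first failure lets you log every issue in one pass, which helps when you are also trying to keep the candidate count small.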
Cameras that support pan, tilt, zoom
To enable an Alexa user to use the pan, tilt, and zoom features, your device must implement the RTCSessionController and the RangeController interfaces.
Alexa uses the Alexa.RangeController directives to request that your camera do the following:
- Pan – Rotate the camera on the horizontal plane, left and right.
- Tilt – Rotate the camera on the vertical plane, up and down.
- Zoom – Change the view to see a smaller area with more detail (zoom in) or more area with less detail (zoom out).
Specify the camera range
A camera can implement all or any subset of pan, tilt, and zoom. To specify what properties the camera supports, set the instance field in the Alexa.RangeController to Camera.Pan, Camera.Tilt, or Camera.Zoom.
For each instance, specify the minimum and maximum ranges that your camera supports.
For pan and tilt, specify ranges as a percentage of the field of view (FOV) of your camera. For example, if your camera has a 90-degree horizontal FOV, and can rotate 360 total degrees, the range of motion is 400%. The range represents the number of times you can fit the FOV in the total range. You can define your total supported range as 0–400 or –200–200 (minus 200 to 200). If you use –200–200, Alexa can use zero for the direction, straight ahead. For zoom, specify the range as a percent from 0 to the maximum zoom that your camera supports.
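The range arithmetic described above can be written as a small helper. This Python sketch is illustrative only. The function name pan_tilt_range and its centered flag are assumptions made for this example.

```python
def pan_tilt_range(fov_degrees: float, rotation_degrees: float, centered: bool = True):
    """Return (minimumValue, maximumValue) as a percentage of the field of view."""
    if fov_degrees <= 0 or rotation_degrees < 0:
        raise ValueError("fov_degrees must be positive and rotation_degrees non-negative")
    # Number of times the FOV fits in the total rotation, expressed as a percent.
    total = round(rotation_degrees / fov_degrees * 100)
    if centered:
        # A symmetric range lets zero mean "straight ahead".
        half = total // 2
        return (-half, half)
    return (0, total)


print(pan_tilt_range(90, 360))                  # (-200, 200)
print(pan_tilt_range(90, 360, centered=False))  # (0, 400)
```

With an odd total, the centered form rounds each half down, so the symmetric range can be one percent narrower than the true range of motion.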
The following diagram shows the camera field of view before and after the user asks Alexa to pan to the right. Here, the camera has a 90 degree FOV and can rotate 360 degrees.
Camera pan example
The following example shows the Alexa.RangeController interface for a camera that supports pan. Send these properties in the discovery response. For details about the properties, see Alexa.RangeController.
{
"type": "AlexaInterface",
"interface": "Alexa.RangeController",
"version": "3",
"instance": "Camera.Pan",
"capabilityResources": {
"friendlyNames": [{
"@type": "text",
"value": {
"text": "Camera Pan",
"locale": "en-US"
}
},
{
"@type": "text",
"value": {
"text": "Camera Rotation",
"locale": "en-US"
}
},
{
"@type": "text",
"value": {
"text": "Rotation",
"locale": "en-US"
}
}
]
},
"properties": {
"supported": [{
"name": "rangeValue"
}],
"retrievable": true,
"proactivelyReported": true
},
"configuration": {
"supportedRange": {
"minimumValue": -200,
"maximumValue": 200,
"precision": 1
},
"presets": [{
"rangeValue": -200,
"presetResources": {
"friendlyNames": [{
"@type": "text",
"value": {
"text": "Far Left",
"locale": "en-US"
}
}]
}
},
{
"rangeValue": 0,
"presetResources": {
"friendlyNames": [{
"@type": "text",
"value": {
"text": "Center",
"locale": "en-US"
}
}]
}
},
{
"rangeValue": 200,
"presetResources": {
"friendlyNames": [{
"@type": "text",
"value": {
"text": "Far Right",
"locale": "en-US"
}
}]
}
}
]
}
}
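A camera that supports more than one of pan, tilt, and zoom needs one capability entry per instance, so generating the entries from a helper can reduce duplication. The following Python sketch mirrors the field names in the pan example above. The helper names friendly_name and camera_range_capability, and the tilt range of -100 to 100, are assumptions made for this example and not values that Alexa prescribes.

```python
import json


def friendly_name(text: str, locale: str = "en-US") -> dict:
    """Build one text friendly-name entry."""
    return {"@type": "text", "value": {"text": text, "locale": locale}}


def camera_range_capability(instance, names, minimum, maximum, presets=None, precision=1):
    """Build an Alexa.RangeController capability entry for one camera instance."""
    capability = {
        "type": "AlexaInterface",
        "interface": "Alexa.RangeController",
        "version": "3",
        "instance": instance,
        "capabilityResources": {"friendlyNames": [friendly_name(n) for n in names]},
        "properties": {
            "supported": [{"name": "rangeValue"}],
            "retrievable": True,
            "proactivelyReported": True,
        },
        "configuration": {
            "supportedRange": {
                "minimumValue": minimum,
                "maximumValue": maximum,
                "precision": precision,
            }
        },
    }
    if presets:
        # presets maps a rangeValue to the label users can say for it.
        capability["configuration"]["presets"] = [
            {"rangeValue": value, "presetResources": {"friendlyNames": [friendly_name(label)]}}
            for value, label in presets.items()
        ]
    return capability


# Hypothetical tilt capability; the range values are illustrative.
tilt = camera_range_capability("Camera.Tilt", ["Camera Tilt"], -100, 100, presets={0: "Center"})
print(json.dumps(tilt, indent=2))
```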
Respond to pan, tilt, and zoom directives
To request pan, tilt, and zoom, Alexa sends a SetRangeValue or AdjustRangeValue directive to your device over the RTC data channel. Respond immediately. Don't wait for movement to complete. After the motion completes, partially completes, or fails, send an asynchronous Alexa.ChangeReport event with the current position of the camera. Send the Alexa.ChangeReport event to Alexa over both the RTC data channel and the Alexa gateway.
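As a rough illustration of the asynchronous report, the following Python sketch builds a ChangeReport-style event for a camera range instance. The function name build_change_report is hypothetical. The payload layout (a change object with cause and properties) and the cause type VOICE_INTERACTION follow the general Alexa.ChangeReport shape as an assumption, so verify them against the Alexa.ChangeReport reference before relying on them. The sketch omits the endpoint scope and token.

```python
import uuid
from datetime import datetime, timezone


def build_change_report(endpoint_id, instance, value, cause="VOICE_INTERACTION"):
    """Build a ChangeReport-style event carrying the camera's current position."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    changed = {
        "namespace": "Alexa.RangeController",
        "instance": instance,
        "name": "rangeValue",
        "value": value,
        "timeOfSample": now,
        "uncertaintyInMilliseconds": 0,
    }
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "ChangeReport",
                "messageId": str(uuid.uuid4()),
                "payloadVersion": "3",
            },
            # Assumption: add the scope and token that your skill uses here.
            "endpoint": {"endpointId": endpoint_id},
            "payload": {"change": {"cause": {"type": cause}, "properties": [changed]}},
        },
        # Unchanged properties would go here; this sketch reports none.
        "context": {"properties": []},
    }


report = build_change_report("camera-001", "Camera.Pan", 200)
print(report["event"]["payload"]["change"]["properties"][0]["value"])
```

You would serialize the same event once and send it over both the RTC data channel and the Alexa gateway.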
The following list shows the response scenarios:
- If the camera can fulfill the requested motion, respond with an Alexa.Response and include the final position. Here, the final position is the requested position.
- If the camera can partially fulfill the requested motion, respond with an Alexa.Response and include the final position. For example, if the request contains a range that's outside of the camera range, the final position is the furthest extent the camera can move.
- If the camera knows it can't fulfill the request, respond with an Alexa.ErrorResponse.
- After any completed full or partial position change, due to a request from Alexa or from an external source, send an Alexa.ChangeReport event with the current position of the camera.
- If the requested motion fails after you send an Alexa.Response, send an Alexa.ChangeReport event with the current position of the camera.
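The scenarios above can be sketched as one handler that clamps the requested position to the supported range and picks the response type. This Python sketch is illustrative only. The function name handle_range_directive is hypothetical. Treating a request that would not move the camera at all as unfulfillable is an assumption of this example, as is reusing the INVALID_VALUE error type. The sketch omits the endpoint scope and token.

```python
import uuid
from datetime import datetime, timezone


def handle_range_directive(directive, current, minimum, maximum):
    """Return the immediate response for a SetRangeValue or AdjustRangeValue directive."""
    header = directive["directive"]["header"]
    payload = directive["directive"]["payload"]
    endpoint = {"endpointId": directive["directive"]["endpoint"]["endpointId"]}

    if header["name"] == "SetRangeValue":
        requested = payload["rangeValue"]
    else:  # AdjustRangeValue
        requested = current + payload["rangeValueDelta"]

    # Partial fulfillment: clamp to the furthest extent the camera can move.
    target = max(minimum, min(maximum, requested))

    response_header = {
        "namespace": "Alexa",
        "messageId": str(uuid.uuid4()),
        "correlationToken": header.get("correlationToken"),
        "payloadVersion": "3",
    }
    if target == current and requested != current:
        # Assumption: already at the limit, so the camera can't fulfill the request.
        response_header["name"] = "ErrorResponse"
        error = {"type": "INVALID_VALUE", "message": "Camera can't move further."}
        return {"event": {"header": response_header, "endpoint": endpoint, "payload": error}}

    response_header["name"] = "Response"
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {
        "event": {"header": response_header, "endpoint": endpoint, "payload": {}},
        "context": {"properties": [{
            "namespace": "Alexa.RangeController",
            "instance": header["instance"],
            "name": "rangeValue",
            "value": target,  # final position the camera will move to
            "timeOfSample": now,
            "uncertaintyInMilliseconds": 0,
        }]},
    }


set_directive = {"directive": {
    "header": {"namespace": "Alexa.RangeController", "instance": "Camera.Pan",
               "name": "SetRangeValue", "correlationToken": "token"},
    "endpoint": {"endpointId": "camera-001"},
    "payload": {"rangeValue": 300},
}}
response = handle_range_directive(set_directive, current=0, minimum=-200, maximum=200)
print(response["event"]["header"]["name"], response["context"]["properties"][0]["value"])
```

The handler returns immediately with the intended final position. Reporting what the motor actually achieved is left to the later Alexa.ChangeReport event.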
Alexa.RangeController doesn't support Alexa.DeferredResponse.
Pan and tilt examples
The following examples show the request and response payloads for pan and tilt requests.
Pan to center request example
{
"directive": {
"header": {
"namespace": "Alexa.RangeController",
"instance": "Camera.Pan",
"name": "SetRangeValue",
"messageId": "Unique version 4 UUID",
"correlationToken": "Opaque correlation token",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "OAuth2.0 bearer token"
},
"endpointId": "Unique ID of the endpoint",
"cookie": {}
},
"payload": {
"rangeValue": 0
}
}
}
Pan to center response example
{
"event": {
"header": {
"namespace": "Alexa",
"name": "Response",
"messageId": "Unique identifier, preferably a version 4 UUID",
"correlationToken": "Opaque correlation token",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "OAuth2.0 bearer token"
},
"endpointId": "Unique ID of the endpoint"
},
"payload": {}
},
"context": {
"properties": [
{
"namespace": "Alexa.RangeController",
"instance": "Camera.Pan",
"name": "rangeValue",
"value": "0",
"timeOfSample": "2017-02-03T16:20:50.52Z",
"uncertaintyInMilliseconds": 0
}
]
}
}
Tilt down 20 percent request example
{
"directive": {
"header": {
"namespace": "Alexa.RangeController",
"instance": "Camera.Tilt",
"name": "AdjustRangeValue",
"messageId": "Unique version 4 UUID",
"correlationToken": "Opaque correlation token",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "OAuth2.0 bearer token"
},
"endpointId": "Unique ID of the endpoint",
"cookie": {}
},
"payload": {
"rangeValueDelta": -20,
"rangeValueDeltaDefault": false
}
}
}
Tilt not supported response example
{
"event": {
"header": {
"namespace": "Alexa",
"name": "ErrorResponse",
"messageId": "Unique identifier, preferably a version 4 UUID",
"payloadVersion": "3"
},
"endpoint":{
"endpointId": "Unique ID of the endpoint"
},
"payload": {
"type": "INVALID_VALUE",
"message": "Camera doesn't support tilt."
}
}
}
Related topics
- Alexa.RTCSessionController
- Use the Smart Home Live Debugger Tool to Test, Debug, and Speed Up Your Camera WebRTC Integration
Last updated: Mar 27, 2024