Alexa.SmartVision.ObjectDetectionSensor Interface 1.0
Implement the Alexa.SmartVision.ObjectDetectionSensor
interface in your Alexa skill so that customers can receive notifications when their smart-vision device detects an object, such as a person or package. Customers can enable the class of objects for which they want to receive notifications. Your skill reports smart-vision events to Alexa to notify the customer that the device detected an object of interest.
Typically, you use the Alexa.SmartVision.ObjectDetectionSensor
interface with the Alexa.RTCSessionController
interface. Also, you can use the Alexa.DataController
interface to enable customers to review and delete detection events.
For the list of languages that the Alexa.SmartVision.ObjectDetectionSensor
interface supports, see List of Alexa Interfaces and Supported Languages. For the definitions of the message properties, see Alexa Interface Message and Property Reference.
Object detection
Object detection is a computer vision technique to identify objects within an image or video stream. Some smart-vision devices can follow an object through a video stream, aggregate the frames of the same physical object, interpret the contents, and report the detected object information.
Typically, customers can configure the types of objects that they want their device to identify and report. Object detection occurs when the smart-vision device sees an object from the configured object class in the video stream. Your skill reports the object detection event to Alexa, and then Alexa notifies the customer. The customer can also review the history of reported events in the Alexa app.
If your smart-vision device can detect multiple objects in a single video stream, send an object detection event for each object, as soon as detection occurs, without waiting for the video processing session to end. As the session continues, your smart-vision processing software can add data to the event, aggregate frames of the same physical object, and detect other objects of interest from the same or different object classes in the video stream. You send additional events for other objects of interest that appear in the stream. Event data might include the detection time, the object class, a unique identifier for the detected object, and an image of the object.
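The per-stream flow described above can be sketched as follows. This is a minimal illustration, assuming hypothetical `detect_objects` and `send_event` hooks for your detection pipeline; they are not part of the Alexa interface.

```python
import uuid

def process_stream_session(frames, detect_objects, send_event):
    """Send one ObjectDetection event per physical object, at first appearance,
    without waiting for the video processing session to end."""
    seen = {}  # tracker id -> objectIdentifier already reported
    for frame in frames:
        for tracker_id, image_net_class in detect_objects(frame):
            if tracker_id in seen:
                continue  # frames of the same physical object; aggregate, don't re-report
            object_identifier = str(uuid.uuid4())
            seen[tracker_id] = object_identifier
            send_event({
                "eventIdentifier": str(uuid.uuid4()),
                "imageNetClass": image_net_class,
                "objectIdentifier": object_identifier,
            })
    return seen
```

Tracking objects by a tracker id lets the session aggregate later frames of the same physical object instead of reporting it twice.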
Alexa caches data associated with the detected event, such as the event identifier. When the customer opens the Alexa app to review the detection event, Alexa uses the Alexa.DataController
interface to retrieve the event data from your skill.
Object classes
The Alexa.SmartVision.ObjectDetectionSensor
interface uses nouns from the WordNet® database to define the types of physical objects, called object classes, that the smart-vision device might detect. WordNet is a large lexical database of English words, grouped into related words and concepts, that many smart-vision devices use to identify objects.
The following table shows common object class names used with the Alexa.SmartVision.ObjectDetectionSensor
interface. You can use any object class that your device supports.
Object class | Description
---|---
package | A parcel or bundle. An object or group of objects wrapped in paper or plastic, or packed in a box.
person | A human being.
(Source: Princeton University "About WordNet." WordNet. Princeton University. 2010.)
Utterances
The Alexa.SmartVision.ObjectDetectionSensor
interface doesn't define any user utterances. Instead, Alexa communicates with your skill about the object detection classes that the customer configures in the Alexa app.
Properties and objects
The Alexa.SmartVision.ObjectDetectionSensor
interface includes the following properties and objects.
Reportable properties
The Alexa.SmartVision.ObjectDetectionSensor interface uses objectDetectionClasses as its primary property. You identify the properties that you support in your discovery response.
The objectDetectionClasses property defines the object classes that the smart-vision endpoint can detect. The property is an array of ClassConfiguration objects.
ClassConfiguration object
The ClassConfiguration object provides information about the class of images that the endpoint can detect.

Property | Description | Type
---|---|---
imageNetClass | The class of images that the endpoint can detect. | String
Discovery
You describe endpoints that support Alexa.SmartVision.ObjectDetectionSensor
by using the standard discovery mechanism described in Alexa.Discovery.
Set retrievable
to true
for the properties that you report when Alexa sends your skill a state report request.
Set proactivelyReported
to true
for the properties that you proactively report to Alexa in a change report.
Use CAMERA
for the display category. For the full list of display categories, see display categories.
Sensor devices must also implement Alexa.EndpointHealth.
If you implement the Alexa.DataController interface, include one instance of Alexa.DataController only.
Configuration object
In addition to the usual discovery response fields, for Alexa.SmartVision.ObjectDetectionSensor
, include a configuration
object that contains the following fields.
Property | Description | Type | Required
---|---|---|---
objectDetectionConfiguration | Object detection classes that the endpoint supports. Each entry contains the following fields. | Array of objects | Yes
imageNetClass | The class of images that the endpoint can detect. | String | Yes
isAvailable | Indicates whether you can enable the class on the endpoint. | Boolean | No
unavailabilityReason | Indicates why the class isn't available on the endpoint, such as SUBSCRIPTION_REQUIRED. | String | No
Discover response example
The following example shows a Discover.Response
message for an Alexa skill that supports the Alexa.SmartVision.ObjectDetectionSensor
, Alexa.DataController
, and Alexa.EndpointHealth
interfaces.
{
"event": {
"header": {
"namespace": "Alexa.Discovery",
"name": "Discover.Response",
"payloadVersion": "3",
"messageId": "Unique identifier, preferably a version 4 UUID"
},
"payload": {
"endpoints": [{
"endpointId": "Unique ID of the endpoint",
"manufacturerName": "Sample Manufacturer",
"description": "Description that appears in the Alexa app",
"friendlyName": "Your device name, displayed in the Alexa app",
"displayCategories": ["CAMERA"],
"additionalAttributes": {
"manufacturer": "Sample Manufacturer",
"model": "Sample Model",
"serialNumber": "Serial number of the device",
"firmwareVersion": "Firmware version of the device",
"softwareVersion": "Software version of the device",
"customIdentifier": "Optional custom identifier for the device"
},
"cookie": {},
"capabilities": [{
"type": "AlexaInterface",
"interface": "Alexa.SmartVision.ObjectDetectionSensor",
"version": "1.0",
"properties": {
"supported": [{
"name": "objectDetectionClasses"
}],
"proactivelyReported": true,
"retrievable": true
},
"configuration": {
"objectDetectionConfiguration": [{
"imageNetClass": "person"
},
{
"imageNetClass": "package",
"isAvailable": false,
"unavailabilityReason": "SUBSCRIPTION_REQUIRED"
}
]
}
},
{
"type": "AlexaInterface",
"interface": "Alexa.DataController",
"instance": "Camera.SmartVisionData",
"version": "1.0",
"properties": {},
"configuration": {
"targetCapability": {
"name": "Alexa.SmartVision.ObjectDetectionSensor",
"version": "1.0"
},
"dataRetrievalSchema": {
"type": "JSON",
"schema": "SmartVisionData"
},
"supportedAccess": ["BY_TIMESTAMP_RANGE"]
}
},
{
"type": "AlexaInterface",
"interface": "Alexa.EndpointHealth",
"version": "3.1",
"properties": {
"supported": [{
"name": "connectivity"
}],
"proactivelyReported": true,
"retrievable": true
}
},
{
"type": "AlexaInterface",
"interface": "Alexa",
"version": "3"
}
]
}]
}
}
}
AddOrUpdateReport
You must proactively send an Alexa.Discovery.AddOrUpdateReport event if the feature support of your endpoint changes, for example, if the subscription status of a supported object class changes. For details, see AddOrUpdateReport event.
AddOrUpdateReport event example
The following example shows an AddOrUpdateReport message that reports that the package class no longer requires a subscription.
{
"event": {
"header": {
"namespace": "Alexa.Discovery",
"name": "AddOrUpdateReport",
"payloadVersion": "3",
"messageId": "Unique identifier, preferably a version 4 UUID"
},
"payload": {
"endpoints": [{
"endpointId": "Unique ID of the endpoint",
"manufacturerName": "Sample Manufacturer",
"description": "Description that appears in the Alexa app",
"friendlyName": "Your device name, displayed in the Alexa app",
"displayCategories": ["CAMERA"],
"additionalAttributes": {
"manufacturer": "Sample Manufacturer",
"model": "Sample Model",
"serialNumber": "Serial number of the device",
"firmwareVersion": "Firmware version of the device",
"softwareVersion": "Software version of the device",
"customIdentifier": "Optional custom identifier for the device"
},
"cookie": {},
"capabilities": [{
"type": "AlexaInterface",
"interface": "Alexa.SmartVision.ObjectDetectionSensor",
"version": "1.0",
"properties": {
"supported": [{
"name": "objectDetectionClasses"
}
],
"proactivelyReported": true,
"retrievable": true
},
"configuration": {
"objectDetectionConfiguration" : [
{
"imageNetClass" : "person"
},
{
"imageNetClass" : "package"
}
]
}
},
{
"type": "AlexaInterface",
"interface": "Alexa.EndpointHealth",
"version": "3.1",
"properties": {
"supported": [{
"name": "connectivity"
}],
"proactivelyReported": true,
"retrievable": true
}
},
{
"type": "AlexaInterface",
"interface": "Alexa",
"version": "3"
}
]
}]
}
}
}
Directives and events
The Alexa.SmartVision.ObjectDetectionSensor
interface defines the following directives and events.
SetObjectDetectionClasses directive
Support the SetObjectDetectionClasses
directive so that the customer can configure the objects for which they want to receive notifications. The customer can configure the object classes in the Alexa app. For this endpoint, you must disable events for any object class that isn't included in the request.
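The enable-and-disable logic can be sketched as follows. This is a hypothetical handler, assuming the set of classes the camera supports is known; the function and variable names are illustrative, not defined by the interface.

```python
# Example capability of the device; replace with the classes your camera supports.
SUPPORTED_CLASSES = {"person", "package", "dog"}

def handle_set_object_detection_classes(directive):
    """Enable the requested classes and disable every other supported class."""
    requested = {
        c["imageNetClass"]
        for c in directive["payload"]["objectDetectionClasses"]
    }
    enabled = requested & SUPPORTED_CLASSES
    disabled = SUPPORTED_CLASSES - enabled  # must be disabled per the directive
    # The Alexa.Response context reports the resulting objectDetectionClasses value.
    context_value = [{"imageNetClass": c} for c in sorted(enabled)]
    return {"enabled": enabled, "disabled": disabled, "context_value": context_value}
```

Any supported class missing from the request is disabled, which matches the requirement that events stop for classes not included in the directive.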
SetObjectDetectionClasses directive example
The following example shows a SetObjectDetectionClasses
directive that Alexa sends to your skill. This example enables detection of objects in the person
and package
classes.
{
"directive": {
"header": {
"namespace": "Alexa.SmartVision.ObjectDetectionSensor",
"name": "SetObjectDetectionClasses",
"payloadVersion": "1.0",
"messageId": "Unique version 4 UUID",
"correlationToken": "Opaque correlation token"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "OAuth2 bearer token"
},
"endpointId": "endpoint id",
"cookie": {}
},
"payload": {
"objectDetectionClasses": [{
"imageNetClass": "person"
},
{
"imageNetClass": "package"
}
]
}
}
}
SetObjectDetectionClasses directive payload
The following table shows the payload details for the SetObjectDetectionClasses
directive that Alexa sends to your skill.
Property | Description | Type | Required
---|---|---|---
objectDetectionClasses | Classes of objects that the customer wants the camera to detect. You must disable object detection for any other object classes that your smart-vision camera supports. | Array of ClassConfiguration objects | Yes
SetObjectDetectionClasses response
If you handle a SetObjectDetectionClasses
directive successfully and you can configure events for the requested object classes, respond with an Alexa.Response
and include the resulting supported objectDetectionClasses
array.
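A response can be assembled as in this sketch, which echoes the correlation token and reports the resulting classes in the context object. The helper name is hypothetical; the message shape follows the interface.

```python
import uuid

def build_set_classes_response(correlation_token, endpoint_id, enabled_classes):
    """Build an Alexa.Response whose context reports objectDetectionClasses."""
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "Response",
                "messageId": str(uuid.uuid4()),
                "correlationToken": correlation_token,  # must match the directive
                "payloadVersion": "3",
            },
            "endpoint": {"endpointId": endpoint_id},
            "payload": {},
        },
        "context": {
            "properties": [{
                "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                "name": "objectDetectionClasses",
                "value": [{"imageNetClass": c} for c in enabled_classes],
            }]
        },
    }
```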
The following example shows a successful response to the SetObjectDetectionClasses
directive.
{
"event": {
"header": {
"namespace": "Alexa",
"name": "Response",
"messageId": "Unique identifier, preferably a version 4 UUID",
"correlationToken": "Opaque correlation token that matches the request",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "OAuth2.0 bearer token"
},
"endpointId": "endpoint id",
"cookie": {}
},
"payload": {}
},
"context": {
"properties": [{
"namespace": "Alexa.SmartVision.ObjectDetectionSensor",
"name": "objectDetectionClasses",
"value": [{
"imageNetClass": "person"
},
{
"imageNetClass": "package"
}
]
},
{
"namespace": "Alexa.EndpointHealth",
"name": "connectivity",
"value": {
"value": "OK"
},
"timeOfSample": "2017-02-03T16:20:50.52Z",
"uncertaintyInMilliseconds": 0
}
]
}
}
SetObjectDetectionClasses directive error handling
If you can't handle a SetObjectDetectionClasses
directive successfully, respond with an Alexa.SmartVision.ObjectDetectionSensor.ErrorResponse
event. You can also respond with a generic Alexa.ErrorResponse
event if your error isn't specific to object detection.
ObjectDetection event
Send the ObjectDetection
event to the Alexa Event Gateway when your device recognizes an object from one of the configured object classes. For details, see Send Events to the Event Gateway. On receipt of the event, Alexa notifies the customer about the detected object.
Also, Alexa caches the event data by eventIdentifier
and endpointId
so that the customer can later view and delete the event data.
Assign a unique eventIdentifier
for each detected object in the video stream and send one event per detected object. Also, produce at most one detection event per detected object class and video stream.
After your skill reports an ObjectDetection
event, you must wait at least 30 seconds before sending another ObjectDetection
event for the same object.
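The 30-second rule above amounts to a per-object throttle, sketched below. The class name and injectable clock are illustrative, not part of the interface.

```python
import time

MIN_REPEAT_SECONDS = 30  # minimum gap between events for the same object

class ObjectEventThrottle:
    def __init__(self, now=time.monotonic):
        self._now = now  # injectable clock, useful for testing
        self._last_sent = {}  # objectIdentifier -> time of last event

    def should_send(self, object_identifier):
        """Return True if an ObjectDetection event may be sent for this object now."""
        now = self._now()
        last = self._last_sent.get(object_identifier)
        if last is not None and now - last < MIN_REPEAT_SECONDS:
            return False  # too soon; suppress the repeat event
        self._last_sent[object_identifier] = now
        return True
```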
To enable customers to view and delete event data, implement the Alexa.DataController interface.
ObjectDetection event example
The following example shows an ObjectDetection
event that you send to Alexa. This example reports the detection of an object in the person
class.
{
"event": {
"header": {
"namespace": "Alexa.SmartVision.ObjectDetectionSensor",
"name": "ObjectDetection",
"messageId": "Unique identifier, preferably a version 4 UUID",
"payloadVersion": "1.0"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "OAuth2 bearer token"
},
"endpointId": "endpoint id"
},
"payload": {
"events": [{
"eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
"imageNetClass": "person",
"timeOfSample": "2021-07-02T16:20:50.52Z",
"uncertaintyInMilliseconds": 0,
"objectIdentifier": "573409df-5486-7c52-b4ba-d361860bac73",
"frameImageUri": "https://example.com/frames/frame1.jpg",
"croppedImageUri": "https://example.com/images/image1.jpg"
}]
}
}
}
ObjectDetection event payload
The following table shows the payload details for the ObjectDetection
event.
Property | Description | Type | Required
---|---|---|---
events | Objects that the endpoint detected. | Array of objects | Yes
eventIdentifier | Uniquely identifies the event in the event history. You can use the identifier to retrieve and delete the event from the camera stream. Generate an event identifier for each detected object. | Version 4 UUID string | Yes
imageNetClass | Class of the detected object. For valid class names, see the WordNet database. | String | Yes
objectIdentifier | Uniquely identifies the physical object detected in this session. Generate an identifier for each detected object. If your device doesn't distinguish between detected objects, don't include this identifier. | Version 4 UUID string | No
frameImageUri | URI to the frame that shows the detected object. The frame gives context to the scene where the device detected the object. | String | No
croppedImageUri | URI to the cropped image centered on the detected object. | String | No
timeOfSample | Time the endpoint detected the object. | String | Yes
uncertaintyInMilliseconds | Uncertainty of the timeOfSample value in milliseconds. | Number | No
ObjectDetection response
If your skill proactively sent an ObjectDetection
event and Alexa handles the event successfully, your skill receives HTTP status code 202 Success
. On error, Alexa sends the appropriate HTTP status code.
The following table shows the HTTP status codes sent in response to the event.
Status | Description
---|---
202 | Operation succeeded.
400 | Indicates that the request is invalid or badly formatted. Verify the event payload and check for any missing or invalid fields.
401 | Indicates that the request didn't include the authorization token, or the token is invalid or expired.
403 | Indicates that the authorization token doesn't have sufficient permissions or the skill is disabled.
404 | Indicates that the skill doesn't exist in the corresponding stage.
413 | Maximum number or size of a parameter exceeds the limit.
429 | Number of requests per minute is too high. Use exponential back-off and retry the request.
500 | An error occurred on the server. The skill can retry by using exponential back-off.
503 | Server is busy or unavailable. The skill can retry by using exponential back-off.
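The retry guidance above can be sketched as the following policy: retry 429, 500, and 503 with exponential back-off, and treat other statuses as final. The delay parameters and helper names are illustrative, not mandated by Alexa.

```python
import random

RETRYABLE_STATUSES = {429, 500, 503}

def backoff_delay(attempt, base=1.0, cap=64.0):
    """Full-jitter exponential back-off: random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def send_with_retries(post, event, max_attempts=5):
    """post(event) -> HTTP status code. Retries retryable statuses; returns the final status."""
    status = None
    for attempt in range(max_attempts):
        status = post(event)
        if status not in RETRYABLE_STATUSES:
            return status  # 202 success, or a permanent failure such as 400 or 401
        # In production, sleep for backoff_delay(attempt) before the next try.
    return status
```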
Event update and deletion
As soon as your device recognizes a configured object, you send the ObjectDetection
event to Alexa. On receipt of the event, Alexa sends a notification to the customer. As the video stream continues, you can update the event data by using the Alexa.DataController
interface to send a DataReport
event to Alexa. For example, you might want to aggregate frames of the same physical object. Alexa doesn't send a notification when you update the event data.
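Aggregating frames into the data array of a DataReport event can be sketched as follows: one record per frame of the same physical object, all sharing the original eventIdentifier so that Alexa updates the cached event rather than creating a new one. The helper name is hypothetical; the field names follow the smart-vision data schema.

```python
def build_data_records(event_identifier, image_net_class, object_identifier, frames):
    """One DataReport data record per frame of the same physical object."""
    return [
        {
            "eventIdentifier": event_identifier,  # same id ties frames to one event
            "imageNetClass": image_net_class,
            "objectIdentifier": object_identifier,
            "frameIndex": f["index"],
            "frameImageUri": f["uri"],
        }
        for f in frames
    ]
```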
You can also use the Alexa.DataController
interface to delete data stored on Alexa. For example, you send a DataDeleted
event to Alexa when the customer deletes the detection event directly from your camera or your camera app.
Smart-vision data schema example
You send smart-vision event data as an array in the data
property of the DataReport
event.
The following example shows a DataReport
event from a smart camera that includes data for two frames and the associated images of the detected object.
{
"event": {
"header": {
"namespace": "Alexa.DataController",
"name": "DataReport",
"instance": "DataController-SmartVisionData",
"messageId": "Unique identifier, preferably a version 4 UUID",
"correlationToken": "Opaque correlation token that matches the request",
"payloadVersion": "1.0"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "Opaque correlation token"
},
"endpointId": "endpoint id"
},
"payload": {
"paginationContext": {
"nextToken": "token"
},
"dataSchema": {
"type": "JSON",
"schema": "SmartVisionData"
},
"data": [{
"eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
"imageNetClass": "person",
"mediaId": "2c3409df-d686-4a52-9bba-d361860bac61",
"objectIdentifier": "4a3409df-d686-4a52-9bba-d361860bacbf",
"frameIndex": 2,
"frameWidthInPixels": 1980,
"frameHeightInPixels": 1080,
"frameImageUri": "https://example.com/frames/frame1.jpg",
"croppedImageUri": "https://example.com/images/image1.jpg"
},
{
"eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
"imageNetClass": "person",
"mediaId": "2c3409df-d686-4a52-9bba-d361860bac62",
"objectIdentifier": "4a3409df-d686-4a52-9bba-d361860bacbf",
"frameIndex": 5,
"frameWidthInPixels": 1980,
"frameHeightInPixels": 1080,
"frameImageUri": "https://example.com/frames/frame2.jpg",
"croppedImageUri": "https://example.com/images/image2.jpg"
}
]
}
}
}
Smart-vision data schema definition
The following table shows the JSON data schema defined by the Alexa.SmartVision.ObjectDetectionSensor
interface.
Property | Description | Type | Required
---|---|---|---
eventIdentifier | Uniquely identifies the event in the event history. You can send updated data for the same camera session and customer. | Version 4 UUID string | Yes
imageNetClass | Class of the detected object. | String | Yes
mediaId | Uniquely identifies the media recording in which the event occurred. | Version 4 UUID string | No
frameIndex | Frame number. | Integer | No
frameWidthInPixels | Width of the frame in pixels. | Integer | No
frameHeightInPixels | Height of the frame in pixels. | Integer | No
objectIdentifier | Uniquely identifies the physical object detected in this session. Generate an identifier for each detected object. If your device doesn't distinguish between detected objects, don't include this identifier. | Version 4 UUID string | No
frameImageUri | URI to the frame that shows the detected object. The frame gives context to the scene where the device detected the object. | String | No
croppedImageUri | URI to the cropped image centered on the detected object. | String | No
State reporting
Alexa sends a ReportState
directive to request information about the state of an endpoint. When Alexa sends a ReportState
directive, you send a StateReport
event in response. The response contains the current state of all retrievable properties in the context object. You identify your retrievable properties in your discovery response. For details about state reports, see Understand State and Change Reporting.
StateReport response example
In this example, the smart-vision endpoint supports the person
and package
object detection classes.
{
"event": {
"header": {
"namespace": "Alexa",
"name": "StateReport",
"messageId": "Unique identifier, preferably a version 4 UUID",
"correlationToken": "Opaque correlation token that matches the request",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "OAuth2 bearer token"
},
"endpointId": "endpoint id",
"cookie": {}
},
"payload": {}
},
"context": {
"properties": [{
"namespace": "Alexa.SmartVision.ObjectDetectionSensor",
"name": "objectDetectionClasses",
"value": [{
"imageNetClass": "person"
},
{
"imageNetClass": "package"
}
],
"timeOfSample": "2024-07-03T11:20:50.52Z",
"uncertaintyInMilliseconds": 0
},
{
"namespace": "Alexa.EndpointHealth",
"name": "connectivity",
"value": {
"value": "OK"
},
"timeOfSample": "2024-07-03T10:45:00.52Z",
"uncertaintyInMilliseconds": 0
}
]
}
}
Change reporting
You send a ChangeReport
event to report changes proactively in the state of an endpoint. You identify the properties that you proactively report in your discovery response. For details about change reports, see Understand State and Change Reporting.
The payload contains the values of properties that have changed; the context contains the values of other relevant properties.
ChangeReport event example
The following example shows a ChangeReport
event after the customer changes their preference for which objects to detect.
{
"event": {
"header": {
"namespace": "Alexa",
"name": "ChangeReport",
"messageId": "Unique identifier, preferably a version 4 UUID",
"payloadVersion": "3"
},
"endpoint": {
"scope": {
"type": "BearerToken",
"token": "OAuth2 bearer token"
},
"endpointId": "endpoint id"
},
"payload": {
"change": {
"cause": {
"type": "PHYSICAL_INTERACTION"
},
"properties": [{
"namespace": "Alexa.SmartVision.ObjectDetectionSensor",
"name": "objectDetectionClasses",
"value": [{
"imageNetClass": "person"
}],
"timeOfSample": "2024-07-03T10:20:50.52Z",
"uncertaintyInMilliseconds": 0
}]
}
}
},
"context": {
"properties": [{
"namespace": "Alexa.EndpointHealth",
"name": "connectivity",
"value": {
"value": "OK"
},
"timeOfSample": "2024-07-03T10:19:02.12Z",
"uncertaintyInMilliseconds": 60000
}]
}
}
Last updated: Aug 23, 2024