Alexa.SmartVision.ObjectDetectionSensor Interface 1.0


Implement the Alexa.SmartVision.ObjectDetectionSensor interface in your Alexa skill so that customers can receive notifications when their smart-vision device detects an object, such as a person or package. Customers can enable the class of objects for which they want to receive notifications. Your skill reports smart-vision events to Alexa to notify the customer that the device detected an object of interest.

Typically, you use the Alexa.SmartVision.ObjectDetectionSensor interface with the Alexa.RTCSessionController interface. Also, you can use the Alexa.DataController interface to enable customers to review and delete detection events.

For the list of languages that the Alexa.SmartVision.ObjectDetectionSensor interface supports, see List of Alexa Interfaces and Supported Languages. For the definitions of the message properties, see Alexa Interface Message and Property Reference.

Object detection

Object detection is a computer vision technique to identify objects within an image or video stream. Some smart-vision devices can follow an object through a video stream, aggregate the frames of the same physical object, interpret the contents, and report the detected object information.

Typically, customers can configure the types of objects that they want their device to identify and report. Object detection occurs when the smart-vision device sees an object from a configured object class in the video stream. Your skill reports the object detection event to Alexa, and then Alexa notifies the customer. The customer can also review the history of reported events in the Alexa app.

If your smart-vision device can detect multiple objects in a single video stream, send an object detection event for each object, as soon as detection occurs, without waiting for the video processing session to end. As the session continues, your smart-vision processing software can add data to the event, aggregate frames of the same physical object, and detect other objects of interest from the same or different object classes in the video stream. You send additional events for other objects of interest that appear in the stream. Event data might include the detection time, the object class, a unique identifier for the detected object, and an image of the object.
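The per-object event flow described above can be sketched in Python. The following is a minimal, hedged illustration of building one ObjectDetection event per detected object; the helper name and its arguments are assumptions for illustration, not part of the interface.

```python
import uuid
from datetime import datetime, timezone

def build_object_detection_event(endpoint_id, token, image_net_class,
                                 object_identifier=None, frame_image_uri=None):
    """Build one ObjectDetection event for a single detected object.

    Send one such event per object as soon as detection occurs,
    without waiting for the video processing session to end.
    """
    detection = {
        "eventIdentifier": str(uuid.uuid4()),  # unique per detection event
        "imageNetClass": image_net_class,
        "timeOfSample": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "uncertaintyInMilliseconds": 0,
    }
    # Optional fields are included only when the device provides them.
    if object_identifier:
        detection["objectIdentifier"] = object_identifier
    if frame_image_uri:
        detection["frameImageUri"] = frame_image_uri
    return {
        "event": {
            "header": {
                "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                "name": "ObjectDetection",
                "messageId": str(uuid.uuid4()),
                "payloadVersion": "1.0",
            },
            "endpoint": {
                "scope": {"type": "BearerToken", "token": token},
                "endpointId": endpoint_id,
            },
            "payload": {"events": [detection]},
        }
    }
```

When your processing software detects a second object in the same stream, call the builder again with a new identifier and send a separate event.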

Alexa caches data associated with the detected event, such as the event identifier. When the customer opens the Alexa app to review the detection event, Alexa uses the Alexa.DataController interface to retrieve the event data from your skill.

Object classes

The Alexa.SmartVision.ObjectDetectionSensor interface uses nouns from the WordNet® database to define the types of physical objects, called object classes, that the smart-vision device might detect. WordNet is a large lexical database of English words, grouped into related words and concepts, that many smart-vision devices use to identify objects.

The following table shows common object class names used with the Alexa.SmartVision.ObjectDetectionSensor interface. You can use any object class that your device supports.

Object class | Description
------------ | -----------
package | A parcel or bundle; an object or group of objects wrapped in paper or plastic, or packed in a box.
person | A human being.

(Source: Princeton University "About WordNet." WordNet. Princeton University. 2010.)

Utterances

The Alexa.SmartVision.ObjectDetectionSensor interface doesn't define any user utterances. Instead, Alexa communicates with your skill about the object detection classes that the customer configures in the Alexa app.

Properties and objects

The Alexa.SmartVision.ObjectDetectionSensor interface includes the following properties and objects.

Reportable properties

The Alexa.SmartVision.ObjectDetectionSensor interface uses objectDetectionClasses as its primary property. You identify the properties that you support in your discovery response.

The objectDetectionClasses property defines the objects that the smart-vision endpoint can detect. The property is an array of ClassConfiguration objects.

ClassConfiguration object

The ClassConfiguration object provides information about the class of images that the endpoint can detect.

Property | Description | Type
-------- | ----------- | ----
imageNetClass | The class of images that the endpoint can detect. For valid class names, see the WordNet database. Class names are nouns, such as person or package. | String

Discovery

You describe endpoints that support Alexa.SmartVision.ObjectDetectionSensor by using the standard discovery mechanism described in Alexa.Discovery.

Set retrievable to true for the properties that you report when Alexa sends your skill a state report request. Set proactivelyReported to true for the properties that you proactively report to Alexa in a change report.

Use CAMERA for the display category. For the full list of display categories, see display categories.

Sensor devices must also implement Alexa.EndpointHealth.

Configuration object

In addition to the usual discovery response fields, for Alexa.SmartVision.ObjectDetectionSensor, include a configuration object that contains the following fields.

Property | Description | Type | Required
-------- | ----------- | ---- | --------
objectDetectionConfigurations | Object detection classes that the endpoint supports. | Array of objects | Yes
objectDetectionConfigurations[*].imageNetClass | The class of images that the endpoint can detect. For valid class names, see the WordNet database. Class names are nouns, such as person or package. | String | Yes
objectDetectionConfigurations[*].isAvailable | Indicates whether you can enable the class on the endpoint. Default: true. | Boolean | No
objectDetectionConfigurations[*].unavailabilityReason | Indicates why the class isn't available on the endpoint. Valid value: SUBSCRIPTION_REQUIRED. | String | No

Discover response example

The following example shows a Discover.Response message for an Alexa skill that supports the Alexa.SmartVision.ObjectDetectionSensor, Alexa.DataController, and Alexa.EndpointHealth interfaces.

{
    "event": {
        "header": {
            "namespace": "Alexa.Discovery",
            "name": "Discover.Response",
            "payloadVersion": "3",
            "messageId": "Unique identifier, preferably a version 4 UUID"
        },
        "payload": {
            "endpoints": [{
                "endpointId": "Unique ID of the endpoint",
                "manufacturerName": "Sample Manufacturer",
                "description": "Description that appears in the Alexa app",
                "friendlyName": "Your device name, displayed in the Alexa app",
                "displayCategories": ["CAMERA"],
                "additionalAttributes": {
                    "manufacturer": "Sample Manufacturer",
                    "model": "Sample Model",
                    "serialNumber": "Serial number of the device",
                    "firmwareVersion": "Firmware version of the device",
                    "softwareVersion": "Software version of the device",
                    "customIdentifier": "Optional custom identifier for the device"
                },
                "cookie": {},
                "capabilities": [{
                        "type": "AlexaInterface",
                        "interface": "Alexa.SmartVision.ObjectDetectionSensor",
                        "version": "1.0",
                        "properties": {
                            "supported": [{
                                "name": "objectDetectionClasses"
                            }],
                            "proactivelyReported": true,
                            "retrievable": true
                        },
                        "configuration": {
                            "objectDetectionConfigurations": [{
                                    "imageNetClass": "person"
                                },
                                {
                                    "imageNetClass": "package",
                                    "isAvailable": false,
                                    "unavailabilityReason": "SUBSCRIPTION_REQUIRED"
                                }
                            ]
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa.DataController",
                        "instance": "Camera.SmartVisionData",
                        "version": "1.0",
                        "properties": {},
                        "configuration": {
                            "targetCapability": {
                                "name": "Alexa.SmartVision.ObjectDetectionSensor",
                                "version": "1.0"
                            },
                            "dataRetrievalSchema": {
                                "type": "JSON",
                                "schema": "SmartVisionData"
                            },
                            "supportedAccess": ["BY_TIMESTAMP_RANGE"]
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa.EndpointHealth",
                        "version": "3.1",
                        "properties": {
                            "supported": [{
                                "name": "connectivity"
                            }],
                            "proactivelyReported": true,
                            "retrievable": true
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa",
                        "version": "3"
                    }
                ]
            }]
        }
    }
}

AddOrUpdateReport

You must proactively send an Alexa.Discovery.AddOrUpdateReport event if the feature support of your endpoint changes, for example, when the subscription status of a supported object class changes. For details, see AddOrUpdateReport event.

AddOrUpdateReport event example

The following example shows an AddOrUpdateReport message that reports that the package class no longer requires a subscription.

{
    "event": {
        "header": {
            "namespace": "Alexa.Discovery",
            "name": "AddOrUpdateReport",
            "payloadVersion": "3",
            "messageId": "Unique identifier, preferably a version 4 UUID"
        },
        "payload": {
            "endpoints": [{
                "endpointId": "Unique ID of the endpoint",
                "manufacturerName": "Sample Manufacturer",
                "description": "Description that appears in the Alexa app",
                "friendlyName": "Your device name, displayed in the Alexa app",
                "displayCategories": ["CAMERA"],
                "additionalAttributes": {
                    "manufacturer": "Sample Manufacturer",
                    "model": "Sample Model",
                    "serialNumber": "Serial number of the device",
                    "firmwareVersion": "Firmware version of the device",
                    "softwareVersion": "Software version of the device",
                    "customIdentifier": "Optional custom identifier for the device"
                },
                "cookie": {},
                "capabilities": [{
                        "type": "AlexaInterface",
                        "interface": "Alexa.SmartVision.ObjectDetectionSensor",
                        "version": "1.0",
                        "properties": {
                            "supported": [{
                                    "name": "objectDetectionClasses"
                                }
                            ],
                            "proactivelyReported": true,
                            "retrievable": true
                        },
                        "configuration": {
                            "objectDetectionConfigurations": [{
                                    "imageNetClass": "person"
                                },
                                {
                                    "imageNetClass": "package"
                                }
                            ]
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa.EndpointHealth",
                        "version": "3.1",
                        "properties": {
                            "supported": [{
                                "name": "connectivity"
                            }],
                            "proactivelyReported": true,
                            "retrievable": true
                        }
                    },
                    {
                        "type": "AlexaInterface",
                        "interface": "Alexa",
                        "version": "3"
                    }
                ]
            }]
        }
    }
}

Directives and events

The Alexa.SmartVision.ObjectDetectionSensor interface defines the following directives and events.

SetObjectDetectionClasses directive

Support the SetObjectDetectionClasses directive so that customers can configure, in the Alexa app, the object classes for which they want to receive notifications. When you handle this directive, you must disable detection events for any object class on the endpoint that isn't included in the request.
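The enable-requested, disable-the-rest rule can be sketched as follows. This is a minimal illustration; `device` and its `enable_class`/`disable_class` methods are hypothetical stand-ins for your own driver code.

```python
def handle_set_object_detection_classes(directive, supported_classes, device):
    """Enable the requested object classes and disable every other
    supported class, per the SetObjectDetectionClasses contract.

    `device` is a hypothetical driver object exposing
    enable_class(name) and disable_class(name).
    """
    requested = {c["imageNetClass"]
                 for c in directive["directive"]["payload"]["objectDetectionClasses"]}
    for image_net_class in supported_classes:
        if image_net_class in requested:
            device.enable_class(image_net_class)
        else:
            # Classes absent from the request must stop producing events.
            device.disable_class(image_net_class)
    # Return the classes now enabled, for the objectDetectionClasses
    # property in your Alexa.Response context.
    return [{"imageNetClass": c} for c in sorted(requested)]
```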

SetObjectDetectionClasses directive example

The following example shows a SetObjectDetectionClasses directive that Alexa sends to your skill. This example enables detection of objects in the person and package classes.

{
    "directive": {
        "header": {
            "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
            "name": "SetObjectDetectionClasses",
            "payloadVersion": "1.0",
            "messageId": "Unique version 4 UUID",
            "correlationToken": "Opaque correlation token"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id",
            "cookie": {}
        },
        "payload": {
            "objectDetectionClasses": [{
                    "imageNetClass": "person"
                },
                {
                    "imageNetClass": "package"
                }
            ]
        }
    }
}

SetObjectDetectionClasses directive payload

The following table shows the payload details for the SetObjectDetectionClasses directive that Alexa sends to your skill.

Property | Description | Type | Required
-------- | ----------- | ---- | --------
objectDetectionClasses | Classes of objects that the customer wants the camera to detect. You must disable object detection for any other object classes that your smart-vision camera supports. | Array of ClassConfiguration objects | Yes

SetObjectDetectionClasses response

If you handle a SetObjectDetectionClasses directive successfully and you can configure events for the requested object classes, respond with an Alexa.Response event and include the resulting objectDetectionClasses property in the context object.

The following example shows a successful response to the SetObjectDetectionClasses directive.

{
    "event": {
        "header": {
            "namespace": "Alexa",
            "name": "Response",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "correlationToken": "Opaque correlation token that matches the request",
            "payloadVersion": "3"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id",
            "cookie": {}
        },
        "payload": {}
    },
    "context": {
        "properties": [{
                "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                "name": "objectDetectionClasses",
                "value": [{
                        "imageNetClass": "person"
                    },
                    {
                        "imageNetClass": "package"
                    }
                ]
            },
            {
                "namespace": "Alexa.EndpointHealth",
                "name": "connectivity",
                "value": {
                    "value": "OK"
                },
                "timeOfSample": "2017-02-03T16:20:50.52Z",
                "uncertaintyInMilliseconds": 0
            }
        ]
    }
}

SetObjectDetectionClasses directive error handling

If you can't handle a SetObjectDetectionClasses directive successfully, respond with an Alexa.SmartVision.ObjectDetectionSensor.ErrorResponse event. You can also respond with a generic Alexa.ErrorResponse event if your error isn't specific to object detection.

ObjectDetection event

Send the ObjectDetection event to the Alexa Event Gateway when your device recognizes an object from one of the configured object classes. For details, see Send Events to the Event Gateway. On receipt of the event, Alexa notifies the customer about the detected object. Also, Alexa caches the event data by eventIdentifier and endpointId so that the customer can later view and delete the event data.

Assign a unique eventIdentifier for each detected object in the video stream and send one event per detected object. Also, produce at most one detection event per detected object class and video stream. After your skill reports an ObjectDetection event, you must wait at least 30 seconds before sending another ObjectDetection event for the same object.
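The 30-second rule above amounts to a per-object cooldown. A minimal sketch of such a guard, with the class name and clock injection chosen here for illustration, might look like this:

```python
import time

class DetectionThrottle:
    """Suppress repeat ObjectDetection events for the same object
    within a cooldown window (30 seconds per the interface rules)."""

    def __init__(self, cooldown_seconds=30, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable for testing
        self._last_sent = {}  # objectIdentifier -> time of last event

    def should_send(self, object_identifier):
        """Return True if an event for this object may be sent now."""
        now = self._clock()
        last = self._last_sent.get(object_identifier)
        if last is not None and now - last < self._cooldown:
            return False  # still inside the cooldown window
        self._last_sent[object_identifier] = now
        return True
```

Call `should_send` with the object's identifier before each ObjectDetection event; only send the event when it returns True.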

ObjectDetection event example

The following example shows an ObjectDetection event that you send to Alexa. This example reports the detection of an object in the person class.

{
    "event": {
        "header": {
            "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
            "name": "ObjectDetection",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "payloadVersion": "1.0"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id"
        },
        "payload": {
            "events": [{
                "eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
                "imageNetClass": "person",
                "timeOfSample": "2021-07-02T16:20:50.52Z",
                "uncertaintyInMilliseconds": 0,
                "objectIdentifier": "573409df-5486-7c52-b4ba-d361860bac73",
                "frameImageUri": "https://example.com/frames/frame1.jpg",
                "croppedImageUri": "https://example.com/images/image1.jpg"
            }]
        }
    }
}

ObjectDetection event payload

The following table shows the payload details for the ObjectDetection event.

Property | Description | Type | Required
-------- | ----------- | ---- | --------
events | Objects that the endpoint detected. | Array of objects | Yes
events[*].eventIdentifier | Uniquely identifies the event in the event history. You can use the identifier to retrieve and delete the event from the camera stream. Generate an event identifier for each detected object. | Version 4 UUID string | Yes
events[*].imageNetClass | Class of the detected object. For valid class names, see the WordNet database. | String | Yes
events[*].objectIdentifier | Uniquely identifies the physical object detected in this session. Generate an identifier for each detected object. If your device doesn't distinguish between detected objects, don't include this identifier. | Version 4 UUID string | No
events[*].frameImageUri | URI of the frame that shows the detected object. The frame gives context to the scene where the device detected the object. If the device extracts frames, you can retrieve the frames by using the Alexa.DataController interface. | String | No
events[*].croppedImageUri | URI of the cropped image centered on the detected object. If the device extracts cropped images, you can retrieve the images by using the Alexa.DataController interface. | String | No
events[*].timeOfSample | Time the endpoint detected the object, in ISO 8601 format, YYYY-MM-DDThh:mm:ssZ. | String | Yes
events[*].uncertaintyInMilliseconds | Uncertainty of timeOfSample in milliseconds, that is, the number of milliseconds before or after the endpoint detected the object, for example, due to the transmission delay between the action in front of the camera and the object detection processing software. | Number | No

ObjectDetection response

If your skill proactively sent an ObjectDetection event and Alexa handles the event successfully, your skill receives HTTP status code 202 Success. On error, Alexa sends the appropriate HTTP status code.

The following table shows the HTTP status codes sent in response to the event.

Status | Description
------ | -----------
202 Success | Operation succeeded.
400 Invalid Request | The request is invalid or badly formatted. Verify the event payload and check for missing or invalid fields.
401 Unauthorized | The request didn't include the authorization token, or the token is invalid or expired.
403 Forbidden | The authorization token doesn't have sufficient permissions, or the skill is disabled.
404 Not Found | The skill doesn't exist in the corresponding stage.
413 Request Entity Too Large | The number or size of a parameter exceeds the limit.
429 Too Many Requests | The number of requests per minute is too high. Use exponential back-off and retry the request.
500 Internal Server Error | An error occurred on the server. The skill can retry by using exponential back-off.
503 Service Unavailable | The server is busy or unavailable. The skill can retry by using exponential back-off.
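The retry guidance for 429, 500, and 503 can be sketched as an exponential back-off loop. This is an illustrative sketch only; `post` is a hypothetical callable that submits the payload to the Alexa Event Gateway and returns the HTTP status code.

```python
import random
import time

RETRYABLE_STATUSES = {429, 500, 503}

def send_with_backoff(post, payload, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Send an event, retrying retryable HTTP statuses with
    exponential back-off.

    `post` is a hypothetical callable that submits `payload` to the
    Alexa Event Gateway and returns the HTTP status code.
    """
    status = None
    for attempt in range(max_attempts):
        status = post(payload)
        if status == 202:  # accepted by the gateway
            return status
        if status not in RETRYABLE_STATUSES:
            return status  # client errors other than 429 won't recover
        # Exponential back-off with jitter: ~1 s, 2 s, 4 s, ...
        sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return status
```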

Event update and deletion

As soon as your device recognizes a configured object, you send the ObjectDetection event to Alexa. On receipt of the event, Alexa sends a notification to the customer. As the video stream continues, you can update the event data by using the Alexa.DataController interface to send a DataReport event to Alexa. For example, you might want to aggregate frames of the same physical object. Alexa doesn't send a notification when you update the event data.

You can also use the Alexa.DataController interface to delete data stored on Alexa. For example, you send a DataDeleted event to Alexa when the customer deletes the detection event directly from your camera or your camera app.

Smart-vision data schema example

You send smart-vision event data as an array in the data property of the DataReport event.

The following example shows a DataReport event from a smart camera that includes data for two frames and the associated images of the detected object.

{
    "event": {
        "header": {
            "namespace": "Alexa.DataController",
            "name": "DataReport",
            "instance": "DataController-SmartVisionData",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "correlationToken": "Opaque correlation token that matches the request",
            "payloadVersion": "1.0"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id"
        },
        "payload": {
            "paginationContext": {
                "nextToken": "token"
            },
            "dataSchema": {
                "type": "JSON",
                "schema": "SmartVisionData"
            },
            "data": [{
                    "eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
                    "imageNetClass": "person",
                    "mediaId": "2c3409df-d686-4a52-9bba-d361860bac61",
                    "objectIdentifier": "4a3409df-d686-4a52-9bba-d361860bacbf",
                    "frameIndex": 2,
                    "frameWidthInPixels": 1980,
                    "frameHeightInPixels": 1080,
                    "frameImageUri": "https://example.com/frames/frame1.jpg",
                    "croppedImageUri": "https://example.com/images/image1.jpg"
                },
                {
                    "eventIdentifier": "2b3409df-d686-4a52-9bba-d361860bac61",
                    "imageNetClass": "person",
                    "mediaId": "2c3409df-d686-4a52-9bba-d361860bac62",
                    "objectIdentifier": "4a3409df-d686-4a52-9bba-d361860bacbf",
                    "frameIndex": 5,
                    "frameWidthInPixels": 1980,
                    "frameHeightInPixels": 1080,
                    "frameImageUri": "https://example.com/frames/frame2.jpg",
                    "croppedImageUri": "https://example.com/images/image2.jpg"
                }
            ]
        }
    }
}

Smart-vision data schema definition

The following table shows the JSON data schema defined by the Alexa.SmartVision.ObjectDetectionSensor interface.

Property | Description | Type | Required
-------- | ----------- | ---- | --------
eventIdentifier | Uniquely identifies the event in the event history. You can send updated data for the same camera session and customer. | Version 4 UUID string | Yes
imageNetClass | Class of the detected object. | String | Yes
mediaId | Uniquely identifies the media recording in which the event occurred. | Version 4 UUID string | No
frameIndex | Frame number. | Integer | No
frameWidthInPixels | Width of the frame in pixels. | Integer | No
frameHeightInPixels | Height of the frame in pixels. | Integer | No
objectIdentifier | Uniquely identifies the physical object detected in this session. Generate an identifier for each detected object. If your device doesn't distinguish between detected objects, don't include this identifier. | Version 4 UUID string | No
frameImageUri | URI of the frame that shows the detected object. The frame gives context to the scene where the device detected the object. If the device extracts frames, you can retrieve the frames by using the Alexa.DataController interface. | String | No
croppedImageUri | URI of the cropped image centered on the detected object. If the device extracts cropped images, you can retrieve the images by using the Alexa.DataController interface. | String | No

State reporting

Alexa sends a ReportState directive to request information about the state of an endpoint. When Alexa sends a ReportState directive, you send a StateReport event in response. The response contains the current state of all retrievable properties in the context object. You identify your retrievable properties in your discovery response. For details about state reports, see Understand State and Change Reporting.

StateReport response example

In this example, the smart-vision endpoint supports the person and package object detection classes.

{
    "event": {
        "header": {
            "namespace": "Alexa",
            "name": "StateReport",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "correlationToken": "Opaque correlation token that matches the request",
            "payloadVersion": "3"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id",
            "cookie": {}
        },
        "payload": {}
    },
    "context": {
        "properties": [{
                "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                "name": "objectDetectionClasses",
                "value": [{
                        "imageNetClass": "person"
                    },
                    {
                        "imageNetClass": "package"
                    }
                ],
                "timeOfSample": "2024-07-03T11:20:50.52Z",
                "uncertaintyInMilliseconds": 0
            },
            {
                "namespace": "Alexa.EndpointHealth",
                "name": "connectivity",
                "value": {
                    "value": "OK"
                },
                "timeOfSample": "2024-07-03T10:45:00.52Z",
                "uncertaintyInMilliseconds": 0
            }
        ]
    }
}

Change reporting

You send a ChangeReport event to report changes proactively in the state of an endpoint. You identify the properties that you proactively report in your discovery response. For details about change reports, see Understand State and Change Reporting.

The payload contains the values of the properties that changed; the context object contains the values of other relevant properties.

ChangeReport event example

The following example shows a ChangeReport event after the customer changes their preference for which objects to detect.

{
    "event": {
        "header": {
            "namespace": "Alexa",
            "name": "ChangeReport",
            "messageId": "Unique identifier, preferably a version 4 UUID",
            "payloadVersion": "3"
        },
        "endpoint": {
            "scope": {
                "type": "BearerToken",
                "token": "OAuth2 bearer token"
            },
            "endpointId": "endpoint id"
        },
        "payload": {
            "change": {
                "cause": {
                    "type": "PHYSICAL_INTERACTION"
                },
                "properties": [{
                    "namespace": "Alexa.SmartVision.ObjectDetectionSensor",
                    "name": "objectDetectionClasses",
                    "value": [{
                        "imageNetClass": "person"
                    }],
                    "timeOfSample": "2024-07-03T10:20:50.52Z",
                    "uncertaintyInMilliseconds": 0
                }]
            }
        }
    },
    "context": {
        "properties": [{
            "namespace": "Alexa.EndpointHealth",
            "name": "connectivity",
            "value": {
                "value": "OK"
            },
            "timeOfSample": "2024-07-03T10:19:02.12Z",
            "uncertaintyInMilliseconds": 60000
        }]
    }
}

Last updated: Aug 23, 2024