CN111919250A - Intelligent assistant device for conveying non-verbal cues


Info

Publication number
CN111919250A
CN111919250A
Authority
CN
China
Prior art keywords
person
assistant device
entity
communicate
verbal
Prior art date
Legal status
Pending
Application number
CN201980022427.2A
Other languages
Chinese (zh)
Inventor
S·N·巴蒂克
V·普拉德普
A·N·贝内特
D·G·奥尼尔
A·C·里德
K·J·卢克韦耶科
T·I·柯拉沃利
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date
Filing date
Publication date
Priority claimed from US 15/936,076 (US 11010601 B2)
Application filed by Microsoft Technology Licensing LLC
Publication of CN111919250A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Abstract

The smart assistant device is configured to communicate non-verbal cues. Image data indicating the presence of a person is received from one or more cameras of the device. In response, one or more components of the device are actuated to communicate the presence of the person in a non-verbal manner. Data indicative of contextual information of a person is received from one or more sensors. Using at least the data, one or more contexts of the person are determined and one or more components of the device are actuated to convey the one or more contexts of the person in a non-verbal manner.

Description

Intelligent assistant device for conveying non-verbal cues
Background
Intelligent assistant devices, such as voice command devices or "smart speakers" and their virtual assistants, may receive and process verbal queries and commands to provide intelligent assistance to users. These devices are typically activated by speaking a keyword, and they provide verbal responses to requests via computerized speech broadcast to the user. However, such devices do not provide non-verbal communication in the absence of user commands or requests.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
A method for communicating non-verbal cues comprises receiving, from one or more cameras of a smart assistant device, image data indicating the presence of a person. In response, one or more components of the device are actuated to communicate the presence of the person in a non-verbal manner. Data indicative of contextual information of the person is received from one or more sensors of the device. Using at least the data, one or more contexts of the person are determined. In response, one or more components of the smart assistant device are actuated to communicate the one or more contexts of the person in a non-verbal manner.
Drawings
FIG. 1 illustrates an example environment for an intelligent assistant device in the form of an all-in-one computing device according to one example of this disclosure.
Fig. 2 schematically illustrates one example of the intelligent assistant device of fig. 1, in accordance with an example of the present disclosure.
Fig. 3 schematically illustrates another example of an intelligent assistant device according to an example of the present disclosure.
Fig. 4 schematically illustrates another example of an intelligent assistant device according to an example of the present disclosure.
Fig. 5A and 5B schematically illustrate another example of an intelligent assistant device according to an example of the present disclosure.
Fig. 6 schematically illustrates an example logical architecture for implementing an intelligent assistant system, according to an example of this disclosure.
Fig. 7 schematically illustrates an entity tracking computing system that can determine the identity, location, and/or current state of one or more entities according to an example of the present disclosure.
Fig. 8 schematically illustrates an entity tracking computing system that receives and interprets sensor data over multiple time frames according to an example of the present disclosure.
Fig. 9 schematically illustrates one example of sensor confidence decay over time via an entity tracking computing system, according to an example of the present disclosure.
Fig. 10 schematically illustrates one example of identifying a person's speech using a trained voice recognition engine according to an example of the present disclosure.
Fig. 11A and 11B schematically illustrate a field of detection (FOD) of a sensor of a smart assistant device in an environment according to an example of the present disclosure.
Figs. 12A, 12B, and 12C illustrate a method for communicating non-verbal cues via a smart assistant device configured to respond to natural language input, according to an example of the present disclosure.
Fig. 13A and 13B schematically illustrate detection of an entity in the FOD of a sensor.
Fig. 14 schematically illustrates an array of light sources on an intelligent assistant device, according to an example of the present disclosure.
Fig. 15A-15D schematically illustrate a display device displaying animated shapes in an intelligent assistant device according to examples of the present disclosure.
Fig. 16 schematically illustrates an example of two people being detected by an intelligent assistant device according to an example of the present disclosure.
Fig. 17 is a schematic top view of the room of fig. 16, showing one example of a smart assistant device communicating the locations of two people in a non-verbal manner according to an example of the present disclosure.
Fig. 18 schematically illustrates an example of an all-in-one computing device in which components of an intelligent assistant device implementing the present disclosure are arranged together in a stand-alone device, according to an example of the present disclosure.
Fig. 19 schematically illustrates a computing system according to an example of the present disclosure.
Detailed Description
Intelligent assistant devices have grown increasingly popular as people seek greater convenience in everyday life. As described above, such devices may perform tasks and services for users via voice interaction. However, because these devices provide no non-verbal communication of their understanding of the user, much of the information that non-verbal cues help convey is lost.
Non-verbal communication is often used, consciously and unconsciously, to convey useful understanding when people interact with each other. For example, when Alice walks down the street and approaches another person, Bhavana, non-verbal cues from Bhavana may convey to Alice certain aspects of Bhavana's understanding of Alice. If Bhavana looks at Alice with a curious gaze and expression, she communicates to Alice that she may know Alice or thinks she knows Alice. If Bhavana shows clear pleasure and surprise upon seeing Alice, she communicates that she is excited to see Alice. On the other hand, if Bhavana furrows her brow and turns away from Alice, the information she conveys is quite different. Of course, many other types and forms of non-verbal communication, such as gestures, distance, and the like, may also provide non-verbal cues.
Such non-verbal communication makes person-to-person interaction richer and more informative. Accordingly, the present disclosure relates to intelligent assistant devices and methods for communicating non-verbal information via such devices. The methods and techniques discussed herein are primarily described from the perspective of a stand-alone, all-in-one smart assistant device that is configured to respond to natural language input, for example by answering questions or performing actions. The intelligent assistant device utilizes an entity tracking computing system. In some examples, tracking of entities in the environment may be performed using only sensor input from the smart assistant device. In other examples, tracking of entities may be performed using various smart assistant computing devices and/or other sensors, security devices, home automation devices, and so forth.
Fig. 1 illustrates a person 2 entering a living room 4 that contains one example of a smart assistant device 10 in the form of an all-in-one computing device. As described in more detail below, in some examples the smart assistant device 10 may be configured to receive and process natural language input. A user may use the intelligent assistant device for a variety of functions. For example, the user may provide natural language input to ask the smart assistant device to perform various tasks, such as providing information, changing the state of a device, sending a message, completing a purchase, and so forth.
The user may query the system for information about a wide range of topics, such as weather, personal calendar events, movie show times, and the like. In some examples, the smart assistant device 10 may also be configured to control elements in the living room 4, such as the television 6, the speakers 8 of the music system, or the motorized shades 16. The intelligent assistant device 10 may also be used to receive and store messages and/or reminders for delivery at an appropriate future time. Using data received from the sensors, the smart assistant device may track and/or communicate with one or more users or other entities. Additionally and as described in more detail below, the smart assistant device 10 may communicate non-verbal information to the user via one or more light sources and/or other components of the device.
In some examples, the smart assistant device 10 may be operatively connected with one or more other computing devices using a wired connection, or may employ a wireless connection via Wi-Fi, Bluetooth, or any other suitable wireless communication protocol. For example, the intelligent assistant device 10 may be communicatively coupled to one or more other computing devices via a network. The network may take the form of a Local Area Network (LAN), Wide Area Network (WAN), wired network, wireless network, personal area network, or a combination thereof, and may include the Internet. Additional details regarding the components and computing aspects of the intelligent assistant device 10 are described in more detail below with reference to fig. 19.
Although the smart assistant device may be operatively connected to other devices as described above, in some examples the smart assistant device may perform the methods and techniques described herein entirely locally via one or more processors on board the device. Advantageously, in these examples, any latency, bandwidth limitations, and other drawbacks associated with exchanging data with a remote server or other device are eliminated. In this way, more real-time interaction and non-verbal communication with the user is possible.
Fig. 2 schematically illustrates one example implementation of an intelligent assistant device according to the present disclosure. In this example, the smart assistant device 10 is an all-in-one computing device that includes various sensors, output devices, and other components. The device includes an intelligent assistant system 20 according to an example of the present disclosure, the intelligent assistant system 20 being capable of recognizing and responding to natural language input. Additional description and details of the components and functions performed by the intelligent assistant system 20 are provided below.
In the example of fig. 2, the smart assistant device 10 includes a cylindrical housing 80 that houses a microphone 81, a camera 82, a speaker 83, and a plurality of light sources 84 positioned around at least a portion of the housing. In this example, the light sources 84 comprise LEDs. In other examples, one or more of the light sources 84 may include one or more display devices or any other suitable type of light source. Additionally, and as described in more detail below, one or more of the light sources 84 may be illuminated and modulated to communicate information to the user in a non-verbal manner.
In different examples, the microphone 81 may include multiple microphones (such as a microphone array) arranged at various locations on the device. In this example, three cameras 82A, 82B, and 82C are shown, and a fourth camera (not visible) is located on the back side of the housing. In this example, the fields of view of the four cameras 82 overlap to enable the smart assistant device 10 to receive image data from the entire 360 degrees around the device. In other examples, fewer or more cameras, and configurations providing less than a 360 degree field of detection (FOD) may be used. Additional details regarding various types of cameras, microphones, and other sensors that may be used with the smart assistant device 10 are provided below.
In other examples, one or more light sources in the form of display devices may be used in addition to or in place of LEDs. For example and referring to fig. 3, another implementation of a smart assistant device 150 is schematically illustrated, the smart assistant device 150 including a display 152 around the perimeter of the housing 80. In this example, as described in the examples below, the display 152 may be used to display vector graphics 154 (such as various static or animated shapes, patterns, etc.) to communicate with the user in a non-verbal manner.
In other examples, in addition to or in lieu of using LEDs and/or one or more displays to provide non-verbal communication, the smart assistant device may utilize one or more projectors to project non-verbal cues onto a surface. For example, and referring to fig. 4, another implementation of a smart assistant device 158 is schematically illustrated; the smart assistant device 158 includes a projector 180 that may project light onto a surface. In this example, the projector 180 projects an image of a circle 182 onto a surface 184 of a table on which the device rests. As described in more detail below, such projected light may form any number of static or animated shapes, patterns, icons, and the like that may be used to convey non-verbal cues to a user.
In other examples, in addition to or instead of using LEDs, one or more displays, and/or one or more projectors to provide non-verbal communication, the smart assistant device may actuate one or more other components to communicate information to the user in a non-verbal manner. For example, and referring to figs. 5A and 5B, another implementation of a smart assistant device 186 is schematically illustrated; the smart assistant device 186 includes a movable top 188 that includes a camera 189. In this example, and as described in more detail below, the movable top 188 may be actuated to convey a non-verbal cue to a user. In some examples, the smart assistant device 186 may track the location of a person, and the movable top 188 may be moved around the perimeter of the device to follow the person's location and to aim (foveate) the camera 189 at the person.
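As a rough illustration of how such tracking could drive the movable top, the sketch below converts a person's tracked position in the device's horizontal plane into a bearing angle for the actuator. The coordinate convention, function name, and units are assumptions introduced for illustration; the disclosure does not specify this computation.

```python
import math

# Hypothetical helper: given a person's position relative to the device
# (x toward the device's front, y to its left, in meters), return the angle
# around the device's vertical axis at which to point the movable top/camera.
def bearing_degrees(person_x: float, person_y: float) -> float:
    return math.degrees(math.atan2(person_y, person_x)) % 360.0

print(bearing_degrees(1.0, 1.0))   # 45.0 degrees: person ahead and to the left
print(bearing_degrees(-2.0, 0.0))  # 180.0 degrees: person behind the device
```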
It will be understood that the example smart assistant devices 10, 150, 158, and 186 described and illustrated in fig. 2-5 are provided for illustrative purposes only, and that many other form factors, shapes, configurations, and other variations of such devices may be used and are within the scope of the present disclosure.
Referring now to fig. 6, the following is a description of an example logical architecture for implementing the intelligent assistant system 20, the intelligent assistant system 20 being capable of recognizing and responding to natural language inputs, according to an example of the present disclosure. As described in more detail below, in various examples, system 20 may be implemented in a single, unitary computing device (such as smart assistant device 10), across two or more devices, in a cloud-supported network, and in combinations of the above.
In this example, the intelligent assistant system 20 includes at least one sensor 22, an entity tracking computing system 100, a voice listener 30, a parser 40, an intent processor 50, a commitment engine 60, and at least one output device 70. In some examples, the sensors 22 may include one or more microphones 24, a visible light camera 26, an infrared camera 27, and a connectivity device 28 (such as a Wi-Fi or Bluetooth module). In some examples, sensor(s) 22 may include a stereo and/or depth camera, a head tracker, an eye tracker, an accelerometer, a gyroscope, a gaze detection device, an electric field sensing component, a GPS or other position tracking device, a temperature sensor, a device status sensor, and/or any other suitable sensor.
The entity tracking computing system 100 is configured to detect entities and their activities, including people, animals, and other biological and non-biological objects. The entity tracking computing system 100 includes an entity identifier 104 that is configured to identify people, individual users, and/or non-biological objects. The voice listener 30 receives audio data and translates utterances into text using speech recognition functionality. The voice listener 30 may also assign confidence value(s) to the translated text, and may perform speaker recognition to determine the identity of the person who is speaking, as well as assign probabilities to the accuracy of such identification. The parser 40 analyzes the text and confidence values received from the voice listener 30 to derive user intent and generate a corresponding machine-executable language.
The intent processor 50 receives the machine-executable language representing the user's intent from the parser 40, and resolves missing and ambiguous information to generate commitments. The commitment engine 60 stores commitments from the intent processor 50. At a contextually appropriate time, the commitment engine may deliver one or more messages and/or perform one or more actions associated with one or more commitments. The commitment engine 60 may store messages in a message queue 62 or cause one or more output devices 70 to generate output. The output devices 70 may include one or more of the following: speaker(s) 72, video display(s) 74, indicator light(s) 76, haptic device(s) 78, and/or other suitable output devices. In other examples, the output devices 70 may include one or more other devices or systems (e.g., home lighting, thermostats, media programs, door locks, etc.) that may be controlled via actions performed by the commitment engine 60.
In different examples, the voice listener 30, the parser 40, the intent processor 50, the commitment engine 60, and/or the entity tracking computing system 100 may be embodied in software that is stored in memory and executed by one or more processors of a computing device. In some implementations, specifically programmed logic processors may be used to increase the computational efficiency and/or effectiveness of the intelligent assistant device. Additional details regarding the components and computing aspects of computing devices that may store and execute these modules are described in more detail below with reference to FIG. 19.
In some examples, the voice listener 30 and/or the commitment engine 60 may receive contextual information from the entity tracking computing system 100, the contextual information including an associated confidence value. As described in more detail below, the entity tracking computing system 100 may determine the identity, location, and/or current status of one or more entities within range of one or more sensors, and may output such information to one or more other modules, such as the voice listener 30, the commitment engine 60, and so forth. In some examples, the entity tracking computing system 100 may interpret and evaluate sensor data received from one or more sensors and may output contextual information based on the sensor data. The contextual information may include the entity tracking computing system's guesses/predictions of the identity, location, and/or state of one or more detected entities based on the received sensor data. In some examples, the guesses/predictions may additionally include a confidence value that defines a statistical likelihood that the information is accurate.
Fig. 7 schematically illustrates an example entity tracking computing system 100, which in some examples may include components of the intelligent assistant system 20. The entity tracking computing system 100 may be used to determine the identity, location, and/or current status of one or more entities within range of one or more sensors. The entity tracking computing system 100 may output such information to one or more other modules of the intelligent assistant system 20, such as the commitment engine 60, the voice listener 30, and so forth.
The word "entity" as used in the context of the entity tracking computing system 100 may refer to humans, animals, or other biological and non-biological objects. For example, the entity tracking computing system may be configured to identify furniture, appliances, autonomous robots, structures, landscape features, vehicles, and/or any other physical objects and determine the location/position and current state of those physical objects. In some cases, the entity tracking computing system 100 may be configured to identify only people, and not other creatures or non-creatures. In such cases, the word "entity" may be synonymous with the words "person" or "human".
The entity tracking computing system 100 receives sensor data from one or more sensors 102 (such as sensor A 102A, sensor B 102B, and sensor C 102C), but it will be understood that the entity tracking computing system may be used with any number and variety of suitable sensors. By way of example, sensors that may be used with an entity tracking computing system may include cameras (e.g., visible light cameras, UV cameras, IR cameras, depth cameras, thermal cameras), microphones, directional microphone arrays, pressure sensors, thermometers, motion detectors, proximity sensors, accelerometers, Global Positioning Satellite (GPS) receivers, magnetometers, radar systems, lidar systems, environmental monitoring devices (e.g., smoke detectors, carbon monoxide detectors), barometers, health monitoring devices (e.g., electrocardiographs, sphygmomanometers, electroencephalographs), automotive sensors (e.g., speedometers, odometers, tachometers, fuel sensors), and/or any other sensor or device that collects and/or stores information relating to the identity, location, and/or current status of one or more persons or other entities. In some examples, such as in the smart assistant device 10, the entity tracking computing system 100 may occupy a common device housing with one or more of the plurality of sensors 102. In other examples, the entity tracking computing system 100 and its associated sensors may be distributed across multiple devices configured to communicate via one or more network communication interfaces (e.g., Wi-Fi adapters, Bluetooth interfaces).
As shown in the example of fig. 7, the entity tracking computing system 100 may include an entity identifier 104, a person identifier 105, a location identifier 106, and a status identifier 108. In some examples, the person identifier 105 may be a dedicated component of the entity identifier 104 that is specifically optimized to recognize humans relative to other biological and non-biological entities. In other cases, the person identifier 105 may operate separately from the entity identifier 104, or the entity tracking computing system 100 may not include a dedicated person identifier.
Any or all of the functions associated with the entity identifier, person identifier, location identifier, and status identifier may be performed by the individual sensors 102A-102C, depending on the particular implementation. Although the present description generally describes the entity tracking computing system 100 as receiving data from sensors, this does not require that the entity identifier 104, as well as other modules of the entity tracking computing system, be implemented on a single computing device that is separate and distinct from the plurality of sensors associated with the entity tracking computing system. Rather, the functionality of the entity tracking computing system 100 may be distributed among multiple sensors or other suitable devices. For example, rather than sending raw sensor data to the entity tracking computing system, individual sensors may be configured to attempt to identify the entities they detect and report that identification to the entity tracking computing system 100 and/or other modules of the intelligent assistant system 20. Furthermore, to simplify the following description, the term "sensor" is sometimes used to describe not only a physical measurement device (e.g., a microphone or camera), but also various logical processors configured and/or programmed to interpret signals/data from the physical measurement device. For example, "microphone" may be used to refer to a device that converts acoustic energy to an electrical signal, an analog-to-digital converter that converts the electrical signal to digital data, an on-board application specific integrated circuit that pre-processes the digital data, and downstream modules described herein (e.g., entity tracking computing system 100, entity identifier 104, voice listener 30, or parser 40). As such, references to a generic "sensor" or a specific sensor (e.g., "microphone" or "camera") should not be construed to refer only to a physical measurement device, but rather to a collaboration module/engine that may be distributed across one or more computers.
Each of the entity identifier 104, the person identifier 105, the location identifier 106, and the status identifier 108 is configured to interpret and evaluate sensor data received from the plurality of sensors 102 and output context information 110 based on the sensor data. The contextual information 110 may include guesses/predictions by the entity tracking computing system of the identity, location, and/or state of one or more detected entities based on the received sensor data. As will be described in more detail below, each of the entity identifier 104, the person identifier 105, the location identifier 106, and the status identifier 108 may output their predictions/identifications and confidence values.
The entity identifier 104, person identifier 105, location identifier 106, status identifier 108, and other processing modules described herein may utilize one or more machine learning techniques. Non-limiting examples of such machine learning techniques may include feed-forward networks, Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), convolutional neural networks, Support Vector Machines (SVMs), Generative Adversarial Networks (GANs), variational autoencoders, Q-learning, and decision trees. The various identifiers, engines, and other processing blocks described herein may be trained via supervised and/or unsupervised learning, using these or any other suitable machine learning techniques, to make the described evaluations, decisions, identifications, and so forth.
The entity identifier 104 may output an entity identity 112 of a detected entity, and such an entity identity may have any suitable degree of specificity. In other words, based on the received sensor data, the entity tracking computing system 100 may predict the identity of a given entity and output information such as the entity identity 112. For example, the entity identifier 104 may report that a particular entity is a person, a piece of furniture, a dog, or the like. Additionally or alternatively, the entity identifier 104 may report that a particular entity is an oven of a particular model; a pet dog of a particular name and breed; or an owner or known user of the intelligent assistant device 10, where the owner/known user has a particular name and profile. In different examples, the entity may be identified in any of a variety of suitable ways, potentially involving facial recognition, voice recognition, detecting the presence of a portable computing device associated with a known entity, evaluating a person's height, weight, body shape, gait, hair style, and/or shoulder shape, and the like.
In some examples, the entity identifier 104 may determine two or more levels of identity of a person. Such identity levels may correspond to one or more identity certainty thresholds represented by confidence values. For example, such identity levels may include an initial identity, which represents an initial confidence value that the person is a previously identified person, and a verified identity, which represents a verified confidence value, greater than the initial confidence value, that the person is the previously identified person. For example, an initial identity of the person may be determined where the associated confidence value maps to at least a 99.0000% likelihood that the person is the previously identified person. A verified identity of the person may be determined where the associated confidence value maps to at least a 99.9990% likelihood that the person is the previously identified person. A verified identity may be required, for example, to authenticate a person at an enterprise security level for access to particularly sensitive data, such as bank accounts, confidential company information, health-related information, and the like. In some examples, the degree of specificity with which the entity identifier 104 identifies/classifies detected entities may depend on one or more of user preferences and sensor limitations. In some cases, the entity identity output by the entity identifier may simply be a generic identifier that provides no information about the nature of the tracked entity, but rather is used to distinguish one entity from another.
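As a minimal sketch of how such thresholded identity levels might be applied, the following fragment maps a person-identification confidence value to an identity level. The threshold constants echo the example likelihoods above (99.0000% and 99.9990%); the function and constant names are assumptions, not part of the disclosed system.

```python
# Illustrative thresholds mirroring the example likelihoods in the description.
INITIAL_ID_THRESHOLD = 0.990000
VERIFIED_ID_THRESHOLD = 0.999990

def identity_level(confidence: float) -> str:
    """Map a person-identification confidence value to an identity level."""
    if confidence >= VERIFIED_ID_THRESHOLD:
        return "verified"   # e.g., sufficient for enterprise-level authentication
    if confidence >= INITIAL_ID_THRESHOLD:
        return "initial"    # likely a previously identified person, lower certainty
    return "unknown"

print(identity_level(0.9999995))  # verified
print(identity_level(0.995))      # initial
```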
When applied to a person, the entity tracking computing system 100 may, in some cases, collect information about a person who cannot be identified by name. For example, the entity identifier 104 may record images of the person's face and associate those images with recorded audio of the person's voice. If the person subsequently speaks to or otherwise addresses the intelligent assistant system 20, the entity tracking computing system 100 will then have at least some information about whom the intelligent assistant device is interacting with. In some examples, the intelligent assistant system 20 may also prompt the person to state their name in order to more easily identify the person in the future.
In some examples, the intelligent assistant device 10 may utilize the identity of a person to customize the user interface for that person. In one example, a user with limited visual capabilities may be identified. In this example and based on the identification, the display of the smart assistant device 10 (or other device with which the user interacts) may be modified to display larger text, or to provide a voice-only interface.
The location identifier 106 may be configured to output an entity location 114 of a detected entity. In other words, the location identifier 106 may predict the current location of a given entity based on collected sensor data and output information such as the entity location 114. As with the entity identity 112, the entity location 114 may have any suitable level of detail, and this level of detail may vary with user preferences and/or sensor limitations. For example, the location identifier 106 may report that a detected entity has a two-dimensional location defined on a plane such as a floor or a wall. In some examples, the entity location 114 may be determined relative to the smart assistant device, such as an angular direction or a distance from the device. Additionally or alternatively, the reported entity location 114 may include a three-dimensional location of the detected entity within a real-world three-dimensional environment. In some examples, the entity location 114 may include a GPS location, a position within an environment-specific coordinate system, and the like.
The reported entity location 114 for the detected entity may correspond to the geometric center of the entity, a particular portion of the entity that is classified as important (e.g., a human head), a series of bounds that define the boundaries of the entity in three-dimensional space, and so forth. The location identifier 106 may further calculate one or more additional parameters describing the location and/or orientation of the detected entity, such as pitch, roll, and/or yaw parameters. In other words, the reported position of the detected entity may have any number of degrees of freedom and may include any number of coordinates that define the position of the entity in the environment. In some examples, the entity location 114 of the detected entity may be reported even if the entity tracking computing system 100 is unable to identify the entity and/or determine the current state of the entity.
The state identifier 108 may be configured to output an entity state 116 of a detected entity. In other words, the entity tracking computing system 100 may be configured to predict the current state of a given entity based on received sensor data and output information such as the entity state 116. An "entity state" may refer to virtually any measurable or classifiable attribute, activity, or behavior of a given entity. For example, when applied to a person, the entity state of the person may indicate the person's presence, the person's height, the person's posture (e.g., standing, sitting, lying down), the speed at which the person is walking/running, the person's current activity (e.g., sleeping, watching television, working, playing a game, swimming, talking on the phone), the person's current mood (e.g., as assessed from the person's facial expression), the person's biological/physiological parameters (e.g., heart rate, respiration rate, blood oxygen saturation, body temperature, neural activity), whether the person has any current or upcoming calendar events/appointments, and the like. When applied to other biological or non-biological objects, an "entity state" may refer to additional/alternative attributes or behaviors, such as the current temperature of an oven or a kitchen sink, whether a device (e.g., a television, a lamp, a microwave oven) is turned on, whether a door is open, and so forth.
In some examples, the state identifier 108 may use sensor data to calculate various biological/physiological parameters of a person. This may be done in various suitable ways. For example, the entity tracking computing system 100 may be configured to interface with an optical heart rate sensor, a pulse oximeter, a sphygmomanometer, an electrocardiograph, and the like. Additionally or alternatively, the state identifier 108 may be configured to interpret data from one or more cameras and/or other sensors in the environment, and process the data to calculate a person's heart rate, respiration rate, blood oxygen saturation, and so forth. For example, the state identifier 108 may be configured to utilize Eulerian video magnification and/or similar techniques to magnify small movements or changes captured by the cameras, thereby allowing the state identifier to visualize blood flow through the person's circulatory system and calculate associated physiological parameters. Such information may be used to determine, for example, when the person falls asleep, is exercising, is in distress, is experiencing a health problem, and so forth.
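As a highly simplified illustration of deriving one such physiological parameter from camera data, the sketch below estimates a heart rate from the dominant frequency of per-frame mean intensities of a skin region. This is not Eulerian magnification itself; the approach, band limits, and names are assumptions made only to illustrate the idea.

```python
import numpy as np

def estimate_heart_rate_bpm(region_means: np.ndarray, fps: float) -> float:
    """Estimate heart rate from per-frame mean intensity of a skin region."""
    signal = region_means - region_means.mean()           # remove the DC offset
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(signal))
    band = (freqs >= 0.8) & (freqs <= 3.0)                # roughly 48-180 beats/min
    dominant_hz = freqs[band][np.argmax(spectrum[band])]
    return dominant_hz * 60.0

# Synthetic example: a faint 1.2 Hz (72 beats/min) pulse sampled at 30 frames/s.
fps = 30.0
t = np.arange(0, 10, 1.0 / fps)
fake_means = 0.01 * np.sin(2 * np.pi * 1.2 * t)
print(round(estimate_heart_rate_bpm(fake_means, fps)))    # ~72
```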
Upon determining one or more of the entity identity 112, entity location 114, and entity status 116, such information may be sent as context information 110 to any of a variety of external modules or devices, where it may be used in a variety of ways. For example, and as described in more detail below, the context information 110 may be used to determine one or more contexts of a human user and to actuate one or more components of the intelligent assistant device to communicate the one or more contexts to the user in a non-verbal manner. Additionally, the context information 110 may be used by the commitment engine 60 to manage commitments and associated messages and notifications. In some examples, the context information 110 may be used by the commitment engine 60 to determine whether a particular message, notification, or commitment should be executed and/or presented to the user. Similarly, the context information 110 may be used by the voice listener 30 when interpreting human speech or activating functions in response to a keyword trigger.
As described above, in some examples, the entity tracking computing system 100 may be implemented in a single computing device (such as the smart assistant device 10). In other examples, one or more functions of the entity tracking computing system 100 may be distributed across multiple computing devices working in concert. For example, one or more of the entity identifier 104, the person identifier 105, the location identifier 106, and the status identifier 108 may be implemented on different computing devices, while still collectively comprising an entity tracking computing system configured to perform the functions described herein. As described above, any or all of the functions of the entity tracking computing system may be performed by individual sensors 102. Further, in some examples, the entity tracking computing system 100 may omit one or more of the entity identifier 104, the person identifier 105, the location identifier 106, and the status identifier 108, and/or include one or more additional components not described herein, while still providing the context information 110. Additional details regarding components and computing aspects that may be used to implement the entity tracking computing system 100 are described in more detail below with respect to fig. 19.
Each of the entity identity 112, entity location 114, and entity status 116 may take any suitable form. For example, each of the entity identity 112, location 114, and status 116 may take the form of a discrete data packet that includes a series of values and/or tags that describe information collected by the entity tracking computing system. Each of the entity identity 112, location 114, and status 116 may additionally include a confidence value that defines a statistical likelihood that the information is accurate. For example, if the entity identifier 104 receives sensor data that strongly indicates that a particular entity is a human male named "John Smith," the entity identity 112 may include this information along with a corresponding relatively high confidence value (such as a 90% confidence). If the sensor data is more ambiguous, the confidence value included in the entity identity 112 may be relatively low (such as 62%). In some examples, separate predictions may be assigned separate confidence values. For example, the entity identity 112 may indicate that a particular entity is a human male with a 95% confidence level, and that the entity is john smith with a 70% confidence level. Such confidence values (or probabilities) may be used by a cost function to generate cost calculations for providing a message or other notification to a user and/or performing action(s).
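To make the notion of discrete data packets with confidence values concrete, here is one plausible representation. The field names and structure are illustrative assumptions; the disclosure does not prescribe a particular data format.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class EntityIdentity:
    label: str          # e.g., "human male" or "John Smith"
    confidence: float   # statistical likelihood that the label is accurate

@dataclass
class EntityLocation:
    position: Tuple[float, float, float]   # e.g., a 3D position in the environment
    confidence: float

@dataclass
class ContextInfo:
    identity: Optional[EntityIdentity] = None
    location: Optional[EntityLocation] = None
    state: dict = field(default_factory=dict)   # e.g., {"activity": "watching television"}

packet = ContextInfo(
    identity=EntityIdentity("John Smith", 0.90),
    location=EntityLocation((1.2, 0.0, 3.4), 0.85),
    state={"posture": "sitting"},
)
print(packet.identity.confidence)   # 0.9
```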
In some implementations, the entity tracking computing system 100 may be configured to combine or fuse data from multiple sensors in order to determine the context information 110 and corresponding contexts, and to output more accurate predictions. As an example, a camera may locate a person in a particular room. Based on the camera data, the entity tracking computing system 100 may identify the person with a 70% confidence value. However, the entity tracking computing system 100 may additionally receive recorded speech from a microphone. Based solely on the recorded speech, the entity tracking computing system 100 may identify the person with a 60% confidence value. By combining data from the camera with data from the microphone, the entity tracking computing system 100 can identify the person with a higher confidence value than would be possible using data from either sensor alone. For example, the entity tracking computing system may determine that recorded speech received from the microphone corresponds to lip movements of a person visible to the camera when the speech is received, and thereby conclude with a relatively high confidence, such as 92%, that the person visible to the camera is the person speaking. In this manner, the entity tracking computing system 100 may combine two or more predicted confidence values to identify a person with a combined, higher confidence value.
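The disclosure does not specify how confidence values are combined; one simple, illustrative option is to treat the sensors as independent evidence and use a noisy-OR combination, sketched below under that assumption.

```python
def fuse_confidences(p_camera: float, p_microphone: float) -> float:
    """Noisy-OR combination of two independent identification confidences."""
    return 1.0 - (1.0 - p_camera) * (1.0 - p_microphone)

print(fuse_confidences(0.70, 0.60))   # 0.88, higher than either sensor alone
```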
In some examples, data received from various sensors may be weighted differently depending on the reliability of the sensor data. This may be particularly relevant where multiple sensors are outputting data that appears to be inconsistent. In some examples, the reliability of the sensor data may be based at least in part on the type of data generated by the sensor. For example, in some implementations, the reliability of video data may be given a higher weight than the reliability of audio data, as the presence of an entity on a camera may be a more reliable indicator of its identity, location and/or status than the recorded sound assumed to originate from the entity. It should be appreciated that the reliability of the sensor data is a different factor than the confidence value associated with the prediction accuracy of the data instance. For example, several instances of video data may have different confidence values based on different contextual factors present at each instance. However, in general, each of these instances of video data may be associated with a single reliability value for the video data.
In one example, data from the camera may suggest a particular person is in the kitchen with a 70% confidence value, such as via facial recognition analysis. Data from the microphones may suggest that the same person is in a nearby hallway with a 75% confidence value, such as via voice recognition analysis. Even if the instance of the microphone data has a higher confidence value, the entity tracking computing system 100 may output a prediction of the person in the kitchen based on a higher reliability of the camera data (as compared to a lower reliability of the microphone data). In this manner and in some examples, different reliability values for different sensor data may be used with the confidence values to reconcile conflicting sensor data and determine the identity, location, and/or status of the entity.
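One way to express the reconciliation described above is to weight each sensor's per-instance confidence by a per-sensor reliability value and keep the highest-scoring hypothesis, as sketched below. The scoring rule and the reliability numbers are assumptions for illustration only.

```python
# Two conflicting location hypotheses, mirroring the camera/microphone example above.
hypotheses = [
    {"sensor": "camera", "location": "kitchen", "confidence": 0.70, "reliability": 0.9},
    {"sensor": "microphone", "location": "hallway", "confidence": 0.75, "reliability": 0.5},
]

best = max(hypotheses, key=lambda h: h["confidence"] * h["reliability"])
print(best["location"])   # kitchen: the camera wins despite its lower confidence value
```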
Additionally or alternatively, more weight may be given to sensors with greater accuracy, more processing power, or other greater capabilities. For example, professional-level cameras can have significantly improved lenses, image sensors, and digital image processing capabilities compared to basic web cameras (webcams) in notebook computers. Thus, a higher weight/reliability value may be given to video data received from professional-grade cameras as compared to web cameras, since such data may be more accurate.
Referring now to fig. 8, in some examples, individual sensors used with the entity tracking computing system 100 may output data at different frequencies than other sensors used with the entity tracking computing system. Similarly, sensors used with the entity tracking computing system 100 may output data at a different frequency than the frequency at which the entity tracking computing system evaluates the data and outputs context information. In the example of fig. 8, the entity tracking computing system 100 may receive and interpret sensor data over multiple time frames 200A, 200B, and 200C. A single time frame may represent any suitable length of time, such as 1/30th or 1/60th of a second.
In this example, during time frame 200A, the entity tracking computing system 100 receives a set of sensor data 202 that includes sensor A data 204A, sensor B data 204B, and sensor C data 204C. Such sensor data is interpreted and transformed by the entity tracking computing system 100 into contextual information 206, which may be used to determine the identity, location, and/or status of one or more detected entities as described above. During time frame 200B, the entity tracking computing system 100 receives sensor data 208, which includes sensor A data 210A and sensor B data 210B. The entity tracking computing system 100 does not receive data from sensor C during time frame 200B because sensor C outputs data at a different frequency than sensors A and B. Similarly, the entity tracking computing system 100 does not output context information during time frame 200B because the entity tracking computing system outputs context information at a different frequency than sensors A and B.
During time frame 200C, the entity tracking computing system 100 receives sensor data 212, which includes sensor A data 214A, sensor B data 214B, sensor C data 214C, and sensor D data 214D. The entity tracking computing system 100 also outputs context information 216 during time frame 200C, and this context information 216 may be based on any or all of the sensor data received by the entity tracking computing system since context information was last output in time frame 200A. In other words, context information 216 may be based at least in part on sensor data 208 as well as sensor data 212. In some examples, context information 216 may also be based at least in part on sensor data 202 in addition to sensor data 208 and sensor data 212.
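A small sketch of how asynchronous sensor outputs might be buffered between context-information outputs follows; the class and method names are assumptions, and a real implementation would also track timestamps and sensor reliability.

```python
from collections import defaultdict

class SensorBuffer:
    """Collects whatever each sensor has produced since the last context output."""
    def __init__(self):
        self.pending = defaultdict(list)

    def add(self, sensor_id: str, data) -> None:
        self.pending[sensor_id].append(data)

    def flush(self) -> dict:
        """Return all buffered data and reset for the next output cycle."""
        batch, self.pending = dict(self.pending), defaultdict(list)
        return batch

buf = SensorBuffer()
buf.add("sensor_A", "data_210A")
buf.add("sensor_A", "data_214A")
buf.add("sensor_C", "data_214C")
print(buf.flush())   # everything received since context information was last output
```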
As shown in fig. 8, after the entity tracking computing system 100 receives data from a particular sensor, multiple time frames may pass before the entity tracking computing system receives more data from the same sensor. During these multiple time frames, the entity tracking computing system 100 may output contextual information. Similarly, the usefulness of data received from a particular sensor may vary over time. For example, in a first time frame the entity tracking computing system 100 may receive, via a microphone, audio data of a particular person speaking, and thus identify the person's entity location 114 with a relatively high confidence value. In a subsequent time frame, the person may remain at the identified location but may have stopped speaking since the first time frame. In this case, the absence of useful data from the microphone is not a reliable indicator that the person is absent. Similar issues can arise with other types of sensors. For example, a camera may lose track of a person if the person's face becomes occluded, or if the person is blocked by an obstruction such as another person or a moving object. In this case, although the current camera data may not indicate the presence of the person, previous instances of camera data may suggest that the person is still located at the previously identified location. In general, while sensor data may reliably indicate the presence of an entity, such data may be less reliable when it suggests that an entity is absent.
Thus, the entity tracking computing system 100 may utilize one or more confidence attenuation functions, which in different examples may be defined by the entity tracking computing system and/or by the sensors themselves. A confidence attenuation function may be applied to sensor data to reduce the entity tracking computing system's confidence in data from a particular sensor as time passes since the sensor last positively detected an entity. As an example, after a sensor detects an entity at a particular location, the entity tracking computing system 100 may report context information 110 indicating, with relatively high confidence, that the entity is located at that location. If, after one or more time frames, the sensor no longer detects the entity at that location, then unless conflicting evidence is subsequently collected, the entity tracking computing system 100 may still report that the entity is located at the location, but with a lower confidence value. As time continues to pass since the sensor last detected the entity at the location, the likelihood that the entity remains at the location progressively decreases. Accordingly, the entity tracking computing system 100 may utilize a confidence decay function to gradually reduce the confidence value of its reported contextual information 110, eventually reaching 0% confidence if no additional sensors detect the entity.
In some cases, different confidence attenuation functions may be used for different sensors and sensor types. The selection of a particular decay function may depend, at least in part, on the particular properties of the sensor. For example, the confidence value associated with data from a camera may decay faster than the confidence value associated with data from a microphone because the absence of an entity in a video frame is a more reliable indicator that the entity is not present than the silence recorded by the microphone.
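A minimal sketch of such a decay follows. The exponential form and the half-life values are assumptions; the description only requires that confidence decreases with time since the last positive detection, and, per the paragraph above, different sensor types could simply be given different half-lives.

```python
import math

def decayed_confidence(initial: float, seconds_since_detection: float,
                       half_life_s: float) -> float:
    """Reduce a reported confidence value as time passes since the last detection."""
    return initial * math.exp(-math.log(2) * seconds_since_detection / half_life_s)

# A camera-based confidence might decay faster than a microphone-based one.
for t in (0, 10, 30):
    print(t,
          round(decayed_confidence(0.90, t, half_life_s=10.0), 2),   # camera
          round(decayed_confidence(0.90, t, half_life_s=30.0), 2))   # microphone
```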
One example of sensor confidence attenuation is schematically illustrated in fig. 9, which shows the entity tracking computing system 100 receiving sensor data during three different time frames 300A, 300B, and 300C. During time frame 300A, the entity tracking computing system 100 receives camera data 302 in which the entity is visible in the frame. Based on this data, the entity tracking computing system 100 reports the entity location 304 with a 90% confidence value. In time frame 300B, the entity tracking computing system 100 receives camera data 306 in which the entity is no longer visible in the frame. However, the entity may not have moved; it may simply be occluded or otherwise undetectable by the camera. Accordingly, the entity tracking computing system 100 reports the same entity location 304, but with a lower confidence value of 80%.
Finally, in time frame 300C, the entity tracking computing system 100 receives camera data 310 indicating that the entity is still not visible in the frame. As time passes, it becomes less and less likely that the entity is still in the same location. Accordingly, the entity tracking computing system 100 reports the same entity location 304 with a still lower confidence value of 60%.
In some examples, the variable reliability of sensor data may be at least partially addressed by utilizing data filtering techniques. In some examples, a Kalman filter may be used to filter the sensor data. A Kalman filter is a mathematical function that may combine multiple uncertain measurements and output a prediction with a higher confidence than would be possible using any of the individual measurements. Each measurement input to the Kalman filter is given a weight based on the measurement's perceived reliability. Kalman filters operate in a two-step process that includes a prediction step and an update step. During the prediction step, the filter outputs a prediction based on recent weighted measurements. During the update step, the filter compares its prediction to an actual observed value or state and dynamically adjusts the weighting applied to each measurement so as to output more accurate predictions.
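For illustration, a one-dimensional version of the predict/update cycle just described is sketched below. The constant-position motion model and the noise parameters are assumptions chosen for brevity, not values from the disclosure.

```python
class Kalman1D:
    """Minimal 1D Kalman filter: track one coordinate of an entity's position."""
    def __init__(self, x0: float, p0: float, process_var: float, meas_var: float):
        self.x = x0           # state estimate
        self.p = p0           # estimate uncertainty
        self.q = process_var  # process noise: how much the state may drift per step
        self.r = meas_var     # measurement noise: how trustworthy each measurement is

    def predict(self) -> float:
        self.p += self.q      # uncertainty grows between measurements
        return self.x

    def update(self, z: float) -> float:
        k = self.p / (self.p + self.r)   # Kalman gain: weight given to the new measurement
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x

kf = Kalman1D(x0=0.0, p0=1.0, process_var=0.01, meas_var=0.25)
for z in (0.9, 1.1, 1.0, 1.05):          # noisy position measurements
    kf.predict()
    print(round(kf.update(z), 3))        # estimates settle near the true position
```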
In some examples, the entity tracking computing system 100 may include a Kalman filter that combines data from the various sensors to compensate for lower sensor reliability, such as when a sensor confidence value has decayed over time since the last positive detection. In some examples, the entity tracking computing system 100 may apply a Kalman filter to sensor data when one or more sensor confidence values are below a predetermined threshold. In an example scenario, image data from a camera may be analyzed using face detection techniques to reliably detect a person in a particular room. In response, the entity tracking computing system 100 may report with a high degree of confidence that the person is located in the room.
In a subsequent time frame, the camera may no longer be able to capture and/or positively identify the person's face in the room. For example, the person's face may be occluded, or the camera may transmit data much less frequently than the entity tracking computing system 100 outputs the contextual information 110. If the entity tracking computing system 100 relied solely on data from the camera, the confidence value of its reported person location would gradually decrease until the next positive detection. However, in some examples, data from the camera may be supplemented with data from other sensors. For example, during a subsequent time frame, a microphone may report that it hears the person's voice in the room, or another sensor may report that it detects the presence of the person's portable computing device in the room. In such cases, this data may be assigned weights by the Kalman filter and may be used to predict the person's current location with a higher confidence than would be possible using the camera data alone.
In some cases, detection of people and/or other entities in the environment may become more complicated when sensor data is contaminated with background information. Such background information may compromise the confidence with which the entity tracking computing system 100 reports the entity identity 112, location 114, and/or status 116. For example, the intelligent assistant device 10 may need to determine the identity of a person who is speaking in order to respond appropriately to a query or command. Such a determination may be difficult when multiple people are speaking at the same time, a television is playing, a noisy machine is running, and so on.
Accordingly, the entity tracking computing system 100 may use various audio processing techniques to more confidently identify particular people who are actively participating in a conversation with other people and/or with the intelligent assistant device 10. As one example, the entity tracking computing system 100 may implement a voice activity detection (VAD) engine that may distinguish human voice from ambient noise and detect the presence or absence of human speech.
A general-purpose VAD engine may be used to classify a particular audio segment as including speech or non-speech, with a corresponding confidence value. The entity tracking computing system 100 may also utilize a speaker recognition engine to match particular audio segments to particular people. As more speech is received, the speaker recognition engine may be progressively tailored to classify audio as including or not including speech from a particular conversation participant. In this manner, the entity tracking computing system 100 may recognize speech from one or more particular people/conversation participants.
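As a deliberately simple stand-in for the VAD engine described above, the following energy-threshold check classifies an audio frame as speech or non-speech. Real VAD engines use far more robust features and models; the threshold and names here are assumptions for illustration.

```python
import numpy as np

def is_speech(frame: np.ndarray, energy_threshold: float = 0.01) -> bool:
    """Crude voice-activity check: compare mean frame energy to a threshold."""
    return float(np.mean(frame ** 2)) > energy_threshold

silence = np.zeros(160)                                            # 10 ms at 16 kHz
voiced = 0.3 * np.sin(2 * np.pi * 440 * np.arange(160) / 16000.0)  # synthetic tone
print(is_speech(silence), is_speech(voiced))                       # False True
```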
Training of the speaker recognition engine may occur when the entity tracking computing system 100 confidently identifies a particular person and records audio that can be confidently attributed to that person. For example, using camera data, the entity tracking computing system 100 may identify a particular person and determine that the person's lips are moving. The entity tracking computing system 100 may simultaneously receive audio from the microphone, which may be safely assumed to include speech from the identified person. Thus, the received audio may be used to retrain the speaker recognition engine to more specifically recognize the voice of the identified person.
In some cases, such retraining may occur only when a person has been identified with a high confidence value (e.g., via accurate facial recognition or any other method), such as a confidence value exceeding a predetermined threshold, and when the entity tracking computing system 100 has received an audio recording of the person's voice with high volume/amplitude and a high signal-to-noise ratio (S/N). Using this technique, the entity tracking computing system 100 can accumulate a variety of person-specific voice models, allowing the entity tracking computing system to more consistently recognize speech from a particular person and ignore background noise.
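A minimal sketch of this gating logic appears below: retraining is allowed only when the identification confidence exceeds a threshold and the audio is sufficiently clean. The specific threshold values and the function name are illustrative assumptions rather than values from the patent.

```python
def should_retrain(face_confidence: float, signal_db: float, noise_db: float,
                   confidence_threshold: float = 0.9, min_snr_db: float = 15.0) -> bool:
    # Retrain the person-specific voice model only when the identity is confident
    # and the recording has a high signal-to-noise ratio.
    snr_db = signal_db - noise_db
    return face_confidence >= confidence_threshold and snr_db >= min_snr_db

ok = should_retrain(face_confidence=0.95, signal_db=-20.0, noise_db=-45.0)  # True: 25 dB SNR, 0.95 confidence
```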
Referring now to FIG. 10, an example of using a trained speech recognition engine to recognize speech from a particular person is schematically illustrated. In this example, the entity tracking computing system 100 receives two speech segments 400A and 400B. Speech segment 400A includes recorded speech of person 1, and speech segment 400B includes recorded speech of person 2. The entity tracking computing system 100 includes a speech recognition engine 402 that, as described above, has been specifically trained to recognize the speech of person 1 using a voice 1 model 404. When speech segments 400A and 400B are received by the entity tracking computing system 100, the voice 1 model 404 may be applied to each of speech segment 400A and speech segment 400B.
After processing the speech segments, the entity tracking computing system 100 outputs a prediction of the likelihood that each speech segment corresponds to person 1. As shown, for speech segment 400A, the entity tracking computing system outputs person 1 identification 404A with a 90% confidence value, indicating that the speech segment may include speech from person 1. For speech segment 400B, the entity tracking computing system outputs person 1 identification 404B with a 15% confidence value, indicating that speech segment 400B may not include speech from person 1.
In some examples, the entity tracking computing system 100 may be configured to identify background noise present in the environment and subtract such background noise from the received audio data using audio processing techniques. For example, a particular device in one's home may be playing background audio, such as music or a television/movie conversation. Various microphone equipped devices in a person's home can record such audio. Where such microphone-equipped devices include the smart assistant device 10 and/or provide audio data to the entity tracking computing system 100, such background audio may compromise the ability of the system to recognize, interpret, and/or respond to human questions or commands.
Accordingly and in some examples, a device playing background audio and/or another microphone equipped device recording background audio may send captured audio signals to the entity tracking computing system 100. In this manner, the entity tracking computing system 100 may subtract background audio from the audio signal received by the microphone equipped device. In some examples, the operation of subtracting the background audio signal from the recorded audio data may be performed by the device(s) capturing the audio data, or by associated audio processing components, prior to sending the audio data to the entity tracking computing system 100.
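One illustrative way such a subtraction could be approximated is sketched below, assuming the background signal reported by the playback device and the microphone recording are already time-aligned and of equal length; a real pipeline would also need delay estimation and filtering. The function name and the least-squares gain step are assumptions for illustration.

```python
import numpy as np

def subtract_background(recording: np.ndarray, background: np.ndarray) -> np.ndarray:
    # Estimate, by least squares, how loudly the known background appears in the mix,
    # then remove that scaled copy from the recording.
    gain = float(np.dot(recording, background) / (np.dot(background, background) + 1e-9))
    return recording - gain * background
```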
Additionally or alternatively, the device and/or entity tracking computing system 100 may be trained to identify particular sources of background noise (e.g., from a vent or a refrigerator) and automatically ignore waveforms corresponding to such noise in recorded audio. In some examples, the entity tracking computing system 100 may include one or more audio recognition models specifically trained to recognize background noise. For example, audio from various noise databases may be processed using supervised or unsupervised learning algorithms to more consistently recognize such noise. By allowing the entity tracking computing system 100 to recognize irrelevant background noise, the ability of the entity tracking computing system to recognize relevant human speech and other sounds may be improved. In some implementations, knowledge of the location of a sound source may be used to focus the listening of a directional microphone array.
As noted above, in some cases, a smart assistant device as described herein may be configured to track people or other entities as they move throughout an environment. This may be accomplished, for example, by interpreting data received from a plurality of sensors communicatively coupled to the smart assistant device. In some examples, the smart assistant device may track the one or more entities by maintaining an environment-dependent coordinate system to which a field of detection (FOD) of each of the plurality of sensors is mapped. As used herein, an "environment" may refer to any real-world region (e.g., a single room, house, apartment, store, office, building, venue, outdoor space, grid area, etc.).
Referring now to figs. 11A and 11B, the environment 4 of fig. 1 is schematically illustrated along with the intelligent assistant device 10. In these views, the FOD 500A of camera 82A and the FOD 500B of camera 82B of the smart assistant device 10 are schematically illustrated. Since the sensors shown in fig. 11A are cameras, FODs 500A and 500B are the fields of view (FOVs) of cameras 82A and 82B. In other words, FODs 500A and 500B illustrate the portions of three-dimensional space in which cameras 82A and 82B may detect entities in the environment 4. As will be described in more detail below, in some examples, upon receiving image data from one or more cameras indicating the presence of a person, the smart assistant device 10 may actuate one or more components (e.g., light source(s), a movable portion, etc.) to convey the presence of the person in a non-verbal manner.
Although the sensors shown in fig. 11A and 11B are cameras, as described above, the smart assistant device may include any of a variety of suitable sensors. By way of non-limiting example, such sensors may include visible light cameras, Infrared (IR) cameras, depth cameras, cameras sensitive to other wavelengths of light, microphones, radar sensors, any other sensor described herein, and/or any other sensor that may be used to track an entity. Further, the sensors in communication with the intelligent assistant device may take any suitable orientation.
Thus, as described above, the intelligent assistant device may maintain an environment-dependent coordinate system to which the FODs of the sensors in the environment are mapped. The coordinate system may, for example, represent an understanding of the real-world relationship of the FOD in the environment by the intelligent assistant device. In other words, the FOD of each sensor in the environment may be mapped to an environment-dependent coordinate system such that the intelligent assistant device understands the real-world regions where various sensors may detect entity presence, movement, and other contextual information. The environment-related coordinate system may additionally include other information related to the environment, such as physical dimensions of the environment (e.g., dimensions of a room, building, outdoor space, grid section) and/or locations of any furniture, obstacles, porches, sensors, or other detectable features present within the environment.
It will be appreciated that the environment-dependent coordinate system may take any suitable form and include any suitable information relating to the environment. The environment-dependent coordinate system may utilize any suitable scale, grid system, and/or other method to map/quantify the environment, and any suitable number of coordinates and parameters may be used to define the sensor FOD location. In some cases, the environment-related coordinate system may be a two-dimensional coordinate system, and the sensor FOD is defined relative to a two-dimensional surface (such as a floor of the environment). In other cases, the environment-dependent coordinate system may define the sensor FOD in three-dimensional space.
It should also be noted that tracking entities within a private environment (such as a living space, bedroom, bathroom, etc.) can raise potential privacy concerns. Thus, all data collected by the intelligent assistant device (such as entity location, appearance, movement, behavior, communications, etc.), which may be personal in nature, will be treated with the utmost respect for entity privacy. In some cases, any or all of the entity tracking techniques described herein may be performed only in response to receiving explicit user permission. For example, the user may specify which sensors are active; the amount and type of data collected by the sensors; which spaces or rooms in the environment are monitored by the entity tracking computing system; the level of security or encryption used with data collected by the entity tracking computing system; whether the collected data is stored locally or remotely; and so forth. In some examples, a user may choose to monitor sensitive areas in the environment with relatively low-resolution sensors (such as radar sensors). This may alleviate at least some privacy concerns regarding entity tracking by allowing the entity tracking computing device to track entity movement without requiring the user to install high-resolution cameras in sensitive areas, such as bathrooms.
As described above, the intelligent assistant device of the present disclosure can detect the presence of a person and various contextual information related to the person. Furthermore, in some examples, incorporating one or more cameras in the device to sense one or more types of visual data provides additional capabilities and opportunities for enhanced assistance and richer interaction with a user. More specifically, and as previously described, a person's interaction with another person or entity is enhanced and more informative when non-verbal communication is received from the other party. Referring now to figs. 12A-12C, an example method 600 for communicating non-verbal cues via a smart assistant device is disclosed. By way of example, the method 600 may be performed by the smart assistant devices 10, 150, 158, 186 and/or the unitary computing device 160 of fig. 18. The following description of method 600 is provided with reference to the software and hardware components described herein and shown in figs. 1-11 and 13-19. It will be understood that method 600 may also be performed in other contexts using other suitable hardware and software components.
Referring to fig. 12A, at 604, method 600 may include receiving image data indicative of human presence from one or more cameras of the smart assistant device. This is schematically illustrated in figs. 13A and 13B, which again show the environment 4 of fig. 1. In particular, fig. 13A shows the human entity 2 entering the FOD 500B of camera 82B of the intelligent assistant device 10, while fig. 13B shows a view 800 of the environment 4 from the perspective of camera 82B.
Upon detecting the human entity 2 within FOD 500B, the camera may communicate an indication of the detected entity's presence to the smart assistant device 10. The indication of entity presence may take any suitable form depending on the implementation and the particular sensor used. In one example scenario, a camera may capture an image of a human face. In some cases, the camera may transmit unprocessed image data to the smart assistant device, the image data including one or more pixels corresponding to the face. The transmitted pixels corresponding to the entity thus constitute an indication of the entity's presence and may be processed by the smart assistant device to determine the location and/or identity of the entity. Notably, the image data may be transmitted by the camera at any suitable frequency and need not be transmitted only in response to detecting a candidate entity. In other cases, the camera may perform some degree of processing on the image data and send a summary or interpretation of the data to the smart assistant device. Such a summary may indicate, for example, that a particular, identified person is present at a particular location given by the sensor's sensor-related coordinate system. Regardless of the specific form of the indication of entity presence, in an example scenario the data received by the smart assistant device may be used to identify a human face detected in the sensor's FOD.
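The two forms of presence indication just described, raw pixel data versus a sensor-side summary, might be represented along the lines of the following sketch. The structure and field names are invented for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class RawPresenceIndication:
    camera_id: str
    frame: np.ndarray            # unprocessed pixels; the device performs face detection itself

@dataclass
class SummaryPresenceIndication:
    camera_id: str
    person_id: Optional[str]     # None if a face was detected but not identified
    sensor_xy: tuple[int, int]   # location in the sensor-relative pixel grid
    confidence: float            # confidence of the sensor-side detection/identification
```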
The indication of entity presence may also include other forms of data, depending on which sensor(s) detected the entity. For example, when the sensor is a microphone, the indication of entity presence may comprise recorded audio of the entity's voice, or a sensor-related location of the entity determined via sound processing. When the sensor is a radar sensor, the indication of entity presence may comprise a silhouette or "blob" formed by radio waves reflected from the entity. It will be appreciated that different sensors detect the presence of an entity in different ways, and that the indication of entity presence may take any suitable form depending on the particular sensor(s) used. Further, processing of the sensor data may occur on the entity tracking computing system, on the sensor or a related component, and/or distributed among multiple devices or systems.
Returning briefly to fig. 12A, at 608, method 600 may include: in response to receiving image data indicating the presence of the person, one or more components of the smart assistant device are actuated to convey the presence of the person in a non-verbal manner. As described in examples presented herein, in some examples, one or more components may include a single light source or multiple light sources. In different examples, a single light source may include a light emitting element such as an LED, or a display such as an OLED or LCD display. The plurality of light sources may include a plurality of light emitting elements, a single display, or a plurality of displays, as well as various combinations of the foregoing. In this manner and as described in the examples presented below, a person receiving such non-verbal communication is conveniently notified that her presence was detected by the intelligent assistant device. Further, by expressing the useful information via non-verbal communication, the device conveniently and non-invasively notifies the user of the information.
In one example and referring again to fig. 12A, actuating one or more components of the smart assistant device to communicate the presence of the person in a non-verbal manner at 612 may include illuminating at least one light source located on the smart assistant device. In this way, the person can be conveniently visually informed that the intelligent assistant device has detected her presence.
As described above and referring again to fig. 2, in one example, the smart assistant device 10 includes a cylindrical housing 80, the cylindrical housing 80 including a plurality of light sources 84 extending around at least a portion of a perimeter of the housing. For ease of description, fig. 14 is a schematic diagram showing the light source array 84 in an "expanded" two-dimensional view. In some examples, the light source 84 may extend 360 degrees around the perimeter of the housing 80 of the smart assistant device 10. In other examples, the array may extend 90 degrees, 120 degrees, 180 degrees, or any other suitable degree around the perimeter. Additionally, the example of fig. 14 shows a substantially rectangular 4 x 20 array of light sources. In other examples, different numbers and arrangements of light sources located at various locations on the smart assistant device 10 may be utilized and are within the scope of the present disclosure. In some examples, different individual light sources may have different shapes, sizes, outputs, and/or other qualities or characteristics.
In some examples and as described in more detail below, to communicate the presence of a person in a non-verbal manner, the smart assistant device 10 may determine a location of the person relative to the device and may illuminate at least one light source located on a person-facing portion of the device.
Returning briefly to fig. 12A, in some examples and at 616, the method 600 may include illuminating the at least one light source by modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source to communicate the non-verbal cue to the user. In some examples and at 620, where the at least one light source is a plurality of light sources, the light sources may be LEDs. In other examples, any other suitable type of light source may be utilized.
Referring again to fig. 14 and as described in more detail below, in some examples, the frequency of the one or more light sources 84 may be modulated to communicate that the smart assistant device 10 detects the presence of a person in a non-verbal manner. Referring to the example of fig. 13A, when the person 2 enters the living room 4, the one or more light sources 84 may be illuminated to blink or pulse at a predetermined frequency when the image data from the camera of the smart assistant device 10 indicates the presence of the person 2. Additionally and as described in more detail below, in response to determining one or more contexts of the person (such as a location, height, or identity of the person), the one or more light sources may be illuminated to blink or pulse at different frequencies to communicate the one or more contexts of the person in a non-verbal manner. It will be appreciated that a variety of techniques of illuminating the light source(s) may be utilized, such as different frequencies and illumination patterns to create various visual effects, shapes, animations, and the like.
In some examples, one or more of the brightness, color, and number of light sources may be modulated in addition to or instead of modulating the frequency of the light source(s). For example, when person 2 enters living room 4, one or more light sources 84 may be illuminated at an initial brightness to communicate the presence of person 2 in a non-verbal manner. When one or more other contexts of the person are determined, the one or more light sources may be illuminated at a modified and enhanced brightness to convey the one or more contexts of the person in a non-verbal manner.
Similarly, when person 2 enters living room 4, one or more light sources 84 may be illuminated in an initial color (such as blue) to communicate the presence of person 2 in a non-verbal manner. When another context of the person is determined, the color of the one or more light sources may be changed to green to communicate the one or more contexts of the person in a non-verbal manner. In another example, blue light source(s) may be maintained to indicate presence, and other light source(s) may be illuminated in different colors to convey one or more contexts of a person in a non-verbal manner. In another example, when person 2 enters living room 4, only one of light sources 84 may be illuminated to communicate the presence of person 2 in a non-verbal manner. When another context of the person is determined, the plurality of light sources may be illuminated so as to communicate one or more contexts of the person in a non-verbal manner. It will be understood that the above examples are provided for illustrative purposes only, and that many variations and combinations of illuminating one or more light sources in various ways to communicate non-verbal cues may be utilized and are within the scope of the present disclosure.
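One way to think about these modulation choices is as a mapping from what the device currently knows to a lighting cue, as in the sketch below: presence alone gets a dim blue pulse, and each additional resolved context changes the color, brightness, pulse rate, or number of lit sources. The NonVerbalCue fields and the specific values are illustrative assumptions, not the patent's encoding scheme.

```python
from dataclasses import dataclass

@dataclass
class NonVerbalCue:
    color: tuple[int, int, int]   # RGB
    pulse_hz: float               # blink/pulse frequency
    brightness: float             # 0.0 to 1.0
    lit_count: int                # how many light sources to illuminate

def cue_for(person_detected: bool, contexts_resolved: int) -> NonVerbalCue:
    if not person_detected:
        return NonVerbalCue((0, 0, 0), 0.0, 0.0, 0)
    if contexts_resolved == 0:
        return NonVerbalCue((0, 0, 255), 0.5, 0.4, 1)           # presence only: dim blue pulse
    return NonVerbalCue((0, 255, 0),                            # context known: green,
                        0.5 + 0.5 * contexts_resolved,          # faster pulse,
                        min(0.4 + 0.2 * contexts_resolved, 1.0),
                        min(contexts_resolved + 1, 4))          # more sources lit
```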
Returning briefly to fig. 12A, at 624 method 600 may include displaying a vector graphic via a display of the smart assistant device, thereby communicating the non-verbal prompt. As noted above with respect to fig. 3, in some examples, the one or more light sources may include a display 152, the display 152 surrounding all or a portion of the perimeter of the device housing. In these examples, the display 152 may be used to display vector graphics 154 (such as various static or animated shapes, patterns, etc.) to communicate with the user in a non-verbal manner. Thus, in some examples, one or more shapes generated by the display may be modulated to communicate with the user in a non-verbal manner.
Referring now to fig. 15A-15D, in one example, the display may animate a shape that deforms from the circle shown in fig. 15A to the horizontal ellipse in fig. 15B, back to the circle in fig. 15C, and then back to the vertical ellipse shown in fig. 15D. As described above, in other examples, the display may generate a variety of shapes and/or patterns that are static and/or animated to convey various prompts to the user in a non-verbal manner.
Briefly returning to FIG. 12A and at 628, actuating one or more components to communicate with the user in a non-verbal manner can include projecting a non-verbal cue onto a surface. As noted above with respect to fig. 4, in some examples, the smart assistant device 158 may include a projector 180, the projector 180 projecting one or more static or animated shapes, patterns, icons, or the like onto a surface. In the example of fig. 4, projector 180 projects an image of circle 182 onto surface 184 of the table on which the device is located.
In some examples, data from one or more sensors of the smart assistant device may indicate the presence of multiple people. In these examples and briefly returning to fig. 12A, at 632 method 600 may include receiving an indication of a presence of a plurality of people from one or more sensors of a smart assistant device. Accordingly and using one or more techniques described herein, the smart assistant device may separately communicate different non-verbal cues to two or more of the plurality of persons.
Referring now to fig. 16, in one example, one or more sensors of the smart assistant device 10 may detect the second person 12 as well as the first person 2 in the living room 4. In this example, it may be desirable for the smart assistant device to communicate in a non-verbal manner that it is responding to natural language input from a particular person; that is, that a particular person has the "focus" of the device. For example, where the first person 2 initiates interaction with the smart assistant device (such as by speaking a keyword phrase such as "hey, computer"), the device may then identify the first person's voice and respond only to commands and queries from the first person. Accordingly and referring briefly to fig. 12A, at 636, the method 600 may include illuminating at least one light source of the smart assistant device to communicate in a non-verbal manner that the device is responsive to natural language input from the first person 2. To visually provide such non-verbal cues, the smart assistant device may use any of the techniques described above to illuminate one or more light sources on the device.
In some examples and as described above, the smart assistant device may determine the location of the first person 2 relative to the device. In these examples, the device may illuminate one or more LEDs (located on the portion of the device facing the person) to communicate the location of the person understood by the device in a non-verbal manner. Additionally and as described in more detail below, the smart assistant device may provide other non-verbal communication for two or more individuals to express additional context and other information, such as the location, height, and identity of the individual.
Referring now to fig. 12B, at 640, method 600 may include receiving data indicative of contextual information of a person from one or more sensors of an intelligent assistant device. As described above, the contextual information may include guesses/predictions by the entity tracking computing system of the identity, location, and/or state of one or more detected entities based on the received sensor data. At 644, method 600 may include: one or more contexts of the person are determined using at least data indicative of context information of the person. At 648, the one or more contexts of the person may include one or more of the following: (1) a location of the person relative to the intelligent assistant device; (2) the height of the person; (3) an initial identity of the person corresponding to the previously identified person and representing an initial confidence value; (4) a verified identity of the person representing a verified confidence value greater than the initial confidence value; and (5) the distance of the person from the intelligent assistant device.
In some examples and as described above, the location of one or more persons relative to the intelligent assistant device may be determined. Referring to the examples of fig. 16 and 17, image data from the camera of the smart assistant device may be used to identify and locate the first person 2 and the second person 12 relative to the device. For example, the intelligent assistant device 10 may process the image data to generate a sensor-related location of the detected person within a sensor-related coordinate system. For example, the sensor-related position may be given by a set of pixel coordinates relative to a two-dimensional grid of pixels captured by the camera. When the camera is a depth camera, the sensor-related position of the person may be a three-dimensional position.
As with the indication of entity presence, the sensor-related location of the entity may take any suitable form. In some examples, data from one or more other sensors may be used to determine the location of the person in addition to or instead of the image data. For example, when the sensor is a microphone, the sensor-related position may be inferred from the amplitude of the recorded audio signal, which serves as an indicator of the distance of the person from the sensor. Similarly, as with the environment-dependent coordinate system, the sensor-dependent coordinate system for each sensor may take any suitable form, and may use any scale, grid system, or other suitable method of calibrating/quantifying the local environment of the sensor, depending on the type of data collected or observed by the sensor.
In some examples, the detected sensor-related location of the person may be translated into an environment-related location of the person within an environment-related coordinate system. As described above, such translation may be associated with a mapping of the sensor's FOD to an environment-dependent coordinate system. The mapping may be implemented in any of a variety of suitable manners and may be performed at any suitable time. For example, in some cases, the mapping of the sensor FOD to the environment-dependent coordinate system may be performed at initial setup of the smart assistant device, gradually as the use of the device evolves, and/or at another suitable time.
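The translation from a sensor-related detection to an environment-related location can be illustrated with a standard two-dimensional rigid transform, assuming each camera's pose in the room (position and heading) was captured during setup. The CameraPose structure and function name are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass

@dataclass
class CameraPose:
    x: float        # camera position in room coordinates (meters)
    y: float
    yaw_rad: float  # camera heading in room coordinates

def to_environment(pose: CameraPose, range_m: float, bearing_rad: float) -> tuple[float, float]:
    """Convert a (range, bearing) detection in the camera's frame to room coordinates."""
    theta = pose.yaw_rad + bearing_rad
    return (pose.x + range_m * math.cos(theta),
            pose.y + range_m * math.sin(theta))

# Example: a person 2 m away, 10 degrees to the left of a camera mounted at (0.5, 0.5).
print(to_environment(CameraPose(0.5, 0.5, math.radians(90)), 2.0, math.radians(10)))
```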
Referring briefly to fig. 12B, at 652, the method 600 may include: in response to determining the one or more contexts of the person, one or more components of the smart assistant device are actuated to communicate the one or more contexts of the person in a non-verbal manner. Referring again to fig. 16 and 17, where the location of the first person 2 is determined, the smart assistant device may communicate such location to the person in a non-verbal manner. As schematically shown in fig. 17, in one example, the location of the first person 2 may be communicated in a non-verbal manner by illuminating one or more LEDs located on a person-facing portion 19 of the device 10, as indicated by dashed line 15.
In some examples, in addition to communicating in a non-verbal manner that the smart assistant device has detected the first person 2, the device may also communicate that it is tracking the location of the first person. For example and referring to fig. 17, as the first person 2 walks from the first location 21 to the second location 23, the smart assistant device 10 may gradually illuminate different light sources, thereby communicating in a non-verbal manner that the device is tracking the location of the first person. In one example and referring to fig. 14, as the first person 2 moves relative to the array of light sources 84 (which may be LEDs) in the direction of arrow a, the individual LEDs may be progressively illuminated and dimmed from right to left in a manner that follows the changing location of the person, and thus communicate in a non-verbal manner that the device is tracking the location of the person.
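The progressive illumination just described could be driven by a simple bearing-to-column mapping, as in the sketch below: the LED column closest to the person's bearing is fully lit and its neighbors are dimmed, so the lit region appears to follow the person around the housing. The 20-column ring echoes the 4 x 20 array discussed earlier; the function names and falloff values are assumptions.

```python
def column_for_bearing(bearing_deg: float, num_columns: int = 20) -> int:
    # Map a bearing around the cylindrical housing to the nearest LED column.
    return int((bearing_deg % 360.0) / 360.0 * num_columns) % num_columns

def column_brightness(bearing_deg: float, num_columns: int = 20) -> list[float]:
    active = column_for_bearing(bearing_deg, num_columns)
    levels = [0.0] * num_columns
    levels[active] = 1.0
    levels[(active - 1) % num_columns] = 0.3   # soft falloff to adjacent columns
    levels[(active + 1) % num_columns] = 0.3
    return levels
```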
As described above, the smart assistant device 10 may detect the presence of more than one person. Referring briefly again to fig. 12B, at 656, the method 600 may include receiving an indication of a presence of a second person from one or more sensors of the intelligent assistant device. At 660, method 600 may include illuminating at least one light source of the smart assistant device to communicate in a non-verbal manner: the intelligent assistant device is tracking the location of the first person and the location of the second person.
In one example and referring again to fig. 17, in addition to communicating the location of the first person 2 in a non-verbal manner by illuminating one or more LEDs indicated by dashed line 15, in a similar manner, the smart assistant device 10 may also communicate the location of the second person 12 in a non-verbal manner by illuminating one or more LEDs, as indicated by dashed line 17, located on a different portion 25 of the device facing the second person 12. As described above for the first person 2, the smart assistant device 10 may also gradually illuminate different light sources to communicate in a non-verbal manner that the device is also tracking the location of the second person.
In some examples, the smart assistant device 10 may additionally or alternatively communicate the distance of the first person 2 from the device in a non-verbal manner. In one example, the brightness of one or more LEDs illuminated to indicate the location of a person may increase as the user moves closer to the device and decrease as the user moves farther away from the device. It will be appreciated that many other examples of illuminating a light source to communicate distance to a person in a non-verbal manner may be utilized.
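A minimal sketch of this brightness-by-distance behavior follows: brightness rises as the person approaches and falls as they move away, clamped to a working range. The near/far bounds are illustrative assumptions.

```python
def brightness_for_distance(distance_m: float, near_m: float = 0.5, far_m: float = 5.0) -> float:
    # 1.0 when the person is at or inside the near bound, 0.0 at or beyond the far bound.
    clamped = max(near_m, min(distance_m, far_m))
    return 1.0 - (clamped - near_m) / (far_m - near_m)
```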
As described above, the intelligent assistant device 10 may use data indicative of contextual information of a person to determine one or more contexts of the person. In some examples, the one or more contexts of the person may include a height of the person. In some examples where depth image data from a depth camera is received, the smart assistant device may utilize such data to determine a height of the detected person, and may communicate an indication of such height in a non-verbal manner by illuminating one or more of its light sources. In one example and referring to fig. 14, the height of different detected persons may be indicated generally by illuminating a varying number of LEDs in a vertical column. For example, for a person less than 4 feet in height, 1 LED may be illuminated; for people between 4 and 5 feet in height, 2 LEDs may be illuminated; for persons 5 to 6 feet in height, 3 LEDs may be illuminated; for people over 6 feet tall, all 4 LEDs may be illuminated. It will be appreciated that many other examples of illuminating a light source to convey a person's height in a non-verbal manner may be utilized.
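The height bands in the example above translate directly into a small lookup, sketched below; only the function name is an assumption.

```python
def leds_for_height(height_ft: float) -> int:
    # Number of LEDs to illuminate in a vertical column, per the bands described above.
    if height_ft < 4.0:
        return 1
    if height_ft < 5.0:
        return 2
    if height_ft <= 6.0:
        return 3
    return 4
```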
In some examples and as described above, the one or more contexts of the person may include an initial identity and a verified identity of the person. As explained above, the entity identifier of the intelligent assistant device may determine two or more levels of identity of the person. For example, such an identity level may include an initial identity corresponding to a previously identified person and representing an initial confidence value, and a verified identity representing a verified confidence value that is greater than the initial confidence value that the person is a previously identified person. Where an initial identity is determined, the smart assistant device may communicate an indication of such identity in a non-verbal manner by illuminating one or more of its light sources in a particular manner.
In one example and referring to fig. 14, an initial identity of a person may be indicated by illuminating one or more LEDs in a first color (such as blue). In the event that such a person is then authenticated as a verified identity (representing a verified confidence value that is greater than the initial confidence value), such a verified identity may be indicated by illuminating one or more LEDs in a second, different color (such as green). It will be appreciated that many other examples may be utilized that illuminate a light source to convey an initial identity, a verified identity, and/or additional levels of identity security of a person in a non-verbal manner.
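The two identity levels described above, an initial identity signaled in blue and a verified identity signaled in green, could be selected from a confidence value along the lines of the sketch below. The threshold values are illustrative assumptions, not values from the patent.

```python
from typing import Optional, Tuple

def identity_color(confidence: float,
                   initial_threshold: float = 0.6,
                   verified_threshold: float = 0.9) -> Optional[Tuple[int, int, int]]:
    if confidence >= verified_threshold:
        return (0, 255, 0)    # green: verified identity
    if confidence >= initial_threshold:
        return (0, 0, 255)    # blue: initial identity
    return None               # not confident enough to signal an identity
```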
In some examples, a user of the intelligent assistant device 10 may desire to know which type(s) of data the device is collecting and utilizing. For example, some users may wish a device to collect or avoid collecting one or more types of data. In one example and referring briefly again to fig. 12B, at 664, method 600 may include illuminating at least one light source of the smart assistant device to communicate in a non-verbal manner a type of sensor data used by the smart assistant device to determine one or more contexts of the person. For example, where the light source comprises a display on the device, the display may generate a vector graphic showing the camera to indicate that video data is being collected by the device. It will be appreciated that many other examples may be utilized that illuminate light sources to communicate the type of sensor data used by the smart assistant device in a non-verbal manner.
As described above, in some examples, the smart assistant device 10 may receive and utilize a variety of different sensor data from a variety of different sensors on the device. In one example and referring now to fig. 12C, at 668, where the one or more contexts of the person include an initial identity of the person, the method 600 may include receiving and fusing data indicative of contextual information of the person from a plurality of different sensors of the intelligent assistant device to determine the initial identity of the person. As noted, in other examples, the intelligent assistant device 10 may fuse such data to determine a variety of different contexts of the person as described herein.
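As one hedged illustration of such fusion, per-sensor identity confidences could be combined into a single initial-identity confidence by a weighted average, as sketched below. The sensor names, weights, and fusion rule are invented for illustration; the patent does not prescribe a particular fusion formula.

```python
def fuse_identity_confidence(per_sensor: dict[str, float],
                             weights: dict[str, float]) -> float:
    # Weighted average of per-sensor identity confidences; unknown sensors get zero weight.
    total_weight = sum(weights.get(name, 0.0) for name in per_sensor)
    if total_weight == 0.0:
        return 0.0
    return sum(conf * weights.get(name, 0.0) for name, conf in per_sensor.items()) / total_weight

confidence = fuse_identity_confidence(
    {"face": 0.8, "voice": 0.6, "device_presence": 0.9},
    {"face": 0.6, "voice": 0.3, "device_presence": 0.1})
```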
As also described above, in some example implementations of a smart assistant device (such as the examples shown in figs. 5A and 5B), one or more components of the device may be actuated to communicate the presence of a person in a non-verbal manner by translating, rotating, and/or otherwise moving the components. Referring briefly again to fig. 12C, at 672, the method 600 may include one or more of: moving a camera of the device to aim at the person, and moving a display to follow the person's location, thereby conveying the person's presence in a non-verbal manner.
In some examples, the one or more light sources of the smart assistant device may be Infrared (IR) emitters. For example, the device may include an IR projector configured to emit an encoded IR signal that is reflected from objects in the environment for receipt by an IR camera of the device. In some examples, the visible glow of such IR projectors may prove annoying or distracting to the user. Thus, in some examples and referring briefly again to fig. 12C, at 676, the method 600 may include wherein the smart assistant device includes a plurality of light sources, at least one of the plurality of light sources being illuminated to one or more of: (1) reducing the visibility of the at least one IR emitter, and (2) incorporating light emitted from the at least one IR emitter into an illumination pattern produced by the at least one light source. In one example, the IR emitter may be located in the middle of multiple LEDs on the device. When the IR emitter is illuminated, the LED may be illuminated such that the glow from the IR emitter mixes with the light emitted from the LED to reduce the visibility of the IR emitter. Further, in some examples, as described above, the techniques may also be used to convey information to a user in a non-verbal manner. In another example where the IR emitter is located among a plurality of LEDs, when the IR emitter is activated, the LEDs may be selectively illuminated to create a pleasing pattern that incorporates light from the IR emitter into the pattern, thereby masking such IR light.
Referring now to fig. 18, additional example implementations of the intelligent assistant device 10 in a single computing device are illustrated. Additional details regarding the components and computing aspects of the computing device shown in FIG. 18 are described below with reference to FIG. 19.
Fig. 18 shows one example of a unitary computing device 160 in which the components implementing the smart assistant device 10 are arranged together in a standalone device. In some examples, the unitary computing device 160 may be communicatively coupled to one or more other computing devices 162 via a network 166. In some examples, the unitary computing device 160 may be communicatively coupled to a data store 164, which may store various data, such as user profile data. The unitary computing device 160 includes at least one sensor 22, a voice listener 30, a parser 40, an intent processor 50, a commitment engine 60, an entity tracking computing system 100, and at least one output device 70. Sensor(s) 22 include at least one camera for receiving visual data and at least one microphone for receiving natural language input from a user. In some examples, one or more other types of sensor(s) 22 may also be included.
As described above, the voice listener 30, the parser 40, and the intent processor 50 work in concert to convert natural language input into commitments executable by the unitary computing device 160. Such commitments may be stored by the commitment engine 60. The entity tracking computing system 100 may provide contextual information to the commitment engine 60 and/or other modules. At a contextually appropriate time, the commitment engine 60 may execute a commitment and provide output, such as an audio signal, to the output device(s) 70.
In some embodiments, the methods and processes described herein may be associated with a computing system of one or more computing devices. In particular, such methods and processes may be implemented as computer applications or services, Application Programming Interfaces (APIs), libraries, and/or other computer program products.
FIG. 19 schematically illustrates one non-limiting embodiment of a computing system 1300, the computing system 1300 may implement one or more of the methods and processes described above. Computing system 1300 is shown in simplified form. Computing system 1300 can take the form of: one or more smart assistant devices, one or more personal computers, server computers, tablet computers, home entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smartphones), and/or other computing devices described herein.
Computing system 1300 includes a logic machine 1302 and a storage machine 1304. Computing system 1300 may optionally include a display subsystem 1306, an input subsystem 1308, a communication subsystem 1310, and/or other components not shown in fig. 19.
The logic machine 1302 includes one or more physical devices configured to execute instructions. For example, the logic machine may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, implement a technical effect, or otherwise achieve a desired result.
The logic machine may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic machine may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. The processors of the logic machine may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. The individual components of the logic machine may optionally be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic machine may be virtualized and executed by remotely accessible networked computing devices configured in a cloud computing configuration.
The storage machine 1304 includes one or more physical devices configured to hold instructions executable by a logic machine to implement the methods and processes described herein. When such methods and processes are implemented, the state of the storage machine 1304 may be transformed, for example, to hold different data.
The storage 1304 may include removable and/or built-in devices. The storage machine 1304 may include optical memory (e.g., CD, DVD, HD-DVD, blu-ray disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. The storage machine 1304 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.
It should be understood that the storage machine 1304 includes one or more physical devices. However, aspects of the instructions described herein may alternatively be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for a finite duration.
Aspects of the logic machine 1302 and the storage machine 1304 may be integrated together into one or more hardware logic components. Such hardware logic components may include, for example, Field Programmable Gate Arrays (FPGAs), program specific and application specific integrated circuits (PASIC/ASIC), program specific and application specific standard products (PSSP/ASSP), system on a chip (SOC), and Complex Programmable Logic Devices (CPLDs).
The terms "module," "program," and "engine" may be used to describe an aspect of computing system 1300 that is implemented to perform particular functions. In some cases, a module, program, or engine may be instantiated via logic machine 1302 executing instructions held by storage machine 1304. It will be appreciated that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms "module," "program," and "engine" may encompass a single or a group of executable files, data files, libraries, drivers, scripts, database records, and the like.
It will be understood that, as used herein, a "service" is an application that is executable across multiple user sessions. A service may be available to one or more system components, programs, and/or other services. In some implementations, the service may run on one or more server computing devices.
When included, display subsystem 1306 may be used to present a visual representation of data held by storage machine 1304. In some examples, display subsystem 1306 may include one or more light sources as described herein. Where display subsystem 1306 includes a display device that generates vector graphics and other visual representations, such representations may take the form of a Graphical User Interface (GUI). When the herein described methods and processes change the data held by the storage machine, and thus transform the state of the storage machine, the state of display subsystem 1306 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 1306 may include one or more display devices utilizing virtually any type of technology.
When included, input subsystem 1308 may include or interact with one or more user input devices, such as a keyboard, a mouse, a touch screen, or a game controller. In some embodiments, the input subsystem may include or interact with a selected Natural User Input (NUI) component. Such components may be integrated or peripheral, and the transduction and/or processing of input actions may be performed on-board or off-board. Example NUI components may include a microphone for speech and/or voice recognition; infrared, color, stereo and/or depth cameras for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer and/or gyroscope for motion detection and/or intent recognition; and an electric field sensing assembly for assessing brain activity.
When included, communication subsystem 1310 may be configured to communicatively couple computing system 1300 with one or more other computing devices. Communication subsystem 1310 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As a non-limiting example, the communication subsystem may be configured for communication via a wireless telephone network or a wired or wireless local or wide area network. In some embodiments, the communication subsystem may allow computing system 1300 to send and/or receive messages to and/or from other devices via a network, such as the internet.
The following paragraphs provide additional support for the claims of the subject application. One aspect provides, at a smart assistant device configured to respond to natural language input, a method for communicating non-language prompts, the method comprising: receiving image data indicative of a presence of a person from one or more cameras of a smart assistant device; in response to receiving the image data, actuating one or more components of the smart assistant device to convey the presence of the person in a non-verbal manner; receiving data indicative of contextual information of a person from one or more sensors of an intelligent assistant device; determining one or more contexts of the person using at least data indicative of context information of the person; and in response to determining the one or more contexts of the person, actuating one or more components of the intelligent assistant device to communicate the one or more contexts of the person in a non-verbal manner. The method may additionally or alternatively comprise: wherein actuating one or more components of the smart assistant device to communicate the presence of the person in a non-verbal manner further comprises: at least one light source located on the smart assistant device is illuminated. The method may additionally or alternatively comprise: wherein actuating one or more components of the smart assistant device to communicate the presence of the person in a non-verbal manner further comprises one or more of: moving the camera to aim at the person, and moving the display to follow the person's position. The method may additionally or alternatively include, wherein the one or more contexts of the person include one or more of: (1) a location of the person relative to the intelligent assistant device; (2) the height of the person; (3) an initial identity of the person, the initial identity corresponding to a previously identified person and representing an initial confidence value; (4) a verified identity of the person, the verified identity representing a verified confidence value that is greater than the initial confidence value; and (5) the distance of the person from the intelligent assistant device. The method may additionally or alternatively comprise: one or more contexts in which the one or more components are actuated to communicate the person in a non-verbal manner further include: illuminating at least one light source located on the smart assistant device; and illuminating the at least one light source comprises modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source. The method may additionally or alternatively comprise wherein the at least one light source is a plurality of light sources and the plurality of light sources comprises a plurality of LEDs. The method may additionally or alternatively include, wherein the actuating one or more components to communicate one or more contexts of the person in a non-verbal manner further comprises: the vector graphics are displayed via a display of the intelligent assistant device. The method may additionally or alternatively include, wherein the actuating one or more components to communicate one or more contexts of the person in a non-verbal manner further comprises: projecting the non-verbal cue onto a surface. 
The method may additionally or alternatively include, wherein the person is a first person, the method further comprising: receiving, from one or more sensors of the smart assistant device, an indication of a presence of the second person; and illuminating at least one light source located on the smart assistant device to communicate in a non-verbal manner that the smart assistant device is responsive to natural language input from the first person. The method may additionally or alternatively include, wherein the person is a first person and the one or more contexts of the person include a location of the first person, the method further comprising: receiving, from one or more sensors of the smart assistant device, an indication of a presence of the second person; and illuminating at least one light source located on the smart assistant device to communicate in a non-verbal manner that the smart assistant device is tracking the location of the first person and the location of the second person. The method may additionally or alternatively include, wherein the one or more contexts of the person include an initial identity of the person, the method further comprising: data indicative of contextual information of a person is received and fused from a plurality of different sensors of an intelligent assistant device to determine an initial identity of the person. The method may additionally or alternatively comprise: illuminating at least one light source located on the smart assistant device to communicate a type of sensor data in a non-verbal manner, the sensor data being used by the smart assistant device to determine one or more contexts of the person. The method may additionally or alternatively comprise: wherein the one or more components comprise a plurality of light sources and the plurality of light sources comprise at least one infrared emitter, the method further comprising illuminating at least one light source of the plurality of light sources to one or more of: (1) reducing the visibility of the at least one infrared emitter, and (2) incorporating light emitted from the at least one infrared emitter into an illumination pattern produced by the at least one light source.
Another aspect provides an intelligent assistant device configured to respond to natural language input, comprising: a plurality of light sources; a plurality of sensors including one or more cameras; at least one speaker; a logic machine; and a storage machine holding instructions executable by the logic machine to: receive image data indicative of a presence of a person from at least one of the one or more cameras; in response to receiving the image data, actuate one or more components of the smart assistant device to convey the presence of the person in a non-verbal manner; receive data indicative of contextual information of the person from one or more sensors of the plurality of sensors; determine one or more contexts of the person using at least the data indicative of contextual information of the person; and in response to determining the one or more contexts of the person, actuate one or more components of the intelligent assistant device to communicate the one or more contexts of the person in a non-verbal manner. The intelligent assistant device may additionally or alternatively include, wherein actuating one or more components of the smart assistant device to communicate the presence of the person in a non-verbal manner further comprises: illuminating at least one light source of the plurality of light sources. The smart assistant device may additionally or alternatively include, wherein actuating one or more components of the smart assistant device to communicate the presence of the person in a non-verbal manner further comprises one or more of: moving the camera to aim at the person, and moving the display to follow the person's location. The intelligent assistant device may additionally or alternatively include, wherein actuating one or more components to communicate the one or more contexts of the person in a non-verbal manner further comprises: illuminating at least one light source located on the smart assistant device, and illuminating the at least one light source comprises modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source. The intelligent assistant device may additionally or alternatively include, wherein the person is a first person and the instructions are executable to: receive, from one or more sensors of the smart assistant device, an indication of a presence of a second person; and illuminate at least one light source located on the smart assistant device to communicate in a non-verbal manner that the smart assistant device is responsive to natural language input from the first person. The intelligent assistant device may additionally or alternatively include, wherein the person is a first person, and the one or more contexts of the person include a location of the first person, and the instructions are executable to: receive, from one or more sensors of the smart assistant device, an indication of a presence of a second person; and illuminate at least one light source located on the smart assistant device to communicate in a non-verbal manner that the smart assistant device is tracking the location of the first person and the location of the second person.
In another aspect, there is provided an intelligent assistant device configured to respond to natural language input, comprising: a housing; a plurality of LEDs positioned around at least a portion of the housing; a plurality of sensors including at least one camera and at least one microphone; at least one speaker; a logic machine; and a storage machine holding instructions executable by the logic machine to: receive image data from the at least one camera indicating a presence of a person; in response to receiving the image data, illuminate at least one of the plurality of LEDs to communicate detection of the presence of the person in a non-verbal manner; receive data indicative of contextual information of the person from one or more sensors of the plurality of sensors; determine one or more contexts of the person using at least the data indicative of contextual information of the person; and in response to determining the one or more contexts of the person, illuminate at least one of the plurality of LEDs to communicate the one or more contexts of the person in a non-verbal manner.
It will be appreciated that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Also, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as all equivalents thereof.

Claims (15)

1. A method at a smart assistant device for communicating non-verbal cues, the smart assistant device configured to respond to natural language input, the method comprising:
receiving image data indicative of a presence of a person from one or more cameras of the intelligent assistant device;
in response to receiving the image data, actuating one or more components of the smart assistant device to communicate the presence of the person in a non-verbal manner;
receiving data indicative of contextual information of the person from one or more sensors of the intelligent assistant device;
determining one or more contexts of the person using at least the data indicative of contextual information of the person; and
in response to determining the one or more contexts of the person, actuating one or more components of the intelligent assistant device to communicate the one or more contexts of the person in a non-verbal manner.
2. The method of claim 1, wherein actuating one or more components of the smart assistant device to communicate the presence of the person in a non-verbal manner further comprises: illuminating at least one light source located on the intelligent assistant device.
3. The method of claim 1, wherein actuating one or more components of the smart assistant device to communicate the presence of the person in a non-verbal manner further comprises one or more of: moving a camera to aim at the person, and moving a display to follow the position of the person.
4. The method of claim 1, wherein the one or more contexts of the person comprise one or more of: (1) a location of the person relative to the smart assistant device; (2) the height of the person; (3) an initial identity of the person, the initial identity corresponding to a previously identified person and representing an initial confidence value; (4) a verified identity of the person, the verified identity representing a verified confidence value that is greater than the initial confidence value; and (5) a distance of the person from the intelligent assistant device.
5. The method of claim 1, wherein actuating one or more components to communicate the one or more contexts of the person in a non-verbal manner further comprises: illuminating at least one light source located on the smart assistant device; and illuminating the at least one light source comprises modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source.
6. The method of claim 5, wherein the at least one light source is a plurality of light sources and the plurality of light sources comprises a plurality of LEDs.
7. The method of claim 1, wherein actuating one or more components to communicate the one or more contexts of the person in a non-verbal manner further comprises: displaying, via a display of the intelligent assistant device, a vector graphic.
8. The method of claim 1, wherein actuating one or more components to communicate the one or more contexts of the person in a non-verbal manner further comprises: projecting the non-verbal cue onto a surface.
9. The method of claim 1, wherein the person is a first person, the method further comprising:
receiving, from one or more sensors of the intelligent assistant device, an indication of a presence of a second person; and
illuminating at least one light source located on the smart assistant device to communicate in a non-verbal manner that the smart assistant device is responsive to the natural language input from the first person.
10. The method of claim 1, wherein the person is a first person and the one or more contexts of the person include a location of the first person, the method further comprising:
receiving, from one or more sensors of the intelligent assistant device, an indication of a presence of a second person; and
illuminating at least one light source located on the smart assistant device to communicate in a non-verbal manner that the smart assistant device is tracking the location of the first person and the location of the second person.
11. The method of claim 1, wherein the one or more contexts of the person include an initial identity of the person, the method further comprising: receiving and fusing the data indicative of contextual information of the person from a plurality of different sensors of the intelligent assistant device to determine the initial identity of the person.
12. The method of claim 1, further comprising: illuminating at least one light source located on the smart assistant device to communicate a type of sensor data in a non-verbal manner, the sensor data used by the smart assistant device to determine the one or more contexts of the person.
13. The method of claim 1, wherein the one or more components comprise a plurality of light sources and the plurality of light sources comprise at least one infrared emitter, the method further comprising illuminating at least one light source of the plurality of light sources to one or more of: (1) reducing visibility of the at least one infrared emitter, and (2) incorporating light emitted from the at least one infrared emitter into an illumination pattern produced by the at least one light source.
14. An intelligent assistant device configured to respond to natural language input, comprising:
a housing;
a plurality of LEDs positioned around at least a portion of the housing;
a plurality of sensors including at least one camera and at least one microphone;
at least one speaker;
a logic machine; and
a storage machine holding instructions executable by the logic machine to:
receiving image data from the at least one camera indicating the presence of a person;
in response to receiving the image data, illuminating at least one LED of the plurality of LEDs to communicate detection of the presence of the person in a non-verbal manner;
receiving data indicative of contextual information of the person from one or more sensors of the plurality of sensors;
determining one or more contexts of the person using at least the data indicative of contextual information of the person; and
in response to determining the one or more contexts of the person, illuminating at least one LED of the plurality of LEDs to communicate the one or more contexts of the person in a non-verbal manner.
15. The intelligent assistant device of claim 14, wherein the person is a first person, and the instructions are executable to:
receiving, from one or more sensors of the intelligent assistant device, an indication of a presence of a second person; and
illuminating at least one light source located on the smart assistant device to communicate in a non-verbal manner that the smart assistant device is responsive to the natural language input from the first person.

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/936,076 2018-03-26
US15/936,076 US11010601B2 (en) 2017-02-14 2018-03-26 Intelligent assistant device communicating non-verbal cues
PCT/US2019/022836 WO2019190812A1 (en) 2018-03-26 2019-03-19 Intelligent assistant device communicating non-verbal cues

Publications (1)

Publication Number Publication Date
CN111919250A true CN111919250A (en) 2020-11-10

Family

ID=65995893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980022427.2A Pending CN111919250A (en) 2018-03-26 2019-03-19 Intelligent assistant device for conveying non-language prompt

Country Status (3)

Country Link
EP (1) EP3776537A1 (en)
CN (1) CN111919250A (en)
WO (1) WO2019190812A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BE1030875B1 (en) * 2022-09-13 2024-04-08 Niko Nv To automatically provide a list of preferred suggestions on a control device for control of electrical or electronic devices in a home or building

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120268604A1 (en) * 2011-04-25 2012-10-25 Evan Tree Dummy security device that mimics an active security device
GB2522922A (en) * 2014-02-11 2015-08-12 High Mead Developments Ltd Electronic guard systems
US9812128B2 (en) * 2014-10-09 2017-11-07 Google Inc. Device leadership negotiation among voice interface devices

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070192910A1 (en) * 2005-09-30 2007-08-16 Clara Vu Companion robot for personal interaction
US20130141587A1 (en) * 2011-12-02 2013-06-06 Robert Bosch Gmbh Use of a Two- or Three-Dimensional Barcode as a Diagnostic Device and a Security Device
US20150314454A1 (en) * 2013-03-15 2015-11-05 JIBO, Inc. Apparatus and methods for providing a persistent companion device
US20170032787A1 (en) * 2014-01-14 2017-02-02 Toyota Motor Engineering & Manufacturing North America, Inc. Smart necklace with stereo vision and onboard processing
CN105917404A (en) * 2014-01-15 2016-08-31 微软技术许可有限责任公司 Digital personal assistant interaction with impersonations and rich multimedia in responses
US20170039423A1 (en) * 2014-05-15 2017-02-09 Fenwal, Inc. Head mounted display device for use in a medical facility
CN107000210A (en) * 2014-07-15 2017-08-01 趣普科技公司 Apparatus and method for providing lasting partner device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任萍萍 (Ren Pingping): 《智能客服机器人》 [Intelligent Customer Service Robot], 成都时代出版社 (Chengdu Times Press), page 147 *

Also Published As

Publication number Publication date
WO2019190812A1 (en) 2019-10-03
EP3776537A1 (en) 2021-02-17

Similar Documents

Publication Publication Date Title
US11010601B2 (en) Intelligent assistant device communicating non-verbal cues
US10628714B2 (en) Entity-tracking computing system
US11908465B2 (en) Electronic device and controlling method thereof
KR102223693B1 (en) Detecting natural user-input engagement
CN111163906B (en) Mobile electronic device and method of operating the same
EP3523709B1 (en) Electronic device and controlling method thereof
CN115461811A (en) Multi-modal beamforming and attention filtering for multi-party interaction
CN111919250A (en) Intelligent assistant device for conveying non-language prompt
KR20210116838A (en) Electronic device and operating method for processing a voice input based on a gesture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination