CN111919250B - Intelligent assistant device for conveying non-verbal cues - Google Patents


Info

Publication number
CN111919250B
Authority
CN
China
Prior art keywords
person
assistant device
intelligent assistant
entity
contexts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201980022427.2A
Other languages
Chinese (zh)
Other versions
CN111919250A (en)
Inventor
S·N·巴蒂克
V·普拉德普
A·N·贝内特
D·G·奥尼尔
A·C·里德
K·J·卢克韦耶科
T·I·柯拉沃利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 15/936,076 (US11010601B2)
Application filed by Microsoft Technology Licensing LLC filed Critical Microsoft Technology Licensing LLC
Publication of CN111919250A
Application granted
Publication of CN111919250B
Legal status: Active


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An intelligent assistant device is configured to communicate non-verbal cues. Image data indicating the presence of a person is received from one or more cameras of the device. In response, one or more components of the device are actuated to communicate the presence of the person in a non-verbal manner. Data indicative of contextual information of the person is received from one or more sensors. Using at least this data, one or more contexts of the person are determined, and one or more components of the device are actuated to communicate the one or more contexts of the person in a non-verbal manner.

Description

Intelligent assistant device for conveying non-verbal cues
Background
An intelligent assistant device, such as a voice command device or "smart speaker" with its virtual assistant, may receive and process verbal queries and commands to provide intelligent assistance to a user. These devices are typically activated by speaking a keyword, and they provide a verbal response to a request via computerized speech broadcast to the user. However, these devices do not provide non-verbal communication in the absence of user commands or requests.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
A method for communicating non-verbal cues includes receiving, from one or more cameras of an intelligent assistant device, image data indicating the presence of a person. In response, one or more components of the device are actuated to communicate the presence of the person in a non-verbal manner. Data indicative of contextual information of the person is received from one or more sensors of the device. Using at least this data, one or more contexts of the person are determined. In response, one or more components of the intelligent assistant device are actuated to communicate the one or more contexts of the person in a non-verbal manner.
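To make the flow of this method concrete, the following is a minimal Python sketch of the loop just described. All names in it (detect_person, determine_contexts, actuate_components, process_frame) are hypothetical placeholders rather than any actual device API; the real device would drive light sources, displays, projectors, or movable parts instead of printing.

```python
# Minimal sketch of the non-verbal cue flow described above.
# All names are hypothetical placeholders, not an actual device API.

def detect_person(image_data):
    """Placeholder: return a person record if one is present in the image data."""
    return image_data.get("person")  # e.g. None or {"id": "person-1"}

def determine_contexts(person, sensor_data):
    """Placeholder: derive contexts (identity, location, state) from sensor data."""
    return sensor_data.get("contexts", [])

def actuate_components(components, cue):
    """Placeholder: drive light sources, displays, or movable parts non-verbally."""
    print(f"actuating {components} to convey: {cue}")

def process_frame(image_data, sensor_data, components):
    person = detect_person(image_data)
    if person is None:
        return
    # Non-verbally acknowledge that a person's presence was detected.
    actuate_components(components, cue="presence-detected")
    # Determine and non-verbally communicate the person's context(s).
    for context in determine_contexts(person, sensor_data):
        actuate_components(components, cue=context)

process_frame({"person": {"id": "person-1"}},
              {"contexts": ["location: left of device"]},
              components=["light ring"])
```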
Drawings
FIG. 1 illustrates an example environment for a smart assistant device in the form of an all-in-one computing device in accordance with one example of this disclosure.
Fig. 2 schematically illustrates one example of the intelligent assistant device of fig. 1, according to examples of the present disclosure.
Fig. 3 schematically illustrates another example of a smart assistant device according to an example of the present disclosure.
Fig. 4 schematically illustrates another example of a smart assistant device according to an example of the present disclosure.
Fig. 5A and 5B schematically illustrate another example of a smart assistant device according to an example of the present disclosure.
Fig. 6 schematically illustrates an example logical architecture for implementing an intelligent assistant system according to examples of this disclosure.
Fig. 7 schematically illustrates an entity tracking computing system that may determine the identity, location, and/or current state of one or more entities, according to an example of the present disclosure.
Fig. 8 schematically illustrates an entity tracking computing system receiving and interpreting sensor data over multiple time frames according to an example of the present disclosure.
Fig. 9 schematically illustrates one example of sensor confidence decay over time via an entity tracking computing system, according to an example of the present disclosure.
Fig. 10 schematically illustrates one example of identifying a person's speech using a trained voice recognition engine according to an example of the present disclosure.
Fig. 11A and 11B schematically illustrate a field of detection (FOD) of a sensor of a smart assistant device in an environment according to an example of the present disclosure.
Fig. 12A, 12B, and 12C illustrate a method for communicating non-verbal cues via a smart assistant device configured to respond to natural language input in accordance with an example of the present disclosure.
Fig. 13A and 13B schematically illustrate detection of entities in the FOD of the sensor.
Fig. 14 schematically illustrates an array of light sources on a smart assistant device according to an example of the present disclosure.
Fig. 15A-15D schematically illustrate a display device of a smart assistant device displaying an animated shape according to an example of the present disclosure.
Fig. 16 schematically illustrates an example in which two persons are detected by an intelligent assistant device according to an example of the present disclosure.
Fig. 17 is a schematic top view of the room of fig. 16, showing one example of a smart assistant device communicating the locations of two persons in a non-verbal manner, in accordance with examples of the present disclosure.
Fig. 18 schematically illustrates an example of an integrated computing device in which components of a smart assistant device embodying the present disclosure are arranged together in a stand-alone device, according to an example of the present disclosure.
Fig. 19 schematically illustrates a computing system according to an example of the present disclosure.
Detailed Description
As people seek greater convenience in daily life, intelligent assistant devices have become increasingly popular. As described above, such devices may perform tasks and services for users via convenient voice interactions. However, because these devices do not communicate their understanding of the user in a non-verbal manner, much information that could otherwise be conveyed goes uncommunicated.
Non-verbal communication is often used, both consciously and unconsciously, to express useful understanding when people interact with each other. For example, when Alice walks down a street and approaches another person, Bhavana, non-verbal cues from Bhavana may convey to Alice some of Bhavana's understanding about Alice. If Bhavana looks at Alice with a curious gaze and expression, she communicates to Alice that she may or may not recognize her. If Bhavana shows obvious delight and surprise upon seeing Alice, she conveys that she is excited to see her. If, on the other hand, Bhavana furrows her brow and alters her path to veer away from Alice, the information she conveys is quite different. Of course, many other types and forms of non-verbal communication (such as gestures, distance, etc.) may also provide wordless cues and signals.
Such non-verbal communication makes human-to-human interactions more informative and richer. Accordingly, the present disclosure relates to intelligent assistant devices and methods for communicating non-verbal information via such devices. The methods and techniques discussed herein are described primarily from the perspective of a stand-alone, all-in-one intelligent assistant device configured to respond to natural language input, for example by answering questions or performing actions. The intelligent assistant device includes an entity tracking computing system. In some examples, tracking of entities in the environment may be performed using only sensor inputs from the intelligent assistant device. In other examples, tracking of entities may be performed using a variety of intelligent assistant computing devices and/or other sensors, security devices, home automation devices, and the like.
Fig. 1 illustrates a person 2 entering a living room 4 that includes one example of an intelligent assistant device 10 in the form of an all-in-one computing device. As described in more detail below, in some examples the intelligent assistant device 10 may be configured to receive and process natural language input. A user may use the intelligent assistant device for a variety of functions. For example, the user may provide natural language input to ask the intelligent assistant device to perform various tasks, such as providing information, changing the state of a device, sending a message, completing a purchase, and so on.
The user may ask the system for information about a broad range of topics, such as the weather, personal calendar events, movie show times, and so on. In some examples, the intelligent assistant device 10 may also be configured to control elements in the living room 4, such as the television 6, the speakers 8 of a music system, or motorized shades 16. The intelligent assistant device 10 may also be used to receive and store messages and/or reminders for delivery at an appropriate future time. Using data received from sensors, the intelligent assistant device may track and/or communicate with one or more users or other entities. Additionally, and as described in more detail below, the intelligent assistant device 10 may communicate non-verbal information to a user via one or more light sources and/or other components of the device.
In some examples, the intelligent assistant device 10 may be operatively connected with one or more other computing devices using a wired connection, or may employ a wireless connection via Wi-Fi, Bluetooth, or any other suitable wireless communication protocol. For example, the intelligent assistant device 10 may be communicatively coupled to one or more other computing devices via a network. The network may take the form of a local area network (LAN), wide area network (WAN), wired network, wireless network, personal area network, or a combination thereof, and may include the Internet. Additional details regarding the components and computing aspects of the intelligent assistant device 10 are described in more detail below with reference to fig. 19.
Although, as described above, the intelligent assistant device may be operatively connected to other devices, in some examples the intelligent assistant device may perform the methods and techniques described herein entirely locally, via one or more processors on board the device. Advantageously, in these examples any latency, bandwidth limitations, and other drawbacks associated with exchanging data with a remote server or other device are eliminated. In this way, more real-time interactions and non-verbal communication with the user are possible.
Fig. 2 schematically illustrates one example implementation of a smart assistant device according to the present disclosure. In this example, the intelligent assistant device 10 is an all-in-one computing device that includes a variety of sensors, output devices, and other components. The device includes an intelligent assistant system 20 according to examples of the present disclosure, which is capable of recognizing and responding to natural language input. Additional description and details of the components and functions of the intelligent assistant system 20 are provided below.
In the example of fig. 2, the intelligent assistant device 10 includes a cylindrical housing 80, the cylindrical housing 80 housing a microphone 81, a camera 82, a speaker 83, and a plurality of light sources 84 located around at least a portion of the housing. In this example, the light source 84 comprises an LED. In other examples, one or more of the light sources 84 may include one or more display devices or any other suitable type of light source. Additionally and as described in more detail below, one or more of the light sources 84 may be illuminated and modulated to communicate information to a user in a non-verbal manner.
In different examples, the microphone 81 may include a plurality of microphones (such as a microphone array) arranged at various locations on the device. In this example, three cameras 82A, 82B, and 82C are shown, and a fourth camera (not visible) is located on the back side of the housing. In this example, the fields of view of the four cameras 82 overlap to enable the intelligent assistant device 10 to receive image data from the entire 360 degrees around the device. In other examples, fewer or more cameras may be used, as may configurations that provide a field of detection (FOD) of less than 360 degrees. Additional details regarding the various types of cameras, microphones, and other sensors that may be used with the intelligent assistant device 10 are provided below.
In other examples, one or more light sources in the form of display devices may be used in addition to or instead of LEDs. For example and with reference to fig. 3, another implementation of a smart assistant device 150 is schematically illustrated, the smart assistant device 150 including a display 152 around the perimeter of the housing 80. In this example, as described in the examples below, the display 152 may be used to display vector graphics 154 (such as various static or animated shapes, patterns, etc.) to communicate with the user in a non-verbal manner.
In other examples, the intelligent assistant device may also utilize one or more projectors to project non-verbal cues onto a surface in addition to or in lieu of using LEDs and/or one or more displays to provide non-verbal communication. For example and with reference to fig. 4, another implementation of the intelligent assistant device 158 is schematically illustrated, the intelligent assistant device 158 including a projector 180, the projector 180 being capable of projecting light onto a surface. In this example, projector 180 projects an image of circle 182 onto a surface 184 of a table on which the device is located. As described in more detail below, such projected light may create any number of static or animated shapes, patterns, icons, etc., that may be used to convey non-verbal cues to a user.
In other examples, in addition to or instead of using LEDs, one or more displays, and/or one or more projectors to provide non-verbal communication, the intelligent assistant device may actuate one or more other components to communicate information to the user in a non-verbal manner. For example, and with reference to figs. 5A and 5B, another implementation of a smart assistant device 186 is schematically illustrated; the smart assistant device 186 includes a movable top 188 that includes a camera 189. In this example, and as described in more detail below, the movable top 188 may be actuated to convey a non-verbal cue to the user. In some examples, the intelligent assistant device 186 may track the location of a person, and the movable top 188 may be moved around the perimeter of the device to follow the person's location and foveate the camera 189 on the person.
It will be appreciated that the example intelligent assistant devices 10, 150, 158, and 186 described and illustrated in fig. 2-5B are provided for illustrative purposes only, and that many other form factors, shapes, configurations, and other variations of such devices may be used and are within the scope of the present disclosure.
Referring now to fig. 6, the following is a description of an example logical architecture for implementing the intelligent assistant system 20, which is capable of recognizing and responding to natural language input, in accordance with an example of the present disclosure. As described in more detail below, in various examples the system 20 may be implemented in a single, all-in-one computing device (such as the intelligent assistant device 10), across two or more devices, in a cloud-supported network, or in a combination thereof.
In this example, the intelligent assistant system 20 includes at least one sensor 22, an entity tracking computing system 100, a voice listener 30, a parser 40, an intent processor 50, a commitment engine 60, and at least one output device 70. In some examples, the sensors 22 may include one or more microphones 24, a visible light camera 26, an infrared camera 27, and a connectivity device 28 (such as a Wi-Fi or Bluetooth module). In some examples, the sensor(s) 22 may include stereo and/or depth cameras, head trackers, eye trackers, accelerometers, gyroscopes, gaze detection devices, electric field sensing components, GPS or other location tracking devices, temperature sensors, device state sensors, and/or any other suitable sensor.
The entity tracking computing system 100 is configured to detect entities, including people, animals, or other biological and non-biological objects, and their activities. The entity tracking computing system 100 includes an entity identifier 104 configured to identify persons, individual users, and/or non-biological objects. The voice listener 30 receives audio data and uses speech recognition functionality to translate utterances into text. The voice listener 30 may also assign confidence value(s) to the translated text, and may perform speaker recognition to determine the identity of the person speaking and assign probabilities to the accuracy of such identifications. The parser 40 analyzes the text and confidence values received from the voice listener 30 to derive user intent and generate corresponding machine-executable language.
The intent processor 50 receives machine-executable language representing user intent from the parser 40, and resolves missing and ambiguous information to generate commitments. The commitment engine 60 stores commitments from the intent processor 50. At a contextually appropriate time, the commitment engine may deliver one or more messages and/or perform one or more actions associated with one or more commitments. The commitment engine 60 may store messages in a message queue 62 or cause one or more output devices 70 to generate output. The output devices 70 may include one or more of the following: speaker(s) 72, video display(s) 74, indicator light(s) 76, haptic device(s) 78, and/or other suitable output devices. In other examples, the output devices 70 may include one or more other devices or systems (e.g., home lighting, thermostats, media programs, door locks, etc.) that may be controlled via actions performed by the commitment engine 60.
In different examples, the voice listener 30, the parser 40, the intent processor 50, the commitment engine 60, and/or the entity tracking computing system 100 may be embodied in software that is stored in memory and executed by one or more processors of a computing device. In some implementations, specially programmed logic processors may be used to increase the computational efficiency and/or effectiveness of the intelligent assistant device. Additional details regarding the components and computing aspects of computing devices that may store and execute these modules are described in more detail below with reference to fig. 19.
In some examples, the voice listener 30 and/or the commitment engine 60 may receive context information from the entity tracking computing system 100, the context information including an associated confidence value. As described in more detail below, the entity tracking computing system 100 may determine the identity, location, and/or current state of one or more entities within range of one or more sensors, and may output such information to one or more other modules, such as the voice listener 30, the commitment engine 60, and so on. In some examples, the entity tracking computing system 100 may interpret and evaluate sensor data received from one or more sensors and may output context information based on that sensor data. The context information may include the entity tracking computing system's guesses/predictions of the identity, location, and/or state of one or more detected entities based on the received sensor data. In some examples, the guesses/predictions may additionally include a confidence value defining the statistical likelihood that the information is accurate.
Fig. 7 schematically illustrates an example entity tracking computing system 100 that, in some examples, may comprise a component of the intelligent assistant system 20. The entity tracking computing system 100 may be used to determine the identity, location, and/or current state of one or more entities within range of one or more sensors. The entity tracking computing system 100 may output such information to one or more other modules of the intelligent assistant system 20, such as the commitment engine 60, the voice listener 30, and so on.
The term "entity" as used in the context of entity tracking computing system 100 may refer to humans, animals, or other biological and non-biological objects. For example, the entity tracking computing system may be configured to identify furniture, appliances, autonomous robots, structures, landscape features, vehicles, and/or any other physical objects, and determine the location/position and current state of these physical objects. In some cases, entity tracking computing system 100 may be configured to identify only people, and not other living beings or non-living beings. In such cases, the word "entity" may be synonymous with the word "person" or "human".
The entity-tracking computing system 100 receives sensor data from one or more sensors 102 (such as sensor a 102A, sensor B102B, and sensor C102C), but it will be appreciated that the entity-tracking computing system may be used with any number and variety of suitable sensors. As an example, sensors that may be used with the entity tracking computing system may include cameras (e.g., visible light cameras, UV cameras, IR cameras, depth cameras, thermal cameras), microphones, directional microphone arrays, pressure sensors, thermometers, motion detectors, proximity sensors, accelerometers, global Positioning Satellite (GPS) receivers, magnetometers, radar systems, lidar systems, environmental monitoring devices (e.g., smoke detectors, carbon monoxide detectors), barometers, health monitoring devices (e.g., electrocardiographs, blood pressure meters, electroencephalograms), automotive sensors (e.g., speedometers, odometers, tachometers, fuel oil sensors), and/or any other sensor or device that collects and/or stores information related to the identity, location, and/or current state of one or more persons or other entities. In some examples, such as in the intelligent assistant device 10, the entity tracking computing system 100 may occupy a common device housing with one or more of the plurality of sensors 102. In other examples, the entity tracking computing system 100 and its associated sensors may be distributed across multiple devices configured to communicate via one or more network communication interfaces (e.g., wi-Fi adapter, bluetooth interface).
As shown in the example of fig. 7, the entity tracking computing system 100 may include an entity identifier 104, a person identifier 105, a location identifier 106, and a status identifier 108. In some examples, the person identifier 105 may be a dedicated component of the entity identifier 104 that is specifically optimized to identify humans as opposed to other living and non-living things. In other cases, the person identifier 105 may operate separately from the entity identifier 104, or the entity tracking computing system 100 may not include a dedicated person identifier.
Any or all of the functions associated with the entity identifier, person identifier, location identifier, and status identifier may be performed by the individual sensors 102A-102C, depending on the particular implementation. While the present description generally describes entity tracking computing system 100 as receiving data from sensors, this does not require entity identifier 104 and other modules of the entity tracking computing system to be implemented on a single computing device that is separate and distinct from the plurality of sensors associated with the entity tracking computing system. Rather, the functionality of the entity tracking computing system 100 may be distributed among multiple sensors or other suitable devices. For example, rather than sending raw sensor data to an entity tracking computing system, individual sensors may be configured to attempt to identify entities they detect and report the identification to other modules of entity tracking computing system 100 and/or intelligent assistant system 20. Furthermore, to simplify the following description, the term "sensor" is sometimes used to describe not only a physical measurement device (e.g., a microphone or camera), but also to describe various logic processors configured and/or programmed to interpret signals/data from the physical measurement device. For example, "microphone" may be used to refer to devices that convert acoustic energy into electrical signals, analog-to-digital converters that convert electrical signals into digital data, on-board application specific integrated circuits that pre-process digital data, and downstream modules described herein (e.g., entity tracking computing system 100, entity identifier 104, voice listener 30, or parser 40). As such, references to a generic "sensor" or a specific sensor (e.g., "microphone" or "camera") should not be construed to refer to physical measurement devices only, but rather to collaboration modules/engines that may be distributed across one or more computers.
Each of the entity identifier 104, the person identifier 105, the location identifier 106, and the status identifier 108 is configured to interpret and evaluate sensor data received from the plurality of sensors 102 and output the context information 110 based on the sensor data. The context information 110 may include a guess/prediction of the identity, location, and/or status of one or more detected entities by the entity tracking computing system based on the received sensor data. As will be described in more detail below, each of the entity identifier 104, person identifier 105, location identifier 106, and status identifier 108 may output their predictions/identifications as well as confidence values.
The entity identifier 104, person identifier 105, location identifier 106, status identifier 108, and other processing modules described herein may utilize one or more machine learning techniques. Non-limiting examples of such machine learning techniques include feedforward networks, recurrent neural networks (RNNs), long short-term memory (LSTM) networks, convolutional neural networks, support vector machines (SVMs), generative adversarial networks (GANs), variational autoencoders, Q-learning, and decision trees. The various identifiers, engines, and other processing blocks described herein may be trained using these or any other suitable machine learning techniques, via supervised and/or unsupervised learning, to make the described evaluations, decisions, identifications, and so on.
The entity identifier 104 may output an entity identity 112 of a detected entity, and such an entity identity may have any suitable degree of specificity. In other words, based on the received sensor data, the entity tracking computing system 100 may predict the identity of a given entity and output information such as the entity identity 112. For example, the entity identifier 104 may report that a particular entity is a person, a piece of furniture, a dog, or the like. Additionally or alternatively, the entity identifier 104 may report that a particular entity is an oven of a particular model; a pet dog with a specific name and breed; or an owner or known user of the intelligent assistant device 10, where the owner/known user has a specific name and profile. In different examples, the entity may be identified in any of a variety of suitable ways, potentially involving facial recognition, voice recognition, detecting the presence of a portable computing device associated with a known entity, evaluating a person's height, weight, body shape, gait, hairstyle, and/or shoulder shape, and so on.
In some examples, the entity identifier 104 may determine two or more levels of identity of a person. Such identity levels may correspond to one or more identity certainty thresholds represented by confidence values. For example, such identity levels may include an initial identity that corresponds to a previously identified person and is represented by an initial confidence value, and a verified identity that is represented by a verified confidence value greater than the initial confidence value that the person is the previously identified person. For example, an initial identity of a person may be determined with an associated confidence value corresponding to at least a 99.0000% likelihood that the person is the previously identified person. A verified identity of the person may be determined with an associated confidence value corresponding to at least a 99.9990% likelihood that the person is the previously identified person. A verified identity may be required, for example, to authenticate a person at an enterprise security level before granting access to particularly sensitive data, such as bank accounts, confidential corporate information, health-related information, and the like. In some examples, the degree of specificity with which the entity identifier 104 identifies/classifies detected entities may depend on one or more of user preferences and sensor limitations. In some cases, the entity identity output by the entity identifier may simply be a generic identifier that provides no information about the nature of the tracked entity, but is instead used to distinguish one entity from another.
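As one illustration of the two-level identity scheme described above, the following sketch maps a confidence value to an identity level using the example thresholds from this paragraph (99.0000% and 99.9990%). The threshold constants and function name are illustrative assumptions, not part of the disclosed system.

```python
INITIAL_ID_THRESHOLD = 0.990000   # example threshold from the description above
VERIFIED_ID_THRESHOLD = 0.999990  # stricter threshold for sensitive operations

def identity_level(confidence: float) -> str:
    """Map a confidence value to an identity level (illustrative only)."""
    if confidence >= VERIFIED_ID_THRESHOLD:
        return "verified-identity"   # sufficient for e.g. bank or health data
    if confidence >= INITIAL_ID_THRESHOLD:
        return "initial-identity"    # person matches a previously identified person
    return "unknown"

print(identity_level(0.9995))  # "initial-identity"
```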
When applied to a person, the entity tracking computing system 100 may in some cases gather information about the person that cannot be identified by name. For example, the entity identifier 104 may record images of a person's face and associate those images with the audio of the recorded person's voice. If the person subsequently speaks or otherwise addresses the intelligent assistant system 20, the entity tracking computing system 100 will have at least some information about who the intelligent assistant device is interacting with. In some examples, intelligent assistant system 20 may also prompt the person to state his name in order to more easily identify the person in the future.
In some examples, the intelligent assistant device 10 may utilize the identity of a person to customize a user interface for that person. In one example, users with limited visual capabilities may be identified. In this example and based on the identification, the display of the intelligent assistant device 10 (or other device interacting with the user) may be modified to display larger text or provide a voice-only interface.
The location identifier 106 may be configured to output an entity location (i.e., position) 114 of a detected entity. In other words, the location identifier 106 may predict the current location of a given entity based on collected sensor data and output information such as the entity location 114. As with the entity identity 112, the entity location 114 may have any suitable level of detail, and this level of detail may vary with user preferences and/or sensor limitations. For example, the location identifier 106 may report that a detected entity has a two-dimensional location defined on a plane such as a floor or a wall. In some examples, the entity location 114 may be determined relative to the intelligent assistant device, such as an angular direction or a distance from the device. Additionally or alternatively, the reported entity location 114 may include a three-dimensional location of the detected entity within a real-world three-dimensional environment. In some examples, the entity location 114 may include a GPS location, a location within an environment-relative coordinate system, and the like.
The reported entity location 114 for the detected entity may correspond to a geometric center of the entity, a particular portion of the entity that is classified as important (e.g., a person's head), a series of bounds defining a boundary of the entity in three-dimensional space, and so forth. The location identifier 106 may further calculate one or more additional parameters describing the location and/or orientation of the detected entity, such as pitch, roll, and/or yaw parameters. In other words, the reported position of the detected entity may have any number of degrees of freedom and may include any number of coordinates defining the position of the entity in the environment. In some examples, the entity location 114 of the detected entity may be reported even if the entity tracking computing system 100 is unable to identify the entity and/or determine the current state of the entity.
The status identifier 108 may be configured to output an entity state 116 of a detected entity. In other words, the entity tracking computing system 100 may be configured to predict the current state of a given entity based on received sensor data and output information such as the entity state 116. An "entity state" may refer to virtually any measurable or classifiable attribute, activity, or behavior of a given entity. For example, when applied to a person, the entity state of the person may indicate the presence of the person, the height of the person, the posture of the person (e.g., standing, sitting, lying down), the speed at which the person is walking/running, the current activity of the person (e.g., sleeping, watching television, working, playing a game, swimming, making a phone call), the current mood of the person (e.g., by evaluating the person's facial expression or tone of voice), biological/physiological parameters of the person (e.g., heart rate, respiration rate, blood oxygen saturation, body temperature, neural activity), whether the person has any current or upcoming calendar events/appointments, and so on. "Entity state" may refer to additional/alternative properties or behaviors when applied to other biological or non-biological objects, such as the current temperature of an oven or kitchen sink, whether a device (e.g., television, lamp, microwave) is turned on, whether a door is open, and the like.
In some examples, the status identifier 108 may use sensor data to calculate various different biological/physiological parameters of a person. This may be accomplished in a variety of suitable ways. For example, the entity tracking computing system 100 may be configured to interface with an optical heart rate sensor, a pulse oximeter, a sphygmomanometer, an electrocardiograph, and the like. Additionally or alternatively, the status identifier 108 may be configured to interpret data from one or more cameras and/or other sensors in the environment and process that data to calculate a person's heart rate, respiration rate, blood oxygen saturation, and so on. For example, the status identifier 108 may be configured to use Eulerian magnification and/or similar techniques to amplify small motions or changes captured by the cameras, allowing the status identifier to visualize blood flow through a person's circulatory system and calculate associated physiological parameters. Such information may be used, for example, to determine when a person is asleep, exercising, in distress, experiencing a health problem, and so on.
After determining one or more of the entity identity 112, the entity location 114, and the entity state 116, such information may be sent as context information 110 to any of a variety of external modules or devices, where it may be used in a variety of ways. For example, and as described in more detail below, the context information 110 may be used to determine one or more contexts of a human user and to actuate one or more components of the intelligent assistant device to communicate the one or more contexts to the user in a non-verbal manner. Additionally, the context information 110 may be used by the commitment engine 60 to manage commitments and associated messages and notifications. In some examples, the context information 110 may be used by the commitment engine 60 to determine whether a particular message, notification, or commitment should be executed and/or presented to the user. Similarly, the context information 110 may be used by the voice listener 30 when interpreting human speech or activating functions in response to a keyword trigger.
As described above, in some examples, the entity tracking computing system 100 may be implemented in a single computing device (such as the intelligent assistant device 10). In other examples, one or more functions of entity tracking computing system 100 may be distributed across multiple computing devices working in concert. For example, one or more of entity identifier 104, person identifier 105, location identifier 106, and status identifier 108 may be implemented on different computing devices while still collectively comprising an entity tracking computing system configured to perform the functions described herein. As described above, any or all of the functions of the entity tracking computing system may be performed by a separate sensor 102. Further, in some examples, entity tracking computing system 100 may omit one or more of entity identifier 104, person identifier 105, location identifier 106, and status identifier 108, and/or include one or more additional components not described herein, while still providing context information 110. Additional details regarding the components and computing aspects that may be used to implement entity tracking computing system 100 are described in greater detail below with respect to FIG. 19.
Each of the entity identity 112, entity location 114, and entity state 116 may take any suitable form. For example, each of the entity identity 112, location 114, and state 116 may take the form of a discrete data packet that includes a series of values and/or labels describing the information gathered by the entity tracking computing system. Each of the entity identity 112, location 114, and state 116 may additionally include a confidence value defining the statistical likelihood that the information is accurate. For example, if the entity identifier 104 receives sensor data strongly indicating that a particular entity is a human male named "John Smith", the entity identity 112 may include this information along with a correspondingly high confidence value, such as 90% confidence. If the sensor data is more ambiguous, the confidence value included in the entity identity 112 may be relatively low, such as 62%. In some examples, separate predictions may be assigned separate confidence values. For example, the entity identity 112 may indicate with 95% confidence that a particular entity is a human male, and with 70% confidence that the entity is John Smith. Such confidence values (or probabilities) may be used by a cost function to generate a cost calculation for providing messages or other notifications to a user and/or performing action(s).
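The discrete data packets described above might be represented, for example, as simple records carrying a value plus a confidence. The field names and types below are illustrative assumptions, not the patent's actual data format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class EntityIdentity:
    label: str              # e.g. "human male", "John Smith", or a generic ID
    confidence: float       # statistical likelihood the label is accurate

@dataclass
class EntityLocation:
    position: Tuple[float, float, float]  # e.g. 3D position, or angle/distance from the device
    confidence: float

@dataclass
class EntityState:
    attributes: dict        # e.g. {"posture": "standing", "activity": "watching TV"}
    confidence: float

@dataclass
class ContextInformation:
    identity: Optional[EntityIdentity] = None
    location: Optional[EntityLocation] = None
    state: Optional[EntityState] = None

# Separate predictions can carry separate confidence values, e.g.:
coarse = EntityIdentity("human male", 0.95)
fine = EntityIdentity("John Smith", 0.70)
```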
In some implementations, the entity tracking computing system 100 may be configured to combine or fuse data from multiple sensors in order to determine the context information 110 and corresponding contexts and output more accurate predictions. As an example, a camera may locate a person in a particular room. Based on the camera data, the entity tracking computing system 100 may identify the person with a confidence value of 70%. However, the entity tracking computing system 100 may additionally receive recorded speech from a microphone. Based solely on the recorded speech, the entity tracking computing system 100 may identify the person with a 60% confidence value. By combining data from the camera with data from the microphone, the entity tracking computing system 100 may identify a person with a higher confidence value than using only data from either sensor. For example, the entity tracking computing system may determine that the recorded speech received from the microphone corresponds to movement of lips of a person visible to the camera when the speech is received, thereby concluding with a relatively high confidence (such as 92%) that the person visible to the camera is the person speaking. In this way, the entity tracking computing system 100 may combine two or more predicted confidence values to identify a person with a combined higher confidence value.
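The fusion rule itself is not specified here. One common, simple possibility for combining two independent confidence estimates is a "noisy-OR" combination, sketched below as an illustrative assumption; it yields 0.88 for the 0.70/0.60 example rather than the 92% quoted above, which may reflect additional evidence such as the lip-movement correspondence.

```python
def fuse_independent_confidences(p_camera: float, p_microphone: float) -> float:
    """Noisy-OR fusion of two independent identification confidences.

    One simple illustrative rule; the entity tracking computing system
    could use any suitable fusion method.
    """
    return 1.0 - (1.0 - p_camera) * (1.0 - p_microphone)

print(round(fuse_independent_confidences(0.70, 0.60), 2))  # 0.88
```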
In some examples, data received from various sensors may be weighted differently depending on the reliability of the sensor data. This may be particularly relevant when multiple sensors output data that appears to be inconsistent. In some examples, the reliability of sensor data may be based at least in part on the type of data generated by the sensor. For example, in some implementations the reliability of video data may be weighted higher than the reliability of audio data, because the presence of an entity on camera may be a more reliable indicator of its identity, location, and/or state than recorded sounds that are assumed to originate from that entity. It should be appreciated that the reliability of the sensor data is a factor distinct from the confidence value associated with the accuracy of a prediction for a given data instance. For example, several instances of video data may have different confidence values based on the different contextual factors present in each instance. In general, however, each of these instances of video data may be associated with a single reliability value for video data.
In one example, data from a camera may suggest that a particular person is in the kitchen with a confidence value of 70%, such as via facial recognition analysis. Data from the microphone may suggest that the same person is in a nearby corridor with a 75% confidence value, such as via voice recognition analysis. Even though the instance of microphone data has a higher confidence value, the entity tracking computing system 100 may output a prediction of a person in the kitchen based on the higher reliability of the camera data (as compared to the lower reliability of the microphone data). In this way, and in some examples, different reliability values for different sensor data may be used with confidence values to coordinate conflicting sensor data and determine an identity, location, and/or status of an entity.
Additionally or alternatively, more weight may be given to a sensor with greater accuracy, more processing power, or other greater capability. For example, professional-grade cameras may have significantly improved lens, image sensor, and digital image processing capabilities compared to basic web cameras (webcams) in notebook computers. Thus, higher weight/reliability values may be given to video data received from professional-grade cameras than webcams, as such data may be more accurate.
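One simple way to realize the reliability weighting described above is to scale each sensor's per-instance confidence by a per-sensor reliability value and keep the highest-scoring report. This is a sketch under that assumption, not a prescribed formula, and the numeric reliability values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class LocationReport:
    sensor: str
    location: str
    confidence: float   # per-instance confidence (varies with context)
    reliability: float  # per-sensor reliability (roughly constant per data type)

def resolve_conflict(reports):
    """Pick the report with the highest reliability-weighted confidence."""
    return max(reports, key=lambda r: r.reliability * r.confidence)

reports = [
    LocationReport("camera", "kitchen", confidence=0.70, reliability=0.90),
    LocationReport("microphone", "hallway", confidence=0.75, reliability=0.60),
]
print(resolve_conflict(reports).location)  # "kitchen": camera data weighted higher
```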
Referring now to fig. 8, in some examples, a separate sensor used with the entity tracking computing system 100 may output data at a different frequency than other sensors used with the entity tracking computing system. Similarly, sensors used with the entity tracking computing system 100 may output data at a frequency different from the frequency at which the entity tracking computing system evaluates data and outputs context information. In the example of fig. 8, the entity tracking computing system 100 may receive and interpret sensor data over a plurality of time frames 200A, 200B, and 200C. A single time frame may represent any suitable length of time, such as 1/30 seconds, 1/60 seconds, etc.
In this example, during time frame 200A, the entity tracking computing system 100 receives a set of sensor data 202, which includes sensor A data 204A, sensor B data 204B, and sensor C data 204C. Such sensor data is interpreted and transformed by the entity tracking computing system 100 into context information 206, which may be used to determine the identity, location, and/or state of one or more detected entities as described above. During time frame 200B, the entity tracking computing system 100 receives sensor data 208, which includes sensor A data 210A and sensor B data 210B. The entity tracking computing system 100 does not receive data from sensor C during time frame 200B, because sensor C outputs data at a different frequency than sensors A and B. Similarly, the entity tracking computing system 100 does not output context information during time frame 200B, because the entity tracking computing system outputs context information at a different frequency than sensors A and B.
During time frame 200C, the entity tracking computing system 100 receives sensor data 212, which includes sensor A data 214A, sensor B data 214B, sensor C data 214C, and sensor D data 214D. The entity tracking computing system 100 also outputs context information 216 during time frame 200C. Because context information was last output in time frame 200A, the context information 216 may be based on any or all of the sensor data received by the entity tracking computing system since then. In other words, the context information 216 may be based at least in part on the sensor data 208 and the sensor data 212. In some examples, the context information 216 may also be based at least in part on the sensor data 202, in addition to the sensor data 208 and the sensor data 212.
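Because sensors report at different rates than the tracker outputs context information, the tracker effectively keeps the freshest reading from each sensor and notes how stale it is. The following is a minimal sketch of that buffering under assumed names; it is not the patent's data structure.

```python
class SensorBuffer:
    """Keeps the most recent reading from each sensor, tagged with its time frame."""

    def __init__(self):
        self.latest = {}  # sensor name -> (time_frame, data)

    def ingest(self, sensor: str, time_frame: int, data) -> None:
        self.latest[sensor] = (time_frame, data)

    def frames_since_report(self, sensor: str, current_frame: int) -> float:
        """How many frames ago this sensor last reported (infinite if never)."""
        frame, _ = self.latest.get(sensor, (float("-inf"), None))
        return current_frame - frame

# Example: sensor C reports less often than sensors A and B.
buf = SensorBuffer()
buf.ingest("A", time_frame=0, data="a0"); buf.ingest("C", time_frame=0, data="c0")
buf.ingest("A", time_frame=1, data="a1")
print(buf.frames_since_report("C", current_frame=1))  # 1 frame stale
```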
As shown in fig. 8, after the entity tracking computing system 100 receives data from a particular sensor, multiple time frames may pass before the entity tracking computing system receives more data from that same sensor. During these multiple time frames, the entity tracking computing system 100 may output context information. Similarly, the usefulness of data received from a particular sensor may vary from time frame to time frame. For example, at a first time frame the entity tracking computing system 100 may receive, via a microphone, audio data of a particular person speaking, and may therefore identify the person's entity location 114 with a relatively high confidence value. In a subsequent time frame, the person may remain at the identified location but may have stopped speaking since the first time frame. In this case, the absence of useful data from the microphone is not a reliable indicator that the person is absent. Similar issues can occur with other types of sensors. For example, if a person covers his or her face, or is occluded by an obstacle such as another person or a moving object, the camera may lose track of the person. In this case, although the current camera data may not suggest the presence of the person, earlier instances of camera data may suggest that the person is still located at the previously identified location. In general, while sensor data may reliably indicate the presence of an entity, such data may be less reliable when suggesting that an entity is absent.
Thus, the entity-tracking computing system 100 may utilize one or more confidence decay functions, which may be defined by the entity-tracking computing system and/or the sensor itself in different examples. A confidence decay function may be applied to the sensor data to reduce the confidence that an entity tracks the computing system to data from a particular sensor over time since the last positive detection of the entity by the sensor. As an example, after a sensor detects an entity at a particular location, the entity tracking computing system 100 may report context information 110 indicating that the entity is located at the location with a relatively high confidence. If after one or more time frames the sensor no longer detects an entity at the location, and unless conflicting evidence is subsequently collected, the entity tracking computing system 100 may still report that the entity is at the location, but with a low confidence. The likelihood that an entity is still located at the location gradually decreases as time continues since the last time the sensor detected the entity at the location. Thus, the entity tracking computing system 100 may utilize a confidence decay function to gradually decrease the confidence value of its reported context information 110, eventually reaching a confidence of 0% if no additional sensors detect an entity.
In some cases, different confidence decay functions may be used for different sensors and sensor types. The selection of a particular decay function may depend, at least in part, on the particular properties of the sensor. For example, the confidence value associated with the data from the camera may decay faster than the confidence value associated with the data from the microphone because the absence of an entity in the video frame is a more reliable indicator of the absence of an entity than the silence of the microphone recording.
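A confidence decay of this kind might, for example, be an exponential falloff with a per-sensor-type rate, faster for cameras than for microphones as described above. The sketch below is an illustrative assumption; the decay rates are invented and only loosely echo the 90%/80%/60% progression of fig. 9.

```python
import math

# Illustrative per-sensor-type decay rates (assumptions): confidence in
# camera-based detections decays faster than in microphone-based detections.
DECAY_RATE = {"camera": 0.20, "microphone": 0.05}  # per time frame

def decayed_confidence(initial_confidence: float,
                       frames_since_detection: int,
                       sensor_type: str) -> float:
    """Exponentially decay confidence since the last positive detection."""
    rate = DECAY_RATE.get(sensor_type, 0.10)
    return initial_confidence * math.exp(-rate * frames_since_detection)

# Camera confidence drops off more quickly than microphone confidence:
print(round(decayed_confidence(0.90, 3, "camera"), 2))      # ~0.49
print(round(decayed_confidence(0.90, 3, "microphone"), 2))  # ~0.77
```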
One example of sensor confidence decay is schematically illustrated in fig. 9, which shows the entity tracking computing system 100 receiving sensor data during three different time frames 300A, 300B, and 300C. During time frame 300A, the entity tracking computing system 100 receives camera data 302 in which an entity is visible in the frame. Based on this data, the entity tracking computing system 100 reports an entity location 304 with a 90% confidence value. In time frame 300B, the entity tracking computing system 100 receives camera data 306 in which the entity is no longer visible in the frame. However, the entity may not have moved; it may simply be occluded or otherwise undetectable by the camera. Accordingly, the entity tracking computing system 100 reports the same entity location 304, but with a lower confidence value of 80%.
Finally, in time frame 300C, the entity tracking computing system 100 receives camera data 310 indicating that the entity is still not visible in the frame. As time passes, the likelihood that the entity remains in the same location becomes smaller and smaller. Accordingly, the entity tracking computing system 100 reports the same entity location 304 with a still lower confidence value of 60%.
In some examples, the variable reliability of sensor data may be at least partially addressed by utilizing data filtering techniques. In some examples, a Kalman filter may be used to filter the sensor data. A Kalman filter is a mathematical function that can combine multiple uncertain measurements and output a prediction with higher confidence than would be possible using any individual measurement. Each measurement input to the Kalman filter may be weighted based on the measurement's perceived reliability. A Kalman filter operates in a two-step process comprising a prediction step and an update step. During the prediction step, the filter outputs a prediction based on recent weighted measurements. During the update step, the filter compares its prediction to an actual observation or state and dynamically adjusts the weights applied to each measurement in order to output more accurate predictions.
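As a concrete illustration of the two-step predict/update cycle, here is a minimal one-dimensional Kalman filter sketch, e.g., tracking a person's position along one axis. It is a generic textbook form, not the patent's specific implementation; the measurement variance passed to each update can be set per sensor to reflect its reliability.

```python
class Kalman1D:
    """Minimal 1D Kalman filter: constant-position model with noisy measurements."""

    def __init__(self, initial_estimate: float, initial_variance: float,
                 process_noise: float):
        self.x = initial_estimate   # current state estimate (e.g. position)
        self.p = initial_variance   # estimate uncertainty
        self.q = process_noise      # how much the true state may drift per step

    def predict(self) -> float:
        # Prediction step: state assumed unchanged, uncertainty grows.
        self.p += self.q
        return self.x

    def update(self, measurement: float, measurement_variance: float) -> float:
        # Update step: weight the measurement by its reliability (lower variance
        # means higher weight) and correct the prediction.
        k = self.p / (self.p + measurement_variance)   # Kalman gain
        self.x += k * (measurement - self.x)
        self.p *= (1.0 - k)
        return self.x

# Example: fuse a more reliable camera bearing with a less reliable microphone bearing.
kf = Kalman1D(initial_estimate=0.0, initial_variance=1.0, process_noise=0.01)
kf.predict(); kf.update(measurement=2.0, measurement_variance=0.5)   # camera
kf.predict(); kf.update(measurement=2.6, measurement_variance=2.0)   # microphone
print(round(kf.x, 2))
```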
In some examples, the entity tracking computing system 100 may include a Kalman filter that combines data from various sensors to compensate for lower sensor reliability, such as when sensor confidence values have decayed over time since the last positive detection. In some examples, the entity tracking computing system 100 may apply a Kalman filter to the sensor data when one or more sensor confidence values fall below a predetermined threshold. In an example scenario, image data from a camera may be analyzed using face detection techniques to reliably detect a person in a particular room. In response, the entity tracking computing system 100 may report with high confidence that the person is located in that room.
In subsequent time frames, the camera may no longer be able to capture and/or positively recognize the person's face in the room. For example, the person's face may be occluded, or the camera may transmit data at a much lower frequency than that at which the entity tracking computing system 100 outputs context information 110. If the entity tracking computing system 100 relied solely on data from the camera, the confidence value of the person's reported location would gradually decrease until the next positive detection. However, in some examples, data from the camera may be supplemented with data from other sensors. For example, during a subsequent time frame, a microphone may report that it hears the person's voice in the room, or another sensor may report that it detects the presence of the person's portable computing device in the room. In such cases, this data may be assigned weights by the Kalman filter and may be used to predict the person's current location with higher confidence than if only camera data were used.
In some cases, detection of people and/or other entities in the environment can become more complicated when sensor data is contaminated with background information. Such background information may compromise the confidence with which the entity tracking computing system 100 reports the entity identity 112, location 114, and/or state 116. For example, the intelligent assistant device 10 may need to determine the identity of the person who is speaking in order to respond appropriately to a query or command. Such a determination may be difficult when multiple people are speaking at the same time, a television is playing, a noisy machine is running, and so on.
Accordingly, the entity tracking computing system 100 may use various audio processing techniques to more confidently identify particular active participants in conversations with other people and/or with the intelligent assistant device 10. As one example, the entity tracking computing system 100 may implement a Voice Activity Detection (VAD) engine that may distinguish human voice from ambient noise and identify the presence or absence of human speech.
A general-purpose VAD engine may be used to classify a particular audio segment as including either speech or non-speech, with a corresponding confidence value. The entity tracking computing system 100 may also utilize a speaker recognition engine to match particular audio segments to particular persons. As more speech is received, the speaker recognition engine may be incrementally tuned to classify audio as including or excluding speech from a particular conversation participant. In this manner, the entity tracking computing system 100 may recognize speech from one or more particular people/conversation participants.
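The following Python fragment is a deliberately simplified, energy-based sketch of the kind of speech/non-speech decision a VAD engine might output; the feature choices and thresholds are assumptions and do not reflect any particular VAD implementation.

```python
def classify_segment(energy, speech_band_ratio, energy_threshold=0.02, ratio_threshold=0.6):
    """Label an audio segment as speech or non-speech with a rough confidence value.

    'energy' is the segment's mean signal energy and 'speech_band_ratio' the fraction of
    that energy falling in typical speech frequencies; both thresholds are illustrative.
    """
    is_speech = energy > energy_threshold and speech_band_ratio > ratio_threshold
    confidence = speech_band_ratio if is_speech else 1.0 - speech_band_ratio
    return ("speech" if is_speech else "non-speech", round(min(confidence, 1.0), 2))

print(classify_segment(energy=0.08, speech_band_ratio=0.85))   # ('speech', 0.85)
print(classify_segment(energy=0.01, speech_band_ratio=0.20))   # ('non-speech', 0.8)
```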
Training of the speaker recognition engine may occur whenever the entity tracking computing system 100 confidently identifies a particular person and records audio that may be confidently attributed to that person. For example, using camera data, the entity tracking computing system 100 may identify a particular person and determine that the person's lips are moving. The entity tracking computing system 100 may simultaneously receive audio from a microphone, and that audio may safely be assumed to include speech from the identified person. Thus, the received audio may be used to retrain the speaker recognition engine to more specifically recognize the identified person's voice.
In some cases, such retraining may occur only when the person has been identified with a high confidence value (such as a confidence value exceeding a predetermined threshold), e.g., via accurate facial recognition or any other method, and when the entity tracking computing system 100 has received an audio recording of the person's voice with high volume/amplitude and a high signal-to-noise ratio (S/N). Using this technique, the entity tracking computing system 100 may accumulate a variety of person-specific voice models, allowing the entity tracking computing system to more consistently identify speech from particular people and ignore background noise.
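One possible gating rule for such retraining is sketched below in Python; the particular thresholds for face-recognition confidence, amplitude, and signal-to-noise ratio are assumed values, since the disclosure specifies only that they be "high."

```python
FACE_CONFIDENCE_THRESHOLD = 0.85   # assumed value for a "high confidence" identification
MIN_SNR_DB = 15.0                  # assumed minimum signal-to-noise ratio
MIN_AMPLITUDE = 0.3                # assumed minimum normalized volume/amplitude

def maybe_collect_training_clip(face_confidence, lips_moving, clip_snr_db, clip_amplitude,
                                person_id, clip, training_sets):
    """Add an audio clip to a person's voice-model training set only when attribution is safe."""
    if (face_confidence >= FACE_CONFIDENCE_THRESHOLD and lips_moving
            and clip_snr_db >= MIN_SNR_DB and clip_amplitude >= MIN_AMPLITUDE):
        training_sets.setdefault(person_id, []).append(clip)
        return True   # clip accepted for retraining
    return False      # attribution not confident enough; discard for training purposes
```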
Referring now to FIG. 10, an example of using a trained speaker recognition engine to recognize speech from a particular person is schematically illustrated. In this example, the entity tracking computing system 100 receives two speech segments 400A and 400B. Speech segment 400A includes recorded speech of person 1, and speech segment 400B includes recorded speech of person 2. The entity tracking computing system 100 includes a speaker recognition engine 402 that, as described above, has been trained to recognize the speech of person 1 using a voice 1 model 404. The voice 1 model 404 may be applied to each of the speech segments 400A and 400B as they are received by the entity tracking computing system 100.
After processing the speech segments, entity tracking computing system 100 outputs a prediction of the likelihood that each speech segment corresponds to person 1. As shown, for speech segment 400A, the entity tracking computing system outputs person 1 identification 404A with a 90% confidence value, indicating that the speech segment may include speech from person 1. For speech segment 400B, the entity tracking computing system outputs person 1 identification 404B with a 15% confidence value, indicating that speech segment 400B may not include speech from person 1.
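For illustration only, scoring a segment against a person-specific voice model might resemble the following sketch, in which each voice model is reduced to a single embedding vector and compared by cosine similarity; the embeddings and the resulting scores are arbitrary and are not intended to reproduce the 90% and 15% values of FIG. 10.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical enrolled "voice 1 model" embedding and embeddings of two incoming segments.
voice_1_model = [0.90, 0.10, 0.40]
segment_400a = [0.88, 0.12, 0.41]   # person 1 speaking
segment_400b = [0.10, 0.95, 0.20]   # person 2 speaking

for name, segment in (("400A", segment_400a), ("400B", segment_400b)):
    score = cosine_similarity(voice_1_model, segment)
    print(f"segment {name}: person 1 confidence ~ {score:.2f}")
```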
In some examples, the entity tracking computing system 100 may be configured to identify background noise present in the environment and use audio processing techniques to subtract such background noise from the received audio data. For example, a particular device in a person's home may be playing background audio, such as music or television/movie dialogue. Various microphone-equipped devices in the person's home may record such audio. Where such microphone-equipped devices include the intelligent assistant device 10 and/or provide audio data to the entity tracking computing system 100, such background audio may compromise the ability of the system to recognize, interpret, and/or respond to human questions or commands.
Accordingly and in some examples, the device playing the background audio and/or another microphone-equipped device recording the background audio may send the captured audio signal to the entity-tracking computing system 100. In this way, the entity tracking computing system 100 may subtract background audio from the audio signal received by the microphone-equipped device. In some examples, the operation of subtracting the background audio signal from the recorded audio data may be performed by the device(s) capturing the audio data or by an associated audio processing component before the audio data is sent to the entity tracking computing system 100.
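A minimal sketch of this subtraction, assuming the recorded signal and the reported background signal are already time-aligned and identically sampled, is shown below; a real system would additionally estimate delay and gain, which is omitted here.

```python
def subtract_background(recorded_samples, background_samples, gain=1.0):
    """Remove a known, time-aligned background signal from a microphone recording."""
    return [r - gain * b for r, b in zip(recorded_samples, background_samples)]

# Toy example: a recording contaminated by background audio reported by the playback device.
recorded = [0.50, 0.20, -0.10, 0.40]
background = [0.30, 0.10, -0.20, 0.30]
print(subtract_background(recorded, background))   # residual signal attributable to the speaker
```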
Additionally or alternatively, the device and/or the entity tracking computing system 100 may be trained to identify particular sources of background noise (e.g., from a vent or a refrigerator) and automatically ignore waveforms corresponding to such noise in recorded audio. In some examples, the entity tracking computing system 100 may include one or more audio recognition models trained to recognize background noise. For example, audio from various noise databases may be processed using supervised or unsupervised learning algorithms in order to more consistently recognize such noise. By allowing the entity tracking computing system 100 to recognize irrelevant background noise, the ability of the entity tracking computing system to recognize relevant human speech and other sounds may be improved. In some implementations, knowledge of the location of a sound source may be used to focus the listening of a directional microphone array.
As described above, in some cases, an intelligent assistant device as described herein may be configured to track people or other entities as they move through an environment. This may be accomplished, for example, by interpreting data received from a plurality of sensors communicatively coupled to the intelligent assistant device. In some examples, the intelligent assistant device may track one or more entities by maintaining an environment-related coordinate system to which a field of detection (FOD) of each of the plurality of sensors is mapped. As used herein, an "environment" may refer to any real-world area (e.g., a single room, house, apartment, store, office, building, venue, outdoor space, grid area, etc.).
Referring now to fig. 11A and 11B, environment 4 of fig. 1 is schematically illustrated with intelligent assistant device 10. In these views, FOD 500A of camera 82A and FOD 500B of camera 82B of intelligent assistant device 10 are schematically illustrated. Because the sensor shown in fig. 11A is a camera, FODs 500A and 500B are fields of view (FOV) of cameras 82A and 82B. In other words, FODs 500A and 500B illustrate portions of a three-dimensional space in which cameras 82A and 82B may detect entities in environment 4. As will be described in greater detail below, in some examples, when image data is received from one or more cameras that indicates the presence of a person, the intelligent assistant device 10 may actuate one or more components (e.g., light source(s), movable portion, etc.) to communicate the presence of a person in a non-verbal manner.
Although the sensors shown in fig. 11A and 11B are cameras, as described above, the intelligent assistant device may include any of a variety of suitable sensors. As non-limiting examples, such sensors may include visible light cameras, infrared (IR) cameras, depth cameras, cameras that are sensitive to light of other wavelengths, microphones, radar sensors, any other sensor described herein, and/or any other sensor that may be used to track an entity. Further, the sensor in communication with the intelligent assistant device may take any suitable orientation.
Thus, as described above, the intelligent assistant device can maintain an environment-related coordinate system to which FODs of sensors in the environment are mapped. The coordinate system may, for example, represent an understanding of the real world relationship of the FOD in the environment by the intelligent assistant device. In other words, the FOD of each sensor in the environment may be mapped to an environment-related coordinate system such that the intelligent assistant device understands that the various sensors may detect real world areas of entity presence, movement, and other contextual information. The environment-related coordinate system may additionally include other information related to the environment, such as physical dimensions of the environment (e.g., dimensions of rooms, buildings, outdoor spaces, grid sections) and/or locations of any furniture, obstructions, porches, sensors, or other detectable features present within the environment.
It will be appreciated that the environment-related coordinate system may take any suitable form and include any suitable information relating to the environment. The environment-related coordinate system may utilize any suitable scale, grid system, and/or other method to map/quantify the environment, and any suitable number of coordinates and parameters may be used to define the sensor FOD location. In some cases, the environment-related coordinate system may be a two-dimensional coordinate system, and the sensor FOD is defined relative to a two-dimensional surface (such as a floor of the environment). In other cases, the environment-related coordinate system may define the sensor FOD in three-dimensional space.
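As one non-limiting illustration of a two-dimensional environment-related coordinate system, the following Python sketch represents each sensor's FOD as a position, heading, angular width, and range, and tests whether an environment-relative point falls within it; this representation and the numeric values are assumptions made for clarity.

```python
import math
from dataclasses import dataclass

@dataclass
class FieldOfDetection:
    sensor_id: str
    origin: tuple        # sensor position (x, y) in environment coordinates, in meters
    heading_deg: float   # direction the sensor faces within the environment
    fov_deg: float       # angular width of the field of detection
    max_range_m: float   # maximum detection range

def point_in_fod(fod, point):
    """Check whether an environment-relative (x, y) point lies inside a sensor's FOD."""
    dx, dy = point[0] - fod.origin[0], point[1] - fod.origin[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx))
    offset = (bearing - fod.heading_deg + 180.0) % 360.0 - 180.0
    return distance <= fod.max_range_m and abs(offset) <= fod.fov_deg / 2.0

camera_fod = FieldOfDetection("82A", origin=(0.0, 0.0), heading_deg=45.0, fov_deg=90.0, max_range_m=6.0)
print(point_in_fod(camera_fod, (2.0, 2.5)))   # True: the point lies inside this camera's FOD
```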
It should also be noted that tracking entities through private environments (such as living spaces, bedrooms, bathrooms, etc.) may raise potential privacy concerns. Thus, all data collected by the intelligent assistant device that may be personal in nature (such as entity location, appearance, movement, behavior, communication, etc.) will be handled with the utmost respect for entity privacy. In some cases, any or all of the entity tracking techniques described herein may be performed only in response to receiving explicit user permission. For example, the user may specify which sensors are active, the amount and type of data collected by the sensors, which spaces or rooms in the environment are monitored by the entity tracking computing system, the security level or encryption level used with the data collected by the entity tracking computing system, whether the collected data is stored locally or remotely, and so forth. In some examples, a user may choose to monitor sensitive areas in the environment with a relatively low-resolution sensor (such as a radar sensor). This may alleviate at least some privacy concerns regarding entity tracking by allowing the entity tracking computing device to track entity movement without requiring the user to install high-resolution cameras in sensitive areas such as bathrooms.
As described above, the intelligent assistant device of the present disclosure can detect the presence of a person and various contextual information related to the person. Further, in some examples, incorporating into the device one or more cameras for sensing one or more types of visual data provides additional capabilities and opportunities for enhanced assistance and richer interaction with the user. More specifically, and as previously described, a person's interaction with another person or entity is enhanced and more informative when non-verbal communication is received from the other party. Thus, and referring now to fig. 12A-12C, an example method 600 for communicating non-verbal cues via an intelligent assistant device is disclosed. As an example, the method 600 may be performed by the intelligent assistant devices 10, 150, 158, 186 and/or the unitary computing device 160 of fig. 18. The following description of method 600 is provided with reference to the software and hardware components described herein and shown in fig. 1-11B and 13A-19. It will be appreciated that method 600 may be performed in other contexts using other suitable hardware and software components.
Referring to fig. 12A, at 604, method 600 may include receiving, from one or more cameras of the intelligent assistant device, image data indicating the presence of a person. This is schematically illustrated in fig. 13A and 13B, which again show the environment 4 of fig. 1. Specifically, fig. 13A shows human entity 2 entering FOD 500B of camera 82B of intelligent assistant device 10, while fig. 13B shows a view 800 of environment 4 from the perspective of camera 82B.
Upon detecting human entity 2 within FOD 500B, the camera may transmit an indication of the detected entity's presence to intelligent assistant device 10. The indication of entity presence may take any suitable form, depending on the implementation and the particular sensor used. In one example scenario, a camera may capture an image of a face. In some cases, the camera may transmit raw image data to the intelligent assistant device, the image data including one or more pixels corresponding to the face. The transmitted pixels corresponding to the entity thus constitute an indication of the entity's presence and may be processed by the intelligent assistant device to determine the location and/or identity of the entity. Notably, the image data may be transmitted by the camera at any suitable frequency and need not be transmitted only in response to detecting a candidate entity. In other cases, the camera may perform some degree of processing on the image data and send a summary or interpretation of the data to the intelligent assistant device. Such a summary may indicate, for example, that a particular, identified person is present at a particular location given by the sensor's sensor-related coordinate system. Regardless of the specific form of the indication of entity presence, in the example scenario the data received by the intelligent assistant device may be usable to identify a face detected within the sensor's FOD.
The indication of the presence of an entity may also include other forms of data based on the location of the entity detected by one or more additional sensors. For example, when the sensor is a microphone, the indication of the presence of the entity may include recorded audio of the entity's voice or a sensor-related location of the entity determined via sound processing. When the sensor is a radar sensor, the indication of the presence of the entity may include a silhouette (silhouette) or "blob" formed by detecting radio waves reflected from the entity. It will be appreciated that different sensors will detect the presence of an entity in different ways, and that the indication of the presence of an entity may take any suitable form depending on the particular sensor(s) used. Further, processing of the sensor data may occur on the entity tracking computing system, on the sensor or related components, and/or distributed among multiple devices or systems.
Returning briefly to fig. 12A, at 608, the method 600 may include: in response to receiving the image data indicating the presence of the person, one or more components of the intelligent assistant device are actuated to communicate the presence of the person in a non-verbal manner. As described in the examples presented herein, in some examples, one or more components may include a single light source or multiple light sources. In different examples, a single light source may comprise a light emitting element such as an LED, or a display such as an OLED or LCD display. The plurality of light sources may include a plurality of light emitting elements, a single display, or a plurality of displays, as well as various combinations of the foregoing. In this manner, and as described in the examples presented below, a person receiving such a non-verbal communication is conveniently notified that her presence was detected by the intelligent assistant device. Furthermore, by expressing the useful information via non-verbal communication, the device conveniently and non-invasively informs the user of the information.
In one example and referring again to fig. 12A, at 612, actuating one or more components of the intelligent assistant device to communicate the presence of a person in a non-verbal manner may include illuminating at least one light source located on the intelligent assistant device. In this way, a person may be conveniently visually notified that the intelligent assistant device has detected her presence.
As described above and referring again to fig. 2, in one example, the intelligent assistant device 10 includes a cylindrical housing 80, the cylindrical housing 80 including a plurality of light sources 84 extending around at least a portion of the perimeter of the housing. For ease of description, fig. 14 is a schematic diagram showing the light source array 84 in an "expanded" two-dimensional view. In some examples, the light source 84 may extend 360 degrees around the perimeter of the housing 80 of the intelligent assistant device 10. In other examples, the array may extend 90 degrees, 120 degrees, 180 degrees, or any other suitable degree around the perimeter. Additionally, the example of fig. 14 shows a generally rectangular 4 x 20 array of light sources. In other examples, different numbers and arrangements of light sources located at various locations on the intelligent assistant device 10 may be utilized and are within the scope of the present disclosure. In some examples, different individual light sources may have different shapes, sizes, outputs, and/or other qualities or characteristics.
In some examples and as described in more detail below, to communicate the presence of a person in a non-verbal manner, the intelligent assistant device 10 may determine the location of the person relative to the device and may illuminate at least one light source located on a portion of the device facing the person.
Returning briefly to fig. 12A, in some examples and at 616, the method 600 may include illuminating at least one light source by modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source to convey a non-verbal cue to a user. In some examples and at 620, wherein the at least one light source is a plurality of light sources, the light sources may be LEDs. In other examples, any other suitable type of light source may be utilized.
Referring again to fig. 14 and as described in more detail below, in some examples, the frequency of the one or more light sources 84 may be modulated to communicate in a non-verbal manner that the intelligent assistant device 10 has detected the presence of a person. Referring to the example of fig. 13A, when a person 2 enters the living room 4 and image data from the camera of the intelligent assistant device 10 indicates the presence of the person 2, one or more light sources 84 may be illuminated to blink or pulse at a predetermined frequency. Additionally, and as described in more detail below, in response to determining one or more contexts of the person (such as the person's location, height, or identity), one or more light sources may be illuminated to blink or pulse at a different frequency to communicate the one or more contexts of the person in a non-verbal manner. It will be appreciated that a variety of techniques for illuminating the light source(s) may be utilized, such as different frequencies and illumination patterns that create various visual effects, shapes, animations, and the like.
In some examples, one or more of the brightness, color, and number of light sources may be modulated in addition to or instead of modulating the frequency of the light source(s). For example, when a person 2 enters the living room 4, one or more light sources 84 may be illuminated at an initial brightness to communicate the presence of the person 2 in a non-verbal manner. When one or more other contexts of the person are determined, the one or more light sources may be illuminated with a modified and enhanced brightness to communicate the one or more contexts of the person in a non-verbal manner.
Similarly, when a person 2 enters the living room 4, one or more light sources 84 may be illuminated in an initial color (such as blue) to communicate the presence of the person 2 in a non-verbal manner. When another context of the person is determined, the color of the one or more light sources may be changed to green to communicate the one or more contexts of the person in a non-verbal manner. In another example, blue light source(s) may be maintained to indicate presence, and other light source(s) may be illuminated in different colors to communicate one or more contexts of a person in a non-verbal manner. In another example, when the person 2 enters the living room 4, only one of the light sources 84 may be illuminated to communicate the presence of the person 2 in a non-verbal manner. When another context of the person is determined, the plurality of light sources may be illuminated to communicate the one or more contexts of the person in a non-verbal manner. It will be appreciated that the above examples are provided for illustrative purposes only, and that many variations and combinations of illuminating one or more light sources in various ways to convey non-verbal cues may be utilized and are within the scope of the present disclosure.
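Purely as an illustrative sketch of modulating color, brightness, frequency, and the number of illuminated light sources, the following Python fragment maps two of the states described above to light patterns; the colors follow the blue-then-green example in the text, while the brightness values and blink rates are assumptions.

```python
from dataclasses import dataclass

@dataclass
class LightPattern:
    color: str
    brightness: float   # 0.0 (off) to 1.0 (full brightness)
    blink_hz: float     # 0.0 means steady illumination
    num_leds: int       # how many light sources to illuminate

def pattern_for(state):
    """Map a detection state to a non-verbal light pattern (illustrative values only)."""
    patterns = {
        "presence_detected":  LightPattern(color="blue",  brightness=0.4, blink_hz=1.0, num_leds=1),
        "context_determined": LightPattern(color="green", brightness=0.8, blink_hz=0.5, num_leds=4),
    }
    return patterns[state]

print(pattern_for("presence_detected"))
```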
Returning briefly to fig. 12A, at 624, the method 600 may include displaying a vector graphic via a display of the intelligent assistant device, thereby conveying a non-verbal cue. As noted above with respect to fig. 3, in some examples, the one or more light sources may include a display 152, the display 152 surrounding all or part of the perimeter of the device housing. In these examples, display 152 may be used to display vector graphics 154 (such as various static or animated shapes, patterns, etc.) to communicate with the user in a non-verbal manner. Thus, in some examples, one or more shapes generated by a display may be modulated to communicate with a user in a non-verbal manner.
Referring now to fig. 15A-15D, in one example, the display may animate a shape that is deformed from a circular shape as shown in fig. 15A to a horizontal oval in fig. 15B, back to a circular shape in fig. 15C, and then to a vertical oval as shown in fig. 15D. As described above, in other examples, the display may generate a variety of shapes and/or patterns that are static and/or animated to convey various cues to the user in a non-verbal manner.
Returning briefly to FIG. 12A and at 628, actuating one or more components to communicate with the user in a non-verbal manner may include projecting a non-verbal cue onto a surface. As noted above with respect to fig. 4, in some examples, the intelligent assistant device 158 may include a projector 180 that projects one or more static or animated shapes, patterns, icons, etc. onto a surface. In the example of fig. 4, projector 180 projects an image of a circle 182 onto a surface 184 of a table on which the device rests.
In some examples, data from one or more sensors of the intelligent assistant device may indicate the presence of multiple persons. In these examples and briefly returning to fig. 12A, at 632 method 600 may include receiving an indication of presence of a plurality of persons from one or more sensors of a smart assistant device. Accordingly and using one or more techniques described herein, the intelligent assistant device can communicate different non-verbal cues individually to two or more of the plurality of persons.
Referring now to fig. 16, in one example, one or more sensors of the intelligent assistant device 10 may detect the second person 12 and the first person 2 in the living room 4. In this example, it may be desirable for the intelligent assistant device to communicate in a non-verbal manner that it is responding to a particular person's natural language input; i.e., that the device's "focus" is on that particular person. For example, where the first person 2 initiates an interaction with the intelligent assistant device (such as by speaking a keyword phrase such as "hey, computer"), the device may then identify the first person's voice and respond only to commands and queries from the first person. Accordingly, and referring briefly to fig. 12A, at 636, the method 600 may include illuminating at least one light source of the intelligent assistant device to non-verbally convey that the device is responding to natural language input from the first person 2. To visually provide such a non-verbal cue, the intelligent assistant device may use any of the techniques described above to illuminate one or more light sources on the device.
In some examples and as described above, the intelligent assistant device may determine the location of the first person 2 relative to the device. In these examples, the device may illuminate one or more LEDs (located on the person-facing portion of the device) to communicate in a non-verbal manner the location of the person as understood by the device. Additionally and as described in more detail below, the intelligent assistant device may provide other non-verbal communication for two or more persons to express additional context and other information, such as the location, height, and identity of the person.
Referring now to fig. 12B, at 640, method 600 may include receiving data indicative of contextual information of a person from one or more sensors of a smart assistant device. As described above, the contextual information may include guesses/predictions of the identity, location, and/or status of one or more detected entities by the entity tracking computing system based on the received sensor data. At 644, method 600 may include: one or more contexts of the person are determined using at least data indicative of context information of the person. At 648, the one or more contexts of the person may include one or more of the following: (1) positioning of a person relative to the intelligent assistant device; (2) the height of the person; (3) An initial identity of a person corresponding to a previously identified person and representing an initial confidence value; (4) A verified identity of a person representing a verified confidence value greater than the initial confidence value; and (5) the distance of the person from the intelligent assistant device.
In some examples and as described above, the location of one or more persons relative to the intelligent assistant device may be determined. Referring to the examples of fig. 16 and 17, image data from a camera of the intelligent assistant device may be used to identify and locate the first person 2 and the second person 12 relative to the device. For example, the intelligent assistant device 10 may process the image data to generate a sensor-related location of the detected person within a sensor-related coordinate system. For example, the sensor-related position may be given by a set of pixel coordinates of a two-dimensional grid of pixels captured relative to the camera. When the camera is a depth camera, the sensor-related location of the person may be a three-dimensional location.
As with the indication of entity presence, the sensor-related location of an entity may take any suitable form. In some examples, data from one or more other sensors may be used in addition to or instead of image data to determine a person's location. For example, when the sensor is a microphone, the sensor-related location may be inferred from the amplitude of the recorded audio signal, which serves as an indicator of the person's distance from the sensor. Similarly, the sensor-related coordinate system of each sensor may take any suitable form depending on the type of data collected or observed by the sensor, and may use any scale, grid system, or other suitable method of calibrating/quantifying the sensor's local environment.
In some examples, the sensor-related location of a detected person may be translated into an environment-related location of the person within the environment-related coordinate system. As described above, such translation may rely on the mapping of the sensors' FODs to the environment-related coordinate system. The mapping may be implemented in any of a variety of suitable manners and may be performed at any suitable time. For example, in some cases, the mapping of a sensor FOD to the environment-related coordinate system may be performed at initial setup of the intelligent assistant device, refined over time as the device is used, and/or at another suitable time.
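A minimal sketch of such a translation, assuming a two-dimensional coordinate system and a known sensor pose (position and yaw) established, for example, at initial device setup, follows; the pose values are hypothetical.

```python
import math

def sensor_to_environment(sensor_xy, sensor_pose):
    """Transform a sensor-relative (x, y) location into environment-relative coordinates.

    sensor_pose = (tx, ty, yaw_deg): the sensor's assumed position and orientation within
    the environment-related coordinate system.
    """
    tx, ty, yaw_deg = sensor_pose
    yaw = math.radians(yaw_deg)
    x, y = sensor_xy
    return (tx + x * math.cos(yaw) - y * math.sin(yaw),
            ty + x * math.sin(yaw) + y * math.cos(yaw))

# A person detected 1.5 m ahead of and 0.5 m to the left of camera 82B, whose room pose is assumed.
print(sensor_to_environment((1.5, 0.5), sensor_pose=(2.0, 3.0, 90.0)))
```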
Referring briefly to fig. 12B, at 652, the method 600 may include: in response to determining the one or more contexts of the person, one or more components of the intelligent assistant device are actuated to communicate the one or more contexts of the person in a non-verbal manner. Referring again to fig. 16 and 17, in the event that the location of the first person 2 is determined, the intelligent assistant device may communicate such location to the person in a nonverbal manner. As schematically shown in fig. 17, in one example, the location of the first person 2 may be communicated in a non-verbal manner by illuminating one or more LEDs located on the person-facing portion 19 of the device 10, as shown by dashed lines 15 and 15'.
In some examples, in addition to communicating in a non-verbal manner that the intelligent assistant device has detected the first person 2, the device may also communicate that it is tracking the location of the first person. For example and referring to fig. 17, as the first person 2 walks from the first location 21 to the second location 23, the intelligent assistant device 10 may gradually illuminate different light sources to communicate in a non-verbal manner that the device is tracking the location of the first person. In one example and referring to fig. 14, as a first person 2 moves in the direction of arrow a relative to an array of light sources 84 (which may be LEDs), individual LEDs may be gradually illuminated and darkened from right to left in a manner that follows the changing positioning of the person, and thus communicate in a non-verbal manner that the device is tracking the positioning of the person.
As described above, the intelligent assistant device 10 may detect the presence of more than one person. Referring briefly again to fig. 12B, at 656, method 600 may include receiving an indication of a presence of a second person from one or more sensors of a smart assistant device. At 660, method 600 may include illuminating at least one light source of the intelligent assistant device to communicate in a non-verbal manner: the intelligent assistant device is tracking the location of the first person and the location of the second person.
In one example and referring again to fig. 17, in addition to non-linguistically communicating the location of the first person 2 by illuminating one or more LEDs indicated by dashed line 15, in a similar manner, the intelligent assistant device 10 may also non-linguistically communicate the location of the second person 12 by illuminating one or more LEDs, as indicated by dashed line 17, located on a different portion 25 of the device facing the second person 12. As described above for the first person 2, the intelligent assistant device 10 may also gradually illuminate different light sources to communicate in a non-verbal manner that the device is also tracking the location of the second person.
In some examples, the intelligent assistant device 10 may additionally or alternatively communicate the distance of the first person 2 from the device in a non-verbal manner. In one example, the brightness of one or more LEDs illuminated to indicate the location of a person may increase as the user moves closer to the device and decrease as the user moves farther from the device. It will be appreciated that many other examples of illuminating a light source to non-verbally convey distance from a person may be utilized.
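The selection of person-facing light sources and the distance-dependent brightness described above might, for example, be computed as in the following sketch; the column count matches the illustrative 4 x 20 array of fig. 14, while the range used to scale brightness is an assumption.

```python
NUM_COLUMNS = 20   # columns in the illustrative 4 x 20 light source array of fig. 14

def facing_column(person_bearing_deg):
    """Map a person's bearing around the cylindrical housing (0-360 degrees) to the LED column facing them."""
    return int((person_bearing_deg % 360.0) / 360.0 * NUM_COLUMNS)

def brightness_for_distance(distance_m, max_range_m=5.0):
    """Brighten the person-facing indicator as the person approaches, dim it as they move away."""
    return max(0.1, min(1.0, 1.0 - distance_m / max_range_m))

# As the person walks around the device and toward it, the lit column and brightness track them.
print(facing_column(30.0), brightness_for_distance(4.0))    # column 1, relatively dim
print(facing_column(200.0), brightness_for_distance(1.0))   # column 11, brighter
```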
As described above, the intelligent assistant device 10 may use data indicative of the context information of the person to determine one or more contexts of the person. In some examples, the one or more contexts of the person may include a height of the person. In some examples where depth image data from a depth camera is received, the intelligent assistant device may utilize such data to determine the detected height of the person, and may communicate an indication of such height in a non-verbal manner by illuminating one or more of its light sources. In one example and referring to fig. 14, the different detected person heights may generally be indicated by illuminating a varying number of LEDs in a vertical column. For example, for people less than 4 feet in height, 1 LED may be illuminated; for a person between 4 and 5 feet in height, 2 LEDs may be illuminated; for a person of 5 to 6 feet in height, 3 LEDs may be illuminated; for people over 6 feet tall, all 4 LEDs may be illuminated. It will be appreciated that many other examples of illuminating a light source to convey a person's height in a non-verbal manner may be utilized.
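Expressed in code, the height-to-LED mapping given above might look like the following sketch; the foot thresholds are taken from the example, and the handling of exact boundary values is an assumption.

```python
def leds_for_height(height_ft):
    """Return the number of LEDs to illuminate in a vertical column for a detected height."""
    if height_ft < 4:
        return 1   # under 4 feet
    if height_ft < 5:
        return 2   # 4 to 5 feet
    if height_ft < 6:
        return 3   # 5 to 6 feet
    return 4       # over 6 feet: all four LEDs in the column

print([leds_for_height(h) for h in (3.5, 4.5, 5.5, 6.2)])   # [1, 2, 3, 4]
```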
In some examples and as described above, the one or more contexts of the person may include an initial identity and a verified identity of the person. As explained above, the entity identifier of the intelligent assistant device may determine two or more levels of identity of the person. For example, such identity levels may include an initial identity, corresponding to a previously identified person and representing an initial confidence value, and a verified identity, representing a verified confidence value, greater than the initial confidence value, that the person is the previously identified person. Where the initial identity is determined, the intelligent assistant device may communicate an indication of such identity in a non-verbal manner by illuminating one or more of its light sources in a particular manner.
In one example and referring to fig. 14, the initial identity of the person may be indicated by illuminating one or more LEDs in a first color (such as blue). Where the person is subsequently authenticated to a verified identity (representing a verified confidence value that is greater than the initial confidence value), such verified identity may be indicated by illuminating one or more LEDs in a second, different color (such as green). It will be appreciated that many other examples of illuminating a light source to communicate in a non-verbal manner an initial identity, a verified identity, and/or additional levels of identity certainty of a person may be utilized.
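One possible way to drive such a color cue from an identity confidence value is sketched below; both thresholds are assumed, since the disclosure states only that the verified confidence value exceeds the initial confidence value.

```python
def identity_color(identity_confidence, verified_threshold=0.9, initial_threshold=0.6):
    """Choose an indicator color from an identity confidence value (thresholds are assumptions)."""
    if identity_confidence >= verified_threshold:
        return "green"   # verified identity
    if identity_confidence >= initial_threshold:
        return "blue"    # initial identity
    return None          # person detected but not yet identified: no identity cue

print(identity_color(0.95), identity_color(0.7), identity_color(0.3))
```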
In some examples, a user of the intelligent assistant device 10 may desire to know which type(s) of data the device is collecting and utilizing. For example, some users may wish the device to collect or avoid collecting one or more types of data. In one example and referring briefly again to fig. 12B, at 664, method 600 may include illuminating at least one light source of a smart assistant device to non-linguistically communicate a type of sensor data used by the smart assistant device to determine one or more contexts of a person. For example, where the light source comprises a display on the device, the display may generate a vector graphic showing the camera to indicate that video data is being collected by the device. It will be appreciated that many other examples may be utilized that illuminate a light source to communicate in a non-verbal manner the type of sensor data used by the intelligent assistant device.
As described above, in some examples, the intelligent assistant device 10 may receive and utilize a variety of different sensor data from a variety of different sensors on the device. In one example and referring now to fig. 12C, at 668 method 600 can include, where one or more contexts of a person include an initial identity of the person, receiving and fusing data indicative of context information of the person from a plurality of different sensors of the intelligent assistant device to determine the initial identity of the person. As noted, in other examples, the intelligent assistant device 10 may fuse such data to determine various different contexts of a person as described herein.
Also as described above, in some example implementations of an intelligent assistant device (such as the examples shown in fig. 5A and 5B), one or more components of the device may be actuated to communicate the presence of a person in a non-verbal manner by translating, rotating, and/or otherwise moving the components. Referring briefly again to fig. 12C, at 672, method 600 may include one or more of: moving a camera of the device to aim at the person, and moving a display to follow the person's location, to communicate the presence of the person in a non-verbal manner.
In some examples, the one or more light sources of the intelligent assistant device may be Infrared (IR) emitters. For example, the device may include an IR projector configured to emit an encoded IR signal that is reflected from objects in the environment for receipt by an IR camera of the device. In some examples, the visible glow of such IR projectors may prove annoying or distracting to the user. Thus, in some examples and referring briefly again to fig. 12C, at 676, the method 600 can include, wherein the intelligent assistant device includes a plurality of light sources, illuminating at least one of the plurality of light sources to one or more of: (1) Reducing the visibility of the at least one IR emitter, and (2) incorporating light emitted from the at least one IR emitter into an illumination pattern generated by the at least one light source. In one example, the IR emitter may be located in the middle of a plurality of LEDs on the device. When the IR emitter is illuminated, the LED may be illuminated such that glow from the IR emitter is mixed into the light emitted from the LED to reduce the visibility of the IR emitter. Furthermore, in some examples, the techniques may also be used to communicate information to a user in a non-verbal manner, as described above. In another example where the IR emitter is located among a plurality of LEDs, the LEDs may be selectively illuminated when the IR emitter is activated to create a pleasing pattern that incorporates light from the IR emitter into the pattern, thereby masking such IR light.
Referring now to fig. 18, an additional example implementation of the intelligent assistant device 10 in a single computing device is illustrated. Additional details regarding the components and computing aspects of the computing device shown in fig. 18 are described below with reference to fig. 19.
Fig. 18 illustrates one example of an integrated computing device 160 in which the components implementing the intelligent assistant device 10 are arranged together in a standalone device 160. In some examples, the unitary computing device 160 may be communicatively coupled to one or more other computing devices 162 via the network 166. In some examples, the unitary computing device 160 may be communicatively coupled to a data store 164, which data store 164 may store various data, such as user profile data. The integrated computing device 160 includes at least one sensor 22, a voice listener 30, a parser 40, an intent processor 50, a commitment engine 60, an entity tracking computing system 100, and at least one output device 70. The sensor(s) 22 include at least one camera for receiving visual data and at least one microphone for receiving natural language input from a user. In some examples, one or more other types of sensor(s) 22 may also be included.
As described above, the voice listener 30, parser 40, and intent processor 50 work cooperatively to translate natural language input into commitments that are executable by the unitary device 160. Such commitments may be stored by the commitment engine 60. The entity tracking computing system 100 may provide contextual information to the commitment engine 60 and/or other modules. At a contextually appropriate time, the commitment engine 60 may execute the commitments and provide output, such as audio signals, to the output device(s) 70.
In some embodiments, the methods and processes described herein may be associated with computing systems of one or more computing devices. In particular, such methods and processes may be implemented as a computer application or service, an Application Programming Interface (API), library, and/or other computer program product.
Fig. 19 schematically illustrates one non-limiting embodiment of a computing system 1300, in which the computing system 1300 may implement one or more of the above-described methods and processes. Computing system 1300 is shown in simplified form. Computing system 1300 may take the form of: one or more intelligent assistant devices, one or more personal computers, server computers, tablet computers, home entertainment computers, network computing devices, gaming devices, mobile computing devices, mobile communication devices (e.g., smart phone), and/or other computing devices as described herein.
Computing system 1300 includes a logic machine 1302 and a storage machine 1304. Computing system 1300 may optionally include a display subsystem 1306, an input subsystem 1308, a communication subsystem 1310, and/or other components not shown in fig. 19.
Logic machine 1302 comprises one or more physical devices configured to execute instructions. For example, a logic machine may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, implement a technical effect, or otherwise achieve a desired result.
The logic machine may comprise one or more processors configured to execute software instructions. Additionally or alternatively, the logic machines may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. The processors of the logic machine may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. The individual components of the logic machine may optionally be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic machine may be virtualized and executed by remotely accessible networked computing devices configured in a cloud computing configuration.
The storage machine 1304 includes one or more physical devices configured to hold instructions executable by the logic machine to implement the methods and processes described herein. When such methods and processes are implemented, the state of the storage 1304 may be transformed, for example, to hold different data.
The storage 1304 may include removable and/or built-in devices. The storage 1304 may include optical storage (e.g., CD, DVD, HD-DVD, blu-ray disc, etc.), semiconductor storage (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic storage (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. The storage 1304 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random access, sequential access, location-addressable, file-addressable, and/or content-addressable devices.
It should be appreciated that the storage 1304 includes one or more physical devices. However, aspects of the instructions described herein may alternatively be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not maintained by a physical device for a limited duration.
Aspects of logic machine 1302 and storage machine 1304 may be integrated together as one or more hardware logic components. Such hardware logic components may include, for example, field Programmable Gate Arrays (FPGAs), program specific and application specific integrated circuits (PASICs/ASICs), program specific and application specific standard products (PSSPs/ASSPs), system on a chip (SOCs), and Complex Programmable Logic Devices (CPLDs).
The terms "module," "program," and "engine" may be used to describe one aspect of computing system 1300 that is implemented to perform a particular function. In some cases, a module, program, or engine may be instantiated via logic machine 1302 executing instructions held by storage machine 1304. It will be appreciated that different modules, programs, and/or engines may be instantiated according to the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms "module," "program," and "engine" may encompass a single or a group of executable files, data files, libraries, drivers, scripts, database records, and the like.
It will be appreciated that as used herein, a "service" is an application executable across multiple user sessions. The services may be available to one or more system components, programs, and/or other services. In some implementations, the service may run on one or more server computing devices.
When included, the display subsystem 1306 may be used to present visual representations of data held by the storage 1304. In some examples, display subsystem 1306 may include one or more light sources as described herein. Where display subsystem 1306 includes a display device that generates vector graphics and other visual representations, such representations may take the form of a Graphical User Interface (GUI). When the herein described methods and processes change the data held by the storage machine, and thus transform the state of the storage machine, the state of the display subsystem 1306 may likewise be transformed to visually represent the changes in the underlying data. Display subsystem 1306 may include one or more display devices utilizing virtually any type of technology.
When included, the input subsystem 1308 may include or interact with one or more user input devices, such as a keyboard, a mouse, a touch-screen, or a game controller. In some embodiments, the input subsystem may include or interact with a selected Natural User Input (NUI) component. Such components may be integrated or peripheral, and the transduction and/or processing of input actions may be performed on-board or off-board. Example NUI components may include microphones for speech and/or voice recognition; infrared, color, stereo, and/or depth cameras for machine vision and/or gesture recognition; head trackers, eye trackers, accelerometers and/or gyroscopes for motion detection and/or intent recognition; and an electric field sensing component for assessing brain activity.
When included, communication subsystem 1310 may be configured to communicatively couple computing system 1300 with one or more other computing devices. Communication subsystem 1310 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network or a wired or wireless local or wide area network. In some embodiments, the communication subsystem may allow computing system 1300 to send and/or receive messages to and/or from other devices via a network, such as the internet.
The following paragraphs provide additional support for the claims of the subject application. One aspect provides a method at a smart assistant device for communicating non-verbal cues, the smart assistant device configured to respond to natural language input, the method comprising: receiving image data from one or more cameras of the intelligent assistant device indicating the presence of a person; responsive to receiving the image data, actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner; receiving data indicative of contextual information of a person from one or more sensors of the intelligent assistant device; determining one or more contexts of the person using at least data indicative of context information of the person; and in response to determining the one or more contexts of the person, actuating one or more components of the intelligent assistant device to communicate the one or more contexts of the person in a non-verbal manner. The method may additionally or alternatively comprise: wherein actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner further comprises: at least one light source located on the intelligent assistant device is illuminated. The method may additionally or alternatively comprise: wherein actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner further comprises one or more of: the camera is moved to aim at the person and the display is moved to follow the person's positioning. The method may additionally or alternatively include wherein the one or more contexts of the person include one or more of: (1) positioning of a person relative to the intelligent assistant device; (2) the height of the person; (3) An initial identity of the person, the initial identity corresponding to the previously identified person and representing an initial confidence value; (4) A verified identity of the person, the verified identity representing a verified confidence value that is greater than the initial confidence value; and (5) the distance of the person from the intelligent assistant device. The method may additionally or alternatively comprise: wherein actuating the one or more components to non-linguistically convey the one or more contexts of the person further comprises: illuminating at least one light source located on the intelligent assistant device; and illuminating the at least one light source includes modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source. The method may additionally or alternatively include wherein the at least one light source is a plurality of light sources, and the plurality of light sources includes a plurality of LEDs. The method may additionally or alternatively include wherein actuating one or more components to non-linguistically convey one or more contexts of the person further comprises: the vector graphics are displayed via a display of the intelligent assistant device. The method may additionally or alternatively include wherein actuating one or more components to non-linguistically convey one or more contexts of the person further comprises: non-verbal cues are projected onto a surface. 
The method may additionally or alternatively include wherein the person is a first person, the method further comprising: receiving an indication of a presence of a second person from one or more sensors of the intelligent assistant device; and illuminating at least one light source located on the intelligent assistant device to non-linguistically convey that the intelligent assistant device is responsive to natural language input from the first person. The method may additionally or alternatively include wherein the person is a first person and the one or more contexts of the person include a location of the first person, the method further comprising: receiving an indication of a presence of a second person from one or more sensors of the intelligent assistant device; and illuminating at least one light source located on the intelligent assistant device to non-linguistically communicate that the intelligent assistant device is tracking the location of the first person and the location of the second person. The method may additionally or alternatively include wherein the one or more contexts of the person include an initial identity of the person, the method further comprising: data indicative of contextual information of a person is received and fused from a plurality of different sensors of the intelligent assistant device to determine an initial identity of the person. The method may additionally or alternatively comprise: at least one light source located on the intelligent assistant device is illuminated to communicate the type of sensor data in a non-verbal manner, the sensor data being used by the intelligent assistant device to determine one or more contexts of the person. The method may additionally or alternatively comprise: wherein the one or more components comprise a plurality of light sources, and the plurality of light sources comprise at least one infrared emitter, the method further comprising illuminating at least one of the plurality of light sources to one or more of: (1) Reducing the visibility of the at least one infrared emitter, and (2) incorporating light emitted from the at least one infrared emitter into an illumination pattern generated by the at least one light source.
Another aspect provides a smart assistant device configured to respond to natural language input, comprising: a plurality of light sources; a plurality of sensors having one or more cameras; at least one speaker; a logic machine; and a storage machine holding instructions executable by the logic machine to: receiving image data indicating the presence of a person from at least one of the one or more cameras; responsive to receiving the image data, actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner; receiving data indicative of contextual information of a person from one or more of a plurality of sensors; determining one or more contexts of the person using at least data indicative of context information of the person; and in response to determining the one or more contexts of the person, actuating one or more components of the intelligent assistant device to communicate the one or more contexts of the person in a non-verbal manner. The intelligent assistant device may additionally or alternatively include: wherein actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner further comprises: at least one of the plurality of light sources is illuminated. The intelligent assistant device may additionally or alternatively include wherein actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner further includes one or more of: the camera is moved to align the person and the display is moved to follow the person's positioning. The intelligent assistant device may additionally or alternatively include wherein actuating one or more components to communicate one or more contexts of the person in a non-verbal manner further comprises: illuminating at least one light source located on the intelligent assistant device, and illuminating the at least one light source includes modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source. The intelligent assistant device may additionally or alternatively include: wherein the person is a first person and the instructions are executable to: receiving an indication of a presence of a second person from one or more sensors of the intelligent assistant device; and illuminating at least one light source located on the intelligent assistant device to non-linguistically convey that the intelligent assistant device is responsive to natural language input from the first person. The intelligent assistant device may additionally or alternatively include wherein the person is a first person and the one or more contexts of the person include a location of the first person, and the instructions are executable to: receiving an indication of a presence of a second person from one or more sensors of the intelligent assistant device; and illuminating at least one light source located on the intelligent assistant device to non-linguistically communicate that the intelligent assistant device is tracking the location of the first person and the location of the second person.
In another aspect, there is provided an intelligent assistant device configured to respond to natural language input, comprising: a housing; a plurality of LEDs located around at least a portion of the housing; a plurality of sensors including at least one camera and at least one microphone; at least one speaker; a logic machine; and a storage machine holding instructions executable by the logic machine to: receive image data indicating the presence of a person from the at least one camera; illuminate at least one LED of the plurality of LEDs in response to receiving the image data to communicate detection of the presence of the person in a non-verbal manner; receive data indicative of contextual information of the person from one or more of the plurality of sensors; determine one or more contexts of the person using at least the data indicative of contextual information of the person; and responsive to determining the one or more contexts of the person, illuminate at least one LED of the plurality of LEDs to communicate the one or more contexts of the person in a non-verbal manner.
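Putting the pieces together, a top-level sensing-and-indication loop for such a device might look like the sketch below, reusing pattern_for_context from the previous sketch. The camera, sensors, and leds objects and the frame.person_detected attribute are hypothetical placeholders; only the methods called on them here are assumed.

```python
import time

def run_assistant_loop(camera, sensors, leds, frame_period_s: float = 0.1) -> None:
    """Detect presence, acknowledge it non-verbally, then convey richer context.

    `camera`, `sensors`, and `leds` are placeholder driver objects; only the
    methods used below (capture(), read_context(), show()) are assumed.
    """
    while True:
        frame = camera.capture()
        if frame.person_detected:
            # First cue: acknowledge that a person has been detected.
            leds.show(pattern_for_context({}))
            # Gather further context (location, height, distance, identity, ...).
            context = sensors.read_context(frame)
            # Second cue: convey the determined context (e.g. identity status).
            leds.show(pattern_for_context(context))
        time.sleep(frame_period_s)
```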
It will be appreciated that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Also, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and non-obvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as all equivalents thereof.

Claims (18)

1. A method at an intelligent assistant device for communicating non-verbal cues, the intelligent assistant device configured to respond to natural language input, the method comprising:
receiving image data indicative of the presence of a person from one or more cameras of the intelligent assistant device by:
setting a confidence level of the image data to a predetermined value in response to the image data including the person; and
in response to the image data not including the person and image data received one or more time frames earlier including the person, attenuating the confidence level of the image data according to one or more confidence level attenuation functions;
responsive to receiving the image data, actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner;
receiving data indicative of contextual information of the person from one or more sensors of the intelligent assistant device;
determining one or more contexts of the person using at least the data indicative of contextual information of the person, wherein the one or more contexts of the person include: (1) an initial identity of the person, the initial identity corresponding to a previously identified person and representing an initial confidence value, and/or (2) a verified identity of the person, the verified identity representing a verified confidence value that is greater than the initial confidence value; and
in response to determining the one or more contexts of the person, actuating one or more components of the intelligent assistant device to communicate the one or more contexts of the person in a non-verbal manner by one or more of: moving a camera to point toward the person and moving a display to follow a location of the person.
2. The method of claim 1, wherein actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner further comprises: illuminating at least one light source located on the intelligent assistant device.
3. The method of claim 1, wherein the one or more contexts of the person comprise at least one context selected from the group consisting of: (1) a location of the person relative to the intelligent assistant device, (2) a height of the person, and (3) a distance of the person from the intelligent assistant device.
4. The method of claim 1, wherein actuating one or more components to communicate the one or more contexts of the person in a non-verbal manner further comprises: illuminating at least one light source located on the intelligent assistant device, wherein illuminating the at least one light source includes modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source.
5. The method of claim 4, wherein the at least one light source is a plurality of light sources, and the plurality of light sources comprises a plurality of LEDs.
6. The method of claim 1, wherein actuating one or more components to communicate the one or more contexts of the person in a non-verbal manner further comprises: displaying vector graphics via a display of the intelligent assistant device.
7. The method of claim 1, wherein actuating one or more components to communicate the one or more contexts of the person in a non-verbal manner further comprises: projecting a non-verbal cue onto a surface.
8. The method of claim 1, wherein the person is a first person, the method further comprising:
receiving an indication of a presence of a second person from one or more sensors of the intelligent assistant device; and
illuminating at least one light source located on the intelligent assistant device to communicate, in a non-verbal manner, that the intelligent assistant device is responsive to the natural language input from the first person.
9. The method of claim 1, wherein the person is a first person and the one or more contexts of the person include a location of the first person, the method further comprising:
receiving an indication of a presence of a second person from one or more sensors of the intelligent assistant device; and
illuminating at least one light source located on the intelligent assistant device to communicate, in a non-verbal manner, that the intelligent assistant device is tracking the location of the first person and the location of the second person.
10. The method of claim 1, wherein the one or more contexts of the person include the initial identity of the person, the method further comprising: receiving the data indicative of contextual information of the person from a plurality of different sensors of the intelligent assistant device and fusing the data to determine the initial identity of the person.
11. The method of claim 1, further comprising: illuminating at least one light source located on the intelligent assistant device to communicate, in a non-verbal manner, a type of sensor data used by the intelligent assistant device to determine the one or more contexts of the person.
12. The method of claim 1, wherein the one or more components comprise a plurality of light sources, and the plurality of light sources comprise at least one infrared emitter, the method further comprising illuminating at least one of the plurality of light sources to one or more of: (1) reduce the visibility of the at least one infrared emitter, and (2) incorporate light emitted from the at least one infrared emitter into an illumination pattern generated by the at least one light source.
13. An intelligent assistant device configured to respond to natural language input, comprising:
A plurality of light sources;
A plurality of sensors including one or more cameras;
At least one speaker;
A logic machine; and
A storage machine holding instructions executable by the logic machine to:
receive image data indicative of the presence of a person from at least one of the one or more cameras by:
setting a confidence level of the image data to a predetermined value in response to the image data including the person; and
in response to the image data not including the person and image data received one or more time frames earlier including the person, attenuating the confidence level of the image data according to one or more confidence level attenuation functions;
responsive to receiving the image data, actuate one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner;
receive data indicative of contextual information of the person from one or more sensors of the plurality of sensors;
determine one or more contexts of the person using at least the data indicative of contextual information of the person, wherein the one or more contexts of the person include: (1) an initial identity of the person, the initial identity corresponding to a previously identified person and representing an initial confidence value, and/or (2) a verified identity of the person, the verified identity representing a verified confidence value that is greater than the initial confidence value; and
in response to determining the one or more contexts of the person, actuate one or more components of the intelligent assistant device to communicate the one or more contexts of the person in a non-verbal manner by one or more of: moving a camera to point toward the person and moving a display to follow a location of the person.
14. The intelligent assistant device of claim 13, wherein actuating one or more components of the intelligent assistant device to communicate the presence of the person in a non-verbal manner further comprises: illuminating at least one light source located on the intelligent assistant device.
15. The intelligent assistant device of claim 13, wherein actuating one or more components to communicate the one or more contexts of the person in a non-verbal manner further comprises: illuminating at least one light source located on the intelligent assistant device, wherein illuminating the at least one light source includes modulating at least one of a frequency, a brightness, a color, a number, and a shape of the at least one light source.
16. The intelligent assistant device of claim 13, wherein the person is a first person, and the instructions are executable to:
receive an indication of a presence of a second person from one or more sensors of the intelligent assistant device; and
illuminate at least one light source located on the intelligent assistant device to communicate, in a non-verbal manner, that the intelligent assistant device is responsive to the natural language input from the first person.
17. The intelligent assistant device of claim 13, wherein the person is a first person, and the one or more contexts of the person include a location of the first person, and the instructions are executable to:
receive an indication of a presence of a second person from one or more sensors of the intelligent assistant device; and
illuminate at least one light source located on the intelligent assistant device to communicate, in a non-verbal manner, that the intelligent assistant device is tracking the location of the first person and the location of the second person.
18. An intelligent assistant device configured to respond to natural language input, comprising:
A housing;
A plurality of LEDs located around at least a portion of the housing;
A plurality of sensors including at least one camera and at least one microphone;
At least one speaker;
A logic machine; and
A storage machine holding instructions executable by the logic machine to:
receive image data indicative of the presence of a person from the at least one camera by:
setting a confidence level of the image data to a predetermined value in response to the image data including the person; and
in response to the image data not including the person and image data received one or more time frames earlier including the person, attenuating the confidence level of the image data according to one or more confidence level attenuation functions;
in response to receiving the image data, illuminate at least one LED of the plurality of LEDs to communicate detection of the presence of the person in a non-verbal manner;
receive data indicative of contextual information of the person from one or more sensors of the plurality of sensors;
determine one or more contexts of the person using at least the data indicative of contextual information of the person, wherein the one or more contexts of the person include: (1) an initial identity of the person, the initial identity corresponding to a previously identified person and representing an initial confidence value, and/or (2) a verified identity of the person, the verified identity representing a verified confidence value that is greater than the initial confidence value; and
in response to determining the one or more contexts of the person, illuminate at least one LED of the plurality of LEDs to communicate the one or more contexts of the person in a non-verbal manner.
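For the confidence handling recited in claims 1, 13, and 18 above (the confidence level is set to a predetermined value when the person appears in the image data and attenuated by a decay function otherwise), one possible attenuation function is an exponential half-life, shown in the Python sketch below. The predetermined value of 1.0 and the 5-second half-life are illustrative assumptions only; the claims allow any choice of confidence level attenuation function.

```python
def update_presence_confidence(confidence: float,
                               person_in_frame: bool,
                               elapsed_s: float,
                               predetermined_value: float = 1.0,
                               half_life_s: float = 5.0) -> float:
    """Set-or-attenuate update for the image-data confidence level.

    When the current image data includes the person, the confidence is set to
    a predetermined value; otherwise the confidence carried over from earlier
    time frames is attenuated, here with an exponential half-life decay chosen
    purely as an example of a confidence level attenuation function.
    `elapsed_s` is the time since the previous confidence update.
    """
    if person_in_frame:
        return predetermined_value
    return confidence * 0.5 ** (elapsed_s / half_life_s)

# Example: person detected, then absent for two consecutive 5-second intervals.
c = update_presence_confidence(0.0, True, 0.0)   # 1.0
c = update_presence_confidence(c, False, 5.0)    # 0.5
c = update_presence_confidence(c, False, 5.0)    # 0.25
```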
CN201980022427.2A 2018-03-26 2019-03-19 Intelligent assistant device for conveying non-language prompt Active CN111919250B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/936,076 2018-03-26
US15/936,076 US11010601B2 (en) 2017-02-14 2018-03-26 Intelligent assistant device communicating non-verbal cues
PCT/US2019/022836 WO2019190812A1 (en) 2018-03-26 2019-03-19 Intelligent assistant device communicating non-verbal cues

Publications (2)

Publication Number Publication Date
CN111919250A (en) 2020-11-10
CN111919250B (en) 2024-05-14

Family

ID=65995893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980022427.2A Active CN111919250B (en) 2018-03-26 2019-03-19 Intelligent assistant device for conveying non-language prompt

Country Status (3)

Country Link
EP (1) EP3776537A1 (en)
CN (1) CN111919250B (en)
WO (1) WO2019190812A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BE1030875B1 (en) * 2022-09-13 2024-04-08 Niko Nv TO AUTOMATICALLY PROVIDE A LIST OF PREFERRED SUGGESTIONS ON A CONTROL DEVICE FOR CONTROL OF ELECTRICAL OR ELECTRONIC DEVICES IN A HOME OR BUILDING

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105917404A (en) * 2014-01-15 2016-08-31 微软技术许可有限责任公司 Digital personal assistant interaction with impersonations and rich multimedia in responses
CN107000210A (en) * 2014-07-15 2017-08-01 趣普科技公司 Apparatus and method for providing lasting partner device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE524784T1 (en) * 2005-09-30 2011-09-15 Irobot Corp COMPANION ROBOTS FOR PERSONAL INTERACTION
US20120268604A1 (en) * 2011-04-25 2012-10-25 Evan Tree Dummy security device that mimics an active security device
US9030562B2 (en) * 2011-12-02 2015-05-12 Robert Bosch Gmbh Use of a two- or three-dimensional barcode as a diagnostic device and a security device
US20150314454A1 (en) * 2013-03-15 2015-11-05 JIBO, Inc. Apparatus and methods for providing a persistent companion device
US10360907B2 (en) * 2014-01-14 2019-07-23 Toyota Motor Engineering & Manufacturing North America, Inc. Smart necklace with stereo vision and onboard processing
GB2522922A (en) * 2014-02-11 2015-08-12 High Mead Developments Ltd Electronic guard systems
US10235567B2 (en) * 2014-05-15 2019-03-19 Fenwal, Inc. Head mounted display device for use in a medical facility
US9812128B2 (en) * 2014-10-09 2017-11-07 Google Inc. Device leadership negotiation among voice interface devices

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105917404A (en) * 2014-01-15 2016-08-31 微软技术许可有限责任公司 Digital personal assistant interaction with impersonations and rich multimedia in responses
CN107000210A (en) * 2014-07-15 2017-08-01 趣普科技公司 Apparatus and method for providing lasting partner device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ren Pingping. 《智能客服机器人》 (Intelligent Customer Service Robots). Chengdu Times Press, 2017, pp. 147-148. *

Also Published As

Publication number Publication date
CN111919250A (en) 2020-11-10
EP3776537A1 (en) 2021-02-17
WO2019190812A1 (en) 2019-10-03

Similar Documents

Publication Publication Date Title
US11010601B2 (en) Intelligent assistant device communicating non-verbal cues
US10628714B2 (en) Entity-tracking computing system
KR102223693B1 (en) Detecting natural user-input engagement
JP6568224B2 (en) surveillance
CN111163906B (en) Mobile electronic device and method of operating the same
EP3590002A1 (en) A portable device for rendering a virtual object and a method thereof
CN107111363B (en) Method, device and system for monitoring
US20220180887A1 (en) Multimodal beamforming and attention filtering for multiparty interactions
CN111919250B (en) Intelligent assistant device for conveying non-language prompt
CN113497912A (en) Automatic framing through voice and video positioning
US11112943B1 (en) Electronic devices and corresponding methods for using episodic data in media content transmission preclusion overrides
Mahdi et al. Robot companion for eldery care

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant