EP2920673A1 - Unified framework for device configuration, interaction and control, and associated methods, devices and systems

Unified framework for device configuration, interaction and control, and associated methods, devices and systems

Info

Publication number
EP2920673A1
Authority
EP
European Patent Office
Prior art date
Legal status
Withdrawn
Application number
EP13856018.0A
Other languages
English (en)
French (fr)
Inventor
Duncan Lamb
Kenneth Jacobsen
John Evans
Thomas Moltoni
Felice Mancino
Aron Rosenberg
John Long
Current Assignee
Aether Things Inc
Original Assignee
Aether Things Inc
Priority date
Filing date
Publication date
Application filed by Aether Things Inc filed Critical Aether Things Inc
Publication of EP2920673A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • This invention relates to a unified framework for device configuration, interaction, and control, and to associated methods, devices, and systems.
  • Computer-based devices, in particular consumer computer-based devices, are ubiquitous. These devices are typically designed and manufactured independently of each other, and, to the extent that they can interact or make use of each other, they tend to use ad hoc or standardized techniques to do so. As a consequence, consumers are often forced to perform complicated setup procedures for devices they acquire or when they try to use different devices together, even when the same vendor makes the devices. Even for single, stand-alone devices, setup procedures are typically complex. Some companies have tried to simplify the use of their own devices, but not the devices of others.
  • FIG. 1 depicts a framework for an exemplary system according to embodiments hereof;
  • FIG. 2 depicts aspects of configuration of a user's device within the framework of FIG. 1 according to embodiments hereof;
  • FIGS. 3A - 3C depict details of the database(s) of the framework of FIG. 1 according to embodiments hereof;
  • FIGS. 4A, 4A-1, 4A-2, 4B, and 4C depict aspects of a typical device for use within the framework of FIG. 1 according to embodiments hereof;
  • FIGS. 4D - 4H depict exemplary organization of corpora within the system according to embodiments hereof;
  • FIGS. 4I - 4N are flowcharts depicting exemplary operation of a device within the framework of FIG. 1 according to embodiments hereof;
  • FIGS. 5A - 5E depict aspects of typical computer systems upon which embodiments of the present disclosure may be implemented and carried out;
  • FIGS. 6A - 6I show exemplary aspects of device provisioning and configuration within the framework of FIG. 1 according to embodiments hereof;
  • FIGS. 7A - 7J show aspects of the interaction of devices within the framework of FIG. 1 according to embodiments hereof;
  • FIGS. 8A - 8D show aspects of an exemplary specific device for sound rendering according to embodiments hereof.
  • FIGS. 9A - 9C show aspects of cooperation between sound-rendering devices according to embodiments hereof.
  • FIG. 1 depicts a framework / system 100 for an exemplary M system according to embodiments hereof.
  • An M-enabled device (or M device) 102 may be any device and any kind of device and may be a stand-alone device or integrated into or combined with any kind of device or system.
  • an M-enabled device 102 may be, without limitation, a device that captures and/or creates content (including digital or analog content), or a device that produces and/or renders content (again including digital and/or analog content).
  • An M-enabled device may be (or be incorporated in), for example, a camera, a speaker, a computer, a phone, a set-top box, a television, an appliance, etc.
  • a set-top box that includes a camera and a speaker and a television monitor and a computer may be one or more M-enabled devices.
  • M-enabled devices or M devices 102 will be described in greater detail below.
  • a system 100 preferably includes or has access to one or more certification authorities (CAs) 106 supporting a public key infrastructure (PKI).
  • M-enabled devices 102 are preferably manufactured (at least in part) by one or more authorized device manufacturers 108. It should be appreciated that devices 102 in FIG. 1 may be different types of devices made by different manufacturers.
  • M-enabled devices 102 may interact with each other and with a backend system 104. It should be appreciated that different types of devices may interact with each other (e.g., an M device embodied in a mobile phone may interact with another M device embodied in a speaker).
  • M-enabled devices 102 are each associated with a user 110.
  • a particular user 110 may have multiple devices 102 associated therewith. In some implementations each device 102 may be associated with only one user 110.
  • the term "user" is an internal entity within the system 100 and is used to define a binding between devices within the system 100.
  • a user 110 may be considered to be an entity that has a certain relationship within the system 100. That entity may correspond to a person or a group of people or any other kind of entity (e.g., a company, a school, a club, etc.).
  • a user 110 is an entity that has registered with the system.
  • a particular person (or entity) may correspond to more than one user 110. For example, a person may choose to have two users 110 within the system 100.
  • the backend 104 comprises one or more backend applications 112, at least some of which interact with one or more databases 114 storing and maintaining information about devices 102 and users 110.
  • the backend 104 may also interact with various other entities, including social networking services 116 (such as Facebook, Linkedln, and the like), and content providers 118 (such as DIO, Pandora, Spotify, and the like). It should be appreciated that while the social networking services 116 are shown as separate from the content providers 118, there may be some overlap between these entities.
  • the backend 104 may interact with entities 120 that can provide added functionality to users 110 (or to devices 102). The added functionality may include, e.g., voice or instruction recognition functionality, face recognition, and the like.
  • the backend 104 may also interact with other miscellaneous external components 122.
  • the added functionality may provide or enhance or improve on functionality provided on some or all devices 102 or to provide functionality in addition to that provided on devices.
  • the added functionality 120 may be used, e.g., to provide functionality that is beyond the hardware capability of the device 102.
  • the added functionality 120 may provide voice recognition beyond that which is provided on a particular device 102 (or beyond that which is possible using the hardware of the particular device 102). Effectively, the added functionality 120 may be used, at least in part, to extend the functionality of any particular M-enabled device 102.
  • Arc #1 refers to interactions between two devices 102.
  • Arc #2 refers to interactions between device manufacturer(s) 108 and devices 102.
  • Arc #3 refers to interactions between CA(s) 106 and devices 102.
  • Arc #4 refers to interactions between CA(s) 106 and users 110.
  • Arc #5 refers to interactions between users 110 and the backend 104.
  • Arc #6 refers to interactions between the CA(s) 106 and the backend 104.
  • Arc #7 refers to interactions between devices 102 and the backend 104.
  • Arc #8 refers to interactions between the backend and the database(s) 114.
  • Arc #9 refers to interactions between the backend and the social networking services 116.
  • Arc #10 refers to interactions between the backend and the content provider(s) 118.
  • Arc #11 refers to the interactions between the backend 104 and entities 120 providing added functionality.
  • Arc #12 refers to interactions between the backend 104 and other miscellaneous external components 122.
  • Arc #13 refers to the interactions between device manufacturer(s) 108 and the Backend 104.
  • Arc #14 refers to interaction between devices 102 and users 110.
  • Arc #15 refers to interactions between CA(s) 106 and device manufacturer(s) 108.
  • the various interactions described here may take place using any known method(s) and protocol(s), and may be wired, wireless or any combination thereof. Some interactions may take place, at least in part, via a network 101 (e.g., a packet-based network such as the Internet).
  • the network 101 may be a public network or a private network or some combination thereof.
  • the network may include one or more cellular and/or satellite components. It should be appreciated that the communications media or protocols through which the various components interact are not intended to be in any way limiting of the system 100.
  • the backend 104 may make use of application program interfaces (APIs) or other interfaces of those components or entities.
  • the backend 104 may be integrated with some or all of the other components.
  • the manner in which the backend communicates with any other components is not intended to be in any way limiting of the system 100, and different modes and manners of communication are contemplated herein.
  • the degree of integration (or lack thereof) between the backend 104 and any other components is not intended to be in any way limiting of the system 100, and different modes, manners, and degrees of integration are contemplated herein.
  • Inter-device interactions may take place using any known method(s) and protocol(s); inter-device interactions (Arc #1 in FIG. 1) preferably use the quickest and cheapest communications techniques - typically local (e.g., Bluetooth and the like, or Wi-Fi on a local network) when available - and preferably avoid more expensive techniques (e.g., cellular communication) when possible.
  • Each user 110 must have at least one unique User Identity (ID) within the system 100.
  • a user's User ID may be based on or derived from that user's social network ID (e.g., from their Facebook ID).
  • a user's User ID may be asserted in the form of a user certificate 175 issued by a CA 106.
  • the CA(s) 106 may include one or more device CAs 124 (shown in FIG. 6B) and one or more user CAs 126 (shown in FIG. 2), although it should be appreciated that the device and user CAs 124, 126 may be the same entity.
  • a user's user certificate 175 is preferably provided by the backend during a registration process or the like.
  • the user certificate 175 may be stored in a user's device 174 (e.g., a smart phone, a smart device enabled by M technology, or the like) in a location 176 for such certificates and user identification information.
  • a user 110 may have one or more friends in the system 100.
  • a friend of a user within the system 100 is another user 110 with which the first user has established a relationship via the system 100.
  • the relationship may be established in various ways, e.g., by sharing or interacting with each other's devices 102.
  • a user may establish permissions with that user's friends. The permissions may depend, e.g., on the type of device(s) 102 associated with the user and may differ from friend to friend.
  • the scope of a friendship may be used to limit various aspects of the friendship, e.g., duration of the friend relationship within the system, rights of the friend within the system, etc.
  • a user's friends within the system 100 may be distinguished from the user's social network friends (e.g., the user's Facebook friends), although there may be an overlap.
  • the user's social network friends may overlap with the user's friends in the system 100.
  • It should be appreciated that the system is not limited in any way by the manner in which users can establish a "friend" relationship with other users. It should also be appreciated that the system notion of a friend need not bear any relation to actual friendships or other relationships outside of the system 100, and that the friend relationship within the system is used to make associations and bindings within the system 100.
  • Each device 102 has a Device Identity (ID) that must be unique within the system 100. Creation and storage of device IDs is discussed in greater detail below.
  • the backend 104 includes backend application(s) 112, including one or more applications that interact with one or more database(s) 114.
  • Database(s) 114 includes devices database(s) 128 and user database(s) 130, storing and maintaining information about devices 102 and users 110, respectively.
  • the databases may use any kind of database technology, and no particular database implementation is described or required here, and it should be appreciated that the system 100 is not limited in any way by the database implementation. In some cases, some or all of the database implementation may use third party databases. It should also be appreciated that the user and device databases 128, 130 may be integrated or made up of multiple different, possibly distributed databases.
  • the database(s) 114 may maintain (in the device database(s) 128) information about each device 102 within the system 100, including information about the device's owner (a user 110), the device's capabilities, the device's history, and other information such as, e.g., the device's last known location, the device's manufacturer, etc.
  • the device ID is used as a key (or index) into the device database 128.
  • a device's capabilities may include its type (e.g., speaker, etc.). It should be appreciated that the terms "own” and "owner” are not used here to imply or require any legal notion of ownership of a device by a user, and they refer to an association or binding between devices 102 and users 110 within the system 100.
  • the devices database 128 may include device corpora for each device. As will be explained in greater detail below, a device's corpora may correspond to corpora stored on the device, and may be used by the device or by users of the device.
  • The user database 130
  • the User ID acts as a primary database key to the user database 130.
  • the user database 130 preferably associates certain information with each user 110, including one or more of the following (with reference to FIG. 3C):
  • a user profile describing information about that user; the profile may be linked to or obtain information from social network data of the user (e.g., from the user's Facebook profile).
  • the user profile may also contain information about content providers that the user is associated with (i.e., has accounts with), such as, e.g., DIO, etc.
  • the user's devices 102 and, optionally, information about those devices (e.g., the device's capabilities).
  • If the user database 130 stores device identifiers for the devices of each user, then those device identifiers can be used to obtain device-specific information for the corresponding devices from the devices database 128.
  • a user's "friends" within the system are other users 110, so that the information about each user's friends may be stored within the user database 130 as a list of zero or more user IDs. Users may be able to set customized permissions for each friend. In some embodiments, users may be able to define or use classes or groups or types of friends and assign permissions based on classification or group or type. In this manner a user may be able to make use of a template or the like to quickly set or change permissions for one or more users. It should be appreciated that the system is not limited by the manner in which users select their friends or set permissions for their friends. Users may also remove other users as friends.
  • the user's history, preferably stored or searchable by one or more of the following (e.g., by device, by friend):
  • the device history may be a sequence of time-stamped events relating to each of the user's devices. For example, if a particular device 102 is a speaker, that device's history may be a time-stamped list of what was played over that speaker.
  • the user's history relating to friends may include times and locations at which a device of the user interacted with (or was used by) a device of the user's friend (and/or vice versa).
  • the user corpora preferably include local corpora and extended corpora.
  • a user's local corpora may correspond to that user's corpora on one or more devices, and may be used by devices when under control of the user.
  • a user's extended corpora may extend beyond the capacity of some or all devices 102, and correspond to that user's corpora that may be used by the added functionality 120.
  • Configuration information preferably including configuration details about the user's devices and information that may be used to configure other (e.g., new) devices.
  • the configuration information may contain information about the user's wireless network settings in various locations, including passwords and other information that may be needed for a device to connect to those networks.
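  • By way of illustration only, the sketch below shows one way the per-user and per-device records described above might be represented. The field names, types, and the helper function are assumptions made for this example and are not part of the disclosure; they simply mirror the items listed above (profile, devices, friends and permissions, history, corpora, and configuration information), with device IDs acting as keys into the devices database 128.

```python
# Illustrative sketch only: field names and types are assumptions based on the
# description of the user database 130 and devices database 128 above.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeviceRecord:
    device_id: str                                             # unique Device ID (key into devices DB 128)
    owner_user_id: str                                         # the user 110 this device is bound to
    capabilities: List[str] = field(default_factory=list)      # e.g., ["speaker"]
    history: List[dict] = field(default_factory=list)          # time-stamped device events
    corpora: Dict[str, bytes] = field(default_factory=dict)    # per-mechanism device corpora


@dataclass
class UserRecord:
    user_id: str                                               # unique User ID (key into user DB 130)
    profile: dict = field(default_factory=dict)                # may draw on social network data
    device_ids: List[str] = field(default_factory=list)        # keys into the devices DB 128
    friends: Dict[str, dict] = field(default_factory=dict)     # friend user ID -> permissions
    history: List[dict] = field(default_factory=list)          # time-stamped user events
    local_corpora: Dict[str, bytes] = field(default_factory=dict)
    extended_corpora: Dict[str, bytes] = field(default_factory=dict)
    configuration: dict = field(default_factory=dict)          # e.g., Wi-Fi settings per location


def devices_for_user(user: UserRecord, devices_db: Dict[str, DeviceRecord]) -> List[DeviceRecord]:
    """Resolve a user's device IDs against the devices database 128."""
    return [devices_db[d] for d in user.device_ids if d in devices_db]
```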
  • Each device 102 has certain device-specific functionality / components associated therewith. For example, if the device 102 is a speaker (or camera or phone or the like), then the device-specific functionality / components will include functionality / components used to operate the device as a speaker (or camera or phone, etc.). With reference now to FIG. 4A, a device 102 includes mechanisms 132 supporting the device's device-specific functionality / components.
  • As used herein, "mechanism" means hardware, alone or in combination with software, and includes firmware.
  • a device 102 also includes various M system mechanisms / data 134 used to support and implement functionality within the system 100.
  • the system mechanisms / data 134 may interact (at 136) with the device's device-specific functionality 132, although the system mechanisms / data 134 are preferably distinct from the device-specific functionality 132.
  • a device 102 preferably includes one or more sensors 138 that may be used by the system mechanisms 134 to implement aspects of the system functionality.
  • a “sensor” means any mechanism or device that can detect and/or measure a physical property or stimulus (e.g., heat, light, sound, pressure, magnetism, motion, touch, capacitance).
  • a sensor preferably provides an indication (e.g., as an electrical signal) of the property or stimulus it detects and/or measures.
  • a sensor may also be able to record, indicate, or otherwise respond to the physical property or stimulus it detects/measures.
  • a sensor may be implemented in hardware, software, or combinations thereof.
  • a sensor may be provided as a stand-alone device or chip or may be integrated into other devices (or sensors) or chips.
  • the specific sensors 138 may be implemented, at least in part, using specialized chips or circuitry or the like. Those of skill in the art will realize and understand, upon reading this description, that the system is not limited in any way by the manner in which sensors are implemented or integrated. It should also be understood that a particular sensor may detect and/or measure more than one kind of physical property or stimulus.
  • the sensors 138 may include, for example, one or more cameras, microphones, motion sensors (external motion and/or device motion), accelerometers, a compass, location positioning systems (LPSs), and the like.
  • LPS refers generally to any location positioning system that can be used to determine location of the device, and includes the United States' Global Positioning System (GPS) and the Russian GLObal NAvigation Satellite System (GLONASS), the European Union Galileo positioning system, the Chinese Compass navigation system, and the Indian Regional Navigational Satellite System.
  • LPS also includes Wi-Fi and/or cellular phone system(s) that can provide position data, as well as assisted and augmented positioning systems, e.g., using Wi-Fi and/or a cellular phone system.
  • the sensors 138 may interact with the system mechanisms 134 via standard interfaces that are provided for each sensor (at 140). It should be appreciated that not every device 102 need have every kind of sensor 138, and the different kinds or devices (or different implementations of the same kind of device) may have different sensors or kinds of sensors.
  • a device 102 preferably includes one or more communications mechanisms 142 that may be used by the system mechanisms 134 (at 144) to implement aspects of the system functionality.
  • the communications mechanisms 142 may include, e.g., one or more of: mechanisms for local communication (e.g., Bluetooth, including Bluetooth Low Energy (BLE), ZigBee, etc.), mechanisms for Wi-Fi communication (e.g., 802.11, etc.), mechanisms for cellular communication (e.g., modems or other devices using a cellular telephone network, etc.); and mechanisms for wired communication (e.g., Ethernet or the like).
  • the communications mechanisms 142 may be implemented in protocol-specific chips or the like.
  • each device 102 should be able to communicate, at least some of the time, in some manner with the backend 104 (arc #7 in FIG. 1)
  • each device 102 has at least one communications mechanism that allows it to communicate, at least some of the time, with other devices 102 in the system 100.
  • Each device also includes the required/appropriate connectors and/or antenna(e) for its various communication mechanism(s). These connectors and/or antenna(e) may be incorporated into the various communication mechanisms, especially when the communication mechanisms are provided in specialized chips or circuits.
  • the system mechanisms 134 may be integrated into a device 102 as a separate board or chipset or they may share components with the device's mechanisms used to implement the device-specific functionality (as shown by the dotted-line oval shape connecting them). For example, if the device-specific functionality requires a microphone, that microphone may be shared with (or used by) the system mechanisms 134 as a sensor 138. Similarly, at least some of the communications mechanisms may be shared between system mechanisms 134 and the device-specific functionality 132 (as shown by the dotted-line oval shape connecting them).
  • It should be appreciated, however, that in some cases the system mechanisms 134, sensors 138, and communications mechanisms 142 need to be able to operate and be controlled independent of the device-specific mechanisms, and that this need may override or prevent sharing of components in some implementations.
  • a device 102 may include a computer system 146 (which is described in greater detail below).
  • the computer system 146 may interact (at 148) with the system mechanisms 134 and may implement aspects of those
  • the computer system 146 may be shared with and be part of the system mechanisms / data 134 (as shown by the dotted oval shape connecting them). Similarly, at least some of the computer system 146 may overlap with the device-specific functionality 132 of the device 102 (as shown by the dotted oval shape connecting them).
  • a device's Device Identity (ID) and other identity and certificate information may be stored (at 150) in the system mechanisms / data 134.
  • Each device 102 preferably includes bootstrapping / provisioning mechanisms 152 and various operational mechanisms 154 (both described in greater detail below).
  • the various operational mechanisms 154 have and may use corresponding operational storage 155 (FIG. 4C).
  • the operational storage 155 may store data used by the various operational mechanisms, and may be used for persistent and/or temporary storage. Those of skill in the art will realize and understand, upon reading this description, that some or all of the operational storage 155 may be integrated with other storage on the device, and operational storage 155 is shown in the drawing as a separate component to aid with this description.
  • Each device 102 includes at least one power system 157 that can power the device, including the system mechanism(s) 134, the computer system 146, sensors 138, communications 142 and the device specific functionality 132.
  • the power system 157 may include separate systems for some or all of the components, and may include battery power supplies alone or in conjunction with external power supplies.
  • the system mechanisms have a separate power supply (e.g., a battery) from the device specific functionality.
  • When connected to an external power supply (e.g., A/C power via an adaptor), all components of the system should preferably use the external power source even if they have separate internal power systems for use when not connected to an external source.
  • the system is not limited by the manner in which power is supplied to the components.
  • The operational mechanisms 154 may include some or all of the following mechanisms:
  • Mechanisms 156 for device-to-device interaction. The device-to-device interaction mechanisms 156 have and may use corresponding device-to-device storage 169 (FIG. 4C).
  • Mechanisms 158 for command and/or control of the device 102.
  • Command and/or control mechanisms 158 have and may use corresponding command and/or control storage 159 (FIG. 4C).
  • Device-to-backend mechanisms 160 have and may use corresponding device-to-backend storage 161 (FIG. 4C).
  • Interface mechanisms 162 used to operate the device within the system
  • the interface mechanisms 162 have and may use corresponding interface storage.
  • the interface mechanisms 162 may include one or more of:
  • Gesture mechanisms 166 may be used by the operational mechanisms 154 to implement operational and/or functional features that make use of users' gestures (e.g., gesture commands and the like).
  • Gesture mechanisms 166 may have and use gesture storage 167.
  • Gesture mechanisms 166 preferably include one or more gesture detection mechanisms 168 and one or more gesture recognition mechanisms 170, and the gesture storage 167 preferably includes associated gesture corpora 172 for use by the various gesture mechanisms.
  • a gesture corpus (plural "corpora") refers to a collection of gestures or gesture samples usable by gesture mechanism(s) to detect and/or recognize gestures.
  • the gesture detection and/or recognition mechanisms 168, 170 may be trained and adapted to detect and/or recognize gestures of one or more people (who may be users), and the associated gesture corpora may be modified based on this training.
  • the gesture mechanism(s) 166 may use one or more sensors 138, including, e.g., camera sensor(s).
  • Voice/speech mechanisms 174 that may be used by the operation mechanisms 154 to implement operational and/or functional features that make use of human (e.g., users') voices (e.g., for voice commands and the like).
  • Voice/speech mechanisms 174 may have and use voice/speech storage 175.
  • Voice/speech mechanisms 174 preferably include one or more voice/speech detection mechanisms 176 and/or voice/speech recognition mechanisms 178, and the voice/speech storage 175 preferably includes associated corpora 180 for use by the various voice/speech mechanisms.
  • a voice/speech corpus refers to a collection of word or phrase samples usable by voice/speech mechanism(s) to detect and/or recognize voice/speech.
  • the voice/speech recognition mechanisms may be trained and adapted to recognize voice/speech of one or more people (e.g., users), and the associated speech corpora may be modified based on this training.
  • the voice/speech mechanisms 174 may use one or more sensors 138, including, e.g., microphone sensor(s).
  • Face/gaze mechanism(s) 182 that may be used by the operational mechanisms 154 to implement operational and/or functional features that make use of people's faces and/or gazes. Face/gaze mechanism(s) 182 may use face/gaze storage 183. The face/gaze mechanism(s) 182 may include face/gaze detection mechanism(s) 184, face/gaze recognition mechanism(s) 186, and face movement detection mechanism(s) 187.
  • the face/gaze storage 183 preferably includes associated face/gaze corpora 188 for use by the face/gaze recognition/detection mechanism(s) 182.
  • a face/gaze corpus refers to a collection of face and/or gaze samples usable by face/gaze mechanism(s) to detect and/or recognize faces and/or gazes.
  • the face/gaze recognition/detection mechanisms 182 may be trained and adapted to recognize one or more faces, and the associated face/gaze corpora 188 may be modified based on this training.
  • the face movement detection mechanism(s) 187 may detect movement of parts of a face, e.g., movement of the mouth, eyes, etc., and may be used, e.g., to try to confirm that a person is speaking.
  • Face/gaze mechanism(s) 182 may use sensors 138, including, e.g., camera sensor(s).
  • Other interface mechanisms 190 that may be used by the operational mechanisms 154 to implement operational and/or functional features that make use of other types of user interactions (e.g., touch, typing, device movement, etc.).
  • Other interface mechanisms 190 may use other interface mechanisms storage 191.
  • the other interface mechanism(s) 190 may use sensors 138.
  • Although corpora for speech, gesture, and face / gaze recognition / detection are mentioned above, those of skill in the art will realize and understand, upon reading this description, that other mechanisms, especially the interface mechanisms 162, may have associated corpora and may also learn and adapt based on interactions the device has within (or outside of) the system 100, including interactions with humans, interactions with other devices 102, interactions with the backend 104, and interactions with users 110.
  • a corpus for a particular interface mechanism refers to a collection of samples usable by that particular interface mechanism to function and perform.
  • the other interface mechanisms 190 may include associated corpora 192.
  • As used herein, the term "corpora" may refer to a single corpus and/or to plural corpora.
  • a user can train their device(s) to recognize certain phrases and/or gesture patterns.
  • One or more of these signature phrase/gesture patterns may be used (and required), e.g., to trigger some or all of the commands for the device(s).
  • a device may learn (and thus be trained) without any specific user intervention or request, and, preferably, each device learns to recognize user interactions without being specifically requested to do so.
  • a device 102 also preferably includes heartbeat (HB) mechanism(s).
  • the HB mechanism(s) may interact with other operational mechanisms 154, including the device-to-device mechanism(s) 156 and device-to-backend mechanism(s) 160.
  • each operational mechanism 154 on a device 102 is able to operate on the device without any external interaction (that is, without any interaction with any other devices 102 or the backend 104).
  • the various operational mechanisms 154 may operate or use (or be enhanced by) added functionality 120 provided, e.g., via the backend 104 or one or more other devices 102
  • various voice/speech mechanisms 174 may support a limited use of voice commands and the like when used without any external interaction. These voice commands may be limited, e.g., by the capabilities of the various mechanisms and by the capacity (such as memory and computational capacity) of the device.
  • the system 100 may provide extended voice commands and interactions to a device when enhanced by the added functionality 120.
  • the limited local voice/speech interactions would be reflected in the voice/speech corpora 180 stored in the voice/speech storage 175 on the device.
  • For voice interactions, e.g., a device 102 may have a limited corpus of words (e.g., stored as voice/speech corpora 180 in voice/speech storage 175) that it can recognize and that can be used to control aspects of the device.
  • Using the added functionality 120 provides the device access to a potentially much larger corpus as well as to the ability to parse more complex instructions / queries.
  • the corpus on a device 102 may support commands / instructions such as "Play louder”, “Softer”, etc.
  • an external corpus, provided by the added functionality 120, may support more complex instructions such as, e.g., "Play the song I was listening to last night." This latter request is more complex, e.g., because processing it requires parsing a more complex instruction as well as access to the user's history.
  • the interface mechanisms 162 may be provided with or include learning mechanisms that enable them to learn about the users of the device and about how those users interact with the device.
  • the voice/speech mechanisms 174 may learn to recognize the voices of particular people (preferably including a person corresponding to the user 110 associated with the device).
  • a device is preferably initially configured with generic corpora for its various interface mechanisms 162. As a device's interface mechanisms 162 learn, they may update the various corpora to reflect what has been learned. For example, as a device learns to recognize the voice of a particular person, it may update the voice/speech corpora for that person. Preferably a device retains a copy of the initial generic corpora and/or the original corpora may be saved in some location distinct from the device (e.g., in the "cloud").
  • a device may have different corpora for different users or people.
  • preferably, each interface mechanism on a device has at least one corpus associated with the user associated with that device.
  • Other people who use the device may have their own corpora associated therewith.
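  • A minimal sketch of one way a device might keep its initial generic corpora alongside per-person corpora that are updated as its interface mechanisms learn is shown below. The class, method names, and storage layout are assumptions for illustration only; the points taken from the description above are simply that the generic corpora are retained unchanged and that a person-specific corpus is preferred when one exists.

```python
# Illustrative sketch: per-mechanism corpora keyed by person, with the initial
# generic corpora retained unchanged so they can always be fallen back on.
import copy


class CorporaStore:
    def __init__(self, generic: dict):
        # e.g., {"voice": [...], "gesture": [...], "face": [...]}
        self.generic = copy.deepcopy(generic)   # never modified after provisioning
        self.per_person = {}                    # person/user ID -> mechanism -> samples

    def update(self, person_id: str, mechanism: str, samples: list) -> None:
        """Fold newly learned samples into the corpus for this person and mechanism."""
        person = self.per_person.setdefault(person_id, {})
        person.setdefault(mechanism, []).extend(samples)

    def corpus_for(self, person_id: str, mechanism: str):
        """Return the person-specific corpus if one exists, else the generic one."""
        return self.per_person.get(person_id, {}).get(mechanism, self.generic.get(mechanism))


store = CorporaStore(generic={"voice": ["<generic voice samples>"]})
store.update("user-110", "voice", ["<voice sample learned from user-110>"])
print(store.corpus_for("user-110", "voice"))        # person-specific corpus
print(store.corpus_for("unknown-person", "voice"))  # falls back to the generic corpus
```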
  • a device 102 monitors for possible interactions
  • the device may monitor continuously (at S402) or it may monitor at specific times and/or under specific conditions. This monitoring preferably uses one or more of the device's sensors 138 (e.g., the device's camera(s), microphone(s), etc.).
  • the device preferably buffers or stores potential interactions (at S404). These buffered interactions may be stored in any known and appropriate manner for subsequent use, if needed, by the various interface mechanisms 162. For example, sound detected by the device's microphone(s) may be buffered in a manner suitable for subsequent use, if needed, by the voice/speech mechanism(s) 174. Similarly, external movement detected by the device's camera(s) may be buffered in a manner suitable for subsequent use, if needed, by the device's gesture mechanism(s) 166, and images detected by the device may be stored in a manner suitable for use, if needed, by the device's face/gaze mechanism(s) 182 and/or other mechanism(s) 190. It should be appreciated and understood that not all sensor input may be buffered for every device or type of device.
  • However, it should also be appreciated that buffering sensor information may allow a device to provide more accurate interactions via its various interface mechanism(s) 162, since the device may be able to reconstruct an interaction even if the device only realizes that there is an interaction occurring some time after the interaction has begun.
  • the amount of information buffered depends on the kind of information (e.g., voice, images, movement history, etc.). Similarly, those of skill in the art will realize and understand, upon reading this description, that different information may be buffered for different periods of time. For example, sounds detected by the device's microphone(s) may be buffered for 30 seconds, whereas images detected by the camera(s) may be buffered for 15 seconds.
  • Buffering may use any technique, e.g., circular or wrap-around buffering, and the system is not limited by the kind of buffering used or the implementation of the buffering. Different buffering techniques may be used for different kinds of information.
  • the device 102 should have sufficient memory to store the required amount of buffered information.
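  • A minimal sketch of per-sensor buffering with type-specific retention periods is shown below, assuming a simple wrap-around scheme; the class and parameter names are illustrative, and the 30-second and 15-second figures simply echo the example above.

```python
# Illustrative sketch: buffer recent sensor samples per sensor type so that an
# interface mechanism can reconstruct an interaction that began before it was
# detected. Old samples are discarded once they fall outside the retention window.
import time
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class SensorBuffer:
    def __init__(self, retention_seconds: float) -> None:
        self.retention = retention_seconds
        self.samples: Deque[Tuple[float, Any]] = deque()   # (timestamp, sample), oldest first

    def push(self, sample: Any, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.samples.append((now, sample))
        # Drop anything older than the retention window (wrap-around behaviour).
        while self.samples and now - self.samples[0][0] > self.retention:
            self.samples.popleft()

    def recent(self) -> List[Any]:
        """Return buffered samples, oldest first, for an interface mechanism to inspect."""
        return [s for _, s in self.samples]


# Example: audio kept for 30 s, camera frames for 15 s (figures from the text above).
buffers = {
    "microphone": SensorBuffer(retention_seconds=30.0),
    "camera": SensorBuffer(retention_seconds=15.0),
}
buffers["microphone"].push(b"...")     # placeholder raw audio chunk
buffers["camera"].push("frame-0001")   # placeholder image frame reference
```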
  • the device tries to determine if a possible (buffered) interaction is an actual interaction (at S406). If the device determines that a possible interaction is (or may be) an actual interaction (at S406), the device may continue (at S408) to process that actual interaction. The processing of the actual interaction (at S408) may use the corpora selection process described below with reference to FIG. 4M.
  • the determination (at S406) of whether there is (or may be) an actual interaction going on with the device may make use of one or more triggers. For example, if a person starts to give a device voice commands, the device may not know that the sound it is detecting corresponds to those voice commands. However, if the person is also looking at the device while talking (as detectable using gaze detection), then the device may rely on that detected gaze, in combination with the detected sound, to determine that an actual interaction is taking place.
  • a device 102 may operate in lots of different kinds of environments, and environmental factors and associated noise may affect interaction detection and processing.
  • the term "noise" with respect to any kind of information or signal refers to information and/or signals that may degrade processing of the information or signal (e.g., that may degrade the corresponding detection and/or recognition of information in the signal).
  • the background sound of an air conditioner or fan may interfere with or degrade voice/speech recognition; or a constantly flashing light may interfere with or degrade the face or gesture mechanisms. It is therefore useful for a device to try to filter or remove noise prior to processing.
  • the other mechanism(s) 164 may include one or more noise removal filtering/cleaning mechanisms to remove noise from the inputs detected by the various sensors.
  • One such mechanism is sound-noise cancellation that removes ambient sounds (e.g., from air conditioners and fans) from sound detected (and buffered) by the device's microphone(s).
  • Different and/or other noise removal filters may be used, and these filter mechanisms may adapt and learn from the environment in which the device is situated.
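  • As an illustration of one very simple form of such filtering, the sketch below subtracts a running estimate of the steady ambient level from buffered audio samples before they are handed to speech detection. It is a stand-in chosen for brevity, not the noise-cancellation technique actually used; the function name and the window parameter are assumptions.

```python
# Illustrative sketch: estimate the ambient (steady) background level from the
# first samples in the buffer and subtract it before further processing.
from typing import List


def remove_ambient(samples: List[float], window: int = 100) -> List[float]:
    if not samples:
        return []
    baseline_window = samples[:window] or samples
    baseline = sum(baseline_window) / len(baseline_window)   # ambient level estimate
    return [s - baseline for s in samples]


# Example: a constant hum of +0.3 under the signal is removed;
# peaks of about 0.5 and 0.6 remain.
noisy = [0.3, 0.3, 0.3, 0.3, 0.8, 0.3, 0.9, 0.3]
print(remove_ambient(noisy, window=4))
```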
  • FIG. 4J depicts an exemplary process a device may use to determine whether an interaction is taking place.
  • Recall that various of the sensors 138 are monitored (at S402, FIG. 4I) and the output of the sensors is buffered (at S404, FIG. 4I) in case an actual interaction is detected and the sensor output is needed. Output from the various sensors 138 is provided (directly or via buffers) to the various interface mechanisms 162.
  • the gesture detection mechanism(s) 168 try to determine (at S410) whether any movement that they are detecting via the camera sensor(s) corresponds to a gesture.
  • the face/gaze detection mechanism(s) 184 try to determine (at S412) whether any images that they are detecting via the camera sensor(s) corresponds to at least one face.
  • the voice/speech detection mechanism(s) 176 try to determine (at S414) whether any sound that they are detecting via the microphone sensor(s) corresponds to speech.
  • the other mechanism(s) 190 try to determine (at S416) whether anything they are detecting via the sensor(s) 138 (e.g., touch, movement, proximity of another user device) corresponds to an interaction with the device.
  • the user detection mechanism(s) try to detect (at S418) whether a user is interacting with the device.
  • the gaze detection mechanism(s) may be used (at S420) to determine if a detected face is looking at the device, and face movement detection mechanism(s) 187 may be used to determine (at S422) if a detected face is moving in a manner that corresponds to a gesture and/or speech.
  • Face movement detection mechanism(s) 187 may also be used alone or in conjunction with face / gaze detection mechanism(s) 184 to determine whether any detected face is moving in such a way as to correspond to speech and/or gestures.
  • the various interaction detection mechanisms may operate concurrently (as shown in the flow chart depicted in FIG. 4J), although some of the mechanisms (e.g., gaze detection and face movement detection) may depend on determinations of other mechanism(s).
  • the various detection mechanisms may produce a Boolean value (true or false) reflecting their detection decision.
  • the final determination as to whether an interaction has been detected is determined by the logical OR of those values.
  • In this case, the final determination as to whether an interaction has been detected may be computed using the logical OR of the individual detection values, i.e., Interaction Detected = D_1 OR D_2 OR ... OR D_N, where D_i is the Boolean value produced by the i-th detection mechanism.
  • Thus any true value (i.e., any detection by any detection mechanism), that is, any "Yes" value by any mechanism, will result in a positive determination that an interaction is taking place.
  • the label "Yes" on the lines in the flowchart of FIG. 4J represents a decision or determination by each of the respective mechanisms that they have most likely detected some feature (e.g., gesture, face, speech, etc.).
  • the label "No" on lines in that flowchart represents a decision or determination by each of the respective mechanisms that they have most likely not detected some feature (e.g., gesture, face, speech, etc.).
  • the various labels on these lines should not be construed to mean that a feature is or is not occurring, only that it was or was not detected with some degree of certainty.
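  • A minimal sketch of this Boolean aggregation is shown below; the function and mechanism names are illustrative placeholders, and the point shown is simply that a single "Yes" from any detection mechanism yields a positive determination.

```python
# Illustrative sketch of the Boolean (yes/no) aggregation: any single detection
# mechanism reporting True yields a positive interaction determination.
from typing import Dict


def interaction_detected(decisions: Dict[str, bool]) -> bool:
    """decisions maps a detection mechanism name to its True/False decision."""
    return any(decisions.values())


# Example: only speech detection fired, so an interaction is deemed detected.
print(interaction_detected({
    "gesture": False,
    "face": False,
    "speech": True,
    "other": False,
    "user": False,
}))  # -> True
```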
  • the device may use the probabilistic nature of the interaction detection.
  • each detection mechanism may produce a value or number reflecting a degree of certainty of the detection result (e.g., a real number from 0.0 to 1.0, where 0.0 means no interaction was detected and 1.0 means an interaction was certainly detected, or an integer from 0 to 100, with 0 meaning no interaction and 100 meaning definite interaction).
  • In these cases, the final determination as to whether an interaction has been detected is determined, at least in part, using a score determined as a function F of the values produced by the various detection mechanisms.
  • the function F may produce a weighted sum or average of the values produced by the various detection mechanisms, where different weights may be given to different detection mechanisms, depending, e.g., on their known or perceived or historical accuracy.
  • For example, if there are N detection mechanisms, and the i-th detection mechanism produces a real value r_i in the range 0.0 to 1.0, and the i-th detection mechanism's score has a weight of w_i, then the interaction score may be computed using: Interaction Score = w_1·r_1 + w_2·r_2 + ... + w_N·r_N (formula 2).
  • preferably the weights w_i are such that the Interaction Score ≤ 7.0.
  • Although a weighted sum is used in this example, those of skill in the art will realize and understand, upon reading this description, that different and/or other functions may be used to determine whether an interaction is taking place. For example, as shown in formula 2″ below, a weighted average score may be used to determine the interaction score: Interaction Score = (w_1·r_1 + w_2·r_2 + ... + w_N·r_N) / (w_1 + w_2 + ... + w_N) (formula 2″).
  • the value Interaction Detected may be determined, e.g., by comparing the Interaction Score (e.g., as determined in equation 2) to a static or dynamic threshold. For example, Interaction Detected = (Interaction Score ≥ T_Interaction) for some threshold value T_Interaction.
  • each detection mechanism produces a score (e.g., a real value R_i in the range 0.0 to 1.0), and the final determination as to whether an interaction is detected is determined, at least in part, as a function of those scores (e.g., per equation 2 or 2′ above).
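  • A minimal sketch of this score-based determination, combining per-mechanism scores and weights per formulas 2 / 2″ and comparing the result against a threshold, is shown below. The function names, the example weights, and the example threshold are assumptions chosen for illustration, not values from the disclosure.

```python
# Illustrative sketch: combine per-mechanism scores in [0.0, 1.0] using a
# weighted sum (formula 2) or weighted average (formula 2''), then compare the
# Interaction Score against a threshold (T_Interaction) to decide detection.
from typing import Dict


def interaction_score(scores: Dict[str, float], weights: Dict[str, float],
                      average: bool = False) -> float:
    total = sum(weights[name] * scores[name] for name in scores)   # formula 2
    if average:                                                    # formula 2''
        weight_sum = sum(weights[name] for name in scores)
        return total / weight_sum if weight_sum else 0.0
    return total


def interaction_detected(scores: Dict[str, float], weights: Dict[str, float],
                         threshold: float) -> bool:
    return interaction_score(scores, weights, average=True) >= threshold


# Example (weights and threshold are illustrative only):
scores = {"gesture": 0.2, "face": 0.8, "speech": 0.7, "other": 0.0, "user": 0.0}
weights = {"gesture": 1.0, "face": 1.0, "speech": 1.0, "other": 1.0, "user": 1.0}
print(interaction_detected(scores, weights, threshold=0.3))  # weighted average ~0.34 -> True
```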
  • the gesture detection mechanism(s) 168 try to determine (at S410') whether any movement that they are detecting via the camera sensor(s) corresponds to a gesture, and produce a score (R_1 ∈ [0 .. 1]) indicative of this determination.
  • the face/gaze detection mechanism(s) 184 try to determine (at S412') whether any images that they are detecting via the camera sensor(s) correspond to at least one face, and produce a score (R_2 ∈ [0 .. 1]) indicative of this determination.
  • the voice/speech detection mechanism(s) 176 try to determine (at S414') whether any sound that they are detecting via the microphone sensor(s) corresponds to speech, and produce a score (R_3 ∈ [0 .. 1]) indicative of this determination.
  • the other interface mechanism(s) 190 try to determine (at S416') whether anything they are detecting via the sensor(s) 138 (e.g., touch, movement, proximity of another user device) corresponds to an interaction with the device, and produce a score (R_4 ∈ [0 .. 1]) indicative of these determinations.
  • the user detection mechanism(s) try to detect (at S418') whether a user is interacting with the device and produce a score (R_7 ∈ [0 .. 1]) indicative of their determinations.
  • the gaze detection mechanism(s) may be used (at S420') to determine if a detected face is looking at the device. In this case the gaze detection mechanism(s) produce a score (R_5 ∈ [0 .. 1]) indicative of their determinations.
  • face movement detection mechanism(s) 187 may be used to determine (at S422') if a detected face is moving in a manner that corresponds to a gesture and/or speech and to produce a score (R_6 ∈ [0 .. 1]) indicative of their determinations.
  • the decision as to whether or not to initiate or use the gaze detection and/or face movement detection mechanisms may be based, e.g., on the score (R_2) produced by the face detection mechanism(s) (at S412'), where predetermined threshold values (denoted T_G and T_M in FIG. 4K) may be used to initiate the gaze and / or face movement detections.
  • the values of these thresholds (T_G and T_M) may be the same.
  • the threshold values may be preset and fixed or they may be dynamic, based, e.g., on information the system learns about its detection success.
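  • A sketch of this triggering is shown below: the face detection score (R_2) gates the gaze and face-movement detections against the thresholds T_G and T_M, and mechanisms that are not triggered contribute a score of 0.0. The function names and example values are illustrative placeholders, not part of the disclosure.

```python
# Illustrative sketch: gaze and face-movement detection run only when the face
# detection score (R_2) clears the corresponding trigger threshold (T_G, T_M).
def detect_with_triggers(face_score, t_gaze, t_movement, detect_gaze, detect_face_movement):
    """detect_gaze and detect_face_movement stand in for the gaze and face-movement
    detection mechanisms; each returns a score in [0.0, 1.0] when invoked."""
    gaze_score = detect_gaze() if face_score >= t_gaze else 0.0                    # R_5
    movement_score = detect_face_movement() if face_score >= t_movement else 0.0   # R_6
    return gaze_score, movement_score


# Example with placeholder detectors and illustrative thresholds:
r5, r6 = detect_with_triggers(face_score=0.85, t_gaze=0.6, t_movement=0.6,
                              detect_gaze=lambda: 0.9,
                              detect_face_movement=lambda: 0.4)
print(r5, r6)  # -> 0.9 0.4
```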
  • each detection mechanism may have a corresponding detection weight associated therewith.
  • the system preferably produces (at S407) a running value corresponding to the most recent scores (in this example implementation, R_1 .. R_7) produced by each of the various detection mechanisms.
  • the mechanism may have a corresponding detection weight associated therewith, and the running value may be produced using a weighted function of the various scores (e.g., using formulas 2, 2′, or 2″ above).
  • the decision as to whether or not an interaction has been detected may be based, at least in part, on a comparison between this score (computed at S407) and another threshold value (denoted T_Interaction).
  • the threshold value T_Interaction may be preset and fixed or dynamic, based, e.g., on information the system learns about its detection success.
  • the various interaction detection mechanisms / processes may proceed in parallel with and independent of each other. However, it should be appreciated that the various mechanisms / processes may, in some cases, have access to and use the scores produced by the other mechanisms. These other mechanisms' scores may be used, e.g., to trigger other detection (as in the exemplary cases in FIG. 4K where the score (R_2) produced by the face detection at S412' is used as a trigger for gaze and face movement detection).
  • the weights for the various detection mechanisms may vary dynamically based on information the device has learned from prior detections.
  • Each weight is a value in the range 0 to max_weight, for some value of max_weight. For the purposes of this description, assume that max_weight is 1.0, although any value can be used, as long as the score threshold value (T_Interaction in the example above) is set accordingly. Initially all mechanisms may be given the same weight (e.g., max_weight), and then the weights assigned to certain mechanisms may be adjusted based on the accuracy or usefulness of those mechanisms. For example, in a darkened room, the gesture and face detection mechanisms may be given reduced weight, or in a noisy room (where the device cannot adequately filter out the noise), the speech detection mechanism(s) may be given reduced weight. Once the light in the room changes or the noise in the room is reduced, the weights of the corresponding mechanisms may be adjusted upwards. In addition, as the device learns from its prior decisions, weights can be adjusted.
  • the weights should be set conservatively high and the corresponding threshold values should be conservatively low in order to detect possible interactions with the device.
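  • A minimal sketch of this kind of environment-driven weight adjustment is shown below. The light and noise measures, the adjustment values, and the cap max_weight = 1.0 are all illustrative assumptions; the only behaviour taken from the description is that camera-based mechanisms are de-weighted in the dark, speech detection is de-weighted in unfilterable noise, and weights stay within 0 to max_weight.

```python
# Illustrative sketch: reduce the weights of camera-based mechanisms in a dark
# room and of speech detection in a noisy room, clamping all weights to the
# allowed range [0, MAX_WEIGHT]. Weights can be raised again as conditions improve.
MAX_WEIGHT = 1.0


def adjust_weights(weights: dict, light_level: float, noise_level: float) -> dict:
    """light_level and noise_level are assumed to be normalized to [0.0, 1.0]."""
    adjusted = dict(weights)
    if light_level < 0.2:                      # darkened room
        adjusted["gesture"] = min(adjusted["gesture"], 0.2)
        adjusted["face"] = min(adjusted["face"], 0.2)
    if noise_level > 0.8:                      # noisy room the device cannot filter
        adjusted["speech"] = min(adjusted["speech"], 0.2)
    return {k: max(0.0, min(MAX_WEIGHT, v)) for k, v in adjusted.items()}


print(adjust_weights({"gesture": 1.0, "face": 1.0, "speech": 1.0},
                     light_level=0.1, noise_level=0.9))
# -> {'gesture': 0.2, 'face': 0.2, 'speech': 0.2}
```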
  • Each of the detection mechanisms consumes power, and the constant use of these mechanisms by the device may cause it to use too much power.
  • the need for accurate interaction detection must therefore be balanced with the need to conserve power on the device.
  • no limit need be placed on the use of detection mechanisms. Triggers for the various mechanisms may be used to conserve power.
  • the speech detection mechanisms may be triggered by a simpler sound detection mechanism (that consumes less power)
  • the face and gesture detection mechanisms may be triggered by a simpler mechanism (that consumes less power) that detects changes in the images captured by the camera sensor(s).
  • devices may be put into a mode (e.g., a sleep mode or an ignore mode) during which they perform minimal interaction detection.
  • This mode may be entered, e.g., based on a time of day, location, user instruction, or some other factor. For example, a device may be set to perform minimal interaction detection between midnight and 7 AM on weekdays. In some cases, where a device is used in a crowded or noisy location (e.g., a speaker at a party or a phone at a busy airport), the device may be set to ignore certain interactions or to require triggers or confirmations for interactions.
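  • A sketch of a simple time-of-day rule for entering such a minimal-detection mode is shown below, using the weekday midnight-to-7-AM example from the text; the function name and rule representation are assumptions for illustration.

```python
# Illustrative sketch: decide whether the device should run only minimal
# interaction detection, based on local time (weekdays, midnight to 7 AM).
from datetime import datetime


def minimal_detection_mode(now: datetime) -> bool:
    is_weekday = now.weekday() < 5          # Monday=0 .. Friday=4
    in_quiet_hours = 0 <= now.hour < 7      # midnight to 7 AM
    return is_weekday and in_quiet_hours


print(minimal_detection_mode(datetime(2013, 11, 12, 3, 30)))   # Tuesday 03:30  -> True
print(minimal_detection_mode(datetime(2013, 11, 16, 3, 30)))   # Saturday 03:30 -> False
```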
  • certain interactions may be disabled based on what the device is currently doing, e.g., speech input may be disabled while a music playing device is actively playing music, but reactivated between pieces of music.
  • This kind of setting may be done by adjusting the weights associated with some of the detection mechanisms (e.g., set the weight for gesture detection very low at a dance party). Setting a low weight for a particular detection mechanism will not disable that mechanism and so it will still operate and consume power.
  • a mechanism may detect a gesture that the device may construe as a possible interaction.
  • certain interaction detection mechanisms can be turned off or disabled so that they do not operate or consume any power. In some cases this may be achieved by setting the corresponding weight for a detection mechanism to zero (0.0).
  • each detection mechanism resets its score to zero (or false) after some period of time (e.g., 5 - 10 seconds) or after a detected interaction has actually been carried out.
  • the device preferably uses the corpora of the user associated with the device to perform the recognition processes. As will be explained below, once a possible interaction is detected, the device may use (or try to use) different corpora to recognize and process the actual interaction.
  • Upon detection of possible interaction(s) (S406, FIG. 4I), the device preferably maintains an indication of what kind of interaction(s) may be going on. This may be achieved by having the process of determining possible interactions (S406, FIGS. 4J, 4K) produce and provide information for use by subsequent processing.
  • the process may set an array or vector of values corresponding to the score produced by each detection mechanism for use by subsequent processing.
  • Other approaches may be used for the device to maintain information about possible interactions, and the system is not limited by the manner in which this information is maintained or communicated.
  • the device may not know (or need to know) which corpora to use for the various interface mechanisms 162.
  • the device 102 uses corpora associated with the user 110 with which the device is associated.
  • a device may use other corpora (e.g., generic corpora or corpora associated with other authorized users) at this stage of processing.
  • the device proceeds to process the interaction.
  • the processing of the actual interaction preferably uses the interface mechanisms 162 along with other operational mechanisms 154 to process the interaction.
  • the device may need to use corpora for the various interface mechanisms 162.
  • the device first determines (at S409) which corpora to use for the various interface mechanisms.
  • the corpora (both on the device and at the backend) may be organized, e.g., as shown in FIGS. 4D - 4H, so that when the device recognizes a particular person by their face, gestures, voice, or in some other way, the other corpora associated with that person may be determined and used by the corresponding mechanisms. For example, if a device first recognizes a person's face, the device may, if needed, access and use the other corpora associated with that person (FIG. 4D) (or it may make an association based on a user's smart device, e.g., a phone or tablet). Similarly, if a device first recognizes a person's gestures (FIG. 4E), voice/speech (FIG. 4F), or some other aspect of the person (FIG. 4G), the device may, if needed, access and use the other corpora associated with that person. Additionally, if the device can determine the user it is interacting with, the device may, if needed, access the corpora for that user (FIG. 4H).
  • a "user" 110 is a notion used to form some kind of association or binding of one or more devices within the system.
  • a user 110 typically corresponds to a person, however not all people who interact with a device or who attempt to interact with a device are necessarily users within the system 100.
  • An exemplary flow process used to select corpora (S409 in FIG. 4L) is shown in FIG. 4M.
  • the device includes at least some generic corpora for the various interface mechanisms 162. These generic corpora may be included with the device at time of manufacture or during subsequent provisioning or configuration. To reach this process (selecting corpora), the device has detected some kind of interaction (at S406, FIG. 4I).
  • the device may have detected one or more of sound, external movement, touch, movement of the device, etc. This detection may use some or all of the device's sensors 138, so that, e.g., the device's microphone may detect sound, the device's camera(s) may detect external movement, the device's accelerometer(s) may detect movement of the device, the device's touch sensors may detect that the device is being touched by a person or another device, the device may detect interaction from a user, etc.
  • the device may have an indication of what possible type of interaction was detected (e.g., using the bit vector (for the implementation of FIG. 4J) or a vector of score values (for the implementation of FIG. 4K)).
  • the device may use this information to try to determine what the interaction is and which corpora to use.
  • the device may then determine (at S426, FIG. 4M) whether it can recognize a particular user or person. If the device does recognize a user or person, then (at S428, FIG. 4M) the device selects the corresponding corpora for that user or person. On the other hand, if the device does not recognize the potential interaction as corresponding to any user or person known to the device (at S426, FIG. 4M), then the device selects the device's generic corpora (at S430, FIG. 4M).
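  • A sketch of this selection flow in code form is shown below: each recognizer is tried in turn, the corpora for the first recognized person/user are selected, and the generic corpora are used when nobody is recognized. The function names and recognizer callables are placeholders standing in for the mechanisms described here, not an API from the disclosure.

```python
# Illustrative sketch of the corpora-selection flow (S426 / S428 / S430): each
# recognizer returns a person/user ID or None; the first match selects that
# person's corpora, otherwise the device's generic corpora are used.
from typing import Callable, Dict, List, Optional


def select_corpora(recognizers: List[Callable[[], Optional[str]]],
                   corpora_by_person: Dict[str, dict],
                   generic_corpora: dict) -> dict:
    for recognize in recognizers:            # e.g., gesture, face, voice, other, user
        person_id = recognize()
        if person_id is not None and person_id in corpora_by_person:
            return corpora_by_person[person_id]   # S428: corpora for the known person
    return generic_corpora                        # S430: fall back to generic corpora


# Example with placeholder recognizers: the second (face) recognizer identifies "user-110".
selected = select_corpora(
    recognizers=[lambda: None, lambda: "user-110", lambda: None],
    corpora_by_person={"user-110": {"voice": "<user-110 voice corpus>"}},
    generic_corpora={"voice": "<generic voice corpus>"},
)
print(selected)  # -> {'voice': '<user-110 voice corpus>'}
```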
  • the device will use the appropriate interface mechanism(s) (at S426) in order to try to recognize a person or user known to the device.
  • a person or user is considered to be known to a device if the device has at least one corpus stored for that person or user.
  • a person may be known to a device based on previous interactions with the device, because the person is the user associated with the device, or because the person is associated with (e.g., a friend of) the user associated with the device and has been given permission to access the device in certain ways.
  • the user associated with the device should always be known to the device.
  • the device may use gesture detection and recognition mechanisms 168, 170 (at S432) to try to determine whether the gestures detected correspond to those of a known user/person. If the device does recognize the gestures (at S432) as those of a known user/person, then the device selects the corresponding corpora (at S434). The device may use the mapping shown in FIG. 4E to determine corpora based on recognized gestures.
  • the device may also (or instead) use the face/gaze detection and recognition mechanisms 184, 186, in conjunction with sensors 138 (e.g., sensors that are camera(s)), (at S436) to try to determine whether the interaction is that of a user/person whose face is known to the device or to the system. If the device determines that a nearby or visible face is known to the device (i.e., corresponds to a user / person known to the device), then the device may select corresponding corpora based on that face (at S438). The device may use the mapping shown in FIG. 4D to determine corpora based on a recognized face.
  • the device may also (or instead) use detected sound (using sensors 138 that are microphone(s)) and the voice/speech detection and recognition mechanisms 176, 178 to try (at S440) to determine whether the detected sound is speech, and, if so, whether it corresponds to speech of a person/user known to the device. If the detected sound is determined (at S440) to be speech of a person/user known to the device, then the device may select appropriate corpora based on that speech (at S442). The device may use the mapping shown in FIG. 4F to determine corpora based on recognized speech.
  • the device may also (or instead) recognize a person / user using some other interface mechanism(s) 190 (at S444), in which case the device may select the appropriate corresponding corpora for that person / user (at S446), e.g., using the mapping shown in FIG. 4G.
  • the device may also (or instead) recognize that the interaction is with a known user (at S448), in which case the device may select the appropriate corresponding corpora for that user (at S450), e.g., using the mapping shown in FIG. 4H.
  • While FIG. 4M shows various recognition attempts (at S426), it should be appreciated that not all of these steps are performed (or even available) in every device or type of device. Furthermore, it should be appreciated that in some devices or types of devices, the steps may be performed in parallel, in series, or in some combination thereof. In some cases, multiple tests may be used to recognize and/or confirm recognition (in S426) before corpora are selected.
  • a particular device or type of device may first use face recognition and then, only if that fails, use some other technique.
  • a particular device or type of device may simultaneously try to recognize faces (at S412, FIG. 4J; S412', FIG. 4K) and users (at S418, FIG. 4J; S418', FIG. 4K), and then, optionally, follow up with other recognition approaches.
  • a conflict may occur, e.g., when one mechanism identifies one user/person and another recognition mechanism identifies a different user/person. This may occur when the device is unable to determine enough information about the person/user interacting with the device, and may be because the device does not have enough information about that person and/or because there is more than one person potentially interacting with the device. For example, a new device may not yet have enough information about its potential users to make accurate recognition decisions, so that the different mechanisms may make different recognition decisions.
  • a device may also be confused because there is more than one person in the vicinity, so that its sensors are picking up details from different people. For example, if there are multiple people in the device's vicinity, its microphone(s) may be picking up the voice of one person while its camera(s) are picking up the face of another person.
  • in some cases, a particular sensor will recognize more than one person.
  • the camera(s) in the device may recognize more than one face or they may find multiple faces.
  • a device 102 preferably has at least one conflict resolution strategy.
  • a conflict resolution strategy may be adaptive, with the device learning based on prior interactions and recognition decisions. In some cases, the device may use a weighted function of the recognition decisions, for example, giving most weight to the user recognition (at S448), less weight to face recognition (at S436), still less weight to voice/speech recognition (at S440), and so on.
  • a conflict resolution strategy may be dynamic, changing over time (e.g., based on learning).
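The following is a minimal, illustrative sketch (in Python) of one such weighted conflict-resolution strategy. The mechanism names, weights, and function are invented for illustration and are not the patent's implementation; they merely show how identities reported by different recognition mechanisms could be combined, with user recognition weighted most heavily, then face, then voice/speech.

```python
# Hypothetical sketch of a weighted conflict-resolution strategy for
# reconciling identities reported by different recognition mechanisms.
# The weights and mechanism names are illustrative only.

# Default weights: user recognition counts most, then face, then speech.
DEFAULT_WEIGHTS = {"user": 0.5, "face": 0.3, "speech": 0.15, "gesture": 0.05}

def resolve_identity(recognitions, weights=DEFAULT_WEIGHTS):
    """recognitions maps mechanism name -> (person_id, confidence in [0, 1]).

    Returns the person_id with the highest weighted confidence, or None
    if no mechanism reported anyone.
    """
    totals = {}
    for mechanism, (person_id, confidence) in recognitions.items():
        if person_id is None:
            continue
        weight = weights.get(mechanism, 0.0)
        totals[person_id] = totals.get(person_id, 0.0) + weight * confidence
    if not totals:
        return None
    return max(totals, key=totals.get)

# Example: face and speech disagree; the weighted sum decides.
print(resolve_identity({
    "face":   ("alice", 0.9),
    "speech": ("bob",   0.8),
}))  # -> "alice" (0.27 vs. 0.12 with the default weights)
```

In an adaptive variant, the weights themselves could be adjusted over time based on which mechanisms have proven most reliable for a given device and its users.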
  • a device 102 may also include various optimizations to improve recognition of whether a person is trying to interact with the device, and, if so, which person.
  • One exemplary optimization is to use gaze detection (using the face/gaze mechanism(s) 182) to determine whether someone is actually looking at the device. Gaze detection may be used, e.g., to select a corpus, as a trigger for other recognition and interactions, and/or as part of a conflict resolution strategy. For example, if the device detects sound (using its microphone(s)), the device may not know whether that sound is intended as an interaction with the device; detecting that a person is gazing at the device can help make that determination.
  • gaze detection does not require or depend on face recognition.
  • the device may use face movement detection mechanism(s) 187 to determine whether a face that it has detected (using the face/gaze detection mechanism(s) 184) is moving in a way that might be used to confirm (or reject) that detection. For example, if the system finds multiple faces (using face/gaze detection mechanism(s) 184), and also detects speech (using voice/speech mechanism(s) 174), then any faces that show movement corresponding to speech (e.g., where the mouth is moving) are preferred candidates for selection. It should be appreciated that the face movement detection mechanism(s) 187 need not interpret speech, they need only detect mouth or jaw movement or some other kind of movement that may correspond to speech.
  • face movement detection mechanism(s) 187 may be used to detect movement of other parts of a face, e.g., eyebrows, eyes, jaw, mouth, etc., for use by the gesture mechanism(s) 166. Face movements may themselves be gestures recognized by a device, or they may be used to confirm other detected gestures.
  • gestures may be used alone or to confirm other detected information (e.g., faces, voices, movements, etc.)
  • each interface mechanism 162 may be used alone or in combination with other interface mechanisms. When used together, the various interface mechanisms may be used to confirm each other and/or as triggers for each other.
  • some of the determinations made during the interaction detection (e.g., S406 in FIG. 4J, S406' in FIG. 4K) and the corpus/corpora selection (FIG. 4L) may be used in subsequent processing, e.g., in determining whether a person / user is recognized (in S426, FIG. 4M).
  • in the process of detecting a possible interaction (S406 in FIG. 4J, S406' in FIG. 4K), the device may have determined sufficient information about a person / user interacting with the device to skip (or simplify) the process of user / person recognition (S426).
  • the device may proceed (at S411) to determine the actual interaction, using the corpora determined (at S409) and the various interface mechanisms 162.
  • the actual interaction may be determined (in full or in part) by the processes of person recognition (S426) and corpus selection (S428). If these two processes do not result in a determination of the actual interaction, then the device proceeds to determine the actual interaction (at S411).
  • the device determines (at S409″) which corpora to use for the actual interaction. If the detect user process (e.g., S418' in FIG. 4K) produced a score greater than some threshold value (denoted T_User in FIG. 4N), then the device tries to recognize the user (at S418″) and, if successful, then selects corpora for the recognized user (at S450″).
  • if the device fails to recognize a user (preferably an authorized user), or if the score (R_5) produced by the detect user process/mechanism does not exceed the threshold value (T_User), then the device selects corpora associated with the device's user (which may be generic corpora). With corpora selected (at S409″), the device proceeds to determine the actual interaction (at S411″).
  • the device may use scores produced by various interface detection mechanisms to determine whether or not to invoke corresponding recognition mechanisms. For example, as shown in FIG. 4N, the device may invoke the gesture recognition mechanism(s) 170 (at S410″) if the score (R_1) produced (e.g., at S410' in FIG. 4K) by the gesture detection mechanism(s) 168 exceeds a threshold value (denoted T_Gesture in FIG. 4N).
  • Similarly, the device may invoke the face recognition mechanism(s) 186 (at S412″) if the score (R_2) produced (e.g., at S412' in FIG. 4K) by the face detection mechanism(s) 184 exceeds a threshold value (denoted T_Face in FIG. 4N); the device may invoke the voice/speech recognition mechanism(s) 178 (at S414″) if the score (R_3) produced (e.g., at S414' in FIG. 4K) by the voice/speech detection mechanism(s) 176 exceeds a threshold value (denoted T_Speech in FIG. 4N); and the device may invoke the other interface mechanism(s) 190 (at S416″) if the score (R_4) produced (e.g., at S416' in FIG. 4K) by the detect other interface mechanism(s) 190 exceeds a threshold value (denoted T_Other in FIG. 4N).
  • the threshold values described with reference to the exemplary implementations may be static or dynamic, and the invocation of any particular recognition mechanism(s) may be based on different and/or other factors. In some implementations, some or all of the threshold values may be modified based on information the device learns from prior interactions. Those of skill in the art will realize and understand, upon reading this description, that the system is not limited in any way by the values or triggers used to invoke various recognition mechanisms.
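A minimal sketch of this threshold-gated invocation is given below (Python). The threshold values and function name are assumptions for illustration; they are not taken from the figures, and as noted above the thresholds could be static or adjusted dynamically as the device learns.

```python
# Illustrative sketch (not the patent's implementation) of gating the
# recognition mechanisms on the scores produced by the detection stage,
# in the spirit of FIG. 4N. Threshold names mirror T_Gesture, T_Face, etc.

THRESHOLDS = {"gesture": 0.6, "face": 0.5, "speech": 0.7, "other": 0.8, "user": 0.9}

def recognizers_to_invoke(detection_scores, thresholds=THRESHOLDS):
    """detection_scores maps mechanism name -> score from the detection stage.

    Returns the recognition mechanisms whose detection score exceeds the
    corresponding threshold; unknown mechanisms are never invoked.
    """
    return [name for name, score in detection_scores.items()
            if score > thresholds.get(name, float("inf"))]

# Example: only face and speech recognition would be invoked here.
print(recognizers_to_invoke({"gesture": 0.2, "face": 0.9, "speech": 0.75}))
```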
  • some of the information used during detection may no longer be needed by the recognition mechanisms; e.g., once a gaze is detected, that information may be used to trigger interaction recognition, but it may no longer be needed by the actual recognition. It should be appreciated that some of the information used during the detection process (in addition to the scores produced by the various mechanisms) may be provided to the recognition mechanisms. So, e.g., the face recognition mechanism (invoked at S412″ in FIG. 4N) may already have information from the gaze detection mechanism in order to know which face to try to recognize. Alternatively, the face and/or gesture recognition mechanisms may, themselves, invoke the gaze detection and/or face movement detection mechanism(s).
  • the device proceeds (at S413) to carry out the instructions or commands or queries associated with the actual interaction. Determining and/or carrying out the actual interaction (at S411, S413) may be done locally, on the device, or it may require (or benefit from) information from the backend 104 (e.g., from the databases 114), interaction via the backend with some other entity (e.g., social networking service(s) 116, or content provider(s) 118, etc.), as well as processing or assistance from the added functionality entities 120. Carrying out the actual interaction may also involve interacting with another device 102. Interactions between the device and the backend correspond to arc #7 in FIG. 1; interactions with the backend database(s) 114 correspond to arc #8; interactions with the social networking service(s) 116, or content provider(s) 118 correspond to arcs #9 and #10, respectively, and interactions with added functionality entities 120 correspond to arc #11.
  • a device's interactions with other devices correspond to arc #1 in FIG. 1.
  • Carrying out the actual interaction may require the device to use previously buffered information (S404, FIG. 4I). For example, if a device detects and buffers sound that may be a speech interaction, the device may not start the speech recognition until (or unless) the device also detects a person looking (gazing) at the device (e.g., S420, FIG. 4J, S420', FIG. 4K). The gaze thus acts as a trigger for subsequent speech recognition (e.g., S414″, FIG. 4N).
  • This approach allows a device to capture speech or other information (e.g., gestures) that a person starts to give while not yet looking at the device, and continues to provide while looking at the device. It should be appreciated that other interactions (e.g., gestures, facial movements, tapping the device, etc.) may be used as triggers for subsequent recognition of interactions, including speech recognition.
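The sketch below (Python) illustrates the buffer-then-trigger idea described above. The class and method names are hypothetical, and the recognizer is shown only as an opaque callable; the point is that audio is buffered continuously and handed to speech recognition only once a gaze at the device is detected.

```python
# Hedged sketch of using buffered audio plus a gaze event as the trigger
# for speech recognition. Names are assumptions, not from the patent.

from collections import deque

class SpeechTrigger:
    """Buffers recent audio frames and only hands them to the speech
    recognizer once a gaze at the device has been detected, so speech
    that began before the user looked at the device is not lost."""

    def __init__(self, recognizer, max_frames=500):
        self.recognizer = recognizer            # e.g., voice/speech mechanism 178
        self.buffer = deque(maxlen=max_frames)  # rolling buffer of audio frames

    def on_audio_frame(self, frame):
        self.buffer.append(frame)               # always buffer (cf. S404)

    def on_gaze_detected(self):
        # Gaze acts as the trigger (cf. S420/S420'): recognize the buffered
        # audio, including speech captured before the gaze began.
        frames = list(self.buffer)
        self.buffer.clear()
        return self.recognizer(frames) if frames else None
```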
  • the various mechanisms used for detection and recognition of interactions may include learning features so that they may become more accurate over time and use.
  • as a mechanism learns about a particular user, that mechanism may update corresponding corpora for that user. For example, as a speech recognition mechanism learns the speech of a particular user, it may update the speech corpora associated with that particular user.
  • the device itself may learn how a particular user interacts with the device and may optimize its handling based on that learning. For example, if a particular user always uses the same gesture (e.g., pointing to the device) in combination with speech commands, then the device can learn that pattern. In that example, the pointing gesture may be given more weight as a trigger for speech recognition. As another example, if a particular user always uses a particular hand gesture along with face movements (e.g., eyebrow raising) and certain words, then those features, in combination, can be given higher weight by the device.
  • the separations in FIG. 4B are not required, and that some or all of the mechanisms may be combined into different and/or other functional units, including into a single functional operational mechanism.
  • the gesture detection and recognition mechanisms 168, 170 may be part of a single gesture mechanism.
  • the voice/speech detection and recognition mechanisms 176, 178 may be part of a single voice/speech mechanism.
  • the face/gaze detection and recognition mechanisms 184, 186 may be part of a single face/gaze detection and recognition mechanism.
  • the logical depiction of the operational mechanisms' storage 155 in FIG. 4C is given to aid in this description and is not intended to limit the scope of this description in any way. Those of ordinary skill in the art will appreciate and understand, upon reading this description, that different and/or other storage organizations are possible and are contemplated herein.
  • provisioning refers to the process of installing (or updating) the various system mechanisms used by the device. Provisioning may include installing and/or updating firmware or software on a device.
  • configuring refers to the process of establishing or setting operational options and/or parameters for the various mechanisms used by the device. For example, configuring a device may include setting passwords for the device, establishing network parameters for the device, and so on.
  • a device 102 is in a pre- provisioned state when it has not yet been provisioned with system mechanisms.
  • a device 102 is in a generic provisioned state when it has been provisioned with system mechanisms but is not yet associated with a user 110.
  • a device 102 is in an unconfigured state when it has not yet been configured.
  • a device 102 is in a generic configured state when it has been configured for use on the system 100, but not for a particular user 110.
  • a device 102 is said to be in a user-provisioned state when it has been provisioned for a particular user 110.
  • a device 102 is said to be in a user-configured state when it has been configured for a particular user 110.
  • Various possible state transitions (denoted T1, T2, ..., T6) for a device 102 are shown in FIG. 6A.
  • These states and transitions are merely descriptive and exemplary, and are used here only to aid in this description. It should be appreciated that this system is not limited by these various states, what they are named, or the state transitions described here. It should also be appreciated that these states and transitions may be independent of the device-specific functionality.
  • the first aspect is essentially independent of any user and conforms the device 102 to then-current and/or localized versions of all system mechanisms. This aspect corresponds to the state transition T1 for a device from pre-provisioned to generic provisioned and, possibly, transition T2 from unconfigured to generic configured.
  • the second aspect of provisioning and configuring a device 102 within the framework 100 conforms the device to the settings/requirements of a particular user 110 currently associated with the device.
  • this aspect corresponds to the state transition T3 from generic provisioned to user provisioned and, possibly, transition T4 from generic configured to user configured.
  • a device's association with a particular user may change, possibly only temporarily or under limited conditions (e.g., location, duration, use, etc.).
  • this second aspect of provisioning and configuring may also correspond to the state transition T5 where a user-provisioned device is provisioned for a different user, and/or to the state transition T6 where a user-configured device is configured for a different user.
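The following is a minimal sketch (Python) of the provisioning and configuration states and transitions named above, including the reset transitions T3' and T4' mentioned later in this description. The enum values and transition table are one reading of FIG. 6A rather than a definitive specification.

```python
# Illustrative state machine for device provisioning/configuration states.
# The mapping of transition names to states is an assumption from the text.

from enum import Enum

class ProvState(Enum):
    PRE_PROVISIONED = "pre-provisioned"
    GENERIC_PROVISIONED = "generic provisioned"
    USER_PROVISIONED = "user provisioned"

class ConfState(Enum):
    UNCONFIGURED = "unconfigured"
    GENERIC_CONFIGURED = "generic configured"
    USER_CONFIGURED = "user configured"

PROV_TRANSITIONS = {
    "T1":  (ProvState.PRE_PROVISIONED,     ProvState.GENERIC_PROVISIONED),
    "T3":  (ProvState.GENERIC_PROVISIONED, ProvState.USER_PROVISIONED),
    "T5":  (ProvState.USER_PROVISIONED,    ProvState.USER_PROVISIONED),  # different user
    "T3'": (ProvState.USER_PROVISIONED,    ProvState.GENERIC_PROVISIONED),  # reset
}

CONF_TRANSITIONS = {
    "T2":  (ConfState.UNCONFIGURED,       ConfState.GENERIC_CONFIGURED),
    "T4":  (ConfState.GENERIC_CONFIGURED, ConfState.USER_CONFIGURED),
    "T6":  (ConfState.USER_CONFIGURED,    ConfState.USER_CONFIGURED),    # different user
    "T4'": (ConfState.USER_CONFIGURED,    ConfState.GENERIC_CONFIGURED),  # reset
}

def apply(state, name, table):
    """Apply a named transition, checking that it is valid from `state`."""
    src, dst = table[name]
    if state is not src:
        raise ValueError(f"transition {name} not valid from {state}")
    return dst
```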
  • the process of provisioning a device thus may include installing the latest versions of all mechanisms (e.g., software, firmware, etc.) on the device.
  • the manufacturer 108 typically installs a version of the system mechanisms 134 on the device (along with versions of other mechanisms such as those for the device-specific functionality).
  • those versions are often out of date even at the time of manufacture.
  • the bootstrap / provisioning mechanism 152 may update some or all mechanisms (e.g., software, firmware, etc.), either at first power on, or even before the device is shipped or sold.
  • a device 102 may be shipped in a low power mode with a timer set to power up after a fixed period (e.g., 36 hours) and then to run the bootstrap / provisioning mechanism 152 in order to try to update all mechanisms.
  • the device 102, when powered on, will search for known wireless (Wi-Fi) networks or other ways to connect to the Internet.
  • the device is pre-configured (as part of its generic configuration) with information about known networks, and it uses those networks to connect (via the network 101) to a known and trusted source location from which updates can be obtained. Once a network connection is found and established, the device 102 can begin updating itself from the trusted source. In this manner, when a device reaches a user (e.g., after it is sold to the user), the device should have the most current version (or a recent version) of all mechanisms and be fully (or substantially fully) provisioned for that user.
  • each device 102 is configured (e.g., at time of manufacture) with an initial generic configuration.
  • an initial generic configuration for the device may include names and passwords for various wireless networks and/or generic credentials to support secure wireless (e.g., cellular, WiFi, Bluetooth, or BLE) communication between the device and the backend 104.
  • the term "secure" is used here to refer to communications channels that can be trusted and that preferably cannot be spoofed.
  • the degree of security is a function of the type of device, and different types of devices may require different degrees of security for device-to-device and device-to-backend communications.
  • each device 102 is preferably associated with a user 110.
  • a device may be associated with no user (e.g., when it is first manufactured), but a device may not be associated with more than one owner (and preferably not with more than one user at a time).
  • a device 102 may be used by more than one person or user 110, but, within the system 100, the device is only associated with a single user.
  • a device may be associated with a user as part of a provisioning step of the manufacturing and/or provisioning processes (e.g., if a user orders or purchases the device and provides the user's identification (User Identity) prior to manufacture of the device 102).
  • the first provisioning process may take place without the device being associated with a user.
  • a second level of provisioning and configuration preferably takes place once the device 102 is associated with a user 110.
  • once associated, that device 102 can obtain configuration information (e.g., wireless network information such as network IDs and passwords) from that user.
  • the device may obtain information from the user database 130 about that user.
  • the device 102 may obtain user profile information, user local corpora, and/or configuration information about that user. This information may be stored by the device for use, e.g., by the operational mechanisms 155.
  • the device 102 may obtain user local corpora from the user database 130 and store those corpora in the corresponding appropriate interface mechanisms' storage 163 on the device. In this manner, if a user has already used a particular device or kind of device, a newly acquired device may not have to be re-trained to detect and recognize various interactions with that user.
  • Some information from the user database 130 may be encoded in a one-way encoding (e.g., using a cryptographic hash such as MD5) on the device.
  • in this way, the device IDs and user IDs are not exposed, but (as will be explained later) information in the lists may be used (e.g., to evaluate possible relationships between two devices).
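The sketch below (Python) illustrates this one-way encoding of IDs. It uses MD5 only because the text mentions it as an example; a stronger hash (e.g., SHA-256) could be substituted, and the list contents are invented for illustration.

```python
# Illustrative sketch of storing one-way encodings of device/user IDs so
# that lists can be compared without exposing the raw IDs.

import hashlib

def one_way(identifier: str) -> str:
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()

# The device stores only hashed IDs of the user's other devices.
co_owned_hashed = {one_way(d) for d in ["device-0012", "device-0345"]}

def is_co_owned(hashed_device_id: str) -> bool:
    # A hashed device ID received from elsewhere (e.g., in a heartbeat)
    # can be checked against the list without revealing the raw ID.
    return hashed_device_id in co_owned_hashed

print(is_co_owned(one_way("device-0012")))  # True
```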
  • the device and user databases are preferably updated (preferably in near real time) to reflect the provisioning and configuration states of each device, so that preferably the device database provides a current view of the state and configuration of each device in the system.
  • devices that are not connected or that cannot connect to a network 101 or use some other method (e.g., a cellular system) may defer providing information to the backend.
  • each device updates the backend with its state information while connected and when reconnecting to a network 101.
  • when a device 102 connects to the backend 104 in some manner, that device preferably obtains updates from the backend, including any updates relating to the device's user.
  • each device 102 should regularly update the corpora in the device database 128 to reflect the current corpora on the device. These updates may take place on a regular scheduled basis (e.g., once a day) or whenever the device can connect to the databases (via the backend 104) and determine that the device database needs updating.
  • the corpora information in the user database 130 should reflect the current state of the corpora on each user's device(s).
  • the user database 130 should be updated (e.g., in the same manner as the device database) to reflect the latest corpora for that user.
  • the user database 130 may include user local corpora and user extended corpora.
  • the user local corpora correspond to the corpora on the user's devices.
  • the user extended corpora correspond to corpora used by the backend or other external systems (e.g., added functionality 120) to process user interactions.
  • the user local corpora may include limited speech recognition corpora that a device 102 can use, whereas the user extended corpora may include extended speech recognition corpora for that user that can be used by, e.g., the added functionality 120.
  • Corpora for each user may be organized or stored on the user database 130 based on the kind or capabilities of the user's devices. This allows the system to support multiple kinds of devices.
  • the database preferably stores those corpora based on the kind and abilities of the device.
  • the device database 128 and the user database 130 maintain prior versions of corpora.
  • a user may have multiple devices of the same type (e.g., multiple speakers). It is preferable for the corpora on each of the user's devices (especially devices of the same type) to have the most current version of that user's corpora for that type of device. Accordingly, each device should routinely contact the backend to determine whether it has the latest version of the corpora. If not, the device may obtain the latest version of the corpora from the user database 130.
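A hedged sketch of this routine corpora check is shown below (Python). The backend query and field names are assumptions; the sketch only shows the device asking for the latest corpora version for its user and device type and downloading it when its local copy is older.

```python
# Illustrative corpora synchronization check; endpoint/field names are
# hypothetical and not taken from the patent.

def sync_corpora(device, backend):
    """device.corpora_version is the version currently stored locally;
    backend.latest_corpora_version(...) is an assumed backend query."""
    latest = backend.latest_corpora_version(user_id=device.user_id,
                                            device_type=device.device_type)
    if latest is None or latest <= device.corpora_version:
        return False  # already current (or backend unreachable)
    corpora = backend.fetch_corpora(user_id=device.user_id,
                                    device_type=device.device_type,
                                    version=latest)
    device.store_corpora(corpora)
    device.corpora_version = latest
    # Keep the device database's view of this device current as well.
    backend.update_device_manifest(device.device_id, corpora_version=latest)
    return True
```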
  • the provisioning and configuration of the device corresponds to the state transitions T3 (and possibly T4) in FIG. 6A.
  • FIGS. 6B - 6I show examples of manufacture of devices 102.
  • Device manufacturer 108 is preferably authorized by the system 100 to make system-enabled devices 102.
  • each device has a unique serial number provided by the manufacturer, and each device has a Device ID that is a function of the device's unique serial number.
  • in other embodiments, each device has a unique serial number provided by the system. It should be appreciated that the unique serial number used by the system may differ from other serial numbers used by the device manufacturer.
  • a device manufacturer 108 provides (at S601) the device serial numbers (either individually or in batch) to a Device Certificate Generator 172 (which could be part of the backend 104).
  • the Device Certificate Generator 172 uses device information provided by the manufacturer (e.g., the serial number) to create (at S602) a unique Device ID for that device (i.e., for the device associated with or to be associated with the serial number). That unique Device ID is sent to a device CA 124 (at S603) which puts it into a certificate signed by the Device CA 124.
  • the Device CA 124 then sends the signed certificate back to the manufacturer (at S604, S605) to be put into the device.
  • the signed certificate with the unique Device ID is stored in the device's certificates 150.
  • the information provided by the manufacturer also includes information about the device's capabilities and/or components.
  • the devices database 128 receives information about that certificate (including possibly a copy of the certificate) from the backend (at S606).
  • the device ID associated with the certificate may be used by the devices database 128 as a key or index into the database. Since the backend 104 provided information used to generate the device IDs, information about the device (e.g., its capabilities) may already be known by the backend, and so these can be associated with the device ID in the devices database 128.
  • information in the certificate is encrypted so that it may only be read with an appropriate decryption key by the device.
  • the device certificate generator 172 may provide certificates to the manufacturer in bulk, based on a list of serial numbers from the manufacturer.
  • the system generates (or obtains) blocks of serial numbers and associated certificates, and provides those serial numbers and certificates in blocks to the manufacturers.
  • the backend 104 generates serial numbers and provides them (at S651) to a device certificate generator 172'.
  • the device certificate generator 172' may be part of and/or co-located with the backend 104.
  • the Device Certificate Generator 172' uses information provided by the backend (e.g., the serial number) to create a unique Device ID for the device associated with or to be associated with the serial number. That unique Device ID is sent to a device CA 124 (at S653) which puts it into a certificate signed by the Device CA 124. The Device CA 124 then sends the signed certificate back to the Device Certificate Generator 172' (at S654) which sends it back to the backend (at S655).
  • the signed certificate with the unique Device ID is to be stored in a device's certificates 150.
  • the information in the certificates is preferably encrypted so that it may only be read by authorized devices.
  • the Device Certificate Generator 172' may encrypt the serial number and unique Device ID before providing the certificate to the Device CA 124.
  • the backend 104 provides the device manufacturer 108 with serial numbers and corresponding certificates (at S656), preferably a block of such numbers and certificates.
  • the manufacturer may use some or all of those certificates in devices, and provides the backend 104 (at S657) with a list of the serial numbers/certificates it uses.
  • the backend since the backend has a list of all serial numbers and copies of all certificates, it can track numbers used and can thereby verify information provided by manufacturers. For example, if a manufacturer fails to report the use of a particular serial number in a particular device, the backend will detect use of that serial number when that particular device connects to the system. Similarly, if a manufacturer uses the same serial number/certificate combination in multiple devices, the backend will detect the duplication.
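The sketch below (Python) illustrates, under invented data structures, how the backend could track issued serial numbers and certificates and detect unreported or duplicated use when devices later connect; it is an illustration of the bookkeeping described above, not the patent's implementation.

```python
# Hypothetical registry of issued serial numbers/certificates, used to
# verify manufacturer reports and detect duplicate use.

class SerialRegistry:
    def __init__(self):
        self.issued = {}       # serial -> certificate record
        self.reported = set()  # serials the manufacturer reported as used
        self.seen = {}         # serial -> device ID first seen connecting

    def issue_block(self, serials, certs):
        self.issued.update(zip(serials, certs))

    def manufacturer_report(self, serials):
        self.reported.update(serials)

    def device_connected(self, serial, device_id):
        if serial not in self.issued:
            return "unknown serial"    # never issued by the system
        if serial not in self.reported:
            return "unreported use"    # manufacturer failed to report it
        if serial in self.seen and self.seen[serial] != device_id:
            return "duplicate use"     # same serial seen in multiple devices
        self.seen.setdefault(serial, device_id)
        return "ok"
```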
  • FIG. 6E depicts aspects of the manufacturing embodiment described in FIG. 6D.
  • the dashed vertical line in FIG. 6E is provided to show which aspects are carried out by the system and which are carried out by the manufacturer.
  • the system generates serial numbers and certificates (at S661) and provides these to the backend (at S662), preferably in blocks of numbers/certificates.
  • the manufacturer gets the block of serial numbers and certificates and uses some or all of them in the devices it manufactures.
  • the manufacturer may report additional information (at S665) such as device type, capability, etc.
  • Exemplary provisioning and initial configuration of a device (corresponding to state changes T1 and T2, respectively, in FIG. 6A) are shown in the flow diagram in FIG. 6F.
  • the device has a device ID associated therewith (as described above).
  • the manufacturer provides the required mechanisms and other components (e.g., system mechanisms 134, sensors 138, communications 142) as needed for the device 102 (at S608).
  • These mechanisms and other components may be provided in the form of a single board or kit with connections for the device specific components, or they may be fully or partially integrated with the device specific components. When provided as a kit or board, not all components may be active or activated. For example, certain components may be deactivated based on the location of the device, the kind of device, or because users will be charged additional amounts for their subsequent activation.
  • With the mechanisms installed, the device is provided with an initial configuration (at S610).
  • the initial configuration may include generic corpora, as needed, for various mechanisms.
  • the initial configuration may also include information supporting the device's connection to the backend (e.g., information about known or trusted networks, etc.).
  • the device maintains a list or manifest of all mechanisms and their current state of configuration, including version information and corpora details.
  • once a device is initially provisioned and configured (e.g., during manufacture), its current manifest is provided to the backend for storage in the database entry associated with the device (at S612).
  • a provisioned/configured device may update its mechanisms and configuration prior to being associated with a user (at S614).
  • a device may use a known/trusted Wi-Fi connection to update its firmware during shipping. Any such updates should be reflected in the device's manifest, both on the device and in the device database (at S616).
  • the device database may maintain a history of updates to the device.
  • Some mechanisms (e.g., the voice/speech recognition, gesture recognition, etc.) may be provided, in whole or in part, by third parties. In such cases, the mechanism may have firmware and/or corpora included therewith, and updates to those components may have to be obtained from the third parties.
  • each user 110 must have at least one unique User Identity (ID) within the system 100.
  • each user 110 obtains their User ID by registering with the system 100. User registration may take place via an offline process, via a web interface to the backend, or via a device 102. A user may register prior to having any device(s) 102. As part of a user's registration (as explained above), each user has a user ID that must be unique within the system 100. Once a user registers, an entry for that user is made in the user database 130, preferably keyed or indexed primarily on that user's user ID.
  • the user's database entry (corresponding to their User ID) is populated during the registration process to include information provided by the user (e.g., via a form or questionnaire) and/or information that the system can obtain or deduce based on information the user provides (directly or indirectly). For example, if the user uses a social network ID or registers via social network, then information about the user from that social network may be included in the database.
  • while the user database 130 is preferably indexed using the User ID, those of ordinary skill in the art will appreciate and understand, upon reading this description, that different and/or other keys may be used to access data in the user database.
  • multiple devices 102 may be associated with each user 110. While the devices preferably have no hierarchical status, it is useful to describe the process of configuring a first device that a user obtains.
  • each user preferably has at least one user device 174 (e.g., a smart phone or the like) that has that user's user ID and associated user certificate 175 stored thereon.
  • the user device 174 may also have one or more system mechanisms 178 (e.g., in the form of an application (app) running on the user device 174).
  • the system mechanism / user app 178 provides the user with a way to configure the user device 174 within the system 100 as well as to configure other aspects of the user device.
  • the user device 174 may provide the user with a way to set Wi-Fi and other network information (such as identification data, service set identifiers (SSIDs) and passwords for local Wi-Fi networks).
  • This information may be stored as configuration information 180 on the user device 174 (FIG. 2) and may also be stored as configuration information associated with the user in the user database 130 (FIG. 3A).
  • This configuration information is sensitive and should be maintained in secrecy (e.g., via encryption).
  • the user device 174 may be an instance of a device 102.
  • a user obtains a user ID from the system (e.g., as described above).
  • the system then creates a user database entry for that user (preferably keyed or indexed on the User ID) (at S622).
  • the system then populates the database fields for that user (at S624).
  • Each device 102 needs to be associated with a user 110 in order for the device to operate fully within the system 100. It should be appreciated that a device that is not associated with any user may still be able to provide some or all of its device-specific functionality.
  • a user acquires a device 102 (e.g., a new device), that device needs to be associated with that user.
  • Exemplary association of a device with a user is described with reference to FIG. 6H.
  • the device is associated with the user in the device and user databases 128, 130.
  • the Owner of the device is set to the user's User ID.
  • the device's unique device ID is added to the devices associated with the user's User ID.
  • Information about the device may be added to other fields in the user database 130, and the history may be updated to reflect the user's association with this device.
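A minimal sketch (Python) of the database updates described above is given below. The record structures and field names are illustrative assumptions; the sketch simply sets the device's owner in the device database entry and adds the device (and a history note) to the user's entry.

```python
# Illustrative sketch of associating a device with a user in the device
# and user database records. Field names are assumptions.

import time

def associate(device_record, user_record, device_id, user_id):
    # Device database 128: set the Owner of the device to the user's User ID.
    device_record["owner"] = user_id

    # User database 130: add the device to the user's device list and note
    # the association in the user's history.
    user_record.setdefault("devices", []).append(device_id)
    user_record.setdefault("history", []).append(
        {"event": "device associated",
         "device_id": device_id,
         "at": int(time.time())})
```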
  • a device may become associated with a user by having the user touch (or tap) the device with another device of the user's.
  • when a particular device is not yet owned by any user, the first time that particular device is tapped by another device, the particular device becomes associated with the user of the other device.
  • the particular device may obtain the user's configuration information (S628 in FIG. 6H) from the backend database(s) and/or from the user's device that tapped it.
  • a subsequent touch (or tap) from another device may be used to provide temporary permissions to the particular device, e.g., to allow the devices to be combined in some way or to allow the particular device to inherit some configuration information (preferably temporarily) from the device that touched it.
  • Once devices have been paired (e.g., by touch), they may then share information via a Bluetooth, BLE, or WiFi signal or the like. It should be appreciated that sharing information may be in multiple forms, for example, metadata may be shared via Bluetooth while content (e.g., music or video content) may be shared over WiFi.
  • two devices may detect each other's presence (e.g., by a Bluetooth signal within their range) to continue or re-establish collaboration.
  • each device maintains some information about the user(s) associated with that device (e.g., system mechanism(s) / data 134, FIG. 4A).
  • the information on the device may be updated (at S628) to reflect information about the user.
  • This information may include at least some of the information associated with the user's User ID in the user database 130.
  • the information stored on the device 102 may include the user's user ID, information from the user's profile, information about other devices associated with the user, information about the user's friends (e.g., by their respective user IDs), user certificates, user corpora, and user configuration information. It should be appreciated that at least some of the user's information on the device should preferably be stored and maintained in a secure manner, e.g., in encrypted form.
  • the device database 128 may need to be updated (at S630) to reflect changes made to the device. For example, if the user's local corpora were stored on the device (in place of whatever corpora were already there, e.g., the generic corpora), then the device database 128 should be updated to reflect this change.
  • the exemplary process described here to associate a device with a particular user 110 in the system 100 corresponds to the state changes T3, from generic provisioned to user provisioned, and T4, from generic configured to user configured in FIG. 6A.
  • the device may first need to be restored to a state in which it has no user information associated therewith.
  • the device may be restored by any technique of restoring it to its factory settings (or pre-user settings). This kind of reset corresponds to the state changes T3', from user provisioned back to generic provisioned, and T4', from user configured back to generic configured in FIG. 6A.
  • a device may obtain user configuration information from the user database 130 (or from another device) when the device first becomes associated with the user (S628 in FIG. 6H).
  • when the user information changes (e.g., the user gets a new friend within the system, or the user has updated or new wireless network information, or the user has new cellular communication information, etc.), that information should propagate from the database to the device (and vice versa).
  • the providing of the user's configuration information to the device (at S628), and updating the user and device databases 128, 130 (at S630), is repeated as needed (when possible).
  • a device may check the databases when it can (e.g., when it is able to connect to the backend 104) to determine if it has the latest version of the user's information.
  • the backend may try to push notifications to the device to alert it about updates to the user information. Since a device may change (or cause a change) to a user's information (e.g., a device may have updated corpora or network configuration information), these changes also need to propagate back to the device and user databases. This process is reflected in FIG. 6I, which shows repeatedly (as needed or when possible) providing information from the device (at S632) to the backend 104, and then updating the user and device databases accordingly (at S634).
  • any update to the user database 130 for a particular user may require corresponding updates to be sent to that user's devices and corresponding updates to the device database 128.
  • the corpora and/or configuration information should propagate to that user's other devices. In this manner, a user need not have to independently or separately train or configure all of their devices, and each device can benefit from the training and configuration of the user's other devices.
  • a user's devices may get out of synch with each other and / or with the information in the user and device databases 128, 130. This may happen, e.g., when devices are unable to connect to the backend for some period of time.
  • the system preferably applies a conflict resolution technique in order to synchronize devices and the databases.
  • An exemplary conflict resolution approach may use time stamps to select the most current versions of configuration and corpora information.
  • Another exemplary conflict resolution approach may always assume that the versions of configuration and corpora information in the user database are correct.
  • any conflict resolution technique can be performed without user intervention, although the user may be provided with an interface to the backend and/or to that user's devices (e.g., via an app on a user device such as phone 174 or via a web interface) to allow the user to select specific versions of the configuration and corpora information.
  • a user may be able to force (e.g., push) updates to their devices.
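The following sketch (Python) illustrates the timestamp-based conflict resolution mentioned above. The item structure is an assumption; the sketch simply picks the more recently updated of the device's copy and the database's copy, with ties favoring the database (matching the alternative strategy of always trusting the user database).

```python
# Illustrative timestamp-based conflict resolution for synchronizing a
# configuration or corpora item between a device and the user database.

def resolve(device_item, db_item):
    """Each item is a dict like {"value": ..., "updated_at": <epoch seconds>}.
    Returns the winning item; ties favor the database copy."""
    if device_item is None:
        return db_item
    if db_item is None:
        return device_item
    return device_item if device_item["updated_at"] > db_item["updated_at"] else db_item
```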
  • Corpora are used, e.g., by the various interface mechanisms 162 in the devices 102.
  • voice / speech recognition mechanism(s) 178 may use local speech corpora (stored on the device 102).
  • voice / speech recognition may be affected by a number of factors, even for the same voice / speech recognition mechanism(s) 178.
  • these factors may include different qualities or kinds of input sensors (e.g., microphones).
  • corpora in the user database 130 may be organized based on hardware specifics of the users' devices.
  • suppose a particular user 110 has multiple devices, some of which have a first hardware configuration for their system mechanism(s) 134 and/or sensors 138, and others of which have a second hardware configuration for their system mechanism(s) 134 and/or sensors 138.
  • the devices with the first hardware configuration use a first set of corpora for their corresponding operational mechanisms
  • the devices with the second hardware configuration use a second set of corpora (distinct from the first set of corpora) for their corresponding operational mechanisms.
  • if a first device with the first hardware configuration updates its corpora (e.g., a speech recognition corpus or a gesture recognition corpus), that update should be sent to the user database 130 and to the device database 128, but it should only propagate to other devices of that particular user having the first hardware configuration.
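The sketch below (Python) illustrates this hardware-configuration-scoped propagation. The device record fields are assumptions; the sketch only forwards an updated corpus to the same user's other devices that share the source device's hardware configuration.

```python
# Illustrative propagation of an updated corpus only to devices of the
# same user that share the source device's hardware configuration.

def propagate_corpus(user_devices, source_device, corpus):
    """user_devices: list of device records for one user.
    source_device: the device that produced the updated corpus.
    Only devices with the same hardware_config (excluding the source
    itself) receive the update."""
    targets = [d for d in user_devices
               if d["hardware_config"] == source_device["hardware_config"]
               and d["device_id"] != source_device["device_id"]]
    for d in targets:
        d["pending_corpora_updates"].append(corpus)
    return targets
```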
  • a device 102-A may obtain configuration information from another device 102-B.
  • a device may obtain information from another device by having the two devices touch each other. These interactions correspond to the device-to-device interactions #1 depicted in FIG. 1, and may be implemented, at least in part, by device-to-device mechanisms 156.
  • a device may obtain information from another device by being instructed to obtain it by a user command.
  • a user's device may also obtain configuration and other information from the user database.
  • each device 102 preferably includes heartbeat (HB) mechanism(s) 194 (FIG. 4B).
  • the Heartbeat mechanism(s) 194 on a device 102 have two primary functions: (1) to generate heartbeat messages (heartbeats), and (2) to monitor for heartbeats from other devices.
  • heartbeat mechanism(s) 194 on a particular device 102 may be used to provide various signals to the system (e.g., the backend 104) and/or to other devices 102 about the state or existence or presence of the particular device 102.
  • a device's heartbeat (HB) mechanism(s) 194 may use the device's communications mechanisms 142 to broadcast the device's heartbeat (and associated information) via one or more of the device's mechanisms for local communication (e.g., Bluetooth, including BLE, ZigBee, etc.), the device's mechanisms for Wi-Fi communication (e.g., 802.11, etc.), and/or the device's other communications mechanisms (e.g., wired or cellular).
  • Each heartbeat message may contain information that allows other components of the system 100 (e.g., the backend 104, other devices 102) to recognize (and possibly confirm) that it is a heartbeat message, and information identifying the device so that other components of the system 100 can recognize (and possibly confirm) the device identifying information.
  • Heartbeat messages may be broadcast via the different communication mechanisms.
  • a heartbeat message intended for the backend 104 and sent via the network 101 or a cellular network may be sent out daily or when some historical information is to be provided.
  • a heartbeat message intended for other devices and broadcast via the device's local communications mechanisms (e.g., Bluetooth, BLE, or the like) or sent on a local network to which the device is connected may go out every minute (or at some other regular and short time interval).
  • a heartbeat signal should include some information about the device, preferably at least the device's Device ID. For example, as shown in FIG. 7A, a heartbeat signal 700 from a device includes an encoding of the corresponding device ID and, optionally, an encoding of the user ID for the owner of the device.
  • the heartbeat signal may include additional information (shown by the dotted line in the drawing in FIG. 7A).
  • the signal sent to the backend may include additional information such as, e.g., the device's location, history, etc.
  • a local heartbeat signal may include only the device ID.
  • Information in a heartbeat signal is preferably protected (e.g., via encryption).
  • the device ID and user ID may also be encoded with a one-way encoding (e.g., a cryptographic hash such as MD5) to prevent their exposure.
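A hedged sketch of constructing such a heartbeat message is shown below (Python). The wire format (JSON) and field names are assumptions; the sketch only shows a hashed device ID, an optional hashed owner ID, and optional extra fields such as those a backend-bound heartbeat might carry.

```python
# Illustrative heartbeat message construction with one-way encoded IDs.
# MD5 is used only because the text gives it as an example hash.

import hashlib, json, time

def one_way(identifier: str) -> str:
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()

def make_heartbeat(device_id, user_id=None, extra=None):
    msg = {"type": "heartbeat",
           "device": one_way(device_id),
           "sent_at": int(time.time())}
    if user_id is not None:
        msg["owner"] = one_way(user_id)   # optional owner encoding
    if extra:                             # e.g., location/history for the backend
        msg.update(extra)
    return json.dumps(msg)

# A local (e.g., BLE) heartbeat might carry only the "device" field, while
# the daily heartbeat to the backend could include history information.
```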
  • Each device 102 should also routinely (preferably continuously) monitor for heartbeats from other devices 102 that may be nearby or on the same network (e.g., using the device's mechanisms for local communication as well as the device's mechanisms for Wi-Fi and wired communication). Another device may be said to be nearby a particular device if the particular device can pick up the other device's heartbeat via the particular device's local communications mechanism(s). Other notions of nearness may be used and are contemplated herein.
  • FIG. 7B shows an example device A (102-A) broadcasting a heartbeat signal via the device's heartbeat (HB) mechanism(s) 194-A using the device's communications mechanisms 142-A (e.g., a local communications mechanism).
  • a second device (device B - 102-B), detects device-A's heartbeat signal via device-B's communications mechanism 142-B.
  • device B is also broadcasting its heartbeat signal and that device A may detect device B's heartbeat signal.
  • each of the devices may also be sending other heartbeat signals via different communications mechanisms.
  • FIG. 7C shows exemplary processing by each device 102 (e.g., by the heartbeat mechanism 194 of the device) to monitor for heartbeats from other devices (at S702). If a heartbeat from another device is detected (at S704), then that heartbeat is processed (at S706), otherwise the system continues to monitor for heartbeats (at S702). Once the detected heartbeat is processed (or once the device begins to process a detected heartbeat), the device continues to monitor for heartbeats from other devices.
  • Some devices 102 may operate alone and in combination with one or more other devices. It should be appreciated that devices do not have to be homogeneous or even of the same kind to operate together. For example, devices that are speakers may operate together (as will be described below). As another example, a device that is a speaker may operate together with a device that is a video display.
  • devices 102 are said to be joined if they are combined and cooperate for the purpose of at least some aspects of their operation.
  • Devices may be joined in various ways. In some cases, a device may join one or more other devices merely by being put in proximity to the other devices. In some cases, devices may be joined by specific instructions via a user interface. In some cases, devices may be joined by having one of them touch the other.
  • devices may cooperate without changing their ownership. That is, a device of one user may cooperate or be joined with the device of another user without either of the devices changing ownership (i.e., without either device becoming associated with a different user in the system 100).
  • a device's processing of a heartbeat detected from another device may depend on a number of factors, including, e.g., at least some of the following:
  • whether the devices are co-owned (owned by the same user) or owned by friends.
  • whether the devices can cooperate in some way (this may depend, at least in part, on the kind of device or on each device's specific functionality, the devices' proximity, and/or what each device is already doing). For example, a smartphone device and a speaker device may cooperate to play music from the smartphone device on the speaker device if they are in the same room; or two speaker devices may cooperate to both play the same music that one of them is already playing. Two headphone devices may, on the other hand, not cooperate if they are both already playing (i.e., rendering) sound.
  • FIG. 7D shows exemplary processing (at S706) by a device (device A) of a heartbeat detected from another device (device B).
  • device A determines the device ID of device B from the received heartbeat message (e.g., heartbeat signal 700-A in FIG. 7B).
  • the device ID may be encoded in the signal in such a way that it can be extracted by other devices.
  • a device's heartbeat may contain a cryptographic hash (e.g., an MD5 hash) of the device's device ID. In these cases, other devices can use the hash of the device ID as the basis of their decisions and the device ID itself does not get exposed.
  • Having determined the other device's device ID (at S708) (or a hash thereof), device A then determines if it and device B are owned by the same user (i.e., if they are co-owned) (at S710). Recall that each device stores and maintains information from the user database 130 for the user of that device. This information includes a list of the devices associated with that user, so device A can determine if the device ID in the heartbeat message matches a device ID in the list of that user's devices. If the device ID is hashed or one-way encoded in some other manner, then device A may store the list of co-owned devices using the same encoding.
  • device A evaluates possible cooperation with device B (at S712). Possible cooperation may depend on a number of factors, as noted above. In addition to cooperation with respect to their underlying functionality (e.g., as speakers, etc.), when co- owned devices find each other (e.g., via a heartbeat), the devices may share and/or update configuration information, as needed.
  • Whether or not the devices actually cooperate (as determined in S712), device A preferably updates its history to reflect its encounter with device B (at S714). Device A may then advise the backend of that encounter (at S716). Note that preferably the processing of a detected heartbeat by a device takes place without the device having to contact the backend. Therefore, advising the backend of the encounter need not take place until another routine connection is made with the backend. Note too that if either device updates its configuration as a result of the encounter then that update should eventually propagate back to the backend.
  • device A tries to determine (at S718) if device B is owned by a friend of the owner of device A.
  • the heartbeat message may contain an encoding of the user ID of the user of device B, and that each device stores information from the user database 130, including a list of friends.
  • the user ID in the heartbeat message can be compared to the user IDs in the list of friends to determine if there is a match. It should be appreciated that if the user ID in the heartbeat message is one-way encoded then the user IDs in the friends list should be similarly encoded.
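The following sketch (Python) illustrates this decision, under the assumption that the heartbeat carries hashed IDs and that the device keeps its co-owned device list and friend list in the same one-way encoding; the structures and return values are invented for illustration.

```python
# Illustrative classification of a received heartbeat as coming from a
# co-owned device, a friend's device, or an unknown device.

def classify_heartbeat(msg, own_device_hashes, friend_user_hashes):
    """msg is a decoded heartbeat with hashed 'device' and optional
    'owner' fields; returns 'co-owned', 'friend', or 'unknown'."""
    if msg.get("device") in own_device_hashes:
        return "co-owned"    # evaluate cooperation as in S712
    if msg.get("owner") in friend_user_hashes:
        return "friend"      # evaluate permitted cooperation (S720)
    return "unknown"         # just record the encounter (S722/S724)
```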
  • device A evaluates possible cooperation between the devices (as cooperation between friends' devices) (at S720).
  • This kind of cooperation may include the same kinds of cooperation as between co-owned devices (in S712), however it may depend on permissions associated with the friend. Friends' devices may even share some configuration and corpora information, however this is preferably done on a limited and temporary basis.
  • device A preferably updates its history to reflect its encounter with device B (at S714), and again, device A may then advise the backend of that encounter (at S716).
  • device A determines (at S722) whether it has encountered device B before. Device A may then record information about device B (at S724) and then proceeds to update its history (at S714) and, eventually, to advise the backend of the encounter (at S716).
  • co-owned devices may need to communicate with each other in order to synchronize their configuration information.
  • Each device 102 therefore preferably has at least one mechanism or process running that is listening for communications from other devices via the various communications mechanisms.
  • Two devices that encounter each other may then interact further, as needed.
  • communication between two devices may take place, e.g., via a local wireless mechanism or a local wired network, etc.
  • two devices may first encounter each other via a heartbeat on one communications mechanism (e.g., Bluetooth or BLE) and then have subsequent communication using a different communication mechanism (e.g., Wi-Fi).
  • Exemplary processing for potential cooperation between co-owned devices (at S712) is shown in FIG. 7E.
  • in this example, device A is the one that detected the heartbeat of device B.
  • the devices may (at S728) update / synchronize their configuration information (if needed).
  • the device also determines information about itself and the other device in order to determine if any cooperation is possible and desired.
  • the device may determine the device-specific functionality of the other device (at S730), what the other device is doing (at S732), and what the device itself is doing (at S734). This information may be used to determine (at S736) possible cooperation(s) between the devices. Protocols may be established for various devices or types of devices to support their cooperation. For example, in some implementations one device is touched to another to establish or indicate a desired cooperation between them.
  • device A may also determine (at S738) if there have been any indications of desired cooperation (e.g., if one device has touched the other or if a person has instructed the devices to cooperate with certain devices if and when found). Based on the information determined (at S730, S732, S734, S736, S738), the device may select and initiate a possible cooperation (at S740). If multiple cooperation(s) are possible (as determined at S736), then the selection of one of them may be favored by an indication of desired cooperation (as determined at S738).
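  • As a rough sketch of the selection logic above (S736/S738/S740), and not the patent's implementation: possible cooperations could be represented as simple labels, with any indicated (desired) cooperation taking priority.
```python
def select_cooperation(possible: list, desired: list):
    """Pick one cooperation from those determined possible (S736), favoring
    any for which an indication of desired cooperation was found (S738)."""
    for coop in desired:
        if coop in possible:
            return coop            # an indicated and possible cooperation wins
    return possible[0] if possible else None

# e.g. select_cooperation(["stereo_pair", "shared_queue"], ["shared_queue"])
# returns "shared_queue"; with no desired cooperation it falls back to "stereo_pair".
```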
  • Note that device A may have detected device B's heartbeat while device B also detected device A's heartbeat. In such cases, a convention may need to be established as to which device makes certain decisions.
  • One exemplary convention is that the device that first initiates a contact with the other device takes the lead, if/when needed, in decision making.
  • Another possible approach is to have the device with the highest device ID take the lead, if/when needed, in decision making.
  • Note that the fact that device A initiates a cooperation with device B (at S740) does not mean that device B will go along with the cooperation.
  • devices A and B may negotiate and agree on a cooperation before it is initiated, in which case device B goes along as agreed.
  • Exemplary processing for potential cooperation between friends' devices (at S720 in FIG. 7D) is shown in FIG. 7F.
  • This process is similar to that for co-owned devices (described above with reference to FIG. 7E), but (i) requires that the devices have permission to cooperate, and (ii) preferably only updates configuration information needed to support any chosen cooperation.
  • If permitted (based on the friend permissions associated with the owner of device B), device A establishes a connection with device B (at S742). If permitted, device A determines the specific functionality of device B (at S744) and what device B is doing (at S746). Device A determines what it is currently doing (at S748). Based at least in part on some of the information it has determined, device A determines (at S750) possible cooperation with device B.
  • Device A also determines (at S752) if there have been any other indications of desired cooperation between the devices. Based at least in part on these determinations, device A selects (at S754) a possible and permitted cooperation with device B. If multiple cooperation(s) are possible (as determined at S736), then the selection of one of them may be favored by an indication of desired cooperation (as determined at S752).
  • Devices A and B update (at S756) their configuration information (as and if needed) to support the selected/permitted cooperation. Then the permitted cooperation is initiated (at S758).
  • a device may join one or more other devices merely by being put in proximity to the other devices; in some cases, devices may be joined by specific instructions via a user interface; and in some cases, devices may be joined by having one of them touch the other.
  • To support joining based on these other factors (e.g., touch, proximity, specific command, etc.), each device 102 may routinely monitor for some type of contact from other devices (at S760).
  • the type of contact detected may depend on one or more factors such as, e.g., the type of device (i.e., on its underlying device-specific functionality).
  • Some devices may attempt to initiate contact with other devices based on physical touch, user instructions, etc.
  • the detection of a contact attempt from another device may involve interpretation of voice and/or gesture instructions (using voice/speech mechanism(s) 174 and/or gesture mechanism(s) 166), from sensor input (from one or more sensors 138), etc.
  • When contact from another device is detected, the device proceeds (at S764) to process the contact with the other device.
  • Processing of a possible contact attempt may be similar to the heartbeat processing described above (with reference to FIGS. 7D - 7F).
  • When a first device detects a contact attempt from a second device, the first device will still need to determine the device ID of the second device (and vice versa), determine if the devices are co-owned or owned by friends, and process the contact attempt accordingly.
  • a device may assume that the other device (the device that initiated the contact) is trying to establish some form of cooperation between the devices.
  • the processing to evaluate cooperation between co-owned devices (at S712 in FIGS. 7D, 7E) and evaluate cooperation between friends' devices (at S720 in FIGS. 7D, 7F) may be modified as described here with reference to FIGS. 7H - 13.
  • The primary difference from the processing described above is that the desired cooperation is given precedence in selecting a possible cooperation.
  • In evaluating device-initiated cooperation between co-owned devices, the device-initiated desired cooperation is selected and initiated (at S740' in FIG. 7I) if it is determined to be a possible cooperation; in evaluating device-initiated cooperation between friends' devices (at S720' in FIG. 13), the desired device-initiated cooperation is selected and initiated if it is possible and permitted.
  • Devices that are cooperating also need to be able to end their cooperation.
  • Cooperation may be terminated in various ways, including, without limitation, by one or more of the devices being powered off, by explicit user instruction, by a change in permissions (as determined from updated configuration information received by a device), or by the devices becoming separated in such a way that they can no longer cooperate (e.g., one device is removed to a different room in a house).
  • Those of skill in the art will realize and understand, upon reading this description, that different and/or other ways of terminating device cooperation may be used.
  • Programs that implement such methods may be stored and transmitted using a variety of media (e.g., computer readable media) in a number of manners.
  • Hard-wired circuitry or custom hardware may be used in place of, or in combination with, some or all of the software instructions that can implement the processes of various embodiments.
  • various combinations of hardware and software may be used instead of software only.
  • FIG. 5A is a schematic diagram of a computer system 500 upon which embodiments of the present disclosure may be implemented and carried out.
  • the computer system 500 may include a bus 502 (i.e., interconnect), one or more processors 504, one or more communications ports 514, a main memory 506, read-only memory 508, removable storage media 510, and a mass storage 512.
  • a "processor” means one or more microprocessors, central processing units (CPUs), computing devices, microcontrollers, digital signal processors, or like devices or any combination thereof, regardless of their architecture.
  • An apparatus that performs a process can include, e.g., a processor and those devices such as input devices and output devices that are appropriate to perform the process.
  • Processor(s) 504 can be custom processors or any known processor, such as, but not limited to, an Intel® Itanium® or Itanium 2® processor(s), AMD® Opteron® or Athlon MP® processor(s), or Motorola® lines of processors, ARM-based processors, and the like.
  • Communications port(s) 514 can be any of an RS-232 port for use with a modem based dial-up connection, a 10/100 Ethernet port, a Gigabit port using copper or fiber, or a USB port, and the like.
  • Communications port(s) 514 may be chosen depending on a network such as a Local Area Network (LAN), a Wide Area Network (WAN), or any network to which the computer system 500 connects.
  • the computer system 500 may be in communication with peripheral devices (e.g., display screen 516, input device(s) 518) via Input / Output (I/O) port 520.
  • While referred to herein as peripheral devices, it should be appreciated that such devices may be integrated into the form of a device comprising the computer system 500.
  • a computer system that is used in a cellular phone has the display screen and input device as part of the phone.
  • The peripheral devices, if provided, may be combined (e.g., in the case of a touch screen or the like).
  • Main memory 506 can be Random Access Memory (RAM), or any other dynamic storage device(s) commonly known in the art.
  • Read-only memory 508 can be any static storage device(s) such as Programmable Read-Only Memory (PROM) chips for storing static information such as instructions for processor 504.
  • Mass storage 512 can be used to store information and instructions. For example, hard disks such as the Adaptec® family of Small Computer System Interface (SCSI) drives, an optical disc, an array of disks such as Redundant Array of Independent Disks (RAID), such as the Adaptec® family of RAID drives, or any other mass storage devices may be used.
  • Bus 502 communicatively couples processor(s) 504 with the other memory, storage and communications blocks.
  • Bus 502 can be a PCI / PCI-X, SCSI, a Universal Serial Bus (USB) based system bus (or other) depending on the storage devices used, and the like.
  • Removable storage media 510 can be any kind of external hard-drives, floppy drives, IOMEGA® Zip Drives, Compact Disc - Read Only Memory (CD-ROM), Compact Disc - Re-Writable (CD-RW), Digital Video Disk - Read Only Memory (DVD-ROM), SDRAM, etc.
  • Embodiments herein may be provided as one or more computer program products, which may include a machine-readable medium having stored thereon instructions, which may be used to program a computer (or other electronic devices) to perform a process.
  • The term "machine-readable medium" refers to any medium, a plurality of the same, or a combination of different media, which participate in providing data (e.g., instructions, data structures) which may be read by a computer, a processor or a like device.
  • Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
  • Non-volatile media include, for example, optical or magnetic disks and other persistent memory.
  • Volatile media include dynamic random access memory, which typically constitutes the main memory of the computer.
  • Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a system bus coupled to the processor. Transmission media may include or convey acoustic waves, light waves, and electromagnetic emissions, such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • the machine-readable medium may include, but is not limited to, floppy diskettes, optical discs, CD-ROMs, magneto-optical disks, ROMs, RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), SDRAMs, magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing electronic instructions.
  • Embodiments herein may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
  • data may be (i) delivered from RAM to a processor; (ii) carried over a wireless transmission medium; (iii) formatted and/or transmitted according to numerous formats, standards or protocols; and/or (iv) encrypted in any of a variety of ways well known in the art.
  • a computer-readable medium can store (in any appropriate format) those program elements which are appropriate to perform the methods.
  • main memory 506 is encoded with application(s) 522-1 that supports the functionality as discussed herein (the application 522-1 may be an application that provides some or all of the functionality of the services described herein, e.g., backend processing).
  • Application(s) 522-1 (and/or other resources as described herein) can be embodied as software code such as data and/or logic instructions (e.g., code stored in the memory or on another computer readable medium such as a disk) that supports processing functionality according to different embodiments described herein.
  • When a computer system 500 is used to implement backend functionality, application(s) 522-1 may include backend applications 524-1, and when a computer system 500 is used to implement functionality of a device, then application(s) 522-1 may include device applications 526-1.
  • processor(s) 504 accesses main memory 506 via the use of bus 502 in order to launch, run, execute, interpret or otherwise perform the logic instructions of the application(s) 522-1.
  • Execution of application(s) 522-1 produces processing functionality of the service related to the application(s).
  • the process(es) 522-2 represent one or more portions of the application(s) 522-1 performing within or upon the processor(s) 504 in the computer system 500.
  • When a computer system 500 is used to implement backend functionality, process(es) 522-2 may include backend process(es) 524-2; and when a computer system 500 is used to implement functionality of a device, then process(es) 522-2 may include device process(es) 526-2.
  • Other embodiments herein include the application 522-1 itself (i.e., the un-executed or non-performing logic instructions and/or data).
  • the application 522-1 may be stored on a computer readable medium (e.g., a repository) such as a disk or in an optical medium.
  • the application 522-1 can also be stored in a memory type system such as in firmware, read only memory (ROM), or, as in this example, as executable code within the main memory 506 (e.g., within Random Access Memory or RAM).
  • application 522-1 may also be stored in removable storage media 510, read-only memory 508, and/or mass storage device 512.
  • the computer system 500 can include other processes and/or software and hardware components, such as an operating system that controls allocation and use of hardware resources.
  • For example, an operating system (OS) or a kernel may be among the processes running on the computer system.
  • embodiments of the present invention include various steps or operations. A variety of these steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the operations. Alternatively, the steps may be performed by a combination of hardware, software, and/or firmware.
  • As used herein, a "module" refers to a self-contained functional component, which can include hardware, software, firmware or any combination thereof.
  • an apparatus may include a computer/computing device operable to perform some (but not necessarily all) of the described process.
  • Embodiments of a computer-readable medium storing a program or data structure include a computer-readable medium storing a program that, when executed, can cause a processor to perform some (but not necessarily all) of the described process.
  • Where a process is described herein, those of skill in the art will appreciate that the process may operate without any user intervention. In another embodiment, the process includes some human intervention (e.g., a step is performed by or with the assistance of a human).
  • a device 102 includes a computer system 146.
  • computer system 146 may correspond to a computer system 500 as described above (with reference to FIGS. 5A-5C), although, as should be appreciated, the computer system 146 may not include all of the components shown in FIG. 5A, and the computer system 146 may include additional components (e.g., related to special processing required by the device 102).
  • a computer system 146 may include multiple processors, multiple memories, etc.
  • a computer system 146 may be formed of multiple computer systems 500.
  • the computer system 146 may implement some or all of the device-specific functionality 132.
  • System-enabled devices 102 may be controlled by one or more of voice control, gesture control, and contact control (e.g., using buttons and the like).
  • certain kinds of system-enabled devices 102 when in the presence of other like devices, may be fully or partially controlled by one or more of the other devices or by instructions given to one or more of the other devices.
  • When the devices 102 are speakers, multiple devices may be combined to operate together, and certain commands (e.g., raise volume) may be applied across the combined devices.
  • a device's voice mechanism(s) 166 may be used to support voice control of the device.
  • Voice mechanism(s) 166 preferably include voice recognition mechanisms for basic commands appropriate to the kind of device. For example, for a device that is primarily a speaker, the voice commands may include commands to power the device off (or on from a low power mode), play louder, softer, etc.
  • the voice mechanism(s) 166 may be implemented using special hardware or circuitry and DSPs (Digital Signal Processors).
  • Each device preferably maintains a corpus of recognized words from users.
  • the device may maintain multiple corpora of words, one for each of a number of users. Since a device may be controlled by more than one person (and, depending on permissions set in the device, the person controlling a device may not be a known user of the system), the device needs to be able to associate certain commands with appropriate users. In this manner the device can determine which corpus of words to use for the voice / command recognition.
  • the device 102 may use face recognition mechanism(s) 168 in combination with one or more cameras (sensors 138) to associate a voice with a particular user in order to select an appropriate corpus.
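  • A minimal sketch of per-user corpus selection, assuming the face recognition mechanism(s) 168 yield a user ID (the IDs, words, and "default" fallback below are illustrative, not from the patent):
```python
# Per-user corpora of recognized words/phrases. The "default" corpus covers a
# speaker who is not a known user of the system (subject to device permissions).
corpora = {
    "user-456": {"play", "skip", "louder", "softer"},
    "user-789": {"play", "replay", "pause"},
    "default":  {"play", "stop"},
}

def corpus_for_speaker(recognized_user_id):
    """Choose which corpus to use for voice / command recognition."""
    return corpora.get(recognized_user_id, corpora["default"])
```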
  • In some cases, the device may not be able to process a voice command/request locally; this may be due to any number of factors. In such cases, the device may, if possible (e.g., if connected to the network), and if permissible, send the voice command/request to the backend for processing (e.g., by voice recognition provided by added functionality 120).
  • the voice may be sent in a raw form or in some preprocessed form.
  • the result of such processing may be a command/request for the backend (e.g., a database query) or a command for the device itself.
  • device commands processed remotely via the backend may not be quick enough to control certain aspects of a device (e.g., for a speaker, "Play louder"), and backend processing is more useful for more complex commands, especially those involving database queries.
  • a device's gesture mechanism(s) 164 may be used, alone or in combination with the voice mechanism(s) 166, to support gesture control of the device.
  • Gesture mechanism(s) 164 preferably include gesture recognition mechanisms for basic commands appropriate to the kind of device.
  • the gesture mechanism(s) 164 may use one or more of the sensors 138, including, e.g., one or more cameras. Special purpose gesture detection / recognition hardware and circuitry may be used.
  • a device may use gaze detection (determined by face/gaze mechanism(s) 168), possibly in combination with other inputs, to determine whether a voice command is intended for the device.
  • Face/gaze mechanism(s) 168 may use one or more sensors (e.g., one or more cameras) to determine whether or not a person talking is actually looking at the device 102. Since a person may begin talking (to a device) before they completely face the device, preferably each device constantly buffers a period of sound so that once a gaze is detected, the device can begin voice recognition of the buffered stream.
  • mouth movement detection can be used in some cases.
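  • A minimal sketch of the buffering idea described above: the device keeps a short rolling window of audio so that, once a gaze is detected, recognition can begin from speech that started before the gaze was confirmed. The buffer length, frame size, and the `recognizer` object are assumptions.
```python
from collections import deque

SAMPLE_RATE = 16000          # assumed capture rate (Hz)
FRAME_SAMPLES = 320          # 20 ms frames at 16 kHz
BUFFER_SECONDS = 3           # assumed length of the rolling pre-gaze buffer

max_frames = (SAMPLE_RATE * BUFFER_SECONDS) // FRAME_SAMPLES
audio_buffer = deque(maxlen=max_frames)    # oldest frames fall off automatically

def on_audio_frame(frame):
    # Called for every captured audio frame, whether or not anyone is looking.
    audio_buffer.append(frame)

def on_gaze_detected(recognizer):
    # Once the face/gaze mechanism reports a gaze, feed the buffered audio
    # (speech that may have started before the gaze) to the recognizer first;
    # live frames can then follow.
    for frame in list(audio_buffer):
        recognizer.accept_frame(frame)
```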
  • each device reports information to the backend (corresponding to arc #7 in FIG. 1).
  • the information preferably includes the unique device ID of the reporting device and, if the device is associated with a user, the unique user ID associated with the owner of the reporting device.
  • Some or all of the information reported by each device may be stored in the devices database 128 and/or the user database 130, e.g., as device history and/or user history, respectively. Since each device has a unique device ID, and since each user has a unique user ID, information from a device may be stored in the database(s) keyed on the device and user IDs.
  • a device 102 may include information about its location at the time of its reporting.
  • the location information may be stored in the device database (both as current device location and as device history).
  • the location information may be stored in the user database as per-device history. In this manner, queries to the database may include queries about device location.
  • the user may provide location identification information associated with their current location at the time of registration.
  • a user may also store multiple locations in the system, each with different identification provided by the user. For example, a user may store GPS location information for their home, their work, their friends' homes, etc. In this manner, the system can support database queries based on named locations (e.g., "Where is my device?" to which the system's response may be "At Joe's home.”).
  • a user need not specifically request storage of location information, as location (e.g., GPS) data are preferably stored automatically as part of history data or context metadata.
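  • As an illustration of how named locations could support such queries (all names, coordinates, and the distance threshold below are made up for the sketch):
```python
import math

# User-provided names for stored GPS locations (illustrative values).
named_locations = {
    "home":       (40.7128, -74.0060),
    "work":       (40.7484, -73.9857),
    "Joe's home": (40.6892, -74.0445),
}

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def name_for_location(device_gps, threshold_km=0.2):
    """Answer "Where is my device?" with a user-provided name when the
    device's reported GPS fix is close to a stored named location."""
    name, dist = min(((n, haversine_km(device_gps, loc))
                      for n, loc in named_locations.items()),
                     key=lambda item: item[1])
    return name if dist <= threshold_km else None
```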
  • a device 102 may also report information that is specific to the kind of device (e.g., the device's specific functionality). For example, a device that is primarily a speaker may report to the backend information about what it plays and where. In some cases, the information may include information about device settings and what other devices were involved (e.g., joined). In this manner, the database(s) will support queries of the kind "What was I playing at Joe's house last night at around 10 o'clock?", to which the system may provide a list of songs.
  • a device 102 may also report information about proximate devices or users.
  • FIGS. 8A - 8D depict aspects of the architecture of an exemplary device 800 (an embodiment of device 102) in which the specific functionality of the device is sound rendering.
  • Device 800 may be used, e.g., as a speaker.
  • sound rendering device 800 includes components 832 supporting device-specific functionality. These components 832 include one or more speaker drivers 860, one or more signal processors 862, one or more processors 864, memory/storage 866, and controls 868.
  • The device 800 may include various communications mechanisms, e.g., Bluetooth mechanisms (including BLE) and Wi-Fi. In some implementations, the communications mechanisms of a device 800 include Bluetooth mechanisms and do not include Ethernet, ZigBee, or cellular mechanisms.
  • sound rendering device 800 may also include sensors 838, including one or more cameras 870, one or more microphones 872, device motion sensor(s), location / position sensor(s), external motion sensor(s), touch/contact sensor(s), light sensor(s), temperature sensor(s), and other sensors.
  • In some implementations, the sensors of a device 800 do not include cameras or temperature sensors.
  • In one exemplary implementation, these components may be implemented as follows:
  • speaker drivers 860: a subwoofer (e.g., GGEC W0200A) and two tweeters (e.g., GGEC T20N4A)
  • signal processors 862: e.g., a TI Class-D amplifier and DSP
  • processors 864: a single- or multicore ARM-based SoC (e.g., a FreeScale i.MX6 ARM-based MCU)
  • memory/storage 866: NAND flash (e.g., 4 GBytes from Micron) and DDR (e.g., 1 GByte DDR3 from Micron)
  • controls 868: capacitive touch buttons, strips and surfaces, haptic and digital encoders, voice and gesture controls, accelerometers, and device motion sensing
  • Bluetooth and Wi-Fi: a WiFi/BT combo SIP (e.g., Marvell 88W8797)
  • power management: e.g., a FreeScale power management chip, coulomb counter and battery power management chip
  • any known mechanism may be used for the various interface mechanisms 162.
  • The face movement detection may use the CANDIDE system for model-based coding of human faces.
  • CANDIDE uses a facial model with a small number of polygons (approximately 100) that allows for fast reconstruction with moderate computing power.
  • the sound-rendering device 800 may operate as a device 102 as described above.
  • the various components may be implemented and packaged in multiple ways, and that the device is not limited by the manner in which the components are implemented or packaged. It should further be appreciated that the device is not limited by the form that the packaging or device takes (i.e., by the device's form factor).
  • In this exemplary corpus, a phrase means one or more words.
  • phrases in bold italic font are in the local corpus; phrases in square brackets ("[", "]") are optional.
  • a vertical bar ("|") between phrases means "or" (i.e., one of the phrases).
  • a phrase followed by a star means that the phrase may be repeated.
  • a word phrase followed by "(s)” means that the singular or plural of the word may be used.
  • For example, item number 2 in the table could match a variety of spoken phrases.
  • item no. 3 in the table could mean any of: “Play the next ten tunes”, “Play any random song”, “play next tune”, “replay the previous song,” “Play random tunes”, “Replay”, “Play”, etc.
  • item no. 5 could mean any of: “Skip”, “next three songs”, “a tune”, “skip seven”, “previous tune”, etc.
  • item no. 8 could mean any of "tone”, “adjust treble up", “bass lower”, etc.
  • <musical entity> can be a specific song, artist, or album,
  • <artist entity> is the name of an artist (e.g., Pink Floyd), and
  • <album entity> is a specific collection of songs in order (e.g., "Dark Side of the Moon").
  • the voice commands in some embodiments may include:
  • the voice/speech recognition mechanism(s) 178 may thus recognize certain spoken phrases and will then have to determine their meaning.
  • this exemplary corpus provides the syntax of recognized phrases, and that not all phrases will have meaning (or reasonable meaning) for the device. For example, no. 3 above would support recognition of the phrase "replay the next any three song", and no. 5 above would support recognition of the phrase "skip the previous a tune”. While both of these phrases are syntactically correct (according to the syntax in the corpus), they may not correspond to any meaningful command and may be ignored by the device.
  • The exemplary corpus for the voice/speech recognition mechanism(s) 178 is only provided as an example, and those of skill in the art will realize and understand, upon reading this description, that different and/or other voice instructions may be understood by the sound rendering device 800, and are contemplated herein.
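  • Purely as an illustration of how one entry of such a phrase syntax could be matched on-device (this is not the patent's implementation), the sketch below hand-compiles a hypothetical rule, in the spirit of items 3 and 5 above, into a regular expression with optional parts and alternatives.
```python
import re

# Hypothetical rule: optional "play"/"replay", optional "the", optional
# "next"/"previous"/"any"/"random", optional count, then "song(s)"/"tune(s)".
NUMBER = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|\d+)"
PLAY_RULE = re.compile(
    rf"^(?:play|replay)?\s*(?:the\s+)?(?:next|previous|any|random)?\s*"
    rf"(?:{NUMBER}\s+)?(?:songs?|tunes?)$",
    re.IGNORECASE,
)

def matches_play_rule(utterance: str) -> bool:
    return PLAY_RULE.match(utterance.strip()) is not None

# matches_play_rule("Play the next ten tunes")   -> True
# matches_play_rule("replay the previous song")  -> True
# matches_play_rule("skip seven")                -> False (would need a different rule)
```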
  • Sound rendering devices 800 may cooperate with each other to render the same sounds (e.g., to play the same music from the same source and at the same time - preferably synchronized). When two or more sound rendering devices 800 are cooperating to render sound from the same source, they need not all render exactly the same sound. For example, multiple sound rendering devices 800 may cooperate to render sound from the same source as a surround sound system. As another example, multiple sound rendering devices 800 may cooperate to render sound from the same source such that some of them render some sound (e.g., from some musical instruments) while others render other sound (e.g., from other musical instruments).
  • a sound-rendering device 800 may also be a source of the signal used to produce the sound.
  • In some cases, the source of the signal may be another device, e.g., a smartphone such as an iPhone or the like.
  • two sound-rendering devices 800-A and 800-B may cooperate to provide a stereo effect.
  • the DSPs in the devices cooperate to produce, e.g., a Haas effect.
  • the devices may determine their own relative positions (e.g., using echo location or some other mechanism), and they may use this relative position information to optimize the cooperative effect.
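  • As a rough numerical sketch (not from the patent): the Haas (precedence) effect involves small inter-channel delays, and an estimated inter-device distance can be converted into a sample offset a DSP might apply. The sample rate and values below are assumptions.
```python
SAMPLE_RATE = 48000      # assumed output sample rate (Hz)
SPEED_OF_SOUND = 343.0   # m/s at room temperature

def haas_delay_samples(delay_ms):
    """Samples of delay for a chosen Haas-effect lag (typically a few ms to ~30 ms)."""
    return round(SAMPLE_RATE * delay_ms / 1000.0)

def propagation_delay_ms(distance_m):
    """Acoustic travel time over the distance between two devices, e.g., as
    estimated via echo location."""
    return 1000.0 * distance_m / SPEED_OF_SOUND

# A device about 2 m farther from the listener corresponds to roughly
# propagation_delay_ms(2.0) ~= 5.8 ms, i.e. haas_delay_samples(5.8) ~= 280 samples.
```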
  • multiple sound-rendering devices 800-A - 800-D may cooperate such that each one of them plays only some of the instruments in the source signal.
  • the source signal may provide separate streams for each instrument or each DSP may be programmed to filter out certain instruments.
  • each device is allocated one or more instruments to render. For example, assume that initially device A was playing alone and rendering all sounds in the source signal.
  • When device B joins device A, device A may render, e.g., bass and violin, while device B may render cello and vocals. If device C then joins devices A and B, device C can be given responsibility for violin, leaving device A with only bass. Similarly, when device D joins, it can take responsibility for vocals from device B (as shown in the drawing in FIG. 9B). If other devices join the group they can combine with one or more of the already-present devices or they can take on some other responsibility. If a device leaves the group then the part of the signal for which it was responsible should be re-assigned to another device still in the group.
  • devices 800 may be given responsibility for different and/or other aspects of an audio stream. It should be appreciated that a device 800 may render (or not render) any part or parts of an audio stream that its DSP can filter (in or out). Furthermore, a device 800 may enhance or modify any part or parts of an audio stream that its DSP can filter.
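  • A minimal sketch of re-allocating parts of the signal as devices join and leave a group; the stem names and the round-robin policy are illustrative, not the patent's allocation scheme.
```python
def allocate_stems(stems, devices):
    """Spread the named parts ("stems") of a source signal across the devices
    currently in the group; re-run whenever membership changes so that a
    departing device's stems are re-assigned to the remaining devices."""
    allocation = {d: [] for d in devices}
    for i, stem in enumerate(stems):
        allocation[devices[i % len(devices)]].append(stem)
    return allocation

stems = ["bass", "violin", "cello", "vocals"]
print(allocate_stems(stems, ["A"]))             # A renders everything
print(allocate_stems(stems, ["A", "B"]))        # A: bass, cello; B: violin, vocals
print(allocate_stems(stems, ["A", "B", "C"]))   # stems re-balanced when C joins
```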
  • multiple sound-rendering devices 800-A - 800-E placed or located in an arbitrary (haphazard) arrangement may cooperate.
  • the devices may determine their own relative positions (e.g., using echo location or some other approach), and they may use this relative position information to optimize the cooperative effect.
  • the devices may also cooperate to produce an optimal or beneficial cooperative effect for a listener (if the position of the listener is known or can be determined).
  • devices can use their respective cameras to locate (and follow) the listener, adjusting the sound accordingly.
  • a single camera in a single device may be able to determine the direction in which the listener is located. Some techniques allow single cameras to determine approximate distance. Multiple cameras (in a single device or in multiple devices) can more accurately locate a listener (e.g., by face location and/or movement tracking).
  • Location detection may also be achieved using voice input and echo detection. Thus, e.g., in a device that does not have a camera, voice input and echo detection may be used alone to determine location. In a device that has a camera, voice input and echo detection may be used alone or in combination with the camera(s) to determine location.
  • In the example of FIG. 9C, bass may be rendered by devices A and C (along the stereo line L1), violin and cello may be rendered by devices A and E (along the stereo line L2), and vocals may be rendered by devices D and E (along the stereo line L3). Effects (e.g., cancellation of room noise or echoes) may be performed by device B.
  • When the source provides multiple channels, the devices may cooperate to render these channels.
  • a user may grant a friend guest privileges to share their devices 800.
  • a user may grant temporary ("party mode") privileges to any other device to share their sound rendering devices 800.
  • Sound may be classified into genres (e.g., vocal, instrumental, jazz, classical, spoken voice, etc.), and these genres may be provided with the sound source signal and may be used to automatically set or adjust the DSPs in a sound rendering device 800.
  • the preset genre information may be combined with or overridden by user preferences (which may be provided via some user interface or learned by the device based on user interactions with the device). For example, if a user always adjusts the DSP settings for a certain genre of music, always overriding the preset DSP settings, then the device 800 may learn the user's desired settings and always use those instead of the system's preset settings for that genre.
  • Genre information may be set, e.g., in advance, by an offline process that analyzes the source sound.
  • the provider of a source library may pre-analyze all music in their library to classify the genre of each item of music. That classification may be stored, e.g., as a bit vector representing the genre, and may be provided with the source data. It should be appreciated, however, that the processing of genre information in a source signal is independent of the manner in which that genre information was obtained or set.
  • Cooperating devices 800 may use genre information in the source signal to determine and adjust how they cooperate. Thus, when rendering sound corresponding to multiple songs, cooperating devices 800 may modify the way in which they cooperate depending on the genre of each song.
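  • A rough sketch of genre information carried as a bit vector, with per-genre DSP presets that learned user overrides can replace; the genre list, presets, and names below are assumptions, not the patent's encoding.
```python
# Illustrative genre bit positions and DSP presets.
GENRES = ["vocal", "instrumental", "jazz", "classical", "spoken"]
PRESETS = {"jazz": {"bass": +2, "treble": +1}, "spoken": {"bass": -3, "treble": +2}}
user_overrides = {}   # per-genre settings the device has learned from the user

def genre_bits(genres):
    """Encode a set of genre labels as a bit vector."""
    bits = 0
    for g in genres:
        bits |= 1 << GENRES.index(g)
    return bits

def dsp_settings(bits):
    """Pick DSP settings for the genres flagged in the source signal,
    preferring any settings the user has consistently chosen themselves."""
    settings = {}
    for i, g in enumerate(GENRES):
        if bits & (1 << i):
            settings.update(user_overrides.get(g, PRESETS.get(g, {})))
    return settings

def learn_override(genre, settings):
    # If a user always re-adjusts a genre's preset, remember and reuse it.
    user_overrides[genre] = settings
```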
  • the system 100 may obtain information from each device 102.
  • the devices preferably inform the backend 184 what sound they are rendering (e.g., what music, etc. they are playing), as well as when and where it is being rendered.
  • each device 800 retains a history of its activities and provides that history to the backend 184 on a regular basis and/or when it is able to.
  • the history may be provided as a time-stamped, ordered list of activities and device settings that may be used to reproduce the device's activities. If a device is cooperating with another device, that information is also included in the history and both (all) cooperating devices provide their own history information to the backend.
  • the backend stores device history information in the device and user databases 128, 130.
  • This kind of device history information supports subsequent queries (via the backend 184 and possibly the added functionality 120) of the kind:
  • query #2 may require that Joe and the user making the query be friends and may require permission from Joe.
  • Query #4 may require that Mary and the user making the query be friends and may require permission from Mary. Note too that Query #4 assumes that the system has been updated to know (in near real time) what Mary is listening to.
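  • A rough sketch of what one time-stamped entry in such a device history report might look like; the field names and JSON encoding are assumptions, not the patent's format.
```python
import json, time

def history_entry(device_id, user_id, activity, settings,
                  cooperating_with, location=None):
    """One record in a device's ordered, time-stamped activity history."""
    return {
        "ts": time.time(),                # when the activity occurred
        "device_id": device_id,           # unique device ID (database key)
        "user_id": user_id,               # unique user ID of the device's owner
        "activity": activity,             # e.g. what was being rendered
        "settings": settings,             # device settings needed to reproduce it
        "cooperating_with": cooperating_with,   # IDs of cooperating devices, if any
        "location": location,             # e.g. a GPS fix, if available
    }

report = [history_entry("dev-123", "user-456", "render:song-789",
                        {"volume": 7}, ["dev-124"], (40.7128, -74.0060))]
payload = json.dumps(report)              # sent to the backend on a routine connection
```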
  • a device may try to filter out environmental noise in order to process voice interactions more precisely.
  • Voice interaction with a sound-rendering device 800 poses additional problems, since the device itself may be a source of sound. Preferably, a sound-rendering device 800 filters out the sound it produces from the sound obtained by its sound sensors (microphones).
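  • The text does not say how that filtering is done; a generic, textbook-style sketch is a normalized-LMS adaptive filter that subtracts an estimate of the device's own (played) sound from the microphone signal.
```python
import numpy as np

def nlms_self_sound_filter(played, mic, taps=256, mu=0.5, eps=1e-6):
    """Remove an estimate of the device's own playback from the microphone
    signal, leaving a residual that should mostly contain external sound
    (e.g., speech). Standard NLMS; not the patent's specific method."""
    n = min(len(played), len(mic))
    w = np.zeros(taps)           # adaptive FIR estimate of the playback-to-mic path
    x_buf = np.zeros(taps)       # most recent 'taps' played samples, newest first
    out = np.zeros(n)
    for i in range(n):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = played[i]
        est_echo = w @ x_buf                 # predicted self-sound at the mic
        err = mic[i] - est_echo              # residual: external sound + noise
        w += (mu / (eps + x_buf @ x_buf)) * err * x_buf
        out[i] = err
    return out
```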
  • the words “first”, “second”, and so on, when used as adjectives before a term, are merely used to distinguish similar terms, and their use does not imply or define any numerical limits or any ordering (temporal or otherwise).
  • the terms “first device” and “second device” are merely used to refer to and distinguish between different devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
EP13856018.0A 2012-11-16 2013-11-14 Vereinheitlichtes rahmenwerk für vorrichtungskonfiguration, -interaktion und -steuerung sowie zugehörige verfahren, vorrichtungen und systeme Withdrawn EP2920673A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261727217P 2012-11-16 2012-11-16
PCT/US2013/070002 WO2014078480A1 (en) 2012-11-16 2013-11-14 Unified framework for device configuration, interaction and control, and associated methods, devices and systems

Publications (1)

Publication Number Publication Date
EP2920673A1 true EP2920673A1 (de) 2015-09-23

Family

ID=50731669

Family Applications (1)

Application Number Title Priority Date Filing Date
EP13856018.0A Withdrawn EP2920673A1 (de) 2012-11-16 2013-11-14 Vereinheitlichtes rahmenwerk für vorrichtungskonfiguration, -interaktion und -steuerung sowie zugehörige verfahren, vorrichtungen und systeme

Country Status (6)

Country Link
EP (1) EP2920673A1 (de)
JP (1) JP2016502137A (de)
KR (1) KR20150086332A (de)
CA (1) CA2891202A1 (de)
TW (1) TW201423485A (de)
WO (1) WO2014078480A1 (de)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI677751B (zh) * 2017-12-26 2019-11-21 技嘉科技股份有限公司 攝像裝置與運作攝像裝置的方法
KR20190114325A (ko) 2018-03-29 2019-10-10 삼성전자주식회사 사용자 음성 입력을 처리하는 장치
EP4130941A1 (de) 2018-05-04 2023-02-08 Google LLC Hot-word-freie anpassung von automatisierten hilfsfunktionen
KR102661487B1 (ko) * 2018-05-04 2024-04-26 구글 엘엘씨 검출된 제스처 및 시선에 기초하여 자동화된 어시스턴트 기능 호출
EP4343499A3 (de) 2018-05-04 2024-06-05 Google LLC Anpassung eines automatisierten assistenten auf basis von erfasster mundbewegung und/oder blick
JP2021144259A (ja) * 2018-06-06 2021-09-24 ソニーグループ株式会社 情報処理装置および方法、並びにプログラム
TWI826031B (zh) * 2022-10-05 2023-12-11 中華電信股份有限公司 基於歷史對話內容執行語音辨識的電子裝置及方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7438414B2 (en) * 2005-07-28 2008-10-21 Outland Research, Llc Gaze discriminating electronic control apparatus, system, method and computer program product
WO2008069519A1 (en) * 2006-12-04 2008-06-12 Electronics And Telecommunications Research Institute Gesture/speech integrated recognition system and method
KR100880536B1 (ko) * 2007-01-05 2009-01-28 아주대학교산학협력단 이기종 컴퓨팅 및 서비스 통합을 위한 오픈 프레임워크시스템
US8676942B2 (en) * 2008-11-21 2014-03-18 Microsoft Corporation Common configuration application programming interface
US8843893B2 (en) * 2010-04-29 2014-09-23 Sap Ag Unified framework for configuration validation
KR101789619B1 (ko) * 2010-11-22 2017-10-25 엘지전자 주식회사 멀티미디어 장치에서 음성과 제스쳐를 이용한 제어 방법 및 그에 따른 멀티미디어 장치

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2014078480A1 *

Also Published As

Publication number Publication date
TW201423485A (zh) 2014-06-16
KR20150086332A (ko) 2015-07-27
CA2891202A1 (en) 2014-05-22
JP2016502137A (ja) 2016-01-21
WO2014078480A1 (en) 2014-05-22

Similar Documents

Publication Publication Date Title
EP2920673A1 (de) Vereinheitlichtes rahmenwerk für vorrichtungskonfiguration, -interaktion und -steuerung sowie zugehörige verfahren, vorrichtungen und systeme
JP7225301B2 (ja) 音声インターフェイスデバイスにおけるマルチユーザパーソナライゼーション
US10447748B2 (en) Sharing media information between applications on client devices
CN106557297B (zh) 基于上下文适配音频输出
JP2021121928A (ja) ホームオートメーションのためのインテリジェントアシスタント
KR102393364B1 (ko) 오디오 신호 제어 방법 및 이를 지원하는 전자장치
US10073578B2 (en) Electromagnetic interference signal detection
CN109791765A (zh) 多个语音服务
US20140188985A1 (en) Method and system for executing an application
GB2524864A (en) Adjusting speech recognition using contextual information
US11558848B2 (en) Intelligent notification delivery
US10101869B2 (en) Identifying device associated with touch event
US20190347560A1 (en) Cognitive engine for multiple internet of things devices
US11275576B2 (en) Techniques for firmware updates with accessories
WO2016171887A1 (en) Base station for use with digital pens
EP3335099A1 (de) Elektromagnetische störsignalerfassung
US10936276B2 (en) Confidential information concealment
US20210005189A1 (en) Digital assistant device command performance based on category
US20140282683A1 (en) Computing system with device interaction mechanism and method of operation thereof
EP3350681B1 (de) Elektromagnetische störsignalerfassung
TW201407414A (zh) 輸入裝置及搭配其使用之主機
US12032951B2 (en) Techniques for firmware updates with accessories
US20210397436A1 (en) Techniques for firmware updates with accessories
US20210397435A1 (en) Techniques for firmware updates with accessories
US20160027296A1 (en) Using device data collected from other proximate devices

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20150511

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20151127