WO2023083262A1 - Method for providing services based on multiple devices, related apparatus and system (基于多设备提供服务的方法、相关装置及系统)

Info

Publication number
WO2023083262A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2022/131166
Other languages
English (en)
French (fr)
Inventor
王成录
甘嘉栋
潘邵武
邵英玮
周昕宇
周剑辉
王松涛
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023083262A1

Classifications

    • G06F 9/5061: Allocation of resources, e.g. of the central processing unit [CPU]; partitioning or combining of resources
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5038: Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the execution order of a plurality of tasks, e.g. taking priority or time-dependency constraints into consideration
    • G06F 9/54: Interprogram communication
    • G06F 9/542: Event management; broadcasting; multicasting; notifications
    • Y02P 90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Definitions

  • The present application relates to the field of terminal and communication technologies, and in particular to a method, a related device, and a system for providing services based on multiple devices.
  • With the popularity of terminal devices, an individual may own multiple terminal devices, such as mobile phones, tablet computers, and smart screens. How to use the resources of multiple terminal devices to provide users with natural and intelligent services is a direction of current and future research.
  • This application provides a method for providing services based on multiple devices, a related device, and a system, which can realize cross-device resource intercommunication and sharing and provide users with natural and intelligent services.
  • In a first aspect, an embodiment of the present application provides a communication system that provides services based on multiple devices. The communication system includes a central control device that manages multiple resources so that the resources perform the following steps: a first resource among the multiple resources detects a first event, where the number of first resources is one or more; a second resource among the multiple resources executes a task to be executed corresponding to the first event, where the number of second resources is one or more; and all the resources included in the first resource and/or the second resource come from at least two different electronic devices. The multiple resources managed by the central control device include part or all of the resources of the multiple electronic devices.
  • Multiple resources may include, but are not limited to, camera resources, microphone resources, sensor resources, display screen resources, or computing resources.
  • The first resource may comprise multiple resources of the same type (such as multiple camera resources, which may be camera resources of one device or camera resources of several devices), or multiple resources of different types (such as a camera resource and a microphone resource).
  • In this way, the central control device can uniformly dispatch some or all resources in the communication system, efficiently integrate the resources in the system, realize cross-device resource intercommunication and sharing, and provide users with natural and intelligent multi-device collaborative services.
  • There may be one or more first resources and one or more second resources.
  • That all the resources included in the first resource and/or the second resource come from at least two different electronic devices may mean: multiple first resources come from multiple different electronic devices; or multiple second resources come from multiple different electronic devices; or any two or more of the first and second resources come from different electronic devices. For example, when there is only one first resource and one second resource, they come from different devices; or, when the first resource or the second resource includes multiple resources, any one of the first resources and any one of the second resources come from different devices.
  • The multiple different electronic devices mentioned above are all electronic devices in the communication system.
  • The central control device is also used to manage the multiple resources so that, before the second resource executes the task to be executed corresponding to the first event, a third resource among the multiple resources identifies a user intention represented by the first event and determines a to-be-executed task satisfying the user intention.
  • That is, the third resource in the communication system can identify the user intention represented by the first event and decompose it into tasks to be performed, so that the second resource can subsequently execute them.
  • The resources managed by the central control device are combinable capabilities; a combinable capability is a resource described in a predetermined manner.
  • The first resource is a first combinable capability, and the second resource is a second combinable capability.
  • The predetermined manner may include, but is not limited to, a predetermined format, protocol, or standard.
  • Each device can deconstruct its own resources into combinable capabilities in a unified, predetermined manner.
  • The combinable capabilities obtained through unified deconstruction are decoupled from devices, device models, and device manufacturers, so they can be called across devices by other devices in the communication system without barriers; that is, they support unified scheduling by the central control device to meet user needs.
  • Describing resources in a predetermined way allows the method provided by the embodiments of the present application to adapt to different devices, and supports devices of different types and from different manufacturers joining the communication system to jointly provide services for users.
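  • As an illustration only (the publication does not mandate any concrete schema), a combinable-capability descriptor might look like the following sketch; all field names here are hypothetical:

```python
# A minimal sketch of a combinable-capability descriptor, assuming a simple
# attribute schema. The publication only requires that resources be described
# in a predetermined, vendor-neutral manner; every field name here is invented.
from dataclasses import dataclass, field

@dataclass
class CombinableCapability:
    device_id: str                              # physical device exposing the resource
    kind: str                                   # e.g. "camera", "microphone", "display"
    attrs: dict = field(default_factory=dict)   # position, orientation, performance, ...

# A tablet might deconstruct its own resources into device-decoupled capabilities:
tablet_caps = [
    CombinableCapability("tablet-01", "camera", {"position": "front", "resolution": "1080p"}),
    CombinableCapability("tablet-01", "microphone", {"channels": 2}),
    CombinableCapability("tablet-01", "display", {"size_inch": 11.0}),
]
print([c.kind for c in tablet_caps])
```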
  • The central control device is further configured to configure the combinable capabilities of some or all of the multiple electronic devices as a virtual aggregation device before managing the multiple resources to perform the steps of the first aspect.
  • Both the first combinable capability and the second combinable capability are combinable capabilities of the virtual aggregation device.
  • After the central control device configures the virtual aggregation device, it can manage the combinable capabilities in the virtual aggregation device.
  • Configuration of the virtual aggregation device by the central control device refers to configuring the parameters of the combinable capabilities of some or all of the multiple electronic devices.
  • The configuration of parameters includes configuring the parameters related to the flow direction of data processing. After the central control device configures the virtual aggregation device, the flow of information collection and processing is effectively specified.
  • An application can then perceive a single, independent virtual aggregation device instead of multiple separate physical devices. In this way, upper-layer applications can more conveniently schedule resources in other physical devices.
  • By configuring the virtual aggregation device, the central control device can prepare in advance the combinable capabilities that may be used later, which improves the response speed and shortens the response delay when those combinable capabilities are subsequently activated to provide services for users.
  • The communication system may aggregate only part of the combinable capabilities in the communication system, which avoids wasting resources unnecessarily.
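  • A minimal sketch of this configuration step, under the assumption that aggregating a chosen subset of capabilities and pre-wiring the data-processing flow is what shortens the later response delay; the class and method names are hypothetical:

```python
# A sketch of configuring a virtual aggregation device: a capability subset is
# selected and the information collection/processing flow is fixed in advance.
class VirtualAggregationDevice:
    def __init__(self, capabilities):
        self.capabilities = list(capabilities)  # aggregated subset (possibly partial)
        self.pipeline = []                      # ordered data-flow stages

    def configure_flow(self, *stages):
        """Pre-specify the flow of information collection and processing."""
        self.pipeline = list(stages)

    def run(self, event):
        data = event
        for stage in self.pipeline:             # data flows along the configured path
            data = stage(data)
        return data

# Aggregate only the capabilities expected to be needed, not every resource:
vad = VirtualAggregationDevice(["phone-mic", "tv-display"])
vad.configure_flow(lambda e: f"captured({e})", lambda d: f"rendered({d})")
print(vad.run("voice-command"))  # -> rendered(captured(voice-command))
```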
  • The central control device is also used to receive, before configuring the combinable capabilities of some or all of the multiple electronic devices as a virtual aggregation device, combinable capability information sent by devices other than the central control device; the combinable capability information is used to indicate the combinable capabilities provided by the corresponding device.
  • The central control device is specifically configured to configure part or all of the combinable capabilities of the multiple electronic devices as a virtual aggregation device according to the combinable capability information of the multiple electronic devices.
  • The combinable capability information may further include attributes of the combinable capability, and the attributes include any one or more of the following: position, orientation, category, performance, parameters, version, or size of the combinable capability.
  • The attributes of a combinable capability help the central control device subsequently manage the resources of each electronic device better.
  • The electronic devices in the communication system can also synchronize combinable capability information with each other, so that the central control device can learn the combinable capabilities of the other devices in the communication system, which is convenient for subsequently and flexibly scheduling some or all of the resources in the communication system to provide services to users and realize cross-device resource intercommunication and sharing.
  • The electronic devices in the communication system can synchronize combinable capability information with each other periodically, and can also synchronize it after a new device joins the communication system or comes online.
  • A device may also send its combinable capability information to the other electronic devices in the communication system.
  • In this way, the central control device can learn in a timely manner which combinable capabilities are available in the communication system, so as to schedule part or all of the resources in the communication system more flexibly and provide better services for users.
  • The combinable capabilities indicated by the combinable capability information that the central control device receives from another device may be part or all of the combinable capabilities of that electronic device. Which part may be determined by the authentication result obtained when the other device joins the communication system: the higher the level of the authentication result, the more types and/or the greater the quantity of combinable capabilities the electronic device provides for other devices to invoke. In this way, an electronic device can open more combinable capabilities only to trusted devices, ensuring its information security.
  • Some or all of the combinable capabilities may also be determined by users according to their own needs.
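  • The tiering below is an invented example of this idea: the higher the authentication level, the more capability types a device advertises, and the user may restrict the set further. Only the monotonic trust-to-exposure relationship comes from the text; the levels and mapping are assumptions:

```python
# A sketch of authentication-level-based capability exposure.
EXPOSURE_BY_AUTH_LEVEL = {
    1: {"display"},                                                  # low trust
    2: {"display", "speaker", "compute"},
    3: {"display", "speaker", "compute", "camera", "microphone"},    # fully trusted
}

def capabilities_to_advertise(all_kinds, auth_level, user_allowed=None):
    """Capability kinds a device reports to the central control device."""
    allowed = EXPOSURE_BY_AUTH_LEVEL.get(auth_level, set())
    if user_allowed is not None:      # the user may narrow the set by need
        allowed &= user_allowed
    return sorted(k for k in all_kinds if k in allowed)

print(capabilities_to_advertise({"camera", "display", "microphone"}, auth_level=2))
# -> ['display']  (camera and microphone stay private at this trust level)
```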
  • The virtual aggregation device is used to run a single smart assistant, and the single smart assistant is used to support the central control device in managing multiple resources so that the multiple resources perform the steps of the first aspect. That is, the physical devices on which the combinable capabilities included in the virtual aggregation device reside are used to run the single smart assistant.
  • The single smart assistant makes it easier for the central control device to flexibly dispatch some or all resources in the communication system and thereby provide users with natural and intelligent services. This avoids running one smart assistant per device and having multiple smart assistants negotiate internally to interact.
  • The central control device is specifically configured to configure part or all of the combinable capabilities of the multiple electronic devices as a virtual aggregation device according to one or more of the following: user state, device state, environment state, persona, global context, or memory.
  • Because the virtual aggregation device can be configured according to this variety of information, it can better provide services for users.
  • The central control device is specifically configured to configure the following as the virtual aggregation device: the combinable capabilities of the central control device itself, and a fourth combinable capability of an electronic device other than the central control device in the communication system.
  • The fourth combinable capability may be determined in the following two ways:
  • (1) The fourth combinable capability is determined by the central control device according to a preset strategy.
  • Such preset strategies may include, for example:
  • A comprehensive detection strategy: for example, the central control device determines all combinable capabilities of electronic devices other than the central control device as the fourth combinable capability. The comprehensive detection strategy obtains all kinds of information comprehensively and accurately in order to provide services for users.
  • A privacy priority strategy: for example, the central control device determines the combinable capabilities that collect non-private content in electronic devices other than the central control device as the fourth combinable capability.
  • Using the privacy priority strategy can protect the user's privacy from being leaked.
  • A power consumption priority strategy: for example, the central control device determines the combinable capabilities of mains-powered electronic devices other than the central control device as the fourth combinable capability.
  • In this way, the power level of each device is fully considered when obtaining environmental information, and exhausting the batteries of devices in the communication system is avoided. A sketch of these strategies follows this list.
  • (2) The fourth combinable capability is determined according to environmental information after the central control device obtains that information using its own combinable capabilities.
  • Method (2) above confirms the initial configuration of the virtual aggregation device according to the central control device's own exploration results, which is flexible and convenient.
  • In this way, multiple devices in the communication system can be initialized, together with the central control device, as a virtual aggregation device in different environments, preparing in advance the combinable capabilities that may be used later.
  • The initialization process of the virtual aggregation device may be executed when the communication system is started for the first time, restarted, or when a new device is added.
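  • The three preset strategies named above might be sketched as follows; the device records and field names are hypothetical:

```python
# A sketch of the preset strategies for selecting the fourth combinable
# capability. `private` marks capabilities that collect private content and
# `on_power` marks mains-powered devices; both fields are invented here.
def select_fourth_capability(devices, strategy):
    if strategy == "comprehensive":   # all capabilities of all other devices
        return [c for d in devices for c in d["caps"]]
    if strategy == "privacy_first":   # only capabilities collecting non-private content
        return [c for d in devices for c in d["caps"] if not c["private"]]
    if strategy == "power_first":     # only capabilities of mains-powered devices
        return [c for d in devices if d["on_power"] for c in d["caps"]]
    raise ValueError(f"unknown strategy: {strategy}")

devices = [
    {"on_power": True,  "caps": [{"kind": "camera", "private": True},
                                 {"kind": "ambient-light", "private": False}]},
    {"on_power": False, "caps": [{"kind": "microphone", "private": True}]},
]
print([c["kind"] for c in select_fourth_capability(devices, "privacy_first")])
# -> ['ambient-light']
```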
  • The central control device can also be used to manage the multiple resources so that they execute: the first combinable capability detects a second event, and the second combinable capability determines a service solution according to the second event. Afterwards, the central control device is also used to reconfigure the combinable capabilities corresponding to the service solution as the virtual aggregation device.
  • The central control device is also used to manage the multiple resources so that they execute: the second resource analyzes user needs according to the second event and determines a service solution according to those needs.
  • The second resource may use fixed rules, knowledge graphs, or machine learning to analyze user needs according to the second event.
  • That is, on the basis of the currently existing virtual aggregation device, the central control device can continuously detect the status of users, devices, and the environment through the virtual aggregation device, analyze the user's potential service requirements based on the detected information, and adaptively adjust, that is, reconfigure, the virtual aggregation device.
  • The currently existing virtual aggregation device may be an initialized virtual aggregation device, or a virtual aggregation device after multiple reconfigurations.
  • The first event includes any of the following:
  • An event in which the electronic device obtains a notification message, or obtains information about an upcoming schedule.
  • The communication system provided by the embodiments of the present application can not only provide services in response to user interaction behaviors, but can also provide services according to information such as user state changes, environmental changes, and device states, thereby realizing natural and intelligent coordinated multi-device services.
  • The first combinable capability can be determined in any of the following ways:
  • The first combinable capability includes multiple combinable capabilities for collecting data of a first modality.
  • That is, the communication system can use multiple combinable capabilities that collect data of the same modality to detect the first event, so that the modal information collected through multiple channels can be fused into more accurate and richer modal information, improving the accuracy of subsequent operations.
  • The first combinable capability is determined by the central control device according to one or more of: user habits, the activity of the combinable capabilities, the distance between the combinable capabilities and the user, and a default ranking.
  • For example, the central control device may, according to user habits, preferentially select the historically most frequently called combinable capability as the first combinable capability; or select the combinable capability with the highest activity; or select the combinable capability closest to the user; or select the combinable capability with the highest default ranking.
  • The default ranking can be determined according to device priority.
  • The first combinable capability includes a combinable capability selected by the user.
  • In this way, the central control device can select the first combinable capability according to the actual needs of the user.
  • The first combinable capability includes a combinable capability in the electronic device on which the user's attention is focused.
  • That is, the central control device may select a combinable capability in the device where the user's attention is focused as the first combinable capability.
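  • As a sketch of how these criteria could be combined (the publication names the signals but not any weighting), one might score candidate capabilities as follows; the weights and field names are invented:

```python
# A sketch of selecting the first combinable capability from the criteria
# listed above: user habit (historical call count), activity, distance to the
# user, and default ranking (rank 1 = highest device priority).
def pick_capability(candidates):
    def score(c):
        return (2.0 * c["call_count"]       # user habit: most frequently called
                + 1.5 * c["activity"]       # most active capability
                - 1.0 * c["distance_m"]     # closest to the user
                - 0.5 * c["default_rank"])  # highest default ranking
    return max(candidates, key=score)

candidates = [
    {"name": "phone-mic", "call_count": 9, "activity": 0.8, "distance_m": 0.5, "default_rank": 1},
    {"name": "tv-mic",    "call_count": 3, "activity": 0.4, "distance_m": 3.0, "default_rank": 2},
]
print(pick_capability(candidates)["name"])  # -> 'phone-mic'
```

  • The same scoring idea would apply when the second combinable capability is selected by these criteria, as described next.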
  • The second combinable capability can be determined in any of the following ways:
  • The second combinable capability includes a combinable capability in the device where the first combinable capability resides.
  • That is, the central control device can select the first combinable capability and the second combinable capability in the same device.
  • The second combinable capability is determined by the central control device according to one or more of: user habits, the activity of the combinable capabilities, the distance between the combinable capabilities and the user, and a default ranking.
  • For example, the central control device may, according to user habits, preferentially select the historically most frequently called combinable capability as the second combinable capability; or select the combinable capability with the highest activity; or select the combinable capability closest to the user; or select the combinable capability with the highest default ranking.
  • The default ranking can be determined according to device priority.
  • The second combinable capability includes a combinable capability selected by the user.
  • In this way, the central control device can select the second combinable capability according to the actual needs of the user.
  • The second combinable capability includes a combinable capability in the electronic device on which the user's attention is focused.
  • That is, the central control device may select a combinable capability in the device where the user's attention is focused as the second combinable capability.
  • The central control device is also used to determine the device on which the user's attention is focused in any of the following ways:
  • The device where the user's attention is focused is determined.
  • The fourth device and the fifth device may be any devices in the communication system.
  • The multiple electronic devices in the communication system are used to determine the central control device from among themselves in any of the following situations:
  • The communication system can determine the central control device when triggered by the user.
  • The communication system may determine the central control device periodically or aperiodically according to a preset rule.
  • The communication system can also delay determining the central control device, so that it can collect more comprehensive device information and thereby elect a more suitable central control device.
  • The strategies by which the multiple electronic devices in the communication system determine the central control device may include the following:
  • Strategy 1: determine the central control device from the multiple electronic devices according to one or more of resource stability, device modalities, or user habits. For example, a device with relatively stable computing resources, a device with relatively stable memory resources, a device with a relatively stable power supply, a device with many available modalities, or a device frequently used by the user can be determined as the central control device.
  • Strategy 2: determine an electronic device of a preset type among the multiple electronic devices as the central control device.
  • For example, the smart screen can be determined as the central control device.
  • Strategy 3: determine the electronic device selected by the user as the central control device. In this way, the central control device can be determined according to the actual needs of the user.
  • Strategy 4: determine the central control device from the multiple electronic devices according to the historical interaction information of each electronic device.
  • The historical interaction information of an electronic device includes, but is not limited to, any one or more of the following: device identifier, device type, current power consumption, available resources, device modalities, current usage status, online information, offline information, historical interaction information of other devices in the communication system 10, device location (such as room, living room, etc.), orientation, and the environment type of the device (such as office, home, etc.).
  • Strategy 4 above specifically includes:
  • The electronic device with the largest average number of online devices is determined as the central control device, where the average number of online devices is the average number of devices online in the communication system per unit time, as counted by the electronic device within a statistics period;
  • or the electronic device with the largest mathematical expectation of the average number of online devices is determined as the central control device.
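  • A sketch of the "average number of online devices" election in strategy 4, under the assumption that each device reports per-slot online counts gathered during the statistics period; the sample data is invented:

```python
# A sketch of electing the central control device by the largest average
# number of online devices (strategy 4).
def elect_central_control(online_counts):
    """online_counts: {device_id: [devices seen online per unit-time slot]}"""
    def average(samples):
        return sum(samples) / len(samples)
    return max(online_counts, key=lambda dev: average(online_counts[dev]))

counts = {
    "smart-screen": [5, 5, 4, 5],  # always on, sees most of the system
    "phone":        [5, 2, 0, 3],  # leaves home with the user
    "speaker":      [4, 4, 4, 4],
}
print(elect_central_control(counts))  # -> 'smart-screen' (average 4.75)
```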
  • The multiple electronic devices in the communication system may determine more than one central control device; the multiple central control devices together connect, at the same time or in the same space, to all electronic devices in the communication system.
  • In this way, a central control device can directly interact with the other devices in the communication system, making full use of the information of each electronic device to provide services for users.
  • The central control device is specifically used to manage the multiple resources so that they execute: the third resource splits the user intention into multiple tasks to be executed, with modality as the unit; different second resources execute the pending tasks of different modalities.
  • That is, the user intention is divided into multiple tasks to be executed according to modality, and the tasks of different modalities are distributed to different second resources for execution, so as to provide better services for users.
  • The tasks to be performed that satisfy the user's intention include multiple tasks with logical relationships, where the logical relationships include any one or more of the following: sequential relationships, conditional relationships, circular (loop) relationships, or Boolean logic.
  • The central control device is specifically configured to manage the multiple resources so that they execute: the second resource executes the multiple logically related tasks according to their logical relationships.
  • The multiple resources provided in this embodiment can execute multiple logically related tasks based on the user's explicit or implicit instructions.
  • Therefore, the communication system provided by this embodiment can perform a wider range of task types and can better meet the complex needs of users, thereby providing better services.
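  • A toy sketch of executing tasks linked by the logical relationships named above (sequential, conditional, and loop relationships; Boolean logic would compose the conditions). The task encoding is invented:

```python
# A sketch of executing logically related tasks. Each task is a hypothetical
# dict; "seq" runs unconditionally, "cond" runs when its predicate holds, and
# "loop" repeats while its predicate holds.
def run_tasks(tasks, ctx):
    for t in tasks:
        if t["type"] == "seq":                        # sequential relationship
            t["do"](ctx)
        elif t["type"] == "cond" and t["when"](ctx):  # conditional relationship
            t["do"](ctx)
        elif t["type"] == "loop":                     # circular (loop) relationship
            while t["while"](ctx):
                t["do"](ctx)

ctx = {"brightness": 0}
run_tasks([
    {"type": "seq",  "do": lambda c: c.update(curtains="closed")},
    {"type": "loop", "while": lambda c: c["brightness"] < 3,
                     "do": lambda c: c.update(brightness=c["brightness"] + 1)},
    {"type": "cond", "when": lambda c: c["curtains"] == "closed",
                     "do": lambda c: c.update(lamp="on")},
], ctx)
print(ctx)  # -> {'brightness': 3, 'curtains': 'closed', 'lamp': 'on'}
```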
  • The central control device is further configured to manage the multiple resources so that they perform the following steps before the third resource recognizes the user intention represented by the first event:
  • The first resource receives an interaction input.
  • The third resource generates a global context according to the interaction input.
  • The global context includes one or more of the following: the time when the first resource received the interaction input, the first resource itself, the interaction content of the interaction input, the physiological feature information of the user corresponding to the interaction input, the device information of the electronic device to which the first resource belongs, or the device information of the target device controlled by the interaction input.
  • The central control device is specifically configured to manage the multiple resources so that the third resource identifies the user intention represented by the first event based on the global context.
  • The interaction input includes historical input and current input.
  • The global context includes historical interaction information and current-round interaction information.
  • The central control device is specifically configured to manage the multiple resources so that: the first resource acquires the historical interaction information based on the historical input and acquires the current-round interaction information based on the current input.
  • The third resource matches, from the historical interaction information, first historical interaction information associated with the current-round interaction information.
  • The third resource identifies the user intention represented by the first event based on the first historical interaction information.
  • The first historical interaction information includes: historical interaction information related to a first user, where the first user is the user who triggered the current input; or historical interaction information received by a sixth device at a first time, where the sixth device is the first device or a near-field device of the first device and the interval between the first time and the time when the current-round interaction information was received is less than a first duration; or second historical interaction information received at a second time, where the target device of the second historical interaction information is the target device of the current-round interaction information or a near-field device of that target device, and the interval between the second time and the time when the current-round interaction information was received is less than a second duration; or historical interaction information whose correlation with the current-round interaction information is greater than a threshold.
  • When the third resource recognizes the user intention represented by the first event in combination with the global context, it can analyze the user's service needs based on the received user, device, and environment status, historical interaction information, and so on. The user's intention can therefore be determined more accurately and in a more personalized way, serving the user better.
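  • The four matching rules above could be sketched as follows; the record fields and the placeholder correlation measure are assumptions:

```python
# A sketch of matching first historical interaction information against the
# current-round interaction information using the four rules listed above.
def text_correlation(a, b):
    """Placeholder similarity; a real system might compare utterance embeddings."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wb), 1)

def matches(hist, cur, first_duration_s=300, second_duration_s=300, threshold=0.7):
    if hist["user"] == cur["user"]:                        # rule 1: same (first) user
        return True
    if (hist["device"] in cur["near_field_devices"]        # rule 2: first device or its
            and cur["time"] - hist["time"] < first_duration_s):  # near-field device, recent
        return True
    if (hist["target"] in cur["near_field_targets"]        # rule 3: related target device,
            and cur["time"] - hist["time"] < second_duration_s):  # recent
        return True
    return text_correlation(hist["text"], cur["text"]) > threshold  # rule 4

hist = {"user": "A", "device": "tv", "target": "lamp", "time": 100.0,
        "text": "turn on the lamp"}
cur = {"user": "B", "near_field_devices": {"tv"}, "near_field_targets": set(),
       "time": 160.0, "text": "make it brighter"}
print(matches(hist, cur))  # True via rule 2 (near-field device within the first duration)
```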
  • The first event includes first dialogue information.
  • The first dialogue information includes a first instruction and a second instruction, the intent corresponding to the first instruction is associated with the intent corresponding to the second instruction, and the first instruction includes a first referential pronoun.
  • The central control device is also used to manage the multiple resources so that, before the third resource recognizes the user intention represented by the first event, the object referred to by the first referential pronoun in the first dialogue information is replaced with the object corresponding to the second instruction, so as to obtain second dialogue information.
  • The central control device is specifically configured to manage the multiple resources so that the third resource identifies the user intention represented by the first event based on the second dialogue information.
  • Replacing the first referential pronoun in the first dialogue information with the object corresponding to the second instruction to obtain the second dialogue information may proceed as follows: 1. Divide the first dialogue information into the first instruction and the second instruction, where the first instruction includes the first referential pronoun. 2. Identify the first instruction that includes the first referential pronoun. 3. Identify the intent corresponding to the first instruction and the intent corresponding to the second instruction based on an intent classification template. 4. When it is determined that the intent corresponding to the first instruction is associated with the intent corresponding to the second instruction, merge the first instruction and the second instruction. 5. Based on the merged first instruction and second instruction, replace the object referred to by the first referential pronoun with the object corresponding to the second instruction to obtain the second dialogue information.
  • That is, the third resource can first replace the referential pronoun of the first dialogue information with the corresponding referent object, thereby obtaining the second dialogue information in which the pronoun has been resolved. In this way, the third resource can determine the user's intention more accurately based on the second dialogue information, providing better services for the user.
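  • A toy sketch of the five steps for one fixed utterance pattern; a real implementation would rely on the intent-classification templates mentioned above, and every heuristic here is invented:

```python
# A toy sketch of referential-pronoun replacement. It handles only a
# "search for X, ... it ..." pattern.
def resolve_reference(dialog):
    # 1. split the first dialogue information into two instructions
    second, first = [s.strip() for s in dialog.split(",", 1)]
    # 2./3. detect the referential pronoun; intents are assumed associated here
    if " it " not in f" {first} ":
        return dialog
    # 4./5. merge the instructions and substitute the second instruction's object
    obj = second.removeprefix("search for ")
    return f"{second}, {first.replace(' it ', f' {obj} ', 1)}"

print(resolve_reference("search for song A, play it on the speaker"))
# -> "search for song A, play song A on the speaker"
```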
  • The central control device is further configured to manage the multiple resources so that, before the third resource identifies the user intention represented by the first event, the first resource receives an interaction input within a first preset time.
  • The third resource determines a memory based on the interaction input, the memory characterizing habits or preferences of interaction between the user and the device.
  • The central control device is specifically configured to manage the multiple resources so that the third resource identifies the user intention represented by the first event based on the memory.
  • The memory can be divided into short-term memory and long-term memory. Short-term memory characterizes the habits or preferences of interaction between the user and the device based on interaction input satisfying a first condition.
  • Long-term memory characterizes the habits or preferences of interaction between the user and the device based on interaction input satisfying a second condition.
  • The first condition may be that the interaction input is received within a preset time window (for example, within the last 6 hours).
  • The second condition may be that the interaction input is received within multiple consecutive preset time windows (for example, within 6 hours, within 8 hours).
  • Alternatively, the first condition may be that, within a specified time period 1 (for example, from 0:00 to 24:00), the number of times the interaction input is received is greater than a third threshold.
  • Correspondingly, the second condition may be that, in multiple consecutive instances of the specified time period 1 (for example, from 0:00 to 24:00), the number of times the interaction input is received in each instance is greater than the third threshold.
  • The third resource may construct the memory through a principal component analysis algorithm, or through one or more artificial neural network algorithms such as CNN, RNN, or LSTM.
  • In this way, the third resource can build a memory representing the user's habits or preferences based on the user's interaction input.
  • The third resource may then identify the user intent characterized by the first event based on the memory.
  • In this way, the third resource can accurately and individually meet the user's definite or potential (not yet expressed) service needs, thereby providing better services for the user.
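  • The window-based first and second conditions might be checked as sketched below; all window lengths and thresholds are illustrative, and the PCA/neural-network construction of the memory itself is not shown:

```python
# A sketch of the short-/long-term memory conditions: short-term memory forms
# from interaction input inside one recent time window; long-term memory forms
# when the input recurs often enough in several consecutive windows.
def classify_memory(input_times, now, window_s=6 * 3600, n_windows=4, threshold=3):
    # first condition: input received within the most recent preset time window
    short_term = any(now - t <= window_s for t in input_times)
    # second condition: in each of n_windows consecutive windows, the input
    # was received more than `threshold` times
    per_window = [sum(1 for t in input_times
                      if now - (i + 1) * window_s < t <= now - i * window_s)
                  for i in range(n_windows)]
    long_term = all(count > threshold for count in per_window)
    return short_term, long_term

now = 0.0
times = [now - h * 3600 for h in range(24)]   # one interaction per hour, past day
print(classify_memory(times, now))            # -> (True, True)
```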
  • The central control device is further configured to manage the multiple resources so that, before the third resource identifies the user intention represented by the first event, the third resource obtains a user portrait.
  • The central control device is specifically configured to manage the multiple resources so that the third resource identifies the user intention represented by the first event based on the user portrait.
  • The third resource can construct the user portrait based on the user's interaction input and can recognize the user intention represented by the first event based on it. In this way, the third resource can accurately and individually meet the user's definite or potential (not yet expressed) service needs, thereby providing better services for the user.
  • The central control device is specifically used to manage the multiple resources so that the third resource identifies the user intention represented by the first event, and determines the pending tasks that satisfy that intention, according to any one or more of the following: user state, device state, environment state, persona, global context, or memory.
  • The first event includes data of multiple modalities.
  • The central control device is specifically configured to manage the multiple resources so that the first resource uses a first sampling rate to collect the corresponding modal data.
  • The first sampling rate is a preset sampling rate, or the first sampling rate is the sampling rate of the resource with the highest activity among the multiple resources included in the first resource.
  • In this way, different first resources can collect data at a uniform sampling rate and obtain data of multiple modalities of a similar data volume, allowing the third resource to fuse the multi-modal data more conveniently and quickly in order to identify the user intent represented by the first event.
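  • A sketch of choosing the uniform first sampling rate, either preset or taken from the most active first resource; the fields are hypothetical:

```python
# A sketch of picking the first sampling rate used by all first resources.
def pick_sampling_rate(first_resources, preset_hz=None):
    if preset_hz is not None:                  # option 1: a preset sampling rate
        return preset_hz
    most_active = max(first_resources, key=lambda r: r["activity"])
    return most_active["rate_hz"]              # option 2: rate of the most active resource

mics = [{"name": "phone-mic", "activity": 0.9, "rate_hz": 16000},
        {"name": "tv-mic",    "activity": 0.4, "rate_hz": 48000}]
rate = pick_sampling_rate(mics)  # -> 16000
# Every first resource then samples its modality at `rate`, so the third
# resource receives comparably sized multi-modal streams to fuse.
print(rate)
```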
  • The combinable capabilities of the multiple electronic devices in the communication system include interaction combinable capabilities and service combinable capabilities.
  • The first combinable capability belongs to the interaction combinable capabilities, and the second combinable capability belongs to the service combinable capabilities.
  • The combinable capabilities of the multiple electronic devices in the communication system include any one or more of the following, described in a predetermined manner: camera resources, microphone resources, sensor resources, display screen resources, or computing resources.
  • The multiple electronic devices in the communication system can communicate through any one or more of the following technologies: WLAN, Wi-Fi P2P, BT, NFC, IR, ZigBee, UWB, hotspot, Wi-Fi softAP, cellular, or wired technologies.
  • In a second aspect, an embodiment of the present application provides a method for providing services based on multiple devices.
  • The method is applied to a central control device.
  • The method includes: the central control device manages multiple resources so that the multiple resources perform the following steps: a first resource among the multiple resources detects a first event, where the number of first resources is one or more; a second resource among the multiple resources executes a task to be executed corresponding to the first event, where the number of second resources is one or more; and all the resources included in the first resource and/or the second resource come from at least two different electronic devices. The multiple resources managed by the central control device include part or all of the resources of multiple electronic devices, and the multiple electronic devices include the central control device.
  • In this way, the central control device can uniformly dispatch some or all resources of the multiple electronic devices in the communication system, efficiently integrate the resources in the system, realize cross-device resource intercommunication and sharing, and provide users with natural and intelligent multi-device collaborative services.
  • There may be one or more first resources and one or more second resources.
  • That all the resources included in the first resource and/or the second resource come from at least two different electronic devices may mean: multiple first resources come from multiple different electronic devices; or multiple second resources come from multiple different electronic devices; or the multiple first resources include a first sub-resource and the multiple second resources include a second sub-resource, and the first sub-resource and the second sub-resource come from different electronic devices.
  • The multiple different electronic devices mentioned above are all electronic devices in the communication system.
  • The central control device may manage the multiple resources so that a third resource among the multiple resources identifies the user intention represented by the first event and determines the tasks to be performed that satisfy that intention.
  • That is, the central control device can manage the third resource so that it identifies the user intention represented by the first event and decomposes the intention into tasks to be executed, which the second resource can subsequently execute.
  • For the definitions of resources and combinable capabilities, refer to the relevant descriptions of the first aspect.
  • Before the central control device manages the multiple resources to perform the steps of the second aspect, the central control device can configure the combinable capabilities of some or all of the multiple electronic devices as a virtual aggregation device, where the first combinable capability and the second combinable capability are both combinable capabilities of the virtual aggregation device. After the central control device configures the virtual aggregation device, it can manage the combinable capabilities in the virtual aggregation device.
  • Configuration of the virtual aggregation device by the central control device refers to configuring the parameters of the combinable capabilities of some or all of the multiple electronic devices.
  • The configuration of parameters includes configuring the parameters related to the flow direction of data processing. After the central control device configures the virtual aggregation device, the flow of information collection and processing is effectively specified.
  • Before configuring the virtual aggregation device, the central control device may receive combinable capability information sent by devices other than the central control device; the combinable capability information is used to indicate the combinable capabilities provided by the corresponding device.
  • For the combinable capability information, refer to the related description of the first aspect.
  • The central control device can configure part or all of the combinable capabilities of the multiple electronic devices as a virtual aggregation device according to the combinable capability information of the multiple electronic devices.
  • The virtual aggregation device is used to run a single smart assistant, and the single smart assistant is used to support the central control device in managing multiple resources so that the multiple resources perform the steps of the second aspect. That is, the physical devices on which the combinable capabilities included in the virtual aggregation device reside are used to run the single smart assistant.
  • For the categories of the first event, refer to the related description of the first aspect.
  • The central control device manages the multiple resources so that: the third resource splits the user intention into multiple tasks to be executed, with modality as the unit; different second resources execute the pending tasks of different modalities.
  • That is, the third resource splits the user intention into multiple tasks to be executed according to modality, and the central control device distributes the tasks of different modalities to different second resources for execution, so as to provide better services for users.
  • The central control device manages the multiple resources so that, before the third resource recognizes the user intention represented by the first event, multiple first resources receive interaction input.
  • The third resource generates a global context according to the interaction input.
  • The global context includes one or more of the following: the time when the first resource received the interaction input, the first resource itself, the interaction content of the interaction input, the physiological feature information of the user corresponding to the interaction input, the device information of the electronic device to which the first resource belongs, or the device information of the target device controlled by the interaction input.
  • The third resource identifies the user intention represented by the first event based on the global context.
  • The interaction input includes historical input and current input.
  • The global context includes historical interaction information and current-round interaction information.
  • The central control device manages the multiple resources so that: the first resource acquires the historical interaction information based on the historical input and acquires the current-round interaction information based on the current input.
  • The third resource matches, from the historical interaction information, first historical interaction information associated with the current-round interaction information.
  • The third resource identifies the user intention represented by the first event based on the first historical interaction information.
  • When the third resource recognizes the user intention represented by the first event in combination with the global context, it can analyze the user's service needs based on the received user, device, and environment status, historical interaction information, and so on. The user's intention can therefore be determined more accurately and in a more personalized way, serving the user better.
  • The first event includes first dialogue information.
  • The first dialogue information includes a first instruction and a second instruction, the intent corresponding to the first instruction is associated with the intent corresponding to the second instruction, and the first instruction includes a first referential pronoun.
  • The central control device manages the multiple resources so that, before the third resource recognizes the user intention represented by the first event, the object referred to by the first referential pronoun in the first dialogue information is replaced with the object corresponding to the second instruction, so as to obtain second dialogue information.
  • The third resource identifies the user intention represented by the first event based on the second dialogue information.
  • For the step of replacing the first referential pronoun in the first dialogue information with the object corresponding to the second instruction to obtain the second dialogue information, refer to the related description of the first aspect.
  • That is, the third resource can first replace the referential pronoun of the first dialogue information with the corresponding referent object, thereby obtaining the second dialogue information in which the pronoun has been resolved. In this way, the third resource can determine the user's intention more accurately based on the second dialogue information, providing better services for the user.
  • The central control device manages the multiple resources so that, before the third resource recognizes the user intention represented by the first event, the first resource receives an interaction input within a first preset time.
  • The third resource determines a memory based on the interaction input, the memory characterizing habits or preferences of interaction between the user and the device.
  • The third resource identifies the user intention represented by the first event based on the memory.
  • For the classification and definition of the memory, refer to the related description of the first aspect.
  • For the algorithms used to construct the memory, refer to the related description of the first aspect.
  • In this way, the third resource can build a memory representing the user's habits or preferences based on the user's interaction input.
  • The third resource may identify the user intent characterized by the first event based on the memory. In this way, the third resource can accurately and individually meet the user's definite or potential (not yet expressed) service needs, thereby providing better services for the user.
  • In a third aspect, an embodiment of the present application provides an electronic device, including a memory and one or more processors; the memory is coupled with the one or more processors and is used to store computer program code, and the computer program code includes computer instructions.
  • The one or more processors invoke the computer instructions to make the electronic device execute the method of the second aspect or any implementation manner of the second aspect.
  • In a fourth aspect, an embodiment of the present application provides a communication system that includes multiple electronic devices, the multiple electronic devices include a central control device, and the central control device is used to perform the method of the second aspect or any implementation manner of the second aspect.
  • In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions; when the instructions are run on an electronic device, the electronic device executes the method of the second aspect or any implementation manner of the second aspect.
  • In a sixth aspect, an embodiment of the present application provides a computer program product that, when run on a computer, causes the computer to execute the method of the second aspect or any implementation manner of the second aspect.
  • In the embodiments of the present application, multiple devices form a communication system.
  • The central control device among the multiple devices can uniformly schedule some or all resources in the communication system to provide services for users.
  • With some or all of the resources in the communication system uniformly dispatched by the central control device, resources in the system can be efficiently integrated, cross-device resource intercommunication and sharing can be realized, and users can be provided with natural and intelligent services.
  • FIG. 1A is a schematic structural diagram of a communication system provided by an embodiment of the present application.
  • FIG. 1B is a schematic diagram of the software structure of a single smart assistant running on the communication system 10;
  • FIG. 2 is a schematic structural diagram of an electronic device 100 provided in an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for providing services based on multiple devices provided in an embodiment of the present application
  • FIG. 4 shows examples of the types of combinable capabilities provided by an embodiment of the present application;
  • FIGS. 5A-5M are a set of user interfaces involved in building the communication system 10, provided by an embodiment of the present application;
  • FIG. 6A is a scene diagram of delayed election of the central control device provided by an embodiment of the present application;
  • FIGS. 6B-6C are scene diagrams of electing the central control device provided by an embodiment of the present application;
  • FIG. 6D is a scene diagram of the same device joining different communication systems provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a virtual aggregation device provided by an embodiment of the present application;
  • FIGS. 8A-8D are schematic diagrams of determining the device on which the user's attention is focused, provided by an embodiment of the present application;
  • FIG. 9 is a scene diagram of a service provided by multiple devices provided by an embodiment of the present application.
  • FIG. 10A is a scene diagram of another multi-device service provided by the embodiment of the present application.
  • FIG. 10B is a schematic flowchart of an interaction method based on a global context provided by an embodiment of the present application.
  • FIG. 10C is a schematic diagram of a software architecture applied to global context-based interaction provided by the embodiment of the present application.
  • FIG. 11 is a schematic flow chart of matching analysis based on specified matching rules provided by the embodiment of the present application.
  • FIG. 12 is a schematic flow diagram of a matching analysis based on a specified algorithm provided by the embodiment of the present application.
  • FIG. 13 is a schematic flowchart of encoding dialogue information of a certain round of historical dialogue provided by the embodiment of the present application.
  • FIG. 14 is a schematic diagram of the composition of a correlation model provided by the embodiment of the present application.
  • FIG. 15 is a scene diagram of another multi-device service provided by the embodiment of the present application.
  • FIG. 16 is a schematic flow chart of a method for resolving multi-instructions and references under a single-round dialogue provided by the embodiment of the present application;
  • FIG. 17A is a schematic flow diagram of a semantic unit recognition provided by the embodiment of the present application.
  • FIG. 17B is a schematic diagram of a semantic unit recognition model provided by the embodiment of the present application.
  • Fig. 17C is a schematic flow diagram of referring to resolution for an exemplary dialog interaction information provided by the embodiment of the present application.
  • FIG. 18 is another scene diagram of multiple devices providing services provided by the embodiment of the present application.
  • Fig. 19A is a schematic flow chart of executing a long-term task provided by the embodiment of the present application.
  • FIG. 19B is a schematic diagram of an execution process for constructing a long-term task provided by the embodiment of the present application.
  • FIG. 20 is a schematic flowchart of a personalized interaction method provided by the embodiment of the present application.
  • Fig. 21A is a schematic diagram of a memory model provided by the embodiment of the present application.
  • FIG. 21B is another scene diagram of multiple devices providing services provided by the embodiment of the present application.
  • FIG. 21C is another scene diagram of multi-device providing services provided by the embodiment of the present application.
  • FIG. 21D is a schematic flowchart of an interaction method based on user portraits provided by the embodiment of the present application.
  • Figure 22 and Figure 23 are scenarios where multiple devices provide services provided by the embodiments of the present application.
• The terms "first" and "second" are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
  • the term "user interface (UI)” in the following embodiments of this application is a medium interface for interaction and information exchange between an application program or an operating system and a user, and it realizes the difference between the internal form of information and the form acceptable to the user. conversion between.
  • the user interface is the source code written in a specific computer language such as java and extensible markup language (XML).
  • the source code of the interface is parsed and rendered on the electronic device, and finally presented as content that can be recognized by the user.
• The commonly used form of user interface is the graphical user interface (GUI), which refers to a user interface, related to computer operations, that is displayed in a graphical way. It may include text, icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, widgets, and other visible interface elements displayed on the display screen of the electronic device.
  • the present application provides a method, a related device and a system for providing services based on multiple devices.
  • multiple devices form a communication system, and the multiple devices can negotiate to determine a central control device.
• The central control device can select appropriate resources in the communication system to detect a specific event, analyze the user intention represented by the specific event, and execute a task that satisfies the user intention. In this way, the central control device can uniformly schedule part or all of the resources in the communication system, provide users with the various services they need, and meet users' needs.
  • the resources of multiple devices in the communication system are uniformly dispatched by the central control device, which can efficiently integrate resources in the system, realize cross-device resource intercommunication and sharing, and provide users with natural and intelligent multi-device collaborative services.
• After the user issues an instruction to any device in the communication system, other devices in the communication system can continue to perform the corresponding tasks under the management of the central control device, without the user issuing additional cross-device operation instructions. This can be seen as providing continuous, uninterrupted service to users.
  • the central control device may also be referred to as a central device.
• The device for detecting a specific event, the device for analyzing the user intention represented by the specific event, and the device for executing a task that satisfies the user intention are respectively referred to as the first device, the second device, and the third device.
  • the central control device, the first device, the second device and the third device may be the same device or different devices.
  • Each of the central control device, the first device, the second device, and the third device may include one or more devices, which is not limited in this embodiment of the present application.
• The central control device can select the appropriate first device, second device, or third device by combining one or more of the following detected in the history of the communication system: user state, device state, environment state, user portrait, global context, or memory.
• For the definitions and acquisition methods of the user state, device state, environment state, user portrait, global context, and memory, refer to the detailed introduction of the following embodiments.
  • the specific event may include an interactive operation input by the user, and may also include an event of a change in the user state, device state, or environment state.
• For events of a change in the user state, device state, or environment state, please refer to the relevant description of the subsequent method embodiments.
  • multiple devices in the communication system can jointly run a single smart assistant.
• The smart assistant supports the multiple devices in the communication system in negotiating and determining the central control device, and supports the central control device in selecting appropriate resources among the multiple devices in the communication system to detect a specific event, analyze the user intention represented by the specific event, and execute the task that satisfies the user intention.
  • Multiple devices run a single smart assistant, and each device can share and synchronize information based on the smart assistant, ensuring the consistency of interaction context, user portrait, and personalized data, thereby providing users with a coherent and consistent interactive experience.
• The multiple devices in the communication system run a single smart assistant, which can save system power consumption.
  • each device in the communication system can deconstruct or encapsulate its own resources into standardized composable capabilities in a unified manner, and provide standard interfaces for other devices in the communication system to call.
• Combinable capabilities are single capability components abstracted from physical devices. Combinable capabilities can be divided into different types; for details, refer to the detailed introduction of the subsequent embodiments.
  • the central control device can select an appropriate combinable capability in the communication system to detect a specific event, analyze the user intention represented by the specific event, and execute a task satisfying the user intention.
• The combinable capabilities obtained through deconstruction in a unified way are decoupled from specific devices, device models, and device manufacturers, so they can be called across devices by other devices in the communication system without barriers; that is, they support the central control device in uniformly scheduling the resources of each device to meet user needs.
• Deconstructing the resources of each device into standardized combinable capabilities in a unified way is equivalent to using the same resource description specification for different devices, so the method provided by this application can be adapted to different devices, supports devices of different types and from different manufacturers joining the communication system to jointly provide services for users, and has a wide range of applications.
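• As an illustration of such a standard interface, the following minimal Java sketch shows one shape it could take; the interface name, method signatures, and example identifiers are hypothetical and are not defined by this application.
```java
// A minimal sketch of a standardized combinable-capability interface; all names
// (CombinableCapability, capabilityId, invoke) are hypothetical illustrations.
import java.util.Map;

public interface CombinableCapability {
    String capabilityId();   // e.g. "microphone.pickup" (hypothetical identifier)
    String type();           // e.g. "interaction", "recognition", or "service"

    // One uniform entry point that remote devices in the communication system
    // can call, regardless of the device model or manufacturer providing it.
    Map<String, Object> invoke(String action, Map<String, Object> params);
}
```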
  • the central control device can combine some or all resources in the communication system into a virtual aggregation device according to the actual needs of users.
  • the central control device can combine some or all of the combinable capabilities in the communication system for this virtual aggregation device.
  • the virtual aggregation device can be used to detect a specific event, analyze the user intention represented by the specific event, and execute a task satisfying the user intention.
  • various resources required by users can be prepared in advance according to the actual needs of users, so that these resources can be used conveniently and quickly to meet user needs.
  • FIG. 1A is a schematic structural diagram of a communication system 10 provided by an embodiment of the present application.
  • a communication system 10 includes a plurality of electronic devices.
  • the multiple electronic devices in the communication system 10 may be of various types, which is not limited in this embodiment of the present application.
• For example, the plurality of electronic devices may include smart devices such as mobile phones, tablet computers, desktop computers, laptop computers, handheld computers, notebook computers, smart screens, wearable devices, augmented reality (AR) devices, virtual reality (VR) devices, artificial intelligence (AI) devices, in-vehicle devices, smart headphones, game consoles, and digital cameras; Internet of things (IoT) devices or smart home devices such as smart speakers, smart lamps, smart air conditioners, water heaters, kettles, ovens, coffee machines, cameras, doorbells, and millimeter-wave sensors; and office equipment such as printers, scanners, fax machines, copiers, and projectors.
• In some embodiments, the multiple electronic devices in the communication system 10 may also include non-portable terminal devices such as laptop computers with touch-sensitive surfaces or touch panels, desktop computers with touch-sensitive surfaces or touch panels, and the like.
  • the communication system 10 may include movable electronic devices, such as mobile phones, tablet computers, smart bracelets, etc., and may also include non-movable smart screens, smart lamps, smart air conditioners, and other devices.
  • the communication system 10 may include electronic devices produced by the same manufacturer, or may include electronic devices produced by different manufacturers, which is not limited in this embodiment of the present application.
  • communication systems in different scenarios may include different devices.
  • the scenarios may include smart home scenarios, sports and health scenarios, audio-visual entertainment scenarios, smart office scenarios, smart travel scenarios, and so on.
  • smart home scenarios may include smart screens, electric toothbrushes, wireless routers, smart speakers, sweepers, body fat scales, watches, mobile phones, and earphones.
  • Smart office scenarios can include computers, mice, wireless routers, electric curtains, desk lamps, watches, mobile phones, and earphones.
• The multiple electronic devices in the communication system 10 may include smart devices configured with a software operating system (OS), such as mobile phones, smart screens, and computers, and may also include non-intelligent devices not configured with an OS, such as water heaters and kettles.
• The OS configured on each electronic device may be different, including but not limited to Harmony and so on. Each electronic device may also be configured with the same software operating system; for example, each may be configured with Harmony.
  • the electronic devices in the communication system 10 establish connections and sessions with other part or all of the electronic devices, and can communicate based on the connections and sessions. That is to say, any two electronic devices in the communication system 10 may be directly connected and communicate with each other, or may communicate indirectly through another electronic device, or there may be no connection and communication relationship.
• For example, smart phone A can communicate directly with smart screen B, smart screen B and smart bracelet C can communicate indirectly through the mobile phone, and smart screen B and smart speaker D may have no connection relationship and cannot communicate directly.
  • connection between electronic devices can be established in various ways, for example, the connection can be established under the trigger of the user, or the connection can be established actively by the device, which is not limited here.
• The communication system 10 may perform identity authentication or authority authentication on an electronic device or on the user using the electronic device, and the electronic device is allowed to join the communication system 10 only after the authentication passes.
  • authentication or authority authentication of electronic devices or users please refer to the related descriptions of the following embodiments for details.
• The electronic devices in the communication system 10 can establish a connection and communicate through any one or more of the following technologies: wireless local area network (WLAN), wireless fidelity direct (Wi-Fi direct)/wireless fidelity peer-to-peer (Wi-Fi P2P), Bluetooth (BT), near field communication (NFC), infrared (IR), ZigBee, ultra-wideband (UWB), hotspot, Wi-Fi softAP, cellular network, wired technology, remote connection technology, and so on.
  • the bluetooth may be classic bluetooth or bluetooth low energy (bluetooth low energy, BLE).
• For example, an electronic device can communicate, through a wireless local area network (WLAN), with other devices in the same WLAN.
  • an electronic device can discover other nearby devices through short-distance communication technologies such as BT and NFC, and communicate with other devices after establishing a communication connection.
• For another example, an electronic device can work in wireless access point (AP) mode to create a wireless local area network, and other electronic devices can join this network and communicate with it through Wi-Fi softAP.
• For another example, multiple electronic devices can log in to the same account, a family account, or associated accounts, for example, log in to the same system account (such as a Huawei account), and then use cellular network technologies such as 3G, 4G, and 5G or wide area network technologies to communicate with a server that maintains system accounts (such as a server provided by Huawei), and then communicate with each other through that server.
  • a family account refers to an account shared by family members.
  • Associated accounts refer to multiple accounts that are bound.
  • the mobile phone and the electric toothbrush can communicate through Bluetooth, and the mobile phone and the smart screen can communicate through Wi-Fi.
  • each electronic device in the communication system 10 can communicate in a short distance or in a long distance. That is to say, each electronic device in the communication system 10 may be located in the same physical space, or may be located in different physical spaces.
  • Each electronic device in the communication system 10 may synchronize or share device information with each other based on the communication connection between the devices.
  • the device information may include, for example but not limited to: device identification, device type, available capabilities of the device, status information of the user, device and environment collected by the device, and the like.
• Multiple electronic devices in the communication system 10 can negotiate to determine a central control device based on the device information of each device, and the central control device can select appropriate resources among the multiple devices in the communication system 10 to detect a specific event, analyze the user intention represented by the specific event, and execute a task that satisfies the user intention.
• The central control device can also be implemented as a distributed system distributed across multiple devices in the communication system 10, realizing the functions of the central control device by using part or all of the resources of those multiple devices.
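• The application does not fix a particular negotiation algorithm. As one hedged illustration, the Java sketch below elects a central control device by having every device apply the same deterministic scoring rule to the synchronized device information; the DeviceInfo fields and the weights are assumptions introduced here for illustration.
```java
import java.util.Comparator;
import java.util.List;

// Hypothetical device information synchronized between devices; the fields are
// illustrative stand-ins for the identification, type, and status information
// mentioned above.
record DeviceInfo(String deviceId, boolean alwaysPowered, int computeScore,
                  double onlineRatio) {}

public final class CentralControlElection {
    // Every device applies the same deterministic rule to the shared device
    // list, so all devices independently agree on the same central control
    // device. The weights below are assumptions, not part of this application.
    static double score(DeviceInfo d) {
        return (d.alwaysPowered() ? 100 : 0)  // prefer devices on mains power
                + d.computeScore() * 0.5      // prefer stronger compute
                + d.onlineRatio() * 50;       // prefer devices that stay online
    }

    public static DeviceInfo elect(List<DeviceInfo> devices) {
        return devices.stream()
                .max(Comparator.<DeviceInfo>comparingDouble(CentralControlElection::score)
                        .thenComparing(DeviceInfo::deviceId))  // deterministic tie-break
                .orElseThrow();
    }
}
```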
  • the central control device can combine some resources in the communication system into a virtual aggregation device according to the actual needs of users.
  • the central control device can combine some or all of the combinable capabilities in the communication system into the virtual aggregation device.
  • the virtual aggregation device can be used to detect a specific event, analyze the user intention represented by the specific event, and execute a task satisfying the user intention.
  • the virtual aggregation device may be deployed on one or more physical devices in the communication system 10, and may be integrated from all or part of resources in the one or more physical devices.
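• To make the aggregation concrete, the following minimal sketch models a virtual aggregation device as a logical container of combinable capabilities drawn from one or more physical devices; it reuses the hypothetical CombinableCapability interface sketched earlier and is not the application's own implementation.
```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of a virtual aggregation device: a logical container that
// the central control device fills with combinable capabilities picked from
// one or more physical devices.
public final class VirtualAggregationDevice {
    private final List<CombinableCapability> capabilities = new ArrayList<>();

    // The capability may physically live on a remote device; the aggregation
    // device only holds a handle for calling it through the standard interface.
    public void aggregate(CombinableCapability capability) {
        capabilities.add(capability);
    }

    // Pick out, for example, "interaction" capabilities to detect a specific
    // event, or "service" capabilities to execute a task satisfying the intent.
    public List<CombinableCapability> byType(String type) {
        return capabilities.stream()
                .filter(c -> c.type().equals(type))
                .toList();
    }
}
```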
  • each electronic device in the communication system 10 may deconstruct its own resources into standardized combinable capabilities in a unified manner.
  • Composable capabilities can be divided into different types, for details, refer to the detailed introduction of the subsequent embodiments.
  • smart screens can abstract composable capabilities such as screen display, camera recording, speaker playback, microphone pickup, and multimedia playback services.
  • Each electronic device in the communication system 10 may install and run an independent smart assistant, or may not install an independent smart assistant, which is not limited here.
• A smart assistant is an application program based on artificial intelligence. With the help of speech and semantic recognition algorithms, it can help users complete operations such as information query, device control, and text input through instant question-and-answer voice interaction. Smart assistants usually adopt cascaded processing in stages, realizing the above functions through processes such as voice wake-up, voice front-end processing, automatic speech recognition, natural language understanding, dialogue management, natural language generation, text-to-speech, and response output.
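• The cascaded stages listed above can be pictured as a simple pipeline. The sketch below is a schematic illustration only; every stage is a stub with a hypothetical signature standing in for a real model or service.
```java
// A minimal sketch of the staged cascade described above; all method names and
// return types are hypothetical stubs, not a real smart-assistant API.
public final class AssistantPipeline {
    public String process(byte[] audio) {
        if (!wakeWordDetected(audio)) return null;   // voice wake-up
        byte[] cleaned = frontEnd(audio);            // voice front-end processing
        String text = asr(cleaned);                  // automatic speech recognition
        String intent = nlu(text);                   // natural language understanding
        String reply = dialogue(intent);             // dialogue management + generation
        return textToSpeech(reply);                  // text-to-speech, response output
    }

    private boolean wakeWordDetected(byte[] a) { return a.length > 0; } // stub
    private byte[] frontEnd(byte[] a) { return a; }                     // stub
    private String asr(byte[] a) { return "play music"; }               // stub
    private String nlu(String t) { return "intent:play_music"; }        // stub
    private String dialogue(String i) { return "Playing music."; }      // stub
    private String textToSpeech(String r) { return r; }                 // stub
}
```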
  • multiple electronic devices in the communication system 10 can jointly run a single smart assistant.
  • the smart assistant is deployed on the communication system 10 .
  • the communication system 10 runs an instance of the smart assistant, and the identifiers (such as process numbers) running in each device are the same.
  • the communication system 10 may also run multiple instances of the smart assistant.
  • Instances are running applications.
  • An instance can refer to a process or a thread.
  • a process is an execution of an application program on a computer.
  • a thread is a single sequential flow of control in an application's execution.
  • a process can contain multiple threads.
  • a single smart assistant that multiple electronic devices in the communication system 10 operate together can be implemented as any one or more of system applications, third-party applications, service interfaces, applets or web pages.
  • the single smart assistant run by the communication system 10 is used to support the communication system 10 to execute the method for providing services based on multiple devices provided in the embodiment of the present application.
  • FIG. 1B is a schematic diagram of the software structure of the smart assistant running on the communication system 10 provided by the embodiment of the present application.
  • the smart assistant on the communication system 10 may be a single smart assistant on a virtual aggregation device.
  • the smart assistant can include the following components:
  • the capability discovery component can be deployed in each electronic device of the communication system 10 .
  • the capability discovery component is used to synchronize combinable capabilities with other electronic devices in the communication system 10 , and is also used to manage combinable capabilities available in the communication system 10 .
  • the capability discovery component can also be used to authenticate or authorize the peer device or user before establishing a connection between devices in the communication system 10 .
• The capability discovery component may further include: an authentication/authorization module, a combinable capability discovery module, a combinable capability set, and a perception data docking module.
• The authentication/authorization module is used for identity authentication and authority verification of the local device, or of the user using the local device, before the local device establishes a connection with other devices.
• For the manner of authentication and authorization, refer to the introduction of the subsequent method embodiments.
  • the combinable capability discovery module is configured to discover other devices in the communication system 10 and the combinable capabilities of the other devices, and synchronize the combinable capabilities of the local device to other devices in the communication system 10 .
• For the manner in which the combinable capability discovery module discovers other devices, refer to the introduction of the subsequent method embodiments.
• The combinable capability set is used to manage the combinable capabilities of the local device and of other discovered devices.
  • the perception data docking module is used to manage the format specification of various data sensed by the sensing perception component. Through this specification, various types of data collected by each device in the communication system 10 can be standardized and managed, so that these data can be called across devices, and resource intercommunication and sharing across devices can be realized.
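• As a hedged illustration of such a format specification, the following sketch defines one possible uniform record for sensed data; all field names are assumptions introduced here.
```java
import java.time.Instant;
import java.util.Map;

// A minimal sketch of a uniform perception-data record. The point of a shared
// format is that state data sensed by any device can be read by any other
// device in the communication system.
public record PerceptionData(
        String sourceDeviceId,         // which device sensed the data
        String category,               // "user", "device", or "environment" state
        Instant timestamp,             // when the data was sensed
        Map<String, Object> payload) { // e.g. Map.of("temperature", 23.5)
}
```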
  • the sensing and sensing component may be deployed in electronic devices capable of sensing in the communication system 10 .
  • Sensing components can be used to perceive the status information of users, equipment and environment, and also to create and maintain user portraits, context and memory.
  • the sensor perception component may further include: a user state perception module, a device state perception module, an environment state perception module, a user portrait module, a context module, and a memory model.
  • the user state perception module, the device state perception module, and the environment state perception module are respectively used to sense the state information of the user, the device, and the environment.
  • the user portrait module is configured to create and maintain a user portrait of the user according to the interaction between the user and each device in the communication system 10 .
  • the context module is configured to create and maintain a global context for the communication system 10 according to the interaction history between the user and each device in the communication system 10 .
  • the memory model is used to create and maintain the memory of the communication system 10 according to the interaction history between the user and each device in the communication system 10, the operation history of the device, and the like.
  • the system central control component is deployed on the central control device determined through negotiation among the various electronic devices in the communication system 10 .
• The system central control component is used to dynamically build a virtual aggregation device, based on the various types of information obtained by the sensing component and the actual needs of users, by selecting appropriate available combinable capabilities in the communication system 10 maintained by the capability discovery component.
  • the system central control component is also used to select an appropriate combinable capability in the communication system 10 to detect a specific event, analyze the user's intention represented by the specific event, and execute a task satisfying the user's intention.
• The system central control component may further include: a system reconstruction module, an interaction mode scheduling module, and a service capability scheduling module.
• The system reconstruction module is used to dynamically construct a virtual aggregation device, based on the various types of information obtained by the sensing component and the actual needs of users, by selecting suitable available combinable capabilities in the communication system 10 maintained by the capability discovery component.
  • the interaction mode scheduling module is used to select the appropriate combinable capabilities in the communication system 10 to detect a specific event and analyze the user intention represented by the specific event.
  • the service capability scheduling module is used to select the appropriate combinable capability in the communication system 10 to execute the task satisfying the user's intention.
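• The application leaves the selection policy open. As one hedged illustration, the sketch below picks, among the capabilities of a wanted type, the candidate closest to the user; the Candidate record and the distance-only rule are assumptions, and a real scheduler might also weigh device state, environment state, user portrait, and so on.
```java
import java.util.Comparator;
import java.util.List;

// Hypothetical pairing of a capability with one piece of scheduling context.
record Candidate(CombinableCapability capability, double distanceToUserMeters) {}

public final class CapabilityScheduler {
    // A minimal sketch of one scheduling policy: among capabilities of the
    // wanted type, prefer the one closest to the user.
    public static CombinableCapability select(List<Candidate> candidates,
                                              String wantedType) {
        return candidates.stream()
                .filter(c -> c.capability().type().equals(wantedType))
                .min(Comparator.comparingDouble(Candidate::distanceToUserMeters))
                .map(Candidate::capability)
                .orElse(null);  // no suitable capability in the system
    }
}
```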
  • the interaction identification component is deployed on the electronic device selected by the central control device, where the combinable capability for detecting a specific event and analyzing the user intention represented by the specific event resides.
  • This particular event can be a modality or a combination of modalities.
  • the modality may include, for example, text, voice, vision, action, situation (such as the location of the user, the distance between the user and the device), scenes (such as office scenes, home scenes, commuting scenes), etc.
  • the interaction identification component is used to determine whether a specific event has been detected based on various information obtained by the sensor perception component, and analyze the user intention represented by the detected specific event, and can also decompose the user intention into a multi-modal form.
  • the interaction recognition component may further include: an interaction trigger module, an interaction instruction recognition module, and a multimodal intent decision module.
  • the interactive triggering module is used to determine whether a specific event is detected according to various information obtained by the sensor perception component.
  • the interactive instruction recognition module is used for analyzing the user intention represented by the detected specific event.
  • the multi-modal intent decision-making module is used to decompose user intent into multi-modal tasks to be performed.
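• As a hedged illustration of multi-modal decomposition, the sketch below splits one recognized user intention into tasks of different modalities; the intent string, the Task record, and the decomposition rule are illustrative assumptions.
```java
import java.util.List;

// Hypothetical multimodal task: one modality plus one action name.
record Task(String modality, String action) {}

public final class MultimodalIntentDecision {
    // A minimal sketch of decomposing one recognized user intention into tasks
    // of several modalities.
    public static List<Task> decompose(String intent) {
        if ("show_recipe".equals(intent)) {
            // e.g. display the steps on a screen device while a speaker device
            // reads the current step aloud.
            return List.of(new Task("visual", "display_recipe_steps"),
                           new Task("voice", "read_current_step"));
        }
        return List.of(new Task("voice", "ask_user_to_clarify"));
    }
}
```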
  • the service response component is deployed on the electronic device selected by the central control device, where the composable capabilities for performing tasks satisfying the user's intention are located.
• The service response component is used to arrange the response task sequence according to the user intention obtained from the analysis of the interaction recognition component, and to control the execution of the response tasks according to a certain logical relationship. It is also used to dynamically connect or switch the devices/capabilities that perform response tasks according to the various information obtained by the sensing component.
  • the service response component can be used to perform tasks of various modalities.
  • the service response component may further include: a task sequence generation module, a task mapping module, a task management module, and a task execution runtime (Runtime).
• The task sequence generation module is configured to generate one or more tasks satisfying the user intention.
• The task mapping module is used to map the one or more tasks to suitable combinable capabilities for execution.
• The task management module is used to control, according to the user intention analyzed by the interaction recognition component, the execution of the one or more tasks in a certain logical relationship.
  • the task execution runtime (Runtime) is used to run the response task.
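• The following minimal sketch ties the four modules together, reusing the hypothetical Task, Candidate, CapabilityScheduler, and CombinableCapability sketches from earlier; a real task manager would also handle ordering constraints, failures, and mid-task device switching. It assumes, for illustration only, that each capability advertises the modality it serves through its type() tag.
```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A minimal sketch of the service response flow: generate a plan, map each
// task to a combinable capability, then execute the tasks in sequence.
public final class ServiceResponse {
    public static void respond(List<Task> tasks, List<Candidate> candidates) {
        // Task mapping: bind each task to a suitable combinable capability
        // (assumption: capabilities advertise their modality via type()).
        Map<Task, CombinableCapability> plan = new LinkedHashMap<>();
        for (Task task : tasks) {
            plan.put(task, CapabilityScheduler.select(candidates, task.modality()));
        }
        // Task management + task execution runtime: run the tasks in order.
        plan.forEach((task, capability) -> {
            if (capability != null) {
                capability.invoke(task.action(), Map.of());
            }
        });
    }
}
```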
  • the software structure of the smart assistant shown in FIG. 1B is only an example, and does not constitute a specific limitation on the smart assistant running on the communication system 10 .
  • the smart assistant may include more or fewer components than shown in the illustration, or combine some components, or split some components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the above-mentioned components or modules shown in FIG. 1B can be deployed in the form of device-side programs and/or cloud services, and can also be deployed and run on one or more devices using a distributed or centralized architecture.
• The authentication/authorization module, user portrait module, context module, memory model, interaction instruction recognition module, multi-modal intent decision module, etc. can be deployed in a combination of devices and clouds.
• In some embodiments, the communication system 10 may also include multiple central control devices. Each central control device may form a different virtual aggregation device, and the communication system may run multiple virtual aggregation device instances, each running its own single smart assistant. In this way, different virtual aggregation devices can be set up for different users, so as to provide personalized services for different users.
  • the number of electronic devices in the communication system 10 may vary. For example, some devices may be added to the communication system 10, or some devices may be reduced.
• If a device connects to the communication system 10 without the communication system 10 knowing or storing the relevant information of the device (such as its identification and type), it is said that the device joins the communication system 10. If a device connects to the communication system 10 while the communication system 10 already knows or stores the relevant information of the device, it is said that the device goes online.
• If, after a device is disconnected from the communication system 10, the communication system 10 does not store the relevant information of the device, it is said that the device leaves the communication system 10. If the communication system 10 still stores the relevant information of the device after the disconnection, it is said that the device goes offline. Electronic devices usually leave the communication system 10 or go offline due to reasons such as location change or battery exhaustion.
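• The join/online and leave/offline distinction hinges only on whether the communication system knows or stores the device's information, which the following hedged sketch makes explicit; it reuses the hypothetical DeviceInfo record sketched earlier, and all names are illustrative.
```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of the device lifecycle definitions above: the outcome of a
// connect or disconnect depends solely on whether the system stores the
// device's relevant information.
public final class DeviceRegistry {
    private final Map<String, DeviceInfo> known = new HashMap<>();

    public String onConnected(DeviceInfo device) {
        boolean seenBefore = known.containsKey(device.deviceId());
        known.put(device.deviceId(), device);
        return seenBefore ? "online" : "joined";  // stored info => goes online
    }

    public String onDisconnected(String deviceId, boolean forgetDevice) {
        if (forgetDevice) {
            known.remove(deviceId);  // information discarded => device leaves
            return "left";
        }
        return "offline";            // information retained => device offline
    }
}
```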
  • the communication system 10 shown in FIG. 1A is only an example.
  • the communication system 10 may include more devices, which is not limited here.
• The communication system 10 may also include a router for providing a WLAN, a server for providing authentication/authorization services, a server for storing combinable capability information, context, user portraits, or memory, a server for managing accounts, a server for managing each electronic device in the communication system 10, and the like.
  • the communication system 10 may also be called a distributed system, an interconnected system, and other terms, which are not limited here.
  • FIG. 2 is a schematic structural diagram of an electronic device 100 provided in an embodiment of the present application.
  • the electronic device 100 may be any electronic device in the communication system 10 .
• The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the system.
• The processor 110 may be configured to deconstruct the resources of the electronic device 100 into standardized combinable capabilities in a unified manner.
• The processor 110 can be used to select appropriate resources among the multiple devices in the communication system 10 to detect a specific event, analyze the user intention represented by the specific event, and execute a task that satisfies the user intention.
  • the processor 110 is configured to invoke related components of the electronic device (such as a display screen, a microphone, a camera, etc.) to detect a specific event.
  • the processor 110 is configured to analyze the user intention represented by the specific event.
• The processor 110 is used to call related components of the electronic device (such as a display screen, a microphone, a camera, etc.) to perform tasks that satisfy the user's intention.
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, a baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves and radiate them through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs sound signals through audio equipment (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide WLAN (such as Wi-Fi), BT, global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), NFC, IR, UWB applied on the electronic device 100. and other wireless communication solutions.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , demodulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the mobile communication module 150 or the wireless communication module 160 is used to support the electronic device 100 and other devices in the communication system 10 to establish a connection and communicate, and to synchronize or share device information with each other.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
• The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
• The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the electronic device 100 realizes the display function through the GPU, the display screen 194 , and the application processor.
  • the GPU is a microprocessor for image processing, connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • the display panel may be a liquid crystal display (LCD).
• The display panel may also use organic light-emitting diodes (OLED), active-matrix organic light-emitting diodes (AMOLED), flexible light-emitting diodes (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light emitting diodes (QLED), etc.
  • the electronic device may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, etc.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be realized through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the internal memory 121 may include one or more random access memories (random access memory, RAM) and one or more non-volatile memories (non-volatile memory, NVM).
• Random access memory can include static random-access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM; for example, the fifth generation of DDR SDRAM is generally called DDR5 SDRAM), etc. Non-volatile memory can include disk storage devices and flash memory.
  • flash memory can include NOR FLASH, NAND FLASH, 3D NAND FLASH, etc.
• According to the level order of its storage cells, flash memory can include single-level cells (SLC), multi-level cells (MLC), triple-level cells (TLC), quad-level cells (QLC), etc.
• According to storage specifications, flash memory can include universal flash storage (UFS), embedded multimedia memory cards (embedded multi media card, eMMC), etc.
  • the random access memory can be directly read and written by the processor 110, and can be used to store executable programs (such as machine instructions) of an operating system or other running programs, and can also be used to store data of users and application programs.
  • the non-volatile memory can also store executable programs and data of users and application programs, etc., and can be loaded into the random access memory in advance for the processor 110 to directly read and write.
  • the external memory interface 120 can be used to connect an external non-volatile memory, so as to expand the storage capacity of the electronic device 100 .
  • the external non-volatile memory communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, files such as music and video are stored in an external non-volatile memory.
  • the electronic device 100 can implement audio functions through the audio module 170 , the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
• The speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • Electronic device 100 can listen to music through speaker 170A, or listen to hands-free calls.
• The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 170B can be placed close to the human ear to receive the voice.
• The microphone 170C, also called a "mike" or "mic", is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can put the mouth close to the microphone 170C to make a sound, inputting the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which may also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc.
  • the gyro sensor 180B can be used to determine the motion posture of the electronic device 100 .
  • the angular velocity of the electronic device 100 around three axes may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shaking of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device 100 is stationary. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • the distance sensor 180F is used to measure the distance.
  • the electronic device 100 may measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F for distance measurement to achieve fast focusing.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can utilize the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint photography, and fingerprint answering of incoming calls.
  • the bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice. The bone conduction sensor 180M can also contact the human pulse and receive the blood pressure beating signal. In some embodiments, the bone conduction sensor 180M can also be disposed in the earphone, combined into a bone conduction earphone.
  • the audio module 170 can analyze the voice signal based on the vibration signal of the vibrating bone mass of the vocal part acquired by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.
  • FIG. 2 does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than those shown in the illustration, or combine certain components, or separate certain components, or include components different from those in FIG. 2 .
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • FIG. 3 exemplarily shows a flow of a method for providing services based on multiple devices.
  • the method may include the following steps:
• S101: The electronic device deconstructs its resources into combinable capabilities.
  • the number of electronic devices executing S101 may be one or multiple.
• The resources in the electronic device may include one or more of the following: software resources of the electronic device, hardware resources, peripherals or resources of peripherals, and so on. Wherein:
  • the hardware resource is related to the hardware configured by the electronic device, and may include, for example, a camera, a sensor, an audio device, a display screen, a motor, a flash light, and the like of the electronic device.
• Software resources are related to the software configured on the electronic device, such as memory resources, computing capabilities (such as beauty algorithm capabilities and audio/video codec capabilities), network capabilities, device connection capabilities, device discovery capabilities, data transmission capabilities, and so on. Further, the software resources may include the photographing service, recording service, fingerprint authentication service, sports health service, playback service, short message service, voice recognition service, video call service, etc. provided by the electronic device.
  • the software resources may include system resources or third-party resources, which are not limited here.
  • Peripherals refer to devices that are connected to electronic devices and are used to transmit, transfer and store data and information. Peripherals may include, for example, accessory devices of electronic devices, such as a mouse, an external display screen, a Bluetooth headset, a keyboard, and smart watches, smart bracelets, etc. managed by the electronic device. Peripheral resources may include hardware resources and software resources, and the hardware resources and software resources may refer to the related description above.
  • an electronic device may include any one or more types of resources mentioned above, and may also include one or more resources.
  • the electronic device can deconstruct or package its own resources into standardized composable capabilities in a unified manner, and provide standard interfaces for calling by other devices in the communication system.
  • a composable capability is a single capability component abstracted from a physical device.
  • An electronic device may use one or more of capabilities, services, or information to describe its own resources as standardized composable capabilities.
• For example, an electronic device can deconstruct its own resources into different combinable capabilities according to different types of capabilities (such as connection capabilities, audio and video capabilities, shooting capabilities, etc.), according to different types of services (such as location services, cloud positioning services, cloud computing services, etc.), or according to different types of information (such as image information and text information).
  • a composable capability is a resource described by an electronic device in a predetermined manner.
  • the predetermined manner may include a predetermined format, protocol or standard, and the like.
• Modes such as Schema, Protobuf, extensible markup language (XML), and JSON (JavaScript object notation) can be used to describe the combinable capabilities of a device, so as to be forward/backward compatible with different versions of the combinable capability description file.
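• As a hedged illustration, the sketch below embeds one possible JSON description file in a Java text block; the field names and the version scheme are assumptions. Forward/backward compatibility is typically obtained by having readers ignore fields they do not recognize and apply defaults for fields absent in older versions of the file.
```java
// A minimal sketch of a combinable-capability description file, shown as JSON
// embedded in a Java text block. All field names and values are hypothetical.
public final class CapabilityDescriptionExample {
    static final String EXAMPLE = """
        {
          "schemaVersion": "1.1",
          "deviceId": "smart-screen-01",
          "capabilityId": "speaker.playback",
          "type": "service",
          "actions": ["play", "pause", "setVolume"],
          "latencyMs": 40
        }
        """;
}
```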
• FIG. 4 exemplarily shows the types of combinable capabilities provided by the embodiment of the present application; a condensed code sketch of this taxonomy is given after the list below.
• Interaction-type combinable capabilities can be used to detect specific events.
  • a specific event may include an interactive operation input by a user, and may also include an event in which a state of a user, a device, or an environment changes.
• The interaction-type combinable capabilities may further include, but are not limited to, one or a combination of the following types:
• Voice interaction-type combinable capabilities can be used to collect voice commands input by users and surrounding environmental sounds, and can be obtained based on encapsulation of resources of the electronic device such as the microphone 170C, the receiver 170B, and an external earphone or microphone.
• Text interaction-type combinable capabilities can be used to collect text entered by the user, and can be obtained based on encapsulation of resources such as the display screen 194 of the electronic device.
  • the visual interaction class can be combined to collect visual interaction information such as visible light images, infrared images, depth images, skeletal points, eye movement/line of sight, etc. Combinable visual interaction capabilities can be obtained based on resources such as the camera 193 (such as an infrared camera and a depth camera) of an electronic device.
  • The tactile-interaction combinable capability can be used to collect the user's touch input, knuckle input, key input, etc.
  • The tactile-interaction combinable capability can be obtained by packaging resources of the electronic device such as the display screen 194, the touch sensor 180K, and the pressure sensor 180A.
  • The physiological-signal-interaction combinable capability can be used to collect physiological signals such as electromyographic signals, brain waves, heart rate, and blood oxygen.
  • The physiological-signal-interaction combinable capability can be obtained by packaging hardware resources such as optical sensors and electrodes.
  • The posture-interaction combinable capability can be used to collect user posture information.
  • The posture-interaction combinable capability can be obtained by packaging resources such as the gyroscope sensor 180B, the acceleration sensor 180E, and an inertial sensor.
  • Recognition-class combinable capabilities can be used to identify the user intent represented by a specific event detected by an interaction-class combinable capability, and to determine the to-be-executed task corresponding to that user intent.
  • The recognition-class combinable capabilities can first identify the specific information represented by the specific event (such as semantics or text information), and then identify the user intent represented by that specific information.
  • The recognition-class combinable capabilities may further include, but are not limited to, one or a combination of the following types:
  • The speech-recognition combinable capability can be used to recognize speech, and can be obtained by packaging technologies such as automatic speech recognition (ASR) and natural language understanding (NLU).
  • The visual-recognition combinable capability can be used to recognize, for example, gestures and postures, and can be obtained based on the encapsulation of resources such as computer vision algorithms.
  • The environment-recognition combinable capability can be used to identify, for example, the user's location and interests, and can be obtained based on the encapsulation of resources such as positioning and location-recognition algorithms.
  • Service-class combinable capabilities can be used to perform tasks that satisfy the user intent, thereby providing services to users.
  • The service-class combinable capabilities may further include, but are not limited to, one or a combination of the following types:
  • The environment-adjustment combinable capability can be used to adjust the environment, such as heating, cooling, humidification, dehumidification, and adjustment of light intensity, and can be obtained based on the packaging of devices such as air conditioners, humidifiers, and lamps.
  • The device-control combinable capability can be used to control devices, and may include, for example, device start and stop, device pairing, and parameter adjustment.
  • The information-service combinable capability can be used to provide information services, such as search, navigation, and ordering meals.
  • The data-processing combinable capability can be used to process various types of data, for example, music playback, video playback, and data synchronization.
  • Connection-class combinable capabilities can be used to support connection, communication, and interaction between devices, and can also be used to describe communication parameters of a device such as communication delay and bandwidth.
  • connection-type composable capabilities may further include one or a combination of the following types:
  • The short-distance-connection combinable capability is used to support a device in connecting and communicating with other devices through short-distance communication technologies.
  • Short-range communication technologies may include, for example, Wi-Fi, BT, NFC, UWB, and the like.
  • the combinable capability of short-distance connection can be obtained based on the encapsulation of resources such as wireless communication module 160 and antenna.
  • The long-distance-connection combinable capability is used to support a device in connecting and communicating with other devices through long-distance communication technologies.
  • the long-distance communication technology may include, for example, cellular technology (such as 4G, 5G), LAN, wired technology (such as optical fiber), and the like.
  • the long-distance connection-type combinable capability can be obtained based on the encapsulation of resources such as the mobile communication module 150, antenna, and wired interface.
  • The manner of classifying combinable capabilities shown in FIG. 4 is not limiting. In some other embodiments, combinable capabilities may be classified in other manners, which are not limited here. For example, they may also be divided by data type, and the data types may include image/video, voice, text, and so on.
  • An electronic device may include any one or more types of the combinable capabilities described above, and may include one or more combinable capabilities of each type.
  • The combinable capabilities obtained by deconstruction in a unified way are decoupled from devices, device models, and device manufacturers, so other devices in the communication system can call them across devices without barriers; that is, they support the central control device in uniformly scheduling the resources of each device so as to meet users' needs. A sketch of such a uniform calling interface follows.
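  • As a hedged illustration of such barrier-free cross-device calling, the sketch below models every combinable capability behind one uniform interface; the class and method names are invented for this example, assuming only that each capability can describe itself and be invoked remotely:

```python
from abc import ABC, abstractmethod

# Hypothetical illustration: every device wraps its resources behind the
# same standard interface, so the central control device can schedule a
# combinable capability without knowing the device model or manufacturer.
class CombinableCapability(ABC):
    @abstractmethod
    def describe(self) -> dict:
        """Return this capability's description file (cf. the JSON sketch above)."""

    @abstractmethod
    def invoke(self, **params):
        """Execute the capability; may be called locally or across devices."""

class MicrophonePickup(CombinableCapability):
    def describe(self) -> dict:
        return {"class": "interaction/voice", "name": "voice_pickup"}

    def invoke(self, **params):
        # A real device would start the microphone 170C and stream audio;
        # here the return value only illustrates the uniform call shape.
        return b"audio-bytes"

print(MicrophonePickup().describe())
```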
  • Devices are added to the communication system to jointly provide services for users.
  • attributes of the combinable capability may also be obtained.
  • The attributes of a combinable capability may include, for example, the following two categories: (1) static information, such as the class of the combinable capability itself, its parameters (such as the resolution of a captured image), performance (such as the sound pickup range), version, power consumption, and size specifications (such as display specifications); (2) dynamic information, that is, information that changes with the environment, such as location (such as indoors, outdoors, living room, bedroom, near field, or far field), orientation, and whether the device is plugged in (for example, when a mobile phone is plugged in, it becomes less sensitive to power consumption).
  • The attributes of a combinable capability may be manually configured by the user (such as specifying the device's location when it is initialized), or detected by the device itself during operation (such as detecting whether there are other devices nearby through an ultrasonic sensor), as in the sketch below.
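  • The following minimal sketch, assuming hypothetical attribute names, illustrates the split between user-configured static attributes and runtime-detected dynamic attributes:

```python
import time

# Illustrative only: static attributes are typically configured once
# (e.g. by the user at device initialization), while dynamic attributes
# are refreshed by the device itself at runtime.
class CapabilityAttributes:
    def __init__(self, location: str):
        self.static = {"location": location}    # e.g. set by the user
        self.dynamic = {"plugged_in": False}    # changes while running

    def refresh(self, plugged_in: bool) -> None:
        # e.g. a phone notices it was put on the charger and therefore
        # becomes less sensitive to power consumption
        self.dynamic["plugged_in"] = plugged_in
        self.dynamic["updated_at"] = time.time()

attrs = CapabilityAttributes(location="bedroom")
attrs.refresh(plugged_in=True)
print(attrs.static, attrs.dynamic)
```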
  • multiple electronic devices may establish a session after establishing a connection, so as to form a communication system 10 .
  • A plurality of electronic devices may first establish a connection, and then establish a session after authentication and authorization, so as to form the communication system 10.
  • A plurality of electronic devices forming the communication system 10 may also be referred to as the electronic devices joining the communication system 10.
  • The establishment of a connection or a session among the above-mentioned multiple electronic devices may mean that each electronic device establishes a connection or session with all or some of the other electronic devices.
  • Electronic devices can be connected and communicate through one or more of the following technologies: WLAN, Wi-Fi P2P, BT, NFC, IR, ZigBee, UWB, hotspot, Wi-Fi softAP, cellular network, wired technology or remote connection technology and more.
  • an electronic device can establish a connection with other devices in any of the following ways:
  • the electronic device can establish a connection with other devices under the trigger of the user.
  • the electronic device may receive an input user operation, and the electronic device may establish a connection with other devices triggered by the user operation.
  • the embodiment of the present application does not limit the implementation of the user operation.
  • The user operation may include, but is not limited to: a touch operation/click operation/long-press operation on the display screen, a voice command, an air gesture, an operation of shaking the electronic device, an operation of pressing a button, and so on.
  • The electronic device may display options 501 of multiple discovered surrounding wireless networks in the user interface 51 provided by the settings application. After the user clicks one of the options, a password input box 502 shown in FIG. 5B may be displayed, and the user can join the wireless network after entering the authentication password of the wireless network corresponding to that option. After the electronic device joins the wireless network, it establishes connections with the other electronic devices connected to that wireless network.
  • the electronic device can discover other nearby devices through BT, NFC and other short-range communication technologies, or NFC touch, or Wi-Fi P2P technology, and establish communication connections with other devices.
  • After the user enters the account number 503 and password 504 of the system, the electronic device can log in to the account, and then use cellular network technologies such as 3G, 4G, and 5G, or wide area network technologies, to establish connections, through a server maintaining the system account (such as a server provided by Huawei), with other devices logged in with the same account, a family account, or an associated account.
  • the electronic device may receive an input user operation, and instruct the device managed by the electronic device to establish a connection with other devices under the trigger of the user operation.
  • For example, a smartphone can manage IoT devices (such as smart speakers, smart lamps, and smart screens). The smartphone may receive the Wi-Fi password entered by the user; afterwards, the smartphone can send the Wi-Fi password to the IoT device, triggering the IoT device to join the Wi-Fi network.
  • the electronic device may display a user interface 61 .
  • the user interface 61 may be provided by a smart life APP in the electronic device.
  • Smart Life is an application for managing various devices owned by users.
  • The user interface 61 displays: the family name 611, the device quantity 612, the discovered-device option 613, the virtual aggregation device option 614, the add control 615, one or more device options, and a page option display area. Among them:
  • the home name 611 may be used to indicate the area name covered by the communication system 10 .
  • the family name 611 can be set by the user.
  • the family name 611 may be "home".
  • the number of devices 612 may be used to indicate the number of devices included in the communication system 10 .
  • the number of devices included in the communication system 10 is "5 devices”.
  • The discovered-device option 613 may be used to trigger the electronic device to display device options corresponding to one or more electronic devices included in the communication system 10. As shown in FIG. 5D, the discovered-device option 613 has been selected, and the electronic device can display the device options (for example, router, air conditioner, speaker, headlight, and large-screen device options) corresponding to the one or more electronic devices (for example, routers, air conditioners, speakers, headlights, and large screens) included in the communication system 10.
  • the virtual aggregated device option 614 may be used to trigger the electronic device to display the electronic device to which each combinable capability of the virtual aggregated device belongs.
  • Add control 615 may be used to trigger the addition of one or more electronic devices to communication system 10 .
  • the user interface displayed by the electronic device to add the electronic device to the communication system 10 in response to acting on the adding control 615 will be described in subsequent embodiments, and will not be repeated here.
  • The one or more device options can be used to display the electronic devices specifically included in the communication system 10 (for example, routers, air conditioners, speakers, headlights, and large screens), the location of each electronic device (for example, bedroom, living room), and the device status of each electronic device (for example, online, turned off).
  • Each device option may also include a corresponding control element (for example, the control element 616 in the air-conditioner device option) for controlling the start or stop of the corresponding electronic device (for example, the air conditioner).
  • the electronic device may display an option 617 on the user interface 61 .
  • the option 617 may display a text prompt message "add device”.
  • the electronic device may display the user interface 62 as shown in FIG. 5F.
  • The user interface 62 includes a return control 621, a page title 622, a scanning prompt 623, a scanned-device display area 624, a manual-adding control 625, and a code-scanning adding control 626. Among them:
  • the scanning prompt 623 may be used to prompt the user of the scanning status of the electronic device.
  • “scanning” may indicate that the electronic device is scanning for nearby electronic devices that may be added.
  • the electronic device can determine whether there is an electronic device that can be added to the communication system 10 in the vicinity through Bluetooth communication.
  • the electronic device may broadcast a device discovery request through Bluetooth communication.
  • the electronic device with Bluetooth enabled can send a discovery response to the electronic device through Bluetooth communication.
  • the electronic device can scan the smart home device, and display the added control of the smart home device in the user interface.
  • the embodiment of the present application does not limit the method for the electronic device to scan the electronic device.
  • the scan prompt 623 may also include content for prompting the user to add a smart home device.
  • The above prompt content may include "Please ensure that the smart device is connected to a power source and located near the mobile phone".
  • The scanned-device display area 624 can be used to display the electronic devices scanned by the electronic device. For example, the electronic device scans and finds a desktop computer. The electronic device may display, in the scanned-device display area 624, the name of the desktop computer "desktop computer", the device status "online" of the desktop computer, the location "bedroom" of the desktop computer, and the add control 624A.
  • the add control 624A described above can be used to trigger the desktop computer to join the communication system 10 .
  • the manual adding control 625 can facilitate the user to add the electronic device to the communication system 10 by manually inputting the information of the electronic device to be added in the electronic device.
  • the scan code adding control 626 can be used to trigger the electronic device to start the scanning device. That is, users can add electronic devices to the communication system 10 by scanning data such as two-dimensional codes and barcodes.
  • the embodiment of the present application does not limit the implementation methods of manually adding an electronic device to the communication system 10 and scanning a code to add an electronic device to the communication system 10 .
  • the electronic device may display the device option 618 of the desktop computer on the user interface 61 .
  • the device option 618 may include the name of the desktop computer "desktop computer", the device status of the desktop computer "online” and the location of the desktop computer "bedroom”.
  • The device option 618 may also include a control element 618A that the user can use to turn the desktop computer off or on.
  • the electronic device can actively establish a connection with other devices.
  • Electronic devices can actively establish connections with other devices under certain circumstances. In this way, manual operation by the user is not required, user behavior can be simplified, and the efficiency of providing services based on multiple devices can be improved.
  • the electronic device can actively search for a nearby wireless network, and if the electronic device itself stores the password of the wireless network, it can actively join the wireless network. For example, after the user goes home every day, the electronic device carried by the user can automatically connect to the home network.
  • When the electronic device is in a specific location (such as home or office), it may actively join the communication system in that specific location.
  • Before an electronic device establishes a connection with other electronic devices, authentication and authorization may be performed first, such as the password authentication shown in FIG. 5B.
  • After being authenticated and authorized, an electronic device can establish a connection with the other electronic devices.
  • The other electronic devices can also authenticate and authorize the electronic device, or the user using the electronic device, and allow the electronic device to establish a session with them only after the authentication and authorization pass, so that the communication system 10 is formed.
  • The other electronic devices authenticating and authorizing the electronic device can ensure that only trusted and secure devices connect to them, and can ensure the data security of the other electronic devices.
  • The manner of authenticating or authorizing the electronic device may include verifying the security level, type, etc. of the electronic device.
  • the device security level is mainly determined by the hardware and software configuration of the electronic device itself.
  • The manner of authenticating or authorizing the user may include identity authentication.
  • Identity authentication methods can include: password authentication (the password being, for example, a string composed of numbers, letters, and symbols), graphic authentication, biometric authentication (based on, for example, the face, voiceprint, fingerprint, palm shape, retina, iris, human body odor, face shape, blood pressure, blood oxygen, blood sugar, respiration rate, heart rate, or a cycle of an ECG waveform), etc.
  • As shown in FIG. 5C, after the user enters the password, the electronic device may display the prompt information shown in FIG. 5H.
  • FIG. 5H shows a user interface displayed on the electronic device when the electronic device is being authenticated and authorized.
  • the electronic device may display a prompt box 505 after receiving the wireless network password input by the user in FIG. 5B , or after inputting the account number and password in FIG. 5C .
  • The prompt box 505 can be used to prompt the user to enter a face image to complete the authentication and authorization of the electronic device.
  • After the electronic device establishes a connection with other electronic devices using the above method 1 or method 2, it can authenticate and authorize the other devices, and establish a session with the other electronic devices after the authentication and authorization pass, so as to form the communication system 10.
  • Electronic devices can authenticate and authorize other devices by verifying their security level or type.
  • The electronic device authenticating and authorizing other electronic devices can ensure that it establishes sessions only with trusted and secure electronic devices to form the communication system, and can ensure the security of the data in the electronic device.
  • The authentication and authorization process before establishing a connection between electronic devices and the authentication and authorization process before establishing a session between electronic devices can be reused, that is, implemented as the same authentication and authorization process. That is to say, electronic devices can establish connections and sessions with each other after one round of authentication and authorization. For example, in the way of adding devices to the communication system 10 through the Smart Life APP shown in FIGS. 5D-5G, a connection and a session can be established between electronic devices after one round of authentication and authorization. In some other embodiments, the authentication and authorization process before establishing a connection and the authentication and authorization process before establishing a session may be performed separately and independently.
  • After authentication and authorization pass, the electronic device can record the other party's information, so that it can subsequently authenticate and authorize the other electronic device again without manual operation by the user. Afterwards, the electronic device may be disconnected from the other electronic devices, or its session may end, due to reasons such as location changes or power exhaustion. When the electronic device connects to the other electronic devices or establishes a session with them again, the recorded information can be used for the authentication and authorization, without manual operation by the user.
  • For example, after mobile phone A joins the communication system 10 in the home area by passing authentication, when the user goes out and then returns to the home area with mobile phone A, the user does not need to manually operate mobile phone A for it to rejoin the communication system 10 in the home area.
  • the operation of adding the electronic device to the communication system 10 can be simplified, and the implementation efficiency of the method provided by the present application can be improved.
  • The aforementioned authentication and authorization of an electronic device may be performed locally by the device performing the authentication and authorization, or may be performed in conjunction with a cloud server.
  • After the authentication and authorization of an electronic device pass, the authentication result can be classified into different levels.
  • The setting of authentication result levels is not specifically limited here. For example, the higher the security level of the electronic device, the higher the level of its authentication result. That is to say, the level of the authentication result reflects the degree of trust in the electronic device; the higher the level of the authentication result, the higher the degree of trust.
  • the electronic device after the electronic device joins the communication system 10, it may leave the communication system 10 or go offline from the communication system 10 due to reasons such as location change and battery exhaustion. After the electronic device leaves the communication system 10, it may also join the communication system 10 again. After the electronic device is offline from the communication system 10, it may also be online again.
  • one electronic device may join multiple different communication systems. For example, when mobile phone A is within the home range, it can join the communication system in the home; when mobile phone A is in the office, it can join the communication system in the office.
  • multiple electronic devices can be connected to each other and form a communication system, which is convenient for subsequent multiple electronic devices to provide users with collaborative services and realize efficient and natural cross-device resource sharing.
  • each electronic device in the communication system 10 synchronizes device information with each other based on the connection between the devices.
  • each electronic device in the communication system 10 may synchronize device information with each other immediately after S102 is executed, that is, after the communication system 10 is established.
  • the electronic device can synchronize device information with other devices in the communication system 10 .
  • each electronic device in the communication system 10 may also periodically or aperiodically synchronize device information with each other according to a preset rule, for example, every 30 seconds or every minute.
  • each electronic device in the communication system 10 may synchronize device information with each other based on the connection between the devices. For example, if each electronic device in the communication system 10 is connected to the same WLAN, the device information can be synchronized with each other through the WLAN (for example, through a router). For another example, if electronic devices in the communication system 10 are connected through Bluetooth, device information may be synchronized with each other based on the Bluetooth connection. For another example, if each electronic device in the communication system 10 is remotely connected by logging into the same account, the device information can be transferred through the server that manages the account. If the communication system 10 includes two electronic devices that are not directly connected, the two electronic devices can synchronize device information with each other through an intermediate device in the communication system 10 .
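  • As a non-authoritative sketch of one possible synchronization mechanism, the snippet below broadcasts device information over UDP every 30 seconds; a real implementation would reuse the WLAN, Bluetooth, or account connection described above, and all identifiers are examples:

```python
import json
import socket
import time

# Hedged sketch of periodic device-information synchronization: a device
# broadcasts its information every 30 seconds (one of the periods the
# text mentions). All values below are invented for illustration.
DEVICE_INFO = {
    "device_id": "AA:BB:CC:DD:EE:FF",
    "capabilities": ["interaction/voice", "service/playback"],
}

def broadcast_device_info(port: int = 37020, period_s: float = 30.0) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    while True:
        sock.sendto(json.dumps(DEVICE_INFO).encode(), ("<broadcast>", port))
        time.sleep(period_s)
```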
  • the device information of the mutual synchronization of the electronic devices in the communication system 10 includes: all or part of combinable capability information of the electronic devices.
  • the combinable capability information is used to characterize or describe the corresponding combinable capability.
  • the composable capability information may also be used to describe the attributes of the composable capability.
  • the implementation form of the combinable capability information is not limited in this embodiment of the present application. For the classification and attributes of composable capabilities, please refer to the relevant description in S101 above.
  • all electronic devices in the communication system can synchronize all combinable capability information with each other.
  • the information of all combinable capabilities of the electronic device refers to the information of all the combinable capabilities of the electronic device obtained by deconstructing its own resources in S101.
  • electronic devices in the communication system may synchronize part or all of their combinable capability information with each other.
  • Part of the combinable capability information sent by the electronic device to other devices in the communication system 10 may be determined according to any of the following strategies:
  • the electronic device determines combinable capability information to be sent to other devices in the communication system 10 according to the authentication result level when joining the communication system 10 .
  • the authentication result may include an authentication result of the electronic device to the communication system 10, and/or an authentication result of the communication system 10 to the electronic device, for details, refer to the related description of S102 above.
  • the electronic device determines combinable capability information to be sent to other devices in the communication system 10 according to the needs of the user.
  • the user can manually set the combinable capabilities open to other devices in the communication system 10 on the electronic device, and the electronic device can send corresponding combinable capability information to other devices in the communication system 10 according to the user's settings.
  • FIG. 5I exemplarily shows a way for a user to set combinable capabilities open to other devices in the communication system 10.
  • FIG. 5I is a user interface 55 provided by a setting application in an electronic device, and the user interface 55 displays: one or more combinable capability options 506 .
  • The one or more combinable capability options 506 may correspond to the combinable capabilities obtained by the electronic device through deconstruction in S101.
  • the electronic device may detect a user operation acting on the combinable capability option 506 , and open the combinable capability corresponding to the combinable capability option 506 to other devices in the communication system 10 .
  • the electronic device may also display finer-grained combinable capabilities for the user to select.
  • the user can also set the combinable capabilities of the electronic device to other devices in the communication system 10 through other methods.
  • users can open different combinable capabilities for different communication systems.
  • The user can also enable different combinable capabilities for the same communication system under different situational conditions, such as different time periods.
  • the electronic device determines the combinable capability information to be sent to other devices in the communication system 10 according to its own policy.
  • the electronic device may only synchronize part of the combinable capability information to other devices in the communication system 10 based on factors such as user privacy and device power consumption.
  • the electronic device may hide gesture interaction combinable capabilities and only send other combinable capabilities to other devices in the communication system 10 .
  • the electronic device may open the combinable capability of visual interaction with low confidentiality to other devices, but not open the combinable capability of payment to other devices.
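  • The sketch below combines the three strategies above into one hypothetical filter; the capability names, authentication levels, and thresholds are all invented for illustration:

```python
# Hypothetical policy filter: the capabilities a device exposes may depend
# on the peer's authentication-result level, the user's manual settings,
# and the device's own privacy/power policy.
ALL_CAPABILITIES = ["voice", "visual", "gesture", "payment"]

def capabilities_to_sync(auth_level: int,
                         user_enabled: set,
                         confidential: set) -> list:
    if auth_level < 1:                  # untrusted peer: expose nothing
        return []
    caps = [c for c in ALL_CAPABILITIES if c in user_enabled]
    if auth_level < 3:                  # hide high-confidentiality capabilities
        caps = [c for c in caps if c not in confidential]
    return caps

print(capabilities_to_sync(2, {"voice", "visual", "payment"}, {"payment"}))
# -> ['voice', 'visual']
```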
  • Each device in the communication system 10 synchronizes combinable capability information with each other, which can facilitate subsequent implementation of cross-device resource sharing.
  • the device information of the electronic devices synchronized with each other in the communication system 10 may further include: device attributes of the electronic devices.
  • Device attributes may include, for example, one or more of the following: device identifier, device type, current power consumption, available resources, device mode, current usage status, online information, offline information, historical interaction information with other devices in the communication system 10, device location (such as bedroom or living room), orientation, the type of environment in which the device is located (such as office or home), and so on.
  • the device identifier may be an IP address or a MAC address of the device.
  • device types can be divided into rich devices and thin devices, and can also be divided into smart screens, air conditioners, printers and other types according to the device form.
  • Available resources may include, for example, computing resources, memory resources, power resources, and the like.
  • a device mode refers to an information interaction mode provided or supported by an electronic device, and may include, for example, a voice interaction mode, a display interaction mode, a lighting interaction mode, a vibration interaction mode, and the like.
  • the current usage status may include, for example, currently enabled applications or hardware of the device.
  • the online information may include the number, time, duration, etc. of the electronic device online.
  • the offline information may include the number, time, duration, etc. of the electronic device offline.
  • The historical interaction information between the electronic device and other devices characterizes the patterns of interaction between the electronic device and the other devices.
  • The historical interaction information may include, for example, one or more of: the type of interaction service, the service initiator, the service responder, the interaction duration, the service initiation time, the service end time, the average number of online devices within a statistical time period, the normalized standard deviation of the average number of online devices within the statistical time period, and the number of historical online devices within the statistical time period.
  • Services interacted between electronic devices may include, for example, file transfer, video connection, audio connection, signaling transmission, data distribution, and the like.
  • For example, after a video connection between a tablet computer and a smart screen ends, the tablet computer and the smart screen can record the interaction behavior information corresponding to this video connection service.
  • The interaction behavior information may include one or more of the following: service type - video connection; service initiator - tablet computer; service responder - smart screen; interaction duration - 2 hours and 15 minutes; service initiation time - 19:37 on January 1st; service end time - 21:52 on January 1st.
  • The average number of online devices within the statistical time period, the normalized standard deviation of the average number of online devices within the statistical time period, and the number of historical online devices within the statistical time period can be determined by the electronic device according to the online situations of the other devices in the communication system 10 within the statistical time period.
  • the statistical time period can be set according to actual needs, for example, it can be the last 1 day, 3 days, 1 week or 1 month and so on.
  • The average number of online devices refers to the average number of devices in the communication system 10 that the electronic device counts as being online per unit time (for example, one day or one week) within the statistical time period. If the same device goes online multiple times in a unit time, it is counted only once, and its online times are not accumulated. For example, assuming that the statistical time period is from January 1 to January 7, the numbers of online devices counted by the electronic device within the statistical time period are shown in Table 1:
  • The normalized standard deviation of the average number of online devices refers to the standard deviation of the numbers of devices counted as being online per unit time (such as one day or one week) within the statistical time period, divided by the average number of online devices.
  • If the electronic device calculates the standard deviation of the daily numbers of online devices according to the data in Table 1, the calculation of the standard deviation can be expressed as: σ = √((1/n) Σ_{i=1..n} (x_i − μ)²), where n is the number of days in the statistical time period, x_i is the number of online devices counted on day i, and μ is the average number of online devices. A worked sketch follows.
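  • A worked sketch of these two statistics, with made-up daily counts (the data in Table 1 is not reproduced here):

```python
from statistics import mean, pstdev

# Made-up daily online-device counts for a 7-day statistical time period.
daily_online_devices = [3, 4, 4, 5, 3, 4, 5]

avg = mean(daily_online_devices)                # average number of online devices
norm_std = pstdev(daily_online_devices) / avg   # normalized standard deviation

print(f"average: {avg:.2f}, normalized std dev: {norm_std:.3f}")
```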
  • the device information of each electronic device in the communication system 10 that is synchronized with each other may further include: user information.
  • For the detailed content and functions of the user information, refer to the description related to user portraits later.
  • Each device in the communication system 10 synchronizes device attributes with each other, which can facilitate the subsequent communication system 10 to determine the central control device.
  • the communication system 10 determines a central control device.
  • each electronic device in the communication system 10 may execute S104 under any of the following situations:
  • Each electronic device in the communication system 10 may execute S104 under the trigger of the user.
  • The user may input an operation on a central device (such as a router or a mobile phone) in the communication system 10, triggering the central device to notify, through broadcasting or other forms, the other devices in the communication system 10 to jointly execute S104.
  • The user operation for triggering S104 may also be referred to as a second operation.
  • Each electronic device in the communication system 10 may also execute S104 periodically or aperiodically according to a preset rule. For example, each electronic device in the communication system 10 may execute S104 once a week or once a month. That is to say, the multiple electronic devices in the communication system 10 can determine the central control device when a preset time arrives.
  • When the central control device goes offline, each electronic device in the communication system 10 coordinates to execute S104, that is, to re-elect the central control device.
  • the reasons why the central control device goes offline may include, for example, the position of the central control device changes, the power is exhausted, the user manually triggers the central control device to go offline, and so on.
  • In some embodiments, after the communication system 10 determines the central control device, the central control device can retain its identity as central control device regardless of whether it goes offline. In this way, the problem of frequent re-election of the central control device caused by the central control device frequently going offline and coming back online can be avoided.
  • the communication system 10 may execute S104 after executing S102 or after waiting for a preset period of time.
  • the preset duration can be set according to actual needs, which is not limited in this embodiment of the present application.
  • the preset duration can be set to 10 seconds, 1 minute, 1 hour, 12 hours, 1 day, 2 days, 3 days and so on.
  • Within the preset period of time, each electronic device in the communication system 10 can fully and comprehensively synchronize device information with the others. For example, within the preset period of time, a new device may come online, and the newly online device can synchronize its own device information to the other devices. For example, referring to FIG. 6A, assuming that the preset statistical period is 2 days, after the smart large screen 51 and the smart speaker 54 synchronize interaction statistical information, the smart large screen 51 enters a waiting state. Assuming that the smart large screen 51 finds that the mobile phone 52 and the smart speaker 54 are online on the first day, the smart large screen 51 can send device information to the mobile phone 52 and receive the device information sent by the mobile phone 52.
  • As shown in the figure, the smart large screen 51 can likewise send device information to the tablet computer 53 and receive the device information sent by the tablet computer 53.
  • After the preset statistical period ends, the smart large screen 51 exits the waiting state and completes the synchronization of the interaction statistical information.
  • In this way, the smart large screen 51 obtains not only the device information of the smart speaker 54, but also the device information of the mobile phone 52 and the device information of the tablet computer 53.
  • In this way, the communication system 10 can collect more comprehensive device information for electing the central control device, and can therefore elect a more suitable central control device.
  • each electronic device in the communication system 10 may execute S104, that is, negotiate, elect, decide or determine a central control device, through broadcast, multicast, query, and other means based on the connection between the devices.
  • Each electronic device can communicate multiple times when negotiating the central control device; the embodiment of the present application does not limit the negotiation process or the number of interactions.
  • the number of central control devices in the communication system 10 may be one or multiple.
  • When determining the central control device, each electronic device in the communication system 10 may negotiate through one or more interactions, or may not need to interact at all, which is not limited here.
  • Each electronic device in the communication system 10 may determine the central control device through a certain policy. This embodiment of the application does not limit the policy. Several ways for the communication system 10 to determine the central control device are listed below:
  • Each electronic device in the communication system 10 may determine, as the central control device, a device with relatively stable computing resources, a device with relatively stable memory resources, a device with a relatively stable power supply, a device with many available modes, or a device commonly used by users.
  • the communication system 10 can elect a smart screen that is always connected to the power supply as the central control device.
  • Each electronic device in the communication system 10 may select one or more devices from the plurality of electronic devices in the communication system 10 as the central control device.
  • the communication system 10 may determine the electronic device with the largest average number of online devices as the central control device. For example, assuming that the interaction statistics information of each device in the communication system 10 is as shown in Table 2, the smart screen can be determined as the central control device.
  • the communication system 10 may determine the electronic device with the largest normalized standard deviation of the average number of online devices as the central control device. For example, referring to Table 2, the communication system 10 may determine a smart speaker as a central control device.
  • the communication system 10 may determine electronic devices whose average number of online devices is greater than the first value and whose normalized standard deviation is greater than the second value as candidate target devices.
  • the first value and the second value are preset parameters.
  • If there is only one target device, the target device may be directly determined as the central control device.
  • If there are multiple target devices, the communication system 10 may determine the central control device according to one or more decision factors such as the number of historical online devices, device type, memory size, and device identifier. Each of the above decision factors may have a different priority, and the communication system 10 may start from the decision factor with the highest priority, compare the target devices in turn, and select the better electronic device as the central control device.
  • For example, assume that the smart screen and the smart speaker both meet the requirement that the average number of online devices is greater than the first value and that the normalized standard deviation of the average number of online devices is greater than the second value; the smart screen and the smart speaker are therefore target devices. Since the numbers of historical online devices of the smart screen and the smart speaker are the same, the memory sizes of the smart screen and the smart speaker can be compared next. Assume that the memory of the smart screen is 6 GB and the memory of the smart speaker is 512 MB; the memory of the smart screen is larger than that of the smart speaker, so the communication system 10 can determine the smart screen as the central control device. This selection is sketched below.
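  • The following sketch reproduces this selection logic under assumed threshold values and device data; everything except the 6 GB/512 MB memory figures is invented:

```python
# Illustrative election for strategy 2: filter candidates by the two
# thresholds, then break ties by decision factors in priority order
# (historical online count first, then memory size).
devices = [
    {"name": "smart screen",  "avg": 5.0, "nstd": 0.3, "hist": 7, "mem_mb": 6144},
    {"name": "smart speaker", "avg": 4.5, "nstd": 0.4, "hist": 7, "mem_mb": 512},
    {"name": "phone",         "avg": 2.0, "nstd": 0.1, "hist": 3, "mem_mb": 8192},
]
FIRST_VALUE, SECOND_VALUE = 3.0, 0.2          # preset thresholds (assumed)

candidates = [d for d in devices
              if d["avg"] > FIRST_VALUE and d["nstd"] > SECOND_VALUE]
central = max(candidates, key=lambda d: (d["hist"], d["mem_mb"]))
print(central["name"])   # -> smart screen (wins the memory tie-break)
```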
  • Alternatively, the communication system 10 can use a Poisson distribution to model the number of online devices of each device, calculate the mathematical expectation of the number of online devices of each device through maximum likelihood estimation, and determine the electronic device with the largest mathematical expectation as the central control device.
  • Other probabilistic statistical models can also be used to model the number of online devices of each device, and the central control device can be determined according to one or more statistical parameters such as the mathematical expectation, variance, and standard deviation.
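  • A brief sketch of the Poisson variant: for Poisson-distributed counts, the maximum-likelihood estimate of the expectation is the sample mean, so the election reduces to comparing means (the counts below are invented):

```python
from statistics import mean

# For a Poisson model of the daily online-device count, the maximum-
# likelihood estimate of the rate lambda (which equals the mathematical
# expectation) is simply the sample mean of the observed counts.
counts_per_device = {
    "smart screen": [5, 4, 6, 5],
    "smart speaker": [3, 4, 3, 4],
}
expectation = {name: mean(c) for name, c in counts_per_device.items()}
central = max(expectation, key=expectation.get)
print(central)   # -> smart screen
```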
  • the central control device is selected according to the historical interaction information of each electronic device in the communication system 10, and the electronic device that has interacted with more devices in the communication system 10 can be determined as the central control device.
  • Such a central control device has more interactions with other devices and can undertake the tasks of collecting information and coordinating computation, such as obtaining various data of other devices, generating portraits of other devices, generating global context, and generating memory, so as to ensure the effect of providing services based on multiple devices.
  • If the communication system 10 uses the above first or second strategy to determine a central control device, and there are other devices in the communication system 10 that have not established a direct connection with the central control device, the communication system 10 can determine more central control devices by way of a continuation election.
  • A central control device determined using the above first or second strategy may not be directly connected to some electronic devices in the communication system 10, that is, it cannot interact with them directly at the same time or in the same space.
  • the central control device is a smart screen in the living room.
  • the user turns on the smart screen in the living room to watch the program.
  • the user turns off the smart screen in the living room, returns to the bedroom, and turns on the smart screen in the bedroom to watch programs.
  • Although the smart screen in the living room and the smart screen in the bedroom are in the same local area network, they cannot interact directly at the same time, and the smart screen in the living room cannot obtain the historical interaction information of the smart screen in the bedroom.
  • Therefore, the non-central-control devices in the communication system 10 can continue to elect additional central control devices, so as to obtain the historical interaction information of each electronic device in the communication system 10. In this way, the multiple elected central control devices can connect, at the same time or in the same space, to all devices in the communication system 10.
  • a group is formed between the central control device and other directly connected devices. Other devices directly connected to the central control device may be called candidate devices.
  • the central control device may send group information to the candidate device, and the group information may include: an identifier of the central control device, and an identifier of the candidate device.
  • After receiving the group information, the candidate device determines whether there is, among the devices directly connected to it, an outlier device not included in the group; if there is an outlier device, the candidate device is added as a central control device. As a newly added central control device, the candidate device also sends group information to other devices to query for outlier devices, until there is no outlier device left in the entire communication system 10.
  • FIG. 6B schematically shows a topology diagram of a communication system 10, which shows an example of a central control device for continuation elections.
  • the communication system 10 includes: a mobile phone A, a tablet computer B, a smart screen C in the living room, and a smart screen H in the bedroom.
  • The lines in the figure represent the direct connections between devices. If the central control device determined for the first time is smart screen C, smart screen C can send group information to mobile phone A and tablet computer B. After tablet computer B receives the group information, it can learn that the smart screen H connected to it is an outlier device, and then tablet computer B also determines itself as a central control device.
  • mobile phone A, tablet computer B, smart screen C, and smart screen H can be divided into two different groups through the above-mentioned continuation of election.
  • group 1 with smart screen C as the central control device includes mobile phone A, tablet B, and smart screen C
  • group 2 with tablet B as the central control device includes tablet B and smart screen H.
  • a central control device is determined for the first time, a group is formed between the central control device and other directly connected devices, and the central control device can send group information to candidate devices.
  • After the candidate device receives the group information, it determines whether there is, among the devices directly connected to it, an outlier device not included in the group; if there is an outlier device, the candidate device and the outlier device form a group and negotiate a new central control device within this group.
  • the newly negotiated central control device also sends group information to other devices to query outlier devices until there are no more outlier devices in the entire communication system 10 .
  • FIG. 6C schematically shows a topology diagram of a communication system 10, which shows an example of a central control device for continuation elections.
  • Assuming that the local area network includes five electronic devices, namely device A, device B, device C, device D, and device E, the lines in the figure indicate the direct connection relationships between the devices.
  • Device B may send group information 1 to device A and device C, where group information 1 includes a device identifier of device A, a device identifier of device B, and a device identifier of device C.
  • Device E may send group information 2 to device A, where group information 2 includes a device identifier of device A and a device identifier of device E.
  • After receiving group information 1 and group information 2, device A detects that neither group information 1 nor group information 2 contains the device identifier of device D. Therefore, device A may determine device D as an outlier device. After that, device A and device D can continue to elect a new central control device; for example, device A can be determined as the newly added central control device. Afterwards, device A may send group information 3 to device D, where group information 3 includes the device identifier of device A and the device identifier of device D.
  • device A, device B, device C, device D, and device E can be divided into three different groups through the above-mentioned continuation election method.
  • group 1 with device B as the central control device includes device A, device B, and device C
  • group 2 with device E as the central control device includes device A and device E
  • Group 3 includes device A and device D.
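  • The outlier check of this continuation election, restated as a minimal sketch from device A's point of view (the neighbour sets follow the FIG. 6C example):

```python
# Device A compares its own directly connected neighbours against all
# received group information; a neighbour that appears in no group is an
# outlier device, and a new group is formed with it.
my_neighbours = {"B", "D", "E"}                      # device A's direct links
received_groups = [{"A", "B", "C"}, {"A", "E"}]      # group info 1 and 2

covered = set().union(*received_groups)
outliers = my_neighbours - covered                   # -> {'D'}
if outliers:
    new_group = {"A"} | outliers                     # group 3: devices A and D
    print("new group:", new_group)                   # then elect its central device
```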
  • multiple central control devices can be determined in the communication system 10 , and the multiple central control devices can be connected to all devices in the communication system 10 .
  • the non-central control device can determine the outlier device through the above-mentioned continuation election method, and determine a new central control device together with the outlier device.
  • The new central control device can interact with the above-mentioned outlier devices, so as to obtain comprehensive historical interaction information of each electronic device in the communication system 10, and thereby make full use of the information of each electronic device to provide services for users.
  • the communication system 10 determines the device selected by the user as the central control device.
  • the user can set one or more devices in the communication system 10 as central control devices by inputting user operations (such as voice commands, or touch operations on certain electronic devices, etc.) to the communication system 10 .
  • the communication system can elect the central control device according to the actual needs of the users.
  • the communication system 10 may also use other strategies for electing the central control device, which are not specifically limited here.
  • the communication system 10 may also use a Raft algorithm, a Paxos algorithm, etc. to elect a central control device.
  • the communication system 10 may also select a central control device according to the device type, for example, always elect a smart screen as the central control device.
  • the communication system 10 may also fix a certain electronic device as the central control device, and it will not be changed later.
  • different central control devices may correspond to different time periods.
  • the smart screen in the living room can be selected as the central control device during the day, and the mobile phone can be selected as the central control device at night.
  • the electronic device may be mobile, so the electronic device may join different communication systems at different times.
  • the electronic devices may correspond to different central control devices.
  • the electronic device can associate and store the identification of the communication system and the identification of the corresponding central control device, so as to perform different operations after entering a different communication system.
  • the identifier of the communication system may include, for example, an identifier of a local area network, a location of the communication system, and the like.
  • the location of the communication system may include GPS positioning, a location manually marked by a user, an equipment list and its derivative information, and the like.
  • the central control device of the communication system can be a desktop computer.
  • the user can bring the mobile phone into the home area, and the mobile phone joins the communication system in the home area.
  • the central control device of the communication system can be a smart screen.
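  • A minimal sketch, with invented identifiers, of how a mobile device might store the association between a communication-system identifier and the central control device elected there:

```python
# Illustrative store: a mobile device remembers which central control
# device was elected in each communication system it has joined, so it
# knows how to behave after re-entering a known system.
central_by_system = {
    "lan:home":   "smart screen",
    "lan:office": "desktop computer",
}

def on_enter_system(system_id: str) -> str:
    return central_by_system.get(system_id, "run central-control election")

print(on_enter_system("lan:home"))     # -> smart screen
print(on_enter_system("lan:cafe"))     # -> run central-control election
```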
  • The above S101-S104 can be performed by multiple electronic devices each running part of the functions of a single smart assistant, and the subsequent S105-S108 can be performed by the multiple electronic devices jointly running that single smart assistant. That is to say, a single smart assistant jointly run by multiple electronic devices supports the communication system 10 in executing the subsequent steps S105-S108.
  • the central control device utilizes the combinable capability of each device in the communication system 10 to initialize the virtual aggregation device.
  • initializing the virtual aggregation device refers to configuring the virtual aggregation device in the state and capability of the pre-installed virtual aggregation device.
  • configuring a virtual aggregation device means that the central control device selects appropriate resources in the communication system 10 for initialization, that is, the central control device uses some or all of the combinable capabilities in the communication system to combine or form a virtual device .
  • the initialization of resources may include loading codes or software libraries of composable capabilities, starting sensors or peripherals related to composable capabilities (such as microphones, cameras, etc.), reading or recording data related to composable capabilities (such as audio, images, etc. ), one or more of operations such as downloading dependent data or calculation models from the Internet. Initialization of resources may be performed by the physical device where the aggregated composable capabilities reside.
• configuring the virtual aggregation device refers to configuring parameters, network connections, connection relationships, data channels, and the configurable parameters of the combinable capabilities themselves (for example, the volume of a playback capability, or the resolution of a camera), and so on.
• the parameter configuration includes configuring the parameters that determine the flow direction of data processing.
• when the central control device configures the virtual aggregation device, this is equivalent to specifying the flow of information collection and processing. That is to say, after the virtual aggregation device is configured, the combinable capabilities for collecting and processing information in the communication system 10, and the coordination relationships between those combinable capabilities, are determined. After the virtual aggregation device is configured, the combinable capabilities in the virtual aggregation device are either in a working state or in a waiting state.
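• As an illustrative sketch (the field names and example values are assumptions), a configured virtual aggregation device might be recorded as a set of combinable capabilities plus the data channels that fix the processing flow:

```python
from dataclasses import dataclass, field

@dataclass
class CapabilityConfig:
    device: str        # physical device hosting the capability
    name: str          # e.g. "far_field_mic", "asr"
    params: dict = field(default_factory=dict)  # e.g. {"volume": 40}
    state: str = "waiting"  # "working" or "waiting" after configuration

@dataclass
class VirtualAggregationDevice:
    capabilities: list = field(default_factory=list)
    # (producer, consumer) pairs fixing the flow direction of data processing
    channels: list = field(default_factory=list)

vad = VirtualAggregationDevice(
    capabilities=[
        CapabilityConfig("smart_speaker", "far_field_mic", {"sample_rate": 16000}),
        CapabilityConfig("smart_screen", "asr", state="working"),
    ],
    channels=[("far_field_mic", "asr")],  # audio flows from the mic to ASR
)
```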
  • the application can perceive the independent virtual aggregation device instead of multiple other individual physical devices. In this way, various upper-layer applications can more conveniently schedule resources in other physical devices.
  • Configuring the virtual aggregation device through S105 can make preparations in advance for the combinable capabilities that may be used later, and can improve the response speed when the combinable capabilities are subsequently activated to provide services for users.
  • the communication system 10 only needs to aggregate part of the combinable capabilities in the communication system 10, which can avoid wasting unnecessary resources.
• the central control device can continue to adjust and optimize the configuration in advance based on scene requirements, so when the user sends an instruction involving multiple devices, the central control device can immediately invoke the relevant pre-configured combinable capabilities to perform tasks and respond, effectively shortening the command response delay.
• This enables the communication system 10 to proactively provide services and long-running tasks for users, and avoids triggering cooperative combination only at the moment the user issues an instruction, which would result in slow responses and no support for proactive services.
  • the virtual aggregation device includes: a central control device, and some or all other combinable capabilities in the communication system 10 selected by the central control device.
  • the virtual aggregation device is regarded as a single complete device that can independently execute application tasks, but its various capabilities (such as interaction, service, etc.) may actually come from different physical devices. That is to say, a virtual aggregation device is obtained by aggregating some or all of the capabilities provided by multiple physical devices. Each combinable capability of the virtual aggregation device may come from any one or more physical devices in the communication system 10, which is not limited here.
  • the virtual aggregation device can be used to perform subsequent steps S106, S107, S108 and S109.
• the central control device can select appropriate combinable capabilities to form a virtual aggregation device according to the configuration of each device in the communication system 10, historical interaction information, user preferences, user status, device status, environment status, and the like.
• the configuration of the virtual aggregation device can include the following two types: initialization (see S105) and reconfiguration (see S109).
  • the initialization of the virtual aggregation device can be performed in any of the following situations:
• when the communication system 10 is activated, the virtual aggregation device is initialized.
• the rules for activating the communication system 10 can be set in advance according to actual needs. For example, it can be set that the communication system is activated once more than a certain number of devices have connected to the communication system 10.
• when the communication system 10 is restarted, the virtual aggregation device is initialized.
• the rules for restarting the communication system 10 can be set in advance according to actual needs. For example, it can be set that the communication system is restarted after the old central control device goes offline.
• device failures may include, for example, loss of network connectivity, damaged audio components, and the like.
  • the network environment of some or all devices in the communication system 10 changes, such as a network identifier change, a change from a WiFi connection to a wireless cellular network, and the like.
  • the user manually triggers the system initialization, such as changing the user account, resetting the system, etc.
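• A toy check of such preset activation and restart rules, following the examples above (the device-count threshold of 3 is an assumption for illustration):

```python
MIN_DEVICES = 3  # assumed activation threshold

def should_activate(online_devices: list) -> bool:
    """Activate once more than MIN_DEVICES devices have connected."""
    return len(online_devices) > MIN_DEVICES

def should_restart(old_central_online: bool) -> bool:
    """Restart after the old central control device goes offline."""
    return not old_central_online

print(should_activate(["screen", "phone", "speaker", "camera"]))  # True
print(should_restart(old_central_online=False))                   # True
```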
  • the process for the central control device to initialize the virtual aggregation device may include the following steps:
• Step 1: the central control device activates its own combinable capabilities to obtain environmental information.
  • the central control device can activate its own combinable interactive recognition capabilities, such as combinable capabilities for user location recognition, etc., to obtain current environmental information. That is to say, the central control device can use GPS, GLONASS, BDS and other positioning technologies, location recognition algorithms, indoor positioning technology, millimeter wave sensors, etc. to obtain current environmental information.
  • the environment information characterizes the environment or scene where the communication system 10 or the user is currently located.
  • the communication system 10 or the current environment or scene of the user can be classified according to different rules. For example, it can be divided into public scenes (such as offices) and private scenes (such as family ranges) according to the degree of privacy, can be divided into multi-person scenes and single-person scenes according to the number of people, and can be divided into occupied scenes and unmanned scenes according to whether there are users. It can also be divided into scenes such as morning, noon, and evening according to time.
• the environmental information acquired by the central control device may be single-modality information or a combination of multimodal information.
  • the environment information may include any one or more of the following: location information, text, audio, video, and so on.
• in step 1, the central control device's own combinable capabilities that are activated can be regarded as the most primitive virtual aggregation device. That is to say, the central control device first configures its own combinable capabilities as a virtual aggregation device, and then adds more combinable capabilities in the communication system 10 to the virtual aggregation device through subsequent steps 2 and 3.
• Step 2: the central control device invokes the combinable capabilities of other devices in the communication system 10 to obtain more environmental information.
• in step 2, the combinable capabilities of other devices in the communication system 10 invoked by the central control device may be referred to as fourth combinable capabilities.
  • Step 2 may specifically include the following two implementation methods:
  • the central control device invokes other combinable capabilities through preset settings to obtain more environmental information comprehensively.
• for example, the central control device may be preset to use any of the following policies, or preset as being located in a certain environment (such as an office or a home).
  • the central control device can use GPS, GLONASS, BDS, etc. to obtain location information.
  • the dynamic policy may include, for example: a privacy priority policy, a comprehensive detection policy, a power consumption priority policy, and the like.
• for example, an office might use the comprehensive detection policy, and a bedroom might use the privacy priority policy.
• Comprehensive detection policy: the central control device activates all currently available combinable capabilities (such as cameras, microphones, etc.) to obtain environmental information. For example, in public places such as offices, the comprehensive detection policy may be chosen, that is, all interaction- and recognition-type combinable capabilities in the area (such as microphones, cameras, etc.) are started for information collection. The comprehensive detection policy can obtain all kinds of information comprehensively and accurately, so as to better provide services for users later.
• Privacy priority policy: the central control device invokes the combinable capabilities of other devices in the communication system 10 to obtain more environmental information according to the privacy degree of the current environment.
  • Using the privacy priority policy can protect the user's privacy from being leaked.
• Power consumption priority policy: the central control device activates the combinable capabilities available on devices with sufficient power in the communication system 10 (such as smart screens and smart speakers) to obtain environmental information.
• with the power consumption priority policy, the power level of each device can be fully considered when obtaining environmental information, so as to avoid exhausting the power of any device in the communication system 10.
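• A minimal sketch of choosing capabilities under the three policies above (the policy labels, device fields, and capability names are illustrative assumptions):

```python
def capabilities_to_activate(policy: str, devices: list) -> list:
    """Select which combinable capabilities to start under a dynamic policy."""
    if policy == "comprehensive":      # start everything available
        return [c for d in devices for c in d["capabilities"]]
    if policy == "privacy":            # skip privacy-sensitive sensors
        return [c for d in devices for c in d["capabilities"]
                if c not in ("camera", "microphone")]
    if policy == "power":              # only devices with sufficient power
        return [c for d in devices if d["mains_powered"]
                for c in d["capabilities"]]
    raise ValueError(policy)

devices = [
    {"name": "smart_screen", "mains_powered": True,
     "capabilities": ["camera", "microphone", "display"]},
    {"name": "phone", "mains_powered": False,
     "capabilities": ["microphone", "gps"]},
]
print(capabilities_to_activate("privacy", devices))  # ['display', 'gps']
```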
  • the above implementation mode (1) is equivalent to the communication system 10 detecting the initial environment state according to the dynamic policy, and obtaining the basis for the initial configuration of the virtual aggregation device.
  • the central control device determines the combinable capabilities of other devices in the communication system 10 to be invoked through an algorithm according to the environment information obtained by itself, and obtains more environment information through the combinable capabilities.
  • the embodiment of the present application may predefine different scenarios according to different scenario classification rules. For example, it can be divided into public scenes (such as offices) and private scenes (such as family ranges) according to the degree of privacy, can be divided into multi-person scenes and single-person scenes according to the number of people, and can be divided into occupied scenes and unmanned scenes according to whether there are users. It can also be divided into scenes such as morning, noon, and evening according to time.
• for each predefined scenario, the minimum modal information required to judge the scenario, the lowest confidence (i.e., the confidence threshold), and the combinable capabilities that need to be activated are defined in advance.
• based on the information it collects, the central control device judges whether the communication system is in a given scenario. Confidence refers to the probability, determined from the modal information collected by the central control device, that the communication system is in the scene.
  • the composable capabilities that need to be activated in a scenario can be set according to the characteristics of the scenario, or based on empirical data, which is not limited here.
• the first four columns of Table 3 exemplarily list several scenarios together with their required modal information, minimum confidence, and the combinable capabilities that need to be activated.
• the central control device compares the acquired modal information with the modal information required by the preset scenes (templates) to determine the current scene under different scene classifications, together with the actual confidence. If the modal information obtained by the central control device includes the modal categories that a scene depends on, it can determine whether the system is currently in that scene.
• the central control device can use multimodal machine learning, deep learning algorithms, sound event detection algorithms, AlexNet, VGG-Net, GoogLeNet, ResNet, CNN, FNN, CRNN, and other methods to determine whether it is currently in the scene.
• for example, suppose the modal information acquired by the central control device itself includes modalities a, b, c, and d.
• the central control device selects the union of the combinable capabilities that need to be activated for some of the multiple determined scenarios, and continues to acquire more environmental information.
• the "some of the multiple scenarios" may include: among the multiple scenarios determined by the central control device, all scenarios whose actual confidence is higher than the confidence threshold, or the top N scenarios with the highest confidence among those whose actual confidence is higher than the confidence threshold. N can be set in advance.
• for example, the central control device can invoke the union (1, 2, 3, 4, 5, 6, 9) of the combinable capabilities that the two determined scenarios require to be activated, so as to obtain more environmental information.
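• The template-matching and union step above can be sketched as follows (the scene names, thresholds, modality confidences, and numeric capability identifiers mirror the example but are otherwise assumptions, not Table 3's actual contents):

```python
SCENE_TEMPLATES = {
    "scene_A": {"modalities": {"a", "b"}, "threshold": 0.7,
                "capabilities": {1, 2, 3, 4}},
    "scene_B": {"modalities": {"c", "d"}, "threshold": 0.8,
                "capabilities": {3, 4, 5, 6, 9}},
}

def capabilities_to_activate(observed: dict) -> set:
    """observed maps a collected modality to the confidence it provides."""
    union = set()
    for scene, t in SCENE_TEMPLATES.items():
        if not t["modalities"] <= observed.keys():
            continue  # some required modality was not collected at all
        confidence = min(observed[m] for m in t["modalities"])
        if confidence >= t["threshold"]:  # actual confidence beats threshold
            union |= t["capabilities"]
    return union

# Both scenes match here, so the union (1, 2, 3, 4, 5, 6, 9) is activated.
print(sorted(capabilities_to_activate({"a": 0.9, "b": 0.85, "c": 0.9, "d": 0.8})))
```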
  • the central control device can dynamically activate more composable capabilities to obtain environmental information under different scene classification rules.
  • this method considers multi-modal information and scene information in a more fine-grained manner, and supports more flexible and dynamic activation of composable capabilities.
• the above is step 2, in which the central control device invokes the combinable capabilities of other devices to obtain more environmental information based on the environmental information it obtained itself.
• alternatively, the central control device can directly follow a predetermined policy to decide which combinable capabilities of other devices to invoke to obtain environmental information.
• the embodiment of this application does not limit the predetermined policy. In this way, the speed at which the central control device obtains environmental information can be increased, and the efficiency of configuring the virtual aggregation device can be improved.
• if the communication system 10 is powered off while the virtual aggregation device is being initialized, since the central control device selected in the initialization stage and the information about each device obtained through interaction (for example, the device list, the enabled states of combinable capabilities, device states, etc.) do not change much, the configuration state of the virtual aggregation device before the power failure can be restored from memory.
• the above process of initializing the virtual aggregation device allows multiple devices, in different environments, to be initialized as a virtual aggregation device with a central control device when the communication system 10 starts for the first time, restarts, or when a new device joins. Only part of the combinable capabilities in the communication system 10 need to be enabled when the virtual aggregation device is initialized, so unnecessary waste of computing resources can be avoided.
• the central control device activates the combinable capabilities of multiple devices through dynamic policies, and confirms the initial configuration of the virtual aggregation device according to the exploration results of the multiple devices.
• the initialization configuration process can weigh factors such as privacy, power consumption, and effectiveness for each scenario, and is flexible and convenient.
  • the user may also manually adjust the virtual aggregation device. That is, the communication system 10 may receive user operations, and aggregate virtual aggregation devices according to the user operations.
  • the electronic device may display the user interface 63 .
• the user interface 63 may include options corresponding to one or more combinable capabilities (for example, near-field voice input capability, music playback capability, infrared image detection capability, etc.) that constitute the virtual aggregation device, and an add-device option 633. Specifically:
  • the options corresponding to the one or more combinable capabilities may display the status of each combinable capability (for example, available state, closed state, etc.) and the electronic device to which the combinable capability belongs.
• the option corresponding to the one or more combinable capabilities may also include a corresponding control (for example, the control 631 of the near-field voice input capability) for turning the corresponding combinable capability (for example, the near-field voice input capability) on or off.
• the option corresponding to the one or more combinable capabilities may also include a delete control (for example, the delete control 632 in the option corresponding to the near-field voice input capability), used to remove the combinable capability from the composition of the virtual aggregation device; that is to say, the virtual aggregation device will no longer be able to invoke that combinable capability.
• the add-device option 633 can be used to add the combinable capabilities of discovered devices to the virtual aggregation device so that they become part of its composition; that is to say, the virtual aggregation device can then invoke the newly added combinable capabilities.
  • the electronic device may no longer display the option corresponding to the near-field voice input capability in the user interface 63 . That is to say, the near-field voice input capability running on the speaker is no longer a part of the composition of the virtual aggregation device, and the virtual aggregation device will no longer be able to call the near-field voice input capability on the speaker.
  • the electronic device may display a window 633E on the user interface 63 as shown in FIG. 5K .
  • the window 633E may display the combinable capabilities included in the discovered device but not included in the virtual aggregation device, for example, the text input capability option 633A on the desktop computer and the face detection capability option 633B on the smart screen.
• the text input capability option 633A can include a corresponding add control 633C, which can be used to add the text input capability on the desktop computer to the virtual aggregation device, so that the virtual aggregation device can call the text input capability on the desktop computer.
• the face detection capability option 633B can include a corresponding add control 633D, which can be used to add the face detection capability on the smart screen to the virtual aggregation device, so that the virtual aggregation device can call the face detection capability on the smart screen.
  • the electronic device may display the text input capability option 634 in the user interface 63 . That is, the virtual converged device includes text entry capabilities on a desktop computer.
  • the text input capability option 634 may include the name of the combinable capability "text input capability", the status of the combinable capability "available” and the electronic device "desktop computer” to which the combinable capability belongs.
• the option 634 may also include a control 634A and a delete control 634B. For descriptions of the control 634A and the delete control 634B, refer to the foregoing embodiments; details are not repeated here.
  • the above user interfaces shown in FIGS. 5D-5G , 5J-5L may be provided by any device in the communication system 10 .
  • it may be provided by a central control device.
  • the user can select required combinable capabilities to add to the virtual aggregation device.
• after the central control device configures the virtual aggregation device, it can also trigger the physical device where a combinable capability in the virtual aggregation device resides to output prompt information, prompting the user that the combinable capability of that physical device has been added to the virtual aggregation device.
  • the implementation form of the prompt information is not limited. For example, physical devices can alert users by flashing lights, vibrating, etc.
• for example, the virtual aggregation device includes a central control device, and supports: interaction-type combinable capabilities for collecting near-field voice, far-field voice, and gestures; recognition-type combinable capabilities supporting near-field ASR, far-field ASR, NLU, palm detection, and dialogue management (DM); and service-type combinable capabilities supporting skill 1 through skill N.
• the collected data can be analyzed by the recognition-type combinable capabilities supporting near-field ASR, far-field ASR, NLU, palm detection, and DM. According to the analysis results, the service-type combinable capabilities supporting skill 1 through skill N are started to perform the corresponding tasks.
• the central control device can manage the resources (that is, the combinable capabilities) in the virtual aggregation device, and provide services for users through the virtual aggregation device. That is to say, the central control device can be used to manage some or all of the resources in the multiple electronic devices included in the communication system 10.
  • the central control device triggers the first device in the virtual aggregation device to detect a specific event.
• the specific event may also be referred to as the first event.
  • a specific event refers to an event that implies a user intention.
  • a specific event may be one mode or a combination of multiple modes.
  • Modalities may include, for example, text, voice, vision (such as gestures), actions, postures (such as the location of the user, the distance between the user and the device), scenes (such as office scenes, home scenes, commuting scenes), etc.
• the interactive operations input by the user may include, but are not limited to: voice commands, touch operations on the display screen (such as click operations, long-press operations, double-click operations, etc.), air/hover gestures, operations on device buttons, actions, gestures, eye movements, mouth movements, movements or shaking of the device, and so on.
• the device may start to detect voice commands after receiving a wake-up word; for example, the wake-up word may be a voice wake-up word (such as "Xiaoyi Xiaoyi") or a gesture wake-up word (such as an "OK" gesture).
• for example, when using a mobile phone, if the user wants to cast the screen, the user can issue the voice command "cast screen", or tap a screen-projection button on the display screen of the mobile phone.
  • the interactive operation input by the user may also be referred to as the first operation.
  • the user status may include, for example, the location of the user, the affairs performed by the user (such as exercising, working, watching TV, etc.), and the like.
  • Events of user state change may include, for example: the user gets up, the user sleeps, the user goes out, the user moves, and so on.
• the situation between the user and the device may include, for example, the distance between the two.
  • the event that the situation between the user and the device changes may include, for example, that the user moves the device (for example, picks up a mobile phone), and the distance between the user and the device changes (for example, becomes larger or smaller).
  • the environmental state may include, for example: ambient temperature, humidity, ultraviolet intensity, air volume, ambient light, and the like.
• the event that the device obtains a notification message, or obtains upcoming schedule information.
  • the notification message obtained by the electronic device can be actively generated by the application in the device during operation, or can be sent by the server corresponding to the application in the device.
• for example, the electronic device can receive a notification message sent by a trusted organization to give notice of extreme weather (such as storms, heavy snow, etc.).
  • Schedule refers to the plan and arrangement for a certain moment or time period.
  • a schedule may also be called an event, a transaction, a schedule or other names, which are not limited here.
  • the schedule information may come from a memo, a calendar (calendar), an alarm clock (clock), a ticket booking application, an online meeting application, etc. in the electronic device.
  • the central control device may select some or all resources in the virtual aggregation device to detect a specific event. Some or all of the resources may be referred to as first resources.
  • the quantity of the first resource may be one or more.
  • the first resource may include resources from one electronic device, and may also include resources from multiple electronic devices.
• the first resource is a combinable capability, for example, an interaction-type combinable capability.
• the central control device may select some or all of the interaction-type combinable capabilities in the configured virtual aggregation device to detect a specific event.
• the central control device can select the interaction-type combinable capabilities of the virtual aggregation device arbitrarily, or select part of them according to a certain policy, to detect specific events.
• the central control device can combine one or more of the following to select appropriate interaction-type combinable capabilities to detect specific events: user status, device status, environment status, user portrait, global context, or memory.
  • the above strategies may include, for example, any one or a combination of the following:
• Strategy 1: select interaction-type combinable capabilities according to modal channels.
• a specific event may be a single modality or a combination of multiple modalities, and a given modality may have multiple collection channels; based on these channels, the central control device can select the first device and the first combinable capability.
  • a channel refers to a device or a combinable capability for collecting modalities.
  • a specific event of a voice modality may be picked up by both a far-field device and a near-field device.
  • the voice mode in a specific event can be picked up through the far-field device's voice interaction combinable capability, and the gesture mode in this specific event can be collected through the visual interaction combinable capability.
• the central control device may select some channels from all the collection channels to collect the modal information of a specific event. For example, when the distance between the user and the device is long, the far-field pickup combinable capability can be chosen to collect voice commands.
  • the central control device may select multiple channels to jointly collect the modality information.
  • This modality information may be referred to as first modality data.
  • both near-field and far-field pickups can be combined to capture voice commands.
  • the modal information collected by multiple channels can be fused to obtain more accurate and rich modal information, which facilitates the accuracy of subsequent operations.
• the central control device may select one or more combinable capabilities with higher, or the highest, activity to detect a specific event.
• the activity of a combinable capability is related to the following device information: (1) whether the device where the combinable capability resides is in an activated state; if it is activated, the activity is high; (2) the most recent activation time of the device where the combinable capability resides; the longer a device has been active, the higher its activity; (3) the frequency of input received by the device where the combinable capability resides; the more frequently a device receives input, the higher its activity.
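• One possible activity score built from these three signals (the weights, the hourly decay, and the assumption that more recent activation scores higher are all illustrative choices, not values from this embodiment):

```python
import time
from typing import Optional

def activity_score(is_active: bool, last_active_ts: float,
                   inputs_per_hour: float,
                   now: Optional[float] = None) -> float:
    """Combine the three device signals above into a single activity value."""
    now = now if now is not None else time.time()
    recency = 1.0 / (1.0 + (now - last_active_ts) / 3600.0)  # decays hourly
    return (2.0 if is_active else 0.0) + recency + 0.1 * inputs_per_hour

candidates = {
    "speaker_far_field_mic": activity_score(True, time.time() - 60, 12.0),
    "phone_near_field_mic": activity_score(False, time.time() - 7200, 3.0),
}
# The most active combinable capability is chosen to detect the event.
print(max(candidates, key=candidates.get))  # speaker_far_field_mic
```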
  • the central control device may select an interactive combinable capability in a device that is closer to the user to detect a specific event.
• the distance between the device where the combinable capability resides and the user can be judged by detecting the strength of biometric signals (such as face, voiceprint, skin conductance, heart rate, etc.).
  • the central control device may preferentially select the most frequently invoked combinable capability in the historical records to detect a specific event according to user habits.
• the user can independently select the combinable capabilities used to detect specific events, and the central control device can trigger the combinable capabilities selected by the user to detect specific events.
• the selection can be made by the user through operation on the central control device, or by voice, gesture, and the like.
• the embodiments of the present application may also use other strategies to select interaction-type combinable capabilities to detect specific events.
• for example, the central control device can also select the interaction-type combinable capabilities of a device that is closer to the central control device to detect specific events, or select the interaction-type combinable capabilities of the device that most recently interacted with the central control device to detect specific events.
  • the central control device may select a device with stronger interaction capability or more combinable interaction capabilities in the device to detect a specific event.
  • the central control device may preferentially select the interactive composable capabilities in mobile phones with strong interactive capabilities to detect specific events.
  • the device or the user may preset the priority of the device for detecting a specific event, for example, it may be preset when the device leaves the factory or be preset by the user during use.
  • the preset device priority may be stored in the cloud server, or in any one or more devices in the communication system 10 .
• according to the preset device priorities, the central control device can preferentially select the combinable capabilities in devices with higher priority to detect specific events.
  • Part or all of the interactive combinable capabilities selected by the central control device in the virtual aggregation device for detecting specific events may be referred to as the first combinable capability, and the device where the first combinable capability resides is the first device.
  • the quantity of the first device may also be one or more.
  • the central control device can determine the camera and the smart speaker as the first device, and trigger the camera to start the combinable capability of visual interaction to collect images, and trigger the smart speaker to start the combinable capability of voice interaction to collect audio.
  • the central control device may select some or all of the virtual aggregation devices capable of detecting a specific event as the first device.
  • a device configured with a microphone, a display screen, a camera, or an acceleration sensor is capable of detecting user-input interactive operations, and the central control device may select this type of device as the first device.
  • a device equipped with a camera and a speed sensor can be used to detect the user's state, and the central control device can select this type of device as the first device.
  • a device configured with a camera and a distance sensor can be used to detect the situation between the user and the device, and the central control device can select this type of device as the first device.
  • a device equipped with a temperature sensor and a humidity sensor can be used to detect an environmental state, and the central control device can select this type of device as the first device.
  • a device capable of sending and receiving messages can be used to receive notification messages
  • a device capable of adding schedules can be used to obtain schedule information
  • the central control device can select this type of device as the first device.
• because the central control device configures the virtual aggregation device in S105, the first combinable capability in the virtual aggregation device is prepared for startup in advance. Therefore, in S106, the central control device can quickly and conveniently trigger the first device to activate the first combinable capability to detect specific events. It can be seen that configuring the virtual aggregation device can improve the efficiency with which the communication system 10 performs S106, so as to provide better services for users.
• the central control device may trigger the first device to activate the first combinable capability to detect a specific event by sending a notification message (such as broadcast or multicast) over the connections between the devices.
• after each first device activates the first combinable capability to collect the corresponding data, it can analyze the data locally and send the analysis result (such as an identified event) to the central control device, so that the central control device knows whether a specific event is currently detected.
• alternatively, after one or more first devices activate the first combinable capability to collect the corresponding data, they can send the collected data to the central control device, and the central control device integrates the data collected by the multiple first combinable capabilities to analyze whether a specific event is currently detected.
• alternatively, one or more first devices may send the data they collect to the second device in subsequent S107, and the second device analyzes the user intent based on this data and splits the tasks to be executed.
  • one combinable capability can be used to collect data of a small number of modalities (for example, one modality), and multi-modal data needs to be collected by multiple combinable capabilities.
  • Different combinable capabilities typically have different sampling rates.
  • the sampling rate refers to the number of data collected by the combinable capability within a unit of time (such as one second, ten seconds, one minute, etc.).
  • the sampling rate of each combinable capability may be independently set by the electronic device, which is not specifically limited here.
  • a specific event may be a combination of multiple modes. That is, certain events may include multimodal data.
  • the central control device may determine a unified sampling rate, and trigger the first combinable capability in the first device to uniformly use the sampling rate to collect data.
• when each of the first combinable capabilities samples data at the same sampling rate, the virtual aggregation device can obtain data of multiple modalities with similar data volumes, and can more conveniently and quickly fuse the multimodal data to identify a specific event. It can be seen that having the first combinable capabilities collect data at a uniform sampling rate can ensure the integrity of the data features collected by each first combinable capability, and can save resources consumed in detecting specific events.
• the central control device may determine the unified sampling rate in any of the following ways:
• Mode 1: the central control device arbitrarily selects a sampling rate as the unified sampling rate.
  • the central control device can pre-store a uniform sampling rate.
  • the central control device may arbitrarily select a sampling rate of a combinable capability among multiple first combinable capabilities as the unified sampling rate.
• Mode 2: the central control device determines the sampling rate of the first combinable capability with the highest activity as the unified sampling rate.
  • the central control device may notify each first combinable capability to report activity information and a sampling rate, and determine the activity of each first combinable capability according to the activity information reported by each first combinable capability. Then, the central control device sends the sampling rate of the first combinable capability with the highest activity to each first combinable capability, and notifies each first combinable capability to uniformly sample according to the sampling rate.
  • the activity of the combinable capability reflects the frequency with which the user uses the combinable capability, or the frequency with which the user reads the data collected by the combinable capability. The higher the above frequency, the higher the activity of the combinable ability.
• the activity information may include one or more of the following: the usage status of the device where the combinable capability resides, the change in the amount of data collected by the combinable capability between two collections within the initial time interval, and the degree of association between the data collected by the combinable capability and the user.
  • the initial time interval may be a preset fixed system parameter, or a parameter adjusted according to a certain policy.
  • the use state of the device where the combinable capability is located may include the frequency of use, etc., and the higher the frequency of use, the higher the activity.
• the greater the change in the amount of data collected by the combinable capability between two collections within the initial time interval, the higher the activity.
• the higher the degree of association between the data collected by the combinable capability and the user, the higher the activity. For example, if the user spends more time in the living room during the day, the data collected by the combinable capabilities of living-room devices is more relevant to the user than the data collected by the combinable capabilities of bedroom devices.
  • the central control device may also use a uniform sampling rate to collect environmental information.
  • the above-mentioned uniform sampling rate may also be referred to as a first sampling rate.
  • the central control device may trigger the first combinable capability in the virtual aggregation device to uniformly use the sampling rate to collect data after S105. In this way, the impact of sampling rate on modal data fusion can be considered, and the virtual aggregation device can be equipped with an adaptive initial perception acquisition strategy.
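• Mode 2 above might look like the following sketch, where each first combinable capability reports its activity and native sampling rate, and the rate of the most active one is pushed back to all of them (the report shape and all values are assumptions for illustration):

```python
reports = {
    # capability id: (activity score, native sampling rate in Hz)
    "camera_person_detect": (0.4, 5.0),
    "far_field_mic_vad":    (0.9, 50.0),
    "mmwave_presence":      (0.6, 10.0),
}

most_active = max(reports, key=lambda cap: reports[cap][0])
unified_rate = reports[most_active][1]  # becomes the first sampling rate

for capability in reports:
    # here the central control device would notify each first capability
    print(f"set {capability} to {unified_rate} Hz")
```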
  • the central control device triggers the second device in the virtual aggregation device to analyze the user intention represented by the specific event, and determine the task to be executed corresponding to the user intention.
  • the central control device may select part or all of the resources in the virtual aggregation device to analyze the user intention represented by a specific event, and determine the task to be executed corresponding to the user intention.
  • Some or all of the resources may be referred to as third resources.
  • the third resource may include resources from one electronic device, or may include resources from multiple electronic devices.
• the third resource is a combinable capability; for example, it may be a recognition-type combinable capability.
  • the central control device may select some or all of the combinable capabilities in the configured virtual aggregation device to identify user intentions and tasks to be performed.
• the central control device can select the recognition-type combinable capabilities of the virtual aggregation device arbitrarily, or select part of them according to a certain policy, to identify user intentions and the tasks to be executed corresponding to those intentions.
• the central control device can combine one or more of the following to select appropriate recognition-type combinable capabilities to identify the user intent and the corresponding task to be executed: the user status, device status, and environment status historically detected by the virtual aggregation device, the user portrait, the global context, or memory.
  • the above strategies may include, for example, any one or a combination of the following:
• the central control device may select one or more combinable capabilities with higher, or the highest, activity to identify the user's intention and determine the task to be executed corresponding to the user's intention.
• for the method of determining the activity of combinable capabilities, refer to the description above.
  • the central control device may select a recognition-type combinable capability in a device that is closer to the user to identify the user's intention and determine the task to be executed corresponding to the user's intention. For the method of judging the distance between the device and the user, refer to the previous section.
  • the central control device may preferentially select the recognition-type combinable capability in the first device to identify the user's intention and determine the task to be executed corresponding to the user's intention.
  • the central control device may, according to user habits, preferentially select the most frequently invoked combinable capability in the historical records to identify the user's intention and determine the task to be executed corresponding to the user's intention.
  • the central control device may preferentially select the recognition-type combinable capability in the device where the user's attention is focused to identify the user's intention and determine the task to be executed corresponding to the user's intention.
• the central control device can collect specific events within a specific time range together with associated data about the recognition-type combinable capabilities that were activated, and, based on machine learning/deep learning methods, train a model that predicts from the former input the series of combinable capabilities the user may need to activate. Then, based on the model, specific events are used as input to obtain the combinable capabilities that need to be activated.
• this method can be implemented with reference to the ranking techniques that are widely used in recommender systems. Multimodal input can also be considered as an extension of specific events.
  • the user can independently select the combinable capabilities used to identify user intentions and determine the tasks to be executed corresponding to the user intentions, and the central control device can trigger the combinable capabilities selected by the user to identify user intentions and determine the tasks to be executed corresponding to the user intentions.
• there is no limit on how the user selects the combinable capabilities used to identify the user's intention and determine the task to be executed corresponding to the user's intention; for example, the selection can be made through operation on the central control device, or by voice, gesture, and so on.
• the central control device may select devices with stronger recognition capabilities, or with more recognition-type combinable capabilities, to identify user intentions and the tasks to be executed.
• for example, the central control device can preferentially select the recognition-type combinable capabilities in mobile phones with strong processing capabilities to identify user intentions and the tasks to be executed.
  • the device or the user may preset the priority of the device used to identify the user's intention and the task to be performed, for example, it may be preset when the device leaves the factory or by the user during use.
  • the preset device priority may be stored in the cloud server, or in any one or more devices in the communication system 10 .
• according to the preset device priorities, the central control device may preferentially select the combinable capabilities in devices with higher priority to identify user intentions and the tasks to be executed.
• the embodiments of the present application may also use other strategies to select recognition-type combinable capabilities to identify the user intent and determine the to-be-executed task corresponding to the user intent.
• for example, the central control device can also select the recognition-type combinable capabilities of a device that is closer to the central control device to identify the user's intention and determine the task to be executed corresponding to the user's intention, or select the recognition-type combinable capabilities of the device that most recently interacted with the central control device to do the same.
• some or all of the combinable capabilities selected by the central control device in the virtual aggregation device for analyzing the user intent represented by a specific event and determining the tasks to be executed corresponding to the user intent may be referred to as third combinable capabilities; the physical device where a third combinable capability resides is the second device.
  • the central control device may select some or all of the devices capable of identifying user intentions and tasks to be executed corresponding to the user intentions among the virtual aggregated devices as the second device.
  • the number of the third combinable capabilities may be one or more.
  • the quantity of the second device may also be one or more.
• for example, the central control device can determine the smart screen and the mobile phone as second devices, and trigger the smart screen and the mobile phone to start their processors to analyze the user intention represented by a specific event and determine the task to be executed corresponding to the user intention.
• because the central control device configures the virtual aggregation device in S105, the third combinable capability in the virtual aggregation device is prepared for startup in advance. Therefore, in S107, the central control device can quickly and conveniently trigger the second device to activate the third combinable capability to analyze the user intention represented by a specific event and determine the task to be executed corresponding to the user intention. It can be seen that configuring the virtual aggregation device can improve the efficiency with which the communication system 10 performs S107, so as to provide better services for users.
• the central control device can trigger the second device to start the third combinable capability by sending a notification message (such as broadcast or multicast) over the connections between the devices, so as to analyze the user intention represented by a specific event and determine the task to be executed corresponding to the user intention.
• the first device may notify the second device of the specific event, so that the second device can start the third combinable capability to analyze the user intention represented by the specific event and determine the task to be executed corresponding to the user intention.
  • the central control device may notify the first device after determining the second device, so that the first device notifies the second device of a specific event.
• alternatively, the central control device in S106 may directly trigger or notify the second device to analyze the user intention represented by the specific event and determine the task to be executed corresponding to the user intention.
• if the first device sends the data it collects to the second device, the second device may use this data to analyze the user intent and split the tasks to be executed.
  • the second device can analyze the user intention represented by a specific event in combination with one or more of the following: user state, device state, environment state, user portrait, global context detected in the history of the first device, or memory.
  • the second device may use an intention recognition algorithm, a neural network algorithm, and the like to analyze the user intention represented by a specific event.
  • the same specific event recognized by the second device may represent different user intentions.
  • User intent refers to the user's purpose or need.
  • the correspondence between specific events and user intentions can be set in advance, or can be learned by the smart assistant on the virtual aggregation device during operation.
• for example, the second device can analyze through voice recognition that the user's intention is: check the situation in the living room.
• as another example, if the specific event is the voice command "turn on the light" input by the user, the second device can analyze, through voice recognition together with the user's current location (such as the living room), that the user's intention is: turn on the light in the living room.
  • the user intention recognized by the second device may be described in the form of structured data.
  • Structured data refers to data expressed with some kind of structural logic (such as a two-dimensional table).
  • the user intent may be "operation: turn on the light; location: living room.”
  • the user intent may be "operation: play music; content: “Qilixiang””.
• after the second device recognizes the user intention represented by the specific event, it may determine the task to be executed corresponding to the user intention. A device executing that task can satisfy the user's intention, that is, satisfy the user's demand.
  • the tasks to be performed corresponding to the same user intention determined by the second device may be different.
  • the process of the second device determining the to-be-executed task corresponding to the user's intention can be regarded as a process of splitting the user's intention into to-be-executed tasks.
• the number of to-be-executed tasks obtained by splitting a user intention may be one or more.
  • the multiple tasks may be executed in parallel at the same time, or may have a certain logical execution relationship.
  • the logic execution relationship may include, for example, sequence relationship, cycle relationship, conditional relationship, Boolean logic, and the like.
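• A toy scheduler honoring such a sequence relationship between split tasks (the task names and dependencies are assumptions; parallel tasks simply share an empty dependency list):

```python
def execution_order(tasks: dict) -> list:
    """Order tasks so each runs only after the tasks it depends on
    (assumes the dependency graph is acyclic)."""
    done, order = set(), []
    while len(order) < len(tasks):
        for name, deps in tasks.items():
            if name not in done and all(d in done for d in deps):
                done.add(name)
                order.append(name)
    return order

tasks = {
    "show_living_room_video": [],
    "play_living_room_audio": [],                         # parallel with video
    "dim_screen_brightness": ["show_living_room_video"],  # sequence relation
}
print(execution_order(tasks))
```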
  • User intents can include multiple modalities or types.
  • the user intention may include: an intention to play a visual image, an intention to play an audio, an intention to turn on a light, an intention to vibrate, an intention of a mobile device, and the like.
  • the user intention of "checking the situation of the living room" includes two modes: viewing real-time images of the living room and listening to real-time audio of the living room.
  • a task is one or more actions performed by a device.
  • Tasks which are operations performed by devices, can also be divided into modalities or service types.
  • the tasks may include: visual image playing tasks, audio playing tasks, vibration tasks, flash light tasks, moving tasks and so on.
  • the second device may split the user's intention into one or more tasks to be performed in units of modalities.
  • the second device may combine one or more of the following to split the user's intention into tasks to be performed in units of modalities: user state, device state, environment state detected by the first device history, user portrait, global context, or memory.
  • the second device may select an appropriate splitting method to split the user intent according to the modality or type of the user intent.
  • methods for splitting user intent may include the following:
• Method 1: splitting the user intent based on the activation information of historical combinable capabilities.
  • the second device may split the user intention according to the category of the combinable capabilities that have been activated according to the historical user intention. That is to say, the second device may search for results of historical splitting of user intentions, and refer to the historical splitting results to split the currently recognized user intents.
• the second device can collect user intentions, the user states, device states, and environment states collected by the first device, and associated data about the combinable capability categories the user actually chose to start, and, based on machine/deep learning methods, train a model that infers the combinable capability categories from the user intent and the user/device/environment state given as input, finally splitting the intent into multiple tasks to be executed on this basis.
  • the second device may preset user intentions and corresponding tasks to be executed in different scenarios. After the scene and user intention are recognized, the scene and user intention can be used as input, and after the fixed logic processing of the rules, the corresponding one or more tasks to be executed are output.
• the second device may split the tasks to be executed for the user's intent into deterministic tasks and probabilistic tasks.
• the former means that the communication system identifies and splits out tasks to be executed based on a clear user intention; the latter means tasks that the user may need the device to perform when there is no clear user intention.
• probabilistic tasks generally correspond to ambiguous user intent. Since a probabilistic task has a corresponding confidence for each candidate task to be executed, the candidates can be further filtered according to rules; for example, only the tasks whose confidence meets a certain threshold are determined as the tasks to be executed corresponding to the user intention.
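• The threshold rule for probabilistic tasks can be sketched as a simple filter (the 0.6 threshold, task names, and confidences are assumptions for illustration):

```python
CONFIDENCE_THRESHOLD = 0.6  # assumed rule: keep tasks at or above this

candidate_tasks = [
    ("turn_on_living_room_light", 0.92),
    ("play_relaxing_music", 0.55),   # dropped: below threshold
    ("close_curtains", 0.71),
]

to_execute = [name for name, conf in candidate_tasks
              if conf >= CONFIDENCE_THRESHOLD]
print(to_execute)  # ['turn_on_living_room_light', 'close_curtains']
```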
  • the second device analyzes and obtains the user intention represented by the specific event, and after determining the task to be performed corresponding to the user intention, may also send the identification result to the central control device. That is to say, the second device may send the user intention represented by the analyzed specific event, and/or the task to be executed corresponding to the user intention, to the central control device.
• in this way, the virtual aggregation device can use user intentions to perform fine-grained control and execution of cross-device input/output, while considering the semantic information of the perceived data and the explicit and implicit intentions in the environment, further enhancing the advantages of single-device functions and personalized output.
  • the actual capabilities of the communication system can be flexibly adapted, and the interactive experience and scene adaptability can be improved.
• in step S109, the central control device reconfigures the virtual aggregation device.
• after the central control device initializes the virtual aggregation device in S105, it can also continuously detect the status of users, devices, the environment, and so on through the existing virtual aggregation device, analyze the potential service demands of users according to the detected information, and adaptively adjust the virtual aggregation device, that is, reconfigure the virtual aggregation device.
  • the currently existing virtual aggregation device may be an initialized virtual aggregation device, or a virtual aggregation device after multiple reconfigurations. That is to say, S109 can be executed multiple times.
  • the central control device may reconfigure the virtual aggregation device after the virtual aggregation device detects a state change event.
  • the state change event may also be referred to as a second event.
  • the reconfiguration of the virtual aggregation device by the central control device may include the following steps:
• Step 1: the central control device triggers the first device to detect a state change event.
  • state change events may include events that affect the quality of service provided by communication system 10 .
  • the quality of service provided by the communication system 10 may include, for example, user satisfaction, degree of matching with user habits, accuracy of human-computer interaction recognition, response speed, and the like.
  • a state change event can be a modality or a combination of modalities.
  • the modality may include, for example, text, voice, vision, action, situation (such as the location of the user, the distance between the user and the device), scenes (such as office scenes, home scenes, commuting scenes), etc.
  • State change events can include the following types:
• Type 1: interactive operations input by the user.
• the interactive operations input by the user may include, but are not limited to: voice commands, touch operations on the display screen (such as click operations, long-press operations, double-click operations, etc.), air/hover gestures, operations on device buttons, body actions, gestures, eye movements, mouth movements, moving or shaking the device, and so on.
• For example, when using a mobile phone, if the user wants to cast the screen, the user can speak the voice command "cast screen" or tap a screen-projection button on the phone's display.
• Type 2: events in which the user status changes.
  • the user status may include, for example, the location of the user, the affairs performed by the user (such as exercising, working, watching TV, etc.), and the like.
  • Events of user state change may include, for example: user location movement (for example, moving 0.5 meters), user getting up, user sleeping, user going out, user motion, and so on.
• Type 3: events in which the device status changes.
  • the device status may include, for example, device power, power consumption, location, and the like. Events of device status changes may include battery power below a threshold, location moving, power consumption above a threshold, new devices joining or going online to the communication system 10, devices exiting or going offline in the communication system 10, and so on.
• Type 4: events in which the situation between the user and the device changes.
• the situation between the user and the device may include, for example, the distance between the two.
  • the event that the situation between the user and the device changes may include, for example, that the user moves the device (for example, picks up a mobile phone), and the distance between the user and the device changes (for example, becomes larger or smaller).
• Type 5: events in which the state of the environment changes.
  • the environmental state may include, for example: ambient temperature, humidity, ultraviolet intensity, air volume, ambient light, and the like.
  • An event of a change in environmental state may include, for example, a temperature greater than a threshold (eg, 30 degrees Celsius).
• Type 6: events in which the device obtains a notification message or upcoming schedule information.
  • the notification message obtained by the electronic device may be actively generated by an application in the device during operation, or may be sent by a server corresponding to the application in the device, or may be sent by other devices.
  • the electronic device may receive a notification message sent by a trusted institution for notifying extreme weather (eg, storm, heavy snow, etc.).
  • Schedule refers to the plan and arrangement for a certain moment or time period.
• a schedule may also be called an event, a transaction, an agenda, or other names, which is not limited here.
  • the schedule information may come from a memo, a calendar (calendar), an alarm clock (clock), a ticket booking application, an online meeting application, etc. in the electronic device.
• the central control device can select, from the currently configured virtual aggregation device, combinable capabilities that support detecting the state change event; that is, it can select some or all of the interactive combinable capabilities of the current virtual aggregation device to detect the state change event.
• the central control device can select these combinable capabilities from the interactive combinable capabilities of the current virtual aggregation device either arbitrarily or according to a certain policy.
• the policy may be, for example: select combinable capabilities in devices closer to the central control device, select combinable capabilities that have recently interacted with the central control device, activity priority, near-user priority, user-habit priority, and so on.
  • the central control device may select some interactive combinable capabilities in the virtual aggregation device according to the current time or scene to detect the state change event. For example, scenarios may include day mode, night mode, movie viewing mode, sports mode, etc., and the central control device may select different interactive combinable capabilities in the virtual aggregation device in different scenarios to detect state change events.
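• For illustration only, the following minimal Python sketch shows one possible policy-based selection of detection capabilities (near-user priority combined with activity priority, plus an assumed night-mode rule that excludes cameras); all names and the scene rule are hypothetical.

```python
# Hypothetical sketch of strategy-based selection of interactive combinable
# capabilities for detecting state change events.
from dataclasses import dataclass

@dataclass
class Capability:
    device: str
    kind: str                  # e.g. "microphone", "camera", "radar"
    distance_to_user_m: float
    activity: float            # activity level; higher = more recently active

def pick_detectors(caps, scene: str, max_n: int = 2):
    """Near-user priority, then activity priority; in the assumed night mode,
    non-visual detectors are preferred so cameras stay off."""
    pool = [c for c in caps if not (scene == "night" and c.kind == "camera")]
    pool.sort(key=lambda c: (c.distance_to_user_m, -c.activity))
    return pool[:max_n]

caps = [
    Capability("smart_screen", "camera", 3.0, 0.2),
    Capability("speaker", "microphone", 1.5, 0.8),
    Capability("watch", "radar", 0.1, 0.5),
]
print([c.device for c in pick_detectors(caps, scene="night")])  # ['watch', 'speaker']
```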
• some or all of the interactive combinable capabilities selected by the central control device for detecting state change events may be the same as the interactive combinable capabilities used to detect the specific event in S106; that is, the interactive combinable capability that detects the state change event may be the first combinable capability.
• correspondingly, the device that detects the state change event is the first device.
• for the policy of selecting the first combinable capability and the first device, refer to the relevant description in S106 above.
  • the central control device can trigger the first device to activate the first combinable capability to detect the state change event by sending a notification message (such as broadcast, multicast) through the connection between the devices.
• after each first device activates the first combinable capability to collect corresponding data, it can analyze the data locally and send the analysis result (such as an identified event) to the central control device, so that the central control device knows whether a state change event is currently detected.
• alternatively, after one or more first devices activate the first combinable capability to collect corresponding data, they can send the collected data to the central control device, and the central control device integrates the data collected by the multiple first combinable capabilities to analyze whether a state change event is currently detected.
• alternatively, after one or more first devices activate the first combinable capability to collect corresponding data, they may send the collected data to the second device in subsequent S107, and the second device analyzes the user's service needs based on the data.
• Step 2: the central control device triggers the second device to analyze the user's service requirements according to the detected state change event.
  • the central control device can analyze or identify the service requirement of the user according to the state change event. In other words, the central control device can predict the user's service demand according to the status change event.
  • the user service requirements identified by the central control device based on the state change event can be divided into deterministic requirements and probabilistic requirements.
• Deterministic requirements are identified, exact services that need to be provided to the user, usually taking the user's explicit instructions (such as the user's operation on a device's human-machine interface, a clear voice instruction from the user, or a gesture that conforms to a specific device's human-computer interaction definition) as input.
• A probabilistic requirement is an identified potential service demand of the user; that is, the user has a tendency to request the service, but it cannot be determined that the service needs to be provided immediately.
• Probabilistic requirements generally correspond to non-explicit user behaviors (such as location changes, sleep state changes, etc.) or state changes of the environment itself (such as temperature changes, etc.). Since probabilistic requirements often have multiple possible outputs, each with a corresponding confidence level, further selection can be made according to rules; for example, one or more outputs that meet a certain threshold can be selected as alternatives.
• state change events can be of various types. Therefore, in a specific implementation, the original virtual aggregation device may detect multiple state change events, and the central control device may analyze the user's service needs based on a single state change event, or comprehensively analyze them based on multiple state change events.
  • the ways in which the central control device identifies or analyzes user service needs based on state change events may include the following:
• Mode 1: determining the user's service needs based on fixed rules.
  • a fixed judgment and recognition rule is preset, and after the state change event is processed by the fixed logic of the rule, the judgment result is output, that is, the user service demand is output. For example, there is a high probability that a kitchen appliance control event will occur when the user is in the kitchen. Therefore, if it is detected that the user steps into the kitchen, it can be determined that the user has a need for kitchen appliance control. For another example, the user will most likely turn on the air conditioner when the temperature is higher than 30 degrees Celsius, so if it is detected that the temperature is higher than 30 degrees Celsius, it can be determined that the user has a need to start the air conditioner.
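• For illustration only, the following minimal Python sketch implements Mode 1 with the two rules from this paragraph (user in the kitchen → kitchen appliance control; temperature above 30 degrees Celsius → start the air conditioner); the event format and rule table are hypothetical.

```python
# Hypothetical sketch of Mode 1: fixed rules mapping state change events to
# user service needs, using the two examples from the text.
RULES = [
    # (predicate over the event, resulting service need)
    (lambda e: e.get("user_location") == "kitchen", "kitchen_appliance_control"),
    (lambda e: e.get("temperature_c", 0) > 30, "start_air_conditioner"),
]

def infer_service_needs(event: dict) -> list[str]:
    """Run the event through the fixed rule logic and output service needs."""
    return [need for predicate, need in RULES if predicate(event)]

print(infer_service_needs({"user_location": "kitchen"}))  # ['kitchen_appliance_control']
print(infer_service_needs({"temperature_c": 31}))         # ['start_air_conditioner']
```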
• Mode 2: determining the user's service needs based on a knowledge graph.
• a knowledge graph is a knowledge base in which data is integrated through a graph-structured data model or topology. Knowledge graphs are often used to store entities that are interrelated. In the embodiments of this application, the knowledge graph stores the interrelated data structure linking different state change events with the user's service requirements.
• the knowledge graph can be constructed based on past interaction information between users and the communication system 10. In some other embodiments, the knowledge graph can also be designed manually, or obtained from statistics over a large population of users. For example, in the above example of starting the air conditioner, the need to start the air conditioner at "30°C" can initially be defined manually or statistically; the graph content can then be updated as interaction data accumulates.
  • Mode 2 can flexibly expand the rules and scenarios for determining service requirements.
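• For illustration only, the following minimal Python sketch shows a tiny knowledge-graph-like structure in which edges linking state change events to service needs are seeded manually and then updated from user feedback; the structure and update rule are hypothetical assumptions.

```python
# Hypothetical sketch of Mode 2: a weighted event -> service-need graph whose
# edges adapt as interaction data accumulates.
from collections import defaultdict

class ServiceGraph:
    def __init__(self):
        self.edges = defaultdict(dict)  # event -> {service_need: weight}

    def add(self, event: str, need: str, weight: float = 1.0):
        self.edges[event][need] = weight

    def update(self, event: str, need: str, accepted: bool, step: float = 0.1):
        """Strengthen or weaken an edge based on whether the user accepted
        the proposed service, so the graph adapts to actual habits."""
        w = self.edges[event].get(need, 0.5)
        self.edges[event][need] = min(1.0, max(0.0, w + (step if accepted else -step)))

    def query(self, event: str, threshold: float = 0.5):
        return [n for n, w in self.edges[event].items() if w >= threshold]

g = ServiceGraph()
g.add("temperature_above_30C", "start_air_conditioner", 0.6)  # manually seeded
g.update("temperature_above_30C", "start_air_conditioner", accepted=True)
print(g.query("temperature_above_30C"))  # ['start_air_conditioner']
```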
• Mode 3: collecting data that associates state change events in actual scenarios with users' actual service needs, and training, based on machine learning methods, a model that can deduce the latter from the former, as the implementation mechanism for judging service needs.
• Mode 3 can be implemented by referring to the semantic understanding techniques widely used in voice assistants (such as recognizing user intent from the user's spoken instructions), and can be extended to multimodal recognition (such as taking multiple state change events as input).
• the central control device can also identify the user's service needs in combination with the context, user portrait, memory, user habits, the current configuration status of the virtual aggregation device, and the like, so that the user's service needs can be identified more accurately and effectively.
  • the configuration state of the virtual aggregation device is formed by configuring the virtual aggregation device.
  • the "current configuration status of the virtual aggregation device" can to some extent represent the service requirements of users in the past, so that it can be used to infer and identify the service requirements of users in the current moment (such as using Markov process modeling).
• the first device may notify the second device of the state change event, so that the second device can start the third combinable capability to analyze the service requirements corresponding to the state change event.
  • the central control device may notify the first device after determining the second device, so that the first device notifies the second device of the state change event.
  • the central control device may directly trigger or notify the second device to analyze the service requirements corresponding to the state change event .
• alternatively, if the first device sends the collected data to the second device, the second device may analyze the corresponding service requirements based on the data.
• Step 3: the central control device triggers the second device to determine a service plan based on the user's service requirements.
  • the central control device can use a certain strategy to determine the service plan to be prepared based on the user's service requirements (deterministic or probabilistic).
  • a service plan includes the preparations required to provide services to users. After the corresponding preparations are made, the central control device can directly execute the processes and functions involved in the service when it is confirmed that the service needs to be provided to the user.
  • the determined service plan may contain multiple pieces of information to indicate the subsequent adjustment and adaptation behavior of the communication system, for example as follows:
• Required interaction methods: such as audio input/output, video input/output, position detection, gesture detection, etc.
• the plan may also include other attributes of the required combinable capabilities, such as location (e.g., in a designated room or designated area) and performance (e.g., far-field/near-field sound pickup).
• Capability combination strategy: for example, same-device priority (required resources and capabilities should come from a single device as much as possible), near-user priority (capabilities that interact with the user should be as close to the user as possible), performance-index priority (preferentially select capabilities that can meet all interactive performance requirements; for example, far-field audio playback is generally preferred when playing music), and other strategies may be used to determine the combinable capabilities to aggregate.
  • the currently supported simple interactive services (such as weather, alarm clock, etc.) are directly processed by the current interactive device according to the original process, which can maintain the user's simple and direct experience.
• For a service involving multi-device interaction, a combination scheme of multi-device combinable capabilities is output according to the service. Possible methods include pre-configuration in the corresponding service item, knowledge-based reasoning, and so on.
• there may be alternatives in the service plan output by the second device. For example, for a certain service, image display is preferred, but when the image display capability is not available, voice can be used instead; or a service prefers far-field sound pickup, but near-field pickup can substitute when that capability is not available; or a service prefers multimodal output, but when the capability is not available, only the necessary modalities are used and some optional modalities are given up.
  • the above possible combinations can also be formed into multiple alternative service solutions for processing in subsequent steps.
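• For illustration only, the following minimal Python sketch resolves such a service plan by taking the first available alternative for each required interaction (image display, else voice; far-field pickup, else near-field); the plan format and capability names are hypothetical.

```python
# Hypothetical sketch: a service plan listing preferred interaction methods
# with fallbacks, as in the image-display-else-voice example above.
SERVICE_PLAN = {
    "present_result": ["image_display", "voice_output"],       # preferred first
    "capture_speech": ["far_field_pickup", "near_field_pickup"],
}

def resolve_plan(plan: dict, available: set) -> dict:
    """Pick the first available alternative for each required interaction;
    optional modalities that cannot be satisfied are simply dropped."""
    resolved = {}
    for need, alternatives in plan.items():
        for alt in alternatives:
            if alt in available:
                resolved[need] = alt
                break
    return resolved

available = {"voice_output", "near_field_pickup"}
print(resolve_plan(SERVICE_PLAN, available))
# {'present_result': 'voice_output', 'capture_speech': 'near_field_pickup'}
```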
• Step 4: the central control device requests to organize (aggregate) the combinable capabilities of the virtual aggregation device.
  • the central control device may request aggregation of combinable capabilities corresponding to the service plan based on the service plan.
  • the implementation plan may include any one or more of the following:
• Screening by user/environment status: screen the required combinable capabilities by physical location, orientation, and other factors that affect the interaction effect, such as living room, bedroom, etc. For example, if the user is in the bedroom, combinable capabilities located in the bedroom may be selected.
• Screening by performance requirements: screen combinable capabilities by performance requirements such as far-field/near-field sound pickup, public large screen or private small screen, device mobility, etc. For example, if there are many people in the current environment, a private small screen can be chosen to protect user privacy.
• the central control device can take the actually available combinable capabilities of the current virtual aggregation device into account to exclude infeasible solutions, combine these with the user's pre-configured strategy, finally select a solution, and submit an aggregation application to the communication system 10.
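• For illustration only, the following minimal Python sketch shows the screening described above: filtering combinable capabilities by kind, location, and performance attributes before an aggregation application is submitted; the attribute names are hypothetical.

```python
# Hypothetical sketch of Step 4 screening by location and performance.
from dataclasses import dataclass

@dataclass
class Cap:
    device: str
    kind: str         # e.g. "display", "pickup"
    location: str     # e.g. "bedroom", "living_room"
    attrs: frozenset  # e.g. {"private_small_screen"} or {"far_field"}

def screen(caps, kind, location=None, required_attrs=frozenset()):
    """Keep only capabilities matching the kind, location, and attributes."""
    return [c for c in caps
            if c.kind == kind
            and (location is None or c.location == location)
            and required_attrs <= c.attrs]

caps = [
    Cap("phone", "display", "bedroom", frozenset({"private_small_screen"})),
    Cap("smart_screen", "display", "living_room", frozenset({"public_large_screen"})),
]
# The user is in the bedroom and other people are present, so require a private screen:
print([c.device for c in screen(caps, "display", "bedroom",
                                frozenset({"private_small_screen"}))])  # ['phone']
```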
  • the central control device may directly request aggregation of the corresponding composable capabilities based on the service requirements analyzed in step 2.
• after the central control device obtains the user's service requirements from the analysis of state change events, it does not need to perform a separate step of formulating a service plan; instead, it directly derives the configuration adjustment target specification of the virtual aggregation device from the user's service requirements, thereby supporting the subsequent configuration adjustment.
• for example, the central control device can directly screen out the combinable capabilities corresponding to the service requirements by combining the actual set of combinable capabilities of the communication system 10 with the attributes (performance, location, etc.) of those combinable capabilities.
  • the central control device may also implement the aforementioned screening process based on a certain fixed or configurable capability combination policy.
  • Analyzing the configuration and adjustment target specifications of virtual aggregation devices directly from user service requirements can simplify the step of formulating service plans and facilitate implementation.
• this solution reduces the processing-capability requirements on the central control device, and can be widely applied to central control devices with low-performance configurations.
• it is suitable when the configuration of devices in the environment is relatively simple (for example, the number of devices is small), the business scenario is relatively fixed (such as an office), and high flexibility in intelligent device collaboration is not required; it can meet user requirements while implementing reconfiguration of the virtual aggregation device quickly and easily.
  • the central control device can make comprehensive decisions based on user configuration preferences, human-computer interaction or user operation history, and status information such as users, equipment, and the environment, and select appropriate combinable capabilities.
• Step 5: the central control device aggregates the combinable capabilities corresponding to the service plan to reconfigure the virtual aggregation device.
  • the central control device can further reconfigure the virtual aggregation device based on the applied combinable capabilities and the configuration status of the current virtual aggregation device.
  • the reconfiguration of the virtual aggregate device may include: changing the configuration parameters of the composable capabilities in the current virtual aggregate device, reselecting or updating the composable capabilities constituting the virtual aggregate device, and so on. Updating the composable capabilities of the current virtual aggregation device may include: adding new combinable capabilities, and releasing the original combinable capabilities when they are no longer needed.
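• For illustration only, the following minimal Python sketch shows such a reconfiguration as a diff between the current and target sets of combinable capabilities, adding new ones and releasing those no longer needed; the print calls stand in for the operating-system interfaces, whose actual form the text leaves open.

```python
# Hypothetical sketch of Step 5: reconfigure the virtual aggregation device by
# diffing the current capability set against the target set.
def reconfigure(current: set, target: set) -> set:
    to_add = target - current
    to_release = current - target
    for cap in to_add:
        print(f"activate {cap}")   # stand-in for an OS-level activate call
    for cap in to_release:
        print(f"release {cap}")    # stand-in for an OS-level release call
    return target

current = {"watch.sleep_detection", "phone.schedule_access"}
target = {"phone.schedule_access", "radar.indoor_location", "screen.display"}
reconfigure(current, target)
```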
• the reconfiguration of the virtual aggregation device in Step 5 is mainly completed by the smart assistant application calling specific interfaces of each device's operating system. For example, it can be completed by means of distributed technology specific to the operating system, which is not limited here.
  • the central control device can dynamically reconfigure the virtual aggregation device after detecting a specific event, and can also dynamically reconfigure the virtual aggregation device after detecting other state change events.
• the combinable capabilities that make up the virtual aggregation device can include the sound pickup provided by these devices (headphones and mobile phones are near-field; speakers and smart screens are far-field), sound playback (headphones and mobile phones are near-field; speakers and smart screens are far-field), display (smart screen and mobile phone), shooting (smart screen and mobile phone), the software services and capabilities provided in these devices, and other capabilities of these devices.
• the smart assistant can analyze and determine that the virtual aggregation device is currently required to have video conferencing, camera, display, sound pickup, and sound playback capabilities. Therefore, the central control device can form the following combinable capabilities into a virtual aggregation device:
• Camera: the camera of the smart screen, which has a fixed position and a wide angle, suitable for video conferencing scenarios;
• Display: the screen of the smart screen, which has a fixed position and large size, suitable for video conferencing scenarios;
• Sound pickup: the speaker, which is equipped with a microphone array, has a better pickup effect, and can provide spatial effects;
• Sound playback: earphones, which avoid disturbing others in the middle of the night.
  • the central control device can respectively configure the combinable capabilities of the above-mentioned devices as input components of the video conferencing App in the mobile phone, and start the video conferencing App on the mobile phone. In this way, for the App, it is exactly the same as simply running the function on a single mobile phone, but the service has actually been provided in a way that is more suitable for the scenario and has a higher experience.
  • the central control device learns from information such as operation history and user portraits that users often go to the living room to check their schedules on the large screen after waking up in the morning.
  • the mobile phone will generally remain as a part of the virtual aggregation device.
  • the sleep detection capability of the smart watch worn by the user is configured in the virtual aggregation device to detect the user's sleep state.
  • Other devices are dormant with no associated execution or detection tasks.
  • the smart assistant can perform the following operations: (1) determine the user's service needs: the expected user will go to the living room and broadcast the schedule on a large screen; (2) determine the service plan as : Detect the user's location; (3) Then perform dynamic configuration: start the combinable capabilities of indoor location detection (such as millimeter-wave radar, camera, etc.), that is, configure and activate these combinable capabilities to become a part of the virtual aggregation device.
• the smart assistant can perform the following operations: (1) determine the user's service demand: browse the schedule in the living room; (2) determine the service plan: access to schedule information is required (already provided by the phone), together with a capability to present the schedule; (3) then perform dynamic configuration: according to the user's preference (for example, the user prefers the large screen in the living room for display), the display capability of the large screen (one of its combinable capabilities) is configured as a part of the virtual aggregation device; that is, the virtual aggregation device has been prepared in advance to give the user a large-screen display.
  • the smart assistant can perform the following operations: (1) determine the service demand of the user: broadcast schedule; (2) determine the service solution: use the display method preferred by the user (smart screen) broadcast schedule information (from mobile phone); (3) Since the composable capability supporting this service solution has been configured as a virtual aggregation device at this time, it can respond quickly (without temporarily performing related preparations), using this Capabilities can be combined to perform tasks.
  • the virtual aggregation device detects a total of three state change events: the state change event of the user waking up, the state change event of the user walking to the living room, and the voice command "broadcast schedule".
  • the state change event of the user waking up and the state change event of the user walking to the living room trigger the reconfiguration of the virtual aggregation device.
  • the voice command "broadcast schedule" triggers the response of the virtual aggregation device as a specific event.
  • the central control device can configure the composable capabilities supporting the task to be executed as a virtual aggregation device.
  • the central control device may also trigger some devices in the communication system 10 to prompt the user that the virtual aggregation device has been reconfigured.
  • the prompt method is not limited.
  • the central control device may trigger an electronic device (such as a mobile phone) to display prompt information on the user interface 63 to remind the user that the virtual aggregation device has been reconfigured.
  • the user can also click on the control 635 to view the composable capabilities contained in the currently reconfigured virtual aggregation device.
  • the central control device triggers the third device in the virtual aggregation device to execute a task to be executed that satisfies the user's intention.
  • the central control device may select part or all of the resources in the virtual aggregation device to execute tasks to be executed that meet the user's intention.
  • Some or all of the resources may be referred to as second resources.
  • the second resource may include resources from one electronic device, or may include resources from multiple electronic devices.
  • the quantity of the second resource can be one or more.
  • the second resource is a composable capability, for example, it may be a service class composable capability.
  • the central control device may select some or all of the service class composable capabilities in the configured virtual aggregation device to perform the above tasks to be performed. That is to say, the central control device can trigger the second device to match the task to be executed to an appropriate combinable capability, and then trigger the corresponding device where the combinable capability is located to execute the task to be executed.
  • the central control device can arbitrarily select or select part of the service class combinable capabilities of the virtual aggregation device according to a certain strategy to perform the above tasks to be executed.
• the central control device may combine one or more of the following to select an appropriate service-class combinable capability to perform the above tasks to be executed: the user status, device status, and environment status detected by the communication system 10, the user portrait, the global context, history, or memory.
  • the above strategies may include, for example, any one or a combination of the following:
  • the central control device may select one or more combinable capabilities with higher or highest activity levels to execute the above tasks to be executed.
• for the method of determining the activity level of combinable capabilities, refer to the description above. In this way, service-class combinable capabilities with high activity can be selected to execute the above tasks to be executed.
  • the central control device may select a service-type composable capability in a device that is closer to the user to execute the above-mentioned tasks to be executed.
  • the central control device may preferentially select the service class composable capability in the first device to execute the above-mentioned tasks to be executed.
• the central control device may select a service-class combinable capability in the device that collected the more critical information for identifying the specific event to execute the above tasks to be executed. For example, if the devices used to detect voice commands in S106 include mobile phone A on the coffee table far away from the user and mobile phone B held in the user's hand, the central control device can select the phone that collected the voice with the higher sound intensity to respond to the user's voice command.
  • the central control device may, according to user habits, preferentially select the most frequently invoked combinable capabilities in history to execute the above-mentioned tasks to be executed. In this way, the composable capabilities of the service class that users are accustomed to using can be selected to perform the above tasks to be performed.
• the central control device can collect, within a specific time frame, data associating tasks to be executed with service-class combinable capabilities, and train, based on machine learning/deep learning methods, a model that predicts from the former the series of combinable capabilities the user may need to activate. Afterwards, based on the model and taking the tasks to be executed as input, the combinable capabilities to be activated are obtained. This method can be implemented by referring to the ranking techniques widely used in recommender systems; at the same time, multimodal input needs to be considered as an extension of the tasks to be executed.
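• For illustration only, the following minimal Python sketch ranks service-class combinable capabilities for a task by how often they served it in history, as a simple co-occurrence stand-in for the trained ranking model described above; the history data and names are hypothetical.

```python
# Hypothetical sketch: rank candidate capabilities for a task from historical
# (task, capability) co-occurrence counts.
from collections import Counter, defaultdict

history = [  # (task, capability the user ended up using) — illustrative data
    ("broadcast_schedule", "smart_screen.display"),
    ("broadcast_schedule", "smart_screen.display"),
    ("broadcast_schedule", "phone.display"),
]

counts = defaultdict(Counter)
for task, cap in history:
    counts[task][cap] += 1

def rank_capabilities(task: str) -> list[str]:
    """Return candidate capabilities ordered by how often they served the task."""
    return [cap for cap, _ in counts[task].most_common()]

print(rank_capabilities("broadcast_schedule"))
# ['smart_screen.display', 'phone.display']
```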
  • the user can independently select the composable capabilities for performing the above-mentioned tasks to be executed, and the central control device can trigger the service-type combinable capabilities selected by the user to execute the above-mentioned to-be-executed tasks.
• the user can select the combinable capabilities for performing the above tasks to be executed, for example, by operating on the central control device, or by voice, gesture, and the like.
  • service class composable capabilities can be selected according to the actual needs of users to perform the above tasks to be performed.
• the central control device can select the device with stronger capabilities, or with more service-class combinable capabilities, to perform the above tasks to be executed.
  • the central control device may preferentially select the combinable capability of the service class in the device with a screen or with a speaker to perform the above-mentioned to-be-executed tasks.
  • the device or the user may preset the priority of the device for performing the above tasks to be performed, for example, it may be preset when the device leaves the factory or be preset by the user during use.
  • the preset device priority may be stored in the cloud server, or in any one or more devices in the communication system 10 .
• for example, the preset device priority can be: smart speakers with a screen, smart speakers without a screen, smart screens, car head units, mobile phones, tablets (PAD), watches.
  • the central control device may preferentially select service class composable capabilities in devices with higher priorities according to preset device priorities to perform the above tasks to be executed.
  • Attention refers to the user's observation of the external world and perception of the surrounding environment.
• the device on which the user's attention is focused is the focus device, that is, the device toward which the user's face, line of sight, and body are oriented.
  • the central control device may select the combinable capability of the service class in the device where the user's attention is located to perform the above-mentioned to-be-executed tasks.
  • the central control device may use the environment information collected by one or more devices in the communication system 10 to determine the device where the user's attention is located.
  • Environmental information can be a single modality or a combination of multimodal information.
  • the environment information may include any one or more of the following: location information, text, audio, video, and so on.
  • the method for the central control device to determine the device where the user's attention is located may specifically include the following:
  • the central control device uses the image collected by the B device equipped with a camera to determine the device where the user's attention is located.
  • the B-device equipped with a camera may also be referred to as the fourth device.
  • the central control device can determine that the user's attention is on device A through the following process:
  • one of the following methods can be used to calculate whether the user's attention is on device A:
• Method 1: calculate the directional similarity between the direction vector from the user to device A and the line-of-sight direction; if the similarity is greater than a threshold, determine that the user's attention is on device A.
• Method 2: calculate the distance between the point where the user's line of sight falls on any coordinate plane of device A and the actual coordinates of device A; if the distance is smaller than a threshold, determine that the user's attention is on device A.
• for example, the point where the user's line of sight falls on the A_xy plane is Z′_A, and the distance between Z_A and Z′_A is calculated; the calculation method for the other planes is the same.
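• For illustration only, the following minimal Python sketch (using numpy) implements Method 1 as a cosine similarity between the user-to-device vector and the gaze direction, and Method 2 as the distance between the gaze ray's intersection with one coordinate plane of device A and device A itself; the coordinates and thresholds are hypothetical.

```python
# Hypothetical sketch of Methods 1 and 2 for determining attention on device A.
import numpy as np

def attention_by_direction(user_pos, device_pos, gaze_dir, thresh=0.95):
    """Method 1: cosine similarity between user->device vector and gaze."""
    to_device = device_pos - user_pos
    to_device = to_device / np.linalg.norm(to_device)
    gaze = gaze_dir / np.linalg.norm(gaze_dir)
    return float(np.dot(to_device, gaze)) > thresh

def attention_by_plane(user_pos, device_pos, gaze_dir, max_dist=0.3):
    """Method 2: intersect the gaze ray with the plane z = device z (one of
    A's coordinate planes); Z'_A is the intersection, Z_A the device point."""
    t = (device_pos[2] - user_pos[2]) / gaze_dir[2]
    z_prime = user_pos + t * gaze_dir
    return float(np.linalg.norm(z_prime - device_pos)) < max_dist

user = np.array([0.0, 0.0, 1.6])    # user eye position (illustrative)
device = np.array([2.0, 0.0, 1.0])  # device A position (illustrative)
gaze = device - user                # user looking straight at A
print(attention_by_direction(user, device, gaze))  # True
print(attention_by_plane(user, device, gaze))      # True
```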
  • the central control device uses device A equipped with a microphone, and device B equipped with a microphone and a camera to determine the device where the user's attention is located.
  • the device A with the microphone may also be called the fourth device, and the device B with the microphone and the camera may also be called the fifth device.
• the central control device can determine whether the user's attention is on device A through the following process:
• device A locates the user's bearing θ1 in the A coordinate system through sound source localization;
• device B locates the user's coordinates (x1, y1) in the B coordinate system through visual detection;
  • the method in the previous scene can be used to transform and map the user's line of sight direction detected by device B to the A coordinate system, and the method in scene (1) can be used to calculate whether the user's attention is on device A.
  • the central control device uses the A device and the B device equipped with a camera to determine the device where the user's attention is.
  • the device A with the camera can also be called the fourth device, and the device B with the camera can also be called the fifth device.
• the central control device can determine that the user's attention is on device A through the following process:
• the conversion relationship between the user and B is R_user→B, T_user→B; the conversion relationship between B and A is R_B→A, T_B→A. Then:
• normB = user × R_user→B + T_user→B
• normA = normB × R_B→A + T_B→A
• R_user→A = R_user→B × R_B→A
• T_user→A = T_user→B × R_B→A + T_B→A
• Gaze_coordA = Gaze_coordB × R_user→A + T_user→A
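• For illustration only, the following minimal Python sketch (using numpy) applies the transform chain above with the row-vector convention p′ = p·R + T implied by the formulas; the rotation and translation values are hypothetical, as in practice they come from pose estimation.

```python
# Hypothetical sketch of the reconstructed transform chain.
import numpy as np

def compose(R_user_B, T_user_B, R_B_A, T_B_A):
    """Compose user->B with B->A per the formulas:
    R_user->A = R_user->B @ R_B->A;  T_user->A = T_user->B @ R_B->A + T_B->A."""
    return R_user_B @ R_B_A, T_user_B @ R_B_A + T_B_A

# Illustrative poses: B is rotated 90 degrees about z relative to A and offset.
theta = np.pi / 2
R_B_to_A = np.array([[np.cos(theta), np.sin(theta), 0.0],
                     [-np.sin(theta), np.cos(theta), 0.0],
                     [0.0, 0.0, 1.0]])
T_B_to_A = np.array([1.0, 0.0, 0.0])
R_user_to_B = np.eye(3)
T_user_to_B = np.array([0.0, 2.0, 0.0])

R_user_to_A, T_user_to_A = compose(R_user_to_B, T_user_to_B, R_B_to_A, T_B_to_A)

gaze_coordB = np.array([0.0, 1.0, 0.0])                # gaze point in B coordinates
gaze_coordA = gaze_coordB @ R_user_to_A + T_user_to_A  # mapped into A coordinates
print(gaze_coordA)
```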
  • the central control device can determine that the user's attention is on device A through the following process:
• devices A and B determine that their fields of view overlap by means of feature point matching/image matching, etc.;
• devices A and B respectively obtain the world coordinates of the matching points in their own coordinate systems through depth estimation;
• device B calculates the user's world coordinates and line-of-sight direction in the B coordinate system through visual detection;
  • the user's world coordinates and line-of-sight directions in the B coordinate system can be mapped to the A coordinate system. Then the method in scenario (1) can be used to calculate whether the user's attention is on the A device.
• device A and device B in the above strategy 9 may be any devices in the communication system 10, or may be first devices selected by the central control device according to a certain strategy, which is not limited here.
  • the service class composable capabilities in the device where the user's attention is located can be selected to perform the above tasks to be performed, making the interaction more natural and more in line with the user's needs.
  • the user can also adjust the line of sight to trigger the virtual aggregation device to select the device where his attention is located to perform the above-mentioned tasks to be performed.
  • the embodiments of the present application may also use other strategies to select service class composable capabilities to execute the above tasks to be executed.
  • the central control device can also select the service class composable capability in the device that is closer to the central control device to perform the above tasks to be performed, or select the service class composable capability in the device that interacts with the central control device to execute the above tasks.
• some or all of the service-class combinable capabilities selected by the central control device in the virtual aggregation device to perform the above tasks to be executed may be called the second combinable capability, and the physical device where the second combinable capability is located is the third device.
• the number of second combinable capabilities may be one or more, and there may also be one or more third devices. For example, in the home, the central control device can determine the smart screen and the smart speaker as third devices, trigger the smart screen to play images, and trigger the smart speaker to play audio.
  • the central control device may select some or all of the devices capable of performing the above tasks to be performed among the virtual aggregation devices as the third device.
• for different tasks to be executed, the second combinable capability and the third device used to perform them may also differ.
• since the central control device configures the virtual aggregation device in S105, the second combinable capability in the virtual aggregation device is prepared for startup in advance. Therefore, in S108, the central control device can quickly and conveniently trigger the third device to activate the second combinable capability to perform the aforementioned tasks to be executed. It can be seen that configuring the virtual aggregation device improves the efficiency with which the communication system 10 performs S108, so as to provide better services for users.
  • S108 may be executed with a delay. In some embodiments, S108 may not be executed.
  • the central control device can trigger the third device to start the second combinable capability to perform the above-mentioned tasks to be executed by sending a notification message (such as broadcast, multicast) through the connection between the devices.
• based on the results of the above screening of combinable capabilities, the central control device can distribute instructions for executing the tasks to the third device where each combinable capability is located, and trigger the third devices to execute the corresponding tasks according to the execution relationship among the multiple tasks to be executed.
• the second device may notify the third device of the task to be executed, so that the third device can activate the second combinable capability to perform the above pending tasks.
  • the central control device may notify the second device after determining the third device, so that the second device notifies the third device of the task to be executed.
  • the central control device may directly trigger or notify the third device to perform the above task to be performed.
  • FIG. 9 shows the scenario of Example 1.
• the virtual aggregation device at this time may include: a collection of combinable capabilities in the living room and room devices (in this example, the smart screen and speakers in the living room, the mobile phone in the room, and related devices with the ability to locate people in the environment). Initially, the virtual aggregation device dispatches the relevant capabilities of the smart screen in the living room to serve the child watching TV. At the same time, since the mother is in the room, the virtual aggregation device needs to configure related capabilities (such as the capabilities of the mobile phone) to prepare for sensing the mother's requests.
• the smart assistant can recognize the intention of "look at the living room" and split it into three modality tasks to be executed: capturing the living room camera feed, picking up the living room sound, and playing the living room sound.
• Camera category: optional capability components include the camera capabilities of the smart screen and the mobile phone;
• Audio category: optional capability components include the audio capabilities of the smart screen, the mobile phone, and the speaker;
• Display category: optional capability components include the display capabilities of the smart screen and the mobile phone;
• when the smart assistant selects the corresponding devices to perform the tasks to be executed, the following combinable capabilities can be selected:
• Sound pickup capability of the speaker: in the living room, because the child is watching TV and may have voice interaction with the smart screen, the display, sound playback, and sound pickup capabilities of the smart screen are already occupied and are not considered. The smart assistant can therefore choose other capabilities, such as the sound pickup capability provided by the speaker.
• Camera capability of the smart screen: although some capabilities of the smart screen are occupied, its camera capability is idle and available, and it is the only option for this type of capability in the living room, so this capability is chosen to perform the task.
• Playback capability of the mobile phone: the phone can be selected to play the audio, for example because it is the only available device in the room, or because the user is interacting with the phone (i.e., issued the command from it).
  • the above example supports the mother to use the idle combinable capabilities to obtain the visual situation of the living room when some of the combinable capabilities of the smart screen are occupied.
  • the conflict problem of combinable capability allocation can be solved, and a physical device is divided into multiple combinable capabilities to serve different tasks respectively. That is, the embodiment of the present application supports the management and consumption of each device according to the combinable capabilities, and allows multiple users to use different combinable capabilities of the same device at the same time.
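• For illustration only, the following minimal Python sketch shows this per-capability management: each device is tracked as separate combinable capabilities, so an occupied capability (the smart screen's display, in use by the child) does not block an idle one on the same device (its camera); the state table and names are hypothetical.

```python
# Hypothetical sketch: allocate idle combinable capabilities per capability,
# not per device, so different users can share one physical device.
capabilities = {
    ("smart_screen", "display"): "occupied",
    ("smart_screen", "pickup"): "occupied",
    ("smart_screen", "camera"): "idle",
    ("speaker", "pickup"): "idle",
    ("phone", "playback"): "idle",
}

def allocate(kind_needed: str):
    """Pick an idle combinable capability of the needed kind, regardless of
    whether other capabilities on the same device are busy."""
    for (device, kind), state in capabilities.items():
        if kind == kind_needed and state == "idle":
            capabilities[(device, kind)] = "occupied"
            return device, kind
    return None

# "Look at the living room": camera + pickup + playback, around the busy screen.
print([allocate(k) for k in ("camera", "pickup", "playback")])
# [('smart_screen', 'camera'), ('speaker', 'pickup'), ('phone', 'playback')]
```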
  • virtual aggregated devices include: a collection of combinable capabilities in living room and room devices.
  • the mobile phone in the room receives a call from the user's relatives, and the smart assistant can recognize the intention of "answering the call".
  • the smart assistant can first break down the intent of "answer the phone call" into a pending task of playing audio.
  • the optional combinable capabilities of the virtual aggregation device include: (1) audio combinable capabilities on mobile phones; (2) audio combinable capabilities on speakers; (3) audio combinable capabilities on large screens.
  • the smart assistant can select the appropriate combinable capability to perform the pending task of playing audio according to the degree of matching.
  • the smart assistant can distribute the to-be-executed task of playing audio to the audio combinable capability of the speaker, and at the same time distribute the task of adjusting the volume to the conflicting capability component.
• the virtual aggregation device can split the user's intention into multiple tasks to be executed and distribute them to different combinable capabilities in the virtual aggregation device, which makes full use of the capabilities of the virtual aggregation device to provide users with broader, surround-style services.
  • the first device, the second device, and the third device may include the same device, or may include different devices.
  • any one or multiple items of the first resource, the second resource, and the third resource may all come from the same device, or all or part of them may come from different devices.
  • Any multiple items of the first resource, the second resource, and the third resource may be the same or different.
  • Any one or more of the first combinable capability, the second combinable capability, and the third combinable capability may all come from the same device, or all or part of them may come from different devices. Any number of the first combinable capability, the second combinable capability, and the third combinable capability may be the same or different.
  • the communication system 10 may also execute the foregoing method based on the global context.
  • This interaction method can be applied to the aforementioned steps S107-S108.
  • the method may receive multiple rounds of interaction input based on the first combinable capability on the first device.
  • the above multiple rounds of interactive input may come from a single device or from multiple devices.
  • the first composable capability can analyze the above-mentioned received multiple rounds of interaction input to obtain the global context.
  • the third combinable capability can determine the user's intention based on the global context, and then enable the virtual aggregation device to select an appropriate second combinable capability to perform a task corresponding to the user's intention.
• the "global" in the global context may refer to all connected devices included in the communication system 10; for example, when the devices included in the communication system 10 are all the connected devices in the user's home, the global refers to all the connected devices in the user's home.
• the global context refers to the device state information, environment information, and/or user information detected by the interactive combinable capabilities of all connected devices included in the communication system 10.
• the device state information may refer to the battery status of an electronic device, the usage of the electronic device, and whether the combinable capabilities in the electronic device are available; the environment information may refer to environmental conditions detected by the combinable capabilities, such as temperature changes, light changes, and biological activities in the area; the user information may refer to the user's explicit or implicit intention input, such as voice information input by the user, gesture information input by the user, and the user's habits.
• this interaction method enables the virtual aggregation device to obtain interaction input and interaction history from multiple devices and to manage the global context in a unified manner, so that the communication system 10 can more accurately identify the user's real intent based on the global context, improving the efficiency of cross-device control.
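• For illustration only, the following minimal Python sketch shows one way rounds of interactive input could be collected into a unified global context and read back for intent recognition; the Round/GlobalContext structures are hypothetical, not the patent's data format.

```python
# Hypothetical sketch of unified global-context management: each round of
# interactive input is appended with its time, source capability, and content.
from dataclasses import dataclass, field
import time

@dataclass
class Round:
    capability: str   # e.g. "doorbell.button_input"
    content: str      # e.g. "doorbell triggered"
    ts: float = field(default_factory=time.time)

class GlobalContext:
    def __init__(self):
        self.rounds: list[Round] = []

    def append(self, capability: str, content: str):
        self.rounds.append(Round(capability, content))

    def recent(self, n: int = 5) -> list[Round]:
        """Return the n most recent rounds in time order, for intent analysis."""
        return sorted(self.rounds, key=lambda r: r.ts)[-n:]

ctx = GlobalContext()  # would typically be stored on the central control device
ctx.append("doorbell.button_input", "doorbell triggered")
ctx.append("large_screen.infrared", "no one in the living room")
ctx.append("watch.heart_rate", "user is sleeping")
print([r.content for r in ctx.recent()])
```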
  • a possible multi-device scenario in the embodiment of the present application may include: a large screen 110 , a mobile phone 120 , a smart speaker 130 , a smart watch 140 , and a doorbell 150 .
  • the large screen 110 is located in the living room
  • the mobile phone 120 and the smart speaker 130 are located in the user's bedroom
  • the smart watch 140 is worn on the user's arm
  • the doorbell 150 is located at the door of the living room.
  • a courier is triggering the doorbell 150 outside the door of the living room.
  • the devices can be connected in a wired or wireless manner for data exchange between the devices.
  • the global refers to the large screen 110 , mobile phone 120 , smart speaker 130 , smart watch 140 and doorbell 150 listed above.
  • the combinable capabilities of the above-mentioned devices may be included in the communication system 10 covering the above-mentioned room area.
  • the combinable capabilities may include near-field voice input capability, far-field voice input capability, user physiological signal detection capability (for example, EEG detection, EMG detection and heart rate detection, etc.), one or more sensors, voice Recognition ability, device status detection ability, music playback ability, video playback ability, etc.
  • some or all of the combinable capabilities of the above multiple devices may form a virtual aggregation device.
• the large screen 110 is used as the central control device; that is to say, one or more of the combinable capabilities of the large screen 110, the mobile phone 120, the smart speaker 130, the smart watch 140, and the doorbell 150 may form the virtual aggregation device.
• the large screen 110 can schedule and control each combinable capability on the large screen 110, the mobile phone 120, the smart speaker 130, the smart watch 140, and the doorbell 150.
  • the virtual aggregation device can obtain the global context based on the above-mentioned composable capabilities, and manage the global context in a unified manner, so as to determine the user's intention based on the global context, and select and control the appropriate composable capabilities to perform corresponding functions.
  • the multi-device scenario shown in FIG. 10A is only used to illustrate the embodiment of the present application, and does not constitute any limitation to the present application.
  • the multi-device scene may also include more electronic devices, such as refrigerators, air conditioners, computers, etc.
  • the present application does not limit the devices included in the multi-device scene.
  • the first combinable capability described below may be a first resource
  • the second combinable capability may be a second resource
  • the third combinable capability may be a third resource
  • the first event may be a doorbell input event.
  • the first combinable capability receives multiple rounds of interactive input.
• multiple rounds of interactive input may be received by the first combinable capability on the first device, and each round of interactive input may include information related to that round (for example, its occurrence time, the combinable capability of the electronic device corresponding to it, and its interactive content, etc.).
  • the number of first devices may be one or more.
  • the first combinable capability may be one or more interactive combinable capabilities on the first device. Among them, the types included in the interaction class composable capabilities may be shown in FIG. 4 .
  • the interactive combinable capabilities configured by the current virtual aggregation device may include: near-field voice input capability, far-field voice input capability, and user physiological signal detection capability ( For example, EEG detection, EMG detection and heart rate detection, etc.) and so on.
• the above interactive combinable capabilities can receive: input from the doorbell 150 detecting that the doorbell is triggered and that there is a visitor at the living room door; input from the large screen 110 detecting that there is no one in the living room; input from the smart watch 140 detecting, based on the user's heart rate, that the user is sound asleep; and input from the mobile phone 120 detecting that its battery power is sufficient, that its playback capability is available, and that it was used 30 minutes ago. That is to say, the above multiple rounds of interactive input come from the interactive combinable capabilities of multiple devices: the large screen 110 (e.g., visual interaction combinable capabilities, voice interaction combinable capabilities), the mobile phone 120 (e.g., touch interaction combinable capabilities, gesture interaction combinable capabilities), the smart watch 140 (e.g., physiological-signal interaction combinable capabilities), and the doorbell 150 (e.g., touch interaction combinable capabilities).
  • the above-mentioned multiple rounds of interactive input can have a variety of different modes.
• the interaction modality of the doorbell 150's input that the doorbell is triggered can be a doorbell trigger event;
• the interaction modality of the large screen 110's input that there is no one in the living room can be visual interaction.
• the interaction modality of the smart watch 140's input that the user is sleeping may be physiological-signal interactive input, and so on.
  • the third composable capability analyzes the above-mentioned multiple rounds of interaction input received to obtain a global context.
  • the second device may use the third composable capability to perform analysis according to the order in which each round of interaction input occurs, so as to determine the global context.
• the global context may include one or more of the following: the time at which each round of interactive input is received, the first combinable capability that receives each round of interactive input, the interactive content of each round of interactive input, the user's physiological feature information corresponding to each round of interactive input, the device information of the electronic device to which the first combinable capability belongs (that is, the first device), or the device information of the target device controlled by the interactive input.
  • the global context may be stored on a specified device.
  • the designated device may be a central control device in the virtual aggregation device.
  • the global context can be stored in a non-central control device with sufficient storage resources (for example, the mobile phone 120 or smart speaker 130, etc.).
  • the non-central control device storing the global context may provide a program and/or interface for accessing the global context, so that the third composable capability may determine the user's intention based on the global context.
  • the first device may be the large screen 110 .
  • the global context acquired by the first composable capability may be shown in Table 4:
• the interactive input marked "1" occurs at 13:03:12; the corresponding combinable capability is the button input capability of the doorbell 150; the interactive content is that the doorbell is triggered.
• the interactive input marked "2" occurs at 13:03:14; the corresponding combinable capability is the infrared image detection capability of the large screen 110; the interactive content is that there is no one in the living room.
• the interactive input marked "3" occurs at 13:03:16; the corresponding combinable capability is the heart rate input capability of the smart watch 140; the interactive content is that the user is sleeping; and so on.
  • the above global context including multiple rounds of interactive input identified as “1", “2” and “3” as shown in Table 4 can be stored on the central control device in the multi-device scenario shown in FIG. 10A , that is, the large screen 110 .
  • the second device can recognize the user intention based on the above global context through the third combinable capability, split the user intention into tasks to be executed, and make the virtual aggregation device map the tasks to be executed to appropriate second combinable capabilities.
  • the specific process may be as shown in the following S1003-S1005.
  • the third composable capability determines the user intention based on the global context.
  • the second device may identify the user intention represented by the first event based on the obtained global context through the third composable capability.
  • the second device may be the large screen 110 .
  • Through the third combinable capability, the large screen 110 can identify that the current environment state is that there is a visitor at the door of the living room and the doorbell has been triggered, that there is no activity in the living room, that the current state of the user is sleeping, and that the current state of the mobile phone 120 is that the battery power is sufficient, the playback capability is available, and the mobile phone 120 was used by the user 30 minutes ago. Therefore, the large screen 110 can determine, based on the above-mentioned global context through the third combinable capability, that the user intention represented by the first event, that is, the doorbell input event, is "remind the user that someone outside the door of the living room requests to open the door".
  • the third composable capability splits the above user intention into tasks to be executed.
  • the second device may split the above-mentioned user intention into tasks to be performed through the third composable capability, so that the virtual aggregation device can map the tasks to be performed to an appropriate second composable capability.
  • the user intention determined in step S1003 above is "remind the user that someone outside the door of the living room requests to open the door".
  • the large screen 110 can divide the user intention into multiple tasks through the task mapping module in the service response component, for example, a task of outputting a vibration to remind the user, a task of playing the image outside the door, a task of outputting the doorbell prompt tone, and so on.
  • the virtual aggregation device maps the above-mentioned to-be-executed tasks to the second composable capability.
  • the virtual aggregation device may select an appropriate second combinable capability in the third device to perform the task to be performed based on the determined user intention and/or the task to be performed.
  • the quantity of the third device may be one or more.
  • the large screen 110 may map the to-be-executed tasks determined in S1004 to the second combinable capabilities in one or more third devices.
  • the second combinable capability may be a motor vibration capability in the smart watch 140 , a music playback capability in the mobile phone 120 , and a video playback capability in the mobile phone 120 .
  • the smart watch 140 can output a vibration reminder based on the motor vibration capability, the mobile phone 120 can output the doorbell prompt tone from weak to strong based on the music playback capability, and the mobile phone 120 can output the image outside the living room door acquired by the doorbell 150 based on the video playback capability, and so on. A sketch of this task mapping follows.
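As an illustration of S1004 and S1005, the following sketch splits the recognized intention into tasks and maps each task to a combinable capability on a third device; the task names and the capability table are hypothetical:

```python
# Hypothetical illustration of S1004-S1005: split an intention into tasks
# and map each task to a second combinable capability on a third device.
intention = ("remind the user that someone outside the door of the living "
             "room requests to open the door")

tasks = [
    ("vibrate reminder", "motor vibration"),
    ("doorbell prompt tone", "music playback"),
    ("show image outside the door", "video playback"),
]

# Capabilities advertised by devices in the virtual aggregation device.
capability_table = {
    "motor vibration": "smart watch 140",
    "music playback": "mobile phone 120",
    "video playback": "mobile phone 120",
}

for task, needed_capability in tasks:
    device = capability_table.get(needed_capability)
    if device:
        print(f"map task '{task}' -> {needed_capability} on {device}")
```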
  • In another implementation, the third combinable capability can retrieve the stored global context in a specified order of the interaction inputs (for example, in order from the most recent occurrence time to the earliest), and, based on specified matching rules, match and analyze the stored historical interaction information against the current interaction information, which serves as the basis for missing-slot analysis and missing-referent identification to determine the user's intention.
  • the specific process in this implementation manner will be described in detail in subsequent embodiments, and will not be repeated here.
  • FIG. 10C exemplarily shows a schematic diagram of the software architecture.
  • the software architecture may include: a multi-source input interaction context analysis module, a multi-modal intention decision-making module, a task sequence generation module, a task management module and a task mapping module.
  • the multi-source input interaction context analysis module can receive and analyze the multiple rounds of interactive inputs to obtain a global context.
  • When the global context has multiple modalities, the multimodal intent decision-making module can analyze the intent recognition result based on the global context to determine the user's intent, and the task sequence generation module can then control one or more combinable capabilities to perform the corresponding functions through the task management module and the task mapping module based on the intent recognition result.
  • When the global context is single-modal, the task sequence generation module can control one or more combinable capabilities to perform the corresponding functions through the task management module and the task mapping module based on the user intent determined from the global context.
  • In some embodiments, the multimodal intent decision-making module can also obtain the intent recognition result directly based on the above multiple rounds of interactive input to determine the user's intent. Then, the task sequence generation module can control one or more combinable capabilities to perform the corresponding functions through the task management module and the task mapping module based on the intent recognition result.
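A minimal, runnable sketch of the control flow through the FIG. 10C modules might look as follows; the function bodies are placeholder assumptions, since the embodiment only names the modules and their order:

```python
def analyze_context(rounds):
    # Multi-source input interaction context analysis module:
    # collect every round of interactive input into a global context.
    return {"rounds": rounds, "modalities": {r["modality"] for r in rounds}}

def decide_intent(context):
    # Multimodal intent decision-making module (placeholder logic).
    return "user intent derived from " + ", ".join(sorted(context["modalities"]))

def handle_interactive_inputs(rounds):
    context = analyze_context(rounds)
    # With multiple modalities, the multimodal intent decision-making
    # module determines the intent; with a single modality the task
    # sequence could be generated from the context directly.
    intent = decide_intent(context)
    # Task sequence generation -> task management -> task mapping.
    tasks = [f"task for: {intent}"]
    for task in tasks:
        print("dispatching", task)

handle_interactive_inputs([
    {"modality": "doorbell trigger", "content": "doorbell pressed"},
    {"modality": "infrared image", "content": "no one in the living room"},
])
```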
  • the global context-based interaction method provided by the embodiment of the present application can enable the virtual aggregation device to more accurately identify the user's intention based on user instructions and user state information, device state information, and/or environment state information received across devices .
  • the virtual aggregation device can dynamically adjust and optimize the combined configuration of combinable capability components in advance, shortening the command response delay in multi-device scenarios, and can support the active services and the realization of long-term tasks in subsequent embodiments.
  • the above-mentioned software architecture provided by the embodiment of this application is only used to illustrate this application. In a specific implementation, the software architecture may include more or fewer modules than those provided by the embodiment of this application, or may include other modules, and there may also be combinations of modules and information interactions between modules different from those in the embodiments of this application, which is not limited in this application.
  • FIG. 11 exemplarily shows the specific flow of the matching analysis method provided by the embodiment of the present application.
  • the interactive input may include historical input and current input.
  • the global context can be generated based on the above-mentioned historical input and current input. Therefore, the global context may include historical interaction information and current round interaction information.
  • the historical interaction information associated with the current round of interaction information may be referred to as first historical interaction information.
  • the historical input can be the historical voice dialogue input
  • the current input can be the current voice dialogue input
  • the historical interaction information can be the dialogue information of historical dialogues, and the current round of interaction information can be the dialogue information of the current round of dialogue.
  • the first event may be a current voice dialogue interaction event.
  • the method may specifically include:
  • the first combinable capability acquires dialogue information of a current round of dialogue when a user interacts with a first device through a voice dialogue.
  • the first device may acquire the dialogue information of the current round of dialogue based on the first combinable capability (for example, voice input capability).
  • the first combinable capability for example, voice input capability
  • the first device when the user has a conversation with the mobile phone 120 , the first device may be the mobile phone 120 , and the first combinable capability may be the near-field voice input capability on the mobile phone 120 .
  • the first combinable capability can acquire the dialog information of the current round of dialog when the user is having a dialog with the mobile phone 120 .
  • the dialog information of the current round of dialog may include: the dialog content of the current round of dialog, the time when the current round of dialog occurs, the place where the current round of dialog occurs, the device information of the first device (such as the device name, device identification, etc.), the device information of the target device that the current round of dialogue wants to control, the physiological feature information of the user who sent the current round of dialogue, etc.
  • In some cases, the device information of the target device to be controlled in the current round of dialogue may be empty. Each item of the dialogue information is described below:
  • the dialogue content of the current round of dialogue may include the input information of the user in the current round of dialogue, and the input information may be one or more sentences issued by the user, or the text information converted from one or more voice sentences issued by the user.
  • the dialogue content of the current round of dialogue may also include the voice or text of the first device replying to the user in the current round of dialogue.
  • the time at which the current round of dialogue occurs may refer to the time at which the first device receives the voice information input by the user in the current round of dialogue.
  • the place where the current round of dialogue takes place may be the place where the first device is located. Taking the current round of dialogue between the user and the mobile phone 120 as an example, the place where the current round of dialogue occurs may be the bedroom where the mobile phone 120 is located.
  • the device information of the first device may refer to device information of an electronic device that performs dialog interaction with the user. For example, when the user conducts a current round of dialogue with the mobile phone 120 , the device information of the above-mentioned electronic device is the device information of the mobile phone 120 .
  • the device information of the target device to be controlled in the current round of dialog may refer to the device information of the target device actually controlled by the input information of the user in the current round of dialog.
  • For example, when the user's input in the current round of dialogue is used to control the large screen 110, the device information of the above-mentioned target device may be the device information of the large screen 110.
  • If the current round of dialogue does not actually control any target device, the device information of the target device to be controlled in the current round of dialogue may be empty.
  • the physiological feature information of the user who sends out the current round of dialogue may be the user's voiceprint information, the user's face portrait, and the like.
  • the physiological feature information of the user in the current round of dialogue is the voiceprint information of the user currently conducting voice interaction with the mobile phone 120 .
  • the third combinable capability acquires dialog information of historical dialogs in the virtual aggregation device.
  • the second device may acquire dialog information of historical dialogs in the virtual aggregation device based on the third combinable capability.
  • the third combinable capability may acquire dialog information of historical dialogs stored on the virtual aggregation device in the scenario shown in FIG. 10A .
  • the dialogue information of the historical dialogue may come from the voice input capability of the large screen 110, the voice input capability of the mobile phone 120, the voice input capability of the smart speaker 130, the voice input capability of the smart watch 140, the voice input capability of the doorbell 150, and the like.
  • the third combinable capability can acquire the dialogue information received via each device, and store the above dialogue information in the virtual aggregation device (for example, it may be stored in the central control device of the virtual aggregation device, that is, the large screen 110).
  • the third combinable capability can acquire one or more rounds of dialogue information of historical dialogues.
  • the dialogue information of each round of historical dialogue may include: the dialogue content of this round of historical dialogue, the time when this round of historical dialogue occurred, the place where this round of historical dialogue occurred, the device information of the electronic device receiving this round of historical dialogue (for example, device name, device identification, etc.), the device information of the target device to be controlled by this round of historical dialogue, the physiological feature information of the user who sent this round of historical dialogue, etc. Each item is described below:
  • the dialogue content of this round of historical dialogue may include the user's input information in this round of historical dialogue. The input information may be one or more sentences issued by the user, or the text converted from one or more voice sentences issued by the user.
  • the dialogue content of each round of historical dialogue may also include the voice or text with which the electronic device to which the voice input capability belongs replied to the user in this round of historical dialogue.
  • the time at which this round of historical dialogue occurs may refer to the time at which this round of historical dialogue is received.
  • the place where this round of historical dialogue occurs may be the place where the device receiving this round of historical dialogue is located.
  • For example, when this round of historical dialogue was received by the large screen 110 located in the living room, the place where this round of historical dialogue occurred may be the living room.
  • the device information of the electronic device that receives this round of historical dialog may be the device information of the electronic device that has this round of historical dialog with the user.
  • For example, when the user had this round of historical dialogue with the large screen 110, the device information of the above-mentioned electronic device is the device information of the large screen 110.
  • the device information of the target device to be controlled in this round of historical dialog may refer to the device information of the target device actually controlled by the user's input information in this round of historical dialog. For example, if the above-mentioned user has had a certain round of historical conversations with the large screen 110 and the voice command issued is "open the large screen", the device information of the above-mentioned target device may be the device information of the large screen 110 .
  • the physiological feature information of the user who sent out this round of historical dialogue may be the user's voiceprint information, the user's face portrait, and the like.
  • the physiological feature information of the user in this round of historical dialogue is the voiceprint information of the user who interacted with the large screen 110 in this round of historical dialogue .
  • the third combinable capability is based on the dialogue information of the current round of dialogue, and obtains the dialogue information of the historical dialogue related to the dialogue information of the current round of dialogue from the dialogue information of one or more rounds of historical dialogues obtained above.
  • the dialog information of the historical dialog related to the dialog information of the current round of dialog may be referred to as the first historical interaction information.
  • the process of matching the obtained one or more rounds of historical dialogue information to obtain the dialogue information of the historical dialogues related to the dialogue information of the current round of dialogue may include: according to specified matching rules, comparing and matching the acquired one or more rounds of historical dialogue information with the dialogue information of the current round of dialogue, and obtaining the dialogue information of the historical dialogues related to the dialogue information of the current round of dialogue.
  • specific specified matching rules may be as described in rule 1, rule 2, rule 3, rule 4 and rule 5 below.
  • Rule 1: if the physiological feature information of the user in this round of historical dialogue is the same as that of the user in the current round of dialogue, the third combinable capability can determine that the user identified based on this round of historical dialogue and the user identified based on the current round of dialogue are the same user, and that the dialogue information of this round of historical dialogue is related to the dialogue information of the current round of dialogue. That is to say, the dialogue information of historical dialogues related to the first user who triggered the current round of dialogue input may be regarded as the first historical interaction information.
  • Rule 2: the interval between the occurrence time of this round of historical dialogue and the occurrence time of the current round of dialogue is less than duration 1 (also referred to as the first duration), and the device information of the electronic device receiving this round of historical dialogue (also referred to as the sixth device) is the same as the device information of the first device.
  • the duration 1 may be 3 minutes, 5 minutes, etc., and the present application does not limit the specific size of the duration 1.
  • the historical interaction information satisfying this rule can be regarded as the first historical interaction information.
  • Rule 3: the interval between the occurrence time of this round of historical dialogue and the occurrence time of the current round of dialogue is less than duration 2 (also referred to as the first duration), and the electronic device receiving this round of historical dialogue (also referred to as the sixth device) is a near-field device of the first device.
  • duration 2 may be 3 minutes, 5 minutes, etc.
  • duration 2 may be the same as duration 1, or may be different from duration 1, which is not limited in this application.
  • the historical interaction information satisfying this rule can be regarded as the first historical interaction information.
  • Rule 4: the interval between the occurrence time of this round of historical dialogue and the occurrence time of the current round of dialogue is less than duration 3 (also called the second duration), and the target device controlled by this round of historical dialogue is the same as the target device that the current round of dialogue wants to control.
  • duration 3 may be 3 minutes, 5 minutes, etc.
  • duration 3 may be the same as duration 1/duration 2, or may be different from duration 1/duration 2, which is not limited in this application.
  • the historical interaction information satisfying this rule may be regarded as the second historical interaction information, and the second historical interaction information is included in the first historical interaction information.
  • Rule 5: the interval between the occurrence time of this round of historical dialogue and the occurrence time of the current round of dialogue is less than duration 4 (also called the second duration), and the target device controlled by this round of historical dialogue is a near-field device of the target device that the current round of dialogue wants to control.
  • duration 4 may be 3 minutes, 5 minutes and so on.
  • duration 4 may be the same as duration 1/duration 2/duration 3, or may be different from duration 1/duration 2/duration 3, which is not limited in this application.
  • the historical interaction information satisfying this rule may likewise be regarded as second historical interaction information, and the second historical interaction information is included in the first historical interaction information.
  • the near-field device of the above-mentioned electronic device refers to other electronic devices that the electronic device can discover through the near-field identification capability.
  • the near-field identification capability can be provided by a near-field device identification module on the electronic device.
  • the near-field device identification module may be a capability module for detecting whether the electronic device and other electronic devices are connected to the same local area network, or a capability module for identifying other electronic devices based on Bluetooth or broadcast discovery capabilities.
  • the mobile phone 120 and the smart speaker 130 may be respectively configured with near-field device identification modules.
  • the mobile phone 120 can identify the smart speaker 130 based on the near-field device identification module on the mobile phone 120, and the smart speaker 130 can also identify the mobile phone 120 based on the near-field device identification module on the smart speaker 130, then the mobile phone 120 is the near-field device of the smart speaker 130.
  • the smart speaker 130 is a near-field device of the mobile phone 120 .
  • the virtual aggregation device may store a mapping relationship between the device and the device information of the device corresponding to the near-field device.
  • When the third combinable capability matches the dialogue information of the current round of dialogue with the dialogue information of a certain round of historical dialogue based on the above-mentioned specified matching rules, the third combinable capability can query, based on the above-mentioned mapping relationship, whether the near-field device information of the first device includes the device information of the electronic device receiving this round of historical dialogue. If yes, the electronic device receiving this round of historical dialogue is a near-field device of the first device.
  • If not, the electronic device receiving this round of historical dialogue is not a near-field device of the first device.
  • Similarly, the third combinable capability can query, based on the above-mentioned mapping relationship, whether the near-field device information of the target device that the current round of dialogue wants to control includes the device information of the target device actually controlled by this round of historical dialogue. If yes, the target device actually controlled in this round of historical dialogue is a near-field device of the target device to be controlled in the current round of dialogue. If not, the target device actually controlled in this round of historical dialogue is not a near-field device of the target device to be controlled in the current round of dialogue.
  • Take the current round of dialogue being Sc and a certain round of historical dialogue being So as an example, where the electronic device receiving Sc is the first device. The process of matching the dialogue information of Sc with the dialogue information of So can be as follows:
  • The third combinable capability can determine whether the user identified based on the dialogue information of Sc and the user identified based on the dialogue information of So are the same user; if they are the same user, the dialogue information of So is used as relevant dialogue information of the dialogue information of Sc (rule 1).
  • Similarly, if the dialogue information of So satisfies any one of rules 2 to 5, for example, if So was received by the first device or by a near-field device of the first device within the first duration before Sc occurred, or if the target device actually controlled by So is, within the second duration, the target device that Sc wants to control or a near-field device of that target device, the third combinable capability likewise uses the dialogue information of So as relevant dialogue information of the dialogue information of Sc. A code sketch of these rules follows.
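The sketch below condenses rules 1 to 5 into one predicate; the record fields, the is_near_field helper, and the default durations are assumptions standing in for duration 1 to duration 4:

```python
from datetime import timedelta

def is_near_field(device_a, device_b, near_field_map):
    # Hypothetical lookup into the stored mapping between a device and
    # the device information of its near-field devices.
    return device_b in near_field_map.get(device_a, set())

def is_related(hist, cur, near_field_map,
               d1=timedelta(minutes=5), d2=timedelta(minutes=5),
               d3=timedelta(minutes=5), d4=timedelta(minutes=5)):
    """Return True if historical dialogue `hist` is related to the current
    dialogue `cur` under rules 1-5 (field names are hypothetical)."""
    gap = cur["time"] - hist["time"]
    # Rule 1: same user (same physiological feature information).
    if hist["user_features"] == cur["user_features"]:
        return True
    # Rule 2: received by the same device within duration 1.
    if gap < d1 and hist["device"] == cur["device"]:
        return True
    # Rule 3: received by a near-field device of the first device within duration 2.
    if gap < d2 and is_near_field(cur["device"], hist["device"], near_field_map):
        return True
    # Rule 4: same controlled target device within duration 3.
    if gap < d3 and hist["target"] and hist["target"] == cur["target"]:
        return True
    # Rule 5: target is a near-field device of the current target within duration 4.
    if gap < d4 and cur["target"] and is_near_field(cur["target"], hist["target"],
                                                    near_field_map):
        return True
    return False
```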
  • electronic devices may also be divided into public devices and private devices.
  • public equipment may refer to electronic equipment that can be used by multiple users
  • private equipment may refer to electronic equipment that is used only by a designated user; other users cannot use the electronic device until authorized.
  • the specified matching rules may include the following rules 6, 7, 8 and 9:
  • Rule 6: the physiological feature information of the user in this round of historical dialogue is the same as that of the user in the current round of dialogue, and the electronic device receiving this round of historical dialogue is a public device or the user's private device.
  • Rule 7: the physiological feature information of the user in this round of historical dialogue is different from that of the user in the current round of dialogue, and the electronic device receiving this round of historical dialogue is a public device.
  • Rule 8: the dialogue content of this round of historical dialogue is content related to a specified service (such as querying the weather, playing news, etc.).
  • Rule 9: the electronic device receiving this round of historical dialogue and the electronic device actually controlled in this round of historical dialogue are both public devices.
  • Based on the above, the third combinable capability can first judge whether each round of historical dialogue satisfies any one of the above-mentioned rules 6 to 9, and then perform matching according to the above rules 1 to 5 on the historical dialogues satisfying any one of rules 6 to 9, so as to obtain the dialogue information of the historical dialogues related to the dialogue information of the current round of dialogue.
  • In some embodiments, when the third combinable capability determines that a certain round of historical dialogue satisfies rule 6 or rule 7, the dialogue information of this round of historical dialogue can be directly confirmed as the dialogue information of a historical dialogue related to the dialogue information of the current round of dialogue.
  • the third combinable capability identifies the user intention represented by the dialog information of the current round of dialog based on the dialog information of the matched historical dialog.
  • the first event may be a current voice dialogue interaction event.
  • the current voice dialogue interaction event corresponds to the dialogue information of the current round of dialogue.
  • the third combinable capability may identify the user intention represented by the dialogue information of the current round of dialogue based on the matched dialogue information of the historical dialogue.
  • the matched dialogue information of the historical dialogues can be used as the analysis basis for missing slots and/or the identification basis for missing referents, to determine the user intention represented by the dialogue information of the current round of dialogue.
  • For example, the dialogue information of the current round of dialogue includes "order an airline ticket", and the dialogue information of the current round of dialogue lacks the location slot information.
  • the dialogue information of the matched historical dialogue includes "check the weather in Beijing". Then, based on the dialogue information of the above-mentioned historical dialogue, it can be identified that the missing location slot information in the dialogue information of the current round of dialogue is "Beijing". Therefore, the user intention represented by the dialogue information of the current round of dialogue is "order an airline ticket to Beijing".
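The slot-filling step can be pictured with a small sketch; the dictionary slot representation is an assumption:

```python
def fill_missing_slots(current_slots, related_history_slots):
    """Inherit missing slot values from related historical dialogues
    (hypothetical slot representation)."""
    filled = dict(current_slots)
    for slot, value in related_history_slots.items():
        filled.setdefault(slot, value)   # only fill slots that are missing
    return filled

current = {"intent": "order an airline ticket"}   # location slot missing
history = {"location": "Beijing"}                 # from "check the weather in Beijing"
print(fill_missing_slots(current, history))
# -> {'intent': 'order an airline ticket', 'location': 'Beijing'}
```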
  • In some embodiments, the matched dialogue information of the historical dialogues can be used in the process shown in FIG. 12: the intent vector of the current round of dialogue and the encoding results of the historical dialogues are fused, and the resulting fused features are fed into a correlation model to calculate a correlation score.
  • For the specific process of this implementation, reference may be made to the subsequent description of the embodiment shown in FIG. 12, which will not be repeated here.
  • the matched dialog information of the historical dialog may be used in combination with the dialog information of the current round to expand the search range of keywords corresponding to the user's intent, so as to determine the user's intent.
  • the dialogue information of the current round of dialogue includes "Andy Lau”
  • the matched dialogue information of the historical dialogue includes "watching a movie at night”.
  • the dialog information of the historical dialog may include the behavior keyword "watching a movie” and the time keyword "evening”, as well as the implicit corresponding scene keyword "movie theater”.
  • the dialogue information of this historical dialogue expands the search range of keywords corresponding to the user's intention, and combined with the dialogue information of the current round of dialogue, it can be determined that the user's intention is "go to the cinema to watch Andy Lau's movie at night".
  • The matching analysis method provided by the embodiment of this application enables the virtual aggregation device to match historical dialogue information with the current round of dialogue information and determine the user intention indicated by the current round of dialogue information based on the relevant historical dialogue information, which can improve how effectively the device recognizes user intent.
  • the introduction of user identity identification can protect the user's privacy when querying historical conversation information.
  • the devices are divided into private devices and public devices, and the type of historical dialogue information that can be shared can also be set by the user, so that the matching result is more personalized and conforms to the user's habits.
  • FIG. 12 exemplarily shows the specific flow of the matching analysis method provided by the embodiment of the present application.
  • the interactive input may include historical input and current input.
  • the global context can be generated based on the above-mentioned historical input and current input. Therefore, the global context may include historical interaction information and current round interaction information.
  • the historical interaction information associated with the current round of interaction information may be referred to as first historical interaction information.
  • the historical input can be the historical voice dialogue input
  • the current input can be the current voice dialogue input
  • the historical interaction information can be the dialogue information of historical dialogues, and the current round of interaction information can be the dialogue information of the current round of dialogue.
  • the first event may be a current voice dialogue interaction event.
  • the historical interaction information whose correlation with the current round of interaction information is greater than a threshold may be regarded as the first historical interaction information.
  • the method may specifically include:
  • the third combinable capability acquires dialog information of a current round of dialog when a user interacts with a first device through a voice dialog.
  • For this step, reference may be made to step S1101 in the foregoing embodiment shown in FIG. 11; details are not repeated here.
  • the third combinable capability inputs the dialogue information of the current round of dialogue into a natural language understanding model, and obtains an intent vector corresponding to the dialogue information of the current round of dialogue.
  • the second device can, through the third combinable capability, convert the dialogue information of the current round of dialogue received from the first combinable capability into text information based on automatic speech recognition (ASR) technology, and input the text information into the natural language understanding model.
  • the natural language understanding model can, based on a natural language understanding (NLU) algorithm, perform processing operations such as word segmentation, part-of-speech tagging, and keyword extraction, and output the dialogue information of the above-mentioned current round of dialogue as structured semantic representation data that can be understood by electronic devices; this structured semantic representation data may be referred to as an intent vector.
  • the NLU algorithm can perform intent classification and slot keyword extraction based on the textual dialogue information of the current round of dialogue.
  • For example, if the voice command of the current round of dialogue is "book a ticket to Beijing tomorrow", the third combinable capability can convert the voice command into text information and then perform intent classification and slot keyword extraction based on the NLU algorithm: the intent classification result is "book a ticket", the extracted time slot keyword is "tomorrow", and the extracted location slot keyword is "Beijing".
  • the third composable capability uses a pre-trained natural language encoder to encode the dialogue information of one or more rounds of historical dialogues, and obtains an encoding result corresponding to the dialogue information of each round of historical dialogues.
  • the second device may use the third combinable capability, and the process of encoding the dialog information of a certain round of historical dialog based on the pre-trained natural language encoder may include steps a) to c) as follows:
  • FIG. 13 exemplarily shows a process of encoding dialog information of a certain round of historical dialog.
  • the dialogue information of this round of historical dialogue may include: the dialogue round number of this round of historical dialogue, the user's input information in this round of historical dialogue, the device name of the device receiving this round of historical dialogue, the device state of the device receiving this round of historical dialogue, and the list of near-field devices of the device receiving this round of historical dialogue.
  • the dialogue round number of this round of historical dialogue is "the first round”
  • the user's input information in this round of historical dialogue is "make a phone call”
  • the device name of the device receiving this round of historical dialogue is " mobile phone”
  • the device status of the device receiving this round of historical dialogue is “power on”
  • the near-field device list of the device receiving this round of historical dialogue includes “big screen, watch, speaker”
  • When the dialogue information of this round of historical dialogue is processed, the dialogue information of this round of historical dialogue can first be restored to text described in natural language: {"the first round", "make a phone call", "mobile phone", "power on", "big screen, watch, speaker"}, and {"the first round", "make a phone call", "mobile phone", "power on", "big screen, watch, speaker"} can be encoded to obtain multiple vectors corresponding to the dialogue information of this round of historical dialogue.
  • Then, the average value of the multiple vectors can be calculated to obtain the encoding result corresponding to the dialogue information of this round of historical dialogue.
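The restore-and-encode step can be sketched as follows; the hash-based toy embedding stands in for the pre-trained natural language encoder, which the embodiment does not further specify:

```python
import hashlib

def toy_embed(token, dim=8):
    # Stand-in for a pre-trained natural language encoder: derive a
    # deterministic pseudo-embedding from the token text.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def encode_history_round(fields, dim=8):
    """Encode one round of historical dialogue: restore the dialogue
    information to natural-language text fields, encode each field,
    and average the resulting vectors."""
    vectors = [toy_embed(f, dim) for f in fields]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

fields = ["the first round", "make a phone call", "mobile phone",
          "power on", "big screen, watch, speaker"]
print(encode_history_round(fields))
```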
  • In some embodiments, the third combinable capability may first pass the dialogue information of the above-mentioned historical dialogues through a recall engine with the specified matching rules to obtain the dialogue information of the historical dialogues related to the dialogue information of the current round of dialogue, and then encode only the dialogue information of the historical dialogues related to the dialogue information of the current round of dialogue before performing the following step S1204.
  • the third combinable capability fuses the intent vector corresponding to the dialogue information of the current round of dialogue with the encoding result corresponding to the dialogue information of each round of historical dialogue to obtain fusion features.
  • the third combinable capability inputs the fusion feature into the correlation model, and obtains the correlation score between the dialog information of the current round of dialog and the dialog information of each round of historical dialog output by the correlation model.
  • FIG. 14 exemplarily shows a schematic composition diagram of the correlation model provided by the embodiment of the present application.
  • a relevance model may include a sentence pair input encoder, a relevance scoring network, and a keyword extraction network.
  • the sentence pair input encoder can encode the fused feature of the intent vector corresponding to the dialogue information of the current round of dialogue and the encoding result corresponding to the dialogue information of each round of historical dialogue;
  • the correlation scoring network can generate, according to the encoding result of the fused feature, the correlation score between the dialogue information of the current round of dialogue and the dialogue information of each round of historical dialogue;
  • the keyword extraction network can extract the keywords in the dialogue information of each round of historical dialogue according to the encoding result of the fused feature.
  • The correlation scoring network generates the correlation score between the dialogue information of the current round of dialogue and the dialogue information of each round of historical dialogue according to the encoding result of the fused feature, so as to obtain the dialogue information of the historical dialogues related to the dialogue information of the current round of dialogue. This may include: for the dialogue information of each round of historical dialogue, when the correlation score between the dialogue information of this round of historical dialogue and the dialogue information of the current round of dialogue is greater than threshold 1 (for example, 0.8), determining that the dialogue information of this round of historical dialogue is the dialogue information of a historical dialogue related to the dialogue information of the current round of dialogue.
  • the threshold 1 may be a preset value configured manually, and the present application does not limit the size of the threshold 1 .
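Putting the fusion and scoring steps together, a toy sketch might look like the following; the concatenation fusion and the mean-based scorer are stand-ins for the trained sentence pair input encoder and correlation scoring network of FIG. 14:

```python
def fuse(intent_vec, history_vec):
    # Fused feature: here simply the concatenation of the two vectors.
    return intent_vec + history_vec

def correlation_score(fused):
    # Toy correlation model: squash the mean of the fused feature into [0, 1].
    # A real implementation would use the trained sentence pair input encoder
    # and correlation scoring network of FIG. 14.
    mean = sum(fused) / len(fused)
    return max(0.0, min(1.0, mean))

THRESHOLD_1 = 0.8  # threshold 1 from the embodiment

def related_histories(intent_vec, history_encodings):
    """history_encodings: iterable of (history_record, encoding) pairs."""
    return [h for h, vec in history_encodings
            if correlation_score(fuse(intent_vec, vec)) > THRESHOLD_1]
```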
  • the third combinable capability determines the dialog information of the historical dialog related to the dialog information of the current round of dialog based on the correlation score obtained above.
  • the dialog information of the historical dialog related to the dialog information of the current round of dialog may be referred to as the first historical interaction information.
  • the third combinable capability identifies the user's intention based on the dialog information of the current round of dialog and the dialog information of the related historical dialog.
  • The matching analysis method provided by this application can uniformly encode the historical dialogue information (including the input text, device state, etc.) and score its correlation with the current round of dialogue, which can improve the matching accuracy and efficiency of historical dialogue information. Moreover, since the dialogue information is described in natural language, when new dialogue information is added, the encoding of the dialogue information can be generated automatically without manually defining key-value pairs in a dictionary, and the encoding can better represent the content of the dialogue information.

Abstract

This application discloses a method, related apparatus, and system for providing services based on multiple devices. In this method, multiple devices form a communication system, and a central control device among the multiple devices can uniformly schedule some or all of the resources in the communication system, thereby providing services for users. Having the central control device uniformly schedule some or all of the resources in the communication system can efficiently integrate the resources within the system, realize cross-device resource interconnection and sharing, and provide users with natural, intelligent services.

Description

Method, related apparatus, and system for providing services based on multiple devices
This application claims priority to the Chinese patent application with application number 202111340372.5, entitled "Method, Related Apparatus, and System for Providing Services Based on Multiple Devices", filed with the Chinese Patent Office on November 12, 2021, and to the Chinese patent application with application number 202111633492.4, entitled "Method, Related Apparatus, and System for Providing Services Based on Multiple Devices", filed with the Chinese Patent Office on December 28, 2021, the entire contents of which are incorporated by reference in this application.
Technical Field
This application relates to the field of terminal and communication technologies, and in particular to a method, related apparatus, and system for providing services based on multiple devices.
Background
With the popularization of terminal devices, an individual can own multiple terminal devices, such as a mobile phone, a tablet computer, a smart screen, and so on. How to use the resources of multiple terminal devices to provide users with natural, intelligent services is a current and future research direction.
Summary
This application provides a method, related apparatus, and system for providing services based on multiple devices, which can realize cross-device resource interconnection and sharing and provide users with natural, intelligent services.
In a first aspect, an embodiment of this application provides a communication system for providing services based on multiple devices. The communication system includes multiple electronic devices, the multiple electronic devices include a central control device, and the central control device is used to manage multiple resources so that the multiple resources perform the following steps: a first resource among the multiple resources detects a first event, where the number of first resources is one or more; a second resource among the multiple resources executes a to-be-executed task corresponding to the first event, where the number of second resources is one or more; all the resources included in the first resource and/or the second resource come from at least two different electronic devices; and the multiple resources managed by the central control device include some or all of the resources of the multiple electronic devices.
The multiple resources may include, but are not limited to, camera resources, microphone resources, sensor resources, display resources, or computing resources. When the number of first resources is multiple, they may be multiple resources of the same type (such as multiple camera resources, where the multiple camera resources may be multiple camera resources of the same device or multiple camera resources of multiple devices), or multiple resources of different types (such as a camera resource and a microphone resource).
Through the communication system of the first aspect, the central control device can uniformly schedule some or all of the resources in the communication system, efficiently integrate the resources within the system, realize cross-device resource interconnection and sharing, and provide users with natural, intelligent multi-device collaborative services.
With reference to the first aspect, in some implementations, the number of first resources is one or more and the number of second resources is one or more. That all the resources included in the first resource and/or the second resource come from at least two different electronic devices may mean: multiple first resources come from multiple different electronic devices; or multiple second resources come from multiple different electronic devices; or any two or more of the first resources and the second resources come from different electronic devices, for example, when there is only one first resource and one second resource, the first resource and the second resource come from different devices respectively, or, when the first resource or the second resource includes multiple resources, any one of the first resources and any one of the second resources come from different devices, and so on. The multiple different electronic devices mentioned above are all electronic devices in the communication system.
With reference to the first aspect, in some implementations, the central control device is further used to manage the multiple resources so that the multiple resources perform: before the second resource executes the to-be-executed task corresponding to the first event, a third resource among the multiple resources identifies the user intention represented by the first event and determines the to-be-executed task that satisfies the user intention.
Through the previous implementation, the third resource in the communication system can identify the user intention represented by the first event and split the user intention into to-be-executed tasks, facilitating subsequent execution of the to-be-executed tasks by the second resource.
With reference to the first aspect, in some implementations, the resources managed by the central control device are combinable capabilities, where a combinable capability is a resource described in a predetermined manner. The first resource is a first combinable capability, and the second resource is a second combinable capability.
The predetermined manner may include, but is not limited to, a predetermined format, protocol, or standard, and so on.
Through the previous implementation, each device can use a unified predetermined manner to deconstruct its own resources into combinable capabilities. Combinable capabilities obtained through unified deconstruction are decoupled from the device, device model, and device manufacturer, and can therefore be invoked across devices without obstacles by other devices in the communication system, that is, they support unified scheduling by the central control device, thereby meeting users' needs. Moreover, describing resources in a predetermined manner allows the method provided by the embodiments of this application to adapt to different devices and supports devices of different types and from different manufacturers joining the communication system to jointly provide services for users.
With reference to the previous implementation, in some implementations, the central control device is further used to: before managing the multiple resources so that the multiple resources perform the steps of the first aspect, configure the combinable capabilities of some or all of the multiple electronic devices as a virtual aggregation device. The first combinable capability and the second combinable capability are both combinable capabilities of the virtual aggregation device. After configuring the virtual aggregation device, the central control device can manage the combinable capabilities in the virtual aggregation device.
That the central control device configures the virtual aggregation device means configuring the parameters of the combinable capabilities of some or all of the multiple electronic devices. The configuration of parameters includes the configuration of parameters related to the flow of data processing. After the central control device configures the virtual aggregation device, it is equivalent to specifying the collection and processing flow of information.
Through the previous implementation, after the central control device configures the virtual aggregation device, for the upper-layer applications in each physical device in the communication system, the application can perceive the independent virtual aggregation device rather than multiple other separate physical devices. This makes it easier for each upper-layer application to schedule the resources in other physical devices more conveniently.
Through the previous implementation, by configuring the virtual aggregation device, the central control device can make preparations in advance for combinable capabilities that may be used later, which can improve the response speed and shorten the response delay when the combinable capability is subsequently started to provide services for the user. In addition, by configuring the virtual aggregation device, the communication system may only need to aggregate some of the combinable capabilities in the communication system, which can avoid wasting unnecessary resources.
In some implementations, the central control device is further used to, before configuring the combinable capabilities of some or all of the multiple electronic devices as the virtual aggregation device, receive combinable capability information sent by devices other than the central control device, where the combinable capability information is used to indicate the combinable capabilities provided by the corresponding device. The central control device is specifically used to configure the combinable capabilities of some or all of the multiple electronic devices as the virtual aggregation device according to the combinable capability information of the multiple electronic devices.
In some implementations, the combinable capability information may also include attributes of the combinable capability, where the attributes of the combinable capability include any one or more of the following: the position, orientation, category, performance, parameters, version, or size of the combinable capability. The attributes of the combinable capability can be used by the central control device to better manage the resources of each electronic device later.
That is to say, before the central control device configures the virtual aggregation device, the electronic devices in the communication system can also synchronize combinable capability information with each other, which allows the central control device to learn the combinable capabilities of other devices in the communication system, facilitating subsequent flexible scheduling of some or all of the resources in the communication system to provide services for users and realize cross-device resource interconnection and sharing.
In some implementations, the electronic devices in the communication system can periodically synchronize combinable capability information with each other, or can synchronize combinable capability information with each other when a new device joins the communication system or a new device comes online. When the combinable capability information in a device is updated, the device can also send combinable capability information to the other electronic devices in the communication system. In this way, the central control device can learn the available combinable capabilities in the communication system in a timely manner, thereby scheduling some or all of the resources in the communication system more flexibly and providing better services for users.
In some implementations, the combinable capabilities indicated by the combinable capability information sent by another device and received by the central control device may be some or all of the combinable capabilities in that electronic device. The some or all combinable capabilities may be determined by the authentication result when the other device joins the communication system. The higher the level of the authentication result when an electronic device joins the communication system, the more types and/or quantities of combinable capabilities the electronic device provides for other devices to invoke. This enables an electronic device to open more combinable capabilities only to other devices it trusts, ensuring the information security of the electronic device.
In some other implementations, the some or all combinable capabilities may also be decided by the user according to the user's own needs.
In some implementations, after the central control device configures the virtual aggregation device, the virtual aggregation device is used to run a single smart assistant, and the single smart assistant is used to support the central control device in managing the multiple resources so that the multiple resources perform the steps in the first aspect. That is, the physical devices where the combinable capabilities included in the virtual aggregation device are located are used to run the single smart assistant.
By running a single smart assistant through the virtual aggregation device, the single smart assistant can facilitate the central control device in flexibly scheduling some or all of the resources in the communication system, thereby providing users with natural, intelligent services, without requiring each device to run its own smart assistant and the multiple smart assistants to interact internally through negotiation.
With reference to the previous implementation, in some implementations, the central control device is specifically used to configure the combinable capabilities of some or all of the multiple electronic devices as the virtual aggregation device according to one or more of the following: user state, device state, environment state, user profile, global context, or memory. In this way, the virtual aggregation device can be configured according to information from multiple aspects, and the virtual aggregation device can provide better services for users.
With reference to the above implementations in which the central control device configures the virtual aggregation device, in some implementations, the central control device is specifically used to configure the following as the virtual aggregation device: the combinable capabilities of the central control device itself, and a fourth combinable capability of electronic devices other than the central control device in the communication system. The fourth combinable capability may be determined in the following two ways:
(1) The fourth combinable capability is determined by the central control device according to a preset strategy.
The preset strategy may include, for example:
A full detection strategy, in which the central control device determines all the combinable capabilities of the electronic devices other than the central control device as the fourth combinable capability. Using the full detection strategy, various types of information can be obtained comprehensively and accurately, facilitating the provision of services for users.
A privacy-first strategy, in which the central control device determines the combinable capabilities that collect non-private content in the electronic devices other than the central control device as the fourth combinable capability. Using the privacy-first strategy can ensure that the user's privacy is not leaked.
A power-consumption-first strategy, in which the central control device determines the combinable capabilities in the electronic devices, other than the central control device, that are connected to a power supply as the fourth combinable capability. Using the power-consumption-first strategy, the battery level of each device can be fully considered when obtaining environment information, preventing the batteries of the devices in the communication system from being drained.
(2) The fourth combinable capability is determined by the central control device according to environment information, after the central control device learns the environment information using its own combinable capabilities.
The above determination method (2) confirms the initialization configuration of the virtual aggregation device according to the exploration results of the central control device, and has characteristics such as flexibility and convenience.
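As an illustration only, the preset strategies in (1) can be sketched as a selection filter; the record fields ('capabilities', 'private', 'on_mains_power') and strategy names are hypothetical, not part of the embodiment:

```python
def fourth_capabilities(devices, strategy="full"):
    """Select fourth combinable capabilities from devices other than the
    central control device according to a preset strategy (hypothetical
    record fields)."""
    selected = []
    for dev in devices:
        for cap in dev["capabilities"]:
            if strategy == "full":                                 # full detection strategy
                selected.append(cap)
            elif strategy == "privacy" and not cap["private"]:     # privacy-first strategy
                selected.append(cap)
            elif strategy == "power" and dev["on_mains_power"]:    # power-consumption-first
                selected.append(cap)
    return selected
```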
Through the previous implementation, multiple devices in the communication system can be initialized in different environments as a virtual aggregation device with a central control device, making preparations in advance for combinable capabilities that may be used later.
In some implementations, the above initialization process of the virtual aggregation device can be performed when the communication system is started for the first time, is restarted, or when a new device joins.
With reference to the above implementations of initializing the virtual aggregation device, in some implementations, after the central control device configures the combinable capabilities of some or all of the multiple electronic devices as the virtual aggregation device, the central control device can also be used to manage the multiple resources so that the multiple resources perform: the first combinable capability detects a second event; the second combinable capability determines a service plan according to the second event. After that, the central control device is further used to reconfigure the combinable capabilities corresponding to the service plan as the virtual aggregation device.
In some implementations, the central control device is further used to manage the multiple resources so that the multiple resources perform: the second resource analyzes the user's needs according to the second event, and determines the service plan according to the user's needs.
In some implementations, the second resource can use any one of fixed rules, a knowledge graph, or machine learning to analyze the user's needs according to the second event.
Through the above implementation of reconfiguring the virtual aggregation device, the central control device can, on the basis of the current existing virtual aggregation device, continuously detect the states of the user, devices, environment, and so on through the virtual aggregation device, analyze the user's potential service needs according to the detected information, and adaptively adjust the virtual aggregation device, that is, reconfigure the virtual aggregation device. In this way, in scenarios where the user, device, and environment states change continuously, the virtual aggregation device can be controlled to reconfigure dynamically and adaptively, so that it can accurately and personally satisfy the user's confirmed or potential (not yet occurred) service needs, thereby providing better services for users. The current existing virtual aggregation device may be an initialized virtual aggregation device, or a virtual aggregation device that has been reconfigured multiple times.
With reference to the first aspect and any one of the above implementations, in some implementations, the first event includes any one of the following:
a first operation input by the user;
an event in which the user state changes;
an event in which the distance between the user and an electronic device changes;
an event in which the environment state changes;
an event in which an electronic device obtains a notification message or obtains schedule information about to be executed.
That is to say, the communication system provided by the embodiments of this application can not only respond to the user's interactive behavior to provide services for the user, but can also provide services for the user according to information such as changes in the user state, environment changes, and device states, realizing natural, intelligent multi-device collaborative services.
In some implementations, the first combinable capability can be determined in any one of the following ways:
(1) The first combinable capability includes multiple combinable capabilities for collecting data of a first modality.
That is to say, for data of a certain modality, the communication system can use multiple combinable capabilities that collect data of this modality to detect the first event. In this way, modality information collected through multiple channels can be fused to obtain more accurate and richer modality information, facilitating the accuracy of subsequent operations.
(2) The first combinable capability is determined by the central control device according to one or more of user habits, the activity level of combinable capabilities, the distance between combinable capabilities and the user, or a default ranking.
For example, according to user habits, the central control device may preferentially select the combinable capability most frequently invoked in the history records as the first combinable capability, or select the combinable capability with the highest activity level as the first combinable capability, or select the combinable capability closest to the user as the first combinable capability, or select the combinable capability ranked highest in the default ranking as the first combinable capability. The default ranking may be determined according to device priority.
(3) The first combinable capability includes a combinable capability selected by the user.
In this way, the central control device can select the first combinable capability according to the user's actual needs.
(4) The first combinable capability includes a combinable capability in the electronic device where the user's attention is located.
In this way, the central control device can take the combinable capability in the device where the user's attention is located as the first combinable capability.
In some implementations, the second combinable capability can be determined in any one of the following ways:
(1) The second combinable capability includes a combinable capability in the device where the first combinable capability is located.
In this way, the central control device can select the first combinable capability and the second combinable capability in the same device.
(2) The second combinable capability is determined by the central control device according to one or more of user habits, the activity level of combinable capabilities, the distance between combinable capabilities and the user, or a default ranking.
For example, according to user habits, the central control device may preferentially select the combinable capability most frequently invoked in the history records as the second combinable capability, or select the combinable capability with the highest activity level as the second combinable capability, or select the combinable capability closest to the user as the second combinable capability, or select the combinable capability ranked highest in the default ranking as the second combinable capability. The default ranking may be determined according to device priority.
(3) The second combinable capability includes a combinable capability selected by the user.
In this way, the central control device can select the second combinable capability according to the user's actual needs.
(4) The second combinable capability includes a combinable capability in the electronic device where the user's attention is located.
In this way, the central control device can take the combinable capability in the device where the user's attention is located as the second combinable capability.
With reference to the above two implementations, in some implementations, the central control device is further used to determine the device where the user's attention is located in any one of the following ways:
determining the device where the user's attention is located through images collected by a fourth device;
determining the device where the user's attention is located through audio collected by the fourth device and audio and images collected by a fifth device;
determining the device where the user's attention is located through images collected by the fourth device and images collected by the fifth device.
The fourth device and the fifth device may be any devices in the communication system.
With reference to the first aspect, in some implementations, the multiple electronic devices in the communication system are used to determine the central control device from the multiple electronic devices in any one of the following situations:
(1) When an electronic device among the multiple electronic devices receives a second operation. That is, the communication system can determine the central control device when triggered by the user.
(2) When a preset time arrives. That is, the communication system can determine the central control device periodically or aperiodically according to preset rules.
(3) When an electronic device joins or leaves the communication system.
(4) After a preset duration since the multiple electronic devices formed the communication system. That is, after the communication system is formed, the communication system can delay determining the central control device, so that the communication system can collect more comprehensive device information to elect the central control device and thus elect a more suitable central control device.
With reference to the first aspect and the previous implementation, the strategies by which the multiple electronic devices in the communication system determine the central control device may include the following:
Strategy 1: determine the central control device from the multiple electronic devices according to one or more of resource stability, device modality, or user habits. For example, a device with relatively stable computing resources, a device with relatively stable memory resources, a device with a relatively stable power supply, a device with more available modalities, or a device frequently used by the user may be determined as the central control device.
Strategy 2: determine an electronic device of a preset type among the multiple electronic devices as the central control device. For example, a smart screen may be determined as the central control device.
Strategy 3: determine the electronic device selected by the user as the central control device. In this way, the central control device can be determined according to the user's actual needs.
Strategy 4: determine the central control device from the multiple electronic devices according to the historical interaction information of each electronic device.
In some implementations, the historical interaction information of an electronic device may include, but is not limited to, any one or more of the following: device identifier, device type, current power consumption, available resources, device modality, current usage state, online information, offline information, historical interaction information with other devices in the communication system 10, device location (such as a room, living room, etc.), orientation, and the type of environment in which the device is located (such as an office, home, etc.).
In some implementations, the above strategy 4 specifically includes:
determining the electronic device with the largest average number of online devices as the central control device, where the average number of online devices is the average of the numbers of devices in the communication system that come online per unit time, as counted by the electronic device within a statistical time period;
determining the electronic device with the largest normalized standard deviation of the average number of online devices as the central control device;
determining an electronic device whose average number of online devices is greater than a first value and whose normalized standard deviation of the average number of online devices is greater than a second value as the central control device;
or determining the electronic device with the largest mathematical expectation of the average number of online devices as the central control device.
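As an arithmetic illustration of strategy 4, the sketch below computes each candidate's average number of online devices and its normalized standard deviation from per-unit-time counts and applies the combined criterion; the first and second values are placeholder numbers:

```python
from statistics import mean, pstdev

def election_metrics(online_counts):
    """online_counts: numbers of devices seen online per unit time within
    the statistical time period, as counted by one candidate device."""
    avg = mean(online_counts)
    # Normalized standard deviation (coefficient of variation).
    norm_std = pstdev(online_counts) / avg if avg else 0.0
    return avg, norm_std

def elect(candidates, first_value=2.0, second_value=0.1):
    """candidates: {device_name: [counts...]}; keep devices whose average
    online-device count exceeds the first value and whose normalized
    standard deviation exceeds the second value, then pick the largest
    average among them."""
    eligible = []
    for name, counts in candidates.items():
        avg, norm_std = election_metrics(counts)
        if avg > first_value and norm_std > second_value:
            eligible.append((name, avg, norm_std))
    return max(eligible, key=lambda t: t[1], default=None)

print(elect({"smart screen": [3, 4, 5, 4], "phone": [1, 2, 1, 2]}))
```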
With reference to the first aspect and the previous implementation, the number of central control devices determined by the multiple electronic devices in the communication system may be multiple, and the multiple central control devices are connected to all the electronic devices in the communication system at the same time or in the same space. In this way, the central control device can interact directly with the other devices in the communication system, thereby making full use of the information of each electronic device to provide services for users.
In some implementations, the central control device is specifically used to manage the multiple resources so that the multiple resources perform: the third resource splits the user intention into multiple to-be-executed tasks in units of modality; different second resources execute to-be-executed tasks of different modalities.
Through the previous implementation, splitting the user intention into multiple to-be-executed tasks according to modality and distributing to-be-executed tasks of different modalities to different second resources for execution can provide better services for users.
With reference to the first aspect, in some implementations, the to-be-executed tasks that satisfy the user intention include multiple tasks with logical relationships, where the logical relationships include any one or more of the following: sequential relationships, conditional relationships, loop relationships, or Boolean logic. The central control device is specifically used to manage the multiple resources so that the multiple resources perform: the second resource executes the multiple tasks with logical relationships according to the logical relationships.
Through the previous implementation, the multiple resources provided by this embodiment can execute multiple tasks with logical relationships based on the user's explicit or implicit instructions. In this way, the types of tasks that the communication system provided by this embodiment can execute are broader, and the system can better satisfy users' complex needs, thereby providing better services for users.
With reference to the first aspect, in some implementations, the central control device is further used to manage the multiple resources so that the multiple resources perform the following steps: before the third resource identifies the user intention represented by the first event, multiple first resources receive interactive input, and the third resource generates a global context according to the interactive input. The global context includes one or more of the following: the time at which the first resource received the interactive input, the first resource, the interactive content of the interactive input, the physiological feature information of the user corresponding to the interactive input, the device information of the electronic device to which the first resource belongs, or the device information of the target device controlled by the interactive input. The central control device is specifically used to manage the multiple resources so that the multiple resources perform the following steps: the third resource identifies the user intention represented by the first event based on the global context.
With reference to the above implementation in which the third resource identifies the user intention represented by the first event based on the global context, in some implementations, the interactive input includes historical input and current input, and the global context includes historical interaction information and current round interaction information. The central control device is specifically used to manage the multiple resources so that the multiple resources perform the following steps: the first resource obtains the historical interaction information based on the historical input and obtains the current round interaction information based on the current input; the third resource matches, from the historical interaction information, first historical interaction information associated with the current round interaction information; and the third resource identifies the user intention represented by the first event based on the first historical interaction information.
In some implementations, the first historical interaction information includes: historical interaction information related to a first user, where the first user is the user who triggered the current input; or historical interaction information received by a sixth device at a first time, where the sixth device is the first device or a near-field device of the first device, and the interval between the first time and the time of receiving the current round interaction information is less than a first duration; or second historical interaction information received at a second time, where the target device of the second historical interaction information is the target device of the current round interaction information or a near-field device thereof, and the interval between the second time and the time of receiving the current round interaction information is less than a second duration; or historical interaction information whose correlation with the current round interaction information is greater than a threshold. Through the previous implementation, when the third resource identifies the user intention represented by the first event in combination with the global context, the third resource can analyze the user's service needs according to the received information from multiple aspects such as the user, devices, environment states, and historical interaction information, and can therefore determine the user's intention more accurately and personally, thereby providing better services for users.
With reference to the first aspect, in some implementations, the first event includes first dialogue information. The first dialogue information contains a first instruction and a second instruction, the intention corresponding to the first instruction is associated with the intention corresponding to the second instruction, and the first instruction includes a first pronoun. The central control device is further used to manage the multiple resources so that the multiple resources perform the following steps: before the second resource identifies the user intention represented by the first event, replace the object referred to by the first pronoun in the first dialogue information with the object corresponding to the second instruction, so as to obtain second dialogue information. The central control device is specifically used to manage the multiple resources so that the multiple resources perform the following steps: the third resource identifies the user intention represented by the first event based on the second dialogue information.
In some implementations, the steps of replacing the first pronoun in the first dialogue information with the object corresponding to the second instruction to obtain the second dialogue information may be as follows: 1. Divide the first dialogue information into the first instruction and the second instruction, where the first instruction includes the first pronoun. 2. Identify the first instruction that includes the first pronoun. 3. Based on an intention classification template, identify the intention corresponding to the first instruction and the intention corresponding to the second instruction. 4. When it is determined that the intention corresponding to the first instruction is associated with the intention corresponding to the second instruction, merge the first instruction and the second instruction. 5. Based on the merged first instruction and second instruction, replace the object referred to by the first pronoun with the object corresponding to the second instruction to obtain the second dialogue information.
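The five steps can be pictured with a toy sketch on a single-round input; the string splitting and referent extraction are stand-ins for the intention classification template and semantic unit recognition described in the embodiments (see FIG. 16 and FIG. 17A to FIG. 17C):

```python
def resolve_single_round(dialogue, pronoun="it"):
    """Toy steps 1-5 on a single-round input such as
    'play the movie on the big screen and turn it up'."""
    # Step 1: divide the dialogue information into two instructions.
    first_part, second_part = dialogue.split(" and ", 1)
    # Step 2: identify the instruction that contains the pronoun.
    if pronoun in second_part.split():
        with_pronoun, other = second_part, first_part
    else:
        with_pronoun, other = first_part, second_part
    # Steps 3-4: assume the two intentions are associated (a real system
    # would check this against an intention classification template) and
    # merge the two instructions.
    referent = other.split()[-1]          # toy referent extraction
    # Step 5: substitute the pronoun with the referent.
    resolved = with_pronoun.replace(pronoun, referent)
    return f"{other}; {resolved}"

print(resolve_single_round("play the movie on the big screen and turn it up"))
```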
Through the previous implementation, when the first dialogue information input by the user in a single round includes a pronoun, before identifying the represented user intention based on the first dialogue information, the third resource can first replace the pronoun in the first dialogue information with the corresponding referent, so as to obtain the second dialogue information in which the pronoun has been replaced. In this way, the third resource can determine the user's intention more accurately based on the second dialogue information, thereby providing better services for users.
With reference to the first aspect, in some implementations, the central control device is further used to manage the multiple resources so that the multiple resources perform the following steps: before the third resource identifies the user intention represented by the first event, the first resource receives interactive input within a first preset time, and the third resource determines a memory based on the interactive input, where the memory represents the habits or preferences of interaction between the user and devices. The central control device is specifically used to manage the multiple resources so that the multiple resources perform the following steps: the third resource identifies the user intention represented by the first event based on the memory.
In some implementations, memories can be divided into short-term memories and long-term memories, where a short-term memory can represent the habits or preferences of interaction between the user and devices based on interactive input satisfying a first condition, and a long-term memory can represent the habits or preferences of interaction between the user and devices based on interactive input satisfying a second condition.
In some implementations, the first condition may mean that the above interactive input was received within a preset time window (for example, within the last 6 hours), and the second condition may mean that the above interactive input was received within each of multiple consecutive preset time windows (for example, within 6 hours, within 8 hours).
In some implementations, the first condition may mean that within a specified time period 1 (for example, from 0:00 to 24:00), the number of times the above interactive input was received is greater than a third threshold, and the second condition may mean that within multiple consecutive specified time periods 1 (for example, from 0:00 to 24:00), the number of times the above interactive input was received is greater than the third threshold in each specified time period 1.
In some implementations, the third resource may construct the memory based on the interactive input through a principal component analysis algorithm, or through one or more artificial neural network algorithms among CNN, RNN, and LSTM.
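A small sketch of the first and second conditions over timestamped interactions, using the example window of 6 hours and a placeholder third threshold:

```python
from datetime import datetime, timedelta

def satisfies_first_condition(timestamps, now, window=timedelta(hours=6)):
    # First condition (window form): input received within the last window.
    return any(now - t <= window for t in timestamps)

def satisfies_second_condition(period_counts, third_threshold=3):
    # Second condition (count form): in each of several consecutive
    # specified time periods (e.g. whole days), the input was received
    # more than `third_threshold` times.
    return len(period_counts) > 1 and all(c > third_threshold for c in period_counts)

now = datetime(2021, 11, 12, 22, 0)
print(satisfies_first_condition([datetime(2021, 11, 12, 20, 30)], now))  # True
print(satisfies_second_condition([4, 5, 6]))                              # True
```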
Through the previous implementation, the third resource can construct a memory representing the user's habits or preferences based on the user's interactive input, and the third resource can identify the user intention represented by the first event based on the memory. In this way, the third resource can accurately and personally satisfy the user's confirmed or potential (not yet occurred) service needs, thereby providing better services for users. With reference to the first aspect, in some implementations, the central control device is further used to manage the multiple resources so that the multiple resources perform the following steps: before the third resource identifies the user intention represented by the first event, the third resource obtains a user profile. The central control device is specifically used to manage the multiple resources so that the multiple resources perform the following steps: the third resource identifies the user intention represented by the first event based on the user profile.
Through the previous implementation, the third resource can construct a user profile based on the user's interactive input, and the third resource can identify the user intention represented by the first event based on the user profile. In this way, the third resource can accurately and personally satisfy the user's confirmed or potential (not yet occurred) service needs, thereby providing better services for users.
In some implementations, the central control device is specifically used to manage the multiple resources so that the multiple resources perform the following steps: the third resource identifies the user intention represented by the first event and determines the to-be-executed task satisfying the user intention according to any one or more of the following: user state, device state, environment state, user profile, global context, or memory. In this way, the user intention represented by the first event can be identified according to information from multiple aspects, thereby providing better services for users.
In some implementations, the first event includes data of multiple modalities, and the central control device is specifically used to manage the multiple resources so that the multiple resources perform the following steps: the first resource collects the corresponding modality data using a first sampling rate, where the first sampling rate is a preset sampling rate, or the first sampling rate is the sampling rate of the most active resource among the multiple resources included in the first resource. In this way, different first resources can use a unified sampling rate to collect data, data of multiple modalities with similar data volumes can be obtained, and the third resource can fuse the multimodal data more conveniently and quickly to identify the user intention represented by the first event.
In some implementations, the combinable capabilities of the multiple electronic devices in the communication system include interaction-class combinable capabilities and service-class combinable capabilities. The first combinable capability belongs to the interaction-class combinable capabilities, and the second combinable capability belongs to the service-class combinable capabilities.
In some implementations, the combinable capabilities of the multiple electronic devices in the communication system include any one or more of the following: camera resources, microphone resources, sensor resources, display resources, or computing resources described in a predetermined manner.
With reference to the first aspect, in some implementations, the multiple electronic devices in the communication system can communicate through any one or more of the following technologies: WLAN, Wi-Fi P2P, BT, NFC, IR, ZigBee, UWB, hotspot, Wi-Fi softAP, cellular network, or wired technologies.
In a second aspect, an embodiment of this application provides a method for providing services based on multiple devices, applied to a central control device. The method includes: the central control device manages multiple resources so that the multiple resources perform the following steps: a first resource among the multiple resources detects a first event, where the number of first resources is one or more; a second resource among the multiple resources executes a to-be-executed task corresponding to the first event, where the number of second resources is one or more; all the resources included in the first resource and/or the second resource include resources of at least two different electronic devices; and the multiple resources managed by the central control device include some or all of the resources of multiple electronic devices, where the multiple electronic devices include the central control device.
By implementing the method provided by the second aspect, the central control device can uniformly schedule some or all of the resources of the multiple electronic devices in the communication system, efficiently integrate the resources within the system, realize cross-device resource interconnection and sharing, and provide users with natural, intelligent multi-device collaborative services.
With reference to the second aspect, in some implementations, the number of first resources is one or more and the number of second resources is one or more. That all the resources included in the first resource and/or the second resource come from at least two different electronic devices may mean: multiple first resources come from multiple different electronic devices; or multiple second resources come from multiple different electronic devices; or the multiple first resources include a first sub-resource, the multiple second resources include a second sub-resource, and the first sub-resource and the second sub-resource come from different electronic devices. The multiple different electronic devices mentioned above are all electronic devices in the communication system.
With reference to the second aspect, in some implementations, before the second resource executes the to-be-executed task corresponding to the first event, the central control device can manage the multiple resources so that the multiple resources perform: a third resource among the multiple resources identifies the user intention represented by the first event and determines the to-be-executed task satisfying the user intention.
Through the previous implementation, the central control device can manage the third resource so that the third resource identifies the user intention represented by the first event and splits the user intention into to-be-executed tasks, facilitating subsequent execution of the to-be-executed tasks by the second resource.
In the second aspect, for the definitions of resources and combinable capabilities, reference may be made to the relevant descriptions of the first aspect.
With reference to the second aspect, in some implementations, before the central control device manages the multiple resources so that the multiple resources perform the steps in the second aspect, the central control device can configure the combinable capabilities of some or all of the multiple electronic devices as a virtual aggregation device, where the first combinable capability and the second combinable capability are both combinable capabilities of the virtual aggregation device. After configuring the virtual aggregation device, the central control device can manage the combinable capabilities in the virtual aggregation device.
That the central control device configures the virtual aggregation device means configuring the parameters of the combinable capabilities of some or all of the multiple electronic devices. The configuration of parameters includes the configuration of parameters related to the flow of data processing. After the central control device configures the virtual aggregation device, it is equivalent to specifying the collection and processing flow of information.
In the second aspect, for the technical effects of the central control device configuring the virtual aggregation device, reference may be made to the relevant descriptions of the first aspect.
In some implementations, before the central control device configures the combinable capabilities of some or all of the multiple electronic devices as the virtual aggregation device, the central control device can receive combinable capability information sent by devices other than the central control device, where the combinable capability information is used to indicate the combinable capabilities provided by the corresponding device. Here, for the definition of combinable capability information, reference may be made to the relevant descriptions of the first aspect. After that, the central control device can configure the combinable capabilities of some or all of the multiple electronic devices as the virtual aggregation device according to the combinable capability information of the multiple electronic devices.
For the way in which the central control device configures the virtual aggregation device, reference may be made to the relevant descriptions of the first aspect.
In some implementations, after the central control device configures the virtual aggregation device, the virtual aggregation device is used to run a single smart assistant, and the single smart assistant is used to support the central control device in managing the multiple resources so that the multiple resources perform the steps in the second aspect. That is, the physical devices where the combinable capabilities included in the virtual aggregation device are located are used to run the single smart assistant.
In the second aspect, for the categories of the first event, reference may be made to the relevant descriptions of the first aspect.
In the second aspect, for the ways of determining the first resource and the second resource, reference may be made to the relevant descriptions of the first aspect.
In the second aspect, for the way of determining the central control device, reference may be made to the relevant descriptions of the first aspect.
In some implementations, the central control device manages the multiple resources so that the multiple resources perform: the third resource splits the user intention into multiple to-be-executed tasks in units of modality; different second resources execute to-be-executed tasks of different modalities. In this way, the third resource splits the user intention into multiple to-be-executed tasks according to modality, and the central control device distributes to-be-executed tasks of different modalities to different second resources for execution, which can provide better services for users.
In the second aspect, for the types of to-be-executed tasks satisfying the user intention, reference may be made to the relevant descriptions of the first aspect.
With reference to the second aspect, in some implementations, the central control device manages the multiple resources so that the multiple resources perform: before the third resource identifies the user intention represented by the first event, multiple first resources receive interactive input, and the third resource generates a global context according to the interactive input. The global context includes one or more of the following: the time at which the first resource received the interactive input, the first resource, the interactive content of the interactive input, the physiological feature information of the user corresponding to the interactive input, the device information of the electronic device to which the first resource belongs, or the device information of the target device controlled by the interactive input. The third resource identifies the user intention represented by the first event based on the global context.
With reference to the above implementation in which the third resource identifies the user intention represented by the first event based on the global context, in some implementations, the interactive input includes historical input and current input, and the global context includes historical interaction information and current round interaction information. The central control device manages the multiple resources so that the multiple resources perform: the first resource obtains the historical interaction information based on the historical input and obtains the current round interaction information based on the current input; the third resource matches, from the historical interaction information, first historical interaction information associated with the current round interaction information; and the third resource identifies the user intention represented by the first event based on the first historical interaction information.
Through the previous implementation, when the third resource identifies the user intention represented by the first event in combination with the global context, the third resource can analyze the user's service needs according to the received information from multiple aspects such as the user, devices, environment states, and historical interaction information, and can therefore determine the user's intention more accurately and personally, thereby providing better services for users.
With reference to the second aspect, in some implementations, the first event includes first dialogue information. The first dialogue information contains a first instruction and a second instruction, the intention corresponding to the first instruction is associated with the intention corresponding to the second instruction, and the first instruction includes a first pronoun. The central control device manages the multiple resources so that the multiple resources perform: before the second resource identifies the user intention represented by the first event, replace the object referred to by the first pronoun in the first dialogue information with the object corresponding to the second instruction, so as to obtain second dialogue information. The third resource identifies the user intention represented by the first event based on the second dialogue information.
In some implementations, for the steps of replacing the first pronoun in the first dialogue information with the object corresponding to the second instruction to obtain the second dialogue information, reference may be made to the relevant descriptions of the first aspect.
Through the previous implementation, when the first dialogue information input by the user in a single round includes a pronoun, before identifying the represented user intention based on the first dialogue information, the third resource can first replace the pronoun in the first dialogue information with the corresponding referent, so as to obtain the second dialogue information in which the pronoun has been replaced. In this way, the third resource can determine the user's intention more accurately based on the second dialogue information, thereby providing better services for users.
With reference to the second aspect, in some implementations, the central control device manages the multiple resources so that the multiple resources perform: before the third resource identifies the user intention represented by the first event, the first resource receives interactive input within a first preset time, and the third resource determines a memory based on the interactive input, where the memory represents the habits or preferences of interaction between the user and devices. The third resource identifies the user intention represented by the first event based on the memory.
In some implementations, for the classification and definitions of memories, reference may be made to the relevant descriptions of the first aspect.
In some implementations, for the algorithms used for memory construction, reference may be made to the relevant descriptions of the first aspect.
Through the previous implementation, the third resource can construct a memory representing the user's habits or preferences based on the user's interactive input, and can identify the user intention represented by the first event based on the memory. In this way, the third resource can accurately and personally satisfy the user's confirmed or potential (not yet occurred) service needs, thereby providing better services for users.
In a third aspect, an embodiment of this application provides an electronic device, including a memory and one or more processors, where the memory is coupled to the one or more processors, the memory is used to store computer program code, the computer program code includes computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform the method of the second aspect or any implementation of the second aspect.
In a fourth aspect, an embodiment of this application provides a communication system, the communication system includes multiple electronic devices, the multiple electronic devices include a central control device, and the central control device is used to perform the method of the second aspect or any implementation of the second aspect.
In a fifth aspect, an embodiment of this application provides a computer-readable storage medium including instructions, and when the instructions run on an electronic device, the electronic device is caused to perform the method of the second aspect or any implementation of the second aspect.
In a sixth aspect, an embodiment of this application provides a computer program product, and when the computer program product runs on a computer, the computer is caused to perform the method of the second aspect or any implementation of the second aspect.
By implementing the technical solutions provided by this application, multiple devices form a communication system, and a central control device among the multiple devices can uniformly schedule some or all of the resources in the communication system, thereby providing services for users. Having the central control device uniformly schedule some or all of the resources in the communication system can efficiently integrate the resources within the system, realize cross-device resource interconnection and sharing, and provide users with natural, intelligent services.
Brief Description of the Drawings
FIG. 1A is a schematic structural diagram of the communication system provided by an embodiment of this application;
FIG. 1B is a schematic diagram of the software structure of the single smart assistant running on the communication system 10;
FIG. 2 is a schematic structural diagram of the electronic device 100 provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of the method for providing services based on multiple devices provided by an embodiment of this application;
FIG. 4 is an example of the categories of combinable capabilities provided by an embodiment of this application;
FIG. 5A to FIG. 5M are a group of user interfaces involved in forming the communication system 10 provided by an embodiment of this application;
FIG. 6A is a scenario diagram of delayed election of the central control device provided by an embodiment of this application;
FIG. 6B and FIG. 6C are scenario diagrams of electing the central control device provided by an embodiment of this application;
FIG. 6D is a scenario diagram of the same device joining different communication systems provided by an embodiment of this application;
FIG. 7 is a virtual aggregation device provided by an embodiment of this application;
FIG. 8A to FIG. 8D are schematic diagrams of the device where the user's attention is located provided by an embodiment of this application;
FIG. 9 is a scenario diagram of multiple devices providing services provided by an embodiment of this application;
FIG. 10A is a scenario diagram of another case of multiple devices providing services provided by an embodiment of this application;
FIG. 10B is a schematic flowchart of an interaction method based on a global context provided by an embodiment of this application;
FIG. 10C is a schematic diagram of a software architecture applied to interaction based on a global context provided by an embodiment of this application;
FIG. 11 is a schematic flowchart of matching analysis based on specified matching rules provided by an embodiment of this application;
FIG. 12 is a schematic flowchart of matching analysis based on a specified algorithm provided by an embodiment of this application;
FIG. 13 is a schematic flowchart of encoding the dialogue information of a certain round of historical dialogue provided by an embodiment of this application;
FIG. 14 is a schematic composition diagram of a correlation model provided by an embodiment of this application;
FIG. 15 is a scenario diagram of another case of multiple devices providing services provided by an embodiment of this application;
FIG. 16 is a schematic flowchart of a multi-instruction coreference resolution method in a single round of dialogue provided by an embodiment of this application;
FIG. 17A is a schematic flowchart of semantic unit recognition provided by an embodiment of this application;
FIG. 17B is a schematic diagram of a semantic unit recognition model provided by an embodiment of this application;
FIG. 17C is a schematic flowchart of coreference resolution for exemplary dialogue interaction information provided by an embodiment of this application;
FIG. 18 is a scenario diagram of another case of multiple devices providing services provided by an embodiment of this application;
FIG. 19A is a schematic flowchart of executing a long-term task provided by an embodiment of this application;
FIG. 19B is a schematic diagram of constructing the execution flow of a long-term task provided by an embodiment of this application;
FIG. 20 is a schematic flowchart of a personalized interaction method provided by an embodiment of this application;
FIG. 21A is a schematic diagram of a memory model provided by an embodiment of this application;
FIG. 21B is a scenario diagram of another case of multiple devices providing services provided by an embodiment of this application;
FIG. 21C is a scenario diagram of another case of multiple devices providing services provided by an embodiment of this application;
FIG. 21D is a schematic flowchart of an interaction method based on a user profile provided by an embodiment of this application;
FIG. 22 and FIG. 23 are scenarios of multiple devices providing services provided by embodiments of this application.
Detailed Description of Embodiments
The technical solutions in the embodiments of this application will be described clearly and thoroughly below with reference to the accompanying drawings. In the description of the embodiments of this application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" in the text merely describes an association relationship of associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, in the description of the embodiments of this application, "multiple" means two or more than two.
In the following, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of the feature. In the description of the embodiments of this application, unless otherwise specified, "multiple" means two or more than two.
The term "user interface (UI)" in the following embodiments of this application is a medium interface for interaction and information exchange between an application or the operating system and a user, which realizes the conversion between the internal form of information and the form acceptable to the user. A user interface is source code written in a specific computer language such as Java or extensible markup language (XML); the interface source code is parsed and rendered on the electronic device and finally presented as content that the user can recognize. A commonly used form of user interface is the graphical user interface (GUI), which refers to a user interface related to computer operations displayed in a graphical manner. It may be visual interface elements such as text, icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets displayed on the display of the electronic device.
This application provides a method, related apparatus, and system for providing services based on multiple devices. In this method, multiple devices form a communication system, the multiple devices can negotiate to decide on a central control device, and the central control device can select appropriate resources in the communication system to detect a specific event, analyze the user intention represented by the specific event, and execute tasks satisfying the user intention. In this way, the central control device can uniformly schedule some or all of the resources in the communication system to provide users with the services they need and satisfy users' needs.
Having the central control device uniformly schedule the resources of the multiple devices in the communication system can efficiently integrate the resources within the system, realize cross-device resource interconnection and sharing, and provide users with natural, intelligent multi-device collaborative services. In addition, when the user issues an instruction to any device in the communication system, the other devices in the communication system can continue to execute the corresponding tasks under the management of the central control device, without the user issuing additional cross-device operation instructions. This can be regarded as providing users with continuous, uninterrupted services.
The central control device may also be called a central device.
For simplicity of description, in the following embodiments of this application, the device used to detect a specific event, the device used to analyze the user intention represented by the specific event, and the device executing tasks satisfying the user intention are respectively called the first device, the second device, and the third device.
The central control device, the first device, the second device, and the third device may be the same device or different devices.
The central control device, the first device, the second device, and the third device may each include one or more devices, which is not limited by the embodiments of this application.
For the strategy by which the multiple electronic devices in the communication system negotiate to decide on the central control device, and the strategy by which the central control device selects the first device, the second device, and the third device in the communication system, reference may be made to the detailed descriptions of the subsequent method embodiments, which will not be elaborated here.
In some embodiments, the central control device can select an appropriate first device, second device, or third device in combination with one or more of the following: the user state, device state, and environment state historically detected by the communication system, the user profile, the global context, or the memory. For the definitions and acquisition methods of the user state, device state, environment state, user profile, global context, and memory, reference may be made to the detailed introductions of the subsequent embodiments.
A specific event may include an interactive operation input by the user, or an event in which the user state, device state, or environment state changes; for details, reference may be made to the relevant descriptions of the subsequent method embodiments.
In the embodiments of this application, the multiple devices in the communication system can jointly run a single smart assistant. The smart assistant supports the multiple devices in the communication system in negotiating to decide on the central control device, and supports the central control device in selecting appropriate resources among the multiple devices of the communication system to detect a specific event, analyze the user intention represented by the specific event, and execute tasks satisfying the user intention. With multiple devices running a single smart assistant, the devices can share and synchronize information based on the smart assistant, ensuring the consistency of the interaction context, user profile, and personalized data, thereby providing users with a coherent, consistent interaction experience. In addition, having the multiple devices in the communication system run one smart assistant can save system power consumption.
In the embodiments of this application, each device in the communication system can deconstruct or encapsulate its own resources in a unified manner into standardized combinable capabilities, and provide standardized interfaces for other devices in the communication system to invoke. It can be seen that a combinable capability is a single capability component abstracted from a physical device. Combinable capabilities can be divided into multiple different types; for details, reference may be made to the detailed introductions of the subsequent embodiments. The central control device can select appropriate combinable capabilities in the communication system to detect a specific event, analyze the user intention represented by the specific event, and execute tasks satisfying the user intention.
Combinable capabilities obtained through unified deconstruction are decoupled from the device, device model, and device manufacturer, and can therefore be invoked across devices without obstacles by other devices in the communication system, that is, they support the central control device in uniformly scheduling the resources of each device, thereby meeting users' needs. In addition, deconstructing the resources of each device into standardized combinable capabilities in a unified manner is equivalent to different devices all using the same resource description specification, so that the method provided by this application adapts to different devices and supports devices of different types and from different manufacturers joining the communication system to jointly provide services for users, with a wide scope of application.
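One way to picture such a unified description is a small capability descriptor; the field names below are hypothetical, since the predetermined format is left open by the embodiments:

```python
from dataclasses import dataclass, field

@dataclass
class CombinableCapability:
    """Hypothetical descriptor for a capability deconstructed from a device."""
    name: str                 # e.g. "microphone pickup", "screen display"
    category: str             # e.g. "interaction-class" or "service-class"
    device_id: str            # physical device the capability belongs to
    attributes: dict = field(default_factory=dict)  # position, orientation, ...

# A smart screen advertising some of its capabilities to the system:
smart_screen_caps = [
    CombinableCapability("screen display", "service-class", "smart-screen-01"),
    CombinableCapability("microphone pickup", "interaction-class", "smart-screen-01"),
]
```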
In the following embodiments of this application, the central control device can combine some or all of the resources in the communication system into a virtual aggregation device according to the user's actual needs; for example, the central control device can combine some or all of the combinable capabilities in the communication system into the virtual aggregation device. The virtual aggregation device can be used to detect a specific event, analyze the user intention represented by the specific event, and execute tasks satisfying the user intention. By combining the virtual aggregation device, the various resources the user needs can be prepared in advance for the user's actual needs, facilitating subsequent convenient and quick use of these resources to satisfy the user's needs. By combining the virtual aggregation device to adaptively select appropriate combinable capabilities, on the one hand, interactive peripherals that are close to the user, of interest to the user, less disturbed, highly accurate, and/or strong in perception can be selected, avoiding the impact on interaction accuracy of interaction signals picked up by ineffective or inefficient peripherals; on the other hand, the capabilities of devices with tight resources can be expanded by selecting AI algorithm models with stronger computing power and/or higher accuracy within the system, improving interaction recognition accuracy.
The communication system provided by the embodiments of this application is introduced first below.
Referring to FIG. 1A, FIG. 1A is a schematic structural diagram of the communication system 10 provided by an embodiment of this application.
As shown in FIG. 1A, the communication system 10 includes multiple electronic devices.
The multiple electronic devices in the communication system 10 may be of various types, which is not limited by the embodiments of this application. The multiple electronic devices may include smart devices such as mobile phones, tablet computers, desktop computers, laptop computers, handheld computers, notebook computers, smart screens, wearable devices, augmented reality (AR) devices, virtual reality (VR) devices, artificial intelligence (AI) devices, car head units, smart headphones, game consoles, and digital cameras; may also include Internet of things (IoT) devices or smart home devices such as smart speakers, smart lamps, smart air conditioners, water heaters, kettles, ovens, coffee machines, cameras, doorbells, and millimeter-wave sensors; and may also include office devices such as printers, scanners, fax machines, copiers, and projectors. Without being limited to these, the multiple electronic devices in the communication system 10 may also include non-portable terminal devices such as laptop computers with touch-sensitive surfaces or touch panels and desktop computers with touch-sensitive surfaces or touch panels, and so on.
The communication system 10 may include movable electronic devices, such as mobile phones, tablet computers, and smart bands, and may also include non-movable devices such as smart screens, smart lamps, and smart air conditioners.
The communication system 10 may include electronic devices produced by the same manufacturer or electronic devices produced by different manufacturers, which is not limited by the embodiments of this application.
In the embodiments of this application, communication systems in different scenarios may include different devices. For example, scenarios may include smart home scenarios, sports and health scenarios, audio-visual entertainment scenarios, smart office scenarios, smart travel scenarios, and so on. As shown in FIG. 1A, a smart home scenario may include a smart screen, an electric toothbrush, a wireless router, a smart speaker, a sweeping robot, a body fat scale, a watch, a mobile phone, and headphones. A smart office scenario may include a computer, a mouse, a wireless router, electric curtains, a desk lamp, a watch, a mobile phone, and headphones.
通信系统10中的多个电子设备可以包括配置有软件操作系统（operating system，OS）的智能设备如手机、智慧屏、电脑等等，也可以包括未配置OS的非智能设备如热水器、烧水壶等等。各个电子设备配置的OS可以不同，包括但不限于HarmonyOS等等。各个电子设备也可以都配置相同的软件操作系统，例如可以均配置HarmonyOS。
通信系统10中的电子设备和其他部分或全部电子设备建立有连接及会话,并可以基于该连接和会话通信。也就是说,通信系统10中的任意两个电子设备之间可以直接连接并通信,也可以通过另一电子设备间接通信,也可以并无连接及通信关系。
例如,在通信系统10中,智能手机A可以和智慧屏B直接通信,智慧屏B和智能手环C可以通过手机间接通信,智慧屏B和智能音箱D两者可以没有连接关系也不能直接通信。
电子设备之间建立通信连接后,即可以看作加入了同一个通信系统。电子设备之间可以通过多种方式来建立连接,例如可以在用户触发下建立连接,也可以由设备主动建立连接,这里不限定。
在一些实施例中，通信系统10还可以对电子设备或使用电子设备的用户进行鉴权或权限认证，在鉴权通过或权限认证通过之后，才允许该电子设备加入通信系统10。对电子设备或用户的鉴权或权限认证的方式，具体可参考后续实施例的相关描述。
通信系统10中电子设备之间可以通过以下任意一种或多种技术建立连接并通信:无线局域网(wireless local area network,WLAN)、无线保真直连(Wi-Fi direct)/无线保真点对点(Wi-Fi peer-to-peer,Wi-Fi P2P)、蓝牙(Bluetooth,BT)、近场通信(near field communication,NFC),红外(infrared,IR)、紫蜂(ZigBee)、超宽带(ultra wideband、UWB)、热点、Wi-Fi softAP、蜂窝网络、有线技术或远程连接技术等等。其中,蓝牙可以是经典蓝牙,也可以是低功耗蓝牙(bluetooth low energy,BLE)。
例如,电子设备可以通过无线局域网(WLAN)和处于同一个无线局域网内的其他设备通信。又例如,电子设备可以通过BT、NFC等近距离通信技术发现附近的其他设备,并和其他设备建立通信连接后通信。又例如,一个电子设备可以工作在无线接入点(access point,AP)模式并创建无线局域网,其他电子设备连接到该电子设备创建的无线局域网后,该电子设备和其他设备可以通过Wi-Fi softAP通信。又例如,多个电子设备可以登录同一账号或家庭账号或关联账号,例如登录相同的系统账号(如华为账号),然后各自通过3G、4G、5G等蜂窝网络技术或者广域网技术,和维护系统账号的服务器(例如华为提供的服务器)通信,然后通过该服务器通信。家庭账号是指家庭成员共同使用的一个账号。关联账号是指绑定的多个账号。
在本申请实施例中,一个通信系统内可以存在多种不同的连接方式。例如图1A所示,手机和电动牙刷之间可通过蓝牙通信,手机和智慧屏之间可通过Wi-Fi通信。
可见,通信系统10中的各个电子设备可以近距离通信,也可以远距离通信。也就是说,通信系统10中的各个电子设备可以位于同一个物理空间,也可以位于不同的物理空间。
通信系统10中的各个电子设备可以基于设备间的通信连接,相互同步或共享设备信息。该设备信息例如可包括但不限于:设备标识、设备类型、设备的可用能力,设备采集到的用户、设备及环境的状态信息等等。
在本申请实施例中,通信系统10中的多个电子设备可以基于各个设备的设备信息,协商决定中控设备,该中控设备可以在通信系统10的多个设备中选择合适的资源来检测特定事件、分析该特定事件所表征的用户意图、执行满足该用户意图的任务。
在本申请一些实施例中，中控设备也可以实现为分布式系统，可以分布在通信系统10的多个设备之上，利用该多个设备的部分或全部资源实现中控设备的功能。
在本申请实施例中,中控设备可以根据用户的实际需求将通信系统中的部分资源组合为虚拟聚合设备,例如,中控设备可以将通信系统中的部分或全部可组合能力组合为该虚拟聚合设备。该虚拟聚合设备可用于检测特定事件、分析该特定事件所表征的用户意图、执行满足该用户意图的任务。虚拟聚合设备可以部署于通信系统10中的一个或多个物理设备之上,可以由该一个或多个物理设备中的全部或部分资源整合而来。
在本申请实施例中,通信系统10中的各个电子设备可以将自身的资源,按照统一的方式解构为标准化的可组合能力。可组合能力可以分为多种不同的类型,具体可参考后续实施例的详细介绍。例如,智慧屏可以抽象出屏幕显示、摄像头录像、喇叭放音、麦克风拾音、多媒体播放服务等可组合能力。
通信系统10中的各个电子设备可以安装并运行独立的智慧助手,也可以不安装独立的智慧助手,这里不做限定。智慧助手是一种基于人工智能构建的应用程序,借助语音语义识别算法,通过与用户进行即时问答式的语音交互,帮助用户完成信息查询、设备控制、文本输入等操作。智慧助手通常采用分阶段级联处理,依次通过语音唤醒、语音前端处理、自动语音识别、自然语言理解、对话管理、自然语言生成、文本转语音、应答输出等流程实现上述功能。
在本申请实施例中,通信系统10中的多个电子设备可以共同运行一个单一的智慧助手。该智慧助手部署在通信系统10之上。在一些实施例中,通信系统10运行智慧助手的一个实例,各个设备中运行的标识(例如进程号)相同。在另一些实施例中,通信系统10也可以运行智慧助手的多个实例。实例是运行态的应用程序。实例可以指进程,也可以指线程。进程是应用程序在计算机上的一次执行活动。线程是应用程序执行中一个单一的顺序控制流程。一个进程可以包括多个线程。
在本申请实施例中,通信系统10中的多个电子设备共同运行的单一智慧助手,可以实现为系统应用、第三方应用、服务接口、小程序或网页中的任意一种或多种。
通信系统10运行的单一智慧助手用于支持通信系统10执行本申请实施例所提供的基于多设备提供服务的方法。
参考图1B,图1B为本申请实施例提供的运行于通信系统10之上的智慧助手的软件结构示意图。该通信系统10之上的智慧助手可以是虚拟聚合设备之上的单一智慧助手。
如图1B所示,该智慧助手可包括以下组件:
1.能力发现组件
能力发现组件可以部署于通信系统10的每个电子设备中。
能力发现组件用于和通信系统10中的其他电子设备相互同步可组合能力,还用于管理通信系统10中可用的可组合能力。能力发现组件还可用于在通信系统10的设备之间建立连接之前,对对端设备或用户进行鉴权或权限认证。
在一些实施例中,能力发现组件可进一步包括:认证/鉴权模块、可组合能力发现模块、可组合能力集、感知数据对接模块。
认证/鉴权模块,用于本地设备和其他设备建立连接之前,其他设备对该本地设备或使用该本地设备的用户进行认证和鉴权。认证和鉴权的方式,可参考后续方法实施例的介绍。
可组合能力发现模块,用于发现通信系统10中的其他设备以及其他设备的可组合能力,以及,将本地设备的可组合能力同步给通信系统10中的其他设备。可组合能力发现模块发现其他设备的方式,可参考后续方法实施例的介绍。
可组合能力集，用于管理本地设备以及发现的其他设备的可组合能力。
感知数据对接模块,用于管理传感感知组件感知到的各类数据的格式规范。通过该规范可以对通信系统10中各个设备采集到的各类数据进行标准化的管理,便于这些数据被跨设备调用,实现跨设备的资源互通共享。
2.传感感知组件
传感感知组件可以部署于通信系统10中具备感知能力的电子设备中。
传感感知组件可用于感知用户、设备及环境的状态信息,还用于创建并维护用户画像、上下文及记忆。
在一些实施例中,传感感知组件可进一步包括:用户状态感知模块、设备状态感知模块、环境状态感知模块、用户画像模块、上下文模块、记忆模型。
用户状态感知模块、设备状态感知模块、环境状态感知模块分别用于感知用户、设备及环境的状态信息。
用户画像模块,用于根据用户和通信系统10中的各个设备的交互情况,创建并维护该用户的用户画像。
上下文模块,用于根据用户和通信系统10中的各个设备的交互历史,创建并维护针对该通信系统10的全局上下文。
记忆模型,用于根据用户和通信系统10中的各个设备的交互历史、设备的操作历史等等,创建并维护通信系统10的记忆。
3.系统中控组件
系统中控组件部署于通信系统10中各个电子设备协商决定的中控设备之上。
系统中控组件用于根据传感感知组件获取到的各类信息,以及,用户的实际需求,选择能力发现组件维护的通信系统10中合适的可用能力,动态构建虚拟聚合设备。系统中控组件还用于选择通信系统10中合适的可组合能力来检测特定事件、分析该特定事件所表征的用户意图、执行满足该用户意图的任务。
在一些实施例中,系统中控组件可进一步包括:系统重构模块、交互模态调度模块、服务能力调度模块。
系统重构模块,用于根据传感感知组件获取到的各类信息,以及,用户的实际需求,选择能力发现组件维护的通信系统10中合适的可用的可组合能力,动态构建虚拟聚合设备。
交互模态调度模块,用于选择通信系统10中合适的可组合能力来检测特定事件、分析该特定事件所表征的用户意图。
服务能力调度模块,用于选择通信系统10中合适的可组合能力来执行满足该用户意图的任务。
4.交互识别组件
交互识别组件部署于中控设备选择的用于检测特定事件、分析该特定事件所表征的用户意图的可组合能力所在的电子设备上。该特定事件可以是一种模态或多种模态的组合。模态例如可包括文字、语音、视觉、动作、态势(如用户所在位置、用户和设备间的距离)、场景(如办公场景、家庭场景、通勤场景)等。
交互识别组件用于根据传感感知组件获取到的各类信息,判定是否检测到特定事件,并分析检测到的特定事件所表征的用户意图,还可以将用户意图分解为多模态的形式。
在一些实施例中,交互识别组件可进一步包括:交互触发模块、交互指令识别模块、多模态意图决策模块。
交互触发模块,用于根据传感感知组件获取到的各类信息,判定是否检测到特定事件。
交互指令识别模块,用于分析检测到的特定事件所表征的用户意图。
多模态意图决策模块,用于将用户意图分解为多模态形式的待执行任务。
5.服务应答组件
服务应答组件部署于中控设备选择的用于执行满足用户意图的任务的可组合能力所在的电子设备上。
服务应答组件用于根据交互识别组件的分析得到的用户意图,编排应答任务序列,控制应答任务按照一定逻辑关系执行,还用于根据传感感知模块获取到的各类信息,动态接续或切换设备/能力以执行应答任务。
在一些实施例中,服务应答组件可用于执行多种模态的任务。
在一些实施例中,服务应答组件可进一步包括:任务序列生成模块、任务映射模块、任务管理模块、任务执行运行时(Runtime)。
任务序列生成模块，用于生成满足用户意图的一个或多个任务。
任务映射模块,用于将一个或多个任务映射到合适的可组合能力中执行。
任务管理模块,用于按照交互识别组件分析得到的用户意图,控制一个或多个任务按照一定逻辑关系执行。
任务执行运行时(Runtime),用于运行应答任务。
上述图1B示出的智慧助手的软件结构仅为示例,并不构成对运行于通信系统10之上的智慧助手的具体限定。在本申请另一些实施例中,该智慧助手可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
上述图1B所示的组件或模块可采用设备端侧程序和/或云服务的形式部署,亦可采用分布式或集中式的架构部署在一台或多台设备上运行。例如,认证/鉴权模块、用户画像模块、上下文模块、记忆模型、交互指令识别模块、多模态意图决策模块等,可以采用端云结合的方式部署。
在其他一些实施例中，通信系统10中还可以包括多个中控设备，各个中控设备可以组建不同的虚拟聚合设备，并运行多个虚拟聚合设备的实例，在不同的虚拟聚合设备上分别运行单一的智慧助手。这样可以针对不同的用户组建不同的虚拟聚合设备，从而为不同的用户提供个性化的服务。
通信系统10中各个设备的具体作用,可参考后续方法实施例的详细描述。
在本申请实施例中,通信系统10中的电子设备的数量可以发生变化。例如,通信系统10中可能会新增一些设备,也可能会减少一些设备。
如果在通信系统10未获知或存储设备的相关信息(如标识、类型等)的情况下,该设备连接到通信系统10,则称为该设备加入通信系统10。如果在通信系统10获知或存储有设备的相关信息(如标识、类型等)的情况下,该设备连接到通信系统10,则称为该设备上线。
类似的,设备和通信系统10断开连接后,如果该通信系统10未存储该设备的相关信息,则称为该设备离开通信系统10。如果设备和通信系统10断开连接后,该通信系统10仍然存储有该设备的相关信息,则称为该设备下线。电子设备通常会由于位置变更或电量耗尽等原因离开通信系统10或从通信系统10下线。
图1A所示的通信系统10仅为示例，具体实现中，通信系统10还可以包括更多的设备，这里不做限定。例如，通信系统10还可以包括用于提供WLAN的路由器、用于提供认证/鉴权服务的服务器、用于存储可组合能力信息、上下文、用户画像或记忆的服务器、用于管理账号的服务器、用于管理通信系统10中各个电子设备的服务器等等。
通信系统10也可以被称作分布式系统、互联系统等其他名词,这里不做限定。
参考图2,图2为本申请实施例提供的电子设备100的结构示意图。该电子设备100可以为通信系统10中的任意一个电子设备。
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在本申请实施例中,处理器110可用于将电子设备100的资源,按照统一的方式解构为标准化的可组合能力。
如果图2所示的电子设备100为通信系统10中各个设备协商决定的中控设备,则处理器110可用于在通信系统10的多个设备中选择合适的资源来检测特定事件、分析该特定事件所表征的用户意图、执行满足该用户意图的任务。
如果图2所示的电子设备100为中控设备选择的第一设备,则处理器110用于调用电子设备的相关器件(如显示屏、麦克风、摄像头等等)来检测特定事件。
如果图2所示的电子设备100为中控设备选择的第二设备,则处理器110用于分析该特定事件所表征的用户意图。
如果图2所示的电子设备100为中控设备选择的第三设备,则处理器110用于调用电子设备的相关器件(如显示屏、麦克风、摄像头等等)来执行满足该用户意图的任务。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括WLAN(如Wi-Fi),BT,全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),NFC,IR、UWB等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号解调以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
移动通信模块150或无线通信模块160用于支持电子设备100和通信系统10中的其他设备建立连接并通信,相互同步或共享设备信息。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD)。显示面板还可以采用有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),miniled,microled,micro-oled,量子点发光二极管(quantum dot light emitting diodes,QLED)等制造。 在一些实施例中,电子设备可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,等进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
内部存储器121可以包括一个或多个随机存取存储器(random access memory,RAM)和一个或多个非易失性存储器(non-volatile memory,NVM)。
随机存取存储器可以包括静态随机存储器(static random-access memory,SRAM)、动态随机存储器(dynamic random access memory,DRAM)、同步动态随机存储器(synchronous dynamic random access memory,SDRAM)、双倍资料率同步动态随机存取存储器(double data rate synchronous dynamic random access memory,DDR SDRAM,例如第五代DDR SDRAM一般称为DDR5 SDRAM)等;非易失性存储器可以包括磁盘存储器件、快闪存储器(flash memory)。
快闪存储器按照运作原理划分可以包括NOR FLASH、NAND FLASH、3D NAND FLASH等,按照存储单元电位阶数划分可以包括单阶存储单元(single-level cell,SLC)、多阶存储单元(multi-level cell,MLC)、三阶储存单元(triple-level cell,TLC)、四阶储存单元(quad-level cell,QLC)等,按照存储规范划分可以包括通用闪存存储(英文:universal flash storage,UFS)、嵌入式多媒体存储卡(embedded multi media Card,eMMC)等。
随机存取存储器可以由处理器110直接进行读写,可以用于存储操作系统或其他正在运行中的程序的可执行程序(例如机器指令),还可以用于存储用户及应用程序的数据等。
非易失性存储器也可以存储可执行程序和存储用户及应用程序的数据等,可以提前加载到随机存取存储器中,用于处理器110直接进行读写。
外部存储器接口120可以用于连接外部的非易失性存储器,实现扩展电子设备100的存储能力。外部的非易失性存储器通过外部存储器接口120与处理器110通信,实现数据存储 功能。例如将音乐,视频等文件保存在外部的非易失性存储器中。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
图2示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者包括和图2不同的部件。图示的部件可以以硬件,软件或软件和硬件的组合实现。
关于电子设备100的各个模块的作用,具体可参考后续方法实施例的详细描述,在此暂时不赘述。
下面基于图1A示出的通信系统10、图1B示出的智慧助手架构、图2示出的电子设备架构,介绍本申请实施例提供的基于多设备提供服务的方法。
参考图3,图3示例性示出了基于多设备提供服务的方法的流程。
如图3所示,该方法可包括如下步骤:
S101,电子设备将资源解构为可组合能力。
执行S101的电子设备的数量可以为一个,也可以为多个。
电子设备中的资源可以包括以下一项或多项:电子设备的软件资源、硬件资源、外设或外设的资源等等。其中:
硬件资源和电子设备配置的硬件相关,例如可包括电子设备具备的摄像头、传感器、音频设备、显示屏、马达、闪光灯等等。
软件资源和电子设备配置的软件相关,例如可包括电子设备具备的内存资源、计算能力(例如美颜算法能力、音视频编解码能力)、网络能力、设备连接能力、设备发现能力、数据传输能力等等。进一步地,该软件资源可包括电子设备提供的拍照服务、录音服务、指纹认证服务、运动健康服务、播放服务、短信服务、语音识别服务、视频通话服务等等。软件资源可以包括系统资源,也可以包括第三方资源,这里不做限定。
外设是指和电子设备连接的,用于对数据和信息进行传输、转送和存储等作用的设备。外设例如可包括电子设备的配件设备,如鼠标、外接显示屏、蓝牙耳机、键盘,以及,该电子设备管理的智能手表、智能手环等等。外设的资源可包括硬件资源和软件资源,硬件资源和软件资源可参考前文相关描述。
在本申请实施例中,一个电子设备中可以包括上述任意一种或多种类别的资源,也可以包括一个或多个资源。
在本申请实施例中,电子设备可以将自身的资源按照统一的方式解构或封装为标准化的可组合能力,并提供标准规范的接口以供通信系统中的其他设备调用。可组合能力是从物理设备中抽象出来的单一能力部件。
由于电子设备的资源可能会因为连接新的外设、安装/卸载应用程序等情况发生变化,电子设备解构得到的可组合能力也可能会发生变化。
电子设备可以采用面向能力、服务或信息中一个或多个的方式,将自身资源描述为标准化的可组合能力。例如,电子设备可以根据不同类别的能力(例如连接能力、音视频能力、拍摄能力等)将自身资源解构为不同的可组合能力,也可以根据不同类别的服务(例如定位服务、云定位服务、云计算服务等)将自身资源解构为不同的可组合能力,还可以根据不同类别的信息(例如图像信息、文本信息)将自身资源解构为不同的可组合能力。
也就是说,可组合能力是电子设备使用预定方式描述的资源。该预定方式可包括预定的格式、协议或标准等等。
在一些实施例中，可以采用Schema、Protobuf、可扩展标记语言（extensible markup language，XML）、JSON（JavaScript object notation）等方式描述设备可组合能力，以前向/向后兼容可组合能力描述文件的不同版本。
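作为示意，下述Python片段给出了一条可组合能力描述的极简草图（其中的字段名与取值均为示例性假设，并非本申请或任何具体协议限定的格式），用于说明“以统一的资源描述规范对能力进行标准化描述、并区分静态信息与动态信息”的思路；实际实现中同样的信息亦可用Schema、Protobuf、XML等方式表达：

```python
import json

# 示意性的可组合能力描述（字段名均为示例性假设）
capability = {
    "capabilityId": "mic-001",                 # 可组合能力标识
    "type": "interaction/voice",               # 类别：交互类-语音交互
    "device": {"id": "speaker-A", "vendor": "vendorX"},
    "static": {                                # 静态信息：类别、参数、性能、版本等
        "performance": {"pickupRangeMeters": 5},
        "version": "1.0",
        "powerConsumption": "low",
    },
    "dynamic": {                               # 动态信息：位置、朝向、是否插电等
        "location": "living_room",
        "pluggedIn": True,
    },
}

# 序列化为JSON，供通信系统中的其他设备跨设备读取
print(json.dumps(capability, ensure_ascii=False, indent=2))
```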
参考图4,图4示例性示出了本申请实施例提供的可组合能力的类别。
如图4所示,电子设备的资源可以被解构为以下四类可组合能力:
1.交互类可组合能力
交互类可组合能力可用于检测特定事件。特定事件可包括用户输入的交互操作,也可以包括用户、设备、环境状态发生变化的事件。
在一些实施例中,按照用户输入的交互操作的类别,交互类可组合能力进一步可包括但不限于以下一种或几种类别的组合:
语音交互类可组合能力,用于采集用户输入的语音指令以及周围的环境音。语音交互类可组合能力,可以基于电子设备的麦克风170C、受话器170B、以及外接的耳机或麦克风等资源封装得到。
文本交互类可组合能力,用于采集用户输入的文本。文本交互类可组合能力,可以基于电子设备的显示屏194等资源封装得到。
视觉交互类可组合能力,用于采集可见光图像、红外图像、深度图像、骨骼点、眼动/视线等视觉交互信息。视觉交互类可组合能力,可以基于电子设备的摄像头193(如红外摄像头、深度摄像头)等资源封装得到。
触觉交互类可组合能力,用于采集用户的触控输入、指关节输入、按键输入等等。触觉交互类可组合能力,可以基于电子设备的显示屏194、触摸传感器180K、压力传感器180A等资源封装得到。
生理信号交互类可组合能力,可用于采集肌电信号、脑电波、心率、血氧等生理信号。生理信号交互类可组合能力,可以基于光学传感器、电极等硬件资源封装得到。
姿态交互类可组合能力,用于采集用户的姿态信息。姿态交互类可组合能力,可以基于陀螺仪传感器180B、加速度传感器180E以及惯性传感器等资源封装得到。
2.识别类可组合能力
识别类可组合能力，可用于识别交互类可组合能力检测到的特定事件所表征的用户意图，以及，确定用户意图对应的待执行任务。具体的，识别类可组合能力可以首先识别该特定事件所代表的具体信息（如语义、文本信息）等，然后识别该具体信息所表征的用户意图。
在一些实施例中,识别类可组合能力可进一步包括但不限于以下一种或几种类别的组合:
语音识别类可组合能力,可用于识别语音,可基于自动语音识别(automatic speech recognition,ASR)、自然语言理解(natural language understanding,NLU)等技术封装得到。
视觉识别类可组合能力，可用于识别手势、姿态，可基于计算机视觉算法等资源封装得到。
环境识别类可组合能力，可用于识别用户位置、用户兴趣，可基于位置识别算法等封装得到。
3.服务类可组合能力
服务类可组合能力,用于执行满足用户意图的任务,从而为用户提供服务。
在一些实施例中,服务类可组合能力可进一步包括但不限于以下一种或几种类别的组合:
环境调节类可组合能力,用于调节环境,例如升温、降温、加湿、除湿、调节光照强度等等,可基于空调、加湿器、灯具等设备封装得到。
操控类可组合能力,用于操控设备,例如可包括设备启停、设备配对、参数调节等等。
信息服务类可组合能力,用于提供信息服务,例如搜索、导航、订餐等等。
数据处理类可组合能力,用于处理各类数据,例如音乐播放、视频播放、数据同步等等。
4.连接类可组合能力
连接类可组合能力,用于支持设备之间的连接及通信、交互,还可用于描述设备的通信时延、带宽等通信参数。
在一些实施例中,连接类可组合能力可进一步包括以下一种或几种类别的组合:
短距连接类可组合能力，用于支持设备通过短距离通信技术和其他设备连接及通信。短距离通信技术例如可包括Wi-Fi、BT、NFC、UWB等。短距连接类可组合能力可基于无线通信模块160、天线等资源封装得到。
长距连接类可组合能力,用于支持设备通过长距离通信技术和其他设备连接及通信。长距离通信技术例如可包括蜂窝技术(如4G、5G)、LAN、有线技术(如光纤)等。长距连接类可组合能力可基于移动通信模块150、天线、有线接口等资源封装得到。
不限于图4中示出的划分可组合能力的类别方式,在其他一些实施例中,还可以使用其他的方式来划分可组合能力的类别,这里不做限定。例如,还可以按数据类型进行划分,数据类型可包括图像/视频、语音、文本等等。
在本申请实施例中,一个电子设备中可以包括上述任意一种或多种类别的可组合能力,也可以包括一个或多个可组合能力。
通过S101,使用统一的方式解构得到的可组合能力和设备、设备型号、设备厂商解耦,因而可以供通信系统中的其他设备跨设备无障碍调用,即支持中控设备统一调度各个设备的资源,从而满足用户的需求。通过统一的方式来将各个设备的资源解构为标准化的可组合能力,相当于不同的设备都使用相同的资源描述规范,使得本申请提供的方法适配不同的设备,支持不同类型、不同厂商的设备加入到通信系统中共同为用户提供服务。
在本申请一些实施例中，电子设备解构得到标准化的可组合能力之后，还可以获取该可组合能力的属性。该可组合能力的属性例如可包括以下两类：1，静态信息，例如可组合能力本身的类别、参数（如采集图像的分辨率）、性能（如拾音范围）、版本、功耗、尺寸规格（如显示规格）等。2，动态信息，包括会在不同的环境下改变的信息，如位置（如室内、室外、客厅、卧室、近场或远场等）、朝向、是否插电（例如手机在插电状态下，对功耗变得不那么敏感）等。可组合能力的属性可能是用户手工配置的（例如设备初始化时指定其位置），也可能是设备在运行过程中自己检测到的（例如通过超声波传感器检测周边是否有其它设备等）。
S102,多个电子设备组成通信系统10。
在本申请一些实施例中,多个电子设备可以在建立连接后,建立会话,从而组建为通信系统10。
在本申请另一些实施例中,多个电子设备可以先建立连接,然后在经过认证和鉴权之后建立会话,从而组建为通信系统10。
多个电子设备组建为通信系统10，也可以称为这些电子设备加入通信系统10。
上述提及的多个电子设备建立连接或会话,可以是一个电子设备和其他的任意全部或部分电子设备建立连接或会话。电子设备之间可以通过以下一种或多种技术建立连接并通信:WLAN、Wi-Fi P2P、BT、NFC,IR、ZigBee、UWB、热点、Wi-Fi softAP、蜂窝网络、有线技术或远程连接技术等等。在同一个通信系统10中,可以存在多种不同的连接方式。
在本申请实施例中,电子设备可以通过以下任意一种方式和其他设备建立连接:
方式1.电子设备可以在用户的触发下和其他设备建立连接。
在一些实施例中,电子设备可以接收到输入的用户操作,并在该用户操作的触发下电子设备和其他设备建立连接。本申请实施例对该用户操作的实现方式不做限定,该用户操作例如可包括但不限于:作用于显示屏上的触摸操作/点击操作/长按操作、语音指令、隔空手势、摇晃电子设备的操作、按压按键的操作等等。
例如，参考图5A，电子设备可以在设置应用提供的用户界面51中展示发现的多个周围的无线网络的选项501，用户点击其中一个选项后，可以显示图5B所示的密码输入框502，用户输入该选项对应的无线网络的认证密码之后可以加入该无线网络。电子设备加入该无线网络后，即和连接到该无线网络的其他电子设备建立连接。
又例如,电子设备可以在用户的触发下,通过BT、NFC等近距离通信技术、或者NFC碰一碰、或者Wi-Fi P2P技术发现附近的其他设备,和其他设备建立通信连接。
又例如,参考图5C,电子设备可以在用户输入系统的账号503和密码504之后,登录该账号,然后通过3G、4G、5G等蜂窝网络技术或者广域网技术,通过维护系统账号的服务器(例如华为提供的服务器),和登录相同账号或家庭账号或关联账号的其他设备建立连接。
在另一些实施例中,电子设备可以接收到输入的用户操作,并在该用户操作的触发下,指示该电子设备所管理的设备和其他设备建立连接。
例如，智能手机可以管理IoT设备（如智能音箱、智能灯具、智慧屏等），IoT设备首次上电时，广播自身信息，在附近的该智能手机上弹窗，引导用户在智能手机上输入Wi-Fi密码。之后，智能手机可以将Wi-Fi密码发送给IoT设备，触发IoT设备加入Wi-Fi网络。
示例性地,参考图5D,电子设备可以显示用户界面61。用户界面61可以由电子设备中的智慧生活APP提供。智慧生活是用于管理用户拥有的各类设备的应用。
如图5D所示,用户界面61显示有:家庭名称611、设备数量612、已发现设备选项613、虚拟聚合设备选项614、添加控件615、一个或多个设备选项和页面选项显示区域等。其中:
家庭名称611可以用于指示通信系统10覆盖的区域名称。该家庭名称611可以由用户设定。例如,家庭名称611可以为“家”。
设备数量612可以用于指示通信系统10所包括的设备数量。例如,在家庭名称611为图5D所示的“家”的情况下,该通信系统10所包括的设备数量为“5个设备”。
已发现设备选项613可以用于触发电子设备显示通信系统10所包括的一个或多个电子设备对应的设备选项。如图5D所示,已发现设备选项613已被选中,电子设备可以显示出通信系统10所包括的一个或多个电子设备(例如,路由器、空调、音箱、大灯和大屏等)对应的设备选项(例如,路由器设备选项、空调设备选项、音箱设备选项、大灯设备选项和大屏设备选项等)。
虚拟聚合设备选项614可以用于触发电子设备显示组成虚拟聚合设备的各可组合能力所属的电子设备。
添加控件615可以用于触发一个或多个电子设备添加至通信系统10中。电子设备响应于作用在添加控件615上的操作、向通信系统10添加电子设备时所显示的用户界面，将在后续实施例中进行说明，在此先不赘述。
一个或多个设备选项(例如,路由器设备选项、空调设备选项、音箱设备选项、大灯设备选项和大屏设备选项等)可以用于显示通信系统10中具体包括的电子设备(例如,路由器、空调、音箱、大灯和大屏等)、各电子设备所处的位置(例如,卧室、客厅等)以及各电子设备的设备状态(例如,在线、已关闭等)。各设备选项中还可以包括对应的控制控件(例如,空调设备选项中的控制控件616),以用于控制对应电子设备(例如,空调)的启动或关闭。
响应于作用在添加控件615上的触摸操作(例如,点击),如图5E所示,电子设备可以在用户界面61上显示出选项617。该选项617可以显示出文本提示信息“添加设备”。
响应于作用在选项617上的触摸操作(例如,点击),如图5F所示,电子设备可以显示用户界面62。用户界面62包括返回控件621、页面标题622、扫描提示623、被扫描设备显示区域624、手动添加控件625和扫码添加控件626。其中:
扫描提示623可用于提示用户电子设备的扫描状态。例如“正在扫描”可以表示电子设备正在扫描附近可被添加的电子设备。例如，电子设备可以通过蓝牙通信的方式来判断附近是否存在可被添加至通信系统10的电子设备。其中，电子设备可以通过蓝牙通信广播设备发现请求。蓝牙处于开启状态的电子设备在接收到上述发现请求后，可以通过蓝牙通信向电子设备发送发现应答。当接收到上述发现应答，电子设备可以扫描到该智能家居设备，并在用户界面中显示该智能家居设备的添加控件。本申请实施例对电子设备扫描电子设备的方法不作限定。
扫描提示623还可包含用于提示用户添加智能家居设备的注意事项的内容。上述注意事项可以包括“请确保智能设备已连接电源,且位于手机附近”。
被扫描设备显示区域624可用于显示电子设备扫描到的电子设备。例如,电子设备扫描到台式电脑。电子设备可以在被扫描设备显示区域624显示台式电脑的名称“台式电脑”、台式电脑的设备状态“在线”、台式电脑的位置“卧室”和添加控件624A。上述添加控件624A可用于触发台式电脑加入到通信系统10中。
手动添加控件625可便于用户通过在电子设备中手动输入需要添加的电子设备的信息,来将电子设备添加至通信系统10中。
扫码添加控件626可用于触发电子设备开启扫描装置。即用户可以通过扫描二维码、条形码等数据的方式来添加电子设备到通信系统10中。本申请实施例对上述手动添加电子设备到通信系统10和扫码添加电子设备到通信系统10的实现方法不作限定。
响应于作用在添加控件624A的触摸操作(例如,点击),如图5G所示,电子设备可以在用户界面61上显示出台式电脑的设备选项618。该设备选项618可以包括台式电脑的名称“台式电脑”、台式电脑的设备状态“在线”和台式电脑所在位置“卧室”。该设备选项618还可以包括控制控件618A,该控制控件618A可以用于用户控制台式电脑的关闭或开启。
方式2.电子设备可以主动和其他设备建立连接。
电子设备可以在一些情况下主动和其他设备建立连接。这样无需用户手动操作,可以简化用户行为,提高基于多设备提供服务的效率。
例如,电子设备可以主动搜索附近的无线网络,如果电子设备自身存储有该无线网络的密码,则可以主动加入该无线网络。例如,用户每天回家后,该用户携带的电子设备可以自动连接家庭网络。
又例如,电子设备可以在处于特定位置(例如家、办公室等)时,主动加入该特定位置中的通信系统。
可见,电子设备之间建立连接之前,可能会先经过认证和鉴权,例如图5B所示的密码认证。在经过认证和鉴权之后,电子设备才能和其他电子设备建立连接。
在一些实施例中,电子设备使用上述方式1或方式2和其他电子设备建立连接之后,其他电子设备还可以对该电子设备或使用该电子设备的用户进行认证和鉴权,并在认证和鉴权通过之后才允许该电子设备和其他电子设备建立会话,从而组建为通信系统10。其他电子设备对电子设备进行认证和鉴权,可以保证受信任的、安全的设备才能连接到该其他电子设备,能够保障其他电子设备的数据安全。
认证或鉴权的方式可以包括验证该电子设备的安全等级、类型等等。设备安全等级主要由电子设备本身的软硬件配置决定。对用户进行认证或鉴权的方式可以包括身份认证。身份认证方式可以包括：密码（如由数字、字母、符号组成的字符串）认证、图形认证、生物特征（如人脸、声纹、指纹、掌型、视网膜、虹膜、人体气味、脸型、血压、血氧、血糖、呼吸率、心率、一个周期的心电波形）等。示例性地，参考图5C，用户输入密码之后，电子设备可以显示图5H所示的提示信息。提示信息可以用于提示用户输入人脸以完成认证和鉴权。
示例性地,参考图5H,图5H示出了对电子设备进行认证或鉴权时,该电子设备上显示的用户界面。如图5H所示,电子设备可以在图5B中接收到用户输入的无线网络密码后,或者,在图5C中输入账号和密码后,可以显示提示框505。提示框505可用于提示用户输入人脸以完成对该电子设备的认证和鉴权。
在一些实施例中，电子设备使用上述方式1或方式2和其他电子设备建立连接之后，可以对其他设备进行认证和鉴权，并在认证和鉴权通过之后才和其他电子设备建立会话，以组建为通信系统10。电子设备可以通过验证其他设备的安全等级或类型来对其进行认证和鉴权。电子设备对其他电子设备进行认证和鉴权，可以保证电子设备和受信任的、安全的其他电子设备建立会话以组建为通信系统，能够保证该电子设备内的数据安全。
在本申请实施例中,电子设备之间建立连接之前的认证和鉴权过程,和,电子设备之间建立会话之前的认证和鉴权过程,可以复用,即可以实现为同一个认证和鉴权过程。也就是说,电子设备之间可以经过一次认证和鉴权,即可相互建立连接和会话。例如,在图5D-图5G示出的通过智慧生活APP将设备添加到通信系统10中的方式,电子设备之间经过一次认证和鉴权,即可建立连接和会话。在其他一些实施例中,电子设备之间建立连接之前的认证和鉴权过程,和,电子设备之间建立会话之前的认证和鉴权过程,可以分开独立执行。
在一些实施例中,电子设备在首次对其他电子设备进行认证和鉴权后,该电子设备可以记录对方的信息,便于后续该电子设备再次对其他电子设备进行认证和鉴权。之后,电子设备可能会因位置变更、耗尽电量等原因和其他电子设备断开连接或断开会话,后续该电子设备再次连接其他电子设备时或者再次建立会话时,可以无需用户手动操作,便可使用记录的信息进行认证和鉴权。例如,用户处于家庭范围时,手机A在鉴权通过后加入家庭范围中的通信系统10,用户携带手机A外出后再次回到家庭范围时,无需用户手动操作手机A便可加入家庭范围中的通信系统10。这样能够简化电子设备加入通信系统10的操作,提高本申请所提供方法的实施效率。
上述对电子设备进行认证和鉴权,可以由执行认证和鉴权的设备在本地执行,也可以结合云端服务器执行。
在针对电子设备的认证和鉴权通过后,其鉴权结果可以分为不同的等级。鉴权结果等级的设定这里不做具体限制。例如,电子设备的安全等级越高,对其的鉴权结果的等级也就越高。也就是说,鉴权结果等级反映了针对电子设备的信任程度,鉴权结果等级越高,其信任程度也越高。
鉴权结果等级的不同,该电子设备在后续步骤中开放给通信系统10的可组合能力的范围可以不同,具体可参考后续S103的详细描述。
在本申请实施例中,电子设备加入通信系统10后,可能因位置变更、耗尽电量等原因离开通信系统10或从通信系统10下线。电子设备离开通信系统10后,也可能再次加入该通信系统10。电子设备从通信系统10下线后,也可能再次上线。
在本申请实施例中，一个电子设备可以加入多个不同的通信系统。例如，手机A位于家庭范围时，可以加入家庭中的通信系统；手机A位于办公室时，可以加入办公室中的通信系统。
通过S102，多个电子设备可以相互连接并组成一个通信系统，便于后续多个电子设备协同为用户提供服务，实现高效自然的跨设备资源共享。
S103,通信系统10中的各个电子设备基于设备间的连接,相互同步设备信息。
在一些实施例中,通信系统10中的各个电子设备可以在执行S102,即组建通信系统10后,即刻相互同步设备信息。
在一些实施例中,如果有新的电子设备加入通信系统10,或者,有新的电子设备上线后,则该电子设备可以和通信系统10中的其余设备相互同步设备信息。
在一些实施例中,如果通信系统10中电子设备的设备信息有更新,则该电子设备可以和通信系统10中的其余设备相互同步设备信息。
在一些实施例中,通信系统10中的各个电子设备也可以按照预设规则周期性地或非周期性地相互同步设备信息,例如可以每30秒或每分钟同步一次。
在本申请实施例中,通信系统10中的各个电子设备可以基于设备间的连接,相互同步设备信息。例如,如果通信系统10中的各个电子设备连接到同一个WLAN,则可以通过该WLAN(例如通过路由器中转)相互同步设备信息。又例如,如果通信系统10中的电子设备之间通过蓝牙连接,则可以基于该蓝牙连接来相互同步设备信息。又例如,如果通信系统10中的各个电子设备通过登录同一账号远程连接,则可以通过管理账号的服务器来中转设备信息。如果通信系统10中包含没有直接连接的两个电子设备,则该两个电子设备可以通过通信系统10中的中间设备来相互同步设备信息。
在本申请实施例中,通信系统10中各个电子设备相互同步的设备信息包括:电子设备的全部或部分可组合能力信息。可组合能力信息用于表征或描述对应的可组合能力。在一些实施例中,可组合能力信息还可以用于描述该可组合能力的属性。本申请实施例对可组合能力信息的实现形式不做限定。可组合能力的分类及属性,可参考前文S101中的相关描述。
在一些实施例中,通信系统的各个电子设备之间可以相互同步全部可组合能力信息。电子设备全部的可组合能力信息是指,电子设备在S101中将自身资源解构得到的全部可组合能力的信息。
在一些实施例中,通信系统的各个电子设备之间可以相互同步自身的部分或全部可组合能力信息。电子设备发送给通信系统10中其他设备的部分可组合能力信息,可以根据以下任意一种策略来决定:
1.电子设备根据在加入通信系统10时的鉴权结果等级,决定发送给通信系统10中其他设备的可组合能力信息。鉴权结果等级越高,该电子设备发送给通信系统10中其他设备的可组合能力信息也越多。
鉴权结果可包括电子设备对通信系统10的鉴权结果,和/或,通信系统10对电子设备的鉴权结果,具体可参考前文S102的相关描述。
鉴权结果的等级越高,电子设备在S103中同步给通信系统10中其他设备的可组合能力信息可以越多。由于鉴权结果的等级反映了电子设备和通信系统10中其他设备之间的信任程度,使用第1种策略,能够使得电子设备仅对信任的其他设备开放更多的可组合能力,保障该电子设备的信息安全。
2.电子设备根据用户的需求,决定发送给通信系统10中其他设备的可组合能力信息。
例如,用户可以手动在电子设备上设定开放给通信系统10中其他设备的可组合能力,电子设备可以根据用户的设定发送对应的可组合能力信息给通信系统10中的其他设备。
示例性地，参考图5I，图5I示例性示出了用户设定开放给通信系统10中其他设备的可组合能力的一种方式。图5I为电子设备中的设置应用提供的用户界面55，该用户界面55中显示有：一个或多个可组合能力选项506。该一个或多个可组合能力选项506可以对应于电子设备在S101中解构得到的可组合能力。电子设备可以检测到作用于可组合能力选项506的用户操作，将可组合能力选项506对应的可组合能力开放给通信系统10中的其他设备。不限于图5I示出的较粗粒度的可组合能力分类，在其他一些实施例中，电子设备还可以显示更细粒度的可组合能力以供用户选择。
不限于图5I中示出的方式，在其他一些实施例中，用户还可以通过其他方式来设定电子设备开放给通信系统10中其他设备的可组合能力。例如，用户可以针对不同的通信系统，开放不同的可组合能力。又例如，用户可以在不同的情景条件下，如不同的时间段，针对同一个通信系统开放不同的可组合能力。
3.电子设备根据自身的策略,决定发送给通信系统10中其他设备的可组合能力信息。
电子设备可以基于用户隐私、设备功耗等因素,仅将部分可组合能力信息同步给通信系统10中的其他设备。
例如,电子设备为了保证功耗,可以隐藏姿态交互类可组合能力,仅将其他可组合能力发送给通信系统10中的其他设备。
又例如,电子设备可以将机密性较低的视觉交互类可组合能力等开放给其他设备,而不将支付类可组合能力开放给其他设备。
通信系统10的各个设备相互同步可组合能力信息,可以便于后续实现跨设备的资源共享。
在本申请一些实施例中,通信系统10中各个电子设备相互同步的设备信息还可以包括:电子设备的设备属性。设备属性例如可包括以下一项或多项:设备标识、设备类型、当前功耗、可用资源、设备模态,当前使用状态、上线信息、下线信息、和通信系统10中其他设备的历史交互信息、设备位置(如房间、客厅等)、朝向、设备所处环境类型(如办公室、家庭范围等),等等。
其中,设备标识可以是该设备的IP地址或MAC地址。设备类型例如可分为富设备和瘦设备,也可以依据设备形态分为智慧屏、空调、打印机等类型。
可用资源例如可包括计算资源、内存资源、电量资源等等。设备模态是指电子设备提供或支持的信息交互方式,例如可包括语音交互模态、显示交互模态、灯光交互模态、振动交互模态等等。当前使用状态例如可包括设备当前启用的应用或硬件等。
上线信息可以包括电子设备上线的次数、时间、时长等等。类似的,下线信息可以包括电子设备下线的次数、时间、时长等等。
电子设备和其他设备的历史交互信息表征了该电子设备和其他设备之间交互的规律。该历史交互信息例如可包括:交互的业务类型、业务发起方、业务响应方、交互时长、业务发起时间、业务结束时间、统计时间段内的平均上线设备数、统计时间段内的平均上线设备数的归一化标准差、统计时间段内的历史在线设备数中一种或多种。电子设备之间交互的业务例如可包括文件传输、视频接续、音频接续、信令传输、数据分发等。例如,当平板电脑响应于用户的操作,将本地播放的视频续接到智慧屏时,平板电脑和智慧屏可以记录本次视频接续业务对应的交互行为信息。该交互行为信息可以包括以下信息中的一项或多项:业务类型-视频续接、业务发起方-平板电脑、业务响应方-智慧屏、交互时长-2小时15分钟、业务发起时间-1月1日19点37分、业务结束时间-1月1日21点52分。
统计时间段内的平均上线设备数、统计时间段内的平均上线设备数的归一化标准差、统计时间段内的历史在线设备数，可以由电子设备根据统计时间段内通信系统10中其他设备的上线信息和下线信息统计得到。统计时间段可以根据实际需求进行设置，例如可以为最近1天、3天、1周或1个月等等。
平均上线设备数是指电子设备在统计时间段内统计到的,通信系统10中在单位时间(例如一日、一周等等)上线的设备的数量的平均值。如果同一个设备在单位时间内上线多次,可仅统计一次,不对其上线次数进行累计。例如,假设统计时间段为1月1日至1月7日,电子设备在统计时间段内统计到的上线设备的数量如表1所示:
表1
日期：　　　1月1日　1月2日　1月3日　1月4日　1月5日　1月6日　1月7日
上线设备数：3　　　4　　　5　　　2　　　6　　　8　　　7
此时,每日的平均上线设备数为(3+4+5+2+6+8+7)/7=5。
平均上线设备数的归一化标准差是指电子设备在统计时间段内,统计到的通信系统10中在单位时间(例如一日、一周等等)上线的设备的数量的标准差,除以平均上线设备数得到的值。
例如,假设电子设备根据表1的数据计算每一天统计到的上线设备的数量的标准差,则该标准差的计算过程可以表示为:
标准差=√{[(3−5)²+(4−5)²+(5−5)²+(2−5)²+(6−5)²+(8−5)²+(7−5)²]/7}=√(28/7)=2
然后,电子设备将标准差2除以每日平均上线设备数5,得到每日上线设备数的归一化标准差为2/5=0.4。
历史在线设备数是指电子设备在统计时间段内,统计到的上线的不同设备的总数量。例如,假设电子设备根据表1的数据计算得到的上线设备的数量和为3+4+5+2+6+8+7=35,其中可能存在相同的设备,因此历史在线设备数可能少于35。
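结合表1的数据，下述Python片段给出了计算平均上线设备数及其归一化标准差的一个极简草图（仅为示意，标准差采用与上文一致的总体标准差口径）：

```python
from statistics import mean, pstdev

# 表1：1月1日至1月7日每日统计到的上线设备数
daily_online = [3, 4, 5, 2, 6, 8, 7]

avg = mean(daily_online)        # 平均上线设备数：35/7 = 5
std = pstdev(daily_online)      # 总体标准差：sqrt(28/7) = 2
normalized_std = std / avg      # 归一化标准差：2/5 = 0.4

print(avg, std, normalized_std)  # 5, 2.0, 0.4
```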
在本申请实施例中,通信系统10中各个电子设备相互同步的设备信息还可以包括:用户信息。该用户信息的详细内容及作用可参考后文和用户画像相关的描述。
通信系统10的各个设备相互同步设备属性,可以便于后续通信系统10确定中控设备。
S104,通信系统10确定中控设备。
在本申请实施例中,通信系统10中的各个电子设备可以在以下任意一种情况下执行S104:
1.通信系统10中的各个电子设备可以在用户的触发下，执行S104。例如，用户可以在通信系统10中的中枢设备（例如路由器或手机）上输入操作，触发该中枢设备通过广播等形式通知通信系统10的其他设备共同执行S104。用于触发S104的用户操作，也可以被称为第二操作。
2.通信系统10中的各个电子设备也可以根据预设的规则周期性或非周期性地执行S104。例如，通信系统10中的各个电子设备可以每周或每月执行一次S104。也就是说，通信系统10中的多个电子设备可以在预设时间到达时，确定中控设备。
3.在有新设备加入或离开通信系统10时执行S104。有电子设备上线或下线时,可沿用旧的中控设备,无需再次执行S104。
4.在旧的中控设备下线时，通信系统10中的各个电子设备协同执行S104，即重新选举中控设备。中控设备下线的原因例如可包括中控设备位置发生变化、电量耗尽、用户手动触发中控设备下线等等。
在本申请其他一些实施例中,通信系统10确定中控设备后,无论该中控设备是否下线,该中控设备都可以持续保留中控设备的身份。这样可以避免中控设备频繁上下线带来的频繁选举中控设备的问题。
5.通信系统10可以在执行S102之后，等待预设时长，再执行S104。
预设时长可以根据实际需求进行设置,本申请实施例对此不予限制。例如,预设时长可以设置为10秒钟、1分钟、1小时、12小时、1天、2天、3天等。
在预设时长内，通信系统10中的各个电子设备可以充分、全面地相互同步设备信息。例如，在预设时长内，可能有新上线的设备，新上线的设备可以将自身的设备信息同步给其他设备。例如，参考图6A，假设预设统计时长为2天，智慧大屏51与智能音箱54同步了交互统计信息之后，智慧大屏51进入等待状态。假设第1天智慧大屏51发现手机52和智能音箱54上线了，则智慧大屏51可以向手机52发送设备信息，并接收手机52发送的设备信息。如图6A所示，假设第2天智慧大屏51发现手机52和平板电脑53上线，则智慧大屏51可以向平板电脑53发送设备信息，并接收平板电脑53发送的设备信息。在第2天结束后，智慧大屏51结束等待状态，完成交互统计信息的同步工作。此时，智慧大屏51除了获取到智能音箱54的设备信息以外，还获取到手机52的设备信息和平板电脑53的设备信息。
通过上述第5种方式延迟选举中控设备,通信系统10可以搜集到更加全面的设备信息来选举中控设备,能够选举到更加合适的中控设备。
在本申请实施例中,通信系统10中的各个电子设备可以基于设备间的连接,通过广播、组播、查询等方式,来执行S104,即协商、选举、决策或确定中控设备。各个电子设备协商中控设备时可以多次通信,本申请实施例对其协商过程及交互次数均不做限定。
在本申请实施例中,通信系统10中的中控设备的数量可以为一个,也可以为多个。
在本申请实施例中,通信系统10确定中控设备时,该通信系统10的各个电子设备之间可以通过一次或多次交互协商,也可以无需交互,这里不做限定。
通信系统10中的各个电子设备可以通过一定策略来确定中控设备。本申请实施例对该策略不做限定。下面列举几种通信系统10确定中控设备的方式:
1.根据资源的稳定性或者可用性、设备模态可用性,用户习惯中的一个或多个因素,从通信系统10的多个电子设备中选择一个或多个设备作为中控设备。
例如,通信系统10中的各个电子设备可以将计算资源较为稳定的设备、内存资源较为稳定的设备,电源较为稳定的设备,可用模态较多的设备,或者,用户常用的设备,确定为中控设备。例如,在家庭范围中,通信系统10可以将常接电源的智慧屏选举为中控设备。
2.根据通信系统10中各个电子设备的历史交互信息,从通信系统10的多个电子设备中选择一个或多个设备作为中控设备。
(1)在一些实施例中,通信系统10可以将平均上线设备数最大的电子设备确定为中控设备。例如,假设通信系统10中各个设备的交互统计信息如表2所示,则可以将智慧屏确定为中控设备。
表2（原文为图片）：示出通信系统10中各设备（如智慧屏、智能音箱等）的平均上线设备数、平均上线设备数的归一化标准差及历史在线设备数。
(2)在一些实施例中,通信系统10可以将平均上线设备数的归一化标准差最大的电子设备确定为中控设备。例如,请参阅表2,通信系统10可以将智能音箱确定为中控设备。
(3)在一些实施例中,通信系统10可以将平均上线设备数大于第一值,且平均上线设备数的归一化标准差大于第二值的电子设备确定为候选的目标设备。第一值和第二值为预先设置的参数。
当只有一个目标设备时,可以直接将该目标设备确定为中控设备。
当存在多个目标设备或者不存在目标设备时,通信系统10可以根据历史在线设备数、设备类型、内存大小、设备标识等决策因子中的一种或多种来确定中控设备。上述各个决策因子可以有不同的优先级,通信系统10可以从优先级最高的决策因子开始,依次对各个目标设备进行比较,选择较优的电子设备作为中控设备。
例如,请参阅表2,假设智慧屏和智能音箱均满足平均上线设备数大于第一值,且平均上线设备数的归一化标准差大于第二值,因此,将智慧屏和智能音箱为目标设备。由于智慧屏和智能音箱的历史在线设备数相同,智慧屏可以进一步获取智慧屏和智能音箱的内存大小。假设智慧屏的内存为6GB,智能音箱的内存为512MB,智慧屏的内存大于智能音箱的内存,因此,通信系统10可以将智慧屏确定为中控设备。
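下述Python片段示意了方式（3）中按决策因子优先级依次比较多个目标设备的过程（设备数据与因子取值均为示例性假设）：利用元组的字典序比较，即可实现“从优先级最高的决策因子开始，依次对各个目标设备进行比较”的逻辑：

```python
# 各候选目标设备的决策因子取值（数值为示例性假设）
targets = [
    {"name": "智慧屏",   "history_online": 20, "memory_gb": 6.0},
    {"name": "智能音箱", "history_online": 20, "memory_gb": 0.5},
]

# 决策因子按优先级从高到低排列：历史在线设备数 > 内存大小
factors = ["history_online", "memory_gb"]

# 元组字典序比较等价于按优先级逐项比较：历史在线设备数相同时再比内存
central = max(targets, key=lambda d: tuple(d[f] for f in factors))
print(central["name"])  # 智慧屏
```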
(4)在一些实施例中,通信系统10可采用泊松分布建模各设备的上线设备数,通过最大似然估计计算各设备的上线设备数的数学期望值,通信系统10可以将上线设备数的数学期望值最大的电子设备确定为中控设备。
可选的,在一些实施例中,还可采用其它概率统计模型,建模各设备的上线设备数,并根据其数学期望、方差、标准差等统计参数的一种或多种,确定中控设备。
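下述Python片段给出了方式（4）的一个极简草图（每日计数为示例性假设）：在泊松分布假设下，速率参数λ的最大似然估计即为样本均值，且E[X]=λ，故可直接以各设备每日上线设备数的均值作为其数学期望值的估计：

```python
from statistics import mean

# 各设备在统计时间段内每日统计到的上线设备数（数值为示例性假设）
daily_counts = {
    "智慧屏":   [3, 4, 5, 2, 6, 8, 7],
    "智能音箱": [2, 2, 9, 1, 8, 9, 2],
    "手机":     [1, 2, 3, 2, 2, 3, 1],
}

# 泊松分布下λ的最大似然估计为样本均值，且E[X] = λ
expectations = {dev: mean(c) for dev, c in daily_counts.items()}

# 将上线设备数数学期望值最大的设备确定为中控设备
central = max(expectations, key=expectations.get)
print(central, expectations[central])
```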
通过上述第2种策略,根据通信系统10中各个电子设备的历史交互信息来选举中控设备,可以将和通信系统10中的较多设备有过交互的电子设备确定为中控设备。这样,该中控设备和其他设备有较多的交互,能够承担收集信息和统筹计算的任务,获取其他设备的各类数据,生成其他设备的画像、生成全局上下文、生成记忆等等,从而保证基于多设备提供服务的效果。
3.如果通信系统10使用上述第1种或第2种策略确定了一个中控设备,并且通信系统10中存在未和该中控设备建立直接连接的其他设备,则通信系统10可以通过延续选举的方式确定更多的中控设备。
在实际的应用场景中,使用上述第1种或第2种策略确定的一个中控设备,可能和通信系统10中的部分电子设备并无直接连接,即无法在同一时间或同一空间直接交互。
例如,假设中控设备为客厅的智慧屏。用户在白天时,打开客厅的智慧屏观看节目。在晚上时,用户关闭客厅的智慧屏,回到卧室,打开卧室的智慧屏观看节目。此时,客厅的智慧屏和卧室的智慧屏虽然处于同一局域网中,但客厅的智慧屏和卧室的智慧屏无法在同一时间直接交互,客厅的智慧屏无法获取卧室智慧屏的历史交互信息。
为了充分利用这些与中控设备没有直接交互的电子设备的历史交互信息,通信系统10中的非中控设备可以通过延续选举的方式,持续选举出多个中控设备,以全方位地获取通信系统10中各个电子设备的历史交互信息。这样,选举出来的多个中控设备可以在同一时间或同一空间连接通信系统10中的全部设备。
在一些实施例中，延续选举出的中控设备，和，首次选举出的中控设备未直接连接的设备之间，具有直接连接关系。具体的，首次确定一个中控设备后，该中控设备和直接连接的其他设备之间组成一个群组。该中控设备直接连接的其他设备可以称为候选设备。该中控设备可以向候选设备发送群组信息，该群组信息可包括：该中控设备的标识，和，候选设备的标识。候选设备接收到群组信息后，确定自身直接连接的设备中，是否有未包含在该群组中的离群设备，若有离群设备，将该候选设备新增为中控设备。该候选设备作为新增加的中控设备，也向其他设备发送群组信息以查询离群设备，直至整个通信系统10中不再有离群设备。
参考图6B,图6B示例性示出了一个通信系统10的拓扑图,其示出了延续选举中控设备的一个例子。如图6B所示,通信系统10中包括:手机A、平板电脑B、位于客厅的智慧屏C、位于卧室的智慧屏H。图中的线条表示设备间的直接连接关系。如果首次确定的中控设备为智慧屏C,则智慧屏C可以向手机A和平板电脑B发送群组信息。平板电脑B接收到群组信息后,可以获知其连接的智慧屏H为离群设备,然后平板电脑B将自身也确定为中控设备。
如图6B中的虚线框所示,手机A、平板电脑B、智慧屏C、智慧屏H通过上述延续选举的方式,可以划分成两个不同的群组。其中,以智慧屏C为中控设备的群组1,包括手机A、平板电脑B、智慧屏C;以平板电脑B为中控设备的群组2,包括平板电脑B和智慧屏H。
在另一些实施例中,延续选举出来的中控设备,和,首次选举出的中控设备未直接连接的设备之间,可以不具有直接连接关系。具体的,首次确定一个中控设备后,该中控设备和直接连接的其他设备之间组成一个群组,该中控设备可以向候选设备发送群组信息。候选设备接收到群组信息后,确定自身直接连接的设备中,是否有未包含在该群组中的离群设备,若有离群设备,将该候选设备和离群设备组成一个群组,并在该群组内协商一个新的中控设备。新协商的中控设备,也向其他设备发送群组信息以查询离群设备,直至整个通信系统10中不再有离群设备。
参考图6C,图6C示例性示出了一个通信系统10的拓扑图,其示出了延续选举中控设备的一个例子。如图6C所示,假设局域网中包括设备A、设备B、设备C、设备D、设备E五个电子设备,图中的线条表示设备间的直接连接关系。
假设通信系统10首次将设备B和设备E确定为中控设备。设备B可以向设备A和设备C发送群组信息1,群组信息1包括设备A的设备标识、设备B的设备标识和设备C的设备标识。设备E可以向设备A发送群组信息2,群组信息2包括设备A的设备标识和设备E的设备标识。
设备A接收到群组信息1和群组信息2之后,检测到群组信息1和群组信息2中都没有设备D的设备标识。因此,设备A可以将设备D确定为离群设备。之后,设备A和设备D两者可以继续选举新的中控设备,例如可以将设备A确定为新加的中控设备。之后,设备A可以向设备D发送群组信息3,群组信息3包括设备A的设备标识和设备D的设备标识。
如图6C中的虚线框所示,设备A、设备B、设备C、设备D、设备E通过上述延续选举的方式,可以划分成三个不同的群组。其中,以设备B为中控设备的群组1,包括设备A、设备B和设备C;以设备E为中控设备的群组2,包括设备A和设备E;以设备A为中控设备的群组3,包括设备A和设备D。
通过上述第3种延续选举中控设备的策略,可以在通信系统10中确定多个中控设备,该多个中控设备可以连接通信系统10中的全部设备。
因此,非中控设备可以通过上述延续选举的方式,确定离群设备,与离群设备一起确定新的中控设备。新的中控设备可以与上述离群设备进行交互,从而全方位地获取通信系统10中各个电子设备的历史交互信息,从而充分利用各个电子设备的信息来为用户提供服务。
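下述Python片段以图6B的拓扑为例，给出了第一种延续选举方式（候选设备发现离群设备后，自身新增为中控设备）的一个极简草图（数据结构与函数名均为示例性假设）：

```python
# 图6B的直连拓扑：键为设备，值为与其直接连接的设备集合
topology = {
    "手机A":    {"智慧屏C"},
    "平板电脑B": {"智慧屏C", "智慧屏H"},
    "智慧屏C":  {"手机A", "平板电脑B"},
    "智慧屏H":  {"平板电脑B"},
}

def continued_election(topology, first_central):
    """延续选举：候选设备若发现离群设备，则自身新增为中控设备。"""
    centrals = [first_central]
    covered = {first_central} | topology[first_central]  # 首个群组
    pending = list(topology[first_central])              # 候选设备
    while pending:
        candidate = pending.pop()
        outliers = topology[candidate] - covered         # 未包含在群组中的离群设备
        if outliers:
            centrals.append(candidate)                   # 候选设备新增为中控设备
            covered |= outliers
            pending.extend(outliers)                     # 继续向外查询离群设备
    return centrals

print(continued_election(topology, "智慧屏C"))  # ['智慧屏C', '平板电脑B']
```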
4.通信系统10将用户选择的设备确定为中控设备。
用户可以通过向通信系统10输入用户操作(例如语音指令,或者作用于某个电子设备上 的触摸操作等),将通信系统10中的一个或多个设备设定为中控设备。这样通信系统可以根据用户的实际需求来选举中控设备。
不限于上述列举的4种选举中控设备的策略,在本申请实施例中,通信系统10还可以使用其他的策略来选举中控设备,这里不做具体限定。例如,通信系统10还可以使用Raft算法、Paxos算法等来选举中控设备。又例如,通信系统10还可以根据设备类型来选举中控设备,例如总是选举智慧屏作为中控设备。又例如,通信系统10还可以将某个电子设备固定作为中控设备,且后续都不再变更。
在一些实施例中,针对同一个通信系统,在不同的时间段可以对应有不同的中控设备。例如,针对在家庭范围中的通信系统,白天可以将客厅的智慧屏选举为中控设备,晚上可以将手机选举为中控设备。
在本申请实施例中,电子设备可以是移动的,因此电子设备可以在不同的时间加入不同的通信系统。电子设备在不同的通信系统中时,可以对应有不同的中控设备。通信系统选举中控设备的方式可参考前文相关描述。电子设备可以关联存储通信系统的标识和对应中控设备的标识,便于在进入不同的通信系统后执行不同的操作。通信系统的标识例如可包括局域网标识、通信系统所在位置等等。通信系统所在的位置可包括GPS定位、用户人工标定的位置、设备清单及其衍生信息等等。
例如,参考图6D,用户白天上班时,可以带着手机进入办公室,手机加入办公室中的通信系统,该通信系统的中控设备可以为台式电脑。用户晚上下班后,可以带着手机进入家庭范围,手机加入家庭范围中的通信系统,该通信系统的中控设备可以为智慧屏。
在本申请实施例中，上述S101-S104，可以由多个电子设备分别运行单一智慧助手的部分功能来执行，后续S105-S108，可以由该多个电子设备共同运行该单一智慧助手来执行。也就是说，多个电子设备共同运行的单一智慧助手，支持通信系统10执行后续步骤S105-S108。
S105,中控设备利用通信系统10中各个设备的可组合能力,初始化虚拟聚合设备。
在本申请实施例中,初始化虚拟聚合设备是指在没有或未使用前置的虚拟聚合设备的状态及能力下,配置虚拟聚合设备。
在本申请实施例中,配置虚拟聚合设备是指,中控设备选取通信系统10中合适的资源进行初始化,即中控设备利用通信系统中的部分或全部可组合能力组合或组建成一个虚拟设备。对资源的初始化可包括加载可组合能力的代码或软件库、启动可组合能力相关的传感器或外设(如麦克风、摄像头等)、读取或记录可组合能力相关的数据(如音频、图像等)、从互联网下载依赖的数据或计算模型等操作的一种或多种。资源的初始化可以由被聚合的可组合能力所在的物理设备执行。
也就是说，配置虚拟聚合设备是指对所选择的可组合能力进行参数配置、网络连接、连接关系、数据通道，以及该可组合能力自身的可配置参数（例如放音能力的音量、摄像头的分辨率）等方面的配置。参数的配置包括：针对数据处理流向的相关参数的配置。中控设备配置虚拟聚合设备之后，相当于指定了信息的采集及处理流向。也就是说，配置虚拟聚合设备之后，通信系统10中用于采集及处理信息的可组合能力，以及，该可组合能力之间的配合关系即可以确定。配置虚拟聚合设备之后，该虚拟聚合设备中的可组合能力处于工作状态或待工作状态。
在配置虚拟聚合设备后，对于通信系统10各个物理设备中的上层应用来说，该应用可以感知到独立的该虚拟聚合设备，而不会感知到多个其他单独的物理设备。这样可以方便各个上层应用更加便捷地调度其他物理设备中的资源。
通过S105来配置虚拟聚合设备,可以针对后续可能会使用到的可组合能力,提前做好启动前的准备,能够提高后续启动该可组合能力以为用户提供服务时的响应速度。此外,通过S105,通信系统10可以仅需聚合通信系统10中的部分可组合能力,可以避免浪费不必要的资源。
换句话说,通过S105中配置虚拟聚合设备,中控设备可以持续基于场景需求,提前做好配置的调整和优化,所以当用户在发出涉及多设备的指令时,中控设备能立即调用相关已经配置就绪的可组合能力执行任务并给出响应,有效缩短了指令的响应时延。这样可以支持通信系统10为用户主动提供服务以及提供长时任务,避免在用户发出指令时才触发即时的协同组合,导致响应慢、不支持主动服务的问题。
虚拟聚合设备包括:中控设备,以及,中控设备选择的通信系统10中其他的部分或全部可组合能力。
对于上层应用来说,虚拟聚合设备被视为可独立执行应用任务的单台完整设备,但其各个能力(如交互、服务等)可能实际来自于不同的物理设备。也就是说,虚拟聚合设备由多个物理设备提供的部分或全部能力聚合得到。虚拟聚合设备的各个可组合能力可以来自通信系统10中的任意一个或多个物理设备,这里不限定。
虚拟聚合设备可用于执行后续步骤S106、S107、S108和S109。
在本申请实施例中,中控设备可以根据通信系统10中各个设备的配置、历史交互信息、用户的偏好、用户状态、设备状态、环境状态等等,选择合适的可组合能力组建为虚拟聚合设备。
在本申请实施例中,虚拟聚合设备的配置可以包括以下两种:
1.初始化虚拟聚合设备
在本申请实施例中,虚拟聚合设备的初始化可以在以下任意一种情况下执行:
情况1,通信系统10首次启动时,初始化虚拟聚合设备。通信系统10启动的规则可以预先根据实际需求设定。例如,可以设定在有超过一定数量的设备接入通信系统10后,该通信系统即启动。
情况2,通信系统10重启动时,初始化虚拟聚合设备。通信系统10重启动的规则可以预先根据实际需求设定。例如,可以设定在旧的中控设备下线后,该通信系统即重启动。
情况3,在有新设备加入通信系统10或有新设备上线,或者,有设备离开通信系统10或有设备下线时,初始化虚拟聚合设备。
情况4,通信系统10中部分或全部设备的软件更新。
情况5,通信系统10中部分或全部设备出现故障。设备故障例如可包括无法联网、音频器件损坏等等。
情况6,通信系统10中部分或全部设备的网络环境发生变化,如网络标识符变更,由WiFi连接变更为无线蜂窝网等。
情况7,用户手工触发系统初始化,如更换用户账号、重置系统等。
中控设备初始化虚拟聚合设备的过程可包括如下步骤:
步骤1,中控设备启动自身的可组合能力,获知环境信息。
具体的，中控设备可以启动自身的交互识别类可组合能力，例如用户位置识别可组合能力等等，获知当前的环境信息。也就是说，中控设备可以使用GPS、GLONASS、BDS等定位技术、位置识别算法、室内定位技术、毫米波传感器等等，获知当前的环境信息。
环境信息表征了通信系统10或用户当前所处的环境或场景。通信系统10或用户当前所处的环境或场景,可以按照不同的规则划分类别。例如,可以按照隐私程度划分为公共场景(例如办公室)和私人场景(例如家庭范围),可以按照人数划分为多人场景和单人场景,可以按照是否有用户划分为有人场景和无人场景,还可以按照时间划分为早上、中午、晚上等场景。
中控设备获取到的环境信息可以是单一模态信息,也可以是多模态信息的组合。例如,环境信息可包括以下任意一种或多种:位置信息、文字、音频、视频等等。
在步骤1中,中控设备启动的自身的可组合能力,可以看做最原始的虚拟聚合设备。也就是说,中控设备首先将自身的可组合能力配置为虚拟聚合设备,然后通过后续的步骤2和步骤3将通信系统10中更多的可组合能力添加到该虚拟聚合设备中。
步骤2,中控设备调用通信系统10中其他设备的可组合能力来获取更多的环境信息。
在步骤2中,中控设备调用的通信系统10中其他设备的可组合能力,可以被称为第四可组合能力。
步骤2具体可包括以下两种实现方式:
(1)根据中控设备预置的设定和历史配置,使用对应的动态策略,启动通信系统10中其他设备的可组合能力来获取更多的环境信息。
中控设备通过预先的设定来调用其他可组合能力,以全面地获知更多的环境信息。例如,中控设备可以被设定为使用后续的任意一种策略,或者,被设定为位于某种环境(如办公室或家庭范围)。中控设备可以使用GPS、GLONASS、BDS等方式来获取位置信息。
动态策略例如可包括:隐私优先策略、全面探测策略、功耗优先策略等等。不同的环境可以对应使用不同的动态策略。例如,办公室可以使用全面探测策略,卧室空间可以使用隐私优先策略。下面展开介绍。
全面探测策略:中控设备启动当前全部可用的可组合能力(例如摄像头、麦克风等)来获取环境信息。例如,针对办公室等公共场所,选择全面探测策略,即启动区域内所有交互识别类可组合能力(如麦克风、摄像头等)进行信息收集。使用全面探测策略能够全面且准确地获取各类信息,便于后续为用户提供服务。
隐私优先策略:中控设备根据当前所处环境的隐私度,调用通信系统10中其他设备的可组合能力来获取更多的环境信息。环境的隐私度越高,中控设备调用越多的可组合能力来获取环境信息。例如,针对卧室等私人空间,选择隐私优先策略,只启动非内容收集的可组合能力(如人声检测、红外探测等),以获取基本的环境信息。使用隐私优先策略,能够保障用户的隐私不被泄露。
功耗优先策略:中控设备启动通信系统10中当前电量充足的设备(例如智慧屏、智能音箱)上可用的可组合能力来获取环境信息。使用功耗优先策略,能够充分考虑各个设备的电量来获取环境信息,避免通信系统10中各个设备的电量被耗尽。
上述第(1)种实现方式相当于通信系统10根据动态策略进行初始化环境状态探测,获取虚拟聚合设备初始配置的依据。
(2)中控设备根据自身获取到的环境信息,通过算法确定需调用的通信系统10中其他设备的可组合能力,通过该可组合能力来获取更多的环境信息。
首先,本申请实施例可以根据不同的场景分类规则,预定义不同的场景。例如,可以按照隐私程度划分为公共场景(例如办公室)和私人场景(例如家庭范围),可以按照人数划分 为多人场景和单人场景,可以按照是否有用户划分为有人场景和无人场景,还可以按照时间划分为早上、中午、晚上等场景。
对于每一个场景,先预先定义判断该场景最少需要的模态信息、最低的置信度(即置信度阈值)、需要启动的可组合能力。根据中控设备自身拥有的组合能力组件,查看能满足多少个场景的检测要求。针对已经满足要求的场景,中控设备才会通过自身收集的信息判断通信系统是否处于该场景。置信度是指根据中控设备采集的模态信息所确定的通信系统处于该场景的概率。场景需启动的可组合能力可以依据该场景的特性设置,也可以根据经验数据设置,这里不做限定。
参考表3,表3的前四列示例性列出了几个场景,以及其对应所需的模态信息、最低置信度和需要启动的可组合能力。
表3（原文为图片）：示例性列出场景1～场景3，及各场景所需的模态信息、最低置信度（置信度阈值）、需要启动的可组合能力和实际置信度。
然后,中控设备基于自身获取到的模态信息(即环境信息),对照预先设定的各场景(模板)所需的模态信息,确定在不同的场景分类下当前所处的场景,及,实际置信度。如果中控设备获取到的模态信息包含某个场景依赖的模态类别,则可以判定当前是否处于该场景。中控设备可以通过多模态机器学习、深度学习算法、声音事件检测算法、AlexNet,VGG-Net,GooLeNet、ResNet、CNN、FNN、CRNN等方法,基于自身获取到的模态信息来判定当前是否处于场景。
例如，假设中控设备自身获取到的模态信息包括a、b、c、d，则可以判定当前通信系统10处于场景1、场景2（不能针对场景3进行检测，因为缺乏模态e）。
最后,中控设备在确定出的多个场景中,选择该多个场景中部分场景对应需要启动的可组合能力的并集,继续获取更多的环境信息。其中,该多个场景中的部分场景可包括:中控设备确定的多个场景中,实际置信度高于置信度阈值的全部场景,或者,实际置信度高于置信度阈值并且置信度最高的前N个场景。N可以预先设置。
例如,假设根据中控设备获取到的模态信息,确定的各个场景的实际置信度如表3中第4列所示,则只有场景1,2的实际置信度均大于置信度阈值,中控设备可以调用该两个场景对应需启动可组合能力的并集(1,2,3,4,5,6,9),以获取更多的环境信息。
举例说明,在卧室内,中控设备利用多传感器的数据以多模态机器学习得知目前场景是有用户场景,单人场景,私人场景,假设N=3,三个场景的置信度都排在前三,故此采取三种场景需启动的可组合能力的并集,启动所有非内容收集的可组合能力,如人声检测、红外探测等可组合能力。
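下述Python片段给出了上述第（2）种方式的一个极简草图（场景模板与置信度数值均为示例性假设，实际置信度应由多模态机器学习、声音事件检测等算法给出）：先依据已采集的模态判定哪些场景可被检测，再取实际置信度高于阈值且排名前N的场景需启动可组合能力的并集：

```python
# 场景模板（内容为示例性假设）：所需模态、置信度阈值、需启动的可组合能力编号
scenes = {
    "场景1": {"modalities": {"a", "b"},      "threshold": 0.7, "caps": {1, 2, 3, 4}},
    "场景2": {"modalities": {"a", "c", "d"}, "threshold": 0.6, "caps": {3, 4, 5, 6, 9}},
    "场景3": {"modalities": {"a", "e"},      "threshold": 0.8, "caps": {7, 8}},
}

def select_capabilities(collected, confidences, top_n=3):
    candidates = []
    for name, s in scenes.items():
        if not s["modalities"] <= collected:   # 缺少场景依赖的模态，无法检测该场景
            continue
        conf = confidences.get(name, 0.0)       # 实际置信度
        if conf >= s["threshold"]:
            candidates.append((conf, name))
    caps = set()                                # 取置信度最高的前N个场景的能力并集
    for _, name in sorted(candidates, reverse=True)[:top_n]:
        caps |= scenes[name]["caps"]
    return caps

# 中控设备采集到模态{a,b,c,d}：可检测场景1、场景2，场景3因缺乏模态e无法检测
print(select_capabilities({"a", "b", "c", "d"}, {"场景1": 0.9, "场景2": 0.8}))
# 输出 {1, 2, 3, 4, 5, 6, 9}
```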
通过上述第(2)种方式,中控设备可以在不同的场景分类规则下,动态启动更多的可组合能力来获取环境信息。该方式在信息互补性的前提下更细粒度的考虑了多模态信息以及场景信息,支持更灵活地动态启动可组合能力。
不限于上述步骤2中中控设备基于自身获取到的环境信息,调用其他设备的可组合能力来获取更多的环境信息,在其他一些实施例中,中控设备可以直接按照一个预定的策略来决定调用其他设备的可组合能力来获取环境信息。本申请实施例对该预定策略不做限定。这样,可以加快中控设备获取环境信息的速度,提高配置虚拟聚合设备的效率。
上述步骤1中启动的中控设备的可组合能力,和,步骤2中启动的通信系统10中其他设备的可组合能力,即组成虚拟聚合设备的可组合能力。
在一些实施例中，如果通信系统10在上述初始化虚拟聚合设备的过程中断电，由于在初始化的阶段选取的中控设备以及通过交互获取到的各个设备的信息（例如设备列表、组合能力开启状态、设备状态等）没有太大的变化，可以根据记忆恢复到断电前的虚拟聚合设备配置状态。
上述初始化虚拟聚合设备的过程,可以在通信系统10首次启动或重新启动或有新设备加入时,支持多设备在不同的环境中初始化为带有中控设备的虚拟聚合设备。初始化虚拟聚合设备时仅需启用通信系统10中的部分可组合能力,因此可以避免浪费不必要的计算资源。此外,中控设备通过动态策略启动多设备的组件能力,并根据多设备的探索结果确认虚拟初始化配置,初始化配置过程能够针对场景权衡隐私、功耗、有效性等因素,并且具有灵活便捷等特点。
不限于上述示出的初始化配置虚拟聚合设备的过程,在本申请一些实施例中,用户还可以手动调整虚拟聚合设备。即,通信系统10可以接收用户操作,并根据该用户操作来聚合虚拟聚合设备。
示例性地,参考图5J,当前述图5D所示用户界面61中的虚拟聚合设备选项614被选中时,电子设备可以显示出用户界面63。该用户界面63可以包括组成虚拟聚合设备的一个或多个可组合能力(例如,近场语音输入能力、音乐播放能力、红外图像检测能力等等)对应的选项和添加设备选项633等。其中:
该一个或多个可组合能力对应的选项可以显示出各可组合能力的状态(例如,可用状态、关闭状态等)和该可组合能力所属的电子设备。该一个或多个可组合能力对应的选项还可以包括对应的控制控件(例如,近场语音输入能力的控制控件631),以用于控制对应的可组合能力(例如,近场语音输入能力)的启动或关闭。该一个或多个可组合能力对应的选项还可以包括删除控件(例如,近场语音输入能力对应选项中的删除控件632),以用于使得该可组合能力不再成为虚拟聚合设备组成中的一部分,也即是说,虚拟聚合设备将不能调用该可组合能力。
添加设备选项633可以用于将已发现设备中的可组合能力加入进虚拟聚合设备中,成为虚拟聚合设备组成中的一部分,也即是说,虚拟聚合设备可以调用新添加进虚拟聚合设备组成中的可组合能力。
响应于作用在删除控件632上的触摸操作(例如,点击),电子设备可以在用户界面63中不再显示出该近场语音输入能力对应的选项。也即是说,运行于音箱上的近场语音输入能力不再成为虚拟聚合设备组成中的一部分,虚拟聚合设备将不能再调用音箱上的近场语音输入能力。
如图5J所示，响应于作用在添加设备选项633上的触摸操作（例如，点击），电子设备可以如图5K所示在用户界面63上显示出窗口633E。该窗口633E中可以显示出包括在已发现设备但不包括在虚拟聚合设备中的可组合能力，例如，台式电脑上的文本输入
响应于作用在添加控件633C上的触摸操作(例如,点击),如图5L所示,电子设备可以在用户界面63中显示出该文本输入能力选项634。也即是说,虚拟聚合设备包括台式电脑上的文本输入能力。其中,该文本输入能力选项634可以包括可组合能力的名称“文本输入能力”、该可组合能力的状态“可用”和该可组合能力所属电子设备“台式电脑”。该设备选项634还可以包括控制控件634A和删除控件634B。关于该控制控件634A和删除控件634B的描述,可以参考前述实施例中的描述,在此不再赘述。
上述图5D-图5G、图5J-图5L所示的用户界面可以由通信系统10中的任意一个设备提供。例如,可以由中控设备提供。
通过上述图5J-图5L所示的由用户将可组合能力添加到虚拟聚合设备中的示例,用户可以选择需要的可组合能力添加到虚拟聚合设备中。
在一些实施例中,中控设备配置虚拟聚合设备之后,还可以触发该虚拟聚合设备中可组合能力所在的物理设备输出提示信息,以提示用户该物理设备中有可组合能力被加入到了虚拟聚合设备中。提示信息的实现形式不做限定。例如,物理设备可以通过闪光灯、震动等方式来提示用户。
示例性地，参考图7，图7示例性示出了中控设备组建的一个虚拟聚合设备。如图7所示，该虚拟聚合设备包括中控设备，支持采集近场语音、远场语音及手势的交互类可组合能力，支持近场ASR、远场ASR、NLU、手掌检测、对话管理（dialogue management，DM）的识别类可组合能力，以及支持技能1-技能N的服务类可组合能力。支持采集近场语音、远场语音及手势的交互类可组合能力采集到的数据，可以交由支持近场ASR、远场ASR、NLU、手掌检测、DM的识别类可组合能力进行分析，之后可根据分析结果启动支持技能1-技能N的服务类可组合能力来执行对应的任务。
配置虚拟聚合设备之后,中控设备可以管理虚拟聚合设备中的资源(即可组合能力),通过虚拟聚合设备来为用户提供服务。也就是说,中控设备可用于管理通信系统10包括的多个电子设备中的部分或全部资源。
S106,中控设备触发虚拟聚合设备中的第一设备检测特定事件。
在本申请实施例中，特定事件也可以被称为第一事件。
在本申请实施例中,特定事件是指隐含用户意图的事件。
在本申请实施例中,特定事件可以是一种模态或多种模态的组合。模态例如可包括文字、语音、视觉(如手势)、动作、态势(如用户所在位置、用户和设备间的距离)、场景(如办公场景、家庭场景、通勤场景)等。
特定事件可以包括以下几种类型:
1.用户输入的交互操作
用户输入的交互操作可包括但不限于：语音指令、作用于显示屏上的触控操作（如点击操作、长按操作、双击操作等）、隔空手势/悬浮手势、作用于设备按键上的操作、姿势、眼球转动指令、口型指令、移动或摇晃设备的操作，等等。
在一些实施例中,设备可以在接收到唤醒词后开始检测语音指令,唤醒词例如可包括语音唤醒词(例如“小艺小艺”),也可以包括手势唤醒词(例如“OK”手势)。
举例说明,用户使用手机时,如果想要投屏,可以输出语音指令“投屏”,也可以在手机的显示屏上点击投屏按钮。
上述特定事件实现为第1种类型时,用户输入的交互操作也可以被称为第一操作。
2.用户状态发生变化的事件
用户状态例如可包括用户所处的位置、用户执行的事务(例如运动、办公、看电视等)等等。用户状态变化的事件例如可包括:用户起床、用户睡觉、用户出门、用户运动等等。
3.用户和设备之间态势发生变化的事件
用户和设备之间的态势例如可包括两者之间的距离。用户和设备之间态势发生变化的事件例如可包括用户移动设备(例如拿起手机)、用户和设备之间的距离发生变化(例如变大或变小)。
4.环境状态发生变化的事件
环境状态例如可包括:环境的温度、湿度、紫外线强度、风量、环境光等等。
5.设备接收到通知消息,或者,获取到即将执行的日程信息的事件
电子设备获取到的通知消息可以由该设备中的应用在运行过程中主动生成,也可以由设备中应用对应的服务器发送,例如,电子设备可以接收到可信机构发送的用于通知极端天气(例如风暴、大雪等)的通知消息等等。
日程是指对某个时刻或时间段的计划与安排。日程也可以称为事件、事务、行程或其他名称,这里不做限定。日程信息可以来自于电子设备中的备忘录、日历(calendar)、闹钟(clock)、订票类应用、线上会议类应用等等。
在本申请一些实施例中,中控设备可以选择虚拟聚合设备中的部分或全部资源来检测特定事件。该部分或全部资源可以被称为第一资源。第一资源的数量可以为一个或多个。第一资源可包括来自一个电子设备的资源,也可以包括来自多个电子设备的资源。第一资源为可组合能力,例如可以为交互类可组合能力。
在本申请一些实施例中,中控设备可以选择配置的虚拟聚合设备中的部分或全部交互类可组合能力来检测特定事件。
中控设备可以在虚拟聚合设备的交互类可组合能力中,任意选择或者根据一定策略来选择部分交互类可组合能力来检测特定事件。
具体的,中控设备可以结合以下一项或多项来选择合适的交互类可组合能力检测特定事件:虚拟聚合设备历史检测到的用户状态、设备状态、环境状态,用户画像、全局上下文,或记忆。
上述策略例如可以包括以下任意一种或多种的结合:
策略1,根据模态通道来选取交互类可组合能力。
具体的，由于特定事件可能是单模态的，也可能是多模态的组合，而某一种模态的采集通道可能有多种，中控设备可以根据通道来选取第一设备及第一可组合能力。具体的，通道是指采集模态的设备或可组合能力。例如，语音模态的特定事件可能由远场设备和近场设备共同拾音获取。又例如，特定事件中的语音模态可以通过远场设备的语音交互类可组合能力拾音获取，该特定事件中的手势模态可以通过视觉交互类可组合能力采集。
在一些实施例中,中控设备可以在所有的采集通道中,选取部分通道来采集特定事件的模态信息。例如,用户和设备之间的人机距离较远时,可以选择远场的拾音可组合能力来采集语音指令。
在一些实施例中,针对某种模态信息,中控设备可以选取多个通道共同采集该模态信息。该模态信息可以被称为第一模态数据。例如,可以同时选择近场和远场的拾音可组合能力来采集语音指令。这样,可以融合多通道采集的模态信息,获得更加准确、丰富的模态信息,便于后续操作的准确性。
策略2,可组合能力的活跃程度优先。
具体的，中控设备可以选择活跃程度较高或最高的一个或多个可组合能力来检测特定事件。可组合能力的活跃程度和以下设备信息相关：（1）可组合能力所在设备是否处于启动状态。若启动则活跃程度高。（2）可组合能力所在设备最近被激活时长。设备被激活时长越长，活跃程度也就越高。（3）可组合能力所在设备接收输入的频率。设备接收输入的频率越高，则活跃程度也就越高。
策略3,近用户优先。
具体的，中控设备可以选择距离用户较近的设备中的交互类可组合能力来检测特定事件。可组合能力所在设备与用户的距离，可通过检测生物识别信号（例如人脸、声纹、皮电、心率等信息）的强度进行判断。
策略4,用户习惯优先。
具体的,中控设备可以根据用户习惯,优先选择历史记录中最常被调用的可组合能力来检测特定事件。
策略5,用户选择优先。
具体的,用户可以自主选择用于检测特定事件的可组合能力,中控设备可以触发用户选择的可组合能力检测特定事件。这里对用户选择用于检测特定事件的可组合能力的方式不做限定,例如可以通过在中控设备上操作选择,可以通过语音、手势等方式选择等等。
策略6,用户的注意力优先。
注意力的含义,以及,策略6的具体实现,可参考后续S108的相关描述,这里暂不赘述。
策略7，能力优先。
具体的，中控设备可以选择交互能力更强或更多的设备中的交互类可组合能力来检测特定事件。例如，中控设备可以优先选择交互能力较强的手机中的交互类可组合能力来检测特定事件。
策略8，预置的默认排序优先。
在一些实施例中，设备或者用户可以预先设置用于检测特定事件的设备的优先级，例如可以在设备出厂时预置或者由用户在使用过程中预置。该预置的设备优先级可以存储在云端服务器，也可以存储在通信系统10的任意一个或多个设备中。
具体实现中，中控设备可以根据预置的设备优先级，优先选择优先级高的设备中的交互类可组合能力来检测特定事件。
使用上述策略，可以有效地筛选出合适的交互类可组合能力来检测特定事件。
不限于上述列举的几种策略，本申请实施例还可以使用其他策略来选择交互类可组合能力检测特定事件。例如，中控设备还可以选择距离中控设备较近的设备中的交互类可组合能力来检测特定事件，或者，选择最近和中控设备有交互的设备中的交互类可组合能力来检测特定事件。
中控设备在虚拟聚合设备中选择的用于检测特定事件的部分或全部交互类可组合能力,可以被称为第一可组合能力,第一可组合能力所在的设备即为第一设备。
第一可组合能力的数量可以为一个或多个。第一设备的数量也可以为一个或多个。举例说明,在家庭范围中,中控设备可以将摄像头、智能音箱确定为第一设备,并触发摄像头启动视觉交互类可组合能力采集图像,触发智能音箱启动语音交互类可组合能力采集音频。
也就是说,中控设备可以在虚拟聚合设备中,选择具备检测特定事件的能力的部分或全部设备作为第一设备。
例如,配置有麦克风、显示屏、摄像头或加速度传感器等的设备具备检测用户输入的交互操作的能力,中控设备可以选择这一类设备作为第一设备。
又例如,配置有摄像头、速度传感器的设备可用于检测用户的状态,中控设备可以选择这一类设备作为第一设备。
又例如,配置有摄像头、距离传感器的设备可用于检测用户和设备之间的态势,中控设备可以选择这一类设备作为第一设备。
又例如,配置有温度传感器、湿度传感器的设备可用于检测环境状态,中控设备可以选择这一类设备作为第一设备。
又例如，具备消息收发能力的设备可用于接收通知消息，具备日程添加能力的设备可用于获取日程信息，中控设备可以选择这一类设备作为第一设备。
如果中控设备在S105中配置了虚拟聚合设备,则该虚拟聚合设备中的第一可组合能力已经提前做好了启动的准备,因此,在S106中,中控设备可以快速、方便地触发第一设备启动该第一可组合能力来检测特定事件。可见,通过配置虚拟聚合设备,可以提高通信系统10执行S106的效率,从而更好地为用户提供服务。
在本申请实施例中,中控设备可以通过设备间的连接,通过发送通知消息(如广播、组播)等方式,触发第一设备启动第一可组合能力来检测特定事件。
在本申请的一些实施例中,各个第一设备启动第一可组合能力采集相应的数据后,可以在本地分析该数据,并将分析的结果(例如识别到的事件)发送给中控设备,以供中控设备获知当前是否检测到特定事件。
在本申请的另一些实施例中,一个或多个第一设备启动第一可组合能力采集相应的数据后,可以将自身采集到的数据发送给中控设备,由中控设备来融合多个第一可组合能力采集的数据,分析当前是否检测到特定事件。
在本申请的另一些实施例中,一个或多个第一设备启动第一可组合能力采集相应的数据后,可以将自身采集到的数据发送给后续S107中的第二设备,由第二设备根据该数据分析用户意图并拆分待执行任务。
在一些实施例中,一个可组合能力可用于采集少量模态(例如一个模态)的数据,多模态的数据需由多个可组合能力采集。不同的可组合能力通常具备不同的采样率。采样率是指可组合能力在单位时间(例如一秒、十秒、一分钟等)内采集数据的次数。电子设备中,各个可组合能力的采样率可以由该电子设备自主设定,这里不做具体限定。
在本申请实施例中，特定事件可以是多模态的组合。也就是说，特定事件可以包括多模态的数据。基于此，在一些实施例中，中控设备可以确定统一的采样率，触发第一设备中的第一可组合能力统一使用该采样率来采集数据。这样，各个第一可组合能力使用相同的采样率来采样数据，虚拟聚合设备可以获取到数据量相差不大的多种模态的数据，可以更加方便、快捷地融合多模态数据，以识别特定事件。可见，第一可组合能力使用统一的采样率采集数据，可以确保各个第一可组合能力采集的数据特征的完整性，并可以节省检测特定事件所消耗的资源。
中控设备确定可以通过以下任意一种方式确定统一的采样率:
方式1，中控设备任意选定一个采样率作为该统一的采样率。
例如,中控设备可以预存统一的采样率。或者,中控设备可以在多个第一可组合能力中,任意选择一个可组合能力的采样率作为该统一的采样率。
方式2,中控设备将活跃度最高的第一可组合能力的采样率,确定为该统一的采样率。
具体的,中控设备可以通知各个第一可组合能力上报活跃度信息和采样率,并根据各个第一可组合能力上报的活跃度信息,确定各个第一可组合能力的活跃度。然后,中控设备将活跃度最高的第一可组合能力的采样率下发至各个第一可组合能力,通知各个第一可组合能力统一按照该采样率来采样。
可组合能力的活跃度反映了用户使用该可组合能力的频次,或,用户读取该可组合能力采集数据的频次。上述频次越高,可组合能力的活跃度也就越高。
活跃度信息可包括以下一项或多项：可组合能力所在设备的使用状态、可组合能力在初始时间间隔两次采集到的数据量的变化情况、可组合能力采集的数据和用户的关联程度。初始时间间隔可以是预先设置的固定系统参数，也可以是根据一定策略调整得到的参数。其中，可组合能力所在设备的使用状态可包括使用频次等，使用频次越高，活跃度也就越高。可组合能力在初始时间间隔两次采集到的数据量的变化越大，活跃度也就越高。可组合能力采集的数据和用户的关联程度越高，活跃度也就越高。例如，用户白天在客厅的时间更多，客厅设备中的可组合能力采集到的数据，相比于卧室设备中的可组合能力采集到的数据，和用户的关联程度更高。
通过上述方式2，使用活跃度最高的第一可组合能力的采样率作为统一的采样率，可以保证活跃度最高的第一可组合能力能够采集到足够丰富的和用户相关的数据，能够让中控设备检测特定事件的准确率更高。
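下述Python片段示意了方式2确定统一采样率的过程（各可组合能力上报的数值均为示例性假设）：中控设备取活跃度最高的第一可组合能力的采样率作为统一采样率，并下发至各个第一可组合能力：

```python
# 各第一可组合能力上报的采样率（单位时间采集次数）与活跃度（数值为示例性假设）
reports = [
    {"cap": "远场拾音",   "rate": 16, "activity": 0.9},
    {"cap": "摄像头图像", "rate": 30, "activity": 0.4},
    {"cap": "红外探测",   "rate": 10, "activity": 0.2},
]

# 将活跃度最高的第一可组合能力的采样率确定为统一采样率
unified_rate = max(reports, key=lambda r: r["activity"])["rate"]

# 下发统一采样率，各个第一可组合能力统一按该采样率采集数据
for r in reports:
    r["rate"] = unified_rate
print(unified_rate)  # 16
```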
在本申请一些实施例中,上述S105中初始化虚拟聚合设备过程中的步骤1中,中控设备也可以使用统一的采样率来采集环境信息。
在本申请实施例中,上述提及的统一的采样率也可以被称为第一采样率。
在本申请一些实施例中,如果中控设备执行了上述可选步骤S105,则中控设备可以在S105之后,触发虚拟聚合设备中的第一可组合能力统一使用该采样率来采集数据。这样可以考虑到采样率对模态数据融合的影响,支持虚拟聚合设备配备自适应的初始感知采集策略。
S107,中控设备触发虚拟聚合设备中的第二设备分析特定事件所表征的用户意图,确定该用户意图对应的待执行任务。
在本申请一些实施例中,中控设备可以选择虚拟聚合设备中的部分或全部资源来分析特定事件所表征的用户意图,确定该用户意图对应的待执行任务。该部分或全部资源可以被称为第三资源。第三资源的数量可以为一个或多个。第三资源可包括来自一个电子设备的资源,也可以包括来自多个电子设备的资源。第三资源为可组合能力,例如可以为识别类可组合能力。
在本申请的一些实施例中,中控设备可以选择配置的虚拟聚合设备中的部分或全部识别类可组合能力来识别用户意图及待执行任务。
中控设备可以在虚拟聚合设备的识别类可组合能力中,任意选择或者根据一定策略来选择部分识别类可组合能力来识别用户意图及用户意图对应的待执行任务。
具体的,中控设备可以结合以下一项或多项来选择合适的识别类可组合能力识别用户意图及用户意图对应的待执行任务:虚拟聚合设备历史检测到的用户状态、设备状态、环境状态,用户画像、全局上下文,或记忆。
上述策略例如可以包括以下任意一种或多种的结合:
策略1,可组合能力的活跃程度优先。
具体的,中控设备可以选择活跃程度较高或最高的一个或多个可组合能力来识别用户意图及确定该用户意图对应的待执行任务。可组合能力的活跃程度的确定方式可参考前文。
策略2,近用户优先。
具体的,中控设备可以选择距离用户较近的设备设备中的识别类可组合能力来识别用户意图及确定该用户意图对应的待执行任务。设备与用户之间距离的判断方式可参考前文。
策略3,同输入设备优先。
具体的,中控设备可以优先选择第一设备中的识别类可组合能力来识别用户意图及确定该用户意图对应的待执行任务。
策略4,用户习惯优先。
具体的,中控设备可以根据用户习惯,优先选择历史记录中最常被调用的可组合能力来识别用户意图及确定该用户意图对应的待执行任务。
策略5,用户注意力优先。
具体的,中控设备可以优先选择用户注意力所在设备中的识别类可组合能力来识别用户意图及确定该用户意图对应的待执行任务。
策略6,基于机器学习/深度学习的推理判断。
具体的，中控设备可以收集在特定时间范围内特定事件和启动的识别类可组合能力的关联数据，基于机器学习/深度学习方法训练出可从前者的输入预测用户可能需要启动的可组合能力的模型。之后，基于该模型以特定事件为输入，得到需启动的可组合能力。该方法可以参考当前已广泛应用于推荐系统的排序技术来实现。同时需要考虑进行多模态输入作为特定事件的扩展。
策略7,用户选择优先。
具体的,用户可以自主选择用于识别用户意图及确定该用户意图对应的待执行任务的可组合能力,中控设备可以触发用户选择的可组合能力识别用户意图及确定该用户意图对应的待执行任务。这里对用户选择用于识别用户意图及确定该用户意图对应的待执行任务的可组合能力的方式不做限定,例如可以通过在中控设备上操作选择,可以通过语音、手势等方式选择等等。
策略8,能力优先。
具体的,中控设备可以选择能力更强或更多的设备中的识别类可组合能力来识别用户意图及待执行任务。例如,中控设备可以优先选择处理能力较强的手机中的识别类可组合能力来识别用户意图及待执行任务。
策略9,预置的默认排序优先。
在一些实施例中，设备或者用户可以预先设置用于识别用户意图及待执行任务的设备的优先级，例如可以在设备出厂时预置或者由用户在使用过程中预置。该预置的设备优先级可以存储在云端服务器，也可以存储在通信系统10的任意一个或多个设备中。
具体实现中,中控设备可以根据预置的设备优先级,优先选择优先级高的设备中的服务类可组合能力来识别用户意图及待执行任务。
使用上述策略,可以有效地筛选出合适的识别类可组合能力来识别用户意图及确定该用户意图对应的待执行任务。
不限于上述列举的几种策略,本申请实施例还可以使用其他策略来选择识别类可组合能力识别用户意图及确定该用户意图对应的待执行任务。例如,中控设备还可以选择距离中控设备较近设备中的识别类可组合能力来识别用户意图及确定该用户意图对应的待执行任务,或者,选择最近和中控设备有交互的设备中的识别类可组合能力来识别用户意图及确定该用户意图对应的待执行任务。
中控设备在虚拟聚合设备中选择的用于分析特定事件表征的用户意图，以及，确定该用户意图对应的待执行任务的部分或全部识别类可组合能力，可以被称为第三可组合能力，第三可组合能力所在的物理设备即为第二设备。
也就是说,中控设备可以在虚拟聚合设备中,选择具备识别用户意图及用户意图对应的待执行任务的能力的部分或全部设备作为第二设备。
第三可组合能力的数量可以为一个或多个。第二设备的数量也可以为一个或多个。举例说明,在家庭范围中,中控设备可以将智慧屏、手机确定为第二设备,并触发智慧屏和手机启动处理器来分析特定事件表征的用户意图,以及,确定该用户意图对应的待执行任务。
如果中控设备在S105中配置了虚拟聚合设备,则该虚拟聚合设备中的第三可组合能力已经提前做好了启动的准备,因此,在S107中,中控设备可以快速、方便地触发第二设备启动该第三可组合能力来分析特定事件所表征的用户意图,并确定该用户意图对应的待执行任务。可见,通过配置虚拟聚合设备,可以提高通信系统10执行S107的效率,从而更好地为用户提供服务。
在本申请实施例中,中控设备可以通过设备间的连接,通过发送通知消息(如广播、组播)等方式,触发第二设备启动第三可组合能力来分析特定事件所表征的用户意图,并确定该用户意图对应的待执行任务。
在本申请的一些实施例中,如果S106中第一设备在本地识别到特定事件,则可以将特定事件通知给第二设备,以供第二设备启动第三可组合能力来分析该特定事件所表征的用户意图,并确定该用户意图对应的待执行任务。这里,中控设备可以在确定第二设备后,将其告知第一设备,便于第一设备将特定事件通知给第二设备。
在本申请的另一些实施例中,如果S106中中控设备获知当前采集到的特定事件,则可以由中控设备直接触发或通知该第二设备分析该特定事件所表征的用户意图,并确定该用户意图对应的待执行任务。
在本申请的另一些实施例中,如果S106中一个或多个第一设备启动第一可组合能力采集相应的数据后,将采集到的数据发送给第二设备,则可以由第二设备根据该数据分析用户意图并拆分待执行任务。
在本申请实施例中，第二设备可以结合以下一项或多项来分析特定事件所表征的用户意图：第一设备历史检测到的用户状态、设备状态、环境状态，用户画像、全局上下文，或记忆。
在本申请实施例中,第二设备可以使用意图识别算法、神经网络算法等来分析特定事件所表征的用户意图。
在一些实施例中,在不同的场景(例如家庭范围和办公室)下,第二设备识别到的相同特定事件所表征的用户意图可以不同。
用户意图是指用户的目的或需求。特定事件和用户意图之间的对应关系,可以预先设置,也可以由虚拟聚合设备上的智慧助手在运行过程中学习得到。
举例说明,如果特定事件为用户输入的语音指令“看看客厅的情况”,则第二设备通过语音识别,可分析得到该用户意图包括:查看客厅的情况。又例如,如果特定事件为用户输入的语音指令“开灯”,则第二设备通过语音识别以及用户当前所在的位置(如客厅),可分析得到该用户意图包括:开启客厅的灯。
在本申请实施例中，第二设备识别到的用户意图可以采用结构化数据的方式进行描述。结构化数据是指用某种结构逻辑（例如二维表）来表达的数据。例如，用户意图可以为“操作：开灯；位置：客厅”。又例如，用户意图可以为“操作：播放音乐；内容：《七里香》”。
第二设备在识别到特定事件所表征的用户意图之后,可以确定该用户意图对应的待执行任务。设备执行用户意图对应的待执行任务,即可以满足用户意图,即满足用户需求。
在一些实施例中,在不同的场景(例如家庭范围和办公室)下,第二设备确定的同一个用户意图对应的待执行任务可以不同。
第二设备确定用户意图对应的待执行任务的过程,可以看做是将该用户意图拆分为待执行任务的过程。由用户意图拆分得到的待执行任务的数量可以为一个,也可以为多个。该多个任务可以是同时并列执行关系,也可以具备一定逻辑执行关系。该逻辑执行关系例如可包括:顺序关系、循环关系、条件关系和布尔逻辑等。
用户意图可以包括多种模态或类型。例如,用户意图可包括:视觉图像播放意图、音频播放意图、开灯意图、振动意图、移动设备的意图等等。例如,用户意图“查看客厅的情况”包括两种模态:查看客厅的实时图像、收听客厅的实时音频。
任务是指设备所执行的一个或多个操作。任务,即设备执行的操作也可以分为多种模态或服务类型。例如,任务可包括:视觉图像播放任务、音频播放任务、振动任务、闪光灯任务、移动任务等等。
在一些实施例中,第二设备可以将用户意图拆分为以模态为单位的一个或多个待执行任务。
在一些实施例中,第二设备可以结合以下一项或多项将用户意图拆分为以模态为单位的待执行任务:第一设备历史检测到的用户状态、设备状态、环境状态,用户画像、全局上下文,或记忆。
在一些实施例中,第二设备可以根据用户意图的模态或类型,选择合适的拆分方法来拆分该用户意图。具体实现中,拆分用户意图的方法可包括以下几种:
方法1,基于历史可组合能力的启动信息来拆分用户意图。
具体的,第二设备可以根据历史用户意图曾经对应启动过的可组合能力的类别,来拆分用户意图。也就是说,第二设备可以查找历史上拆分用户意图的结果,并参考历史拆分结果来拆分当前识别到的用户意图。
方法2，基于机器/深度学习的推理判断。
具体的,第二设备可以收集用户意图,第一设备采集到的用户状态、设备状态及环境状态、用户实际选择启动的可组合能力类别的关联数据,基于机器/深度学习的方法训练出可从用户意图和用户/设备/环境状态的输入推理出可组合能力类别的模型,最终以此为基础拆分出多个待执行任务。
方法3,规则判断。
具体的,第二设备可以预先设置在不同场景中,用户意图及其对应的待执行任务。在识别到场景和用户意图后,可以该场景和用户意图作为输入,经过规则的固定逻辑处理后,即输出对应的一个或多个待执行任务。
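下述Python片段给出了方法3（规则判断）的一个极简草图（规则表内容为示例性假设）：以场景和用户意图为输入，经规则的固定逻辑处理后，输出以模态为单位的一个或多个待执行任务：

```python
# 规则表（内容为示例性假设）：（场景, 用户意图）-> 以模态为单位的待执行任务
rules = {
    ("家庭范围", "查看客厅的情况"): [
        {"modality": "视觉", "task": "采集并播放客厅实时图像"},
        {"modality": "音频", "task": "收录并播放客厅实时音频"},
    ],
    ("家庭范围", "开灯"): [
        {"modality": "操控", "task": "开启客厅的灯"},
    ],
}

def split_intent(scene, intent):
    # 场景和用户意图经固定逻辑处理后，输出对应的待执行任务列表
    return rules.get((scene, intent), [])

print(split_intent("家庭范围", "查看客厅的情况"))
```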
在本申请实施例中,第二设备拆分用户意图得到的待执行任务,可以分为确定性任务和概率性任务。前者表示通信系统根据明确的用户意图识别出拆分好的待执行任务;后者则表示不明确用户意图下,用户可能需要设备执行的待执行任务。
概率性任务一般对应不明确的用户意图。由于概率性任务对于不同种类的待执行任务都带有对应的置信度,所以可以进一步按规则进行选择,例如可以仅将满足某阈值的待执行任务确定为用户意图对应的待执行任务。
在一些实施例中,第二设备分析得到特定事件所表征的用户意图,确定该用户意图对应的待执行任务之后,还可以将识别结果发送给中控设备。也就是说,第二设备可以将分析得到的特定事件所表征的用户意图,和/或,该用户意图对应的待执行任务,发送给中控设备。
通过上述基于模态拆分任务的方式,虚拟聚合设备能够利用用户意图对涉及跨设备的输入/输出进行细粒度的控制与执行,同时考虑感知数据的语义信息和环境的显式和隐式意图,进一步提升相对单设备的功能和个性化输出的优势。通过上述方式处理多端多模命令,能灵活适应通信系统的实际能力,提高交互体验和场景适应性。
可选步骤S109,中控设备重配置虚拟聚合设备。
在本申请一些实施例中,中控设备在S105中初始化虚拟聚合设备之后,还可以在当前已有的虚拟聚合设备的基础上,通过该虚拟聚合设备持续检测用户、设备、环境等状态,根据检测到的信息分析用户潜在的服务需求,并适应性调整虚拟聚合设备,即重配置虚拟聚合设备。当前已有的虚拟聚合设备可以是初始化后的虚拟聚合设备,也可以是经过多次重配置后的虚拟聚合设备。也就是说,S109可以多次执行。在本申请实施例中,中控设备可以在虚拟聚合设备检测到状态变化事件后,重配置虚拟聚合设备。
在本申请实施例中,状态变化事件也可以被称为第二事件。
中控设备重配置虚拟聚合设备的过程可包括如下步骤：
步骤1,中控设备触发第一设备检测状态变化事件。
在一些实施例中,状态变化事件可包括影响通信系统10提供的服务质量的事件。通信系统10提供的服务质量例如可包括用户满意度、和用户习惯的匹配度、人机交互识别的准确率、响应速度等等。
状态变化事件可以是一种模态或多种模态的组合。模态例如可包括文字、语音、视觉、动作、态势(如用户所在位置、用户和设备间的距离)、场景(如办公场景、家庭场景、通勤场景)等。
状态变化事件可以包括以下几种类型:
类型1,用户输入的交互操作。
用户输入的交互操作可包括但不限于：语音指令、作用于显示屏上的触控操作（如点击操作、长按操作、双击操作等）、隔空手势/悬浮手势、作用于设备按键上的操作、姿势、眼球转动指令、口型指令、移动或摇晃设备的操作，等等。
举例说明,用户使用手机时,如果想要投屏,可以输出语音指令“投屏”,也可以在手机的显示屏上点击投屏按钮。
类型2,用户状态发生变化的事件。
用户状态例如可包括用户所处的位置、用户执行的事务(例如运动、办公、看电视等)等等。用户状态变化的事件例如可包括:用户位置移动(例如移动了0.5米)、用户起床、用户睡觉、用户出门、用户运动等等。
类型3,设备状态发生变化的事件。
设备状态例如可包括设备电量、功耗、所处位置等等。设备状态变化的事件可包括电量低于阈值、位置移动、功耗高于阈值、有新设备加入或上线通信系统10、通信系统10中有设备退出或下线等等。
类型4,用户和设备之间态势发生变化的事件。
用户和设备之间的态势例如可包括两者之间的距离。用户和设备之间态势发生变化的事件例如可包括用户移动设备(例如拿起手机)、用户和设备之间的距离发生变化(例如变大或变小)。
类型5,环境状态发生变化的事件。
环境状态例如可包括:环境的温度、湿度、紫外线强度、风量、环境光等等。环境状态变化的事件例如可包括温度大于阈值(例如30摄氏度)。
类型6,设备获取到通知消息,或者,获取到即将执行的日程信息的事件。
电子设备获取到的通知消息可以由该设备中的应用在运行过程中主动生成,也可以由设备中应用对应的服务器发送,也可以由其他设备发送。例如,电子设备可以接收到可信机构发送的用于通知极端天气(例如风暴、大雪等)的通知消息等等。
日程是指对某个时刻或时间段的计划与安排。日程也可以称为事件、事务、行程或其他名称,这里不做限定。日程信息可以来自于电子设备中的备忘录、日历(calendar)、闹钟(clock)、订票类应用、线上会议类应用等等。
在本申请实施例中,中控设备可以在当前配置的虚拟聚合设备中,选择支持检测状态变化事件的可组合能力,即在当前的虚拟聚合设备的交互类可组合能力中,选择其中的部分或全部交互类可组合能力来检测该状态变化事件。
中控设备可以在当前的虚拟聚合设备的交互类可组合能力中，任意选择或者根据一定策略来选择部分或全部可组合能力检测状态变化事件。该策略例如可以为：选择距离中控设备较近的设备中的可组合能力来检测状态变化事件、选择和中控设备最近有交互的可组合能力来检测状态变化事件、活跃度优先、近用户优先、用户习惯优先，等等。
在一些实施例中,中控设备可以根据当前时间或场景在虚拟聚合设备中选择部分交互类可组合能力,来检测状态变化事件。例如,场景可包括白天模式、夜间模式、观影模式、运动模式等等等,中控设备可以在不同的场景下选择虚拟聚合设备中不同的交互类可组合能力来检测状态变化事件。
中控设备选择的用于检测状态变化事件的部分或全部交互类可组合能力，和S106中用于检测特定事件的交互类可组合能力相同，即用于检测状态变化事件的交互类可组合能力为第一可组合能力，用于检测状态变化事件的设备为第一设备。选择第一可组合能力、第一设备的策略，可参考前文S106中的相关描述。
在本申请实施例中,中控设备可以通过设备间的连接,通过发送通知消息(如广播、组播)等方式,触发第一设备启动第一可组合能力来检测状态变化事件。
在本申请的一些实施例中,各个第一设备启动第一可组合能力采集相应的数据后,可以在本地分析该数据,并将分析的结果(例如识别到的事件)发送给中控设备,以供中控设备获知当前是否检测到状态变化事件。
在本申请的另一些实施例中,一个或多个第一设备启动第一可组合能力采集相应的数据后,可以将自身采集到的数据发送给中控设备,由中控设备来融合多个第一可组合能力采集的数据,分析当前是否检测到状态变化事件。
在本申请的另一些实施例中,一个或多个第一设备启动第一可组合能力采集相应的数据后,可以将自身采集到的数据发送给后续S107中的第二设备,由第二设备根据该数据分析用户的服务需求。
步骤2,中控设备触发第二设备根据检测到的状态变化事件,分析用户的服务需求。
在获知状态变化事件后,中控设备可以根据该状态变化事件来分析或识别用户的服务需求。换句话说,中控设备可根据状态变化事件来预测用户的服务需求。
在本申请实施例中,中控设备基于状态变化事件识别出来的用户服务需求可以分为确定性需求和概率性需求。
确定性需求为识别出的需要为用户提供的确切服务,通常由用户的明确指令(例如用户在设备人机界面上的操作、用户明确的语音指令,或符合特定设备人机交互所定义的手势等)为输入。
概率性需求则表示识别出的用户可能潜在的服务需求,即用户出现请求该服务的趋势,但未能确定需要马上提供服务。概率性需求一般对应非显性的用户行为(例如位置变化、睡眠状态变化等)或环境本身的状态变化(例如温度变化等)。由于概率性需求往往有多个可能的输出并带有对应的置信度,所以可以进一步按照规则进行选择,例如可以选择满足某阈值的一个或多个作为备选。
如上所述,状态变化事件可以包括多种类型。所以在具体实现中,原始的虚拟聚合设备可能会检测到多种状态变化事件,中控设备可能基于单一的状态变化事件分析用户的服务需求,也可能基于多个状态变化事件综合分析用户的服务需求。
中控设备根据状态变化事件识别或分析用户服务需求的方式,可以包括以下几种:
方式1,基于固定规则确定用户的服务需求。
具体的,即根据场景特性,预先设定固定的判断识别规则,状态变化事件经过该规则的固定逻辑处理后,即输出判别结果,即输出用户服务需求。例如,用户在厨房大概率会发生厨电控制事件,因此如果检测到用户步入厨房,则可以判断用户有厨电控制需求。又例如,用户在气温高于30摄氏度时大概率会开启空调,因此如果检测到气温高于30摄氏度,则可以判断用户有启动空调的需求。
方式2,基于知识图谱确定用户的服务需求。
知识图谱是一种知识库，其中的数据通过图结构的数据模型或拓扑整合而成。知识图谱通常被用来存储彼此之间具有相互联系的实体。在本申请实施例中，知识图谱展示了不同的状态变化事件，和，用户的服务需求之间相互联系的数据结构。知识图谱可以基于以往用户和通信系统10中的交互信息来构建。在其他一些实施例中，知识图谱还可以是人工设计，或基于对大量群体用户的统计而获得。例如上述启动空调的例子中，初始可以通过人工设计或者统计方式定义“30℃”时有启动空调的需求，后续运行过程中如果逐步发现某具体用户实际上在“32℃”时才有这个需求，则可以更新图谱内容。
不同用户可以对应不同的知识图谱。
与基于固定规则确定服务需求的处理类似,区别在于方式2不用固定的判断逻辑来实现,而是将判断逻辑通过知识图谱所表示的关联关系进行处理。方式2可以灵活扩展确定服务需求的规则和场景。
方式3,通过机器学习确定用户的服务需求。
方式3通过收集实际场景中状态变化事件与用户实际服务需求的关联数据,基于机器学习的方法训练出可从前者推理出后者的模型,以作为判断服务需求的实现机制。方式3可参考当前已广泛应用于语音助手中的语义理解技术(例如从用户语言指令识别出用户意图)来实现,同时可以考虑多模态(例如多个状态变化事件作为输入)识别的扩展。
在本申请一些实施例中,中控设备还可以结合上下文、用户画像、记忆、用户习惯、当前虚拟聚合设备的配置状态等来识别用户的服务需求,这样能够更加精准有效的识别用户的服务需求。其中,通过配置虚拟聚合设备就形成了虚拟聚合设备的配置状态。原理上,“当前虚拟聚合设备的配置状态”可在一定程度上表征过去时刻用户的服务需求,从而可以用于推断和识别用户当前时刻的服务需求(如采用马尔科夫过程建模)。
上下文、用户画像、记忆、用户习惯的定义可参考后续实施例的相关描述。中控设备结合上下文、用户画像、记忆、用户习惯等来识别用户的服务需求的具体实现,可参考后续实施例的详细介绍。
在本申请的一些实施例中,如果步骤1中第一设备在本地识别到状态变化事件,则可以将状态变化事件通知给第二设备,以供第二设备启动第三可组合能力来分析该状态变化事件所对应的服务需求。这里,中控设备可以在确定第二设备后,将其告知第一设备,便于第一设备将状态变化事件通知给第二设备。
在本申请的另一些实施例中,如果步骤1中中控设备获知当前采集到的状态变化事件,则可以由中控设备直接触发或通知该第二设备分析该状态变化事件所对应的服务需求。
在本申请的另一些实施例中,如果步骤1中一个或多个第一设备启动第一可组合能力采集相应的数据后,将采集到的数据发送给第二设备,则可以由第二设备根据该数据分析对应的服务需求。
步骤3,中控设备触发第二设备基于用户的服务需求,确定服务方案。
中控设备可以使用一定的策略,基于用户的服务需求(确定性或概率性),确定需预备的服务方案。
服务方案包括为用户提供服务而需要进行的准备工作。在做好了对应的准备工作后,中控设备即可以在确认需要为用户提供服务时可以直接执行该服务所涉及的流程和功能。
具体地,所确定的服务方案可包含多项信息以指示通信系统的后继调整和适应行为,举例如下:
1.所需交互方式,例如音频输入/输出、视频输入/输出、位置检测、手势检测等。同时,针对各种交互方式,还可以包含其它可组合能力的属性,如位置(如在指定房间、指定区域等)、性能(如远场/近场拾音等)等。
2.能力组合策略，例如可以使用同设备优先（所需要的各个资源和能力尽量来自单一设备）、近用户优先（与用户进行交互的能力尽量靠近用户）、性能指标优先（优先选择能满足所需交互方式的性能要求，例如播放音乐时一般优先选用远场音频播放）等策略来确定聚合的可组合能力。
为了能更准确、更适合场景地提前调整环境设备的能力组合配置,其具体实施可包含多种方法和策略,举例如下:
对当前已支持的简单交互的服务(例如天气、闹钟等),直接由当前交互设备按照原流程处理,可维持用户简单、直接的体验。
对于涉及多设备交互的服务,则根据该服务输出多设备可组合能力的组合方案,可能的方法有:在对应服务项中预配置、基于知识推理等。
在本申请实施例中,第二设备输出的服务方案中可能存在一些备选项,例如针对某个服务,优选图像展示,但在图像展示能力不具备时,可以通过语音方式;或者某次服务优选远场拾音,但在能力不具备时可以用近场拾音来补充;或者某次服务优选多模态输出,但在能力不具备时,可以只用必须的模态而放弃一些可选模态。在具体实现的时候,还可以将上面这些可能的组合形成多个备选的服务方案,供后面的步骤处理。
可选步骤4,中控设备请求组织虚拟聚合设备的可组合能力。
如果执行了上述步骤3,则中控设备可以基于服务方案,请求聚合该服务方案对应的可组合能力。
具体的,基于服务方案请求聚合可组合能力时,涉及与当前环境中虚拟聚合设备实际所具备的可组合能力的匹配和优选,实施方案可包括以下任意一种或多种:
根据所需交互方式,筛选可组合能力的类型,例如拾音、图像显示、可提供服务种类等。
根据用户/环境状态,筛选需要某个可组合能力的物理位置、朝向等会影响交互效果的因素,例如客厅、卧室等。例如,如果用户在卧室,则可以选择位于卧室的可组合能力。
根据交互方式和用户/环境状态,筛选某个可组合能力的性能需求,例如拾音是远场/近场、显示是公共大屏还是私人小屏、设备移动性等等。例如,如果当前环境中人数较多,则可以选择私人小屏,以保证用户隐私。
上述多种方法同时实施时,其顺序不受限定。
在一些实施例中,针对使用上述方案可能获得的多个满足条件的备选方案,中控设备可以结合当前虚拟聚合设备实际可用的可组合能力,排除不可能实现的方案,并结合用户预配置的策略,最终选定方案并向通信系统10提出聚合申请。
如果未执行上述步骤3,则中控设备可以基于步骤2中分析得到的服务需求,直接请求聚合对应的可组合能力。
中控设备在基于状态变化事件分析得到用户的服务需求后,可以不必单独执行制定服务方案这一步骤,而直接从用户服务需求分析出虚拟聚合设备的配置调整目标规格,从而支持后继的实施配置调整。
具体地,中控设备获取到用户的服务需求后,可以结合通信系统10实际的可组合能力集合以及其中可组合能力的属性(性能、位置等),直接筛选出该服务需求对应所需的可组合能力。在一些实施例中,中控设备还可以基于某一固定或可配置能力组合策略,实施前述的筛选过程。
直接从用户服务需求分析出虚拟聚合设备的配置调整目标规格,可以简化制定服务方案这一步骤,便于实现。另一方面,由于减少了中间层次的分析服务方案的过程,该方案可以 降低对中控设备的处理能力要求,可以广泛应用于性能配置较低的中控设备上。这种方式对于环境设备配置相对简单(如设备数量较少),业务场景相对固定(例如办公室等)等对环境设备智能协同的灵活度要求不高的情况,可以在满足用户要求的情况下,快速、方便地实现虚拟聚合设备的重配置。
通过步骤4,中控设备可根据用户配置偏好、人机交互或用户操作历史,以及用户、设备、环境等状态信息,进行综合决策,选择合适的可组合能力。
步骤5,中控设备聚合该服务方案对应的可组合能力,以重配置虚拟聚合设备。
在完成上述关键步骤后，中控设备即可以进一步对申请到的可组合能力，结合当前虚拟聚合设备的配置状况，进行虚拟聚合设备的重配置。虚拟聚合设备的重配置可包括：改变当前虚拟聚合设备中可组合能力的配置参数、重新选择或更新组成虚拟聚合设备的可组合能力等等。更新当前虚拟聚合设备的可组合能力可包括：新的可组合能力的加入，原有可组合能力在不再需要的时候予以释放。
步骤5中虚拟聚合设备的重配置主要由智慧助手应用调用各个设备操作系统的特定接口来完成,例如可以针对操作系统中特有的分布式技术来完成,这里不做限定。
在本申请一些实施例中,对比上述S109中的状态变化事件和S106中的特定事件可知,特定事件属于状态变化事件的一种,状态变化事件包含但不限于特定事件。也就是说,中控设备可以在检测到特定事件后动态重配置虚拟聚合设备,也可以在检测到其他的状态变化事件后动态重配置虚拟聚合设备。
下面列举几个重配置虚拟聚合设备的示例。
示例1:
假设当前的环境为书房，书房中配备了音箱和智慧屏，同时用户携带手机和耳机。此时，组成虚拟聚合设备的可组合能力可包括这些设备提供的收音（耳机和手机是近场、音箱和智慧屏是远场）、放音（耳机和手机是近场、音箱和智慧屏是远场）、显示（智慧屏和手机）、拍摄（智慧屏和手机），以及这些设备中提供的软件服务和能力，以及这些设备的其它能力。如果深夜，用户在书房发出“加入视频会议”的语音指令，以与有时差的海外同事开会，则智慧助手可以根据该语音指令，分析确定当前需要虚拟聚合设备具备视频会议、摄像、显示、拾音、放音的能力。因此，中控设备可以将以下可组合能力组建为虚拟聚合设备：
摄像:智慧屏的摄像头,位置固定、广角,适合视频会议场景;
显示:智慧屏的屏幕,位置固定、大尺寸,适合视频会议场景;
拾音:音箱,配备了麦阵,拾音效果更好,而且能提供空间效果;
放音:耳机,在深夜可以避免外放声音扰人。
中控设备可以将上述各个设备的可组合能力分别配置为手机中视频会议App的输入输出部件，并启动手机上的视频会议App。这样，对于该App来说，与原来单纯在单个手机上运行该功能是完全一样的，但实际已经用了更适合该情景、更高体验的方式来提供服务。
示例2:
中控设备通过操作历史、用户画像等信息,了解到用户在晨起后经常会到客厅通过大屏查询日程。初始时,手机作为常用交互设备,一般会保持作为虚拟聚合设备的一部分。同时,早上用户在房间睡觉,用户佩戴的智能手表的睡眠检测能力被配置到虚拟聚合设备中,用于检测用户睡眠状态。其它设备因没有相关执行或检测任务而休眠。
如果手表检测到用户醒来的状态变化事件,则智慧助手可执行以下操作:(1)确定用户的服务需求为:预期的用户会到客厅,用大屏播报日程;(2)确定服务方案为:对用户位置进行检测;(3)然后进行动态配置:启动室内位置检测的可组合能力(如毫米波雷达、摄像头等),即将这些可组合能力配置并激活成为虚拟聚合设备的一部分。
如果室内定位相关的可组合能力检测到用户走往客厅的状态变化事件,则智慧助手可执行以下操作:(1)确定用户的服务需求为:在客厅浏览日程;(2)确定服务方案为:需获取日程信息(手机已提供),以及呈现日程的显式能力。(3)然后进行动态配置:按照用户的偏好(例如用户偏爱用客厅大屏进行展示),将大屏的显示能力(其可组合能力之一)配置成为虚拟聚合设备的一部分,即虚拟聚合设备已提前准备好给用户进行大屏显示。
如果手机检测到用户发出的语音指令“播报日程”,则智慧助手可执行以下操作:(1)确定用户的服务需求为:播报日程;(2)确定服务方案为:使用用户所偏爱的展示方式(智慧屏)播报日程信息(来自手机);(3)由于此时支持该服务方案的可组合能力已经被配置为虚拟聚合设备,因此可以迅速响应(无需临时执行相关的准备工作),使用该可组合能力执行任务。
在上述示例2中,虚拟聚合设备共检测到三次状态变化事件:用户醒来的状态变化事件、用户走往客厅的状态变化事件、语音指令“播报日程”。其中,用户醒来的状态变化事件、用户走往客厅的状态变化事件触发了虚拟聚合设备的重配置。语音指令“播报日程”作为特定事件触发虚拟聚合设备作出响应。
在本申请一些实施例中，在执行上述S107之后，即第二设备确定待执行任务之后，如果当前的虚拟聚合设备不能支持执行该待执行任务，也可以触发虚拟聚合设备的重配置。具体的，中控设备可以将支持该待执行任务的可组合能力配置为虚拟聚合设备。
在本申请实施例中,中控设备重配置虚拟聚合设备之后,还可以触发通信系统10中的部分设备提示用户当前已重配置该虚拟聚合设备。提示方式不做限定。例如,参考图5M,中控设备在重配置虚拟聚合设备后,可以触发电子设备(例如手机)在用户界面63中显示提示信息,以提示用户当前已重配置虚拟聚合设备。此外,用户还可以点击控件635,以查看当前重配置的虚拟聚合设备所包含的可组合能力。
通过重配置虚拟聚合设备这种逐步调整配置的方式,可以避免开启不适合目标场景的设备可组合能力。一方面,能在减少设备资源占用的情况下,仍能持续感知相关状态变化,避免了信息丢失,保证对用户的正确响应;另一方面,可以避免不必要的信息感知为用户隐私带来的风险。
通过上述重配置虚拟聚合设备的过程,能够在用户、设备、环境状态持续变化的场景下,控制虚拟聚合设备动态、自适应地重配置,使其能够准确、个性化地满足用户确定或潜在(未发生)的服务需求。
对比上述初始化虚拟聚合设备,和,重配置虚拟聚合设备的过程可知,初始化虚拟聚合设备时没有或未使用前置的虚拟聚合设备的状态及能力,而重配置虚拟聚合设备时则具备或使用了前置的虚拟聚合设备的状态及能力。
S108,中控设备触发虚拟聚合设备中的第三设备执行满足用户意图的待执行任务。
在本申请一些实施例中,中控设备可以选择虚拟聚合设备中的部分或全部资源来执行满足用户意图的待执行任务。该部分或全部资源可以被称为第二资源。第二资源可包括来自一个电子设备的资源,也可以包括来自多个电子设备的资源。第二资源的数量可以为一个或多 个。第二资源为可组合能力,例如可以为服务类可组合能力。
在本申请一些实施例中，中控设备可以选择配置的虚拟聚合设备中的部分或全部服务类可组合能力来执行上述待执行任务。也就是说，中控设备可以触发第二设备匹配待执行任务至合适的可组合能力，然后触发对应的该可组合能力所在的设备来执行该待执行任务。
中控设备可以在虚拟聚合设备的服务类可组合能力中,任意选择或者根据一定策略来选择部分服务类可组合能力来执行上述待执行任务。
具体的,中控设备可以结合以下一项或多项来选择合适的服务类可组合能力执行上述待执行任务:通信系统10历史检测到的用户状态、设备状态、环境状态,用户画像、全局上下文,或记忆。
上述策略例如可以包括以下任意一种或多种的结合:
策略1,可组合能力的活跃程度优先。
具体的，中控设备可以选择活跃程度较高或最高的一个或多个可组合能力来执行上述待执行任务。可组合能力的活跃程度的确定方式可参考前文。这样可以选择活跃度较高的服务类可组合能力来执行上述待执行任务。
策略2,近用户优先。
具体的,中控设备可以选择距离用户较近的设备设备中的服务类可组合能力来执行上述待执行任务。设备与用户之间距离的判断方式可参考前文。
策略3,同输入设备优先。
具体的,中控设备可以优先选择第一设备中的服务类可组合能力来执行上述待执行任务。
在一些实施例中，中控设备可以选择采集到更加关键的用于识别特定事件的信息的设备中的服务类可组合能力来执行上述待执行任务。例如，如果S106中用于检测语音指令的设备包括离用户较远的茶几上的手机A，以及，用户手中握持的手机B，则中控设备可以选择采集到较大声强的手机B来响应用户的语音指令。
策略4,用户习惯优先。
具体的，中控设备可以根据用户习惯，优先选择历史记录中最常被调用的可组合能力来执行上述待执行任务。这样可以选择用户习惯使用的服务类可组合能力来执行上述待执行任务。
策略5,基于机器学习/深度学习的推理判断。
具体的，中控设备可以收集在特定时间范围内待执行任务和启动的服务类可组合能力的关联数据，基于机器学习/深度学习方法训练出可从前者的输入预测用户可能需要启动的可组合能力的模型。之后，基于该模型以待执行任务为输入，得到需启动的可组合能力。该方法可以参考当前已广泛应用于推荐系统的排序技术来实现。同时需要考虑进行多模态输入作为待执行任务的扩展。
策略6,用户选择优先。
具体的，用户可以自主选择用于执行上述待执行任务的可组合能力，中控设备可以触发用户选择的服务类可组合能力执行上述待执行任务。这里对用户选择用于执行上述待执行任务的可组合能力的方式不做限定，例如可以通过在中控设备上操作选择，可以通过语音、手势等方式选择等等。这样可以根据用户的实际需求选择服务类可组合能力来执行上述待执行任务。
策略7,能力优先。
具体的，中控设备可以选择能力更强或更多的设备中的服务类可组合能力执行上述待执行任务。例如，中控设备可以优先选择有屏幕的或有扬声器的设备中的服务类可组合能力执行上述待执行任务。
策略8,预置的默认排序优先。
在一些实施例中，设备或者用户可以预先设置用于执行上述待执行任务的设备的优先级，例如可以在设备出厂时预置或者由用户在使用过程中预置。该预置的设备优先级可以存储在云端服务器，也可以存储在通信系统10的任意一个或多个设备中。例如，该预置的设备优先级由高到低可以是：有屏音箱、无屏音箱、智慧屏、车机、手机、PAD、手表。
具体实现中，中控设备可以根据预置的设备优先级，优先选择优先级高的设备中的服务类可组合能力来执行上述待执行任务。
策略9,用户的注意力优先。
注意力是指用户对外部世界的观察,对周遭环境的感知。
用户注意力所在的设备是焦点设备,是用户的人脸、视线及身体所关注的设备。
在本申请一些实施例中，中控设备可以选择用户注意力所在的设备中的服务类可组合能力来执行上述待执行任务。
中控设备可以利用通信系统10中一个或多个设备采集到的环境信息来确定用户的注意力所在的设备。环境信息可以是单一模态信息,也可以是多模态信息的组合。例如,环境信息可包括以下任意一种或多种:位置信息、文字、音频、视频等等。
具体的,中控设备确定用户注意力所在设备的方法,具体可包括以下几种:
(1)中控设备利用具备摄像头的B设备采集的图像来确定用户注意力所在的设备。
在第(1)种方法中,具备摄像头的B设备也可以被称为第四设备。
参考图8A,如果B设备采集到的图像包含A设备和用户,即A设备和用户都在B设备视野内,则中控设备可通过以下流程来确定用户的注意力在A设备上:
计算用户在B设备相机坐标系下的世界坐标Location_user(X_u, Y_u, Z_u);
计算A设备在B设备相机坐标系下的世界坐标Location_A(X_A, Y_A, Z_A);
计算用户在B设备相机坐标系下的视线方向Direction_Gaze(Pitch_u, Yaw_u, Roll_u)。
之后,可通过以下任意一种方法计算用户的注意力是否在A设备上:
方法一:计算用户到A设备的方向矢量与视线方向的方向相似度,若该方向相似度大于阈值,则确定用户的注意力位于A设备上。
具体的,可以先计算用户到A设备的空间矢量Direction_user2A(X_A−X_u, Y_A−Y_u, Z_A−Z_u),然后使用相似度度量方法如余弦相似度、欧式距离等,计算用户到A设备的方向矢量与视线方向的方向相似度。
方法二:计算用户的视线在A设备任意坐标平面上的落点与A实际坐标的距离,若该距离小于阈值,则确定用户的注意力位于A设备上。
具体的,以A的xy平面为例,用户视线在A_xy平面上的落点为Z′_A,计算Z_A与Z′_A的距离,其他平面的计算方法相同。
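为便于理解上述方法一中方向相似度的计算,下面给出一个示意性的Python代码草图(非本申请实施例的组成部分;其中的函数名与阈值取值均为便于说明而假设,视线方向假设已换算为方向矢量):

```python
import numpy as np

def gaze_on_device(loc_user, loc_a, gaze_dir, threshold=0.95):
    """方法一的示意:判断用户注意力是否在A设备上。

    loc_user / loc_a:用户与A设备在B设备相机坐标系下的世界坐标;
    gaze_dir:用户视线方向矢量(由Pitch/Yaw/Roll换算得到,此处假设已给出);
    threshold:方向相似度阈值(示例值,非本申请限定)。
    """
    # 计算用户到A设备的空间矢量 Direction_user2A = (X_A-X_u, Y_A-Y_u, Z_A-Z_u)
    direction_user2a = np.asarray(loc_a, dtype=float) - np.asarray(loc_user, dtype=float)
    gaze = np.asarray(gaze_dir, dtype=float)
    # 以余弦相似度作为方向相似度度量
    cos_sim = direction_user2a @ gaze / (np.linalg.norm(direction_user2a) * np.linalg.norm(gaze))
    return cos_sim > threshold

# 用法示例:用户位于原点,A设备位于正前方2米处,视线也朝向正前方
print(gaze_on_device((0, 0, 0), (0, 0, 2.0), (0, 0, 1.0)))  # True
```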
(2)中控设备利用具备麦克风的A设备,和,具备麦克风和摄像头的B设备,确定用户注意力所在的设备。
在第(2)种方法中,具备麦克风的A设备也可以被称为第四设备,具备麦克风和摄像头的B设备也可以被称为第五设备。
参考图8B,如果B设备采集到的图像包含用户但不包含A设备,即用户在B设备视野内但A设备不在B设备视野范围内,则中控设备可通过以下流程来确定用户的注意力在A设备上:
B通过声源定位,定位到在B坐标系下A的方位α_1;
A通过声源定位,定位到在A坐标系下用户的方位β_1;
B通过视觉检测,定位到用户在B坐标系下的坐标(x1, y1);
根据(x1, y1)、α_1、β_1,即已知三角形的2个角和1条边,可以求解出(x2, y2),即B坐标系下A的坐标,从而建立A坐标系和B坐标系之间的关系。
进而可以采用上一场景中的方法,将B设备所检测的用户视线方向,转换映射到A坐标系下,并采用场景(1)中的方法,计算用户的注意力是否在A设备上。
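针对上述"已知三角形的2个角和1条边求解A设备坐标"的步骤,下面给出一个基于正弦定理的示意性Python草图(假设相关角度已统一换算到B坐标系所在平面,旋转方向的符号需按实际几何关系确定):

```python
import math

def locate_device_a(user_xy, gamma, beta):
    """求解B坐标系下A设备坐标(x2, y2)的示意。

    user_xy:B通过视觉检测得到的用户坐标(x1, y1),B自身位于原点;
    gamma:在B处,B→用户方向与B→A方向之间的夹角(可由声源定位方位α_1换算得到);
    beta:在A处,A→B方向与A→用户方向之间的夹角(可由声源定位方位β_1换算得到)。
    """
    x1, y1 = user_xy
    d = math.hypot(x1, y1)                      # 已知边:B到用户的距离
    u = math.pi - beta - gamma                  # 三角形内角和求出用户处的内角
    dist_ba = d * math.sin(u) / math.sin(beta)  # 正弦定理求B到A的距离
    theta_a = math.atan2(y1, x1) + gamma        # B→A方向角(旋转符号为示例假设)
    return dist_ba * math.cos(theta_a), dist_ba * math.sin(theta_a)

# 用法示例:用户在(1, 0),两内角均为60°,则A应位于等边三角形的第三个顶点
print(locate_device_a((1.0, 0.0), math.radians(60), math.radians(60)))  # ≈ (0.5, 0.866)
```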
(3)中控设备利用具备摄像头的A设备和B设备,确定用户注意力所在的设备。
在第(3)种方法中,具备摄像头的A设备也可以被称为第四设备,具备摄像头的B设备也可以被称为第五设备。
在一些实施例中,参考图8C,如果A设备采集到的图像包括B设备,B设备采集到的图像包括用户,即B设备在A设备的视野内且用户在B设备的视野内,则中控设备可通过以下流程来确定用户的注意力在A设备上:
R_A→B和T_A→B为将A坐标系下的坐标向量(如normA=(x1, y1, z1))转换为B坐标系下的坐标向量(如normB=(x2, y2, z2))的变换矩阵:normB = normA·R_A→B + T_A→B;
用户与B的转换关系为R_user→B、T_user→B;
计算用户在B坐标系下的视线方向Gaze_coordB;
然后求用户人脸到A坐标系的转换关系:
normB = user·R_user→B + T_user→B
normA = normB·R_B→A + T_B→A = user·R_user→B·R_B→A + T_user→B·R_B→A + T_B→A
可以得到:
R_user→A = R_user→B·R_B→A
T_user→A = T_user→B·R_B→A + T_B→A
之后,可得用户视线在A坐标系下的表示:
Gaze_coordA = Gaze_coordB·R_user→A + T_user→A
然后可以判断用户的视线落点是否在A设备上,若是,则可以确定用户的注意力在A设备上。
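上述行向量形式的变换组合,可以用如下示意性的Python草图表达(仅为说明推导结果的用法,矩阵取值为假设):

```python
import numpy as np

def compose_user_to_a(r_user_b, t_user_b, r_b_a, t_b_a):
    """组合变换的示意:R_user→A = R_user→B·R_B→A,T_user→A = T_user→B·R_B→A + T_B→A。

    与正文约定一致,坐标按行向量处理:normB = normA·R_A→B + T_A→B。
    """
    r_user_a = r_user_b @ r_b_a
    t_user_a = t_user_b @ r_b_a + t_b_a
    return r_user_a, t_user_a

def gaze_in_a(gaze_coord_b, r_user_a, t_user_a):
    # 用户视线在A坐标系下的表示:Gaze_coordA = Gaze_coordB·R_user→A + T_user→A
    return gaze_coord_b @ r_user_a + t_user_a

# 用法示例:取单位旋转与平移(1, 0, 0),便于核对结果
r, t = compose_user_to_a(np.eye(3), np.zeros(3), np.eye(3), np.array([1.0, 0.0, 0.0]))
print(gaze_in_a(np.array([0.0, 0.0, 1.0]), r, t))  # [1. 0. 1.]
```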
在另一些实施例中,参考图8D,如果A设备采集的图像不包括用户,B设备采集的图像包括用户,即用户不在A设备的视野内但在B设备的视野内,并且A设备与B设备视野有交集,则中控设备可通过以下流程来确定用户的注意力在A设备上:
A、B通过特征点匹配/图像匹配等方式,确定二者视野有重叠;
A、B分别通过深度估计得到匹配点在各自坐标系下的世界坐标;
通过A、B坐标系下的共点,利用对极约束,计算A、B两个坐标系的转换关系;
B通过视觉检测,计算得到用户在B坐标系下的世界坐标和视线方向;
得到A在B坐标系下的世界坐标;
在建立了A坐标系和B坐标系之间的空间映射关系后,可以将B坐标系下的用户世界坐标和视线方向,映射到A坐标系下。然后可以利用场景(1)中的方法,计算用户的注意力是否在A设备上。
上述策略9中的A设备和B设备,可以是通信系统10中的任意设备,也可以是中控设备根据一定策略选择的第一设备,这里不做限定。
可见,通过上述策略9,可以选择用户注意力所在的设备中的服务类可组合能力来执行上述待执行任务,使交互方式更加自然,也更符合用户的需求。此外,用户也可以通过调整视线,触发虚拟聚合设备选择自己注意力所在的设备来执行上述待执行任务。
使用上述几种确定用于执行上述待执行任务的服务类可组合能力的策略,在针对某个模态的待执行任务具备多个或多端(即多设备)可用的可组合能力的情况下,可以有效地筛选出合适的服务类可组合能力来执行该待执行任务。
不限于上述列举的几种策略,本申请实施例还可以使用其他策略来选择服务类可组合能力执行上述待执行任务。例如,中控设备还可以选择距离中控设备较近的设备中的服务类可组合能力来执行上述待执行任务,或者,选择最近和中控设备有过交互的设备中的服务类可组合能力来执行上述任务。
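作为对上述若干选择策略的一个综合示意,下面给出一个Python代码草图,按策略2(近用户优先)、策略4(用户习惯优先)与策略8(预置的默认排序优先)加权打分选择可组合能力(字段、权重均为便于说明而假设,并非本申请限定的实现):

```python
def pick_capability(candidates, user_pos, history_counts, preset_priority):
    """综合策略2、策略4与策略8的示意性打分选择。

    candidates:候选服务类可组合能力列表,每项为字典(字段为假设);
    history_counts:历史调用次数统计;preset_priority:预置设备类型优先级(由高到低)。
    """
    def score(cap):
        # 策略2:距离用户越近得分越高
        dx, dy = cap["pos"][0] - user_pos[0], cap["pos"][1] - user_pos[1]
        distance_score = 1.0 / (1.0 + (dx * dx + dy * dy) ** 0.5)
        # 策略4:历史调用越频繁得分越高
        habit_score = history_counts.get(cap["id"], 0)
        # 策略8:预置设备类型排序越靠前得分越高
        kind = cap["device_kind"]
        rank_score = (len(preset_priority) - preset_priority.index(kind)
                      if kind in preset_priority else 0)
        # 权重为示意值,实际可结合用户画像、全局上下文等调整
        return 2.0 * distance_score + 1.0 * habit_score + 0.5 * rank_score
    return max(candidates, key=score)

caps = [
    {"id": "tv.display", "device_kind": "智慧屏", "pos": (5, 0)},
    {"id": "speaker.audio", "device_kind": "有屏音箱", "pos": (1, 1)},
]
print(pick_capability(caps, user_pos=(0, 0),
                      history_counts={"speaker.audio": 3},
                      preset_priority=["有屏音箱", "无屏音箱", "智慧屏"])["id"])  # speaker.audio
```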
中控设备在虚拟聚合设备中选择的用于执行上述待执行任务的部分或全部服务类可组合能力,可以被称为第二可组合能力,第二可组合能力所在的物理设备即为第三设备。
第二可组合能力的数量可以为一个或多个。第三设备的数量也可以为一个或多个。举例说明,在家庭范围中,中控设备可以将智慧屏、智能音箱确定为第三设备,并触发智慧屏播放图像,触发智能音箱播放音频。
也就是说,中控设备可以在虚拟聚合设备中,选择具备执行上述待执行任务的能力的部分或全部设备作为第三设备。
在一些实施例中,在不同的场景(例如家庭范围和办公室)下,针对相同的待执行任务,用于执行该待执行任务的第二可组合能力和第三设备也可以不同。
如果中控设备在S105中配置了虚拟聚合设备,则该虚拟聚合设备中的第二可组合能力已经提前做好了启动的准备,因此,在S108中,中控设备可以快速、方便地触发第三设备启动该第二可组合能力来执行上述待执行任务。可见,通过配置虚拟聚合设备,可以提高通信系统10执行S108的效率,从而更好地为用户提供服务。
在本申请实施例中,S108可以延时执行。在一些实施例中,S108也可以不执行。
在本申请实施例中,中控设备可以通过设备间的连接,通过发送通知消息(如广播、组播)等方式,触发第三设备启动第二可组合能力来执行上述待执行任务。具体的,中控设备可以根据上述筛选可组合能力的结果,将执行待执行任务的指令分发给各个可组合能力所在的第三设备,触发该第三设备按照多个待执行任务之间的执行关系来执行对应的待执行任务。
在本申请的一些实施例中,S107中第二设备确定用户意图对应的待执行任务后,可以将该待执行任务通知给第三设备,以供第三设备启动第二可组合能力来执行上述待执行任务。这里,中控设备可以在确定第三设备后,将其告知第二设备,便于第二设备将该待执行任务通知给第三设备。
在本申请的另一些实施例中,如果中控设备获知了用户意图对应的待执行任务,则可以由中控设备直接触发或通知该第三设备执行上述待执行任务。
下面列举几个示例:
示例1:
参考图9,图9示出了示例1的场景。
如图9所示,假设小孩在客厅看电视并操作,母亲在房间工作并用客厅智慧屏摄像头看护孩子。此时的虚拟聚合设备可以包括:客厅和房间设备(这个例子中涉及客厅的智慧屏和音箱,房间的手机,以及环境中人员位置定位能力的相关设备)中组件能力的集合。初始时,虚拟聚合设备调度客厅智慧屏的相关能力服务于孩子看电视的需求。同时,由于母亲在房间,所以虚拟聚合设备需要配置相关的能力(如手机的能力)以准备感知母亲的请求。
如果母亲在房间对着手机说“看看客厅的情况”,智慧助手可以识别到“看客厅”的意图,并将其拆分为观看客厅摄像、收录客厅声音、播放客厅的声音等三个模态的待执行任务。
当前环境中有多个能力组件可能用于执行相关任务。例如:
视觉类:在虚拟聚合设备可选的能力组件包括智慧屏、手机的摄像能力;
音频类:可选的能力组件包括智慧屏、手机、音箱的音频能力;
显示类:可选的能力组件包括了智慧屏、手机的显示能力;
智慧助手在选择对应设备执行待执行任务时,可以选择以下可组合能力来执行上述待执行任务:
音箱的拾音能力。在客厅内,因为孩子正在看电视,而且可能会与智慧屏进行语音交互,所以智慧屏的显示、放音和拾音能力已经被占用,不在考虑范围。所以智慧助手可以选择其它能力,例如可以选择音箱提供的拾音能力。
智慧屏的摄像能力。虽然智慧屏的部分能力被占用,但其摄像能力处于空闲可用状态,而且是客厅内该类能力的唯一选择,所以选择了该能力执行任务。
手机的播放能力。出于房间中只有手机可用,或者因为用户正在与手机交互(发出该指令)等原因,智慧助手可选择手机来播放音频。
上述示例支持母亲在智慧屏部分可组合能力被占用的情况下,使用闲置的可组合能力获得客厅的视觉情况。通过上述方式,可以解决可组合能力分配的冲突问题,实现了一个物理设备划分出多个可组合能力分别服务于不同的任务。即,本申请实施例支持各个设备按照可组合能力进行管理和复用,并且允许多用户同时使用同一设备的不同可组合能力。
示例2:
假设虚拟聚合设备包括:客厅和房间设备中可组合能力的集合。在房间里的手机接到用户亲人的来电,智慧助手可以识别到“接听电话”的意图。智慧助手可先把“接听电话”的意图分解成播放音频的待执行任务。对于播放音频任务,虚拟聚合设备可选的可组合能力包括:(1)手机上的音频可组合能力;(2)音箱上的音频可组合能力;(3)大屏的音频可组合能力。智慧助手可根据匹配度选择合适的可组合能力执行播放音频的待执行任务。例如,由于手机在若干时间内没有被用户使用,可见手机不是用户想要的用于执行待执行任务的设备,大屏正在播放音频,音箱内的音频可组合能力是较为合适的提供放音能力的可组合能力。因此,智慧助手可将播放音频的待执行任务分发至音箱的音频可组合能力中,同时分发调节音量的任务给与其冲突的能力组件。
通过上述步骤S107-S108,虚拟聚合设备能够把用户意图分拆成多个待执行任务,再将其分发至虚拟聚合设备中不同的可组合能力,能够充分利用虚拟聚合设备的能力为用户提供范围更广且环绕式的服务。
在上述图3所示的基于多设备提供服务的方法中,第一设备、第二设备、第三设备可能包括相同的设备,也可能包括不同的设备。类似的,第一资源、第二资源、第三资源中的任意一项或多项,可能全部来自同一设备,也可能全部或部分来自不同的设备。第一资源、第 二资源、第三资源中的任意多项,可能相同,也可能不同。第一可组合能力、第二可组合能力、第三可组合能力中的任意一项或多项,可能全部来自同一设备,也可能全部或部分来自不同的设备。第一可组合能力、第二可组合能力、第三可组合能力中的任意多项,可能相同,也可能不同。
实施图3所示的基于多设备提供服务的方法,无需依赖不同设备上各个单独语音助手之间的指令信息交换来执行跨设备的任务,而是通过中控设备统一调度系统内资源,把已识别的用户意图分解成待执行任务,将其分发至合适的可组合能力执行。采用该技术方案,一方面能突破流程式单点的执行方式,提供环绕式的服务,达到一设备两用的效果;另一方面,也考虑了用户意图和实时的场景感知信息。相对于现有技术,该方法可以降低协作成本,并支持多样化、个性化的任务分发。
在本申请实施例中,通信系统10还可以基于全局上下文来执行上述方法。
首先,介绍一种基于全局上下文的交互方法。该交互方法可以应用于前述的S107-S108步骤中。
具体的,该方法可以基于第一设备上的第一可组合能力接收到多轮交互输入。上述多轮交互输入可以来自于单个设备,也可以来自于多个设备。第三可组合能力可以分析上述接收到的多轮交互输入,以获取到全局上下文。然后,第三可组合能力可基于该全局上下文,确定出用户的意图,进而使得虚拟聚合设备选择合适的第二可组合能力执行用户意图对应的任务。其中,全局上下文中的全局可以指的是通信系统10包括的所有已连接设备,例如,当通信系统10所包括的设备是用户家庭中的所有已连接设备时,则全局为上述用户家庭中的所有已连接设备。全局上下文指的是在通信系统10包括的所有已连接设备中,各交互类可组合能力检测到的设备状态信息、环境信息和/或用户信息,例如,设备状态信息可以是指电子设备的电池状态、电子设备的使用情况和电子设备中的可组合能力是否可以使用等设备状态;环境信息可以是指可组合能力检测到的温度变化、光照变化和该区域内生物活动情况等环境状态;用户信息可以是指用户输入的语音信息、用户输入的手势信息和用户的习惯等用户的显式意图输入或隐式意图输入。
对于多设备场景,通过该交互方法使得虚拟聚合设备获取到来自多设备的交互输入和交互历史,对于全局上下文进行统一管理,可以使得通信系统10基于上述全局上下文更为清楚地识别出用户的真实意图,提高跨设备控制的效率。
下面,以多设备场景包括大屏、手机、智能手表、智能音箱和门铃为例,结合附图对本申请实施例提供的基于全局上下文的交互方法进行示例性说明。
如图10A所示,本申请实施例中一种可能的多设备场景可以包括:大屏110、手机120、智能音箱130、智能手表140和门铃150等。其中,大屏110位于客厅,手机120和智能音箱130位于用户所在的卧室,智能手表140穿戴于用户的手臂上,门铃150位于客厅大门。此时,客厅大门外有一名快递员正在触发门铃150。各设备间可以通过有线或无线的方式连接以用于进行设备间的数据交互。在该多设备场景下,全局指的是上述所列举的大屏110、手机120、智能音箱130、智能手表140和门铃150。上述各设备所具有的可组合能力可以包括于覆盖上述房间区域的通信系统10中。示例性的,可组合能力可以包括近场语音输入能力、远场语音输入能力、用户生理信号检测能力(例如,脑电检测、肌电检测和心率检测等等)、一个或多个传感器、语音识别能力、设备状态检测能力、音乐播放能力、视频播放能力等等。在该示例场景下,上述多个设备中的部分或全部可组合能力可以组成虚拟聚合设备。在该虚拟聚合设备中,以大屏110为中控设备,也即是说,在包括大屏110、手机120、智能音箱130、智能手表140和门铃150中一个或多个可组合能力的通信系统10中,大屏110可以对大屏110、手机120、智能音箱130、智能手表140和门铃150上的各可组合能力进行调度和控制。虚拟聚合设备可以基于上述各可组合能力获取到全局上下文,并对该全局上下文进行统一管理,以基于该全局上下文确定出用户的意图,选择并控制合适的可组合能力执行相应功能。
需要说明的是,上述图10A所示的多设备场景仅仅用于示例性说明本申请实施例,并不对本申请构成任何限制。该多设备场景还可以包括更多的电子设备,例如:冰箱、空调、电脑等,本申请对多设备场景中包括的设备不作限制。
基于上述图10A示例性所示的多设备场景,结合图10B所示的流程图,介绍该交互方法中全局上下文的获取以及应用方式。其中,下述中的第一可组合能力可以为第一资源,第二可组合能力可以为第二资源,第三可组合能力可以为第三资源,第一事件可以是门铃输入事件。
首先,介绍全局上下文的获取方式。
S1001、第一可组合能力接收多轮交互输入。
具体的,在本申请实施例中,可以由第一设备上的第一可组合能力接收多轮交互输入,各轮交互输入可以包括该轮交互输入的相关信息(例如,该轮交互输入的发生时间、该轮交互输入对应的电子设备可组合能力和该轮交互内容等)。第一设备的数量可以是一个或多个。第一可组合能力可以是第一设备上所具有的一个或多个交互类可组合能力。其中,交互类可组合能力所包括的类型可以如图4所示。
关于中控设备在虚拟聚合设备的交互类可组合能力中,任意选择或者根据一定策略来选择部分交互类可组合能力来检测特定事件,可以参考前述中的说明,在此不再赘述。
示例性的,以前述图10A所示的多设备场景为例,当前虚拟聚合设备所配置的交互类可组合能力可以包括:近场语音输入能力、远场语音输入能力、用户生理信号检测能力(例如,脑电检测、肌电检测和心率检测等)等等。上述交互类可组合能力可以接收到来自门铃150检测到触发门铃输入和检测到客厅大门外来人的输入,来自大屏110检测到客厅内没有人的输入,来自智能手表140基于用户心率检测的用户正在熟睡的输入,来自手机120检测到手机120电池电量充足状态的输入、手机120播放能力可用的输入以及手机120在30分钟前使用过的输入。也即是说,上述多轮交互输入来自大屏110的交互类可组合能力(例如,视角交互类可组合能力、语音交互类可组合能力等)、手机120的交互类可组合能力(例如,触控交互类可组合能力、姿态交互类可组合能力等)、智能手表140的交互类可组合能力(例如,生理信号交互类可组合能力等)和门铃150的交互类可组合能力(例如,触控交互类可组合能力等)等多个设备上的交互类可组合能力。上述多轮交互输入可以具有多种不同的模态,例如,门铃150中触发门铃的输入的交互模态可以为门铃触发事件,大屏110中客厅没有人的输入的交互模态可以为视角交互输入,智能手表140中用户正在熟睡的输入的交互模态可以为生理信号交互输入等等。
S1002、第三可组合能力分析上述接收到的多轮交互输入,获取全局上下文。
具体的,第二设备可以通过第三可组合能力按照各轮交互输入发生时间的先后顺序进行分析,以确定出全局上下文。其中,全局上下文可以包括以下一项或多项:接收到各轮交互输入的时间、接收到各轮交互输入的第一可组合能力、各轮交互输入的交互内容、各轮交互输入对应用户的生理特征信息、第一可组合能力所属电子设备的设备信息(也即是第一设备)、或所述交互输入控制的目标设备的设备信息。
其中,该全局上下文可以存储在指定设备上。该指定设备可以是虚拟聚合设备中的中控设备。在一种可能的实现方式中,若中控设备存储空间不足,不能够存储全局上下文时,该全局上下文可以存储在存储资源宽裕的非中控设备中(例如,图10A所示的手机120或智能音箱130等)。该存储有全局上下文的非中控设备可以提供访问该全局上下文的程序和/或接口,以使得第三可组合能力可以基于该全局上下文确定出用户的意图。
示例性的,以图10A所示的多设备场景为例,第一设备可以是大屏110。第一可组合能力获取的全局上下文可以如表4所示:
表4
标识 发生时间 电子设备可组合能力 交互内容
1 13:03:12 门铃150的按键输入能力 触发门铃
2 13:03:14 大屏110的红外图像检测能力 客厅内没有人
3 13:03:16 智能手表140的心率输入能力 用户正在熟睡
…… …… …… ……
如表4所示,标识为“1”的交互输入发生时间为13:03:12,其对应的电子设备可组合能力是门铃150的按键输入能力,交互内容是触发门铃;标识为“2”的交互输入发生时间为13:03:14,其对应的电子设备可组合能力是大屏110的红外图像检测能力,交互内容是客厅内没有人;标识为“3”的交互输入发生时间为13:03:16,其对应的电子设备可组合能力是智能手表140的心率输入能力,交互内容是用户正在熟睡等等。
上述包括如表4所示标识为“1”、“2”和“3”的多轮交互输入的全局上下文,可以存储在图10A所示多设备场景中的中控设备即大屏110上。
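全局上下文的一种可能的数据组织方式,可以用如下示意性的Python草图表达(字段对应表4,类名与字段名均为便于说明而假设):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InteractionRecord:
    """全局上下文中单轮交互输入的示意性结构(字段对应表4)。"""
    ident: int                           # 标识
    time: str                            # 发生时间
    capability: str                      # 对应的电子设备可组合能力
    content: str                         # 交互内容
    target_device: Optional[str] = None  # 交互输入控制的目标设备(可为空)

@dataclass
class GlobalContext:
    records: List[InteractionRecord] = field(default_factory=list)

    def add(self, record: InteractionRecord):
        self.records.append(record)

    def by_time(self):
        # 按各轮交互输入发生时间的先后顺序进行分析
        return sorted(self.records, key=lambda r: r.time)

ctx = GlobalContext()
ctx.add(InteractionRecord(1, "13:03:12", "门铃150的按键输入能力", "触发门铃"))
ctx.add(InteractionRecord(2, "13:03:14", "大屏110的红外图像检测能力", "客厅内没有人"))
ctx.add(InteractionRecord(3, "13:03:16", "智能手表140的心率输入能力", "用户正在熟睡"))
print([r.content for r in ctx.by_time()])
```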
在获取到全局上下文后,第二设备可以通过第三可组合能力,基于上述全局上下文识别出用户意图,将用户意图拆分为待执行任务,并使得虚拟聚合设备将该待执行任务映射到合适的第二可组合能力中。具体流程可以如下述S1003-S1005所示。
S1003、第三可组合能力基于全局上下文,确定出用户意图。
具体的,第二设备可以通过第三可组合能力,基于上述获取到的全局上下文,识别出第一事件所表征的用户意图。
示例性的,以图10A所示的多设备场景为例,第二设备可以是大屏110。大屏110可以基于所获取到的全局上下文,识别出当前的环境状态为客厅大门有来人且触发门铃输入,客厅内没有人物活动,当前用户的状态为正在熟睡中,当前手机120的状态为电池电量充足,播放能力可用且30分钟前该手机120被用户使用过。因此,大屏110可以通过第三可组合能力,基于上述全局上下文确定出第一事件,也即是门铃输入事件所表征的用户意图为“提醒用户客厅大门外有人请求开门”。
S1004、第三可组合能力将上述用户意图拆分为待执行任务。
具体的,第二设备可以通过第三可组合能力将上述用户意图拆分为待执行任务,以便于虚拟聚合设备将待执行任务映射到合适的第二可组合能力上。
示例性的,以图10A所示的多设备场景为例,上述步骤S1003确定出的用户意图为"提醒用户客厅大门外有人请求开门"。大屏110可以通过服务应答组件中的任务映射模块将该用户意图拆分为多个任务,例如,输出振动提醒用户的任务、播放门外画面的任务、输出门铃提示音任务等等。
S1005、虚拟聚合设备将上述待执行任务映射到第二可组合能力。
具体的,虚拟聚合设备可以基于确定出的用户意图和/或待执行任务,选择第三设备中合适的第二可组合能力执行待执行任务。其中,第三设备的数量可以是一个或多个。
示例性的,以图10A所示的多设备场景为例,大屏110可以将S1004中确定出的待执行任务映射到一个或多个第三设备中的第二可组合能力。例如,第二可组合能力可以是智能手表140中的马达振动能力、手机120中的音乐播放能力和手机120中的视频播放能力。智能手表140可以基于马达振动能力输出振动提醒,手机120可以基于音乐播放能力以音量由弱到强的方式输出门铃提示音,手机120基于视频播放能力输出根据门铃150获取到的客厅大门外的图像等等。
在一种可能的实现方式中,当交互信息中存在指代缺失或槽位缺失的情况时,第三可组合能力可以基于存储的全局上下文按照各交互输入的指定顺序(例如,按照各交互输入发生时间的由近及远顺序)进行检索,并基于指定匹配规则对存储的历史交互信息与当前的交互信息进行匹配分析,以作为缺失槽位分析和缺失指代识别的依据,确定出用户的意图。关于该实现方式中具体流程,后续实施例中将详细描述,在此不再赘述。
下面,介绍本申请实施例提供的上述应用于基于全局上下文的交互方法的软件架构。该软件架构可以用于图10C所示的实施例中。
图10C示例性示出了该软件架构的示意图。如图10C所示,该软件架构可以包括:多源输入交互上下文分析模块、多模态意图决策模块、任务序列生成模块、任务管理模块以及任务映射模块。
具体的,当多轮交互输入来自多个电子设备的交互类可组合能力时,多源输入交互上下文分析模块可以接收到该多轮交互输入进行分析,获取全局上下文。当该全局上下文具有多种模态时,多模态意图决策模块可以基于该全局上下文分析出意图识别结果,以确定出用户的意图。然后,任务序列生成模块可以基于该意图识别结果,通过任务管理模块以及任务映射模块控制一个或多个可组合能力执行相应的功能。可选的,当全局上下文为单一模态时,任务序列生成模块可以基于该全局上下文所确定出的用户意图,通过任务管理模块以及任务映射模块控制一个或多个可组合能力执行相应的功能。
在一种可能的实现方式中,当多轮交互输入来自单个电子设备的交互类可组合能力时,多模态意图决策模块可以基于上述多轮交互输入获取到意图识别结果,以确定出用户的意图。然后,任务序列生成模块可以基于该意图识别结果,通过任务管理模块以及任务映射模块控制一个或多个可组合能力执行相应的功能。
本申请实施例提供的基于全局上下文的交互方法,可以使得虚拟聚合设备基于用户指令和跨设备接收到的用户状态信息、设备状态信息和/或环境状态信息,更为精确地识别出用户的意图。同时,也可以使得虚拟聚合设备基于上述接收到的用户状态信息、设备状态信息和/或环境状态信息,提前动态调整和优化可组合能力部件的组合配置,缩短多设备场景下指令响应的时延,并能支持主动服务和后续实施例中长时任务的实现。
需要说明的是,本申请实施例提供的上述软件架构仅仅用于示例性解释本申请,在实际应用中,软件架构中可以包括比本申请实施例提供的更多或更少的模块,也可以包括其他模块,各模块之间也可以有不同于本申请实施例的组合与信息交互,本申请对此不作限制。
接下来,介绍本申请实施例提供的一种基于指定匹配规则对存储的历史交互信息与当前的交互信息进行匹配分析的方法。以图10A所示的多设备场景以及交互输入是语音对话交互为例,图11示例性示出了本申请实施例提供的该匹配分析方法的具体流程。在本实施例中,交互输入可以包括历史输入和当前输入。全局上下文可以基于上述历史输入和当前输入生成。因此,全局上下文可以包括历史交互信息和当前轮交互信息。与当前轮交互信息相关联的历史交互信息可以被称为第一历史交互信息。在交互输入是语音对话交互的示例中,历史输入可以是历史语音对话输入,当前输入可以是当前语音对话输入,历史交互信息可以是历史对话的对话信息,当前轮交互信息可以是当前轮对话的对话信息。第一事件可以是当前语音对话交互事件。
如图11所示,该方法具体可以包括:
S1101、第一可组合能力获取用户与第一设备进行语音对话交互时当前轮对话的对话信息。
具体的,第一设备可以基于第一可组合能力(例如,语音输入能力)获取当前轮对话的对话信息。
示例性的,当用户与手机120进行对话时,则第一设备可以是手机120,第一可组合能力可以是手机120上的近场语音输入能力。第一可组合能力可以获取用户与手机120进行对话时当前轮对话的对话信息。
该步骤中用户与第一设备进行对话时当前轮对话的对话信息可以包括:当前轮对话的对话内容、当前轮对话发生的时间、当前轮对话发生的地点、第一设备的设备信息(例如设备名称、设备标识等)、当前轮对话想要控制的目标设备的设备信息、发出当前轮对话的用户的生理特征信息等。可选的,当当前轮对话中没有明确指示想要控制的目标设备时,则在当前轮对话的对话信息中,当前轮对话想要控制的目标设备的设备信息可以为空。其中:
当前轮对话的对话内容可以包括当前轮对话中用户的输入信息,该输入信息可以是用户发出的一句或多句语音,或者,也可以是用户发出的一句或多句语音转换的文本/文字信息。可选的,当前轮对话的对话内容还可以包括当前轮对话中第一设备回复用户的语音或文字。
当前轮对话发生的时间可以是指当前轮对话中第一设备接收到用户的输入的语音信息的时间。
当前轮对话发生的地点可以是第一设备所在的地点。以用户与手机120进行当前轮对话为例,则该当前轮对话发生的地点可以是手机120所在的卧室。
第一设备的设备信息可以是指与用户进行对话交互的电子设备的设备信息。例如,当用户与手机120进行当前轮对话时,上述电子设备的设备信息即是手机120的设备信息。
当前轮对话想要控制的目标设备的设备信息可以是指当前轮对话中用户的输入信息实际控制的目标设备的设备信息。例如,若上述用户与手机120进行当前轮对话,发出语音指令“停止播放大屏”,则上述目标设备的设备信息可以是大屏110的设备信息。在一种可能的实现方式中,若上述用户与手机120进行当前轮对话,发出语音指令“停止播放”,则当前轮对话想要控制的目标设备的设备信息可以为空。
发出当前轮对话的用户的生理特征信息可以是用户的声纹信息、用户的人脸画像等。例如,以上述用户与手机120进行当前轮对话为例,则当前轮对话的用户的生理特征信息即为当前与手机120进行语音交互的用户的声纹信息。
S1102、第三可组合能力获取虚拟聚合设备中历史对话的对话信息。
具体的,第二设备可以基于第三可组合能力获取虚拟聚合设备中历史对话的对话信息。
示例性的,以图10A所示的多设备场景为例,第三可组合能力可以获取到上述图10A场景中存储于虚拟聚合设备上的历史对话的对话信息。该历史对话的对话信息可以来自于大屏110的语音输入能力、手机120的语音输入能力、智能音箱130的语音输入能力、智能手表140的语音输入能力和门铃150的语音输入能力等等。也即是说,当用户与上述各设备上的语音输入能力进行对话交互时,第三可组合能力可以获取到经由各设备接收到的对话信息,并将上述对话信息存储在虚拟聚合设备中(例如,可以存储在虚拟聚合设备的中控设备即大屏110中)。
在该步骤中,第三可组合能力可以获取到一轮或多轮历史对话的对话信息。每一轮历史对话的对话信息可以包括:该轮历史对话的对话内容、该轮历史对话发生的时间、该轮历史对话发生的地点、接收该轮历史对话的电子设备的设备信息(例如,设备名称、设备标识等)、该轮历史对话想要控制的目标设备的设备信息、发出该轮历史对话的用户的生理特征信息等等。其中:
该轮历史对话的对话内容可以包括该轮历史对话中用户的输入信息,该输入信息可以是用户发出的一句或多句语音,或者,也可以是用户发出的一句或多句语音转换的文本/文字。可选的,每一轮历史对话的对话内容还可以包括该轮历史对话中语音输入能力组件所属电子设备回复用户的语音或文字。
该轮历史对话发生的时间可以是指接收该轮历史对话的时间。
该轮历史对话发生的地点可以是接收该轮历史对话的设备所在的地点。例如,以图10A所示的多设备场景为例,若接收到某轮历史对话的设备为大屏110,该大屏110所处地点为客厅,则该轮历史对话发生的地点可以为客厅。
接收该轮历史对话的电子设备的设备信息可以是与用户进行该轮历史对话的电子设备的设备信息。例如,当用户与大屏110已进行过某轮历史对话,则上述电子设备的设备信息即是大屏110的设备信息。
该轮历史对话想要控制的目标设备的设备信息可以是指该轮历史对话中用户的输入信息实际控制的目标设备的设备信息。例如,若上述用户与大屏110已进行过某轮历史对话,发出的语音指令为“打开大屏”,则上述目标设备的设备信息可以是大屏110的设备信息。
发出该轮历史对话的用户的生理特征信息可以是用户的声纹信息、用户的人脸画像等。例如,以上述用户与大屏110已进行过某轮历史对话为例,则该轮历史对话的用户的生理特征信息即为该轮历史对话中与大屏110进行语音交互的用户的声纹信息。
S1103、第三可组合能力基于当前轮对话的对话信息,从上述获取到的一轮或多轮历史对话的对话信息中匹配得到与当前轮对话的对话信息相关的历史对话的对话信息。
其中,在本实施例的示例中,与当前轮对话的对话信息相关的历史对话的对话信息可以被称为第一历史交互信息。
具体的,该步骤中基于当前轮对话,从上述获取到的一轮或多轮历史对话信息中匹配得到当前轮对话信息相关的历史对话的对话信息的流程,可以包括:按照指定的匹配规则,将获取到的一轮或多轮历史对话的对话信息与当前轮对话的对话信息进行比较匹配,得到与当前轮对话的对话信息相关的历史对话的对话信息。
示例性的,在一种可能的实现方式中,具体的指定匹配规则可以如下述规则1、规则2、规则3、规则4和规则5所述。对于获取到的一轮或多轮历史对话,当某轮历史对话的对话信息与当前轮对话的对话信息满足下述的一条或多条规则时,则第三可组合能力可以确定出该轮历史对话的对话信息与当前轮对话的对话信息相关。其中:
规则1、该轮历史对话中用户的生理特征信息与当前轮对话中用户的生理特征信息相同,则第三可组合能力可以确定基于该轮历史对话识别出的用户与基于当前轮对话识别出的用户是同一个用户。也即是说,与触发当前轮对话输入的第一用户相关的历史对话的对话信息可以被视为第一历史交互信息。
规则2、该轮历史对话发生时间与当前轮对话发生时间的间隔时间小于时长1(也可以被称为第一时长),且接收该轮历史对话的电子设备(也可以被称为第六设备)的设备信息与第一设备的设备信息相同。其中,时长1可以是3分钟、5分钟等,本申请对时长1的具体大小不作限制。满足于该规则的历史交互信息可以被视为第一历史交互信息。
规则3、该轮历史对话发生时间与当前轮对话发生时间的间隔时间小于时长2(也可以被称为第一时长),且接收该轮历史对话的电子设备(也可以被称为第六设备)是第一设备的近场设备。其中,时长2可以是3分钟、5分钟等,时长2的大小可以和时长1的大小相同,也可以和时长1的大小不同,本申请对此不作限制。满足于该规则的历史交互信息可以被视为第一历史交互信息。
规则4、该轮历史对话发生时间与当前轮对话发生时间的间隔时间小于时长3(也可以被称为第二时长),且该轮历史对话所控制的目标设备的设备信息与当前轮对话想要控制的目标设备的设备信息相同。其中,时长3可以是3分钟、5分钟等,时长3的大小可以和时长1/时长2的大小相同,也可以和时长1/时长2的大小不同,本申请对此不作限制。也即是说,当在第二时间接收到的历史交互信息的目标设备为所述当前轮交互信息的目标设备,并且,第二时间与接收所述当前轮交互信息的时间的间隔小于第二时长时,该历史交互信息可以被视为第二历史交互信息,该第二历史交互信息包括于第一历史交互信息。
规则5、该轮历史对话发生时间与当前轮对话发生时间的间隔时长小于时长4(也可以被称为第二时长),且该轮历史对话所控制的目标设备为当前轮对话想要控制的目标设备的近场设备。其中,时长4可以是3分钟、5分钟等等。时长4的大小可以和时长1/时长2/时长3相同,也可以和时长1/时长2/时长3的大小不同,本申请对此不作限制。也即是说,当在第二时间接收到的历史交互信息的目标设备为所述当前轮交互信息的目标设备的近场设备,并且,第二时间与接收所述当前轮交互信息的时间的间隔小于第二时长时,该历史交互信息可以被视为第二历史交互信息,该第二历史交互信息包括于第一历史交互信息。
需要说明的是,上述规则1至规则5仅仅用于示例性解释本申请,并不对本申请作任何限制。
其中,上述电子设备的近场设备是指该电子设备可以通过近场识别能力发现的其他电子设备。该近场识别能力可以由电子设备上的近场设备识别模块提供。该近场设备识别模块可以是检测该电子设备与其他电子设备是否连接在相同局域网的能力模块,也可以是基于蓝牙或广播发现能力识别其他电子设备的能力模块。示例性的,手机120和智能音箱130可以分别配置有近场设备识别模块。手机120可以基于手机120上的近场设备识别模块识别出智能音箱130,智能音箱130也可以基于智能音箱130上的近场设备识别模块识别出手机120,则手机120是智能音箱130的近场设备,智能音箱130是手机120的近场设备。
当各设备基于自身的近场设备识别模块识别出对应的近场设备时,虚拟聚合设备可以存储该设备与该设备对应近场设备的设备信息的映射关系。当第三可组合能力基于上述指定匹 配规则对当前轮对话的对话信息与某一轮历史对话的对话信息进行匹配时,第三可组合能力可以基于上述映射关系查询第一设备的近场设备信息是否包括接收该轮历史对话的电子设备的设备信息。若是,则接收该轮历史对话的电子设备为第一设备的近场设备。若否,则接收该轮历史对话的电子设备不是第一设备的近场设备。或者,当第三可组合能力基于上述指定匹配规则对当前轮对话的对话信息与某一轮历史对话的对话信息进行匹配时,第三可组合能力可以基于上述映射关系查询当前轮对话想要控制的目标设备的近场设备信息是否包括该轮历史对话实际控制的目标设备的设备信息。若是,则该轮历史对话实际控制的目标设备为当前轮对话想要控制的目标设备的近场设备。若否,则该轮历史对话实际控制的目标设备不是当前轮对话想要控制的目标设备的近场设备。
示例性的,以当前轮对话为Sc,某轮历史对话为So为例。接收Sc的电子设备为第一设备。则按照上述指定匹配规则,将Sc的对话信息与So的对话信息进行匹配的过程可以如下:
当Sc对话信息中用户的声纹信息与So对话信息中用户的声纹信息相同时,则第三可组合能力可以确定出基于Sc对话信息识别出的用户与基于So对话信息识别出的用户是同一个用户,将So对话信息作为Sc对话信息的相关对话信息。
当Sc发生的时间与So发生的时间的间隔时间小于时长1,且接收So的电子设备的设备信息与第一设备的设备信息相同时,则第三可组合能力将So对话信息作为Sc对话信息的相关对话信息。
当Sc发生的时间与So发生的时间的间隔时间小于时长2,且接收So的电子设备是第一设备的近场设备,则第三可组合能力将So对话信息作为Sc对话信息的相关对话信息。
当Sc发生的时间与So发生的时间的间隔时间小于时长3,且So实际控制的目标设备的设备信息与Sc想要控制的目标设备的设备信息相同,则第三可组合能力将So对话信息作为Sc对话信息的相关对话信息。
当Sc发生的时间与So发生的时间的间隔时间小于时长4,且So实际控制的目标设备为Sc想要控制的目标设备的近场设备,则第三可组合能力将So对话信息作为Sc对话信息的相关对话信息。
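上述规则1至规则5的匹配过程,可以用如下示意性的Python草图表达(字段名与时长阈值均为便于说明而假设,且此处将时长1至时长4简化为同一阈值):

```python
from datetime import datetime

def is_related(cur, hist, near_field, max_gap_minutes=5):
    """按规则1~5判断某轮历史对话So是否与当前轮对话Sc相关的示意。

    cur / hist:对话信息字典;near_field:设备 -> 其近场设备集合 的映射。
    """
    gap = abs((datetime.fromisoformat(cur["time"])
               - datetime.fromisoformat(hist["time"])).total_seconds()) / 60.0
    # 规则1:同一用户(此处以声纹标识示意)
    if hist["voiceprint"] == cur["voiceprint"]:
        return True
    if gap < max_gap_minutes:
        # 规则2:接收设备相同;规则3:接收设备为第一设备的近场设备
        if hist["device"] == cur["device"]:
            return True
        if hist["device"] in near_field.get(cur["device"], set()):
            return True
        # 规则4:目标设备相同;规则5:目标设备为当前目标设备的近场设备
        if hist.get("target") and cur.get("target"):
            if hist["target"] == cur["target"]:
                return True
            if hist["target"] in near_field.get(cur["target"], set()):
                return True
    return False

cur = {"time": "2022-11-10T13:05:00", "voiceprint": "u1", "device": "手机120", "target": "大屏110"}
hist = {"time": "2022-11-10T13:03:30", "voiceprint": "u2", "device": "音箱130", "target": "大屏110"}
print(is_related(cur, hist, {"手机120": {"音箱130"}}))  # True(满足规则3)
```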
在一种可能的实现方式中,在多设备场景中,还可以将电子设备划分为公有设备和私有设备。其中,公有设备可以指的是能够被多个用户使用的电子设备,私有设备可以指的是仅仅被某一指定用户使用的电子设备,除该指定用户外的其他用户在未经该指定用户的授权前不会使用该电子设备。基于该多设备场景下公有设备和私有设备的划分,指定匹配规则可以包括如下规则6、规则7、规则8和规则9:
规则6、该轮历史对话中用户的生理特征信息与当前轮对话中用户的生理特征信息相同,接收该轮历史对话的电子设备为公有设备或该用户的私有设备。
规则7、该轮历史对话中用户的生理特征信息与当前轮对话中用户的生理特征信息不同,接收该轮历史对话的电子设备为公有设备。
规则8、该轮历史对话的对话内容为指定服务(例如查询天气、播放新闻等)相关的内容。
规则9、接收该轮历史对话的电子设备和该轮历史对话中实际控制的电子设备都是公有设备。
也即是说,当多设备场景下划分有公有设备和私有设备时,第三可组合能力可以先判断每一轮历史对话是否满足上述规则6至规则9中的任意一种,然后从满足规则6至规则9中任意一种的历史对话中,根据上述规则1至规则5匹配得到与当前轮对话的对话信息相关的历史对话的对话信息。在一种可能的实现方式中,第三可组合能力判断出某轮历史对话满足规则6或规则7时,可以将该轮历史对话的对话信息确认为与当前轮对话的对话信息相关的历史对话的对话信息。
S1104、第三可组合能力基于匹配出的历史对话的对话信息,识别出当前轮对话的对话信息所表征的用户意图。
其中,在本示例中,第一事件可以是当前语音对话交互事件。当前语音对话交互事件对应有当前轮对话的对话信息。第三可组合能力可以基于匹配出的历史对话的对话信息,识别出当前轮对话的对话信息所表征的用户意图。
在一种可能的实现方式中,若当前轮对话的对话信息中存在指代缺失和/或槽位缺失的情况,则匹配出的历史对话的对话信息可以作为缺失槽位的分析依据和/或缺失指代的识别依据,用以确定出当前轮对话的对话信息所表征的用户意图。示例性的,例如当前轮对话的对话信息包括"订购机票",该当前轮对话的对话信息缺失地点槽位信息。匹配出的历史对话的对话信息包括"查一下北京的天气"。则,基于上述历史对话的对话信息,可以识别出当前轮对话的对话信息中,其缺失的地点槽位信息是"北京",因此,该当前轮对话的对话信息所表征的用户意图为"订购去北京的机票"。
在另一种可能的实现方式中,匹配出的历史对话的对话信息,可以用于实施图12所示的流程,将该历史对话的对话信息进行编码,与当前轮对话的对话信息对应的意图向量进行融合,所得的融合特征输入相关性模型以计算出相关性得分。关于该实现方式的具体流程,可以参考后续图12所示实施例的描述,在此不赘述。
在另一种可能的实现方式中,匹配出的历史对话的对话信息,可以用于结合当前轮对话的对话信息扩大用户意图对应关键词的搜索范围,以确定出用户的意图。示例性的,若当前轮对话的对话信息包括“刘德华”,匹配出的历史对话的对话信息包括“晚上看电影”。则该历史对话的对话信息可以有行为关键词“看电影”和时间关键词“晚上”,以及隐含的对应相关的场景关键词“电影院”。该历史对话的对话信息扩大了用户意图对应关键词的搜索范围,结合当前轮对话的对话信息,可以确定出用户意图为“晚上去电影院看刘德华的电影”。
实施本申请实施例提供的该匹配分析方法,使得虚拟聚合设备可以将历史对话信息与当前轮对话信息进行匹配,基于相关的历史对话信息确定出当前轮对话信息所指示的用户意图,可以提高设备识别出用户意图的效率。同时,引入用户身份的识别,可以在查询历史对话信息中保护用户的隐私。并且,在一种可能的实现方式中,将设备划分为私有设备和公有设备,可共享的历史对话信息类型也可以由用户设置,使得匹配结果更个性化,符合用户的习惯。
接下来,介绍本申请实施例提供的一种基于指定算法(例如,深度学习模型)对存储的历史交互输入指令与当前的交互输入指令进行匹配分析的方法。
以图10A所示的多设备场景以及当前的交互输入指令是语音对话交互指令为例,图12示例性示出了本申请实施例提供的该匹配分析方法的具体流程。在本实施例中,交互输入可以包括历史输入和当前输入。全局上下文可以基于上述历史输入和当前输入生成。因此,全局上下文可以包括历史交互信息和当前轮交互信息。与当前轮交互信息相关联的历史交互信息可以被称为第一历史交互信息。在交互输入是语音对话交互的示例中,历史输入可以是历史语音对话输入,当前输入可以是当前语音对话输入,历史交互信息可以是历史对话的对话信息,当前轮交互信息可以是当前轮对话的对话信息。第一事件可以是当前语音对话交互事件。与当前轮交互信息的相关性大于阈值的历史交互信息可以被视为第一历史交互信息。
如图12所示,该方法具体可以包括:
S1201、第三可组合能力获取用户与第一设备进行语音对话交互时当前轮对话的对话信息。
具体的,关于该步骤的说明可以参考前述图11所示实施例中S1101步骤中的描述,在此不再赘述。
S1202、第三可组合能力将该轮对话的对话信息输入自然语言理解模型中,获取到当前轮对话的对话信息对应的意图向量。
具体的,第二设备可以通过第三可组合能力,基于语音识别(automatic speech recognition,ASR)技术将从第一可组合能力接收到的当前轮对话的对话信息转换成文本信息,并将其输入自然语言理解模型中。该自然语言理解模型可以基于自然语言理解算法(natural language understanding,NLU),通过分词、词性标注和关键词提取等处理操作,将上述当前轮对话的对话信息输出为电子设备可以理解的结构化语义表示数据,该结构化语义表示数据可以被称为意图向量。其中,该NLU算法可以基于文本化的当前轮对话的对话信息进行意图分类和槽位关键词提取。
示例性的,当第一可组合能力接收到的用户语音指令为“订明天去北京的机票”时,第三可组合能力可以将该语音指令转换成文本信息,然后,可以基于NLU算法进行意图分类和槽位关键词提取,其意图分类结果为“预订机票”,时间槽位关键词提取信息为“明天”,地点槽位关键词提取信息为“北京”。
S1203、第三可组合能力采用预训练的自然语言编码器对一轮或多轮历史对话的对话信息进行编码,得到每一轮历史对话的对话信息对应的编码结果。
具体的,第二设备可以通过第三可组合能力,基于预训练的自然语言编码器对某轮历史对话的对话信息进行编码。该编码过程可以如下述步骤a)至步骤c)所示:
a)、将该轮历史对话的对话信息还原为自然语言描述的文本,得到该轮历史对话的对话信息对应的文本信息。
b)、将该轮历史对话的对话信息对应的文本信息进行编码,得到该轮历史对话的对话信息对应的多个向量。
c)、计算该轮历史对话的对话信息对应的多个向量的平均值,作为该轮历史对话的对话信息对应的编码结果。
示例性的,图13示例性示出了将某轮历史对话的对话信息进行编码的流程。如图13所示,该轮历史对话的对话信息可以包括:该轮历史对话的对话轮数、该轮历史对话中用户的输入信息、接收该轮历史对话的设备的设备名称、接收该轮历史对话的设备的设备状态、以及接收该轮历史对话的设备的近场设备列表。其中,在该示例中,该轮历史对话的对话轮数为"第一轮",该轮历史对话中用户的输入信息为"打个电话"、接收该轮历史对话的设备的设备名称为"手机"、接收该轮历史对话的设备的设备状态为"开机"、以及接收该轮历史对话的设备的近场设备列表包括"大屏、手表、音箱",则对该历史对话的对话信息进行编码时,首先可以将该轮历史对话的对话信息还原为自然语言描述的文本:{"第一轮","打个电话","手机","开机","大屏、手表、音箱"},并对{"第一轮","打个电话","手机","开机","大屏、手表、音箱"}进行编码,得到该轮历史对话的对话信息对应的多个向量。然后,可以计算多个向量的平均值,得到该轮历史对话的对话信息对应的编码结果。
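上述步骤a)至步骤c)的编码流程,可以用如下示意性的Python草图表达(其中的编码器仅为演示流程的伪编码器,真实系统中应替换为预训练的自然语言编码器):

```python
import numpy as np

def encode_dialog_info(dialog_info, embed, dim=8):
    """步骤a)~c)的示意:还原为自然语言文本 -> 逐项编码 -> 取平均作为编码结果。"""
    # a) 将该轮历史对话的对话信息还原为自然语言描述的文本序列
    texts = [dialog_info["轮数"], dialog_info["输入"], dialog_info["设备名称"],
             dialog_info["设备状态"], dialog_info["近场设备列表"]]
    # b) 对各文本分别编码,得到该轮对话信息对应的多个向量
    vectors = [embed(t, dim) for t in texts]
    # c) 计算多个向量的平均值,作为该轮对话信息对应的编码结果
    return np.mean(vectors, axis=0)

def fake_embed(text, dim):
    # 伪编码器:同一文本总是得到同一向量,仅用于演示流程
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(dim)

info = {"轮数": "第一轮", "输入": "打个电话", "设备名称": "手机",
        "设备状态": "开机", "近场设备列表": "大屏、手表、音箱"}
print(encode_dialog_info(info, fake_embed).shape)  # (8,)
```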
在一种可能的实现方式中,针对获取到的一轮或多轮历史对话的对话信息,第三可组合能力可以先将上述历史对话的对话信息通过基于上述图11实施例所示的指定匹配规则的召回引擎,获取到与当前轮对话的对话信息相关的历史对话的对话信息,然后再将该与当前轮对话的对话信息相关的历史对话的对话信息进行编码,以执行下述步骤S1204。
S1204、第三可组合能力将当前轮对话的对话信息对应的意图向量和每一轮历史对话的对话信息对应的编码结果进行融合,得到融合特征。
S1205、第三可组合能力将融合特征输入相关性模型,得到相关性模型输出的当前轮对话的对话信息与每一轮历史对话的对话信息之间的相关性得分。
具体的,图14示例性示出了本申请实施例提供的相关性模型的组成示意图。相关性模型可以包括句对输入编码器、相关性得分网络和关键词提取网络。其中,句对输入编码器可以根据当前轮对话的对话信息对应的意图向量和每一轮历史对话的对话信息对应的编码结果的融合特征进行编码;相关性得分网络可以根据融合特征的编码结果生成当前轮对话的对话信息和每一轮历史对话的对话信息之间的相关性得分;关键词提取网络可以根据融合特征的编码结果,提取每一轮历史对话的对话信息中的关键词。其中,相关性得分网络根据融合特征的编码结果生成当前轮对话的对话信息和每一轮历史对话的对话信息之间的相关性得分,以获取到与当前轮对话的对话信息相关的历史对话的对话信息,可以包括:对每一轮历史对话的对话信息,当该轮历史对话的对话信息与当前轮对话的对话信息之间的相关性得分大于阈值1(例如,0.8)时,确定该轮历史对话的对话信息为与当前轮对话的对话信息相关的历史对话的对话信息。其中,阈值1可以是人为配置的预设值,本申请对阈值1的大小不作限制。
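相关性得分网络的计算流程,可以用如下高度简化的Python草图示意(网络结构、权重与阈值均为假设,仅演示"融合特征→得分→按阈值筛选"的过程):

```python
import numpy as np

def relevance_score(intent_vec, hist_code, w1, w2):
    """相关性得分网络的极简示意:拼接融合特征 -> 单隐层 -> sigmoid得分。"""
    fused = np.concatenate([intent_vec, hist_code])   # 融合特征
    hidden = np.tanh(fused @ w1)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))       # 0~1之间的相关性得分

def related_history(intent_vec, hist_codes, w1, w2, threshold=0.8):
    # 得分大于阈值1(正文示例取0.8)的历史对话视为与当前轮相关
    return [i for i, h in enumerate(hist_codes)
            if relevance_score(intent_vec, h, w1, w2) > threshold]

# 用法示例:随机权重仅为演示,实际权重由训练得到
rng = np.random.default_rng(0)
w1, w2 = rng.standard_normal((16, 8)), rng.standard_normal(8)
intent = rng.standard_normal(8)
hists = [rng.standard_normal(8) for _ in range(3)]
print(related_history(intent, hists, w1, w2))  # 返回得分过阈值的历史对话索引
```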
S1206、第三可组合能力基于上述得到的相关性得分,确定出与当前轮对话的对话信息相关的历史对话的对话信息。
其中,在本实施例的示例中,与当前轮对话的对话信息相关的历史对话的对话信息可以被称为第一历史交互信息。
S1207、第三可组合能力基于当前轮对话的对话信息,及与其相关的历史对话的对话信息,识别出用户意图。
具体的,该步骤可以参考前述步骤S1104中的描述,在此不再赘述。
实施本申请实施例提供的该匹配分析方法,可以将多个设备间的历史对话信息(包括输入文本、设备状态等)统一利用一个自然语言编码器进行编码,从而能够提高当前轮对话信息与历史对话信息的匹配精确性和效率。并且,将对话信息采用自然语言描述,当新增对话信息时,可以自动生成该对话信息的编码,不需要人工定义字典中的键值对,且该编码可以更好地表征该对话信息的内容。
在一些实施例中,当用户输入的交互信息中包括指代词时,还可以通过指代消解的方法来确定出用户的意图。
在一些场景中,当用户基于对话交互向第一可组合能力发送语音指令或文本指令时,往往会出现用户向第一设备输入的单轮对话中包括连续多个指令的情况。并且,该包括多个指令的单轮对话里,其中的一个或多个指令中包括指代词,例如指代词"这"、"那"、"这里"、"那里"等等。当第三可组合能力基于用户输入的语音指令确定用户意图时,第三可组合能力需要将上述指代词替换成具体的目标对象名称。其中,上述将语音指令中的指代词替换成具体的目标对象名称的应用场景,可以被称为指代消解场景。
因此,本申请实施例可以将上述实施例中的意图分类方法和槽位关键词提取方法应用于指代消解场景中,以使得第三可组合能力能够更准确地确定出用户的意图,并使得虚拟聚合设备基于确定出的用户意图控制对应的第二可组合能力执行相应的功能。
示例性的,以图15所示的多设备场景为例,用户位于卧室内,向手机120发出内容为“搜索一下新街口的周边美食,播放刘德华的忘情水,并帮我导航到那儿。”的单轮语音对话。其中,在该示例中,第一事件可以是用户向手机120发出单轮语音对话的事件。图15所示场景中的各电子设备可以参考图10A所示的多设备场景,在此不再赘述。该单轮对话中包括多个指令,分别为“搜索一下新街口的周边美食”、“播放刘德华的忘情水”和“导航到那儿”。该多个指令中的最后一个指令包括有指示代词“那儿”。
下面,结合图15所示的应用场景,介绍本申请实施例提供的一种单轮对话下多指令指代消解的方法。
图16示例性示出了该单轮对话下多指令指代消解方法的具体流程。如图16所示,该方法具体可以包括:
S1301、第一可组合能力接收到用户向第一设备发送的对话交互信息(也可以被称为第一对话信息)。
示例性的,以图15所示的多设备场景为例,该第一设备可以是与用户进行语音对话交互的手机120,第一可组合能力可以是设置在手机120上的语音交互组件。该语音交互组件可以接收到用户向手机120输入的对话交互信息为“搜索一下新街口的周边美食,播放刘德华的忘情水,并帮我导航到那儿。”的当前单轮语音对话。
S1302、第三可组合能力可以将上述对话交互信息划分为多个语义单元,得到语义单元文本列表。
具体的,第二设备可以通过第三可组合能力获取到上述第一可组合能力所接收的对话交互信息。然后,第二设备可以通过第三可组合能力基于多任务对抗学习的智能语义单元识别模型,识别出该对话信息中各语义单元边界,以得到包括对话信息中各语义单元所组成的语义单元文本列表。其中,语义单元也可以被称为指令,多个语义单元中包括第一指令和第二指令,第一指令包括第一指代词。同样的,后续步骤中的S1303-S1307都可以由第二设备通过第三可组合能力执行。
示例性的,如图15场景所示的对话交互信息“搜索一下新街口的周边美食,播放刘德华的忘情水,并帮我导航到那儿。”中,单个语义单元可以是该对话交互信息中的单指令语句“搜索一下新街口的周边美食”等。
如图17A所示,第三可组合能力可以基于如图17B所示的语义单元识别模型,将获取到的对话信息划分为多个语义单元,得到语义单元文本列表。该方法的具体流程可以如下述步骤1)至步骤6)所示:
1)将对话交互信息对应的文本及对应的音频停顿特征输入语义单元识别模型。
其中,音频停顿特征可以指的是:对话交互信息对应文本中的token单元在该对话交互信息相应音频中对应有音频停顿,则为该文本信息中的音频停顿特征。可选的,对话交互信息对应文本中的音频停顿特征可以由值为“1”的编码值表示,其他非音频停顿特征的token 单元可以由值为“0”的编码值表示。其中,token单元可以指的是文本中预定义的包含有语义信息的单元。该token单元可以包括单个字、单个词、单个标点和/或单个段落等。
示例性的,以图15所示的多设备场景为例,该对话交互信息对应的文本可以是前述中的“搜索一下新街口的周边美食,播放刘德华的忘情水,并帮我导航到那儿。”其中,该文本中的token单元定义为包括文本中单个字的单元。若用户在输入“食”字token单元的语音时包括有停顿、输入“水”字token单元的语音时包括有停顿、输入“儿”字token单元的语音时包括有停顿,除此外文本中其他token单元的语音没有音频停顿,则上述文本中以值为“1”和“0”的编码值标识该文本的音频停顿特征可以是“00000000000100000000100000001”。
2)针对对话交互信息的文本进行预处理,得到归一化文本。
其中,上述预处理可以包括标点归一化、去除语句中间的标点以及保留缩略词/小数间的标点符号。其中,标点归一化可以指的是删除对话交互信息中语句首部和语句尾部的特殊标点。例如,若对话交互信息的语句中包括“/导航到那儿。/”,则标点归一化这一操作可以将该对话交互信息中语句首部的特殊标点“/”和语句尾部的特殊标点“/”删除,得到语句为“导航到那儿”;去除语句中间的标点可以是删除语句中间特殊标点符号和替换英文单词间的标点符号。示例性的,关于删除语句中间特殊标点符号的示例可以是:若对话交互信息的语句中包括有电话号码“010-65234”,则删除该电话号码中的特殊标点符号“-”,得到该删除上述电话号码间特殊符号后的语句为“01065234”。又示例性的,关于替换英文单词间标点符号的示例可以是:例如,英文单词间标点符号可以被替换成预定义的该标点符号对应的单词,如“&”可以被替换成单词“and”、“\”可以被替换成“or”、“=”可以被替换成“equal”等等。若对话交互信息的语句中包括英文单词“you&me”,该英文单词中间插入有标点符号“&”,则可以将该英文单词中间标点符号替换成预定义的该符号“&”对应的英文单词“and”;示例性的,保留缩略词/小数间的标点符号示例可以是:若对话信息中包括缩略词“e.g.”和/或小数“12.35”,则该缩略词间和小数间的标点符号可以进行保留。
3)去除归一化文本中的标点符号,得到归一化无标点文本。
示例性的,以前述图15中多设备场景中用户向手机120输入的对话交互信息为例,该对话交互信息对应的归一化文本可以是“搜索一下新街口的周边美食,播放刘德华的忘情水,并帮我导航到那儿。”的语句。基于该步骤将上述示例归一化文本去除标点符号后,可以得到该归一化文本对应的归一化无标点符号文本“搜索一下新街口的周边美食播放刘德华的忘情水并帮我导航到那儿”。
4)针对归一化文本和归一化无标点文本进行分词,并计算出归一化无标点文本分词后的token单元在归一化文本中的位移量。
其中,针对归一化文本和归一化无标点文本,可以基于BERT算法进行分词。具体的,BERT算法可以包括BasicTokenizer(也可以被称为BT)分词器和WordpieceTokenizer(也可以被称为WPT)分词器。BT分词器是一个初步的分词器,其流程可以为将待处理的文本转成unicode字符串、去除各种奇怪的字符、处理中文、空格分词、去除多余的字符和标点分词、再次空格分词。WPT分词器可以基于BERT预定义的词汇表,将BT分词器所得到的分词结果进行再一次切分,也即是按照从左到右的顺序,将一个词拆分成多个子词,每个子词尽可能长。可选的,针对中文文本的分词而言,BT分词器已将中文文本分词成了单个字符,因此WPT分词器可以不必将BT分词器输出的结果进行再一次切分。
然后,当归一化文本和归一化无标点文本都进行分词处理后,可以计算出归一化无标点文本中的token单元在归一化文本中的位移量。
5)基于智能语义单元识别模型,对上述步骤中得到的token单元进行计算,输出切分后的语义单元文本。
具体的,针对上述步骤中得到的token单元,智能语义识别模型可以采用4层BERT进行语义编码。智能语义识别模型可以基于上述的语义编码,进行多任务对比学习。其中,该多任务对比学习可以包括语义单元识别任务模块、槽位探测任务模块和任务判别器。其中,语义单元识别任务模块可以基于Softmax函数以及前述中输入的音频停顿特征,判断是否在某个token单元后进行切分。若该token单元需要进行切分,则以字符“E”表示。若该token单元不需要进行切分,则以字符“O”表示。槽位探测任务模块可以基于Softmax函数,判断某个token单元是否属于槽位(例如,时间槽位、地点槽位等等)。若该token单元属于槽位,则可以以值为“1”的编码值表示。若该token单元不属于槽位,则可以以值为“0”的编码值表示。该槽位探测任务模块可以避免槽位中间出现误切现象的发生。任务判别器可以基于Maxpooling函数、Gradient Reversal函数、Dense函数和/或Softmax函数进行多任务对抗学习,识别任务标签,避免共享语义控件从多意图识别任务中学习到任务相关信息。
6)基于上述步骤中切分得到的语义单元文本、以及各token单元在归一化文本中的位移量,得到语义单元文本列表。
示例性的,以图15所示的多设备场景中输入的对话交互信息为例,该对话交互信息对应的文本可以是前述中的“搜索一下新街口的周边美食,播放刘德华的忘情水,并帮我导航到那儿。”该文本经过上述步骤处理后,可以得到语义单元文本列表如下:{“搜索一下新街口的周边美食”}(也可以被称为第二指令)、{“播放刘德华的忘情水”}和{“并帮我导航到那儿”}(也可以被称为第一指令)。
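上述语义单元切分流程,可以用如下高度简化的Python草图示意(真实实现为上文所述的多任务对抗学习模型,此处仅以音频停顿特征演示切分的输入输出形式):

```python
def split_semantic_units(tokens, pause_flags):
    """语义单元切分的高度简化示意:以"token后存在音频停顿"作为切分信号。

    tokens为单字token列表,pause_flags为0/1音频停顿特征;
    停顿处相当于语义单元识别任务在该token后输出字符"E"(切分)。
    """
    units, current = [], []
    for tok, pause in zip(tokens, pause_flags):
        current.append(tok)
        if pause == 1:            # 在该token后切分
            units.append("".join(current))
            current = []
    if current:                   # 末尾残留token归入最后一个语义单元
        units.append("".join(current))
    return units

text = "搜索一下新街口的周边美食播放刘德华的忘情水并帮我导航到那儿"
tokens = list(text)
pauses = [1 if t in {"食", "水", "儿"} else 0 for t in tokens]  # 停顿位置为本示例假设
print(split_semantic_units(tokens, pauses))
# ['搜索一下新街口的周边美食', '播放刘德华的忘情水', '并帮我导航到那儿']
```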
S1303、第三可组合能力基于预定义的指代词词典,识别出语义单元文本列表中包括指代词的语义单元。
具体的,若该语义单元文本中包括有预定义的指代词词典中的指代词,则第三可组合能力可以将该语义单元文本确定为含指代词的语义单元,并且识别出该指代词的类型;若该语义单元文本中不包括预定义的指代词词典中的指代词,则第三可组合能力可以将该语义单元文本确定为不含指代词的语义单元。
示例性的,以图15所示的多设备场景中输入的对话交互信息中,该对话交互信息对应的语义单元文本列表可以如下:{“搜索一下新街口的周边美食”}、{“播放刘德华的忘情水”}和{“并帮我导航到那儿”}。第三可组合能力基于预定义的指代词词典,识别出该语义单元文本列表中包括指代词的语义单元为{“并帮我导航到那儿”},且指代词“那儿”的类型为指示代词,该指代词“那儿”也可以被视为第一指代词;该语义单元文本列表中不包括指代词的语义单元为{“搜索一下新街口的周边美食”}和{“播放刘德华的忘情水”}。其中,上述包括指代词的语义单元可以被称为指代语义单元。
S1304、第三可组合能力基于意图分类模板,识别出各语义单元的意图。
示例性的,以上述图15所示的多设备场景中输入的对话交互信息为例,该场景下的语义单元文本列表中的语义单元分为指代语义单元{"并帮我导航到那儿"},及不包括指代词的语义单元{"搜索一下新街口的周边美食"}和{"播放刘德华的忘情水"}。第三可组合能力可以基于意图分类模板,识别出语义单元{"并帮我导航到那儿"}的意图为"导航地点路线",该意图可以被称为指代意图;识别出语义单元{"搜索一下新街口的周边美食"}的意图为"搜索美食",语义单元{"播放刘德华的忘情水"}的意图为"播放音乐"。其中,上述不包括指代词的语义单元的意图"搜索美食"和"播放音乐"可以被称为被指代意图,多个被指代意图可以组成被指代意图列表。
S1305、第三可组合能力基于各语义单元的意图,进行语义单元关联识别并合并关联的语义单元。
具体的,该步骤中的语义单元识别及合并关联的语义单元,可以指的是根据预定义的意图关联模板、指代意图、被指代意图和指代词类型确定出包括指代词的语义单元关联的被指代语义单元,并将两者进行合并。
具体的,意图关联模板的生成可以如下述步骤1)至步骤2)所示:
1)针对每种指代词类型(例如,人称代词、指示代词等),可以根据预定义的意图槽位关联体系中的槽位类型以及该槽位类型对应的意图来判断指代意图和被指代意图的所有可能组合。示例性的,在预定义的意图槽位关联体系中,可以定义槽位类型1(例如,人物槽位类型、地点槽位类型等)对应有意图1,而和槽位类型1相应指代词类型(例如,人物槽位类型对应的人称代词、地点槽位类型对应的指示代词等)对应有意图2。则意图1和意图2的组合可以被视为一种可能的指代意图与被指代意图的组合。其中,意图1可以被称为被指代意图,意图2可以被称为指代意图。
2)从云服务器获取涉及到指代场景的多轮对话交互,并基于该多轮对话交互中统计各指代意图和被指代意图关联出现的概率。若该指代意图和被指代意图关联出现的概率大于阈值2时,则将该指代意图和被指代意图的组合确定为意图关联模板中对应指代词类型的一种组合。其中,阈值2的数值可以是小于1的数值,如0.6、0.7或0.8等。关于阈值2的大小,本申请对此不作限制。例如,以上述步骤1)中的示例为例,若从云服务器获取的涉及到指代场景的多轮对话交互中,意图1和意图2的组合出现的概率大于阈值2,则将该意图1和意图2的组合确定为意图关联模板中槽位类型1相应指代词类型的一种组合。
示例性的,意图关联模板可以如表5所示:
表5
指代词类型 被指代意图 指代意图
指示代词 搜索美食 导航地点路线
  预订机票 查询天气
  …… ……
人称代词 播放音乐 播放视频
  …… ……
如表5所示,当指代词类型为指示代词时,意图关联模板中的组合可以有被指代意图“搜索美食”和指代意图“导航地点路线”的组合、被指代意图“预订机票”和指代意图“查询天气”的组合等;当指代词类型为人称代词时,意图关联模板中的组合可以有被指代意图“播放音乐”和指代意图“播放视频”的组合等。
需要说明的是,上述表5仅仅用于示例性解释本申请,并不构成对本申请的限制。
示例性的,以上述图15所示的多设备场景中输入的对话交互信息为例,根据预定义的意图关联模板、指代意图、被指代意图和指代词类型,第三可组合能力可以确定出指代语义单元{"并帮我导航到那儿"}对应的被指代语义单元为{"搜索一下新街口的周边美食"}。然后,合并上述两个语义单元,得到{"搜索一下新街口的周边美食,并帮我导航到那儿"}。
S1306、第三可组合能力基于合并后的指代语义单元和被指代语义单元,针对指代词进行指代消解。
具体的,第三可组合能力可以通过并发、串行或batch的方式,基于合并后的指代语义单元和被指代语义单元,针对指代词进行指代消解。也即是说,将语义单元文本列表中的指代词替换为具体的目标对象名称。
示例性的,以上述图15所示的多设备场景中输入的对话交互信息为例,在上述步骤中,指代语义单元{“并帮我导航到那儿”}和被指代语义单元{“搜索一下新街口的周边美食”}已合并为文本{“搜索一下新街口的周边美食,并帮我导航到那儿”}。通过该步骤的指代消解处理后,该文本中的指代词“那儿”可以被替换为具体的地点名称“新街口”。也即是说,指代语义单元{“并帮我导航到那儿”}可以被替换为包括具体地点名称的语义单元{“并帮我导航到新街口”}。
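上述基于意图关联模板的指代消解过程,可以用如下示意性的Python草图表达(意图与槽位假设已由NLU得到,模板内容对应表5的示例):

```python
INTENT_TEMPLATE = {  # 表5的示意:指代词类型 -> {指代意图: 被指代意图}
    "指示代词": {"导航地点路线": "搜索美食", "查询天气": "预订机票"},
    "人称代词": {"播放视频": "播放音乐"},
}

def resolve(units, pronoun="那儿", pronoun_type="指示代词"):
    """单轮多指令指代消解的示意:按意图关联模板找到被指代语义单元并替换指代词。

    units:[(语义单元文本, 意图, 地点槽位值或None)],各项均为示例数据。
    """
    ref_unit = next(u for u in units if pronoun in u[0])            # 指代语义单元
    target_intent = INTENT_TEMPLATE[pronoun_type].get(ref_unit[1])  # 关联的被指代意图
    for text, intent, slot in units:
        if intent == target_intent and slot:
            # 将指代词替换为被指代语义单元中的具体目标对象名称
            return ref_unit[0].replace(pronoun, slot)
    return ref_unit[0]

units = [("搜索一下新街口的周边美食", "搜索美食", "新街口"),
         ("播放刘德华的忘情水", "播放音乐", None),
         ("并帮我导航到那儿", "导航地点路线", None)]
print(resolve(units))  # 并帮我导航到新街口
```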
S1307、第三可组合能力基于上述进行指代消解后的指代语义单元,确定出对话交互信息的意图信息和槽位关键词。
其中,第一对话信息中的第一指代词被替换为第二指令对应的对象后,可以获取到第二对话信息。第三可组合能力可以基于该第二对话信息确定出对话交互信息的意图信息和槽位关键词,由此识别出第一事件表征的用户意图。
示例性的,以上述图15所示的多设备场景中输入的对话交互信息为例,第一对话信息可以是"搜索一下新街口的周边美食,播放刘德华的忘情水,并帮我导航到那儿。"的对话,经过上述步骤中的指代消解处理流程,第一指代词"那儿"被替换为第二指令对应的对象后,获取到的第二对话信息可以是"搜索一下新街口的周边美食,播放刘德华的忘情水,并帮我导航到新街口。"的对话,则该对话交互信息可以包括如下多个具体的单指令语句:a)搜索一下新街口的周边美食;b)"播放刘德华的忘情水";c)并帮我导航到新街口。交互指令识别模块可以基于上述多个具体的单指令语句,确定出该对话交互信息的意图1为"搜索美食",该意图对应的地点槽位关键词为"新街口";意图2为"播放音乐",该意图对应的人物槽位关键词为"刘德华",曲目槽位关键词为"忘情水";意图3为"导航地点路线",该意图对应的地点槽位关键词为"新街口"。
其中,关于图15示例性所示的对话交互信息基于图16所示的具体方法,最后确定出相应意图和槽位关键词的完整流程,可以如图17C所示。关于该图17C中步骤的描述,可以参考步骤S1301-步骤S1307中的说明,在此不再赘述。
实施本申请实施例提供的指代消解方法,可以使得虚拟聚合设备在用户输入包括多个指令的单轮对话信息时,更为准确地识别出用户的意图并基于用户意图执行相应的功能,提高了指令识别的效率,为用户提供了更加自然流畅的交互体验。
在本申请实施例中,还可以提供一种实现长时任务的交互和执行方法,应用于上述的S107-S108步骤中。
在一些应用场景中,用户向第一设备所输入的交互指令中,可以包括一个或多个指定的逻辑关系(例如,顺序关系、循环关系、条件关系和布尔逻辑等等)。然后,第三可组合能力 可以基于上述的交互指令,识别出用户的服务意图,并基于用户的服务意图和上述交互指令中一个或多个指定的逻辑关系,确定出一个或多个应答子服务信息以及各应答子服务间的逻辑关系。然后,虚拟聚合设备可以将上述一个或多个应答子服务映射至相应的第二可组合能力上,并基于应答子服务间的逻辑关系安排子服务的处理流程,以完成服务意图对应的任务。其中,用户向第一设备输入交互指令可以被称为第一事件,上述包括一个或多个指定逻辑关系的应答子服务所组成的任务,可以被称为长时任务。
示例性的,以图18所示的多设备场景对本方法进行说明。如图18所示,该多设备场景可以包括:智能烧水壶160、智能门锁170和手机180。其中,手机180可以位于用户所在的卧室,智能烧水壶160和智能门锁170可以位于厨房。关于智能烧水壶160、智能门锁170和手机180的硬件结构和软件架构,可以参考前述实施例中的描述,在此不再赘述。上述各设备所具有的可组合能力可以包括于覆盖上述房间区域的虚拟聚合设备中。在该虚拟聚合设备中,以手机180为中控设备,也即是说,在包括智能烧水壶160、智能门锁170和手机180中一个或多个可组合能力的虚拟聚合设备中,手机180可以对手机180、智能烧水壶160和智能门锁170上的一个或多个可组合能力进行调度和控制。在该多设备场景下,用户可以向手机180输入语音指令“烧开水时不要让小孩进入厨房”。
下面,结合图18所示的多设备场景,说明图19A所示的虚拟聚合设备执行长时任务的具体流程。如图19A所示,该流程具体可以包括:
S1401、第一可组合能力获取到用户输入的交互指令并将其转换为文本描述信息。
具体的,第一设备可以通过第一可组合能力接收到用户针对第一设备输入的交互指令。然后,第一可组合能力可以将该交互指令转换为等价的文本描述信息。也即是说,交互指令中用户所输入的描述内容与文本描述信息中用户所输入的描述内容相同。其中,该交互指令可以是语音指令、文本指令或其他形式输入的交互指令,本申请对此不作限制。
示例性的,以图18所示的多设备场景为例,当用户向手机180输入语音指令“烧开水时不要让小孩进入厨房”时,手机180上的近场语音输入能力可以接收到该语音指令,并将该语音指令转换为等价的文本描述信息。其中,该文本描述信息中用户所输入的描述内容和语音指令中用户所输入的描述内容相同,也即是“烧开水时不要让小孩进入厨房”。
S1402、第三可组合能力基于交互指令对应的文本描述信息,确定出用户的服务意图。
具体的,第二设备可以通过第三可组合能力,基于指定算法,从获取到的交互指令对应的文本描述信息确定出用户的服务意图。其中,上述指定算法可以是前述实施例中的NLU算法。本申请实施例对该步骤中交互指令识别模块所采用的算法不作限制。需要说明的是,后续步骤中的S1403-S1405也可以由第二设备通过第三可组合能力执行。
示例性的,以图18所示的多设备场景为例,第二设备可以是手机180。手机180可以通过第三可组合能力获取到上述用户向手机180发出的语音指令对应的文本描述信息。该文本描述信息中用户所输入的描述内容为“烧开水时不要让小孩进入厨房”。第三可组合能力可以通过NLU算法从该文本描述信息中确定出用户的服务意图为“当位于厨房的烧水壶160在烧开水时,智能门锁170、手机180等电子设备禁止小孩进入厨房”。
S1403、第三可组合能力基于服务意图和上述文本描述信息,确定出一个或多个应答子服务信息及各应答子服务间的逻辑关系信息。
具体的,当确定出用户的服务意图后,第二设备可以通过第三可组合能力,基于该服务意图和文本描述信息,通过指定算法,确定出一个或多个应答子服务信息及各应答子服务间的逻辑关系信息。其中,该指定算法可以是前述实施例中的NLU算法,本申请实施例对该步骤中交互指令识别模块所采用的算法不作限制。应答子服务信息可以包括各应答子服务的类型、参数(例如,各应答子服务的内容描述)、执行各应答子服务的可组合能力等信息。各应答子服务间的逻辑关系信息可以包括执行各应答子服务时彼此之间的逻辑依赖关系,例如顺序关系、条件关系、循环关系和布尔逻辑等等。
示例性的,以图18所示的多设备场景为例,手机180可以通过第三可组合能力基于该服务意图和文本描述信息,通过NLU算法确定出的一个或多个应答子服务信息及各应答子服务信息间的逻辑关系信息可以用槽位填充的形式表征,可以如表6所示:
表6
应答子服务 槽位信息 对应的可组合能力
应答子服务1 检测智能烧水壶160的状态 智能烧水壶160状态获取能力
应答子服务2 检测小孩 人脸识别能力、指纹识别能力
应答子服务3 阻止小孩进入厨房 语音播报能力、门锁关闭能力
如表6所示,应答子服务1的槽位信息为“检测智能烧水壶160的状态”,对应的可组合能力为“智能烧水壶160状态获取能力”;应答子服务2的槽位信息为“检测小孩”,对应的可组合能力为“人脸识别能力”和“指纹识别能力”;应答子服务3的槽位信息为“阻止小孩进入厨房”,对应的可组合能力为“语音播报能力”和“门锁关闭能力”。各应答子服务间的逻辑关系可以表6所示:应答子服务1和应答子服务2之间具有条件逻辑关系,也即是说,若应答子服务1中智能烧水壶160状态获取能力检测出智能烧水壶160正在烧开水时,即执行应答子服务2,使得人脸识别能力和/或指纹识别能力检测附近是否有小孩;应答子服务2和应答子服务3之间具有条件逻辑关系,也即是说,若应答子服务2中的人脸识别能力和/或指纹识别能力检测出附近有小孩时,则执行应答子服务3,使得语音播报能力发出语音提醒,以及基于门锁关闭能力控制智能门锁170关闭。
需要说明的,上述表6仅仅用于示例性解释本申请,并不构成对本申请的具体限制。
S1404、虚拟聚合设备将各应答子服务对应的任务映射至对应的第二可组合能力。
具体的,基于前述实施例中的多模态决策模块和任务序列生成模块,根据步骤S1403中确定出的一个或多个应答子服务信息生成各应答子服务对应的任务。然后,虚拟聚合设备上的智慧助手可以配置并调用已连接的电子设备上合适的第二可组合能力,以执行各应答子服务对应的任务。
可选的,虚拟聚合设备上的智慧助手配置并调用已连接电子设备上用于执行应答子服务对应任务的第二可组合能力,可以参考前述实施例中描述的流程,在此不再赘述。
可选的,可以按照预设的规则选择最适合执行某个应答子服务对应任务的第二可组合能力。例如,可以基于长时任务的可持续性规则,选择低功耗,和/或电源供电时间长,和/或固定位置安装,和/或使用频率低的电子设备上的第二可组合能力。
可选的,用户也可以人工设置某个应答子服务对应任务的第二可组合能力。然后,任务映射模块可以将该任务和对应的执行指令发送至上述最合适的可组合能力所属电子设备。
示例性的,以图18所示的多设备场景为例,第三可组合能力可以基于上述表6所示的信息,将确定出的一个或多个应答子服务信息生成各应答子服务对应的任务及对应的执行指令。虚拟聚合设备上的智慧助手可以配置并调用已连接的电子设备上合适的第二可组合能力,以执行各应答子服务对应的任务。例如,虚拟聚合设备上的智慧助手可以配置并调用智能烧水壶160上的智能烧水壶160状态获取能力,以执行应答子服务1"检测智能烧水壶160的状态"对应的任务;虚拟聚合设备上的智慧助手可以配置并调用智能门锁170上的人脸识别能力和指纹识别能力,以执行应答子服务2"检测小孩"对应的任务;虚拟聚合设备上的智慧助手可以配置并调用智能门锁170上的语音播报能力和门锁关闭能力,以执行应答子服务3"阻止小孩进入厨房"对应的任务。
S1405、第三可组合能力基于各应答子服务间的逻辑关系信息,构建长时任务的执行流程。
可选的,长时任务流程可以采用xml、json等结构化语言或自定义的数据结构进行描述。其中,长时任务流程及其结构化描述可以是分层嵌套模式。当长时任务采用多个可组合能力执行单个应答子服务对应的任务时,第三可组合能力可以为每个可组合能力的执行流程构建单独的执行流水线,或者,也可以将该执行单个应答子服务对应任务的多个可组合能力构建为同一条执行流水线,以流水线内多分支或多线程/多进程的方式分别运行。
示例性的,以图18所示的多设备场景为例,图19B示例性示出了第三可组合能力基于各应答子服务间的逻辑关系信息,构建出的该场景下长时任务的执行流程。如图19B所示,该长时任务的具体执行流程可以如下步骤a)-步骤e)所示:
a)检测智能烧水壶160的状态。
b)当检测到智能烧水壶160正在烧开水时,启动智能门锁170上的视频监控且使得智能门锁170上锁。
c)当智能门锁170基于人脸识别能力识别出附近的人脸年龄小于14岁时,基于语音播报能力输出消息提醒,提示用户有小孩靠近厨房;当智能门锁170基于指纹识别能力识别出指纹信息和小孩的指纹信息相同时,保持门锁关闭。在一种可能的实现方式中,当智能门锁170基于指纹识别能力识别出指纹信息和小孩的指纹信息不相同时,可以使得指纹锁解锁。
d)检测该长时任务是否暂停或终止。
e)当检测到该长时任务暂停或终止时,则退出该长时任务的执行流程;当检测到该长时任务没有暂停或终止时,则可以继续基于该长时任务的执行流程执行该长时任务。
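上述步骤a)至步骤e)的执行流程,可以用如下示意性的Python草图表达(各回调参数用于模拟对应的可组合能力,接口均为便于说明而假设):

```python
import time

def run_long_task(kettle_boiling, detect_child_age, child_fingerprint_match,
                  speaker, door_lock, stopped):
    """图19B执行流程的示意:回调函数分别模拟相应设备的可组合能力。"""
    while not stopped():                       # 步骤d)/e):检测长时任务是否暂停或终止
        if kettle_boiling():                   # 步骤a)/b):检测智能烧水壶的状态
            door_lock("lock")                  # 烧开水时使门锁上锁并启动检测
            age = detect_child_age()
            if age is not None and age < 14:   # 步骤c):人脸识别出小孩
                speaker("有小孩靠近厨房")
            elif not child_fingerprint_match():
                door_lock("unlock")            # 指纹与小孩不符时可解锁
        time.sleep(0.01)                       # 轮询间隔为示意值

# 用法示例:用闭包模拟各可组合能力,执行两轮后终止
state = {"rounds": 0}
def stopped():
    state["rounds"] += 1
    return state["rounds"] > 2

run_long_task(lambda: True, lambda: 8, lambda: True,
              lambda msg: print("语音播报:", msg),
              lambda op: print("门锁:", op), stopped)
```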
S1406、第二可组合能力基于上述构建的长时任务执行流程,执行长时任务。
具体的,前述实施例中的任务执行Runtime可以基于上述步骤所构建的长时任务执行流程唤醒前述选择出的第二可组合能力执行该长时任务。其中,该长时任务的终止可以由检测到的用户指令,和/或设备状态,和/或环境状态信息触发。
示例性的,以图18所示的多设备场景为例,该场景下长时任务终止的触发可以是用户输入的语音指令“停止烧开水时对小孩的检测”;或者,当智能门锁170基于人脸识别能力识别出附近的人脸年龄大于14岁,并且基于指纹识别能力识别出指纹信息和小孩的指纹信息不相同,使得指纹锁解锁时,可以检测智能烧水壶160的状态。当检测到智能烧水壶160没有烧水时,则该长时任务暂停或终止;若检测到智能烧水壶160仍在烧开水时,可以继续基于该长时任务的执行流程执行该长时任务。
实施本申请实施例提供的该长时任务执行方法,可以使得多设备场景下的虚拟聚合设备执行多个具有逻辑依赖关系的指令,使得虚拟聚合设备可处理的任务类型更为丰富,提升各设备的资源利用率以及多设备场景下任务执行的效率和自动化水平。
在本申请实施例中,还可以提供一种记忆模型的构建方法,应用于上述的S107步骤中。
在一些应用场景中,第一可组合能力可以接收第一预设时间内的交互输入。基于上述接收到的交互输入,可以形成交互操作记录。第二设备中的第三可组合能力可以获取到交互操作记录,其中,该交互操作记录可以包括用户与设备间的交互信息(例如,用户通过浏览器查询新闻),和/或设备检测到用户处于指定状态下的信息(例如,电子设备检测到用户跑步30分钟的状态),和/或多个电子设备间的交互信息。第三可组合能力可以基于上述交互操作记录构建记忆模型(也可以被称为记忆)。其中,记忆模型可以包括短时记忆模型(也可以被称为短时记忆)和长时记忆模型(也可以被称为长时记忆),然后,第三可组合能力可以基于上述记忆模型,识别出第一事件所表征的用户意图。
其中,记忆模型可以基于接收到的交互输入来表征用户和设备之间交互的习惯或偏好。短时记忆模型可以基于满足第一条件的交互操作记录来表征用户和设备之间交互的习惯或偏好;长时记忆模型可以基于满足第二条件的交互操作记录来表征用户和设备之间交互的习惯或偏好。
在一种可能的实现方式中,第一条件可以是指上述交互操作记录是在预设时间窗内(也可以被称为第一预设时间,例如,在最近6小时内)接收到的。第二条件可以是指上述交互操作记录在连续多个预设时间窗内(例如,6小时内、8小时内)都接收到。
在另一种可能的实现方式中,第一条件可以是指在指定时间段1(也可以被称为第一预设时间,例如,从凌晨0点-晚24点),上述交互操作记录的接收次数大于第三阈值。第二条件可以是指在多个连续的指定时间段1(例如,从凌晨0点-晚24点),上述交互操作记录的接收次数在各指定时间段1中都大于第三阈值。
示例性的,图20示例性示出了该个性化交互方法的具体流程。如图20所示,该方法具体可以包括:
S1501、第三可组合能力获取到交互操作记录。
具体的,第二设备可以通过第三可组合能力获取到多个设备上的交互操作记录。
关于交互操作记录的说明,可以参考前述中的描述,在此不再赘述。该交互操作记录可以存储在中控设备上。可选的,当中控设备的存储资源不足以存储交互操作记录时,中控设备可以选择存储资源丰富的设备存储交互操作记录。该存储上述交互操作记录的设备可以向中控设备提供指定接口,以便中控设备可以从该设备获取到交互操作记录。
S1502、第三可组合能力基于上述交互操作记录构建记忆模型。
具体的,记忆模型可以基于显示形式进行构建,例如标签结构化数据形式。记忆模型也可以基于隐式形式进行构建,例如张量形式、神经网络参数形式等等。其中,记忆模型可以被划分为短时记忆模型和长时记忆模型。短时记忆模型可以基于交互操作记录为输入,以短时记忆模型期望的数据(例如,交互操作记录对应的短时记忆标签值)为输出进行构建,长时记忆模型可以基于短时记忆模型的数据为输入,以长时记忆模型期望的数据(例如,交互操作记录对应的长时记忆标签值)为输出进行构建。短时记忆模型和长时记忆模型可以通过主成分分析算法,或CNN、RNN、LSTM等一种或多种人工神经网络算法进行构建。
可选的,短时记忆模型/长时记忆模型可以采用计算交互操作记录的接收时间间隔、交互操作记录的接收次数等统计交互操作记录参数的方法,当上述交互操作记录的参数满足预定义的规则(例如,当交互操作记录的接收时间间隔小于指定时间间隔、交互操作记录的接收次数大于指定阈值等)时,更新短时记忆模型/长时记忆模型中相应的数据。
可选的,短时记忆模型也可以采用FIFO数据结构对预设时间窗内(例如,在最近6小时内)的交互操作记录进行记录以及基于上述记录对交互操作记录对应的短时记忆标签值进行更新,而不必基于上述的主成分分析算法或人工神经网络算法进行模型构建。
示例性的,以基于标签结构化数据形式构建记忆模型为例。第三可组合能力可以获取交互操作记录。关于交互操作记录的说明,可以参考前述中的描述,在此不再赘述。第三可组合能力可以判断某条交互操作记录的接收时间是否在预设时间窗内(例如,6小时内、8小时内)。若是,第三可组合能力可以将该交互操作记录对应的短时记忆标签值设置为“True”。当该交互操作记录对应的短时记忆标签的值“True”保持时间大于第二时间阈值(例如,7天、10天等)时,则将该交互操作记录对应的长时记忆标签值设置为“True”。当该交互操作记录对应的短时记忆标签的值“True”保持时间小于第二时间阈值(例如,7天、10天等)时,或者该交互操作记录对应的短时记忆标签的值“False”保持时间大于第二时间阈值(例如,7天、10天等)时,则将该交互操作记录对应的长时记忆标签值设置为“False”。可选的,短时记忆标签的值和长时记忆标签的值除了上述“True”和“False”的布尔型数据,也可以是字符型数据,例如响应于用户操作执行相应功能的APP名称等。
结合上述示例性的实施方式,若获取到的多条交互操作记录在预设时间窗内(例如,在最近6小时内)包括有“07:00跑步30分钟”、“08:00打开浏览器查询新闻”和“08:15打开G导航APP查询路线”等。上述各交互操作记录对应的短时记忆标签以及该标签的值和长时记忆标签及该标签的值可以如表7所示:
表7
交互操作记录 短时记忆标签 短时记忆标签的值 长时记忆标签 长时记忆标签的值
07:00跑步30分钟 近期运动 True 爱好运动 True
(无购物操作记录) 近期购物 False 爱好购物 False
08:00打开浏览器查询新闻 近期阅读新闻 True 爱好阅读新闻 True
08:15打开G导航APP查询路线 近期导航APP G 导航惯用APP B
如表7所示,交互操作记录“07:00跑步30分钟”的记录时间在预设时间窗内(例如,在最近6小时内),对应的短时记忆标签为“近期运动”,其值为“True”。该短时记忆标签的值“True”保持时间大于第二时间阈值(例如,7天),因此对应的长时记忆标签“爱好运动”的值为“True”;交互操作记录中没有购物操作记录,因此对应的短时记忆标签“近期购物”的值为“False”。该短时记忆标签的值“False”保持时间大于第二时间阈值(例如,7天),因此对应的长时记忆标签“爱好购物”的值为“False”;交互操作记录“08:00打开浏览器查询新闻”在预设时间窗内(例如,在最近6小时内),对应的短时记忆标签“近期阅读新闻”,其值为“True”。该短时记忆标签的值“True”保持时间大于第二时间阈值(例如,7天),因此对应的长时记忆标签“爱好阅读新闻”的值为“True”;交互操作记录“08:15打开G导航APP查询路线”在预设时间窗内(例如,在最近6小时内),对应的短时记忆标签“近期导航APP”,其值为记录中导航APP的名称“G”。而长时记忆标签“导航惯用APP”的导航APP名称值为“B”,也即是说,在记录短时记忆标签“近期导航APP”的值“G”之前,短时记忆标签“近期导航APP”的值“B”保持时间大于第二时间阈值(例如,7天)。
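上述短时/长时记忆标签的更新逻辑,可以用如下示意性的Python草图表达(时间窗与第二时间阈值取正文示例值;其中以最早记录时间近似"标签值保持时间",为一处简化假设):

```python
from datetime import datetime, timedelta

def update_memory(records, now, window_hours=6, promote_days=7):
    """短时/长时记忆标签更新的示意。

    records:{标签: [操作发生时间列表]};返回 {标签: (短时值, 长时值)}。
    window_hours:预设时间窗(示例取6小时);promote_days:第二时间阈值(示例取7天)。
    """
    memory = {}
    for label, times in records.items():
        # 预设时间窗内有记录 -> 短时记忆标签置True
        short = any(now - t <= timedelta(hours=window_hours) for t in times)
        # 短时标签的"True"保持时间大于第二时间阈值 -> 长时记忆标签置True
        # (此处以"最早记录早于7天前"近似保持时间,为简化假设)
        long_ = short and min(times) <= now - timedelta(days=promote_days)
        memory[label] = (short, long_)
    return memory

now = datetime(2022, 11, 10, 8, 30)
records = {"近期运动": [now - timedelta(days=10), now - timedelta(hours=1, minutes=30)],
           "近期购物": []}
print(update_memory(records, now))
# {'近期运动': (True, True), '近期购物': (False, False)}
```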
基于上述表7中所示标签结构化数据形式构建的记忆模型可以如图21A所示。其中,该示例中的短时记忆模型可以包括短时记忆网络和短时记忆遗忘网络,长时记忆模型可以包括长时记忆网络和长时记忆遗忘网络。该短时记忆模型和长时记忆模型可以采用CNN、RNN、LSTM等一种或多种人工神经网络算法进行构建。短时记忆网络可以基于交互操作记录为输入,以交互操作记录对应的短时记忆标签值或参数值为期望输出进行训练构建;长时记忆网络可以基于短时记忆网络的输出为输入,以对应的长时记忆标签值或参数值为期望输出进行训练构建。短时记忆遗忘网络和长时记忆遗忘网络可以用于逆向反馈,以实现短时记忆和长时记忆的遗忘退化。可选的,短时记忆遗忘网络/长时记忆遗忘网络可通过调整其控制参数,控制以不同速率或权重进行记忆信息的遗忘退化。可选的,该记忆模型可以通过指定转换网络将短时记忆网络的输出转换为长时记忆网络的输入。
S1503、第三可组合能力基于上述记忆模型,进行指令识别和/或意图决策。
具体的,记忆模型可以以数据库的形式进行存储,使得交互指令识别模块、多模态意图决策模块可以进行读取;记忆模型(例如,上述图21A示例中的短时记忆网络和长时记忆网络)也可以以人工神经网络、决策树等机器学习模型的形式进行存储,作为第二设备中交互指令识别模块、多模态意图决策模块等模块中算法的组成部分,以使得交互指令识别模块可以基于该记忆模型识别出用户意图,多模态意图决策模块可以基于上述记忆模型进行意图决策。
实施本申请实施例提供的上述个性化交互方法,可以使得虚拟聚合设备基于交互操作记录中包括的用户高频率操作行为更准确地识别用户输入的指令,提高了用户与设备交互的精确性,提升了识别用户意图的效率,同时也更贴近用户的个人使用习惯。
在本申请实施例中,还可以提供一种基于用户画像的交互方法,应用于上述的S107-S108步骤中。
虚拟聚合设备可以确定用户的用户画像,并利用用户画像发掘用户需求,分析用户偏好, 提供给用户更高效和更有针对性的信息输送以及更贴近个人习惯的用户体验。用户画像可以在多维度上建立针对用户的描述性标签,对用户多方面的真实个人特征进行勾勒。其中,用户画像是指根据用户的基本属性、用户偏好、生活习惯、用户行为等信息而抽象出来的标签化用户模型。用户画像可以由第三可组合能力从各个设备相互同步的用户信息获取得到。用户信息可以包括用户性别、年龄等用户自身固有的属性,还可以包括用户和设备之间的交互信息,如用户启动设备的次数、关闭设备的次数,在不同场景下触发设备执行的操作等等。第三可组合能力可以基于用户画像,识别出第一事件表征的用户意图。
示例性的,以图21B所示的客厅场景和图21C所示的客厅场景为例,在上述两示例中的客厅场景内,虚拟聚合设备可以包括下列多个设备:手机210、智慧屏220、大灯230和落地灯240。在该虚拟聚合设备中,可以以手机210作为中控设备。其中,图21B所示的用户A正坐在沙发上,观看智慧屏220播放的视频。落地灯240已开启。用户A可以发出语音指令"亮一点"(也可以被称为第一事件)。图21C所示的用户B正坐在沙发上阅读手机210上的信息。大灯230已开启。用户B也可以发出语音指令"亮一点"(也可以被称为第一事件)。可以看出,用户A和用户B都发出了相同的语音指令。手机210接收到用户A和用户B的语音指令,可以基于用户A对应的用户画像A和用户B对应的用户画像B,确定出用户A的用户意图A和用户B的用户意图B。然后,手机210可以触发任务映射模块基于用户意图A安排待执行任务A,基于用户意图B安排待执行任务B。手机210可以触发服务能力调度模块将待执行任务A和待执行任务B映射到各设备的可组合能力。
下面,基于上述图21B和图21C所示的应用场景,结合图21D,具体说明基于本申请实施例提供的基于用户画像的交互方法。如图21D所示,该基于用户画像的交互方法的具体可以包括:
S1601、第一可组合能力接收用户的交互输入。
具体的,在本申请实施例中,可以由第一设备上的第一可组合能力接收用户的交互输入。第一设备的数量可以是一个或多个。第一可组合能力可以是第一设备上所具有的一个或多个交互类可组合能力。其中,交互类可组合能力所包括的类型可以如图4所示。
示例性的,在图21B所示的应用场景中,当用户A发出语音指令“亮一点”时,智慧屏220可以通过运行于其上的近场语音输入能力接收到该语音指令。在图21C所示的应用场景中,当用户B发出语音指令“亮一点”时,手机210可以通过运行于其上的近场语音输入能力接收到该语音指令。可选的,在图21B所示的应用场景中,也可以由运行在手机210上的近场语音输入能力接收用户A的语音指令;在图21C所示的应用场景中,也可以由运行于智慧屏220上的近场语音输入能力接收用户B的语音指令。也即是说,本申请实施例并不限制第一设备的类型。
S1602、第三可组合能力基于用户的交互输入识别出用户身份。
具体的,在本申请实施例中,可以由第二设备上的第三可组合能力基于交互输入所包括的用户生理特征信息来识别出用户身份。例如,第三可组合能力可以基于交互输入所包括的声纹信息来识别出用户身份,或者,第三可组合能力可以基于交互输入所包括的脸部特征信息来识别出用户身份。
示例性的,在图21B和图21C所示的应用场景中,第二设备可以是手机210。手机210可以通过第三可组合能力基于用户A输入的语音指令所包括的声纹信息,识别出该语音指令对应的用户身份为“用户A”;同样的,手机210可以通过第三可组合能力基于用户B输入的 语音指令所包括的声纹信息,识别出该语音指令对应的用户身份为“用户B”。
S1603、第一可组合能力获取到环境状态信息和/或设备状态信息。
具体的,在本申请实施例中,可以由第一设备上的第一可组合能力获取到环境状态信息和/或设备状态信息。第一设备的数量可以是一个或多个。第一可组合能力可以是如图4所示的识别类可组合能力。
示例性的,在图21B的应用场景中,第一设备可以是智慧屏220。智慧屏220可以检测出智慧屏220正在播放视频,以及通过运行于其上的环境识别类可组合能力识别出落地灯240发出的黄光;在图21C的应用场景中,第一设备可以是手机210。手机210可以检测出手机210正在被用户使用,以及通过运行于其上的环境识别类可组合能力识别出大灯230发出的白光。
S1604、第三可组合能力基于用户画像确定出用户意图,并安排待执行任务。
具体的,虚拟聚合设备可以根据用户和各个设备的交互情况,创建并维护该用户的用户画像。第二设备上的第三可组合能力可以基于步骤S1602识别出的用户身份,根据该用户对应的用户画像,确定出用户的意图。然后,第三可组合能力可以基于该用户意图安排待执行任务。
示例性的,在图21B的应用场景中,第二设备可以是手机210。手机210在S1602步骤中识别出发出语音指令的用户身份为“用户A”。手机210可以从用户画像模块中获取到用户A的用户画像A。手机210可以通过第三可组合能力从用户画像A中获取到用户A在使用智慧屏220观看视频时,打开了落地灯240,从而确定出用户意图A为调亮落地灯240的亮度。手机210可以触发任务映射模块基于用户意图A安排待执行任务A,即调亮落地灯240;在图21C的应用场景中,第二设备可以是手机210。手机210在S1602步骤中识别出发出语音指令的用户身份为“用户B”。手机210可以从用户画像模块中获取到用户B的用户画像B。手机210可以通过第三可组合能力从用户画像B中获取到用户B在使用手机210时,打开了大灯230,从而确定出用户意图B为调亮大灯230的亮度。手机210可以触发第三可组合能力基于用户意图B安排待执行任务B,即调亮大灯230。
S1605、虚拟聚合设备将待执行任务映射到第二可组合能力。
具体的,在本申请实施例中,虚拟聚合设备可以将第二设备中第三可组合能力安排的待执行任务映射到第三设备上的第二可组合能力,以使得第二可组合能力执行相应的操作。
示例性的,在图21B的应用场景中,虚拟聚合设备可以将待执行任务A,即调亮落地灯240映射到落地灯240的灯光调节能力。使得落地灯240可以实现用户意图A,使得落地灯240发出的黄光更明亮;在图21C的应用场景中,虚拟聚合设备可以将待执行任务B,即调亮大灯230映射到大灯230的灯光调节能力,使得大灯230可以实现用户意图B,使得大灯230发出的白光更明亮。
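上述"同一指令、不同用户画像映射到不同待执行任务"的过程,可以用如下示意性的Python草图表达(用户画像的内容与字段均为便于说明而假设):

```python
USER_PROFILES = {  # 用户画像的示意:记录各用户在不同使用场景下开启的灯具
    "用户A": {"观看智慧屏": "落地灯240"},
    "用户B": {"使用手机": "大灯230"},
}

def handle_brightness_command(user, scene):
    """基于用户画像把同一条语音指令"亮一点"映射到不同待执行任务的示意。"""
    lamp = USER_PROFILES.get(user, {}).get(scene)
    if lamp is None:
        return "无法确定目标灯具,需进一步询问用户"
    # 得到的待执行任务随后映射到对应灯具的灯光调节能力
    return f"调亮{lamp}"

print(handle_brightness_command("用户A", "观看智慧屏"))  # 调亮落地灯240
print(handle_brightness_command("用户B", "使用手机"))    # 调亮大灯230
```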
实施本申请实施例提供的基于用户画像的交互方法,可以使得虚拟聚合设备更为精确地识别出用户的意图,提高用户意图的识别效率。
为了更清楚地介绍本申请实施例提供的基于多设备提供服务的方法,下面以一个典型的应用场景为例,阐述该方法的流程。
在某一居家场景内,用户随身携带了智能手机(101设备),客厅安装了智能音箱(102设备)、智慧屏(103设备),卧室安装了智能音箱(104设备),客厅和卧室等房间内安装了可感知用户位置的毫米波传感器(105设备,未绘于图22、图23中)。在本案例中,用户进行了以下活动或交互操作:(1)从外部返回该智慧家居场景;(2)就坐于客厅内沙发使用智能手机(101设备);(3)移动至卧室,仰躺在卧室床上使用智能手机(101设备);(4)发出语音指令“播放音乐《七里香》”。则对照图2所示流程,本系统的基本工作流程为:
S101:如图22所示,用户从外返回后,智能手机(101设备)自动连接已预存的WiFi网络,与102~105设备建立互联。由于101~105设备均登陆了同一ID的用户账号,以该账号为认证鉴权依据,各设备间可完整请求和访问对方的软硬件资源。
S102:建立互联后,101~105设备之间交换各自可组合能力可用状态、功耗、计算资源等信息。
S103:采用静态或动态策略,在101~105设备中选举一台设备作为控制智慧助手服务的中控设备。示例的,如以系统功耗、处理能力为主要考量因素,以选举相对稳定在线运行且具有一定处理能力的设备为中控设备为优化目标,则智慧屏(103设备)由于采用市电供电(相对而言手机为电池供电),且具备一定的运算能力(智能音箱依赖云服务处理而基本无运算能力),可选举其为中控设备。
S104:中控设备智慧屏(103设备)控制虚拟聚合设备的初始化配置,构建分布式系统内可用可组合能力集合或清单。该集合或清单的示例如表8所示。
S105:可选的,中控设备智慧屏(103设备)可控制部分系统可组合能力启动,如毫米波传感器(105设备)的用户位置检测能力,以支撑后续决策和/或服务。
表8 分布式系统可用可组合能力集局部示例
设备 可用可组合能力(局部,根据正文S105、S204、S206、S305等步骤可恢复的内容)
101(智能手机) 拾音、音乐播放等
102(客厅智能音箱) 远场拾音、云侧远场ASR识别、云侧NLU识别、音乐播放等
103(智慧屏) 远场拾音、云侧远场ASR识别、云侧NLU识别、音乐播放等
104(卧室智能音箱) 远场拾音、云侧远场ASR识别、云侧NLU识别、音乐播放等
105(毫米波传感器) 用户位置检测等
S101、S102、S103、S104-S105可看作是图3所述方法中的S102、S103、S104、S105。
S201:系统通过用户位置检测能力,感知到用户坐于客厅沙发上使用智能手机(101设备)。
S202:参考用户历史操作记录、人机交互历史、当前时间/日程等信息,预测用户潜在的服务需求为音视频播放。
S203:可选的,根据用户潜在服务需求(音视频播放),以及用户偏好、操作/交互历史等信息,确定虚拟聚合设备待用的服务方案为语音交互控制的音视频播放。
S204:根据虚拟聚合设备服务方案,在分布式系统内申请支撑该方案的可组合能力,如远场拾音(102设备)、云侧远场ASR识别(102设备)、云侧NLU识别(102设备)、音乐播放(102设备)等。所述申请可组合能力的过程,包括跨设备进行可组合能力的初始化和占用(如将可用状态置为False)。如图22所示,在该场景下,可选的语音交互控制通路包括智能手机(101设备)、智能音箱(102设备)、智慧屏(103设备)等3条,可采用动态或静态策略,选择最合适(如最近调用过)的1条语音交互控制通路。也可先同时选择3条语音交互控制通路,并在后续流程中选用部分处理结果,或对处理结果进行融合。此处以仅选择智能音箱(102设备)的语音交互控制通路为例,即选择102设备的远场拾音能力为交互入口,以配套的云侧远场ASR识别(102设备、103设备、或104设备)、云侧NLU识别(102设备、103设备、或104设备)能力为识别组件。
S205:中控设备智慧屏(103设备)控制虚拟聚合设备重配置,包括释放与104设备相关的暂时不使用的可组合能力;对虚拟聚合设备执行配置,如配置102设备的远场拾音能力为虚拟聚合设备的交互入口等。
S201-S205可看作是图3所述方法中的S109,即一次重配置虚拟聚合设备的过程。
S206:虚拟聚合设备持续监测状态变化事件,如图23所示,当监测到用户移动至卧室后,中控设备智慧屏(103设备)控制虚拟聚合设备进行重配置。将虚拟聚合设备的交互入口,由102设备的远场拾音能力,切换为104设备的远场拾音能力,并释放与102设备相关的暂时不使用的可组合能力。
S206可看作是图3所述方法中的S109,即又一次重配置虚拟聚合设备的过程。
S301:虚拟聚合设备持续监测用户触发指令,如通过104设备的远场拾音能力拾取到语音唤醒词,触发语音交互流程。
S302:用户输入交互指令,如通过104设备的远场拾音能力拾取到用户语音指令“播放音乐《七里香》”。
S303:识别用户意图,如通过104设备的云侧远场ASR识别、云侧NLU识别等能力,识别出用户的服务需求类型为“播放音乐”,播放内容为“《七里香》”。
S304:拆分用户意图识别结果为待执行任务,如将用户服务需求拆分映射为单个可组合能力可执行的待执行任务“播放音乐”。
S305:匹配待执行任务至可组合能力,如将待执行任务"播放音乐"映射至104设备的音乐播放可组合能力。如表8所示,由于101~104设备均有音乐播放可组合能力,所述选择104设备的音乐播放可组合能力的过程,可参考用户当前位置(卧室)、偏好(习惯使用音箱)等多因素进行决策。
S306:控制可组合能力执行待执行任务,如控制104设备的音乐播放可组合能力,执行待执行任务"播放音乐"直至结束。示例仅涉及单个待执行任务,可选的,对涉及多个待执行任务的场景,可通过将多个待执行任务按时序、逻辑关系组织为有向无环图,控制进行次序执行。
S301-S302可看作是图3所述方法中的S106,S303-S304可看作是图3所示方法中的S107,S305-S306可看作是图3所示方法中的S108。
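对涉及多个待执行任务的场景,上文提到可将其组织为有向无环图并按次序执行;下面给出一个基于Kahn拓扑排序的示意性Python草图(任务名与依赖关系均为便于说明而假设):

```python
from collections import deque

def execute_dag(tasks, deps, run):
    """将多个待执行任务按时序/逻辑关系组织为有向无环图并按次序执行的示意。

    deps:任务 -> 其前置任务集合;run:执行单个任务的回调(接口为假设)。
    """
    indegree = {t: len(deps.get(t, set())) for t in tasks}
    successors = {t: [] for t in tasks}
    for t, ds in deps.items():
        for d in ds:
            successors[d].append(t)
    queue = deque(t for t in tasks if indegree[t] == 0)
    while queue:  # Kahn拓扑排序:前置任务全部完成后才执行后继任务
        t = queue.popleft()
        run(t)
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)

execute_dag(["拾音", "意图识别", "播放音乐"],
            {"意图识别": {"拾音"}, "播放音乐": {"意图识别"}},
            lambda t: print("执行:", t))
```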
本申请实施例还提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序,当该计算机程序在计算机上运行时,使得计算机执行上述各个设备如中控设备、第一设备、第二设备、第三设备等分别执行的相关步骤,以实现上述实施例提供的基于多设备提供服务的方法。
本申请实施例还提供一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述各个设备如中控设备、第一设备、第二设备、第三设备等分别执行的相关步骤,以实现上述实施例中的基于多设备提供服务的方法。
另外,本申请实施例还提供一种装置。该装置具体可以是组件或模块,该装置可包括相连的一个或多个处理器和存储器。其中,存储器用于存储计算机程序。当该计算机程序被一个或多个处理器执行时,使得装置执行上述方法实施例中的各个设备如中控设备、第一设备、第二设备、第三设备等分别执行的相关步骤。
其中,本申请实施例提供的装置、计算机可读存储介质、计算机程序产品或芯片均用于执行上文所提供的基于多设备提供服务的方法。因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
采用本申请实施例提供的基于多设备提供服务的方法,可以实现以下技术效果:
1.体验一致:本申请所述技术方案在某种意义上类似于云计算中的PaaS(Platform-as-a-Service),将不同设备的资源抽象聚合为一个虚拟聚合设备,作为智慧助手的运行承载平台。在该平台上,仅运行唯一的智慧助手的应用程序实例,以提供人机交互服务,从而保障了交互上下文、用户画像、个性化数据的一致性。避免用户在与多台设备进行交互时,由于各设备分别独立运行智慧助手实例,导致切换交互入口设备时,体验的不一致。
2.功耗低:如前文所述,由于分布式系统内仅运行一个智慧助手的应用程序实例,在一定程度上可节约系统处理功耗。
3.交互准确率高:本申请所述技术方案中,分布式系统可根据环境状态,自适应地选择合适的可组合能力,如拾音麦克风、摄像头、AI算法模型等。一方面可选择靠近用户、用户感兴趣、和/或干扰程度较低的交互外设,避免无效、低效外设所拾取交互信号对交互准确率的影响。另一方面,可拓展部分资源紧张设备的能力,选择系统内准确率和运算量较高的AI算法模型,提升交互识别准确率。
4.易于生态拓展:本申请案所述资源的抽象描述方法,与设备型号、厂商、生态等要素解耦,符合该描述规范和/或组件接口标准的设备即可适配接入本案所述的分布式系统,方案通用性较好,相对降低了适配难度。
本申请的各实施方式可以任意进行组合,以实现不同的技术效果。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、 或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,该流程可以由计算机程序来指令相关的硬件完成,该程序可存储于计算机可读取存储介质中,该程序在执行时,可包括如上述各方法实施例的流程。而前述的存储介质包括:ROM或随机存储记忆体RAM、磁碟或者光盘等各种可存储程序代码的介质。
总之,以上所述仅为本申请技术方案的实施例而已,并非用于限定本申请的保护范围。凡根据本申请的揭露,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (53)

  1. 一种基于多设备提供服务的通信系统,其特征在于,所述通信系统包括多个电子设备,所述多个电子设备包括中控设备,所述中控设备用于管理多个资源,使得所述多个资源执行以下步骤:
    所述多个资源中的第一资源检测第一事件;所述第一资源的数量为一个或多个;
    所述多个资源中的第二资源执行所述第一事件对应的待执行任务;所述第二资源的数量为一个或多个;所述第一资源和/或所述第二资源包括的全部资源,至少来自两个不同的电子设备;
    其中,所述中控设备管理的所述多个资源包括所述多个电子设备的部分或全部资源。
  2. 根据权利要求1所述的通信系统,其特征在于,所述中控设备还用于管理多个资源,使得所述多个资源执行:
    所述第二资源执行所述第一事件对应的待执行任务之前,所述多个资源中的第三资源识别所述第一事件表征的用户意图,并确定满足所述用户意图的待执行任务。
  3. 根据权利要求1或2所述的通信系统,其特征在于,所述资源为可组合能力,所述可组合能力为使用预定方式描述的资源;
    所述第一资源为第一可组合能力,所述第二资源为第二可组合能力。
  4. 根据权利要求3所述的通信系统,其特征在于,所述中控设备还用于:
    所述中控设备管理多个资源,使得所述多个资源执行权利要求1中的步骤之前,将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备;
    所述第一可组合能力、所述第二可组合能力均为所述虚拟聚合设备的可组合能力。
  5. 根据权利要求4所述的通信系统,其特征在于,
    所述中控设备具体用于,配置部分或全部所述多个电子设备的可组合能力的参数。
  6. 根据权利要求4或5所述的通信系统,其特征在于,
    所述中控设备还用于,将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备之前,接收所述中控设备以外的其他设备发送的可组合能力信息,所述可组合能力信息用于指示对应设备提供的可组合能力;
    所述中控设备具体用于根据所述多个电子设备的可组合能力信息,将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备。
  7. 根据权利要求4-6任一项所述的通信系统,其特征在于,
    所述虚拟聚合设备用于运行单一智慧助手,所述单一智慧助手用于支持所述中控设备管理所述多个资源,使得所述多个资源执行权利要求1中的步骤。
  8. 根据权利要求4-7任一项所述的通信系统,其特征在于,
    所述中控设备具体用于,将以下几项配置为虚拟聚合设备:所述中控设备自身的可组合能力,和,所述通信系统中所述中控设备以外的电子设备的第四可组合能力;
    其中,所述第四可组合能力由所述中控设备根据预设策略确定,或者,所述第四可组合能力由所述中控设备使用自身的可组合能力获知环境信息后,根据所述环境信息确定。
  9. 根据权利要求8所述的通信系统,其特征在于,所述第四可组合能力由所述中控设备根据预设策略确定,所述第四可组合能力具体包括:
    所述通信系统中所述中控设备以外的电子设备的全部可组合能力;
    或者,
    所述通信系统中所述中控设备以外的电子设备中采集非隐私内容的可组合能力;
    或者,
    所述通信系统中所述中控设备以外的连接电源的电子设备中的可组合能力。
  10. 根据权利要求8或9所述的通信系统,其特征在于,
    所述中控设备还用于管理所述多个资源,使得所述多个资源执行以下步骤:在所述中控设备将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备之后,所述第一可组合能力检测第二事件;所述第二可组合能力根据所述第二事件确定服务方案;
    所述中控设备还用于,将所述服务方案对应的可组合能力,重新配置为虚拟聚合设备。
  11. 根据权利要求4-10任一项所述的通信系统,其特征在于,所述中控设备具体用于根据以下一项或多项,将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备:用户状态、设备状态、环境状态、用户画像、全局上下文或记忆。
  12. 根据权利要求1-11任一项所述的通信系统,其特征在于,所述第一事件包括以下任意一种:
    用户输入的第一操作;
    用户状态发生变化的事件;
    用户和所述电子设备之间的距离发生变化的事件;
    环境状态发生变化的事件;
    所述电子设备获取到通知消息,或者,获取到即将执行的日程信息的事件。
  13. 根据权利要求3-11任一项所述的通信系统,其特征在于,
    所述第一可组合能力包括多个用于采集第一模态数据的可组合能力;
    或者,所述第一可组合能力由所述中控设备根据用户习惯、可组合能力的活跃度、可组合能力和用户之间的距离、默认排序中的一个或多个确定;
    或者,所述第一可组合能力包括用户选择的可组合能力;
    或者,所述第一可组合能力包括用户注意力所在的电子设备中的可组合能力。
  14. 根据权利要求3-11、13中任一项所述的通信系统,其特征在于,
    所述第二可组合能力包括所述第一可组合能力所在设备中的可组合能力;
    或者,所述第二可组合能力由所述中控设备根据用户习惯、可组合能力的活跃度、可组合能力和用户之间的距离、默认排序中的一个或多个确定;
    或者,所述第二可组合能力包括用户选择的可组合能力;
    或者,所述第二可组合能力包括用户注意力所在设备中的可组合能力。
  15. 根据权利要求13或14所述的通信系统,其特征在于,
    所述中控设备还用于通过第四设备采集的图像,确定用户注意力所在的设备;
    或者,所述中控设备还用于通过第四设备采集的音频、第五设备采集的音频和图像,确定用户注意力所在的设备;
    或者,所述中控设备还用于通过第四设备采集的图像和第五设备采集的图像,确定用户注意力所在的设备。
  16. 根据权利要求1-15任一项所述的通信系统,其特征在于,所述多个电子设备用于,在以下任意一种情况下,从所述多个电子设备中确定所述中控设备:
    所述多个电子设备中有电子设备接收到第二操作;
    在预设时间到达时;
    有电子设备加入或离开所述通信系统时;
    或者,所述多个电子设备组成所述通信系统的预设时长后。
  17. 根据权利要求1-16任一项所述的通信系统,其特征在于,所述多个电子设备具体用于:
    根据资源稳定性、设备模态或用户习惯中的一个或多个,从所述多个电子设备中确定中控设备;
    将所述多个电子设备中属于预设类型的电子设备确定为中控设备;
    将用户选择的电子设备确定为中控设备;
    或者,根据各个电子设备的历史交互信息,从所述多个电子设备中确定中控设备。
  18. 根据权利要求17所述的通信系统,其特征在于,所述多个电子设备具体用于:
    将平均上线设备数最大的电子设备确定为中控设备,所述平均上线设备数为电子设备在统计时间段内统计到的,所述通信系统在单位时间上线的设备的数量的平均值;
    将平均上线设备数的归一化标准差最大的电子设备确定为中控设备;
    将平均上线设备数大于第一值且平均上线设备数的归一化标准差大于第二值的电子设备确定为中控设备;
    或者,将平均上线设备数的数学期望值最大的电子设备确定为中控设备。
  19. 根据权利要求1-18任一项所述的通信系统,其特征在于,所述中控设备的数量包括多个,多个所述中控设备在同一时间或同一空间,连接到所述通信系统中的全部电子设备。
  20. 根据权利要求2所述的通信系统,其特征在于,所述中控设备具体用于管理多个资源,使得所述多个资源执行:
    所述第三资源将所述用户意图拆分为以模态为单位的多个待执行任务;
    不同的所述第二资源执行不同模态的所述待执行任务。
  21. 根据权利要求2所述的通信系统,其特征在于,
    满足所述用户意图的待执行任务包括:多个具备逻辑关系的任务,所述逻辑关系包括以下任意一种或多种:顺序关系、条件关系、循环关系或布尔逻辑;
    所述中控设备具体用于管理多个资源,使得所述多个资源执行:所述第二资源按照所述逻辑关系执行所述多个具备逻辑关系的任务。
  22. 根据权利要求2所述的通信系统,其特征在于,
    所述中控设备还用于管理多个资源,使得所述多个资源执行以下步骤:
    所述第三资源识别所述第一事件表征的用户意图之前,多个所述第一资源接收交互输入;
    所述第三资源根据所述交互输入生成全局上下文;其中,所述全局上下文包括以下一项或多项:所述第一资源接收到所述交互输入的时间、所述第一资源、所述交互输入的交互内容、所述交互输入对应用户的生理特征信息、所述第一资源所属电子设备的设备信息、或所述交互输入控制的目标设备的设备信息;
    所述中控设备具体用于管理多个资源,使得所述多个资源执行以下步骤:所述第三资源基于所述全局上下文,识别所述第一事件表征的用户意图。
  23. 根据权利要求22所述的通信系统,其特征在于,所述交互输入包括:历史输入,和,当前输入;所述全局上下文包括:历史交互信息,和,当前轮交互信息;
    所述中控设备具体用于管理多个资源,使得所述多个资源执行以下步骤:
    所述第一资源基于所述历史输入获取所述历史交互信息,基于所述当前输入获取所述当前轮交互信息;
    所述第三资源从所述历史交互信息中,匹配和所述当前轮交互信息相关联的第一历史交互信息;
    所述第三资源基于所述第一历史交互信息,识别所述第一事件表征的用户意图。
  24. 根据权利要求23所述的通信系统,其特征在于,所述第一历史交互信息包括:
    和第一用户相关的历史交互信息,所述第一用户为触发所述当前输入的用户;
    或者,由第六设备在第一时间接收到的历史交互信息,所述第六设备为所述第一设备或所述第一设备的近场设备,所述第一时间与接收所述当前轮交互信息的时间的间隔小于第一时长;
    或者,在第二时间接收到的第二历史交互信息,所述第二历史交互信息的目标设备,为,所述当前轮交互信息的目标设备或近场设备,所述第二时间与接收所述当前轮交互信息的时间的间隔小于第二时长;
    或者,和所述当前轮交互信息的相关性大于阈值的历史交互信息。
  25. 根据权利要求2所述的通信系统,其特征在于,所述第一事件包括第一对话信息;所述第一对话信息包含第一指令和第二指令,所述第一指令对应的意图和所述第二指令对应的意图相关联,所述第一指令包括第一指代词;
    所述中控设备还用于管理多个资源,使得所述多个资源执行以下步骤:所述第三资源识别所述第一事件表征的用户意图之前,将所述第一对话信息中所述第一指代词指代的对象替代为所述第二指令对应的对象,以获取到第二对话信息;
    所述中控设备具体用于管理多个资源,使得所述多个资源执行以下步骤:所述第三资源基于所述第二对话信息,识别所述第一事件表征的用户意图。
  26. 根据权利要求2所述的通信系统,其特征在于,
    所述中控设备还用于管理多个资源,使得所述多个资源执行以下步骤:
    所述第三资源识别所述第一事件表征的用户意图之前,所述第一资源接收第一预设时间内的交互输入;所述第三资源基于所述交互输入确定记忆,所述记忆表征用户和设备之间交互的习惯或偏好;
    所述中控设备具体用于管理多个资源,使得所述多个资源执行以下步骤:所述第三资源基于所述记忆,识别所述第一事件表征的用户意图。
  27. 根据权利要求2所述的通信系统,其特征在于,
    所述中控设备还用于管理多个资源,使得所述多个资源执行以下步骤:所述第三资源识别所述第一事件表征的用户意图之前,所述第三资源获取到用户画像;
    所述中控设备具体用于管理多个资源,使得所述多个资源执行以下步骤:所述第三资源基于所述用户画像,识别出所述第一事件表征的用户意图。
  28. 根据权利要求2、20-27任一项所述的通信系统,其特征在于,所述中控设备具体用于管理多个资源,使得所述多个资源执行以下步骤:
    所述第三资源根据以下任意一项或多项,识别所述第一事件表征的用户意图,并确定满足所述用户意图的待执行任务:用户状态、设备状态、环境状态、用户画像、全局上下文或记忆。
  29. 根据权利要求1-28任一项所述的通信系统,其特征在于,所述第一事件包括多种模态数据,所述中控设备具体用于管理多个资源,使得所述多个资源执行以下步骤:
    所述第一资源使用第一采样率采集对应的模态数据;
    其中,所述第一采样率为预设的采样率,或者,所述第一采样率为所述第一资源包括的多个资源中,活跃度最高的资源的采样率。
  30. 根据权利要求3所述的通信系统,其特征在于,
    所述多个电子设备的可组合能力包括:交互类可组合能力、服务类可组合能力;
    所述第一可组合能力属于所述交互类可组合能力,所述第二可组合能力属于所述服务类可组合能力。
  31. 根据权利要求3所述的通信系统,其特征在于,所述多个电子设备的可组合能力包括以下任意一个或多个:使用预定方式描述的摄像头资源、麦克风资源、传感器资源、显示屏资源或计算资源。
  32. 根据权利要求4所述的通信系统,其特征在于,所述可组合能力信息还包括以下任意一个或多个:所述可组合能力的位置、朝向、类别、性能、参数、版本或尺寸。
  33. 根据权利要求1-32任一项所述的通信系统,其特征在于,所述多个电子设备通过以下任意一种或多种技术通信:WLAN、Wi-Fi P2P、BT、NFC,IR、ZigBee、UWB、热点、Wi-Fi softAP、蜂窝网络或有线技术。
  34. 一种基于多设备提供服务的方法,其特征在于,所述方法应用于中控设备,所述方法包括:
    所述中控设备管理多个资源,使得所述多个资源执行以下步骤:
    所述多个资源中的第一资源检测第一事件,所述第一资源的数量为一个或多个;
    所述多个资源中的第二资源执行所述第一事件对应的待执行任务,所述第二资源的数量为一个或多个;所述第一资源和/或所述第二资源包括的全部资源,至少来自两个不同的电子设备;
    其中,所述中控设备管理的所述多个资源包括多个电子设备的部分或全部资源,所述多个电子设备包括所述中控设备。
  35. 根据权利要求34所述的方法,其特征在于,所述第二资源执行所述第一事件对应的待执行任务之前,所述方法还包括:
    所述中控设备管理多个资源,使得所述多个资源执行:所述多个资源中的第三资源识别所述第一事件表征的用户意图,并确定满足所述用户意图的待执行任务。
  36. 根据权利要求34或35所述的方法,其特征在于,所述资源为可组合能力,所述可组合能力为使用预定方式描述的资源;
    所述第一资源为第一可组合能力,所述第二资源为第二可组合能力。
  37. 根据权利要求36所述的方法,其特征在于,所述中控设备管理多个资源,使得所述多个资源执行权利要求34中的步骤之前,所述方法还包括:
    所述中控设备将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备;
    其中,所述第一可组合能力、所述第二可组合能力均为所述虚拟聚合设备的可组合能力。
  38. 根据权利要求37所述的方法,其特征在于,所述中控设备将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备,具体包括:
    所述中控设备配置部分或全部所述多个电子设备的可组合能力的参数。
  39. 根据权利要求37或38所述的方法,其特征在于,
    所述中控设备将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备之前,所述方法还包括:所述中控设备接收所述中控设备以外的其他设备发送的可组合能力信息,所述可组合能力信息用于指示对应设备提供的可组合能力;
    所述中控设备根据所述多个电子设备的可组合能力信息,将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备。
  40. 根据权利要求37-39任一项所述的方法,其特征在于,
    所述虚拟聚合设备用于运行单一智慧助手,所述单一智慧助手用于支持所述中控设备管理所述多个资源,使得所述多个资源执行权利要求34中的步骤。
  41. 根据权利要求37-40任一项所述的方法,其特征在于,所述中控设备将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备,具体包括:
    所述中控设备将以下几项配置为虚拟聚合设备:所述中控设备自身的可组合能力,和,所述通信系统中所述中控设备以外的电子设备的第四可组合能力;
    其中,所述第四可组合能力由所述中控设备根据预设策略确定,或者,所述第四可组合能力由所述中控设备使用自身的可组合能力获知环境信息后,根据所述环境信息确定。
  42. 根据权利要求37-41任一项所述的方法,其特征在于,所述中控设备将部分或全部所述多个电子设备的可组合能力配置为虚拟聚合设备之后,
    所述中控设备管理所述多个资源,使得所述多个资源执行:所述第一可组合能力检测第二事件;所述第二可组合能力根据所述第二事件确定服务方案;
    所述中控设备将所述服务方案对应的可组合能力,重新配置为虚拟聚合设备。
  43. 根据权利要求34-42任一项所述的方法,其特征在于,所述第一事件包括以下任意一种:
    用户输入的第一操作;
    用户状态发生变化的事件;
    用户和所述电子设备之间的距离发生变化的事件;
    环境状态发生变化的事件;
    所述电子设备获取到通知消息,或者,获取到即将执行的日程信息的事件。
  44. 根据权利要求35所述的方法,其特征在于,所述中控设备管理多个资源,使得所述多个资源执行:
    所述第三资源将所述用户意图拆分为以模态为单位的多个待执行任务;
    不同的所述第二资源执行不同模态的所述待执行任务。
  45. 根据权利要求35所述的方法,其特征在于,满足所述用户意图的待执行任务包括:多个具备逻辑关系的任务,所述逻辑关系包括以下任意一种或多种:顺序关系、条件关系、循环关系或布尔逻辑;
    所述中控设备管理多个资源,使得所述多个资源执行:
    所述第二资源按照所述逻辑关系执行所述多个具备逻辑关系的任务。
  46. 根据权利要求35所述的方法,其特征在于,所述方法还包括:所述中控设备管理多个资源,使得所述多个资源执行:
    所述第三资源识别所述第一事件表征的用户意图之前,多个所述第一资源接收交互输入;
    所述第三资源根据所述交互输入生成全局上下文;其中,所述全局上下文包括以下一项或多项:所述第一资源接收到所述交互输入的时间、所述第一资源、所述交互输入的交互内容、所述交互输入对应用户的生理特征信息、所述第一资源所属电子设备的设备信息、或所述交互输入控制的目标设备的设备信息;
    所述第三资源基于所述全局上下文,识别所述第一事件表征的用户意图。
  47. 根据权利要求46所述的方法,其特征在于,所述交互输入包括:历史输入,和,当前输入;所述全局上下文包括:历史交互信息,和,当前轮交互信息;
    所述中控设备管理多个资源,使得所述多个资源执行:
    所述第一资源基于所述历史输入获取所述历史交互信息,基于所述当前输入获取所述当前轮交互信息;
    所述第三资源从所述历史交互信息中,匹配和所述当前轮交互信息相关联的第一历史交互信息;
    所述第三资源基于所述第一历史交互信息,识别所述第一事件表征的用户意图。
  48. 根据权利要求35所述的方法,其特征在于,所述第一事件包括第一对话信息;所述第一对话信息包含第一指令和第二指令,所述第一指令对应的意图和所述第二指令对应的意图相关联,所述第一指令包括第一指代词;
    所述中控设备管理多个资源,使得所述多个资源执行:
    所述第三资源识别所述第一事件表征的用户意图之前,将所述第一对话信息中所述第一指代词指代的对象替代为所述第二指令对应的对象,以获取到第二对话信息;
    所述第三资源基于所述第二对话信息,识别所述第一事件表征的用户意图。
  49. 根据权利要求35所述的方法,其特征在于,所述中控设备管理多个资源,使得所述多个资源执行:
    所述第三资源识别所述第一事件表征的用户意图之前,所述第一资源接收第一预设时间内的交互输入;所述第三资源基于所述交互输入确定记忆,所述记忆表征用户和设备之间交互的习惯或偏好;
    所述第三资源基于所述记忆,识别所述第一事件表征的用户意图。
  50. 一种电子设备,其特征在于,包括:存储器、一个或多个处理器;所述存储器与所述一个或多个处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行如权利要求34-49任一项所述的方法。
  51. 一种计算机可读存储介质,其上存储有计算机程序指令;其特征在于,当所述计算机程序指令被电子设备执行时,使得电子设备实现如权利要求34-49任一项所述的方法。
  52. 一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,其特征在于,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器实现如权利要求34-49任一项所述的方法。
  53. 一种通信系统,所述通信系统包括多个电子设备,所述多个电子设备包括中控设备,所述中控设备用于执行如权利要求34-49任一项所述的方法。
PCT/CN2022/131166 2021-11-12 2022-11-10 基于多设备提供服务的方法、相关装置及系统 WO2023083262A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111340372 2021-11-12
CN202111340372.5 2021-11-12
CN202111633492.4 2021-12-28
CN202111633492.4A CN116126509A (zh) 2021-11-12 2021-12-28 基于多设备提供服务的方法、相关装置及系统

Publications (1)

Publication Number Publication Date
WO2023083262A1 true WO2023083262A1 (zh) 2023-05-19

Family

ID=86294337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131166 WO2023083262A1 (zh) 2021-11-12 2022-11-10 基于多设备提供服务的方法、相关装置及系统

Country Status (2)

Country Link
CN (2) CN116126510A (zh)
WO (1) WO2023083262A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116369864B (zh) * 2023-06-02 2023-08-08 长春中医药大学 基于数据编码的睡眠监测数据智能管理方法和系统
CN116598006B (zh) * 2023-07-18 2023-10-17 中国医学科学院北京协和医院 一种脓毒血症预警装置及应用系统
CN117032940B (zh) * 2023-10-08 2024-02-13 北京小米移动软件有限公司 资源调度的系统、方法、装置、电子设备及存储介质


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062031A (zh) * 2018-02-13 2018-05-22 宁夏煜隆科技有限公司 智能家居控制方法、装置、系统及电子设备
CN109814717A (zh) * 2019-01-29 2019-05-28 珠海格力电器股份有限公司 一种家居设备控制方法、装置、控制设备及可读存储介质
CN112397062A (zh) * 2019-08-15 2021-02-23 华为技术有限公司 语音交互方法、装置、终端及存储介质
WO2021180062A1 (zh) * 2020-03-09 2021-09-16 华为技术有限公司 意图识别方法及电子设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116431794A (zh) * 2023-06-15 2023-07-14 图观(天津)数字科技有限公司 一种基于流程自动化机器人技术的智能问答方法及系统
CN116431794B (zh) * 2023-06-15 2023-08-15 图观(天津)数字科技有限公司 一种基于流程自动化机器人技术的智能问答方法及系统

Also Published As

Publication number Publication date
CN116126510A (zh) 2023-05-16
CN116126509A (zh) 2023-05-16

Similar Documents

Publication Publication Date Title
WO2023083262A1 (zh) 基于多设备提供服务的方法、相关装置及系统
WO2021180062A1 (zh) 意图识别方法及电子设备
WO2021063343A1 (zh) 语音交互方法及装置
CN110336720B (zh) 设备控制方法和设备
CN110111787B (zh) 一种语义解析方法及服务器
WO2021052263A1 (zh) 语音助手显示方法及装置
US10389873B2 (en) Electronic device for outputting message and method for controlling the same
CN110503959B (zh) 语音识别数据分发方法、装置、计算机设备及存储介质
US11874904B2 (en) Electronic device including mode for using an artificial intelligence assistant function of another electronic device
WO2021052282A1 (zh) 数据处理方法、蓝牙模块、电子设备与可读存储介质
US11056114B2 (en) Voice response interfacing with multiple smart devices of different types
US20140143666A1 (en) System And Method For Effectively Implementing A Personal Assistant In An Electronic Network
CN113705823A (zh) 基于联邦学习的模型训练方法和电子设备
CN112331193A (zh) 语音交互方法及相关装置
CN111797249A (zh) 一种内容推送方法、装置与设备
US20220366327A1 (en) Information sharing method for smart scene service and related apparatus
WO2022088964A1 (zh) 一种电子设备的控制方法和装置
WO2022135157A1 (zh) 页面显示的方法、装置、电子设备以及可读存储介质
WO2022143258A1 (zh) 一种语音交互处理方法及相关装置
CN116670667A (zh) Ai系统中的接入认证
US20190163436A1 (en) Electronic device and method for controlling the same
CN114493470A (zh) 日程管理的方法、电子设备和计算机可读存储介质
WO2023001152A1 (zh) 一种推荐视频片段的方法、电子设备及服务器
WO2023071940A1 (zh) 跨设备的导航任务的同步方法、装置、设备及存储介质
WO2022188551A1 (zh) 信息处理方法与装置、主控设备和受控设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892055

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022892055

Country of ref document: EP

Effective date: 20240425