CN113096668A

CN113096668A - Method and device for constructing collaborative voice interaction engine cluster

Info

Publication number: CN113096668A
Application number: CN202110404771.7A
Authority: CN
Inventors: 牛坤; 张伟萌; 戴帅湘
Original assignee: Beijing Moran Cognitive Technology Co Ltd
Current assignee: Fujian Yixinhai Information Technology Co ltd; Xiamen Lide Group Co ltd; Xiamen Power Supply Co of State Grid Fujian Electric Power Co Ltd
Priority date: 2021-04-15
Filing date: 2021-04-15
Publication date: 2021-07-09
Anticipated expiration: 2041-04-15
Also published as: CN113096668B

Abstract

The invention discloses a method and a device for constructing a collaborative voice interaction engine cluster. The method comprises the following steps: a first intelligent terminal generates a first voice processing capability list after successfully accessing a local wireless network for the first time; the first intelligent terminal sends a voice interaction engine cluster search request to a voice assistant cloud server and receives a voice interaction engine cluster search response from the voice assistant cloud server, wherein the search response carries attribute information of at least one first voice interaction engine cluster; and the first intelligent terminal constructs first cooperative voice interaction engine cluster attribute information according to the search response. With this method, when different voice interaction engines are installed on different devices of a user and a voice interaction engine cluster cannot be formed on any single device, the user can still enjoy the experience provided by a cooperative voice interaction engine cluster, and the intelligence of the voice assistant on an intelligent terminal with limited hardware is improved.

Description

Method and device for constructing collaborative voice interaction engine cluster

Technical Field

The embodiment of the invention relates to the technical field of information processing, in particular to a method and a device for constructing a collaborative voice interaction engine cluster.

Background

In recent years, intelligent voice interaction technology has developed rapidly. With the development of the Internet of Things, intelligent terminals can connect to the internet, the application scenarios of voice assistants have become increasingly broad, and many intelligent terminals, such as smart televisions and smart speakers, have begun to provide voice assistant functions that allow users to request tasks through voice instructions, such as querying the weather or booking air tickets.

Because of factors such as size and price, some intelligent terminals have limited processing capacity or storage space, while the number of voice processing engines available for download is huge; downloading or loading many voice processing engines on every intelligent terminal is clearly unrealistic. This problem can be addressed by enabling the voice assistants of different intelligent terminals to complement and cooperate with one another.

In the prior art, the voice assistants of intelligent terminals can realize only limited complementary and cooperative functions. For example, after receiving a dialect voice instruction from a user, a smart television that has not been trained on the dialect finds that it cannot recognize the instruction; it then requests cooperation from other nearby intelligent terminals, selects one that can recognize the instruction, sends the voice instruction to that terminal, and receives the voice recognition result, so that specific operations such as turning on the television or switching channels are executed based on that result. Although the smart television can eventually respond to the user instruction, the response is slow and can hardly meet the user's need for real-time interaction. In addition, the prior art provides only limited cooperation in voice recognition and cannot realize mutual calling or cooperation of voice interaction engines between intelligent terminals. When the voice interaction engine that the user wants to use is installed on only some of the user's devices, the user has to remember which intelligent terminal it is installed on, which makes for a poor voice assistant experience.

In the prior art, a voice interaction engine cluster can be formed on a single intelligent terminal. The cluster comprises a plurality of voice interaction engines; when one of them is activated, the other voice interaction engines in the same cluster are also activated, slots are synchronized instantly among the engines, and the execution results of the engines are provided to the user together, which offers the user an intelligent way of executing tasks and improves the user experience. However, this prior art requires that the voice interaction engines forming a cluster be located on the same intelligent terminal; a cluster cannot be formed across multiple devices, so when different voice interaction engines are installed on different devices of a user, the user cannot enjoy the experience provided by a voice interaction engine cluster.

Based on the above analysis, the following problems urgently need to be solved: how to better realize complementation and cooperation among the voice assistants of multiple intelligent terminals so as to respond to the user more quickly; how to construct and use an interaction engine cluster based on the voice interaction engines included in the voice assistants of multiple intelligent terminals; and how to choose among terminals when more than one can cooperate.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method and a device for constructing a collaborative voice interaction engine cluster.

The invention provides a method for constructing a collaborative voice interaction engine cluster, which comprises the following steps:

step 200, a first intelligent terminal generates a first voice processing capability list after successfully accessing the local wireless network for the first time; the first voice processing capability list comprises at least one first voice interaction engine included by the first intelligent terminal and at least one voice interaction engine included by at least one second intelligent terminal; the second intelligent terminal successfully accesses the local wireless network for the first time before the first intelligent terminal accesses the local wireless network;

step 201, a first intelligent terminal sends a voice interaction engine cluster search request to a voice assistant cloud server, wherein the voice interaction engine cluster search request carries at least one first voice interaction engine;

step 202, a first intelligent terminal receives a voice interaction engine cluster search response from a voice assistant cloud server, wherein the voice interaction engine cluster search response carries attribute information of at least one first voice interaction engine cluster, and voice interaction engines included in the first voice interaction engine cluster include the first voice interaction engine and at least one second voice interaction engine;

step 203, the first intelligent terminal queries a first voice processing capability list and determines at least one second intelligent terminal corresponding to the at least one second voice interaction engine;

step 204, the first intelligent terminal constructs and generates first cooperative voice interaction engine cluster attribute information according to the voice interaction engine cluster search response, wherein the first cooperative voice interaction engine cluster attribute information comprises a corresponding relation between the first voice interaction engine and the first intelligent terminal and a corresponding relation between the at least one second voice interaction engine and the at least one second intelligent terminal.
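Steps 203 and 204 can be sketched as a lookup from each engine of a returned cluster to a terminal that provides it. The following is a minimal illustration only: the names `CapabilityEntry` and `build_cooperative_cluster`, the terminal identifiers, the sample cluster, and the rule of preferring the local terminal when several terminals offer the same engine are all assumptions, not part of the disclosed method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CapabilityEntry:
    engine_type: str   # "recognition" or "interaction" (hypothetical labels)
    engine_name: str   # for an interaction engine, the task it can perform
    terminal_id: str   # identifier of the terminal providing the engine


def build_cooperative_cluster(
    local_id: str,
    capability_list: List[CapabilityEntry],
    cluster_engines: List[str],
) -> Optional[Dict[str, str]]:
    """Map every engine of one cluster from the search response to a terminal.

    Returns None when some engine is not offered by any known terminal.
    """
    providers: Dict[str, List[str]] = {}
    for entry in capability_list:
        if entry.engine_type == "interaction":
            providers.setdefault(entry.engine_name, []).append(entry.terminal_id)

    mapping: Dict[str, str] = {}
    for engine in cluster_engines:
        candidates = providers.get(engine, [])
        if not candidates:
            return None
        # Assumption: prefer the local terminal, else the first recorded provider.
        mapping[engine] = local_id if local_id in candidates else candidates[0]
    return mapping


# Hypothetical capability list held by the first intelligent terminal.
capability_list = [
    CapabilityEntry("interaction", "air ticket reservation", "terminal-1"),
    CapabilityEntry("interaction", "weather query", "terminal-A"),
    CapabilityEntry("interaction", "weather query", "terminal-B"),
    CapabilityEntry("interaction", "hotel reservation", "terminal-B"),
]
cluster = build_cooperative_cluster(
    "terminal-1",
    capability_list,
    ["air ticket reservation", "weather query", "hotel reservation"],
)
print(cluster)
```

The returned dictionary plays the role of the cooperative cluster attribute information: it records which terminal corresponds to each voice interaction engine in the cluster.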

The invention provides a device for constructing a collaborative voice interaction engine cluster, which comprises the following components:

a voice processing capability list dynamic construction unit, which is used for generating a first voice processing capability list after the device successfully accesses the local wireless network for the first time; the first voice processing capability list comprises at least one first voice interaction engine included by the device and at least one voice interaction engine included by at least one second intelligent terminal; the second intelligent terminal successfully accesses the local wireless network for the first time before the device accesses the local wireless network;

the communication unit is used for sending a voice interaction engine cluster search request to the voice assistant cloud server, wherein the voice interaction engine cluster search request carries the at least one first voice interaction engine; receiving a voice interaction engine cluster search response from a voice assistant cloud server, wherein the voice interaction engine cluster search response carries attribute information of at least one first voice interaction engine cluster, and voice interaction engines included in the first voice interaction engine cluster comprise the first voice interaction engine and at least one second voice interaction engine;

the collaborative voice interaction engine cluster construction unit is used for querying the first voice processing capability list and determining at least one second intelligent terminal corresponding to the at least one second voice interaction engine; and constructing and generating first cooperative voice interaction engine cluster attribute information according to the voice interaction engine cluster search response, wherein the first cooperative voice interaction engine cluster attribute information comprises the corresponding relation between the first voice interaction engine and the device and the corresponding relation between the at least one second voice interaction engine and the at least one second intelligent terminal.

The invention also provides a computer device characterized in that it comprises a processor and a memory, in which a computer program is stored that is executable on the processor, which computer program, when executed by the processor, implements the method as described above.

The invention also provides a computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program is executable on a processor, and when executed implements the method as described above.

The invention also provides a voice assistant which is characterized by comprising the device.

By the method and the device, the cooperative voice interaction engine cluster can be generated based on the voice interaction engines of the voice assistants of the multiple devices, so that a user can enjoy good experience brought by the cooperative voice interaction engine cluster when different voice interaction engines are respectively installed on different devices and the voice interaction engine cluster cannot be formed on one device, and the intelligent degree of the voice assistant of the intelligent terminal with limited hardware equipment is improved. In addition, the construction of the collaborative voice interaction engine cluster is simplified and accelerated, so that the user can enjoy the good experience brought by the interaction engine cluster as soon as possible.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a voice assistant system in one embodiment of the invention.

FIG. 2 is a method of dynamically building a list of speech processing capabilities in one embodiment of the invention.

FIG. 3 is a method for building a collaborative speech interaction engine cluster in another embodiment of the present invention.

FIG. 4 is a method for interacting with a voice assistant based on a collaborative speech interaction engine cluster in another embodiment of the invention.

FIG. 5 is an apparatus for dynamically building a list of speech processing capabilities in one embodiment of the invention.

FIG. 6 is an apparatus for building a collaborative speech interaction engine cluster according to another embodiment of the present invention.

FIG. 7 is an apparatus for interacting with a voice assistant based on a collaborative speech interaction engine cluster according to another embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The embodiments and the specific features of the embodiments are detailed descriptions of the technical solutions of the present invention rather than limitations on them, and the embodiments and their technical features may be combined with each other where no conflict arises.

1. Voice assistant system

FIG. 1 shows a block diagram of a voice assistant system on an intelligent terminal. The voice assistant system mainly comprises a human-computer interaction interface, a processing module, a database and the like. The processing module comprises m voice recognition engines and n voice interaction engines, wherein m and n are positive integers. The processing module is connected with the human-computer interaction interface; it can receive data input by a user through the interface and can output interaction data to the user through the interface, such as dialogue data and the task execution processes and results fed back to the user.

The voice recognition engine is used for recognizing the voice command of the user as a text; the speech interaction engines are used for executing specific tasks based on the text recognition result or the user text instruction, and each speech interaction engine can comprise a semantic understanding module, a dialogue management and control module, a dialogue generating module and a command executing module. Specifically, the voice interaction engine determines a user intent (i.e., determines a task), determines key knowledge data corresponding to each slot associated with the user intent, populates the key knowledge data into the corresponding slot, and then executes the task based on the populated slot or slots.

In some embodiments, the processing module may include a main voice interaction engine and/or at least one sub voice interaction engine, wherein the main voice interaction engine is the default voice interaction engine of the voice assistant system. Each voice interaction engine (main or sub) is capable of performing at least one task, i.e. each voice interaction engine may be associated with at least one task, and the tasks that different sub voice interaction engines can perform may be the same or different.

In some embodiments, the main voice interaction engine only determines, based on the user instruction, a sub voice interaction engine capable of handling the instruction; the main engine itself does not perform any specific task. In this case, the user intention is determined by the main voice interaction engine of the voice assistant system, which then selects one or more sub voice interaction engines to process the user instruction based on the determined intention, and slot filling and task execution are performed by the one or more sub voice interaction engines so selected. In the invention, the method executed by the intelligent terminal can be completed by the voice assistant or by the main voice interaction engine of the voice assistant.
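The division of labour between the main engine and the sub engines can be sketched as follows. This is a toy illustration under stated assumptions: the classes `MainEngine` and `SubEngine`, the keyword-based `detect_intent`, and the `weather_slots` parser are hypothetical stand-ins for real semantic understanding and slot filling.

```python
from typing import Callable, Dict, List


class SubEngine:
    """A sub voice interaction engine: fills slots and executes its task."""

    def __init__(self, name: str, task: str,
                 fill_slots: Callable[[str], Dict[str, str]]) -> None:
        self.name = name
        self.task = task
        self.fill_slots = fill_slots

    def execute(self, instruction: str) -> str:
        slots = self.fill_slots(instruction)
        filled = ", ".join(f"{key}={value}" for key, value in sorted(slots.items()))
        return f"{self.name}: {self.task}({filled})"


class MainEngine:
    """Determines the user intent and delegates; it executes no task itself."""

    def __init__(self, sub_engines: List[SubEngine],
                 detect_intent: Callable[[str], str]) -> None:
        self.sub_engines = sub_engines
        self.detect_intent = detect_intent

    def handle(self, instruction: str) -> List[str]:
        task = self.detect_intent(instruction)
        chosen = [engine for engine in self.sub_engines if engine.task == task]
        return [engine.execute(instruction) for engine in chosen]


def detect_intent(instruction: str) -> str:
    # Placeholder for semantic understanding: a keyword match only.
    return "weather query" if "weather" in instruction else "unknown"


def weather_slots(instruction: str) -> Dict[str, str]:
    # Placeholder slot filling: whatever follows " in " is taken as the city.
    _, _, city = instruction.partition(" in ")
    return {"city": city}


main = MainEngine([SubEngine("weather-engine", "weather query", weather_slots)],
                  detect_intent)
print(main.handle("weather in Beijing"))
```

An instruction whose intent matches no sub engine simply yields an empty result list in this sketch.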

2. Dynamically building a list of speech processing capabilities

The present invention provides a method for dynamically constructing a voice processing capability list in a local wireless network. Referring to FIG. 2, the method comprises the following steps:

step 101, in response to successfully accessing the local wireless network for the first time, a first intelligent terminal acquires the first voice processing capabilities that it can execute and initializes a local first voice processing capability list;

step 102, the first intelligent terminal judges whether at least one second intelligent terminal successfully accessed the local wireless network for the first time before the first intelligent terminal accessed it, and if so, step 103 is executed;

step 103, the first intelligent terminal generates a first voice processing capability interaction message and broadcasts it in the local wireless network, so that the at least one second intelligent terminal, after receiving the first voice processing capability interaction message, generates a second voice processing capability interaction message, sends it to the first intelligent terminal, and updates its local second voice processing capability list based on the first voice processing capability interaction message;

wherein the first voice processing capability interaction message comprises the first voice processing capabilities that the first intelligent terminal can execute; the second voice processing capability interaction message comprises the second voice processing capabilities that the second intelligent terminal can execute;

step 104, the first intelligent terminal receives the second voice processing capability interaction message sent by the at least one second intelligent terminal, and updates the first voice processing capability list based on the second voice processing capability interaction message.

As shown in step 101, the method of the present invention is executed only when the local wireless network is successfully accessed for the first time. Therefore, even if the intelligent terminal comes back online after being offline from the local wireless network for a period of time, steps 101 to 104 do not need to be executed again.
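The exchange in steps 101 to 104 can be simulated in memory. The sketch below is illustrative only and makes simplifying assumptions: the classes `Terminal` and `LocalNetwork` are hypothetical, the server and the broadcast are collapsed into direct method calls, and every earlier terminal is assumed to reply.

```python
from typing import List, Set, Tuple

# (engine type, engine name, providing terminal identifier)
Entry = Tuple[str, str, str]


class Terminal:
    def __init__(self, terminal_id: str, capabilities: List[Tuple[str, str]]) -> None:
        self.terminal_id = terminal_id
        self.capabilities = capabilities
        self.capability_list: Set[Entry] = set()

    def local_entries(self) -> Set[Entry]:
        return {(etype, name, self.terminal_id) for etype, name in self.capabilities}

    def on_interaction_message(self, entries: Set[Entry]) -> Set[Entry]:
        """Record a newcomer's capabilities and reply with our own."""
        self.capability_list |= entries
        return self.local_entries()


class LocalNetwork:
    """Stand-in for the local wireless network and its server."""

    def __init__(self) -> None:
        self.members: List[Terminal] = []  # in order of first successful access

    def first_access(self, terminal: Terminal) -> None:
        # Step 101: initialise the list with the terminal's own capabilities.
        terminal.capability_list = terminal.local_entries()
        # Step 102: find the terminals that accessed the network earlier.
        earlier = list(self.members)
        self.members.append(terminal)
        # Steps 103 and 104: broadcast, then merge every reply.
        for peer in earlier:
            reply = peer.on_interaction_message(terminal.local_entries())
            terminal.capability_list |= reply


net = LocalNetwork()
a = Terminal("terminal-A", [("interaction", "weather query")])
b = Terminal("terminal-B", [("interaction", "hotel reservation")])
first = Terminal("terminal-1", [("interaction", "air ticket reservation")])
for t in (a, b, first):
    net.first_access(t)
print(sorted(first.capability_list))
```

Because each newcomer both announces its own capabilities and collects the replies, every terminal ends up with the same list without any periodic resynchronisation.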

Preferably, the first intelligent terminal and the second intelligent terminal can access to the internet by accessing a local wireless network.

Preferably, the process of accessing the local wireless network for the first time includes: the first intelligent terminal sends a first access request to a server in the local wireless network, and the server authenticates the first intelligent terminal; if the authentication is passed, the server returns a first access success response to the first intelligent terminal and records information about the first intelligent terminal, such as its MAC address, that can serve as an identifier uniquely determining the first intelligent terminal. The server also records the networking state of the first intelligent terminal.

Preferably, the voice processing capability includes a voice recognition capability and a voice interaction capability. The voice recognition capability is the recognition capability of a voice recognition engine (called a local voice recognition engine) included in the voice assistant of an intelligent terminal, such as Chinese recognition, American English recognition, English recognition, Mandarin recognition, Sichuan dialect recognition, and the like. The voice interaction capability comprises the tasks that can be executed by a voice interaction engine (called a local voice interaction engine) included in the voice assistant of an intelligent terminal, such as air ticket reservation, weather query and the like.

Preferably, the first voice processing capability list includes a voice processing engine type, a voice processing engine name, and a capability providing terminal identifier. For example, if the voice recognition capability that the first intelligent terminal can execute includes Mandarin Chinese recognition, and the voice interaction capability that it can execute, that is, its tasks, includes air ticket reservation and takeaway ordering, the first voice processing capability list initialized in step 101 is as shown in Table 1. Hereinafter, when the voice processing engine is a voice interaction engine, the capability name is the same as the task that the engine can perform, and the two terms can be used interchangeably.

Table 1. First voice processing capability list

Preferably, in step 102, the first intelligent terminal sends a query request to a server of the local wireless network, and the server returns a query response carrying the identifiers of all second intelligent terminals that successfully accessed the local wireless network for the first time before the first intelligent terminal did. For example, if second intelligent terminal A and second intelligent terminal B each successfully accessed the local wireless network for the first time before the first intelligent terminal's first access, the query response includes the identifiers of second intelligent terminal A and second intelligent terminal B. If no device other than the first intelligent terminal has accessed the local wireless network, the query response is empty.

Preferably, in step 103, the first intelligent terminal sends the first voice processing capability interaction message to the server, where the first voice processing capability interaction message carries a broadcast identifier, and the server broadcasts the first voice processing capability interaction message in the local wireless network.

Preferably, in step 104, the first intelligent terminal may update the first voice processing capability list each time it receives a second voice processing capability interaction message from a second intelligent terminal, or may update the list once, based on all received messages, after it has received the second voice processing capability interaction messages from all second intelligent terminals. For example, if the voice recognition capabilities that second intelligent terminal A can execute include Mandarin Chinese recognition and Sichuan dialect recognition and its voice interaction capabilities include weather query, and the voice recognition capabilities that second intelligent terminal B can execute include Mandarin Chinese recognition and Cantonese recognition and its voice interaction capabilities include weather query and hotel reservation, the updated first voice processing capability list is as shown in Table 2.

Table 2. First voice processing capability list

Preferably, the first intelligent terminal and the second intelligent terminal may also exchange intelligent terminal parameter information with each other. For example, the parameter information is carried in the first and second voice processing capability interaction messages or in a separate intelligent terminal parameter interaction message, and includes at least one of: processor capability, communication capability, and energy acquisition mode. The processor capability can be the CPU type, version, memory size and the like. The communication capability includes whether high-speed cellular communication (such as 4G or 5G) is supported and whether direct communication between terminals (such as Wi-Fi Direct, Bluetooth communication and one-to-many Bluetooth communication) is supported. The energy acquisition mode can be: AC powered; battery powered without wireless charging support; battery powered with wireless charging support; and the like. After receiving the interaction message, the first intelligent terminal and the second intelligent terminal each record the parameter information locally, for example in the first and second voice processing capability lists or in a dedicated intelligent terminal parameter list.
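One possible in-memory shape for the exchanged parameter information is shown below. It is a sketch under assumptions: the field names, the `EnergyMode` values, and the dictionary form of the message are hypothetical and are not specified by the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class EnergyMode(Enum):
    AC_POWERED = "ac"
    BATTERY_NO_WIRELESS_CHARGING = "battery"
    BATTERY_WIRELESS_CHARGING = "battery+wireless"


@dataclass(frozen=True)
class TerminalParameters:
    cpu_type: str
    memory_mb: int
    high_speed_cellular: bool    # e.g. 4G or 5G supported
    direct_communication: bool   # e.g. Wi-Fi Direct or Bluetooth supported
    energy_mode: EnergyMode


def record_parameters(parameter_list: Dict[str, TerminalParameters],
                      terminal_id: str, message: Dict[str, Any]) -> None:
    """Store the parameter information carried in an interaction message."""
    parameter_list[terminal_id] = TerminalParameters(
        cpu_type=message["cpu_type"],
        memory_mb=message["memory_mb"],
        high_speed_cellular=message["high_speed_cellular"],
        direct_communication=message["direct_communication"],
        energy_mode=EnergyMode(message["energy_mode"]),
    )


# Dedicated parameter list kept by the receiving terminal.
parameter_list: Dict[str, TerminalParameters] = {}
record_parameters(parameter_list, "terminal-A", {
    "cpu_type": "example-cpu",
    "memory_mb": 2048,
    "high_speed_cellular": False,
    "direct_communication": True,
    "energy_mode": "ac",
})
print(parameter_list["terminal-A"])
```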

Preferably, if within a preset time the first intelligent terminal receives second voice processing capability interaction messages from only some of the at least one second intelligent terminal and not from the remaining second intelligent terminals, the first intelligent terminal selects one of the second intelligent terminals from which it has received a message as a relay second intelligent terminal and sends it a voice processing capability relay request message carrying the identifiers of the remaining second intelligent terminals. After receiving the relay request message, the relay second intelligent terminal queries its local second voice processing capability list for the voice processing capabilities of the remaining second intelligent terminals, carries them in a voice processing capability relay response message, and sends that message to the first intelligent terminal. The first intelligent terminal then further updates the first voice processing capability list based on the voice processing capability relay response message.

Preferably, when selecting the relay from the second intelligent terminals from which it has received a second voice processing capability interaction message, the first intelligent terminal specifically selects the second intelligent terminal that replied earliest.
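The relay fallback can be sketched as follows, assuming the replies received within the preset time and their arrival times are already known. The function names and the `peer_lists` dictionary, which stands in for the relay request and response messages, are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# (engine type, engine name, providing terminal identifier)
Entry = Tuple[str, str, str]


def choose_relay(reply_times: Dict[str, float]) -> str:
    """Return the terminal whose reply arrived earliest."""
    return min(reply_times, key=reply_times.get)


def collect_with_relay(
    expected_ids: Iterable[str],
    replies: Dict[str, Set[Entry]],
    reply_times: Dict[str, float],
    peer_lists: Dict[str, Set[Entry]],
) -> Set[Entry]:
    """Merge direct replies, then ask a relay about the silent terminals.

    peer_lists maps a terminal to its own capability list and stands in
    for the relay request/response exchange over the network.
    """
    capability_list: Set[Entry] = set()
    for entries in replies.values():
        capability_list |= entries

    missing = set(expected_ids) - set(replies)
    if missing and replies:
        relay = choose_relay(reply_times)
        capability_list |= {e for e in peer_lists[relay] if e[2] in missing}
    return capability_list


replies = {
    "terminal-A": {("interaction", "weather query", "terminal-A")},
    "terminal-B": {("interaction", "hotel reservation", "terminal-B")},
}
reply_times = {"terminal-A": 0.12, "terminal-B": 0.30}  # seconds after broadcast
peer_lists = {
    "terminal-A": {
        ("interaction", "weather query", "terminal-A"),
        ("recognition", "Cantonese", "terminal-C"),
    },
    "terminal-B": {("interaction", "hotel reservation", "terminal-B")},
}
updated = collect_with_relay(
    ["terminal-A", "terminal-B", "terminal-C"], replies, reply_times, peer_lists
)
print(sorted(updated))
```

Here terminal-C never replies, so its capability is learned from terminal-A, the earliest responder.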

Preferably, the query response in step 102 further carries the real-time networking state, in the local wireless network, of each second intelligent terminal that successfully accessed the local wireless network for the first time before the first intelligent terminal did. The real-time networking state is either online or offline.

Preferably, when the networking state of an intelligent terminal in the local wireless network changes, the server synchronizes this change to all online intelligent terminals in the network.
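The server-side bookkeeping described above (recording terminals on first access, answering the step 102 query with networking states, and pushing state changes to online terminals) might look like the sketch below. The class `LocalNetworkServer`, its allow-list authentication, and the notification queues are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


class LocalNetworkServer:
    def __init__(self, allowed_macs: Iterable[str]) -> None:
        self.allowed = set(allowed_macs)         # hypothetical authentication rule
        self.first_access_order: List[str] = []  # MAC addresses, by first access
        self.online: Dict[str, bool] = {}
        # Per-terminal queue of (mac, state) changes pushed by the server.
        self.notifications: Dict[str, List[Tuple[str, str]]] = {}

    def access(self, mac: str) -> bool:
        """Authenticate a terminal and record it on its first access."""
        if mac not in self.allowed:
            return False
        if mac not in self.online:
            self.first_access_order.append(mac)
            self.notifications[mac] = []
        self.set_state(mac, True)
        return True

    def set_state(self, mac: str, is_online: bool) -> None:
        """Record a networking state change and sync it to online terminals."""
        if self.online.get(mac) == is_online:
            return
        self.online[mac] = is_online
        state = "online" if is_online else "offline"
        for other, other_online in self.online.items():
            if other_online and other != mac:
                self.notifications[other].append((mac, state))

    def query_earlier(self, mac: str) -> Dict[str, str]:
        """Terminals that first accessed the network before mac, with state."""
        index = self.first_access_order.index(mac)
        return {
            m: "online" if self.online[m] else "offline"
            for m in self.first_access_order[:index]
        }


server = LocalNetworkServer({"mac-A", "mac-B", "mac-1"})
server.access("mac-A")
server.access("mac-B")
server.set_state("mac-A", False)   # terminal A goes offline
server.access("mac-1")             # the first intelligent terminal joins
print(server.query_earlier("mac-1"))
```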

Preferably, the first voice processing capability list and the second voice processing capability list may further include a real-time networking status of the capability providing terminal.

Preferably, in step 104, the first intelligent terminal receives a second voice processing capability interaction message sent by at least one second intelligent terminal whose real-time networking state is online, and updates the first voice processing capability list based on that message. The first intelligent terminal then selects one of the online second intelligent terminals as a relay second intelligent terminal and sends it a voice processing capability relay request message carrying the identifiers of the second intelligent terminals whose real-time networking state is offline. After receiving the relay request message, the relay second intelligent terminal queries its local second voice processing capability list for the voice processing capabilities of the offline second intelligent terminals, carries them in a voice processing capability relay response message, and sends that message to the first intelligent terminal, so that the first intelligent terminal updates the first voice processing capability list based on the relay response message.

Preferably, an intelligent terminal in the local wireless network may further record the low-battery state of other intelligent terminals, for example by including the low-battery state in the voice processing capability list. When the battery level of the first intelligent terminal is lower than a preset threshold and cannot be replenished through wireless charging (for example, because wireless charging is not supported or the automatic wireless charging system has failed), the first intelligent terminal broadcasts a low-power alarm message in the local wireless network, so that the other intelligent terminals in the local wireless network, after receiving the message, record that the first intelligent terminal is currently in a low-power state. Similarly, the first intelligent terminal can also receive low-power alarm messages sent by other intelligent terminals.
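The alarm condition and the receiving side's bookkeeping can be sketched as two small functions. The function names, the percentage scale, and the use of a set to record low-power terminals are assumptions.

```python
from typing import Set


def needs_low_power_alarm(battery_percent: float, threshold_percent: float,
                          supports_wireless_charging: bool,
                          charging_system_ok: bool) -> bool:
    """True when the battery is below the threshold and cannot be topped up."""
    can_top_up = supports_wireless_charging and charging_system_ok
    return battery_percent < threshold_percent and not can_top_up


def on_low_power_alarm(low_power_terminals: Set[str], sender_id: str) -> None:
    """Record that the sender of an alarm message is in a low-power state."""
    low_power_terminals.add(sender_id)


low_power_terminals: Set[str] = set()
if needs_low_power_alarm(12.0, 20.0, supports_wireless_charging=False,
                         charging_system_ok=False):
    # Stand-in for the broadcast: a peer receives the alarm and records it.
    on_low_power_alarm(low_power_terminals, "terminal-1")
print(low_power_terminals)
```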

Preferably, the method further comprises step 105: the first intelligent terminal receives a user instruction and determines, based on the first voice processing capability list, a first voice processing engine capable of processing the user instruction; it then judges whether the first voice processing engine is a local voice processing engine of the first intelligent terminal. If not, it sends the user instruction to the cooperative intelligent terminal to which the first voice processing engine belongs and receives a voice processing result from the cooperative intelligent terminal; if so, the first intelligent terminal uses the first voice processing engine to obtain the voice processing result. In addition, when the voice processing result is a task execution result, the first intelligent terminal also provides the voice processing result to the user.
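Step 105 amounts to a lookup followed by a local-or-remote dispatch. A minimal sketch follows, assuming the task has already been determined from the user instruction; `process_instruction` and its two callbacks are hypothetical names, and preferring a local engine when several terminals offer the task is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# (engine type, engine name, providing terminal identifier)
Entry = Tuple[str, str, str]


def process_instruction(
    local_id: str,
    capability_list: List[Entry],
    task: str,
    run_locally: Callable[[str], str],
    send_to_terminal: Callable[[str, str], str],
) -> Optional[str]:
    """Route a task to a local engine or to a cooperative terminal."""
    providers = [tid for _etype, name, tid in capability_list if name == task]
    if not providers:
        return None  # no known engine can process the instruction
    if local_id in providers:
        return run_locally(task)
    # The engine is not local: forward to the terminal that provides it.
    return send_to_terminal(providers[0], task)


capability_list = [
    ("interaction", "air ticket reservation", "terminal-1"),
    ("interaction", "weather query", "terminal-A"),
]


def run_locally(task: str) -> str:
    return f"local result for {task}"


def send_to_terminal(terminal_id: str, task: str) -> str:
    # Stand-in for sending the instruction and awaiting the result.
    return f"{terminal_id} result for {task}"


print(process_instruction("terminal-1", capability_list, "weather query",
                          run_locally, send_to_terminal))
```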

Preferably, the user instruction received by the first intelligent terminal is specifically a user voice instruction; after receiving it, the first intelligent terminal further performs the operations related to the definition determination in step 300 described below. It then determines, based on the first voice processing capability list, the first voice processing engine capable of processing the user instruction.

Preferably, step 105 may further include: the first intelligent terminal receives a user instruction; when it determines that the user instruction cannot be processed locally, it selects, based on the first voice processing capability list, a second intelligent terminal capable of processing the user instruction as the cooperative intelligent terminal, sends the user instruction to the cooperative intelligent terminal, and receives a voice processing result from the cooperative intelligent terminal.

Preferably, selecting, based on the first voice processing capability list, the second intelligent terminal capable of processing the user instruction as the cooperative intelligent terminal specifically includes: if two or more second intelligent terminals can process the user instruction, selecting one of them as the cooperative intelligent terminal according to the locally recorded intelligent terminal parameter information.

Preferably, the selection is made according to at least one of:

selecting one of the two or more second intelligent terminals as a cooperative intelligent terminal according to the recorded processor capacity of the two or more second intelligent terminals;

judging the networking state and the communication capability of the two or more second intelligent terminals in the local wireless network, excluding every second intelligent terminal that meets one of the following conditions, and selecting one of the remaining second intelligent terminals (for example, according to processor capability and energy acquisition mode): the networking state is offline and high-speed cellular communication is not supported; or the networking state is offline and high-speed cellular communication is supported, but information cannot be exchanged with the first intelligent terminal (for example, through high-speed cellular communication, Wi-Fi Direct, Bluetooth communication, one-to-many Bluetooth communication, and the like);

selecting one of the second intelligent terminals as the cooperative intelligent terminal according to the recorded energy acquisition mode and/or low-battery state, for example, preferring, in order, a second intelligent terminal that is powered by alternating current, one that is battery-powered and supports wireless charging, and one that is battery-powered and does not support wireless charging; and/or excluding the second intelligent terminals in the low-battery state and selecting one of the remaining second intelligent terminals.

When several of the above factors are considered together, the networking state and communication capability may be considered first, then the processor capability, and finally the energy acquisition mode.
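The selection rule above can be sketched as a filter followed by a ranked choice. The record layout (`TerminalInfo`), the numeric `cpu_score`, and the labels in `ENERGY_RANK` are hypothetical; the text only states which factors are considered and in which order.

```python
from dataclasses import dataclass

# Assumed preference order for energy acquisition modes (lower is better):
# AC power, battery with wireless charging, battery without wireless charging.
ENERGY_RANK = {"ac": 0, "battery_wireless": 1, "battery": 2}


@dataclass
class TerminalInfo:
    terminal_id: str
    online: bool           # networking state in the local wireless network
    cellular: bool         # supports high-speed cellular communication
    reachable: bool        # can exchange information with the first terminal
    cpu_score: int         # hypothetical measure of processor capability
    energy_mode: str       # key of ENERGY_RANK
    low_battery: bool = False


def is_excluded(t):
    """Exclusion conditions based on networking state and communication capability."""
    if t.online:
        return False
    # Offline without cellular support, or offline with cellular support
    # but still unable to reach the first terminal over any link.
    return (not t.cellular) or (not t.reachable)


def select_cooperative(candidates):
    """Filter by networking state first, then rank by processor and energy mode."""
    usable = [t for t in candidates if not is_excluded(t) and not t.low_battery]
    if not usable:
        return None
    return min(usable, key=lambda t: (-t.cpu_score, ENERGY_RANK[t.energy_mode]))
```

Ranking with a tuple key keeps the order of consideration explicit: processor capability decides first, and the energy acquisition mode only breaks ties.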

Preferably, when the user instruction is a voice instruction, judging that the user instruction cannot be processed locally may mean that voice recognition cannot be performed on the voice instruction locally; it may also mean that a task determined based on the voice instruction cannot be processed locally, that is, no voice interaction engine associated with the task exists locally, or the voice interaction engine associated with the task belongs to the voice assistant of another intelligent terminal.

Preferably, if the user instruction is a voice instruction and it cannot be processed locally because voice recognition cannot be performed on it locally, step 105 further includes: extracting the voiceprint features of the user based on the user instruction, and storing the voiceprint features in association with the cooperative intelligent terminal.

Preferably, the method further comprises: when the first intelligent terminal again receives a voice instruction with the same voiceprint features, it sends the voice instruction directly to the cooperative intelligent terminal without attempting local voice recognition, and obtains a voice recognition result from the cooperative intelligent terminal.
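The voiceprint association can be sketched as a small lookup table that lets later instructions from the same speaker skip local recognition. Voiceprint extraction itself is outside the scope of this sketch and is represented by an opaque identifier; the class `VoiceprintRouter` and its callback parameters are hypothetical.

```python
class VoiceprintRouter:
    """Remembers which cooperative terminal recognizes a given speaker."""

    def __init__(self, local_recognize):
        # local_recognize(audio) returns recognized text, or None on failure.
        self._local_recognize = local_recognize
        self._routes = {}  # voiceprint id -> cooperative terminal id

    def handle(self, voiceprint_id, audio, pick_cooperative, remote_recognize):
        target = self._routes.get(voiceprint_id)
        if target is None:
            # First contact with this speaker: try local recognition once.
            text = self._local_recognize(audio)
            if text is not None:
                return text
            # Local recognition failed: choose a cooperative terminal and
            # store the association for later instructions.
            target = pick_cooperative()
            self._routes[voiceprint_id] = target
        # Known speaker, or a fresh failure: recognize on the cooperative terminal.
        return remote_recognize(target, audio)
```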

Preferably, after the first intelligent terminal and/or the second intelligent terminal is offline for a period of time and online again, the first intelligent terminal and/or the second intelligent terminal sends a voice processing capability list synchronization request to at least one other intelligent terminal online in the local wireless network, receives a voice processing capability list synchronization response from the at least one other intelligent terminal, and updates the local voice processing capability list based on the voice processing capability list synchronization response.

Preferably, after the first intelligent terminal and/or the second intelligent terminal has been offline for a period of time and comes online again, it acquires the networking states of all the intelligent terminals in the local wireless network from the server.

Preferably, messages sent between the intelligent terminals, such as the second voice processing capability interaction message, may be forwarded through the server.

Preferably, the method further includes step 106, when the voice processing capability changes, the first intelligent terminal updates a local first voice processing capability list, generates a first voice processing capability update message, and broadcasts the first voice processing capability update message in the local wireless network, so that the at least one second intelligent terminal updates its local second voice processing capability list after receiving the first voice processing capability update message.

Preferably, the method further includes step 107, the first intelligent terminal receives a second voice processing capability update message sent by the at least one second intelligent terminal, and updates the first voice processing capability list.
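Steps 106 and 107 can be sketched as a pair of helpers: one builds the update message that is broadcast, the other merges a received update into the local list. The message fields and the representation of the capability list as a mapping from terminal identifier to a set of engine names are assumptions; the text does not specify a message format.

```python
def make_update_message(terminal_id, engines):
    """Build the update message a terminal broadcasts when its capabilities change."""
    return {
        "type": "capability_update",   # hypothetical message type label
        "terminal": terminal_id,
        "engines": sorted(engines),
    }


def apply_update_message(capability_list, message):
    """Return a new capability list with the sender's entry replaced."""
    if message.get("type") != "capability_update":
        raise ValueError("unexpected message type")
    updated = {t: set(e) for t, e in capability_list.items()}
    updated[message["terminal"]] = set(message["engines"])
    return updated
```

Replacing the sender's whole entry, rather than applying a delta, keeps the receiver consistent even if an earlier update message was missed.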

It should be noted that steps 105 to 107 may be executed in any order; the present invention does not limit the order in which these three steps are executed.

Based on the method, when an intelligent terminal first accesses the local wireless network, it can exchange voice processing capabilities with the other intelligent terminals in the network. As a result, after receiving a user instruction that it cannot process itself, it can promptly select a suitable intelligent terminal for cooperative processing, which speeds up the response to the user and improves the user experience.

3. Building a collaborative voice interaction engine cluster

A brief introduction to the interaction engine cluster (also referred to as a voice interaction engine cluster) in the prior art is given below (for a detailed description, reference may be made to Chinese patent application CN201911220477.X, the entire content of which is incorporated by reference into the present application).

At least two voice interaction engines are included in one voice interaction engine cluster. The task associated with any one of the voice interaction engines in the voice interaction engine cluster is different from the tasks associated with other voice interaction engines in the voice interaction engine cluster, and the task associated with any one of the voice interaction engines in the voice interaction engine cluster and the task associated with at least one other voice interaction engine in the voice interaction engine cluster have at least one same or corresponding slot.
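The two membership conditions above can be expressed as a small validity check. This is a sketch under assumed data shapes: `engine_tasks` maps each engine to its associated task, `task_slots` maps each task to its slot names, and `corresponding` lists pairs of slots that are treated as corresponding; none of these names come from the original text.

```python
def is_valid_cluster(engine_tasks, task_slots, corresponding=frozenset()):
    """Check the cluster conditions: at least two engines, pairwise different
    tasks, and every task sharing a same or corresponding slot with another."""
    tasks = list(engine_tasks.values())
    if len(tasks) < 2:
        return False
    if len(set(tasks)) != len(tasks):
        return False  # two engines are associated with the same task

    def share_slot(a, b):
        for sa in task_slots[a]:
            for sb in task_slots[b]:
                if sa == sb or frozenset((sa, sb)) in corresponding:
                    return True
        return False

    return all(any(share_slot(t, u) for u in tasks if u != t) for t in tasks)
```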

The intelligent terminal can download the attribute information of the voice interaction engine cluster from the server, and construct the voice interaction engine cluster locally according to the downloaded attribute information of the voice interaction engine cluster; the intelligent terminal can also locally aggregate and generate the voice interaction engine cluster according to the historical conversation records of the user and the voice assistant system.

When the voice interaction engine cluster is used, if any one of the voice interaction engines in the cluster (hereinafter referred to as the first voice interaction engine) is activated, the voice assistant of the intelligent terminal activates the other voice interaction engines in the cluster (hereinafter referred to as the at least one second voice interaction engine). The at least one second task associated with the at least one second voice interaction engine has the first slot position and/or a second slot position corresponding to the first slot position; according to the key knowledge data filled in the first slot position of the first task associated with the first voice interaction engine, the voice assistant instantly synchronizes the first slot position and/or the second slot position of the at least one second task. Task execution results are then obtained from all or some of the voice interaction engines in the cluster and provided to the user simultaneously.

The instant synchronization includes instant synchronization when the first slot position changes from unfilled to filled with first key knowledge data, and also includes instant synchronization when the key knowledge data filled by the first slot position changes from the first key knowledge data to second key knowledge data.
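Instant slot synchronization can be sketched as follows. The `Engine` class, the `corresponding` mapping from a first slot position to its corresponding second slot position, and the example slot names are illustrative assumptions. The sketch covers both triggers described above: an unfilled slot position becoming filled, and a filled slot position changing value.

```python
class Engine:
    """A voice interaction engine with the slot positions of its associated task."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = {s: None for s in slots}  # None means unfilled


def sync_slot(source, slot, value, others, corresponding=None):
    """Fill `slot` on `source` and propagate the value to the other engines.

    Returns the list of (engine name, slot) pairs that were updated.
    """
    corresponding = corresponding or {}
    if source.slots.get(slot) == value:
        return []  # no change in key knowledge data, nothing to synchronize
    source.slots[slot] = value
    updated = []
    for eng in others:
        # The same slot position, or the one declared as corresponding to it.
        for target in (slot, corresponding.get(slot)):
            if target is not None and target in eng.slots:
                eng.slots[target] = value
                updated.append((eng.name, target))
    return updated
```

For example, filling a hypothetical `destination` slot of a flight booking engine would fill the corresponding `city` slot of a weather query engine, and changing the destination later would change the city as well.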

When the voice assistant system is in a cluster task working mode, if the first voice interaction engine is the leading voice interaction engine of the voice interaction engine cluster, when the first voice interaction engine is activated, other voice interaction engines in the voice interaction engine cluster are also activated, and instant synchronization of slot positions among the voice interaction engines is carried out, and if the first voice interaction engine is not the leading voice interaction engine of the voice interaction engine cluster, only the first voice interaction engine is activated.
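The activation rule in the cluster task working mode reduces to a short conditional, sketched below. The function name and the list representation of the cluster are hypothetical.

```python
def engines_to_activate(cluster_engines, leader, triggered):
    """Return the engines to activate when `triggered` is activated."""
    if triggered not in cluster_engines:
        raise ValueError("engine is not a member of the cluster")
    if triggered == leader:
        # The leading engine brings the whole cluster up with it.
        return list(cluster_engines)
    # A non-leading engine is activated alone.
    return [triggered]
```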

However, in the prior art, the voice interaction engines forming a voice interaction engine cluster all belong to the voice assistant of the same intelligent terminal; a voice interaction engine cluster cannot be formed from voice interaction engines that belong to the voice assistants of different intelligent terminals. Consequently, when the voice assistants of a user's different intelligent terminals each have different voice interaction engines installed (for example, the voice assistant of a first intelligent terminal includes an air ticket reservation voice interaction engine, and the voice assistant of a second intelligent terminal includes a weather query voice interaction engine), the user cannot enjoy the good experience brought by a voice interaction engine cluster formed from these different voice interaction engines.

Based on the above, the invention provides a special voice interaction engine cluster, which is called a collaborative voice interaction engine cluster, and the collaborative voice interaction engine cluster has all the other characteristics of the voice interaction engine cluster except the characteristic that the voice interaction engines in the voice interaction engine cluster are located in the same intelligent terminal.

As described above in the present invention, when the first intelligent terminal and the at least one second intelligent terminal in the local wireless network successfully access the local wireless network for the first time, the first intelligent terminal and the at least one second intelligent terminal exchange voice processing capabilities (voice recognition capability and voice interaction capability) with each other and can invoke each other or cooperate with each other, which is equivalent to the effect that any one intelligent terminal in the wireless network has all the voice processing capabilities of itself and the other intelligent terminals in the local wireless network, which makes it possible to construct a collaborative voice interaction engine cluster.

The invention provides a method for constructing a collaborative voice interaction engine cluster in a local wireless network. In the method, a collaborative voice interaction engine cluster is constructed from voice interaction engines belonging to different intelligent terminals, according to the first voice processing capability list generated after the first intelligent terminal successfully accesses the local wireless network for the first time; the first intelligent terminal that receives a user instruction then controls the different intelligent terminals to perform voice interaction with the user based on the collaborative voice interaction engine cluster.

Referring to fig. 3, the method comprises the steps of:

Preferably, the method further includes step 205, the first intelligent terminal sends the attribute information of the first collaborative voice interaction engine cluster to at least one second intelligent terminal in the local wireless network. Therefore, the second intelligent terminal including at least one voice interaction engine in the collaborative voice interaction engine cluster or the second intelligent terminal not including any voice interaction engine in the collaborative voice interaction engine cluster can receive the attribute information of the collaborative voice interaction engine cluster, and then the collaborative voice interaction engine cluster is used, so that a user can enjoy good experience brought by the collaborative voice interaction engine cluster on any terminal of the local wireless network.

Preferably, the attribute information of the first speech interaction engine cluster in step 202 includes a cluster name of the first speech interaction engine cluster, names of at least two speech interaction engines included in the first speech interaction engine cluster, and the same or corresponding slots of the tasks that can be executed (or associated) by the at least two speech interaction engines.

Alternatively, in the above method, the voice interaction engine cluster search response carries configuration information of at least one first voice interaction engine cluster, where the configuration information includes a task associated with the first voice interaction engine and a task associated with at least one second voice interaction engine. In step 203, the first intelligent terminal queries the first voice processing capability list and determines at least one second intelligent terminal, and at least one corresponding third voice interaction engine, capable of completing the task associated with the at least one second voice interaction engine. The attribute information of the first collaborative voice interaction engine cluster comprises a correspondence between the first voice interaction engine and the first intelligent terminal and a correspondence between the at least one third voice interaction engine and the at least one second intelligent terminal.
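Constructing the attribute information from a search response and the capability list can be sketched as below. The dictionary keys (`name`, `engines`, `shared_slots`, `engine_terminal`, `leader`) and the shape of the capability list (engine name mapped to the terminals that host it) are assumptions for illustration; picking the first listed host is a placeholder for the selection rules described elsewhere in the text.

```python
def build_collaborative_cluster(cluster_attrs, capability_list, local_id, first_engine):
    """Attach a hosting terminal to every engine named in a search response.

    Returns the collaborative cluster attribute information, or None when
    some engine in the cluster has no known hosting terminal.
    """
    engine_terminal = {}
    for engine in cluster_attrs["engines"]:
        if engine == first_engine:
            engine_terminal[engine] = local_id
            continue
        hosts = capability_list.get(engine, [])
        if not hosts:
            return None  # the cluster cannot be completed with known terminals
        engine_terminal[engine] = hosts[0]  # placeholder selection rule
    return {
        "name": cluster_attrs["name"],
        "engines": list(cluster_attrs["engines"]),
        "engine_terminal": engine_terminal,
        "shared_slots": list(cluster_attrs["shared_slots"]),
        "leader": first_engine,
    }
```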

In the method, the voice interaction engine cluster information already stored on the voice assistant cloud server is reused when the collaborative voice interaction engine cluster is constructed, which simplifies and accelerates the construction, so that the user can enjoy the good experience brought by the interaction engine cluster as soon as possible.

Preferably, step 204 further comprises: before constructing the attribute information of the collaborative voice interaction engine cluster, the first intelligent terminal sends a collaborative voice interaction engine cluster construction inquiry request to the at least one second intelligent terminal that corresponds to the at least one second voice interaction engine or is capable of completing the task associated with the at least one second voice interaction engine, and, after receiving a collaborative voice interaction engine cluster construction approval response, constructs the attribute information of the collaborative voice interaction engine cluster according to the voice interaction engine cluster search response. After receiving the construction inquiry request, the second intelligent terminal judges whether it supports the collaborative voice interaction engine cluster function and, if so, generates a collaborative voice interaction engine cluster construction approval response and sends it to the first intelligent terminal. Through this exchange of the construction inquiry request and the construction approval response, the first intelligent terminal confirms that the intelligent terminal to which each voice interaction engine in the cluster to be constructed belongs supports collaborative voice interaction engine clusters, which avoids the situation in which the cluster cannot be used normally after construction because some of the intelligent terminals do not support it.

Preferably, if the first intelligent terminal does not receive a collaborative voice interaction engine cluster construction approval response from one or more second intelligent terminals after sending the construction inquiry request to the at least one second intelligent terminal, the first intelligent terminal sends a voice assistant upgrade request message to the one or more second intelligent terminals and, after receiving a voice assistant upgrade completion message sent by the one or more second intelligent terminals, constructs the attribute information of the collaborative voice interaction engine cluster according to the voice interaction engine cluster search response. After receiving the voice assistant upgrade request message, the one or more second intelligent terminals start the voice assistant upgrade and, after the upgrade is finished, generate a voice assistant upgrade completion message and send it to the first intelligent terminal.
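The inquiry and approval exchange, together with the upgrade fallback, can be sketched with two callbacks standing in for the network messages. Both callbacks and the function name are hypothetical; treating a failed upgrade as a reason to abandon construction is also an assumption, since the text does not describe that case.

```python
def confirm_cluster_support(terminals, send_query, request_upgrade):
    """Return True when every terminal supports collaborative clusters.

    send_query(t) returns True if a construction approval response arrives.
    request_upgrade(t) returns True if an upgrade completion message arrives.
    """
    for terminal in terminals:
        if send_query(terminal):
            continue  # approval response received
        # No approval: ask the terminal to upgrade its voice assistant.
        if not request_upgrade(terminal):
            return False  # assumption: construction is abandoned in this case
    return True
```

Only when this check succeeds would the first intelligent terminal go on to construct the attribute information from the search response.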

Preferably, the attribute information of the collaborative voice interaction engine cluster includes the cluster name of the collaborative voice interaction engine cluster, the names of the at least two voice interaction engines included in the cluster, the identifiers of the intelligent terminals to which the at least two voice interaction engines belong (i.e., the correspondence described above), and the same or corresponding slots of the tasks that the at least two voice interaction engines can execute (or are associated with). In a specific embodiment, the attribute information of the constructed collaborative voice interaction engine cluster is shown in Table 3 below.

Table 3. Attribute information of the collaborative voice interaction engine cluster

Preferably, the first voice interaction engine is the leading interaction engine of the first voice interaction engine cluster; that is, the search response includes only voice interaction engine clusters that take the first voice interaction engine as their leading interaction engine. At the same time, the first voice interaction engine is set as the leading interaction engine of the first collaborative voice interaction engine cluster, and this is recorded in the attribute information of the collaborative voice interaction engine cluster.

Preferably, in step 205, before sending the attribute information of the collaborative voice interaction engine cluster to the at least one second intelligent terminal in the local wireless network, the first intelligent terminal sends a first instruction to the user, instructing the user to name (when the voice interaction engine cluster search response carries the configuration information) or rename (when the voice interaction engine cluster search response carries the attribute information) the collaborative voice interaction engine cluster; receives the user's naming or renaming instruction; records the cluster name carried in the naming instruction in the attribute information of the collaborative voice interaction engine cluster, or updates the cluster name in the attribute information with the cluster name carried in the renaming instruction; and then sends the attribute information of the collaborative voice interaction engine cluster to the at least one second intelligent terminal in the local wireless network.

Preferably, in step 203, if the first intelligent terminal determines that there are a plurality of second intelligent terminals corresponding to a certain second speech interaction engine or second intelligent terminals capable of completing tasks associated with a certain second speech interaction engine, one second intelligent terminal is selected from the plurality of second intelligent terminals to serve as the second intelligent terminal corresponding to the second speech interaction engine or the second intelligent terminal capable of completing tasks associated with the second speech interaction engine.

Preferably, the selection is made according to at least one of:

the first intelligent terminal acquires the off-line time lengths of the plurality of second intelligent terminals from a server in the local wireless network, and selects the second intelligent terminal with the shortest off-line time length;

selecting one of the second intelligent terminals according to the recorded processor capabilities of the plurality of second intelligent terminals;

selecting, according to the recorded communication capability, a second intelligent terminal that supports high-speed cellular communication and at least one form of direct communication between terminals;

selecting a second intelligent terminal according to the recorded energy acquisition mode, preferring, in order, alternating-current power supply, battery power supply, and wireless charging support.

If at least two of the above selection factors are considered together, they are preferably considered in the following order: communication capability, processor capability, offline duration, and energy acquisition mode.

Preferably, in step 203, if the first intelligent terminal determines that there are a plurality of second intelligent terminals corresponding to a certain second speech interaction engine or second intelligent terminals capable of completing tasks associated with a certain second speech interaction engine, one second intelligent terminal is selected from the plurality of second intelligent terminals as the second intelligent terminal corresponding to the second speech interaction engine or the second intelligent terminal capable of completing tasks associated with the second speech interaction engine (for example, one second intelligent terminal may be selected randomly as the second intelligent terminal corresponding to the second speech interaction engine), and the remaining second intelligent terminals are used as concurrent second intelligent terminals corresponding to the second speech interaction engine or concurrent second intelligent terminals capable of completing tasks associated with the second speech interaction engine; correspondingly, step 204 further includes: and the first intelligent terminal constructs and generates first cooperative voice interaction concurrent engine cluster attribute information according to the voice interaction engine cluster search response, wherein the first cooperative voice interaction concurrent engine cluster attribute information comprises the corresponding relation between the first voice interaction engine and the first intelligent terminal and the corresponding relation between the at least one second voice interaction engine and the at least one concurrent second intelligent terminal.

Preferably, the method further includes step 205, the first intelligent terminal sends the attribute information of the first collaborative voice interaction concurrent engine cluster to at least one second intelligent terminal in the local wireless network.

Preferably, if the second intelligent terminal is selected according to communication capability, processor capability, offline duration, and energy acquisition mode, the first collaborative voice interaction engine cluster constructed based on that second intelligent terminal has the highest priority, and the first collaborative voice interaction concurrent engine clusters are ranked in priority according to the same selection factors.
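The ranking of candidate second intelligent terminals, and the split into one primary terminal and the remaining concurrent terminals, can be sketched as a sort on a tuple key. The `Candidate` fields, the numeric `cpu_score`, and the numeric `energy_rank` are hypothetical stand-ins for the recorded parameter information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    terminal_id: str
    cellular: bool         # supports high-speed cellular communication
    direct_link: bool      # supports at least one direct inter-terminal link
    cpu_score: int         # hypothetical measure of processor capability
    offline_seconds: int   # offline duration reported by the server
    energy_rank: int       # 0 is the most preferred energy acquisition mode


def rank_candidates(candidates):
    """Order candidates by communication capability, processor capability,
    offline duration, and energy acquisition mode, in that order."""
    return sorted(
        candidates,
        key=lambda c: (
            not (c.cellular and c.direct_link),  # False sorts before True
            -c.cpu_score,
            c.offline_seconds,
            c.energy_rank,
        ),
    )


def split_primary_and_concurrent(candidates):
    """The best candidate hosts the engine in the primary cluster; the rest,
    in priority order, host it in the concurrent clusters."""
    ranked = rank_candidates(candidates)
    if not ranked:
        return None, []
    return ranked[0], ranked[1:]
```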

The first collaborative voice interaction engine cluster and the first collaborative voice interaction concurrent engine cluster are distinguished only for convenience of description; they have the same characteristics, and the two terms are used interchangeably hereinafter.

Preferably, the method further comprises the steps of:

step 206, receiving a proxy construction request for a collaborative voice interaction engine cluster sent by the at least one second intelligent terminal in the local wireless network, that is, a request for the first intelligent terminal to construct the cluster on behalf of the sender, wherein the request carries the at least one second voice interaction engine included in the at least one second intelligent terminal;

the first intelligent terminal constructs attribute information of a second collaborative voice interaction engine cluster in a manner similar to that used for the first collaborative voice interaction engine cluster, wherein the second collaborative voice interaction engine cluster comprises the at least one second voice interaction engine; specifically, steps 207 to 211 are executed. In this way, the first intelligent terminal can construct the collaborative voice interaction engine cluster in place of second intelligent terminals with lower processing capability.

step 207, the first intelligent terminal sends a voice interaction engine cluster search request to the voice assistant cloud server, wherein the voice interaction engine cluster search request carries the at least one second voice interaction engine that is supported;

step 208, the first intelligent terminal receives a voice interaction engine cluster search response from the voice assistant cloud server, wherein the voice interaction engine cluster search response carries attribute information of at least one second voice interaction engine cluster, and voice interaction engines included in the second voice interaction engine cluster include the at least one second voice interaction engine and at least one fourth voice interaction engine;

step 209, the first intelligent terminal queries the first voice processing capability list and determines at least one fourth intelligent terminal corresponding to the at least one fourth voice interaction engine; the at least one fourth intelligent terminal is at least one of at least one second intelligent terminal which has successfully accessed the local wireless network for the first time before the first intelligent terminal accesses the local wireless network;

step 210, the first intelligent terminal constructs and generates the second collaborative voice interaction engine cluster attribute information according to the voice interaction engine cluster search response, wherein the second collaborative voice interaction engine cluster attribute information includes a corresponding relationship between the at least one second voice interaction engine and the at least one second intelligent terminal and a corresponding relationship between the at least one fourth voice interaction engine and the at least one fourth intelligent terminal.

Preferably, the method further includes step 211, the first intelligent terminal sends the attribute information of the collaborative voice interaction engine cluster to at least one second intelligent terminal in the local wireless network.

All the further limitations of steps 201-205 described above apply to the corresponding steps in steps 207-210 and are not repeated here.

4. Use of a collaborative speech interaction engine cluster

The invention also provides a method for interacting with a voice assistant based on a collaborative voice interaction engine cluster, which is used for a first intelligent terminal and is shown in figure 4, and the method comprises the following steps:

step 300, acquiring a user instruction;

step 301, determining a first task based on the user instruction, and determining a first voice interaction engine capable of executing the first task based on a first voice processing capability list; the first voice processing capability list comprises corresponding relations between different tasks and a voice interaction engine;

step 302, determining a first cooperative voice interaction engine cluster to which the first voice interaction engine belongs based on a locally stored cooperative voice interaction engine cluster; wherein the first collaborative speech interaction engine cluster comprises the first speech interaction engine and at least one second speech interaction engine;

step 303, determining a set of intelligent terminals corresponding to the voice interaction engines included in the first collaborative voice interaction engine cluster based on the first voice processing capability list, where the set of intelligent terminals includes a second intelligent terminal corresponding to the first voice interaction engine and at least one third intelligent terminal corresponding to the at least one second voice interaction engine;

step 304, sending the user instruction to a first voice interaction engine of the second intelligent terminal, and receiving inter-terminal slot position synchronization information from the first voice interaction engine of the second intelligent terminal, wherein key knowledge data which is required to be synchronized to at least one second voice interaction engine of the at least one third intelligent terminal by the first voice interaction engine of the second intelligent terminal is carried; the inter-terminal slot position synchronization information is forwarded to at least one second voice interaction engine of at least one third intelligent terminal; and respectively receiving a first task execution result and at least one second task execution result from a first voice interaction engine of a second intelligent terminal and at least one second voice interaction engine of at least one third intelligent terminal, and simultaneously providing the first task execution result and the at least one second task execution result to a user.
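The lookups in steps 301 to 303 form a chain from task to engine, from engine to cluster, and from cluster to hosting terminals. A minimal sketch follows; the three lookup tables and the result keys are hypothetical representations of the first voice processing capability list and the locally stored collaborative cluster information.

```python
def resolve_cluster_terminals(task, task_engine, clusters, engine_terminal):
    """Resolve the engines and terminals involved in handling `task`.

    task_engine maps a task to the engine that can execute it (step 301),
    clusters is the locally stored list of collaborative clusters (step 302),
    engine_terminal maps each engine to its hosting terminal (step 303).
    """
    first_engine = task_engine.get(task)
    if first_engine is None:
        return None
    cluster = next((c for c in clusters if first_engine in c["engines"]), None)
    if cluster is None:
        return None
    second_engines = [e for e in cluster["engines"] if e != first_engine]
    return {
        "first_engine": first_engine,
        "second_terminal": engine_terminal[first_engine],
        "third_terminals": {e: engine_terminal[e] for e in second_engines},
    }
```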

Preferably, step 304 specifically includes:

executing a first judgment: judging whether the first intelligent terminal and the second intelligent terminal are the same terminal, namely judging whether the first voice interaction engine is a local voice interaction engine of the first intelligent terminal;

case 1: if the result of the first judgment is yes, sending the voice instruction to the first voice interaction engine, receiving inter-terminal slot position synchronization information from the first voice interaction engine, and sending the inter-terminal slot position synchronization information to the at least one third intelligent terminal, so that the at least one second voice interaction engine of the at least one third intelligent terminal can instantly synchronize the slot position of the at least one second task associated with the at least one second voice interaction engine based on the inter-terminal slot position synchronization information; receiving a first task execution result and at least one second task execution result from the first voice interaction engine and the at least one third intelligent terminal respectively, and providing the first task execution result and the at least one second task execution result to the user simultaneously;

case 2: if the first judgment result is negative, when the first intelligent terminal and one of the at least one third intelligent terminal are the same terminal, sending the voice instruction to a second intelligent terminal corresponding to the first voice interaction engine, activating a second voice interaction engine, namely a local second voice interaction engine, belonging to the first intelligent terminal in the cooperative voice interaction engine cluster, receiving inter-terminal slot position synchronization information from the second intelligent terminal, and carrying out instant synchronization on a slot position of a second task associated with the local second voice interaction engine based on the inter-terminal slot position synchronization information; receiving a second task execution result and a first task execution result from the local second voice interaction engine and the second intelligent terminal respectively, and providing the first task execution result and the second task execution result to a user at the same time;

case 3: if the first judgment result is negative, when the first intelligent terminal and any one of the at least one third intelligent terminal are not the same terminal, sending the voice instruction to a second intelligent terminal corresponding to the first voice interaction engine, informing the at least one third intelligent terminal to activate the at least one second voice interaction engine, receiving inter-terminal slot position synchronization information from the second intelligent terminal, and forwarding the inter-terminal slot position synchronization information to the at least one third intelligent terminal, so that the at least one third intelligent terminal carries out instant synchronization on the slot position of the at least one second task associated with the at least one second voice interaction engine based on the inter-terminal slot position synchronization information; and respectively receiving a first task execution result and at least one second task execution result from the second intelligent terminal and at least one third terminal, and simultaneously providing the first task execution result and the second task execution result to a user.
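The three cases of step 304 can be sketched as a small classification routine. This is only an illustrative sketch: the names `Case`, `classify_step_304`, and `forwarding_targets`, and the use of plain strings as terminal identifiers, are assumptions and do not appear in the description.

```python
from enum import Enum
from typing import List


class Case(Enum):
    LOCAL_FIRST_ENGINE = 1   # case 1: the first engine runs on the first terminal
    LOCAL_SECOND_ENGINE = 2  # case 2: the first terminal hosts a second engine
    NO_LOCAL_ENGINE = 3      # case 3: the first terminal hosts neither engine


def classify_step_304(first_terminal: str, second_terminal: str,
                      third_terminals: List[str]) -> Case:
    """Apply the first judgment, then distinguish case 2 from case 3."""
    if first_terminal == second_terminal:
        return Case.LOCAL_FIRST_ENGINE
    if first_terminal in third_terminals:
        return Case.LOCAL_SECOND_ENGINE
    return Case.NO_LOCAL_ENGINE


def forwarding_targets(first_terminal: str,
                       third_terminals: List[str]) -> List[str]:
    """Third terminals to which the first terminal must send or forward the
    inter-terminal slot synchronization information (all except itself)."""
    return [t for t in third_terminals if t != first_terminal]
```

For example, a first terminal "tv" that is also one of the third terminals falls under case 2 and forwards the synchronization information only to the remaining third terminals.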

Preferably, in case 2, that is, if the result of the first judgment is negative and the first intelligent terminal is the same terminal as one of the at least one third intelligent terminal, the first intelligent terminal, after sending the voice instruction to the second intelligent terminal corresponding to the first voice interaction engine, notifies the at least one third intelligent terminal to activate the at least one second voice interaction engine, and forwards the received inter-terminal slot synchronization information to the other third intelligent terminals, so that the other third intelligent terminals synchronize the slots of the second tasks associated with their corresponding second voice interaction engines based on the inter-terminal slot synchronization information; the first intelligent terminal then receives second task execution results from the other third intelligent terminals, and simultaneously provides to the user the second task execution results received from the other third intelligent terminals, the first task execution result, and the second task execution result received from the local second voice interaction engine.

Preferably, the inter-terminal slot synchronization information includes key knowledge data filled in a first slot of a first task by a first voice interaction engine; the slot position of the at least one second task associated with the at least one second voice interaction engine is synchronized instantly, specifically, a first slot position of the at least one second task associated with the at least one second voice interaction engine and/or a second slot position corresponding to the first slot position are synchronized instantly.
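A minimal sketch of this slot synchronization follows, assuming slots are held in a dictionary keyed by slot name. The function name `sync_slots` and the `slot_mapping` table, which relates a first slot to its corresponding second slot, are hypothetical and only serve the illustration.

```python
from typing import Dict, Optional


def sync_slots(sync_info: Dict[str, str],
               second_task_slots: Dict[str, Optional[str]],
               slot_mapping: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return a copy of the second task's slots after synchronization.

    sync_info:    first-task slot name -> key knowledge data filled by the
                  first voice interaction engine
    slot_mapping: first-task slot name -> corresponding second-task slot name
                  (assumed lookup table, not specified in the description)
    """
    updated = dict(second_task_slots)
    for first_slot, value in sync_info.items():
        # The second task may contain the same first slot.
        if first_slot in updated:
            updated[first_slot] = value
        # It may also contain a second slot corresponding to the first slot.
        mapped = slot_mapping.get(first_slot)
        if mapped is not None and mapped in updated:
            updated[mapped] = value
    return updated
```

In an assumed scenario where a navigation task fills a "destination" slot, a weather task on another terminal could have its "city" slot filled from the same key knowledge data without asking the user again.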

Preferably, in step 302, a second judgment is further performed, that is, it is judged whether the first speech interaction engine is the dominant interaction engine of the first collaborative speech interaction engine cluster. If the result of the second judgment is yes, processing continues in the collaborative speech interaction engine cluster manner, that is, step 303 and the subsequent steps are executed. If the result of the second judgment is negative, step 305 is executed. Step 305 is similar to step 105 (in the case where the speech processing engine is a speech interaction engine): it is judged whether the first speech interaction engine is a local speech interaction engine of the first intelligent terminal. If so, the speech instruction is sent to the first speech interaction engine, so that the first speech interaction engine fills the slot of the first task based on the speech instruction, and the first task execution result is received from the first speech interaction engine and provided to the user. If not, the speech instruction is sent to the cooperative intelligent terminal to which the first speech interaction engine belongs, so that the cooperative intelligent terminal fills the slot of the first task based on the speech instruction, and the first task execution result is received from the cooperative intelligent terminal and provided to the user.

After step 301, determining whether the first intelligent terminal is in a collaborative interaction engine cluster working mode and determining whether the user instruction includes a collaborative interaction engine cluster working mode change instruction, if one of the two determinations is yes, executing step 302, and if both determinations are no, executing step 305.

Preferably, in step 300, a voice instruction of the user is obtained, and then, the first smart device determines a first intelligibility value of the voice instruction based on the received voice instruction of the user, broadcasts the first intelligibility value to at least one other smart device in the local wireless network, receives a second intelligibility value from the at least one other smart device, and determines whether the first intelligibility value is greater than the second intelligibility value, and if so, executes the subsequent steps.
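The intelligibility comparison can be sketched as follows. The function name `should_handle_instruction` is hypothetical, and treating "greater than the second intelligibility value" as "greater than every received value" when several devices respond is an assumption; the description does not state how ties are resolved, so a tie here means the device does not proceed.

```python
from typing import Iterable


def should_handle_instruction(first_value: float,
                              received_values: Iterable[float]) -> bool:
    """Return True if this device heard the instruction more clearly than
    every other device that broadcast an intelligibility value."""
    return all(first_value > value for value in received_values)
```

If no other device reports a value, the device proceeds with the subsequent steps on its own.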

Preferably, step 300 specifically includes: step 300-1: receiving a voice instruction of a user; step 300-2: the first intelligent terminal selects, based on the first voice processing capability list, an intelligent terminal capable of performing voice recognition processing on the voice instruction, and acquires the user instruction based on the processing result of that intelligent terminal.

Preferably, if a plurality of first speech interaction engines capable of executing the first task are determined based on the first speech processing capability list, first selecting one from the plurality of first interaction engines according to the method for selecting one from the plurality of intelligent devices as described in the embodiments of fig. 2 and fig. 3, and then determining a first collaborative speech interaction engine cluster to which the selected first speech interaction engine belongs based on a locally stored collaborative speech interaction engine cluster.

If it is determined that there are multiple first collaborative speech interaction engine clusters to which the first speech interaction engine belongs (that is, first collaborative speech interaction concurrent engine clusters), one of the first collaborative speech interaction engine clusters is selected, and then step 303 is executed. The selection method may be one of the following:

selecting according to the priority sequence of the first collaborative voice interaction engine cluster;

if the first voice interaction engine corresponds to different affiliated intelligent terminals in different first collaborative voice interaction engine clusters, and the affiliated intelligent terminal corresponding to the first voice interaction engine in one of the first collaborative voice interaction engine clusters is the first intelligent terminal, selecting that first collaborative voice interaction engine cluster and continuing to execute step 303;

if the intelligent terminal corresponding to the first speech interaction engine in all the first collaborative speech interaction engine clusters is not the first intelligent terminal, selecting one from different affiliated intelligent terminals corresponding to the first speech interaction engine in different first collaborative speech interaction engine clusters by the first speech interaction engine according to the method for selecting one from the plurality of intelligent devices introduced in the embodiments of fig. 2 and fig. 3, and taking the collaborative speech interaction engine cluster corresponding to the affiliated intelligent terminal as the selected collaborative speech interaction engine cluster;

if at least one second voice interaction engine corresponds to different affiliated intelligent terminals in different first collaborative voice interaction engine clusters, and the intelligent terminal corresponding to the at least one second voice interaction engine in one of the first collaborative voice interaction engine clusters is the first intelligent terminal, selecting that first collaborative voice interaction engine cluster and continuing to execute step 303;

if none of the intelligent terminals to which at least one second speech interaction engine in all the first collaborative speech interaction engine clusters corresponds is the first intelligent terminal, selecting one of the different affiliated intelligent terminals to which at least one second speech interaction engine corresponds in different first collaborative speech interaction engine clusters from the at least one second speech interaction engine according to the method of selecting one of the plurality of intelligent devices described in the embodiments of fig. 2 and fig. 3, and taking the collaborative speech interaction engine cluster corresponding to the affiliated intelligent terminal as the selected collaborative speech interaction engine cluster.
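One possible way to combine two of the selection methods above is sketched below: clusters in which the first voice interaction engine runs on the first intelligent terminal are preferred, and the priority order breaks any remaining choice. The combination itself, the `Cluster` record, and the convention that a smaller number means a higher priority are assumptions; the fallback to the device selection method of fig. 2 and fig. 3 is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    name: str
    priority: int            # smaller value = higher priority (assumption)
    first_engine_host: str   # terminal hosting the first voice interaction engine


def select_cluster(clusters: List[Cluster], first_terminal: str) -> Cluster:
    """Prefer clusters whose first engine is local, then use priority order."""
    local = [c for c in clusters if c.first_engine_host == first_terminal]
    candidates = local or clusters
    return min(candidates, key=lambda c: c.priority)
```

Preferring a local first engine avoids one network hop for the voice instruction, which is the apparent motivation for the second selection method.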

Either one of the selection of the first collaborative speech interaction engine cluster and the second determination may be performed first. This is because if the first speech interaction engine is the dominant speech interaction engine in one of the first collaborative speech interaction engine clusters, it is also the dominant speech interaction engine in the other first collaborative speech interaction engine clusters to which it belongs.

The three methods of fig. 2-4 of the present invention are applied to the same scene and belong to the same concept system, and the three methods can be combined arbitrarily.

The present invention also provides an apparatus for dynamically constructing a speech processing capability list, referring to fig. 5, the apparatus comprising:

the voice processing capability list dynamic construction unit is used for, in response to the device successfully accessing the local wireless network for the first time, acquiring the first voice processing capability that the device can execute and initializing a local first voice processing capability list; and judging whether at least one second intelligent terminal successfully accessed the local wireless network for the first time before the device accessed the local wireless network, and if so, generating a first voice processing capability interaction message and triggering the communication unit to broadcast the first voice processing capability interaction message in the local wireless network, so that the at least one second intelligent terminal, after receiving the first voice processing capability interaction message, generates a second voice processing capability interaction message, sends the second voice processing capability interaction message to the device, and updates its local second voice processing capability list based on the first voice processing capability interaction message;

a communication unit for broadcasting the first voice processing capability interaction message within the local wireless network;

wherein the first voice processing capability interaction message comprises a first voice processing capability that the apparatus is capable of performing; the second voice processing capability interactive message comprises second voice processing capability which can be executed by the second intelligent terminal;

the communication unit is used for receiving a second voice processing capability interaction message sent by the at least one second intelligent terminal;

the dynamic voice processing capability list building unit is further configured to update the first voice processing capability list based on the second voice processing capability interaction message.
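The list initialization and update performed by the voice processing capability list dynamic construction unit can be sketched as below. The dictionary layout (terminal identifier mapped to a set of capability names) and the message fields `terminal_id` and `capabilities` are assumptions made for illustration; the description does not define a message format.

```python
from typing import Dict, Iterable, Set

CapabilityList = Dict[str, Set[str]]


def init_capability_list(device_id: str,
                         capabilities: Iterable[str]) -> CapabilityList:
    """Initialize the local list with the capabilities of the device itself."""
    return {device_id: set(capabilities)}


def build_interaction_message(device_id: str,
                              capability_list: CapabilityList) -> dict:
    """Build the capability interaction message to be broadcast."""
    return {"terminal_id": device_id,
            "capabilities": sorted(capability_list[device_id])}


def apply_interaction_message(capability_list: CapabilityList,
                              message: dict) -> CapabilityList:
    """Return an updated copy of the list that records the sender's capabilities."""
    updated = {tid: set(caps) for tid, caps in capability_list.items()}
    updated[message["terminal_id"]] = set(message["capabilities"])
    return updated
```

The same update function can serve both directions: a second intelligent terminal applies the first message to its own list, and the device applies each second message it receives.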

Preferably, the communication unit is further configured to send a first access request to a server in the local wireless network, and receive a first access success response returned by the server after the server passes authentication of the device;

the apparatus further includes a storage unit for storing the first speech processing capability list.

Preferably, the communication unit is further configured to send an inquiry request to a server of the local wireless network, and receive an inquiry response sent by the server, where the inquiry response carries the identifiers of all second intelligent terminals that successfully accessed the local wireless network for the first time before the device accessed the local wireless network for the first time. The voice processing capability list dynamic construction unit judges, based on the inquiry response, whether at least one second intelligent terminal successfully accessed the local wireless network for the first time before the device accessed the local wireless network.

Preferably, the broadcasting, by the communication unit, the first voice processing capability interaction message in the local wireless network specifically includes: the communication unit sends the first voice processing capability interaction message to the server, the first voice processing capability interaction message carries a broadcast identifier, and the server broadcasts the first voice processing capability interaction message in the local wireless network.

Preferably, the communication unit is further configured to send other messages such as the intelligent terminal parameter interaction message described above.

The voice processing capability list dynamic construction unit is also used for recording the parameter information of other intelligent terminals in the storage unit.

Preferably, the speech processing capability list dynamic construction unit is further configured to: if it is judged that, within a preset time, the communication unit has received the second voice processing capability interaction message from only a part of the at least one second intelligent terminal and has not received it from the remaining second intelligent terminals, select one second intelligent terminal from the part of the second intelligent terminals from which the second voice processing capability interaction message was received as a relay second intelligent terminal, and trigger the communication unit to send to the relay second intelligent terminal a voice processing capability relay request message carrying the identifiers of the remaining second intelligent terminals. After receiving the voice processing capability relay request message, the relay second intelligent terminal queries its local second voice processing capability list to obtain the voice processing capabilities of the remaining second intelligent terminals, carries them in a voice processing capability relay response message, and sends the message to the communication unit of the device. The voice processing capability list dynamic construction unit then further updates the first voice processing capability list based on the voice processing capability relay response message.
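The relay exchange can be sketched as a request builder on the device side and a responder on the relay side. The message fields and function names are hypothetical, and choosing the first responding terminal as the relay is only a placeholder for whatever selection rule is actually used.

```python
from typing import Dict, List, Optional, Set


def build_relay_request(expected: List[str],
                        responded: List[str]) -> Optional[dict]:
    """Build a relay request for terminals that did not answer in time."""
    missing = [t for t in expected if t not in responded]
    if not missing or not responded:
        return None  # nothing to ask for, or nobody available to relay
    relay = responded[0]  # placeholder rule: first responder acts as relay
    return {"type": "capability_relay_request",
            "to": relay,
            "terminal_ids": missing}


def answer_relay_request(request: dict,
                         local_list: Dict[str, Set[str]]) -> dict:
    """Relay side: look up the requested terminals in the local list."""
    found = {t: sorted(local_list[t])
             for t in request["terminal_ids"] if t in local_list}
    return {"type": "capability_relay_response", "capabilities": found}
```

Terminals unknown to the relay are simply absent from the response, so the device can tell which capabilities remain unresolved.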

Preferably, the communication unit is further configured to receive a synchronization message sent by the server to indicate a change in the networking state of the other intelligent terminals.

Preferably, the communication unit further sends a low power warning message to the local wireless network when the battery power is lower than a preset threshold and cannot be replenished through wireless charging (for example, when wireless charging is not supported or the automatic wireless charging system fails).

The communication unit is also used for receiving low power warning messages sent by other intelligent terminals.

The voice processing capability list dynamic construction unit is further used for recording the low power state of the other intelligent terminals in a storage unit based on the low power warning message.
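A minimal sketch of the low power warning logic follows. The message fields, the function names, and the representation of the battery level as a fraction between 0 and 1 are assumptions.

```python
from typing import Dict, Optional


def low_power_warning(terminal_id: str, level: float, threshold: float,
                      can_recharge_wirelessly: bool) -> Optional[dict]:
    """Return a warning message only when the level is below the threshold
    and wireless charging cannot replenish the battery."""
    if level < threshold and not can_recharge_wirelessly:
        return {"type": "low_power_warning",
                "terminal_id": terminal_id,
                "level": level}
    return None


def record_low_power(states: Dict[str, bool], message: dict) -> Dict[str, bool]:
    """Receiver side: mark the sending terminal as being in a low power state."""
    updated = dict(states)
    updated[message["terminal_id"]] = True
    return updated
```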

Preferably, the communication unit receives a second voice processing capability interaction message sent by at least one second intelligent terminal whose real-time networking state is on-network, and the voice processing capability list dynamic construction unit updates the first voice processing capability list based on the second voice processing capability interaction message. Then, the voice processing capability list dynamic construction unit selects one second intelligent terminal from the at least one second intelligent terminal whose real-time networking state is on-network as the relay second intelligent terminal, and triggers the communication unit to send to the relay second intelligent terminal a voice processing capability relay request message carrying the identifier of the second intelligent terminal whose real-time networking state is offline. After receiving the voice processing capability relay request message, the relay second intelligent terminal queries its local second voice processing capability list to obtain the voice processing capability of the second intelligent terminal whose real-time networking state is offline, carries it in a voice processing capability relay response message, and sends the message to the communication unit of the device. The voice processing capability list dynamic construction unit further updates the first voice processing capability list based on the voice processing capability relay response message.

The device also comprises a user instruction acquisition unit used for receiving a user instruction;

the device further comprises a task execution control unit for determining a first speech processing engine capable of processing the user instruction based on a first speech processing capability list; judging whether the first voice processing engine is a local voice processing engine of the device, if not, triggering the communication unit to send the user instruction to a cooperative intelligent terminal to which the first voice processing engine belongs, and receiving a voice processing result from the cooperative intelligent terminal; if so, obtaining a voice processing result by adopting the first voice processing engine.
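The routing decision of the task execution control unit can be sketched as a lookup in the capability list. The function name `route_instruction`, the capability names, and the preference for a local engine when both local and remote engines qualify are assumptions.

```python
from typing import Dict, Optional, Set, Tuple


def route_instruction(capability_list: Dict[str, Set[str]],
                      local_id: str,
                      required: str) -> Optional[Tuple[str, str]]:
    """Decide where a user instruction needing `required` should be processed.

    Returns ("local", terminal_id), ("remote", terminal_id), or None when no
    terminal in the list offers the capability.
    """
    if required in capability_list.get(local_id, set()):
        return ("local", local_id)
    for terminal_id, capabilities in capability_list.items():
        if terminal_id != local_id and required in capabilities:
            return ("remote", terminal_id)
    return None
```

A "remote" result corresponds to triggering the communication unit to send the user instruction to the cooperative intelligent terminal and waiting for its voice processing result.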

The device also comprises a task execution result providing unit which is used for providing the voice processing result for the user when the voice processing result is the task execution result.

The task execution control unit is further configured to perform the operations related to the intelligibility judgment based on the user instruction received by the user instruction acquisition unit, and then determine a first speech processing engine capable of processing the user instruction based on the first speech processing capability list.

The task execution control unit is further configured to: and when the device is judged to be incapable of processing the user instruction, selecting a second intelligent terminal capable of processing the user instruction as a cooperative intelligent terminal based on the first voice processing capacity list, triggering the communication unit to send the user instruction to the cooperative intelligent terminal, and receiving a voice processing result from the cooperative intelligent terminal.

The process of selecting the cooperative intelligent terminal by the task execution control unit is specifically referred to the above description.

The task execution control unit is further configured to, if the user instruction is a voice instruction and the reason the user instruction cannot be processed locally is that the voice instruction cannot be recognized locally, extract voiceprint features of the user based on the user instruction, and store the voiceprint features in association with the cooperative intelligent terminal.

The task execution control unit is further configured to, when it is determined that a voice instruction received again by the user instruction acquisition unit has the voiceprint features, trigger the communication unit to send the voice instruction directly to the cooperative intelligent terminal and obtain a voice recognition result from the cooperative intelligent terminal, without attempting local voice recognition.
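The voiceprint shortcut amounts to a small lookup table from a voiceprint to the terminal that previously recognized that speaker. In this sketch the voiceprint is reduced to a hashable key, which is a strong simplification: real voiceprint matching compares feature vectors by similarity, and the class name `VoiceprintRouter` is hypothetical.

```python
from typing import Dict, Optional


class VoiceprintRouter:
    """Remember which cooperative terminal recognized a given speaker."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def remember(self, voiceprint_key: str, terminal_id: str) -> None:
        """Store the voiceprint in association with the cooperative terminal."""
        self._routes[voiceprint_key] = terminal_id

    def target_for(self, voiceprint_key: str) -> Optional[str]:
        """Return the terminal to send the instruction to directly, or None
        if local recognition should be attempted first."""
        return self._routes.get(voiceprint_key)
```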

The voice processing capability list dynamic construction unit is further configured to trigger the communication unit to send a voice processing capability list synchronization request to at least one other intelligent terminal on the local wireless network after the device is offline for a period of time and is online again, and receive a voice processing capability list synchronization response from the at least one other intelligent terminal; the dynamic speech processing capability list construction unit is further configured to update a local speech processing capability list based on the speech processing capability list synchronization response.

The voice processing capability list dynamic construction unit is also used for triggering the communication unit to acquire the networking states of all intelligent terminals in the local wireless network from the server after the device is offline for a period of time and is online again.

The voice processing capability list dynamic construction unit is further configured to update a local first voice processing capability list when the voice processing capability changes, generate a first voice processing capability update message, and trigger the communication unit to broadcast the first voice processing capability update message in the local wireless network, so that the at least one second intelligent terminal updates a local second voice processing capability list after receiving the first voice processing capability update message.

The communication unit is further configured to receive a second voice processing capability update message sent by the at least one second intelligent terminal; the dynamic voice processing capability list building unit is further configured to update the first voice processing capability list based on the second voice processing capability update message.

Preferably, the device is used for the first intelligent terminal.

The invention also provides a device for dynamically constructing the voice processing capability list, and the device is used for the second intelligent terminal. The device comprises a communication unit, a voice processing capability list dynamic construction unit and a storage unit, wherein the communication unit of the device is used for executing the message sending and the message receiving, and the voice processing capability list dynamic construction unit is used for dynamically constructing the voice processing capability list of the device.

The present invention also provides an apparatus for constructing a collaborative voice interaction engine cluster in a local wireless network, referring to fig. 6, the apparatus comprising:

Preferably, the device is used for the first intelligent terminal.

Preferably, the apparatus further comprises a storage unit, configured to store the first collaborative speech interaction engine cluster attribute information.

The communication unit is further configured to send attribute information of the first collaborative voice interaction engine cluster to at least one second intelligent terminal in the local wireless network.

The cooperative voice interaction engine cluster construction unit is further configured to query the first voice processing capability list when configuration information of the at least one first voice interaction engine cluster is carried in a voice interaction engine cluster search response, and determine at least one second intelligent terminal and at least one corresponding third voice interaction engine capable of completing a task associated with the at least one second voice interaction engine. At this time, the generated attribute information of the first collaborative speech interaction engine cluster includes a corresponding relationship between the first speech interaction engine and the device and a corresponding relationship between the at least one third speech interaction engine and the at least one second intelligent terminal.

The collaborative voice interaction engine cluster construction unit is further configured to trigger the communication unit to send a collaborative voice interaction engine cluster construction inquiry request to at least one second intelligent terminal corresponding to the at least one second voice interaction engine or completing a task associated with the at least one second voice interaction engine before the collaborative voice interaction engine cluster attribute information is constructed and generated, and to construct attribute information of a collaborative voice interaction engine cluster according to the voice interaction engine cluster search response after the communication unit receives a collaborative voice interaction engine cluster construction approval response; and the second intelligent terminal judges whether the second intelligent terminal supports the function of the collaborative voice interaction engine cluster after receiving the collaborative voice interaction engine cluster construction inquiry request, and if so, generates a collaborative voice interaction engine cluster construction approval response and sends the collaborative voice interaction engine cluster construction approval response to the device.

The cooperative voice interaction engine cluster building unit is further configured to trigger the communication unit to send a voice assistant upgrade request message to one or more second intelligent terminals if it is determined that the device has not received a cooperative voice interaction engine cluster building approval response from one or more of the second intelligent terminals after sending a cooperative voice interaction engine cluster building inquiry request to the at least one second intelligent terminal, and build attribute information of the cooperative voice interaction engine cluster according to the voice interaction engine cluster search response after the communication unit receives a voice assistant upgrade completion message sent by the one or more second intelligent terminals. And after the one or more second intelligent terminals receive the voice assistant upgrading request message, starting the voice assistant upgrading, and after the upgrading is finished, generating and sending a voice assistant upgrading finishing message to the device.
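The inquiry, approval, and upgrade exchange can be sketched as two helper functions on the device side. The message fields and function names are assumptions, and timeouts or failed upgrades are not modeled.

```python
from typing import List, Set


def upgrade_requests(queried: List[str], approved: Set[str]) -> List[dict]:
    """Build voice assistant upgrade requests for every queried terminal
    that did not return a construction approval response."""
    return [{"type": "voice_assistant_upgrade_request", "to": t}
            for t in queried if t not in approved]


def can_build_cluster(queried: List[str], approved: Set[str],
                      upgraded: Set[str]) -> bool:
    """The cluster attribute information is built once every queried terminal
    has either approved or reported that its upgrade is complete."""
    return all(t in approved or t in upgraded for t in queried)
```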

The collaborative voice interaction engine cluster constructing unit is further configured to set the first voice interaction engine as a leading interaction engine of the first collaborative voice interaction engine cluster, and record the leading interaction engine in attribute information of the collaborative voice interaction engine cluster.

The collaborative voice interaction engine cluster construction unit is further configured to trigger the task execution result providing unit to send a first instruction to a user before triggering the communication unit to send attribute information of the collaborative voice interaction engine cluster to at least one second intelligent terminal in the local wireless network, instruct the user to name (when configuration information is carried in a voice interaction engine cluster search response) or rename (when attribute information is carried in a voice interaction engine cluster search response) the collaborative voice interaction engine cluster, and trigger the user instruction acquisition unit to receive a naming or renaming instruction of the user; the collaborative voice interaction engine cluster construction unit is further configured to record the cluster name carried in the naming instruction in the attribute information of the collaborative voice interaction engine cluster, or update the cluster name in the attribute information of the collaborative voice interaction engine cluster by using the cluster name carried in the renaming instruction.

The collaborative voice interaction engine cluster building unit is further configured to, when the first intelligent terminal determines that there are multiple second intelligent terminals corresponding to a certain second voice interaction engine, or multiple second intelligent terminals capable of completing the task associated with a certain second voice interaction engine, select one second intelligent terminal from among them. The specific selection process is as described above and is not repeated here.

The collaborative voice interaction engine cluster building unit is further used for, after selecting one second intelligent terminal from the plurality of second intelligent terminals as the second intelligent terminal corresponding to the second voice interaction engine or as the second intelligent terminal capable of completing the task associated with the second voice interaction engine, using the other second intelligent terminals as concurrent second intelligent terminals corresponding to the second voice interaction engine or as concurrent second intelligent terminals capable of completing the task associated with the second voice interaction engine; and constructing and generating attribute information of a first cooperative voice interaction concurrent engine cluster according to the voice interaction engine cluster search response, wherein the attribute information of the first cooperative voice interaction concurrent engine cluster comprises the corresponding relation between the first voice interaction engine and the first intelligent terminal and the corresponding relation between the at least one second voice interaction engine and the at least one concurrent second intelligent terminal.

The communication unit is further configured to send attribute information of the first collaborative voice interaction concurrent engine cluster to at least one second intelligent terminal in the local wireless network.

The collaborative voice interaction engine cluster construction unit is further configured to, if the second intelligent terminal is selected according to communication capability, processor capability, offline duration, and energy acquisition manner, assign the highest priority to the first collaborative voice interaction engine cluster constructed with the selected second intelligent terminal, and rank the first collaborative voice interaction concurrent engine clusters in priority according to the same selection factors.

The collaborative voice interaction engine cluster building unit is further configured to, when it is determined that a plurality of second intelligent terminals correspond to a certain second voice interaction engine or are capable of completing tasks associated with a certain second voice interaction engine, trigger the communication unit to acquire the offline durations of the plurality of second intelligent terminals from a server in the local wireless network, and use the second intelligent terminal with the shortest offline duration as the second intelligent terminal corresponding to the second voice interaction engine, or as the second intelligent terminal capable of completing tasks associated with the second voice interaction engine.
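As a minimal sketch of the shortest-offline-duration rule, the following Python function picks a terminal from a mapping of terminal identifiers to offline durations. The function name, the use of seconds as the unit, and the behavior on ties (the first candidate encountered wins) are assumptions; the description only states that the terminal with the shortest offline duration is used.

```python
from typing import Dict


def pick_by_offline_duration(offline_seconds: Dict[str, float]) -> str:
    """Return the terminal id whose reported offline duration is shortest."""
    if not offline_seconds:
        raise ValueError("no candidate terminals")
    # min() over the keys, ordered by each terminal's offline duration.
    return min(offline_seconds, key=offline_seconds.get)


selected = pick_by_offline_duration({"tv": 3600.0, "speaker": 120.0, "watch": 900.0})
```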

The communication unit is further configured to receive, from the at least one second intelligent terminal in the local wireless network, a request to construct a collaborative voice interaction engine cluster on its behalf, where the request carries the at least one second voice interaction engine included in the at least one second intelligent terminal.

The collaborative voice interaction engine cluster building unit is further configured to construct attribute information of a second collaborative voice interaction engine cluster in a manner similar to that used for the first collaborative voice interaction engine cluster, where the second collaborative voice interaction engine cluster includes the at least one second voice interaction engine, and to trigger the communication unit to send the attribute information of the second collaborative voice interaction engine cluster to the at least one second intelligent terminal in the local wireless network.

The invention also provides a device for dynamically constructing the voice processing capability list, and the device is used for the second intelligent terminal. The device further comprises a storage unit, wherein the storage unit is used for storing attribute information of the first collaborative voice interaction engine cluster and the second collaborative voice interaction engine cluster.

The invention also provides a device for interacting with a voice assistant based on a collaborative voice interaction engine cluster, and referring to fig. 7, the device comprises:

the user instruction acquisition unit is used for acquiring a user instruction;

the task execution control unit is used for: determining a first task based on the user instruction, and determining a first voice interaction engine capable of executing the first task based on a first voice processing capability list, wherein the first voice processing capability list comprises correspondences between different tasks and voice interaction engines; determining, based on a locally stored collaborative voice interaction engine cluster, a first collaborative voice interaction engine cluster to which the first voice interaction engine belongs, wherein the first collaborative voice interaction engine cluster comprises the first voice interaction engine and at least one second voice interaction engine; determining, based on the first voice processing capability list, a set of intelligent terminals corresponding to the voice interaction engines included in the first collaborative voice interaction engine cluster, wherein the set of intelligent terminals includes a second intelligent terminal corresponding to the first voice interaction engine and at least one third intelligent terminal corresponding to the at least one second voice interaction engine; and triggering the communication unit;

the communication unit is configured to, in response to the trigger of the task execution control unit, send the user instruction to the first voice interaction engine of the second intelligent terminal; receive inter-terminal slot position synchronization information from the first voice interaction engine of the second intelligent terminal, wherein the inter-terminal slot position synchronization information carries the key knowledge data that the first voice interaction engine of the second intelligent terminal needs to synchronize to the at least one second voice interaction engine of the at least one third intelligent terminal; and forward the inter-terminal slot position synchronization information to the at least one second voice interaction engine of the at least one third intelligent terminal;

a task execution result receiving unit, configured to receive a first task execution result and at least one second task execution result from a first voice interaction engine of a second intelligent terminal and at least one second voice interaction engine of the at least one third intelligent terminal, respectively, and provide the first task execution result and the at least one second task execution result to a user at the same time;

and the storage unit is used for storing the collaborative voice interaction engine cluster.

Preferably, the task execution result receiving unit may receive the first task execution result and the at least one second task execution result from the first voice interaction engine of the second intelligent terminal and the at least one second voice interaction engine of the at least one third intelligent terminal, respectively, through the communication unit.
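The lookup chain performed by the task execution control unit (task, then engine, then cluster, then terminal set) can be sketched as follows. This is only an illustration under stated assumptions: the capability list is modeled as two plain dictionaries (`task_to_engine` and `engine_to_terminal`), the locally stored clusters as a list of sets of engine names, and `resolve_targets` is a hypothetical function name; none of these structures is specified in the description.

```python
from typing import Dict, List, Optional, Set, Tuple


def resolve_targets(
    task: str,
    task_to_engine: Dict[str, str],
    engine_to_terminal: Dict[str, str],
    clusters: List[Set[str]],
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (first engine, {engine: terminal}) for the cluster serving task."""
    engine = task_to_engine.get(task)
    if engine is None:
        return None  # no engine in the capability list can execute the task
    # First locally stored cluster that contains the engine, if any.
    cluster = next((c for c in clusters if engine in c), None)
    if cluster is None:
        return None  # the engine does not belong to any stored cluster
    return engine, {e: engine_to_terminal[e] for e in sorted(cluster)}


result = resolve_targets(
    "navigate",
    {"navigate": "nav_engine"},
    {"nav_engine": "car", "music_engine": "speaker"},
    [{"nav_engine", "music_engine"}],
)
```

In the returned mapping, the terminal of the first engine plays the role of the second intelligent terminal, and the terminals of the remaining engines play the role of the third intelligent terminals.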

The task execution control unit is further configured to perform a first judgment: judging whether the device and the second intelligent terminal are the same terminal, that is, whether the first voice interaction engine is a local voice interaction engine of the device;

case 1: if the result of the first judgment is yes, the voice instruction is sent to the first voice interaction engine, inter-terminal slot position synchronization information is received from the first voice interaction engine, and the communication unit is triggered to send the inter-terminal slot position synchronization information to the at least one third intelligent terminal, so that the at least one second voice interaction engine of the at least one third intelligent terminal can immediately synchronize the slot position of the at least one second task associated with the at least one second voice interaction engine based on the inter-terminal slot position synchronization information; the task execution control unit is further configured to trigger the task execution result receiving unit to: receive a first task execution result from the first voice interaction engine, receive at least one second task execution result from the at least one third intelligent terminal through the communication unit, and simultaneously provide the first task execution result and the at least one second task execution result to the user;

case 2: if the first judgment result is negative, when the device and one of the at least one third intelligent terminal are the same terminal, triggering the communication unit to send the voice instruction to a second intelligent terminal corresponding to the first voice interaction engine; the task execution control unit is also used for activating a second voice interaction engine belonging to the first intelligent terminal in the cooperative voice interaction engine cluster, namely a local second voice interaction engine, and triggering the communication unit to receive the inter-terminal slot position synchronization information from the second intelligent terminal; the task execution control unit is also used for carrying out instant synchronization on the slot position of a second task associated with the second voice interaction engine based on the inter-terminal slot position synchronization information; the task execution control unit is further configured to trigger the task execution result receiving unit to enable the task execution result receiving unit to execute: receiving a second task execution result from the local second voice interaction engine, receiving a first task execution result from the second intelligent terminal through the communication unit, and simultaneously providing the first task execution result and the second task execution result to the user;

case 3: if the first judgment result is negative, and the device is not the same terminal as any of the at least one third intelligent terminal, triggering the communication unit to: send the voice instruction to the second intelligent terminal corresponding to the first voice interaction engine, notify the at least one third intelligent terminal to activate the at least one second voice interaction engine, receive inter-terminal slot position synchronization information from the second intelligent terminal, and forward the inter-terminal slot position synchronization information to the at least one third intelligent terminal, so that the at least one third intelligent terminal can immediately synchronize the slot position of the at least one second task associated with the at least one second voice interaction engine based on the inter-terminal slot position synchronization information; and triggering the task execution result receiving unit to: receive, through the communication unit, a first task execution result from the second intelligent terminal and at least one second task execution result from the at least one third intelligent terminal, and simultaneously provide the first task execution result and the at least one second task execution result to the user.

Preferably, the task execution control unit is further configured to: in case 2, that is, if the first judgment result is negative and the first intelligent terminal and one of the at least one third intelligent terminal are the same terminal, trigger the communication unit to: after the voice instruction is sent to the second intelligent terminal corresponding to the first voice interaction engine, notify the at least one third intelligent terminal to activate the at least one second voice interaction engine, and forward the received inter-terminal slot position synchronization information to the other third intelligent terminals, so that the other third intelligent terminals synchronize the slot positions of the second tasks associated with their corresponding second voice interaction engines based on the inter-terminal slot position synchronization information; and trigger the task execution result receiving unit to: receive second task execution results from the other third intelligent terminals, and simultaneously provide to the user the second task execution results received from the other third intelligent terminals, the first task execution result, and the second task execution result received from the local second voice interaction engine.
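The three cases can be summarized with a small Python sketch that classifies the situation of the receiving device and lists where the voice instruction and the inter-terminal slot position synchronization information would go. The function `plan_dispatch`, its parameter names, and the tuple it returns are hypothetical; the forwarding to the other third intelligent terminals in case 2 follows the preferred variant described above.

```python
from typing import List, Optional, Tuple


def plan_dispatch(
    local: str, engine_host: str, cooperating: List[str]
) -> Tuple[int, Optional[str], List[str]]:
    """Return (case number, terminal to send the instruction to, sync targets).

    local        -- terminal that received the voice instruction
    engine_host  -- terminal hosting the first voice interaction engine
    cooperating  -- terminals hosting the second voice interaction engines
    """
    if local == engine_host:
        case = 1  # the first engine is local; nothing to send over the network
    elif local in cooperating:
        case = 2  # the device itself hosts one of the second engines
    else:
        case = 3  # the device only relays between the other terminals
    send_instruction_to = None if case == 1 else engine_host
    # Synchronization information is forwarded to every cooperating
    # terminal other than the local one.
    forward_sync_to = [t for t in cooperating if t != local]
    return case, send_instruction_to, forward_sync_to


plan = plan_dispatch("tv", "phone", ["tv", "speaker"])
```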

The task execution control unit is further configured to: after the first collaborative voice interaction engine cluster to which the first voice interaction engine belongs is determined, perform a second judgment, namely judging whether the first voice interaction engine is the dominant interaction engine of the first collaborative voice interaction engine cluster; if the result of the second judgment is yes, continue processing in the collaborative voice interaction engine cluster manner; if the result of the second judgment is negative, judge whether the first voice interaction engine is a local voice interaction engine of the first intelligent terminal; if so, send the voice instruction to the first voice interaction engine so that the first voice interaction engine fills the slot position of the first task based on the voice instruction, trigger the task execution result receiving unit to receive a first task execution result from the first voice interaction engine, and provide the first task execution result to the user; if not, trigger the communication unit to send the voice instruction to the cooperative intelligent terminal to which the first voice interaction engine belongs, so that the cooperative intelligent terminal fills the slot position of the first task based on the voice instruction, trigger the task execution result receiving unit to receive the first task execution result from the cooperative intelligent terminal through the communication unit, and provide the first task execution result to the user.

The task execution control unit is further configured to: judge whether the first intelligent terminal is in a collaborative interaction engine cluster working mode, and judge whether the user instruction includes a collaborative interaction engine cluster working mode change instruction; if either judgment is yes, perform the step of determining, based on the locally stored collaborative voice interaction engine cluster, the first collaborative voice interaction engine cluster to which the first voice interaction engine belongs, and its subsequent steps; if both judgments are negative, perform the step of judging whether the first voice interaction engine is a local voice interaction engine of the first intelligent terminal, and its subsequent steps.
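The working-mode check and the dominant-engine check together decide whether an instruction is handled by the whole cluster, by a local engine alone, or by a single remote engine. A minimal Python sketch of that decision follows; the function name `choose_path`, its parameters, and the three string labels are assumptions introduced for illustration, not terms from the description.

```python
from typing import Set


def choose_path(
    engine: str,
    dominant_engine: str,
    local_engines: Set[str],
    cluster_mode: bool,
    mode_change_requested: bool,
) -> str:
    """Return "cluster", "local" or "remote" for the selected first engine."""
    # The cluster path is considered only if the terminal is in cluster
    # working mode or the instruction carries a mode change instruction.
    if cluster_mode or mode_change_requested:
        # Second judgment: only the dominant engine drives the cluster.
        if engine == dominant_engine:
            return "cluster"
    # Otherwise a single engine handles the task, locally or remotely.
    return "local" if engine in local_engines else "remote"


path = choose_path("nav_engine", "nav_engine", {"nav_engine"}, True, False)
```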

The task execution control unit is further configured to: determine a first clarity value of the user's voice instruction received by the user instruction acquisition unit, trigger the communication unit to broadcast the first clarity value to at least one other intelligent device in the local wireless network, and receive a second clarity value from the at least one other intelligent device; the task execution control unit is further configured to determine whether the first clarity value is greater than the second clarity value and, if so, perform the subsequent steps.
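The comparison of values measuring how clearly each device heard the voice instruction can be sketched in Python as a simple arbitration rule: a device proceeds only if its own value is strictly greater than every value received from the other devices. The function name `should_handle` and the use of floating-point scores are assumptions; the description does not say how the value is computed or how ties are broken, so under this strict comparison no device proceeds on a tie.

```python
from typing import Iterable


def should_handle(own_value: float, peer_values: Iterable[float]) -> bool:
    """True if this device heard the instruction more clearly than all peers."""
    # Strictly greater, as in the description; an empty peer list means
    # no competing device answered, so this device proceeds.
    return all(own_value > peer for peer in peer_values)


proceed = should_handle(0.8, [0.5, 0.7])
```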

The user instruction acquisition unit is specifically used for acquiring a user voice instruction;

the task execution control unit is further configured to: select, based on the first voice processing capability list, an intelligent terminal capable of performing voice recognition processing on the voice instruction, and acquire the user instruction based on the processing result of that intelligent terminal.

The task execution control unit is further configured to, if it is determined based on the first voice processing capability list that a plurality of first voice interaction engines are capable of executing the first task, select one of the plurality of first voice interaction engines, and then determine, based on the locally stored collaborative voice interaction engine cluster, the first collaborative voice interaction engine cluster to which the selected first voice interaction engine belongs.

The task execution control unit is further configured to, if it is determined that the first voice interaction engine belongs to multiple first collaborative voice interaction engine clusters (first collaborative voice interaction concurrent engine clusters), select one of the first collaborative voice interaction engine clusters, and then determine, based on the first voice processing capability list, the intelligent terminal set corresponding to the voice interaction engines included in the selected first collaborative voice interaction engine cluster.

The invention also provides a device for dynamically constructing the voice processing capability list, and the device is used for the second intelligent terminal. The message sending and receiving executed by the second intelligent terminal are executed by the communication unit of the device, and the rest steps are executed by the task execution control unit of the device.

In the three embodiments of the apparatus for the first intelligent terminal described above, units with the same name have all the functions of that unit in the other embodiments. The devices in the above three embodiments can be combined in any way.

The present invention also provides a terminal characterized by comprising the apparatus as described above or the voice assistant as described above.

Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. The computer-readable storage medium may include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), a flash memory, an erasable programmable read-only memory (EPROM), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer program code for carrying out operations of the present invention may be written in one or more programming languages, or a combination thereof.

The above description is only an example for the convenience of understanding the present invention, and is not intended to limit the scope of the present invention. In the specific implementation, a person skilled in the art may change, add, or reduce the components of the apparatus according to the actual situation, and may change, add, reduce, or change the order of the steps of the method according to the actual situation without affecting the functions implemented by the method.

While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents, and all changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method of constructing a collaborative speech interaction engine cluster, the method comprising:

2. The method according to claim 1, further comprising step 205, wherein the first intelligent terminal transmits attribute information of the first collaborative speech interaction engine cluster to at least one second intelligent terminal in the local wireless network.

3. The method of claim 1, wherein step 204 further comprises: before establishing and generating the attribute information of the collaborative voice interaction engine cluster, the first intelligent terminal sends a collaborative voice interaction engine cluster establishment inquiry request to at least one second intelligent terminal corresponding to at least one second voice interaction engine or completing the task associated with the at least one second voice interaction engine, and after receiving a collaborative voice interaction engine cluster establishment agreement response, establishes the attribute information of the collaborative voice interaction engine cluster according to the voice interaction engine cluster search response; after receiving the query request for constructing the collaborative voice interaction engine cluster, the second intelligent terminal judges whether the second intelligent terminal supports the collaborative voice interaction engine cluster function, and if so, generates a collaborative voice interaction engine cluster construction approval response and sends the collaborative voice interaction engine cluster construction approval response to the first intelligent terminal.

4. The method of claim 1, wherein the first speech interaction engine is set as a dominant interaction engine of the first collaborative speech interaction engine cluster and recorded in attribute information of the collaborative speech interaction engine cluster.

5. An apparatus for building a collaborative speech interaction engine cluster, the apparatus comprising:

6. The apparatus of claim 5, wherein the communication unit is further configured to send attribute information of the first collaborative speech interaction engine cluster to at least one second intelligent terminal in the local wireless network.

7. The apparatus of claim 5,

8. The apparatus of claim 5,

9. A computer arrangement, characterized in that the computer arrangement comprises a processor and a memory, in which a computer program is stored which is executable on the processor, which computer program, when being executed by the processor, carries out the method according to any one of claims 1 to 4.

10. A computer-readable storage medium, in which a computer program that is executable on a processor is stored, which computer program, when being executed, carries out the method according to any one of claims 1 to 4.

11. A voice assistant, characterized in that it comprises a device according to any one of claims 5 to 8.