WO2022268136A1

WO2022268136A1 - Terminal device and server for voice control

Info

Publication number: WO2022268136A1
Application number: PCT/CN2022/100547
Authority: WO
Inventors: 王冰; 李含珍; 张路伟
Original assignee: 海信视像科技股份有限公司; 聚好看科技股份有限公司
Priority date: 2021-06-22
Filing date: 2022-06-22
Publication date: 2022-12-29
Also published as: CN117882130A

Abstract

The present application discloses a terminal device and a server for voice control. The server of the present embodiment receives a voice signal transmitted by a first terminal device, generates a voice instruction according to the voice signal, and feeds back the voice instruction to the first terminal device. If the first terminal device can execute an operation corresponding to the voice instruction, the corresponding operation is executed in response to the voice instruction. If the first terminal device cannot execute the operation corresponding to the voice instruction, an instruction distribution request is transmitted to the server. The server transmits the voice instruction to a second terminal device according to the instruction distribution request, so that the second terminal device executes the corresponding operation in response to the voice instruction, wherein the second terminal device can execute the operation corresponding to the voice instruction.

Description

A terminal device and server for voice control

Cross References to Related Applications

This application claims priority to the Chinese patent applications filed on June 22, 2021 with application number 202110688867.0; on August 11, 2021 with application number 202110917713.4; on December 13, 2021 with application number 202111521226.2; and on February 18, 2022 with application number 202210151526.4, the entire contents of which are incorporated in this application by reference.

Technical Field

The present application relates to the technical field of voice interaction, in particular to a terminal device and a server for voice control.

Background

With the development of voice interaction technology, more and more home terminal devices have a voice interaction function. Using the voice interaction function, the user can voice control these terminal devices to perform corresponding operations, such as starting and stopping.

At present, the process of the user's voice control of the terminal device is that the user inputs a voice signal, and after the terminal device collects the voice signal, it converts the voice signal into a corresponding instruction, so that the terminal performs corresponding operations according to the instruction.

However, at present, the voice interaction functions of most terminal devices are limited by distance. Users cannot control the devices they want to control anywhere in the room. For example, the smart TV in the bedroom cannot be turned off or turned on by voice in the kitchen, and the temperature of the air conditioner in the bedroom cannot be adjusted by voice control in the living room. If the user wants to control the terminal device, he needs to move to an effective distance or increase the volume, resulting in poor user experience.

Summary of the Invention

This embodiment provides a terminal device and a server for voice control. The terminal device includes: a voice collector configured to collect a voice signal input by a user; and a controller configured to: receive the voice signal input by the user from the voice collector, send the voice signal to a server, and receive a voice command from the server, wherein the voice command is generated according to the voice signal; when the terminal device can perform an operation corresponding to the voice command, perform the operation in response to the voice command; and when the terminal device cannot perform the operation corresponding to the voice command, generate an instruction distribution request and send the instruction distribution request to the server. The server is configured to search, according to the instruction distribution request, for other terminal devices capable of performing the operation corresponding to the voice command, and to send the voice command to those terminal devices.

Description of Drawings

FIG. 1 is a schematic diagram of a voice interaction principle provided by an embodiment of the present application;

FIG. 2 is a schematic framework diagram of a voice control system of a terminal device provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of a scenario of a voice control system of a terminal device provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a scene of another voice control system of a terminal device provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a scene of another voice control system of a terminal device provided by an embodiment of the present application;

FIG. 6 is a signaling diagram of a voice control method for a terminal device provided in an embodiment of the present application;

FIG. 7 is a usage scenario diagram of a voice control system provided by an embodiment of the present application;

FIG. 8 is a hardware configuration diagram of a terminal device provided in an embodiment of the present application;

FIG. 9 is a schematic diagram of a voice interaction process provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of multiple terminal devices responding to voice interaction effects provided by an embodiment of the present application;

FIG. 11 is a schematic flowchart of a multi-device voice wake-up method provided by an embodiment of the present application;

FIG. 12 is a schematic flow diagram of a screening process for a second terminal device provided in an embodiment of the present application;

FIG. 13 is a schematic flowchart of determining a second terminal device according to the number of devices provided by an embodiment of the present application;

FIG. 14 is a schematic flowchart of marking a master device provided by an embodiment of the present application;

FIG. 15 is a schematic diagram of a process flow for updating device status provided by an embodiment of the present application;

FIG. 16 is a server-side sequence flow chart of a multi-device voice wake-up method provided by an embodiment of the present application;

FIG. 17 is a sequence flow chart on the terminal device side of a multi-device voice wake-up method provided by an embodiment of the present application;

FIG. 18 is an application scenario diagram of a terminal device provided in an embodiment of the present application;

FIG. 19 is another application scenario diagram of a terminal device provided by an embodiment of the present application;

FIG. 20 is a schematic flowchart of a voice control method provided in an embodiment of the present application;

FIG. 21 is a schematic flowchart of a screening terminal device provided by an embodiment of the present application;

FIG. 22 is a schematic diagram of a scene of a voice control process provided by an embodiment of the present application;

FIG. 23A is a schematic flowchart of a voice control method provided by an embodiment of the present application;

FIG. 23B is a schematic diagram of the principle of a voice control method provided in the embodiment of the present application;

FIG. 23C is a schematic diagram of the process of determining the second set of candidate terminal devices in the embodiment of the present application;

FIG. 24A is a schematic flowchart of another terminal home control method provided by the embodiment of the present application;

FIG. 24B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application;

FIG. 25A is a schematic flowchart of another voice control method provided by the embodiment of the present application;

FIG. 25B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application;

FIG. 26A is a schematic flowchart of another voice control method provided by the embodiment of the present application;

FIG. 26B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application;

FIG. 26C is a schematic diagram of the process of obtaining control information in the embodiment of the present application;

FIG. 27A is a schematic structural diagram of a local control device in an embodiment of the present application;

FIG. 27B is a schematic structural diagram of interaction between a local control device and a terminal device in an embodiment of the present application.

Detailed Description

In order to make the purpose and implementation of the present application clearer, the exemplary implementations of the present application are described clearly and completely below with reference to the accompanying drawings of the exemplary embodiments. Obviously, the described exemplary embodiments are only some of the embodiments of the present application, not all of them.

It should be noted that the brief description of the terms in this application is only for the convenience of understanding the implementations described below, and is not intended to limit the implementations of this application. These terms are to be understood according to their ordinary and usual meaning unless otherwise stated.

In order to clearly illustrate the embodiment of the present application, a voice recognition network architecture provided by the embodiment of the present application will be described below with reference to FIG. 1 .

Referring to FIG. 1, FIG. 1 is a schematic diagram of a voice interaction principle provided by an embodiment of the present application. In FIG. 1, the smart device is used to receive input information and to output the processing result of that information. The speech recognition service device is an electronic device on which a speech recognition service is deployed, the semantic service device is an electronic device on which a semantic service is deployed, and the business service device is an electronic device on which a business service is deployed. The electronic device here may include a server, a computer, and the like. The speech recognition service, the semantic service (also called a semantic engine) and the business service are web services that can be deployed on the electronic device: the speech recognition service recognizes audio as text, the semantic service performs semantic analysis on the text, and the business service provides a specific service, such as the weather query service of Moji Weather or the music query service of QQ Music. In an embodiment, the architecture shown in FIG. 1 may contain multiple physical service devices on which different business services are deployed, or one or more functional services may be integrated in one or more physical service devices.

In some embodiments, the process of handling information input to the smart device, based on the architecture shown in FIG. 1, is described below by way of example. Taking a query sentence input by voice as the input information, the process may include the following three processes:

[Speech Recognition]

After receiving the query sentence input by voice, the smart device can upload the audio of the query sentence to the voice recognition service device, so that the voice recognition service device can recognize the audio as text through the voice recognition service and return it to the smart device. In one embodiment, before uploading the audio of the query sentence to the speech recognition service device, the smart device may perform denoising processing on the audio of the query sentence, where the denoising processing may include steps such as removing echo and environmental noise.

[Semantic Understanding]

The smart device uploads the text of the query sentence recognized by the speech recognition service to the semantic service device, so that the semantic service device can perform semantic analysis on the text through the semantic service to obtain the business field and intention of the text.

[Semantic Response]

According to the semantic analysis result of the text of the query statement, the semantic service device sends a query instruction to the corresponding business service device to obtain the query result given by the business service. The smart device can obtain and output the query result from the semantic service device. As an embodiment, the semantic service device can also send the semantic analysis result of the query sentence to the smart device, so that the smart device can output the feedback sentence in the semantic analysis result.

It should be noted that the architecture shown in FIG. 1 is only an example, and does not limit the protection scope of the present application. In the embodiment of the present application, other architectures may also be used to implement similar functions. For example, all or part of the three processes may be completed by a smart terminal, which will not be described in detail here.
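The three processes described above can be pictured as a chain of three calls. The following Python sketch is only an illustration under stated assumptions: the function names, the `Semantics` type, the keyword-based analysis and the sample weather answer are all hypothetical stand-ins, not services defined by this application.

```python
from dataclasses import dataclass


@dataclass
class Semantics:
    domain: str  # business field of the text, e.g. "weather"
    intent: str  # intention of the text, e.g. "query"


def recognize(audio: bytes) -> str:
    """Stand-in for the speech recognition service: audio -> text.

    Assumption: the "audio" is already UTF-8 text, so no real ASR is run.
    """
    return audio.decode("utf-8")


def understand(text: str) -> Semantics:
    """Stand-in for the semantic service: text -> business field and intention."""
    if "weather" in text:
        return Semantics(domain="weather", intent="query")
    return Semantics(domain="unknown", intent="unknown")


# Hypothetical business services, keyed by business field.
BUSINESS_SERVICES = {"weather": lambda semantics: "Sunny, 25 degrees"}


def respond(semantics: Semantics) -> str:
    """Stand-in for the semantic response: query the matching business service."""
    service = BUSINESS_SERVICES.get(semantics.domain)
    return service(semantics) if service else "Sorry, I did not understand."


def handle_query(audio: bytes) -> str:
    """Run speech recognition, semantic understanding and semantic response in order."""
    return respond(understand(recognize(audio)))
```

In this sketch the smart device would call `handle_query` once per utterance; in the architecture of FIG. 1 each stage may instead run on a separate service device.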

In some embodiments, the smart device shown in FIG. 1 may be a display device, such as a smart TV. The function of the speech recognition service device may be realized through cooperation between the sound collector and the controller provided on the display device, and the functions of the semantic service device and the business service device may be realized by the controller of the display device or by a server of the display device.

In order to solve the above problems, the present application provides a voice control system for a terminal device. FIG. 2 is a schematic framework diagram of a voice control system for a terminal device provided in an embodiment of the present application. As shown in FIG. 2, the system includes at least two terminal devices 200 and a server 400. The terminal device 200 is configured to collect voice signals input by the user and communicates with the server 400. The server 400 is configured to receive a signal or a request sent by the terminal device 200 and to feed back corresponding instructions to the terminal device 200.

In some embodiments, the sound collector of the terminal device 200-1 collects the voice signal input by the user. The terminal device 200-1 then sends the collected voice signal to the server 400. The server 400 generates a voice instruction according to the voice signal. It should be noted that the server 400 uses the semantic system to convert the voice signal into a voice instruction, and the specific conversion process is not limited in this application.

Further, the server 400 feeds the converted voice command back to the terminal device 200-1. After the terminal device 200-1 receives the voice command, its local execution capability module judges whether the terminal device 200-1 is capable of performing the operation corresponding to the voice command. If the terminal device 200-1 is capable of performing the operation, the voice command is sent to the controller. In response to the voice command, the controller controls the terminal device 200-1 to perform the operation corresponding to the voice command.

If the terminal device 200-1 is not capable of performing the operation corresponding to the voice instruction, it generates an instruction distribution request that carries the voice instruction and sends the request to the server 400. After receiving the instruction distribution request, the server 400 searches for a terminal device 200 that can execute the voice instruction. For example, if the server 400 finds that the terminal device 200-2 can perform the operation corresponding to the voice instruction, it sends the voice instruction to the terminal device 200-2, so that the controller of the terminal device 200-2 controls the terminal device 200-2 to perform the operation in response to the voice instruction.

For example, in one scenario the user inputs the voice signal "turn on the TV" near the smart speaker. The TV is in the bedroom, but the smart speaker is in the kitchen. After receiving the voice signal "turn on the TV", the smart speaker sends the voice signal to the server 400. The server converts the voice signal into a voice command and feeds the voice command back to the smart speaker.

Since the smart speaker cannot perform the operation "turn on the TV", it sends an instruction distribution request carrying the voice command "turn on the TV" to the server 400. After receiving the instruction distribution request, the server 400 searches for a terminal device capable of executing the voice command "turn on the TV". If the server finds that the terminal device capable of executing the voice command is the TV, it sends the voice command "turn on the TV" to the TV in the bedroom. After the TV in the bedroom receives the voice command, it performs a power-on operation in response. In this way, the user can turn on the TV in the bedroom by voice without being in the bedroom.
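The exchange between a first terminal device, the server and a second terminal device can be sketched as follows. This is a minimal in-memory illustration, not the implementation of the application: the class names `Server` and `Terminal`, their methods, and the treatment of the voice signal as lowercase text are all assumptions made for the example.

```python
class Server:
    """Hypothetical stand-in for server 400."""

    def __init__(self):
        self.devices = {}  # device id -> Terminal

    def register(self, device):
        self.devices[device.device_id] = device
        device.server = self

    def to_instruction(self, voice_signal):
        # Placeholder for the semantic system; the real conversion is not specified.
        return voice_signal.strip().lower()

    def distribute(self, instruction, requester_id):
        """Handle an instruction distribution request from `requester_id`."""
        for device in self.devices.values():
            if device.device_id != requester_id and device.can_execute(instruction):
                device.execute(instruction)
                return device.device_id
        return None  # no other terminal device can execute the instruction


class Terminal:
    """Hypothetical stand-in for a terminal device 200."""

    def __init__(self, device_id, capabilities):
        self.device_id = device_id
        self.capabilities = set(capabilities)
        self.server = None
        self.executed = []

    def can_execute(self, instruction):
        return instruction in self.capabilities

    def execute(self, instruction):
        self.executed.append(instruction)

    def on_voice(self, voice_signal):
        """Send the signal to the server, then execute locally or request distribution."""
        instruction = self.server.to_instruction(voice_signal)
        if self.can_execute(instruction):
            self.execute(instruction)
            return self.device_id
        return self.server.distribute(instruction, self.device_id)


server = Server()
speaker = Terminal("kitchen speaker", {"play music"})
tv = Terminal("bedroom tv", {"turn on the tv"})
server.register(speaker)
server.register(tv)
```

Calling `speaker.on_voice("Turn on the TV")` in this sketch returns the identifier of the device that ended up executing the command, which makes the hand-over from the speaker to the TV easy to observe.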

In some embodiments, each terminal device is configured with a local execution capability filtering module, and the local execution capability filtering module is configured with local capability attribute parameters. The terminal device determines whether it is capable of performing the operation corresponding to the voice command as follows:

The pending capability attribute parameter is parsed from the voice command, where the pending capability attribute parameter indicates the operation corresponding to the voice command. The local capability attribute parameter is then compared with the pending capability attribute parameter. If they match, the terminal device can perform the operation corresponding to the voice command. If they do not match, the terminal device cannot perform the operation corresponding to the voice command.

For example, the terminal device 200-1 is a display device, the terminal device 200-2 is an air conditioner, the terminal device 200-3 is a washing machine, and the terminal device 200-4 is a refrigerator. Then the local capability attribute parameter of the terminal device 200-1 is playing audio and video, the local capability attribute parameters of the terminal device 200-2 are cooling and heating, the local capability attribute parameter of the terminal device 200-3 is laundry, and the local capability attribute parameter of the terminal device 200-4 is cooling.

If the user inputs the voice signal "heating" within the signal receiving range of the terminal device 200-2, the terminal device 200-2 collects the voice signal and receives the voice command "heating" sent by the server 400. The pending capability attribute parameter parsed from the voice command is "heating". The local capability filtering module of the terminal device 200-2 finds that the local capability attribute parameters of the terminal device 200-2 include "heating", so they match the pending capability attribute parameter. This means that the terminal device 200-2 can perform the operation corresponding to the voice command "heating".

It should be noted that the pending capability attribute parameter may not match the local capability attribute parameter word for word. In that case, the local capability filtering module of the present application can convert the parsed pending capability attribute parameter accordingly. For example, if the user, within the signal receiving range of the terminal device 200-2, inputs a voice signal that expresses heating in different words, the text parsed by the local capability filtering module does not literally match the local capability attribute parameter "heating" of the terminal device 200-2. The local capability filtering module can analyze the pending capability attribute parameter and determine that the two expressions have the same meaning. The pending capability attribute parameter is therefore considered to match the local capability attribute parameter, that is, the terminal device 200-2 can perform the operation corresponding to the voice signal.

If the user inputs the voice signal "play music" within the signal receiving range of the terminal device 200-2, the terminal device 200-2 collects the voice signal and receives the voice command "play music" sent by the server 400. The pending capability attribute parameter parsed from the voice command is "play music". The local capability filtering module of the terminal device 200-2 finds that the local capability attribute parameters of the terminal device 200-2 are "cooling" and "heating", which do not match the pending capability attribute parameter. This means that the terminal device 200-2 cannot perform the operation corresponding to the voice command "play music".
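The matching performed by the local capability filtering module can be sketched as a set lookup preceded by a normalization step. The capability table below follows the four example devices; the synonym table, including the phrases "warm up" and "cool down", is purely a hypothetical illustration of how differently worded requests might be mapped onto a stored parameter.

```python
# Local capability attribute parameters of the example devices.
LOCAL_CAPABILITIES = {
    "200-1": {"play audio and video"},  # display device
    "200-2": {"cooling", "heating"},    # air conditioner
    "200-3": {"laundry"},               # washing machine
    "200-4": {"cooling"},               # refrigerator
}

# Hypothetical synonym table: alternative wordings -> stored parameter.
SYNONYMS = {
    "warm up": "heating",
    "cool down": "cooling",
}


def normalize(pending):
    """Convert a parsed pending parameter to its stored form, if a synonym is known."""
    key = pending.strip().lower()
    return SYNONYMS.get(key, key)


def can_execute(device_id, pending):
    """Return True if the device's local parameters match the pending parameter."""
    return normalize(pending) in LOCAL_CAPABILITIES.get(device_id, set())
```

A real filtering module would presumably rely on semantic analysis rather than a fixed table; the dictionary is used here only to keep the sketch self-contained.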

In some embodiments, if the voice command only includes the device name, the server searches for the second terminal device corresponding to the device name according to the instruction distribution request and sends the voice command to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice command.

FIG. 3 is a schematic diagram of a scenario of a voice control system for a terminal device provided in an embodiment of the present application. As shown in FIG. 3, the user inputs voice commands such as "turn on device 3" and "shut down device 3" within the signal receiving range of device 1. These voice commands include only the device name "device 3". Because the device name of device 1 does not match the device name included in the voice command, device 1 cannot perform the operation corresponding to the voice command and sends an instruction distribution request to the server. The server searches for a terminal device with a matching name. It finds that the device name of device 3 matches the device name included in the voice command, and sends the voice command to device 3. After device 3 receives the voice command "turn on device 3", it performs the start operation in response. Likewise, after device 3 receives the voice command "shut down device 3", it performs the shutdown operation in response.

In some embodiments, after the server finds the second terminal device and sends the voice command to it, the second terminal device can also use its local capability filtering module to reconfirm whether it can perform the operation corresponding to the voice command. If the second terminal device confirms that it can perform the operation, it performs the operation in response to the voice command. If it finds that it cannot perform the operation, it can feed back an error signal to the server, so that the server searches again for a terminal device that can perform the operation corresponding to the voice command.
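The name-based search with reconfirmation can be sketched as a loop that moves on to the next candidate whenever a device reports an error. The function `dispatch_by_name`, the `registry` mapping and the `confirm` callback are hypothetical names chosen for this illustration; the application does not specify how the server iterates over candidates.

```python
def dispatch_by_name(instruction, device_name, registry, confirm):
    """Send `instruction` to a device called `device_name` and honor its reconfirmation.

    registry: mapping of device id -> device name (assumed server-side record).
    confirm:  callback(device_id, instruction) -> bool, standing in for the second
              terminal device's local capability filtering module.
    Returns the id of the device that accepted the instruction, or None.
    """
    for device_id, name in registry.items():
        if name != device_name:
            continue
        if confirm(device_id, instruction):
            return device_id
        # The device fed back an error signal: keep searching.
    return None


# Hypothetical registry in which two devices share the name "device 3".
registry = {"id-1": "device 1", "id-3a": "device 3", "id-3b": "device 3"}
accepted = dispatch_by_name(
    "turn on device 3", "device 3", registry,
    confirm=lambda device_id, instruction: device_id == "id-3b",
)
```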

In some embodiments, if the voice command only includes the device capability, the server searches for a second terminal device having the device capability according to the instruction distribution request and sends the voice command to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice command.

FIG. 4 is a schematic diagram of another scenario of a voice control system for a terminal device provided in an embodiment of the present application. As shown in FIG. 4, the user inputs voice commands such as "reduce the temperature", "adjust the temperature to 20 degrees", "increase temperature" and "increase wind speed" within the signal receiving range of the speaker device. These voice commands include only device capabilities. Because the local capability attribute parameter of the speaker device does not conform to the device capability parameter included in these voice commands, the speaker device cannot perform the corresponding operations. The speaker device sends an instruction distribution request to the server, and the server searches for a terminal device that meets the device capability parameter according to the instruction distribution request. Among the devices shown in FIG. 4, only the local capability attribute parameter of the air conditioner conforms to the device capability parameter. The server therefore sends the voice command to the air conditioner, and the air conditioner performs the corresponding operation in response after receiving it.

It should be noted that if the device capability parameters included in the voice command are met by multiple terminal devices, and the voice command only includes the device capability parameters, the server will send the voice command to the multiple terminal devices that meet the conditions. Multiple terminal devices perform corresponding operations in response to the voice instruction.

In another example, the user inputs "lower temperature" within the signal receiving range of the air conditioner, and the local capability filtering module of the air conditioner first judges, according to the local capability attribute parameters, that the air conditioner can perform the operation corresponding to the voice command. The air conditioner nevertheless also sends an instruction distribution request to the server. According to the device capability parameter carried in the instruction distribution request, the server searches for terminal devices other than the air conditioner that also meet the device capability parameter, that is, terminal devices that can perform the operation corresponding to the voice command "lower temperature". If the server finds that the refrigerator can also perform this operation, it sends the voice command "lower temperature" to the refrigerator, so that the refrigerator performs the corresponding operation in response. In this embodiment, the user can control multiple terminal devices at the same time with a single voice command.

It should also be noted that a single voice command may be able to control multiple terminal devices at the same time even though the user wants to control only one of them. In that case, the user can input a voice command that includes both the device name and the device capability parameter.

For example, if the user inputs "decrease the temperature of the air conditioner" within the receivable range of the air conditioner signal, the voice command includes the device name "air conditioner" and the device capability parameter "reduce the temperature". Then the air conditioner judges through the local capability filtering module that the local machine can perform the operation corresponding to the voice command, and at the same time, the device name of the air conditioner matches the device name carried in the voice command. Therefore, the air conditioner no longer sends an instruction distribution request to the server, and the server no longer searches for other terminal devices.
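The difference between a capability-only command and a command that also names a device can be sketched as a filter with an optional name argument. The table of devices and the function `resolve_targets` are assumptions made for this illustration, and device names are used directly as keys for brevity.

```python
from typing import List, Optional

# Hypothetical capability table for three devices in one home.
CAPABILITIES = {
    "speaker": {"play music"},
    "air conditioner": {"lower temperature", "raise temperature"},
    "refrigerator": {"lower temperature"},
}


def resolve_targets(capability: str, device_name: Optional[str] = None) -> List[str]:
    """Return the devices that should receive the voice command.

    With only a capability, every device that has it is a target.
    With a device name as well, the targets are narrowed to that device.
    """
    matches = [name for name, caps in CAPABILITIES.items() if capability in caps]
    if device_name is not None:
        return [name for name in matches if name == device_name]
    return matches


everyone = resolve_targets("lower temperature")
only_ac = resolve_targets("lower temperature", device_name="air conditioner")
```

Here `everyone` holds both the air conditioner and the refrigerator, while `only_ac` holds the air conditioner alone, mirroring the two cases described above.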

In some embodiments, if the voice instruction only carries custom rules, the server searches for a matching second terminal device according to the custom rules. In the custom rules, each voice instruction has a corresponding terminal device.

For example, FIG. 5 is a schematic diagram of another scenario of a voice control system for a terminal device provided in an embodiment of the present application. As shown in FIG. 5, the custom rules specify that device 2 gives priority to playing music, device 3 gives priority to playing movies and TV, device 4 gives priority to playing audio novels, and so on. That is, the instruction to play music corresponds to device 2, the instruction to play movies and TV corresponds to device 3, and the instruction to play audio novels corresponds to device 4. When the user inputs the voice command "play music" within the signal receiving range of device 1, the local capability filtering module of device 1 first judges that device 1 cannot perform the operation corresponding to the voice command, and device 1 then sends an instruction distribution request to the server. According to the custom rules, the server finds that the terminal device corresponding to the voice command "play music" is device 2, and sends the voice command to device 2.

When the user inputs the voice command "play video" within the signal receiving range of device 1, the local capability filtering module of device 1 first judges that device 1 cannot perform the operation corresponding to the voice command, and device 1 then sends an instruction distribution request to the server. According to the custom rules, the server finds that the terminal device corresponding to the voice command "play video" is device 3, and sends the voice command to device 3.
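Custom rules of this kind amount to a lookup table from voice command to terminal device. The sketch below assumes the rules are stored as exact command strings; the dictionary name, the function name and the string keys are hypothetical.

```python
# Hypothetical custom rules: voice command -> preferred terminal device.
CUSTOM_RULES = {
    "play music": "device 2",
    "play video": "device 3",
    "play audio novels": "device 4",
}


def route_by_custom_rule(voice_command):
    """Return the device assigned to the command, or None if no rule applies."""
    return CUSTOM_RULES.get(voice_command.strip().lower())
```

Returning `None` leaves room for the server to fall back to another search strategy, such as matching by device capability, when no custom rule covers the command.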

In some embodiments, the server includes a fusion capability rule database and an instruction distribution module. The local capability attribute parameters of all devices are stored in the fusion capability rule database. Operators can update a device's local capability attribute parameters in the fusion capability rule database. For example, if a terminal device has been updated to have a new capability, the corresponding local capability attribute parameter needs to be added for that device. In the fusion capability rule database, all devices are stored according to device name and device ID.

The instruction distribution module receives the instruction distribution request sent by the terminal device, and can parse the capability attribute parameter to be processed from the voice instruction carried in the instruction distribution request. Afterwards, the command distribution module searches the fusion capability rule database for the local capability attribute parameters that match the capability attribute parameters to be processed, so as to find the terminal device that can perform the corresponding operation of the voice command. In addition, the native capability filtering module of the terminal device may also search for native capability attribute parameters from the fusion capability rule database.
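The interplay of the fusion capability rule database and the instruction distribution module can be sketched with two small classes. Everything concrete here is an assumption made for illustration: the class and method names, the dictionary-shaped request with `from` and `instruction` keys, and the simplification that the voice instruction text is itself the pending capability attribute parameter.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceRecord:
    device_id: str
    device_name: str
    capabilities: set = field(default_factory=set)


class FusionCapabilityRuleDB:
    """Hypothetical store of local capability attribute parameters per device."""

    def __init__(self):
        self._records = {}  # device id -> DeviceRecord

    def upsert(self, device_id, device_name, capabilities=()):
        """Add a device, or add new capabilities to an existing device."""
        record = self._records.setdefault(device_id, DeviceRecord(device_id, device_name))
        record.device_name = device_name
        record.capabilities.update(capabilities)
        return record

    def devices_with(self, capability):
        return [r.device_id for r in self._records.values() if capability in r.capabilities]


class InstructionDistributionModule:
    """Hypothetical handler for instruction distribution requests."""

    def __init__(self, db):
        self.db = db

    def handle(self, request):
        # Assumption: the carried instruction is used directly as the pending parameter.
        pending = request["instruction"]
        return [d for d in self.db.devices_with(pending) if d != request.get("from")]


db = FusionCapabilityRuleDB()
db.upsert("tv-01", "TV", {"play video"})
db.upsert("spk-01", "speaker", {"play music"})
module = InstructionDistributionModule(db)
```

The `upsert` method doubles as the operator's update path: calling it again for an existing device ID adds the new capability without discarding the old ones.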

In some embodiments, when the user inputs vague voice commands, there may be multiple terminal devices that can perform operations corresponding to the voice commands. The vague voice command may be a vague device control command, a vague media asset playback command, and the like.

For example, in a family scenario, there may be multiple air conditioners. When the user enters the voice command "turn on the air conditioner in the living room", according to the device name rules and specific space rules, the voice command can be directly sent to the air conditioner in the living room, so that the air conditioner in the living room can be activated. When the user enters the voice command "turn on the air conditioner", according to the device name rules, both the air conditioner in the living room and the air conditioner in the bedroom can perform the corresponding operation of the voice command. Therefore other properties can be set to target more specific devices. For example, formulate time rules: turn on the air conditioner in the living room from 11:00 to 14:00, and turn on the air conditioner in the bedroom from 15:00 to 17:00. When the user inputs the voice command "turn on the air conditioner" at 12:00, the voice command is sent to the air conditioner in the living room according to the time rule, so that the air conditioner in the living room performs the start operation.
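The time rule in this example can be sketched as a list of time windows checked in order. The list layout and the function name are hypothetical, and treating both ends of each window as inclusive is an assumption, since the text does not say how boundary times are handled.

```python
from datetime import time

# Time rules from the example: (start, end, target device).
TIME_RULES = [
    (time(11, 0), time(14, 0), "living room air conditioner"),
    (time(15, 0), time(17, 0), "bedroom air conditioner"),
]


def pick_air_conditioner(now):
    """Return the air conditioner whose time window contains `now`, else None."""
    for start, end, device in TIME_RULES:
        if start <= now <= end:  # assumption: both boundaries are inclusive
            return device
    return None
```

Outside every window the function returns `None`, in which case the server would need some other rule to choose between the two air conditioners.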

Similarly, a home scene may contain multiple speakers. A rule can be formulated so that voice commands related to children's stories and nursery rhymes are sent to the speaker in the children's room.

In some embodiments, it is also possible to control the switching of the display devices in different spaces according to the playing time of the video programs that the user needs to play. For example, the user may specify in the rules that the news broadcast is watched in the living room. When the user inputs the voice command "play the news broadcast", the server sends the voice command to the display device in the living room, so that the display device in the living room plays the news broadcast.

In some embodiments, the voice command input by the user may include multiple matching items, for example the device name, the device response time period, the space where the device is located, the device capability parameters, and the like. Different terminal devices may each satisfy some of the matching items included in the voice command. For example, the voice command includes four matching items: the device name, the device response time period, the space where the device is located, and the device capability parameter. Device 1 satisfies the matching items device name and time period, and device 2 satisfies the matching items time period and device capability parameter. In this case, a weight value can be set for each matching item. For example, the weight value of the device name is 10, the weight value of the time period is 5, the weight value of the space is 3, and the weight value of the device capability parameter is 8. The total weight value of each device is calculated according to formula (1):

$$W = \sum_{i=1}^{n} a_i \tag{1}$$

where $a_i$ is the weight value of the $i$-th matching item that a terminal device satisfies, $n$ is the number of matching items that the terminal device satisfies, and $W$ is the total value of the weight attribute of that terminal device. The total value of the weight attribute of device 1 is 10 + 5 = 15, and the total value of the weight attribute of device 2 is 5 + 8 = 13. The total value of the weight attribute of device 1 is the largest, so device 1 is the optimal matching terminal device. Finally, the server sends the voice command to device 1, so that device 1 performs the corresponding operation in response to the voice command.
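The weight calculation can be sketched as follows. This is a minimal illustration, assuming that each device is described by the set of matching items it satisfies; the dictionary keys and the function names `total_weight` and `best_device` are hypothetical. With the stated weights, the items listed for device 1 sum to 10 + 5 = 15 and the items listed for device 2 (time period and capability parameter) sum to 5 + 8 = 13, so device 1 is selected.

```python
# Weight value of each matching item, as given in the example.
WEIGHTS = {"device_name": 10, "time_period": 5, "space": 3, "capability": 8}


def total_weight(matched_items, weights=WEIGHTS):
    """Sum the weight values of the matching items a device satisfies."""
    return sum(weights[item] for item in matched_items)


def best_device(candidates, weights=WEIGHTS):
    """Return the name of the candidate with the largest total weight."""
    return max(candidates, key=lambda name: total_weight(candidates[name], weights))


# Matching items satisfied by each device in the example.
candidates = {
    "device_1": {"device_name", "time_period"},
    "device_2": {"time_period", "capability"},
}

selected = best_device(candidates)
```

If two devices tie on the total, `max` returns the first one encountered; the application does not say how ties are broken, so a real implementation would need an explicit tie-breaking rule.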

It should be noted that the servers in this application can be divided into semantic servers and instruction distribution servers. The semantic server is used to recognize voice commands from voice signals input by users. The instruction distribution server stores a fusion capability rule database, which is used to search for terminal devices that can perform operations corresponding to voice instructions according to the instruction distribution request. The semantic server can be a web server, while the instruction distribution server is a local server. Since the local server has the advantage of fast response, the response speed of the entire voice control process can be improved.

Based on the above embodiments, the present application also provides a voice control method for a terminal device. FIG. 6 is a signaling diagram of a voice control method for a terminal device provided in an embodiment of the present application. As shown in FIG. 6, the method comprises the following steps:

S101: The microphone of the first terminal device receives a voice signal input by a user.

S102: The first terminal device sends the voice signal to the server.

S103: The server generates a voice command according to the voice signal, and feeds back the voice command to the first terminal device.

S104: The first terminal device judges whether it can itself perform the operation corresponding to the voice command.

S105: If the first terminal device can perform the operation corresponding to the voice command, it performs the corresponding operation in response to the voice command.

S106: If the first terminal device cannot perform the operation corresponding to the voice command, it sends an instruction distribution request to the server. The instruction distribution request carries the voice command.

S107: After receiving the instruction distribution request, the server searches for a second terminal device that can perform the operation corresponding to the voice instruction according to the instruction distribution request, and sends the voice instruction to the second terminal device.

S108: The second terminal device performs a corresponding operation in response to the voice instruction.
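Steps S101 to S108 can be sketched end to end as follows. This is a simplified illustration and not the implementation of the application: the `Terminal` and `Server` classes, the string-based `recognize` stand-in for speech recognition, and the representation of capabilities as plain strings are all assumptions.

```python
class Terminal:
    """A terminal device with a set of operations it can perform."""

    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = set(capabilities)
        self.executed = []

    def can_execute(self, instruction):
        return instruction in self.capabilities

    def execute(self, instruction):
        self.executed.append(instruction)


class Server:
    def __init__(self, terminals):
        self.terminals = terminals

    def recognize(self, voice_signal):
        # Stand-in for S103: real speech recognition is out of scope here.
        return voice_signal.strip().lower().replace(" ", "_")

    def distribute(self, instruction, requester):
        # S107: find a second terminal that can execute the instruction.
        for terminal in self.terminals:
            if terminal is not requester and terminal.can_execute(instruction):
                terminal.execute(instruction)  # S108
                return terminal
        return None


def handle_voice(first, server, voice_signal):
    instruction = server.recognize(voice_signal)  # S102-S103
    if first.can_execute(instruction):            # S104
        first.execute(instruction)                # S105
        return first
    return server.distribute(instruction, first)  # S106-S107


speaker = Terminal("speaker", {"play_music"})
tv = Terminal("tv", {"play_movie"})
server = Server([speaker, tv])
handler = handle_voice(speaker, server, "Play movie")  # handled by the TV
```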

In some embodiments, the specific process by which the first terminal device determines whether it can itself perform the operation corresponding to the voice command is as follows:

The first terminal device parses the to-be-processed capability attribute parameter from the voice command, and the native capability filtering module of the first terminal device acquires the native capability attribute parameters from the fusion capability rule database. The native capability attribute parameters are then matched against the to-be-processed capability attribute parameter. If they match, the first terminal device can perform the operation corresponding to the voice command. If they do not match, the first terminal device cannot perform the operation corresponding to the voice command.
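The local capability check can be sketched as a set-membership test. The contents of the rule database, the device identifiers and the `capability` field of the instruction are hypothetical; the application does not specify the storage format of the fusion capability rule database.

```python
# Hypothetical excerpt of a fusion capability rule database:
# device id -> native capability attribute parameters.
FUSION_CAPABILITY_RULES = {
    "smart_tv": {"play_video", "play_music", "adjust_volume"},
    "air_conditioner": {"power_on", "decrease_temperature"},
}


def can_execute_locally(device_id, instruction, rules=FUSION_CAPABILITY_RULES):
    """Match the to-be-processed capability against the native capabilities."""
    pending = instruction["capability"]    # parsed from the voice command
    native = rules.get(device_id, set())   # looked up by the filtering module
    return pending in native


instruction = {"capability": "decrease_temperature"}
tv_can = can_execute_locally("smart_tv", instruction)
ac_can = can_execute_locally("air_conditioner", instruction)
```

When the check fails on the first terminal device, the device would send the instruction distribution request described in S106.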

In some embodiments, if the voice instruction only carries the device name, the server searches for the second terminal device corresponding to the device name when searching for the second terminal device. For example, if the voice instruction is "turn on the speaker", the server searches for the speaker device according to the device name "speaker".

In some embodiments, if the voice instruction only carries the device capability parameter, the server searches for a second terminal device that has this device capability parameter. For example, if the voice command is "decrease temperature", the to-be-processed device capability parameter is recognized as "decrease temperature". The instruction distribution module of the server can search the fusion capability rule database for the native capability attribute parameter that matches the to-be-processed device capability parameter, and thereby find the terminal device that can perform the operation corresponding to the voice command.

In some embodiments, if the voice instruction corresponds to a custom rule, the server searches for the second terminal device corresponding to the custom rule. For example, in the scenario shown in FIG. 5, the custom rules include: device 2 gives priority to playing music, device 3 gives priority to playing videos, device 4 gives priority to playing audio novels, and so on. If the voice command input by the user is "play music", device 2 corresponds to the custom rule and is determined to be the second terminal device.
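The three lookups described above (by device name, by device capability parameter, and by custom rule) can be combined in one small sketch. The device table, the rule table and the choice to consult custom rules before the other lookups are assumptions made for illustration; the application does not define a priority order between them.

```python
# Hypothetical device registry and custom rules.
DEVICES = {
    "device_2": {"name": "speaker", "capabilities": {"play_music", "play_audio_novel"}},
    "device_3": {"name": "television", "capabilities": {"play_video", "play_music"}},
    "device_5": {"name": "air conditioner", "capabilities": {"decrease_temperature"}},
}
CUSTOM_RULES = {"play_music": "device_2", "play_video": "device_3"}


def find_second_device(instruction):
    """instruction: dict with optional 'device_name' and 'capability' keys."""
    name = instruction.get("device_name")
    capability = instruction.get("capability")
    if name is None and capability is None:
        return None
    # Assumption: a custom rule for the capability is consulted first.
    if capability in CUSTOM_RULES:
        return CUSTOM_RULES[capability]
    for device_id, info in DEVICES.items():
        if name is not None and info["name"] != name:
            continue
        if capability is not None and capability not in info["capabilities"]:
            continue
        return device_id
    return None


by_name = find_second_device({"device_name": "speaker"})
by_capability = find_second_device({"capability": "decrease_temperature"})
by_rule = find_second_device({"capability": "play_music"})
```

Here "play music" is supported by both device 2 and device 3, and the custom rule selects device 2.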

In some embodiments, if the voice instruction includes at least two matching items, each matching item is assigned a weight attribute value. If, when searching according to the matching items, the server finds at least two terminal devices that each satisfy at least one matching item, the server calculates for each of these terminal devices the total weight attribute value of all the matching items it satisfies, that is, the sum of the weight values. The terminal device with the largest total weight attribute value is determined to be the second terminal device.

The voice control system in the embodiment of the present application is a network system established based on a specific area network and based on a unified control service. The voice control system may include a plurality of terminal devices 200 that establish communication connections with each other. Multiple terminal devices 200 can realize the communication connection relationship between the devices by accessing the same local area network. A plurality of terminal devices 200 can also directly form a point-to-point network through a unified communication protocol to realize a communication connection. For example, multiple terminal devices 200 may communicate with each other by connecting to the same wireless local area network. For another example, one terminal device 200 may also establish communication connections with other multiple terminal devices 200 through Bluetooth, infrared, cellular network, power carrier communication and other means.

Wherein, the terminal device 200 refers to a device having a communication function, capable of receiving, sending, and executing control instructions and realizing specific functions. The terminal device 200 includes, but is not limited to, a smart display device, a smart terminal, a smart home appliance, a smart gateway, a smart lighting device, a smart audio device, a game device, and the like. The multiple terminal devices 200 constituting the voice control system may be of the same type or of different types. For example, as shown in FIG. 7 , in the same voice control system, smart TVs, smart speakers, smart refrigerators, multiple smart lamps, etc. may be included. These terminal devices 200 may be distributed in different locations, so as to meet usage requirements at corresponding locations.

It should be noted that the voice control system described in this application does not limit the scope of application of the solution to be protected in this application. That is, in practical applications, the server, terminal device and voice control method provided by this application are not limited to the field of smart home. They are equally applicable to other systems that support intelligent voice control, such as smart office systems, smart service systems, smart management systems, and industrial production systems.

The terminal device 200 has a specific hardware configuration according to its actual functions. As shown in FIG. 8, taking a display device as an example, a terminal device 200 with a display function may include at least one of a tuner-demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.

In some embodiments, the controller 250 includes a CPU, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and a first interface to an nth interface for input/output.

In some embodiments, the display 260 includes a display screen component for presenting images and a drive component for driving image display. The display 260 receives image signals output from the controller and displays video content, image content, menu manipulation interface components, user manipulation UI interfaces, and the like.

In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.

In some embodiments, the tuner-demodulator 210 receives broadcast TV signals through wired or wireless reception, and demodulates audio and video signals, as well as data signals such as EPG data signals, from multiple wireless or cable broadcast TV signals.

In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of the following: a high-definition multimedia interface (HDMI), an analog or digital high-definition component input interface (component), a composite video input interface (CVBS), a USB input interface (USB), an RGB port, and the like. It may also be a composite input/output interface formed by several of the above interfaces.

In some embodiments, the controller 250 controls the work of the smart device and responds to user operations through various software control programs stored in the memory. The controller 250 controls overall operations of the terminal device 200 . For example, in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.

In some embodiments, the user can input user commands through a graphical user interface (GUI) displayed on the display 260, and the user input interface receives user input commands through the graphical user interface (GUI). Alternatively, the user may input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.

In some embodiments, the terminal device 200 also performs data communication with the server 400 . The terminal device 200 may be allowed to communicate via a local area network (LAN), a wireless local area network (WLAN), and other networks. The server 400 may provide various contents and interactions to the terminal device 200 . The server 400 may be one cluster, or multiple clusters, and may include one or more types of server groups.

In some embodiments, the terminal device 200-1 may have a built-in voice control system to support the user's intelligent voice control. Intelligent voice control refers to an interactive process in which the user operates the terminal device 200-1 by inputting voice and audio data. To implement intelligent voice control, the terminal device 200-1 may include an audio input device and an audio output device. The audio input device is used to collect voice and audio data input by the user, and may be a built-in or external microphone device of the terminal device 200-1. The audio output device is used to emit sound to play the voice response. For example, as shown in FIG. 9, when the user inputs a wake-up word such as "Hi! Little ×" through the audio input device, the terminal device 200-1 can play a voice response of "I'm here" through the audio output device to guide the user to complete the follow-up voice input.

In some embodiments, the built-in intelligent voice system of the terminal device 200 also supports a "one-shot" mode, in which a single utterance carries both the wake-up word and the command. In this mode, the user can realize the control function directly with fewer voice inputs. For example, in the traditional mode, if the user wants to control the terminal device 200 to play movie resources, the user first inputs the voice "Hi, X", then inputs "I want to watch a movie" after the terminal device 200 feeds back "I'm here", and the terminal device 200 then feeds back "the following movies have been found for you". In the "one-shot" mode, the user can directly input "Hi! X, I want to watch a movie", and the terminal device 200 directly feeds back "the following movies have been found for you" after receiving the voice command, which reduces the number of voice interactions and improves voice interaction efficiency.

For multiple terminal devices 200 in the same voice control system, the user can control the linkage of multiple devices through intelligent voice. For example, the user can input a voice command "turn on the bedroom light" through the smart speaker. The smart speaker responds to the voice command by generating a control command for turning on the light, and then sends the control command to the lamp named "bedroom" in the voice control system, so as to turn on the bedroom light. At the same time, the smart speaker also responds to the user's voice input, that is, it plays feedback voice content such as "the bedroom light has been turned on for you".

During linkage control between multiple terminal devices 200, the control command can be transmitted directly to the controlled device by the terminal device 200-1 that receives the user's voice and audio data, or can be transmitted by the terminal device 200-1 to a specific relay device such as a router and then passed to the controlled device by the relay device. In some embodiments, the control instruction may also be transmitted to the controlled device through the server 400. For example, when the user controls a terminal device 200 in the voice control system through the smart terminal 300 from outside the local area network where the smart home is located, the smart terminal 300 can first send the control command to the server 400, and the server 400 then transmits the control command to the terminal device 200 to control it.

In order to control the terminal devices 200 in the voice control system, the server 400 can issue control instructions and related data to any terminal device 200 independently. For example, for a display device, the user can control the display device to request online playback of media assets through interactive operations, and the server 400 can feed back media asset data to the display device according to the playback request. For linkage control of multiple terminal devices 200, the server 400 can send control instructions and related data to the voice control system in a unified manner. For example, when the user turns on the lamp in the bedroom through the smart speaker, the smart speaker can send the control command input by the user to the server 400, and the server 400 sends feedback data to the voice control system, so that the voice control system sends a turn-on command to the bedroom lamp while the control response is fed back to the smart speaker.

Some terminal devices 200 in the voice control system can have a complete built-in voice control system. This type of terminal device 200 can serve as the main control device: it can receive, process and respond to voice and audio independently, and can send the control instructions corresponding to the voice and audio to other terminal devices 200. For example, a complete voice control system may be built into terminal devices 200 such as a display device, a smart speaker, and a smart refrigerator, so as to receive voice and audio input by a user. Other terminal devices 200 in the voice control system may not have a complete built-in intelligent voice system, and only serve as controlled devices that receive control instructions sent by the main control device. For example, smart devices such as lamps and small household appliances can receive control instructions from a display device acting as the main control device, so as to start, stop or change operating parameters.

As the number of terminal devices supporting a complete voice control system increases, the same voice control system may include multiple terminal devices that support voice control. For example, a smart TV, a smart speaker, and a smart refrigerator are set in the same room, and these terminal devices 200 all have complete built-in voice control systems and can respond to voice commands input by users. However, different terminal devices 200 that support a complete voice control system respond to voice commands in different ways and support different types of voice commands. For example, as shown in FIG. 10, for the voice command "I want to watch a movie" input by the user, the smart TV can respond by displaying a list of movies and feeding back the voice content "the following movies have been found for you". The smart speaker and the smart refrigerator cannot respond, so they feed back the voice content "I can't understand what you are saying".

It can be seen that since the current voice control system includes multiple terminal devices 200 capable of supporting voice control, for the same voice command, multiple terminal devices 200 may wake up at the same time or by mistake, resulting in scene confusion and seriously affecting the user experience.

In order to alleviate the problem of scene confusion, in some implementations, the user can define a response device through the application program in the smart terminal 300 according to usage habits, and freely switch between different wake-up strategies. For example, the user can manually set the smart speaker as the main response device. The voice command input by the user is then responded to by the smart speaker, and control commands are sent to other terminal devices 200 through the smart speaker, so as to realize intelligent voice control of the terminal devices in the entire voice control system.

However, controlling the wake-up strategy in a user-defined manner requires the user to perform multiple manual switching operations, which is not intelligent enough. Moreover, no matter which wake-up strategy is switched to, the current execution process of multi-device wake-up determines which terminal device is woken up through communication between the devices to be woken up. This execution method carries considerable risk. On the one hand, when there are a large number of devices to be woken up, the wake-up process requires information interaction between the terminal devices 200, so it cannot be guaranteed that all terminal devices 200 complete the information interaction within the specified time, which causes abnormal responses. On the other hand, different types of terminal devices 200 have different wake-up delays, that is, different times from wake-up to response. It therefore cannot be guaranteed that different types of terminal devices 200 are all within the device information interaction time period when the wake-up decision is made. A terminal device 200 with a longer wake-up delay may not yet have received the wake-up word during the device information interaction and so misses the time for device information interaction. As a result, the terminal device 200 cannot respond to the voice, and voice control becomes abnormal.

In order to alleviate the problem of abnormal voice control, some embodiments of the present application provide a voice control method, which can be applied to a voice control system. The voice control system includes a server 400 and multiple terminal devices 200. The server 400 should at least include a storage module 410, a communication module 420 and a control module 430. The storage module 410 is configured to store the device status reported by the terminal device 200. The communication module 420 is configured to establish a communication connection with a plurality of terminal devices 200 to obtain device statuses reported by the terminal devices 200 and to issue control instructions and related data to the plurality of terminal devices 200. The control module 430 is configured to execute the program steps on the side of the server 400 in the voice control method, so as to issue a response command or a silence command to different terminal devices 200.

Similarly, in order to implement the voice control method, the terminal device 200 in the voice control system should at least include an audio input device, an audio output device, a communicator 220 and a controller 250. The audio input device is configured to detect voice and audio data input by the user. The audio output device is configured to play the voice response. The communicator 220 is configured to establish a communication connection with the server 400, so as to report the status of the device to the server 400 and receive a response instruction or a silence instruction issued by the server 400. The controller 250 is configured to execute the program steps on the terminal device 200 side in the voice control method, so as to complete the response of the intelligent voice control process.

As shown in Figure 11 and Figure 12, the voice control method includes the following contents:

The terminal device 200 acquires voice and audio data input by the user. When the user is in the environment of the voice control system, the user can perform voice input in real time. The built-in audio input device of the terminal device 200 converts the voice signal input by the user into an electrical signal, which undergoes a series of signal processing steps such as noise reduction, amplification, encoding and conversion to obtain the voice and audio data. During voice interaction, the user can input voice and audio data in various ways. In some embodiments, the user can input voice and audio data through the built-in audio input device of the terminal device 200. For example, the user can input the voice "Hi! Xiao X, I want to watch a movie" through the built-in microphone device on the terminal device 200, and the microphone converts the voice signal into an electrical signal and transmits it to the controller 250 for subsequent processing.

In order to trigger the terminal device 200 to perform intelligent voice control, in some embodiments, the user may also include a specific wake-up word in the input voice command. The wake-up word is a piece of speech containing specific content, such as "Hi! Xiao×", "Xiao×xiao×", "Hey!××" and so on. When the user inputs voice and audio data, especially through the far-field microphone built into the terminal device 200, the terminal device 200 can first judge whether the voice input by the user contains a wake-up word and only then perform subsequent processing, so as to reduce false triggering of the intelligent voice control process.
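The wake-up word check can be illustrated on already-transcribed text. This is a simplified sketch: real wake-up word detection normally runs on the audio signal, and the wake-up words, the function name `split_wake_word` and the punctuation handling below are placeholders.

```python
# Placeholder wake-up words, lower-cased for comparison.
WAKE_WORDS = ("hi! xiao x", "hey! xx")


def split_wake_word(text, wake_words=WAKE_WORDS):
    """Return (woken, remaining_command) for a transcribed utterance."""
    normalized = text.strip().lower()
    for wake in wake_words:
        if normalized.startswith(wake):
            # Whatever follows the wake-up word is the command ("one-shot").
            rest = normalized[len(wake):].lstrip(" ,.!")
            return True, rest
    return False, ""


woken, command = split_wake_word("Hi! Xiao X, I want to watch a movie")
ignored = split_wake_word("I want to watch a movie")
```

An utterance without a wake-up word is ignored, which corresponds to not triggering the intelligent voice control process.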

According to the transmission characteristics of sound signals, the user's voice usually undergoes less volume attenuation and travels a shorter distance before it reaches a terminal device 200 that is closer to the user, so after the user speaks, the terminal device 200 closer to the user detects the user's voice and audio data first. However, since the specific content of the voice input by the user differs from one situation to another, the terminal device 200 that should respond to the voice is uncertain: it may be a device closer to the user or a device farther away from the user. For example, when a user inputs the voice "Hi! X, I want to watch a movie" in the bedroom, the smart speaker in the bedroom detects the voice and audio data first, but the smart speaker does not have a video playback function, whereas the smart TV in the living room does.

Therefore, in order to respond to the current user's voice, after acquiring the voice and audio data input by the user, the terminal device 200 will generate a voice instruction according to the voice and audio data. Among them, the voice command is a control command, which has a specific command format, including control action functions, control object codes, and the like. After the terminal device 200 receives the voice and audio data, the terminal device 200 can first convert the voice and audio data into text through the voice processing module in the intelligent voice system, that is, convert the waveform data in the voice and audio data into text data.

After being converted into text data, the terminal device 200 can use a word segmentation tool to convert unstructured text data into structured text data. That is, the terminal device 200 can remove meaningless text content such as modal particles and auxiliary words in the text data by means of thesaurus matching, retain keywords in the text data, and separate multiple keywords according to word meanings to obtain structured text.
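The conversion from unstructured to structured text can be sketched as tokenization followed by removal of words found in a stop-word list. The stop-word list is hypothetical, and the regular-expression tokenizer below only suits space-delimited text such as the English translation; Chinese text would require a dedicated word segmentation tool.

```python
import re

# Hypothetical list of words that carry no control meaning.
STOP_WORDS = {"i", "to", "a", "the", "please", "want", "hi"}


def structure_text(text, stop_words=STOP_WORDS):
    """Split text into tokens and keep only the keywords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


keywords = structure_text("I want to watch a movie")
```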

After obtaining the structured text data, the terminal device 200 may also input the structured text into the word processing model. The word processing model is an artificial intelligence model based on machine learning. After the text data is input, the word processing model can calculate the classification probability that the text information belongs to a specific semantic meaning. Therefore, by using various standard control instructions as classification labels, the word processing model can output the classification probability of the text data for each standard control instruction, where the standard control instruction with the highest classification probability is the control instruction corresponding to the voice and audio data.
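Selecting the control instruction with the highest classification probability can be sketched as a softmax followed by an argmax. The raw scores below are made-up numbers standing in for the output of a trained model, and the instruction labels are illustrative.

```python
import math


def softmax(scores):
    """Convert raw per-label scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def pick_instruction(scores):
    """Return the most probable standard control instruction and all probabilities."""
    probabilities = softmax(scores)
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


# Made-up model scores for one utterance.
raw_scores = {"music_play": 2.0, "movie_play": 0.5, "light_power_on": -1.0}
best_label, probabilities = pick_instruction(raw_scores)
```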

The word processing model can be obtained by repeatedly training an initial model with sample data and set input and output rules. The sample data is labeled text information. During model training, the sample data is used as the input and the classification probability as the output. The output result is compared with the label in the sample data to obtain the training error, and the training error is then backpropagated, that is, the model parameters are adjusted according to the training error. After a large amount of sample data has been input repeatedly, a word processing model that outputs accurate recognition results is obtained.
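The training procedure can be illustrated with a deliberately small stand-in model. The sketch below trains a single-layer softmax classifier over keyword features with plain gradient descent; the application does not specify the model architecture, and the three labeled samples, the learning rate and the epoch count are invented for illustration.

```python
import math


def train(samples, labels, epochs=200, lr=0.5):
    """Fit one weight per (label, word) by gradient descent on cross-entropy."""
    vocab = sorted({word for words, _ in samples for word in words})
    weights = {label: {word: 0.0 for word in vocab} for label in labels}
    for _ in range(epochs):
        for words, target in samples:
            scores = {l: sum(weights[l][w] for w in words) for l in labels}
            peak = max(scores.values())
            exps = {l: math.exp(s - peak) for l, s in scores.items()}
            total = sum(exps.values())
            for l in labels:
                # Training error: predicted probability minus the label (1 or 0).
                error = exps[l] / total - (1.0 if l == target else 0.0)
                for w in words:
                    weights[l][w] -= lr * error  # adjust parameters by the error
    return weights


def predict(weights, words):
    """Return the label with the highest score; unknown words score 0."""
    scores = {l: sum(ws.get(w, 0.0) for w in words) for l, ws in weights.items()}
    return max(scores, key=scores.get)


# Invented labeled samples: (keywords, standard control instruction).
SAMPLES = [
    (["watch", "movie"], "movie_play"),
    (["listen", "music"], "music_play"),
    (["turn", "on", "light"], "light_power_on"),
]
LABELS = ["movie_play", "music_play", "light_power_on"]

model = train(SAMPLES, LABELS)
prediction = predict(model, ["watch", "movie"])
```

A production model would use far more data and a deeper network, where the error is propagated backwards through several layers rather than one.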

After the model calculation, the terminal device 200 can convert the voice and audio data input by the user into voice instructions. After conversion by the terminal device 200, the controlled device or the server 400 can directly process the voice command after receiving the voice command, such as executing control actions according to the voice command and extracting service requirement information from the voice command.

Of course, in some embodiments, the terminal device 200 can directly send the voice and audio data as the voice command. That is, a terminal device 200 with low data processing capability, or without a complete built-in voice control system, can directly forward the voice and audio data, and the server 400 or another terminal device 200 performs the language processing, so as to reduce the computing load of the current terminal device 200.

After generating the voice command, the terminal device 200 may send the voice command to the server 400 to trigger the server 400 to control the wake-up process of multiple terminal devices 200. It should be noted that the voice control system may include multiple terminal devices 200 with built-in voice control systems, so when the user inputs voice, several terminal devices 200 in the voice control system may all detect the voice and audio data. In order to avoid repeated data transmission, the server 400 may suspend the voice command generation process and the voice command reporting process in the other terminal devices 200 after receiving a voice command.

For example, after the smart TV sends a voice command to the server 400, the server 400 can send a control command for suspending command generation and command transmission to the smart speakers and smart refrigerators in the voice control system where the smart TV is located. After receiving the control command, both the smart speaker and the smart refrigerator stop generating and sending voice commands. Since a terminal device 200 with higher data processing capability can usually complete the voice and audio data calculation in a shorter time, it can complete the generation of the voice command before other devices. Therefore, stopping the voice command generation and reporting process of the other terminal devices 200 once the first voice command has been received also shortens the voice command generation time and improves the voice response speed.
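The "first command wins" behaviour on the server side can be sketched as follows. The class name `WakeCoordinator`, its fields and the device identifiers are hypothetical; the sketch only records which devices would receive the suspend command and does not model the actual network messages.

```python
class WakeCoordinator:
    """Accept the first reported voice command and suspend the other devices."""

    def __init__(self, device_ids):
        self.device_ids = set(device_ids)
        self.first_reporter = None
        self.command = None
        self.suspended = set()

    def on_voice_command(self, device_id, command):
        """Return True if this report is accepted, False if it is a duplicate."""
        if self.first_reporter is not None:
            return False
        self.first_reporter = device_id
        self.command = command
        # Every other device is told to stop generating and reporting commands.
        self.suspended = self.device_ids - {device_id}
        return True


coordinator = WakeCoordinator(["smart_tv", "smart_speaker", "smart_refrigerator"])
accepted = coordinator.on_voice_command("smart_tv", "movie_play")
duplicate = coordinator.on_voice_command("smart_speaker", "movie_play")
```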

After receiving the voice command, the server 400 may analyze the service requirement information in the voice command. For different voice commands input by the user, the control content contained therein is also different, so they have different service requirements. For example, when the user inputs the voice "Hi! X, I want to listen to music", after processing by the terminal device 200, a voice command is generated, and the voice command includes the service requirement of "play music" (music_play). When the user enters the voice "Hi! X, turn on the bedroom light", a voice command containing the business requirement of "turn on the lamp" (light_power on) is generated.

Clearly, when the voice instruction contains service requirement information, the server 400 can directly extract the service requirement information from the summary of the voice instruction. When the voice command is the voice audio data uploaded by the terminal device 200, the server 400 can also recognize and process that voice audio data in the same way as the terminal device 200 processes voice audio data in the above-mentioned embodiment. In other words, the server 400 can also recognize the voice audio data through built-in speech-to-text tools, text structuring tools, and word processing models, so as to identify the service requirement information in it.

To make it easier for the server 400 to parse the service requirement information from voice instructions, in some embodiments, a service requirement recognition model can be provided, or the output classes of the above-mentioned word processing model can be set to service requirements, so that the model computes, for the user's voice audio data, the classification probability of each service requirement.

It should be noted that, since the voice content input by the user may contain multiple user intentions, multiple service requirements may also be parsed from the corresponding voice instruction. For example, if the user inputs the voice "Hi! X, turn on the light in the living room and play a movie", the two service requirements "turn on the light" and "play a movie" can be parsed from the voice command. In addition, the voice control system can realize richer voice interaction functions by presetting a richer instruction set, and the service requirements contained in a voice command can then be determined according to that instruction set. For example, if the user inputs the voice "Hi! X, turn on theater mode", the voice control system can determine the control content according to the "theater mode" instruction set, which includes playing a movie and turning off the lights at the same time so as to imitate the atmosphere of a movie theater. Therefore, the server 400 can parse the two service requirements "turn off the lamp" and "play a movie" from the voice command.
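The expansion of one utterance into several service requirements can be illustrated with a minimal Python sketch. The table names, the requirement identifiers other than music_play, and the substring matching are all assumptions made for illustration; they stand in for the recognition model and instruction set, whose concrete form the application does not specify.

```python
# Hypothetical instruction set: a named mode expands to several service requirements.
INSTRUCTION_SETS = {
    "theater mode": ["light_power_off", "movie_play"],
}

# Hypothetical phrase table standing in for the service requirement recognition model.
PHRASES = {
    "turn on the light": "light_power_on",
    "play a movie": "movie_play",
    "listen to music": "music_play",
}


def parse_requirements(text: str) -> list[str]:
    """Return every service requirement found in the recognized text."""
    lowered = text.lower()
    # A preset mode takes priority and expands to its full requirement list.
    for mode, requirements in INSTRUCTION_SETS.items():
        if mode in lowered:
            return list(requirements)
    # Otherwise collect each requirement whose phrase occurs in the text.
    return [req for phrase, req in PHRASES.items() if phrase in lowered]
```

A real system would replace the substring lookup with the classification probabilities produced by the model; the sketch only shows that one voice command may yield more than one requirement.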

Different service requirements correspond to different control operations performed by the terminal device 200, and require different device states of the terminal device 200 that is to respond to the voice command. For example, a lamp supports on/off control, brightness adjustment, and similar controls only when it is in the standby state; when the user cuts the lamp's power supply at the wall switch and takes it offline, it cannot support such controls.

Therefore, the terminal device 200 can report its device status to the server 400 according to a predetermined information reporting strategy. In some embodiments, the terminal device 200 may report the current device status to the server 400 at fixed intervals determined by the data update frequency, and the server 400 may update the stored device status according to the status reported by the terminal device 200.

For example, the server 400 may send a heartbeat command to the terminal device 200, and the terminal device 200 may feed back its current device status to the server 400 after receiving the heartbeat command, so that the server 400 can update the stored device status. If the server 400 sends a heartbeat command to the terminal device 200 and the terminal device 200 does not respond to it within a preset period, the server 400 may update the corresponding device status to an offline status.
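The heartbeat bookkeeping described above might look like the following Python sketch. The class and method names, the status strings, and the use of plain numeric timestamps are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    status: str        # last status fed back by the device
    last_reply: float  # time of the last heartbeat reply, in seconds


class StatusStore:
    """Hypothetical server-side store of device statuses."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s  # the preset period
        self.records: dict[str, DeviceRecord] = {}

    def on_heartbeat_reply(self, device_id: str, status: str, now: float) -> None:
        # The device answered the heartbeat: store its current status.
        self.records[device_id] = DeviceRecord(status, now)

    def sweep(self, now: float) -> None:
        # Any device silent for longer than the preset period becomes offline.
        for record in self.records.values():
            if now - record.last_reply > self.timeout_s:
                record.status = "offline"
```

Passing the current time in as an argument keeps the sketch deterministic; a deployment would more likely read a monotonic clock.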

To ensure that the device state used in the voice interaction process is current, in some embodiments, reporting of the device state of the terminal device 200 may also be triggered by a voice command. That is, the server 400 may acquire the voice audio data corresponding to the voice command and recognize the wake-up word in that data. If the voice audio data includes the wake-up word, the server 400 locates the voice control system to which the terminal device 200 belongs and sends a status acquisition request to that voice control system. All terminal devices 200 in the voice control system may report their device status after receiving the status acquisition request.

For example, when the terminal device 200 reports voice audio data, the server 400 can recognize the wake-up word "Hi! Xiao×" in the voice audio data. After recognizing the wake-up word, the server 400 can determine the voice control system currently used by the user, namely "××'s home system", according to the identification information of the terminal device 200. This voice control system has a smart TV, speaker A, and speaker B in the living room; a lamp and speaker C in the bedroom; and a smart refrigerator in the kitchen. The server 400 then sends a status acquisition request to the voice control system, so that the TV, speaker A, speaker B, lamp, speaker C, and smart refrigerator in the voice control system report their current device status.

After obtaining the service requirement information and the device status reported by the terminal device 200, the server 400 may screen out the second terminal device according to the service requirement information and the device status information. The second terminal device is a smart device whose device status allows it to fulfill the service requirement.

Whether a terminal device 200 can meet a service requirement depends on specific preconditions, such as device type and device status. Therefore, when screening for the second terminal device, the server 400 can perform multi-level screening of the terminal devices 200 in the current voice control system according to the different preconditions. For example, if the user inputs the voice "Hi! Little ×, turn on the light", the corresponding service requirement is "turn on the light". The preconditions for fulfilling this service requirement are that the device type is a lamp and the device status is standby. The server 400 can first filter out, by device type, all terminal devices 200 in the current voice control system whose type is a lamp, and then filter out, by device status, the lamps that are in the standby state as the second terminal device.
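The two-level screening in this example can be sketched in Python as follows. The Device fields, the PRECONDITIONS table, and the requirement identifier are hypothetical names chosen for illustration, not structures defined by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    device_id: str
    device_type: str
    status: str


# Hypothetical precondition table: requirement -> required type and status.
PRECONDITIONS = {
    "light_power_on": {"device_type": "lamp", "status": "standby"},
}


def screen(devices: list[Device], requirement: str) -> list[Device]:
    """Screen first by device type, then by device status."""
    pre = PRECONDITIONS[requirement]
    # Level 1: keep only devices of the required type.
    by_type = [d for d in devices if d.device_type == pre["device_type"]]
    # Level 2: of those, keep only devices in the required state.
    return [d for d in by_type if d.status == pre["status"]]
```

Filtering in two passes mirrors the order given in the text; a single combined condition would give the same result.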

After the second terminal device is screened out, the server 400 sends a response command to the terminal device 200 serving as the second terminal device, which can respond to the voice control function by running the response command. At the same time, the server 400 also sends a silent command to the terminal devices 200 in the current voice control system other than the second terminal device, so that these devices run the silent command and do not respond to the voice control function.

For example, if the user inputs the voice "Hi! X, turn on the light", the terminal devices 200 supporting voice interaction in the home environment report the received voice command and their device status to the server 400; that is, the voice command "Turn on the light" and the device status (standby) are reported to the server 400. After receiving the voice command, the server 400 can determine that the current device status of a lamp matches the object category of the service requirement in the current user voice command. Therefore, the server 400 can issue a response command for waking up the lamp and at the same time issue a silent command to the other devices, so that each device executes its command: the lamp that meets the service requirement and device status is turned on, while the other terminal devices 200 that do not meet them remain silent.
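The command distribution step can be illustrated with a minimal Python sketch. The function name and the "response" and "silent" labels are assumptions standing in for the response command and silent command.

```python
def build_commands(all_ids: list[str], second_ids: list[str]) -> dict[str, str]:
    """Assign a response command to each second terminal device, silent to the rest."""
    targets = set(second_ids)
    return {
        device_id: "response" if device_id in targets else "silent"
        for device_id in all_ids
    }
```

Every device in the voice control system receives exactly one command, so no device is left waiting for an instruction.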

It can be seen from the above that the voice control method provided in the above embodiment can use the service requirement information contained in the voice command and the device status reported by the terminal devices 200 to screen out the second terminal device capable of responding to the voice command in the current voice control system, send a response command to the second terminal device, and send a silent command to the other devices at the same time. In this way, after the intelligent voice system receives the voice command input by the user, each terminal device 200 exchanges information only with the server 400, and the second terminal device is determined automatically by the server 400. This reduces data interaction between the multiple terminal devices 200 and alleviates the low execution efficiency caused by frequent communication between multiple devices.

When the user inputs voice, the executing device may be explicitly specified in the voice instruction. For example, if the voice content input by the user is "turn on the TV", the executing device is specified as the TV. In this case, since there is an explicit executing device, the server 400 can transmit the voice command directly to the TV, and the executing device can be determined without parsing the service requirement information. Therefore, in some embodiments, after the server 400 receives the voice command reported by the terminal device 200-1, it may also check whether the voice command specifies an executing device. If no executing device is specified in the voice instruction, the second terminal device is screened out by parsing the service requirement information and matching it against the device status in the manner provided in the above-mentioned embodiment.

If the execution device is specified in the voice instruction, that is, the identification information of the execution device is included, the control command and feedback voice information can be generated according to the voice instruction. Wherein, the control command is a command corresponding to the voice command and oriented to the execution device. For example, for the "turn on the TV" voice input by the user, the corresponding generated control command is "TV_power on". Feedback voice information is a kind of voice audio sent out for the voice content, which is used to remind the user of the execution result of the instruction. For example, when the user enters the voice "Turn on the TV", the intelligent voice system will play the feedback voice information of "Turn on the TV for you" after turning on the TV.

The control command and the feedback voice information can be sent to a specific terminal device 200, so that the service corresponding to the control command is implemented by executing the control command, and the user is informed of the service execution result by playing the feedback voice information. Both the control command and the feedback voice information can act on the executing device. For example, when the user inputs the voice "turn on the TV", the TV responds by powering on and starting up, and at the same time the voice feedback "Turn on the TV for you" is played through the TV's intelligent voice system and speakers.

However, the executing device may be located far from the user. If the feedback voice information is played through the executing device in that case, the user may not hear it clearly because of the distance, and so cannot learn the control result of the voice interaction. Moreover, when there are multiple intelligent voice control devices at home, users often do not care which device is woken up to feed back the execution result. For this reason, in some embodiments, the server 400 may send the control command and the feedback voice information to different terminal devices 200. That is, the server 400 may send the control command to the executing device according to the identification information of the executing device, and send the feedback voice information to the smart device through which the voice command was input.

For example, when the user utters the voice "turn on the TV" in the bedroom, the smart air conditioner with an intelligent voice system in the bedroom is the first to detect the voice audio data; it generates a voice command and sends it to the server 400. The server 400 can determine from the voice command that the executing device is the TV, and generates the control command "TV_power on" and the feedback voice information "Turn on the TV for you". It then sends the control command to the TV in the living room to turn on the TV, and sends the feedback voice information to the smart air conditioner, so that the voice feedback "Turn on the TV for you" is played through the smart air conditioner in the bedroom.
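Splitting the two outputs between devices can be sketched in Python as below. The message layout (a dictionary with "type" and "body" keys) and the device identifiers are assumptions for illustration; only the "TV_power on" command string comes from the text.

```python
def route(executing_device: str, input_device: str,
          control_command: str, feedback_text: str) -> list[tuple[str, dict]]:
    """Return (destination, message) pairs for one voice command."""
    return [
        # The control command goes to the device named in the voice command.
        (executing_device, {"type": "control", "body": control_command}),
        # The feedback voice goes to the device that picked up the user's voice.
        (input_device, {"type": "feedback", "body": feedback_text}),
    ]
```

When the executing device and the input device are the same, both messages simply share one destination, which matches the single-device case described earlier.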

It can be seen that in the above embodiment, when there is a clear execution device in the voice command, the voice command can be responded to by the execution device and the terminal device 200 that inputs the voice command, so as to meet the business needs and give the user better feedback effect.

Since the voice control system may contain multiple terminal devices 200, and different terminal devices 200 may support the same service requirement and be in the same device state at the same time, screening the devices by the methods in the above-mentioned embodiments may yield multiple second terminal devices. If the server 400 then directly sends a response command to every terminal device 200 serving as a second terminal device, multiple terminal devices 200 will respond to one voice command at the same time, and the problem of scene confusion remains.

In this regard, the server 400 may perform a finer screening process by adding screening conditions, so as to reduce the number of terminal devices 200 serving as second terminal devices. That is, in some embodiments, the service requirement information may further include a service type and a service status. When the server 400 screens for the second terminal device according to the service requirement information, it may extract the service type and service status from the service requirement information and match, among the reported devices, the candidate devices that meet the service type, where a candidate device is a device of the device type required by the service type. The server 400 then traverses the device states of the candidate devices to filter out the second terminal device whose device state conforms to the service status.

For example, if the user inputs the voice "Hi! X, turn off the music", the terminal devices 200 in the home environment report the received voice command together with their current device type and device status to the cloud server 400, that is, the device type (music) and the device status (playing). After receiving the content reported by the terminal devices 200, the server 400 can filter the device types and device statuses in the current voice control system according to the service type and service status required by the voice command, and determine that there is a speaker that is playing music, whose current device type and device status match the object category of the current user's voice command. Therefore, the server 400 can send a response command to that speaker and send a silent command to the other devices at the same time, so that the speaker in the current voice control system executes the response command and turns off the music.

In some embodiments, the service requirement information further includes a service execution location, and the server 400 may further screen the terminal devices 200 according to the service execution location to determine the second terminal device. That is, when screening for the second terminal device according to the service requirement information, the server 400 can extract the service execution location from the service requirement information and obtain the device location of each candidate device in the current voice control system. If the device location of a candidate device matches the service execution location, that is, the candidate device satisfies the service execution location, the step of traversing the device states of the candidate devices may be performed to screen out the second terminal device whose device state meets the service status. If the device location of a candidate device does not match the service execution location, the candidate device is marked as not being the second terminal device, that is, it can be deleted from the candidate device list.

For example, if the user inputs the voice "Hi! X, let the speaker in the bedroom play music", the terminal devices 200 in the home environment receive the user instruction and report their current device type and device status to the cloud server, that is, the device type (none) and the device status (standby). After receiving the information reported by the terminal devices 200, the server 400 can parse the service execution location "bedroom" from the voice command, screen the terminal devices 200 in the current voice control system according to that location, and determine the terminal devices 200 whose device location is in the bedroom. When the server 400 determines that a speaker in the bedroom has the required device type and a device status that matches the object category controlled by the current user instruction, it may issue a response instruction to that speaker. At the same time, the server 400 also sends a silent command to the other devices in the current voice control system, including the other devices in the bedroom and the devices outside the bedroom.
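The location-based screening can be sketched in Python as follows. The Candidate fields and the location strings are hypothetical; the sketch assumes that the candidates have already been matched on service type.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    device_id: str
    location: str
    status: str


def screen_by_location(candidates: list[Candidate], location: str,
                       service_status: str) -> list[Candidate]:
    """Drop candidates outside the service execution location, then check status."""
    # Candidates whose location does not match are removed from the list.
    in_place = [c for c in candidates if c.location == location]
    # Traverse the remaining candidates and keep those in the required state.
    return [c for c in in_place if c.status == service_status]
```

Removing off-location candidates first means the status traversal only touches devices that could still qualify.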

It can be seen from the above that the voice control method provided in the above embodiment can perform multiple rounds of screening on the terminal devices 200 in the voice control system based on service requirement information such as service type, service status, and service execution location, so as to narrow the result down to a small number of second terminal devices. This reduces the communication frequency between the terminal devices 200 and improves the execution efficiency of the intelligent voice control process.

Through the screening process provided in the above embodiments, the server 400 can screen out, from among the many terminal devices 200, the second terminal device that can respond to the control instruction. Although this screening can greatly reduce the number of terminal devices 200 serving as second terminal devices, in some cases many terminal devices 200 can still meet the service requirement after screening, whereas the user's voice control process usually requires only one or a few specific second terminal devices to respond.

Therefore, as shown in FIG. 13 , in order to determine the second terminal device that finally executes the response, in some implementations, the server 400 may further determine the final execution device from among the screened multiple terminal devices 200 that can meet service requirements. That is, when screening the second terminal device according to the service requirement information, the server 400 may perform the following steps:

S201: Obtain the number of terminal devices whose device status can realize service requirement information.

S202: If the number of terminal devices is equal to 1, that is, only one terminal device 200 in the current voice control system can meet the current service requirement, the server 400 can directly mark that terminal device 200 as the second terminal device.

S203: If the number of terminal devices is greater than or equal to 2, search for a master device. Wherein, the master device is one of multiple terminal devices capable of realizing service requirement information.

S204: The master device may perform further interaction with the user to determine the second terminal device that finally responds to the voice instruction.

That is, in some embodiments, after finding the master device, the server 400 may send an inquiry instruction to the master device so that the master device plays an inquiry voice, where the inquiry instruction is a multi-round, wake-up-free voice interaction instruction. The server 400 then receives, through the master device, the confirmation voice command input by the user and extracts the identification information of the second terminal device from it, so as to select the second terminal device from among the multiple smart devices that can fulfill the service requirement according to that identification information.

For example, suppose the user's environment includes two terminal devices 200 that are playing music, speaker A and speaker B. When the user inputs the voice "Hi! X, turn off the music", speaker A and speaker B each receive the voice command input by the user and each report their current device type (music) and device status (playing) to the cloud server 400. After receiving this content, the server 400 can filter out the terminal devices 200 that meet the service requirement in the voice command. That is, after judging that the current device type and device status of both speakers match the required service type and service status, the server 400 designates speaker A as the master device and sends it a multi-round, wake-up-free inquiry command, namely "You have two devices, speaker A and speaker B. Which one do you want to turn off?". It then receives the confirmation voice command from the user: when the user replies with the voice "Turn off the music on speaker A", speaker A is determined to be the second terminal device that finally responds to the voice control. The server 400 may then send a response instruction to speaker A and send a silent instruction to the other terminal devices 200, including speaker B.
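Steps S201 to S204 can be sketched in Python as below. The two callbacks, pick_master and ask_user, are hypothetical stand-ins for the master device search and the inquiry dialogue; the behavior for an empty list or an unrecognized answer is an assumption, since the text does not describe those cases.

```python
from typing import Callable, Optional


def choose_second_device(
    capable_ids: list[str],
    pick_master: Callable[[list[str]], str],
    ask_user: Callable[[str, list[str]], str],
) -> Optional[str]:
    """Decide which capable device finally responds (S201 to S204)."""
    # S201: count the devices that can fulfill the service requirement.
    if not capable_ids:
        return None  # assumption: nothing to choose from
    # S202: exactly one capable device is marked directly.
    if len(capable_ids) == 1:
        return capable_ids[0]
    # S203: two or more capable devices, so find a master device among them.
    master = pick_master(capable_ids)
    # S204: the master device asks the user which device to use.
    answer = ask_user(master, capable_ids)
    return answer if answer in capable_ids else None
```

For instance, with speaker A and speaker B both capable, pick_master could return speaker A and ask_user could return the device named in the user's confirmation voice command.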

To find the master device among the multiple terminal devices 200 capable of fulfilling the service requirement, as shown in FIG. 14, in some embodiments, the master device may be the terminal device 200 closest to the location of the sound source of the voice command. When searching for the master device, the server 400 can obtain the voice audio data with which each of the multiple capable smart devices detected the voice command, extract a sound energy value from each set of voice audio data, and compare the sound energy values to find the terminal device 200 with the highest value, which is then marked as the master device.
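The energy comparison can be illustrated with a short Python sketch. Mean squared amplitude is used here as the sound energy value purely as an assumption; the application does not state how the value is extracted, and the function names are hypothetical.

```python
def sound_energy(samples: list[float]) -> float:
    """Mean squared amplitude of one device's captured audio (assumed measure)."""
    return sum(s * s for s in samples) / len(samples)


def find_master(audio_by_device: dict[str, list[float]]) -> str:
    """Return the device whose captured audio has the highest sound energy."""
    return max(audio_by_device, key=lambda d: sound_energy(audio_by_device[d]))
```

The sketch assumes every device reports a non-empty sample list captured over a comparable time window; unequal microphone gains would need calibration before the values are comparable.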

The reverberation time parameter T60 is fixed for a specific scene, that is, the time required for the sound energy to decay by 60 dB is the same at any position, and T60 can be estimated from the energy ratio of the direct sound to the reverberant sound at the corresponding position. Therefore, from the beamformed spectrogram and the arrival time difference of the sound source, the energy ratio of the direct sound to the reverberant sound of the sound source can be calculated for all terminal devices 200 in the environment, and the direct sound energy can then be calculated. By ranking the direct sound energy of the sound source received by each device, the terminal device 200 closest to the position of the sound source can be determined to be the master device.

In addition to determining the master device based on the sound energy value, the master device may also be determined in other ways. That is, in some embodiments, the distance between the sound source location and each terminal device 200 can also be detected by the terminal devices 200 themselves. A terminal device 200 can acquire images of the current environment through a multi-lens camera, construct a three-dimensional space model from multiple images taken at different angles, and then extract a portrait from the three-dimensional space model by image recognition, so as to locate the position of the user in the model, that is, the position of the sound source. After locating the sound source, the terminal device 200 determines the distance between the sound source and each terminal device 200 according to the device placement in the current smart home model, and finally sends the calculated distances to the server 400, so that the server 400 can determine that the terminal device 200 closest to the sound source is the master device.

It can be seen that, in the above-mentioned embodiment, when multiple terminal devices 200 can fulfill the service requirement, the server 400 can further select from among them the second terminal device that finally responds to the voice control, so that there is no frequent communication between multiple devices before the voice control process and the response speed of the voice interaction process is improved.

Based on the voice control method provided in the above embodiments, the server 400 can determine the second terminal device and, by issuing a response instruction, have the second terminal device make an interactive response to the voice input by the user. Since the interactive response process controls the second terminal device to perform specific interactive actions, and these actions may change the device status of the terminal device 200, the server 400 can also obtain the device state of the second terminal device after it executes the response command, so as to update the stored device state in real time.

That is, as shown in FIG. 15 , the server 400 may receive the execution result data reported by the second terminal device after sending a response instruction to the second terminal device. Wherein, the execution result data includes the new state of the device after running the response instruction. Then extract the new state of the device from the execution result, and use the new state of the device to update the state of the device stored in the storage module.
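The update of the stored device state can be sketched in Python as follows. The dictionary layout of the execution result data and its key names are assumptions for illustration.

```python
def apply_execution_result(stored: dict[str, str], result: dict) -> dict[str, str]:
    """Update the stored device state from reported execution result data."""
    # Extract the new device state; leave the store untouched if it is absent.
    new_state = result.get("new_state")
    if new_state is not None:
        stored[result["device_id"]] = new_state
    return stored
```

For example, after a speaker executes a response instruction to turn off the music, it might report a new state of "standby", which then replaces the stored "playing" state.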

Through the device state update method provided in the above-mentioned embodiments, the device state stored in the server 400 can be kept consistent with the actual device state of the terminal devices 200 in the voice control system in a timely manner, so that in subsequent intelligent voice interaction the server 400 can screen the terminal devices 200 based on the updated device state and determine the second terminal device more accurately.

Based on the above voice control method, as shown in FIG. 16, in some embodiments of the present application, a server 400 is also provided, including: a storage module 410, a communication module 420, and a control module 430. The control module 430 is configured to perform the following program steps:

S301: Obtain a voice instruction input by the user through the terminal device.

S302: In response to the voice command, parse the service requirement information in the voice command.

S303: Screen the second terminal device according to the service requirement information, where the second terminal device is an intelligent device whose device status can realize the service requirement information.

S304: Send a response instruction to the second terminal device.

S305: Send a silent command to other smart devices other than the second terminal device in the current voice control system.

To cooperate with the above server 400, as shown in FIG. 17, a terminal device 200 is also provided in some embodiments of the present application, including: an audio input device, an audio output device, a communicator 220, and a controller 250. The controller 250 is configured to perform the following program steps:

S401: Acquire voice and audio data input by a user for performing voice control.

S402: Generate a voice instruction according to the voice audio data.

S403: Send a voice instruction to the server, so that the server parses the service requirement information in the voice instruction, and screens a second terminal device according to the service requirement information, and the second terminal device is an intelligent device whose device status can realize the service requirement information.

S404: Receive a response instruction or a silent instruction sent by the server.

S405: Run the response command or the silent command.
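Steps S404 and S405 on the terminal side can be sketched in Python as below. The instruction layout (a dictionary with "type" and "operation" keys) and the execute callback are assumptions for illustration.

```python
from typing import Callable, Optional


def run_instruction(instruction: dict,
                    execute: Callable[[str], str]) -> Optional[str]:
    """Run a response instruction or a silent instruction received from the server."""
    kind = instruction["type"]
    if kind == "response":
        # Second terminal device: perform the requested operation.
        return execute(instruction["operation"])
    if kind == "silent":
        # Any other device: stay silent and do not respond to the voice.
        return None
    raise ValueError(f"unknown instruction type: {kind}")
```

Rejecting unknown instruction types explicitly is a design choice of this sketch, not something the application specifies.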

It can be seen from the above that the server 400 and the terminal device 200 provided in the above embodiment can form a voice control system for implementing the above voice control method. After the user inputs a voice command, the server 400 may parse the service requirement information from the voice command and, according to that information, screen out the second terminal device whose current device status can fulfill the service requirement, so as to send a response command to the second terminal device and have it make a voice response. At the same time, according to the screening result, the server 400 also sends a silent instruction to the devices in the current voice control system other than the second terminal device, so that the terminal devices 200 that are not the second terminal device do not respond to the voice control function. The server 400 can pre-process voice commands so that all types of terminal devices 200 can quickly and efficiently make the correct wake-up response within a specified time, which solves the problem of abnormal responses in traditional voice wake-up methods.

With the widespread application of smart homes, controlling the corresponding terminal equipment by voice commands has become popular with users. FIG. 18 is an application scenario diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 18, terminal devices in the home such as the smart TV 200-5, smart air conditioner 200-2, smart refrigerator 200-4, and smart washing machine 200-3 can be connected to the smart terminal 300 and the server 400 through an Internet of Things module. Data can be transmitted between the smart terminal 300 and the terminal devices through a local area network or a wide area network, so as to control and manage the terminal devices. Typically, the Internet of Things module can be built into each terminal device.

The first terminal device is generally a device with certain functions of its own. For a received voice, if no terminal device is specified, the first terminal device itself is taken to be the device that responds to the voice; if the terminal device to be operated is specified, a control command is sent to the specified terminal device. Taking a smart speaker as an example, after receiving "fast forward", the smart speaker first judges whether it has an operation corresponding to the command. If so, it executes the fast forward operation; if not, it feeds back a prompt indicating that the command was not recognized. If "TV fast forward" is received, the smart speaker first judges whether the mapping relationship contains a corresponding operation for the TV. If it does, the smart speaker sends the TV an operation command that triggers fast forward on the TV; if it does not, the smart speaker feeds back a prompt indicating that the command was not recognized.
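The decision made by the first terminal device in this example can be sketched in Python as follows. The operation set, the mapping table, and the rule that a specified device appears as a prefix of the utterance are all assumptions for illustration.

```python
# Hypothetical operations the smart speaker itself supports.
OWN_OPERATIONS = {"fast forward"}

# Hypothetical mapping relationship: other device -> operations it supports.
MAPPINGS = {"tv": {"fast forward"}}


def handle(utterance: str) -> tuple:
    """Return (action, target, operation) for one recognized utterance."""
    text = utterance.lower().strip()
    # A specified device is assumed to appear as a prefix, e.g. "TV fast forward".
    for device, operations in MAPPINGS.items():
        if text.startswith(device + " "):
            operation = text[len(device) + 1:]
            if operation in operations:
                return ("forward", device, operation)
            return ("unrecognized", None, None)
    # No device specified: the first terminal device responds itself.
    if text in OWN_OPERATIONS:
        return ("execute", "self", text)
    return ("unrecognized", None, None)
```

The "unrecognized" result corresponds to the prompt fed back when no matching operation exists.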

In some embodiments, if the user does not specify the second terminal device, the first terminal device may play a prompt listing the terminal devices for which a mapping exists.

In some embodiments, the determination of the mapping relationship between the first terminal device and other terminal devices may be performed in a local module or in the cloud. In some embodiments, the local module may be placed in the first terminal device or fixed to it; the two may also be separate objects.

In some embodiments, taking cloud execution as an example, when a user adds a terminal device to a home, the terminal devices can be grouped by region and each terminal device assigned to a fixed region. For example, the smart TV, smart speaker, and smart air conditioner are installed in the living room area, and the smart refrigerator is installed in the kitchen area. FIG. 19 is another application scenario diagram of a terminal device provided in an embodiment of the present application. When the user wants to listen to song XXX, the user inputs the wake-up word "Hi! XX" to the first terminal device, the first terminal device activates the voice application installed on it, and the user then inputs the voice message "play XXX song". The first terminal device converts the voice message into a voice command through the voice application and transmits it to the server 400. After receiving the voice command, the server 400 queries the terminal devices currently configured by the user and, on finding a TV, a refrigerator, and an air conditioner, feeds back "You have 3 devices, which one do you want to use?" to the user through the first terminal device. If the user is in the living room at this time, he may continue with the voice message "play in the living room". The server 400 then determines that there are two terminal devices in the living room, a TV and an air conditioner, and feeds back "You have 2 devices in the living room, which one do you want to use?" through the first terminal device. The user then needs to input "play it on the TV", and only then does the server 400 control the TV to play song XXX. The first terminal device may be a device with a sound collection function, such as a smart remote control or a smart speaker.

Because there are multiple terminal devices in one home, the server 400 needs to perform multiple rounds of voice interaction with the user, relying on the user to actively supply various hints, before the second terminal device that will finally execute the command is determined. This process requires multiple interactions with the user, which degrades the user experience.

In order to improve user experience, the present application provides, in some embodiments, a server configured to perform a voice interaction process. The voice interaction process is described below with reference to the accompanying drawings.

In some embodiments, the method executed in the server 400 may instead be executed in another terminal device or in the first terminal device. The following description takes execution by the server as an example.

FIG. 20 is a schematic flowchart of a voice control method provided by an embodiment of the present application. As shown in Figure 20, the method includes the following steps:

The first terminal device and other terminal devices such as TVs, refrigerators, and air conditioners are placed in the user's actual environment. Both the first terminal device and the other terminal devices can be connected to the server through the network. Through operations performed by the user in advance, these terminal devices are all logged into the same account, and the server stores the mapping relationship between the user ID of the account and the terminal devices. At any moment, the first terminal device may receive a voice input from the user.

In some embodiments, the user may send a voice command to the first terminal device based on his own needs, and thereby control the corresponding second terminal device by voice to execute the voice command. For example, when the user wants to use the second terminal device to watch movie XXX, he can input the wake-up word "Hi! XX" to the first terminal device and then input the voice message "Play XXX movie". It should be noted that the second terminal device and the first terminal device are two independent devices, and the second terminal device can have its own voice receiving device for voice control. The relationship between the second terminal device and the first terminal device is neither that between a TV and a remote control used only for that TV, nor that between a TV processor and the voice receiving device on the TV.

In some embodiments, the wake word and voice command can be entered sequentially. The first terminal device can recognize the voice instruction contained in the sentence containing the wake-up word.

In some embodiments, the first terminal device can integrate the functions of sound collection and speech analysis. The sound collection function receives the voice message sent by the user. The speech analysis function extracts the key part of the voice message, which reflects the user's intention or the task to be done, and after the user's intention is analyzed, converts it into a voice command. The voice command may be in an executable command format agreed between the first terminal device and the server 400, and the command format may include a command and parameters. After conversion, the voice command is sent to the server 400. Adding the corresponding user identifier to the voice command makes it easy for the server 400 to identify which user sent the voice command and to look up all terminal devices configured by that user.
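
As a minimal sketch of such an agreed command format, the following assumes that the voice command is serialized as JSON. The class name `VoiceInstruction` and the field names `user_id`, `command`, and `parameters` are hypothetical; the application does not specify the actual wire format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class VoiceInstruction:
    """Hypothetical command format agreed between the first terminal device and the server."""
    user_id: str                                     # user identifier added by the first terminal device
    command: str                                     # the command, e.g. "play"
    parameters: dict = field(default_factory=dict)   # the parameters, e.g. {"media": "XXX movie"}

    def to_json(self) -> str:
        # Serialize for transmission to the server.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, payload: str) -> "VoiceInstruction":
        # Rebuild the instruction on the server side.
        return cls(**json.loads(payload))


# First terminal device side: wrap the analyzed intention and serialize it.
outgoing = VoiceInstruction(user_id="user-001", command="play",
                            parameters={"media": "XXX movie"})
payload = outgoing.to_json()

# Server side: decode the payload and read the user identifier.
incoming = VoiceInstruction.from_json(payload)
```

In this sketch the user identifier travels inside the instruction itself, so the server can look up the user's devices without a separate request.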

In some embodiments, the first terminal device has an independent audio-visual display system capable of decoding and playing audio and video streams.

In some embodiments, the second terminal device also has an independent audio and video display system, which can decode and play audio and video streams.

S2001: Receive a voice instruction including a user identifier sent by a first terminal device.

In some embodiments, the first terminal device may use the text converted from the voice as a voice instruction and send it to the server.

In some embodiments, the voice conversion service is located in the server, so the first terminal device can send the received voice to the server in the agreed encapsulation structure, and the server decapsulates the received data and obtains the voice command.

S2002: Find all terminal devices related to the user identifier.

In some embodiments, after receiving the voice command, the server 400 analyzes the voice command to obtain the user identifier and the user's intention conveyed by the voice command.

In some embodiments, the server determines the device type corresponding to the command by analyzing the voice command and according to the keywords in the parsed text, where the device type refers to the functional authority of the terminal device.

In some embodiments, the mapping relationship between keywords and device types may be cached in the server, and in some embodiments, a trained keyword-device type neural network model may also be stored.

Taking the keyword-to-device-type mapping cached in the server as an example, see Table 1:

Voice command	Device type	Terminal devices
Play movie	Video playback	Smart TV, smart refrigerator screen
Play song	Audio playback	Smart speaker
Start blowing	Room temperature adjustment	Smart air conditioner, smart electric fan

Table 1
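
The cached mapping of Table 1 can be sketched as two dictionaries. This is only an illustration under the assumption that keywords are matched by simple substring search on the parsed text; the function name `devices_for_text` is hypothetical, and a trained model could replace the lookup.

```python
# Keyword -> device type, and device type -> terminal devices, following Table 1.
KEYWORD_TO_DEVICE_TYPE = {
    "play movie": "video playback",
    "play song": "audio playback",
    "start blowing": "room temperature adjustment",
}
DEVICE_TYPE_TO_DEVICES = {
    "video playback": ["smart TV", "smart refrigerator screen"],
    "audio playback": ["smart speaker"],
    "room temperature adjustment": ["smart air conditioner", "smart electric fan"],
}


def devices_for_text(text):
    """Return (device_type, devices) for the first keyword found in the parsed text."""
    lowered = text.lower()
    for keyword, device_type in KEYWORD_TO_DEVICE_TYPE.items():
        if keyword in lowered:
            return device_type, DEVICE_TYPE_TO_DEVICES[device_type]
    # No keyword matched: no device type can be determined.
    return None, []
```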

At the same time, after obtaining the user ID, the server 400 searches the database for all terminal devices related to the user ID, that is, all terminal devices configured by the user. This is necessary because, if device discovery were based on near-field communication, a device would discover every device it can scan, including the user's own devices and the devices of other users in close proximity, some of which do not belong to the user. Even if the judgment were based on the LAN, a guest's device connected to the LAN could be selected by mistake. By pre-establishing the relationship between the user ID and the user's own devices in a server with a management function, the server 400 can determine the devices actually owned by the user according to the user ID.
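
The difference between scan-based discovery and account-based lookup can be illustrated with a short sketch. The table `ACCOUNT_DEVICES`, the device IDs, and the function `owned_devices` are all hypothetical and stand in for the server's database.

```python
# Hypothetical account database: user ID -> IDs of devices bound to that account.
ACCOUNT_DEVICES = {
    "user-001": {"tv-01", "fridge-01", "ac-01"},
}


def owned_devices(user_id, discovered):
    """Keep only the discovered devices that are bound to the user's account."""
    bound = ACCOUNT_DEVICES.get(user_id, set())
    return sorted(d for d in discovered if d in bound)


# A near-field or LAN scan may also see a neighbour's or a guest's device.
scan_result = ["tv-01", "ac-01", "guest-phone", "neighbour-speaker"]
result = owned_devices("user-001", scan_result)
```

Only the devices registered under the user ID survive the filter, which is the reason for keeping the mapping on the server side.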

In some embodiments, the terminal device refers to other devices associated with the user identifier except the first terminal device.

In some embodiments, the recognition of the voice command needs to determine the type of the voice command, and then compare it with the types that can be executed by each terminal device, and then determine the terminal device that can execute the voice command.

In some embodiments, the execution types of each terminal device may be pre-calibrated, for example, the kitchen refrigerator corresponds to freezing, refrigeration, recipe recommendation, and ingredient identification. It may also be determined according to the device identifier when a newly added device is scanned. For example, when the newly added device identifier indicates that the device is a refrigerator, an association between the device identifier and freezing, refrigeration, recipe recommendation, food material identification, etc. is established.
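
A minimal sketch of associating a newly scanned device with its execution types follows. It assumes, purely for illustration, that the device identifier begins with the device kind (for example `refrigerator-0042`); the table and function names are hypothetical.

```python
# Hypothetical calibration table: device kind -> operations it can execute.
EXECUTION_TYPES_BY_KIND = {
    "refrigerator": ["freezing", "refrigeration", "recipe recommendation",
                     "ingredient identification"],
}


def kind_from_identifier(device_id):
    """Assumption for this sketch: the identifier starts with the device kind."""
    return device_id.split("-", 1)[0]


def register_device(registry, device_id):
    """Associate a newly scanned device with the execution types of its kind."""
    kind = kind_from_identifier(device_id)
    registry[device_id] = list(EXECUTION_TYPES_BY_KIND.get(kind, []))
    return registry[device_id]


registry = {}
register_device(registry, "refrigerator-0042")
```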

In some embodiments, after acquiring the user ID, the server 400 may also load all terminal devices and device attributes related to the user ID from the database into the cache. Subsequently, when the server 400 searches for the second terminal device, it may directly search in the cache, so that the speed at which the server 400 searches for the second terminal device can be accelerated. The device attributes include inherent attributes such as the location of the terminal device, the name of the terminal device, and the ID of the terminal device.

S2003: When there is no terminal device related to the user identifier, feed back a parameter representing the absence of the terminal device, so that the first terminal device broadcasts that there is no terminal device executing the voice command.

In some embodiments, if the server 400 does not find a terminal device related to the user identifier, it means that the user has not configured a terminal device. At this time, the server 400 needs to feed back the parameter indicating that there is no terminal device to the first terminal device. The first terminal device broadcasts the absence of the terminal device according to the received parameter representing the absence of the terminal device.

S2004: When there is a terminal device related to the user identifier, use preset filtering rules to filter out the best-matching second terminal device, and feed back a parameter characterizing the best-matching second terminal device, so that the first terminal device announces that there is a second terminal device that executes the voice command, and control the best-matching second terminal device to execute the voice command.

In some embodiments, the preset filtering rule refers to a preset rule for filtering out a device that conforms to the user's intention, that is, a device capable of executing the voice instruction. For example, when the user inputs the voice "play folk music", the first terminal device sends an instruction to play folk music to the server, the server recognizes that the type of the user's intention is playing music, and then determines, according to the type, the second terminal device that can perform this type of operation.

In some embodiments, when there is a terminal device related to the user identifier, the terminal device related to the user identifier is used to execute the voice instruction.

In some embodiments, when there is a terminal device related to the user identifier, it is also necessary to determine the number of terminal devices and/or the functions that each terminal device can perform, where the functions a terminal device can perform are its function permissions. Exemplarily, when the voice instruction is an instruction to play a video, the devices that can play video need to be determined, and devices that cannot perform video playback are screened out. Likewise, for a cooling instruction, devices that cannot cool are screened out, and the voice command can be executed by a cooling device.

In some embodiments, the server 400 is configured with preset filtering rules, which characterize the mapping relationship between voice instructions and terminal devices. The filtering rules include a first set of rules and a second set of rules, or include only the first set of rules. The first set of rules refers to the rules that are necessary for filtering out the best-matching second terminal device; the second set of rules refers to rules that are applied one by one, cumulatively, when the first set has not yet yielded a single best-matching second terminal device.

In some embodiments, the first group of rules includes the mapping relationship between device types and each terminal device, for example, the mapping relationship shown in Table 1 in the above embodiments.

In some embodiments, the first set of rules may be a judgment on the number of terminal devices corresponding to the user identifier.

In some embodiments, after candidate second terminal devices are determined according to the first set of rules, if two or more terminal devices remain, the second set of rules is used for further screening. Exemplarily, after the candidate second terminal devices are determined according to the mapping relationship between device types and devices: if the number of candidates is 1, the voice instruction is sent directly to that second terminal device; if the number is 0, the absence of an executable device is fed back to the first terminal device; if the number is greater than 1, one or any combination of location, installation time, usage frequency, the device that last executed this type of command, start time, signal strength, priority, and the like can be used as the second set of rules for filtering.
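
The three cases above (0, 1, or more than 1 candidate) can be sketched as a small decision function. The action names returned here are hypothetical labels chosen for illustration only.

```python
def decide(candidates):
    """Map the number of candidate second terminal devices to the next action."""
    if len(candidates) == 0:
        # No executable device: report this to the first terminal device.
        return ("report_no_device", None)
    if len(candidates) == 1:
        # Exactly one candidate: send the voice instruction to it directly.
        return ("send_instruction", candidates[0])
    # Several candidates: continue screening with the second set of rules.
    return ("apply_second_rules", list(candidates))
```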

FIG. 21 is a schematic flowchart of screening terminal devices provided by an embodiment of the present application. With reference to FIG. 21, in some embodiments of the present application, the process of screening the second terminal device according to the filtering rules is as follows:

S2101: Use the first set of rules to screen the current terminal device and the terminal device related to the user identifier.

In some embodiments, when there is a terminal device related to the user identifier, the server 400 first screens the terminal devices according to the rules that are necessary for screening out the best-matching second terminal device. In some embodiments, the first set of rules includes a terminal device function authority sub-rule, which refers to the functions each terminal device has; for example, a smart TV has the function of playing media assets, and an air conditioner has the functions of cooling and heating. That is, after the server 400 has identified the terminal devices actually owned by the user, it further screens them with the terminal device function authority sub-rule and excludes the terminal devices that do not have the corresponding function authority.

In some embodiments, when terminal devices are screened by using the sub-rules of terminal device function rights, the server 400 respectively detects function rights of multiple terminal devices related to the user identifier. When the function authority is suitable for executing the voice instruction, the server 400 selects the corresponding terminal device as the candidate terminal device. When the function authority is not suitable for executing the voice instruction, the server 400 excludes the corresponding terminal device.
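
Screening by function authority can be sketched as a simple membership test. The device names follow the examples in this description, while the function labels such as `play media` and the helper name `filter_by_function` are assumptions made for this sketch.

```python
def filter_by_function(devices, required_function):
    """Keep the devices whose function authority includes the required function."""
    return [name for name, functions in devices.items()
            if required_function in functions]


# Hypothetical function authority of the devices related to one user identifier.
devices = {
    "smart TV": {"play media"},
    "smart speaker": {"play media"},
    "smart air conditioner": {"cooling", "heating"},
}
candidates = filter_by_function(devices, "play media")
```

An empty result corresponds to step S2102, and a non-empty result to step S2103.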

S2102: If the terminal device related to the user identifier does not have the authority to execute the voice command, feed back a parameter indicating that there is no terminal device, so that the first terminal device broadcasts that there is no terminal device that executes the voice command.

S2103: If the terminal device related to the user identifier has the right to execute the voice command, confirm the number of second terminal devices that have the right to execute the voice command.

In some embodiments, when using the first set of rules to filter terminal devices related to the user identifier, if none of the terminal devices actually owned by the user has the corresponding function authority, the server 400 needs to feed back to the first terminal device the parameter indicating that no terminal device exists. The first terminal device broadcasts that there is no terminal device according to the received parameter.

In some embodiments, when using the first set of rules to filter terminal devices related to the user identifier, if a terminal device actually owned by the user has the corresponding function authority, the server 400 also needs to confirm the number of terminal devices with the corresponding function authority.

In some embodiments, when the server 400 detects that only one terminal device has the authority to execute the voice command, it feeds back the parameter representing that terminal device to the first terminal device, and the first terminal device announces the terminal device with the corresponding authority according to the received parameter.

The server 400 also needs to control the current terminal device to perform corresponding operations according to the user's intention in the voice command.

In some embodiments, the server 400 sends the voice instruction to the current terminal device, and the current terminal device receives the voice instruction and performs a corresponding operation according to the voice instruction.

For example, suppose the only device the user has configured is a sweeping robot. If the user inputs the voice message "sweep the floor", the first terminal device transmits it to the server 400. After querying, the server 400 finds that the user has only a sweeping robot, and that it has a cleaning function. Therefore, the server 400 controls the first terminal device to broadcast "the sweeping robot starts to sweep the floor" and controls the sweeping robot to perform the sweeping function. When the server 400 controls the broadcast of the first terminal device, it may feed back the parameter representing the terminal device with the corresponding authority to the first terminal device through a persistent connection.

In some embodiments, when multiple terminal devices have the authority to execute the voice command, the sub-rules in the second set of rules are applied one by one, in order of priority from high to low, to screen those terminal devices until a single best-matching second terminal device is screened out.

In some embodiments, the second set of rules preset in the server 400 includes a user usage frequency sub-rule, a sub-rule on distance from the first terminal device, and a terminal device priority sub-rule. The user usage frequency sub-rule refers to the number of times the user has executed similar voice commands through a second terminal device, for example, the number of times the user has played video data such as movie A and variety show B through the smart TV, or the number of times the user has played audio data such as song C and song D through the smart speaker. Movie A, variety show B, song C, and song D can all be classified as media data, so "play movie A", "play variety show B", "play song C", and "play song D" can be regarded as similar voice commands, all of which are playback voice commands. They can be further divided: "play movie A" and "play variety show B" are video playback voice commands, and "play song C" and "play song D" are audio playback voice commands. The distance sub-rule refers to the distance between each second terminal device and the first terminal device. The terminal device priority sub-rule refers to the priority that the user sets for each terminal device when executing voice commands; for example, for voice commands that play media assets, the priority of the smart TV is higher than that of the smart refrigerator screen. The priority of a terminal device can be set by the user through the application on the smart terminal 300.

In some embodiments, the user can assign priorities to the sub-rules in the second set of rules, for example, setting the priority of the user usage frequency sub-rule higher than that of the distance sub-rule, and setting the priority of the terminal device priority sub-rule higher than that of the user usage frequency sub-rule.

In some embodiments, the server 400 uses the filtering rules to select the single most suitable second terminal device for executing the current voice command; that is, at the end of filtering, the number of best-matching second terminal devices is 1. After screening by the first set of rules, as long as more than one terminal device remains, the server 400 applies the sub-rules in the second set of rules one by one, in order of priority, to screen the remaining terminal devices.

For example, after the user inputs "play song A", the server 400 finds that the user has a smart TV, a smart speaker, a smart air conditioner, a smart refrigerator, and a smart washing machine. Filtering shows that the smart TV, smart speaker, and smart refrigerator have playback functions, so these three devices need to be screened further with the second set of rules. The server 400 first applies the distance sub-rule. From the device location information in the device attributes, the server 400 judges that the smart TV, the smart speaker, and the first terminal device are all in the living room, so the distance between them is relatively small, whereas the smart refrigerator is in the kitchen, so its distance from the first terminal device is relatively long. On this basis, the smart refrigerator is excluded. Since two candidate terminal devices remain, the smart TV and the smart speaker, the server 400 continues with the terminal device priority sub-rule. The priority of the smart TV is higher than that of the smart speaker, so the server 400 selects the smart TV as the best-matching terminal device.

For another example, after the user inputs "lower indoor temperature", the server 400 finds that the user has a smart TV, a smart speaker, a smart air conditioner, a smart refrigerator, and a smart electric fan. Filtering by device function shows that the smart air conditioner and the smart electric fan have room temperature adjustment functions, so the server 400 needs to screen these two devices further with the second set of rules. The server 400 applies the terminal device priority sub-rule. For voice commands of the room temperature adjustment type, the priority the user has set for the smart air conditioner is higher than that of the smart electric fan, so the server 400 selects the smart air conditioner as the best-matching terminal device. Of course, users can also set the priority of the smart electric fan higher than that of the smart air conditioner according to their own needs.

In some embodiments, when using the user usage frequency sub-rule to screen terminal devices, the server 400 detects the execution frequency of each of the second terminal devices related to the user identifier, where the execution frequency refers to the number of times the second terminal device has executed similar voice commands in its historical behavior. The server 400 retains the second terminal device with the highest execution frequency.

In some embodiments, when using the distance sub-rule to screen terminal devices, the server 400 detects the distance between each of the second terminal devices related to the user identifier and the first terminal device. The server 400 retains the second terminal device closest to the first terminal device.

In some embodiments, when using the terminal device priority sub-rule to screen terminal devices, the server 400 retains the second terminal device with the highest priority according to the terminal device priorities set by the user.
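
The cascade of sub-rules can be sketched as follows, reusing the "play song A" scenario with a smart TV, a smart speaker, and a smart refrigerator. This is a sketch under stated assumptions rather than the server's actual implementation: distance is approximated by whether a device is in the same room as the first terminal device, a larger `priority` value means a more preferred device, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    room: str
    usage_count: int   # times it has executed similar voice commands
    priority: int      # user-set priority; larger means preferred (assumption)


def by_distance(devices, first_device_room):
    """Distance sub-rule, approximated by 'same room as the first terminal device'."""
    same_room = [d for d in devices if d.room == first_device_room]
    return same_room or devices   # never filter down to nothing


def by_priority(devices, _room):
    """Terminal device priority sub-rule: keep the highest-priority devices."""
    top = max(d.priority for d in devices)
    return [d for d in devices if d.priority == top]


def by_usage(devices, _room):
    """Usage frequency sub-rule: keep the most frequently used devices."""
    top = max(d.usage_count for d in devices)
    return [d for d in devices if d.usage_count == top]


def select(devices, rules, first_device_room):
    """Apply the sub-rules in priority order until at most one device remains."""
    remaining = list(devices)
    for rule in rules:
        if len(remaining) <= 1:
            break
        remaining = rule(remaining, first_device_room)
    return remaining


candidates = [
    Device("smart TV", "living room", usage_count=5, priority=3),
    Device("smart speaker", "living room", usage_count=9, priority=2),
    Device("smart refrigerator", "kitchen", usage_count=1, priority=1),
]
chosen = select(candidates, [by_distance, by_priority, by_usage], "living room")
```

The order of the `rules` list plays the role of the user-set sub-rule priorities: changing the order can change which device is selected.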

The above embodiments are not only applicable to home scenarios, but also to office scenarios.

In this application, when the server responds to a voice command issued by the user and finds that there are currently multiple terminal devices, it can select the second terminal device that will finally execute the command based on the filtering rules. This avoids multiple rounds of voice interaction between the user and the first terminal device and improves the user experience.

The present application also provides a voice control method in some embodiments. The method includes: the server 400 receives a voice instruction including a user ID sent by a first terminal device, and searches for all terminal devices related to the user ID. When there is no terminal device related to the user identifier, the server 400 feeds back a parameter characterizing the absence of a terminal device, so that the first terminal device broadcasts that there is no terminal device executing the voice command. When there is a terminal device related to the user identifier, the server 400 uses preset filtering rules to filter out the best-matching terminal device, feeds back a parameter representing the best-matching terminal device so that the first terminal device announces it, and controls the best-matching terminal device to execute the voice instruction.

FIG. 22 is a schematic diagram of a scene of a voice control process provided by an embodiment of the present application. As shown in FIG. 22, it is assumed that the terminal devices in the smart home scenario include terminal device 200-4 (a smart refrigerator), terminal device 200-3 (a smart washing machine), and terminal device 200-5 (a smart display device). When the user wants to control a terminal device in the smart home scenario, the user first records through the recording application in the first terminal device, that is, the intelligent terminal 200-1, to obtain recording data. The recording data mainly expresses the user's control intention, and the control intention does not name the second terminal device to be controlled. The intelligent terminal 200-1 sends the user's recording data to the server 400, so that the server 400 recognizes the voice command and obtains the specific control information corresponding to it, determines from the control information the second terminal device that the user actually wants to control, and directly controls the second terminal device to execute the corresponding control instruction. In other words, voice control of the terminal device is realized through interaction between the server 400 and the intelligent terminal 200-1. Alternatively, the user can record through a local server 400, for example through the recording module in an Internet of Things terminal, where the recording likewise mainly expresses the user's control intention and does not name the second terminal device to be controlled. The local server 400 recognizes the voice command entered by the user to obtain the specific control information corresponding to the voice command, determines from the control information the second terminal device that the user actually wants to control, and directly controls the second terminal device to execute the corresponding control instruction.

In the above process, the user only needs to record; the rest of the processing requires no user participation. The terminal device can thus be automatically controlled by voice to execute the corresponding control instruction, which makes smart home devices convenient to control and use and helps improve intelligence and accuracy.

It should be noted that: a smart home scene may include various terminal devices, and FIG. 22 is only an illustration, and does not specifically limit the type and number of smart devices.

The voice control method provided in the embodiment of the present application may be implemented based on a computer device, or a functional module or a functional entity in the computer device.

The computer device may be a personal computer (PC), a server, a mobile phone, a tablet computer, a notebook computer, a mainframe computer, etc., which is not specifically limited in this embodiment of the present application.

The voice control method provided in the embodiment of the present application may be implemented based on the above computer device.

The voice control method recognizes the user's voice command and obtains the control information corresponding to it, where the control information includes a function category and a control instruction. Then, according to a pre-established terminal device information table, a first candidate terminal device set corresponding to the function category is determined. Next, based on the functional state of each candidate terminal device in the first candidate terminal device set, a second candidate terminal device set matching the control instruction is determined. Finally, the second terminal device matching the control instruction is determined from the second candidate terminal device set and is controlled to execute the control instruction. The terminal device is thus automatically controlled by voice to execute the corresponding control instruction, which makes it convenient for users to control and use smart home devices and helps improve intelligence and accuracy.
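
The two-stage narrowing described above can be sketched as follows. The terminal device information table, the functional states, and the rule that a control instruction applies only in certain states are all assumptions made for illustration; the application does not give their concrete form.

```python
# Hypothetical terminal device information table:
# device -> supported function categories and current functional state per category.
DEVICE_TABLE = {
    "smart TV": {
        "categories": {"volume", "video playback"},
        "state": {"video playback": "playing"},
    },
    "smart refrigerator": {
        "categories": {"video playback", "recipe"},
        "state": {"video playback": "stopped"},
    },
    "smart washing machine": {
        "categories": {"washing"},
        "state": {"washing": "stopped"},
    },
}

# Assumed rule: a control instruction is applicable only in certain functional states.
APPLICABLE_STATES = {
    "pause": {"playing"},
    "play": {"stopped", "paused"},
}


def first_candidates(function_category):
    """First candidate set: devices that support the function category."""
    return [name for name, info in DEVICE_TABLE.items()
            if function_category in info["categories"]]


def second_candidates(names, function_category, control_instruction):
    """Second candidate set: devices whose state matches the control instruction."""
    allowed = APPLICABLE_STATES.get(control_instruction, set())
    return [n for n in names
            if DEVICE_TABLE[n]["state"].get(function_category) in allowed]


def choose_target(function_category, control_instruction):
    """Return the single matching device, or None if none or several remain."""
    stage1 = first_candidates(function_category)
    stage2 = second_candidates(stage1, function_category, control_instruction)
    return stage2[0] if len(stage2) == 1 else None
```

With this table, a "pause" instruction in the video playback category resolves to the smart TV, because it is the only device currently playing; how to resolve the case where several devices remain is left out of this sketch.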

To describe this solution in more detail, an exemplary description is given below in conjunction with FIG. 23A. It can be understood that the steps involved in FIG. 23A may include more or fewer steps in actual implementation, and the order of these steps may also differ, as long as the voice control method provided in the embodiment of the present application can be realized.

FIG. 23A is a schematic flowchart of a voice control method provided in an embodiment of the present application, and FIG. 23B is a schematic diagram of the principle of a voice control method provided in an embodiment of the present application. This embodiment is applicable to controlling each terminal device included in a smart home scenario. The method of this embodiment can be executed by a voice control apparatus, which can be implemented in hardware and/or software and can be configured in a computer device.

As shown in Figure 23A, the method specifically includes the following steps:

S2301. Recognize a user's voice command to obtain corresponding control information, where the control information includes a function category and a control command.

Here, the voice command can be understood as the audio data recorded from the user's speech. The control information can be understood as the control intention corresponding to the user's voice command; it includes the function category and the control instruction related to the terminal device, but does not identify the specific second terminal device to be controlled. Terminal devices can be understood as the various devices included in smart home scenarios, such as audio and video equipment, lighting systems, curtain control, air conditioning control, digital theater systems, audio and video servers, and network appliances. The function category can be understood as the category to which a specific function of the terminal device belongs. For example, the categories corresponding to a smart TV may include volume, brightness, video playback scene, and recipe scene. The control instruction can be understood as an operation instruction related to the terminal device, such as open, close, play, and pause.

In a smart home scenario that includes multiple different types of terminal devices, each terminal device is in a different control state, and the user needs to specify the specific control state of each terminal device in order to control it. A voice command that does not specify a terminal device may therefore fail to execute during voice control, or may require guiding the user to supply additional information several times before the terminal device to be controlled can be determined.

The execution subject in this embodiment may be a local control device 200 with processing and interaction functions, such as an Internet of Things terminal, or a server 400 that interacts with the smart terminal 300. After the user's recording data is obtained, the local control device 200 or the server 400 cannot directly obtain the specific information contained in the voice command, so the voice command must first be recognized. Recognition may be performed by a speech recognition method combined with a semantic understanding method, or by other means such as a neural network model or a speech recognition system; this embodiment does not specifically limit the method. After recognition, the control information corresponding to the voice command is obtained.

S2302. Determine a first set of candidate terminal devices corresponding to the function category according to the pre-established terminal device information table.

Here, the terminal device information table can be understood as a pre-established table that records information about each terminal device in the smart home scene; the table can include the device identification number, device name, function category and function status of each terminal device. The first set of candidate terminal devices can be understood as the set of terminal devices in the smart home scene that match the function category.

After the control information corresponding to the voice command is obtained, the function category of each terminal device in the terminal device information table is matched against the function category in the control information, which yields the first set of candidate terminal devices corresponding to the function category in the control information.
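As a non-limiting illustration, the matching in S2302 may be sketched as a simple filter over the rows of the terminal device information table. The sketch assumes that matching means exact equality of the function category, which the embodiment does not mandate; the names `DeviceEntry` and `first_candidate_set` and the sample rows are introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceEntry:
    # One row of a (hypothetical) terminal device information table.
    device_id: int
    name: str
    function_category: str
    function_status: str


def first_candidate_set(table, function_category):
    """Return the rows whose function category equals the requested one."""
    return [row for row in table if row.function_category == function_category]


# Illustrative rows only; a real table would be built from device reports.
table = [
    DeviceEntry(1, "curtain", "brightness", "closed"),
    DeviceEntry(2, "television", "brightness", "normal"),
    DeviceEntry(3, "television", "volume", "normal"),
]

candidates = first_candidate_set(table, "brightness")
```

An unknown function category simply produces an empty list, which corresponds to the case in which the first set of candidate terminal devices is an empty set.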

S2303. Determine a second set of candidate terminal devices that matches the control instruction based on the function status corresponding to each candidate terminal device in the first set of candidate terminal devices.

Here, the second set of candidate terminal devices can be understood as the set of terminal devices in the smart home scene that match the control instruction; it is the candidate set for the second terminal device that the user wants to control.

The first candidate terminal device set may contain multiple candidate terminal devices, and each candidate terminal device may be in a different functional state. Therefore, after the first candidate terminal device set is obtained, it must be narrowed down further in order to determine the terminal device that the user wants to control. To do so, the functional state of each candidate terminal device in the first candidate terminal device set is compared with the control instruction in the control information, and the candidate terminal devices that match the control instruction form the second set of candidate terminal devices.

For example, assume that the control instruction is "open", the functional state of candidate terminal device 1 is "normal", the functional state of candidate terminal device 2 is "playing", and the functional state of candidate terminal device 3 is "off". Candidate terminal device 3 is then added to the second set of candidate terminal devices.
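The comparison in S2303 may be sketched as follows. The embodiment does not state how a control instruction is compared with a functional state, so the sketch assumes a lookup table, `APPLICABLE_STATES`, that lists the states in which each instruction is meaningful; that table and the function name are hypothetical.

```python
# Hypothetical rule table: for each control instruction, the functional
# states in which the instruction is applicable. Not part of the embodiment.
APPLICABLE_STATES = {
    "open": {"off"},
    "close": {"normal", "playing"},
}


def second_candidate_set(first_set, control_instruction):
    """Keep the candidates whose functional state admits the instruction."""
    allowed = APPLICABLE_STATES.get(control_instruction, set())
    return [name for name, state in first_set if state in allowed]


# The three candidates from the example above, as (name, functional state).
first_set = [
    ("candidate 1", "normal"),
    ("candidate 2", "playing"),
    ("candidate 3", "off"),
]

result = second_candidate_set(first_set, "open")
```

With these assumed rules, only candidate 3 remains for the instruction "open", in line with the example.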

S2304. Determine a second terminal device that matches the control instruction from the second candidate terminal device set, and control the second terminal device to execute the control instruction.

Wherein, the second terminal device may be understood as a terminal device that matches the control instruction.

Since the second candidate terminal device set may include multiple terminal devices, the second terminal device that matches the control instruction must be determined from that set. There may be more than one second terminal device; the number depends on the specific circumstances and is not specifically limited in this application. After the second terminal device is determined, the corresponding control instruction is sent to it so that it executes the control instruction, which meets the user's needs and ensures that the second terminal device is controlled accurately in accordance with the user's voice command.

Optionally, FIG. 23C is a schematic diagram of the process of determining the second set of candidate terminal devices in the embodiment of the present application, as shown in FIG. 23C:

1. Determine the total set of device names, the total set of function categories, and the total set of functional states corresponding to all terminal devices;

Among them, the total set of device names is defined as Dev, and each terminal device is Dev1, Dev2, Dev3, ...;

The total set of functional categories is defined as F, and each function is F1, F2, F3, ...;

The total set of functional states is defined as S, and the functional states are respectively S1, S2, S3, . . .

2. Determine the combination of function category and function status included in each terminal device, and the corresponding set is as follows:

Dev1={F1S1, F2S2,...}

Dev2={F1S2, F2S3, F3S3,...}

Dev3={F1S1, F2S3, F4S1, F5S2,...}

Dev4={F3S3, F3S3, F5S5,...}

3. Recognize the user's voice command, and obtain the combination FxSy corresponding to the function category and the control command.

4. Query the sets from step 2; the terminal devices whose sets contain an element equal to FxSy form the second set of candidate terminal devices.
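The four steps above may be sketched with ordinary sets. Each device is represented by the set of (function category, functional state) pairs listed in step 2, without the elided elements, and the query returns the devices that contain the pair FxSy. The sketch assumes that the control instruction has already been translated into the state Sy; the name `devices_matching` is hypothetical.

```python
# Device -> set of (function category, functional state) pairs, taken from
# the listed elements of Dev1..Dev4; the elided "..." elements are omitted.
DEVICES = {
    "Dev1": {("F1", "S1"), ("F2", "S2")},
    "Dev2": {("F1", "S2"), ("F2", "S3"), ("F3", "S3")},
    "Dev3": {("F1", "S1"), ("F2", "S3"), ("F4", "S1"), ("F5", "S2")},
    "Dev4": {("F3", "S3"), ("F5", "S5")},
}


def devices_matching(devices, fx, sy):
    """Return, in name order, the devices whose set contains (fx, sy)."""
    return sorted(name for name, combos in devices.items() if (fx, sy) in combos)


matches = devices_matching(DEVICES, "F1", "S1")
```

Because set membership is a constant-time lookup on average, the query cost grows with the number of devices rather than with the number of functions per device.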

In this embodiment, the user's voice command is first recognized to obtain the corresponding control information, which includes a function category and a control instruction. A first set of candidate terminal devices corresponding to the function category is then determined according to the pre-established terminal device information table; a second set of candidate terminal devices that matches the control instruction is determined based on the functional state of each candidate terminal device in the first set; and finally the second terminal device that matches the control instruction is determined from the second set and controlled to execute the control instruction. Terminal devices are thus controlled automatically by voice, which makes smart home devices more convenient for the user to control and use and helps improve intelligence and accuracy.

In some embodiments, optionally, the terminal device information table is obtained in the following manner:

Obtain the device name, function name, function category and function status corresponding to each terminal device included in the preset scene;

According to all the device names, function names, function categories and function states, create or update the corresponding terminal device information table.

Wherein, the preset scene can be understood as a scene that includes multiple terminal devices and the multiple terminal devices are interconnected through a network, such as a smart home scene, a smart office scene, and the like.

Specifically, the device name, function name, function category and function status of each terminal device in the preset scene may be obtained from the device information reported by the terminal devices, or in other ways. After the information for each terminal device is obtained, the corresponding terminal device information table can be established from all the device names, function names, function categories and function statuses; and when at least one of the device name, function name, function category and function status changes, the terminal device information table can be updated in time.

In this embodiment, establishing or updating the terminal device information table in this way ensures that the table stays consistent with the actual functional status of each terminal device, which facilitates determining the first set of candidate terminal devices and ensures the accuracy of that set.
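One possible way to create and update such a table from device reports is sketched below. The embodiment does not specify the table's key or the report format; the sketch assumes that a row is identified by the pair (device name, function name) and that a report is a dictionary with the four fields named above. The function name `apply_report` and the sample values are hypothetical.

```python
def apply_report(table, report):
    """Insert a new row, or overwrite the existing row, for one report.

    Rows are keyed by (device name, function name), which is an assumption
    made for this sketch.
    """
    key = (report["device_name"], report["function_name"])
    table[key] = {
        "function_category": report["function_category"],
        "function_status": report["function_status"],
    }
    return table


table = {}

# First report creates the row.
apply_report(table, {
    "device_name": "curtain",
    "function_name": "shade",
    "function_category": "brightness",
    "function_status": "closed",
})

# A later report for the same function updates the status in place.
apply_report(table, {
    "device_name": "curtain",
    "function_name": "shade",
    "function_category": "brightness",
    "function_status": "open",
})
```

Using the same operation for creation and update keeps the table consistent with the most recent report for each function.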

In some embodiments, optionally, the method further includes:

If the first set of candidate terminal devices is an empty set, or if the first set of candidate terminal devices is a non-empty set and the second set of candidate terminal devices is an empty set, sending second prompt information, wherein the second prompt information is used to instruct the user to determine the second terminal device from multiple terminal devices;

receiving second response information, where the second response information includes second identification information corresponding to the second terminal device;

controlling the second terminal device corresponding to the second identification information to execute the control instruction.

Here, the first set of candidate terminal devices being an empty set means that no candidate terminal device in the scene matches the function category. Likewise, the second set of candidate terminal devices being an empty set means that no candidate terminal device matches the control instruction.

Specifically, if the first set of candidate terminal devices is an empty set, or if the first set is non-empty and the second set of candidate terminal devices is an empty set, the terminal device that the user actually wants to control cannot currently be determined. In this case, the second prompt information can be sent. For example, the local control device 200 may send the second prompt information to its own display screen or audio application, which displays or plays it to instruct the user to determine the second terminal device from the multiple terminal devices; or the server 400 may send the second prompt information to the smart terminal 300 to instruct the user to determine the second terminal device from the multiple terminal devices. The second response information fed back by the user is then received. Since the second response information includes the second identification information corresponding to the second terminal device, the second terminal device corresponding to the second identification information can be directly controlled to execute the control instruction.

In this embodiment, when the terminal device that the user actually wants to control cannot be determined currently, the second terminal device can be determined through the above method, so as to meet the control requirement of the user and improve the user experience.
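The fallback described above may be sketched as two small functions. The prompt and the response are modeled as a callback, `ask_user`, that receives the list of terminal devices and returns the identifier chosen by the user; this callback and both function names are assumptions made for this sketch.

```python
def needs_second_prompt(first_set, second_set):
    """True when the target cannot be determined from the candidate sets."""
    return len(first_set) == 0 or len(second_set) == 0


def fallback_target(all_devices, ask_user):
    """Send the second prompt and return the identifier in the response."""
    chosen = ask_user(all_devices)
    if chosen not in all_devices:
        raise ValueError("response does not identify a known terminal device")
    return chosen


# Usage: the lambda stands in for the user's answer to the prompt.
target = fallback_target(["television", "oven"], lambda devices: "oven")
```

Because the second set is derived from the first, an empty first set implies an empty second set; both conditions are nevertheless checked to mirror the wording of the embodiment.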

FIG. 24A is a schematic flow chart of another voice control method provided by the embodiment of the present application, and FIG. 24B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application. This embodiment is further expanded and optimized on the basis of the foregoing embodiments. Optionally, a possible implementation of S2304 in this embodiment is as follows:

S23041. Determine the number of all second candidate terminal devices included in the second candidate terminal device set that matches the control instruction.

Since the second candidate terminal device set may contain multiple second candidate terminal devices, in order to determine the second terminal device that matches the control instruction, the number of second candidate terminal devices in the second candidate terminal device set must first be determined, so that the second terminal device can be determined from the set according to the relationship between that number and the preset threshold.

S23042. Determine the second terminal device from the second candidate terminal device set according to the relationship between the quantity and the preset threshold, and control the second terminal device to execute the control instruction.

Wherein, the preset threshold may be a preset value, such as 1, 3, etc., and may also be determined according to specific circumstances, which is not specifically limited in this embodiment.

After the number of second candidate terminal devices in the second candidate terminal device set is obtained, it is compared with the preset threshold to obtain the size relationship between the two, and the second terminal device is then determined from the second candidate terminal device set according to that relationship. For example, all of the second candidate terminal devices in the set may be determined as second terminal devices, or only some of them may be. After the second terminal device is determined, it is controlled to execute the control instruction, so that the smart home is controlled by voice and user operations are reduced.

In this embodiment, determining the second terminal device through the above method is simple and quick, and can improve work efficiency.

FIG. 25A is a schematic flowchart of another voice control method provided by the embodiment of the present application, and FIG. 25B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application. This embodiment is further expanded and optimized on the basis of the foregoing embodiments. Optionally, a possible implementation of S23042 in this embodiment is as follows:

S230421. Determine whether the number of second candidate terminal devices is less than or equal to a preset threshold.

After the number of second candidate terminal devices in the second candidate terminal device set is obtained, the number is compared with the preset threshold to determine whether it is less than or equal to the preset threshold.

If yes, execute S230422-S230423; if not, execute S230424-S230426.

S230422. Determine all second candidate terminal devices included in the second candidate terminal device set as second terminal devices.

If the number of second candidate terminal devices is less than or equal to the preset threshold, the number does not exceed the upper limit. Therefore, all second candidate terminal devices included in the second candidate terminal device set are determined as second terminal devices.

S230423. Control each second terminal device to respectively execute the control instruction.

After all the second candidate terminal devices included in the second candidate terminal device set are determined as the second terminal devices, a control instruction needs to be sent to each second terminal device, so as to control each second terminal device to respectively execute the control instruction.

S230424. Send first prompt information, where the first prompt information is used to instruct the user to determine the second terminal device from the second candidate terminal device set.

If the number of second candidate terminal devices is greater than the preset threshold, the number exceeds the upper limit, and the first prompt information is sent. For example, the local control device 200 may send the first prompt information to its own display screen or audio application, which displays or plays it to instruct the user to determine the second terminal device from the multiple terminal devices; or the server 400 may send the first prompt information to the smart terminal device 204 to instruct the user to determine the second terminal device from the multiple terminal devices.

S230425. Receive first response information, where the first response information includes first identification information corresponding to the second terminal device.

The first response information fed back by the user is received, so as to subsequently control the second terminal device corresponding to the first identification information to execute the control instruction.

S230426. Control the second terminal device corresponding to the first identification information to execute the control instruction.

Since the first response information includes the first identification information corresponding to the second terminal device, it is possible to directly control the second terminal device corresponding to the first identification information to execute the control instruction.

In this embodiment, different steps are performed for the two possible size relationships between the number of second candidate terminal devices in the second candidate terminal device set and the preset threshold, which can further improve the intelligence and accuracy of the smart home voice control process.
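The branch in S230421 to S230426 may be sketched as follows. The first prompt and the first response are modeled as a callback, `ask_user`, that receives the second candidate set and returns the chosen identifier; the callback and the function name `select_targets` are assumptions made for this sketch, and sending the control instruction is left out.

```python
def select_targets(second_set, threshold, ask_user):
    """Choose the second terminal device(s) from the second candidate set."""
    if len(second_set) <= threshold:
        # S230422: every candidate becomes a second terminal device.
        return list(second_set)
    # S230424-S230425: prompt the user and read the chosen identifier.
    chosen = ask_user(second_set)
    # S230426: only the chosen device executes the control instruction.
    return [chosen]


# One candidate and a threshold of 1: no prompt is needed.
direct = select_targets(["curtain"], 1, lambda s: s[0])

# Two candidates and a threshold of 1: the user picks one of them.
prompted = select_targets(["television", "smart speaker"], 1,
                          lambda s: "smart speaker")
```

A larger threshold lets several devices act on the same command without a prompt, for example `select_targets(["television", "smart speaker"], 3, ...)` returns both devices.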

FIG. 26A is a schematic flowchart of another voice control method provided by the embodiment of the present application, and FIG. 26B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application. This embodiment is further expanded and optimized on the basis of the foregoing embodiments. Optionally, a possible implementation of S2301 in this embodiment is as follows:

S23011. Perform text recognition on the voice command by a voice recognition method to obtain text information corresponding to the voice command.

Wherein, the speech recognition method is a method for converting speech into text, such as speech recognition software.

The speech recognition method can perform text recognition on the speech instruction, so as to obtain the text information corresponding to the speech instruction.

S23012. Perform semantic understanding on the text information by a semantic understanding method to obtain the control information contained in the text information, where the control information includes a function category and a control instruction.

Wherein, the semantic understanding method may include a keyword extraction method, an information extraction method, and the like.

After the text information is obtained, because machine-recognized text may contain redundant or repeated information, the text information is semantically understood by the semantic understanding method in order to further improve the accuracy of recognition, and the control information contained in the text information is obtained. The control information includes a function category and a control instruction.

In this embodiment, the control information obtained through the above method is more accurate and more in line with the actual situation, which is beneficial to ensure the smooth progress of the subsequent process.

Exemplarily, FIG. 26C is a schematic diagram of the process of obtaining control information in the embodiment of the present application, as shown in FIG. 26C:

Firstly, voice recognition is performed on the voice command to obtain the first information, and then the semantic understanding of the first information is performed to obtain the control information.
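The second stage of this process may be sketched with a keyword extraction method, one of the semantic understanding methods named above. The sketch starts from text that is assumed to have already been produced by speech recognition. The rule table `KEYWORD_RULES` is hypothetical; its entries borrow the phrases and control information used in the examples of this embodiment.

```python
# Hypothetical keyword rules: phrase -> (function category, control instruction).
KEYWORD_RULES = {
    "too dark": ("brightness", "increase"),
    "too loud": ("volume", "lower"),
    "close the door": ("door", "close"),
}


def understand(text):
    """Extract control information from recognized text, or return None."""
    normalized = text.lower().strip()
    for phrase, (category, instruction) in KEYWORD_RULES.items():
        if phrase in normalized:
            return {
                "function_category": category,
                "control_instruction": instruction,
            }
    return None


info = understand("It is too dark in here")
```

Substring matching ignores redundant words around the key phrase, which is the kind of redundancy the semantic understanding step is meant to remove; a production system would likely use a trained model instead.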

Exemplarily, FIG. 27A is a schematic structural diagram of a local control device in the embodiment of the present application, as shown in FIG. 27A:

The local control device 200 includes a voice recognition service, a semantic understanding service and a terminal device control service (the home control service). The voice recognition service is mainly used for recording and recognizing the user's voice command to obtain a recognition result; the semantic understanding service is mainly used for determining the control information according to the recognition result; and the home control service is used for maintaining the terminal device information table, receiving the device information reported by the terminal devices, and controlling the corresponding terminal device according to the control information.

FIG. 27B is a schematic structural diagram of interaction between a local control device and a terminal device in the embodiment of the present application, as shown in FIG. 27B:

The voice recognition service includes a recording module and a recognition engine: the recording module records the user's voice command, and the recognition engine recognizes it to obtain a recognition result. The semantic understanding service determines the function category and the control instruction. The home control service includes the terminal device information table, determination of the second terminal device, and voice command control. The home control service interacts with each terminal device, such as terminal device A, terminal device B, ..., terminal device N. It obtains the terminal device information table according to the device information reported by each terminal device; determines the first set of candidate terminal devices according to the terminal device information table and the function category; determines the second set of candidate terminal devices according to the first set of candidate terminal devices and the control instruction; determines the second terminal device from the second set of candidate terminal devices; and controls the second terminal device to execute the control instruction, thereby controlling the smart home through the voice command. Each terminal device is responsible for reporting its device information and for receiving and executing the control instructions issued by the home control service.

In some embodiments, it is assumed that the terminal device information table is as shown in Table 2 below:

Device identification number	Device name	Function category	Functional status
1	curtain	brightness	closed
2	television	brightness	normal
3	television	volume	normal
4	television	recipe scene	showing UI
5	smart speaker	music scene	now playing
6	oven	food scene	steak placed
7	refrigerator	door	open
8	washing machine	door	closed

Table 2

The information contained in each terminal device in Table 2 is as follows:

1. The curtains are closed;

2. The brightness and volume of the TV are normal, and the recipe UI is being displayed;

3. The smart speaker is playing music;

4. The steak has been placed in the oven, waiting to start cooking.

5. The refrigerator door is open.

6. The door of the washing machine is closed.

It should be noted that each device included in Table 2 is a terminal device.

Example 1: Assume that the voice command is "too dark", so that the function category is brightness and the control instruction is increase, and that the preset threshold is 1. According to the function category and Table 2, the first set of candidate terminal devices is determined to be device 1 (curtain) and device 2 (television). According to the functional status of each terminal device in the first set and the control instruction, the second set of candidate terminal devices is determined to be device 1 (curtain). Because the number of second candidate terminal devices in the second set is equal to the preset threshold, the second terminal device is determined to be the curtain, and the curtain is controlled to open.

Example 2: Assume that the voice command is "I want to grill a steak", so that the function categories are food scene and recipe scene and the control instructions are cook and query, and that the preset threshold is 1. According to the function categories and Table 2, the first set of candidate terminal devices is determined to be device 4 (television) and device 6 (oven). According to the functional status of each terminal device in the first set and the control instructions, the second set of candidate terminal devices is determined to be device 6 (oven). Because the number of second candidate terminal devices in the second set is equal to the preset threshold, the second terminal device is determined to be the oven, and the oven is controlled to grill the steak.

In some embodiments, if it is detected that there is no steak in the oven but the TV supports the recipe query function, the TV presents a recipe for grilled steak.

Example 3: Assume that the voice command is "too loud", so that the function category is volume and the control instruction is lower, and that the preset threshold is 1. According to the function category and Table 2, the first set of candidate terminal devices is determined to be device 3 (television) and device 5 (smart speaker). According to the functional status of each terminal device in the first set and the control instruction, the second set of candidate terminal devices is determined to be device 5 (smart speaker). Because the number of second candidate terminal devices in the second set is equal to the preset threshold, the second terminal device is determined to be the smart speaker, and the smart speaker is controlled to turn down its volume.

In some embodiments, if it is detected that the TV is also playing a video, the user is prompted to choose whether the volume should be turned down on the TV or on the smart speaker.

Example 4: Assume that the voice command is "close the door", so that the function category is door and the control instruction is close, and that the preset threshold is 1. According to the function category and Table 2, the first set of candidate terminal devices is determined to be device 7 (refrigerator) and device 8 (washing machine). According to the functional status of each terminal device in the first set and the control instruction, the second set of candidate terminal devices is determined to be device 7 (refrigerator). Because the number of second candidate terminal devices in the second set is equal to the preset threshold, the second terminal device is determined to be the refrigerator, and the refrigerator is controlled to close its door.

In some embodiments, if it is detected that the oven door is also open, the user is prompted to choose whether the door to be closed is that of the refrigerator or that of the oven.
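Examples 1 and 4 may be reproduced end to end with the short sketch below. It assumes device identification numbers 1 to 8 as used in the examples, exact equality when matching function categories, and a hypothetical rule table, `APPLICABLE`, that states in which functional status each control instruction applies. Examples 2 and 3 rely on matching across different function categories, which this simplified sketch does not attempt.

```python
# Rows: (device id, device name, function category, functional status).
TABLE_2 = [
    (1, "curtain", "brightness", "closed"),
    (2, "television", "brightness", "normal"),
    (3, "television", "volume", "normal"),
    (4, "television", "recipe scene", "showing UI"),
    (5, "smart speaker", "music scene", "now playing"),
    (6, "oven", "food scene", "steak placed"),
    (7, "refrigerator", "door", "open"),
    (8, "washing machine", "door", "closed"),
]

# Hypothetical rules: (category, instruction) -> statuses where it applies.
APPLICABLE = {
    ("brightness", "increase"): {"closed"},
    ("door", "close"): {"open"},
}


def run(category, instruction, threshold=1):
    """Return the names of the devices to control, or None if a prompt is needed."""
    first = [row for row in TABLE_2 if row[2] == category]
    allowed = APPLICABLE.get((category, instruction), set())
    second = [row for row in first if row[3] in allowed]
    if second and len(second) <= threshold:
        return [row[1] for row in second]
    return None  # empty set or too many candidates: ask the user


example_1 = run("brightness", "increase")
example_4 = run("door", "close")
```

Under these assumptions, the call for "too dark" selects the curtain and the call for "close the door" selects the refrigerator; a request with no applicable rule returns `None`, which corresponds to prompting the user.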

In some embodiments, the present application provides an electronic device that includes at least a processor and a memory, wherein the processor, when executing the computer program stored in the memory, implements the voice control method of any one of the terminal devices described above.

In some embodiments, the present application provides a computer-readable non-volatile storage medium that stores a computer program, wherein the computer program, when executed by a processor, implements the voice control method of any one of the terminal devices described above.

Claims

A server for voice control, wherein the server is configured to execute:

receiving a voice signal sent by a first terminal device, generating a voice instruction according to the voice signal, and feeding back the voice instruction to the first terminal device;

when the first terminal device cannot perform the operation corresponding to the voice instruction, receiving an instruction distribution request sent by the first terminal device, the instruction distribution request carrying the voice instruction;

according to the instruction distribution request, searching for a second terminal device that can perform the operation corresponding to the voice instruction, and sending the voice instruction to the second terminal device, so that the second terminal device executes the corresponding operation in response to the voice instruction;

when the first terminal device can execute the operation corresponding to the voice instruction, no instruction distribution request is received from the first terminal device.
The server according to claim 1, wherein, when the voice instruction carries only a device name, sending the voice instruction to the second terminal device according to the instruction distribution request comprises:

Searching for the second terminal device corresponding to the device name according to the instruction distribution request, and sending the voice instruction to the second terminal device.
The server according to claim 1, wherein, when the voice instruction carries only a device capability parameter, sending the voice instruction to the second terminal device according to the instruction distribution request comprises:

Searching for the second terminal device having the device capability parameter according to the instruction distribution request, and sending the voice instruction to the second terminal device.
The server according to claim 1, wherein, when the voice instruction is an instruction corresponding to a custom rule in which the voice instruction has a correspondence with a terminal device, sending the voice instruction to the second terminal device according to the instruction distribution request comprises:

Searching for the second terminal device corresponding to the custom rule according to the instruction distribution request, and sending the voice instruction to the second terminal device.
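The three lookup cases above (device name only, device capability parameter only, custom rule) can be pictured with a small sketch. The registry contents, the `find_second_device` function, and the order in which the cases are checked are all assumptions for illustration; the claims treat the cases separately and do not fix an order.

```python
from typing import Optional

# Hypothetical registry: device name -> device capability parameters.
DEVICES = {
    "living room tv": {"video", "audio"},
    "kitchen speaker": {"audio"},
    "air conditioner": {"cooling"},
}

# Hypothetical custom rules: instruction text -> device name.
CUSTOM_RULES = {"movie night": "living room tv"}


def find_second_device(text: Optional[str] = None,
                       device_name: Optional[str] = None,
                       capability: Optional[str] = None) -> Optional[str]:
    """Return the name of a device that should receive the instruction."""
    if text in CUSTOM_RULES:
        # The instruction corresponds to a custom rule.
        return CUSTOM_RULES[text]
    if device_name is not None:
        # The instruction carries only a device name.
        return device_name if device_name in DEVICES else None
    if capability is not None:
        # The instruction carries only a capability parameter.
        for name, capabilities in DEVICES.items():
            if capability in capabilities:
                return name
    return None
```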
The server according to claim 1, wherein the voice instruction includes at least two matching items, each of which is provided with a weight attribute value;

According to the instruction distribution request, sending the voice instruction to the second terminal device includes:

When at least two terminal devices each satisfy at least one of the matching items in the voice instruction, calculating, for each of those terminal devices, the total weight attribute value of the matching items that it satisfies, and sending the voice instruction to the second terminal device, wherein the total weight attribute value is the sum of the weight attribute values of the satisfied matching items, and the second terminal device is the terminal device with the largest total weight attribute value.
The server according to claim 5, wherein the matching item is one of a device name, a response time period of the device, a space where the device exists, and a device capability parameter.
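As a hedged sketch of the weighted selection described above, each device's total is the sum of the weights of the matching items it satisfies, and the device with the largest total is chosen. The function name, the dictionary layout, the specific weights, and the tie-breaking behavior (the first device encountered wins) are assumptions not stated in the claims.

```python
def pick_by_weight(required, weights, devices):
    """Return (device name, total weight) for the device with the largest total.

    required: matching item -> value requested by the voice instruction
    weights:  matching item -> weight attribute value
    devices:  device name -> {matching item -> value the device has}
    """
    best_name, best_total = None, 0
    for name, attributes in devices.items():
        total = sum(
            weights[item]
            for item, value in required.items()
            if attributes.get(item) == value
        )
        # Strict comparison: on a tie the earlier device is kept (assumption).
        if total > best_total:
            best_name, best_total = name, total
    return best_name, best_total


required = {"device_name": "tv", "space": "bedroom", "capability": "video"}
weights = {"device_name": 4, "space": 2, "capability": 1}
devices = {
    "bedroom tv": {"device_name": "tv", "space": "bedroom", "capability": "video"},
    "living room tv": {"device_name": "tv", "space": "living room", "capability": "video"},
    "bedroom tablet": {"device_name": "tablet", "space": "bedroom", "capability": "video"},
}
print(pick_by_weight(required, weights, devices))  # ('bedroom tv', 7)
```

Here the bedroom TV satisfies all three items (4 + 2 + 1 = 7), the living room TV satisfies two (4 + 1 = 5), and the tablet satisfies two (2 + 1 = 3).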
The server according to claim 1, configured to execute:

Analyzing the service requirement information in the voice instruction;

Screening a second terminal device according to the service requirement information, where the second terminal device is a terminal device whose device state allows it to fulfil the service requirement information;

The server is further configured to: send a silence instruction to other terminal devices in the current voice control system other than the second terminal device.
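One way to picture the last step is a per-device command plan in which only the selected second terminal device responds and every other device in the voice control system is silenced. The function name and the `"respond"` and `"silence"` labels are placeholders for this sketch.

```python
def plan_commands(device_ids, second_id):
    """Map each device in the voice control system to the command it receives."""
    return {
        device_id: ("respond" if device_id == second_id else "silence")
        for device_id in device_ids
    }
```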
The server according to claim 7, the server is further configured to perform:

Acquiring voice and audio data corresponding to the voice command, and identifying a wake-up word from the voice and audio data;

If the voice and audio data includes the wake-up word, locate the voice control system where the first terminal device is located;

Sending a status acquisition request to the voice control system, so that all terminal devices in the voice control system report their device status after receiving the status acquisition request.
The server according to claim 7, wherein the service requirement information includes a service type and a service status, and the server is further configured to perform:

In the step of screening the second terminal device according to the service requirement information, extracting the service type and service status from the service requirement information;

matching a candidate terminal device that meets the service type in the device state, where the candidate terminal device has a device type that meets the requirements of the service type;

Traversing the device states of the candidate terminal devices to filter out a second terminal device whose device state conforms to the service state.
The server according to claim 9, wherein the service requirement information further includes a service execution location, and the server is further configured to perform:

In the step of screening the second terminal device according to the service requirement information, extracting the service execution location from the service requirement information;

Obtain the device location of each candidate terminal device in the current voice control system;

If the device location of the candidate terminal device coincides with the service execution location, perform the step of traversing the device status of the candidate terminal device;

If the device location of the candidate terminal device does not coincide with the service execution location, mark that the candidate terminal device is not the second terminal device.
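The screening steps above (match by service type, optionally keep only devices at the service execution location, then check device state) could be sketched as below. The record fields and the interpretation of "device state conforms to the service state" as membership in a set of acceptable states are assumptions for this example.

```python
def screen_devices(devices, service_type, acceptable_states, location=None):
    """Return the ids of devices that pass type, location, and state screening."""
    # Step 1: candidates whose device type meets the service type.
    candidates = [d for d in devices if service_type in d["types"]]
    # Step 2: if a service execution location is given, drop devices elsewhere.
    if location is not None:
        candidates = [d for d in candidates if d["location"] == location]
    # Step 3: traverse device states and keep those conforming to the service state.
    return [d["id"] for d in candidates if d["state"] in acceptable_states]


devices = [
    {"id": "tv-1", "types": {"video"}, "location": "bedroom", "state": "idle"},
    {"id": "tv-2", "types": {"video"}, "location": "living room", "state": "idle"},
    {"id": "tv-3", "types": {"video"}, "location": "bedroom", "state": "busy"},
    {"id": "lamp-1", "types": {"light"}, "location": "bedroom", "state": "idle"},
]
```

With these hypothetical records, `screen_devices(devices, "video", {"idle"}, "bedroom")` keeps only `tv-1`, because `tv-2` is in another room and `tv-3` is busy.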
The server according to claim 10, the server is further configured to perform:

In the step of screening the second terminal device according to the service requirement information, acquiring the number of terminal devices whose device status can realize the service requirement information;

If the number of terminal devices is greater than or equal to 2, searching for a main terminal device and using the main terminal device to interact with the user to determine the second terminal device, the main terminal device being one of the plurality of terminal devices;

If the number of terminal devices is equal to 1, mark the terminal device capable of realizing the service requirement information as the second terminal device.
The server according to claim 11, the server is further configured to perform:

After the step of searching for the main terminal device, sending an inquiry instruction to the main terminal device, so that the main terminal device plays an inquiry voice, the inquiry instruction being a multi-round, wake-up-free voice interaction instruction;

receiving a confirmation voice command input by the user through the main terminal device;

extracting the identification information of the second terminal device from the confirmation voice instruction;

Screening the second terminal device from a plurality of terminal devices capable of realizing the service requirement information according to the identification information of the second terminal device.
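A minimal sketch of the disambiguation logic above: one match is used directly, while two or more matches trigger an inquiry through the main terminal device. The `ask_user` callback stands in for the inquiry voice and the confirmation voice command, and matching the device name as a substring of the user's answer is an assumption about how identification information is extracted.

```python
def resolve_target(matches, ask_user):
    """Pick the second terminal device from the devices that can serve the request.

    matches:  names of devices able to realize the service requirement
    ask_user: callable taking a question string and returning the user's answer
    """
    if len(matches) == 1:
        return matches[0]
    if len(matches) >= 2:
        answer = ask_user(f"Which device should respond: {', '.join(matches)}?")
        # Extract identification information: the first device named in the answer.
        return next((name for name in matches if name in answer), None)
    return None
```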
The server according to claim 7, the server is further configured to perform:

parsing the identification information of the second terminal device from the voice instruction;

If the voice command includes the identification information of the second terminal device, generating a control command and feedback voice information according to the voice command;

Sending the control command to the second terminal device according to the identification information of the second terminal device, and sending the feedback voice information to the first terminal device.
The server according to claim 7, the server is further configured to perform:

After the step of sending a response command to the second terminal device, receiving the execution result data reported by the second terminal device, the execution result data includes the new state of the device after running the response command;

extracting the new state of the device from the execution result;

updating the device state stored in the storage module with the new device state.
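The state update above amounts to overwriting the stored state with the state reported after execution. In this sketch the storage module is modeled as a plain dictionary, and the field names `device_id` and `new_state` are hypothetical.

```python
def apply_execution_result(stored_states, result):
    """Replace a device's stored state with the new state it reported."""
    new_state = dict(result["new_state"])  # copy so later changes to the report do not leak in
    stored_states[result["device_id"]] = new_state
    return new_state
```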
The server according to claim 1, said server being configured to:

receiving a voice instruction including a user identification sent by the first terminal device,

Find all terminal devices related to the user identification;

When there is no terminal device related to the user identifier, feeding back a parameter representing the absence of a terminal device, so that the first terminal device broadcasts that there is no terminal device executing a voice command;

When there is a terminal device related to the user identifier, using preset filtering rules to filter out the best-matching second terminal device, and feeding back a parameter characterizing the best-matching second terminal device, so that the first terminal device broadcasts that there is a second terminal device executing the voice command, and controlling the best-matching second terminal device to execute the voice command.
The server according to claim 15, wherein the filtering rules characterize the mapping relationship between voice commands and terminal devices, and the filtering rules include both a first group of rules and a second group of rules, or include only the first group of rules, wherein the first group of rules refers to rules that must be applied to filter out the best-matching second terminal device, and the second group of rules refers to rules that are applied cumulatively, one by one, to filter the second terminal devices.
The server according to claim 16, wherein the first group of rules includes a terminal device function authority sub-rule, and, in the step of using preset filtering rules to filter out the best-matching second terminal device when there is a terminal device related to the user identifier, the server is further configured to:

Using the first set of rules to screen the current terminal device and the terminal device related to the user identifier;

If the terminal device related to the user identifier does not have the authority to execute the voice command, feeding back a parameter representing the absence of a second terminal device, so that the first terminal device broadcasts that there is no second terminal device that executes the voice command;

If the terminal device related to the user identifier has the right to execute the voice command, confirm the number of second terminal devices that have the right to execute the voice command.
The server according to claim 17, wherein the second group of rules includes a plurality of sub-rules having priorities, and, after confirming the number of second terminal devices having the authority to execute the voice command, the server is further configured to:

When there is one second terminal device with the authority to execute the voice command, feeding back a parameter representing that second terminal device, so that the first terminal device broadcasts the second terminal device with the corresponding authority, and controlling that second terminal device to execute the voice command;

When there are multiple second terminal devices with the authority to execute the voice command, using the sub-rules in the second group of rules, in order of priority from high to low, to screen the multiple second terminal devices until a best-matching second terminal device is selected.
The server according to claim 16, wherein the second group of rules includes at least one of a user usage frequency sub-rule, a sub-rule on distance from the first terminal device, and a terminal device priority sub-rule.
The server according to claim 19, wherein, when the second terminal device is screened using the user usage frequency sub-rule, the server is configured to:

Detecting respectively the execution frequency of multiple second terminal devices having the authority to execute the voice command, wherein the execution frequency refers to the number of times the second terminal device has executed voice commands of the same function in historical behaviors;

The second terminal device with the highest execution frequency is retained.
The server according to claim 19, wherein, when the second terminal device is screened using the sub-rule on distance from the first terminal device, the server is configured to:

Respectively detecting the distance between multiple second terminal devices having the authority to execute the voice command and the first terminal device;

The second terminal device closest to the first terminal device is retained.
The server according to claim 19, wherein, when the second terminal device is screened using the terminal device priority sub-rule, the server is configured to:

According to the terminal device priority set by the user, the second terminal device with the highest priority is retained.
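The two-stage filtering described in the preceding claims can be sketched as an authority check followed by a cascade of sub-rules applied in priority order until one device remains. The record fields, the function names, the particular ordering of the three sub-rules, and the fallback of taking the first remaining device if every sub-rule leaves a tie are assumptions made for illustration.

```python
def by_frequency(devices, command):
    """Retain the devices that executed this command most often in the past."""
    top = max(d["history"].get(command, 0) for d in devices)
    return [d for d in devices if d["history"].get(command, 0) == top]


def by_distance(devices, command):
    """Retain the devices closest to the first terminal device."""
    nearest = min(d["distance_m"] for d in devices)
    return [d for d in devices if d["distance_m"] == nearest]


def by_priority(devices, command):
    """Retain the devices with the highest user-set priority."""
    top = max(d["priority"] for d in devices)
    return [d for d in devices if d["priority"] == top]


def select_best(devices, command, sub_rules):
    """Apply the authority rule, then the sub-rules in order, until one device is left."""
    # First group of rules: function authority.
    authorized = [d for d in devices if command in d["permissions"]]
    if not authorized:
        return None
    remaining = authorized
    # Second group of rules: applied one by one, highest priority first.
    for rule in sub_rules:
        if len(remaining) == 1:
            break
        remaining = rule(remaining, command)
    return remaining[0]  # assumption: any leftover tie goes to the first device
```

Each sub-rule keeps all tied devices, so a later sub-rule only has to break the ties left by the earlier ones.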
The server according to claim 15, wherein, after receiving the voice instruction sent by the first terminal device, the server is further configured to:

All terminal devices and device attributes related to the user identifier are loaded from the database into the cache, so that the server searches for the second terminal device in the cache.
The server according to claim 1, configured to execute:

Recognizing the voice command to obtain corresponding control information, where the control information includes a function category and a control command;

determining a first set of candidate terminal devices corresponding to the function category according to a pre-established terminal device information table;

Determine a second set of candidate terminal devices matching the control instruction based on the functional state corresponding to each candidate terminal device in the first set of candidate terminal devices;

Determining a second terminal device that matches the control instruction from the second candidate terminal device set.
The server according to claim 24, the server is specifically configured to perform:

determining the number of all second candidate terminal devices included in the second candidate terminal device set matching the control instruction;

Determine the second terminal device from the second candidate terminal device set according to the magnitude relationship between the number and a preset threshold, and control the second terminal device to execute the control instruction.
The server according to claim 25, the server is specifically configured to perform:

If the number of all second candidate terminal devices is less than or equal to the preset threshold, determining all second candidate terminal devices included in the second candidate terminal device set as the second terminal device;

Each second terminal device is controlled to respectively execute the control instruction.
The server according to claim 26, further configured to perform:

If the number of all second candidate terminal devices is greater than the preset threshold, sending first prompt information, where the first prompt information is used to instruct the user to determine the second terminal device from the second candidate terminal device set;

receiving first response information, where the first response information includes first identification information corresponding to the second terminal device;

controlling the second terminal device corresponding to the first identification information to execute the control instruction.
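As an illustrative sketch of the candidate-set narrowing above: the first set is selected by function category from a device information table, the second set by functional state, and the size of the second set relative to a threshold decides whether all candidates act or the user is prompted. The table layout, the `TARGET_STATE` mapping, and the reading of "matching the control instruction" as "the instruction would change the device's current state" are assumptions.

```python
# Hypothetical mapping from control instruction to the state it produces.
TARGET_STATE = {"turn_on": "on", "turn_off": "off"}


def candidate_sets(table, category, command):
    """Return (first candidate set, second candidate set) for a control instruction."""
    first = [row for row in table if row["category"] == category]
    # Assumption: a device matches when the instruction would change its state.
    second = [row for row in first if row["state"] != TARGET_STATE[command]]
    return first, second


def choose_targets(second, threshold, ask_user):
    """Return the names of the devices that should execute the control instruction."""
    names = [row["name"] for row in second]
    if len(names) <= threshold:
        # Few enough candidates: every one of them executes the instruction.
        return names
    # Too many candidates: prompt the user and use the identified device.
    chosen = ask_user(names)
    return [chosen] if chosen in names else []


table = [
    {"name": "ceiling light", "category": "lighting", "state": "off"},
    {"name": "desk lamp", "category": "lighting", "state": "off"},
    {"name": "floor lamp", "category": "lighting", "state": "on"},
    {"name": "tv", "category": "video", "state": "off"},
]
```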
The server according to claim 24, the server is specifically configured to perform:

performing text recognition on the voice command by a voice recognition method to obtain text information corresponding to the voice command;

The text information is semantically understood by means of a semantic understanding method to obtain the control information contained in the text information.
The server according to claim 24, the server is specifically configured to perform:

Obtain the device name, function name, function category and function status corresponding to each terminal device included in the preset scene;

According to all the device names, function names, function categories and function states, create or update the corresponding terminal device information table.
The server according to any one of claims 24-29, the server is further configured to perform:

If the first set of candidate terminal devices is an empty set, or if the first set of candidate terminal devices is a non-empty set and the second set of candidate terminal devices is an empty set, sending second prompt information, wherein the second prompt information is used to instruct the user to determine the second terminal device from multiple terminal devices;

receiving second response information, where the second response information includes second identification information corresponding to the second terminal device;

controlling the second terminal device corresponding to the second identification information to execute the control instruction.
A terminal device for voice control, comprising:

A sound collector configured to collect a voice signal input by a user;

A controller, configured to:

receiving a voice signal input by a user from the sound collector, sending the voice signal to a server, and receiving a voice instruction from the server, wherein the voice instruction is generated according to the voice signal;

When the terminal device can perform the operation corresponding to the voice command, in response to the voice command, perform the operation corresponding to the voice command;

When the terminal device cannot execute the operation corresponding to the voice instruction, generating an instruction distribution request and sending the instruction distribution request to the server, so that the server searches, according to the instruction distribution request, for another terminal device that can execute the operation corresponding to the voice instruction, and sends the voice instruction to that other terminal device.
The terminal device according to claim 31, wherein the terminal device is configured with a local capability attribute parameter, and the terminal device determines whether it can perform the operation corresponding to the voice command by:

Parsing the to-be-processed capability attribute parameter from the voice command;

When the local capability attribute parameter matches the to-be-processed capability attribute parameter, the terminal device can execute the operation corresponding to the voice command;

When the local capability attribute parameter does not match the to-be-processed capability attribute parameter, the terminal device cannot execute the operation corresponding to the voice command.
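On the terminal side, the capability check above could look like the following sketch. Representing the local capability attribute parameters as a set of strings, the `capability` field of the instruction, and the shape of the returned request dictionary are all assumptions for illustration.

```python
def can_execute(local_capabilities, instruction):
    """True when the capability parsed from the instruction is available locally."""
    return instruction["capability"] in local_capabilities


def on_voice_instruction(local_capabilities, instruction):
    """Decide whether to execute locally or ask the server to distribute."""
    if can_execute(local_capabilities, instruction):
        return {"action": "execute", "instruction": instruction}
    # Not executable locally: build an instruction distribution request
    # that carries the original voice instruction back to the server.
    return {
        "action": "distribute",
        "request": {"type": "instruction_distribution", "instruction": instruction},
    }
```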
The terminal device according to claim 31, the controller is further configured to:

receiving a response command or a silent command issued by the server;

Running the response command or the silent command.