CN113990316A - Voice control method, device, server, terminal equipment and storage medium - Google Patents

Voice control method, device, server, terminal equipment and storage medium

Info

Publication number
CN113990316A
Authority
CN
China
Prior art keywords
terminal device
voice
information
instruction
terminal equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111234001.9A
Other languages
Chinese (zh)
Inventor
汪民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Opper Communication Co ltd
Original Assignee
Beijing Opper Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Opper Communication Co ltd filed Critical Beijing Opper Communication Co ltd
Priority to CN202111234001.9A
Publication of CN113990316A
Legal status: Pending

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L67/125Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks involving control of end-device applications over a network
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a voice control method, a voice control apparatus, a server, a terminal device, and a storage medium. The method is applied to a first server and includes: receiving a first request sent by a first terminal device, the first request carrying a first voice instruction; when the execution device corresponding to the first voice instruction is a second terminal device, inputting the first voice instruction and first information of the second terminal device into a dialog system, the first information representing a relevant state of the terminal device; obtaining a first control instruction returned by the dialog system, the first control instruction being obtained by the dialog system by parsing the first voice instruction based on the first information and being used to instruct the second terminal device to execute a first operation; and sending the first control instruction.

Description

Voice control method, device, server, terminal equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a voice control method and apparatus, a server, a terminal device, and a storage medium.
Background
Currently, electronic devices equipped with voice assistants can interact with users through intelligent conversation or instant question-and-answer. In practical applications, cross-device execution scenarios exist, in which a wake-up device receives a voice instruction and a separate execution device performs the operation indicated by that instruction. In the related art, the latency of such cross-device voice instruction execution is high.
Disclosure of Invention
In view of this, embodiments of the present application provide a voice control method and apparatus, a server, a terminal device, and a storage medium, so as to at least solve the problem of high latency of cross-device voice instruction execution in the related art.
To this end, the technical solutions of the present application are implemented as follows:
An embodiment of the present application provides a voice control method applied to a first server, including:
receiving a first request sent by a first terminal device; the first request carries a first voice instruction;
when the execution device corresponding to the first voice instruction is a second terminal device, inputting the first voice instruction and first information of the second terminal device into a dialog system; the first information represents a relevant state of the terminal device;
obtaining a first control instruction returned by the dialog system; the first control instruction is obtained by the dialog system by parsing the first voice instruction based on the first information and is used to instruct the second terminal device to execute a first operation; and
sending the first control instruction.
The embodiment of the present application further provides a voice control method, applied to a first terminal device, including:
receiving a first voice instruction;
sending a first request to a first server; the first request carries the first voice instruction;
receiving a first message correspondingly sent by the first server based on the first request, and, in response to the first message, sending a first control instruction carried in the first message to a second terminal device; wherein
the first control instruction is obtained based on parsing of the first voice instruction and is used to instruct the second terminal device to execute a first operation.
An embodiment of the present application further provides a voice control apparatus, including:
a first receiving unit, configured to receive a first request sent by a first terminal device; the first request carries a first voice instruction;
a first processing unit, configured to input the first voice instruction and first information of a second terminal device into a dialog system when the execution device corresponding to the first voice instruction is the second terminal device; the first information represents a relevant state of the terminal device;
a second processing unit, configured to obtain a first control instruction returned by the dialog system; the first control instruction is obtained by the dialog system by parsing the first voice instruction based on the first information and is used to instruct the second terminal device to execute a first operation; and
a first sending unit, configured to send the first control instruction.
An embodiment of the present application further provides a voice control apparatus, including:
a second receiving unit, configured to receive a first voice instruction;
a second sending unit, configured to send a first request to a first server; the first request carries the first voice instruction; and
a third sending unit, configured to receive a first message correspondingly sent by the first server based on the first request, and, in response to the first message, send a first control instruction carried in the first message to a second terminal device; wherein
the first control instruction is obtained based on parsing of the first voice instruction and is used to instruct the second terminal device to execute a first operation.
An embodiment of the present application further provides a server, including: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is configured to execute the steps of the voice control method when running the computer program.
An embodiment of the present application further provides a terminal device, including: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is configured to execute the steps of the voice control method when running the computer program.
An embodiment of the present application further provides a storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the voice control method described above.
In the embodiments of the present application, a first server receives a first request sent by a first terminal device, where the first request carries a first voice instruction. When the execution device corresponding to the first voice instruction is a second terminal device, the first server inputs the first voice instruction and first information representing a relevant state of the second terminal device into a dialog system; the dialog system parses the first voice instruction based on the first information to obtain a first control instruction for instructing the second terminal device to execute a first operation; and the first server obtains the first control instruction returned by the dialog system and sends it out. In this way, during cross-device execution, the first server only needs to communicate with the wake-up device, i.e., the first terminal device, to determine the control instruction corresponding to the voice instruction and instruct the corresponding execution device, i.e., the second terminal device, to execute the corresponding operation. This reduces the time consumed by information transmission in the voice control process, lowers the latency of cross-device voice instruction execution, and improves cross-device execution efficiency.
Drawings
FIG. 1 is a diagram illustrating a voice control method in the related art;
FIG. 2 is a schematic diagram of a voice control system according to an embodiment of the present application;
fig. 3 is a schematic flow chart illustrating an implementation of a voice control method according to an embodiment of the present application;
fig. 4 is a schematic flow chart illustrating an implementation of a voice control method according to another embodiment of the present application;
FIG. 5 is an interaction diagram of a voice control method provided in an embodiment of the present application;
FIG. 6 is an interaction diagram of a voice control method according to another embodiment of the present application;
fig. 7 is a schematic flow chart illustrating an implementation of a voice control method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a voice control apparatus according to another embodiment of the present application;
fig. 10 is a schematic diagram of a hardware composition structure of a terminal device according to an embodiment of the present application.
Detailed Description
In daily life, a user can perform voice control on a smart TV through a mobile phone equipped with a voice assistant. As shown in fig. 1, in that implementation the mobile phone receives the user's voice instruction and sends it to a cloud server; the dialog system of the cloud server obtains a classification result from the voice instruction, executes a dialog management process, and returns the dialog management result to the mobile phone. Based on the dialog management result, the mobile phone sends the voice instruction to the smart TV; the smart TV then sends the voice instruction together with its own state information, such as its playing state and playing mode, to the server, and the server's dialog system returns a corresponding protocol according to the voice instruction and the state information. The smart TV can thus acquire the corresponding resource according to the returned protocol and execute the corresponding playing action. However, because both the mobile phone and the smart TV need to interact with the server's dialog system, information transmission in the whole voice control process takes a long time, and the latency of cross-device voice instruction execution is high. In addition, during the voice control process the dialog system receives duplicated information from the mobile phone and the smart TV, wasting network resources and computing cost.
Based on this, an embodiment of the present application provides a voice control method in which a first server receives a first request sent by a first terminal device, where the first request carries a first voice instruction. When the execution device corresponding to the first voice instruction is a second terminal device, the first server inputs the first voice instruction and first information representing a relevant state of the second terminal device into a dialog system; the dialog system parses the first voice instruction based on the first information to obtain a first control instruction for instructing the second terminal device to execute a first operation; and the first server obtains the first control instruction returned by the dialog system and sends it out. In this way, during cross-device execution, the first server only needs to communicate with the wake-up device, i.e., the first terminal device, to determine the control instruction corresponding to the voice instruction and instruct the corresponding execution device, i.e., the second terminal device, to execute the corresponding operation. This reduces the time consumed by information transmission in the voice control process, lowers the latency of cross-device voice instruction execution, and improves cross-device execution efficiency.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 2 is a schematic diagram of a voice control system according to an embodiment of the present application. In the voice control system, at least two terminal devices form an interconnected and intercommunicating group, and the terminal devices communicate based on a short-range wireless communication technology, such as Bluetooth or Wireless Fidelity (WiFi). When the group includes multiple terminal devices, one-to-one, one-to-many, or many-to-one communication may be performed between them over the short-range wireless link. In practical applications, the interconnected group may be established based on a third-party communication framework, for example, an OAF communication framework established based on a third-party protocol. The group of at least two terminal devices communicates wirelessly with the server 21. In a cross-device execution scenario, the terminal device 22 that receives the voice instruction is generally referred to as the wake-up device, and the terminal device 23 that executes the operation indicated by the control instruction corresponding to the voice instruction is referred to as the execution device. For example, a user inputs a first voice instruction "play TV series A on the TV" to a mobile phone; the mobile phone is the wake-up device and the smart TV is the execution device.
It should be noted that each terminal device in the voice control system shown in fig. 2 has a first application installed; for example, the first application may be a voice assistant. The first application supports the user in inputting voice instructions, and supports interaction between the terminal devices and the cloud (i.e., the server 21) as well as between terminal devices at the application layer. Based on the first application, the user can operate any terminal device in the group by inputting a voice instruction.
In practical applications, the terminal device in fig. 2 may include at least one of the following:
the system comprises various internet of things terminals such as a mobile phone, an intelligent sound box, a notebook computer, an intelligent watch, a tablet personal computer, a television, a refrigerator and an air conditioner.
In an Internet of Things scenario, with a household as the unit, the terminal devices in a group include at least the various terminal devices and smart home appliances used by family members. For example, a group is established, based on a short-range wireless communication technology, among the terminal devices used by family members and guests and the smart home appliances in the home, so that all the terminal devices can communicate over the short-range wireless link, and a voice instruction can be sent to any terminal device in the group to control another terminal device to execute the corresponding operation. The wake-up device and the execution device corresponding to a voice instruction do not need to be terminal devices bound to the same account; in practice, family members or guests can use different wake-up devices to voice-control the smart TV.
The following describes in detail the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems by embodiments and with reference to the drawings. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 3 is a schematic flow chart illustrating an implementation of a voice control method according to an embodiment of the present application. The main execution body of the process is a first server, and the first server is the server 21 in fig. 2.
As shown in fig. 3, the voice control method includes:
step 301: receiving a first request sent by first terminal equipment; the first request carries a first voice instruction.
When a user wants to perform voice control on a second terminal device in the group through a first terminal device, the user inputs a first voice instruction to the first terminal device through the first application on that device, and the first server receives a first request, sent by the first terminal device, carrying the first voice instruction. The second terminal device is different from the first terminal device. The first voice instruction includes at least a device slot and a skill intent: the device slot identifies the execution device that should perform the corresponding operation, and the skill intent identifies the goal, among the skills supported by the corresponding terminal device, that the user wants to achieve, and is used to instruct that device to execute a first operation. A skill generally refers to a capability or function that a terminal device possesses. Natural language understanding mainly comprises two parts, intent recognition and slot filling, where a slot is an important piece of information in the dialog; recognizing the device slot makes clear which execution device the voice instruction targets.
For example, if the first voice instruction is "play TV series A on the TV", the first server extracts the device slot "TV" through Natural Language Processing (NLP), and the skill intent is to play TV series A.
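As an illustration only (the patent does not specify an NLP implementation), the slot-and-intent extraction described above can be sketched with a trivial keyword matcher; every keyword and name below is hypothetical:

```python
# Toy stand-in for the NLP step: extract a device slot and a skill
# intent from a voice instruction. Keywords and names are illustrative.

DEVICE_KEYWORDS = {"tv": "smart_tv", "watch": "smart_watch", "speaker": "smart_speaker"}

def parse_instruction(text: str):
    """Return (device_slot, skill_intent) for a simple 'do X on Y' utterance."""
    device_slot = None
    for keyword, device in DEVICE_KEYWORDS.items():
        if f"on the {keyword}" in text or f"on {keyword}" in text:
            device_slot = device
            break
    # Treat the part before " on " as the skill intent.
    skill_intent = text.split(" on ")[0].strip()
    return device_slot, skill_intent

print(parse_instruction("play tv series a on the tv"))  # → ('smart_tv', 'play tv series a')
```

A real dialog system would of course perform semantic understanding rather than substring matching; the sketch only shows the shape of the (device slot, skill intent) output the description refers to.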
Step 302: when the execution device corresponding to the first voice instruction is a second terminal device, inputting the first voice instruction and first information of the second terminal device into a dialog system; the first information represents a relevant state of the terminal device.
After receiving the first voice instruction, the first server determines the execution device corresponding to it. When that execution device is a second terminal device, i.e., not the first terminal device that received the user's first voice instruction, the voice instruction is executed across devices: the first server inputs the first voice instruction and the relevant state of the second terminal device (the execution device) into the dialog system, and the dialog system performs semantic understanding on the first voice instruction to parse out the skill intent. The first terminal device and the second terminal device are both terminal devices in the interconnected and intercommunicating group. The hardware carrying the dialog system is a server, and the dialog system may be a software system that implements voice instruction understanding.
Here, a device coordination service module may be newly created in the first server. This module models the device decision according to the information returned with the semantic result and determines whether the operation is to be executed across devices. In practical applications, the device coordination service module models jointly from the information about interconnected devices collected by the first terminal device, i.e., the wake-up device (which may be considered part of the first information), and the slot information recognized by the NLP. For example, suppose the mobile phone finds that the terminal devices interconnected with the wake-up device are a smart TV and a smart watch, and the user's voice instruction is "play xxx (a TV series) on the TV". From the mobile phone's interconnection information, the device coordination service module learns the connection state, namely that terminal devices such as smart TV A and smart watch B are currently connected to the phone; through NLP, it extracts the device slot "TV" from the voice instruction. Since "TV" is not the "mobile phone" that received the voice instruction, and smart TV A is currently connected to the phone, the module determines that the operation is to be executed across devices. This improves the accuracy of identifying the execution target from the voice instruction, and thus the accuracy of the control instruction parsed from it.
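The cross-device decision just described can be reduced to a minimal sketch; this is an illustration under assumed identifiers, not the module's actual logic:

```python
# Minimal sketch of the device coordination decision: an operation is
# cross-device when the NLP device slot names a connected peer that is
# not the wake-up device itself. All identifiers are illustrative.

def is_cross_device(device_slot, wakeup_device, connected_devices):
    if device_slot is None or device_slot == wakeup_device:
        return False
    return device_slot in connected_devices

# Phone is the wake-up device; smart TV A and watch B are interconnected.
print(is_cross_device("smart_tv_A", "phone", {"smart_tv_A", "smart_watch_B"}))  # True
print(is_cross_device("phone", "phone", {"smart_tv_A"}))                        # False
```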
Here, the first information may be used both when the dialog system performs semantic understanding and when it executes the dialog management process, improving the accuracy of the control instruction parsed from the voice instruction. During semantic understanding, the dialog system ranks the multiple candidate interpretations based on the relevant state of the terminal device. The first information of the second terminal device includes, but is not limited to: the applications installed on the terminal device, its current playing state, and the resource-acquisition channels and playing modes it supports.
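A possible shape for this first-information payload is sketched below; the patent names the categories, while the field names themselves are assumptions for illustration:

```python
# Hypothetical structure of the "first information" of an execution
# device. The categories follow the description; field names are assumed.

first_info = {
    "device_id": "smart_tv_A",
    "installed_apps": ["video_app", "music_app"],   # applications on the device
    "playing_state": "idle",                        # current playback state
    "resource_channels": ["network", "local"],      # supported acquisition channels
    "playing_modes": ["fullscreen", "window"],      # supported playing modes
}

REQUIRED_FIELDS = {"device_id", "installed_apps", "playing_state",
                   "resource_channels", "playing_modes"}

def is_valid_first_info(info: dict) -> bool:
    """Check that a reported payload carries every expected field."""
    return REQUIRED_FIELDS <= info.keys()

print(is_valid_first_info(first_info))  # True
```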
It should be noted that when the execution device corresponding to the first voice instruction is the first terminal device itself, i.e., the device that received the user's first voice instruction, the voice instruction is not executed across devices, but the voice control method of the embodiments of the present application can still be used.
Step 303: obtaining a first control instruction returned by the dialog system; the first control instruction is obtained by the dialog system by analyzing the first voice instruction based on the first information and is used for indicating the second terminal equipment to execute a first operation.
Here, the first server obtains the first control instruction that the dialog system derived by parsing the first voice instruction based on the first information. The first control instruction is used to instruct the second terminal device to execute the first operation, i.e., the skill intent in the first voice instruction, which represents the goal the user wants to achieve among the skills supported by the corresponding terminal device.
Step 304: sending the first control instruction.
Here, the recipient of the first control instruction sent by the first server is not limited: it may be the first terminal device that sent the first request to the first server, or another device, such as a management device of the Internet of Things terminals, or the execution device that performs the first operation, i.e., the second terminal device.
In this embodiment, a first server receives a first request sent by a first terminal device, where the first request carries a first voice instruction. When the execution device corresponding to the first voice instruction is a second terminal device, the first server inputs the first voice instruction and first information representing a relevant state of the second terminal device into a dialog system; the dialog system parses the first voice instruction based on the first information to obtain a first control instruction for instructing the second terminal device to execute a first operation; and the first server obtains the first control instruction returned by the dialog system and sends it out. In this way, during cross-device execution, the first server only needs to communicate with the wake-up device, i.e., the first terminal device, to determine the control instruction corresponding to the voice instruction and instruct the corresponding execution device, i.e., the second terminal device, to execute the corresponding operation. This reduces the time consumed by information transmission in the voice control process, lowers the latency of cross-device voice instruction execution, and improves cross-device execution efficiency.
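Steps 301 to 304 can be pulled together into a sketch of the first server's handler; the dialog system is stubbed out, and every function name and message shape is an assumption, not the patent's protocol:

```python
# End-to-end sketch of steps 301-304 on the first server. The dialog
# system is a stub; a real one performs semantic parsing, not a lookup.

def dialog_system(voice, first_info):
    # Stub: returns a control instruction for the execution device.
    return {"target": first_info["device_id"], "operation": voice["intent"]}

def handle_first_request(request, device_info_store):
    voice = request["voice_instruction"]                  # step 301: receive request
    first_info = device_info_store[voice["device_slot"]]  # cross-device case
    control = dialog_system(voice, first_info)            # steps 302-303
    return {"type": "first_message", "control_instruction": control}  # step 304

store = {"smart_tv_A": {"device_id": "smart_tv_A", "playing_state": "idle"}}
req = {"voice_instruction": {"device_slot": "smart_tv_A", "intent": "play series A"}}
print(handle_first_request(req, store))
```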
Considering that the terminal devices in the group may change dynamically, and that the first information of each terminal device may also change dynamically, in an embodiment, before the inputting of the first voice instruction and the first information of the second terminal device into the dialog system, the method further includes:
receiving first information of at least one terminal device reported by the first terminal device; and
determining the first information of the second terminal device from the first information of the at least one terminal device; wherein
each of the at least one terminal device is interconnected and intercommunicates with the first terminal device.
The first server receives the first information of at least one terminal device, interconnected with the first terminal device, that the first terminal device reports; determines the second terminal device according to the first voice instruction; and determines the first information of the second terminal device from the received first information. The first information reported by the first terminal device may cover some or all of the terminal devices interconnected with it.
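Selecting the second terminal device's first information out of the reported list can be sketched as follows (field names are illustrative assumptions):

```python
# Sketch: pick the execution device's first information from the
# payloads reported by the wake-up device; field names are assumed.

def select_first_info(reported, second_device_id):
    for info in reported:
        if info["device_id"] == second_device_id:
            return info
    return None  # target device is not interconnected with the wake-up device

reported = [{"device_id": "smart_tv_A"}, {"device_id": "smart_watch_B"}]
print(select_first_info(reported, "smart_tv_A"))  # {'device_id': 'smart_tv_A'}
print(select_first_info(reported, "fridge_C"))    # None
```

Returning `None` models the failure case the surrounding text implies: if the target device never reported its state, the server cannot feed its first information to the dialog system.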
Here, the timing at which the first server receives the first information is not limited: the first information may be received together with the first request, or after the first server, having determined that the instruction is to be executed across devices, sends a request to the first terminal device. That is, the first terminal device may report the first information actively (for example, together with the first request) or in response to a request from the first server.
Therefore, the first server can accurately determine the execution equipment and the corresponding skill intention, and the wrong control instruction is prevented from being sent to the execution equipment, so that the accuracy of voice control is improved.
In addition, the first information of the at least one terminal device need not be reported by the first terminal device; for example, it may also be reported by the terminal devices in the group at set intervals.
As mentioned above, the recipient of the first control instruction sent by the first server is not limited; therefore, in an embodiment, the sending of the first control instruction includes:
sending a first message to the first terminal device; the first message carries the first control instruction, so that the first terminal device sends the first control instruction to the second terminal device in response to the first message.
Since the recipient of the control instruction sent by the first server is not the second terminal device that actually executes it, but the first terminal device, the first server needs to rewrite the first control instruction before sending it, so that the first terminal device is instructed to forward the control instruction to the corresponding execution device when it receives the corresponding message. Here, the first server rewrites the first control instruction as follows: it generates a first message carrying the first control instruction and sends it to the first terminal device; upon receiving the first message, the first terminal device responds to a set field in it by sending a message carrying at least the first control instruction to the second terminal device. In other words, thanks to the first server's rewriting, the terminal device that executes the first control instruction, i.e., the second terminal device, can receive it without interacting with the first server during the voice control process. This reduces the time consumed by information transmission, lowers the latency of cross-device voice instruction execution, and improves cross-device execution efficiency.
Meanwhile, the first terminal device and the second terminal device are in the same group. After receiving the first message, the first terminal device can send the first control instruction to the second terminal device through the OAF communication framework based on a short-range wireless communication technology such as Bluetooth. That is, not every terminal device is required to have the capability of connecting to the cloud, which reduces the software and hardware requirements on the terminal devices.
Further, the first control instruction is generated by the skill side of the corresponding dialog system according to the skill intent of the first voice instruction, where the skill side represents the types of skill intents the execution device can carry out, such as the playing function of a smart television or the adjusting function of a smart cooking appliance. In some embodiments, a mode layer may also be newly created in the first server; the mode layer is responsible for sensing the cross-device execution scenario. In the case of cross-device execution, the mode layer rewrites the first control instruction returned by the dialog system, replacing it with a multi-device interconnection protocol. The skill side in the dialog system therefore does not need to be adapted every time a new cross-device execution scenario arises; that is, the dialog system does not need to be aware of the state information of the execution device. This reduces the difficulty of maintaining the dialog system, allows dialog systems developed by third parties to be used in the voice control process, and improves the accuracy of the control instruction obtained by parsing the voice instruction.
As mentioned above, the object to which the first server sends the first control instruction is not limited. Therefore, in an embodiment, the sending of the first control instruction includes:
and sending the first control instruction to the second terminal equipment.
Here, the first server queries the device information of the user according to the account of the user, and directly sends the first control instruction to the second terminal device, so that the time consumption of information transmission in the voice control process can be further reduced, the delay of cross-device execution of the voice instruction is reduced, and the execution efficiency of cross-device execution is improved.
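The direct-send variant can be sketched as a registry lookup keyed by the user's account. The registry structure, device IDs, and `transport` callable below are all hypothetical illustrations; the patent only states that the server queries the user's device information by account and sends the instruction directly.

```python
# Hypothetical per-account device registry kept by the first server
DEVICE_REGISTRY = {
    "user-001": {"tv-01": {"address": "10.0.0.12"}},
}

def send_direct(account: str, executor_id: str, instruction: dict, transport) -> bool:
    """Look up the user's devices by account and push the first control
    instruction straight to the execution device, skipping the wake-up device."""
    device = DEVICE_REGISTRY.get(account, {}).get(executor_id)
    if device is None:
        return False            # unknown device: nothing is sent
    transport(device["address"], instruction)
    return True
```

Compared with the forwarding variant, this removes one local hop but requires the execution device itself to be reachable from the cloud.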
Fig. 4 is a schematic flow chart illustrating an implementation of the voice control method according to the embodiment of the present application. The main execution body of the process is the first terminal device, and the first terminal device is the terminal device 22 in fig. 2.
As shown in fig. 4, the voice control method includes:
step 401: a first voice instruction is received.
When a user wants to perform voice control on a second terminal device in the group through a first terminal device, the user inputs a first voice instruction to the first terminal device through a first application on the first terminal device. The first voice instruction contains at least a device slot and a skill intent: the device slot identifies the execution device that performs the corresponding operation, and the skill intent represents the goal the user wants to achieve among the skills supported by the corresponding terminal device and is used to instruct that terminal device to execute the first operation.
It should be noted that the device slot in the received first voice instruction may refer to the first terminal device or to a second terminal device; generally, however, it refers to a second terminal device different from the first terminal device. In that case the operation is executed across devices, and the corresponding technical effects can be achieved.
Step 402: sending a first request to a first server; the first request carries the first voice instruction.
After receiving the first voice instruction, the first terminal device generates and sends a first request to the first server, wherein the first request carries the first voice instruction.
Step 403: receiving a first message correspondingly sent by the first server based on the first request, and responding to the first message to send a first control instruction carried by the first message to a second terminal device; wherein,
the first control instruction is obtained based on the analysis of the first voice instruction and is used for indicating the second terminal equipment to execute a first operation.
The first terminal device receives a first message correspondingly sent by the first server based on the first request and, in response to the received first message, sends the first control instruction carried by the first message to the second terminal device. The first control instruction is obtained by the first server by parsing the first voice instruction and is used to instruct the second terminal device obtained by parsing, namely the execution device, to execute the corresponding first operation; the execution device is obtained by parsing the device slot of the first voice instruction, and the first operation is obtained by parsing its skill intent. Therefore, during cross-device execution, the first server only needs to communicate with the wake-up device, namely the first terminal device, to determine the control instruction corresponding to the voice instruction and to instruct the wake-up device to send the control instruction to the corresponding execution device, namely the second terminal device. This reduces the time consumed by information transmission in the voice control process, reduces the delay of cross-device execution of voice instructions, and improves the efficiency of cross-device execution.
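Steps 401–403 on the wake-up device can be sketched as one small handler. The request/response shapes (`first_request`, `forward_to`, `payload`) are hypothetical placeholders, not the patent's protocol; `send_to_server` and `send_local` stand in for the cloud link and the short-range link respectively.

```python
def handle_voice_on_wakeup_device(voice_text: str, send_to_server, send_local) -> dict:
    """Wake-up device flow of steps 401-403: forward the voice instruction to
    the server, then relay the returned control instruction locally."""
    first_message = send_to_server({"type": "first_request",
                                    "voice_instruction": voice_text})
    # Respond to the first message by forwarding its payload to the executor.
    send_local(first_message["forward_to"], first_message["payload"])
    return first_message
```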
As mentioned above, the dialog system of the first server may use the first information of the at least one terminal device during semantic understanding and dialog management, and which terminal device reports this first information is not limited. For example, in practical applications, a management device is determined from among the terminal devices in the group that have the capability of connecting to the cloud; the terminal devices in the group may report their first information to the management device every set period, and the management device sends the first information of the at least one terminal device to the first server.
In view of the first terminal device communicating with the first server and having at least the capability of going to the cloud, in one embodiment, the method further comprises:
receiving first information sent by at least one terminal device;
sending first information of each of the at least one terminal device to the first server.
The first terminal equipment receives first information sent by at least one terminal equipment in the group, and sends the received first information to the first server.
In this way, the first terminal device, namely the wake-up device, can acquire the relevant state information of the interconnected devices; that is, the wake-up device learns which execution devices, namely second terminal devices, it can control, together with the state information of each of them. This improves the accuracy of the control instruction obtained by parsing the voice instruction.
In an Internet of Things scenario, considering that the terminal devices in a group may change dynamically and that the first information of each terminal device also changes dynamically, in an embodiment, the receiving of first information sent by at least one terminal device includes:
broadcasting a second message based on the short-range wireless communication technology; the second message is used for requesting the corresponding terminal equipment to return the first information;
and receiving first information sent by the at least one terminal device based on the second message.
Based on the short-range wireless communication technology, the first terminal device broadcasts a second message to the at least one terminal device in the group, requesting the corresponding terminal devices to return their first information.
here, the timing at which the first terminal device broadcasts the second message is not limited to being before or after the first voice instruction is received.
By broadcasting the second message to collect the first information after receiving the first voice instruction, the first terminal device can accurately determine the first information of the terminal devices in the group. The first server can then accurately determine the execution device and the corresponding skill intent based on the device states collected in real time, avoiding sending a wrong control instruction to the execution device and thereby improving the accuracy of voice control.
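The broadcast-and-collect step can be sketched as follows. The message shape and reply fields (`second_message`, `device_id`, `state`) are hypothetical; `broadcast` and `receive_replies` stand in for the short-range radio layer, which the patent leaves to technologies such as Bluetooth/OAF.

```python
def collect_first_information(broadcast, receive_replies) -> dict:
    """Broadcast a 'second message' over the short-range link, then gather the
    first information (device state) each group member returns."""
    broadcast({"type": "second_message"})   # request first information
    return {reply["device_id"]: reply["state"] for reply in receive_replies()}
```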
As an embodiment of the present application, the process of implementing voice control by the voice control system of fig. 2 is described in detail below with reference to the interaction diagrams of the voice control method shown in fig. 5 and fig. 6. The embodiment corresponding to fig. 5 differs from the embodiment corresponding to fig. 6 in that, in the embodiment corresponding to fig. 5, the first terminal device forwards the first message carrying the first control instruction, whereas in the embodiment corresponding to fig. 6, the cloud server sends the first control instruction directly to the second terminal device.
Fig. 5 is an interaction diagram of a voice control method provided in an embodiment of the present application, and as shown in fig. 5, the voice control method includes:
step 501: the first terminal equipment receives a first voice instruction.
Step 501 is the same as step 401, and please refer to the related description in step 401 for the implementation process.
Step 502: the first terminal equipment receives first information sent by at least one terminal equipment.
The first terminal device receives first information sent by at least one terminal device in the group, including the first information sent by the second terminal device. The terminal devices in the group are interconnected and intercommunicated.
Step 503: the first terminal device sends a first request to the cloud server.
Wherein the first request carries a first voice instruction; step 503 is the same as step 301 and step 402, and the implementation process refers to the relevant description in step 301 and step 402.
Step 504: the first terminal device sends first information of at least one terminal device to the cloud server.
Here, the first terminal device sends the first information of the at least one terminal device to the cloud server, and the first information at least includes the first information sent by the at least one terminal device received in step 502.
Step 505: and under the condition that the execution device corresponding to the first voice instruction is the second terminal device, the cloud server inputs the first voice instruction and the first information of the second terminal device into the dialog system.
Step 505 is the same as step 302, and please refer to the related description in step 302 for the implementation process.
Step 506: the cloud server obtains a first control instruction returned by the dialogue system.
The first control instruction is obtained by the dialog system through analyzing the first voice instruction based on the first information and is used for indicating the corresponding second terminal equipment to execute the first operation. Step 506 is the same as step 303, and the implementation process refers to the related description in step 303.
Step 507: based on the first control instruction, the cloud server generates and sends a first message to the first terminal device.
The first message carries a first control instruction, so that the first terminal device responds to the first message and sends the first control instruction to the second terminal device.
Step 508: and responding to the first message, and the first terminal equipment sends a first control instruction to the second terminal equipment.
Step 509: and under the condition of receiving the first control instruction, the second terminal equipment executes the first operation.
In this embodiment, during the voice control process the second terminal device does not need to interact with the first server, which reduces the time consumed by information transmission, reduces the delay of cross-device execution of voice instructions, and improves the efficiency of cross-device execution. Meanwhile, the first terminal device and the second terminal device are in the same group; after receiving the first message, the first terminal device can send the first control instruction to the second terminal device through the OAF communication framework based on a short-range wireless communication technology such as Bluetooth. That is, not every terminal device is required to have the capability of connecting to the cloud, which reduces the software and hardware requirements on the terminal devices.
Fig. 6 is an interaction diagram of a voice control method according to an embodiment of the present application. The embodiment corresponding to fig. 6 differs from the embodiment corresponding to fig. 5 in step 607. As shown in fig. 6, the voice control method includes:
step 601: the first terminal equipment receives a first voice instruction.
Step 601 is the same as step 401, and please refer to the related description in step 401 for the implementation process.
Step 602: the first terminal equipment receives first information sent by at least one terminal equipment.
The first terminal device receives first information sent by at least one terminal device in the group, including the first information sent by the second terminal device. The terminal devices in the group are interconnected and intercommunicated.
Step 603: the first terminal device sends a first request to the cloud server.
Wherein the first request carries a first voice instruction; step 603 is the same as step 301 and step 402, and the implementation process refers to the relevant description in step 301 and step 402.
Step 604: the first terminal device sends first information of at least one terminal device to the cloud server.
Here, the first terminal device sends the first information of the at least one terminal device to the cloud server, including at least the first information received from the at least one terminal device in step 602.
Step 605: and under the condition that the execution device corresponding to the first voice instruction is the second terminal device, the cloud server inputs the first voice instruction and the first information of the second terminal device into the dialog system.
Step 605 is the same as step 302, and please refer to the related description in step 302 for the implementation process.
Step 606: the cloud server obtains a first control instruction returned by the dialogue system.
The first control instruction is obtained by the dialog system through analyzing the first voice instruction based on the first information and is used for indicating the second terminal equipment to execute the first operation. Step 606 is the same as step 303, and the implementation process refers to the related description in step 303.
Step 607: and the cloud server sends a first control instruction to the second terminal device.
Step 608: and under the condition of receiving the first control instruction, the second terminal equipment executes the first operation.
In this embodiment, the bottom-layer server of the cloud server queries the user's device information according to the user's unique account and sends the first control instruction directly to the second terminal device, which further reduces the time consumed by information transmission in the voice control process, reduces the delay of cross-device execution of voice instructions, and improves the efficiency of cross-device execution.
The embodiments of the present application are further illustrated below in connection with an application embodiment:
in the following, a scenario of cross-device execution involving a mobile phone, a cloud server, and a smart television is taken as an example, and the application is described in further detail.
Fig. 7 is a schematic flow chart illustrating an implementation of a voice control method according to an embodiment of the present application. As shown in fig. 7, the voice control method includes:
step 701: the mobile phone collects the relevant information of the terminal equipment.
Here, the mobile phone is in an interconnected group that includes at least the mobile phone and the smart television. The mobile phone collects the relevant information of the connected terminal devices; for the smart television, the collected information may include the applications installed on the television, the television's current playing state, and the channels and playing modes through which the television can acquire resources.
Step 702: and the mobile phone sends the collected relevant information of the terminal equipment to the cloud server.
The mobile phone sends the collected relevant information of the terminal device to the cloud server before sending the first voice command to the cloud server.
Step 703: and the cloud server carries out modeling according to the relevant information of the terminal equipment and the information returned by the semantic result, and judges whether the operation is executed in a cross-equipment mode.
Through a newly-built device coordination service module, the cloud server performs modeling according to the related information of the interconnected devices collected by the mobile phone and the device slot information extracted by NLP, and judges whether the operation is to be executed across devices. For example, based on the received related information, the cloud server knows that terminal devices such as a television and a watch are connected to the mobile phone. On receiving the voice instruction "open xxx (a TV series) on television", the cloud server extracts, through NLP, the device slot of the sentence as "television" and the skill intent as "play xxx (a TV series)". The device coordination service module thus determines that the execution device is the television and the skill intent is to play xxx (a TV series) on the television, while the terminal device that received the voice instruction is the mobile phone, so the operation is known to be executed across devices.
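The slot extraction and cross-device judgment can be sketched with a toy parser. The regular expression, the `KNOWN_DEVICES` map, and the device IDs are all illustrative assumptions; the patent's NLP component would be a trained model, not a pattern match.

```python
import re

# Hypothetical map from device-slot words to interconnected device IDs
KNOWN_DEVICES = {"television": "tv-01", "watch": "watch-01"}

def parse_voice_instruction(text: str):
    """Toy device-slot / skill-intent extraction for utterances shaped like
    'open <content> on <device>' (a real system would use an NLP model)."""
    match = re.match(r"open (?P<content>.+) on (?P<device>\w+)$", text)
    if match is None:
        return None
    return {"device_slot": match.group("device"),
            "intent": {"action": "play", "content": match.group("content")}}

def is_cross_device(parsed: dict, wakeup_device_id: str) -> bool:
    """Cross-device execution: the slot names a known device other than the
    device that received the voice instruction."""
    executor = KNOWN_DEVICES.get(parsed["device_slot"])
    return executor is not None and executor != wakeup_device_id
```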
Step 704: and under the condition that the cross-device execution is judged, the cloud server replaces the relevant information of the input dialogue system.
When cross-device execution is determined, the device coordination service module selects the smart television's related information from the terminal device information received by the server, replaces the related information of the mobile phone (the wake-up device) with that of the smart television (the execution device), and inputs it into the dialog system.
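The substitution step amounts to selecting which device's state accompanies the utterance into the dialog system. A minimal sketch, with an assumed dictionary of per-device state keyed by hypothetical device IDs:

```python
def prepare_dialog_input(voice_instruction: str, group_info: dict, executor_id: str) -> dict:
    """Swap in the execution device's related information (instead of the
    wake-up device's) before calling the dialog system."""
    return {"instruction": voice_instruction,
            "first_information": group_info[executor_id]}
```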
In practical applications, the related information of the terminal device, i.e., the first information and the device-side state information, can be used by the dialog system during semantic understanding and dialog management, so as to improve the accuracy of the control instruction obtained by parsing the voice instruction. During semantic understanding, the dialog system ranks the multiple candidate understandings based on the related state of the smart television.
For example, taking the voice instruction "play TV series A", the following candidate understandings are obtained:
continue playing TV series A;
open and play TV series A.
If the first information of the smart television indicates that the device is currently playing TV series B, the result "open and play TV series A" takes precedence; if the first information indicates that the device has paused playing TV series A, the result "continue playing TV series A" takes precedence.
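This state-based ranking can be sketched as a scoring function. The candidate and state representations below are illustrative assumptions, not the dialog system's actual data model:

```python
def rank_interpretations(candidates, tv_state):
    """Order candidate understandings by the smart TV's first information:
    prefer 'resume' when the target series is paused there, otherwise 'open'."""
    def score(candidate):
        if candidate["action"] == "resume":
            return 2 if tv_state == ("paused", candidate["series"]) else 0
        if candidate["action"] == "open":
            return 1
        return 0
    return sorted(candidates, key=score, reverse=True)
```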
Step 705: and the cloud server obtains a control instruction output by the dialogue system.
Step 706: based on the first control instruction, the cloud server generates and sends a first message to the first terminal device.
Here, the cloud server establishes a mode layer, which is responsible for sensing cross-device execution scenarios and for uniformly managing the cross-device control instructions returned by the dialog system. In the case of cross-device execution, the mode layer rewrites the first control instruction returned by the dialog system, replacing it with a multi-device interconnection protocol. The skill side in the dialog system does not need to be adapted every time a new cross-device execution scenario arises; that is, the dialog system does not need to be aware of the state information of the execution device. The skill side represents the types of skill intents the execution device can carry out, such as the playing function of a smart television or the adjusting function of a smart cooking appliance.
For example, the protocol returned on the skill side is executable by the smart television, but the device connected to the dialog system is the mobile phone, and the mobile phone cannot itself execute a control instruction intended for the smart television. In this case, the control instruction needs to be rewritten: a specific protocol is returned that tells the mobile phone to transmit the control instruction to the television through the OAF, and the television executes the corresponding control instruction.
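The mode-layer rewrite can be sketched as below. The protocol name `multi_device_interconnect` and the `relay`/`target` fields are hypothetical stand-ins for the multi-device interconnection protocol the patent mentions without specifying:

```python
def mode_layer_rewrite(skill_protocol: dict, executor_id: str, wakeup_id: str) -> dict:
    """Mode layer: when the executor differs from the device connected to the
    dialog system, wrap the skill-side instruction in a multi-device
    interconnection protocol so the wake-up device relays it over OAF."""
    if executor_id == wakeup_id:
        return skill_protocol                 # local execution: no rewrite
    return {"protocol": "multi_device_interconnect",
            "relay": wakeup_id,               # the phone receives the message
            "target": executor_id,            # the TV actually executes
            "instruction": skill_protocol}
```

Keeping this wrapping in one layer is what spares every skill from knowing about device topology.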
Step 707: and the mobile phone sends a control instruction to the television through the OAF based on the received information.
In order to implement the voice control method according to the embodiment of the present application, an embodiment of the present application further provides a voice control apparatus, and as shown in fig. 8, the voice control apparatus includes:
a first receiving unit 801, configured to receive a first request sent by a first terminal device; the first request carries a first voice instruction;
a first processing unit 802, configured to, when an execution device corresponding to the first voice instruction is a second terminal device, input the first voice instruction and first information of the second terminal device into a dialog system; the first information represents the relevant state of the terminal equipment;
the second processing unit 803 is configured to obtain a first control instruction returned by the dialog system; the first control instruction is obtained by the dialog system by analyzing the first voice instruction based on the first information and is used for indicating the second terminal equipment to execute a first operation;
the first sending unit 804 is configured to send a first control instruction.
Wherein, in one embodiment, the voice control apparatus further comprises:
a second receiving unit, configured to receive first information of at least one terminal device, which is reported by the first terminal device;
a third processing unit, configured to determine, from the first information of the at least one terminal device, the first information of the second terminal device; wherein,
and each terminal device in the at least one terminal device is interconnected and intercommunicated with the first terminal device.
In one embodiment, the first sending unit 804 is configured to:
sending a first message to the first terminal device; the first message carries the first control instruction, so that the first terminal device sends the first control instruction to the second terminal device in response to the first message.
In one embodiment, the first sending unit 804 is configured to:
and sending the first control instruction to the second terminal equipment.
In practical applications, the first receiving unit 801 and the second receiving unit may be implemented by a communication interface in a speech-based control device, the first processing unit 802, the second processing unit 803, and the third processing unit may be implemented by a processor in the speech-based control device, and the first sending unit 804 may be implemented by a processor in the speech-based control device in combination with the communication interface.
It should be noted that: in the voice control apparatus provided in the above embodiment, when performing voice control, only the division of the program modules is exemplified, and in practical applications, the processing distribution may be completed by different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the voice control apparatus and the voice control method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
In order to implement the voice control method according to the embodiment of the present application, an embodiment of the present application further provides a voice control apparatus, as shown in fig. 9, where the voice control apparatus includes:
a second receiving unit 901, configured to receive a first voice instruction;
a second sending unit 902, configured to send the first request to the first server; the first request carries the first voice instruction;
a third sending unit 903, configured to receive a first message correspondingly sent by the first server based on the first request, and send, in response to the first message, a first control instruction carried in the first message to a second terminal device; wherein,
the first control instruction is obtained based on the analysis of the first voice instruction and is used for indicating the second terminal equipment to execute a first operation.
Wherein, in one embodiment, the voice control apparatus further comprises:
a third receiving unit, configured to receive first information sent by at least one terminal device;
a fourth sending unit, configured to send the first information of each of the at least one terminal device to the first server.
In one embodiment, the third receiving unit is configured to:
broadcasting a second message based on the short-range wireless communication technology; the second message is used for requesting the corresponding terminal equipment to return the first information;
and receiving first information sent by the at least one terminal device based on the second message.
In practical applications, the second receiving unit 901, the second sending unit 902, the third receiving unit, and the fourth sending unit may be implemented by a communication interface in the speech-based control device, and the third sending unit 903 may be implemented by a processor in the speech-based control device in combination with the communication interface.
It should be noted that: in the voice control apparatus provided in the above embodiment, when performing voice control, only the division of the program modules is exemplified, and in practical applications, the processing distribution may be completed by different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the voice control apparatus and the voice control method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Based on the hardware implementation of the program module, and in order to implement the method of the embodiment of the present application, the embodiment of the present application further provides a terminal device. Fig. 10 is a schematic diagram of a hardware composition structure of a terminal device according to an embodiment of the present application, and as shown in fig. 10, the terminal device includes:
a communication interface 1 capable of information interaction with other devices such as network devices and the like;
and the processor 2 is connected with the communication interface 1 to realize information interaction with other equipment, and is used for executing the voice control method provided by one or more technical schemes when running a computer program. And the computer program is stored on the memory 3.
In practice, the various components in the terminal device are, of course, coupled together by means of the bus system 4. It will be appreciated that the bus system 4 is used to enable connection communication between these components. The bus system 4 comprises, in addition to a data bus, a power bus, a control bus and a status signal bus. For clarity of illustration, however, the various buses are labeled as bus system 4 in fig. 10.
The memory 3 in the embodiment of the present invention is used to store various types of data to support the operation of the terminal device. Examples of such data include: any computer program for operating on a terminal device.
It will be appreciated that the memory 3 may be either volatile or non-volatile memory, and may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 3 described in the embodiments of the present invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed by the above embodiment of the present invention can be applied to the processor 2, or implemented by the processor 2. The processor 2 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 2. The processor 2 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 2 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed by the embodiment of the invention can be directly implemented by a hardware decoding processor, or can be implemented by combining hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 3, and the processor 2 reads the program in the memory 3 and in combination with its hardware performs the steps of the aforementioned method.
When the processor 2 executes the program, the corresponding processes of the methods according to the embodiments of the present invention are implemented; for brevity, they are not described here again.
In an exemplary embodiment, the present invention further provides a storage medium, i.e., a computer storage medium, specifically a computer-readable storage medium, for example, a memory 3 storing a computer program, where the computer program is executable by the processor 2 to perform the steps of the foregoing method. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a Flash Memory, a magnetic surface memory, an optical disc, or a CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, terminal, and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the method embodiments. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk, an optical disc, or various other media that can store program code.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present invention, in essence, or the part thereof contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk, an optical disc, or various other media that can store program code.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A voice control method applied to a first server, the method comprising:
receiving a first request sent by a first terminal device; wherein the first request carries a first voice instruction;
in a case where an execution device corresponding to the first voice instruction is a second terminal device, inputting the first voice instruction and first information of the second terminal device into a dialog system; wherein the first information represents a relevant state of the terminal device;
obtaining a first control instruction returned by the dialog system; wherein the first control instruction is obtained by the dialog system by analyzing the first voice instruction based on the first information, and is used for instructing the second terminal device to execute a first operation; and
sending the first control instruction.
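The server-side flow of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the names (`DialogSystem`, `handle_request`, `resolve_execution_device`) and the routing and parsing heuristics are hypothetical stand-ins for whatever the first server actually uses.

```python
from dataclasses import dataclass

@dataclass
class Request:
    voice_instruction: str   # the first voice instruction carried by the first request
    source_device_id: str    # the first terminal device that sent the request

class DialogSystem:
    """Hypothetical dialog system: analyzes a voice instruction in light of
    the target device's first information (its relevant state)."""
    def parse(self, instruction: str, device_state: dict) -> dict:
        # Toy rule: "volume" instructions bump the current volume by 10.
        if "volume" in instruction:
            return {"op": "set_volume", "value": device_state.get("volume", 0) + 10}
        return {"op": "noop"}

def resolve_execution_device(instruction: str, device_states: dict) -> str:
    # Hypothetical routing: pick the device whose id is named in the instruction.
    for device_id in device_states:
        if device_id in instruction:
            return device_id
    return next(iter(device_states))

def handle_request(req: Request, device_states: dict, dialog: DialogSystem):
    """Per claim 1: determine the second terminal device, feed the voice
    instruction plus that device's first information into the dialog system,
    and return the resulting first control instruction."""
    target_id = resolve_execution_device(req.voice_instruction, device_states)
    first_info = device_states[target_id]
    control = dialog.parse(req.voice_instruction, first_info)
    # "Sending" the control instruction is modeled here as returning it.
    return target_id, control
```

Claims 3 and 4 then differ only in where this returned control instruction is sent: back through the first terminal device, or directly to the second terminal device.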
2. The method of claim 1, wherein prior to said entering the first voice instruction and the first information of the second terminal device into a dialog system, the method further comprises:
receiving first information of at least one terminal device reported by the first terminal device;
determining the first information of the second terminal device from the first information of the at least one terminal device; wherein
each of the at least one terminal device is interconnected and intercommunicates with the first terminal device.
3. The method of claim 1, wherein said issuing a first control instruction comprises:
sending a first message to the first terminal device; the first message carries the first control instruction, so that the first terminal device sends the first control instruction to the second terminal device in response to the first message.
4. The method of claim 1, wherein said issuing a first control instruction comprises:
sending the first control instruction to the second terminal device.
5. A voice control method, applied to a first terminal device, the method comprising:
receiving a first voice instruction;
sending a first request to a first server; the first request carries the first voice instruction;
receiving a first message correspondingly sent by the first server based on the first request, and sending, in response to the first message, a first control instruction carried in the first message to a second terminal device; wherein
the first control instruction is obtained based on analysis of the first voice instruction and is used for instructing the second terminal device to execute a first operation.
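The first terminal device's relay role in claim 5 can be sketched as below, again as a hedged illustration: the transport objects (`server`, `peers`) and the message field names (`target_device`, `control_instruction`) are hypothetical, since the patent does not fix a message format.

```python
class FirstTerminal:
    """Sketch of claim 5: receive a voice instruction, forward it to the first
    server in a first request, then relay the returned first control
    instruction to the second terminal device."""

    def __init__(self, server, peers):
        self.server = server   # object exposing request(payload) -> first message
        self.peers = peers     # device_id -> object exposing send(instruction)

    def on_voice_instruction(self, instruction: str) -> str:
        # Send the first request carrying the first voice instruction.
        first_message = self.server.request({"voice_instruction": instruction})
        # The first message carries the control instruction and its target;
        # relay the control instruction to the second terminal device.
        target = first_message["target_device"]
        self.peers[target].send(first_message["control_instruction"])
        return target
```

Note the design point: the first terminal device never interprets the instruction itself; it only captures audio and relays, which is why claims 2 and 6/7 add a state-reporting step so the server's dialog system has the target device's state.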
6. The method of claim 5, further comprising:
receiving first information sent by at least one terminal device;
sending first information of each of the at least one terminal device to the first server.
7. The method of claim 6, wherein the receiving of the first information sent by the at least one terminal device comprises:
broadcasting a second message based on a short-range wireless communication technology; wherein the second message is used for requesting a corresponding terminal device to return the first information; and
receiving the first information sent by the at least one terminal device based on the second message.
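The broadcast-and-collect step of claim 7 amounts to a one-to-many request for state. A minimal sketch, with the in-process `peers` mapping standing in for an actual short-range wireless broadcast (e.g. Bluetooth or Wi-Fi Direct) and `on_broadcast` as a hypothetical reply hook:

```python
def collect_first_information(peers: dict) -> dict:
    """Claim 7 sketch: broadcast a second message requesting each nearby
    device's first information (its relevant state), then gather replies.

    `peers` maps device ids to hypothetical objects exposing on_broadcast();
    a real implementation would instead transmit over a short-range radio.
    """
    second_message = {"type": "report_state"}
    replies = {}
    for device_id, peer in peers.items():
        replies[device_id] = peer.on_broadcast(second_message)
    return replies
```

The collected map is what claim 6 then forwards to the first server, so that the server can pick out the second terminal device's first information (claim 2).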
8. A voice control apparatus, comprising:
a first receiving unit, configured to receive a first request sent by a first terminal device; the first request carries a first voice instruction;
a first processing unit, configured to input the first voice instruction and first information of a second terminal device into a dialog system in a case where an execution device corresponding to the first voice instruction is the second terminal device; wherein the first information represents a relevant state of the terminal device;
a second processing unit, configured to obtain a first control instruction returned by the dialog system; wherein the first control instruction is obtained by the dialog system by analyzing the first voice instruction based on the first information, and is used for instructing the second terminal device to execute a first operation; and
a first sending unit, configured to send the first control instruction.
9. A voice control apparatus, comprising:
a second receiving unit, configured to receive a first voice instruction;
a second sending unit, configured to send a first request to a first server; wherein the first request carries the first voice instruction; and
a third sending unit, configured to receive a first message correspondingly sent by the first server based on the first request, and to send, in response to the first message, a first control instruction carried in the first message to a second terminal device; wherein
the first control instruction is obtained based on analysis of the first voice instruction and is used for instructing the second terminal device to execute a first operation.
10. A server, comprising: a first processor and a first memory for storing a computer program capable of running on the first processor,
wherein the first processor is configured to perform the steps of the voice control method according to any one of claims 1 to 4 when running the computer program.
11. A terminal device, comprising: a second processor and a second memory for storing a computer program capable of running on the second processor,
wherein the second processor is configured to perform the steps of the voice control method according to any one of claims 5 to 7 when running the computer program.
12. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the voice control method according to any one of claims 1 to 7.
CN202111234001.9A 2021-10-22 2021-10-22 Voice control method, device, server, terminal equipment and storage medium Pending CN113990316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111234001.9A CN113990316A (en) 2021-10-22 2021-10-22 Voice control method, device, server, terminal equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113990316A 2022-01-28

Family

ID=79740436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111234001.9A Pending CN113990316A (en) 2021-10-22 2021-10-22 Voice control method, device, server, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113990316A (en)

Similar Documents

Publication Publication Date Title
CN111240837B (en) Resource allocation method, device, terminal and storage medium
US11151010B2 (en) Resource configuration method, mobile terminal and storage medium
CN110505675B (en) Network connection method and device, storage medium and communication terminal
CN102831894B (en) Command processing method, command processing device and command processing system
US11270690B2 (en) Method and apparatus for waking up device
CN103051734A (en) Remote voice control method and system for household electrical appliance and communication terminal
CN107943534B (en) Background application program closing method and device, storage medium and electronic equipment
WO2018133307A1 (en) Method and terminal for implementing voice control
CN109240107A (en) A kind of control method of electrical equipment, device, electrical equipment and medium
WO2019128829A1 (en) Action execution method and apparatus, storage medium and electronic apparatus
CN110488626A (en) A kind of apparatus control method, control device, chromacoder and storage medium
US20190034234A1 (en) Method For Resource Allocation And Terminal Device
WO2020135773A1 (en) Data processing method, device, and computer-readable storage medium
JP2019040602A (en) Continuous conversation function with artificial intelligence device
US11798545B2 (en) Speech interaction method and apparatus, device and storage medium
CN108769232A (en) Application resource method for pushing, device, equipment and storage medium
CN110418181B (en) Service processing method and device for smart television, smart device and storage medium
CN113990316A (en) Voice control method, device, server, terminal equipment and storage medium
CN109819297A (en) A kind of method of controlling operation thereof and set-top box
CN113827953B (en) Game control system
CN111770236B (en) Conversation processing method, device, system, server and storage medium
CN110569336A (en) conversation processing method, device and equipment
CN113840164A (en) Voice control method, device, terminal equipment and storage medium
CN113625586A (en) Method and system for controlling smart home based on gateway
CN112597022A (en) Remote diagnosis method, device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination