CN113555018A - Voice interaction method and device - Google Patents

Voice interaction method and device

Info

Publication number
CN113555018A
Authority
CN
China
Prior art keywords
voice
voice instruction
instruction
slot position
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110817882.0A
Other languages
Chinese (zh)
Inventor
刘璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd
Priority to CN202110817882.0A
Publication of CN113555018A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G10L 15/26 Speech to text systems
    • G10L 2015/223 Execution procedure of a spoken command

Abstract

The application provides a voice interaction method and device. The method includes: acquiring a voice text and analyzing it to obtain a first voice instruction; determining a voice instruction to be supplemented based on a preset voice instruction set and the first voice instruction, where the preset voice instruction set includes all voice instructions for realizing the function corresponding to the voice text; and sending prompt information corresponding to the voice instruction to be supplemented to the user so that the user can input it. Because the method obtains all voice instructions of the function corresponding to the voice text, the user's complete requirement can be analyzed, the intelligent electronic device can perform the corresponding operation, the function the user needs is better realized, and the user experience is improved.

Description

Voice interaction method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a voice interaction method and device.
Background
With the development of artificial intelligence technology, voice interaction has gradually entered many areas of daily life. Users can control intelligent electronic devices by voice, for example display devices, air conditioners, and washing machines, and can use the voice interaction function to perform a series of operations such as watching videos, listening to music, checking the weather, and controlling equipment.
For an intelligent electronic device, the voice interaction function is generally implemented as follows: a voice recognition module recognizes the voice instruction input by the user as text, a semantic analysis module parses the text according to its lexicon, syntax, and semantics to derive the user's requirement, and the control end finally operates the intelligent electronic device accordingly.
However, when the user uses the voice interaction function, the voice instruction input by the user may be incomplete. For example, if key slot parameters are missing from the voice instruction, the user's complete requirement cannot be derived; the intelligent electronic device then cannot perform the corresponding operation, and the user's experience of using it is poor.
Disclosure of Invention
The application provides a voice interaction method and device, solving the problem in the related art that the user experience of intelligent electronic devices is poor.
In a first aspect, the present application provides a voice interaction method, including:
acquiring a voice text, wherein the voice text is obtained by analyzing a first voice signal input by a user; analyzing the voice text to obtain a first voice instruction; determining a voice instruction to be supplemented based on a preset voice instruction set and the first voice instruction, wherein the preset voice instruction set comprises all voice instructions for realizing the function corresponding to the voice text; and sending the prompt information corresponding to the voice instruction to be supplemented to the user so that the user can input the voice instruction to be supplemented.
In some implementations, the determining a voice instruction to be supplemented based on a preset voice instruction set and the first voice instruction includes:
determining a first voice function corresponding to the voice text; acquiring a preset voice instruction set corresponding to the first voice function from a preset database, wherein all voice functions and the preset voice instruction set corresponding to each voice function are stored in the database; determining all voice instructions for realizing the first voice function according to a preset voice instruction set corresponding to the first voice function; and determining the voice instructions except the first voice instruction in all the voice instructions to obtain the voice instruction to be supplemented.
In some implementations, after the step of analyzing the voice text to obtain the first voice instruction, the method further includes:
acquiring a first slot position parameter in the first voice instruction; acquiring a first slot position parameter set, wherein the first slot position parameter set comprises all slot position parameters required to be contained in the first voice instruction; determining a second slot position parameter based on the first slot position parameter set and the first slot position parameter; and sending the slot position parameter prompt message corresponding to the second slot position parameter to the user so that the user inputs the second slot position parameter.
In some implementations, the determining a first set of slot parameters includes:
determining a slot position parameter set corresponding to the first voice instruction according to a preset voice instruction database to obtain a first slot position parameter set; all voice instructions and the slot position parameter set corresponding to each voice instruction are stored in the preset voice instruction database.
in some implementations, the determining a second slot position parameter based on the first set of slot position parameters and the first slot position parameter includes:
and determining the other slot position parameters except the first slot position parameter in the first slot position parameter set to obtain a second slot position parameter.
In some implementations, the method further includes:
receiving a second voice signal input by a user; analyzing the second voice signal to obtain a second voice instruction; determining a third voice instruction based on the voice instruction to be supplemented and the second voice instruction, wherein the third voice instruction is all voice instructions except the second voice instruction in the voice instruction to be supplemented; and sending the prompt message corresponding to the third voice instruction to the user so that the user inputs the third voice instruction.
In some implementations, the method further includes:
acquiring the association degree of the second voice instruction and the first voice instruction; and when detecting that the association degree reaches a preset association threshold value, executing a step of determining a third voice instruction based on the voice instruction to be supplemented and the second voice instruction.
In some implementations, obtaining the association degree of the second voice instruction with the first voice instruction includes:
acquiring all slot position parameters in the first voice instruction; calculating the association degree of each slot position parameter and the second voice instruction; and calculating the average value of all the association degrees, and taking the average value as the association degree of the second voice instruction and the first voice instruction.
In some implementations, the calculating the association of each slot parameter and the second voice instruction includes:
acquiring multiple semantics of the second voice instruction and the semantic score of each semantic, and determining the highest semantic score as a first score; matching each slot position parameter with the semantics of the second voice instruction, and determining the semantic score of the successfully matched semantic as the second score of that slot position parameter; subtracting the second score from the first score to obtain the difference degree between each slot position parameter and the second voice instruction; and determining the association degree of each slot position parameter and the second voice instruction according to the difference degree, wherein the sum of the difference degree and the association degree is 1.
In some implementations, the method further includes:
and when the relevance degree is detected not to reach a preset relevance threshold value, taking the second voice instruction as a new first voice instruction, and executing the step of determining the voice instruction to be supplemented.
In a second aspect, the present application provides a voice interaction apparatus, comprising:
the voice text acquisition unit is configured to acquire a voice text, wherein the voice text is obtained by analyzing a first voice signal input by a user;
the voice text analysis unit is configured to analyze the voice text to obtain a first voice instruction;
the voice instruction to be supplemented determining unit is configured to determine a voice instruction to be supplemented based on a preset voice instruction set and the first voice instruction, wherein the preset voice instruction set comprises all voice instructions for realizing the function corresponding to the voice text;
and the prompt information unit is configured to send prompt information corresponding to the voice instruction to be supplemented to the user so that the user can input the voice instruction to be supplemented.
According to the technical solution above, the application provides a voice interaction method and device. A voice text is acquired and analyzed to obtain a first voice instruction. A voice instruction to be supplemented is determined based on a preset voice instruction set and the first voice instruction, where the preset voice instruction set includes all voice instructions for realizing the function corresponding to the voice text. Prompt information corresponding to the voice instruction to be supplemented is then sent to the user so that the user can input it. Because all voice instructions of the function corresponding to the voice text are obtained, the user's complete requirement can be analyzed, the intelligent electronic device can perform the corresponding operation, the function the user needs is better realized, and the user experience is improved.
Drawings
To explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below; obviously, those skilled in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of a voice recognition network architecture according to an embodiment of the present application;
FIG. 2 is a diagram illustrating an application scenario of the voice interaction method in some embodiments;
FIG. 3 illustrates a flow diagram of a method of voice interaction in some embodiments;
FIG. 4 illustrates a flow diagram for determining voice instructions to supplement in some embodiments;
FIG. 5 illustrates a flow diagram for detecting a voice instruction in some embodiments;
FIG. 6 is a diagram that illustrates a semantic score map for a second voice instruction in some embodiments;
FIG. 7 shows a schematic diagram of a voice interaction device in some embodiments.
Detailed Description
To make the purpose and embodiments of the present application clearer, the exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some of the embodiments of the present application, not all of them.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "to be supplemented," "third," and the like in the description and claims of this application and in the above-described figures, are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
For clarity of explanation of the embodiments of the present application, a speech recognition network architecture provided by the embodiments of the present application is described below with reference to fig. 1.
Referring to fig. 1, fig. 1 is a schematic diagram of a voice recognition network architecture according to an embodiment of the present application. In fig. 1, the smart device is configured to receive input information and to output a processing result of that information. The voice recognition service device is an electronic device on which a voice recognition service is deployed, the semantic service device is an electronic device on which a semantic service is deployed, and the business service device is an electronic device on which a business service is deployed. An electronic device here may include a server, a computer, and the like. The speech recognition service, the semantic service (also referred to as a semantic engine), and the business service are web services deployable on such electronic devices: the speech recognition service recognizes audio as text, the semantic service performs semantic parsing on the text, and the business service provides concrete services such as the weather query service of Moji Weather or the music query service of QQ Music. In one embodiment, in the architecture shown in fig. 1, there may be multiple entity service devices deployed with different business services, and one or more function services may also be aggregated in one or more entity service devices.
In some embodiments, the processing of information input to the smart device based on the architecture shown in fig. 1 is described below, taking a query statement input by voice as an example. The processing may include the following three stages:
[ Speech recognition ]
The intelligent device can upload the audio of the query sentence to the voice recognition service device after receiving the query sentence input by voice, so that the voice recognition service device can recognize the audio as a text through the voice recognition service and then return the text to the intelligent device. In one embodiment, before uploading the audio of the query statement to the speech recognition service device, the smart device may perform denoising processing on the audio of the query statement, where the denoising processing may include removing echo and environmental noise.
[ semantic understanding ]
The intelligent device uploads the text of the query statement recognized by the voice recognition service to the semantic service device, and the semantic service device performs semantic analysis on the text through the semantic service to obtain the business field, intent, and so on of the text.
[ semantic response ]
The semantic service device issues a query instruction to the corresponding business service device according to the semantic analysis result of the query statement's text, so as to obtain the query result given by the business service. The intelligent device can obtain the query result from the semantic service device and output it. As an embodiment, the semantic service device may further send the semantic analysis result of the query statement to the intelligent device, so that the intelligent device outputs the feedback statement in the semantic analysis result.
It should be noted that the architecture shown in fig. 1 is only an example, and does not limit the scope of the present application. In the embodiment of the present application, other architectures may also be adopted to implement similar functions, for example: all or part of the three processes can be completed by the intelligent terminal, and are not described herein.
In some embodiments, the intelligent device shown in fig. 1 may be a display device, such as an intelligent television, the functions of the speech recognition service device may be implemented by cooperation of a sound collector and a controller provided on the display device, and the functions of the semantic service device and the business service device may be implemented by the controller of the display device or by a server of the display device. The intelligent device shown in fig. 1 may also be an electronic device such as a refrigerator, an air conditioner, or an oven, and the embodiments of the present application are not limited.
When using the voice interaction function, the voice instruction a user inputs may be incomplete. For example, if key slot parameters are missing from the voice instruction, the user's complete requirement cannot be derived; the intelligent electronic device then cannot perform the corresponding operation, and the user's experience of using it suffers.
To solve the above problem, an embodiment of the present application provides a voice interaction method that can derive the complete requirement of a user using the voice interaction function, thereby better performing the operation the user requires and improving the user experience.
Fig. 2 is a schematic diagram of an application scenario of the voice interaction method in some embodiments. The method may be applied to scenarios in which a user controls an intelligent electronic device by voice, and may be executed by the intelligent electronic device itself or by a dedicated voice interaction apparatus disposed in it; below, the intelligent electronic device is taken as the executing subject for illustration, without limitation. The intelligent electronic device may be a mobile phone, computer, television, washing machine, air conditioner, speaker, or other electronic device.
FIG. 3 shows a flow diagram of a voice interaction method in some embodiments, the method comprising the steps of:
and step S101, acquiring a voice text. The voice text may be obtained by analyzing the first voice signal input by the user.
When using an intelligent electronic device, the user can control it by voice through the voice interaction function, by inputting a voice signal within the range in which the device can receive it. For example, if the intelligent electronic device is an air conditioner, a user who wants to use it can say "turn on the air conditioner, cool 20 degrees" to the air conditioner so that it performs the corresponding cooling function.
The intelligent electronic device can receive the voice signal input by the user through an internal sound collector, such as a microphone. After the voice signal is received, the voice signal can be analyzed, so that a voice text corresponding to the voice signal is obtained.
Step S102: analyzing the voice text to obtain a first voice instruction.
After the first voice signal input by the user is analyzed, the voice text is obtained. To derive the user's requirement, the intelligent electronic device can continue to analyze the voice text to obtain the voice instructions it contains. For example, when the intelligent electronic device is an air conditioner and the user inputs the voice "turn on the air conditioner, cool 20 degrees", analysis of the voice text shows that it contains two instructions. One instruction is "cool", whose detailed meaning is "turn on the cooling mode of the air conditioner". The other instruction is "20 degrees", whose detailed meaning is "set the temperature indicated by the air conditioner to 20 degrees". The first voice instruction thus includes these two instructions.
In some embodiments, a semantic understanding unit may be disposed in the intelligent electronic device, and the first voice instruction may be obtained by inputting a voice text into the semantic understanding unit.
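As an illustration only (the patent does not prescribe any implementation), the step from voice text to structured instructions might look like the following minimal Python sketch; the rule table, function name, and field names are all hypothetical and only cover the air conditioner example.

```python
# A minimal, hypothetical sketch of the semantic-understanding step: it maps a
# voice text to structured instructions. The rules below are assumptions, not
# the patent's actual parser.
def parse_voice_text(text: str) -> list[dict]:
    instructions = []
    if "cool" in text:
        # "cool" -> "turn on the cooling mode of the air conditioner"
        instructions.append({"type": "set", "destination": "mode",
                             "value": "cool"})
    for token in text.replace(",", " ").split():
        if token.isdigit():
            # a bare number -> "set the indicated temperature to <n> degrees"
            instructions.append({"type": "set", "destination": "temperature",
                                 "value": token, "units": "celsius"})
    return instructions

print(parse_voice_text("turn on the air conditioner, cool 20 degrees"))
# -> [{'type': 'set', 'destination': 'mode', 'value': 'cool'},
#     {'type': 'set', 'destination': 'temperature', 'value': '20',
#      'units': 'celsius'}]
```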
Step S103: determining a voice instruction to be supplemented based on a preset voice instruction set and the first voice instruction. The preset voice instruction set includes all voice instructions for realizing the function corresponding to the voice text.
When a user uses an intelligent electronic device, the device can realize multiple functions, and different functions require different instructions from the user. For example, when the intelligent electronic device is an air conditioner, it may implement a mode adjustment function, a temperature adjustment function, or a timing function. The mode adjustment function controls whether the air conditioner cools or heats; the temperature adjustment function modifies the temperature indicated by the air conditioner; the timing function sets the operating time of the air conditioner. To implement a given function, the corresponding condition must be satisfied, that is, the user must input the corresponding instructions.
The functions an intelligent electronic device can realize differ in complexity: some are simple and some are complex. A more complex function has more complex conditions, generally defined by a combination of several instructions, so the user needs to input several instructions to realize it. For example, when the user wants the air conditioner to perform the mode adjustment function, an explicit instruction for the specific mode is required, such as "enter cooling mode" or "enter heating mode", and the specific temperature the air conditioner should indicate must also be set, for example "set temperature to 20 degrees". Only when the air conditioner receives both the "air conditioner mode" and "air conditioner temperature" instructions can it realize the mode adjustment function. In the embodiment of the present application, the functions of the intelligent electronic device can therefore be divided into two types: single-instruction functions and multi-instruction functions. A single-instruction function is relatively simple to realize: the user can realize it by inputting one instruction. A multi-instruction function is more complex: the user needs to input several instructions to realize it.
For each voice function, a voice instruction set may be preset, where the voice instruction set includes all voice instructions for implementing the voice function. The voice instruction set of each voice function can be set by the user, and the embodiment of the present application is not limited in particular. Meanwhile, a database can be preset in the intelligent electronic equipment, and all voice functions which can be realized by the intelligent electronic equipment and a voice instruction set corresponding to each voice function are stored in the database. Therefore, all voice instructions required to be met for realizing each voice function can be known through a preset database. It should be noted that the voice function in the embodiment of the present application refers to a function that can be implemented by the intelligent electronic device.
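A minimal sketch of such a preset database, assuming a simple mapping from function names to instruction sets; the names are illustrative and not the patent's actual schema:

```python
# Hypothetical preset database: each voice function maps to the full set of
# voice instructions required to realize it (illustrative names only).
FUNCTION_DATABASE: dict[str, set[str]] = {
    "open_door": {"door_target"},                        # single-instruction
    "mode_adjustment": {"ac_mode", "ac_temperature"},    # multi-instruction
    "preheat": {"preheat_time", "preheat_temperature"},  # multi-instruction
}

def required_instructions(voice_function: str) -> set[str]:
    """Return all voice instructions needed to realize the given function."""
    return FUNCTION_DATABASE[voice_function]
```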
When a user controls an intelligent electronic device by voice and wants to realize a more complex function, i.e., a multi-instruction function, the user may not know the conditions for realizing it, that is, may not know all the required instructions. The user may then speak only some of the instructions, so the function cannot be realized. For example, a user who wants the air conditioner's mode adjustment function may only say "cool" without setting a specific temperature; the air conditioner then cannot perform the mode adjustment. In this case, the intelligent electronic device needs to guide the user to give the remaining instructions so that the corresponding function can be realized.
In some embodiments, a first speech function corresponding to the phonetic text may be first determined. It may be specifically determined whether the first speech function is a single instruction function or a multiple instruction function.
After the first voice function corresponding to the voice text is determined, it may be determined whether the instruction input by the user can meet a condition for implementing the first voice function, that is, whether all the voice instructions corresponding to the first voice function are included in the first voice instruction.
Specifically, a preset voice instruction set corresponding to the first voice function may be obtained in a preset database. All voice instructions for realizing the first voice function can be determined according to the voice instruction set.
In some embodiments, if the first voice function is a single command function, the first voice function can be implemented as long as the user inputs a correct voice command, and there is no need to detect other voice commands at this time, because the first voice function has only one voice command.
If the first voice function is a multi-instruction function and the first voice instruction includes all the voice instructions for realizing the first voice function, that is, the condition for realizing the first voice function is satisfied, the intelligent electronic device can directly realize the first voice function.
If the first voice function is a multi-instruction function, but the first voice instruction does not include all voice instructions for realizing the first voice function, the intelligent electronic device cannot realize the first voice function. At this time, it is necessary to determine the voice command to be supplemented that needs to be satisfied in order to implement the first voice function. Specifically, the voice commands except the first voice command in all the voice commands can be determined, so as to obtain the voice command to be supplemented. FIG. 4 illustrates a flow diagram for determining voice instructions to supplement in some embodiments.
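Under the same assumptions as the database sketch above, step S103 reduces to a set difference; the oven example in the next paragraph illustrates the result:

```python
# Sketch of step S103: the instructions still to be supplemented are the
# required set minus the instructions already parsed from the user's voice.
def instructions_to_supplement(voice_function: str,
                               given: set[str]) -> set[str]:
    return required_instructions(voice_function) - given

# Oven example: the user only said "preheat for 10 minutes".
print(instructions_to_supplement("preheat", {"preheat_time"}))
# -> {'preheat_temperature'}
```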
For example, when using an oven, the user wants the oven's preheating function and inputs the voice "preheat for 10 minutes". After receiving the user's voice, the oven first analyzes the voice text, obtains the user's voice instruction "set the preheating time to 10 minutes", and determines from the voice text the function the user wants to realize. Having determined that the user wants the preheating function, it can query the database for all voice instructions corresponding to that function, namely "time of the preheating process" and "temperature of the preheating process". It can thus determine that the user has not input all the required voice instructions, and the voice instruction to be supplemented, "temperature of the preheating process", is obtained.
Step S104: sending the prompt information corresponding to the voice instruction to be supplemented to the user so that the user can input the voice instruction to be supplemented.
After the voice instruction to be supplemented is determined, in order to realize the corresponding first voice function, the user needs to be informed that the voice instruction is incomplete, so that the user inputs the voice instruction to be supplemented.
Specifically, the related prompt information may be generated according to the voice instruction to be supplemented. For example, when the voice instruction to be supplemented is "temperature of the preheating process", a prompt message "please tell me the temperature to set" may be generated. The prompt information is then converted into an active query voice and broadcast to the user through an audio output interface of the intelligent electronic device, such as a loudspeaker, so that the user can input the voice instruction to be supplemented. At this point, the user and the intelligent electronic device have completed the first round of the voice interaction process.
After hearing the active query voice corresponding to the voice instruction to be supplemented, the user can input the voice instruction to be supplemented to the intelligent electronic device, which can then realize the first voice function. For example, on hearing the active query voice "please tell me the temperature to set", the user can continue to input a voice instruction, for example "200 degrees". The oven has now received all voice instructions required for the preheating function and can perform the corresponding preheating operation. The oven can also reply to the user with the prompt message "OK", completing the second round of the voice interaction process. The voice interaction between the user and the oven is thus finished, and the preheating function the user required is realized.
In some embodiments, a single voice instruction input by the user may itself be incomplete, which may prevent the intelligent electronic device from accurately recognizing the correct voice instruction.
For example, when using a refrigerator, the user wants to open the refrigerator door and inputs the voice "open the refrigerator door". After receiving the voice signal, the refrigerator detects that this is a single-instruction function and queries that the voice instruction for realizing it must specify which refrigerator door to open. Note that a refrigerator has several storage compartments, such as a refrigerating chamber, a freezing chamber, and a wide temperature chamber, each with its own door. Here, however, the user merely input an instruction to open the door without specifying which compartment's door to open. The voice instruction input by the user is therefore incomplete, and the refrigerator cannot recognize a correct voice instruction: although the user input the voice instruction corresponding to the door-opening function, the instruction is incomplete, so the function cannot be realized. In this case, the intelligent electronic device needs to guide the user to give a complete instruction.
In some embodiments, a voice instruction structure may be preset for expressing a voice instruction completely. The voice instruction structure may consist of several slots, such as the instruction type (CommandType), destination (Object), destination value (Value), component (Section), and unit (Units), so as to guarantee the integrity of the voice instruction and allow each voice instruction to be carried out. The instruction type is the specific action, such as setting, increasing, or decreasing; the destination is the functional target being adjusted, such as the temperature or a mode; the destination value is the degree of adjustment, e.g., the specifically set value, the "20" in "20 degrees"; the component is the specific part being adjusted, such as the refrigerating chamber, freezing chamber, or wide temperature chamber; and the unit is the unit of measurement, such as minutes or degrees Celsius.
Each voice instruction can have a unique voice instruction structure with fixed slots, and the slot parameter of each slot is supplied by the user. When the user inputs the same type of voice instruction with different slot parameters, the effect of the voice instruction differs. For example, for the voice instruction "time of the preheating process", the voice instruction structure may include four slots: instruction type, destination, destination value, and unit. The slot parameter of the instruction type is generally "set", that of the destination is generally "preheat", that of the destination value is a specific time value, and that of the unit is generally "minute" or "second". When the voice input by the user is "preheat for 10 minutes", it can be analyzed that the actual meaning is "set the preheating time to 10 minutes". It can therefore be determined that the slot parameter of the instruction type is "set", that of the destination is "preheat", that of the destination value is "10", and that of the unit is "minute".
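A sketch of this five-slot structure as a data type, assuming hypothetical field names that mirror the slots; unfilled slots stay empty until the user supplies them:

```python
# Hypothetical rendering of the voice instruction structure with its five
# slots (CommandType, Object, Value, Section, Units); unfilled slots are None.
from dataclasses import dataclass
from typing import Optional

@dataclass
class VoiceInstruction:
    command_type: Optional[str] = None  # e.g. "set", "increase", "decrease"
    destination: Optional[str] = None   # e.g. "temperature", "preheat", "mode"
    value: Optional[str] = None         # e.g. "10", "20"
    section: Optional[str] = None       # e.g. "freezing chamber"
    units: Optional[str] = None         # e.g. "minute", "degrees Celsius"

# "preheat for 10 minutes" fills four of the five slots:
preheat_time = VoiceInstruction(command_type="set", destination="preheat",
                                value="10", units="minute")
```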
If the user's input lacks some of the slot parameters, the voice instruction is incomplete. For example, if the user just says "preheat", the slot parameters for the destination value and the unit are missing, and the user must be guided to give a complete instruction, i.e., to supply the missing slot parameters.
It should be noted that the voice command structure of each voice command can be set by a technician, for example, by a developer of an algorithm related to the voice interaction function, so as to implement the voice interaction function of the intelligent electronic device.
In some embodiments, the first slot position parameter in each first voice instruction may be obtained. For example, for a first voice instruction "open the refrigerator door", it may be determined that the first slot position parameter included therein is: the slot parameter of the instruction type is "set", the slot parameter of the destination is "door", and the slot parameter of the destination value is "open".
For each voice instruction, a slot parameter set may be preset, where the slot parameter set includes all slot parameters that the voice instruction needs to include. Specifically, all slot positions in the voice instruction structural body corresponding to the voice instruction may be counted, and the slot position parameters required by the slot positions may be made into a slot position parameter set of each voice instruction. Meanwhile, the slot position parameter set of each voice instruction can be stored in a preset voice instruction database. Therefore, all slot position parameters required to be contained for realizing each voice instruction can be known through a preset voice instruction database.
Therefore, the first slot position parameter set corresponding to the first voice instruction can be obtained from the preset database, and whether the first voice instruction is complete can be determined from the first slot position parameter set and the first slot position parameters.
When the first slot position parameters contain all slot position parameters that the first voice instruction must contain, the first voice instruction is complete.
When the first slot position parameters do not contain all slot position parameters that the first voice instruction must contain, the first voice instruction is incomplete, and the slot position parameters to be supplemented must be determined. Specifically, the remaining slot position parameters in the first slot position parameter set other than the first slot position parameters can be determined to obtain the second slot position parameters, i.e., the slot position parameters that need to be supplemented in the first voice instruction. For example, for the voice instruction "open the refrigerator door", the missing slot parameter can be determined to be the one corresponding to the component, which may be one of "refrigerating chamber", "freezing chamber", or "wide temperature chamber".
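Continuing the structure sketch above, the second slot position parameters fall out as the unfilled required slots; the per-instruction requirement table below is an assumption standing in for the preset voice instruction database:

```python
# Sketch: per-instruction required slots (standing in for the preset voice
# instruction database) minus the slots the user actually filled.
REQUIRED_SLOTS: dict[str, set[str]] = {
    "open_refrigerator_door": {"command_type", "destination", "value",
                               "section"},
}

def missing_slots(instruction_name: str, filled: dict[str, str]) -> set[str]:
    required = REQUIRED_SLOTS[instruction_name]
    return {s for s in required if filled.get(s) is None}

# "open the refrigerator door": the section (which compartment) was not given.
filled = {"command_type": "set", "destination": "door", "value": "open"}
print(missing_slots("open_refrigerator_door", filled))  # -> {'section'}
```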
In some embodiments, the user may not explicitly speak a slot parameter, yet it can still be derived from the meaning of the voice instruction. For example, after the voice "turn on the air conditioner, cool 20 degrees" input by the user is analyzed, the voice text is found to contain two instructions: "cool" and "20 degrees". For the voice instruction "cool", the detailed meaning is "turn on the cooling mode of the air conditioner": the slot parameter of the instruction type is "set", that of the destination is "mode", and that of the destination value is "cool". For the voice instruction "20 degrees", the detailed meaning is "set the temperature indicated by the air conditioner to 20 degrees": the slot parameter of the instruction type is "set", that of the destination is "temperature", that of the destination value is "20", and that of the unit is "degrees Celsius". Although the user did not speak all the slot information of the voice instructions, analysis shows that the instructions are complete.
After the second slot position parameter is determined, the user needs to be informed of the incomplete condition of the voice instruction, so that the user can input the second slot position parameter.
Specifically, the related prompt information may be generated according to the second slot position parameter. For example, when the second slot position parameter is the one corresponding to the component, a prompt message such as "which storage compartment door would you like to open?" may be generated. The prompt information is then converted into an active query voice and broadcast to the user through a loudspeaker so that the user can input the second slot position parameter. After hearing the active query voice, the user can input the second slot position parameter to the intelligent electronic device.
In some embodiments, after the user inputs the voice, the type of the first voice function corresponding to the voice text may be detected. FIG. 5 illustrates a flow diagram for detecting a voice instruction in some embodiments.
When the first voice function is a single instruction function, it is only necessary to detect whether the voice instruction is complete. When the voice command is incomplete, prompt information corresponding to the slot position parameter to be supplemented needs to be sent to a user.
When the first voice function is a multi-instruction function, both the number and the integrity of the voice instructions must be detected. Detecting the number means checking whether the voice instructions include all voice instructions corresponding to the voice function; detecting the integrity means checking whether the slot position parameters of each voice instruction are complete. When the number or the integrity does not satisfy the conditions, prompt information must be sent to the user, covering both the missing voice instructions to be supplemented and the missing slot position parameters in the first voice instruction.
By detecting both the number and the integrity of the voice instructions, it can be determined whether the intelligent electronic device can realize the corresponding voice function.
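A sketch combining the two checks, built on the hypothetical helpers from the earlier sketches; an empty prompt list means the function can be executed:

```python
# Sketch of the combined check for a multi-instruction function: instruction
# count first, then slot integrity of each instruction the user did give.
def build_prompts(voice_function: str,
                  given: dict[str, dict[str, str]]) -> list[str]:
    prompts = []
    for name in instructions_to_supplement(voice_function, set(given)):
        prompts.append(f"Please supply the voice instruction: {name}")
    for name, slots in given.items():
        for slot in missing_slots(name, slots):
            prompts.append(f"Instruction '{name}' is missing the slot: {slot}")
    return prompts  # an empty list means the voice function can be executed
```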
In some embodiments, after the intelligent electronic device broadcasts the active query voice, the user may continue to input voice signals. However, the user's earlier and later voices may be unrelated, in which case the intelligent electronic device may understand a wrong voice instruction, perform a wrong voice function, and degrade the user experience. For example, the first voice signal input by the user is "set the refrigerator temperature to 5 degrees", and the intelligent electronic device feeds back "OK, which storage compartment's temperature should be controlled?". If the second voice signal input by the user is "please turn the freezer compartment temperature up by 2 degrees", the two utterances are unrelated, and the refrigerator may misunderstand this as "the freezer temperature is set to 5 degrees" and perform an erroneous function. Therefore, it is necessary to determine whether the user's successive voice instructions are related.
In some embodiments, when the intelligent electronic device receives a second voice signal input by the user, the second voice instruction and the first voice instruction can be detected. Specifically, the association degree of the second voice instruction with the first voice instruction may be detected.
In some embodiments, upon detecting the association of the second voice instruction with the first voice instruction, all slot parameters in the first voice instruction may be determined first. For example, when the first voice signal input by the user is "set the refrigerator temperature to 5 degrees", the slot position parameters determined are as shown in the following table:
Slot | Device name | Destination | Instruction type | Target value | Unit
Slot parameter | Refrigerator | Temperature | Set | 5 | Degrees Celsius
After the voice instruction converted from the first voice signal is analyzed, it can be determined that it contains five slot parameters: the slot parameter for the device name is "refrigerator", that for the destination is "temperature", that for the instruction type is "set", that for the destination value is "5", and that for the unit is "degrees Celsius".
After all slot position parameters in the first voice instruction are determined, the association degree of each slot position parameter and the second voice instruction can be respectively calculated.
In some embodiments, when calculating the association degree between a slot position parameter and the second voice instruction, the second voice signal is first analyzed to obtain the second voice instruction. For example, for the voice signal "please turn the freezer compartment temperature up by 2 degrees", the second voice instruction "turn the freezer compartment temperature up by 2 degrees" is obtained. However, when the intelligent electronic device analyzes the semantics, it does not subjectively settle on a single most accurate meaning; the second voice instruction may have multiple candidate semantics. For example, "turn up" in the second voice instruction may carry several semantics: increase, decrease, set, query, and so on, each with a different probability, the probability of "increase" being the highest.
At this time, a plurality of semantics of the second voice instruction and a semantic score corresponding to each of the semantics may be determined. The semantic score represents the probability degree of the semantic, and the higher the semantic score is, the higher the probability degree is, and the more accurate the semantic is.
In some embodiments, when determining the plurality of semantics of the second voice instruction and the semantic score corresponding to each semantic, all words included in the second voice instruction may be determined first.
After determining all words included in the second voice instruction, a word set corresponding to each word may be obtained, and all word semantics corresponding to the word and a weight value of each word semantic may be included in the word set. Specifically, a term set corresponding to each term may be obtained by using a knowledge graph.
A knowledge graph is essentially a semantic network that represents semantic relationships between entities: entities serve as the vertices or nodes, and relationships serve as the edges. Knowledge graphs can be constructed in various ways; the construction itself is prior art, and the graph can be set by technicians, for example the developers of the algorithms behind the voice interaction function, so as to realize the voice interaction function of the intelligent electronic device. The embodiments of the present application are not limited in this respect.
After the word set corresponding to each word is obtained, the multiple semantics of each word can be combined, so that multiple semantic conditions of the second voice instruction are obtained. For example, for the speech instruction "freezer compartment temperature adjusted up by 2 degrees", its semantics may include: "freezer temperature increases by 2 degrees", "freezer temperature decreases by 2 degrees", "freezer temperature is set to 2 degrees", and "query freezer temperature", etc.
For each semantic of the second voice instruction, the semantic score is derived from the weight values of its word semantics. All semantic possibilities can be arranged in descending order of semantic score to obtain the semantic score map of the second voice instruction. FIG. 6 illustrates a schematic diagram of a semantic score map for a second voice instruction in some embodiments. For the destination value slot, when the slot parameter is a specific numerical value, it can be represented by a # symbol or filled with the specific value; when there is no specific value, it may be left unexpressed.
So far, the semantic condition of the second voice instruction can be analyzed. It should be noted that the method for analyzing semantic situations is prior art, and the embodiment of the present application is only an exemplary method, and does not limit the scope of the present application.
In some embodiments, after the semantic condition of the second voice instruction is analyzed, the highest semantic score of all semantics may be taken as the first score of the slot parameter.
When calculating the association degree between a slot position parameter of the first voice instruction and the second voice instruction, the slot position parameter can be matched against all semantics of the second voice instruction in the semantic score map. When the slot position parameter is found in the map, the semantic containing it is judged to have matched successfully; that is, scanning the semantic score map from top to bottom, the first semantic in which the slot position parameter appears is taken as the successfully matched semantic.
If several semantics all contain the slot position parameter, the one with the highest semantic score is judged to have matched successfully. For example, for the slot parameter "increase", the successfully matched semantic is that of the first row; for the slot parameter "set", which first appears in the third row, the successfully matched semantic is that of the third row.
And when a certain slot position parameter of the first voice instruction is successfully matched with a certain semantic of the second voice instruction, determining the semantic score of the semantic as a second score of the slot position parameter.
After determining the first score and the second score for each slot parameter, the second score may be subtracted from the first score to obtain a difference between each slot parameter and the second voice instruction. The greater the difference, the smaller the association between the slot position parameter and the second voice instruction.
For the first voice instruction "set the refrigerator temperature to 5 degrees", five slot parameters are included: "refrigerator", "temperature", "set", "5", and "degrees Celsius". For the slot parameter "refrigerator", the first score is 38.06 and the second score is 38.06, so its difference degree from the second voice instruction is 0. Similarly, the difference degree for the slot parameter "temperature" is 0, and that for the slot parameter "set" is 1.69. For the slot parameter "5", no numerical value in the semantic score map matches, so its difference degree is the preset value 2 (see the note below). The difference degree for the slot parameter "degrees Celsius" is 0.
It should be noted that the destination value slot differs from the other slots in that it may hold a specific numerical value. Therefore, when calculating the difference degree for the destination value slot parameter, it is first determined whether the slot parameter is a specific numerical value. If it is, it is checked whether the corresponding word semantics in the semantic score map are also specific numerical values; if not, the match can be treated as arbitrary, and the semantic score of the first row is taken as the second score of the slot parameter. If the semantic score map does contain specific numerical values, the slot parameter is matched against them, and when no value in the map matches, the difference degree is set to a preset value, for example 2.
In some embodiments, the sum of the association degree and the difference degree between a slot position parameter and the second voice instruction may be set to 1. Therefore, once the difference degree between each slot position parameter and the second voice instruction is obtained, the corresponding association degree can be determined as: association degree = 1 - difference degree.
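A sketch of this per-slot computation, modeling the semantic score map as (word set, score) pairs sorted by descending score, with the preset no-match penalty of 2 applied as described above; the scores below are illustrative, chosen only to reproduce the 38.06 and 1.69 values quoted in the text:

```python
# Sketch: association degree of one slot parameter with the second voice
# instruction, given a semantic score map sorted by descending score.
ScoreMap = list[tuple[set[str], float]]  # (word semantics, semantic score)

def slot_association(slot_param: str, score_map: ScoreMap) -> float:
    first_score = score_map[0][1]       # highest semantic score
    for semantics, score in score_map:  # top-down, first match wins
        if slot_param in semantics:
            difference = first_score - score
            break
    else:
        difference = 2.0                # preset value when nothing matches
    return 1.0 - difference             # association + difference = 1

# Illustrative map for "turn the freezer compartment temperature up 2 degrees";
# only the 38.06 top score and the 1.69 gap to "set" come from the text.
score_map = [({"freezer", "temperature", "increase", "2"}, 38.06),
             ({"freezer", "temperature", "decrease", "2"}, 37.60),
             ({"freezer", "temperature", "set", "2"}, 36.37)]
print(round(slot_association("set", score_map), 2))  # -> -0.69
```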
For example, for the first voice instruction "set the refrigerator temperature to 5 degrees", the association degrees of the slot parameters "refrigerator", "temperature", "set", "5", and "degrees Celsius" are, in order: 1, 1, -0.69, -1, and 1.
In some embodiments, when the association degrees of all slot position parameters in the first voice instruction and the second voice instruction are determined, the sum of all the association degrees can be obtained and used as the association degree of the first voice instruction and the second voice instruction. For example, for the first voice command "set the refrigerator temperature to 5 degrees", the overall degree of association is 1.31.
An association threshold, e.g. 3, may be preset. When it is detected that the association degree of the first voice command and the second voice command reaches a preset association threshold, it can be considered that the association between the first voice command and the second voice command is high, the content replied by the user is related to the voice interaction process, and voice interaction can be continued.
When the relevance degree is detected not to reach the preset relevance threshold, the content replied by the user is considered to be irrelevant to the voice interaction process, the voice interaction process can be interrupted at the moment, and the intelligent electronic equipment cannot realize the first voice function required by the user.
For example, the association degree between the first voice instruction "set the refrigerator temperature to 5 degrees" and the second voice instruction "turn the freezer compartment temperature up by 2 degrees" is 1.31, which is below the association threshold 3, so the two instructions are judged unrelated and this round of voice interaction is interrupted.
In some embodiments, since different first voice instructions contain different numbers of slot position parameters, judging by the plain sum of the association degrees may affect accuracy. Therefore, after the sum of the association degrees of all slot position parameters is obtained, the average association degree per slot position parameter can further be calculated, i.e., the mean of all association degrees, and this average taken as the association degree between the second voice instruction and the first voice instruction.
When the average is used, a smaller association threshold may be preset, for example 0.7. Whether the average reaches this threshold can then be detected to judge the relation between the first voice instruction and the second voice instruction.
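Continuing the sketch above (again an illustrative assumption, with 0.7 as the example threshold from the text), the averaged variant is:

```python
AVG_ASSOCIATION_THRESHOLD = 0.7  # example value from the text

def reply_is_related(associations: list[float]) -> bool:
    """Average the per-slot association degrees so that instructions with
    different numbers of slot parameters are judged on a comparable scale."""
    average = sum(associations) / len(associations)
    return average >= AVG_ASSOCIATION_THRESHOLD

# For the worked example: (1 + 1 - 0.69 - 1 + 1) / 5 = 0.262 < 0.7,
# so the reply is judged unrelated under this variant as well.
print(reply_is_related([1.0, 1.0, -0.69, -1.0, 1.0]))  # False
```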
In some embodiments, when the association degree between the first voice instruction and the second voice instruction is detected to reach the preset association threshold, the voice interaction process continues, and the second voice instruction is examined further.
At this point, it must be determined whether the instructions input by the user satisfy the conditions for implementing the first voice function; specifically, whether the second voice instruction contains all the voice instructions to be supplemented.
If the second voice instruction contains all the voice instructions to be supplemented, the intelligent electronic device has received every voice instruction required to implement the first voice function, and can perform the corresponding operation. It can also reply "good" to the user.
If the second voice instruction does not contain all the voice instructions to be supplemented, the user still needs to supply the remaining ones. In that case, the voice instructions to be supplemented that are not contained in the second voice instruction are determined, yielding a third voice instruction, and the prompt information corresponding to the third voice instruction is sent to the user so that the user can input it.
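As a minimal sketch of this step (the instruction names are illustrative, not taken from the patent), the third voice instruction is simply the set difference between the instructions to be supplemented and those parsed from the second voice signal:

```python
def third_instruction(to_supplement: list[str], second: list[str]) -> list[str]:
    """Voice instructions still missing after the user's second voice instruction."""
    received = set(second)
    return [ins for ins in to_supplement if ins not in received]

to_supplement = ["time of preheating process", "temperature of preheating process"]
second = ["time of preheating process"]  # parsed from the second voice signal
print(third_instruction(to_supplement, second))
# ['temperature of preheating process'] -> prompt the user for this instruction
```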
In some embodiments, the intelligent electronic device may check the number and the integrity of the voice instructions together. When the user's first voice instructions fail to meet these conditions, the device sends the user prompt information covering both the missing voice instructions to be supplemented and the second slot parameters missing from the first voice instructions.
When the user's second voice signal is received, it can be detected whether the signal includes both the voice instructions to be supplemented and the second slot parameters. If all of this content is present, the intelligent electronic device can perform the corresponding first voice function.
If some information is still missing from the user's second voice signal, the intelligent electronic device broadcasts the prompt information corresponding to the missing information to the user again, until the user has supplied everything.
For example, suppose the first voice signal input by the user is "turn on preheat". From the voice instruction set corresponding to the preheating function, the voice instructions required to implement it are the time of the preheating process and the temperature of the preheating process. The following table shows the complete voice instruction structure of the preheating function in some embodiments.
(Table: complete voice instruction structure of the preheating function; reproduced in the original as drawing BDA0003170829300000121.)
As can be seen, for the "time of the preheating process" instruction, the missing slot parameters are those corresponding to the "destination value" and the "unit", and the entire "temperature of the preheating process" instruction is missing as well.
At this point, the oven takes the "destination value" and "unit" slot parameters as the second slot parameters, takes the "temperature of the preheating process" as the voice instruction to be supplemented, and broadcasts the corresponding prompt information to the user, for example "please tell me the duration and temperature you need to set", so that the user can supply the missing content. This completes the first round of voice interaction.
After hearing the prompt information, the user can send a second voice signal to the oven. If the second voice signal includes both the second slot parameters and the voice instruction to be supplemented, the oven can perform the corresponding preheating operation and reply "good" to the user, completing the second round of voice interaction. The voice interaction process between the user and the oven is then finished, and the preheating function required by the user is implemented.
If the second voice signal does not include both the second slot parameters and the voice instruction to be supplemented, information is still missing. For example, if the user replies "10 minutes", the second slot parameters are given and the "time of the preheating process" instruction is complete, but the "temperature of the preheating process" instruction is still missing. The oven then continues to prompt the user, for example "good, please tell me the temperature you want to set". At this point, the user and the intelligent electronic device have completed a second round of voice interaction.
After hearing this prompt, the user can speak the voice instruction "200 degrees" to the oven. The oven has now received all the voice instructions required to implement the preheating function, so it can perform the corresponding preheating operation and reply "good" to the user, completing the third round of voice interaction.
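The combined completeness check in the oven example can be sketched as follows; the slot names "device" and "operation" are assumptions, since the original table drawing is not reproduced here:

```python
def find_gaps(required: dict[str, set[str]], received: dict[str, set[str]]):
    """required: instruction -> slot parameters it must contain;
    received: instruction -> slot parameters actually heard so far."""
    missing_instructions, missing_slots = [], {}
    for instruction, slots in required.items():
        if instruction not in received:
            missing_instructions.append(instruction)   # whole instruction absent
        elif slots - received[instruction]:
            missing_slots[instruction] = slots - received[instruction]
    return missing_instructions, missing_slots

required = {
    "time of preheating process": {"device", "operation", "destination value", "unit"},
    "temperature of preheating process": {"device", "operation", "destination value", "unit"},
}
# "turn on preheat" fills only the device and operation slots of the time instruction.
received = {"time of preheating process": {"device", "operation"}}
print(find_gaps(required, received))
# Reports the temperature instruction as missing entirely, plus the time
# instruction's "destination value" and "unit" slots -> prompt for duration
# and temperature, as in the example above.
```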
In some embodiments, if the user inputs an unrelated voice instruction, the intelligent electronic device may directly interrupt the voice interaction. For example, the user gives a first voice instruction "set refrigerator temperature to 5 degrees", and the refrigerator replies "good, which storage compartment's temperature needs to be controlled". If the user then inputs a second voice instruction "increase the freezer temperature by 2 degrees", which is unrelated to the first voice instruction, the refrigerator interrupts the voice interaction.
Meanwhile, the refrigerator can take the voice instruction most recently input by the user as a new first voice instruction, that is, treat the second voice instruction "increase the freezer temperature by 2 degrees" of the previous voice interaction process as the first voice instruction of a new voice interaction process, and carry out the subsequent interaction accordingly.
In some embodiments, each voice interaction process may contain multiple rounds of interaction, and the intelligent electronic device can set a maximum number of rounds per voice interaction process, for example 3. If the user still has not provided all the voice instructions within those 3 rounds, the voice interaction is interrupted.
If the user continues to input a voice instruction in a 4th round, that instruction can be taken as the 1st-round voice instruction of the next voice interaction process, and the next voice interaction process proceeds from there.
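The multi-round behaviour described above can be summarized in the following hedged dialog-loop sketch; the interfaces are assumptions for illustration, and re-deriving the required instruction set when a new process starts is omitted:

```python
MAX_ROUNDS = 3  # example maximum number of rounds from the text

def run_process(first: set[str], required: set[str], next_reply, related) -> str:
    """first: instructions parsed from the first voice signal;
    required: all instructions needed for the requested function;
    next_reply(): instruction set parsed from the user's next voice signal;
    related(a, b): the association check between instruction sets."""
    received = set(first)
    for _ in range(MAX_ROUNDS):
        missing = required - received
        if not missing:
            return "execute"                    # all instructions received
        print("please supply:", ", ".join(sorted(missing)))
        reply = next_reply()
        if not related(received, reply):
            # Unrelated reply: interrupt, and the reply seeds round 1 of a
            # new voice interaction process.
            return run_process(reply, required, next_reply, related)
        received |= reply
    # Rounds exhausted: interrupt; the user's next instruction starts
    # round 1 of the next voice interaction process.
    return "interrupted"
```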
An embodiment of the present application provides a voice interaction apparatus, configured to execute the method of the embodiment corresponding to fig. 2. As shown in fig. 7, the voice interaction apparatus includes:
a voice text acquisition unit 201 configured to acquire a voice text, which is obtained by analyzing a first voice signal input by a user;
a voice text analysis unit 202 configured to analyze the voice text to obtain a first voice instruction;
a to-be-supplemented voice instruction determining unit 203, configured to determine a to-be-supplemented voice instruction based on a preset voice instruction set and the first voice instruction, where the preset voice instruction set includes all voice instructions for implementing a function corresponding to the voice text;
and the prompt information unit 204 is configured to send prompt information corresponding to the voice instruction to be supplemented to the user so that the user inputs the voice instruction to be supplemented.
The voice interaction apparatus may be installed in various electronic devices such as a display device, a refrigerator, an air conditioner, or an oven, so that the electronic device can implement the voice interaction process described above and thereby improve the user experience.
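Purely as an illustration of how the four units might compose at runtime (the class and method names below are assumptions, not the patent's API):

```python
class VoiceInteractionApparatus:
    """Illustrative composition of units 201-204 of fig. 7."""

    def __init__(self, asr, parser, instruction_sets):
        self.asr = asr                            # voice signal -> voice text
        self.parser = parser                      # voice text -> (function, instructions)
        self.instruction_sets = instruction_sets  # function -> required instruction set

    def handle(self, first_voice_signal) -> str:
        text = self.asr(first_voice_signal)            # unit 201: acquire voice text
        function, received = self.parser(text)         # unit 202: parse first instruction
        missing = self.instruction_sets[function] - received  # unit 203: to supplement
        if missing:                                    # unit 204: prompt the user
            return "please supply: " + ", ".join(sorted(missing))
        return "good"
```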
The same and similar parts in the embodiments in this specification may be referred to one another, and are not described herein again.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method in the embodiments or some parts of the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A method of voice interaction, comprising:
acquiring a voice text, wherein the voice text is obtained by analyzing a first voice signal input by a user;
analyzing the voice text to obtain a first voice instruction;
determining a voice instruction to be supplemented based on a preset voice instruction set and the first voice instruction, wherein the preset voice instruction set comprises all voice instructions for realizing the function corresponding to the voice text;
and sending the prompt information corresponding to the voice instruction to be supplemented to the user so that the user can input the voice instruction to be supplemented.
2. The method of voice interaction according to claim 1, wherein the determining a voice instruction to be supplemented based on the preset voice instruction set and the first voice instruction comprises:
determining a first voice function corresponding to the voice text;
acquiring a preset voice instruction set corresponding to the first voice function from a preset database, wherein all voice functions and the preset voice instruction set corresponding to each voice function are stored in the database;
determining all voice instructions for realizing the first voice function according to a preset voice instruction set corresponding to the first voice function;
and determining the voice instructions except the first voice instruction in all the voice instructions to obtain the voice instruction to be supplemented.
3. The method of claim 1, wherein after the step of analyzing the voice text to obtain the first voice command, the method further comprises:
acquiring a first slot position parameter in the first voice instruction;
acquiring a first slot position parameter set, wherein the first slot position parameter set comprises all slot position parameters required to be contained in the first voice instruction;
determining a second slot position parameter based on the first slot position parameter set and the first slot position parameter;
and sending the slot position parameter prompt message corresponding to the second slot position parameter to the user so that the user inputs the second slot position parameter.
4. The voice interaction method of claim 3, wherein the determining the first slot parameter set comprises:
determining a slot position parameter set corresponding to the first voice instruction according to a preset voice instruction database to obtain a first slot position parameter set; all voice instructions and a slot position parameter set corresponding to each voice instruction are stored in the preset voice instruction database;
the determining a second slot position parameter based on the first slot position parameter set and the first slot position parameter comprises:
and determining the other slot position parameters except the first slot position parameter in the first slot position parameter set to obtain a second slot position parameter.
5. The voice interaction method of claim 1, further comprising:
receiving a second voice signal input by a user;
analyzing the second voice signal to obtain a second voice instruction;
determining a third voice instruction based on the voice instruction to be supplemented and the second voice instruction, wherein the third voice instruction is all voice instructions except the second voice instruction in the voice instruction to be supplemented;
and sending the prompt message corresponding to the third voice instruction to the user so that the user inputs the third voice instruction.
6. The voice interaction method of claim 5, further comprising:
acquiring the association degree of the second voice instruction and the first voice instruction;
and when detecting that the association degree reaches a preset association threshold value, executing a step of determining a third voice instruction based on the voice instruction to be supplemented and the second voice instruction.
7. The method of claim 6, wherein the obtaining the association degree between the second voice command and the first voice command comprises:
acquiring all slot position parameters in the first voice instruction;
calculating the association degree of each slot position parameter and the second voice instruction;
and calculating the average value of all the association degrees, and taking the average value as the association degree of the second voice command and the first voice command.
8. The voice interaction method of claim 7, wherein the calculating the relevancy of each slot position parameter to the second voice instruction comprises:
acquiring a plurality of semantemes of the second voice instruction and the semantic score of each semanteme, and determining the semantic score with the highest score as a first score;
matching each slot position parameter with various semantics of the second voice instruction respectively, and determining a semantic score corresponding to the successfully matched semantics as a second score of each slot position parameter;
subtracting a second score from the first score to obtain the difference between each slot position parameter and the second voice instruction;
and determining the association degree of each slot position parameter and the second voice instruction according to the difference degree, wherein the sum of the difference degree and the association degree is 1.
9. The voice interaction method of claim 6, further comprising:
and when the relevance degree is detected not to reach a preset relevance threshold value, taking the second voice instruction as a new first voice instruction, and executing the step of determining the voice instruction to be supplemented.
10. A voice interaction apparatus, comprising:
the voice text acquisition unit is configured to acquire a voice text, wherein the voice text is obtained by analyzing a first voice signal input by a user;
the voice text analysis unit is configured to analyze the voice text to obtain a first voice instruction;
the voice instruction to be supplemented determining unit is configured to determine a voice instruction to be supplemented based on a preset voice instruction set and the first voice instruction, wherein the preset voice instruction set comprises all voice instructions for realizing the function corresponding to the voice text;
and the prompt information unit is configured to send prompt information corresponding to the voice instruction to be supplemented to the user so that the user can input the voice instruction to be supplemented.
CN202110817882.0A 2021-07-20 2021-07-20 Voice interaction method and device Pending CN113555018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110817882.0A CN113555018A (en) 2021-07-20 2021-07-20 Voice interaction method and device


Publications (1)

Publication Number Publication Date
CN113555018A (en) 2021-10-26

Family

ID=78103503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110817882.0A Pending CN113555018A (en) 2021-07-20 2021-07-20 Voice interaction method and device

Country Status (1)

Country Link
CN (1) CN113555018A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004062685A (en) * 2002-07-30 2004-02-26 P To Pa:Kk Information processing system, information processing method and program
WO2008128423A1 (en) * 2007-04-19 2008-10-30 Shenzhen Institute Of Advanced Technology An intelligent dialog system and a method for realization thereof
US20100250553A1 (en) * 2009-03-25 2010-09-30 Yasukazu Higuchi Data display apparatus, method ,and program
CN105955960A (en) * 2016-05-06 2016-09-21 浙江大学 Semantic frame-based power grid defect text mining method
CN109997124A (en) * 2016-10-24 2019-07-09 谷歌有限责任公司 System and method for measuring the semantic dependency of keyword
CN107357830A (en) * 2017-06-19 2017-11-17 北京百度网讯科技有限公司 Retrieval statement semantics fragment acquisition methods, device and terminal based on artificial intelligence
CN108880961A (en) * 2018-07-19 2018-11-23 广东美的厨房电器制造有限公司 Appliances equipment control method and device, computer equipment and storage medium
CN110890090A (en) * 2018-09-11 2020-03-17 涂悦 Context-based auxiliary interaction control method and system
CN109389977A (en) * 2018-11-01 2019-02-26 腾讯大地通途(北京)科技有限公司 A kind of voice interactive method and device
CN111429895A (en) * 2018-12-21 2020-07-17 广东美的白色家电技术创新中心有限公司 Semantic understanding method and device for multi-round interaction and computer storage medium
CN110287284A (en) * 2019-05-23 2019-09-27 北京百度网讯科技有限公司 Semantic matching method, device and equipment
CN110309507A (en) * 2019-05-30 2019-10-08 深圳壹账通智能科技有限公司 Testing material generation method, device, computer equipment and storage medium
CN110390107A (en) * 2019-07-26 2019-10-29 腾讯科技(深圳)有限公司 Hereafter relationship detection method, device and computer equipment based on artificial intelligence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEI, W.L. et al.: "Interaction Style Detection Based on Fused Cross-Correlation Model in Spoken Conversation", International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8495-8499 *
WEI, WEN-LI et al.: "Interaction style detection based on Fused Cross-Correlation Model in spoken conversation", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 31 December 2013, pages 8495-8499 *
GUO, Xiaozhe et al.: "GRS: A Generation-Retrieval Dialogue Model for Intelligent Customer Service in the E-Commerce Domain", Journal of East China Normal University (Natural Science), no. 05, pages 156-166 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903342A (en) * 2021-10-29 2022-01-07 镁佳(北京)科技有限公司 Voice recognition error correction method and device
CN113903342B (en) * 2021-10-29 2022-09-13 镁佳(北京)科技有限公司 Voice recognition error correction method and device
WO2024078419A1 (en) * 2022-10-14 2024-04-18 华为技术有限公司 Voice interaction method, voice interaction apparatus and electronic device
CN115565532A (en) * 2022-12-02 2023-01-03 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium
CN116192554A (en) * 2023-04-25 2023-05-30 山东工程职业技术大学 Voice-based Internet of things equipment control method and system

Similar Documents

Publication Publication Date Title
CN113555018A (en) Voice interaction method and device
CN107146612B (en) Voice guidance method and device, intelligent equipment and server
CN110875041A (en) Voice control method, device and system
US20160225372A1 (en) Smart home connected device contextual learning using audio commands
US9953645B2 (en) Voice recognition device and method of controlling same
US20140129223A1 (en) Method and apparatus for voice recognition
CN109584876A (en) Processing method, device and the voice air conditioner of voice data
US10438590B2 (en) Voice recognition
US11721343B2 (en) Hub device, multi-device system including the hub device and plurality of devices, and method of operating the same
US11979437B2 (en) System and method for registering device for voice assistant service
CN111382264B (en) Session quality evaluation method and device and electronic equipment
CN108932947B (en) Voice control method and household appliance
CN111161726B (en) Intelligent voice interaction method, device, medium and system
CN111178081B (en) Semantic recognition method, server, electronic device and computer storage medium
CN105632487A (en) Voice recognition method and device
CN110246498B (en) Voice processing method and device and household appliance
US20210193141A1 (en) Method and system for processing user spoken utterance
CN111933135A (en) Terminal control method and device, intelligent terminal and computer readable storage medium
CN111951808B (en) Voice interaction method, device, terminal equipment and medium
CN114898755B (en) Voice processing method and related device, electronic equipment and storage medium
CN115083412B (en) Voice interaction method and related device, electronic equipment and storage medium
US11727085B2 (en) Device, method, and computer program for performing actions on IoT devices
CN112787899B (en) Equipment voice interaction method, computer readable storage medium and refrigerator
EP3839719B1 (en) Computing device and method of operating the same
CN117474084B (en) Bidirectional iteration method, equipment and medium for pre-training model and downstream sequence task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination