CN110211577B - Terminal equipment and voice interaction method thereof - Google Patents

Terminal equipment and voice interaction method thereof

Info

Publication number
CN110211577B
CN110211577B
Authority
CN
China
Prior art keywords
analysis result
semantic analysis
semantic
local
terminal device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910655031.3A
Other languages
Chinese (zh)
Other versions
CN110211577A (en)
Inventor
陈斌德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Fotile Kitchen Ware Co Ltd
Original Assignee
Ningbo Fotile Kitchen Ware Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Fotile Kitchen Ware Co Ltd filed Critical Ningbo Fotile Kitchen Ware Co Ltd
Priority to CN201910655031.3A
Publication of CN110211577A
Application granted
Publication of CN110211577B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L 15/1822 Parsing for meaning understanding
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G10L 2015/225 Feedback of the input speech

Abstract

The invention discloses a terminal device and a voice interaction method thereof. The method comprises the following steps: receiving a voice input; uploading the voice input to a parsing server for online semantic parsing; receiving the online semantic parsing result fed back by the parsing server; performing semantic error correction on the online semantic parsing result locally at the terminal device, according to a semantic error correction rule pre-stored in the terminal device, to obtain a corrected semantic parsing result; and feeding back the corrected semantic parsing result. Because semantic error correction is applied to the online semantic parsing result using a rule pre-stored locally in the terminal device, the semantic parsing result becomes applicable to the terminal device, the accuracy of voice parsing is improved, and the terminal device is helped to provide effective feedback.

Description

Terminal equipment and voice interaction method thereof
Technical Field
The invention belongs to the field of voice interaction, and particularly relates to a terminal device and a voice interaction method thereof.
Background
As household electrical appliances become increasingly intelligent, how to realize interaction between devices and people has become a hot topic. Voice interaction has become a common interaction method; it is widely used because it requires no manual operation by the user and helps improve the user experience.
When existing intelligent devices perform voice interaction, the device side has no voice parsing capability of its own: the voice must be uploaded to a server for natural language parsing, and the server returns the parsing result to the device side. In this mode, the accuracy of voice parsing is entirely determined by the parsing capability of the server, and the device side can only passively receive the parsing result. Generally, the parsing service of the server is provided by a third party or by an integrator in each vertical field. Since that provider is unlikely to know the functional characteristics and application scenarios of the device side, the parsing result is prone to errors and may not be applicable to the device side, so the device side cannot give the user effective feedback.
Disclosure of Invention
The technical problem the invention aims to solve is that, in the prior art, voice semantics can only be parsed at the server side during voice interaction, and the parsing accuracy depends entirely on the server, so the parsing result may not be applicable to the device side.
The invention solves the technical problems through the following technical scheme:
a voice interaction method of terminal equipment comprises the following steps:
receiving a voice input;
uploading the voice input to an analysis server for on-line semantic analysis;
receiving an online semantic parsing result fed back by the parsing server;
according to a semantic error correction rule prestored in the terminal equipment, performing semantic error correction on the online semantic analysis result locally at the terminal equipment to obtain a corrected semantic analysis result;
and feeding back the corrected semantic analysis result.
Preferably, the voice interaction method further comprises:
after receiving voice input and before uploading the voice input to an analysis server for online semantic analysis, judging whether the voice input meets the local analysis condition of the terminal equipment:
if yes, performing local semantic analysis on the voice input at the terminal device to obtain a local semantic analysis result; then, according to the semantic error correction rule, performing semantic error correction on the local semantic analysis result locally at the terminal device to obtain a corrected semantic analysis result; then, feeding back the corrected semantic analysis result;
if not, uploading the voice input to an analysis server for on-line semantic analysis.
Preferably, the semantic error correction rule includes common entity object information of the terminal device in different voice interaction scenarios;
when the online semantic analysis result or the local semantic analysis result belongs to the content intention, the step of performing semantic error correction on the online semantic analysis result or the local semantic analysis result locally at the terminal equipment comprises the following steps:
verifying whether original entity object information in the online semantic analysis result or the local semantic analysis result is contained in common entity object information of the current voice interaction scene, if not, inquiring common entity object information, which is closest to the original entity object information, of the current voice interaction scene from the semantic error correction rule to replace the original entity object information, and forming the corrected semantic analysis result.
Preferably, the original entity object information and the common entity object information are considered closest when their relationship is any one of the following:
synonyms;
near-synonyms;
homophones;
fuzzy-matched words whose similarity exceeds a preset similarity threshold.
Preferably, the local parsing condition includes that the voice input hits a local command word pre-stored by the terminal device, and the user model pre-stored by the terminal device includes different expression forms of the local command word;
the voice interaction method further comprises the following steps:
when the voice input hits any expression form in the user model, judging that the voice input hits a corresponding local command word;
and updating the user model according to the online semantic parsing result, wherein the updating comprises at least one of adding, deleting and modifying local command words, and adding, deleting and modifying linguistic expressions of the local command words.
Preferably, the voice interaction method further comprises:
and controlling the terminal equipment to execute a control command when the online semantic analysis result or the local semantic analysis result belongs to the control type intention.
Preferably, when the online semantic analysis result or the local semantic analysis result belongs to a content-class intention, the step of feeding back the corrected semantic analysis result specifically includes:
judging whether the corrected semantic analysis result hits scene state data pre-stored locally in the terminal equipment, wherein the scene state data comprises historical conversation flows distinguished according to different voice interaction scenes and content data associated with the historical conversation flows:
and if so, extracting hit content data from the terminal equipment as feedback of the corrected semantic analysis result.
Preferably, the scene state data is cached in a knowledge graph mode, keywords extracted from the conversation flow are used as nodes in the knowledge graph, and the associated nodes are connected with each other.
A terminal device, comprising:
the local storage module is used for prestoring semantic error correction rules;
the voice input module is used for receiving voice input;
the voice transmission module is used for uploading the voice input to an analysis server for on-line semantic analysis and receiving an on-line semantic analysis result fed back by the analysis server;
the semantic error correction module is used for carrying out semantic error correction on the online semantic analysis result locally at the terminal equipment according to the semantic error correction rule to obtain a corrected semantic analysis result;
and the result feedback module is used for feeding back the corrected semantic analysis result.
Preferably, the terminal device further includes:
the analysis judging module is used for judging whether the voice input meets the local analysis condition of the terminal equipment; if not, the voice transmission module is called, and if yes, the local analysis module is called:
the local analysis module is used for carrying out local semantic analysis on the voice input at the terminal equipment to obtain a local semantic analysis result when the voice input meets the local analysis condition;
and the semantic error correction module is also used for performing semantic error correction on the local semantic analysis result locally on the terminal equipment according to the semantic error correction rule to obtain the corrected semantic analysis result.
Preferably, the semantic error correction rule includes common entity object information of the terminal device in different voice interaction scenarios;
when the online semantic analysis result or the local semantic analysis result belongs to the content intention, the semantic error correction module is specifically configured to verify whether original entity object information in the online semantic analysis result or the local semantic analysis result is included in commonly used entity object information of the current voice interaction scene, and if not, query commonly used entity object information, which is closest to the original entity object information, of the current voice interaction scene from the semantic error correction rule to replace the original entity object information, so as to form the corrected semantic analysis result.
Preferably, the original entity object information and the common entity object information are considered closest when their relationship is any one of the following:
synonyms;
near-synonyms;
homophones;
fuzzy-matched words whose similarity exceeds a preset similarity threshold.
Preferably, the local storage module is further configured to pre-store a local command word and a user model, where the user model includes different expression forms of the local command word, and the local parsing condition includes that the voice input hits the local command word;
the analysis judging module is specifically used for judging that the voice input hits a corresponding local command word when the voice input hits any one expression form in the user model;
the terminal device further includes:
and the model updating module is used for updating the user model according to the online semantic parsing result, wherein the updating comprises at least one of adding, deleting and modifying local command words and adding, deleting and modifying language expressions of the local command words.
Preferably, the terminal device further includes:
and the equipment control module is used for controlling the equipment to execute a control command when the online semantic analysis result or the local semantic analysis result belongs to the control type intention.
Preferably, the local storage module is further configured to pre-store scene state data, where the scene state data includes historical conversation streams distinguished according to different voice interaction scenes and content data associated with the historical conversation streams;
and when the online semantic analysis result or the local semantic analysis result belongs to the content type intention, the result feedback module is specifically configured to determine whether the corrected semantic analysis result hits the scene state data, and if so, extract the hit content data from the local storage module as feedback of the corrected semantic analysis result.
Preferably, the scene state data is cached in a knowledge graph mode, keywords extracted from the conversation flow are used as nodes in the knowledge graph, and the associated nodes are connected with each other.
On the basis of common knowledge in the field, the above preferred conditions can be combined arbitrarily to obtain the preferred embodiments of the invention.
The positive effects of the invention are as follows: by performing semantic error correction on the online semantic analysis result with semantic error correction rules pre-stored locally in the terminal device, the invention makes the semantic analysis result applicable to the terminal device, improves the accuracy of voice parsing, and helps the terminal device provide effective feedback. In addition, the invention can further add a local semantic parsing function to the terminal device, so that online semantic parsing and local semantic parsing are combined, speeding up voice parsing and shortening the feedback time.
Drawings
Fig. 1 is a flowchart of a voice interaction method of a terminal device according to embodiment 1 of the present invention;
fig. 2 is a flowchart of a voice interaction method of a terminal device according to embodiment 2 of the present invention;
fig. 3 is a schematic diagram of a knowledge graph in a voice interaction method of a terminal device according to embodiment 2 of the present invention;
fig. 4 is a flowchart of a voice interaction method of a terminal device according to embodiment 3 of the present invention;
fig. 5 is a schematic block diagram of a terminal device according to embodiment 4 of the present invention;
fig. 6 is a schematic block diagram of a terminal device according to embodiment 5 of the present invention;
fig. 7 is a schematic block diagram of a terminal device according to embodiment 6 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
The embodiment provides a voice interaction method of terminal equipment. The voice interaction method acts on the terminal equipment, and man-machine interaction between a user and the terminal equipment can be realized. The terminal device can be any device, including but not limited to smart home devices, and particularly, smart kitchen electrical devices (such as range hoods and kitchen ranges). The terminal device not only has a software and hardware structure for realizing the original functions, but also can be provided with a voice receiving module (such as a microphone array), a processor, a memory, a component for realizing the networking function and the like. The memory may include volatile memory, such as random access memory and/or cache memory, and may further include read-only memory.
As shown in fig. 1, the voice interaction method of the terminal device may include the following steps:
step 101: a voice input is received. The language of the voice input is not limited in this embodiment, and may be chinese, english, japanese, german, french, or other languages.
Step 102: upload the voice input to a parsing server for online semantic parsing. The terminal device and the parsing server can be connected through a network to realize data transmission. The parsing server can be a cloud server or any other server with a voice parsing function; it can perform semantic parsing on the voice input using known voice recognition and voice parsing technologies, generate an online semantic parsing result, and feed it back to the terminal device.
Step 103: receive the online semantic parsing result fed back by the parsing server. The user intention can be understood through the online semantic parsing result.
Step 104: according to a semantic error correction rule pre-stored in the terminal device, perform semantic error correction on the online semantic analysis result locally at the terminal device to obtain a corrected semantic analysis result. The semantic error correction rule can be formulated according to the functional characteristics and application scenarios of the terminal device; it helps to understand the user intention correctly, can provide a set of common parsing results and a set of common words (or common characters) applicable to the terminal device, and can cover multiple languages. The semantic error correction may include correcting the parts of the online semantic analysis result that do not comply with the semantic error correction rule, so as to obtain a semantic analysis result that conforms to the functional characteristics and application scenario of the terminal device.
Step 105: feed back the corrected semantic analysis result. The feedback in this embodiment may take various forms, such as operating the terminal device according to the semantic analysis result, changing the state of the terminal device, or responding to the voice input. In addition, directly outputting the corrected semantic analysis result, either by voice playback or as text on a display (where the terminal device has a display screen), can also serve as feedback in this embodiment.
In this embodiment, the semantic error correction rule may be built into a memory of the terminal device, a cache of a processor, or a native system of the terminal device before step 104, even before the terminal device leaves a factory. The semantic error correction rule can be updated regularly through a background or according to user requirements when the terminal equipment is networked.
The voice interaction method of this embodiment can use the semantic error correction rule to perform local semantic error correction on the online semantic analysis result, so that the semantic analysis result is applicable to the terminal device, the accuracy of voice parsing is improved, and the terminal device can provide effective feedback. The voice interaction method of this embodiment can support multiple rounds of human-computer dialogue; the dialogue is not limited to voice exchanges between the terminal device and the user, and may also include feedback in other forms that the terminal device makes in response to the user's voice. After the terminal device gives feedback, the user can continue with further voice input, and the voice interaction method of this embodiment is applied again to parse the semantics and give feedback, repeating over many rounds.
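To make the flow of this embodiment concrete, the following is a minimal, self-contained sketch of steps 101-105 in Python. All names (`online_parse`, `correct`, `interact`) and the dictionary shape of a parsing result are illustrative assumptions; the patent does not prescribe any API or data format.

```python
# Illustrative sketch of embodiment 1 (steps 101-105); not the patent's API.

# Semantic error correction rule pre-stored on the device: the common entity
# objects of one voice interaction scenario (here, a recipe query scenario).
COMMON_ENTITIES = {"recipe_query": {"carp", "crucian carp", "Chinese cabbage"}}

def online_parse(voice_input: str) -> dict:
    """Steps 102-103: upload to the parsing server and receive its result.
    Faked here as a fixed intent plus entity for illustration."""
    return {"intent": "recipe_query", "entity": "carp"}

def correct(result: dict) -> dict:
    """Step 104: local semantic error correction. If the entity is not a
    common entity object of the scenario, embodiment 2 would replace it
    with the closest common one; here we only check membership."""
    common = COMMON_ENTITIES.get(result["intent"], set())
    result["conforms"] = result["entity"] in common
    return result

def interact(voice_input: str) -> None:
    result = online_parse(voice_input)   # steps 102-103
    corrected = correct(result)          # step 104
    print("feedback:", corrected)        # step 105: voice/screen in practice

interact("how do I cook braised carp")   # step 101: the received voice input
```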
Example 2
This embodiment is a further improvement on embodiment 1. In this embodiment, the user intentions expressed by the online semantic analysis result are mainly divided into two categories: content-class intentions and control-class intentions. A control-class intention indicates that the user wants to control the terminal device, such as making the terminal device perform some operation (e.g. power on, power off, or another operation depending on the kind of terminal device) or change to some state (e.g. a sleep state, a running state, or another state depending on the kind of terminal device). A content-class intention indicates that the user wants to query some kind of information or obtain feedback of some specific content on the terminal device; for example, if the terminal device is a range hood, a content-class intention may be to query a certain recipe and hold a voice dialogue with the user.
Considering that the analysis of the control-type intent is generally accurate, in order to increase the feedback speed, in the voice interaction method of this embodiment, the semantic error correction of the online semantic analysis result may be selectively performed only for the content-type intent. As shown in fig. 2, the voice interaction method further includes, based on embodiment 1:
step 1031 is inserted between step 103 and step 104: and judging whether the online semantic analysis result belongs to a content class intention or a control class intention, if the online semantic analysis result belongs to the content class intention, executing the step 104, and if the online semantic analysis result belongs to the control class intention, executing the step 106.
Step 106: control the terminal device to execute a control command. The specific control command is directly determined by the online semantic parsing result, such as a command to turn the device on or a command to turn it off.
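A small sketch of this branch (steps 1031 and 104-106) follows; the `intent_class` field and the returned strings are illustrative assumptions, not a format fixed by the patent.

```python
# Illustrative sketch of step 1031's branch between intent classes.

def handle_online_result(result: dict) -> str:
    if result["intent_class"] == "control":
        # Step 106: execute the control command directly; control-class
        # parsing is usually accurate, so no local correction is applied.
        return "execute: " + result["command"]
    # Content-class intent: local semantic error correction (step 104),
    # then feedback (step 105). Correction is sketched later in this section.
    return "feed back after correction: " + result["entity"]

print(handle_online_result({"intent_class": "control", "command": "power_on"}))
print(handle_online_result({"intent_class": "content", "entity": "braised carp"}))
```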
In this embodiment, in order to perform semantic error correction on the content intent and improve the accuracy of online semantic parsing, so that the semantic parsing result is well applicable to the terminal device, the semantic error correction rule may include common entity object information of the terminal device in different voice interaction scenarios. Each voice interaction scene can correspond to a specific user intention, and the common entity object information can comprise entity objects which are used more frequently in the specific user intention.
Correspondingly, step 104 may specifically include:
step 1041: verifying whether the original entity object information of the online semantic analysis result is contained in the common entity object information of the current voice interaction scene, if so, indicating that the online semantic analysis result conforms to the current voice interaction scene, and directly feeding back the online semantic analysis result as a corrected semantic analysis result; if not, go to step 1042.
Step 1042: query, from the semantic error correction rule, the common entity object information of the current voice interaction scene that is closest to the original entity object information, and replace the original entity object information with it, forming the corrected semantic analysis result. In this embodiment, the original entity object information and the common entity object information may be considered closest when their relationship is any one of the following (a matching sketch follows this list):
synonyms;
near-synonyms;
homophones;
fuzzy-matched words whose similarity exceeds a preset similarity threshold.
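The following sketch shows one way the four "closest" relations could be tested, assuming the synonym, near-synonym, and homophone relations are stored as lookup tables and using `difflib.SequenceMatcher` as a stand-in similarity measure for fuzzy matching; the patent specifies none of these representations, and the Chinese word pairs are illustrative.

```python
import difflib

# Assumed lookup tables; the patent does not say how synonyms, near-synonyms,
# or homophones are stored on the device.
SYNONYMS      = {("大白菜", "白菜")}
NEAR_SYNONYMS = {("凤梨", "菠萝")}
HOMOPHONES    = {("利", "鲤鱼")}

def is_closest(original: str, candidate: str, threshold: float = 0.8) -> bool:
    """Step 1042's 'closest' test: synonym, near-synonym, homophone,
    or fuzzy-match similarity above a preset threshold."""
    pair = (original, candidate)
    if pair in SYNONYMS or pair in NEAR_SYNONYMS or pair in HOMOPHONES:
        return True
    # Fuzzy matching with an assumed character-level similarity measure.
    return difflib.SequenceMatcher(None, original, candidate).ratio() >= threshold

def correct_entity(original: str, common_entities: list) -> str:
    """Steps 1041-1042: keep the entity if it is already a common entity
    object of the scene, otherwise replace it with the closest common one."""
    if original in common_entities:
        return original
    for candidate in common_entities:
        if is_closest(original, candidate):
            return candidate
    return original  # no close match found

# '利' is not a common entity, but it is homophonic with '鲤鱼' (carp).
print(correct_entity("利", ["鲤鱼", "鲫鱼", "白菜"]))  # -> 鲤鱼
```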
Taking the terminal device as a range hood as an example, the semantic error correction process in step 104 in this embodiment is specifically described below:
since the range hood relates to cooking, one voice interaction scenario in the semantic error correction rule may be a recipe query scenario, and common entity object information thereof may include: vegetable names such as Chinese cabbage, and potato, fish names such as carp, crucian, and weever, and seafood names such as crab, shrimp, and clam.
The user gives a voice input, and after online semantic parsing, the obtained online semantic analysis result is: the user wants to query how to cook braised "benefit" (利, a Chinese word homophonic with the word for carp).
In the recipe query scenario, the term "benefit" does not exist in the common entity object information, and the closest term is "carp", which sounds the same as "benefit"; the corrected semantic analysis result should therefore be: the user wants to query how to cook braised carp in brown sauce.
The range hood can then broadcast the recipe for braised carp in brown sauce to the user by voice, or display it on the screen.
Similarly, "大白菜" and "白菜" are synonymous Chinese names for Chinese cabbage; in the recipe query scenario, if the online semantic analysis result contains the name that is not in the common entity object information, the closest common entity object information is the other name, and it can be substituted.
"凤梨" and "菠萝" are near-synonymous Chinese names for pineapple; in the recipe query scenario, if "凤梨" appears in the online semantic analysis result, the closest common entity object information is "菠萝", and it can be substituted.
In the recipe query scenario, if the online semantic analysis result contains "one spoon of salt", the fuzzy-matched word with the highest similarity is "5 g of salt", and it can be substituted.
In this embodiment, in order to speed up content feedback, scene state data may be pre-stored locally on the device. The scene state data may include historical dialogue flows distinguished according to different voice interaction scenes, and content data associated with those historical dialogue flows. In this embodiment, the scene state data may be built into a memory of the terminal device, a cache of a processor, or the native system of the terminal device before step 105, even before the terminal device leaves the factory. The scene state data can be updated periodically through a background service when the terminal device is networked, or according to user requirements.
Step 105 may specifically include:
step 1051: and judging whether the corrected semantic analysis result hits the scene state data, if so, executing a step 1052, and if not, executing a step 1053. The hit of the scene state data may be a word in a dialog flow that hits a voice interaction scene in the scene state data.
Step 1052: and extracting the hit content data from the terminal equipment as feedback of the corrected semantic parsing result. The extracted content data may be content data associated with the hit dialog stream or content data associated with a hit dialog.
Step 1053: and feeding back through network search.
In this embodiment, the scene state data may be cached in the form of a knowledge graph: keywords extracted from the dialogue flows serve as nodes in the knowledge graph, and associated nodes are connected with each other.
Several historical dialogue flows are given below:
a. The user wants to eat braised crucian carp; the corresponding dialogue flow is:
"How can crucian carp be cooked?" ⇒ returns: braised in brown sauce, fried, stewed
"Braised in brown sauce, then." ⇒ returns: recipe of braised crucian carp (containing the corresponding main ingredients, auxiliary ingredients, steps, and other information)
b. The user wants to eat crucian carp and tofu soup; the corresponding dialogue flow is:
"How to make crucian carp and tofu soup?" ⇒ returns: recipe of crucian carp and tofu soup (containing the corresponding main ingredients, auxiliary ingredients, steps, and other information)
c. The user wants a dish combining shallots and tofu; the corresponding dialogue flow is:
"I have shallots and tofu; what can I make?" ⇒ returns: recipe of shallot-mixed tofu (containing the corresponding main ingredients, auxiliary ingredients, steps, and other information)
Based on these three dialogue flows, the knowledge graph shown in fig. 3 can be generated, in which keywords such as crucian carp, braised in brown sauce, fried, stewed, shallot, and tofu serve as nodes, and the recipe of braised crucian carp, the recipe of crucian carp and tofu soup, and the recipe of shallot-mixed tofu are cached as associated content data.
After this is completed, suppose the user inputs the voice "What can I make with tofu?". The two recipes "shallot-mixed tofu" and "crucian carp and tofu soup" can then be matched locally at the terminal device, extracted from the cache, and fed back to the user directly.
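The following sketch shows one way such a keyword graph and its cached content could be represented and queried, using plain adjacency sets. The node and content layout is an assumption; the patent only states that keywords are nodes and that associated nodes are connected.

```python
from collections import defaultdict

# Assumed representation: keyword <-> keyword associations, plus the content
# data (recipes) cached at the keyword each recipe was returned for.
edges = defaultdict(set)
content = defaultdict(set)

def connect(a: str, b: str) -> None:
    edges[a].add(b)
    edges[b].add(a)

# Dialogue flow a: crucian carp -> braised in brown sauce / fried / stewed.
for method in ("braised in brown sauce", "fried", "stewed"):
    connect("crucian carp", method)
content["braised in brown sauce"].add("recipe: braised crucian carp")

# Dialogue flow b: crucian carp and tofu soup.
connect("crucian carp", "tofu")
content["tofu"].add("recipe: crucian carp and tofu soup")

# Dialogue flow c: shallot-mixed tofu.
connect("shallot", "tofu")
content["tofu"].add("recipe: shallot-mixed tofu")

def query(keyword: str) -> set:
    """Steps 1051-1052: on a hit, gather the content data cached at the
    keyword node and at its directly associated nodes."""
    hits = set(content[keyword])
    for neighbour in edges[keyword]:
        hits |= content[neighbour]
    return hits  # an empty set would mean step 1053: fall back to web search

# "What can I make with tofu?" hits the tofu node and returns both recipes.
print(query("tofu"))
```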
The voice interaction method of this embodiment further refines the specific processes of semantic error correction and content feedback in the terminal device: through the pre-stored semantic error correction rules it realizes local semantic error correction that conforms to the voice interaction scene of the terminal device, and by extracting content feedback locally it speeds up the feedback.
Example 3
The voice interaction method of this embodiment can further realize local semantic parsing on the terminal device. To speed up voice interaction, local semantic parsing can be used preferentially for voice inputs that the terminal device is able to parse, and online semantic parsing is used when a voice input exceeds the range the terminal device can parse. As shown in fig. 4, the voice interaction method may further include:
step 1011 is inserted between step 101 and step 102: judging whether the voice input meets the local analysis condition of the terminal equipment: if yes, go to step 107, otherwise go to step 102. In this embodiment, the local parsing condition may include that the voice input hits a local command word pre-stored in the terminal device. Each local command word may represent a user intent. The user model pre-stored by the terminal equipment comprises different expression forms of the local command words; for compatibility with more common user utterances, the models may be added incrementally. When some expressions of the user are identified to be used frequently in some regions, the expressions can be added to the user model through local training and then cloud publishing. Taking the local command word as an example for starting the range hood, the different expression forms can also be 'opening the range hood', 'starting the range hood', and the like, and certainly, the expression forms can also be different languages. And when the voice input hits any expression form in the user model, judging that the local command word corresponding to the voice input hit. Certainly, the user model may be updated periodically or according to user requirements, and specifically, the user model may be updated according to the online semantic parsing result, where the updating includes at least one of adding, deleting, and modifying a local command word, and adding, deleting, and modifying a linguistic expression of the local command word.
Step 107: perform local semantic parsing on the voice input at the terminal device to obtain a local semantic analysis result. The user intention can be understood through the local semantic analysis result. Like the online semantic analysis result in embodiment 2, the user intentions expressed by the local semantic analysis result can also be divided into two main categories, content-class intentions and control-class intentions, which are not described again here.
To ensure the accuracy of local semantic parsing, the same local error correction applied to the online semantic analysis result may be applied here: according to the semantic error correction rule, semantic error correction is performed on a local semantic analysis result belonging to a content-class intention to obtain a corrected semantic analysis result, and step 105 is then executed. For the semantic error correction rule and the local error correction process, refer to embodiment 2; they are not described again here.
The voice interaction method of the embodiment adds a local semantic parsing function, compared with online semantic parsing, the local semantic parsing has a higher speed, and the local semantic parsing function can be used independently without networking.
Example 4
The embodiment provides a terminal device. The terminal equipment can realize human-computer interaction with a user. The terminal device can be any device, including but not limited to smart home devices, and particularly, smart kitchen electrical devices (such as range hoods and kitchen ranges). The terminal device may have a software/hardware structure for implementing its original functions, and may include, as shown in fig. 5: a local storage module 201, a voice input module 202, a voice transmission module 203, a semantic error correction module 204 and a result feedback module 205.
The local storage module 201 is configured to pre-store semantic error correction rules. The semantic error correction rules can be formulated according to the functional characteristics and application scenarios of the terminal device; they help to understand the user intention correctly, can provide a set of common parsing results and a set of common words (or common characters) applicable to the terminal device, and can cover multiple languages. The local storage module 201 may be a memory of the terminal device, a cache of a processor, or a storage space in the native system of the terminal device. The semantic error correction rules can be updated periodically through a background service when the terminal device is networked, or according to user requirements.
The voice input module 202 is used for receiving voice input and may include a microphone array. The language of the voice input is not limited in this embodiment; it may be Chinese, English, Japanese, German, French, or another language.
The voice transmission module 203 is configured to upload the voice input to an analysis server for online semantic analysis, and receive an online semantic analysis result fed back by the analysis server. The terminal device and the analysis server can be connected through a network, so that data transmission is realized. The parsing server can be a cloud server or any other server with a voice parsing function, and can perform semantic parsing on the voice input by adopting the existing known voice recognition and voice parsing technologies, so as to generate an online semantic parsing result, and feed the online semantic parsing result back to the voice transmission module 203. The user intention can be understood through the online semantic parsing result.
The semantic error correction module 204 is configured to perform semantic error correction on the online semantic analysis result locally at the terminal device according to the semantic error correction rule, so as to obtain a corrected semantic analysis result. The semantic error correction may include correcting a part of the online semantic analysis result that does not comply with the semantic error correction rule to obtain a semantic analysis result that complies with the functional characteristics of the terminal device and the application scenario.
The result feedback module 205 is configured to feed back the corrected semantic parsing result. The feedback in this embodiment may have various forms, such as operating the terminal device according to the result of the semantic analysis, changing the state of the terminal device, and responding to the voice input. In addition, the corrected semantic analysis result is directly output in a form of voice playing or a form of text display (in the case that the terminal device has a display screen), which can also be used as a feedback of the embodiment.
The terminal device of this embodiment can use the semantic error correction rule to perform local semantic error correction on the online semantic analysis result, so that the semantic analysis result is applicable to the terminal device, the accuracy of voice parsing is improved, and the terminal device can provide effective feedback. The terminal device of this embodiment can support multiple rounds of human-computer dialogue; the dialogue is not limited to voice exchanges between the terminal device and the user, and may also include feedback in other forms that the terminal device makes in response to the user's voice. After the terminal device gives feedback, the user can continue with further voice input, and the terminal device of this embodiment again parses the semantics and gives feedback, repeating over many rounds.
Example 5
This embodiment is a further improvement on embodiment 4. In this embodiment, the user intentions expressed by the online semantic analysis result are mainly divided into two categories: content-class intentions and control-class intentions. A control-class intention indicates that the user wants to control the terminal device, such as making the terminal device perform some operation (e.g. power on, power off, or another operation depending on the kind of terminal device) or change to some state (e.g. a sleep state, a running state, or another state depending on the kind of terminal device). A content-class intention indicates that the user wants to query some kind of information or obtain feedback of some specific content on the terminal device; for example, if the terminal device is a range hood, a content-class intention may be to query a certain recipe and hold a voice dialogue with the user.
Considering that the parsing of control-class intentions is usually accurate, in order to increase the feedback speed, the terminal device of this embodiment may selectively perform semantic error correction on the online semantic analysis result only for content-class intentions. As shown in fig. 6, the terminal device may further include a device control module 206. When the online semantic analysis result belongs to a content-class intention, the semantic error correction module 204 is invoked; when it belongs to a control-class intention, the device control module 206 is invoked. The device control module 206 is configured to control the device to execute a control command when the online semantic analysis result belongs to a control-class intention.
In this embodiment, in order to perform semantic error correction on the content intent and improve the accuracy of online semantic parsing, so that the semantic parsing result is well applicable to the terminal device, the semantic error correction rule may include common entity object information of the terminal device in different voice interaction scenarios. Each voice interaction scene can correspond to a specific user intention, and the common entity object information can comprise entity objects which are used more frequently in the specific user intention.
When the online semantic analysis result belongs to a content-class intention, the semantic error correction module 204 is specifically configured to verify whether the original entity object information in the online semantic analysis result or the local semantic analysis result is included in the common entity object information of the current voice interaction scene, and if not, to query from the semantic error correction rule the common entity object information of the current voice interaction scene that is closest to the original entity object information and replace the original entity object information with it, forming the corrected semantic analysis result. In this embodiment, the original entity object information and the common entity object information may be considered closest when their relationship is any one of the following:
synonyms;
near-synonyms;
homophones;
fuzzy-matched words whose similarity exceeds a preset similarity threshold.
In this embodiment, in order to speed up content feedback, the local storage module 201 is further configured to pre-store scene state data, where the scene state data includes historical dialogue flows distinguished according to different voice interaction scenes and content data associated with the historical dialogue flows. The scene state data can be updated periodically through a background service when the terminal device is networked, or according to user requirements.
When the online semantic analysis result belongs to a content-class intention, the result feedback module 205 is specifically configured to judge whether the corrected semantic analysis result hits the scene state data; if yes, the hit content data is extracted from the local storage module 201 as feedback for the corrected semantic analysis result, and if not, feedback is given through a network search. Hitting the scene state data may mean hitting a word in a dialogue flow of some voice interaction scene in the scene state data. The extracted content data may be the content data associated with the hit dialogue flow, or the content data associated with a hit dialogue.
In this embodiment, the scene state data is cached in a knowledge graph manner, the keywords extracted from the dialog flow are used as nodes in the knowledge graph, and the associated nodes are connected with each other.
The terminal device of this embodiment further refines the specific functions of semantic error correction and content feedback in the terminal device: through the pre-stored semantic error correction rules it realizes local semantic error correction that conforms to the voice interaction scene of the terminal device, and by extracting content feedback locally it speeds up the feedback.
Example 6
The terminal device of this embodiment may further implement local semantic parsing of the terminal device, and in order to accelerate the voice interaction speed, local semantic parsing may be preferentially adopted for the voice input that can be parsed by the terminal device, and if the voice input exceeds the range that can be parsed by the terminal device, online semantic parsing is adopted. As shown in fig. 7, the terminal device may further include: a parsing judgment module 207, a local parsing module 208 and a model update module 209.
The parsing determining module 207 is configured to determine whether the voice input meets a local parsing condition of the terminal device, if not, invoke the voice transmission module 203, and if so, invoke the local parsing module 208.
The local parsing module 208 is configured to perform local semantic parsing on the voice input at the terminal device when the voice input meets the local parsing condition, so as to obtain a local semantic parsing result;
the semantic error correction module 204 is further configured to perform semantic error correction on the local semantic analysis result locally at the terminal device according to the semantic error correction rule, so as to obtain the corrected semantic analysis result.
Specifically, the local storage module 201 may be further configured to pre-store a local command word and a user model, where the user model includes different expression forms of the local command word, and the local parsing condition includes that the voice input hits the local command word. The parsing and determining module 207 is specifically configured to determine that the voice input hits a corresponding local command word when the voice input hits any one of the expression forms in the user model.
The model updating module 209 is configured to update the user model according to the online semantic parsing result, where the updating includes at least one of adding, deleting, and modifying a local command word, and adding, deleting, and modifying a linguistic expression of the local command word.
The terminal equipment of the embodiment is added with the local semantic parsing function, compared with the on-line semantic parsing, the local semantic parsing has higher speed, and the local semantic parsing function can be independently used under the condition of no networking.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (20)

1. A voice interaction method of a terminal device is characterized by comprising the following steps:
receiving a voice input;
uploading the voice input to an analysis server for on-line semantic analysis;
receiving an online semantic parsing result fed back by the parsing server;
according to a semantic error correction rule prestored in the terminal equipment, performing semantic error correction on the online semantic analysis result locally at the terminal equipment to obtain a corrected semantic analysis result;
feeding back the corrected semantic analysis result;
the semantic error correction rule comprises common entity object information of the terminal equipment in different voice interaction scenes;
when the online semantic analysis result belongs to the content intention, the step of performing semantic error correction on the online semantic analysis result locally at the terminal equipment comprises the following steps:
verifying whether the original entity object information in the online semantic analysis result is contained in the common entity object information of the current voice interaction scene, if not, inquiring the common entity object information of the current voice interaction scene, which is closest to the original entity object information, from the semantic error correction rule to replace the original entity object information, and forming the corrected semantic analysis result.
2. The voice interaction method of the terminal device according to claim 1, wherein the voice interaction method further comprises:
after receiving voice input and before uploading the voice input to an analysis server for online semantic analysis, judging whether the voice input meets the local analysis condition of the terminal equipment:
if yes, performing local semantic analysis on the voice input at the terminal device to obtain a local semantic analysis result; then, according to the semantic error correction rule, performing semantic error correction on the local semantic analysis result locally at the terminal device to obtain a corrected semantic analysis result; then, feeding back the corrected semantic analysis result;
if not, uploading the voice input to an analysis server for on-line semantic analysis.
3. The voice interaction method of the terminal device according to claim 2, wherein the step of performing semantic error correction on the local semantic analysis result locally at the terminal device when the local semantic analysis result belongs to the content-class intention comprises:
and verifying whether the original entity object information in the local semantic analysis result is contained in the common entity object information of the current voice interaction scene, if not, inquiring the common entity object information of the current voice interaction scene, which is closest to the original entity object information, from the semantic error correction rule to replace the original entity object information, and forming the corrected semantic analysis result.
4. The voice interaction method of a terminal device according to claim 1, wherein the original entity object information and the common entity object information are closest when their relationship is any one of the following:
synonyms;
near-synonyms;
homophones;
fuzzy-matched words whose similarity exceeds a preset similarity threshold.
5. The voice interaction method of the terminal device according to claim 2, wherein the local parsing condition includes that the voice input hits a local command word pre-stored in the terminal device, and the user model pre-stored in the terminal device includes different expression forms of the local command word;
the voice interaction method further comprises the following steps:
when the voice input hits any expression form in the user model, judging that the voice input hits a corresponding local command word;
and updating the user model according to the online semantic parsing result, wherein the updating comprises at least one of adding, deleting and modifying local command words, and adding, deleting and modifying linguistic expressions of the local command words.
6. The voice interaction method of the terminal device according to claim 1, wherein the voice interaction method further comprises:
and when the online semantic analysis result belongs to the control type intention, controlling the terminal equipment to execute a control command.
7. The voice interaction method of the terminal device according to claim 2, wherein the voice interaction method further comprises:
and controlling the terminal equipment to execute a control command when the local semantic analysis result belongs to the control type intention.
8. The method for voice interaction of a terminal device according to claim 1, wherein when the online semantic analysis result belongs to a content-class intention, the step of feeding back the corrected semantic analysis result specifically comprises:
judging whether the corrected semantic analysis result hits scene state data pre-stored locally in the terminal equipment, wherein the scene state data comprises historical conversation flows distinguished according to different voice interaction scenes and content data associated with the historical conversation flows:
and if so, extracting hit content data from the terminal equipment as feedback of the corrected semantic analysis result.
9. The voice interaction method of the terminal device according to claim 2, wherein the step of feeding back the corrected semantic analysis result when the local semantic analysis result belongs to the content-class intention specifically comprises:
judging whether the corrected semantic analysis result hits scene state data pre-stored locally in the terminal equipment, wherein the scene state data comprises historical conversation flows distinguished according to different voice interaction scenes and content data associated with the historical conversation flows:
and if so, extracting hit content data from the terminal equipment as feedback of the corrected semantic analysis result.
10. The voice interaction method of a terminal device according to claim 8 or 9, wherein the scene state data is cached in a knowledge graph manner, keywords extracted from the dialog flow are used as nodes in the knowledge graph, and associated nodes are connected with each other.
11. A terminal device, comprising:
the local storage module is used for prestoring semantic error correction rules;
the voice input module is used for receiving voice input;
the voice transmission module is used for uploading the voice input to an analysis server for on-line semantic analysis and receiving an on-line semantic analysis result fed back by the analysis server;
the semantic error correction module is used for carrying out semantic error correction on the online semantic analysis result locally at the terminal equipment according to the semantic error correction rule to obtain a corrected semantic analysis result;
the result feedback module is used for feeding back the corrected semantic analysis result;
the semantic error correction rule comprises common entity object information of the terminal equipment in different voice interaction scenes;
when the online semantic analysis result belongs to the content intention, the semantic error correction module is specifically used for verifying whether the original entity object information in the online semantic analysis result is contained in the common entity object information of the current voice interaction scene, if not, the semantic error correction module queries the common entity object information of the current voice interaction scene, which is closest to the original entity object information, to replace the original entity object information, so as to form the corrected semantic analysis result.
12. The terminal device of claim 11, wherein the terminal device further comprises:
the analysis judging module is used for judging whether the voice input meets the local analysis condition of the terminal equipment; if not, the voice transmission module is called, and if yes, the local analysis module is called:
the local analysis module is used for carrying out local semantic analysis on the voice input at the terminal equipment to obtain a local semantic analysis result when the voice input meets the local analysis condition;
and the semantic error correction module is also used for performing semantic error correction on the local semantic analysis result locally on the terminal equipment according to the semantic error correction rule to obtain the corrected semantic analysis result.
13. The terminal device according to claim 12, wherein when the local semantic analysis result belongs to a content-class intention, the semantic error correction module is specifically configured to verify whether original entity object information in the local semantic analysis result is included in commonly used entity object information of a current voice interaction scene, and if not, query commonly used entity object information of the current voice interaction scene that is closest to the original entity object information from the semantic error correction rule to replace the original entity object information, so as to form the corrected semantic analysis result.
14. The terminal device according to claim 11, wherein the common entity object information is closest to the original entity object information when the relationship between the two includes any one of:
synonyms;
near-synonyms;
homophones;
and fuzzy-matched words whose similarity exceeds a preset similarity threshold.
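One plausible reading of claim 14's "closest" test, with toy synonym and homophone tables and difflib's ratio as the fuzzy-match similarity; a production device would ship curated lexicons (and, for Chinese, pinyin-based homophone matching).

```python
import difflib

# Toy lexicons; real devices would ship curated tables.
SYNONYMS = {("range hood", "extractor hood")}
HOMOPHONES = {("flour", "flower")}
SIM_THRESHOLD = 0.8  # preset similarity threshold for fuzzy matching


def is_closest(original: str, candidate: str) -> bool:
    pair = (original, candidate)
    if pair in SYNONYMS or pair[::-1] in SYNONYMS:
        return True  # synonyms / near-synonyms
    if pair in HOMOPHONES or pair[::-1] in HOMOPHONES:
        return True  # homophones
    ratio = difflib.SequenceMatcher(None, original, candidate).ratio()
    return ratio >= SIM_THRESHOLD  # fuzzy-matched words over the threshold


print(is_closest("flower", "flour"))              # True via the homophone table
print(is_closest("steamd fish", "steamed fish"))  # True via fuzzy matching
```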
15. The terminal device according to claim 12, wherein the local storage module is further used for pre-storing local command words and a user model, the user model comprises different expression forms of the local command words, and the local analysis condition includes that the voice input hits a local command word;
the analysis judging module is specifically used for judging that the voice input hits the corresponding local command word when the voice input hits any one of the expression forms in the user model;
the terminal device further comprises:
and the model updating module is used for updating the user model according to the online semantic analysis result, wherein the updating comprises at least one of adding, deleting, or modifying local command words and adding, deleting, or modifying language expressions of the local command words.
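Claim 15's user model maps each local command word to its expression forms and grows from online parsing results. A minimal sketch, with hypothetical names throughout:

```python
from __future__ import annotations


class UserModel:
    """Maps each local command word to the expression forms that hit it."""

    def __init__(self) -> None:
        self.expressions: dict[str, set[str]] = {}

    def add_command(self, word: str, forms: set[str] | None = None) -> None:
        self.expressions[word] = {word} | (forms or set())

    def match(self, utterance: str) -> str | None:
        """Hitting any expression form counts as hitting the command word."""
        for word, forms in self.expressions.items():
            if utterance in forms:
                return word
        return None

    def update_from_online_result(self, word: str, new_form: str) -> None:
        """Add an expression form learned from the online analysis result."""
        self.expressions.setdefault(word, {word}).add(new_form)


model = UserModel()
model.add_command("turn on the hood", {"start the hood", "hood on"})
print(model.match("hood on"))  # -> "turn on the hood"
model.update_from_online_result("turn on the hood", "fire up the extractor")
print(model.match("fire up the extractor"))  # now also hits
```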
16. The terminal device of claim 11, wherein the terminal device further comprises:
and the equipment control module is used for controlling the terminal equipment to execute a control command when the online semantic analysis result belongs to a control-type intention.
17. The terminal device of claim 12, wherein the terminal device further comprises:
and the equipment control module is used for controlling the terminal equipment to execute a control command when the local semantic analysis result belongs to a control-type intention.
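Claims 16 and 17 split handling by intention type: control-type results drive the device directly, while content-type results go to the feedback path. A sketch of that dispatch, with illustrative labels and function names:

```python
def execute_control_command(command: str) -> str:
    return f"device executed: {command}"  # equipment control module


def build_content_feedback(result: dict) -> str:
    return f"feedback: {result.get('answer', 'no cached answer')}"  # result feedback module


def handle_result(result: dict) -> str:
    """Dispatch a (corrected) semantic analysis result by intention type."""
    if result["intent_type"] == "control":
        return execute_control_command(result["command"])
    return build_content_feedback(result)


print(handle_result({"intent_type": "control", "command": "hood on"}))
print(handle_result({"intent_type": "content", "answer": "Steam for 8 minutes."}))
```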
18. The terminal device of claim 11, wherein the local storage module is further used for pre-storing scene state data, the scene state data comprising historical conversation flows distinguished according to different voice interaction scenes and content data associated with the historical conversation flows;
and when the online semantic analysis result belongs to a content-type intention, the result feedback module is specifically used for judging whether the corrected semantic analysis result hits the scene state data, and if so, extracting the hit content data from the local storage module as feedback of the corrected semantic analysis result.
19. The terminal device of claim 12, wherein the local storage module is further used for pre-storing scene state data, the scene state data comprising historical conversation flows distinguished according to different voice interaction scenes and content data associated with the historical conversation flows;
and when the local semantic analysis result belongs to a content-type intention, the result feedback module is specifically used for judging whether the corrected semantic analysis result hits the scene state data, and if so, extracting the hit content data from the local storage module as feedback of the corrected semantic analysis result.
20. The terminal device according to claim 18 or 19, wherein the scene state data is cached as a knowledge graph, keywords extracted from the conversation flows are used as nodes in the knowledge graph, and associated nodes are connected to each other.
CN201910655031.3A 2019-07-19 2019-07-19 Terminal equipment and voice interaction method thereof Active CN110211577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910655031.3A CN110211577B (en) 2019-07-19 2019-07-19 Terminal equipment and voice interaction method thereof

Publications (2)

Publication Number Publication Date
CN110211577A CN110211577A (en) 2019-09-06
CN110211577B true CN110211577B (en) 2021-06-04

Family

ID=67797917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910655031.3A Active CN110211577B (en) 2019-07-19 2019-07-19 Terminal equipment and voice interaction method thereof

Country Status (1)

Country Link
CN (1) CN110211577B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956958A (en) * 2019-12-04 2020-04-03 深圳追一科技有限公司 Searching method, searching device, terminal equipment and storage medium
CN111222322B (en) * 2019-12-31 2022-10-25 联想(北京)有限公司 Information processing method and electronic device
CN111554281B (en) * 2020-03-12 2023-11-07 厦门中云创电子科技有限公司 Vehicle-mounted man-machine interaction method for automatically identifying languages, vehicle-mounted terminal and storage medium
CN113768387B (en) * 2020-06-09 2022-12-16 珠海优特智厨科技有限公司 Batching method, batching device, storage medium and computing equipment
CN113763944A (en) * 2020-09-29 2021-12-07 浙江思考者科技有限公司 AI video cloud interactive system based on simulation person logic knowledge base
CN113190663A (en) * 2021-04-22 2021-07-30 宁波弘泰水利信息科技有限公司 Intelligent interaction method and device applied to water conservancy scene, storage medium and computer equipment

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1559104A1 (en) * 2002-11-01 2005-08-03 Synchro Arts Limited Methods and apparatus for use in sound replacement with automatic synchronization to images
US8935166B2 (en) * 2011-08-19 2015-01-13 Dolbey & Company, Inc. Systems and methods for providing an electronic dictation interface
CN103594085A (en) * 2012-08-16 2014-02-19 百度在线网络技术(北京)有限公司 Method and system providing speech recognition result
CN103413549A (en) * 2013-07-31 2013-11-27 深圳创维-Rgb电子有限公司 Voice interaction method and system and interaction terminal
CN103944983A (en) * 2014-04-14 2014-07-23 美的集团股份有限公司 Error correction method and system for voice control instruction
CN104978964A (en) * 2014-04-14 2015-10-14 美的集团股份有限公司 Voice control instruction error correction method and system
CN106057205A (en) * 2016-05-06 2016-10-26 北京云迹科技有限公司 Intelligent robot automatic voice interaction method
CN106534548A (en) * 2016-11-17 2017-03-22 科大讯飞股份有限公司 Voice error correction method and device
CN106992009A (en) * 2017-05-03 2017-07-28 深圳车盒子科技有限公司 Vehicle-mounted voice exchange method, system and computer-readable recording medium
CN107195303A (en) * 2017-06-16 2017-09-22 北京云知声信息技术有限公司 Method of speech processing and device
CN107688614A (en) * 2017-08-04 2018-02-13 平安科技(深圳)有限公司 Intention acquisition method, electronic device and computer-readable storage medium
CN109065054A (en) * 2018-08-31 2018-12-21 出门问问信息科技有限公司 Speech recognition error correction method and device, electronic device, and readable storage medium
CN109410927A (en) * 2018-11-29 2019-03-01 北京蓦然认知科技有限公司 Speech recognition method, device and system combining offline command word parsing with cloud parsing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"speech recognition error correction by using combinational measures";Baoxiang Li;《IEEE international conference on network infrastructure and digital content》;20121231;全文 *
"一种基于语义分析的汉语语音识别纠错方法";韦向峰;《计算机科学》;20061231;全文 *

Also Published As

Publication number Publication date
CN110211577A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110211577B (en) Terminal equipment and voice interaction method thereof
US10515155B2 (en) Conversational agent
TWI677796B (en) Semantic extraction method and device of natural language and computer storage medium
US10217463B2 (en) Hybridized client-server speech recognition
US9378740B1 (en) Command suggestions during automatic speech recognition
CA2785081C (en) Method and system for processing multiple speech recognition results from a single utterance
WO2021068352A1 (en) Automatic construction method and apparatus for faq question-answer pair, and computer device and storage medium
CN101149757B (en) Method for accomplishing scene style word input
EP3520335B1 (en) Control system using scoped search and conversational interface
CN107507616B (en) Method and device for setting gateway scene
US10713302B2 (en) Search processing method and device
CN110968245B (en) Operation method for controlling office software through voice
JP2017107078A (en) Voice interactive method, voice interactive device, and voice interactive program
CN105653673B (en) Information search method and device
CN102135814A (en) Word input method and system
CN105391730A (en) Information feedback method, device and system
CN107844470B (en) Voice data processing method and equipment thereof
WO2010124512A1 (en) Human-machine interaction system and related system, device and method thereof
WO2020119541A1 (en) Voice data identification method, apparatus and system
US8422648B2 (en) Provision of text messaging services
EP3525107A1 (en) Conversational agent
US20220022289A1 (en) Method and electronic device for providing audio recipe and cooking configuration
CN105323392A (en) Method and apparatus for quickly entering IVR menu
CN109359298A (en) Emoticon recommendation method, system and electronic device
WO2021042902A1 (en) User intention identification method in multi-round dialogue and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant