CN111062200B - Utterance generalization method, utterance recognition method and apparatus, and electronic device - Google Patents


Info

Publication number
CN111062200B
CN111062200B CN201911288675.XA
Authority
CN
China
Prior art keywords
word
intention
relation
target
generalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911288675.XA
Other languages
Chinese (zh)
Other versions
CN111062200A (en
Inventor
游程
苏少炜
陈孝良
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN201911288675.XA priority Critical patent/CN111062200B/en
Publication of CN111062200A publication Critical patent/CN111062200A/en
Application granted granted Critical
Publication of CN111062200B publication Critical patent/CN111062200B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides an utterance generalization method, an utterance recognition method and apparatus, and an electronic device. The method includes: acquiring N first utterances under an intention, where N is a positive integer; performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determining the generalization result of the intention according to the N dependency syntax analysis results. Because the generalization result of the intention is determined from the dependency syntax analysis results of the N first utterances, the generalization effect can be improved. In addition, since the generalization result of the intention is obtained through dependency syntax analysis, automatic generalization is achieved.

Description

Utterance generalization method, utterance recognition method and apparatus, and electronic device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an utterance generalization method, an utterance recognition method and apparatus, and an electronic device.
Background
In current automatic generalization technology, a given utterance is generalized by removing stop words: unimportant words such as "please" are defined as stop words and removed from the given utterance, and at run time the stop words in the user's request sentence are likewise removed, achieving a limited degree of generalization.
Because it relies only on stop-word removal, the current automatic generalization technology generalizes poorly.
Disclosure of Invention
Embodiments of the present invention provide an utterance generalization method, an utterance recognition method and apparatus, and an electronic device, to solve the problem that generalization methods in the prior art generalize poorly.
In order to solve the technical problems, the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides an utterance generalization method, including:
acquiring N first utterances under an intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining the generalization result of the intention according to the N dependency syntax analysis results.
In a second aspect, an embodiment of the present invention provides an utterance recognition method, including:
acquiring an input utterance;
performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions;
and performing slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
In a third aspect, an embodiment of the present invention further provides an utterance generalization apparatus, including:
a first acquisition module, configured to acquire N first utterances under an intention, where N is a positive integer;
a second acquisition module, configured to perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and a determining module, configured to determine the generalization result of the intention according to the N dependency syntax analysis results.
In a fourth aspect, an embodiment of the present invention further provides an utterance recognition apparatus, including:
an acquisition module, configured to acquire an input utterance;
a first recognition module, configured to perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions;
and a second recognition module, configured to perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the utterance generalization method according to the first aspect, or implements the steps of the utterance recognition method according to the second aspect.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the utterance generalization method according to the first aspect, or implements the steps of the utterance recognition method according to the second aspect.
In the embodiments of the present invention, N first utterances under an intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and the generalization result of the intention is determined according to the N dependency syntax analysis results. Because the generalization result of the intention is determined from the dependency syntax analysis results of the N first utterances, the generalization effect can be improved. In addition, since the generalization result of the intention is obtained through dependency syntax analysis, automatic generalization is achieved.
Drawings
FIG. 1 is a flowchart of an utterance generalization method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of an utterance recognition method provided by an embodiment of the present invention;
FIG. 3 is a block diagram of an utterance generalization apparatus provided by an embodiment of the present invention;
FIG. 4 is a block diagram of an utterance recognition apparatus provided by an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of an utterance generalization method provided by an embodiment of the present invention. As shown in FIG. 1, this embodiment provides an utterance generalization method, applied to an utterance generalization apparatus, including the following steps:
Step 101, acquiring N first utterances under an intention, where N is a positive integer.
Specifically, one intention may have multiple expressions, so one intention corresponds to N first utterances. The N first utterances may be determined manually. Further, the slots in a first utterance may all be replaced with a specific placeholder word, such as "slot", or with specific words. For example, for the intention of booking an air ticket, the first utterances may be: "I want to book a {time} ticket from {origin} to {destination}"; "Help me reserve a ticket to {destination}"; and so on, where the parts in "{ }" are slots that can be replaced with words from the corresponding slot dictionary. For example, the time slot may be replaced with a word from the time slot's dictionary: if "today" is a word in the time slot's dictionary, the time slot in the first utterance may be replaced with "today". If the place slot's dictionary includes "Beijing" and "Turkey", the origin slot in the first utterance may be replaced with "Beijing" and the destination slot may be replaced with "Turkey".
After the slots in the first utterances are replaced with words from the slot dictionaries, the first utterances read: "I want to book today's ticket from Beijing to Turkey"; "Help me reserve a ticket to Turkey". These replacements are merely examples and do not limit the invention: the time slot in a first utterance may be replaced with a word other than "today" from the time slot's dictionary, and likewise the place slots may be replaced with words other than "Beijing" and "Turkey" from the place slot's dictionary.
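The slot-replacement step above can be sketched as follows; the dictionaries, slot names, and the `fill_slots` helper are illustrative assumptions, not part of the patented method:

```python
import random

# Hypothetical slot dictionaries; real dictionaries would be much larger.
SLOT_DICTS = {
    "time": ["today", "tomorrow"],
    "origin": ["Beijing", "Shanghai"],
    "destination": ["Turkey", "Tokyo"],
}

def fill_slots(template, slot_dicts, rng=random):
    """Replace each {slot} in an utterance template with a word drawn
    at random from that slot's dictionary."""
    utterance = template
    for slot, words in slot_dicts.items():
        placeholder = "{" + slot + "}"
        while placeholder in utterance:
            utterance = utterance.replace(placeholder, rng.choice(words), 1)
    return utterance

filled = fill_slots(
    "I want to book a {time} ticket from {origin} to {destination}", SLOT_DICTS
)
```

Drawing the replacement word at random, as the text suggests, keeps the generalization step from overfitting to any one slot value.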
Step 102, performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results.
Dependency syntax analysis is performed on each of the N first utterances to obtain a dependency syntax analysis result; the N first utterances thus correspond to N dependency syntax analysis results. Before the dependency syntax analysis, the first utterance is segmented into words and part-of-speech analysis is performed on the segmented words; dependency syntax analysis is then performed to obtain the dependency syntax analysis result of the first utterance. Dependency parsing (DP) reveals the syntactic structure of a language unit by analyzing the dependencies between its components. Dependency syntax analysis identifies the grammatical components of a sentence (which here can be understood as an utterance), such as subject, predicate, object, attribute, adverbial and complement, and analyzes the relations between these components.
For example, for the utterance "the apple was eaten by me" (苹果被我吃了), the word segmentation result is: "apple", "by" (被), "me", "eat", and the particle "le" (了); the part of speech of "apple" is n, of "by" is p, of "me" is r, of "eat" is v, and of "le" is u. After dependency syntax analysis, the obtained result is: the relation between "apple" and "eat" is fronted object (FOB), the relation between "by" and "eat" is adverbial (ADV), the relation between "me" and "by" is preposition-object (POB), and the relation between "le" and "eat" is right adjunct (RAD). "Eat" is the core of the whole sentence.
In this step, the N first utterances may be input into a dependency syntax analysis model to obtain the dependency syntax analysis results. The dependency syntax analysis model analyzes the grammatical components of a first utterance, such as subject, predicate, object, attribute, adverbial and complement, and the relations between them, to obtain the dependency syntax analysis result. The dependency syntax analysis model may be trained on a corpus in the same language as the first utterances: if the first utterances are Chinese, the model is trained on a Chinese corpus; if the first utterances are English, it is trained on an English corpus.
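A minimal sketch of what one dependency syntax analysis result might look like as a data structure; the `Arc` type and the hand-built arcs are illustrative assumptions (a real system would obtain them from a trained parser such as LTP for Chinese or spaCy for English):

```python
from collections import namedtuple

# One arc per segmented word: the word, its part-of-speech tag, the head
# word it depends on, and the dependency relation label.
Arc = namedtuple("Arc", ["word", "pos", "head", "relation"])

# Hand-built parse of "the apple was eaten by me", mirroring the
# LTP-style relation labels used in the text (FOB, ADV, POB, RAD).
parse = [
    Arc("apple", "n", "eat", "FOB"),  # fronted object of the verb
    Arc("by",    "p", "eat", "ADV"),  # adverbial (passive marker)
    Arc("me",    "r", "by",  "POB"),  # object of the preposition
    Arc("le",    "u", "eat", "RAD"),  # right adjunct (aspect particle)
    Arc("eat",   "v", None,  "HED"),  # head (core) of the sentence
]

# The core of the sentence is the word carrying the head relation.
head = next(a.word for a in parse if a.relation == "HED")
```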
Step 103, determining the generalization result of the intention according to the N dependency syntax analysis results.
Specifically, the generalization result of the intention is determined from the N dependency syntax analysis results. For example, words in the dependency syntax analysis results may be grouped by designated relation types, the occurrences of each word counted within each group, and a word taken as part of the generalization result of the intention if its occurrence count exceeds a preset threshold.
In this embodiment, N first utterances under an intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and the generalization result of the intention is determined according to the N dependency syntax analysis results. Because the generalization result of the intention is determined from the dependency syntax analysis results of the N first utterances, the generalization effect can be improved. In addition, since the generalization result of the intention is obtained through dependency syntax analysis, automatic generalization is achieved.
In one embodiment of the present application, the slot of each of the N first utterances is a slot candidate word;
step 103, determining the generalization result of the intention according to the N dependency syntax analysis results, includes:
counting, in the N dependency syntax analysis results, the statistical count of each word other than the slot candidate words that belongs to the first relation, and the statistical count of each word that belongs to the second relation;
and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the intention.
Specifically, a slot candidate word may be a specific placeholder word, for example "slot", or a word from the slot's dictionary: for a time slot, the slot candidate word may be a word from the time slot's dictionary; for a place slot, it may be a word from the place slot's dictionary.
The slot of each of the N first utterances is replaced with a word from the slot dictionary corresponding to that slot, so that the slot of each of the N first utterances is a slot candidate word. When replacing a slot, a word may be drawn at random from the slot dictionary corresponding to the slot and used as the replacement.
After the dependency syntax analysis result of a first utterance is obtained, the words belonging to the first relation and the words belonging to the second relation are extracted from it. The first relation may be the core (head) relation, the verb-object relation, and so on; the second relation may be the object, the indirect object, the fronted object, and so on; neither is limited here. Slot candidate words are excluded from the words counted in the first relation and the second relation.
For each of the N first utterances, the words belonging to the first relation and the words belonging to the second relation are extracted from its dependency syntax analysis result. The extracted words are then counted, and the words in the first relation and the second relation whose statistical counts are greater than N/2 are determined as the generalization result of the intention.
In this embodiment, by counting the statistical count of each word other than the slot candidate words that belongs to the first relation in the N dependency syntax analysis results and the statistical count of each word that belongs to the second relation, and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the intention, utterance generalization is achieved, and a good generalization effect can be obtained with only a small amount of corpus (i.e., the first utterances).
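The counting and N/2-thresholding described above can be sketched as follows; the relation labels, the toy parses, and the `generalize` helper are assumptions for illustration:

```python
from collections import Counter

def generalize(parse_results, first_rels, second_rels, slot_words, n):
    """Return words (excluding slot candidate words) whose count under the
    first-relation types, or under the second-relation types, exceeds n/2."""
    first_counts, second_counts = Counter(), Counter()
    for arcs in parse_results:  # one list of (word, relation) pairs per utterance
        for word, rel in arcs:
            if word in slot_words:
                continue  # slot candidate words never enter the counts
            if rel in first_rels:
                first_counts[word] += 1
            elif rel in second_rels:
                second_counts[word] += 1
    first_keep = {w for w, c in first_counts.items() if c > n / 2}
    second_keep = {w for w, c in second_counts.items() if c > n / 2}
    return first_keep | second_keep

# Three parsed first utterances for a ticket-booking intention (toy data).
parses = [
    [("book", "HED"), ("ticket", "VOB"), ("today", "ADV")],
    [("book", "HED"), ("ticket", "VOB")],
    [("order", "HED"), ("ticket", "VOB")],
]
result = generalize(parses, {"HED"}, {"VOB"}, {"today", "Beijing"}, n=3)
```

Here "book" (2 of 3 parses) and "ticket" (3 of 3) clear the N/2 = 1.5 threshold, while "order" (1 of 3) does not, so the generalization result keeps only the stable core words of the intention.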
In one embodiment of the present application, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of appearances of words in the first relation whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of appearances of words in the second relation whose semantic similarity to the second word is greater than a second threshold.
Specifically, the semantic similarity between words in the first relation is calculated; if the semantic similarity between two words exceeds the first threshold, the statistical count of each word participating in the calculation may be: the number of times that word appears in the first relation, plus the number of times the other participating word appears in the first relation. That is, the statistical count of the first word is the number of times the first word appears in the first relation plus its semantic count, where the semantic count of the first word is the number of appearances in the first relation of words whose semantic similarity to the first word is greater than the first threshold, the first relation including the first word.
Similarly, the words in the second relation are processed in the same way: the second relation includes the second word, and the statistical count of the second word is the number of times the second word appears in the second relation plus its semantic count, where the semantic count of the second word is the number of appearances in the second relation of words whose semantic similarity to the second word is greater than the second threshold. The first threshold and the second threshold may be set according to the actual situation and are not limited here.
A first word may belong to the first relation in the dependency syntax analysis result of only one first utterance, or in the results of several first utterances; it may therefore appear one or more times in the first relation, which gives the number of times the first word appears in the first relation. Likewise, a second word may belong to the second relation in the dependency syntax analysis result of only one first utterance, or in the results of several first utterances; it may therefore appear one or more times in the second relation, which gives the number of times the second word appears in the second relation.
In this embodiment, when determining the statistical count of the first word, not only are the first word's own appearances in the first relation counted, but the appearances of words whose semantic similarity to it exceeds the first threshold are also added. This strengthens the generalization power of the first word, achieves similarity-based semantic generalization, and enhances the generalization effect.
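The boosted statistical count can be sketched as follows; the `boosted_count` helper and the toy similarity table are illustrative assumptions (in practice the similarity would come from, e.g., word-embedding cosine scores):

```python
def boosted_count(word, relation_words, similarity, threshold):
    """Statistical count of `word` within one relation: its own appearances
    plus the appearances of other words whose semantic similarity to it
    exceeds the threshold."""
    own = relation_words.count(word)
    boost = sum(1 for w in relation_words
                if w != word and similarity(word, w) > threshold)
    return own + boost

# Toy symmetric similarity table standing in for embedding cosine scores.
SIM = {frozenset(p): s for p, s in [
    (("book", "order"), 0.9),
    (("book", "reserve"), 0.85),
    (("book", "cancel"), 0.1),
]}
sim = lambda a, b: SIM.get(frozenset((a, b)), 0.0)

# Words extracted under the first relation across the N parses.
words_in_first_relation = ["book", "order", "reserve", "cancel"]
count = boosted_count("book", words_in_first_relation, sim, threshold=0.8)
```

With these toy scores, "book" counts its one own appearance plus the appearances of "order" and "reserve" (both above the 0.8 threshold), giving a statistical count of 3, which makes it more likely to clear the N/2 bar.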
Referring to FIG. 2, FIG. 2 is a flowchart of an utterance recognition method provided by an embodiment of the present invention. As shown in FIG. 2, this embodiment provides an utterance recognition method, applied to an utterance recognition apparatus, including the following steps:
Step 201, acquiring an input utterance.
The input utterance may be a query sentence entered by the user.
Step 202, performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions.
The intention model includes the generalization results of a plurality of intentions, and the generalization result of at least one of these intentions is obtained by the utterance generalization method of the embodiment shown in FIG. 1. Intention recognition is performed on the input utterance through the intention model to obtain the target intention of the input utterance.
Step 203, performing slot recognition on the input utterance according to the target intention to obtain a target slot.
According to the target intention, a corresponding slot recognition module performs slot recognition on the input utterance to obtain the target slot; the recognition result of the input utterance includes the target intention and the target slot.
In the utterance recognition method above, an input utterance is acquired; intention recognition is performed on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions; and slot recognition is performed on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot. Because the target intention of the input utterance is identified through the intention model, the target slot is then identified according to the target intention, and the target intention and the target slot together serve as the recognition result, the recognition accuracy for the input utterance can be improved.
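The two-stage flow of steps 201–203 can be sketched as follows; the overlap-based intent scorer and the dictionary-based slot extractor are simplifications assumed for illustration, not the patented implementation:

```python
def recognize(utterance, intent_model, slot_extractors):
    """Two-stage recognition: pick the target intention first, then run
    that intention's slot extractor. `intent_model` maps each intention to
    its generalization result (a set of words); `slot_extractors` maps an
    intention to a slot-recognition function."""
    tokens = set(utterance.split())  # toy tokenizer; Chinese would need real segmentation
    # Score each intention by its overlap with the input utterance.
    target_intent = max(intent_model,
                        key=lambda i: len(intent_model[i] & tokens))
    target_slots = slot_extractors[target_intent](utterance)
    return {"intent": target_intent, "slots": target_slots}

# Toy intention model built from generalization results.
intent_model = {
    "book_ticket": {"book", "ticket", "reserve"},
    "weather": {"weather", "rain", "sunny"},
}
known_places = {"Beijing", "Turkey"}
slot_extractors = {
    "book_ticket": lambda u: {"destination": w for w in u.split() if w in known_places},
    "weather": lambda u: {},
}

result = recognize("book a ticket to Turkey", intent_model, slot_extractors)
```

Selecting the slot extractor only after the intention is fixed mirrors the text's point: the target intention constrains which slots are even searched for.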
In one embodiment of the present application, the plurality of intentions includes a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining the generalization result of the first intention according to the N dependency syntax analysis results.
Specifically, the first intention may have multiple expressions, and the first intention may correspond to N first utterances. The N first utterances may be determined manually. Further, the slots in a first utterance may be replaced with specific words. For example, for a first intention of booking an air ticket, the first utterances may be: "I want to book a {time} ticket from {origin} to {destination}"; "Help me reserve a ticket to {destination}"; and so on, where the parts in "{ }" are slots that can be replaced with words from the corresponding slot dictionary. For example, the time slot may be replaced with a word from the time slot's dictionary: if "today" is a word in the time slot's dictionary, the time slot in the first utterance may be replaced with "today". If the place slot's dictionary includes "Beijing" and "Turkey", the origin slot in the first utterance may be replaced with "Beijing" and the destination slot may be replaced with "Turkey".
After the slots in the first utterances are replaced with words from the slot dictionaries, the first utterances read: "I want to book today's ticket from Beijing to Turkey"; "Help me reserve a ticket to Turkey". These replacements are merely examples and do not limit the invention: the time slot in a first utterance may be replaced with a word other than "today" from the time slot's dictionary, and likewise the place slots may be replaced with words other than "Beijing" and "Turkey" from the place slot's dictionary.
Dependency syntax analysis is performed on each of the N first utterances to obtain a dependency syntax analysis result; the N first utterances thus correspond to N dependency syntax analysis results. Before the dependency syntax analysis, the first utterance is segmented into words and part-of-speech analysis is performed on the segmented words; dependency syntax analysis is then performed to obtain the dependency syntax analysis result of the first utterance. Dependency parsing (DP) reveals the syntactic structure of a language unit by analyzing the dependencies between its components. Dependency syntax analysis identifies the grammatical components of a sentence (which here can be understood as an utterance), such as subject, predicate, object, attribute, adverbial and complement, and analyzes the relations between these components.
For example, for the utterance "the apple was eaten by me" (苹果被我吃了), the word segmentation result is: "apple", "by" (被), "me", "eat", and the particle "le" (了); the part of speech of "apple" is n, of "by" is p, of "me" is r, of "eat" is v, and of "le" is u. After dependency syntax analysis, the obtained result is: the relation between "apple" and "eat" is fronted object (FOB), the relation between "by" and "eat" is adverbial (ADV), the relation between "me" and "by" is preposition-object (POB), and the relation between "le" and "eat" is right adjunct (RAD). "Eat" is the core of the whole sentence.
The N first utterances may be input into a dependency syntax analysis model to obtain the dependency syntax analysis results. The dependency syntax analysis model analyzes the grammatical components of a first utterance, such as subject, predicate, object, attribute, adverbial and complement, and the relations between them, to obtain the dependency syntax analysis result. The dependency syntax analysis model may be trained on a corpus in the same language as the first utterances: if the first utterances are Chinese, the model is trained on a Chinese corpus; if the first utterances are English, it is trained on an English corpus.
The generalization result of the first intention is determined from the N dependency syntax analysis results. For example, words in the dependency syntax analysis results may be grouped by designated relation types, the occurrences of each word counted within each group, and a word taken as part of the generalization result of the first intention if its occurrence count exceeds a preset threshold.
In this embodiment, N first utterances under the first intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and the generalization result of the first intention is determined according to the N dependency syntax analysis results. Because the generalization result of the first intention is determined from the dependency syntax analysis results of the N first utterances, the generalization effect can be improved. In addition, since the generalization result of the first intention is obtained through dependency syntax analysis, automatic generalization is achieved.
In one embodiment of the present application, the slot of each of the N first utterances is a slot candidate word;
the determining the generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, in the N dependency syntax analysis results, the statistical count of each word other than the slot candidate words that belongs to the first relation, and the statistical count of each word that belongs to the second relation;
and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the first intention.
Specifically, the candidate words of the slot may be specific Chinese, for example, "slot", or words in a dictionary of slots may be used, for example, for a time slot, the candidate words of the slot may be words in the dictionary of slots of the time slot; the slot candidate word may be a word in a slot dictionary of the place slot corresponding to the place slot.
The method comprises the steps that the groove position of each first conversation in N first conversations is replaced by words in a groove position dictionary corresponding to the groove position, so that the groove position of each first conversation in N first conversations is a groove position candidate word. When the slot is replaced, words in a slot dictionary corresponding to the slot can be obtained in a random mode, and the words are utilized to replace the slot.
After the dependency syntax analysis result of a first utterance is obtained, the words belonging to the first relation and the words belonging to the second relation in that result are extracted. The first relation may be the core relation, the verb-object relation, and the like, and the second relation may be the object, the indirect object, the fronted object, and the like, which is not limited here. The words included in the first relation and the second relation do not include the slot candidate words.
For each of the N first utterances, the words belonging to the first relation and the words belonging to the second relation are extracted from the dependency syntax analysis result corresponding to that utterance. The extracted words of the first relation and the second relation are then counted, and the words whose counts are greater than N/2 are determined as the generalization result of the first intention.
In this embodiment, utterance generalization is achieved by counting, in the N dependency syntax analysis results, the occurrences of each word belonging to the first relation and of each word belonging to the second relation (excluding the slot candidate words), and determining the words in the two relations whose counts exceed N/2 as the generalization result of the first intention. In this way, a good generalization effect can be achieved with only a small amount of corpus (i.e., the first utterances).
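The counting step above can be sketched as follows; the pair representation of a parse result and the function name are assumptions made for illustration:

```python
from collections import Counter

def generalize(parse_results, slot_candidates):
    """parse_results: one (first_relation_words, second_relation_words)
    pair per first utterance. Count every word except slot candidate
    words and keep those whose count exceeds N/2, N being the number of
    utterances; the kept words form the intention's generalization result."""
    n = len(parse_results)
    v_counts, n_counts = Counter(), Counter()
    for v_words, n_words in parse_results:
        v_counts.update(w for w in v_words if w not in slot_candidates)
        n_counts.update(w for w in n_words if w not in slot_candidates)
    keep_v = {w for w, c in v_counts.items() if c > n / 2}
    keep_n = {w for w, c in n_counts.items() if c > n / 2}
    return keep_v, keep_n
```

The `> n / 2` threshold is a simple majority vote: a word survives only if it appears in the chosen relation in more than half of the utterances under the intention.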
In one embodiment of the present application, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold.
The second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Specifically, the semantic similarity between the words in the first relation is calculated. If the similarity between two words exceeds the first threshold, the statistical count of each word participating in the calculation may be: the number of times that word occurs in the first relation plus the number of times the other word occurs in the first relation. That is, the statistical count of the first word is the number of times it appears in the first relation plus its semantic count.
The words in the second relation are processed in the same way: the statistical count of the second word is the number of times it appears in the second relation plus its semantic count. The first threshold and the second threshold may be set according to the practical situation and are not limited here.
The first word may belong to the first relation in the dependency syntax analysis result of only one first utterance, or in the results of several first utterances, so it may appear one or more times in the first relation; this gives the number of times the first word appears in the first relation. Likewise, the second word may belong to the second relation in one or several results, giving the number of times it appears in the second relation.
In this embodiment, when determining the statistical count of the first word, not only are its own occurrences in the first relation counted, but the occurrences of words whose semantic similarity to it exceeds the first threshold are added as well. This strengthens the generalization ability of the first word, achieves generalization over semantically similar words, and thus improves the generalization effect.
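A sketch of this augmented count, with `similarity` standing in for a real word-embedding similarity function (all names here are assumed for illustration):

```python
def statistical_count(word, relation_words, similarity, threshold):
    """Statistical count of `word` in one relation: its own occurrences
    plus the occurrences of every other word whose semantic similarity
    to it exceeds the threshold (the word's "semantic count")."""
    base = relation_words.count(word)
    semantic = sum(
        relation_words.count(other)
        for other in set(relation_words)
        if other != word and similarity(word, other) > threshold
    )
    return base + semantic
```

With this count, near-synonyms pool their occurrences, so a word can pass the N/2 threshold even when each individual surface form is rare.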
In one embodiment of the present application, step 202, performing intention recognition on the input utterance through the intention model to obtain the target intention of the input utterance, includes:
performing dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquiring a first target word belonging to the first relation and a second target word belonging to the second relation in the target dependency syntax analysis result;
and if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold, determining the second intention as the target intention of the input utterance.
First, dependency syntax analysis is performed on the input utterance to obtain the target dependency syntax analysis result, and a first target word belonging to the first relation and a second target word belonging to the second relation are obtained from it. The two target words are then compared with the generalization result of each intention: semantic similarity is calculated between the first target word and the words belonging to the first relation in each intention's generalization result, and between the second target word and the words belonging to the second relation. If the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention is greater than the third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in that generalization result is greater than the fourth threshold, the second intention is determined as the target intention of the input utterance. The third threshold and the fourth threshold may be set according to the practical situation and are not limited here.
In this embodiment, determining the second intention as the target intention only when both similarity conditions hold improves the recognition accuracy of the target intention.
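The matching rule can be sketched as below; the dictionary layout of the stored generalization results and the function names are assumptions:

```python
def match_intent(first_target, second_target, intent_models,
                 similarity, third_threshold, fourth_threshold):
    """intent_models maps each intention to its generalization result,
    a (first_relation_words, second_relation_words) pair. An intention
    becomes the target intention when the query's first-relation word
    is similar enough to one of its first-relation words AND the
    query's second-relation word to one of its second-relation words."""
    for intent, (v_words, n_words) in intent_models.items():
        v_hit = any(similarity(first_target, w) > third_threshold
                    for w in v_words)
        n_hit = any(similarity(second_target, w) > fourth_threshold
                    for w in n_words)
        if v_hit and n_hit:
            return intent
    return None
```

Requiring both the "v" and "n" sides to hit is what keeps superficially similar but differently structured queries from matching the wrong intention.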
The specific procedure of the above utterance recognition method is described in detail below.
First, the intention model is built. The utterances provided by the user are preprocessed: all the user's utterances are stored grouped by intention, and then, for each utterance under an intention, every slot is replaced with a specific Chinese identifier, such as "slot".
Syntactic analysis modeling is then performed on the utterances: each utterance in the utterance set under an intention is input into the dependency syntax analysis model to obtain a dependency syntax analysis result; from each result, the core relation and the verb-object relation are extracted as the "v" of the corresponding utterance, and the object, the indirect object, and the fronted object are extracted as the "n" of the sentence. The following statistics and modeling are then done over all the utterances under each intention:
counting the occurrences of the words in all "v" (which can be understood as the first relation) and of the words in all "n" (which can be understood as the second relation);
calculating the semantic similarity between each word in "v" and the other words, and if the similarity exceeds the first threshold, adding together the occurrence counts of the two words involved; the same is done for "n";
retaining the words in "v" and "n" that are not the specific Chinese identifier (which can be understood as slot candidate words) and whose counts (which can be understood as statistical counts) are greater than or equal to half of the total number of utterances under the intention, as the modeling result (which can be understood as the generalization result of the intention);
storing the modeling result of each intention separately.
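The extraction of "v" and "n" from a parse can be sketched as follows; the relation label names and the (word, label) pair representation are assumptions for illustration and will differ across real dependency parsers (e.g. LTP-style tag sets use their own labels):

```python
# Hypothetical relation labels standing in for a parser's real tag set.
FIRST_RELATIONS = {"core", "verb-object"}                           # -> "v"
SECOND_RELATIONS = {"object", "indirect-object", "fronted-object"}  # -> "n"

def extract_vn(parse):
    """Split a dependency parse, given as (word, relation_label) pairs,
    into the utterance's "v" words and "n" words."""
    v = [word for word, rel in parse if rel in FIRST_RELATIONS]
    n = [word for word, rel in parse if rel in SECOND_RELATIONS]
    return v, n
```

Words under any other relation (determiners, adverbials, and so on) are simply dropped, which is what lets the model ignore most of an utterance's surface variation.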
Next, dependency syntax analysis is performed on the user's query: the query utterance is input into the dependency syntax analysis model to obtain a dependency syntax analysis result; the first relation, e.g., the core relation and the verb-object relation, is extracted from the result as the "v" of the sentence, and the second relation, e.g., the object, the indirect object, and the fronted object, is extracted as the "n" of the sentence.
Then, the analysis result of the query is matched against the intention model: all the stored intention modeling results are loaded; the "v" and "n" of the query are matched against the "v" and "n" of each intention, and a match is counted if the word senses are identical or their similarity exceeds the threshold; an intention for which both "v" and "n" hit is determined as the target intention.
Finally, for the target intention, slot recognition is performed with the corresponding slot recognition module to obtain the target slot, and the target intention and the target slot are taken as the recognition result of the query utterance.
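The whole query-recognition flow, from matching to slot recognition, might be glued together as in this sketch, which uses exact word overlap as a simplification of the similarity matching described above (the data layout and the callable slot recognizers are assumptions):

```python
def recognize(query_v, query_n, intent_models, slot_recognizers):
    """query_v / query_n: the "v" and "n" words extracted from the
    query's dependency parse. An intention whose generalized "v" and
    "n" sets both intersect the query's is the target intention; that
    intention's slot recognizer then yields the target slot."""
    for intent, (v_words, n_words) in intent_models.items():
        if set(query_v) & set(v_words) and set(query_n) & set(n_words):
            slot = slot_recognizers[intent](query_v + query_n)
            return intent, slot
    return None, None
```

Running slot recognition only for the matched intention mirrors the text: the slot module is chosen per intention, and the (intention, slot) pair is the final recognition result.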
Referring to fig. 3, fig. 3 is a block diagram of an utterance generalization apparatus according to an embodiment of the present invention. As shown in fig. 3, the utterance generalization apparatus 300 includes:
a first obtaining module 301, configured to obtain N first utterances under an intention, where N is a positive integer;
a second obtaining module 302, configured to perform dependency syntax analysis on the N first utterances, to obtain N dependency syntax analysis results;
a determining module 303, configured to determine a generalization result of the intent according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word.
The determining module 303 includes:
a statistics sub-module, configured to count, in the N dependency syntax analysis results, the occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and a determining sub-module, configured to determine the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the intention.
Further, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
The utterance generalization apparatus 300 applies the method embodiment shown in fig. 1; to avoid repetition, details are not described here again.
The utterance generalization apparatus 300 obtains N first utterances under an intention, where N is a positive integer; performs dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determines the generalization result of the intention according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect; in addition, obtaining the generalization result through dependency syntax analysis achieves automatic generalization.
Referring to fig. 4, fig. 4 is a block diagram of an utterance recognition apparatus according to an embodiment of the present invention. As shown in fig. 4, the utterance recognition apparatus 400 includes:
an obtaining module 401, configured to obtain an input utterance;
a first recognition module 402, configured to perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions;
and a second recognition module 403, configured to perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
Further, the plurality of intentions include a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining the generalization result of the first intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the determining the generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, among the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the first intention.
Further, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Further, the first identifying module 402 includes:
an analysis sub-module, configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
an obtaining sub-module, configured to obtain a first target word belonging to the first relation and a second target word belonging to the second relation in the target dependency syntax analysis result;
and a determining sub-module, configured to determine the second intention as the target intention of the input utterance if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold.
The utterance recognition apparatus 400 obtains an input utterance; performs intention recognition on the input utterance through the intention model to obtain its target intention, where the intention model includes the generalization results of a plurality of intentions; and performs slot recognition on the input utterance according to the target intention to obtain a target slot, the recognition result of the input utterance including the target intention and the target slot. Recognizing the target intention through the intention model, then recognizing the target slot according to that intention, and taking the two as the recognition result improves the recognition accuracy for the input utterance.
Fig. 5 is a schematic diagram of the hardware structure of an electronic device implementing various embodiments of the present invention. As shown in fig. 5, the electronic device 500 includes, but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and a power source 511. It will be appreciated by those skilled in the art that the structure shown in fig. 5 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently. In the embodiment of the invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
In one embodiment of the present application, the processor 510 is configured to: obtain N first utterances under an intention, where N is a positive integer;
perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determine the generalization result of the intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the processor 510 is configured to count, in the N dependency syntax analysis results, the occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determine the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the intention.
Further, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
The electronic device 500 can implement the method in the embodiment shown in fig. 1; to avoid repetition, details are not described here again.
The electronic device 500 of the embodiment of the present invention obtains N first utterances under an intention, where N is a positive integer; performs dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determines the generalization result of the intention according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect; in addition, obtaining the generalization result through dependency syntax analysis achieves automatic generalization.
In another embodiment of the present application, the processor 510 is configured to: obtain an input utterance;
perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions;
and perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
Further, the plurality of intentions include a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining the generalization result of the first intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the determining the generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, among the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the first intention.
Further, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Further, the processor 510 is configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquire a first target word belonging to the first relation and a second target word belonging to the second relation in the target dependency syntax analysis result;
and if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold, determine the second intention as the target intention of the input utterance.
The electronic device 500 can implement the method in the embodiment shown in fig. 2; to avoid repetition, details are not described here again.
The electronic device 500 of the embodiment of the invention obtains an input utterance; performs intention recognition on the input utterance through the intention model to obtain its target intention, where the intention model includes the generalization results of a plurality of intentions; and performs slot recognition on the input utterance according to the target intention to obtain a target slot, the recognition result of the input utterance including the target intention and the target slot. Recognizing the target intention through the intention model, then recognizing the target slot according to that intention, and taking the two as the recognition result improves the recognition accuracy for the input utterance.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 501 may be used to receive and send information or signals during a call, specifically, receive downlink data from a base station, and then process the downlink data with the processor 510; and, the uplink data is transmitted to the base station. Typically, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 may also communicate with networks and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user through the network module 502, such as helping the user to send and receive e-mail, browse web pages, access streaming media, and the like.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502 or stored in the memory 509 into an audio signal and output as sound. Also, the audio output unit 503 may also provide audio output (e.g., a call signal reception sound, a message reception sound, etc.) related to a specific function performed by the electronic device 500. The audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
The input unit 504 is used for receiving an audio or video signal. The input unit 504 may include a graphics processor (Graphics Processing Unit, GPU) 5041 and a microphone 5042, the graphics processor 5041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processor 5041 may be stored in the memory 509 (or other storage medium) or transmitted via the radio frequency unit 501 or the network module 502. Microphone 5042 may receive sound and may be capable of processing such sound into audio data. The processed audio data may be converted into a format output that can be transmitted to the mobile communication base station via the radio frequency unit 501 in case of a phone call mode.
The electronic device 500 also includes at least one sensor 505, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 5061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 5061 and/or the backlight when the electronic device 500 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and direction when stationary, and can be used for recognizing the gesture of the electronic equipment (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; the sensor 505 may further include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which are not described herein.
The display unit 506 is used to display information input by a user or information provided to the user. The display unit 506 may include a display panel 5061, and the display panel 5061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 507 is operable to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. Touch panel 5071, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on touch panel 5071 or thereabout using any suitable object or accessory such as a finger, stylus, etc.). Touch panel 5071 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 5071, the user input unit 507 may include other input devices 5072. In particular, other input devices 5072 may include, but are not limited to, physical keyboards, function keys (e.g., volume control keys, switch keys, etc.), trackballs, mice, joysticks, and so forth, which are not described in detail herein.
Further, the touch panel 5071 may be overlaid on the display panel 5061, and when the touch panel 5071 detects a touch operation thereon or thereabout, the touch operation is transmitted to the processor 510 to determine a type of touch event, and then the processor 510 provides a corresponding visual output on the display panel 5061 according to the type of touch event. Although in fig. 5, the touch panel 5071 and the display panel 5061 are two independent components for implementing the input and output functions of the electronic device, in some embodiments, the touch panel 5071 and the display panel 5061 may be integrated to implement the input and output functions of the electronic device, which is not limited herein.
The interface unit 508 is an interface for connecting an external device to the electronic apparatus 500. For example, the external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 508 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 500 or may be used to transmit data between the electronic apparatus 500 and an external device.
The memory 509 may be used to store software programs as well as various data. The memory 509 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the electronic device (such as audio data, a phonebook, etc.). In addition, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 510 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 509, and calling data stored in the memory 509, thereby performing overall monitoring of the electronic device. Processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 510.
The electronic device 500 may also include a power supply 511 (e.g., a battery) for powering the various components, and preferably the power supply 511 may be logically connected to the processor 510 via a power management system that performs functions such as managing charging, discharging, and power consumption.
In addition, the electronic device 500 includes some functional modules, which are not shown, and will not be described herein.
Preferably, an embodiment of the present invention further provides an electronic device, including a processor 510, a memory 509, and a computer program stored in the memory 509 and executable on the processor 510. When executed by the processor 510, the computer program implements each process of the foregoing speech generalization method embodiment, or implements each process of the foregoing speech recognition method embodiment, and can achieve the same technical effects; to avoid repetition, details are not described herein again.
An embodiment of the present invention further provides a computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the foregoing speech generalization method embodiment, or implements each process of the foregoing speech recognition method embodiment, and can achieve the same technical effects; to avoid repetition, details are not described herein again. The computer readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, though in many cases the former is preferred. Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods according to the embodiments of the present invention.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this description.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Under the teaching of the present invention, those of ordinary skill in the art may make many further forms without departing from the spirit of the present invention and the scope of the claims, all of which fall within the protection of the present invention.

Claims (12)

1. A method of speech generalization, comprising:
acquiring N first utterances under an intention, wherein the N first utterances are N expressions of the intention, and N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
determining a generalization result of the intention according to the N dependency syntax analysis results;
wherein the slot of each of the N first utterances is a slot candidate word;
and the determining the generalization result of the intention according to the N dependency syntax analysis results comprises:
counting, in the N dependency syntax analysis results and excluding the slot candidate words, the statistical count of each word belonging to a first relation and the statistical count of each word belonging to a second relation; and
determining the words whose statistical count is greater than N/2 in the first relation and the second relation as the generalization result of the intention.
2. The method of claim 1, wherein the first relation comprises a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
3. A method of speech recognition, comprising:
acquiring an input utterance;
performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, wherein the intention model comprises generalization results of a plurality of intentions;
performing slot recognition on the input utterance according to the target intention to obtain a target slot, wherein a recognition result of the input utterance comprises the target intention and the target slot;
wherein the plurality of intentions comprises a first intention, and the process of acquiring the generalization result of the first intention included in the intention model comprises:
acquiring N first utterances under the first intention, wherein the N first utterances are N expressions of the first intention, and N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
determining a generalization result of the first intention according to the N dependency syntax analysis results;
wherein the slot of each of the N first utterances is a slot candidate word;
and the determining the generalization result of the first intention according to the N dependency syntax analysis results comprises:
counting, in the N dependency syntax analysis results and excluding the slot candidate words, the statistical count of each word belonging to a first relation and the statistical count of each word belonging to a second relation; and
determining the words whose statistical count is greater than N/2 in the first relation and the second relation as the generalization result of the first intention.
4. The method of claim 3, wherein the first relation comprises a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
5. The method of claim 3, wherein the performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance comprises:
performing dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquiring, in the target dependency syntax analysis result, a first target word belonging to the first relation and a second target word belonging to the second relation; and
if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold, determining the second intention as the target intention of the input utterance.
6. A speech generalization apparatus, comprising:
a first acquisition module, configured to acquire N first utterances under an intention, wherein the N first utterances are N expressions of the intention, and N is a positive integer;
a second acquisition module, configured to perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and
a determining module, configured to determine the generalization result of the intention according to the N dependency syntax analysis results;
wherein the slot of each of the N first utterances is a slot candidate word;
and the determining module comprises:
a statistics sub-module, configured to count, in the N dependency syntax analysis results and excluding the slot candidate words, the statistical count of each word belonging to a first relation and the statistical count of each word belonging to a second relation; and
a determining sub-module, configured to determine the words whose statistical count is greater than N/2 in the first relation and the second relation as the generalization result of the intention.
7. The apparatus of claim 6, wherein the first relation comprises a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
8. A speech recognition device, comprising:
an acquisition module, configured to acquire an input utterance;
a first recognition module, configured to perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, wherein the intention model comprises generalization results of a plurality of intentions; and
a second recognition module, configured to perform slot recognition on the input utterance according to the target intention to obtain a target slot, wherein a recognition result of the input utterance comprises the target intention and the target slot;
wherein the plurality of intentions comprises a first intention, and the process of acquiring the generalization result of the first intention included in the intention model comprises:
acquiring N first utterances under the first intention, wherein the N first utterances are N expressions of the first intention, and N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
determining a generalization result of the first intention according to the N dependency syntax analysis results;
wherein the slot of each of the N first utterances is a slot candidate word;
and the determining the generalization result of the first intention according to the N dependency syntax analysis results comprises:
counting, in the N dependency syntax analysis results and excluding the slot candidate words, the statistical count of each word belonging to a first relation and the statistical count of each word belonging to a second relation; and
determining the words whose statistical count is greater than N/2 in the first relation and the second relation as the generalization result of the first intention.
9. The apparatus of claim 8, wherein the first relation comprises a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
10. The apparatus of claim 8, wherein the first recognition module comprises:
an analysis sub-module, configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
an acquisition sub-module, configured to acquire, in the target dependency syntax analysis result, a first target word belonging to the first relation and a second target word belonging to the second relation; and
a determining sub-module, configured to determine a second intention among the plurality of intentions as the target intention of the input utterance if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of the second intention is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold.
11. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the speech generalization method according to claim 1 or 2, or implements the steps of the speech recognition method according to any one of claims 3 to 5.
12. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the speech generalization method according to claim 1 or 2, or implements the steps of the speech recognition method according to any one of claims 3 to 5.
CN201911288675.XA 2019-12-12 2019-12-12 Speaking generalization method, speaking recognition device and electronic equipment Active CN111062200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911288675.XA CN111062200B (en) 2019-12-12 2019-12-12 Speaking generalization method, speaking recognition device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111062200A CN111062200A (en) 2020-04-24
CN111062200B true CN111062200B (en) 2024-03-05

Family

ID=70301559

Country Status (1)

Country Link
CN (1) CN111062200B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069828B (en) * 2020-07-31 2023-07-04 飞诺门阵(北京)科技有限公司 Text intention recognition method and device
CN113343708A (en) * 2021-06-11 2021-09-03 北京声智科技有限公司 Method and device for realizing statement generalization based on semantics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062902A (en) * 2018-08-17 2018-12-21 科大讯飞股份有限公司 A kind of text semantic expression and device
CN109241524A (en) * 2018-08-13 2019-01-18 腾讯科技(深圳)有限公司 Semantic analysis method and device, computer readable storage medium, electronic equipment
CN110069709A (en) * 2019-04-10 2019-07-30 腾讯科技(深圳)有限公司 Intension recognizing method, device, computer-readable medium and electronic equipment
CN110096709A (en) * 2019-05-07 2019-08-06 百度在线网络技术(北京)有限公司 Command processing method and device, server and computer-readable medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10762062B2 (en) * 2016-04-04 2020-09-01 Xerox Corporation Data governance: change management based on contextualized dependencies


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant