CN111062200B - Utterance generalization method, utterance recognition method and apparatus, and electronic device - Google Patents


Info

Publication number
CN111062200B
CN111062200B CN201911288675.XA
Authority
CN
China
Prior art keywords
word
intention
relation
target
generalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911288675.XA
Other languages
Chinese (zh)
Other versions
CN111062200A (en
Inventor
游程
苏少炜
陈孝良
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN201911288675.XA priority Critical patent/CN111062200B/en
Publication of CN111062200A publication Critical patent/CN111062200A/en
Application granted granted Critical
Publication of CN111062200B publication Critical patent/CN111062200B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides an utterance generalization method, an utterance recognition method and apparatus, and an electronic device. The method includes: acquiring N first utterances under an intention, where N is a positive integer; performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determining the generalization result of the intention according to the N dependency syntax analysis results. Because the generalization result of the intention is determined from the dependency syntax analysis results of the N first utterances, the generalization effect can be improved. In addition, since the generalization result of the intention is obtained through dependency syntax analysis, automatic generalization is achieved.

Description

Utterance generalization method, utterance recognition method and apparatus, and electronic device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an utterance generalization method, an utterance recognition method and apparatus, and an electronic device.
Background
In current automatic generalization technology, a given utterance is generalized by removing stop words: unimportant words such as "please" are defined as stop words and removed from the given utterance, and at run time the stop words in the user's request sentence are likewise removed, achieving a limited degree of generalization.
Because it relies only on stop-word removal, the current automatic generalization technology generalizes poorly.
Disclosure of Invention
Embodiments of the present invention provide an utterance generalization method, an utterance recognition method and apparatus, and an electronic device, to solve the problem that generalization methods in the prior art generalize poorly.
In order to solve the technical problems, the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides an utterance generalization method, including:
acquiring N first utterances under an intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining the generalization result of the intention according to the N dependency syntax analysis results.
In a second aspect, an embodiment of the present invention provides an utterance recognition method, including:
acquiring an input utterance;
performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions;
and performing slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
In a third aspect, an embodiment of the present invention further provides an utterance generalization apparatus, including:
a first acquisition module, configured to acquire N first utterances under an intention, where N is a positive integer;
a second acquisition module, configured to perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and a determining module, configured to determine the generalization result of the intention according to the N dependency syntax analysis results.
In a fourth aspect, an embodiment of the present invention further provides an utterance recognition apparatus, including:
an acquisition module, configured to acquire an input utterance;
a first recognition module, configured to perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions;
and a second recognition module, configured to perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the utterance generalization method according to the first aspect, or implements the steps of the utterance recognition method according to the second aspect.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the utterance generalization method according to the first aspect, or implements the steps of the utterance recognition method according to the second aspect.
In the embodiments of the present invention, N first utterances under an intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and the generalization result of the intention is determined according to the N dependency syntax analysis results. Because the generalization result of the intention is determined from the dependency syntax analysis results of the N first utterances, the generalization effect can be improved. In addition, since the generalization result of the intention is obtained through dependency syntax analysis, automatic generalization is achieved.
Drawings
FIG. 1 is a flowchart of an utterance generalization method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of an utterance recognition method provided by an embodiment of the present invention;
FIG. 3 is a block diagram of an utterance generalization apparatus provided by an embodiment of the present invention;
FIG. 4 is a block diagram of an utterance recognition apparatus provided by an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of an utterance generalization method provided by an embodiment of the present invention. As shown in FIG. 1, this embodiment provides an utterance generalization method, applied to an utterance generalization apparatus, including the following steps:
Step 101, acquiring N first utterances under an intention, where N is a positive integer.
Specifically, one intention may have multiple expressions, so one intention corresponds to N first utterances. The N first utterances may be determined manually. Further, the slots in a first utterance may all be replaced with a specific placeholder word, such as "slot", or with specific words. For example, for the intention of booking an air ticket, the first utterances may be: "I want to book a {time} ticket from {origin} to {destination}"; "Help me reserve a ticket to {destination}"; and so on, where the parts in "{ }" are slots that can be replaced with words from the corresponding slot dictionary. For example, the time slot may be replaced with a word from the time slot's dictionary: if "today" is a word in the time slot's dictionary, the time slot in the first utterance may be replaced with "today". If the place slot's dictionary includes "Beijing" and "Turkey", the origin slot in the first utterance may be replaced with "Beijing" and the destination slot may be replaced with "Turkey".
After the slots in the first utterances are replaced with words from the slot dictionaries, the first utterances read: "I want to book today's ticket from Beijing to Turkey"; "Help me reserve a ticket to Turkey". These replacements are merely examples and do not limit the invention: the time slot in a first utterance may be replaced with a word other than "today" from the time slot's dictionary, and likewise the place slots may be replaced with words other than "Beijing" and "Turkey" from the place slot's dictionary.
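The slot-replacement step above can be sketched as follows; the dictionaries, slot names, and the `fill_slots` helper are illustrative assumptions, not part of the patented method:

```python
import random

# Hypothetical slot dictionaries; real dictionaries would be much larger.
SLOT_DICTS = {
    "time": ["today", "tomorrow"],
    "origin": ["Beijing", "Shanghai"],
    "destination": ["Turkey", "Tokyo"],
}

def fill_slots(template, slot_dicts, rng=random):
    """Replace each {slot} in an utterance template with a word drawn
    at random from that slot's dictionary."""
    utterance = template
    for slot, words in slot_dicts.items():
        placeholder = "{" + slot + "}"
        while placeholder in utterance:
            utterance = utterance.replace(placeholder, rng.choice(words), 1)
    return utterance

filled = fill_slots(
    "I want to book a {time} ticket from {origin} to {destination}", SLOT_DICTS
)
```

Drawing the replacement word at random, as the text suggests, keeps the generalization step from overfitting to any one slot value.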
Step 102, performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results.
Dependency syntax analysis is performed on each of the N first utterances to obtain a dependency syntax analysis result; the N first utterances thus correspond to N dependency syntax analysis results. Before the dependency syntax analysis, the first utterance is segmented into words and part-of-speech analysis is performed on the segmented words; dependency syntax analysis is then performed to obtain the dependency syntax analysis result of the first utterance. Dependency parsing (DP) reveals the syntactic structure of a language unit by analyzing the dependencies between its components. Dependency syntax analysis identifies the grammatical components of a sentence (which here can be understood as an utterance), such as subject, predicate, object, attribute, adverbial and complement, and analyzes the relations between these components.
For example, for the utterance "the apple was eaten by me" (苹果被我吃了), the word segmentation result is: "apple", "by" (被), "me", "eat", and the particle "le" (了); the part of speech of "apple" is n, of "by" is p, of "me" is r, of "eat" is v, and of "le" is u. After dependency syntax analysis, the obtained result is: the relation between "apple" and "eat" is fronted object (FOB), the relation between "by" and "eat" is adverbial (ADV), the relation between "me" and "by" is preposition-object (POB), and the relation between "le" and "eat" is right adjunct (RAD). "Eat" is the core of the whole sentence.
In this step, the N first utterances may be input into a dependency syntax analysis model to obtain the dependency syntax analysis results. The dependency syntax analysis model analyzes the grammatical components of a first utterance, such as subject, predicate, object, attribute, adverbial and complement, and the relations between them, to obtain the dependency syntax analysis result. The dependency syntax analysis model may be trained on a corpus in the same language as the first utterances: if the first utterances are Chinese, the model is trained on a Chinese corpus; if the first utterances are English, it is trained on an English corpus.
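A minimal sketch of what one dependency syntax analysis result might look like as a data structure; the `Arc` type and the hand-built arcs are illustrative assumptions (a real system would obtain them from a trained parser such as LTP for Chinese or spaCy for English):

```python
from collections import namedtuple

# One arc per segmented word: the word, its part-of-speech tag, the head
# word it depends on, and the dependency relation label.
Arc = namedtuple("Arc", ["word", "pos", "head", "relation"])

# Hand-built parse of "the apple was eaten by me", mirroring the
# LTP-style relation labels used in the text (FOB, ADV, POB, RAD).
parse = [
    Arc("apple", "n", "eat", "FOB"),  # fronted object of the verb
    Arc("by",    "p", "eat", "ADV"),  # adverbial (passive marker)
    Arc("me",    "r", "by",  "POB"),  # object of the preposition
    Arc("le",    "u", "eat", "RAD"),  # right adjunct (aspect particle)
    Arc("eat",   "v", None,  "HED"),  # head (core) of the sentence
]

# The core of the sentence is the word carrying the head relation.
head = next(a.word for a in parse if a.relation == "HED")
```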
Step 103, determining the generalization result of the intention according to the N dependency syntax analysis results.
Specifically, the generalization result of the intention is determined from the N dependency syntax analysis results. For example, words in the dependency syntax analysis results may be grouped by designated relation types, the occurrences of each word counted within each group, and a word taken as part of the generalization result of the intention if its occurrence count exceeds a preset threshold.
In this embodiment, N first utterances under an intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and the generalization result of the intention is determined according to the N dependency syntax analysis results. Because the generalization result of the intention is determined from the dependency syntax analysis results of the N first utterances, the generalization effect can be improved. In addition, since the generalization result of the intention is obtained through dependency syntax analysis, automatic generalization is achieved.
In one embodiment of the present application, the slot of each of the N first utterances is a slot candidate word;
step 103, determining the generalization result of the intention according to the N dependency syntax analysis results, includes:
counting, in the N dependency syntax analysis results, the statistical count of each word other than the slot candidate words that belongs to the first relation, and the statistical count of each word that belongs to the second relation;
and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the intention.
Specifically, a slot candidate word may be a specific placeholder word, for example "slot", or a word from the slot's dictionary: for a time slot, the slot candidate word may be a word from the time slot's dictionary; for a place slot, it may be a word from the place slot's dictionary.
The slot of each of the N first utterances is replaced with a word from the slot dictionary corresponding to that slot, so that the slot of each of the N first utterances is a slot candidate word. When replacing a slot, a word may be drawn at random from the slot dictionary corresponding to the slot and used as the replacement.
After the dependency syntax analysis result of a first utterance is obtained, the words belonging to the first relation and the words belonging to the second relation are extracted from it. The first relation may be the core (head) relation, the verb-object relation, and so on; the second relation may be the object, the indirect object, the fronted object, and so on; neither is limited here. Slot candidate words are excluded from the words counted in the first relation and the second relation.
For each of the N first utterances, the words belonging to the first relation and the words belonging to the second relation are extracted from its dependency syntax analysis result. The extracted words are then counted, and the words in the first relation and the second relation whose statistical counts are greater than N/2 are determined as the generalization result of the intention.
In this embodiment, by counting the statistical count of each word other than the slot candidate words that belongs to the first relation in the N dependency syntax analysis results and the statistical count of each word that belongs to the second relation, and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the intention, utterance generalization is achieved, and a good generalization effect can be obtained with only a small amount of corpus (i.e., the first utterances).
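The counting and N/2-thresholding described above can be sketched as follows; the relation labels, the toy parses, and the `generalize` helper are assumptions for illustration:

```python
from collections import Counter

def generalize(parse_results, first_rels, second_rels, slot_words, n):
    """Return words (excluding slot candidate words) whose count under the
    first-relation types, or under the second-relation types, exceeds n/2."""
    first_counts, second_counts = Counter(), Counter()
    for arcs in parse_results:  # one list of (word, relation) pairs per utterance
        for word, rel in arcs:
            if word in slot_words:
                continue  # slot candidate words never enter the counts
            if rel in first_rels:
                first_counts[word] += 1
            elif rel in second_rels:
                second_counts[word] += 1
    first_keep = {w for w, c in first_counts.items() if c > n / 2}
    second_keep = {w for w, c in second_counts.items() if c > n / 2}
    return first_keep | second_keep

# Three parsed first utterances for a ticket-booking intention (toy data).
parses = [
    [("book", "HED"), ("ticket", "VOB"), ("today", "ADV")],
    [("book", "HED"), ("ticket", "VOB")],
    [("order", "HED"), ("ticket", "VOB")],
]
result = generalize(parses, {"HED"}, {"VOB"}, {"today", "Beijing"}, n=3)
```

Here "book" (2 of 3 parses) and "ticket" (3 of 3) clear the N/2 = 1.5 threshold, while "order" (1 of 3) does not, so the generalization result keeps only the stable core words of the intention.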
In one embodiment of the present application, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of appearances of words in the first relation whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of appearances of words in the second relation whose semantic similarity to the second word is greater than a second threshold.
Specifically, the semantic similarity between words in the first relation is calculated; if the semantic similarity between two words exceeds the first threshold, the statistical count of each word participating in the calculation may be: the number of times that word appears in the first relation, plus the number of times the other participating word appears in the first relation. That is, the statistical count of the first word is the number of times the first word appears in the first relation plus its semantic count, where the semantic count of the first word is the number of appearances in the first relation of words whose semantic similarity to the first word is greater than the first threshold, the first relation including the first word.
Similarly, the words in the second relation are processed in the same way: the second relation includes the second word, and the statistical count of the second word is the number of times the second word appears in the second relation plus its semantic count, where the semantic count of the second word is the number of appearances in the second relation of words whose semantic similarity to the second word is greater than the second threshold. The first threshold and the second threshold may be set according to the actual situation and are not limited here.
A first word may belong to the first relation in the dependency syntax analysis result of only one first utterance, or in the results of several first utterances; it may therefore appear one or more times in the first relation, which gives the number of times the first word appears in the first relation. Likewise, a second word may belong to the second relation in the dependency syntax analysis result of only one first utterance, or in the results of several first utterances; it may therefore appear one or more times in the second relation, which gives the number of times the second word appears in the second relation.
In this embodiment, when determining the statistical count of the first word, not only are the first word's own appearances in the first relation counted, but the appearances of words whose semantic similarity to it exceeds the first threshold are also added. This strengthens the generalization power of the first word, achieves similarity-based semantic generalization, and enhances the generalization effect.
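The boosted statistical count can be sketched as follows; the `boosted_count` helper and the toy similarity table are illustrative assumptions (in practice the similarity would come from, e.g., word-embedding cosine scores):

```python
def boosted_count(word, relation_words, similarity, threshold):
    """Statistical count of `word` within one relation: its own appearances
    plus the appearances of other words whose semantic similarity to it
    exceeds the threshold."""
    own = relation_words.count(word)
    boost = sum(1 for w in relation_words
                if w != word and similarity(word, w) > threshold)
    return own + boost

# Toy symmetric similarity table standing in for embedding cosine scores.
SIM = {frozenset(p): s for p, s in [
    (("book", "order"), 0.9),
    (("book", "reserve"), 0.85),
    (("book", "cancel"), 0.1),
]}
sim = lambda a, b: SIM.get(frozenset((a, b)), 0.0)

# Words extracted under the first relation across the N parses.
words_in_first_relation = ["book", "order", "reserve", "cancel"]
count = boosted_count("book", words_in_first_relation, sim, threshold=0.8)
```

With these toy scores, "book" counts its one own appearance plus the appearances of "order" and "reserve" (both above the 0.8 threshold), giving a statistical count of 3, which makes it more likely to clear the N/2 bar.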
Referring to FIG. 2, FIG. 2 is a flowchart of an utterance recognition method provided by an embodiment of the present invention. As shown in FIG. 2, this embodiment provides an utterance recognition method, applied to an utterance recognition apparatus, including the following steps:
Step 201, acquiring an input utterance.
The input utterance may be a query sentence entered by the user.
Step 202, performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions.
The intention model includes the generalization results of a plurality of intentions, and the generalization result of at least one of these intentions is obtained by the utterance generalization method of the embodiment shown in FIG. 1. Intention recognition is performed on the input utterance through the intention model to obtain the target intention of the input utterance.
Step 203, performing slot recognition on the input utterance according to the target intention to obtain a target slot.
According to the target intention, a corresponding slot recognition module performs slot recognition on the input utterance to obtain the target slot; the recognition result of the input utterance includes the target intention and the target slot.
In the utterance recognition method above, an input utterance is acquired; intention recognition is performed on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions; and slot recognition is performed on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot. Because the target intention of the input utterance is identified through the intention model, the target slot is then identified according to the target intention, and the target intention and the target slot together serve as the recognition result, the recognition accuracy for the input utterance can be improved.
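The two-stage flow of steps 201–203 can be sketched as follows; the overlap-based intent scorer and the dictionary-based slot extractor are simplifications assumed for illustration, not the patented implementation:

```python
def recognize(utterance, intent_model, slot_extractors):
    """Two-stage recognition: pick the target intention first, then run
    that intention's slot extractor. `intent_model` maps each intention to
    its generalization result (a set of words); `slot_extractors` maps an
    intention to a slot-recognition function."""
    tokens = set(utterance.split())  # toy tokenizer; Chinese would need real segmentation
    # Score each intention by its overlap with the input utterance.
    target_intent = max(intent_model,
                        key=lambda i: len(intent_model[i] & tokens))
    target_slots = slot_extractors[target_intent](utterance)
    return {"intent": target_intent, "slots": target_slots}

# Toy intention model built from generalization results.
intent_model = {
    "book_ticket": {"book", "ticket", "reserve"},
    "weather": {"weather", "rain", "sunny"},
}
known_places = {"Beijing", "Turkey"}
slot_extractors = {
    "book_ticket": lambda u: {"destination": w for w in u.split() if w in known_places},
    "weather": lambda u: {},
}

result = recognize("book a ticket to Turkey", intent_model, slot_extractors)
```

Selecting the slot extractor only after the intention is fixed mirrors the text's point: the target intention constrains which slots are even searched for.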
In one embodiment of the present application, the plurality of intentions includes a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining the generalization result of the first intention according to the N dependency syntax analysis results.
Specifically, the first intention may have multiple expressions, and the first intention may correspond to N first utterances. The N first utterances may be determined manually. Further, the slots in a first utterance may be replaced with specific words. For example, for a first intention of booking an air ticket, the first utterances may be: "I want to book a {time} ticket from {origin} to {destination}"; "Help me reserve a ticket to {destination}"; and so on, where the parts in "{ }" are slots that can be replaced with words from the corresponding slot dictionary. For example, the time slot may be replaced with a word from the time slot's dictionary: if "today" is a word in the time slot's dictionary, the time slot in the first utterance may be replaced with "today". If the place slot's dictionary includes "Beijing" and "Turkey", the origin slot in the first utterance may be replaced with "Beijing" and the destination slot may be replaced with "Turkey".
After the slots in the first utterances are replaced with words from the slot dictionaries, the first utterances read: "I want to book today's ticket from Beijing to Turkey"; "Help me reserve a ticket to Turkey". These replacements are merely examples and do not limit the invention: the time slot in a first utterance may be replaced with a word other than "today" from the time slot's dictionary, and likewise the place slots may be replaced with words other than "Beijing" and "Turkey" from the place slot's dictionary.
Dependency syntax analysis is performed on each of the N first utterances to obtain a dependency syntax analysis result; the N first utterances thus correspond to N dependency syntax analysis results. Before the dependency syntax analysis, the first utterance is segmented into words and part-of-speech analysis is performed on the segmented words; dependency syntax analysis is then performed to obtain the dependency syntax analysis result of the first utterance. Dependency parsing (DP) reveals the syntactic structure of a language unit by analyzing the dependencies between its components. Dependency syntax analysis identifies the grammatical components of a sentence (which here can be understood as an utterance), such as subject, predicate, object, attribute, adverbial and complement, and analyzes the relations between these components.
For example, for the utterance "the apple was eaten by me" (苹果被我吃了), the word segmentation result is: "apple", "by" (被), "me", "eat", and the particle "le" (了); the part of speech of "apple" is n, of "by" is p, of "me" is r, of "eat" is v, and of "le" is u. After dependency syntax analysis, the obtained result is: the relation between "apple" and "eat" is fronted object (FOB), the relation between "by" and "eat" is adverbial (ADV), the relation between "me" and "by" is preposition-object (POB), and the relation between "le" and "eat" is right adjunct (RAD). "Eat" is the core of the whole sentence.
The N first utterances may be input into a dependency syntax analysis model to obtain the dependency syntax analysis results. The dependency syntax analysis model analyzes the grammatical components of a first utterance, such as subject, predicate, object, attribute, adverbial and complement, and the relations between them, to obtain the dependency syntax analysis result. The dependency syntax analysis model may be trained on a corpus in the same language as the first utterances: if the first utterances are Chinese, the model is trained on a Chinese corpus; if the first utterances are English, it is trained on an English corpus.
The generalization result of the first intention is determined from the N dependency syntax analysis results. For example, words in the dependency syntax analysis results may be grouped by designated relation types, the occurrences of each word counted within each group, and a word taken as part of the generalization result of the first intention if its occurrence count exceeds a preset threshold.
In this embodiment, N first utterances under the first intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and the generalization result of the first intention is determined according to the N dependency syntax analysis results. Because the generalization result of the first intention is determined from the dependency syntax analysis results of the N first utterances, the generalization effect can be improved. In addition, since the generalization result of the first intention is obtained through dependency syntax analysis, automatic generalization is achieved.
In one embodiment of the present application, the slot of each of the N first utterances is a slot candidate word;
the determining the generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, in the N dependency syntax analysis results, the statistical count of each word other than the slot candidate words that belongs to the first relation, and the statistical count of each word that belongs to the second relation;
and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the first intention.
Specifically, the candidate words of the slot may be specific Chinese, for example, "slot", or words in a dictionary of slots may be used, for example, for a time slot, the candidate words of the slot may be words in the dictionary of slots of the time slot; the slot candidate word may be a word in a slot dictionary of the place slot corresponding to the place slot.
The method comprises the steps that the groove position of each first conversation in N first conversations is replaced by words in a groove position dictionary corresponding to the groove position, so that the groove position of each first conversation in N first conversations is a groove position candidate word. When the slot is replaced, words in a slot dictionary corresponding to the slot can be obtained in a random mode, and the words are utilized to replace the slot.
After the dependency syntax analysis result of a first utterance is obtained, the words belonging to the first relation and the words belonging to the second relation in that result are extracted. The first relation may be the core relation, the verb-object relation, and the like, and the second relation may be the object, the indirect object, the fronted object, and the like, which is not limited here. The words included in the first relation and the second relation do not include the slot candidate words.
For each of the N first utterances, the words belonging to the first relation and the words belonging to the second relation are extracted from the dependency syntax analysis result corresponding to that utterance. The extracted words of the first relation and the second relation are then counted, and the words whose counts are greater than N/2 are determined as the generalization result of the first intention.
In this embodiment, utterance generalization is achieved by counting, in the N dependency syntax analysis results, the occurrences of each word belonging to the first relation and of each word belonging to the second relation (excluding the slot candidate words), and determining the words in the two relations whose counts exceed N/2 as the generalization result of the first intention. In this way, a good generalization effect can be achieved with only a small amount of corpus (i.e., the first utterances).
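The counting step above can be sketched as follows; the pair representation of a parse result and the function name are assumptions made for illustration:

```python
from collections import Counter

def generalize(parse_results, slot_candidates):
    """parse_results: one (first_relation_words, second_relation_words)
    pair per first utterance. Count every word except slot candidate
    words and keep those whose count exceeds N/2, N being the number of
    utterances; the kept words form the intention's generalization result."""
    n = len(parse_results)
    v_counts, n_counts = Counter(), Counter()
    for v_words, n_words in parse_results:
        v_counts.update(w for w in v_words if w not in slot_candidates)
        n_counts.update(w for w in n_words if w not in slot_candidates)
    keep_v = {w for w, c in v_counts.items() if c > n / 2}
    keep_n = {w for w, c in n_counts.items() if c > n / 2}
    return keep_v, keep_n
```

The `> n / 2` threshold is a simple majority vote: a word survives only if it appears in the chosen relation in more than half of the utterances under the intention.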
In one embodiment of the present application, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold.
The second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Specifically, the semantic similarity between the words in the first relation is calculated. If the similarity between two words exceeds the first threshold, the statistical count of each word participating in the calculation may be: the number of times that word occurs in the first relation plus the number of times the other word occurs in the first relation. That is, the statistical count of the first word is the number of times it appears in the first relation plus its semantic count.
The words in the second relation are processed in the same way: the statistical count of the second word is the number of times it appears in the second relation plus its semantic count. The first threshold and the second threshold may be set according to the practical situation and are not limited here.
The first word may belong to the first relation in the dependency syntax analysis result of only one first utterance, or in the results of several first utterances, so it may appear one or more times in the first relation; this gives the number of times the first word appears in the first relation. Likewise, the second word may belong to the second relation in one or several results, giving the number of times it appears in the second relation.
In this embodiment, when determining the statistical count of the first word, not only are its own occurrences in the first relation counted, but the occurrences of words whose semantic similarity to it exceeds the first threshold are added as well. This strengthens the generalization ability of the first word, achieves generalization over semantically similar words, and thus improves the generalization effect.
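A sketch of this augmented count, with `similarity` standing in for a real word-embedding similarity function (all names here are assumed for illustration):

```python
def statistical_count(word, relation_words, similarity, threshold):
    """Statistical count of `word` in one relation: its own occurrences
    plus the occurrences of every other word whose semantic similarity
    to it exceeds the threshold (the word's "semantic count")."""
    base = relation_words.count(word)
    semantic = sum(
        relation_words.count(other)
        for other in set(relation_words)
        if other != word and similarity(word, other) > threshold
    )
    return base + semantic
```

With this count, near-synonyms pool their occurrences, so a word can pass the N/2 threshold even when each individual surface form is rare.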
In one embodiment of the present application, step 202, performing intention recognition on the input utterance through the intention model to obtain the target intention of the input utterance, includes:
performing dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquiring a first target word belonging to the first relation and a second target word belonging to the second relation in the target dependency syntax analysis result;
and if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold, determining the second intention as the target intention of the input utterance.
First, dependency syntax analysis is performed on the input utterance to obtain the target dependency syntax analysis result, and a first target word belonging to the first relation and a second target word belonging to the second relation are obtained from it. The two target words are then compared with the generalization result of each intention: semantic similarity is calculated between the first target word and the words belonging to the first relation in each intention's generalization result, and between the second target word and the words belonging to the second relation. If the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention is greater than the third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in that generalization result is greater than the fourth threshold, the second intention is determined as the target intention of the input utterance. The third threshold and the fourth threshold may be set according to the practical situation and are not limited here.
In this embodiment, determining the second intention as the target intention only when both similarity conditions hold improves the recognition accuracy of the target intention.
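The matching rule can be sketched as below; the dictionary layout of the stored generalization results and the function names are assumptions:

```python
def match_intent(first_target, second_target, intent_models,
                 similarity, third_threshold, fourth_threshold):
    """intent_models maps each intention to its generalization result,
    a (first_relation_words, second_relation_words) pair. An intention
    becomes the target intention when the query's first-relation word
    is similar enough to one of its first-relation words AND the
    query's second-relation word to one of its second-relation words."""
    for intent, (v_words, n_words) in intent_models.items():
        v_hit = any(similarity(first_target, w) > third_threshold
                    for w in v_words)
        n_hit = any(similarity(second_target, w) > fourth_threshold
                    for w in n_words)
        if v_hit and n_hit:
            return intent
    return None
```

Requiring both the "v" and "n" sides to hit is what keeps superficially similar but differently structured queries from matching the wrong intention.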
The specific procedure of the above utterance recognition method is described in detail below.
First, the intention model is built. The utterances provided by the user are preprocessed: all the user's utterances are stored grouped by intention, and then, for each utterance under an intention, every slot is replaced with a specific Chinese identifier, such as "slot".
Syntactic analysis modeling is then performed on the utterances: each utterance in the utterance set under an intention is input into the dependency syntax analysis model to obtain a dependency syntax analysis result; from each result, the core relation and the verb-object relation are extracted as the "v" of the corresponding utterance, and the object, the indirect object, and the fronted object are extracted as the "n" of the sentence. The following statistics and modeling are then done over all the utterances under each intention:
counting the occurrences of the words in all "v" (which can be understood as the first relation) and of the words in all "n" (which can be understood as the second relation);
calculating the semantic similarity between each word in "v" and the other words, and if the similarity exceeds the first threshold, adding together the occurrence counts of the two words involved; the same is done for "n";
retaining the words in "v" and "n" that are not the specific Chinese identifier (which can be understood as slot candidate words) and whose counts (which can be understood as statistical counts) are greater than or equal to half of the total number of utterances under the intention, as the modeling result (which can be understood as the generalization result of the intention);
storing the modeling result of each intention separately.
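The extraction of "v" and "n" from a parse can be sketched as follows; the relation label names and the (word, label) pair representation are assumptions for illustration and will differ across real dependency parsers (e.g. LTP-style tag sets use their own labels):

```python
# Hypothetical relation labels standing in for a parser's real tag set.
FIRST_RELATIONS = {"core", "verb-object"}                           # -> "v"
SECOND_RELATIONS = {"object", "indirect-object", "fronted-object"}  # -> "n"

def extract_vn(parse):
    """Split a dependency parse, given as (word, relation_label) pairs,
    into the utterance's "v" words and "n" words."""
    v = [word for word, rel in parse if rel in FIRST_RELATIONS]
    n = [word for word, rel in parse if rel in SECOND_RELATIONS]
    return v, n
```

Words under any other relation (determiners, adverbials, and so on) are simply dropped, which is what lets the model ignore most of an utterance's surface variation.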
Next, dependency syntax analysis is performed on the user's query: the query utterance is input into the dependency syntax analysis model to obtain a dependency syntax analysis result; the first relation, e.g., the core relation and the verb-object relation, is extracted from the result as the "v" of the sentence, and the second relation, e.g., the object, the indirect object, and the fronted object, is extracted as the "n" of the sentence.
Then, the analysis result of the query is matched against the intention model: all the stored intention modeling results are loaded; the "v" and "n" of the query are matched against the "v" and "n" of each intention, and a match is counted if the word senses are identical or their similarity exceeds the threshold; an intention for which both "v" and "n" hit is determined as the target intention.
Finally, for the target intention, slot recognition is performed with the corresponding slot recognition module to obtain the target slot, and the target intention and the target slot are taken as the recognition result of the query utterance.
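The whole query-recognition flow, from matching to slot recognition, might be glued together as in this sketch, which uses exact word overlap as a simplification of the similarity matching described above (the data layout and the callable slot recognizers are assumptions):

```python
def recognize(query_v, query_n, intent_models, slot_recognizers):
    """query_v / query_n: the "v" and "n" words extracted from the
    query's dependency parse. An intention whose generalized "v" and
    "n" sets both intersect the query's is the target intention; that
    intention's slot recognizer then yields the target slot."""
    for intent, (v_words, n_words) in intent_models.items():
        if set(query_v) & set(v_words) and set(query_n) & set(n_words):
            slot = slot_recognizers[intent](query_v + query_n)
            return intent, slot
    return None, None
```

Running slot recognition only for the matched intention mirrors the text: the slot module is chosen per intention, and the (intention, slot) pair is the final recognition result.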
Referring to fig. 3, fig. 3 is a block diagram of an utterance generalization apparatus according to an embodiment of the present invention. As shown in fig. 3, the utterance generalization apparatus 300 includes:
a first obtaining module 301, configured to obtain N first utterances under an intention, where N is a positive integer;
a second obtaining module 302, configured to perform dependency syntax analysis on the N first utterances, to obtain N dependency syntax analysis results;
a determining module 303, configured to determine a generalization result of the intent according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word.
The determining module 303 includes:
a statistics sub-module, configured to count, in the N dependency syntax analysis results, the occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and a determining sub-module, configured to determine the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the intention.
Further, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
The utterance generalization apparatus 300 applies the method embodiment shown in fig. 1; to avoid repetition, details are not described here again.
The utterance generalization apparatus 300 obtains N first utterances under an intention, where N is a positive integer; performs dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determines the generalization result of the intention according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect; in addition, obtaining the generalization result through dependency syntax analysis achieves automatic generalization.
Referring to fig. 4, fig. 4 is a block diagram of an utterance recognition apparatus according to an embodiment of the present invention. As shown in fig. 4, the utterance recognition apparatus 400 includes:
an obtaining module 401, configured to obtain an input utterance;
a first recognition module 402, configured to perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions;
and a second recognition module 403, configured to perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
Further, the plurality of intentions include a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining the generalization result of the first intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the determining the generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, among the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the first intention.
Further, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Further, the first identifying module 402 includes:
an analysis sub-module, configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
an obtaining sub-module, configured to obtain a first target word belonging to the first relation and a second target word belonging to the second relation in the target dependency syntax analysis result;
and a determining sub-module, configured to determine the second intention as the target intention of the input utterance if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold.
The utterance recognition apparatus 400 obtains an input utterance; performs intention recognition on the input utterance through the intention model to obtain its target intention, where the intention model includes the generalization results of a plurality of intentions; and performs slot recognition on the input utterance according to the target intention to obtain a target slot, the recognition result of the input utterance including the target intention and the target slot. Recognizing the target intention through the intention model, then recognizing the target slot according to that intention, and taking the two as the recognition result improves the recognition accuracy for the input utterance.
Fig. 5 is a schematic diagram of the hardware structure of an electronic device implementing various embodiments of the present invention. As shown in fig. 5, the electronic device 500 includes, but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and a power source 511. It will be appreciated by those skilled in the art that the structure shown in fig. 5 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently. In the embodiment of the invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
In one embodiment of the present application, the processor 510 is configured to: obtain N first utterances under an intention, where N is a positive integer;
perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determine the generalization result of the intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the processor 510 is configured to count, in the N dependency syntax analysis results, the occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determine the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the intention.
Further, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
The electronic device 500 can implement the method in the embodiment shown in fig. 1; to avoid repetition, details are not described here again.
The electronic device 500 of the embodiment of the present invention obtains N first utterances under an intention, where N is a positive integer; performs dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determines the generalization result of the intention according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect; in addition, obtaining the generalization result through dependency syntax analysis achieves automatic generalization.
In another embodiment of the present application, the processor 510 is configured to: obtain an input utterance;
perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes the generalization results of a plurality of intentions;
and perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
Further, the plurality of intentions include a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining the generalization result of the first intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the determining the generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, among the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determining the words in the first relation and the second relation whose statistical counts are greater than N/2 as the generalization result of the first intention.
Further, the first relation includes a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is: the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is: the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Further, the processor 510 is configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquire a first target word belonging to the first relation and a second target word belonging to the second relation in the target dependency syntax analysis result;
and if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold, determine the second intention as the target intention of the input utterance.
The electronic device 500 can implement the method in the embodiment shown in fig. 2; to avoid repetition, details are not described here again.
The electronic device 500 of the embodiment of the invention obtains an input utterance; performs intention recognition on the input utterance through the intention model to obtain its target intention, where the intention model includes the generalization results of a plurality of intentions; and performs slot recognition on the input utterance according to the target intention to obtain a target slot, the recognition result of the input utterance including the target intention and the target slot. Recognizing the target intention through the intention model, then recognizing the target slot according to that intention, and taking the two as the recognition result improves the recognition accuracy for the input utterance.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 501 may be used to receive and send information or signals during a call, specifically, receive downlink data from a base station, and then process the downlink data with the processor 510; and, the uplink data is transmitted to the base station. Typically, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 may also communicate with networks and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user through the network module 502, such as helping the user to send and receive e-mail, browse web pages, access streaming media, and the like.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502 or stored in the memory 509 into an audio signal and output as sound. Also, the audio output unit 503 may also provide audio output (e.g., a call signal reception sound, a message reception sound, etc.) related to a specific function performed by the electronic device 500. The audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
The input unit 504 is used for receiving an audio or video signal. The input unit 504 may include a graphics processor (Graphics Processing Unit, GPU) 5041 and a microphone 5042, the graphics processor 5041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processor 5041 may be stored in the memory 509 (or other storage medium) or transmitted via the radio frequency unit 501 or the network module 502. Microphone 5042 may receive sound and may be capable of processing such sound into audio data. The processed audio data may be converted into a format output that can be transmitted to the mobile communication base station via the radio frequency unit 501 in case of a phone call mode.
The electronic device 500 also includes at least one sensor 505, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 5061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 5061 and/or the backlight when the electronic device 500 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and direction when stationary, and can be used for recognizing the gesture of the electronic equipment (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; the sensor 505 may further include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which are not described herein.
The display unit 506 is used to display information input by a user or information provided to the user. The display unit 506 may include a display panel 5061, and the display panel 5061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 507 is operable to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. Touch panel 5071, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on touch panel 5071 or thereabout using any suitable object or accessory such as a finger, stylus, etc.). Touch panel 5071 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 5071, the user input unit 507 may include other input devices 5072. In particular, other input devices 5072 may include, but are not limited to, physical keyboards, function keys (e.g., volume control keys, switch keys, etc.), trackballs, mice, joysticks, and so forth, which are not described in detail herein.
Further, the touch panel 5071 may be overlaid on the display panel 5061, and when the touch panel 5071 detects a touch operation thereon or thereabout, the touch operation is transmitted to the processor 510 to determine a type of touch event, and then the processor 510 provides a corresponding visual output on the display panel 5061 according to the type of touch event. Although in fig. 5, the touch panel 5071 and the display panel 5061 are two independent components for implementing the input and output functions of the electronic device, in some embodiments, the touch panel 5071 and the display panel 5061 may be integrated to implement the input and output functions of the electronic device, which is not limited herein.
The interface unit 508 is an interface for connecting an external device to the electronic apparatus 500. For example, the external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 508 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 500 or may be used to transmit data between the electronic apparatus 500 and an external device.
The memory 509 may be used to store software programs as well as various data. The memory 509 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the electronic device (such as audio data, a phonebook, etc.). In addition, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 510 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 509, and calling data stored in the memory 509, thereby performing overall monitoring of the electronic device. Processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 510.
The electronic device 500 may also include a power supply 511 (e.g., a battery) for powering the various components, and preferably the power supply 511 may be logically connected to the processor 510 via a power management system that performs functions such as managing charging, discharging, and power consumption.
In addition, the electronic device 500 includes some functional modules, which are not shown, and will not be described herein.
Preferably, an embodiment of the present invention further provides an electronic device, including a processor 510, a memory 509, and a computer program stored in the memory 509 and executable on the processor 510. When executed by the processor 510, the computer program implements each process of the foregoing speech generalization method embodiment, or implements each process of the foregoing speech recognition method embodiment, and can achieve the same technical effects; to avoid repetition, details are not described herein again.
An embodiment of the present invention further provides a computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the foregoing speech generalization method embodiment, or implements each process of the foregoing speech recognition method embodiment, and can achieve the same technical effects; to avoid repetition, details are not described herein again. The computer readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, though in many cases the former is preferred. Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods according to the embodiments of the present invention.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this description.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Under the teaching of the present invention, those of ordinary skill in the art may make many further forms without departing from the spirit of the present invention and the scope of the claims, all of which fall within the protection of the present invention.

Claims (12)

1. A method of speech generalization, comprising:
acquiring N first utterances under an intention, wherein the N first utterances are N expressions of the intention, and N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
determining a generalization result of the intention according to the N dependency syntax analysis results;
wherein the slot of each of the N first utterances is a slot candidate word;
and the determining the generalization result of the intention according to the N dependency syntax analysis results comprises:
counting, in the N dependency syntax analysis results and excluding the slot candidate words, the statistical count of each word belonging to a first relation and the statistical count of each word belonging to a second relation; and
determining the words whose statistical count is greater than N/2 in the first relation and the second relation as the generalization result of the intention.
2. The method of claim 1, wherein the first relation comprises a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
3. A method of speech recognition, comprising:
acquiring an input utterance;
performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, wherein the intention model comprises generalization results of a plurality of intentions;
performing slot recognition on the input utterance according to the target intention to obtain a target slot, wherein a recognition result of the input utterance comprises the target intention and the target slot;
wherein the plurality of intentions comprises a first intention, and the process of acquiring the generalization result of the first intention included in the intention model comprises:
acquiring N first utterances under the first intention, wherein the N first utterances are N expressions of the first intention, and N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
determining a generalization result of the first intention according to the N dependency syntax analysis results;
wherein the slot of each of the N first utterances is a slot candidate word;
and the determining the generalization result of the first intention according to the N dependency syntax analysis results comprises:
counting, in the N dependency syntax analysis results and excluding the slot candidate words, the statistical count of each word belonging to a first relation and the statistical count of each word belonging to a second relation; and
determining the words whose statistical count is greater than N/2 in the first relation and the second relation as the generalization result of the first intention.
4. The method of claim 3, wherein the first relation comprises a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
5. The method of claim 3, wherein the performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance comprises:
performing dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquiring, in the target dependency syntax analysis result, a first target word belonging to the first relation and a second target word belonging to the second relation; and
if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold, determining the second intention as the target intention of the input utterance.
6. A speech generalization apparatus, comprising:
a first acquisition module, configured to acquire N first utterances under an intention, wherein the N first utterances are N expressions of the intention, and N is a positive integer;
a second acquisition module, configured to perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and
a determining module, configured to determine the generalization result of the intention according to the N dependency syntax analysis results;
wherein the slot of each of the N first utterances is a slot candidate word;
and the determining module comprises:
a statistics sub-module, configured to count, in the N dependency syntax analysis results and excluding the slot candidate words, the statistical count of each word belonging to a first relation and the statistical count of each word belonging to a second relation; and
a determining sub-module, configured to determine the words whose statistical count is greater than N/2 in the first relation and the second relation as the generalization result of the intention.
7. The apparatus of claim 6, wherein the first relation comprises a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
8. A speech recognition device, comprising:
an acquisition module, configured to acquire an input utterance;
a first recognition module, configured to perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, wherein the intention model comprises generalization results of a plurality of intentions; and
a second recognition module, configured to perform slot recognition on the input utterance according to the target intention to obtain a target slot, wherein a recognition result of the input utterance comprises the target intention and the target slot;
wherein the plurality of intentions comprises a first intention, and the process of acquiring the generalization result of the first intention included in the intention model comprises:
acquiring N first utterances under the first intention, wherein the N first utterances are N expressions of the first intention, and N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
determining a generalization result of the first intention according to the N dependency syntax analysis results;
wherein the slot of each of the N first utterances is a slot candidate word;
and the determining the generalization result of the first intention according to the N dependency syntax analysis results comprises:
counting, in the N dependency syntax analysis results and excluding the slot candidate words, the statistical count of each word belonging to a first relation and the statistical count of each word belonging to a second relation; and
determining the words whose statistical count is greater than N/2 in the first relation and the second relation as the generalization result of the first intention.
9. The apparatus of claim 8, wherein the first relation comprises a first word, and the statistical count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
10. The apparatus of claim 8, wherein the first recognition module comprises:
an analysis sub-module, configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
an acquisition sub-module, configured to acquire, in the target dependency syntax analysis result, a first target word belonging to the first relation and a second target word belonging to the second relation; and
a determining sub-module, configured to determine a second intention among the plurality of intentions as the target intention of the input utterance if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of the second intention is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold.
11. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the speech generalization method according to claim 1 or 2, or implements the steps of the speech recognition method according to any one of claims 3 to 5.
12. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the speech generalization method according to claim 1 or 2, or implements the steps of the speech recognition method according to any one of claims 3 to 5.
CN201911288675.XA 2019-12-12 2019-12-12 Speaking generalization method, speaking recognition device and electronic equipment Active CN111062200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911288675.XA CN111062200B (en) 2019-12-12 2019-12-12 Speaking generalization method, speaking recognition device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111062200A CN111062200A (en) 2020-04-24
CN111062200B true CN111062200B (en) 2024-03-05

Family

ID=70301559

Country Status (1)

Country Link
CN (1) CN111062200B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069828B (en) * 2020-07-31 2023-07-04 飞诺门阵(北京)科技有限公司 Text intention recognition method and device
CN113343708A (en) * 2021-06-11 2021-09-03 北京声智科技有限公司 Method and device for realizing statement generalization based on semantics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062902A (en) * 2018-08-17 2018-12-21 科大讯飞股份有限公司 A kind of text semantic expression and device
CN109241524A (en) * 2018-08-13 2019-01-18 腾讯科技(深圳)有限公司 Semantic analysis method and device, computer readable storage medium, electronic equipment
CN110069709A (en) * 2019-04-10 2019-07-30 腾讯科技(深圳)有限公司 Intension recognizing method, device, computer-readable medium and electronic equipment
CN110096709A (en) * 2019-05-07 2019-08-06 百度在线网络技术(北京)有限公司 Command processing method and device, server and computer-readable medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10762062B2 (en) * 2016-04-04 2020-09-01 Xerox Corporation Data governance: change management based on contextualized dependencies


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant