CN111062200A - Utterance generalization method, utterance recognition method, apparatus and electronic device - Google Patents

Utterance generalization method, utterance recognition method, apparatus and electronic device

Info

Publication number
CN111062200A
CN111062200A (application CN201911288675.XA)
Authority
CN
China
Prior art keywords
word
term
intention
relationship
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911288675.XA
Other languages
Chinese (zh)
Other versions
CN111062200B (en)
Inventor
游程
苏少炜
陈孝良
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN201911288675.XA priority Critical patent/CN111062200B/en
Publication of CN111062200A publication Critical patent/CN111062200A/en
Application granted granted Critical
Publication of CN111062200B publication Critical patent/CN111062200B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides an utterance generalization method, an utterance recognition method, an apparatus, and an electronic device. The generalization method includes: acquiring N first utterances under an intention, where N is a positive integer; performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determining a generalization result of the intention according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect. In addition, because dependency syntax analysis is used to obtain the generalization result, generalization is performed automatically.

Description

Utterance generalization method, utterance recognition method, apparatus and electronic device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an utterance generalization method, an utterance recognition method, an apparatus, and an electronic device.
Background
In current automatic generalization techniques, a given utterance is generalized by removing stop words: unimportant words in the given utterance, such as "asking" and similar polite fillers, are defined as stop words and removed, and the same stop words are also removed from the user's request sentence at recognition time, so that a certain degree of generalization is achieved.
Because existing automatic generalization relies only on removing stop words, its generalization effect is poor.
Disclosure of Invention
Embodiments of the present invention provide an utterance generalization method, an utterance recognition method, an apparatus, and an electronic device, which aim to solve the problem that generalization methods in the prior art have a poor generalization effect.
In order to solve the technical problem, the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides an utterance generalization method, including:
acquiring N first utterances under an intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining a generalization result of the intention according to the N dependency syntax analysis results.
In a second aspect, an embodiment of the present invention provides an utterance recognition method, including:
acquiring an input utterance;
performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes generalization results of a plurality of intentions;
and performing slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
In a third aspect, an embodiment of the present invention further provides an utterance generalization apparatus, including:
a first obtaining module, configured to acquire N first utterances under an intention, where N is a positive integer;
a second obtaining module, configured to perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and a determining module, configured to determine a generalization result of the intention according to the N dependency syntax analysis results.
In a fourth aspect, an embodiment of the present invention further provides an utterance recognition apparatus, including:
an obtaining module, configured to acquire an input utterance;
a first recognition module, configured to perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes generalization results of a plurality of intentions;
and a second recognition module, configured to perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the utterance generalization method according to the first aspect, or implements the steps of the utterance recognition method according to the second aspect.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the utterance generalization method according to the first aspect, or implements the steps of the utterance recognition method according to the second aspect.
In the embodiments of the present invention, N first utterances under an intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and a generalization result of the intention is determined according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect. In addition, because dependency syntax analysis is used to obtain the generalization result, generalization is performed automatically.
Drawings
FIG. 1 is a flowchart of an utterance generalization method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of an utterance recognition method provided by an embodiment of the present invention;
FIG. 3 is a block diagram of an utterance generalization apparatus provided by an embodiment of the present invention;
FIG. 4 is a block diagram of an utterance recognition apparatus provided by an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of an utterance generalization method according to an embodiment of the present invention. As shown in FIG. 1, this embodiment provides an utterance generalization method applied to an utterance generalization apparatus, including the following steps:
Step 101: acquiring N first utterances under an intention, where N is a positive integer.
Specifically, an intention may have multiple expressions, and one intention corresponds to N first utterances. The N first utterances may be written manually. For the slots in a first utterance, every slot can be replaced either with a specific placeholder word (such as "slot") or with a concrete word. For example, for the intention of booking an air ticket, the first utterances may be: "I want to book a {time} air ticket from {origin} to {destination}"; "Help me book an air ticket to {destination}"; and so on, where the parts in "{ }" are slots that can be replaced with words from the corresponding slot dictionaries. For example, a time slot can be replaced with a word from the time-slot dictionary: if "today" is in the time-slot dictionary, the time slot in a first utterance can be replaced with "today". If the location-slot dictionary contains "Beijing" and "Turkey", the origin slot can be replaced with "Beijing" and the destination slot with "Turkey".
After the slots in the first utterances are replaced with words from the slot dictionaries, the first utterances become, for example: "I want to book today's air ticket from Beijing to Turkey" and "Help me book an air ticket to Turkey". This replacement is merely an example and is not limiting: the time slot may be replaced with any other word in the time-slot dictionary instead of "today", and likewise the location slots may be replaced with other words from the location-slot dictionary instead of "Beijing" or "Turkey".
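For illustration only, the following is a minimal sketch of the slot-replacement step described above. The slot names, slot dictionaries, and template utterances are hypothetical and not taken from the patent; the patent only requires that each slot have a dictionary of candidate words.

```python
import random

# Hypothetical slot dictionaries for illustration.
SLOT_DICTIONARIES = {
    "time": ["today", "tomorrow"],
    "origin": ["Beijing", "Shanghai"],
    "destination": ["Turkey", "Tokyo"],
}

# Manually written first utterances for a "book an air ticket" intention,
# with slots marked as {slot_name}.
FIRST_UTTERANCES = [
    "I want to book a {time} air ticket from {origin} to {destination}",
    "Help me book an air ticket to {destination}",
]

def fill_slots(template: str) -> str:
    """Replace every slot with a randomly chosen word from its slot dictionary."""
    filled = template
    for slot, words in SLOT_DICTIONARIES.items():
        filled = filled.replace("{" + slot + "}", random.choice(words))
    return filled

if __name__ == "__main__":
    for utterance in FIRST_UTTERANCES:
        print(fill_slots(utterance))
```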
Step 102: performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results.
Dependency syntax analysis is performed on each of the N first utterances to obtain one dependency syntax analysis result, so the N first utterances correspond to N dependency syntax analysis results. Before dependency syntax analysis is performed on a first utterance, the utterance is segmented into words and part-of-speech tagging is performed on the words; dependency syntax analysis is then applied to obtain the utterance's dependency syntax analysis result. Dependency parsing (DP) reveals the syntactic structure of a linguistic unit by analyzing the dependencies between its components. Dependency parsing identifies the grammatical components of a sentence (which here can be understood as an utterance), such as the subject, predicate, and object and the attributive, adverbial, and complement modifiers, and analyzes the relationships between these components.
For example, for the utterance "the apple was eaten by me", the word segmentation result is: "apple", the passive marker 被, "me", "eat", and the aspect particle 了; the part of speech of "apple" is n (noun), of 被 is p (preposition), of "me" is r (pronoun), of "eat" is v (verb), and of 了 is u (particle). After dependency syntax analysis, the result is: the relation between "apple" and "eat" is a fronted object (FOB), the relation between 被 and "eat" is an adverbial structure (ADV), the relation between "me" and 被 is a preposition-object relation (POB), and the relation between 了 and "eat" is a right adjunct relation (RAD). "Eat" is the head of the whole sentence.
In this step, the N first utterances may be input into a dependency syntax analysis model to obtain the dependency syntax analysis results. The model analyzes grammatical components such as the subject, predicate, and object and the attributive, adverbial, and complement modifiers of each first utterance, and analyzes the relationships between the components to obtain the dependency syntax analysis result. The dependency syntax analysis model may be trained on a corpus in the same language as the first utterances: if the first utterances are Chinese, the model is trained on a Chinese corpus; if they are English, it is trained on an English corpus.
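For illustration, a minimal sketch of obtaining a dependency syntax analysis result with an off-the-shelf parser. The patent does not name a specific dependency parsing model; spaCy is used here only as one example, and its Universal Dependencies labels (nsubj, dobj, ...) differ from the FOB/ADV/POB/RAD-style labels used in the example above.

```python
import spacy

# One possible off-the-shelf dependency parser (assumption, not the patent's model).
nlp = spacy.load("en_core_web_sm")

def parse_utterance(utterance: str):
    """Return (word, part of speech, head word, dependency relation) per token."""
    doc = nlp(utterance)
    return [(tok.text, tok.pos_, tok.head.text, tok.dep_) for tok in doc]

if __name__ == "__main__":
    for row in parse_utterance("The apple was eaten by me"):
        print(row)
```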
Step 103: determining a generalization result of the intention according to the N dependency syntax analysis results.
Specifically, the generalization result of the intention is determined from the N dependency syntax analysis results. For example, the words under specified relations in the dependency syntax analysis results are grouped by relation, the number of occurrences of each word within each group is counted, and a word whose count exceeds a preset threshold is taken as part of the intention's generalization result.
In this embodiment, N first utterances under an intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and a generalization result of the intention is determined according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect. In addition, because dependency syntax analysis is used to obtain the generalization result, generalization is performed automatically.
In an embodiment of the present application, the slot of each of the N first utterances is a slot candidate word;
step 103, determining a generalization result of the intention according to the N dependency syntax analysis results, includes:
counting, in the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determining the words in the first relation and the second relation whose counts are greater than N/2 as the generalization result of the intention.
Specifically, a slot candidate word may be a specific placeholder word, such as "slot", or a word from the slot dictionary. For example, for a time slot, the slot candidate word may be a word in the time-slot dictionary; for a location slot, the slot candidate word may be a word in the location-slot dictionary.
The slot of each of the N first utterances is replaced with a word from the slot dictionary corresponding to that slot, so that the slot of each first utterance is a slot candidate word. When a slot is replaced, a word may be drawn at random from the corresponding slot dictionary and used for the replacement.
After the dependency syntax analysis result of a first utterance is obtained, the words belonging to the first relation and the words belonging to the second relation in the result are extracted. The first relation may be the head relation, the verb-object relation, and so on, and the second relation may be the direct object, indirect object, fronted object, and so on, which are not limited here. The words counted for the first relation and the second relation do not include slot candidate words.
For each of the N first utterances, the words belonging to the first relation and the words belonging to the second relation in its dependency syntax analysis result are extracted. The extracted words are then counted, and the words in the first relation and the second relation whose counts are greater than N/2 are determined as the generalization result of the intention.
In this embodiment, generalization of the utterances is achieved by counting, in the N dependency syntax analysis results, the occurrences of each word belonging to the first relation and each word belonging to the second relation (excluding slot candidate words) and determining the words whose counts exceed N/2 as the intention's generalization result; a good generalization effect can be achieved with only a small amount of corpus (i.e., the first utterances).
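A minimal sketch of this counting and N/2 thresholding step. The data layout (a dict of already-extracted first-relation and second-relation word lists per utterance) is an assumption made for illustration, not an API defined by the patent.

```python
from collections import Counter

def generalize(parse_results, slot_candidates):
    """Sketch of Step 103 under an assumed data layout.

    parse_results: one dict per first utterance, with the words already extracted
    from its dependency tree, e.g.
        {"first_relation": ["book", "want"], "second_relation": ["ticket"]}
    slot_candidates: set of slot candidate words to exclude from counting.
    """
    n = len(parse_results)                       # N first utterances
    first_counts, second_counts = Counter(), Counter()
    for result in parse_results:
        for word in result["first_relation"]:
            if word not in slot_candidates:
                first_counts[word] += 1
        for word in result["second_relation"]:
            if word not in slot_candidates:
                second_counts[word] += 1
    # Keep words whose count exceeds N/2 as the intention's generalization result.
    keep = lambda counts: [w for w, c in counts.items() if c > n / 2]
    return {"first_relation": keep(first_counts),
            "second_relation": keep(second_counts)}
```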
In one embodiment of the present application, the first relation includes a first word, and the count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Specifically, the semantic similarity between the words in the first relation is calculated. If the similarity between two words exceeds the first threshold, the count of each of those words becomes the number of times it appears in the first relation plus the number of times the other word appears in the first relation. That is, the count of the first word is the number of times the first word appears in the first relation plus its semantic count, where the semantic count is the number of occurrences in the first relation of words whose semantic similarity to the first word is greater than the first threshold.
Similarly, the words in the second relation are processed in the same way: the second relation includes the second word, and the count of the second word is the number of times it appears in the second relation plus its semantic count, where the semantic count is the number of occurrences in the second relation of words whose semantic similarity to the second word is greater than the second threshold. The first threshold and the second threshold may be set according to the actual situation and are not limited here.
A first word may belong to the first relation in only one first utterance's dependency syntax analysis result, or in several of them, so the first word may appear in the first relation one or more times; this is its number of occurrences in the first relation. Likewise, a second word may belong to the second relation in only one dependency syntax analysis result or in all of them, so it may appear in the second relation one or more times; this is its number of occurrences in the second relation.
In this embodiment, when the count of the first word is determined, the number of occurrences of the first word in the first relation is counted and the occurrences of words in the first relation whose semantic similarity to the first word exceeds the first threshold are added to it. This strengthens the generalization ability over the first utterances and achieves generalization by semantic similarity, further improving the generalization effect.
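A sketch of the similarity-boosted count described above. The word-level similarity function is assumed (for example, cosine similarity over word embeddings); the patent does not prescribe a particular similarity measure.

```python
from itertools import combinations

def boosted_counts(raw_counts, similarity, threshold):
    """Add to each word's count the occurrences of semantically similar words.

    raw_counts: dict word -> number of occurrences within one relation.
    similarity: assumed word-level similarity function returning a float.
    threshold: the first (or second) threshold from the embodiment above.
    """
    boosted = dict(raw_counts)
    for w1, w2 in combinations(raw_counts, 2):
        if similarity(w1, w2) > threshold:
            boosted[w1] += raw_counts[w2]   # add occurrences of the similar word w2
            boosted[w2] += raw_counts[w1]   # and symmetrically for w2
    return boosted
```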
Referring to FIG. 2, FIG. 2 is a flowchart of an utterance recognition method according to an embodiment of the present invention. As shown in FIG. 2, this embodiment provides an utterance recognition method applied to an utterance recognition apparatus, including the following steps:
Step 201: acquiring an input utterance.
The input utterance may be a query statement input by a user.
Step 202, performing intention identification on the input utterance through an intention model to obtain a target intention of the input utterance, wherein the intention model comprises generalization results of a plurality of intentions.
The intention model includes the generalization results of a plurality of intentions, where the generalization result of at least one of the intentions is obtained by the utterance generalization method of the embodiment shown in FIG. 1. Intention recognition is performed on the input utterance through the intention model to obtain the target intention of the input utterance.
Step 203: performing slot recognition on the input utterance according to the target intention to obtain a target slot.
According to the target intention, the corresponding slot recognition module is used to perform slot recognition on the input utterance to obtain the target slot, and the recognition result of the input utterance includes the target intention and the target slot.
In the utterance recognition method of this embodiment of the invention, an input utterance is acquired; intention recognition is performed on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes generalization results of a plurality of intentions; and slot recognition is performed on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot. Because the target intention of the input utterance is identified through the intention model, the target slot is then identified according to the target intention, and the target intention and target slot are taken together as the recognition result, the recognition accuracy for the input utterance can be improved.
In one embodiment of the present application, the plurality of intentions includes a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining a generalization result of the first intention according to the N dependency syntax analysis results.
Specifically, the first intention may have multiple expressions, and the first intention may correspond to N first utterances, which may be written manually. For the slots in a first utterance, every slot can be replaced with a concrete word. For example, for a first intention of booking an air ticket, the first utterances may be: "I want to book a {time} air ticket from {origin} to {destination}"; "Help me book an air ticket to {destination}"; and so on, where the parts in "{ }" are slots that can be replaced with words from the corresponding slot dictionaries. For example, a time slot can be replaced with a word from the time-slot dictionary: if "today" is in the time-slot dictionary, the time slot in a first utterance can be replaced with "today". If the location-slot dictionary contains "Beijing" and "Turkey", the origin slot can be replaced with "Beijing" and the destination slot with "Turkey".
After the slots in the first utterances are replaced with words from the slot dictionaries, the first utterances become, for example: "I want to book today's air ticket from Beijing to Turkey" and "Help me book an air ticket to Turkey". This replacement is merely an example and is not limiting: the time slot may be replaced with any other word in the time-slot dictionary instead of "today", and likewise the location slots may be replaced with other words from the location-slot dictionary instead of "Beijing" or "Turkey".
Dependency syntax analysis is performed on each of the N first utterances to obtain one dependency syntax analysis result, so the N first utterances correspond to N dependency syntax analysis results. Before dependency syntax analysis is performed on a first utterance, the utterance is segmented into words and part-of-speech tagging is performed on the words; dependency syntax analysis is then applied to obtain the utterance's dependency syntax analysis result. Dependency parsing (DP) reveals the syntactic structure of a linguistic unit by analyzing the dependencies between its components. Dependency parsing identifies the grammatical components of a sentence (which here can be understood as an utterance), such as the subject, predicate, and object and the attributive, adverbial, and complement modifiers, and analyzes the relationships between these components.
For example, for the utterance "the apple was eaten by me", the word segmentation result is: "apple", the passive marker 被, "me", "eat", and the aspect particle 了; the part of speech of "apple" is n (noun), of 被 is p (preposition), of "me" is r (pronoun), of "eat" is v (verb), and of 了 is u (particle). After dependency syntax analysis, the result is: the relation between "apple" and "eat" is a fronted object (FOB), the relation between 被 and "eat" is an adverbial structure (ADV), the relation between "me" and 被 is a preposition-object relation (POB), and the relation between 了 and "eat" is a right adjunct relation (RAD). "Eat" is the head of the whole sentence.
The N first utterances may be input into a dependency syntax analysis model to obtain the dependency syntax analysis results. The model analyzes grammatical components such as the subject, predicate, and object and the attributive, adverbial, and complement modifiers of each first utterance, and analyzes the relationships between the components to obtain the dependency syntax analysis result. The dependency syntax analysis model may be trained on a corpus in the same language as the first utterances: if the first utterances are Chinese, the model is trained on a Chinese corpus; if they are English, it is trained on an English corpus.
A generalization result of the first intention is determined according to the N dependency syntax analysis results. For example, the words under specified relations in the dependency syntax analysis results are grouped by relation, the number of occurrences of each word within each group is counted, and a word whose count exceeds a preset threshold is taken as part of the first intention's generalization result.
In this embodiment, N first utterances under the first intention are acquired, where N is a positive integer; dependency syntax analysis is performed on the N first utterances to obtain N dependency syntax analysis results; and a generalization result of the first intention is determined according to the N dependency syntax analysis results. Determining the first intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect. In addition, because dependency syntax analysis is used to obtain the generalization result of the first intention, generalization is performed automatically.
In an embodiment of the present application, the slot of each of the N first utterances is a slot candidate word;
the determining a generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, in the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determining the words in the first relation and the second relation whose counts are greater than N/2 as the generalization result of the first intention.
Specifically, a slot candidate word may be a specific placeholder word, such as "slot", or a word from the slot dictionary. For example, for a time slot, the slot candidate word may be a word in the time-slot dictionary; for a location slot, the slot candidate word may be a word in the location-slot dictionary.
The slot of each of the N first utterances is replaced with a word from the slot dictionary corresponding to that slot, so that the slot of each first utterance is a slot candidate word. When a slot is replaced, a word may be drawn at random from the corresponding slot dictionary and used for the replacement.
After the dependency syntax analysis result of a first utterance is obtained, the words belonging to the first relation and the words belonging to the second relation in the result are extracted. The first relation may be the head relation, the verb-object relation, and so on, and the second relation may be the direct object, indirect object, fronted object, and so on, which are not limited here. The words counted for the first relation and the second relation do not include slot candidate words.
For each of the N first utterances, the words belonging to the first relation and the words belonging to the second relation in its dependency syntax analysis result are extracted. The extracted words are then counted, and the words in the first relation and the second relation whose counts are greater than N/2 are determined as the generalization result of the first intention.
In this embodiment, generalization of the utterances is achieved by counting, in the N dependency syntax analysis results, the occurrences of each word belonging to the first relation and each word belonging to the second relation (excluding slot candidate words) and determining the words whose counts exceed N/2 as the first intention's generalization result; a good generalization effect can be achieved with only a small amount of corpus (i.e., the first utterances).
In one embodiment of the present application, the first relation includes a first word, and the count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Specifically, the semantic similarity between the words in the first relation is calculated. If the similarity between two words exceeds the first threshold, the count of each of those words becomes the number of times it appears in the first relation plus the number of times the other word appears in the first relation. That is, the count of the first word is the number of times the first word appears in the first relation plus its semantic count, where the semantic count is the number of occurrences in the first relation of words whose semantic similarity to the first word is greater than the first threshold.
Similarly, the words in the second relation are processed in the same way: the second relation includes the second word, and the count of the second word is the number of times it appears in the second relation plus its semantic count, where the semantic count is the number of occurrences in the second relation of words whose semantic similarity to the second word is greater than the second threshold. The first threshold and the second threshold may be set according to the actual situation and are not limited here.
A first word may belong to the first relation in only one first utterance's dependency syntax analysis result, or in several of them, so the first word may appear in the first relation one or more times; this is its number of occurrences in the first relation. Likewise, a second word may belong to the second relation in only one dependency syntax analysis result or in all of them, so it may appear in the second relation one or more times; this is its number of occurrences in the second relation.
In this embodiment, when the count of the first word is determined, the number of occurrences of the first word in the first relation is counted and the occurrences of words in the first relation whose semantic similarity to the first word exceeds the first threshold are added to it. This strengthens the generalization ability over the first utterances and achieves generalization by semantic similarity, further improving the generalization effect.
In an embodiment of the present application, step 202, performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, includes:
performing dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquiring, from the target dependency syntax analysis result, a first target word belonging to the first relation and a second target word belonging to the second relation;
and if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold, determining the second intention as the target intention of the input utterance.
First, dependency syntax analysis is performed on the input utterance to obtain the target dependency syntax analysis result; then the first target word belonging to the first relation and the second target word belonging to the second relation are extracted from the target dependency syntax analysis result; these are compared with the generalization result of each intention, that is, semantic similarity is calculated between the first target word and the words belonging to the first relation in each intention's generalization result, and between the second target word and the words belonging to the second relation in each intention's generalization result. If the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention is greater than the third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in that generalization result is greater than the fourth threshold, the second intention is determined as the target intention of the input utterance. The third threshold and the fourth threshold may be set according to the actual situation and are not limited here.
In this embodiment, determining the second intention as the target intention only when both similarity conditions hold improves the accuracy of target-intention recognition.
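A sketch of this target-intention matching step. It reuses the assumed similarity() helper and the per-intention generalization results produced by the generalize() sketch above; the threshold values are illustrative, not taken from the patent.

```python
def recognize_intent(query_parse, intent_models, similarity,
                     third_threshold=0.8, fourth_threshold=0.8):
    """Match the query's first/second-relation words against each intention.

    query_parse: {"first_relation": [...], "second_relation": [...]} for the query.
    intent_models: dict intent name -> generalization result of that intention,
    in the same format as the output of generalize() above.
    """
    for intent, model in intent_models.items():
        v_hit = any(similarity(q, w) > third_threshold
                    for q in query_parse["first_relation"]
                    for w in model["first_relation"])
        n_hit = any(similarity(q, w) > fourth_threshold
                    for q in query_parse["second_relation"]
                    for w in model["second_relation"])
        if v_hit and n_hit:          # both "v" and "n" matched this intention
            return intent
    return None
```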
The specific process of the above utterance recognition method is described in detail below.
First, the intention model is built. The utterances provided by the user are preprocessed: all of the user's utterances are stored grouped by intention, and then, for the utterances under each intention, every slot is replaced with a specific placeholder identifier, such as "slot".
Syntactic analysis modeling of the utterances: each intention's utterance set is input into the dependency syntax analysis model to obtain dependency syntax analysis results; the head-relation and verb-object-relation words in each dependency syntax analysis result are extracted as the "v" of the corresponding utterance, and the direct-object, indirect-object, and fronted-object words are extracted as its "n". The following statistics and modeling are then performed over all utterances under each intention:
the occurrences of the words in all the "v" sets (which can be understood as the first relation) and in all the "n" sets (which can be understood as the second relation) are counted separately;
the semantic similarity between each word in "v" and the other words is calculated; if the similarity exceeds the first threshold, the occurrences of the similar word are added to the word's count, and the same is done for "n";
the words in "v" and "n" whose counts (which can be understood as the statistical counts) are at least half of the total number of utterances under the intention, and which are not the specific placeholder identifier (which can be understood as the slot candidate word), are kept as the modeling result of the intention (which can be understood as the intention's generalization result);
the modeling results of each intention are stored separately.
Second, dependency syntax analysis is performed on the user's query utterance (query). The query is input into the dependency syntax analysis model to obtain its dependency syntax analysis result; the words under the first relation (e.g., the head relation and verb-object relation) are extracted as the "v" of the sentence, and the words under the second relation (e.g., the direct object, indirect object, and fronted object) are extracted as its "n".
Third, the modeled query result is matched against the intention model: all stored intention modeling results are loaded; the "v" and "n" of the query are matched against the "v" and "n" of each intention, and a match is counted if the words are the same or their semantic similarity exceeds the threshold; an intention for which both "v" and "n" match is determined as the target intention.
Finally, for the target intention, the corresponding slot recognition module is used to perform slot recognition and obtain the target slot, and the target intention and the target slot are taken as the recognition result of the query.
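The patent leaves the internals of the slot recognition module unspecified; for illustration only, a simple dictionary-matching sketch under that assumption, reusing the hypothetical slot dictionaries from the earlier sketch.

```python
def recognize_slots(query, slot_dictionaries):
    """Return slot name -> matched word for every dictionary word found in the query."""
    slots = {}
    for slot, words in slot_dictionaries.items():
        for word in words:
            if word in query:
                slots[slot] = word
    return slots

# Example (hypothetical dictionary):
# recognize_slots("Help me book an air ticket to Turkey",
#                 {"destination": ["Turkey", "Tokyo"]})
# -> {"destination": "Turkey"}
```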
Referring to FIG. 3, FIG. 3 is a block diagram of an utterance generalization apparatus according to an embodiment of the present invention. As shown in FIG. 3, the utterance generalization apparatus 300 includes:
a first obtaining module 301, configured to acquire N first utterances under an intention, where N is a positive integer;
a second obtaining module 302, configured to perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and a determining module 303, configured to determine a generalization result of the intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the determining module 303 includes:
a counting submodule, configured to count, in the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and a determining submodule, configured to determine the words in the first relation and the second relation whose counts are greater than N/2 as the generalization result of the intention.
Further, the first relation includes a first word, and the count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
The utterance generalization apparatus 300 is an apparatus to which the method embodiment shown in FIG. 1 is applied; details are not repeated here to avoid repetition.
The utterance generalization apparatus 300 of this embodiment of the invention acquires N first utterances under an intention, where N is a positive integer; performs dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determines a generalization result of the intention according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect. In addition, because dependency syntax analysis is used to obtain the generalization result, generalization is performed automatically.
Referring to FIG. 4, FIG. 4 is a block diagram of an utterance recognition apparatus according to an embodiment of the present invention. As shown in FIG. 4, the utterance recognition apparatus 400 includes:
an obtaining module 401, configured to acquire an input utterance;
a first recognition module 402, configured to perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes generalization results of a plurality of intentions;
and a second recognition module 403, configured to perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
Further, the plurality of intentions includes a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining a generalization result of the first intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the determining a generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, in the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determining the words in the first relation and the second relation whose counts are greater than N/2 as the generalization result of the first intention.
Further, the first relation includes a first word, and the count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Further, the first recognition module 402 includes:
an analysis submodule, configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
an obtaining submodule, configured to acquire, from the target dependency syntax analysis result, a first target word belonging to the first relation and a second target word belonging to the second relation;
and a determining submodule, configured to determine a second intention as the target intention of the input utterance if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of the second intention is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold.
The utterance recognition apparatus of this embodiment of the invention acquires an input utterance; performs intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes generalization results of a plurality of intentions; and performs slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot. Because the target intention of the input utterance is identified through the intention model, the target slot is then identified according to the target intention, and the target intention and target slot are taken together as the recognition result, the recognition accuracy for the input utterance can be improved.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device for implementing various embodiments of the present invention, and as shown in fig. 5, the electronic device 500 includes, but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and a power supply 511. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 5 does not constitute a limitation of the electronic device, and that the electronic device may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
In an embodiment of the present application, the processor 510 is configured to acquire N first utterances under an intention, where N is a positive integer;
perform dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determine a generalization result of the intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the processor 510 is configured to count, in the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determine the words in the first relation and the second relation whose counts are greater than N/2 as the generalization result of the intention.
Further, the first relation includes a first word, and the count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
In this embodiment, the electronic device 500 can implement the method of the embodiment shown in FIG. 1; details are not repeated here to avoid repetition.
The electronic device 500 of this embodiment of the invention acquires N first utterances under an intention, where N is a positive integer; performs dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results; and determines a generalization result of the intention according to the N dependency syntax analysis results. Determining the intention's generalization result from the dependency syntax analysis results of the N first utterances improves the generalization effect. In addition, because dependency syntax analysis is used to obtain the generalization result, generalization is performed automatically.
In another embodiment of the present application, the processor 510 is configured to acquire an input utterance;
perform intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, where the intention model includes generalization results of a plurality of intentions;
and perform slot recognition on the input utterance according to the target intention to obtain a target slot, where the recognition result of the input utterance includes the target intention and the target slot.
Further, the plurality of intentions includes a first intention, and the process of obtaining the generalization result of the first intention included in the intention model includes:
acquiring N first utterances under the first intention, where N is a positive integer;
performing dependency syntax analysis on the N first utterances to obtain N dependency syntax analysis results;
and determining a generalization result of the first intention according to the N dependency syntax analysis results.
Further, the slot of each of the N first utterances is a slot candidate word;
the determining a generalization result of the first intention according to the N dependency syntax analysis results includes:
counting, in the N dependency syntax analysis results, the number of occurrences of each word belonging to the first relation and of each word belonging to the second relation, excluding the slot candidate words;
and determining the words in the first relation and the second relation whose counts are greater than N/2 as the generalization result of the first intention.
Further, the first relation includes a first word, and the count of the first word is: the number of times the first word appears in the first relation plus the semantic count of the first word, where the semantic count of the first word is the number of occurrences, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
the second relation includes a second word, and the count of the second word is: the number of times the second word appears in the second relation plus the semantic count of the second word, where the semantic count of the second word is the number of occurrences, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
Further, the processor 510 is configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquire, from the target dependency syntax analysis result, a first target word belonging to the first relation and a second target word belonging to the second relation;
and if the semantic similarity between the first target word and a third word belonging to the first relation in the generalization result of a second intention among the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to the second relation in the generalization result of the second intention is greater than a fourth threshold, determine the second intention as the target intention of the input utterance.
The electronic device 500 can implement the method in the embodiment shown in fig. 2, and is not described herein again to avoid repetition.
The electronic device 500 of the embodiment of the present invention acquires an input utterance; performs intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, wherein the intention model comprises generalization results of a plurality of intentions; and performs slot position identification on the input utterance according to the target intention to obtain a target slot position, wherein the recognition result of the input utterance comprises the target intention and the target slot position. Because the target intention of the input utterance is identified through the intention model, the target slot position is then identified according to the target intention, and the target intention and the target slot position together serve as the recognition result of the input utterance, the recognition accuracy for the input utterance can be improved.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 501 may be used for receiving and sending signals during a message sending and receiving process or a call process; specifically, it receives downlink data from a base station and then sends the received downlink data to the processor 510 for processing, and it transmits uplink data to the base station. In general, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 can also communicate with a network and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 502, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502, or stored in the memory 509, into an audio signal and output it as sound. The audio output unit 503 may also provide audio output related to a specific function performed by the electronic device 500 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
The input unit 504 is used to receive an audio or video signal. The input unit 504 may include a Graphics Processing Unit (GPU) 5041 and a microphone 5042; the graphics processor 5041 processes image data of a still picture or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processor 5041 may be stored in the memory 509 (or other storage medium) or transmitted via the radio frequency unit 501 or the network module 502. The microphone 5042 may receive sounds and process them into audio data. In the phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 501 and then output.
The electronic device 500 also includes at least one sensor 505, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 5061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 5061 and/or the backlight when the electronic device 500 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the electronic device (such as switching between horizontal and vertical screens, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer and tapping); the sensor 505 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described in detail herein.
The display unit 506 is used to display information input by the user or information provided to the user. The Display unit 506 may include a Display panel 5061, and the Display panel 5061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 507 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. The touch panel 5071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations by the user on or near the touch panel 5071 using a finger, a stylus, or any suitable object or attachment). The touch panel 5071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects a signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 5071, the user input unit 507 may include other input devices 5072. In particular, the other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, a switch key, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
Further, the touch panel 5071 may be overlaid on the display panel 5061, and when the touch panel 5071 detects a touch operation on or near it, the touch operation is transmitted to the processor 510 to determine the type of the touch event, and then the processor 510 provides a corresponding visual output on the display panel 5061 according to the type of the touch event. Although in fig. 5 the touch panel 5071 and the display panel 5061 are two independent components implementing the input and output functions of the electronic device, in some embodiments the touch panel 5071 and the display panel 5061 may be integrated to implement the input and output functions of the electronic device, which is not limited herein.
The interface unit 508 is an interface for connecting an external device to the electronic apparatus 500. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 508 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the electronic apparatus 500 or may be used to transmit data between the electronic apparatus 500 and external devices.
The memory 509 may be used to store software programs as well as various data. The memory 509 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 509 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 510 is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, thereby performing overall monitoring of the electronic device. Processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 510.
The electronic device 500 may further include a power supply 511 (e.g., a battery) for supplying power to various components, and preferably, the power supply 511 may be logically connected to the processor 510 via a power management system, so as to implement functions of managing charging, discharging, and power consumption via the power management system.
In addition, the electronic device 500 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present invention further provides an electronic device, which includes a processor 510, a memory 509, and a computer program stored in the memory 509 and capable of running on the processor 510, where the computer program, when executed by the processor 510, implements each process of the above-mentioned phonetics generalization method embodiment, or, when executed by the processor 510, implements each process of the above-mentioned speech recognition method embodiment, and can achieve the same technical effect; to avoid repetition, details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements each process of the above-mentioned phonetics generalization method embodiment, or, when executed by the processor, implements each process of the above-mentioned speech recognition method embodiment, and can achieve the same technical effect; to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between them, such combinations should be considered to be within the scope of this specification.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (18)

1. A method of phonetics generalization comprising:
acquiring N first dialects under an intention, wherein N is a positive integer;
performing dependency syntax analysis on the N first dialects to obtain N dependency syntax analysis results;
and determining a generalization result of the intention according to the N dependency syntax analysis results.
2. The method of claim 1, wherein the slot position of each of the N first dialects is a slot position candidate word;
the determining a generalization result of the intention according to the N dependency syntax analysis results comprises:
counting statistical times of each word, other than the slot position candidate words, belonging to a first relation in the N dependency syntax analysis results, and statistical times of each word belonging to a second relation;
and determining, as the generalization result of the intention, the words in the first relation and the second relation whose statistical times are greater than N/2.
3. The method of claim 2, wherein the first relation comprises a first word, and the statistical times of the first word is the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of appearances, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical times of the second word is the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of appearances, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
4. A method of speech recognition, comprising:
acquiring an input utterance;
performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance, wherein the intention model comprises generalization results of a plurality of intentions;
and according to the target intention, performing slot position identification on the input utterance to obtain a target slot position, wherein a recognition result of the input utterance comprises the target intention and the target slot position.
5. The method of claim 4, wherein the plurality of intentions includes a first intention, and the process of obtaining the generalization result of the first intention included in the intention model comprises:
acquiring N first dialects under the first intention, wherein N is a positive integer;
performing dependency syntax analysis on the N first dialects to obtain N dependency syntax analysis results;
and determining a generalization result of the first intention according to the N dependency syntax analysis results.
6. The method of claim 5, wherein the slot position of each of the N first dialects is a slot position candidate word;
the determining a generalization result of the first intention according to the N dependency syntax analysis results comprises:
counting statistical times of each word, other than the slot position candidate words, belonging to a first relation in the N dependency syntax analysis results, and statistical times of each word belonging to a second relation;
and determining, as the generalization result of the first intention, the words in the first relation and the second relation whose statistical times are greater than N/2.
7. The method of claim 6, wherein the first relation comprises a first word, and the statistical times of the first word is the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of appearances, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical times of the second word is the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of appearances, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
8. The method of claim 6, wherein the performing intention recognition on the input utterance through an intention model to obtain a target intention of the input utterance comprises:
performing dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
acquiring a first target word belonging to a first relation and a second target word belonging to a second relation in the target dependency syntax analysis result;
and if the semantic similarity between the first target word and a third word belonging to a first relation in the generalization result of a second intention of the plurality of intentions is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to a second relation in the generalization result of the second intention is greater than a fourth threshold, determining the second intention as the target intention of the input utterance.
9. A speech generalization apparatus comprising:
a first obtaining module, configured to obtain N first dialects under an intention, where N is a positive integer;
a second obtaining module, configured to perform dependency parsing on the N first dialects to obtain N dependency parsing results;
and a determining module, configured to determine a generalization result of the intention according to the N dependency syntax analysis results.
10. The apparatus of claim 9, wherein the slot position of each of the N first dialects is a slot position candidate word;
the determining module includes:
a counting submodule, configured to count statistical times of each word, other than the slot position candidate words, belonging to a first relation in the N dependency syntax analysis results, and statistical times of each word belonging to a second relation;
and a determining submodule, configured to determine, as the generalization result of the intention, the words in the first relation and the second relation whose statistical times are greater than N/2.
11. The apparatus of claim 10, wherein the first relation comprises a first word, and the statistical times of the first word is the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of appearances, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical times of the second word is the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of appearances, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
12. A speech recognition apparatus, comprising:
an acquisition module, configured to acquire an input utterance;
a first identification module, configured to perform intent identification on the input utterance through an intent model to obtain a target intent of the input utterance, where the intent model includes generalization results of a plurality of intents;
and a second identification module, configured to perform slot position identification on the input utterance according to the target intention to obtain a target slot position, wherein a recognition result of the input utterance comprises the target intention and the target slot position.
13. The apparatus of claim 12, wherein the plurality of intentions includes a first intention, and the process of obtaining the generalization result of the first intention included in the intention model comprises:
acquiring N first dialects under the first intention, wherein N is a positive integer;
performing dependency syntax analysis on the N first dialects to obtain N dependency syntax analysis results;
and determining a generalization result of the first intention according to the N dependency syntax analysis results.
14. The apparatus of claim 13, wherein the slot position of each of the N first dialects is a slot position candidate word;
the determining a generalization result of the first intention according to the N dependency syntax analysis results comprises:
counting statistical times of each word, other than the slot position candidate words, belonging to a first relation in the N dependency syntax analysis results, and statistical times of each word belonging to a second relation;
and determining, as the generalization result of the first intention, the words in the first relation and the second relation whose statistical times are greater than N/2.
15. The apparatus of claim 14, wherein the first relation comprises a first word, and the statistical times of the first word is the number of times the first word appears in the first relation plus the semantic count of the first word, wherein the semantic count of the first word is the number of appearances, in the first relation, of words whose semantic similarity to the first word is greater than a first threshold;
and the second relation comprises a second word, and the statistical times of the second word is the number of times the second word appears in the second relation plus the semantic count of the second word, wherein the semantic count of the second word is the number of appearances, in the second relation, of words whose semantic similarity to the second word is greater than a second threshold.
16. The apparatus of claim 14, wherein the first identification module comprises:
an analysis submodule, configured to perform dependency syntax analysis on the input utterance to obtain a target dependency syntax analysis result;
an obtaining submodule, configured to obtain a first target word belonging to a first relation and a second target word belonging to a second relation in the target dependency syntax analysis result;
and a determining submodule, configured to determine a second intention of the plurality of intentions as the target intention of the input utterance if the semantic similarity between the first target word and a third word belonging to a first relation in the generalization result of the second intention is greater than a third threshold, and the semantic similarity between the second target word and a fourth word belonging to a second relation in the generalization result of the second intention is greater than a fourth threshold.
17. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the phonetics generalization method according to any one of claims 1 to 3, or the computer program, when executed by the processor, implements the steps of the speech recognition method according to any one of claims 4 to 8.
18. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the phonetics generalization method according to any one of claims 1 to 3, or the computer program, when executed by the processor, carries out the steps of the speech recognition method according to any one of claims 4 to 8.
CN201911288675.XA 2019-12-12 2019-12-12 Speaking generalization method, speaking recognition device and electronic equipment Active CN111062200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911288675.XA CN111062200B (en) 2019-12-12 2019-12-12 Speaking generalization method, speaking recognition device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911288675.XA CN111062200B (en) 2019-12-12 2019-12-12 Speaking generalization method, speaking recognition device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111062200A true CN111062200A (en) 2020-04-24
CN111062200B CN111062200B (en) 2024-03-05

Family

ID=70301559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911288675.XA Active CN111062200B (en) 2019-12-12 2019-12-12 Speaking generalization method, speaking recognition device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111062200B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170288940A1 (en) * 2016-04-04 2017-10-05 Xerox Corporation Data governance: change management based on contextualized dependencies
CN109062902A (en) * 2018-08-17 2018-12-21 科大讯飞股份有限公司 A kind of text semantic expression and device
CN109241524A (en) * 2018-08-13 2019-01-18 腾讯科技(深圳)有限公司 Semantic analysis method and device, computer readable storage medium, electronic equipment
CN110069709A (en) * 2019-04-10 2019-07-30 腾讯科技(深圳)有限公司 Intension recognizing method, device, computer-readable medium and electronic equipment
CN110096709A (en) * 2019-05-07 2019-08-06 百度在线网络技术(北京)有限公司 Command processing method and device, server and computer-readable medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069828A (en) * 2020-07-31 2020-12-11 飞诺门阵(北京)科技有限公司 Text intention identification method and device
CN112069828B (en) * 2020-07-31 2023-07-04 飞诺门阵(北京)科技有限公司 Text intention recognition method and device

Also Published As

Publication number Publication date
CN111062200B (en) 2024-03-05

Similar Documents

Publication Publication Date Title
CN107943860B (en) Model training method, text intention recognition method and text intention recognition device
US9396724B2 (en) Method and apparatus for building a language model
WO2014190732A1 (en) Method and apparatus for building a language model
CN110570840B (en) Intelligent device awakening method and device based on artificial intelligence
CN110827826B (en) Method for converting words by voice and electronic equipment
CN108959274B (en) Translation method of application program and server
CN107919138B (en) Emotion processing method in voice and mobile terminal
CN111177180A (en) Data query method and device and electronic equipment
CN109545221B (en) Parameter adjustment method, mobile terminal and computer readable storage medium
CN111833872B (en) Voice control method, device, equipment, system and medium for elevator
CN109040444B (en) Call recording method, terminal and computer readable storage medium
CN111159338A (en) Malicious text detection method and device, electronic equipment and storage medium
CN109992753B (en) Translation processing method and terminal equipment
CN111522592A (en) Intelligent terminal awakening method and device based on artificial intelligence
CN112749252A (en) Text matching method based on artificial intelligence and related device
CN106486119B (en) A kind of method and apparatus identifying voice messaging
CN113190646A (en) User name sample labeling method and device, electronic equipment and storage medium
CN109063076B (en) Picture generation method and mobile terminal
CN111292727B (en) Voice recognition method and electronic equipment
CN111144065B (en) Display control method and electronic equipment
CN111062200B (en) Speaking generalization method, speaking recognition device and electronic equipment
CN108174030B (en) Customized voice control implementation method, mobile terminal and readable storage medium
CN111145734A (en) Voice recognition method and electronic equipment
CN109274814B (en) Message prompting method and device and terminal equipment
CN110826098A (en) Information processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant