CN107305768B

CN107305768B - Error-prone character calibration method in voice interaction

Info

Publication number: CN107305768B
Application number: CN201610248440.8A
Authority: CN
Inventors: 黄亦睿; 刘功申; 苏波; 刘春梅; 李建华
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2016-04-20
Filing date: 2016-04-20
Publication date: 2020-06-12
Anticipated expiration: 2036-04-20
Also published as: CN107305768A

Abstract

The invention provides a method for calibrating error-prone characters in voice interaction, which comprises the following steps: the method comprises the steps of context recognition, automatic error correction based on limited semantics and manual error correction based on semantic feedback. The invention realizes the automatic error correction function for the entity with specific meaning by interacting with the user voice and sensing and identifying the topic context by utilizing the named entity identification technology in the limited semantic range, and supports the additional semantics obtained by manual feedback to further correct the error, thereby realizing higher input efficiency and more convenient error correction mode than the existing voice identification software.

Description

Error-prone character calibration method in voice interaction

Technical Field

The invention relates to a calibration technology of error-prone characters, in particular to a calibration method of error-prone characters in voice interaction, and particularly relates to a calibration scheme of available voice interaction error-prone characters, which is realized by applying a natural language understanding method to calibration and correction of voice interaction error-prone characters.

Background

As a new approach for man-machine interaction, voice interaction has been widely used in recent years. This is derived from the development of speech recognition technology, and the error rate of the speech recognition system is greatly reduced from Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM) to the current Deep Neural Network models (DNN); secondly, the use habit of the intelligent equipment user is not formed, and new technologies such as voice interaction are easily accepted by the public; and due to the ultra-conventional development of cloud computing and mobile internet, a large amount of brand-new corpus resources are generated to further promote the development of the voice recognition technology.

Under many scenes, the voice interaction has more practical value and accords with the interactive habit of human beings. However, since the voice input is inevitably affected by environmental noise and fading channels, many erroneous results are often generated, and in addition, the machine cannot accurately recognize the voice input of the user due to the fact that a large number of homophones and phonetic close characters exist in the Chinese language, so that erroneous characters are easy to occur in the voice recognition. In other words, the accuracy of speech recognition has not yet reached the desired level, and speech recognition technology must also make breakthrough in many aspects.

Through the search of prior art documents, chinese patent document No. CN201210584746.2, publication No. CN103021412A, describes a "speech recognition method and system", which includes: carrying out voice recognition on a voice signal input by a user to obtain a voice recognition result and a voice segment corresponding to each character in the voice recognition result; receiving error correction information independently input by a user and generating an error correction character string; determining a voice section generating recognition errors in a voice signal input by a user according to the error correction character string; determining a character string corresponding to the speech segment with the recognition error in the speech recognition result as an error character string according to the speech segment corresponding to each character in the speech recognition result; the error string is replaced with an error correction string. The technology realizes an error correction method for the error character string, but the entry of the error correction character string can be entered after a special key is used, or entered by other modes such as pinyin, handwriting and the like. The voice input mode can only repeat the previously input content so as to achieve the aim of correcting the error recognition; but if the user enters a word that is not entered by the system, the scheme will not be corrected correctly.

Chinese patent document No. CN201310589827.6, publication No. CN103680505A, describes a "speech recognition method and system", the method including: continuously receiving a recording input; performing voice recognition on the recording by using a small vocabulary voice recognition network to check whether the recording contains preset keywords; and if the sound recording contains the keywords, identifying the sound recording after the keywords by using a large-vocabulary speech recognition network to obtain an identification result. The technology solves the problem of recognition accuracy rate when monitoring commands for a long time, and can smoothly transit from a small vocabulary network to a normal speech recognition stage, namely the large vocabulary network. However, the technology does not optimize large-vocabulary networks, such as semantic enhancement in the context of restriction, and does not mention related error-prone word alignment technology.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a method for calibrating error-prone characters in voice interaction. The present invention uses the existing speech recognition API (Application Programming Interface) to complete a valuable error-prone word calibration system. The system senses and identifies topic contexts through voice interaction with a user, so that in a limited semantic range, an entity with specific significance is automatically corrected by using a named entity identification technology, additional semantics are obtained through manual feedback to correct errors, and higher input efficiency and a more convenient wrong word correction mode are realized compared with the existing voice identification software.

The invention provides a method for calibrating error-prone characters in voice interaction, which comprises the following steps:

and a context recognizing step: creating respective contextual knowledge bases for different domains, the step of constructing the contextual knowledge bases comprising: firstly, according to keywords of a field, obtaining related documents through a search engine to serve as a corpus of the field; and then, acquiring core words of the field according to the semantic knowledge, and clustering according to the core words to obtain example sentences of the field, thereby constructing a context knowledge base.

Preferably, in the step of identifying the context, the judgment is carried out according to the context similarity of the text sentences and different fields in the context knowledge base, and the judgment is used as the premise of automatic error correction; the specific algorithm of the context similarity is as follows:

s1: counting the occurrence times of each word in the text sentence A, and expressing the occurrence times into a vector form;

s2: according to a cosine similarity calculation formula, calculating a cosine value of a vector included angle between two vectors of a text sentence A and each example sentence B in a vector form in a context Ci, and taking the cosine value as the word shape similarity based on the vectors;

s3: converting all words of the text sentence A into a pinyin form, counting the occurrence times of each different pinyin sequence in the text sentence A, expressing the pinyin sequence into a vector form, calculating a cosine value of a vector included angle between two vectors of the text sentence A expressed in the pinyin form and each example sentence B in the vector form in the context Ci, and obtaining the pinyin similarity based on the vectors;

s4: the sentence similarity between the text sentence A and each example sentence B is calculated by giving different weights to the pinyin similarity and the morphological similarity, and the value with the maximum sentence similarity is selected as the sentence similarity between the text sentence A and the context Ci;

s5: calculating the matching rate of the core words of the text sentence A and the context Ci, namely the number of all the core words in the context Ci contained in the text sentence A accounts for the percentage of the number of all the words in the text sentence A;

s6: the context similarity of the text sentence A and the context Ci is calculated by giving different weights to the sentence similarity and the matching rate of the core words;

s7: computing smooth contextual similarity SmoothContextSim (a, C) of text sentence a and context Ci based on the context in front_i)：

SmoothContextSim(A,C_i)＝λ₁·ContextSim(A_-2,C_i)

+λ₂·ContextSim(A_-1,C_i)

+λ₃·ContextSim(A,C_i)

λ₁+λ₂+λ₃＝1

λ₁≤λ₂≤λ₃

Wherein, A_-1,A_-2Respectively representing a current text sentence, a first sentence before the current text sentence and a second sentence before the current text sentence; lambda [ alpha ]₁,λ₂,λ₃Is a constant; ContextSim (X, Y) represents the contextual similarity of the text sentence X to the context Y.

Preferably, the method further comprises the following steps:

and (3) automatic error correction based on the limited semantics: and acquiring the place name to be corrected in the text sentence input by the voice of the user, and performing error correction on the place name to be corrected.

Preferably, the automatic error correction based on the constraint semantics comprises:

reading a text sentence: reading in a text sentence P input by a user voice, wherein P is P₁P₂...P_i...P_n(ii) a Wherein p is_iRepresenting the ith Chinese character in the text sentence, and n represents the length of the text sentence;

and a to-be-corrected place name obtaining step: scanning P, and matching according to a place name matching rule to obtain a place name to be corrected;

error correction step: and carrying out short text similarity matching on the place name to be corrected and all the place names in the place name library to obtain the place name most similar to the place name to be corrected, and taking the place name most similar to the place name to be corrected as the correct place name after error checking and correcting.

Preferably, the place name matching rule includes any one of the following rules:

rule one is as follows: if W is_lBelonging to the set of left boundary words, W_rBelonging to the set of right boundary words, W_pNumber of words W_pLen is greater than 1, then W_pIdentifying the place name to be corrected;

rule two: if W is_lIn the set of left boundary words, W_rBelonging to the set of place name suffixes, then will be represented by W_p、W_rFormed character string

Identifying the place name to be corrected;

rule three: if W is_lBelonging to a collection of place suffixes, W_rBelonging to the set of right boundary words, W_pIf the number of words is greater than 1, then W is set_pIdentifying the place name to be corrected;

rule four: if W is_lBelonging to a collection of place suffixes, W_rSet of place name suffixes, then will be represented by W_p、W_rFormed character string

Identifying the place name to be corrected;

wherein, W_lIs the previous word of the word to be corrected, W_pIs a word to be corrected, W_rIs the latter word of the word to be corrected.

Preferably, in the automatic error correction step based on the restricted semantics, a weighted longest common subsequence algorithm is adopted to calculate the short text similarity matching; the weighted longest common subsequence algorithm is as follows: and a similarity function exists between any two elements of the two sequences, and a public subsequence with the maximum sum of the similarities in the two sequences is searched, wherein the similarity function is defined as the pinyin similarity between two pinyins.

Preferably, the pinyin similarity refers to: the similarity of initial consonants in the two phonetics and the similarity of final consonants in the two phonetics are respectively calculated, and the corresponding similarity is respectively given to the situation of syllable mixing.

Preferably, the method further comprises the following steps:

and (3) artificial error correction based on semantic feedback: correcting errors according to the correct sentence pattern input by the voice; wherein, the form of correcting sentence pattern includes:

in a first form: modifying, wherein the character A is a character C of the word B;

a second form: modifying, wherein the Nth character A is a character C of the word B;

wherein, the character A and the character C are the same character and are marked as indicating characters; the word B is a idiom or phrase containing the character A and the character C and is marked as a correcting word;

the pinyin of the indicating character is the same as the pinyin of the wrong character in the input text and the same as the pinyin of the correct character in the corrected word;

and according to the indicator, extracting the correct character from the corrected word as the corrected character for replacement.

Compared with the prior art, the invention has the following beneficial effects:

first, the three-stage calibration technique for error-prone words of the present invention can be widely applied to various speech recognition systems and speech interaction devices, and can be used together or separately to enhance the correction capability for error-prone words in a single aspect.

Secondly, the context recognition function of the invention can be applied to a generalized voice input system, and can recognize the corresponding context according to the input context of the user and improve the weight of various words under the context so as to improve the recognition accuracy.

Thirdly, the automatic error correction function based on the voice vehicle navigation context can improve the recognition accuracy of command entities such as road names, places and the like, reduce the interaction and correction frequency of drivers and navigation equipment and improve the driving safety.

Fourthly, the automatic error correction function of the artificial semantic feedback can be applied to the scene of long-time and large-amount text input, and natural and smooth command voice is used for realizing the error correction of the previously input information. The function accords with the Chinese language culture habit, and can realize the text input of pure voice without additional click.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

fig. 1 is a basic framework diagram of the present invention.

FIG. 2 is a schematic diagram of the overall calibration process of the present invention.

FIG. 3 is a flow chart illustrating the context recognition process of the present invention.

FIG. 4 is a schematic diagram of an automatic error correction process according to the present invention.

FIG. 5 is a schematic diagram of a manual error correction process according to the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that it would be obvious to those skilled in the art that various changes and modifications can be made without departing from the spirit of the invention. All falling within the scope of the present invention.

The invention provides a calibration technology of error-prone characters in a series of voice interaction, which applies a natural language understanding method to the calibration and correction of the error-prone characters in the voice interaction and realizes a comprehensive voice interaction error-prone character calibration system. The system comprises the following functions:

first, semantic enhancement based on context. Under a plurality of specific contexts, the system perceives and identifies topic contexts by analyzing the voice input by the user and understands the interaction requirements of the user.

Second, automatic error correction based on constrained semantics. The system utilizes language features to pertinently improve the accuracy of voice recognition in a context environment with limited semantics by imposing restrictions on the context of voice interaction.

Third, artificial semantic enhancement based on voice interaction. The user is required to actively interact with the system voice through additional semantics, and semantic technology enhancement is carried out on the important words in the interactive system, so that the computer is guided to accurately understand the intention of the user and make corresponding feedback.

In particular, the present invention completes a valuable voice interaction error-prone word alignment system available based on existing speech recognition APIs. The system senses and identifies topic contexts through voice interaction with a user, so that in a limited semantic range, an entity with specific significance is automatically corrected by using a named entity identification technology, additional semantics are obtained through manual feedback to correct errors, and higher input efficiency and a more convenient wrong word correction mode are realized compared with the existing voice identification software. Fig. 1 depicts the basic framework of the present invention and fig. 2 depicts the overall calibration flow of the present invention.

first, identifying context step

The first premise for identifying a context is to create a corresponding context knowledge base for different domains. The process of constructing the context knowledge base for each domain is as follows: firstly, a large number of related documents are obtained through a search engine according to keywords of a selected field and are used as a corpus of the field. And then manually acquiring core words of the field according to semantic knowledge, and manually clustering according to the core words to obtain example sentences of the field, thereby constructing a context knowledge base.

In the step of identifying the context, the judgment is mainly carried out according to the similarity of the text sentences and the contexts of different fields in the context knowledge base, and the judgment is used as the premise of automatic error correction.

The specific algorithm of the context similarity is as follows:

s2: according to a cosine similarity calculation formula, calculating a cosine value of a vector included angle between two vectors of a text sentence A and an example sentence B in each vector form in a context Ci, and taking the cosine value as the word shape similarity based on the vectors;

s3: converting all words of the text sentence A into a pinyin form, counting the occurrence times of each different pinyin sequence in the text sentence A, expressing the occurrence times into a vector form, and calculating a cosine value of a vector included angle between two vectors of the text sentence A and an example sentence B in each vector form in the context Ci to obtain a pinyin similarity based on the vectors;

s4: the sentence similarity between the text sentence A and each example sentence B is calculated by giving different weights to the pinyin similarity and the morphological similarity, and the value with the maximum similarity is selected as the sentence similarity between the text sentence A and the context Ci;

SmoothContextSim(A,C_i)＝λ₁·ContextSim(A_-2,C_i)

+λ₂·ContextSim(A_-1,C_i)

+λ₃·ContextSim(A,C_i)

λ₁+λ₂+λ₃＝1

λ₁≤λ₂≤λ₃

Wherein, A_-1,A_-2Respectively representing a current text sentence, a first sentence before the current text sentence and a second sentence before the current text sentence; lambda [ alpha ]₁,λ₂,λ₃Is a constant; ContextSim (X, Y) represents the contextual similarity of the text sentence X and the context Y;

in the test of the invention, λ is selected₁＝0.1,λ₂＝0.2,λ₃0.7. FIG. 3 gives a general flow of identifying context.

Second, an automatic error correction step based on restricted semantics

The invention preferably applies the voice interaction scenario to the car navigation system, and therefore, in a preferred embodiment of the invention, the corpus is a cell word bank storing correct road names, place names and organization names.

Firstly, the invention defines the following set based on the analysis of place name composition and context rules in the vehicle navigation system:

set of place name suffixes PlaceTailWord, such as "city", "county", "road", "district", "village", and the like.

Set of left boundary words leftborderderword: such as "to", "from", "at", "distance", "close", etc.

Set of right border words rightborderworword: such as "near", "around", "beside", etc.

Asplace (S) denotes the recognition of S as a place name to be corrected.

Will be composed of_l、W_p、W_rThe constructed string is recorded as

W_lIs the previous word of the word to be corrected, W_pIs a word to be corrected, W_rIs the latter word of the word to be corrected.

The specific place name matching rule is defined as follows:

that is (W)_l∈LeftBorderWord)&&(W_r∈RightBorderWord)&&(W_p.len＞1)→AsPlace(W_p)

Identifying the place name to be corrected;

namely, it is

that is (W)_l∈PlaceTailWord)&&(W_r∈RightBorderWord)&&(W_p.len＞1)→AsPlace(W_p)

Identifying the place name to be corrected;

namely, it is

The recognition of the named entity is established on the basis of the word segmentation result, and once the word is not segmented correctly, the recognition accuracy of the named entity is greatly reduced. In order to solve the problem of error identification caused by word segmentation, each word is segmented into a character, and named entity identification is carried out by taking the character as a unit.

The specific algorithm is as follows:

In the automatic error correction stage, the invention mainly utilizes the place names in the common place name library to calibrate and confirm the voice recognition result. In other words, the place name to be corrected extracted according to the rule is compared with the place names in the common place name library in a short text, and the same or most similar place name is obtained to replace the place name to be corrected so as to realize error checking and correction.

In the automatic error correction step based on the limited semantics, calculating the similarity matching of the short text by adopting a weighted longest public subsequence algorithm; the weighted longest common subsequence algorithm is as follows: and a similarity function exists between any two elements of the two sequences, and a public subsequence with the maximum sum of the similarities in the two sequences is searched, wherein the similarity function is defined as the pinyin similarity between two pinyins.

The short text comparison algorithm is realized by taking pinyin as a unit, and considering that the difference of the composition structures of the initial consonant and the final in the pinyin is large, the similarity needs to be calculated for the initial consonant and the final when the pinyin similarity is calculated. In two different pinyins, once the initial consonants or the final consonants are completely the same, the similarity of 0.5 is given; if the initials or finals are similar (such as flat warped tongue sound, front and back nasal sound, etc.), a similarity of 0.25 is given.

On the basis, the method adopts a weighted longest public subsequence algorithm, calculates the pinyin similarity between the candidate place name A and the place name B in the common place name library by taking a character as a unit, and calculates the longest public subsequence of the candidate place name A and the place name B by utilizing a dynamic planning thought.

Let use two-dimensional array WLCS [ i, j]Means that the character string A is a₀a₁...a_nThe ith character and character string B ═ B₀b₁...b_mThe longest common subsequence with weight before the jth character in the sequence has

Wherein i is more than or equal to 0 and less than or equal to n, and j is more than or equal to 0 and less than or equal to m. SimPY (ai, b)_j) The pinyin similarity between the ith character of the character string A and the jth character of the character string B is calculated by the pinyin similarity calculation method.

The similarity SimWLCS (a, B) of the character strings a and B can be calculated by the following formula:

wherein WLCS (A, B) represents the sum of the longest common subsequence similarity of each corresponding bit in the string A, B; maxlan (a, B) represents the maximum value of the character length in the character string A, B.

Thirdly, artificial error correction based on semantic feedback

The basic mode of the voice interaction scheme of the artificial semantic feedback is that a voice recognition system continuously receives voice sent by a user, and recognizes and processes the voice. Under general conditions, a user normally uses voice to perform character input, when the user thinks that a certain character has recognition errors, the user can use the voice to perform correction, the corrected simple sentence pattern is ' correction, Wu is Wu ' of Kontangwu ', the system can automatically recognize the voice input pattern as a correction pattern, enter the feedback and correction processes of the system, extract correction information from the correction sentence pattern and correct the corresponding wrong character before. If there are other wrong words, the user can repeat the feedback process until the correction is satisfied, and then perform the subsequent entry, and the text entered before is confirmed by the user by default and no correction is accepted.

Specifically, when the user inputs a text sentence by voice, and the input text is inconsistent with the result expected by the user, the user can continue to speak a more correct sentence pattern by voice, and the more correct sentence pattern has two forms:

in a first form: modified, A is C for B.

In a second form: modified, Nth A is C of B.

Wherein, A and C are the same word and are called as 'indicator word'; b is an idiom or phrase containing A and C, called a "correcting word". The pinyin of the indicator word is the same as the pinyin of the wrong word in the entered text and the correct word in the corrected word under normal conditions. The existence of the indicating characters establishes the relation between the wrong characters and the corrected characters, and the correct corrected characters can be extracted from the corrected characters according to the indicating characters, the wrong characters are searched in the text, and the corrected characters are used for replacement. For example:

and (3) user voice input: i am yellow and also Rui.

And (3) voice recognition results: i call Huang-Yi Rui.

Where a word is considered by the user as an erroneous entry. It is also feared that the user can continue to use speech to say a more correct sentence "modify". In this case, the letter in section A, C is an indicator and "unfair" is a corrector. The system will start the error correction procedure and replace the wrong "one" with the "also" word, thereby showing the correct result "i am yellow or wise" on the screen.

In order to avoid that the user enters the text and a plurality of words with the same pronunciation cannot be selected to correct one of the words, the user needs to actively speak the specific sequence of the wrong words. The second Chinese character with the same tone as the indicator character can be corrected by a second form of correcting sentence pattern, using the number provided by the N part, such as "the second A is C of B", to pinpoint the position of the wrong character and avoid confusion caused by multiple homophones.

In the sentence correcting mode, the character part is indicated, on one hand, Chinese characters corresponding to pinyin are searched for in the corrected word part through the pinyin and are used as correct corrected characters; on the other hand, the corresponding Chinese character position is searched in the front text through pinyin, and the corrected character is used for replacement, so that the correction of the wrong text is completed.

The specific steps for searching for corrected words are as follows:

step (1): and converting the corrected sentence into a pinyin sequence, and segmenting according to the keywords to obtain the indicator and the corrected word. If the 'is also not so much' is converted into a pinyin sequence [ yi shi bu yi le hu yi ], the contents of the indicator A, the corrector B and the indicator C are obtained by segmenting the 'shi' and the 'de' keywords respectively to be 'yi', [ bu yi le hu ], 'yi'.

Step (2): and judging whether the indicator words A and C are the same, and if so, searching the position of the indicator word in the corrected word. That is, the position index of "yi" in [ bu yi le hu ] is 2 (starting from 1).

And (3): and in the process of matching the special knowledge base or the API, the Chinese character corresponding to the pinyin of the indicator character in the corrected word is obtained according to the position information and is used as the correct corrected word. Here, the word "never" is also true.

And (4): and searching the position of the wrong character in the previous sentence according to the pinyin of the wrong character, and replacing the wrong character with the corrected character so as to achieve the function of correcting the wrong character.

In speech feedback and correction, people often adopt words which are not easy to repeat to make words for corrected characters, such as common words, idioms, names of famous people or common phrases specially used for describing Chinese characters.

There are many proper nouns in chinese, where each word is an abbreviation of some word, such as "program", which can be described by "written program" and "course program".

Meanwhile, there is a common phenomenon in chinese, that is, a character is described by using a radical, which is often used in descriptions of surnames or words that are not easy to be formed, such as "grass-headed yellow" and "ancient chinese fiddle".

The following table lists several cases describing Chinese characters:

table 1 describes several cases of Chinese characters

For idioms, celebrity names and common words, the existing speech recognition API can correctly recognize the idioms, and accurate corrected words can be obtained. However, for describing Chinese font words, the existing speech recognition API can not correctly recognize all words because the words do not belong to common words. In this respect, the invention introduces a specialized knowledge base based on idioms to improve the recognition accuracy of the words.

The word to be corrected represented by each record of the knowledge base belongs to the category of common error-prone words, and stores the mapping of proper nouns and pinyin thereof, such as:

lizao zhang, standing chapter of morning chapter

gong chang zhang for treating long arch

After the user input is recognized by using the voice recognition API and the corrected sentence pattern is obtained, the system extracts the Chinese characters of the corrected word part, converts the Chinese characters into a pinyin sequence, uses the pinyin sequence to search a matched pinyin sequence in a knowledge base, and replaces the original recognition result of the corrected word part with the Chinese character words corresponding to the pinyin sequence to be used as a new corrected word part. If the correction words of the user cannot be matched with the local knowledge base, the system extracts the correction words according to the original API recognition result, if the modification sentence of the user is 'Qu is Qu' of fringed pink, and if the local knowledge base does not have corresponding records and the API can accurately recognize the fringed pink, the system can correct the wrong words by 'Qu'.

In speech recognition, the recognition result is not intended by the user, particularly when a single character is recorded, due to the interference of the accent or noise of the user. Even if the user inputs the single characters in a right-cavity circular manner, the single characters are difficult to recognize as the characters actually spoken by the user, such as 'ox' and 'Liu', 'Hu' and 'Fu', as well as errors of flat-rolled tongue sounds and front and back nasal sounds, due to the existence of accents and the fact that the single characters are not assisted by context words. In the invention, the indicator part is the result of single character recognition, and according to the process of extracting the corrected characters, if the corrected character part is correctly recognized, but the indicator part is recognized as common fuzzy-pronunciation characters, such as milk Liu, and the like, the result cannot be matched by searching in the pinyin sequence [ niu, nai ] by using the pinyin liu. This requires adding a blur sound at the time of the search to improve the correction success rate.

In the above paragraph, "Liu of milk" is taken as an example, we construct a fuzzy sound array [ liu, niu ] of Pinyin liu, and sequentially use the elements in the data to search in Pinyin sequences [ niu, nai ]. For the case where there are multiple fuzzy sounds such as zhen, the fuzzy sound array is ordered by similarity to the original sound, i.e. [ zhen, zen, zheng, zeng ]. The system will sequentially traverse the arrays and find matches in the pinyin sequence.

The use of fuzzy sound can improve the success rate of extracting corrected characters from corrected words. Similarly, when applying a corrective structure to the text, i.e., finding the wrong word and replacing, a fuzzy matching is also required to find the wrong word. The specific implementation is that the pinyin corresponding to the correct character is expanded into a fuzzy-tone array, the pinyin sequence of each element in the array in the front is used for searching, and then the found Chinese character is replaced.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A method for calibrating error-prone characters in voice interaction is characterized by comprising the following steps:

and a context recognizing step: creating respective contextual knowledge bases for different domains, the step of constructing the contextual knowledge bases comprising: firstly, according to keywords of a field, obtaining related documents through a search engine to serve as a corpus of the field; then, acquiring core words of the field according to semantic knowledge, and clustering according to the core words to obtain example sentences of the field, thereby constructing a context knowledge base;

in the step of identifying the context, judging according to the similarity of the text sentences and the contexts in different fields in a context knowledge base, and taking the similarity as the premise of automatic error correction; the specific algorithm of the context similarity is as follows:

SmoothContextSim(A,C_i)＝λ₁·ContextSim(A_-2,C_i)+λ₂·ContextSim(A_-1,C_i)+λ₃·ContextSim(A,C_i)

λ₁+λ₂+λ₃＝1

λ₁≤λ₂≤λ₃

the method for calibrating the error-prone characters in the voice interaction further comprises the following steps:

2. The method for calibrating error-prone words in voice interaction according to claim 1, wherein the automatic error correction based on constrained semantics comprises:

3. The method of claim 2, wherein the place name matching rule comprises any one of the following rules:

Identifying the place name to be corrected;

Identifying the place name to be corrected;

4. The method of error-prone word alignment in speech interaction of claim 1, wherein in the limited-semantic-based auto-error correction step, a weighted longest common subsequence algorithm is used to calculate short-text similarity matches; the weighted longest common subsequence algorithm is as follows: and a similarity function exists between any two elements of the two sequences, and a public subsequence with the maximum sum of the similarities in the two sequences is searched, wherein the similarity function is defined as the pinyin similarity between two pinyins.

5. The method of claim 4, wherein the Pinyin similarity is: the similarity of initial consonants in the two phonetics and the similarity of final consonants in the two phonetics are respectively calculated, and the corresponding similarity is respectively given to the situation of syllable mixing.

6. The method of calibrating error-prone words in a voice interaction of claim 1, further comprising: