CN116052657B - Character error correction method and device for voice recognition

Character error correction method and device for voice recognition

Info

Publication number: CN116052657B (granted); application number CN202210917316.1A
Authority: CN (China)
Prior art keywords: characters, character, error correction, degree, difference
Other languages: Chinese (zh)
Other versions: CN116052657A (application publication)
Inventors: 徐成国, 鲍鹏程, 郑豪
Assignee (original and current): Honor Device Co Ltd
Application filed by Honor Device Co Ltd; priority to CN202210917316.1A
Legal status: Active (granted)

Classifications

    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G06F 40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G10L 15/26 Speech to text systems
    • G10L 25/90 Pitch determination of speech signals
    • G10L 2015/223 Execution procedure of a spoken command
    • G10L 2015/225 Feedback of the input speech
    • G10L 2015/226 Procedures used during a speech recognition process using non-speech characteristics
    • Y02T 10/40 Engine management systems


Abstract

The embodiments of this application relate to the field of computer technology, and in particular to a character error correction method and device for speech recognition. First, error correction models are constructed according to the degree of difference between characters. During error correction processing, the character to be processed is input into the error correction models, which output replacement characters of the character to be processed and the degree of difference between each replacement character and the character to be processed. The comprehensive degree of difference corresponding to each replacement character is then determined according to the degrees of difference given by the different error correction models, and finally the replacement character with the smallest comprehensive degree of difference is determined as the target character, which is used to replace the character to be processed. The method can be applied to text error correction, for example to text obtained through artificial-intelligence natural language processing, where it can effectively reduce the complexity of error correction retrieval and improve the accuracy of error correction results.

Description

Character error correction method and device for voice recognition
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a character error correction method and device for voice recognition.
Background
At present, speech recognition is used in an increasingly wide range of applications. For example, placing a call through a voice assistant on a terminal device is a very high-frequency scenario: a call intent requires a precise slot word (the contact name) in order to recall the correct name from the address book, display it on the screen, and place the call. This operation places high demands on the accuracy of speech recognition, and a misrecognition creates a poor experience for the user. Therefore, when recognizing speech to place a call, it is both necessary and valuable for the terminal device to output the contact name correctly.
In current user speech recognition based on automatic speech recognition (Automatic Speech Recognition, ASR) technology, the main errors are near-pronunciation errors, i.e. the character string to be corrected sounds very similar to the correct character string. Vocabulary-based text correction takes the word to be corrected, which contains a substituted wrong character, and fuzzy-matches it against the vocabulary to find the correct entry with the closest pronunciation. The Burkhard-Keller (BK) tree based on edit distance is a traditional method for spelling error correction, but computing the edit distance has high time complexity; and when the BK tree corrects errors based only on the edit distance between pronunciations, the granularity is too coarse, too many candidates are recalled, and accurate judgment is difficult.
Disclosure of Invention
The embodiments of this application provide a character error correction method and device for speech recognition. On the basis of the BK tree, the error correction models are rebuilt with the degree of difference between characters in place of the edit distance; the final target character is determined by combining the outputs of the error correction models and is used to replace the character to be processed, which reduces retrieval complexity and improves error correction accuracy.
In a first aspect, an embodiment of the present application provides a method for correcting character errors in speech recognition, where the method includes:
constructing an error correction model according to the degree of difference between characters, the error correction model comprising: a pronunciation model, a pronunciation tone model and a fuzzy model;
inputting characters to be processed into the error correction model, so that the error correction model outputs replacement characters of the characters to be processed and the degree of difference between each replacement character and the characters to be processed;
determining the comprehensive difference degree corresponding to each replacement character according to the difference degrees corresponding to the replacement characters in different error correction models;
and determining the replacement character with the smallest comprehensive difference degree as a target character, wherein the target character is used for replacing the character to be processed.
In the embodiments of this application, constructing the error correction model from the degree of difference can reduce retrieval complexity, and determining the final target character from the replacement characters output by the pronunciation model, the pronunciation tone model and the fuzzy model makes the target character more accurate.
In one implementation manner, before the error correction model is constructed according to the difference degree between the characters, the method further includes:
acquiring characters for constructing the error correction model;
determining the similarity between characters;
determining the error recognition association degree between characters;
and determining the difference degree between the characters according to the similarity degree and the error recognition association degree.
In the embodiment of the application, reasonable difference degree can be obtained through the similarity between the characters and the error recognition association degree.
In one implementation, the determining the similarity between characters includes:
a plurality of similarity levels are pre-configured according to spelling rules, and each similarity level corresponds to one similarity;
determining the similarity level between any two characters according to spelling differences between the characters;
and determining the similarity corresponding to the similarity level as the similarity between the characters.
In one implementation, if the error correction model is the pronunciation model or the fuzzy model, the similarity levels include: identical initials and finals; equal-length front/back nasal sounds; equal-length flat/retroflex tongue; equal length with different initials; equal length with different finals; unequal length with different initials; unequal length with different finals; polyphonic characters; and different initials and finals.
In one implementation, if the error correction model is the fuzzy model, the similarity levels further include: identical initials and finals apart from accent differences.
The fuzzy model in the embodiments of this application can correct errors caused by different accents, which expands the error correction range.
In one implementation, if the error correction model is the pronunciation tone model, the similarity levels include: identical initials and finals with the same tone; identical initials and finals with different tones; equal-length front/back nasal sounds with the same tone; equal-length front/back nasal sounds with different tones; equal-length flat/retroflex tongue with the same tone; equal-length flat/retroflex tongue with different tones; equal length with different initials; equal length with different finals; unequal length with different initials; unequal length with different finals; polyphonic characters; and different initials and finals.
In one implementation, the determining the degree of error recognition association between characters includes:
performing speech recognition on each character multiple times, and recording the corresponding error recognition result whenever recognition is wrong;
counting the number of occurrences of each error recognition result and the total number of recognition errors;
and determining the ratio of the number of occurrences to the total number as the degree of error recognition association between the corresponding error recognition result and the recognized character.
In the embodiments of this application, determining the degree of error recognition association between characters from experimental statistics can improve the accuracy of the error correction model.
In one implementation, the determining the degree of difference between characters according to the degree of similarity and the degree of error identification association includes:
the degree of difference between characters is determined according to the formula dis(a, b) = 1 - [W1 × V1(a, b) + W2 × V2(a, b)], where a and b are any two characters, dis(a, b) is the degree of difference between character a and character b, V1(a, b) is the degree of similarity between character a and character b, W1 is the weight corresponding to the degree of similarity, V2(a, b) is the degree of error recognition association between character a and character b, and W2 is the weight corresponding to the degree of error recognition association.
In one implementation, the constructing an error correction model according to the degree of difference between characters includes:
determining any character as a root node;
traversing the remaining characters, and inserting each character into the error correction model, based on a preset insertion rule, as a child node of the root node or as a child node of an existing child node.
In one implementation, the preset insertion rule includes:
taking the root node as an insertion node;
calculating a first difference degree between the character to be inserted and the insertion node;
calculating a second difference degree between the inserted node and each corresponding child node;
if there is a first child node whose second degree of difference equals the first degree of difference, taking the first child node as the insertion node and returning to the step of calculating the first degree of difference between the character to be inserted and the insertion node; and if no such first child node exists, inserting the character to be inserted into the error correction model as a child node of the insertion node.
The embodiment of the application builds the error correction model based on the traditional BK tree structure by using the difference degree, and can reduce the complexity of retrieval.
In one implementation, the method further comprises:
and presetting a difference threshold for the error correction model, wherein the degree of difference between each replacement character output by the error correction model and the character to be processed is smaller than the difference threshold.
In one implementation manner, the determining the comprehensive difference degree corresponding to each replacement character according to the difference degrees corresponding to the replacement characters in different error correction models includes:
determining the comprehensive degree of difference according to the formula Sdis(c, d) = w1 × dis1(c, d) + w2 × dis2(c, d) + w3 × dis3(c, d), where c and d are the replacement character and the character to be processed respectively, Sdis(c, d) is the comprehensive degree of difference between the replacement character and the character to be processed, dis1(c, d) is the degree of difference output by the pronunciation model, dis2(c, d) is the degree of difference output by the pronunciation tone model, dis3(c, d) is the degree of difference output by the fuzzy model, and w1, w2 and w3 are the weights of dis1(c, d), dis2(c, d) and dis3(c, d) respectively.
The embodiment of the application determines the target character finally used for replacing the character to be processed based on the output of the three error correction models, so that the error correction result is more accurate.
In a second aspect, an embodiment of the present application provides a character error correction apparatus for speech recognition, the apparatus including:
the construction module is used for constructing the error correction model according to the degree of difference between characters, the error correction model comprising: a pronunciation model, a pronunciation tone model and a fuzzy model;
the input module is used for inputting the characters to be processed into the error correction model so that the error correction model outputs the replacement characters of the characters to be processed and the difference degree between each replacement character and the characters to be processed;
the determining module is used for determining the comprehensive degree of difference corresponding to each replacement character according to the degrees of difference corresponding to the replacement characters in different error correction models;
and the replacing module is used for determining the replacement character with the smallest comprehensive degree of difference as the target character, where the target character is used to replace the character to be processed.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions that are invoked by the processor to perform the method provided in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium including a stored program, wherein the program when executed by a processor implements the method provided in the first aspect.
In the embodiments of this application, error correction models are constructed according to the degree of difference between characters. During error correction processing, the character to be processed is input into the error correction models, which output replacement characters of the character to be processed and the degree of difference between each replacement character and the character to be processed; the comprehensive degree of difference corresponding to each replacement character is determined according to the degrees of difference from the different error correction models; and finally the replacement character with the smallest comprehensive degree of difference is determined as the target character, which is used to replace the character to be processed. The method can effectively reduce the complexity of error correction retrieval and improve the accuracy of error correction results.
Drawings
FIG. 1 is a flow chart of a character error correction method for speech recognition according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a character error correction method for speech recognition according to an embodiment of the present application;
FIG. 3A is a schematic diagram of another character error correction method for speech recognition according to an embodiment of the present application;
FIG. 3B is a schematic diagram of another character error correction method for speech recognition according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another character error correction method for speech recognition according to an embodiment of the present application;
FIG. 5A is a flowchart of another character error correction method for speech recognition according to an embodiment of the present application;
FIG. 5B is a schematic diagram of another character error correction method for speech recognition according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a character error correction device for voice recognition according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solutions of the present specification, the following detailed description of the embodiments of the present application refers to the accompanying drawings.
It should be understood that the described embodiments are only some, but not all, of the embodiments of the present description. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present disclosure.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Today, in the field of speech recognition, such as ASR phrase recognition, the main errors are near-pronunciation errors, i.e. the character string to be corrected sounds very similar to the correct character string. The BK tree based on edit distance is one method of character error correction: correct characters can be retrieved from the BK tree, but searching by edit distance has high time complexity and a large search space. The embodiments of this application provide a character error correction method for speech recognition that constructs the error correction models with the degree of difference between characters in place of the edit distance, thereby reducing retrieval time complexity.
FIG. 1 is a flowchart of a character error correction method for speech recognition according to an embodiment of the present application. The method can be applied to a processing device such as a server to handle speech recognition errors: for example, the user provides voice input, the server recognizes it through natural language processing, the characters that need error correction are determined during recognition, and the method outputs text consistent with the user's voice input. As shown in FIG. 1, the method may include:
Step 101, constructing an error correction model according to the degree of difference between characters, the error correction model comprising: a pronunciation model, a pronunciation tone model and a fuzzy model.
In the embodiments of this application, the characters used to construct the error correction model are determined in advance. In actual scenarios, the characters that require error correction are usually real-word entities such as person names and place names, so characters with a higher probability of being misrecognized can be prestored according to experimental results or experience. When the server constructs the error correction model, it obtains the prestored characters and builds the model according to the degree of difference between them. The degree of difference between characters is determined mainly by differences in their pinyin spelling and tones. The error correction models may include three kinds, a pronunciation model, a pronunciation tone model and a fuzzy model, and the degree of difference between two characters may likewise differ across the different error correction models. A character in the embodiments of this application may refer to a single character or to a string of several characters, and the error correction model is generally constructed in pinyin form.
Step 102, inputting the character to be processed into the error correction model, so that the error correction model outputs replacement characters of the character to be processed and the degree of difference between each replacement character and the character to be processed.
The character to be processed in the embodiments of this application is a character that was misrecognized during speech recognition. After the server determines the character to be processed that needs error correction, it inputs the character into the error correction model; the model searches a subset of the stored characters according to its retrieval rule, outputs the replacement characters that meet the conditions, and at the same time outputs the degree of difference between each replacement character and the character to be processed. The retrieval rule is similar to the search procedure of a BK tree.
Step 103, determining the comprehensive degree of difference corresponding to each replacement character according to the degrees of difference corresponding to the replacement characters in the different error correction models.
In the embodiments of this application, the error correction models constructed by the server comprise a pronunciation model, a pronunciation tone model and a fuzzy model, and the degree of difference between two characters may not be the same in different error correction models. Therefore, the comprehensive degree of difference between each replacement character and the character to be processed needs to be obtained by weighting the output results of all the error correction models.
Step 104, determining the replacement character with the smallest comprehensive degree of difference as the target character, where the target character is used to replace the character to be processed.
The server determines the target character and replaces the character to be processed with the target character, so that the correct text conforming to the original voice can be obtained.
In the embodiment of the application, the server constructs a plurality of error correction models to output the replacement characters and the corresponding difference degrees together, weights and determines the final comprehensive difference degrees, determines the replacement character with the minimum comprehensive difference degree as the target character, replaces the character to be processed with the target character, and completes the efficient and accurate error correction processing.
In an alternative embodiment, the server needs to determine the degree of difference between the characters before constructing the error correction model based on the degree of difference between the characters. After the server acquires the pre-stored characters, the similarity between the characters and the error recognition association degree between the characters are determined, and then the difference degree between the characters is determined according to the similarity and the error recognition association degree.
In an alternative embodiment, the server may preconfigure a plurality of similarity levels according to the pinyin spelling rules of characters, each similarity level corresponding to a degree of similarity. When determining the similarity between any two characters, the server determines the matching similarity level according to the spelling difference between the two characters, and the degree of similarity corresponding to that similarity level is determined as the similarity between the characters.
In an alternative embodiment, when the server builds the pronunciation model and the fuzzy model, the similarity level between characters is determined mainly by spelling differences, without involving tones. Depending on the spelling difference, the similarity levels may include: identical initials and finals; equal-length front/back nasal sounds; equal-length flat/retroflex tongue; equal length with different initials; equal length with different finals; unequal length with different initials; unequal length with different finals; polyphonic characters; and different initials and finals. In this order the similarity decreases step by step: identical initials and finals have the highest similarity, and different initials and finals the lowest. It will be appreciated that during actual error correction, such as speech recognition error correction, front/back nasal sounds and flat/retroflex tongue sounds are relatively prone to recognition errors. For example, the syllables 'zhan' and 'zhang' differ only in the front/back nasal sound, so there is a certain probability that 'zhan' is misrecognized as 'zhang', or 'zhang' as 'zhan'. 'zhangqian' and 'zangqian' are an equal-length pinyin flat/retroflex tongue pair, and there is likewise a certain probability that one is misrecognized as the other. Compared with front/back nasal sounds and the flat/retroflex tongue, the similarity of equal length with different initials, equal length with different finals, unequal length with different initials, unequal length with different finals, polyphonic characters, and different initials and finals decreases progressively. For example, the character 茜 is a polyphone that can be read 'qian' or 'xi'; because the two pronunciations differ greatly, 'qian' will hardly be misrecognized as 'xi' during speech recognition, so the similarity assigned to polyphones is small. When constructing the fuzzy model, one more similarity level can be added on this basis: identical initials and finals apart from accent differences. When the server builds the pronunciation tone model, the similarity level between characters is determined by both spelling and tone, and may specifically include: identical initials and finals with the same tone; identical initials and finals with different tones; equal-length front/back nasal sounds with the same tone; equal-length front/back nasal sounds with different tones; equal-length flat/retroflex tongue with the same tone; equal-length flat/retroflex tongue with different tones; equal length with different initials; equal length with different finals; unequal length with different initials; unequal length with different finals; polyphonic characters; and different initials and finals. Beyond the similarity levels provided in the embodiments of this application, finer distinctions can be made according to actual conditions, which is not limited here.
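For concreteness, the level scheme can be pictured as a lookup table. The following minimal Python sketch is an illustration only: the level names follow the text above, but the numeric similarity values are assumptions, since the embodiments leave the per-level similarity to be set from experiment or experience.

```python
# Illustrative similarity table for the pronunciation model (no tones).
# The level names follow the patent text; the numeric scores are assumed
# values, ordered so that similarity decreases level by level.
PRONUNCIATION_SIMILARITY = {
    "same_initials_and_finals": 1.0,
    "equal_length_front_back_nasal": 0.8,   # e.g. zhan vs zhang
    "equal_length_flat_retroflex": 0.7,     # e.g. zang vs zhang
    "equal_length_different_initials": 0.5,
    "equal_length_different_finals": 0.4,
    "unequal_length_different_initials": 0.3,
    "unequal_length_different_finals": 0.2,
    "polyphone": 0.1,                       # e.g. qian vs xi for one character
    "different_initials_and_finals": 0.0,
}

def similarity(level: str) -> float:
    """Look up the similarity score V1 for a spelling-difference level."""
    return PRONUNCIATION_SIMILARITY[level]
```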
In an alternative embodiment, the degree of error recognition association between characters may be derived from experimental results. First, recordings are made manually; the server then recognizes the recorded speech multiple times through ASR and, whenever recognition is wrong, records the corresponding error recognition result. After recognition is finished, the number of occurrences of each error recognition result and the total number of recognition errors are counted, and the ratio of the number of occurrences to the total number is determined as the degree of error recognition association between the corresponding error recognition result and the recognized character. The error recognition results can be recorded as key-value index pairs, where the key is the correct character and the value is an error recognition result. For example, a recording of 李 (li) is entered manually and the server recognizes it multiple times through ASR: it is misrecognized 5 times as another character pronounced 'li', 3 times as 你 (ni) and 2 times as 乐 (le), for a total of 10 recognition errors. From these results the server determines that the degree of error recognition association between 李 (li) and the misrecognized 'li' character is 0.5, between 李 (li) and 你 (ni) is 0.3, and between 李 (li) and 乐 (le) is 0.2.
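A minimal sketch of this statistic, assuming the misrecognition log is already available as (correct character, recognized result) pairs; the function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

def misrecognition_association(log: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Compute the error recognition association degree V2 from an ASR error log.

    Each log entry is (correct_char, wrong_result). For every correct character,
    the association of a wrong result is its occurrence count divided by the
    total number of recognition errors recorded for that character.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for correct, wrong in log:
        counts[correct][wrong] += 1
    return {
        correct: {wrong: n / sum(c.values()) for wrong, n in c.items()}
        for correct, c in counts.items()
    }

# Reproduces the example: 'li' misrecognized 5 + 3 + 2 = 10 times.
log = [("li", "li2")] * 5 + [("li", "ni")] * 3 + [("li", "le")] * 2
print(misrecognition_association(log)["li"])  # {'li2': 0.5, 'ni': 0.3, 'le': 0.2}
```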
In an alternative embodiment, after determining the similarity and the degree of error recognition association between characters, the server may determine the degree of difference between them according to the formula dis(a, b) = 1 - [W1 × V1(a, b) + W2 × V2(a, b)], where a and b are any two characters, dis(a, b) is the degree of difference between character a and character b, V1(a, b) is the degree of similarity between character a and character b, W1 is the weight corresponding to the degree of similarity, V2(a, b) is the degree of error recognition association between character a and character b, and W2 is the weight corresponding to the degree of error recognition association. For example, let character string a = 谢山 (xieshan) and character string b = 解三 (xiesan). The server calculates the degree of difference for the surname and the given name separately, obtaining dis(谢, 解) = 1 and dis(山, 三) = 0.7, and then adds the two results: dis(a, b) = 1 + 0.7 = 1.7. The server thus determines that the degree of difference between 谢山 (xieshan) and 解三 (xiesan) is 1.7. The degree of similarity corresponding to each similarity level between characters can be set flexibly according to experimental results or experience.
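The formula and the summation over surname and given name can be sketched as follows; the weight values W1 and W2 and the per-character inputs are illustrative assumptions chosen to reproduce the 1.7 example above.

```python
# Assumed weights; the patent does not fix W1 and W2.
W1, W2 = 0.7, 0.3

def char_difference(v1: float, v2: float) -> float:
    """dis(a, b) = 1 - [W1 * V1(a, b) + W2 * V2(a, b)].

    v1 is the similarity V1(a, b) read from the similarity-level table,
    v2 is the error recognition association V2(a, b) from the ASR statistics.
    """
    return 1.0 - (W1 * v1 + W2 * v2)

def string_difference(per_char: list[float]) -> float:
    """For multi-character names, per-character differences are summed,
    as in dis(xieshan, xiesan) = dis(surname pair) + dis(given-name pair)."""
    return sum(per_char)

surname_dis = char_difference(v1=0.0, v2=0.0)  # maximally different pair -> 1.0
given_dis = 0.7                                # e.g. flat/retroflex pair shan vs san
print(string_difference([surname_dis, given_dis]))  # 1.7, as in the example above
```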
In the embodiments of this application, the degree of difference between characters can be determined reasonably from their similarity and degree of error recognition association; compared with the edit distance, the degree of difference is finer-grained, which reduces retrieval time complexity.
In an alternative embodiment, after the server finishes calculating the degrees of difference between characters, it can construct the error correction model from them. The error correction model of the embodiments of this application takes the structure of the BK tree as its basis and is rebuilt with the degree of difference between characters in place of the BK tree's edit distance. The specific steps may include: determining any character as the root node, traversing the remaining characters, and inserting each character into the error correction model as a child node of the root node or of an existing child node based on a preset insertion rule. The preset insertion rule of the embodiments of this application includes: step 1, take the root node as the insertion node; step 2, calculate the first degree of difference between the character to be inserted and the insertion node; step 3, calculate the second degree of difference between the insertion node and each of its child nodes; step 4, if there is a first child node whose second degree of difference equals the first degree of difference, take that child node as the insertion node and return to step 2; if there is no such child node, insert the character to be inserted into the error correction model as a child node of the insertion node. Taking FIG. 2 as an example, 'zhangsan' 201 is the root node. When a new character 'zhangsen' 202 needs to be inserted into the error correction model, 'zhangsan' 201 is taken as the insertion node, and the first degree of difference between 'zhangsen' 202 and 'zhangsan' 201 is calculated to be 1; since the insertion node 'zhangsan' 201 has no other child nodes at this point, 'zhangsen' 202 is inserted directly as a child node of 'zhangsan' 201. When a new character 'zhangshen' 203 needs to be inserted, the root node 'zhangsan' 201 is again taken as the insertion node, and the first degree of difference between 'zhangshen' 203 and 'zhangsan' 201 is calculated to be 1; since the child node 'zhangsen' 202 of the insertion node has a degree of difference of 1 from 'zhangsan' 201, 'zhangsen' 202 becomes the new insertion node. Since 'zhangsen' 202 has no child nodes, 'zhangshen' 203 is inserted directly as a child node of 'zhangsen' 202. When a new character 'zhengsen' 204 is inserted, the root node 'zhangsan' 201 is first taken as the insertion node, and the first degree of difference between 'zhengsen' 204 and 'zhangsan' 201 is calculated to be 2; since the insertion node 'zhangsan' 201 has no child node with a second degree of difference of 2, 'zhengsen' 204 is inserted directly as a child node of 'zhangsan' 201. The insertion of the remaining characters proceeds in the same way and is not repeated here.
In an alternative embodiment, after the server constructs the error correction model, it sets a difference threshold for the model; after the character to be processed is input into the error correction model, the model outputs all characters whose degree of difference from the character to be processed is smaller than the difference threshold.
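Putting the preset insertion rule and the threshold search together, a minimal BK-tree sketch over the degree of difference might look as follows. The `difference` callback stands in for whichever model's dis(a, b) is in use, and discretized difference values are assumed so that edge labels can be compared for equality, as the insertion rule requires.

```python
from typing import Callable, Dict, List, Optional, Tuple

class BKNode:
    def __init__(self, word: str):
        self.word = word
        self.children: Dict[float, "BKNode"] = {}  # edge label = degree of difference

class BKTree:
    """BK tree rebuilt over the degree of difference instead of edit distance."""

    def __init__(self, difference: Callable[[str, str], float]):
        self.difference = difference  # dis(a, b) from whichever model is used
        self.root: Optional[BKNode] = None

    def insert(self, word: str) -> None:
        if self.root is None:
            self.root = BKNode(word)                # first character becomes the root
            return
        node = self.root                            # step 1: root is the insertion node
        while True:
            d = self.difference(word, node.word)    # step 2: first degree of difference
            child = node.children.get(d)            # steps 3-4: child with equal edge?
            if child is None:
                node.children[d] = BKNode(word)     # no such child: insert here
                return
            node = child                            # otherwise recurse into that child

    def search(self, query: str, threshold: float) -> List[Tuple[str, float]]:
        """Return all stored characters whose degree of difference from `query`
        is smaller than the preset threshold, pruning by the triangle inequality."""
        results: List[Tuple[str, float]] = []
        stack = [self.root] if self.root else []
        while stack:
            node = stack.pop()
            d = self.difference(query, node.word)
            if d < threshold:
                results.append((node.word, d))
            for edge, c in node.children.items():
                # only subtrees with |edge - d| <= threshold can contain hits
                if d - threshold <= edge <= d + threshold:
                    stack.append(c)
        return results
```

Inserting 'zhangsan', 'zhangsen', 'zhangshen' and 'zhengsen' in that order reproduces the tree of FIG. 2, and search() returns every stored character whose degree of difference from the query is below the preset threshold.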
In the embodiment of the application, the server is based on the BK tree, and the error correction model is reconstructed according to the difference degree between the characters instead of the edit distance of the BK tree, and the difference degree between the characters is finer than the edit distance granularity, so that the search range can be effectively reduced, and the search efficiency is improved.
The distinction between the pronunciation model, the pronunciation tone model and the fuzzy model is further described below with specific embodiments. Taking FIG. 3A as an example, both the surname and the given name of Zhang Qian are polyphonic: the surname can be read with the first tone or the fourth tone of 'zhang', and 茜 can be read 'qian' or 'xi'. Since the pronunciation model does not involve tones, both 'zhangqian' and 'zhangxi' are determined to be exactly the same as Zhang Qian, with a degree of difference of 0. The pronunciation tone model involves tones, so 'zhang1qian4' and 'zhang4xi1' are determined to be exactly the same as Zhang Qian, with a degree of difference of 0. In the fuzzy model, the degree of difference between Zhang Qian and 'zhangqiang' or 'zhangxu' is 1, and in the pronunciation model it is also 1; the basic structure of the fuzzy model is similar to that of the pronunciation model, the difference being that the fuzzy model is modified for polyphones and accents, so it can effectively handle their influence on speech recognition. Taking FIG. 3B as an example, in the pronunciation tone model the degree of difference between 'zhang1qian4' 301 and 'zhang1qian1' 302 is 1: their spellings are identical and only the tone of 'qian' differs. The degree of difference between 'zhang1qian4' 301 and 'zheng4qian4' 303 is 2, since both spelling and tone differ. In the pronunciation model, the degree of difference between 'zhangqian' 304 and 'zhengqian' 305 is 1; it is smaller than in the pronunciation tone model because the pronunciation model does not involve tones. It will be appreciated that the degree of difference between Zhang Qian ('zhangqian') and Zheng Qian ('zhengqian') is 2 in the pronunciation tone model and 1 in the pronunciation model. The spellings of 'zhangqian' 304 and 'zangqian' 306 differ more, with a degree of difference of 2 in the pronunciation model. In the fuzzy model, the degree of difference between 'zhangqian' 307 and 'zhengqian' 309 is also 1, the same as in the pronunciation model; but the degree of difference between 'zhangqian' 307 and 'zangqian' 308 is 1, different from the pronunciation model. By the formula for the degree of difference alone, the degree of difference between 'zhangqian' 307 and 'zangqian' 308 would be 2, but the fuzzy model takes the influence of user accents on speech recognition into account: users in some regions do not distinguish flat and retroflex tongue sounds when speaking, and 'zhang' is often pronounced as 'zang', so the fuzzy model lowers the degree of difference between 'zhangqian' 307 and 'zangqian' 308 based on this factor.
The fuzzy model also considers the influence of polyphones. For example, the character 解 usually reads 'jie', but as a surname it reads 'xie'. If a user reading the name pronounces the correct 'xie' as 'jie', the speech may be misrecognized as a character pronounced 'jia'. If that 'jia' character is input into the pronunciation model, both its initial and final differ from those of 解 (xie), so the degree of difference is high and 解 (xie) is difficult to output as a replacement character. If it is input into the fuzzy model, which takes into account that users misread polyphones, the degree of difference between 'jia' and 解 (xie) is low, so the fuzzy model can output 解 (xie) as a replacement character. It will be appreciated that if the user's Mandarin level is high and the pronunciation is standard, the replacement character output by the pronunciation tone model is more accurate; if the user's Mandarin level is average, the replacement character output by the pronunciation model is more accurate; and if the user's Mandarin level is low, the accent is heavy, or characters are often misread, the replacement character output by the fuzzy model is more accurate.
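One way to picture the accent handling is as a normalization of fuzzy pinyin pairs before comparison. The sketch below is an assumption about one possible realization, not the patent's definition; which pairs to merge, and how much to lower the resulting degree of difference, would be driven by the accent vocabulary.

```python
# Common fuzzy-pinyin merges (flat/retroflex initials, front/back nasal finals).
# The pair tables are assumed for illustration.
FUZZY_INITIALS = {"zh": "z", "ch": "c", "sh": "s"}
FUZZY_FINALS = {"ang": "an", "eng": "en", "ing": "in"}

def fuzzy_normalize(syllable: str) -> str:
    """Map a pinyin syllable onto its accent-merged form, so that e.g.
    'zhang' and 'zang' normalize to the same key in the fuzzy model."""
    for retroflex, flat in FUZZY_INITIALS.items():
        if syllable.startswith(retroflex):
            syllable = flat + syllable[len(retroflex):]
            break
    for back, front in FUZZY_FINALS.items():
        if syllable.endswith(back):
            syllable = syllable[: -len(back)] + front
            break
    return syllable

print(fuzzy_normalize("zhang") == fuzzy_normalize("zang"))  # True
```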
In an alternative embodiment, after the error correction models output the replacement characters and the corresponding degrees of difference, the server may determine the comprehensive degree of difference according to the formula Sdis(c, d) = w1 × dis1(c, d) + w2 × dis2(c, d) + w3 × dis3(c, d), where c and d are the replacement character and the character to be processed respectively, Sdis(c, d) is the comprehensive degree of difference between them, dis1(c, d) is the degree of difference output by the pronunciation model, dis2(c, d) is the degree of difference output by the pronunciation tone model, dis3(c, d) is the degree of difference output by the fuzzy model, and w1, w2 and w3 are the weights of dis1(c, d), dis2(c, d) and dis3(c, d) respectively. Typically, taking the average Mandarin level of users into account, the weights are set so that w1 > w2 > w3.
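A minimal sketch of the weighting step, assuming each model returns a mapping from candidate to its degree of difference; the weight values are illustrative and only respect w1 > w2 > w3.

```python
def comprehensive_difference(candidates, dis1, dis2, dis3,
                             w1=0.5, w2=0.3, w3=0.2):
    """Sdis(c, d) = w1*dis1 + w2*dis2 + w3*dis3 for each candidate c.

    dis1/dis2/dis3 map each candidate to the degree of difference output
    by the pronunciation, pronunciation tone, and fuzzy models respectively.
    The weights here are assumed values respecting w1 > w2 > w3.
    """
    sdis = {c: w1 * dis1[c] + w2 * dis2[c] + w3 * dis3[c] for c in candidates}
    target = min(sdis, key=sdis.get)  # smallest comprehensive difference wins
    return target, sdis
```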
In an alternative embodiment, for character strings of equal length, the server may rebuild the error correction model based on the BK tree with the Hamming distance, i.e. the number of positions at which two equal-length character strings differ, in place of the edit distance. As shown in FIG. 4, the Hamming distance between 'zhangsan' 401 and 'zhangsen' 402 is 1, between 'zhangsan' 401 and 'zhengsen' 403 is 2, between 'zhangsen' 402 and 'zhangshen' 404 is 1, between 'zhangsen' 402 and 'zhangshen' 405 is 2, and between 'zhengsen' 403 and 'zhengsen' 406 is 1. The Hamming distance between 'aim' 407 and 'acm' 408 is 1, and between 'aim' 407 and 'gay' 409 is 3.
Replacing the edit distance of the BK tree with the Hamming distance can reduce the retrieval complexity for strings of the same length.
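For illustration, a Hamming distance helper over equal-length strings takes only a few lines; this is a generic sketch rather than code from the patent.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

print(hamming("zhangsan", "zhangsen"))  # 1
print(hamming("zhangsan", "zhengsen"))  # 2
print(hamming("aim", "acm"))            # 1
print(hamming("aim", "gay"))            # 3
```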
Fig. 5A is a flowchart of another character error correction method for speech recognition according to an embodiment of the present application. As shown in FIG. 5A, the method is divided into two aspects of online error correction processing and offline error correction model construction. The flow of the online error correction process may include:
step 501, obtaining a text to be processed.
The text here may be text resulting from speech recognition.
Step 502, identifying and determining the fields to be processed.
The server can recognize the input text through natural language understanding (Natural Language Understanding, NLU) and determine the fields to be processed for correction.
Step 503, annotating the pinyin with a pinyin tool and generating the character to be processed.
The server can annotate the text field to be corrected with pinyin through a pinyin tool, generating the character to be processed (in pinyin form). If the field to be processed is a person name, the server can also use the names recorded in the address book to assist the annotation.
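The embodiments do not name a specific pinyin tool; as an assumption, the open-source pypinyin library is one common choice for this step. A minimal sketch:

```python
# Assumption: the patent's "pinyin tool" is unnamed; pypinyin is one widely
# used option (pip install pypinyin).
from pypinyin import lazy_pinyin, Style

name_field = "张三"  # hypothetical field flagged for correction by the NLU step

plain = "".join(lazy_pinyin(name_field))                     # 'zhangsan'
toned = "".join(lazy_pinyin(name_field, style=Style.TONE3))  # 'zhang1san1'

# The plain form would feed the pronunciation and fuzzy models,
# the toned form the pronunciation tone model.
print(plain, toned)
```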
Step 504, the error correction model recalls a plurality of replacement characters.
Each replacement character has a corresponding degree of difference from the character to be processed.
Step 505, determining the target character by weighting.
The comprehensive degree of difference of each replacement character is determined according to the formula Sdis(c, d) = w1 × dis1(c, d) + w2 × dis2(c, d) + w3 × dis3(c, d), and the replacement character with the smallest comprehensive degree of difference is determined as the target character.
Step 506, outputting the target character.
The server replaces the character to be processed with the target character and outputs the final correct text.
In the embodiments of the present application, the steps of constructing the error correction model offline may include:
Step 507, obtaining the basic text.
The basic text here includes characters required for constructing the error correction model.
Step 508, determining the similarity between characters.
Step 509, ASR recognition.
Step 510, determining the degree of error recognition association between characters.
The server determines the degree of error recognition association by performing ASR recognition on each character multiple times and counting the error recognition results and the number of errors.
Step 511, constructing the error correction model.
The server determines the degree of difference between characters based on the determined similarity and the degree of error recognition association between them, and constructs the error correction model according to the degree of difference, comprising a pronunciation model, a pronunciation tone model and a fuzzy model. In addition, corresponding nodes are added or modified in the fuzzy model according to the polyphone vocabulary 512 and the accent vocabulary 513 (for example, the degree of difference between 'jia' and 解 'jie', and between 张 'zhang' and 脏 'zang'), so that the fuzzy model can cope with misrecognitions caused by polyphones and accents.
In an actual usage scenario, the method can also be applied to a smartphone. As shown in FIG. 5B, if the user wants to call Zhang Qian, the user can say "call Zhang Qian" to the smartphone's voice assistant. The smartphone recognizes the input speech, but the name in the recognition result is a similar-sounding wrong character string. After the smartphone determines that the name field was misrecognized, it takes the recognized name as the character to be processed and inputs it into the error correction model for error correction; the model outputs "Zhang Qian", and the smartphone dials the call according to the corrected result.
FIG. 6 is a schematic structural diagram of a character error correction device for voice recognition according to an embodiment of the present application. The device may be deployed on a server and, as shown in FIG. 6, may include: a construction module 610, an input module 620, a determination module 630, and a replacement module 640.
A construction module 610, configured to construct an error correction model according to the degree of difference between characters, where the error correction model includes: pronunciation model, pronunciation tone model and fuzzy model.
The input module 620 is configured to input the characters to be processed into the error correction model, so that the error correction model outputs the replacement characters of the characters to be processed and the degree of difference between each replacement character and the characters to be processed.
And the determining module 630 is configured to determine an integrated difference degree corresponding to each replacement character according to the difference degrees corresponding to the replacement characters in different error correction models.
And the replacement module 640 is used for determining the replacement character with the smallest comprehensive degree of difference as the target character, where the target character is used to replace the character to be processed.
The character error correction device for voice recognition in the embodiment of the application can be used as character error correction equipment for voice recognition to realize the character error correction method for voice recognition provided by the embodiment of the application.
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
If the electronic device 700 is deployed in a smart phone, the electronic device 700 may include a processor 710, an external memory interface 720, an internal memory 721, a universal serial bus (universal serial bus, USB) interface 730, a charge management module 740, a power management module 741, a battery 742, an antenna, a mobile communication module 750, an audio module 760, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 700. In other embodiments of the application, electronic device 700 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 710 may include one or more processing units such as, for example: processor 710 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 710 for storing instructions and data. In some embodiments, the memory in processor 710 is a cache memory. The memory may hold instructions or data that has just been used or recycled by the processor 710. If the processor 710 needs to reuse the instruction or data, it may be called directly from the memory. Repeated accesses are avoided and the latency of the processor 710 is reduced, thereby improving the efficiency of the system.
In some embodiments, processor 710 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, which is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In the embodiments of this application, the processor 710 may be coupled to the audio module 760 through a bus such as I2S to implement character voice input, count recognition error results, and determine the degree of error recognition association between characters.
USB interface 730 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc. USB interface 730 may be used to connect a charger to charge electronic device 700, or may be used to transfer data between electronic device 700 and a peripheral device.
It should be understood that the connection between the modules illustrated in the embodiments of the present application is only illustrative, and does not limit the structure of the electronic device 700. In other embodiments of the present application, the electronic device 700 may also use different interfacing manners, or a combination of multiple interfacing manners, as in the above embodiments.
The charge management module 740 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 740 may receive a charging input of a wired charger through the USB interface 730. In some wireless charging embodiments, the charge management module 740 may receive wireless charging input through a wireless charging coil of the electronic device 700. The charging management module 740 may also provide power to the electronic device through the power management module 741 while charging the battery 742.
The power management module 741 is configured to connect the battery 742, and the charge management module 740 and the processor 710. The power management module 741 receives inputs from the battery 742 and/or the charge management module 740 to power the processor 710, the internal memory 721, the mobile communication module 750, and the like.
The mobile communication module 750 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 700.
The external memory interface 720 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 700. The external memory card communicates with the processor 710 via an external memory interface 720 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
Internal memory 721 may be used to store computer-executable program code, including instructions. The internal memory 721 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 700 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 721 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 710 performs various functional applications and data processing of the electronic device 700 by executing instructions stored in the internal memory 721 and/or instructions stored in a memory provided in the processor.
If the electronic device 700 is deployed on a server, the electronic device 700 may include a processor 710 and an internal memory 721.
The processor 710 may perform various functional applications and data processing, such as implementing a character error correction method for voice recognition provided by an embodiment of the present application, by running a program stored in the internal memory 721.
The embodiment of the application also provides a non-transitory computer readable storage medium, which stores computer instructions that enable the computer to execute the character error correction method for voice recognition provided by the embodiment of the application.
The non-transitory computer readable storage media described above may employ any combination of one or more computer readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Additional implementations are included within the scope of the preferred embodiments of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art to which the embodiments of the present application pertain.

Claims (13)

1. A method for correcting errors in speech recognition characters, the method comprising:
constructing an error correction model according to the difference degree between characters, the error correction model comprising: a pronunciation model, a pronunciation tone model and a fuzzy model;
inputting characters to be processed into the error correction model, so that the error correction model outputs replacement characters of the characters to be processed and the difference degree between each replacement character and the characters to be processed;
determining the comprehensive difference degree corresponding to each replacement character according to the difference degrees corresponding to the replacement characters in different error correction models;
determining a replacement character with the minimum comprehensive difference degree as a target character, wherein the target character is used for replacing the character to be processed;
before the constructing an error correction model according to the difference degree between characters, the method further comprises:
acquiring characters for constructing the error correction model;
determining the similarity between characters;
determining the error recognition association degree between characters;
determining the difference degree between characters according to the similarity degree and the error recognition association degree;
the determining the error recognition association degree between the characters comprises the following steps:
performing voice recognition on each character multiple times, and if a recognition is wrong, recording the corresponding wrong recognition result;
counting the number of occurrences of each wrong recognition result and the total number of recognition errors;
and determining the ratio of the number of occurrences to the total number as the error recognition association degree between the corresponding wrong recognition result and the recognized character.
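Read as an algorithm, this amounts to counting misrecognitions of a character and normalizing each wrong result by the total error count. A minimal Python sketch of that reading follows; the function name and the sample recognition results are illustrative assumptions, not part of the claims.

```python
# A minimal sketch of the error recognition association degree in claim 1.
# The names and sample data are assumptions made for illustration.
from collections import Counter

def error_association_degrees(char: str, recognitions: list[str]) -> dict[str, float]:
    """Map each wrong recognition result to its association degree with `char`.

    The degree is the ratio of the number of occurrences of a wrong result
    to the total number of recognition errors, as recited in claim 1.
    """
    errors = [r for r in recognitions if r != char]  # keep only misrecognitions
    if not errors:
        return {}
    counts = Counter(errors)  # occurrences of each wrong result
    return {wrong: n / len(errors) for wrong, n in counts.items()}

# Example: recognized 10 times; 4 runs were wrong.
results = ["zhang"] * 6 + ["zang", "zang", "zang", "chang"]
print(error_association_degrees("zhang", results))
# {'zang': 0.75, 'chang': 0.25}
```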
2. The method of claim 1, wherein determining the similarity between characters comprises:
pre-configuring a plurality of similarity levels according to spelling rules, each similarity level corresponding to one similarity;
determining the similarity level between any two characters according to spelling differences between the characters;
and determining the similarity corresponding to the similarity level as the similarity between the characters.
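Claim 2 fixes only the mechanism: levels are pre-configured from spelling rules, and each level maps to exactly one similarity value. A hedged sketch of such a lookup follows; the level names and numeric values are assumptions invented for illustration and are not fixed by the claims.

```python
# A sketch of claim 2: each pre-configured similarity level maps to exactly
# one similarity value. Names and numbers below are illustrative assumptions.
SIMILARITY_BY_LEVEL: dict[str, float] = {
    "initials_and_finals_same": 0.90,
    "front_rear_nasal_confusable": 0.80,
    "flat_retroflex_confusable": 0.75,
    "polyphone": 0.60,
    "equal_length_different_initials": 0.50,
    "unequal_length_different_initials": 0.35,
    "unequal_length_different_finals": 0.30,
    "initials_and_finals_different": 0.10,
}

def similarity_for(level: str) -> float:
    """Return the single similarity value configured for a spelling level."""
    return SIMILARITY_BY_LEVEL[level]

print(similarity_for("flat_retroflex_confusable"))  # 0.75
```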
3. The method of claim 2, wherein if the error correction model is the pronunciation model or the fuzzy model, the similarity levels comprise: identical initials and finals; front and rear nasal sounds of unequal length; equal-length flat and retroflex pinyin; equal-length pinyin with different initials; unequal-length pinyin with different initials; unequal-length pinyin with different finals; polyphones; and pinyin with both initials and finals different.
4. The method of claim 2, wherein if the error correction model is the fuzzy model, the similarity levels further comprise: identical initials and finals with different accents.
5. The method of claim 2, wherein if the error correction model is the pronunciation tone model, the similarity levels comprise: identical initials and finals with the same tone; identical initials and finals with different tones; unequal-length front and rear nasal sounds with the same tone; unequal-length front and rear nasal sounds with different tones; equal-length flat and retroflex pinyin with the same tone; equal-length pinyin with different initials and different tones; equal-length pinyin with different finals and different tones; unequal-length pinyin with different initials and different tones; unequal-length pinyin with different finals and different tones; polyphones; and pinyin with both initials and finals different.
6. The method of claim 1, wherein the determining the difference degree between characters according to the similarity degree and the error recognition association degree comprises:
determining the difference degree between characters according to the formula dis(a, b) = 1 − [W1 × V1(a, b) + W2 × V2(a, b)], wherein a and b are any two characters, dis(a, b) is the difference degree between character a and character b, V1(a, b) is the similarity degree between character a and character b, W1 is the weight corresponding to the similarity degree, V2(a, b) is the error recognition association degree between character a and character b, and W2 is the weight corresponding to the error recognition association degree.
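To make the formula concrete, a small worked example follows; the weights and input values are assumed for illustration only, since claim 6 does not fix them.

```python
# Worked example of dis(a, b) = 1 - [W1*V1(a, b) + W2*V2(a, b)] (claim 6).
# Weights and input values are assumptions for illustration.
W1, W2 = 0.6, 0.4    # weights for similarity and association degree (assumed)
V1, V2 = 0.9, 0.75   # high spelling similarity, frequently confused pair
dis = 1 - (W1 * V1 + W2 * V2)
print(round(dis, 2))  # 0.16 - a small difference degree, a good candidate
```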
7. The method of claim 1, wherein the constructing an error correction model according to the difference degree between characters comprises:
determining any character as a root node;
traversing the remaining characters, and inserting each character into the error correction model as a child node of the root node, or as a child node of an existing child node, based on a preset insertion rule.
8. The method of claim 7, wherein the preset insertion rule comprises:
taking the root node as an insertion node;
calculating a first difference degree between the character to be inserted and the insertion node;
calculating a second difference degree between the insertion node and each child node of the insertion node;
if there is a first child node whose corresponding second difference degree is the same as the first difference degree, taking the first child node as the insertion node and performing again the step of calculating the first difference degree between the character to be inserted and the insertion node; and if there is no such first child node, inserting the character to be inserted into the error correction model as a child node of the insertion node.
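The insertion rule recited in claims 7 and 8 reads like the classic construction of a BK-tree (Burkhard–Keller tree): descend while a child at the same edge distance exists, otherwise attach a new child. Below is a minimal sketch under that reading; the BKNode class, the diff callable, and the toy distance are illustrative names assumed for this example, not structures defined by the claims.

```python
# A sketch of the claims 7-8 insertion rule as BK-tree construction.
# `diff` stands in for the claimed difference degree and is assumed.
from typing import Callable

class BKNode:
    def __init__(self, char: str):
        self.char = char
        self.children: dict[float, "BKNode"] = {}  # keyed by difference degree

def insert(root: BKNode, char: str, diff: Callable[[str, str], float]) -> None:
    node = root                              # start from the root (claim 8)
    while True:
        d = diff(char, node.char)            # first difference degree
        child = node.children.get(d)         # child with the same second degree?
        if child is None:
            node.children[d] = BKNode(char)  # no such child: attach here
            return
        node = child                         # descend and recompute

# Toy demo with a made-up difference degree (absolute pinyin-length gap).
toy_diff = lambda a, b: float(abs(len(a) - len(b)))
root = BKNode("zhang")
for ch in ["zang", "chang", "shan"]:
    insert(root, ch, toy_diff)
```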
9. The method according to claim 1, wherein the method further comprises:
presetting a difference threshold of the error correction model, wherein the difference degree between each replacement character output by the error correction model and the character to be processed is smaller than the difference threshold.
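On the tree of claims 7 and 8, such a threshold corresponds to a range query. Continuing the BKNode sketch above, a hedged version follows; its pruning is only sound if the difference degree behaves like a metric (triangle inequality), an assumption the claims do not state.

```python
# Range query over the BKNode sketch above: return every stored character
# whose difference degree to `char` is below `threshold` (claim 9).
# Pruning assumes `diff` obeys the triangle inequality.
def query(root: BKNode, char: str, threshold: float,
          diff: Callable[[str, str], float]) -> list[tuple[str, float]]:
    matches: list[tuple[str, float]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        d = diff(char, node.char)
        if d < threshold:                    # close enough: a candidate
            matches.append((node.char, d))
        for edge, child in node.children.items():
            if d - threshold < edge < d + threshold:  # prune distant subtrees
                stack.append(child)
    return matches
```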
10. The method of claim 1, wherein the determining the comprehensive difference degree corresponding to each replacement character according to the difference degrees corresponding to the replacement character in different error correction models comprises:
determining the comprehensive difference degree according to the formula Sdis(c, d) = w1 × dis1(c, d) + w2 × dis2(c, d) + w3 × dis3(c, d), wherein c and d are respectively the replacement character and the character to be processed, Sdis(c, d) is the comprehensive difference degree between the replacement character and the character to be processed, dis1(c, d) is the difference degree output by the pronunciation model, dis2(c, d) is the difference degree output by the pronunciation tone model, dis3(c, d) is the difference degree output by the fuzzy model, and w1, w2 and w3 are the weights of dis1(c, d), dis2(c, d) and dis3(c, d), respectively.
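A small worked instance of this weighted sum follows; the weights and the per-model difference degrees are assumptions for illustration only, since claim 10 fixes only the weighted-sum form.

```python
# Worked example of Sdis(c, d) = w1*dis1 + w2*dis2 + w3*dis3 (claim 10).
# Weights and per-model difference degrees are assumed for illustration.
w1, w2, w3 = 0.4, 0.3, 0.3              # model weights (assumed)
dis1, dis2, dis3 = 0.16, 0.25, 0.10     # pronunciation, tone, fuzzy outputs
sdis = w1 * dis1 + w2 * dis2 + w3 * dis3
print(round(sdis, 3))                    # 0.169 - this candidate's combined score
```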
11. A character error correction apparatus for speech recognition, the apparatus comprising:
the building module is used for building an error correction model according to the difference degree between characters, the error correction model comprising: a pronunciation model, a pronunciation tone model and a fuzzy model;
the input module is used for inputting the characters to be processed into the error correction model so that the error correction model outputs the replacement characters of the characters to be processed and the difference degree between each replacement character and the characters to be processed;
the determining module is used for determining the comprehensive difference degree corresponding to each replacement character according to the difference degrees corresponding to the replacement characters in different error correction models;
the replacing module is used for determining the replacement character with the smallest comprehensive difference degree as a target character, and the target character is used for replacing the character to be processed;
the construction module is also used for acquiring characters for constructing the error correction model; determining the similarity between characters; determining the error recognition association degree between characters; determining the difference degree between characters according to the similarity degree and the error recognition association degree;
the determining the error recognition association degree between the characters comprises the following steps:
performing voice recognition on each character multiple times, and if a recognition is wrong, recording the corresponding wrong recognition result;
counting the number of occurrences of each wrong recognition result and the total number of recognition errors;
and determining the ratio of the number of occurrences to the total number as the error recognition association degree between the corresponding wrong recognition result and the recognized character.
12. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions that are called by the processor to be able to perform the method of any one of claims 1 to 10.
13. A computer readable storage medium comprising a stored program, wherein the program when executed by a processor implements the method of any one of claims 1 to 10.
CN202210917316.1A 2022-08-01 2022-08-01 Character error correction method and device for voice recognition Active CN116052657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210917316.1A CN116052657B (en) 2022-08-01 2022-08-01 Character error correction method and device for voice recognition

Publications (2)

Publication Number Publication Date
CN116052657A CN116052657A (en) 2023-05-02
CN116052657B true CN116052657B (en) 2023-10-20

Family

ID=86127855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210917316.1A Active CN116052657B (en) 2022-08-01 2022-08-01 Character error correction method and device for voice recognition

Country Status (1)

Country Link
CN (1) CN116052657B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014074732A (en) * 2012-10-02 2014-04-24 Nippon Hoso Kyokai <Nhk> Voice recognition device, error correction model learning method and program
CN109036419A (en) * 2018-07-23 2018-12-18 努比亚技术有限公司 A kind of speech recognition match method, terminal and computer readable storage medium
CN109145276A (en) * 2018-08-14 2019-01-04 杭州智语网络科技有限公司 A kind of text correction method after speech-to-text based on phonetic
CN109147762A (en) * 2018-10-19 2019-01-04 广东小天才科技有限公司 A kind of audio recognition method and system
CN109712616A (en) * 2018-11-29 2019-05-03 平安科技(深圳)有限公司 Telephone number error correction method, device and computer equipment based on data processing
CN112562668A (en) * 2020-11-30 2021-03-26 广州橙行智动汽车科技有限公司 Semantic information deviation rectifying method and device
CN113012705A (en) * 2021-02-24 2021-06-22 海信视像科技股份有限公司 Error correction method and device for voice text
CN113343671A (en) * 2021-06-07 2021-09-03 佳都科技集团股份有限公司 Statement error correction method, device and equipment after voice recognition and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant