CN110516248A

CN110516248A - Method, device, storage medium and electronic device for correcting errors in speech recognition results

Info

Publication number: CN110516248A
Application number: CN201910798559.6A
Authority: CN
Inventors: 陈晓宇; 张彬彬; 江明奇; 雷欣
Original assignee: Go Out And Ask (suzhou) Information Technology Co Ltd
Current assignee: Volkswagen China Investment Co Ltd; Mobvoi Innovation Technology Co Ltd
Priority date: 2019-08-27
Filing date: 2019-08-27
Publication date: 2019-11-29

Abstract

Embodiments of the present invention provide a method, device, storage medium and electronic device for correcting errors in speech recognition results. The edit distance between a first pinyin sequence, corresponding to the speech recognition result, and each of multiple second pinyin sequences, corresponding to multiple candidate text sequences, is calculated, yielding an aligned first pinyin sequence and aligned second pinyin sequences. If the number of identical characters between the first text (corresponding to the aligned first pinyin sequence) and each second text (corresponding to an aligned second pinyin sequence) is greater than a first threshold, the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence is calculated. When the maximum of these similarities is greater than a second threshold, the speech recognition result is replaced with the text corresponding to the second pinyin sequence with the maximum similarity. This achieves automatic error correction of speech recognition results and improves recognition accuracy.

Description

Method, device, storage medium and electronic device for correcting errors in speech recognition results

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a method, device, storage medium and electronic device for correcting errors in speech recognition results.

Background art

In information-verification scenarios handled by automated telephone customer service, the system may ask the user to state personal information by voice, such as the last four digits of their ID card number, their age or their address, and compares it with the genuine user's information. Further information services are provided only after the comparison succeeds and the user's identity is confirmed, so the accuracy of the system's speech recognition is critical in this scenario.

However, recognition of telephone speech is currently often inaccurate, because of the low sampling rate of call audio, signal distortion during channel transmission, low signal-to-noise ratio and similar factors, or because of the user's accent, which results in a poor user experience.

Therefore, how to use user information to automatically correct speech recognition results, and thereby improve recognition accuracy, is a problem that urgently needs to be solved.

Summary of the invention

In view of this, embodiments of the present invention provide a method, device, storage medium and electronic device for correcting errors in speech recognition results, which automatically correct speech recognition results and improve recognition accuracy.

In a first aspect, an embodiment of the present invention provides a method for correcting errors in a speech recognition result, the method comprising:

Annotating the speech recognition result with pinyin to determine a first pinyin sequence corresponding to the speech recognition result;

Determining multiple candidate second pinyin sequences;

Calculating the edit distance between the first pinyin sequence and each second pinyin sequence;

Determining an aligned first pinyin sequence and each aligned second pinyin sequence according to the corresponding edit distance;

Determining a first text corresponding to the aligned first pinyin sequence and each second text corresponding to each aligned second pinyin sequence;

In response to the number of identical characters between the first text and each second text being greater than a first threshold, calculating the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence;

In response to the maximum of the similarities being greater than a second threshold, replacing the speech recognition result with the text corresponding to the second pinyin sequence having the maximum similarity.

Preferably, annotating the speech recognition result with pinyin to determine the corresponding first pinyin sequence comprises:

Determining the pinyin combination corresponding to each character in the speech recognition result;

Inserting a predetermined separator between the pinyin combinations of adjacent characters to obtain the first pinyin sequence;

Alternatively,

annotating the speech recognition result with pinyin to determine the corresponding first pinyin sequence comprises:

Determining the pinyin combination corresponding to each character in the speech recognition result;

Inserting a predetermined separator between the pinyin combinations of adjacent characters, and between the initial and the final of each pinyin combination, to obtain the first pinyin sequence.

Preferably, determining the multiple candidate second pinyin sequences comprises:

Obtaining candidate text sequences;

Determining the corresponding multiple candidate second pinyin sequences according to the candidate text sequences.

Preferably, the candidate text sequences include text sequences of the user's ID card number, birthday, address, hospitals near the address and supermarkets near the address.

Preferably, determining the corresponding multiple candidate second pinyin sequences according to the candidate text sequences comprises:

In response to a polyphonic character being present in a text sequence, determining the multiple pinyin combinations corresponding to the polyphonic character;

Determining multiple corresponding second pinyin sequences respectively according to the multiple pinyin combinations of the polyphonic character.

Preferably, determining the aligned first pinyin sequence and each aligned second pinyin sequence according to the corresponding edit distance comprises:

Marking, according to the corresponding edit distance, the parts of the first pinyin sequence and each second pinyin sequence that need to be inserted or deleted;

Deleting, from the marked first pinyin sequence and each marked second pinyin sequence, the leftmost and rightmost parts marked with the insertion or deletion symbol, to determine the aligned first pinyin sequence and each aligned second pinyin sequence.

Preferably, the method further comprises:

In response to the maximum of the similarities not being greater than the second threshold, keeping the speech recognition result unchanged.

In a second aspect, an embodiment of the present invention provides a device for correcting errors in a speech recognition result, the device comprising:

A pinyin annotation unit, configured to annotate the speech recognition result with pinyin to determine the first pinyin sequence corresponding to the speech recognition result;

A first determination unit, configured to determine multiple candidate second pinyin sequences;

A first calculation unit, configured to calculate the edit distance between the first pinyin sequence and each second pinyin sequence;

A second determination unit, configured to determine the aligned first pinyin sequence and each aligned second pinyin sequence according to the corresponding edit distance;

A third determination unit, configured to determine the first text corresponding to the aligned first pinyin sequence and each second text corresponding to each aligned second pinyin sequence;

A second calculation unit, configured to calculate, in response to the number of identical characters between the first text and each second text being greater than the first threshold, the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence;

A processing unit, configured to replace, in response to the maximum of the similarities being greater than the second threshold, the speech recognition result with the text corresponding to the second pinyin sequence having the maximum similarity.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method according to the first aspect.

In a fourth aspect, an embodiment of the present invention provides an electronic device comprising a memory and a processor, wherein the memory stores one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the method according to the first aspect.

Embodiments of the present invention calculate the edit distance between the first pinyin sequence corresponding to the speech recognition result and each second pinyin sequence corresponding to the multiple candidate text sequences, to obtain the aligned first pinyin sequence and each aligned second pinyin sequence. If the number of identical characters between the first text corresponding to the aligned first pinyin sequence and each second text corresponding to an aligned second pinyin sequence is greater than a first threshold, the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence is calculated. When the maximum of these similarities is greater than a second threshold, the speech recognition result is replaced with the text corresponding to the second pinyin sequence with the maximum similarity. This achieves automatic error correction of speech recognition results and improves recognition accuracy.

Brief description of the drawings

The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of an automated telephone customer service scenario according to an embodiment of the present invention;

Fig. 2 is a flowchart of the method for correcting errors in a speech recognition result according to an embodiment of the present invention;

Fig. 3 is a data flow diagram of the method for correcting errors in a speech recognition result according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the device for correcting errors in a speech recognition result according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the electronic device according to an embodiment of the present invention.

Detailed description of embodiments

The present invention is described below on the basis of embodiments, but it is not limited to these embodiments. Some specific details are described at length in the following detailed description; a person skilled in the art can nevertheless fully understand the present invention without them. To avoid obscuring the essence of the invention, well-known methods, processes, flows, elements and circuits are not described in detail.

In addition, a person of ordinary skill in the art should understand that the drawings provided herein are for illustration purposes and are not necessarily drawn to scale.

Unless the context clearly requires otherwise, words such as "include" and "comprise" throughout the application should be interpreted in an inclusive rather than exclusive or exhaustive sense; that is, as "including but not limited to".

In the description of the present invention, it should be understood that the terms "first", "second" and so on are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance. In addition, unless otherwise stated, "multiple" means two or more.

Speech recognition takes speech as its object of study and, through speech signal processing and pattern recognition, enables a machine to automatically recognize and understand human spoken language. Speech recognition technology allows a machine to convert speech signals into corresponding text or commands through a process of recognition and understanding.

Fig. 1 is a schematic diagram of an automated telephone customer service scenario according to an embodiment of the present invention. As shown in Fig. 1, in the automated voice customer service scenario, the user's telephone or mobile phone 11 connects via a telephone network or the Internet 12 to a system consisting of an automatic speech server 13 and a computer device 14. The user's speech is transmitted to the system, which recognizes it and answers according to the recognition result; the system's spoken reply is passed back to the user over the same connection, and the user responds to it, thereby realizing automated voice customer service. In this process the system first verifies the user's identity and provides further information services only once the stated information matches the genuine user information. How to use user information to automatically correct speech recognition results, and thereby improve recognition accuracy, is therefore a problem that urgently needs to be solved.

Fig. 2 is a flowchart of the method for correcting errors in a speech recognition result according to an embodiment of the present invention. As shown in Fig. 2, the method of this embodiment comprises the following steps:

Step S110: annotate the speech recognition result with pinyin to determine the first pinyin sequence corresponding to the speech recognition result.

Since the speech recognition result is text, any method for annotating text with pinyin can be used to obtain the pinyin of the speech recognition result.

Specifically, the speech recognition result may be a sentence composed of Chinese characters, and its pinyin is a sequence of toneless Hanyu Pinyin syllables.

In one optional implementation, step S110 comprises the following steps:

Step S111: determine the pinyin combination corresponding to each character in the speech recognition result.

Step S112: insert a predetermined separator between the pinyin combinations of adjacent characters to obtain the first pinyin sequence.

For example, if the speech recognition result is "有家里福超市", the pinyin combination of each character is:

有: you; 家: jia; 里: li; 福: fu; 超: chao; 市: shi

Inserting the predetermined separator, here a space, between the pinyin combinations of adjacent characters gives:

you jia li fu chao shi

That is, the first pinyin sequence is:

you jia li fu chao shi
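Steps S111 and S112 can be sketched in Python as below. This is a minimal illustration rather than the patent's implementation: `PINYIN_TABLE` is a hypothetical, hand-written subset of a character-to-pinyin dictionary that covers only this example, and a space is used as the predetermined separator.

```python
# Hypothetical toneless character-to-pinyin table covering only this example;
# a real system would use a complete pronunciation dictionary.
PINYIN_TABLE = {
    "有": "you", "家": "jia", "里": "li",
    "福": "fu", "超": "chao", "市": "shi",
}


def to_syllable_sequence(text: str, sep: str = " ") -> str:
    """Steps S111-S112: look up each character's pinyin, then join with sep."""
    syllables = [PINYIN_TABLE[ch] for ch in text]  # KeyError if a character is missing
    return sep.join(syllables)


print(to_syllable_sequence("有家里福超市"))  # you jia li fu chao shi
```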

In another optional implementation, step S110 comprises the following steps:

Step S113: determine the pinyin combination corresponding to each character in the speech recognition result.

Step S114: insert a predetermined separator between the pinyin combinations of adjacent characters and between the initial and the final of each pinyin combination to obtain the first pinyin sequence.

Inserting the predetermined space separator between the pinyin combinations of adjacent characters and between the initial and the final of each pinyin combination gives:

y ou j ia l i f u ch ao sh i

That is, the first pinyin sequence is:

y ou j ia l i f u ch ao sh i

It should be understood that the separator can be replaced as needed with another character or symbol that rarely appears in pinyin. Moreover, the separator between adjacent characters and the separator between initial and final may be the same or different.
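Step S114 can be sketched under the same assumptions. The split relies on the standard set of Mandarin initials, checking the two-letter initials zh, ch and sh first; following the example above, y and w are also treated as initials. Separate parameters for the character separator and the initial/final separator reflect the point that the two may be the same or different; the function and parameter names are illustrative.

```python
# Mandarin initials; two-letter ones come first so "chao" -> "ch" + "ao".
# y and w are included to match the example ("you" -> "y" + "ou").
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> list[str]:
    """Split a toneless pinyin syllable into [initial, final], or [final]."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]  # zero-initial syllable such as "ai" or "er"


def to_initial_final_sequence(syllables: list[str], char_sep: str = " ",
                              unit_sep: str = " ") -> str:
    """Step S114: char_sep between characters, unit_sep between initial and final."""
    return char_sep.join(unit_sep.join(split_syllable(s)) for s in syllables)


print(to_initial_final_sequence(["you", "jia", "li", "fu", "chao", "shi"]))
# y ou j ia l i f u ch ao sh i
```

Passing, for example, `char_sep="|"` keeps character boundaries recoverable even after initials and finals are separated.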

Step S120: determine multiple candidate second pinyin sequences.

Specifically, step S120 may comprise the following steps:

Step S121: obtain candidate text sequences.

Candidate text sequences related to the type of question the system has asked the user are obtained from the user information. Address-type verification in particular suffers from low recognition rates, because addresses cover a wide range and include many abbreviations, homophonous addresses and the like. When collecting user information in advance, the system can therefore also collect information that is easy to recognize with high accuracy, such as the supermarkets and hospitals near the user's address. This improves recognition accuracy for address-type questions and makes it easier to confirm the user's information.

For example, if the system asks the user about supermarkets near their home, the supermarkets near the user's home can be obtained from the user information. If these are Milan Life Supermarket, China Resources Supermarket, Weihai Sincere Supermarket, Love Convenience Store and Carrefour Supermarket, these five names are used as the candidate text sequences.
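One way to organise step S121 is a lookup from the type of question asked to the relevant field of the pre-collected user information. The record layout, field names and question-type keys below are hypothetical; the candidate names are English renderings of the supermarkets in the example above.

```python
# Hypothetical pre-collected user information; field names are illustrative.
USER_INFO = {
    "nearby_supermarkets": [
        "Milan Life Supermarket", "China Resources Supermarket",
        "Weihai Sincere Supermarket", "Love Convenience Store",
        "Carrefour Supermarket",
    ],
    "nearby_hospitals": [],
}

# Hypothetical mapping from question type to the field holding its candidates.
QUESTION_TO_FIELD = {
    "supermarket_near_home": "nearby_supermarkets",
    "hospital_near_home": "nearby_hospitals",
}


def get_candidates(question_type: str, user_info: dict) -> list[str]:
    """Step S121: candidate text sequences for the question just asked."""
    field = QUESTION_TO_FIELD.get(question_type)
    if field is None:
        return []  # no candidates known for this question type
    return list(user_info.get(field, []))


print(len(get_candidates("supermarket_near_home", USER_INFO)))  # 5
```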

Of course, besides addresses, other verification questions with low recognition accuracy can also be handled by collecting information that is easy to recognize with high accuracy and using it to further confirm the user's information.

Optionally, in embodiments of the present invention, the candidate text sequences include text sequences of the user's ID card number, birthday, address, hospitals near the address and supermarkets near the address.

Step S122: determine the corresponding multiple candidate second pinyin sequences according to the candidate text sequences.

Specifically, step S122 is similar to step S110; the difference is that step S122 further comprises:

In response to a polyphonic character being present in a candidate text sequence, determining the multiple pinyin combinations corresponding to the polyphonic character.

For example, the candidate text sequence "家乐福超市" (Carrefour Supermarket) contains the polyphonic character "乐", whose pinyin combinations are le and yue.

Therefore, the second pinyin sequences corresponding to the candidate text sequence "家乐福超市" are:

jia le fu chao shi

jia yue fu chao shi

or, with initials and finals separated:

j ia l e f u ch ao sh i

j ia y ue f u ch ao sh i

In embodiments of the present invention, polyphonic characters in the candidate text sequences are taken into account, and multiple corresponding second pinyin sequences are determined for text containing polyphonic characters, which improves the accuracy of speech recognition error correction.
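Polyphone expansion in step S122 amounts to taking the Cartesian product of each character's readings. In this sketch `READINGS` is a hypothetical pronunciation table, and the output reproduces the two syllable-level sequences for 家乐福超市 listed above.

```python
from itertools import product

# Hypothetical pronunciation table; polyphonic characters list every reading.
READINGS = {
    "家": ["jia"], "乐": ["le", "yue"], "福": ["fu"],
    "超": ["chao"], "市": ["shi"],
}


def candidate_pinyin_sequences(text: str, sep: str = " ") -> list[str]:
    """Step S122: one second pinyin sequence per combination of readings."""
    per_char = [READINGS[ch] for ch in text]
    return [sep.join(combo) for combo in product(*per_char)]


print(candidate_pinyin_sequences("家乐福超市"))
# ['jia le fu chao shi', 'jia yue fu chao shi']
```

The number of sequences is the product of the reading counts, so a text with k two-reading characters yields 2^k second pinyin sequences.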

In addition, the first pinyin sequence in step S110 and the multiple candidate second pinyin sequences in step S120 must be determined in the same way.

Step S130: calculate the edit distance between the first pinyin sequence and each second pinyin sequence.

The edit distance between two strings is the minimum number of edit operations needed to transform one into the other. The permitted edit operations are substituting one character for another, inserting a character and deleting a character.

Step S130 uses a variant of this edit distance: here it refers to the minimal set of edit operations needed to transform one pinyin sequence into the other, where the unit of editing is a pinyin component rather than a single letter.

If the pinyin sequences were obtained by inserting the predetermined separator only between the pinyin combinations of adjacent characters, the pinyin of each single character is treated as one unit for insertion, deletion or substitution when calculating the edit distance.

For example, if the first pinyin sequence is you jia li fu chao shi and the second pinyin sequence is jia le fu chao shi, the edit operations that transform the first pinyin sequence into the second are:

jia li fu chao shi (delete you)

jia le fu chao shi (substitute le for li)

Correspondingly, the edit operations that transform the second pinyin sequence into the first are:

you jia le fu chao shi (insert you)

you jia li fu chao shi (substitute li for le)
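The number of such operations is the classic Levenshtein distance, computable with the standard dynamic program below. Because the function compares sequence elements, passing lists of pinyin units (the sequence split on its separator) makes each syllable, or each initial and final, count as one unit, as step S130 requires. The function name is illustrative.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance with unit-cost substitution, insertion and deletion."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # substitute, or match if equal
        prev = curr
    return prev[-1]


first = "you jia li fu chao shi".split()
second = "jia le fu chao shi".split()
print(edit_distance(first, second))  # 2: delete you, substitute le for li
```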

If the pinyin sequences were obtained by inserting the predetermined separator between the pinyin combinations of adjacent characters and between the initial and the final of each pinyin combination, each initial and each final is treated as one unit for insertion, deletion or substitution when calculating the edit distance.

For example, if the first pinyin sequence is y ou j ia l i f u ch ao sh i and the second pinyin sequence is j ia l e f u ch ao sh i, the edit operations that transform the first pinyin sequence into the second are:

ou j ia l i f u ch ao sh i (delete y)

j ia l i f u ch ao sh i (delete ou)

j ia l e f u ch ao sh i (substitute e for i)

Correspondingly, the edit operations that transform the second pinyin sequence into the first are:

y j ia l e f u ch ao sh i (insert y)

y ou j ia l e f u ch ao sh i (insert ou)

y ou j ia l i f u ch ao sh i (substitute i for e)
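The examples list the operations themselves, not only their number. The sketch below recovers one optimal edit script by backtracking the dynamic-programming table. The tuple format and the tie-breaking order (diagonal first, then deletion, then insertion) are assumptions; when several optimal scripts exist, another rule may choose a different one.

```python
def edit_script(a: list[str], b: list[str]) -> list[tuple]:
    """One optimal alignment of a and b as (op, unit_in_a, unit_in_b) tuples.

    op is "match", "sub", "del" (unit only in a) or "ins" (unit only in b).
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    ops, i, j = [], n, m
    while i > 0 or j > 0:  # walk back from the bottom-right corner
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            op = "match" if a[i - 1] == b[j - 1] else "sub"
            ops.append((op, a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", a[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, b[j - 1]))
            j -= 1
    return ops[::-1]


script = edit_script("y ou j ia l i f u ch ao sh i".split(),
                     "j ia l e f u ch ao sh i".split())
print([op for op in script if op[0] != "match"])
# [('del', 'y', None), ('del', 'ou', None), ('sub', 'i', 'e')]
```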

Step S140: determine the aligned first pinyin sequence and each aligned second pinyin sequence according to the corresponding edit distance.

Step S140 considers only the edit operations other than substitution, that is, only deletions and insertions.

Specifically, step S140 comprises the following steps:

Step S141: according to the corresponding edit distance, mark the parts of the first pinyin sequence and each second pinyin sequence that need to be inserted or deleted.

Step S142: from the marked first pinyin sequence and each marked second pinyin sequence, delete the leftmost and rightmost parts marked with the insertion or deletion symbol, to determine the aligned first pinyin sequence and each aligned second pinyin sequence.

In step S142, because the insertion and deletion operations of the edit distance correspond to each other, as the examples in step S130 show, deleting the leftmost and rightmost marked parts of the marked first pinyin sequence and of each marked second pinyin sequence yields sequences that are aligned with each other, namely the aligned first pinyin sequence and each aligned second pinyin sequence.

In the first example of step S130, the part of the first pinyin sequence that must be deleted according to the corresponding edit distance is you:

you jia li fu chao shi

Marking the corresponding position of the second pinyin sequence with the insertion/deletion symbol "-" gives:

- jia le fu chao shi

Deleting the leftmost and rightmost marked parts of the marked first pinyin sequence (you jia li fu chao shi) and the marked second pinyin sequence (- jia le fu chao shi) gives jia li fu chao shi and jia le fu chao shi; that is, the aligned first pinyin sequence is jia li fu chao shi and the aligned second pinyin sequence is jia le fu chao shi.

Note that in this embodiment, parts of the first pinyin sequence that need substitution are not marked.
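Steps S141 and S142 can be sketched as trimming an edit script, here assumed to be a list of (op, unit_in_first, unit_in_second) tuples with op one of "match", "sub", "del" or "ins". Leading and trailing insertion/deletion columns are dropped, interior ones are kept, and substitutions are left untouched, consistent with the note that substituted parts are not marked. The script in the usage lines is written out by hand for the example above; the function name is illustrative.

```python
def trim_alignment(ops: list[tuple]) -> tuple[list[str], list[str]]:
    """Steps S141-S142: drop leftmost and rightmost insert/delete columns."""
    start, end = 0, len(ops)
    while start < end and ops[start][0] in ("del", "ins"):
        start += 1
    while end > start and ops[end - 1][0] in ("del", "ins"):
        end -= 1
    core = ops[start:end]  # interior insertions and deletions are kept
    first = [a for _, a, _ in core if a is not None]
    second = [b for _, _, b in core if b is not None]
    return first, second


script = [("del", "you", None), ("match", "jia", "jia"), ("sub", "li", "le"),
          ("match", "fu", "fu"), ("match", "chao", "chao"), ("match", "shi", "shi")]
first, second = trim_alignment(script)
print(" ".join(first))   # jia li fu chao shi
print(" ".join(second))  # jia le fu chao shi
```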

Step S150: determine the first text corresponding to the aligned first pinyin sequence and each second text corresponding to each aligned second pinyin sequence.

Step S160: in response to the number of identical characters between the first text and each second text being greater than a first threshold, calculate the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence.

Specifically, the number of identical characters between the first text and a second text is the number of characters of the aligned speech recognition result that hit (match) the aligned candidate text sequence, and the first threshold is a preset number of hit characters. Because the user's utterance may be very short and happen to hit a single character of a candidate text sequence, the number of hit characters must exceed the first threshold before the similarity is calculated.

For example, suppose the first pinyin sequence corresponding to the speech recognition result is shi and the second pinyin sequence is jia le fu chao shi. After alignment both sequences are shi, so if the similarity were calculated directly without checking the first threshold, it would be 100%, which does not reflect reality.

In embodiments of the present invention, before the similarity is calculated, it is checked whether the number of identical characters between the first text and each second text is greater than the first threshold, and the similarity is calculated only if it is. This avoids erroneous corrections and improves the accuracy of speech recognition error correction.
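A sketch of the first-threshold gate in step S160. It assumes the aligned first and second texts have already been recovered in step S150 (with syllable-level units, each unit corresponds to one character) and counts identical characters position by position; the patent does not spell out the counting rule, so this is one plausible reading. The function names and the threshold value used in the checks are hypothetical.

```python
def hit_count(first_text: str, second_text: str) -> int:
    """Count aligned positions where the two texts have the same character."""
    return sum(1 for x, y in zip(first_text, second_text) if x == y)


def passes_first_threshold(first_text: str, second_text: str,
                           first_threshold: int) -> bool:
    """Step S160 gate: compute a similarity only if hits exceed the threshold."""
    return hit_count(first_text, second_text) > first_threshold


print(hit_count("家里福超市", "家乐福超市"))  # 4
print(passes_first_threshold("市", "市", 2))  # False: a single hit is not enough
```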

Optionally, the similarity may be calculated by dividing the number of substitution operations in the edit distance of step S130 by the total number of letters in the aligned second pinyin sequence or in the aligned first pinyin sequence.

For the example above, the similarity is the number of substitution operations in the edit distance, 1, divided by the total number of letters in the second pinyin sequence, 14, that is, 1/14 ≈ 0.071.
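The arithmetic of this example as a short sketch; the function names are illustrative.

```python
def letter_count(units: list[str]) -> int:
    """Total number of letters in a pinyin sequence, separators excluded."""
    return sum(len(u) for u in units)


def substitution_ratio(num_substitutions: int, aligned_units: list[str]) -> float:
    """Substitutions in the edit script divided by letters in an aligned sequence."""
    return num_substitutions / letter_count(aligned_units)


aligned_second = "jia le fu chao shi".split()
print(letter_count(aligned_second))                     # 14
print(round(substitution_ratio(1, aligned_second), 3))  # 0.071
```

Read literally, this ratio is 0 for identical aligned sequences and grows with the number of substitutions, so it behaves as a measure of difference; the text does not say how it is oriented against the second threshold, and a deployment would have to settle that choice.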

Optionally, the similarity may also be calculated as the reciprocal of the edit distance plus 1.

In embodiments of the present invention, the similarity is then 1/(d + 1), where d is the edit distance between the aligned first pinyin sequence and each aligned second pinyin sequence. Here the edit distance is the minimum number of edit operations.
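As a sketch: for the aligned pair from the example, jia li fu chao shi and jia le fu chao shi, one substitution separates them, so d = 1 and the similarity is 1/2; identical sequences give 1. The function name is illustrative.

```python
def reciprocal_similarity(distance: int) -> float:
    """Similarity = 1 / (edit distance + 1); identical sequences score 1.0."""
    if distance < 0:
        raise ValueError("edit distance cannot be negative")
    return 1.0 / (distance + 1)


print(reciprocal_similarity(1))  # 0.5 for jia li fu chao shi vs jia le fu chao shi
print(reciprocal_similarity(0))  # 1.0 for identical aligned sequences
```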

Optionally, the similarity may also be calculated as a cosine similarity.

In embodiments of the present invention, the aligned first pinyin sequence and each aligned second pinyin sequence are vectorized to obtain corresponding vectors, and the cosine of the angle between the vector of each aligned second pinyin sequence and the vector of the corresponding aligned first pinyin sequence is calculated; this cosine value is the similarity. The method of vectorizing the aligned pinyin sequences is not specified here.
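Because the vectorization is left open, the sketch below assumes the simplest choice, a bag-of-units count vector over pinyin syllables; other encodings (letter n-grams, learned embeddings) would give different values. A bag of units ignores the order of the units.

```python
import math
from collections import Counter


def cosine_similarity(units_a: list[str], units_b: list[str]) -> float:
    """Cosine of the angle between bag-of-units count vectors."""
    va, vb = Counter(units_a), Counter(units_b)
    dot = sum(va[u] * vb[u] for u in va)  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sequence has no direction
    return dot / (norm_a * norm_b)


sim = cosine_similarity("jia li fu chao shi".split(), "jia le fu chao shi".split())
print(round(sim, 3))  # 0.8: four of five units shared
```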

Step S170: in response to the maximum of the similarities being greater than a second threshold, replace the speech recognition result with the text corresponding to the second pinyin sequence having the maximum similarity.

Because there are multiple candidate texts, there are correspondingly multiple second pinyin sequences, multiple edit distances, multiple aligned first pinyin sequences, multiple aligned second pinyin sequences and multiple similarities.

In step S170, the second threshold represents a preset degree of similarity. If the maximum similarity is greater than the second threshold, the text corresponding to the second pinyin sequence with the maximum similarity is considered similar enough to the speech recognition result that the speech recognition result can be judged to be that text; the speech recognition result is therefore replaced with it.

The method may further include a step of keeping the speech recognition result unchanged in response to the maximum of the similarities not being greater than the second threshold.

That is, when the maximum similarity does not exceed the second threshold, which is the threshold for judging that the speech recognition result is the text corresponding to the second pinyin sequence with the maximum similarity, the speech recognition result is left unprocessed. This avoids wrongly altering the speech recognition result when the match is uncertain.
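The decision in step S170, together with the keep-unchanged branch, reduces to taking the maximum similarity and comparing it with the second threshold. In this sketch, `scored` is assumed to hold (candidate text, similarity) pairs only for candidates that passed the first-threshold gate; the function name and the numbers in the usage line are illustrative.

```python
def choose_correction(recognized: str, scored: list[tuple[str, float]],
                      second_threshold: float) -> str:
    """Step S170 and the keep-unchanged branch.

    scored holds (candidate_text, similarity) pairs for candidates that passed
    the first-threshold gate; an empty list means nothing qualified.
    """
    if not scored:
        return recognized
    best_text, best_sim = max(scored, key=lambda pair: pair[1])
    # Replace only when the best candidate is similar enough; otherwise keep.
    return best_text if best_sim > second_threshold else recognized


print(choose_correction("有家里福超市", [("家乐福超市", 0.5), ("华润超市", 0.2)], 0.4))
# 家乐福超市
```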

Fig. 3 is the data flowchart of the method for correcting error of voice identification result of the embodiment of the present invention.As shown in figure 3, combining figure 2, the data flow of the present embodiment is as follows:

Step S310, to speech recognition result 31 carry out phonetic notation to determine corresponding first phonetic of speech recognition result Sequence 32.

In an optional implementation manner, step S310 includes the following steps:

Step S311 determines the corresponding pinyin combinations of each word in speech recognition result.

Step S312 is inserted into scheduled separator between the pinyin combinations of adjacent word to obtain the first phonetic sequence Column.

In another optional implementation, step S310 includes the following steps:

Step S313 determines the corresponding pinyin combinations of each word in speech recognition result.

Step S314 is inserted into pre- between the pinyin combinations of adjacent word and between the initial consonant and simple or compound vowel of a Chinese syllable of each pinyin combinations Separator is determined to obtain first pinyin sequence.

Step S320 determines the second pinyin sequence 33 of multiple candidates.

Specifically, step S320 includes the following steps:

Step S321 obtains candidate text sequence.

Optionally, in embodiments of the present invention, the text sequence of the candidate includes the identification card number of user, the birthday, lives Supermarket's text sequence near location, address nearby hospitals and address.

Step S322 determines the second pinyin sequence of corresponding multiple candidates according to the text sequence of the candidate.

Specifically, step S322 is similar with step S310, and different places is, step S322 further include:

Step S330 calculates the editing distance 34 of first pinyin sequence 32 and each second pinyin sequence 33.

In step S330, a kind of new editing distance is defined, editing distance 34 refers to minimum edit operation here. The edit operation of license includes that a character is substituted for another character, is inserted into a character, deletes a character.

Step S340: determine the aligned first pinyin sequence 36 and each aligned second pinyin sequence 37 according to the corresponding edit distance 35.

Specifically, step S340 includes the following steps:

Step S341: according to the corresponding edit distance 35, mark the parts of the first pinyin sequence 32 and of each second pinyin sequence 33 that need to be inserted or deleted.

Step S342: delete the leftmost and rightmost parts marked with insertion or deletion marks from the first pinyin sequence 32 and each second pinyin sequence 33, to determine the aligned first pinyin sequence 36 and each aligned second pinyin sequence 37.

In step S341, the parts of the first pinyin sequence that need to be replaced are not marked.
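One possible reading of steps S341 and S342 is sketched below. It recovers one minimum-edit path and marks each column as either a match/replacement or an insertion/deletion. It then strips the insertion/deletion columns from the left and right ends, so that only the overlapping region of the two sequences remains. Replacements are kept, consistent with the note that replaced parts are not marked. The function name `align` and the backtrace tie-breaking (prefer the diagonal) are our assumptions.

```python
from typing import List, Optional, Tuple

Column = Tuple[Optional[str], Optional[str]]  # None marks an insertion/deletion


def align(first: str, second: str) -> Tuple[str, str]:
    """Steps S341-S342 (one reading): align along a minimum-edit path, then
    drop the leftmost and rightmost insertion/deletion columns."""
    n, m = len(first), len(second)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if first[i - 1] == second[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    # S341: backtrace one minimum-edit path; gap columns are the "marked" parts.
    cols: List[Column] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (first[i - 1] != second[j - 1]):
            cols.append((first[i - 1], second[j - 1]))  # match or replacement (unmarked)
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            cols.append((first[i - 1], None))  # deletion from `first`
            i -= 1
        else:
            cols.append((None, second[j - 1]))  # insertion into `first`
            j -= 1
    cols.reverse()

    # S342: trim marked columns from both ends.
    lo, hi = 0, len(cols)
    while lo < hi and None in cols[lo]:
        lo += 1
    while hi > lo and None in cols[hi - 1]:
        hi -= 1
    kept = cols[lo:hi]
    return ("".join(x for x, _ in kept if x is not None),
            "".join(y for _, y in kept if y is not None))


if __name__ == "__main__":
    print(align("xie|ge", "bei|jing|xie|he|yi|yuan"))  # ('xie|ge', 'xie|he')
```

With a short recognition result and a long candidate, the trimming keeps only the part of the candidate that the result actually overlaps, while internal replacements survive for the later similarity step.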

Step S350: determine the first text 38 corresponding to the aligned first pinyin sequence and each second text 39 corresponding to each aligned second pinyin sequence.
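The source does not say how the aligned pinyin sequences are mapped back to text in step S350. Assuming the aligned sequence is a contiguous span of the full pinyin sequence with known letter offsets, one simple approach keeps every character whose syllable overlaps that span. `chars_covering_span` below is a hypothetical helper illustrating this.

```python
from typing import Sequence


def chars_covering_span(text: str, syllables: Sequence[str],
                        start: int, end: int, sep: str = "|") -> str:
    """Return the characters of `text` whose syllables overlap the half-open
    letter span [start, end) of sep.join(syllables)."""
    if len(text) != len(syllables):
        raise ValueError("expected one syllable per character")
    out = []
    pos = 0
    for ch, syl in zip(text, syllables):
        syl_start, syl_end = pos, pos + len(syl)
        if syl_start < end and start < syl_end:  # the two intervals overlap
            out.append(ch)
        pos = syl_end + len(sep)  # skip the separator to the next syllable
    return "".join(out)


if __name__ == "__main__":
    syllables = ["bei", "jing", "xie", "he", "yi", "yuan"]
    full = "|".join(syllables)
    start = full.find("xie|he")  # 9
    print(chars_covering_span("北京协和医院", syllables, start, start + len("xie|he")))  # 协和
```

Here `str.find` is used only to locate the span for the example. It returns the first occurrence, so an implementation would more likely carry the offsets through from the alignment step.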

Step S360: determine whether the number of identical characters between the first text and each second text is greater than the first threshold 40. If it is, perform steps S370 and S390; otherwise, perform step S380.
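The source does not define how identical characters are counted. The sketch below assumes a position-independent count (the size of the multiset intersection of the two texts) and applies the strict "greater than" comparison of step S360. A position-wise comparison would be an equally plausible reading.

```python
from collections import Counter


def same_char_count(first_text: str, second_text: str) -> int:
    """Assumed reading of 'number of identical characters': the size of the
    multiset intersection of the two texts, ignoring positions."""
    return sum((Counter(first_text) & Counter(second_text)).values())


def passes_first_threshold(first_text: str, second_text: str, first_threshold: int) -> bool:
    """Step S360 gate: the count must be strictly greater than the threshold."""
    return same_char_count(first_text, second_text) > first_threshold


if __name__ == "__main__":
    # "鞋" (xie, "shoe") is a plausible misrecognition of "协" (xie).
    print(same_char_count("鞋和医院", "协和医院"))  # 3
```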

Step S370: calculate the similarity 41 between the aligned first pinyin sequence 36 and each aligned second pinyin sequence 37.

Optionally, the similarity 41 can be calculated by dividing the number of replacement operations in the edit distance of step S330 by the total number of letters in each second pinyin sequence.

Optionally, the similarity 41 can also be calculated as the reciprocal of the edit distance plus 1, that is, 1 / (edit distance + 1).

Optionally, the similarity 41 can also be calculated using the cosine distance.
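The three similarity options of step S370 can be sketched as follows. Option 1 is implemented literally: replacements on one minimum-edit path, divided by the number of letters in the second sequence. As written, this ratio grows with the number of replacements, so it is unclear whether `1 - ratio` is intended. For option 3 the text names the cosine distance; the sketch returns cosine similarity over letter-count vectors, and that feature choice is an assumption.

```python
import math
from collections import Counter
from typing import List


def _edit_table(a: str, b: str) -> List[List[int]]:
    """Full Levenshtein table; d[i][j] is the distance between a[:i] and b[:j]."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d


def replacement_count(a: str, b: str) -> int:
    """Replacements on one minimum-edit path (ties prefer the diagonal)."""
    d = _edit_table(a, b)
    i, j, subs = len(a), len(b), 0
    while i > 0 and j > 0:
        diag = int(a[i - 1] != b[j - 1])
        if d[i][j] == d[i - 1][j - 1] + diag:
            subs += diag
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return subs


def sim_replacement_ratio(a: str, b: str) -> float:
    """Option 1, literal: replacements / letters in `b` (separators not counted)."""
    letters = sum(ch.isalpha() for ch in b)
    return replacement_count(a, b) / letters if letters else 0.0


def sim_inverse(a: str, b: str) -> float:
    """Option 2: 1 / (edit distance + 1)."""
    return 1.0 / (_edit_table(a, b)[-1][-1] + 1)


def sim_cosine(a: str, b: str) -> float:
    """Option 3: cosine similarity of letter-count vectors (assumed features)."""
    ca = Counter(ch for ch in a if ch.isalpha())
    cb = Counter(ch for ch in b if ch.isalpha())
    dot = sum(ca[k] * cb[k] for k in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    a, b = "xie|ge", "xie|he"
    print(sim_replacement_ratio(a, b), sim_inverse(a, b))  # 0.2 0.5
```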

Step S380: keep the speech recognition result 31 unchanged.

Step S390: determine whether the maximum similarity 42 among the similarities 41 is greater than the second threshold 43. If it is, perform step S400; otherwise, perform step S410.

Step S400: replace the speech recognition result 31 with the text 44 of the second pinyin sequence corresponding to the maximum similarity 39.

Step S410: keep the speech recognition result 31 unchanged.

In this embodiment of the present invention, the second threshold represents a preset degree of similarity. If the maximum similarity is greater than the second threshold, the text of the second pinyin sequence corresponding to the maximum similarity is considered similar enough to the speech recognition result that the speech recognition result can be judged to be that text, and the speech recognition result is therefore replaced with it. Otherwise, the speech recognition result is left unprocessed. This avoids erroneously modifying the speech recognition result in doubtful cases and improves the accuracy of speech recognition.
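Once each candidate's identical-character count and similarity are known, steps S360 to S410 reduce to a small decision function. In this sketch, the first-threshold gate is applied per candidate and ties on similarity go to the first candidate. Both choices are assumptions, as are the name `decide` and the example values.

```python
from typing import Sequence, Tuple

# (candidate text, identical-character count, similarity), assumed to be
# computed beforehand for each candidate.
Scored = Tuple[str, int, float]


def decide(recognized: str, scored: Sequence[Scored],
           first_threshold: int, second_threshold: float) -> str:
    # S360: keep only candidates whose identical-character count exceeds
    # the first threshold (applied per candidate here, an assumption).
    eligible = [(text, sim) for text, same, sim in scored if same > first_threshold]
    if not eligible:
        return recognized  # S380: keep the result unchanged
    # S390: take the maximum similarity; max() keeps the first one on ties.
    best_text, best_sim = max(eligible, key=lambda pair: pair[1])
    if best_sim > second_threshold:
        return best_text   # S400: replace with the best candidate's text
    return recognized      # S410: keep the result unchanged


if __name__ == "__main__":
    scored = [("协和医院", 3, 0.9), ("北京医院", 2, 0.95)]
    print(decide("鞋和医院", scored, first_threshold=2, second_threshold=0.8))  # 协和医院
```

Note that "北京医院" has the higher similarity but is excluded by the first-threshold gate, so the replacement comes from the remaining eligible candidate.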

As described above, this embodiment of the present invention calculates the edit distance between the first pinyin sequence corresponding to the speech recognition result and each second pinyin sequence corresponding to the multiple candidate text sequences, and obtains the aligned first pinyin sequence and each aligned second pinyin sequence. If the number of identical characters between the first text corresponding to the aligned first pinyin sequence and each second text corresponding to each aligned second pinyin sequence is greater than the first threshold, it calculates the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence. When the maximum similarity is greater than the second threshold, the speech recognition result is replaced with the text of the second pinyin sequence corresponding to the maximum similarity. This achieves automatic error correction of speech recognition results and improves recognition accuracy.

Fig. 4 is a schematic diagram of the speech recognition result error-correction device according to an embodiment of the present invention. As shown in Fig. 4, the device of this embodiment includes a phonetic notation unit 41, a first determination unit 42, a first computing unit 43, a second determination unit 44, a third determination unit 45, a second computing unit 46, and a processing unit 47.

The phonetic notation unit 41 is configured to annotate the speech recognition result with pinyin to determine the first pinyin sequence corresponding to the speech recognition result. The first determination unit 42 is configured to determine multiple candidate second pinyin sequences. The first computing unit 43 is configured to calculate the edit distance between the first pinyin sequence and each second pinyin sequence. The second determination unit 44 is configured to determine, according to the corresponding edit distance, the aligned first pinyin sequence and each aligned second pinyin sequence. The third determination unit 45 is configured to determine the first text corresponding to the aligned first pinyin sequence and each second text corresponding to each aligned second pinyin sequence. The second computing unit 46 is configured to calculate, in response to the number of identical characters between the first text and each second text being greater than the first threshold, the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence. The processing unit 47 is configured to replace, in response to the maximum similarity among the similarities being greater than the second threshold, the speech recognition result with the text of the second pinyin sequence corresponding to the maximum similarity.
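The unit structure of Fig. 4 maps naturally onto a class whose collaborators are injected, so that each unit can be swapped independently. This is a hypothetical sketch: the class name, field names and callable signatures are illustrative, and the toy lambdas in the usage example stand in for real pinyin annotation, alignment and similarity functions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Tuple


@dataclass
class SpeechResultCorrector:
    """Hypothetical composition of units 41-47; each unit is an injected callable."""
    annotate: Callable[[str], str]                              # unit 41: text -> first pinyin sequence
    get_candidates: Callable[[], Dict[str, str]]                # unit 42: candidate text -> second pinyin sequence
    align: Callable[[str, str], Tuple[str, str]]                # units 43-44: edit distance + alignment
    texts_for: Callable[[str, str, str, str], Tuple[str, str]]  # unit 45: -> (first text, second text)
    same_chars: Callable[[str, str], int]                       # identical-character count for the gate
    similarity: Callable[[str, str], float]                     # unit 46
    first_threshold: int
    second_threshold: float

    def correct(self, recognized: str) -> str:
        first_seq = self.annotate(recognized)
        best_text: Optional[str] = None
        best_sim = float("-inf")
        for cand_text, second_seq in self.get_candidates().items():
            first_al, second_al = self.align(first_seq, second_seq)
            first_text, second_text = self.texts_for(recognized, first_al, cand_text, second_al)
            if self.same_chars(first_text, second_text) <= self.first_threshold:
                continue  # similarity is computed only past the first threshold
            sim = self.similarity(first_al, second_al)
            if sim > best_sim:
                best_text, best_sim = cand_text, sim
        # Unit 47: replace only when the best similarity exceeds the second threshold.
        if best_text is not None and best_sim > self.second_threshold:
            return best_text
        return recognized


if __name__ == "__main__":
    corrector = SpeechResultCorrector(
        annotate=lambda t: t,
        get_candidates=lambda: {"ab": "ab", "xy": "xy"},
        align=lambda a, b: (a, b),
        texts_for=lambda rec, fa, cand, sa: (rec, cand),
        same_chars=lambda x, y: sum(p == q for p, q in zip(x, y)),
        similarity=lambda a, b: sum(p == q for p, q in zip(a, b)) / max(len(a), len(b)),
        first_threshold=0,
        second_threshold=0.4,
    )
    print(corrector.correct("ac"))                                      # ab
    print(replace(corrector, second_threshold=0.6).correct("ac"))      # ac
```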

This embodiment of the present invention thus provides a speech recognition result error-correction device. The device annotates the speech recognition result with pinyin to determine the corresponding first pinyin sequence and determines multiple candidate second pinyin sequences. It calculates the edit distance between the first pinyin sequence and each second pinyin sequence, determines the aligned first pinyin sequence and each aligned second pinyin sequence according to the corresponding edit distance, and determines the first text corresponding to the aligned first pinyin sequence and each second text corresponding to each aligned second pinyin sequence. If the number of identical characters between the first text and each second text is greater than the first threshold, it calculates the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence. If the maximum similarity is greater than the second threshold, it replaces the speech recognition result with the text of the second pinyin sequence corresponding to the maximum similarity. This achieves automatic error correction of speech recognition results and improves the accuracy of speech recognition.

Fig. 5 is a schematic diagram of the electronic device according to an embodiment of the present invention. The electronic device shown in Fig. 5 is a general-purpose data processing apparatus with a general computer hardware structure that includes at least a processor 51 and a memory 52, connected by a bus 53. The memory 52 is adapted to store instructions or programs executable by the processor 51. The processor 51 may be a standalone microprocessor or a set of one or more microprocessors. By executing the instructions stored in the memory 52, the processor 51 carries out the method flow of the embodiments of the present invention described above, processing data and controlling other devices. The bus 53 connects these components together and also connects them to a display controller 54 and display device, and to input/output (I/O) devices 55. The I/O devices 55 may be a mouse, keyboard, modem, network interface, touch input device, motion-sensing input device, printer, or other devices known in the art. Typically, the I/O devices 55 are connected to the system through an input/output (I/O) controller 56.

The memory 52 may store software components such as an operating system, a communication module, an interaction module, and application programs. Each of the modules and application programs described above corresponds to a set of executable program instructions that perform one or more functions and the methods described in the embodiments of the invention.

Aspects of the present invention are described above with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in the flowchart and/or block diagram block or blocks.

Meanwhile as skilled in the art will be aware of, the various aspects of the embodiment of the present invention may be implemented as be System, method or computer program product.Therefore, the various aspects of the embodiment of the present invention can take following form: complete hardware Embodiment, complete software embodiment (including firmware, resident software, microcode etc.) usually can all claim herein For the embodiment for combining software aspects with hardware aspect of circuit, " module " or " system ".In addition, side of the invention Face can take following form: the computer program product realized in one or more computer-readable medium, computer can Reading medium has the computer readable program code realized on it.

Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example (but not limited to), an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the embodiments of the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including but not limited to electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, PHP, and Python, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. An error-correction method for a speech recognition result, characterized in that the method comprises:

Determining multiple candidate second pinyin sequences;

Determining the first text corresponding to the aligned first pinyin sequence and each second text corresponding to each aligned second pinyin sequence;

In response to the number of identical characters between the first text and each second text being greater than a first threshold, calculating the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence;

In response to the maximum similarity among the similarities being greater than a second threshold, replacing the speech recognition result with the text of the second pinyin sequence corresponding to the maximum similarity.

2. The method according to claim 1, wherein annotating the speech recognition result with pinyin to determine the first pinyin sequence corresponding to the speech recognition result comprises:

Alternatively,

Annotating the speech recognition result with pinyin to determine the first pinyin sequence corresponding to the speech recognition result comprises:

Inserting a predetermined separator between the pinyin combinations of adjacent characters and between the initial and the final of each pinyin combination to obtain the first pinyin sequence.

3. The method according to claim 1, wherein determining multiple candidate second pinyin sequences comprises:

Obtaining candidate text sequences;

4. The method according to claim 3, wherein the candidate text sequences include the user's ID card number, birthday, and address, and the text sequences of hospitals near the address and supermarkets near the address.

5. The method according to claim 3, wherein determining the corresponding multiple candidate second pinyin sequences according to the candidate text sequences comprises:

6. The method according to claim 1, wherein determining the aligned first pinyin sequence and each aligned second pinyin sequence according to the corresponding edit distance comprises:

Marking, according to the corresponding edit distance, the parts of the first pinyin sequence and of each second pinyin sequence that need to be inserted or deleted;

Deleting the leftmost and rightmost parts marked with insertion or deletion marks from the first pinyin sequence and each second pinyin sequence, to determine the aligned first pinyin sequence and each aligned second pinyin sequence.

7. The method according to claim 1, wherein the method further comprises:

In response to the maximum similarity among the similarities being not greater than the second threshold, keeping the speech recognition result unchanged.

8. A speech recognition result error-correction device, characterized in that the device comprises:

A phonetic notation unit, configured to annotate a speech recognition result with pinyin to determine a first pinyin sequence corresponding to the speech recognition result;

A first computing unit, configured to calculate the edit distance between the first pinyin sequence and each second pinyin sequence;

A second determination unit, configured to determine, according to the corresponding edit distance, the aligned first pinyin sequence and each aligned second pinyin sequence;

A third determination unit, configured to determine the first text corresponding to the aligned first pinyin sequence and each second text corresponding to each aligned second pinyin sequence;

A second computing unit, configured to calculate, in response to the number of identical characters between the first text and each second text being greater than a first threshold, the similarity between the aligned first pinyin sequence and each aligned second pinyin sequence;

A processing unit, configured to replace, in response to the maximum similarity among the similarities being greater than a second threshold, the speech recognition result with the text of the second pinyin sequence corresponding to the maximum similarity.

9. A computer-readable storage medium storing computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-7.

10. An electronic device, comprising a memory and a processor, characterized in that the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method according to any one of claims 1-7.