US9318102B2 - Method and apparatus for correcting speech recognition error - Google Patents

Publication number
US9318102B2
Authority
US
Grant status
Grant
Patent type
Prior art keywords
speech recognition
syntax
corpus
recognition result
erroneous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14087944
Other versions
US20140163975A1 (en)
Inventor
Geun Bae Lee
Jun Hwi Choi
In Jae Lee
Dong Hyeon LEE
Hong Suck Seo
Yong Hee Kim
Seong Han Ryu
Sang Jun Koo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
POSTECH Industry-Academy Foundation
Original Assignee
POSTECH Industry-Academy Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Grant date

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/01 Assessment or evaluation of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Abstract

Disclosed are a speech recognition error correction method and an apparatus thereof. The speech recognition error correction method includes determining a likelihood that a speech recognition result is erroneous, and if the likelihood that the speech recognition result is erroneous is higher than a predetermined standard, generating a parallel corpus according to whether the speech recognition result matches the correct answer corpus, generating a speech recognition model based on the parallel corpus, and correcting an erroneous speech recognition result based on the speech recognition model and the language model. Accordingly, speech recognition errors are corrected.

Description

CLAIM FOR PRIORITY

This application claims priority to Korean Patent Application No. 10-2012-0141972, filed on Dec. 7, 2012 in the Korean Intellectual Property Office (KIPO), the entire contents of which are hereby incorporated by reference.

BACKGROUND

1. Technical Field

Example embodiments of the present invention relate in general to a speech recognition error correction method and apparatus, and more specifically, to a speech recognition error correction method and apparatus that correct the error generated in the speech recognition apparatus.

2. Related Art

Due to the proliferation of mobile apparatuses such as smart phones and tablet PCs, interest in speech recognition application software (for example, a conversation system such as SIRI of Apple Inc.) is increasing. However, existing speech recognition technology is not very accurate, speech recognition errors occur frequently, and this causes the speech recognition application software to malfunction.

SUMMARY

In order to overcome the above and other drawbacks of the conventional art, example embodiments of the present invention provide a speech recognition error correction method for correcting a speech recognition error based on a parallel corpus.

In order to overcome the above and other drawbacks of the conventional art, example embodiments of the present invention provide a speech recognition error correction apparatus for correcting a speech recognition error based on a parallel corpus.

In some example embodiments, a speech recognition error correction method performed by a speech recognition error correction apparatus includes determining a likelihood that a speech recognition result is erroneous based on a language model learned through a correct answer corpus and a domain corpus, if the likelihood that the speech recognition result is erroneous is higher than a predetermined standard, generating a parallel corpus according to whether the speech recognition result matches the correct answer corpus, generating a speech recognition model based on the parallel corpus, and correcting an erroneous speech recognition result based on the speech recognition model and the language model.

Here, the determining of the likelihood that a speech recognition result is erroneous may include determining a likelihood that a speech recognition result is erroneous based on likelihood of generating the speech recognition result.

Here, the generating of the parallel corpus may include detecting a correct answer pair from the correct answer corpus and the speech recognition result, detecting an incorrect answer pair from the correct answer corpus and the speech recognition result, and generating the parallel corpus based on the correct answer corpus, the speech recognition result, the correct answer pair, and the incorrect answer pair.

Here, the generating of the speech recognition model may include detecting a first syntax before speech recognition from the parallel corpus, detecting a second syntax after speech recognition from the parallel corpus, calculating speech recognition rate between the first syntax and the second syntax, and generating the speech recognition model based on the first syntax, the second syntax, and the speech recognition rate.

Here, the correcting of the erroneous speech recognition result may include generating a graph according to a correspondence relation between the first syntax and the second syntax, detecting a route with minimum errors from the graph, and correcting the erroneous speech recognition result based on the detected route.

Here, the generating of the graph may include assuming that a certain second syntax is a certain first syntax if the certain first syntax corresponding to the certain second syntax does not exist.

Here, the correcting of the erroneous speech recognition result based on the detected route may include correcting the erroneous speech recognition result without a rearrangement process according to the language model.

In other example embodiments, a speech recognition error correction apparatus includes a processing unit configured to determine a likelihood that a speech recognition result is erroneous based on a language model learned through a correct answer corpus and a domain corpus, and if the likelihood that the speech recognition result is erroneous is higher than a predetermined standard, generate a parallel corpus according to whether the speech recognition result matches the correct answer corpus, generate a speech recognition model based on the parallel corpus, and correct an erroneous speech recognition result based on the speech recognition model and the language model; and a storage unit configured to store information to be processed and information processed in the processing unit.

Here, the processing unit may determine the likelihood that a speech recognition result is erroneous based on likelihood of generating the speech recognition result.

Here, the processing unit may detect a correct answer pair from the correct answer corpus and the speech recognition result, detect an incorrect answer pair from the correct answer corpus and the speech recognition result, and generate the parallel corpus based on the correct answer corpus, the speech recognition result, the correct answer pair, and the incorrect answer pair.

Here, the processing unit may detect a first syntax before speech recognition from the parallel corpus, detect a second syntax after speech recognition from the parallel corpus, calculate a speech recognition rate between the first syntax and the second syntax, and generate the speech recognition model based on the first syntax, the second syntax, and the speech recognition rate.

Here, the processing unit may generate a graph according to a correspondence relation between the first syntax and the second syntax, detect a route with minimum errors from the graph, and correct the erroneous speech recognition result based on the detected route.

Here, the processing unit may generate the graph assuming that a certain second syntax is a certain first syntax if the certain first syntax corresponding to the certain second syntax does not exist.

Here, the processing unit may correct the erroneous speech recognition result without a rearrangement process according to the language model.

According to the present invention, an error generated by speech recognition can be corrected so that speech recognition accuracy can be improved.

BRIEF DESCRIPTION OF DRAWINGS

Example embodiments of the present invention will become more apparent by describing in detail example embodiments of the present invention with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram illustrating a speech recognition apparatus;

FIG. 2 is a flowchart illustrating a speech recognition error correction method according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating an operation of generating a parallel corpus in a speech recognition error correction method according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating an operation of generating a speech recognition model in a speech recognition error correction method according to an embodiment of the present invention;

FIG. 5 is a diagram schematically illustrating a speech recognition model;

FIG. 6 is a flowchart illustrating an operation of correcting an error of a speech recognition result in a speech recognition error correction method according to an embodiment of the present invention;

FIG. 7 is a diagram schematically illustrating a graph according to a correspondence relation of syntaxes;

FIG. 8 is a block diagram illustrating a speech recognition error correction apparatus according to an embodiment of the present invention; and

FIG. 9 is a block diagram illustrating a speech recognition error correction apparatus according to another embodiment of the present invention.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Accordingly, while the invention can be embodied in various different forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the invention to the particular forms disclosed. On the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims.

It will be understood that, although the terms first, second, A, B, etc. may be used herein in reference to elements of the invention, such elements should not be construed as limited by these terms. For example, a first element could be termed a second element, and a second element could be termed a first element, without departing from the scope of the present invention. Herein, the term “and/or” includes any and all combinations of one or more referents.

The terminology used herein to describe embodiments of the invention is not intended to limit the scope of the invention. The articles “a,” “an,” and “the” are singular in that they have a single referent, however the use of the singular form in the present document should not preclude the presence of more than one referent. In other words, elements of the invention referred to in the singular may number one or more, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, numbers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, and/or groups thereof.

Unless otherwise defined, all terms (including technical and scientific terms) used herein are to be interpreted as is customary in the art to which this invention belongs. It will be further understood that terms in common usage should also be interpreted as is customary in the relevant art and not in an idealized or overly formal sense unless expressly so defined herein.

Hereinafter, preferred embodiments will be described in more detail with reference to the accompanying drawings. Throughout the drawings and detailed description, parts of the invention are consistently denoted by the same respective reference numerals.

FIG. 1 is a block diagram illustrating a speech recognition apparatus.

With reference to FIG. 1, the speech recognition apparatus may include a speech recognition unit 20, an error correction unit 30, and a speech recognition application unit 40. The speech recognition unit 20 may receive the speech signal 10, recognize the speech signal 10, and generate a speech recognition result (text). The error correction unit 30 may analyze whether the speech recognition result includes an error and correct the error if included. The error correction unit 30 may have the same configuration as a speech recognition error correction apparatus 30 illustrated in FIGS. 8 and 9.

The speech recognition application unit 40 may apply the speech recognition result to various applications. The application may refer to a speech word processor, a speech conversation system, or the like.

FIG. 2 is a flowchart illustrating a speech recognition error correction method according to an embodiment of the present invention.

With reference to FIG. 2, the speech recognition error correction method includes determining a likelihood that a speech recognition result is erroneous based on a language model learned through a correct answer corpus and a domain corpus in step 100, if the likelihood that the speech recognition result is erroneous is higher than a predetermined standard, generating a parallel corpus based on whether the speech recognition result matches the correct answer corpus in step 200, generating a speech recognition model based on the parallel corpus in step 300, and correcting an erroneous speech recognition result based on the speech recognition model and the language model in step 400. Here, each operation of the speech recognition error correction method may be performed by the speech recognition error correction apparatus 30 illustrated in FIGS. 8 and 9.

The speech recognition error correction apparatus may determine whether the speech recognition result includes an error or not (i.e., the likelihood that it is erroneous) in step 100. The speech recognition error correction apparatus may determine the likelihood that the speech recognition result is erroneous using the language model learned through the high-capacity domain corpus and the correct answer corpus, which relate to the speech recognition result. Here, an N-gram may be used as the language model, and specifically, a bigram or a trigram may be used.

For example, if the speech recognition result is “an apple deliciously ate”, the speech recognition error correction apparatus may calculate a likelihood of generating “an apple deliciously” and “deliciously ate” by the bigram, and a likelihood of generating “apple deliciously ate” by the trigram. At this point, because the language model is learned through the correct answer corpus, the bigram “an apple deliciously” has a low likelihood of being generated, and therefore the speech recognition error correction apparatus may determine “an apple deliciously” as a syntax with a high likelihood of error. As a result, the speech recognition error correction apparatus may determine “an apple deliciously ate” as a syntax with a high likelihood of error.
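The likelihood determination of step 100 can be illustrated with a minimal Python sketch. It is not the patented implementation: the add-alpha smoothing, the per-bigram averaged score, the threshold value of 2.0, and all function names are assumptions chosen for illustration, and the language model is trained only on the three correct answer sentences that appear in Table 1 rather than on a high-capacity domain corpus.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev) (smoothing is an assumption)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

def error_score(sentence, unigrams, bigrams):
    """Average negative log-probability per bigram; higher means less likely."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(math.log(bigram_prob(a, b, unigrams, bigrams)) for a, b in pairs)
    return -log_prob / len(pairs)

def is_likely_erroneous(sentence, unigrams, bigrams, threshold=2.0):
    """Compare the score with a predetermined standard (threshold is hypothetical)."""
    return error_score(sentence, unigrams, bigrams) > threshold

# Toy correct answer corpus taken from the right column of Table 1.
CORRECT_ANSWER_CORPUS = [
    "an apple was deliciously eaten",
    "an apple is red",
    "a poison was put into an apple",
]
UNIGRAMS, BIGRAMS = train_bigram(CORRECT_ANSWER_CORPUS)
```

With this toy model, a word sequence never seen in the correct answer corpus, such as “apple deliciously”, receives a lower probability than a seen sequence such as “an apple”, which raises the score of the whole recognition result.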

If the likelihood that the speech recognition result is erroneous is higher than a predetermined standard, the speech recognition error correction apparatus may generate a parallel corpus according to the identity of the correct answer corpus and the speech recognition result in step 200.

FIG. 3 is a flowchart illustrating an operation of generating a parallel corpus in a speech recognition error correction method according to an embodiment of the present invention.

With reference to FIG. 3, generating the parallel corpus in step 200 includes detecting a correct answer pair from the correct answer corpus and the speech recognition result in step 210, detecting an incorrect answer pair from the correct answer corpus and the speech recognition result in step 220, and generating a parallel corpus based on the correct answer corpus, the speech recognition result, the correct answer pair, and the incorrect answer pair in step 230.

The speech recognition error correction apparatus may detect the correct answer pair from an existing parallel corpus including the correct answer corpus and the speech recognition result in step 210.

TABLE 1
Existing Parallel Corpus

Error (speech recognition result)  | Correct (correct answer corpus)
an apple deliciously ate           | an apple was deliciously eaten
an apple is reduce                 | an apple is red
a poison was put of an apple       | a poison was put into an apple

Table 1 is an existing parallel corpus including a correct answer corpus and a speech recognition result. Here, “an apple deliciously ate”, “an apple is reduce”, and “a poison was put of an apple” in the left column refer to speech recognition results, and “an apple was deliciously eaten”, “an apple is red”, and “a poison was put into an apple” in the right column refer to correct answer corpuses.

For example, since “an apple” of the speech recognition result in the second row of Table 1 is the same as “an apple” of the correct answer corpus, the speech recognition error correction apparatus may detect “an apple” as a correct answer pair. Further, since “a poison was put” of the speech recognition result in the third row of Table 1 is the same as “a poison was put” of the correct answer corpus, the speech recognition error correction apparatus may detect “a poison was put” as a correct answer pair.

The speech recognition error correction apparatus may detect an incorrect answer pair from the existing parallel corpus including the correct answer corpus and the speech recognition result in step 220.

For example, since “reduce” of the speech recognition result in the second row of Table 1 and “red” of the correct answer corpus are not the same, the speech recognition error correction apparatus may detect “reduce, red” as an incorrect answer pair. Further, since “of an apple” of the speech recognition result in the third row of Table 1 and “into an apple” of the correct answer corpus are not the same, the speech recognition error correction apparatus can detect “of an apple, into an apple” as an incorrect answer pair.
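The detection of correct answer pairs and incorrect answer pairs in steps 210 and 220 may be sketched as a word-level comparison of each speech recognition result with its correct answer. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in alignment method; the patent does not name an alignment algorithm, the function name `detect_pairs` is hypothetical, and the segment boundaries produced here (for example “an apple is” rather than “an apple”) may differ from those shown in the tables.

```python
from difflib import SequenceMatcher

def detect_pairs(recognized, reference):
    """Split a (recognition result, correct answer) sentence pair into
    matching segments (correct answer pairs) and differing segments
    (incorrect answer pairs), comparing word by word."""
    rec, ref = recognized.split(), reference.split()
    correct, incorrect = [], []
    matcher = SequenceMatcher(a=rec, b=ref, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        rec_span = " ".join(rec[i1:i2])
        ref_span = " ".join(ref[j1:j2])
        if tag == "equal":
            correct.append((rec_span, ref_span))
        else:  # "replace", "delete", or "insert"
            incorrect.append((rec_span, ref_span))
    return correct, incorrect
```

Applied to the third row of Table 1, this sketch returns “a poison was put” and “an apple” as correct answer pairs and “of, into” as an incorrect answer pair.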

Here, it is described that the step 220 is performed after the step 210, but the step 210 may be performed after the step 220.

The speech recognition error correction apparatus may generate an extended parallel corpus including a correct answer corpus, a speech recognition result, a correct answer pair, and an incorrect answer pair in step 230.

TABLE 2
Extended Parallel Corpus

Error (speech recognition result)  | Correct (correct answer corpus)     | Remarks
an apple deliciously ate           | an apple was deliciously eaten      | 1.
an apple is reduce                 | an apple is red                     | 2.
a poison was put of an apple       | a poison was put into an apple      | 3.
[Figure US09318102-20160419-P00001] | [Figure US09318102-20160419-P00002] | 4. this is an incorrect answer pair to be added in No. 1, but excluded due to the correct answer pair of No. 6.
deliciously                        | deliciously                         | 5. correct answer pair added in No. 1
an apple                           | an apple                            | 6. correct answer pair added in No. 2
is reduce                          | is red                              | 7. incorrect answer pair added in No. 2
of an apple                        | into an apple                       | 8. incorrect answer pair added in No. 3
a poison was put                   | a poison was put                    | 9. correct answer pair added in No. 3

Table 2 presents extended parallel corpuses including correct answer corpuses, speech recognition results, correct answer pairs, and incorrect answer pairs. In Remarks 1, 2, and 3, “an apple deliciously ate”, “an apple is reduce”, “a poison was put of an apple” presented in the left column refer to speech recognition results, and “an apple was deliciously eaten”, “an apple is red”, and “a poison was put into an apple” in the right column refer to correct answer corpuses.

In Remarks 5, 6, and 9, “deliciously, deliciously”, “an apple, an apple”, and “a poison was put, a poison was put” refer to correct answer pairs. In Remarks 7 and 8, “reduce, red” and “of an apple, into an apple” refer to incorrect answer pairs.

That is, the speech recognition error correction apparatus may generate extended parallel corpuses including the correct answer corpuses, the speech recognition results, the correct answer pairs and the incorrect answer pairs as presented in Table 2.

Here, in the first row of Table 1, the speech recognition result of “an apple” and the correct answer corpus of “an apple was” are not the same. However, since “an apple” is detected as a correct answer pair (Remark 6 of Table 2), the speech recognition error correction apparatus does not detect “an apple, an apple was” as an incorrect answer pair (see Remark 4 of Table 2). That is, if text included in an incorrect answer pair is included in a correct answer pair, the speech recognition error correction apparatus does not detect the corresponding incorrect answer pair as an incorrect answer pair. Accordingly, the speech recognition error correction apparatus can reduce error generation.
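The generation of the extended parallel corpus in step 230, including the exclusion described for Remark 4, may be sketched as follows. The exclusion test used here is an assumption: an incorrect answer pair is dropped when its speech recognition side exactly equals the speech recognition side of a correct answer pair. This reading reproduces Table 2 (it drops “an apple, an apple was” but keeps “of an apple, into an apple”), but the patent text does not state the matching criterion this precisely. The function name is hypothetical.

```python
def build_extended_corpus(sentence_pairs, correct_pairs, incorrect_pairs):
    """Return a list of (error, correct) entries: the original sentence
    pairs, followed by correct answer pairs and the incorrect answer
    pairs that survive the exclusion rule."""
    # Recognition-side texts already confirmed by a correct answer pair.
    trusted = {err for err, _ in correct_pairs}
    extended = list(sentence_pairs)
    seen = set(extended)
    for pair in correct_pairs:
        if pair not in seen:
            seen.add(pair)
            extended.append(pair)
    for pair in incorrect_pairs:
        err, _ = pair
        if err in trusted:
            continue  # assumed exclusion rule (see Remark 4 of Table 2)
        if pair not in seen:
            seen.add(pair)
            extended.append(pair)
    return extended
```

A stricter or looser containment test (for example, substring matching) would change which pairs are excluded, so the criterion should be treated as a tunable design choice in this sketch.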

The speech recognition error correction apparatus may generate a speech recognition model based on a parallel corpus in step 300.

FIG. 4 is a flowchart illustrating an operation of generating a speech recognition model in a speech recognition error correction method according to an embodiment of the present invention.

With reference to FIG. 4, generating a speech recognition model in step 300 includes detecting a first syntax before speech recognition from a parallel corpus in step 310, detecting a second syntax after speech recognition from the parallel corpus in step 320, calculating a speech recognition rate between the first syntax and the second syntax in step 330, and generating the speech recognition model based on the first syntax, the second syntax, and the speech recognition rate in step 340.

The speech recognition error correction apparatus may detect a first syntax before speech recognition from an extended parallel corpus (that is, see Table 2) in step 310. That is, the speech recognition error correction apparatus may detect a correct answer corpus included in the extended parallel corpus as the first syntax.

The speech recognition error correction apparatus may detect a second syntax after speech recognition from the extended parallel corpus (that is, see Table 2) in step 320. That is, the speech recognition error correction apparatus may detect the speech recognition result included in the extended parallel corpus as the second syntax.

Here, it is described that the step 320 is performed after the step 310, but the step 310 may be performed after the step 320.

The speech recognition error correction apparatus may calculate a speech recognition rate between the first syntax and the second syntax in step 330. That is, the speech recognition error correction apparatus may calculate a speech recognition rate in which the first syntax is recognized as the second syntax, and may calculate the speech recognition rate using the extended parallel corpus at this point.

The speech recognition error correction apparatus may generate a speech recognition model based on the first syntax, the second syntax, and the speech recognition rate in step 340.
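The speech recognition rate of step 330 may be illustrated as a relative frequency: the number of times a first syntax is recognized as a given second syntax, divided by the number of times the first syntax occurs in the extended parallel corpus. The patent does not specify the estimator, so the relative-frequency formula and the function name `recognition_rates` below are assumptions.

```python
from collections import Counter, defaultdict

def recognition_rates(extended_corpus):
    """Estimate P(second syntax | first syntax) by relative frequency.

    extended_corpus: iterable of (error, correct) entries, where the
    correct side is the first syntax (before speech recognition) and the
    error side is the second syntax (after speech recognition).
    Returns {first: {second: rate}}.
    """
    pair_counts = Counter()
    first_counts = Counter()
    for second, first in extended_corpus:
        pair_counts[(first, second)] += 1
        first_counts[first] += 1
    model = defaultdict(dict)
    for (first, second), count in pair_counts.items():
        model[first][second] = count / first_counts[first]
    return dict(model)
```

For example, if “an apple” appears four times as a first syntax and is recognized as “an apple” three times, the estimated rate for that cell is 0.75.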

FIG. 5 is a diagram schematically illustrating a speech recognition model.

With reference to FIG. 5, FIG. 5A shows a speech recognition model in which English is translated into German, FIG. 5B shows a speech recognition model in which German is translated into English, and FIG. 5C shows a speech recognition model in which the speech recognition models of FIGS. 5A and 5B are combined.

In the speech recognition model in FIG. 5A, rows indicate English (corresponding to the first syntax), and columns indicate German (corresponding to the second syntax). Here, the black cells denote a high likelihood that the first syntax translates into the second syntax. (For example, it is highly likely that ‘that’ translates into ‘dass’.)

In the speech recognition model of FIG. 5B, rows indicate English (corresponding to the second syntax), and columns indicate German (corresponding to the first syntax). Here, the black cells denote a high likelihood that the first syntax translates into the second syntax. (For example, it is highly likely that ‘geht’ translates into ‘assumes’.)

The speech recognition model of FIG. 5C is obtained by combining the speech recognition model of FIG. 5A and the speech recognition model of FIG. 5B. Cells at which it is highly likely that the first syntax translates into the second syntax in both of FIGS. 5A and 5B are indicated in black (for example, ‘that’→‘dass’), and cells at which there is a high likelihood of the first syntax translating into the second syntax in only one of FIGS. 5A and 5B are indicated in gray (‘the’→‘im’).

In this manner, the speech recognition error correction apparatus may generate the speech recognition model. That is, the speech recognition error correction apparatus may generate a matrix by setting the first syntax (for example, the correct answer corpus) as rows, and setting the second syntax (for example, the speech recognition result) corresponding to the first syntax as columns, and may generate the first speech recognition model by indicating, in black, cells in which the first syntax is highly likely to be recognized as the second syntax.

The speech recognition error correction apparatus may generate a matrix by setting a first syntax (for example, a speech recognition result) as columns, and setting a second syntax (for example, a correct answer corpus) corresponding to the first syntax as rows, and may generate a second speech recognition model by indicating, in black, cells in which the first syntax is highly likely to be recognized as the second syntax.

The speech recognition error correction apparatus may generate a final speech recognition model by combining the first speech recognition model and the second speech recognition model.
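The combination of the first and second speech recognition models may be sketched with sets of matrix cells. In this sketch, each directional model is a nested dictionary of rates, a cell counts as “black” in a directional model when its rate meets a threshold, and the final model marks a cell black when both directions agree and gray when only one does, mirroring the description of FIG. 5C. The threshold value, the dictionary layout, and the function names are assumptions.

```python
def high_cells(rates, threshold=0.5, transpose=False):
    """Collect cells whose rate is at least the threshold.

    rates: {row: {column: rate}}. With transpose=True the (row, column)
    key is swapped so both directional models use the same orientation,
    namely (correct answer, speech recognition result)."""
    cells = set()
    for row, columns in rates.items():
        for column, rate in columns.items():
            if rate >= threshold:
                cells.add((column, row) if transpose else (row, column))
    return cells

def combine_models(forward_cells, backward_cells):
    """Black cells are likely in both directions; gray cells in only one."""
    forward, backward = set(forward_cells), set(backward_cells)
    return {"black": forward & backward, "gray": forward ^ backward}
```

Keeping the gray cells separate, rather than discarding them, leaves room for a later stage to weight them lower than the black cells.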

The speech recognition error correction apparatus may correct an erroneous speech recognition result based on the speech recognition model and the language model in step 400.

FIG. 6 is a flowchart illustrating an operation of correcting an error of a speech recognition result in a speech recognition error correction method according to an embodiment of the present invention.

With reference to FIG. 6, correcting an erroneous speech recognition result in step 400 may include generating a graph according to a correspondence relation between a first syntax and a second syntax in step 410, detecting a route having minimum errors from the graph in step 420, and correcting an erroneous speech recognition result based on the detected route in step 430.

The speech recognition error correction apparatus may generate a graph according to a correspondence relation between a first syntax and a second syntax in step 410.

FIG. 7 is a diagram schematically illustrating a graph according to a correspondence relation of syntaxes.

With reference to FIG. 7, a ‘code 50 (for example, a first syntax)’ refers to Spanish, and a ‘code 60 (for example, second syntax)’ refers to English. That is, ‘Maria’, ‘no’, and ‘did not’ correspond to ‘Mary’, ‘not’, and ‘no’, respectively.

In this manner, the speech recognition error correction apparatus may detect second syntaxes (that is, speech recognition results) corresponding to first syntaxes (that is, correct answer corpuses), respectively, and may generate a graph according to correspondence relations between the first syntaxes and the second syntaxes based on the detection.

At this point, if a certain first syntax corresponding to a certain second syntax does not exist, the speech recognition error correction apparatus may assume that the certain second syntax is the certain first syntax. That is, if the certain first syntax corresponding to the certain second syntax does not exist, a part on the graph with regard to the certain first syntax is regarded as a blank, and the speech recognition error correction apparatus may assume the certain second syntax to be the first syntax in order to avoid the blank.

For example, if a certain first syntax corresponding to a certain second syntax of ‘of an apple’ does not exist, the speech recognition error correction apparatus may assume the certain second syntax of ‘of an apple’ to be the certain first syntax.

The speech recognition error correction apparatus may detect a route with minimum errors from the graph in step 420. At this point, the speech recognition error correction apparatus may detect the route with minimum errors through known technology (for example, a Viterbi search).

The speech recognition error correction apparatus may correct errors of the speech recognition result based on the detected route in step 430. That is, the speech recognition error correction apparatus may compare the speech recognition result with the detected route, and correct the speech recognition result based on the detected route, if there is difference according to the comparison. For example, if the speech recognition result is “an apple deliciously ate”, the speech recognition error correction apparatus may correct the speech recognition result to “an apple was deliciously eaten.”
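Steps 410 to 430 may be sketched as a left-to-right Viterbi search over candidate corrections. The sketch below is illustrative only: the phrase table entries and their rates are hypothetical, the language model is a toy stand-in that returns a fixed log-probability for bigrams seen in one correct answer sentence, and the cost of a route is assumed to be the sum of negative log rates and negative log language model probabilities. When no first syntax exists for a single recognized word, the word is assumed to be its own first syntax, following the blank-avoidance rule described for the graph. Segments are consumed in their original order, so no rearrangement is performed.

```python
import math

def viterbi_correct(sentence, phrase_table, lm_logprob, max_len=3, identity_prob=0.5):
    """Return the lowest-cost correction of a recognized sentence.

    phrase_table: {recognized phrase: {candidate correction: rate}}.
    lm_logprob(prev_word, word): log-probability of a word given the previous word.
    """
    tokens = sentence.split()
    n = len(tokens)
    # best[i]: last output word -> (cost, (previous position, previous word), phrase)
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None, None)
    for i in range(n):
        for prev_word, (cost, _, _) in best[i].items():
            for j in range(i + 1, min(n, i + max_len) + 1):
                source = " ".join(tokens[i:j])
                candidates = dict(phrase_table.get(source, {}))
                if not candidates and j == i + 1:
                    # No first syntax known: assume the word is already correct.
                    candidates = {source: identity_prob}
                for target, rate in candidates.items():
                    step, last = -math.log(rate), prev_word
                    for word in target.split():
                        step -= lm_logprob(last, word)
                        last = word
                    new_cost = cost + step
                    if last not in best[j] or new_cost < best[j][last][0]:
                        best[j][last] = (new_cost, (i, prev_word), target)
    # Trace the minimum-cost route back from the end of the sentence.
    last = min(best[n], key=lambda w: best[n][w][0])
    output, pos = [], n
    while pos > 0:
        _, (prev_pos, prev_word), target = best[pos][last]
        output.append(target)
        pos, last = prev_pos, prev_word
    return " ".join(reversed(output))

# Toy language model: bigrams seen in one correct answer sentence score higher.
_SEEN = set(zip(["<s>", "an", "apple", "was", "deliciously"],
                ["an", "apple", "was", "deliciously", "eaten"]))

def toy_lm_logprob(prev_word, word):
    return math.log(0.5) if (prev_word, word) in _SEEN else math.log(0.01)

# Hypothetical phrase table; rates are illustrative, not taken from the patent.
PHRASE_TABLE = {
    "an apple": {"an apple": 1.0},
    "deliciously": {"deliciously": 1.0},
    "deliciously ate": {"was deliciously eaten": 0.6},
    "ate": {"ate": 0.7, "eaten": 0.3},
}
```

Calling `viterbi_correct("an apple deliciously ate", PHRASE_TABLE, toy_lm_logprob)` with these toy values selects the route through “an apple” and “was deliciously eaten”, which matches the corrected result given in the example above.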

Here, the speech recognition error correction apparatus may correct an erroneous speech recognition result without a rearrangement process according to a language model. The rearrangement process refers to rearranging a word order at the time of translation, because word orders differ from language to language. For example, if English is translated into Korean, ‘I have an apple’ may be translated into “[Figure US09318102-20160419-P00003] [Figure US09318102-20160419-P00004]”, and “[Figure US09318102-20160419-P00005] [Figure US09318102-20160419-P00006]” may be rearranged into “[Figure US09318102-20160419-P00007] [Figure US09318102-20160419-P00008]” according to the rearrangement process by the language model. The speech recognition error correction method corrects an error within the same language, so the speech recognition error correction apparatus does not perform the rearrangement process above.

FIG. 8 is a block diagram illustrating a speech recognition error correction apparatus according to an embodiment of the present invention.

With reference to FIG. 8, the speech recognition error correction apparatus 30 includes a processing unit 31 and a storage unit 32. The processing unit 31 may determine a likelihood that a speech recognition result is erroneous based on a language model learned through a correct answer corpus and a domain corpus, and if the likelihood that the speech recognition result is erroneous is higher than a predetermined standard, may generate a parallel corpus based on whether a speech recognition result matches a correct answer corpus, generate a speech recognition model based on a parallel corpus, and correct an erroneous speech recognition result based on the speech recognition model and the language model. The storage unit 32 may store information to be processed and information processed in the processing unit 31.

The processing unit 31 may determine a likelihood that a speech recognition result is erroneous based on step 100 described above. Specifically, the processing unit 31 may determine the likelihood of error using the language model learned through the high-capacity domain corpus and the correct answer corpus relating to the speech recognition result. Here, an N-gram model may be used as the language model, and specifically, a bigram or a trigram may be used.

For example, if the speech recognition result is “an apple deliciously ate”, the processing unit 31 may calculate the likelihood of generating “an apple deliciously” and “deliciously ate” through the bigram, and calculate the likelihood of generating “an apple deliciously ate” through the trigram. At this point, since the language model was learned through the correct answer corpus, the bigram “an apple deliciously” has a low likelihood of being generated, and therefore the processing unit 31 may determine “an apple deliciously” to be a syntax having a high likelihood of error. As a result, the processing unit 31 may determine that “an apple deliciously ate” is a syntax having a high likelihood of error.
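The bigram check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy corpus, the add-one smoothing, the word-level tokenization, and the threshold value are all assumptions introduced here, and the function names are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence.split()
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed P(word | prev). The smoothing choice is an assumption."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def suspicious_bigrams(sentence, unigrams, bigrams, threshold):
    """Return the bigrams whose smoothed probability falls below the threshold."""
    padded = ["<s>"] + sentence.split()
    return [
        (prev, word)
        for prev, word in zip(padded, padded[1:])
        if bigram_prob(prev, word, unigrams, bigrams) < threshold
    ]

# Toy stand-in for the correct answer corpus (invented for illustration).
corpus = [
    "an apple was deliciously eaten",
    "a red apple was eaten",
    "an apple was red",
]
unigrams, bigrams = train_bigram(corpus)

# Hypothetical threshold playing the role of the "predetermined standard".
flagged = suspicious_bigrams("an apple deliciously ate", unigrams, bigrams, 0.15)
clean = suspicious_bigrams("an apple was deliciously eaten", unigrams, bigrams, 0.15)
```

With this toy data, the bigrams that never occur in the corpus fall below the threshold and are flagged, while the well-formed sentence produces no flags. A trigram model would follow the same pattern with three-word windows.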

The processing unit 31 may generate a parallel corpus based on step 200 described above. Specifically, the processing unit 31 may detect a correct answer pair based on step 210 described above. For example, since “an apple” of the speech recognition result in the second row of Table 1 is the same as “an apple” of the correct answer corpus, the processing unit 31 may detect “an apple” as a correct answer pair. Further, since “a poison was put” of the speech recognition result in the third row of Table 1 is the same as “a poison was put” of the correct answer corpus, the processing unit 31 may detect “a poison was put” as a correct answer pair.

The processing unit 31 may detect the incorrect answer pair based on step 220 described above. For example, since “reduce” of the speech recognition result in the second row of Table 1 and “red” of the correct answer corpus are not the same, the processing unit 31 may detect “reduce, red” as an incorrect answer pair. Further, since “of an apple” of the speech recognition result in the third row of Table 1 and “into an apple” of the correct answer corpus are not the same, the processing unit 31 can detect “of an apple, into an apple” as an incorrect answer pair.

The processing unit 31 may generate the parallel corpus based on step 230, and may generate an extended parallel corpus including the correct answer corpus, the speech recognition result, the correct answer pair, and the incorrect answer pair as presented in Table 2.
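One way to picture the pair detection and the extended parallel corpus is the sketch below. It is only an approximation under stated assumptions: it aligns word sequences with Python's `difflib.SequenceMatcher`, which the patent does not mention, it treats matching spans as correct answer pairs and differing spans as incorrect answer pairs, and the example sentence is modeled loosely on the phrases quoted above rather than copied from Table 1. The function names are hypothetical.

```python
from difflib import SequenceMatcher

def detect_pairs(recognized, reference):
    """Split an aligned sentence pair into matching and differing spans.

    Returns (correct_pairs, incorrect_pairs); each pair is
    (recognized_span, reference_span).
    """
    rec, ref = recognized.split(), reference.split()
    matcher = SequenceMatcher(a=rec, b=ref, autojunk=False)
    correct, incorrect = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        pair = (" ".join(rec[i1:i2]), " ".join(ref[j1:j2]))
        if tag == "equal":
            correct.append(pair)
        else:  # "replace", "insert", or "delete"
            incorrect.append(pair)
    return correct, incorrect

def build_parallel_corpus(results, references):
    """Collect whole-sentence pairs plus the detected sub-sentence pairs."""
    corpus = []
    for rec, ref in zip(results, references):
        correct, incorrect = detect_pairs(rec, ref)
        corpus.append((rec, ref))   # speech recognition result and correct answer
        corpus.extend(correct)      # correct answer pairs
        corpus.extend(incorrect)    # incorrect answer pairs
    return corpus

parallel = build_parallel_corpus(
    ["a poison was put of an apple"],
    ["a poison was put into an apple"],
)
```

Note that this sketch isolates only the differing word (“of” versus “into”), whereas the description pairs the longer phrases “of an apple” and “into an apple”. How much context each pair carries is a design choice this sketch does not try to reproduce.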

The processing unit 31 may generate the speech recognition model based on step 300 described above. Specifically, the processing unit 31 may detect a first syntax before speech recognition based on step 310 described above, detect a second syntax after speech recognition based on step 320, calculate a speech recognition rate among syntaxes based on step 330 described above, and generate the speech recognition model based on step 340 described above.

That is, the processing unit 31 may generate the speech recognition model as illustrated in FIG. 5. The processing unit 31 may generate a matrix by setting the first syntax (for example, the correct answer corpus) as rows, and setting the second syntax (for example, the speech recognition result) corresponding to the first syntax as columns, and generate the first speech recognition model by indicating cells in which the first syntax is highly likely to be recognized as the second syntax in black.

The processing unit 31 may generate a matrix by setting a first syntax (for example, a speech recognition result) as columns, and setting a second syntax (for example, a correct answer corpus) corresponding to the first syntax as rows, and may generate a second speech recognition model by indicating, in black, cells in which the first syntax is highly likely to be recognized as the second syntax.

The processing unit 31 may generate a final speech recognition model by combining the first speech recognition model and the second speech recognition model.
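The two matrices and their combination can be sketched as sets of marked cells. The following is an illustrative reading only: the relative-frequency estimate, the 0.5 threshold, and the choice to combine the two models by intersection are assumptions made here, since the text above does not state how "highly likely" is decided or how the models are combined. All names and counts are hypothetical.

```python
from collections import Counter

def marked_cells(pairs, threshold):
    """Mark (given, observed) cells whose relative frequency meets the threshold."""
    pair_counts = Counter(pairs)
    given_counts = Counter(given for given, _ in pairs)
    return {
        (given, observed)
        for (given, observed), count in pair_counts.items()
        if count / given_counts[given] >= threshold
    }

def combined_model(pairs, threshold=0.5):
    """pairs are (correct_phrase, recognized_phrase) tuples from a parallel corpus."""
    # First model: correct phrase (row) -> recognized phrase (column).
    first = marked_cells(pairs, threshold)
    # Second model: recognized phrase -> correct phrase, flipped back to
    # (correct, recognized) orientation so the two can be compared cell by cell.
    reverse = marked_cells([(rec, cor) for cor, rec in pairs], threshold)
    second = {(cor, rec) for rec, cor in reverse}
    # Assumption: keep only the cells marked in both directions.
    return first & second

# Invented pair counts for illustration.
pairs = [
    ("red", "reduce"), ("red", "reduce"), ("red", "red"),
    ("into", "of"), ("apple", "apple"), ("reduce", "reduce"),
]
model = combined_model(pairs)
```

Intersection yields a conservative model. Taking the union instead would keep more candidate corrections at the cost of more spurious ones.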

The processing unit 31 may correct an erroneous speech recognition result based on step 400 described above. Specifically, based on step 410 described above, the processing unit 31 may detect the second syntaxes (that is, the speech recognition results) corresponding to the respective first syntaxes (that is, the correct answer corpora), as illustrated in FIG. 7, and generate a graph according to the correspondence relations between the first syntaxes and the second syntaxes.

The processing unit 31 may detect a route with minimum errors based on step 420 described above, using a known technique (for example, a Viterbi search).
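A Viterbi search over such a graph can be sketched as below. This is a simplified illustration under several assumptions: "minimum errors" is interpreted as the highest-probability path, each recognized word has a small list of candidate corrections, a word with no candidates is kept as-is, and every probability in the tables is invented. The names `CHANNEL`, `BIGRAM`, `CANDIDATES`, and `viterbi_correct` are hypothetical.

```python
import math

# P(recognized phrase | intended phrase); values invented for illustration.
CHANNEL = {
    ("was deliciously", "deliciously"): 0.6,
    ("deliciously", "deliciously"): 0.4,
    ("eaten", "ate"): 0.7,
    ("ate", "ate"): 0.3,
}
# P(phrase | previous phrase) from a language model; values invented.
BIGRAM = {
    ("<s>", "an"): 0.5,
    ("an", "apple"): 0.5,
    ("apple", "was deliciously"): 0.4,
    ("was deliciously", "eaten"): 0.5,
    ("deliciously", "eaten"): 0.05,
}
CANDIDATES = {
    "deliciously": ["deliciously", "was deliciously"],
    "ate": ["ate", "eaten"],
}

def channel(intended, recognized):
    return CHANNEL.get((intended, recognized), 1.0 if intended == recognized else 1e-6)

def lm(prev, phrase):
    return BIGRAM.get((prev, phrase), 0.01)  # small floor for unseen bigrams

def viterbi_correct(observed):
    """Return the highest-scoring candidate sequence, left to right, no reordering."""
    # Each layer maps a candidate to (best log score, best path ending in it).
    layer = {"<s>": (0.0, [])}
    for obs in observed:
        next_layer = {}
        for cand in CANDIDATES.get(obs, [obs]):  # no candidate: keep the word
            emit = math.log(channel(cand, obs))
            next_layer[cand] = max(
                (
                    (score + math.log(lm(prev, cand)) + emit, path + [cand])
                    for prev, (score, path) in layer.items()
                ),
                key=lambda item: item[0],
            )
        layer = next_layer
    return max(layer.values(), key=lambda item: item[0])[1]

route = " ".join(viterbi_correct("an apple deliciously ate".split()))
```

Because each position is processed in its original order, the search never reorders words, which is consistent with the absence of a rearrangement process noted earlier.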

The processing unit 31 may correct an erroneous speech recognition result based on step 430 described above. That is, the processing unit 31 may compare the speech recognition result with the detected route and, if the comparison reveals a difference, correct the speech recognition result based on the detected route. For example, if the speech recognition result is “an apple deliciously ate” and the detected route is “an apple was deliciously eaten”, the processing unit 31 may correct the speech recognition result to “an apple was deliciously eaten.”

The function performed by the processing unit 31 may be performed in a processor (for example, a Central Processing Unit (CPU) and/or a Graphics Processing Unit (GPU)).

FIG. 9 is a block diagram illustrating a speech recognition error correction apparatus according to another embodiment of the present invention.

With reference to FIG. 9, the speech recognition error correction apparatus 30 includes an error determination unit 33, a corpus generation unit 34, a model generation unit 35, and a decoder 36. The speech recognition error correction apparatus 30 illustrated in FIG. 9 is substantially the same as the speech recognition error correction apparatus 30 illustrated in FIG. 8.

The error determination unit 33 may determine the likelihood that the speech recognition result is erroneous based on step 100 described above. The corpus generation unit 34 may generate the parallel corpus based on step 200. The model generation unit 35 may generate the speech recognition model based on step 300 described above. The decoder 36 may correct an erroneous speech recognition result based on step 400.
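The division of work among the four units can be pictured as a small pipeline. The sketch below is structural only: the split into a `train` phase and a `correct` phase, the callable signatures, and the stub implementations are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (speech recognition result, correct answer)

@dataclass
class ErrorCorrectionApparatus:
    error_likelihood: Callable[[str], float]            # error determination unit 33
    generate_corpus: Callable[[str, str], List[Pair]]   # corpus generation unit 34
    generate_model: Callable[[List[Pair]], Dict[str, str]]  # model generation unit 35
    decode: Callable[[str, Dict[str, str]], str]        # decoder 36
    standard: float = 0.5                               # hypothetical threshold
    model: Dict[str, str] = field(default_factory=dict)

    def train(self, results: List[str], references: List[str]) -> None:
        """Build the model from results judged likely to be erroneous."""
        corpus: List[Pair] = []
        for result, reference in zip(results, references):
            if self.error_likelihood(result) > self.standard:
                corpus.extend(self.generate_corpus(result, reference))
        self.model = self.generate_model(corpus)

    def correct(self, result: str) -> str:
        """Leave likely-correct results untouched; decode the rest."""
        if self.error_likelihood(result) <= self.standard:
            return result
        return self.decode(result, self.model)

# Stub units, invented purely to exercise the wiring.
apparatus = ErrorCorrectionApparatus(
    error_likelihood=lambda s: 0.9 if "ate" in s.split() else 0.1,
    generate_corpus=lambda result, reference: [(result, reference)],
    generate_model=dict,
    decode=lambda result, model: model.get(result, result),
)
apparatus.train(["an apple deliciously ate"], ["an apple was deliciously eaten"])
corrected = apparatus.correct("an apple deliciously ate")
untouched = apparatus.correct("an apple was red")
```

Passing the units in as callables mirrors the statement that they may be realized as one module or as several separate ones.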

The functions performed by the error determination unit 33, the corpus generation unit 34, the model generation unit 35, and the decoder 36 may be substantially performed by a processor (for example, a CPU and/or a GPU).

Further, the error determination unit 33, the corpus generation unit 34, the model generation unit 35, and the decoder 36 may be implemented in one integrated form, as one physical apparatus, or as one module. Alternatively, the error determination unit 33, the corpus generation unit 34, the model generation unit 35, and the decoder 36 may be implemented as a plurality of physical apparatuses or groups rather than as a single physical apparatus or group.

Various embodiments of the present invention may be implemented as program code that can be recorded in a computer-readable recording medium and executed by a computer. The computer-readable recording medium may include program commands, data files, data structures, etc., individually or in combination. The program code recorded in the computer-readable recording medium may be designed specifically for the present invention or may be well-known to those of ordinary skill in the art. Examples of the computer-readable recording medium include hardware devices such as read-only memory (ROM), random-access memory (RAM), or flash memory, formed specifically to store and execute program code. Examples of the program code include machine code made by a compiler and high-level language codes that may be executed by a computer by using an interpreter. The aforementioned hardware devices may include one or more software modules in order to execute operations of the present invention and vice versa.

While the example embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations may be made herein without departing from the scope of the invention as defined by the appended claims.

Claims (10)

What is claimed is:
1. A speech recognition error correction method performed by a speech recognition error correction apparatus, the method comprising:
determining a likelihood that a speech recognition result is erroneous based on a language model learned through a correct answer corpus and a domain corpus;
generating a parallel corpus according to whether the speech recognition result matches the correct answer corpus, if the likelihood that the speech recognition result is erroneous is higher than a predetermined standard;
generating a speech recognition model based on the parallel corpus; and
correcting an erroneous speech recognition result based on the speech recognition model and the language model;
wherein the generating of the parallel corpus includes detecting a correct answer pair from the correct answer corpus and the speech recognition result, detecting an incorrect answer pair from the correct answer corpus and the speech recognition result, and generating the parallel corpus based on the correct answer corpus, the speech recognition result, the correct answer pair, and the incorrect answer pair,
wherein if text included in the incorrect answer pair is included in the correct answer pair, the incorrect answer pair including the text is not detected as the incorrect answer pair, and
wherein the generating of the speech recognition model includes detecting a first syntax before speech recognition from the parallel corpus, detecting a second syntax after speech recognition from the parallel corpus, calculating a speech recognition rate between the first syntax and the second syntax, and generating the speech recognition model based on the first syntax, the second syntax, and the speech recognition rate.
2. The method of claim 1, wherein the determining of the likelihood that a speech recognition result is erroneous includes determining a likelihood that a speech recognition result is erroneous based on likelihood of generating the speech recognition result.
3. The method of claim 1, wherein the correcting of the erroneous speech recognition result includes:
generating a graph according to a correspondence relation between the first syntax and the second syntax;
detecting a route with minimum errors from the graph; and
correcting the erroneous speech recognition result based on the detected route.
4. The method of claim 3, wherein the generating of the graph includes assuming that a certain second syntax is a certain first syntax if the certain first syntax corresponding to the certain second syntax does not exist.
5. The method of claim 3, wherein the correcting of the erroneous speech recognition result based on the detected route includes correcting the erroneous speech recognition result without a rearrangement process according to the language model.
6. A speech recognition error correction apparatus, comprising:
a processing unit configured to determine a likelihood that a speech recognition result is erroneous based on a language model learned through a correct answer corpus and a domain corpus, and if the likelihood that the speech recognition result is erroneous is higher than a predetermined standard, generate a parallel corpus according to whether the speech recognition result matches the correct answer corpus, generate a speech recognition model based on the parallel corpus, and correct an erroneous speech recognition result based on the speech recognition model and the language model; and
a storage unit configured to store information to be processed and information processed in the processing unit;
wherein the processing unit detects a correct answer pair from the correct answer corpus and the speech recognition result, detects an incorrect answer pair from the correct answer corpus and the speech recognition result, and generates the parallel corpus based on the correct answer corpus, the speech recognition result, the correct answer pair, and the incorrect answer pair, and
wherein if text included in the incorrect answer pair is included in the correct answer pair, the processing unit does not detect the incorrect answer pair including the text as the incorrect answer pair, and
wherein the processing unit detects a first syntax before speech recognition from the parallel corpus, detects a second syntax after speech recognition from the parallel corpus, calculates a speech recognition rate between the first syntax and the second syntax, and generates the speech recognition model based on the first syntax, the second syntax, and the speech recognition rate.
7. The speech recognition error correction apparatus of claim 6, wherein the processing unit determines the likelihood that a speech recognition result is erroneous based on likelihood of generating the speech recognition result.
8. The speech recognition error correction apparatus of claim 6, wherein the processing unit generates a graph according to a correspondence relation between the first syntax and the second syntax; detects a route with minimum errors from the graph; and corrects the erroneous speech recognition result based on the detected route.
9. The speech recognition error correction apparatus of claim 8, wherein the processing unit generates the graph assuming that a certain second syntax is a certain first syntax if the certain first syntax corresponding to the certain second syntax does not exist.
10. The speech recognition error correction apparatus of claim 8, wherein the processing unit corrects the erroneous speech recognition result without a rearrangement process according to the language model.

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR20120141972A KR101364774B1 (en) 2012-12-07 2012-12-07 Method for correction error of speech recognition and apparatus
KR10-2012-0141972 2012-12-07

Publications (2)

Publication Number Publication Date
US20140163975A1 (en) 2014-06-12
US9318102B2 (en) 2016-04-19

Family

ID=50271426

Family Applications (1)

Application Number Title Priority Date Filing Date
US14087944 Active 2034-04-02 US9318102B2 (en) 2012-12-07 2013-11-22 Method and apparatus for correcting speech recognition error

Country Status (4)

Country Link
US (1) US9318102B2 (en)
JP (1) JP5788953B2 (en)
KR (1) KR101364774B1 (en)
CN (1) CN103871407B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016045420A (en) * 2014-08-25 2016-04-04 カシオ計算機株式会社 Pronunciation learning support device and program
CN104809923A (en) * 2015-05-13 2015-07-29 苏州清睿信息技术有限公司 Self-complied and self-guided method and system for generating intelligent voice communication
CN105468468B (en) * 2015-12-02 2018-07-27 北京光年无限科技有限公司 Method and apparatus for data error correction system Q
CN107122346B (en) * 2016-12-28 2018-02-27 平安科技(深圳)有限公司 An error correction method and apparatus of an input sentence

Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02171876A (en) 1988-12-23 1990-07-03 Nippon Telegr & Teleph Corp <Ntt> Pattern recognition processing system
JP2005234236A (en) 2004-02-19 2005-09-02 Canon Inc Device and method for speech recognition, storage medium, and program
KR20060057921A (en) 2004-11-24 2006-05-29 한국전자통신연구원 Recognition error correction apparatus for interactive voice recognition system and method therefof
US20070043567A1 (en) * 2005-08-22 2007-02-22 International Business Machines Corporation Techniques for aiding speech-to-speech translation
JP2008033198A (en) 2006-08-01 2008-02-14 Nec System Technologies Ltd Voice interaction system, voice interaction method, voice input device and program
US20080319962A1 (en) * 2007-06-22 2008-12-25 Google Inc. Machine Translation for Query Expansion
US20090125477A1 (en) * 2007-11-14 2009-05-14 Fang Lu Foreign language abbreviation translation in an instant messaging system
US7539619B1 (en) * 2003-09-05 2009-05-26 Spoken Translation Ind. Speech-enabled language translation system and method enabling interactive user supervision of translation and speech recognition accuracy
JP2009198647A (en) 2008-02-20 2009-09-03 Nippon Telegr & Teleph Corp <Ntt> Voice recognition error analyzing apparatus, method, and program and its recording medium
US20090306981A1 (en) * 2008-04-23 2009-12-10 Mark Cromack Systems and methods for conversation enhancement
US20100145680A1 (en) * 2008-12-10 2010-06-10 Electronics And Telecommunications Research Institute Method and apparatus for speech recognition using domain ontology
JP2010134074A (en) 2008-12-03 2010-06-17 Toshiba Corp Voice recognition device, method and program
JP2011002656A (en) 2009-06-18 2011-01-06 Nec Corp Device for detection of voice recognition result correction candidate, voice transcribing support device, method, and program
US20110238406A1 (en) * 2010-03-23 2011-09-29 Telenav, Inc. Messaging system with translation and method of operation thereof
WO2012004955A1 (en) 2010-07-06 2012-01-12 株式会社日立製作所 Text correction method and recognition method
US20120173244A1 (en) * 2011-01-04 2012-07-05 Kwak Byung-Kwan Apparatus and method for voice command recognition based on a combination of dialog models
US20120290302A1 (en) * 2011-05-10 2012-11-15 Yang Jyh-Her Chinese speech recognition system and method
US8340958B2 (en) * 2009-01-23 2012-12-25 Harman Becker Automotive Systems Gmbh Text and speech recognition system using navigation information
US20130091138A1 (en) * 2011-10-05 2013-04-11 Microsoft Corporation Contextualization, mapping, and other categorization for data semantics
US20130275118A1 (en) * 2012-04-13 2013-10-17 Google Inc. Techniques for generating translation clusters
US8606559B2 (en) * 2008-09-16 2013-12-10 Electronics And Telecommunications Research Institute Method and apparatus for detecting errors in machine translation using parallel corpus
US8645119B2 (en) * 2007-03-26 2014-02-04 Google Inc. Minimum error rate training with a large number of features for machine learning
US20140114642A1 (en) * 2012-10-19 2014-04-24 Laurens van den Oever Statistical linguistic analysis of source content
US8788258B1 (en) * 2007-03-15 2014-07-22 At&T Intellectual Property Ii, L.P. Machine translation using global lexical selection and sentence reconstruction
US8972268B2 (en) * 2008-04-15 2015-03-03 Facebook, Inc. Enhanced speech-to-speech translation system and methods for adding a new word

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3126945B2 (en) * 1997-10-30 2001-01-22 株式会社エイ・ティ・アール音声翻訳通信研究所 Character error calibration device
JP2003308094A (en) * 2002-02-12 2003-10-31 Advanced Telecommunication Research Institute International Method for correcting recognition error place in speech recognition
JP4734155B2 (en) * 2006-03-24 2011-07-27 株式会社東芝 Speech recognition device, speech recognition method and a speech recognition program
KR100825690B1 (en) 2006-09-15 2008-04-29 포항공과대학교 산학협력단 Error correction method in speech recognition system
JP2011524991A (en) * 2008-04-15 2011-09-08 モバイル テクノロジーズ,エルエルシーMobile Technologies,Llc System and method for maintenance of the voice translation - voice on site
JP4709887B2 (en) * 2008-04-22 2011-06-29 株式会社エヌ・ティ・ティ・ドコモ Speech recognition result correction apparatus and speech recognition result correction method and a speech recognition result correction system,
CN101655837B (en) * 2009-09-08 2010-10-13 北京邮电大学 Method for detecting and correcting error on text after voice recognition
KR101181928B1 (en) 2011-07-18 2012-09-11 포항공과대학교 산학협력단 Apparatus for grammatical error detection and method using the same
CN102799579B (en) * 2012-07-18 2015-01-21 西安理工大学 Statistical machine translation method with error self-diagnosis and self-correction functions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Translation of Office Action issued in corresponding Japanese Application No. 2013-243198 dated Nov. 11, 2014 (3 pages).

Also Published As

Publication number Publication date Type
CN103871407B (en) 2017-04-19 grant
US20140163975A1 (en) 2014-06-12 application
JP5788953B2 (en) 2015-10-07 grant
CN103871407A (en) 2014-06-18 application
JP2014115646A (en) 2014-06-26 application
KR101364774B1 (en) 2014-02-20 grant

Legal Events

Date Code Title Description
AS Assignment

Owner name: POSTECH ACADEMY - INDUSTRY FOUNDATION, KOREA, REPU

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEE, GEUN BAE;RYU, SEONG HAN;LEE, DONG HYEON;AND OTHERS;REEL/FRAME:037435/0699

Effective date: 20131025