KR101672484B1 - Misprounciations detector and method for detecting misprounciations using the same - Google Patents
- Publication number
- KR101672484B1 (application number KR1020150103092A)
- Authority
- KR
- South Korea
- Prior art keywords
- pronunciation
- phoneme
- error
- probability
- result
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
Description
The present invention relates to a technique for detecting erroneous pronunciation from a learner's utterance.
When a language learner learns a new language, pronunciation errors occur within a limited range that does not go far beyond standard pronunciation, depending on the learner's linguistic background, degree of education, and so on.
In general, conventional automatic pronunciation detection and evaluation methods do not actively consider the types of errors a particular learner is likely to make, and instead leave open the possibility of every type of error for every pronunciation. As a result, the evaluation is somewhat inaccurate.
The pronunciation error detecting apparatus and the pronunciation error detecting method using the same according to the embodiment are intended to accurately and quickly detect pronunciation errors.
The pronunciation error detecting apparatus includes: a word recognizing unit that recognizes an input speech and generates a word recognition result; a phoneme recognition unit that generates phoneme recognition information corresponding to the word recognition result; a pronunciation score calculation unit that calculates a pronunciation score of the input speech based on the phoneme recognition information; an error probability calculation unit that calculates a pronunciation error probability of the input speech using the phoneme recognition information; and an error determination unit that determines whether there is a pronunciation error in the input speech based on the word recognition result, the pronunciation score, and the pronunciation error probability. The phoneme recognition unit generates the phoneme recognition information by referring to a standard pronunciation dictionary and a multi-phonetic dictionary.
The error determination unit of the pronunciation error detection apparatus according to the embodiment calculates a feedback index based on the word recognition result, the pronunciation score, and the pronunciation error probability, compares the feedback index with a predetermined threshold value, and determines whether there is a pronunciation error in the input speech.
Further, the multi-phonetic dictionary of the pronunciation error detection apparatus according to the embodiment includes a set of predicted pronunciation errors.
Further, the word recognition result of the pronunciation error detection apparatus according to the embodiment may include a forced alignment result obtained by forcibly aligning the input speech, and the phoneme recognition information may include a first phoneme corresponding to the forced alignment result and a second phoneme corresponding to a free alignment result obtained by freely aligning the input speech.
The forced alignment result of the pronunciation error detecting device according to the embodiment is obtained by aligning the input speech with reference to the standard pronunciation dictionary, and the phoneme recognition unit generates the free alignment result within the set of predicted pronunciation errors.
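The restriction of free alignment to the set of predicted pronunciation errors can be illustrated with a minimal sketch. The function name, the dictionary layout, and the log-likelihood values below are all hypothetical, and including the standard phoneme itself among the candidates is an assumption; the patent text does not specify these details.

```python
def free_alignment(log_likelihoods, canonical, predicted_errors):
    """Return the best-scoring phoneme among a restricted candidate set.

    log_likelihoods: hypothetical mapping phoneme -> log P(O | phoneme)
    canonical: the phoneme expected by the standard pronunciation dictionary
    predicted_errors: hypothetical mapping canonical phoneme -> predicted variants
    """
    # Assumption: the candidates are the canonical phoneme plus its predicted
    # error variants, not the full phoneme inventory.
    candidates = [canonical] + list(predicted_errors.get(canonical, []))
    scored = [(log_likelihoods[q], q) for q in candidates if q in log_likelihoods]
    if not scored:
        raise ValueError("no candidate phoneme has a score")
    best_score, best_q = max(scored)
    return best_q, best_score


# Illustrative values only: "k" has the highest score overall, but it is not a
# predicted error for "r", so it is never considered.
log_likelihoods = {"r": -42.0, "l": -35.5, "w": -50.1, "k": -10.0}
predicted_errors = {"r": ["l", "w"]}
best_q, best_score = free_alignment(log_likelihoods, "r", predicted_errors)
```

Searching only the predicted variants keeps the candidate set small, which is consistent with the stated goal of detecting errors both accurately and quickly.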
In addition, the phoneme recognition unit of the pronunciation error detection apparatus according to the embodiment calculates the first probability of the forced alignment result and the second probability of the free alignment result.
The pronunciation score calculation unit of the pronunciation error detection apparatus according to the embodiment calculates the pronunciation score using the first probability and the second probability.
The pronunciation error detection apparatus according to the embodiment may further include an error probability database containing standard pronunciation information and multiple pronunciation information for phonemes, and the error probability calculation unit compares the phoneme recognition information with the error probability database to classify each phoneme in the phoneme recognition information as belonging to the standard pronunciation or to a multiple pronunciation.
The error probability calculation unit of the pronunciation error detection apparatus according to the embodiment calculates the pronunciation error probability using the number of phonemes belonging to the standard pronunciation and the number of phonemes belonging to the multiple pronunciation.
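As a rough sketch of this classification step, the following code splits recognized phonemes into those matching the standard pronunciation and those matching a listed multiple pronunciation, and then counts each group. The database layout (`error_db`), the pair format, and the handling of unlisted phonemes are assumptions made for illustration only.

```python
def split_by_pronunciation(pairs, error_db):
    """Split recognized phonemes into standard and multiple pronunciations.

    pairs: list of (canonical, recognized) phoneme pairs (hypothetical format)
    error_db: hypothetical mapping canonical phoneme -> set of listed variants
    """
    standard, multiple = [], []
    for canonical, recognized in pairs:
        if recognized == canonical:
            standard.append(recognized)
        elif recognized in error_db.get(canonical, set()):
            multiple.append(recognized)
        else:
            # Assumption: a phoneme that is neither standard nor a listed
            # variant is treated as invalid input rather than silently dropped.
            raise ValueError(f"{recognized!r} is not listed for {canonical!r}")
    return standard, multiple


# Illustrative data only.
error_db = {"r": {"l", "w"}, "i": {"I"}, "t": set()}
pairs = [("r", "l"), ("i", "i"), ("t", "t")]
standard, multiple = split_by_pronunciation(pairs, error_db)
n_standard, n_multiple = len(standard), len(multiple)
```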
The word recognition result of the pronunciation error detecting apparatus according to the embodiment is a word unit, and the phoneme recognition information is a phoneme unit.
The pronunciation error detecting apparatus and the pronunciation error detecting method using the same according to the embodiment have an effect of accurately and rapidly detecting a pronunciation error.
FIG. 1 is a block diagram showing a configuration of a pronunciation error detecting apparatus according to an embodiment.
FIG. 2 is an example of the multi-phonetic dictionary of FIG. 1.
FIG. 3 is a flowchart illustrating a pronunciation error detection method according to an embodiment.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, which will be readily apparent to those skilled in the art to which the present invention pertains. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In order to clearly explain the present invention, parts not related to the description are omitted, and the same or similar components are denoted by the same reference numerals throughout the specification.
Throughout the specification, when a part is said to "comprise" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise. Also, terms such as "part" and "module" in the specification mean a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.
Hereinafter, a pronunciation error detecting apparatus according to an embodiment will be described with reference to FIGS. 1 and 2.
Referring to FIG. 1, the pronunciation error detection apparatus 1 generates a word recognition result w and a word recognition reliability RC using an input speech O. The word recognition result w is a result of recognizing the input speech O on a word-by-word basis. The pronunciation error detecting device 1 generates phoneme recognition information PI using the input speech O and the word recognition result w. The phoneme recognition information PI is a result of recognizing the input speech O on a phoneme-by-phoneme basis.
The pronunciation error detecting device 1 calculates a pronunciation score PC using the phoneme recognition information PI and detects a pronunciation error on the basis of the pronunciation score PC and the word recognition reliability RC. Here, the input speech O is speech produced by the learner reading a predetermined word or sentence aloud, and it includes speech in word units and speech in phoneme units.
The pronunciation error detection device 1 includes a
The
The
The phoneme recognition information PI includes a phoneme p corresponding to a forced alignment result R1 obtained by forcibly aligning the input speech O, and a phoneme q corresponding to a free alignment result R2 obtained by freely aligning the input speech O. The forced alignment result R1 is a result of forcibly aligning the input speech O by word unit by referring to the
The
Hereinafter, a method in which the learner reads the word " only "and the
The
The
Referring to FIG. 2, the
In general, conventional speech recognizers are always designed based on an algorithm that finds the maximum value within a given range. However, the pronunciation
The
That is, Table 1 below shows the word recognition result (w), the forced alignment result (R1), and the free alignment result (R2) of the input speech (O).
1, the
The forced alignment probability P(O|p) is the score obtained when the input speech O is aligned with the phoneme p by forced alignment (i.e., the probability that the input speech O is generated given the phoneme p).
The phoneme recognition probability P(O|q) is the score obtained when the input speech O is recognized as the phoneme q among the multiple pronunciations (i.e., the probability that the input speech O is generated given the phoneme q).
The pronunciation
[Equation 1]
$$PC = \frac{1}{N} \left| \log \frac{P(O \mid p)}{\max_{q \in Q} P(O \mid q)} \right|$$
Here, N in Equation (1) is the pronunciation length of the input speech O (e.g., the frame length in ms) and is equal to the total number of observation frames. N normalizes the log expression of Equation (1) so that it is not affected by the length of the utterance. Q is the set of predicted pronunciation errors (PR).
The fraction inside the logarithm in Equation (1) takes its maximum value of 1 when the phoneme recognition probability P(O|q) is equal to the forced alignment probability P(O|p), that is, when the forcibly aligned phoneme p and the freely recognized phoneme q are the same phoneme; in that case the logarithm is 0 and so is the pronunciation score PC. If the phoneme recognition probability P(O|q) differs from the forced alignment probability P(O|p), in other words when p and q are different phonemes, the fraction is smaller than 1. The smaller the forced alignment probability P(O|p) and the larger the phoneme recognition probability P(O|q), the smaller the fraction becomes.
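The behaviour of Equation (1) can be checked with a short numerical sketch. It works in the log domain, where the log of the ratio becomes a difference of log-likelihoods. The function name, argument layout, and all numbers are hypothetical, and the sketch assumes that the candidate set passed in corresponds to Q.

```python
import math


def pronunciation_score(logp_forced, logp_free_by_phoneme, n_frames):
    """Length-normalized absolute log-likelihood ratio, as in Equation (1).

    logp_forced: log P(O | p) from forced alignment
    logp_free_by_phoneme: hypothetical mapping q -> log P(O | q) for q in Q
    n_frames: N, the number of observation frames
    """
    # log( P(O|p) / max_q P(O|q) ) = log P(O|p) - max_q log P(O|q)
    best_free = max(logp_free_by_phoneme.values())
    return abs(logp_forced - best_free) / n_frames


# Same phoneme wins both alignments: the ratio is 1, so the score is 0.
score_same = pronunciation_score(-42.0, {"r": -42.0, "w": -50.1}, 50)

# A predicted variant fits better than the standard phoneme: the score grows.
score_diff = pronunciation_score(-42.0, {"r": -42.0, "l": -35.5}, 50)
```

Under this reading, a score of 0 means free recognition agrees with the standard pronunciation, and larger values mean a predicted variant explains the audio better.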
The error
The error
[Equation 2]
$$EP = P(e \mid c, v) = \frac{\mathrm{cnt}(c, v, e)}{\mathrm{cnt}(c, v)}$$
In Equation (2), e is a pronunciation error, c is a phoneme belonging to the standard pronunciation, v is a phoneme belonging to multiple pronunciations, and cnt () means the number of occurrences.
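A count-based estimate of this kind can be sketched as follows. The sketch assumes the denominator is the number of times the pair (c, v) was observed at all, so that the result is a relative frequency; the triple format and the data are hypothetical.

```python
from collections import Counter


def error_probability(observations, c, v):
    """Estimate P(e | c, v) as a relative frequency.

    observations: iterable of (canonical, variant, is_error) triples
                  (hypothetical annotation format)
    """
    observations = list(observations)
    # Number of times each (canonical, variant) pair was observed.
    total = Counter((oc, ov) for oc, ov, _ in observations)
    # Number of those observations that were marked as pronunciation errors.
    errors = Counter((oc, ov) for oc, ov, is_error in observations if is_error)
    if total[(c, v)] == 0:
        return 0.0  # Assumption: unseen pairs get probability 0.
    return errors[(c, v)] / total[(c, v)]


# Illustrative data only.
observations = [
    ("r", "l", True),
    ("r", "l", True),
    ("r", "l", False),
    ("r", "w", True),
]
ep_rl = error_probability(observations, "r", "l")
ep_unseen = error_probability(observations, "t", "d")
```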
The
[Equation 3]
$$FI = \frac{1}{1 + e^{-(a_1 x_1 + a_2 x_2 + a_0)}} \cdot P(w \mid O)$$
In Equation (3), x1 and x2 are the pronunciation score (PC) and the pronunciation error probability (EP), respectively, and a1, a2, and a0 are constants optimized on system development data by a machine learning algorithm. P(w|O) is the probability of the word recognition result w given the input speech O.
The
The threshold value can be experimentally determined in accordance with the purpose and object of the pronunciation error detecting apparatus.
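The feedback index and the threshold comparison can be sketched as below. The coefficient values, the threshold, and in particular the direction of the comparison (an error is flagged when the index is at or above the threshold) are assumptions for illustration; the text only states that the two values are compared.

```python
import math


def feedback_index(pc, ep, p_word, a1, a2, a0):
    """Logistic combination of PC and EP, scaled by P(w | O), as in Equation (3)."""
    z = a1 * pc + a2 * ep + a0
    return p_word / (1.0 + math.exp(-z))


def has_pronunciation_error(fi, threshold):
    # Assumption: an index at or above the threshold is treated as an error.
    # The opposite convention would simply flip this comparison.
    return fi >= threshold


# With z = 0 the logistic term is exactly 0.5, so FI is half of P(w | O).
fi_neutral = feedback_index(pc=0.0, ep=0.0, p_word=0.8, a1=1.0, a2=1.0, a0=0.0)

# Hypothetical coefficients; real values would be fitted on development data.
fi_example = feedback_index(pc=0.13, ep=0.67, p_word=0.9, a1=4.0, a2=2.0, a0=-1.0)
flagged = has_pronunciation_error(fi_example, threshold=0.5)
```

Because the logistic term lies between 0 and 1, the index never exceeds P(w|O), so a low-confidence word recognition result bounds the index regardless of the phoneme-level evidence.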
Hereinafter, a pronunciation error detection method according to an embodiment will be described with reference to FIG.
Referring to FIG. 3, in step S10, the
In step S20, the
In step S30, the pronunciation
In step S40, the error
In step S50, the
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. Those skilled in the art may readily suggest other embodiments through changes, deletions, additions, and so forth, and these also fall within the scope of the present invention.
Claims (10)
A phoneme recognition unit for generating phoneme recognition information corresponding to the word recognition result;
A pronunciation score calculation unit for calculating a pronunciation score of the input speech based on the phoneme recognition information;
An error probability calculation unit for calculating a pronunciation error probability of the input speech using the phoneme recognition information; And
And an error determination unit for determining whether there is a pronunciation error in the input speech based on the word recognition result, the pronunciation score, and the pronunciation error probability,
The phoneme recognition unit generates the phoneme recognition information by referring to a standard pronunciation dictionary and a multi-phonetic dictionary,
Wherein the error determination unit calculates a feedback index based on the word recognition result, the pronunciation score, and the pronunciation error probability, and compares a predetermined threshold value with the feedback index to determine whether the input speech has a pronunciation error.
Wherein the multiple pronunciation dictionary includes a set of predicted pronunciation errors.
Wherein the word recognition result includes a forced alignment result in which the input speech is forcibly aligned,
In the phoneme recognition information,
A first phoneme corresponding to the forced alignment result; And
And a second phoneme corresponding to a free alignment result in which the input speech is freely aligned.
Wherein the forced alignment result is obtained by aligning the input speech with reference to the standard pronunciation dictionary,
Wherein the phoneme recognition unit generates the free alignment result in a set of the predicted pronunciation errors.
Wherein the phoneme recognition unit calculates a first probability of the forced alignment result and a second probability of the free alignment result.
And the pronunciation score calculation unit calculates the pronunciation score using the first probability and the second probability.
Further comprising an error probability database including standard pronunciation information and multiple pronunciation information of phonemes,
Wherein the error probability calculator comprises:
And compares the phoneme recognition information with the error probability database to classify the phoneme recognition information into a phoneme belonging to standard pronunciation or a phoneme belonging to multiple pronunciation.
Wherein the error probability calculator comprises:
Wherein the pronunciation error probability is calculated using the number of phonemes belonging to the standard pronunciation and the number of phonemes belonging to the multiple pronunciation.
Wherein the word recognition result is a word unit and the phoneme recognition information is a phoneme unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150103092A KR101672484B1 (en) | 2015-07-21 | 2015-07-21 | Misprounciations detector and method for detecting misprounciations using the same |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101672484B1 true KR101672484B1 (en) | 2016-11-03 |
Family
ID=57571399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150103092A KR101672484B1 (en) | 2015-07-21 | 2015-07-21 | Misprounciations detector and method for detecting misprounciations using the same |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101672484B1 (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130043817A (en) * | 2011-10-21 | 2013-05-02 | 포항공과대학교 산학협력단 | Apparatus for language learning and method thereof |
KR101483946B1 (en) * | 2013-10-28 | 2015-01-19 | 에스케이텔레콤 주식회사 | Method for checking phonation of sentence, system and apparatus thereof |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112133325A (en) * | 2020-10-14 | 2020-12-25 | 北京猿力未来科技有限公司 | Wrong phoneme recognition method and device |
CN112133325B (en) * | 2020-10-14 | 2024-05-07 | 北京猿力未来科技有限公司 | Wrong phoneme recognition method and device |
CN112908363A (en) * | 2021-01-21 | 2021-06-04 | 北京乐学帮网络技术有限公司 | Pronunciation detection method and device, computer equipment and storage medium |
CN112908363B (en) * | 2021-01-21 | 2022-11-22 | 北京乐学帮网络技术有限公司 | Pronunciation detection method and device, computer equipment and storage medium |
WO2022246782A1 (en) * | 2021-05-28 | 2022-12-01 | Microsoft Technology Licensing, Llc | Method and system of detecting and improving real-time mispronunciation of words |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination |