KR101672484B1 - Misprounciations detector and method for detecting misprounciations using the same - Google Patents


Info

Publication number
KR101672484B1
Authority
KR
South Korea
Prior art keywords
pronunciation
phoneme
error
probability
result
Application number
KR1020150103092A
Other languages
Korean (ko)
Inventor
이근배
이종훈
방지수
강세천
서홍석
Original Assignee
포항공과대학교 산학협력단 (POSTECH Academy-Industry Foundation)
Priority date 2015-07-21
Filing date 2015-07-21
Publication date 2016-11-03
Application filed by 포항공과대학교 산학협력단
Priority to KR1020150103092A
Application granted
Publication of KR101672484B1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L15/28 Constructional details of speech recognition systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

A mispronunciation detecting device includes: a word recognizing part that recognizes input speech to generate a word recognition result; a phoneme recognizing part that generates phoneme recognition information corresponding to the word recognition result; a pronunciation score calculating part that calculates a pronunciation score of the input speech based on the phoneme recognition information; a mispronunciation probability calculating part that calculates a mispronunciation probability of the input speech using the phoneme recognition information; and a mispronunciation determining part that determines whether or not the input speech contains a mispronunciation based on the word recognition result, the pronunciation score, and the mispronunciation probability. The phoneme recognizing part generates the phoneme recognition information by referring to a standard pronunciation dictionary and a multi-pronunciation dictionary.

Description

TECHNICAL FIELD The present invention relates to a pronunciation error detection apparatus and a pronunciation error detection method using the same, and more particularly to a technique for detecting erroneous pronunciation in a learner's utterance.

When learners study a new language, their pronunciation errors fall within a limited range that does not stray far from the standard pronunciation. Which errors occur depends on the learner's linguistic background, level of education, and similar factors.

In general, conventional automatic pronunciation detection and evaluation methods do not actively consider the types of errors a particular learner is likely to make; instead, they allow every type of error for every pronunciation. As a result, their evaluations are somewhat inaccurate.

The pronunciation error detection apparatus and the pronunciation error detection method using the same according to the embodiment are intended to detect pronunciation errors accurately and quickly.

The pronunciation error detection apparatus includes: a word recognition unit for recognizing an input speech and generating a word recognition result; a phoneme recognition unit for generating phoneme recognition information corresponding to the word recognition result; a pronunciation score calculation unit for calculating a pronunciation score of the input speech based on the phoneme recognition information; an error probability calculation unit for calculating a pronunciation error probability of the input speech using the phoneme recognition information; and an error determination unit that determines whether there is a pronunciation error in the input speech based on the word recognition result, the pronunciation score, and the pronunciation error probability. The phoneme recognition unit generates the phoneme recognition information by referring to the standard pronunciation dictionary and the multi-pronunciation dictionary.

The error determination unit of the pronunciation error detection apparatus according to the embodiment calculates a feedback index based on the word recognition result, the pronunciation score, and the pronunciation error probability, and compares the feedback index with a predetermined threshold value to determine whether the input speech contains a pronunciation error.

Further, the multi-pronunciation dictionary of the pronunciation error detection apparatus according to the embodiment includes a set of predicted pronunciation errors.

Further, in the pronunciation error detection apparatus according to the embodiment, the word recognition result may include a forced alignment result obtained by forcibly aligning the input speech, and the phoneme recognition information may include a first phoneme corresponding to the forced alignment result and a second phoneme corresponding to a free alignment result obtained by freely aligning the input speech.

In the pronunciation error detection apparatus according to the embodiment, the forced alignment result is obtained by aligning the input speech with reference to the standard pronunciation dictionary, and the phoneme recognition unit generates the free alignment result within the set of predicted pronunciation errors.

In addition, the phoneme recognition unit of the pronunciation error detection apparatus according to the embodiment calculates the first probability of the forced alignment result and the second probability of the free alignment result.

The pronunciation score calculation unit of the pronunciation error detection apparatus according to the embodiment calculates the pronunciation score using the first probability and the second probability.

The pronunciation error detection apparatus according to the embodiment may further include an error probability database containing standard pronunciation information and multiple pronunciation information for phonemes. The error probability calculation unit compares the phoneme recognition information with the error probability database and thereby classifies the phoneme recognition information into phonemes belonging to the standard pronunciation and phonemes belonging to the multiple pronunciations.

The error probability calculation unit of the pronunciation error detection apparatus according to the embodiment calculates the pronunciation error probability using the number of phonemes belonging to the standard pronunciation and the number of phonemes belonging to the multiple pronunciation.

In the pronunciation error detection apparatus according to the embodiment, the word recognition result is expressed in word units, and the phoneme recognition information is expressed in phoneme units.

The pronunciation error detection apparatus and the pronunciation error detection method using the same according to the embodiment can detect pronunciation errors accurately and quickly.

FIG. 1 is a block diagram showing the configuration of a pronunciation error detection apparatus according to an embodiment.
FIG. 2 is an example of the multi-pronunciation dictionary of FIG. 1.
FIG. 3 is a flowchart illustrating a pronunciation error detection method according to an embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art to which the present invention pertains can readily carry them out. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. To explain the present invention clearly, parts unrelated to the description are omitted, and the same or similar components are denoted by the same reference numerals throughout the specification.

Throughout the specification, when a part is said to "comprise" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise. Also, terms such as "unit," "part," and "module" in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

FIG. 1 is a block diagram showing the configuration of a pronunciation error detection apparatus according to an embodiment.

FIG. 2 is an example of the multi-pronunciation dictionary of FIG. 1.

Hereinafter, a pronunciation error detection apparatus according to an embodiment will be described with reference to FIGS. 1 and 2.

Referring to FIG. 1, the pronunciation error detection device 1 generates a word recognition result w and a word recognition reliability RC from the input speech O. The word recognition result w is the result of recognizing the input speech O word by word. The pronunciation error detection device 1 then generates phoneme recognition information PI from the input speech O and the word recognition result w. The phoneme recognition information PI is the result of recognizing the input speech O phoneme by phoneme.

The pronunciation error detection device 1 calculates a pronunciation score PC from the phoneme recognition information PI and determines whether there is a pronunciation error on the basis of quantities including the pronunciation score PC and the word recognition reliability RC. Here, the input speech O is the speech the learner produces when reading a predetermined word or sentence aloud, and it contains speech at both the word level and the phoneme level.

The pronunciation error detection device 1 includes a word recognition unit 10, a phoneme recognition unit 20, a pronunciation score calculation unit 30, a standard pronunciation dictionary 40, a multiple pronunciation dictionary 50, 60, an error probability calculation unit 70, an error probability database 80, and an error determination unit 90.

The word recognition unit 10 generates the word recognition result w and the word recognition reliability RC from the input speech O. The word recognition unit 10 may be implemented with a conventional general-purpose speech recognizer.

The phoneme recognition unit 20 generates the phoneme recognition information PI corresponding to the word recognition result w from the input speech O, with reference to the standard pronunciation dictionary 40 and the multi-pronunciation dictionary 50.

The phoneme recognition information PI includes a forced alignment result R1, obtained by forcibly aligning the input speech O, and the phoneme p corresponding to R1, as well as a free alignment result R2, obtained by freely aligning the input speech O, and the phoneme q corresponding to R2. The forced alignment result R1 is obtained by forcibly aligning the input speech O word by word with reference to the standard pronunciation dictionary 40. The free alignment result R2 is obtained by recognizing the input speech O phoneme by phoneme within the range of learner pronunciation errors predicted by the rules of the multi-pronunciation dictionary 50.

The standard pronunciation dictionary 40 contains pronunciation rules consisting of standard pronunciations and pronunciation symbols, and can be defined for one or more languages. The multi-pronunciation dictionary 50 is built by analyzing the pronunciation error patterns of actual learners to extract pronunciation error rules and then applying the extracted rules to the standard pronunciation dictionary 40, which produces a set of predicted pronunciation errors PR (see FIG. 2). The multi-pronunciation dictionary 50 may be generated with various machine learning algorithms, for example from the n-best results of conditional random fields (CRFs), but the embodiment is not limited thereto.
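Applying pronunciation error rules to a standard pronunciation, as described above, can be pictured with a small sketch. It assumes, hypothetically, that each rule maps one standard phoneme (written with the ARPAbet-like symbols of the "only" example) to the variants a learner might produce, with "-" marking a deleted phoneme. `ERROR_RULES` and `predicted_pronunciations` are illustrative names, and the rules are chosen only so that the resulting set contains the free alignment result of Table 1. The patent does not specify this rule format, and a CRF n-best approach would build the set differently.

```python
from itertools import product

# Hypothetical learner error rules: standard phoneme -> predicted variants.
# "-" marks a deleted phoneme. Chosen only to reproduce the Table 1 example.
ERROR_RULES: dict[str, list[str]] = {
    "ow": ["ao"],
    "n": ["l"],
    "l": ["-"],
}


def predicted_pronunciations(standard: list[str],
                             rules: dict[str, list[str]]) -> set[tuple[str, ...]]:
    """Apply error rules to a standard pronunciation to build the set PR.

    Each phoneme may stay as in the standard pronunciation or be replaced by
    any variant its rule lists; every combination of these choices is kept,
    so the standard pronunciation itself is also a member of the set.
    """
    options = [[ph] + rules.get(ph, []) for ph in standard]
    return set(product(*options))


pr = predicted_pronunciations(["ow", "n", "l", "iy"], ERROR_RULES)
# 2 * 2 * 2 * 1 = 8 candidates, including ("ao", "l", "-", "iy")
```

Because the set is the Cartesian product of per-phoneme options, its size multiplies with every applicable rule, which is one reason to keep the rules restricted to errors actually observed for a given learner group.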

Hereinafter, an example will be described in which the learner reads the word "only", the word recognition unit 10 generates the word recognition result w, and the phoneme recognition unit 20 generates the phoneme recognition information PI.

The word recognition unit 10 generates "only" as the word recognition result w.

The phoneme recognition unit 20 extracts the standard pronunciation SP of "only" from the standard pronunciation dictionary 40 and forcibly aligns the input speech O to generate the forced alignment result R1 (/ow/ /n/ /l/ /iy/).

Referring to FIG. 2, the multi-pronunciation dictionary 50 may include the pronunciation error rules corresponding to the extracted word "only" and the learner's set of predicted pronunciation errors PR. The pronunciation error rules can be extracted with reference to the standard pronunciation SP of "only".

In general, conventional speech recognizers are designed around an algorithm that searches for the maximum value within a given range. In contrast, the pronunciation score calculation unit 60 selects the phoneme q only from the predicted pronunciation error set PR. By limiting the search range to PR in this way, the pronunciation score calculation unit 60 can obtain the desired result more naturally than a conventional speech recognizer.

Using the word recognition result w, the phoneme recognition unit 20 recognizes, as the free alignment result R2, the pronunciation /ao/ /l/ - /iy/ in the predicted pronunciation error set PR that is closest to the forced alignment result R1.
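The restricted search in the two paragraphs above can be sketched as an argmax taken only over candidates that lie in PR. This assumes, as the max over q in Q in Equation 1 suggests, that each candidate pronunciation q already has a log-likelihood log P(O|q) from some acoustic model. The scores below are made-up numbers, and `free_alignment` is a hypothetical helper, not an API from the patent.

```python
Pron = tuple[str, ...]


def free_alignment(scores: dict[Pron, float], predicted: set[Pron]) -> tuple[Pron, float]:
    """Pick the candidate with the highest log P(O|q), searching only within PR.

    scores: hypothetical log-likelihoods log P(O|q) from an acoustic model
    predicted: the predicted pronunciation error set PR
    """
    in_range = {q: s for q, s in scores.items() if q in predicted}
    if not in_range:
        raise ValueError("no scored candidate lies inside PR")
    best = max(in_range, key=lambda q: in_range[q])
    return best, in_range[best]


pr = {("ow", "n", "l", "iy"), ("ao", "l", "-", "iy")}
scores = {
    ("ow", "n", "l", "iy"): -120.0,
    ("ao", "l", "-", "iy"): -110.0,
    ("aa", "n", "iy"): -105.0,  # outside PR: an unconstrained search would pick it
}
r2, log_p = free_alignment(scores, pr)  # (("ao", "l", "-", "iy"), -110.0)
```

The toy scores show the point of the restriction: an unconstrained maximum would return a pronunciation that no rule predicts for this learner, while the restricted search stays inside PR.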

Table 1 below shows the word recognition result w, the forced alignment result R1, and the free alignment result R2 for the input speech O.

Table 1
Word recognition result (w)      only
Forced alignment result (R1)     /ow/ /n/ /l/ /iy/
Free alignment result (R2)       /ao/ /l/ - /iy/

Referring again to FIG. 1, the phoneme recognition unit 20 calculates the forced alignment probability P(O|p), which is the probability of the forced alignment result R1, and the phoneme recognition probability P(O|q), which is the probability of the free alignment result R2.

The forced alignment probability P(O|p) is the score obtained when the input speech O is aligned with the phoneme p by forced alignment, i.e., the probability that the input speech O is generated given the phoneme p.

The phoneme recognition probability P(O|q) is the corresponding probability under the multiple pronunciations, i.e., the probability that the input speech O is generated given the phoneme q.

The pronunciation score calculation unit 60 calculates the pronunciation score PC using Equation 1 below.

[Equation 1]

$$PC = \frac{1}{N}\left|\log\frac{P(O \mid p)}{\max_{q \in Q} P(O \mid q)}\right|$$

Here, N in Equation 1 is the utterance length of the input speech O (for example, its frame length in ms) and equals the total number of observation frames. N normalizes the logarithmic term of Equation 1 so that it is not affected by utterance length. Q is the set of predicted pronunciation errors PR.

The fraction inside the logarithm in Equation 1 reaches its maximum value of 1 when the phoneme recognition probability P(O|q) equals the forced alignment probability P(O|p), that is, when the phoneme p obtained by forced alignment and the phoneme q obtained by free recognition are the same phoneme; the logarithm is then 0, so the pronunciation score PC is 0. If P(O|q) differs from P(O|p), that is, if the forcibly aligned phoneme p and the freely recognized phoneme q are different phonemes, the fraction is smaller than 1. The smaller the forced alignment probability P(O|p) and the larger the phoneme recognition probability P(O|q), the smaller the fraction becomes and the larger the pronunciation score PC.
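Equation 1 is most conveniently evaluated in the log domain, where the log of the ratio becomes a difference of log-likelihoods and the product of many small frame probabilities does not underflow. The sketch below assumes the acoustic model already supplies log P(O|p) and log P(O|q) for every candidate in Q; `pronunciation_score` is a hypothetical name and the numbers are invented.

```python
def pronunciation_score(log_p_forced: float,
                        log_p_candidates: list[float],
                        num_frames: int) -> float:
    """Equation 1 in the log domain.

    log_p_forced: log P(O|p) for the forced alignment result R1
    log_p_candidates: log P(O|q) for every candidate q in Q (the set PR)
    num_frames: N, the number of observation frames
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not log_p_candidates:
        raise ValueError("Q must contain at least one candidate")
    # log(P(O|p) / max_q P(O|q)) = log P(O|p) - max_q log P(O|q),
    # because log is monotonically increasing, so it commutes with max.
    log_ratio = log_p_forced - max(log_p_candidates)
    return abs(log_ratio) / num_frames


# Hypothetical log-likelihoods for a 50-frame utterance of "only".
same = pronunciation_score(-120.0, [-120.0, -135.0], num_frames=50)  # 0.0
diff = pronunciation_score(-120.0, [-120.0, -110.0], num_frames=50)  # 0.2
```

The first call reproduces the case where the best free candidate coincides with the forced alignment (PC = 0); in the second, a better-scoring candidate in Q pushes the fraction below 1 and PC rises.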

The error probability calculation unit 70 compares the phoneme recognition information PI with the pronunciation information in the previously constructed error probability database 80 and classifies each phoneme in PI as belonging either to the standard pronunciation or to the multiple pronunciations. The error probability database 80 contains information on the standard pronunciation and the multiple pronunciations of phonemes.

The error probability calculation unit 70 calculates the pronunciation error probability EP by substituting the number of phonemes belonging to the standard pronunciation and the number of phonemes belonging to the multiple pronunciations into Equation 2 below.

[Equation 2]

$$EP = P(e \mid c, v) = \frac{\mathrm{cnt}(c, v, e)}{\mathrm{cnt}(c, v)}$$

In Equation (2), e is a pronunciation error, c is a phoneme belonging to the standard pronunciation, v is a phoneme belonging to multiple pronunciations, and cnt () means the number of occurrences.
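A count-based estimate of Equation 2 might look like the following. It assumes that the denominator is cnt(c, v), the count implied by the conditional probability P(e | c, v), and that the training data comes as hypothetical (standard phoneme, realized phoneme, error label) triples. Neither this data format nor the fallback for unseen pairs is specified in the patent.

```python
from collections import Counter


def error_probability(observations: list[tuple[str, str, bool]],
                      c: str, v: str) -> float:
    """Equation 2: EP = P(e | c, v) = cnt(c, v, e) / cnt(c, v).

    observations: hypothetical (standard phoneme c, realized phoneme v,
    error label) triples taken from annotated learner speech
    """
    pair_counts = Counter((oc, ov) for oc, ov, _ in observations)
    error_counts = Counter((oc, ov) for oc, ov, is_error in observations if is_error)
    total = pair_counts[(c, v)]
    if total == 0:
        # Unseen (c, v) pair: returning 0.0 is an assumption, not from the patent.
        return 0.0
    return error_counts[(c, v)] / total


# Toy data: "ow" realized as "ao" three times, judged an error twice.
data = [("ow", "ao", True), ("ow", "ao", True), ("ow", "ao", False),
        ("ow", "ow", False)]
ep = error_probability(data, "ow", "ao")  # 2 / 3
```

`Counter` returns 0 for missing keys, so pairs that never occur as errors simply yield a probability of 0.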

The error determination unit 90 calculates the feedback index FI by substituting the word recognition reliability RC, the word recognition result w, the pronunciation score PC, and the pronunciation error probability EP into Equation 3 below.

[Equation 3]

$$FI = \frac{1}{1 + e^{-(a_1 x_1 + a_2 x_2 + a_0)}}\, P(w \mid O)$$

In Equation 3, $x_1$ and $x_2$ are the pronunciation score PC and the pronunciation error probability EP, respectively, and $a_1$, $a_2$, and $a_0$ are constants optimized on system development data by a machine learning algorithm. P(w|O) is the probability of the word recognition result w given the input speech O.

For a phoneme whose feedback index FI exceeds a predetermined threshold value (for example, 0.5), the error determination unit 90 may determine that the pronunciation in the input speech O is erroneous and generate error feedback. For a phoneme whose feedback index FI does not exceed the threshold value, the error determination unit 90 determines that the pronunciation in the input speech O is correct and either generates normal feedback or generates no feedback.

The threshold value can be experimentally determined in accordance with the purpose and object of the pronunciation error detecting apparatus.

Hereinafter, a pronunciation error detection method according to an embodiment will be described with reference to FIG. 3.

Referring to FIG. 3, in step S10, the word recognition unit 10 generates the word recognition result w and the word recognition reliability RC from the input speech O.

In step S20, the phoneme recognition unit 20 refers to the standard pronunciation dictionary 40 and the multi-pronunciation dictionary 50 and, based on the input speech O, generates the phoneme recognition information PI.

In step S30, the pronunciation score calculation unit 60 calculates a pronunciation score PC.

In step S40, the error probability calculation unit 70 calculates the pronunciation error probability EP.

In step S50, the error determination unit 90 calculates the feedback index FI and determines an error with respect to the pronunciation of the input speech O.
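Steps S10 to S50 can be wired together as a small pipeline. In this sketch every component is passed in as a callable so that any recognizer or decision rule can be plugged in. `PhonemeInfo`, `detect`, and the stand-ins in the usage example are hypothetical, and the sketch assumes that the word recognition reliability RC plays the role of P(w|O) in Equation 3 and that `log_p_free` is already the best score over Q.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PhonemeInfo:
    """Hypothetical container for the phoneme recognition information PI."""
    forced: tuple[str, ...]  # phoneme p from the forced alignment result R1
    free: tuple[str, ...]    # phoneme q from the free alignment result R2
    log_p_forced: float      # log P(O|p)
    log_p_free: float        # log P(O|q) for the best candidate in PR
    num_frames: int          # N in Equation 1


def detect(speech: Sequence[float],
           recognize_word: Callable[[Sequence[float]], tuple[str, float]],
           recognize_phonemes: Callable[[Sequence[float], str], PhonemeInfo],
           error_probability: Callable[[PhonemeInfo], float],
           decide: Callable[[float, float, float], bool]) -> bool:
    """Run steps S10 to S50 and return True if a pronunciation error is found."""
    word, rc = recognize_word(speech)            # S10: w and RC
    pi = recognize_phonemes(speech, word)        # S20: PI
    pc = abs(pi.log_p_forced - pi.log_p_free) / pi.num_frames  # S30: Equation 1
    ep = error_probability(pi)                   # S40: Equation 2
    return decide(pc, ep, rc)                    # S50: Equation 3 and threshold


# Usage with toy stand-ins for the recognizers and the decision rule.
result = detect(
    [0.0] * 800,  # placeholder samples, not real audio
    recognize_word=lambda s: ("only", 0.9),
    recognize_phonemes=lambda s, w: PhonemeInfo(
        ("ow", "n", "l", "iy"), ("ao", "l", "-", "iy"), -120.0, -110.0, 50),
    error_probability=lambda pi: 0.5,
    decide=lambda pc, ep, rc: pc > 0.1 and ep > 0.3,  # toy rule; pc is 0.2 here
)
```

Injecting the components keeps the control flow of FIG. 3 separate from any particular acoustic model, which makes each step easy to test in isolation.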

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments. Those skilled in the art can readily propose other embodiments by adding, changing, or deleting components within the scope of the same idea, and such embodiments also fall within the scope of the present invention.

Claims (10)

1. A pronunciation error detection apparatus, comprising:
a word recognition unit for recognizing an input speech and generating a word recognition result;
a phoneme recognition unit for generating phoneme recognition information corresponding to the word recognition result;
a pronunciation score calculation unit for calculating a pronunciation score of the input speech based on the phoneme recognition information;
an error probability calculation unit for calculating a pronunciation error probability of the input speech using the phoneme recognition information; and
an error determination unit for determining whether there is a pronunciation error in the input speech based on the word recognition result, the pronunciation score, and the pronunciation error probability,
wherein the phoneme recognition unit generates the phoneme recognition information by referring to a standard pronunciation dictionary and a multi-pronunciation dictionary, and
wherein the error determination unit calculates a feedback index based on the word recognition result, the pronunciation score, and the pronunciation error probability, and compares the feedback index with a predetermined threshold value to determine whether the input speech has a pronunciation error.
2. (Deleted)
3. The pronunciation error detection apparatus of claim 1,
wherein the multi-pronunciation dictionary includes a set of predicted pronunciation errors.
4. The pronunciation error detection apparatus of claim 3,
wherein the word recognition result includes a forced alignment result in which the input speech is forcibly aligned, and
wherein the phoneme recognition information includes:
a first phoneme corresponding to the forced alignment result; and
a second phoneme corresponding to a free alignment result in which the input speech is freely aligned.
5. The pronunciation error detection apparatus of claim 4,
wherein the forced alignment result is obtained by aligning the input speech with reference to the standard pronunciation dictionary, and
wherein the phoneme recognition unit generates the free alignment result within the set of predicted pronunciation errors.
6. The pronunciation error detection apparatus of claim 5,
wherein the phoneme recognition unit calculates a first probability of the forced alignment result and a second probability of the free alignment result.
7. The pronunciation error detection apparatus of claim 6,
wherein the pronunciation score calculation unit calculates the pronunciation score using the first probability and the second probability.
8. The pronunciation error detection apparatus of claim 7,
further comprising an error probability database including standard pronunciation information and multiple pronunciation information of phonemes,
wherein the error probability calculation unit compares the phoneme recognition information with the error probability database to classify the phoneme recognition information into phonemes belonging to the standard pronunciation or phonemes belonging to the multiple pronunciation.
9. The pronunciation error detection apparatus of claim 8,
wherein the error probability calculation unit calculates the pronunciation error probability using the number of phonemes belonging to the standard pronunciation and the number of phonemes belonging to the multiple pronunciation.
10. The pronunciation error detection apparatus of claim 1,
wherein the word recognition result is in word units and the phoneme recognition information is in phoneme units.

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150103092A KR101672484B1 (en) 2015-07-21 2015-07-21 Misprounciations detector and method for detecting misprounciations using the same


Publications (1)

Publication Number Publication Date
KR101672484B1 true KR101672484B1 (en) 2016-11-03

Family

ID=57571399


Country Status (1)

Country Link
KR (1) KR101672484B1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112133325A (en) * 2020-10-14 2020-12-25 北京猿力未来科技有限公司 Wrong phoneme recognition method and device
CN112908363A (en) * 2021-01-21 2021-06-04 北京乐学帮网络技术有限公司 Pronunciation detection method and device, computer equipment and storage medium
WO2022246782A1 (en) * 2021-05-28 2022-12-01 Microsoft Technology Licensing, Llc Method and system of detecting and improving real-time mispronunciation of words

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130043817A (en) * 2011-10-21 2013-05-02 포항공과대학교 산학협력단 Apparatus for language learning and method thereof
KR101483946B1 (en) * 2013-10-28 2015-01-19 에스케이텔레콤 주식회사 Method for checking phonation of sentence, system and apparatus thereof


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112133325A (en) * 2020-10-14 2020-12-25 北京猿力未来科技有限公司 Wrong phoneme recognition method and device
CN112133325B (en) * 2020-10-14 2024-05-07 北京猿力未来科技有限公司 Wrong phoneme recognition method and device
CN112908363A (en) * 2021-01-21 2021-06-04 北京乐学帮网络技术有限公司 Pronunciation detection method and device, computer equipment and storage medium
CN112908363B (en) * 2021-01-21 2022-11-22 北京乐学帮网络技术有限公司 Pronunciation detection method and device, computer equipment and storage medium
WO2022246782A1 (en) * 2021-05-28 2022-12-01 Microsoft Technology Licensing, Llc Method and system of detecting and improving real-time mispronunciation of words

Similar Documents

Publication Publication Date Title
KR101892734B1 (en) Method and apparatus for correcting error of recognition in speech recognition system
EP2301013B1 (en) Systems and methods of improving automated speech recognition accuracy using statistical analysis of search terms
US6487532B1 (en) Apparatus and method for distinguishing similar-sounding utterances speech recognition
US6985863B2 (en) Speech recognition apparatus and method utilizing a language model prepared for expressions unique to spontaneous speech
US7421387B2 (en) Dynamic N-best algorithm to reduce recognition errors
US8744856B1 (en) Computer implemented system and method and computer program product for evaluating pronunciation of phonemes in a language
US8880399B2 (en) Utterance verification and pronunciation scoring by lattice transduction
US9799350B2 (en) Apparatus and method for verifying utterance in speech recognition system
Witt et al. Language learning based on non-native speech recognition.
US20140156276A1 (en) Conversation system and a method for recognizing speech
CN111696557A (en) Method, device and equipment for calibrating voice recognition result and storage medium
US11282511B2 (en) System and method for automatic speech analysis
CN111951825A (en) Pronunciation evaluation method, medium, device and computing equipment
KR101672484B1 (en) Misprounciations detector and method for detecting misprounciations using the same
US20150179169A1 (en) Speech Recognition By Post Processing Using Phonetic and Semantic Information
US6859774B2 (en) Error corrective mechanisms for consensus decoding of speech
KR20160059265A (en) Method And Apparatus for Learning Acoustic Model Considering Reliability Score
US11232786B2 (en) System and method to improve performance of a speech recognition system by measuring amount of confusion between words
JP2015530614A (en) Method and system for predicting speech recognition performance using accuracy scores
US20180012602A1 (en) System and methods for pronunciation analysis-based speaker verification
US9269349B2 (en) Automatic methods to predict error rates and detect performance degradation
KR20130126570A (en) Apparatus for discriminative training acoustic model considering error of phonemes in keyword and computer recordable medium storing the method thereof
Arslan et al. Detecting and correcting automatic speech recognition errors with a new model
CN113053414A (en) Pronunciation evaluation method and device
JP2007052307A (en) Inspection device and computer program for voice recognition result

Legal Events

Date Code Title Description
A201 Request for examination