KR101672484B1

KR101672484B1 - Mispronunciation detector and method for detecting mispronunciations using the same

Info

Publication number: KR101672484B1
Application number: KR1020150103092A
Authority: KR
Inventors: 이근배; 이종훈; 방지수; 강세천; 서홍석
Original assignee: 포항공과대학교 산학협력단
Priority date: 2015-07-21
Filing date: 2015-07-21
Publication date: 2016-11-03

Abstract

A mispronunciation detecting device includes: a word recognizing part recognizing an inputted voice to generate a word recognition result; a phoneme recognizing part generating phoneme recognition information corresponding to the word recognition result; a pronunciation score calculating part calculating a pronunciation score of the inputted voice based on the phoneme recognition information; a mispronunciation probability calculating part calculating a mispronunciation probability of the inputted voice by using the phoneme recognition information; and a mispronunciation determining part determining whether the inputted voice has a mispronunciation or not based on the word recognition result, the pronunciation score, and the mispronunciation probability. The phoneme recognizing part generates the phoneme recognition information by referring to a standard pronunciation dictionary and a multi-pronunciation dictionary.

Description

The present invention relates to a pronunciation error detection apparatus and a pronunciation error detection method using the same.

The present invention relates to a technique for detecting erroneous pronunciation from a learner's utterance.

When a language learner learns a new language, pronunciation errors occur within a limited range that does not go far beyond standard pronunciation, depending on the learner's linguistic background, degree of education, and so on.

In general, conventional automatic pronunciation detection and evaluation methods do not actively consider the types of errors that depend on the learner, and instead leave open the possibility of every type of error for every pronunciation. As a result, the evaluation is somewhat inaccurate.

The pronunciation error detecting apparatus and the pronunciation error detecting method using the same according to the embodiment are intended to accurately and quickly detect pronunciation errors.

The pronunciation error detecting apparatus includes: a word recognition unit for recognizing an input speech and generating a word recognition result; a phoneme recognition unit for generating phoneme recognition information corresponding to the word recognition result; a pronunciation score calculation unit for calculating a pronunciation score of the input speech based on the phoneme recognition information; an error probability calculation unit for calculating a pronunciation error probability of the input speech using the phoneme recognition information; and an error determination unit that determines whether there is a pronunciation error in the input speech based on the word recognition result, the pronunciation score, and the pronunciation error probability. The phoneme recognition unit generates the phoneme recognition information by referring to a standard pronunciation dictionary and a multi-pronunciation dictionary.

The error determination unit of the pronunciation error detection apparatus according to the embodiment calculates a feedback index based on the word recognition result, the pronunciation score, and the pronunciation error probability, compares the feedback index with a predetermined threshold value, and thereby judges whether or not there is a pronunciation error in the input speech.

Further, the multi-phonetic dictionary of the pronunciation error detection apparatus according to the embodiment includes a set of predicted pronunciation errors.

Further, the word recognition result of the pronunciation error detection apparatus according to the embodiment may include a forced alignment result obtained by forcibly aligning the input speech, and the phoneme recognition information may include a first phoneme corresponding to the forced alignment result and a second phoneme corresponding to a free alignment result obtained by freely aligning the input speech.

The forced alignment result of the pronunciation error detecting device according to the embodiment is obtained by aligning the input speech with reference to the standard pronunciation dictionary, and the phoneme recognition unit generates the free alignment result within the set of predicted pronunciation errors.

In addition, the phoneme recognition unit of the pronunciation error detection apparatus according to the embodiment calculates the first probability of the forced alignment result and the second probability of the free alignment result.

The pronunciation score calculation unit of the pronunciation error detection apparatus according to the embodiment calculates the pronunciation score using the first probability and the second probability.

The pronunciation error detection apparatus according to the embodiment may further include an error probability database including standard pronunciation information and multiple pronunciation information of phonemes, wherein the error probability calculation unit compares the phoneme recognition information with the error probability database to classify the phoneme recognition information into phonemes belonging to standard pronunciation and phonemes belonging to multiple pronunciation.

The error probability calculation unit of the pronunciation error detection apparatus according to the embodiment calculates the pronunciation error probability using the number of phonemes belonging to the standard pronunciation and the number of phonemes belonging to the multiple pronunciation.

The word recognition result of the pronunciation error detecting apparatus according to the embodiment is a word unit, and the phoneme recognition information is a phoneme unit.

The pronunciation error detecting apparatus and the pronunciation error detecting method using the same according to the embodiment have an effect of accurately and rapidly detecting a pronunciation error.

FIG. 1 is a block diagram showing a configuration of a pronunciation error detecting apparatus according to an embodiment.
FIG. 2 is an example of the multi-pronunciation dictionary of FIG. 1.
FIG. 3 is a flowchart illustrating a pronunciation error detection method according to an embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, which will be readily apparent to those skilled in the art to which the present invention pertains. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In order to clearly explain the present invention, parts not related to the description are omitted, and the same or similar components are denoted by the same reference numerals throughout the specification.

Throughout the specification, when a part is said to "comprise" an element, this means that it may further include other elements, not that it excludes them, unless specifically stated otherwise. Also, terms such as "part" and "module" in the specification mean a unit that processes at least one function or operation, which may be implemented by hardware, software, or a combination of hardware and software.

Hereinafter, a pronunciation error detecting apparatus according to an embodiment will be described with reference to FIGS. 1 and 2.

Referring to FIG. 1, the pronunciation error detection apparatus 1 generates a word recognition result w and a word recognition reliability RC using an input speech O. The word recognition result w is the result of recognizing the input speech O on a word-by-word basis. The pronunciation error detection apparatus 1 generates phoneme recognition information PI using the input speech O and the word recognition result w. The phoneme recognition information PI is the result of recognizing the input speech O on a phoneme-by-phoneme basis.

The pronunciation error detection apparatus 1 calculates a pronunciation score PC using the phoneme recognition information PI and determines whether there is a pronunciation error on the basis of the pronunciation score PC and the word recognition reliability RC. Here, the input speech O is speech produced when the learner reads a predetermined word or sentence aloud, and it includes both word-level and phoneme-level speech.

The pronunciation error detection apparatus 1 includes a word recognition unit 10, a phoneme recognition unit 20, a standard pronunciation dictionary 40, a multi-pronunciation dictionary 50, a pronunciation score calculation unit 60, an error probability calculation unit 70, an error probability database 80, and an error determination unit 90.

The word recognition unit 10 generates a word recognition result w and a word recognition reliability RC using the input speech O. The word recognition unit 10 may be constructed using a conventional general-purpose speech recognition apparatus.

The phoneme recognition unit 20 generates phoneme recognition information PI corresponding to the word recognition result w based on the input speech O, with reference to the standard pronunciation dictionary 40 and the multi-pronunciation dictionary 50.

The phoneme recognition information PI includes a forced alignment result R1 obtained by forcibly aligning the input speech O, a phoneme p corresponding to the forced alignment result R1, a free alignment result R2 obtained by freely aligning the input speech O, and a phoneme q corresponding to the free alignment result R2. The forced alignment result R1 is the result of forcibly aligning the input speech O word by word with reference to the standard pronunciation dictionary 40. The free alignment result R2 is the result of recognizing the input speech O phoneme by phoneme within the range of learner pronunciation errors predicted by the rules of the multi-pronunciation dictionary 50.

The standard pronunciation dictionary 40 includes pronunciation rules composed of standard pronunciations and pronunciation symbols, and can be defined for one or more languages. The multi-pronunciation dictionary 50 is built by analyzing the pronunciation error patterns of actual learners to extract pronunciation error rules, and then applying the extracted rules to the standard pronunciation dictionary 40 to generate a set of predicted pronunciation errors PR (see FIG. 2). The multi-pronunciation dictionary 50 may be generated using various machine learning algorithms, for example using the n-best results of Conditional Random Fields, but the embodiment is not limited thereto.
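The rule-application step described above can be sketched as follows. This is only an illustration under stated assumptions: the rule table `ERROR_RULES`, the function name `predicted_error_set`, and the choice to keep the standard pronunciation inside the generated set are all hypothetical, and the hand-written rules stand in for rules that the document says are learned from learner data.

```python
from itertools import product

# Hypothetical error rules (not from the patent): each standard phoneme maps
# to the alternatives a learner might produce instead. "-" marks a deletion.
ERROR_RULES = {
    "ow": ["ao"],
    "n": ["-"],
}


def predicted_error_set(standard, rules=ERROR_RULES):
    """Expand a standard pronunciation into every variant the rules allow.

    The standard pronunciation itself is kept as the first variant
    (an assumption made for this sketch).
    """
    # For each position, the learner may say the standard phoneme or any
    # alternative listed for it in the rule table.
    options = [[phoneme] + rules.get(phoneme, []) for phoneme in standard]
    return [list(variant) for variant in product(*options)]


variants = predicted_error_set(["ow", "n", "l", "iy"])
for variant in variants:
    print(" ".join(f"/{ph}/" for ph in variant))
```

Restricting recognition to a small enumerated set like this is what keeps the later search bounded, compared with letting a recognizer hypothesize any phoneme sequence.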

Hereinafter, for a case in which the learner reads the word "only" aloud, a method by which the word recognition unit 10 generates the word recognition result w and the phoneme recognition unit 20 generates the phoneme recognition information PI will be described.

The word recognition unit 10 generates the word recognition result w, namely the word "only".

The phoneme recognition unit 20 extracts the standard pronunciation SP corresponding to "only" by referring to the standard pronunciation dictionary 40 and forcibly aligns the input speech O to generate the forced alignment result R1 ("only": /ow/ /n/ /l/ /iy/).

Referring to FIG. 2, the multi-pronunciation dictionary 50 may include the pronunciation error rules corresponding to the extracted "only" and the learner's set of predicted pronunciation errors PR. The pronunciation error rules can be extracted by referring to the standard pronunciation SP corresponding to "only".

In general, conventional speech recognizers are designed around an algorithm that finds the maximum value within a given range. The pronunciation score calculation unit 60, however, selects the phoneme q from the predicted pronunciation error set PR. That is, by limiting the search range to the predicted pronunciation error set PR, the pronunciation score calculation unit 60 can obtain the desired value more naturally than a conventional speech recognizer.

Using the word recognition result w, the phoneme recognition unit 20 selects from the predicted pronunciation error set PR the pronunciation /ao/ /l/ /-/ /iy/, which is closest to the forced alignment result R1, as the free alignment result R2.

That is, Table 1 below shows the word recognition result (w), the forced alignment result (R1), and the free alignment result (R2) of the input speech (O).

[Table 1]

Word recognition result (w): only
Forced alignment result (R1): /ow/ /n/ /l/ /iy/
Free alignment result (R2): /ao/ /l/ /-/ /iy/

Referring to FIG. 1, the phoneme recognition unit 20 calculates a forced alignment probability P(O|p), which is the probability value of the forced alignment result R1, and a phoneme recognition probability P(O|q), which is the probability value of the free alignment result R2.

The forced alignment probability P(O|p) is the score obtained when the input speech O is aligned with the phoneme p by forced alignment (i.e., the probability that the input speech O is generated when the phoneme p is given).

The phoneme recognition probability P(O|q) is the score obtained when the input speech O is recognized as the phoneme q from among the multiple pronunciations (i.e., the probability that the input speech O is generated when the phoneme q is given).

The pronunciation score calculation unit 60 calculates the pronunciation score PC using the following expression (1).

[Equation 1]

$$PC = \frac{1}{N} \left| \log \frac{P(O \mid p)}{\max_{q \in Q} P(O \mid q)} \right|$$

At this time, N in Equation (1) is the pronunciation length (e.g., frame length (ms)) of the input speech O and is equal to the total number of observation frames. N plays a role of normalizing the log expression of [Equation 1] such that it is not affected by the length. Also, Q is a set of predicted pronunciation errors (PR).

The fraction inside the logarithm of Equation (1) takes its maximum value of 1 when the phoneme recognition probability P(O|q) is equal to the forced alignment probability P(O|p). That is, the maximum value of 1 is obtained when the forcibly aligned phoneme p and the freely recognized phoneme q are the same phoneme. If the phoneme recognition probability P(O|q) differs from the forced alignment probability P(O|p), the fraction is smaller than 1. In other words, when the forcibly aligned phoneme p and the freely recognized phoneme q are different phonemes, the value is smaller than 1. The smaller the forced alignment probability P(O|p) and the larger the phoneme recognition probability P(O|q), the smaller the fraction becomes.
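As a rough numerical illustration of Equation (1), the sketch below computes the score from log-likelihoods rather than raw probabilities. Working in the log domain is an assumption made here to avoid numeric underflow, and the function name and sample values are hypothetical, not taken from the patent.

```python
def pronunciation_score(loglik_forced, loglik_candidates, n_frames):
    """Equation (1) in the log domain.

    loglik_forced     -- log P(O|p) from forced alignment
    loglik_candidates -- log P(O|q) for every q in the predicted error set Q
    n_frames          -- N, the number of observation frames
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    # log(P(O|p) / max_q P(O|q)) == log P(O|p) - max_q log P(O|q)
    best_free = max(loglik_candidates)
    return abs(loglik_forced - best_free) / n_frames


# Forced and free alignment agree: the ratio is 1, so the score is 0.
same = pronunciation_score(-120.0, [-120.0, -150.0], n_frames=30)

# A competing pronunciation fits the audio better: the score grows.
different = pronunciation_score(-150.0, [-150.0, -120.0], n_frames=30)
print(same, different)
```

Under this reading, a score near 0 means the standard pronunciation explains the audio about as well as the best predicted error, and larger scores mean a predicted error pronunciation fits better.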

The error probability calculation unit 70 compares the phoneme recognition information PI with the pronunciation information of the error probability database 80 constructed in advance, and classifies the phoneme recognition information PI into phonemes belonging to the standard pronunciation and phonemes belonging to the multiple pronunciations. The error probability database 80 contains information on the standard pronunciation and the multiple pronunciations of phonemes.

The error probability calculation unit 70 calculates the pronunciation error probability EP by substituting the number of phonemes belonging to standard pronunciation and the number of phonemes belonging to multiple pronunciation into the following equation (2).

[Equation 2]

$$EP = P(e \mid c, v) = \frac{\mathrm{cnt}(c, v, e)}{\mathrm{cnt}(c, v)}$$

In Equation (2), e is a pronunciation error, c is a phoneme belonging to the standard pronunciation, v is a phoneme belonging to multiple pronunciations, and cnt () means the number of occurrences.
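A minimal sketch of this count-based estimate follows, assuming the denominator counts every occurrence of the pair (c, v) so that the result is an ordinary relative frequency. The record layout, the sample data, and the function name are hypothetical and are not contents of the error probability database 80.

```python
from collections import Counter

# Hypothetical annotated observations: (standard phoneme c, recognized
# variant v, whether an annotator judged it to be a pronunciation error).
RECORDS = [
    ("ow", "ao", True),
    ("ow", "ao", True),
    ("ow", "ao", False),
    ("n", "-", True),
]


def error_probability(records, c, v):
    """Estimate P(e | c, v) as cnt(c, v, e) / cnt(c, v)."""
    pair_counts = Counter((rc, rv) for rc, rv, _ in records)
    error_counts = Counter((rc, rv) for rc, rv, is_error in records if is_error)
    total = pair_counts[(c, v)]
    # An unseen pair has no evidence; returning 0.0 is a choice of this sketch.
    return error_counts[(c, v)] / total if total else 0.0


ep = error_probability(RECORDS, "ow", "ao")
print(round(ep, 3))
```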

The error determination unit 90 calculates the feedback index FI by substituting the word recognition reliability RC, the word recognition result w, the pronunciation score PC, and the pronunciation error probability EP into the following Equation (3).

[Equation 3]

$$FI = \frac{1}{1 + e^{-(a_1 x_1 + a_2 x_2 + a_0)}} \, P(w \mid O)$$

In Equation (3), x₁ and x₂ are the pronunciation score PC and the pronunciation error probability EP, respectively, and a₁, a₂, and a₀ are constants optimized on system development data by a machine learning algorithm. P(w|O) represents the probability of the word recognition result w given the input speech O.

For phonemes whose feedback index FI exceeds a predetermined threshold value (for example, 0.5), the error determination unit 90 may determine that the pronunciation of the input speech O is an error and generate error feedback. For phonemes whose feedback index FI does not exceed the predetermined threshold value, the error determination unit 90 determines that the pronunciation of the input speech O is correct and either generates normal feedback or generates no feedback.

The threshold value can be experimentally determined in accordance with the purpose and object of the pronunciation error detecting apparatus.
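The feedback index and the threshold comparison can be sketched as below. The coefficient values used in the example call are made up for illustration; per the description they would be fitted on development data, and the function names are hypothetical.

```python
import math


def feedback_index(pc, ep, p_word, a1, a2, a0):
    """Equation (3): a logistic function of PC and EP, scaled by P(w|O)."""
    z = a1 * pc + a2 * ep + a0
    return p_word / (1.0 + math.exp(-z))


def has_pronunciation_error(fi, threshold=0.5):
    """An error is flagged only when FI strictly exceeds the threshold."""
    return fi > threshold


# Illustrative inputs; the coefficients a1, a2, a0 are placeholders.
fi = feedback_index(pc=2.0, ep=0.8, p_word=0.9, a1=1.0, a2=1.0, a0=0.0)
print(round(fi, 3), has_pronunciation_error(fi))
```

Because the logistic term is multiplied by P(w|O), a low-confidence word recognition result pulls the index down, which makes the detector less likely to report an error on speech it may have misrecognized.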

Hereinafter, a pronunciation error detection method according to an embodiment will be described with reference to FIG.

Referring to FIG. 3, in step S10, the word recognition unit 10 generates a word recognition result w and a word recognition reliability RC using the input speech O.

In step S20, the phoneme recognition unit 20 refers to the standard pronunciation dictionary 40 and the multi-pronunciation dictionary 50 and, based on the input speech O, generates the phoneme recognition information PI corresponding to the word recognition result w.

In step S30, the pronunciation score calculation unit 60 calculates a pronunciation score PC.

In step S40, the error probability calculation unit 70 calculates the pronunciation error probability EP.

In step S50, the error determination unit 90 calculates the feedback index FI and determines an error with respect to the pronunciation of the input speech O.
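The data flow of steps S10 to S50 can be summarized in a short skeleton. Every stage below is a caller-supplied placeholder: the class name, the stage signatures, the stub return values, and the stub decision rule are assumptions made purely to show the order of operations, not an implementation of the units themselves.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class PronunciationErrorDetector:
    """Wires the five steps together; each stage is supplied by the caller."""
    recognize_word: Callable[[Any], Tuple[str, float]]   # S10: O -> (w, RC)
    recognize_phonemes: Callable[[Any, str], Any]        # S20: (O, w) -> PI
    pronunciation_score: Callable[[Any], float]          # S30: PI -> PC
    error_probability: Callable[[Any], float]            # S40: PI -> EP
    decide: Callable[[str, float, float, float], bool]   # S50: -> error?

    def detect(self, speech: Any) -> bool:
        w, rc = self.recognize_word(speech)
        pi = self.recognize_phonemes(speech, w)
        pc = self.pronunciation_score(pi)
        ep = self.error_probability(pi)
        return self.decide(w, rc, pc, ep)


# Placeholder stages with made-up return values, only to show the data flow.
demo = PronunciationErrorDetector(
    recognize_word=lambda speech: ("only", 0.9),
    recognize_phonemes=lambda speech, w: {"R1": ["ow", "n", "l", "iy"]},
    pronunciation_score=lambda pi: 1.2,
    error_probability=lambda pi: 0.7,
    decide=lambda w, rc, pc, ep: rc * (pc + ep) > 0.5,  # stub rule, not Eq. (3)
)
result = demo.detect(speech=b"")
print(result)
```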

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. Other embodiments may be easily suggested through changes, deletions, additions, and so forth, and these also fall within the scope of the present invention.

Claims

1. A pronunciation error detection apparatus comprising:
a word recognition unit for recognizing an input speech and generating a word recognition result;
a phoneme recognition unit for generating phoneme recognition information corresponding to the word recognition result;
a pronunciation score calculation unit for calculating a pronunciation score of the input speech based on the phoneme recognition information;
an error probability calculation unit for calculating a pronunciation error probability of the input speech using the phoneme recognition information; and
an error determination unit for determining whether there is a pronunciation error in the input speech based on the word recognition result, the pronunciation score, and the pronunciation error probability,
wherein the phoneme recognition unit generates the phoneme recognition information by referring to a standard pronunciation dictionary and a multi-pronunciation dictionary, and
wherein the error determination unit calculates a feedback index based on the word recognition result, the pronunciation score, and the pronunciation error probability, and compares a predetermined threshold value with the feedback index to determine whether the input speech has a pronunciation error.

2. (Deleted)

3. The pronunciation error detection apparatus of claim 1,
wherein the multi-pronunciation dictionary includes a set of predicted pronunciation errors.

4. The pronunciation error detection apparatus of claim 3,
wherein the word recognition result includes a forced alignment result in which the input speech is forcibly aligned, and
wherein the phoneme recognition information includes:
a first phoneme corresponding to the forced alignment result; and
a second phoneme corresponding to a free alignment result in which the input speech is freely aligned.

5. The pronunciation error detection apparatus of claim 4,
wherein the forced alignment result is obtained by aligning the input speech with reference to the standard pronunciation dictionary, and
wherein the phoneme recognition unit generates the free alignment result within the set of predicted pronunciation errors.

6. The pronunciation error detection apparatus of claim 5,
wherein the phoneme recognition unit calculates a first probability of the forced alignment result and a second probability of the free alignment result.

7. The pronunciation error detection apparatus of claim 6,
wherein the pronunciation score calculation unit calculates the pronunciation score using the first probability and the second probability.

8. The pronunciation error detection apparatus of claim 7,
further comprising an error probability database including standard pronunciation information and multiple pronunciation information of phonemes,
wherein the error probability calculation unit compares the phoneme recognition information with the error probability database to classify the phoneme recognition information into phonemes belonging to standard pronunciation and phonemes belonging to multiple pronunciation.

9. The pronunciation error detection apparatus of claim 8,
wherein the error probability calculation unit calculates the pronunciation error probability using the number of phonemes belonging to the standard pronunciation and the number of phonemes belonging to the multiple pronunciation.

10. The pronunciation error detection apparatus of claim 1,
wherein the word recognition result is in word units and the phoneme recognition information is in phoneme units.