JP2002040926A

JP2002040926A - Foreign-language pronunciation learning and oral testing method using an automatic pronunciation comparison method on the Internet

Info

Publication number: JP2002040926A
Application number: JP2000395975A
Authority: JP
Inventors: Juei Ri; 壽永李; Shoei Tei; 韶永鄭
Original assignee: Korea Advanced Institute of Science and Technology KAIST
Current assignee: Korea Advanced Institute of Science and Technology KAIST
Priority date: 2000-07-18
Filing date: 2000-12-26
Publication date: 2002-02-08
Also published as: KR20020007597A; KR100568167B1

Abstract

PROBLEM TO BE SOLVED: To provide a language learning method that compares a native speaker's voice with a learner's voice quickly and accurately, without recognizing the learner's speech, by means of an automatic speech comparison algorithm based on dynamic time warping. SOLUTION: The method lets a learner improve listening comprehension and practice foreign-language conversation on a computer connected to the Internet, and also allows the learner's conversational ability to be tested objectively. It comprises a learner speech signal 10, a native-speaker speech signal 20, an automatic speech comparison network 30, a DTW (dynamic time warping)-based difference comparison model 40, an error-correction neural network 50, an expert evaluation comparison network 60, and an error calculation network 70.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical field of the invention] The present invention relates to a method of providing a learning service for foreign-language pronunciation learning and oral testing using an automatic pronunciation comparison method on the Internet. In particular, it relates to an interactive foreign-language pronunciation learning and oral testing service in which an automatic speech comparison network is trained on data consisting of native-speaker speech, learner speech, and expert evaluations, and the trained network is then used to compare speech uttered by a learner with the native speaker's pronunciation so that the accuracy of the learner's pronunciation can be tested.

【０００２】[0002]

[Prior art] In conventional foreign-language learning, listening and conversation were practiced by repeatedly listening to a native speaker's pronunciation on cassette tape or videotape, imitating it, and repeating the exercise while the learner personally judged how closely the imitation resembled the native speaker's pronunciation.

[0003] Because this approach gives learners no objective evaluation of their own foreign-language pronunciation, efforts have been made to measure the difference between a learner's pronunciation and a native speaker's on an objective scale.

[0004] That is, conventional methods mainly compared the native speaker's and the learner's pronunciation by simply comparing differences in the time domain, for example in the tone of the speech signal and in the total utterance duration.

[0005] More recently, pronunciation comparison methods based on speech signal processing have been developed. Most of these algorithms first recognize the learner's utterance using a hidden Markov model (hereinafter, HMM) and then compare it with the native speaker's speech.

[0006] However, if a recognition error occurs because the learner speaks in a noisy environment or pronounces indistinctly, the computed difference from the native speaker's pronunciation is likely to be meaningless.

[0007] In addition, a learner's listening and speaking ability in a foreign language could be evaluated only through professional assessment tests such as the TSE (Test of Speaking English) or SEPT (Spoken English Proficiency Test), conducted as interviews in which the learner answers questions put directly by a native-speaker language expert at a designated time and place.

【０００８】[0008]

[Problem to be solved by the invention] However, these methods restrict the time and place at which foreign-language ability can be tested, and the expert's evaluation is easily influenced by subjective factors such as fatigue and surrounding circumstances.

[0009] The present invention has been made in view of the above problems. Its object is to provide a foreign-language pronunciation learning and oral testing method using an automatic pronunciation comparison method on the Internet, implementing a language learning method that, through an automatic speech comparison algorithm based on dynamic time warping (hereinafter, DTW), compares the learner's speech with a native speaker's speech quickly and accurately without recognizing the learner's speech.

[0010] Another object of the present invention is to provide a foreign-language pronunciation learning and oral testing method using an automatic pronunciation comparison method on the Internet with which a learner, in a web-based environment, can practice foreign-language pronunciation and take an oral test at any desired time and place.

【００１１】[0011]

[Means for solving the problem] To achieve the above objects, the present invention presents a foreign-language pronunciation learning and oral testing method using an automatic pronunciation comparison method on the Internet, in which a native speaker's speech and a learner's speech are compared automatically to obtain their difference. In this method, the phonemic and prosodic difference values between the learner's speech and the native speaker's speech are calculated by a DTW-based difference comparison network, an error calculation network calculates the difference between that result and the comparison value given by an expert evaluation comparison network, and the DTW-based difference network is trained so that this difference becomes small.

【００１２】[0012]

[Embodiments of the invention] The present invention is described in detail below with reference to the drawings showing its embodiments. FIG. 1 is a block diagram of the automatic speech comparison network algorithm for foreign-language pronunciation learning and oral testing on the Internet according to the present invention.

[0013] First, the data given for implementing the automatic speech comparison network algorithm are pronunciation difference values obtained by experts comparing each word or sentence as uttered by a native speaker and by a learner. The pronunciation difference value usually takes one of five discrete values from 1 to 5. The native speaker's speech is assumed to be transcribed data, that is, data in which phonetic symbols are marked along the time axis.

[0014] The automatic speech comparison network algorithm shown in FIG. 1 consists of a learner speech signal 10, a native-speaker speech signal 20, an automatic speech comparison network 30, an expert evaluation comparison network 60, and an error calculation network 70.

[0015] The phonemic and prosodic difference values between the learner's speech signal 10 and the native speaker's speech signal 20 are calculated by the network 30. The automatic speech comparison network 30 further comprises a DTW-based difference comparison network 40 and an error-correction neural network 50.

[0016] The error calculation network 70 obtains the difference between the value from the expert evaluation comparison network 60 and the comparison value obtained by the automatic speech comparison network 30, and the error-correction neural network 50 is trained so that this difference becomes small.

[0017] FIG. 2 is a partial detailed view of the DTW-based difference comparison model shown in FIG. 1. In the DTW-based difference comparison model (or DTW-based difference comparison model network) shown in FIG. 2, the phonemic difference and the prosodic difference between the learner's speech signal 10 and the native speaker's speech signal 20 are calculated by the DTW-based difference comparison model 40.

[0018] First, the phonemic difference calculation model 42 comprises an intensity difference comparison block 42a, a time difference comparison block 42b, and a frequency difference comparison block 42c. It calculates the accuracy of the learner's pronunciation word by word, syllable by syllable, and phoneme by phoneme against the given native speaker's sentence pronunciation.

[0019] The intensity difference comparison block 42a removes all speaker-dependent characteristics from the speech signals and then obtains the difference between the two signals. In other words, it obtains the difference in the linguistic message alone between the learner's utterance and the native speaker's speech. The time difference comparison block 42b obtains the difference in duration of sentences, words, syllables, phonemes, and so on. The frequency difference comparison block 42c calculates the difference in formant positions between the learner's utterance and the native speaker's speech.

[0020] The prosodic difference calculation model 44 determines whether the learner has pronounced the transitions between phonemes, between syllables, and between words correctly over the whole sentence. It comprises a stress difference comparison block 44a, an intonation difference comparison block 44b, and a rhythm difference comparison block 44c.

[0021] The output of the prosodic difference calculation model 44 is obtained from the difference between the pitch contours of the learner's and the native speaker's speech. The pitch contour over time of each voice is obtained with an existing pitch detection method. The stress difference comparison block 44a obtains the relative difference in position of the maximum peak in the pitch contours. The intonation difference comparison block 44b obtains the difference in slope between the two pitch contours at the end of the learner's and the native speaker's utterances. The rhythm difference comparison block 44c is calculated from the relative positions and magnitudes of the peaks and valleys that appear between adjacent words and syllables.

[0022] The six output values calculated by the DTW-based difference comparison model 40 pass through the error-correction neural network 50 before being compared with the expert evaluation value of the expert evaluation network 60. The error-correction neural network 50 combines the six outputs of the phonemic and prosodic difference calculation network, that is, the DTW-based difference comparison network 40, nonlinearly so that the automatically calculated pronunciation comparison value approaches the expert's evaluation value. A two-layer multi-layer perceptron (MLP) is used as the structure of the error-correction neural network 50.

[0023] The error between the value calculated by the automatic pronunciation comparison network algorithm and the expert evaluation value of the expert evaluation comparison network 60 is calculated for the error-correction neural network 50, and the synaptic weights of the neural network are learned from it. Training uses the existing mean squared error function and the error back-propagation learning algorithm.
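The training described in paragraphs [0022] and [0023] can be sketched roughly as follows. This is only an illustration under stated assumptions: the text specifies a two-layer multi-layer perceptron trained with mean squared error and error back-propagation, but the hidden-layer size, sigmoid activation, linear output unit, learning rate, and the toy training data are all made up here, and the names `TinyMLP`, `train_step`, and `mean_squared_error` are hypothetical.

```python
import math
import random


class TinyMLP:
    """Two-layer perceptron: 6 inputs -> sigmoid hidden units -> 1 linear output."""

    def __init__(self, n_in=6, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = []
        for row, b in zip(self.w1, self.b1):
            z = sum(w * xi for w, xi in zip(row, x)) + b
            hidden.append(1.0 / (1.0 + math.exp(-z)))
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, out

    def train_step(self, x, target, lr=0.05):
        """One back-propagation update on the loss 0.5 * (out - target)^2."""
        hidden, out = self.forward(x)
        err = out - target
        for j, h in enumerate(hidden):
            # Gradient reaching hidden unit j, through the sigmoid derivative.
            grad_h = err * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err
        return err * err


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Hypothetical data: six difference values per utterance and an expert score (1-5).
data = [
    ([0.1, 0.1, 0.1, 0.1, 0.1, 0.1], 5.0),
    ([0.3, 0.2, 0.4, 0.3, 0.2, 0.3], 4.0),
    ([0.5, 0.5, 0.5, 0.5, 0.5, 0.5], 3.0),
    ([0.9, 0.9, 0.9, 0.9, 0.9, 0.9], 1.0),
]
net = TinyMLP()
before = mean_squared_error(net, data)
for _ in range(500):
    for x, t in data:
        net.train_step(x, t)
after = mean_squared_error(net, data)
```

The six inputs stand for the three phonemic and three prosodic outputs of model 40; how those values are scaled before entering the network is not stated in the text.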

[0024] FIG. 3 is a detailed flowchart of the intensity difference comparison in the phonemic difference calculation model shown in FIG. 2. The intensity difference between the native speaker's and the learner's speech is calculated by the following algorithm. First, endpoints are extracted (S102) from the native speaker's speech signal S100 and likewise (S103) from the learner's speech signal S101, and both signals pass through an energy normalization block S104. After the difference in output energy caused by the microphones has thus been removed, the signals are divided into frames (S105, S106) and Fourier transformed (S107, S108). Frame blocking S105, S106 splits the incoming time-series speech signal into segments of several tens of milliseconds and applies a Hamming window or a Hanning window to each.
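As an illustration of the frame blocking step in paragraph [0024], the sketch below splits a signal into overlapping frames and applies a Hamming window. The 25 ms frame length and 10 ms hop are assumed values (the text says only "several tens of milliseconds"), and the function names are hypothetical.

```python
import math


def hamming(n):
    """Hamming window of length n: 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split samples into windowed frames; trailing samples that do not
    fill a whole frame are dropped."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# 0.1 s of a constant signal sampled at 8 kHz.
frames = frame_signal([1.0] * 800, sample_rate=8000)
```

Each windowed frame would then be passed to the Fourier transform of steps S107 and S108.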

[0025] The Fourier transforms S107, S108 convert the two speech signals, frame by frame, from time-domain signals into frequency-domain signals. Next, time-frequency dynamic warping S109 removes speaker-dependent characteristics by means of linear frequency transformation and DTW, after which the signals pass through frequency warping in Bark units (S110, S111) and intensity warping in loudness units (S112, S113).

[0026] Time-frequency dynamic warping S109 is a block for eliminating the effects of speaker differences between the native speaker's and the learner's speech, namely differences in utterance duration and frequency-domain differences caused by differing vocal tract lengths. Duration differences can be removed by DTW, and differences due to vocal tract length can be removed by linear frequency transformation.
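The duration-removal part of paragraph [0026] relies on DTW. A minimal textbook DTW is sketched below, assuming a Euclidean local cost and the basic three-way step pattern (diagonal, vertical, horizontal); the patent does not state which local cost or path constraints it uses, and the linear frequency warping for vocal tract length is not covered here. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(ref, test):
    """Align two sequences of feature vectors.

    Returns (total_cost, path), where path is a list of (ref_index, test_index)
    pairs from the first frames to the last frames.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# The "learner" sequence holds the same values but lingers on the first two.
native = [[0.0], [1.0], [2.0]]
learner = [[0.0], [0.0], [1.0], [1.0], [2.0]]
total, path = dtw(native, learner)
```

The warping path shows which learner frames map to each native-speaker frame, which is the kind of information the time difference comparison block 42b later draws on.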

[0027] Frequency warping in Bark units S110, S111 converts the speech signal from hertz (Hz) into Bark, a psychoacoustic unit of frequency. Intensity warping in loudness units S112, S113 converts the energy of the spectrum produced by the Fourier transforms S107, S108 into loudness, a psychoacoustic unit of intensity.
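The two psychoacoustic conversions in paragraph [0027] might look like the sketch below. The patent does not give formulas; the ones used here are the Bark warping and cube-root intensity-to-loudness compression from PLP analysis, chosen as an assumption because the text later compares the method to PLP. The function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Bark warping as used in PLP analysis: 6 * asinh(f / 600)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def intensity_to_loudness(power):
    """Cube-root compression approximating the intensity-loudness power law."""
    return power ** (1.0 / 3.0)


bark_1k = hz_to_bark(1000.0)
loud_8 = intensity_to_loudness(8.0)
```

The Bark mapping compresses high frequencies relative to low ones, and the cube root compresses large spectral energies, both mirroring how hearing weights frequency and intensity.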

[0028] After the inverse Fourier transforms S114, S115, the cepstrum calculation blocks S116, S117 finally extract cepstrum feature vectors. Because the signal after loudness-unit intensity warping is real-valued and symmetric, the inverse Fourier transforms S114, S115 are computed as cosine transforms.
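The remark in paragraph [0028] that the inverse Fourier transform reduces to a cosine transform for a real, symmetric input can be checked numerically. The sketch below compares a direct inverse DFT with a cosine-only sum on a small symmetric sequence; it illustrates the symmetry argument only and is not the patent's full cepstrum computation. The function names and sample values are hypothetical.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Direct inverse DFT with complex exponentials."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts)
            for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


def cosine_inverse(spectrum):
    """Cosine-only inverse, valid when spectrum[k] == spectrum[N - k] and real."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[k] * math.cos(2.0 * math.pi * k * n / n_pts)
            for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


# Real and symmetric: X[1] == X[5], X[2] == X[4].
symmetric = [4.0, 2.0, 1.0, 0.5, 1.0, 2.0]
full = inverse_dft(symmetric)
cos_only = cosine_inverse(symmetric)
```

For such input the sine terms cancel in pairs, so the imaginary part of the full inverse vanishes and only the cosine sum remains.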

[0029] The cepstrum feature vectors to be compared for the native speaker's speech signal S100 and the learner's speech signal S101 are thus finally extracted by the cepstrum calculation blocks S116, S117. The method of the present invention described above resembles the existing PLP (perceptual linear prediction) feature extraction method, but differs in that a time-frequency dynamic warping block S109 is newly added, which applies time-frequency dynamic warping during feature extraction to remove the speaker characteristics of the speech signal.

[0030] Next, the frame-by-frame distance is calculated (S118) and converted into units of phonemic intensity difference (S119). The frame-by-frame distance calculation block S118 calculates, frame by frame, the distance between the feature vectors of the native speaker's and the learner's speech. The distance is computed as a Euclidean distance, and the distance values over all frames are taken as the pronunciation difference value of the two voices.

[0031] So that it can be compared with the expert evaluation value, the pronunciation difference value obtained in the frame-by-frame distance calculation block S118 is converted by the phonemic intensity difference calculation block S119 to a value in the range 1 to 5 using a linear or logistic transformation.
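Paragraphs [0030] and [0031] can be illustrated with the sketch below: an average Euclidean distance over aligned frames, followed by a logistic mapping onto the 1-to-5 range. Several points are assumptions not fixed by the text: averaging (rather than summing) the frame distances, letting a larger distance give a larger difference value, and the `midpoint` and `slope` parameters. The function names are hypothetical.

```python
import math


def mean_frame_distance(ref_frames, test_frames):
    """Average Euclidean distance over already time-aligned frame pairs."""
    if len(ref_frames) != len(test_frames) or not ref_frames:
        raise ValueError("frames must be non-empty and aligned to the same length")
    total = 0.0
    for a, b in zip(ref_frames, test_frames):
        total += math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return total / len(ref_frames)


def logistic_to_scale(distance, midpoint=1.0, slope=2.0):
    """Map a distance onto the open interval (1, 5) with a logistic curve."""
    return 1.0 + 4.0 / (1.0 + math.exp(-slope * (distance - midpoint)))


d = mean_frame_distance([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 1.0]])
score = logistic_to_scale(d)
```

A linear mapping clipped to the same range would serve equally well as the alternative the text mentions.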

[0032] FIG. 4 is a partial detailed view of the time-frequency dynamic warping stage shown in FIG. 3. The Fourier-transformed native-speaker speech 200 and learner speech 201 are warped in the time and frequency domains so that the frame-by-frame distance between their cepstrum feature vectors is minimized. Specifically, the learner speech 201 passes through a linear frequency warping network 202 to remove differences due to vocal tract length, which is regarded as the main cause of speech signal differences between speakers, and then through nonlinear dynamic time warping 203 to remove the difference in utterance timing from the native-speaker speech 200.

[0033] The signals then enter blocks 204 and 205, which extract the cepstrum feature vectors, and the frame-by-frame distance calculation block 206 calculates the Euclidean distance between the cepstrum vectors. Linear frequency warping 202 and nonlinear dynamic time warping 203 are carried out so that this error is minimized.

[0034] The time difference comparison block 42b of FIG. 2 can be calculated from the degree of warping along the time axis between the learner's and the native speaker's feature vectors obtained by the time-frequency dynamic warping of FIG. 4. That is, the output of the time difference comparison block 42b is the sum of the per-phoneme duration differences between the two time-aligned speech signals, divided by the total number of phonemes.
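The output of block 42b described in paragraph [0034] amounts to a mean per-phoneme duration difference. A minimal sketch follows, assuming the per-phoneme durations (in seconds) have already been read off the time alignment and that absolute differences are used; the function name and sample values are hypothetical.

```python
def mean_duration_difference(native_durations, learner_durations):
    """Sum of per-phoneme duration differences divided by the phoneme count."""
    if len(native_durations) != len(learner_durations) or not native_durations:
        raise ValueError("need one duration per phoneme for both speakers")
    total = sum(abs(n - l) for n, l in zip(native_durations, learner_durations))
    return total / len(native_durations)


# Three phonemes; the learner lengthens the first and third.
time_diff = mean_duration_difference([0.10, 0.20, 0.15], [0.12, 0.20, 0.25])
```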

[0035] Similarly, the frequency difference comparison block 42c of FIG. 2 performs a linear transformation along the frequency axis, in a manner analogous to the time difference comparison block 42b, and then calculates its output from the differences in position of the first formant F1, second formant F2, third formant F3, and so on between the learner's utterance and the native speaker's speech.

[0036] FIGS. 5a to 5c are comparison graphs for the prosodic difference calculation model shown in FIG. 2. As shown in FIG. 5, the prosodic difference between the two speech signals is calculated from the difference between their pitch contours. The pitch of the speech is obtained by an existing method using frequency filtering or the cepstrum, and linear regression is used to make the pitch contour continuous across voiced and unvoiced sounds.

[0037] FIG. 5a is a graph showing the pitch contours over time of a sentence uttered by the native speaker and by the learner. The stress difference comparison block 44a of FIG. 2 compares, as the stress difference between the two voices, how far apart the syllables or words at which the maximum peak of the pitch contour appears are. That is, the time difference between the syllable the learner stresses and the syllable the native speaker stresses is calculated, and the number of syllables within that interval gives the stress difference.
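One way to read the stress comparison of paragraph [0037] is sketched below: locate the syllable containing the pitch maximum for each speaker and count how many syllables apart they are. The representation of syllables as (start, end) frame ranges, the assumption that both utterances contain the same syllable sequence, and all names and values are hypothetical.

```python
def stressed_syllable(pitch, syllable_bounds):
    """Index of the syllable whose frame range contains the pitch maximum.

    syllable_bounds is a list of (start, end) frame indices, end exclusive.
    """
    peak_frame = max(range(len(pitch)), key=lambda i: pitch[i])
    for index, (start, end) in enumerate(syllable_bounds):
        if start <= peak_frame < end:
            return index
    raise ValueError("pitch maximum lies outside every syllable")


def stress_difference(native_pitch, native_bounds, learner_pitch, learner_bounds):
    """Number of syllables between the two speakers' stressed syllables."""
    native_idx = stressed_syllable(native_pitch, native_bounds)
    learner_idx = stressed_syllable(learner_pitch, learner_bounds)
    return abs(native_idx - learner_idx)


bounds = [(0, 2), (2, 4), (4, 6)]
native_pitch = [100.0, 120.0, 180.0, 130.0, 110.0, 105.0]   # peak in syllable 1
learner_pitch = [100.0, 110.0, 115.0, 120.0, 170.0, 140.0]  # peak in syllable 2
stress_diff = stress_difference(native_pitch, bounds, learner_pitch, bounds)
```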

[0038] FIG. 5b illustrates the intonation difference comparison block 44b of FIG. 2. The intonation difference is calculated from the difference in slope of the pitch contours at the end of the sentence utterance. In an ordinary sentence the pitch slope at the end of the sentence is negative, whereas in an interrogative sentence it is generally positive. The intonation difference between the two voices can be obtained from this slope difference.
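The slope comparison of paragraph [0038] can be sketched with a least-squares line fitted to the tail of each pitch contour. The fraction of the contour treated as the "end" (`tail_fraction`) is an assumed parameter, slopes are in pitch units per frame, and the function names and sample contours are hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their frame index."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def final_slope(pitch, tail_fraction=0.25):
    """Slope over the last part of the contour (at least two frames)."""
    tail_len = max(2, int(len(pitch) * tail_fraction))
    return slope(pitch[-tail_len:])


def intonation_difference(native_pitch, learner_pitch, tail_fraction=0.25):
    return abs(final_slope(native_pitch, tail_fraction)
               - final_slope(learner_pitch, tail_fraction))


falling = [200.0, 198.0, 196.0, 194.0, 190.0, 180.0, 170.0, 160.0]  # statement-like
rising = [200.0, 198.0, 196.0, 194.0, 196.0, 200.0, 210.0, 220.0]   # question-like
into_diff = intonation_difference(falling, rising)
```

A learner who ends a statement with rising pitch would thus produce a large difference from a native speaker's falling contour.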

[0039] FIG. 5c is a graph illustrating the rhythm difference comparison block 44c of FIG. 2. In the pitch contours of the two voices, triangles mark peaks and inverted triangles mark valleys. The rhythm difference between the two voices can be obtained from the differences in the number and magnitude of these peaks and valleys.
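A rough sketch of the peak-and-valley comparison in paragraph [0039] follows. It detects strict local extrema in each pitch contour and reports the difference in their counts and, for extrema paired in order of appearance, the difference in their magnitudes. How the patent pairs or weights peaks and valleys is not stated, so this pairing and the function names are assumptions.

```python
def local_extrema(pitch):
    """Return (peaks, valleys) as lists of (frame_index, value)."""
    peaks, valleys = [], []
    for i in range(1, len(pitch) - 1):
        if pitch[i] > pitch[i - 1] and pitch[i] > pitch[i + 1]:
            peaks.append((i, pitch[i]))
        elif pitch[i] < pitch[i - 1] and pitch[i] < pitch[i + 1]:
            valleys.append((i, pitch[i]))
    return peaks, valleys


def rhythm_difference(native_pitch, learner_pitch):
    """Return (count_difference, size_difference) between two contours."""
    n_peaks, n_valleys = local_extrema(native_pitch)
    l_peaks, l_valleys = local_extrema(learner_pitch)
    count_diff = (abs(len(n_peaks) - len(l_peaks))
                  + abs(len(n_valleys) - len(l_valleys)))
    # Extrema are paired in order of appearance; unpaired ones affect only the count.
    size_diff = sum(abs(a[1] - b[1]) for a, b in zip(n_peaks, l_peaks))
    size_diff += sum(abs(a[1] - b[1]) for a, b in zip(n_valleys, l_valleys))
    return count_diff, size_diff


native_contour = [100.0, 150.0, 110.0, 160.0, 120.0]
learner_contour = [100.0, 140.0, 120.0, 125.0, 120.0]
rhythm = rhythm_difference(native_contour, learner_contour)
```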

[0040] FIG. 6 is a schematic block diagram of an Internet system for foreign-language pronunciation learning and oral testing according to the present invention. As shown in FIG. 6, the system consists of a server computer 300 for foreign-language pronunciation learning and oral testing, the Internet 320 connected to the server computer 300, learner computers 340a, 340b, ... that connect via the Internet 320, and learners 360a, 360b, .... Each learner 360a, 360b, ... has a microphone and a speaker so that speech signals can be heard and recorded.

[0041] FIG. 7 shows the sequence of data processing carried out between the learner computer and the server computer. In FIG. 7a most of the algorithm is processed on the learner computer, whereas in FIG. 7b most of it is processed on the server computer. In FIG. 7a, the learner computer 340 takes the learner's speech signal 342, recorded through the microphone, and fetches the native-speaker speech signal 304 to be practiced from the native-speaker speech signal database 302 of the server computer 300. The automatic pronunciation comparison algorithm 344 quantifies the difference between the two voices, and the result is shown on the learner computer's screen through the pronunciation comparison result display block 346.

[0042] The server computer 300 holds the native-speaker speech signal database 302 and, according to the pronunciation practice scenario requested by the learner, sends the native-speaker speech signal 304 to the learner computer 340 via the Internet.

[0043] In FIG. 7b, the learner computer 340 sends the learner's speech signal 342, captured through the microphone as before, to the server computer 300, and the native-speaker speech signal 304 to be practiced is selected from the native-speaker speech signal database 302. The automatic pronunciation comparison algorithm 306 then quantifies the difference between the two voices, and the result is sent to the pronunciation comparison result display 346 of the learner computer and shown on the screen.

[0044] FIG. 8 is a flowchart showing the foreign-language pronunciation learning process according to an embodiment of the present invention. The process comprises: step S400, in which the learner connects from his or her own computer to the server computer via the Internet; step S402, in which the learner enters personal information; step S404, in which the learner selects a desired scenario from the pronunciation practice scenarios provided by the server; step S406, in which the learner listens to the native speaker's pronunciation of a sentence from the selected scenario; step S408, in which the learner records his or her own pronunciation; step S410, in which the difference between the two voices is calculated automatically and the comparison result is displayed on the screen; step S412, in which it is decided whether to continue the sentence pronunciation practice; and step S414, in which, if practice is discontinued at step S412, the learner chooses whether to practice with another scenario.

【００４５】図９は、本発明の実施例に係る外国語発音
口頭テストの過程を示した流れ図であり、学習者がサー
バーコンピュータに接続して自分の外国語発音能力をテ
ストする過程を示した流れ図である。図９を説明する
と、学習者がサーバーコンピュータに接続する段階Ｓ５
００と；学習者が自分の情報を入力する段階Ｓ５０２
と；学習者が発音テストをしようとする単語や文章の難
易度を選択する段階Ｓ５０４と；選択された難易度文章
に対し、ネーティブスピーカーの発音を聴取する段階Ｓ
５０６と；学習者が自分の発音を録音する段階Ｓ５０８
と；テストしようとする文章をすべて発音したかをチェ
ックする段階Ｓ５１０と；及び他の難易度の問題で再び
テストを行うかを選択する段階Ｓ５１２と；及び学習者
が発音した音声に対し、最終的な発音比較結果を画面に
ディスプレーする段階Ｓ５１４とを備える。FIG. 9 is a flowchart showing the process of the foreign language pronunciation oral test according to an embodiment of the present invention, in which a learner connects to the server computer and tests his or her own foreign language pronunciation ability. Referring to FIG. 9, the process comprises: step S500, in which the learner connects to the server computer; step S502, in which the learner inputs his or her information; step S504, in which the learner selects the difficulty level of the words or sentences to be tested; step S506, in which the learner listens to the native speaker's pronunciation of the sentences at the selected difficulty level; step S508, in which the learner records his or her own pronunciation; step S510, in which it is checked whether all the sentences to be tested have been pronounced; step S512, in which the learner selects whether to take the test again with questions of another difficulty level; and step S514, in which the final pronunciation comparison result for the voice pronounced by the learner is displayed on the screen.
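
Unlike the practice flow, the test flow of FIG. 9 records every sentence at the chosen difficulty level before any result is shown. The sketch below illustrates one pass for a single level. It rests on assumptions: the name `oral_test` and its callbacks are hypothetical, and averaging the per-sentence values into the final result of S514 is an illustrative choice, since the text does not say how the final result is formed.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple


def oral_test(
    sentences_by_level: Dict[str, List[str]],
    level: str,
    play_native: Callable[[str], None],
    record_learner: Callable[[str], bytes],
    compare: Callable[[str, bytes], float],
) -> float:
    """One pass of S504 to S510 for one difficulty level, then the S514 result."""
    recordings: List[Tuple[str, bytes]] = []
    for sentence in sentences_by_level[level]:  # S504: level chosen by the caller
        play_native(sentence)                   # S506: listen to native speaker
        recordings.append((sentence, record_learner(sentence)))  # S508: record
    # S510: the loop above ends only after every test sentence has been recorded
    scores = [compare(sentence, audio) for sentence, audio in recordings]
    return mean(scores)  # S514: final result; the mean is an assumption
```

Step S512 (testing again at another difficulty level) would correspond to calling the function again with a different `level`.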

【００４６】[0046]

【発明の効果】以上にて説明したとおり、本発明に係る
インターネット上での自動発音比較方法を用いた外国語
発音学習及び口頭テスト方法によると、次のようなメリ
ットがある。一つ、外国語の発音を学習しようとする学
習者の音声とネーティブスピーカーの音声を、音声信号
処理技術を用いた自動音声比較アルゴリズムによって比
較することにより、学習者が自分の発音能力に対する客
観的な評価数値が判るというメリットがある。As described above, the foreign language pronunciation learning and oral test method using the automatic pronunciation comparison method on the Internet according to the present invention has the following advantages. First, the voice of a learner studying foreign language pronunciation is compared with the voice of a native speaker by an automatic voice comparison algorithm based on voice signal processing technology, so the learner obtains an objective numerical evaluation of his or her own pronunciation ability.

【００４７】二つ、自動音声比較アルゴリズムをウェッ
ブ状態で、すなわち、インターネットに連結されたコン
ピュータで外国語発音学習と口頭テストが行える外国語
学習サービスを提供することにより、語学専門家を側に
置いて学習を行うような、学習者が自分の外国語発音能
力を検証することが出来る。Second, by offering the automatic voice comparison algorithm on the web, that is, as a foreign language learning service in which pronunciation learning and oral tests can be carried out on a computer connected to the Internet, learners can verify their own foreign language pronunciation ability as if they were studying with a language expert at their side.

[Brief description of the drawings]

【図１】本発明に係るインターネット上での外国語発
音学習及び口頭テストのための自動音声比較ネットワー
クアルゴリズムのブロック構成図である。FIG. 1 is a block diagram of an automatic speech comparison network algorithm for learning a foreign language pronunciation and an oral test on the Internet according to the present invention.

【図２】図１に図示された自動音声比較ネットワーク
の部分詳細図である。FIG. 2 is a partial detailed view of the automatic audio comparison network shown in FIG. 1;

【図３】図２に図示された音韻差計算モデルで強さ差
異比較の詳細流れ図である。FIG. 3 is a detailed flowchart of the intensity difference comparison in the phoneme difference calculation model shown in FIG. 2.

【図４】図３に図示された時間−周波数動的ワーピン
グ基盤の部分詳細図である。FIG. 4 is a partial detailed view of the time-frequency dynamic warping base shown in FIG. 3.

【図５】５ａ乃至５ｃは、図２に図示された音韻差計
算モデルの比較を示したグラフである。FIGS. 5a to 5c are graphs showing comparisons by the phoneme difference calculation model shown in FIG. 2.

【図６】本発明に係る外国語発音学習及び口頭テスト
のためのインターネットシステムの概略構成図である。FIG. 6 is a schematic configuration diagram of an Internet system for foreign language pronunciation learning and an oral test according to the present invention.

【図７】本発明に係る学習者コンピュータとサーバー
コンピュータ間のデータ処理を示した流れ図である。FIG. 7 is a flowchart showing data processing between a learner computer and a server computer according to the present invention.

【図８】本発明の実施例に係る外国語発音学習の過程
を示した流れ図である。FIG. 8 is a flowchart illustrating a process of learning a foreign language pronunciation according to an embodiment of the present invention.

【図９】本発明の実施例に係る外国語発音口頭テスト
の過程を示した流れ図である。FIG. 9 is a flowchart showing the process of the foreign language pronunciation oral test according to an embodiment of the present invention.

[Explanation of symbols]

１０学習者音声信号２０ネーティブスピーカー音声信号３０自動音声比較ネットワーク４０ＤＴＷ基盤差比較モデル４２音韻差計算モデル４４韻律差計算モデル５０誤差補正神経回路網６０専門家評価比較ネットワーク７０誤差計算ネットワーク３００サーバーコンピュータインターネット３４０学習者コンピュータ
10 learner speech signal
20 native speaker speech signal
30 automatic speech comparison network
40 DTW-based difference comparison model
42 phoneme difference calculation model
44 prosody difference calculation model
50 error correction neural network
60 expert evaluation comparison network
70 error calculation network
300 server computer
Internet
340 learner computer

フロントページの続き (Continued from the front page)
(51) Int.Cl.7 / Identification symbol / FI / Theme code (reference): G10L 15/00 G10L 3/00 533Z 15/24 539 // G10L 101:027 551E 101:18 571S
F-term (reference): 2C028 AA03 BA03 BB04 BB06 BC02 BD02 CA13 CB02; 5D015 CC03 CC04 CC11 CC12 CC13 CC14 CC15 FF03 HH07 JJ00 KK02 LL13

Claims


1. A foreign language pronunciation learning and oral test method using an automatic pronunciation comparison method on the Internet, in which a native speaker's voice is automatically compared with a learner's voice, the method comprising: a first step in which the difference between the native speaker's voice and the learner's voice is calculated by a DTW-based difference comparison network; a second step in which the difference between the value calculated by the DTW-based difference comparison network and the comparison value evaluated by an expert evaluation comparison network is calculated by an error calculation network; and a third step of training the DTW-based difference comparison network so that the difference value calculated in the first and second steps becomes smaller.

2. The method of claim 1, wherein the DTW-based difference comparison network comprises a phoneme difference calculation model comprising: an intensity difference comparison block for calculating linguistic pronunciation accuracy for each phoneme, syllable, word, and sentence; a time difference comparison block for calculating the difference in pronunciation duration; and a frequency difference comparison block for calculating the difference in formant position.

3. The method of claim 1, wherein the DTW-based difference comparison network comprises a prosody difference calculation model comprising: an intensity difference comparison block for calculating the relative position difference of the maximum peak value on the pitch contour; a comparison of the two voices, the learner's and the native speaker's, at the end portion of the pronunciation; and a rhythm difference comparison block calculated from the relative positions and magnitudes of the peaks and valleys appearing between adjacent words and syllables.

4. The method of claim 1, wherein the automatic pronunciation comparison value is determined by nonlinearly combining the six output values determined by the DTW-based difference comparison network through an error correction neural network, the network being trained so that the automatic pronunciation comparison value approaches the comparison value.

5. The method of claim 1 or 2, wherein the phoneme intensity difference is calculated by: a time-frequency dynamic warping process in which time-frequency domain features are extracted from the audio signals of the native speaker and the learner and the characteristics of the speaker are removed through linear frequency conversion and DTW; and summing the frame-by-frame differences between the feature vectors of the two voices as Euclidean distances to obtain the linguistic intensity difference of the phoneme.

6. The method of claim 1, wherein, in the learner's foreign language pronunciation learning and oral test process, the learner records his or her own voice on the learner's computer through a microphone; the native speaker's voice is retrieved from the native speaker voice signal database of the server computer and listened to by the learner; the difference between the two voices is calculated on the learner's computer by the automatic pronunciation comparison algorithm; and the result is displayed on the screen.

7. The method of claim 6, wherein the learner's foreign language pronunciation learning process comprises the steps of: the learner connecting from his or her own computer to the server computer via the Internet; the learner inputting his or her own information; the learner selecting a desired scenario from among a plurality of pronunciation practice scenarios provided; the learner recording his or her own pronunciation; automatically calculating the difference between the two voices and displaying the comparison result on the screen; determining whether to continue the sentence pronunciation practice; and selecting whether to practice with another scenario.

8. The method of claim 6, wherein the process of the learner's foreign language pronunciation oral test comprises the steps of: the learner connecting to the server computer; the learner inputting his or her own information; the learner selecting the difficulty level of the words or sentences to be tested for pronunciation; the learner listening to the native speaker's pronunciation of the sentences at the selected difficulty level; the learner recording his or her own pronunciation; checking whether all the sentences to be tested have been pronounced; selecting whether to take the test again at another difficulty level; and displaying on the screen the final pronunciation comparison result for the voice pronounced by the learner.
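
As an illustration of the frame-wise Euclidean summation along a DTW alignment recited in claim 5, the following is a minimal sketch and not the patented implementation. It assumes the feature vectors have already been extracted, omits the linear frequency conversion used to remove speaker characteristics, and uses the common three-neighbour DTW step pattern, which the claims do not specify. The function name `dtw_distance` is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(
    native: Sequence[Sequence[float]],
    learner: Sequence[Sequence[float]],
) -> float:
    """Sum of Euclidean frame distances along the cheapest DTW alignment."""
    n, m = len(native), len(learner)
    # cost[i][j] = best accumulated distance aligning the first i native
    # frames with the first j learner frames
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(native[i - 1], learner[j - 1])
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # native frame advances alone
                cost[i][j - 1],      # learner frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

With this formulation, a learner utterance that is only slower or faster than the native one, but otherwise has identical frames, yields a distance of zero, because warping absorbs the timing difference before the frame distances are summed.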