JPS6360484A

JPS6360484A - Enunciation training machine

Info

Publication number: JPS6360484A
Application number: JP20266586A
Authority: JP
Inventors: 島田　和俊
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1986-08-30
Filing date: 1986-08-30
Publication date: 1988-03-16

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The present invention relates to a vocal training machine with which a user listens to audio output, for example a conversation in a foreign language, and practices pronunciation by repeating that conversation.

[Conventional technology]

Various vocal training machines, in particular devices for self-study of foreign-language pronunciation, have already been put into practical use.

The pronunciation practice process used with these devices is so-called LL (language laboratory) learning: model data recorded on magnetic tape or the like is first played back, and the learner listens to the audio and practices by repeating the pronunciation in the same way. This approach has become widespread, and a great deal of software for it is on sale.

[Problem that the invention seeks to solve]

However, such LL learning is a plain learning process in which the learner simply listens to the teacher's pronunciation recorded on tape and repeats it. The learner therefore cannot judge how closely his or her own pronunciation matches the teacher's, so learning results are slow to appear and the learning effect tends to vary from person to person. In addition, although the desire among foreigners to learn Japanese has recently been growing rapidly, there are few instructors who can teach accurate Japanese pronunciation. Dissatisfaction among foreign learners is growing as a result, and there is strong demand for the development of practice equipment of this kind.

The present invention was made to solve the above problems. Its object is to obtain a vocal training machine that analyzes voice data uttered from different voice sources, compares the resulting voice patterns to determine the degree of similarity between the two, and can thereby accurately notify the learner of the learning effect.

[Means for solving problems]

The vocal training machine according to the present invention comprises: voice input means for inputting first voice data serving as a model and second voice data uttered by a practitioner based on the first voice data; voice data storage means for storing the first and second voice data input from the voice input means; pattern extraction means for reading out the first and second voice data stored in the voice data storage means, analyzing the frequency spectrum of each, and extracting a voice feature pattern from each; determination means for comparing the voice feature patterns extracted by the pattern extraction means to determine a degree of similarity; and display means for displaying, in stages, the degree of similarity determined by the determination means.

[Operation]

In the present invention, the pattern extraction means reads out the first and second voice data that were input from the voice input means and stored in the voice data storage means, and analyzes the frequency spectrum of each to extract a voice feature pattern from each. The determination means then compares the extracted voice feature patterns to determine the degree of similarity, and the result is shown on the display means.

[Example]

FIG. 1 is a block diagram illustrating an example of a vocal training machine according to one embodiment of the present invention. Reference numeral 1 denotes a microphone serving as the voice input means. It picks up a learning tape or foreign-language broadcast output from a radio cassette tape recorder described later (the first voice data) and the natural voice uttered by the practitioner (the second voice data). Reference numeral 2 denotes an input switching unit, which switches between the first and second voice data picked up by the microphone 1 by means of a changeover switch 2a. Reference numeral 3 denotes a voice memory made up of a model voice memory 3a, a practice voice memory 3b, and so on. The model voice memory 3a stores the first voice data picked up by the microphone 1, and the practice voice memory 3b stores the second voice data picked up by the microphone 1.

Reference numeral 4 denotes a feature extraction unit made up of a model voice feature extraction unit 4a, a practice voice feature extraction unit 4b, and so on. The model voice feature extraction unit 4a reads out the first voice data stored in the model voice memory 3a, performs fast Fourier transform processing to analyze the frequency spectrum of the first voice data and obtain a frequency power spectrum, and then extracts the cepstrum, which is effective for matching vowel sounds. The practice voice feature extraction unit 4b likewise reads out the second voice data stored in the practice voice memory 3b, performs fast Fourier transform processing to analyze the frequency spectrum of the second voice data and obtain a frequency power spectrum, and then extracts the cepstrum.
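The feature extraction described above (fast Fourier transform, frequency power spectrum, then cepstrum) can be sketched for a single frame of samples as follows. This is only an illustrative sketch under stated assumptions, not the circuit of the embodiment: the function names `fft` and `power_cepstrum`, the Hamming window, the power-of-two frame length, the number of coefficients kept, and the log floor are all assumptions that do not appear in the source.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_cepstrum(frame, n_coeffs=12, floor=1e-12):
    """Cepstrum of one frame: inverse FFT of the log power spectrum.

    n_coeffs and floor are illustrative choices, not values from the source.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame needs at least two samples")
    # Hamming window (an assumption) to reduce spectral leakage.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = fft(windowed)
    # Frequency power spectrum |X(k)|^2, floored so the logarithm is defined.
    log_power = [math.log(max(abs(c) ** 2, floor)) for c in spectrum]
    # For a real sequence y, ifft(y) = conj(fft(y)) / n, so the real part
    # of fft(y) / n gives the inverse transform.
    cepstrum = [c.real / n for c in fft(log_power)]
    return cepstrum[:n_coeffs]


# Usage: one 64-sample frame of a synthetic tone.
tone = [math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]
features = power_cepstrum(tone)
```

Keeping only the low-order coefficients retains the smooth spectral envelope, which is where vowel differences mainly show up; this is presumably why the description calls the cepstrum effective for matching vowel sounds.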

Reference numeral 5 denotes a pattern matching processing unit. It obtains the distance between the cepstrum feature parameters of the first and second voice data extracted by the model voice feature extraction unit 4a and the practice voice feature extraction unit 4b while aligning their time axes, and based on this distance outputs pattern similarity data for the two to a similarity determination unit 6. The similarity determination unit 6 rates the pattern similarity data output from the pattern matching processing unit 5 on three levels, for example "Good pronunciation," "Almost there," and "Keep practicing," and displays the result on indicators 7a to 7c, which are made up of LEDs, for example.
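The source does not name the algorithm used to obtain the distance while aligning the time axes. Dynamic time warping (DTW) is one common technique for this, and the sketch below assumes it. The function names, the Euclidean frame distance, and the normalization by total sequence length are all assumptions made for illustration.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(model, practice):
    """Length-normalized DTW distance between two feature-vector sequences."""
    n, m = len(model), len(practice)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning model[:i] with practice[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(model[i - 1], practice[j - 1])
            # Allow a frame to be repeated in either sequence, or both to advance.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


# Usage: the practice utterance is a slower rendition of the same pattern.
model = [[0.0], [1.0], [2.0]]
practice = [[0.0], [0.0], [1.0], [1.0], [2.0]]
d = dtw_distance(model, practice)
```

Because the warping path may repeat frames, a practitioner who speaks more slowly or quickly than the model is not penalized for tempo alone; only differences in the feature vectors themselves add to the distance.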

FIG. 2 is a schematic diagram showing an example of how pronunciation is practiced with the vocal training machine shown in FIG. 1. Parts identical to those in FIG. 1 are given the same reference numerals.

In this figure, reference numeral 11 denotes a radio cassette tape recorder (RCTR) serving as the source of the first voice data. It outputs from its speaker a commercially available conversation learning tape, a foreign-language learning program, or the like as the first voice data. Reference numeral 12 denotes the practitioner, who is the source of the second voice data. The practitioner listens to the first voice data output from the RCTR 11 and utters, in his or her natural voice, the second voice data.

Next, the pronunciation practice evaluation control operation according to the present invention will be described with reference to the flowchart shown in FIG. 3.

FIG. 3 is a flowchart illustrating the pronunciation practice evaluation control operation according to the present invention. (1) to (9) denote the individual steps.

The first and second voice data picked up by the microphone 1 are routed by operating the changeover switch 2a, so that the first voice data is stored in the model voice memory 3a and the second voice data is stored in the practice voice memory 3b (1). Next, the model voice feature extraction unit 4a reads out the first voice data stored in the model voice memory 3a (2) and performs feature extraction: it applies fast Fourier transform processing to obtain the frequency power spectrum, extracts the cepstrum, which is effective for matching vowel sounds, and outputs the parameters to the pattern matching processing unit 5 (3). Next, the second voice data stored in the practice voice memory 3b is read out (4) and the same feature extraction is performed: fast Fourier transform processing yields the frequency power spectrum, the cepstrum is extracted, and the parameters are output to the pattern matching processing unit 5 (5). The pattern matching processing unit 5 then performs pattern matching, obtaining the distance D between the cepstrum feature parameters of the first and second voice data extracted by the model voice feature extraction unit 4a and the practice voice feature extraction unit 4b while aligning their time axes (6), and outputs the pattern similarity data for the two (the distance D) to the similarity determination unit 6 (7). The similarity determination unit 6 then compares the distance D obtained by the pattern matching processing unit 5 with predetermined distance data N1 and N2 (8) and, based on the difference, lights one of the indicators 7a to 7c (9).
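Steps (8) and (9) amount to a two-threshold comparison. The sketch below is a minimal illustration under assumptions: the source does not state which indicator corresponds to which message, whether the boundaries are inclusive, or what values N1 and N2 take. The mapping of 7a to the best rating, the inclusive boundaries, and the threshold values in the usage line are therefore hypothetical.

```python
# Assumed mapping of indicators to the three example messages.
MESSAGES = {
    "7a": "Good pronunciation",
    "7b": "Almost there",
    "7c": "Keep practicing",
}


def select_indicator(distance, n1, n2):
    """Choose the indicator to light for distance D, assuming n1 < n2.

    A smaller distance means the practice utterance is closer to the model.
    """
    if n1 >= n2:
        raise ValueError("expected n1 < n2")
    if distance <= n1:
        return "7a"
    if distance <= n2:
        return "7b"
    return "7c"


# Usage with hypothetical thresholds N1 = 0.5 and N2 = 1.0.
lit = select_indicator(0.7, 0.5, 1.0)
message = MESSAGES[lit]
```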

In the above embodiment, the similarity of pronunciation is rated on three levels using the indicators 7a to 7c. Increasing the number of rating levels would let the learner gauge his or her progress more precisely.
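Extending the rating to more levels only requires more thresholds. As a hedged sketch, a sorted list of thresholds can be searched with Python's `bisect` module. The function name `grade` and the threshold values are hypothetical, and level 0 is assumed to be the best rating.

```python
from bisect import bisect_left


def grade(distance, thresholds):
    """Return a level from 0 (closest to the model) to len(thresholds).

    thresholds must be sorted in ascending order. A distance equal to a
    threshold falls in the better (lower) level.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted in ascending order")
    return bisect_left(thresholds, distance)


# Usage: four hypothetical thresholds give five rating levels.
level = grade(1.5, [1.0, 2.0, 3.0, 4.0])
```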

[Effect of the invention]

As described above, the present invention provides: voice input means for inputting first voice data serving as a model and second voice data uttered by a practitioner based on the first voice data; voice data storage means for storing the first and second voice data input from the voice input means; pattern extraction means for reading out the first and second voice data stored in the voice data storage means, analyzing the frequency spectrum of each, and extracting a voice feature pattern from each; determination means for comparing the voice feature patterns extracted by the pattern extraction means to determine a degree of similarity; and display means for displaying, in stages, the degree of similarity determined by the determination means. Pronunciation can therefore be practiced using a commercially available learning tape, a broadcast foreign-language course, or the like as the model, and an assessment of how the practitioner's pronunciation compares with the model is displayed accurately. This has the advantages of greatly boosting the practitioner's motivation to learn and yielding an excellent learning effect.

[Brief explanation of the drawing]

FIG. 1 is a block diagram illustrating an example of a vocal training machine according to one embodiment of the present invention; FIG. 2 is a schematic diagram showing an example of how pronunciation is practiced with the vocal training machine shown in FIG. 1; and FIG. 3 is a flowchart illustrating the pronunciation practice evaluation control operation according to the present invention. In the figures, 1 is a microphone, 2 is an input switching unit, 3 is a voice memory, 4 is a feature extraction unit, 5 is a pattern matching unit, 6 is a similarity determination unit, and 7a to 7c are indicators.

Claims

[Claims]

A vocal training machine characterized by comprising: voice input means for inputting first voice data serving as a model and second voice data uttered by a practitioner based on the first voice data; voice data storage means for storing the first and second voice data input from the voice input means; pattern extraction means for reading out the first and second voice data stored in the voice data storage means, analyzing the frequency spectrum of each, and extracting a voice feature pattern from each; determination means for comparing the voice feature patterns extracted by the pattern extraction means to determine a degree of similarity; and display means for displaying, in stages, the degree of similarity determined by the determination means.