JPH0515280B2

Info

Publication number: JPH0515280B2
Application number: JP59274771A
Authority: JP
Inventors: Tooru Fujita; Wasuke Yashiro; Eiki Kojima
Original assignee: HAKODATE KOGYO KOTO SENMON GATSUKOCHO
Current assignee: HAKODATE KOGYO KOTO SENMON GATSUKOCHO
Priority date: 1984-12-28
Filing date: 1984-12-28
Publication date: 1993-03-01
Also published as: JPS61156182A

Description

[Detailed description of the invention]

(Industrial Field of Application) The present invention relates to an intonation display device that displays the pitch curve of speech (hereinafter referred to as intonation).

(Prior Art) Foreign language education in Japan, and conversation teaching in particular, is said to be inefficient compared with that in other developed countries. In a large city a learner has opportunities to meet foreigners, to hear the language spoken live, and to have a foreigner listen to and correct his or her pronunciation. Outside the large cities, however, such opportunities are rare, and even though a learner can practice with records or tapes, it is difficult to correct one's own pronunciation. With the conventional method of listening and imitating, the judgment of whether a pronunciation was correct depends solely on the learner's own ear, and the learner believes that he or she has heard and pronounced correctly. Unless the learner's ear is trained, an incorrect pronunciation therefore ends up being mistaken for a correct one. For effective language practice it is more reasonable to judge pronunciation not only by the student's subjective ear but also by an objective physical quantity (specifically, the waveform of the speech). If the teacher's pronunciation is displayed as a waveform on a CRT and left on the screen, while the waveform of the student's pronunciation is redrawn each time the student speaks, the student can judge objectively from the waveforms whether the pronunciation is correct, which is extremely effective for language practice.

A conventional device for displaying the intonation of speech in this way is the Pitch extractor SE-01 (trade name, manufactured by Rion Co., Ltd.). This intonation display device analyzes the frequency of speech from a microphone input or line input with a multi-channel bandpass filter and displays the intonation on a CRT in real time. The frequency analysis range can be switched between approximately 50 Hz to 300 Hz for male voices and approximately 50 Hz to 600 Hz for female voices. The intonations of two speakers uttered at different times can be displayed one above the other, or superimposed, on the same CRT, and the CRT screen can be copied with a printer when required.

(Problems to Be Solved by the Invention) In the conventional intonation display device described above, however, the center frequencies of the bandpass filters are widely spaced. As a result, when vowels with even a slight inflection are uttered in succession, the intonation displayed on the CRT becomes a discontinuous curve rather than a continuous one, and the intonation is difficult to read. In addition, when a consonant accompanied by a plosive or fricative is uttered, pitches are displayed as if jumping around the intonation curve, which again makes the intonation difficult to read. The cause is thought to be that the pitch extracted from the speech is displayed on the CRT as is.

FIG. 21 shows an example of intonation displayed by the conventional device for the utterance "If you'll be kind enough to take it in, I'm sure he'll see me." In FIG. 21, to make the pitch curve itself clearer, unclear portions were checked against a sound spectrograph and the pitch curve was retouched and drawn with a thick line. In actual use the display appears without such retouching, so it is extremely difficult for an untrained student to find the pitch curve hidden in the screen. Consequently the users are mostly specialists, and because the device is also extremely expensive, it has not become widespread.

An object of the present invention is to solve the problems described above and to provide an intonation display device suitably configured so that intonation can be displayed clearly and the device can be made inexpensively.

(Means for Solving the Problems) The intonation display device of the present invention comprises: a multi-channel bandpass filter having a plurality of filters with mutually different center frequencies, which divides the frequency components of a speech input signal among the filters and outputs them; an A/D converter that converts the analog output of the bandpass filter into digital signals; a central processing unit that sequentially samples the output of the A/D converter, detects from the sampled data the maximum value and the channel having that maximum value, performs a prescribed calculation based on the maximum value, its channel, and the data of the neighboring channels so as to detect the fundamental frequency (the pitch) of the sampled data with a resolution higher than that of the bandpass filter, and further performs a prescribed calculation based on the detected pitch and the pitches detected in preceding samplings to determine whether the pitch detected in the current sampling is irregular, thereby extracting regular pitches; and a display device that displays in time series the pitch sequence selectively extracted by the central processing unit over successive samplings, and that displays a plurality of different pitch sequences simultaneously in different colors.

(Operation) In the above configuration, the central processing unit performs, in each successive sampling, a prescribed calculation based on the maximum value, the channel having the maximum value, and the data of the neighboring channels. The fundamental frequency, i.e. the pitch, can thus be obtained with high resolution even though the hardware resolution of the bandpass filter is low, so the intonation of continuously uttered inflected vowels can be displayed as a continuous curve. Furthermore, by extracting regular pitches based on the pitch obtained in the current sampling and the pitches obtained in preceding samplings, the pitches that would otherwise be displayed as jumping around the intonation curve when consonants accompanied by plosives or fricatives are uttered can be removed. The intonation as a whole can therefore be displayed accurately and clearly, and a plurality of different pitch sequences can be displayed simultaneously on the same display device in different colors. Moreover, a commercially available microcomputer can be used as the central processing unit, so an intonation display device can be constructed simply by connecting a bandpass filter and an A/D converter to it, which keeps the cost low.

(Embodiment) FIG. 1 is a block diagram showing the configuration of an example of the intonation display device of the present invention. In this example, a line input terminal 1 and two microphone input terminals 2a and 2b are selected by a changeover switch 3, and the speech signal from the selected input terminal is supplied to a bandpass filter (BPF) 4. In this example the BPF 4 is a 12-channel unit comprising 12 filters that have the center frequencies shown in Table 1, each with a passband of 1/3 octave. It divides the speech signal into 12 channels and supplies them to an A/D converter 5. FIG. 2 shows the measured characteristic curves of the BPF 4.
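
Table 1 is not reproduced in this text, so the actual center frequencies of BPF 4 are not known here. Purely as an illustration, the following Python sketch shows how 12 center frequencies spaced 1/3 octave apart could be laid out. The lowest center frequency of 63 Hz and the names `third_octave_centers` and `band_edges` are assumptions made for this example, not values taken from the patent.

```python
def third_octave_centers(f_first, n_channels=12):
    """Center frequencies spaced 1/3 octave apart, starting at f_first (Hz)."""
    return [f_first * 2 ** (k / 3) for k in range(n_channels)]


def band_edges(f_center):
    """Lower and upper edge of a 1/3-octave passband around f_center (Hz)."""
    half_band = 2 ** (1 / 6)
    return f_center / half_band, f_center * half_band


# ASSUMPTION: 63 Hz is a hypothetical starting point, not the Table 1 value.
centers = third_octave_centers(63.0)
for channel, fc in enumerate(centers, start=1):
    low, high = band_edges(fc)
    print(f"channel {channel:2d}: {low:7.1f} Hz to {high:7.1f} Hz (center {fc:7.1f} Hz)")
```

With this spacing each group of three channels spans one octave, so 12 channels cover four octaves, which is why neighboring filters are far apart in frequency and some form of interpolation between channels is needed.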

[Table]

[Processing A]

$$\mathrm{Y1} = 8\left\{\mathrm{MCH} + \frac{D(\mathrm{MCH}+1)}{2\,D(\mathrm{MCH})} + \mathrm{OFFSET}\right\}$$

[Processing B]

$$\mathrm{Y1} = 8\left\{\mathrm{MCH} - \frac{D(\mathrm{MCH}-1)}{2\,D(\mathrm{MCH})} + \mathrm{OFFSET}\right\}$$

In Processing A and Processing B above, MCH is the number of the channel having the maximum value, and D(MCH), written MCHD in the original, is the value of that channel. D(MCH+1) and D(MCH−1) are the values of the adjacent channels. OFFSET indicates the position at which the intonation is displayed on CRT 9.

By selectively performing Processing A or Processing B according to the D values of the channels at a given instant, a high resolution is obtained even though the hardware resolution is low. For example, when the data of channels 1 to 12 (each channel 1/3 octave wide) are used, a resolution of 100 channels (a resolution of 7 Hz) can be obtained. FIG. 6 shows the relationship between frequencies of 50 to 1000 Hz picked up by the microphone and the channel values obtained by applying Processing A and B to the data of channels 1 to 12.

Processing of irregular pitches

When consonants accompanied by plosives, fricatives and the like are uttered, irregular pitches occur and scatter around the intonation curve, making the intonation difficult to read. In this example the following processing is therefore applied to the pitches obtained by Processing A and B so that irregular pitches are not displayed. Let Y1 be the pitch obtained by Processing A or Processing B at a given instant, Y2 to Y7 the six consecutive pitches obtained in the same way immediately before Y1, and Y0 the average of these seven values. Only a Y1 satisfying |Y1 − Y0| < L is displayed; a Y1 with any other value is not displayed. In this example L is set to 6. Removing irregular pitches from the display in this way makes the intonation curve easy to read.

The operation of this embodiment is described below. FIG. 7 is a flowchart showing the overall operation of this embodiment. First, the program stored on FD 8 is loaded into CPU 6, and the program is started by pressing the "RUN" key on keyboard 7.
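
Processing A, Processing B and the irregular-pitch rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program of the embodiment: the names `process_a`, `process_b` and `IrregularPitchFilter` are hypothetical, channel values are held in a list indexed by channel number (index 0 unused), and Y0 is taken as the mean of the seven most recent pitches including the current one, which is one reading of the text.

```python
from collections import deque


def process_a(mch, d, offset=0.0):
    """Processing A: shift the peak channel upward using the next channel's value."""
    return 8 * (mch + d[mch + 1] / (2 * d[mch]) + offset)


def process_b(mch, d, offset=0.0):
    """Processing B: shift the peak channel downward using the previous channel's value."""
    return 8 * (mch - d[mch - 1] / (2 * d[mch]) + offset)


class IrregularPitchFilter:
    """Display a pitch only if it lies within `limit` of the 7-sample average."""

    def __init__(self, limit=6.0, window=7):
        self.limit = limit
        self.history = deque(maxlen=window)  # holds Y1..Y7, newest last

    def accept(self, y1):
        self.history.append(y1)
        if len(self.history) < self.history.maxlen:
            # Fewer than seven samplings so far: every pitch is displayed.
            return True
        y0 = sum(self.history) / len(self.history)
        return abs(y1 - y0) < self.limit


# Channel values D(1)..D(5); index 0 is unused so indices match channel numbers.
d = [0, 0, 10, 20, 10, 0]
print(process_a(3, d), process_b(3, d))

pitch_filter = IrregularPitchFilter()
shown = [pitch_filter.accept(y) for y in [40, 40, 40, 40, 40, 40, 41, 80]]
print(shown)
```

Because the neighboring value never exceeds the peak value, the correction term is at most half a channel, and the factor 8 turns 12 hardware channels into roughly 100 display steps.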
When the program starts, CRT 9 first displays "Is the teacher male or female?". If the teacher is male the "1" key is pressed, and if female any key other than "1" is pressed, to specify the sex. Next, the time intervals over which intonation can be displayed are shown on CRT 9. In this example the time interval is 2.5 seconds, 4.5 seconds or 10 seconds, selected with the "1", "2" and "3" keys respectively, and in each case the speech signal is sampled 500 times over the interval to display the intonation.

After the above steps, speech from the microphone or line is captured. The capture follows the flowchart shown in FIG. 8: the 12-bit data of each channel, divided by BPF 4 and converted to digital signals by A/D converter 5, is sampled, its least significant bit is discarded, and it is logarithmically compressed to 8-bit data and stored in the memory of CPU 6. The conversion rate of A/D converter 5 is set according to the specified display time interval, for example to 1 kHz when the interval is 2.5 seconds.

After the above processing has been completed for the sampled data of the 12 channels, processing for increasing the resolution is performed on the logarithmically compressed 8-bit data of the 12 channels according to the flowchart shown in FIG. 9. In this processing, the logarithmically compressed data of each channel is first read in and low-frequency emphasis is applied by computing D = 2 × (logarithmically compressed data of the channel) / (channel number). It is then determined whether the result satisfies D < 8, and data with D < 8 is removed as noise. The low-frequency emphasis and noise removal are performed over the channels corresponding to the specified sex, in this example channels 1 to 9 for a male voice and channels 1 to 12 for a female voice, and the number of the channel having the maximum value and its value D are saved. For a male voice, however, the maximum is searched for only among channels 1 to 8.

When this processing has been completed for the set channels, it is determined whether all the values D(N) of channels 1 to 8 (male) or channels 1 to 12 (female) are 0. If they are not, the following processing for increasing the resolution of BPF 4 is performed, based on the saved channel number and value of the maximum and the numbers and values of the channels on either side of it. For a male voice, Processing A is performed when channel 1 has the maximum value; when channel N among channels 2 to 8 has the maximum value, Processing B is performed if the values of the adjacent channels satisfy D(N−1) ≥ D(N+1), and Processing A if D(N−1) < D(N+1). For a female voice, Processing A is performed when channel 1 has the maximum value and Processing B when channel 12 has the maximum value; when channel N among channels 2 to 11 has the maximum value, Processing B is performed if D(N−1) ≥ D(N+1), and Processing A if D(N−1) < D(N+1).

When the resolution-increasing processing is complete, irregular pitches are handled according to the flowchart shown in FIG. 10, based on the pitch Y1 obtained in the current sampling and the average Y0 of the seven most recent pitches including Y1. Only a Y1 satisfying |Y1 − Y0| < 6 is displayed on CRT 9; Y0 is computed as the average of the seven pitches consisting of Y1 of the current sampling and Y2 to Y7 of the preceding samplings.

If, on the other hand, all the values D(N) of the channels in the sex-dependent range are 0 in FIG. 9, the resolution-increasing processing and the display are skipped, and that value is taken as Y1 in the averaging of FIG. 10 that yields Y0.

The above processing is repeated 500 times within the specified display time interval, so that the teacher's speech from the microphone or line is captured and its intonation is displayed on CRT 9.
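
The per-sampling steps of FIG. 9 described above (low-frequency emphasis, noise removal, peak search, and the choice between Processing A and Processing B) can be sketched in Python as follows. This is a hedged illustration, not the program of the embodiment: the function names are hypothetical, `raw` is assumed to be a list of 12 logarithmically compressed channel values with channel 1 first, "removing" data with D < 8 is interpreted as setting it to 0, and `None` is returned when every channel in the search range is 0.

```python
def low_frequency_emphasis(raw):
    """D = 2 * data / channel number, so lower channels are weighted more."""
    return [2 * value / channel for channel, value in enumerate(raw, start=1)]


def remove_noise(d, threshold=8):
    """Treat values below the threshold as noise (ASSUMPTION: set them to 0)."""
    return [value if value >= threshold else 0 for value in d]


def estimate_pitch(raw, male, offset=0.0):
    """Return the interpolated pitch Y1, or None if all searched channels are 0."""
    n_used = 9 if male else 12      # channels whose data are used
    n_search = 8 if male else 12    # channels searched for the maximum
    # Index 0 is a dummy so that d[n] is the value of channel n.
    d = [0] + remove_noise(low_frequency_emphasis(raw[:n_used]))
    if not any(d[1:n_search + 1]):
        return None                 # nothing to display for this sampling
    mch = max(range(1, n_search + 1), key=lambda n: d[n])
    if mch == 1:
        use_a = True                # lowest channel: Processing A
    elif mch == n_used:
        use_a = False               # last channel (female, channel 12): Processing B
    else:
        use_a = d[mch - 1] < d[mch + 1]
    if use_a:
        return 8 * (mch + d[mch + 1] / (2 * d[mch]) + offset)
    return 8 * (mch - d[mch - 1] / (2 * d[mch]) + offset)


male_sample = [0, 0, 0, 80, 200, 180, 0, 0, 0, 0, 0, 0]
female_sample = [0] * 10 + [110, 240]
print(estimate_pitch(male_sample, male=True))
print(estimate_pitch(female_sample, male=False))
print(estimate_pitch([3] * 12, male=True))
```

In the male example the peak is channel 5 and channel 6 is stronger than channel 4, so Processing A moves the estimate upward; in the female example the peak is the last channel, so Processing B is applied directly.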
The teacher's intonation is displayed in the upper part of the screen, for example in red. In this example, the processing of irregular pitches starts from the seventh sampling, when the data Y1 to Y7 are all available; in the samplings before that, every pitch obtained by Processing A or Processing B is displayed.

Next, for the student who is to speak after the teacher, CRT 9 displays "Are you male or female?". If the student is male the "1" key is pressed, and if female any key other than "1" is pressed, to specify the sex. The student then listens to the teacher's pronunciation, watches the teacher's intonation on CRT 9, and imitates it. The same processing as for the teacher is carried out: the student's speech from the microphone is captured and its intonation is displayed in the lower part of CRT 9, for example in green. The display time interval is the same as for the teacher.

After the student's intonation has been displayed, CRT 9 displays "Repeat?". To practice the same utterance again, the "Y" key is pressed; only the student's green intonation is erased so that the same utterance can be practiced again. To practice a different utterance, the "N" key is pressed.
Pressing that key erases both the teacher's and the student's intonation and returns the device to its initial state.

An example of a program that carries out the above operations is shown below. In the following program, the part written in BASIC calls, as subroutines, programs written in assembler and running as machine code.

Display examples of intonation according to this embodiment are shown in FIGS. 11 to 20. In FIGS. 11 to 20 the horizontal axis is time (seconds), and the time from the left edge to the right edge is 2.5 seconds throughout. The vertical axis is the frequency of the speech; the upper part is the teacher's channel and the lower part the student's channel. On CRT 9 the teacher's intonation is displayed in red, the student's in green, and the frame in yellow.

FIG. 11 shows the intonation of a male speaker's English utterance "1st day how to explain places and location", recorded on cassette tape; the tape was played twice and the intonation displayed on the teacher's and the student's channels. As FIG. 11 shows, even for the same utterance from the cassette tape, the intonation displayed on CRT 9 differs slightly because of differences in the timing at which the speech is sampled, but it is displayed far more clearly than with the conventional device shown in FIG. 21.

FIG. 12 shows the intonation of a male speaker's English utterance "Simple greeting", recorded on cassette tape; the tape was again played twice and the intonation displayed on the teacher's and the student's channels. In this example the plosive portion ("ting") of "greeting" is slightly disturbed, but the intonation can be recognized relatively clearly.

FIG. 13 shows the intonation of a female speaker's English utterance "There's hardly a cloud in the sky", recorded on cassette tape and likewise played twice and displayed on the teacher's and the student's channels. The intonation is clearly displayed.

FIG. 14 shows the intonation of a female speaker's English utterance "His business is good this year", also recorded on cassette tape and played twice. This utterance contains the plosive portion ("bus") of "business" and the fricative of "this", but they are displayed clearly without scattering.

FIG. 15 shows the intonation of a male speaker's English utterance "Did you pay very much?", also recorded on cassette tape and played twice. The plosive of "pay" does not scatter, and the intonation is clearly displayed.

FIG. 16 shows the intonation of a male speaker's Japanese utterance "Kyo wa yoi otenki desu" ("It is fine weather today"), spoken into the microphone once at normal speed and once rapidly, with the two utterances displayed on the teacher's and the student's channels. Male intonation is generally considered difficult to display, but in this embodiment only the "tenki desu" portion of the utterance spoken at normal speed is scattered, and in both cases the intonation can be identified relatively clearly.

FIG. 17 shows the intonation of a female speaker's Japanese utterance "Kyo wa yoi otenki desu", spoken into the microphone twice at normal speed and displayed on the teacher's and the student's channels. This speaker's pronunciation lacks inflection, but the intonation is clearly displayed.

FIG. 18 shows the intonation of a male speaker's Japanese utterance "Honjitsu wa seiten nari" ("It is fine weather today"), spoken into the microphone once at normal speed and once rapidly, with the two utterances displayed on the teacher's and the student's channels. Perhaps because this speaker's voice is somewhat high-pitched, the intonation is clearly displayed at both speeds.

FIG. 19 shows the intonation of a female speaker's Japanese utterance "Honjitsu wa seiten nari", spoken into the microphone twice at normal speed and displayed on the teacher's and the student's channels. In this case the "seite" portion of "seiten", where a fricative is followed by a plosive, is unclear, but the intonation can be identified relatively clearly.

FIG. 20 shows a display example in which the male speaker of FIG. 16 uttered "a" into the microphone while varying the pitch; the change in pitch is clearly displayed.

As these display examples show, this embodiment can always display intonation clearly enough to be identified, even when plosives or fricatives are present.

The present invention is not limited to the embodiment described above, and many modifications and changes are possible. For example, the number of channels of BPF 4 is not limited to 12 and may be larger or smaller. The channel data used need not be fixed by sex and may be selected arbitrarily. The A/D converter is not limited to 12 bits; an 8-bit, 16-bit or other converter may be used, and the A/D-converted data may be processed as is without logarithmic compression. Furthermore, although low-frequency emphasis was applied in the embodiment, it can be omitted when the overall characteristic of the speech input stage already emphasizes low frequencies, and when the overall characteristic amplifies low frequencies excessively, high-frequency emphasis may be applied instead. High-frequency emphasis can be performed by, for example, 2 × D / {(M + 1) − N}, where M is the number of the last channel (12 in the embodiment) and N is the channel number of the data D.

In the embodiment, to increase the resolution of BPF 4 for a male voice, the maximum value was detected among channels 1 to 8 using the data of channels 1 to 9, with Processing A performed when channel 1 had the maximum value and Processing A or Processing B chosen according to the relative magnitudes of D(N−1) and D(N+1) when channel N among channels 2 to 8 had the maximum value. Alternatively, when the last of the channels designated for use has the maximum value, Processing B may be performed directly without comparing the adjacent channels. Further, although irregular-pitch processing was performed from the seventh sampling onward in the embodiment, the same processing can also be performed at the second to sixth samplings by using the average of the pitches in the samplings up to that point (at the second sampling, the pitch of the first sampling).

(Effects of the Invention) As described above, according to the present invention intonation can always be displayed clearly, and a plurality of different pitch sequences can be displayed simultaneously on the same display device in different colors. Moreover, the device can be constructed simply and inexpensively merely by connecting a BPF and an A/D converter to a commercially available microcomputer. It can therefore be used easily even by junior high and high school students, and can be widely used for language practice at school and at home.
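
Two of the modifications mentioned in the description, the high-frequency emphasis 2 × D / {(M + 1) − N} and the irregular-pitch check for the second to sixth samplings, can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, the channel values are given as a list with channel 1 first, and the early-sampling reference is taken to be the mean of the pitches obtained before the current one, which is one reading of the text.

```python
def high_frequency_emphasis(d, last_channel=None):
    """Weight channel N by 2 / ((M + 1) - N), so higher channels are boosted."""
    m = len(d) if last_channel is None else last_channel
    return [2 * value / ((m + 1) - n) for n, value in enumerate(d, start=1)]


def accept_early(y1, previous, limit=6.0):
    """Irregular-pitch check for samplings 2 to 6.

    `previous` holds the pitches of the earlier samplings. At the second
    sampling it contains only the first pitch, which is then the reference.
    """
    if not previous:
        return True                       # first sampling: always displayed
    y0 = sum(previous) / len(previous)    # average of the pitches so far
    return abs(y1 - y0) < limit


flat = [12] * 12
weighted = high_frequency_emphasis(flat)
print(weighted[0], weighted[-1])
print(accept_early(45, [40]), accept_early(50, [40, 42]))
```

For a flat input the weight grows from 2/12 at channel 1 to 2 at channel 12, the mirror image of the low-frequency emphasis 2 × data / N used in the embodiment.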

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an example of the intonation display device of the present invention; FIG. 2 shows the characteristics of the BPF shown in FIG. 1; FIG. 3 shows the overall characteristics of the speech input stage for each channel of the BPF; FIG. 4 illustrates the low-frequency emphasis processing; FIG. 5 shows how noise arises; FIG. 6 shows the relationship between frequency and channel obtained by the resolution-increasing processing; FIGS. 7 to 10 are flowcharts illustrating an example of the operation of the intonation display device shown in FIG. 1; FIGS. 11 to 20 show display examples produced by the intonation display device shown in FIG. 1; and FIG. 21 shows a display example produced by a conventional intonation display device.

1: line input terminal; 2a, 2b: microphone input terminals; 3: changeover switch; 4: BPF (bandpass filter); 5: A/D converter; 6: CPU (central processing unit); 7: keyboard; 8: FD (floppy disk); 9: CRT (display device).

Claims

[Claims] 1. An intonation display device characterized by comprising: a multi-channel bandpass filter that has a plurality of filters each having a different center frequency and that divides the frequency components of a speech input signal among the filters and outputs them; an A/D converter that converts the analog output of the bandpass filter into digital signals; a central processing unit that sequentially samples the output of the A/D converter, detects from the sampled data the maximum value and the channel having that maximum value, performs a prescribed calculation based on the maximum value, its channel, and the data of channels in the vicinity of that channel so as to increase the resolution of the bandpass filter and detect the fundamental frequency, that is, the pitch, of the sampled data, and performs a prescribed calculation based on the detected pitch and the pitches detected in samplings preceding the current sampling to determine whether the pitch detected in the current sampling is irregular, thereby extracting regular pitches; and a display device that displays in time series the pitch sequence selectively extracted by the central processing unit in successive samplings, and that simultaneously displays a plurality of different pitch sequences in different colors.