JPS6078471A

JPS6078471A - Enunciation training apparatus

Info

Publication number: JPS6078471A
Application number: JP58185983A
Authority: JP
Inventors: 広沢　和豊
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 1983-10-06
Filing date: 1983-10-06
Publication date: 1985-05-04

Abstract

(57) [Abstract] This publication consists of application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Field of Industrial Application
The present invention relates to a speech training device that extracts feature signals of the voice produced during speech and visualizes those feature signals on a display device for speech training.

Configuration of the Conventional Example and Its Problems
Conventionally, in a speech training device that performs speech training by visualizing feature signals of the voice on a display device, one frame is formed by allocating to the display screen, starting from the display start point, a predetermined time needed to display one spoken word (for example, 5 seconds). The extracted voice feature signal is displayed successively from the display start point to the end point of the frame; once the end point is reached, the display returns to the display start point, and the display repeats cyclically with the predetermined one-frame period. However, because the speakers must time their utterances themselves so that the feature signal appears at a predetermined position between the display start point and the end of the frame, some speakers, for example people with disabilities, may speak too early or too late, so that a single utterance does not fit within the predetermined frame.
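To make the timing problem concrete, the following Python sketch models the conventional free-running cyclic display. It assumes one sample per horizontal display position and a frame of `frame_len` samples; the function names are hypothetical illustrations, not taken from the source. An utterance is split across two frames whenever it starts too late in the current frame.

```python
from typing import List, Tuple


def cyclic_positions(start: int, length: int, frame_len: int) -> List[Tuple[int, int]]:
    """(frame index, horizontal position) of each sample of an utterance that
    begins at absolute sample `start` on a free-running cyclic display."""
    # divmod gives (which frame, position within that frame) for each sample.
    return [divmod(start + i, frame_len) for i in range(length)]


def fits_in_one_frame(start: int, length: int, frame_len: int) -> bool:
    """True if the whole utterance is drawn inside a single frame."""
    return start // frame_len == (start + length - 1) // frame_len
```

For example, with a hypothetical 500-point frame, a 300-sample utterance that begins at sample 100 fits, but the same utterance beginning at sample 350 wraps into the next frame, which is the situation the invention aims to avoid.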

Furthermore, because the display repeats cyclically, speakers become tense when trying to time the utterance, and if they get the timing wrong they must wait until the display start point of the next frame. This places psychological pressure on the speaker.

Object of the Invention
The present invention eliminates the above drawbacks of the conventional art. Its object is to provide a speech training device that includes means for automatically discriminating speech from non-speech on the basis of the feature signals of the utterance, so that one frame of the feature signal is displayed starting from the onset of the speaker's utterance. This makes the speech timing easy to achieve and allows the utterance information to fit easily within one frame.

Structure of the Invention
The present invention is a speech training device comprising: feature extraction means for extracting feature signals of a speaker's voice; storage means for storing the voice feature signals; speech discrimination means for determining the onset of speech from the voice feature signals; and display control means which, while the speech discrimination means outputs a non-speech state, divides the display range of one frame on the time axis and shift-displays the voice feature data within one divided section of the display range starting from the display start point, and which, when the speech discrimination means outputs a speech state, leaves that section and displays one full frame on the time axis. Part of the display area is used to show the feature signal before speech, and when the speaker begins to speak, the feature signal display automatically continues toward the end point. As a result, the utterance and the display timing can easily be matched so that the utterance information is displayed within a predetermined display frame, and the rising-edge information of the feature signal before speech can also be displayed while real-time operation is maintained.

Description of the Embodiment
FIG. 1 is a block diagram showing the configuration of a speech training device according to an embodiment of the present invention. Reference numeral 1 denotes a feature extraction circuit that extracts, from the signals of a microphone 1a, a vocal cord vibration sensor 1b, a nasal vibration sensor 1c, and an expiratory airflow sensor 1d, the voice feature signals needed for speech training, such as voice intensity, throat vibration envelope, and nasal vibration envelope. Reference numeral 2 denotes storage means that stores one display frame's worth of feature signal data; 3 denotes speech discrimination means that determines the onset of the speaker's utterance from the feature signal of feature extraction circuit 1; 4 denotes a display device, such as a television, for displaying the feature signal graphically in X-Y coordinates; and 5 denotes display control means that receives the feature signal of feature extraction circuit 1 and the output signal of speech discrimination means 3, stores voice feature data in storage means 2 in accordance with these input signals, and controls the display of the data on display device 4.

FIG. 2 shows an example of the display screen layout of display device 4. The horizontal range from display start point 6 to display end point 7 is set to the time required to input one utterance, and the feature signal is displayed at a resolution determined by the required sampling period of the extracted feature signal. FIG. 3 is a waveform diagram of the output signal of feature extraction circuit 1; here this signal is used as the input signal of speech discrimination means 3. FIG. 4 shows the storage format of storage means 2.
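The horizontal resolution follows from dividing a time span by the sampling period. The sketch below is a minimal illustration that assumes a hypothetical 10 ms sampling period (the source gives no value) together with the 5-second frame mentioned earlier as an example; the 500 ms pre-speech section Tw is likewise an assumption. Times are kept in integer milliseconds to avoid floating-point rounding.

```python
def sample_points(duration_ms: int, sampling_period_ms: int) -> int:
    """Number of display positions covering `duration_ms`, one per sample."""
    if sampling_period_ms <= 0:
        raise ValueError("sampling period must be positive")
    if duration_ms % sampling_period_ms:
        raise ValueError("duration must be a whole number of sampling periods")
    return duration_ms // sampling_period_ms


# Hypothetical values: 5 s frame (the example in the text), 10 ms sampling,
# and a 500 ms pre-speech section Tw.
frame_points = sample_points(5000, 10)    # horizontal positions in one frame
section_points = sample_points(500, 10)   # positions in the Tw section
```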

The operation of the speech training device of this embodiment, configured as described above, is explained below.

Display control means 5 receives the feature signal output by feature extraction circuit 1 at a predetermined sampling period and stores the data in storage means 2, in the format shown in FIG. 4, sequentially from the storage area corresponding to display start point 6; at the same time, it displays the data sequentially on the screen starting from display start point 6. If the feature value after input time Tw is not judged by speech discrimination means 3 to indicate a speech state (that is, it is below voltage Vth in FIG. 3), display control means 5 shifts the data for the interval Tw in the memory of storage means 2 (FIG. 4) by one data item toward the start point, and shifts the display in the Tw section of the screen of display device 4 accordingly. This processing is repeated for as long as the output of speech discrimination means 3 indicates a non-speech state. When the output of speech discrimination means 3 indicates a speech state at time tw, as shown in FIG. 3, display control means 5 leaves the display processing for the Tw section and continues the display toward end point 7 of the display screen.
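The shift performed while no speech is detected behaves like a fixed-length shift register over the Tw section. As a sketch only, the Python class below (a hypothetical name, not from the source) models that section with a `collections.deque` of maximum length `w`: appending to a full deque silently drops the oldest item, which has the same effect as moving every stored value one address toward the start point and discarding the value at the start address.

```python
from collections import deque
from typing import List


class PreSpeechSection:
    """Hypothetical model of the Tw section: keeps only the newest `w` samples."""

    def __init__(self, w: int):
        if w <= 0:
            raise ValueError("section length must be positive")
        # A full deque with maxlen=w drops its oldest item on append.
        self._buf: deque = deque(maxlen=w)

    def push(self, value: float) -> None:
        """Store a new below-threshold sample at the end of the section."""
        self._buf.append(value)

    def snapshot(self) -> List[float]:
        """Section contents in display order, oldest (display start point) first."""
        return list(self._buf)
```

Using a deque avoids copying the whole section on every sample; an implementation that follows the text literally would instead move each stored value down one address.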

FIG. 5 is a flowchart showing the above operation. Each step is described below.

(a) Initialization step: the input sample count n is set to n = 0.

(b) When feature data Dn is input, it is stored at address An of storage means 2.

The first data item, D0, is stored at address A0.

(c) Determine whether the time tn elapsed since the start of operation has reached tw, the time corresponding to the divided display period Tw of the screen of display device 4.

(d) If tn has not reached tw, the data Dn stored at address An is displayed at the screen position corresponding to tn on the time axis. The data D0 stored at address A0 is displayed at time t0, that is, at display start position 6.

(e) Proceed to processing of the next sample.

Steps (b) to (e) are then repeated until tn ≥ tw.

(f) Determine whether data Dn has reached the predetermined value Dth (Vth in FIG. 3), that is, whether speech has been uttered.

(g) If Dn < Dth, that is, if no speech has been uttered, the data stored at addresses A0 to Aw is shifted by one position into addresses A0 to A(w-1) (as a result, the data previously stored at A0 is discarded).

(h) Display the data stored at addresses A0 to A(w-1) (it is displayed in the period Tw shown in FIG. 2). Then return to (b).

(i) When Dn ≥ Dth, that is, when speech is uttered, display data Dn at the position corresponding to time tn.

(j) Proceed to processing of the next sample.

(k) Determine whether time tn is still within the time tk corresponding to one frame.

(l) If tn ≤ tk, store the next data item Dn and return to (i).

Thereafter, steps (i) to (l) are repeated until tn > tk.
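Steps (a) to (l) can be sketched in Python as a single function that returns the memory contents of one captured frame. This is an illustration under stated assumptions, not the device's implementation: `capture_frame` is a hypothetical name, tw and tk are assumed to correspond to sample indices `w` and `k`, and a plain list stands in for addresses A0 to Ak.

```python
from typing import Iterable, List, Optional


def capture_frame(samples: Iterable[float], w: int, k: int,
                  threshold: float) -> Optional[List[float]]:
    """Simulate steps (a)-(l) for one frame.

    Returns the contents of A0..Ak: the last `w` pre-speech samples, the onset
    sample at A_w (display position t_w), then the samples up to t_k.
    Returns None if the input ends before a complete frame is captured.
    """
    if not 0 < w <= k:
        raise ValueError("require 0 < w <= k")
    it = iter(samples)
    mem: List[float] = []

    # (a)-(e): fill the pre-speech section A0..A(w-1); onset is not yet checked.
    for _ in range(w):
        d = next(it, None)
        if d is None:
            return None
        mem.append(d)

    # (b), (f)-(h): wait for onset, shifting the section while below threshold.
    while True:
        d = next(it, None)
        if d is None:
            return None
        if d >= threshold:      # (f) -> (i): speech detected
            mem.append(d)       # the onset sample lands at A_w
            break
        mem.pop(0)              # (g): discard A0 and shift the rest toward A0
        mem.append(d)           #      then keep the new sample at the end

    # (i)-(l): keep storing samples until t_n passes t_k (addresses up to A_k).
    while len(mem) <= k:
        d = next(it, None)
        if d is None:
            return None
        mem.append(d)
    return mem
```

With `w = 3`, `k = 6`, and a threshold of 5, the input `[0, 1, 0, 2, 0, 1, 9, 8, 7, 6, 5, 4, 3]` yields `[2, 0, 1, 9, 8, 7, 6]`: three pre-speech samples, then the onset at index `w`, then the rest of the frame.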

In speech discrimination means 3 of FIG. 1, if the speech state is determined from the rise of either the voice intensity or the throat vibration envelope, all utterances can be discriminated. The rise of another feature parameter, such as the nasal vibration envelope, may also be used.
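One way to read this rule is "speech has started as soon as any monitored channel rises to its threshold." The sketch below assumes hypothetical channel names (`"intensity"`, `"throat"`, `"nasal"`) and hypothetical threshold levels; the source specifies neither.

```python
from typing import Mapping


def speech_onset(sample: Mapping[str, float], thresholds: Mapping[str, float]) -> bool:
    """True if any monitored feature channel has risen to its threshold.

    Channels missing from `sample` are treated as silent (0.0).
    """
    return any(sample.get(name, 0.0) >= level for name, level in thresholds.items())


# Hypothetical configuration: monitor voice intensity and throat vibration.
THRESHOLDS = {"intensity": 0.2, "throat": 0.1}
```

Adding `"nasal"` to the threshold mapping would correspond to the alternative mentioned above, using the rise of the nasal vibration envelope as well.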

As described above, according to this embodiment, providing speech discrimination means 3 causes the speaker's speech onset to be displayed automatically at a predetermined position on the display screen, so no extra effort by the speaker is required. This is also effective for training by comparison with a model display of speech. In addition, because low-level signals not detected as speech are stored in part of the memory area and shown in part of the display screen, the rising characteristic from the low level before speech onset can also be displayed in real time.

Effects of the Invention
The speech training device of the present invention is provided with means for automatically determining the onset of speech and display control means that displays pre-speech data in one of the divided display regions. As a result, while real-time operation is retained, the data from the state just before speech through the end of speech can easily be placed within one frame on the display screen. The speaker therefore no longer needs to time the utterance, and the practical benefit is considerable.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech training device according to an embodiment of the present invention; FIG. 2 is a diagram of the display screen format of the embodiment; FIG. 3 is a waveform diagram showing an example of a feature signal; FIG. 4 is a format diagram showing the storage format of the storage means; and FIG. 5 is a flowchart showing the operation of the embodiment.

1: feature extraction circuit; 2: storage means; 3: speech discrimination means; 4: display device; 5: display control means.

Patent applicant: Director-General, Agency of Industrial Science and Technology.

Claims

[Claims]

A speech training device comprising: feature extraction means for extracting feature signals of a speaker's voice; storage means for storing the voice feature signals; speech discrimination means for determining the onset of speech from the voice feature signals; and display control means which, while the speech discrimination means outputs a non-speech state, divides the display range of one frame on the time axis and shift-displays the voice feature data within one divided section of the display range starting from the display start point, and which, when the speech discrimination means outputs a speech state, leaves that section and displays one full frame on the time axis.