JP2908720B2

JP2908720B2 - Synthesis-based speech training device and method

Info

Publication number: JP2908720B2
Application number: JP7085534A
Authority: JP
Inventors: Hector Raul Javkin; Elizabeth Grace Keate; Norma Antonanzas-Barroso; Brian Allen Hanson
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-04-12
Filing date: 1995-04-11
Publication date: 1999-06-21
Anticipated expiration: 2014-06-21
Also published as: JPH0830190A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech training system and method, and more particularly to a training system and method for students with speech impairments. It also relates to the learning of spoken foreign languages by learners who are not hearing-impaired.

[0002]

2. Description of the Related Art

The most basic method of teaching speech to hearing-impaired children, in particular speech production and also speech reading by watching movements of the mouth and the like, is for the teacher to demonstrate correct articulation using his or her own vocal apparatus. This lets the child learner (trainee) observe the movements of the external articulators, such as the lips and jaw, that occur during speech production and, to some extent, the movement of the tongue. A tactile method is also sometimes used, in which the learner touches the teacher's vocal apparatus and his or her own, compares them, and corrects his or her errors. These conventional methods are limited because many articulatory movements cannot be observed from outside.

[0003] In recent years, devices and computer programs that analyze speech have made it possible to teach how speech is produced. With these devices and programs, the learner can observe many features that accompany speech production, such as displays of articulatory information. A good example of such a system is the Computer Integrated Speech Training Aid (CISTA) developed by Matsushita. CISTA provides multi-channel data collected by several transducers, as follows.

[0004] 1. Dynamic palatography. First used in 1962 by the Soviet researcher Y. Kuzmin, this technique shows the contact between the tongue and the palate during speech by means of many electrodes on an artificial palate worn in the mouth. When the tongue touches one of the electrodes, a low-voltage circuit to equipment outside the mouth is completed, and the contact is detected. The presence or absence of contact is shown on a CRT display.

[0005] 2. Nasal sensor. An electret microphone, held against the side of the nose by headgear or temporarily attached with adhesive tape, detects the vibrations of nasalization.

3. Throat sensor. A microphone held at the larynx by a flexible collar detects glottal vibration.

[0006] 4. Airflow sensor. A device placed in front of the child learner's mouth senses the flow of exhaled air; several methods exist.

5. A standard microphone provides the input for acoustic analysis.

Teaching speech to hearing-impaired children is difficult because the teacher's time is limited. Children with normal hearing hear speech for hours every day and receive acoustic feedback on their own speech, whereas hearing-impaired children receive feedback on their speech only during training sessions held about once a week.

[0007] Combining a speech training device such as CISTA with a computer allows part of speech training, especially articulation training, to be carried out without a teacher, which greatly extends the available training time. Another prior-art technology is the text-to-speech system, which automatically synthesizes any utterance that is typed in. (Here and below, "utterance" means the language, word, sentence, or the like to be spoken and is not limited to the linguistic sense of the term.) The best-known text-to-speech system at present is DECtalk from Digital Equipment Corporation, which produces spoken English from text.

[0008] Next, research on speech production is described. The shape of the human vocal tract (the path of the voice formed by the vocal organs) determines the resonances of speech and thus controls the speech itself. Electronic and computational models of the relationship between vocal tract shape and the resulting acoustic output have therefore been a major topic for speech researchers for many years. In this research, the researchers examined their own vocal tract shapes and measured the acoustic output. More recently, researchers have developed articulatory synthesis, in which the vocal tract shapes are generated automatically. The input is a string of phonemes, which is converted into a detailed specification of vocal tract shapes. The vocal tract shapes are then used to form a vocal tract model that produces sound.

[0009]

SUMMARY OF THE INVENTION

However, in speech training devices such as CISTA that give feedback directly to the learner, the training material is limited to individual sounds or to pre-programmed utterances. A text-to-speech system such as DECtalk, for its part, produces only audible sound.

[0010] In addition, the display of the individual articulators, the airflow, and other factors that affect the accuracy of speech is incomplete. In particular, the movements of the tongue and palate, which play an important role in shaping the vocal tract, are not clearly used or displayed: specifically, the contact patterns, that is, which regions make or break contact at which time. Using this information for training and instruction had not been considered.

[0011] Such systems are also inadequate for teaching learners who retain some hearing, and they are inconvenient for learning a foreign language. The present invention has been made in view of these problems. Its object is to provide a training device and method that let a child learner obtain feedback information on any utterance without a teacher's guidance.

[0012] A further object is to provide a training device and method that show how the vocal tract is shaped, which plays an important role in speech production. Another object is to provide a device and method that are also convenient for learning a foreign language.

[0013]

[Means for Solving the Problems] To achieve these objects, the speech training device and method of the present invention allow the learner (trainee) to type into a computer any utterance, including a foreign-language utterance, that he or she wishes to practice. The device then displays on a CRT the model articulatory movements, that is, the correct movements of the various vocal organs needed to speak the typed text, which are not always easy to see. In particular, it displays the tongue-palate contact patterns, together with the accompanying airflow and the movements of each organ, so that the learner can learn by watching them.

[0014] For learners who retain some hearing, the correct utterance is played at high volume. To achieve a further object, when the learner speaks the typed utterance or foreign-language phrase, the device and method compare the learner's actual production with the model production, evaluate their similarities and differences, and display the result on the CRT. This provides feedback with which the learner can practice and learn the correct production.

[0015]

[Operation] With this configuration, the editing means first edits and converts the typed text for speech and sends it to the speech synthesizer. The synthesizer parses the typed text into a string of phonemes and generates, every 10 milliseconds, a set of parameters specifying the acoustic characteristics of the utterance. The parameter sets are returned to the system and converted into a set of parameters representing the model articulatory patterns needed to speak the typed text, for example the tongue-palate contact. The contact patterns, which show where and when the tongue touches the palate during articulation, are then displayed on the CRT.

[0016] When the learner actually speaks the typed utterance, a set of transducers measures the nasal vibration, glottal vibration, expiratory airflow, tongue contact, and intelligibility of the speech itself. The speech training device evaluates the learner's production by how closely it resembles the model, for example the model tongue-palate contact pattern. The evaluation is displayed on the CRT as a tongue-palate contact display.
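The comparison between the learner's production and the model can be pictured with a small sketch. It assumes, for illustration only, that each contact pattern is a sequence of on/off values (the embodiment described later uses 63 points) and that similarity is scored as the fraction of points that agree. The source does not specify the actual measure, and the name `contact_similarity` is hypothetical.

```python
from typing import Sequence


def contact_similarity(model: Sequence[bool], measured: Sequence[bool]) -> float:
    """Return the fraction of palate points whose contact state matches.

    Assumption: both patterns list the same palate points in the same order.
    This agreement ratio is an illustrative measure, not the patent's own.
    """
    if len(model) != len(measured):
        raise ValueError("patterns must cover the same number of points")
    if not model:
        raise ValueError("patterns must not be empty")
    matches = sum(1 for m, s in zip(model, measured) if m == s)
    return matches / len(model)


# Toy 6-point patterns (a full palatogram in this system would have 63 points).
model_pattern = [True, True, False, False, True, False]
learner_pattern = [True, False, False, False, True, True]
score = contact_similarity(model_pattern, learner_pattern)
```

A score near 1 would indicate that the learner's contact closely follows the model; how the device turns such a comparison into on-screen feedback is not detailed at this point in the text.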

[0017] The same is done in foreign-language training and learning, for which model productions are provided.

[0018]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of a synthesis-based speech training system according to the present invention are described below with reference to the drawings and tables.

(First Embodiment) FIG. 1 shows the configuration. The synthesis-based speech training device of this embodiment can be implemented on a laboratory digital computer. The device has means for entering the utterance to be learned; the keyboard 1, which forms the input section of the speech training device, is suitable for this purpose.

[0019] The entered utterance is sent to the speech training device in ASCII representation. The speech training device uses a text-to-speech system to synthesize the typed utterance automatically. For English, the best-known text-to-speech system at present is DECtalk from Digital Equipment Corporation. This embodiment, however, uses STLtalk, a more recent text-to-speech system developed by the Speech Technology Laboratory of Panasonic Technologies.

[0020] Communication between the synthesizer 3 and the speech training device takes place through an RS-232 port. The port is configured for the XON/XOFF protocol, with transmission settings of 9600 bits per second (baud), 8 data bits, 1 stop bit, and 0 parity bits. The synthesizer 3 receives the input text, which is a string of phonemes, and stores it in an input buffer.
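The link settings can be summarized in a short sketch. It records the stated settings and derives the character rate, assuming the usual asynchronous framing with one start bit per character (the start bit is not mentioned in the source). The class name `SerialSettings` is hypothetical; the XON and XOFF byte values are the standard ASCII DC1 and DC3 control characters.

```python
from dataclasses import dataclass

XON = 0x11   # ASCII DC1: receiver is ready, sender may resume
XOFF = 0x13  # ASCII DC3: receiver buffer is filling, sender should pause


@dataclass(frozen=True)
class SerialSettings:
    """Link settings as stated in the text (9600 baud, 8 data, 1 stop, 0 parity)."""
    baud: int = 9600
    data_bits: int = 8
    stop_bits: int = 1
    parity_bits: int = 0

    def bits_per_character(self) -> int:
        # Assumption: one start bit precedes every asynchronous character.
        return 1 + self.data_bits + self.parity_bits + self.stop_bits

    def characters_per_second(self) -> float:
        return self.baud / self.bits_per_character()


link = SerialSettings()
```

Under these assumptions each character occupies 10 bit times, so the link carries at most 960 characters per second.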

[0021] The text-to-speech section 2 analyzes the entered utterance. Syntactic analysis of the utterance relies on the positions of commas in the text and on the syntactic roles of function words, such as prepositions, and of verbs identified during dictionary lookup. Next, a phonemic representation of the utterance is produced by the procedure shown in FIG. 2, which is described below.

[0022] First, each word is compared with the entries of a small pronunciation dictionary (step 12). If the dictionary has no matching entry, the word is broken into morphemes, for example by removing common suffixes such as "ed" and "ing" (step 14). The remaining root is then compared with the entries of a phonemic dictionary (step 16). If no entry matches the root, the pronunciation is predicted using letter-to-phoneme rules (step 18). In addition, the stress pattern to be applied to the syllables is considered when the word is converted into phonemes. That is, the stress pattern must be predicted when the word is not in the system dictionary or when a prefix changes the stress pattern of the root. The stress level of a syllable is indicated by inserting a stress symbol immediately before the vowel in the phonemic representation. The absence of a stress symbol means that the syllable is unstressed. Finally, any common suffix that was removed is reattached (step 20).
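Steps 12 to 20 can be sketched as a small lookup pipeline. Everything in the sketch is illustrative: the dictionaries, the phoneme symbols, the letter rules, and the function names are toy assumptions and are not taken from STLtalk. A stress mark (`'`) is placed before the stressed vowel, as the text describes.

```python
from typing import Dict, Optional, Tuple

# Toy data for illustration only; not taken from any real lexicon.
PRONUNCIATION_DICT: Dict[str, str] = {"she": "SH 'IY", "said": "S 'EH D"}
ROOT_DICT: Dict[str, str] = {"walk": "W 'AO K"}
SUFFIX_PHONEMES: Dict[str, str] = {"ing": "IH NG", "ed": "D"}
LETTER_RULES: Dict[str, str] = {"b": "B", "i": "IH", "p": "P", "l": "L"}


def split_suffix(word: str) -> Tuple[str, Optional[str]]:
    """Step 14: strip one common suffix, if present."""
    for suffix in SUFFIX_PHONEMES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, None


def letters_to_phonemes(root: str) -> str:
    """Step 18: guess a pronunciation letter by letter (toy rules)."""
    return " ".join(LETTER_RULES.get(ch, ch.upper()) for ch in root)


def to_phonemes(word: str) -> str:
    word = word.lower()
    if word in PRONUNCIATION_DICT:            # step 12: whole-word lookup
        return PRONUNCIATION_DICT[word]
    root, suffix = split_suffix(word)         # step 14: morpheme split
    phonemes = ROOT_DICT.get(root)            # step 16: root lookup
    if phonemes is None:
        phonemes = letters_to_phonemes(root)  # step 18: letter-to-phoneme rules
    if suffix is not None:                    # step 20: reattach the suffix
        phonemes += " " + SUFFIX_PHONEMES[suffix]
    return phonemes
```

For example, `to_phonemes("walking")` misses the whole-word dictionary, strips "ing", finds the root "walk", and appends the suffix phonemes again. Stress prediction for unknown words is left out of this sketch.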

[0023] The sound parameter synthesizer 3 processes the buffer and produces one set of 20 acoustic parameters for every 10 milliseconds of the phoneme string. The acoustic parameters specify the acoustic characteristics of the utterance to be learned. They are determined by comparing the incoming information with predetermined values based on the fundamental frequency and amplitude, the formant frequencies and bandwidths, and the noise sources of the utterance. (Strictly, a formant is a constituent component of a vowel; in this specification the term may also refer to the sound being trained.) Table 1 shows examples of the acoustic parameters.

[0024]

[Table 1]

[0025] Two identical sets of acoustic parameters are produced. When a period, the symbol marking the end of the utterance, is received, the first set is stored in an array (a storage area in which storage elements are arranged in order). The second set of acoustic parameters is sent to the formant synthesizer 4, which converts it into an analog speech signal. The output of the formant synthesizer 4 is reproduced at high volume through the loudspeaker 5, so that learners with some residual hearing can hear it.

[0026] The first set of acoustic parameters, stored in the array of the sound parameter synthesizer 3 (which excludes formant synthesis), is returned to the speech training device through the RS-232 port. The speech training device reads these values and stores them in another array. The acoustic parameters are then converted into articulatory parameters by the acoustic-to-articulatory parameter converter 6. The articulatory parameters represent the typed utterance in three forms.

[0027] The speech training device converts one group of acoustic parameters into articulatory parameters that show the voice characteristics of the typed utterance. The acoustic parameters used for this conversion include the fundamental frequency, the amplitude of the utterance, the presence or absence of voicing, and the frequencies of formants 1, 2, and 3. The text-to-speech system multiplies the frequencies (in hertz) of the acoustic parameters by 10 before sending them to the speech training device. The speech training device divides the received values by 10 before displaying them on the CRT.
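The scaling step can be sketched as follows. The source states only the factor of 10; the reading that this lets tenths of a hertz travel over the link as integers is an assumption, as are the function names.

```python
def encode_frequency(hz: float) -> int:
    """Sender side: scale by 10 so the value can be sent as an integer."""
    return round(hz * 10)


def decode_frequency(value: int) -> float:
    """Receiver side: undo the scaling before display."""
    return value / 10


sent = encode_frequency(123.4)
shown = decode_frequency(sent)
```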

[0028] The speech training device also produces an articulatory parameter that shows the nasalization of the typed utterance. Nasalization is appropriate for sounds such as /m/ and /n/, but hearing-impaired speakers often nasalize inappropriately, which makes their speech harder to understand. Nasalization is indicated by the presence of both nasal poles and nasal zeros in the frequency domain. Typically, nasalization is represented by one nasal pole and one nasal zero. They appear when the soft palate (velum), the valve between the nose and the mouth, is lowered to its open position, and they disappear when the soft palate closes.

[0029] The text-to-speech system supplies acoustic parameter values from which the nasalization of the typed utterance can be detected, and the speech training device uses them to compute a nasalization index. For non-nasal sounds, the text-to-speech system sets the nasal pole and the nasal zero to the same frequency and bandwidth, so that they cancel each other. In this system, both are set to 250 Hz for non-nasal sounds. To produce a nasal sound, the frequencies of the nasal pole and the nasal zero are moved to different values, so that each affects a different part of the spectrum. The frequency of the nasal zero rises temporarily to 330 or 360 Hz, depending on whether the sound is /n/ 34 or /m/ 36. These frequency changes are shown in FIG. 3. The text-to-speech system also supplies the amplitude of the nasal formant 30, 32. Typical nasal formant amplitudes range from 30 to 60; when the amplitude is 0, neither nasal formant 30, 32 is produced. Using the acoustic parameters supplied by the text-to-speech system, the speech training device computes a nasalization index that tells the learner how much nasalization the typed utterance requires.

[0030] One example of this computation is as follows.

$$NI = \frac{\lvert F_{nf} - F_{nz} \rvert}{\lvert F_{nf} - F_{nz} \rvert_{\max}} \times \left[ \frac{(AN - 1) - \lvert AN - 1 \rvert}{2} + 1 \right]$$

where
NI = nasalization index
F_nf = frequency of the nasal formant
F_nz = frequency of the nasal zero
AN = amplitude of the nasal formant

All input variables are integers.
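The index can be sketched in Python as follows. The function name and the `max_separation` argument, which stands for the maximum value of |F_nf − F_nz|, are assumptions; the source does not say how that maximum is obtained. The example frequencies are illustrative only.

```python
def nasalization_index(f_nf: int, f_nz: int, an: int, max_separation: int) -> float:
    """NI = (|F_nf - F_nz| / max separation) * gate(AN).

    max_separation stands for (|F_nf - F_nz|)_max; its value is an
    assumption supplied by the caller.
    """
    if max_separation <= 0:
        raise ValueError("max_separation must be positive")
    if an < 0:
        raise ValueError("an must be non-negative")
    # Second factor of the formula: 0 when AN == 0, 1 for any AN >= 1.
    gate = ((an - 1) - abs(an - 1)) // 2 + 1
    return abs(f_nf - f_nz) / max_separation * gate


# Pole and zero coincide (non-nasal sound): the index is zero.
non_nasal = nasalization_index(250, 250, 40, 110)
# Pole and zero separated with a nonzero nasal amplitude (illustrative values).
nasal = nasalization_index(250, 360, 40, 110)
```

The second factor acts as a switch: for integer amplitudes it evaluates to 0 when the nasal amplitude is 0 and to 1 otherwise, so the index reflects pole-zero separation only when a nasal formant is actually present.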

[0031] In the second factor on the right-hand side of the expression, the value is multiplied by 0 when the nasal amplitude is 0 and by 1 when it is nonzero. The speech training device also produces articulatory parameters that show the tongue-palate contact needed to speak the typed utterance. Besides supplying acoustic parameters, the text-to-speech system is used to obtain four time points: the onset of the sound, the time at which maximum amplitude is reached, the time at which the amplitude begins to fall, and the offset. These times are combined and shown in a set of palatograms.

[0032] For each speech unit, whether a phoneme or a syllable, the maximum contact region is stored in the acoustic-to-articulatory parameter converter 6 and treated as the target pattern for that unit. Because the contact pattern of a speech sound varies with its context (the surrounding words and sounds; for "t", for example, "ct", "th", "st", and so on), many different contact patterns must be stored, not just one per sound but one per context.

[0033] Each speech unit is assigned data giving its onset time, its offset time, and the duration of the steady-state portion in between. The tongue-palate contact region is defined as a set of 63 points. The number 63 was chosen so that the tongue-palate contact can be seen adequately for the various sounds and speech units, so that the display on the CRT is easy to read, and for practicality in manufacture and use.

[0034] A hierarchy defines the order in which points make contact during the onset and break contact during the offset. In general, the first points to make contact are those toward the back of the palate, and the first to break contact are those toward the front. Because it is difficult to convert the acoustic parameters produced by the text-to-speech system into tongue-palate contact patterns for the learner to study, this embodiment uses the consonant onsets and offsets to synthesize the tongue-palate contact patterns.

[0035] For example, for the utterance "She said", the text-to-speech system reports, through the synthesis parameters, the onset and offset of /SH/ and the onset and offset of /S/. FIG. 4 shows all the parameters for the utterance "She said". For the palate contact, the present training device, which supplies the actual contact patterns, uses only the onset and offset times.

[0036] In the present device, the tongue-palate contact patterns displayed to the learner are included in the text-to-speech system. FIG. 5 shows a typical sequence: the tongue-palate contact patterns for the /S/ in "She said" and the vowels surrounding it. Large dots indicate contact and small dots indicate no contact. Each of the frames 58 to 104 represents 10 milliseconds.
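A frame of this kind can be pictured with a short text-rendering sketch. The source gives only the number of points (63); the 7 x 9 grid layout, the characters used for large and small dots, and all names below are assumptions made for illustration.

```python
from typing import List, Sequence

ROWS, COLS = 7, 9  # assumed layout: 7 x 9 = 63 points; the source gives only the count


def render_frame(contacts: Sequence[bool]) -> str:
    """Draw one palatogram frame: 'O' marks contact, '.' marks no contact."""
    if len(contacts) != ROWS * COLS:
        raise ValueError("a frame must contain exactly 63 points")
    lines: List[str] = []
    for r in range(ROWS):
        row = contacts[r * COLS:(r + 1) * COLS]
        lines.append(" ".join("O" if c else "." for c in row))
    return "\n".join(lines)


def frame_time_ms(frame_index: int, frame_ms: int = 10) -> int:
    """Start time of a frame, counting from frame 0, at 10 ms per frame."""
    return frame_index * frame_ms


# Illustrative pattern: contact along both side edges only.
frame = [col in (0, COLS - 1) for _ in range(ROWS) for col in range(COLS)]
picture = render_frame(frame)
```

Printing `picture` gives seven rows of nine marks, with contact shown down the left and right edges.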

[0037] The sequence of frames in FIG. 5 runs from the vowel of "She" through the following /S/ to the vowel of "said". The first seven frames, 58, 60, 62, 64, 66, 68, and 70, show the contact with which the vowel of "She" leads into the /S/ of "said" (frame 104).

[0038] The complete /S/ begins at the eighth frame, frame 72, and nine frames later, at frame 90, the tongue begins to leave the palate. By the 24th frame, frame 104, the contact has been released completely, as required for the vowel of "said". In the speech training program, the actual tongue contact shown on the CRT is obtained by one of the following three methods.

[0039] The device holds prestored tongue contact patterns for the consonants /t/, /d/, /s/, /z/, /sh/, /zh/, /l/, /n/, and /r/. At present, the prestored patterns do not include the contact patterns for /k/ or /g/ produced immediately before or after a high front vowel, but the same principle applies.

[0040] The teacher can store the tongue contact pattern for each sound in the device, in a usable form, according to the context in which it is produced. The teacher can also select and store a sound that the learner produced well on one occasion but that the teacher judges to need repeated practice; the distinction between l and r for Japanese speakers is a comparable case in a different setting. Because the most appropriate tongue-palate contact depends strongly on the shape of each learner's palate, this storage method carried out by the teacher is often the most effective.

[0041] For each consonant, the teacher specifies which of the three methods is to be used. If none is specified, the prestored contact patterns are used. In all three methods, the speech training device supplies a sequence of contact patterns that begins before the consonant onset and continues until after the consonant offset. When an utterance containing a consonant that requires tongue-palate contact is typed into the device, the text-to-speech system reports the onset and offset times of that consonant to the training device. The sequence of patterns stored by one of the three methods is then used to show the learner the appropriate contact patterns.
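The selection rule and the display timing can be sketched as follows. The names of the three pattern sources are a reading of the preceding paragraphs, and the 30 ms lead and lag around the consonant are assumed values; the source says only that the sequence starts before the onset and ends after the offset.

```python
from enum import Enum
from typing import Dict, List


class PatternSource(Enum):
    PRESTORED = "prestored"        # patterns already held by the device
    TEACHER = "teacher"            # patterns stored by the teacher
    LEARNER_BEST = "learner_best"  # a good learner production chosen by the teacher


def choose_source(consonant: str,
                  teacher_choice: Dict[str, PatternSource]) -> PatternSource:
    """Use the teacher's choice, falling back to the prestored patterns."""
    return teacher_choice.get(consonant, PatternSource.PRESTORED)


def display_window(onset_ms: int, offset_ms: int, lead_ms: int = 30,
                   lag_ms: int = 30, frame_ms: int = 10) -> List[int]:
    """Frame start times from before the onset until after the offset.

    lead_ms and lag_ms are illustrative assumptions, not values from the text.
    """
    if offset_ms < onset_ms:
        raise ValueError("offset must not precede onset")
    start = max(0, onset_ms - lead_ms)
    return list(range(start, offset_ms + lag_ms + 1, frame_ms))


window = display_window(onset_ms=100, offset_ms=200)
```

With these assumed margins, a consonant reported as running from 100 ms to 200 ms would be displayed over frames starting at 70 ms and ending at 230 ms.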

[0042] After viewing on the CRT the tongue-palate contact movements needed for the typed utterance, the learner attempts the utterance. The speech training device uses the Computer Integrated Speech Training Aid (CISTA) described above. CISTA uses several speech transducers 8 to measure the patterns and parameters of the learner's production: the learner's tongue-palate contact, nasal vibration, glottal vibration, expiratory airflow, and acoustic output are measured to obtain the data needed to evaluate the production. The data from the transducers are then analyzed.

[0043] The evaluation unit 9 of the conversation training device compares the measurements of the learner's utterance with the stored sound parameters, and detects and evaluates their similarity (or difference). The device then gives the learner feedback through the display unit 10, based on the difference between the articulatory movements the learner actually made and the model articulatory movements.

(Second Embodiment)

In this embodiment, a learner with normal hearing uses the synthesis-based conversation training device to learn to speak a foreign language. Except as described below, the conversation (speech) training device used in this embodiment is the same as the one used in the first embodiment.

[0044] The learner types text in the foreign language to be learned into the input unit (keyboard) 1 of the conversation training device. The text-to-speech unit 2 edits the input text for speech. The edited text is sent to the sound parameter synthesis unit 3 (which excludes the formant synthesis unit). The synthesis unit 3 processes the text and creates a sound parameter set consisting of 20 sound parameters. These sound parameters specify the speech characteristics of the text in the given foreign language, in this case the language being learned.

[0045] Two identical sets of sound parameters are created. The second set is sent to the formant synthesis unit 4 and converted into an analog speech output signal. The output of the formant synthesis unit 4 is played in the given foreign language through the speaker 5, which serves as the speech output unit. The first set is then sent to the conversion unit 6, which converts sound parameters into articulatory parameters. Using the articulatory parameters produced by the conversion unit 6, the device shows the model articulatory movements for pronouncing the input text in the given foreign language. The movements of each articulator corresponding to the converted articulatory parameters, including the tongue-palate contact patterns for consonants, are displayed on the CRT 7.
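The routing of the two identical sound parameter sets can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `route_sound_parameters`, the frame layout (one dictionary of named parameters per frame), and the two callables standing in for the formant synthesis unit 4 and the conversion unit 6 are all hypothetical.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical layout: one dictionary of named sound parameters per time frame.
SoundParams = List[Dict[str, float]]


def route_sound_parameters(
    params: SoundParams,
    to_articulatory: Callable[[SoundParams], Any],
    to_formant_synth: Callable[[SoundParams], Any],
) -> Tuple[Any, Any]:
    """Create two identical sets and send each to its own consumer."""
    first_set = copy.deepcopy(params)   # goes to the articulatory conversion
    second_set = copy.deepcopy(params)  # goes to the formant synthesis
    audio = to_formant_synth(second_set)        # played through the speaker
    articulation = to_articulatory(first_set)   # shown on the CRT
    return articulation, audio
```

Copying the set, rather than sharing one object, keeps the audio path and the display path independent, so that processing in one cannot alter the input of the other.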

[0046] Referring to the model articulatory movements, including the tongue-palate contact patterns, for uttering the input text in the given foreign language, the learner pronounces the input foreign-language text. The conversation training device measures the learner's pronunciation. The evaluation unit 9 of the device then evaluates the measured values of the learner's utterance by comparing them with the sound parameters for the input foreign-language text. The device detects the differences between the articulatory movements the learner made when uttering the input foreign-language text and the stored model articulatory movements, and feeds them back to the learner on the feedback display 10.
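One way to picture the comparison made by the evaluation unit 9 is a point-by-point comparison of a model contact pattern with a measured one. The sketch below is an assumption for illustration only: the specification does not state the similarity measure, and the function names are hypothetical. The figure of 63 contact points is taken from the claims.

```python
from typing import List, Sequence

NUM_CONTACT_POINTS = 63  # number of tongue-palate contact points named in the claims


def _check_length(pattern: Sequence[bool]) -> None:
    if len(pattern) != NUM_CONTACT_POINTS:
        raise ValueError(f"expected {NUM_CONTACT_POINTS} contact points")


def contact_agreement(model: Sequence[bool], measured: Sequence[bool]) -> float:
    """Fraction of contact points whose state (contact or no contact) matches.
    Simple point-wise agreement is an assumed measure, not the patent's."""
    _check_length(model)
    _check_length(measured)
    matches = sum(1 for m, x in zip(model, measured) if bool(m) == bool(x))
    return matches / NUM_CONTACT_POINTS


def differing_points(model: Sequence[bool], measured: Sequence[bool]) -> List[int]:
    """Indices of contact points where the learner differs from the model,
    which a feedback display could highlight."""
    _check_length(model)
    _check_length(measured)
    return [i for i, (m, x) in enumerate(zip(model, measured)) if bool(m) != bool(x)]
```

The list returned by `differing_points` corresponds to the differences that the device feeds back to the learner; how those differences are drawn on the display is left open here.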

[0047] The invention has been described above on the basis of preferred embodiments so that those skilled in the art can clearly understand and practice it. Needless to say, the invention is not limited to the embodiments described in the specification and can be practiced with various modifications without departing from its gist. For example, a system for English uses the phoneme as the basic utterance unit, and the embodiments therefore describe an English system based on phonemes. Some other languages, such as Japanese, mainly use the syllable as the basic utterance unit. For such languages, the system uses the syllable as the basic utterance unit.

【００４８】[0048]

[Effects of the Invention] As described above, according to the present invention, even a child with a hearing impairment who can type on a keyboard can have the CRT display, for any word or text, the movements needed to utter it correctly. These movements, chiefly the contact between the palate and the tongue, are inherently difficult to see, and the display makes it possible to observe them visually. The learner can therefore easily understand how to move the tongue and where it should touch the palate in order to pronounce accurately.

[0049] The invention also supports lip-reading, that is, understanding speech by observing the movements of another person who is speaking. It is therefore well suited to child learners who cannot obtain speech information because of a hearing impairment but who learn to type at an early age. If some hearing remains, the learner can hear the speech when the volume is raised, which enables even better conversation training. In addition, the CRT displays the result of comparing the measurements of the learner's own utterance (the utterance itself, the movements of vocal organs such as the palate and tongue, and the accompanying airflow) with the model utterance and the model movements of the vocal organs. This makes conversation training, and vocalization training in particular, even more effective.

[0050] A dedicated teacher is also unnecessary for conversation training. The device can also be used by people without a hearing impairment who are learning a foreign language. In this case too, in addition to the accurate utterance, the movements of the palate, tongue, and other organs needed for accurate pronunciation are displayed on the CRT in comparison with the learner's own utterance and movements, which increases learning efficiency.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of one embodiment of the conversation training device.

FIG. 2 is a flowchart of the operation that converts words written in ASCII into phonemes, stress, and part-of-speech information.

FIG. 3 is a sound spectrum showing the sound parameters of the nasal zero and the nasal pole during one utterance.

FIG. 4 is an example of a sound spectrum showing the contact patterns of the tongue against the palate that occur during one utterance.

FIG. 5 is a diagram showing the actual contact patterns between the tongue and the palate that occur during one utterance.

[Explanation of symbols]

1 Keyboard input unit for the conversation to be learned
2 Text-to-speech unit
3 Sound parameter synthesis unit (excluding the formant synthesis unit)
4 Formant synthesis unit
5 Speech output unit for the synthesized conversation
6 Conversion unit from sound parameters to articulatory parameters
7 Display unit for the articulatory parameters
8 Measuring device for the learner's utterance
9 Evaluation unit that evaluates the learner's utterance using parameters
10 Display unit for feedback

Continued from the front page

(51) Int.Cl.⁶    Identification symbol    FI
     G10L 9/02   301                      G10L 9/02 301A
     H04R 1/42                            H04R 1/42

(72) Inventor: Brian Allen Hanson, San Carpino 7457, Goleta, California, United States of America

(56) References: JP-A-6-348297 (JP, A); JP-A-4-359299 (JP, A)

(58) Fields investigated (Int.Cl.⁶, DB name): G09B 21/00; G09B 5/06; G09B 19/06; G10L 3/00; G10L 3/00 551; G10L 9/02 301; H04R 1/42

Claims

(57) [Claims]

1. A synthesis-based conversation training device comprising: input means for inputting, as text, a conversation to be trained; editing means for converting the utterances constituting the input conversation into a set of utterance units for use in training and into data on the start and end of each utterance unit; synthesis means for synthesizing, from the set of utterance units and the data on their start and end, a set of tongue-palate contact patterns that characterizes the speech of the conversation to be trained; and display means for displaying the synthesized set of tongue-palate contact patterns.

2. The synthesis-based conversation training device according to claim 1, wherein the synthesis means has a three-time determination control unit that controls the synthesis so as to determine the start time and end time of each utterance unit and the duration of the stable contact state.

3. The synthesis-based conversation training device according to claim 2, wherein the synthesis means has, as the three-time determination control unit, a per-utterance-unit three-time determination control unit that controls the synthesis so as to determine, for each utterance unit, the start time, the end time, and the duration of the stable state.

4. The synthesis-based conversation training device according to claim 3, wherein the per-utterance-unit three-time determination control unit has a context-reflecting sub-unit that enables synthesis by distinguishing, for each utterance unit, the region where the tongue-palate contact is greatest and at least one other contact region that depends on the different contexts in which the utterance unit occurs.

5. The synthesis-based conversation training device according to claim 1 or claim 4, wherein the synthesis means has a 63-point representation control unit that represents the contact region between the tongue and the palate with 63 points.

6. The synthesis-based conversation training device according to claim 5, further comprising utterance measuring means for measuring an utterance, including measuring the contact pattern between the tongue and the palate.

7. A synthesis-based conversation training device as described, further comprising evaluation means for evaluating the similarity between the tongue-palate contact pattern of the model utterance used in training and that of the measured utterance of the trainee.

8. The synthesis-based conversation training device according to claim 7, further comprising display means for displaying the result of the similarity evaluation.

9. A synthesis-based conversation training method comprising: a conversation input step of inputting a conversation to be trained; a conversation data conversion step of converting the input conversation into utterance units and data on the start and end of each utterance unit; a tongue-palate contact pattern creation step of converting the converted conversation data into the corresponding tongue-palate contact patterns; and a display step of displaying the converted tongue-palate contact patterns.

10. The synthesis-based conversation training method according to claim 9, wherein the tongue-palate contact pattern creation step has a contact-region and time reflecting sub-step of holding data indicating the region of contact between the tongue and the palate for each utterance unit, together with data on the start time, the end time, and the duration of the stable state of each utterance unit, and creating the tongue-palate contact patterns from both sets of data.

11. The synthesis-based conversation training method according to claim 10, wherein the contact-region and time reflecting sub-step has a context-reflecting sub-step of holding, for each utterance unit, data on the region where the tongue and the palate come into contact and on at least one other contact region that depends on the different contexts in which the utterance unit is used, and creating context-dependent tongue-palate contact patterns from this data and the data on the start time, the end time, and the duration of the stable state of each utterance unit.

12. The synthesis-based conversation training method according to claim 9, wherein the tongue-palate contact pattern creation step has a point-data holding sub-step of holding each region of contact between the tongue and the palate for each utterance unit as data consisting of 63 points.

13. A synthesis-based conversation training device comprising: input means for inputting, as text, a conversation to be trained; editing means for converting the input conversation into a phoneme set for use in training; sound parameter synthesis means for converting the converted phoneme set into a sound parameter set that characterizes the speech of the conversation to be trained; articulatory parameter synthesis means for converting the converted sound parameter set into an articulatory parameter set describing the movements of the vocal organs required for the utterance; movement display means for displaying the movements of the vocal organs based on the converted articulatory parameter set; synthesized speech conversion means for converting the converted sound parameter set into a synthesized speech output; speech creation means for creating, from the converted synthesized speech output, speech that a hearing-impaired person can hear; tongue contact pattern measuring means for measuring the trainee's tongue contact pattern when the trainee utters; speech measuring means for measuring the trainee's speech; airflow measuring means for measuring the trainee's exhaled airflow; nasal vibration measuring means for measuring the trainee's nasal vibration; neck vibration measuring means for measuring the trainee's neck vibration; comparison means for comparing and evaluating the similarity between the measured values obtained from the trainee and the sound parameter set of the model utterance used in training; and comparison display means for displaying the result of the comparison and evaluation by the comparison means.

14. A synthesis-based conversation training device comprising: input means for inputting, as text, a conversation to be trained; editing means for converting the input conversation into a phoneme set for use in training; synthesis means for converting the converted phoneme set into a sound parameter set that characterizes the model speech of the conversation to be trained; articulatory parameter synthesis means for converting the converted sound parameter set into an articulatory parameter set; and display means for displaying the movements of the vocal organs based on the converted articulatory parameter set.

15. The synthesis-based conversation training device according to claim 14, further comprising formant synthesis means for converting the converted sound parameter set into a synthesized speech output.

16. The synthesis-based conversation training device according to claim 14 or claim 15, further comprising synthesized speech output means for outputting audible synthesized speech.

17. The synthesis-based conversation training device according to claim 14, further comprising articulatory parameter measuring means for measuring elements related to the articulatory parameters that accompany the trainee's utterance.

18. The synthesis-based conversation training device according to claim 17, wherein the articulatory parameter measuring means has a tongue contact pattern measuring unit for measuring the tongue contact pattern.

19. The synthesis-based conversation training device according to claim 17, wherein the articulatory parameter measuring means has a sound measuring unit for measuring sound.

20. The synthesis-based conversation training device according to claim 17, wherein the articulatory parameter measuring means has an exhaled airflow measuring unit for measuring the exhaled airflow.

21. The synthesis-based conversation training device according to claim 17, wherein the articulatory parameter measuring means has a nasal vibration measuring unit for measuring nasal vibration.

22. The synthesis-based conversation training device according to claim 17, wherein the articulatory parameter measuring means has a neck vibration measuring unit for measuring neck vibration.

23. The synthesis-based conversation training device according to claim 17, further comprising similarity comparison and evaluation means for comparing and evaluating the similarity between the sound parameter set of the model utterance of the conversation used in training and the sound parameter set associated with the trainee's utterance.

24. The synthesis-based conversation training device according to claim 17 or claim 23, further comprising comparison display means for comparing and displaying the similarity between the sound parameter set of the model conversation used in training and the values of the sound parameter set measured when the trainee utters.

25. A synthesis-based conversation training method comprising: an input step of inputting, as text, a conversation to be trained; an editing step of converting the input conversation into a phoneme set for use in training; a sound parameter synthesis step of converting the converted phoneme set into sound parameters that characterize the speech of the conversation to be trained; a synthesized speech output step of converting the converted sound parameters into a synthesized speech output; a speech creation step of creating, from the converted synthesized speech output, speech that a hearing-impaired person can hear; an articulatory parameter set synthesis step of converting the converted sound parameters into an articulatory parameter set describing the movements of the vocal organs required for the utterance; a movement display step of displaying the movements of the vocal organs based on the converted articulatory parameter set; a tongue contact pattern measurement step of measuring the trainee's tongue contact pattern when the trainee utters the conversation to be trained; a speech measurement step of measuring the trainee's speech; an exhaled airflow measurement step of measuring the trainee's exhaled airflow; a nasal vibration measurement step of measuring the trainee's nasal vibration; a neck vibration measurement step of measuring the trainee's neck vibration; a similarity comparison and evaluation step of comparing and evaluating the similarity between the sound parameter set of the model utterance used in training and the measured values obtained from the trainee; and a comparison display step of displaying the result of the comparison and evaluation step.

26. A synthesis-based conversation training method comprising: an input step of inputting, as text, a conversation to be trained; an editing step of converting the input conversation into a phoneme set for use in training; a sound parameter synthesis step of converting the converted phoneme set into a sound parameter set that characterizes the model speech of the conversation to be trained; an articulatory parameter synthesis step of converting the converted sound parameter set into an articulatory parameter set describing the movements of the vocal organs required for the utterance; and a movement display step of displaying the movements of the vocal organs based on the converted articulatory parameter set.

27. The synthesis-based conversation training method according to claim 26, further comprising: a synthesized speech conversion step of converting the sound parameter set into a synthesized speech output; a speech creation step of creating, from the converted synthesized speech output, speech that a hearing-impaired person can hear; an utterance measurement step of measuring the trainee's tongue contact pattern, speech sound, exhaled airflow, nasal vibration, and neck vibration when the trainee utters the conversation corresponding to the input text to be trained; a comparison step of comparing and evaluating the similarity between the sound parameter set of the model utterance used in training and the measured values of the sound parameter set when the trainee utters; and a comparison display step of displaying the result of the comparison and evaluation in the comparison step.

28. A synthesis-based conversation training method comprising: an input step of inputting text in a foreign language to be learned; an editing step of converting the input text into a phoneme set; a sound parameter set synthesis step of converting the converted phoneme set into a sound parameter set that characterizes the utterance of the input text in the foreign language to be learned; an articulatory parameter synthesis step of converting the converted sound parameter set into an articulatory parameter set; and a movement display step of displaying the movements of the vocal organs based on the converted articulatory parameter set.

29. The synthesis-based conversation training method according to claim 28, further comprising: a synthesized speech output step of converting the sound parameters into a synthesized speech output; a foreign language speech creation step of creating, from the converted synthesized speech output, foreign-language speech that the trainee can hear; a foreign language utterance measurement step of measuring the trainee's tongue contact pattern, speech sound, exhaled airflow, nasal vibration, and neck vibration when the trainee utters the text to be learned in the foreign language; a comparison step of comparing and evaluating the similarity between the sound parameter set of the input text and the values measured when the trainee utters; and a comparison display step of displaying the result of the comparison and evaluation in the comparison step.