JPH02129686A

JPH02129686A - Conversation aid apparatus

Info

Publication number: JPH02129686A
Application number: JP63273874A
Authority: JP
Inventors: Kenji Kurono; 黒野　健治
Original assignee: ROEHM PROPERTIES BV
Current assignee: ROEHM PROPERTIES BV
Priority date: 1988-10-28
Filing date: 1988-10-28
Publication date: 1990-05-17
Anticipated expiration: 2012-08-06
Also published as: JP2638151B2

Abstract

PURPOSE: To make it possible to use words or conversational sentences that are not stored in memory, by synthesizing the voice data held in a voice storage means with voice data selected by a selection means into a single series of voice data and outputting it as voice. CONSTITUTION: Conversation data such as stock phrases are prepared as a conversation data group M4. When the operator instructs output of a desired conversation through an instruction input means M3, a selection means M5 selects the voice data of that conversation from the conversation data group M4. A synthesis means M6 combines this voice data with voice data obtained directly from the operator's voice, which has been stored in a voice storage means M2 via a voice input means M1, into a series of voice data. A voice output means M7 then utters this series of voice data. Thus, merely by speaking the words that are not stored in memory, the operator can have the desired complete sentence uttered automatically and smoothly, combined with the sentence held in memory.

Description

[Detailed Description of the Invention]

Object of the Invention

[Field of Industrial Application] The present invention relates to a conversation aid device, and in particular to an aid device for foreign-language conversation that lets the user converse in a way suited to the situation at hand with little knowledge of the language, or converse in the user's own voice.

[Prior Art] Conventionally, there are pronunciation practice devices in which, when a predetermined word is entered from a keyboard, the utterance corresponding to that word is synthesized by an electronic circuit and presented (Japanese Patent Laid-Open No. 62-36685). There are also devices that output predetermined sentences as voice for use as messages (Japanese Patent Laid-Open No. 62-40524), and devices that store a human voice electrically in semiconductor memory and reproduce it later (Japanese Patent Laid-Open No. 62-55698).

[Problems to be Solved by the Invention] However, these devices merely pronounce the conversation or word the operator asks for, or record and play back sound in place of a tape recorder. They are therefore difficult to use for words or conversational sentences that are not held in memory, and they cannot utter a conversational sentence stored in the device in the operator's own voice.

Structure of the Invention

The present invention solves this problem. Its object is to provide a conversation aid device that makes full use of the knowledge held in the device's memory while also being able to output, as voice, words and conversational sentences that are not in the device's memory, and further to provide a conversation aid device that outputs the conversation in the operator's own voice.

[Means for Solving the Problems] The configuration for solving the above problems is as follows. The conversation aid device of the first invention, as illustrated in FIG. 1(A), is characterized by comprising: a voice input means M1; a voice storage means M2 that stores the voice input from the voice input means M1; an instruction input means M3; a selection means M5 that, based on an instruction input from the instruction input means M3, selects the voice data of a predetermined conversation from a conversation data group M4 consisting of voice data; a synthesis means M6 that synthesizes the voice data in the voice storage means M2 and the voice data selected by the selection means M5 into a series of voice data; and a voice output means M7 that outputs the voice data synthesized by the synthesis means M6 as voice.

The conversation aid device of the second invention, as illustrated in FIG. 1(B), is characterized by comprising: a voice input means M11; a voice storage means M12 that stores the voice input from the voice input means M11; an instruction input means M13; a selection means M15 that, based on an instruction input from the instruction input means M13, selects predetermined conversation data from a conversation data group M14; a characteristic extraction means M16 that extracts the characteristics of the voice pattern input from the voice input means M11; a data conversion means M17 that converts the conversation data selected by the selection means M15 into voice data having the voice pattern characteristics extracted by the characteristic extraction means M16; a synthesis means M18 that synthesizes the voice data in the voice storage means M12 and the voice data converted by the data conversion means M17 into a series of voice data; and a voice output means M19 that outputs the voice data synthesized by the synthesis means M18 as voice.

[Operation] First invention: Conversation data such as stock phrases are prepared as a conversation data group M4. When the operator uses the instruction input means M3 to instruct output of a desired conversation, the selection means M5 picks the voice data of that conversation out of the conversation data group M4. In this voice data, the portion to be spoken by the operator is left blank. The synthesis means M6 combines this voice data with voice data obtained directly from the operator's voice, which has been stored in the voice storage means M2 via the voice input means M1, into a series of voice data. This series of voice data is uttered from the voice output means M7.

In other words, the portion that the operator entered by voice and the conversation held in the conversation data group are combined into a single sentence and output as a complete conversational sentence.

Second invention: Conversation data such as stock phrases are prepared as a conversation data group M14. When the operator uses the instruction input means M13 to instruct output of a desired conversation, the selection means M15 picks the desired conversation data out of the conversation data group M14. As in the first invention, the voice input from the voice input means M11 is stored in the voice storage means M12.

In the second invention, the characteristic extraction means M16 extracts the pattern characteristics of the voice input from the voice input means M11. The data conversion means M17 then converts the selected conversation data into voice data having those pattern characteristics. As in the first invention, the synthesis means M18 combines this converted voice data with the voice data stored in the voice storage means M12 into a series of voice data, which is uttered from the voice output means M19.

In the case of the second invention, the entire synthesized conversational sentence is output in the operator's voice or in a voice approximating the operator's.

[Embodiment] FIG. 2 shows an embodiment of the conversation aid device of the first invention. This device is configured as an English, French, and Spanish conversation aid.

The conversation aid device 1 is card-shaped. On one face, instruction input keys 3 are arranged, together with a liquid crystal panel (LCD) 5 for display, a speaker 7 for voice output, and a microphone 9 for voice input.

As shown in the block diagram of FIG. 3, the device is built internally as a microcomputer. Its main part comprises a CPU 11, a ROM 13, and a RAM 15, together with an input circuit 17 for the keys 3 and the microphone 9 and an output circuit 19 for the liquid crystal panel 5 and the speaker 7. These components are connected by an address bus and a data bus (not shown) so that they can exchange signals.

The flowchart of FIG. 4 shows the processing of the conversation aid device 1 in an embodiment of the first invention. This processing is executed repeatedly once the battery 21 has been installed in the conversation aid device 1.

When the processing starts, initialization is performed first, and the initial values of the various variables and flags are set (step 110). The device then waits for key input (step 120). When any of the keys 3 is pressed, the key is identified (step 130).

When the French key 3b is pressed, the French sentence file stored in the ROM 13 or the RAM 15 is set as the target of sentence selection (step 135). At this time the message "French conversation has been set." is displayed on the LCD 5 (step 137).

Next, when the sentence selection key 3e is pressed, one sentence in the selected language, in this case French, is read from the head of the file and displayed (step 140). If this is not the desired sentence, pressing the next candidate selection key 3h (step 150) displays the next sentence (step 140). Once the desired sentence is displayed, pressing the confirm key 3d causes the voice data corresponding to the confirmed sentence to be copied from the ROM 13 to a predetermined address in the RAM 15 (step 160).
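The selection loop of steps 140 to 160 can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the `Phrase` record, the `audio_addr` field, the key names, and the wrap-around at the end of the file are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str        # sentence shown on the LCD, with $slot$ markers
    audio_addr: int  # hypothetical ROM address of the sentence's voice data


def select_phrase(phrases, key_presses):
    """Start at the head of the file; 'next' advances, 'confirm' picks."""
    index = 0  # step 140: the first sentence in the file is displayed
    for key in key_presses:
        if key == "next":  # step 150: next candidate selection key
            index = (index + 1) % len(phrases)  # wrap-around is an assumption
        elif key == "confirm":  # confirm key: sentence is fixed (step 160)
            return phrases[index]
    return None  # no sentence confirmed yet


french = [
    Phrase("Combien y a-t-il d'ici à $place$?", 0x1000),
    Phrase("Que prendrez-vous, $thing$ ou $thing$?", 0x1400),
]
chosen = select_phrase(french, ["next", "confirm"])
print(chosen.text)
```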

For example, suppose that several conversational sentences are displayed on the LCD 5 one after another, such as "Combien y a-t-il d'ici à $place$?" and "Que prendrez-vous, $thing$ ou $thing$?". If the sentence "Que prendrez-vous, $thing$ ou $thing$?" is selected from among them, the voice data for this sentence is stored in the RAM 15. This voice data may consist of parameters obtained by any of various analysis-synthesis coding methods, or it may be voice data of the record-and-playback type. Note that the two "$thing$" markers in the sentence are insertion points for voice data and are not themselves converted into voice data.

Next, the voice input prompt "Press the voice input key, then speak." is displayed on the LCD 5 (step 170).

When the voice input key 3f is pressed, voice can be input from the microphone 9 and stored (step 180). The waveform of the voice input from the microphone 9 is converted into digital values by an A/D converter in the input circuit 17 and stored in a buffer set up in the RAM 15. For example, if the operator says "du lait", that pronunciation data is stored in the buffer. This voice data corresponds to the first "$thing$" in the conversational sentence above.

If the voice input key 3f is then pressed again (step 184), the voice input and storage processing of step 180 is repeated as a correction of the previous input.

When the confirm key 3d is pressed (step 186), it is determined whether the number of inputs has reached the predetermined number (step 188). There are two "$thing$" markers, so one more voice input is required; the determination is negative and the processing moves to the second voice input and storage (step 180). The voice data corresponding to the second "$thing$" in the conversational sentence is input and stored in the same way. For example, if the operator says "de la crème", it is stored in the buffer as the second pronunciation data.
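The input loop of steps 180 to 188 (record, optionally re-record, confirm, and repeat until every marker has a recording) can be sketched as below. The event tuples and the function name are hypothetical stand-ins for key presses and microphone captures; the patent does not specify such an interface.

```python
def collect_recordings(events, required):
    """Gather one confirmed recording per insertion marker.

    events: iterable of (kind, payload) pairs, where kind is "voice"
            (payload = captured samples) or "confirm" (payload ignored).
    required: number of $slot$ markers in the chosen sentence.
    """
    recordings = []
    current = None
    for kind, payload in events:
        if kind == "voice":
            # Steps 180/184: pressing the voice key again replaces the last take.
            current = payload
        elif kind == "confirm" and current is not None:
            # Step 186: the confirm key fixes the current take.
            recordings.append(current)
            current = None
            # Step 188: stop once the predetermined count is reached.
            if len(recordings) == required:
                break
    return recordings


events = [
    ("voice", b"du lai"),       # first attempt
    ("voice", b"du lait"),      # corrected attempt overwrites it
    ("confirm", None),
    ("voice", b"de la creme"),
    ("confirm", None),
]
print(collect_recordings(events, required=2))
```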

Next, the two pieces of voice data stored at step 180 are inserted at the two "$thing$" positions in the sentence voice data and synthesized into a single sentence (step 210). That is, voice data corresponding to the sentence "Que prendrez-vous, du lait ou de la crème?" is synthesized.
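The insertion synthesis of step 210 amounts to filling each marker position in the stored sentence with the matching recording, in order. A minimal sketch follows, assuming the sentence voice data is held as a list of byte segments with a sentinel object at each marker; that representation, and the names `SLOT` and `synthesize`, are assumptions made for illustration.

```python
SLOT = object()  # sentinel standing in for a "$thing$" insertion point


def synthesize(template, recordings):
    """Splice operator recordings into the slots of a stored sentence."""
    slot_count = sum(1 for segment in template if segment is SLOT)
    if slot_count != len(recordings):
        raise ValueError(
            f"sentence has {slot_count} slots but {len(recordings)} recordings"
        )
    fills = iter(recordings)
    # Stored segments pass through unchanged; each slot takes the next recording.
    return b"".join(next(fills) if segment is SLOT else segment
                    for segment in template)


# Placeholder bytes stand in for real voice data.
template = [b"<Que prendrez-vous>", SLOT, b"<ou>", SLOT]
recordings = [b"<du lait>", b"<de la creme>"]
print(synthesize(template, recordings))
```

Checking the slot count before splicing mirrors the count test at step 188: synthesis only proceeds once every marker has a recording.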

Next, a voice output prompt is displayed (step 220), and the device waits for key input (step 120).

When the voice output key 3g is then pressed, the voice data synthesized at step 210 is output from the speaker 7 (step 230).

That is, the utterance "Que prendrez-vous, du lait ou de la crème?" is output from the speaker 7.

Thus, merely by speaking the intended words into the device, the operator has the desired sentence uttered automatically and smoothly.

In this embodiment, the insertion synthesis of step 210 need not be performed as a separate step. As an alternative form of synthesis, the segments may simply be output in the order in which they are to be uttered when output takes place at step 230.

That is, the voice data for "Que prendrez-vous" may be fetched from the ROM 13 and uttered, then the voice data for "du lait" fetched from the RAM 15 and uttered, then the voice data for "ou" fetched from the ROM 13 and uttered, and finally the voice data for "de la crème" fetched from the RAM 15 and uttered. With this method of utterance, step 160 need not store the voice data itself; it is enough to store only the memory addresses at which the voice data of the sentence is held.
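This alternative, in which segments are fetched in utterance order rather than copied into one buffer, can be sketched with a playback plan of address references. The `(source, start, length)` tuple layout and the toy memory contents are assumptions for illustration only.

```python
def playback_order(rom, ram, plan):
    """Yield voice segments in utterance order without building one buffer.

    plan: list of (source, start, length) references, where source is
          "ROM" (stored sentence parts) or "RAM" (operator recordings).
    """
    for source, start, length in plan:
        memory = rom if source == "ROM" else ram
        yield memory[start:start + length]


# Toy memories: letters stand in for voice samples.
rom = b"QUEPRENDREZVOUS" + b"OU"
ram = b"dulait" + b"delacreme"

plan = [
    ("ROM", 0, 15),   # "Que prendrez-vous"
    ("RAM", 0, 6),    # "du lait"
    ("ROM", 15, 2),   # "ou"
    ("RAM", 6, 9),    # "de la crème"
]
for segment in playback_order(rom, ram, plan):
    print(segment)
```

Only the references are stored, which is the saving the passage describes: the sentence's voice data stays in ROM and is never duplicated in RAM.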

In the above embodiment, the microphone 9 corresponds to the voice input means M1, the RAM 15 to the voice storage means M2, the next candidate selection key 3h to the instruction input means M3, the CPU 11 to the selection means M5 and the synthesis means M6, and the speaker 7 to the voice output means M7. Within the processing of the CPU 11, steps 140 and 150 correspond to the processing of the selection means M5, and step 210 corresponds to the processing of the synthesis means M6.

The flowchart of FIG. 5 shows the processing of the conversation aid device in an embodiment of the second invention. The hardware configuration of this embodiment is the same as that of the embodiment of the first invention shown in FIGS. 2 and 3, so its description is omitted. This processing is executed repeatedly once the battery 21 has been installed in the conversation aid device.

When the processing starts, initialization is performed first, and the initial values of the various variables and flags are set (step 310). The device then waits for key input (step 320). When any of the keys 3 is pressed, the key is identified (step 330).

When the English key 3a is pressed, the English sentence file stored in the ROM 13 or the RAM 15 is set as the target of sentence selection (step 335). At this time the message "English conversation has been set." is displayed on the LCD 5 (step 337).

Next, when the sentence selection key 3e is pressed, one sentence in the selected language, in this case English, is read from the head of the file and displayed (step 340). If this is not the desired sentence, pressing the next candidate selection key 3h (step 350) displays the next sentence (step 340). Once the desired sentence is displayed, pressing the confirm key 3d causes the voice data corresponding to the confirmed sentence to be copied from the ROM 13 to a predetermined address in the RAM 15 (step 360).

For example, suppose that several conversational sentences are displayed on the LCD 5 one after another, such as "Where can I get $thing$?" and "Would you page $person$?". If the sentence "Would you page $person$?" is selected from among them, the voice data for this sentence is stored in the RAM 15. This voice data consists of parameters obtained by any of various analysis-synthesis coding methods.

Note that the "$person$" marker in the sentence is an insertion point for voice data and is not itself converted into voice data.

Next, the voice input prompt "Press the voice input key, then speak." is displayed on the LCD 5 (step 370).

When the voice input key 3f is pressed, voice can be input from the microphone 9 and stored (step 380). The waveform of the voice input from the microphone 9 is converted into digital values by an A/D converter in the input circuit 17 and stored in a buffer set up in the RAM 15. For example, if the operator says "Mr. Smith", that pronunciation data is stored in the buffer. This voice data corresponds to "$person$" in the conversational sentence above.

If the voice input key 3f is then pressed again (step 384), the voice input and storage processing of step 380 is repeated as a correction of the previous input.

When the confirm key 3d is pressed (step 386), it is determined whether the number of inputs has reached the predetermined number (step 388). Here voice data is needed for only one marker, "$person$", so a negative determination is made, and pattern analysis is then performed on the voice that was stored in the buffer at step 380 (step 390).

That is, parameters representing the characteristics of the speaker's voice are detected by an analysis-synthesis coding method such as the well-known PARCOR vocoder method or LSP vocoder method. Because the spectral envelope and the fundamental frequency (pitch) in particular express these characteristics, it is acceptable to capture only these two parameters.

Next, among the standard parameters of the sentence voice data stored at step 360, the spectral envelope parameters and the fundamental frequency (pitch) parameter are replaced with the parameters detected at step 390 (step 400). Of course, if other parameters were also detected at step 390, those may be replaced as well. In this way the voice data of the selected sentence is made to approximate the speaker's voice.
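The parameter replacement of step 400 can be pictured as overwriting selected fields of each analysis frame with the values measured from the operator's voice. The sketch below is deliberately simplified and rests on assumptions not found in the source: frames are plain dictionaries, the field names `envelope`, `pitch`, and `gain` are hypothetical, and the speaker's characteristics are reduced to one value per field. A real PARCOR or LSP vocoder would need a more careful mapping.

```python
def adapt_to_speaker(frames, speaker, replaced=("envelope", "pitch")):
    """Return new frames whose selected parameters come from the speaker.

    frames:   list of dicts of standard parameters for the stored sentence.
    speaker:  dict of parameters detected from the operator's recording.
    replaced: names of the parameters to overwrite (others are kept).
    """
    overrides = {name: speaker[name] for name in replaced}
    # Build new dicts so the stored standard parameters stay untouched.
    return [{**frame, **overrides} for frame in frames]


standard = [
    {"envelope": (0.9, -0.4), "pitch": 120.0, "gain": 0.8},
    {"envelope": (0.7, -0.2), "pitch": 118.0, "gain": 0.6},
]
operator = {"envelope": (0.5, 0.1), "pitch": 185.0}

adapted = adapt_to_speaker(standard, operator)
print(adapted[0])
```

Passing a longer `replaced` tuple corresponds to the remark that other detected parameters may be swapped in as well.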

Next, the voice data stored at step 380 is inserted at the "$person$" position in this approximated voice data and synthesized into a single sentence (step 410).

Next, a voice output prompt is displayed (step 420), and the device waits for key input (step 320).

When the voice output key 3g is then pressed, the voice data synthesized at step 410 is output from the speaker 7 (step 430).

That is, the utterance "Would you page Mr. Smith?" is output from the speaker 7.

Moreover, within this utterance, "Mr. Smith" is entirely in the speaker's own voice, and "Would you page" has a voice quality approximating the speaker's.

Thus, merely by speaking the intended words into the device, the operator has the desired sentence uttered automatically and smoothly in the operator's own voice.

As in the embodiment of the first invention, this embodiment need not perform the insertion synthesis of step 410 as a separate step; as an alternative form of synthesis, the segments may simply be output in the order in which they are to be uttered when output takes place at step 430.

In the above embodiment, the microphone 9 corresponds to the voice input means M11, the next candidate selection key 3h to the instruction input means M13, the CPU 11 to the selection means M15, the characteristic extraction means M16, the data conversion means M17, and the synthesis means M18, the RAM 15 to the voice storage means M12, and the speaker 7 to the voice output means M19. Within the processing of the CPU 11, steps 340 and 350 correspond to the processing of the selection means M15, step 390 to the processing of the characteristic extraction means M16, step 400 to the processing of the data conversion means M17, and step 410 to the processing of the synthesis means M18.

In each embodiment, a dictionary function may be provided for cases where the operator does not know the word to be pronounced: during the processing of steps 180 and 380, when Japanese is entered from separately provided kana keys, it is converted into English, French, or Spanish and shown on the LCD 5. The pronunciation may also be output from the speaker 7 at the same time. The operator then only has to read the display, or listen to the pronunciation, and enter the word by voice.

In the display processing of steps 140 and 340, the Japanese equivalent may also be displayed alongside the sentence in each language.

For example, "Que prendrez-vous, $thing$ ou $thing$?" is shown together with its Japanese rendering ("Which will you have, $thing$ or $thing$?"), and "Would you page $person$?" is shown together with its Japanese rendering ("Could you have $person$ paged?").

The device may also pronounce the sentence at the same time as it is displayed. This has the advantage that the operator can learn the correct pronunciation in advance.

Effects of the Invention

According to the first invention, merely by speaking the intended words into the device, the operator has the desired complete sentence uttered automatically and smoothly, combined with the sentence held in memory.

According to the second invention, the desired sentence is in addition uttered smoothly in the operator's own voice.

With a conversation aid device according to either invention, the necessary conversational sentence can be uttered immediately and smoothly. The speaker pronounces only part of the sentence and the device pronounces the rest accurately, so even if the speaker's pronunciation is poor, a foreign listener can understand it easily. In particular, with the conversation aid device of the second invention the whole sentence is unified in a voice quality close to the speaker's voice, so the listener hears nothing unnatural and understands even more easily.

[Brief explanation of drawings]

FIG. 1(A) is a diagram illustrating the basic configuration of the first invention; FIG. 1(B) is a diagram illustrating the basic configuration of the second invention; FIG. 2 is an external perspective view of the embodiments of the first and second inventions; FIG. 3 is a block diagram thereof; FIG. 4 is a flowchart of the processing in an embodiment of the first invention; and FIG. 5 is a flowchart of the processing in an embodiment of the second invention.

M1, M11 ... voice input means
M2, M12 ... voice storage means
M3, M13 ... instruction input means
M4, M14 ... conversation data group
M5, M15 ... selection means
M6, M18 ... synthesis means
M7, M19 ... voice output means
M16 ... characteristic extraction means
M17 ... data conversion means
3h ... next candidate selection key
7 ... speaker
9 ... microphone
11 ... CPU
13 ... ROM
15 ... RAM

Agent: Patent attorney 定立 勉 (and two others)

Claims

[Claims]

1. A conversation aid device comprising: a voice input means; a voice storage means that stores the voice input from the voice input means; an instruction input means; a selection means that, based on an instruction input from the instruction input means, selects the voice data of a predetermined conversation from a conversation data group consisting of voice data; a synthesis means that synthesizes the voice data in the voice storage means and the voice data selected by the selection means into a series of voice data; and a voice output means that outputs the voice data synthesized by the synthesis means as voice.

2. A conversation aid device comprising: a voice input means; a voice storage means that stores the voice input from the voice input means; an instruction input means; a selection means that, based on an instruction input from the instruction input means, selects predetermined conversation data from a conversation data group; a characteristic extraction means that extracts the characteristics of the voice pattern input from the voice input means; a data conversion means that converts the conversation data selected by the selection means into voice data having the voice pattern characteristics extracted by the characteristic extraction means; a synthesis means that synthesizes the voice data in the voice storage means and the voice data converted by the data conversion means into a series of voice data; and a voice output means that outputs the voice data synthesized by the synthesis means as voice.