JPH04106596A

JPH04106596A - Voice recognition and output device

Info

Publication number: JPH04106596A
Application number: JP2222562A
Authority: JP
Inventors: Toru Miyamae; 徹宮前; Kenichi Hirayama; 健一平山
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1990-08-27
Filing date: 1990-08-27
Publication date: 1992-04-08

Abstract

PURPOSE: To greatly reduce the speaking burden on the operator by obtaining an output with a large amount of information from the voice input of a key word alone. CONSTITUTION: A voice recognition part 2 recognizes a voice inputted through a microphone 1 by comparing it with a standard pattern. A reader 4 receives the recognition result 3 from the voice recognition part 2 and searches for the key word stored in a memory 5. The memory 5 stores the addresses of the voice synthesis patterns held in a voice synthesis pattern memory 6. In operation, a key word A is inputted to the recognition part 2 through the microphone 1. The reader 4 then searches the memory 5, which stores the correspondence between key words and the addresses of the respective voice synthesis patterns, and reads the address of the voice synthesis pattern corresponding to the key word recognized by the recognition part 2. The data of the voice synthesis pattern at that address is sent to a voice synthesis part 7, and a synthesized voice 9 of a long sentence is outputted from a speaker.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a speech recognition output device that has a speech recognition function and outputs information corresponding to the recognition result.

(Prior Art) Conventionally, a speech recognition device has generally been connected to an output device such as a word processor or a speech synthesizer, and has produced output in the form of text or speech according to the content of the recognized speech.

FIG. 2 is a diagram showing a conventional speech recognition processing flow.

First, when the analog speech signal 20 is input to the speech recognition device 30, it is amplified with an appropriate gain by the microphone amplifier 21. The analog low-pass filter 22 then band-limits the signal to prevent aliasing distortion. The analog speech signal processed in this way is converted into a digital speech signal by the A/D converter 23. Frequency analysis by the band-pass filter 24 then converts it into a spectrum, which is the final representation of the speech data. The power of the speech signal is also calculated at this point.
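The power calculation mentioned above can be illustrated with a minimal sketch. It assumes the digitized signal is a list of floats split into non-overlapping frames; the function name `frame_power`, the 8 kHz sampling rate, and the 160-sample frame length are illustrative assumptions, not values given in the patent. The band-pass filter analysis itself is not modeled here.

```python
import math


def frame_power(samples, frame_len=160):
    """Return the mean squared amplitude of each non-overlapping frame.

    frame_len=160 is an assumed value (20 ms at an assumed 8 kHz rate).
    """
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


# Hypothetical test signal: 0.1 s of silence followed by 0.1 s of a 440 Hz tone.
fs = 8000
signal = [0.0] * 800 + [math.sin(2 * math.pi * 440 * n / fs) for n in range(800)]
powers = frame_power(signal)

# Silent frames have zero power; tone frames have power close to 0.5.
for index, power in enumerate(powers):
    print(index, round(power, 3))
```

A per-frame power sequence of this kind is the sort of information a speech section detector can use to separate speech from silence.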

For the speech converted into a spectrum as described above, the speech section detector 25, which detects the speech section using power information and the like, determines the start and end points of the true speech section. The matching section 27 then matches the spectral data within the true speech section against the standard patterns 26 prepared in advance. This matching yields, for each category, a distance value representing the degree of dissimilarity. The final recognition result 28 is determined from the magnitudes of these distance values and is output.
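The two steps described above, endpoint detection from power and selection of the category with the smallest distance, can be sketched as follows. This is a simplified illustration under stated assumptions: a fixed power threshold, a single feature vector per utterance, and a squared Euclidean distance. The patent does not specify the detection rule or the distance measure, and all names and values below are hypothetical.

```python
def detect_voice_section(powers, threshold):
    """Return (start, end) frame indices of the span above threshold, or None."""
    active = [i for i, p in enumerate(powers) if p > threshold]
    if not active:
        return None
    return active[0], active[-1]


def distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(features, standard_patterns):
    """Return the category with the smallest distance, plus all distances."""
    distances = {cat: distance(features, pat)
                 for cat, pat in standard_patterns.items()}
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical per-frame powers: speech occupies frames 2 to 4.
powers = [0.0, 0.01, 0.6, 0.8, 0.5, 0.02, 0.0]
section = detect_voice_section(powers, threshold=0.1)

# Hypothetical standard patterns, one feature vector per category.
standard_patterns = {
    "coffee": [1.0, 0.0, 0.5],
    "tea": [0.0, 1.0, 0.5],
}
best, distances = recognize([0.9, 0.1, 0.4], standard_patterns)
print(section, best)
```

A practical recognizer would compare sequences of spectra rather than a single vector, but the decision rule is the same: the category with the smallest dissimilarity is chosen.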

Next, the recognition result is sent to the output device 29, which executes the processing intended by the user, and output is produced according to the recognition result. For example, a word processor converts the recognized word into a character string and displays it on a display, or outputs the characters to a printer or similar device. A speech translator translates the recognized word into another language and then outputs the translated word as speech.

(Problems to be Solved by the Invention) However, the conventional technology described above has the following problem.

Namely, in an output device using speech recognition, the input speech and the output carry the same amount of information. In other words, to obtain an output with a large amount of information, such as a long sentence or a guidance message, the speaker must utter correspondingly more, which increases the burden on the speaker.

The present invention has been made in view of the above. Its object is to eliminate the problem that, in an output device using speech recognition, the input speech and the output carry the same amount of information and the speaking burden is therefore large, by providing a device with excellent operability and a low speaking burden that can associatively output content with a large amount of information from the input of keywords alone.

(Means for Solving the Problems) The speech recognition output device of the present invention comprises a speech recognition unit that recognizes speech and a memory that stores information associating keywords with output content, and is characterized in that the output content corresponding to the keyword recognized by the speech recognition unit is output from the memory.

(Operation) In the speech recognition output device of the present invention, speech is input in the form of keywords. The operator therefore does not need to utter the full output content, such as a sentence or a guidance message. As a result, the operability of the device is improved.

(Embodiment) FIG. 1 is a block diagram showing an embodiment of the speech recognition output device of the present invention.

The illustrated device comprises a microphone 1, a speech recognition unit 2, a reader 4, a memory 5, a speech synthesis pattern memory 6, a speech synthesis unit 7, and a speaker 8.

The microphone 1 is used by the operator to input speech.

The speech recognition unit 2 recognizes the speech input through the microphone 1 by comparing it with standard patterns.

The reader 4 receives the recognition result 3 from the speech recognition unit 2 and searches for the keyword stored in the memory 5.

The memory 5 stores a speech synthesis pattern address for each keyword. Each such address is the address of a speech synthesis pattern stored in the speech synthesis pattern memory 6.

The speech synthesis pattern memory 6 is a memory that stores the speech synthesis patterns, which are the output content.

The speech synthesis unit 7 synthesizes speech from a speech pattern.

The speaker 8 outputs the speech synthesized by the speech synthesis unit 7.

Next, the operation of the device described above will be explained.

First, a keyword A is input to the speech recognition unit 2 through the microphone 1. Next, the reader 4 searches the memory 5, which records the correspondence between keywords and the addresses of the speech synthesis patterns, and reads out the address of the speech synthesis pattern corresponding to the keyword recognized by the speech recognition unit 2. The data of the speech synthesis pattern at that address is then sent to the speech synthesis unit 7, and synthesized speech 9 forming a long sentence is output from the speaker 8.
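The two-stage lookup in this operation, from keyword to address and from address to speech synthesis pattern, can be sketched as follows. Plain text stands in for the speech synthesis pattern data, and the table contents, addresses, and function names are hypothetical illustrations rather than anything specified in the patent.

```python
# Stand-in for memory 5: keyword -> address of a speech synthesis pattern.
# Keywords and addresses are hypothetical.
KEYWORD_TO_ADDRESS = {
    "coffee": 0x0010,
    "station": 0x0020,
}

# Stand-in for speech synthesis pattern memory 6: address -> pattern data.
# Plain sentences are used here in place of real synthesis pattern data.
SYNTHESIS_PATTERNS = {
    0x0010: "I would like a cup of coffee, please.",
    0x0020: "Could you tell me the way to the nearest station?",
}


def read_address(recognized_keyword):
    """Role of reader 4: look up the pattern address for a recognized keyword."""
    return KEYWORD_TO_ADDRESS.get(recognized_keyword)


def fetch_pattern(recognized_keyword):
    """Return the pattern data for a keyword, or None if the keyword is unknown."""
    address = read_address(recognized_keyword)
    if address is None:
        return None
    return SYNTHESIS_PATTERNS[address]


# A one-word input yields a full sentence to hand to the synthesis stage.
print(fetch_pattern("coffee"))
```

Keeping the address table separate from the pattern data mirrors the split between memory 5 and speech synthesis pattern memory 6: several keywords could point at the same stored pattern without duplicating it.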

Next, a second embodiment will be described.

In the first embodiment only a keyword is input. The second embodiment is characterized in that, in addition to the keyword, the situation in which the keyword is used can also be input. This makes it possible to produce several different outputs for a single keyword, depending on the situation.

FIG. 3 is a block diagram showing the second embodiment of the present invention.

The illustrated device comprises a speech recognition unit 11, a knowledge database 15, and a speech synthesis unit 17.

The speech recognition unit 11 recognizes the input speech by comparing it with standard patterns.

The knowledge database 15 consists of a memory that stores speech synthesis patterns in association with keywords, a reader that searches for the keywords stored in that memory, and so on.

The speech synthesis unit 17 synthesizes speech from a speech pattern.

First, the spoken word "coffee" is input to the speech recognition unit 11 and recognized. Next, one of three situations is selected: "shopping" 12, "restaurant" 13, or "break" 14. Any method may be used to select the situation; push buttons and voice input are possible examples. The situation need not be input only once; it may be input several times to handle more complex situations. When a keyword is input after the situation has been entered, the knowledge database is searched. This search finds the address of the speech synthesis pattern best suited to the selected situation for the input keyword. The speech synthesis pattern 16 found by the search is then sent to the speech synthesis unit 17 and output from the speaker. For example, the output is "How much is the coffee?" when "shopping" is selected, "A cup of coffee, please." for "restaurant", and "Let's go get coffee." for "break".
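The situation-dependent lookup of the second embodiment can be sketched as a table keyed by both keyword and situation. The sentences follow the coffee example in the text; the table layout, the name `KNOWLEDGE_DB`, and the function `estimate_sentence` are illustrative assumptions, since the patent does not describe the internal structure of the knowledge database.

```python
# Stand-in for knowledge database 15: (keyword, situation) -> output sentence.
# Plain text is used in place of speech synthesis pattern data.
KNOWLEDGE_DB = {
    ("coffee", "shopping"): "How much is the coffee?",
    ("coffee", "restaurant"): "A cup of coffee, please.",
    ("coffee", "break"): "Let's go get coffee.",
}


def estimate_sentence(keyword, situation):
    """Return the sentence for a keyword in a given situation, or None."""
    return KNOWLEDGE_DB.get((keyword, situation))


# The same keyword produces a different sentence for each situation.
for situation in ("shopping", "restaurant", "break"):
    print(situation, "->", estimate_sentence("coffee", situation))
```

Handling several situation inputs, as the text allows, would amount to extending the key with further situation fields.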

As described above, in the second embodiment a single keyword can produce an output suited to the situation.

As a further development of the second embodiment, inputting a plurality of keywords to output a longer sentence is also conceivable.

Although a speech synthesis translator has been given here as an example of the output device, various other applications, such as word processors and display devices, are also conceivable.

(Effects of the Invention) As explained above, according to the speech recognition output device of the present invention, an output with a large amount of information can be obtained simply by inputting the speech of a keyword, so the burden on the operator can be greatly reduced.

Furthermore, by adding a function that makes judgments according to the situation, a sentence can be estimated from the spoken keyword and output. As a result, the range of uses of the device for a single keyword can be expanded.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing a first embodiment of the present invention, FIG. 2 is a block diagram showing the configuration of a conventional speech recognition device, and FIG. 3 is a block diagram showing a second embodiment of the present invention. 1...Microphone, 2...Speech recognition unit, 4...Reader, 5...Memory, 6...Speech synthesis pattern memory, 7...Speech synthesis unit, 8...Speaker.

Claims

[Scope of Claims] 1. A speech recognition output device comprising a speech recognition unit that recognizes speech and a memory that stores information associating keywords with output content, characterized in that the output content corresponding to the keyword recognized by the speech recognition unit is output from the memory. 2. The speech recognition output device according to claim 1, wherein semantic information on the keyword is retrieved from a knowledge database, and the output content is estimated from the keyword according to a specified situation.