JPH0863185A

JPH0863185A - Speech recognition device

Info

Publication number: JPH0863185A
Application number: JP6199179A
Authority: JP
Inventors: Junichiro Fujimoto; 潤一郎藤本; Takashi Ariyoshi; 敬有吉; Tetsuya Muroi; 哲也室井; Masako Hirose; 雅子広瀬
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1994-08-24
Filing date: 1994-08-24
Publication date: 1996-03-08

Abstract

PURPOSE: To eliminate unnaturalness in a dialogue by displaying the meaningful words among the recognition results on a display in descending order of speech-recognition similarity, or in the reverse order, and allowing any of the displayed words to be selected without speaking. CONSTITUTION: The utterance 'PCM recorder' spoken by a speaker is input to the speech recognition device, and the meaningful word 'PCM' is recognized. As a result, the four words 'PC98', 'PCM', 'TQC' and 'DM' are displayed. Placing the cursor 4 on 'PCM' and clicking displays data on PCM from a database on the screen. Because the recognition results are arranged in descending order of reliability, the correct answer can be selected easily by moving the cursor 4 only slightly.

Description

Detailed Description of the Invention

[0001]

【Field of Industrial Application】 The present invention relates to a speech recognition device, and more particularly to a device that listens to words spoken in ordinary dialogue and automatically provides the speaker with necessary information.

[0002]

【Prior Art】 The present applicant previously proposed, as one application of speech recognition, a device that automatically provides information based on the conversation between speakers engaged in a dialogue. With this device, the user of speech recognition need not utter any special command for the speech recognition device; as long as the speakers carry on an ordinary conversation, information is automatically displayed to the speaker. In this respect the device is groundbreaking.

[0003]

【Problems to Be Solved by the Invention】 There is no problem when the speech recognition device works perfectly, but in practice it makes a considerable number of errors, and as a result the dialogue between the speakers becomes unnatural. In particular, when one speaker does not know that the device is being used, one speaker is in a superior position relative to the other, and it is the speaker in the inferior position who is using speech recognition, it is almost impossible for that speaker to utter the same word again in order to correct a recognition error. The information providing device previously proposed by the present applicant had this drawback.

[0004] To compensate for such imperfect speech recognition, there is a method in which several recognition candidates are presented, each candidate is numbered, and the user indicates the correct answer with a numeric keypad (Japanese Patent Laid-Open No. 1-154100). With this method, however, a person must match the candidates to their numbers, and doing so during a dialogue disrupts the dialogue.

[0005] There is also a method in which the speaker repeats the utterance when the recognition result is wrong (Japanese Patent Laid-Open No. 61-248198), but uttering the same word two or three times in succession during a dialogue has the drawback of making the other party uncomfortable.

[0006] The present invention has been made in view of the above circumstances. Its object is to provide, for a device that automatically provides information based on the conversation between speakers engaged in a dialogue, a method of compensating for the uncertainty of the speech recognition part, so that the device is easy to use and does not make the dialogue unnatural.

[0007]

【Means for Solving the Problems】 To solve the above problems, the present invention concerns a speech recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting the signal of one or more speakers from the conversation and applying it to the speech recognition device, the device displaying the recognition result and/or outputting a signal that operates a specific device according to the result. Items (1) to (5) below each presuppose this configuration. The invention is characterized in that:

(1) the meaningful words among the obtained recognition results are displayed on the display in descending order of speech-recognition similarity (ascending order of distance), or in the reverse order, and any of the displayed words can be selected without speaking; or

(2) the meaningful words among the obtained recognition results are converted into character strings and displayed on the display so that the characters nearest the beginning are in a predetermined order, or in the reverse order, and any of the displayed words can be selected without speaking; or

(3) when the meaningful words among the obtained recognition results contain numerals, the recognition results are displayed on the display so that the numerals nearest the beginning are in a predetermined order, and any of the displayed words can be selected without speaking; or

(4) when the meaningful words among the obtained recognition results contain both letters and numerals, the recognition results are ordered according to the above orderings with priority given to the characters nearer the beginning and are displayed on the display, and any of the displayed words can be selected without speaking; or

(5) when a meaningful word among the obtained recognition results consists only of numerals or only of letters, a character string paired with the numerals is found in separately stored information and is also displayed on the display as recognition result information, and any of the displayed words can be selected without speaking; or

(6) in any of (1) to (5), the cursor for result selection is placed on the candidate with the highest similarity when the candidates are displayed on the display, and any of the displayed words can be selected without speaking; or

(7) when a specific phrase is recognized that makes it not unnatural for the speaker to go on to utter words identical or similar to those uttered earlier, the previous recognition result is discarded or corrected; or

(8) a switch is provided, and the speaker controls speech input to the device with the switch; and further

(9) in (8), when the speech input control switch is turned on and off for less than a predetermined time, the preceding recognition result is discarded or corrected; and further

(10) in any of (1) to (9), information that has already been displayed is displayed again in response to a specific command; and further, in (10),

(11) the speech recognition result for a predetermined word is used as the specific command; or

(12) the on/off operation of the speech input switch is used as the specific command; and further, in any of (1) to (4),

(13) the candidates with high similarity are gathered in one part of the recognition candidates and displayed on the display, and any of the displayed words can be selected without speaking; or

(14) the cursor for result selection is placed on the candidate with the highest similarity when the candidates are displayed on the display, and any of the displayed words can be selected without speaking; and further

(15) in any of (9) to (12), no operation or computation is performed when a specific word is recognized.

[0008]

【Operation】 The speech recognition device displays the meaningful words among the obtained recognition results on the display in descending order of speech-recognition similarity (ascending order of distance), or in the reverse order, so that any of the displayed words can be selected without speaking.

[0009]

【Example】

Example 1 (corresponding to claim 1): This example is a speech recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting the signal of one or more speakers from the conversation and applying it to the speech recognition device, the device displaying the recognition result and/or outputting a signal that operates a specific device according to the result. The meaningful words among the obtained recognition results are displayed on the display in descending order of speech-recognition similarity (ascending order of distance), or in the reverse order, and any of the displayed words can be selected without speaking.

[0010] FIG. 1 illustrates this example, a system in which speaker B provides information to speaker A over the telephone. The telephones of speakers A and B are connected by an ordinary public telephone line C, and the transmit signal from speaker B's telephone is applied to the speech recognition device 1. Speech recognizers often include a stage that extracts only the speech segments from the signal before recognition, but with word spotting the device can operate without such a segment detector. Where a segment detector is required, it is regarded here as part of the speech recognition unit. The speech recognition device 1 converts the uttered speech into an electrical signal and uses word spotting and a method based on dialogue templates to recognize whether the required words appear in it in the required order.

[0011] This recognition method is described in detail in, for example, the Proceedings of the Spring Meeting of the Acoustical Society of Japan, March 1993, 1-4-1, and the Proceedings of the 47th National Convention of the Information Processing Society of Japan (October 1993), (2) 2-369. The results obtained include nouns, particles, and the like. Since particles need not be displayed as recognition results, only the nouns are extracted, in order of similarity, and displayed on the display 2. Displaying the high-similarity items among speech recognition results is also described in, for example, Japanese Patent Laid-Open No. 3-173248, but that method displays every recognized result, not only nouns, which makes the display very hard to read. This example eliminates that drawback.

[0012] In FIG. 1, suppose speaker A says over the telephone, "Please tell me about the PCM recorder," and speaker B answers, "The PCM recorder, right?" Speaker B's utterance is input to the speech recognition device 1, and the meaningful word "PCM" is recognized. In this case the word with the most probable score is, for example, "PC98", as shown in FIG. 2, and the correct answer "PCM" comes second. The particle "de" and the like are also recognized, but particles are excluded, and the four words "PC98", "PCM", "TQC" and "DM" are displayed as recognition results, as shown in FIG. 2. The display is not limited to four words. Rather than fixing the number of words, it may be better to set a threshold on the similarity, that is, on the certainty of the result. A display in the style of a pull-down menu that can be operated with the mouse 3 is suitable. The user then places the cursor 4 on "PCM" and clicks, whereupon data related to PCM is retrieved from the database 3 and displayed on the screen.
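The candidate handling described for Example 1 (drop particles, keep nouns, order by similarity, optionally cut off at a threshold) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Candidate` type, the part-of-speech tags, the 0.0 to 1.0 similarity scale, the score values, and the extra candidate "PBX" are all assumptions introduced here; only the resulting order of the four displayed words comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str
    similarity: float      # assumed scale: 0.0 to 1.0, higher is more likely correct
    part_of_speech: str    # e.g. "noun" or "particle" (assumed tag set)


def rank_candidates(candidates, threshold=0.5):
    """Return nouns whose similarity reaches the threshold, best first."""
    nouns = [
        c for c in candidates
        if c.part_of_speech == "noun" and c.similarity >= threshold
    ]
    return sorted(nouns, key=lambda c: c.similarity, reverse=True)


# Illustrative scores only; the text gives the ranking, not the values.
recognized = [
    Candidate("PCM", 0.86, "noun"),
    Candidate("de", 0.90, "particle"),   # particle: never displayed
    Candidate("DM", 0.61, "noun"),
    Candidate("PC98", 0.88, "noun"),
    Candidate("TQC", 0.70, "noun"),
    Candidate("PBX", 0.30, "noun"),      # hypothetical low-score candidate
]

menu = [c.word for c in rank_candidates(recognized)]
print(menu)
```

Using a threshold rather than a fixed count means the menu shrinks when the recognizer is confident and grows when it is not, which is the trade-off the paragraph above suggests.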

[0013] According to this example, the recognition results are arranged in descending order of reliability, so the correct answer is easy to select by moving the cursor 4 only slightly. However, the correct answer is not always near the top of the candidates, and with items such as codes made up of a few letters of the alphabet, picking out the correct one in this example takes concentration. That undermines the naturalness of the dialogue, which is the original aim.

[0014] Example 2 (corresponding to claim 2): This example is a speech recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting the signal of one or more speakers from the conversation and applying it to the speech recognition device, the device displaying the recognition result and/or outputting a signal that operates a specific device according to the result. The meaningful words among the obtained recognition results are converted into character strings and displayed on the display so that the characters nearest the beginning are in a predetermined order, or in the reverse order, and any of the displayed words can be selected without speaking.

[0015] FIG. 3 shows an example using alphabetical (ABC) order. Parts that operate as in FIG. 1 bear the same reference numerals as in FIG. 1, and their description is omitted. Here the recognized nouns are arranged so that their leading letters are in ABC order. Finding the desired correct answer among the candidates is therefore not very difficult. FIG. 4 shows a display example with ABC ordering; besides the alphabet, kanji can likewise be arranged in the order of the Japanese syllabary (gojuon order) of their readings. This works well for letters and symbols; numerals are handled as in Example 3 below.
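A small sketch of the ordering in Example 2, assuming candidates are plain strings. The function names are hypothetical, and the readings supplied for the kanji words are sample data chosen for illustration. Note the stated limitation: comparing hiragana by code point only approximates true syllabary order.

```python
def order_alphabetically(words, reverse=False):
    """Order candidates by their letters, ignoring case (ABC order)."""
    return sorted(words, key=str.casefold, reverse=reverse)


def order_by_reading(entries, reverse=False):
    """Order (display_text, kana_reading) pairs by the reading.

    Comparing hiragana code points only approximates gojuon order
    (voiced marks and small kana are not treated specially).
    """
    ordered = sorted(entries, key=lambda entry: entry[1], reverse=reverse)
    return [text for text, _ in ordered]


print(order_alphabetically(["TQC", "PCM", "DM", "PC98"]))
print(order_by_reading([("録音", "ろくおん"), ("複写", "ふくしゃ"), ("印刷", "いんさつ")]))
```

Passing `reverse=True` gives the reverse ordering that the example also allows.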

[0016] Example 3 (corresponding to claim 3): This example is a speech recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting the signal of one or more speakers from the conversation and applying it to the speech recognition device, the device displaying the recognition result and/or outputting a signal that operates a specific device according to the result. When the meaningful words among the obtained recognition results contain numerals, the recognition results are displayed on the display so that the numerals nearest the beginning are in a predetermined order, and any of the displayed words can be selected without speaking. FIG. 5 shows a display example. In the illustrated case the entries are arranged in ascending order of their leftmost numerals, which makes selection as easy as in the preceding example.
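One way to read the numeric ordering of Example 3 is to sort by the value of the first run of digits in each candidate, so that "98" precedes "250" (a plain string sort would reverse them). The sketch below makes that assumption explicit; the function names and the sample values are hypothetical, and placing digit-free words last is a choice made here, not something the text specifies.

```python
import re


def first_number(word):
    """Return the first run of digits in the word as an int, or None."""
    match = re.search(r"\d+", word)
    return int(match.group()) if match else None


def order_by_number(words):
    """Ascending by first number; words without digits keep their order, last."""
    def key(word):
        number = first_number(word)
        return (number is None, number if number is not None else 0)
    return sorted(words, key=key)


print(order_by_number(["250", "98", "3100", "1200"]))
```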

[0017] Example 4 (corresponding to claim 4): This example is a speech recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting the signal of one or more speakers from the conversation and applying it to the speech recognition device, the device displaying the recognition result and/or outputting a signal that operates a specific device according to the result. When the meaningful words among the obtained recognition results contain both letters and numerals, the recognition results are ordered according to the above orderings with priority given to the characters nearer the beginning and are displayed on the display, and any of the displayed words can be selected without speaking.

[0018] FIG. 6 shows a display example. When letters and numerals are combined, the ordering gives priority to the characters nearer the left. In this example, the letters are first arranged in ABC order, and then the numerals are ordered. Because the sounds uttered first take priority, the desired entry becomes easier to find. As a compromise among these approaches, a speech recognition device such as that of Example 5 below is conceivable.
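The mixed ordering of Example 4 resembles what is commonly called a natural sort: each word is split into runs of letters and digits from the left, letter runs are compared alphabetically, digit runs numerically, and a letter run ranks ahead of a digit run in the same position. The sketch below is one possible reading under those assumptions; `mixed_key` and `order_mixed` are hypothetical names, and some sample words (for example "PC286" and "FX98") are made up for illustration.

```python
import re


def mixed_key(word):
    """Build a sort key from left-to-right runs of ASCII letters and digits."""
    key = []
    for run in re.findall(r"[A-Za-z]+|\d+", word):
        if run.isdigit():
            key.append((1, int(run), ""))       # digit run: compare by value
        else:
            key.append((0, 0, run.upper()))     # letter run: compare by ABC order
    return key


def order_mixed(words):
    """Leftmost runs decide first; letters rank ahead of numerals."""
    return sorted(words, key=mixed_key)


print(order_mixed(["PC98", "FX250", "DM", "PC286", "FX98"]))
```

Characters other than ASCII letters and digits are ignored by this key, so kana or kanji candidates would need a reading-based key instead.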

[0019] Example 5 (corresponding to claim 5): In this example, in the speech recognition device of any of Examples 1 to 4, the candidates with high similarity are gathered in one part of the recognition candidates and displayed on the display, and any of the displayed candidates can be selected without speaking. FIG. 7 shows this example. As illustrated, the results recognized by the speech recognition device 1 are arranged in order of similarity and stored in memory 1, then arranged in gojuon order and stored in memory 2; for display, the high-similarity candidates are added at the top of the gojuon-ordered list.

[0020] FIG. 8 shows the display on the display 2. The candidates are first arranged in gojuon order as in Example 2 (the entries from "key system" down), and the two candidates with the highest similarity ("copy" and "coffee") are placed at the top. Because high-similarity candidates are likely to be correct, in most cases the correct answer can be chosen from these two; even when it is not among them, it remains easy to find in the ordered list, as shown in Example 2. This technique is of course not limited to Example 2 and can be applied to the other methods as well. In this case the high-similarity words also appear in the gojuon-ordered list, but they may be removed from it. The number of high-similarity candidates shown need not be two; one, three, or four may be shown as needed. A similar approach is shown in Example 6 below.
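The two-list arrangement of Example 5 can be sketched as below. English glosses stand in for the Japanese words and plain alphabetical order stands in for gojuon order; the similarity values and the extra candidate "code" are invented for illustration, and `build_menu` is a hypothetical name.

```python
def build_menu(scored, top_n=2, drop_duplicates=False):
    """Put the top_n most similar words first, then the full ordered list.

    scored: list of (word, similarity) pairs.
    """
    by_similarity = sorted(scored, key=lambda pair: pair[1], reverse=True)  # "memory 1"
    by_reading = sorted(scored, key=lambda pair: pair[0])                   # "memory 2"
    top = [word for word, _ in by_similarity[:top_n]]
    rest = [
        word for word, _ in by_reading
        if not (drop_duplicates and word in top)
    ]
    return top + rest


# Illustrative scores; "code" is a hypothetical extra candidate.
scored = [("key system", 0.40), ("copy", 0.91), ("coffee", 0.85), ("code", 0.55)]
print(build_menu(scored))
print(build_menu(scored, drop_duplicates=True))
```

The `drop_duplicates` flag corresponds to the option of removing the high-similarity words from the ordered part, and `top_n` to showing one, three, or four of them instead of two.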

[0021] Example 6 (corresponding to claim 6): In this example, in the speech recognition device of any of Examples 1 to 4, the cursor for result selection is placed on the candidate with the highest similarity when the candidates are displayed on the display, and any of the displayed candidates can be selected without speaking. As in Example 3 (see FIG. 5), the candidates are arranged in descending order of the numerals, and, as shown in FIG. 9, the cursor 4 is positioned on the candidate with the highest similarity ("copy"). In most cases, therefore, the user need only press the selection key as is, and even when that candidate is wrong the correct answer is easy to select.

[0022] The configuration is the same as in FIG. 7. The single noun with the highest similarity is selected and stored in memory 1. The noun identical to the word in memory 1 is then located among the gojuon-ordered nouns in memory 2 and shown in reverse video when displayed, as in FIG. 9. The marking is of course not limited to reverse video; it suffices that the user can tell that the cursor 4 is on that word. Because its similarity is the highest, this word is the most likely to be correct, in which case the user simply presses the return key without doing anything else. When the word with the highest similarity is not correct, the user selects the correct one with the mouse and clicks.
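The cursor placement of Example 6 amounts to finding, in the ordered list, the position of the most similar candidate. A minimal sketch, assuming the ordered list and a word-to-similarity mapping are already available; the function names, the "> " marker that stands in for reverse video, and the score values are assumptions made here.

```python
def initial_cursor(ordered_words, similarity):
    """Index of the most similar word within the ordered list, or None if empty."""
    if not ordered_words:
        return None
    best = max(ordered_words, key=lambda word: similarity[word])
    return ordered_words.index(best)


def render(ordered_words, cursor):
    """Mark the cursor row with '> ' (a stand-in for reverse video)."""
    return [
        ("> " if index == cursor else "  ") + word
        for index, word in enumerate(ordered_words)
    ]


ordered = ["code", "coffee", "copy", "key system"]          # "memory 2"
scores = {"code": 0.55, "coffee": 0.85, "copy": 0.91, "key system": 0.40}
cursor = initial_cursor(ordered, scores)
print("\n".join(render(ordered, cursor)))
```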

[0023] Example 7 (corresponding to claim 7): This example is a speech recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting the signal of one or more speakers from the conversation and applying it to the speech recognition device, the device displaying the recognition result and/or outputting a signal that operates a specific device according to the result. When the obtained recognition result consists only of numerals or only of letters, a character string paired with the numerals is found in separately stored information and is also displayed on the display as recognition result information, and any of the displayed candidates can be selected without speaking.

[0024] This example addresses the case where, when several words are to be extracted from the uttered speech by spotting, not all of them can be extracted. Consider a case in which a combination of letters and numerals, such as FX250, is meaningful. As shown in FIG. 10, a relation table 6 that associates numerals with letters, indicating for example that for the word FX250 "FX" and "250" form a pair, is held in memory. When the recognition result is, for example, only "FX" or only "250" (step 1), the matching combination is found in table 6, the missing part is filled in (step 2), and the result is shown on the display 2. This not only allows information lost through a recognition error to be restored and displayed, but also yields a result when, for example, the speaker says something like "the FX-something machine". In this way the user can pick one of the candidates, but the correct answer may not be among them. Example 8 below addresses that case.
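The two steps of Example 7 (detect a lone fragment, then complete it from the relation table) might look like this in outline. The table contents beyond the FX/250 pair, the list-of-pairs representation, and the name `complete` are assumptions for illustration only.

```python
# Relation table 6: (letters, digits) pairs. Only ("FX", "250") is from the
# text; the second row is a hypothetical extra entry.
RELATION_TABLE = [("FX", "250"), ("PC", "98")]


def complete(fragment):
    """Step 1: match a lone fragment; step 2: fill in the missing half.

    Returns every full name whose letters or digits equal the fragment,
    or the fragment itself when the table has no match.
    """
    matches = [
        letters + digits
        for letters, digits in RELATION_TABLE
        if fragment in (letters, digits)
    ]
    return matches or [fragment]


print(complete("FX"))    # letters only were recognized
print(complete("250"))   # digits only were recognized
print(complete("PCM"))   # no pair in the table: shown unchanged
```

Returning a list leaves room for a fragment that pairs with several entries, in which case all completions would be offered as candidates.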

[0025] Example 8 (corresponding to claim 8): In this example, in the speech recognition device of any of Examples 1 to 4 or 7, one of the displayed recognition results is selected, and part of the selected letters or numerals is corrected by an input means other than speech. Candidates are produced by the methods already described; if the correct answer is not among them, the word that appears closest to the correct answer is selected, and its character string is corrected from the keyboard to give the result. A similar invention is disclosed in Japanese Patent Laid-Open No. 59-214899, in which only the doubtful digits of the recognition result are designated and spoken again; in the situation addressed by the present invention, however, speaking again has the drawback of making the other party uncomfortable. In this example, therefore, with the configuration shown in FIG. 11, a candidate is first selected from those displayed using the keyboard 7, its spelling is corrected using the keyboard 8, and the result is then transferred to the information retrieval unit.

[0026] Example 9 (corresponding to claim 9): In this example, in the speech recognition device of any of Examples 1 to 4 or 7, when a specific phrase is recognized that makes it not unnatural for the speaker to go on to utter words identical or similar to those uttered earlier, the previous recognition result is discarded or corrected. As noted in the discussion of the prior art, uttering the same word again after a recognition failure makes the dialogue unnatural. Here, therefore, the previous answer is rejected when the speaker utters a phrase that does not make the dialogue unnatural. Suitable phrases include "Let me say that once more" and "Allow me to repeat that"; when the recognition device recognizes such a phrase, the immediately preceding result is replaced with the word recognized next.

[0027] If voice recognition fails and no result is obtained at all, or if the result is a wildly wrong character string that cannot reasonably be corrected, the speaker may need to speak again. Repeating the same thing twice in such a case strikes the other party as both unnatural and unpleasant. For this reason, phrases such as "Let me say that once more" or "Allow me to repeat that" are registered as keywords; when one of them is recognized, the previous recognition result is negated and the device enters re-input mode.
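The keyword handling of Embodiment 9 can be pictured as a small state holder. The sketch below is an assumption-laden illustration: the class name `Dialogue`, its attributes, and the idea that the recognizer hands over whole phrases as strings are all invented for the example; only the two trigger phrases come from the text.

```python
# Trigger phrases named in the text ("Let me say that once more",
# "Allow me to repeat that").
RETRY_PHRASES = {"もう一度申し上げます", "繰り返させていただきます"}


class Dialogue:
    """Hypothetical holder for the current recognition result."""

    def __init__(self):
        self.result = None    # recognition result currently accepted
        self.reentry = False  # True while waiting for the replacement word

    def on_recognized(self, phrase):
        if phrase in RETRY_PHRASES:
            self.result = None   # negate the previous result
            self.reentry = True  # enter re-input mode
            return
        self.result = phrase     # the next word replaces the old result
        self.reentry = False
```

In this sketch the speaker never has to repeat a word without warning: the polite phrase itself is what clears the earlier result.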

[0028] Embodiment 10 (corresponding to claim 10). In this embodiment, a switch is added to any of Embodiments 1 to 4 or 7 so that the speaker controls voice input to the device with the switch. It is intended to eliminate the following drawback. Suppose a telephone operator is using the device and says, for example, "Yes, this is Ricoh, the OA (office automation) equipment company" and then "You would like Motofuji (本藤) of the General Affairs Department, is that right?" The recognition device is expected to connect the call to Motofuji. However, if there is an employee named Oe (大江), the pronunciation of "OA" (オーエー) in "OA equipment" is close to that of "Oe" (オオエ), so it is quite possible that Oe is returned as the recognition result. To prevent such errors, a switch is provided on the side of speaker A, and the operator uses it to activate the voice recognition device when repeating the person's name.
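As a rough illustration of the switch in Embodiment 10, the sketch below passes speech to the recognizer only while the switch is on. The function name `gate`, the tuple layout, and the English renderings of the operator's sentences are assumptions for the example.

```python
def gate(utterances):
    """Keep only speech captured while the switch was on.

    utterances: list of (text, switch_on) pairs, a hypothetical stand-in
    for the audio stream and the state of the switch.
    """
    return [text for text, switch_on in utterances if switch_on]


call = [
    # Greeting spoken with the switch off, so "OA" never reaches the recognizer.
    ("Yes, this is Ricoh, the OA equipment company", False),
    # Name repeated with the switch on.
    ("You would like Motofuji of the General Affairs Department?", True),
]
```

Here `gate(call)` returns only the second sentence, so the greeting cannot be misrecognized as the name Oe.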

[0029] FIG. 12 is a block diagram of the main part of one such embodiment. When the recognition device 1 is activated by the switch 9, the switch 9 can also be used to correct a recognition result without making the dialogue seem unnatural to the other party, as discussed above. Embodiment 11, described next, shows how this is done.

[0030] Embodiment 11 (corresponding to claim 11). In this embodiment, when the voice input control switch of Embodiment 10 is turned on and off within a time shorter than a predetermined time, the preceding recognition result is discarded or corrected. A related idea is described in, for example, Japanese Patent Laid-Open No. 3-278297: after a recognition result has been obtained and displayed, if no input arrives for a predetermined time or longer while the device waits for the next recognition, the result is treated as correct. With that method, a correct result is simply output after the fixed wait, but the method cannot be used to correct a wrong result. The present embodiment was devised against this background. Because a single meaningful utterance lasts at least about 0.5 seconds, when the switch is turned on and off in less than that time, the preceding recognition result is corrected or cancelled.

[0031] FIG. 13 illustrates Embodiment 11. As shown, a voice section detector 10 is provided after the switch 9. Rather than detecting the section of the voice signal itself, the detector determines the section from whether the electrical signal from the voice input microphone is flowing or has been cut off, and judges whether the section length is shorter than t (about 0.5 seconds). If the section is shorter, the previously recognized word currently shown on the display is erased; if it is longer, the voice is recognized as usual. After recognition, the result is treated as the information needed next: the previous information is erased and the current information is displayed.
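The decision made by the voice section detector 10 might be sketched as below. The function name `handle_section`, the use of timestamps in seconds, and the `recognize` callback are assumptions for illustration; only the threshold of about 0.5 seconds comes from the text.

```python
T_MIN = 0.5  # seconds; approximate minimum length of a meaningful utterance


def handle_section(on_time, off_time, recognize):
    """Return what the display should show after one on/off section.

    on_time, off_time: hypothetical timestamps (seconds) at which the
    microphone signal started and stopped flowing.
    recognize: hypothetical callback that runs normal recognition.
    """
    if off_time - on_time < T_MIN:
        return None         # short press: erase the word on display
    return recognize()      # normal press: show the newly recognized word
```

A press of 0.2 seconds clears the display, whereas a section of 1.4 seconds is recognized as usual and its result replaces whatever was shown before.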

[0032] Next, when information is provided by telephone, consecutive incoming calls sometimes request the same information. In such cases there is no need to repeat the error-prone recognition operation and select a result again; Embodiment 12 below can be used instead.

[0033] Embodiment 12 (corresponding to claim 12). In this embodiment, in the voice recognition device of Embodiments 1 to 7 or 7 to 11, information that has already been displayed is displayed again in response to a specific command. For example, typing "1" followed by "Return" brings the immediately preceding information back onto the screen unchanged. As shown in FIG. 14, after a voice has been recognized and its result displayed by the methods described so far, the displayed data is stored in memory 1. At that point the contents of memory 3 are erased, the contents of memory 2 are moved to memory 3, the contents of memory 1 are moved to memory 2, and the newest data is then written to memory 1. If a number is typed on the keyboard 11 during the recognition state or before that state is entered, the contents of the memory designated by that number are shown on the display screen 2. After speaker B has used that information, the memory contents are shifted in the same way as for information obtained through voice recognition, and the displayed information is written to memory 1. In this way, information can be displayed without speaking the same word again.
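The three-memory shifting described for FIG. 14 behaves like a short history buffer. The following is a minimal sketch under stated assumptions: the class name `History` and its methods are invented for the example, and the recalled entry is written back to memory 1 immediately rather than after speaker B has finished using it.

```python
from collections import deque


class History:
    """Hypothetical model of memories 1 to 3 (index 0 = memory 1)."""

    def __init__(self, slots=3):
        # With maxlen set, appendleft() on a full deque drops the oldest
        # entry on the right, which plays the role of erasing memory 3.
        self.slots = deque(maxlen=slots)

    def store(self, info):
        """Shift memory 2 to 3 and memory 1 to 2, then write memory 1."""
        self.slots.appendleft(info)

    def recall(self, n):
        """Redisplay memory n (1-based) and store it again as the newest."""
        info = self.slots[n - 1]
        self.store(info)
        return info
```

Typing "1" and "Return" would correspond to `recall(1)` in this sketch: the previous information comes back without any new recognition step.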

[0034] Embodiment 13 (corresponding to claim 13). In this embodiment, in the voice recognition device of Embodiment 12, the recognition result for a predetermined phrase that does not make the dialogue unnatural is used as the specific command. A phrase such as "Understood" (承知致しました) is suitable; when it is recognized, the information previously shown on the display is displayed again. Information can thus be obtained directly, without choosing the correct answer from several candidates. A configuration in which different phrases call up the information from two or three steps back is of course conceivable, but making the scheme too complicated not only makes the device harder to use but also enlarges the memory needed to hold information temporarily. Moreover, pressing numeric buttons while speaker B is talking with A is confusing and invites wrong key presses, so it is more convenient if the command can be given by voice. For that purpose, a phrase such as "Understood" is chosen in advance as the voice command. When this phrase is recognized, the immediately preceding information is shown on the display. The memories of FIG. 14 are reduced to a single memory, and when a phrase of the "Understood" kind is recognized, the information held in it is output.
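The single-memory voice command of Embodiment 13 could look roughly like this. The class name `Redisplay`, the `lookup` callback standing in for database retrieval, and the sample database entry are assumptions for illustration; the command phrase is the one given in the text.

```python
REDISPLAY_PHRASE = "承知致しました"  # "Understood", the command phrase in the text


class Redisplay:
    """Hypothetical single-memory version of the FIG. 14 arrangement."""

    def __init__(self, lookup):
        self.lookup = lookup  # assumed function: recognized word -> information
        self.memory = None    # the one memory holding the last displayed info

    def on_recognized(self, word):
        if word == REDISPLAY_PHRASE:
            return self.memory             # show the previous information again
        self.memory = self.lookup(word)    # normal case: retrieve and remember
        return self.memory
```

Because the command is an ordinary polite phrase, the operator can trigger the redisplay without touching a key or breaking the flow of the call.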

[0035] Embodiment 14 (corresponding to claim 14). In this embodiment, in the voice recognition device of Embodiment 12, the on/off operation of the voice input switch is used as the specific command; it can also be combined with Embodiment 13. The embodiment shown in FIG. 15 illustrates the operation when the switch 9 is briefly turned on and off, but other methods are possible, such as responding when several short operations are detected in succession.

[0036] Embodiment 15 (corresponding to claim 15). In this embodiment, in the voice recognition device of any of Embodiments 11 to 14, no operation or computation is performed when a specific phrase is recognized. In a device such as that of Embodiment 13, if the user turns on the switch intending to correct the previous recognition result and then realizes that no correction is needed, there has been no way to cancel the correction command. Therefore, a phrase close to a meaningless back-channel response, for example "Yes, that's right" (エエそうですね), is defined in advance; when it is recognized, the device simply ends the recognition state and performs no other action.
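The cancel behaviour of Embodiment 15 amounts to a no-op branch. In the sketch below, the function name `after_correction_press` and the way state is passed around are assumptions for the example; the cancel phrase is the one quoted in the text.

```python
NOOP_PHRASES = {"エエそうですね"}  # "Yes, that's right", a back-channel response


def after_correction_press(displayed, word):
    """Return the display state after a correction was started.

    displayed: the result shown before the user pressed the switch.
    word: what the recognizer heard next.
    """
    if word in NOOP_PHRASES:
        return displayed  # cancel: end recognition and change nothing
    return word           # otherwise the new word replaces the old result
```

This gives the user a way out after pressing the switch by mistake: saying the back-channel phrase leaves the earlier result exactly as it was.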

【００３７】[0037]

[Effects of the Invention] As is clear from the above description, the present invention provides, for an apparatus that automatically supplies information from the conversation of speakers engaged in dialogue, an easy-to-use device that compensates for the uncertainty of its voice recognition part and removes unnaturalness from the dialogue. In addition, the individual claims provide the following effects.

Effect corresponding to claim 1: Meaningful words from the recognition results are shown on the display in descending order of speech-recognition similarity (ascending order of distance), or in the reverse order, and one of them can be selected without uttering a voice. Because the results are arranged in order of reliability, the correct answer is easy to choose by moving the selection cursor only slightly.

Effect corresponding to claim 2: Meaningful words from the recognition results are converted to character strings and displayed so that the characters nearest the beginning follow a predetermined order or its reverse. For example, when the words are written in the alphabet, the recognized nouns are arranged so that their leading letters are in alphabetical order. Finding the desired answer among these candidates is not difficult, and one of the displayed candidates can be selected without uttering a voice.

Effect corresponding to claim 3: When a meaningful word in the recognition results contains numerals, the results are displayed so that the numerals nearest the beginning follow a predetermined order, and one of the displayed results can be selected without uttering a voice.

Effect corresponding to claim 4: When a meaningful word in the recognition results contains both letters and numerals, for example a combination of alphabetic letters and digits, priority in ordering is given to the characters nearer the left. For instance, sorting first by the letters in alphabetical order and then by the numerals gives priority to the sounds uttered first, which makes the word easier to find, and one of the displayed results can be selected without uttering a voice.

Effect corresponding to claim 5: A subset of the recognition-result candidates having high similarity is collected and displayed, and the correct answer is chosen from among these candidates. Because the probability of a correct answer is high, one of the displayed candidates can be selected without uttering a voice.

Effect corresponding to claim 6: The cursor for selecting a result is displayed positioned on the candidate with the highest similarity among the correct-answer candidates, so that one of the displayed candidates can be selected without uttering a voice.

Effect corresponding to claim 7: When the recognition result consists only of numerals or only of letters, the character string paired with it is found in separately stored information and is also displayed as recognition-result information, so that one of the displayed items can be selected without uttering a voice. For example, when the recognition result is only "FX" or only "250", the combination is found in the table and the missing part is filled in before display, so that the information lost through a recognition error (FX or 250) is added and "FX250" is displayed.

Effect corresponding to claim 8: When the correct answer is not among the candidates, the word that appears closest to the correct answer can be selected and its character string corrected from the keyboard to give the result.

Effect corresponding to claim 9: When a specific phrase is recognized after which it is not unnatural for the speaker to repeat the same or a similar word, for example "Let me say that once more" or "Allow me to repeat that", the previous recognition result is discarded or corrected, so the dialogue does not become unnatural and the other party is not made uncomfortable.

Effect corresponding to claim 10: A switch is provided on the input side of the voice recognition device so that the speaker can control voice input to the device with it. This prevents, for example, "OA" (オーエー) in "OA equipment" from being mistaken for the name "Oe" (オオエ).

Effect corresponding to claim 11: When the switch of claim 10 is used and is turned on and off within a time shorter than a predetermined time, the preceding recognition result is discarded or corrected, so the recognition result can be corrected without making the dialogue seem unnatural to the other party.

Effect corresponding to claim 12: Information that has already been displayed is displayed again in response to a specific command, so information can be displayed without speaking the same word again.

Effect corresponding to claim 13: A phrase such as "Understood" is used as the specific command, and when it is recognized the information previously shown on the display is displayed again, so information can be obtained directly without choosing the correct answer from several candidates.

Effect corresponding to claim 14: By using the on/off operation of the voice input switch as the specific command, the same effect as in claim 10 can be obtained.

Effect corresponding to claim 15: No operation or computation is performed when a specific phrase is recognized. For example, a phrase close to a meaningless back-channel response, such as "Yes, that's right", is defined in advance, and when it is recognized the device simply ends the recognition state without doing anything, so no unnatural dialogue arises.
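The display ordering summarized for claims 1, 5 and 6 can be illustrated with a short sketch. Everything concrete here is assumed for the example: the function name `display_order`, the representation of candidates as (word, distance) pairs, the distance values, and the cut-off of four candidates. The four words themselves are those of the abstract.

```python
def display_order(candidates, top_n=4):
    """Order candidates for display and place the cursor.

    candidates: list of (word, distance); a smaller distance means a
    higher similarity.
    """
    ranked = sorted(candidates, key=lambda c: c[1])[:top_n]  # best first
    words = [word for word, _ in ranked]
    cursor = 0  # cursor starts on the most similar candidate
    return words, cursor


# Hypothetical distances for the four words shown in the abstract.
sample = [("TQC", 0.6), ("PCM", 0.3), ("DM", 0.8), ("PC98", 0.2)]
```

With these assumed distances the display order is PC98, PCM, TQC, DM, and the user reaches "PCM" by moving the cursor a single step.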

[Brief description of drawings]

FIG. 1 is a circuit configuration diagram of the main part for explaining a first embodiment of the present invention.

FIG. 2 is a diagram showing a display example on the display for explaining the first embodiment of the present invention.

FIG. 3 is a circuit configuration diagram of the main part for explaining a second embodiment of the present invention.

FIG. 4 is a diagram showing a display example on the display for explaining the second embodiment of the present invention.

FIG. 5 is a diagram showing a display example on the display for explaining a third embodiment of the present invention.

FIG. 6 is a diagram showing a display example on the display for explaining a fourth embodiment of the present invention.

FIG. 7 is a circuit configuration diagram of the main part for explaining a fifth embodiment of the present invention.

FIG. 8 is a diagram showing a display example on the display for explaining the fifth embodiment of the present invention.

FIG. 9 is a diagram showing a display example on the display for explaining a sixth embodiment of the present invention.

FIG. 10 is a flowchart for explaining a seventh embodiment of the present invention.

FIG. 11 is a diagram for explaining eighth and ninth embodiments of the present invention.

FIG. 12 is a configuration diagram of the main part for explaining a tenth embodiment of the present invention.

FIG. 13 is a configuration diagram of the main part for explaining an eleventh embodiment of the present invention.

FIG. 14 is a configuration diagram of the main part for explaining twelfth and thirteenth embodiments of the present invention.

FIG. 15 is a configuration diagram of the main part for explaining fourteenth and fifteenth embodiments of the present invention.

[Explanation of symbols]

1: voice recognition device; 2: display; 3: mouse; 4: cursor; 5: database; 6: relation table; 7, 8: keyboard; 9: switch; 10: voice section detector; 11: keyboard.

Continuation of the front page. (51) Int. Cl.⁶: G10L 3/00 571 G. (72) Inventor: Masako Hirose, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo.

Claims

[Claims]

1. In a voice recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting one or more signals from the conversation and applying them to a voice recognizer, the device displaying the recognition result and/or outputting a signal that operates a specific item according to the result, a voice recognition device characterized in that meaningful words from the obtained recognition results are displayed on a display in descending order of speech-recognition similarity (ascending order of distance) or in the reverse order, and any one of the displayed words can be selected without uttering a voice.

2. In a voice recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting one or more signals from the conversation and applying them to a voice recognizer, the device displaying the recognition result and/or outputting a signal that operates a specific item according to the result, a voice recognition device characterized in that meaningful words from the obtained recognition results are converted into character strings and displayed on a display so that the characters nearest the beginning follow a predetermined order or the reverse order, and any one of the displayed words can be selected without uttering a voice.

3. In a voice recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting one or more signals from the conversation and applying them to a voice recognizer, the device displaying the recognition result and/or outputting a signal that operates a specific item according to the result, a voice recognition device characterized in that, when a meaningful word from the obtained recognition results contains numerals, the recognition results are displayed on a display so that the numerals nearest the beginning follow a predetermined order, and any one of the displayed results can be selected without uttering a voice.

4. In a voice recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting one or more signals from the conversation and applying them to a voice recognizer, the device displaying the recognition result and/or outputting a signal that operates a specific item according to the result, a voice recognition device characterized in that, when a meaningful word in the obtained recognition results contains both letters and numerals, the recognition results are ordered in accordance with the above orderings with priority given to the characters nearest the beginning and are displayed on a display, and any one of the displayed results can be selected without uttering a voice.

5. The voice recognition device according to any one of claims 1 to 4, characterized in that a subset of the recognition-result candidates having a high degree of similarity is collected and displayed on the display, and any one of the displayed candidates can be selected without uttering a voice.

6. The voice recognition device according to claim 1, characterized in that the cursor for selecting a result is displayed on the display positioned on the candidate having the highest similarity among the correct-answer candidates, and any one of the displayed candidates can be selected without uttering a voice.

7. In a voice recognition device comprising an environment in which speakers can converse without being conscious of using a special device, and means for extracting one or more signals from the conversation and applying them to a voice recognizer, the device displaying the recognition result and/or outputting a signal that operates a specific item according to the result, a voice recognition device characterized in that, when a meaningful word from the obtained recognition results consists only of numerals or only of letters, a character string paired with the numeral is found in separately stored information and is also displayed on a display as recognition-result information, and any one of the displayed items can be selected without uttering a voice.

8. The voice recognition device according to claim 1, characterized in that one of the displayed recognition results is selected and part of the selected letters or numerals is corrected by an input means other than voice.

9. The voice recognition device according to any one of claims 1 to 4 or 7, characterized in that, when a specific phrase is recognized after which it is not unnatural for the speaker to repeat the same word as previously uttered or a similar word, the previous recognition result is discarded or corrected.

10. The voice recognition device according to claim 1, wherein a switch is provided, and a speaker controls voice input to the device by the switch.

11. The voice recognition device according to claim 10, wherein, when the switch for voice input control is turned on and off within a time shorter than a predetermined time, the preceding recognition result is discarded or corrected.

12. The voice recognition device according to claim 1, wherein the information already displayed is displayed again by a specific command.

13. The voice recognition device according to claim 12, wherein a voice recognition result by a predetermined word that does not make the dialogue unnatural is used as the specific command.

14. The voice recognition device according to claim 12, wherein on / off of a voice input switch is used as the specific command.

15. The voice recognition device according to claim 11, wherein no operation or computation is performed when a specific word is recognized.