JPH0695828A

JPH0695828A - Voice input system

Info

Publication number: JPH0695828A
Application number: JP4245058A
Authority: JP
Inventors: Yoichi Sadamoto; 洋一貞本; Shigenobu Seto; 重宣瀬戸; Yoichi Takebayashi; 洋一竹林
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-09-14
Filing date: 1992-09-14
Publication date: 1994-04-08
Anticipated expiration: 2020-10-26
Also published as: JP3710493B2

Abstract

PURPOSE: To let users make comfortable use of a voice input function by visually displaying, in an easily understandable form and in accordance with the state of the system, the constraint information that applies when using voice input, such as the sentence patterns accepted for voice input and the words subject to voice recognition. CONSTITUTION: An application data processing unit 4 executes processing based on input from a voice recognition unit 2 and outputs responses to be presented to the user, such as processing results, to a display control unit 5. It also outputs information on its internal state, which is defined in advance according to categories of input information such as differences in the vocabulary requested by the application data processing unit 4. Using this information, a voice input restriction information management unit 3 searches an internal state-continuous/isolated utterance correspondence table and outputs whether speech can currently be input as continuous speech or whether isolated utterance is required. The display control unit 5 then outputs this information, together with the processing results from the application data processing unit 4, to a display unit 6. The display unit 6 visually displays this information in an easily understandable form, in particular whether continuous utterance is currently possible or isolated utterance is required.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a voice input system that uses voice recognition in an information processing system having means for visually displaying the results of data processing such as editing, translation, calculation, drawing, and copying.

[0002]

[Prior Art] Voice input interfaces have been put into practical use in applications such as telephone inquiry services and data entry systems for computers. However, because the method of voice input is hard to understand, and because it is unclear how to correct the input when a recognition error occurs, these user interfaces have been awkward to use.

[0003] In recent years, computers have been equipped with multiple input means (a multimodal interface) such as a keyboard, mouse, microphone, image scanner, and pen input, enabling a variety of input methods. There is therefore growing demand for an easy-to-use multimodal user interface that combines the keyboard, mouse, and microphone, in which some or all of the input functions conventionally performed with the keyboard and mouse can also be performed by voice input through the microphone. For example, there is growing demand for a user interface in which a button in a window system can be activated not only by clicking it with the mouse but also by speaking the text displayed on the button.

[0004] However, when a beginner uses such a system combining a keyboard, mouse, and microphone, the beginner has no prior knowledge of the constraint information related to the use of voice input, such as the sentence patterns accepted for voice input, the words subject to voice recognition, continuous versus isolated utterance, speaker-dependent versus speaker-independent recognition, and the method of collecting voice data for creating the voice recognition dictionary. As a result, the beginner does not know how to provide input and becomes confused or makes input errors. For example, with voice input through a microphone, only input words registered in advance can be recognized, so the user has to know the words accepted for voice input beforehand.

[0005] Furthermore, because creating a recognition dictionary for voice recognition requires a large amount of voice data, voice input cannot always be supported for every input target displayed on the screen. For example, among button items of the same kind in a window system, some can be operated by voice and others cannot, so the user has to know in advance which button items accept voice input before speaking the text displayed on them. Beginners in particular did not know which button items accepted voice input and became confused or made input errors. Even experienced users had to memorize which button items accepted voice input or else consult a table of the words accepted for voice input. The interface therefore placed a burden on the user and was awkward to use.

[0006]

[Problems to Be Solved by the Invention] As described above, in conventional systems using voice input, the constraint information related to the use of voice input was hard for the user to understand, causing confusion and input errors. This information includes the sentence patterns accepted for voice input, the words subject to voice recognition, continuous versus isolated utterance, speaker-dependent versus speaker-independent recognition, male versus female speakers, adult versus child speakers, and whether voice input is ON or OFF.

[0007] The present invention has been made to solve these conventional problems, and its object is to provide a voice input system capable of displaying constraint information on the use of voice input in an easily understandable manner.

[0008]

[Means for Solving the Problems] To achieve the above object, the voice input system according to the present invention comprises means for displaying, in an easily understandable manner, constraint information related to the mode of use of voice input and the speaking style, such as the sentence patterns accepted for voice input, the words subject to voice recognition, continuous versus isolated utterance, speaker-dependent versus speaker-independent recognition, male versus female speakers, adult versus child speakers, and whether voice input is ON or OFF. The information is displayed by means of visual information such as characters, figures, images, light intensity, color differences, density, and texture.

[0009]

[Operation] According to the present invention, when command execution, function selection, or data input is supported by voice input, the current internal state of the system is examined. The constraint information that applies to voice input in that state, such as the sentence patterns accepted for voice input, the words accepted for voice input, continuous versus isolated utterance, speaker-dependent versus speaker-independent recognition, the gender and age of the speakers to be recognized, and the recognition performance, is then displayed in an easily understandable manner by controlling visual information such as characters, figures, images, light intensity, color differences, and texture. The user can therefore understand the constraints and the manner of speaking for voice input at a glance.

[0010] Consequently, the user can use the voice input function comfortably, without being confused by unfamiliar constraint information related to voice input and without making input errors, and a human interface that is easy to use even for beginners can be realized.

[0011]

[Embodiments] An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a configuration diagram of a voice input system according to a first embodiment of the present invention. In the figure, speech is converted into an electric signal by a voice input unit 1 and input to a voice recognition unit 2. The voice recognition unit 2 converts the input speech into a digital signal with, for example, a sampling frequency of 12 kHz and 16-bit quantization, obtains a time series of feature parameters by frequency analysis using, for example, the fast Fourier transform, and then performs voice recognition based on a method such as the multiple similarity method or an HMM (Hidden Markov Model). As the recognition result, it outputs the vocabulary item or sentence with the highest likelihood to the application data processing unit 4.
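The front-end step described above (12 kHz sampling, 16-bit quantization, FFT-based frequency analysis yielding a feature time series) can be sketched roughly as follows. This is only an illustration: the frame length, frame shift, Hann window, and all function names are assumptions that do not appear in the source, and the magnitude spectrum stands in for whatever feature parameters the actual recognizer uses.

```python
import cmath
import math

SAMPLE_RATE_HZ = 12_000  # sampling frequency given in the text
BITS = 16                # quantization bits given in the text
FRAME_LEN = 256          # assumed frame length (power of two for the FFT)
HOP = 128                # assumed frame shift


def quantize(x: float) -> int:
    """Map a sample in [-1.0, 1.0] to a signed 16-bit integer."""
    full_scale = 2 ** (BITS - 1) - 1
    return max(-full_scale - 1, min(full_scale, round(x * full_scale)))


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def feature_time_series(samples):
    """Return one magnitude spectrum (FRAME_LEN // 2 bins) per frame."""
    frames = []
    for start in range(0, len(samples) - FRAME_LEN + 1, HOP):
        frame = samples[start:start + FRAME_LEN]
        # Hann window (an assumption) to limit spectral leakage.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (FRAME_LEN - 1)))
            for i, s in enumerate(frame)
        ]
        spectrum = fft(windowed)
        frames.append([abs(c) for c in spectrum[:FRAME_LEN // 2]])
    return frames


# Synthetic 1500 Hz tone standing in for microphone input.
tone = [
    quantize(0.5 * math.sin(2 * math.pi * 1500 * n / SAMPLE_RATE_HZ))
    for n in range(1024)
]
features = feature_time_series(tone)
peak_bin = max(range(len(features[0])), key=features[0].__getitem__)
```

With these assumed settings each bin spans 12000 / 256 = 46.875 Hz, so the 1500 Hz test tone falls in bin 32 of every frame.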

[0012] The application data processing unit 4 is software or a system that supports tasks such as translation, calculation, editing, drawing, copying, reservation, and automatic cash dispensing. The application data processing unit 4 performs processing based on the input from the voice recognition unit 2 and outputs responses to be presented to the user, such as processing results, to the display control unit 5. It also outputs, to the voice input restriction information management unit 3, information on its internal state, which is defined in advance according to categories of input information such as differences in the vocabulary that the application data processing unit 4 requests. For example, in an airline ticket reservation system, when the input requested by the application data processing unit 4 changes from "destination" to "departure time", information indicating that the internal state has changed to the state for entering the "departure time" is output to the voice input restriction information management unit 3.
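The state-change notification in the airline reservation example can be sketched as a small class that reports its internal state to a listener whenever the requested input changes. The class name, method names, and the two-step input order are hypothetical and chosen only for illustration; the listener plays the role of the voice input restriction information management unit 3.

```python
from typing import Callable, Dict, List


class ApplicationDataProcessor:
    """Toy stand-in for application data processing unit 4 (hypothetical API)."""

    # Assumed order in which the reservation dialog requests its inputs.
    REQUESTED_INPUTS = ("destination", "departure_time")

    def __init__(self, notify_state: Callable[[str], None]) -> None:
        self.answers: Dict[str, str] = {}
        self._notify_state = notify_state
        self._step = 0
        self._notify_state(self.state)  # report the starting state

    @property
    def state(self) -> str:
        """Internal state, named after the input currently requested."""
        return self.REQUESTED_INPUTS[self._step]

    def accept(self, recognized: str) -> None:
        """Store one recognition result and move to the next requested input."""
        self.answers[self.state] = recognized
        if self._step + 1 < len(self.REQUESTED_INPUTS):
            self._step += 1
            self._notify_state(self.state)  # internal state changed


notifications: List[str] = []
app = ApplicationDataProcessor(notifications.append)
app.accept("New York")
```

After the destination is accepted, the listener has received "destination" followed by "departure_time", mirroring the transition described in the text.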

[0013] Based on the information on the change in the internal state of the application data processing unit 4, the voice input restriction information management unit 3 searches an internal state-continuous/isolated utterance correspondence table such as that shown in Table 1, and outputs to the voice recognition unit 2 and the display control unit 5 whether speech can currently be input as continuous utterance or must be uttered in isolation.
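The table lookup performed by the voice input restriction information management unit 3 amounts to a mapping from internal state to utterance mode, with the result sent to both the recognizer and the display controller. The sketch below assumes hypothetical state names; the contents of Table 1 are not reproduced in this text, so the two entries are stand-ins based on the initial-state and time-entry examples described in this document.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List


class UtteranceMode(Enum):
    CONTINUOUS = "continuous"
    ISOLATED = "isolated"


# Hypothetical stand-in for Table 1 (state names are assumptions).
STATE_TO_MODE: Dict[str, UtteranceMode] = {
    "initial": UtteranceMode.CONTINUOUS,
    "time_designation": UtteranceMode.ISOLATED,
}


def publish_mode(
    state: str, sinks: Iterable[Callable[[UtteranceMode], None]]
) -> UtteranceMode:
    """Look up the mode for a state and send it to every sink."""
    mode = STATE_TO_MODE[state]
    for sink in sinks:
        sink(mode)
    return mode


# Lists stand in for voice recognition unit 2 and display control unit 5.
to_recognizer: List[UtteranceMode] = []
to_display: List[UtteranceMode] = []
for new_state in ("initial", "time_designation"):
    publish_mode(new_state, [to_recognizer.append, to_display.append])
```

Sending the same value to both sinks keeps the recognizer's behavior and the on-screen hint in agreement, which is the point of routing the constraint through a single unit.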

[0014]

[Table 1]

The display control unit 5 outputs to the display unit 6 the processing results from the application data processing unit 4 together with the information indicating whether continuous or isolated utterance applies. The display unit 6 displays the information from the display control unit 5 on the screen and, in particular, shows in an easily understandable visual form whether the user should currently speak continuously or must utter words in isolation.

[0015] As an application example, consider the case where the application data processing unit 4 supports airline ticket reservation. When the apparatus is started, the application data processing unit 4 outputs information indicating the initial state to the voice input restriction information management unit 3 and outputs the display information for the initial screen to the display control unit 5. The voice input restriction information management unit 3 searches the internal state-continuous/isolated utterance correspondence table shown in Table 1 and outputs, to the voice recognition unit 2 and the display control unit 5, information indicating that continuous utterance is possible in the initial state. The display control unit 5 creates display data that visually presents the display information from the application data processing unit 4 and the information from the voice input restriction information management unit 3 that continuous utterance is possible, and outputs it to the display unit 6. As shown in Table 2, the display unit 6 presents a display for selecting reservation or cancellation, the destination, the flight number, and so on, together with an indication that continuous utterance is possible.

[0016]

[Table 2]

The user can see at a glance that it is enough to speak the choice of reservation or cancellation and the destination or flight number as a continuous utterance.

[0017] Next, when a user who wants to reserve a flight to New York says "reserve New York", the voice recognition unit 2 has already been informed by the voice input restriction information management unit 3 that the user will speak continuously. It therefore recognizes the speech more accurately by using a recognition technique suited to continuous rather than isolated utterance, and the inputs "reserve" and "New York" are passed to the application data processing unit 4. As shown in FIG. 2, the internal state of the application data processing unit 4 changes to the state for specifying the departure time of the flight. As before, it outputs information on this internal state, the time designation state shown in Table 2, to the voice input restriction information management unit 3 and outputs the display information for the time designation screen to the display control unit 5. The voice input restriction information management unit 3 and the display control unit 5 then perform the same processing as described above, and the flight designation screen shown in Table 3 is displayed.

[0018]

[Table 3]

Table 3 shows an example in which isolated utterances of the month, day, hour, and minute are separated by leaving a pause (silence) of a predetermined duration, but the utterances may instead be delimited by a key press, a mouse click, or the like.

[0019] By displaying on the screen whether voice input can be spoken continuously or must be uttered in isolation, the system lets the user tell at a glance which manner of speaking applies. The user is no longer confused about whether continuous or isolated utterance is currently appropriate and no longer speaks in the wrong manner, so an easy-to-use interface can be built.

[0020] Next, a second embodiment of the present invention will be described. The configuration is the same as in the first embodiment, except that the voice input restriction information management unit 3 manages, instead of the internal state-continuous/isolated utterance correspondence table, a recognition target vocabulary-utterance count correspondence table such as that shown in Table 4, which sets the number of times each recognition target word must be uttered for it to be accepted as input.

[0021]

[Table 4]

The processing flow of the voice input unit 1 and the application data processing unit 4 is the same as in the first embodiment. The voice input restriction information management unit 3 refers to the recognition target vocabulary-utterance count correspondence table such as that shown in Table 4 and outputs, to the voice recognition unit 2 and the display control unit 5, the number of utterances required for each current recognition target word to be accepted as input. The voice recognition unit 2 outputs a word to the application data processing unit 4 as the recognition result when that word has been input consecutively the number of times specified by the voice input restriction information management unit 3.
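The consecutive-utterance rule of the second embodiment can be sketched as a filter over the stream of recognized words. The function name and the reset-after-acceptance behavior are assumptions; Table 4 itself is not reproduced in this text, so the counts below are taken from the "next screen" (once) and "end" (twice) example that this document gives for FIG. 3.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for Table 4, using the FIG. 3 example words.
REQUIRED_UTTERANCES: Dict[str, int] = {"next screen": 1, "end": 2}


def accepted_words(
    recognized: Iterable[str],
    required: Dict[str, int] = REQUIRED_UTTERANCES,
) -> List[str]:
    """Return the words passed on to the application.

    A word is passed on once it has been recognized the required number
    of times in a row; words missing from the table need one utterance.
    """
    accepted: List[str] = []
    previous: Optional[str] = None
    run = 0
    for word in recognized:
        run = run + 1 if word == previous else 1
        previous = word
        if run == required.get(word, 1):
            accepted.append(word)
            previous, run = None, 0  # assumed: counting restarts afterwards
    return accepted


result = accepted_words(["end", "next screen", "end", "end"])
```

Here the first isolated "end" is discarded because "next screen" interrupts the run, while the final two consecutive "end" utterances are accepted.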

[0022] The application data processing unit 4 outputs responses such as processing results to the display control unit 5 and outputs information on changes in its internal state to the voice input restriction information management unit 3. The display control unit 5 outputs to the display unit 6 the display information from the application data processing unit 4 and the information, received from the voice input restriction information management unit 3, that distinguishes the number of utterances required for each input word. The display unit 6 displays the display information from the display control unit 5 on the screen and, in particular, shows the number of utterances currently required for each input word in an easily understandable visual form. For example, requiring input words that demand certainty in the user interface to be uttered several times yields a more robust interface. As shown in FIG. 3, for example, a word that can be input with a single utterance ("next screen") is displayed surrounded by a single frame, and a word that must be uttered twice ("end") is displayed surrounded by a double frame. The user can thus see at a glance how many times each input word must be uttered, which makes the interface easy to use.
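The frame-based display of the required utterance count can be mimicked in plain text by nesting one ASCII frame per required utterance. This is only a rough analogue of FIG. 3, which is not reproduced here; the function name and the ASCII rendering are assumptions.

```python
def framed(label: str, utterances: int) -> str:
    """Surround a label with one ASCII frame per required utterance."""
    lines = [label]
    for _ in range(utterances):
        width = len(lines[0])
        border = "+" + "-" * (width + 2) + "+"
        lines = [border] + ["| " + line + " |" for line in lines] + [border]
    return "\n".join(lines)


single = framed("next screen", 1)  # one frame: say it once
double = framed("end", 2)          # two nested frames: say it twice
```

Printing `double` shows the word "end" inside two nested boxes, so the frame count equals the number of utterances required.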

[0023] Although FIG. 3 indicates two utterances by a double frame, the characters, frames, and the like may instead be shaded, with darker shading instructing the user to speak more loudly. The number of utterances can also be indicated by other display methods, such as displaying symbols or marks such as "☆" or "*" at the upper corner of the word, one for each required utterance.

[0024] The number of times voice input must be repeated can also be specified by outputting an audio signal such as a buzzer intermittently a corresponding number of times.

[0025] Next, a third embodiment will be described. The configuration is the same as in the first embodiment, except that the voice input restriction information management unit 3 manages a numeral input method table such as that shown in Table 5 instead of the internal state-continuous/isolated utterance correspondence table.

[0026]

[Table 5]

The processing flow of the voice input unit 1 and the application data processing unit 4 is the same as in the first embodiment.

[0027] The voice input restriction information management unit 3 refers to the numeral input method table such as that shown in Table 5 and outputs, to the voice recognition unit 2 and the display control unit 5, the manner of speaking to be used when a multi-digit number is input by voice. For example, when the internal state of the application data processing unit 4 is the state for entering a personal identification number, the unit refers to the numeral input method table shown in Table 5 and outputs constraint information on the manner of speaking (individual utterance), namely that the digits of a multi-digit number are spoken one by one from the most significant digit using the 11 readings "ichi", "ni", "san", "yon", "shi", "go", "roku", "shichi", "nana", "hachi", and "kyuu". As shown in the example of Table 6, with individual utterance the user must speak the number 8215 one digit at a time, as "hachi ni ichi go". When the internal state of the application data processing unit 4 is the state for entering a reservation number, as shown in the example of Table 7, the unit refers to the numeral input method table and outputs constraint information on the manner of speaking (multi-digit utterance), namely that a multi-digit number is spoken with its place values included. In this case, the user must speak the number 251 as "nihyaku gojuu ichi". It is likewise possible to indicate visually that both manners of speaking are accepted.

[0028] When recognizing numbers, the voice recognition unit 2 selects a voice recognition dictionary according to the constraint information on the manner of speaking numbers received from the voice input restriction information management unit 3, and then performs recognition. For example, it distinguishes between the case where a number is spoken one digit at a time (individual utterance) and the case where a number is spoken with its place values included (multi-digit utterance), selects the corresponding voice recognition dictionary, and performs recognition. This reduces ambiguity in the recognition of numbers and thereby improves recognition performance.
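The two manners of speaking numbers, and the selection between them based on the constraint information, can be sketched as two small interpreters chosen by a mode key. The sketch works on romanized, already-segmented tokens, which is an assumption: a real recognizer would select an acoustic dictionary rather than a parser, and spoken Japanese has sound changes (for example in 300, 600, and 800) that this simplified place-value table ignores. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Union

# The 11 digit readings listed in the text (4 and 7 each have two readings).
DIGIT_READINGS: Dict[str, int] = {
    "ichi": 1, "ni": 2, "san": 3, "yon": 4, "shi": 4, "go": 5,
    "roku": 6, "shichi": 7, "nana": 7, "hachi": 8, "kyuu": 9,
}
# Simplified place-value words (assumption; irregular readings omitted).
PLACE_VALUES: Dict[str, int] = {"juu": 10, "hyaku": 100, "sen": 1000}


def parse_individual(tokens: List[str]) -> str:
    """Individual utterance: one reading per digit, most significant first."""
    return "".join(str(DIGIT_READINGS[t]) for t in tokens)


def parse_multi_digit(tokens: List[str]) -> int:
    """Multi-digit utterance: digits combined with place-value words."""
    total = 0
    pending: Optional[int] = None  # digit waiting for a place-value word
    for token in tokens:
        if token in PLACE_VALUES:
            total += (1 if pending is None else pending) * PLACE_VALUES[token]
            pending = None
        else:
            if pending is not None:
                raise ValueError("two digits in a row in multi-digit mode")
            pending = DIGIT_READINGS[token]
    return total + (pending or 0)


PARSERS: Dict[str, Callable[[List[str]], Union[str, int]]] = {
    "individual": parse_individual,
    "multi_digit": parse_multi_digit,
}


def recognize_number(tokens: List[str], mode: str) -> Union[str, int]:
    """Interpret tokens with the parser selected by the constraint info."""
    return PARSERS[mode](tokens)


pin = recognize_number(["hachi", "ni", "ichi", "go"], "individual")
reservation = recognize_number(["ni", "hyaku", "go", "juu", "ichi"], "multi_digit")
```

Restricting each mode to its own vocabulary is what removes the ambiguity: in individual mode "juu" is simply not a valid token, and in multi-digit mode two digit readings in a row are rejected.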

[0029] The application data processing unit 4 outputs responses to be presented to the user, such as processing results, to the display control unit 5 and outputs information on changes in its internal state to the voice input restriction information management unit 3. Based on the display information from the application data processing unit 4 and the constraint information on the manner of speaking numbers received from the voice input restriction information management unit 3, the display control unit 5 outputs information indicating which manner applies to the display unit 6. The display unit 6 displays the display information from the display control unit 5 on the screen and, in particular, shows the constraint information on the manner of speaking numbers that currently applies in an easily understandable visual form. For example, as shown in Table 6, in the state for entering the personal identification number of a cash card, the display clearly indicates that the digits of the multi-digit number are to be spoken one by one from the most significant digit using the 11 readings "ichi", "ni", "san", "yon", "shi", "go", "roku", "shichi", "nana", "hachi", and "kyuu".

[0030]

[Table 6]

As shown in Table 7, in the state for entering a reservation number, the display clearly indicates that the multi-digit number is to be spoken with its place values included.

[0031]

[Table 7]

The user can thus see at a glance how to speak when entering numbers, which makes the interface easy to use and also improves recognition performance.

[0032] Next, a fourth embodiment will be described. FIG. 4 is a configuration diagram of a voice input system according to the fourth embodiment of the present invention. Unlike the first embodiment, the processing results of the application data processing unit 4 are output as speech rather than displayed, while the information indicating whether continuous or isolated utterance applies, which is constraint information on voice input, is displayed on the display unit 6 as in the first embodiment. The processing flow is the same as in the first embodiment, except that the application data processing unit 4 outputs its processing results to a voice output unit 7 instead of the display control unit 5. This makes it possible to recognize speech and confirm responses over the telephone.

[0033] Next, a fifth embodiment will be described. FIG. 5 is a configuration diagram of a voice input system according to the fifth embodiment of the present invention.

[0034] Speech is converted into an electric signal by the voice input unit 1 and input to the voice recognition unit 2. The voice recognition unit 2 converts the input speech into a digital signal with, for example, a sampling frequency of 12 kHz and 16-bit quantization, obtains a time series of feature parameters by frequency analysis using, for example, the fast Fourier transform, and then performs voice recognition based on a method such as the multiple similarity method or an HMM (hidden Markov model). As the recognition result, it outputs the vocabulary item or sentence with the highest likelihood to the application data processing unit 4.

[0035] The input information control unit 8 converts the recognition results from the voice recognition unit 2 and the input information from a keyboard 9 and a mouse 10 into input data and control signals for the application data processing unit 4 and outputs them to the application data processing unit 4. The application data processing unit 4 is software or a system that supports tasks such as translation, calculation, editing, drawing, and copying. It performs processing using the input data and control signals from the input information control unit 8 as input, outputs to the display control unit 5 the information, such as screens and text, to be displayed to the user in the current state, and outputs information on changes in its internal state to the voice input restriction information management unit 3.

[0036] As shown in FIG. 6, the voice input restriction information management unit 3 uses the received information on the change in the internal state of the application data processing unit 4 to search an internal state-voice input target vocabulary correspondence table such as Table 8, and determines the vocabulary subject to voice recognition in the current internal state of the application data processing unit 4.
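The state-to-vocabulary lookup can be sketched as a table keyed by internal state, with recognition restricted to the words active in that state. Table 8 is not reproduced in this text, so the single entry below uses the initial-state words from the text editing example in this document; the function name and the score dictionary are assumptions standing in for the recognizer's likelihoods.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for Table 8 (only the initial state is filled in).
STATE_TO_VOCABULARY: Dict[str, Tuple[str, ...]] = {
    "initial": ("edit", "layout", "output", "end"),
}


def best_in_vocabulary(scores: Dict[str, float], state: str) -> Optional[str]:
    """Pick the highest-scoring word among those active in the given state.

    Returns None when no scored word belongs to the active vocabulary.
    """
    active = STATE_TO_VOCABULARY.get(state, ())
    candidates = {w: s for w, s in scores.items() if w in active}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# "delete" scores highest but is not a voice input target in this state.
choice = best_in_vocabulary({"edit": 0.4, "delete": 0.9, "end": 0.7}, "initial")
```

Limiting the candidates to the active vocabulary means a high-scoring but currently invalid word cannot be returned, which matches the role the table plays for the voice recognition unit 2.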

[0037]

[Table 8]

The voice input restriction information management unit 3 then outputs the current voice input target vocabulary to the voice recognition unit 2 and outputs information on the current internal state of the application data processing unit 4 to the display control unit 5.

[0038] The display control unit 5 holds an input target display management table such as that shown in Table 9.

[0039]

[Table 9]

The input target display management table holds information such as the screen displayed for each internal state of the application data processing unit 4, the vocabulary displayed on that screen, the media through which each word can be input, and the object on which each word is displayed. For example, for the word "edit", the input target display management table in Table 9 holds the information that, on the initial-state screen, it can be input by mouse or by voice and is displayed on a button item. When converting the screen display information from the application data processing unit 4 into screen display data, the display control unit 5 consults the input target display management table for the screen corresponding to the current internal state of the application data processing unit 4, as received from the voice input restriction information management unit 3. It controls the color, shape, background color, surrounding figures, font, and so on of the display objects (for example, button items) that accept voice as an input medium so that they are displayed distinctly from display objects that do not accept voice, and outputs the resulting data to the display unit 6. The display unit 6, for example a CRT display, displays the received screen display data to the user.

【００４０】文章の編集作業を応用データ処理部４が支
援している場合について述べる。先ず、この装置を始動
させた時、応用データ処理部４は、初期状態という情報
を音声入力制約情報管理部３へ出力し、初期画面の表示
情報を表示制御部５へ出力する。音声入力制約情報管理
部３では、表８に示す内部状態−音声入力対象語彙対応
テーブルを検索し、初期状態の時の音声認識対象語彙
「編集」、「レイアウト」、「出力」、「終了」を音声
認識部２と表示制御部５に出力する。The case where the application data processing unit 4 supports text editing work will be described. First, when the apparatus is started, the application data processing unit 4 outputs information indicating the initial state to the voice input restriction information management unit 3 and outputs the display information of the initial screen to the display control unit 5. The voice input restriction information management unit 3 searches the internal state-voice input target vocabulary correspondence table shown in Table 8 and outputs the speech recognition target vocabulary for the initial state, "edit", "layout", "output", and "end", to the voice recognition unit 2 and the display control unit 5.

【００４１】表示制御部５では、表９に示すような入力
対象表示管理テーブルの初期画面の語彙の中で、音声入
力制約情報管理部３より入力された「編集」、「レイア
ウト」、「出力」、「終了」の入力メディアの音声の項
目を入力可能（ＯＫ）となるように修正する。次に、入
力対象表示管理テーブルを参照して、初期画面の入力メ
ディアの中で音声を入力可能としている表示対象である
「編集」、「レイアウト」、「出力」、「終了」のマウ
スボタン上の文字を赤色で表示し、その他の表示対象で
ある「文書名」を黒色で表示するように制御し、表示デ
ータを表示部に出力する。図７に示すように初期状態の
表示画面の４つのボタン「編集」、「レイアウト」、
「出力」、「終了」の文字が赤色で表示され、他の文字
の黒色と一目で区別できる。ユーザは、赤色の文字で表
示されている語彙は音声で入力でき、黒色で表示されて
いるものはマウスかキーボードで入力すれば良いことが
分かる。In the display control unit 5, among the vocabulary of the initial screen in the input target display management table shown in Table 9, the voice entry of the input media for "edit", "layout", "output", and "end", which were input from the voice input restriction information management unit 3, is changed to input-enabled (OK). Next, referring to the input target display management table, the display control unit 5 controls the display so that the characters on the mouse buttons "edit", "layout", "output", and "end", which are the display targets that accept voice among the input media of the initial screen, are shown in red, while the other display target, "document name", is shown in black, and outputs the display data to the display unit. As shown in FIG. 7, the characters of the four buttons "edit", "layout", "output", and "end" on the initial-state display screen are displayed in red and can be distinguished at a glance from the other characters in black. The user can see that vocabulary displayed in red can be input by voice and that items displayed in black should be input with the mouse or keyboard.
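The lookup and coloring steps described in paragraphs [0040] and [0041] can be sketched roughly as follows. This is only an illustration: the dictionary `STATE_VOCABULARY` stands in for Table 8 and `INITIAL_SCREEN_ITEMS` for the initial-screen rows of Table 9, neither of which is reproduced in this text, and all function names are hypothetical.

```python
# Hypothetical stand-in for Table 8 (internal state -> voice input target vocabulary).
# Only the initial-state row is taken from the description.
STATE_VOCABULARY = {
    "initial": ["edit", "layout", "output", "end"],
}

# Hypothetical stand-in for the display targets of the initial screen (Table 9).
INITIAL_SCREEN_ITEMS = ["document name", "edit", "layout", "output", "end"]


def voice_vocabulary(state):
    """Return the vocabulary that can be input by voice in the given internal state."""
    return STATE_VOCABULARY.get(state, [])


def display_colors(items, voice_words):
    """Color items that accept voice input red and all other items black."""
    voice_set = set(voice_words)
    return {item: ("red" if item in voice_set else "black") for item in items}


colors = display_colors(INITIAL_SCREEN_ITEMS, voice_vocabulary("initial"))
for item, color in colors.items():
    print(f"{item}: {color}")
```

The same vocabulary list would also be handed to the recognizer, so that matching is restricted to the words currently shown in red.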

【００４２】この初期状態で、ユーザが、文書名をマウ
スを用いて選択すると、その文書名が入力情報制御部８
へ出力される。次に、「編集」と発声すると音声認識部
２での認識結果「編集」が入力情報制御部８へ出力され
る。入力情報制御部８では、選択された文書名と編集の
ボタンアイテムをマウスクリックした場合と同じ制御信
号を応用データ処理部４に出力する。そして、応用デー
タ処理部４では、内部状態が編集状態に変り、前述と同
様に編集状態という情報を音声入力制約情報管理部３へ
出力し、編集画面の表示情報を表示制御部５へ出力す
る。音声入力制約情報管理部３と表示制御部５が前述と
同様の処理を行い、図８に示す文章編集の画面が表示さ
れる。In this initial state, when the user selects a document name with the mouse, the document name is output to the input information control unit 8. Next, when the user utters "edit", the recognition result "edit" from the voice recognition unit 2 is output to the input information control unit 8. The input information control unit 8 outputs to the application data processing unit 4 the same control signal as when the document name is selected and the "edit" button item is clicked with the mouse. The internal state of the application data processing unit 4 then changes to the editing state and, as described above, the unit outputs information indicating the editing state to the voice input restriction information management unit 3 and outputs the display information of the editing screen to the display control unit 5. The voice input restriction information management unit 3 and the display control unit 5 perform the same processing as described above, and the text editing screen shown in FIG. 8 is displayed.

【００４３】このように、音声入力を支援している語彙
の画面上の表示色を音声入力を支援していない語彙と区
別して表示することにより、ユーザは一目で音声入力可
能な語彙を理解することができるため、どの語彙が音声
入力可能であるか分からず戸惑ったり、音声認識対象外
の語彙を発声したりすることがなくなり、使い勝手の良
いインターフェースを構築できる。さらに、音声認識部
２は、現在の入力対象語彙にしぼって、音声標準パター
ンとの照合を行えるため、認識性能は向上する。In this way, by displaying the display color of the vocabulary supporting the voice input on the screen in distinction from the vocabulary not supporting the voice input, the user can understand the vocabulary that can be voice input at a glance. Therefore, it is possible to construct an easy-to-use interface without being confused by not knowing which vocabulary can be input by voice or uttering a vocabulary that is not the target of voice recognition. Furthermore, the voice recognition unit 2 can narrow down the current input target vocabulary and collate it with the voice standard pattern, so that the recognition performance is improved.

【００４４】また、使用頻度の高い語彙は音声入力と
し、使用頻度の低い語彙をボタン入力とする構成として
も良い。Further, the vocabulary frequently used may be inputted by voice, and the vocabulary rarely used may be inputted by button.

【００４５】次に、第６実施例について述べる。第５実
施例のうち音声入力対象語彙と対象外語彙の表示色では
なく背景テクスチャを区別する。処理の流れは、第４実
施例と同様である。例えば、図９のように音声入力可能
なボタンアイテムはテクスチャにより分かりやすく表示
される。Next, a sixth embodiment will be described. In place of the display color used in the fifth embodiment, the background texture is used to distinguish voice input target vocabulary from non-target vocabulary. The flow of processing is the same as in the fourth embodiment. For example, as shown in FIG. 9, button items that accept voice input are displayed in an easy-to-understand manner by means of texture.

【００４６】次に、第７実施例について述べる。図１０
は本発明の第７実施例に係る音声入力システムの構成図
である。Next, a seventh embodiment will be described. FIG. 10 is a configuration diagram of a voice input system according to the seventh embodiment of the present invention.

【００４７】第５実施例の構成に、音声認識辞書作成部
１２が加わる。音声認識辞書作成部１２では、予めシス
テムが保持している不特定ユーザの音声データと特定の
ユーザが入力した音声データを管理し、管理している音
声データを用いて音声認識辞書を作成し、音声認識部２
に認識辞書を出力する。さらに、音声認識辞書作成部１
２では、認識辞書を作成した音声データ中の不特定ユー
ザのデータ数と特定ユーザのデータ数を音声入力制約情
報管理部３へ出力する。A voice recognition dictionary creation unit 12 is added to the configuration of the fifth embodiment. The voice recognition dictionary creation unit 12 manages voice data of unspecified users held in advance by the system and voice data input by a specific user, creates a voice recognition dictionary using the managed voice data, and outputs the recognition dictionary to the voice recognition unit 2. Furthermore, the voice recognition dictionary creation unit 12 outputs to the voice input restriction information management unit 3 the number of unspecified-user data items and the number of specific-user data items in the voice data from which the recognition dictionary was created.

【００４８】入力情報制御部８、応用データ処理部４の
処理の流れは、第５実施例と同様であるが、音声入力制
約情報管理部３では音声認識辞書作成部１２からの認識
辞書を作成した音声データの情報に基づいて表１０に示
す特定／不特定辞書作成データ数テーブルを作成する。The processing flow of the input information control unit 8 and the application data processing unit 4 is the same as that of the fifth embodiment, but the voice input restriction information management unit 3 creates a recognition dictionary from the voice recognition dictionary creation unit 12. Based on the information of the voice data, the specific / non-specific dictionary creation data number table shown in Table 10 is created.

【００４９】[0049]

【表１０】また、音声入力制約情報管理部３では、表１０に示す特
定／不特定辞書作成データ数テーブルを参照して、表示
制御部５に現在の応用データ処理部４の内部状態の情報
と各認識対象語彙の認識辞書を作成する際に用いた不特
定話者の音声データ数とユーザの音声データ数を表示制
御部５に出力し、音声認識部２に現在の認識対象語彙を
出力する。[Table 10] The voice input restriction information management unit 3 also refers to the specific / non-specific dictionary creation data number table shown in Table 10, outputs to the display control unit 5 the information on the current internal state of the application data processing unit 4 together with the number of unspecified speakers' voice data items and the number of the user's voice data items used to create the recognition dictionary for each recognition target vocabulary item, and outputs the current recognition target vocabulary to the voice recognition unit 2.

【００５０】表示制御部５では、上記入力対象管理テー
ブルを検索し、現在の音声入力対象となる語彙につい
て、（ユーザの音声データ数）／（不特定話者の音声デ
ータ数）が大きいほどその語彙が表示されているボタン
アイテムの表示色が濃くなるように制御し、表示データ
を表示部６に出力する。表示部６では、例えば、図１１
に示すように（ユーザの音声データ数）／（不特定話者
の音声データ数）＝１．２の「コピー」は、濃い色で表
示され、（ユーザの音声データ数）／（不特定話者の音
声データ数）＝０．１の「ペースト」は、薄い色で表示
される。これにより、ユーザは各語彙の認識辞書がどの
程度の割合で自分の音声によって訓練されているかを一
目で理解することができる。したがって、ユーザが自分
の音声による辞書の訓練の割合が小さい語彙の認識性能
が良くない場合、認識性能の向上の手段として自分の音
声データを追加して辞書を作り直せば、認識性能が向上
することが分かる。The display control unit 5 searches the above input target management table and, for the vocabulary that is the current voice input target, controls the display so that the larger the ratio (number of the user's voice data items) / (number of unspecified speakers' voice data items), the darker the display color of the button item on which that vocabulary is displayed, and outputs the display data to the display unit 6. In the display unit 6, for example, as shown in FIG. 11, "copy", for which (number of the user's voice data items) / (number of unspecified speakers' voice data items) = 1.2, is displayed in a dark color, and "paste", for which the ratio is 0.1, is displayed in a light color. This lets the user see at a glance to what extent the recognition dictionary for each vocabulary item has been trained with his or her own voice. Therefore, when recognition performance is poor for a vocabulary item whose dictionary contains only a small proportion of the user's own voice, the user can see that adding his or her own voice data and recreating the dictionary is a way to improve recognition performance.
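As a rough illustration of the ratio-based shading in paragraph [0050], the sketch below maps the user / unspecified-speaker data ratio to a darkness value between 0 and 1. The description only states that a larger ratio gives a darker color; the linear mapping, the saturation point `max_ratio`, and the sample counts are assumptions made for this example.

```python
def training_ratio(user_count, unspecified_count):
    """(number of the user's voice data items) / (number of unspecified speakers' items)."""
    if unspecified_count <= 0:
        raise ValueError("unspecified_count must be positive")
    return user_count / unspecified_count


def darkness(ratio, max_ratio=1.5):
    """Map a ratio to a darkness in [0.0, 1.0]; larger ratios give darker colors.

    The linear scale and max_ratio are assumptions, not taken from the patent.
    """
    return max(0.0, min(ratio / max_ratio, 1.0))


# Hypothetical data counts chosen to reproduce the ratios 1.2 and 0.1 from the text.
copy_darkness = darkness(training_ratio(12, 10))
paste_darkness = darkness(training_ratio(1, 10))
print(f"copy: {copy_darkness:.2f}, paste: {paste_darkness:.2f}")
```

Any monotonically increasing mapping would satisfy the description; a linear one is used here only because it is the simplest to read.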

【００５１】また、２人のユーザにそれぞれ赤と青の色
を割り当て、各ユーザの使用頻度によって表示色を変化
させても良い。例えば、ユーザＡに対して「赤」、ユー
ザＢに対して「青」を割り当てたとすると、ユーザＡが
頻繁に使用する語彙は赤色、ユーザＢが頻繁に使用する
語彙は青色に変化させる。そして、赤色になった語彙は
ユーザＡにとって認識し易いように設定し、反対に青色
になった語彙はユーザＢにとって認識し易いように設定
する。これによって、より使い勝手のよいインターフェ
ースを構築することができる。It is also possible to assign red and blue colors to the two users and change the display color according to the frequency of use of each user. For example, if "red" is assigned to the user A and "blue" is assigned to the user B, the vocabulary frequently used by the user A is changed to red and the vocabulary frequently used by the user B is changed to blue. Then, the red vocabulary is set so that the user A can easily recognize it, while the blue vocabulary is set so that the user B can easily recognize it. This makes it possible to build a more user-friendly interface.

【００５２】次に、第８実施例について述べる。図１２
は本発明の第８実施例に係る音声入力システムの構成図
である。Next, an eighth embodiment will be described. FIG. 12 is a configuration diagram of a voice input system according to the eighth embodiment of the present invention.

【００５３】第５実施例の構成に、音声データ管理部１
３が加わる。音声データ管理部１３では、特定のユーザ
が入力あるいは予め登録した音声データとそれらの音声
データの発声内容（語彙）、及びその語彙が入力可能と
なる応用データ処理部４の内部状態の情報を管理してい
る。音声データ管理部１３では、音声データを音声認識
部２に出力し、さらに、その語彙が入力可能となる応用
データ処理部４の内部状態と音声データの発声内容（語
彙）を音声入力制約情報管理部３へ出力する。音声入力
制約情報管理部３では、表８に示したような内部状態−
音声入力対象語彙対応テーブルを参照して、音声データ
管理部１３から入力された応用データ処理部４の内部状
態で認識対象となる語彙を音声認識部２へ出力し、音声
認識部２での認識結果と音声データ管理部１３より入力
した発声内容とを照らし合わせて、語彙ごとの認識率を
算出し、表１１に示すような認識率テーブルを作成す
る。A voice data management unit 13 is added to the configuration of the fifth embodiment. The voice data management unit 13 manages voice data input or registered in advance by a specific user, the utterance content (vocabulary) of that voice data, and information on the internal state of the application data processing unit 4 in which that vocabulary can be input. The voice data management unit 13 outputs the voice data to the voice recognition unit 2, and outputs the internal state of the application data processing unit 4 in which the vocabulary can be input, together with the utterance content (vocabulary) of the voice data, to the voice input restriction information management unit 3. The voice input restriction information management unit 3 refers to the internal state-voice input target vocabulary correspondence table such as that shown in Table 8, outputs to the voice recognition unit 2 the vocabulary to be recognized in the internal state of the application data processing unit 4 input from the voice data management unit 13, compares the recognition result from the voice recognition unit 2 with the utterance content input from the voice data management unit 13 to calculate a recognition rate for each vocabulary item, and creates a recognition rate table such as that shown in Table 11.

【表１１】また、音声入力部１から音声認識部２へ入力された音声
は第５実施例と同様に処理され、入力情報制御部８、応
用データ管理部４の処理の流れも第５実施例と同様であ
る。音声入力制約情報管理部３では表１１に示すような
上記認識率テーブルを参照し、表示制御部５に現在の応
用データ処理部４の内部状態の情報と各認識対象語彙の
認識率を表示制御部５に出力し、音声認識部２に現在の
認識対象語彙を出力する。表示制御部５では、認識対象
語彙が表示されているボタンアイテムの表示色が認識率
により区別されるよう制御し、表示データを表示部に出
力する。例えば、認識率が８０％未満の語彙のボタンア
イテムは赤色、８０％以上９０％未満の語彙のボタンア
イテムは黄色、９０％以上９５％未満の語彙のボタンア
イテムは薄い黄色、９５％以上の語彙のボタンアイテム
は青色で表示するように制御する。表示部６では、例え
ば、図１３に示すように認識率７５％の「書式」のボタ
ンアイテムは赤色、認識率８０％台の「ペースト」、
「セーブ」のボタンアイテムは黄色、認識率９３％の
「カット」と「フォント」のボタンアイテムは薄い黄
色、認識率９５％以上の他の語彙のボタンアイテムは青
色で表示される。これにより、ユーザは各語彙の認識性
能を一目で理解することができ、認識性能の良くない語
彙の認識性能を向上させるように認識辞書を学習させる
か、あるいは、その語彙は音声入力せず、マウスによっ
て選択すればよいことが分かり、入力誤りを未然に防ぐ
ことができる。[Table 11] The voice input from the voice input unit 1 to the voice recognition unit 2 is processed in the same manner as in the fifth embodiment, and the processing flow of the input information control unit 8 and the application data processing unit 4 is also the same as in the fifth embodiment. The voice input restriction information management unit 3 refers to the recognition rate table shown in Table 11, outputs the information on the current internal state of the application data processing unit 4 and the recognition rate of each recognition target vocabulary item to the display control unit 5, and outputs the current recognition target vocabulary to the voice recognition unit 2. The display control unit 5 controls the display so that the display color of the button item on which each recognition target vocabulary item is displayed is distinguished according to its recognition rate, and outputs the display data to the display unit. For example, button items for vocabulary with a recognition rate below 80% are displayed in red, those at 80% or more and below 90% in yellow, those at 90% or more and below 95% in light yellow, and those at 95% or more in blue. In the display unit 6, for example, as shown in FIG. 13, the "format" button item with a recognition rate of 75% is displayed in red, the "paste" and "save" button items with recognition rates in the 80% range in yellow, the "cut" and "font" button items with a recognition rate of 93% in light yellow, and the button items for other vocabulary with recognition rates of 95% or more in blue. This lets the user grasp the recognition performance of each vocabulary item at a glance and see that, for vocabulary with poor recognition performance, either the recognition dictionary should be trained to improve it or the item should be selected with the mouse rather than input by voice, so that input errors can be prevented before they occur.
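The per-vocabulary recognition rate and the color thresholds of paragraph [0053] can be sketched as below. The threshold values and colors come from the description; the function names and the representation of test results as (spoken word, recognized word) pairs are assumptions for illustration.

```python
from collections import defaultdict


def recognition_rates(results):
    """Compute the recognition rate (in percent) per vocabulary item.

    results: iterable of (spoken_word, recognized_word) pairs, where spoken_word
    is the known utterance content and recognized_word is the recognizer output.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for spoken, recognized in results:
        totals[spoken] += 1
        if spoken == recognized:
            correct[spoken] += 1
    return {word: 100.0 * correct[word] / totals[word] for word in totals}


def color_for_rate(rate_percent):
    """Map a recognition rate to the button color given in the description."""
    if rate_percent < 80:
        return "red"
    if rate_percent < 90:
        return "yellow"
    if rate_percent < 95:
        return "light yellow"
    return "blue"


# Hypothetical recognition log: "paste" is recognized correctly once out of twice.
rates = recognition_rates([("paste", "paste"), ("paste", "cut"), ("cut", "cut")])
button_colors = {word: color_for_rate(rate) for word, rate in rates.items()}
print(button_colors)
```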

【００５４】次に、第９実施例について述べる。図１４
は本発明の第９実施例に係る音声入力システムの構成図
である。Next, a ninth embodiment will be described. FIG. 14 is a configuration diagram of a voice input system according to the ninth embodiment of the present invention.

【００５５】第５実施例の構成に、音声認識辞書管理部
１４が加わる。音声認識辞書管理部１４は、例えば、発
声者の性別、年齢などの特徴の異なる複数の音声認識辞
書を管理し、音声認識部２から入力される辞書の選択要
求に適した認識辞書を音声認識部２に出力する。A voice recognition dictionary management unit 14 is added to the configuration of the fifth embodiment. The voice recognition dictionary management unit 14 manages a plurality of voice recognition dictionaries that differ in speaker characteristics such as gender and age, and outputs to the voice recognition unit 2 the recognition dictionary that matches the dictionary selection request input from the voice recognition unit 2.

【００５６】音声入力部１の処理の流れは、第５実施例
と同様であるが、音声認識部２では、認識の際に用いる
辞書を音声入力制約情報管理部３から入力される辞書の
選択要求に基づいて音声認識辞書管理部３から入力し、
第５実施例と同様に認識処理を行う。入力情報制御部８
の処理の流れは第５実施例と同様である。応用データ処
理部４では、第５実施例と処理の流れは同様であるが、
入力情報制御部８から音声認識辞書を選択する入力の場
合は、音声入力制約情報管理部３へ内部状態の変化情報
と選択された音声認識辞書の情報を出力する。音声入力
制約情報管理部３では、第５実施例と同様に現在の認識
対象単語を音声認識部２と表示制御部５に出力し、さら
に、音声認識辞書の選択に関する入力があった場合は、
表１２のような認識辞書属性テーブルを参照して、認識
辞書のＩＤ番号を音声認識部に出力する。例えば、図１
６のように音声認識辞書を選択する画面が表示されてお
り、ユーザが現在と異なった認識辞書を使用するために
「男性青年用」と音声入力あるいはマウスによって選択
した場合について説明する。この場合、音声入力制約情
報管理部では、表１２に示すような認識辞書属性テーブ
ルを参照し性別が男性で年齢が青年（１８〜３５才）の
認識辞書番号" １" を音声認識部に出力し、さらに、認
識辞書番号" １" の辞書が選択されたことを認識辞書属
性テーブルに記録する。The processing flow of the voice input unit 1 is the same as in the fifth embodiment, but the voice recognition unit 2 obtains the dictionary used for recognition from the voice recognition dictionary management unit 14 on the basis of the dictionary selection request input from the voice input restriction information management unit 3, and then performs recognition processing as in the fifth embodiment. The processing flow of the input information control unit 8 is the same as in the fifth embodiment. The processing flow of the application data processing unit 4 is also the same as in the fifth embodiment, except that when the input from the input information control unit 8 selects a voice recognition dictionary, the unit outputs the change information of the internal state and the information on the selected voice recognition dictionary to the voice input restriction information management unit 3. The voice input restriction information management unit 3 outputs the current recognition target words to the voice recognition unit 2 and the display control unit 5 as in the fifth embodiment and, when there is an input concerning the selection of a voice recognition dictionary, refers to a recognition dictionary attribute table such as that shown in Table 12 and outputs the ID number of the recognition dictionary to the voice recognition unit. For example, consider the case where a screen for selecting a voice recognition dictionary is displayed as shown in FIG. 16 and the user, in order to use a recognition dictionary different from the current one, selects "for young men" by voice input or with the mouse. In this case, the voice input restriction information management unit refers to the recognition dictionary attribute table shown in Table 12, outputs to the voice recognition unit the recognition dictionary number "1", for which the gender is male and the age group is young adult (18 to 35 years old), and records in the recognition dictionary attribute table that the dictionary with recognition dictionary number "1" has been selected.
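The dictionary selection in paragraph [0056] amounts to a lookup in an attribute table followed by recording the choice. A minimal sketch follows; since Table 12 is not reproduced in this text, only the first row (dictionary number 1, male, young adult) is taken from the description, and the second row, the class, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    dictionary_id: int
    gender: str
    age_group: str
    selected: bool = False


# Stand-in for Table 12. Row 1 follows the description; row 2 is a made-up example.
attribute_table = [
    DictionaryEntry(1, "male", "young adult"),    # 18 to 35 years old
    DictionaryEntry(2, "female", "young adult"),  # hypothetical row
]


def select_dictionary(table, gender, age_group):
    """Return the ID of the matching dictionary and record it as the selected one."""
    for entry in table:
        if entry.gender == gender and entry.age_group == age_group:
            for other in table:
                other.selected = other is entry
            return entry.dictionary_id
    raise LookupError(f"no dictionary for {gender}, {age_group}")


chosen_id = select_dictionary(attribute_table, "male", "young adult")
print(chosen_id)
```

The returned ID is what would be passed on to the recognizer, while the `selected` flag plays the role of the record kept in the attribute table.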

【００５７】[0057]

【表１２】表示制御部５、表示部の処理の流れは、第５実施例と同
様である。例えば、図１５に示すように初期画面に" 音
声認識辞書を選択" する機能を有するボタンアイテムを
マウスクリックあるいは音声入力すると図１６に示すよ
うな音声認識辞書を選択する画面が表示される。ユーザ
は、自分の性別、年齢に適した辞書を選択し、認識に用
いることができる。また、ユーザは、例えば自分の声質
が年齢よりも若い場合にも、自分の年齢よりも若い発声
者の音声データによって作成された辞書を選択すること
も可能となる。[Table 12] The processing flow of the display control unit 5 and the display unit is the same as in the fifth embodiment. For example, as shown in FIG. 15, when a button item having a function of "selecting a voice recognition dictionary" is clicked or voice input on the initial screen, a screen for selecting a voice recognition dictionary as shown in FIG. 16 is displayed. The user can select a dictionary suitable for his / her gender and age and use it for recognition. Also, for example, even when the voice quality of the user is younger than the age, the user can select the dictionary created by the voice data of the speaker who is younger than the age.

【００５８】このように、ユーザの声質に適した認識辞
書を選択することが可能となるため、音声認識誤りを減
少させることができ、使い勝手の良いインターフェース
を実現できる。なお、属性の分類の方法は表１２の例に
限らず、階層化することも可能である。例えば、「全
体」、「大人」、「大人の女性」……等に分類しても良
い。As described above, since the recognition dictionary suitable for the voice quality of the user can be selected, the voice recognition error can be reduced and the user-friendly interface can be realized. Note that the attribute classification method is not limited to the example in Table 12, and may be hierarchical. For example, it may be classified into “whole”, “adult”, “adult woman” ...

【００５９】次に、第１０実施例について述べる。構成
は、第１実施例と同様であるが、音声入力制約情報管理
部３は、内部状態−連続／孤立発声対応テーブルではな
く、表１３に示すような内部状態−自由／制限発話テー
ブルを保持している。Next, a tenth embodiment will be described. The configuration is similar to that of the first embodiment, but the voice input restriction information management unit 3 holds an internal state-free / restricted utterance table as shown in Table 13 instead of the internal state-continuous / isolated utterance correspondence table. is doing.

【００６０】[0060]

【表１３】内部状態−自由／制限発話対応テーブルは、語順の変
化、省略表現、「えー」「あの」のような発話内容に関
係の無い言葉（不要語）を含むような自由発話を入力と
する応用データ処理部４の内部状態と、システムの指定
した語順で一字一句間違えないように発話する必要があ
る内部状態を区別して管理している。処理の流れは、第
１実施例と同様であるが、音声入力制約情報管理部３で
は、表１３に示すような内部状態−自由／制限発話対応
テーブルを参照し、現在の内部状態で上述の自由発話に
よる入力が可能であるか（自由発話）、またはシステム
の指定した語順で一字一句間違えないように発話する必
要があるか（制限発話）を、音声認識部２と表示制御部
５に出力する。[Table 13] The internal state-free / restricted utterance correspondence table distinguishes and manages two kinds of internal state of the application data processing unit 4: those that accept free utterances, which may contain word order changes, abbreviated expressions, and words unrelated to the utterance content (unnecessary words) such as "er" and "um", and those in which the user must speak word for word, without error, in the word order specified by the system. The flow of processing is the same as in the first embodiment, but the voice input restriction information management unit 3 refers to the internal state-free / restricted utterance correspondence table shown in Table 13 and outputs to the voice recognition unit 2 and the display control unit 5 whether input by free utterance is possible in the current internal state (free utterance) or whether the user must speak word for word in the word order specified by the system (restricted utterance).

【００６１】音声認識部２では、自由発話を理解する場
合には、例えば入力音声に対してキーワードスポッティ
ングに基づいた単語検出を行い、検出された時間離散的
な単語系列に構文的制約情報を用いて文として成り立つ
系列を抽出し、発話の意味内容を理解する。一方、シス
テムの指定した語順で一字一句間違えないように発声さ
れた発話を理解する場合には、認識した語句を予めユー
ザに指定した語順に従って組み合わせて文を構成し、発
話を理解する。次に、理解した発話内容を応用データ処
理部４へ認識結果として出力する。When understanding a free utterance, the voice recognition unit 2, for example, performs word detection on the input voice based on keyword spotting, applies syntactic constraint information to the detected time-discrete word sequences to extract a sequence that forms a valid sentence, and thereby understands the meaning of the utterance. On the other hand, when understanding an utterance spoken word for word in the word order specified by the system, the unit combines the recognized words and phrases according to the word order specified to the user in advance to compose a sentence, and understands the utterance. The understood utterance content is then output to the application data processing unit 4 as the recognition result.

【００６２】応用データ処理部４では、処理結果などの
応答を表示制御部５に出力し、内部状態の変化情報を音
声入力制約情報管理部３に出力する。表示制御部５で
は、応用データ処理部４からの表示情報や、音声入力制
約情報管理部３から入力された自由／制限発話の区別を
表す表示情報を表示部６に出力する。表示部６では、表
示制御部５からの表示情報を画面表示する際、とくに、
現在、システムの指定した語順で一字一句間違えないよ
うに発話する必要がある場合は、入力形式を分かりやす
く視覚表示する。The application data processing unit 4 outputs responses such as processing results to the display control unit 5 and outputs the change information of the internal state to the voice input restriction information management unit 3. The display control unit 5 outputs to the display unit 6 the display information from the application data processing unit 4 and the display information indicating the free / restricted utterance distinction input from the voice input restriction information management unit 3. When the display unit 6 displays the display information from the display control unit 5 on the screen, and in particular when the user currently must speak word for word in the word order specified by the system, the input format is visually displayed in an easy-to-understand manner.

【００６３】例えば、表１４に示すように、航空機の座
席を指定する場合、喫煙／非喫煙、ファースト／ビジネ
スクラス、窓側／通路側などの希望を、語順の変化、省
略表現、不要語を含んだ自由な発話で入力が行えるた
め、入力方法について特別な表示は行わない。For example, as shown in Table 14, when a seat on an aircraft is specified, preferences such as smoking / non-smoking, first / business class, and window side / aisle side can be input by free utterance containing word order changes, abbreviated expressions, and unnecessary words, so no special display concerning the input method is provided.
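To illustrate why word order and filler words do not matter for the free utterance case, the following sketch extracts the three seat preferences from already-recognized text by simple keyword matching. It is a text-level simplification, not the acoustic keyword spotting and syntactic filtering described for the voice recognition unit 2; the slot names, keyword lists, and sample sentence are assumptions.

```python
# Hypothetical slot definitions for the seat example. Longer keywords come first
# so that "non-smoking" is not mistaken for "smoking".
SLOT_KEYWORDS = {
    "smoking": ["non-smoking", "smoking"],
    "class": ["first", "business"],
    "position": ["window", "aisle"],
}


def extract_preferences(utterance):
    """Pick out seat preferences regardless of word order or filler words."""
    text = utterance.lower()
    found = {}
    for slot, keywords in SLOT_KEYWORDS.items():
        for keyword in keywords:
            if keyword in text:
                found[slot] = keyword
                break
    return found


prefs = extract_preferences("Er, window side, um, non-smoking, business class please")
print(prefs)
```

Plain substring matching is deliberately naive (for instance, "first of all" would trigger the class slot), which is one reason a real system would add syntactic constraints as the description states.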

【００６４】[0064]

【表１４】しかし、表１５に示すようにニューヨークのホテルを予
約する場合は、「（ホテル名）に（日数）滞在したい」
などのように、希望のホテル名や滞在日数を予め設定し
た語順で入力する必要があることを分かりやすく表示す
る。[Table 14] However, as shown in Table 15, when a hotel in New York is reserved, it is displayed in an easy-to-understand manner that the desired hotel name and number of days must be input in a preset word order, such as "I want to stay at (hotel name) for (number of days)".
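By contrast, a restricted utterance has to fit a fixed template. The sketch below checks recognized text against the English rendering of the hotel template with a regular expression. The pattern, the accepted unit words, and the hotel name are hypothetical; the original system works on Japanese speech, so this only illustrates the idea of a fixed word order.

```python
import re

# Assumed English rendering of the template "(hotel name) ni (days) taizai shitai".
RESTRICTED_PATTERN = re.compile(
    r"^I want to stay at (?P<hotel>.+?) for (?P<days>\d+) (?:day|days|night|nights)$"
)


def parse_restricted(utterance):
    """Return the slots if the utterance follows the fixed word order, else None."""
    match = RESTRICTED_PATTERN.match(utterance.strip())
    if match is None:
        return None
    return {"hotel": match.group("hotel"), "days": int(match.group("days"))}


accepted = parse_restricted("I want to stay at Hotel Example for 3 days")
rejected = parse_restricted("um, 3 days at Hotel Example please")
print(accepted, rejected)
```

An utterance that deviates from the template is rejected outright, which is why the description calls for showing the required format on screen in this state.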

【００６５】[0065]

【表１５】このように、自由発話／制限発話の区別を分かりやすく
視覚表示することにより、ユーザは一目で入力発話形式
を理解でき、使い勝手の良いインターフェースを実現で
きる。[Table 15] In this way, by visually displaying the distinction between free utterances and restricted utterances in an easy-to-understand manner, the user can understand the input utterance form at a glance and realize a user-friendly interface.

【００６６】[0066]

【発明の効果】以上説明したように、本発明では、音声
入力対象文型、音声入力対象単語、連続／孤立発声、特
定／不特定話者対象、認識対象話者の性別や年齢、認識
性能などの音声入力を利用する際の制約情報を、システ
ムの状態に応じて分かりやすく視覚表示する事により、
ユーザは一目で音声入力の際の制約条件や発声方法を理
解することができる。よって、ユーザは音声入力の利用
に関連した制約情報が解らず戸惑ったり、入力間違いを
起こすことなく、快適に音声入力機能を利用することが
でき、初心者にも使い易いヒューマンインターフェース
が実現できる。As described above, in the present invention, constraint information that applies when voice input is used, such as the voice input target sentence pattern, the voice input target words, continuous / isolated utterance, specific / unspecified speaker targeting, the gender and age of the recognition target speaker, and recognition performance, is visually displayed in an easy-to-understand manner according to the state of the system, so that the user can understand the constraint conditions and the utterance method for voice input at a glance. The user can therefore use the voice input function comfortably, without being confused by not knowing the constraint information related to voice input and without making input errors, and a human interface that is easy to use even for beginners can be realized.

[Brief description of drawings]

【図１】本発明が適用された音声入力システムの第１実
施例の構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of a first embodiment of a voice input system to which the present invention is applied.

【図２】応用データ処理部での処理手順を示すフローチ
ャートである。FIG. 2 is a flowchart showing a processing procedure in an application data processing unit.

【図３】航空券予約システムの画面例を示す図である。FIG. 3 is a diagram showing a screen example of an airline ticket reservation system.

【図４】本発明の第４実施例の構成を示すブロック図で
ある。FIG. 4 is a block diagram showing a configuration of a fourth exemplary embodiment of the present invention.

【図５】本発明の第５実施例の構成を示すブロック図で
ある。FIG. 5 is a block diagram showing a configuration of a fifth exemplary embodiment of the present invention.

【図６】音声入力制約情報管理部での動作を示すフロー
チャートである。FIG. 6 is a flowchart showing an operation of a voice input restriction information management unit.

【図７】文字表示色を変化させた初期画面の例を示す説
明図である。FIG. 7 is an explanatory diagram showing an example of an initial screen in which a character display color is changed.

【図８】文書編集画面の例を示す説明図である。FIG. 8 is an explanatory diagram showing an example of a document edit screen.

【図９】テクスチャを変化させた初期画面の例を示す説
明図である。FIG. 9 is an explanatory diagram showing an example of an initial screen in which a texture is changed.

【図１０】本発明の第７実施例の構成を示すブロック図
である。FIG. 10 is a block diagram showing a configuration of a seventh exemplary embodiment of the present invention.

【図１１】文書編集画面の例を示す説明図である。FIG. 11 is an explanatory diagram showing an example of a document edit screen.

【図１２】本発明の第８実施例の構成を示すブロック図
である。FIG. 12 is a block diagram showing a configuration of an eighth exemplary embodiment of the present invention.

【図１３】文書編集画面の例を示す説明図である。FIG. 13 is an explanatory diagram illustrating an example of a document edit screen.

【図１４】本発明の第９実施例の構成を示すブロック図
である。FIG. 14 is a block diagram showing a configuration of a ninth exemplary embodiment of the present invention.

【図１５】音声認識辞書を選択可能としたときの初期画
面の例を示す説明図である。FIG. 15 is an explanatory diagram showing an example of an initial screen when a voice recognition dictionary is selectable.

【図１６】音声認識辞書の選択の画面例を示す説明図で
ある。FIG. 16 is an explanatory diagram showing an example of a screen for selecting a voice recognition dictionary.

[Explanation of symbols]

１音声入力部２音声認識部３音声入力制約情報管理部４応用データ処理部５表示制御部６表示部７音声出力部８入力情報制御部１２音声認識辞書作成部１３音声データ管理部１４音声認識辞書管理部 1: voice input unit; 2: voice recognition unit; 3: voice input restriction information management unit; 4: application data processing unit; 5: display control unit; 6: display unit; 7: voice output unit; 8: input information control unit; 12: voice recognition dictionary creation unit; 13: voice data management unit; 14: voice recognition dictionary management unit

Claims

[Claims]

1. A voice input system comprising: input means for inputting voice; recognition means for recognizing the input voice from the input means; visual display means for processing the recognition result and visually displaying a response; and constraint information display means for displaying constraint information that constrains the user's input voice.

2. A voice input system comprising: input means for inputting voice; recognition means for recognizing the input voice from the input means; visual display means for processing the recognition result and visually displaying a response; and constraint information display means for displaying constraint information that constrains the user's input voice with respect to at least one of continuous / isolated utterance, the number of continuous utterances, utterance volume, and whether numbers are uttered digit by digit or as multiple digits.

3. Input means for inputting voice, recognition means for recognizing input voice from the input means, visual display means for processing the recognition result and visually displaying a response, voice input or other Selection information display means for displaying selection information indicating which one of the input means is selectable by using at least one of a display color, a voice input recognition rate, and a recognition rate for a user's unique utterance. A voice input system comprising: