JPH0883092A

JPH0883092A - Information inputting device and method therefor

Info

Publication number: JPH0883092A
Application number: JP6219942A
Authority: JP
Inventors: Mizuhiro Hida; 瑞広飛田; Shigeki Sagayama; 茂樹嵯峨山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-09-14
Filing date: 1994-09-14
Publication date: 1996-03-26

Abstract

PURPOSE: To input information content easily and with high accuracy by providing means for recognizing visual information such as input characters, figures, and images, and means for recognizing auditory information such as input speech and music. CONSTITUTION: An input switching section 3 selects either a visual information input section 1, which inputs handwritten characters and symbols, or a visual information reading section 2, which reads characters and symbols that have already been created, and the visual information is read. Its content is recognized by a visual information recognition section 5, and the recognition results are sorted in order of likelihood by a recognition result sorting section 6. Similarly, the content of auditory information input through an auditory information input section 4 is recognized and evaluated by an auditory information recognition section 7, and the recognition results are sorted in order of likelihood by a sorting section 8. When there are multiple candidates on which both recognition results agree, they are output to a recognition result output section 9 in order of their overall likelihood. A correct information source is thus created by combining relatively rough handwritten character input, or existing handwritten documents, with speech recognition.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to techniques for recognizing visual information, mainly characters, symbols, figures, images, video, and combinations thereof, and to techniques for recognizing auditory information such as speech, acoustic signals, and combinations thereof. In particular, by combining the two kinds of recognition, it aims to improve recognition accuracy and efficiency over either used alone, and thereby to provide a better means of inputting information.

[0002]

[Prior Art] With the recent development of information devices, computers and portable terminals have become smaller, and information devices without a keyboard have appeared. On these, information is input mainly through handwritten character recognition and the like. Even on devices equipped with a keyboard, keyboard input is undesirable in some situations, for example input by an unpracticed person or input in a dark place. Beyond character input, there are also situations in which one wants to recognize and extract the information contained in figures, images, video, and the like.

[0003] Taking handwritten character recognition as an example, its performance is not necessarily satisfactory even for carefully written characters. For practical use it is desirable that input be recognized correctly even when entered quickly, as in scribbling, but recognition accuracy in that case is lower still than for careful writing. In general, when data consisting mainly of characters, including such symbols, is input and created as an information source, the basic approaches are, as noted above, writing by hand directly on paper or a tablet, or keying it in on a word processor, personal computer, or the like. Everything other than key input relies on machine character recognition or character reading, whose performance has been improving but is still not sufficient.

[0004] Meanwhile, the technical level of machine recognition of spoken utterances has also been improving steadily. However, as with handwritten character recognition, it is difficult to achieve a 100% recognition rate, particularly where the input environment varies. This is because the initial work of inputting the source information, whether as characters or as speech, is assumed to be done by a human. Handwriting and speech vary with each person's habits and characteristics even in a calm state, and if the working environment or the person's psychological state changes, those habits and characteristics can be expected to vary even more widely, which is difficult to avoid.

[0005] There is also a proposal to improve recognition performance by inputting kana characters as an auxiliary means for speech recognition (Japanese Patent Laid-Open No. 58-123596, a speech recognition system using auxiliary information). The idea there is to input the number of syllables of the utterance as the number of on/off presses of a push button, to give the five Japanese vowels with five keys, and, because consonants are too numerous to assign keys to, to provide instead an auxiliary function for inputting them as kana characters. However, these serve strictly as aids to speech recognition and are difficult to apply where handwritten character input is to be given priority.

[0006]

[Problem to Be Solved by the Invention] The present invention has been made in view of the above points, and its object is to provide an information input device for inputting the desired content of visual and auditory information easily and with high accuracy.

[0007]

[Means for Solving the Problem] Both the device and the method of the present invention are provided with visual information input means for inputting visual information such as characters, figures, and images; visual information recognition means for recognizing the input visual information; auditory information input means for inputting auditory information such as speech and music; and auditory information recognition means for recognizing the input auditory information.

[0008] In the device of the present invention, it is further detected when the visual information input means is made ready for input, and likewise when the auditory information input means is made ready for input. When only one of the two is detected as ready, only the corresponding input means is enabled for information input; when both are detected as ready, both input means are enabled. There is also means by which one of the visual information recognition means and the auditory information recognition means narrows its recognition targets to the recognition result candidates of the other. For a recognition candidate on which the results of the visual information recognition means and the auditory information recognition means agree, a likelihood combining the two likelihoods is computed and used as the likelihood of that candidate.
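The input-enabling rule described above can be sketched as follows. This is only an illustrative sketch: the function name `enabled_inputs` and the boolean readiness flags are assumptions made here, and how readiness is actually detected is not specified in this passage.

```python
def enabled_inputs(visual_ready, auditory_ready):
    """Return the set of input means that are enabled for information input.

    visual_ready / auditory_ready are hypothetical flags standing in for
    "input preparation has been detected" for each input means.
    """
    enabled = set()
    if visual_ready:
        enabled.add("visual")
    if auditory_ready:
        enabled.add("auditory")
    return enabled


# Only the prepared input means is enabled; both are enabled when both are ready.
print(enabled_inputs(True, False))
print(enabled_inputs(True, True))
```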

[0009] In the method of the present invention, one of the visual information input means and the auditory information input means is used first to input and recognize information; the other input means is then used to input and recognize the corresponding information, with its recognition targets narrowed according to the earlier recognition result. Alternatively, input through the visual information input means and input through the auditory information input means are performed simultaneously, and for candidates common to both recognition results the likelihoods are combined into a single likelihood that is used for recognition. Further, the method of inputting one first and the method of inputting both simultaneously are used selectively.
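For the simultaneous-input case, the combining of likelihoods for candidates common to both recognition results might look like the sketch below. The function name `fuse_candidates`, the dictionary representation, and the integer scores are assumptions for illustration; summation is used as the combining rule because paragraph [0030] speaks of taking the sum of the likelihoods, but other rules are conceivable.

```python
def fuse_candidates(visual, auditory):
    """Combine likelihoods of candidates that both recognizers agree on.

    visual, auditory: dicts mapping a candidate label to its likelihood
    (assumed here to be "higher is better").
    Returns (label, combined_likelihood) pairs, best first.
    """
    shared = visual.keys() & auditory.keys()
    fused = {label: visual[label] + auditory[label] for label in shared}
    # Highest combined likelihood first; ties are broken by label.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores: handwriting slightly prefers 昔, speech strongly prefers 音.
visual_scores = {"音": 60, "昔": 70, "暗": 20}
auditory_scores = {"音": 80, "男": 30, "昔": 10}
ranking = fuse_candidates(visual_scores, auditory_scores)
print(ranking[0][0])
```

In this made-up example neither recognizer alone is reliable, yet the combined ranking places 音 first because it is the only candidate that both score highly.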

[0010]

[Operation] With the information input device or information input method according to the present invention, relatively rough handwritten character input, or an already handwritten document where one exists, can serve as the primary information input source, and using speech recognition alongside it makes it possible to create a correct information source. When no such document or other visual information source exists, a speech signal as auditory information serves as the primary input source, and using handwritten character recognition alongside it makes it easy to create a correct information source.

[0011] That is, on the premise that current individual techniques do not achieve a 100% recognition rate, one can, for example, write a character and also give its reading by voice. In a typical approach, character recognition produces several recognition candidates, and speech recognition with those candidates as its vocabulary selects the correct character. For example, the user writes 「音」 by hand while saying 「おと」 (oto). This basic invention takes many forms. For example, when identifying gender from video (moving images), if the person's voice is available, combining it with voice-based gender identification raises overall identification performance.

[0012] In short, by combining techniques for recognizing visual information and auditory information in this way, the present invention can provide a highly accurate information input means.

[0013]

[Embodiment] FIG. 1 shows the functional configuration of one embodiment of the present invention. The visual information input unit 1 is the part through which information such as handwritten characters, symbols, images, figures, and video is input as it is being created, and the visual information reading unit 2 is the part that reads such information when it has already been created. The input switching unit 3 selects one of the visual information input unit 1 and the visual information reading unit 2 to read visual information. An auditory information input unit 4 is also provided for capturing speech uttered by a speaker and other acoustic information through a microphone or the like. The content of the visual information passed through the input switching unit 3 is recognized by the visual information recognition unit 5, and the results are sorted by the recognition result sorting unit 6 in order of likelihood (meaning a score, similarity, or distance value). Similarly, the content of the acoustic information input through the auditory information input unit 4 is recognized and evaluated by the auditory information recognition unit 7, and the results are sorted by the recognition result sorting unit 8 in order of likelihood (as above). The recognition result output unit 9 outputs the recognition results for both the visual and the auditory information. If the output result contains an error, the correction location specifying unit 11 can identify the location to be corrected. Once the input information has been corrected, the result is recorded and stored in the input information storage unit 12, such as a semiconductor memory or magnetic disk, from which the information output unit 13 outputs the correctly input information to the outside. The operation control unit 14 controls the operating procedure of the device, including starting each unit, the order of operations, and control of the input switching unit 3.

[0014] FIG. 2 shows an example of the external appearance of this device. The housing 21 has the shape of a roughly thin rectangular parallelepiped. Its upper surface 21a is a tapered surface that rises very slightly toward the rear surface 21b and rises more steeply near the rear surface 21b; this steeper portion serves as the display surface 22, which displays recognition results. The very gradually rising central portion of the upper surface is the visual information input surface 20, which forms part of the visual information input unit 1 and is a touch panel, a CRT display for light-pen input, or the like. A pen 23 for entering characters, symbols, and figures by hand on the visual information input surface 20 forms another part of the visual information input unit 1 and is held detachably near and parallel to the front edge of the upper surface 21a. Between the pen 23 and the visual information input surface 20, a photoelectric conversion unit 24, which forms part of an OCR (optical character reader) or similar device constituting the visual information reading unit 2, is detachably mounted. Alongside the photoelectric conversion unit 24, a microphone 25 constituting the auditory information input unit 4 is provided so that it can be taken out as needed. A plurality of control keys 26 are provided between the display surface 22 and the visual information input surface 20; they are used to select and control operations such as moving the cursor during input, switching the input switching unit 3, confirming input information, printing, and outputting information.

[0015] A speaker 27 and an earphone terminal 28 for outputting recognition results, and a data terminal 29 for output to external equipment, are provided at one end of a side face of the housing 21. The visual information input surface 20, such as a touch panel or CRT display, and the display surface 22 may be combined into a single surface with both functions. Next, an example of the operation of the device shown in FIGS. 1 and 2 is described. First, to input information such as characters from the visual information input unit 1, the operation control unit 14 sets the input switching unit 3 so that data from the visual information input unit 1 is fed to the visual information recognition unit 5. At the visual information input unit 1, the content to be input is entered by hand as data, for example using the touch panel serving as the visual information input surface 20 together with the pen 23. If the source to be input already exists, the input switching unit 3 is switched to the visual information reading unit 2, and the content of the source is input as data using, for example, an optical-scanning reader 24 such as an OCR. As a result, the output of the visual information input unit 1 or the visual information reading unit 2 is fed to the visual information recognition unit 5. The visual information recognition unit 5 recognizes the input content by comparing it with characters and symbols stored in advance as standard patterns. If only one candidate is recognized, that candidate is output to the recognition result output unit 9. If there are multiple candidates, the recognition result sorting unit 6 sorts them in descending order of likelihood and outputs the result to the recognition result output unit 9. The output of the recognition result output unit 9 can be checked visually on the display surface 22, or audibly through the speaker 27, the earphone 28, or the like.
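The handling of recognized candidates before output (a single candidate is passed straight through, several are sorted by likelihood) can be sketched as below. The names `sort_candidates` and `to_output`, and the `kind` parameter, are assumptions for illustration. Treating a distance value as "smaller is better" is also an assumption made here; the description only says that the likelihood may be a score, a similarity, or a distance value.

```python
def sort_candidates(candidates, kind="similarity"):
    """Sort (label, value) pairs so that the most likely candidate comes first.

    kind: "similarity" (or score), where a larger value is assumed better,
          or "distance", where a smaller value is assumed better.
    """
    descending = kind != "distance"
    return sorted(candidates, key=lambda c: c[1], reverse=descending)


def to_output(candidates, kind="similarity"):
    """Pass a single candidate through unchanged; sort when there are several."""
    if len(candidates) <= 1:
        return list(candidates)
    return sort_candidates(candidates, kind)


# Hypothetical similarity values from matching against stored standard patterns.
print(to_output([("昔", 0.4), ("音", 0.9), ("暗", 0.1)]))
```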

[0016] Next, the case of using the auditory information input unit 4 to input the source is described. Here, the speaker's voice is input to the auditory information input unit 4 through a device, such as a microphone, that converts acoustic signals into electrical signals. The auditory information input unit 4 converts the input into a set of parameters that express the features of the auditory information such as speech, and feeds them to the auditory information recognition unit 7 as time-series data. There they are compared against a dictionary registered in advance, which pairs standard feature parameters with, in the case of speech, the notation of phonemes, words, sentences, or symbols, to recognize what the input speech contains. If only one candidate is recognized, it is output to the recognition result output unit 9 as before. If there are multiple candidates, the recognition result sorting unit 8 sorts them in descending order of likelihood and outputs the result to the recognition result output unit 9.

[0017] In this way, a recognition result can be output for content entered either as visual information, such as handwritten characters and symbols, or as auditory information by voice. If, with either method, the content to be input has been recognized correctly, the result is recorded without correction in the corrected-result storage unit 12 and, as needed, sent through the information output unit 13 to a printer to obtain a hard copy, or over a communication line to provide the information to another party. As noted at the outset, however, it is rare for input to be recognized with no need for correction at all, whichever of the visual information input unit 1, the visual information reading unit 2, or the auditory information input unit 4 is used. The following therefore describes an example of the recognition procedure and of how results are corrected and confirmed when the recognition result contains an error.

[0018] As described earlier, in the present invention there are three ways to input the source when entering the desired information. Any one of them can be selected, and the error correction procedure is set accordingly. In the first, input is made using either visual information recognition or visual information reading, and if the recognition result contains an error, auditory information recognition is used to correct it. In the second, input is first made using auditory information recognition, and if the recognition result contains an error, either visual information recognition or visual information reading is used to correct it. In the third, either visual information recognition or visual information reading is used simultaneously with auditory information recognition, and if the recognition result contains an error, any of visual information recognition, visual information reading, or auditory information recognition is used to correct it.

[0019] By configuring the device so that the priority among the techniques applied at input time can be chosen from three options, namely handwritten character input, voice input, or both simultaneously, and so that the error correction procedure is set accordingly, the following advantages arise. The first advantage is that input can be adapted to the surroundings in which the source is entered. For example, when the surroundings are noisy, visual input can be given priority and corrections added by voice or other auditory information once the noise has subsided. Conversely, where the surroundings are relatively quiet but the hand is unsteady, as in a moving vehicle, or where it is dark, voice input can be given priority and errors corrected by handwriting once the shaking has stopped or the lighting is adequate.

[0020] FIG. 3 shows examples of the recognition flow in this case. In FIG. 3A, visual information is input (S₁) and recognized (S₂), and the results ranked first through n-th are output (S₃). To use these as the narrowed set of recognition candidates for auditory recognition, auditory information is input for the same source (S₄), recognition is performed on it against only the n candidates recognized from the visual information (S₅), and the recognition result is output (S₆). Cooperation between the two in this way is expected to shorten recognition processing time and improve recognition accuracy. FIG. 3B shows the case where the input procedure is reversed: auditory information is input and recognized first, and recognition from visual input is then performed against the resulting candidates. The effect is the same as in case A. The user chooses which recognition procedure takes priority with the control keys 26, and can set it to whichever gives the more accurate result given the surroundings at input time.
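The flow of FIG. 3A, in which the first modality produces the top n candidates and the second modality is evaluated only against those n, can be sketched as follows. Everything named here (`top_n`, `recognize_sequential`, the `READINGS` table, and the toy speech scorer) is a hypothetical stand-in for illustration; a real recognizer would supply its own scoring.

```python
def top_n(scores, n):
    """Return the n best labels, highest likelihood first (steps S1 to S3)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:n]]


def recognize_sequential(first_scores, second_scorer, n):
    """Score only the first modality's top-n candidates with the second modality."""
    candidates = top_n(first_scores, n)
    if not candidates:
        return None
    # Steps S4 and S5: the second recognizer is restricted to the n candidates.
    second_scores = {label: second_scorer(label) for label in candidates}
    # Step S6: output the best candidate; ties keep the first modality's order.
    return max(candidates, key=lambda label: second_scores[label])


# Toy data: a made-up reading table and a scorer that checks the heard reading.
READINGS = {"音": "おと", "昔": "むかし", "暗": "くら", "意": "い"}


def make_speech_scorer(heard):
    return lambda label: 1 if READINGS.get(label) == heard else 0


visual_scores = {"昔": 60, "音": 55, "暗": 20, "意": 10}
result = recognize_sequential(visual_scores, make_speech_scorer("おと"), n=3)
print(result)
```

Swapping the roles of the two modalities gives the reversed order of FIG. 3B; the restriction to n candidates is what would reduce the second recognizer's search.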

[0021] The second advantage arises when either visual information recognition or reading, for handwritten characters and the like, is used simultaneously with speech recognition. In this case the user reads the text aloud while writing it, so when creating a document there is a good chance of reducing errors in the entered content itself.

[0022] Specific techniques for this case are given below.

Example 1: While the information to be input is entered by handwriting, the same content is also spoken and entered by voice. Through the cooperation of the two, the input accuracy is higher than that of handwritten character recognition alone and also higher than that of speech recognition alone. The following input techniques are conceivable as embodiments combining handwriting input and voice input.

[0023] Specific example 1): 「嵯峨山」 is entered by handwriting and 「さがやま」 (sagayama) by voice. Here a personal name is handwritten in kanji and its reading is entered by voice.

Specific example 2): 「音声」 is entered by handwriting and 「おと、こえ」 (oto, koe) by voice. Here the reading of each kanji character is entered as its kun reading.

[0024] Specific example 3): 「識」 is entered by handwriting and 「ごんべん」 (gonben) by voice. Here the name of the radical of 「識」 is entered by voice.

Specific example 4): 「嵯」 is entered by handwriting and 「やまへん、さ」 (yamahen, sa) by voice. Here the radical name of the input character and the reading of the part that suggests the character are entered.

[0025] Specific example 5): 「機」 is entered by handwriting and 「きかい」 (kikai) is spoken. Here a compound word containing the character to be input is spoken.

Specific example 6): 「機」 is entered by handwriting and 「きかい、きへん、はた」 (kikai, kihen, hata) by voice. Here several of the kinds of utterance covered by specific examples 2) to 5) are entered together.

[0026] Specific example 7): 「◇」 is entered by handwriting and 「ひしがた」 (hishigata, "diamond") by voice. Here the voice indicates that the input is a symbol. As these examples show, when the visual information is one or more handwritten characters or handwritten symbols, the auditory information can be the on reading or kun reading of the visual information to be input, the name of its radical, the name of the symbol, or a word or keyword that suggests the character to be input.
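One way to picture how spoken hints of the kinds listed above (readings, radical names, compound words) could select among handwriting candidates is the sketch below. The `HINTS` table and the function `pick_by_hints` are purely illustrative assumptions; the description does not specify how hints are stored or matched.

```python
# Hypothetical hint table: readings, radical name, and a compound word per character.
HINTS = {
    "機": {"き", "はた", "きへん", "きかい"},
    "械": {"かい", "きへん", "きかい"},
    "横": {"おう", "よこ", "きへん"},
}


def pick_by_hints(visual_candidates, spoken_hints, hints=HINTS):
    """Pick the handwriting candidate that matches the most spoken hints.

    visual_candidates: labels in the handwriting recognizer's ranked order.
    spoken_hints: recognized utterances, e.g. ["きかい", "きへん", "はた"].
    Returns None when no candidate matches any hint.
    """
    if not visual_candidates:
        return None

    def matches(label):
        return sum(1 for hint in spoken_hints if hint in hints.get(label, ()))

    # max() keeps the earliest candidate on ties, i.e. the visual ranking decides.
    best = max(visual_candidates, key=matches)
    return best if matches(best) > 0 else None


# Specific example 6): 「機」 written by hand, "kikai, kihen, hata" spoken.
print(pick_by_hints(["械", "機", "横"], ["きかい", "きへん", "はた"]))
```

With only 「きかい」 spoken, both 機 and 械 match once in this toy table, so the handwriting ranking decides; adding the radical name and the kun reading, as in specific example 6), removes the ambiguity.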

[0027] Examples 2 to 5 below use speech together with an image or video source to improve recognition accuracy.

Example 2: Sagayama's face is input by image recognition and 「さがやま」 (sagayama) is entered by voice. Through the cooperation of the two, 「嵯峨山」 can be entered without using a keyboard. The accuracy is higher than that of image recognition alone and also higher than that of speech recognition alone.

【００２８】例３：映像入力が赤紫色の物体で、音声で
は「あかむらさき」と入力する。両者の協調により、色
名「赤紫」が精度良く入力できる。その精度は、映像の
色彩認識単独の精度よりも、また音声認識単独の精度よ
りも高い。例４：画像認識入力が男の顔で、音声も男の音声が入力
される。両者の協調により、対象が男であることが認識
される、つまり男女の性別の入力に用いて、その精度
は、画像認識単独の精度よりも、また音声認識単独の精
度よりも高い。Example 3: The video input is a reddish-purple object and "akamurasaki" is input by voice. Through the cooperation of the two, the color name "reddish purple" (赤紫) can be input accurately. The accuracy is higher than that of color recognition from the video alone and that of voice recognition alone. Example 4: The image recognition input is a man's face and the voice input is also a man's voice. Through the cooperation of the two, the subject is recognized as male; that is, when the method is used to input gender, its accuracy is higher than that of image recognition alone and that of voice recognition alone.

【００２９】例５：手書き図形入力が正方形で、音声で
「せいほうけい」と入力する。両者の協調により、その
精度は、画像認識単独の精度よりも、また音声認識単独
の精度よりも高い。例６：書籍を読み取る文字認識入力と、その内容を朗読
した音声の両方の協調により、精度良く入力ができる。
その精度は、文字認識単独の精度よりも、また音声認識
単独の精度よりも高い。Example 5: The handwritten figure input is a square and "seihoukei" is input by voice. Through the cooperation of the two, the accuracy is higher than that of image recognition alone and that of voice recognition alone. Example 6: A book is read by character recognition while its content is read aloud, and the cooperation of the two inputs allows accurate input. The accuracy is higher than that of character recognition alone and that of voice recognition alone.

【００３０】次に入力した内容が、所望の情報内容であ
ることを抽出確定する過程について説明する。この場合
は、視覚情報認識技術もしくは視覚情報読取技術のいず
れか一方と、聴覚情報認識技術を併用したときに認識さ
れる上位から第ｎ位までの候補を、認識結果のソート部
６ならびに８を経て、認識結果出力部９へ出力する。こ
のとき、視覚情報認識技術もしくは視覚情報読取技術の
いずれか一方と、聴覚情報認識技術との両者による出力
結果が同一であるとした候補が複数組ある場合には、視
覚情報認識技術もしくは視覚情報読取技術と、聴覚情報
認識技術との両者で認識した結果から、まずそれぞれの
尤度（スコア、類似度もしくは距離値）の和を求め、次
に尤度（前同）の和の大きい順に前記のソート部６また
は８でソーティングを行って、その結果を視覚情報とし
てディスプレイに表示するか、もしくは聴覚情報として
使用者が確認できるように認識結果出力部９へ出力す
る。Next, the process of extracting and confirming that the input content is the desired information is described. In this case, the top n candidates recognized when either the visual information recognition technique or the visual information reading technique is used together with the auditory information recognition technique are passed through the recognition result sorting units 6 and 8 and output to the recognition result output unit 9. If there are several candidates for which the visual result (recognition or reading) and the auditory result coincide, the sum of the two likelihoods (score, similarity, or distance value) is first computed for each such candidate. The candidates are then sorted by sorting unit 6 or 8 in descending order of that sum, and the result is output to the recognition result output unit 9 so that the user can check it, either shown on the display as visual information or presented as auditory information.
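
The ranking of coinciding candidates by summed likelihood can be sketched as follows. This is a minimal illustration: the function name, the dictionary representation of recognizer output, and the example scores are all assumptions, and a larger value is assumed to mean a more likely candidate.

```python
def rank_matching_candidates(visual, auditory):
    """Rank candidates found by both recognizers by the sum of likelihoods.

    visual, auditory: dicts mapping a candidate to its likelihood
    (assumed here: higher means more likely).
    """
    common = visual.keys() & auditory.keys()
    combined = {c: visual[c] + auditory[c] for c in common}
    # Descending order of the summed likelihood, as in the text above.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores for a handwritten character and its spoken hint.
visual_scores = {"識": 60, "職": 70, "織": 30}
auditory_scores = {"識": 80, "式": 50, "職": 20}
ranking = rank_matching_candidates(visual_scores, auditory_scores)
```

In this example "職" is the best handwriting candidate on its own, but "識" comes first once the auditory likelihood is added.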

【００３１】この場合、視覚情報認識技術もしくは視覚
情報読取技術と、聴覚情報認識技術との両者を用いて認
識したときに算出されるそれぞれの尤度の値に、一定の
重み係数を付与した後に両者の和を求めることが必要と
なる場合も生ずる。これは例えば、視覚情報を用いたと
きの認識尤度の値が第１位から５位までが８００から５
０程度の範囲であるのに対し、聴覚情報を用いたときの
認識尤度の値が前と同じく第１位から５位までが１０か
ら0.２程度と桁数が異なる場合は、それらの尤度の値に
重みつけを行うことが必要で、これは実験的に前もって
適当な値に設定しておくことが精度の良い認識結果を得
るために大切である。実験的に各種の場合を決めるのは
大変であるから、両認識尤度がほゞ同一オーダとなるよ
うに、例えば正規化してもよい。In this case, it may be necessary to apply fixed weighting coefficients to the likelihood values calculated by the visual technique (recognition or reading) and the auditory technique before taking their sum. For example, suppose the recognition likelihoods of the first to fifth candidates range from about 800 down to about 50 with visual information, while those of the first to fifth candidates with auditory information range from about 10 down to about 0.2. When the orders of magnitude differ like this, the likelihood values must be weighted, and setting suitable weights experimentally in advance is important for obtaining accurate recognition results. Because determining the weights experimentally for every case is laborious, the likelihoods may instead be normalized, for example, so that both are of roughly the same order.
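
The effect of mismatched score ranges, and one way to bring them to the same order, can be sketched as below. The specification does not say which normalization is used; min-max scaling, the function names, the weights, and the scores are assumptions chosen for illustration.

```python
def min_max_normalize(scores):
    """Rescale likelihoods to the range 0..1 (an assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {c: 1.0 for c in scores}
    return {c: (s - lo) / span for c, s in scores.items()}


def total_likelihood(visual, auditory, w_visual=1.0, w_auditory=1.0):
    """Weighted sum of normalized likelihoods for coinciding candidates."""
    v, a = min_max_normalize(visual), min_max_normalize(auditory)
    return {c: w_visual * v[c] + w_auditory * a[c] for c in v.keys() & a.keys()}


# Hypothetical scores on the scales mentioned in the text:
# visual about 800..50, auditory about 10..0.2.
visual = {"職": 800, "識": 700, "織": 50}
auditory = {"識": 10, "職": 0.2, "式": 5}

raw = {c: visual[c] + auditory[c] for c in visual.keys() & auditory.keys()}
raw_best = max(raw, key=raw.get)            # visual scale dominates
total = total_likelihood(visual, auditory)
normalized_best = max(total, key=total.get)  # both modalities contribute
```

With the raw sum the auditory result has almost no influence, whereas after normalization the strongly supported spoken candidate wins.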

【００３２】次に、認識結果出力部９へ認識結果が出力
されかつ、その内容の一部に誤りがある場合には、修正
箇所特定部１１により修正したい箇所へカーソルを移動
してやることとなる。この場合の、誤り修正を行うため
の修正箇所を特定するための手段としては、文字入力に
使用する機能部品であるペン２３や指を用いるかもしく
は上下左右方向へカーソルを移動して修正箇所を特定で
きる動作制御部の機能キー２６を用いることとなる。Next, when a recognition result has been output to the recognition result output unit 9 and part of it is wrong, the correction position specifying unit 11 is used to move the cursor to the position to be corrected. The position can be specified either with the pen 23 or a finger, which are also used for character input, or with the function keys 26 of the operation control section, which move the cursor up, down, left, and right.

【００３３】さらに、誤り修正を行うための修正箇所が
文字の一部分をなす偏や旁である場合は、前述した方法
により修正したい箇所を指定し、既に認識されている文
字以外の候補を保有している辞書の中から複数個を順次
候補として出力するとともに、当該候補の中から正解で
ある文字に対してこれを特定する機能を付与している。Further, when the part to be corrected is a component of a character, such as its left-hand radical (hen) or right-hand component (tsukuri), the position is designated by the method described above, several candidates other than the already recognized character are output one after another from the dictionary that holds them, and a function is provided for designating the correct character among those candidates.

【００３４】なお、入力すべき情報が特に文字の場合で
かつ音声認識技術を用いる場合は、文字の読みを音読み
もしくは訓読みのいずれをも許容できるように表記法を
変えて事前に登録しておくこととする。これらの内容に
ついては、先に手書き入力と音声入力の組合せの実施例
として既に示した通りである。他の実施例として、入力
から確定までの一連の手順を以下に示す。When the information to be input is characters and voice recognition is used, the readings of each character are registered in advance under separate notations so that both the on-reading and the kun-reading are accepted. This is as already shown in the embodiments combining handwriting input and voice input. As another embodiment, the sequence of steps from input to confirmation is described below.

【００３５】まず入力モードの切替えは、例えばペン２
３を筐体２１から取り上げることによって視覚情報入力
の優先モードとなり、一方情報読取部２４の例えばＯＣ
Ｒ部品を取り上げれば視覚情報読取部の優先モードとな
るように構成することで、機能キー２６の個数を低減し
て構成できる。聴覚情報による入力を優先する場合は、
前述の機能キー２６で選定すれば良い。なお音声入力の
場合は、発話の開始と終了のタイミングを特定するため
の機能キーを指定しておき、これを用いて発話区間の情
報の入力を同時に行うことによって、より高精度の認識
が実現できる場合もある。First, regarding switching of the input mode: picking up the pen 23 from the housing 21, for example, selects the mode that gives priority to visual information input, while picking up the information reading unit 24, for example an OCR component, selects the mode that gives priority to the visual information reading unit. Configuring the device this way reduces the number of function keys 26 required. To give priority to input by auditory information, the mode is selected with the function keys 26 mentioned above. For voice input, a function key may be assigned to mark the start and end of an utterance; supplying the utterance interval together with the speech in this way can in some cases yield more accurate recognition.

【００３６】このような入力の動作機能を有する情報入
力装置において、ペン２３を筐体２１から取り上げてタ
ッチパネルなどの視覚情報入力面２０等の上に「認識する」という文字を手書きによって入力すると、手書き情報の
入力優先状態となって文字認識動作が行われる。このと
き、音声の入力も同時に行いたい場合には、前述のキー
選択２６を手書き情報入力と音声情報入力との両者を同
時に行うためのキーを押下すれば良い。情報の入力が終
了すれば、認識された結果が先に述べた尤度値の順位で
表示面２２あるいはスピーカ２７等に表示される。その
結果が、例えば「認識する」というように「識」の部分が「職」に誤っていた場合
は、ペン２３で、当該誤り部分を指定すると、その部分
の第２候補が出力されてくる。この時点で正解の「識」
が出力されれば、機能キー２６の例えば「確定」という
キーを押下するか、もしくはこれに代わるコマンドを入
力することで、正しい入力情報を確定して取り込むこと
ができる。In an information input device that operates in this way, when the pen 23 is picked up from the housing 21 and the characters "認識する" ("recognize") are handwritten on the visual information input surface 20, such as a touch panel, the device enters the handwriting-priority state and character recognition is performed. To input voice at the same time, the user presses the function key 26 that selects simultaneous handwriting and voice input. When input is finished, the recognized results are presented on the display surface 22, through the speaker 27, or the like, in the order of likelihood described above. If the result contains an error, for example the "識" in "認識する" has been recognized as "職", pointing at the erroneous part with the pen 23 outputs the second candidate for that part. If the correct "識" is output at this point, pressing a function key 26 such as "confirm", or entering an equivalent command, confirms the correct input and takes it in.
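
The correction step, in which pointing at an erroneous character brings up its next candidate, can be sketched as below. The class `CandidateSlot`, the wrap-around after the last candidate, and the candidate lists are assumptions made for illustration only.

```python
class CandidateSlot:
    """One character position holding its candidates in likelihood order."""

    def __init__(self, candidates):
        if not candidates:
            raise ValueError("a slot needs at least one candidate")
        self.candidates = list(candidates)
        self.index = 0  # the first candidate is shown initially

    @property
    def current(self):
        return self.candidates[self.index]

    def next_candidate(self):
        """Advance to the next candidate (wrap-around is an assumption)."""
        self.index = (self.index + 1) % len(self.candidates)
        return self.current


def render(slots):
    return "".join(slot.current for slot in slots)


# Hypothetical recognition result for the handwritten "認識する".
slots = [
    CandidateSlot(["認"]),
    CandidateSlot(["職", "識", "織"]),  # first candidate is wrong
    CandidateSlot(["す"]),
    CandidateSlot(["る"]),
]
text_before = render(slots)
slots[1].next_candidate()  # user points at the second character with the pen
text_after = render(slots)  # the user would now press "confirm"
```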

【００３７】全体の処理流れ図を図４に示す。つまり常
時は入力準備が検出されるかを監視し（Ｓ₁），つまり
視覚情報手動入力、視覚情報読出し入力、または聴覚入
力を示すいずれかのキー２６が操作されるか、ペン２３
の取り上げ、情報読取部２４の取り上げ、マイクロホン
２５の取り出しのいずれかが行われると、入力準備が検
出される。これより手動入力、つまり手書入力かが調べ
られ（Ｓ₂），手書入力でなければ読取り入力かが調べ
られ（Ｓ₃），読取り入力でなければ聴覚情報の入力と
決定され、聴覚情報入力部４からの聴覚情報が取り込ま
れる（Ｓ₄）。この取り込まれた聴覚情報に対する認識
が聴覚情報認識部７で行われ（Ｓ₅），その認識結果が
出力部１３から出力され、誤りがあるかが調べられる
（Ｓ₆）。The overall processing flow is shown in FIG. 4. The device constantly monitors whether input preparation is detected (S₁). Input preparation is detected when any of the keys 26 indicating manual visual information input, visual information reading input, or auditory input is operated, or when the pen 23 is picked up, the information reading unit 24 is picked up, or the microphone 25 is taken out. It is then checked whether the input is manual, that is, handwritten (S₂); if not, it is checked whether it is reading input (S₃); and if not, the input is determined to be auditory, and auditory information from the auditory information input unit 4 is captured (S₄). The captured auditory information is recognized by the auditory information recognition unit 7 (S₅), the recognition result is output from the output unit 13, and it is checked whether there is an error (S₆).

【００３８】誤りがあれば視覚情報入力手段、視覚情報
入力部１，または視覚情報読取部２から、前記聴覚情報
入力部４で入力した聴覚情報と対応した視覚情報が入力
される（Ｓ₇）。その入力された視覚情報は視覚情報認
識部５で認識されるが、その認識対象は聴覚情報認識ス
テップＳ₅で認識された候補のうち、予め決められた上
位から一定の数のものについて認識が行われる
（Ｓ₈）。その認識結果は尤度の高い順に出力部１３に
可視表示され、または可聴的に出力される（Ｓ₇）。そ
の認識結果候補において、必要に応じて修正が行われ
（Ｓ₁₀），その後、正しい入力して確定操作が例えばキ
ー２６により行われる（Ｓ₁₁）。If there is an error, visual information corresponding to the auditory information input via the auditory information input unit 4 is input from the visual information input means, that is, the visual information input unit 1 or the visual information reading unit 2 (S₇). The input visual information is recognized by the visual information recognition unit 5, but recognition is restricted to a predetermined number of the top-ranked candidates from the auditory information recognition step S₅ (S₈). The recognition results are displayed or audibly output by the output unit 13 in descending order of likelihood (S₉). The recognition result candidates are corrected as necessary (S₁₀), and the correct input is then confirmed, for example with a key 26 (S₁₁).
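
The narrowing in step S₈, where the second recognizer only considers the top candidates of the first, can be sketched as follows. The recognizers are stood in for by fixed score tables, and the function names, the shortlist size of 3, and the scores are assumptions.

```python
def top_n(scores, n):
    """Return the n most likely candidates from a score table."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [cand for cand, _ in ranked[:n]]


def recognize_within(shortlist, score_fn):
    """Score only the shortlisted candidates and rank them."""
    scored = [(cand, score_fn(cand)) for cand in shortlist]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


# Hypothetical auditory result (step S5) for the spoken reading.
AUDITORY = {"式": 90, "識": 80, "職": 70, "色": 10}
# Hypothetical visual scores; a stand-in for the visual recognition unit 5.
VISUAL_TABLE = {"識": 90, "織": 95, "職": 50, "式": 10, "色": 5}

shortlist = top_n(AUDITORY, 3)                                    # S5 output
result = recognize_within(shortlist, VISUAL_TABLE.__getitem__)    # S8
```

Although "織" has the highest visual score in this example, it is never considered because the auditory step did not shortlist it, which is how the narrowing removes look-alike errors.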

【００３９】一方ステップＳ₂またはＳ₃において、視
覚情報の入力準備状態が検出されると、聴覚情報入力準
備状態になっているかが調べられる（Ｓ₁₂）。聴覚情報
入力準備状態になっていないと、フラグが立っているか
が調べられる（Ｓ₁₃），つまりステップＳ₃で読取り入
力の準備が検出されると、フラグが１に立てられてステ
ップＳ₁₂に移る。ステップＳ₁₃でフラグが１であれば入
力切替部３は視覚情報読取部２に切替えられる
（Ｓ₁₅）。入力切替部３はフラグが０の場合は視覚情報
入力部１に切替えられる。次に視覚情報が取り込まれ
（Ｓ₁₆），この視覚情報が視覚情報認識部５で認識され
る（Ｓ₁₇）。その認識結果に誤りが有るかが調べられ
（Ｓ₁₈），誤りがなければ確定出力され、誤りがあれば
聴覚情報入力部４により入力された視覚情報と対応した
情報が入力される（Ｓ₁₉）。その入力聴覚情報は聴覚情
報認識部７で認識されるが（Ｓ₂₀），この認識はステッ
プＳ₁₇での認識結果中の尤度が高い順から所定数の候補
のみが認識対象とされる。この認識結果は尤度の高い順
に出力部１３により出力される（Ｓ₉）。On the other hand, when input preparation for visual information is detected in step S₂ or S₃, it is checked whether the device is also ready for auditory information input (S₁₂). If it is not, the flag is checked (S₁₃); the flag is set to 1 when preparation for reading input is detected in step S₃, before the process moves to step S₁₂. If the flag is 1 in step S₁₃, the input switching unit 3 is switched to the visual information reading unit 2 (S₁₅); if the flag is 0, it is switched to the visual information input unit 1. The visual information is then captured (S₁₆) and recognized by the visual information recognition unit 5 (S₁₇). The recognition result is checked for errors (S₁₈). If there is no error, the result is confirmed and output; if there is an error, auditory information corresponding to the input visual information is input via the auditory information input unit 4 (S₁₉). This auditory information is recognized by the auditory information recognition unit 7 (S₂₀), with recognition restricted to a predetermined number of candidates taken in descending order of likelihood from the recognition result of step S₁₇. The recognition results are output by the output unit 13 in descending order of likelihood (S₉).
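
The branching among the three processing paths of FIG. 4 can be summarized in a small dispatch function. This is a simplified reading of the flow, not the specified implementation: the names `Flow` and `dispatch`, the boolean inputs, and the string labels for the visual source are assumptions.

```python
from enum import Enum


class Flow(Enum):
    AUDITORY_FIRST = "auditory first (S4 to S11)"
    VISUAL_FIRST = "visual first (S13 to S20)"
    SIMULTANEOUS = "simultaneous (S21 to S23)"


def dispatch(manual_ready, reader_ready, audio_ready):
    """Return (flow, visual_source) for the detected input preparation.

    manual_ready: handwriting input prepared (checked at S2)
    reader_ready: reading input prepared (checked at S3)
    audio_ready:  auditory input prepared (checked at S12)
    """
    if not (manual_ready or reader_ready or audio_ready):
        return None, None  # S1: nothing detected, keep monitoring
    if not (manual_ready or reader_ready):
        return Flow.AUDITORY_FIRST, None
    # Flag 1 marks reading input; manual input is checked first (S2).
    flag = 0 if manual_ready else 1
    source = "reading unit 2" if flag == 1 else "input unit 1"
    if audio_ready:
        return Flow.SIMULTANEOUS, source
    return Flow.VISUAL_FIRST, source
```

For example, `dispatch(False, True, True)` selects the simultaneous path with the reading unit as the visual source.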

【００４０】ステップＳ₁₂で聴覚情報入力準備がなされ
ていることが検出されると、この場合は、聴覚情報、視
覚情報の同時入力の場合であって、入力された視覚情報
及び聴覚情報が取り込まれ（Ｓ₂₁），これら視覚情報及
び聴覚情報がそれぞれ認識される（Ｓ₂₂）。これら両認
識結果の候補中の同一のものについては綜合尤度が計算
され（Ｓ₂₃），この綜合尤度と、不一致候補の尤度との
うち予め決めた数だけ高いものから出力部１３から出力
される（Ｓ₉）。When step S₁₂ detects that auditory information input has been prepared, auditory and visual information are being input simultaneously. The input visual and auditory information are captured (S₂₁) and each is recognized (S₂₂). For candidates that appear in both recognition results, a combined likelihood is calculated (S₂₃), and a predetermined number of candidates, taken in descending order from among these combined likelihoods and the likelihoods of the non-matching candidates, are output from the output unit 13 (S₉).
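
Steps S₂₃ and S₉ for simultaneous input can be sketched as a merge in which coinciding candidates receive a combined likelihood and non-matching candidates keep their single likelihood. The plain sum as the combined likelihood, the alphabetical tie-break, the function name, and the scores are assumptions, and the two score scales are assumed to be comparable.

```python
def merge_and_select(visual, auditory, k):
    """Return the k best candidates after merging both recognition results."""
    merged = dict(visual)
    for cand, score in auditory.items():
        # Coinciding candidates get the sum; others keep their own score.
        merged[cand] = merged.get(cand, 0) + score
    # Sort by descending likelihood; ties are broken by candidate text.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Hypothetical recognition results from step S22.
visual = {"識": 60, "職": 70, "織": 30}
auditory = {"識": 80, "式": 50, "職": 20}
output = merge_and_select(visual, auditory, k=3)
```

Here the non-matching candidate "式" still reaches the output because its single likelihood is among the three highest values.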

【００４１】視覚情報を入力し、聴覚情報を入力してい
ない場合は聴覚情報入力部４から周囲の騒音や雑音が入
力されたり、誤動作しないように情報入力準備の検出状
態で聴覚情報入力部４から聴覚情報認識部７への入力を
禁止、または聴覚情報認識部７の処理を禁止する。上述
では聴覚情報としては音声を例として述べたが、例えば
救急車の視覚情報の入力と対応して聴覚情報として救急
車のサイレンを入力してもよい。動物の視覚情報の入力
と対応して、その動物の鳴き声を聴覚情報として入力し
てもよい、など各種の音を聴覚情報とすることもでき
る。When visual information is being input and auditory information is not, input from the auditory information input unit 4 to the auditory information recognition unit 7, or the processing of the auditory information recognition unit 7, is disabled according to the detected input-preparation state, so that ambient noise picked up by the auditory information input unit 4 does not cause a malfunction. Voice has been used above as the example of auditory information, but other sounds can also be used: for example, the siren of an ambulance may be input as auditory information together with visual information of the ambulance, or the cry of an animal together with visual information of that animal.

【００４２】[0042]

【発明の効果】以上説明したように、文字や記号などの
情報を高精度に入力する場合、従来のワープロやパソコ
ン等を用いたときに多くのキー入力を必要とする場合に
比べて、この発明による情報入力装置、情報入力方法を
用いることにより、比較的雑な手書きの文字入力や既に
手書きされた文書が存在する場合にはこれを情報入力源
の基本として、これと音声認識技術を併用して正しい情
報源の作成を可能とすることができることと、これを必
要最小限のキー操作で容易に素早い入力が可能となるた
め、特にキーボードの操作に不慣れな使用者に対しても
違和感なく、情報の入力が実現できる。As described above, inputting information such as characters and symbols accurately with a conventional word processor or personal computer requires many keystrokes. With the information input device and information input method of the present invention, relatively rough handwritten character input, or an existing handwritten document, serves as the basic information source, and combining it with voice recognition makes it possible to create a correct information source. Because this can be done easily and quickly with a minimum of key operations, even users unfamiliar with keyboard operation can input information without difficulty.

[Brief description of drawings]

【図１】この発明による装置の一実施例を示すブロック
構成図。FIG. 1 is a block diagram showing an embodiment of an apparatus according to the present invention.

【図２】この発明による装置の外観構成の例を示す斜視
図。FIG. 2 is a perspective view showing an example of an external configuration of a device according to the present invention.

【図３】視覚情報と聴覚情報を用いた場合の認識手順を
示す流れ図。FIG. 3 is a flowchart showing a recognition procedure when visual information and auditory information are used.

【図４】この発明による方法の処理手順の例を示す流れ
図。FIG. 4 is a flowchart showing an example of a processing procedure of a method according to the present invention.

Claims

[Claims]

1. An information input device comprising: visual information input means for inputting visual information such as characters, symbols, figures, and images; auditory information input means for inputting auditory information such as voice and music; visual information recognition means for recognizing the visual information input by the visual information input means; auditory information recognition means for recognizing the auditory information input by the auditory information input means; and recognition result output means for outputting the recognition result of the visual information recognition means and the recognition result of the auditory information recognition means.

2. The information input device including: means for detecting input preparation of the visual information input means; means for detecting input preparation of the auditory information input means; and means for enabling information input only by the corresponding input means when only one of the two input preparation detecting means detects input preparation, and for enabling information input by both input means when both input preparation detecting means detect input preparation.

3. The information input device according to claim 1, further comprising means for causing the other recognition means to perform recognition only on the recognition result candidates of one of the visual information recognition means and the auditory information recognition means.

4. The information input device according to claim 1, further comprising: means for comparing the candidates recognized by the visual information recognition means with the candidates recognized by the auditory information recognition means; and means for, when the comparison finds a plurality of matching candidates, obtaining a total likelihood from the two likelihoods of each matching candidate and using it as the likelihood of that candidate.

5. The information input device according to claim 1, wherein the visual information input means comprises: visual information manual input means for inputting visual information created by hand; visual information reading means for reading already created visual information; and input switching means for supplying the visual information input by one of the visual information manual input means and the visual information reading means to the visual information recognition means.

6. An information input method using visual information input means for inputting visual information such as characters, symbols, figures, and images, auditory information input means for inputting auditory information such as voice and music, visual information recognition means for recognizing the visual information input by the visual information input means, and auditory information recognition means for recognizing the auditory information input by the auditory information input means, the method comprising: inputting information using one of the visual information input means and the auditory information input means; recognizing the input information with the corresponding recognition means; then inputting, using the other input means, information corresponding to the input information; and obtaining recognition candidates by having the recognition means corresponding to the later input narrow down its recognition target candidates according to the earlier recognition result.

7. An information input method using visual information input means for inputting visual information such as characters, symbols, figures, and images, auditory information input means for inputting auditory information such as voice and music, visual information recognition means for recognizing the visual information input by the visual information input means, and auditory information recognition means for recognizing the auditory information input by the auditory information input means, the method comprising: inputting information using the visual information input means while simultaneously inputting information corresponding to it using the auditory information input means; recognizing each input with the corresponding recognition means; and obtaining recognition candidates by computing a total likelihood from the likelihoods of the candidates that are the same in both recognition results.

8. The information input method according to claim 7, wherein two procedures are used selectively: a procedure in which information is input using one of the visual information input means and the auditory information input means, the input information is recognized by the corresponding recognition means, information corresponding to it is then input with the other input means, and the recognition means corresponding to that input narrows down the recognition target candidates according to the earlier recognition result; and the procedure in which both input means are used simultaneously to obtain recognition candidates.

9. The information input method according to claim 6, wherein an obtained recognition candidate is corrected by inputting auditory information when the candidate is based on a recognition result of visual information, and by inputting visual information when it is based on a recognition result of auditory information.

10. The information input method according to claim 6, wherein, when the visual information to be input is one or more characters or symbols, the auditory information to be input is the on-reading or kun-reading of the visual information, the name of its radical, the name of the symbol, or a word or keyword that suggests the visual information.

11. The information input method according to claim 7 or 8, wherein the total likelihood is obtained by applying a weighting coefficient to one of the likelihood of the visual information recognition result and the likelihood of the auditory information recognition result and then adding the two.