JPH0540599A - Voice input device

Info

Publication number: JPH0540599A
Application number: JP3197735A
Authority: JP
Inventors: Mitsuhiro Inazumi; 満広稲積
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1991-08-07
Filing date: 1991-08-07
Publication date: 1993-02-19

Abstract

PURPOSE: To obtain a voice input device that is as easy to use as "kana-kanji conversion Japanese input" and requires no change to application software. CONSTITUTION: The device comprises data input means 1, data output means 2, voice input selection detecting means 3, voice input means 4, voice feature extraction means 5, data storage means 6, voice feature pattern matching means 7, and data input/output control means 8. The device can therefore be used in the same way as ordinary "kana-kanji conversion Japanese input" without modifying existing application software.

Description

Detailed Description of the Invention

[0001]

[Industrial field of application] The present invention relates to a voice input device.

[0002]

[Prior art] A voice input device is considered a very efficient means of information input. However, voice input devices have not yet become a common means of information input. Several reasons can be given for this.

[0003] The first reason, naturally, is that using a voice input device requires preparing in advance both hardware that includes the device and application software that can handle voice input. Acquiring these anew requires a very large investment in hardware.

[0004] An even greater problem is the decisive shortage of application software that can handle voice recognition. In the prior art, voice recognition had to be supported on the application software side. However, creating software requires far more human investment than creating hardware. Making existing software compatible with voice recognition is therefore practically impossible.

[0005] Moreover, even if existing software were rewritten to support voice recognition input, any resulting change in how the software is used would be a major cause of rejection by users.

[0006] In addition to the problems above, voice recognition technology itself has the problem that utterance segments are difficult to extract even at the noise level of an ordinary office environment. Furthermore, because voice recognition processing requires considerable computing resources, keeping it running at all times may adversely affect other processing.

[0007] As described above, the prior art has a great many hardware and software problems, and even if all of those problems were solved, it would still be very difficult to provide an environment that is easier to use.

[0008]

[Problem to be solved by the invention] The problems to be solved by the present invention are those described above. The object of the present invention is to solve them by realizing a voice input device that requires only a minimal addition of voice input hardware to existing hardware, that can be used without changing existing application software, and that lets the user easily select between voice input and key input.

[0009]

[Means for solving the problem] FIG. 1 is a schematic diagram of the concept of the present invention. Described with reference to FIG. 1, the present invention is a voice input device characterized by including in its configuration:

1) one or more sets of data input means 1;
2) one or more sets of data output means 2;
3) voice input selection detecting means 3 for detecting voice input selection data in the data input by data input means 1;
4) voice input means 4;
5) voice feature extraction means 5 for extracting the features of the voice input by voice input means 4;
6) data storage means 6 for storing pairs of a voice feature and data;
7) voice feature pattern matching means 7, which takes as input the voice feature from voice feature extraction means 5 and the voice features in data storage means 6 and performs pattern matching between them; and
8) data input/output control means 8, which takes data input means 1, voice input selection detecting means 3, data storage means 6, and voice feature pattern matching means 7 as inputs and data output means 2 as output, and controls the input and output of data.

[0010]

[Embodiment] FIG. 1 is a schematic diagram of the concept of the present invention. FIGS. 2, 3, and 4 outline the processing in the present invention. The present invention is described in detail below with reference to these four figures.

[0011] Many computers in current use have a keyboard that is independent of the main unit. Many others are terminal devices that are independent of the computer performing the main processing. In either case, the keyboard or terminal has its own CPU, independent of the main unit, and carries out its processing by communicating with the computer that performs the main processing.

[0012] The present invention connects a voice recognition device using hardware and software attached to that communication line, and thereby realizes a voice input device that is as easy to use as the widely used "kana-kanji conversion Japanese input". Of course, the connection to the communication line need not be a physical one. For example, it may be a software connection through a device driver in the computer main unit.

[0013] The operation of the present invention is described with reference to FIGS. 1 and 2. First, voice input selection data is registered in advance. A non-printing character, for example control character data, can be used as this selection data. Thereafter, data input from data input means 1 is sent to voice input selection detecting means 3. If the input data is not the voice input selection data, it is sent via data input/output control means 8 to data output means 2, and from there to the computer main unit that performs the main processing. This is exactly the same behavior as in the ordinary case where a keyboard or terminal device is connected directly to the computer main unit.

[0014] If the data sent to voice input selection detecting means 3 is the voice input selection data, the data itself is discarded and, as its side effect, voice input means 4 and voice feature extraction means 5 are activated. In this state, the features of the voice input through voice input means 4 are extracted by voice feature extraction means 5. Process 1, shown in FIG. 3, is then started.
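The routing described in the two paragraphs above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `InputRouter`, the callback parameters, and the choice of Ctrl-] (0x1D) as the voice input selection data are all assumptions; the description only says that a non-printing control character may be used.

```python
from typing import Callable, List

# Assumption: Ctrl-] (0x1D) stands in for the pre-registered
# voice input selection data.
VOICE_SELECT = 0x1D


class InputRouter:
    """Hypothetical stand-in for means 3 and 8: sits between the
    keyboard (data input means 1) and the host (data output means 2)."""

    def __init__(self,
                 send_to_host: Callable[[int], None],
                 start_voice_input: Callable[[], None],
                 select_code: int = VOICE_SELECT) -> None:
        self.send_to_host = send_to_host
        self.start_voice_input = start_voice_input
        self.select_code = select_code

    def on_key(self, code: int) -> None:
        if code == self.select_code:
            # The selection data itself is discarded; only its side
            # effect (activating the voice path) remains.
            self.start_voice_input()
        else:
            # Ordinary data passes through unchanged, exactly as if the
            # keyboard were connected directly to the host.
            self.send_to_host(code)


sent: List[int] = []
activations: List[str] = []
router = InputRouter(sent.append, lambda: activations.append("voice"))
for code in b"ab\x1dc":
    router.on_key(code)
```

In this sketch the host receives only the three printable characters, and the voice path is started once. Because the voice path runs only after an explicit key press, no recognizer needs to listen continuously.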

[0015] In process 1, voice feature pattern matching means 7 compares the voice feature extracted by voice feature extraction means 5 against the voice features in data storage means 6.
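The comparison in process 1 can be sketched as a nearest-template search. The patent does not specify the feature representation or the matching method, so the fixed-length feature vectors, the Euclidean distance, the function name `match_feature`, and the `threshold` parameter below are all assumptions made for illustration. Real speech features vary in length and would typically need some form of time alignment before comparison.

```python
import math
from typing import List, Optional, Tuple

Feature = Tuple[float, ...]


def match_feature(query: Feature,
                  templates: List[Tuple[Feature, bytes]],
                  threshold: float) -> Optional[bytes]:
    """Return the data paired with the closest stored feature, or None
    when no stored feature lies within `threshold` of the query."""
    best_data: Optional[bytes] = None
    best_dist = threshold
    for template, data in templates:
        # math.dist raises ValueError if the dimensions differ.
        d = math.dist(query, template)
        if d <= best_dist:
            best_dist = d
            best_data = data
    return best_data


# Hypothetical contents of data storage means 6.
templates = [((0.0, 0.0), b"A"), ((1.0, 1.0), b"B")]
hit = match_feature((0.9, 1.1), templates, threshold=0.5)
miss = match_feature((5.0, 5.0), templates, threshold=0.5)
```

Returning `None` for a miss keeps the "not found" case explicit, so the caller can decide what to output for an unknown pattern.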

[0016] If the feature of the input voice is found in data storage means 6, the data registered in data storage means 6 as the pair of that feature pattern is output to data output means 2. For example, for the spoken word "住所" ("address"), this is the data string corresponding to a character string such as "長野県諏訪市大和３ー３ー５" (a street address in Suwa City, Nagano Prefecture). In this case, if a program such as a word processor is running on the computer main unit that performs the main processing, the result is the same as if that character string had been typed at the cursor position at the moment of the voice input. In other words, voice input is realized without touching the application software. The data paired with a voice feature need not be data corresponding to characters. It can equally be non-character data, such as control data that invokes some function.

[0017] If the feature of the input voice cannot be found in data storage means 6, pre-registered data corresponding to an unknown pattern, for example data corresponding to characters such as "??????", is output to data output means 2.
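The choice of output data for a matched or unmatched utterance can be sketched as a lookup with a fallback. The function name `resolve_output`, the string keys, the UTF-8 encoding, and the use of Ctrl-S (0x13) as an example of control data are assumptions for illustration only; the description does not fix any of them.

```python
from typing import Dict, Optional

# Data pre-registered for unknown patterns.
UNKNOWN_DATA = b"??????"


def resolve_output(matched_key: Optional[str],
                   store: Dict[str, bytes],
                   unknown: bytes = UNKNOWN_DATA) -> bytes:
    """Return the data paired with a matched pattern, or the
    pre-registered placeholder when nothing matched."""
    if matched_key is not None and matched_key in store:
        return store[matched_key]
    return unknown


# Hypothetical store: one entry holds text, the other holds
# non-character control data.
store = {
    "address": "長野県諏訪市大和３ー３ー５".encode("utf-8"),
    "save": b"\x13",
}

text_out = resolve_output("address", store)
control_out = resolve_output("save", store)
unknown_out = resolve_output(None, store)
```

Whatever bytes are returned are simply forwarded to the host as if they had been typed, which is why the application needs no changes.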

[0018] Normally the output data string is echoed back on the CRT, and the user checks it and decides whether to confirm the input. If the user performs the confirm operation, the voice input is confirmed and, for example, line feed data is output. If the user does not confirm, process 2, which corrects the character string and is shown in FIG. 4, is started.

[0019] Process 2 is essentially ordinary character string correction. For example, if "??????" is output for "住所", the user corrects the string to "長野県諏訪市大和３ー３ー５" or the like using the cursor keys and so on, just as in an ordinary correction operation, and then performs the confirm operation. The system then registers the corrected character string in association with the voice pattern that was just input.
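The registration step at the end of process 2 can be sketched as follows. The names `PatternStore` and `finish_input`, the tuple feature representation, and the rule of replacing an existing pairing when the identical feature is registered again are assumptions; the description states only that the corrected string is registered for the voice pattern that was just input.

```python
from typing import List, Tuple

Feature = Tuple[float, ...]


class PatternStore:
    """Hypothetical stand-in for data storage means 6."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Feature, str]] = []

    def register(self, feature: Feature, text: str) -> None:
        # Replace the pairing if this exact feature is already stored,
        # otherwise append a new pair.
        for i, (stored, _) in enumerate(self.entries):
            if stored == feature:
                self.entries[i] = (feature, text)
                return
        self.entries.append((feature, text))


def finish_input(store: PatternStore, feature: Feature,
                 echoed: str, final_text: str) -> bool:
    """Called when the user confirms. If the user edited the echoed
    string first, pair the edited string with the feature just spoken.
    Returns True when a pairing was registered."""
    if final_text != echoed:
        store.register(feature, final_text)
        return True
    return False


pattern_store = PatternStore()
spoken = (0.2, 0.7)  # feature of the utterance that produced "??????"
corrected = "長野県諏訪市大和３ー３ー５"
learned = finish_input(pattern_store, spoken, "??????", corrected)
unchanged = finish_input(pattern_store, spoken, corrected, corrected)
```

With this flow the user teaches the device new vocabulary through the same edit-and-confirm steps used for ordinary typing, which mirrors the learning behavior of kana-kanji conversion.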

[0020]

[Effects of the invention] As described above, according to the present invention, voice input can be realized merely by adding a minimal voice input system to existing hardware, without any change to the software. The operation described above is equivalent to that of the widely used "kana-kanji conversion Japanese input" and is readily accepted by users. As for voice recognition processing, the method of the present invention lets the user easily switch between voice input and keyboard input, so errors in extracting utterance segments and the like do not occur, and the load on computing resources is very small.

[0021] In the present invention, the added hardware and software need not be independent of the computer main unit. The invention can also be realized as a software keyboard device driver program using hardware contained in the computer main unit.

[Brief description of drawings]

FIG. 1 is a schematic diagram of the concept of the configuration of the present invention.

FIG. 2 is a flowchart showing the outline algorithm of the overall processing of the present invention.

FIG. 3 is a flowchart showing the outline algorithm of the voice feature pattern matching process of the present invention.

FIG. 4 is a flowchart showing the outline algorithm of the data correction process of the present invention.

[Explanation of symbols]

1: Data input means
2: Data output means
3: Voice input selection detecting means
4: Voice input means
5: Voice feature extraction means
6: Data storage means
7: Voice feature pattern matching means
8: Data input/output control means

Claims

[Claims]

1. A voice input device characterized by including in its configuration:
1) one or more sets of data input means;
2) one or more sets of data output means;
3) voice input selection detecting means for detecting voice input selection data in the data input by the data input means of 1);
4) voice input means;
5) voice feature extraction means for extracting the features of the voice input by the voice input means of 4);
6) data storage means for storing pairs of a voice feature and data;
7) voice feature pattern matching means that takes as input the voice feature from the voice feature extraction means of 5) and the voice features in the data storage means of 6) and performs pattern matching between them; and
8) data input/output control means that takes the data input means of 1), the voice input selection detecting means of 3), the data storage means of 6), and the voice feature pattern matching means of 7) as inputs and the data output means of 2) as output, and controls the input and output of data.