JPS63159900A

JPS63159900A - Voice information input system

Info

Publication number: JPS63159900A
Application number: JP61306408A
Authority: JP
Inventors: 潔長澤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-12-24
Filing date: 1986-12-24
Publication date: 1988-07-02

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a voice information input method suitable for use in a character reader to improve the efficiency of input work.

[Conventional technology]

In conventional devices, as in Japanese Patent Laid-Open No. 57-200100, it was proposed to combine a character reader with voice recognition by sharing part of their hardware. However, no consideration was given to the recognition range during voice input.

[Problem that the invention seeks to solve]

The above conventional technology gives no consideration to the recognition range. In particular, because character reading in OCR is performed one word at a time, voice input inevitably becomes syllable-by-syllable recognition, which, as is well known, is problematic in terms of recognition accuracy (recognition rate).

An object of the present invention is to input, stably and quickly by voice, characters that could not be input by OCR.

[Means for solving problems]

The above object is achieved by reflecting the information captured by OCR in the voice recognition range.

[Operation]

The character input data read by OCR undergoes similarity calculation (matching) against standard character patterns, after which the determination unit decides the result. Even if only the first-ranked result is shown on an external display such as a CRT, the results down to the n-th rank (n being a separately supplied parameter) are held in memory internally, and these candidate words serve as the recognition range when correcting or re-entering by voice input. In this way, even if there are tens to hundreds of input character types, only n words are subject to voice input, so accurate recognition can be expected. The amount of computation also decreases, so processing time can be shortened.
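The retention of the top n OCR results described above can be illustrated with a minimal Python sketch. The function name `top_n_candidates`, the score values, and the choice n = 3 are all hypothetical; the publication does not specify data structures or numeric values.

```python
from typing import Dict, List, Tuple


def top_n_candidates(similarities: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n best (character, similarity) pairs, best first."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Hypothetical OCR similarity scores for a single input character.
ocr_scores = {"2": 0.81, "Z": 0.78, "ニ": 0.40, "E": 0.22, "7": 0.35}

candidates = top_n_candidates(ocr_scores, n=3)

# Only the first-ranked result would be shown externally (e.g. on a CRT).
displayed = candidates[0][0]

# The top n labels stay in memory and later bound the voice recognition range.
recognition_range = [label for label, _ in candidates]
```

Here the voice recognizer would later have to distinguish only three labels instead of the whole character set.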

[Example]

An embodiment of the present invention is described below with reference to FIG. 1. In the figure, 1 is a photoelectric conversion unit that converts character data into electrical signals. The character information read here has its features extracted by the feature extraction unit 2 and is then sent to the similarity calculation unit 3. The similarity calculation unit performs elastic (expansion/contraction) matching between the input data and the standard character data in the character standard pattern memory 4 to obtain the similarity. When the obtained similarity is sufficiently high and differs sufficiently from the similarity of the second and lower candidates, the determination unit 5 outputs the first-ranked character code to the host device, a CRT, or the like. When the similarity is low, or when it differs little from that of the second and lower candidates, the determination unit prompts re-input by voice. The voice uttered by the operator is amplified, A/D-converted, and so on in the voice input unit 6, after which feature parameters are obtained in the feature extraction unit 7, and the similarity calculation unit 9 matches these feature parameters against the standard voice data in the voice standard pattern memory 8.
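The accept-or-prompt rule of the determination unit can be sketched as follows. This is only an illustration: the function name `decide` and the thresholds `min_similarity` and `min_margin` are assumptions, since the publication says only "sufficiently high" and "sufficient difference" without giving values.

```python
from typing import List, Optional, Tuple


def decide(ranked: List[Tuple[str, float]],
           min_similarity: float = 0.75,   # hypothetical threshold
           min_margin: float = 0.10        # hypothetical threshold
           ) -> Optional[str]:
    """Return the first-ranked character, or None to prompt voice re-input.

    `ranked` holds (character, similarity) pairs sorted best first.
    """
    best_char, best_sim = ranked[0]
    # With a single candidate there is no runner-up to compare against.
    runner_up_sim = ranked[1][1] if len(ranked) > 1 else float("-inf")
    if best_sim >= min_similarity and best_sim - runner_up_sim >= min_margin:
        return best_char
    return None


clear_case = decide([("A", 0.92), ("R", 0.55)])      # accepted as "A"
close_case = decide([("2", 0.81), ("Z", 0.78)])      # too close: voice re-input
weak_case = decide([("E", 0.50), ("F", 0.10)])       # too low: voice re-input
```

A `None` result corresponds to the case in which the operator is asked to speak the character.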

At this time, matching is not performed against all standard patterns. Instead, the recognition range control unit 10 restricts matching to the words that the OCR unit has already listed as candidates.
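A minimal sketch of this restriction, under stated assumptions: the two-dimensional feature vectors and templates are invented toy data, and squared Euclidean distance stands in for whatever matching the similarity calculation unit 9 actually performs. The names `distance` and `recognize_voice` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (a stand-in for the real matching)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize_voice(features: Sequence[float],
                    templates: Dict[str, List[float]],
                    recognition_range: Sequence[str]) -> Tuple[str, int]:
    """Match only templates inside the recognition range.

    Returns the best label and the number of comparisons performed.
    """
    scored = [(distance(features, templates[label]), label)
              for label in recognition_range if label in templates]
    if not scored:
        raise ValueError("recognition range contains no known template")
    _, best_label = min(scored)
    return best_label, len(scored)


# Toy voice templates; "2" and "ニ" are deliberately close to each other.
templates = {"2": [1.0, 0.0], "ニ": [0.9, 0.1], "Z": [0.0, 1.0],
             "E": [0.5, 0.5], "7": [0.2, 0.8]}
utterance = [0.93, 0.07]  # operator intends "2", but it lies nearer to "ニ"

restricted = recognize_voice(utterance, templates, ["2", "Z", "7"])
unrestricted = recognize_voice(utterance, templates, list(templates))
```

In this toy data the unrestricted search confuses the utterance with the similar-sounding "ニ", whereas the search limited to the OCR candidates returns "2" with three comparisons instead of five.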

The characters read by OCR usually comprise tens to hundreds of digits, letters, and katakana, and these include characters whose pronunciations resemble one another, such as "2", "ニ", and "E". A high recognition rate therefore cannot be expected when such characters are re-entered or corrected by voice. According to this embodiment, however, the number of characters subject to recognition is considerably restricted, so recognition accuracy increases, and the processing load on the similarity calculation unit 9 decreases, so processing time is shortened.

[Effect of the invention]

According to the present invention, even if there are many characters to be input, the voice recognition range is limited to the candidates selected in advance by OCR, so recognition accuracy increases and the time required for arithmetic processing is shortened.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of the present invention.

1: photoelectric conversion device; 3: similarity calculation unit; 4: character standard pattern memory; 5: determination unit; 6: voice input unit; 9: similarity calculation unit; 10: recognition range control unit.

Claims

[Claims]

1. A voice information input method characterized in that, in an OCR (optical character reader) equipped with a voice recognition device, when characters misread or rejected by the OCR are corrected by voice input, the voice recognition range is limited to the top candidates determined by the OCR, thereby improving recognition accuracy and speed.