JPS61279989A

JPS61279989A - System for correcting recognized result

Info

Publication number: JPS61279989A
Application number: JP60122004A
Authority: JP
Inventors: Masahiro Kojima; 雅広小島
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1985-06-05
Filing date: 1985-06-05
Publication date: 1986-12-10

Abstract

PURPOSE: To improve correction processing speed by comparing the normalized basic feature of each character candidate with that of the character to be inputted, and outputting and displaying the candidates in descending order of coincidence. CONSTITUTION: A normalized feature information part 10 stores the basic features of normalized characters and normalized graphics in correspondence with their character codes and graphic codes. A comparing part 11 compares the basic features stored in the information part 10 for the character and graphic codes outputted from a correcting dictionary 7 with the normalized basic features of the character and graphic images to be recognized, and outputs the characters and graphics to a display part 5 in descending order of coincidence. Thus, for a reading inputted from a keying part 6, the target character can be outputted in the 1st or 2nd position among the candidate characters displayed to the operator, thereby improving the correction processing speed.

Description

[Detailed Description of the Invention] [Summary] In a recognition device for characters and figures, including kanji, basic feature information of the normalized image corresponding to each character/figure code is stored. The normalized basic features of the character/figure image to be recognized are compared with the basic features corresponding to the character/figure codes that a correction dictionary (such as a kana-kanji conversion dictionary) outputs for a reading input, and the codes are displayed starting with the closest match. The target character is thereby displayed preferentially, which improves correction efficiency.

[Industrial Field of Application] The present invention relates to the correction of recognition results in a character/figure recognition device, and in particular to a method for correcting recognition errors in the recognition of kanji and similar characters.

[Prior Art] Conventional methods for correcting recognition results in character/figure recognition devices have basically been configured as processing independent of recognition.

That is, when characters such as kanji are recognized and a recognition error occurs in the result, the operator corrects it by keying in the reading of the kanji in order to retrieve the correct kanji code. The group of kanji corresponding to this reading is retrieved one after another from a previously prepared "reading-to-kanji conversion dictionary" and shown on the display unit, and the operator selects from among them to determine the corrected kanji code.

FIG. 3 is a block diagram showing an example of the configuration of a recognition device that performs kanji recognition.

In FIG. 3, 2 is an image memory that stores binarized image data from an external device. The external device consists of a photoelectric conversion unit, which optically scans the medium carrying the character/figure information, and a preprocessing unit, which preprocesses the resulting image information.

3 is a format definition body consisting of a data group that specifies in advance the position information and the like of the characters and figures, and 4 is a recognition dictionary in which the feature groups of the characters and figures to be processed are set.

5 is a display unit that displays the character/figure codes of the recognition result and the character/figure codes designated for correction. 6 is a keying unit used to give correction instructions for the character/figure codes of the recognition result; when a kanji is to be corrected, the input is made in kana characters.

7 is a correction dictionary configured in advance to output the kanji codes corresponding to the kana character code string from the keying unit 6, that is, a "reading-to-kanji conversion dictionary".

8 is a search unit that searches the correction dictionary 7.

FIG. 4 is a diagram explaining a conventional example of correction by the kana-kanji conversion method.

In FIG. 4, (a) shows the character on the medium to be recognized, and (b) shows the recognition processing.

(b1) and (b2) show the principle of the recognition processing in (b).

(b1) is the image data stored in the image memory and segmented one character at a time, with a segmentation size of m × n dots.

(b2) shows this image normalized to Q × Q dots, with the vertical, horizontal, and diagonal line density patterns extracted as features.

Recognition is performed by comparing these line density features with the line density features stored in the recognition dictionary.
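The normalization and feature extraction described for (b1) and (b2) can be sketched roughly as follows. This is a minimal illustration and not the patent's implementation: the patent does not define "line density" precisely, so the sketch assumes it is the number of stroke runs crossed by each scan line, uses nearest-neighbour resampling for the normalization, and omits the diagonal patterns. The function names are hypothetical.

```python
def normalize(image, q):
    """Resample a binary m x n image (list of rows of 0/1) to q x q
    by nearest-neighbour sampling (an assumed normalization method)."""
    m, n = len(image), len(image[0])
    return [[image[r * m // q][c * n // q] for c in range(q)] for r in range(q)]


def run_count(line):
    """Count the black runs crossed along one scan line (0 -> 1 transitions).
    This is the assumed meaning of 'line density' in this sketch."""
    count, prev = 0, 0
    for pixel in line:
        if pixel and not prev:
            count += 1
        prev = pixel
    return count


def line_density_features(image, q=8):
    """Return the row-wise densities followed by the column-wise densities
    of the image after normalization to q x q dots."""
    norm = normalize(image, q)
    per_row = [run_count(row) for row in norm]
    per_column = [run_count(col) for col in zip(*norm)]
    return per_row + per_column


# A small H-like test pattern, invented for illustration.
pattern = [
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
print(line_density_features(pattern, q=4))
```

Because the features are computed after normalization, the same character scanned at a different m × n size yields a feature vector of the same length, which is what makes a direct comparison with stored dictionary features possible.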

FIG. 4(c) shows that, as a result of comparison with the recognition dictionary, the candidates 叉, 夕, 犬, 又, and 文 were displayed in descending order of match, but the true category of the character to be recognized, 丈, was not among them; that is, a recognition error occurred.

(d) shows the operator entering the reading "ジョウ" (jō) from the keying unit to make the correction.

(e) shows the search unit searching the correction dictionary (reading-to-kanji conversion dictionary) with the reading input "ジョウ".

(f) is an enlarged view of one part of the correction dictionary in (e). It shows the candidate group corresponding to the reading input "ジョウ" (上, 条, 成, and so on) and shows that the target character 丈 is in the n-th position.

Thus, the character candidates displayed for a reading input follow the order in which they are stored in the correction dictionary, and in some cases many steps are needed before the target character is obtained.
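The cost of the conventional method can be illustrated with a small sketch. The dictionary contents and their order below are hypothetical (the patent only states that 丈 sits at some n-th position), and `selections_needed` is an illustrative helper, not part of the described device.

```python
# Hypothetical excerpt of a reading-to-kanji correction dictionary.
# The entries and their storage order are illustrative, not from the patent.
CORRECTION_DICT = {
    "ジョウ": ["上", "条", "成", "城", "場", "丈"],
}


def selections_needed(reading, target, dictionary=CORRECTION_DICT):
    """Return the 1-based position of `target` among the candidates for
    `reading`, i.e. how many candidates the operator steps through when
    they are shown in storage order. Returns None if it is not found."""
    candidates = dictionary.get(reading, [])
    for position, kanji in enumerate(candidates, start=1):
        if kanji == target:
            return position
    return None


print(selections_needed("ジョウ", "丈"))  # 6 with the illustrative entries above
```

With storage-order display, the number of operator steps depends only on where the target happens to sit in the dictionary, regardless of how closely it resembles the scanned image.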

[Problems to Be Solved by the Invention] As explained above, in the conventional correction method a reading generally retrieves several kanji codes. The number of selection operations needed before the desired kanji code appears therefore tends to be large, and operability is poor.

Moreover, from the operator's point of view, kanji codes different from the target kanji code are output, which lowers the operator's evaluation of the recognition device.

The present invention aims to provide a novel correction method that eliminates these problems.

[Means for Solving the Problems] FIG. 1 is a principle block diagram of the recognition result correction method of the present invention.

In FIG. 1, the same reference numerals as in FIG. 3 denote the same elements.

In FIG. 1, 10 is a normalized feature information unit, which stores the basic features of normalized characters and figures in correspondence with their character/figure codes. For example, the characters and figures to be recognized are normalized to Q × Q dots in advance, a large amount of line density information is collected for each of them, and the average, optimal line density information derived from it is stored for each character/figure code.

11 is a comparison unit. It compares the basic features stored in the feature information unit 10 for the character/figure codes output from the correction dictionary 7 with the normalized basic features of the character/figure image to be recognized, and outputs the characters and figures to the display unit 5 in descending order of match.

[Operation] As explained above, when a recognition result is corrected by the kana-kanji conversion method, the present invention does not display the character candidates output from the correction dictionary for a reading input in their stored order. Instead, it compares the normalized basic features of each character candidate with the normalized basic features of the input character and displays the candidates in descending order of match.
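The reordering described here can be sketched as a sort of the dictionary candidates by feature distance. The patent does not name a distance measure, so this sketch assumes a sum of absolute differences; the feature vectors are made-up four-element values for illustration, and `rank_candidates` is a hypothetical name.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors
    (an assumed measure; the patent only speaks of degree of match)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_candidates(candidates, feature_table, input_features):
    """Reorder the correction-dictionary candidates so that the one whose
    stored features are closest to the input image comes first.
    sorted() is stable, so ties keep their dictionary order."""
    return sorted(
        candidates,
        key=lambda code: l1_distance(feature_table[code], input_features),
    )


# Made-up stored features per character code, for illustration only.
FEATURES = {
    "上": [1, 1, 2, 1],
    "条": [3, 2, 3, 3],
    "成": [2, 3, 3, 2],
    "丈": [1, 2, 2, 1],
}

input_features = [1, 2, 2, 2]  # features of the misrecognized input image
print(rank_candidates(["上", "条", "成", "丈"], FEATURES, input_features))
```

Only the candidates returned for the reading are compared, so the comparison is restricted to a handful of codes rather than the whole character set.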

As a result, among the candidate characters presented to the operator for a reading input, the target character can be output in the first or second position, which improves the speed of correction processing.

That is, as a means of retrieving from the correction dictionary the character/figure code closest to the original character/figure image, the normalized feature information of characters and figures is stored in advance and compared with the basic features obtained when the target character or figure was recognized. This minimizes the loss of correction efficiency caused by the differences in original purpose and structure between the recognition dictionary and the correction dictionary.

[Embodiment] FIG. 2 is a conceptual block diagram of an embodiment of the present invention.

In FIG. 2, (a) shows the feature information unit. For each character/figure code it stores, as the basic features of the normalized character or figure, a table of line density patterns (or line density codes) in the horizontal (X) and vertical (Y) directions; the table occupies 48 bytes per character.

(b) is the line density pattern of the character image to be recognized in the horizontal (X) and vertical (Y) directions, and is likewise 48 bytes.
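One layout consistent with the 48-byte figure is sketched below. The patent does not say how the 48 bytes are divided, so this assumes a normalization size of Q = 24 and one byte per scan line (24 bytes for the X direction followed by 24 for the Y direction). Both the layout and the function names are assumptions made for illustration.

```python
Q = 24  # assumed normalization size; 2 * Q scan lines give 48 bytes


def pack_features(x_density, y_density):
    """Pack Q X-direction and Q Y-direction line density values
    (each 0..255) into a single 48-byte table."""
    if len(x_density) != Q or len(y_density) != Q:
        raise ValueError("expected %d values per direction" % Q)
    return bytes(x_density) + bytes(y_density)


def unpack_features(table):
    """Split a 48-byte table back into its X and Y density lists."""
    if len(table) != 2 * Q:
        raise ValueError("expected a %d-byte table" % (2 * Q))
    return list(table[:Q]), list(table[Q:])


x = [i % 4 for i in range(Q)]        # placeholder densities, not real data
y = [(i + 1) % 3 for i in range(Q)]
table = pack_features(x, y)
print(len(table))  # 48
```

Storing the input pattern and the dictionary patterns in the same fixed layout lets the comparison run element by element without any per-character metadata.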

(c) shows that the correction dictionary is searched with the reading "ジョウ" entered from the keying unit, and that the character codes for 上, 条, and 成 are output as the i-th, (i+1)-th, and (i+2)-th candidates, respectively.

(d) shows the comparison unit calculating the distance between the line density pattern of the character to be recognized, output from (b), and the line density pattern for the i-th character code output from (c). If the two are close, that code is output; if not, processing moves to the (i+1)-th character code and the comparison is repeated. In this way the closely matching 丈 is output.
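The candidate-by-candidate procedure in (d) can be sketched as a loop that stops at the first sufficiently close candidate. The patent does not specify the distance measure or what counts as "close", so the absolute-difference distance and the `threshold` parameter are assumptions, and the feature values are invented for illustration.

```python
def first_close_candidate(candidates, feature_table, input_features, threshold):
    """Walk the candidates in dictionary order (i, i+1, i+2, ...) and return
    the first one whose stored features lie within `threshold` of the input
    features. Returns None if no candidate is close enough."""
    for code in candidates:
        stored = feature_table[code]
        distance = sum(abs(s - t) for s, t in zip(stored, input_features))
        if distance <= threshold:
            return code
    return None


# Made-up stored features per character code, for illustration only.
FEATURES = {
    "上": [1, 1, 2, 1],
    "条": [3, 2, 3, 3],
    "成": [2, 3, 3, 2],
    "丈": [1, 2, 2, 1],
}

result = first_close_candidate(
    ["上", "条", "成", "丈"], FEATURES, input_features=[1, 2, 2, 2], threshold=1
)
print(result)  # 丈
```

Compared with sorting every candidate, this variant stops early, but its outcome depends on the chosen threshold: a looser threshold can accept an earlier, less similar candidate.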

[Effects of the Invention] As described above, according to the present invention, a large improvement in correction efficiency can be expected when recognition results are corrected by the kana-kanji conversion method in a recognition device for characters and figures including kanji, and the practical effect is significant.

[Brief Description of the Drawings]

FIG. 1 is a principle block diagram of the present invention, FIG. 2 is a conceptual block diagram of an embodiment of the present invention, FIG. 3 is a block diagram of a conventional example, and FIG. 4 is a diagram explaining the correction method of the conventional example. In the drawings, 1 is a recognition control unit, 2 is an image memory, 3 is a format definition body, 4 is a recognition dictionary, 5 is a display unit, 6 is a keying unit, 7 is a correction dictionary, 8 is a search unit, 9 is an output unit, 10 is a feature information unit, and 11 is a comparison unit.

Claims

[Scope of Claims] A recognition result correction method for a recognition device comprising: an image memory (2) that stores image information obtained by optically scanning a medium carrying character/figure information; a format definition body (3) consisting of a data group that specifies in advance the position information and the like of the characters and figures; a recognition dictionary (4) that stores the feature groups of the characters and figures to be processed; a display unit (5) that outputs and displays the character/figure codes of the recognition result; a keying unit (6) that gives instructions for correcting the character/figure codes of the recognition result; a correction dictionary (7) configured in advance to output the character/figure codes corresponding to the input code string from the keying unit (6); a search unit (8) that searches the correction dictionary (7); an output unit (9) that outputs the recognized and corrected character/figure codes onto an output medium; and a recognition control unit (1) that controls each of the above units and the interface with external devices; the method being characterized in that there are provided a feature information unit (10) that stores the basic features of normalized characters and figures in correspondence with their character/figure codes, and a comparison unit (11) that compares the basic features stored in the feature information unit (10) for the character/figure codes output from the correction dictionary (7) with the normalized basic features of the character/figure image to be recognized, and in that, as a result of the comparison by the comparison unit (11), display starts from the most similar character/figure code.