JP3100786B2

JP3100786B2 - Character recognition post-processing method

Info

Publication number: JP3100786B2
Application number: JP04299413A
Authority: JP
Inventors: 章鈴木; 末治宮原
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1992-11-10
Filing date: 1992-11-10
Publication date: 2000-10-23
Anticipated expiration: 2015-10-23
Also published as: JPH06150070A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial field of application] The present invention relates to a character recognition post-processing method that can improve the recognition accuracy of handwritten input characters and the like in a character recognition device.

[0002]

[Prior art] A conventional character recognition post-processing technique is described below. For the purposes of explanation, the set of character categories to be recognized is assumed to be kanji and katakana, and the input pattern sequence is assumed to be written on a form without character frames. This conventional example uses a word dictionary; as an illustration, the word dictionary of nine words shown in FIG. 2 is used.

[0003] First, image data of the written word "山村商店" (Yamamura Shoten), an example of which is shown in FIG. 3, is input. Next, the individual character patterns are segmented. In this example, the left-hand radical and the right-hand component of the second character "村" are separated, so the image data alone cannot determine whether this is one character or two. Two segmentation results are therefore assumed to be obtained, as in FIG. 4(a) and FIG. 4(b).

[0004] Next, each segmented character pattern is subjected to character recognition, and a data structure (called a candidate character matrix) is created, as in FIG. 5(a) and FIG. 5(b). This structure combines the character positions estimated from the segmentation results (called estimated character positions) with the candidate characters.

[0005] Next, the candidate character matrices created in this way are matched against the word dictionary, and the word with the largest number of matching characters is selected as the correct answer (FIG. 6). In this example, the candidate character matrix of FIG. 5(b) and "山村商店" have three matching characters, the largest number, so "山村商店" is correctly output. This is because the estimated character positions in FIG. 5(b) coincide with the character positions of the word in the word dictionary (called correct character positions).

[0006]

[Problems to be solved by the invention] However, in the character recognition post-processing method of the prior art described above, a word in the word dictionary is matched against the candidate character matrix by comparing the characters at corresponding character positions. Word matching therefore fails when the estimated character positions do not coincide with the correct character positions.

[0007] An example follows. FIG. 7 shows the character segmentation result for one example of image data of "株式会社ミクロ" (Kabushiki Kaisha Mikuro). Here "株" and "式" touch each other, as do "会" and "社", so segmentation fails at these two places; in addition, a smudge has intruded between "ク" and "ロ". As FIG. 8 shows, the estimated character positions of the candidate character matrix obtained from this segmentation result are shifted from the correct character positions of the word in the word dictionary, so not a single character matches and word matching fails.
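The position-by-position comparison of the prior art, and the way it breaks down when segmentation shifts the estimated positions, can be sketched roughly as follows. This is only an illustration in Python, not the patent's implementation: the function name `positional_match_count` and both candidate character matrices are assumptions invented for the example, since the actual contents of FIG. 5 and FIG. 8 are not reproduced in this text.

```python
def positional_match_count(candidate_matrix, word):
    """Count characters of `word` found among the candidates at the same position.

    candidate_matrix[i] holds the candidate characters for estimated
    character position i (0-based here for simplicity).
    """
    count = 0
    for position, char in enumerate(word):
        if position < len(candidate_matrix) and char in candidate_matrix[position]:
            count += 1
    return count


# Hypothetical matrix for "山村商店": the positions line up, but the second
# character is assumed to be misrecognized.
aligned = [["山", "出"], ["材", "杖"], ["商", "高"], ["店", "庄"]]

# Hypothetical matrix for "株式会社ミクロ": "株式" and "会社" are each
# segmented as one pattern, and a smudge adds a spurious pattern before "ロ".
shifted = [["様"], ["絵"], ["ミ", "三"], ["ク", "タ"], ["ノ"], ["ロ", "口"]]

aligned_count = positional_match_count(aligned, "山村商店")
shifted_count = positional_match_count(shifted, "株式会社ミクロ")
print(aligned_count, shifted_count)
```

With the aligned matrix, three of the four characters match. With the shifted matrix nothing matches, even though "ミ", "ク", and "ロ" are all present among the candidates, because each sits at the wrong position.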

[0008] The present invention has been made to solve the above problem. Its first object is to provide a character recognition post-processing method that performs word matching accurately and thereby improves recognition accuracy. Its second object is to provide a character recognition post-processing method that can execute this post-processing at high speed.

[0009]

[Means for solving the problems] To achieve the first object, the present invention, as set forth in claim 1, is configured so that a character recognition device that recognizes an input character pattern sequence uses: character segmentation means for segmenting individual character patterns from the input character pattern sequence; character recognition means for recognizing the individual character patterns output from the character segmentation means; a key character dictionary that collects key characters assigned to words as indexes, together with the character positions of those key characters; key character matching means for matching the character recognition result output from the character recognition means against the key character dictionary and obtaining, from the matching key characters, candidate words and the character positions of those key characters; and candidate word evaluation means for evaluating the candidate words obtained by the key character matching means on the basis of the character positions of the matching key characters, and selecting and outputting a candidate word with a high likelihood of being correct.

[0010] To achieve the second object, the present invention, as set forth in claim 2, is configured so that a character recognition device that recognizes an input character pattern sequence uses: character segmentation means for segmenting individual character patterns from the input character pattern sequence; character recognition means for recognizing the individual character patterns output from the character segmentation means; a key character dictionary that collects key characters having importance, assigned to each word as indexes, together with the character positions of those key characters; key character matching means for matching the character recognition result output from the character recognition means against the key character dictionary and obtaining, from the matching key characters, candidate words and the character positions of those key characters; a non-key character dictionary that collects, word by word, the non-key characters (those among the key characters constituting each word that are not registered in the key character dictionary) and the character positions of those non-key characters; non-key character matching means for obtaining from the non-key character dictionary the information on the non-key characters, not registered in the key character dictionary, of the candidate words obtained by the key character matching means, matching that information against the character recognition result output from the character recognition means, and obtaining the character positions of the matching non-key characters; and candidate word evaluation means for evaluating the candidate words on the basis of the character positions obtained by the key character matching means and/or the non-key character matching means, and selecting and outputting a word with a high likelihood of being correct.

[0011]

[Operation] In the character recognition post-processing method of the present invention, individual character patterns are segmented from the input character pattern sequence and recognized. Word matching is then performed using the recognition result: key characters are looked up in the key character dictionary to select candidate words, and the candidate words are evaluated using the character position information of the matching key characters to select a word with a high likelihood of being correct. Because word matching is performed in this way without relying on the character position information from segmentation, it remains possible even when the character positions estimated from the segmentation result are shifted from the correct character positions, and recognition accuracy improves.

[0012] Furthermore, by selecting characters of high importance as the key characters assigned as indexes of candidate words in the key character dictionary, the number of candidate words obtained from key character matching is reduced. This speeds up both word matching and the selection of the candidate word with a high likelihood of being correct.

[0013]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0014] FIG. 1 is a block diagram showing the configuration of a first embodiment of the character recognition post-processing method of the present invention. In FIG. 1, 1 is character segmentation means, 2 is character recognition means, 3 is a key character dictionary, 4 is key character matching means, and 5 is candidate word evaluation means. This embodiment is an embodiment of the invention of claim 1.

[0015] First, a character pattern sequence is input to the character segmentation means 1, which segments the individual character patterns. Each character pattern is then recognized by the character recognition means 2. As an example of input data, the image data of "株式会社ミクロ" in FIG. 7 is used, and the processing result of the character segmentation means 1 and the character recognition means 2 is assumed to be the candidate character matrix shown in FIG. 8.

[0016] Next, the candidate character matrix is input to the key character matching means 4. The key character matching means 4 creates from the candidate character matrix a search key table consisting of candidate characters and estimated character positions, as in FIG. 9, and matches this search key table against the key character dictionary 3. A configuration example of the key character dictionary 3 is described next.
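One way to picture the search key table is as a flat list of (candidate character, estimated character position) pairs derived from the candidate character matrix. The sketch below is an assumption about the layout, written in Python for illustration; the function name `build_search_key_table` and the sample matrix are hypothetical and do not reproduce FIG. 8 or FIG. 9.

```python
def build_search_key_table(candidate_matrix):
    """Flatten a candidate character matrix into (character, estimated position) pairs.

    Estimated character positions are numbered from 1.
    """
    table = []
    for estimated_position, candidates in enumerate(candidate_matrix, start=1):
        for char in candidates:
            table.append((char, estimated_position))
    return table


# Hypothetical candidate character matrix (not the contents of FIG. 8).
matrix = [["様"], ["絵"], ["ミ", "三"], ["ク"], ["ノ"], ["ロ", "口"]]
search_keys = build_search_key_table(matrix)
print(search_keys)
```

Every candidate character becomes its own search key, so a position with two candidates contributes two rows that share the same estimated position.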

[0017] As an example, the case in which every character constituting a word is assigned to that word as a key character is described. First, a word code is assigned to each word in the word dictionary. Next, each word is decomposed into its constituent characters, and a data structure (called individual character information) is created that combines each character with the word code and the correct character position. FIG. 10 shows the individual character information created for each character of each word from the word dictionary of FIG. 2. All the individual character information is then grouped by character to form the key character dictionary; an example is shown in FIG. 11.
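The construction just described amounts to building an inverted index from characters to (word code, correct character position) records. The following Python sketch shows the idea under stated assumptions: the function name `build_key_character_dictionary` is made up, and the three-word dictionary is hypothetical (only "山村商店" and "株式会社ミクロ" appear in the text; the nine words of FIG. 2 are not available here).

```python
from collections import defaultdict


def build_key_character_dictionary(words):
    """Index every character of every word by (word code, correct character position)."""
    key_dictionary = defaultdict(list)
    for word_code, word in enumerate(words, start=1):
        for correct_position, char in enumerate(word, start=1):
            # One "individual character information" record per character.
            key_dictionary[char].append((word_code, correct_position))
    return dict(key_dictionary)


# Hypothetical three-word dictionary; the third word is invented.
words = ["山村商店", "株式会社ミクロ", "山田商店"]
key_dictionary = build_key_character_dictionary(words)
print(key_dictionary["商"])
```

A character shared by several words, such as "商" here, ends up with one record per word, which is what lets a single recognized character retrieve every word that contains it.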

[0018] Matching the search key table of FIG. 9 against the key character dictionary of FIG. 11 first yields the direct matching result shown in FIG. 12. The key character matching means 4 organizes this result by candidate word to create a candidate word table, an example of which is shown in FIG. 13. In this example, the information held for each candidate word in the candidate word table is character position information consisting of an estimated character position and a correct character position.
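The two steps of this paragraph, the direct lookup and the regrouping by candidate word, might look roughly like the Python sketch below. The function names, the word codes, and the sample data are all hypothetical; they are chosen only to illustrate the flow and do not reproduce FIG. 9 to FIG. 13.

```python
def match_search_keys(search_key_table, key_dictionary):
    """Return the direct matching result as (word code, estimated pos, correct pos)."""
    direct_result = []
    for char, estimated_position in search_key_table:
        for word_code, correct_position in key_dictionary.get(char, []):
            direct_result.append((word_code, estimated_position, correct_position))
    return direct_result


def build_candidate_word_table(direct_result):
    """Group the direct matching result by candidate word."""
    table = {}
    for word_code, estimated_position, correct_position in direct_result:
        table.setdefault(word_code, []).append((estimated_position, correct_position))
    return table


# Hypothetical inputs: word code 1 stands for "株式会社ミクロ", word code 2 for "山村商店".
key_dictionary = {"ミ": [(1, 5)], "ク": [(1, 6)], "ロ": [(1, 7)], "山": [(2, 1)]}
search_key_table = [("様", 1), ("絵", 2), ("ミ", 3), ("ク", 4), ("ノ", 5), ("ロ", 6)]

candidate_word_table = build_candidate_word_table(
    match_search_keys(search_key_table, key_dictionary)
)
print(candidate_word_table)
```

Note that the lookup is keyed on the character alone; the estimated position is carried along as data rather than used as a filter, which is why a shifted segmentation still retrieves the word.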

[0019] Next, the candidate word evaluation means 5 calculates an evaluation score for each candidate word in the candidate word table. Various methods of calculating the evaluation score are conceivable; the following method is used as an example.

[0020] First, for each item of character position information, the value obtained by subtracting the correct character position from the estimated character position (called the character position difference) is calculated. Next, for each value of the character position difference, a difference score defined as follows is calculated.

[0021] Let X be a value of the character position difference and F(X) its difference score. F(X) is calculated by the following formula:

F(X) = (number of X) + (number of (X − 1)) × 0.5 + (number of (X + 1)) × 0.5

Here "number of X" is the number of character position differences of the candidate word that are equal to X, and likewise for X − 1 and X + 1. The largest value in the resulting set of F(X) is then taken as the evaluation score of the candidate word.

[0022] FIG. 14 shows the process of calculating the evaluation scores from the example of FIG. 13. For the candidate word "株式会社ミクロ", for example, the difference score for character position difference −2 is 2.5, the difference score for character position difference −1 is 2, and the evaluation score is the maximum of these, 2.5.
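The scoring rule can be checked with a short Python sketch. The function names are hypothetical, and the three position pairs are an assumption: they are chosen so that the character position differences are −2, −2, and −1, which reproduces the scores quoted above (2.5 for −2, 2 for −1), but the actual rows of FIG. 13 are not available in this text.

```python
from collections import Counter


def difference_scores(position_pairs):
    """Return {X: F(X)} for every character position difference X that occurs.

    position_pairs is a list of (estimated position, correct position).
    """
    counts = Counter(estimated - correct for estimated, correct in position_pairs)
    return {
        x: counts[x] + 0.5 * counts.get(x - 1, 0) + 0.5 * counts.get(x + 1, 0)
        for x in counts
    }


def evaluation_score(position_pairs):
    """Evaluation score of a candidate word: the largest difference score."""
    scores = difference_scores(position_pairs)
    return max(scores.values(), default=0.0)


# Assumed pairs for "株式会社ミクロ": differences are -2, -2 and -1.
pairs = [(3, 5), (4, 6), (6, 7)]
scores = difference_scores(pairs)
score = evaluation_score(pairs)
print(scores, score)
```

The half-weight terms for X − 1 and X + 1 reward matches whose shift is close to, but not exactly, the dominant shift, so a single spurious or merged pattern partway through the word does not discard the evidence on either side of it.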

[0023] The candidate word evaluation means 5 then outputs the candidate word with the highest evaluation score as the correct answer; in this case "株式会社ミクロ" is correctly selected. Thus, according to this embodiment, word matching can be performed even when the character positions estimated from the segmentation result are shifted relative to the word in the word dictionary, so recognition accuracy can be improved.

[0024] Next, a second embodiment of the present invention is described. This embodiment is an embodiment of the invention of claim 2.

[0025] The first embodiment does not address how to restrict the key characters assigned to a word. In some cases the number of candidate words obtained from key character matching may become large, and the amount of processing may grow accordingly.

[0026] As an example, consider the case in which, in the description of the first embodiment, the data of "山村商店" in FIG. 3 is input as the input character pattern sequence. Suppose that the character segmentation means 1 and the character recognition means 2 yield the candidate character matrices of FIG. 5(a) and FIG. 5(b). The key character matching means 4 creates the search key table of FIG. 15 from the candidate character matrix of FIG. 5(b).

[0027] In the key character matching means 4, the direct result of matching the key character dictionary 3 against the search key table of FIG. 15 is FIG. 16, and the candidate word table created from it is FIG. 17. As FIG. 17 shows, the recognized characters "商" and "店" are used in seven words of the word dictionary of FIG. 2, so the number of candidate words is considerably larger than in the case described for the first embodiment.

[0028] This embodiment has been made in view of the above. It shows an example in which the key characters assigned to words are restricted according to their importance as search keys, so that character recognition post-processing with high recognition accuracy can be performed at high speed.

[0029] FIG. 18 is a block diagram showing the configuration of the character recognition post-processing method according to the second embodiment of the present invention. In FIG. 18, 1 is character segmentation means, 2 is character recognition means, 3 is a key character dictionary, 4 is key character matching means, 60 is a non-key character dictionary, 70 is non-key character matching means, and 5 is candidate word evaluation means.

[0030] First, a character pattern sequence is input to the character segmentation means 1, which segments the individual character patterns. Each character pattern is then recognized by the character recognition means 2. The processing up to this point is the same as in the first embodiment. When the image data of "山村商店" in FIG. 3 is used as an example of input data, the processing result of the character segmentation means 1 and the character recognition means 2 is the candidate character matrices shown in FIG. 5(a) and FIG. 5(b).

[0031] Next, the candidate character matrix is input to the key character matching means 4. For the purposes of explanation, the candidate character matrix of FIG. 5(b), which is the correct one, is used here. The key character matching means 4 creates the search key table of FIG. 15 from the candidate character matrix and matches it against the key character dictionary 3. A configuration example of the key character dictionary 3 in this embodiment is described next.

[0032] As noted under "Problems to be solved by the invention", some of the characters making up the word dictionary are used in many words, and assigning them as key characters to every such word risks increasing the number of candidate words. In this embodiment, therefore, the importance of each character of a word as a search key is evaluated, and only characters of high importance are selected as key characters. Various measures of importance as a search key are conceivable; for the purposes of explanation, the reciprocal of the number of words in which the character is used is adopted and called the importance.

[0033] FIG. 19 summarizes, for each character in the word dictionary of FIG. 2, the number of words in which it is used and its importance. If "importance of 0.4 or more" is used as an example criterion for adoption as a key character, all the characters in FIG. 19 other than "島", "商", and "店" are adopted as key characters, and the key character dictionary becomes that of FIG. 20. The direct result of matching the search key table of FIG. 15 against the key character dictionary of FIG. 20 is FIG. 21, and the candidate word table created from it is FIG. 22.
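The importance measure and the 0.4 threshold can be illustrated with a small Python sketch. The function names and the three-word dictionary are hypothetical (the nine words of FIG. 2 are not reproduced in this text), so the resulting key character set differs from the one in FIG. 20; only the rule itself, importance = 1 / (number of words using the character), comes from the description.

```python
from collections import Counter


def character_importance(words):
    """Importance of each character: 1 / (number of words that use it)."""
    usage = Counter()
    for word in words:
        usage.update(set(word))  # count each word at most once per character
    return {char: 1.0 / count for char, count in usage.items()}


def select_key_characters(words, threshold=0.4):
    """Keep only the characters whose importance reaches the threshold."""
    importance = character_importance(words)
    return {char for char, value in importance.items() if value >= threshold}


# Hypothetical dictionary; only "山村商店" appears in the text.
words = ["山村商店", "山田商店", "川口商店"]
importance = character_importance(words)
key_characters = select_key_characters(words)
print(sorted(key_characters))
```

With a threshold of 0.4, a character is kept only if it occurs in at most two words (1/2 = 0.5 passes, 1/3 ≈ 0.33 does not), so "商" and "店", which occur in all three sample words, are dropped.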

[0034] The non-key character matching means 70 matches the candidate word table obtained using the key character dictionary 3 against the non-key character dictionary 60. The non-key character dictionary 60 collects, for each word, the information on those characters of the word that are not registered in the key character dictionary (information consisting of a character and its correct character position, called non-key character information); an example is shown in FIG. 23. For each candidate word in the candidate word table, the non-key character matching means 70 first retrieves the non-key character information from the non-key character dictionary 60 and matches it against the search key table. FIG. 24 shows the result of matching the candidate word table of FIG. 22 and the search key table of FIG. 21 using the non-key character dictionary of FIG. 23. Comparing this with FIG. 17 shows that the number of candidate words has been greatly reduced, from seven to two. The time required for the subsequent processing of each candidate word is therefore also greatly reduced; that is, the processing speed of character recognition post-processing is improved.
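A rough Python sketch of the non-key character matching step follows. It assumes that the candidate word table maps word codes to (estimated position, correct position) pairs and that the non-key character dictionary maps word codes to (character, correct position) records; these layouts, the function name, and the sample data are all hypothetical and do not reproduce FIG. 21 to FIG. 24.

```python
def match_non_key_characters(candidate_word_table, non_key_dictionary, search_key_table):
    """Add position pairs for matching non-key characters of each candidate word.

    candidate_word_table: {word code: [(estimated position, correct position), ...]}
    non_key_dictionary:   {word code: [(character, correct position), ...]}
    search_key_table:     [(candidate character, estimated position), ...]
    """
    result = {}
    for word_code, pairs in candidate_word_table.items():
        merged = list(pairs)
        for char, correct_position in non_key_dictionary.get(word_code, []):
            for key_char, estimated_position in search_key_table:
                if key_char == char:
                    merged.append((estimated_position, correct_position))
        result[word_code] = merged
    return result


# Hypothetical data: word code 1 stands for "山村商店"; "商" and "店" are
# treated as non-key characters, and "村" is assumed to be misrecognized.
candidate_word_table = {1: [(1, 1)]}  # retrieved through key character "山"
non_key_dictionary = {1: [("商", 3), ("店", 4)]}
search_key_table = [("山", 1), ("材", 2), ("商", 3), ("店", 4)]

merged_table = match_non_key_characters(
    candidate_word_table, non_key_dictionary, search_key_table
)
print(merged_table)
```

The point of the design is that frequent characters are consulted only for words already retrieved through a key character, so they add position evidence without adding candidate words.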

[0035] Next, the candidate word evaluation means 5 calculates the evaluation score of each candidate word from the result of the non-key character matching means 70, by the same method as in the first embodiment, and outputs the candidate word with the highest evaluation score as the correct answer (an example is shown in FIG. 25). In this example "山村商店" is correctly selected.

[0036]

[Effects of the invention] As is clear from the above description, according to the character recognition post-processing method of the present invention, word matching can be performed even when the character positions estimated from the segmentation result are shifted from the correct character positions of a word in the word dictionary, so recognition accuracy can be improved.

[0037] Further, according to the invention of claim 2 in particular, selecting the key characters assigned to words as search keys reduces the number of candidate words resulting from key character matching, so character recognition post-processing can be made faster.

[Brief description of the drawings]

[FIG. 1] Block diagram showing the configuration of the character recognition post-processing method of the first embodiment of the present invention.

[FIG. 2] Diagram showing an example of the word dictionary used for explanation in the embodiments of the present invention and in the conventional example.

[FIG. 3] Diagram showing image data of the written word "山村商店" as an example of input data.

[FIG. 4] (a) and (b) are diagrams showing examples of the segmentation results obtained by character segmentation from the image data of FIG. 3.

[FIG. 5] (a) and (b) are diagrams showing the candidate character matrices obtained by recognizing the segmentation results of FIG. 4(a) and FIG. 4(b).

[FIG. 6] Diagram showing the matching of the candidate character matrix of FIG. 5(b) against the word "山村商店".

[FIG. 7] Diagram showing an example of a character segmentation result, using image data of "株式会社ミクロ" as example input data.

[FIG. 8] Diagram showing the matching of the candidate character matrix obtained from FIG. 7 against the word "株式会社ミクロ".

[FIG. 9] Diagram showing an example of a search key table consisting of candidate characters and estimated character positions.

[FIG. 10] Diagram showing an example in which the individual character information of each character of each word is created from the word dictionary of FIG. 2.

[FIG. 11] Diagram showing an example of the key character dictionary in the first embodiment.

[FIG. 12] Diagram showing an example of the direct matching result obtained by matching the search key table of FIG. 9 against the key character dictionary of FIG. 11.

[FIG. 13] Diagram showing an example of the candidate word table in the first embodiment.

[FIG. 14] Diagram showing an example of calculating the evaluation score of each candidate word in the candidate word table of FIG. 13.

[FIG. 15] Diagram showing the search key table obtained from the candidate character matrix of FIG. 5(b).

[FIG. 16] Diagram showing the direct result of matching the key character dictionary 3 against the search key table of FIG. 15.

[FIG. 17] Diagram showing the candidate word table created from FIG. 16.

[FIG. 18] Block diagram showing the configuration of the character recognition post-processing method according to the second embodiment of the present invention.

[FIG. 19] Diagram showing a table that summarizes, for each character in the word dictionary of FIG. 2, the number of words in which it is used and its importance.

[FIG. 20] Diagram showing an example of the key character dictionary in the second embodiment.

[FIG. 21] Diagram showing the direct result of matching the search key table of FIG. 15 against the key character dictionary of FIG. 20.

[FIG. 22] Diagram showing the candidate word table created from FIG. 21.

[FIG. 23] Diagram showing an example of the non-key character dictionary in the second embodiment.

[FIG. 24] Diagram showing the result of matching the candidate word table of FIG. 22 and the search key table of FIG. 21.

[FIG. 25] Diagram showing the result of calculating the evaluation score of each candidate word in FIG. 24.

[Explanation of symbols]

1: character segmentation means
2: character recognition means
3: key character dictionary
4: key character matching means
5: candidate word evaluation means
60: non-key character dictionary
70: non-key character matching means

Continuation of the front page

(56) References: JP-A-4-96889 (JP, A); JP-A-2-264388 (JP, A); JP-A-2-121078 (JP, A); JP-A-61-136182 (JP, A); JP-A-59-188783 (JP, A); Proceedings of the 34th National Convention of the Information Processing Society of Japan (first half of 1987), No. 3, pp. 1845-1846

(58) Fields searched (Int. Cl.7, DB name): G06K 9/72; JICST file (JOIS)

Claims

(57) [Claims]

1. A character recognition post-processing method characterized in that a character recognition device that recognizes an input character pattern sequence uses: character segmentation means for segmenting individual character patterns from the input character pattern sequence; character recognition means for recognizing the individual character patterns output from the character segmentation means; a key character dictionary that collects key characters assigned to words as indexes, together with the character positions of those key characters; key character matching means for matching the character recognition result output from the character recognition means against the key character dictionary and obtaining, from the matching key characters, candidate words and the character positions of those key characters; and candidate word evaluation means for evaluating the candidate words obtained by the key character matching means on the basis of the character positions of the matching key characters, and selecting and outputting a candidate word with a high likelihood of being correct.

2. A character recognition post-processing method characterized in that a character recognition device that recognizes an input character pattern sequence uses: character segmentation means for segmenting individual character patterns from the input character pattern sequence; character recognition means for recognizing the individual character patterns output from the character segmentation means; a key character dictionary that collects key characters having importance, assigned to each word as indexes, together with the character positions of those key characters; key character matching means for matching the character recognition result output from the character recognition means against the key character dictionary and obtaining, from the matching key characters, candidate words and the character positions of those key characters; a non-key character dictionary that collects, word by word, the non-key characters (those among the key characters constituting each word that are not registered in the key character dictionary) and the character positions of those non-key characters; non-key character matching means for obtaining from the non-key character dictionary the information on the non-key characters, not registered in the key character dictionary, of the candidate words obtained by the key character matching means, matching that information against the character recognition result output from the character recognition means, and obtaining the character positions of the matching non-key characters; and candidate word evaluation means for evaluating the candidate words on the basis of the character positions obtained by the key character matching means and/or the non-key character matching means, and selecting and outputting a word with a high likelihood of being correct.