JPH10116321A

JPH10116321A - Character recognition method and device therefor

Info

Publication number: JPH10116321A
Application number: JP8271150A
Authority: JP
Inventors: Mikio Hasegawa; 幹夫長谷川; Hirohisa Goto; 裕久後藤
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1996-10-14
Filing date: 1996-10-14
Publication date: 1998-05-06

Abstract

PROBLEM TO BE SOLVED: To reduce the misrecognition rate. When a character area is judged to be a single character area, that is, an area containing only one type of character, the collation rank is corrected for each character in that area so that the recognition candidate obtained with the dictionary of the same type as the character is ranked first. SOLUTION: A character extraction unit 16 cuts out the individual character images, generates character area information, counts the character images contained in each character area, and generates a table of character counts per character area. A single character area determination unit 22 judges whether each character area is a single character area, in which the characters are of one type, based on the character area information sent from the character extraction unit 16 and on the collation dictionary information, sent from the distance calculation unit 20, that forms part of the recognition result of each character image in the area. For each character image, the dictionary type indicated by the collation dictionary information of the first-ranked recognition candidate is taken as the type of the representative dictionary of that character image.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to a character recognition method and a character recognition device.

【０００２】[0002]

[Related Art] A conventional character recognition device recognizes character patterns, such as handwritten characters and printed (type) characters, on a medium such as a document or form to be read. Recognition is performed separately for each character pattern cut out from the captured image. Features are extracted from each cut-out character pattern to compute a feature vector, and this feature vector is compared with the feature vectors of the standard patterns stored in dictionaries. Several dictionaries of standard patterns are used, such as a dictionary for handwritten characters and a dictionary for printed characters. The character category of the standard pattern whose feature vector is most similar to the computed feature vector is selected from among these dictionaries and becomes the recognition result. The recognition result is displayed on a CRT. The operator checks the displayed result and corrects it if it is wrong.

【０００３】[0003]

[Problem to Be Solved by the Invention] In a conventional character recognition device that uses several dictionaries, however, a character pattern is sometimes recognized with a dictionary whose type differs from the type of the character. For example, the dictionary for printed characters may end up being used to recognize a handwritten character pattern. The misrecognition rate of results obtained with a dictionary of a different type from the character pattern generally tends to be higher than that of results obtained with a dictionary of the same type.

[0004] A character recognition device and a character recognition method capable of reducing the misrecognition rate have therefore been desired.

【０００５】[0005]

[Means for Solving the Problem] In general, the characters written in one field of a medium are usually of a single type. In character recognition, one field is normally treated as one character area. The characters written in one character area are therefore usually of a single type: an individual character area typically contains, for example, only handwritten characters or only printed characters. The inventors of the present application focused on this point.

[0006] (First invention) The character recognition method of the first invention is a method in which individual character images are cut out from an image captured from a medium on which characters are written; features of each character image are extracted; the features of the character image are collated with the features of standard character patterns for character recognition stored in each of a plurality of types of dictionaries; a plurality of standard patterns are selected for the character image as recognition candidates, each with a collation rank; and the recognition candidate with the highest collation rank is taken as the recognition result. The method is characterized by: generating character area information representing the character area in which each character image was contained; generating collation dictionary information indicating the type of dictionary in which the standard character pattern of the recognition result or of each recognition candidate is stored; determining, based on the character area information and the collation dictionary information, whether each character area is a single character area, that is, an area whose characters are all of one type; when a character area is a single character area, generating area representative dictionary information indicating the type of dictionary that corresponds to the type of character contained in that area; and, among the recognition candidates of each character image belonging to the single character area, correcting the collation ranks of the matching recognition candidates, whose collation dictionary information matches the area representative dictionary information, so that they rank higher than the other recognition candidates, whose collation dictionary information does not match, while preserving the relative order among the matching recognition candidates.

[0007] In the character recognition method of the first invention, the determination of whether a character area is a single character area is preferably made as follows. For each character image identified from the character area information as belonging to the character area, the type of a representative dictionary is determined from among the dictionary types indicated by the collation dictionary information of that image's recognition candidates. Whether the character area is a single character area is then determined from the representative dictionary types of the character images contained in the area.
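The determination described here can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes that the representative dictionary of a character image is the dictionary of its first-ranked candidate (as the abstract states), and it assumes a hypothetical `min_share` threshold for deciding that an area is "single", since the text above does not specify the decision rule. The names `representative_dictionary` and `region_representative` are invented for the example.

```python
from collections import Counter

def representative_dictionary(candidates):
    """candidates: list of (char_code, dictionary_type), best rank first.

    Assumption: the representative dictionary is that of the top candidate.
    """
    return candidates[0][1]

def region_representative(region_candidates, min_share=1.0):
    """Return the area's representative dictionary type, or None.

    region_candidates: one candidate list per character image in the area.
    min_share is a hypothetical threshold: the fraction of character images
    whose representative dictionary must agree for the area to count as a
    single character area (1.0 means unanimous agreement).
    """
    reps = [representative_dictionary(c) for c in region_candidates if c]
    if not reps:
        return None
    dict_type, count = Counter(reps).most_common(1)[0]
    return dict_type if count / len(reps) >= min_share else None
```

With `min_share=1.0` a single disagreeing character prevents the area from being treated as single; a lower value tolerates a few characters whose top candidate came from the wrong dictionary.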

[0008] In the character recognition method of the first invention, a dictionary for handwritten characters and a dictionary for printed characters are preferably used as the dictionaries.

[0009] (Second invention) The character recognition device of the second invention comprises: a plurality of types of dictionaries, each storing standard patterns for character recognition; a character extraction unit that extracts character areas from an image captured from a medium on which characters are written and cuts out individual character images from the character areas; a feature extraction unit that extracts the features of each character image; and a collation unit that collates the features of the character image with the features of the standard character patterns, selects a plurality of standard character patterns for the character image as recognition candidates, each with a collation rank, and takes the recognition candidate with the highest collation rank as the recognition result. The device is characterized in that: the character extraction unit generates character area information indicating the character area in which each character image was contained; the collation unit generates collation dictionary information indicating the type of dictionary in which the standard character pattern of each recognition candidate is stored; a single character area determination unit determines, based on the character area information and the collation dictionary information, whether each character area is a single character area whose characters are all of one type and, when it is, generates area representative dictionary information indicating the type of dictionary that corresponds to the type of character contained in that area; and a correction unit corrects, among the recognition candidates of each character image belonging to the single character area, the collation ranks of the matching recognition candidates, whose collation dictionary information matches the area representative dictionary information, so that they rank higher than the other recognition candidates, whose collation dictionary information does not match, while preserving the relative order among the matching recognition candidates.

[0010] In the character recognition device of the second invention, the single character area determination unit preferably operates as follows. For each character image identified from the character area information as belonging to a character area, it determines the type of a representative dictionary from among the dictionary types indicated by the collation dictionary information of that image's recognition candidates. It then determines whether the character area is a single character area from the representative dictionary types of the character images contained in the area.

[0011] The character recognition device of the second invention preferably includes, as the dictionaries, a dictionary for handwritten characters and a dictionary for printed characters.

[0012] In each of the inventions of this application, "characters" includes symbols in general.

[0013] Thus, according to the first and second inventions, when a character area is determined to be a single character area containing a single type of character, the collation ranks can be corrected for each character in the area so that the recognition candidate obtained with the dictionary of the same type as the character is ranked first. As a result, the type of the character and the type of the dictionary used for the recognition result can be made to agree, and the misrecognition rate can be reduced.

【００１４】[0014]

[Embodiments of the Invention] Embodiments of the character recognition method of the first invention and the character recognition device of the second invention are described together below with reference to the drawings. The drawings show the size, shape, and arrangement of the components only schematically, to the extent needed to understand the inventions. The inventions are therefore not limited to the illustrated examples.

[0015] (Configuration) First, the configuration of the character recognition device of this embodiment is described with reference to the block diagram of FIG. 1.

[0016] The character recognition device of this embodiment includes a dictionary unit 10 holding a plurality of types of dictionaries, a scanner unit 12, an image storage unit 14, a character extraction unit 16, a feature extraction unit 18, a distance calculation unit 20 serving as the collation unit, a single character area determination unit 22, and a correction unit 24.

[0017] The dictionary unit 10 includes a dictionary 10a for handwritten characters and a dictionary 10b for printed characters. The dictionary 10a stores standard patterns for recognizing handwritten characters, and the dictionary 10b stores standard patterns for recognizing printed characters.

[0018] The standard patterns of each dictionary are collections of feature vectors created from handwritten or printed character images of the character code corresponding to each standard pattern. Here, the feature vector of a standard pattern is the average of the feature vectors obtained from a plurality of standard character images of the same character code. There may be several standard patterns for the same character code, but each standard pattern corresponds to exactly one character code. The character code need not be limited to JIS codes; for example, Unicode may be used.
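The averaging step described above can be illustrated with a short sketch. It assumes feature vectors are plain lists of numbers of equal length; the function name `standard_pattern` is hypothetical and not taken from the patent.

```python
def standard_pattern(feature_vectors):
    """Average several feature vectors of the same character code.

    feature_vectors: non-empty list of equal-length lists of floats,
    one per standard character image.
    """
    if not feature_vectors:
        raise ValueError("at least one feature vector is required")
    dim = len(feature_vectors[0])
    if any(len(v) != dim for v in feature_vectors):
        raise ValueError("all feature vectors must have the same length")
    n = len(feature_vectors)
    # Component-wise mean over all sample vectors.
    return [sum(v[i] for v in feature_vectors) / n for i in range(dim)]
```

A dictionary would then hold one such averaged vector (or several, for different writing variants) per character code.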

[0019] The scanner unit 12 captures a binarized image of the characters from the medium on which they are written.

[0020] In the character extraction unit 16, each region of the captured image in which characters are lined up is decomposed into character strings, and individual character images are then cut out of each character string. The character extraction unit 16 also generates character area information representing the character area in which each cut-out character image was contained.

[0021] In the feature extraction unit 18, various features of a character image, such as line width, height, horizontal components, and vertical components, are extracted as the feature vector of that character image.

[0022] In the distance calculation unit 20, the features of the character image are collated with the features of the standard character patterns. As a result, a plurality of standard character patterns are selected for the character image as recognition candidates, each with a collation rank, and the candidate with the highest collation rank is taken as the recognition result. The distance calculation unit 20 also generates collation dictionary information indicating the type of dictionary in which the standard character pattern of the recognition result or of each recognition candidate is stored.

[0023] The single character area determination unit 22 determines, based on the character area information and the collation dictionary information, whether each character area is a single character area whose characters are all of one type. When a character area is a single character area, it also generates area representative dictionary information indicating the type of dictionary that corresponds to the type of character contained in that area.

[0024] In the correction unit 24, among the recognition candidates of each character image contained in a single character area, the collation ranks of the matching recognition candidates, whose collation dictionary information matches the area representative dictionary information, are corrected so that they rank higher than the other recognition candidates, whose collation dictionary information does not match. The relative order among the matching recognition candidates is preserved.
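The rank correction amounts to a stable partition of the candidate list. The sketch below assumes each candidate is a `(char_code, dictionary_type)` pair and that the list is ordered best rank first; the function name `rerank` and the string labels for dictionary types are illustrative only.

```python
def rerank(candidates, region_dictionary):
    """Move candidates from the area's representative dictionary to the top.

    candidates: list of (char_code, dictionary_type), best rank first.
    The relative order inside each group is preserved (stable partition).
    """
    matching = [c for c in candidates if c[1] == region_dictionary]
    others = [c for c in candidates if c[1] != region_dictionary]
    return matching + others

# Example: a handwritten field whose top candidate came from the
# printed-character dictionary.
ranked = [("沖", "printed"), ("冲", "handwritten"),
          ("沖", "handwritten"), ("仲", "printed")]
corrected = rerank(ranked, "handwritten")
```

After the correction, the first element of `corrected` is the best candidate from the handwritten dictionary, and the two printed-dictionary candidates follow in their original relative order.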

[0025] (Operation) Next, the character recognition method of this embodiment is described.

[0026] The example described here recognizes the characters written on the transfer request form shown in FIG. 2. FIG. 3 shows the format information of that form, that is, information indicating the position and extent of each field; here it covers the first to fifth areas. The first area corresponds to the bank name field ("凸凹銀行", Dekoboko Bank) in the first row on the left side of the form. The second area corresponds to the branch name field ("八王子支店", Hachioji Branch) in the second row on the left side. The third area corresponds to the company name field ("Ｘ電気株式会社", X Electric Co., Ltd.) in the third row on the left side. The fourth area corresponds to the name field ("沖太郎", Taro Oki) in the fourth row on the left side. The fifth area corresponds to the number field ("10000") in the first row on the right side.

[0027] To recognize the characters on this form, the scanner unit 12 first captures an image of the form and sends the image data it has read to the image storage unit 14.

[0028] Next, the image storage unit 14 stores the image data it has received and notifies the character extraction unit 16 that new image data has been stored.

[0029] On receiving this notification, the character extraction unit 16 reads the image data stored in the image storage unit 14 and cuts out the individual character images.

[0030] When cutting out a character image, the character extraction unit 16 creates position and size information indicating the position and size of the cut-out character image within the whole image.

[0031] To cut out the character images, the character extraction unit 16 first divides the image data, area by area, into the character areas (the first to fifth areas) and the remaining non-character areas, based on the format information described above. During this division it creates character area information indicating the correspondence between each character area and the image data contained in it.
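A minimal sketch of this area-by-area division follows. It assumes the image is a list of pixel rows and that the format information gives each field as a `(left, top, right, bottom)` box with exclusive right and bottom edges; that layout, and the names `crop` and `split_regions`, are assumptions for illustration, since the actual format of FIG. 3 is not reproduced in the text.

```python
def crop(image, box):
    """Return the sub-image inside box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def split_regions(image, format_info):
    """Map each character area name to the image data it contains.

    format_info: dict of area name -> (left, top, right, bottom).
    Pixels outside every box are treated as non-character area and dropped.
    """
    return {name: crop(image, box) for name, box in format_info.items()}
```

The returned mapping plays the role of the character area information at this stage: it records which image data belongs to which character area.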

[0032] FIG. 4 shows an example of the image data divided by area. The first row of FIG. 4 shows, as area image 1, the image data of the printed "凸凹銀行". The second row shows, as area image 2, the image data of the printed "八王子支店". The third row shows, as area image 3, the image data of the printed "Ｘ電気株式会社". The fourth row shows, as area image 4, the image data of the handwritten "沖太郎". The fifth row shows, as area image 5, the image data of the handwritten "10000".

[0033] FIG. 5 shows an example of the character area information. The first row of FIG. 5 indicates that the first area contains area image 1. The second row indicates that the second area contains area image 2. The third row indicates that the third area contains area image 3. The fourth row indicates that the fourth area contains area image 4. The fifth row indicates that the fifth area contains area image 5.

[0034] Next, the character extraction unit 16 divides each character area into character string image areas (hereinafter also called unit character strings) and cuts out the individual character images from each character string.

[0035] To divide a character area, the character extraction unit 16 obtains the distribution of black pixels in each area image. Because the image here is written horizontally, the distribution of black pixels along the vertical direction (y axis) is obtained. Each portion where the number of black pixels exceeds a threshold is separated out as a character string image. When the character string image areas are separated, character area information is created that indicates the correspondence among the area, each character string image area, and the image data contained in that character string image area.
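The black-pixel distribution used here is commonly called a projection profile. The sketch below assumes a binarized image stored as a list of rows of 0/1 values (1 for a black pixel) and a caller-chosen threshold; the helper names are hypothetical. The same run-finding helper works on a column profile when a horizontally written string is later split into characters.

```python
def row_profile(image):
    """Number of black pixels in each row (distribution along the y axis)."""
    return [sum(row) for row in image]

def column_profile(image):
    """Number of black pixels in each column (distribution along the x axis)."""
    return [sum(col) for col in zip(*image)]

def runs_above_threshold(profile, threshold):
    """Return (start, end) index pairs, end exclusive, where profile > threshold."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > threshold and start is None:
            start = i                      # a run of dense rows/columns begins
        elif count <= threshold and start is not None:
            runs.append((start, i))        # the run ended just before index i
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs
```

Each returned pair bounds one character string image area when applied to `row_profile`, or one character image area when applied to `column_profile` of a single string.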

[0036] FIG. 6 shows the distribution of black pixels for "凸凹銀行" in area image 1. The horizontal axis of FIG. 6 represents the number of black pixels, and the vertical axis represents the y coordinate of the area image. The curve labeled I in FIG. 6 shows the distribution of black pixels, and the arrow labeled II shows the extent of the character string image area along the y coordinate.

[0037] The left side of FIG. 7 shows character string image 1 within area image 1, and the right side of FIG. 7 shows an example of the character area information for this character string image 1. FIG. 7 indicates that character string 1 (character string image area 1) of the first area contains character string image 1. The character string image area corresponds to the part of the area image lying between the lines that bound the character string image from above and below.

[0038] Next, the character extraction unit 16 cuts out the individual character images from the character string image.

[0039] To divide a character string image, the character extraction unit 16 obtains the distribution of black pixels in each character string image. Because the character string image here is written horizontally, the distribution of black pixels along the horizontal direction (x axis) is obtained. Each portion where the number of black pixels exceeds a threshold is separated out as a character image. The character extraction unit 16 assigns a unique character image identifier to each cut-out character image. It also creates character area information indicating the correspondence among the area, the character string image area, and the image data of the character image.

[0040] FIG. 8 shows the distribution of black pixels for "凸凹銀行" in character string image 1. The horizontal axis of FIG. 8 represents the x coordinate of the character string image, and the vertical axis represents the number of black pixels. Curve III shows the distribution of black pixels, and arrow IV shows the extent of a character image area along the x coordinate.

[0041] The left side of FIG. 9 shows character images 1 to 4 of character string image 1. In FIG. 9, character image area 1 contains character image 1, "凸", which is assigned the character image identifier "character 1". Character image area 2 contains character image 2, "凹", which is assigned the character image identifier "character 2". Character image area 3 contains character image 3, "銀", which is assigned the character image identifier "character 3". Character image area 4 contains character image 4, "行", which is assigned the character image identifier "character 4".

[0042] The right side of FIG. 9 shows an example of the character area information for each character image. In FIG. 9, character image area 1 of character string 1 (character string image area 1) of the first area contains character image 1. Character image area 2 of character string 1 of the first area contains character image 2. Character 3 (character image area 3) of character string 1 of the first area contains character image 3. Character 4 (character image area 4) of character string 1 of the first area contains character image 4.

[0043] After cutting out the individual character images and creating the character area information, the character extraction unit 16 counts the character images contained in each character area and creates a table of character counts per character area (hereinafter also called the count table). The count table shows how many character image areas each character area was divided into.
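One way to picture the character area information and the count table is as a list of records plus a per-area tally. The record layout below (`CharAreaInfo` with area, string, character image area, and identifier fields) is an assumption made for illustration; the patent shows this information only as figures.

```python
from collections import Counter
from typing import NamedTuple

class CharAreaInfo(NamedTuple):
    """Hypothetical record: where one cut-out character image came from."""
    area: int          # character area (form field) number
    string: int        # character string number within the area
    char_area: int     # character image area number within the string
    char_id: str       # unique character image identifier

def count_chars_per_area(records):
    """Build the count table: character area number -> number of character images."""
    return dict(Counter(r.area for r in records))

# Example: four characters in the first area, three in the fourth.
records = [CharAreaInfo(1, 1, i, f"character {i}") for i in range(1, 5)]
records += [CharAreaInfo(4, 1, i, f"character {i + 4}") for i in range(1, 4)]
table = count_chars_per_area(records)
```

The table gives the determination step the denominator it needs when it looks at how many character images in an area share a dictionary type.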

[0044] When cutting out the character images, the character extraction unit 16 also creates position and size information indicating the position and size of each cut-out character image within the whole image.

[0045] The character extraction unit 16 then associates the character image identifier of each character image with its position and size information and sends them to the image storage unit 14. The image storage unit 14 stores the received character image identifiers and position and size information. The character extraction unit 16 also sends the character image identifiers to the feature extraction unit 18.

[0046] Next, the feature extraction unit 18 sends each received character image identifier to the image storage unit 14. The image storage unit 14 sends the character image indicated by the position and size information corresponding to that identifier to the feature extraction unit 18. The feature extraction unit 18 then extracts the features of the character image, producing a feature vector.

[0047] The feature extraction unit 18 then sends the extracted feature vector, together with the character image identifier of that character image, to the distance calculation unit 20, which serves as the collation unit.

[0048] Each time a feature vector of a character image arrives from the feature extraction unit 18, the distance calculation unit 20 collates it against the feature vectors of the standard character patterns.

[0049] For this collation, the distance calculation unit 20 calculates the distance between the feature vector of the character image and the feature vector of each standard character pattern. Based on the distances, it selects a plurality of standard character patterns for the character image as recognition candidates and assigns each a collation rank. The recognition candidate with the highest collation rank is taken as the recognition result.
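
The ranking step can be sketched as follows. The text does not name a distance measure, so Euclidean distance is used here purely as an assumption; the function `rank_candidates`, the two-dimensional feature vectors, and the character codes "A" and "B" are likewise hypothetical. Each candidate keeps the name of the dictionary it came from, which is the information the later steps rely on.

```python
import math


def rank_candidates(feature, dictionaries, top_k=4):
    """Return the top_k (distance, char_code, dictionary) tuples, nearest first.

    dictionaries maps a dictionary name to {char_code: standard feature vector}.
    A smaller distance means a higher collation rank.
    """
    scored = []
    for dict_name, patterns in dictionaries.items():
        for char_code, standard in patterns.items():
            scored.append((math.dist(feature, standard), char_code, dict_name))
    scored.sort(key=lambda item: item[0])
    return scored[:top_k]


# Hypothetical standard patterns stored in two types of dictionaries.
dictionaries = {
    "dictionary 1": {"A": (0.0, 0.0), "B": (3.0, 0.0)},
    "dictionary 2": {"A": (1.0, 0.0), "B": (0.0, 5.0)},
}

candidates = rank_candidates((0.2, 0.0), dictionaries)
recognition_result = candidates[0]  # the first-ranked candidate
```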

[0050] The distance calculation unit 20 also generates collation dictionary information indicating the type of recognition dictionary in which the standard character pattern adopted as each recognition candidate is stored.

[0051] FIG. 10(A) shows an example of recognition candidates, here for the character image assigned the character image identifier “character 1”. For this character image, the character code of recognition candidate 1 is “2330”, that of recognition candidate 2 is “2339”, that of recognition candidate 3 is “2330”, and that of recognition candidate 4 is “2338”.

[0052] FIG. 10(B) shows an example of the collation dictionary information. For the character image “character 1”, the collation dictionary information is “dictionary 1” for recognition candidate 1, “dictionary 2” for recognition candidates 2 and 3, and “dictionary 1” for recognition candidate 4.

[0053] The distance calculation unit 20 then sends the character code “2330” of recognition candidate 1 and its collation dictionary information “dictionary 1” to the single character area determination unit 22 as the recognition result and its collation dictionary information, respectively.

[0054] Based on the character area information sent from the character extraction unit 16 and the collation dictionary information of the recognition results sent from the distance calculation unit 20 for the character images in each character area, the single character area determination unit 22 determines whether each character area is a single character area, that is, a character area containing only one type of character.

[0055] Before making the determination, the single character area determination unit 22 first identifies the character images belonging to each character area on the basis of the character area information. To do so, it counts, for each character area, the number of times collation dictionary information has arrived for character images belonging to that area. Each time it increments a count, it reads the number of characters in that character area from the aggregation table of the character extraction unit 16. When the number of characters in the character area equals the number of times collation dictionary information has been received for it, the single character area determination unit 22 starts the determination.

[0056] FIG. 11(A) shows an example of the character area information of a character area to be determined; it lists the character area information for characters 1 to 5 of character string 1 in area 1. Because area 1 contains five characters, all characters of character area 1 have arrived once the count reaches the five recorded in the aggregation table.

[0057] For the determination, the single character area determination unit 22 first determines, for each character image, a representative dictionary type from among the dictionary types indicated by the collation dictionary information of that character image's recognition candidates.

[0058] FIG. 11(B) shows an example of the collation dictionary information of the character area to be determined. It lists, in collation rank order, the collation dictionary information for the character images identified by the character image identifiers “character 1” to “character 5”. For character 1, the entries in rank order are “dictionary 1”, “dictionary 2”, “dictionary 2”, and “dictionary 1”. For character 2, they are “dictionary 2”, “dictionary 1”, “dictionary 2”, and “dictionary 1”. For each of characters 3, 4, and 5, they are “dictionary 1”, “dictionary 2”, “dictionary 2”, and “dictionary 1”.

[0059] The dictionary type indicated by the collation dictionary information of the first-ranked recognition candidate of each character image is taken as the representative dictionary type of that character image. In the example of FIG. 11(B), the representative dictionary type is therefore “dictionary 1” for character 1, “dictionary 2” for character 2, and “dictionary 1” for characters 3, 4, and 5. The criterion for choosing the representative dictionary is not limited to the one used in this embodiment. For example, for each character image, the dictionary type appearing most often in the collation dictionary information of its recognition candidates may be used as the representative dictionary.
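
Both criteria for the representative dictionary can be sketched as below, using the rank-ordered collation dictionary information listed for FIG. 11(B). The function names and the tie-break in the majority variant are assumptions made for illustration; the text does not say how a tie within one character image would be resolved.

```python
from collections import Counter

# Collation dictionary information per character image, in collation rank
# order, as listed in the text for FIG. 11(B).
collation_info = {
    "character 1": ["dictionary 1", "dictionary 2", "dictionary 2", "dictionary 1"],
    "character 2": ["dictionary 2", "dictionary 1", "dictionary 2", "dictionary 1"],
    "character 3": ["dictionary 1", "dictionary 2", "dictionary 2", "dictionary 1"],
    "character 4": ["dictionary 1", "dictionary 2", "dictionary 2", "dictionary 1"],
    "character 5": ["dictionary 1", "dictionary 2", "dictionary 2", "dictionary 1"],
}


def representative_by_top_rank(ranked_dicts):
    """Embodiment criterion: dictionary type of the first-ranked candidate."""
    return ranked_dicts[0]


def representative_by_majority(ranked_dicts):
    """Alternative criterion: the most frequent dictionary type.

    Ties go to the better-ranked type (an assumed tie-break).
    """
    counts = Counter(ranked_dicts)
    best = max(counts.values())
    return next(d for d in ranked_dicts if counts[d] == best)


representatives = {
    char_id: representative_by_top_rank(ranked)
    for char_id, ranked in collation_info.items()
}
```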

[0060] Next, the single character area determination unit 22 determines whether the character area (area 1) is a single character area on the basis of the representative dictionary types of the character images it contains.

[0061] To do so, the single character area determination unit 22 compares the counts of each representative dictionary type among the character images belonging to the character area. If no two or more types share the largest count, that is, if exactly one representative dictionary type is the most frequent, it judges the character area to be a single character area. The criterion for a single character area is not limited to the one used in this embodiment.

[0062] In the example of FIG. 11(B), “dictionary 1” is the representative dictionary of four characters and “dictionary 2” of one, so area 1 is judged to be a single character area. A single character area corresponds, for example, to an area containing only handwritten characters or only printed characters.

[0063] If two or more representative dictionary types tie for the largest count in a character area, the area is judged not to be a single character area. This corresponds, for example, to a character area in which handwritten and printed characters are mixed.

[0064] Furthermore, when a character area is a single character area, the single character area determination unit 22 generates area representative dictionary information indicating the dictionary type that corresponds to the type of characters in that area. In the example of FIG. 11(B), area representative dictionary information indicating “dictionary 1” is generated.
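
The determination and the generation of the area representative dictionary information can be sketched together. The function name `determine_single_area` and its return shape are assumptions, as is the handling of an empty area, which the text does not address.

```python
from collections import Counter


def determine_single_area(representatives):
    """Return (is_single, area_representative) for one character area.

    representatives is the list of representative dictionary types, one per
    character image in the area. The area is single only when exactly one
    type has the largest count; a tie means mixed character types.
    """
    counts = Counter(representatives)
    if not counts:
        return False, None  # assumed behaviour for an empty area
    top = max(counts.values())
    leaders = [name for name, n in counts.items() if n == top]
    if len(leaders) == 1:
        return True, leaders[0]
    return False, None


# Area 1 of the example: four characters favour dictionary 1, one favours dictionary 2.
area_1 = ["dictionary 1", "dictionary 2", "dictionary 1", "dictionary 1", "dictionary 1"]
is_single, area_representative = determine_single_area(area_1)
```

Returning the winning type alongside the flag keeps the two outputs that are passed on to the correction step in one place.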

[0065] The single character area determination unit 22 then sends the determination result and the area representative dictionary information indicating “dictionary 1” to the correction unit 24.

[0066] Among the recognition candidates of each character image in the single character area, the correction unit 24 raises the collation ranks of the matching recognition candidates, those whose collation dictionary information agrees with the area representative dictionary information, above the collation ranks of the other recognition candidates, whose collation dictionary information does not agree with it, while preserving the relative order among the matching recognition candidates. The correction need only be applied to at least those characters whose first-ranked recognition candidate has collation dictionary information that does not agree with the area representative dictionary information.

[0067] FIGS. 12(A) to 12(C) show an example of correcting the collation ranks of the recognition candidates for character 2, one of the character images in the single character area.

[0068] FIG. 12(A) shows the collation dictionary information of each recognition candidate for the character image assigned the character image identifier “character 2”. The collation dictionary information is “dictionary 2” for recognition candidate 1, “dictionary 1” for recognition candidate 2, “dictionary 2” for recognition candidate 3, and “dictionary 1” for recognition candidate 4.

[0069] FIG. 12(B) shows the recognition candidates for the character image assigned the character image identifier “character 2”. The character codes of recognition candidates 1 to 4 are “2331”, “2332”, “2333”, and “2334”, respectively. In FIG. 12, the dictionaries and character codes of the recognition candidates that come from “dictionary 1”, the area representative dictionary type, are enclosed in thick frames.

[0070] As FIGS. 12(A) and 12(B) show, before correction the collation dictionary information of first-ranked recognition candidate 1 indicates dictionary 2, which does not match the area representative dictionary type. The character code “2332” of recognition candidate 2 and the character code “2334” of recognition candidate 4, both of which have dictionary 1 as their collation dictionary information, are therefore moved to the top of the collation ranking without swapping their relative order.

[0071] FIG. 12(C) shows the recognition candidates for “character 2” after correction. After correction, the first collation rank is the character code “2332” from “dictionary 1” and the second is the character code “2334” from “dictionary 1”.

[0072] Accordingly, the first-ranked character code “2332” becomes the corrected recognition result for “character 2”.
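
The correction amounts to a stable partition of the ranked candidate list, which can be sketched with the “character 2” example. The function name `correct_ranking` and the (character code, dictionary) tuple layout are assumptions. Keeping the non-matching candidates in their original relative order is also an assumption, since the text only states the first and second ranks after correction.

```python
def correct_ranking(candidates, area_representative):
    """Move candidates from the area representative dictionary to the top.

    candidates is a list of (char_code, dictionary) tuples in collation rank
    order. The relative order inside each group is preserved.
    """
    matching = [c for c in candidates if c[1] == area_representative]
    others = [c for c in candidates if c[1] != area_representative]
    return matching + others


# Recognition candidates of "character 2" before correction (FIG. 12(A), (B)).
before = [
    ("2331", "dictionary 2"),
    ("2332", "dictionary 1"),
    ("2333", "dictionary 2"),
    ("2334", "dictionary 1"),
]

after = correct_ranking(before, "dictionary 1")
recognition_result = after[0][0]  # character code of the first-ranked candidate
```

If no candidate comes from the area representative dictionary, the list is returned unchanged, so the original recognition result is kept.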

[0073] Suppose “dictionary 1” is the dictionary for handwritten characters and “dictionary 2” the dictionary for printed characters. Before correction, recognition candidate 1 for the handwritten “character 2” was the character code “2331”, which was stored in the dictionary for printed characters. A recognition result obtained with a dictionary whose type differs from that of the character pattern generally tends to have a higher misrecognition rate than one obtained with a dictionary of the same type, so adopting recognition candidate 1 of “character 2” as the recognition result makes misrecognition more likely.

[0074] After correction, by contrast, the first-ranked recognition candidate for the handwritten “character 2” is the character code “2332”, the highest-ranked of the recognition candidates stored in the dictionary for handwritten characters. Correcting the collation ranks can therefore lower the misrecognition rate compared with before correction.

[0075] (Modifications) In the character recognition method and character recognition device of these inventions, the single character area determination method and the single character area determination unit are not limited to those of the embodiment described above.

[0076] For example, whether a character area is a single character area may be determined on the basis of the dictionary types indicated by the collation dictionary information of every recognition candidate of each character image identified, from the character area information, as belonging to that character area.

[0077] Likewise, the single character area determination unit may be one that makes this determination on the basis of the dictionary types indicated by the collation dictionary information of every recognition candidate of each character image identified, from the character area information, as belonging to the character area.

[0078] More specifically, for all character images belonging to the same character area, the dictionary types indicated by the collation dictionary information of all recognition candidates, or of the candidates from the top down to a fixed rank, may be counted per type and compared; if no two or more dictionary types tie for the largest count, the character area may be judged to be a single character area.

[0079] Applying this determination method to “area 1” shown in FIGS. 11(A) and 11(B), the dictionary types indicated by the collation dictionary information of the top four recognition candidates of the character images “character 1” to “character 5” in FIG. 11(B) are counted per type and compared. “Dictionary 1” appears 10 times and “dictionary 2” also appears 10 times. Two dictionary types thus tie for the largest count, so under this method “area 1” is judged not to be a single character area.
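
This variant can be checked with a short sketch over the FIG. 11(B) data as listed in the text. The function name `is_single_area_all_candidates` and the `top_n` parameter are illustrative assumptions; `top_n=None` counts every candidate.

```python
from collections import Counter


def is_single_area_all_candidates(collation_info, top_n=None):
    """Count dictionary types over the top_n candidates of every character image.

    Returns (is_single, counts). The area is single only when exactly one
    dictionary type has the largest count.
    """
    counts = Counter()
    for ranked_dicts in collation_info.values():
        counts.update(ranked_dicts[:top_n])
    if not counts:
        return False, counts
    top = max(counts.values())
    leaders = [name for name, n in counts.items() if n == top]
    return len(leaders) == 1, counts


# Rank-ordered collation dictionary information of area 1 (FIG. 11(B)).
area_1 = {
    "character 1": ["dictionary 1", "dictionary 2", "dictionary 2", "dictionary 1"],
    "character 2": ["dictionary 2", "dictionary 1", "dictionary 2", "dictionary 1"],
    "character 3": ["dictionary 1", "dictionary 2", "dictionary 2", "dictionary 1"],
    "character 4": ["dictionary 1", "dictionary 2", "dictionary 2", "dictionary 1"],
    "character 5": ["dictionary 1", "dictionary 2", "dictionary 2", "dictionary 1"],
}

is_single, counts = is_single_area_all_candidates(area_1, top_n=4)
```

With `top_n=1` the same function counts only first-ranked candidates, which gives the same counts as the embodiment's rank-1 criterion for this example.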

[0080] The embodiment above describes only examples in which these inventions are configured under specific conditions, but many changes and modifications are possible. For example, the embodiment uses two types of dictionaries, a dictionary for handwritten characters and a dictionary for printed characters, but the dictionaries need not be limited to these two types. As dictionaries for printed characters, for instance, a dictionary for the Mincho typeface and a dictionary for the Gothic typeface may be used. Nor are the dictionary types otherwise restricted.

[0081] The method of creating the standard patterns of a dictionary need not be limited to the one used in this embodiment. For example, the feature vectors may be created with the same feature extraction method used in the feature extraction unit, or the feature vector obtained from a single character image may be used directly as a standard pattern.

[0082] Although the embodiment above shows an example of a one-line character area, the number of lines in a character area is not limited to one; the invention can also be applied to character areas containing multiple lines.

【００８３】[0083]

[Effects of the Invention] According to the character recognition method of the first invention and the character recognition device of the second invention, when a character area is judged to be a single character area in which a single type of character is written, the collation ranks for each character in that area can be corrected so that the first-ranked recognition candidate is one recognized with a dictionary of the same type as the character. As a result, the type of the character and the type of the dictionary used for the recognition result can be made to match, which reduces the misrecognition rate.

[Brief description of the drawings]

FIG. 1 is a block diagram for explaining the configuration of the character recognition device of the embodiment.

FIG. 2 is an example of a form to be recognized in the embodiment.

FIG. 3 is an example of format information in the embodiment.

FIG. 4 is an example of image data divided area by area in the embodiment.

FIG. 5 is an example of character area information in the embodiment.

FIG. 6 shows an area image and the distribution of its black pixels in the embodiment.

FIG. 7 is an example of a character string image and its character area information in the embodiment.

FIG. 8 is an example of a character string image and the distribution of its black pixels in the embodiment.

FIG. 9 is an example of character images and their character area information in the embodiment.

FIG. 10(A) is an example of recognition candidates in the embodiment, and FIG. 10(B) is an example of collation dictionary information.

FIG. 11(A) is an example of character area information in the embodiment, and FIG. 11(B) is an example of collation dictionary information.

FIGS. 12(A) to 12(C) are examples of correcting the collation ranks.

[Explanation of symbols]

10: dictionary unit
10a: dictionary for handwritten characters
10b: dictionary for printed characters
12: scanner unit
14: image storage unit
16: character extraction unit
18: feature extraction unit
20: distance calculation unit
22: single character area determination unit
24: correction unit

Claims

1. A character recognition method in which character images are cut out from an image captured from a medium on which characters are written, features of each character image are extracted, the features of the character image are collated with the features of standard character patterns for character recognition stored in each of a plurality of types of dictionaries, a plurality of standard patterns are selected for the character image as recognition candidates with collation ranks, and the recognition candidate with the highest collation rank is taken as the recognition result, the method comprising: generating character area information representing the character area in which the character image is contained; generating collation dictionary information indicating the type of the dictionary in which the standard character pattern of the recognition result or of each recognition candidate is stored; determining, based on the character area information and the collation dictionary information, whether each character area is a single character area in which the characters contained in the character area are of a single type; when the character area is the single character area, generating area representative dictionary information indicating the type of dictionary corresponding to the type of characters contained in the single character area; and, among the recognition candidates of each character image belonging to the single character area, correcting the collation ranks of matching recognition candidates, whose collation dictionary information matches the area representative dictionary information, so that they are higher than the collation ranks of the other recognition candidates, whose collation dictionary information does not match the area representative dictionary information, while maintaining the relative collation order among the matching recognition candidates.

2. The character recognition method according to claim 1, wherein, in determining whether the character area is the single character area, a representative dictionary type is determined, for each character image identified on the basis of the character area information as belonging to the character area, from among the dictionary types indicated by the collation dictionary information of the recognition candidates of that character image, and whether the character area is the single character area is determined on the basis of the representative dictionary types of the character images contained in the character area.

3. The character recognition method according to claim 1, wherein a dictionary for handwritten characters and a dictionary for printed characters are used as the dictionary.

4. A character recognition device comprising: a plurality of types of dictionaries each storing standard patterns for character recognition; a character extraction unit that extracts character areas from an image captured from a medium on which characters are written and cuts out individual character images from the character areas; a feature extraction unit that extracts features of each character image; and a collation unit that collates the features of the character image with the features of the standard character patterns, selects a plurality of standard character patterns for the character image as recognition candidates with collation ranks, and takes the recognition candidate with the highest collation rank as the recognition result, wherein the character extraction unit generates character area information representing the character area in which the character image is contained, and the collation unit generates collation dictionary information indicating the type of the dictionary in which the standard character pattern of each recognition candidate is stored, the device further comprising: a single character area determination unit that determines, based on the character area information and the collation dictionary information, whether each character area is a single character area in which the characters contained in the character area are of a single type and, when the character area is the single character area, generates area representative dictionary information indicating the type of dictionary corresponding to the type of characters contained in the single character area; and a correction unit that, among the recognition candidates of each character image belonging to the single character area, corrects the collation ranks of matching recognition candidates, whose collation dictionary information matches the area representative dictionary information, so that they are higher than the collation ranks of the other recognition candidates, whose collation dictionary information does not match the area representative dictionary information, while maintaining the relative collation order among the matching recognition candidates.

5. The character recognition device according to claim 4, wherein the single character area determination unit determines, for each character image identified on the basis of the character area information as belonging to the character area, a representative dictionary type from among the dictionary types indicated by the collation dictionary information of the recognition candidates of that character image, and determines whether the character area is the single character area on the basis of the representative dictionary types of the character images contained in the character area.

6. The character recognition device according to claim 4, wherein the dictionary includes a dictionary for handwritten characters and a dictionary for printed characters.