JP2009193170A

JP2009193170A - Character recognition device and character recognition method

Info

Publication number: JP2009193170A
Application number: JP2008031027A
Authority: JP
Inventors: Hiroaki Ikeda; 裕章池田; Hideaki Matsumoto; 英明松本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2008-02-12
Filing date: 2008-02-12
Publication date: 2009-08-27

Abstract

PROBLEM TO BE SOLVED: To improve character recognition accuracy when the aspect ratio of characters fluctuates or differs by character type, or when characters are written closer together or progressively smaller.

SOLUTION: A row rectangle containing characters is extracted from the image to be recognized (S201). Pixel blocks are segmented within the row rectangle (S202). Among the segmented pixel blocks, those arranged one above another and those that are isolated are combined, which determines the pixel blocks that serve as the units of a character cutting pattern (S203, S204). Based on the sizes of the determined pixel blocks, an estimated character size for the pixel block of interest is obtained from a local average (S205). Pixel blocks are combined so that the result has a size close to the obtained estimated character size (S207), and the permissible range is then widened to create candidate character cutting patterns (S208). Character recognition is performed on each candidate, and the recognition result with the best evaluation is selected and output (S209).

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a character recognition device and a character recognition method that read and recognize characters written on, for example, a paper document.

In character recognition processing, an image recorded on a paper document is read by a scanner or the like, and characters are automatically recognized from the read image. The processing first segments the pixel blocks (also called pixel clusters) contained in the region to be recognized. Each segmented pixel block, either as it is or combined with other pixel blocks, is cut out as one individual character pattern. Its features are extracted and matched against a recognition dictionary that stores standard pattern information for the characters to be recognized. Based on the matching result, such as which features agree, the most probable character is selected and the character code of the character pattern is determined. The character cutting process, which segments the image to be recognized into pixel blocks and builds the individual character patterns from them, has conventionally been performed by the following procedures.

In the first procedure, a plurality of pixel blocks separated by blank space are cut out from a rectangular region containing the image to be recognized. The largest value among the height of the row rectangle and the widths of the cut-out pixel blocks is taken as a reference value. To combine pixel blocks into individual character patterns, one or more pixel blocks whose total width is within (reference value + α) are treated as one character pattern to be recognized (see, for example, Patent Document 1).

In the second procedure, the way of cutting characters (referred to here as a character cutting pattern) is not fixed before recognition; instead, several character cutting patterns are created. These are the set of patterns obtained by switching, for each pair of adjacent pixel blocks, whether or not the pair is combined. For each character cutting pattern, a character evaluation value is calculated from two quantities: a character shape evaluation value obtained from the shape of each character pattern cut out according to that pattern, and a character recognition evaluation value obtained by matching each candidate character pattern against the recognition dictionary. The character cutting pattern with the largest character evaluation value is adopted for character recognition (see, for example, Patent Document 2).

Patent documents cited: JP 62-10784 A; JP 3-191491 A

However, with the first procedure described above, when a handwritten address is recognized, for example, the aspect ratio of characters differs from writer to writer and also between character types such as kanji and numerals. As a result, in the municipality and town-name portion a single character may be cut out and recognized as separate left-hand and right-hand components (hen and tsukuri), while in the chome and lot-number portion several characters may be cut out and recognized as a single character.

Furthermore, as the remaining space in an entry field such as an address field runs out, users tend to write characters progressively closer together or progressively smaller. In the crowded or small portion, several characters that should be separate are then cut out and recognized as one combined character.

With the second procedure described above, creating a set of character cutting patterns that covers every combination of the segmented pixel blocks makes the number of character patterns to be evaluated enormous. In particular, the second procedure must evaluate the character shape evaluation value and the character recognition evaluation value of every character pattern in every character cutting pattern and derive a character evaluation value from them, so the amount of computation becomes enormous.

The present invention has been made in view of the conventional examples above, and its object is to solve the problems described above and improve character recognition accuracy.

To solve the problems described above, the present invention has the following configuration.

A character recognition device comprises:
extraction means for extracting a region containing a character string from image data composed of pixels;
segmenting means for segmenting the region into pixel blocks that serve as units of combination;
estimation means for sequentially focusing on each pixel block segmented by the segmenting means and calculating, as the estimated character size of the pixel block of interest, the average width of predetermined pixel blocks belonging to a local region, within the region, that contains the pixel block of interest;
character cutting means for creating a character cutting pattern in which the segmentation made by the segmenting means is changed so that, when mutually adjacent pixel blocks are combined and the width of the combined pixel block is within a first allowable range of the estimated character size, the combined pixel block becomes a single pixel block; and
recognition means for performing character recognition on the pixel blocks segmented according to the character cutting pattern.

Alternatively, a character recognition device comprises:
extraction means for extracting a region containing a character string from image data composed of pixels;
segmenting means for segmenting the region into pixel blocks that serve as units of combination;
estimation means for approximating the relationship between the position and the width of the pixel blocks segmented by the segmenting means with a quadratic function, and calculating, as the estimated character size of each pixel block, the width that the quadratic function gives for the position of that pixel block;
character cutting means for creating a character cutting pattern in which the segmentation made by the segmenting means is changed so that, when mutually adjacent pixel blocks are combined and the width of the combined pixel block is within a first allowable range of the estimated character size, the combined pixel block becomes a single pixel block; and
recognition means for performing character recognition on the pixel blocks segmented according to the character cutting pattern.
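The quadratic-function estimate in the second configuration can be illustrated with an ordinary least-squares fit. The sketch below is only one way to do it: the function names `fit_quadratic` and `estimate_sizes_quadratic`, the use of least squares, and the choice of block position as the x value are assumptions for illustration, since this section does not specify the fitting method.

```python
from typing import List, Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*x^2 + b*x + c (needs at least 3 distinct x values)."""
    # Sums of powers of x and of y * x^k that appear in the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns ordered as (c, b, a).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def estimate_sizes_quadratic(positions: Sequence[float], widths: Sequence[float]) -> List[float]:
    """Estimated character size of each block = fitted width at the block's position."""
    a, b, c = fit_quadratic(positions, widths)
    return [a * x * x + b * x + c for x in positions]
```

Because the fit smooths over the whole row, a block that is much narrower than the fitted curve at its position stands out as a likely fragment of a character.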

According to the present invention, character recognition accuracy is improved. For example, even when the aspect ratio of characters fluctuates or differs by character type, or when characters are written closer together or progressively smaller, the invention reduces the likelihood of misrecognizing a single character as several separate characters or of misrecognizing several characters as one combined character. In addition, combining pixel blocks before creating the candidate character cutting patterns reduces the amount of computation.

[First Embodiment]
<Configuration of character recognition device>
FIG. 1 is a block diagram showing the configuration of a character recognition device for carrying out the present invention. Reference numeral 101 denotes a CPU that controls the entire device according to a control program stored in a ROM 102. The ROM 102 stores the control programs of the device, including the processing shown in the flowcharts described later, which the CPU 101 executes. Reference numeral 103 denotes a RAM that stores form images and the like, and 104 denotes an external storage device such as a magnetic disk. Reference numeral 105 denotes a display, 106 a keyboard, and 107 a pointing device such as a mouse. Reference numeral 108 denotes an image scanner that reads form images and the like.

The present invention can also be implemented on a general-purpose computer. In that case, a computer program provided on a computer-readable storage medium or the like may be stored in the external storage device 104 and executed by the CPU 101 in response to an operator instruction or the like. Reference numeral 109 denotes a network interface, which communicates with a remote device (not shown) to read and write programs, data, and the like. When a general-purpose computer is used, the image scanner 108 is an image scanner provided as a peripheral device.

<Character recognition processing>
Next, the character recognition processing executed in this embodiment is described with reference to the flowchart of FIG. 2. The description assumes a horizontally written document, but vertically written text can be handled in the same way after rotating the direction by 90 degrees. The procedure of FIG. 2 is realized by a program executed by the character recognition device of FIG. 1.

First, in step S201, rows are extracted from the image data, composed of pixels, that was read by the image scanner 108. The image data may instead be image data stored in the external storage device 104 or image data received from another image input device via the network interface 109. Row extraction can be done by taking a projection in the row direction and forming a row rectangle whose height is the extent over which the projection is non-empty. For example, a density histogram is computed in the row direction, and each region whose height spans from one blank to the next is extracted as a row. In other words, this processing extracts a region containing a character string from image data composed of pixels.
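The projection-based row extraction of step S201 can be sketched as follows. This is a minimal illustration, assuming a binary image stored as a list of rows with 1 for ink and 0 for background; the function name `extract_rows` and the return format are hypothetical and not taken from the patent.

```python
from typing import List, Tuple


def extract_rows(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (top, bottom) scanline indices, inclusive, of each text row.

    The projection is the ink count of every scanline; a row is a maximal
    run of scanlines whose projection is non-zero.
    """
    profile = [sum(scanline) for scanline in image]
    rows: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a row begins after a blank
        elif count == 0 and start is not None:
            rows.append((start, y - 1))    # a blank ends the current row
            start = None
    if start is not None:                  # the row touches the bottom edge
        rows.append((start, len(profile) - 1))
    return rows
```

A real scan would normally need noise filtering and skew correction before the projection is reliable; those steps are outside what this section describes.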

Next, in step S202, groups of connected pixels (pixel blocks) are extracted from the extracted row rectangle. To obtain a pixel block, the extracted region (also called the row rectangle) is scanned in a suitable order until a pixel forming part of a character is found; its contour is then traced, and when the trace returns to the starting pixel the traced shape is taken as one pixel block. The contour trace is recorded, for example, as a sequence of the coordinates of the traced pixels. The position and size of the rectangle that circumscribes the pixel block, with sides parallel to those of the row rectangle, are then calculated. The position of the rectangle is given by a predetermined point such as its upper-left corner, and its size by, for example, the lengths of its vertical and horizontal sides. Once its circumscribed rectangle has been determined, a pixel block is represented by that rectangle. That is, after the pixel blocks have been extracted in step S202, a "pixel block" refers not to the set of pixels but to its circumscribed rectangle. The output of the pixel block extraction in this embodiment is the position and size of each circumscribed rectangle. This is a processing convenience; the sets of pixels themselves can of course be stored and processed instead. In step S202, for example, the row rectangle is copied into memory, a pixel block is extracted, and the extracted pixel block is deleted from the copy. Extraction is then repeated on what remains until all pixel blocks in the row rectangle have been extracted.
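The output of step S202, one circumscribed rectangle per pixel block, can be sketched as below. Note that this sketch swaps in a breadth-first flood fill with 8-connectivity instead of the contour tracing described in the text, because it is shorter to show; the `Box` layout `(left, top, width, height)` and the function name are assumptions.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def extract_pixel_blocks(image: List[List[int]]) -> List[Box]:
    """Return the bounding box of every 8-connected group of ink pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] == 0 or seen[y0][x0]:
                continue
            # Flood-fill one block while tracking its extent.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            left = right = x0
            top = bottom = y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        inside = 0 <= nx < width and 0 <= ny < height
                        if inside and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return sorted(boxes)  # ordered by left edge, i.e. along the row
```

The `seen` grid plays the role of deleting extracted blocks from the copied row rectangle, so each pixel is visited once.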

In step S203, for horizontal writing, pixel blocks located one above the other are combined. Combining pixel blocks means, for example, creating the smallest rectangle that contains all the circumscribed rectangles of the pixel blocks being combined and using it as the new circumscribed rectangle. The circumscribed rectangles of the original pixel blocks are no longer needed and are discarded. As a result, in a character whose radical is a crown (kanmuri), for example, the pixel blocks are joined into an individual character pattern of one character. When a left-hand component (hen) and a right-hand component (tsukuri) lie beneath the crown, a single rectangle circumscribing all three pixel blocks is obtained as the combined pixel block. Even so, a small pixel block that is part of a character may remain isolated without being combined. Note that "located one above the other" can also be rephrased as overlapping each other with respect to the direction orthogonal to the direction of the character string.
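The vertical combination of step S203 amounts to merging rectangles whose horizontal extents overlap. The sketch below reads "located one above the other" that way for horizontal writing, which is an interpretation; the names `union` and `merge_vertically_stacked` and the `(left, top, width, height)` box layout are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def union(a: Box, b: Box) -> Box:
    """Smallest rectangle containing both boxes."""
    left = min(a[0], b[0])
    top = min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (left, top, right - left, bottom - top)


def merge_vertically_stacked(boxes: List[Box]) -> List[Box]:
    """Merge boxes whose horizontal extents overlap (horizontal writing)."""
    merged: List[Box] = []
    for box in sorted(boxes):  # sorted by left edge
        # The box overlaps the current group if it starts before the group ends.
        if merged and box[0] < merged[-1][0] + merged[-1][2]:
            merged[-1] = union(merged[-1], box)
        else:
            merged.append(box)
    return merged
```

Any overlap at all triggers a merge here; a practical system might require a minimum overlap ratio so that slanted handwriting does not chain neighboring characters together.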

Therefore, in step S204, an isolated pixel block is combined with a pixel block that lies close to it. The condition for combining is, for example, that the pixel blocks are adjacent and that the distance between them is within a predetermined distance. This distance is not an absolute value; it is, for example, no more than a predetermined fraction (for example, one quarter) of the length of the short side of the smaller circumscribed rectangle. Additional conditions can be imposed, such as requiring the ratio between the vertical sides or between the horizontal sides of the two blocks to be at least a certain value (for example, 4) or at most a certain value (for example, one quarter). These are of course only examples. To exclude punctuation marks, the punctuation marks ("、" and "。") may first be recognized from the position and shape of the pixel blocks, their positions stored, and the corresponding pixel blocks deleted from the row rectangle.
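The example distance condition of step S204 can be written as a small predicate. In this sketch the "smaller" rectangle is chosen by area and only the horizontal gap is measured; both choices, and the names `horizontal_gap` and `should_merge_isolated`, are assumptions, and the optional side-ratio condition is left out.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def horizontal_gap(a: Box, b: Box) -> int:
    """Blank distance between two boxes along the row (0 if they overlap)."""
    first, second = (a, b) if a[0] <= b[0] else (b, a)
    return max(0, second[0] - (first[0] + first[2]))


def should_merge_isolated(a: Box, b: Box, fraction: float = 0.25) -> bool:
    """True if the gap is at most `fraction` of the smaller box's short side."""
    smaller = min(a, b, key=lambda box: box[2] * box[3])  # smaller by area
    short_side = min(smaller[2], smaller[3])
    return horizontal_gap(a, b) <= short_side * fraction
```

For instance, an 8 x 8 fragment one pixel to the right of a 20 x 20 block satisfies the condition (gap 1, threshold 2), whereas the same fragment five pixels away does not.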

In this way, steps S202 to S204 combine the pixel blocks that should be combined and thereby segment the row into the pixel blocks that serve as the units combined in the subsequent steps. That is, the region containing the character string to be recognized is segmented into pixel blocks that are the units combined for character recognition. In steps S203 and S204, whether a given pixel block is combined is decided solely from the relationship between that pixel block and its neighboring pixel blocks. The combination of pixel blocks in steps S206 and S208 described later, by contrast, is decided using the estimated character size, a value obtained by statistical processing. In this respect steps S203 and S204 differ from steps S206 and S208.

Alternatively, a histogram of the row rectangle may be taken in the direction orthogonal to the direction in which the characters are arranged, and the characters cut at the blank positions. Pixel blocks that should be combined vertically are then treated as a single pixel block from the start, so the processing of step S203 can be omitted.

For example, from the row rectangle containing the character "京" in FIG. 3, four pixel blocks 301 to 304 are extracted in step S202, and in step S203 the vertically overlapping pixel blocks 301, 302, and 303 are combined into one pixel block 305. Next, step S204 examines the isolated pixel block 304 and the combined pixel block 305. Because the two are close together and one is sufficiently smaller than the other, the small pixel block is regarded as a detached part of the large one and the two are combined. The individual character pattern 306 is finally obtained.

The pixel blocks that remain after the vertical combination of step S203 and the isolated-block combination of step S204 are the units from which character cutting patterns are built. First, in step S205, these pixel blocks are used to obtain an estimated character size that depends on the position in the row direction. The character size is the character width for horizontal writing and the character height for vertical writing. The estimated character size of the pixel block of interest is, for example, a local average: the average of the widths of the pixel block two positions before it and the pixel block two positions after it. Cases in which the pixel block two positions before or after does not exist are defined separately. For example, the estimated character size for the second pixel block from the start is the average of the first and fourth blocks from the start, and the estimated character size for the first pixel block is set to the same value as that of the second. Likewise, the estimated character size for the second pixel block from the end is the average of the first and fourth blocks from the end, and the last block is handled in the same way. An estimated character size is thus determined for every extracted pixel block.

More generally, the estimated character size need not be the average of the two pixel blocks described above. A local region of fixed size within the row rectangle is defined so that it contains the pixel block of interest, and the average calculated from the sizes of pixel blocks in that local region is used as the estimated character size. In the example above, the local region is the region containing the two pixel blocks before and the two pixel blocks after the pixel block of interest, and the estimated character size is the average of the sizes of the first and last pixel blocks in that region. That is, in the row rectangle, each pixel block segmented by the procedure up to step S204 is taken in turn as the pixel block of interest, the average width of predetermined pixel blocks belonging to the local region containing it is calculated, and the result is taken as its estimated character size. The estimated character size is a value for estimating the size of the pixel block of interest from the sizes of its neighbors rather than from the block itself. For example, when the size of the pixel block of interest is far smaller than its estimated character size, the block is likely not an independent character.
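The local-average rule of step S205, including the boundary cases given in the text, can be sketched as follows. The function name `estimate_char_sizes` is hypothetical, and the fallback for rows with fewer than four blocks (plain mean of all widths) is an assumption, since the text does not cover that case.

```python
from typing import List, Sequence


def estimate_char_sizes(widths: Sequence[float]) -> List[float]:
    """Estimated character size of each block from its neighbors' widths."""
    n = len(widths)
    if n < 4:
        # Assumption: too few blocks for the rule, so use the overall mean.
        mean = sum(widths) / n if n else 0.0
        return [mean] * n
    est = [0.0] * n
    # Interior blocks: average of the blocks two before and two after.
    for i in range(2, n - 2):
        est[i] = (widths[i - 2] + widths[i + 2]) / 2
    # Second from the start: average of the 1st and 4th blocks from the start.
    est[1] = (widths[0] + widths[3]) / 2
    est[0] = est[1]                      # first block copies the second
    # Second from the end: average of the 1st and 4th blocks from the end.
    est[n - 2] = (widths[n - 1] + widths[n - 4]) / 2
    est[n - 1] = est[n - 2]              # last block copies its neighbor
    return est
```

With widths `[10, 12, 4, 5, 11, 10]`, the two narrow blocks in the middle receive estimates of 10.5 and 11.0, far above their own widths, which is the signal that they are probably fragments of one character.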

FIG. 4 shows an example of an input character string, and FIG. 5 is a graph with the position of each pixel block (in the row direction) on the X axis and, on the Y axis, the size of each pixel block (thick line) and its estimated character size (local average, thin line). For example, the character "都" is segmented in FIG. 4 into two pixel blocks, "者" 401 and the ōzato radical (おおざと) 402. FIG. 5 shows that the actual sizes (widths) of both are extremely small compared with the widths of the surrounding pixel blocks (coordinate positions 501 and 502). Their estimated character sizes, however, are if anything larger than those of the neighboring pixel blocks (coordinate positions 511 and 512). This indicates a high likelihood that the pixel blocks "者" and "おおざと" combine to form a single character, and that is in fact the case.

Once the estimated character sizes have been obtained, step S206 combines pixel blocks that are adjacent in the direction of the character string (left and right for horizontal writing), using the estimated character size obtained in step S205. Combining is conditional: two pixel blocks are combined if the size (width) of the combined pixel block is within a first allowable range relative to the estimated character size of one of the two blocks. The reference estimated character size may be that of a predetermined one of the two pixel blocks, or it may be the average of the estimated character sizes of the pixel blocks being combined. For example, the allowable range of the character width is set to 0.9 to 1.1 times the estimated character size, and two pixel blocks are combined if the character width after combining falls within that range. Combining here means the same as in step S203. Because the segmentation into pixel blocks has already been determined, step S206 can also be described as processing that creates a character cutting pattern by changing the existing segmentation.

In step S206, pixel blocks are not combined unless the condition is satisfied, so combining does not necessarily occur. If a left-right combination occurred in step S206, step S207 returns the processing to the estimated character size calculation of step S205, using the information on the combined pixel blocks. For example, the positions and sizes of the circumscribed rectangles of the pixel blocks before combining are replaced with the position and size of the circumscribed rectangle of the combined pixel block, and the processing is repeated from the calculation of the estimated character size. If no left-right combination occurred, the processing proceeds to step S208.
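The loop formed by steps S205 to S207 can be sketched in one dimension, with each block reduced to `(left, width)`. This is a simplified illustration under stated assumptions: it merges at most one pair per pass before re-estimating, uses the left block's estimate as the reference, and falls back to the overall mean for fewer than four blocks. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = Tuple[float, float]  # (left, width) along the row


def local_estimates(widths: Sequence[float]) -> List[float]:
    """Local-average estimate (blocks two before and two after), as in S205."""
    n = len(widths)
    if n < 4:  # assumption: fall back to the overall mean
        mean = sum(widths) / n if n else 0.0
        return [mean] * n
    est = [(widths[i - 2] + widths[i + 2]) / 2 if 2 <= i < n - 2 else 0.0
           for i in range(n)]
    est[1] = (widths[0] + widths[3]) / 2
    est[0] = est[1]
    est[n - 2] = (widths[n - 1] + widths[n - 4]) / 2
    est[n - 1] = est[n - 2]
    return est


def merge_pass(blocks: List[Block], low: float = 0.9,
               high: float = 1.1) -> Tuple[List[Block], bool]:
    """S206: merge the first adjacent pair whose combined width fits the range."""
    est = local_estimates([w for _, w in blocks])
    for i in range(len(blocks) - 1):
        left = blocks[i][0]
        combined = blocks[i + 1][0] + blocks[i + 1][1] - left
        if low * est[i] <= combined <= high * est[i]:
            return blocks[:i] + [(left, combined)] + blocks[i + 2:], True
    return blocks, False


def merge_until_stable(blocks: List[Block]) -> List[Block]:
    """S207: repeat estimation and merging until no merge occurs."""
    merged = True
    while merged:
        blocks, merged = merge_pass(blocks)
    return blocks
```

Each successful pass removes one block, so the loop always terminates. Re-estimating after every merge means a repaired character immediately contributes its full width to its neighbors' estimates.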

In step S208, a lattice structure is generated by combining the pixel blocks obtained in the steps so far. That is, starting from the character cutting pattern that represents the segmentation obtained up to step S207, the segmentation is changed so that mutually adjacent pixel blocks are combined, and new character cutting patterns are generated. These do not replace the original character cutting pattern but are generated in addition to it. The character cutting patterns created here are the candidates from which the final character cutting pattern is chosen. Here too, pixel blocks are combined only under a certain condition: they are combined if the size (width) of the combined pixel block is within a second allowable range relative to the estimated character size of one of the blocks being combined. The second allowable range is wider than the first. For example, the larger of 1.2 times the estimated character size and 1.2 times the character row height is used as the second allowable range. When the combined width of the pixel blocks is within the allowable range (or threshold), the character cutting pattern after combining is added to the candidates (that is, to the lattice structure). In step S208, all candidate character cutting patterns that satisfy the condition, one or more, are created.
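One way to hold the candidates of step S208 is a lattice of edges, where an edge from `i` to `j` means that blocks `i` to `j - 1` may form one character. The sketch below is an assumption-laden illustration: blocks are `(left, width)` pairs, the reference estimate is that of the first block in a group, and the name `build_lattice` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Block = Tuple[float, float]  # (left, width) along the row


def build_lattice(blocks: Sequence[Block], estimates: Sequence[float],
                  line_height: float, factor: float = 1.2) -> Dict[int, List[int]]:
    """edges[i] lists every j such that blocks[i:j] is one candidate character."""
    n = len(blocks)
    # Each block on its own is always a candidate (the original pattern).
    edges: Dict[int, List[int]] = {i: [i + 1] for i in range(n)}
    for i in range(n):
        # Second allowable range: larger of 1.2 x estimate and 1.2 x row height.
        limit = factor * max(estimates[i], line_height)
        for j in range(i + 2, n + 1):
            width = blocks[j - 1][0] + blocks[j - 1][1] - blocks[i][0]
            if width > limit:
                break  # adding more blocks only makes the group wider
            edges[i].append(j)
    return edges
```

Every path through the lattice from block 0 to block `n` corresponds to one candidate character cutting pattern, so the candidates never have to be enumerated explicitly.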

In step S209, character recognition is performed on the pixel blocks segmented according to each of the character cutting patterns created in step S208. For each character cutting pattern, the sum of the evaluation values obtained as the recognition results of its characters is calculated. This sum is obtained for every candidate character cutting pattern, the sums are compared, and the character cutting pattern with the largest sum (that is, the most probable one) is selected. The character string recognized according to the selected character cutting pattern is output as the recognition result. The character recognition itself may use a known method; for example, the features registered in a recognition dictionary are compared with the features extracted from the pixel block to be recognized, and the character with the most matching points is output as the recognition result. The more features match, the higher the evaluation value.
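The selection in step S209 can be sketched as a search for the highest-scoring path through a lattice of candidates. The text describes summing the evaluation values of each candidate pattern and comparing the sums; the sketch below reaches the same maximum with dynamic programming, which is a substitution made here for brevity. The lattice format, the `recognize(i, j)` callback returning `(character, score)`, and the function name are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def best_segmentation(edges: Dict[int, List[int]], n: int,
                      recognize: Callable[[int, int], Tuple[str, int]]) -> Tuple[int, str]:
    """Return (total score, text) of the best path from block 0 to block n.

    edges[i] lists every j such that blocks i..j-1 may form one character.
    """
    best: Dict[int, Tuple[int, str]] = {n: (0, "")}
    for i in range(n - 1, -1, -1):
        options = []
        for j in edges[i]:
            char, score = recognize(i, j)
            tail_score, tail_text = best[j]
            options.append((score + tail_score, char + tail_text))
        best[i] = max(options)  # highest total evaluation value wins
    return best[0]


# Usage with a stub recognizer (scores are made up for illustration).
table = {(0, 1): ("東", 90), (1, 2): ("京", 90), (2, 3): ("者", 30),
         (3, 4): ("阝", 20), (2, 4): ("都", 95)}
lattice = {0: [1], 1: [2], 2: [3, 4], 3: [4]}
result = best_segmentation(lattice, 4, lambda i, j: table[(i, j)])
```

Because the totals are plain sums, a pattern with more characters accumulates more terms; a practical scorer might normalize for that, but the text specifies only the sum.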

Through the above procedure, character recognition is performed for each of the candidate character cutting patterns, the results are evaluated, and the character string recognized according to the character cutting pattern judged most appropriate is output. Because the candidates are limited to character cutting patterns judged statistically likely, the amount of processing is also considerably smaller than in the conventional example.

As described above, this embodiment has the effect that, even when a character is split into left and right parts, the estimated character size can be obtained from the local average with little error. Because a local average is used rather than the average over the entire character string, the method also copes more accurately with cases in which the user gradually writes characters closer together or smaller. In addition, by repeating the above processing, individual character patterns are built up gradually even when left-right split characters occur consecutively, so that eventually all individual character patterns can be extracted.

[Modification]
In this embodiment, step S207 is followed by step S208, but the process may instead proceed directly to step S209. In that case multiple character cut pattern candidates are not created, so no recognition result is selected from among candidates, but the processing load is greatly reduced. Moreover, because step S207 has already determined a statistically likely character cut pattern, a relatively good recognition result can still be expected.

[Second Embodiment]
Next, the character recognition process of a second embodiment, which differs from the embodiment above and is executed by the character recognition device of FIG. 1, is described with reference to the flowchart of FIG. 6. This embodiment replaces the procedure of FIG. 2 with that of FIG. 6, so description of the overlapping parts is omitted.

<Character recognition processing>
The character recognition process executed in this embodiment is described with reference to the flowchart of FIG. 6. The description assumes horizontally written text, but vertically written text can be handled in the same way after rotating the direction by 90 degrees. The procedure of FIG. 6 is implemented by a program executed by the character recognition device of FIG. 1.

First, in step S601, line extraction is performed on image data composed of pixels that was read by the image scanner 108. The image data may instead be image data stored in the external storage device 104, or image data received from another image input device via the network interface 109. Line extraction takes a projection in the line direction and forms a line rectangle whose height is the extent over which the projection is present. For example, a density histogram is computed along the line direction, and each region bounded by blank space on both sides is extracted as a line. In other words, this processing extracts a region containing a character string from image data composed of pixels.
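The projection-based line extraction of step S601 might look like the following minimal sketch. It assumes a binarized image given as a list of pixel rows in which nonzero means a character pixel; the function name and the representation are illustrative assumptions, not part of the patent.

```python
from typing import List, Sequence, Tuple


def extract_lines(image: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return (top, bottom) pairs, bottom exclusive, one per text line.

    `image` is a binary raster of pixel rows; nonzero marks a character pixel.
    """
    # Projection onto the vertical axis: count of character pixels per pixel row.
    profile = [sum(1 for px in row if px) for row in image]

    lines, start = [], None
    for y, count in enumerate(profile):
        if count and start is None:
            start = y                      # blank -> text transition
        elif not count and start is not None:
            lines.append((start, y))       # text -> blank transition
            start = None
    if start is not None:                  # text runs to the bottom edge
        lines.append((start, len(profile)))
    return lines
```

Each returned pair gives the height of one line rectangle; the horizontal extent can be taken as the full image width or trimmed with the same projection applied column-wise.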

Next, in step S602, clusters of pixels (pixel blocks) are extracted from the extracted line rectangle. For example, the extracted region (also called the line rectangle) is scanned to find a pixel that forms part of a character, its contour is traced, and when the trace returns to the starting pixel the traced cluster is taken as one pixel block. The traced contour is recorded, for example, as a sequence of the coordinates of the traced pixels. The position and size of the rectangle that circumscribes the pixel block, with each side parallel to the sides of the line rectangle, are then calculated. The position of the rectangle is indicated by a predetermined point such as its upper-left corner, and its size by, for example, the lengths of its vertical and horizontal sides. Once the circumscribed rectangle has been determined, the pixel block is represented by that rectangle. That is, after the pixel blocks have been extracted in step S602, "pixel block" refers not to the collection of pixels but to its circumscribed rectangle. The output of the pixel block extraction in this embodiment is the position and size of each circumscribed rectangle. This is a processing convenience; the collection of pixels itself can of course be stored and processed instead. In step S602, for example, the line rectangle is copied into memory, a pixel block is extracted, and the extracted pixel block is deleted from the copy. Extraction is then repeated on what remains until all pixel blocks in the line rectangle have been extracted.
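As a small illustration of the output of step S602, the sketch below derives a circumscribed rectangle from a recorded contour. The `Rect` layout (upper-left corner plus width and height) follows the description above, but the type and function names are assumptions made for this example, and contour tracing itself is not shown.

```python
from typing import NamedTuple, Sequence, Tuple


class Rect(NamedTuple):
    x: int  # left edge (upper-left corner)
    y: int  # top edge (upper-left corner)
    w: int  # width in pixels
    h: int  # height in pixels


def bounding_rect(contour: Sequence[Tuple[int, int]]) -> Rect:
    """Axis-aligned rectangle circumscribing a traced contour of (x, y) pixels."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    left, top = min(xs), min(ys)
    # +1 because both end pixels belong to the block.
    return Rect(left, top, max(xs) - left + 1, max(ys) - top + 1)
```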

In step S603, for horizontal writing, pixel blocks located one above the other are combined. Combining pixel blocks means, for example, creating the smallest rectangle that encloses all the circumscribed rectangles of the blocks being combined and using it as the new circumscribed rectangle. The circumscribed rectangles of the original blocks are no longer needed and are discarded. As a result, for a character that has a crown radical, for example, the pixel blocks are combined into an individual character pattern of one character. If a left-hand radical and a right-hand component lie below the crown, a single rectangle circumscribing all three pixel blocks is obtained as the combined pixel block. However, a small pixel block that forms part of a character may be isolated and remain uncombined.
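The vertical combination of step S603 could be sketched as below. The example assumes that "one above the other" means the horizontal extents of two rectangles overlap, and it merges on any overlap at all; both are simplifying assumptions, as are the `Rect` type and the function names.

```python
from typing import List, NamedTuple, Sequence


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1 = max(a.x + a.w, b.x + b.w)
    y1 = max(a.y + a.h, b.y + b.h)
    return Rect(x0, y0, x1 - x0, y1 - y0)


def overlaps_horizontally(a: Rect, b: Rect) -> bool:
    """True if the x-extents of a and b share at least one pixel column."""
    return a.x < b.x + b.w and b.x < a.x + a.w


def merge_stacked(rects: Sequence[Rect]) -> List[Rect]:
    """Combine blocks whose x-extents overlap (horizontal writing)."""
    merged: List[Rect] = []
    for r in sorted(rects, key=lambda r: r.x):
        if merged and overlaps_horizontally(merged[-1], r):
            merged[-1] = union(merged[-1], r)   # old rectangles are discarded
        else:
            merged.append(r)
    return merged
```

Because the rectangles are processed in order of their left edge, only the most recently merged group needs to be checked against each new rectangle.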

Therefore, in step S604, an isolated pixel block is combined with a pixel block lying close to it. The condition for combining is, for example, that the pixel blocks are adjacent and that the distance between them is within a predetermined distance. This distance is not an absolute value; it is, for example, no more than a predetermined fraction (for example, one quarter) of the length of the short side of the smaller circumscribed rectangle. An additional condition can be imposed, such as the ratio between the vertical sides or between the horizontal sides of the two rectangles being at least a certain value (for example, 4) or at most a certain value (for example, one quarter). This is of course only one example. To exclude punctuation marks, the punctuation marks ("、" and "。") may first be recognized from the position and shape of the pixel blocks, their positions saved, and the corresponding pixel blocks then deleted from the line rectangle.
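A minimal sketch of the distance condition in step S604 follows. It assumes the distance is the horizontal gap between the two rectangles and that "smaller" means smaller in area; neither detail is fixed by the text, and the names are hypothetical. The optional side-ratio condition is omitted.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def horizontal_gap(a: Rect, b: Rect) -> int:
    """Blank columns between a and b; 0 if their x-extents touch or overlap."""
    return max(0, max(a.x, b.x) - min(a.x + a.w, b.x + b.w))


def should_merge_isolated(a: Rect, b: Rect, fraction: float = 0.25) -> bool:
    """True if the gap is at most `fraction` of the smaller block's short side."""
    smaller = min(a, b, key=lambda r: r.w * r.h)   # assumption: smaller by area
    limit = min(smaller.w, smaller.h) * fraction
    return horizontal_gap(a, b) <= limit
```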

In this way, in steps S602 to S604, pixel blocks that should be combined are combined, and the region is segmented into the pixel blocks that serve as the units of combination in the subsequent steps.

The pixel blocks that result from the vertical combination of step S603 and the isolated-block combination of step S604 are the units from which character cut patterns are assembled. Using these pixel blocks, step S605 obtains an estimated character size that depends on the position in the line direction. The estimated character size in step S605 is given by an approximating quadratic function that takes the position of a pixel block in the line direction as its parameter and returns the corresponding width. This quadratic function is determined, for example, so as to minimize the sum of the squared distances from the points representing the actual position and width of each pixel block. That is, determining the three coefficients by the least-squares method fixes the approximating quadratic function uniquely.
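The quadratic fit of step S605 can be illustrated with the sketch below, which solves the 3x3 normal equations for w = a·x² + b·x + c in plain Python. It uses ordinary least squares on the width residuals, which is one reading of "sum of squared distances"; the function names are assumptions, and at least three distinct positions are required.

```python
from typing import Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ws: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares coefficients (a, b, c) of w = a*x**2 + b*x + c."""
    # Power sums that make up the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(w * x ** k for x, w in zip(xs, ws)) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def estimated_size(coeffs: Tuple[float, float, float], x: float) -> float:
    """Estimated character width at position x along the line."""
    a, b, c = coeffs
    return a * x * x + b * x + c
```

Here `xs` would hold the block positions along the line (for example, the centres of the circumscribed rectangles) and `ws` their widths.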

The input character string is the one shown in FIG. 4, as in the first embodiment. FIG. 7 is a graph obtained by taking the line direction as the X axis and approximating the pixel block sizes with a quadratic function. In FIG. 7 as well, the estimated character sizes of the two pixel blocks "者" 401 and the right-hand radical "阝" (ōzato) 402 are on a par with those of the neighbouring pixel blocks. This indicates that the pixel blocks "者" and "阝" are likely to combine into a single character.

Once the estimated character size has been obtained in this way, step S606 combines pixel blocks that neighbour each other in the direction of the character string (left and right for horizontal writing), using the estimated character size obtained in step S605. Combination is conditional: the blocks are combined if the size (width) of the combined pixel block falls within a first allowable range defined relative to the estimated character size of one of the blocks being combined. The reference estimated character size may be that of a predetermined one of the blocks, or it may be the average of the estimated character sizes of the blocks being combined. For example, with an allowable character width of 0.9 to 1.1 times the estimated character size, two pixel blocks are combined if their combined width falls within that range. Because the segmentation into pixel blocks has already been determined, step S606 can also be described as a process that creates a character cut pattern by changing the existing segmentation.
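One way to picture the combination rule of step S606 is the greedy left-to-right pass below. The single-pass strategy, the use of block centres as positions, and the names `Span` and `merge_by_estimated_size` are assumptions for illustration; the reference size is the average of the two blocks' estimates, which is one of the options the text mentions.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[float, float]  # (left, right) extent of a block along the line


def merge_by_estimated_size(
    spans: Sequence[Span],
    estimate: Callable[[float], float],
    low: float = 0.9,
    high: float = 1.1,
) -> List[Span]:
    """Combine neighbours whose merged width fits the first allowable range."""
    result: List[Span] = []
    for span in sorted(spans):
        if result:
            prev = result[-1]
            left, right = prev[0], max(prev[1], span[1])
            # Reference size: average of the estimates at the two block centres.
            reference = (
                estimate((prev[0] + prev[1]) / 2) + estimate((span[0] + span[1]) / 2)
            ) / 2
            if low * reference <= right - left <= high * reference:
                result[-1] = (left, right)   # merged block replaces the previous one
                continue
        result.append(span)
    return result
```

The `estimate` callback could be the fitted quadratic of step S605; a constant function is enough to try the rule out.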

In this embodiment the estimated character size is approximated by a quadratic function, so it does not need to be recalculated when pixel blocks are combined. Accordingly, in step S607, the pixel blocks obtained in the preceding steps are combined to generate a lattice structure. That is, starting from the character cut pattern that represents the segmentation obtained up to step S606, the segmentation is changed so that mutually adjacent pixel blocks are combined, and a new character cut pattern is generated. The new pattern does not replace the original one; it is generated in addition to it. The patterns created in this way are the candidates from which the final character cut pattern is chosen. Here too, pixel blocks are combined only under a certain condition: they are combined if the size (width) of the combined pixel block falls within a second allowable range defined relative to the estimated character size of one of the blocks being combined. For example, the larger of 1.2 times the estimated character size and 1.2 times the character line height is used as the second allowable range. If the combined width is within this allowable range (or threshold), the combined character cut pattern is added to the candidates (that is, to the lattice structure). In step S607, every character cut pattern candidate that satisfies the condition is created.
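Candidate generation in step S607 might be sketched as an enumeration of all groupings of consecutive blocks whose merged width stays within the second allowable range. This example lists each candidate pattern explicitly instead of building a shared lattice, takes the reference size from the first block of each group, and uses hypothetical names throughout; all of these are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[float, float]  # (left, right) extent of a block along the line


def candidate_patterns(
    spans: Sequence[Span],
    estimate: Callable[[float], float],
    line_height: float,
    factor: float = 1.2,
) -> List[List[Span]]:
    """Enumerate character cut pattern candidates under the second allowable range."""
    blocks = sorted(spans)

    def limit(span: Span) -> float:
        # Larger of factor * estimated size and factor * line height.
        centre = (span[0] + span[1]) / 2
        return factor * max(estimate(centre), line_height)

    def extend(start: int) -> List[List[Span]]:
        if start == len(blocks):
            return [[]]
        patterns: List[List[Span]] = []
        for end in range(start, len(blocks)):
            right = max(b[1] for b in blocks[start:end + 1])
            group = (blocks[start][0], right)
            # A single block is always allowed; merged groups must fit the limit.
            if end > start and group[1] - group[0] > limit(blocks[start]):
                break  # widths only grow as more blocks are added
            patterns.extend([group] + rest for rest in extend(end + 1))
        return patterns

    return extend(0)
```

Explicit enumeration can grow quickly for long lines; a lattice that shares common sub-paths, as the text describes, avoids repeating that work.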

In step S608, character recognition is performed on the pixel blocks cut according to each of the character cut patterns created in step S607. For each character cut pattern, the evaluation values obtained as recognition results for the individual characters are summed. This total is computed for every candidate pattern, the totals are compared, and the pattern with the largest total is selected. The character string recognized according to the selected pattern is output as the recognition result.

As described above, this embodiment has the effect that, even when a character is split into left and right parts, the estimated character size can be obtained from the local average with little error. In addition, by repeating the above processing, individual character patterns are built up gradually even when left-right split characters occur consecutively, so that eventually all individual character patterns can be extracted.

This embodiment also has the effect that the trend in character size can be estimated and approximated appropriately even when the characters gradually become smaller. Furthermore, the estimated character size does not need to be recalculated when pixel blocks are combined, which reduces the processing load.

[Modification]
In this embodiment, step S606 is followed by step S607, but the process may instead proceed directly to step S608. This is the same as in the first embodiment.

[Other Embodiments]
The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine). The object of the present invention is also achieved by supplying a recording medium on which program code implementing the functions of the above embodiments is recorded to a system or apparatus, and having a computer of that system or apparatus read and execute the program code stored on the medium. In this case, the program code read from the storage medium itself implements the functions of the above embodiments, and the program code itself and the storage medium storing it constitute the present invention.

The present invention also covers the case in which an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiments are implemented by that processing. The present invention further applies to the case in which the program code read from the storage medium is written to a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer. In that case, a CPU or the like provided on the function expansion card or in the function expansion unit performs part or all of the actual processing based on the instructions of the written program code, and the functions of the above embodiments are implemented by that processing.

FIG. 1 is a diagram showing the configuration of the character recognition device according to Embodiments 1 and 2 of the present invention.
FIG. 2 is a flowchart for explaining the character recognition process according to Embodiment 1 of the present invention.
FIG. 3 is a diagram explaining the combination processing of character blocks according to Embodiments 1 and 2 of the present invention.
FIG. 4 is a diagram showing an input character string image according to Embodiments 1 and 2 of the present invention.
FIG. 5 is a graph showing the estimated character size according to Embodiment 1 of the present invention.
FIG. 6 is a flowchart for explaining the character recognition process according to Embodiment 2 of the present invention.
FIG. 7 is a graph showing the estimated character size according to Embodiment 2 of the present invention.

Explanation of symbols

101 ... CPU
102 ... ROM
103 ... RAM
104 ... External storage device
105 ... Display
106 ... Keyboard
107 ... Pointing device
108 ... Image scanner
109 ... Network interface

Claims

A character recognition device comprising:
Extraction means for extracting a region including a character string from image data composed of pixels;
Segmenting means for segmenting the region into pixel blocks that are units of combination;
Estimating means for attending in turn to each pixel block segmented by the segmenting means and calculating, as the estimated character size of the pixel block of interest, the average width of predetermined pixel blocks belonging to a local area of the region that includes the pixel block of interest;
Character cutting means for creating a character cut pattern in which the segmentation by the segmenting means is changed so that, when mutually adjacent pixel blocks are combined and the width of the combined pixel block is within a first allowable range of the estimated character size, the combined pixel block becomes one pixel block; and
Recognition means for performing character recognition on the pixel blocks cut according to the character cut pattern.

A character recognition device comprising:
Extraction means for extracting a region including a character string from image data composed of pixels;
Segmenting means for segmenting the region into pixel blocks that are units of combination;
Estimating means for approximating the relationship between the position and the width of the pixel blocks segmented by the segmenting means with a quadratic function and calculating, as the estimated character size of each pixel block, the width that the quadratic function gives for the position of that pixel block;
Character cutting means for creating a character cut pattern in which the segmentation by the segmenting means is changed so that, when mutually adjacent pixel blocks are combined and the width of the combined pixel block is within a first allowable range of the estimated character size, the combined pixel block becomes one pixel block; and
Recognition means for performing character recognition on the pixel blocks cut according to the character cut pattern.

The character recognition device according to claim 1, further comprising candidate creating means for creating one or more character cut pattern candidates in which the cutting by the character cutting means is changed so that, when mutually adjacent pixel blocks whose segmentation has been changed by the character cutting means are combined and the width of the combined pixel block is within a second allowable range, wider than the first allowable range, based on the estimated character size, the combined pixel block becomes one pixel block,
wherein the recognition means performs character recognition on the pixel blocks cut according to each of the character cut pattern created by the character cutting means and the character cut pattern candidates created by the candidate creating means, and outputs, as the character recognition result, the most probable of the recognition results corresponding to the respective character cut patterns.

The character recognition device according to claim 1, wherein the segmenting means segments each group of continuous pixels as a pixel block, attends in turn to the segmented pixel blocks, and cuts out, as a pixel block serving as the unit of combination, the pixel blocks within a predetermined distance of the pixel block of interest together with the pixel blocks that overlap it in the direction perpendicular to the direction of the character string.

A character recognition method executed by a character recognition device comprising extraction means, segmenting means, estimating means, character cutting means, and recognition means, the method comprising:
An extracting step in which the extraction means extracts a region including a character string from image data composed of pixels;
A segmenting step in which the segmenting means segments the region into pixel blocks that are units of combination;
An estimating step in which the estimating means attends in turn to each pixel block segmented in the segmenting step and calculates, as the estimated character size of the pixel block of interest, the average width of predetermined pixel blocks belonging to a local area of the region that includes the pixel block of interest;
A character cutting step in which the character cutting means creates a character cut pattern in which the segmentation of the segmenting step is changed so that, when mutually adjacent pixel blocks are combined and the width of the combined pixel block is within a first allowable range of the estimated character size, the combined pixel block becomes one pixel block; and
A recognition step in which the recognition means performs character recognition on the pixel blocks cut according to the character cut pattern.

A character recognition method executed by a character recognition device comprising extraction means, segmenting means, estimating means, character cutting means, and recognition means, the method comprising:
An extracting step in which the extraction means extracts a region including a character string from image data composed of pixels;
A segmenting step in which the segmenting means segments the region into pixel blocks that are units of combination;
An estimating step in which the estimating means approximates the relationship between the position and the width of the pixel blocks segmented in the segmenting step with a quadratic function and calculates, as the estimated character size of each pixel block, the width that the quadratic function gives for the position of that pixel block;
A character cutting step in which the character cutting means creates a character cut pattern in which the segmentation of the segmenting step is changed so that, when mutually adjacent pixel blocks are combined and the width of the combined pixel block is within a first allowable range of the estimated character size, the combined pixel block becomes one pixel block; and
A recognition step in which the recognition means performs character recognition on the pixel blocks cut according to the character cut pattern.

The character recognition method according to claim 5, wherein the character recognition device further comprises candidate creating means,
the method further comprising a candidate creating step in which the candidate creating means creates one or more character cut pattern candidates in which the cutting of the character cutting step is changed so that, when mutually adjacent pixel blocks whose segmentation has been changed in the character cutting step are combined and the width of the combined pixel block is within a second allowable range, wider than the first allowable range, based on the estimated character size, the combined pixel block becomes one pixel block,
wherein, in the recognition step, character recognition is performed on the pixel blocks cut according to each of the character cut pattern created in the character cutting step and the character cut pattern candidates created in the candidate creating step, and the most probable of the recognition results corresponding to the respective character cut patterns is output as the character recognition result.

The character recognition method according to any one of claims 5 to 7, wherein, in the segmenting step, each group of continuous pixels is segmented as a pixel block, the segmented pixel blocks are attended to in turn, and the pixel blocks within a predetermined distance of the pixel block of interest, together with the pixel blocks that overlap it in the direction perpendicular to the direction of the character string, are cut out as a pixel block serving as the unit of combination.

A program for causing a computer to function as the character recognition device according to any one of claims 1 to 4.