JPH0291785A

JPH0291785A - Character recognizing device

Info

Publication number: JPH0291785A
Application number: JP63242214A
Authority: JP
Inventors: Masami Hisagai; 正己久貝
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1988-09-29
Filing date: 1988-09-29
Publication date: 1990-03-30
Anticipated expiration: 2014-01-20
Also published as: JP2848560B2

Abstract

PURPOSE: To attain correct character recognition by deciding, based on the widths of two adjacent blocks, whether or not block synthesis is to be executed, segmenting the block data based on the decision, and executing character recognition based on the segmented block data. CONSTITUTION: Image data is input by an input means, and block data including character pattern data is extracted from the input image data by a block extracting means 7. Next, block synthesis deciding means 8 and 9 decide whether or not block synthesis is to be executed, based on the widths of two adjacent blocks in the extracted block data. Character segmenting means 10 and 11 segment the block data based on the decision made by the block synthesis deciding means 8 and 9, and a recognizing means 12 executes character recognition based on the block data segmented by the character segmenting means 10 and 11. Thus, the character segmentation used to list word candidates for word collation can be executed correctly.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a character recognition device, for example a character recognition device that recognizes separated characters, that is, characters whose strokes form disjoint left and right parts.

[Prior Art] Conventionally, in this type of device, characters are segmented for optical character recognition by first taking a black-pixel histogram in the vertical direction to extract character lines, and then taking a histogram in the horizontal direction for each extracted character line to obtain the circumscribed rectangle of each character (hereinafter called a block). With separated characters such as "い", "す", "ル", and "仏", the block of each character splits into two blocks, left and right. When the composite block formed by merging the left and right blocks has a width close to the average character width, the composite block is cut out as one character.

The character recognition operation is explained here with reference to the schematic block configuration of a conventional character recognition device shown in FIG. 6.

First, image data on a document optically read by the reading unit 51 is stored in the memory 52. Next, based on the image data, the block extraction unit 53 performs the line extraction and block extraction described above. If there are blocks that can be merged, the block synthesis unit 54 merges them, and the result is stored in the character buffer 55. The recognition unit 56 then performs character recognition by matching the single block or composite block stored in the character buffer 55 against the standard character patterns stored in the recognition dictionary unit 57. The character code of the standard pattern recognized in this way is stored, as the recognized character, in a word buffer (not shown) of the word matching unit 58. Once the predetermined recognized characters have been stored in the word buffer, they are matched against the word dictionary stored in the word dictionary unit 59.

[Problems to be Solved by the Invention] However, in the above conventional example, characters of small width such as periods, commas, middle dots, and half-width numerals are mixed in with the text. As a result, for example as shown in FIG. 7, "3" and "." may be merged into "3.", so that the blocks are merged incorrectly. The correct character candidate is then already eliminated at the character segmentation stage. This could be compensated for at the word matching stage by tolerating a difference of about one character when searching for word candidates, but even a one-character difference costs accuracy. Moreover, an error in block synthesis changes the number of characters that make up a word, so similarity comparison between words of different lengths becomes necessary, which complicates word matching.

The present invention has been made in view of the above drawbacks of the conventional example, and its object is to provide a character recognition device that can accurately perform the character segmentation used to list word candidates for word matching.

[Means for Solving the Problems] To solve the above problems and achieve the object, a character recognition device according to the present invention is a character recognition device that performs character recognition based on image data, comprising: input means for inputting image data; block extraction means for extracting block data including character pattern data based on the input image data; block synthesis determination means for determining whether or not to perform block synthesis based on the widths of two adjacent blocks in the extracted block data; character segmentation means for cutting out block data based on the determination result of the block synthesis determination means; and recognition means for performing character recognition based on the block data cut out by the character segmentation means.

Preferably, the recognition means includes word candidate forming means for forming word candidates based on the results of character recognition, and word matching means for performing word matching on the word candidates.

Further preferably, the recognition means includes identification means that treats the character pattern in the block data as a candidate for the recognized character and identifies the recognized character by its similarity to standard patterns stored in advance.

[Operation] With the above configuration, the input means inputs image data; the block extraction means extracts block data including character pattern data based on the input image data; the block synthesis determination means determines whether or not to perform block synthesis based on the widths of two adjacent blocks in the extracted block data; the character segmentation means cuts out block data based on the determination result of the block synthesis determination means; and the recognition means performs character recognition based on the block data cut out by the character segmentation means.

[Embodiments] Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

〈Description of the First Embodiment〉 First, the first embodiment is described.

FIG. 1 is a block diagram showing a first embodiment of a character recognition device according to the present invention. In the figure, 1 denotes the character recognition device of the first embodiment, and 2 denotes a CPU that controls the entire device. 3 denotes a ROM that stores a control program for operating the CPU 2, an error processing program, programs that follow the flowcharts shown in FIGS. 2(a) and 2(b) described later, and the like. 4 denotes a RAM used as a work area for the various programs stored in the ROM 3 and as a temporary save area during error processing. 5 denotes a reading unit that optically reads a document image, and 6 denotes a memory that stores the image data read by the reading unit 5.

7 denotes a block extraction unit that extracts character lines from the image data stored in the memory 6 and then extracts blocks, each corresponding to one character. 8 denotes a block synthesis unit that merges a block extracted by the block extraction unit 7 with its adjacent block when their block width (here, the combined width of the two blocks) is approximately equal to the average character width, which represents the width of one ordinary character.

9 denotes a block accuracy determination unit that determines the accuracy of the composite block produced by the block synthesis unit 8. 10 denotes a character buffer that stores the blocks output according to the determination result of the block accuracy determination unit 9, or a single block output without any determination processing by the block accuracy determination unit 9. The number of blocks output from the block accuracy determination unit 9 to the character buffer 10 is the number produced by one character segmentation operation. 11 denotes a block counter that counts the total number of blocks output from the block accuracy determination unit 9 in one character segmentation operation. 12 denotes a recognition unit that performs character recognition by matching the character pattern of a block stored in the character buffer 10 against the standard patterns stored in the recognition dictionary unit 13 described below. 13 denotes the recognition dictionary unit, which stores the standard patterns used for matching.

14 denotes a word matching unit that has a word buffer, lists word candidates based on the recognition results of the recognition unit 12, and matches them against the word dictionary stored in the word dictionary unit 15 described below. 15 denotes the word dictionary unit, which stores the word dictionary against which the word candidates obtained from the recognition results are matched.

Next, the character recognition method of the first embodiment is described.

FIGS. 2(a) and 2(b) are flowcharts explaining the operation of the CPU 2 in the first embodiment, and FIG. 3 is a diagram explaining word matching in the first embodiment.

First, the document is optically read by the reading unit 5, converted into a binary image, and stored in the memory 6 (step S1). The block extraction unit 7 then takes a black-pixel histogram of the image data stored in the memory 6 along the main scanning direction, that is, the line direction, and extracts character lines using the valleys of the histogram as the cut positions (step S2). For each character line region extracted in this way, a black-pixel histogram is taken along the sub-scanning direction, that is, the direction perpendicular to the line direction, and the circumscribed rectangle of each character cluster (hereinafter called a block) is obtained (step S3).
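Steps S2 and S3 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `runs` and `extract_blocks`, the list-of-lists image representation, and the simplification that a histogram "valley" is a run of zero counts are all assumptions made for illustration.

```python
def runs(profile):
    """Return (start, end) pairs, end-exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def extract_blocks(image):
    """Split a binary image (rows of 0/1, 1 = black pixel) into lines and blocks.

    Returns one list per character line; each block is the circumscribed
    rectangle (top, bottom, left, right) with end-exclusive bottom/right.
    Assumption: valleys are treated as rows/columns with no black pixels.
    """
    row_hist = [sum(row) for row in image]            # histogram used in step S2
    lines = []
    for top, bottom in runs(row_hist):
        width = len(image[0])
        col_hist = [sum(image[y][x] for y in range(top, bottom))
                    for x in range(width)]            # histogram used in step S3
        blocks = []
        for left, right in runs(col_hist):
            # Tighten the vertical extent to the black pixels of this block.
            ys = [y for y in range(top, bottom) if any(image[y][left:right])]
            blocks.append((ys[0], ys[-1] + 1, left, right))
        lines.append(blocks)
    return lines
```

A real scanner image would need a noise threshold instead of an exact zero test, which is why the valley criterion is called out as an assumption.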

Next, the following procedure is carried out sequentially over the blocks. First, the total width of the first block and the following second block is calculated (step S4) and compared with a predetermined average character width set in advance (step S5). If the two are judged to be nearly equal (for example, a difference within 20% counts as equal), it is decided that the two blocks should be merged, and the process proceeds to step S7. If the difference exceeds 20%, it is decided that the two blocks should not be merged, and the first block is output to the block accuracy determination unit 9 as a single block. For such a single block, the block accuracy determination unit 9 performs no processing and stores the block in the character buffer 10. At the same time, the block counter 11 is incremented by one, and the process proceeds to step S12 (step S6). The initial value of the block counter 11 is "0".
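The comparison in steps S4 and S5 amounts to a relative-tolerance test. The following Python sketch shows one way to express it; the function name `should_merge`, its parameters, and the choice to measure the 20% tolerance relative to the average character width are assumptions, since the text only gives "within 20%" as an example.

```python
def should_merge(width1, width2, avg_char_width, tolerance=0.20):
    """Decide whether two adjacent blocks are candidates for merging.

    Step S4: add the two block widths.
    Step S5: compare the total with the average character width.
    Assumption: "nearly equal" means the difference is at most
    `tolerance` times the average character width.
    """
    total = width1 + width2
    return abs(total - avg_char_width) <= tolerance * avg_char_width
```

For example, with an average character width of 20, blocks of width 10 and 9 would qualify for merging, while two full-width blocks of width 18 each would not.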

When the process advances from step S5 to step S7, the two adjacent blocks are merged to generate one composite block (step S7).

The accuracy of the block synthesis is then obtained from the following formula.

If the block accuracy calculated from the above formula is smaller than 10%, that is, if the difference between the composite block width and the average character width is smaller than 10% of the average character width, the block synthesis is judged to be accurate; if the block accuracy is 10% or more, the block synthesis is judged to be inaccurate. When the synthesis is judged to be inaccurate, the two blocks are each output to the character buffer 10 as single blocks, in their state before synthesis (step S10), and the composite block obtained by merging them is also output to the character buffer 10 (step S11), after which the process proceeds to step S12. Step S10 increments the block counter 11 by two and the following step S11 increments it by one, so the value of the block counter 11 becomes "3". When the synthesis is judged to be accurate, the process goes directly to step S11, outputs only the composite block, and proceeds to step S12. In this case, the value of the block counter 11 becomes "1".

In the example of FIG. 7 described for the conventional device, "は" and "で" each differ from the average character width by less than 10%, so their single blocks are not output, whereas "3." differs from the average character width by 10% or more, so its single blocks are output as well.

Blocks of characters other than "は", "で", and "3." are not merged and are output to the character buffer 10 as single blocks. Thus, depending on whether block synthesis was performed and on how certain the synthesis is, characters are cut out in one of three ways: (1) only a single block is output to the character buffer 10 (step S6); (2) only a composite block is output to the character buffer 10 (step S11); and (3) both the single blocks and the composite block are output to the character buffer 10 (steps S10 and S11). In cases (1) and (2), one block is output to the character buffer 10. In case (3), three blocks (two single blocks and one composite block) are output to the character buffer 10.
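The three output cases can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the function name `cut_pair` and the `(label, width)` block representation are hypothetical, and the block accuracy is taken to be the difference between the composite width and the average character width divided by the average character width, as the description of the 10% threshold suggests.

```python
def cut_pair(first, second, avg_width, merge_tol=0.20, sure_tol=0.10):
    """Return the blocks sent to the character buffer for one adjacent pair.

    `first` and `second` are (label, width) tuples. The length of the
    returned list corresponds to the block counter value (1 or 3).
    """
    total = first[1] + second[1]
    # Assumed accuracy measure: relative deviation from the average width.
    deviation = abs(total - avg_width) / avg_width
    composite = (first[0] + second[0], total)
    if deviation > merge_tol:
        return [first]                     # case (1): single block only (step S6)
    if deviation < sure_tol:
        return [composite]                 # case (2): composite only (step S11)
    return [first, second, composite]      # case (3): steps S10 and S11
```

In case (1) only the first block is emitted; the second block would be examined again as the first block of the next pair.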

Next, the number of blocks indicated by the value of the block counter 11 is taken out of the character buffer 10 one block at a time, and the recognition unit 12 performs character recognition by a known technique using the recognition dictionary unit 13 (step S12). The character codes of the recognition results are sent to the word matching unit 14 and stored in the word buffer in the word matching unit 14. However, when the value of the block counter 11 is "3", a control code is sent to the word buffer in the word matching unit 14 before the recognition results, that is, the character codes (step S14), and the three character codes are sent after it (step S15). The recognition unit 12 thus sends out recognition results according to cases (1), (2), and (3) above. In case (1), one character code for the single block is sent. In case (2), one character code for the composite block is sent. In case (3), four codes are sent: the control code, the character code of the first single block, the character code of the second single block, and the character code of the composite block. JIS 2-byte codes are used as the character codes, and a 2-byte code that is unused in the JIS code system is used as the control code.

The above processing is repeated until one word for which word matching is possible has accumulated in the word buffer. That is, each time the recognition results have been sent to the word buffer in step S15, step S16 determines whether word matching is possible at that point. If step S16 determines that it is not, it is first determined whether the next line needs to be extracted (step S17). If line extraction is needed and processing of the page has not yet finished (step S18), the process returns to step S2 and repeats the above processing. If step S17 determines that no line extraction is needed, the process returns to step S4 and repeats the above processing.

When one word made up of character codes (possibly including control codes) has accumulated in the word buffer in this way, the number of control codes in the word buffer is checked. If there are none, the contents of the word buffer are taken as the final recognition result as they are (steps S19 and S20), and the contents of the word buffer are output externally as the recognition result (step S27). If step S19 determines that the word buffer contains a control code, word matching is performed as follows. Of the three character codes that follow a control code, the character codes of the first and second blocks represent the first recognition result, obtained without block synthesis, and the character code of the composite block represents the second recognition result, obtained with block synthesis. Under these conditions, if the word buffer contains n control codes, there are 2^n possible character strings as word candidates.

The character string "アルコール" (alcohol) is taken as an example here and explained with reference to FIG. 3. In this case, as a result of character segmentation, case (3) above applies to the character "ル". That is, each "ル" is represented by a control code and three character codes.

Accordingly, because the character string "アルコール" contains two "ル" characters, there are four word candidates: (a) both "ル" characters taken as two single blocks each; (b) the first "ル" taken as two single blocks and the second as a composite block; (c) the first "ル" taken as a composite block and the second as two single blocks; and (d) both "ル" characters taken as composite blocks, giving "アルコール". All word candidates are matched against the contents of the word dictionary unit 15, and a candidate that matches a word in the word dictionary unit 15 is taken as the final recognition result. In the case of "アルコール", candidates (a), (b), and (c) are rejected because no matching word exists in the word dictionary unit 15. The last candidate, (d), has a matching word in the word dictionary unit 15, so (d) "アルコール" is the correct recognition result. In terms of processing, step S21 lists as many word candidates as there are combinations to be matched (2^n), and word matching is performed one candidate at a time while checking for a match (steps S22 and S23). If the word candidates are exhausted without any match being found, all of the word candidates are taken as the recognition result (steps S24 and S25), and this recognition result is output externally (step S27).
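The candidate enumeration and dictionary lookup of steps S19 to S27 can be sketched in Python as below. Several details are assumptions made for illustration: the sentinel `CONTROL` stands in for the unused JIS 2-byte control code, the word buffer is modeled as a list of strings, the dictionary is a Python set, and the characters "ノ" and "レ" are hypothetical recognition results for the two halves of "ル".

```python
from itertools import product

CONTROL = object()  # stand-in for the control code (assumption)


def word_candidates(buffer):
    """Expand a word buffer into all 2**n candidate strings.

    A CONTROL entry is followed by three codes: first single block,
    second single block, composite block.
    """
    slots, i = [], 0
    while i < len(buffer):
        if buffer[i] is CONTROL:
            first, second, merged = buffer[i + 1:i + 4]
            slots.append((first + second, merged))   # unmerged vs. merged reading
            i += 4
        else:
            slots.append((buffer[i],))
            i += 1
    return ["".join(choice) for choice in product(*slots)]


def match_word(buffer, dictionary):
    """Return the first candidate found in the dictionary, else all candidates."""
    candidates = word_candidates(buffer)
    for candidate in candidates:
        if candidate in dictionary:
            return [candidate]
    return candidates


buffer = ["ア", CONTROL, "ノ", "レ", "ル", "コ", "ー", CONTROL, "ノ", "レ", "ル"]
result = match_word(buffer, {"アルコール"})
```

With two control codes in the buffer, `word_candidates` yields four strings, matching the four candidates (a) to (d) in the example above; a buffer without control codes yields exactly one candidate.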

When a matching word candidate is found in step S23, the matching word candidate is taken as the recognition result (step S26), and the recognition result is output externally (step S27).

After the recognition result has been output externally in this way, the process returns to step S17 and the above processing is repeated.

As described above, according to the first embodiment, accurate character recognition can be performed by matching against multiple character candidates and by ensuring that no character recognition candidate is omitted.

〈Description of the Second Embodiment〉 Next, the second embodiment is described.

FIG. 4 is a block diagram showing a second embodiment of the character recognition device according to the present invention, and FIG. 5 is a flowchart explaining the operation of the CPU 22 of the second embodiment.

In FIG. 4, 21 denotes the character recognition device of the second embodiment, and 22 denotes a CPU that controls the entire device. 23 denotes a ROM that stores a control program for operating the CPU 22, an error processing program, a program that follows the flowchart shown in FIG. 5 described later, and the like. 24 denotes a RAM used as a work area for the various programs stored in the ROM 23 and as a temporary save area during error processing. The functions of the units with reference numbers 25 to 31 are the same as in the first embodiment described above, so their description is omitted.

32 denotes a similarity calculation unit that obtains similarities by comparing the character pattern in a block stored in the character buffer 30 with the standard patterns stored in the recognition dictionary unit 33 described below, and lists recognized character candidates based on the standard pattern with the highest similarity. 33 denotes the recognition dictionary unit, which stores the standard patterns that the similarity calculation unit 32 uses to list recognized character candidates. 34 denotes an identification unit that identifies the final recognition result from the recognized character candidates listed by the similarity calculation unit 32.

The character recognition method of the second embodiment is described next.

In the second embodiment as well, processing equivalent to steps S1 through S11 of the first embodiment is performed as steps S1' through S11'. The description of that processing is therefore omitted.

When blocks have been stored in the character buffer 30 by steps S1' through S11', the value of the block counter 31 is examined (step S30). If the value is "1", the block is either a single block or a composite block, so the similarity calculation unit 32 calculates its similarity to the standard patterns in the recognition dictionary unit 33 (step S31), and the standard pattern with the highest similarity is taken as the recognition result (step S32). A recognition result obtained in this way is output externally without any processing by the identification unit 34 (step S42). If step S30 finds that the value of the block counter 31 is "3", the similarity calculation unit 32 calculates, for each block in the character buffer 30 (the first single block, the second single block, and the composite block), its similarity to the standard patterns in the recognition dictionary unit 33. The maximum similarity is obtained in each calculation, and the maximum similarities of the first single block, the second single block, and the composite block are denoted m1, m2, and m3, respectively (steps S33 to S38).
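Selecting the standard pattern with the highest similarity (steps S31 and S32, and the per-block maxima m1, m2, m3) can be sketched in Python as follows. The text does not specify how similarity is computed, so the pixel-agreement measure in `similarity`, the function names, and the dictionary layout are all assumptions chosen only to make the example runnable.

```python
def similarity(pattern, standard):
    """Fraction of positions where two equal-length 0/1 patterns agree.

    Illustrative measure only; the actual similarity measure is not
    given in the text.
    """
    agree = sum(p == s for p, s in zip(pattern, standard))
    return agree / len(pattern)


def best_match(pattern, dictionary):
    """Return (character, similarity) for the most similar standard pattern.

    `dictionary` maps each character to its standard pattern.
    """
    scored = ((ch, similarity(pattern, std)) for ch, std in dictionary.items())
    return max(scored, key=lambda item: item[1])


standards = {"A": [1, 1, 0, 0], "B": [0, 0, 1, 1]}
char, score = best_match([1, 1, 0, 1], standards)
```

Calling `best_match` once for each of the three blocks in the character buffer would give the values referred to as m1, m2, and m3.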

Next, from the maximum similarities m1, m2, and m3, the identification unit 34 identifies the standard pattern or patterns to be finally output as recognized characters. The following expressions are used for this identification. If (m1 + m2) ÷ 2 ≤ m3, the character codes of the standard patterns with the highest similarity to the single blocks are taken as the final recognition result; in this case, the standard patterns corresponding to the similarities m1 and m2 are identified as the recognition result (steps S39 and S40). If (m1 + m2) ÷ 2 > m3, the character code of the standard pattern with the highest similarity to the composite block is taken as the final recognition result; in this case, the standard pattern corresponding to the similarity m3 is identified as the recognition result (step S41).

Next, the recognition result identified by the identification processing of step S32, step S40, or step S41 is output externally (step S42). After the identified recognition result has been output, if the next line does not need to be extracted in order to recognize the next character, the process returns to the step corresponding to step S4 of the first embodiment (step S43). If the next line does need to be extracted, the process returns to the step corresponding to step S2 of the first embodiment and repeats the processing until one page has been processed (step S44).

As described above, according to the second embodiment, even when character recognition is performed by similarity determination, accurate character recognition can be performed with no candidates for character recognition omitted.

[Effects of the Invention] As described above, according to the present invention, characters are cut out accurately, so that accurate character recognition can be performed with no candidates for character recognition omitted.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing a first embodiment of the character recognition device according to the present invention; FIGS. 2(a) and 2(b) are flowcharts explaining the operation of the CPU 2 of the first embodiment; FIG. 3 is a diagram explaining word matching in the first embodiment; FIG. 4 is a block diagram showing a second embodiment of the character recognition device according to the present invention; FIG. 5 is a flowchart explaining the operation of the CPU 2 of the first embodiment; FIG. 6 is a block diagram showing a conventional character recognition device; and FIG. 7 is a diagram explaining conventional word matching.

In the figures:
1, 21: character recognition device
2, 22: CPU
3, 23: ROM
4, 24: RAM
5, 25, 51: reading unit
6, 26, 52: memory
7, 27, 53: block extraction unit
8, 28, 54: block synthesis unit
9, 29: block accuracy determination unit
10, 30, 55: character buffer
11, 31: block counter
12, 56: recognition unit
13, 33, 57: recognition dictionary unit
14, 58: word matching unit
15, 59: word dictionary unit
32: similarity calculation unit
34: identification unit

Claims

[Claims]

(1) A character recognition device that performs character recognition based on image data, comprising: input means for inputting image data; block extraction means for extracting block data including character pattern data based on the input image data; block composition determining means for determining whether or not to perform block composition based on the widths of two adjacent blocks of the extracted block data; character cutting means for cutting out block data based on the determination result of the block composition determining means; and recognition means for performing character recognition based on the block data cut out by the character cutting means.

(2) The character recognition device according to claim 1, wherein the recognition means includes word candidate forming means for forming word candidates based on the result of character recognition, and word matching means for performing word matching on the word candidates.

(3) The character recognition device according to claim 1, wherein the recognition means includes identification means that takes the character patterns in the block data as candidates for the recognized character and identifies the recognized character based on the degree of similarity to standard patterns stored in advance.