JP2002183661A

JP2002183661A - Character recognizing device and image processor and image recognizing device

Info

Publication number: JP2002183661A
Application number: JP2001331572A
Authority: JP
Inventors: Yoshiaki Kurosawa; 由明黒沢; Bunpei Irie; 文平入江; Hideo Horiuchi; 秀雄堀内
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2001-10-29
Filing date: 2001-10-29
Publication date: 2002-06-28
Anticipated expiration: 2020-01-26
Also published as: JP3615179B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processor capable of recognizing characters with high accuracy. SOLUTION: The image processor comprises a color separating means 2 that separates input image data into a part in the color of a seal impression and a part in the color of the written characters; a determining means that determines whether the image data of the written-character color, separated by the color separating means, contains a cancellation of a character entry area; and a means 7 that, when the determining means judges a part to be cancelled, extracts the region of the seal-color image data corresponding to the position of the cancelled part and outputs whether the seal color is present there.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an image processing apparatus applied to a character recognition device used for recognizing characters printed on slips and similar documents.

[0002]

2. Description of the Related Art
Conventionally, when recognizing characters printed on a tabular form such as a slip, in which the characters are arranged in fields delimited by ruled lines and blank spaces, the color making up the printed form (called the drop-out color) is removed by optical filtering at image input time. The form part is thus erased from the original image to be recognized, and character recognition is performed on the resulting image.

[0003] For this reason, the prior art cannot handle forms printed in a color other than the drop-out color, nor, of course, forms whose background color is not white. Nor can it handle multicolor prints or forms on which the entered characters are themselves in the drop-out color, and naturally it cannot read forms in which the color of the entered characters is not specified in advance.

[0004] To address this, Japanese Patent Application Laid-Open No. 61-15285 proposed a character recognition device that can handle forms printed in colors other than a drop-out color. However, because that publication includes no color separation technique, the character frame and the characters are separated not by color but solely by a frame/character separation technique. Its recognition accuracy is therefore limited.

[0005] Meanwhile, JP-A-62-5485, JP-A-62-154181, JP-A-2-67689, JP-A-2-135584, JP-A-2-123488, JP-A-3-14077, JP-A-3-223987, JP-A-4-39789, JP-A-4-160486 and JP-A-4-354083 disclose devices that perform color separation of an input image by a preset method and use the result for character recognition.

[0006]

However, when these character recognition devices are operated with relaxed restrictions on the form color, the entry color, the printing method for the form and character portions, the character entry method and so on, they may fail to recognize characters accurately, for example when characters are written carelessly in the entry fields, and they cannot cope with cases in which color misregistration occurs.

[0007] Moreover, for operating environments in which a drop-out color cannot be used, these conventional devices merely propose reading the image as color data, applying some form of color separation, and performing character recognition on the separated data. They offer no solution to the various practical problems that must be considered when actually building a character recognition system.

[0008] Consequently, the prior art alone cannot be considered an effective character recognition means when real-world form-reading operations such as those described above are actually carried out. For example, for a form on which a part printed in red characters inside a red frame coexists with a part in which black characters are written inside a red frame, the question of exactly how to read it arises, and conventional devices cannot solve such problems.

[0009] The present invention has been made in view of the above circumstances, and its object is to provide an image processing apparatus capable of highly accurate character recognition.

[0010]

SUMMARY OF THE INVENTION
The present invention performs a provisional color separation on the input image data, and then determines the final color separation by combining the geometric information of the pixels making up each provisionally separated portion with the density information of those portions and of the background.

[0011] The present invention also prepares in advance, in the format information, definitions of partial regions and information on the color separation to be executed for each region, and executes color separation in the specified regions based on this information.

[0012] The present invention also separates a specific color from the input image data of a form, selects the format information corresponding to the separated color, and executes recognition processing according to the selected format information.

[0013] The present invention also represents a plurality of representative colors extracted from the image data in run-length form, generates a pointer table made up of pointers to this run-length data, and allows a specific portion of the data to be referenced through the pointers in the pointer table.
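The run-length representation with a pointer table described in [0013] can be sketched as follows. This is a minimal illustration, not the patent's data layout: it assumes the image has already been reduced to a 2-D array of representative-color indices, stores each color's runs as hypothetical `(row, start, length)` tuples, and uses a dictionary keyed by `(color, row)` as the pointer table, so the runs of one color on one row can be fetched without scanning the rest.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunLengthImage:
    # color index -> list of runs (row, start_x, length), in row order
    runs: dict[int, list[tuple[int, int, int]]]
    # pointer table: (color, row) -> (first, end) slice into runs[color]
    pointers: dict[tuple[int, int], tuple[int, int]]


def encode(label_image: list[list[int]], colors: list[int]) -> RunLengthImage:
    """Run-length encode each representative color of a label image."""
    runs: dict[int, list[tuple[int, int, int]]] = {c: [] for c in colors}
    pointers: dict[tuple[int, int], tuple[int, int]] = {}
    for y, row in enumerate(label_image):
        for c in colors:
            first = len(runs[c])
            x = 0
            while x < len(row):
                if row[x] != c:
                    x += 1
                    continue
                start = x
                while x < len(row) and row[x] == c:
                    x += 1
                runs[c].append((y, start, x - start))
            # Record where this row's runs for color c begin and end.
            pointers[(c, y)] = (first, len(runs[c]))
    return RunLengthImage(runs, pointers)


def runs_in_row(rli: RunLengthImage, color: int, y: int) -> list[tuple[int, int, int]]:
    """Follow the pointer table to the runs of one color on one row."""
    first, end = rli.pointers[(color, y)]
    return rli.runs[color][first:end]


image = [
    [0, 1, 1, 0, 2],
    [1, 1, 0, 2, 2],
]
rli = encode(image, colors=[1, 2])
print(runs_in_row(rli, 1, 0))  # [(0, 1, 2)]
print(runs_in_row(rli, 2, 1))  # [(1, 3, 2)]
```

Keeping the runs for each color in one flat list and indexing them through the table keeps the encoding compact while still giving direct access by color and row.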

[0014] The present invention also provides a processing unit for each of the color representation formats of an image, namely color information, gray-level information and binary information, and uses the processing unit that matches the color representation format of the input image.

[0015] The present invention also has at least one of a color image buffer, in which the input image is stored either as is or after color separation, and a gray image buffer, which stores a gray image derived from the color image or input directly, together with a binary image buffer that stores a binary image generated from the gray or color image. When processing on the binary image needs to recheck a specific part of the input image, the corresponding part in at least one of the color image buffer and the gray image buffer can be referenced.

[0016] The present invention further has a double-line cancellation processing function. The input image data is color-separated into a part in the color of the seal impression and a part in the color of the entered characters, and cancellation judgment is performed on the image data of the entry color. When a part is judged to be cancelled, the region of the seal-color image data at the position corresponding to the cancelled part is extracted and checked for the presence of the seal color, and the presence or absence of the seal color is output.
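On Japanese forms, a correction is customarily struck through with a double line and stamped with a seal. The check described in [0016] can be sketched as follows, but only under several assumptions, since the patent does not say how a cancellation is recognized. The sketch assumes binary masks for the entry color and the seal color that are already separated and aligned. It treats a field as cancelled when at least two separate horizontal strokes cover most of its width, and it reports the seal as present when seal-colored pixels exceed a small fraction of the field. The function names, the box format `(x0, y0, x1, y1)` and both thresholds are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]           # 1 = pixel of the given color, 0 = otherwise
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open


def line_rows(mask: Mask, box: Box, min_fill: float = 0.8) -> List[int]:
    """Rows inside `box` where the mask covers at least min_fill of the width."""
    x0, y0, x1, y1 = box
    width = x1 - x0
    return [y for y in range(y0, y1) if sum(mask[y][x0:x1]) >= min_fill * width]


def count_strokes(rows: List[int]) -> int:
    """Count groups of consecutive rows; each group is one horizontal stroke."""
    strokes, prev = 0, None
    for y in rows:
        if prev is None or y != prev + 1:
            strokes += 1
        prev = y
    return strokes


def check_cancellation(entry_mask: Mask, seal_mask: Mask, box: Box,
                       min_seal_ratio: float = 0.02) -> Tuple[bool, bool]:
    """Return (is_cancelled, seal_present) for one entry field."""
    if count_strokes(line_rows(entry_mask, box)) < 2:
        return False, False
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    seal_pixels = sum(sum(seal_mask[y][x0:x1]) for y in range(y0, y1))
    return True, seal_pixels / area >= min_seal_ratio


# A 20x10 field struck through on rows 3 and 6, with a small seal impression.
entry = [[1] * 20 if y in (3, 6) else [0] * 20 for y in range(10)]
seal = [[1 if 4 <= y < 6 and 8 <= x < 12 else 0 for x in range(20)] for y in range(10)]
print(check_cancellation(entry, seal, (0, 0, 20, 10)))  # (True, True)
```

A cancelled field with no seal returns `(True, False)`, which is the case a reader would flag for the operator.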

[0017] The present invention also extracts the form image of a form from the input image data, recognizes the characters on the form, and stores at least the position of each character. When the form image is displayed in a color similar to that of the form, the recognized characters are displayed superimposed at the positions where they originally appeared.

[0018] The present invention also attaches the correction parameters needed to correct the input image data to that data as attribute data. Later, the correction parameters are reconstructed from the attribute data, the image data is corrected with them, and image processing is performed on the corrected image data.

[0019] [Operation] As a result, the present invention achieves highly accurate character recognition even when the restrictions on the form color, the entry color, the printing method for the form and character portions, the character entry method and so on are loosely applied, or when characters are written carelessly in the entry fields.

[0020]

DESCRIPTION OF THE PREFERRED EMBODIMENTS
An embodiment of the present invention will be described below with reference to the drawings.

[0021] FIG. 1 shows the schematic configuration of the embodiment applied to a character recognition device. In the figure, reference numeral 1 denotes an image input unit to which image data is input. The input image data consists, for example, of the three colors R, G and B (red, green and blue), each expressed as multi-valued data. Other representation formats may, of course, be used.

[0022] The image input unit 1 performs, for example, color misregistration correction, color bias correction and color conversion.

[0023] The output of the image input unit 1 is sent to the color separation unit 2. The color separation unit 2 separates and extracts the color images of specific color portions from the input image data, and stores the separated color portions in the respective color image buffers 31, 32, ..., 3n of the buffer 3.

[0024] The outputs of the color image buffers 31, 32, ..., 3n of the buffer 3 are sent to the image processing unit 4, which performs run-length processing and other operations described later.

[0025] The output of the image processing unit 4 is sent to the character cutout unit 5, which cuts out characters from the image data read from the buffer 3.

[0026] The result of the cutout processing by the character cutout unit 5 is sent to the character recognition unit 6, which performs recognition processing on the cut-out characters. The recognition result is sent to the read result correction unit 7, where an operator corrects it, and it is then output as the final recognition result.

[0027] In this configuration, image data is input to the image input unit 1. The input image data consists, for example, of the three colors R, G and B, each expressed as multi-valued data.

[0028] The image input unit 1 then performs misregistration correction, color bias correction, color conversion and similar processing on the input image.

[0029] For color misregistration correction, the correction parameters may, for example, describe the amount by which each of the R, G and B channels is to be shifted. The correction is then carried out by shifting each color by its amount.
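A minimal sketch of misregistration correction driven by per-channel shift parameters, as in [0029]. This sketch makes several assumptions. It assumes integer pixel shifts and separate 8-bit planes keyed `"R"`, `"G"` and `"B"`. It also assumes that uncovered pixels are filled with white (255) paper. The stored parameters are taken to be the correction to apply, not the measured displacement.

```python
from typing import Dict, List, Tuple

Plane = List[List[int]]


def shift_plane(plane: Plane, dx: int, dy: int, fill: int) -> Plane:
    """Move the plane's content by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = len(plane), len(plane[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        sy = y - dy
        if not 0 <= sy < h:
            continue
        for x in range(w):
            sx = x - dx
            if 0 <= sx < w:
                out[y][x] = plane[sy][sx]
    return out


def correct_misregistration(planes: Dict[str, Plane],
                            shifts: Dict[str, Tuple[int, int]],
                            fill: int = 255) -> Dict[str, Plane]:
    """Apply the stored per-channel shift (the correction parameters) to each plane."""
    return {c: shift_plane(planes[c], *shifts.get(c, (0, 0)), fill) for c in planes}


# The G plane lags one pixel to the right; shifting it left realigns it with R.
planes = {"R": [[0, 255, 255]], "G": [[255, 0, 255]], "B": [[0, 255, 255]]}
fixed = correct_misregistration(planes, {"G": (-1, 0)})
print(fixed["G"])  # [[0, 255, 255]]
```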

[0030] Color bias correction is performed, for example, by converting the density of each of the R, G and B colors. The conversion may use gamma conversion, in which each color's density is converted from that same color's density alone. Alternatively, a function may be provided that determines the density of each color from the densities of all three colors. The color bias correction parameters may be numerical data that define these functions, or they may be the conversion functions themselves.
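The two kinds of density conversion in [0030] can be sketched as follows. The first is a per-channel gamma lookup table. The second is a cross-channel function in which each output density depends on all of R, G and B. The linear 3x3 matrix is only one assumed choice of such a function, since the patent allows any function described by numeric parameters or by the function itself. 8-bit densities are assumed.

```python
from typing import List, Sequence, Tuple


def gamma_lut(gamma: float, max_value: int = 255) -> List[int]:
    """Per-channel lookup table: out = max * (in / max) ** gamma."""
    return [round(max_value * (v / max_value) ** gamma) for v in range(max_value + 1)]


def correct_per_channel(pixel: Sequence[int], luts: Sequence[List[int]]) -> Tuple[int, ...]:
    """Gamma-style correction: each channel depends only on its own density."""
    return tuple(lut[v] for lut, v in zip(luts, pixel))


def correct_cross_channel(pixel: Sequence[int], matrix: Sequence[Sequence[float]],
                          max_value: int = 255) -> Tuple[int, ...]:
    """Each output channel is a (here linear) function of all three input densities."""
    out = []
    for row in matrix:
        v = sum(m * p for m, p in zip(row, pixel))
        out.append(min(max_value, max(0, round(v))))  # clamp to the valid range
    return tuple(out)


luts = [gamma_lut(1.0), gamma_lut(2.0), gamma_lut(1.0)]
print(correct_per_channel((128, 128, 128), luts))  # (128, 64, 128)
```

The lookup table is cheap to apply pixel by pixel. The matrix form can also correct a tint that leaks from one channel into another, which a per-channel table cannot do.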

[0031] The apparatus holds these color misregistration correction parameters and color bias correction parameters and performs the corresponding corrections based on them. Problems can arise during the actual character recognition and image processing work, such as character recognition, feature extraction, line segment extraction and binarization. If rejections or extraction failures become frequent, or some anomaly or inconsistency occurs, the apparatus performs color misregistration correction without relying on the stored parameters, thereby obtaining new correction parameters. If these differ from the stored parameters, the stored parameters are overwritten with the new ones.

[0032] This update may be performed periodically or on an operator's instruction. A plurality of sets of color correction parameters may also be prepared so as to handle a plurality of image input units 1. In this case, color correction with the corresponding parameters, and the updating of those parameters, are carried out separately for each image input unit 1.

[0033] The color misregistration correction parameters are obtained by color analysis of black portions on the form or similar document. These black portions may be extracted automatically from the input image, or their location, size and shape may be specified by preset format information.

[0034] When a pattern of predetermined shape is provided on the form, a pattern or mark for correcting color misregistration, for example a cross, is placed at a specific location on the form, and processing is performed based on it. Information such as the position and shape of these patterns and marks may be registered in the format information. Such a pattern can also serve another purpose, for example as the character frame in which the form number is written.

[0035] The color misregistration analysis used for the correction measures how much of a pair of mutually complementary colors appears on both sides of a black portion, above and below as well as left and right. Such measurements may be taken at several locations and averaged to obtain a single misregistration amount for the whole image. Alternatively, each result may be treated as the misregistration amount at its own location, which handles cases in which the misregistration is not uniform across the image.

[0036] FIG. 2 illustrates the method of measuring color misregistration. Suppose, as shown in the figure, that there is a double fringe (a double bleed) on both sides of a black portion, and that the boundaries between the colors are denoted A1 to A6. Two conditions must hold: the area A1 to A2 and the area A4 to A5 are complementary, and the area A2 to A3 and the area A5 to A6 are complementary. If both hold, the distances from A1 to A2 and from A1 to A3 are taken as the misregistration amounts of green and blue, respectively, relative to the red signal.

[0037] Here, "complementary" means that the sum of the two colors equals, or is close to, the background color (the color outside A1 to A6, white in this example). This idea applies equally when the central area A3 to A4 has disappeared.

[0038] The method also applies to a single fringe instead of a double one, in which case the misregistration amounts of the two colors are taken to be equal. Although this example shows the colors shifted in the order R, G, B, the actual order of the shift can be determined from the colors of the individual areas.

[0039] The same idea can be applied to portions of the form that are not black. In this case, the misregistration correction parameters are determined from the change in color between adjacent portions. Let R0, G0, B0 be the RGB density values of the background color and R1, G1, B1 those of the target portion. The color change where misregistration occurs can then be represented as shown in FIG. 3, and from this table it can be determined whether misregistration has occurred and by how much.

[0040] Suppose the sum of the colors of the area A1 to A2 and the area A4 to A5 equals the color of the area A3 to A4 plus the background color, and the same holds for the area A2 to A3 and the area A5 to A6. In that case, in this example, the distances from A1 to A2 and from A1 to A3 are taken as the misregistration amounts of green and blue, respectively, relative to the red signal. Even if the area A3 to A4 has disappeared, or there is a single rather than a double fringe, R0, G0, B0 and R1, G1, B1 can be determined from the colors of the areas, so the misregistration amount can still be obtained using the same idea.
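The consistency test of [0036] to [0040] can be sketched as follows. With a black target, the expected sum reduces to the background color, which is the complementary-color condition of [0037]. With a colored target (R1, G1, B1) on a background (R0, G0, B0), it becomes the condition of [0040]. The sketch makes three assumptions about its inputs. First, the boundary positions A1 to A6 and the average color of each fringe area have already been measured along one scan line. Second, the channels are shifted in R, G, B order as in FIG. 2. Third, the values are 8-bit RGB. The tolerance and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

RGB = Sequence[int]


def pair_matches(c1: RGB, c2: RGB, background: RGB, target: RGB, tol: int) -> bool:
    """True if c1 + c2 is within tol of target + background on every channel."""
    return all(abs(a + b - (t + g)) <= tol
               for a, b, t, g in zip(c1, c2, target, background))


def misregistration_from_fringes(boundaries: Sequence[int],
                                 area_12: RGB, area_23: RGB,
                                 area_45: RGB, area_56: RGB,
                                 background: RGB,
                                 target: RGB = (0, 0, 0),
                                 tol: int = 30) -> Optional[Tuple[int, int]]:
    """Return (green_shift, blue_shift) relative to red, or None if the
    fringe colors are not consistent with misregistration.

    area_12 is the color between A1 and A2, area_23 between A2 and A3, and so on.
    """
    a1, a2, a3 = boundaries[0], boundaries[1], boundaries[2]
    if not (pair_matches(area_12, area_45, background, target, tol)
            and pair_matches(area_23, area_56, background, target, tol)):
        return None
    return a2 - a1, a3 - a1


# Black stroke on white with R leading: cyan + red and blue + yellow both sum to white.
print(misregistration_from_fringes((10, 12, 13, 20, 22, 23),
                                   (0, 255, 255), (0, 0, 255),
                                   (255, 0, 0), (255, 255, 0),
                                   background=(255, 255, 255)))  # (2, 3)
```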

[0041] Alternatively, the misregistration amount can be detected by locating a color-printed mark such as the one shown in FIG. 4 and examining the relative displacement of its red, green and blue components.

[0042] Color bias correction can also be performed using the colors of such a mark. The parameters or function for color bias correction can be determined from the difference between the mark's color information as obtained from the image input unit 1 and the color information the mark is expected to have.

[0043] Alternatively, a pattern for correcting color misregistration may be provided at a specific location on the scanning surface of the scanner that generates the image data input to the image input unit 1. Misregistration correction and color bias correction may then be performed based on the pattern as it appears in the resulting input image.

[0044] FIGS. 5(a) and 5(b) show yet another method of obtaining the misregistration amount. In portions where the density values of red, green and blue change, the waveform of each color is shifted and matched (superimposed) against the others. The shift of each color is taken as the position at which the waveforms overlap best. Matching techniques commonly used in pattern recognition can be used for this purpose.
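The waveform matching in [0044] can be sketched as a brute-force search over integer shifts. The patent only says that a matching technique common in pattern recognition may be used. The mean-absolute-difference score, the search range and the function name here are assumptions. Following [0045], a real implementation would run this only on the edge portions where the densities change.

```python
from typing import Sequence


def best_shift(reference: Sequence[float], signal: Sequence[float], max_shift: int) -> int:
    """Shift s (in pixels) that best aligns signal[i + s] with reference[i],
    scored by mean absolute difference over the overlapping samples."""
    best, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        diffs = [abs(reference[i] - signal[i + s])
                 for i in range(len(reference)) if 0 <= i + s < len(signal)]
        if diffs and sum(diffs) / len(diffs) < best_cost:
            best, best_cost = s, sum(diffs) / len(diffs)
    return best


red = [255, 255, 0, 0, 0, 255, 255, 255]
green = [255, 255, 255, 0, 0, 0, 255, 255]  # same edge profile, one pixel later
print(best_shift(red, green, max_shift=2))  # 1
```

A positive result means the channel lags the red reference, so it would be corrected by shifting it back by the same amount.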

[0045] The misregistration correction may be applied only to the portions where the colors change, as in FIG. 5. Portions unaffected by misregistration, such as the background and the interiors of printed and written areas, are left unprocessed, because shifting the colors there does not change the result. This speeds up the correction. A function that enlarges either a portion of the input image where misregistration is clearly visible or a specified location makes it easier for the operator to check the misregistration.

[0046] In misregistration correction, the input image can be shifted by less than one pixel by using, for example, interpolation. The density value at the destination position is then computed by interpolating the density values of the pixels near that position.
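A one-dimensional sketch of the sub-pixel shift in [0046]. It assumes linear interpolation between the two nearest pixels, since the patent says only "for example, interpolation". Bilinear or higher-order interpolation would be the usual 2-D counterparts. The function name and the `fill` value for positions outside the row are hypothetical.

```python
import math
from typing import List, Sequence


def subpixel_shift(row: Sequence[float], shift: float, fill: float) -> List[float]:
    """Shift a row right by `shift` pixels (may be fractional).

    The value at each output position x is linearly interpolated from the two
    input pixels nearest to the source position x - shift; positions outside
    the row read as `fill`.
    """
    n = len(row)

    def sample(i: int) -> float:
        return row[i] if 0 <= i < n else fill

    out = []
    for x in range(n):
        src = x - shift
        i0 = math.floor(src)
        t = src - i0  # fractional part, 0 <= t < 1
        out.append((1 - t) * sample(i0) + t * sample(i0 + 1))
    return out


print(subpixel_shift([0, 100, 200], 0.5, fill=0))  # [0.0, 50.0, 150.0]
```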

[0047] The methods described above can also be applied when the colors are blurred to different degrees. FIG. 6 shows an example in which red appears at both edges of a black portion. This indicates that green and blue are out of focus relative to red and are therefore blurred.

[0048] Thus, when the same color, rather than a pair of complementary colors, appears on both sides, a blur correction parameter can be obtained and used to equalize the degree of blur of the individual colors.

[0049] The various corrections described above need not be performed in advance. Instead, the character extraction and character recognition processes described later may take the correction parameters into account during their own processing.

[0050] The output of the image input unit 1, corrected in this way for color misregistration and color bias, is sent to the color separation unit 2.

[0051] Next, the color separation method used in the color separation unit 2 is described, beginning with the form color registration function. When a form (filled in or blank) is input to the OCR, its image is color-analyzed to find which colors the form uses, that is, which parts have which colors. The color information is then registered in the format information accordingly. The registered data may range from simply the number of colors and their color data (RGB) up to the actual form image itself.

[0052] The operator can, of course, interactively modify the registered content at this time. For example, the operator may try extracting (erasing) various colors by trial and error on the image of a form input as a test. The colors with which the printed part of the form is cleanly extracted (erased) are then registered in the format information.

[0053] Artificial color data or the input image of a form can also be displayed on the screen. When the operator points to a location on it, the RGB values, VSH values and color name at that location are output, making it easier for the operator to decide which colors to register.

[0054] For such processing, it is convenient if a zoom function lets the operator enlarge or reduce part of the image.

[0055] Based on the format information created in this way, processing to separate and extract a particular color is executed.

[0056] Based on the color information registered in the format information, the parts to be erased are removed, and the parts to be recognized are extracted and recognized. This color information includes the background color and the entry colors, each of which may have several possible values.

[0057] The apparatus can also be configured to determine the parts to be erased automatically, from the color information in the input image alone, without using format information. In a simple case, only a particular color is erased. Alternatively, the color of the form portion can be determined automatically from the colors of the strokes and from how line-like the strokes that make up the form are. The same applies to the parts to be recognized.

[0058] Color separation can also be performed after converting RGB into value (lightness), saturation and hue (VSH). It is also effective to define a distance in RGB or VSH space and to separate colors based on that distance.
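A minimal sketch of the distance-based separation in VSH space described in [0058]. Python's standard `colorsys.rgb_to_hsv` supplies hue, saturation and value. The distance function, the threshold, and the registered colors and their names are all assumptions for illustration. The distance uses a circular hue difference weighted by saturation.

```python
import colorsys
import math
from typing import Dict, List, Optional, Sequence, Tuple

VSH = Tuple[float, float, float]


def to_vsh(rgb: Sequence[int]) -> VSH:
    """8-bit RGB -> (value, saturation, hue), each in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    return v, s, h


def vsh_distance(p: VSH, q: VSH) -> float:
    (v1, s1, h1), (v2, s2, h2) = p, q
    dh = abs(h1 - h2)
    dh = min(dh, 1.0 - dh)   # hue wraps around
    dh *= min(s1, s2)        # hue carries little meaning for unsaturated colors
    return math.sqrt((v1 - v2) ** 2 + (s1 - s2) ** 2 + dh ** 2)


def separate(pixels: List[Sequence[int]], registered: Dict[str, Sequence[int]],
             max_dist: float = 0.2) -> List[Optional[str]]:
    """Label each pixel with the nearest registered color, or None if none is close."""
    refs = {name: to_vsh(rgb) for name, rgb in registered.items()}
    labels: List[Optional[str]] = []
    for px in pixels:
        q = to_vsh(px)
        name, dist = min(((n, vsh_distance(q, r)) for n, r in refs.items()),
                         key=lambda item: item[1])
        labels.append(name if dist <= max_dist else None)
    return labels


registered = {"form red": (220, 30, 30), "ink black": (20, 20, 20)}
print(separate([(200, 40, 40), (10, 10, 10), (40, 200, 40)], registered))
# ['form red', 'ink black', None]
```

Weighting the hue term by saturation keeps nearly gray pixels, whose hue is unstable, from being pulled toward a chromatic registered color.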

[0059] Defining color names and separating colors on the basis of those names is a powerful approach when the man-machine interface is taken into account. In this case, the definitions in the format information are also made using these color names.

[0060] In such a system, each color name is registered together with its color representation data, such as RGB or VSH. Color names registered in the format information are therefore converted into RGB, VSH or other color representation data before use. Conversely, when RGB or VSH data is presented to the operator, it is converted back into the closest color name.
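The two-way mapping in [0060] can be sketched with a small name table. Names are converted to RGB for processing, and an RGB value is converted back to the nearest registered name before it is shown to the operator. The table entries and the squared-Euclidean RGB distance are illustrative assumptions, and a VSH distance could be used instead.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical registered color names and their RGB representation data.
COLOR_TABLE: Dict[str, Tuple[int, int, int]] = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "gray": (128, 128, 128),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
}


def name_to_rgb(name: str) -> Tuple[int, int, int]:
    """Color name from the format information -> color data used internally."""
    return COLOR_TABLE[name]


def rgb_to_name(rgb: Sequence[int]) -> str:
    """Color data -> the closest registered name, for display to the operator."""
    return min(COLOR_TABLE,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(rgb, COLOR_TABLE[n])))


print(name_to_rgb("red"))          # (255, 0, 0)
print(rgb_to_name((200, 30, 40)))  # red
```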

[0061] Next, a general method of color separation is described. For simplicity, a gray-scale image is used as the example. FIGS. 7(a), 7(b) and 7(c) show frequency distributions (histograms) of the density values of a gray image, where 0 is black, 1 is white, and intermediate values range from dark gray to light gray. The distributions are taken from the image shown in FIG. 8, in which a light gray straight line (part B) passes through the middle of a numeral "5" written in gray (part A). The frequency distributions of these two parts correspond to A and B in FIG. 7. Part b of the straight line is dark and corresponds to b in the distribution, while part a of the numeral "5" is light and corresponds to a in the distribution. Together these give the overall distribution C. If density separation (the gray-scale counterpart of color separation) is performed on this distribution, the density axis is divided at the value m into the areas X and Y.
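The histogram-based density separation of [0061] can be sketched as follows for 8-bit gray levels (0 = black, 255 = white, rather than the 0 to 1 scale of FIG. 7). The patent does not say how the dividing value m is chosen, so this sketch swaps in Otsu's method, which picks the m that maximizes the between-class variance of the two areas X and Y. As [0062] explains, a split based on the histogram alone still leaves parts a and b in the wrong class.

```python
from typing import List, Tuple

Image = List[List[int]]


def otsu_threshold(values: List[int], levels: int = 256) -> int:
    """Density m maximizing the between-class variance of {v <= m} and {v > m}."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_m, best_var = 0, -1.0
    for m in range(levels):
        w0 += hist[m]          # pixel count of class X (density <= m)
        sum0 += m * hist[m]
        w1 = total - w0        # pixel count of class Y (density > m)
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_m, best_var = m, var
    return best_m


def split_at(image: Image, m: int) -> Tuple[Image, Image]:
    """Binary masks for area X (density <= m) and area Y (density > m)."""
    x_mask = [[1 if v <= m else 0 for v in row] for row in image]
    y_mask = [[1 if v > m else 0 for v in row] for row in image]
    return x_mask, y_mask


image = [
    [60, 60, 200, 200],
    [60, 70, 190, 200],
]
m = otsu_threshold([v for row in image for v in row])
x_mask, y_mask = split_at(image, m)
print(m, x_mask)  # 70 [[1, 1, 0, 0], [1, 1, 0, 0]]
```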

[0062] However, when the image is separated in this way, parts a and b remain as noise, as shown in FIGS. 9(a) and 9(b), and the straight line is cut in two by the numeral "5".

[0063] When the data consist of three-color RGB information rather than density alone, the one-dimensional case described above is simply extended to three dimensions. The colors within a given region of the three-dimensional color space are regarded as the same color, and color separation is performed on that basis.
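A simple way to define "a region of RGB space treated as one color" is to assign each pixel to its nearest representative color. The sketch below uses that rule. The Euclidean nearest-neighbor rule, the representative colors and the function name `separate_rgb` are assumptions for illustration, not taken from the source.

```python
import math


def separate_rgb(pixels, representatives):
    """Label each RGB pixel with the nearest representative color.

    The regions of RGB space are taken to be the Voronoi cells of the
    representative colors (an assumed rule; the text only says that one
    region of the 3-D space is treated as one color).
    """
    return [
        min(representatives, key=lambda name: math.dist(p, representatives[name]))
        for p in pixels
    ]


reps = {"black": (0, 0, 0), "blue": (0, 0, 255), "white": (255, 255, 255)}
labels = separate_rgb([(20, 10, 30), (30, 40, 220), (240, 250, 245)], reps)
print(labels)  # -> ['black', 'blue', 'white']
```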

[0064] However, color separation that relies only on partitioning the density (color) space cannot give a correct result, as described above. The preceding explanation makes clear that correct color separation is logically impossible unless the geometric information of each pixel, that is, its topological information, is used together with the density information. An example of such information is the knowledge that one part is a straight line and the other is the numeral "5".

[0065] In more detail, suppose the result shown in FIG. 9(a) has been obtained. Each combination of the blocks a, x, y, z and w is checked to see whether its members are parts of one straight line. For example, the density in the area between x and y is dark, so some dark line may exist between them, and x and y may therefore be parts of one straight line. The same holds for y and z and for z and w, and linearity is also affirmed for x, y, z and w as a whole. By connecting these blocks through extensions of their outlines, a single straight line is obtained, as shown in FIG. 10.

[0066] Blocks a and y, on the other hand, cannot be connected. For example, the extensions of their outlines do not match, or the background color lies between them. Block a is therefore an isolated block and is removed as noise. Conversely, the dent at b is found by tracing the outline. The density of the dented part is dark, but its outline coincides with the outline obtained when x, y and z as a whole are regarded as one straight line. Part b is therefore judged to be part of the line, and a straight line as shown in FIG. 11 is finally obtained.
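The pairwise linearity test can be sketched roughly as follows. In this hypothetical version, a block is the bounding box `(top, bottom, left, right)` of a horizontal stroke. Two blocks are joined when their upper and lower outlines line up within a tolerance and a caller-supplied predicate `gap_is_dark` reports that the gap between them is darker than the background. Blocks left on their own correspond to isolated noise such as a. The names, the tolerance and the restriction to horizontal lines are all assumptions; real shapes would need a more general line fit.

```python
def outlines_match(b1, b2, tol=1):
    """True if the upper and lower outlines of two horizontal blocks line up."""
    return abs(b1[0] - b2[0]) <= tol and abs(b1[1] - b2[1]) <= tol


def chain_blocks(blocks, gap_is_dark, tol=1):
    """Greedily chain blocks (top, bottom, left, right) from left to right.

    gap_is_dark(b1, b2) is a hypothetical caller-supplied test of whether the
    pixels between two blocks are darker than the background.
    Returns a list of chains; a one-block chain is an isolated block.
    """
    chains = []
    for block in sorted(blocks, key=lambda b: b[2]):
        for chain in chains:
            if outlines_match(chain[-1], block, tol) and gap_is_dark(chain[-1], block):
                chain.append(block)
                break
        else:  # no existing chain accepted the block
            chains.append([block])
    return chains


x, y, z, w = (10, 12, 0, 7), (10, 12, 10, 17), (11, 12, 20, 27), (10, 12, 30, 37)
a = (20, 25, 12, 15)  # off the line: its outlines do not match
chains = chain_blocks([x, y, z, w, a], gap_is_dark=lambda b1, b2: True)
print(chains)  # -> two chains: [x, y, z, w] and [a]
```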

[0067] The same applies to the numeral "5" shown in FIG. 9(b). The numeral can be extracted correctly using information on curve-likeness and on character shape.

[0068] In the present invention, then, provisional character segmentation, character recognition, line-segment extraction, form-structure understanding and similar processing are performed after a provisional color separation, and their results are used to determine the final color separation. A location may be judged to have a doubtful color separation during recognition, extraction or understanding, or from its results. In that case the result is fed back, and the separation of that part is redone using it.

[0069] As another example, some image parts may contain many straight lines, and an examination of their arrangement, where necessary, may show that they make up the form part of the slip (the blank form). In that case their color can automatically be determined to be the color of the form part.

[0070] When blobs roughly the size of a character occur in sequence, their colors can be determined to be the writing or printing color of the characters.

[0071] According to the present invention, the multiple representative colors that make up a form can thus be separated automatically by color analysis of the input form. This works even when there is no color information registered in the format information, or no format information at all. Naturally, the goal is achieved even more easily when color information is defined in the format information. Next, FIG. 12 shows an example in which noise d lies near a straight line made up of regions a, b and c. Regions a and c have color C1, regions b and d have color C2, and C2 is intermediate between C1 and the background color.

[0072] For such an image, an arbitrary point is taken as the seed of a region, and the region is grown gradually from there. Pixels that are connected to the region and can be regarded as having the same color as the region are absorbed into it. This is repeated until every pixel belongs to some region, and the four regions a, b, c and d shown in the figure are extracted.
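The region-growing procedure described above might look like the following breadth-first sketch. Several details are assumptions: 4-connectivity, the use of the seed pixel's color as "the region's color", and the caller-supplied `same_color` test. The tiny image only mimics FIG. 12.

```python
from collections import deque


def grow_regions(image, same_color):
    """Label 4-connected regions by seeded region growing.

    image: 2-D list of colors. same_color(c1, c2) decides whether two colors
    count as the same. Each unlabeled pixel seeds a new region that absorbs
    connected neighbours matching the seed color, until every pixel is labeled.
    Returns (labels, number_of_regions).
    """
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for sr in range(h):
        for sc in range(w):
            if labels[sr][sc] != -1:
                continue
            seed = image[sr][sc]
            labels[sr][sc] = count
            queue = deque([(sr, sc)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < h and 0 <= nc < w and labels[nr][nc] == -1
                            and same_color(image[nr][nc], seed)):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
            count += 1
    return labels, count


img = [
    ["C1", "C1", "C2", "C2", "C1", "C1"],  # line: a = C1, b = C2, c = C1
    ["bg", "bg", "bg", "bg", "bg", "bg"],
    ["bg", "C2", "bg", "bg", "bg", "bg"],  # isolated noise d = C2
]
labels, n = grow_regions(img, lambda p, q: p == q)
print(n)  # -> 5 (regions a, b, c, d plus the background)
```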

[0073] This kind of region extraction is a region segmentation method known in the image-processing field. It belongs to the family of methods called region growing. Various other methods, such as split-and-merge and relaxation, can also be used.

[0074] Next, each region is examined with the same linearity test described above. Part b is judged to be part of the straight line, so its color is separated as C1, while part d is separated as noise, that is, as color C2.

[0075] Generally speaking, the present invention determines a representative color for each region by region segmentation, and then weighs these together to determine the representative colors of the image as a whole. Where necessary, color separation is then performed from these representative colors.

[0076] Correct color separation according to the present invention can also be achieved by introducing color-proximity information into the region segmentation. For example, if the same-color criterion is relaxed during region growing, a, b and c can be made to be detected as a single region.

[0077] That is, two regions that share a long stretch of outline during growth are merged into one larger region when their colors are judged to be close; here, C1 and C2 are judged to be close. Region d, by contrast, has no large contact with any region of a color close to C2, or touches only the background color. It is therefore ignored as noise in either case, while region b is correctly extracted as part of the straight line.
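The merge rule can be sketched with a small union-find structure. The inputs are assumptions for illustration: a table of shared-boundary lengths between regions, a color-distance test, and a minimum contact length. The colors chosen for C1, C2 and the background are invented.

```python
import math


def merge_regions(colors, shared_boundary, close, min_contact):
    """Merge regions that share a long boundary and have close colors.

    colors: {region: color}; shared_boundary: {(r1, r2): touching length};
    close(c1, c2): assumed color-proximity test.
    Returns {region: representative region} after merging.
    """
    parent = {r: r for r in colors}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for (r1, r2), length in shared_boundary.items():
        if length >= min_contact and close(colors[r1], colors[r2]):
            parent[find(r1)] = find(r2)
    return {r: find(r) for r in colors}


C1, C2, BG = (0, 0, 200), (100, 100, 230), (255, 255, 255)
colors = {"a": C1, "b": C2, "c": C1, "d": C2, "bg": BG}
contacts = {("a", "b"): 3, ("b", "c"): 3, ("d", "bg"): 8, ("a", "bg"): 6}
merged = merge_regions(colors, contacts, lambda p, q: math.dist(p, q) < 150, 2)
print(merged)  # a, b and c share one representative; d and bg stay separate
```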

[0078] This method uses not only color information but also the topological information of the pixels, that is, the geometric relationships among them. It can therefore determine representative colors and separate colors more accurately than conventional methods that rely solely on color statistics.

[0079] To separate and extract a specific color, it is effective to use both the color information of the image and shape information about what the image represents, such as figures, straight lines, curves and characters. FIGS. 13(a) and 13(b) show an example. Reference numeral 131 denotes a blue straight line whose middle has changed to blue-green. If blue alone is separated from this, the line becomes broken, as shown at 132. If the blue-green part is treated as blue to avoid this, another problem appears. Where blue-green dirt is attached to a blue straight line, as at 133, the image is separated with the dirt still attached, as shown at 134.

[0080] In the present invention, therefore, shape analysis is first performed on the blue parts, and parts P and R are recognized as straight lines. Next, part Q, regarded as the faded portion, is examined. Two conditions are checked: the colors of P, Q and R must be close, and P, Q and R together must have a shape that can be regarded as a straight line. If both hold, all of P, Q and R are treated as a blue area and separated. In this way, even in a case like 133, the dirt area A is not mistakenly separated as blue.

[0081] FIG. 14 shows a blue line and a red line overlapping. A1, C and A2 form the blue line, and B1, C and B2 form the red line. C is the overlap area and is purple.

[0082] In this case, the usual method cannot extract C, whether red or blue is being extracted. Always extracting purple together with red is not a solution either. It would also extract parts that are genuinely purple and must not be extracted, which is undesirable.

[0083] When extracting blue, for example, the areas in contact with blue are therefore examined first. One of them may have a color that could be a mixture of that blue with another color. In that case the neighborhood of the mixed-color area is examined further. If an area of the other color forming the mixture is found there, the mixed-color area is also extracted as a blue area.
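The rule above can be expressed as a small graph walk over an area-adjacency map. The mixture table, the area names (with an extra, genuinely purple area "P" added for contrast) and the function name are hypothetical. The shape-consistency check from the following paragraph is not included.

```python
# Hypothetical mixture table: the color seen where two inks overlap.
MIXTURES = {frozenset({"blue", "red"}): "purple"}


def extract_with_mixtures(target, area_color, neighbours):
    """Return the areas to extract as `target`, including mixed-color areas.

    area_color: {area: color name}; neighbours: {area: set of adjacent areas}.
    A mixed area touching a target area is included only if it also touches
    an area of the other color that forms the mixture.
    """
    selected = {a for a, col in area_color.items() if col == target}
    for a in list(selected):
        for m in neighbours[a]:
            for pair, mixed in MIXTURES.items():
                if target in pair and area_color[m] == mixed:
                    (other,) = pair - {target}
                    if any(area_color[n] == other for n in neighbours[m]):
                        selected.add(m)
    return selected


area_color = {"A1": "blue", "A2": "blue", "B1": "red", "B2": "red",
              "C": "purple", "P": "purple"}  # P: genuinely purple, touches A1 only
neighbours = {"A1": {"C", "P"}, "A2": {"C"}, "B1": {"C"}, "B2": {"C"},
              "C": {"A1", "A2", "B1", "B2"}, "P": {"A1"}}
print(sorted(extract_with_mixtures("blue", area_color, neighbours)))  # -> ['A1', 'A2', 'C']
```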

[0084] Separation becomes more accurate still if three further checks are made: that the shapes of the blue areas A1 and A2 represent straight lines, that the areas B1 and B2 of the other color forming the mixture also represent straight lines, and that area C is consistent with those lines.

[0085] The shape check need not be limited to straight lines. It may also check for curves, figures, character shapes, or the likelihood of being a character.

[0086] FIG. 15 shows another example. If area C is reddish black, colors can be separated in almost the same way as above. If, however, a mixture of red and black is simply black, red can be separated by connecting x with z and y with w. Even when area B2 does not exist, red can be separated by estimating the curve connecting x and z from the shape of area B1.

[0087] Color separation that uses both line-width information and color information is effective. If the color-separation specification is written in the format information, it includes information on the color to be separated and, in addition, information such as a line-width threshold and, for hatching, the line pitch.

[0088] FIG. 16 shows a circular area 162 set on a portion of a line 161 observed on a form. One known method examines the presence of black pixels in N directions from the center of gravity of the black area. It takes as the line width the length in the direction in which the run of consecutive black pixels is shortest. Alternatively, the line width can be estimated from the run lengths of the vertical and horizontal black pixels. Methods for obtaining local line-width information in either way are known.
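The directional method described above can be sketched as follows. Assumptions: N = 4 sampled directions (0°, 45°, 90° and 135°), a given sample point rather than a computed center of gravity, and diagonal runs scaled by √2 to approximate their length. The names are illustrative.

```python
import math

# Assumed N = 4 directions as (row step, column step): 0, 90, 45, 135 degrees.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def run_through(img, r, c, dr, dc):
    """Count consecutive black pixels through (r, c) along +/-(dr, dc)."""
    n = 1
    for sign in (1, -1):
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < len(img) and 0 <= cc < len(img[0]) and img[rr][cc]:
            n += 1
            rr += sign * dr
            cc += sign * dc
    return n


def local_line_width(img, r, c):
    """Shortest black run through (r, c) over the sampled directions."""
    return min(run_through(img, r, c, dr, dc) * math.hypot(dr, dc)
               for dr, dc in DIRECTIONS)


# A horizontal line three pixels thick (rows 1-3) on a 5x10 image.
img = [[0] * 10] + [[1] * 10 for _ in range(3)] + [[0] * 10]
print(local_line_width(img, 2, 5))  # -> 3.0
```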

[0089] Effective color separation can thus be achieved by combining two facts about each part: whether or not it has a specific color, and whether or not it is no wider than a predetermined line width.

[0090] With this method, a character portion 171 of the same color as the hatching but with a wide line width can be extracted from a hatched area on a form, as shown in FIG. 17. At the same time, a portion of a different color, such as 172, can be kept separately after the hatching is erased, even if its line width is narrow.
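The combined color and width test can be sketched as a filter over pixels that already carry a local width estimate. The tuple layout, the color names and the threshold `max_width` are assumptions. In practice the width would come from an estimator such as the run-length methods of [0088].

```python
def remove_hatching(pixels, hatch_color, max_width):
    """Erase thin strokes of the hatching color and keep everything else.

    pixels: list of (position, color, local_width) tuples (assumed layout).
    """
    kept = []
    for pos, color, width in pixels:
        if color == hatch_color and width <= max_width:
            continue  # thin line in the hatching color: part of the hatching
        kept.append(pos)
    return kept


pixels = [
    ((0, 0), "gray", 1.0),  # hatching line
    ((0, 5), "gray", 6.0),  # wide stroke of character 171, same color
    ((0, 9), "red", 1.0),   # thin stroke of part 172, different color
]
print(remove_hatching(pixels, "gray", max_width=2.0))  # -> [(0, 5), (0, 9)]
```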

[0091] The main ways of specifying color separation are: separating a specific color, separating colors other than a specific color, separating achromatic colors, separating chromatic colors, separating white, separating black, or a combination of these.

[0092] When a specific color is separated, the decision is based mainly on the hue (H). When achromatic and chromatic colors are separated, it is based mainly on the saturation (S). When white or black is separated, the decision is made from the lightness (V) among pixels with a small saturation (S).
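These rules map directly onto Python's standard `colorsys.rgb_to_hsv`, which takes and returns values in [0, 1]. The saturation threshold of 0.2 and the lightness split of 0.5 below are assumed values, not taken from the source.

```python
import colorsys


def classify(rgb, sat_threshold=0.2, v_split=0.5):
    """Rough HSV-based classification of an 8-bit RGB pixel (thresholds assumed)."""
    r, g, b = (x / 255 for x in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s < sat_threshold:  # achromatic: decide white or black by lightness
        return "white" if v >= v_split else "black"
    return f"chromatic (hue {h * 360:.0f} deg)"  # specific colors: decide by hue


print(classify((250, 250, 250)))  # -> white
print(classify((0, 0, 255)))      # -> chromatic (hue 240 deg)
```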

[0093] From the color separation result, the color most likely to be the background color of the form can be selected, for example the most dominant representative color, that is, the representative color with the largest total area. It is useful to output this color together with the recognition result, so that the automatically estimated background color accompanies the recognition result.
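Choosing the representative color with the largest total area amounts to counting pixels per label. The sketch below assumes that color separation has already produced a 2-D grid of representative-color labels; the grid itself is invented.

```python
from collections import Counter


def estimate_background(label_image):
    """Return the representative color with the largest total area (pixel count)."""
    counts = Counter(color for row in label_image for color in row)
    return counts.most_common(1)[0][0]


labels = [
    ["white", "white", "black", "white"],
    ["white", "red", "black", "white"],
    ["white", "white", "white", "white"],
]
print(estimate_background(labels))  # -> white
```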

[0094] It is also useful to output the color of the form part of the recognized form, or the color of a specific part of the form.

[0095] It is also useful to output each character code obtained as a recognition result together with related colors: the color of that character on the form, the background color at the position where the character is written, the color of the character frame surrounding it, and so on.

[0096] The color information of the form part can also be used, together with ruled-line information and the like, to identify the form type. The system can also be configured to act on the extracted color information: it can raise an alarm to the operator, display a message to that effect, or change the correction-screen display.

[0097] Being able to change the color separation means for each partial area of the form is very important.

[0098] First, the format information describes in advance the definition of each partial area, and whether normal binarization or color separation is to be performed in that area. Suppose that the form shown in FIG. 18 is input during actual processing. In this form, area 181 is to be recognized. Area 182 is an area whose image is only to be cut out and output as image data as it is, and the format information is assumed to say so. In area 181, the preprinted part showing the character frames must be removed before the characters are recognized. In area 182, on the other hand, preprinted parts such as character frames and item names must not be erased, so their colors are not removed. Applying the same color separation uniformly to both areas would therefore cause problems.

[0099] The specific processing in this case is as follows. Based on the format information, the image of area 181 is input as color data, whereas the image of area 182 is input as a binary image. Alternatively, for area 182 a monochrome image may be obtained by averaging the R, G and B values of the input color data and then binarized. Next, based on the format information, the color of the character frames in area 181 is separated and erased, and character recognition is performed on the result. For area 182, the image data is output as it is.
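The averaging-and-binarizing alternative for area 182 is short enough to show in full. The fixed threshold of 128 is an assumption; the text only says that the monochrome image is binarized.

```python
def to_binary(color_image, threshold=128):
    """Average R, G and B into a gray level, then binarize (1 = black, 0 = white)."""
    return [[1 if sum(px) / 3 < threshold else 0 for px in row]
            for row in color_image]


img = [[(255, 255, 255), (20, 20, 20)],
       [(200, 30, 30), (240, 230, 250)]]
print(to_binary(img))  # -> [[0, 1], [1, 0]]
```

Note that the red pixel (200, 30, 30) averages to about 87 and becomes black. For a region that has to keep the seal apart, color separation is needed instead.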

[0100] In this way, time-consuming color separation can be limited to part of the form. Ordinary binarization runs quickly, so restricting color separation to part of the form shortens the overall processing time.

[0101] In the above example, whether or not to perform color separation is switched for each partial area. It is also possible to select the color separation means itself from among several methods.

[0102] For example, the format information can be registered so that two processes are started: in the seal area of area 182, a color separation process set up to separate red or orange; in area 181, a process that separates black. Character recognition and seal extraction can then be performed quickly. This is faster than conventional methods that separate all representative colors over all areas, such as the method of Japanese Patent Application Laid-Open No. 3-14077.
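Per-area dispatch of this kind can be sketched as a table lookup. The structure of `FORMAT_INFO`, the area keys, the process names and the placeholder process functions are all hypothetical. Each placeholder only reports what a real separation routine would produce.

```python
# Placeholder processes; a real system would return image layers, not strings.
PROCESSES = {
    "binarize": lambda area: f"binary image of area {area}",
    "separate_black": lambda area: f"black layer of area {area}",
    "separate_red_orange": lambda area: f"red/orange layer of area {area}",
}

# Hypothetical format information: the separation process to start per area.
FORMAT_INFO = {
    "181": "separate_black",            # characters to recognize
    "182": "binarize",                  # image output as it is
    "182-seal": "separate_red_orange",  # seal area inside 182
}


def process_form(format_info):
    """Run only the separation process registered for each area."""
    return {area: PROCESSES[name](area) for area, name in format_info.items()}


result = process_form(FORMAT_INFO)
print(result["181"])  # -> black layer of area 181
```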

[0103] In the present invention, these specifications may be accompanied by further specifications. These select, for the area, a particular character segmentation means, character recognition means, post-recognition processing means and so on from among several methods.

[0104] A convenient mechanism works on a specific part of a form. It separates a specific color there and determines whether that color is present. If it is, the mechanism recognizes the object of that color and outputs the recognition result. If it is judged that no recognition object of that color exists, this fact is output. The color of the object written in that part may also be output at that time. The mechanism may of course be configured to output only these judgments and the color information, without recognizing the object.

[0105] A seal determination unit can also be incorporated into the apparatus. It determines whether a seal is present in that part and what kind of seal it is, and outputs the result as a recognition result. For example, the system can be configured to notify or warn the operator automatically when there is no seal or when the prescribed seal has not been stamped. That is, on the basis of the extracted seal information, the system can raise an alarm to the operator, display a message to that effect, or change the correction-screen display.

[0106] The seal determination unit works, for example, on an image separated in a specific color such as red. It can judge that a seal is present from three facts: an object of that color exists, its outer contour is a circle or an ellipse, for example, and its internal shape is complex to some degree.
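A heuristic along these lines is sketched below for a binary mask of the separated seal color. Every concrete choice is an assumption and not the patent's method:

- Ellipse-likeness is measured as the ratio of the area enclosed by the outer contour to the area of the ellipse inscribed in the bounding box, accepted between 0.8 and 1.2.
- Internal complexity is measured as the number of enclosed background holes, requiring at least two.
- The `ring` helper only builds test images.

```python
import math
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def _outside(mask):
    """Mark non-seal pixels that are 4-connected to the image border."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if (r in (0, h - 1) or c in (0, w - 1)) and not mask[r][c]:
                out[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr][nc] and not out[nr][nc]:
                out[nr][nc] = True
                queue.append((nr, nc))
    return out


def _count_holes(mask, out):
    """Count 4-connected background components enclosed by seal pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    holes = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] or out[r][c] or seen[r][c]:
                continue
            holes += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in NEIGHBOURS:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < h and 0 <= nc < w and not mask[nr][nc]
                            and not out[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return holes


def looks_like_seal(mask, ratio_range=(0.8, 1.2), min_holes=2):
    """Heuristic seal test on a binary mask of the separated seal color."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pts:
        return False  # no object of the seal color at all
    height = max(r for r, _ in pts) - min(r for r, _ in pts) + 1
    width = max(c for _, c in pts) - min(c for _, c in pts) + 1
    out = _outside(mask)
    enclosed = sum(not v for row in out for v in row)  # area inside outer contour
    ratio = enclosed / (math.pi / 4 * width * height)  # about 1.0 for an ellipse
    return ratio_range[0] <= ratio <= ratio_range[1] and _count_holes(mask, out) >= min_holes


def ring(size=25, r_in=7, r_out=10, bars=True):
    """Hypothetical test image: a ring, optionally with a cross inside it."""
    c = size // 2
    return [[int((x - c) ** 2 + (y - c) ** 2 <= r_out ** 2 and
                  ((x - c) ** 2 + (y - c) ** 2 >= r_in ** 2 or (bars and c in (x, y))))
             for x in range(size)] for y in range(size)]


print(looks_like_seal(ring()))            # -> True (ring with internal strokes)
print(looks_like_seal(ring(bars=False)))  # -> False (plain ring: too simple inside)
```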

[0107] The similarity of seal shapes can be judged by pattern matching, as commonly used in pattern recognition.

[0108] The results of color separation of the form, and the results of further processing them, may be displayed on a screen, with the system proceeding to the next step only after the operator confirms them. If the operator is not satisfied with a result, a mechanism may be provided for correcting it directly. Alternatively, the system may be configured so that some parameter can be changed and the processing repeated.

[0109] When a color that is difficult to classify is found during color separation, it may be displayed, together with its surroundings where necessary, to prompt the operator to enter the correct answer. Subsequent color separation can then be based on this correct-answer data.

[0110] First, suppose the part corresponding to an entry field contains parts separated into a first specific color, for example black, and these are recognized as a character string. The string is then output as the recognition result. Where necessary, the same processing may be repeated for further specific colors, up to an N-th. If such a specific color differs from the form color of the slip, the character frames and entry fields are regarded as erased in the color-separated image data, and the "frameless character recognition method" is applied. Conversely, if the colors are the same, the "framed character recognition method" is applied. If nothing is found anywhere in the entry field, a blank is output.
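The selection logic above can be sketched as a loop over the separated color layers in priority order. The input format `(color, ink_pixel_count)` is hypothetical. Treating a non-zero ink count as "recognized as a character string" is a simplification of the real test.

```python
def recognize_field(layers, form_color):
    """Choose what to do with an entry field.

    layers: list of (color, ink_pixel_count) for the 1st..N-th specific
    colors, in priority order (hypothetical input format).
    Returns (color, method) or (None, "blank").
    """
    for color, ink in layers:
        if ink == 0:
            continue  # nothing of this color in the field
        if color != form_color:
            return color, "frameless"  # frames vanished with the form color
        return color, "framed"  # characters share the form color, frames remain
    return None, "blank"


print(recognize_field([("black", 120)], form_color="red"))  # -> ('black', 'frameless')
```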

[0111] Alternatively, the following is also possible. Suppose the entry field contains parts separated as a color different from the form color of the slip, and these are recognized as a character string. They are then output. In this case the character frames and entry fields are regarded as erased, and the "frameless character recognition method" is applied. If the area covered by the form color within that entry field is small, the characters in the field are regarded as written in the same color as the form, and the "framed character recognition method" is applied. If nothing is found anywhere in the entry field, a blank is output. Either of these processes may be performed first.

[0112] The "frameless character recognition method" segments and recognizes characters on the assumption that there is no character frame. Various such methods are known.

[0113] The "framed character recognition method" segments and recognizes characters on the assumption that character frames exist. It comes in two kinds. In the "known-shape framed character recognition method", the positions and shapes of the frames are given in advance, for example from the format information. In the "unknown-shape framed character recognition method", they are not. The contents of the format information determine which of the two is used.

[0114] Each of these methods also has a "ruled-line-contact character recognition method" and a "no-ruled-line-contact character recognition method". One of them is selected according to the actual state of the pattern.

[0115] These are executed in the following order. First, character frame estimation determines the character frame positions or candidates for them. Next, the character parts are estimated, and the degree of contact between characters and frames is examined. If there is little or no contact, "character recognition without ruled-line contact" is selected. The system can, of course, also be configured not to distinguish between the contact and non-contact cases. Characters are then segmented and recognized on the basis of the ruled-line positions obtained. Where characters touch ruled lines, processing is performed to separate the characters from the ruled lines.
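The contact check in this procedure can be sketched as the fraction of character pixels that touch a frame pixel. The 4-adjacency definition of "touching", the 0.05 threshold and the method labels are assumptions made for illustration.

```python
def contact_ratio(char_pixels, frame_pixels):
    """Fraction of character pixels that are 4-adjacent to a frame pixel."""
    frame = set(frame_pixels)
    touching = sum(
        1 for r, c in char_pixels
        if {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)} & frame
    )
    return touching / len(char_pixels) if char_pixels else 0.0


def select_method(char_pixels, frame_pixels, threshold=0.05):
    """Pick the contact or no-contact recognizer from the contact ratio."""
    if contact_ratio(char_pixels, frame_pixels) <= threshold:
        return "no_contact"
    return "contact"


frame = [(r, 0) for r in range(10)]  # a vertical frame line in column 0
far = [(2, 3), (2, 4), (3, 4), (4, 5)]
touching = [(5, 1), (5, 2), (6, 2), (7, 3)]
print(select_method(far, frame), select_method(touching, frame))  # -> no_contact contact
```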

[0116] Known prior-art techniques can be used for ruled-line separation, and for character segmentation and recognition, whether or not touching characters have to be separated.

[0117] With this configuration, a recognition system considerably more accurate than one that always uses the same recognition method can be built.

[0118] Color separation and color-misregistration correction can be performed simultaneously. Correction of color bias, and correction that equalizes the degree of color blur, can be performed at the same time as well. For example, when black parts are separated and the misregistration correction parameters are known in advance, the R, G and B planes can be shifted during the color separation itself (the extraction of a specific color). The correction parameters may not be known in advance. Even then, if colors in a complementary relationship appear on both sides of a black part, the misregistration can be estimated from them. The black part is then separated on the assumption that each R, G and B plane is shifted by the estimated amount. The same processing is possible for other colors.
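The known-parameter case can be sketched as a single pass that reads each channel at its corrected offset while testing for black. The horizontal-only shifts, the threshold of 80 and treating out-of-range samples as white are assumptions. Estimating the shifts from the complementary fringes is not shown.

```python
def separate_black_with_shift(img, shifts, threshold=80):
    """Extract black pixels while compensating a per-channel horizontal shift.

    img: 2-D list of (R, G, B). shifts: (dR, dG, dB) column offsets, assumed
    known in advance. A pixel is black if all three shifted channel values are
    below `threshold`; samples outside the image count as white.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = []
            for ch, d in enumerate(shifts):
                sx = x + d
                values.append(img[y][sx][ch] if 0 <= sx < w else 255)
            out[y][x] = int(all(v < threshold for v in values))
    return out


# A one-pixel black dot whose R plane is misregistered one pixel to the right:
# it shows up as a red pixel next to a cyan pixel (complementary fringes).
W = (255, 255, 255)
img = [[W, W, (255, 0, 0), (0, 255, 255), W]]
print(separate_black_with_shift(img, (0, 0, 0)))  # -> [[0, 0, 0, 0, 0]]
print(separate_black_with_shift(img, (1, 0, 0)))  # -> [[0, 0, 1, 0, 0]]
```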

[0119] Next, the color representation used in the image processing unit 4 will be described. For the representative colors obtained by color separation (that is, colors chosen while ignoring subtle color differences), or for the input image data itself, the image processing unit 4 expresses each line of the image as run lengths. It creates a pointer to the representation of each line, builds a table of these pointers, and stores the pointer table together with the representative-color run-length data as the image data. Image processing or character recognition is then performed on this color representation.
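A minimal C sketch of the representation in [0119] follows. Each line becomes a sequence of (color code, run length) pairs, and a table gives the index of each line's first run. Several assumptions apply. The image is taken to be already mapped to one-byte representative-color codes. Indices stand in for machine pointers. The names `Run` and `encode_color_runs` are hypothetical.

```c
#include <stddef.h>

/* One (C, L) pair: representative-colour code and run length. */
typedef struct { unsigned char color; unsigned short len; } Run;

/* Encodes a width x height image of colour codes (width <= 65535).
 * runs    : output, worst case width * height entries.
 * line_ptr: height + 1 entries; line y owns runs[line_ptr[y] .. line_ptr[y+1]).
 * Returns the total number of runs.                                        */
size_t encode_color_runs(const unsigned char *img, int width, int height,
                         Run *runs, size_t *line_ptr)
{
    size_t n = 0;
    for (int y = 0; y < height; y++) {
        const unsigned char *row = img + (size_t)y * width;
        line_ptr[y] = n;                     /* pointer to this line's runs */
        for (int x = 0; x < width; x++) {
            if (x > 0 && row[x] == row[x - 1])
                runs[n - 1].len++;           /* same colour: extend run */
            else {
                runs[n].color = row[x];      /* colour change: new run */
                runs[n].len = 1;
                n++;
            }
        }
    }
    line_ptr[height] = n;                    /* end marker */
    return n;
}
```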

[0120] First, character recognition and image processing can be performed after converting only part of a color run-length image back into the original bitmap data. Without a per-color run-length representation or per-color pointers, part of the data could not be decoded on its own, and the whole image had to be decoded. A method is also known that performs character recognition on an image representation combining a pointer table with black-and-white run-length data (USP 4426731), but as it stands it cannot handle color images. To convert a specific color in a specific part of the image into bitmap data, there are two options. The first is to follow the pointers of the lines covering that part to the color run-length data and convert only the runs of that color into a bitmap. The second is to refer only to the pointers of that color for that part and build the bitmap from the run-length data they point to. Either way, only the specific color in the specific part is extracted, quickly, and character recognition or image processing can be based on the result. This is much faster than decoding the whole page before processing. Processing the run-length representation directly, without decoding, is faster still.
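The first option in [0120], following the line pointers and decoding only the runs of one color inside a window, might look like the C sketch below. It reuses the same hypothetical `Run` layout and index-based line table as a plain encoder would produce. Lines outside the window are never visited. The function name and window convention (half-open rectangle) are assumptions.

```c
#include <stddef.h>
#include <string.h>

typedef struct { unsigned char color; unsigned short len; } Run;

/* Decodes only lines y0..y1-1 and columns x0..x1-1, writing 1 into out[]
 * where the pixel has colour `target`.  out is (x1-x0) * (y1-y0) bytes,
 * row-major.  Runs past x1 on a line are skipped without being read.   */
void decode_region_color(const Run *runs, const size_t *line_ptr,
                         int x0, int y0, int x1, int y1,
                         unsigned char target, unsigned char *out)
{
    int w = x1 - x0;
    memset(out, 0, (size_t)w * (size_t)(y1 - y0));
    for (int y = y0; y < y1; y++) {
        int x = 0;                               /* start column of run i */
        for (size_t i = line_ptr[y]; i < line_ptr[y + 1] && x < x1; i++) {
            int end = x + runs[i].len;
            if (runs[i].color == target) {
                int a = x > x0 ? x : x0;         /* clip run to the window */
                int b = end < x1 ? end : x1;
                for (int k = a; k < b; k++)
                    out[(size_t)(y - y0) * w + (k - x0)] = 1;
            }
            x = end;
        }
    }
}
```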

[0121] The representative colors appearing on a form are either described in advance in the format information and extracted on that basis, or, failing that, extracted automatically. Based on these representative colors, the image is represented as follows.

[0122] Let C be a code representing a representative color, and L the length over which that color continues (or an encoded form of that length). The pairs CL are obtained, and one line of the image is represented by their sequence. FIG. 19 shows an example of this coding. It is a color extension of the run-length representation.

[0123] Alternatively, each representative color can be run-length encoded separately. In this case, the run-length representations can be grouped either line by line, with the runs of each representative color collected for each line (200 in FIG. 20(a)), or for the whole page (201 in FIG. 20(b)). Likewise, the pointers to the head of each line's run-length representation can be grouped either line by line, with the pointers of each representative color collected for each line (202 in FIG. 20(c)), or per representative color for the whole page (203 in FIG. 20(d)).
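A per-color encoding as in [0123] can be sketched in C by treating one representative color as a binary plane. Runs then alternate between background and that color, and the color alone no longer needs to be stored. The sketch assumes each line starts with a (possibly zero-length) background run. Calling it once per representative color yields one whole-page pointer table per color, which is one possible layout among those of FIG. 20. Names and conventions are hypothetical.

```c
#include <stddef.h>

/* Run-length codes one colour of a colour-code image.
 * runs: alternating lengths, background first (even index = background,
 *       odd index = `color`); worst case (width + 1) * height entries.
 * ptr : height + 1 entries; ptr[y] is the first run of line y.
 * Returns the total number of runs.                                    */
size_t encode_plane_runs(const unsigned char *img, int width, int height,
                         unsigned char color, unsigned short *runs,
                         size_t *ptr)
{
    size_t n = 0;
    for (int y = 0; y < height; y++) {
        const unsigned char *row = img + (size_t)y * width;
        int inside = 0;              /* 0: background run, 1: colour run */
        ptr[y] = n;
        runs[n++] = 0;               /* leading background run */
        for (int x = 0; x < width; x++) {
            int hit = row[x] == color;
            if (hit != inside) {     /* transition: open the next run */
                runs[n++] = 0;
                inside = hit;
            }
            runs[n - 1]++;
        }
    }
    ptr[height] = n;
    return n;
}
```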

[0124] It is also possible to quantize the R, G and B values and apply the same run-length coding to each of R, G and B. These codings work by regarding slight color differences between adjacent pixels as noise and ignoring them, that is, by approximating the pixels with representative colors or quantized density levels.

[0125] Examples of image processing applied directly to run-length images are disclosed in, for example, Japanese Patent Application Laid-Open Nos. 62-246138 and 62-307615. These are all image processing methods for black-and-white run-length data, but they can be applied to the run-length representation of each color. Character recognition can also be performed on the basis of these image processing techniques.

[0126] For example, to detect a long straight line of a specific color in a specific area, the pointers indicating that area are referenced to obtain the run-length data of that color for the portion containing the area. The long runs contained in the area are selected, and adjacent long runs are merged to detect the straight line.
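The line detection of [0126] can be sketched in C on top of a color run table with per-line pointers. Runs of the target color that are at least `min_len` long and lie inside the area are kept. A kept run joins an existing line if that line ended on the previous row and overlaps it horizontally. The overlap-merge rule, the `Box` type and all names are illustrative assumptions, not the patent's specified procedure.

```c
#include <stddef.h>

typedef struct { unsigned char color; unsigned short len; } Run;
typedef struct { int x0, x1, y0, y1; } Box;   /* x half-open, y inclusive */

/* Scans rows ry0..ry1-1 of the run table and returns the number of
 * horizontal lines found (at most max_lines), described in lines[].   */
int detect_hlines(const Run *runs, const size_t *line_ptr,
                  int rx0, int ry0, int rx1, int ry1,
                  unsigned char target, int min_len,
                  Box *lines, int max_lines)
{
    int n = 0;
    for (int y = ry0; y < ry1; y++) {
        int x = 0;
        for (size_t i = line_ptr[y]; i < line_ptr[y + 1]; i++) {
            int s = x, e = x + runs[i].len;
            x = e;
            if (runs[i].color != target || runs[i].len < min_len)
                continue;                         /* wrong colour or short */
            if (s < rx0 || e > rx1)
                continue;                         /* not inside the area */
            int merged = 0;
            for (int k = 0; k < n && !merged; k++) {
                /* adjacent long run on the previous row that overlaps */
                if (lines[k].y1 == y - 1 && s < lines[k].x1 && e > lines[k].x0) {
                    if (s < lines[k].x0) lines[k].x0 = s;
                    if (e > lines[k].x1) lines[k].x1 = e;
                    lines[k].y1 = y;
                    merged = 1;
                }
            }
            if (!merged && n < max_lines) {
                lines[n].x0 = s; lines[n].x1 = e;
                lines[n].y0 = y; lines[n].y1 = y;
                n++;
            }
        }
    }
    return n;
}
```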

[0127] As another example, image processing performed directly on a color run-length image is described below. The image processing in this case is color misregistration correction.

[0128] Performing image processing, character segmentation and character recognition on this coded information makes processing faster than, for example, processing RGB images, and also reduces the amount of data needed to represent the image.

[0129] Color misregistration can also be corrected on data represented as color run lengths. For example, when the image is separated into R, G and B and a run representation is made for each (203 in FIG. 20(d)), a shift by a whole number of pixels is easily achieved simply by changing the length of the first run in each line.
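A whole-pixel shift as in [0129] can be sketched in C for one line of one channel. Runs are assumed to alternate and to start with a background run, so changing that first run moves every stroke. Two details are assumptions added to keep the sketch well-defined: the line is trimmed or padded at its right end to stay `width` pixels long, and a left shift is limited to the length of the leading background run.

```c
/* Hypothetical sketch for one line of one channel.
 * runs : alternating lengths, background first (even index = background).
 * k    : shift in pixels, > 0 right, < 0 left.
 * The tail is trimmed or padded so the line stays `width` pixels long.
 * runs must have room for n + 1 entries.  Returns the new run count.   */
int shift_line_runs(unsigned short *runs, int n, int width, int k)
{
    if (k < 0 && -k > (int)runs[0])
        k = -(int)runs[0];   /* sketch limit: only the leading background shrinks */
    runs[0] = (unsigned short)((int)runs[0] + k);   /* the actual shift */

    int excess = -width;
    for (int i = 0; i < n; i++)
        excess += runs[i];

    while (excess > 0 && n > 0) {    /* right shift: drop what falls off */
        if ((int)runs[n - 1] <= excess) {
            excess -= runs[n - 1];
            n--;
        } else {
            runs[n - 1] = (unsigned short)(runs[n - 1] - excess);
            excess = 0;
        }
    }
    if (excess < 0) {                /* left shift: pad with background */
        if ((n - 1) % 2 == 0)        /* last run is already background */
            runs[n - 1] = (unsigned short)(runs[n - 1] - excess);
        else
            runs[n++] = (unsigned short)(-excess);
    }
    return n;
}
```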

[0130] A shift of less than one pixel is achieved as follows. First, the values of the pixels at the boundaries of each run are examined, and the length of each run is converted from an integer to a real number based on those density values. The higher the density of the run's color in the boundary pixel, the longer the run is made; the lower the density, the shorter. The total length of all runs is kept unchanged in this step.

[0131] Next, the length of the first run is changed according to the shift amount. If the runs of a line are expressed from left to right, the first run is shortened by the shift amount for a shift to the left, and lengthened by it for a shift to the right. The length of each run is then converted back to an integer so that the total length of the runs is not changed. This realizes the shift of the image on the run representation.
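The two steps of [0130] and [0131] can be sketched in C as follows. The mapping from boundary-pixel density to a real-valued boundary offset is not specified here, so it is assumed to have been computed already and passed in as `off[]`. Rounding the cumulative boundaries, rather than each run separately, is one way (an assumption of this sketch) to integerize the runs without changing their rounded total.

```c
/* Hypothetical sketch for one line of one channel.
 * runs : integer run lengths, n entries.
 * off  : n - 1 real offsets, one per interior boundary, derived from the
 *        boundary-pixel densities by some rule not shown.  Only interior
 *        boundaries move, so this step keeps the total length.
 * shift: real shift, i.e. the amount added to the first run.
 * out  : resulting integer runs, obtained by rounding the cumulative
 *        boundaries so rounding errors do not accumulate.            */
void subpixel_shift(const int *runs, int n, const double *off,
                    double shift, int *out)
{
    double pos = 0.0;   /* integer boundary before refinement */
    long prev = 0;      /* previous rounded boundary          */
    for (int i = 0; i < n; i++) {
        pos += runs[i];
        double end = pos + shift + (i < n - 1 ? off[i] : 0.0);
        long b = (long)(end + 0.5);   /* round half up; assumes end >= 0 */
        out[i] = (int)(b - prev);
        prev = b;
    }
}
```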

[0132] When the image is expressed as color code plus run length (201 in FIG. 20(b)), sequences of three or four consecutive runs are examined in each line's run representation. The shift amount and the colors of the two outer runs of such a sequence are used to judge whether the color and length of the run sandwiched between them could have been produced by color misregistration. If misregistration is the cause, the sandwiched run is eliminated, and the lengths of the two outer runs are changed as they should be, given the shift of each color. Alternatively, each run can be separated into, for example, R, G and B, which in effect gives the same representation as 203 in FIG. 20(d); the color shift is then applied, and the result is converted back to the original representation (201 in FIG. 20(b)).
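The three-run case of [0132] can be sketched in C with a small rule table. The example assumes R is displaced one pixel to the right while G and B are taken as the reference. Then a black stroke on white paper gets a red fringe on its left edge, which truly belongs to the stroke, and a cyan fringe on its right edge, which truly belongs to the paper. In both cases the fringe is absorbed into the right-hand neighbor. The color letters, rule table and names are hypothetical, and four-run sequences are not handled.

```c
/* A run in colour-code + run-length form; colours are single letters. */
typedef struct { char color; int len; } CRun;

/* (left, middle, right) colour pattern that a shift can produce, and
 * whether the middle run really belongs to the right or left neighbour. */
typedef struct { char left, mid, right; int to_right; } FringeRule;

/* Copies in[] to out[], deleting fringe runs no longer than max_shift
 * and adding their length to the neighbour named by the matching rule.
 * Returns the number of runs written to out.                          */
int remove_fringes(const CRun *in, int n, const FringeRule *rules,
                   int nrules, int max_shift, CRun *out)
{
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (m > 0 && i + 1 < n && in[i].len <= max_shift) {
            int r;
            for (r = 0; r < nrules; r++)
                if (rules[r].left == out[m - 1].color &&
                    rules[r].mid == in[i].color &&
                    rules[r].right == in[i + 1].color)
                    break;
            if (r < nrules) {
                if (rules[r].to_right) {      /* give length to right run */
                    out[m] = in[i + 1];
                    out[m].len += in[i].len;
                    m++;
                    i++;                      /* right run consumed */
                } else {
                    out[m - 1].len += in[i].len;
                }
                continue;
            }
        }
        out[m++] = in[i];
    }
    return m;
}
```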

[0133] Either way, such means make it possible to shift each color of the color run representation without first converting it back to the original image.

[0134] Performing the color separation described above in two stages can speed up processing. In the first stage, color separation is performed without discarding too many fine color differences, and a first color run-length representation is produced from the result. A second color separation is then performed on the data of this run-length representation. If necessary, a second color run-length representation is produced from the result of the second separation and used as input data for image processing and character recognition.

[0135] Color separation generally takes a long time. Suppose a simple, fast method is used for the first separation to separate the colors coarsely, and the slower second separation, which determines the representative colors precisely, is applied to its result. The overall separation is then faster than applying the second separation directly to the original image.
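The two-stage idea of [0134] and [0135] can be sketched in C for one line. Stage 1 quantizes each channel coarsely and run-length codes the result, which is cheap and per pixel. Stage 2 works per run, so its cost grows with the number of runs rather than pixels. In this sketch, a nearest-palette assignment stands in for the slower, more precise second method, and the palette is assumed to come from the format information. All names are hypothetical.

```c
typedef struct { unsigned char r, g, b; } RGB;
typedef struct { RGB c; int len; } CoarseRun;
typedef struct { int label; int len; } LabelRun;

/* Keep the top 8 - shift bits and return the bin centre
 * (shift = 6: 0..63 -> 32, ..., 192..255 -> 224).        */
static unsigned char quant(unsigned char v, int shift)
{
    return (unsigned char)(((v >> shift) << shift) + ((1 << shift) >> 1));
}

/* Stage 1: coarse quantisation plus run-length coding of one line. */
int coarse_runs(const RGB *line, int width, int shift, CoarseRun *out)
{
    int n = 0;
    for (int x = 0; x < width; x++) {
        RGB q = { quant(line[x].r, shift), quant(line[x].g, shift),
                  quant(line[x].b, shift) };
        if (n > 0 && out[n - 1].c.r == q.r && out[n - 1].c.g == q.g &&
            out[n - 1].c.b == q.b)
            out[n - 1].len++;
        else {
            out[n].c = q;
            out[n].len = 1;
            n++;
        }
    }
    return n;
}

/* Stage 2: label each coarse run with the nearest palette colour
 * (squared RGB distance) and merge neighbours with the same label. */
int refine_runs(const CoarseRun *in, int n, const RGB *pal, int npal,
                LabelRun *out)
{
    int m = 0;
    for (int i = 0; i < n; i++) {
        int best = 0;
        long bestd = -1;
        for (int k = 0; k < npal; k++) {
            long dr = in[i].c.r - pal[k].r, dg = in[i].c.g - pal[k].g,
                 db = in[i].c.b - pal[k].b;
            long d = dr * dr + dg * dg + db * db;
            if (bestd < 0 || d < bestd) { bestd = d; best = k; }
        }
        if (m > 0 && out[m - 1].label == best)
            out[m - 1].len += in[i].len;
        else {
            out[m].label = best;
            out[m].len = in[i].len;
            m++;
        }
    }
    return m;
}
```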

[0136] Any of the methods described above can serve as the second color separation means; all of them require considerable processing time. The method of Japanese Patent Application Laid-Open No. 4-160486, on the other hand, is simple and fast, and is therefore suitable for the first color separation.

[0137] It is better to convert the RGB values into lightness (V), saturation (C) and hue (H) and to perform color processing in that space. Specifying colors with these values is also more convenient. These conversions can be realized, for example, as follows.

[0138] First, the lightness (V) can be defined as the mean of the R, G and B density values. The saturation (C) can be defined from the minimum density value: the closer the minimum is to 0, the higher the saturation. This is because the difference between the highest and lowest density values is then large, which gives a vivid color. To define the hue (H), a numerical value is first assigned to each of R, G and B, for example R = 0, G = 128 and B = 192; for a color lying between B and R, R is treated as 256. The hue of the input color is then the value between the values of the colors with the largest and second-largest densities, interpolated according to the ratio of those densities.

[0139] Concretely, the values can be obtained by the following procedure. With r, g and b the RGB density values, and h, s and v the HSV values, the conversion can be written in C as follows.

[0140]

```c
/* RGB -> (h, s, v) with integer arithmetic; r, g, b are 8-bit density values.
 * Hue scale: R = 0, G = 128, B = 192, R again = 256 (wrapped to 0).       */
void rgb_to_hsv(int r, int g, int b, int *h_out, int *s_out, int *v_out)
{
    int h, s, v;

    v = r + g + b;
    if (v == 0) {                        /* pure black: no hue or saturation */
        s = 0;
        h = 0;
    } else {
        /* normalise so that r + g + b == 255 */
        r = (r * 255) / v;
        g = (g * 255) / v;
        b = (b * 255) / v;
        if (r > g) {
            if (g > b) {                 /* r > g > b: between R and G  */
                s = 255 - b * 3;
                h = ((g - b) * 129) / (r + g - 2 * b);
            } else if (b > r) {          /* b > r > g: between B and R  */
                s = 255 - g * 3;
                h = ((r - g) * 65) / (b + r - 2 * g) + 192;
            } else {                     /* r >= b >= g, r > g          */
                s = 255 - g * 3;
                h = ((r - g) * 65) / (r + b - 2 * g) + 192;
            }
        } else {
            if (r > b) {                 /* g >= r > b: between R and G */
                s = 255 - b * 3;
                h = ((g - b) * 129) / (g + r - 2 * b);
            } else if (b > g) {          /* b > g >= r: between G and B */
                s = 255 - r * 3;
                h = ((b - r) * 65) / (b + g - 2 * r) + 128;
            } else if ((g > b) || (b > r)) {  /* g >= b >= r, not all equal */
                s = 255 - r * 3;
                h = ((b - r) * 65) / (g + b - 2 * r) + 128;
            } else {                     /* r == g == b: achromatic     */
                s = 0;
                h = 0;
            }
        }
    }
    if (h > 255)                         /* 256 wraps back to red */
        h = 0;
    if (s >= 128)                        /* compress high saturations */
        s = (s - 128) / 4 + 224;
    else
        s = (s * 224) / 128;
    v = v / 3;                           /* lightness = mean of R, G, B */

    *h_out = h;
    *s_out = s;
    *v_out = v;
}
```

The conversion is not limited to the above; the values may also be obtained by table lookup. That is, the RGB values are quantized so that the three colors are represented with NR, NG and NB bits respectively, a table with an address space of NR + NG + NB bits is prepared, and the HSV value of the color represented by each RGB combination is stored at the address corresponding to that combination. The image delivered by the image input unit 1 may also already be expressed in HSV.
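The table-lookup variant of [0140] can be sketched in C with equal bit depths (NR = NG = NB = `BITS`, here 4, giving 4096 entries). Each entry is filled by converting the center of its quantization bin and packing h, s, v into one word. The converter is passed as a function pointer. The tiny `mean_only` routine below is only a stand-in for the demonstration; the [0140] routine would be passed in its place. Packing format and names are assumptions.

```c
/* BITS bits kept per channel; table entries packed as 0x00HHSSVV. */
#define BITS 4
#define LUT_SIZE (1u << (3 * BITS))

typedef void (*hsv_fn)(int r, int g, int b, int *h, int *s, int *v);

/* Address = concatenated top BITS bits of r, g and b. */
static unsigned lut_index(int r, int g, int b)
{
    int sh = 8 - BITS;
    return ((unsigned)(r >> sh) << (2 * BITS)) |
           ((unsigned)(g >> sh) << BITS) | (unsigned)(b >> sh);
}

/* Fill the table once, converting the centre of every quantisation bin. */
void build_hsv_lut(hsv_fn convert, unsigned *lut)
{
    int sh = 8 - BITS, half = 1 << (sh - 1);
    unsigned mask = (1u << BITS) - 1;
    for (unsigned i = 0; i < LUT_SIZE; i++) {
        int r = ((int)(i >> (2 * BITS)) << sh) + half;
        int g = ((int)((i >> BITS) & mask) << sh) + half;
        int b = ((int)(i & mask) << sh) + half;
        int h, s, v;
        convert(r, g, b, &h, &s, &v);
        lut[i] = ((unsigned)h << 16) | ((unsigned)s << 8) | (unsigned)v;
    }
}

/* One table read replaces the arithmetic of the conversion. */
unsigned hsv_lookup(const unsigned *lut, int r, int g, int b)
{
    return lut[lut_index(r, g, b)];
}

/* Demo stand-in converter: only V (mean of R, G, B) is computed. */
void mean_only(int r, int g, int b, int *h, int *s, int *v)
{
    *h = 0;
    *s = 0;
    *v = (r + g + b) / 3;
}
```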

[0141] These conversions may also be performed together with color-bias correction, or together with color misregistration correction.

[0142] There are three kinds of image representation: each pixel holds color information (C), each pixel holds gray-level information (G), or the image is binarized (B).

[0143] Color data is better suited to more advanced recognition processing, and binary data to high-speed processing. From the viewpoint of system performance, it is undesirable to assign color data to parts that do not need advanced recognition, or binary data to parts that do. It is therefore important to provide a mechanism that specifies the data representation for each part of the form. Specifically, the format information or the like should be able to describe area designations and the image representation to be used in each area.
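One way to hold the per-area designations of [0143] is sketched in C below. It uses a list of rectangles, each tagged with one of the three representations (C), (G) and (B), and looks up the representation for a point. The record layout, the "first match wins" rule and the fallback for unlisted areas are hypothetical choices, not a format defined by the patent.

```c
/* (C), (G), (B) of [0142]. */
typedef enum { REP_COLOR, REP_GRAY, REP_BINARY } ImageRep;

/* One hypothetical format-information entry: a half-open rectangle on
 * the form and the image representation to use inside it.            */
typedef struct { int x0, y0, x1, y1; ImageRep rep; } RegionRep;

/* First matching entry wins; `fallback` applies outside listed areas. */
ImageRep rep_at(const RegionRep *spec, int n, int x, int y,
                ImageRep fallback)
{
    for (int i = 0; i < n; i++)
        if (x >= spec[i].x0 && x < spec[i].x1 &&
            y >= spec[i].y0 && y < spec[i].y1)
            return spec[i].rep;
    return fallback;
}
```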

[0144] The image input unit 1 is not limited to outputting only one of the C, G and B representations. The apparatus of the present invention therefore has at least two of the following: a mechanism for recognizing characters in (C) images, one for (G) images, and one for (B) images. Each input is processed according to its type.

[0145] FIG. 21 shows a configuration embodying these means. The output of the image input unit 211 is captured, according to its type, into a color image buffer 212, a gray image buffer 213 or a binary image buffer 214. Character recognition is performed on each by recognition units 215 to 217, and the recognition results are output via a reading result correction unit 218.

[0146] Each form to be read has a representation (C, G or B) in which its image is best input, so it is convenient if the format information or the like can specify the representation of the image to be input. Following this specification, the apparatus requests the image input device to output the specified representation. The image input unit 1 sends the corresponding image, and the apparatus performs the corresponding processing.

[0147] However, if the image input unit 1 cannot comply with the request, or if the apparatus makes no request, the data does not necessarily arrive in a known representation. In that case, recognition and other processing are performed according to the type of image representation attached to the image data or set in advance.

[0148] When the input is an image file, that is, data stored on a magnetic disk or the like, the file carries data from which its image representation format can be determined, and this data is input to the apparatus together with the image data. As stated earlier, the same applies to image data sent over a communication line, for example by facsimile.

[0149] This function is effective in a system configuration in which a character recognition system, multiple terminals and multiple image input devices are connected over a network. Various kinds of image input units 1 may be connected to such a network, and the apparatus can handle all of them.

[0150] Next, character recognition for images represented in color (C) and in gray levels (G) will be described.

[0151] First, the C or G image is binarized, and the binary image is used as the recognition target. When a location suspected of blurring, filling-in, contour distortion or the like is found during character detection, segmentation or recognition, the corresponding part of the C or G image is examined. The apparatus therefore retains the C or G image even after binarization.

[0152] FIG. 22 is a block diagram of this processing. The input color image is stored in a color image buffer 221 as color information, either as is or after color separation; the unprocessed and the separated color information may be stored in separate buffers. A gray image is then derived from the color information, for example by averaging the R, G and B density values, and stored in a gray image buffer 222. When the input image is gray, it goes directly into this buffer and the color image buffer is omitted.

[0153] Next, a binary image is generated by binarizing the gray image and is stored in a binary image buffer 223. The character segmentation unit 224 then extracts the image data of one character from the binary image buffer 223, and the character recognition unit 225 recognizes it. Suppose, as in FIGS. 23(a), (b) and (c), that patterns A and B are stored as dictionary patterns. The input pattern X is matched against them, for example by template matching.
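The buffer chain of [0152] and [0153] can be sketched in C in a few lines. Gray is the mean of R, G and B, the binary image is gray below a fixed threshold, and both buffers are kept so later stages can look back at the gray levels. For matching, a plain agreement count between a binary character image and a same-size template stands in for template matching. The fixed threshold, the interleaved RGB layout and the names are assumptions.

```c
/* Grey = mean of R, G, B; binary = grey below `thresh` (1 = ink).
 * rgb is interleaved (R, G, B per pixel); both outputs are kept.    */
void make_gray_and_binary(const unsigned char *rgb, int npix, int thresh,
                          unsigned char *gray, unsigned char *bin)
{
    for (int i = 0; i < npix; i++) {
        int sum = rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2];
        gray[i] = (unsigned char)(sum / 3);
        bin[i] = (unsigned char)(gray[i] < thresh);
    }
}

/* Plain template matching score: number of pixels on which the binary
 * character image and a same-size dictionary template agree.         */
int template_score(const unsigned char *bin, const unsigned char *tmpl,
                   int npix)
{
    int agree = 0;
    for (int i = 0; i < npix; i++)
        agree += (bin[i] != 0) == (tmpl[i] != 0);
    return agree;
}
```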

[0154] Suppose the result is that A and B match about equally well, so that no decision can be made. The pair of dictionary patterns A and B has a detailed recognition and decision process attached to it, which is invoked when such a decision is impossible.

[0155] This detailed process specifies that a particular part of the input pattern is to be examined. In this case, it specifies checking whether there is a hole in region Y of the input pattern.

[0156] The density values in the gray image buffer 222 corresponding to this region are therefore examined. If they are closer to the background density than to the surrounding density, a hole is judged to be present, and the input character is determined to be "8"; otherwise it is determined to be "9".
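The hole test of [0156] can be sketched in C as a mean over the region in the gray buffer, compared with a background level and a stroke ("surrounding") level. This sketch assumes both levels are known and passed in, and that the region is a half-open box. How region Y and the two levels are obtained is not shown, and the function name is hypothetical.

```c
#include <stdlib.h>

/* Mean grey level in [x0,x1) x [y0,y1) of a row-major grey image is
 * compared with the background level and the surrounding stroke level.
 * Returns 1 if it is closer to the background (hole: read "8"),
 * otherwise 0 (read "9").                                            */
int region_is_hole(const unsigned char *gray, int width,
                   int x0, int y0, int x1, int y1,
                   int background, int stroke)
{
    long sum = 0, cnt = 0;
    for (int y = y0; y < y1; y++)
        for (int x = x0; x < x1; x++) {
            sum += gray[y * width + x];
            cnt++;
        }
    if (cnt == 0)
        return 0;                       /* empty region: no evidence */
    int mean = (int)(sum / cnt);
    return abs(mean - background) < abs(mean - stroke);
}
```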

[0157] The same applies when the color image buffer 221 is referenced instead of the gray image. The character segmentation unit 224 can likewise refer to the gray image buffer 222 and the color image buffer 221.

[0158] For example, FIG. 24 shows two characters touching each other. Computing the vertical projection of this image, that is, the number of black pixels in each column, gives A in the figure. From this projection analysis, the minima of the projection, here B and C, are taken as candidate character segmentation points.

[0159] Cut positions b and c are then determined from these candidate points, and the corresponding locations in the gray or color image are examined. If a location is judged to be close to the background color or background density, the priority of that segmentation candidate can be raised. For b and c, the projection values give b the higher priority as a cut point. However, if location c turns out to be close to the background density or color, c can be promoted as the correct cut point, which prevents misreading.
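The re-ranking in [0158] and [0159] can be sketched in C. Each candidate column starts with its projection value as a score (lower is a better cut). It receives a fixed bonus when the mean gray level of that column is within `tol` of the background. The scoring rule, bonus value and names are assumptions chosen only to illustrate the idea.

```c
#include <stdlib.h>

/* Vertical projection: ink pixels per column of a binary image. */
void column_projection(const unsigned char *bin, int w, int h, int *proj)
{
    for (int x = 0; x < w; x++) {
        proj[x] = 0;
        for (int y = 0; y < h; y++)
            proj[x] += bin[y * w + x];
    }
}

static int column_mean(const unsigned char *gray, int w, int h, int x)
{
    long s = 0;
    for (int y = 0; y < h; y++)
        s += gray[y * w + x];
    return (int)(s / h);
}

/* Picks the best cut among candidate columns (lower score wins):
 * score = projection, minus `bonus` if the grey column is near the
 * background level.  Returns the chosen column, or -1 if none.      */
int best_cut(const int *cand, int ncand, const int *proj,
             const unsigned char *gray, int w, int h,
             int background, int tol, int bonus)
{
    int best = -1, best_score = 0;
    for (int i = 0; i < ncand; i++) {
        int x = cand[i];
        int score = proj[x];
        if (abs(column_mean(gray, w, h, x) - background) <= tol)
            score -= bonus;               /* looks like a real gap */
        if (best < 0 || score < best_score) {
            best = x;
            best_score = score;
        }
    }
    return best;
}
```

In the test below, column 1 has the lower projection but a dark pixel in the gray image, while column 3 was binarized as ink only because of light-gray pixels, so the gray check promotes column 3.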

[0160] The segmentation method is of course not limited to the above. Contour information, the width of the segmented characters, or other features may also be used, alone or in combination.

[0161] Stated generally: when the recognition process identifies locations that need re-checking, the result is fed back, and the color or gray-level data of the pattern at those locations is consulted. Blurring, filling-in, contour distortion and the like are thereby corrected, and line segment extraction, layout understanding, form understanding, detection and segmentation, character recognition and so on are carried out.

[0162] Next, double-line cancellation processing in character recognition will be described. The input image is color-separated into the part corresponding to the seal color and the part in the color of the written characters, and character recognition is performed on the image data of the writing color. If one or two long horizontal lines are found in a character entry area at this stage, that area is judged to be a cancelled part.

[0163] Long horizontal lines can be detected, for example, by searching for long horizontal line segments. When a segment is not long enough, the apparatus also checks whether it could be a single character or touching characters. If such segmentation and recognition fail, a cancellation becomes more likely.

[0164] If, as a result, a cancellation is determined and there is a correction character nearby, that character is recognized as the correction. In that case, the cancelled part is naturally judged all the more likely to be a genuine cancellation.

[0165] Next, the part of the seal-color image data corresponding to the position of the cancelled part is extracted, and it is checked whether enough seal color is present there. If not, a warning is issued to the operator of the apparatus. The warning is also output as a flag, or the presence or absence of the seal is output, together with the character recognition result.
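The checks of [0162] and [0165] can be combined into the C sketch below. It assumes the entry field has already been color-separated into a writing-color mask and a seal-color mask of the same size. Consecutive rows that contain a long run are counted as one line, and one or two lines mark a cancellation. The seal test is a simple percentage of seal-color pixels. The thresholds, status codes and names are hypothetical.

```c
/* Counts horizontal strokes: groups of consecutive rows that each hold
 * a run of at least min_len set pixels in the writing-colour mask.     */
int count_long_hlines(const unsigned char *ink, int w, int h, int min_len)
{
    int lines = 0, prev_hit = 0;
    for (int y = 0; y < h; y++) {
        int run = 0, hit = 0;
        for (int x = 0; x < w; x++) {
            run = ink[y * w + x] ? run + 1 : 0;
            if (run >= min_len)
                hit = 1;
        }
        if (hit && !prev_hit)
            lines++;                    /* a new stroke starts here */
        prev_hit = hit;
    }
    return lines;
}

/* Percentage of seal-colour pixels in the field. */
int seal_percent(const unsigned char *seal, int w, int h)
{
    long n = 0;
    for (int i = 0; i < w * h; i++)
        n += seal[i] != 0;
    return (int)(n * 100 / (w * h));
}

typedef enum { FIELD_OK, FIELD_CANCELLED_SEALED, FIELD_CANCELLED_NO_SEAL } FieldStatus;

/* FIELD_CANCELLED_NO_SEAL is the case that should raise the warning. */
FieldStatus check_field(const unsigned char *ink, const unsigned char *seal,
                        int w, int h, int min_len, int min_seal_percent)
{
    int n = count_long_hlines(ink, w, h, min_len);
    if (n < 1 || n > 2)
        return FIELD_OK;               /* not a single or double strike */
    return seal_percent(seal, w, h) >= min_seal_percent
               ? FIELD_CANCELLED_SEALED
               : FIELD_CANCELLED_NO_SEAL;
}
```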

[0166] For seal determination, a seal determination unit may be provided that judges, from the shape of the seal-color image data, whether a seal is present in the cancelled part. By introducing a seal determination unit that can assess the similarity of seal shapes, the type of seal found can also be output together with the recognition result.

[0167] Since the seal color normally differs from the writing color, extracting only the seal-color part by color separation allows accurate seal determination. Where the written part and the seal overlap, the method described above for color separation can be used, or the apparatus can be configured to perform seal determination and verification on the assumption that a seal may have been stamped over the written part.

[0168] Next, the identification of character type in character recognition will be described.

[0169] Thick character strokes and similar cues indicate that the characters are preprinted. When much such evidence is found, character segmentation and recognition means for preprinted characters are applied, or applied preferentially.
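A crude stroke-width cue for [0169] can be sketched in C as the mean length of horizontal ink runs in the binary field image. For mostly vertical strokes this approximates the stroke width. Both the estimator and the fixed `thick` threshold are illustrative assumptions. A real system would likely combine several cues, as the "much such evidence" wording suggests.

```c
/* Mean length of horizontal ink runs: a rough stroke-width estimate.
 * Returns 0.0 for an image with no ink.                              */
double mean_stroke_width(const unsigned char *bin, int w, int h)
{
    long ink = 0, runs = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (bin[y * w + x]) {
                ink++;
                if (x == 0 || !bin[y * w + x - 1])
                    runs++;              /* a run starts at this pixel */
            }
    return runs ? (double)ink / (double)runs : 0.0;
}

/* Hypothetical decision: preprinted if strokes reach `thick` pixels. */
int looks_preprinted(const unsigned char *bin, int w, int h, double thick)
{
    return mean_stroke_width(bin, w, h) >= thick;
}
```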

[0170] Finally, the configuration of the correction screen will be described. The form image is extracted from the input image data, for example by color separation. The form image may be separated by color alone, or extracted by performing ruled-line extraction on a specific color. It may be extracted as a bit pattern, or expressed as vector data, that is, as a set of line segments each defined by a start-point and an end-point coordinate.

[0171] Next, the characters on the form are recognized from the color-separated image data or the like, and information such as the position and size of each recognized character is stored.

[0172] The form image is then displayed in a color similar to the color of the form image obtained by color analysis, and the recognized characters are superimposed at the positions where they were found. The characters may be displayed in a color corresponding to the writing color.

[0173] A "similar" color here may be a color given by a predetermined correspondence, or a close color obtained by calculation.

[0174] All or part of this display may be shown next to all or part of the displayed input image, arranged so that their positional relationship corresponds.

[0175] The embodiment shown in FIG. 1 has been described for the case in which image data supplied from outside is processed directly as the input image. However, image data that has been transmitted, or image data held in a storage device, can also be used as the input image.

[0176] In this case, if only the image data is transmitted or stored, the side that processes the image has no clue from which to perform color correction.

[0177] FIG. 25 shows an example in which the present invention is applied to a communication system. A transmitting section 251 has an image input unit 2511, a correction parameter adding unit 2512, and a communication line output unit 2513. A receiving section 252 has a communication line input unit 2521, a correction parameter interpreting unit 2522, a color correction unit 2523, and a character recognition/image processing unit 2524.

[0178] An image read by a scanner is input through the image input unit 2511, for example as multivalued RGB three-color image data. The correction parameter adding unit 2512 then attaches to the input image, as attribute data, data that identifies the color misregistration correction process: for example, the scanner type, a code representing its serial number, the input device name, an ID number, the model, the correction parameters needed for color misregistration correction, or a flag indicating whether color misregistration has already been corrected. The image is then sent from the communication line output unit 2513 over the communication line 253 to the receiving section 252, which has a character recognition function.
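The attribute data of [0178] only has to travel with the image. A minimal sketch, assuming a length-prefixed JSON header in front of the raw RGB bytes; the message layout and the field names (`model`, `serial`, `params`, `corrected`) are hypothetical choices, not a format defined by the patent.

```python
import json
import struct


def pack_with_attributes(rgb: bytes, width: int, height: int, attrs: dict) -> bytes:
    """Prefix raw RGB pixel data with a JSON header of attribute data.

    Layout (assumed): 4-byte big-endian header length, UTF-8 JSON header, pixels.
    """
    header = json.dumps({"width": width, "height": height, **attrs}).encode("utf-8")
    return struct.pack(">I", len(header)) + header + rgb


def unpack_with_attributes(message: bytes):
    """Split a packed message back into (attribute dict, raw RGB bytes)."""
    (n,) = struct.unpack(">I", message[:4])
    attrs = json.loads(message[4:4 + n].decode("utf-8"))
    return attrs, message[4 + n:]
```

A sender would call, for example, `pack_with_attributes(pixels, w, h, {"model": "X-100", "serial": "0001", "corrected": False})`; the same packed form could also be written to an image storage device, as in [0183].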

[0179] When the receiving section 252 receives data through the communication line input unit 2521 and the attribute data represents the model or serial number of the scanner, the correction parameter interpreting unit 2522 determines the color misregistration correction parameters by consulting a correspondence table, registered in the apparatus in advance, between scanner models or serial numbers and color misregistration correction parameters. If the table contains no correction parameters for that scanner, either error handling is performed or the parameters are obtained by a process that derives color misregistration correction parameters automatically.

[0180] If correction parameters are attached directly to the received data, those parameters are used. If the received data carries no attribute data from which correction parameters can be identified, either error handling is performed or color misregistration correction parameters are obtained by a process that derives them automatically.

[0181] If the attribute data includes a flag indicating whether color misregistration has already been corrected, and the flag indicates that it has, no color misregistration correction is performed.
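The decisions in [0179] to [0181] can be sketched as a single resolution function on the receiving side. The order of the checks (correction flag first, then directly attached parameters, then the model/serial table, then automatic estimation or error handling) is an assumption of this sketch, as are the attribute names and the exception class.

```python
class CorrectionError(Exception):
    """Raised when no color misregistration correction parameters can be found."""


def resolve_correction(attrs, table, auto_estimate=None):
    """Return correction parameters for received data, or None if none are needed.

    attrs: attribute data received with the image (hypothetical field names).
    table: pre-registered {(model, serial): parameters} correspondence table.
    auto_estimate: optional callable that derives parameters automatically.
    """
    if attrs.get("corrected"):      # flag says correction was already done
        return None
    if "params" in attrs:           # parameters attached directly
        return attrs["params"]
    key = (attrs.get("model"), attrs.get("serial"))
    if key in table:                # look up by scanner model and serial number
        return table[key]
    # No usable parameters: derive them automatically, or report an error.
    if auto_estimate is not None:
        return auto_estimate()
    raise CorrectionError("no color misregistration correction parameters")
```

Checking the flag first avoids correcting an image twice, which would reintroduce the misregistration in the opposite direction.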

[0182] The data corrected for color misregistration in this way is sent to the character recognition/image processing unit 2524 for processing.
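The patent does not specify the form of the correction parameters. Assuming they are integer pixel offsets `(dx, dy)` per color channel, the color correction unit could apply them by shifting each channel plane, as in this sketch. The parameter format, the white fill value for uncovered pixels, and the function names are assumptions.

```python
def shift_plane(plane, dx, dy, fill=255):
    """Shift one 2-D channel plane by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = len(plane), len(plane[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # source pixel that moves to (x, y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = plane[sy][sx]
    return out


def correct_misregistration(planes, params):
    """Apply per-channel offsets; channels without parameters are left unchanged."""
    return {ch: shift_plane(p, *params.get(ch, (0, 0))) for ch, p in planes.items()}
```

After the red plane is shifted by `(1, 0)`, a dark pixel that appeared one position too far left in the red channel lines up with the same pixel in the green channel, so a black stroke no longer shows colored fringes.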

[0183] If the system has an image storage device, the attribute data identifying the color misregistration correction described above is attached to the input image data and stored with it. The stored data is later read out and processed in the same way as described above.

[0184] In this case, the apparatus can apply the same processing not only to color misregistration correction but also to corrections that equalize the degree of blur among the colors and to color bias correction.
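As one illustration of color bias correction driven by transmitted parameters, the sketch below assumes the attribute data carries the RGB value the scanner measured for blank paper, and scales each channel so that this value becomes pure white. This per-channel gain method is a swapped-in example, not the patent's stated technique; blur equalization is not covered here.

```python
def correct_color_bias(pixels, paper_white):
    """Scale each channel so the measured paper color maps to (255, 255, 255).

    pixels: list of (r, g, b) tuples.
    paper_white: (r, g, b) measured for blank paper (assumed parameter).
    """
    gains = [255.0 / max(c, 1) for c in paper_white]  # avoid division by zero
    return [tuple(min(255, round(v * g)) for v, g in zip(px, gains)) for px in pixels]
```

With `paper_white = (240, 250, 200)`, the paper itself becomes `(255, 255, 255)` and a darker pixel `(60, 50, 80)` becomes `(64, 51, 102)`, removing the yellowish cast while keeping relative intensities.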

[0185] The present invention is not limited to the embodiment described above. It can be modified in various ways without departing from its gist.

[0186]

Effects of the Invention: According to the present invention, characters can be recognized with high accuracy even when the restrictions on form colors and entry colors, on the printing method of the form portion and the character portion, on the character entry method, and the like are relaxed, or when characters are written carelessly in the entry fields.

[Brief description of the drawings]

FIG. 1 is a diagram showing the schematic configuration of an embodiment of the present invention.

FIG. 2 is a diagram showing the RGB signals when a black portion is affected by color misregistration.

FIG. 3 is a diagram showing the RGB signals when color misregistration occurs in general.

FIG. 4 is a diagram showing a color misregistration detection mark.

FIG. 5 is a diagram explaining a method of correcting color misregistration by aligning the transition points of the RGB signals.

FIG. 6 is a diagram showing the RGB signals when the degree of blur is not uniform across colors.

FIG. 7 is a diagram showing the frequency distribution of the density values of a gray image.

FIG. 8 is a diagram showing an example in which a straight line passes through the numeral "5".

FIG. 9 is a diagram explaining color separation in the example of FIG. 8.

FIG. 10 is a diagram explaining color separation in the example of FIG. 8.

FIG. 11 is a diagram explaining color separation in the example of FIG. 8.

FIG. 12 is a diagram showing an example in which noise exists near a straight line.

FIG. 13 is a diagram showing an example in which part of a straight line is slightly discolored, and an example in which noise of a color close to that of the line adheres to the line.

FIG. 14 is a diagram showing line segments of different colors intersecting.

FIG. 15 is a diagram showing a black line segment intersecting a line segment of some other color.

FIG. 16 is a diagram showing an example of detecting the local thickness of a straight line.

FIG. 17 is a diagram showing a method of separating characters from hatched areas.

FIG. 18 is a diagram explaining a process in which color separation is restricted to only part of the image.

FIG. 19 is a diagram explaining a run-length representation capable of expressing color.

FIG. 20 is a diagram explaining a method of defining the format of run-length data capable of expressing color.

FIG. 21 is a diagram showing an example in which processing is performed according to the type of input image.

FIG. 22 is a diagram showing an example of processing when blurring, filled-in strokes, contour irregularities, and the like occur.

FIG. 23 is a diagram explaining a case in which it is doubtful whether part of a character is a loop.

FIG. 24 is a diagram showing an example in which two characters touch each other.

FIG. 25 is a diagram showing an example in which the invention is applied to a communication system.

[Explanation of symbols]

1 ... image input unit; 2 ... color separation unit; 3 ... buffer; 31, 32, 3n ... color image buffers; 4 ... image processing unit; 5 ... character segmentation unit; 6 ... character recognition unit; 7 ... reading result correction unit.

Continuation of front page. (72) Inventor: Hideo Horiuchi, c/o Toshiba Corporation Research and Development Center, 1 Komukai-Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa. F-terms (reference): 5B029 BB02 CC29 EE08; 5B064 AA01 BA01 CA08 CA09 DA03; 5L096 AA02 BA17 FA15 FA73 GA38 JA11.

Claims


1. Color separation means for separating, from input image data, a portion corresponding to the color of a seal stamp and a portion of the color of entered characters; determining means for determining whether a character entry portion of the image data of the entered-character color separated by the color separation means has been cancelled; and means for, when the determining means determines that the image data is a cancelled portion, extracting the portion of the seal-color image data at the position corresponding to the cancelled portion and outputting whether the seal color is present at that portion.

2. An image processing apparatus comprising: color separation means for separating a specific color from input image data; and means for recognizing a character string by combining a plurality of methods, each of which recognizes as a character string a portion separated as the specific color by the color separation means.

3. An image recognition apparatus comprising: color separation means for separating a specific color from input image data of a form; and recognition means for recognizing, by character recognition, both a portion separated by the color separation means as a color different from the color of the form portion of the form, and a portion that has the same color as the form portion and has not been separated from it.
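Two of the claimed mechanisms can be sketched briefly. For claim 1, once the seal color has been separated into a binary mask and a cancelled entry has been located, checking for a seal reduces to counting seal-color pixels inside the cancelled region. For claim 2, one simple way to combine several recognizers is a majority vote with ties broken by summed confidence. The pixel-ratio threshold, the voting rule, and all function names are assumptions of this sketch; how a cancellation (for example, a struck-through entry) is detected is not shown.

```python
from collections import defaultdict


def seal_present(seal_mask, box, min_ratio=0.05):
    """Claim 1 sketch: is the seal color present at the cancelled portion?

    seal_mask: 2-D list of 0/1 from color separation (1 = seal color).
    box: (x0, y0, x1, y1) of the cancelled entry; x1 and y1 are exclusive.
    min_ratio: assumed fraction of seal pixels needed to report a seal.
    """
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return False
    count = sum(seal_mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return count / area >= min_ratio


def combine_readings(readings):
    """Claim 2 sketch: combine (string, confidence) results from several methods.

    The string reported by the most methods wins; ties go to the higher
    total confidence.
    """
    votes = defaultdict(int)
    score = defaultdict(float)
    for text, conf in readings:
        votes[text] += 1
        score[text] += conf
    return max(votes, key=lambda t: (votes[t], score[t]))
```

Voting lets independent recognition methods correct one another: a single method that misreads "1234" as "1284" is outvoted when the others agree.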