JP2007011529A

JP2007011529A - Method for determining character recognition position in OCR processing

Info

Publication number: JP2007011529A
Application number: JP2005189270A
Authority: JP
Inventors: Akitoshi Yoshizawa; 明登志吉澤; Daisuke Okamoto; 大輔岡本
Original assignee: NJK Corp
Current assignee: NJK Corp
Priority date: 2005-06-29
Filing date: 2005-06-29
Publication date: 2007-01-18

Abstract

PROBLEM TO BE SOLVED: To eliminate the need for software-based format setting for each document format, to perform character recognition without being affected by variation in character positions in image data, and to automatically search for the positions of the characters to be recognized.

SOLUTION: A reference character string serving as a search reference is selected from the image data of a fixed-form document, and the coordinates of the range of the reference character string and the coordinates of the OCR character recognition target range corresponding to the reference character string are specified through the user interface of a computer system. The image data is then searched page by page. When the reference character string is not found at the specified position, the range of the reference character string is moved over the image data by varying its coordinate values around the specified coordinates, and a character string on the image data that matches the reference character string is searched for. The coordinates of the corresponding OCR character recognition target range are moved in step with the movement of the range of the reference character string.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a method for determining a character recognition position in OCR processing, that is, for determining the position on image data at which characters are to be recognized, for use when image data obtained by scanning a fixed-form paper document is character-recognized by OCR processing, or when an electronic document that has already been converted to image data in PDF format is subjected to OCR processing.

In OCR processing, the character recognition position (range) on reference image data (the reference position) is set in advance by software-based format setting (form layout setting), and the characters at the corresponding position (range) on the actually captured image data are recognized by OCR. However, if a discrepancy arises between the character recognition position on the captured image data and that on the reference image data, character recognition by OCR processing may fail.

For example, as shown in FIGS. 1(a) and 1(b), image data read by a scanner may be skewed relative to the reference position (reference image data). If, as a result, the character string to be recognized on the document reading surface (the target character string) falls outside the range (position) set relative to the reference position, character recognition by OCR processing is no longer possible. For this reason, as shown in FIG. 1(c), the document capture function of the scanner and the OCR processing function correct the skew of the read image data within a certain range. Even after this correction, however, if the target character string falls outside the range set relative to the reference position, character recognition by OCR processing is not possible.

Further, as shown in FIGS. 2(a) and 2(b), image data generated by software may itself be displaced vertically and horizontally relative to the reference position (reference image data). If, as a result, the target character string falls outside the range set relative to the reference position, character recognition by OCR processing is not possible. Such errors arise, for example, from differences in the margin or magnification settings of the software, and when they occur it is difficult to correct the image data.

Furthermore, for image data created in large quantities by various methods, the error relative to the reference position (reference image data) is not constant from one item to the next, and the conventional method of setting a fixed format for each document format cannot cope with it.

When large numbers of fixed-form documents are read mechanically and subjected to OCR processing, character positions become misaligned during the creation of the image data and electronic documents, and at present many corrections and additional inputs are needed because of OCR misrecognition. There has therefore been a strong demand for a technique that improves the character recognition rate of OCR processing, reduces correction and additional input work, and contributes to the digitization of documents for which OCR processing was previously impossible.

The present invention has been made in view of the above circumstances. Its object is to provide a method for determining a character recognition position in OCR processing that eliminates the need for software-based format setting for each document format, performs character recognition without being affected by variation in character positions in the image data, and automatically searches for the position of the characters to be recognized.

To achieve the above object, the method for determining a character recognition position in OCR processing according to the present invention is characterized in that a reference character string serving as a search reference is selected from the image data of a fixed-form document, the coordinates of the range of the reference character string and the coordinates of the OCR character recognition target range corresponding to the reference character string are set through the user interface of a computer system, and the image data is searched page by page.
Setting the character recognition position for OCR processing through the user interface of the computer system in this way makes software-based format setting for each document format unnecessary.

Preferably, the range of the reference character string is moved over the image data by varying its coordinate values around the set coordinates, and a character string on the image data that matches the reference character string is searched for.

Preferably, the coordinates of the OCR character recognition target range corresponding to the reference character string are moved in conjunction with the movement of the reference character string on the image data. In this way, the position of the characters to be recognized is searched for automatically, and character recognition by OCR processing can be performed without being affected by variation in character positions on the image data.

According to the present invention, character recognition by OCR processing is possible even when, for example, the document is somewhat distorted, so that a service that reduces correction and additional input work can be provided.

Embodiments of the present invention are described below with reference to the drawings.
As shown in FIG. 3, the computer system for carrying out the present invention comprises a central processing unit 10, a storage device 12, a display device 14, and an input device 16. Image data obtained by reading a fixed-form document (paper medium) with the scanner 18, and PDF data of a fixed-form document (PDF format), are stored in the storage device 12 of the computer system. The text data obtained from the image data by OCR processing in the central processing unit 10 is also stored in the storage device 12.

As shown in FIG. 4, image data refers both to PDF-format data (PDF data) produced by PDF creation software from an original document created electronically with a word processor or the like, and to images (image data) obtained by reading a paper original document with a scanner. The image data is information for rendering the characters and ruled lines of the original document. It is used to reproduce the original document on the display device 14 of the computer system and for character recognition in OCR processing, and after OCR processing the result is stored in the storage device 12 as text data. One set of image data corresponds to one type of original document format (fixed-form document) and has a data volume that depends on the amount of original documents (for example, the number of sheets).

An example in which the original document shown in FIG. 5 is converted into image data and subjected to OCR processing is described below. When this original document is converted into image data and output to the display device 14, as shown in FIG. 6, distortion that cannot be corrected at the scanner's document capture stage or by OCR processing may remain between a reference character position on the original document and the corresponding character position on the image data. If the error between the character positions of the original document and the image data is not constant within one format (fixed-form document), character recognition is impossible with conventional format setting (form layout setting using the original document). The present invention therefore performs the following processing.

FIG. 7 shows the processing flow of the present invention. When processing starts, one set of image data to serve as the reference is taken from the image data file as input information and displayed on the display device 14. As described above, this image data may be a fixed-form paper document converted into image data by the scanner function, or a fixed-form document converted into an electronic document in PDF format. FIG. 8 shows the image data displayed on the display device 14.

Next, based on the image data displayed on the display device 14, a reference character string serving as the search reference is selected, and its position on the image data is set and stored. Then the range of the item content that is the target of OCR character recognition corresponding to the reference character string is set and stored. The setting of the reference character string and of the item content range is repeated for each portion of the original document in which characters are to be recognized.

Here, the reference character string means, for example, the word or phrase (character string) of a fixed item name, and the range of the item content subject to OCR character recognition means the range expected to contain the variable character string corresponding to the reference character string. In this example, as shown in FIG. 8, "氏名" ("name") is specified as the reference character string, and the item content range is specified as the range expected to contain the character string corresponding to "氏名", that is, an actual name such as "鈴木一郎" (Ichiro Suzuki).

Then, as shown in FIG. 9, the character string "氏名" ("name") in the image data displayed on the display device 14 is enclosed in a rectangular frame using the input device 16 of the computer system. The position of the reference character string is set by obtaining, in pixel values, the coordinates A of the upper-left vertex and the coordinates B of the lower-right vertex of the frame, and is stored together with the reference character string obtained by OCR character recognition within this range. Similarly, the range expected to contain an actual name, including the character string "鈴木一郎" (Ichiro Suzuki) in the displayed image data, is enclosed in a rectangular frame using the input device 16, and the item content range is set and stored by obtaining, in pixel values, the coordinates C of the upper-left vertex and the coordinates D of the lower-right vertex of that frame. In this example, "氏名" is selected as the reference character string, its position is set by coordinates A (100, 150) and B (300, 200), and the item content range is set by coordinates C (320, 150) and D (900, 200).

The three pieces of information stored by the above settings, namely the reference character string, the coordinates of the range of the reference character string, and the coordinates of the range of the item content subject to OCR character recognition corresponding to the reference character string, are saved in the computer system as the information for one item in the original document.
This example describes the case where the original document contains only one portion to be recognized. When there are two or more such portions, the setting of the reference character string (selection of the string and setting of its range) and the setting of the item content range are repeated and stored for each of them.
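The stored item information can be pictured as a small record per item. The following is a minimal sketch only: the names `Rect` and `ItemDefinition` and the field layout are assumptions made for illustration and do not appear in the specification, while the coordinate values are those of the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixels (upper-left and lower-right vertices)."""
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class ItemDefinition:
    """One item of a fixed-form document (hypothetical record layout)."""
    reference_string: str  # fixed item name, e.g. "氏名"
    reference_range: Rect  # coordinates A (upper left) and B (lower right)
    content_range: Rect    # coordinates C (upper left) and D (lower right)


# Values from the example: A (100, 150), B (300, 200), C (320, 150), D (900, 200).
name_item = ItemDefinition(
    reference_string="氏名",
    reference_range=Rect(100, 150, 300, 200),
    content_range=Rect(320, 150, 900, 200),
)

# A document with several portions to recognize holds one definition per item.
items: List[ItemDefinition] = [name_item]
```

Keeping the three pieces of information together in one record mirrors the statement that they are saved as the information for one item.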

After the reference character string and the item content range have been set and stored in the computer system as described above, the image data read sequentially from the image data file is searched page by page. First, the reference character string is searched for in the image data, using the stored reference character string as the key. In this example, the range specified on the image data is moved in a spiral around the coordinates set as the position of the reference character string while a character string that matches the reference character string is searched for on the image data. This makes it possible to perform character recognition by OCR processing without being affected by variation in character positions on the image data.

That is, as shown in FIG. 10, in some image data the position of the character string that matches the reference character string may differ from the position of the reference character string set as described above. In such a case, the rectangular range of the reference character string set as described above is moved in a spiral, with these coordinates as the starting point, for example from the center outward in steps of one pixel. At each position the content of the rectangle is recognized by OCR, and the reference character string (in this example, "氏名") is searched for. The search is of course not limited to this example. The range of the reference character string may, for instance, be moved in steps of several pixels.
If the reference character string cannot be detected in the image data and the search range is exceeded, processing of the current item ends at that point and processing moves on to the next item.
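The outward search described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the specification: the square ring-by-ring path, the function names `spiral_offsets` and `find_reference`, the `max_radius` limit standing in for the search range, and the `recognize` callable standing in for an OCR engine are all hypothetical.

```python
from typing import Callable, Iterator, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def spiral_offsets(max_radius: int, step: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (dx, dy) offsets from (0, 0) outward, one square ring at a time."""
    yield (0, 0)
    r = step
    while r <= max_radius:
        for dx in range(-r, r, step):    # top edge, left to right
            yield (dx, -r)
        for dy in range(-r, r, step):    # right edge, top to bottom
            yield (r, dy)
        for dx in range(r, -r, -step):   # bottom edge, right to left
            yield (dx, r)
        for dy in range(r, -r, -step):   # left edge, bottom to top
            yield (-r, dy)
        r += step


def find_reference(
    recognize: Callable[[Rect], str],  # hypothetical OCR call on one rectangle
    reference_string: str,
    reference_range: Rect,
    max_radius: int,
    step: int = 1,
) -> Optional[Tuple[int, int]]:
    """Return the offset at which the reference string is found, else None."""
    left, top, right, bottom = reference_range
    for dx, dy in spiral_offsets(max_radius, step):
        candidate = (left + dx, top + dy, right + dx, bottom + dy)
        if recognize(candidate) == reference_string:
            return (dx, dy)
    return None  # search range exceeded: the caller moves on to the next item
```

Passing `step` greater than 1 corresponds to the variation in which the range is moved in steps of several pixels. Returning `None` corresponds to ending processing of the item once the search range is exceeded.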

For image data in which the reference character string has been detected, the range of the item content is then captured. That is, when the character string is detected, the difference between the detected coordinates and the starting point is obtained, and the coordinates of the item content range corresponding to the reference character string are moved in proportion to this difference. This captures the item content range that is the target of OCR character recognition (the position of "鈴木一郎" in FIG. 10). At this point the coordinates of the OCR character recognition target range for one item are determined. In this way the position of the characters to be recognized by OCR processing can be searched for automatically.
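As a worked illustration of this step, the sketch below shifts the item content range by the displacement found for the reference character string. The function name `shift_content_range` and the `ratio` parameter are assumptions. The specification says only that the range is moved in proportion to the difference, so a ratio of 1.0 (the same displacement) is used here as the simplest reading. The displacement of (3, -2) pixels is a made-up value.

```python
from typing import Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def shift_content_range(
    content_range: Rect,
    origin: Point,        # configured upper-left vertex of the reference range
    found: Point,         # upper-left vertex where the reference string matched
    ratio: float = 1.0,   # assumed proportionality factor (not in the source)
) -> Rect:
    """Move the item content range by the displacement of the reference string."""
    dx = round((found[0] - origin[0]) * ratio)
    dy = round((found[1] - origin[1]) * ratio)
    left, top, right, bottom = content_range
    return (left + dx, top + dy, right + dx, bottom + dy)


# Configured range C (320, 150) to D (900, 200); the reference string set at
# A (100, 150) is assumed to have been found 3 px right and 2 px up.
captured = shift_content_range((320, 150, 900, 200), (100, 150), (103, 148))
```

With these inputs the captured range runs from (323, 148) to (903, 198), which is the rectangle that would then be passed to OCR character recognition.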

Next, OCR character recognition is performed on the character string within the item content range captured as described above, and the recognition result is converted to text and stored. When all the items set for one set of image data have been processed, the text data obtained from the image data is saved in the computer system. This completes the processing for one set of image data.
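The per-page flow (locate the reference character string, capture the item content range, recognize it, and skip items whose reference string is not found) might be organized as in the sketch below. It is illustrative only: `process_page`, `locate`, and `recognize` are hypothetical names, the two callables stand in for the search step and the OCR engine, and the content range is moved by the same displacement as the reference string, which is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Rect = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Offset = Tuple[int, int]           # (dx, dy) displacement of the reference string
Item = Tuple[str, Rect, Rect]      # (reference string, reference range, content range)


def process_page(
    page: Any,
    items: List[Item],
    locate: Callable[[Any, str, Rect], Optional[Offset]],  # hypothetical search step
    recognize: Callable[[Any, Rect], str],                 # hypothetical OCR call
) -> Dict[str, str]:
    """Return the recognized text for each item whose reference string was found."""
    results: Dict[str, str] = {}
    for reference_string, reference_range, content_range in items:
        offset = locate(page, reference_string, reference_range)
        if offset is None:
            continue  # search range exceeded: move on to the next item
        dx, dy = offset
        left, top, right, bottom = content_range
        captured = (left + dx, top + dy, right + dx, bottom + dy)
        results[reference_string] = recognize(page, captured)
    return results
```

The returned dictionary plays the role of the text data saved once all items of one set of image data have been processed.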

As described above, this example eliminates the need for software-based format setting for each document format, allows character recognition without being affected by variation in character positions in the image data, and makes it possible to search automatically for the position of the characters to be recognized.

FIG. 1 is a diagram showing a state in which image data is skewed because of the physical tilt of the document during document reading, and a state in which the skew has been corrected by correction processing.
FIG. 2 is a diagram showing a state in which the image data itself is displaced because of the functions and usage conditions of the image data creation software.
FIG. 3 is a diagram showing an example of a computer system for carrying out the present invention.
FIG. 4 is a diagram showing the relationship between an original document and image data in the computer system for carrying out the present invention.
FIG. 5 is a diagram showing an example of a fixed-form document that is converted into image data and subjected to OCR character recognition.
FIG. 6 is a diagram showing an example in which the fixed-form document shown in FIG. 5 is processed into image data and displayed.
FIG. 7 is a control flow diagram of the present invention.
FIG. 8 is a diagram showing an example of setting a reference character string and an item range based on image data displayed on the display device.
FIG. 9 is a diagram for explaining an example in which the coordinates of the range of the reference character string and the coordinates of the range of the item content are stored together with the reference character string.
FIG. 10 is a diagram for explaining the search for a character string.

Explanation of symbols

10 Central processing unit
12 Storage device
14 Display device
16 Input device
18 Scanner

Claims

1. A method for determining a character recognition position in OCR processing, characterized in that a reference character string serving as a search reference is selected from the image data of a fixed-form document, the coordinates of the range of the reference character string and the coordinates of the OCR character recognition target range corresponding to the reference character string are set through the user interface of a computer system, and the image data is searched page by page.

2. The method for determining a character recognition position in OCR processing according to claim 1, wherein the range of the reference character string is moved over the image data by varying its coordinate values around the set coordinates, and a character string on the image data that matches the reference character string is searched for.

3. The method for determining a character recognition position in OCR processing according to claim 2, wherein the coordinates of the OCR character recognition target range corresponding to the reference character string are moved in conjunction with the movement of the range of the reference character string.