JP2012022577A

JP2012022577A - Image processing apparatus, control method, and computer program

Info

Publication number: JP2012022577A
Application number: JP2010161040A
Authority: JP
Inventors: Taeko Yamazaki (山崎 妙子)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-07-15
Filing date: 2010-07-15
Publication date: 2012-02-02

Abstract

PROBLEM TO BE SOLVED: To extract a document area at its exact original size by estimating the correct boundary line between a document and the background, even when a natural image contains a plurality of documents whose boundaries with the background are indistinct.

SOLUTION: An image processing apparatus includes: character area extraction means for extracting, from input image data, a character area which is the circumscribed rectangle of characters included in a document area; first search range determination means for determining, based on the position of the extracted character area, a first search range in which to search for a document area end pixel lying on the document area end; document area end pixel extraction means for extracting the document area end pixel from the first search range; second search range determination means for determining, based on the position of the extracted document area end pixel, a second search range in which to search for a document area end line lying on the document area end; document area end line extraction means for extracting the document area end line from the second search range; and document area estimation means for determining the document area from the document area end line.

Description

The present invention relates to a technique for extracting the region of a document in a natural image captured with a digital camera or the like.

In recent years, the spread of networks has increased the opportunities for distributing documents electronically, and technology for scanning paper documents into distributable electronic documents has become widespread. However, some documents are difficult to scan, such as posters on display, whiteboards used in meetings, and large sheets of poster paper. Techniques for converting images taken with a camera into electronic documents have therefore been considered. In that case, the positional relationship between the camera and the photographed document produces a trapezoidal distortion in the obtained image, so a technique for correcting the distortion is required.

For example, there is a technique that acquires edges from color differences, detects line segments of at least a certain length as the document frame, and corrects the distortion (see Patent Document 1). With this technique, a document can be detected in a captured image and its distortion corrected. There is also a technique that, when a document on a pedestal is photographed, finds adjacent sides from the relative positions of detected line segment candidates on the captured image plane (Patent Document 2). A further technique estimates the document edge when it cannot be extracted because the photographed document is partly hidden behind another object. First, edges are extracted from the input image and the approximate position of the document edge is detected. Document base-color candidates and background-color candidates are then estimated from the peak values of a histogram generated from the color information of the entire document, and the final document edge is obtained from these estimated candidates (Patent Document 3).

Patent Document 1: JP 2003-058877 A
Patent Document 2: JP 2007-58634 A
Patent Document 3: JP 2004-096435 A

When a document is photographed with a camera, it is difficult to position the camera exactly facing the subject, so the document in the captured image shows a trapezoidal distortion caused by three-dimensional tilt. To extract the document from the captured image in an easily readable form, the frame of the document must therefore be extracted accurately. One method of extracting the frame detects straight-line components using the Hough transform or the like and estimates the frame from four straight lines.

However, depending on the background of the document, a large number of straight lines are extracted. In particular, when the base colors of document area 301 and document area 302 are close to the background of the whole image 300, as in FIG. 3, even quite weak edges must be detected to bring out the edges at the document area ends, and the frame must be estimated from them. In addition, a plurality of documents may appear in one natural image. As a result, the number of candidate combinations of straight lines forming a frame grows, making it difficult to estimate the correct document frame.

Furthermore, even if general region segmentation is performed and only the character areas are extracted, only areas smaller than the actual document area are obtained, as shown in FIG. 4. The document area therefore cannot be extracted at its original size. Moreover, what should be a single document is split up character area by character area, so the document area cannot be extracted in the way the user intends.

To solve the above problems, the present invention has the following configuration. An image processing apparatus for extracting a document area from image data containing the document area comprises: character area extraction means for extracting, from the input image data, a character area of characters constituting the document area; first search range determination means for determining, with reference to the position of the extracted character area, a first search range in which to search for a document area end pixel lying on the document area end; document area end pixel extraction means for extracting the document area end pixel from the first search range; second search range determination means for determining, with reference to the position of the extracted document area end pixel, a second search range in which to search for a document area end line lying on the document area end; document area end line extraction means for extracting the document area end line from the second search range; and document area determination means for determining the document area from the document area end line.

Even when a single natural image contains a plurality of document areas whose boundaries with the background are unclear, the correct boundary line between each document area and the background can be estimated, and the document areas can be extracted at their original size.

FIG. 1: Example of a system configuration according to the first and second embodiments.
FIG. 2: Flowchart according to the first and second embodiments.
FIG. 3: Example of a color input image according to the first and second embodiments.
FIG. 4: Region segmentation result for the color input image of FIG. 3.
FIG. 5: Block diagram of the detailed functions of document area extraction according to the first embodiment.
FIG. 6: First search range for character area 401.
FIG. 7: Detailed flowchart of the density change point extraction unit.
FIG. 8: Document area end pixels for character area 401.
FIG. 9: Second search range for character area 401.
FIG. 10: Document area end lines for character area 401.
FIG. 11: Straight lines estimated from the document area end lines for character area 401.
FIG. 12: Document area for character area 401.
FIG. 13: Example of distortion correction and electronic file conversion for document areas 1001 and 1301.
FIG. 14: Block diagram of the detailed functions of document area extraction according to the second embodiment.
FIG. 15: Detailed flowchart of the character area integration unit.
FIG. 16: First search range and document area end pixels for character area 402.
FIG. 17: Document area end pixels, second search range, and document area end lines for character areas 402 and 403.
FIG. 18: Document area for character areas 402 and 403.

<First embodiment>
FIG. 1 shows an example of a system configuration for carrying out an embodiment of the present invention. The CPU 101 controls the entire system and executes an execution program in which each process is defined. In the RAM (Random Access Memory) 102, processing programs and input/output data are loaded and processed. The storage device 103 stores image data to be processed and processed electronic files. The input device 104 is used to input data for processing from outside. The output device 105 is used to output processed data to the outside.

Image data input from an input device 104 such as a digital camera is stored as input data 1032 in a storage device 103 such as a hard disk. The processing program 1031 stored in the storage device 103 is loaded into the processing program area 1021 on the RAM 102 and executed by the CPU 101. The processing program reads the input data from the storage device 103 and loads it into the input data area 1022 on the RAM 102. The processing program processes the input data, writes the result to the output data area 1023 on the RAM 102, and saves it in the storage device 103 as output data 1033. The output data 1033 is output to an output device 105 such as a display or a printer as necessary.

[Processing flow]
FIG. 2 is a flowchart of the first embodiment of the present invention. The processing procedure is described in detail with reference to FIG. 2. In S201, the input image is classified. The classification method sorts the input into a natural image or an image with characters according to whether characters are present in the input image. Characters can be extracted from image data using, for example, the method of Japanese Patent Laid-Open No. 2002-042055, "Character recognition method from color document". The invention is not limited to this method, and other image classification methods may be used as long as the present invention remains applicable. Here, a character area is an area containing one or more characters and is a rectangle enclosing those characters. A document area is an area containing one or more character areas; depending on the image, it may show a trapezoidal distortion caused by factors such as the angle at which the image was captured.

If characters are present in the image data, the processing from S202 onward is performed. If no characters are present in the image data (that is, for a natural image), the processing ends. In S202, the document area, which is the area of the image data that contains characters, is extracted. The detailed functional configuration of S202 is shown in FIG. 5. The image 501 is an image judged in S201 to be an image with characters. In the present embodiment, image 300 of FIG. 3 serves as the example of an image with characters, and the subsequent processing is described in detail assuming that image 300 has been input.

As the color input, image 300 contains document area 301, in which "ABC" is written, and document area 302, in which "There are some words in this paper" is written. A concrete example of such a document area would be a paper memo. Document areas 301 and 302 lie on a table whose color is similar to their base colors, so the boundary between each document area and the table is unclear.

The character area extraction unit 502 segments the input image into regions and extracts the character areas. The image 501 is input to the character area extraction unit 502. A concrete example of region segmentation is the processing described in US Pat. No. 5,680,478. In that example, sets of connected black-pixel components and white-pixel components are extracted from the document image, and characteristic regions such as characters, pictures and drawings, tables, frames, and lines are extracted from their shape, size, grouping, and so on. In the present embodiment, a binary image is generated from image 300 by a known binarization technique and region segmentation is then performed.

FIG. 4 shows the result of segmenting image 300 into regions by attribute. In this result, character areas 401, 402, and 403 have been separated as regions. The output of the character area extraction unit 502 is the character areas, that is, the connected pixel components that correspond to characters together with the circumscribed rectangles of those character pixel components. The extracted character areas are passed to the document area end pixel extraction unit 503.

The document area end pixel extraction unit 503 detects document area end pixels, which are candidate pixels on the boundary between the document area and the background. The document area end pixel extraction unit 503 consists of a first search range determination unit 504 and a density change point extraction unit 505. Its inputs are the image 501 and the character areas output by the character area extraction unit 502.

[First search range determination]
First, the first search range determination unit 504 determines the range of image 501 in which to search for document area end pixels. This range consists of line segments drawn radially from a scanning start point inside the character area. In general image processing, scanning refers to a Z-scan or line scan that visits every pixel of the image. In the present embodiment, scanning radially means that fewer pixels are searched than when the entire image is scanned, which reduces the amount of processing. In addition, more document area end pixels are obtained than with simple vertical and horizontal scanning alone, which makes the document area easier to estimate. Placing the scanning start point inside the character area also guarantees that the first pixel examined lies on the document area, which makes it easier to pick up the slight density change between the document area and the background.

In the present embodiment, the scanning start point is the center of the circumscribed rectangle of the character area, and the search range consists of the line segments scanned from this center point in eight directions at 45-degree intervals. The start point of the scanning lines may be anywhere inside the circumscribed rectangle of the character area. Each scanning line ends at the edge of the image or at the pixel where it meets the circumscribed rectangle of another character area.
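The eight radial scanning lines described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function name `first_search_range`, the `(left, top, right, bottom)` rectangle convention, and the choice to stop one pixel before entering another character area's rectangle are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical rectangle convention: (left, top, right, bottom), inclusive.
Rect = Tuple[int, int, int, int]
Point = Tuple[int, int]

# Eight unit steps (dx, dy) at 45-degree intervals.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]


def inside(rect: Rect, x: int, y: int) -> bool:
    """Return True if pixel (x, y) lies inside the rectangle."""
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom


def first_search_range(width: int, height: int, char_rect: Rect,
                       other_rects: List[Rect]) -> List[List[Point]]:
    """Build eight scanning lines from the center of char_rect.

    Each line stops at the image edge, or just before it would enter
    the circumscribed rectangle of another character area (assumption:
    the intersecting pixel itself is not included).
    """
    left, top, right, bottom = char_rect
    cx, cy = (left + right) // 2, (top + bottom) // 2
    rays = []
    for dx, dy in DIRECTIONS:
        ray = []
        x, y = cx, cy
        while 0 <= x < width and 0 <= y < height:
            if any(inside(r, x, y) for r in other_rects):
                break
            ray.append((x, y))
            x, y = x + dx, y + dy
        rays.append(ray)
    return rays
```

In this sketch a scanning line toward a neighboring character area is cut short, which mirrors the treatment of scanning line 605 in the text, while the other lines run to the image edge.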

FIG. 6 shows the first search range based on character area 401, which is contained in document area 301. Scanning lines 601 to 608 are the scanning lines that make up the first search range. Scanning lines 601 to 604 and 606 to 608 end at the edge of the image data, while scanning line 605 ends at the pixel where it meets the circumscribed rectangle of character area 402, a different character area in the image data.

In the present embodiment, all scanning lines used to search for document area end pixels are fixed first, and processing then moves to the density change point extraction unit 505, which performs the next step. Alternatively, processing may move to the density change point extraction unit 505 as soon as one scanning line has been fixed, with the remaining unprocessed search ranges determined one at a time after each density change point is extracted.

The description below continues on the assumption that, to extract the document area for character area 401, document area end pixels are searched for along the eight scanning lines 601 to 608.

[Density change point extraction]
The density change point extraction unit 505 extracts density change points on the scanning lines in the input image that make up the determined first search range. The unit is described in detail using the flow of FIG. 7. It determines the base color of the document area and then judges whether each target pixel is a document area end pixel by comparing the base color with the density of the pixels on the scanning line.

First, in S5051, the connected character pixel component closest to the scanning start pixel is acquired from the character area, and processing moves to S5052. In S5052, the base color of the document area is determined. Specifically, the pixels surrounding the acquired character pixel component are taken from the input image 501, and the average of their color information (pixel values) is calculated. The calculated color is stored as the document base color, and processing moves to S5053.
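The base color estimate in S5052 can be illustrated with a short sketch. The text does not say which pixels count as "surrounding" the character component, so this example assumes the non-character pixels that are 8-adjacent to a character pixel; the function name `estimate_base_color` and the image layout (rows of RGB tuples) are likewise hypothetical.

```python
from typing import List, Set, Tuple

Color = Tuple[int, int, int]
Point = Tuple[int, int]


def estimate_base_color(image: List[List[Color]],
                        char_pixels: Set[Point]) -> Tuple[float, float, float]:
    """Average the colors of pixels bordering a character component.

    image[y][x] is an (r, g, b) tuple; char_pixels holds the (x, y)
    coordinates of one connected character component.
    Assumption: "surrounding pixels" means non-character pixels that
    are 8-adjacent to at least one character pixel.
    """
    height, width = len(image), len(image[0])
    ring = set()
    for x, y in char_pixels:
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and (nx, ny) not in char_pixels):
                    ring.add((nx, ny))
    if not ring:
        raise ValueError("no surrounding pixels found")
    sums = [0, 0, 0]
    for x, y in ring:
        for channel in range(3):
            sums[channel] += image[y][x][channel]
    return (sums[0] / len(ring), sums[1] / len(ring), sums[2] / len(ring))
```

Sampling only next to the character keeps the estimate on the document itself, which is the reason the text gives for starting the scan inside the character area.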

In S5053, one scanning line of the previously determined search range is acquired, and processing moves to S5054. In S5054, one pixel on the scanning line is taken as the target pixel for density change point determination, and processing moves to S5055. In S5055, character pixels are excluded from the density change determination. The purpose is to exclude the pixels of the characters themselves, where the density change is obviously large, and so improve the accuracy of extracting the density change point between the background and the document base color. Specifically, it is determined whether the target pixel is one of the connected character pixels in the character area; if YES, processing moves to S5058, and if NO, to S5056.

In S5056, it is determined whether the target pixel is a document area end pixel. Specifically, the difference between the color information of the target pixel and the previously obtained document base color is calculated, and it is determined whether the difference exceeds a threshold; if YES, processing moves to S5057, and if NO, to S5058. The color (pixel value) difference used in this determination can be calculated with the Manhattan distance or a similar measure. In S5057, the target pixel is taken as the density change point and stored in association with this character area as its document area end pixel, and processing moves to S50510.

In S5058, the target position is advanced by one pixel along the scanning line, and processing moves to S5059. In S5059, it is determined whether this pixel is still within the search range. Concretely, the pixel is out of range once it reaches the image edge or the intersection with the circumscribed rectangle of another character area. If the pixel is within the range (YES), processing moves to S5054; otherwise (NO), it moves to S50510. In S50510, it is determined whether any scanning line remains unscanned; if YES, processing moves to S5053, and if NO, density change point extraction for this character area ends.
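The loop of S5054 to S5059 for a single scanning line can be sketched as below. It is a simplified illustration under stated assumptions: the scanning line is passed in as a pre-computed list of `(x, y)` pixels that already ends at the image edge or at another character area, and the names `manhattan`, `find_end_pixel`, and `threshold` are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[int, int]


def manhattan(c1: Sequence[float], c2: Sequence[float]) -> float:
    """Manhattan (L1) distance between two colors."""
    return sum(abs(a - b) for a, b in zip(c1, c2))


def find_end_pixel(image: List[List[Tuple[int, int, int]]],
                   ray: List[Point],
                   base_color: Sequence[float],
                   char_pixels: Set[Point],
                   threshold: float) -> Optional[Point]:
    """Return the first density change point on one scanning line.

    ray is assumed to be already clipped to the search range, so
    running off its end corresponds to the NO branch of S5059.
    """
    for x, y in ray:
        # S5055: skip pixels that belong to the characters themselves.
        if (x, y) in char_pixels:
            continue
        # S5056: compare the pixel color with the document base color.
        if manhattan(image[y][x], base_color) > threshold:
            # S5057: this pixel is the document area end pixel.
            return (x, y)
    return None  # no density change point on this scanning line
```

Running this once per scanning line and collecting the non-`None` results would give the set of document area end pixels associated with one character area. The threshold must be small enough to catch the slight difference between base color and background, which is why character pixels are skipped rather than thresholded.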

The processing result for character area 401 is shown in FIG. 8. The density change points of character area 401, that is, document area end pixels 701 to 708, are obtained. The document area end pixel extraction unit 503 outputs the document area end pixels associated with one character area and passes them to the document area end line extraction unit 506.

The document area end line extraction unit 506 detects document area end lines, which are candidate lines on the boundary between the document and the background. The unit consists of a second search range determination unit 507 and an edge extraction unit 508. Its inputs are the image 501 and the document area end pixels, associated with a character area, that were output by the document area end pixel extraction unit 503.

[Second search range determination]
First, the second search range determination unit 507 determines the range of the input image 501 in which to search for document area end lines. This range is a small rectangle centered on a document area end pixel. To set its size, an upper-limit size is defined in advance as the default size of the small rectangle. If a small rectangle of the upper-limit size would contain a connected character component, the rectangle is shrunk to fit. Restricting the extraction range for document area end lines to a small region centered on a document area end pixel makes it possible to extract the slight density change between the document area and the background color without producing extraneous edges that would act as noise.
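A minimal sketch of this sizing rule follows. The text fixes neither the upper-limit size nor how the rectangle is reduced, so the default `max_half`, the one-pixel shrink step, and the clipping to the image bounds are assumptions for illustration, as is the function name `second_search_range`.

```python
from typing import Set, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def second_search_range(end_pixel: Point, char_pixels: Set[Point],
                        width: int, height: int,
                        max_half: int = 8) -> Rect:
    """Return a small rectangle centered on a document area end pixel.

    Starts from the upper-limit size (half-width max_half, an assumed
    value) and shrinks one pixel at a time until the rectangle contains
    no character pixel. The rectangle is clipped to the image.
    """
    ex, ey = end_pixel
    for half in range(max_half, 0, -1):
        left, top = max(0, ex - half), max(0, ey - half)
        right, bottom = min(width - 1, ex + half), min(height - 1, ey + half)
        contains_char = any(left <= x <= right and top <= y <= bottom
                            for x, y in char_pixels)
        if not contains_char:
            return (left, top, right, bottom)
    # Fallback: the end pixel alone (end pixels are never character pixels).
    return (ex, ey, ex, ey)
```

For example, with an end pixel at (10, 10) and a character pixel at (5, 10), the rectangle shrinks from half-width 8 until its left side no longer reaches x = 5.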

FIG. 9 shows, for character area 401, the second search ranges based on the extracted document area end pixels. Small rectangles 801 to 808 are the search-range rectangles for the respective extracted document area end pixels. In the present embodiment, all small rectangles used to search for document area end lines are fixed first, and processing then moves to the edge extraction unit 508, which performs the next step. Alternatively, processing may move to the edge extraction unit 508 as soon as one search range has been fixed, with the other search ranges determined one at a time after each edge extraction.

The description below continues on the assumption that, to extract document area 301, document area end lines are searched for within the eight small rectangles 801 to 808.

[Edge extraction]
The edge extraction unit 508 extracts weak edges inside the small rectangles of the input image that make up the determined second search range. Weak edges can be detected with known methods as follows: an edge enhancement technique using a Sobel filter, a Laplacian filter, or the like is applied to make the pixels on the document boundary in the input image stand out.
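As a hedged illustration of edge extraction restricted to one small rectangle, the sketch below applies the standard 3x3 Sobel kernels to a grayscale image stored as nested lists. The grayscale input, the `|gx| + |gy|` magnitude, the threshold parameter, and the function name `sobel_edges_in_rect` are assumptions of this example; the text only names Sobel and Laplacian filters as possible choices.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def sobel_edges_in_rect(gray: List[List[int]], rect: Rect,
                        threshold: int) -> List[Tuple[int, int]]:
    """Return (x, y) pixels inside rect whose Sobel response is strong.

    Only pixels inside the rectangle are evaluated, so edges elsewhere
    in the image are never produced. Pixels on the image border are
    skipped because the 3x3 kernel does not fit there.
    """
    height, width = len(gray), len(gray[0])
    left, top, right, bottom = rect
    edges = []
    for y in range(max(top, 1), min(bottom, height - 2) + 1):
        for x in range(max(left, 1), min(right, width - 2) + 1):
            # Horizontal gradient (responds to vertical edges).
            gx = ((gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1])
                  - (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1]))
            # Vertical gradient (responds to horizontal edges).
            gy = ((gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1])
                  - (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1]))
            if abs(gx) + abs(gy) >= threshold:
                edges.append((x, y))
    return edges
```

Because the filter runs only inside the small rectangle, a low threshold can be used to pick up weak edges without flooding the result with edges from the rest of the scene.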

FIG. 10 shows the weak edges of document area 301, that is, its document area end lines. The document area end line extraction unit 506 outputs the document area end lines associated with one character area and passes them to the document area determination unit 509.

[Document area determination]
The document area determination unit 509 determines the document area using the document area end lines. Straight lines can be detected by applying a known line extraction method, such as the Hough transform or a minimum approximation method, to the document area end lines. Because the edge information comes only from the second search range, edge information from other character areas in the same input image is not mixed in, which greatly reduces the amount of calculation needed to decide which combination of straight lines forms the boundary of the document area. FIG. 11 shows the straight lines estimated from the document area end lines. A closed quadrilateral is detected from these lines and fixed as the document area. FIG. 12 shows document area 1001 for character area 401, extracted by the above processing. It corresponds to document area 301 in the input image 300.
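To make the line detection step concrete, here is a small, unoptimized Hough transform over a list of edge pixels. It is a sketch under assumptions, not the method fixed by the text: the one-degree angle step, integer rounding of the distance, the vote threshold, and the name `hough_lines` are all illustrative choices, and finding the closed quadrilateral from the detected lines is not covered.

```python
import math
from collections import Counter
from typing import List, Tuple


def hough_lines(points: List[Tuple[int, int]], n_theta: int = 180,
                min_votes: int = 3) -> List[Tuple[int, int, int]]:
    """Vote for lines rho = x*cos(theta) + y*sin(theta).

    Returns (theta_index, rho, votes) tuples sorted by votes, where
    theta = pi * theta_index / n_theta. Nearby angle bins can collect
    the same votes, so a real pipeline would merge similar lines.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(t, rho)] += 1
    return [(t, rho, n) for (t, rho), n in votes.most_common()
            if n >= min_votes]
```

Feeding in only the edge pixels found inside the small rectangles of one character area keeps the accumulator sparse, which reflects the reduction in line combinations that the text describes.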

This concludes the description of the document area extraction processing in S202 of FIG. 2. The processing may be run for one character area at a time through to the output of document area information, or each functional unit may process all character areas in the input image before the next unit starts.

In S203, the extracted document area is cut out of the input image and distortion correction such as inverse perspective transformation is applied, after which processing moves to S204. In S204, the corrected document area is converted into an electronic file. FIG. 13(a) shows an example in which document area 1001 for character area 401 has been distortion-corrected and converted into an electronic file.

As described above, in the present embodiment, when a natural image contains a document whose boundary with the background is unclear, restricting the range searched for the document edges suppresses the generation of extraneous noise. This makes it possible to extract the weak edges needed to find the unclear boundary between the document and the background. The number of combinations of document end lines is also reduced, which lowers the amount of calculation. Furthermore, because the document boundary is calculated by associating character areas with document end lines, document areas can be extracted even when the natural image contains a plurality of documents.

<Second Embodiment>
The first embodiment described an example in which one document area contains one character area. This embodiment describes the function of the document area determination unit when the character areas belonging to a single document area have been split into several, as with the character areas 402 and 403 in FIG. 4.

FIG. 14 shows the detailed functional configuration of S202 in FIG. 2. Functional blocks bearing the same numbers as in FIG. 5 are the same as in the first embodiment, so their description is omitted. The character area integration unit 5011 in FIG. 14 integrates character areas based on the relationship between the scanning lines, which form the first search range, and the document area end pixels. As input, the character area integration unit 5011 uses the character areas output by the document area end pixel extraction unit 503, the document area end pixels associated with them, and the scan end point information for each scanning line of the first search range.

[Character area integration processing]
FIG. 15 shows the detailed flow of the character area integration unit 5011. The character area integration unit 5011 examines the end point of each scanning line and whether a document area end pixel, which marks a boundary, exists on that scanning line, and from these determines whether two character areas are included in the same document area. If the end point of a scanning line of the first search range is the edge of the image itself, there is no character area to integrate.

If, on the other hand, the end point of a scanning line of the first search range is another character area, the two character areas may lie on the same document area. In that case, if a document area end pixel exists on the scanning line located between the two character areas, a document boundary exists there, so the areas can be judged not to be integration targets. The condition for integration is therefore that "the end point of the scanning line is another character area" and "no document area end pixel exists on the scanning line whose end point is another character area".

In S1101, it is determined whether, for the input character area, there is a scanning line on which no document area end pixel was extracted. If YES in S1101, the process proceeds to S1102; if NO, this processing flow ends. In S1102, it is determined whether the end point of the scanning line on which no document area end pixel was extracted is another character area. If YES, the process proceeds to S1103; if NO, this processing flow ends.

In S1103 and S1104, the character areas determined to lie on the same document area, and the document area end pixels associated with those character areas, are integrated, and this processing flow ends. Because the integration condition is satisfied regardless of the order in which S1101 and S1102 are performed, either step may come first.
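
The flow of S1101 to S1104 can be sketched as follows. This is only an illustration under stated assumptions: the `ScanLine` record, the integer area ids, and the functions `areas_to_integrate` and `integrate` are hypothetical names that do not appear in the text, and the sketch merges only the areas reached directly from one starting area.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Pixel = Tuple[int, int]


@dataclass
class ScanLine:
    """One scanning line of the first search range (hypothetical record)."""
    end_area: Optional[int]      # id of the character area the line ends on; None = image edge
    edge_pixel: Optional[Pixel]  # document area end pixel found on the line, if any


def areas_to_integrate(lines: List[ScanLine]) -> Set[int]:
    """Ids of other character areas judged to share a document area."""
    partners: Set[int] = set()
    for line in lines:
        if line.edge_pixel is not None:  # S1101: an end pixel was extracted, so skip
            continue
        if line.end_area is None:        # S1102: the line ends at the image edge, so skip
            continue
        partners.add(line.end_area)
    return partners


def integrate(area_id: int,
              scan_lines: Dict[int, List[ScanLine]]) -> Tuple[Set[int], List[Pixel]]:
    """S1103/S1104: merge the character areas and their document area end pixels."""
    group = {area_id} | areas_to_integrate(scan_lines[area_id])
    pixels = [line.edge_pixel
              for a in sorted(group)
              for line in scan_lines[a]
              if line.edge_pixel is not None]
    return group, pixels
```

Because the two checks inside the loop are independent, swapping them does not change the result, which matches the remark that S1101 and S1102 may be performed in either order.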

FIG. 16 shows the first search range and the document area end pixels for the character area 402. Scanning lines 1201 to 1208 indicate the search range. The scan end points of scanning lines 1201 to 1204 and 1206 to 1208 are the image edge, whereas scanning line 1205 ends at the pixel where it intersects the circumscribed rectangle of the character area 403. A document area end pixel has been extracted on each of scanning lines 1201 to 1204 and 1206 to 1208, but none could be extracted on scanning line 1205. The character areas 402 and 403 are therefore determined to be character areas in the same document area, and the document area end pixels of the two are integrated.

FIG. 17 shows the integrated document area end pixels output from the character area integration unit 5011, the small rectangles that form the second search range, and the document area end lines estimated in the document area determination unit 509. The second search range determination unit 507 of this embodiment has a function that, when document area end pixels are closer together than a certain distance, enlarges the rectangle so that it encloses them. For the small rectangles 1301 and 1306 in FIG. 17, the document area end pixels were judged to be close together, and the rectangles were enlarged beyond the default size.
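
The enlargement of the small rectangles can be illustrated with the sketch below. The text gives neither the default rectangle size, the distance threshold, nor the grouping rule, so the parameters `half` and `near`, the Euclidean distance, and the union-find grouping in the hypothetical function `second_search_ranges` are all assumptions.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # left, top, right, bottom (inclusive)


def second_search_ranges(pixels: List[Pixel], half: int = 8,
                         near: float = 12.0) -> List[Rect]:
    """One rectangle per group of document area end pixels.

    An isolated pixel gets a square of side 2 * half + 1 centered on it.
    Pixels closer together than `near` share one rectangle, enlarged so
    that it encloses the default squares of all of them.
    """
    parent = list(range(len(pixels)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of end pixels that lie closer together than `near`.
    for i in range(len(pixels)):
        for j in range(i + 1, len(pixels)):
            dx = pixels[i][0] - pixels[j][0]
            dy = pixels[i][1] - pixels[j][1]
            if math.hypot(dx, dy) < near:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Pixel]] = {}
    for i, p in enumerate(pixels):
        groups.setdefault(find(i), []).append(p)

    rects = []
    for members in groups.values():
        xs = [x for x, _ in members]
        ys = [y for _, y in members]
        rects.append((min(xs) - half, min(ys) - half,
                      max(xs) + half, max(ys) + half))
    return sorted(rects)
```

The sketch does not clip the rectangles to the image bounds, which an actual implementation would need to do before searching for end lines inside them.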

The character area 1401 in FIG. 18 indicates the document area that contains the character areas 402 and 403. FIG. 13(b) shows the result of applying distortion correction to this document area and converting it into an electronic file. The character area 1401 corresponds to the document area 302 in FIG. 3.

As described above, even when characters are located apart from one another within the same document and have been split into separate areas during character area extraction, the character areas can be integrated and extracted as a single document area. In addition, integrating character areas increases the number of lines available for determining the document area, so the document area can be extracted more reliably. Furthermore, because the number of areas on which the document area determination process is performed is reduced, the amount of calculation can be reduced.

<Other Embodiments>
The present invention can also be realized by executing the following processing. That is, software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads and executes the program.

Claims

An image processing apparatus for extracting a document area from image data having the document area, comprising:
character area extraction means for extracting, from input image data, a character area that is a circumscribed rectangle of characters included in the document area;
first search range determination means for determining, with reference to the position of the extracted character area, a first search range in which to search for a document area end pixel serving as a document area end;
document area end pixel extraction means for extracting the document area end pixel from the first search range;
second search range determination means for determining, with reference to the position of the extracted document area end pixel, a second search range in which to search for a document area end line serving as a document area end;
document area end line extraction means for extracting the document area end line from the second search range; and
document area determination means for determining the document area from the document area end line.

The image processing apparatus according to claim 1, wherein the first search range determination means scans in a plurality of directions from an inner point of the character area, which is a circumscribed rectangle, toward the outside of the circumscribed rectangle, and sets as the first search range the span from the inner point to the image end or to an intersection with a different character area in the image.

The image processing apparatus according to claim 1, wherein the second search range determination means sets, as the second search range, a small rectangle of a predetermined size centered on the extracted document area end pixel.

The image processing apparatus according to claim 1, wherein the document area end pixel extraction means includes:
background color determination means for determining a pixel value to serve as the background color of the document area from the pixel values of pixels surrounding a character connected component adjacent to the pixel that is the search start point for the document area end pixel; and
document area end pixel determination means for scanning the first search range and determining whether a target pixel is the document area end pixel using the pixel value of the target pixel and the pixel value serving as the background color.

The image processing apparatus according to claim 1, further comprising character area integration means for determining, from the determined first search range and the document area end pixel, whether a plurality of character areas are included in the same document area, and for integrating the character areas when the plurality of character areas are determined to be included in the same document area.

The image processing apparatus according to claim 5, wherein the character area integration means determines that the plurality of character areas are included in the same document area when no document area end pixel is extracted in the first search range located between the plurality of character areas.

A control method of an image processing apparatus for extracting a document area from image data having the document area, the method comprising:
a character area extraction step in which character area extraction means of the image processing apparatus extracts, from input image data, a character area that is a circumscribed rectangle of characters included in the document area;
a first search range determination step in which first search range determination means of the image processing apparatus determines, with reference to the position of the extracted character area, a first search range in which to search for a document area end pixel serving as a document area end;
a document area end pixel extraction step in which document area end pixel extraction means of the image processing apparatus extracts the document area end pixel from the first search range;
a second search range determination step in which second search range determination means of the image processing apparatus determines, with reference to the position of the extracted document area end pixel, a second search range in which to search for a document area end line serving as a document area end;
a document area end line extraction step in which document area end line extraction means of the image processing apparatus extracts the document area end line from the second search range; and
a document area determination step in which document area determination means of the image processing apparatus determines the document area from the document area end line.

A computer program for causing a computer to function as:
character area extraction means for extracting, from input image data, a character area that is a circumscribed rectangle of characters included in a document area;
first search range determination means for determining, with reference to the position of the extracted character area, a first search range in which to search for a document area end pixel serving as a document area end;
document area end pixel extraction means for extracting the document area end pixel from the first search range;
second search range determination means for determining, with reference to the position of the extracted document area end pixel, a second search range in which to search for a document area end line serving as a document area end;
document area end line extraction means for extracting the document area end line from the second search range; and
document area determination means for determining the document area from the document area end line.