JP2017058743A

JP2017058743A - Recognition program, recognition method and recognition device

Info

Publication number: JP2017058743A
Application number: JP2015180731A
Authority: JP
Inventors: Misako So (宗 美佐子); Eigo Segawa (瀬川 英吾)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-09-14
Filing date: 2015-09-14
Publication date: 2017-03-23

Abstract

PROBLEM TO BE SOLVED: To provide a recognition device capable of classifying input images into document images and scene images. SOLUTION: A recognition device 10 executes processing to: acquire an input image; generate, from the input image, a plurality of blurred images that differ from one another in degree of blur; calculate, for each blurred image, the number of connected components obtained by applying labeling processing to that blurred image; and determine whether, among the plurality of blurred images, there is a change in the degree of blur across which the number of connected components changes by a predetermined threshold or more. SELECTED DRAWING: Figure 1

Description

The present invention relates to a recognition program, a recognition method, and a recognition device.

Optical character recognition (OCR) is known as one technique for recognizing characters contained in an image.

Images to which OCR software is applied include images whose content is mainly text, such as a photograph of a document, and images whose content is mainly something other than text, such as a landscape, a figure, or a table. In the following, the former may be referred to as a “document image” and the latter as a “scene image”.

In a document image, characters tend to be aligned according to fixed rules, for example a format defined in terms of characters, lines, and chapters. A character region, that is, a region of the image where characters exist, can therefore be estimated from a document image by layout analysis that detects the line direction (horizontal or vertical writing) and the line spacing. For example, the document image is binarized, and the binarized image is scanned in the horizontal and vertical directions while black pixels are counted, which yields a histogram of black pixels for each direction. The line direction and line spacing are then detected from these histograms, and the character regions are estimated accordingly.
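The projection-histogram step described above can be sketched as follows. This is a minimal illustration, not the layout analysis of any particular OCR product: the function names `projection_histograms` and `count_runs`, the convention that black pixels are stored as 1, and the toy page are all assumptions made for the example.

```python
def projection_histograms(binary):
    """Count black pixels (value 1) in every row and every column."""
    row_hist = [sum(row) for row in binary]
    col_hist = [sum(col) for col in zip(*binary)]
    return row_hist, col_hist


def count_runs(hist):
    """Count maximal runs of non-empty bins; each run is a candidate text line."""
    runs, inside = 0, False
    for value in hist:
        if value > 0 and not inside:
            runs += 1
        inside = value > 0
    return runs


# Toy binarized page (assumed data): two horizontal text lines
# separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
row_hist, col_hist = projection_histograms(page)
print(row_hist, count_runs(row_hist))
print(col_hist, count_runs(col_hist))
```

In this toy case the row histogram splits into two runs separated by empty bins, while the column histogram forms a single run, which is the kind of cue that suggests horizontal writing with a measurable line spacing.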

In a scene image, by contrast, character regions cannot be estimated as easily as in a document image. Even when a scene image contains characters, such as the lettering on a signboard that appears in a landscape, the characters are not necessarily aligned according to any format; they appear locally and irregularly. When layout analysis is performed on a scene image, it is therefore difficult to estimate character regions using format cues such as line direction and line spacing. Instead, the layout analysis uses an algorithm that focuses on discriminating whether a portion of the scene image that resembles a character actually is one.

The algorithms applied to layout analysis thus differ between document images and scene images. For this reason, OCR software for document images is applied to document images, and OCR software for scene images is applied to scene images.

Japanese Laid-open Patent Publication No. 04-330587; Japanese Laid-open Patent Publication No. 08-305793; Japanese Laid-open Patent Publication No. 06-020092; Japanese Laid-open Patent Publication No. 2013-157956

Teruo Akiyama and Isao Masuda, “A method for extracting paper component elements without using format specification information,” Transactions of the Institute of Electronics and Communication Engineers of Japan (電子通信学会論文誌), Vol. J66-D, No. 1

With the above techniques, however, an input image cannot be classified as a document image or a scene image.

In practice, classification into document images and scene images is left to the user, who must perform an operation that switches the OCR software applied to the input image between the document-image method and the scene-image method. Without that user operation, OCR software for scene images may be applied to a document image, or OCR software for document images may be applied to a scene image. In such cases, character recognition accuracy drops, meaningless characters are recognized from non-text portions, or processing time increases.

Moreover, even if layout analysis for document images is repurposed to classify an input image as a document image or a scene image according to whether a line direction and line spacing are detected, misclassification can occur. For example, when the binarized image of a scene image contains stripe-shaped black pixels, a line direction and line spacing may be detected from the black-pixel histograms just as for the binarized image of a document image. The input image is then wrongly classified as a document image even though it is a scene image.

In one aspect, an object of the present invention is to provide a recognition program, a recognition method, and a recognition device capable of classifying an input image as a document image or a scene image.

In one aspect, a computer is caused to execute: a process of acquiring an input image; a process of generating, from the input image, a plurality of blurred images with different degrees of blur; a process of calculating, for each blurred image, the number of connected components obtained by applying labeling processing to that blurred image; and a process of determining whether, among the plurality of blurred images, there is a change in the degree of blur across which the number of connected components changes by a predetermined threshold or more.
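To make the labeling step concrete, the following is a minimal sketch of counting connected components in a binarized image. It is not taken from the publication: the function name `count_connected_components`, the use of 4-connectivity, the breadth-first traversal, and the two toy images are assumptions chosen for illustration.

```python
from collections import deque


def count_connected_components(binary):
    """Count 4-connected groups of black pixels (value 1) by BFS labeling."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            count += 1  # found an unlabeled black pixel: new component
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Assumed toy data: three separate "characters", then the same row after
# enough horizontal blur and re-binarization to merge them into one "line".
sharp = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
]
merged = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
print(count_connected_components(sharp), count_connected_components(merged))
```

The drop in the count between the two toy images is the kind of change in the number of connected components that the determination process compares against a threshold.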

An input image can be classified as a document image or a scene image.

FIG. 1 is a diagram illustrating the functional configuration of the recognition device according to the first embodiment.
FIG. 2A is a diagram illustrating an example of an image.
FIG. 2B is a diagram illustrating an example of an image.
FIG. 2C is a diagram illustrating an example of an image.
FIG. 2D is a diagram illustrating an example of an image.
FIG. 3 is a diagram illustrating an example of an image pyramid.
FIG. 4 is a diagram illustrating an example of a blob extraction result in a blurred image.
FIG. 5 is a diagram illustrating an example of a blurring function.
FIG. 6 is a diagram illustrating an example of a blob extraction result in a blurred image.
FIG. 7 is a diagram illustrating an example of a blurring function.
FIG. 8 is a flowchart illustrating the procedure of the classification processing according to the first embodiment.
FIG. 9 is a flowchart illustrating the procedure of the blurred image generation processing according to the first embodiment.
FIG. 10 is a flowchart illustrating the procedure of the blob count calculation processing according to the first embodiment.
FIG. 11 is a diagram illustrating an example hardware configuration of a computer that executes the recognition program according to the first and second embodiments.

A recognition program, a recognition method, and a recognition device according to the present application are described below with reference to the accompanying drawings. These embodiments do not limit the disclosed technology. The embodiments may be combined as appropriate to the extent that their processing contents do not conflict.

[Configuration of recognition device]
FIG. 1 is a diagram illustrating the functional configuration of the recognition device according to the first embodiment. The recognition device 10 shown in FIG. 1 performs character recognition processing that recognizes characters contained in an input image. As part of this processing, it implements classification processing that classifies the input image as a document image or a scene image by determining whether, across a series of blurred images generated from the input image with different degrees of blur, there is a change in the number of blobs at which black-pixel blobs merge from character units into line units.

In one embodiment, the recognition device 10 can be implemented by installing, on a desired computer, a character recognition program (so-called OCR software) that provides the above character recognition processing as packaged software or online software. The OCR software may be provided as an application program that includes, as a component, a classification program implementing the above classification processing, or the classification program may be provided as a library added on to the OCR software. For example, the character recognition program may be installed on mobile terminal devices, including not only mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) handsets but also tablet and slate terminals, so that the mobile terminal device functions as the recognition device 10. The following description takes as an example the case where the recognition device 10 is implemented as a mobile terminal device, but the recognition program can also be installed on a stationary terminal device such as a personal computer.

The following description takes, purely as an example, the case where the character recognition program is executed on a computer used by the user, but the program need not run on the user's computer. It suffices to provide a function that takes an image transmitted from the user's computer as the input image and outputs the classification result or the character recognition result for that input image; the function may be implemented on a computer on a network, for example a server device, or as a cloud formed by a physical machine or a group of physical machines.

FIG. 1 shows the processing units that virtually implement the character recognition processing and the storage unit that those processing units reference or register data in, but these are only the minimum processing and storage units needed for the character recognition processing. The recognition device 10 may also include, besides the functional units shown in FIG. 1, various functional units that known computers have as standard. For example, when implemented as a tablet terminal, it may further include motion sensors such as an acceleration sensor and an angular velocity sensor. When implemented as a mobile communication terminal, it may further include functional units such as an antenna and a GPS (Global Positioning System) receiver. When implemented as a stationary terminal device, it may include input/output devices such as a keyboard, a mouse, and a display.

As shown in FIG. 1, the recognition device 10 includes an image storage unit 11, an acquisition unit 12, a conversion unit 13, a generation unit 14, a classification unit 15, a first recognition unit 16a, and a second recognition unit 16b.

The image storage unit 11 is a storage unit that stores image data.

In one embodiment, when an image is captured by a camera or the like (not shown), the captured image data is registered in the image storage unit 11. In addition, when image data is downloaded from any computer on a network via a communication interface or the like (not shown), the downloaded image is registered in the image storage unit 11. The image storage unit 11 can store images in any file format.

FIGS. 2A to 2D are diagrams illustrating examples of images, each of which partly contains characters. FIGS. 2A and 2B show photographed document pages: the image in FIG. 2A shows one page of a horizontally written paper, and the image in FIG. 2B shows one page of a vertically written book. FIGS. 2C and 2D show photographed scenes: in the image in FIG. 2C, a signboard within the landscape bears character strings such as “仙人岳”, and in the image in FIG. 2D, character strings appear on a poster at the upper left, on a bulletin board left of center, and on parts of cardboard boxes placed in various locations.

Thus, on mobile terminal devices typified by smartphones, document images and scene images end up stored together. For example, document images are saved when a document is photographed in place of taking notes, as in FIGS. 2A and 2B, and scene images are saved when a landscape or the like is photographed as a record, as in FIGS. 2C and 2D.

Recognizing the characters in these images has technical significance such as the following. Characters recognized in a document image can be used for editing in applications such as text editors, word processors, and spreadsheet software. Characters recognized in a scene image can be attached to the scene image's metadata as keywords for searching for that scene image.

If the user were left to perform the operation that switches the layout analysis applied to an image stored in the image storage unit 11 between the document-image method and the scene-image method, user convenience could suffer. The recognition device 10 according to this embodiment therefore aims to automate classification into document images and scene images, providing an environment in which character recognition processing can be executed without that user operation.

The acquisition unit 12 is a processing unit that acquires an image.

In one embodiment, the acquisition unit 12 starts the above classification processing under the following conditions and acquires an image stored in the image storage unit 11. For example, when the acquisition unit 12 receives, via an operation unit or the like (not shown), a designation of an image among those stored in the image storage unit 11, it reads the designated image from the image storage unit 11. Alternatively, each time a new image is registered in the image storage unit 11, the acquisition unit 12 reads the newly registered image from the image storage unit 11; in this case the classification processing is started by the acquisition unit 12 and runs in the background. The image acquired by the acquisition unit 12 becomes the input to the classification processing and may be referred to below as the “input image”.

Although the case where the image is acquired from the image storage unit 11 of the recognition device 10 has been described purely as an example, the recognition device 10 need not store the images itself. For example, the acquisition unit 12 can acquire an image via a communication interface (not shown) from a computer on a network, such as a file server, a Web server, or a cloud. The acquisition unit 12 can also acquire an image from removable media such as a memory card or a USB (Universal Serial Bus) memory.

The conversion unit 13 is a processing unit that converts the image format of the input image into a predetermined image format.

In one embodiment, the conversion unit 13 determines whether the input image acquired by the acquisition unit 12 is in a color format and, if so, converts it from the color format to a grayscale format. The conversion to grayscale can be implemented by any method. For example, when the input image is image data in the RGB color system, the conversion unit 13 can convert the RGB pixel values of each pixel into pixel values of the L*a*b* color system or the YUV color system and then set the lightness L or the luminance Y as the pixel value of that pixel.

The conversion to pixel values of another color system is only an example; the method of conversion to grayscale is not limited to it. For example, the conversion unit 13 may extract the pixel value of any one of the RGB components as a representative, or may compute a single representative value by applying statistical processing to the pixel values of multiple components.
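The grayscale options mentioned above can be sketched as follows. This is an illustrative sketch only: the function name `to_grayscale`, the `method` parameter, and the choice of the ITU-R BT.601 luma weights for the luminance Y are assumptions, since the publication does not fix a particular formula.

```python
def to_grayscale(rgb_image, method="luma"):
    """Convert rows of (R, G, B) tuples into rows of single gray values.

    method="luma"  : luminance Y with BT.601 weights (assumed choice)
    method="green" : one RGB component taken as the representative
    method="mean"  : a simple statistic over the three components
    """
    gray = []
    for row in rgb_image:
        out = []
        for r, g, b in row:
            if method == "luma":
                value = 0.299 * r + 0.587 * g + 0.114 * b
            elif method == "green":
                value = g
            elif method == "mean":
                value = (r + g + b) / 3
            else:
                raise ValueError(f"unknown method: {method}")
            out.append(int(round(value)))
        gray.append(out)
    return gray


# Assumed sample pixels: white, black, and a mid-tone color.
sample = [[(255, 255, 255), (0, 0, 0), (30, 60, 90)]]
print(to_grayscale(sample, "luma"))
print(to_grayscale(sample, "green"))
print(to_grayscale(sample, "mean"))
```

All three variants yield a single-channel image, which is all that the later blurring and labeling steps need.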

The generation unit 14 is a processing unit that generates, from the input image, a plurality of blurred images with different degrees of blur.

In one embodiment, the generation unit 14 generates blurred images by convolving a Gaussian filter with the input image that the conversion unit 13 has converted from color to grayscale. The generation unit 14 performs two convolutions independently: one applies to the input image a Gaussian filter that blurs the image in the horizontal direction, and the other applies a Gaussian filter that blurs the image in the vertical direction. The generation unit 14 thereby generates one blurred image in which the input image is blurred horizontally and another in which it is blurred vertically. In the following, the former may be referred to as a “horizontally blurred image” and the latter as a “vertically blurred image”. Furthermore, the generation unit 14 computes Gaussian filters while changing the variance σ^2 of the Gaussian function used to generate the filter, in other words the scale t, from an initial value t_s to a target value t_e in steps of a predetermined update width Δt. This yields Gaussian filters with different degrees of blur, and a convolution is performed for each filter. As a result, an image pyramid with the structure shown in FIG. 3 is generated from a single input image. FIG. 3 is a diagram illustrating an example of an image pyramid. As shown in FIG. 3, a series of horizontally blurred images with different horizontal scales t_H and a series of vertically blurred images with different vertical scales t_V are generated, one series per blur direction.
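The idea of two independent directional blur series can be sketched as follows. This is a simplified illustration and not the filters of formulas (1) and (2): it assumes a plain one-dimensional Gaussian with variance t applied along one axis only, border clamping, and a kernel radius of about two standard deviations. The function names are hypothetical.

```python
import math


def gaussian_kernel_1d(t, radius):
    """Normalized 1-D Gaussian with variance t, sampled at -radius..radius."""
    if t <= 0:
        return [1.0]  # zero variance: identity kernel, no blur
    weights = [math.exp(-(k * k) / (2.0 * t)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_horizontal(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(
                kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
                for k in range(-radius, radius + 1)
            )
            for x in range(width)
        ]
        for row in image
    ]


def blur_vertical(image, kernel):
    """Blur along columns by transposing, blurring rows, and transposing back."""
    transposed = [list(col) for col in zip(*image)]
    return [list(col) for col in zip(*blur_horizontal(transposed, kernel))]


def build_pyramid(image, scales):
    """Return (horizontal series, vertical series), one blurred image per scale."""
    horizontal, vertical = [], []
    for t in scales:
        radius = max(1, round(2 * math.sqrt(t))) if t > 0 else 0
        kernel = gaussian_kernel_1d(t, radius)
        horizontal.append(blur_horizontal(image, kernel))
        vertical.append(blur_vertical(image, kernel))
    return horizontal, vertical


# Assumed toy input: a single bright pixel in a one-row grayscale image.
image = [[0, 0, 255, 0, 0]]
h_series, v_series = build_pyramid(image, [0, 1.0, 2.0])
print(h_series[1][0])
```

Keeping the two directions separate means a horizontal blur can merge characters along a horizontally written line without merging neighboring lines, and the vertical series does the same for vertical writing.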

[Gaussian filter calculation formula]
The first Gaussian filter, used to generate the horizontally blurred images, can be calculated, as one example, according to formula (1) below. Likewise, the second Gaussian filter, used to generate the vertically blurred images, can be calculated according to formula (2) below.

In formula (1) above, (x, y) denotes coordinates on the image: “x” is the X-axis coordinate corresponding to the horizontal direction of the image, and “y” is the Y-axis coordinate corresponding to the vertical direction. “t_H” in formula (1) denotes the horizontal scale, and “t_V” in formula (2) denotes the vertical scale. “c” in formulas (1) and (2) denotes a positive constant sufficiently smaller than 1, set based on experimental results or the like; for example, a value of 0.1 to 0.2 is adopted.

Using formulas (1) and (2), the generation unit 14 sets the scale t to the initial value t_s, for example “0”, and calculates the Gaussian filter for t = 0; it then calculates the Gaussian filter for each scale t, for each of the horizontal and vertical blur directions, while incrementing t by Δt until t reaches the target value t_e. The filters calculated in this way are not always isotropic, with equal numbers of rows and columns; filters with different numbers of rows and columns, that is, anisotropic Gaussian filters, may also be calculated. For example, as the scale t increases, it becomes more likely that the first Gaussian filter has more rows than columns and that the second Gaussian filter has more columns than rows.

When the scale t is “0”, the variance σ^2 of the Gaussian function is set to “0”. Applying the t = 0 Gaussian filter to the input image therefore produces, as the blurred image, an image with no blur, that is, an image identical to the input image. The t = 0 Gaussian filter thus need not be calculated using formulas (1) and (2); the input image can be used as-is as the t = 0 blurred image. Although the initial value t_s of the scale t is “0” in this example, it need not be “0”.
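The scale loop and the t = 0 shortcut can be sketched as follows. The helper names `scale_schedule` and `blurred_series`, and the `blur` callback that stands in for whatever filtering is applied at a nonzero scale, are hypothetical and introduced only for this illustration.

```python
import math


def scale_schedule(t_start, t_end, delta_t):
    """List the scales t_start, t_start + delta_t, ... up to and including t_end."""
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    # The small epsilon guards against floating-point error in the division.
    steps = int(math.floor((t_end - t_start) / delta_t + 1e-9))
    return [t_start + i * delta_t for i in range(steps + 1)]


def blurred_series(image, scales, blur):
    """Apply blur(image, t) per scale, reusing the input unchanged when t == 0."""
    return [image if t == 0 else blur(image, t) for t in scales]


scales = scale_schedule(0, 2, 0.5)
print(scales)
```

Computing each scale as `t_start + i * delta_t` rather than by repeated addition avoids accumulating rounding error over many steps.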

[Filter size]
When Gaussian filters are calculated while the scale t is updated step by step in this way, the filter size can also be changed in conjunction with the scale t. The larger the scale t, the wider the range over which the input image is blurred; if the filter size is small relative to the scale t, the filter size becomes a constraint and the horizontal or vertical blurring does not work properly.

The generation unit 14 therefore sets a larger filter size as the scale t of the Gaussian filter increases. For example, the filter size L of the first and second Gaussian filters can be calculated by formula (3) below. In formula (3), “σ” is the standard deviation of the Gaussian function. “n” is an integer whose value can be chosen according to whether calculation speed or accuracy takes priority: for example, “1” can be used when speed is prioritized over accuracy, and “2” when accuracy is prioritized over speed. “integer” denotes conversion to an integer type; here, as one example, the product n*σ is converted to an integer by rounding or the like. However, when σ takes a small nonzero value, the minimum value of L is 3.

L = 2 * integer(n * σ) + 1 ... (3)
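Formula (3) and its lower bound can be sketched as follows. The function names are hypothetical; rounding half up is one reading of "rounding or the like", and returning 1 for σ = 0 simply follows formula (3) with no blur, which is an assumption about a case the text does not spell out. The relation σ = sqrt(t) follows from the scale t being the variance σ^2.

```python
import math


def filter_size(sigma, n=2):
    """Filter size L = 2 * integer(n * sigma) + 1, with L >= 3 for sigma > 0."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    if sigma == 0:
        return 1  # assumed: no blur, a single-tap (identity) filter
    half = int(math.floor(n * sigma + 0.5))  # round half up
    return max(3, 2 * half + 1)


def filter_size_for_scale(t, n=2):
    """Same computation expressed in terms of the scale t (the variance)."""
    return filter_size(math.sqrt(t), n)


for sigma in (0.2, 1.5, 2.0):
    print(sigma, filter_size(sigma, n=1), filter_size(sigma, n=2))
```

The result is always odd, so the filter has a well-defined center tap, and choosing n = 2 roughly doubles the support compared with n = 1 at the cost of more multiplications per pixel.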

The classification unit 15 is a processing unit that classifies the input image as a document image or a scene image.

In one embodiment, the classification unit 15 classifies the input image as a document image or a scene image by determining whether, across a series of blurred images generated from the input image with different degrees of blur, there is a change in the number of blobs at which black-pixel blobs merge from character units into line units. That is, based on the amount of change in the number of black-pixel blobs in the blurred images relative to the change in the degree of blur, the classification unit 15 determines whether the input image has the hierarchical structure characteristic of formatted text, for example a hierarchy of characters, lines, chapters, and pages.

Even when a document image contains figures, tables, or photographs, vertically or horizontally written text is likely to occupy at least a certain proportion of it. A document image therefore contains image elements with a characteristic scale at each level of the hierarchy, such as characters, lines, chapters (blocks of lines), and pages. A scene image, by contrast, has no hierarchy with characteristic scales; image elements of various sizes are scattered throughout it.

Consequently, if there is a point at which the number of black-pixel blobs drops sharply, it is likely that the change in the degree of blur caused character-level blobs to merge into line-level blobs, or line-level blobs to merge into chapter-level or page-level blobs. When such a change exists, the input image can be presumed to have a format-specific hierarchical structure, which makes it more likely to be a document image. When no such change exists, the input image can be presumed to lack a format-specific hierarchical structure, which makes it more likely to be a scene image. In this way, the classification unit 15 can automatically classify the input image as a document image or a scene image.

As shown in FIG. 1, the classification unit 15 includes a blob count calculation unit 15a, an extreme value calculation unit 15b, and a determination unit 15c.

The blob count calculation unit 15a is a processing unit that calculates the number of blobs contained in a blurred image. A "blob" here means a connected component of black pixels that have been given the same label by the labeling process.

In one embodiment, the blob count calculation unit 15a performs the following processing on each blurred image in the series of horizontally blurred images and the series of vertically blurred images generated by the generation unit 14. The blob count calculation unit 15a first determines whether the pixel value of each pixel in the blurred image is greater than or equal to a threshold. It then sets the pixel value "0", corresponding to black, for pixels whose value is greater than or equal to the threshold, and the pixel value "1", corresponding to white, for pixels whose value is below the threshold. This yields a binarized image in which every pixel of the blurred image has been binarized to either the white pixel value "1" or the black pixel value "0".

The blob count calculation unit 15a then performs a labeling process on the binarized image. The "labeling process" here means a process that assigns the same label to contiguous white pixels or contiguous black pixels in the binarized image; any known method can be used. Because characters in a document are rendered in black or a near-black color, it is assumed here, as an example, that the same identifier is assigned to contiguous pixels having the black pixel value "0". By performing the labeling process on the binarized version of the blurred image in this way, connected components of black pixels sharing the same label are extracted as "blobs". The blob count calculation unit 15a then counts the black-pixel blobs extracted from the binarized image by the labeling process and converts that count to its logarithm.
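
The binarization, labeling, and logarithm steps described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function names are hypothetical, the image is a list of rows of numeric pixel values, 8-connectivity is assumed (the source does not specify the connectivity), the natural logarithm is assumed, and a count of zero is mapped to negative infinity, which the source does not address.

```python
import math
from collections import deque


def binarize(image, threshold):
    """Source convention: value >= threshold -> 0 (black), else 1 (white)."""
    return [[0 if value >= threshold else 1 for value in row] for row in image]


def count_black_blobs(binary):
    """Count connected components of black (0) pixels via breadth-first search.

    8-connectivity is an assumption; the source allows any known labeling method.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 0 or seen[y][x]:
                continue
            blobs += 1  # found an unlabeled black pixel: a new blob starts here
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 0 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blobs


def log_blob_count(image, threshold):
    """Logarithm of the black-pixel blob count of one blurred image."""
    blobs = count_black_blobs(binarize(image, threshold))
    # Assumption: an image with no blobs is reported as -inf.
    return math.log(blobs) if blobs > 0 else float("-inf")
```

Calling `log_blob_count` once per blurred image yields one sample of the blob-count logarithm per scale.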

This yields, for each of the horizontal and vertical blur directions, the correspondence between the scale t of each blurred image in the series and the logarithm of the number of black-pixel blobs contained in the blurred image at that scale. In the following, the processing units implemented virtually on the processor treat the change in the logarithm of the blob count as a function with the scale t as the independent variable, and this function may be referred to as the "blur function". When the blur function is referred to without regard to blur direction, it is written "blur function log f(t)"; the blur function for the horizontal blur direction is written "blur function log f(t_H)", and the blur function for the vertical blur direction is written "blur function log f(t_V)".

The extreme value calculation unit 15b is a processing unit that calculates the extreme values of the first derivative of the blur function.

In one embodiment, the extreme value calculation unit 15b calculates the first derivative d log f(t)/dt by differentiating the blur function log f(t) with respect to the scale t. It further calculates the second derivative d²log f(t)/dt² of the blur function. It then detects the zero-crossing points at which the second derivative d²log f(t)/dt² equals "0", thereby finding the extreme values of the first derivative d log f(t)/dt. Because Gaussian blurring makes the blob count decrease monotonically as the scale increases, these extreme values are local minima. Each time an extreme value of the first derivative d log f(t)/dt is found, the extreme value calculation unit 15b increments the number of extreme values found so far by one, thereby tallying the number of extreme values separately for the horizontal and vertical blur directions. In this way, the number of extreme values of the first derivative d log f(t_H)/dt and the number of extreme values of the first derivative d log f(t_V)/dt are obtained.
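
On sampled data, the derivative and zero-crossing steps described above might be approximated with finite differences. The sketch below is an illustration under assumptions: the function name is hypothetical, the scales are assumed to be evenly spaced by `dt`, forward differences stand in for the derivatives, and the tolerance `eps` for treating a second difference as zero is not from the source.

```python
def count_first_derivative_minima(log_counts, dt=1.0, eps=1e-12):
    """Count local minima of d log f(t)/dt from evenly spaced samples.

    A local minimum of the first derivative is detected where the second
    difference crosses zero from negative to positive. Values within eps
    of zero are skipped so that plateaus do not produce extra crossings.
    """
    first = [(b - a) / dt for a, b in zip(log_counts, log_counts[1:])]
    second = [(b - a) / dt for a, b in zip(first, first[1:])]

    minima = 0
    last_sign = 0  # sign of the most recent non-zero second difference
    for value in second:
        if abs(value) <= eps:
            continue
        sign = 1 if value > 0 else -1
        if last_sign < 0 and sign > 0:
            minima += 1  # negative -> positive: first derivative has a minimum
        last_sign = sign
    return minima
```

A sequence with two abrupt drops, such as `[5, 5, 5, 3, 3, 3, 1, 1, 1]`, produces two minima, whereas a flat or steadily declining sequence produces none.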

The determination unit 15c is a processing unit that determines whether the number of extreme values of the first derivative of the blur function satisfies a predetermined condition.

In one embodiment, the determination unit 15c determines whether the number of extreme values of the first derivative d log f(t_H)/dt of the blur function for the horizontal blur direction, as calculated by the extreme value calculation unit 15b, is greater than zero. If it is, the determination unit 15c further determines whether the number of extreme values of the first derivative d log f(t_V)/dt of the blur function for the vertical blur direction is greater than zero. If the number of extreme values of d log f(t_V)/dt is also greater than zero, the input image can be presumed to have a format-specific hierarchical structure in both the horizontal and vertical blur directions, so the input image is more likely to be a document image. In this case, the determination unit 15c classifies the input image as a "document image" and outputs it to the first recognition unit 16a, which executes character recognition processing using layout analysis for document images. On the other hand, if the number of extreme values of d log f(t_H)/dt is zero, or the number of extreme values of d log f(t_V)/dt is zero, the input image can be presumed to lack a format-specific hierarchical structure in at least one of the horizontal and vertical blur directions, so the input image is more likely to be a scene image. In this case, the determination unit 15c classifies the input image as a "scene image" and outputs it to the second recognition unit 16b, which executes character recognition processing using layout analysis for scene images.
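
The decision rule described above reduces to a conjunction of two comparisons. A minimal sketch follows; the function name and the string labels are hypothetical and chosen only for illustration.

```python
def classify_by_extrema(num_extrema_horizontal: int, num_extrema_vertical: int) -> str:
    """Return "document" only when both blur directions show at least one extremum."""
    if num_extrema_horizontal > 0 and num_extrema_vertical > 0:
        return "document"  # hierarchy presumed in both directions
    return "scene"         # hierarchy missing in at least one direction
```

For instance, two horizontal extrema together with one vertical extremum give "document", while a zero in either direction gives "scene".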

An example of the method for classifying the input image is now described with reference to FIGS. 4 to 7. FIGS. 4 and 6 show examples of blob extraction results from blurred images. FIGS. 5 and 7 show examples of the blur function. FIGS. 4 and 6 show the blobs extracted from blurred images of an input image capturing one page of a horizontally written, two-column document. FIG. 4 shows the blobs extracted from a series of horizontally blurred images with different horizontal scales t_H, and FIG. 6 shows the blobs extracted from a series of vertically blurred images with different vertical scales t_V. The upper part of FIG. 5 shows the blur function log f(t_H) for the horizontal blur direction, and the lower part of FIG. 5 shows its first derivative d log f(t_H)/dt. Likewise, the upper part of FIG. 7 shows the blur function log f(t_V) for the vertical blur direction, and the lower part of FIG. 7 shows its first derivative d log f(t_V)/dt.

The upper part of FIG. 4 shows the blobs extracted from the horizontally blurred image at t_H = 0 (that is, the input image). As shown, while the scale t_H is at its initial value, character-level blobs are extracted from the blurred image. As the scale t_H is updated in steps of Δt, the character-level blobs merge under horizontal blurring into line-level blobs, as shown in the middle part of FIG. 4. As the scale t_H is updated further in steps of Δt, the line-level blobs merge under horizontal blurring into page-level blobs, as shown in the lower part of FIG. 4.

The logarithms of the blob counts calculated from this series of horizontally blurred images give the blur function log f(t_H) shown in the upper part of FIG. 5. The extreme values of the first derivative d log f(t_H)/dt are calculated by detecting the zero-crossing points at which the second derivative d²log f(t_H)/dt² of this blur function equals "0". As shown in the lower part of FIG. 5, the point at scale t = t_HM1 and the point at scale t = t_HM2 are calculated as extreme values. These two extreme values correspond to the scale at which character-level blobs merged under horizontal blurring into line-level blobs, and the scale at which line-level blobs merged under horizontal blurring into page-level blobs.

Thus, for horizontal blurring, the number of extreme values of the first derivative d log f(t_H)/dt is found to be "2", from which it can be presumed that the input image has a three-level hierarchical structure in the horizontal direction: characters, lines (single lines within a column), and the page.

The upper part of FIG. 6, in turn, shows the blobs extracted from the vertically blurred image at t_V = 0 (that is, the input image). As shown, while the scale t_V is at its initial value, character-level blobs are extracted from the blurred image. As the scale t_V is updated in steps of Δt, the character-level blobs merge under vertical blurring into page-level blobs, as shown in the lower part of FIG. 6.

The logarithms of the blob counts calculated from this series of vertically blurred images give the blur function log f(t_V) shown in the upper part of FIG. 7. The extreme values of the first derivative d log f(t_V)/dt are calculated by detecting the zero-crossing points at which the second derivative d²log f(t_V)/dt² of this blur function equals "0". As shown in the lower part of FIG. 7, the point at scale t = t_VM1 is calculated as an extreme value. This extreme value corresponds to the scale at which character-level blobs merged under vertical blurring into page-level blobs.

Thus, for vertical blurring, the number of extreme values of the first derivative d log f(t_V)/dt is found to be "1", from which it can be presumed that the input image has a two-level hierarchical structure in the vertical direction: characters and the page.

As described above, by determining whether the number of extreme values of the first derivative d log f(t_H)/dt is greater than zero and, further, whether the number of extreme values of the first derivative d log f(t_V)/dt is greater than zero, the processor can estimate whether the input image has a format-specific hierarchical structure in both the horizontal and vertical blur directions. In this example, the number of extreme values of d log f(t_H)/dt is greater than zero and the number of extreme values of d log f(t_V)/dt is also greater than zero, so the input image is correctly classified as a "document image".

The logarithm of the blob count, rather than the blob count itself, is used in the blur function for the following reason. The size of the drop in blob count differs greatly between the case where character-level blobs merge into line-level blobs and the case where line-level blobs merge into page-level blobs. Specifically, the drop when line-level blobs merge into page-level blobs can be markedly smaller than the drop when character-level blobs merge into line-level blobs. The drop in blob count therefore becomes harder to detect at large scales than at small scales. For this reason, the blur function uses the logarithm of the blob count so that the processor handles values of a comparable magnitude whether the scale is small or large.
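
The effect of taking the logarithm can be seen with a small numeric sketch. The blob counts below are hypothetical values chosen only for illustration; they do not come from the source, and the natural logarithm is an assumption.

```python
import math

# Hypothetical blob counts at the character, line, and page levels
# (illustrative values only, not taken from the source).
chars, lines, pages = 1000, 50, 1

# Raw drops differ by more than an order of magnitude: (950, 49).
raw_drops = (chars - lines, lines - pages)

# Log drops are of comparable size: about 3.0 and about 3.9.
log_drops = (math.log(chars) - math.log(lines),
             math.log(lines) - math.log(pages))
```

With the raw counts, the line-to-page transition is dwarfed by the character-to-line transition; on the logarithmic scale both transitions appear as drops of similar magnitude.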

Although the case of classifying the input image as a document image or a scene image by determining whether any extreme values exist has been illustrated here, the change points at which the unit of the blobs switches due to blurring can also be estimated from whether the amount of change in the logarithm of the blob count, that is, the size of the drop, is greater than or equal to a predetermined threshold. The input image can therefore also be classified as a document image or a scene image according to whether there is a scale change point at which the drop in the logarithm of the blob count is greater than or equal to the threshold.
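
This threshold-based variant might look like the following sketch. The function name is hypothetical, and the drop is assumed to be measured between blurred images at consecutive scales, which the source does not state explicitly.

```python
def has_large_log_drop(log_counts, threshold):
    """True if the log blob count drops by at least `threshold` between
    any two consecutive scales (assumed interpretation of the change point)."""
    return any(prev - curr >= threshold
               for prev, curr in zip(log_counts, log_counts[1:]))
```

A sequence such as `[5.0, 4.9, 3.0, 2.9]` with a threshold of 1.0 contains such a change point, whereas `[5.0, 4.8, 4.6]` does not.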

The first recognition unit 16a and the second recognition unit 16b are both processing units that recognize characters contained in the input image. They differ in that the first recognition unit 16a uses layout analysis for document images in its character recognition processing, whereas the second recognition unit 16b uses layout analysis for scene images.

The processing units described above, such as the acquisition unit 12, the conversion unit 13, the generation unit 14, the classification unit 15, the first recognition unit 16a, and the second recognition unit 16b, can be implemented as follows. For example, they can be realized by having a central processing unit (CPU) or the like load into memory and execute a process that provides the same functions as each of these processing units. These processing units need not be executed by a central processing unit and may instead be executed by an MPU (Micro Processing Unit). Each of these functional units can also be realized by hard-wired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The image storage unit 11 can be implemented, as an example, with various semiconductor memory elements such as RAM (Random Access Memory) or flash memory. The image storage unit 11 need not be a main storage device and may be an auxiliary storage device. In that case, an HDD (Hard Disk Drive), an optical disc, an SSD (Solid State Drive), or the like can be used.

[Processing flow]
Next, the flow of processing performed by the recognition device 10 according to the present embodiment is described. Here, (1) the classification processing executed by the recognition device 10 is described first, followed by (2) the blurred image generation processing and (3) the blob count calculation processing, which are executed as subroutines of the classification processing.

(1) Classification processing
FIG. 8 is a flowchart showing the procedure of the classification processing according to Embodiment 1. As an example, this processing starts when an image is designated from among the images stored in the image storage unit 11, or when a new image is registered in the image storage unit 11.

As shown in FIG. 8, when an image is acquired by the acquisition unit 12 (step S101), the conversion unit 13 converts the input image from color format to grayscale format (step S102). If the input image is already grayscale, step S102 can be skipped.

The generation unit 14 then executes the "blurred image generation processing", which generates a plurality of blurred images with different degrees of blur from the input image converted to grayscale in step S102 (step S103).

Next, the blob count calculation unit 15a executes the "blob count calculation processing", which calculates the number of blobs contained in each blurred image of the series of horizontally blurred images and the series of vertically blurred images generated in step S103 (step S104).

After that, the extreme value calculation unit 15b detects the zero-crossing points at which the second derivative d²log f(t)/dt² equals "0", where the blur function is determined from the logarithms of the black-pixel blob counts of the blurred images at each scale t calculated in step S104 for the horizontal and vertical blur directions. It thereby calculates the extreme values of the first derivative d log f(t)/dt of the blur function for each of the horizontal and vertical blur directions (step S105). The extreme value calculation unit 15b then tallies the number of times an extreme value was calculated in step S105, that is, the number of extreme values, for each of the horizontal and vertical blur directions (step S106).

Next, the determination unit 15c determines whether, among the numbers of extreme values tallied in step S106, the number of extreme values of the first derivative d log f(t_H)/dt of the blur function for the horizontal blur direction is greater than zero (step S107).

If the number of extreme values of the first derivative d log f(t_H)/dt is greater than zero (Yes in step S107), the determination unit 15c further determines whether, among the numbers of extreme values tallied in step S106, the number of extreme values of the first derivative d log f(t_V)/dt of the blur function for the vertical blur direction is greater than zero (step S108).

If the number of extreme values of the first derivative d log f(t_V)/dt is also greater than zero (Yes in step S108), the input image can be presumed to have a format-specific hierarchical structure in both the horizontal and vertical blur directions, so the input image is more likely to be a document image. In this case, the determination unit 15c classifies the input image as a "document image" (step S109), outputs the input image to the first recognition unit 16a, which executes character recognition processing using layout analysis for document images (step S111), and ends the processing.

On the other hand, if the number of extreme values of the first derivative d log f(t_H)/dt is zero, or the number of extreme values of the first derivative d log f(t_V)/dt is zero (No in step S107 or No in step S108), the following judgment can be made. The input image can be presumed to lack a format-specific hierarchical structure in at least one of the horizontal and vertical blur directions, so the input image is more likely to be a scene image. In this case, the determination unit 15c classifies the input image as a "scene image" (step S110), outputs the input image to the second recognition unit 16b, which executes character recognition processing using layout analysis for scene images (step S111), and ends the processing.

(2) Blurred image generation processing
FIG. 9 is a flowchart showing the procedure of the blurred image generation processing according to Embodiment 1. This processing corresponds to step S103 shown in FIG. 8 and starts when the input image has been converted to grayscale in step S102, or when a grayscale input image has been acquired in step S101. The processing is identical for the horizontal and vertical blur directions except for whether the horizontal algorithm of equation (1) or the vertical algorithm of equation (2) is used to calculate the Gaussian filter. Only the generation of one kind of blurred image, namely the horizontally blurred images, is therefore illustrated here.

As shown in FIG. 9, the generation unit 14 sets the scale t_H to an initial value t_s, for example "0" (step S301). The generation unit 14 then calculates the Gaussian filter for the scale t_H by applying t_H to the algorithm for calculating a Gaussian filter with a horizontal blur direction, that is, equation (1) above (step S302).

Next, the generation unit 14 applies the Gaussian filter for the scale t_H calculated in step S302 to the input image converted to grayscale format in step S102 and performs the convolution (step S303). This generates the horizontally blurred image at scale t_H.

If the scale t_H has not yet been updated to the target value t_e (No in step S304), the generation unit 14 adds a predetermined update value Δt to the scale t_H (step S305) and repeats steps S302 and S303. Once the scale t_H has been updated to the target value t_e (Yes in step S304), the processing ends.

Although the generation of a series of horizontally blurred images has been illustrated here with reference to FIG. 9, a series of vertically blurred images can be generated in the same way by using the vertical algorithm of equation (2) to calculate the Gaussian filter. Either the horizontally blurred images or the vertically blurred images may be generated first, or the two may be generated in parallel.
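
The loop of steps S301 to S305 can be sketched as follows for the horizontal direction. This is a rough illustration under several assumptions that are not taken from the source: all function names are hypothetical, the kernel is a plain normalized 1-D Gaussian standing in for equation (1), the standard deviation is assumed to be σ = √t, borders are handled by replicating edge pixels, and the image is a list of rows of numeric pixel values.

```python
import math


def gaussian_kernel_1d(sigma, n=2):
    """Normalized 1-D Gaussian kernel; a stand-in for equation (1)."""
    if sigma <= 0:
        return [1.0]  # scale 0: identity filter, blurred image equals input
    half = max(int(math.floor(n * sigma + 0.5)), 1)  # filter size L >= 3
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel (horizontal blur, replicated borders)."""
    half = len(kernel) // 2
    blurred = []
    for row in image:
        width = len(row)
        blurred.append([
            sum(weight * row[min(max(x + k - half, 0), width - 1)]
                for k, weight in enumerate(kernel))
            for x in range(width)
        ])
    return blurred


def horizontal_blur_series(image, t_start, t_end, dt):
    """Return [(t, blurred_image), ...] for t = t_start, t_start + dt, ..., t_end."""
    steps = int(round((t_end - t_start) / dt))
    series = []
    for i in range(steps + 1):
        t = t_start + i * dt
        sigma = math.sqrt(t)  # assumption: scale t is the Gaussian variance
        series.append((t, blur_rows(image, gaussian_kernel_1d(sigma))))
    return series
```

Computing each scale as `t_start + i * dt` from an integer step count, instead of accumulating Δt repeatedly, avoids floating-point drift in the termination test; a vertical series could be obtained by applying the same row blur to the transposed image.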

(3) Blob count calculation processing
FIG. 10 is a flowchart showing the procedure of the blob count calculation processing according to Embodiment 1. This processing corresponds to step S104 shown in FIG. 8 and is executed after step S103 has been executed.

As shown in FIG. 10, the blob count calculation unit 15a selects one of the scales t used to generate the blurred images in step S103 (step S501). The blob count calculation unit 15a then generates a binarized image of the horizontally blurred image corresponding to the scale t selected in step S501 (step S502).

The blob count calculation unit 15a then counts the blobs obtained by applying the labeling process to the binarized image of the horizontally blurred image obtained in step S502 (step S503). Next, the blob count calculation unit 15a converts the blob count obtained in step S503 to its logarithm (step S504).

The blob count calculation unit 15a also generates a binarized image of the vertically blurred image corresponding to the scale t selected in step S501 (step S505).

Next, the blob count calculation unit 15a counts the blobs obtained by applying labeling processing to the binarized image of the vertically blurred image obtained in step S505 (step S506). The blob count calculation unit 15a then converts the blob count obtained in step S506 into a logarithm (step S507).

Thereafter, as long as an unselected scale remains (Yes in step S508), an unselected scale is selected (step S501) and the processing of steps S502 to S507 is repeated. When no unselected scale remains (No in step S508), the processing ends.
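Steps S501 to S508 can be sketched as follows. This is an illustrative sketch only: the fixed binarization threshold of 128, the use of 4-connectivity for labeling, the natural logarithm, and the clamping of a zero count to 1 before taking the logarithm are all assumptions, and the function names are hypothetical.

```python
import math
from collections import deque

def binarize(image, threshold=128):
    """Return a 0/1 mask in which 1 marks a dark (black-side) pixel."""
    return [[1 if v < threshold else 0 for v in row] for row in image]

def count_blobs(mask):
    """Count 4-connected components of 1-pixels by breadth-first labeling."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    blobs = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            blobs += 1  # a new, not yet labeled blob starts here
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return blobs

def log_blob_counts(horizontal_series, vertical_series, threshold=128):
    """For each scale return (t, log count of horizontal blur, log count of vertical blur)."""
    results = []
    for (t, h_img), (_, v_img) in zip(horizontal_series, vertical_series):
        n_h = count_blobs(binarize(h_img, threshold))  # steps S502 and S503
        n_v = count_blobs(binarize(v_img, threshold))  # steps S505 and S506
        # log(0) is undefined; clamping the count to 1 is an assumption of this sketch.
        results.append((t, math.log(max(n_h, 1)), math.log(max(n_v, 1))))
    return results
```

Both series are expected as lists of `(t, image)` pairs generated with the same scales, so one loop iteration corresponds to one pass through steps S501 to S507.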

[Effects of the first embodiment]
As described above, the recognition apparatus 10 according to the first embodiment classifies an input image as a document image or a scene image by determining whether, among a series of blurred images with different degrees of blur generated from the input image, there is a change in the blob count indicating that blobs of black pixels have merged from character units into line units.

In other words, the amount of change in the number of black-pixel blobs in the blurred images relative to the change in the degree of blur is used to determine whether the input image has a hierarchical structure specific to document formatting, such as a hierarchy of characters, lines, chapters, and pages. For example, if there is a point at which the number of black-pixel blobs decreases sharply, it is likely that a change in the degree of blur occurred at which character-level blobs merged into line-level blobs, or line-level blobs merged into chapter-level or page-level blobs.

If such a change in the degree of blur exists, the input image can be presumed to have a format-specific hierarchical structure, which makes it more likely that the input image is a document image. Conversely, if no such change exists, the input image can be presumed to lack a format-specific hierarchical structure, which makes it more likely that the input image is a scene image.

Therefore, the recognition apparatus 10 according to the present embodiment can automatically classify input images into document images and scene images.
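The decision described above can be sketched for a single series of blurred images as follows. The sketch assumes that the change in the blob count is measured as the difference between the logarithms of the counts at adjacent scales; the threshold value and the names `max_drop` and `classify` are hypothetical and are not taken from the embodiment.

```python
def max_drop(log_counts):
    """Largest decrease in log blob count between adjacent scales (0.0 if none)."""
    drops = [a - b for a, b in zip(log_counts, log_counts[1:])]
    return max([0.0] + drops)

def classify(log_counts, threshold):
    """Return 'document' if some change in blur degree reduces the count sharply."""
    return "document" if max_drop(log_counts) >= threshold else "scene"

# Example with made-up values: a sharp drop between the 2nd and 3rd scales.
print(classify([4.8, 4.7, 2.5, 2.4], threshold=1.0))
print(classify([3.9, 3.8, 3.7, 3.6], threshold=1.0))
```

Working on logarithms makes the test depend on the ratio of blob counts rather than on their absolute difference, so a drop from 110 blobs to 12 is treated the same way regardless of how much text the page contains.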

Although an embodiment of the disclosed apparatus has been described above, the present invention may be implemented in various forms other than the embodiment described above. Other embodiments included in the present invention are therefore described below.

[Scale update width Δt]
The scale update width Δt described in the first embodiment involves a trade-off: if it is too large, extrema become difficult to observe in the graph of the first derivative of the logarithm of the blob count, whereas if it is too small, the processing time increases.

For this reason, the following processing can be executed to set the scale update width Δt appropriately. For example, blobs are extracted by applying labeling processing to a binarized image obtained by binarizing the input image, and the average height Sh and the average width Sw of the circumscribed rectangles of the blobs are calculated. Sh and Sw can be expected to be roughly equal to the average character height and the average character width, respectively. Then, when generating horizontally blurred images, the scale t_H can start from a sufficiently small initial value t_s, for example t_s = 0.0001; the update width Δt of the scale t_H can be set to a value smaller than Sw, for example Δt = Sw/5; and the number of updates by Δt up to the target value t_e can be set to (width of the input image)/Δt, that is, the target value t_e can be set to (width of the input image)/2. When generating vertically blurred images, the update width Δt and the target value t_e of the scale t_V can be set appropriately in the same manner by reading "Sw" as "Sh" and "width of the input image" as "height of the input image" in the above description.
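A minimal sketch of this parameter selection is given below. It uses the example values from the text (t_s = 0.0001, Δt = Sw/5, t_e = half the image width) and assumes that the circumscribed-rectangle sizes have already been measured; the function name `scale_schedule` and its parameters are hypothetical. Passing blob heights and the image height instead gives the vertical case.

```python
def scale_schedule(blob_sizes, image_extent, t_start=0.0001, divisor=5.0):
    """Derive the update width, target value, and list of scales.

    blob_sizes   -- widths (horizontal case) or heights (vertical case) of the
                    circumscribed rectangles of the blobs
    image_extent -- width or height of the input image, respectively
    """
    mean_size = sum(blob_sizes) / len(blob_sizes)  # Sw or Sh
    dt = mean_size / divisor                       # e.g. dt = Sw / 5
    t_end = image_extent / 2.0                     # example target value t_e
    scales = []
    t = t_start
    while t <= t_end:
        scales.append(t)
        t += dt
    return dt, t_end, scales

dt, t_end, scales = scale_schedule([10, 20], image_extent=20)
print(dt, t_end, len(scales))
```

Tying Δt to the average character size keeps several samples within one character width, which is the range in which character-level blobs are expected to merge into line-level blobs.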

[Method of generating blurred images]
Although the first embodiment illustrates the case where blurred images of the input image are generated using a Gaussian filter, the method of generating blurred images is not limited to this. For example, a series of blurred images can be generated by reducing the input image in stages. In this case, pixel thinning is applied to the input image. In this thinning, among a plurality of pixels, the pixel with the lowest pixel value, that is, the pixel closest to black, is kept and the other pixels are discarded. This yields a blurred image equivalent to one generated by a convolution operation using a Gaussian filter.
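The thinning rule described above amounts to keeping the darkest pixel of each group, which can be sketched as block-wise minimum pooling. The square block shape and the particular reduction factors are assumptions of this sketch, since the text specifies only that the image is reduced in stages; the function names are hypothetical.

```python
def shrink_keep_darkest(image, factor):
    """Reduce the image by `factor`, keeping the lowest (darkest) value of each block."""
    height, width = len(image), len(image[0])
    out = []
    for y0 in range(0, height, factor):
        row = []
        for x0 in range(0, width, factor):
            block = [image[y][x]
                     for y in range(y0, min(y0 + factor, height))
                     for x in range(x0, min(x0 + factor, width))]
            row.append(min(block))  # the pixel closest to black survives
        out.append(row)
    return out

def shrink_series(image, factors=(1, 2, 4)):
    """Return [(factor, reduced image), ...] as a stand-in for a blur series."""
    return [(f, shrink_keep_darkest(image, f)) for f in factors]
```

Because a block becomes black as soon as it contains one black pixel, neighboring characters merge into a single blob once the block size exceeds the gap between them, which is the same merging behavior the blob count analysis relies on.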

[Distribution and integration]
The components of each illustrated apparatus do not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to the illustrated one, and all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions. For example, the acquisition unit 12, the conversion unit 13, the generation unit 14, the classification unit 15, the first recognition unit 16a, or the second recognition unit 16b may be connected via a network as a device external to the recognition apparatus 10. As one example, a server device having the acquisition unit 12, the conversion unit 13, the generation unit 14, and the classification unit 15 may be constructed so as to transmit classification results to a client terminal. Alternatively, the acquisition unit 12, the conversion unit 13, the generation unit 14, the classification unit 15, the first recognition unit 16a, and the second recognition unit 16b may each be provided in separate devices that are connected via a network and cooperate to realize the functions of the recognition apparatus 10.

[Recognition program]
The various kinds of processing described in the above embodiments can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes a recognition program having the same functions as the above embodiments is therefore described below with reference to FIG. 11.

FIG. 11 is a diagram illustrating an example hardware configuration of a computer that executes the recognition program according to the first and second embodiments. As shown in FIG. 11, the computer 100 includes an operation unit 110a, a speaker 110b, a camera 110c, a display 120, and a communication unit 130. The computer 100 further includes a CPU 150, a ROM 160, an HDD 170, and a RAM 180. These units 110 to 180 are connected via a bus 140.

As shown in FIG. 11, the HDD 170 stores a recognition program 170a that provides the same functions as the acquisition unit 12, the conversion unit 13, the generation unit 14, and the classification unit 15 described in the first embodiment. Like the acquisition unit 12, the conversion unit 13, the generation unit 14, and the classification unit 15 shown in FIG. 1, the recognition program 170a may be integrated or divided. That is, the HDD 170 does not necessarily have to store all of the data described in the first embodiment; it is sufficient that the data used for processing is stored in the HDD 170.

In this environment, the CPU 150 reads the recognition program 170a from the HDD 170 and loads it into the RAM 180. As a result, the recognition program 170a functions as a recognition process 180a, as shown in FIG. 11. The recognition process 180a loads various data read from the HDD 170 into the area of the RAM 180 allocated to the recognition process 180a and executes various kinds of processing using the loaded data. Examples of the processing executed by the recognition process 180a include the processing shown in FIGS. 8 to 10. Note that not all of the processing units described in the first embodiment have to operate on the CPU 150; it is sufficient that the processing units corresponding to the processing to be executed are virtually realized.

The recognition program 170a does not necessarily have to be stored in the HDD 170 or the ROM 160 from the beginning. For example, the recognition program 170a may be stored on a "portable physical medium" inserted into the computer 100, such as a flexible disk (FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card, and the computer 100 may acquire the recognition program 170a from such a portable physical medium and execute it. Alternatively, the recognition program 170a may be stored in another computer or a server device connected to the computer 100 via a public line, the Internet, a LAN, a WAN, or the like, and the computer 100 may acquire the recognition program 170a from there and execute it.

DESCRIPTION OF SYMBOLS
10 Recognition apparatus
11 Image storage unit
12 Acquisition unit
13 Conversion unit
14 Generation unit
15 Classification unit
15a Blob count calculation unit
15b Extreme value calculation unit
15c Determination unit
16a First recognition unit
16b Second recognition unit

Claims

1. A recognition program that causes a computer to execute:
a process of acquiring an input image;
a process of generating, from the input image, a plurality of blurred images having different degrees of blur;
a process of calculating, for each blurred image, the number of connected components obtained by applying labeling processing to the blurred image; and
a process of determining whether, among the plurality of blurred images, there is a change in the degree of blur for which the amount of change in the number of connected components is greater than or equal to a predetermined threshold.

2. The recognition program according to claim 1, wherein
the generating process generates a first blurred image group in which the input image is blurred in a first direction and a second blurred image group in which the input image is blurred in a direction perpendicular to the first direction, and
the determining process determines whether there is a change in the degree of blur for which the amount of change in the number of connected components is greater than or equal to the predetermined threshold among the first blurred image group and among the second blurred image group.

3. The recognition program according to claim 1 or 2, further causing the computer to execute
a process of calculating a statistical value of the height or width of connected components obtained by applying labeling processing to the input image, wherein
the generating process generates the plurality of blurred images while updating the degree of blur by an update width determined from the statistical value of the height or width of the connected components.

4. The recognition program according to claim 3, wherein the generating process generates the plurality of blurred images by updating the degree of blur from a first value to a second value determined from the statistical value of the height or width of the connected components and the height or width of the input image.

5. A recognition method in which a computer executes:
a process of acquiring an input image;
a process of generating, from the input image, a plurality of blurred images having different degrees of blur;
a process of calculating, for each blurred image, the number of connected components obtained by applying labeling processing to the blurred image; and
a process of determining whether, among the plurality of blurred images, there is a change in the degree of blur for which the amount of change in the number of connected components is greater than or equal to a predetermined threshold.

6. A recognition apparatus comprising:
an acquisition unit that acquires an input image;
a generation unit that generates, from the input image, a plurality of blurred images having different degrees of blur;
a calculation unit that calculates, for each blurred image, the number of connected components obtained by applying labeling processing to the blurred image; and
a determination unit that determines whether, among the plurality of blurred images, there is a change in the degree of blur for which the amount of change in the number of connected components is greater than or equal to a predetermined threshold.