JP2009169675A

JP2009169675A - Document processing apparatus, document processing method and document processing program

Info

Publication number: JP2009169675A
Application number: JP2008007075A
Authority: JP
Inventors: Yoshio Komaki; 由夫小巻
Original assignee: Konica Minolta Business Technologies Inc
Current assignee: Konica Minolta Business Technologies Inc
Priority date: 2008-01-16
Filing date: 2008-01-16
Publication date: 2009-07-30

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus, a document processing method and a document processing program that can automatically set document breaks in originals of various types of mixed documents without troubling users.

SOLUTION: The document processing apparatus includes a line area extraction part 21 for extracting line areas from document images stored in an image buffer part 13, a style feature detection part 22 for detecting a plurality of line attributes based on a plurality of predetermined attribute types and detecting a style feature indicating the combination pattern of the line attributes in each line area, a primary determination part 23 for determining primary style features of the document images from a plurality of detected style features, and a break position setting part 24 for setting break positions in the document images according to variations in the primary style features between pages.

COPYRIGHT: (C)2009, JPO & INPIT

Description

The present invention relates to a document processing apparatus, a document processing method, and a document processing program, and more particularly to a document processing apparatus, a document processing method, and a document processing program having a function of reading originals.

To save resources and space, document management systems that convert documents written on paper originals into electronic documents and manage them have been put into practical use. Such a document management system generates document images by reading originals with a scanner device or the like, and generates electronic documents from those document images.

Proposals have been made for automatically setting document breaks when various documents are input in succession.

For example, Patent Document 1 discloses an electronic filing apparatus that performs character recognition on an image and sets a document break when the character recognition result matches a preset character string.

Patent Document 2 discloses an image processing apparatus that sets document breaks based on the degree of difference between the layout of each page and the layout of the adjacent page.

Patent Document 3 discloses an electronic file apparatus that detects the orientation (portrait or landscape) of an image and sets a document break when the detected orientation differs from that of the immediately preceding original.

Patent Document 1: Japanese Patent Laid-Open No. 9-231309
Patent Document 2: Japanese Patent Laid-Open No. 2006-72484
Patent Document 3: Japanese Patent Laid-Open No. 7-192109

In Patent Document 1, a document break can be set only if a specific character string is included at, for example, the beginning or end of each document. However, only a limited range of documents of the same type contain such a specific character string.

Patent Document 2 requires the layout to change between the end of one document and the beginning of the next. Within a single type of document, the layout may change regularly at document breaks, but when multiple documents are mixed, the layout changes often show no regularity. There are also types of documents whose layout does not change, and documents whose layouts vary widely.

Furthermore, Patent Document 3 requires effort from the user: before the documents are scanned, the user must, for example, change the orientation of each document relative to the immediately preceding document at every location where a break is desired.

The present invention has been made to solve the problems described above, and its object is to provide an image processing apparatus, a document processing method, and a document processing program that can automatically set document breaks in an original containing a mixture of various types of documents, without effort on the part of the user.

An image processing apparatus according to one aspect of the present invention includes: first storage means for storing a document image; extraction means for extracting document areas of a predetermined type from the document image stored in the first storage means; detection means for detecting, for each document area, a plurality of area attributes based respectively on a plurality of predetermined attribute types, and for detecting a style feature representing the pattern of the combination of the area attributes; determination means for classifying the detected style features by pattern and determining one or more main style features of the document image based on the appearance frequency of the classified patterns; and setting means for detecting, for each page, which of the main style features are present, and for setting a break position in the document image based on the amount of change in the main style features between pages.

Preferably, the determination means includes frequency calculation means for calculating, for each style feature, the number of pages on which that style feature appears as the appearance frequency, and means for determining a style feature whose calculated number of pages is equal to or greater than a predetermined value to be a main style feature.

Preferably, the setting means includes: means for determining the main style features present on each page to be the page feature of that page; amount calculation means for calculating, for each page, the amount by which the page feature of that page has changed from the page feature of the preceding page as the amount of change; and means for setting a break position between the two pages when the calculated amount of change is equal to or greater than a certain value.

Preferably, the setting means further includes correction means which, when the same page feature is detected on the first and third of consecutive first, second, and third pages but not on the second page, corrects the result so that the same page feature is treated as having been detected on the second page as well.

Preferably, the apparatus further includes output means for dividing the document image based on the break positions set by the setting means and outputting the divided document images.

Preferably, the apparatus further includes display control means for generating, when the document image has been divided based on the break positions, a signal for displaying an index image of the first page of each of the resulting divided document images, and display means for producing output according to the signal from the display control means.

Preferably, the apparatus further includes instruction receiving means for receiving, from the user, an instruction regarding the number of divisions of the document image made by the setting means.

Preferably, the apparatus further includes changing means for changing, in accordance with the received instruction, a predetermined parameter used by the setting means to set break positions, and execution instruction means for instructing the setting means to execute its processing again after the change by the changing means.

Preferably, the apparatus further includes second setting means for setting break positions in the document image based on a layout or a predetermined character string.

Preferably, the plurality of predetermined attribute types includes at least two of the following: the position, size, and color in a higher-level area; the number, position, height, and color of partial areas and the distance to an adjacent partial area; character size; character decoration; character color; background color; and font type.

Preferably, the predetermined type of document area corresponds to one of a column, a line, a character string, and a character.

A document processing method according to still another aspect of the present invention is executed in a document processing apparatus that includes a storage unit for storing a document image and an arithmetic processing unit. The method includes the steps of: the arithmetic processing unit extracting document areas of a predetermined type from the document image stored in the storage unit; the arithmetic processing unit detecting, for each document area, a plurality of area attributes based respectively on a plurality of predetermined attribute types, and detecting a style feature representing the pattern of the combination of the area attributes; classifying the detected style features by pattern and determining one or more main style features of the document image based on the appearance frequency of the classified patterns; and detecting, for each page, which of the main style features are present, and setting a break position in the document image based on the amount of change in the main style features between pages.

An image processing program according to still another aspect of the present invention causes a computer to execute the document processing method described above.

According to the present invention, break positions can be set without effort on the part of the user, even for an original containing a mixture of various types of documents. Moreover, because the main style features are determined, the document image can be divided at optimal positions.

Embodiments of the present invention will be described in detail with reference to the drawings. In the drawings, identical or corresponding parts are denoted by the same reference numerals, and their description will not be repeated.

<Configuration>
(Overall system configuration)
FIG. 1 is a schematic configuration diagram of a system including a document processing apparatus according to an embodiment of the present invention. In the present embodiment, an MFP (Multi Function Peripheral) equipped with the document processing apparatus according to the present invention is described as a representative example. The document processing apparatus according to the present invention is not limited to an MFP and is also applicable to copying machines, facsimile apparatuses, scanner apparatuses, and the like.

Referring to FIG. 1, MFP 1 according to the present embodiment includes an image reading unit 104 for reading an original 300 and a print unit 106 for printing on paper media and the like. MFP 1 obtains a document image by reading the original 300 with the image reading unit 104 and generates an electronic document containing the document image. MFP 1 also outputs the read document image to the print unit 106.

In particular, MFP 1 can set break positions in a document image having a plurality of pages based on the style features of that document image. Break positions are set in units of pages.

MFP 1 divides the document image at the set break positions and generates a plurality of electronic documents, one for each of the divided document images. A format such as PDF (Portable Document Format) can typically be used for these electronic documents 400.

MFP 1 stores the generated electronic document 400 in its own storage unit (not shown in FIG. 1). It also transmits the stored electronic document 400 over a network to personal computers PC1, PC2, and PC3 (hereinafter also collectively referred to as "personal computer PC"). In a typical usage, MFP 1 transmits the electronic document 400 directly to personal computers PC1 and PC2, which are connected to a LAN (Local Area Network) laid within the same office in which MFP 1 is installed. A server apparatus SRV is provided at the connection point between the LAN and a WAN (Wide Area Network), and MFP 1 transmits the electronic document 400 via the server apparatus SRV to personal computer PC3 and other computers located in an office remote from MFP 1. The server apparatus SRV typically consists of a mail server, an FTP (File Transfer Protocol) server, a Web server, an SMB server, or the like.

The image reading unit 104 includes a loading tray on which originals are set, a platen glass, a conveyance unit that automatically conveys the originals set on the loading tray to the platen glass one sheet at a time, and a discharge tray onto which read originals are discharged (none of which are shown). This allows a plurality of originals to be read in succession to generate one or more electronic documents.

The print unit 106 has a sort function. MFP 1 divides the document image at the set break positions and outputs each of the divided document images to the print unit 106 using a different sorting method. Originals read in a single batch can thereby be sorted on output.

(Schematic configuration of MFP 1)
FIG. 2 is a block diagram showing a schematic hardware configuration of MFP 1 according to the embodiment of the present invention.

Referring to FIG. 2, MFP 1 includes a control unit 100, a memory unit 102, an image reading unit 104, a print unit 106, a communication interface unit 108, an operation panel unit 110, and a storage unit 112.

The control unit 100 typically consists of an arithmetic device such as a CPU (Central Processing Unit) and implements the document processing according to the present embodiment by executing a program. The memory unit 102 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory) and holds the program executed by the control unit 100 and the data needed to execute it. The communication interface unit 108 is typically a part for exchanging data with the personal computer PC (FIG. 1) over a network (for example, the LAN shown in FIG. 1), and includes, for example, a LAN adapter and the driver software that controls it. The print unit 106 is a part for performing print processing and includes, in addition to the hardware for print processing, a control device for controlling the operation of each part.

The storage unit 112 is typically a nonvolatile storage device such as a hard disk device or flash memory, and stores the program for operating the control unit 100, the electronic documents 400 generated by the control unit 100, and the like.

In the present embodiment, the document images divided at the set break positions are output to one of the communication interface unit 108, the storage unit 112, and the print unit 106.

FIG. 3 shows an example of the external appearance of the operation panel unit 110 of MFP 1 according to the embodiment of the present invention.

Referring to FIG. 3, the operation panel unit 110 includes a display panel 110a composed of a liquid crystal display device, a touch panel, and the like, a stop button 110b, a start button 110c, and operation buttons 110d including alphanumeric keys.

Although MFP 1 is described here as including the operation panel unit 110, which combines a display function with an instruction input function, it may instead include both a display unit and an input unit that includes hardware buttons.

(Configuration of personal computer)
FIG. 4 is a schematic diagram showing a schematic hardware configuration of the personal computer PC according to the embodiment of the present invention.

Referring to FIG. 4, the personal computer PC includes a CPU (Central Processing Unit) 201 that executes various programs including an operating system (OS), a memory unit 213 that temporarily stores data needed for the CPU 201 to execute programs, and a hard disk unit (HDD: Hard Disk Drive) 211 that stores the programs executed by the CPU 201 in a nonvolatile manner. The hard disk unit 211 also stores a viewer application for displaying the electronic documents 400 generated by MFP 1. Such programs are read from a flexible disk 217a or a CD-ROM (Compact Disk-Read Only Memory) 215a by an FDD drive 217 or a CD-ROM drive 215, respectively.

The CPU 201 receives instructions from the user via an input unit 209 consisting of a keyboard, a mouse, and the like, and outputs the screen output generated by program execution to a display unit 205. The CPU 201 also acquires the electronic document 400 from MFP 1 or the server apparatus SRV (FIG. 1) connected to the LAN or WAN via a communication interface unit 207 consisting of a LAN card or the like, and stores it in the hard disk unit 211 or elsewhere. The units described above exchange data with one another via an internal bus 203.

(Functional configuration of MFP)
FIG. 5 is a functional block diagram showing the functional configuration used when MFP 1 according to the embodiment of the present invention generates electronic documents.

Referring to FIG. 5, the control unit 100 of MFP 1 includes an image buffer unit 13, an electronic document generation unit 15, an image analysis unit 16, a transmission processing unit 17, a display control unit 25, an instruction receiving unit 26, and a parameter changing unit 27.

The image reading unit 104 described above reads the original 300 and acquires a document image. More specifically, it optically reads the paper original 300 and converts the result into digital data to obtain RGB digital image data. The image reading unit 104 outputs the acquired document image (digital image data) to the image buffer unit 13.

The present embodiment is described on the assumption that the document image read by the image reading unit 104 is output to the image buffer unit 13, but image data 310 received by the communication interface unit 108 functioning as a receiving unit 12 may be output to the image buffer unit 13 instead. The receiving unit 12 may receive the image data 310 over a network such as a LAN, or may accept image data 310 stored in, for example, a small portable memory (not shown). The receiving unit 12 outputs the image data 310 after reconciling the data format.

The image buffer unit 13 is a part that temporarily stores document image data; it holds the image data until the image analysis unit 16 or the electronic document generation unit 15 requests output, and outputs the corresponding image data when such a request arrives. In the present embodiment, the image buffer unit 13 also holds management information for each item of image data. As described later, the management information includes a document ID.

The image analysis unit 16 includes a line area extraction unit 21, a style feature detection unit 22, a main determination unit 23, and a break position setting unit 24.

The line area extraction unit 21 extracts line areas, as document areas of a predetermined type, from the document image output from the image buffer unit 13. In the present embodiment, the "document area of a predetermined type" need only be an area contained in a text area and is not limited to a line area. For example, it may be another partial area, such as a character area delimited by margins of at least a certain size, a character string, or a column. Alternatively, it may be a higher-level area such as a page or a column.

The style feature detection unit 22 detects a style feature for each line area extracted by the line area extraction unit 21, based on a plurality of predetermined attribute types. More specifically, for each line area it detects a plurality of line attributes, one for each of the predetermined attribute types, and detects the pattern of the combination of those line attributes as the style feature. The "plurality of predetermined attribute types" is information used to characterize the pages contained in the document image, and includes at least two of the following: the position, size, and color in a higher-level area; the number, position, height, and color of partial areas and the distance to an adjacent partial area; character size; character decoration; character color; background color; and font type. It may also include others, such as the height of the most frequently occurring character areas. In the present embodiment, the predetermined attribute types are the height of a line (partial area), the spacing to the adjacent line (position), character color, background color, and character decoration. In the present embodiment, "line attributes" means the five attributes of each line area that correspond to these five attribute types.
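As an illustration only, the combination pattern of the five line attributes can be sketched as a hashable tuple, so that lines sharing the same pattern compare equal. The names LineArea and style_feature, the pixel units, and the quantization step are assumptions made for this sketch; the embodiment does not specify how attribute values are quantized or compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineArea:
    """Hypothetical container for one extracted line area (not from the source)."""
    page: int               # page number on which the line appears
    height_px: float        # height of the line
    spacing_px: float       # spacing to the adjacent line
    text_color: str         # character color, e.g. "black"
    background_color: str   # background color, e.g. "white"
    decoration: str         # character decoration, e.g. "none", "bold"


def style_feature(line: LineArea, step: float = 4.0) -> tuple:
    """Return the combination pattern of the five line attributes.

    Numeric attributes are bucketed with an assumed step so that small
    scanning differences do not produce distinct patterns.
    """
    return (
        int(line.height_px // step),
        int(line.spacing_px // step),
        line.text_color,
        line.background_color,
        line.decoration,
    )


body_a = LineArea(1, 40.0, 12.0, "black", "white", "none")
body_b = LineArea(2, 41.5, 13.0, "black", "white", "none")
heading = LineArea(1, 40.0, 12.0, "black", "white", "bold")

print(style_feature(body_a) == style_feature(body_b))   # same pattern
print(style_feature(body_a) == style_feature(heading))  # decoration differs
```

Representing the pattern as a tuple makes it usable as a dictionary key, which is convenient when the patterns are later classified and counted.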

The style feature detected for each line area is output to the main determination unit 23.

The main determination unit 23 determines which of the detected style features are the main style features of the read document image. Specifically, it classifies the style features by pattern and determines one or more main style features of the document image based on the appearance frequency of the classified patterns. That is, for each style feature, the main determination unit 23 calculates (counts) the number of pages on which that style feature appears as its appearance frequency, and determines a style feature whose page count is equal to or greater than a predetermined value to be a main style feature (valid). Conversely, a style feature whose page count is less than the predetermined value is judged to be a local style feature that does not characterize the document (invalid). The main style features thus determined are output to the break position setting unit 24.

The break position setting unit 24 sets break positions in the read document image, in units of pages, based on the main style features that have been determined. Specifically, it detects, for each page, which of the main style features are present, and sets break positions in the document image based on the amount of change in the main style features between pages. That is, the main style features present on a page are determined to be the page feature of that page. Then, for each page, the amount by which the page feature of that page has changed from the page feature of the preceding page is calculated as the amount of change. When the calculated amount of change is equal to or greater than a certain value, the break position setting unit 24 sets a break position between the two pages. The break position setting unit 24 also assigns a document ID to the image data of each page based on the set break positions. The assigned document ID information is output to the image buffer unit 13.
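The setting of break positions and the assignment of document IDs can be sketched as below. The embodiment does not state here how the amount of change is measured; this sketch assumes, purely for illustration, that it is the number of main style features present on exactly one of two adjacent pages (the size of the symmetric difference). The function name and the numbering of document IDs from 0 are likewise assumptions.

```python
def assign_document_ids(page_features: list, min_change: int = 1) -> list:
    """Assign a document ID to each page.

    page_features[i] is the set of main style features present on page i.
    A break is set before a page when the assumed amount of change from
    the preceding page is at least min_change.
    """
    doc_ids = []
    current_id = 0
    previous = None
    for features in page_features:
        if previous is not None:
            # Assumed metric: features present on only one of the two pages.
            change = len(features ^ previous)
            if change >= min_change:
                current_id += 1  # break position between the two pages
        doc_ids.append(current_id)
        previous = features
    return doc_ids


pages = [
    frozenset({"A"}),
    frozenset({"A"}),
    frozenset({"B"}),
    frozenset({"B"}),
    frozenset({"B", "C"}),
]
print(assign_document_ids(pages, min_change=2))
print(assign_document_ids(pages, min_change=1))
```

Raising min_change yields fewer breaks, which is one way a threshold parameter of this kind could influence the number of divisions.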

In this way, break positions are set automatically based on the main style features of the document image. Breaks can therefore be set at optimal positions without effort on the part of the user, even for an original containing a mixture of various types of documents.

The break position setting unit 24 preferably also performs correction processing after determining the page features. That is, when the same page feature is detected on the first and third of consecutive first, second, and third pages but not on the second page, the result is corrected so that the same page feature is treated as having been detected on the second page as well. This correction is preferably performed only when the same page feature is not detected on every page. When the pages before and after a given page share a page feature, the page between them generally has that page feature too; this can occur, for example, when the intervening page is a table or a photograph. The correction processing described above is performed so that the page feature of each page can be determined appropriately from the positions at which the main style features appear, even in such cases. This prevents a break from being set at a position that is not optimal.
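The correction over three consecutive pages can be sketched as follows. This is a simplified illustration under stated assumptions: each page feature is represented as a frozenset of main style features, an empty set stands for a page on which no main style feature was detected, and neighbors are compared using the uncorrected values so that one correction does not cascade into the next. The function name is hypothetical.

```python
def correct_page_features(page_features: list) -> list:
    """Fill in a middle page whose neighbors share the same page feature.

    Returns a new list; the input list is left unchanged.
    """
    corrected = list(page_features)
    for i in range(1, len(page_features) - 1):
        before = page_features[i - 1]
        after = page_features[i + 1]
        # Correct only when both neighbors carry the same non-empty feature
        # and the middle page differs from it.
        if before and before == after and page_features[i] != before:
            corrected[i] = before
    return corrected


features = [
    frozenset({"A"}),
    frozenset(),       # e.g. a page containing only a table or photograph
    frozenset({"A"}),
    frozenset({"B"}),
]
print(correct_page_features(features))
```

Applying this step before the amount of change is calculated keeps a full-page table or photograph from splitting one document into two.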

電子文書生成部１５は、画像バッファ部１３から文書画像および管理情報を入力する。電子文書生成部１５は、画像データの圧縮処理を行ない、圧縮された画像データをＰＤＦ形式に変換する（電子文書を生成する）。その際に、電子文書生成部１５は、管理情報に含まれる文書ＩＤに基づいて（すなわち、設定された区切り位置に基づいて）、複数の電子文書を生成する。より具体的には、付与された文書ＩＤごとに、電子文書を生成する。そして、生成した電子文書の情報を表示制御部２５に出力する。また、指示受付部２６が所定の指示を受付けた場合に、生成された電子文書を、ユーザによる設定などに応じて、記憶部１１２へ格納されたり、送信処理部１７へ出力されたりする。なお、圧縮度合いは、生成される電子文書の大きさや、要求される文書画像の解像度などに応じて変化させてもよい。 The electronic document generation unit 15 inputs a document image and management information from the image buffer unit 13. The electronic document generation unit 15 performs image data compression processing and converts the compressed image data into a PDF format (generates an electronic document). At that time, the electronic document generation unit 15 generates a plurality of electronic documents based on the document ID included in the management information (that is, based on the set separation position). More specifically, an electronic document is generated for each assigned document ID. Then, the information of the generated electronic document is output to the display control unit 25. In addition, when the instruction receiving unit 26 receives a predetermined instruction, the generated electronic document is stored in the storage unit 112 or output to the transmission processing unit 17 according to the setting by the user. The degree of compression may be changed according to the size of the generated electronic document, the required resolution of the document image, and the like.

The display control unit 25 generates a signal for displaying an index image that identifies each generated electronic document, for example a preview image of its first page. The display control unit 25 outputs the generated signal to the operation panel unit 110, so that a plurality of preview images are displayed on the operation panel unit 110. Note that the index image only needs to allow the break positions of the document image to be identified and is not limited to a preview image of the first page. Moreover, the index need not be an image at all, as long as the break positions of the document image can be identified.

The instruction receiving unit 26 receives, via the operation panel unit 110, an instruction concerning the number of generated electronic documents. Specifically, one of the following is received from the user: an instruction to increase the number of documents (hereinafter an “increase instruction”), an instruction to decrease the number of documents (hereinafter a “decrease instruction”), or an OK instruction (an instruction to output with the current number of documents). On receiving an increase instruction or a decrease instruction, the instruction receiving unit 26 outputs the content of the instruction to the parameter changing unit 27. On receiving an OK instruction, it notifies the electronic document generation unit 15 accordingly.

パラメータ変更部２７は、受付けた修正の指示内容に基づいて、区切り位置設定のための所定のパラメータを変更する。パラメータが変更されると、再度、区切り位置設定部２４による処理の実行が指示される。これにより、区切り位置設定部２４は、変更されたパラメータに基づいて、再度、区切り位置設定処理を実行する。 The parameter changing unit 27 changes a predetermined parameter for setting the break position based on the received correction instruction content. When the parameter is changed, execution of processing by the separation position setting unit 24 is instructed again. Thereby, the break position setting unit 24 executes the break position setting process again based on the changed parameter.

このように、本実施の形態では、区切り位置が自動的に設定された後に、ユーザにその区切り度合い（分割数）を確認させる。そのため、ユーザは、希望の数になるように何度でも区切り位置設定処理の実行を指示することができる。 As described above, in this embodiment, after the break position is automatically set, the user is made to confirm the break degree (number of divisions). Therefore, the user can instruct execution of the separation position setting process as many times as desired.

The transmission processing unit 17 performs processing for transmitting the electronic document 400 to a destination designated by the user in accordance with one of various communication protocols. Selectable protocols include, for example, SMB (Server Message Block), FTP (File Transfer Protocol), HTTP (Hyper Text Transfer Protocol), and SMTP (Simple Mail Transfer Protocol). In the present embodiment, when a plurality of electronic documents 400 with different document IDs have been generated, the transmission processing unit 17 assigns a different file name to each electronic document 400. For example, a file name of the form “Doc” + “document ID” + “.pdf” is assigned. The transmission processing unit 17 also sets the destination of each electronic document 400 based on an instruction from the user. Specific examples of this processing are described later. The transmission processing unit 17 outputs the electronic documents 400, together with the file name and destination of each electronic document 400, to the transmission unit 18.
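The naming rule above can be illustrated with a minimal sketch. The function name `build_file_names` is hypothetical, and inserting the document ID as a plain decimal number without padding is an assumption; the description only gives the pattern “Doc” + “document ID” + “.pdf”.

```python
def build_file_names(document_ids):
    """Return one file name per document ID, following "Doc" + ID + ".pdf"."""
    # Assumption: the ID is written as an unpadded decimal number.
    return {doc_id: f"Doc{doc_id}.pdf" for doc_id in document_ids}


names = build_file_names([1, 2, 3])
print(names)
```

Because each document ID is distinct, the resulting file names are distinct as well, which is the property the transmission processing unit 17 relies on.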

送信部１８は、通信インターフェイス部１０８によって実現され、ＬＡＮなどのネットワークを介してパーソナルコンピュータＰＣ（図１）などへ、電子文書生成部１５で生成された複数の電子文書４００を送信する。なお、送信部１８は、たとえば携帯型小型メモリ（図示せず）を装着可能であり、電子文書４００は、そのような着脱可能なメモリに出力されてもよい。 The transmission unit 18 is realized by the communication interface unit 108, and transmits the plurality of electronic documents 400 generated by the electronic document generation unit 15 to a personal computer PC (FIG. 1) or the like via a network such as a LAN. The transmission unit 18 can be mounted with, for example, a portable small memory (not shown), and the electronic document 400 may be output to such a removable memory.

なお、制御部１００に含まれる各ブロックの動作は、たとえば記憶部１１２中に格納されたソフトウェアを実行することで実現されてもよいし、これらのブロックのうち少なくとも１つについては、ハードウェアで実現されてもよい。 The operation of each block included in the control unit 100 may be realized, for example, by executing software stored in the storage unit 112, and at least one of these blocks is implemented by hardware. It may be realized.

<Operation>
(Electronic document generation processing)
The electronic document generation processing executed by the MFP 1 according to the embodiment of the present invention is described in detail below with reference to the flowcharts of FIGS. 6 to 8 and to FIGS. 9 to 22.

図６は、本発明の実施の形態における電子文書生成処理を示すフローチャートである。図６のフローチャートに示す処理は、予めプログラムとして記憶部１１２に格納されており、制御部１００がこのプログラムを読み出して実行することにより、電子文書生成処理の機能が実現される。なお、このフローチャートにおいては、電子文書をインターネット経由でＰＣ等に送信することがユーザにより設定されたものとして説明する。 FIG. 6 is a flowchart showing an electronic document generation process in the embodiment of the present invention. The processing shown in the flowchart of FIG. 6 is stored in advance in the storage unit 112 as a program, and the function of the electronic document generation processing is realized by the control unit 100 reading and executing this program. In this flowchart, it is assumed that the user has set transmission of an electronic document to a PC or the like via the Internet.

図６を参照して、はじめに、行領域抽出部２１は、画像バッファ部１３に記憶された文書画像から、行領域を抽出する（ステップＳ２）。具体的には、次の手順で行領域が抽出される。図９は、本実施の形態における行領域抽出処理を説明するための図である。図９（ａ）には、元の文書画像の任意のページが示されている。 With reference to FIG. 6, the line area extraction unit 21 first extracts a line area from the document image stored in the image buffer unit 13 (step S2). Specifically, a row area is extracted by the following procedure. FIG. 9 is a diagram for explaining row area extraction processing in the present embodiment. FIG. 9A shows an arbitrary page of the original document image.

The line area extraction unit 21 binarizes the RGB image data of each page. For example, the RGB image is converted to luminance values, and each luminance value is compared with the average luminance of the page. As a result, “0” is temporarily recorded, for example in the memory unit 102, for pixels that are bright and “1” for pixels that are dark. Next, the pixels whose value in the binary image is “1” are scanned using 8-connectivity. Pixels that are connected are given the same label value to form a black connected region, and the circumscribed rectangle of that region is obtained. FIG. 9B shows the result of obtaining the circumscribed rectangles.
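The binarization and labeling steps can be sketched as follows. This is an illustration only, not the embodiment's implementation: the function names are hypothetical, the Rec. 601 luminance weights are an assumption (the description says only “luminance”), and a page is modeled as a nested list of (R, G, B) tuples.

```python
from collections import deque


def binarize(rgb_page):
    """Map each pixel to 1 (dark) or 0 (bright) against the page's mean luminance."""
    # Assumption: Rec. 601 weights are used to derive luminance from RGB.
    luma = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_page]
    mean = sum(map(sum, luma)) / sum(len(row) for row in luma)
    return [[1 if value < mean else 0 for value in row] for row in luma]


def bounding_boxes(binary):
    """Return (left, top, right, bottom) for each 8-connected region of 1-pixels."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                # Visit all eight neighbours (8-connectivity).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
page = [
    [BLACK, WHITE, WHITE, WHITE],
    [WHITE, BLACK, WHITE, BLACK],
    [WHITE, WHITE, WHITE, WHITE],
]
boxes = bounding_boxes(binarize(page))
print(boxes)
```

In this toy page the two diagonally adjacent dark pixels form one region under 8-connectivity, while the isolated dark pixel forms its own region.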

Next, for each circumscribed rectangle, rectangles that lie within a certain distance of it in the horizontal direction are detected, and the detected rectangles are merged into a single area. FIG. 9C shows the areas obtained by merging the circumscribed rectangles of FIG. 9B. The line area extraction unit 21 outputs each merged area as a line area. FIG. 10 shows an example of line areas extracted from a document containing a plurality of pages. Note that the processing for extracting line areas is not limited to the processing described above.
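A minimal sketch of the merging step is given below. The function name `merge_into_lines` and the parameter `max_gap` are hypothetical. Requiring the rectangles to overlap vertically is an added assumption so that rectangles on different lines are not merged; the description states only that horizontally close rectangles are merged. The single greedy pass is a simplification and does not re-merge lines that become adjacent later.

```python
def merge_into_lines(boxes, max_gap):
    """Merge (left, top, right, bottom) boxes that are horizontally close."""
    lines = []
    # Process boxes from left to right so each box extends a line on its right.
    for left, top, right, bottom in sorted(boxes):
        for i, (l, t, r, b) in enumerate(lines):
            overlaps_vertically = top <= b and bottom >= t  # assumption
            close_horizontally = left - r <= max_gap
            if overlaps_vertically and close_horizontally:
                lines[i] = (min(l, left), min(t, top),
                            max(r, right), max(b, bottom))
                break
        else:
            lines.append((left, top, right, bottom))
    return lines


boxes = [(0, 0, 4, 9), (6, 1, 10, 9), (30, 0, 35, 9),
         (0, 20, 5, 29), (7, 20, 12, 29)]
lines = merge_into_lines(boxes, max_gap=3)
print(lines)
```

Here the two characters at the start of each text line are merged, while the rectangle far to the right stays a separate area because its gap exceeds `max_gap`.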

When the processing of step S2 ends, the style feature detection unit 22 detects a style feature for each extracted line area based on a plurality of predetermined attribute types (step S4). Specifically, line height, spacing to the adjacent line, character color, background color, and character modification are used as the attribute types. The style feature detection unit 22 therefore detects, for each line area, a style feature based on these attribute types.

“Line height” is obtained by calculating the height of the line area. For “spacing to the adjacent line”, the line areas adjacent above and below are detected and the distance to each is calculated; the smaller of the two distances is taken as the value. For “character color” and “background color”, a color histogram of each line area is generated; the most frequent color is taken as the background color and the second most frequent color as the character color. For “character modification”, a run-length histogram of the line area is generated, that is, a histogram of the lengths of consecutive runs of black pixels in the vertical and horizontal directions of the line area. Character modifications such as “bold” and “italic” are then determined from the generated run-length histogram, and the most frequent one is taken as the “character modification”. Note that the methods for detecting the individual line attributes are not limited to those described above.
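Two of these attribute calculations can be sketched as follows. The function names are hypothetical, line areas are reduced to (top, bottom) pairs, and colors are represented by already-quantized labels; how colors are quantized before the histogram is built is not stated in the description and is left out here.

```python
from collections import Counter


def spacing_to_adjacent_line(index, lines):
    """Smaller of the gaps to the line above and the line below.

    lines holds (top, bottom) pairs sorted from the top of the page down.
    Returns None when the page contains only one line.
    """
    top, bottom = lines[index]
    gaps = []
    if index > 0:
        gaps.append(top - lines[index - 1][1])
    if index < len(lines) - 1:
        gaps.append(lines[index + 1][0] - bottom)
    return min(gaps) if gaps else None


def background_and_character_color(pixels):
    """Most frequent color is the background, second most frequent the text."""
    ranked = [color for color, _ in Counter(pixels).most_common(2)]
    background = ranked[0]
    character = ranked[1] if len(ranked) > 1 else None
    return background, character


lines = [(10, 14), (34, 38), (44, 48)]
pixels = ["white"] * 70 + ["black"] * 25 + ["red"] * 5
print(spacing_to_adjacent_line(1, lines))
print(background_and_character_color(pixels))
```

For the middle line the gap above is 20 and the gap below is 6, so 6 is taken as the spacing.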

The style feature detection unit 22 records each detected line attribute in a line area list.
FIG. 11 is a diagram showing an example of the list of line areas contained in the document image. Referring to FIG. 11, the line area list has four main items, namely page, page ID, line ID, and attribute type, and these items are associated with one another. The page item records the pages in document order: page 1, page 2, ..., page 6. The page ID item records identification information (ID) that uniquely identifies each page. The line ID item records identification information (ID) that uniquely identifies each line within a page. The attribute type item records, for each line ID, the style feature based on the five attribute types above. In the present embodiment, a “style feature” is a feature based on the combination of the five line attributes. For example, for the line with line ID 1 on page 1, the line height “4”, the spacing to the adjacent line “20”, the character color “black”, the background color “white”, and the character modification “bold” are each line attributes of that line, and the combination of these line attributes is the style feature of that line.

When the processing of step S4 ends, the style feature detection unit 22 generates a style feature list based on the style features detected for the individual line areas (step S6). The style feature list is a list of the unique style features.
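Building the list of unique style features amounts to de-duplicating the attribute combinations. The sketch below assumes exact matching of the five attributes (whether the embodiment tolerates small differences is not stated) and numbers the style IDs in order of first appearance, which is also an assumption. Only the first entry uses values quoted in the description; the others are hypothetical.

```python
def build_style_list(line_area_list):
    """Assign a style ID to each distinct attribute combination."""
    style_ids = {}
    for entry in line_area_list:
        # setdefault keeps the existing ID when the combination was seen before.
        style_ids.setdefault(entry["attributes"], len(style_ids) + 1)
    return style_ids


# attributes: (line height, spacing, character color, background color, modification)
line_area_list = [
    {"page_id": 1, "line_id": 1, "attributes": (4, 20, "black", "white", "bold")},
    {"page_id": 1, "line_id": 2, "attributes": (2, 5, "black", "white", "none")},
    {"page_id": 2, "line_id": 1, "attributes": (2, 5, "black", "white", "none")},
]
style_ids = build_style_list(line_area_list)
print(style_ids)
```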

FIG. 12 is a diagram showing an example of the style feature list.
Referring to FIG. 12, the style feature list has six items, namely style ID, line height, spacing to the adjacent line, character color, background color, and character modification, and these items are associated with one another. The style ID is identification information (ID) that uniquely identifies each unique style feature.

When the processing of step S6 ends, the main determination unit 23 calculates the frequency of each style feature (step S8). Specifically, for each style ID in the style feature list, the number of lines and the number of pages on which it appears are calculated. The appearance frequency (proportion present) in the document image is thereby obtained for each style feature identified by a style ID. Because it is the style features of each page that need to be extracted, only the number of pages may be calculated.

FIG. 13 is a diagram showing an example of a table of the frequency calculation results for each style feature (hereinafter the “calculation result table”). Referring to FIG. 13, the columns of the calculation result table comprise seven items corresponding to the seven style IDs, and the rows comprise six items corresponding to the six page IDs, an item for the number of lines, and an item for the number of pages. The validity item, which is also included among the rows of the calculation result table, records the result of the determination processing described later.

In the calculation result table, each page ID field records the number of lines on that page in which the style feature identified by each style ID appears. The main determination unit 23 then calculates, for each style ID, the number of lines and the number of pages from the recorded line counts.
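The frequency calculation can be sketched as a simple tally. The function name and the dictionary layout are hypothetical, and the input rows below are illustrative values rather than the contents of FIG. 13.

```python
from collections import defaultdict


def count_frequencies(rows):
    """Tally lines per page, total lines, and pages for each style ID.

    rows holds one (page_id, style_id) pair per line area.
    """
    lines_per_page = defaultdict(lambda: defaultdict(int))
    for page_id, style_id in rows:
        lines_per_page[style_id][page_id] += 1
    result = {}
    for style_id, per_page in lines_per_page.items():
        result[style_id] = {
            "per_page": dict(per_page),
            "lines": sum(per_page.values()),
            "pages": len(per_page),
        }
    return result


rows = [(1, 2), (1, 3), (1, 3), (2, 3), (3, 2), (3, 3)]
table = count_frequencies(rows)
print(table[3])
```

The "pages" value counts only pages on which the style appears at least once, which corresponds to the page count used in the subsequent determination.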

ステップＳ８の処理が終わると、主要判定部２３は、主要なスタイル特徴を判定する（ステップＳ１０）。具体的には、たとえば、ページ数が所定数以上のスタイル特徴を、主要なスタイル特徴として判定する。なお、ここでの「所定数」とは、文書画像が有するページ数に基づいて、予め算出された値であることが望ましい。たとえば、ページ数に一定割合（たとえば、１／３）を乗じた値である。したがって、図１０に示した文書画像の例では、ページ数が２以上のスタイル特徴が、主要なスタイル特徴であると判定される。 When the process of step S8 ends, the main determination unit 23 determines main style characteristics (step S10). Specifically, for example, a style feature having a predetermined number of pages or more is determined as a main style feature. Here, the “predetermined number” is preferably a value calculated in advance based on the number of pages of the document image. For example, it is a value obtained by multiplying the number of pages by a certain ratio (for example, 1/3). Therefore, in the example of the document image shown in FIG. 10, the style feature having two or more pages is determined as the main style feature.

The main determination unit 23 records “present”, indicating valid, in the validity field of each style ID whose style feature appears on two or more pages, and records “absent”, indicating invalid, in the validity field of each style ID whose style feature appears on fewer than two pages. In other words, “present” is recorded for the main style features. FIG. 13 shows that the style features with style IDs 2, 3, 5, and 6 are the main style features.
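The determination can be sketched as a threshold test on the page counts. With six pages and a ratio of 1/3, the threshold is 6 × 1/3 = 2 pages. The function name is hypothetical. The page counts for style IDs 2, 3, 5, and 6 follow from the pages on which the description says they appear; the counts for style IDs 1, 4, and 7 are hypothetical values below the threshold.

```python
from fractions import Fraction


def main_style_ids(page_counts, total_pages, ratio=Fraction(1, 3)):
    """Return the style IDs that appear on at least total_pages * ratio pages."""
    threshold = total_pages * ratio  # exact arithmetic, 6 * 1/3 == 2
    return sorted(s for s, pages in page_counts.items() if pages >= threshold)


page_counts = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 1}
main = main_style_ids(page_counts, total_pages=6)
print(main)
```

`Fraction` is used so that the threshold is computed exactly rather than with floating-point rounding.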

なお、ここでは、ページ数のみに基づいて、主要なスタイル特徴を判定したが、行数を加味した判定を行なってもよい。たとえば、ページ数が２以上であり、かつ、行数が所定数以上であるものを、主要なスタイル特徴と判定してもよい。この場合の「所定数」は、全行数に基づいて、予め算出された値であってもよいし、ページ数に基づいて、予め算出された値であってもよい。このように、行数も加味した判定をすることで、ページ内での出現割合が低いスタイル特徴が、主要なスタイル特徴と判定されることを防止することができる。または、各スタイル特徴のページ内での占有面積を算出し、その割合が、一定値以上の場合にのみ、ページ数をカウントするようにしてもよい。 Here, the main style features are determined based only on the number of pages, but determination may be made in consideration of the number of lines. For example, a case where the number of pages is 2 or more and the number of lines is a predetermined number or more may be determined as the main style feature. The “predetermined number” in this case may be a value calculated in advance based on the total number of rows, or may be a value calculated in advance based on the number of pages. In this way, by making a determination in consideration of the number of lines, it is possible to prevent a style feature having a low appearance ratio in the page from being determined as a main style feature. Alternatively, the occupation area of each style feature in a page may be calculated, and the number of pages may be counted only when the ratio is equal to or greater than a certain value.

ステップＳ１０の処理が終わると、主要判定部２３は、主要なスタイル特徴と判定されたスタイル特徴についてのテーブル（以下「主要領域特徴テーブル」）を生成する（ステップＳ１２）。図１４は、主要領域特徴テーブルの一例を示す図である。 When the process of step S10 ends, the main determination unit 23 generates a table (hereinafter, “main area feature table”) of style features determined to be main style features (step S12). FIG. 14 is a diagram illustrating an example of the main area feature table.

図１４を参照して、主要領域特徴テーブルは、図１３に示した算出結果テーブルのうち、有効性の欄に「有」が記録されたスタイルＩＤに対応する項目のみを含む。つまり、主要領域特徴テーブルには、主要なスタイル特徴と判定されたスタイル特徴についての、ページごとの出現行数、トータルの行数、出現ページ数（および有効性）が含まれる。 Referring to FIG. 14, the main area feature table includes only items corresponding to the style ID in which “present” is recorded in the validity column in the calculation result table shown in FIG. 13. In other words, the main area feature table includes the number of appearing lines, the total number of lines, and the number of appearing pages (and validity) for each page for the style feature determined as the main style feature.

ステップＳ１２の処理が終わると、区切り位置設定部２４は、区切り位置設定処理を実行する（ステップＳ１４）。区切り位置設定処理については、図７のサブルーチンのフローチャートを用いて詳細に説明する。 When the process of step S12 ends, the break position setting unit 24 executes a break position setting process (step S14). The delimiter position setting process will be described in detail with reference to the flowchart of the subroutine of FIG.

図７は、本発明の実施の形態における区切り位置設定処理を示すフローチャートである。 FIG. 7 is a flowchart showing a delimiter position setting process according to the embodiment of the present invention.

The break position setting unit 24 determines the page features of each page (step S102). Based on the main area feature table of FIG. 14, the page features of page 1 are the style features with style IDs 2 and 3, and the page feature of page 2 is the style feature with style ID 3. The page features of page 3 are the style features with style IDs 2 and 3, and the page features of pages 4 and 5 are, in each case, the style features with style IDs 5 and 6. The page feature of page 6 is the style feature with style ID 6.

Next, the break position setting unit 24 executes the correction processing (step S104). Specifically, when a style ID is present on the first and last of three consecutive pages but absent from the middle page alone, the table is corrected as if that style ID were also present on the middle page. In the example of FIG. 14, as indicated by the enclosing line 51, the style feature (page feature) identified by style ID “2” is present on pages 1 and 3 but not on page 2. The break position setting unit 24 therefore rewrites the page 2 field for style ID “2” from “0” to “1”. FIG. 15 shows the result; it is a diagram of the main area feature table after the correction processing in the embodiment of the present invention. As shown in FIG. 15, the line count and the page count in the row for style ID “2” are each incremented by 1 as a result of this rewrite. Following the correction processing, a page feature determination result item 61 is added to the columns of the main area feature table. The row for item 61 shows that the page features of pages 1 to 3 are style IDs 2 and 3, the page features of pages 4 and 5 are style IDs 5 and 6, and the page feature of page 6 is style ID 6.
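The correction can be sketched on the page features alone, modeling each page as a set of style IDs. The function name is hypothetical, and the sketch leaves out the updates to the line and page counts. Neighbours are read from the uncorrected input so that one correction does not trigger another, which is an assumption.

```python
def correct_page_features(page_features):
    """Add to each middle page the style IDs shared by both neighbouring pages."""
    corrected = [set(features) for features in page_features]
    for i in range(1, len(page_features) - 1):
        shared = page_features[i - 1] & page_features[i + 1]
        corrected[i] |= shared
    return corrected


# Page features of pages 1 to 6 before correction, as read from FIG. 14.
page_features = [{2, 3}, {3}, {2, 3}, {5, 6}, {5, 6}, {6}]
corrected = correct_page_features(page_features)
print(corrected)
```

Only page 2 changes: style ID 2 is present on pages 1 and 3, so it is added to page 2, giving the page features recorded in item 61.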

なお、ここでの補正処理は、前後のページ（第１および第３のページ）に、１つでも同一のスタイル特徴の行が存在すれば実行される。しかしながら、たとえば、全ページ数に応じて予め定められた行数が存在する場合にのみ、補正処理が実行されてもよい。 Note that the correction processing here is executed if at least one row having the same style feature exists in the previous and subsequent pages (first and third pages). However, for example, the correction process may be executed only when there is a predetermined number of lines according to the total number of pages.

また、補正処理は、補正処理をするか否かを特定するための「変数ｉ」が１である場合にのみ実行されるものとし、はじめ、変数ｉは１にセットされているものとする。 The correction process is executed only when the “variable i” for specifying whether or not to perform the correction process is 1, and the variable i is initially set to 1.

Note that such correction processing does not necessarily have to be executed.
When the processing of step S104 ends, the break position setting unit 24 calculates the page feature change amount between pages (step S106). The break position setting unit 24 further adds a page feature change amount item 62 to the columns of the main area feature table shown in FIG. 15. In the row for item 62, because the page features of pages 1 to 3 are identical, “0” is recorded both for the change from page 1 to page 2 and for the change from page 2 to page 3. Because neither of style IDs 2 and 3, the page features of page 3, is present on page 4, the change from page 3 to page 4 is recorded as “2”. Because pages 4 and 5 have identical page features, the change from page 4 to page 5 is “0”. Because style ID 5, a page feature of page 5, is not present on page 6, the change from page 5 to page 6 is “1”.
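The description gives change amounts by example but not an explicit formula. The sketch below uses one formula that reproduces every value quoted in this description, including those given for the case without correction: the larger of the number of style IDs lost and the number of style IDs gained between two pages. This choice, like the function name, is an assumption and not a statement of the embodiment's actual calculation.

```python
def change_amounts(page_features):
    """Change amount of each page relative to the page before it."""
    amounts = []
    for prev, cur in zip(page_features, page_features[1:]):
        lost = len(prev - cur)    # style IDs on the previous page only
        gained = len(cur - prev)  # style IDs on the current page only
        amounts.append(max(lost, gained))  # assumed formula
    return amounts


corrected = [{2, 3}, {2, 3}, {2, 3}, {5, 6}, {5, 6}, {6}]
print(change_amounts(corrected))
```

The list holds one value per page boundary, so six pages yield five change amounts.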

ステップＳ１０６の処理が終わると、区切り位置設定部２４は、区切り位置を設定する（ステップＳ１０８）。具体的には、区切り位置設定部２４は、変化量が「変数ｊ」以上であるページ間に、区切り位置を設定する。「変数ｊ」は、区切り位置を設定する際の基準となる値（変化量）であり、はじめ、変数ｊは、１に設定されているものとする。つまり、項目６２の欄に１以上の数値が記録されたページの直前で、区切り位置が設定される。したがって、この例では、ページ３とページ４との間、ページ５とページ６との間に区切り位置が設定される。 When the process of step S106 is completed, the break position setting unit 24 sets a break position (step S108). Specifically, the break position setting unit 24 sets a break position between pages whose change amount is “variable j” or more. “Variable j” is a value (amount of change) that serves as a reference when setting the delimiter position, and initially, variable j is set to 1. That is, the separation position is set immediately before the page in which a numerical value of 1 or more is recorded in the item 62 column. Therefore, in this example, a separation position is set between page 3 and page 4 and between page 5 and page 6.

再び図６を参照して、区切り位置設定部２４は、区切り位置に基づいて、文書ＩＤを付与する（ステップＳ１６）。すなわち、区切り位置設定部２４は、区切り位置と判定された箇所で、文書ＩＤを更新する。具体的には、区切り位置設定部２４は、図１５に示した主要領域特徴テーブルの列に、さらに、文書ＩＤの項目６３を追加する。項目６３の行を参照して、ページ１〜３の欄には、文書ＩＤとして“１”が記録され、ページ４，５の欄には、文書ＩＤとして“２”が記録される。また、ページ６の欄には、文書ＩＤとして“３”が記録される。このように、最初のページに文書ＩＤとして“１”を付与し、区切り位置で文書ＩＤを１だけインクリメントする。 Referring to FIG. 6 again, the break position setting unit 24 assigns a document ID based on the break position (step S16). In other words, the break position setting unit 24 updates the document ID at the location determined as the break position. Specifically, the break position setting unit 24 adds a document ID item 63 to the column of the main area feature table shown in FIG. Referring to the row of item 63, “1” is recorded as the document ID in the columns of pages 1 to 3, and “2” is recorded as the document ID in the columns of pages 4 and 5. In the page 6 column, “3” is recorded as the document ID. Thus, “1” is assigned as the document ID to the first page, and the document ID is incremented by 1 at the break position.
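Steps S108 and S16 can be sketched together: a break is set wherever the change amount reaches the threshold (variable j), and the document ID is incremented at each break. The function name is hypothetical; the input is the list of change amounts recorded in item 62.

```python
def assign_document_ids(amounts, threshold=1):
    """Return one document ID per page.

    amounts[k] is the change amount between page k + 1 and page k + 2.
    threshold corresponds to variable j in the description.
    """
    doc_ids = [1]  # the first page always receives document ID 1
    for amount in amounts:
        is_break = amount >= threshold
        doc_ids.append(doc_ids[-1] + (1 if is_break else 0))
    return doc_ids


doc_ids = assign_document_ids([0, 0, 2, 0, 1])
print(doc_ids)
```

With the threshold at its initial value of 1, breaks fall between pages 3 and 4 and between pages 5 and 6, which gives the document IDs recorded in item 63.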

区切り位置設定部２４は、文書ＩＤの情報を、画像バッファ部１３に出力する。これにより、画像バッファ部１３に、文書画像とともに、文書ＩＤを含む管理情報が記憶される。図１６に、文書ＩＤを含む管理情報のデータ構造の一例を示す。管理情報には、ページＩＤと、文書ＩＤと、各画像データ（ページ）へのポインタとが対応付けられて記憶されている。画像バッファ部１３は、文書画像のデータと管理情報とを、電子文書生成部１５に出力する。 The delimiter position setting unit 24 outputs the document ID information to the image buffer unit 13. As a result, the management information including the document ID is stored in the image buffer unit 13 together with the document image. FIG. 16 shows an example of the data structure of management information including a document ID. In the management information, a page ID, a document ID, and a pointer to each image data (page) are stored in association with each other. The image buffer unit 13 outputs document image data and management information to the electronic document generation unit 15.

電子文書生成部１５は、管理情報に含まれる文書ＩＤの情報に基づき、複数の電子文書を生成する（ステップＳ１８）。つまり、電子文書生成部１５は、文書ＩＤの数分、電子文書を生成する。図１０の文書画像について、３つの文書が生成された例を、図１７に示す。図１７を参照して、第１の電子文書となるＰＤＦデータ１は、文書ＩＤ１に対応しており、当該電子文書には、ページ１〜ページ３の文書画像が含まれる。第２の電子文書となるＰＤＦデータ２は、文書ＩＤ２に対応しており、当該電子文書には、ページ４，５の文書画像が含まれる。第３の電子文書となるＰＤＦデータ３は、文書ＩＤ３に対応しており、当該電子文書には、ページ６の文書画像が含まれる。 The electronic document generation unit 15 generates a plurality of electronic documents based on the document ID information included in the management information (step S18). That is, the electronic document generation unit 15 generates electronic documents for the number of document IDs. FIG. 17 shows an example in which three documents are generated for the document image of FIG. Referring to FIG. 17, PDF data 1 serving as a first electronic document corresponds to document ID 1, and the electronic document includes page 1 to page 3 document images. The PDF data 2 serving as the second electronic document corresponds to the document ID 2, and the electronic document includes the document images of pages 4 and 5. The PDF data 3 serving as the third electronic document corresponds to the document ID 3, and the electronic document includes the document image of page 6.
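The grouping of pages into documents can be sketched as follows. The field names in `management_info` are hypothetical stand-ins for the page ID and document ID of FIG. 16, the pointer to the image data is omitted, and the actual compression and PDF conversion are outside the scope of this sketch.

```python
from itertools import groupby


def group_pages(management_info):
    """Return {document ID: [page IDs]} from entries listed in page order."""
    # groupby only merges consecutive entries, which suffices because
    # document IDs never decrease from one page to the next.
    return {
        doc_id: [entry["page_id"] for entry in entries]
        for doc_id, entries in groupby(
            management_info, key=lambda entry: entry["document_id"])
    }


management_info = [
    {"page_id": 1, "document_id": 1},
    {"page_id": 2, "document_id": 1},
    {"page_id": 3, "document_id": 1},
    {"page_id": 4, "document_id": 2},
    {"page_id": 5, "document_id": 2},
    {"page_id": 6, "document_id": 3},
]
documents = group_pages(management_info)
print(documents)
```

Each key then corresponds to one generated electronic document, matching the three documents of FIG. 17.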

Next, the display control unit 25 displays a document confirmation screen on the display panel 110a of the operation panel unit 110 (step S20). FIG. 18 is a diagram showing an example of the document confirmation screen, and FIG. 18A shows an example of the screen displayed in step S20 of FIG. 6. Referring to FIG. 18A, the document confirmation screen displays a preview image of the first page of each electronic document together with the message “Three documents have been read. Are you sure you want to send them?”. The screen also displays four buttons for receiving instructions from the user: an OK button, a correction (increase) button, a correction (decrease) button, and a cancel button.

The instruction receiving unit 26 determines whether an OK instruction has been input via the operation panel unit 110 (step S22). When a correction button is selected (NO in step S24), the parameter changing unit 27 executes parameter change processing (step S26). When the cancel button is selected, all processing is aborted and the apparatus returns to its original state (the state in which no break positions are set).

パラメータ変更処理については、図８に示すサブルーチンのフローチャートを用いて説明する。 The parameter change process will be described with reference to a subroutine flowchart shown in FIG.

図８は、本発明の実施の形態におけるパラメータ変更処理を示すフローチャートである。図８を参照して、パラメータ変更部２７は、ユーザから、増指示が入力されたか否かを判断する（ステップＳ２０２）。増指示が入力されたと判断された場合、すなわち、修正（増）ボタンが選択された場合に（ステップＳ２０２においてＹＥＳ）、パラメータ変更部２７は、上述の変数ｉを０にする（ステップＳ２０４）。つまり、上述の補正処理（ステップＳ１０４）を実行しないように設定する。 FIG. 8 is a flowchart showing parameter change processing in the embodiment of the present invention. Referring to FIG. 8, parameter changing unit 27 determines whether an increase instruction is input from the user (step S202). When it is determined that an increase instruction has been input, that is, when the correction (increase) button is selected (YES in step S202), parameter changing unit 27 sets variable i described above to 0 (step S204). That is, it sets so that the above-mentioned correction processing (Step S104) may not be performed.

また、増指示でないと判断された場合、すなわち、修正（減）ボタンが選択された場合には（ステップＳ２０２においてＮＯ）、パラメータ変更部２７は、上述の変数ｊ（区切り位置を設定する際の基準）をたとえば１だけインクリメントする（ステップＳ２０６）。 When it is determined that the instruction is not an increase instruction, that is, when the correction (decrease) button is selected (NO in step S202), the parameter changing unit 27 sets the above-described variable j (when setting the separation position). For example, the reference is incremented by 1 (step S206).

When the parameter change process ends, processing returns to step S14, and the break position setting process is executed again based on the changed parameters.
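The parameter change in steps S202 to S206 can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the `BreakParams` container and the `change_parameters` function are hypothetical names, and only the variables i and j come from the description.

```python
from dataclasses import dataclass


@dataclass
class BreakParams:
    """Hypothetical container for the two parameters named in the text."""
    i: int = 1  # 1: run the correction processing (step S104); 0: skip it
    j: int = 1  # reference value for the page feature change amount


def change_parameters(params: BreakParams, increase: bool) -> BreakParams:
    """Sketch of steps S202 to S206 of the parameter change process."""
    if increase:
        # Step S202 YES -> step S204: disable the correction processing.
        return BreakParams(i=0, j=params.j)
    # Step S202 NO -> step S206: raise the reference value by 1.
    return BreakParams(i=params.i, j=params.j + 1)


initial = BreakParams()
after_increase = change_parameters(initial, increase=True)
after_decrease = change_parameters(initial, increase=False)
```

After either branch, the break position setting process would be run again with the returned parameters.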

In the above example, the variable i was initially 1 (correction processing is performed) and the variable j was also 1 (the reference value is 1). When an increase instruction is input, the variable i is set to 0 (correction processing is not performed). FIG. 19 shows the main area feature table when the break position setting process is executed again after an increase instruction has been input. Referring to FIG. 19, because the correction processing is not executed, the style feature identified by style ID 2 is treated as absent from page 2. Accordingly, the row of the page feature determination result item 61A shows that the page features of pages 1 and 3 are style IDs 2 and 3, the page feature of page 2 is style ID 3, the page features of pages 4 and 5 are style IDs 5 and 6, and the page feature of page 6 is style ID 6. The row of the page feature change amount item 62A then records "1" both for the change from page 1 to page 2 and for the change from page 2 to page 3. The change amounts for pages 3 to 6 are the same as in FIG. 15. In the row of the document ID item 63A, the document ID is updated wherever the change amount is 1 or more, so "1" is recorded as the document ID for page 1, "2" for page 2, "3" for page 3, "4" for pages 4 and 5, and "5" for page 6. Thus, when an increase instruction is input, the number of divisions increases from three to five.

FIG. 18(b) shows an example of the document confirmation screen displayed in the second execution of step S20 when the correction (increase) button is selected on the document confirmation screen shown in FIG. 18(a).

On the other hand, when a decrease instruction is input, the variable j (change amount reference value) is set to 2. FIG. 20 shows the main area feature table when the break position setting process is executed again after a decrease instruction has been input. Referring to FIG. 20, the page feature determination result is the same as in the first break position setting process, so the rows of items 61 and 62 are the same as in FIG. 15. In the row of the document ID item 63B, the document ID is updated only where the change amount is 2 or more, so "1" is recorded as the document ID for pages 1 to 3 and "2" for pages 4 and 5. Because the change amount from page 5 to page 6 is less than the reference value (2), "2" is also recorded as the document ID for page 6. Thus, when a decrease instruction is input, the number of divisions decreases from three to two.
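The three outcomes described above (three, five, and two documents) can be reproduced with a short sketch. Several points are assumptions made for illustration only: the change amount is taken to be the number of style IDs that differ between consecutive pages (the size of the symmetric difference), the correction is modeled as copying a style shared by the previous and next pages onto the page between them, and the function names are hypothetical. The per-page style IDs are those given in the text.

```python
def correct(features):
    """Correction sketch: a style present on pages k-1 and k+1 but missing
    on page k is treated as present on page k as well."""
    fixed = [set(f) for f in features]
    for k in range(1, len(features) - 1):
        fixed[k] |= features[k - 1] & features[k + 1]
    return fixed


def assign_document_ids(features, i=1, j=1):
    """Assign a document ID to each page.

    i: 1 to apply the correction, 0 to skip it.
    j: reference value; a new document starts where the change amount >= j.
    """
    pages = correct(features) if i == 1 else [set(f) for f in features]
    ids = [1]
    for prev, cur in zip(pages, pages[1:]):
        change = len(prev ^ cur)  # assumed measure of the change amount
        ids.append(ids[-1] + 1 if change >= j else ids[-1])
    return ids


# Main style IDs detected on pages 1 to 6 before any correction.
raw = [{2, 3}, {3}, {2, 3}, {5, 6}, {5, 6}, {6}]

first_pass = assign_document_ids(raw, i=1, j=1)      # initial parameters
after_increase = assign_document_ids(raw, i=0, j=1)  # correction disabled
after_decrease = assign_document_ids(raw, i=1, j=2)  # reference value raised
```

Under these assumptions the first pass yields three documents, the increase instruction yields five, and the decrease instruction yields two, matching the tables described for FIGS. 15, 19, and 20.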

FIG. 18(c) shows an example of the document confirmation screen displayed in the second execution of step S20 when the correction (decrease) button is selected on the document confirmation screen shown in FIG. 18(a).

If a correction button is selected again on the screens shown in FIGS. 18(b) and 18(c), the parameters (variables i and j) are changed and the break position setting process is repeated each time.

When an OK instruction is input in step S22, that is, when the OK button is selected (YES in step S22), the transmission processing unit 17 generates a file name for each document image and displays a destination setting screen on the operation panel unit 110 (step S24).

FIG. 21 shows an example of the destination setting screen. Referring to FIG. 21, the destination setting screen displays the name and destination of each electronic document together with the message "Sending to the following." The screen displays buttons for receiving instructions from the user (an OK button, a cancel button, and change buttons). A change button receives an instruction to change the name and destination of an electronic document and is provided for each electronic document. FIG. 21 shows an example of the screen after the user has set the destinations. The destination may be set for each electronic document by, for example, the user manually entering an e-mail address. The operation panel unit 110 may also accept such input.

When the OK button is selected on the destination setting screen, the transmission processing unit 17 transmits each electronic document, via the transmission unit 18, to the destination set for that document (step S28). This completes the document image generation process.

Although the case where the generated electronic documents are output to the transmission unit 18 has been described, the same processing is executed when the generated electronic documents are output to the storage unit 112. In that case, a process of storing the plurality of electronic documents in the storage unit 112 is executed in place of steps S24 and S28.

In this way, even when originals containing a mixture of various types of documents are read in a single batch, each of the automatically generated electronic documents can be transmitted to the destination set by the user. In addition, because the break positions are set based on the style features of the acquired document images, the electronic documents can be separated at optimum positions. Specifically, according to the present embodiment, a style feature is first detected for each line area, and the main style features, which represent the style patterns that characterize the document images, are determined. Then, for each page, the main style features present on that page are detected as its page feature, and the break positions of the document images are set based on the amount of change in page features between pages. Because the main style features of the document images as a whole are determined first and the page feature of each page is detected afterwards, break positions can be set at more suitable positions than when they are set simply from the style features of individual line areas alone.

As described above, an increase instruction or a decrease instruction is accepted in order to change the parameters, but the invention is not limited to this form as long as an instruction regarding the number of divisions is accepted from the user. For example, the number of divisions may be input directly by the user. Alternatively, the number of divisions may be accepted from the user before the break position setting process is executed. In these cases, the break position setting process is executed while changing the parameters until the number of divisions reaches the accepted number (or a value close to it). This allows the user to obtain the desired number of electronic documents efficiently.
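One way to picture "changing the parameters until the accepted number of divisions is reached" is a small search over the two parameters. The sketch below is only an assumption about how such a search might be organized: `fit_division_count`, the `split` callable, the search order, and the upper bound `max_j` are all hypothetical, and `toy_split` uses illustrative change amounts rather than values taken from the figures.

```python
from itertools import product


def fit_division_count(split, target, max_j=5):
    """Try parameter settings until the number of documents equals `target`,
    otherwise return the closest setting found.

    split(i, j) is a hypothetical callable returning one document ID per page.
    Returns a tuple (i, j, number_of_documents).
    """
    best = None
    for j, i in product(range(1, max_j + 1), (1, 0)):
        count = len(set(split(i, j)))
        if best is None or abs(count - target) < abs(best[2] - target):
            best = (i, j, count)
        if count == target:
            break
    return best


def toy_split(i, j):
    """Stand-in for the break position setting process (illustrative values)."""
    changes = [0, 0, 4, 0, 1] if i == 1 else [1, 1, 4, 0, 1]
    ids = [1]
    for change in changes:
        ids.append(ids[-1] + (1 if change >= j else 0))
    return ids


two_documents = fit_division_count(toy_split, target=2)
five_documents = fit_division_count(toy_split, target=5)
```

When no setting produces exactly the requested number, the function returns the setting whose count is closest, which corresponds to the "value close to it" case in the text.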

After the above-described break position setting process has been executed, a second break position setting process may additionally be executed using a conventional technique. In the break position setting process of the present embodiment, break positions are set based on the style features of line areas. Consequently, when the change amount of the page feature remains 0 over several consecutive pages, no break position is set between those pages. In such a case a large number of pages is handled as a single document, which may be inconvenient. The image analysis unit 16 may therefore execute the second break position setting process when, for example, a predetermined number of consecutive pages share the same document ID. Alternatively, the second break position setting process may be executed, before document IDs are assigned, when the change amount of the page feature is 0 for a predetermined number of consecutive pages. The second break position setting process may be, for example, a process based on layout or on a specific character string (for example, a character string representing a page number).
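The trigger condition for the second break position setting process can be sketched as a search for long runs of identical document IDs. The function name `long_runs` and the tuple layout are hypothetical; the second process itself (layout or page number analysis) is not modeled here.

```python
from itertools import groupby


def long_runs(document_ids, min_pages):
    """Return (document_id, first_page, length) for every run of consecutive
    pages that share a document ID and is at least `min_pages` long.

    Pages are numbered from 1. `min_pages` stands for the predetermined
    number of pages mentioned in the text.
    """
    runs = []
    page = 1
    for doc_id, group in groupby(document_ids):
        length = len(list(group))
        if length >= min_pages:
            runs.append((doc_id, page, length))
        page += length
    return runs


candidates = long_runs([1, 1, 1, 2, 2, 3], min_pages=3)
```

Each returned run marks a range of pages that could be handed to the second process for further subdivision.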

The file name of each electronic document 400 has been described as a combination of a fixed character string and the document ID, but the file name may instead be determined from the content of each document. For example, a character string corresponding to the title of each divided document may be used as the file name. In that case, the image analysis unit 16 extracts the character string area with the greatest height from each document image for which break positions have been set. It then performs character recognition on that area and adopts the recognition result as the file name. By including the determined file name in the above-described management information, the transmission processing unit 17 can assign the file name to each electronic document 400 in the same manner as described above. Such processing makes it possible to assign more useful file names.
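The title-based naming can be sketched as "pick the tallest character string area, recognize it, use the result". In the sketch below, `TextRegion`, `title_file_name`, and the `recognize` callable (standing in for the character recognition step) are hypothetical, and replacing characters that are awkward in file names is an added assumption not stated in the text.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TextRegion:
    """Hypothetical bounding box of a character string area, in pixels."""
    x: int
    y: int
    width: int
    height: int


def title_file_name(
    regions: Sequence[TextRegion],
    recognize: Callable[[TextRegion], str],
    fallback: str,
) -> str:
    """Use the text of the tallest region as the file name."""
    if not regions:
        return fallback
    tallest = max(regions, key=lambda region: region.height)
    text = recognize(tallest).strip()
    # Assumption: replace characters that many file systems reject.
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", text).strip("_")
    return safe or fallback


regions = [TextRegion(10, 10, 300, 40), TextRegion(10, 80, 500, 12)]
fake_ocr = {40: "Meeting Minutes", 12: "body text"}
name = title_file_name(regions, lambda r: fake_ocr[r.height], "document_1")
```

The fallback argument corresponds to the default naming scheme (fixed character string plus document ID) and is used when no region is available or recognition returns nothing usable.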

In the above flowchart, the increase or decrease instruction from the user is accepted after the electronic documents have been generated, but the procedure is not limited to this. For example, the display control unit 25 may first execute the display processing based on the document IDs, and the electronic document generation unit 15 may generate the electronic documents when an OK instruction is input through the instruction receiving unit 26.

The MFP 1 has been described as generating a plurality of electronic documents based on the set break positions, but it may instead generate a single electronic document by inserting a specific page (for example, a blank page) at each break position.

FIG. 22 shows an example of the data structure of each electronic document 400. Referring to FIG. 22, each electronic document 400 includes a header part 402, a document image part 404, and a footer part 408. The header part 402 and the footer part 408 store information about the attributes of the electronic document 400, such as the creation date and time, the creator, and copyright information. The document image part 406 stores the document image corresponding to each page. The document images may be stored in a compressed state as described above.
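The three-part structure can be pictured with a small data model. This is a sketch only: the class and field names are hypothetical, the attribute set is limited to the examples named in the text, and the actual on-disk format of the electronic document 400 is not specified here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Attributes:
    """Attribute information held in the header and footer parts."""
    created: datetime
    creator: str
    copyright: str = ""


@dataclass
class ElectronicDocument:
    """Hypothetical in-memory model of one electronic document."""
    header: Attributes
    # One entry per page; each may hold compressed image data.
    page_images: List[bytes] = field(default_factory=list)
    footer: Optional[Attributes] = None


document = ElectronicDocument(
    header=Attributes(created=datetime(2008, 1, 16), creator="user"),
)
document.page_images.append(b"page-1-image-bytes")
document.page_images.append(b"page-2-image-bytes")
```

Keeping the page images in an ordered list preserves the page order established by the break position setting process.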

(Printing process)
The procedure for setting break positions in the process of generating electronic documents from document images has been described above. Break positions are set in the same manner in the process of printing document images.

FIG. 23 is a functional block diagram showing the functional configuration of the MFP 1 according to the embodiment of the present invention when it executes print processing. Functions that are the same as in FIG. 5 are denoted by the same reference numerals, and their description is not repeated.

Referring to FIG. 23, an image processing unit 19 is added to the functional configuration of the MFP 1 in place of the electronic document generation unit 15 and the transmission processing unit 17 of FIG. 5. In addition, the print unit 106 is added to the functional configuration of the MFP 1 in place of the transmission unit 18 and the storage unit 112. In response to a user operation, the image processing unit 19 converts the document images output from the image buffer unit 13 into images suitable for the printing operation of the print unit 106. Typically, a document image defined in the RGB color system is converted into image data in the CMYK color system, which is suitable for color printing. At this time, color adjustment according to the characteristics of the print unit 106 may be performed. The print unit 106 forms a print image from the CMYK image supplied by the image processing unit 19 and prints it on a paper medium or the like.
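For reference, the simplest textbook conversion from RGB to CMYK is sketched below. It is an assumption chosen for illustration, not the conversion performed by the image processing unit 19: a real print path would normally rely on device-specific color adjustment, which this formula ignores. The function name is hypothetical.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0 to 255) to CMYK (0.0 to 1.0) conversion.

    K is taken from the brightest channel, and C, M, Y are the remaining
    ink amounts after the black component is removed.
    """
    r_, g_, b_ = r / 255, g / 255, b / 255
    k = 1 - max(r_, g_, b_)
    if k == 1:
        # Pure black: avoid dividing by zero below.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r_ - k) / (1 - k)
    m = (1 - g_ - k) / (1 - k)
    y = (1 - b_ - k) / (1 - k)
    return (c, m, y, k)


red = rgb_to_cmyk(255, 0, 0)
black = rgb_to_cmyk(0, 0, 0)
white = rgb_to_cmyk(255, 255, 255)
```

The early return for black is needed because the general formula divides by 1 - K.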

When the MFP 1 executes print processing, the electronic document generation process (step S18) is removed from the flowchart of FIG. 6. In addition, image processing is executed in place of the destination setting screen display (step S24) and the transmission process (step S28). In the image processing, the image processing unit 19 performs the conversion described above and outputs the image data to the print unit 106 while changing the sorting method for each document ID. As a result, the print unit 106 outputs the paper media sorted by document.

The print unit 106 does not necessarily have to provide a sort function. In that case, the image processing unit 19 may, for example, insert a specific page such as a blank page wherever the document ID changes.
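Inserting a separator page wherever the document ID changes can be sketched in a few lines. The names `with_separators` and `BLANK` are hypothetical, and pages are represented by plain strings for illustration.

```python
BLANK = "<blank page>"


def with_separators(pages, document_ids, separator=BLANK):
    """Return the pages in order, with `separator` inserted before every
    page whose document ID differs from that of the preceding page."""
    output = []
    for index, (page, doc_id) in enumerate(zip(pages, document_ids)):
        if index > 0 and doc_id != document_ids[index - 1]:
            output.append(separator)
        output.append(page)
    return output


pages = ["p1", "p2", "p3", "p4", "p5", "p6"]
print_job = with_separators(pages, [1, 1, 1, 2, 2, 3])
```

The same routine would also serve the variation in which a single electronic document is generated with a specific page inserted at each break position.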

<Modification>
In the above-described embodiment, the case where the processing according to the present invention is executed by the MFP 1 has been described, but the processing may also be executed by a computer having an image reading function for reading the original 300. In that case, a program can be provided that causes the computer to execute the processing functions shown in FIG. 5 or FIG. 23 so that the computer functions as a document processing apparatus. Such a program can be stored on an optical medium such as a CD-ROM (Compact Disk-Read Only Memory) or on a computer-readable recording medium such as a memory card and provided as a program product. Alternatively, the program can be provided stored on a storage medium such as a hard disk built into the computer. The program can also be provided by download over a network.

The program in the storage unit 112 can also be updated by reading, for example, an optical medium on which the program is recorded using a drive device (not shown) of the MFP 1.

Alternatively, the image reading function may be implemented in another apparatus or computer, and the generated document images may be received and break positions set according to the processing described above.

The program according to the present invention may be one that calls the necessary modules, from among the program modules provided as part of a computer's operating system (OS), in a predetermined sequence and at predetermined timing to execute processing. In that case, the program itself does not include those modules, and processing is executed in cooperation with the OS. A program that does not include such modules can also be included in the program according to the present invention.

The program according to the present invention may also be provided incorporated into part of another program. In that case as well, the program itself does not include the modules contained in the other program, and processing is executed in cooperation with the other program. A program incorporated into another program in this way can also be included in the program according to the present invention.

The provided program product is installed in a program storage unit such as a hard disk and then executed. The program product includes the program itself and the storage medium on which the program is stored.

The embodiment disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

FIG. 1 is a schematic configuration diagram of a system including a document processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a schematic hardware configuration of the MFP according to the embodiment of the present invention.
FIG. 3 is a diagram showing an example of the appearance of the operation panel unit of the MFP according to the embodiment of the present invention.
FIG. 4 is a schematic diagram showing a schematic hardware configuration of a personal computer according to the embodiment of the present invention.
FIG. 5 is a functional block diagram showing the functional configuration of the MFP according to the embodiment of the present invention when it generates electronic documents.
FIG. 6 is a flowchart showing the electronic document generation process in the embodiment of the present invention.
FIG. 7 is a flowchart showing the break position setting process in the embodiment of the present invention.
FIG. 8 is a flowchart showing the parameter change process in the embodiment of the present invention.
FIGS. 9(a) to 9(c) are diagrams for explaining the line area extraction process in the present embodiment.
FIG. 10 is a diagram showing an example of line areas extracted from a document containing a plurality of pages.
FIG. 11 is a diagram showing an example of a list of the line areas contained in the document images.
FIG. 12 is a diagram showing an example of a style feature list.
FIG. 13 is a diagram showing an example of a calculation result table for each style feature.
FIG. 14 is a diagram showing an example of a main area feature table.
FIG. 15 is a diagram showing the main area feature table after the correction processing in the embodiment of the present invention has been performed.
FIG. 16 is a diagram showing an example of the data structure of the management information stored in the image buffer unit.
FIG. 17 is a diagram showing an example in which three documents are generated from the document images of FIG. 10.
FIGS. 18(a) to 18(c) are diagrams showing examples of the document confirmation screen.
FIG. 19 is a diagram showing the main area feature table when the break position setting process is executed again after an increase instruction has been input.
FIG. 20 is a diagram showing the main area feature table when the break position setting process is executed again after a decrease instruction has been input.
FIG. 21 is a diagram showing an example of the destination setting screen.
FIG. 22 is a diagram showing an example of the data structure of each electronic document.
FIG. 23 is a functional block diagram showing the functional configuration of the MFP according to the embodiment of the present invention when it executes print processing.

Explanation of symbols

1 MFP; PC1, PC2, PC3 personal computer; SRV server apparatus; 12 receiving unit; 13 image buffer unit; 15 electronic document generation unit; 16 image analysis unit; 17 transmission processing unit; 18 transmission unit; 19 image processing unit; 21 line area extraction unit; 22 style feature detection unit; 23 main determination unit; 24 position setting unit; 25 display control unit; 26 instruction receiving unit; 27 parameter changing unit; 100 control unit; 102 memory unit; 104 image reading unit; 106 print unit; 108 communication interface unit; 110 operation panel unit; 112 storage unit; 201 CPU; 203 internal bus; 205 display unit; 207 communication interface unit; 209 input unit; 211 hard disk unit; 213 memory unit; 215 CD-ROM drive; 215a CD-ROM; 217 FDD drive; 217a flexible disk; 300 original; 310 image data; 400 electronic document.

Claims

1. A document processing apparatus comprising:
first storage means for storing a document image;
extraction means for extracting document areas of a predetermined type from the document image stored in the first storage means;
detecting means for detecting, for each document area, a plurality of region attributes based on each of a plurality of predetermined attribute types, and detecting a style feature representing a combination pattern of the region attributes;
determination means for classifying the detected plurality of style features by pattern and determining one or more main style features for the document image based on the appearance frequency of each classified pattern; and
setting means for detecting, for each page, which of the main style features is present, and setting a break position of the document image based on the amount of change in the main style features between pages.

2. The document processing apparatus according to claim 1, wherein the determination means includes:
frequency calculating means for calculating, for each style feature, the number of pages on which the style feature appears as the appearance frequency; and
means for determining a style feature whose calculated number of pages is equal to or greater than a predetermined value to be a main style feature.

3. The document processing apparatus according to claim 2, wherein the setting means includes:
means for determining the main style features present on each page as the page feature of the page;
amount calculation means for calculating, for each page, the amount of change of the page feature of the page from the page feature of the preceding page; and
means for setting a break position between the pages when the calculated amount of change is equal to or greater than a predetermined value.

4. The document processing apparatus according to claim 3, wherein the setting means further includes correction means for, when the same page feature is detected on the first and third pages but not on the second page of consecutive first, second, and third pages, correcting the result so that the same page feature is treated as detected on the second page as well.

5. The document processing apparatus according to claim 1, further comprising an output unit configured to divide and output the document image based on the break position set by the setting means.

6. The document processing apparatus according to claim 1, further comprising:
display control means for generating a signal for displaying an index image of the first page of each of a plurality of divided document images obtained when the document image is divided based on the break position; and
display means for performing output in accordance with the signal from the display control means.

7. The document processing apparatus according to claim 1, further comprising an instruction receiving unit configured to receive, from a user, an instruction regarding the number of divisions of the document image made by the setting means.

8. The document processing apparatus according to claim 7, further comprising:
changing means for changing, in accordance with the received instruction, a predetermined parameter used by the setting means to set break positions; and
execution instructing means for instructing the setting means to execute its processing again after the change by the changing means.

9. The document processing apparatus according to claim 1, further comprising a second setting unit configured to set a break position of the document image based on a layout or a predetermined character string.

10. The document processing apparatus according to claim 1, wherein the plurality of predetermined attribute types include at least two of position, size, color in upper area, number of partial areas, position, height, color, distance between adjacent partial areas, character size, character modification, character color, background color, and font type.

11. The document processing apparatus according to claim 1, wherein the predetermined type of document area corresponds to any one of a stage, a line, a character string, and a character.

12. A document processing method executed in a document processing apparatus including a storage unit for storing a document image and an arithmetic processing unit, the method comprising the steps of:
the arithmetic processing unit extracting document areas of a predetermined type from the document image stored in the storage unit;
the arithmetic processing unit detecting, for each document area, a plurality of region attributes based on each of a plurality of predetermined attribute types, and detecting a style feature representing a combination pattern of the region attributes;
classifying the detected plurality of style features by pattern and determining one or more main style features for the document image based on the appearance frequency of each classified pattern; and
detecting, for each page, which of the main style features is present, and setting a break position of the document image based on the amount of change in the main style features between pages.

13. A document processing program for causing a computer to execute the document processing method according to claim 12.