JP2007122392A - Device, method, and program for image processing, and storage medium

Info

Publication number: JP2007122392A
Application number: JP2005313399A
Authority: JP
Inventors: Shigeo Fukuoka; 茂雄福岡
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-10-27
Filing date: 2005-10-27
Publication date: 2007-05-17

Abstract

PROBLEM TO BE SOLVED: To provide a device, method, and program for image processing, and a storage medium, capable of easily finding a target document to be transmitted among a large number of documents.

SOLUTION: A multifunction printer is provided with a scanner part 101, a printer part 102, an operation part 103, a modem 104, a network controller part 105, an image processing device 106, an HDD 107, a memory 108, a transmitting/receiving part 109, a character recognition part 110, and a search part 112, and is further provided with a character recognition dictionary 111 connected to the character recognition part 110 and a search dictionary 113 connected to the search part 112. Each of a plurality of documents is divided by page and by region within each page, and the resulting regions are searched on the basis of a predetermined search keyword. Based on the search result, the region with the highest score, where the score is the number of search hits for the predetermined keyword, is selected together with its page and its document.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an image processing apparatus and method, a program, and a storage medium, and more particularly to an image processing apparatus and method, a program, and a storage medium that transmit scanned image data.

In recent years, the digitization of documents has progressed with the spread of scanners. Multifunction printers (MFPs) now include a box function for storing scanned image data, and have also come to provide functions for transmitting that data by e-mail or by file transfer (FTP (File Transfer Protocol), SMB (Server Message Block), etc.).
JP 2002-059593 A

In the conventional technique, however, it is difficult to find a target document in the box. In particular, when the box holds a large number of documents, finding the target document to be transmitted is extremely difficult.

An object of the present invention is to provide an image processing apparatus and method, a program, and a storage medium with which a target document to be transmitted can easily be found among a large number of documents.

In order to achieve the above object, the image processing apparatus according to claim 1 is an image processing apparatus that retrieves predetermined data from a plurality of data stored in a data storage unit, and comprises: search means for dividing each of the plurality of data by page and by partial area of each page and then searching the plurality of partial areas based on predetermined search information; and reconstruction means for reconstructing the plurality of data based on the search result.

The image processing method according to claim 7 is an image processing method for retrieving predetermined data from a plurality of data stored in a data storage unit, and comprises: a search step of dividing each of the plurality of data by page and by partial area of each page and then searching the plurality of partial areas based on predetermined search information; and a reconstruction step of reconstructing the plurality of data based on the search result.

The image processing program according to claim 13 is an image processing program for retrieving predetermined data from a plurality of data stored in a data storage unit, and causes a computer to execute: a search module that divides each of the plurality of data by page and by partial area of each page and then searches the plurality of partial areas based on predetermined search information; and a reconstruction module that reconstructs the plurality of data based on the search result.

The computer-readable storage medium according to claim 12 stores the program according to claim 11.

According to the present invention, each of a plurality of data is first divided by page and by partial area of each page, and the plurality of partial areas are then searched based on predetermined search information. Because the plurality of data are reconstructed based on this search result, a target document can easily be found among a large number of documents.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram schematically showing the configuration of a multifunction printer (MFP) including an image processing apparatus according to an embodiment of the present invention.

In FIG. 1, the multifunction printer (also called an MFP or multifunction device) includes a scanner unit 101, a printer unit 102, an operation unit 103, a modem 104, a network controller unit 105, an image processing device 106, an HDD 107, a memory 108, a transmission/reception unit 109, a character recognition unit 110, and a search unit 112, all connected by a system bus. The MFP further includes a character recognition dictionary 111 connected to the character recognition unit 110 and a search dictionary 113 connected to the search unit 112.

The scanner unit 101 captures images. The printer unit 102 performs copying and prints jobs sent from a computer or the like. The operation unit 103 is used to operate the MFP and is usually a liquid crystal display with a touch panel. The modem 104 is used for fax transmission and reception and provides the connection to the public telephone line. The network controller unit 105 connects to a LAN, usually over Ethernet (registered trademark).

The image processing unit 106 performs processing such as conversion between color spaces such as YCbCr and RGB, and image compression and decompression such as JPEG and MMR. The hard disk drive (HDD) 107 (data storage unit) temporarily stores images read by the scanner unit 101 and also serves as the area that holds data for the MFP's storage function. The memory 108 is used as a work area during image processing. The transmission/reception unit 109 operates the network controller unit 105 to carry out the actual transmission and reception of files. The character recognition unit 110 identifies character areas in the image data expanded in the memory 108 and performs character recognition on them.

The character recognition dictionary 111 is used by the character recognition unit 110 and may reside in ordinary DRAM or in ROM. The search unit 112 uses the search keyword entered from the operation unit 103 to search the character strings output by the character recognition unit 110 and calculates a score for each page. The search dictionary 113 is used by the search unit 112.

1. Storing documents by scanning
The storage function of the MFP can store scanned images, images received by fax, images received through reception functions such as e-mail, images rendered from PDL, and so on. Reception functions such as e-mail receive data via a network interface, and PDL images are received via a network interface or an interface that connects directly to a computer, such as USB.

The following describes the case in which an image read by the scanner unit 101 is stored in the HDD 107.

Assume that the storage function has been called from the operation unit 103 and that a document is set in the ADF (Auto Document Feeder). Assume also that the image capture mode is set to 24-bit full color.

When the start button is pressed, the scanner unit 101 operates, the image processing unit 106 performs image processing such as gamma correction and JPEG compression, and the result is saved to the HDD 107. As long as the HDD 107 has free space and the ADF holds another sheet, the scan operation is repeated for the next page, so that all pages of the document are scanned. If the HDD 107 has no free space, a message such as "HDD is full" is displayed on the operation unit 103 and processing is interrupted.
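The scan-and-store loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation in the MFP: the function name `scan_all_pages`, its parameters, and the byte strings standing in for JPEG-compressed pages are all hypothetical.

```python
def scan_all_pages(pages_in_adf, free_bytes):
    """Store pages one by one until the ADF is empty or the HDD is full.

    pages_in_adf: list of bytes objects standing in for compressed page images
    free_bytes:   hypothetical remaining HDD capacity in bytes
    Returns (stored_pages, message); message is None when every page was stored.
    """
    stored = []
    for page in pages_in_adf:
        if len(page) > free_bytes:
            # No free area left: report it and interrupt, keeping earlier pages.
            return stored, "HDD is full"
        stored.append(page)
        free_bytes -= len(page)
    return stored, None


# Three 40-byte pages, but room for only two of them.
stored, message = scan_all_pages([b"a" * 40, b"b" * 40, b"c" * 40], free_bytes=100)
```

Here the third page does not fit, so the loop stops with the message and two pages stored.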

When all pages have been read, the document data shown in FIG. 2 is stored on the HDD 107.

2. Generating search data for stored document data
When scanning by the scanner unit 101 has finished, the document data shown in FIG. 2 is stored on the HDD 107. At this point little more than the image data and a date is attached, so processing is performed to add data such as layout data, text codes, a title, and a summary. This processing is executed by the configuration shown in FIG. 3.

FIG. 3 is a diagram explaining the operation of performing data processing on the document data of FIG. 2.

In FIG. 3, the page image 201 is document data read by the scanner unit 101 and compressed with JPEG. The HDD reading unit 202 reads the JPEG data 201 recorded on the HDD and sends the code data to the JPEG decompression unit 203. The JPEG decompression unit 203 decodes the received JPEG data, generates a YCbCr raster image, and sends it to the image binarization unit 204. The image binarization unit 204 extracts only the luminance component (Y component) from the received YCbCr raster image, binarizes it using a predetermined threshold, and outputs binary image data 205. From the binary image data 205, the area identification unit 206 generates layout data 207 by detecting the distribution of character bounding rectangles, merging bounding rectangles, and so on, as in ordinary character recognition.
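The binarization step can be illustrated with a short sketch. The source only says that the Y component is thresholded with a predetermined value; the threshold of 128, the convention that dark pixels map to 1, and the name `binarize_luminance` are assumptions made here for illustration.

```python
def binarize_luminance(ycbcr_rows, threshold=128):
    """Return a binary image: 1 for dark (ink) pixels, 0 for light pixels.

    ycbcr_rows: rows of (Y, Cb, Cr) tuples with components in 0..255
    threshold:  assumed fixed cut-off; the source only calls it "predetermined"
    """
    # Only the luminance component Y is used; Cb and Cr are ignored.
    return [[1 if y < threshold else 0 for (y, _cb, _cr) in row]
            for row in ycbcr_rows]


image = [
    [(250, 128, 128), (30, 128, 128)],
    [(10, 120, 130), (200, 128, 128)],
]
binary = binarize_luminance(image)
```

A fixed global threshold is the simplest choice; an adaptive threshold would be a different technique from the one described.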

For example, when the binary image data 205 is the image shown on the left side of FIG. 14, it is divided into regions by attribute, as shown on the right side of FIG. 14, by identifying connected black pixels and taking histograms in the horizontal and vertical directions. Known techniques can be used for this region division. Based on the region division result (for example, the right side of FIG. 14 or FIG. 15(A)), a tree-type data structure (for example, FIG. 15(B)) representing the arrangement of the nested regions is created and held as the layout data 207.
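One way to picture the tree-type layout data is a node per region with its nested regions as children. The sketch below is an assumption about a possible representation; the class `Region`, its fields, and the attribute labels are hypothetical and are not taken from FIG. 15.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    kind: str                      # assumed labels, e.g. "page", "text", "table"
    box: tuple                     # (left, top, right, bottom) in pixels
    children: list = field(default_factory=list)


def count_regions(region, kind):
    """Count regions of the given kind in the subtree rooted at region."""
    own = 1 if region.kind == kind else 0
    return own + sum(count_regions(child, kind) for child in region.children)


# A page holding one text block and one table that itself contains text.
page = Region("page", (0, 0, 1000, 1400), [
    Region("text", (50, 40, 950, 120)),
    Region("table", (50, 200, 950, 600), [
        Region("text", (60, 210, 400, 250)),
    ]),
])
text_regions = count_regions(page, "text")
```

Keeping the nesting explicit lets later steps tell text inside a table apart from body text.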

The character recognition unit 208 performs character recognition using the binary image 205 and the character area information contained in the layout data 207, and generates text code data 209. The HDD writing unit 210 writes the layout data 207 and the text code data 209 to the HDD 107, producing layout data 211 and text code data 212 on the HDD 107.

The above is the processing for the image of one page in one document. Performing this processing on all pages produces the document data structure shown in FIG. 4 on the HDD 107.

In FIG. 3, the HDD reading unit 202, the JPEG decompression unit 203, the image binarization unit 204, and the area identification unit 206 correspond to the image processing unit 106 in FIG. 1, and the character recognition unit 208 corresponds to the character recognition unit 110 in FIG. 1.

3. Generating title information
FIG. 5 is a block diagram explaining the operation of performing title addition processing on the document data of FIG. 4.

The document title is generated from the layout data and text code data of the first page. The document data 301 is the same as the document data of FIG. 4; it was captured from the scanner unit 101 and contains each page image, layout data, and text code data. From this document data, the HDD reading unit 302 reads the layout data 303 and the text code data 304 contained in the page data of the first page. The title generation unit 305 generates a title 306 from the layout data and the text code data. One way to generate the title 306 is to take one line of character code data from the text codes of the topmost character area and use it as the title. Another is to use the layout data and text code data to find the character area containing the character string written in the largest characters and use that as the title. The HDD writing unit 307 adds the generated title 306 to the document information on the HDD 107. As a result, document data 308 is generated on the HDD 107. In FIG. 5, the HDD reading unit 302, the title generation unit 305, and the HDD writing unit 307 correspond to the image processing unit 106 in FIG. 1.
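The two title heuristics mentioned above can be sketched as follows. The dictionary keys `top`, `char_height`, and `text` are hypothetical stand-ins for what the layout data and text code data would provide, and taking only the first line in the second heuristic is an assumption made here.

```python
def title_from_top_region(regions):
    """First line of the text region nearest the top of the page."""
    top = min(regions, key=lambda r: r["top"])
    return top["text"].splitlines()[0]


def title_from_largest_characters(regions):
    """First line of the text region written in the largest characters."""
    biggest = max(regions, key=lambda r: r["char_height"])
    return biggest["text"].splitlines()[0]


# Hypothetical character areas of a first page.
first_page = [
    {"top": 40, "char_height": 12, "text": "Internal memo\n2005-10-27"},
    {"top": 120, "char_height": 28, "text": "Quarterly Sales Report"},
    {"top": 300, "char_height": 10, "text": "Sales rose in every region.\nDetails follow."},
]
by_position = title_from_top_region(first_page)
by_size = title_from_largest_characters(first_page)
```

On this sample page the two heuristics disagree, which shows why the choice of method matters for pages with a small header line above the real title.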

4. Generating summary data
FIG. 6 is a block diagram schematically showing the operation of performing summary data addition processing on the document data generated by the processing of FIG. 5.

In FIG. 6, the summary data is generated from the text codes of all pages, excluding text codes that are not body text, such as characters inside tables.

The document data 401 is the same as the document data 308 of FIG. 5; it was captured from the scanner and contains each page image, layout data, and text code data. From this document data, the HDD reading unit 402 first reads the layout data 403 and the text code data 404 contained in the page data of each page. The text extraction unit 405 uses the layout data and the text code data to generate body text 406, from which non-body parts such as characters in tables and captions of figures and tables have been removed. The summary generation unit 407 generates summary data 408 from the body text it receives. The HDD writing unit 409 adds the generated summary data 408 to the document data on the HDD 107. As a result, document data 410 is generated on the HDD. In this way, the document data with search information shown in FIG. 7 can be generated on the HDD 107.
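The body-text extraction that feeds the summary generator can be sketched as a simple filter. The region labels in `NON_BODY_KINDS` and the flat list-of-regions layout are assumptions for illustration; the source does not specify how the summary itself is computed, so that step is left out.

```python
NON_BODY_KINDS = {"table", "caption"}  # assumed labels for non-body regions


def extract_body_text(pages):
    """Join the text of every region that is not a table or a caption.

    pages: list of pages, each a list of {"kind": ..., "text": ...} regions
    """
    parts = []
    for regions in pages:
        for region in regions:
            if region["kind"] not in NON_BODY_KINDS:
                parts.append(region["text"])
    return "\n".join(parts)


pages = [
    [
        {"kind": "text", "text": "Sales rose in every region."},
        {"kind": "table", "text": "Q1 100 Q2 120"},
        {"kind": "caption", "text": "Table 1: Sales"},
    ],
    [{"kind": "text", "text": "Costs were flat."}],
]
body = extract_body_text(pages)
```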

In FIG. 6, the HDD reading unit 402, the text extraction unit 405, the summary generation unit 407, and the HDD writing unit 409 correspond to the image processing unit 106 in FIG. 1.

5. Search and transmission function
FIG. 8 is a diagram explaining the operation of performing search processing on the document data of FIG. 7.

Assume that, as a result of the processing so far, a plurality of document data are stored on the HDD 107 and that "ABC" has been entered from the operation unit 103 as the search keyword.

In FIG. 8, the search unit 506 uses the search keyword and search conditions (for example, restricting the search to color images) entered from the operation unit to obtain the score of each document as follows.

The document data 501 is the search target and contains each page image, layout data, and text code data. From the document data 501, the HDD reading unit 502 first reads the layout data 503 and the text code data 504 contained in the page data of each page. The search unit 506 uses the layout data 503, the text code data 504, and the search query 505 entered from the operation unit to obtain a score 507 for each character area on each page. The search query 505 is generated from search information that includes the search keyword and search conditions (for example, restricting the search to color images) entered from the operation unit. Here, the score 507 is the number of hits for the search keyword. An example of the result is shown in FIG. 9.
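Scoring each character area by its number of keyword hits can be sketched as below. Counting non-overlapping, case-sensitive occurrences with `str.count` is an assumption; the source does not say how a hit is defined, and the sample texts are made up.

```python
def region_score(text, keyword):
    """Number of non-overlapping, case-sensitive occurrences of keyword."""
    return text.count(keyword)


def score_page(region_texts, keyword):
    """Score every character area of one page."""
    return [region_score(text, keyword) for text in region_texts]


# Recognized text of three character areas on one hypothetical page.
page = ["ABC overview", "no match here", "ABC and ABC again"]
scores = score_page(page, "ABC")
```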

This processing is performed for all documents to obtain the search score 507 of each document (reconstruction means). The result is shown in FIG. 10.

Next, the search results are collected into a single file and transmitted. The following describes transmission in PDF format, which allows text and image data to be placed in one file.

According to FIG. 10, the per-document score of document 1 is higher than that of document 5, so the information for document 1 is placed at the top of the search result page, followed by the information for document 5. Within document 1, region 3 of page 2 has the highest region score, so the partial image of this region is cut out of the page image and placed on the search page. For document 5, region 1 of page 2 is selected. The title of each document is also placed on the search page. For a search result like that of FIG. 10, a search page like that of FIG. 11 is generated. All pages of document 1 and document 5 are placed after the search page. Link information to the first page of each document is embedded in that document's title on the search page, and link information to the page containing each partial image is embedded in that partial image. The actual format of the transmitted file is shown in FIG. 12. Embedding the link information in this way lets the recipient open the file and follow the links to the relevant pages.
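The ordering and selection logic for the search page can be sketched as follows. Several points are assumptions: a document's score is taken to be the sum of its region scores, links are modeled as plain `(title, page)` tuples rather than PDF link annotations, and the scores in `docs` are invented for illustration and are not the values in FIG. 10.

```python
def document_score(doc):
    # Assumption: a document's score is the sum of all its region scores.
    return sum(s for regions in doc["pages"].values() for s in regions.values())


def best_region(doc):
    """Return (page_no, region_no) of the highest-scoring region."""
    candidates = [
        (score, page_no, region_no)
        for page_no, regions in doc["pages"].items()
        for region_no, score in regions.items()
    ]
    _score, page_no, region_no = max(candidates, key=lambda c: c[0])
    return page_no, region_no


def build_search_page(docs):
    """List hit documents by descending score with their best region."""
    hits = sorted((d for d in docs if document_score(d) > 0),
                  key=document_score, reverse=True)
    entries = []
    for doc in hits:
        page_no, region_no = best_region(doc)
        entries.append({
            "title": doc["title"],
            "title_link": (doc["title"], 1),          # link to the first page
            "snippet": (page_no, region_no),          # region image to cut out
            "snippet_link": (doc["title"], page_no),  # link to the source page
        })
    return entries


# pages: {page_no: {region_no: hit count}}; all numbers are made up.
docs = [
    {"title": "Document 5", "pages": {1: {1: 0}, 2: {1: 2, 2: 1}}},
    {"title": "Document 1", "pages": {1: {1: 1, 2: 0}, 2: {1: 0, 2: 1, 3: 4}}},
    {"title": "Document 3", "pages": {1: {1: 0}}},
]
entries = build_search_page(docs)
```

With this sample data, document 1 is listed first with region 3 of page 2, then document 5 with region 1 of page 2, and document 3 is omitted because it has no hits.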

The PDF file generated in this way is transmitted to a file server or the like using the transmission/reception unit 109 in FIG. 1.

In the above embodiment, all pages of the documents hit by the search are transmitted. When the MFP also operates as a Web server, however, a URL giving access to each hit document may be attached instead of all of its pages. An example of the transmitted file in this case is shown in FIG. 13.

According to the present embodiment, each of a plurality of documents is divided by page and by region within each page, and the resulting regions are searched based on a predetermined search keyword. Based on the search result, the region with the highest score, where the score is the number of search hits for the predetermined keyword, is selected together with its page and its document.

Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a storage medium (or recording medium) on which the program code of software realizing the functions of the above embodiment is recorded, and having a computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the storage medium.

In this case, the program code read from the storage medium itself realizes the functions of the above embodiment, and the storage medium storing that program code constitutes the present invention.

Needless to say, the invention covers not only the case in which the functions of the above embodiment are realized by the computer executing the program code it has read, but also the case in which an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiment are realized by that processing.

Needless to say, the invention further covers the case in which the program code read from the storage medium is written to a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiment are realized by that processing.

The program need only be able to realize the functions of the above embodiment on a computer, and may take the form of object code, a program executed by an interpreter, script data supplied to an OS, or the like.

The recording medium that supplies the program may be anything capable of storing the program, for example RAM, NV-RAM, a floppy (registered trademark) disk, an optical disk, a magneto-optical disk, CD-ROM, MO, CD-R, CD-RW, DVD (DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tape, a non-volatile memory card, or another ROM. Alternatively, the program may be supplied by downloading it from another computer, database, or the like (not shown) connected to the Internet, a commercial network, a local area network, or the like.

FIG. 1 is a block diagram schematically showing the configuration of a multifunction printer (MFP) including an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the document data stored on the HDD in FIG. 1.
FIG. 3 is a diagram explaining the operation of performing data processing on the document data of FIG. 2.
FIG. 4 is a diagram explaining the document data generated by the data processing of FIG. 3.
FIG. 5 is a block diagram explaining the operation of performing title addition processing on the document data of FIG. 4.
FIG. 6 is a block diagram schematically showing the operation of performing summary data addition processing on the document data of FIG. 5.
FIG. 7 is a diagram explaining the document data generated by the summary data addition processing of FIG. 6.
FIG. 8 is a diagram explaining the operation of performing search processing on the document data of FIG. 7.
FIG. 9 is a diagram showing an example of the result of the search processing of FIG. 8.
FIG. 10 is a diagram showing an example of the result of the search processing of FIG. 8 when all documents are targeted.
FIG. 11 is a diagram explaining the search page produced by the search processing of FIG. 8.
FIG. 12 is a diagram explaining the transmission file format produced by the search processing of FIG. 8.
FIG. 13 is a diagram explaining the transmission file format produced by the search processing of FIG. 8 when an accessible URL is attached.
FIG. 14 is a diagram showing an example of region division processing.
FIG. 15 is a diagram explaining layout data based on a region division result.

Explanation of symbols

101 Scanner unit
102 Printer unit
103 Operation unit
104 Modem
105 Network controller unit
106 Image processing unit
107 Storage unit
108 Memory
109 Transmission/reception unit
110 Character recognition unit
111 Character recognition dictionary
112 Search unit
113 Search dictionary

Claims

An image processing apparatus that retrieves predetermined data from a plurality of data stored in a data storage unit, comprising:
search means for dividing each of the plurality of data by page and by partial area of each page and then searching the plurality of partial areas based on predetermined search information; and reconstruction means for reconstructing the plurality of data based on the search result.

The image processing apparatus according to claim 1, wherein the search information includes a search keyword.

The image processing apparatus according to claim 2, wherein the reconstruction means selects the partial area having the largest number of search hits for the predetermined search keyword, together with its page and its data.

The image processing apparatus according to claim 2, wherein the reconstruction means selects the page having the largest number of search hits for the predetermined search keyword, together with its data.

The image processing apparatus according to claim 2, wherein the reconstruction means selects the data having the largest number of search hits for the search keyword.

The image processing apparatus according to claim 1, wherein the data is document data.

An image processing method for retrieving predetermined data from a plurality of data stored in a data storage unit, comprising:
a search step of dividing each of the plurality of data by page and by partial area of each page and then searching the plurality of partial areas based on predetermined search information; and a reconstruction step of reconstructing the plurality of data based on the search result.

The image processing method according to claim 7, wherein the search information includes a search keyword.

The image processing method according to claim 7, wherein the reconstruction step selects the partial area having the largest number of search hits for the search keyword, together with its page and its data.

The image processing method according to claim 7, wherein the reconstruction step selects the page having the largest number of search hits for the predetermined search keyword, together with its data.

The image processing method according to claim 7, wherein the reconstruction step selects the data having the largest number of search hits for the predetermined search keyword.

The image processing method according to claim 7, wherein the data is document data.

An image processing program for retrieving predetermined data from a plurality of data stored in a data storage unit, the program causing a computer to execute:
a search module that divides each of the plurality of data by page and by partial area of each page and then searches the plurality of partial areas based on predetermined search information; and a reconstruction module that reconstructs the plurality of data based on the search result.

A computer-readable storage medium storing the program according to claim 11.