JP2008129793A - Document processing system, apparatus and method, and recording medium with program recorded thereon

Info

Publication number: JP2008129793A
Application number: JP2006313148A
Authority: JP
Inventors: Tomomi Takada; 智美高田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2006-11-20
Filing date: 2006-11-20
Publication date: 2008-06-05

Abstract

PROBLEM TO BE SOLVED: To enable highly accurate retrieval of a table in a document, or of part of a table, by extracting, when some items in the table are emphasized, text that contains a linguistic expression similar to the row or column element name of the emphasized portion, and using the extracted text for retrieval.
SOLUTION: The element names of the rows and columns of a table are extracted, the emphasized items in the table are extracted, and metadata related to the table is extracted on the basis of the row or column element names associated with those items.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a system and a method that make it possible to search for and use desired images and tables contained in a document by extracting the images and tables contained in the document together with metadata related to them, and to a recording medium on which a program for this purpose is recorded.

With the spread of electronic documents, demand for making effective use of them is growing. Electronic documents are generally created and used once, then accumulated and stored, and later edited and processed into new documents, so that they are reused to reduce the cost of document creation. Similarly, there is a demand to capture printed documents into a computer and reuse their contents. To reuse printed and electronic documents efficiently, search technology for finding the necessary information in a large volume of documents is important. Documents contain not only character information but also image information such as figures, tables, and photographs, as well as audio information, and such information is considered to be used particularly often. Character information in a document can be searched easily by matching a specified search term against it. Image information and the like, however, carries no character information of its own, so techniques have been proposed for attaching search metadata to it.

Japanese Patent Laid-Open No. 11-025113 divides a document containing images into image regions and character regions, and extracts text describing the content of an image from the character regions.

For example, as text describing the content of an image, it extracts from the character regions a caption and text containing the image number word that appears in the caption, and associates them with the image closest to the caption.

This technique associates text with an image mainly by the distance, in pixels, between regions, but spatial distance alone cannot establish whether text and an image are related. It also associates text with an image through image-related words (image number words and image reference words), but the mere presence of such a word cannot establish that relationship either.

The following technique, for example, has been proposed for analyzing a document image and extracting and structuring its logical structure.

Japanese Patent Laid-Open No. 11-250041 extracts regions such as text, images, and separators, together with their layout structure, from a page image obtained by scanning a printed document, and further extracts logical objects such as the title, headers, and body text from the text regions. It also determines, for each logical object in a page, the reading order and the relationships with other logical objects, and extracts the logical structure page by page.

However, this technique only extracts the logical structure of the document; it does not focus on the image regions contained in the document or relate them to text regions.
Japanese Patent Laid-Open No. 11-025113; Japanese Patent Laid-Open No. 11-250041; Japanese Patent Laid-Open No. 9-44594; Japanese Patent Laid-Open No. 11-073474; Japanese Patent Laid-Open No. 6-96275

To search images, tables, and the like in a document more accurately, how information about them is extracted is important. Conventionally, however, there has been only a general method that starts from the single caption attached to an image or table and extracts the caption text itself and text from the nearby body as search metadata; no metadata extraction method has taken into account the structure of images and tables or the relationships among them.

The present invention has been made in view of the above problem, and aims to enable more accurate searching for images and tables by capturing the relationships among images, tables, and body text in a document more accurately. In particular, when some items in a table are emphasized by coloring or decorating their background or characters, the invention aims to search the nearby body text for text containing linguistic expressions similar to the row and column element names of the emphasized portion, extract that text as metadata, and use it at search time, so that a table or part of a table in a document can be searched for accurately and easily.

A further aim of the invention, made in view of the same problem, is as follows. When text containing linguistic expressions similar to the row and column element names of a table is found in the nearby body text and extracted as metadata, text containing expressions similar to the row and column element names of an emphasized portion is given a higher importance, and this importance is used at search time, so that a table or part of a table in a document can be searched for accurately and easily.
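The weighting idea above can be illustrated with a minimal Python sketch. It is not the method of the specification: the names `Metadata` and `extract_weighted_metadata`, the two weight values, and the use of a case-insensitive substring match as the "similar linguistic expression" test are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    text: str      # sentence taken from the nearby body text
    weight: float  # importance used at search time (hypothetical scale)


def extract_weighted_metadata(sentences, header_names, emphasized_names,
                              base_weight=1.0, emphasized_weight=2.0):
    """Keep sentences that mention a row/column element name.

    Assumption: "similar expression" is approximated by a case-insensitive
    substring match. Sentences that mention the element name of an
    emphasized item receive the higher weight.
    """
    results = []
    for sentence in sentences:
        lowered = sentence.lower()
        hits = [name for name in header_names if name.lower() in lowered]
        if not hits:
            continue  # sentence does not refer to the table
        boosted = any(name in emphasized_names for name in hits)
        weight = emphasized_weight if boosted else base_weight
        results.append(Metadata(sentence, weight))
    return results
```

A real implementation would likely replace the substring test with morphological analysis or a thesaurus lookup; the sketch only shows where the emphasis information enters the weighting.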

The invention is a document processing system that makes it possible to search for and use desired images and tables in a document by extracting the images and tables in the document and the metadata related to them. The system is characterized by comprising: means for analyzing in advance the layout and other properties of each document to be searched, so that components of the document such as images and tables can be searched, and for extracting characterizing information for each component, including the images, the tables, and the items of the tables; metadata extraction means for extracting metadata about a table contained in the document on the basis of the feature information of the items of that table; metadata storage means for storing the table contained in the document in association with the metadata extracted by the metadata extraction means; search condition input means for inputting a search condition for searching for a table contained in the document; search means for searching the metadata associated with the table on the basis of the input search condition; and search result display means for displaying, as a search result, the table or the part of the table in the document found by the search means.
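As a rough illustration of how the storage and search means could fit together, the following Python sketch keeps metadata in memory and ranks tables by summed weight. The class name `MetadataStore`, the table identifiers, and the substring-based matching are hypothetical; the specification does not prescribe a data structure or a scoring rule.

```python
from collections import defaultdict


class MetadataStore:
    """In-memory stand-in for the metadata storage and search means."""

    def __init__(self):
        # table_id -> list of (metadata text, weight); layout is an assumption
        self._entries = defaultdict(list)

    def add(self, table_id, text, weight=1.0):
        """Associate one piece of metadata text with a table."""
        self._entries[table_id].append((text, weight))

    def search(self, query):
        """Return (table_id, score) pairs, highest score first.

        Assumption: a table matches when the query occurs in any of its
        metadata texts, and its score is the sum of the matching weights.
        """
        needle = query.lower()
        scores = {}
        for table_id, items in self._entries.items():
            score = sum(w for text, w in items if needle in text.lower())
            if score > 0:
                scores[table_id] = score
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The returned list is what a result display component would render; ties are broken by identifier only to keep the output deterministic.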

As described above, the present invention makes it possible to extract metadata related to the images and tables contained in a document image with high accuracy.

By using the extracted metadata, the images and tables contained in a document image can be searched for accurately and easily in response to a user's request.

As a result, the images and tables in a document can be reused efficiently.

Furthermore, by using the extracted metadata, documents and the images and tables in them can be classified, organized, and managed efficiently when they are stored.

In this embodiment, at document registration time the document image is analyzed to extract regions such as text, figures, and tables together with their layout; from these, a logical structure is extracted that organizes the logical attributes of the regions and the logical relationships among them; and search metadata is extracted on the basis of that structure. Analyzing the logical structure of the document is not strictly necessary, however: any analysis may be used as long as it yields the information needed to extract the search metadata.

Preferred embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the basic configuration of a computer apparatus on which a document processing system according to an embodiment of the present invention is built.

In FIG. 1, reference numeral 101 denotes a CPU, which executes the various controls of the document processing apparatus of this embodiment. Its operation is directed by programs in the ROM 102 and the RAM 103 described below. Multiple computer programs can run in parallel by means of functions of the CPU itself or mechanisms of the computer programs. Reference numeral 102 denotes a ROM, which holds a computer program area storing the control procedures executed by the CPU, and a data area. Reference numeral 103 denotes a RAM, which stores control programs to be processed by the CPU 101 and provides a work area used when the CPU 101 executes various controls. Reference numeral 104 denotes a keyboard with character and symbol input keys such as alphabet, hiragana, katakana, and punctuation keys, and various function keys such as cursor keys for moving the cursor; it provides the environment for the user's input operations. A pointing device such as a mouse may also be included. Reference numeral 105 denotes the address bus, data bus, and other buses connecting the components. Reference numeral 106 denotes an external storage device for storing various data, made up of recording media such as a hard disk, flexible disk, optical disk, magnetic disk, magneto-optical disk, magnetic tape, or nonvolatile memory card, and drives that drive the media and record information. Stored computer programs and data are loaded into the RAM in whole or in part when needed, in response to instructions from the keyboard or from computer programs.

Reference numeral 107 denotes a display device, such as a monitor, which shows the user the state of the various input operations.

Reference numeral 108 denotes a communication device for communicating with other computer apparatuses. Through a network (LAN) or the like it communicates with remotely located apparatuses (not shown), which makes it possible to share the programs and data of this embodiment.

Reference numeral 109 denotes an image scanner for reading images. It optically reads the loaded paper originals one sheet at a time and converts the image signal into a digital signal sequence. The image data read can be stored in the external storage device, the RAM, or the like.

Storage media such as the ROM and the RAM hold the computer programs and data that implement the data management of this embodiment, and the functions are realized when the CPU reads and executes the program code stored on these media; the type of storage medium does not matter. An external storage device holding the programs and data of the present invention may also be supplied to the system or apparatus, and the programs may be copied from it onto a rewritable storage medium such as the RAM. Any external storage device may be used for this, such as a flexible disk, CD-ROM, hard disk, memory card, or magneto-optical disk.

Reference numeral 104 is a keyboard and mouse providing the environment for the user's input operations, but anything that provides such an environment may be used, such as a touch panel or stylus pen.

The communication means of 108 may be anything, such as RS232C, USB, IEEE 1394, P1284, SCSI, a modem, or other wired communication, or wireless communication such as infrared or IEEE 802.11b. Likewise, any device with communication means may be connected to the apparatus of the present invention.

Image data may also be input not from the image scanner 109 but through an input device (not shown), such as a network scanner or copier, connected via the communication device 108. The image data read may likewise be stored not in the local external storage device or RAM but in an external storage device of a server, copier, or the like connected via the communication device.

The configuration described above is one example for this embodiment, and the present invention is not limited to it.

The operation at document registration in this embodiment is now described in detail. FIG. 3 is a flowchart showing an example of the flow of the document registration process. The processing program is stored in the ROM 102 and executed by the CPU 101.

The description of this embodiment takes as its example a paper document read with an input device such as the image scanner 109. The document need not be a paper document, however: it may be a document created with editing software such as a word processor, a document written in HTML or the like, or an electronic document in a format such as PDF. For an electronic document, the input processing of step S301 requires processing such as format conversion. The details are described below with reference to FIG. 3.

In step S301, an input device such as the image scanner 109 or a networked copier reads a document of one or more pages containing a mixture of images and character information, and a digitized document image is obtained for each page. Document images supplied by the input device may be binary images, color images, and so on. After the digitized document image is obtained, noise removal, orientation and skew correction, and similar processing may be applied to each page image. Methods for determining and correcting the orientation and skew of a page image include those disclosed in Japanese Patent Laid-Open No. 9-44594 and Japanese Patent Laid-Open No. 11-073474, but any method may be used.

In step S302, the document image input in step S301 is analyzed. The processing of step S302 is described later with reference to FIG. 4.

Next, in step S303, the logical structure of the document is extracted by an analysis based on the various information about each region, the layout extraction result, the characteristics of the character information contained in the character regions, and so on. The logical structure consists of the logical attributes extracted for and assigned to the regions and pages obtained in step S302, together with the logical relationships among them, estimated and organized into a structure. The processing of step S303 is described later with reference to FIG. 5.

Next, in step S304, search metadata related to images and tables is extracted from the character regions of the page images on the basis of the logical structure extracted in step S303. The processing of step S304 is described later with reference to FIG. 6.

The document registration process described above with reference to FIG. 3 is one example of processing in the present invention. Many variations are conceivable, and the order and content of the processing need not be as described.

FIG. 4 is a flowchart showing an example of the processing of step S302 in FIG. 3, that is, an example of the document image analysis in the present invention. The processing program is stored in the ROM 102 and executed by the CPU 101.

If the document input in step S301 of FIG. 3 already holds its text as character codes, as an electronic document does, the character recognition of step S402 is unnecessary. The details are described below with reference to FIG. 4.

In step S401, each page image of the document read in step S301 of FIG. 3 is divided into regions: rectangular regions enclosing text, figures, tables, photographs, and other images are extracted together with physical information such as the type and size of each rectangle and its position coordinates within the page. For character regions, the reading direction and character size of the character strings are detected, and text lines and characters are extracted on that basis. Here, an area in which the character strings run in the same direction and the character size, character spacing, and line spacing are nearly uniform is extracted as a single character region. For non-character regions, photographs, tables, frames, lines, and so on are detected and extracted as regions. Physical information based on the features of a region may also be extracted at this point. For a character region, for example, information on character decoration and character style may be extracted as well; for a non-character region, information obtained by analyzing image features may be extracted. If the input document image is multi-valued, such as a color image, the same region division can be applied after converting it to binary. The region division in this step can use, for example, the method disclosed in Japanese Patent Laid-Open No. 6-96275, but it is not limited to that method; any method that can extract regions such as text and images may be used.
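The uniformity criterion for character regions in step S401 can be sketched as follows. This is an illustrative simplification, not the cited region division method: it assumes text lines have already been detected, handles only a single column of lines, and the names `TextLine` and `group_lines` and both tolerance parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    direction: str    # "horizontal" or "vertical"
    char_size: float  # average character height in pixels
    top: float        # y coordinate of the upper edge (y grows downward)
    bottom: float     # y coordinate of the lower edge


def group_lines(lines, size_tol=0.15, max_gap_ratio=1.0):
    """Group text lines into character regions.

    A line joins the current region when it runs in the same direction as
    the previous line, has a similar character size, and follows it closely.
    The tolerances are assumed values, not taken from the specification.
    """
    regions = []
    for line in sorted(lines, key=lambda l: l.top):
        if regions:
            prev = regions[-1][-1]
            same_dir = line.direction == prev.direction
            similar = abs(line.char_size - prev.char_size) <= size_tol * prev.char_size
            close = (line.top - prev.bottom) <= max_gap_ratio * prev.char_size
            if same_dir and similar and close:
                regions[-1].append(line)
                continue
        regions.append([line])  # start a new character region
    return regions
```

With this rule a large title line and the body lines below it end up in separate regions, which is what later steps rely on when they look for distinctive character regions.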

FIG. 9 shows an example of the various physical information about the page images of a document and the regions extracted from each page image.

Next, in step S402, character recognition is performed on all the character regions extracted in step S401.

Next, in step S403, the layout of each page of the document is extracted: the spatial relationships among the rectangular regions in each page image are derived from the physical information of those regions. For two regions in a page image, for example, the spatial relationship may include the direction (above, below, left, or right) in which each lies relative to the other, states such as overlapping, touching, or containment, and their relative sizes, all computed and determined from the position coordinates and sizes of the rectangular regions. The results of this analysis can be represented per page as a tree structure or a network structure. The relationships between rectangular regions and the representation given here are only one example for this embodiment; other relationships may be extracted, and the results may be represented in other ways. For example, the position and size of each rectangular region relative to the whole page may be extracted as the layout.
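The spatial relationships listed for step S403 can be computed directly from rectangle coordinates. The sketch below is one possible formulation under stated assumptions: image coordinates with y growing downward, a hypothetical `Rect` type, and an arbitrary precedence (containment, then overlap, then touching, then direction, with vertical direction checked before horizontal).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float  # left edge
    y: float  # top edge (y grows downward, as in image coordinates)
    w: float
    h: float

    @property
    def right(self):
        return self.x + self.w

    @property
    def bottom(self):
        return self.y + self.h


def _contains(outer, inner):
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def relation(a, b):
    """Describe where rectangle b lies relative to rectangle a."""
    if _contains(a, b):
        return "contains"   # a contains b
    if _contains(b, a):
        return "inside"     # a is inside b
    if a.x < b.right and b.x < a.right and a.y < b.bottom and b.y < a.bottom:
        return "overlaps"
    if a.x <= b.right and b.x <= a.right and a.y <= b.bottom and b.y <= a.bottom:
        return "touches"    # edges or corners meet without overlapping
    if b.y > a.bottom:
        return "below"
    if a.y > b.bottom:
        return "above"
    return "right" if b.x > a.right else "left"
```

Applying `relation` to every pair of regions on a page yields the edges of the network structure mentioned above; keeping only the "contains" edges gives a tree.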

FIG. 10 shows an example of the result of extracting the spatial relationships among the regions in a page image.

The results of steps S401 to S404 are all stored on a storage medium such as the RAM or the external storage device.

The processing of steps S401 to S403 is performed on every page image of the input document.

The document image analysis described above with reference to FIG. 4 is one example of processing in the present invention. Many variations are conceivable, and the order and content of the processing need not be as described.

FIG. 5 is a flowchart showing an example of the processing of step S303 in FIG. 3, that is, an example of the logical structure analysis in the present invention. The processing program is stored in the ROM 102 and executed by the CPU 101. The details are described below with reference to FIG. 5.

In step S501, the reading order of pages and regions is determined. The reading order can be determined from the reading direction and positional relationships of the character regions, from the typesetting direction inferred from the reading direction of the main character regions of the document, and so on. On a horizontally typeset Japanese page, for example, character regions are ordered from top to bottom within a column, and columns are ordered from left to right. This is one example of a determination method, and there are various others.
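For the horizontally typeset example, the ordering rule can be sketched in a few lines of Python. The function name `reading_order`, the tuple layout, and the column-detection heuristic (regions whose horizontal extents overlap belong to the same column) are assumptions for illustration; a region spanning several columns, such as a full-width title, would defeat this simple heuristic.

```python
def reading_order(regions):
    """Order character regions on a horizontally typeset page.

    regions: list of (name, x, y, w, h) tuples, y growing downward.
    Regions are grouped into columns from left to right, then read from
    top to bottom within each column.
    """
    columns = []  # each column: {"right": right edge so far, "items": [...]}
    for region in sorted(regions, key=lambda r: r[1]):
        _, x, _, w, _ = region
        if columns and x < columns[-1]["right"]:
            # horizontally overlaps the current column: same column
            columns[-1]["items"].append(region)
            columns[-1]["right"] = max(columns[-1]["right"], x + w)
        else:
            columns.append({"right": x + w, "items": [region]})
    ordered = []
    for column in columns:
        ordered.extend(r[0] for r in sorted(column["items"], key=lambda r: r[2]))
    return ordered
```

For vertically typeset pages the same idea applies with the axes swapped and columns taken from right to left.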

The reading order determination of step S501 is not always required and may be omitted depending on the type of document.

In step S502, the logical attributes of pages and regions and the relationships between regions are analyzed.

For example, a page at the start of the document that has more white space than other pages and contains a character region whose characters are distinctive compared with the rest of the document can be presumed to be the cover page.

Likewise, a character region that shares little commonality or regularity of placement and characters with the other character regions of the document, and whose characters are distinctive, can be presumed to be the title if it lies at the top of the document (excluding any header), and a heading if it appears elsewhere. The character region closest to an image region such as a photograph, figure, or table can be presumed to be the caption for that image region and linked to it.
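The caption heuristic (the character region closest to an image region) can be sketched as below. The function names and the choice of edge-to-edge distance are assumptions; the specification does not state which distance measure is used.

```python
import math


def gap_distance(a, b):
    """Shortest distance between two (x, y, w, h) rectangles.

    Returns 0 when the rectangles touch or overlap.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return math.hypot(dx, dy)


def guess_caption(image_rect, text_regions):
    """Pick the character region closest to an image region.

    text_regions: dict mapping a region name to its (x, y, w, h) rectangle.
    Returns (name, distance), or None when there are no character regions.
    """
    if not text_regions:
        return None
    name = min(text_regions, key=lambda n: gap_distance(image_rect, text_regions[n]))
    return name, gap_distance(image_rect, text_regions[name])
```

The distance is returned alongside the name so that a caller could convert it into a likelihood or strength value rather than treating the nearest region as a certain caption.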

These logical attributes and logical relationships can be estimated for distinctive pages and regions; they cannot necessarily be assigned to every page and region. Nor can a single logical attribute or relationship always be determined for a page or region, so several candidate attributes or relationships may be assigned, each optionally with a numerical value indicating its likelihood.

A logical relationship may also be given a numerical value indicating its strength.

The logical attributes, relationships, and analysis rules given here are examples, and there are many others.

Next, in step S503, it is determined whether the region being analyzed is a table region. If it is, the process proceeds to step S504.

In step S504, the table region is analyzed using the physical information of the regions and the spatial relationships among them, and attributes describing the structure and features of the table are extracted. Here, the row and column element names are extracted, along with attributes indicating whether the region or the item value of each item is emphasized. The row and column element names can be identified from, for example, their position in the table, the types of ruled lines, and linguistic features. For a region representing a table element, the presence of emphasis can be judged from whether the region contains an image region representing background coloring, decoration, or the like. For an item value, emphasis is judged from whether its character size, character color, character style, or the like differs from that of the other item values in the table.
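The item-value test in step S504 (a value is emphasized when its character size, color, or style differs from the other values) can be approximated by comparing each cell with the most common style in the table. This is a sketch under assumptions: the style is reduced to a hashable tuple, only data cells (not the element-name cells) are passed in, and the function name `emphasized_cells` is hypothetical.

```python
from collections import Counter


def emphasized_cells(cell_styles):
    """Return the positions of cells whose style differs from the majority.

    cell_styles: dict mapping (row, col) to a style tuple such as
    (char_size, color, bold). Treating the most frequent style as the
    table's normal style is an assumption of this sketch.
    """
    if not cell_styles:
        return set()
    normal_style, _ = Counter(cell_styles.values()).most_common(1)[0]
    return {pos for pos, style in cell_styles.items() if style != normal_style}
```

For a table in which one value is set in bold red and the rest in plain black, the function returns just that cell's position, which can then be mapped to its row and column element names.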

In the example of FIG. 8, the item values in the first row and first column of table 806 are the row and column element names of the table. The background of the element at row 3, column 5 is a halftone pattern, indicating that this element is emphasized over the others. In the example of FIGS. 9 and 10, region 6-2, the table element at row 3, column 5, contains image region 6-2-1, whose image type is texture, and region 6-2-1 in turn contains character region 6-2-2. The other table elements contain a character region but no such image region. From this it can be presumed that table element region 6-2 is emphasized over the other table element regions.
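The containment pattern in this example (a table element containing a texture image region that itself contains a character region) can be checked on a small region tree. The `Region` type and its field names are hypothetical; only the region identifiers 6-2, 6-2-1, and 6-2-2 and the image type "texture" come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    region_id: str
    kind: str                  # "cell", "image", or "text" (assumed labels)
    image_type: str = ""       # e.g. "texture" for a background pattern
    children: list = field(default_factory=list)


def is_emphasized(cell):
    """True when the cell holds a texture image region wrapping its text."""
    return any(
        child.kind == "image"
        and child.image_type == "texture"
        and any(grandchild.kind == "text" for grandchild in child.children)
        for child in cell.children
    )


# Region 6-2 from the example: cell -> texture image 6-2-1 -> text 6-2-2.
text_6_2_2 = Region("6-2-2", "text")
image_6_2_1 = Region("6-2-1", "image", image_type="texture", children=[text_6_2_2])
cell_6_2 = Region("6-2", "cell", children=[image_6_2_1])
```

A cell whose only child is a character region does not match the pattern and is treated as not emphasized.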

ステップＳ５０１からステップＳ５０４の処理結果は、全てＲＡＭや外部記憶装置等の記憶媒体に格納される。 The processing results from step S501 to step S504 are all stored in a storage medium such as a RAM or an external storage device.

図１１は、文書の論理構造を解析した結果の一例を示している。 FIG. 11 shows an example of the result of analyzing the logical structure of a document.

以上の図５を用いて説明した論理構造の解析処理は、本発明における処理の一例であり、他にもいろいろなものが考えられ、処理の順や処理内容は、このとおりでなくてもよい。例えば、論理構造として抽出する内容は、ページや領域の意味属性でなくてもよいし、他の関係を抽出してもよい。 The logical structure analysis process described above with reference to FIG. 5 is one example of the processing in the present invention; various other approaches are conceivable, and the order and content of the processing need not be as described. For example, the content extracted as the logical structure need not be semantic attributes of pages or regions, and other relationships may be extracted.

図６は、図３のステップＳ３０４の処理の一例を示したフローチャートであり、本実施例における検索用メタデータ抽出処理について示している。この処理プログラムは、ＲＯＭ１０２に格納されており、ＣＰＵ１０１によって実行される。以下、図６を用いて詳細に説明する。 FIG. 6 is a flowchart showing an example of the process in step S304 of FIG. 3, namely the search metadata extraction process of this embodiment. This processing program is stored in the ROM 102 and executed by the CPU 101. The process is described in detail below with reference to FIG. 6.

まず、ステップＳ６０１では、文書の論理構造解析結果から、検索対象となる領域を抽出する。ここでは、写真、図、表等の領域を検索対象とするが、それ以外の領域を検索対象として抽出してもよい。 First, in step S601, a search target area is extracted from the logical structure analysis result of the document. Here, areas such as photographs, diagrams, and tables are set as search targets, but other areas may be extracted as search targets.

次に、ステップＳ６０２では、ステップＳ６０１で抽出された領域に対する検索用メタデータとなるテキスト情報を抽出する。まず、メタデータの抽出対象となる領域を取得する。ここでは、検索対象である画像領域と論理的関係が結ばれている文字領域や、更にその文字領域と論理的関係が結ばれている文字領域を取得するが、他の領域をメタデータの抽出対象としてもよい。次に、取得した文字領域のテキストの中から、画像領域について説明しているテキスト部分を抽出する。 Next, in step S602, text information that will serve as search metadata for the region extracted in step S601 is extracted. First, the regions from which metadata is to be extracted are acquired. Here, character regions that have a logical relationship with the image region being searched, and character regions that in turn have a logical relationship with those character regions, are acquired, but other regions may also be used as metadata extraction sources. Next, the portions of text that describe the image region are extracted from the text of the acquired character regions.

例えば、論理属性が「キャプション」である文字領域のテキストから画像番号文字列（「表１」等）と画像名文字列（「製品の仕様比較」等）を取り出し、それぞれをメタデータとする。また、検索対象である画像領域の周辺にあり、論理属性が「段落」である文字領域から、画像番号文字列を含む文を抽出してもよい。また、「上（の）表」等のような画像の方向を示す語と画像を示す語を含む文を抽出し、その語が示す画像の方向とステップＳ４０４で抽出されたページ内での各領域の空間的な関係を照合して、画像とメタデータを関連つけたり、「次（の）ページの図」のようなページの位置を示す語と画像を示す語を含む文を抽出し、その語が示すページ位置と文書のページ構成を照合して、画像とメタデータを関連つけたりすることもできる。 For example, an image number string (such as “Table 1”) and an image name string (such as “product specification comparison”) are taken from the text of a character region whose logical attribute is “caption”, and each is used as metadata. A sentence containing the image number string may also be extracted from a character region that lies near the image region being searched and whose logical attribute is “paragraph”. It is also possible to extract a sentence containing a word indicating the direction of an image together with a word denoting an image, such as “the table above”, and to associate the image with the metadata by matching the indicated direction against the spatial relationships of the regions within the page extracted in step S404. Similarly, a sentence containing a word indicating a page position together with a word denoting an image, such as “the figure on the next page”, can be extracted, and the image can be associated with the metadata by matching the indicated page position against the page structure of the document.
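As a rough illustration of the caption handling in step S602, the following sketch splits a caption into an image number string and an image name string. The regular expression, the set of accepted labels, and the function name `parse_caption` are assumptions for this example; the description does not specify how the strings are recognized.

```python
import re

# Label (Japanese or English), a number, optional separators, then the image name.
# In Python 3 str patterns, \d also matches full-width digits such as "１".
CAPTION_RE = re.compile(
    r"^\s*((?:表|図|写真|Table|Figure|Fig\.?)\s*\d+)[\s:：.．]*(.*)$"
)


def parse_caption(text):
    """Return the image number and image name found in a caption, or None."""
    m = CAPTION_RE.match(text)
    if m is None:
        return None
    return {"image_number": m.group(1), "image_name": m.group(2).strip()}


print(parse_caption("表１ 製品の仕様比較"))
print(parse_caption("Table 1: Product specification comparison"))
```

Both returned strings would then be stored as separate metadata entries for the table that the caption is linked to.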

次に、ステップＳ６０３では、ステップＳ６０１で検索対象として抽出されたオブジェクトが表領域であるか否かを判別する。ここで、表領域の場合は、ステップＳ６０４に進む。 Next, in step S603, it is determined whether or not the object extracted as a search target in step S601 is a table area. Here, in the case of a table area, the process proceeds to step S604.

ステップＳ６０４は、ステップＳ６０１で検索対象として抽出された表領域について、行と列の要素名に対して重みを決定する処理を行う。重み付けは、例えば、表の中の項目値を示す領域の背景や文字の強調の有無、等によって行うことができる。また、要素名に含まれる各単語について、要素名に含まれる語の品詞や語義等の情報、等を用いて重み付けを行ってもよいし、要素名に含まれる各単語の関連語について重みを付与してもよい。また、他の情報を用いて重みを決定してもよい。 In step S604, a process is performed for determining the weights for the element names of the rows and columns for the table area extracted as the search target in step S601. The weighting can be performed by, for example, the background of the area indicating the item value in the table, the presence or absence of character emphasis, and the like. Further, each word included in the element name may be weighted using information such as the part of speech or meaning of the word included in the element name, or the related word of each word included in the element name may be weighted. It may be given. Moreover, you may determine a weight using other information.

例えば、図８の例では、３行×５列目の表要素を示す領域６の２は、表内で強調されているので、表の３行目の要素名である「Ｂ」と５列目の要素名である「出力速度」が、他の要素名よりも重みが高く設定されることになる。 For example, in FIG. 8, region 6-2, which represents the table element at row 3, column 5, is emphasized within the table, so “B”, the element name of the third row, and “output speed”, the element name of the fifth column, are given higher weights than the other element names.
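The weighting of step S604 might look like the sketch below. The numeric weights (1.0 and 1.5), the function name, and the row and column positions of “A” and “サイズ” are assumptions; the description only states that element names of an emphasized item receive higher weights than the others.

```python
def element_name_weights(row_names, col_names, emphasized_cells, base=1.0, boost=1.5):
    """Give every element name a base weight, then raise the names that
    label the row and column of each emphasized cell."""
    weights = {name: base for name in list(row_names.values()) + list(col_names.values())}
    for row, col in emphasized_cells:
        weights[row_names[row]] = boost
        weights[col_names[col]] = boost
    return weights


# Row 3 is "B" and column 5 is "出力速度" (output speed), as in the FIG. 8 example.
# The positions of "A" and "サイズ" (size) are placeholders.
row_names = {2: "A", 3: "B"}
col_names = {2: "サイズ", 5: "出力速度"}
weights = element_name_weights(row_names, col_names, emphasized_cells=[(3, 5)])
print(weights)
```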

ステップＳ６０５は、ステップＳ６０２で取得したメタデータ抽出対象となる文字領域のテキストの中から、表の要素名に関連するテキスト部分を検索用メタデータとして抽出する処理を行う。例えば、検索対象である表領域の周辺にあり、論理属性が「段落」である文字領域から、表の要素名に含まれる各単語を含む文を抽出する。その際、抽出に用いた要素名の表内での位置等を情報として共に付与してもよい。 In step S605, a text portion related to the element name of the table is extracted as search metadata from the text of the character area to be extracted in step S602. For example, a sentence including each word included in the element name of the table is extracted from a character area having a logical attribute “paragraph” around the table area to be searched. At this time, the position of the element name used for extraction in the table may be given together as information.

図８の例では、「Aのサイズは、……。」という文や、「Bの出力速度は、製品の中で最も速い。」という文が抽出される。 In the example of FIG. 8, the sentence “The size of A is ...” and the sentence “The output speed of B is the fastest among the products” are extracted.

ステップＳ６０６では、ステップＳ６０５で抽出したテキストに対して、ステップＳ６０４で決定した行と列の要素名についての重み等に従って、重要度を算出する処理を行う。 In step S606, an importance level is calculated for the text extracted in step S605 according to the weights of the element names of the rows and columns determined in step S604.

例えば、図８の例では、表の３行目の要素名である「Ｂ」と５列目の要素名である「出力速度」が、他の要素名よりも重みが高く設定されているので、ステップＳ６０５で抽出された「Aのサイズは、……。」という文よりも、「Bの出力速度は、製品の中で最も速い。」という文の方が、重要度は高くなる。 For example, in the example of FIG. 8, the element name “B” in the third row of the table and the element name “output speed” in the fifth column are set higher in weight than other element names. The sentence “The output speed of B is the fastest among the products” is more important than the sentence “A size is ...” extracted in step S605.
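Steps S605 and S606 can be sketched together as follows. Splitting on the Japanese full stop, matching element names by plain substring search, and summing the weights of the matched names are all simplifying assumptions; the description does not give the importance formula, and the weight values are hypothetical.

```python
def split_sentences(text):
    """Split Japanese text on the full stop "。" and keep the terminator."""
    return [s + "。" for s in text.split("。") if s.strip()]


def extract_with_importance(paragraph, weights):
    """Keep sentences that mention a table element name (S605) and score each
    one by the summed weights of the names it contains (S606)."""
    results = []
    for sentence in split_sentences(paragraph):
        matched = [name for name in weights if name in sentence]
        if matched:
            results.append((sentence, sum(weights[name] for name in matched)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical weights: the names for the emphasized cell are weighted higher.
weights = {"A": 1.0, "B": 1.5, "サイズ": 1.0, "出力速度": 1.5}
paragraph = "Aのサイズは、……。Bの出力速度は、製品の中で最も速い。"
ranked = extract_with_importance(paragraph, weights)
for sentence, importance in ranked:
    print(importance, sentence)
```

Substring matching is crude for one-letter names such as “A”; a practical system would more likely match on the output of a morphological analyzer.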

ステップＳ６０２とステップＳ６０５で説明したメタデータ抽出方法は例であり、他にも様々なメタデータ抽出方法が考えられる。また、メタデータ抽出時に、メタデータとして抽出された理由を示す情報を出力してもよいし、抽出対象となった領域の論理属性や論理的な関係等に従って、画像とメタデータの関係の強さを示す数値を出力してもよい。また、ここでは文字領域に含まれるテキストをメタデータとして抽出したが、テキスト以外の情報をメタデータとして抽出してもよい。 The metadata extraction methods described for steps S602 and S605 are examples, and various other methods are conceivable. When metadata is extracted, information indicating why it was extracted as metadata may be output, and a numerical value indicating the strength of the relationship between the image and the metadata may be output according to the logical attributes, logical relationships, and so on of the region from which it was extracted. Although text contained in character regions is extracted as metadata here, information other than text may also be extracted as metadata.

ステップＳ６０７では、ステップＳ６０２及びステップＳ６０５で抽出された検索用メタデータと検索対象領域を関連付けてＤＢに格納する処理を行う。 In step S607, the search metadata extracted in steps S602 and S605 and the search target area are associated with each other and stored in the DB.

図２は、検索対象である画像領域と抽出されたメタデータを関連付けて格納する際のデータ構造の一例を示したものである。 FIG. 2 shows an example of the data structure when the image area to be searched and the extracted metadata are stored in association with each other.

ステップＳ６０８は、次の検索対象領域が存在するか否かを判定するステップであり、検索対象が存在する場合はステップＳ６０２に戻り、ステップＳ６０２からステップＳ６０７の処理を、ステップＳ６０１で抽出された全ての領域について行う。 Step S608 determines whether another search target region remains. If one does, the process returns to step S602, so that steps S602 through S607 are performed for every region extracted in step S601.

図６の処理フローは、本実施例におけるメタデータ抽出処理の一例であり、処理の順や処理内容は、このとおりでなくてもよい。 The processing flow of FIG. 6 is an example of the metadata extraction processing in the present embodiment, and the processing order and processing content may not be as described above.

図８は、あるページ画像に対して領域抽出処理を行った結果の例を示している。 FIG. 8 shows an example of a result of performing region extraction processing on a certain page image.

図８において、８００はスキャンされたページ画像であり、８０１から８０７は抽出されたオブジェクトの領域を示している。８０１から８０５は文字領域であるが、文字列の方向が同じで、文字サイズと文字間値・行間値がほぼ均一であり、更に行方向の配置（字下げ、センタリング、揃え等）が同じ領域が一つの文字領域として抽出される。８０６は表領域であり、８０７は、表の中の項目の一つを示している。 In FIG. 8, reference numeral 800 denotes a scanned page image, and 801 to 807 denote the regions of the extracted objects. 801 to 805 are character regions; text whose character string direction is the same, whose character size, character spacing, and line spacing are nearly uniform, and whose line-direction layout (indentation, centering, alignment, etc.) is the same is extracted as a single character region. 806 is a table region, and 807 denotes one of the items in the table.

これらの図は、本実施例における領域抽出結果の一例を示しているが、画像と文字の領域が抽出できれば、他の領域抽出結果でも構わない。 These drawings show an example of the region extraction result in this embodiment, but other region extraction results may be used as long as the image and character regions can be extracted.

図８では、画像とテキストが混在した文書画像を例に挙げたが、必ずしも複数種類のオブジェクトが混在する必要はなく、例えば画像のみで構成された文書画像であっても構わない。 In FIG. 8, a document image in which an image and text are mixed is taken as an example. However, a plurality of types of objects are not necessarily mixed, and for example, a document image composed only of images may be used.

図９では、ページ画像に対して、ページサイズや読み込み時の解像度、電子化されたページ画像データの格納位置等の物理的な情報が付与されている。また、抽出された各矩形領域に対して、文字領域、画像領域等の領域種別、矩形領域のサイズ、ページ内での位置座標等の物理的な情報が付与されている。更に、文字領域については、文字サイズ、文字認識した結果である文字情報が付与され、画像領域については、写真、図、表、記号画像等の画像種別情報が付与されている。例えばページ画像１は、幅２９０ｍｍ、高さ２１０ｍｍで、処理解像度が３００ｄｐｉであることを示している。領域１は、Ｘ座標１０ｍｍ、Ｙ座標１０ｍｍの位置にある、幅４０ｍｍ、高さ１２．５ｍｍの文字領域で、文字列が文字サイズ８ポイントで記述されていることを示している。図９は、本実施例における領域の物理的な情報の一例を示しているが、物理的な情報とはこれに限るものではなく、次のステップのレイアウト抽出ができれば、他の情報が抽出されてもよい。例えば、図９では、矩形領域のサイズと位置座標情報を抽出しているが、矩形領域の左上の位置座標と右下の位置座標を抽出するようにしてもよい。 In FIG. 9, physical information such as the page size, the scanning resolution, and the storage location of the digitized page image data is attached to each page image. Physical information such as the region type (character region, image region, etc.), the size of the rectangular region, and its position coordinates within the page is attached to each extracted rectangular region. In addition, character regions carry the character size and the character information obtained by character recognition, and image regions carry image type information such as photograph, figure, table, or symbol image. For example, page image 1 is 290 mm wide and 210 mm high and was processed at a resolution of 300 dpi. Region 1 is a character region 40 mm wide and 12.5 mm high located at X coordinate 10 mm and Y coordinate 10 mm, and its character string is written at a character size of 8 points. FIG. 9 shows one example of the physical information of regions in this embodiment, but the physical information is not limited to this; other information may be extracted as long as the layout extraction of the next step can be performed. For example, although FIG. 9 extracts the size and position coordinates of each rectangular region, the upper-left and lower-right coordinates of the rectangular region may be extracted instead.
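The two equivalent representations of a rectangular region mentioned above (position plus size, or upper-left and lower-right corners) can be illustrated with the values given for region 1. The class and method names are hypothetical, and the conversion to pixels simply uses 25.4 mm per inch at the stated 300 dpi.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class RegionBox:
    """Position and size of a rectangular region, in millimetres."""
    x_mm: float        # left edge within the page
    y_mm: float        # top edge within the page
    width_mm: float
    height_mm: float

    def corners_mm(self):
        """Return (left, top, right, bottom), the corner-based representation."""
        return (self.x_mm, self.y_mm,
                self.x_mm + self.width_mm, self.y_mm + self.height_mm)

    def corners_px(self, dpi):
        """Convert the corner coordinates to pixels at the given resolution."""
        return tuple(round(v / MM_PER_INCH * dpi) for v in self.corners_mm())


# Region 1 of FIG. 9: X 10 mm, Y 10 mm, width 40 mm, height 12.5 mm, page at 300 dpi.
region1 = RegionBox(x_mm=10, y_mm=10, width_mm=40, height_mm=12.5)
print(region1.corners_mm())     # (10, 10, 50, 22.5)
print(region1.corners_px(300))  # (118, 118, 591, 266)
```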

図１０では、ページ画像内の２つの領域に対する空間的な関係として、互いの領域が存在する上下左右の方向や、２つの領域が重なっている、接している、含まれている等の状態、更に、２つの領域が接していない場合には、隣接する２つの領域間の相対的な距離を、ネットワーク構造で表現している。例えば、領域１と領域２の空間的な関係は、領域２が領域１の下にあることを示している。また、領域６と領域６の１の空間的な関係は、領域６の１が領域６に含まれていることを示している。図１０は、本実施例における領域の空間的な関係の一例を示しているが、空間的な関係とはこれに限るものではなく、次のステップの論理構造解析ができれば、他の情報が抽出されてもよい。 In FIG. 10, the spatial relationship between two regions in a page image is expressed as a network structure that records the direction (above, below, left, or right) in which each region lies relative to the other, states such as overlapping, touching, or containment, and, when the two regions do not touch, the relative distance between the two adjacent regions. For example, the spatial relationship between region 1 and region 2 indicates that region 2 lies below region 1. The spatial relationship between region 6 and region 6-1 indicates that region 6-1 is contained in region 6. FIG. 10 shows one example of the spatial relationships of regions in this embodiment, but the spatial relationships are not limited to this; other information may be extracted as long as the logical structure analysis of the next step can be performed.
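One way to derive such a relationship for a pair of rectangles is sketched below. The returned dictionary layout, the state names, and the choice of the dominant gap to decide the direction are assumptions for illustration; apart from region 1, the coordinates are hypothetical.

```python
import math


def spatial_relation(a, b):
    """Describe how box b relates to box a.

    Boxes are (left, top, right, bottom) in mm, with y growing downward.
    """
    al, at, ar, ab = a
    bl, bt, br, bb = b
    if al <= bl and at <= bt and br <= ar and bb <= ab:
        return {"state": "contained"}            # b lies inside a
    dx = max(bl - ar, al - br, 0)                # horizontal gap, 0 if none
    dy = max(bt - ab, at - bb, 0)                # vertical gap, 0 if none
    if dx == 0 and dy == 0:
        overlap_w = min(ar, br) - max(al, bl)
        overlap_h = min(ab, bb) - max(at, bt)
        state = "overlapping" if overlap_w > 0 and overlap_h > 0 else "touching"
        return {"state": state}
    if dy >= dx:
        direction = "below" if bt >= ab else "above"
    else:
        direction = "right" if bl >= ar else "left"
    return {"state": "separate", "direction": direction, "distance": math.hypot(dx, dy)}


region1 = (10, 10, 50, 22.5)      # region 1 from FIG. 9
region2 = (10, 30, 100, 60)       # hypothetical coordinates for region 2
region6 = (10, 100, 200, 180)     # hypothetical table region
region6_1 = (10, 100, 60, 110)    # hypothetical cell inside the table

print(spatial_relation(region1, region2))
print(spatial_relation(region6, region6_1))
```

Each result would become one edge of the network structure, keyed by the pair of region identifiers.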

図１１は、文書の論理構造を解析した結果の一例を示しており、抽出されたページと領域の論理属性、およびそれらの読み順や論理的関係等を示している。これは、論理構造を解析した結果の1つの例であるが、論理構造の解析結果は、メタデータ抽出規則を適応できるものであれば、どのような形式・内容でもよく、他にもいろいろなものが考えられる。 FIG. 11 shows an example of the result of analyzing the logical structure of a document, including the logical attributes of the extracted pages and regions, their reading order, and their logical relationships. This is only one example; the analysis result may take any format and content as long as the metadata extraction rules can be applied to it, and various other forms are conceivable.

図１１において、ページ画像１、２、・・・は、読み込まれた文書の各ページ画像に関する論理情報であり、ページに対する論理属性等が付与されている。また、領域１、２、・・・は、ページ画像１から抽出された領域に関する論理情報であり、領域に対する論理属性等が付与されている。ページや領域をつなぐ実線の矢印はページや領域の読み順や論理的関係を示しており、矢印のない点線は包含関係を示している。例えば、ページ画像１には、領域１、領域２、・・・が含まれており、その中の論理属性が「見出し」の領域３、論理属性が「段落」の領域４、論理属性が「キャプション」の領域５の順に読み順が付与されている。論理属性が「キャプション」の領域５と論理属性が「表」の領域６は、「表とキャプションの関係」で結ばれている。論理属性が「表要素名」の領域６の１、論理属性が「表項目」の領域６の２、・・・は、論理属性が「表」の領域６に含まれており、論理属性が「表項目値」である領域６の２０、・・・は、領域６の２に含まれている。 In FIG. 11, page images 1, 2, ... are logical information about each page image of the scanned document, with logical attributes and the like assigned to each page. Regions 1, 2, ... are logical information about the regions extracted from page image 1, with logical attributes and the like assigned to each region. Solid arrows connecting pages or regions indicate their reading order and logical relationships, and dotted lines without arrowheads indicate containment. For example, page image 1 contains region 1, region 2, and so on, among which a reading order is assigned in the sequence region 3 (logical attribute “heading”), region 4 (logical attribute “paragraph”), region 5 (logical attribute “caption”). Region 5 (“caption”) and region 6 (“table”) are linked by a “table and caption” relationship. Region 6-1 (“table element name”), region 6-2 (“table item”), and so on are contained in region 6 (“table”), and region 6-20 (“table item value”) and so on are contained in region 6-2.
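The logical structure of FIG. 11 could be held in plain dictionaries, as in the sketch below. The variable names, the relation label `"table-caption"`, and the identifier format are assumptions; only the attributes and links named in the description are included.

```python
# Logical attributes of the regions mentioned for page image 1.
attrs = {
    "3": "heading", "4": "paragraph", "5": "caption", "6": "table",
    "6-1": "table element name", "6-2": "table item", "6-20": "table item value",
}
reading_order = ["3", "4", "5"]              # solid arrows: reading order
relations = [("5", "6", "table-caption")]    # solid arrows: logical relationships
contains = {"6": ["6-1", "6-2"], "6-2": ["6-20"]}   # dotted lines: containment


def descendants(region_id):
    """List every region contained, directly or indirectly, in region_id."""
    found = []
    for child in contains.get(region_id, []):
        found.append(child)
        found.extend(descendants(child))
    return found


def caption_of(table_id):
    """Return the caption region linked to a table, or None if there is none."""
    for src, dst, kind in relations:
        if dst == table_id and kind == "table-caption":
            return src
    return None


print(descendants("6"))                # ['6-1', '6-2', '6-20']
print(attrs[caption_of("6")])          # caption
```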

図２は、検索対象である領域と抽出されたメタデータを関連付けて格納する際のデータ構造の一例を示したものである。 FIG. 2 shows an example of the data structure when the area to be searched and the extracted metadata are stored in association with each other.

図２では、ページ画像１に含まれる画像や表等に対して、文書から抽出されたテキスト等がメタデータとして付与されていることを示している。ここでは、一例として、ある表に対して、表について説明しているテキストとして文字列「製品の仕様比較」等が、メタデータとして抽出された理由、表とテキストの関係の強さを示す値と共に付与されている。 FIG. 2 shows that text and the like extracted from the document are attached as metadata to the images, tables, and so on contained in page image 1. Here, as an example, the character string “product specification comparison” and similar text describing a table are attached to that table, together with the reason each was extracted as metadata and a value indicating the strength of the relationship between the table and the text.

本実施例では、特に、表の要素名を用いて周辺の文字領域から抽出したテキスト「Aのサイズは、・・・。」や「Bの出力速度は、製品の中で最も速い。」が、それぞれ重要度「２]と重要度「３」と共に、メタデータとして付与されている。また、図にはないが、表の要素名を用いて抽出されたメタデータに対して、抽出に用いた要素名の表内での位置等を情報として共に付与してもよい。 In the present embodiment, in particular, the text “A size is ...” and “B output speed is the fastest among the products” extracted from the surrounding character area using the element names in the table. These are given as metadata together with importance “2” and importance “3”, respectively. Although not shown in the drawing, the position of the element name used for extraction in the table may be added as information to the metadata extracted using the element name of the table.
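A record layout along the lines of FIG. 2 might be sketched as follows. The class names, field names, reason labels, and strength values are hypothetical; only the three text strings and the importance values 2 and 3 come from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MetadataEntry:
    text: str
    reason: str                                   # why the text was extracted
    strength: float                               # strength of the region-text relation
    importance: Optional[int] = None              # set for element-name based entries
    source_cell: Optional[Tuple[int, int]] = None  # (row, column) of the names used


@dataclass
class SearchTarget:
    page: int
    region_id: str
    kind: str                                     # "table", "figure", "photo", ...
    metadata: List[MetadataEntry] = field(default_factory=list)


table6 = SearchTarget(page=1, region_id="6", kind="table", metadata=[
    MetadataEntry("製品の仕様比較", reason="caption", strength=1.0),
    MetadataEntry("Aのサイズは、・・・。", reason="element name", strength=0.5, importance=2),
    MetadataEntry("Bの出力速度は、製品の中で最も速い。", reason="element name",
                  strength=0.5, importance=3, source_cell=(3, 5)),
])

most_important = max(table6.metadata, key=lambda m: m.importance or 0)
print(most_important.text)
```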

これは例であり、格納されるメタデータや格納方法はこれに限らない。これらは図１のＤＩＳＫ、ＲＯＭ、ＲＡＭ、等の記憶装置等の記録媒体に保存されている。 This is an example, and the stored metadata and storage method are not limited to this. These are stored in a recording medium such as a storage device such as DISK, ROM, RAM, etc. in FIG.

図７は、本発明において、文書に含まれる画像や表等を検索する処理の例を示したフローチャートである。この処理プログラムは、ＲＯＭ１０２に格納されており、ＣＰＵ１０１によって実行される。以下、図７を用いて詳細に説明する。 FIG. 7 is a flowchart showing an example of processing for searching for an image, a table, or the like included in a document in the present invention. This processing program is stored in the ROM 102 and executed by the CPU 101. Hereinafter, this will be described in detail with reference to FIG.

本実施形態では、図２に示すような画像や表等に関連付けられている検索用メタデータを利用して検索を行う。検索は、ユーザから検索キーワードやキーワードのリスト、自然文などの検索条件を与えてもらい、その検索条件と各画像や表等に関連つけられたメタデータを対比し、該検索条件と適合するメタデータが付与されている画像や表等をピックアップして、検索結果として表示する。 In this embodiment, a search is performed using the search metadata associated with images, tables, and so on, as shown in FIG. 2. The user supplies a search condition such as a search keyword, a list of keywords, or a natural-language sentence; the search condition is compared with the metadata associated with each image, table, and so on; and the images, tables, and so on whose metadata matches the search condition are picked up and displayed as search results.

ステップＳ１３０１は、検索用のインデックス情報等を利用できるように図１のＲＡＭ上でデータの操作を行う部分である。なお、検索対象のデータや、そのメタデータ等は、図１のＮＣＵ経由でＬＡＮ上のＰＣなどの計算機やＤＴＵ経由で外部ネットワーク上の計算機上に保有するようにすることができる。 Step S1301 is a part for manipulating data on the RAM in FIG. 1 so that search index information and the like can be used. The data to be searched, its metadata, etc. can be held on a computer such as a PC on the LAN via the NCU of FIG. 1 or on a computer on the external network via the DTU.

ステップＳ１３０２は、ユーザが検索条件となる文や単語を入力する処理を示している。ここで、ユーザは図１のＫＢから、検索対象となる画像や表等を表現する自然文を検索条件として入力することができる。また、この検索条件は、例えば、図１のＰＣにおいて受け付けて、ＬＡＮを経由して入力するようにすることもできる。 Step S1302 shows a process in which the user inputs a sentence or a word as a search condition. Here, the user can input a natural sentence expressing an image, a table, or the like to be searched from the KB of FIG. 1 as a search condition. Further, for example, this search condition can be received by the PC in FIG. 1 and input via the LAN.

ステップＳ１３０３では、検索条件として入力された文や単語に対し、自然言語処理技術を利用して、形態素解析や構文解析などの解析を行う。これらの各解析の技法や手法としては、公知の種々の手法が利用できるが、ここでは、文を意味的な語の集まりに区切り、その区切られたことによってできた語の品詞や語義などの情報と、文中のそれらの語の関係に関する情報を取り出せるものであれば、何でもかまわない。 In step S1303, a sentence or word input as a search condition is analyzed using a natural language processing technique, such as morphological analysis or syntax analysis. Various known techniques can be used for each of these analysis techniques and methods, but here the sentence is divided into a collection of semantic words, and the part of speech and meaning of the word that is formed by the division. Anything can be used as long as it can extract information and information on the relationship between those words in the sentence.

ステップＳ１３０４では、ステップＳ１３０３で得られた解析結果が、図２に示した検索用メタデータ中に含まれる画像や表等を検索する。 In step S1304, images, tables, and so on whose search metadata (shown in FIG. 2) contains the analysis result obtained in step S1303 are retrieved.

ステップＳ１３０５では、ステップＳ１３０４で検索した画像や表等の候補があるか否かを判別する。候補が何も見つからなかった場合には、該当候補なしとして検索が終了する。ここで、１つでも該当候補があった場合には、ステップＳ１３０６に進む。 In step S1305, it is determined whether there is a candidate for the image or table searched in step S1304. If no candidate is found, the search ends with no corresponding candidate. If there is even one candidate, the process proceeds to step S1306.

ステップＳ１３０６では、ステップＳ１３０４で検索された候補である画像や表等について、検索条件内の語の関係と、これを含むメタデータ同士の関係を比較することで、語の関係に類似する構造があるかどうかを調べる処理を行う。 In step S1306, for each candidate image, table, and so on retrieved in step S1304, the relationships among the words in the search condition are compared with the relationships among the metadata containing those words, to check whether the metadata has a structure similar to the word relationships.

ステップＳ１３０７は、ステップＳ１３０４で検索された候補の中から、ステップＳ１３０６で構造の類似性が見出せた画像を検索結果としてユーザに提示する処理を行う。 In step S1307, from among the candidates retrieved in step S1304, the images for which structural similarity was found in step S1306 are presented to the user as search results.

その際、ステップＳ１３０６で見つかった構造の類似性やメタデータに付与された重要度等に基づいて、検索条件とピックアップした各画像や表等のメタデータとの類似度を求め、類似度の高い順番に並べて検索結果を提示してもよい。ここで言う類似度とは、ユーザが入力した検索条件が、各画像や表等に付与されたメタデータとの関係を示す表現としてどの程度適切であるかを示している。 At that time, the similarity between the search condition and the metadata of each picked-up image, table, and so on may be computed from the structural similarity found in step S1306, the importance assigned to the metadata, and the like, and the search results may be presented in descending order of similarity. The similarity here indicates how well the search condition entered by the user expresses a relationship with the metadata attached to each image, table, and so on.
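A simplified version of the candidate retrieval and ranking (steps S1304 to S1307) is sketched below. It assumes the search condition has already been reduced to keywords, which in practice would come from the morphological analysis of step S1303, and it omits the structural comparison of step S1306. The scoring rule (number of matched keywords times importance) and the second region with its metadata are hypothetical.

```python
def search(keywords, targets):
    """Rank regions by how strongly their metadata matches the keywords.

    targets maps a region id to a list of (metadata_text, importance) pairs.
    """
    hits = []
    for region_id, entries in targets.items():
        score = 0.0
        for text, importance in entries:
            matched = sum(1 for keyword in keywords if keyword in text)
            score += matched * importance
        if score > 0:
            hits.append((region_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


targets = {
    "region 6": [("製品の仕様比較", 1),
                 ("Aのサイズは、・・・。", 2),
                 ("Bの出力速度は、製品の中で最も速い。", 3)],
    "region 9": [("出力速度の測定方法", 1)],   # hypothetical second candidate
}

results = search(["出力速度", "速い"], targets)
print(results)
```

Because metadata with higher importance contributes more to the score, the table whose emphasized item is described in the text is ranked first.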

また、その際、複数の画像や表の集まりを検索結果としてもよいし、検索条件に該当する表の一部分を検索結果としてもよい。 At that time, a collection of a plurality of images and tables may be used as the search result, or a part of the table corresponding to the search condition may be used as the search result.

例えば、ユーザから「出力速度が速い製品」という検索条件が与えられた場合には、この例文は、「出力速度」や「速い」、「が」等の語に分解され、それぞれに「名詞」や「接続詞」等の品詞情報が割り当てられる。そして、各語は、「修飾」といった関係を持つことを解析できる。次に、解析結果の中で名詞等の品詞を持つ重要な語である「出力速度」や「速い」等がメタデータに含まれる画像や表、即ち、図２においては、領域６の表を該当候補として取得する。次に、これらの該当候補について、解析結果の語の関係と、これを含むメタデータ同士の関係を比較する。検索条件の例文では、「出力速度」と「速い」、「速い」と「製品」が修飾の関係で結ばれている。図２における領域６に付与された表の要素名によるメタデータ中には「出力速度」と「速い」が含まれており、修飾の関係で結ばれている。このことより、領域６の表のメタデータに、例文と類似の構造が見出せることがわかる。 For example, when the user gives the search condition “product with fast output speed”, this example sentence is broken down into words such as “output speed”, “fast”, and the particle “ga”, and each word is assigned part-of-speech information such as “noun” or “conjunction”. The analysis can also determine that the words stand in relationships such as “modification”. Next, the images and tables whose metadata contains the important words in the analysis result (those with parts of speech such as noun, here “output speed” and “fast”) are obtained as candidates; in FIG. 2, this is the table of region 6. Then, for these candidates, the relationships among the words in the analysis result are compared with the relationships among the metadata containing them. In the example search condition, “output speed” and “fast”, and “fast” and “product”, are linked by modification relationships. The element-name-based metadata attached to region 6 in FIG. 2 contains “output speed” and “fast”, linked by a modification relationship. This shows that a structure similar to the example sentence can be found in the metadata of the table of region 6.

また、例えば、ユーザから「Bの出力速度」という検索条件が与えられた場合には、この例文は、「B」や「の」や「出力速度」の語に分解され、「B」と「出力速度」が修飾の関係にあることが解析結果として得られる。次に、解析結果の中の「B」や「出力速度」がメタデータに含まれる該当候補として、図２の領域６の表を取得する。領域６に付与された表の要素名によるメタデータ中には「B」と「出力速度」が含まれており、修飾の関係で結ばれていることから、領域６の表のメタデータに、例文と類似の構造が見出せることがわかる。この場合は、検索条件中に含まれる「B」および「出力速度」は表の要素名であるので、表ではなく、これらの要素名が指している表の項目（値）を検索結果としてもよい。 As another example, when the user gives the search condition “output speed of B”, this example sentence is broken down into the words “B”, “no”, and “output speed”, and the analysis finds that “B” and “output speed” are in a modification relationship. Next, the table of region 6 in FIG. 2 is obtained as a candidate whose metadata contains “B” and “output speed” from the analysis result. Because the element-name-based metadata attached to region 6 contains “B” and “output speed” linked by a modification relationship, a structure similar to the example sentence can be found in the metadata of the table of region 6. In this case, since “B” and “output speed” in the search condition are element names of the table, the table item (value) designated by these element names, rather than the whole table, may be returned as the search result.
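Returning a single table item for a query such as “Bの出力速度” could be done along the following lines. The table layout (first row and first column hold the element names) follows the FIG. 8 description, but the function name and all cell values are placeholders, since the description does not give the table contents.

```python
def lookup_cell(table, keywords):
    """Return the item value addressed by one row name and one column name.

    table[0] holds the column element names and each later row starts with
    its row element name; None is returned if the keywords are ambiguous.
    """
    column_names = table[0][1:]
    rows = {row[0]: row[1:] for row in table[1:]}
    row_hits = [k for k in keywords if k in rows]
    col_hits = [k for k in keywords if k in column_names]
    if len(row_hits) == 1 and len(col_hits) == 1:
        return rows[row_hits[0]][column_names.index(col_hits[0])]
    return None


# Placeholder cell values; only the element names come from the description.
table = [
    ["製品", "サイズ", "出力速度"],
    ["A", "a1", "a2"],
    ["B", "b1", "b2"],
]

print(lookup_cell(table, ["B", "出力速度"]))
```

When only one of the two names is present in the query, the sketch returns `None`, and the whole table would be presented instead.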

これは、検索方法の例であり、検索方法としてはこれに限るものではなく、どのような方法でもかまわない。 This is an example of a search method, and the search method is not limited to this, and any method may be used.

また、上記検索結果を利用することによって、文書に含まれる画像や表等を効率的に再利用することができるようになる。例えば、検索結果の一覧の中から所望の画像や表等を選択して得たデータを、ワープロ等の編集ソフトウエア等を用いて編集・加工したり、別の文書に挿入したりすることによって、再利用することができるようになる。これは、再利用方法の一例であり、これに限るものではなく、どのような方法でもかまわない。 In addition, by using the search result, an image, a table, or the like included in the document can be efficiently reused. For example, by editing and processing data obtained by selecting a desired image or table from a list of search results using editing software such as a word processor, or inserting it into another document Can be reused. This is an example of a reuse method, and is not limited to this, and any method may be used.

また、上記メタデータを利用することによって、文書および文書中の画像や表を蓄積する時に、効率的に分類・整理・管理することができるようになる。例えば、メタデータとして付与されている語を分析して、関連するカテゴリ等で画像や表等を分類することができ、分類するカテゴリなどはユーザが与えてもよいし、クラスタリング等の統計的手法によって自動的に分類するようにしてもよい。また、分類時に、カテゴリと各画像や表等のメタデータの類似度等を計算して求めて、分類に利用してもよい。これは、分類方法の例であるし、また文書管理方法の例であり、文書管理方法としてはこれに限るものではなく、どのような方法でもかまわない。 Also, by using the metadata, it is possible to efficiently classify, organize, and manage documents and images and tables in the documents. For example, it is possible to analyze words given as metadata and classify images and tables by related categories, etc. The categories to be classified may be given by the user, or statistical methods such as clustering May be automatically classified according to Further, at the time of classification, the similarity between the category and metadata such as images and tables may be calculated and used for classification. This is an example of a classification method and an example of a document management method. The document management method is not limited to this, and any method may be used.

（他の実施形態） (Other embodiments)
本実施例では、図１の外部記憶装置、ＲＯＭ、ＲＡＭ等の記憶装置に、図２、図９、図１０、図１１に示す解析結果、また解析規則が格納されているが、これらの情報はＬＡＮなどから入手して、記憶装置に格納して利用することも可能である。 In this embodiment, the analysis results shown in FIGS. 2, 9, 10, and 11 and the analysis rules are stored in a storage device of FIG. 1 such as the external storage device, ROM, or RAM, but this information can also be obtained from a LAN or the like, stored in a storage device, and used.

また、本発明は、複数の機器（例えばホストコンピュータ、インタフェース機器、リーダ、プリンタなど）から構成されるシステムに適応しても、単一の機器からなる装置（例えば、複写機、ファクシミリ装置など）に適応してもよい。 The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine).

また、本発明の目的は、前述した実施形態の機能を実現するソフトウエアのプログラムコードを記録した記憶媒体（または記録媒体）をシステムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）が記憶媒体に格納されたプログラムコードを読み出し実行することによっても達成されることはいうまでもない。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。プログラムコードを供給するための記憶媒体としては、例えば、フレキシブルディスク、ハードディスク、光ディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ＣＤ−Ｒ、磁気テープ、不揮発性のメモリカード、ＲＯＭなどを用いることができる。 In addition, an object of the present invention is to supply a storage medium (or recording medium) in which software program codes for realizing the functions of the above-described embodiments are recorded to a system or apparatus, and a computer (or CPU or CPU) of the system or apparatus. Needless to say, this can also be achieved by the MPU) reading and executing the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention. As a storage medium for supplying the program code, for example, a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, a ROM, or the like can be used.

また、コンピュータが読み出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼動しているＯＳ（オペレーティングシステム）などが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現されることはいうまでもない。 Further, by executing the program code read by the computer, not only the functions of the above-described embodiments are realized, but also an OS (operating system) operating on the computer based on the instruction of the program code. It goes without saying that some or all of the actual processing is performed, and the functions of the above-described embodiments are realized by the processing.

更に、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張ボードやコンピュータに接続された機能拡張ユニットに備わるメモリに書き込まれた後、そのプログラムコードの指示に基づき、その機能拡張ボードや機能拡張ユニットに備わるＣＰＵなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。 Further, after the program code read from the storage medium is written in a memory provided in a function expansion board inserted into the computer or a function expansion unit connected to the computer, the function expansion is performed based on the instruction of the program code. It goes without saying that the CPU or the like provided in the board or the function expansion unit performs part or all of the actual processing, and the functions of the above-described embodiments are realized by the processing.

本発明に係る文書処理システムが構築されるコンピュータ装置の基本構成例を示すブロック図 A block diagram showing a basic configuration example of a computer device in which a document processing system according to the present invention is constructed.

文書画像から抽出された画像とメタデータを関連付けて格納するデータ構造の例を示す図 A diagram showing an example of a data structure that stores an image extracted from a document image in association with its metadata.

本発明における文書登録処理の例を示すフローチャート A flowchart showing an example of the document registration process in the present invention.

図３のステップＳ３０２の処理を詳細化した図で、文書画像解析時の処理の例を示すフローチャート A flowchart detailing the process of step S302 in FIG. 3 and showing an example of the processing performed during document image analysis.

図３のステップＳ３０３の処理を詳細化した図であり、論理構造の抽出処理の例を示すフローチャート A flowchart detailing the process of step S303 in FIG. 3 and showing an example of the logical structure extraction process.

図３のステップＳ３０４の処理を詳細化した図で、第一の実施例における検索用メタデータ抽出処理の例を示すフローチャート A flowchart detailing the process of step S304 in FIG. 3 and showing an example of the search metadata extraction process in the first embodiment.

文書に含まれる画像を検索する処理の例を示すフローチャート A flowchart showing an example of a process for searching for an image included in a document.

ページ画像に対して領域抽出処理を行った結果の例 An example of the result of performing region extraction processing on a page image.

ページ画像や領域についての各種物理的な情報の例を示す図 A diagram showing examples of various physical information about page images and regions.

各領域の空間的な関係を抽出した結果の例を示す図 A diagram showing an example of the result of extracting the spatial relationships of the regions.

文書の論理構造を解析した結果の例を示す図 A diagram showing an example of the result of analyzing the logical structure of a document.

Claims

In a document processing system that can search and use images and tables in a document by extracting images and tables in the document and metadata related to the images and tables,
Means for analyzing in advance the document to be searched and extracting information characterizing each component included in the document, in order to search for document components such as images and tables in the document;
Metadata extraction means for extracting metadata relating to a table included in the document based on feature information of items in the table in the document;
Metadata storage means for associating and storing the table included in the document and the metadata extracted by the metadata extraction means;
Search condition input means for inputting a search condition for searching a table included in the document;
Search means for searching for metadata associated with the table based on the input search condition;
Search result display means for displaying a table or a part of the table in the document searched by the search means as a search result;
A document processing system comprising:

In a document processing system that can search and use images and tables in a document by extracting images and tables in the document and metadata related to the images and tables,
Means for analyzing in advance the document to be searched and extracting information characterizing each component included in the document, in order to search for document components such as images and tables in the document;
Metadata extraction means for extracting metadata relating to the table included in the document based on element names of rows and columns relating to items of the table in the document;
Metadata storage means for associating and storing the table in the document and the metadata extracted by the metadata extraction means;
Search condition input means for inputting a search condition for searching a table in the document;
Search means for searching for metadata associated with the table based on the input search condition;
Search result display means for displaying a table or a part of the table in the document searched by the search means as a search result;
A document processing system comprising:

In a document processing system that can search and use images and tables contained in documents by extracting the images and tables in the document and the metadata related to the images and tables,
A document input means for inputting a document in which images, tables, and sentences are mixed;
An area extraction means for recognizing and extracting an image or table in the document and an area including a sentence from the document input via the document input means;
Layout analysis means for analyzing a layout related to the area extracted by the area extraction means in each page of the document input via the document input means;
Logical structure analysis means for analyzing the logical structure of the document based on the layout extracted by the layout analysis means for each page of the document input via the document input means;
Metadata extraction means for extracting metadata relating to the table included in the document, based on the logical structure analyzed by the logical structure analysis means and on the element names of the rows and columns relating to the items of the table in the document;
Metadata storage means for associating and storing the table included in the document and the metadata extracted by the metadata extraction means;
Search condition input means for inputting a search condition for searching a table in the document;
Search means for searching for metadata associated with the table based on the input search condition;
Search result display means for displaying a table or a part of the table in the document searched by the search means as a search result;
A document processing system comprising:

The document processing system according to claim 2,
wherein the metadata extraction means extracts metadata about a table included in the document based on element names of rows and columns related to emphasized items in the table.
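As an illustration of the preceding claim, the following minimal Python sketch collects the row and column element names of emphasized items and uses them to pick related sentences from the body text. The grid layout (row 0 holds column element names, column 0 holds row element names), the function names `emphasized_element_names` and `related_sentences`, the sample data, and the plain substring match are all assumptions made for illustration; the claims do not specify how emphasis is detected or how similar expressions are matched.

```python
from typing import List, Set, Tuple

Table = List[List[str]]
Cell = Tuple[int, int]  # (row index, column index)


def emphasized_element_names(table: Table, emphasized: Set[Cell]) -> List[str]:
    """Return row and column element names of emphasized cells, without duplicates.

    Assumption: row 0 holds column element names, column 0 holds row element names.
    """
    names: List[str] = []
    for row, col in sorted(emphasized):
        for name in (table[row][0], table[0][col]):
            if name and name not in names:
                names.append(name)
    return names


def related_sentences(names: List[str], sentences: List[str]) -> List[str]:
    """Keep sentences that mention any element name (case-insensitive substring match)."""
    lowered = [n.lower() for n in names]
    return [s for s in sentences if any(n in s.lower() for n in lowered)]


# Hypothetical sample data
table = [
    ["Model", "Price", "Weight"],
    ["A100", "300", "1.2"],
    ["B200", "450", "0.9"],
]
emphasized = {(2, 1)}  # the cell "450" is assumed to be emphasized
sentences = [
    "The price of the B200 rose this year.",
    "Shipping takes three days.",
]

names = emphasized_element_names(table, emphasized)
metadata = related_sentences(names, sentences)
print(names)
print(metadata)
```

A real system would likely replace the substring match with morphological analysis or another similarity measure; the sketch only shows how emphasis narrows the set of element names used as search keys.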

The document processing system according to any one of claims 1 to 3,
further comprising metadata importance calculation means for calculating an importance for the metadata extracted by the metadata extraction means,
wherein the metadata storage means stores the metadata extracted by the metadata extraction means, the importance calculated by the metadata importance calculation means, and a table included in the document in association with each other, and
the search means calculates a similarity between a search condition and metadata based on the metadata and importance associated with the table.
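One way to picture a similarity that takes importance into account is sketched below in Python. Each table is assumed to be stored with a mapping from metadata term to importance, and the score is simply the sum of the importances of the terms that also occur in the search condition. The scoring formula, the names `score` and `search`, and the sample index are hypothetical; the claim only states that the similarity is based on the metadata and its importance.

```python
from typing import Dict, List, Tuple

# table id -> {metadata term -> importance}; the storage layout is an assumption
Index = Dict[str, Dict[str, float]]


def score(query_terms: List[str], metadata: Dict[str, float]) -> float:
    """Sum the importance of every distinct query term found in the metadata."""
    return sum(metadata.get(term, 0.0) for term in set(query_terms))


def search(query_terms: List[str], index: Index) -> List[Tuple[str, float]]:
    """Return (table id, score) pairs with a positive score, best match first."""
    scored = [(table_id, score(query_terms, md)) for table_id, md in index.items()]
    hits = [(table_id, s) for table_id, s in scored if s > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample index
index: Index = {
    "table-1": {"price": 1.0, "weight": 0.5},
    "table-2": {"price": 0.5},
    "table-3": {"colour": 1.0},
}

results = search(["price", "weight"], index)
print(results)
```

Tables whose metadata shares no term with the search condition are dropped, and ties are broken by table id so that the ordering is deterministic.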

The document processing system according to claim 5,
further comprising element name weighting means for weighting the element names of the rows and columns of the table,
wherein the metadata extraction means extracts metadata related to a table included in the document based on element names of rows and columns of the table, and
the metadata importance calculation means calculates the importance of the metadata in accordance with the weights assigned to the element names by the element name weighting means.

The document processing system according to claim 6,
wherein the element name weighting means weights element names of rows and columns of a table included in the document according to the presence or absence of emphasis on items in the table.
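The weighting by presence or absence of emphasis can be sketched as follows in Python. Every row and column element name starts with a base weight, and names belonging to an emphasized item receive a higher weight. The values `base=1.0` and `boost=2.0`, the function name `weight_element_names`, and the table layout (row 0 holds column element names, column 0 holds row element names) are assumptions for illustration, not values taken from the claims.

```python
from typing import Dict, List, Set, Tuple

Table = List[List[str]]
Cell = Tuple[int, int]  # (row index, column index) of a data cell


def weight_element_names(
    table: Table,
    emphasized: Set[Cell],
    base: float = 1.0,
    boost: float = 2.0,
) -> Dict[str, float]:
    """Assign `boost` to element names of emphasized cells and `base` to the rest."""
    column_names = table[0][1:]
    row_names = [row[0] for row in table[1:]]
    weights = {name: base for name in column_names + row_names if name}
    for row, col in emphasized:
        for name in (table[row][0], table[0][col]):
            if name:
                weights[name] = boost
    return weights


# Hypothetical sample data
table = [
    ["Model", "Price", "Weight"],
    ["A100", "300", "1.2"],
    ["B200", "450", "0.9"],
]
weights = weight_element_names(table, {(2, 1)})  # the cell "450" is emphasized
print(weights)
```

The resulting weights could then feed the importance of the metadata derived from each element name, so that metadata tied to emphasized items ranks higher at search time.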

The document processing system according to claim 2,
wherein the metadata extraction means extracts metadata related to a table included in the document according to a relationship between words included in element names of rows and columns of the table and a relationship between words in sentences included in the document.

The document processing system according to any one of claims 5 to 7,
wherein the metadata extraction means extracts metadata relating to a table included in the document according to a relationship between words included in element names of the rows and columns of the table and a relationship between words in sentences included in the document, and
the metadata importance calculation means calculates the importance of the metadata according to the relationship between words included in the element names of the rows and columns of the table and the relationship between words in sentences included in the document.

The document processing system according to any one of claims 1 to 9,
wherein the search means searches for metadata associated with the table according to a relationship between words included in the search condition and a relationship between words within the search condition included in the metadata.

The document processing system according to claim 3, wherein the layout analysis unit extracts a spatial relationship between the regions.

The document processing system according to claim 3, wherein the logical structure analyzing unit extracts logical attributes of pages and areas constituting the document, determines a reading order of the areas, and extracts the logical structure.

The document processing system according to claim 1, wherein the document is a paper document on which a book or an electronic document is printed.

The document processing system according to claim 1, wherein the document is an electronic document.

In a document processing device that can search and utilize images and tables in a document by extracting images and tables in the document and metadata related to the images and tables,
Means for analyzing in advance the document to be searched and extracting information characterizing each component included in the document, in order to search for document components such as images and tables in the document;
Metadata extraction means for extracting metadata relating to a table included in the document based on feature information of items in the table in the document;
Metadata storage means for associating and storing the table included in the document and the metadata extracted by the metadata extraction means;
Search condition input means for inputting a search condition for searching a table included in the document;
Search means for searching for metadata associated with the table based on the input search condition;
Search result display means for displaying a table or a part of the table in the document searched by the search means as a search result;
A document processing apparatus comprising:

In a document processing device that can search and utilize images and tables in a document by extracting images and tables in the document and metadata related to the images and tables,
Means for analyzing in advance the document to be searched and extracting information characterizing each component included in the document, in order to search for document components such as images and tables in the document;
Metadata extraction means for extracting metadata relating to the table included in the document based on element names of rows and columns relating to items of the table in the document;
Metadata storage means for associating and storing the table in the document and the metadata extracted by the metadata extraction means;
Search condition input means for inputting a search condition for searching a table in the document;
Search means for searching for metadata associated with the table based on the input search condition;
Search result display means for displaying a table or a part of the table in the document searched by the search means as a search result;
A document processing apparatus comprising:

In a document processing device that can retrieve and utilize images and tables contained in a document by extracting the images and tables in the document and metadata related to the images and tables,
A document input means for inputting a document in which images, tables, and sentences are mixed;
An area extraction means for recognizing and extracting an image or table in the document and an area including a sentence from the document input via the document input means;
Layout analysis means for analyzing a layout related to the area extracted by the area extraction means in each page of the document input via the document input means;
Logical structure analysis means for analyzing the logical structure of the document based on the layout extracted by the layout analysis means for each page of the document input via the document input means;
Metadata extraction means for extracting metadata relating to the table included in the document, based on the logical structure analyzed by the logical structure analysis means and on the element names of the rows and columns relating to the items of the table in the document;
Metadata storage means for associating and storing the table included in the document and the metadata extracted by the metadata extraction means;
Search condition input means for inputting a search condition for searching a table in the document;
Search means for searching for metadata associated with the table based on the input search condition;
Search result display means for displaying a table or a part of the table in the document searched by the search means as a search result;
A document processing apparatus comprising:

The document processing apparatus according to claim 16 or 17,
wherein the metadata extraction means extracts metadata about a table included in the document based on element names of rows and columns related to emphasized items in the table.

The document processing apparatus according to any one of claims 15 to 17,
further comprising metadata importance calculation means for calculating an importance for the metadata extracted by the metadata extraction means,
wherein the metadata storage means stores the metadata extracted by the metadata extraction means, the importance calculated by the metadata importance calculation means, and a table included in the document in association with each other, and
the search means calculates a similarity between a search condition and metadata based on the metadata and importance associated with the table.

The document processing apparatus according to claim 19,
further comprising element name weighting means for weighting the element names of the rows and columns of the table,
wherein the metadata extraction means extracts metadata related to a table included in the document based on element names of rows and columns of the table, and
the metadata importance calculation means calculates the importance of the metadata in accordance with the weights assigned to the element names by the element name weighting means.

The document processing apparatus according to claim 20,
wherein the element name weighting means weights element names of rows and columns of a table included in the document according to the presence or absence of emphasis on items in the table.

The document processing apparatus according to any one of claims 16 to 18,
wherein the metadata extraction means extracts metadata related to a table included in the document according to a relationship between words included in element names of rows and columns of the table and a relationship between words in sentences included in the document.

The document processing apparatus according to any one of claims 19 to 21,
wherein the metadata extraction means extracts metadata relating to a table included in the document according to a relationship between words included in element names of the rows and columns of the table and a relationship between words in sentences included in the document, and
the metadata importance calculation means calculates the importance of the metadata according to the relationship between words included in the element names of the rows and columns of the table and the relationship between words in sentences included in the document.

The document processing apparatus according to claim 15,
wherein the search means searches for metadata associated with the table according to a relationship between words included in the search condition and a relationship between words within the search condition included in the metadata.

The document processing apparatus according to claim 17, wherein the layout analysis unit extracts a spatial relationship between the regions.

The document processing apparatus according to claim 17, wherein the logical structure analyzing unit extracts logical attributes of pages and areas constituting the document, determines a reading order of the areas, and extracts the logical attributes as a logical structure.

The document processing apparatus according to claim 15, wherein the document is a paper document on which a book or an electronic document is printed.

The document processing apparatus according to claim 15, wherein the document is an electronic document.

In a document processing method that can search and utilize images and tables in a document by extracting the images and tables in the document and metadata related to the images and tables,
A step of analyzing in advance the document to be searched and extracting information characterizing each component included in the document, in order to search for document components such as images and tables in the document;
A metadata extraction step of extracting metadata relating to a table included in the document based on feature information of items in the table in the document;
A metadata storage step of associating and storing the table included in the document and the metadata extracted by the metadata extraction step;
A search condition input step for inputting a search condition for searching a table included in the document;
A search step of searching for metadata associated with the table based on the input search condition;
A search result display step of displaying a table or a part of the table in the document searched by the search step as a search result;
A document processing method comprising:

In a document processing method that can search and utilize images and tables in a document by extracting the images and tables in the document and metadata related to the images and tables,
A step of analyzing in advance the document to be searched and extracting information characterizing each component included in the document, in order to search for document components such as images and tables in the document;
A metadata extraction step of extracting metadata relating to a table included in the document based on element names of rows and columns relating to items of the table in the document;
A metadata storage step of associating and storing the table in the document and the metadata extracted by the metadata extraction step;
A search condition input step for inputting a search condition for searching the table in the document;
A search step of searching for metadata associated with the table based on the input search condition;
A search result display step of displaying a table or a part of the table in the document searched by the search step as a search result;
A document processing method comprising:

In a document processing method that can search for and utilize images and tables contained in a document by extracting the images and tables in the document and metadata related to the images and tables,
A document input step of inputting a document in which images, tables, and sentences are mixed;
An area extraction step of recognizing and extracting an image or table in the document and an area including a sentence from the document input through the document input step;
A layout analysis step of analyzing a layout related to the region extracted by the region extraction step in each page of the document input through the document input step;
A logical structure analysis step for analyzing the logical structure of the document based on the layout extracted by the layout analysis step for each page of the document input through the document input step;
A metadata extraction step of extracting metadata relating to the table included in the document, based on the logical structure analyzed by the logical structure analysis step and on the element names of the rows and columns relating to the items of the table in the document;
A metadata storage step of associating and storing the table included in the document and the metadata extracted by the metadata extraction step;
A search condition input step for inputting a search condition for searching the table in the document;
A search step of searching for metadata associated with the table based on the input search condition;
A search result display step of displaying a table or a part of the table in the document searched by the search step as a search result;
A document processing method comprising:

The document processing method according to claim 30,
wherein the metadata extraction step extracts metadata relating to a table included in the document based on element names of rows and columns relating to emphasized items in the table.

The document processing method according to claim 29,
further comprising a metadata importance calculation step of calculating an importance for the metadata extracted by the metadata extraction step,
wherein the metadata storage step stores the metadata extracted by the metadata extraction step, the importance calculated by the metadata importance calculation step, and a table included in the document in association with each other, and
the search step calculates a similarity between the search condition and the metadata based on the metadata and importance associated with the table.

The document processing method according to claim 33,
further comprising an element name weighting step of weighting the element names of the rows and columns of the table,
wherein the metadata extraction step extracts metadata related to a table included in the document based on element names of rows and columns of the table, and
the metadata importance calculation step calculates the importance of the metadata in accordance with the weights assigned to the element names in the element name weighting step.

The document processing method according to claim 34,
wherein the element name weighting step weights element names of rows and columns of a table included in the document according to the presence or absence of emphasis on items in the table.

The document processing method according to any one of claims 30 to 32,
wherein the metadata extraction step extracts metadata relating to a table included in the document according to a relationship between words included in element names of rows and columns of the table and a relationship between words in sentences included in the document.

The document processing method according to claim 33,
wherein the metadata extraction step extracts metadata relating to a table included in the document according to a relationship between words included in element names of rows and columns of the table and a relationship between words in sentences included in the document, and
the metadata importance calculation step calculates the importance of the metadata according to the relationship between words included in the element names of the rows and columns of the table and the relationship between words in sentences included in the document.

The document processing method according to any one of claims 29 to 37,
wherein the search step searches for metadata associated with the table according to a relationship between words included in the search condition and a relationship between words within the search condition included in the metadata.

The document processing method according to claim 31, wherein the layout analysis step extracts a spatial relationship between the regions.

The document processing method according to claim 31, wherein the logical structure analysis step extracts logical attributes of pages and areas constituting the document, determines a reading order of the areas, and extracts the logical attributes.

The document processing method according to claim 29, wherein the document is a paper document on which a book or an electronic document is printed.

The document processing method according to claim 29, wherein the document is an electronic document.

A recording medium used for the document processing system according to claim 1.