JP5393582B2

JP5393582B2 - Document management program, document management method, and document management apparatus

Info

Publication number: JP5393582B2
Application number: JP2010099964A
Authority: JP
Inventors: 純石井; 聡荻原
Original assignee: Fujitsu Frontech Ltd
Current assignee: Fujitsu Frontech Ltd
Priority date: 2010-04-23
Filing date: 2010-04-23
Publication date: 2014-01-22
Anticipated expiration: 2030-04-23
Also published as: JP2011232811A

Description

The present invention relates to a document management program, a document management method, and a document management apparatus for digitizing and managing documents.

Conventionally, there are document management apparatuses that digitize documents such as papers and forms and store them in a storage device for management. Such an apparatus provides a registration function and a search function for documents made up of multiple related papers and forms. At registration, the image data of the papers and forms is grouped into document units, a single index is assigned to each document, and the data is stored in the storage device. At search time, the index is used to retrieve the document-unit image data stored in the storage device.

However, some documents contain multiple indexes within a single document. For example, in a group insurance application, multiple policyholders are registered in one document, and that one document consists of the group insurance application papers and contract-related papers for each policyholder. In this case, it is preferable to assign indexes not only to the document as a whole but also to the papers corresponding to each policyholder, and to manage them accordingly. FIG. 18 is a diagram illustrating an example of conventional management of a document containing multiple indexes. The application document 910 to be managed includes an application form ABC 911, an attached document (A) 912, an attached document (B) 913, and an attached document (C) 914. The application form ABC 911 lists “policy number A”, “policy number B”, and “policy number C”, which serve as indexes identifying the respective attached documents: “policy number A” corresponds to the attached document (A) 912, “policy number B” to the attached document (B) 913, and “policy number C” to the attached document (C) 914. Hereinafter, a paper on which the indexes of attached documents are listed is referred to as a representative form, and an attached document associated with the representative form by an index listed on it is referred to as an incidental form.

In the conventional document management apparatus 900, the image data 931 of all the papers included in the application document 910 is stored in an image database (hereinafter, DB) 930, and a single index is assigned. In the example of FIG. 18, character recognition is performed on the image data of the application document 910, and “policy number A”, the first of the index items, is extracted and used as the index. The index “policy number A” is registered in the index DB 920 in association with the address at which the image data 931 is stored. To make a search by “policy number B” possible, an index may also need to be generated for “policy number B”. In this case, the operator has the document management apparatus 900 perform the same processing again using the image data of the application form ABC 911. The document management apparatus 900 stores the image data 932 of the application form ABC in the image DB 930 and again extracts the index “policy number A”, so the operator must manually change “policy number A” to “policy number B”. The operator repeats the same procedure for “policy number C”. As a result, the image DB 930 holds the complete image data 931 of the application document 910 associated with the index “policy number A”, the image data 932 of the application form ABC associated with the index “policy number B”, and the image data 933 of the application form ABC associated with the index “policy number C”.

For managing multimedia information, which is a collection of different types of information data such as image data, audio data, and text data, there is a method that facilitates searching by assigning to the multimedia information, instead of a single index, attribute indexes corresponding to the information data it contains (see, for example, Patent Document 1).

Patent Document 1: JP 2002-7418 A

Conventional document management has the problem that documents containing multiple indexes are not easy to manage.
As described above, to manage a document that contains multiple indexes under each of those indexes, the operator has to have the representative form read as many times as there are indexes to register and then reassign each extracted index, which places a heavy burden on the operator. In the example of FIG. 18, the application form ABC 911 had to be read again for “policy number B” and for “policy number C”, and the extracted index had to be reassigned each time. Because this is done manually, the likelihood of registration errors is also high. Furthermore, the storage device has to hold duplicate image data of the representative form, one copy per index, so as the number of indexes grows, this duplicated data increasingly strains the storage capacity.
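To make the storage overhead of the conventional scheme concrete, the following minimal Python sketch counts stored page images for the FIG. 18 approach versus storing the document once. It assumes each form is a fixed number of page images; the function names are illustrative only.

```python
def conventional_pages(rep_pages: int, incidental_pages: int, num_indexes: int) -> int:
    """Page images stored by the conventional scheme of FIG. 18.

    One complete copy of the document is stored under the first index, and
    one extra copy of the representative form is stored for every further index.
    """
    if num_indexes < 1:
        raise ValueError("at least one index is required")
    complete_copy = rep_pages + incidental_pages
    return complete_copy + (num_indexes - 1) * rep_pages


def single_copy_pages(rep_pages: int, incidental_pages: int) -> int:
    """Page images stored when the document is kept once and every index points to it."""
    return rep_pages + incidental_pages


if __name__ == "__main__":
    # FIG. 18 scenario, assuming one page per form: 1 representative + 3 attached, 3 indexes.
    print(conventional_pages(1, 3, 3))  # 6
    print(single_copy_pages(1, 3))      # 4
```

The overhead grows linearly with the number of indexes and with the size of the representative form.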

Searching has a problem as well: for an added index, the image data of the whole document cannot be obtained directly. In the example of FIG. 18, to check the attached document (B) for “policy number B”, the user first specifies “policy number B” and obtains the image data 932 of the application form ABC. That image data, however, contains no image of the attached documents. The user must therefore read “policy number A”, the first index, from the image data 932 of the application form ABC shown on the display device, and then specify “policy number A” to obtain the complete image data 931. Searching thus also requires cumbersome operations.

In view of these points, an object of the present invention is to provide a document management program, a document management method, and a document management apparatus that make it easier to manage documents containing multiple indexes.

To solve the above problems, there is provided a document management program that causes a computer to perform document management processing for digitizing and managing documents. The program causes the computer to function as a recognition unit, a registration unit, and a search unit. The recognition unit acquires image data obtained by digitizing a document that has a representative form and incidental forms attached to it, where the representative form lists indexes identifying the incidental forms associated with it. The recognition unit then performs character recognition on the image data of the representative form and extracts all of the indexes listed on it. The registration unit stores the acquired image data of the document in an image data storage unit and registers, in index management information, the address at which the document's image data is stored, in association with each index extracted from the document's representative form. When an index to be searched for is specified, the search unit searches the index management information based on the specified index and extracts the image data of the document corresponding to that index.
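The registration and search units described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: character recognition is assumed to have already produced the list of indexes, in-memory dictionaries stand in for the image data storage unit and the index management information, and indexes are assumed to be unique across documents. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SimpleIndexStore:
    """Hypothetical single-level sketch: every extracted index maps straight to an address."""

    images: Dict[int, List[bytes]] = field(default_factory=dict)  # image data storage unit
    index_info: Dict[str, int] = field(default_factory=dict)      # index management information
    next_address: int = field(default=0, init=False)

    def register(self, pages: List[bytes], indexes: List[str]) -> int:
        """Store the document image once and associate its address with every index."""
        address = self.next_address
        self.next_address += 1
        self.images[address] = pages
        for index in indexes:
            self.index_info[index] = address
        return address

    def search(self, index: str) -> Optional[List[bytes]]:
        """Return the whole document's image data for any of its indexes."""
        address = self.index_info.get(index)
        return None if address is None else self.images[address]
```

For the FIG. 18 document, registering the four pages once under “policy number A”, “policy number B”, and “policy number C” lets a search by any of the three return the complete document without storing the representative form more than once.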

To solve the above problems, there are also provided a document management method and a document management apparatus that execute the same processing procedure as a computer running the above document management program.

With the disclosed document management program, document management method, and document management apparatus, all indexes are extracted when a document is registered, and each extracted index is associated with the document's image data. Because all indexes are extracted from a single reading of the image data, the operator's work efficiency during registration improves. At search time, the document's image data can be obtained directly from any of its indexes, which also improves work efficiency during searching.

FIG. 1 is a conceptual diagram of the invention applied to the embodiments.
FIG. 2 is a diagram illustrating an example of the document management system of the first embodiment.
FIG. 3 is a diagram illustrating an example hardware configuration of the document management apparatus.
FIG. 4 is a block diagram showing the software configuration of the document management apparatus.
FIG. 5 is a diagram illustrating an example of a document.
FIG. 6 is a diagram illustrating the document registration process when the representative form is a comprehensive form.
FIG. 7 is a diagram illustrating the document registration process when the representative form is a normal form.
FIG. 8 is a diagram illustrating the relationship between indexes.
FIG. 9 is a diagram illustrating an example of index management information.
FIG. 10 is a diagram illustrating the search process when the representative form is a comprehensive form.
FIG. 11 is a diagram illustrating the search process when the representative form is a normal form.
FIG. 12 is a flowchart showing the reading and recognition procedure performed by the document management apparatus at document registration.
FIG. 13 is a flowchart showing the inspection and registration procedure performed by the document management apparatus at document registration.
FIG. 14 is a flowchart showing the search procedure of the document management apparatus.
FIG. 15 is a diagram illustrating the document registration process of the second embodiment when the representative form is a comprehensive form.
FIG. 16 is a diagram illustrating an example of the index management information of the second embodiment.
FIG. 17 is a diagram illustrating the document search process of the second embodiment when the representative form is a comprehensive form.
FIG. 18 is a diagram illustrating an example of conventional management of a document containing multiple indexes.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a conceptual diagram of the invention applied to the embodiments.
The document management apparatus 1 includes a reading unit 1a, a recognition unit 1b, a registration unit 1c, and a search unit 1d. It performs registration-side processing, which generates indexes and registers the image data of a document 5, and search-side processing, which searches for the image data of a registered document 5.

The document 5 to be managed has a representative form 5a and N incidental forms 5b (N is an integer, N ≥ 0) attached to the representative form 5a. Here, a form means a document in which information is entered in a predetermined format. Information identifying the incidental forms 5b is listed on the representative form 5a as indexes. Representative forms 5a are of two kinds: a normal form, for which one index is set for the document 5, and a comprehensive form, for which multiple indexes are set. With a normal form, the document 5 has a representative form 5a listing one index and one incidental form 5b related to that index. With a comprehensive form, the document 5 has a representative form 5a listing M indexes (M is an integer, M ≥ 0) and M types of incidental forms 5b, each associated with one of those indexes.
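The document structure above can be modeled as a small Python data type. This sketch is illustrative only: the names are hypothetical, and the check that indexes on one representative form are distinct is an added assumption, not something the text states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RepresentativeKind(Enum):
    NORMAL = "normal"                # one index for the document
    COMPREHENSIVE = "comprehensive"  # multiple indexes for the document


@dataclass
class Document:
    kind: RepresentativeKind
    representative_image: bytes
    indexes: List[str]  # indexes listed on the representative form (M >= 0)
    incidental_images: List[bytes] = field(default_factory=list)  # N >= 0 incidental forms

    def __post_init__(self) -> None:
        if self.kind is RepresentativeKind.NORMAL and len(self.indexes) != 1:
            raise ValueError("a normal form carries exactly one index")
        # Assumption for this sketch: each index identifies a different incidental form.
        if len(set(self.indexes)) != len(self.indexes):
            raise ValueError("indexes on one representative form must be distinct")
```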

The reading unit 1a, for example, controls a scanner to read the images of the document 5 to be managed and generates their image data. It reads the images of the representative form 5a and the incidental forms 5b in the document 5 one after another and outputs the image data to the recognition unit 1b in the order read. The document 5 is assumed to be arranged with the representative form 5a first, followed by the incidental forms 5b in the order in which their indexes are listed on the representative form 5a. The image data is therefore output in the order: the representative form 5a, the incidental form corresponding to the first listed index, the incidental form corresponding to the next listed index, and so on. The reading unit 1a may instead receive, from outside, image data of a document 5 that has already been digitized.
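Because every document begins with its representative form, a continuous stream of scanned pages can be split into document units at each representative form. The Python sketch below assumes each page already carries a form-type label (in the apparatus, the form type is determined by the recognition unit); the labels `"representative"` and `"incidental"` and the function name are hypothetical.

```python
from typing import Iterable, List, Tuple

Page = Tuple[str, bytes]  # (form-type label, image data); labels are hypothetical


def split_into_documents(pages: Iterable[Page]) -> List[List[Page]]:
    """Group pages in reading order into document units, one per representative form."""
    documents: List[List[Page]] = []
    for form_type, image in pages:
        if form_type == "representative":
            documents.append([(form_type, image)])  # a representative form starts a new document
        elif documents:
            documents[-1].append((form_type, image))  # incidental form joins the current document
        else:
            raise ValueError("incidental form encountered before any representative form")
    return documents
```

Within each resulting unit, the representative form stays first and the incidental forms keep their index order, matching the layout of the document-unit image data.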

The recognition unit 1b extracts indexes from the image data of the representative form 5a within the image data of the document 5 acquired by the reading unit 1a. It determines the form type of each piece of image data received from the reading unit 1a and, if it is the representative form 5a, extracts the indexes: one index if the representative form 5a is a normal form, and all indexes listed on it if it is a comprehensive form. For example, the region in which an index is entered is located using area definitions predetermined for each form, and the image data of that region is extracted. Character recognition is then applied to the extracted image data, and the resulting string is taken as the index. The image data acquired from the reading unit 1a and the extracted indexes are output to the registration unit 1c.
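The area-definition approach can be sketched as follows. The region coordinates are made-up placeholders, and `ocr` is a caller-supplied function standing in for whatever character-recognition engine is used; no particular OCR library is implied. Skipping blank regions is an assumption for comprehensive forms whose index slots are not all filled.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); hypothetical pixel coordinates

# Hypothetical area definitions: representative-form type -> regions where indexes are entered.
AREA_DEFINITIONS: Dict[str, List[Box]] = {
    "normal": [(100, 50, 400, 80)],
    "comprehensive": [
        (100, 200, 400, 230),
        (100, 240, 400, 270),
        (100, 280, 400, 310),
    ],
}


def extract_indexes(image: object, form_type: str, ocr: Callable[[object, Box], str]) -> List[str]:
    """Run character recognition on each predefined region and collect non-blank results."""
    indexes: List[str] = []
    for box in AREA_DEFINITIONS[form_type]:
        text = ocr(image, box).strip()
        if text:  # assumption: unused index slots on a comprehensive form are left blank
            indexes.append(text)
    return indexes
```

Passing the recognizer in as a parameter keeps the region logic testable with a stub, as in a unit test that fakes the OCR result per region.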

The registration unit 1c groups the image data read by the reading unit 1a into document units, stores the document-unit image data 3c in the storage device 3, and generates and manages individual indexes and document indexes. Within the document-unit image data 3c, the representative form image data comes first, followed by the incidental form image data, in reading order. A document index is identification information that identifies a document and is associated with the storage area of one document's worth of image data. In the example of FIG. 1, a document index identifying the document 5 is generated and associated with the address of the image data 3c of the document 5 stored in the storage device 3. The generated document index and the address of the associated document-unit image data 3c of the document 5 are registered in the document index management information 3b. Document indexes are handled in the same way whether the representative form 5a is a normal form or a comprehensive form. An individual index is an index listed on the representative form 5a and extracted by the recognition unit 1b. Because the form image data is stored as a single set of document-unit image data 3c, the document-unit image data 3c is what gets retrieved, whichever index is used in a search. The individual indexes are therefore tied to the storage address of the document-unit image data 3c. Since the document index is already associated with the image data 3c, each individual index is associated with the document index of the document on whose representative form it is listed. In other words, the registration unit 1c associates the document index with each individual index extracted by the recognition unit 1b and registers them in the individual index management information 3a.
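The two-level structure of the individual index management information 3a, the document index management information 3b, and the document-unit image data 3c can be sketched with Python dictionaries. This is a hypothetical illustration: the class name, the `DOC` prefix format of generated document indexes, and the use of sequential integer addresses are all arbitrary choices, and individual indexes are assumed to be unique across documents.

```python
from typing import Dict, List


class TwoLevelIndex:
    """Hypothetical sketch of the tables 3a, 3b and the storage 3c in FIG. 1."""

    def __init__(self) -> None:
        self.individual_index: Dict[str, str] = {}  # 3a: individual index -> document index
        self.document_index: Dict[str, int] = {}    # 3b: document index -> storage address
        self.storage: Dict[int, List[bytes]] = {}   # 3c: address -> document-unit image data

    def register(self, pages: List[bytes], indexes: List[str]) -> str:
        # Store the document-unit image data once.
        address = len(self.storage)
        self.storage[address] = pages
        # Generate a document index (the format is an arbitrary choice for this sketch).
        doc_index = f"DOC{address:06d}"
        self.document_index[doc_index] = address
        # Every individual index on the representative form points at the document index.
        for index in indexes:
            self.individual_index[index] = doc_index
        return doc_index
```

Routing every individual index through one document index means the storage address is recorded in exactly one place per document, however many individual indexes that document has.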

When the user or another party specifies an index, the search unit 1d extracts the image data of the document containing the incidental form corresponding to that index and provides it to the user. First, it searches the individual index management information 3a based on the specified index and finds the document index corresponding to it. Next, it searches the document index management information 3b based on the found document index and finds the address at which the document-unit image data of that document is stored. Finally, based on the found address, it reads the document-unit image data 3c, which includes the incidental form for the specified index, and provides it to the requester as document image data 7.
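The three lookup steps of the search unit can be sketched as a single Python function over dictionaries that stand in for 3a, 3b, and 3c. The function name and the in-memory representation are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def search_document(
    index: str,
    individual_index: Dict[str, str],  # 3a: individual index -> document index
    document_index: Dict[str, int],    # 3b: document index -> storage address
    storage: Dict[int, List[bytes]],   # 3c: address -> document-unit image data
) -> Optional[List[bytes]]:
    # Step 1: find the document index for the specified individual index.
    doc_index = individual_index.get(index)
    if doc_index is None:
        return None
    # Step 2: find the address where that document's image data is stored.
    address = document_index.get(doc_index)
    if address is None:
        return None
    # Step 3: read the document-unit image data, which includes the requested incidental form.
    return storage.get(address)
```

Any individual index of the same document resolves to the same document-unit image data, so a single search returns the representative form together with all incidental forms.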

The operation of the document management apparatus 1 is described next. In document registration, when a document 5 having a representative form 5a and incidental forms 5b is specified, the reading unit 1a acquires the image data of each form of the document 5 in turn and outputs it to the recognition unit 1b. The image data is input in the order of the forms, starting with the image data of the representative form 5a. The recognition unit 1b determines the form type of each piece of image data as it arrives and, for the representative form 5a, extracts the indexes: one index for a normal form, and every listed index for a comprehensive form. Each index is extracted by taking the image data of a predefined region and performing character recognition on it. The registration unit 1c stores the document-unit image data 3c, which combines the image data of the document 5 read by the reading unit 1a, in a predetermined storage area of the storage device 3. It then registers, in the document index management information 3b, the document index identifying the document 5 in association with the address at which the document-unit image data 3c is stored. Next, it generates individual indexes from the indexes listed on the representative form 5a as recognized by the recognition unit 1b, associates each with the document index of the document 5, and registers them in the individual index management information 3a.

A document search starts when an index is specified. The search unit 1d searches the individual index management information 3a based on the specified index and finds the document index of the document that contains the incidental form for that index. It then searches the document index management information 3b based on the found document index and finds the address at which the document-unit image data 3c of that document is stored. Based on that address, it extracts the image data 3c stored in the storage device 3 and outputs it to the requester as the document image data 7.

In this way, at registration, all indexes listed on the representative form 5a are read at once, and the document-unit image data is associated with each recognized individual index. Even for a comprehensive form, the representative form 5a no longer has to be read repeatedly, which reduces the operator's burden and improves work efficiency. In addition, because the multiple individual indexes set for a document are all associated with the document's image data, specifying any one individual index at search time retrieves the image data of the target document. Searches therefore no longer have to be repeated to obtain the desired data, making searching more efficient. As a result, registering and searching documents that contain multiple indexes becomes more efficient, and document management becomes easier.

Next, a detailed description is given with reference to the drawings, taking as an example the case where the document management apparatus shown in FIG. 1 is applied to the management of insurance application papers and their attached documents.
FIG. 2 is a diagram illustrating an example of the document management system of the first embodiment. The document management system includes a document management apparatus 100, a scanner 200, a document data storage device 300, and a monitor 600, and digitizes and manages a target document 500.

The document management apparatus 100 acquires the image data of the document 500 read by the scanner 200, assigns indexes and registers them in the index management information, and stores them together with the image data of the document 500 in the document data storage device 300. In response to a search request, it also searches for and extracts the image data of documents stored in the document data storage device 300. The scanner 200 reads the representative form and incidental forms of the document 500 in sequence, generates image data in reading order, and outputs it to the document management apparatus 100. The document data storage device 300 stores the index management information and the image data of the document 500. The monitor 600 displays data as instructed by the document management apparatus 100.

Next, the document management apparatus 100 is described. FIG. 3 is a diagram illustrating an example hardware configuration of the document management apparatus. The document management apparatus 100 as a whole is controlled by a CPU (Central Processing Unit) 101. A RAM (Random Access Memory) 102 and multiple peripheral devices are connected to the CPU 101 via a bus 109.

The RAM 102 is used as the main storage device of the document management apparatus 100. It temporarily stores at least part of the OS (Operating System) program and application programs executed by the CPU 101, as well as various data needed for processing by the CPU 101.

The peripheral devices connected to the bus 109 include a hard disk drive (HDD) 103, a graphics processing device 104, an input interface 105, an optical drive device 106, a scanner control device 107, and a communication interface 108.

The HDD 103 magnetically writes data to and reads data from its built-in disk. It is used as the secondary storage device of the document management apparatus 100 and stores the OS program, application programs, and various data. The HDD 103 may also serve as the document data storage device 300. A semiconductor storage device such as flash memory can also be used as the secondary storage device.

A monitor 600 is connected to the graphics processing device 104, which displays images on the screen of the monitor 600 in accordance with commands from the CPU 101. The monitor 600 may be, for example, a CRT (Cathode Ray Tube) display or a liquid crystal display.

A keyboard 601 and a mouse 602 are connected to the input interface 105, which sends signals from the keyboard 601 and the mouse 602 to the CPU 101. The mouse 602 is one example of a pointing device; other pointing devices, such as a touch panel, tablet, touch pad, or trackball, can also be used.

The optical drive device 106 reads data recorded on an optical disc 603 using laser light or the like. The optical disc 603 is a portable recording medium on which data is recorded so that it can be read by the reflection of light. Examples include a DVD (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc Read Only Memory), and CD-R (Recordable)/RW (ReWritable).

The scanner control device 107 is connected to the scanner 200. In accordance with instructions from the CPU 101, it instructs the scanner 200 to read images and sends the image data acquired from the scanner 200 to the CPU 101.

通信インタフェース１０８は、ネットワーク６０４に接続されている。通信インタフェース１０８は、ネットワーク６０４を介して、他のコンピュータまたは通信機器との間でデータの送受信を行う。 The communication interface 108 is connected to the network 604. The communication interface 108 transmits and receives data to and from other computers or communication devices via the network 604.

また、文書データ記憶装置３００を外部に設けるときは、内部バス１０９に接続するインタフェースを介して、または、ネットワーク６０４を介して文書管理装置１００に接続させることができる。以上のようなハードウェア構成によって、本実施の形態の処理機能を実現することができる。 When the document data storage device 300 is provided externally, it can be connected to the document management device 100 via an interface connected to the internal bus 109 or via the network 604. With the hardware configuration as described above, the processing functions of the present embodiment can be realized.

次に、文書管理装置１００のソフトウェア構成について説明する。図４は、文書管理装置のソフトウェア構成を示したブロック図である。文書管理装置１００は、読取部１１０、認識部１２０、点検部１３０、登録部１４０、検索部１５０及び表示部１６０を有する。また、文書データ記憶装置３００には、個別インデックスＤＢ３１０、文書インデックスＤＢ３２０及びイメージＤＢ３３０が設けられている。 Next, the software configuration of the document management apparatus 100 will be described. FIG. 4 is a block diagram showing a software configuration of the document management apparatus. The document management apparatus 100 includes a reading unit 110, a recognition unit 120, an inspection unit 130, a registration unit 140, a search unit 150, and a display unit 160. The document data storage device 300 is provided with an individual index DB 310, a document index DB 320, and an image DB 330.

読取部１１０は、スキャナ２００の動作を制御し、スキャナ２００が読み取った文書５００のイメージデータを取得する。取得したイメージデータは、読み取った順に認識部１２０に出力する。 The reading unit 110 controls the operation of the scanner 200 and acquires image data of the document 500 read by the scanner 200. The acquired image data is output to the recognition unit 120 in the order of reading.

認識部１２０は、帳票判別部１２１及び文字認識部１２２を有する。帳票判別部１２１は、読取部１１０から入力する帳票の種別が、代表帳票であるのか付帯帳票であるのか、また代表帳票であれば、通常文書であるのか、包括帳票であるのかを判別する。例えば、帳票上の所定の位置に記載された帳票種別を読み出して判別する。また、予め各帳票の特徴を定義しておき、読み取ったイメージデータから得られる特徴と照合して帳票を判別するとしてもよい。イメージデータが、付帯帳票であれば、そのまま点検部１３０へ出力する。代表帳票を検出したときは、代表帳票が通常帳票か包括帳票かを判別し、判別結果とともに文字認識部１２２へ出力する。文字認識部１２２は、代表帳票のイメージデータに記載されているインデックスを抽出する。インデックスは、代表帳票に添付される付帯帳票を識別する情報で、代表帳票に記載されている項目のうち、どれをインデックスとするかは予め決められている。また、インデックスが記載されている領域も予め定義されている。代表帳票が通常帳票の場合、予め定義されている領域のイメージデータに文字認識処理を行い、１個のインデックスを抽出する。代表帳票が包括帳票の場合は、予め定義されている領域のイメージデータに対し文字認識処理を施し、Ｍ個のインデックスを抽出する。代表帳票のイメージデータと、抽出したインデックスは、点検部１３０に出力する。 The recognition unit 120 includes a form determination unit 121 and a character recognition unit 122. The form determination unit 121 determines whether each form input from the reading unit 110 is a representative form or an incidental form and, if it is a representative form, whether it is a normal form or a comprehensive form. For example, the form type printed at a predetermined position on the form is read to make this determination. Alternatively, the features of each form may be defined in advance and compared with features obtained from the read image data to identify the form. Image data of an incidental form is output to the inspection unit 130 as it is. When a representative form is detected, the form determination unit 121 determines whether it is a normal form or a comprehensive form and outputs it, together with the determination result, to the character recognition unit 122. The character recognition unit 122 extracts the indexes written on the representative form. An index is information identifying an incidental form attached to the representative form; which of the items on the representative form serve as indexes is determined in advance, as is the area in which the indexes are written. When the representative form is a normal form, character recognition is performed on the image data of the predefined area and one index is extracted. When the representative form is a comprehensive form, character recognition is performed on the image data of the predefined area and M indexes are extracted. The image data of the representative form and the extracted indexes are output to the inspection unit 130.
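The index extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `FormKind` enum, the pixel boxes in `INDEX_AREAS`, the fixed three index rows on a comprehensive form, and the `ocr` callable (a stand-in for the character recognition step) are all hypothetical assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class FormKind(Enum):
    NORMAL = "normal"                # representative form carrying one index
    COMPREHENSIVE = "comprehensive"  # representative form carrying M indexes
    INCIDENTAL = "incidental"        # attached document: no character recognition


# Hypothetical predefined index areas, as (left, top, right, bottom) boxes.
# Assumption: a comprehensive form has a fixed number of index rows (3 here).
INDEX_AREAS = {
    FormKind.NORMAL: [(100, 50, 300, 80)],
    FormKind.COMPREHENSIVE: [(100, 50, 300, 80), (100, 90, 300, 120), (100, 130, 300, 160)],
}


def extract_indexes(kind, ocr):
    """Return the indexes written on one page.

    `ocr` is a hypothetical stand-in for character recognition: any callable
    mapping a box to its recognized text ("" if the box is blank).
    """
    if kind is FormKind.INCIDENTAL:
        return []  # incidental forms are passed on without recognition
    texts = (ocr(box).strip() for box in INDEX_AREAS[kind])
    # Assumption: blank rows on a comprehensive form are simply skipped.
    return [t for t in texts if t]
```

For instance, with a fake recognizer that returns "A", "B", and "C" for the three rows, `extract_indexes(FormKind.COMPREHENSIVE, ocr)` yields `["A", "B", "C"]`, while a normal form only ever yields its single index.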

点検部１３０は、読取部１１０が読み取った文書５００のイメージデータ、認識部１２０が抽出したインデックス等を表示部１６０に表示し、オペレータによる点検を受ける。オペレータは、表示部１６０に表示されたイメージデータを見て、読み取った文書の順序や、読み取り誤りをチェックする。また、認識部１２０が抽出したインデックスを確認し、正しく文字認識ができたか、抜けはないか等を確認する。こうしたオペレータの点検を経たイメージデータ及びインデックスは、登録部１４０に出力する。 The inspection unit 130 displays the image data of the document 500 read by the reading unit 110, the indexes extracted by the recognition unit 120, and the like on the display unit 160 for inspection by the operator. Looking at the image data displayed on the display unit 160, the operator checks the order of the read documents and any reading errors. The operator also reviews the indexes extracted by the recognition unit 120 to confirm that the characters were recognized correctly and that no index is missing. The image data and indexes that have passed the operator's inspection are output to the registration unit 140.

登録部１４０は、オペレータによる点検が終了した文書のイメージデータを文書単位に分割してイメージＤＢに格納する。また、文書インデックスと個別インデックスを生成し、それぞれ文書インデックスＤＢ３２０、個別インデックスＤＢ３１０に格納する。 The registration unit 140 divides the image data of the documents that have passed the operator's inspection into document units and stores them in the image DB 330. It also generates document indexes and individual indexes and stores them in the document index DB 320 and the individual index DB 310, respectively.

検索部１５０は、イメージＤＢ３３０に格納した文書のイメージデータの検索依頼を受け取ると、指定された検索キーワードを用いて個別インデックスＤＢ３１０、文書インデックスＤＢ３２０を検索し、指定されたキーワードに対応する文書のイメージデータを抽出する。そして、抽出したイメージデータを、例えば、表示部１６０に表示する。なお、抽出したイメージデータは、印刷装置に出力して印刷したり、外部記憶装置に出力したりすることもできる。 Upon receiving a search request for the image data of a document stored in the image DB 330, the search unit 150 searches the individual index DB 310 and the document index DB 320 using the specified search keyword and extracts the image data of the document corresponding to that keyword. The extracted image data is then displayed, for example, on the display unit 160. It can also be output to a printing device for printing or to an external storage device.

表示部１６０は、点検部１３０及び検索部１５０からの指示に応じて、イメージデータに基づく文書５００の画像や、インデックス等を表示画面に表示する。 The display unit 160 displays an image of the document 500 based on the image data, the indexes, and the like on the display screen in accordance with instructions from the inspection unit 130 and the search unit 150.

個別インデックスＤＢ３１０は、イメージＤＢ３３０に格納した文書のイメージデータ検索の際に用いる個別インデックス管理情報を管理する。 The individual index DB 310 manages the individual index management information used when searching for the image data of documents stored in the image DB 330.

文書インデックスＤＢ３２０は、１文書を識別する文書インデックスと、この文書に対するイメージデータが格納されるアドレスとを対応づけた文書インデックス管理情報を管理する。 The document index DB 320 manages document index management information in which a document index for identifying one document is associated with an address where image data for the document is stored.

イメージＤＢ３３０には、文書単位のイメージデータが格納される。 The image DB 330 stores image data in units of documents.

以下、文書５００の具体例を用いて、文書管理装置１００の動作を説明する。図５は、文書の一例を示した図である。文書管理装置１００は、文書ＡＢＣ５１０、文書Ｄ５２０、文書Ｅ５３０及び文書Ｆ５４０のイメージデータの登録及び管理を行う。図５に示した文書は、証券番号をインデックスとして文書管理を行うとする。 The operation of the document management apparatus 100 is described below using a specific example of the document 500. FIG. 5 is a diagram showing an example of documents. The document management apparatus 100 registers and manages the image data of the document ABC 510, the document D 520, the document E 530, and the document F 540. The documents shown in FIG. 5 are assumed to be managed using the securities number as the index.

文書ＡＢＣ５１０は、代表帳票である申込書類ＡＢＣ５１１と、付帯帳票である添付書類（Ａ）５１２、添付書類（Ｂ）５１３及び添付書類（Ｃ）５１４を有する。申込書類ＡＢＣ５１１は、包括帳票であり、複数の契約者の証券番号として「Ａ」、「Ｂ」、「Ｃ」が記載されている。この証券番号は、申込書類ＡＢＣ５１１に添付される各契約者の添付書類（Ａ）５１２、添付書類（Ｂ）５１３及び添付書類（Ｃ）５１４を識別するための識別情報でもある。例えば、文書ＡＢＣ５１０は、申込書類ＡＢＣ５１１を先頭として、申込書類ＡＢＣ５１１に記載された証券番号の並び順に、各証券番号に対応する添付書類が並べられている。図５の例では、証券番号について、「Ａ」に対応する添付書類（Ａ）５１２、「Ｂ」に対応する添付書類（Ｂ）５１３及び「Ｃ」に対応する添付書類（Ｃ）５１４を有する。 The document ABC 510 includes the application document ABC 511, which is a representative form, and the attached documents (A) 512, (B) 513, and (C) 514, which are incidental forms. The application document ABC 511 is a comprehensive form on which “A”, “B”, and “C” are written as the securities numbers of a plurality of contractors. These securities numbers also serve as identification information for each contractor's attached documents (A) 512, (B) 513, and (C) 514, which are attached to the application document ABC 511. For example, the document ABC 510 begins with the application document ABC 511, followed by the attached documents corresponding to the securities numbers, in the order in which those numbers are listed on the application document ABC 511. In the example of FIG. 5, the document has the attached document (A) 512 corresponding to “A”, the attached document (B) 513 corresponding to “B”, and the attached document (C) 514 corresponding to “C”.
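The example ordering convention (attachments follow the comprehensive form in the same order as the securities numbers listed on it) can be illustrated with a small sketch. It assumes, hypothetically, exactly one attachment per listed number; the function name `pair_attachments` and the string placeholders for page images are illustrative only. Note that the apparatus itself stores the whole document as one unit, so this pairing is only a way to see the convention, not a step the text requires.

```python
def pair_attachments(indexes, attachments):
    """Pair each securities number on a comprehensive form with its attachment.

    Assumption (hypothetical): attachments follow the representative form in
    the same order as the numbers listed on it, one attachment per number.
    """
    if len(indexes) != len(attachments):
        raise ValueError(f"{len(indexes)} indexes but {len(attachments)} attachments")
    # zip preserves order, so the i-th number maps to the i-th attachment
    return dict(zip(indexes, attachments))


print(pair_attachments(["A", "B", "C"], ["att_A", "att_B", "att_C"]))
```

A count mismatch raises `ValueError` rather than silently mispairing, which is the kind of discrepancy the operator inspection would otherwise have to catch.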

文書Ｄ５２０は、代表帳票である申込書類Ｄ５２１と、付帯帳票である添付書類（Ｄ）５２２を有する。申込書類Ｄ５２１は、通常帳票であり、１名の契約者の証券番号「Ｄ」のみが記載されている。また、「Ｄ」に対応する添付書類（Ｄ）５２２を有する。文書Ｅ５３０は、代表帳票である申込書類Ｅ５３１と、付帯帳票である添付書類（Ｅ）５３２を有する。文書Ｆ５４０は、代表帳票である申込書類Ｆ５４１と、付帯帳票である添付書類（Ｆ）５４２を有する。 The document D520 includes the application document D521, which is a representative form, and the attached document (D) 522, which is an incidental form. The application document D521 is a normal form on which only the securities number “D” of a single contractor is written, and the attached document (D) 522 corresponds to “D”. The document E530 includes the application document E531, which is a representative form, and the attached document (E) 532, which is an incidental form. The document F540 includes the application document F541, which is a representative form, and the attached document (F) 542, which is an incidental form.

次に、文書管理装置１００の文書ＡＢＣ５１０、文書Ｄ５２０、文書Ｅ５３０及び文書Ｆ５４０の登録処理について説明する。まず、包括帳票を有する文書ＡＢＣ５１０の登録処理について説明し、続いて通常帳票を有する文書Ｄ５２０、文書Ｅ５３０及び文書Ｆ５４０の登録処理について説明する。 Next, the registration processing performed by the document management apparatus 100 for the documents ABC 510, D 520, E 530, and F 540 will be described. First, registration of the document ABC 510, which has a comprehensive form, is described, followed by registration of the documents D 520, E 530, and F 540, which have normal forms.

図６は、代表帳票が包括帳票の場合の文書登録処理を示した図である。文書ＡＢＣ５１０は、代表帳票が包括帳票の文書であり、代表帳票である申込書類ＡＢＣ５１１と、添付書類（Ａ）５１２、添付書類（Ｂ）５１３、及び添付書類（Ｃ）５１４を有する。 FIG. 6 is a diagram showing document registration processing when the representative form is a comprehensive form. The document ABC 510 is a document in which a representative form is a comprehensive form, and includes an application document ABC 511 that is a representative form, an attached document (A) 512, an attached document (B) 513, and an attached document (C) 514.

読取部１１０は、申込書類ＡＢＣ５１１、添付書類（Ａ）５１２、添付書類（Ｂ）５１３、添付書類（Ｃ）５１４の順に画像を読み取り、そのイメージデータを生成する。認識部１２０は、読取部１１０が読み取ったイメージデータの帳票を判別し、帳票が代表帳票であるときは、インデックスを抽出する。図６の例では、まず、申込書類ＡＢＣ５１１を代表帳票かつ包括帳票と認識し、Ｎ個のインデックスを抽出する。ここでは、証券番号について「Ａ」、「Ｂ」、「Ｃ」の３個のインデックス１２０１を抽出する。添付書類（Ａ）５１２、添付書類（Ｂ）５１３及び添付書類（Ｃ）５１４は、付帯帳票と判別し、認識処理は行わない。 The reading unit 110 reads the images of the application document ABC 511, the attached document (A) 512, the attached document (B) 513, and the attached document (C) 514 in this order and generates their image data. The recognition unit 120 determines the form type of each piece of image data read by the reading unit 110 and extracts indexes when the form is a representative form. In the example of FIG. 6, the application document ABC 511 is first recognized as a representative form that is also a comprehensive form, and N indexes are extracted; here, the three securities numbers “A”, “B”, and “C” are extracted as the indexes 1201. The attached documents (A) 512, (B) 513, and (C) 514 are determined to be incidental forms, and no recognition processing is performed on them.

各帳票のイメージデータと、インデックス１２０１は、点検部１３０に引き渡される。点検部１３０は、イメージデータを表示部１６０に表示し、オペレータの点検を待つ。インデックス１２０１も表示部１６０に表示する。オペレータの点検を受けた申込書類ＡＢＣのイメージデータ５１１１、添付書類（Ａ）のイメージデータ５１２１、添付書類（Ｂ）のイメージデータ５１３１、添付書類（Ｃ）のイメージデータ５１４１及びインデックス１２０１は、登録部１４０へ引き渡される。 The image data of each form and the indexes 1201 are passed to the inspection unit 130. The inspection unit 130 displays the image data, together with the indexes 1201, on the display unit 160 and waits for the operator's inspection. After the inspection, the image data 5111 of the application document ABC, the image data 5121 of the attached document (A), the image data 5131 of the attached document (B), the image data 5141 of the attached document (C), and the indexes 1201 are passed to the registration unit 140.

登録部１４０は、イメージデータを文書単位に分割し、文書単位のイメージデータをイメージＤＢ３３０に格納する。図６の例では、申込書類ＡＢＣのイメージデータ５１１１、添付書類（Ａ）のイメージデータ５１２１、添付書類（Ｂ）のイメージデータ５１３１及び添付書類（Ｃ）のイメージデータ５１４１をこの順に配列し、イメージＤＢ３３０に格納する。また、文書ＡＢＣ５１０に関する文書インデックス「ＩＮＤＥＸ−Ａ」を生成し、イメージＤＢ３３０の文書ＡＢＣ５１０のイメージデータの格納アドレスに対応付け、文書インデックス管理情報に登録し、文書インデックスＤＢ３２０に設定する。そして、申込書類ＡＢＣ５１１から抽出したインデックス１２０１に基づいて、個別インデックスを生成する。図６の例では、認識部１２０が抽出したインデックス１２０１の「Ａ」、「Ｂ」、「Ｃ」をそれぞれ文書５１０のイメージデータを示す文書インデックス「ＩＮＤＥＸ−Ａ」に対応付けて個別インデックス管理情報に登録し、個別インデックスＤＢ３１０に設定する。なお、個別インデックスを先に生成し、後から文書インデックスを生成することもできる。 The registration unit 140 divides the image data into document units and stores the image data of each document unit in the image DB 330. In the example of FIG. 6, the image data 5111 of the application document ABC, the image data 5121 of the attached document (A), the image data 5131 of the attached document (B), and the image data 5141 of the attached document (C) are arranged in this order and stored in the image DB 330. The registration unit 140 also generates a document index “INDEX-A” for the document ABC 510, associates it with the storage address of the image data of the document ABC 510 in the image DB 330, registers it in the document index management information, and sets it in the document index DB 320. It then generates individual indexes based on the indexes 1201 extracted from the application document ABC 511. In the example of FIG. 6, “A”, “B”, and “C” of the indexes 1201 extracted by the recognition unit 120 are each associated with the document index “INDEX-A”, which points to the image data of the document ABC 510, registered in the individual index management information, and set in the individual index DB 310. The individual indexes may also be generated first and the document index afterward.
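The registration step for a comprehensive form can be sketched with in-memory stand-ins for the three databases. This is a minimal sketch under stated assumptions: `DocumentStore`, the use of a list position as the "storage address", and the uniqueness check on securities numbers are hypothetical choices, not details given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStore:
    # Stand-in for image DB 330: one entry (list of page images) per document.
    image_db: list = field(default_factory=list)
    # Stand-in for document index DB 320: document index -> storage address.
    document_index: dict = field(default_factory=dict)
    # Stand-in for individual index DB 310: individual index -> document index.
    individual_index: dict = field(default_factory=dict)

    def register(self, doc_index, pages, indexes):
        """Store one document's pages once and link every extracted index to it."""
        address = len(self.image_db)        # hypothetical address: position in image_db
        self.image_db.append(list(pages))   # pages kept in scan order
        self.document_index[doc_index] = address
        for idx in indexes:
            # Assumption: each securities number is registered only once.
            if idx in self.individual_index:
                raise ValueError(f"index {idx!r} already registered")
            self.individual_index[idx] = doc_index
        return address


store = DocumentStore()
store.register("INDEX-A", ["ABC", "att_A", "att_B", "att_C"], ["A", "B", "C"])
```

After this call the four page images exist once in `image_db`, while `individual_index` maps all three of "A", "B", and "C" to "INDEX-A".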

図７は、代表帳票が通常帳票の場合の文書登録処理を示した図である。文書Ｄ５２０及び文書Ｅ５３０は、代表帳票が通常帳票の文書であり、それぞれ申込書類Ｄ５２１と添付書類（Ｄ）５２２、申込書類Ｅ５３１と添付書類（Ｅ）５３２を有する。図７の例では省略しているが、文書Ｆ５４０についても同様である。 FIG. 7 is a diagram showing document registration processing when the representative form is a normal form. The document D520 and the document E530 are documents in which the representative form is a normal form, and includes an application document D521 and an attached document (D) 522, and an application document E531 and an attached document (E) 532, respectively. Although omitted in the example of FIG. 7, the same applies to the document F540.

読取部１１０は、申込書類Ｄ５２１、添付書類（Ｄ）５２２、申込書類Ｅ５３１、添付書類（Ｅ）５３２の順に画像を読み取り、そのイメージデータを生成する。認識部１２０は、読取部１１０が読み取ったイメージデータの帳票を判別し、帳票が代表帳票であるときは、インデックスを抽出する。図７の例では、まず、申込書類Ｄ５２１を代表帳票かつ通常帳票と認識し、１個のインデックスを抽出する。ここでは、証券番号について「Ｄ」というインデックス１２０２を抽出する。添付書類（Ｄ）５２２については、付帯帳票と判別し、認識処理は行わない。次の申込書類Ｅ５３１は代表帳票かつ通常帳票と認識し、１個のインデックスを抽出する。ここでは、「Ｅ」というインデックス１２０３を抽出する。添付書類（Ｅ）５３２については、付帯帳票と判別し、認識処理は行わない。 The reading unit 110 reads the images of the application document D521, the attached document (D) 522, the application document E531, and the attached document (E) 532 in this order and generates their image data. The recognition unit 120 determines the form type of each piece of image data read by the reading unit 110 and extracts an index when the form is a representative form. In the example of FIG. 7, the application document D521 is first recognized as a representative form that is also a normal form, and one index is extracted; here, the securities number “D” is extracted as the index 1202. The attached document (D) 522 is determined to be an incidental form, and no recognition processing is performed on it. The next application document E531 is recognized as a representative form that is also a normal form, and one index is extracted; here, “E” is extracted as the index 1203. The attached document (E) 532 is determined to be an incidental form, and no recognition processing is performed on it.

各帳票のイメージデータと、インデックス１２０２，１２０３は、点検部１３０に引き渡される。点検部１３０は、イメージデータを表示部１６０に表示し、オペレータの点検を待つ。インデックス１２０２，１２０３も表示部１６０に表示する。オペレータの点検を受けた申込書類Ｄのイメージデータ５２１１、添付書類（Ｄ）のイメージデータ５２２１、申込書類Ｅのイメージデータ５３１１、添付書類（Ｅ）のイメージデータ５３２１及びインデックス１２０２，１２０３は、登録部１４０へ引き渡される。 The image data of each form and the indexes 1202 and 1203 are passed to the inspection unit 130. The inspection unit 130 displays the image data, together with the indexes 1202 and 1203, on the display unit 160 and waits for the operator's inspection. After the inspection, the image data 5211 of the application document D, the image data 5221 of the attached document (D), the image data 5311 of the application document E, the image data 5321 of the attached document (E), and the indexes 1202 and 1203 are passed to the registration unit 140.

登録部１４０は、イメージデータを文書単位に分割し、文書単位のイメージデータをまとめてイメージＤＢ３３０に格納する。図７の例では、申込書類Ｄのイメージデータ５２１１と添付書類（Ｄ）のイメージデータ５２２１という文書単位と、申込書類Ｅのイメージデータ５３１１と添付書類（Ｅ）のイメージデータ５３２１という文書単位と、に分割し、この順を保持してイメージＤＢ３３０に格納する。また、文書Ｄ５２０に対応する文書インデックス「ＩＮＤＥＸ−Ｄ」を生成し、文書Ｄ５２０のイメージデータが格納される領域に対応付けて文書インデックスＤＢ３２０に格納される文書インデックス管理情報に登録する。文書Ｅ５３０についても同様に、文書Ｅ５３０に対応するインデックス「ＩＮＤＥＸ−Ｅ」を生成し、文書Ｅ５３０のイメージデータが格納される領域に対応付けて文書インデックスＤＢ３２０に格納される文書インデックス管理情報に登録する。そして、申込書類Ｄ５２１から抽出したインデックス１２０２の個別インデックスを生成する。図７の例では、認識部１２０が抽出したインデックス１２０２の「Ｄ」を文書インデックス「ＩＮＤＥＸ−Ｄ」に対応付け、個別インデックスＤＢ３１０に格納される個別インデックス管理情報に登録する。同様に、インデックス１２０３の「Ｅ」を文書インデックス「ＩＮＤＥＸ−Ｅ」に対応付け、個別インデックス管理情報に登録する。個別インデックス管理情報は、個別インデックスＤＢ３１０で管理する。 The registration unit 140 divides the image data into document units and stores the image data of each document unit together in the image DB 330. In the example of FIG. 7, the image data is divided into one document unit consisting of the image data 5211 of the application document D and the image data 5221 of the attached document (D), and another consisting of the image data 5311 of the application document E and the image data 5321 of the attached document (E); these units are stored in the image DB 330 in that order. The registration unit 140 also generates a document index “INDEX-D” for the document D520 and registers it, associated with the area in which the image data of the document D520 is stored, in the document index management information held in the document index DB 320. Likewise, it generates a document index “INDEX-E” for the document E530 and registers it, associated with the area in which the image data of the document E530 is stored, in the same document index management information. It then generates an individual index from the index 1202 extracted from the application document D521. In the example of FIG. 7, “D” of the index 1202 extracted by the recognition unit 120 is associated with the document index “INDEX-D” and registered in the individual index management information stored in the individual index DB 310. Similarly, “E” of the index 1203 is associated with the document index “INDEX-E” and registered in the individual index management information, which is managed by the individual index DB 310.
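One plausible way to divide the scanned page stream into document units, consistent with the examples in FIGS. 6 and 7 where every document begins with its representative form, is to start a new unit at each representative form. The sketch below assumes that rule; the `"representative"` / `"incidental"` labels and the function name are hypothetical.

```python
def split_into_documents(pages):
    """Group a scanned page stream into document units.

    Each item of `pages` is (kind, image), where kind is "representative" or
    "incidental" (labels assumed for this sketch). Assumed rule: a new document
    starts at every representative form; incidental forms join the current one.
    """
    documents = []
    for kind, image in pages:
        if kind == "representative":
            documents.append([image])       # open a new document unit
        elif documents:
            documents[-1].append(image)     # attach to the current unit
        else:
            raise ValueError("stream starts with an incidental form")
    return documents


stream = [("representative", "D"), ("incidental", "att_D"),
          ("representative", "E"), ("incidental", "att_E")]
print(split_into_documents(stream))
```

Because the scan order is preserved inside each unit, the application document stays at the head of its document, which is what the later head-address lookup relies on.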

図８は、インデックスの関係を示した図である。個別インデックス３１１は、代表帳票である申込書類ＡＢＣ５１１、申込書類Ｄ５２１、申込書類Ｅ５３１及び申込書類Ｆ５４１から抽出した証券番号「Ａ」、「Ｂ」、「Ｃ」、「Ｄ」、「Ｅ」、「Ｆ」である。また、この個別インデックス３１１は、それぞれが属する文書の文書インデックス３２１にリンクされている。図８の例では、文書ＡＢＣ５１０の申込書類ＡＢＣ５１１から抽出した個別インデックス「Ａ」、「Ｂ」、「Ｃ」は、文書ＡＢＣ５１０の文書インデックス「ＩＮＤＥＸ−Ａ」に対応付けられる。文書Ｄ５２０の申込書類Ｄ５２１から抽出した個別インデックス「Ｄ」は、文書Ｄ５２０の文書インデックス「ＩＮＤＥＸ−Ｄ」に対応付けられる。文書Ｅ５３０の申込書類Ｅ５３１から抽出した個別インデックス「Ｅ」は、文書Ｅ５３０の文書インデックス「ＩＮＤＥＸ−Ｅ」に対応付けられる。そして、文書Ｆ５４０の申込書類Ｆ５４１から抽出した個別インデックス「Ｆ」は、文書Ｆ５４０の文書インデックス「ＩＮＤＥＸ−Ｆ」に対応付けられる。 FIG. 8 is a diagram showing the relationships among the indexes. The individual indexes 311 are the securities numbers “A”, “B”, “C”, “D”, “E”, and “F” extracted from the representative forms, namely the application documents ABC 511, D521, E531, and F541. Each individual index 311 is linked to the document index 321 of the document to which it belongs. In the example of FIG. 8, the individual indexes “A”, “B”, and “C” extracted from the application document ABC 511 of the document ABC 510 are associated with the document index “INDEX-A” of the document ABC 510. The individual index “D” extracted from the application document D521 of the document D520 is associated with the document index “INDEX-D” of the document D520. The individual index “E” extracted from the application document E531 of the document E530 is associated with the document index “INDEX-E” of the document E530. The individual index “F” extracted from the application document F541 of the document F540 is associated with the document index “INDEX-F” of the document F540.

文書インデックス３２１は、それぞれの文書のイメージデータ３３１が格納される記憶領域を示すアドレス、例えば、イメージＤＢ３３０における文書単位のイメージデータが記憶される記憶領域の先頭アドレスに対応付けられる。図８の例では、文書ＡＢＣ５１０の文書インデックス「ＩＮＤＥＸ−Ａ」は、申込書類ＡＢＣイメージ、添付書類（Ａ）イメージ、添付書類（Ｂ）イメージ及び添付書類（Ｃ）イメージを有する文書ＡＢＣのイメージデータの記憶領域に対応付けられる。文書Ｄ５２０の文書インデックス「ＩＮＤＥＸ−Ｄ」は、申込書類Ｄイメージ及び添付書類（Ｄ）イメージを有する文書Ｄ５２０のイメージデータの記憶領域に対応付けられる。文書Ｅ５３０の文書インデックス「ＩＮＤＥＸ−Ｅ」は、申込書類Ｅイメージ及び添付書類（Ｅ）イメージを有する文書Ｅのイメージデータの記憶領域に対応付けられる。文書Ｆ５４０の文書インデックス「ＩＮＤＥＸ−Ｆ」は、申込書類Ｆイメージ及び添付書類（Ｆ）イメージを有する文書Ｆのイメージデータの記憶領域に対応付けられる。 Each document index 321 is associated with an address indicating the storage area in which the image data 331 of the corresponding document is stored, for example, the head address of the storage area in the image DB 330 that holds the image data of that document unit. In the example of FIG. 8, the document index “INDEX-A” of the document ABC 510 is associated with the storage area of the image data of the document ABC, which comprises the application document ABC image, the attached document (A) image, the attached document (B) image, and the attached document (C) image. The document index “INDEX-D” of the document D520 is associated with the storage area of the image data of the document D520, which comprises the application document D image and the attached document (D) image. The document index “INDEX-E” of the document E530 is associated with the storage area of the image data of the document E, which comprises the application document E image and the attached document (E) image. The document index “INDEX-F” of the document F540 is associated with the storage area of the image data of the document F, which comprises the application document F image and the attached document (F) image.

このように、個別インデックスを文書インデックスに対応付けることにより、文書インデックスとイメージデータとの間は１：１の関係になる。また、個別インデックスと文書インデックスとの関係は、Ｎ：１になる。すなわち、１つの文書に複数のインデックスが含まれる場合であっても、代表帳票のイメージデータは１つでよくなる。従来のように、個別インデックスに合わせて代表帳票を再度読み込む必要がなくなり、読み取りの効率化及び記憶領域の利用の効率化を図ることができる。また、検索においても、代表帳票の先頭に記載されていない個別インデックスも付帯帳票も含む文書のイメージデータに対応付けられているため、再検索が必要なくなり、検索効率を上げることができるという利点もある。 Associating the individual indexes with the document index in this way gives a 1:1 relationship between document indexes and image data, and an N:1 relationship between individual indexes and document indexes. In other words, even when one document contains a plurality of indexes, the image data of the representative form needs to be held only once. The representative form no longer has to be read again for each individual index as in the prior art, which makes reading more efficient and uses the storage area more efficiently. Searching also benefits: because even an individual index that is not listed first on the representative form is associated with the image data of the whole document, including its incidental forms, no re-search is needed and search efficiency improves.
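The two cardinalities stated above can be checked directly on the FIG. 8 relationships. In this sketch the dictionaries mirror FIG. 8, but the integer storage addresses are hypothetical placeholders, and `fan_in` / `is_one_to_one` are illustrative helper names.

```python
from collections import Counter

# FIG. 8 relationships: individual index -> document index -> storage address.
individual_to_document = {"A": "INDEX-A", "B": "INDEX-A", "C": "INDEX-A",
                          "D": "INDEX-D", "E": "INDEX-E", "F": "INDEX-F"}
document_to_address = {"INDEX-A": 0, "INDEX-D": 1,
                       "INDEX-E": 2, "INDEX-F": 3}  # hypothetical addresses


def fan_in(mapping):
    """Count how many keys point at each value (the N of an N:1 relation)."""
    return Counter(mapping.values())


def is_one_to_one(mapping):
    """True if every value is referenced by exactly one key."""
    return all(n == 1 for n in fan_in(mapping).values())


print(fan_in(individual_to_document)["INDEX-A"])  # three indexes share INDEX-A
print(is_one_to_one(document_to_address))         # each document stored once
```

The first check shows the N:1 side (three securities numbers share one document index); the second shows the 1:1 side, which is why the application document ABC image is stored only once.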

図９は、インデックス管理情報の一例を示した図である。（Ａ）は個別インデックス管理情報、（Ｂ）は文書インデックス管理情報を示している。 FIG. 9 is a diagram showing an example of index management information. (A) shows the individual index management information, and (B) shows the document index management information.

（Ａ）個別インデックス管理情報３１００には、対象文書の代表帳票から抽出した個別インデックス３１０１に対応付けて、参照先文書インデックス３１０２が登録されている。例えば、文書ＡＢＣ５１０の申込書類ＡＢＣ５１１から抽出された個別インデックス、「Ａ」、「Ｂ」、「Ｃ」には、「ＩＮＤＥＸ−Ａ」を参照先の文書インデックスとすることが登録されている。同様に、「Ｄ」には「ＩＮＤＥＸ−Ｄ」、「Ｅ」には「ＩＮＤＥＸ−Ｅ」、「Ｆ」には「ＩＮＤＥＸ−Ｆ」を参照先の文書インデックスとすることが登録されている。 (A) In the individual index management information 3100, a reference destination document index 3102 is registered in association with each individual index 3101 extracted from the representative form of the target document. For example, for the individual indexes “A”, “B”, and “C” extracted from the application document ABC 511 of the document ABC 510, “INDEX-A” is registered as the reference destination document index. Similarly, “INDEX-D” is registered for “D”, “INDEX-E” for “E”, and “INDEX-F” for “F”.

（Ｂ）文書インデックス管理情報３２００には、対象文書の文書インデックス３２０１に対応付けて、参照先イメージデータ３２０２が登録されている。例えば、文書インデックス「ＩＮＤＥＸ−Ａ」は、「申込書類ＡＢＣイメージ」を参照先のイメージデータとすることが登録されている。同様に、「ＩＮＤＥＸ−Ｄ」には「申込書類Ｄイメージ」、「ＩＮＤＥＸ−Ｅ」には「申込書類Ｅイメージ」、「ＩＮＤＥＸ−Ｆ」には「申込書類Ｆイメージ」を参照先の文書イメージデータとすることが登録されている。例えば、各イメージデータが格納される記憶領域を示すアドレスが登録される。 (B) In the document index management information 3200, reference destination image data 3202 is registered in association with the document index 3201 of each target document. For example, “application document ABC image” is registered as the reference destination image data for the document index “INDEX-A”. Similarly, “application document D image” is registered for “INDEX-D”, “application document E image” for “INDEX-E”, and “application document F image” for “INDEX-F”. What is actually registered is, for example, the address of the storage area in which each piece of image data is stored.

なお、図９の例では、インデックスとその参照先とをテーブル形式の個別インデックス管理情報と、文書インデックス管理情報とによって管理するとしているが、本願発明はこれに限定されない。図８に示したインデックスの関係が表現できれば、どのような表現形式で設定してもよい。 In the example of FIG. 9, the index and the reference destination thereof are managed by the table-type individual index management information and the document index management information, but the present invention is not limited to this. Any expression format may be used as long as the index relationship shown in FIG. 8 can be expressed.

次に、イメージＤＢ３３０に格納される文書ＡＢＣ５１０、文書Ｄ５２０、文書Ｅ５３０及び文書Ｆ５４０のイメージデータの検索処理について説明する。まず、代表帳票が包括帳票の場合について説明し、続いて代表帳票が通常帳票の場合について説明する。 Next, search processing of image data of the document ABC 510, the document D 520, the document E 530, and the document F 540 stored in the image DB 330 will be described. First, the case where the representative form is a comprehensive form will be described, and then the case where the representative form is a normal form will be described.

図１０は、代表帳票が包括帳票の場合の検索処理を示した図である。個別インデックスＤＢ３１０には、それぞれが文書インデックスに対応付けられた個別インデックス「Ａ」、「Ｂ」、「Ｃ」が登録されている。文書インデックスＤＢ３２０には、文書ＡＢＣ５１０のイメージデータの記憶領域に対応付けられた文書インデックス「ＩＮＤＥＸ−Ａ」が登録されている。イメージＤＢ３３０には、文書ＡＢＣ５１０のイメージデータが文書単位に格納されている。 FIG. 10 is a diagram showing search processing when the representative form is a comprehensive form. In the individual index DB 310, individual indexes “A”, “B”, and “C”, each associated with a document index, are registered. In the document index DB 320, a document index “INDEX-A” associated with the image data storage area of the document ABC 510 is registered. In the image DB 330, image data of the document ABC 510 is stored in document units.

検索部１５０は、指定された検索キーワードに基づいてインデックスを検索し、イメージデータが格納される記憶領域の先頭アドレスを検出する。図１０の例では、検索キーワード「Ａ」１５０１が指定されると、検索部１５０は、個別インデックスＤＢ３１０の個別インデックス管理情報を検索し、「Ａ」を検出する。個別インデックス「Ａ」は、文書インデックス「ＩＮＤＥＸ−Ａ」に対応付けられているので、文書インデックスＤＢ３２０の文書インデックス管理情報を検索し、「申込書類ＡＢＣ」、「添付書類（Ａ）」、「添付書類（Ｂ）」及び「添付書類（Ｃ）」を有する文書ＡＢＣ５１０のイメージデータが記憶される記憶領域の先頭アドレスを検出する。検索部１５０は、検索キーワード「Ａ」１５０１の応答として、文書ＡＢＣ５１０のイメージデータ１５１１を返す。これにより、ユーザは、指定された証券番号「Ａ」の資料として、代表帳票である申込書類ＡＢＣのイメージデータと、添付書類（Ａ）のイメージデータとを取得し、内容を確認することができる。 The search unit 150 searches the index based on the specified search keyword, and detects the top address of the storage area in which the image data is stored. In the example of FIG. 10, when the search keyword “A” 1501 is specified, the search unit 150 searches the individual index management information in the individual index DB 310 and detects “A”. Since the individual index “A” is associated with the document index “INDEX-A”, the document index management information in the document index DB 320 is searched, and “application document ABC”, “attached document (A)”, “attachment” The head address of the storage area in which the image data of the document ABC 510 having “Document (B)” and “Attached Document (C)” is stored is detected. The search unit 150 returns the image data 1511 of the document ABC 510 as a response to the search keyword “A” 1501. Thereby, the user can acquire the image data of the application document ABC, which is a representative form, and the image data of the attached document (A) as the material of the designated security number “A”, and can confirm the contents. .
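The two-step lookup of FIG. 10 (individual index, then document index, then the document's images) can be sketched with plain dictionaries. The table contents follow FIG. 9 for documents ABC and D, but the integer head addresses, the page placeholder strings, and the `search` function name are hypothetical.

```python
from typing import Optional

# Hypothetical contents after registering documents ABC and D.
INDIVIDUAL_INDEX_DB = {"A": "INDEX-A", "B": "INDEX-A", "C": "INDEX-A", "D": "INDEX-D"}
DOCUMENT_INDEX_DB = {"INDEX-A": 0, "INDEX-D": 1}   # document index -> head address
IMAGE_DB = [
    ["application ABC", "attached (A)", "attached (B)", "attached (C)"],
    ["application D", "attached (D)"],
]


def search(keyword: str) -> Optional[list]:
    """Resolve a securities number to the whole document's images, or None."""
    doc_index = INDIVIDUAL_INDEX_DB.get(keyword)   # step 1: individual index DB 310
    if doc_index is None:
        return None                                # keyword not registered
    address = DOCUMENT_INDEX_DB[doc_index]         # step 2: document index DB 320
    return IMAGE_DB[address]                       # step 3: image DB 330


print(search("A") == search("B"))  # both resolve to document ABC
```

As in FIG. 10, "A" and "B" return the same stored document, so the application document ABC image is always included alongside the requested attachment without any second search.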

次に、検索キーワード「Ｂ」１５０２が指定されたとする。検索部１５０は、個別インデックスＤＢ３１０の個別インデックス管理情報を検索し、「Ｂ」を検出する。個別インデックス「Ｂ」は、文書インデックス「ＩＮＤＥＸ−Ａ」に対応付けられているので、文書インデックスＤＢ３２０の文書インデックス管理情報を検索し、「申込書類ＡＢＣ」、「添付書類（Ａ）」、「添付書類（Ｂ）」及び「添付書類（Ｃ）」を有する文書ＡＢＣ５１０のイメージデータが記憶される記憶領域の先頭アドレスを検出する。検索部１５０は、検索キーワード「Ｂ」１５０２の応答として、文書ＡＢＣのイメージデータ１５１２を返す。これにより、ユーザは、指定された証券番号「Ｂ」の資料として、代表帳票である申込書類ＡＢＣのイメージデータと、添付書類（Ｂ）のイメージデータとを取得し、内容を確認することができる。 Next, it is assumed that the search keyword “B” 1502 is designated. The search unit 150 searches the individual index management information in the individual index DB 310 and detects “B”. Since the individual index “B” is associated with the document index “INDEX-A”, the document index management information in the document index DB 320 is searched and “application document ABC”, “attached document (A)”, “attachment” is searched. The head address of the storage area in which the image data of the document ABC 510 having “Document (B)” and “Attached Document (C)” is stored is detected. The search unit 150 returns the image data 1512 of the document ABC as a response to the search keyword “B” 1502. As a result, the user can acquire the image data of the application document ABC, which is a representative form, and the image data of the attached document (B) as the material of the designated security number “B”, and can confirm the contents. .

図１１は、代表帳票が通常帳票の場合の検索処理を示した図である。個別インデックスＤＢ３１０には、それぞれが文書インデックスに対応付けられた個別インデックス「Ｄ」、「Ｅ」が登録されている。文書インデックスＤＢ３２０には、イメージデータに対応付けられた文書インデックス「ＩＮＤＥＸ−Ｄ」、「ＩＮＤＥＸ−Ｅ」が登録されている。イメージＤＢ３３０には、文書のイメージデータが文書単位に格納されている。 FIG. 11 is a diagram showing search processing when the representative form is a normal form. In the individual index DB 310, individual indexes “D” and “E”, each associated with a document index, are registered. Document indexes “INDEX-D” and “INDEX-E” associated with image data are registered in the document index DB 320. The image DB 330 stores document image data in document units.

図１１の例では、検索キーワード「Ｄ」１５０３が指定されると、検索部１５０は、個別インデックスＤＢ３１０の個別インデックス管理情報を検索し、「Ｄ」を検出する。個別インデックス「Ｄ」は、文書インデックス「ＩＮＤＥＸ−Ｄ」に対応付けられているので、文書インデックスＤＢ３２０の文書インデックス管理情報を検索し、「申込書類Ｄ」及び「添付書類（Ｄ）」を有する文書Ｄのイメージデータ格納領域を検出する。検索部１５０は、検索キーワード「Ｄ」１５０３の応答として、文書Ｄ５２０のイメージデータ１５１３を返す。これにより、ユーザは、指定された証券番号「Ｄ」の資料として、代表帳票である申込書類Ｄのイメージデータと、添付書類（Ｄ）のイメージデータとを取得し、内容を確認することができる。同様に、検索キーワード「Ｅ」１５０４が指定されたときは、応答として、文書Ｅのイメージデータ１５１４を返す。これにより、ユーザは、指定された証券番号「Ｅ」の資料として、代表帳票である申込書類Ｅのイメージデータと、添付書類（Ｅ）のイメージデータとを取得し、内容を確認することができる。 In the example of FIG. 11, when the search keyword “D” 1503 is designated, the search unit 150 searches the individual index management information in the individual index DB 310 and detects “D”. Since the individual index “D” is associated with the document index “INDEX-D”, the document index management information in the document index DB 320 is searched, and the document having “application document D” and “attached document (D)” is retrieved. The image data storage area of D is detected. The search unit 150 returns the image data 1513 of the document D520 as a response to the search keyword “D” 1503. As a result, the user can acquire the image data of the application document D, which is a representative form, and the image data of the attached document (D) as the material of the designated security number “D” and confirm the contents. . Similarly, when the search keyword “E” 1504 is designated, the image data 1514 of the document E is returned as a response. As a result, the user can acquire the image data of the application document E, which is a representative form, and the image data of the attached document (E) as the material of the designated security number “E” and confirm the contents. .

Next, the processing procedure of the document management apparatus and the document management method are described with reference to flowcharts. The reading/recognition processing and the inspection/registration processing performed at document registration are described first, followed by the search processing performed at document retrieval.

FIG. 12 is a flowchart showing the procedure of the reading/recognition processing performed by the document management apparatus at document registration. The processing starts when the target documents are set in the scanner 200 and registration is instructed.
[Step S01] The reading unit 110 sequentially reads document ABC 510, document D 520, document E 530, and document F 540 set in the scanner 200, and generates image data for each document.

[Step S02] The recognition unit 120 acquires the image data read by the reading unit 110 and determines the form type. For example, it reads a form type code printed at a predefined position to detect the type of form.

[Step S03] The recognition unit 120 determines whether the form type detected in step S02 is a representative form. If it is, the process proceeds to step S04. If it is not, the process proceeds to step S07 without performing recognition.

[Step S04] When the form is a representative form, the recognition unit 120 determines whether it is a normal form or a comprehensive form. If it is a comprehensive form, the process proceeds to step S05; otherwise, it proceeds to step S06.

[Step S05] When the form is both a representative form and a comprehensive form, the recognition unit 120 performs character recognition and extracts all M indexes on the representative form. For example, the positions on the image data where the index items are written are defined in advance, and the image data of the representative form is character-recognized according to these position definitions. Not every index field is necessarily filled in, so character recognition is performed on all the specified areas, and any area judged to be blank is excluded from the indexes. After all the indexes written on the form have been extracted in this way, the process proceeds to step S07.

[Step S06] When the form is both a representative form and a normal form, the recognition unit 120 performs character recognition and extracts the single index on the representative form. The position on the image data where the index item is written is defined in advance, and the image data of the representative form is character-recognized according to this position definition.

[Step S07] The recognition unit 120 determines whether the recognition processing has been completed for the image data of all the documents read in step S01. If not, the process returns to step S02 to recognize the image data of the next form. If so, the process proceeds to connector A shown in FIG. 13, and the inspection/registration processing is performed.
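The branching in steps S02 to S07 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: it assumes the form type (step S02) and the OCR text of each predefined index area are already available, and the names `Page`, `recognize`, and the form-type labels are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical form-type labels standing in for the decoded form type code (step S02).
REP_COMPREHENSIVE = "representative/comprehensive"
REP_NORMAL = "representative/normal"
INCIDENTAL = "incidental"


@dataclass
class Page:
    form_type: str
    # OCR text of each predefined index area (assumed; empty for incidental forms).
    index_fields: list = field(default_factory=list)
    # Indexes extracted by recognize().
    indexes: list = field(default_factory=list)


def recognize(pages):
    """Steps S02-S07: extract indexes from representative forms only."""
    for page in pages:  # S07: repeat until every page has been handled
        if page.form_type == REP_COMPREHENSIVE:
            # S05: keep every index area that is not blank (the M indexes).
            page.indexes = [t.strip() for t in page.index_fields if t.strip()]
        elif page.form_type == REP_NORMAL:
            # S06: a normal form carries a single index.
            page.indexes = [page.index_fields[0].strip()]
        # Incidental forms (S03 "no"): no character recognition is performed.
    return pages
```

For a comprehensive form whose fourth index field is blank, `recognize([Page(REP_COMPREHENSIVE, ["A", "B", "C", " "]), Page(INCIDENTAL), Page(REP_NORMAL, ["D"])])` leaves the indexes `["A", "B", "C"]`, `[]`, and `["D"]` on the three pages.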

FIG. 13 is a flowchart showing the procedure of the inspection/registration processing performed by the document management apparatus at document registration. After the processing up to connector A shown in FIG. 12 has been executed, the following procedure is executed.
[Step S08] The inspection unit 130 displays, on the display unit 160, the document image data acquired in the preceding processing together with information such as the extracted indexes, and waits for the operator to inspect them. When notified that the operator has finished checking, it passes the inspected image data and indexes to the registration unit 140, and the process proceeds to the next step.

[Step S09] The registration unit 140 divides the acquired document image data into document units. For example, it examines the form types of the image data in order of arrangement to detect representative forms, and treats each representative form together with the following forms, up to the form immediately before the next representative form, as one document unit. Alternatively, if the representative form states the number of forms contained in the document, that number may be used to detect the document boundaries. The image data is then divided into document units at the detected boundaries.
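The boundary rule of step S09 (a new document unit opens at every representative form) can be sketched as a single pass over the pages. The function name and the `is_representative` predicate are hypothetical, and rejecting a batch that starts with an incidental form is an assumption the source does not state.

```python
def split_into_documents(pages, is_representative):
    """Step S09: group pages into document units.

    Each unit starts at a representative form and runs up to the page just
    before the next representative form.
    """
    documents = []
    for page in pages:
        if is_representative(page):
            documents.append([page])  # a representative form opens a new unit
        elif documents:
            documents[-1].append(page)  # an incidental form joins the open unit
        else:
            # Assumption: a batch must begin with a representative form.
            raise ValueError("batch does not start with a representative form")
    return documents
```

With pages labelled `"app-ABC", "att-A", "att-B1", "att-B2", "att-C", "app-D", "att-D"` and a predicate testing the `"app-"` prefix, this yields two units: the first five pages and the last two.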

[Step S10] The registration unit 140 stores the image data, divided into document units, in the image DB 330.
[Step S11] The registration unit 140 generates a document index identifying each document, associates it with the address of the storage area in the image DB 330 that holds that document's image data, and registers it in the document index management information. A unique value is set as the document index for each document unit; for example, the location of the document's image data in the image DB 330, or a headword characterizing the document extracted from the representative form. The resulting document index management information is stored in the document index DB 320.

[Step S12] The registration unit 140 generates individual indexes based on the indexes extracted by the recognition unit 120. Each individual index is registered in the individual index management information in association with the document index of the document to which it belongs, and is stored in the individual index DB 310.

[Step S13] The registration unit 140 determines whether all the documents have been processed. If not, the process returns to step S10 to register the next document.
By executing the above procedure, individual indexes and document indexes having the relationship shown in FIG. 8 are generated and used to manage the document image data.
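Steps S10 to S13 build a two-level lookup: individual index to document index, and document index to storage address. A minimal in-memory sketch, assuming plain dictionaries in place of the image DB 330, document index DB 320, and individual index DB 310, the list position as the storage address, and a hypothetical `INDEX-n` naming scheme for document indexes:

```python
def register(documents):
    """Steps S10-S13: store each document unit and build both index tables.

    documents: list of (pages, indexes) pairs, one per document unit, where
    `indexes` are the indexes recognized on that unit's representative form.
    """
    image_db = {}             # address -> page images of one document unit (S10)
    document_index_db = {}    # document index -> address in image_db (S11)
    individual_index_db = {}  # individual index -> document index (S12)
    for address, (pages, indexes) in enumerate(documents):  # S13: loop over units
        image_db[address] = list(pages)
        doc_index = f"INDEX-{address}"  # hypothetical scheme; any unique value works
        document_index_db[doc_index] = address
        for idx in indexes:
            # Assumes individual indexes are unique; a duplicate would overwrite.
            individual_index_db[idx] = doc_index
    return image_db, document_index_db, individual_index_db
```

Because every index from a comprehensive form maps to the same document index, the indexes “A”, “B”, and “C” all lead to the same stored document unit.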

FIG. 14 is a flowchart showing the procedure of the document search processing performed by the document management apparatus.
The processing starts when a search instruction is received from the user.
[Step S21] On receiving the search instruction, the search unit 150 accepts the search keyword specified with it.

[Step S22] The search unit 150 searches the individual index management information in the individual index DB 310 using the input search keyword. It looks for an index matching the word specified as the search keyword and, if one is found, extracts the document index associated with that index. If no matching word is found, the result is “not detected”.

[Step S23] Based on the result of the search in step S22, the search unit 150 determines whether an individual index matching the specified search keyword was detected. If one was detected, the process proceeds to step S24; otherwise, it proceeds to step S27.

[Step S24] When an individual index has been detected, the search unit 150 extracts the document index associated with it. Using this document index, it searches the document index management information in the document index DB 320 and extracts the address in the image DB 330 where the image data of the corresponding document unit is stored.

[Step S25] The search unit 150 accesses the image DB 330 at the address extracted in step S24 and reads the image data of the target document stored in that storage area.

[Step S26] The search unit 150 displays the image data read in step S25 on the display unit 160, and ends the processing.
[Step S27] When no individual index matching the input keyword has been detected, the search unit 150 reports an error, for example by displaying an error screen on the display unit 160, and ends the processing.
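The search path of steps S21 to S27 amounts to two chained lookups. The sketch below assumes the three databases are dictionaries shaped as in the registration sketch (individual index to document index, document index to address, address to page images); the function name is hypothetical, and raising `LookupError` stands in for the error screen of step S27.

```python
def search(keyword, individual_index_db, document_index_db, image_db):
    """Steps S21-S27: keyword -> document index -> address -> document images."""
    doc_index = individual_index_db.get(keyword)  # S22: look up the individual index
    if doc_index is None:  # S23: no match
        # S27: report the error to the caller (the apparatus shows an error screen).
        raise LookupError(f"no individual index matches {keyword!r}")
    address = document_index_db[doc_index]  # S24: resolve the storage address
    return image_db[address]  # S25: read the whole document unit (displayed in S26)
```

With `{"A": "INDEX-A", "B": "INDEX-A"}` as the individual index table, searching for either “A” or “B” returns the same complete document unit, which is the first-embodiment behavior described above.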

By executing the above procedure, the image data of the document related to the specified search keyword is displayed on the display unit 160. Moreover, even when the specified search keyword is the index of an incidental form, the image data of that incidental form can be displayed on the display unit 160 with a single search.

Next, a second embodiment is described. In the first embodiment, the individual indexes extracted from a comprehensive form are associated with the document index of the document unit, and when the individual index corresponding to an incidental form is specified, the image data of the whole document unit is extracted and displayed. When document-unit image data is displayed on the display unit 160 in this way, the user checks the leading representative form and then scrolls to the desired incidental form. However, when the comprehensive form lists many indexes, for example, finding the desired incidental form within the document-unit image data can be laborious. Therefore, in the second embodiment, the image data of the incidental form corresponding to the specified individual index is extracted from the document-unit image data at search time and displayed together with the representative form.

The processing functions of the document management apparatus in the second embodiment are configured in the same way as the components of the first embodiment shown in FIGS. 2 to 4, so the functions of the second embodiment are described using the reference numerals of those components.

The registration and search processing in the document management apparatus of the second embodiment are described below. Processing when the representative form is a normal form is the same as in the first embodiment, so only the case where the representative form is a comprehensive form is described.

FIG. 15 is a diagram showing the document registration processing in the second embodiment when the representative form is a comprehensive form. Document ABC 550 consists of application document ABC 551, which is a comprehensive form, and attached documents (A) 552, (B) 553, and (C) 554. Application document ABC 551 lists the securities numbers “A”, “B”, and “C”, which serve as indexes, together with the number of attached sheets for each securities number. In the example of FIG. 15, application document ABC 551 states that it is accompanied by one sheet of attached document (A) 552 for securities number “A”, two sheets of attached document (B) 553 for securities number “B”, and one sheet of attached document (C) 554 for securities number “C”.

The reading unit 110 reads application document ABC 551, attached document (A) 552, attached document (B) 553, and attached document (C) 554 in this order and generates their image data. The recognition unit 170 determines the form type of each page of image data read by the reading unit 110 and, when the form is a representative form, extracts the indexes and the number of incidental-form sheets corresponding to each index. For example, the positions of the indexes and of the sheet counts on the image data are defined in advance, and the image data of those predefined areas is extracted and character-recognized. In the example of FIG. 15, application document ABC 551 is recognized as a representative form and a comprehensive form, and its M indexes and sheet counts are extracted: the recognition information 1701 consists of the three individual indexes “A”, “B”, and “C” and the corresponding incidental-form sheet counts “1”, “2”, and “1”. Attached documents (A) 552, (B) 553, and (C) 554 are identified as incidental forms, and no recognition processing is performed on them.
The image data of each form and the recognition information 1701 are passed to the inspection unit 130. The processing of the inspection unit 130 is the same as in the first embodiment, so its description is omitted.

The registration unit 180 divides the image data into document units and stores the document-unit image data in the image DB 330c. In the example of FIG. 15, the image data 5511 of application document ABC, 5521 of attached document (A), 5531 of attached document (B), and 5541 of attached document (C) are arranged in this order and stored in the image DB 330c. The registration unit 180 also generates the document index “INDEX-A” for document ABC 550 and stores it in the document index DB 320c. It then generates individual indexes based on the recognition information 1701 extracted from application document ABC 551. In the example of FIG. 15, the indexes “A”, “B”, and “C” extracted by the recognition unit 170 are each associated with the document index “INDEX-A”, which points to the image data of document ABC, and registered in the individual index management information of the individual index DB 310c. At the same time, a “position” and a “number of sheets” are stored with each individual index as incidental-form information.
The “number of sheets” is the value written on application document ABC 551 and obtained by character recognition together with the individual index. The “position”, derived from the sheet counts, indicates how many sheets after the representative form the incidental form corresponding to the individual index begins. It is obtained by adding the “number of sheets” of the preceding individual index (in form order) to that index's “position”. For example, attached document (A) 552 immediately follows application document ABC 551, so its “position” is “1”. The position of attached document (B) 553, which follows attached document (A) 552, is “2”: the position “1” of attached document (A) 552 plus its sheet count “1”. In this way, the position relative to application document ABC 551, the representative form, is calculated in turn for the incidental form of each individual index, associated with that index, and registered in the individual index management information of the individual index DB 310c.
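The position rule above is a running (prefix) sum of the sheet counts, starting at 1. A minimal sketch that builds the incidental-form information for one comprehensive form, assuming dictionary entries whose hypothetical keys `document_index`, `position`, and `count` mirror the fields described in the text:

```python
from itertools import accumulate


def individual_entries(doc_index, indexes, counts):
    """Build individual index entries with "position" and "number of sheets".

    The first incidental form follows the representative form (position 1);
    each later position is the previous position plus the previous count.
    """
    if len(indexes) != len(counts):
        raise ValueError("each index needs exactly one sheet count")
    # accumulate([c1, c2, ...], initial=1) -> [1, 1 + c1, 1 + c1 + c2, ...]
    positions = list(accumulate(counts[:-1], initial=1)) if counts else []
    return {
        idx: {"document_index": doc_index, "position": pos, "count": n}
        for idx, pos, n in zip(indexes, positions, counts)
    }
```

For the FIG. 15 example, `individual_entries("INDEX-A", ["A", "B", "C"], [1, 2, 1])` gives positions 1, 2, and 4, so attached document (C) starts at the fourth sheet after the representative form.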

FIG. 16 is a diagram showing an example of the index management information in the second embodiment. Part (A) shows the individual index management information. The document index management information is the same as the document index management information 3200 shown in FIG. 9, so its description is omitted.

In the individual index management information 3110, the referenced document index 3112, the attached document position 3113, and the number of attached documents 3114 are registered in association with each individual index 3111 extracted from the representative form of the target document. The referenced document index 3112 identifies the document associated with the individual index. For example, for the individual indexes “A”, “B”, and “C” extracted from application document ABC 551 of document ABC 550, “INDEX-A” is registered as the referenced document index. The attached document position 3113 records where the incidental form corresponding to the individual index lies relative to the leading representative form of the document, here expressed as the number of sheets after the representative form. The number of attached documents 3114 records the number of sheets of the incidental form corresponding to the individual index. For example, for attached document (B) 553, which corresponds to index “B”, the document index “INDEX-A”, the attached document position “2”, and the number of attached documents “2” are registered.

In the examples of FIGS. 15 and 16, the representative form states the number of sheets of each incidental form. Alternatively, the representative form may state, like a table of contents, position information indicating where each incidental form lies counting from the first form. In that case, the position information read by character recognition is registered directly in the attached document position 3113. The position information may also be defined separately, for example by an operator.

Search processing is performed based on this index management information 3110.
FIG. 17 is a diagram showing the document search processing in the second embodiment when the representative form is a comprehensive form. In the individual index DB 310c, the individual indexes “A”, “B”, and “C”, each associated with a document index, are registered together with their positions (attached document positions) and sheet counts (numbers of attached documents). In the document index DB 320c, the document index “INDEX-A” is registered in association with the image data. The image DB 330c stores the image data of documents in document units.

The search unit 190 searches the indexes based on the specified search keyword and detects the start address of the storage area holding the image data. In the example of FIG. 17, when the search keyword “A” 1901 is specified, the search unit 190 searches the individual index management information in the individual index DB 310c and detects the individual index “A”, acquiring at the same time “position = 1” and “number of sheets = 1”. Because the individual index “A” is associated with the document index “INDEX-A”, the search unit 190 searches the document index DB 320c and detects the image data storage area of document ABC, which contains “application document ABC”, “attached document (A)”, “attached document (B)”, and “attached document (C)”.
From “position = 1” and “number of sheets = 1” for index “A”, the search unit 190 determines that the image data of attached document (A), the incidental form for index “A”, is one sheet located at the first position after application document ABC, the representative form. It then extracts the image data of attached document (A), combines it with the image data of application document ABC to generate image data 1911, and returns this as the response to the search keyword “A” 1901. The user thus obtains, for the specified securities number “A”, the image data of application document ABC (the representative form) and of attached document (A), and can check their contents. Suppose next that the search keyword “B” 1902 is specified. The search unit 190 searches the individual index management information in the individual index DB 310c and detects the individual index “B”, acquiring at the same time “position = 2” and “number of sheets = 2”. Because the individual index “B” is also associated with the document index “INDEX-A”, the search unit 190 searches the document index DB 320c and detects the image data storage area of document ABC. From “position = 2” and “number of sheets = 2” for index “B”, it determines that the image data of attached document (B), the incidental form for index “B”, consists of two sheets starting at the second position after application document ABC. It then extracts the image data of attached document (B), combines it with the image data of application document ABC to generate image data 1912, and returns this as the response to the search keyword “B” 1902.
The user thus obtains, as the material for the specified securities number “B”, the image data of application document ABC (the representative form) and of attached document (B), and can check their contents.
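Once the position and sheet count are known, the extraction described above is a slice of the document unit. A minimal sketch, assuming the document unit is a list of page images whose first element is the representative form; the function name and the bounds check are illustrative additions:

```python
def extract_for_index(document_pages, position, count):
    """Return the representative form plus the pages of one incidental form.

    document_pages[0] is the representative form, and `position` counts sheets
    from it, so the incidental form occupies document_pages[position:position + count].
    """
    if position < 1 or count < 1 or position + count > len(document_pages):
        raise ValueError("position/count fall outside the document unit")
    return [document_pages[0]] + document_pages[position:position + count]
```

For a document unit `["app-ABC", "att-A", "att-B1", "att-B2", "att-C"]`, keyword “B” (position 2, count 2) yields `["app-ABC", "att-B1", "att-B2"]`, matching the image data 1912 described for FIG. 17.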

In the second embodiment, the position is calculated from the sheet counts during registration and registered in association with each individual index. Alternatively, only the sheet counts may be registered, and the positions calculated during the search processing.
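For the search-time variant, the position can be recomputed by walking the sheet counts in form order. This sketch assumes the (index, count) pairs are kept in the order they appear on the representative form and that this order matches the page order; the function name is hypothetical.

```python
def locate_at_search_time(ordered_entries, target):
    """Derive (position, count) for `target` from sheet counts alone.

    ordered_entries: (index, count) pairs in the order they appear on the
    representative form, assumed to match the order of the attached pages.
    """
    position = 1  # the first incidental form follows the representative form
    for index, count in ordered_entries:
        if index == target:
            return position, count
        position += count
    raise LookupError(f"{target!r} is not listed on this representative form")
```

The trade-off is that registration-time calculation stores one extra value per index but answers a search with a direct lookup, whereas this variant stores less but must preserve the index order and scan the preceding entries on every search.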

According to the second embodiment, for a document whose representative form is a comprehensive form, only the representative form and the incidental form corresponding to the index specified by the search keyword are selectively extracted. This saves the user from searching through the many pages of document-unit image data for the desired incidental form, further improving search efficiency.

The processing functions described above can be implemented by a computer. In that case, a program describing the processing of the functions that the document management apparatus should have is provided, and executing the program on a computer implements these processing functions on it. The program can be recorded on a computer-readable recording medium, such as a magnetic storage device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Magnetic storage devices include hard disk drives (HDD), flexible disks (FD), and magnetic tape. Optical discs include DVD, DVD-RAM, and CD-ROM/RW. Magneto-optical recording media include MO (Magneto-Optical disk).

To distribute the program, portable recording media such as DVDs or CD-ROMs on which the program is recorded may be sold, for example. The program can also be stored in a storage device of a server computer and transferred from the server computer to other computers over a network.

A computer that executes the program stores, for example, the program recorded on a portable recording medium or transferred from a server computer in its own storage device, then reads the program from that storage device and executes processing according to it. The computer can also read the program directly from a portable recording medium and execute processing according to it. Alternatively, the computer can execute processing according to the received program step by step each time a program is transferred from a server computer connected over a network.

At least some of the above processing functions can also be implemented by electronic circuits such as a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), or a PLD (Programmable Logic Device).

DESCRIPTION OF SYMBOLS
1 Document management apparatus
1a Reading means
1b Recognition means
1c Registration means
1d Search means
3 Storage device
3a Individual index management information
3b Document index management information
3c Image data
5 Document
5a Representative form
5b Incidental form
7 Document image data

Claims

A document management program for causing a computer that performs document management processing for digitizing and managing documents to function as:
recognizing means for acquiring image data obtained by digitizing a document that has a representative form and an incidental form attached to the representative form, the representative form describing an index that identifies the incidental form associated with it, and for performing character recognition on the image data of the representative form to extract all the indexes described in the representative form;
registration means for storing the acquired image data of the document in image data storage means, and registering, in index management information, the address at which the image data of the document is stored in association with the indexes extracted from the representative form of the document; and
search means for, when an index to be searched is specified, searching the index management information based on the specified index and extracting the image data of the document corresponding to the specified index.
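The registration and search of claim 1 amount to a mapping from every index found on the representative form to the single address where the whole document's image data is stored. The sketch below is a minimal illustration under stated assumptions, not the claimed implementation: character recognition is replaced by a hypothetical `recognize` callable, the representative form is assumed to be the first page, and the class name `SimpleDocumentManager` and the `doc-NNNNNN` address scheme are invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the recognizing means: takes the image of the
# representative form and returns every index written on it. Real OCR is
# outside the scope of this sketch.
Recognizer = Callable[[bytes], List[str]]


class SimpleDocumentManager:
    """Sketch of claim 1 with a single table: index -> image address."""

    def __init__(self, recognize: Recognizer) -> None:
        self._recognize = recognize
        self._storage: Dict[str, List[bytes]] = {}  # image data storage: address -> pages
        self._index_info: Dict[str, str] = {}       # index management information

    def register(self, pages: List[bytes]) -> str:
        """Store a document's pages and map every recognized index to them."""
        # Assumption: the representative form is the first page of the document.
        indexes = self._recognize(pages[0])
        address = f"doc-{len(self._storage):06d}"  # hypothetical address scheme
        self._storage[address] = list(pages)
        for index in indexes:
            self._index_info[index] = address
        return address

    def search(self, index: str) -> List[bytes]:
        """Return the image data of the whole document that carries `index`."""
        address = self._index_info[index]  # raises KeyError for unknown indexes
        return self._storage[address]


if __name__ == "__main__":
    # Toy recognizer: the "image" is just comma-separated text.
    manager = SimpleDocumentManager(lambda page: page.decode().split(","))
    pages = [b"A,B,C", b"attached A", b"attached B", b"attached C"]
    manager.register(pages)
    print(len(manager.search("B")))  # 4: the whole application document
```

Every index on the representative form resolves to the same stored document, so a search by any one of them returns the complete document rather than only the matching incidental form.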

The document management program according to claim 1, wherein
the index management information has document index management information that associates a document index identifying the document with the address of the image data of the document, and individual index management information that associates each index with the document index of the document including the representative form;
the registration means registers the address at which the image data of the document is stored in the document index management information in association with the document index of the document, and registers each index extracted from the representative form in the individual index management information in association with the document index of the document including the representative form; and
the search means searches the individual index management information based on the specified index to detect the document index corresponding to the specified index, searches the document index management information based on the detected document index to detect the address at which the image data of the document corresponding to that document index is stored, and extracts the image data of the document.
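Claim 2 splits the index management information into two tables and resolves a search in two steps: index to document index, then document index to address. A minimal sketch, assuming in-memory dictionaries for both tables; the class name `IndexManagementInfo`, the method names, and the sample document index and path are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexManagementInfo:
    # Document index management information: document index -> image address.
    document_index: Dict[str, str] = field(default_factory=dict)
    # Individual index management information: index -> document index.
    individual_index: Dict[str, str] = field(default_factory=dict)

    def register(self, doc_index: str, address: str, indexes: List[str]) -> None:
        """Register one document and every index read from its representative form."""
        self.document_index[doc_index] = address
        for index in indexes:
            self.individual_index[index] = doc_index

    def lookup_address(self, index: str) -> str:
        """Two-step search: index -> document index -> image address."""
        doc_index = self.individual_index[index]  # step 1
        return self.document_index[doc_index]     # step 2


if __name__ == "__main__":
    info = IndexManagementInfo()
    # Hypothetical document index and storage address.
    info.register("APP-0001", "/images/app-0001.tif", ["A", "B", "C"])
    print(info.lookup_address("C"))  # /images/app-0001.tif
```

One plausible benefit of the indirection is that each document's address is stored exactly once, so relocating a document's image data touches one row of the document index table regardless of how many indexes its representative form carries.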

The document management program according to claim 1, wherein
when position information of the incidental form within the document to which it belongs is described in the representative form, the recognizing means extracts the position information of the incidental form together with the index by character recognition;
the registration means registers the extracted position information of the incidental form in the index management information in association with the index; and
the search means searches the index management information based on the specified index to acquire the address at which the image data of the document corresponding to the specified index is stored and the position information of the incidental form, detects, based on the acquired address and position information, the address at which the image data of the incidental form corresponding to the index is stored, and extracts the image data of the incidental form.

The document management program according to claim 3, wherein
the recognizing means extracts the number of sheets of the incidental form described in the representative form as the position information of the incidental form; and
the registration means calculates, for each incidental form, the number of sheets from the first sheet of the document, based on the numbers of sheets of the incidental forms included in the document as extracted from the image data of the document and on the arrangement order of the incidental forms in the document, and registers the calculated number of sheets as position information in the index management information in association with the index corresponding to that incidental form.
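The calculation in claim 4 is a running sum: each incidental form starts after the representative form and all incidental forms placed before it. The sketch below computes that offset and then slices out one incidental form, as the search of claim 3 does. It assumes the representative form occupies the first `representative_sheets` sheets (defaulting to 1) and that the incidental forms follow in the listed order; the function names are hypothetical, and `itertools.accumulate(..., initial=...)` requires Python 3.8 or later.

```python
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple


def incidental_positions(
    sheet_counts: Sequence[Tuple[str, int]],
    representative_sheets: int = 1,
) -> Dict[str, Tuple[int, int]]:
    """Map each index to (offset from the first sheet, number of sheets).

    `sheet_counts` lists (index, number of sheets) in the order the incidental
    forms are arranged in the document. Assumes the representative form
    occupies the first `representative_sheets` sheets.
    """
    counts = [n for _, n in sheet_counts]
    # Start offset = representative sheets + sheets of all preceding forms.
    starts = accumulate(counts[:-1], initial=representative_sheets)
    return {index: (start, n) for (index, n), start in zip(sheet_counts, starts)}


def extract_incidental(pages: Sequence[bytes], position: Tuple[int, int]) -> List[bytes]:
    """Return only the sheets of one incidental form from the document's pages."""
    start, count = position
    return list(pages[start:start + count])


if __name__ == "__main__":
    positions = incidental_positions([("A", 2), ("B", 1), ("C", 3)])
    print(positions)  # {'A': (1, 2), 'B': (3, 1), 'C': (4, 3)}
```

With the positions registered alongside each index, a search can return just the sheets of the matching incidental form instead of the whole document.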

A document management method for performing document management processing for digitizing and managing documents, wherein a computer executes a procedure of:
acquiring image data obtained by digitizing a document that has a representative form and an incidental form attached to the representative form, the representative form describing an index that identifies the incidental form associated with it, and performing character recognition on the image data of the representative form to extract all the indexes described in the representative form;
storing the acquired image data of the document in image data storage means, and registering, in index management information, the address at which the image data of the document is stored in association with the indexes extracted from the representative form of the document; and
when an index to be searched is specified, searching the index management information based on the specified index and extracting the image data of the document corresponding to the specified index.

A document management apparatus that performs document management processing for digitizing and managing documents, comprising:
recognition means for acquiring image data obtained by digitizing a document that has a representative form and an incidental form attached to the representative form, the representative form describing an index that identifies the incidental form associated with it, and for performing character recognition on the image data of the representative form to extract all the indexes described in the representative form;
registration means for storing the acquired image data of the document in image data storage means, and registering, in index management information, the address at which the image data of the document is stored in association with the indexes extracted from the representative form of the document; and
search means for, when an index to be searched is specified, searching the index management information based on the specified index and extracting the image data of the document corresponding to the specified index.