JP2010097262A

JP2010097262A - Database creation device, database creation method, and computer program

Info

Publication number: JP2010097262A
Application number: JP2008265353A
Authority: JP
Inventors: Takayuki Ueno; 貴之上野
Original assignee: Keyence Corp
Current assignee: Keyence Corp
Priority date: 2008-10-14
Filing date: 2008-10-14
Publication date: 2010-04-30

Abstract

PROBLEM TO BE SOLVED: To provide a database creation device, a database creation method, and a computer program that can easily create a new database even when a plurality of table data having different data formats exist, and that can correctly calculate a tabulation result even when tabulation items exist. SOLUTION: One or a plurality of electronic document files are obtained, and the database items and data extraction rules of the database to be generated are specified. The database items and the corresponding data are extracted from the electronic document files based on the specified database items and data extraction rules. A data type is detected for each extracted database item, and when a tabulation instruction is accepted for a database item whose data type is determined to be a numerical type, a tabulation item for each database item is added to the extracted database items and corresponding data, which are then displayed in the form of a list. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a database generation device, a database generation method, and a computer program that can easily generate a single database, and can total tabulation items, even when there are a plurality of table data having different data formats.

When a relational database is generated, table data prepared in advance is often used. When the database items of the table data match, a new database can be generated easily by merging a plurality of table data.

However, when the data format of table data is not standardized, the format depends on the creator. Even for the same application, differences in database items, differences in the arrangement of database items, and the like arise depending on the software program used, so the table data cannot be merged as is. To solve this problem, an intermediate file format has conventionally been used to merge a plurality of table data whose database items differ, or whose database items are arranged in a different order, into one table data.

For example, Patent Document 1 discloses a database management system that uses CSV files, which are often used as an intermediate file format for table data, to generate one database from a plurality of table data files.
Patent Document 1: JP 2006-059135 A

However, when a plurality of table data are merged via CSV files as in Patent Document 1, information about which database items differ, which database items are arranged in a different order, and so on must be known in advance. Unless an appropriate conversion procedure is set up in accordance with that information, the desired table data cannot be generated.

In addition, when tabulation is performed across different table data, it is not enough simply to merge the table data; the data type of each database item must be recognized in advance. For example, database items whose data type is numeric can be tabulated, whereas database items whose data type is character cannot. Therefore, unless the data types of all items of the table data serving as the basis for a new database are recognized, there is no guarantee that tabulation can be executed reliably once the new database has been generated.

Furthermore, methods of automatically merging table data based on the cell positions of the table data have also been considered. When so-called fluctuation information exists, however, the relationships among the items of all the table data serving as the basis for the new database must be understood in advance, which makes the processing cumbersome.

The present invention has been made in view of these circumstances. Its object is to provide a database generation device, a database generation method, and a computer program that can easily generate a new database even when there are a plurality of table data having different data formats, and that can correctly calculate tabulation results even when tabulation items exist.

To achieve the above object, a database generation device according to a first invention generates a new database based on data extracted from one or a plurality of electronic document files containing table data, and comprises: electronic document file acquisition means for acquiring the one or plurality of electronic document files; data extraction rule specifying means for specifying the database items and data extraction rules of the database to be generated; data extraction means for extracting the database items and corresponding data from the one or plurality of electronic document files in accordance with the specified database items and data extraction rules; data type detection means for detecting a data type for each extracted database item; type determination means for determining, for each database item, whether the data type is a numeric type; instruction determination means for determining whether a tabulation instruction has been accepted for a database item whose data type has been determined to be a numeric type by the type determination means; and display means for displaying a list of the extracted database items and corresponding data when the instruction determination means determines that no tabulation instruction has been accepted, and for displaying a list of the database items and corresponding data with a tabulation item added for each database item when the instruction determination means determines that a tabulation instruction has been accepted.

In a database generation device according to a second invention, in the first invention, the data type detection means comprises: numeric conversion means for performing numeric conversion for each database item; error determination means for determining whether an error has occurred in the numeric conversion; and detection means for detecting that the data type of the database item is a character type when the error determination means determines that an error has occurred, and that the data type of the database item is a numeric type when it determines that no error has occurred.

A database generation device according to a third invention, in the first or second invention, further comprises: fluctuation information extraction means for extracting fluctuation information on differences in table data, the fluctuation information including at least information on differences in the position of table data extracted from different electronic document files and/or information on differences in database items extracted from different electronic document files; determination means for determining, based on the database items, the data extraction rules, and the extracted fluctuation information, whether there is a changed portion in the data extraction rules; and tag information assigning means for assigning, when the determination means determines that a changed portion exists, the same tag information to identical database items and different tag information to different database items. The data extraction means extracts the database items and corresponding data reflecting the changed portion of the data extraction rules, and the display means arranges the database items in accordance with the tag information assigned to them and displays a list of the database items and corresponding data.

Next, to achieve the above object, a database generation method according to a fourth invention is executable by a database generation device that generates a new database based on data extracted from one or a plurality of electronic document files containing table data, and comprises: acquiring the one or plurality of electronic document files; specifying the database items and data extraction rules of the database to be generated; extracting the database items and corresponding data from the one or plurality of electronic document files in accordance with the specified database items and data extraction rules; detecting a data type for each extracted database item; determining, for each database item, whether the data type is a numeric type; determining whether a tabulation instruction has been accepted for a database item whose data type has been determined to be a numeric type; displaying a list of the extracted database items and corresponding data when it is determined that no tabulation instruction has been accepted; and displaying a list of the database items and corresponding data with a tabulation item added for each database item when it is determined that a tabulation instruction has been accepted.

A database generation method according to a fifth invention, in the fourth invention, comprises: performing numeric conversion for each database item; determining whether an error has occurred in the numeric conversion; detecting that the data type of the database item is a character type when it is determined that an error has occurred; and detecting that the data type of the database item is a numeric type when it is determined that no error has occurred.

A database generation method according to a sixth invention, in the fourth or fifth invention, comprises: extracting fluctuation information on differences in table data, the fluctuation information including at least information on differences in the position of table data extracted from different electronic document files and/or information on differences in database items extracted from different electronic document files; determining, based on the database items, the data extraction rules, and the extracted fluctuation information, whether there is a changed portion in the data extraction rules; assigning, when it is determined that a changed portion exists, the same tag information to identical database items and different tag information to different database items; extracting the database items and corresponding data reflecting the changed portion of the data extraction rules; and arranging the database items in accordance with the tag information assigned to them and displaying a list of the database items and corresponding data.

Next, to achieve the above object, a computer program according to a seventh invention is executable by a database generation device that generates a new database based on data extracted from one or a plurality of electronic document files containing table data, and causes the database generation device to function as: electronic document file acquisition means for acquiring the one or plurality of electronic document files; data extraction rule specifying means for specifying the database items and data extraction rules of the database to be generated; data extraction means for extracting the database items and corresponding data from the one or plurality of electronic document files in accordance with the specified database items and data extraction rules; data type detection means for detecting a data type for each extracted database item; type determination means for determining, for each database item, whether the data type is a numeric type; instruction determination means for determining whether a tabulation instruction has been accepted for a database item whose data type has been determined to be a numeric type by the type determination means; and display means for displaying a list of the extracted database items and corresponding data when the instruction determination means determines that no tabulation instruction has been accepted, and for displaying a list of the database items and corresponding data with a tabulation item added for each database item when the instruction determination means determines that a tabulation instruction has been accepted.

A computer program according to an eighth invention, in the seventh invention, causes the data type detection means to function as: numeric conversion means for performing numeric conversion for each database item; error determination means for determining whether an error has occurred in the numeric conversion; and detection means for detecting that the data type of the database item is a character type when the error determination means determines that an error has occurred, and that the data type of the database item is a numeric type when it determines that no error has occurred.

A computer program according to a ninth invention, in the seventh or eighth invention, causes the database generation device to function as: fluctuation information extraction means for extracting fluctuation information on differences in table data, the fluctuation information including at least information on differences in the position of table data extracted from different electronic document files and/or information on differences in database items extracted from different electronic document files; determination means for determining, based on the database items, the data extraction rules, and the extracted fluctuation information, whether there is a changed portion in the data extraction rules; and tag information assigning means for assigning, when the determination means determines that a changed portion exists, the same tag information to identical database items and different tag information to different database items. The program further causes the data extraction means to function as means for extracting the database items and corresponding data reflecting the changed portion of the data extraction rules, and causes the display means to function as means for arranging the database items in accordance with the tag information assigned to them and displaying a list of the database items and corresponding data.

In the first, fourth, and seventh inventions, one or a plurality of electronic document files are acquired, the database items and data extraction rules of the database to be generated are specified, and the database items and corresponding data are extracted from the one or plurality of electronic document files in accordance with the specified database items and data extraction rules. A data type is detected for each extracted database item, and it is determined for each database item whether the data type is a numeric type. For a database item whose data type has been determined to be numeric, it is determined whether a tabulation instruction has been accepted. When it is determined that no tabulation instruction has been accepted, a list of the extracted database items and corresponding data is displayed; when it is determined that a tabulation instruction has been accepted, a list of the database items and corresponding data is displayed with a tabulation item added for each database item. Because the data type is determined at the time the database items and corresponding data are extracted, there is no need to determine the data types of items in table data that is not needed. In addition, tabulation items can be added and displayed only for database items whose data type is numeric, so that even when the database is based on a plurality of table data having different data formats, a new database can be generated and tabulation results can be calculated correctly.
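The display behaviour described for the first, fourth, and seventh inventions can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this publication: the function names `is_numeric_column` and `build_listing`, the rows-of-strings table representation, and the choice to express the tabulation items as one trailing row of column sums are all assumptions made for illustration.

```python
def is_numeric_column(values):
    """Hypothetical helper: True if every value converts to a number."""
    try:
        for value in values:
            float(value)
    except (TypeError, ValueError):
        return False
    return True


def build_listing(items, rows, tabulate=False):
    """Return the list to display: a header row, the data rows, and,
    only when a tabulation instruction was accepted, a totals row."""
    listing = [list(items)] + [list(row) for row in rows]
    if not tabulate:
        # No tabulation instruction: plain list of items and data.
        return listing
    totals = []
    for index in range(len(items)):
        column = [row[index] for row in rows]
        if is_numeric_column(column):
            # Numeric item: add its total (assumed here to be a sum).
            totals.append(sum(float(value) for value in column))
        else:
            # Character item: nothing to tabulate.
            totals.append("")
    listing.append(totals)
    return listing
```

Calling `build_listing(items, rows)` without a tabulation instruction returns only the header and data rows, while `tabulate=True` appends totals for the numeric items alone, so character items never contribute to a total.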

In the second, fifth, and eighth inventions, numeric conversion is performed for each database item, and it is determined whether an error has occurred in the numeric conversion. When it is determined that an error has occurred, the data type of the database item is detected as a character type; when it is determined that no error has occurred, the data type of the database item is detected as a numeric type. Thus, simply by executing numeric conversion processing for each extracted database item, it is possible to detect whether the data type of the database item is numeric or character, and to determine whether the database item is one for which tabulation results can be calculated correctly, without knowing the data types of all items in advance.
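The conversion-error test can be sketched in a few lines of Python. This is an illustrative sketch only: the name `detect_data_type`, the use of `float` as the numeric conversion, and the decision to skip empty cells are assumptions, not details taken from this publication.

```python
def detect_data_type(values):
    """Classify one database item (column) as "numeric" or "character"
    by attempting numeric conversion and watching for an error."""
    for value in values:
        text = str(value).strip()
        if text == "":
            continue  # assumption: empty cells do not affect the result
        try:
            float(text)  # the numeric conversion step
        except ValueError:
            # A conversion error occurred: treat the item as character type.
            return "character"
    # No conversion error occurred: treat the item as numeric type.
    return "numeric"
```

Note that `float` also accepts strings such as `"nan"` and `"1e3"`; a stricter conversion could be substituted if such values should count as characters.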

In the third, sixth, and ninth inventions, fluctuation information on differences in table data is extracted, the fluctuation information including at least information on differences in the position of table data extracted from different electronic document files and/or information on differences in database items extracted from different electronic document files. Based on the database items, the data extraction rules, and the extracted fluctuation information, it is determined whether there is a changed portion in the data extraction rules. When it is determined that a changed portion exists, the same tag information is assigned to identical database items and different tag information is assigned to different database items. The database items and corresponding data are extracted reflecting the changed portion of the data extraction rules, the database items are arranged in accordance with the tag information assigned to them, and a list of the database items and corresponding data is displayed. Thus, even when the position of the table data differs among a plurality of files, when the database items differ, or when the database items are arranged in a different order, data can be extracted in accordance with the data extraction rules changed to account for those differences. Identical database items can be consolidated using the same tag information as key information, and new, different database items can be added as new database items with different tag information. Therefore, even when the user cannot know all the differences in database items in advance, a new database can be generated and displayed as a list without duplicated or missing database items.
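A minimal Python sketch of the tag-based consolidation described above follows. It is a simplification under stated assumptions: integer tags, the helper names `normalize` and `merge_tables`, and the rule that two item names are "the same" when they match after Unicode NFKC normalization and case folding are all hypothetical choices, not the method disclosed in this publication.

```python
import unicodedata


def normalize(name):
    """Assumed identity test for item names: ignore width and case."""
    return unicodedata.normalize("NFKC", name).strip().casefold()


def merge_tables(tables):
    """Merge (headers, rows) tables. Identical items share one tag;
    a previously unseen item receives a new tag and a new column."""
    tags = {}     # normalized item name -> tag (column position)
    labels = []   # labels[tag] = item name as first seen
    records = []
    for headers, rows in tables:
        column_tags = []
        for header in headers:
            key = normalize(header)
            if key not in tags:
                tags[key] = len(labels)
                labels.append(header)
            column_tags.append(tags[key])
        for row in rows:
            records.append(dict(zip(column_tags, row)))
    # Arrange every record by tag; items a source lacked stay empty.
    listing = [[record.get(tag, "") for tag in range(len(labels))]
               for record in records]
    return labels, listing
```

With this arrangement, a second file whose items appear in a different order, or which introduces an extra item, still lands in the right columns, and rows from the first file simply show an empty cell for the added item.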

According to the present invention, because the data type is determined at the time the database items and corresponding data are extracted, there is no need to determine the data types of items in table data that is not needed. In addition, tabulation items can be added and displayed only for database items whose data type is numeric, so that even when the database is based on a plurality of table data having different data formats, a new database can be generated and tabulation results can be calculated correctly.

Hereinafter, a database generation device according to an embodiment of the present invention will be described specifically with reference to the drawings. The following embodiments do not limit the invention described in the claims, and it goes without saying that not all combinations of the features described in the embodiments are essential to the solution.

The present invention can be implemented in many different modes and should not be construed as being limited to the description of the embodiments. The same elements are denoted by the same reference numerals throughout the embodiments.

The following embodiments describe a database generation device in which a computer program has been installed in a computer system. As will be apparent to those skilled in the art, however, part of the present invention can be implemented as a computer program executable by a computer. Accordingly, the present invention can take the form of a hardware embodiment as a database generation device, a software embodiment, or an embodiment combining software and hardware. The computer program can be recorded on any computer-readable recording medium, such as a hard disk, DVD, CD, optical storage device, or magnetic storage device.

(Embodiment 1)
FIG. 1 is a block diagram showing a configuration example of a database generation device according to Embodiment 1 of the present invention. The database generation device 1 according to Embodiment 1 comprises at least a CPU (central processing unit) 11, a memory 12, a storage device 13, an I/O interface 14, a video interface 15, a portable disk drive 16, a communication interface 17, and an internal bus 18 that connects the hardware described above.

The CPU 11 is connected to the hardware units of the database generation device 1 described above via the internal bus 18, controls the operation of those hardware units, and executes various software functions in accordance with the computer program 100 stored in the storage device 13. The memory 12 is composed of volatile memory such as SRAM or SDRAM; a load module is expanded into it when the computer program 100 is executed, and it stores temporary data and the like generated during execution of the computer program 100.

The storage device 13 is composed of a built-in fixed storage device (hard disk), volatile memory such as SRAM, nonvolatile memory such as ROM, and the like. The computer program 100 stored in the storage device 13 is downloaded by the portable disk drive 16 from a portable recording medium 90, such as a DVD or CD-ROM, on which information such as programs and data is recorded; at run time it is expanded from the storage device 13 into the memory 12 and executed. Of course, it may instead be a computer program downloaded from an external computer connected to the network 2 via the communication interface 17.

The storage device 13 also includes an electronic document file storage unit 131, a data extraction rule storage unit 132, a database storage unit 133, and a fluctuation information storage unit 134. The electronic document file storage unit 131 stores electronic document files that contain table data and serve as the basis for generating a new database.

The data extraction rule storage unit 132 stores data extraction rules for extracting database items and corresponding data from table data, for example a rule that selects the largest table data among the table data contained in an electronic document file, or a rule that selects the n-th (n is a natural number) table data from the beginning of the file.
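The two example rules named above can be sketched in Python as follows. This is a hedged illustration: the rule dictionaries, the function name `apply_extraction_rule`, the representation of a table as a list of rows, and the definition of "size" as rows times the widest row are assumptions rather than details from this publication.

```python
def table_size(table):
    """Assumed size measure: number of rows times the widest row."""
    return len(table) * max((len(row) for row in table), default=0)


def apply_extraction_rule(tables, rule):
    """Pick one table from a file's tables according to a rule.

    rule is a hypothetical dict such as {"kind": "largest"} or
    {"kind": "nth", "n": 2}, where n counts from 1 at the file start.
    """
    if rule["kind"] == "largest":
        return max(tables, key=table_size)
    if rule["kind"] == "nth":
        return tables[rule["n"] - 1]
    raise ValueError("unknown rule kind: %r" % rule["kind"])
```

When two tables tie for the largest size, `max` returns the one that appears first in the file.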

The database storage unit 133 stores the database newly generated by merging the table data contained in a plurality of electronic document files. The fluctuation information storage unit 134 stores information on differences between the table data to be merged, so-called fluctuation information. Examples of fluctuation information include table position fluctuation information, which concerns differences in the position of the start cell of the table data, and item fluctuation information, which concerns differences in items such as a different order of items in the table data, the presence of a new item, or a missing item. Fluctuation information is a broad concept that also covers differences between uppercase and lowercase letters in English notation, differences between full-width and half-width characters, and the like.
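The kinds of fluctuation listed above can be made concrete with a short Python sketch that compares one table description against a reference. The dictionary layout (`"start"` as a (row, column) pair and `"items"` as a list of names), the function names, and the use of NFKC normalization plus case folding to absorb width and case differences are assumptions for illustration only.

```python
import unicodedata


def canonical(name):
    """Fold full-width/half-width and upper/lower case differences."""
    return unicodedata.normalize("NFKC", name).casefold()


def find_fluctuations(base, other):
    """Report how `other` deviates from the reference table `base`."""
    base_items = [canonical(name) for name in base["items"]]
    other_items = [canonical(name) for name in other["items"]]
    shared_in_other = [name for name in other_items if name in base_items]
    shared_in_base = [name for name in base_items if name in other_items]
    return {
        # Table position fluctuation: offset of the start cell.
        "position_shift": (other["start"][0] - base["start"][0],
                           other["start"][1] - base["start"][1]),
        # Item fluctuation: new items, missing items, changed order.
        "new_items": [n for n in other_items if n not in base_items],
        "missing_items": [n for n in base_items if n not in other_items],
        "order_differs": shared_in_other != shared_in_base,
    }
```

Because names are canonicalized before comparison, an item written in full-width letters or in a different case is treated as the same item rather than reported as new.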

The communication interface 17 is connected to the internal bus 18 and, by being connected to an external network 2 such as the Internet, a LAN, or a WAN, can exchange data with external computers and the like. The electronic document file storage unit 131 need not be provided in the storage device 13 of the database generation device 1; the files may instead be stored in the storage devices of external computers and thus be scattered across the network.

The I/O interface 14 is connected to data input media such as the keyboard 21 and the mouse 22 and accepts data input. The video interface 15 is connected to a display device 23 such as a CRT monitor or an LCD and displays predetermined images.

FIG. 2 is a functional block diagram of the database generation device 1 according to Embodiment 1 of the present invention. The electronic document file acquisition unit 201 acquires electronic document files each containing one or more tables. The electronic document files stored in the storage device 13 may be gathered into the electronic document file storage unit 131, or the files may be acquired from an external computer via the network 2. They may also be entered via an input device such as the keyboard 21 or the mouse 22.

The data extraction rule specifying unit 202 analyzes the table data contained in the one or more acquired electronic document files on the basis of ruled line information, and specifies the data extraction rule to be applied when extracting the table data. The specified database items and data extraction rule are stored in the data extraction rule storage unit 132. The data extraction rule specifying unit 202 includes at least a ruled line information extraction unit 203 and an analysis unit 204.

The ruled line information extraction unit 203 extracts ruled line information from each of the one or more acquired electronic document files. Specifically, it recognizes a region enclosed by ruled lines as table data and obtains information on how the remaining ruled lines are arranged.

The analysis unit 204 analyzes the content of the electronic document files on the basis of the extracted ruled line information. Specifically, it determines how the ruled lines divide the items in each record and distinguishes the heading part from the data part.

The data extraction unit 205 extracts database items and their corresponding data from the one or more electronic document files according to the specified database items and data extraction rule. The extracted database items and corresponding data are stored in the database storage unit 133.

The data type detection unit 206 detects the data type of each extracted database item. The detected data type is either a numeric type, which can be aggregated, or some other data type such as a character type.

The type determination unit 207 determines whether the data type detected by the data type detection unit 206 is a numeric type, because an item whose data type is not numeric cannot be aggregated. The instruction determination unit 208 determines whether an aggregation instruction has been received for a database item whose data type has been determined to be numeric.

When it is determined that no aggregation instruction has been received, the display unit 209 lists the extracted database items and corresponding data on the display device 23. When it is determined that an aggregation instruction has been received, the display unit 209 lists on the display device 23 not only the extracted database items and corresponding data but also the aggregate item added for each database item.

FIG. 3 is a flowchart showing the database generation procedure executed by the CPU 11 of the database generation device 1 according to Embodiment 1 of the present invention. In FIG. 3, the CPU 11 of the database generation device 1 acquires electronic document files each containing one or more tables (step S301). The electronic document files may be read from the storage device 13 or from an external computer via the network 2. Input may also be accepted via an input device such as the keyboard 21 or the mouse 22.

The CPU 11 extracts ruled line information from each of the one or more acquired electronic document files (step S302). Specifically, it recognizes a region enclosed by ruled lines as table data and obtains information on how the remaining ruled lines are arranged.

The CPU 11 analyzes the content of the electronic document files on the basis of the extracted ruled line information (step S303). Specifically, it determines how the ruled lines divide the items in each record and distinguishes the heading part from the data part.

FIG. 4 illustrates processing that extracts table data on the basis of ruled line information. Specifically, the user's designation of the direction in which the data of the electronic document file is to be scanned is accepted, and whether the items are hierarchical is determined one row at a time. FIG. 4(a) shows the case where the table data is scanned in the downward direction 42. In the first row of the item area 41, the items "material name", "weight", "substance", and "ratio" are detected. In the second row, the item "ratio" is split into "average weight" and "maximum weight", so an increase in the number of items is detected.

In the third row no item name can be detected, but it can be detected that the cell positions and the number of items are the same as in the second row. The device can therefore recognize automatically that the heading part extends through the second row and that the data part begins at the third row, and data extraction for generating the new database can be performed from the data part beginning at the third row.

FIG. 4(b) shows the case where the table data is scanned in the rightward direction 44. In the first column of the item area 43, the items "material name", "weight", "substance", and "ratio" are detected. In the second column, the item "ratio" is split into "average weight" and "maximum weight", so an increase in the number of items is detected.

In the third column no item name can be detected, but it can be detected that the cell positions and the number of items are the same as in the second column. The device can therefore recognize automatically that the heading part extends through the second column and that the data part begins at the third column, and data extraction for generating the new database can be performed from the data part beginning at the third column.

In this way, regardless of the scanning direction, the database items to be extracted when generating the database and the cell positions of the corresponding data can be detected accurately from the ruled line information. Tables whose rows and columns are transposed can therefore still be merged into a single database.
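The heading/data decision described for FIG. 4 can be sketched as a scan that stops at the first line whose cell layout repeats that of the preceding line. The code below is a minimal illustration under stated assumptions: each scanned line (a row for downward scanning, a column for rightward scanning) is represented by the `(start, end)` grid positions of its cells as derived from the ruled lines, and the name `first_data_index` and the sample positions are hypothetical.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) grid position of one cell along a scanned line


def first_data_index(lines: Sequence[Sequence[Span]]) -> int:
    """Return the 0-based index of the first line that belongs to the data part.

    A line is treated as data when its cell positions and cell count are
    identical to those of the line before it, i.e. the item layout has
    stopped changing.
    """
    for i in range(1, len(lines)):
        if list(lines[i]) == list(lines[i - 1]):
            return i
    return len(lines)  # layout never repeats: no data part was found


# Layout modeled on FIG. 4: the last item of line 1 is split in two on line 2.
line1: List[Span] = [(0, 1), (1, 2), (2, 3), (3, 5)]
line2: List[Span] = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
line3: List[Span] = list(line2)  # same cell positions and count as line 2

print(first_data_index([line1, line2, line3]))  # prints 2, i.e. the third line
```

Because the function only sees a sequence of lines, the same code serves both scanning directions; only the way the lines are read from the sheet differs. A table whose heading itself contains two consecutive lines with identical layout would be misjudged by this simple rule, which is one case where manual specification would be needed.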

Returning to FIG. 3, the CPU 11 of the database generation device 1 specifies, on the basis of the analysis result, the database items and the data extraction rule of the database to be generated (step S304). The CPU 11 then extracts the database items and corresponding data of the new database from the one or more electronic document files, or from the table data in those files (step S305). Data is extracted according to the stored data extraction rule.

In some cases the database items and the like cannot be specified correctly from the analysis of the ruled line information alone. In such cases the device accepts manual specification of the database items and the data extraction rule. FIG. 5 is a flowchart showing the manual specification procedure executed by the CPU 11 of the database generation device 1 according to Embodiment 1 of the present invention.

In FIG. 5, after completing step S303 of FIG. 3, the CPU 11 of the database generation device 1 accepts the designation of one or more electronic document files (step S501). If a file contains multiple sheets, the CPU 11 accepts the designation of a sheet, and it accepts the designation of the range within the table data on that sheet that is to be merged with other table data (step S502). The CPU 11 specifies the database items and the data extraction rule according to the accepted range designation (step S503) and advances the processing to step S305 of FIG. 3.

FIG. 6 illustrates cases in which range designation is required. FIG. 6(a) shows a case in which the structure of the table data has no particular regularity. In this case, only the area 61 to be used as table data is accepted as the range designation, entered with an input device such as the keyboard 21 or the mouse 22. By adding to the designated range tag information that, for example, links it to the database items of other table data, the content can be extracted correctly as data of a database item in the new database.

FIG. 6(b) shows a case in which the regions are divided in a way that cannot be recognized as table data. In the example of FIG. 6(b), the area 62 that should be recognized as the heading part is not a region that can be recognized as a table, that is, it is not rectangular. In this case, the range designation of an area 63 for each column, including the area 62, is accepted from an input device such as the keyboard 21 or the mouse 22, and tag information is added to the headings "material", "weight", "ratio 1", and "ratio 2" so that they are linked to the database items of other table data. The content can then be extracted as data of database items in the new database.

Returning to FIG. 3, the CPU 11 of the database generation device 1 detects the data type of each extracted database item (step S306) and selects one database item (step S307). The CPU 11 determines whether the data type of the selected database item is a numeric type (step S308).

If the CPU 11 determines that the data type is numeric (step S308: YES), it determines whether an aggregation instruction has been received (step S309). If the CPU 11 determines that an aggregation instruction has been received (step S309: YES), it executes the aggregation processing and adds an aggregate item to the database items (step S310).

If the CPU 11 determines that the data type is not numeric but, for example, a character type (step S308: NO), it skips steps S309 and S310. If the CPU 11 determines that no aggregation instruction has been received (step S309: NO), it skips step S310. In either case, the CPU 11 then determines whether all database items have been selected (step S311).

If the CPU 11 determines that a database item remains unselected (step S311: NO), it selects the next database item (step S312), returns the processing to step S308, and repeats the processing described above. If the CPU 11 determines that all database items have been selected (step S311: YES), it lists the database items, including the aggregate items, together with the corresponding data (step S313).
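The loop over database items in steps S307 to S312 could look roughly like the following. This is a sketch under assumptions: the column-oriented `columns` dictionary, the sets `numeric_items` and `requested`, and the choice of a simple sum as the aggregation are all hypothetical, since the description leaves the aggregation itself to the received instruction.

```python
from typing import Dict, Set


def aggregate_items(
    columns: Dict[str, list],
    numeric_items: Set[str],
    requested: Set[str],
) -> Dict[str, float]:
    """Return the aggregate items to add, keyed by database item name."""
    totals: Dict[str, float] = {}
    for item, values in columns.items():  # S307 / S312: take items one at a time
        if item not in numeric_items:     # S308: NO, so S309 and S310 are skipped
            continue
        if item not in requested:         # S309: NO, so S310 is skipped
            continue
        totals[item] = sum(values)        # S310: aggregate (a sum, as an assumption)
    return totals                         # S311: YES, results are ready to be listed
```

Items of character type never reach the aggregation step, which matches the reason given earlier for checking the data type first.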

The method of determining whether a data type is numeric is not particularly limited. For example, the data corresponding to a database item may be subjected to numeric conversion, and the data type may be judged numeric or not according to whether an error occurs. FIG. 7 is a flowchart showing the data type determination procedure executed by the CPU 11 of the database generation device 1 according to Embodiment 1 of the present invention when numeric conversion is used.

In FIG. 7, the CPU 11 of the database generation device 1 converts the data corresponding to each database item into numeric values (step S701) and determines whether an error has occurred (step S702). If the CPU 11 determines that an error has occurred (step S702: YES), it determines that the data type is a character type (step S703). If the CPU 11 determines that no error has occurred (step S702: NO), it determines that the data type is a numeric type (step S704).
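The conversion-based check of FIG. 7 maps naturally onto exception handling. The following is a minimal sketch, assuming the cell values arrive as strings and that Python's built-in `float()` stands in for the numeric conversion; the function name `detect_type` and the returned labels are hypothetical.

```python
from typing import Iterable


def detect_type(values: Iterable[str]) -> str:
    """Classify one database item as "numeric" or "character"."""
    try:
        for value in values:
            float(value)        # S701: attempt the numeric conversion
    except ValueError:          # S702: YES, a conversion error occurred
        return "character"      # S703
    return "numeric"            # S702: NO leads to S704
```

Note that `float()` also accepts strings such as `"1e3"`, `"inf"`, and `"nan"`, so a real implementation might want a stricter conversion depending on the data at hand.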

FIG. 8 illustrates a case in which a single table contains the same item, holding numeric data, more than once. As shown in FIG. 8(a), for the part 'DTA114E' the item 'Ag' appears twice in one table. When the aim is, as shown in FIG. 8(b), to total the Ag content ratio for each part and to extract the parts to be ordered according to that ratio, merely extracting the database items and the corresponding data is not enough to calculate the Ag content ratio of each part.

Therefore, for the table data of FIG. 8(a), the device accepts an aggregation instruction such as dividing (weight (mg) × average weight % / 100) by the total weight. The desired database items, with the aggregate item added at database generation time, can then be displayed together with the corresponding data.
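As a worked illustration of this aggregation instruction, the sketch below sums weight (mg) × average weight % / 100 over every row of a part that names the substance and divides the result by the total weight. The row layout, the function name `content_ratio`, the interpretation of "total weight" as the total weight of the part, and all numbers are assumptions made for the example; they are not taken from FIG. 8.

```python
from typing import Iterable, Tuple

Row = Tuple[float, str, float]  # (material weight in mg, substance, average weight %)


def content_ratio(rows: Iterable[Row], substance: str, total_weight_mg: float) -> float:
    """Content ratio of one substance in a part, as a fraction of the total weight."""
    contained_mg = sum(
        weight * percent / 100
        for weight, name, percent in rows
        if name == substance
    )
    return contained_mg / total_weight_mg


# Hypothetical part in which 'Ag' appears in two rows, as in the situation of FIG. 8(a).
rows = [(10.0, "Ag", 50.0), (20.0, "Cu", 100.0), (10.0, "Ag", 25.0)]
print(content_ratio(rows, "Ag", total_weight_mg=40.0))  # (5.0 + 2.5) / 40 = 0.1875
```

Summing before dividing is what allows the two 'Ag' rows of one part to contribute to a single figure per part.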

FIG. 9 shows examples of a list of Ag content. FIG. 9(a) shows an example display screen when an aggregation instruction to calculate the Ag content of each part has been received. As the aggregation instruction, the device accepts the input of an arithmetic expression that computes the item "weight" 91 from the item "weight (mg)" and the item "average weight % of material", and the result, the item "weight" 91, is added as a new item.

An aggregation instruction need not be accepted as an arithmetic expression; a selection made with the mouse 22 or the like may be accepted instead. FIG. 9(b) shows an example display screen when an instruction has been received via the mouse 22 to use the item "orderable part" 92 as the aggregation unit and the item "weight (mg)" 93 as the aggregation target. In this case a new item "total weight" 94 is added, and the result of aggregating by the aggregation unit is displayed there. That is, when one value of the item "orderable part" 92 spans several values of the item "weight (mg)" 93, all values of "weight (mg)" 93 belonging to that "orderable part" 92 are added together and the result is displayed in the new item "total weight" 94.
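Aggregating a target item per aggregation unit, as in FIG. 9(b), is a grouped sum. The sketch below assumes each record is a dictionary keyed by item name; the function name `total_by_unit`, the key names, and the part names are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def total_by_unit(
    records: Iterable[Mapping[str, object]],
    unit_key: str,
    target_key: str,
) -> Dict[object, float]:
    """Add up the target item for every record that shares the same aggregation unit."""
    totals: Dict[object, float] = defaultdict(float)
    for record in records:
        totals[record[unit_key]] += float(record[target_key])
    return dict(totals)


records = [
    {"part": "P1", "weight_mg": 10.0},
    {"part": "P1", "weight_mg": 2.5},
    {"part": "P2", "weight_mg": 7.0},
]
print(total_by_unit(records, "part", "weight_mg"))  # {'P1': 12.5, 'P2': 7.0}
```

The returned mapping corresponds to the values that would fill the new "total weight" item, one per aggregation unit.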

As described above, according to Embodiment 1, the data type is determined at the point when the database items and corresponding data have been extracted, so there is no need to determine the data types of items in table data that is not needed. In addition, aggregate items can be added and displayed only for database items whose data type is numeric. Even when the database is based on a plurality of tables with different data formats, a new database can be generated and the aggregation results can be calculated correctly.

(Embodiment 2)
The configuration of the database generation device according to Embodiment 2 of the present invention is the same as in Embodiment 1, so the same reference numerals are used and a detailed description is omitted. Embodiment 2 differs from Embodiment 1 in that, when so-called fluctuation information exists between the tables to be merged, correction processing for the fluctuation information is executed using tag information. Here, "fluctuation information" is a general term for information about differences between tables. Examples are table position fluctuation information, which concerns differences in the position of the start cell of the table data, and item fluctuation information, which concerns differences between items such as a different item order, the presence of a new item, or a missing item.

FIG. 10 explains "table position fluctuation information", where the position of the table data differs. As shown in FIGS. 10(a) to 10(c), the cells 101, 102, and 103 at the upper left of the region containing the ruled lines of the table differ in size, so the cell position of the table data within each electronic document file differs. When table position fluctuation information exists, specifying the data extraction rule as, for example, "the first table from the top", or selecting such a rule from the stored data extraction rules, makes it possible to extract all of the tables shown in FIGS. 10(a) to 10(c).

FIG. 11 explains "item fluctuation information", where the items of the table data differ. Taking FIG. 11(a) as the reference, in FIG. 11(b) the order of item C and item B is swapped. When merging table data using conventional CSV files, the user had to know in advance that the order of item C and item B was swapped and had to issue a swap instruction.

In Embodiment 2, the device detects that the items are swapped and assigns tag information linked to the item names. That is, in the heading part 111 of FIG. 11(a), tag information 'a' is assigned to item A, tag information 'b' to item B, and tag information 'c' to item C. In the heading part 112 of FIG. 11(b), the order of item B and item C is swapped, but the tag information is assigned with the same correspondence as in FIG. 11(a). At extraction time the data is gathered on the basis of tag information 'a', 'b', and 'c', so however the items are arranged in the underlying tables, the new database can gather them in the order of the tag information. The table data can therefore be merged properly even if the user does not know in advance that the order of item C and item B is swapped.

In FIG. 11(c), the heading part 113 contains the new items D and E, while item C is missing. In this case too, assigning tag information 'd' to item D and tag information 'e' to item E ensures that, as long as data is extracted on the basis of the tag information, the wrong items are never gathered together. Newly added items are gathered independently, and for a missing item no data is extracted from that table.
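The tag-based merge described for FIG. 11 can be sketched as follows. For simplicity the item name itself serves as the tag, tags are ordered by first appearance, and a missing item is filled with `None`; these choices, the `Table` representation, and the function name `merge_by_tag` are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

Table = Tuple[List[str], List[List[str]]]  # (heading items, data rows)


def merge_by_tag(tables: List[Table]) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Merge tables whose items may be reordered, added, or missing."""
    # Same item, same tag; a different item gets a different tag.
    tag_order: List[str] = []
    for heading, _ in tables:
        for item in heading:
            if item not in tag_order:
                tag_order.append(item)

    merged: List[List[Optional[str]]] = []
    for heading, rows in tables:
        for row in rows:
            by_tag = dict(zip(heading, row))
            # A tag absent from this table yields None: nothing is extracted for it.
            merged.append([by_tag.get(tag) for tag in tag_order])
    return tag_order, merged


reference = (["A", "B", "C"], [["1", "2", "3"]])       # layout like FIG. 11(a)
swapped = (["A", "C", "B"], [["4", "6", "5"]])         # B and C swapped, like FIG. 11(b)
changed = (["A", "B", "D", "E"], [["7", "8", "9", "10"]])  # D, E added and C missing, like FIG. 11(c)

order, rows = merge_by_tag([reference, swapped, changed])
print(order)  # ['A', 'B', 'C', 'D', 'E']
print(rows[1])  # ['4', '5', '6', None, None]
```

Because every value is placed by its tag rather than by its column position, the swapped table lines up with the reference table without any instruction from the user.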

FIG. 12 is a functional block diagram of the database generation device 1 according to Embodiment 2 of the present invention. The electronic document file acquisition unit 201 acquires electronic document files each containing one or more tables. The electronic document files stored in the storage device 13 may be gathered into the electronic document file storage unit 131, or the files may be acquired from an external computer via the network 2. They may also be entered via an input device such as the keyboard 21 or the mouse 22.

The fluctuation information extraction unit 1201 extracts fluctuation information about differences between tables, which includes at least information about differences in the position of table data extracted from different electronic document files and/or information about differences between database items extracted from different electronic document files. The determination unit 1202 determines, on the basis of the database items, the data extraction rule, and the extracted fluctuation information, whether the data extraction rule contains a changed part. Because the presence of fluctuation information changes the tag-based extraction rule for the table data, when a changed part is found it can be concluded that some fluctuation correction has been applied to the data extraction rule.

When the determination unit 1202 determines that a changed part exists, the tag information adding unit 1203 assigns the same tag information to identical database items and different tag information to different database items. By using tag information in the manner described above, even tables that have fluctuation information can be merged properly.

The data extraction unit 205 extracts the database items and corresponding data in a way that reflects the changed part of the data extraction rule. The data extraction rule includes entries relating to tag information; the database items and corresponding data are extracted according to the tag information and stored in the database storage unit 133.

When it is determined that no aggregation instruction has been received, the display unit 209 arranges the database items according to the tag information assigned to them and lists the extracted database items and corresponding data on the display device 23. When it is determined that an aggregation instruction has been received, the display unit 209 lists on the display device 23 not only the extracted database items and corresponding data but also the aggregate item added for each database item.

FIG. 13 is a flowchart showing the fluctuation information correction procedure executed by the CPU 11 of the database generation device 1 according to Embodiment 2 of the present invention.

In FIG. 13, after executing step S304 of FIG. 3, the CPU 11 of the database generation device 1 extracts fluctuation information about differences between tables, which includes at least information about differences in the position of table data extracted from different electronic document files and/or information about differences between database items extracted from different electronic document files (step S1301). The fluctuation information to be extracted is not limited to these two kinds.

The CPU 11 determines, on the basis of the database items, the data extraction rule, and the extracted fluctuation information, whether the data extraction rule contains a changed part (step S1302). If the CPU 11 determines that no changed part exists (step S1302: NO), it concludes that no correction of the data extraction rule caused by fluctuation information (hereinafter, fluctuation correction) has been performed and advances the processing to step S306 of FIG. 3. If the CPU 11 determines that a changed part exists (step S1302: YES), it assigns the same tag information to identical database items and different tag information to different database items (step S1303).

The CPU 11 extracts the database items and corresponding data in a way that reflects the changed part of the data extraction rule (step S1304), arranges the database items according to the tag information assigned to them (step S1305), and advances the processing to step S306 of FIG. 3.

As described above, according to Embodiment 2, even when fluctuation information exists between tables, data for the same item can be gathered on the basis of the tag information, and a new database can be generated without the user having to know precisely, for example, which items have been swapped. Furthermore, the data type is determined at the point when the database items and corresponding data have been extracted, so there is no need to determine the data types of items in table data that is not needed. Aggregate items can be added and displayed only for database items whose data type is numeric. Even when the database is based on a plurality of tables with different data formats, a new database can be generated and the aggregation results can be calculated correctly.

The present invention is not limited to the embodiments described above, and various changes and improvements are possible within the scope of its gist. For example, when a data extraction rule that should be changed is found, the rule may be changed automatically, or a change instruction entered by the user may be accepted. Likewise, when a data extraction rule has been changed, the database generation processing may be executed again automatically, or a database regeneration instruction entered by the user may be accepted.

A block diagram showing a configuration example of the database generation device according to Embodiment 1 of the present invention.
A functional block diagram of the database generation device according to Embodiment 1 of the present invention.
A flowchart showing the procedure of the database generation processing performed by the CPU of the database generation device according to Embodiment 1 of the present invention.
An illustration of processing that extracts table data based on ruled-line information.
A flowchart showing the procedure of the manual specification processing performed by the CPU of the database generation device according to Embodiment 1 of the present invention.
An illustration of a case in which a range specification is required.
A flowchart showing the procedure of the data type determination processing, when numeric conversion processing is used, performed by the CPU of the database generation device according to Embodiment 1 of the present invention.
An illustration of a case in which one set of table data contains multiple instances of the same item holding numeric data.
An illustration of a list of Ag content.
An explanatory diagram of "table position fluctuation information", in which the positions of table data differ.
An explanatory diagram of "item fluctuation information", in which the items of table data differ.
A functional block diagram of the database generation device according to Embodiment 2 of the present invention.
A flowchart showing the procedure of the fluctuation information correction processing performed by the CPU of the database generation device according to Embodiment 2 of the present invention.

Explanation of symbols

1 Database generation device
2 Network
11 CPU
12 Memory
13 Storage device
14 I/O interface
15 Video interface
16 Portable disk drive
17 Communication interface
18 Internal bus
23 Display device
90 Portable recording medium
100 Computer program
131 Electronic document file storage unit
132 Data extraction rule storage unit
133 Database storage unit
134 Fluctuation information storage unit

Claims

1. A database generation device that generates a new database based on data extracted from one or a plurality of electronic document files including table data, the device comprising:
electronic document file acquisition means for acquiring one or a plurality of electronic document files;
data extraction rule specifying means for specifying the database items and a data extraction rule of the database to be generated;
data extraction means for extracting the database items and corresponding data from the one or the plurality of electronic document files using the specified database items and data extraction rule;
data type detection means for detecting a data type for each extracted database item;
type determination means for determining, for each database item, whether the data type is a numeric type;
instruction determination means for determining whether an aggregation instruction has been accepted for a database item whose data type has been determined to be a numeric type by the type determination means; and
display means for displaying the extracted database items and corresponding data as a list when the instruction determination means determines that the aggregation instruction has not been accepted, and for displaying the database items and corresponding data as a list with an aggregation item added for each database item when the instruction determination means determines that the aggregation instruction has been accepted.

2. The database generation device according to claim 1, wherein the data type detection means includes:
numeric conversion means for performing numeric conversion for each database item;
error determination means for determining whether an error has occurred in the numeric conversion; and
detection means for detecting that the data type of the database item is a character type when the error determination means determines that an error has occurred, and that the data type of the database item is a numeric type when the error determination means determines that no error has occurred.

3. The database generation device further comprising:
fluctuation information extraction means for extracting fluctuation information relating to differences in table data, including at least information relating to differences in the position of table data extracted from different electronic document files and/or information relating to differences in database items extracted from different electronic document files;
judging means for determining, based on the database items, the data extraction rule, and the extracted fluctuation information, whether there is a changed portion of the data extraction rule; and
tag information assigning means for assigning, when the judging means determines that there is a changed portion, the same tag information to the same database items and different tag information to different database items,
wherein the data extraction means extracts the database items and corresponding data in a way that reflects the changed portion of the data extraction rule, and
the display means arranges the database items according to the tag information assigned to the database items and displays the database items and corresponding data as a list.

4. A database generation method executable by a database generation device that generates a new database based on data extracted from one or a plurality of electronic document files including table data, the method comprising:
acquiring one or a plurality of electronic document files;
specifying the database items and a data extraction rule of the database to be generated;
extracting the database items and corresponding data from the one or the plurality of electronic document files using the specified database items and data extraction rule;
detecting a data type for each extracted database item;
determining, for each database item, whether the data type is a numeric type;
determining whether an aggregation instruction has been accepted for a database item whose data type has been determined to be a numeric type; and
displaying the extracted database items and corresponding data as a list when it is determined that the aggregation instruction has not been accepted, and displaying the database items and corresponding data as a list with an aggregation item added for each database item when it is determined that the aggregation instruction has been accepted.

5. The database generation method according to claim 4, wherein:
numeric conversion is performed for each database item;
it is determined whether an error has occurred in the numeric conversion; and
the data type of the database item is detected as a character type when it is determined that an error has occurred, and as a numeric type when it is determined that no error has occurred.

6. The database generation method according to claim 4, wherein:
fluctuation information relating to differences in table data is extracted, the fluctuation information including at least information relating to differences in the position of table data extracted from different electronic document files and/or information relating to differences in database items extracted from different electronic document files;
it is determined, based on the database items, the data extraction rule, and the extracted fluctuation information, whether there is a changed portion of the data extraction rule;
when it is determined that there is a changed portion, the same tag information is assigned to the same database items and different tag information is assigned to different database items;
the database items and corresponding data are extracted in a way that reflects the changed portion of the data extraction rule; and
the database items are arranged according to the tag information assigned to the database items, and the database items and corresponding data are displayed as a list.

7. A computer program executable by a database generation device that generates a new database based on data extracted from one or a plurality of electronic document files including table data, the computer program causing the database generation device to function as:
electronic document file acquisition means for acquiring one or a plurality of electronic document files;
data extraction rule specifying means for specifying the database items and a data extraction rule of the database to be generated;
data extraction means for extracting the database items and corresponding data from the one or the plurality of electronic document files using the specified database items and data extraction rule;
data type detection means for detecting a data type for each extracted database item;
type determination means for determining, for each database item, whether the data type is a numeric type;
instruction determination means for determining whether an aggregation instruction has been accepted for a database item whose data type has been determined to be a numeric type by the type determination means; and
display means for displaying the extracted database items and corresponding data as a list when the instruction determination means determines that the aggregation instruction has not been accepted, and for displaying the database items and corresponding data as a list with an aggregation item added for each database item when the instruction determination means determines that the aggregation instruction has been accepted.

8. The computer program according to claim 7, wherein the data type detection means is caused to function as:
numeric conversion means for performing numeric conversion for each database item;
error determination means for determining whether an error has occurred in the numeric conversion; and
detection means for detecting that the data type of the database item is a character type when the error determination means determines that an error has occurred, and that the data type of the database item is a numeric type when the error determination means determines that no error has occurred.

9. The computer program according to claim 7, wherein the database generation device is caused to function as:
fluctuation information extraction means for extracting fluctuation information relating to differences in table data, including at least information relating to differences in the position of table data extracted from different electronic document files and/or information relating to differences in database items extracted from different electronic document files;
judging means for determining, based on the database items, the data extraction rule, and the extracted fluctuation information, whether there is a changed portion of the data extraction rule; and
tag information assigning means for assigning, when the judging means determines that there is a changed portion, the same tag information to the same database items and different tag information to different database items,
the data extraction means is caused to function as means for extracting the database items and corresponding data in a way that reflects the changed portion of the data extraction rule, and
the display means is caused to function as means for arranging the database items according to the tag information assigned to the database items and displaying the database items and corresponding data as a list.