JP4832952B2

JP4832952B2 - Database analysis system, database analysis method and program

Info

Publication number: JP4832952B2
Application number: JP2006131629A
Authority: JP
Inventors: 令子森山
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-05-10
Filing date: 2006-05-10
Publication date: 2011-12-07
Anticipated expiration: 2026-05-10
Also published as: JP2007304796A

Description

The present invention relates to a technique for making database system integration more efficient, for example when different systems that handle the same business operations are integrated.

A conventional database integration method includes means for comparing the attribute values held by the attributes of tables belonging to a plurality of different databases and for generating, based on the comparison results, a mapping model that associates the attribute items in the respective tables with one another. In producing the comparison results, the attribute values are compared and the attribute items in the tables are associated starting from those with the highest degree of match; data integration is performed using the resulting mapping model (see, for example, Patent Document 1).
Patent Document 1: JP 2004-086782 A, pages 1 to 8, FIG. 1

Because the conventional database system integration method finds same-item candidates by examining the attribute value information of tables, creating a mapping model, and calculating a degree of match, attribute value information of the tables must be supplied for every database system to be integrated. As the systems grow, the amount of information that must be supplied grows with them, and the user must also manage consistency when, for example, the database format changes.

A main object of the present invention is to solve the problems described above and to reduce the man-hours needed for user input and maintenance.
Specifically, a main object is to realize a mechanism by which the user can learn the same-item candidates for database system integration merely by supplying information on one or more frequently used key items, without having to supply table information.

A database analysis system according to the present invention is
a database analysis system that analyzes a first database and a second database, each of which contains a plurality of data items, the system comprising:
a common attribute data item information input unit that inputs common attribute data item information indicating, as common attribute data items, those data items, among the plurality of data items contained in the first database and the plurality of data items contained in the second database, whose item attributes are common to the first database and the second database;
a first database correlation analysis unit that uses the common attribute data item information input by the common attribute data item information input unit and an access log of the first database to analyze the correlation between the common attribute data item of the first database and each data item of the first database other than the common attribute data item, and that generates first correlation information indicating the correlation value of each data item with respect to the common attribute data item;
a second database correlation analysis unit that uses the common attribute data item information input by the common attribute data item information input unit and an access log of the second database to analyze the correlation between the common attribute data item of the second database and each data item of the second database other than the common attribute data item, and that generates second correlation information indicating the correlation value of each data item with respect to the common attribute data item; and
a common attribute data item candidate extraction unit that, based on the correlation value of each data item with respect to the common attribute data item indicated in the first correlation information and the correlation value of each data item with respect to the common attribute data item indicated in the second correlation information, extracts, from the data items of the first database other than the common attribute data item and the data items of the second database other than the common attribute data item, candidates for data items whose item attributes are common to the first database and the second database, as common attribute data item candidates.

According to the present invention, the access log of the first database and the access log of the second database are used to obtain the correlation between the common attribute data items and the other data items, and common attribute data item candidates in the two databases are extracted from the similarity of those correlations. The man-hours required for database system integration can therefore be reduced.

Embodiment 1.
FIG. 1 is a diagram showing a configuration example of database analysis system 100 according to the present embodiment.
In FIG. 1, database analysis system 100 according to the present embodiment can be broadly divided into company A system 1, company B system 2, and same-item candidate extraction and display system 14.

Processing blocks 1 and 2 represent the processing blocks in the company A system and the company B system, respectively, which are the systems to be integrated. Although not illustrated, the database of the company A system is an example of the first database, and the database of the company B system is an example of the second database.
User additional information 8 is information indicating the items common to the database systems of company A system 1 and company B system 2. That is, user additional information 8 indicates those common items (common attribute data items) whose item attributes are shared between the two databases, from among the plurality of data items contained in the database of the company A system (the first database) and the plurality of data items contained in the database of the company B system (the second database). It is an example of common attribute data item information.
Company A system 1 and company B system 2 each include user additional information input unit 12 (common attribute data item information input unit) for inputting user additional information 8.
Access log 3 is the database access log recorded while the business applications of each system are running. Access log 3 of the company A system is an example of the access log of the first database, and access log 3 of the company B system is an example of the access log of the second database.
Access log processing unit 4 is a processing unit that extracts the information needed for analysis from access log 3.
Analysis DB (database) 5 is the group of analysis databases generated by access log processing unit 4.
Correlation extraction unit 6 is a processing unit that obtains correlations from analysis DB 5 and the information specified in user additional information 8.
Item correlation information 7 is the correlation information obtained by correlation extraction unit 6.
In the present embodiment, access log processing unit 4, analysis DB 5, and correlation extraction unit 6 constitute a database correlation analysis unit. Those of the company A system constitute the first database correlation analysis unit, and those of the company B system constitute the second database correlation analysis unit.
Using data item information 13, which indicates the data items in the database of the respective system, together with user additional information 8 and access log 3 of the respective database, access log processing unit 4, analysis DB 5, and correlation extraction unit 6 analyze the correlation between the common items and each of the other data items in that database, and generate correlation information 7 indicating the correlation value of each data item with respect to the common items.
Correlation information 7 output by correlation extraction unit 6 of the company A system is an example of the first correlation information, and correlation information 7 output by correlation extraction unit 6 of the company B system is an example of the second correlation information.

Same-item candidate extraction unit 9 is a processing unit that finds same-item candidates from correlation information 7 of company A system 1 and correlation information 7 of company B system 2. Same-item candidate display unit 11 (display unit) displays same-item candidates 10, the information obtained by same-item candidate extraction unit 9.
Based on the correlation value of each data item with respect to the common items indicated in correlation information 7 of the company A system and the correlation value of each data item with respect to the common items indicated in correlation information 7 of the company B system, same-item candidate extraction unit 9 extracts same-item candidates (common attribute data item candidates) that are likely to share item attributes between the two databases, and same-item candidate display unit 11 displays them. Same-item candidate extraction unit 9 is an example of the common attribute data item candidate extraction unit.

The present embodiment considers the integration of database systems, for example when different systems that perform the same business operations are merged. The user supplies in advance common information about representative items (key items), such as name and branch number, in both databases to be integrated. Access logs of the database systems are collected while the business applications run, and correlation analysis of these logs with respect to the key items reveals the correlations between items in each system. For both database systems, same-item candidates for database integration are found from the correlation information on the key items and displayed, which improves efficiency during data integration.

The operation of database analysis system 100 according to the present embodiment is outlined here with reference to FIG. 16.
First, in each of company A system 1 and company B system 2, user additional information input unit 12 inputs user additional information 8, which indicates the common items whose item attributes are already known to be shared between the two companies' databases (S1601). As detailed later, user additional information 8 indicates the known common items, for example as shown in FIG. 3.
Next, access log processing unit 4 analyzes the SQL statements in access log 3 (S1602) (first database correlation analysis step) (second database correlation analysis step). As detailed later and as shown in FIG. 2, access log 3 records the SQL statements issued when each database was accessed, and the unit analyzes, among other things, how many times each data item appears in those statements.
Next, correlation extraction unit 6 uses the analysis results of access log processing unit 4, user additional information 8, and so on to analyze, item by item, the correlation between the common items and the other items in the database of each system, and generates correlation information 7 indicating the correlation value of each item with respect to the common items (S1603) (first database correlation analysis step) (second database correlation analysis step). Steps S1601 to S1603 are carried out in each of company A system 1 and company B system 2.
Next, in same-item candidate extraction and display system 14, same-item candidate extraction unit 9 calculates similarity values from the correlation values in correlation information 7 from company A system 1 and correlation information 7 from company B system 2 and, according to the calculated similarity values, extracts same-item candidates that are likely to share item attributes between the database of the company A system and the database of the company B system (S1604) (common attribute data item candidate extraction step). Same-item candidate display unit 11 then displays the same-item candidates (S1605).
In this way, same-item candidates that are likely to be the same item across multiple databases are extracted, improving efficiency during database integration.
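The candidate extraction of S1604 can be illustrated with a small sketch. This passage does not define how the similarity value is computed, so the formula below (one minus half the mean absolute difference of correlation values) is purely an assumption for illustration, as are the function names, the item names, and the idea of representing each non-key item as a list of correlation values aligned by key item.

```python
from typing import Dict, List, Tuple

def similarity(vec_a: List[float], vec_b: List[float]) -> float:
    """Assumed similarity: 1 minus half the mean absolute difference.

    Correlation values lie in [-1, 1], so each difference lies in [0, 2]
    and the result lies in [0, 1]. This is NOT taken from the patent text.
    """
    if not vec_a or len(vec_a) != len(vec_b):
        return 0.0
    diffs = [abs(a - b) for a, b in zip(vec_a, vec_b)]
    return 1.0 - sum(diffs) / (2.0 * len(diffs))

def rank_candidates(
    corr_a: Dict[str, List[float]],
    corr_b: Dict[str, List[float]],
    top_n: int = 3,
) -> List[Tuple[float, str, str]]:
    """Score every (company A item, company B item) pair and keep the best."""
    pairs = [
        (similarity(vec_a, vec_b), name_a, name_b)
        for name_a, vec_a in corr_a.items()
        for name_b, vec_b in corr_b.items()
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:top_n]

# Hypothetical correlation values of non-key items against two key items.
corr_a = {"A.A2": [0.9, 0.1], "A.A3": [-0.2, 0.7]}
corr_b = {"X.P": [0.8, 0.2], "X.Q": [-0.1, 0.6]}
top = rank_candidates(corr_a, corr_b, top_n=2)
```

Any measure that rewards items whose correlation profile against the shared key items looks alike could be substituted for `similarity`.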

FIG. 1 shows an example in which company A system 1, company B system 2, and same-item candidate extraction and display system 14 are each implemented on a physically separate computer, but these elements may be consolidated so that the database analysis system is realized on a single computer.

Next, a hardware configuration example of database analysis system 100, company A system 1, company B system 2, and same-item candidate extraction and display system 14 according to the present embodiment is described.

FIG. 17 is a diagram showing an example of the hardware resources of database analysis system 100, company A system 1, company B system 2, and same-item candidate extraction and display system 14 described in the present embodiment and the embodiments that follow.
In FIG. 17, database analysis system 100, company A system 1, company B system 2, and same-item candidate extraction and display system 14 each include CPU 911 (Central Processing Unit; also called a central processing device, processing device, arithmetic device, microprocessor, microcomputer, or processor), which executes programs. CPU 911 is connected via bus 912 to, for example, ROM (Read Only Memory) 913, RAM (Random Access Memory) 914, communication board 915, display device 901, keyboard 902, mouse 903, and magnetic disk device 920, and controls these hardware devices. CPU 911 may also be connected to FDD 904 (Flexible Disk Drive), compact disc device 905 (CDD), printer device 906, and scanner device 907. A storage device such as an optical disc device or a memory card reader/writer may be used in place of magnetic disk device 920.
RAM 914 is an example of volatile memory. The storage media of ROM 913, FDD 904, CDD 905, and magnetic disk device 920 are examples of nonvolatile memory. These are examples of a storage device or storage unit.
Communication board 915, keyboard 902, scanner device 907, FDD 904, and the like are examples of an input unit or input device.
Communication board 915, display device 901, printer device 906, and the like are examples of an output unit or output device.

As shown in FIG. 1, communication board 915 is connected to a network. For example, communication board 915 may be connected to a LAN (local area network), the Internet, a WAN (wide area network), or the like.
Magnetic disk device 920 stores operating system 921 (OS), window system 922, program group 923, and file group 924. The programs in program group 923 are executed by CPU 911, operating system 921, and window system 922.

Program group 923 stores the programs that execute the functions described below as "... unit" or "... means". The programs are read and executed by CPU 911.
File group 924 stores the information, data, signal values, variable values, and parameters described below as "determination result of ...", "calculation result of ...", "processing result of ...", "evaluation result of ...", and so on, as items of a "... file" or "... database". The "... file" and "... database" are stored on a recording medium such as a disk or memory. The information, data, signal values, variable values, and parameters stored on a storage medium such as a disk or memory are read into main memory or cache memory by CPU 911 via a read/write circuit and are used in CPU operations such as extraction, search, reference, comparison, operation, calculation, processing, editing, output, printing, and display. During these CPU operations, the information, data, signal values, variable values, and parameters are held temporarily in main memory, registers, cache memory, buffer memory, and the like.
The arrows in the flowcharts described below mainly indicate the input and output of data and signals. Data and signal values are recorded on recording media such as the memory of RAM 914, the flexible disk of FDD 904, the compact disc of CDD 905, the magnetic disk of magnetic disk device 920, and other optical discs, MiniDiscs, and DVDs. Data and signals are also transmitted online via bus 912, signal lines, cables, and other transmission media.

What is described below as "... unit" or "... means" may be a "... circuit", "... device", "... equipment", or "means", and may also be a "... step", "... procedure", or "... process". That is, what is described as "... unit" or "... means" may be implemented in firmware stored in ROM 913. Alternatively, it may be implemented in software alone, in hardware alone (elements, devices, boards, wiring, and the like), in a combination of software and hardware, or further in combination with firmware. Firmware and software are stored as programs on recording media such as magnetic disks, flexible disks, optical discs, compact discs, MiniDiscs, and DVDs. A program is read and executed by CPU 911. That is, a program causes a computer to function as the "... unit" or "... means" described below, or causes a computer to execute the procedure or method of the "... unit" or "... means" described below.

As described above, database analysis system 100, company A system 1, company B system 2, and same-item candidate extraction and display system 14 described in the present embodiment and the embodiments that follow are computers equipped with a CPU as a processing device; memory, a magnetic disk, and the like as storage devices; a keyboard, a mouse, and the like as input devices; and a display device, a communication board, and the like as output devices. The functions described as "... unit" or "... means" are realized using these processing, storage, input, and output devices.

Next, the operation of database analysis system 100 according to the present embodiment is described in detail.
First, access log 3 is collected in each of company A system 1 and company B system 2. The log consists of the SQL statements with which the business applications accessed the database system, collected over a fixed period. Any means of collection may be used, such as a collection tool provided by the database system, a commercially available tool, or a tool bundled with the OS.
FIG. 2 shows an example of a business application access log.
Reference numeral 21 denotes the file format, and 22 denotes the access log. Each row consists of row ID 23, which identifies the access log entry, and information 24, the SQL statement issued by the business application. Only SQL statements for extraction, deletion, insertion, and update are covered.
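A minimal sketch of reading one access log row is given below. The text says only that each row holds a row ID and an SQL statement; the tab separator, the `LogRow` type, and the function name are assumptions made for illustration, since the actual file format of FIG. 2 is not reproduced here.

```python
from typing import NamedTuple, Optional

# Only these four statement kinds are covered, as stated in the text.
DML_KEYWORDS = ("SELECT", "DELETE", "INSERT", "UPDATE")

class LogRow(NamedTuple):
    row_id: int   # corresponds to row ID 23
    dml: str      # leading keyword of the statement
    sql: str      # corresponds to SQL statement information 24

def parse_log_line(line: str) -> Optional[LogRow]:
    """Parse 'row_id<TAB>sql' (assumed layout); return None if out of scope."""
    row_id, _, sql = line.strip().partition("\t")
    sql = sql.strip()
    if not row_id.isdigit() or not sql:
        return None
    dml = sql.split(None, 1)[0].upper()
    if dml not in DML_KEYWORDS:
        return None  # e.g. DDL statements are ignored
    return LogRow(int(row_id), dml, sql)

row = parse_log_line("1\tSELECT A.A1, A.A2 FROM A WHERE A.A2 = 5")
```

Rows that fail to parse are dropped rather than raising an error, which keeps a long log scan from stopping on a single malformed entry.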

User additional information 8 is described here.
As shown in FIG. 3, when company A database 31 and company B database 32 exist, the common item information of the two companies is specified as in 33: each entry gives a meaning together with the corresponding item name in the company A system and the corresponding item name in the company B system. User additional information 8 may be specified by any method, such as input through a GUI (Graphical User Interface) or by designating a file.

Access log processing unit 4 generates analysis DB 5 from an access log such as that shown in FIG. 2; that is, it creates the tables shown in FIGS. 5 to 7. An example of the operation of access log processing unit 4 follows the flow shown in FIG. 4.
First, item information table 50 shown in FIG. 5 is created by the processing of S41 to S44.
Analysis database 50 shown in FIG. 5 (the item information table) associates item ID 51 with item name 52 for every item appearing in the access log, and one such table exists for each of company A database 31 and company B database 32. For items that have the same meaning, such as A.A1 and B.B3, or B.B1 and C.C1, as specified in 33, a plurality of item names are associated with a single item ID.
In S41, the access log is read one row at a time. In S42, each item contained in the row just read is checked to see whether it is already registered in the item information table, and any item not yet registered is registered in S43. At this point, common item information 33 in user additional information 8 is consulted and reflected in the table. These operations are repeated up to the last row of the file. When S44 determines that the end of the file has been reached, item information table 50 is complete.
Then, in S45, the file cursor is returned to the first row, and in S46 to S49 analysis DB_I 60 shown in FIG. 6 and analysis DB_II 70 shown in FIG. 7 are created.
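The construction of the item information table in S41 to S44 can be sketched as follows. The sketch assumes that item names appear in the log in qualified `Table.Column` form and extracts them with a simple regular expression; a real SQL parser would be more robust. The function name and the `synonym_groups` parameter, which stands in for common item information 33, are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Assumption: items are written as Table.Column (e.g. A.A1) in the log.
ITEM_PATTERN = re.compile(r"\b[A-Za-z_]\w*\.[A-Za-z_]\w*\b")

def build_item_table(
    sql_statements: Iterable[str],
    synonym_groups: List[List[str]],
) -> Dict[str, int]:
    """Map each item name to an item ID; names in one group share one ID."""
    # Point every name in a synonym group at the group's first name.
    alias = {name: group[0] for group in synonym_groups for name in group}
    item_ids: Dict[str, int] = {}       # item name -> item ID
    canonical_ids: Dict[str, int] = {}  # group representative -> item ID
    for sql in sql_statements:                    # S41: read row by row
        for name in ITEM_PATTERN.findall(sql):
            if name in item_ids:                  # S42: already registered?
                continue
            canonical = alias.get(name, name)
            if canonical not in canonical_ids:
                canonical_ids[canonical] = len(canonical_ids) + 1
            item_ids[name] = canonical_ids[canonical]  # S43: register
    return item_ids

log = [
    "SELECT A.A1, A.A2 FROM A WHERE A.A2 = 1",
    "SELECT B.B3 FROM B WHERE B.B1 = 2",
]
table = build_item_table(log, synonym_groups=[["A.A1", "B.B3"]])
```

In this example A.A1 and B.B3 receive the same item ID because they are declared to have the same meaning, while A.A2 and B.B1 each receive their own.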

FIGS. 6 and 7 summarize the information in SQL statements 24 for each row ID 23. Because the access log file shown in FIG. 2 covers extraction, deletion, insertion, and update, each statement begins with one of the words SELECT, DELETE, INSERT, or UPDATE. This word is encoded as a number and recorded as the "DML type". In many cases a statement is executed against the rows that satisfy a particular condition, so a conditional clause specifying the condition is present. In FIG. 2, the WHERE keyword in each SQL statement 24 marks the start of the conditional clause. For each SQL statement in FIG. 2, the information obtained from the part before the conditional clause is stored in analysis DB_I 60 of FIG. 6, and the information obtained from the conditional clause is stored in analysis DB_II 70 of FIG. 7. Analysis DB_I 60 and analysis DB_II 70 have the same format, consisting of row ID 23, DML type 61, and item information 62 or 72. Row ID 23 in analysis databases 60 and 70 corresponds to row ID 23 in FIG. 2, and DML type 61 is the numeric code for the SQL command SELECT, DELETE, INSERT, or UPDATE. Item information 62 and 72 list the item IDs 51 side by side without duplication.

Next, in S46, the access log is read one line at a time. In S47, all item names specified in the part of the statement before the conditional clause, that is, before WHERE (or in the entire statement when it contains no WHERE), are extracted. Following the format of the analysis DB_I 60, the row ID 23, the DML type 61, and the item information 62 are created and added to the analysis DB_I 60. In the item information 62, each item ID is set to True when the corresponding item is specified and to False when it is not. For example, in the first line of FIG. 2, A1, A2, A3, and A4 appear before WHERE. Reflecting this, items 1, 2, 3, and 4 are True in FIG. 6 (items 3 and 4 are not shown), and all other items are False.
In S48, all item names specified in the part of the statement from the conditional clause onward, that is, from WHERE onward, are extracted. Following the format of the analysis DB_II 70, the row ID 23, the DML type 61, and the item information 72 are created and added to the analysis DB_II 70. In the item information 72, each item ID is set to True when the corresponding item is specified and to False when it is not. When the statement has no WHERE, every item ID in the item information 72 is set to False. For example, in the first line of FIG. 2, A2 appears after WHERE. Reflecting this, item 2 is True in FIG. 7, and all other items are False.
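Steps S46 to S48 can be sketched as follows. This is only an illustration under stated assumptions: the numeric DML codes, the record layout, the function name `analyze_statement`, and the sample SQL text are hypothetical (the actual contents of FIG. 2 are not reproduced here), and item names are matched as plain identifiers.

```python
import re

# Assumed numbering of the DML type; the source only says the keyword is "numbered".
DML_TYPES = {"SELECT": 1, "DELETE": 2, "INSERT": 3, "UPDATE": 4}


def analyze_statement(row_id, sql, item_ids):
    """Split one logged SQL statement at WHERE and flag the items named in each part.

    item_ids maps an item name (e.g. "A1") to its item ID.
    Returns (record for analysis DB_I, record for analysis DB_II).
    """
    dml = DML_TYPES[sql.split(None, 1)[0].upper()]
    parts = re.split(r"\bWHERE\b", sql, maxsplit=1, flags=re.IGNORECASE)
    before = parts[0]
    after = parts[1] if len(parts) > 1 else ""  # no WHERE: every flag stays False

    def flags(text):
        names = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
        result = {iid: False for iid in item_ids.values()}
        for name, iid in item_ids.items():
            if name in names:
                result[iid] = True
        return result

    db1 = {"row_id": row_id, "dml": dml, "items": flags(before)}
    db2 = {"row_id": row_id, "dml": dml, "items": flags(after)}
    return db1, db2


# Hypothetical item table and log line, loosely modeled on the first-line example.
ITEM_IDS = {"A1": 1, "A2": 2, "A3": 3, "A4": 4, "B1": 8}
rec1, rec2 = analyze_statement(1, "SELECT A1, A2, A3, A4 FROM A WHERE A2 = 10", ITEM_IDS)
```

With this input, items 1 to 4 are True in the DB_I record and only item 2 is True in the DB_II record, matching the first-line example described above.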

When the access log processing unit 4 has finished and the analysis DB 5 has been created, the correlation extraction unit 6 runs with the analysis DB 5 as its input. FIG. 8 shows the processing flow of the correlation extraction unit 6.
In S81, the list of item names is organized for the information 33 specified in the user additional information 8.
Organizing the example 33 for the company A database gives No. 1 = A.A1, No. 2 = A.A4, No. 3 = B.B1, and No. 4 = C.C2.
Steps S82 to S85 are performed for the first through fourth entries. For the first entry, A.A1, for example, S82 uses the data of the analysis DB_I 60 with A.A1 as the target and all other item IDs as inputs, and S83 performs the correlation analysis and outputs the result to a file.
In S84, the analysis conditions are changed, for example by switching the analysis target to the analysis DB_II 70 or by specifying a condition on the DML type 61, after which S83 performs the correlation analysis again and outputs the result to a file. Once the correlation analysis has been performed for all conditions, S85 collects the results, for example in the format of table 90 in FIG. 9. This processing is performed for every item organized in S81, and the result is output to a file as the item correlation information 7.
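The loop over S82 to S84 can be illustrated as below. The source does not name the correlation measure, so this sketch assumes the Pearson coefficient computed on 0/1 flags (the phi coefficient); the record layout and the function names `phi` and `correlate_target` are likewise hypothetical.

```python
from math import sqrt


def phi(xs, ys):
    """Pearson correlation of two 0/1 sequences (assumed measure); 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlate_target(rows, target, dml=None):
    """Correlate the target item ID against every other item ID.

    rows: records such as {"dml": 1, "items": {item_id: bool}} taken from
    analysis DB_I or DB_II. dml optionally restricts the rows to one DML type,
    which is one way of "changing the analysis conditions" as in S84.
    """
    selected = [r for r in rows if dml is None or r["dml"] == dml]
    if not selected:
        return {}
    t = [int(r["items"][target]) for r in selected]
    return {
        iid: phi(t, [int(r["items"][iid]) for r in selected])
        for iid in selected[0]["items"]
        if iid != target
    }


rows = [
    {"dml": 1, "items": {1: True, 2: True, 3: False}},
    {"dml": 1, "items": {1: True, 2: True, 3: False}},
    {"dml": 1, "items": {1: False, 2: False, 3: True}},
    {"dml": 2, "items": {1: False, 2: False, 3: True}},
]
case_all = correlate_target(rows, target=1)            # one condition
case_select = correlate_target(rows, target=1, dml=1)  # another condition
```

Each call corresponds to one "case" in the result table; running it once per condition and per target item yields the kind of data collected in S85.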

The above processing is performed for both the company A system 1 and the company B system 2 of FIG. 1, generating the item correlation information 7 for the company A system and the item correlation information 7 for the company B system. FIG. 9 shows an example. When four pieces of user additional information are given, as in 33 of FIG. 3, correlation analysis is performed on four items in each of the two systems. Among the correlation analysis results, 90 and 94, 91 and 95, 92 and 96, and 93 and 97 correspond to each other. For 90 and 94, FIG. 9 shows example results of correlation analysis performed under three conditions in S84.
In FIG. 9, the correlation result 91 for A.A4, the correlation result 92 for B.B1, and the correlation result 93 for C.C2 in the correlation information 7 are not shown in detail, but they have the same structure as the correlation result 90 for A.A1. Likewise, in the correlation information of system B, the correlation result 95 for X.X5, the correlation result 96 for Y.Y1, and the correlation result 97 for Z.Z3 have the same structure as the correlation result 94 for X.X1.

Next, the same-item candidate extraction unit 9 receives the correlation information 7 of the company A system 1 and the correlation information 7 of the company B system 2, and first converts the information in the correlation analysis results 90 to 97 into scores.
For example, if three conditions were specified in S84, then for the correlation result tables 90 and 94 the simple sum of the correlation coefficients over cases 1 to 3 is computed for each item and compiled into tables such as 102 and 104 in FIG. 10. In addition, the number of times two items were regarded as correlated at the same time in each case is tallied and compiled into tables such as 103 and 105 in FIG. 10.
For example, in case 1 of the correlation result 90, A.A2, A.A4, B.B2, B.B5, and C.C2 were regarded as correlated with A.A1 at the same time. Each of the following ten pairs is therefore counted once: A.A2 and A.A4, A.A2 and B.B2, A.A2 and B.B5, A.A2 and C.C2, A.A4 and B.B2, A.A4 and B.B5, A.A4 and C.C2, B.B2 and B.B5, B.B2 and C.C2, and B.B5 and C.C2. When three conditions are specified in S84, the maximum count is 3. The same calculation is performed for the correlation result tables 91 to 93 and 95 to 97, giving 106 to 117. The results for system A are shown as 100, and the results for system B as 101.
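The two tallies described above (the simple sum per item and the pair counts) can be sketched as follows. The criterion for "regarded as correlated" is not stated in this passage, so a fixed threshold on the coefficient is assumed purely for illustration; the function name and input layout are also hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def summarize_cases(cases, threshold=0.5):
    """cases: one {item_name: coefficient} dict per analysis condition (case).

    Returns (simple sum of coefficients per item,
             number of cases in which each pair of items was correlated together).
    threshold is an assumed cutoff for treating an item as correlated.
    """
    simple_sum = defaultdict(float)
    pair_counts = defaultdict(int)
    for case in cases:
        for item, coeff in case.items():
            simple_sum[item] += coeff
        correlated = sorted(item for item, c in case.items() if c >= threshold)
        for pair in combinations(correlated, 2):
            pair_counts[pair] += 1
    return dict(simple_sum), dict(pair_counts)


# Hypothetical coefficients against the target A.A1, identical in all three cases.
case = {"A.A2": 0.75, "A.A3": 0.25, "A.A4": 0.75, "B.B2": 0.75, "B.B5": 0.75, "C.C2": 0.75}
sums, pairs = summarize_cases([case, case, case])
```

Five correlated items give ten pairs, and with three cases no pair count can exceed 3, in line with the description.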

From the simple-sum tables 102 and 104 and the user information A.A1 = X.X1 entered in 33, some of the items in table 102 and some of the items in table 104 can be considered possible same-item candidates. The value and rank of the simple sum of correlation coefficients, together with information on the other items specified at the same time, are therefore stored as the correlation detail information of each item. This is compiled for every item specified in the user additional information, that is, for every item ID subjected to correlation analysis. In the example 33 of FIG. 3, four items are compiled for system A: item ID = 1 (A.A1 or B.B3), item ID = 4 (A.A4), item ID = 8 (B.B1 or C.C1), and item ID = 12 (C.C2). Four items are likewise compiled for system B.
FIG. 11 shows an example of the correlation detail information.
118 is an example of the correlation detail information for the company A system. The vertical axis lists the item names of the company A database 31 excluding the items specified in the user additional information 33, and the horizontal axis lists the items specified in the user additional information 33. For 33 of FIG. 3, the horizontal axis has four items. Where examining the correlation yielded useful information, the cell holds an ID number of the correlation detail information table 119. The correlation detail information table 119 compiles the information of FIG. 10 and contains an identifying ID number, the name of the item whose correlation was examined, the correlation coefficient simple sum, the rank, the count, and the simultaneous item designation information. The correlation detail information 118 and the correlation detail information table 119 are created in the same way for the items of the company B database 32.

Next, the contents of the correlation detail information table 119 are converted to a score for each ID and compiled in the same format as 118.
FIG. 12 shows an example. 120 is an example of the correlation result scoring table for the company A database 31, and 121 is an example of the correlation result scoring table for the company B database 32.

One possible scoring procedure is as follows.
1) In table 119 of FIG. 11, for each identical "item name" (for example, A.A1), compute the deviation values (standard scores) of the correlation coefficient simple sum, the rank, and the count. In the example of FIG. 11, deviation values are computed among the four rows ID = 1 to ID = 4, among the five rows ID = 5 to ID = 9, among the four rows ID = 10 to ID = 13, and among the six rows ID = 14 to ID = 19. The rank is converted to reverse (descending) order before its deviation value is computed. This yields three deviation values for each ID.
2) For each ID, compute the sum of the three deviation values.
3) For each "item name", compute the total of the correlation coefficient simple sums, "correlation coefficient simple sum_item name", and compute the average of all "correlation coefficient simple sum_item name" values.
4) Multiply the sum of the three deviation values (correlation coefficient simple sum, rank, count) obtained in 2) by "correlation coefficient simple sum_item name" ÷ "correlation coefficient simple sum_item name" obtained in 3), round the product to a score in 5-point increments, and compile the scores in a format such as table 120 of FIG. 12.
Note that tables 120 and 121 of FIG. 12 only illustrate what the result of the scoring calculation looks like. They do not show the result of applying the above calculation to the numbers in FIG. 11.
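The scoring procedure can be sketched as below, with several assumptions labeled loudly. The deviation value is taken as the usual standard score 50 + 10(x - mean)/stddev with the population standard deviation. The rank is reversed as n + 1 - rank. The multiplier in step 4 is written in the source as a quantity divided by itself; this sketch assumes the intended ratio is the per-item total divided by the average of all per-item totals. All function and key names are hypothetical.

```python
from statistics import mean, pstdev


def deviation_values(xs):
    """Standard scores 50 + 10 * (x - mean) / stddev; 50 for every entry if all are equal."""
    m, s = mean(xs), pstdev(xs)
    return [50.0 if s == 0 else 50 + 10 * (x - m) / s for x in xs]


def item_weights(totals):
    """Assumed step-4 multiplier: per-item total divided by the average of all totals."""
    avg = mean(totals.values())
    return {name: total / avg for name, total in totals.items()}


def score_group(rows, weight=1.0):
    """Score the rows of table 119 that share ONE item name.

    rows: dicts with keys "id", "sum", "rank", "count".
    Returns {id: score rounded to a multiple of 5}.
    """
    n = len(rows)
    dv_sum = deviation_values([r["sum"] for r in rows])
    dv_rank = deviation_values([n + 1 - r["rank"] for r in rows])  # assumed rank reversal
    dv_count = deviation_values([r["count"] for r in rows])
    return {
        r["id"]: 5 * round((a + b + c) * weight / 5)
        for r, a, b, c in zip(rows, dv_sum, dv_rank, dv_count)
    }


# Hypothetical rows for one item name.
rows = [
    {"id": 1, "sum": 1.0, "rank": 2, "count": 1},
    {"id": 2, "sum": 3.0, "rank": 1, "count": 3},
]
scores = score_group(rows)
weights = item_weights({"A.A1": 2.0, "A.A4": 4.0})
```

Python's `round` rounds exact halves to the nearest even integer, so a product falling exactly halfway between two multiples of 5 may round either up or down; the source does not specify the rounding rule.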

Next, similarity values are calculated by scoring the contents of the correlation result scoring tables 120 and 121 together with the simultaneous item designation information of the correlation detail information table 119. Pairs that satisfy a given condition are output as same-item candidates 10 and are displayed by the same-item candidate display unit 11.
For example, similarity values are calculated from tables 120 and 121 of FIG. 12, and the same-item candidates and their similarity values (scores) are shown as in table 123 of FIG. 12.

One possible procedure for calculating the similarity value is as follows.
1) Because the common item information 33 establishes that corresponding columns of tables 120 and 121 in FIG. 12 represent the same items, the processing focuses on those columns. For each row of table 120 (each item of the company A system), compute the share (%) of the row's points held by each column and the rank of each column. For a column with no score, the rank is "-" (not applicable). Rows that have a score in one or more columns go through the processing from 2) onward. Rows with no score in any column provide no similarity information, so they are skipped and processing moves to the next row.
2) For a row of table 121 that has a score in one or more columns, compute the share (%) and rank of each column in the same way as in 1), and compare them with the information for the row of table 120 to calculate the similarity between the system A item and the system B item. For each column, add 10 points when the ranks match (0 points when both are "-"), 10 points when the shares match within ±5% (0 points when both are 0%), and 10 points when the column scores match within ±5 points (5 points when both are "-", plus a further 10 points when both scores are 150 or more).
3) For each row of table 120, perform the operation in 2) against every row of table 121 to calculate the similarity values for all combinations, and compile them as in table 123 of FIG. 12.
Note that table 123 of FIG. 12 only illustrates how the same-item candidates and similarity values are displayed. It does not show the result of applying the above calculation to the numbers in tables 120 and 121.
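The per-column point rules in step 2) can be sketched as below. Two readings are assumed and should be treated as such: equal scores within a row share the same rank, and the extra 10 points for "both 150 or more" are added only when the two scores already match within ±5 points. A column with no score is represented by `None`; the function names are hypothetical.

```python
def profile(row):
    """Return (share of points in %, rank) per column; None marks a column with no score."""
    scored = sorted((s for s in row if s is not None), reverse=True)
    total = sum(scored)
    shares = [100.0 * s / total if s is not None and total else 0.0 for s in row]
    ranks = [scored.index(s) + 1 if s is not None else None for s in row]
    return shares, ranks


def similarity(row_a, row_b):
    """Similarity of one table-120 row and one table-121 row; None if either has no scores."""
    if all(s is None for s in row_a) or all(s is None for s in row_b):
        return None  # such rows are skipped in step 1)
    shares_a, ranks_a = profile(row_a)
    shares_b, ranks_b = profile(row_b)
    points = 0
    for a, b, sa, sb, ra, rb in zip(row_a, row_b, shares_a, shares_b, ranks_a, ranks_b):
        if ra is not None and ra == rb:
            points += 10  # ranks match (no points when both are "-")
        if abs(sa - sb) <= 5 and not (sa == 0 and sb == 0):
            points += 10  # shares match within 5 points of percentage
        if a is None and b is None:
            points += 5   # both columns have no score
        elif a is not None and b is not None and abs(a - b) <= 5:
            points += 10  # scores match within 5 points
            if a >= 150 and b >= 150:
                points += 10  # assumed to apply only on top of a score match
    return points


# Hypothetical rows over the four common-item columns.
same_rows = similarity([100, 50, None, None], [100, 50, None, None])
high_rows = similarity([150, None], [155, None])
```

Applying `similarity` to every pair of rows from the two tables gives the full set of values from which the candidate list is built.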

By displaying the same-item candidates as in table 123 of FIG. 12, the person integrating the databases can see which data items are likely to be the same item in both databases, and the man-hours and effort needed to select identical items can be greatly reduced.

As described above, the access logs of the business applications of two different systems are used to obtain correlations for the items the user has entered in advance, and same-item candidates for integrating the two systems are displayed on the basis of the similarity of those correlations. Using this information can be expected to reduce the man-hours required for system integration.

Thus, the database analysis system according to this embodiment has the following units. A user additional information input unit (common attribute data item information input unit) inputs user additional information (common attribute data item information) that indicates, as common items (common attribute data items), some of the data items whose item attributes are common between the first database and the second database, out of the plurality of data items contained in the database of the company A system (first database) and the plurality of data items contained in the database of the company B system (second database). A correlation analysis unit (first database correlation analysis unit and second database correlation analysis unit) uses the user additional information input by the user additional information input unit and the access logs to analyze the correlation between the common items and each of the data items other than the common items, and generates correlation information (first correlation value, second correlation value) indicating the correlation value of each data item with respect to the common items. A same-item candidate extraction unit (common attribute data item candidate extraction unit) extracts, on the basis of the correlation value of each data item with respect to the common items indicated in the correlation information, candidates for data items whose item attributes are common between the database of the company A system and the database of the company B system as same-item candidates (common attribute data item candidates).

The correlation analysis unit refers to the data items used to access the database as indicated in the access log, analyzes the data items other than the common items that were used to access the database together with a common item, and thereby analyzes the correlation between the common items and each of the data items other than the common items.
More specifically, the correlation analysis unit refers to the data items written in the SQL statements issued to the database as indicated in the access log, analyzes the data items other than the common items that are written in the same SQL statement as a common item, and thereby analyzes the correlation between the common items and each of the data items other than the common items.

The correlation analysis unit also calculates a plurality of correlation values for each data item with respect to the common items and generates correlation information indicating the calculated correlation values. The same-item candidate extraction unit calculates similarity values between data items on the basis of the plurality of correlation values indicated in the correlation information and extracts same-item candidates on the basis of the calculated similarity values. The same-item candidate display unit displays the same-item candidates extracted by the same-item candidate extraction unit together with the similarity values calculated by the same-item candidate extraction unit.

Embodiment 2.
In Embodiment 1 above, correlation analysis results for multiple cases can be obtained by specifying different conditions when the correlation analysis is performed. This embodiment describes a configuration in which the elements of the correlation analysis result information can be selected when same-item candidates are calculated from the correlation analysis results.

FIG. 13 shows an example of a screen for selecting the elements of the correlation analysis result information in this case.
The correlation detail information table 119 of FIG. 11 contains the correlation coefficient simple sum, the rank, the count, and the simultaneous item designation information. The element designation table 130 of FIG. 13 specifies whether each of these elements, which appear along the horizontal axis of the correlation detail information table 119, is to be used. Elements to be used are marked with ○, and elements not to be used are marked with ×.
For example, when "correlation simple sum" and "count" are marked as used and "rank" and "simultaneous item designation information" are marked as not used, as in FIG. 13, the scores are calculated using only the "correlation simple sum" and "count" information when the correlation result scoring table shown in FIG. 12 is created.
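A minimal sketch of applying such an element designation when combining per-element values is shown below. The dictionary keys and the function name `combined_value` are hypothetical, and how the selection interacts with the later similarity calculation is not covered here.

```python
def combined_value(values, use):
    """Sum only the elements marked as used.

    values: per-element values for one ID, e.g. deviation values.
    use: element designation, True for a circle mark and False for a cross mark.
    Elements missing from `use` are treated as not used.
    """
    return sum(v for name, v in values.items() if use.get(name, False))


# Designation corresponding to the example: simple sum and count used, rank not used.
designation = {"sum": True, "rank": False, "count": True}
total = combined_value({"sum": 60.0, "rank": 40.0, "count": 55.0}, designation)
```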

In this embodiment, the elements to be used are thus designated from among the elements of the correlation analysis results. This can be realized, for example, by providing a correlation value designation information input unit in the same-item candidate extraction and display system 14 of FIG. 1. The correlation value designation information input unit outputs the information of FIG. 13 and receives, through the screen of FIG. 13, correlation value designation information that designates the correlation values to be used for extracting same-item candidates.

The database analysis system according to this embodiment is characterized in that, where a means of changing the information obtained from the access log during correlation analysis is provided, the calculation method used to compute same-item candidates can be changed. Making the calculation method changeable, when same-item candidates for integrating two systems are obtained from the similarity of the correlations derived from the access logs of the business applications of two different systems, allows same-item candidates to be presented from various viewpoints. When performing additional work, the user can therefore tell which materials are lacking and which parts should be added, and a reduction in the man-hours required for system integration can be expected.

As described above, in this embodiment the database analysis system has a correlation value designation information input unit that inputs correlation value designation information designating, from among the plurality of correlation values indicated in the correlation information, the correlation values to be used for extracting same-item candidates. The same-item candidate extraction unit extracts same-item candidates on the basis of the correlation values designated by the correlation value designation information.

Embodiment 3.
The embodiments above obtain and display same-item candidates for the databases when two different systems, the company A system and the company B system, are integrated. This embodiment describes a configuration in which the display can be switched among a list of same-item candidates, a display format based on the company A system, and a display format based on the company B system.

FIG. 14 shows examples of switching the display format. 140 shows the corresponding items in order of score (the same contents as table 123 of FIG. 12), 141 shows them arranged according to the format of the company A database 31, and 142 shows them arranged according to the format of the company B database 32.
Table 140 lists the candidates in descending order of score, that is, in descending order of likelihood as same-item candidates. It is useful for finding out how many items are presented as same-item candidates and with what degree of likelihood.
Tables 141 and 142 are laid out according to the format of each database system. When the candidates are checked against the item specifications of each database and the number of tables or items is large, these views allow investigation and confirmation without the effort of searching for the corresponding items.
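The three views can be approximated with a simple re-sorting of the candidate list, as sketched below. Ordering by item name stands in for "the format of each database", which in practice would follow the table and column order of the respective database; the tuple layout, mode names, and sample values are hypothetical.

```python
def as_view(candidates, mode="score"):
    """Arrange (item_a, item_b, score) tuples for display.

    mode "score": descending score (like table 140).
    mode "a" / "b": grouped by the company A / company B item name,
    with the highest score first within each item (rough stand-ins for 141 / 142).
    """
    if mode == "score":
        return sorted(candidates, key=lambda c: -c[2])
    index = 0 if mode == "a" else 1
    return sorted(candidates, key=lambda c: (c[index], -c[2]))


# Hypothetical candidates.
candidates = [("A.A3", "X.X2", 55), ("A.A2", "X.X4", 80), ("A.A2", "X.X2", 65)]
by_score = as_view(candidates)
by_a = as_view(candidates, mode="a")
by_b = as_view(candidates, mode="b")
```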

The database analysis system according to this embodiment is characterized in that, when displaying same-item candidates for integrating the two database systems, system A and system B, the display can be switched among a list of same-item candidates, a display of same-item candidates based on the database information of system A, and a display of same-item candidates based on the database information of system B. Making the display of similar item candidates switchable in this way makes it possible, when additional work is done by hand during system integration, to grasp which parts of each system require investigation or disclosure of information, and an improvement in the efficiency of system integration can be expected.

As described above, in this embodiment the same-item candidate display unit displays the same-item candidates by selecting either a display format matched to the database of the company A system or a display format matched to the database of the company B system.

Embodiment 4.
In Embodiment 3 above, when same-item candidates for the databases are obtained and displayed for the integration of two different systems, the company A system and the company B system, the display can be switched among a list of same-item candidates, a display format based on the company A system, and a display format based on the company B system. This embodiment describes a configuration in which, in each of these formats, candidates at or above a given score are displayed in an easily recognizable way.

FIG. 15 shows, as an example, the same contents as FIG. 14 with items scoring 60 points or more displayed in bold. 150, 151, and 152 are 140, 141, and 142, respectively, with the items scoring 60 points or more shown in bold.
Other display methods, such as showing these items in a different color, may also be used. The tendency of the same-item candidates can also be observed by switching the reference value in turn, for example to 70 points or more and then to 50 points or more.
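Threshold-based emphasis can be sketched as below. Plain-text asterisks stand in for bold type, and the tuple layout, function name, and sample values are hypothetical; the default of 60 points follows the example above.

```python
def render(candidates, threshold=60):
    """Format (item_a, item_b, score) tuples, emphasizing scores at or above threshold."""
    lines = []
    for item_a, item_b, score in candidates:
        text = f"{item_a} = {item_b}: {score}"
        lines.append(f"**{text}**" if score >= threshold else text)
    return lines


# Hypothetical candidates; changing the threshold changes which lines are emphasized.
candidates = [("A.A2", "X.X4", 80), ("A.A2", "X.X2", 65), ("A.A3", "X.X2", 55)]
at_60 = render(candidates)
at_70 = render(candidates, threshold=70)
```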

The database analysis system according to this embodiment is characterized in that, when displaying same-item candidates for integrating the two database systems, system A and system B, candidates at or above a designated score are treated as valid values in each display format and are displayed in an easily recognizable way, for example in bold. Displaying candidates at or above a given reference value clearly, and allowing that reference value to be changed, makes it possible to recognize which items in which tables are effective as same-item candidates and to what degree. When additional work is done by hand during system integration, this deepens the shared understanding of which parts of each system require investigation or disclosure of information, and an improvement in the efficiency of system integration can be expected.

As described above, in this embodiment the same-item candidate display unit highlights the same-item candidates whose similarity value is at or above a given value.

The embodiments above describe an example of integrating the databases of two companies, the database of the company A system and the database of the company B system. The database analysis system described in these embodiments is not limited to integration between companies and can be applied to any integration of two databases that contain common data items.

In the embodiments above, the access log analysis examines the execution results of the SQL statements recorded in the access log, but it is not limited to SQL statements. If the access log records the access history of each data item, the same analysis can be performed by analyzing the access history per data item.

The method of calculating the correlation values between the common items and the other items is also not limited to the one described in the embodiments above. Any calculation method may be used as long as it yields the correlation between the common items and the other items.

FIG. 1 is a diagram showing a configuration example of the database analysis system according to Embodiment 1.
FIG. 2 is a diagram showing an example of an access log according to Embodiment 1.
FIG. 3 is a diagram showing an example of user additional information according to Embodiment 1.
FIG. 4 is a flowchart showing an operation example of the access log processing unit according to Embodiment 1.
FIG. 5 is a diagram showing an example of an item information table according to Embodiment 1.
FIG. 6 is a diagram showing an example of a processing result of the access log processing unit according to Embodiment 1.
FIG. 7 is a diagram showing an example of a processing result of the access log processing unit according to Embodiment 1.
FIG. 8 is a diagram showing an operation example of the correlation extraction unit according to Embodiment 1.
FIG. 9 is a diagram showing an example of correlation information according to Embodiment 1.
FIG. 10 is a diagram showing the progress of scoring by the same-item candidate extraction unit according to Embodiment 1.
FIG. 11 is a diagram showing an example of correlation detail information according to Embodiment 1.
FIG. 12 is a diagram showing an example of the similarity value calculation results and the display of same-item candidates according to Embodiment 1.
FIG. 13 is a diagram showing an example of a screen for designating the elements used to extract same-item candidates according to Embodiment 2.
FIG. 14 is a diagram showing an example of displaying same-item candidates with the display format changed according to Embodiment 3.
FIG. 15 is a diagram showing an example of highlighting when same-item candidates are displayed according to Embodiment 4.
FIG. 16 is a flowchart showing an operation example of the database analysis system according to Embodiment 1.
FIG. 17 is a diagram showing a hardware configuration example of the database analysis system according to Embodiment 1.

Explanation of symbols

1: Company A system, 2: Company B system, 3: access log, 4: access log processing unit, 5: analysis DB, 6: correlation extraction unit, 7: correlation information, 8: user additional information, 9: same item candidate extraction unit, 10: same item candidates, 11: same item candidate display unit, 12: user additional information input unit, 13: data item information, 14: same item candidate extraction/display system, 100: database analysis system.

Claims

A database analysis system that performs analysis on a first database and a second database each including a plurality of data items, the system comprising:
a common attribute data item information input unit that inputs common attribute data item information indicating, as common attribute data items, some of the data items, among the plurality of data items included in the first database and the plurality of data items included in the second database, whose item attributes are common between the first database and the second database;
a first database correlation analysis unit that, using the common attribute data item information input by the common attribute data item information input unit and an access log of the first database, analyzes the correlation between the common attribute data items of the first database and each data item of the first database other than the common attribute data items, and generates first correlation information indicating a correlation value of each data item with respect to the common attribute data items;
a second database correlation analysis unit that, using the common attribute data item information input by the common attribute data item information input unit and an access log of the second database, analyzes the correlation between the common attribute data items of the second database and each data item of the second database other than the common attribute data items, and generates second correlation information indicating a correlation value of each data item with respect to the common attribute data items; and
a common attribute data item candidate extraction unit that, based on the correlation value of each data item for each common attribute data item indicated in the first correlation information and the correlation value of each data item for each common attribute data item indicated in the second correlation information, extracts, from among the data items of the first database other than the common attribute data items and the data items of the second database other than the common attribute data items, data items whose item attributes are common between the first database and the second database as common attribute data item candidates.

The database analysis system according to claim 1, wherein
the first database correlation analysis unit refers to the data items used for access to the first database, as indicated in the access log of the first database, analyzes the data items of the first database other than the common attribute data items that were used for access to the first database together with the common attribute data items of the first database, and analyzes the correlation between the common attribute data items of the first database and each data item of the first database other than the common attribute data items, and
the second database correlation analysis unit refers to the data items used for access to the second database, as indicated in the access log of the second database, analyzes the data items of the second database other than the common attribute data items that were used for access to the second database together with the common attribute data items of the second database, and analyzes the correlation between the common attribute data items of the second database and each data item of the second database other than the common attribute data items.

The database analysis system according to claim 2, wherein
the first database correlation analysis unit refers to the data items described in the SQL statements issued to the first database, as indicated in the access log of the first database, analyzes the data items of the first database other than the common attribute data items that are described in the same SQL statement as a common attribute data item of the first database, and analyzes the correlation between the common attribute data items of the first database and each data item of the first database other than the common attribute data items, and
the second database correlation analysis unit refers to the data items described in the SQL statements issued to the second database, as indicated in the access log of the second database, analyzes the data items of the second database other than the common attribute data items that are described in the same SQL statement as a common attribute data item of the second database, and analyzes the correlation between the common attribute data items of the second database and each data item of the second database other than the common attribute data items.

The database analysis system according to claim 1, wherein
the first database correlation analysis unit calculates a plurality of correlation values of each data item with respect to the common attribute data items and generates first correlation information indicating the calculated plurality of correlation values,
the second database correlation analysis unit calculates a plurality of correlation values of each data item with respect to the common attribute data items and generates second correlation information indicating the calculated plurality of correlation values, and
the common attribute data item candidate extraction unit extracts common attribute data item candidates based on the plurality of correlation values of each data item with respect to the common attribute data items indicated in the first correlation information and the plurality of correlation values of each data item with respect to the common attribute data items indicated in the second correlation information.

The database analysis system according to claim 1, further comprising a display unit that displays the common attribute data item candidates extracted by the common attribute data item candidate extraction unit.

The database analysis system according to claim 1, wherein
the common attribute data item candidate extraction unit, based on the correlation value of each data item for each common attribute data item indicated in the first correlation information and the correlation value of each data item for each common attribute data item indicated in the second correlation information, calculates similarity values between the data items of the first database other than the common attribute data items and the data items of the second database other than the common attribute data items, and extracts common attribute data item candidates based on the calculated similarity values.

The database analysis system according to claim 6, further comprising a display unit that displays the common attribute data item candidates extracted by the common attribute data item candidate extraction unit together with the similarity values of those candidates calculated by the common attribute data item candidate extraction unit.

The database analysis system as described, further comprising
a correlation value designation information input unit that inputs correlation value designation information designating, among the plurality of correlation values indicated in the first correlation information and the second correlation information, the correlation values to be used by the common attribute data item candidate extraction unit for extracting common attribute data item candidates,
wherein the common attribute data item candidate extraction unit
extracts common attribute data item candidates based on the correlation values designated by the correlation value designation information among the plurality of correlation values indicated in the first correlation information and the second correlation information.

The database analysis system according to claim 5, wherein
the display unit displays the common attribute data item candidates in a selected one of a display format adapted to the first database and a display format adapted to the second database.

The database analysis system according to claim 6, wherein
the display unit highlights common attribute data item candidates whose similarity values are equal to or greater than a certain value.

A database analysis method for performing analysis on a first database and a second database each including a plurality of data items, the method comprising:
a common attribute data item information input step of inputting common attribute data item information indicating, as common attribute data items, some of the data items, among the plurality of data items included in the first database and the plurality of data items included in the second database, whose item attributes are common between the first database and the second database;
a first database correlation analysis step of, using the common attribute data item information input in the common attribute data item information input step and an access log of the first database, analyzing the correlation between the common attribute data items of the first database and each data item of the first database other than the common attribute data items, and generating first correlation information indicating a correlation value of each data item with respect to the common attribute data items;
a second database correlation analysis step of, using the common attribute data item information input in the common attribute data item information input step and an access log of the second database, analyzing the correlation between the common attribute data items of the second database and each data item of the second database other than the common attribute data items, and generating second correlation information indicating a correlation value of each data item with respect to the common attribute data items; and
a common attribute data item candidate extraction step of, based on the correlation value of each data item for each common attribute data item indicated in the first correlation information and the correlation value of each data item for each common attribute data item indicated in the second correlation information, extracting, from among the data items of the first database other than the common attribute data items and the data items of the second database other than the common attribute data items, data items whose item attributes are common between the first database and the second database as common attribute data item candidates.

A program causing a computer that performs analysis on a first database and a second database, each including a plurality of data items, to execute:
common attribute data item information input processing of inputting common attribute data item information indicating, as common attribute data items, some of the data items, among the plurality of data items included in the first database and the plurality of data items included in the second database, whose item attributes are common between the first database and the second database;
first database correlation analysis processing of, using the common attribute data item information input by the common attribute data item information input processing and an access log of the first database, analyzing the correlation between the common attribute data items of the first database and each data item of the first database other than the common attribute data items, and generating first correlation information indicating a correlation value of each data item with respect to the common attribute data items;
second database correlation analysis processing of, using the common attribute data item information input by the common attribute data item information input processing and an access log of the second database, analyzing the correlation between the common attribute data items of the second database and each data item of the second database other than the common attribute data items, and generating second correlation information indicating a correlation value of each data item with respect to the common attribute data items; and
common attribute data item candidate extraction processing of, based on the correlation value of each data item for each common attribute data item indicated in the first correlation information and the correlation value of each data item for each common attribute data item indicated in the second correlation information, extracting, from among the data items of the first database other than the common attribute data items and the data items of the second database other than the common attribute data items, data items whose item attributes are common between the first database and the second database as common attribute data item candidates.
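
The candidate extraction recited in the claims can be illustrated with a minimal sketch. It does not reproduce the scoring procedure of the embodiments. It assumes that each data item other than the common attribute data items is represented by a list of correlation values in [0, 1], one per common attribute data item and in the same order for both databases; the similarity measure (one minus the mean absolute difference), the threshold of 0.8, and all function and item names are hypothetical.

```python
def similarity(vec_a, vec_b):
    """Similarity of two correlation-value vectors: 1 minus the mean absolute
    difference. An assumed measure, chosen only for illustration."""
    if not vec_a or len(vec_a) != len(vec_b):
        raise ValueError("vectors must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(vec_a, vec_b)]
    return 1.0 - sum(diffs) / len(diffs)


def extract_candidates(corr_a, corr_b, threshold=0.8):
    """Pair every item of the first database with every item of the second
    and keep the pairs whose similarity value reaches the threshold."""
    pairs = []
    for item_a, vec_a in corr_a.items():
        for item_b, vec_b in corr_b.items():
            s = similarity(vec_a, vec_b)
            if s >= threshold:
                pairs.append((item_a, item_b, s))
    # Most similar pairs first, as a display unit might list them.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical correlation information: item -> correlation value per common item.
corr_a = {"name": [0.9, 0.1], "address": [0.5, 0.5]}
corr_b = {"cust_name": [0.8, 0.2], "zip": [0.1, 0.9]}
candidates = extract_candidates(corr_a, corr_b)
```

With these values only the pair `name` / `cust_name` reaches the threshold. A display step could reuse the same similarity value to decide which candidates to highlight.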