JP2019032781A

JP2019032781A - Data integration support system and data integration support method

Info

Publication number: JP2019032781A
Application number: JP2017154832A
Authority: JP
Inventors: Daisuke Tashiro; Tsuyoshi Tanaka
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2017-08-09
Filing date: 2017-08-09
Publication date: 2019-02-28
Anticipated expiration: 2037-08-09
Also published as: JP6655582B2

Abstract

PROBLEM TO BE SOLVED: To enable rules for resolving inconsistencies between pieces of data that exist contrary to business requirements or an analyst's expectations to be created without manual operation. SOLUTION: A data integration support system integrates related pieces of data among a plurality of pieces of record data according to a rule. The system accepts designation of a target table, selected from among a plurality of tables that manage the record data, and of a specific key in that target table, and determines a specific rule on the basis of the target table and the specific key. SELECTED DRAWING: Figure 1

Description

The present invention relates to a data integration support system and a data integration support method that integrate data after so-called name identification (that is, record matching) into the appropriate form intended by an analyst. It is particularly suitable for application to a data integration support system that supports the creation of the rules used when integrating such name-identified data.

In recent years, efforts to solve management problems by analyzing and utilizing the diverse data held within companies have been expanding. Data analysis often requires a great deal of man-hours for data preprocessing; as a result, the analysis period becomes long or sufficient data cannot be prepared, which can degrade the quality of the analysis results. One such problem is that a relationship between data entities (one-to-one or one-to-many) differs from the relationship expected by the business or by the analyst, and detecting and resolving such discrepancies takes time.

As described above, when the relationship between data entities differs from what is expected and the analyst joins and aggregates the data without noticing, errors arise in the data and the quality of the analysis results decreases. To solve this problem, the analyst must manually set rules that resolve the inconsistencies in the data while checking the data item by item (see Patent Document 1), and setting such rules requires special skills and time.

In another conventional system, first and second records are identified, and commonality data is obtained by applying a record matching criterion to compare the contents of the data elements of the identified first and second records. One of the two records is then selected as the surviving record on the basis of the earliest date of record creation or the relative contents of the first and second records (see Patent Document 2).

Patent Document 1: JP 2013-137763 A
Patent Document 2: JP 2005-502132 A (Japanese translation of PCT international application)

However, even in the conventional systems described above, the rules for selecting a record must be defined manually, so data preprocessing cannot be performed efficiently.

The present invention has been made in view of the above points, and proposes a data integration support system and a data integration support method capable of efficiently creating, without human intervention, rules for resolving inconsistencies between data that exist contrary to business requirements or analysts' expectations.

To solve this problem, the present invention provides a data integration support system that integrates related data among a plurality of pieces of record data according to a rule, the system comprising: a plurality of tables that manage the plurality of pieces of record data; a designation receiving unit that accepts designation of a target table selected from the plurality of tables and of a specific key in the target table; a rule determination unit that determines a specific rule on the basis of the target table and the specific key whose designation has been accepted by the designation receiving unit; a rule execution unit that selects one piece of record data from among the plurality of pieces of record data according to the specific rule determined by the rule determination unit; a data analysis unit that analyzes the one piece of record data selected by the rule execution unit; and a result display unit that displays the analysis result for the one piece of record data obtained by the data analysis unit.

The present invention also provides a data integration support method in a data integration support system that integrates related data among a plurality of pieces of record data according to a rule, the method comprising: a designation receiving step in which the data integration support system accepts designation of a target table selected from a plurality of tables that manage the plurality of pieces of record data and of a specific key in the target table; a rule determination step in which the data integration support system determines a specific rule on the basis of the target table and the specific key whose designation has been accepted in the designation receiving step; a rule execution step in which the data integration support system selects one piece of record data from among the plurality of pieces of record data according to the specific rule determined in the rule determination step; a data analysis step in which the data integration support system analyzes the one piece of record data selected in the rule execution step; and a result display step in which the data integration support system displays the analysis result for the one piece of record data.

According to the present invention, rules for resolving inconsistencies between data that exist contrary to business requirements or analysts' expectations can be created efficiently without human intervention.

FIG. 1 is a block diagram showing an example of the schematic configuration of the data integration support system and related components according to the present embodiment.
FIG. 2 is a diagram showing an example of the data stored in the customer table shown in FIG. 1.
FIG. 3 is a diagram showing an example of data added to the customer table shown in FIG. 2.
FIG. 4 is a diagram showing an example of the rule creation support process.
FIG. 5 is a diagram showing an example of preparation for calculating the coverage of selection rules.
FIG. 6 is a diagram showing an example of preparation for calculating the coverage of selection rules.
FIG. 7 is a diagram showing an example of calculating the coverage of selection rules.
FIG. 8 is a diagram showing an example of the data stored in the application history table shown in FIG. 1.
FIG. 9 is a diagram showing an example of part of the record data referenced from the application history table shown in FIG. 8.
FIG. 10 is a diagram showing an example of scores assigned on the basis of the record data shown in FIG. 9.
FIG. 11 is a diagram showing an example in which the columns used are sorted by coverage and application history score.
FIG. 12 is a diagram showing an example of selection conditions.
FIG. 13 is a diagram showing an example of selection results displayed in response to the selected condition.

An embodiment of the present invention will be described in detail below with reference to the drawings.

(1) Configuration Example of the Data Integration Support System According to the Present Embodiment
FIG. 1 shows an example of the schematic configuration of the data integration support system 101 according to the present embodiment. The present embodiment is characterized by support for creating the rules that serve as criteria when consolidating the results of so-called name identification. This is described specifically below.

The data integration support system 101 is connected, via a network (not shown), to a storage device 102 that has a target database (hereinafter, the "target data DB") 103, which manages a plurality of tables including a customer table and a person-in-charge table described in detail later. Alternatively, the data integration support system 101 may incorporate the target data DB 103 itself, in which case the storage device 102 may be omitted.

The target data DB 103 is a so-called relational database. Each of its tables manages data by column item, and each piece of record data consists of values corresponding to all of the column items.

The data integration support system 101 includes a designation receiving unit 1, a data analysis unit 2, a DB connection unit 3, a rule determination unit 4, a rule execution unit 5, and a result display unit 7, and preferably further includes an application history table 6.

When the DB connection unit 3 receives a query command written in a query language such as SQL (Structured Query Language), it adds, deletes, or updates record data in the target data DB 103 according to the content of the query and returns the query result to the requester.

The designation receiving unit 1 accepts designation of a target table selected from the plurality of tables and of a specific column item in that target table (hereinafter also referred to as the "specific key").

The rule determination unit 4 determines a specific rule on the basis of the target table and the specific key accepted by the designation receiving unit 1. In the present embodiment, "determining" covers not only creating the specific rule anew but also reusing some of the rules applied in the past. The method by which the rule determination unit 4 determines rules is described in detail later.

The rule execution unit 5 retrieves record data from the tables of the target data DB 103 by issuing query commands via the DB connection unit 3 according to the specific rule determined by the rule determination unit 4. In this way, one piece of record data can be selected from among the mutually related pieces of record data.
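The source does not give the query the rule execution unit 5 issues. A minimal sketch, assuming SQLite via Python's standard `sqlite3` module and a rule of the form "keep the record with the maximum (or minimum) value of one column per key", could use a correlated subquery. The function names, table layout, and column names below are hypothetical illustrations, not the patent's implementation.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier for SQLite, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_one_per_key(conn: sqlite3.Connection, table: str, key: str,
                       column: str, condition: str = "max") -> list:
    """Return rows whose `column` holds the MAX (or MIN) value within each `key` group.

    Ties on `column` are not broken here, so one key may still return several rows.
    """
    agg = {"max": "MAX", "min": "MIN"}[condition]
    t, k, c = quote_ident(table), quote_ident(key), quote_ident(column)
    sql = (
        f"SELECT * FROM {t} AS r "
        f"WHERE r.{c} = (SELECT {agg}(s.{c}) FROM {t} AS s WHERE s.{k} = r.{k}) "
        f"ORDER BY r.{k}"
    )
    return conn.execute(sql).fetchall()


# Usage with an in-memory database and invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, address TEXT, update_date TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("A0001", "old address", "2017-01-10"),
        ("A0001", "new address", "2017-05-02"),
        ("A0002", "only address", "2016-11-20"),
    ],
)
print(select_one_per_key(conn, "customer", "customer_id", "update_date", "max"))
```

Identifiers are quoted rather than bound as parameters because SQL placeholders can only stand for values, not table or column names. ISO-formatted date strings compare correctly as text.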

The data analysis unit 2 analyzes the one piece of record data selected by the rule execution unit 5. The result display unit 7 displays the analysis result obtained by the data analysis unit 2 for that record data. The details of this display are described later.

The application history table 6 manages past rules applied to a plurality of data (corresponding to the selection rules Ri described later), each associated with an application frequency score Fi that corresponds to the number of times that rule has been applied.

(2) Data Integration Support Method Using the Data Integration Support System
The data integration support system 101 is configured as described above. Next, a data integration support method is described as an example of the operation of the data integration support system 101. In the present embodiment, rules may be determined from the target record data group alone; here, however, a mode is described in which reusable rules among those applied in the past can also be adopted.

(2-1) Presence or Absence of Data Inconsistencies
FIGS. 2 and 3 each show an example of record data corresponding to the column items of the customer table 104 managed in the target data DB 103 described above. The customer table 104 has, as its column items, a customer ID for distinguishing customers from one another, name, address, telephone number, gender, age, registration date, update date, update time, and invalid flag. Of all these column items, the customer ID is assumed to be the unique key of the customer table 104.

In the record data shown in FIG. 2, the customer ID, which is the unique key, is not duplicated in any of the record data. The data therefore matches the relationship expected by the business or the analyst, and nothing in it needs to be resolved (hereinafter, a "state without data inconsistency").

In the record data shown in FIG. 3, by contrast, the unique-key customer IDs "A0001" and "A0003" each appear in two pieces of record data, and these duplicates must be detected and resolved (hereinafter, a "state with data inconsistency").
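Detecting the state with data inconsistency amounts to finding values of the unique key that occur more than once. A minimal sketch in Python, assuming the rows are available as dictionaries; the function name and the row values are hypothetical and only loosely modelled on FIG. 3, whose actual contents are not reproduced here.

```python
from collections import Counter


def duplicated_keys(records, key):
    """Return, sorted, the values of `key` that occur in more than one record."""
    counts = Counter(r[key] for r in records)
    return sorted(v for v, n in counts.items() if n > 1)


# Hypothetical rows: customer IDs follow FIG. 3, addresses are invented.
customers = [
    {"customer_id": "A0001", "address": "address 1"},
    {"customer_id": "A0001", "address": "address 2"},
    {"customer_id": "A0002", "address": "address 3"},
    {"customer_id": "A0003", "address": "address 4"},
    {"customer_id": "A0003", "address": "address 5"},
]
print(duplicated_keys(customers, "customer_id"))
```

Run against a database, the same check corresponds to `GROUP BY customer_id HAVING COUNT(*) > 1`.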

This can happen because, depending on the system that handles the data, its operation, or the business flow, data that should be in a one-to-one relationship according to the business or the analyst's expectations may be stored in the target data DB 103 in a one-to-many relationship.

For example, in a customer table that manages customer information, a customer and his or her address should be in a one-to-one relationship from a business standpoint, but in either of the following cases, multiple pieces of record data with different addresses may exist for a single customer:
1) When an address change is processed not by updating the existing record data but by adding new record data or invalidating the old record data (for example, by setting the invalid flag)
2) When name identification is performed probabilistically according to the similarity between records

(2-2) Resolving a State with Data Inconsistency
In the present embodiment, therefore, such states with data inconsistency are resolved using rules determined as follows.

FIG. 4 is a flowchart showing an example of the rule determination process. First, the designation receiving unit 1 accepts designation of a target table and a specific key (step S1).

The rule determination unit 4 determines whether the specific key is a unique key (step S2). If it is not a unique key, the process ends; if it is a unique key, the following step S3 is executed.
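The source does not say how step S2 decides that a key "is a unique key". Since the table in FIG. 3 evidently does not enforce uniqueness in the database, the sketch below assumes a hypothetical registry of keys that the business or the analyst expects to be unique. It returns `None` when processing should end, and otherwise the groups of records that share a key value, which are the input to step S3. `EXPECTED_UNIQUE_KEYS` and `duplicate_groups` are illustrative names only.

```python
from collections import defaultdict

# Hypothetical registry of (table, column) pairs expected to be unique keys.
EXPECTED_UNIQUE_KEYS = {("customer_table", "customer_id")}


def duplicate_groups(records, table, key):
    """Sketch of steps S1-S2.

    Returns None if `key` is not a unique key of `table` (the process ends);
    otherwise returns {key value: [records]} for the key values that are duplicated.
    """
    if (table, key) not in EXPECTED_UNIQUE_KEYS:
        return None
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {k: rows for k, rows in groups.items() if len(rows) > 1}
```

Keeping only groups with more than one record restricts the later coverage calculation to the record data that is actually in a state with data inconsistency.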

In step S3, the rule determination unit 4 calculates, for each column item other than the specific key, the coverage Ci of the selection rule Ri that uses that column item.

Specifically, as shown in FIG. 7, the rule determination unit 4 calculates the coverage Ci of each selection rule Ri for the subset of record data in which the unique key (customer ID) is duplicated in the customer tables shown in FIGS. 5 and 6. The selection rules use column items other than the specific key, for example "telephone number", "update date", "update time", and "invalid flag".
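The patent does not define coverage precisely. One plausible reading, used in the sketch below purely as an assumption, is that the coverage Ci of a selection rule Ri = (column, condition) is the share of duplicate groups in which the rule singles out exactly one record: every record has a value in that column, and the maximum (or minimum) value is not shared. The column names and values are invented.

```python
def rule_coverage(groups, column, condition="max"):
    """Hypothetical coverage of the rule (column, condition).

    `groups` maps each duplicated key value to its list of records (dicts).
    A group counts as covered when the rule selects exactly one of its records.
    """
    if not groups:
        return 0.0
    pick = max if condition == "max" else min
    decided = 0
    for rows in groups.values():
        values = [r.get(column) for r in rows]
        if any(v is None for v in values):
            continue  # a missing value means the rule cannot rank every record
        best = pick(values)
        if values.count(best) == 1:
            decided += 1
    return decided / len(groups)


# Invented duplicate groups for the keys A0001 and A0003.
groups = {
    "A0001": [
        {"update_date": "2017-01-10", "invalid_flag": 1, "phone": None},
        {"update_date": "2017-05-02", "invalid_flag": 0, "phone": "p1"},
    ],
    "A0003": [
        {"update_date": "2017-03-01", "invalid_flag": 1, "phone": "p2"},
        {"update_date": "2017-03-01", "invalid_flag": 0, "phone": "p2"},
    ],
}
print(rule_coverage(groups, "update_date", "max"))   # tie in A0003
print(rule_coverage(groups, "invalid_flag", "min"))  # decides both groups
```

Under this reading, "update date / maximum" covers only half of the groups because of the tie in A0003, while "invalid flag / minimum" covers both.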

Next, the rule determination unit 4 acquires the application frequency score Fi of each selection rule Ri (corresponding to the number of applications shown in the figure) from the application history table 6 shown in FIG. 8 (step S4). Specifically, the rule determination unit 4 searches the application history table 6 for records whose table name is "customer table" and whose key column item is "customer ID", and obtains the query result shown in FIG. 9.

Further, in step S4, the rule determination unit 4 extracts from the application history table 6 those rules among these records that match in any of the table, the key column item, or the selection target, and calculates the application frequency score Fi from the number of applications of each selection rule (used column name, condition), as shown in FIG. 10. In doing so, the rule determination unit 4 assigns a higher score to rules that have a history in which the table, the key column item, and the selection target all match.
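The weighting between full and partial matches is not given in the source. The sketch below assumes a hypothetical weight of 2 for a history entry matching on table, key column item, and selection target, and 1 for an entry matching on only some of them. It also assumes that the "selection target" is the column whose value should be unique per key (such as the address). `HistoryEntry` and `frequency_scores` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    """One row of a hypothetical application history table."""
    table: str
    key_column: str
    target: str     # assumed: the column whose value should be unique per key
    column: str     # column used by the selection rule
    condition: str  # "max" or "min"
    count: int      # number of past applications


def frequency_scores(history, table, key_column, target,
                     full_weight=2, partial_weight=1):
    """Application frequency score Fi per selection rule (column, condition)."""
    scores = defaultdict(int)
    for h in history:
        matches = [h.table == table, h.key_column == key_column, h.target == target]
        if all(matches):
            weight = full_weight
        elif any(matches):
            weight = partial_weight
        else:
            continue  # unrelated history entries do not contribute
        scores[(h.column, h.condition)] += weight * h.count
    return dict(scores)


history = [
    HistoryEntry("customer_table", "customer_id", "address", "update_date", "max", 3),
    HistoryEntry("customer_table", "customer_id", "phone", "update_date", "max", 2),
    HistoryEntry("customer_table", "customer_id", "address", "invalid_flag", "min", 1),
    HistoryEntry("item_table", "item_id", "price", "update_time", "max", 5),
]
print(frequency_scores(history, "customer_table", "customer_id", "address"))
```

Here "update date / maximum" scores 2×3 + 1×2 = 8, and "invalid flag / minimum" scores 2×1 = 2. The unrelated item-table entry is ignored.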

Next, the rule determination unit 4 calculates the priority Si of each selection rule Ri according to a predetermined function f(Ci, Fi) (step S5).

Next, the rule determination unit 4 sorts the selection rules Ri by priority Si and causes the result display unit 7 to display a selection rule candidate display screen containing the sorted selection rules Ri (step S6).
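The function f(Ci, Fi) is left unspecified in the source. A minimal sketch of steps S5 and S6, assuming a hypothetical weighted sum Si = α·Ci + (1 − α)·Fi / max F with α = 0.7, normalises the frequency so that both terms lie in [0, 1] and then sorts the rules by descending priority. The weighting is an assumption, not the patent's formula.

```python
def priority(coverage, freq, max_freq, alpha=0.7):
    """Hypothetical f(Ci, Fi): weighted sum of coverage and normalised frequency."""
    norm_freq = freq / max_freq if max_freq > 0 else 0.0
    return alpha * coverage + (1 - alpha) * norm_freq


def rank_rules(coverages, freqs, alpha=0.7):
    """Sort selection rules (column, condition) by descending priority Si.

    `coverages` maps each rule to Ci; `freqs` maps rules to Fi (missing means 0).
    """
    max_freq = max(freqs.values(), default=0)
    scored = {
        rule: priority(c, freqs.get(rule, 0), max_freq, alpha)
        for rule, c in coverages.items()
    }
    return sorted(scored, key=lambda rule: scored[rule], reverse=True)


coverages = {("update_date", "max"): 0.5, ("invalid_flag", "min"): 1.0, ("phone", "max"): 0.0}
freqs = {("update_date", "max"): 8, ("invalid_flag", "min"): 2}
print(rank_rules(coverages, freqs))
```

With these invented inputs, "invalid flag / minimum" (0.7 + 0.075) ranks above "update date / maximum" (0.35 + 0.3), and the unused "phone" rule comes last. A larger α favours coverage; a smaller one favours past usage.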

When sorting in this way, the rule determination unit 4 may cause the result display unit 7 to show the column items in use left-aligned, ordered by coverage Ci and application history score Fi, as shown on the left side of FIG. 11, or to mask the unused column items, as shown on the right side of FIG. 11.
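FIG. 11 shows left-aligning the used columns and masking the unused ones as two display options. The sketch below combines both for a single record and assumes a hypothetical mask string `***`. `used_columns` is taken to be already ordered by rank, for example by coverage and application history score.

```python
def display_row(record, column_order, used_columns, mask="***"):
    """Return (header, values): used columns first, in rank order, then masked unused ones."""
    used = [c for c in used_columns if c in column_order]
    unused = [c for c in column_order if c not in used_columns]
    header = used + unused
    values = [str(record.get(c, "")) for c in used] + [mask] * len(unused)
    return header, values


record = {"customer_id": "A0001", "name": "x", "update_date": "2017-05-02", "invalid_flag": 0}
columns = ["customer_id", "name", "update_date", "invalid_flag"]
print(display_row(record, columns, ["customer_id", "update_date"]))
```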

Furthermore, the rule determination unit 4 may cause the result display unit 7 to superimpose on the selection rule candidate display screen a selection condition screen on which the selection conditions "maximum" and "minimum" can be chosen, as shown in FIG. 12, and let the analyst select a condition. When a condition is selected in this way, the result display unit 7 displays a selection rule candidate display screen presenting a group of candidate selection rules that use, for example, "update date" as the column item and "maximum" as the selection condition, as shown in FIG. 13.
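Since a selection rule pairs a used column name with a condition, the candidates offered on these screens can be enumerated as (column, condition) pairs over the non-key columns and then narrowed to the condition the analyst picks. A minimal sketch under that assumption, with hypothetical function names:

```python
from itertools import product

# The two selection conditions offered on the selection condition screen (FIG. 12).
CONDITIONS = ("max", "min")


def candidate_rules(columns, key, conditions=CONDITIONS):
    """Enumerate selection rules as (column, condition) pairs for every non-key column."""
    return [(col, cond) for col, cond in product(columns, conditions) if col != key]


def filter_by_condition(rules, condition):
    """Keep only the candidates that use the condition chosen by the analyst."""
    return [rule for rule in rules if rule[1] == condition]


rules = candidate_rules(["customer_id", "update_date", "invalid_flag"], "customer_id")
print(filter_by_condition(rules, "max"))
```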

By choosing the desired selection rule Ri from the group of selection rules presented on the selection rule candidate display screen, the analyst selects the more appropriate record from the two pieces of record data with customer ID "A0001" and the more appropriate record from the two pieces with customer ID "A0003", both of which are in a state with data inconsistency. The selected records are then used for the data analysis subsequently performed as data post-processing.
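Once the analyst has chosen a rule, applying it in memory means picking one record per duplicated key. The sketch below assumes that records lacking a value in the rule's column are skipped. When the extreme value is shared, it keeps the first record in input order, relying on the documented behaviour of Python's `max`/`min`, which return the first of several equal extremes. This tie-break is a hypothetical choice not specified in the source.

```python
def apply_rule(groups, column, condition="max"):
    """Pick one record per duplicated key using the selection rule (column, condition)."""
    pick = max if condition == "max" else min
    chosen = {}
    for key, rows in groups.items():
        candidates = [r for r in rows if r.get(column) is not None]
        if not candidates:
            continue  # the rule cannot decide this group; leave it for manual review
        chosen[key] = pick(candidates, key=lambda r: r[column])
    return chosen


groups = {
    "A0001": [{"id": 1, "update_date": "2017-01-10"}, {"id": 2, "update_date": "2017-05-02"}],
    "A0003": [{"id": 3, "update_date": "2017-03-01"}, {"id": 4, "update_date": None}],
}
print({k: r["id"] for k, r in apply_rule(groups, "update_date", "max").items()})
```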

As described above, according to the present embodiment, rules for resolving inconsistencies between data that exist contrary to business requirements or analysts' expectations can be determined efficiently and without human intervention when utilizing data, a practice expected to expand in the future. This reduces the man-hours of the data preprocessing that must be completed before the data analysis performed as data post-processing. In addition, because rules are not determined manually, the result does not depend on the skill level of the person creating them, and the rules can be made uniform.

(3) Other Embodiments
The above embodiment is an example for explaining the present invention and is not intended to limit the present invention to this embodiment alone. The present invention can be implemented in various forms without departing from its spirit. For example, although the processing of the various programs was described sequentially in the above embodiment, the present invention is not limited to this. As long as no inconsistency arises in the processing results, the order of processing may be changed or processes may be run in parallel. The program including each processing block in the above embodiment may also be stored, for example, in a computer-readable non-transitory storage medium.

The present invention can be widely applied to data integration support systems and data integration support methods that integrate data after so-called name identification into the appropriate form intended by an analyst.

Reference Signs List: 1 ... designation receiving unit, 2 ... data analysis unit, 3 ... DB connection unit, 4 ... rule determination unit, 5 ... rule execution unit, 6 ... application history table, 7 ... result display unit, 101 ... data integration support system, 102 ... storage device, 103 ... target data DB.

Claims

In a data integration support system that integrates related data among a plurality of record data according to a rule,
A plurality of tables for managing the plurality of record data;
A designation receiving unit that accepts designation of a target table selected from the plurality of tables and of a specific key in the target table;
A rule determining unit that determines a specific rule based on the target table and the specific key whose designation is received by the designation receiving unit;
A rule execution unit that selects one record data among the plurality of record data according to the specific rule determined by the rule determination unit;
A data analysis unit that analyzes the one record data selected by the rule execution unit;
A result display unit for displaying an analysis result of the one record data by the data analysis unit;
A data integration support system comprising:

The data integration support system according to claim 1, wherein the rule determination unit calculates, for each column item other than the specific key in the target table, the coverage of a selection rule that uses that column item.

The data integration support system according to claim 1, further comprising an application history table that manages past rules applied to the plurality of data,
wherein the rule determination unit searches the past rules managed in the application history table for a rule applicable to the one record data.

The data integration support system according to claim 3, wherein the result display unit displays a selection rule candidate display screen presenting candidates for the rule, together with the coverage and application history score of each selection rule that uses a column item in the target table.

The data integration support system according to claim 4, wherein the result display unit displays a selection condition screen for inputting a selection condition superimposed on the selection rule candidate display screen.

The data integration support system according to claim 4, wherein the result display unit masks unused column items among the column items of the target table on the selection rule candidate display screen.

In a data integration support method in a data integration support system that integrates related data among a plurality of record data according to a rule,
A designation receiving step in which the data integration support system accepts designation of a target table selected from a plurality of tables managing the plurality of record data and of a specific key in the target table;
A rule determining step in which the data integration support system determines a specific rule based on the target table and the specific key whose designation is accepted in the designation accepting step;
A rule execution step in which the data integration support system selects one record data among the plurality of record data according to the specific rule determined in the rule determination step;
A data analysis step in which the data integration support system analyzes the one record data selected in the rule execution step;
A result display step in which the data integration support system displays an analysis result for the one record data; and
A data integration support method characterized by comprising:

The data integration support method according to claim 7, wherein, in the rule determining step, the data integration support system calculates, for each column item other than the specific key in the target table, the coverage of a selection rule that uses that column item.

The data integration support method according to claim 7, wherein, in the rule determining step, the data integration support system searches the past rules managed in an application history table, which manages past rules applied to the plurality of data, for a rule applicable to the one record data.

The data integration support method according to claim 8 or 9, wherein, in the result display step, the data integration support system displays a selection rule candidate display screen presenting candidates for the rule, together with the coverage and application history score of each selection rule that uses a column item in the target table.

The data integration support method according to claim 10, wherein, in the result display step, the data integration support system displays a selection condition screen for inputting a selection condition superimposed on the selection rule candidate display screen.

The data integration support method according to claim 10, wherein, in the result display step, the data integration support system masks unused column items among the column items of the target table on the selection rule candidate display screen.