JP6563549B1 - Data trend analysis method, data trend analysis system, and narrowing and restoring device

Info

Publication number: JP6563549B1
Application number: JP2018060783A
Authority: JP
Inventors: 竜矢木村; 慎一尾崎; 沢希黒田; 響齋藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-03-27
Filing date: 2018-03-27
Publication date: 2019-08-21
Anticipated expiration: 2038-03-27
Also published as: JP2019175009A

Abstract

[Problem] To obtain, with high accuracy, results equivalent to an analysis that uses all data items, while narrowing down the data items to be analyzed and thereby reducing the computational load. A further object is to make the analysis results easy to interpret, even when there are many items, by grouping attributes that can be regarded as similar. [Solution] A computer extracts relationships between data items across a plurality of data tables and generates relation information for the data items; generates restoration information for the data items based on the relation information; generates groups of the data items based on the relation information; selects a representative data item from the data items in each group based on the relation information and the restoration information; performs a trend analysis on the data corresponding to the representative data item; restores the data items of the group to which the representative data item belongs based on the restoration information; and outputs the result of the trend analysis on the data corresponding to the representative data item together with the restoration result of the data items. [Selected drawing] FIG. 1

Description

The present invention relates to data trend analysis.

Techniques are known that derive the characteristics of data by performing trend analysis on large volumes of data using correlation analysis or machine learning. As one form of trend analysis, clustering methods are known that collect similar data from the given data and classify it into several clusters, or that calculate a degree of similarity.

Patent Document 1 is known as a technique for extracting the characteristics of data and the relationships between data items. Patent Document 1 discloses a technique that recommends a column (data item) suitable as a primary key for data shaping, even when the analyst (or user) does not know the contents of the data in detail.

Patent Document 1: JP 2012-238153 A

In big data analysis and similar work, the volume of data handled is large, which leads to problems such as insufficient resource performance during analysis and enormous analysis processing times. Conventionally, therefore, the data to be analyzed has been narrowed down by manually selecting the input data.

However, with the conventional approach of selecting input data at the discretion of a person (the analyst), important data items may be overlooked. In addition, when no useful analysis result is obtained, the input data must be selected manually again, which increases the workload.

On the other hand, when the data items of the analysis target data are narrowed down mechanically and a trend analysis is performed on the narrowed-down data, data items that are important to the analyst are sometimes missing from the output.

The present invention has been made in view of the above problems. Its object is to obtain, with high accuracy, results equivalent to an analysis that uses all data items, while narrowing down the data items to be analyzed and thereby reducing the computational load. A further object is to make the analysis results easy to interpret, even when there are many items, by grouping attributes that can be regarded as similar.

The present invention is a data trend analysis method in which a computer including a processor and a memory performs a trend analysis on data tables having data corresponding to data items. The method includes: a first step in which the computer extracts relationships between data items across a plurality of the data tables and generates relation information for the data items; a second step in which the computer generates restoration information for the data items based on the relation information; a third step in which the computer generates groups of the data items based on the relation information; a fourth step in which the computer selects a representative data item from the data items in a group based on the relation information and the restoration information; a fifth step in which the computer performs a trend analysis on the data corresponding to the representative data item; a sixth step in which the computer restores the data items of the group to which the representative data item belongs based on the restoration information; and a seventh step in which the computer outputs the result of the trend analysis on the data corresponding to the representative data item and the restoration result of the data items.

Therefore, because the analysis results of the present invention present similar items together, the results are easy for a person to read. Results close to those of an analysis covering all items are obtained (the analysis does not lose features because items were selected by the analyst's subjective judgment), and the computational load of the analysis can be reduced.

Each of the following figures shows an embodiment of the present invention.
FIG. 1 is a block diagram showing an example of the data trend analysis system.
FIG. 2 is a block diagram showing an example of the narrowing and restoration server.
FIG. 3 is a flowchart showing an example of the processing performed in the data trend analysis system.
FIG. 4 is a diagram showing an example of the schema information table.
FIG. 5 is a diagram showing an example of the analysis target data table.
FIG. 6 is a diagram showing an example of the inter-data relation calculation result.
FIG. 7 is a diagram showing an example of the independent data item table.
FIG. 8 is a flowchart showing an example of the input data item grouping process performed in the data trend analysis system.
FIG. 9 is a diagram showing an example of the summary table.
FIG. 10 is the first half of a flowchart showing an example of the process of storing data in the summary table.
FIG. 11 is the second half of the flowchart showing an example of the process of storing data in the summary table.
FIG. 12 is a graph of prior-period total assets against total assets, showing an example of a regression equation.
FIG. 13 is a diagram showing an example of the summary table after groups have been assigned.
FIG. 14 is a diagram showing an example of the relationship between data items and groups.
FIG. 15 is a flowchart showing an example of the representative data selection process.
FIG. 16 is a flowchart showing another example of the representative data selection process.
FIG. 17 is a diagram showing an example of node division in the representative data selection process.
FIG. 18 is a diagram showing another example of node division in the representative data selection process.
FIG. 19 is a diagram showing an example of the representative data table.
FIG. 20 is a diagram showing an example of the restoration table.
FIG. 21 is a flowchart showing an example of the data trend analysis process.
FIG. 22 is a diagram showing an example of the result table of the data trend analysis.
FIG. 23 is a flowchart showing an example of the data restoration process.
FIG. 24 is a flowchart showing an example of the data restoration calculation process.
FIG. 25 is a diagram showing an example of the display screen of the final result table.
FIG. 26 is a diagram showing an example of the candidate table.

Embodiments of the present invention are described below with reference to the accompanying drawings.

<Configuration of the data trend analysis system>
FIG. 1 is a block diagram showing an example of a data trend analysis system according to an embodiment of the present invention. The data trend analysis system 1 includes a database server 2 that stores information such as the analysis target data and the final results; a narrowing and restoration server 3 that narrows down the data items of the analysis target data and restores the data items; a data trend analysis server 4 that executes trend analysis processing on the analysis target data; user terminals 6-1 to 6-3 that use the data trend analysis system 1; and a network 5 that connects the servers and the user terminals 6-1 to 6-3.

In the following description, when the individual user terminals need not be distinguished, the reference numeral "6" is used, with the "-" and the suffix after it omitted. The same applies to the reference numerals of other components.

The database server 2 is a computer that provides a database 20 and a synonym dictionary 22, and is connected to the network 5 via a network interface (I/F in the figure) 21. A DBMS (database management system), not shown, runs on the database server 2.

The database 20 includes: an analysis target data table 220 that stores the tables of analysis target data; a schema information table 210 that stores the schema information of the analysis target data; an inter-data relation calculation result 230 that stores the calculated similarities between data items of the analysis target data; an independent data item table 240 that stores the data item names of the data tables; a summary table 250 that stores the similarities between data items, the regression equations, and the groups; a result table 260 that stores the results of the data trend analysis; a representative data table 270 that stores the selected representative data; a restoration table 280 that stores the minimum paths (described later) used to select representative data items; and a final result table 290 that stores the final calculation results.

The analysis target data table 220 can consist of a plurality of data tables. The synonym dictionary 22 is a dictionary set up in advance for name identification, that is, for reconciling variant data item names in the data tables.
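
As an illustration of name identification with a synonym dictionary, the following is a minimal sketch. The dictionary contents, the normalization rule (lower-casing and collapsing whitespace), and the function names `canonical_name` and `identify_names` are assumptions made for this example; the source does not specify the format of the synonym dictionary 22.

```python
from __future__ import annotations

# Hypothetical synonym dictionary: variant item name -> canonical item name.
SYNONYMS = {
    "revenue": "sales",
    "net sales": "sales",
    "capital stock": "capital",
}


def canonical_name(item_name: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Return the canonical form of a data item name.

    The name is lower-cased and runs of whitespace are collapsed before the
    dictionary lookup; unknown names are returned in normalized form.
    """
    key = " ".join(item_name.lower().split())
    return synonyms.get(key, key)


def identify_names(item_names: list[str]) -> dict[str, list[str]]:
    """Group raw item names that resolve to the same canonical name."""
    merged: dict[str, list[str]] = {}
    for name in item_names:
        merged.setdefault(canonical_name(name), []).append(name)
    return merged
```

Calling `identify_names(["Sales", "Revenue", "Capital"])` would place "Sales" and "Revenue" under the single canonical name "sales", so that the two columns are treated as the same data item in later steps.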

The narrowing and restoration server 3 includes: an inter-data relation analysis unit 310 that calculates the similarity between data items and the similarity between the data corresponding to those data items; a group generation unit 320 that generates groups of data items based on the similarity of the data item names and the similarity of the data; an input representative data selection unit 330 that selects the data item that represents each group (the representative data item); a restoration formula generation unit 340 that generates restoration formulas (regression equations) for restoring the data item names in a group from the representative data item; a data restoration unit 350 that restores the related data item or items based on the restoration formulas; and a candidate table 360 used to select representative data items.

The data trend analysis server 4 has a data trend analysis unit 41. As in the conventional example described above, the data trend analysis unit 41 performs trend analysis by applying well-known techniques such as correlation analysis and machine learning. In this embodiment, the trend analysis is performed on the representative data corresponding to the representative data items selected by the narrowing and restoration server 3.

The user terminal 6 is a computer connected to the network 5 via a network interface (I/F in the figure) 61, and includes an input/output unit 62 that issues instructions to, and receives responses from, the narrowing and restoration server 3. The input/output unit 62 includes input devices such as a keyboard, a mouse, and a touch panel, and an output device such as a display.

The user terminal 6 performs data trend analysis by using the data trend analysis system 1, which includes the database server 2, the narrowing and restoration server 3, and the data trend analysis server 4.

This embodiment shows an example in which the database server 2, the narrowing and restoration server 3, and the data trend analysis server 4 are implemented on separate computers, but the invention is not limited to this. For example, the functions of the three servers may be provided by a single computer, or each server may run on a virtual machine. Likewise, this embodiment shows an example in which the data tables to be analyzed are held within the data trend analysis system 1, but they may instead be read from an external device.

FIG. 2 is a block diagram showing an example of the narrowing and restoration server 3. The narrowing and restoration server 3 is a computer that includes a processor 31, a memory 32, a storage 33, and a network interface 34.

The inter-data relation analysis unit 310, the group generation unit 320, the input representative data selection unit 330, the restoration formula generation unit 340, and the data restoration unit 350 are loaded into the memory 32 and executed by the processor 31.

The inter-data relation analysis unit 310, the group generation unit 320, the input representative data selection unit 330, the restoration formula generation unit 340, and the data restoration unit 350 are each loaded into the memory 32 as a program.

By performing processing according to the program of each functional unit, the processor 31 operates as a functional unit that provides a predetermined function. For example, the processor 31 functions as the inter-data relation analysis unit 310 by performing processing according to the inter-data relation analysis program. The same applies to the other programs. The processor 31 also operates as a functional unit that provides the function of each of the multiple processes executed by each program. A computer and a computer system are a device and a system that include these functional units.

Information such as the programs and tables that implement the functions of the narrowing and restoration server 3 can be stored in the storage 33; in a storage device such as a nonvolatile semiconductor memory, a hard disk drive, or an SSD (Solid State Drive); or on a computer-readable non-transitory data storage medium such as an IC card, an SD card, or a DVD.

<Overview of processing>
FIG. 3 is a flowchart showing an example of the processing performed in the data trend analysis system 1. The processing starts when the narrowing and restoration server 3 receives a data trend analysis instruction from the user terminal 6. The instruction can include the schema information of the data tables contained in the analysis target data table 220 and a designation of the data tables and data to be analyzed.

This embodiment shows an example of analyzing the tendencies of companies that have been in business for 100 years or more, and describes the case in which financial statements, management information, and the like are designated from the schema information table 210 and its table name 212. The following is an overview; the details of the processing are described later.

In step S1, the narrowing and restoration server 3 groups the data items (input data items) of the data tables designated by the user terminal 6. The grouping is performed by the following procedure.

First, the inter-data relation analysis unit 310 reads the data items of the data tables designated for analysis from the schema information table 210 or the analysis target data table 220 and generates the independent data item table 240. The schema information of the tables stored in the analysis target data table 220 of the database server 2 is registered in the schema information table 210 in advance. The inter-data relation analysis unit 310 may also obtain the data items directly from the designated data tables in the analysis target data table 220.
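
The generation of the independent data item table from schema information can be sketched as follows. This is only an illustration: the sample rows, the table names, and the assumption that "independent" means each item name is listed once are not taken from the source, and `build_independent_items` is a hypothetical helper.

```python
from __future__ import annotations

# Hypothetical rows of the schema information table 210: (table name, item name).
SCHEMA_INFO = [
    ("financial_statements", "sales"),
    ("financial_statements", "operating profit"),
    ("management_info", "capital"),
    ("management_info", "sales"),
]


def build_independent_items(schema_rows, target_tables):
    """Build (identification number, item name) rows for the designated tables.

    Assumption: an item name that appears in several tables is listed once,
    in order of first appearance, and numbering starts at 1.
    """
    seen = set()
    table = []
    for table_name, item_name in schema_rows:
        if table_name in target_tables and item_name not in seen:
            seen.add(item_name)
            table.append((len(table) + 1, item_name))
    return table
```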

Next, the inter-data relation analysis unit 310 reads the independent data item table 240 and the analysis target data table 220, calculates the similarity between the item names (natural language) of the data items, the similarity between the data (numerical values) corresponding to those data items, and the variances of the data, and generates the inter-data relation calculation result 230.
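
The pairwise calculation described above might look like the following sketch. The source does not name the similarity measures, so two stand-ins are assumed here: `difflib.SequenceMatcher` for item-name similarity and the Pearson correlation coefficient for data similarity. The function `relate` and the dictionary keys are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations
from math import sqrt


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)


def relate(columns):
    """Compute one relation record per pair of data items.

    columns maps an item name to its numeric column (all of equal length).
    """
    rows = []
    for x, y in combinations(columns, 2):
        rows.append({
            "x": x,
            "y": y,
            # Assumed measure: character-level similarity of the item names.
            "name_similarity": SequenceMatcher(None, x, y).ratio(),
            "variance_diff": abs(variance(columns[x]) - variance(columns[y])),
            # Assumed measure: linear correlation of the numeric data.
            "data_similarity": pearson(columns[x], columns[y]),
        })
    return rows
```

Each returned record corresponds to one entry of the inter-data relation calculation result 230 (data item x, data item y, item name similarity, variance difference, data similarity).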

The inter-data relation analysis unit 310 then extracts pairs of data items based on the item-name similarity and the data similarity and generates the summary table 250. Next, the restoration formula generation unit 340 generates restoration formulas (regression equations) in both directions for each pair of data items in the summary table 250 and stores them in the summary table 250. The group generation unit 320 then assigns a group to each pair of data items and updates the summary table 250.
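
The bidirectional regression and the group assignment can be sketched as below. Two assumptions are made that the source does not state: the regression equation is a simple least-squares line, and pairs that share a data item are merged into one group (implemented here with a union-find structure). The names `fit_line` and `assign_groups` are hypothetical.

```python
from __future__ import annotations


def fit_line(xs, ys):
    """Least-squares line ys ~ slope * xs + intercept, returned as (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line to a constant column")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def assign_groups(pairs):
    """Give every item a 1-based group number; items linked by a pair share a group."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for x, y in pairs:
        parent[find(x)] = find(y)

    numbers = {}
    groups = {}
    for item in list(parent):
        root = find(item)
        groups[item] = numbers.setdefault(root, len(numbers) + 1)
    return groups
```

For a pair (x, y), calling `fit_line(x_data, y_data)` gives the x→y restoration formula and `fit_line(y_data, x_data)` gives the y→x formula, which mirrors the two regression columns of the summary table.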

Next, in step S2, the input representative data selection unit 330 selects, from each group in the summary table 250, a data item that represents the group as its representative data item.

Next, in step S3, the narrowing and restoration server 3 notifies the data trend analysis server 4 of the selected representative data items and their data, and has it execute the data trend analysis. In this embodiment, the input representative data selection unit 330 notifies the data trend analysis server 4 of the data for the representative data items (data tables, data items, and so on), but another functional unit may do so.

The data trend analysis server 4 reads the data of the representative data items notified by the narrowing and restoration server 3 from the database server 2 and executes a predetermined trend analysis (trend estimation). When the trend analysis is complete, the data trend analysis server 4 returns the result to the narrowing and restoration server 3.

In step S4, in the narrowing and restoration server 3 that has received the trend analysis result, the data restoration unit 350 restores the data items other than the representative data items based on the regression equations. The data restoration unit 350 then stores the restored data items related to each representative data item, together with the trend analysis result, in the final result table 290, and outputs the contents of the final result table 290 to the user terminal 6.
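
One possible reading of the restoration in step S4 is that a value of the representative data item is passed through the stored regression equations, hop by hop, until the target data item is reached. The sketch below illustrates that reading only; the `(slope, intercept)` representation, the `path` argument, and the function `restore_along_path` are assumptions and are not described in the source.

```python
from __future__ import annotations


def restore_along_path(value, path, equations):
    """Estimate a value of the last item in path from a value of the first item.

    path is a list of item names starting at the representative data item.
    equations maps (source item, destination item) to (slope, intercept).
    """
    for src, dst in zip(path, path[1:]):
        slope, intercept = equations[(src, dst)]
        value = slope * value + intercept
    return value
```

With hypothetical equations `{("A", "B"): (2.0, 1.0), ("B", "C"): (0.5, 0.0)}`, restoring item C from a value 3.0 of representative item A first estimates B as 7.0 and then C as 3.5.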

Through the above processing, data items that are important to the analyst using the user terminal 6 can be restored accurately without depending on the analyst's subjective interpretation, and appropriate analysis results can be obtained.

That is, for related data items, the data trend analysis system 1 of this embodiment compares both the natural-language similarity of the data item names and the similarity of the numerical data corresponding to the data items, and groups the data items to be narrowed down.

Next, the data trend analysis system 1 selects, from the grouped data items, the representative data item for which restoration by the restoration formulas is most accurate. The data trend analysis system 1 then performs the trend analysis on the data of the selected data items. In this way, the small amount of data corresponding to the narrowed-down data items yields a trend analysis result equivalent to one performed on all of the analysis target data.
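
The idea of choosing the representative with the highest restoration accuracy can be illustrated as follows. The selection procedure in the source is described with reference to later figures, so this sketch substitutes a simple stand-in criterion: the candidate whose linear regressions onto the other group members have the highest mean coefficient of determination (R squared, which for a simple linear regression equals the squared Pearson correlation). The names `r_squared` and `pick_representative` are hypothetical.

```python
from __future__ import annotations


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line ys ~ xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def pick_representative(group_columns):
    """Return the item whose regressions restore the other items best on average.

    group_columns maps each item name in one group to its numeric column.
    """
    def mean_accuracy(candidate):
        others = [name for name in group_columns if name != candidate]
        if not others:
            return 1.0
        total = sum(r_squared(group_columns[candidate], group_columns[o]) for o in others)
        return total / len(others)

    return max(group_columns, key=mean_accuracy)
```

In a group where one item correlates strongly with every other member while the remaining items correlate only weakly with each other, this criterion picks the strongly connected item.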

Finally, the data trend analysis system 1 restores the data items in the group of each representative data item and outputs them to the user terminal 6 together with the trend analysis result, so that the analyst is shown both the basis for the narrowing and the complete picture.

The trend analysis (trend estimation) executed by the data trend analysis server 4 can use well-known techniques, so it is not described in detail in this embodiment.

<Table configuration>
FIG. 4 is a diagram showing an example of the schema information table 210. The schema information of the data tables stored in the analysis target data table 220 is set in the schema information table 210 in advance.

The schema information table 210 includes, in one entry, an identification number (# in the figure) 211, a table name 212, and an item name 213. The illustrated example shows two items, but other items can be included.

The item name 213 stores, in natural language, a data item (field name) of the analysis target data table 220 described later.

FIG. 5 is a diagram showing an example of the analysis target data table 220. The analysis target data table 220 is a table of analysis target data stored in advance in the database server 2. The analysis target data table 220 consists of a plurality of data tables; the illustrated example shows a data table of financial statements.

The analysis target data table 220 includes, in one entry, a company code 221, sales 222, cost of sales 223, operating profit 224, current assets 225, fixed assets 226, current liabilities 227, fixed liabilities 228, and capital 229.

In the analysis target data table 220, the data items differ from one data table to another, but the data in each field is stored as a numerical value.

FIG. 6 is a diagram showing an example of the inter-data relation calculation result 230. The inter-data relation calculation result 230 is a table generated by the inter-data relation analysis unit 310.

The inter-data relation calculation result 230 includes, in one entry, an identification number (# in the figure) 231, a data item (x) 232, a data item (y) 233, an item name similarity 234, a variance difference 235, and an inter-data similarity 236.

The data item (x) 232 and the data item (y) 233 store the paired values of the item name 242 of the independent data item table 240 (described later). The item name similarity 234 stores the similarity between the natural-language item names of the data item (x) 232 and the data item (y) 233.

The variance difference 235 stores the difference between the variance of the data (numerical values) of the data item (x) 232 and the variance of the data (numerical values) of the data item (y) 233. The inter-data similarity 236 stores the similarity between the data (numerical values) of the data item (x) 232 and the data (numerical values) of the data item (y) 233.

FIG. 7 is a diagram showing an example of the independent data item table 240. The independent data item table 240 is generated by the inter-data relation analysis unit 310. The independent data item table 240 includes, in one entry, an identification number (# in the figure) 241 and an item name 242.

As described above, the item name 242 stores a data item of the analysis target data table 220 in natural language.

FIG. 9 is a diagram illustrating an example of the summary table 250. The summary table 250 is updated by the group generation unit 320.

Each entry of the summary table 250 contains an identification number (# in the figure) 251, a data item (x) 252, a data item (y) 253, an inter-data similarity 254, a regression equation (restoration of x → y) 255, a regression equation (restoration of y → x) 256, and a group #257.

The data item (x) 252 and the data item (y) 253 store the contents of the data item (x) 232 and the data item (y) 233 of the inter-data relation calculation result 230. The inter-data similarity 254 stores the value of the inter-data similarity 236 of the inter-data relation calculation result 230.

The regression equation (restoration of x → y) 255 stores a regression equation for restoring the data item (y) 253 from the data of the data item (x) 252. The regression equation (restoration of y → x) 256 stores a regression equation for restoring the data item (x) 252 from the data of the data item (y) 253.

The group #257 stores the identifier (group number) of the group to which the pair of the data item (x) 252 and the data item (y) 253 belongs.
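As an illustration only, one entry of the summary table 250 could be modeled as follows. The field names, and the choice to hold each regression equation as a (slope, intercept) pair, are assumptions made for this sketch and are not specified in the embodiment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SummaryEntry:
    """One entry of summary table 250 (hypothetical field names)."""
    number: int                    # identification number 251
    item_x: str                    # data item (x) 252
    item_y: str                    # data item (y) 253
    data_similarity: float         # inter-data similarity 254
    # Regression equations 255 and 256, assumed here to be straight lines
    # stored as (slope, intercept); empty until they are calculated.
    regression_x_to_y: Optional[Tuple[float, float]] = None
    regression_y_to_x: Optional[Tuple[float, float]] = None
    group: Optional[int] = None    # group #257, blank until a group is assigned


# Illustrative values only; they are not taken from FIG. 9.
entry = SummaryEntry(1, "item x", "item y", 0.9)
```

Leaving `group` empty by default mirrors the state of FIG. 9, in which no group number has yet been assigned.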

FIG. 13 is a diagram illustrating the summary table 250 of FIG. 9 after group numbers have been assigned to the group #257.

FIG. 19 is a diagram illustrating an example of the representative data table 270. The representative data table 270 is generated by the input representative data selection unit 330. Each entry of the representative data table 270 contains an identification number (# in the figure) 271 and an item name 272.

FIG. 20 is a diagram illustrating an example of the restoration table 280. The restoration table 280 is generated by the input representative data selection unit 330.

Each entry of the restoration table 280 contains an identification number (# in the figure) 281, a data item (x) 282, a data item (y) 283, and a regression equation (restoration of x → y) 284.

The data item (x) 282 and the data item (y) 283 store the contents of the data item (x) 252 and the data item (y) 253 of the summary table 250. The regression equation (restoration of x → y) 284 stores the value of the regression equation (restoration of x → y) 255 of the summary table 250.

FIG. 22 is a diagram illustrating an example of the data trend analysis result table 260. The data trend analysis result table 260 is generated by the data restoration unit 350.

Each entry of the data trend analysis result table 260 contains an identification number (# in the figure) 261, a condition 1 (262), and a condition 2 (263). In this embodiment, the result of the trend analysis of companies that have been in business for 100 years or more indicates that a company satisfying both condition 1 (262) and condition 2 (263) is highly likely to remain in business for 100 years or more after its founding.

FIG. 26 is a diagram illustrating an example of the candidate table 360. The candidate table 360 is a table managed by the input representative data selection unit 330.

Each entry of the candidate table 360 contains an identification number 361, an item name 362, and a variance difference 363. The identification number 361 stores a value assigned by the input representative data selection unit 330. The item name 362 and the variance difference 363 store the values of a representative data item candidate selected by the input representative data selection unit 330.

<Details of processing>
FIG. 8 is a flowchart illustrating an example of the data item grouping processing performed in the data trend analysis system 1. This processing is performed in step S1 of FIG. 3.

The narrowing and restoration server 3 of the data trend analysis system 1 receives, from the user terminal 6, information on the data tables (the schema information table 210 and the analysis target data table 220) on which the trend analysis is to be performed. As described above, this embodiment uses a data table that stores only data on companies that have remained in business for 100 years or more since their founding.

First, in step S11, the inter-data relation analysis unit 310 reads the data items of the data table designated as the analysis target from the schema information table 210 or the analysis target data table 220, and generates the independent data item table 240.

The independent data item table 240 stores, in natural language, all data items of the one or more data tables designated as analysis targets. In the analysis target data table 220 of this embodiment, the names of the data items are assumed to be written in natural language.

In step S12, the inter-data relation analysis unit 310 repeats the processing up to step S21 for every combination of the item names 242 in the independent data item table 240. The inter-data relation analysis unit 310 selects two item names 242 from the independent data item table 240 and then executes the following processing.

In step S13, the inter-data relation analysis unit 310 calculates the natural-language similarity for the selected combination of two item names 242. A well-known or publicly known technique can be applied to calculate the similarity between the two item names 242. For example, the similarity can be calculated using the algorithm by Ratcliff and Obershelp called "Gestalt pattern matching" (https://docs.python.jp/3/library/difflib.html).
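The item name similarity of step S13 can be sketched with the `difflib` module of the Python standard library, which the URL above documents. The function name, the sample item names, and the threshold value are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(name_x: str, name_y: str) -> float:
    """Return a similarity between 0.0 and 1.0 for two item names."""
    # The first argument (None) means no characters are treated as junk.
    return SequenceMatcher(None, name_x, name_y).ratio()


TH1 = 0.5  # hypothetical value for the threshold Th1 used in step S14

score = name_similarity("total assets", "total assets, previous period")
is_similar_name = score >= TH1
```

`ratio()` returns 1.0 for identical strings and 0.0 when the strings share no characters, so the threshold Th1 can be chosen anywhere in that range.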

In step S14, the inter-data relation analysis unit 310 compares the calculated similarity with a preset threshold Th1 and determines whether the similarity is equal to or greater than the threshold Th1. If the similarity is equal to or greater than the threshold Th1, the processing proceeds to step S15. If the similarity is less than the threshold Th1, the processing proceeds to step S21, where the next combination of item names 242 is selected and the above processing is repeated.

In step S15, the inter-data relation analysis unit 310 acquires, from the analysis target data table 220, the data (numerical values) of the data items corresponding to the two selected item names 242, and calculates the difference between the variances of the two sets of data. The difference between the variances can be calculated using equation (1).

Here, "A" denotes each data value of the data item (x), "B" denotes each data value of the data item (y), "Aavg" denotes the arithmetic mean of the data item (x), "Bavg" denotes the arithmetic mean of the data item (y), and "n" denotes the number of data values. The data item (x) and the data item (y) are the two data items that form the combination.
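Equation (1) itself is not reproduced in this text. The following is a minimal sketch of one calculation consistent with the symbols defined above. It assumes that each variance is the mean squared deviation from the arithmetic mean (division by n) and that the difference is taken as an absolute value; both points are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def variance(values: Sequence[float]) -> float:
    """Mean squared deviation from the arithmetic mean (division by n)."""
    n = len(values)
    avg = sum(values) / n                      # corresponds to Aavg or Bavg
    return sum((v - avg) ** 2 for v in values) / n


def variance_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute difference between the variances of data items (x) and (y)."""
    return abs(variance(a) - variance(b))


diff = variance_difference([1, 2, 3], [2, 4, 6])
```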

In step S16, the inter-data relation analysis unit 310 compares the calculated difference between the variances with a preset threshold Ths and determines whether the difference is equal to or less than the threshold Ths. If the difference is equal to or less than the threshold Ths, the processing proceeds to step S17. If the difference exceeds the threshold Ths, the processing proceeds to step S21, where the next combination of item names 242 is selected and the above processing is repeated.

In step S17, the inter-data relation analysis unit 310 acquires, from the analysis target data table 220, the data (numerical values) of the data items corresponding to the two selected item names 242, and calculates the similarity between the data. As in step S13 above, "Gestalt pattern matching" can be used for the similarity between the data.

In step S18, the inter-data relation analysis unit 310 compares the calculated similarity with a preset threshold Th2 and determines whether the similarity is equal to or greater than the threshold Th2. If the similarity is equal to or greater than the threshold Th2, the processing proceeds to step S19. If the similarity is less than the threshold Th2, the processing proceeds to step S21, where the next combination of item names 242 is selected and the above processing is repeated.

In step S19, the inter-data relation analysis unit 310 deletes, from the independent data item table 240, the two records containing the currently selected item names 242.

In step S20, the inter-data relation analysis unit 310 stores the two currently selected item names 242 as a pair in the summary table 250, as described later. The inter-data relation analysis unit 310 also stores the two currently selected item names 242 in the data item (x) 232 and the data item (y) 233 of the inter-data relation calculation result 230, stores the similarity of the item names 242 in the item name similarity 234, stores the difference between the variances in the variance difference 235, stores the similarity between the numerical data in the inter-data similarity 236, and assigns an identification number 231.

In step S21, the inter-data relation analysis unit 310 selects the next pair of item names 242 and repeats the above processing. The processing ends when all combinations of the item names 242 in the independent data item table 240 have been processed.

Through the above processing, the pairs of item names 242 whose item name similarity is equal to or greater than the threshold Th1, whose difference between the variances of the data is equal to or less than the threshold Ths, and whose similarity between the numerical data is equal to or greater than the threshold Th2 are stored in the summary table 250. The item names 242 that have no similar counterpart remain in the independent data item table 240.
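The three-stage filtering of steps S12 to S21 can be sketched as follows. The table layout (a dictionary from item name to a list of numerical values), the threshold values, and the function names are assumptions for illustration. The sketch examines every combination and derives the unpaired item names at the end, rather than deleting records during the loop as step S19 does.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import pvariance

# Hypothetical values for the thresholds Th1, Ths, and Th2.
TH1, THS, TH2 = 0.5, 1000.0, 0.5


def similarity(a, b) -> float:
    """Gestalt pattern matching ratio for two strings or two value lists."""
    return SequenceMatcher(None, a, b).ratio()


def find_pairs(table):
    """Return (pairs, unpaired) for a dict of item name -> numerical values."""
    pairs = []
    for x, y in combinations(table, 2):                 # S12: every combination
        name_sim = similarity(x, y)                     # S13
        if name_sim < TH1:                              # S14
            continue
        var_diff = abs(pvariance(table[x]) - pvariance(table[y]))  # S15
        if var_diff > THS:                              # S16
            continue
        data_sim = similarity(table[x], table[y])       # S17
        if data_sim < TH2:                              # S18
            continue
        pairs.append((x, y, name_sim, var_diff, data_sim))  # S20
    paired = {name for pair in pairs for name in pair[:2]}
    unpaired = [name for name in table if name not in paired]
    return pairs, unpaired


table = {
    "total assets": [100, 200, 300],
    "total assets, previous period": [100, 200, 310],
    "employee count": [5, 6, 7],
}
pairs, unpaired = find_pairs(table)
```

Because the item name check comes first, the numerical comparisons of steps S15 and S17 run only for pairs whose names are already similar, which keeps the cost of the later stages down.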

In this embodiment, when selecting related data items, the inter-data relation analysis unit 310 narrows down similar data items by comparing both the natural-language names of the data items and the data (numerical data) corresponding to the data items. As a result, data items that have similar names and similar values can be grouped as described later.

Although the above description shows an example in which the presence or absence of a relation is determined from the natural-language similarity of the data item names, the present invention is not limited to this, and the natural-language similarity may instead be determined using the synonym dictionary 22.

In addition, in this embodiment, the similarity of the data (numerical values) corresponding to the data items is determined only for data items whose natural-language names are similar, so that data items whose natural-language names are not similar can be excluded from the related items.

FIGS. 10 and 11 are flowcharts illustrating an example of the processing for storing data in the summary table 250. This processing is executed in step S20 of FIG. 8.

In step S31 of FIG. 10, the restoration formula generation unit 340 stores the two item names 242 currently selected in the processing of FIG. 8 in the data item (x) 252 and the data item (y) 253 of the summary table 250, stores the similarity between the data calculated in step S17 in the inter-data similarity 254, and assigns an identification number 251.

Alternatively, the restoration formula generation unit 340 may acquire the data corresponding to the two currently selected item names from the inter-data relation calculation result 230 and store it in the summary table 250.

In step S32, the restoration formula generation unit 340 calculates a regression equation (x → y) for restoring the data item (y) 253 from the data of the data item (x) 252. A well-known or publicly known technique may be applied to calculate the regression equation.

For example, as shown in FIG. 12, when the data item (x) is the total assets for the previous period and the data item (y) is the total assets, the two quantities (total assets for the previous period and total assets) can be plotted on a coordinate plane, and a straight line approximating the distribution of the data points can be expressed as a regression equation.
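The straight-line approximation described for FIG. 12 can be sketched with an ordinary least-squares fit. The embodiment only states that a well-known technique may be used, so the choice of ordinary least squares, the function names, and the sample values are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # raises ZeroDivisionError if all xs are identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def restore(line, values):
    """Apply a fitted line to the values of one data item to restore the other."""
    slope, intercept = line
    return [slope * v + intercept for v in values]


# Illustrative values only.
previous_total_assets = [1.0, 2.0, 3.0]
total_assets = [3.0, 5.0, 7.0]
x_to_y = fit_line(previous_total_assets, total_assets)  # regression (x -> y)
y_to_x = fit_line(total_assets, previous_total_assets)  # regression (y -> x)
```

The two directions are fitted separately because the least-squares line for x → y is in general not the algebraic inverse of the line for y → x.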

In step S33, the restoration formula generation unit 340 calculates a regression equation (y → x) for restoring the data item (x) 252 from the data of the data item (y) 253.

In step S34, the restoration formula generation unit 340 stores the regression equation (x → y), which restores the data item (y) 253 from the data of the data item (x) 252, in the regression equation (restoration of x → y) 255 of the summary table 250, and stores the regression equation (y → x), which restores the data item (x) 252 from the data of the data item (y) 253, in the regression equation (restoration of y → x) 256 of the summary table 250.

Next, in step S35 of FIG. 11, the group generation unit 320 repeats the processing up to step S44 for each of the data items (x) 252 and the data items (y) 253 in the summary table 250.

In step S36, the group generation unit 320 selects the data item (x) 252 and the data item (y) 253 from the first entry of the summary table 250. Then, starting from the first entry of the summary table 250, the group generation unit 320 compares the assignment status of the group #257 in the column direction for the data item (x) 252 and the data item (y) 253.

In step S37, the group generation unit 320 determines whether the group #257 of another entry that matches the data item (x) 252 of the currently selected entry differs from the group #257 of another entry that matches the data item (y) 253 of the currently selected entry.

If the group #257 for the data item (x) 252 differs from the group #257 for the data item (y) 253, the group generation unit 320 proceeds to step S38; otherwise, it proceeds to step S39.

In other words, the group generation unit 320 searches the summary table 250 of FIG. 13 in the column direction for an entry whose data item matches the data item (x) 252 of the currently selected entry, and if the group #257 has been assigned to the matching entry, sets that group number in a variable Nx.

Similarly, the group generation unit 320 searches the summary table 250 in the column direction for an entry whose data item matches the data item (y) 253 of the currently selected entry, and if the group #257 has been assigned to the matching entry, sets that group number in a variable Ny.

The group generation unit 320 then determines whether the variable Nx and the variable Ny match. If they do not match, it proceeds to step S38; otherwise, it proceeds to step S39.

In step S38, the group generation unit 320 takes the smaller of two group numbers, namely the group #257 of the entry whose data item matches the currently selected data item (x) 252 and the group #257 of the entry whose data item matches the currently selected data item (y) 253, and sets the group #257 of the currently selected entry and of the matching entries to that number. The processing then proceeds to step S44.

In step S39, the group generation unit 320 determines whether the group #257 is set in exactly one of the following: the entry whose data item matches the currently selected data item (x) 252, or the entry whose data item matches the currently selected data item (y) 253.

If the group #257 is set in one of them, the group generation unit 320 proceeds to step S40; otherwise, it proceeds to step S41.

In step S40, the group generation unit 320 takes the group #257 value that is set in either the entry whose data item matches the currently selected data item (x) 252 or the entry whose data item matches the currently selected data item (y) 253, and sets it in the group #257 of the current entry. The processing then proceeds to step S44.

In step S41, the group generation unit 320 determines whether the group #257 of the entry whose data item matches the currently selected data item (x) 252 matches the group #257 of the entry whose data item matches the currently selected data item (y) 253. If the two group #257 values match, the group generation unit 320 proceeds to step S42; if neither group #257 is set, it proceeds to step S43.

In step S42, the group generation unit 320 sets the group #257 value of the entry whose data item matches the currently selected data item (x) 252 in the group #257 of the current entry. The processing then proceeds to step S44.

In step S44, the data item (x) 252 and the data item (y) 253 of the next entry in the summary table 250 are selected, and the processing returns to step S36 to repeat the above processing.

Through the above processing, the group #257 is assigned to the summary table 250 shown in FIG. 9, as shown in FIG. 13.

A specific example is described below. When the group generation unit 320 selects the first entry in the summary table 250 of FIG. 9, no value is set in the group #257 of any entry whose data item matches the data item (x) 252 = "total assets for the previous period" or in the group #257 of any entry whose data item matches the data item (y) 253 = "total liabilities and net assets". The group generation unit 320 therefore proceeds from step S41 to step S43 and sets a new group number = 1 in the group #257.

The same applies to the second entry of the summary table 250: a new group number = 2 is set in the group #257, and the table is updated as shown in the summary table 250 of FIG. 13.

For the third entry of the summary table 250, the entry whose data item matches the data item (x) 252 = "total assets for the previous period" has the identification number 251 = 4 and the group #257 = 1. Likewise, for the third entry of the summary table 250, the entry whose data item matches the data item (y) 253 = "total liabilities and net assets" has the identification number 251 = 1 and the group #257 = 1.

Therefore, for the third entry of the summary table 250, the group #257 = 1 is set by steps S41 and S42.

Through the above processing, as shown in FIG. 14, the group #257 is set by following the chain of item names that links the data items (x) 252 and the data items (y) 253 of the summary table 250. The group #257 of an entry that has no data item matching its data item (x) 252 or data item (y) 253 is left blank.
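The effect of chaining item names into groups can be illustrated with a union-find (disjoint-set) structure. This is a substitute technique chosen for brevity, not the entry-by-entry procedure of steps S36 to S44, and the function name and sample item names are assumptions. Unlike the processing described above, this sketch also numbers a pair that shares no item name with any other pair instead of leaving its group blank.

```python
def assign_groups(pairs):
    """Return one group number per (item_x, item_y) pair, in input order."""
    parent = {}

    def find(item):
        # Follow parent links to the root, shortening the path on the way.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    # Items that appear together in a pair belong to the same group.
    for x, y in pairs:
        parent[find(x)] = find(y)

    # Number the groups in order of first appearance, starting from 1.
    numbers = {}
    groups = []
    for x, _ in pairs:
        root = find(x)
        numbers.setdefault(root, len(numbers) + 1)
        groups.append(numbers[root])
    return groups


pairs = [("A", "B"), ("C", "D"), ("B", "E"), ("D", "F"), ("G", "H")]
groups = assign_groups(pairs)
```

Here the pairs (A, B) and (B, E) end up in the same group because they share the item name B, which corresponds to the chain of item names described above.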

As described above, the restoration formula generation unit 340 generates restoration information (restoration formulas) from the relation information calculated by the inter-data relation analysis unit 310 (the pairs of data items (x) and (y) in the inter-data relation calculation result 230, together with their similarities and differences between variances) and from the data of the analysis target data table 220. The group generation unit 320 then groups the item names of the data items based on the relation information.

FIG. 15 is a flowchart illustrating an example of the representative data selection processing. This processing is performed in step S2 of FIG. 3.

In step S51, the input representative data selection unit 330 repeats the processing up to step S63 for each group #257 of the summary table 250, and selects, from the candidate table 360, the data item that represents each group #257 as the representative data item. The input representative data selection unit 330 first acquires the group number of the group #257.

In step S52, the processing up to step S60 is repeated for all entries of the summary table 250 that belong to the acquired group number, and the data item with the smaller difference between the input data and the restored data is stored in the candidate table 360.

In step S53, the input representative data selection unit 330 selects the first entry of the summary table 250 that belongs to the acquired group number, and applies the regression equation (restoration of x → y) 255 to all data corresponding to the data item (x) 252 to calculate the restored data values y.

In step S54, for the entry selected in step S53, the input representative data selection unit 330 takes all data corresponding to the data item (y) 253 as the input data, calculates the difference diff using equation (2), and treats the result as difference (2).

Here, "A" denotes each data value of the input data, "B" denotes each data value of the restored data, and "n" denotes the number of data values.

In step S55, for the entry selected in step S53, the input representative data selection unit 330 applies the regression equation (restoration of y → x) 256 to all data corresponding to the data item (y) 253 to calculate the restored data values x.

In step S56, for the entry selected in step S53, the input representative data selection unit 330 takes all data corresponding to the data item (x) 252 as the input data, calculates the difference diff using equation (2) above, and treats the result as difference (4).

In step S57, the input representative data selection unit 330 determines whether the calculated difference (2) exceeds the difference (4). If the difference (2) exceeds the difference (4), the input representative data selection unit 330 proceeds to step S58; if the difference (2) is equal to or less than the difference (4), it proceeds to step S59.

In step S58, the input representative data selection unit 330 adds the item name of the data item (y) 253 and the difference (2) to the candidate table 360. The input representative data selection unit 330 also adds an entry consisting of the data item (x) 252, the data item (y) 253, and the regression equation (restoration of x → y) 255 to the restoration table 280.

In step S59, on the other hand, the input representative data selection unit 330 adds the item name of the data item (x) 252 and the difference (4) to the candidate table 360.

The processing then proceeds to step S60, where the input representative data selection unit 330 selects the next entry within the group number and repeats the above processing. After performing the above processing for all entries belonging to the group number, the input representative data selection unit 330 proceeds to step S61.

In step S61, the input representative data selection unit 330 selects the entry with the largest difference 363 from the candidate table 360 and adds the item name 362 of that entry to the representative data table 270.

In step S62, the input representative data selection unit 330 clears (initializes) the candidate table 360. The processing then proceeds to step S63, where the input representative data selection unit 330 selects the next group number of the summary table 250 and repeats the above processing. The input representative data selection unit 330 ends the processing after performing it for all group numbers of the summary table 250.
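Steps S53 to S61 for a single group can be sketched as follows, following the branch conditions literally as written above. Equation (2) is not reproduced in this text, so the sketch assumes the mean absolute difference between the input data and the restored data. That assumption, the representation of a regression equation as a (slope, intercept) pair, and all names are hypothetical. The update of the restoration table 280 in step S58 is omitted.

```python
def mean_abs_diff(inputs, restored):
    """Assumed stand-in for equation (2): mean absolute difference."""
    return sum(abs(a - b) for a, b in zip(inputs, restored)) / len(inputs)


def apply_line(line, values):
    """Apply a regression line given as (slope, intercept)."""
    slope, intercept = line
    return [slope * v + intercept for v in values]


def select_representative(entries, data):
    """Pick the representative item name for the entries of one group."""
    candidates = []  # plays the role of candidate table 360
    for e in entries:
        restored_y = apply_line(e["x_to_y"], data[e["x"]])  # S53
        diff2 = mean_abs_diff(data[e["y"]], restored_y)     # S54
        restored_x = apply_line(e["y_to_x"], data[e["y"]])  # S55
        diff4 = mean_abs_diff(data[e["x"]], restored_x)     # S56
        if diff2 > diff4:                                   # S57
            candidates.append((e["y"], diff2))              # S58
        else:
            candidates.append((e["x"], diff4))              # S59
    # S61: the entry with the largest difference becomes the representative.
    return max(candidates, key=lambda c: c[1])[0]


data = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 7.0]}
entries = [{"x": "a", "y": "b", "x_to_y": (2.0, 0.0), "y_to_x": (0.5, 0.0)}]
representative = select_representative(entries, data)
```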

Through the above processing, the original data with the smaller difference diff between the input data and the restored data can be used as the representative data item. As a result, a representative data item is selected such that, when the data items other than the representative data item in the same group are restored from the output of the data trend analysis, the restoration accuracy is as high as possible. The representative data item is then used as a main data item to be input to the data trend analysis server 4.

As described above, the input representative data selection unit 330 selects, for each group, a representative data item that can ensure restoration accuracy, based on the relation information generated by the inter-data relation analysis unit 310 and the restoration information generated by the restoration formula generation unit 340. The input representative data selection unit 330 applies the restoration information to the data of the data items, compares the differences between the variances of the restoration results for the pair of data items, and determines that the data item with the smaller difference is a candidate for the representative data item, that is, a data item that can ensure restoration accuracy.

FIG. 16 is a diagram illustrating an example of node division in the representative data selection process. The selection of the representative data item is not limited to the difference-based method described above with reference to FIG. 16; it can also be performed by dividing nodes. Node division is described below as a second selection process for representative data.

In the following description, nodes, edges, and topologies are defined as follows. A node is a data item (x) 252 or a data item (y) 253 in the summary table 250. An edge is the relationship between a data item (x) 252 and a data item (y) 253 and is expressed by the inter-data similarity 254. A topology is identified by the group number in group # 257.
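As a rough illustration of these definitions, the rows of a summary table can be turned into one small graph per group. The row layout and the sample values below are assumptions made for the sketch; only the roles of the columns (data item x, data item y, inter-data similarity, group number) come from the description.

```python
from collections import defaultdict

# Hypothetical summary-table rows:
# (data item x, data item y, inter-data similarity, group number)
summary_rows = [
    ("total assets", "total liabilities and net assets", 0.95, 1),
    ("total assets", "total assets, previous period", 0.90, 1),
    ("sales", "cost of sales", 0.80, 2),
]

def build_topologies(rows):
    """Return {group number: {node: {neighbor: similarity}}}."""
    topologies = defaultdict(lambda: defaultdict(dict))
    for x, y, similarity, group in rows:
        # An edge is undirected here: the similarity is stored in both directions.
        topologies[group][x][y] = similarity
        topologies[group][y][x] = similarity
    return topologies

topologies = build_topologies(summary_rows)
```

Each group number then names one topology, each key inside it is a node, and each stored similarity is an edge.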

In step S71, the input representative data selection unit 330 repeats the processing up to step S79 for each group # 257 in the summary table 250 and stores the data item representing each group # 257 in the representative data table 270. The input representative data selection unit 330 first acquires the group number of group # 257.

In step S72, the input representative data selection unit 330 calculates the minimum distance between the nodes (entries) belonging to the group number acquired in step S71. In this embodiment, the distance between nodes is expressed by the inter-data similarity 254.

In step S73, the input representative data selection unit 330 takes the path (data item (x) 252 and data item (y) 253) with the smallest inter-node distance calculated in step S72 as the minimum path, and stores the minimum path and the regression equation (x→y restoration) 255 in the restoration table 280.

In step S74, the input representative data selection unit 330 determines whether the distance from the target node along the minimum path has become equal to or greater than a predetermined threshold Th3. If the distance from the target node is equal to or greater than the threshold Th3, the process proceeds to step S75; if it is less than the threshold Th3, the process proceeds to step S76.

In step S75, the input representative data selection unit 330 splits the nodes into two at the edge where the distance from the target node became equal to or greater than the threshold Th3. The process then returns to step S72 and repeats the above processing.

In step S76, the input representative data selection unit 330 calculates the total (multiplication) of the distances between nodes and determines the node for which this total is smallest as the main node (primary node). The minimum value of the total (multiplication) of distances is calculated by the following equation (3).

Here, S denotes a group and x denotes a node.

Next, in step S77, the input representative data selection unit 330 selects the node with the smallest total (multiplication) of distances as the main node. It then proceeds to step S78 and adds the data item of the main node to the representative data table 270 as the representative data item.
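The main-node selection in steps S76 and S77 can be sketched as follows. Equation (3) is not reproduced in this text, so the sketch is only one possible reading: it assumes the node-to-node distances have already been computed (for multi-hop paths, as the product of the similarities along the path), sums them per node, and follows the description literally in picking the smallest total. The names `total_distance` and `select_main_node` and the sample values are hypothetical.

```python
import random

def total_distance(node, distance):
    """Sum of the distances from `node` to every other node in the group."""
    return sum(d for pair, d in distance.items() if node in pair)

def select_main_node(nodes, distance, rng=random):
    """Pick the node with the smallest total distance; ties are broken at random."""
    totals = {n: total_distance(n, distance) for n in nodes}
    smallest = min(totals.values())
    candidates = [n for n in nodes if totals[n] == smallest]
    return rng.choice(candidates)

# Illustrative chain A - B - C; the A-C distance is the product 0.9 * 0.8.
distance = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.72}
main_node = select_main_node(["A", "B", "C"], distance)
```

The random tie-break corresponds to the case where two nodes share the smallest total distance.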

In step S79, the input representative data selection unit 330 selects the next group number and repeats the above processing. The process ends when representative data items have been selected for all group numbers.

Through the above processing, the distances (inter-data similarities) from the target node in the group to all other nodes are calculated, and one of the nodes with the smallest total (multiplication) of inter-node distances is made the main node. The input representative data selection unit 330 then treats the range in which the product of the distances (inter-data similarities) between the main node and the other nodes is equal to or greater than the threshold Th3 as one group. Where the product is less than the threshold Th3, the input representative data selection unit 330 cuts the topology at the edge (branch) that fell below the threshold. If two nodes share the smallest total distance, one of them may be selected at random.

As described above, when the distance between nodes is equal to or greater than the threshold Th3, restoration accuracy degrades because restorations are chained. Dividing the nodes (topology) therefore makes it possible to select representative data items that can ensure restoration accuracy.

FIG. 17 is a diagram illustrating an example of node division in the representative data selection process. It shows an example of calculating the representative data item from the inter-node data similarities shown in FIG. 16.

The input representative data selection unit 330 calculates the product of the similarities (distances) between the main node and each target node (Step 1). It then compares the distance from the main node (the product of the similarities) with the threshold (0.7) and cuts the topology at the edge that falls below the threshold, dividing it into two (Step 2).

By repeating the above processing, a representative data item (main node) can be selected from the entries within each group number.
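Steps 1 and 2 above can be sketched for a tree-shaped topology as follows. This is an illustration under stated assumptions: the topology has no cycles, the distance from the main node is the running product of the similarities along the path, and an edge is cut when that product falls below the threshold (0.7 in the example above). The function name `split_topology` and the similarity values are hypothetical and are not taken from FIG. 17.

```python
def split_topology(edges, main_node, threshold=0.7):
    """Split a tree-shaped topology where the path product drops below `threshold`.

    edges: {node: {neighbor: similarity}}
    Returns (kept, cut_off): nodes that stay with the main node, and the rest.
    """
    reach = {main_node: 1.0}  # product of similarities from the main node
    stack = [main_node]
    while stack:
        node = stack.pop()
        for neighbor, similarity in edges[node].items():
            if neighbor in reach:
                continue
            product = reach[node] * similarity
            if product >= threshold:
                reach[neighbor] = product
                stack.append(neighbor)
            # Otherwise the edge is cut and the neighbor is not reached from here.
    kept = set(reach)
    return kept, set(edges) - kept

# Illustrative linear chain A - B - C - D.
chain = {
    "A": {"B": 0.9},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"B": 0.8, "D": 0.9},
    "D": {"C": 0.9},
}
kept, cut_off = split_topology(chain, "A")
```

Here the products from A are 0.9 for B, 0.72 for C, and 0.648 for D, so the topology is cut at the edge between C and D. The cut-off part would then go through the same selection again as its own topology.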

FIG. 18 is a diagram illustrating another example of node division in the representative data selection process. It shows an example of calculating the representative data item from the inter-node similarities shown in FIG. 16. In this example, the nodes are not arranged in a line as they are in FIG. 17.

In this case too, as in FIG. 17, the similarity between the data of adjacent nodes is calculated, and the similarity between diagonally opposite nodes is additionally calculated by multiplication (Step 1). The input representative data selection unit 330 then compares the distance from the main node (the product of the similarities) with the threshold (0.7) and cuts the topology at the edge that falls below the threshold, dividing it into two (Step 2).

FIG. 21 is a flowchart illustrating an example of the data trend analysis process. This process is performed in step S3 of FIG. 3.

The input representative data selection unit 330 inputs the items of the independent data item table 240 and the representative data table 270, together with the data (values) corresponding to those items, to the data trend analysis server 4 and instructs it to perform trend analysis (S81).

In the data trend analysis server 4, the data trend analysis unit 41 performs trend analysis on the items and values received from the narrowing and restoration server 3 and returns the result of the trend analysis to the narrowing and restoration server 3.

Through the above processing, trend analysis is performed on the representative data items of each group # 257 and on the data of the independent data item table 240, which holds the items that could not be grouped.
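The narrowing of the input can be pictured as a simple column selection. The sketch below assumes the analysis target data is held as a mapping from item name to a list of values; the function name `build_analysis_input` and the sample items are hypothetical, and the actual interface to the data trend analysis server 4 is not described in this section.

```python
def build_analysis_input(table, independent_items, representative_items):
    """Keep only the columns that are sent for trend analysis.

    table: {item name: list of values}
    """
    selected = list(independent_items) + [
        item for item in representative_items if item not in independent_items
    ]
    return {item: table[item] for item in selected}

table = {
    "total assets": [110.0, 220.0],
    "total assets, previous period": [100.0, 200.0],
    "total liabilities and net assets": [110.0, 220.0],
    "number of employees": [12, 30],
}
analysis_input = build_analysis_input(
    table,
    independent_items=["number of employees"],
    representative_items=["total assets, previous period"],
)
```

Only two of the four columns are passed on; the two omitted items are the ones later recovered by the restoration process.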

FIGS. 23 and 24 are flowcharts illustrating an example of the data restoration process. This process is performed in step S4 of FIG. 3 and uses the restoration table 280 to restore data other than the representative data.

The narrowing and restoration server 3 starts the flowchart of FIG. 23 after receiving the result of the trend analysis from the data trend analysis server 4. When the narrowing and restoration server 3 receives the result of the trend analysis from the data trend analysis server 4, the input representative data selection unit 330 stores it in the data trend analysis result table 260.

In step S86 of FIG. 23, the data restoration unit 350 repeats the processing up to step S88 for every condition item in the data trend analysis result table 260. In this step, the data restoration unit 350 acquires the data items of the condition items (262, 263) in the data trend analysis result table 260 as input items.

In step S87, the data restoration unit 350 executes the restoration calculation process shown in FIG. 24 on the acquired input items, referring to the restoration table 280.

In step S88, the data restoration unit 350 selects the next condition from the result table 260 and repeats the above processing. The process ends when the restoration calculation process has been completed for all conditions.

In step S91 of FIG. 24, the data restoration unit 350 compares the input item acquired in step S86 with the data item (x) 282 of the restoration table 280 and searches for a matching item name.

In step S92, the data restoration unit 350 determines whether a data item (x) 282 matching the input item exists. If a matching data item (x) 282 exists, the process proceeds to step S93; otherwise, the process ends and moves on to the next condition.

In step S93, the data restoration unit 350 uses the regression equation (x→y restoration) 284 of the restoration table 280 to calculate the data item (y) and its value y from the data corresponding to the data item (x) 282. This processing is executed for every data item (x) 282 that matches the input item.

In step S94, the processing up to step S97 is repeated for each data item (y) restored in step S93. The data restoration unit 350 selects one of the restored data items (y).

In step S95, the data restoration unit 350 adds the selected data item (y) to the final result table 290. Specifically, it adds the data item (y) to the condition (292, 293) and identification number 291 of the final result table 290 that correspond to the condition and identification number 261 currently selected in the result table 260.

In step S96, the data restoration unit 350 acquires the data corresponding to the restored data item (y) from the analysis target data table 220, performs a predetermined calculation on it, and adds the result to the condition (292, 293) and identification number 291 of the final result table 290 used in step S95. The example of the final result table 290 in FIG. 25 shows a case in which the range of the values corresponding to the data item (y) is calculated.

In step S97, the data restoration unit 350 selects the next data item (y) and repeats the above processing. When the restoration calculation process has been completed for all data items (y), the process ends and returns to the process of FIG. 23.

Through the above processing, the data items (y) belonging to the same group, together with their value ranges, are restored from the representative data item that matches the data item of the condition (262, 263) in the data trend analysis result table 260.
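The lookup and restoration in steps S91 to S96 can be sketched as follows. The sketch assumes each row of the restoration table carries a linear regression (slope and intercept) from data item x to data item y, and that the "predetermined calculation" is the value range, as in the FIG. 25 example. The row layout, the function name `restore_items`, and the numbers are hypothetical.

```python
# Hypothetical restoration-table rows: (data item x, data item y, slope, intercept)
restoration_table = [
    ("total assets, previous period", "total assets", 1.1, 0.0),
    ("total assets, previous period", "total liabilities and net assets", 1.1, 0.0),
    ("sales", "cost of sales", 0.6, 5.0),
]

def restore_items(input_item, x_values, table):
    """Restore every data item y reachable from `input_item` and its value range."""
    restored = {}
    for x, y, slope, intercept in table:
        if x != input_item:
            continue  # no matching data item (x): nothing to restore from this row
        y_values = [slope * v + intercept for v in x_values]
        restored[y] = (min(y_values), max(y_values))
    return restored

restored = restore_items(
    "total assets, previous period", [100.0, 200.0], restoration_table
)
```

An input item with no matching row yields an empty result, which corresponds to moving on to the next condition in step S92.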

When the above processing ends, the data restoration unit 350 outputs the display screen 620 of the final result table 290 to the user terminal 6. Because the input data items and values that were consolidated into the representative data item are restored after the trend analysis, the user (analyst) of the user terminal 6 can easily determine whether the data items important to the analyst are included.

On the display screen 620 of the final result table 290 in FIG. 25, the restoration results of the data items related to the representative data item (the item names and data ranges, shown under condition 1 (292)) are added to the result of the data trend analysis based on the representative data items shown in FIG. 22.

In the illustrated example, for the representative data item "total assets, previous period", the data items "total assets" and "total liabilities and net assets" belonging to the same group # 257 are restored, and the data (numerical values) are also restored from the restored data items.

By referring to the display screen 620 of the final result table 290, the analyst using the user terminal 6 can, for example, select unnecessary data items based on the restoration result of the grouping performed by the narrowing and restoration server 3 and on the result of the data trend analysis.

<Summary>
In this embodiment, when related data items are selected, similar data items are narrowed down by comparing both the natural-language names of the data items and the data (numerical data) corresponding to the data items. This makes it possible to group data items whose names are similar and whose values are similar. It also becomes possible to restore the excluded data items from the data of the representative data item and to present the restoration result to the analyst.
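The two-sided comparison of names and values can be sketched as follows. This section does not state which similarity measures are used, so the choices here are assumptions made only for illustration: a character-level ratio from Python's `difflib` for the item names, the absolute Pearson correlation for the values, and arbitrary thresholds. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import fmean

def name_similarity(a, b):
    """Character-level similarity of two item names, in the range 0 to 1."""
    return SequenceMatcher(None, a, b).ratio()

def value_similarity(xs, ys):
    """Absolute Pearson correlation of two equally long value lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5

def is_related(name_a, xs, name_b, ys, name_threshold=0.5, value_threshold=0.7):
    """Treat two items as related only if both names and values are similar."""
    return (
        name_similarity(name_a, name_b) >= name_threshold
        and value_similarity(xs, ys) >= value_threshold
    )

related = is_related(
    "total assets", [1.0, 2.0, 3.0, 4.0],
    "total assets previous period", [1.1, 2.0, 3.2, 3.9],
)
unrelated = is_related("sales", [1.0, 2.0, 3.0, 4.0], "headcount", [4.0, 1.0, 3.0, 2.0])
```

Requiring both conditions keeps items with similar names but unrelated values, or similar values but unrelated names, out of the same group.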

By narrowing the data items input to the data trend analysis server 4 down to the representative data item of each group, the narrowing and restoration server 3 can reduce the processing load of the data trend analysis while maintaining its accuracy. In other words, in this embodiment, narrowing down to the representative data items shortens the time required for the data trend analysis while yielding a result equivalent to the one obtained by inputting all data items.

When selecting a representative data item, the narrowing and restoration server 3 selects a data item from whose data the excluded data items can be restored with sufficient accuracy. This makes it possible to provide the relationship between the representative data items and the excluded data items after the result of the data trend analysis has been obtained. The analyst using the user terminal 6 can see which data items were consolidated and can confirm whether the data items important to the analyst were obtained as a result of the trend analysis. In this embodiment, the analyst using the user terminal 6 can also select unnecessary data items in the post-processing of the trend analysis.

The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment has been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted, either alone or in combination.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be implemented in hardware, for example by designing them as integrated circuits. The configurations, functions, and the like described above may also be implemented in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be stored in a memory, in a recording device such as a hard disk or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all of the components may be considered to be connected to one another.

1 Data trend analysis system
2 Database server
3 Narrowing and restoration server
4 Data trend analysis server
6 User terminals (6-1 to 6-3)
20 Database
220 Analysis target data table
240 Independent data item table
250 Summary table
260 Data trend analysis result table
270 Representative data table
280 Restoration table
290 Final result table
310 Inter-data relation analysis unit
320 Group generation unit
330 Input representative data selection unit
340 Restoration formula generation unit
350 Data restoration unit

Claims

A data trend analysis method in which a computer including a processor and a memory performs data trend analysis on a data table having data corresponding to data items, the method comprising:
a first step in which the computer extracts relations between data items for a plurality of the data tables and generates related information of the data items;
a second step in which the computer generates restoration information of the data items based on the related information;
a third step in which the computer generates groups of the data items based on the related information;
a fourth step in which the computer selects a representative data item from the data items in each group based on the related information and the restoration information;
a fifth step in which the computer performs trend analysis of the data corresponding to the representative data item;
a sixth step in which the computer restores the data items of the group to which the representative data item belongs, based on the restoration information; and
a seventh step in which the computer outputs the result of the trend analysis of the data corresponding to the representative data item and the restoration result of the data items.

The data trend analysis method according to claim 1,
wherein the first step extracts pairs of related data items based on the similarity of the natural-language item names of the data items and the similarity of the values of the data corresponding to the data items, and generates the related information for each pair of data items.

The data trend analysis method according to claim 2,
wherein the second step generates, for the first data item and the second data item constituting each pair of data items, the restoration information including a first restoration formula for calculating the second data item from the data of the first data item and a second restoration formula for calculating the first data item from the data of the second data item.

The data trend analysis method according to claim 3,
wherein the fourth step selects, for each group, as the representative data item, the data item of the pair that can ensure restoration accuracy when a data item is restored from the restoration information.

The data trend analysis method according to claim 3,
wherein the fourth step divides the group when the restoration accuracy obtained in restoring a data item of the pair from the restoration information falls below a predetermined threshold.

The data trend analysis method according to claim 1,
wherein the seventh step outputs the restoration result of the data items in association with the representative data item included in the result of the trend analysis.

A data trend analysis system comprising a narrowing and restoration server including a processor and a memory, and a data trend analysis server including a processor and a memory, the system performing data trend analysis on a data table having data corresponding to data items,
wherein the narrowing and restoration server comprises:
an inter-data relation analysis unit that extracts relations between data items for a plurality of the data tables and generates related information of the data items;
a restoration formula generation unit that generates restoration information of the data items based on the related information;
a group generation unit that generates groups of the data items based on the related information;
a representative data item selection unit that selects a representative data item from the data items in each group based on the related information and the restoration information, and causes the data trend analysis server to perform trend analysis of the data corresponding to the representative data item; and
a data restoration unit that receives the result of the trend analysis of the data corresponding to the representative data item from the data trend analysis server, restores the data items of the group to which the representative data item belongs based on the restoration information, and outputs the restoration result of the data items and the result of the trend analysis.

The data trend analysis system according to claim 7,
wherein the inter-data relation analysis unit extracts pairs of related data items based on the similarity of the natural-language item names of the data items and the similarity of the values of the data corresponding to the data items, and generates the related information for each pair of data items.

The data trend analysis system according to claim 8,
wherein the restoration formula generation unit generates, for the first data item and the second data item constituting each pair of data items, the restoration information including a first restoration formula for calculating the second data item from the data of the first data item and a second restoration formula for calculating the first data item from the data of the second data item.

The data trend analysis system according to claim 9,
wherein the representative data item selection unit selects, for each group, as the representative data item, the data item of the pair that can ensure restoration accuracy when a data item is restored from the restoration information.

The data trend analysis system according to claim 9,
wherein the representative data item selection unit divides the group when the restoration accuracy obtained in restoring a data item of the pair from the restoration information falls below a predetermined threshold.

The data trend analysis system according to claim 7,
wherein the data restoration unit outputs the restoration result of the data items in association with the representative data item included in the result of the trend analysis.

A narrowing and restoring device including a processor and a memory, the device comprising:
an inter-data relation analysis unit that, for a plurality of data tables having data corresponding to data items, extracts relations between the data items and generates related information of the data items;
a restoration formula generation unit that generates restoration information of the data items based on the related information;
a group generation unit that generates groups of the data items based on the related information;
a representative data item selection unit that selects a representative data item from the data items in each group based on the related information and the restoration information, and requests an external party to perform trend analysis of the data corresponding to the representative data item; and
a data restoration unit that, upon receiving the result of the trend analysis of the data corresponding to the representative data item, restores the data items of the group to which the representative data item belongs based on the restoration information, and outputs the restoration result of the data items and the result of the trend analysis.

The narrowing and restoring device according to claim 13,
wherein the inter-data relation analysis unit extracts pairs of related data items based on the similarity of the natural-language item names of the data items and the similarity of the values of the data corresponding to the data items, and generates the related information for each pair of data items.

The narrowing and restoring device according to claim 14,
wherein the restoration formula generation unit generates, for the first data item and the second data item constituting each pair of data items, the restoration information including a first restoration formula for calculating the second data item from the data of the first data item and a second restoration formula for calculating the first data item from the data of the second data item.