JP7242343B2

JP7242343B2 - Analysis device and analysis method

Info

Publication number: JP7242343B2
Application number: JP2019031561A
Authority: JP
Inventors: 和広斉藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-02-25
Filing date: 2019-02-25
Publication date: 2023-03-20
Anticipated expiration: 2039-02-25
Also published as: JP2020135717A

Description

The present invention relates to an analysis device and an analysis method for analyzing the relationships among a plurality of tables stored in a database.

Conventionally, to use a database efficiently, two highly related tables are merged into one, or the result of joining two highly related tables is preferentially cached in memory.

For identifying highly related tables, Patent Document 1, for example, discloses an analysis method that analyzes the relationships among a plurality of tables by analyzing the text included in queries executed in a database system and identifying the join conditions of the tables.

Patent Document 1: JP 2017-021463 A

When joining three or more tables, a database system determines a join order based on the sizes of the tables and the cost of the binary operation that joins two tables, and then joins the three or more tables by repeatedly joining two tables according to that join order.

In contrast, the analysis method disclosed in Patent Document 1 calculates the degree of association between tables from the relationships between columns specified in the text of the query, so it can identify only one-to-one relationships between tables. As a result, the degree of association between tables identified from the query text does not necessarily match the degree of association between tables based on the join order determined by the database system. Accurate identification of highly related tables is therefore required.

The present invention has been made in view of these points, and its object is to provide an analysis device and an analysis method capable of accurately identifying highly related tables.

An analysis device according to a first aspect of the present invention includes: an execution plan acquisition unit that acquires an execution plan that a database system generated, for a query executed in the database system, in order to perform the operations corresponding to the query; an analysis unit that, based on the execution plan acquired by the execution plan acquisition unit, identifies a binary operation, which is an operation for joining tables performed in the database system in accordance with the execution plan, and identifies, as operation-related information related to the binary operation, a first table group including one or more tables and a second table group including one or more tables that are joined by the binary operation; and an output unit that outputs the operation-related information identified by the analysis unit.

The execution plan may have a tree-structured data structure that defines the processing order of the operations performed in the database system, and the analysis unit may identify the binary operation by analyzing the data structure defined in the execution plan acquired by the execution plan acquisition unit.

The analysis unit may identify, as the operation-related information, information related to the processing load incurred when the first table group and the second table group are joined in the database system.

The execution plan acquisition unit may acquire a plurality of execution plans corresponding respectively to a plurality of queries, and the analysis unit may identify, as the operation-related information, the binary operations performed in each of the plurality of execution plans acquired by the execution plan acquisition unit. The analysis device may further include a statistical information generation unit that generates, as statistical information on binary operations that join the same table groups, information indicating a statistical value of the processing load, and the output unit may output the statistical information generated by the statistical information generation unit.

The execution plan acquisition unit may acquire a plurality of execution plans corresponding respectively to a plurality of queries, and the analysis unit may identify, as the operation-related information, the binary operations performed in each of the plurality of execution plans acquired by the execution plan acquisition unit. The analysis device may further include a statistical information generation unit that generates, as statistical information on binary operations that join the same table groups, information indicating the number of times the binary operation was performed in the database system, and the output unit may output the statistical information generated by the statistical information generation unit.

An analysis method according to a second aspect of the present invention is executed by a computer and includes the steps of: acquiring an execution plan that a database system generated, for a query executed in the database system, in order to perform the operations corresponding to the query; identifying, based on the acquired execution plan, a binary operation, which is an operation for joining tables performed in the database system in accordance with the execution plan; identifying, as operation-related information related to the identified binary operation, a first table group including one or more tables and a second table group including one or more tables that are joined by the binary operation; and outputting the identified operation-related information.

According to the present invention, highly related tables can be identified accurately.

FIG. 1 is a diagram illustrating an overview of the analysis device according to the present embodiment.
FIG. 2 is a diagram showing the configuration of the analysis device according to the present embodiment.
FIG. 3 is a flowchart showing the flow of processing in the control unit according to the present embodiment.
FIG. 4 is a diagram showing an example (part 1) of a query executed in the database system according to the present embodiment.
FIG. 5 is a diagram showing an example (part 2) of a query executed in the database system according to the present embodiment.
FIG. 6 is a diagram showing the execution plan generated by the database system for the query shown in FIG. 4.
FIG. 7 is a diagram showing the execution plan generated by the database system for the query shown in FIG. 5.
FIG. 8 is a diagram showing the data structure corresponding to the execution plan shown in FIG. 6.
FIG. 9 is a diagram showing the data structure corresponding to the execution plan shown in FIG. 7.
FIG. 10 is a flowchart showing the flow of the execution plan analysis process according to the present embodiment.
FIG. 11 is a diagram showing the operation-related information corresponding to the execution plan shown in FIG. 6.
FIG. 12 is a diagram showing part of the operation-related information and the statistical information corresponding to the execution plan shown in FIG. 6.
FIG. 13 is a diagram showing the operation-related information corresponding to the execution plan shown in FIG. 7.
FIG. 14 is a diagram showing an example in which the statistical information corresponding to the execution plan shown in FIG. 7 is reflected in the statistical information shown in FIG. 12.
FIG. 15 is a diagram showing a virtual integration environment in which two database systems are virtually integrated.
FIG. 16 is a diagram showing the tables used in the virtual integration environment and the capacities of those tables.

[Overview of analysis device 10]
FIG. 1 is a diagram illustrating an overview of the analysis device 10 according to the present embodiment. The analysis device 10 is a computer that analyzes the relationships among a plurality of tables stored in a database.

The analysis device 10 is communicably connected to the database system 1 via a LAN (Local Area Network), the Internet, or the like. The analysis device 10 acquires the execution plans that the database system 1 generated for the queries executed in the database system 1. A query is a text statement written in SQL or the like. An execution plan is information that the database system 1 generates in order to perform the plurality of operations corresponding to a query, and it has a tree-structured data structure that defines the processing order of those operations. Queries and execution plans are contained in, for example, a query log. A query log is information about the queries executed by the database system 1. The query log includes, for example, the execution plan, the query, an ID identifying the query, the time of execution, the processing time, and user identification information identifying the user who executed the query.

Based on an acquired execution plan, the analysis device 10 identifies the binary operations among the operations that the database system 1 performs in accordance with the execution plan. A binary operation is an operation that joins tables. The analysis device 10 identifies, as operation-related information related to the binary operation, a first table group including one or more tables and a second table group including one or more tables that are joined by the identified binary operation, and outputs the operation-related information.

Because the execution plan has a data structure that defines the processing order of a plurality of operations, the analysis device 10 can identify the first table group and the second table group joined by each binary operation based on the processing order of the binary operations. The analysis device 10 can therefore accurately identify highly related tables even when three or more tables are joined.

[Configuration of analysis device 10]
Next, the configuration of the analysis device 10 will be described. FIG. 2 is a diagram showing the configuration of the analysis device 10 according to the present embodiment. As shown in FIG. 2, the analysis device 10 includes a communication unit 11, an operation unit 12, a display unit 13, a storage unit 14, and a control unit 15.

The communication unit 11 is a communication interface for transmitting data to and receiving data from the database system 1. The operation unit 12 is composed of, for example, a keyboard and a mouse, and receives operation input from the user of the analysis device 10. The display unit 13 is composed of, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display. The display unit 13 displays various kinds of information under the control of the control unit 15.

The storage unit 14 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The storage unit 14 stores the programs executed by the control unit 15. For example, the storage unit 14 stores an analysis program that causes the control unit 15 to function as an execution plan acquisition unit 151, an analysis unit 152, a statistical information generation unit 153, and an output unit 154.

The control unit 15 is, for example, a CPU (Central Processing Unit). By executing the analysis program stored in the storage unit 14, the control unit 15 functions as the execution plan acquisition unit 151, the analysis unit 152, the statistical information generation unit 153, and the output unit 154.

FIG. 3 is a flowchart showing the flow of processing in the control unit 15 according to the present embodiment. The functions of the execution plan acquisition unit 151, the analysis unit 152, the statistical information generation unit 153, and the output unit 154 are described below with reference to the flowchart shown in FIG. 3.

The execution plan acquisition unit 151 acquires the execution plan that the database system 1 generated, for a query executed in the database system 1, in order to perform the operations corresponding to that query (S10). The execution plan acquisition unit 151 acquires a plurality of execution plans corresponding respectively to a plurality of queries executed in the database system 1.

For example, when the operation unit 12 receives an operation for acquiring execution plans from the user of the analysis device 10, the execution plan acquisition unit 151 accesses the database system 1 and identifies the query logs containing the queries executed between a predetermined period before the current time and the current time. The execution plan acquisition unit 151 then acquires the execution plans contained in the identified query logs. The execution plan acquisition unit 151 may instead receive from the user a period in which queries were executed and acquire the execution plans corresponding to the queries executed within that period.
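The period-based selection of execution plans described above could look roughly like the following Python sketch. The `QueryLogEntry` record, its field names, and the functions `plans_in_period` and `recent_plans` are hypothetical names introduced only for illustration; the patent does not specify how the query log is stored or accessed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class QueryLogEntry:
    """One query-log record; the field names are illustrative assumptions."""
    query_id: str
    query: str
    execution_plan: str
    executed_at: datetime
    processing_seconds: float
    user_id: str


def plans_in_period(entries: List[QueryLogEntry],
                    start: datetime, end: datetime) -> List[str]:
    """Return the execution plans of queries executed within [start, end]."""
    return [e.execution_plan for e in entries if start <= e.executed_at <= end]


def recent_plans(entries: List[QueryLogEntry],
                 now: datetime, period: timedelta) -> List[str]:
    """Return the plans of queries executed within `period` before `now`."""
    return plans_in_period(entries, now - period, now)


# Invented sample log: one query an hour ago, one three days ago.
now = datetime(2019, 2, 25, 12, 0)
log = [
    QueryLogEntry("q1", "SELECT ...", "plan-1", now - timedelta(hours=1), 0.8, "user-a"),
    QueryLogEntry("q2", "SELECT ...", "plan-2", now - timedelta(days=3), 1.2, "user-b"),
]
print(recent_plans(log, now, timedelta(days=1)))  # ['plan-1']
```

Passing an explicit start and end to `plans_in_period` corresponds to the variant in which the user supplies the period.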

FIGS. 4 and 5 are diagrams showing examples of queries executed in the database system 1 according to the present embodiment. FIG. 6 is a diagram showing the execution plan generated by the database system 1 for the query shown in FIG. 4. FIG. 7 is a diagram showing the execution plan generated by the database system 1 for the query shown in FIG. 5. In the present embodiment, the execution plan acquisition unit 151 is assumed to acquire the execution plans shown in FIGS. 6 and 7.

An execution plan defines the processing order of the operations corresponding to a query executed in the database system 1, and has a tree-structured data structure. FIG. 8 is a diagram showing the data structure corresponding to the execution plan shown in FIG. 6, and FIG. 9 is a diagram showing the data structure corresponding to the execution plan shown in FIG. 7. As shown in FIGS. 8 and 9, each execution plan has a binary tree structure. The binary tree has nodes, which indicate the content of processing, and edges, which indicate the input of data from other nodes. The binary tree shown in FIG. 8 has nodes N1 to N7, and the binary tree shown in FIG. 9 has nodes N11 to N23. The database system 1 performs the operations in order, starting from the lowest-level nodes of the binary trees shown in FIGS. 8 and 9.
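As a rough illustration of such a tree, the sketch below models a plan node in Python and walks the tree so that lower nodes are visited before their parents. The `PlanNode` class, the operation names, and all cost and row values are assumptions made for this example. The tree shape is one layout consistent with the description of FIG. 8 (N1 and N2 are not joins, N3 and N5 are joins), not a reproduction of the figure, and a post-order traversal is only one way to realize a bottom-up processing order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class PlanNode:
    """One node of an execution-plan tree (field names are illustrative)."""
    name: str                      # node label such as "N3"
    operation: str                 # hypothetical operation name
    table: Optional[str] = None    # table read by a leaf node, if any
    cost: float = 0.0              # processing cost reported for the node
    rows: int = 0                  # number of rows the node outputs
    children: List["PlanNode"] = field(default_factory=list)


def bottom_up(node: PlanNode) -> Iterator[PlanNode]:
    """Yield every child before its parent (post-order traversal)."""
    for child in node.children:
        yield from bottom_up(child)
    yield node


def build_example_plan() -> PlanNode:
    """Build a seven-node tree labelled N1..N7; all values are invented."""
    n6 = PlanNode("N6", "scan", table="orders", cost=10.0, rows=1500)
    n7 = PlanNode("N7", "scan", table="customer", cost=5.0, rows=150)
    n5 = PlanNode("N5", "inner join", cost=40.0, rows=1500, children=[n6, n7])
    n4 = PlanNode("N4", "scan", table="lineitem", cost=30.0, rows=6000)
    n3 = PlanNode("N3", "inner join", cost=120.0, rows=6000, children=[n4, n5])
    n2 = PlanNode("N2", "sort", cost=130.0, rows=6000, children=[n3])
    return PlanNode("N1", "limit", cost=131.0, rows=10, children=[n2])


order = [node.name for node in bottom_up(build_example_plan())]
print(order)  # ['N4', 'N6', 'N7', 'N5', 'N3', 'N2', 'N1']
```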

The execution plan also contains information indicating, for each operation, the estimated or actually measured processing cost and number of data rows. For this reason, the binary trees shown in FIGS. 8 and 9 show, to the right of each node, the processing cost (Cost) of performing the processing corresponding to the node and the number of data rows (Rows) output by that processing. Depending on the type of the database system 1, the execution plan may not contain information indicating the estimated or actually measured processing cost and number of data rows for each operation. In that case, the analysis device 10 may estimate the processing cost and the number of data rows by referring to the meta-information of the tables stored in the database system 1.

Next, the analysis unit 152 selects one of the execution plans acquired by the execution plan acquisition unit 151 (S20). In the present embodiment, the description proceeds on the assumption that the execution plan shown in FIG. 6 is selected first.

Next, the analysis unit 152 executes an execution plan analysis process, which is a process for analyzing an execution plan, on the execution plan selected in S20 (S30). By repeating the execution plan analysis process for each selected execution plan, the analysis unit 152 identifies the binary operations performed in the database system 1 for each of the plurality of execution plans acquired by the execution plan acquisition unit 151. The analysis unit 152 identifies the binary operations by analyzing the tree-structured data structure defined in the execution plan. As operation-related information related to an identified binary operation, the analysis unit 152 identifies the first table group and the second table group joined by that binary operation. A table group contains one or more tables; a group containing only one table is also called a table group.

FIG. 10 is a flowchart showing the flow of the execution plan analysis process according to the present embodiment. The execution plan analysis process is described in detail below with reference to FIG. 10.
First, the analysis unit 152 selects nodes in order, starting from the node corresponding to the root of the execution plan (S31). For example, in the binary tree shown in FIG. 8, node N1 is the root node, so the analysis unit 152 selects node N1 first.

Next, the analysis unit 152 determines whether a binary operation, that is, an operation that joins tables, is performed at the selected node (S32). Here, binary operations refer to, for example, join operations and set operations. Join operations include, for example, natural join, inner join, outer join (left/right/full outer join), cross join, and semi-join. Set operations include union (Union and Union all), difference (Difference), intersection (Intersect), and division (Division).
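The check in S32 could be sketched as a lookup against the operation names listed above. The normalized lowercase names and the function `is_binary_operation` are assumptions for this example; real database systems label plan nodes differently (for instance by join algorithm), so an actual implementation would need a mapping for each system.

```python
# Operation names taken from the lists above, normalized to lowercase.
JOIN_OPERATIONS = frozenset({
    "natural join", "inner join",
    "left outer join", "right outer join", "full outer join",
    "cross join", "semi-join",
})
SET_OPERATIONS = frozenset({
    "union", "union all", "difference", "intersect", "division",
})


def is_binary_operation(operation: str) -> bool:
    """Return True if the operation name denotes a join or set operation."""
    # Lowercase the name and collapse runs of whitespace before the lookup.
    normalized = " ".join(operation.lower().split())
    return normalized in JOIN_OPERATIONS or normalized in SET_OPERATIONS


print(is_binary_operation("Inner Join"))   # True
print(is_binary_operation("UNION  ALL"))   # True
print(is_binary_operation("sort"))         # False
```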

When the analysis unit 152 determines that a binary operation is performed at the selected node, it proceeds to S33; when it determines that no binary operation is performed, it returns to S31. In the example shown in FIG. 8, nodes N1 and N2 are not binary operations, so the analysis unit 152 repeats S31 and S32 and then identifies node N3 as a binary operation.

When the analysis unit 152 determines in S32 that a binary operation is performed, it identifies the operation-related information corresponding to that binary operation (S33). The operation-related information consists of the two table groups joined by the identified binary operation and information related to the processing load incurred when the two table groups are joined.

Specifically, the analysis unit 152 scans the subtree, that is, the binary tree consisting of the node corresponding to the binary operation and the nodes below it, and thereby identifies the two table groups to be joined as operation-related information. In the example shown in FIG. 8, after determining that a binary operation is performed at node N3, the analysis unit 152 scans nodes N3 to N7, which form the subtree of node N3, and identifies the lineitem table as the first table group and the orders table and the customer table as the second table group.
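A minimal Python sketch of this subtree scan follows. It assumes that each input of the join node corresponds to one table group and that leaf nodes carry the table name; `PlanNode`, `collect_tables`, and `table_groups` are hypothetical names, and the tree is an invented layout consistent with the N3 example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PlanNode:
    """Plan-tree node; only the fields needed for the scan are modelled."""
    operation: str
    table: Optional[str] = None
    children: List["PlanNode"] = field(default_factory=list)


def collect_tables(node: PlanNode) -> List[str]:
    """Collect the table names found in the subtree rooted at `node`."""
    tables = [node.table] if node.table else []
    for child in node.children:
        tables.extend(collect_tables(child))
    return tables


def table_groups(join_node: PlanNode) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Return the (first, second) table groups joined at `join_node`."""
    if len(join_node.children) != 2:
        raise ValueError("a binary operation must have exactly two inputs")
    first, second = join_node.children
    return tuple(collect_tables(first)), tuple(collect_tables(second))


# Assumed layout: N3 joins lineitem with the result of N5 (orders x customer).
n5 = PlanNode("inner join", children=[
    PlanNode("scan", table="orders"),
    PlanNode("scan", table="customer"),
])
n3 = PlanNode("inner join", children=[PlanNode("scan", table="lineitem"), n5])

print(table_groups(n3))  # (('lineitem',), ('orders', 'customer'))
print(table_groups(n5))  # (('orders',), ('customer',))
```

Because the groups come from the plan tree rather than from the query text, a group can contain several tables, which is what allows joins of three or more tables to be described.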

The analysis unit 152 also identifies, as the information related to the processing load contained in the operation-related information, the output result of the node corresponding to the binary operation and the output results of the two nodes immediately below that node. FIG. 11 is a diagram showing the operation-related information corresponding to the execution plan shown in FIG. 6. As shown in FIG. 11, the operation-related information identified for a binary operation consists of the first table group, the number of input data items for the first table group, the second table group, the number of input data items for the second table group, the join type, the join condition, and the number of output data items. In the example shown in FIG. 8, the analysis unit 152 identifies the operation-related information shown in the first row for the binary operation at node N3. The analysis unit 152 may also identify, as operation-related information, the processing cost corresponding to the node at which the binary operation is performed.
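One possible shape for a row of this operation-related information is sketched below. The `OperationInfo` record mirrors the columns listed for FIG. 11, but its field names, the `PlanNode` class, the join condition string, and all row counts are assumptions for illustration. The table groups are passed in as already resolved.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PlanNode:
    """Plan-tree node reduced to the fields used here (illustrative)."""
    operation: str
    rows: int
    condition: str = ""
    children: List["PlanNode"] = field(default_factory=list)


@dataclass(frozen=True)
class OperationInfo:
    """One row of operation-related information for a binary operation."""
    first_group: Tuple[str, ...]
    first_input_rows: int
    second_group: Tuple[str, ...]
    second_input_rows: int
    join_type: str
    join_condition: str
    output_rows: int


def operation_info(join_node: PlanNode,
                   first_group: Tuple[str, ...],
                   second_group: Tuple[str, ...]) -> OperationInfo:
    """Read the load figures from the join node and its two direct inputs."""
    first_child, second_child = join_node.children
    return OperationInfo(
        first_group=first_group,
        first_input_rows=first_child.rows,    # output of the first input node
        second_group=second_group,
        second_input_rows=second_child.rows,  # output of the second input node
        join_type=join_node.operation,
        join_condition=join_node.condition,
        output_rows=join_node.rows,           # output of the join itself
    )


# Invented figures for a join of lineitem with (orders, customer).
join = PlanNode("inner join", rows=6000, condition="l_orderkey = o_orderkey",
                children=[PlanNode("scan", rows=6000),
                          PlanNode("inner join", rows=1500)])
info = operation_info(join, ("lineitem",), ("orders", "customer"))
print(info)
```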

Next, the statistical information generation unit 153 updates the statistical information, which contains statistical values of the operation-related information (S34). Specifically, as statistical information on binary operations that join the same table groups, the statistical information generation unit 153 generates information indicating a statistical value of the processing load and information indicating the number of times the binary operation was performed in the database system 1. The statistical information is stored in the storage unit 14, and the statistical information generation unit 153 generates these two pieces of information by updating the stored statistical information. FIG. 12 is a diagram showing part of the operation-related information and the statistical information corresponding to the execution plan shown in FIG. 6. The part of the operation-related information shown in FIG. 12 is the information indicating the first table group and the second table group.

The execution count shown in FIG. 12 indicates the number of times the binary operation was performed. The total number of data items shown in FIG. 12 is the sum of the number of input data items for the first table group, the number of input data items for the second table group, and the number of output data items. In the example shown in FIG. 8, the statistical information generation unit 153 generates the statistical information shown in the first row for the binary operation at node N3.
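The update in S34 can be sketched as a dictionary keyed by the pair of table groups. The names `JoinStatistics` and `update_statistics` are hypothetical and the row counts are invented. Two points are assumptions rather than statements of the patent: the key treats (first group, second group) as an ordered pair, and the total is accumulated across executions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

GroupPair = Tuple[Tuple[str, ...], Tuple[str, ...]]


@dataclass
class JoinStatistics:
    """Statistics kept for one pair of joined table groups."""
    executions: int = 0   # number of times the binary operation was performed
    total_rows: int = 0   # accumulated input + input + output data counts


def update_statistics(stats: Dict[GroupPair, JoinStatistics],
                      first_group: Sequence[str],
                      second_group: Sequence[str],
                      first_input_rows: int,
                      second_input_rows: int,
                      output_rows: int) -> JoinStatistics:
    """Add one observed binary operation to the stored statistics."""
    key = (tuple(first_group), tuple(second_group))
    entry = stats.setdefault(key, JoinStatistics())
    entry.executions += 1
    entry.total_rows += first_input_rows + second_input_rows + output_rows
    return entry


stats: Dict[GroupPair, JoinStatistics] = {}
update_statistics(stats, ["lineitem"], ["orders", "customer"], 6000, 1500, 6000)
update_statistics(stats, ["lineitem"], ["orders", "customer"], 6000, 1500, 6000)
update_statistics(stats, ["orders"], ["customer"], 1500, 150, 1500)

for (first, second), entry in stats.items():
    print(first, second, entry.executions, entry.total_rows)
```

Sorting the entries by `executions` or `total_rows` would then surface the table groups that are joined most often or with the heaviest load.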

When the processing of S34 ends, the analysis unit 152 determines whether all nodes have been selected (S35). When the analysis unit 152 determines that all nodes have been selected, it ends the processing of this flowchart; when it determines that not all nodes have been selected, it returns to S31. In the example shown in FIG. 8, after identifying that a binary operation is performed at node N3, the analysis unit 152 identifies that a binary operation is performed at node N5. For the binary operation at node N5, the analysis unit 152 then identifies the operation-related information shown in the second row of FIG. 11, and the statistical information generation unit 153 generates the statistical information shown in the second row of FIG. 12.
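The loop formed by S31, S32, and S35 can be sketched as a walk from the root that reports every node at which a binary operation is found. The pre-order visiting order, the `PlanNode` class, the operation names, and the tree layout are assumptions chosen to match the N1 to N7 example; S33 and S34 are represented only by a comment.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Abbreviated, hypothetical set of operation names treated as binary operations.
BINARY_OPERATIONS = frozenset({"inner join", "left outer join", "union"})


@dataclass
class PlanNode:
    name: str
    operation: str
    children: List["PlanNode"] = field(default_factory=list)


def from_root(node: PlanNode) -> Iterator[PlanNode]:
    """Yield nodes starting at the root, each parent before its children."""
    yield node
    for child in node.children:
        yield from from_root(child)


def find_binary_nodes(root: PlanNode) -> List[str]:
    """Return the names of the nodes at which a binary operation is performed."""
    found = []
    for node in from_root(root):                  # S31, repeated until S35 ends
        if node.operation in BINARY_OPERATIONS:   # S32
            found.append(node.name)               # S33 and S34 would run here
    return found


n5 = PlanNode("N5", "inner join",
              [PlanNode("N6", "scan"), PlanNode("N7", "scan")])
n3 = PlanNode("N3", "inner join", [PlanNode("N4", "scan"), n5])
root = PlanNode("N1", "limit", [PlanNode("N2", "sort", [n3])])

print(find_binary_nodes(root))  # ['N3', 'N5']
```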

Returning to FIG. 3, the analysis unit 152 determines whether all of the execution plans acquired in S10 have been selected (S40). When the analysis unit 152 determines that all execution plans have been selected, it proceeds to S50; when it determines that not all execution plans have been selected, it returns to S20.

After the execution plan analysis process has been executed on the execution plan shown in FIG. 6, the execution plan shown in FIG. 7 has not yet been selected, so the analysis unit 152 returns to S20 and selects the execution plan shown in FIG. 7. The analysis unit 152 then executes the execution plan analysis process on the execution plan shown in FIG. 7. As a result, the analysis unit 152 identifies the operation-related information corresponding to the execution plan shown in FIG. 7, and the statistical information generation unit 153 updates the statistical information based on that operation-related information.

FIG. 13 is a diagram showing the operation-related information corresponding to the execution plan shown in FIG. 7, and FIG. 14 is a diagram showing an example in which the statistical information corresponding to the execution plan shown in FIG. 7 is reflected in the statistical information shown in FIG. 12.

Referring to FIG. 9, which shows the binary tree corresponding to the execution plan shown in FIG. 7, binary operations are performed at nodes N13, N14, N15, N17, and N20. Correspondingly, as shown in FIG. 13, it can be confirmed that tables were joined five times. Furthermore, as shown in FIG. 14, the analysis of the execution plan shown in FIG. 7 shows that the join whose first table group is the lineitem table and whose second table group is the orders table and the customer table was performed twice, and that the join whose first table group is the orders table and whose second table group is the customer table was also performed twice. Because the total number of data items is also large for these joins, it can be confirmed that these table groups are highly related.

The description now returns to FIG. 3. When it is determined in S40 that all execution plans have been selected, the output unit 154 outputs the operation-related information identified by the analysis unit 152 and the statistical information generated by the statistical information generation unit 153 (S50). For example, the output unit 154 outputs, to the display unit 13, information indicating part of the operation-related information (the first table group and the second table group) and the statistical information (the number of executions and the total number of data items) shown in FIG. 14. By checking the information shown in FIG. 14, the user can identify highly related table groups and consider matters such as the placement of the tables.
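As a rough illustration of how statistical information of this kind could be accumulated across execution plans, the following Python sketch counts, for each pair of table groups, the number of executions and the total number of data items. The function names, the dictionary layout, and all row counts are assumptions for illustration only; they are not taken from FIG. 12 to FIG. 14. The sketch also assumes that two binary operations are "the same" when their first and second table groups match in that order.

```python
from collections import defaultdict


def new_statistics():
    """Create an empty statistics table keyed by (first group, second group)."""
    return defaultdict(lambda: {"executions": 0, "data_total": 0})


def update_statistics(stats, operations):
    """Reflect the binary operations of one execution plan in the statistics.

    Each operation is (first_group, second_group, n_first, n_second), where
    n_first and n_second are the numbers of data items in each table group.
    """
    for first, second, n_first, n_second in operations:
        key = (frozenset(first), frozenset(second))
        stats[key]["executions"] += 1
        stats[key]["data_total"] += n_first + n_second
    return stats


# Hypothetical operation-related information for two execution plans.
plan_one = [
    (["orders"], ["customer"], 1500, 150),
    (["lineitem"], ["orders", "customer"], 6000, 1500),
]
plan_two = [
    (["orders"], ["customer"], 1500, 150),
    (["lineitem"], ["orders", "customer"], 6000, 1500),
]

stats = new_statistics()
for plan_operations in (plan_one, plan_two):
    update_statistics(stats, plan_operations)

# List the table-group pairs, most frequently joined first.
ranked = sorted(stats.items(), key=lambda kv: kv[1]["executions"], reverse=True)
for (first, second), values in ranked:
    print(sorted(first), sorted(second), values["executions"], values["data_total"])
```

Sorting by the number of executions is only one possible ordering; the total number of data items could equally be used as the primary key when the processing load matters more than frequency.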

[Table placement example]
An example that makes use of the results shown in FIG. 14 is described below. FIG. 15 is a diagram showing a virtual integration environment in which two database systems are virtually integrated. As shown in FIG. 15, the virtual integration environment includes two database systems 1A and 1B and a data virtualization system 2. The data virtualization system 2 is connected to the two database systems 1A and 1B via a communication network such as a LAN or the Internet.

The data virtualization system 2 has a query input interface and integrates only the schemas of the two database systems 1A and 1B. On receiving a query, the data virtualization system 2 acquires the physical data corresponding to that query from the two database systems 1A and 1B. In the virtual integration environment, when the two table groups to be joined are both held in one database system, the join of the table groups is completed within that database system. In contrast, when the table groups to be joined are stored in different database systems, the data virtualization system 2 must acquire the data corresponding to those table groups via the communication network.

For this reason, table groups that are joined frequently are preferably placed in the same database system. FIG. 16 is a diagram showing the tables used in the virtual integration environment and the capacity of each table. The tables shown in FIG. 16 correspond to the tables shown in FIG. 14 and other figures. The capacity limit of each of the two database systems 1A and 1B is assumed to be 3,000 GB.

In the results shown in FIG. 14, the lineitem table, which is the first table group, is frequently joined with the orders table and the customer table, which are the second table group. These tables are therefore placed in the database system 1A. At this point, the total capacity of the tables placed in the database system 1A reaches 3,000 GB, so the other tables are placed in the database system 1B. In this way, the placement of tables in the virtual integration environment is optimized and performance can be improved.
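The placement reasoning above can be sketched as a simple greedy procedure in Python. This is only an illustrative reading of the example, not a method stated in the embodiment: the function `place_tables`, the table sizes, and the tables `part` and `supplier` are hypothetical (the actual capacities appear in FIG. 16 and are not reproduced here). The only values taken from the text are the 3,000 GB capacity limit and the fact that lineitem, orders, and customer together fill database system 1A.

```python
def place_tables(join_stats, sizes_gb, systems=("1A", "1B"), capacity_gb=3000):
    """Greedily co-locate frequently joined table groups.

    join_stats: list of (set of tables joined together, number of executions).
    sizes_gb:   mapping of table name to its capacity in GB.
    Returns a mapping of table name to database system.
    """
    placement = {}
    used = {system: 0 for system in systems}

    # Most frequently joined groups first, then every table on its own
    # so that tables that never appear in a join are still placed.
    groups = [tables for tables, _ in
              sorted(join_stats, key=lambda s: s[1], reverse=True)]
    groups += [{table} for table in sizes_gb]

    for tables in groups:
        pending = [t for t in tables if t not in placement]
        need = sum(sizes_gb[t] for t in pending)
        target = next(
            (s for s in systems if used[s] + need <= capacity_gb), None)
        if target is None:
            raise ValueError(f"no database system has room for {pending}")
        for t in pending:
            placement[t] = target
        used[target] += need
    return placement


# Hypothetical sizes; the three frequently joined tables total 3,000 GB.
sizes_gb = {"lineitem": 2000, "orders": 800, "customer": 200,
            "part": 500, "supplier": 300}
join_stats = [({"lineitem", "orders", "customer"}, 2),
              ({"orders", "customer"}, 2)]

placement = place_tables(join_stats, sizes_gb)
print(placement)
```

A greedy pass like this does not guarantee the best placement in general; it merely mirrors the order of decisions in the example, where the most frequently joined group is placed first and the remaining tables go wherever capacity is left.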

[Effects of this embodiment]
As described above, the analysis device 10 according to this embodiment identifies binary operations based on the execution plan that the database system 1 generated in order to perform the operation corresponding to a query. The analysis device 10 then identifies, as operation-related information for each identified binary operation, the first table group and the second table group that are joined by that binary operation, and outputs the identified operation-related information. In this way, the analysis device 10 can accurately identify highly related tables even when three or more tables are joined.

In addition, by associating all the tables involved in a binary operation as table groups, the analysis device 10 can exclude table relationships for 1:1 join operations that are specified in the query but are not actually executed in the database system 1. This allows the user to accurately grasp the actual usage and processing status of the database system 1 based on the output of the analysis device 10.

In addition, the user can detect frequently joined table groups based on the output of the analysis device 10. As a result, in an environment in which multiple database systems are linked, as shown in FIG. 14, the user can place frequently joined table groups in the same database system so that the joins of those table groups are completed within a single database system, thereby improving the performance of the multi-database system environment.

The present invention has been described above using an embodiment, but the technical scope of the present invention is not limited to the scope described in that embodiment, and various modifications and changes are possible within the scope of its gist. For example, in the embodiment described above, the analysis device 10 includes the operation unit 12 and the display unit 13, but this is not a limitation. The analysis device 10 need not include the operation unit 12 and the display unit 13. In that case, the execution plan acquisition unit 151 may acquire the execution plans corresponding to queries periodically at a specified time, without receiving from the user the period during which the queries were executed. The execution plan acquisition unit 151 may also acquire execution plans when it detects that the number of query logs accumulated in the database system 1 has increased. In this way, the analysis device 10 can acquire execution plans automatically and can therefore identify binary operations automatically based on those execution plans.
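The variation in which execution plans are acquired when the number of query logs increases could look roughly like the following Python sketch. The class `PlanAcquirer` and the two callbacks `count_logs` and `fetch_plans` are hypothetical stand-ins for whatever interface the database system 1 actually provides; the in-memory list used as a query log exists only to make the example self-contained.

```python
class PlanAcquirer:
    """Acquire execution plans only when the query log has grown."""

    def __init__(self, count_logs, fetch_plans):
        self._count_logs = count_logs      # returns current number of query logs
        self._fetch_plans = fetch_plans    # returns plans for logs [start, end)
        self._last_count = count_logs()

    def poll(self):
        """Return plans for queries logged since the previous poll, if any."""
        current = self._count_logs()
        if current <= self._last_count:
            return []
        plans = self._fetch_plans(self._last_count, current)
        self._last_count = current
        return plans


# Stand-in query log and plan lookup, for illustration only.
query_log = ["q1", "q2"]
acquirer = PlanAcquirer(
    count_logs=lambda: len(query_log),
    fetch_plans=lambda start, end: [f"plan for {q}" for q in query_log[start:end]],
)

first_poll = acquirer.poll()    # nothing new has been logged yet
query_log.append("q3")
second_poll = acquirer.poll()   # picks up only the newly logged query
print(first_poll, second_poll)
```

Calling `poll` from a scheduler at a fixed time would combine this with the periodic acquisition that the text also mentions.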

All or part of the device can also be configured by functionally or physically distributing or integrating it in arbitrary units. New embodiments produced by any combination of multiple embodiments are also included in the embodiments of the present invention. The effects of a new embodiment produced by such a combination include the effects of the original embodiments.

DESCRIPTION OF SYMBOLS
1 ... database system, 2 ... data virtualization system, 10 ... analysis device, 11 ... communication unit, 12 ... operation unit, 13 ... display unit, 14 ... storage unit, 15 ... control unit, 151 ... execution plan acquisition unit, 152 ... analysis unit, 153 ... statistical information generation unit, 154 ... output unit

Claims

An analysis device that analyzes relationships among a plurality of tables stored in a database system, the analysis device comprising:
an execution plan acquisition unit that acquires an execution plan generated for a query executed in the database system so that the database system performs an operation corresponding to the query, the execution plan having a binary tree data structure consisting of nodes each indicating processing content and edges each indicating input of data from another node to a node;
an analysis unit that, based on the nodes included in the execution plan acquired by the execution plan acquisition unit, identifies a node corresponding to a binary operation indicating an operation of joining tables performed in the database system in accordance with the execution plan; that identifies a first table group including one or more tables and a second table group including one or more tables that are joined by the binary operation corresponding to the identified node, by scanning a plurality of lower nodes included in subtrees, each subtree being a lower binary tree that is connected to the identified node via an edge that inputs data to the identified node in the binary tree indicated by the execution plan; and that generates operation-related information that includes information indicating the identified first table group and second table group and that is to be checked by a user of the analysis device; and
an output unit that outputs the operation-related information generated by the analysis unit.

The analysis device according to claim 1, wherein the analysis unit identifies the binary operation by analyzing the data structure defined in the execution plan acquired by the execution plan acquisition unit.

The analysis device according to claim 1 or 2, wherein the analysis unit generates the operation-related information including information on the processing load incurred when the first table group and the second table group are joined in the database system.

The analysis device according to claim 3, wherein
the execution plan acquisition unit acquires a plurality of execution plans, each corresponding to one of a plurality of queries,
the analysis unit identifies the binary operations performed in each of the plurality of execution plans acquired by the execution plan acquisition unit,
the analysis device further comprises a statistical information generation unit that generates, as statistical information for binary operations whose table groups to be joined are the same, information that indicates a statistical value of the processing load and that is to be checked by the user of the analysis device, and
the output unit outputs the statistical information generated by the statistical information generation unit.

The analysis device according to any one of claims 1 to 4, wherein
the execution plan acquisition unit acquires a plurality of execution plans, each corresponding to one of a plurality of queries,
the analysis unit identifies the binary operations performed in each of the plurality of execution plans acquired by the execution plan acquisition unit,
the analysis device further comprises a statistical information generation unit that generates, as statistical information for binary operations whose table groups to be joined are the same, information that indicates the number of times the binary operation was performed in the database system and the total of the number of data items contained in the one or more tables included in the first table group and the number of data items contained in the one or more tables included in the second table group, and that is to be checked by the user of the analysis device, and
the output unit outputs the statistical information generated by the statistical information generation unit.

An analysis method in which a computer executes:
a step of acquiring an execution plan generated for a query executed in a database system so that the database system performs an operation corresponding to the query, the execution plan having a binary tree data structure consisting of nodes each indicating processing content and edges each indicating input of data from another node to a node;
a step of identifying, based on the nodes included in the acquired execution plan, a node corresponding to a binary operation indicating an operation of joining tables performed in the database system in accordance with the execution plan;
a step of identifying a first table group including one or more tables and a second table group including one or more tables that are joined by the binary operation corresponding to the identified node, by scanning a plurality of lower nodes included in subtrees, each subtree being a lower binary tree that is connected to the identified node via an edge that inputs data to the identified node in the binary tree indicated by the execution plan;
a step of generating operation-related information that includes information indicating the identified first table group and second table group and that is to be checked by a user of the computer; and
a step of outputting the generated operation-related information.