JP2015121924A

JP2015121924A - Data management system, server device, control method of server device, and program

Info

Publication number: JP2015121924A
Application number: JP2013265095A
Authority: JP
Inventors: 太田峻輔 (Shunsuke Ota)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-12-24
Filing date: 2013-12-24
Publication date: 2015-07-02

Abstract

PROBLEM TO BE SOLVED: To place multiple pieces of data that are likely to be acquired together in a single database system. SOLUTION: A server device receives, from a client device, a request to process a data group managed across a plurality of database systems, and executes data processing on those database systems. The server device analyzes the condition set in the processing request, which specifies the data group to be processed, and acquires from one of the database systems information identifying where the data matching the analyzed condition is located. Based on the acquired location information, the server device executes the data processing operation on the data group in the database systems.

Description

The present invention relates to a data management system, a server device, a method for controlling the server device, and a program.

In a data management system that handles a large amount of data, the data is often distributed across and managed by a plurality of database systems rather than by a single one. Such an information processing system is called a distributed database system. A distributed database system prevents processing from being concentrated in one place and thereby reduces the processing load on each database system.

Generally, in a distributed database system, how the data is distributed is an important factor in performance. Patent Document 1 proposes a system that distributes data so that the processing load on each database system is equal, taking into account the number of accesses to the data and the resources (for example, the amount of main memory) of each database system.
For example, more data is placed on a database system with more resources than on one with fewer resources. Distributing the data in this way equalizes the processing load and prevents localized performance degradation.

Patent Document 1: JP-A-6-259478

In general, when some processing (sorting, aggregation, etc.) is performed on data obtained from a database system, performance is considered to be better when the database system itself performs that processing.
Suppose, for example, that aggregation is to be performed and the data to be processed is distributed across a plurality of database systems. In this case, the application must connect to each of the databases, acquire the data, and then aggregate the data itself.
This causes significant performance degradation in the information processing system. Therefore, to improve performance in a distributed database system, data that is processed together by a single request must be held in a single database system.

The distribution method described in Patent Document 1 considers only the number of accesses and the resources of the database systems, so data acquired together in one process may end up held across a plurality of database systems, which can degrade performance. However, because the data acquired together in one process varies greatly depending on the client issuing the request, gathering such data into a single database system could be very difficult.
The present invention has been made to solve the above problem, and its object is to provide a mechanism that can place data that is likely to be acquired together in a single database system.

To achieve the above object, the data management system of the present invention has the following configuration.
A data management system including a first server device that receives, from a client device, a request for processing a data group managed by a plurality of database systems and executes the data processing on the plurality of database systems, and a second server device that operates on the data groups of the database systems, wherein the first server device comprises: first analysis means for analyzing a condition set in the processing request for the plurality of database systems received from the client device; acquisition means for acquiring, from one of the database systems, information identifying the location of the data corresponding to the condition analyzed by the first analysis means; and execution means for executing a data processing operation on the data group of each database system based on the identifying information acquired by the acquisition means; one of the database systems comprises: first management means for acquiring, from the first server device, a history of the conditions analyzed by the first analysis means and managing the condition history using a storage device; and second management means for managing, in the storage device, the information identifying the location of the data corresponding to the conditions analyzed by the first analysis means; and the second server device comprises: second analysis means for acquiring the condition history managed by that database system and analyzing, using the location information, whether a plurality of data items that should be processed together for each condition are distributed across the database systems; update means for updating the location information based on the analysis result of the second analysis means when a plurality of data items that should be processed together for each condition are distributed across the database systems; and rearrangement means for performing rearrangement processing, using the location information updated by the update means, so that the data group to be operated on together for each condition is placed in one of the database systems.

According to the present invention, data that is likely to be acquired together can be placed in a single database system.

FIG. 1 is a diagram showing an example of a data management system.
FIG. 2 is a block diagram explaining the functions of the data management system.
FIG. 3 is a diagram showing an example of a processing result from the data usage frequency analysis unit.
FIG. 4 is a diagram showing an example of correlation coefficients calculated by the data grouping processing unit.
FIG. 5 is a diagram showing examples of the results of the grouping, integration, and reintegration processing.
FIG. 6 is a diagram showing an example of a decision tree result.
FIG. 7 is a diagram showing an example schema of the query history table.
FIG. 8 is a diagram showing an example schema of the connection information management table.
FIG. 9 is a flowchart explaining a control method of a server device.
FIG. 10 is a flowchart explaining a control method of a server device.
FIG. 11 is a diagram showing an example of imbalance in the counts in a grouping processing result.
FIG. 12 is a flowchart explaining a control method of a server device.

Next, the best mode for carrying out the present invention will be described with reference to the drawings.
<Description of System Configuration>
[First Embodiment]
FIG. 1 is a diagram showing an example of a data management system according to the present embodiment. In this example, database systems and a WEB server are connected, and a client device (a client PC, hereinafter simply called a client) processes information in a cloud computing environment. The system of the present embodiment is described as an example of a data management system that includes a first server device, which receives from a client device a request for processing a predetermined data group managed by a plurality of database systems and executes the data processing on those database systems, and a second server device, which operates on the data group of each database system. As described later, each database system can be built as an information processing apparatus or server apparatus with hardware and software for storing and managing a data group. To distinguish these from the server devices of the present embodiment, the following description refers to them as database systems when explaining data management processing.

In FIG. 1, the client 1, the WEB server 2, the Batch server 3, and the database systems 4 and 5 are general-purpose computers, each including a CPU, RAM, ROM, HDD, network unit, and the like. The client 1 connects to the WEB server 2 via the network 6 using a browser or a dedicated application. The WEB server 2 hosts a WEB service that complies with a protocol such as HTTP. This WEB service can accept a data acquisition query from the client 1 (for example, "acquire data whose item A value is greater than 10"), acquire the data from the database systems 4 and 5, and return it to the client 1.

FIG. 2 is a block diagram explaining the functions of the data management system shown in FIG. 1. The functions of the processing units of the WEB server 2 and the Batch server 3 are described below.
First, the functions of the processing units of the WEB server 2 are described.
In FIG. 2, the request receiving unit 21 receives a request from the client 1 and passes the query contained in that request to the query recording unit 22 for parsing. A request here is a processing request on a data group, such as deletion, update, or creation. A query is a condition that restricts the data to be processed, such as "the value of item A is greater than 10". For example, if the request is a deletion and the query is as above, only the data whose item A value is greater than 10 is deleted from the database systems 4 and 5.
The query recording unit 22 parses the query and records the result in the query history table 41 of the database system 4, described later. The connection information acquisition unit 23 receives the query analysis result from the query recording unit 22 and acquires, from the connection information management table 42 described later, the information identifying where in the database systems 4 and 5 the data matching the query is stored. The data operation unit 24 connects to the database systems 4 and 5 based on the connection information received from the connection information acquisition unit 23, executes the data processing operation according to the request and the query, and passes the processing result to the response transmission unit 25. The response transmission unit 25 returns the processing result to the client 1.
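As an illustration of the parsing step performed by the query recording unit 22, the following Python sketch splits a simple AND-joined condition string into per-item records of the kind stored in the query history table 41. The query syntax, the `parse_query` function, and the `ConditionRecord` type are hypothetical; the source does not specify a query language, and an in-memory list stands in for the table.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from comparison symbols to the operator names
# used in the query history table 41 ("Eq", "Gt", "Lt").
OPERATORS = {"=": "Eq", ">": "Gt", "<": "Lt"}

# One condition of the form <item> <symbol> <value>, e.g. "total1 > 200".
CONDITION_RE = re.compile(r"\s*(\w+)\s*([=<>])\s*(\S+)\s*")


@dataclass
class ConditionRecord:
    """One row of a (simplified, hypothetical) query history table."""
    query_id: str
    item: str
    operator: str
    value: str


def parse_query(query_id: str, query: str) -> list[ConditionRecord]:
    """Split an AND-joined condition string into per-item history records."""
    records = []
    for part in re.split(r"\s+AND\s+", query.strip(), flags=re.IGNORECASE):
        match = CONDITION_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported condition: {part!r}")
        item, symbol, value = match.groups()
        records.append(ConditionRecord(query_id, item, OPERATORS[symbol], value))
    return records


# An in-memory list stands in for query history table 41.
history: list[ConditionRecord] = []
history.extend(parse_query("q6", "TenantID = T1 AND total1 > 200"))
```

Recording one row per item and operator, rather than the raw query string, is what lets the Batch server later re-evaluate each historical condition against the current data.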

Next, the functions of the processing units of the Batch server 3 are described.
The Batch server 3 performs the various processes needed to achieve an optimal data arrangement. In the present embodiment, the WEB server 2 and the Batch server 3 are shown as separate servers, but they may run on the same server.

First, the query history acquisition unit 31 acquires the query history information (condition history information) that the query recording unit 22 recorded in the query history table 41. The data usage frequency analysis unit 32 then determines, based on the history passed from the query history acquisition unit 31, which data currently in the database matches each query. The result is a map, as shown in FIG. 3, indicating which data (d1 (data 301) to d6 (data 306)) matches each query (q1 (query 311) to q6 (query 316)).
For example, FIG. 3 shows that the data matching query q1 (query 311) is data d1 (data 301) and d3 (data 303).
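A minimal sketch of building the map in FIG. 3, assuming each historical query is stored as a list of (item, operator, value) conditions joined by AND and each data item is a dictionary of item values. The function names and the sample rows are hypothetical.

```python
import operator

# Operator names as recorded in the query history table ("Eq", "Gt", "Lt").
OPS = {"Eq": operator.eq, "Gt": operator.gt, "Lt": operator.lt}


def matches(row: dict, conditions: list[tuple[str, str, object]]) -> bool:
    """True if the row has every item and satisfies every condition (AND)."""
    return all(item in row and OPS[op](row[item], value)
               for item, op, value in conditions)


def usage_map(queries: dict, rows: dict) -> dict[str, set[str]]:
    """Map each query ID to the IDs of the data it currently selects."""
    return {qid: {did for did, row in rows.items() if matches(row, conds)}
            for qid, conds in queries.items()}


# Hypothetical sample data: four data items with a numeric item "A".
rows = {"d1": {"A": 12}, "d2": {"A": 3}, "d3": {"A": 20}, "d4": {"A": 7}}
queries = {"q1": [("A", "Gt", 10)], "q2": [("A", "Lt", 5)]}
usage = usage_map(queries, rows)  # q1 -> {d1, d3}, q2 -> {d2}
```

Because the history is re-evaluated against the data as it exists now, the map reflects the current contents rather than what each query returned at the time it was issued.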

Next, the data grouping processing unit 33 determines which data items tend to be processed together by a single request. Specifically, it computes correlation coefficients between data items, as shown in FIG. 4, using a general data analysis technique such as collaborative filtering.
In FIG. 4, for example, the correlation coefficient between data d1 (data 401) and data d2 (data 412) is 0.1, and that between data d1 (data 401) and data d3 (data 413) is 0.8. A higher correlation coefficient indicates a stronger relationship between the data items.
Based on these correlation coefficients, grouping processing is performed so that data likely to be processed together is placed in the same database system. FIG. 5A shows an example of the grouping result. In this example, data items whose correlation coefficient is 0.7 or higher are grouped together. In the example of FIG. 4, data d3 (data 413) has a correlation coefficient of 0.7 or higher with data d1 (data 401), so the two are placed in group 1 (group 501). Performing this grouping for each data item yields the grouping result shown in FIG. 5A.
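The source names collaborative filtering only in general terms. The sketch below assumes one common item-item variant: each data item is represented by a 0/1 vector recording which queries selected it, the Pearson correlation between those vectors serves as the correlation coefficient, and every data item forms a group with the items correlated with it at or above the threshold (0.7 in the example above). The function names are hypothetical.

```python
from math import sqrt


def correlation(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two 0/1 usage vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def group_by_threshold(usage: dict[str, set[str]],
                       threshold: float = 0.7) -> list[set[str]]:
    """For each datum, group it with every datum correlated >= threshold."""
    query_ids = sorted(usage)
    data_ids = sorted(set().union(*usage.values()))
    # One 0/1 vector per datum: was it selected by query q?
    vectors = {d: [int(d in usage[q]) for q in query_ids] for d in data_ids}
    groups: list[set[str]] = []
    for d in data_ids:
        group = {d} | {e for e in data_ids if e != d
                       and correlation(vectors[d], vectors[e]) >= threshold}
        if group not in groups:  # skip exact duplicates such as {d3, d1}
            groups.append(group)
    return groups
```

Groups built this way can still overlap, which is why a separate integration step merges groups that share data.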

The data integration processing unit 34 integrates the groups created by the data grouping processing unit 33, merging any groups that contain the same data. In the example of FIG. 5A, data d3 (data 5012 and data 5022) belongs to both group 1 (group 501) and group 2 (group 502), so group 1 and group 2 are merged. This processing produces the groups shown in FIG. 5B.
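Merging groups that share any data item, repeatedly, until all groups are disjoint amounts to finding connected components. A minimal sketch, with a hypothetical `integrate` function:

```python
def integrate(groups: list[set[str]]) -> list[set[str]]:
    """Merge groups that share at least one datum until all are disjoint."""
    merged: list[set[str]] = []  # invariant: pairwise-disjoint groups
    for group in groups:
        group = set(group)
        # Absorb every already-merged group that overlaps this one; a group
        # that bridges two merged groups therefore joins all three.
        for m in [m for m in merged if m & group]:
            group |= m
            merged.remove(m)
        merged.append(group)
    return merged
```

For example, `integrate([{"d1", "d3"}, {"d3", "d5"}, {"d2"}])` combines the first two groups through the shared `d3`, mirroring the merge of groups 1 and 2 in FIG. 5A.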

The data reintegration processing unit 35 further combines the groups produced by the data integration processing unit 34, taking the number of data items into account. Specifically, it combines those groups so that the number of groups equals a division count N entered into the system in advance, while keeping the number of data items in each of the N groups as equal as possible. For example, when N is 2, the groups shown in FIG. 5C are finally created. The value of N is assumed to be derived from, for example, the data volume or the processing load.
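The source does not say exactly how the groups are combined to balance the counts. One plausible approach, assumed here, is the greedy largest-first heuristic: sort the groups by size and add each one to whichever of the N partitions is currently smallest. The first N groups placed are the N largest, which is consistent with step S1013, where the N groups with the most data items are selected. The `reintegrate` name is hypothetical.

```python
import heapq


def reintegrate(groups: list[set[str]], n: int) -> list[set[str]]:
    """Combine disjoint groups into n partitions with roughly equal sizes.

    Greedy largest-first heuristic (an assumption, not taken from the
    source): place each group, largest first, into the smallest partition.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    partitions: list[set[str]] = [set() for _ in range(n)]
    heap = [(0, i) for i in range(n)]  # (current data count, partition index)
    for group in sorted(groups, key=len, reverse=True):
        size, i = heapq.heappop(heap)  # currently smallest partition
        partitions[i] |= group
        heapq.heappush(heap, (size + len(group), i))
    return partitions
```

Whole groups are never split, so correlated data stays together even when this leaves the partitions slightly unequal in size.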

The connection information analysis unit 36 determines conditions under which each data item is distributed to the groups obtained by the reintegration processing (for example, data whose item A value is greater than 10 goes to group 1 (group 521), and data whose item A value is 10 or less goes to group 2 (group 522)).
Specifically, the connection information analysis unit 36 uses an algorithm such as a decision tree, one of the data analysis techniques, to obtain the conditions for distributing each data item to the reintegrated groups. FIG. 6 shows an example of a decision tree obtained by this processing.
Items 601 to 605 in FIG. 6 correspond to the items of the data tables 43 and 51 shown in FIG. 2. For example, data whose TimeStamp (item 601) is on or after 2012/7/1 (condition 611) and whose Total (small) (item 605) value is 50 or more (condition 612) is distributed to group 1.
Although a decision tree is used in this embodiment to obtain the conditions, other algorithms may be used instead. After obtaining the distribution conditions, the connection information analysis unit 36 registers them in the connection information management table 42 described later.
In FIG. 2, the data rearrangement processing unit 37 rearranges the data in the database systems 4 and 5 according to the conditions obtained by the connection information analysis unit 36.
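To make the decision-tree step concrete, the sketch below grows a small tree by repeatedly choosing the single numeric split "item >= threshold" that misclassifies the fewest rows, then flattens each leaf into an AND-rule of the kind registered in the connection information management table 42. This is a simplified stand-in for a production decision-tree learner; the misclassification criterion, the `"Ge"`/`"Lt"` operator labels, the depth limit, and the function names are all assumptions.

```python
from collections import Counter


def majority(labels: list[int]) -> int:
    """Most frequent group label."""
    return Counter(labels).most_common(1)[0][0]


def misclassified(labels: list[int]) -> int:
    """Rows that would be wrong if this side predicted its majority label."""
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_split(rows: list[dict], labels: list[int]):
    """Return (item, threshold) for the rule item >= threshold with fewest errors."""
    best = None
    for item in rows[0]:
        values = sorted({r[item] for r in rows})
        for t in values[1:]:  # skip the minimum so both sides are non-empty
            ge = [lab for r, lab in zip(rows, labels) if r[item] >= t]
            lt = [lab for r, lab in zip(rows, labels) if r[item] < t]
            errors = misclassified(ge) + misclassified(lt)
            if best is None or errors < best[0]:
                best = (errors, item, t)
    return None if best is None else best[1:]


def build_rules(rows: list[dict], labels: list[int], path=(), max_depth=3):
    """Grow a small decision tree; return one (conditions, group) rule per leaf."""
    if not labels:
        return []
    if len(set(labels)) == 1 or max_depth == 0:
        return [(list(path), majority(labels))]
    split = best_split(rows, labels)
    if split is None:  # every item has a single distinct value
        return [(list(path), majority(labels))]
    item, t = split
    rules = []
    for op, keep in (("Ge", lambda v: v >= t), ("Lt", lambda v: v < t)):
        side = [(r, lab) for r, lab in zip(rows, labels) if keep(r[item])]
        rules += build_rules([r for r, _ in side], [lab for _, lab in side],
                             path + ((item, op, t),), max_depth - 1)
    return rules
```

With hypothetical rows that use a `month` number in place of TimeStamp, the first rule produced has the same shape as conditions 611 and 612 in FIG. 6, for example `[("month", "Ge", 8), ("total_small", "Ge", 60)]` mapped to group 1.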

Finally, the database systems 4 and 5 are described.
In FIG. 2, the database system 4 consists of a query history table 41, a connection information management table 42, and one or more data tables 43. The database system 5 consists of one or more data tables 51. Each table is described below.

As shown in FIG. 7, the query history table 41 records, for each query ID 701 uniquely assigned to a query, what conditions the query placed on items 711 to 718 of the data tables 43 and 51. For example, query ID q6 (query 707) is a query in which TenantID (item 711) equals T1 and the value of total1 is greater than 200. The entries "Eq", "Gt", and "Lt" in the table are operators meaning "equal to", "greater than", and "less than", respectively. The operators are not limited to these; any operator may be used.

Next, the connection information management table 42 is described with reference to FIG. 8.
FIG. 8A shows the connection information management table 42 before this system rearranges any data.
The connection information management table 42 holds the data used to determine, from the query contained in a request issued by the client 1, which database system holds the data matching that query.
The connection information management table 42 consists of columns 801 to 808, one for each item of the data table 43, and a column 809 holding the connection information for a database system. Before the present invention is applied, as shown in FIG. 8A, the table contains a single row in which every item value is "*", meaning "any condition". For example, even if a query issued by the client 1 specifies a condition on TenantID, that condition is ignored and the database system is selected using the connection information 811.

Next, FIG. 8B shows an example of the connection information management table after the present invention has been applied and the data rearranged.
As described above, the conditions derived from the decision tree obtained by the connection information analysis unit 36 are registered in this table. For example, condition 812 represents conditions 611 and 612 in FIG. 6. If the client 1 sends a data request to acquire data whose TimeStamp is later than 2012/8/1 and whose Total (small) value is 100 or more, the connection information acquisition unit 23 acquires row 812 in FIG. 8B.
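A minimal routing sketch for the lookup performed by the connection information acquisition unit 23. It assumes each table row stores, per item, either "*" (any condition) or a half-open interval [low, high) with None marking an unbounded side, and that a query is routed to every row whose intervals overlap the query's. Treating a strict "greater than" as an inclusive lower bound only widens the result, so routing errs toward contacting an extra database rather than missing data. The row contents and the `db4`/`db5` connection names are hypothetical.

```python
from datetime import date

# A condition is either "*" (any value) or a half-open interval (low, high)
# meaning low <= value < high; None marks an unbounded side.


def overlaps(a: tuple, b: tuple) -> bool:
    """True if the half-open intervals a and b share at least one value."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return ((a_lo is None or b_hi is None or a_lo < b_hi)
            and (b_lo is None or a_hi is None or b_lo < a_hi))


def route(table: list[tuple[dict, str]], query: dict) -> list[str]:
    """Connection info of every row whose conditions the query may satisfy.

    Items the query leaves unconstrained are treated as (None, None).
    """
    targets = []
    for conditions, connection in table:
        if all(cond == "*" or overlaps(cond, query.get(item, (None, None)))
               for item, cond in conditions.items()):
            if connection not in targets:
                targets.append(connection)
    return targets


# Hypothetical rows modelled loosely on FIG. 8B; "db4"/"db5" are made-up
# connection names for database systems 4 and 5.
TABLE = [
    ({"TimeStamp": (date(2012, 7, 1), None), "TotalSmall": (50, None)}, "db4"),
    ({"TimeStamp": (date(2012, 7, 1), None), "TotalSmall": (None, 50)}, "db5"),
    ({"TimeStamp": (None, date(2012, 7, 1)), "TotalSmall": "*"}, "db5"),
]

# TimeStamp later than 2012/8/1 and Total (small) >= 100, as in the text.
print(route(TABLE, {"TimeStamp": (date(2012, 8, 1), None),
                    "TotalSmall": (100, None)}))  # ['db4']
```

A query that constrains nothing overlaps every row and is therefore sent to all database systems, which matches the behaviour of the single all-"*" row in FIG. 8A.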

Finally, the data table 43 is described.
Unlike the query history table 41 and the connection information management table 42, which store management information, the data table 43 holds the actual data. In a distributed database system, as shown in FIG. 2, only this table is spread across the plurality of database systems. The user can specify queries such as those described above for each column (item) of this table.

The processing flow for handling requests from the client 1 in the WEB server 2 and for data rearrangement in the Batch server 3 in the present embodiment is described with reference to FIGS. 9 and 10.

[Processing of WEB Server 2]
FIG. 9 is a flowchart explaining the control method of the server device according to the present embodiment. This example concerns the processing of a request from the client 1. Each step is implemented by the CPU of the WEB server 2 executing a control program (the modules shown in FIG. 2) stored in a storage device. In the following, the modules shown in FIG. 2 are described as the entities performing each step.
In S901, the request receiving unit 21 receives a request from the client 1 and passes its content to the query recording unit 22. In S902, the query recording unit 22 determines whether the request contains a query. If it does, the process proceeds to S903; otherwise, it proceeds to S905.
In S903, the query recording unit 22 parses the query and determines which conditions and operators are applied to each data item. In S904, the query recording unit 22 records the analysis result in the query history table 41.

In S905, the connection information acquisition unit 23 acquires connection information from the connection information management table 42 and, based on the query conditions parsed in S903, determines which database system holds the data matching the query. It then passes the connection information for that database system to the data operation unit 24.

In S906, the data operation unit 24 connects to the database system based on the received connection information, executes the processing specified by the request and the query, and passes the result to the response transmission unit 25. In S907, the response transmission unit 25 returns the processing result to the client 1.

By performing S901 to S907 as described above, the system executes the processing specified by the request and query issued by the client and records the content of that query in the query history table 41.

[Processing of Batch Server 3]
FIG. 10 is a flowchart explaining the control method of the server device according to the present embodiment. This example shows the group integration processing, which associates each data item with one of the groups, performed when the Batch server 3 rearranges data. Each step is implemented by the CPU of the Batch server 3 executing a control program (the modules shown in FIG. 2) stored in a storage device. In the following, the modules shown in FIG. 2 are described as the entities performing each step. It is assumed that the division count N and the correlation coefficient threshold have been entered into the data management system before this flow is executed.
In S1001, the query history acquisition unit 31 acquires a group of query history entries from the query history table 41.
In S1002, the data usage frequency analysis unit 32 takes one query from the history entries acquired in S1001 and determines the data matching that query.

In S1003, the data usage frequency analysis unit 32 checks whether S1002 has been executed for every entry in the query history acquired in S1001. If so, the process proceeds to S1004; otherwise, it returns to S1002.

In S1004, the data grouping processing unit 33 applies an algorithm such as collaborative filtering to the results of S1002 and S1003 to obtain the correlation coefficients between data items. This yields a map of inter-data correlation coefficients such as the one shown in FIG. 4.

In S1005, the data grouping processing unit 33 selects one data item from the map obtained in S1004, detects the data items whose correlation coefficient with it is equal to or greater than the predetermined threshold, and groups them together.

In S1006, the data grouping processing unit 33 determines whether the processing of S1005 has been performed on every data item included in the map obtained in S1004. If it has, the process proceeds to S1007; if not, it returns to S1005. Performing S1005 to S1006 produces a grouping result such as the one shown in FIG. 5A.
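S1005 and S1006 can be sketched as a threshold test over the correlation map. This assumes the map is a dictionary keyed by pairs of data IDs, with a pair possibly stored in either order; the name `group_by_threshold` is hypothetical, and dropping exact duplicate groups is a simplification the text does not state. The resulting groups may overlap, which is what the integration steps resolve.

```python
from typing import Dict, List, Sequence, Set, Tuple


def group_by_threshold(
    items: Sequence[str], corr: Dict[Tuple[str, str], float], threshold: float
) -> List[Set[str]]:
    """For each pivot item, group it with every item whose correlation with the
    pivot is >= threshold (S1005), repeating for all items in the map (S1006)."""

    def c(a: str, b: str) -> float:
        # The map may store a pair in either order; missing pairs count as 0
        return corr.get((a, b), corr.get((b, a), 0.0))

    groups: List[Set[str]] = []
    for pivot in items:
        members = {pivot} | {o for o in items if o != pivot and c(pivot, o) >= threshold}
        if members not in groups:  # simplification: skip exact duplicate groups
            groups.append(members)
    return groups
```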
In S1007, the data integration processing unit 34 selects an arbitrary group from the groups obtained by the data grouping processing unit 33. In S1008, the data integration processing unit 34 selects an arbitrary data item from the group selected in S1007.

In S1009, the data integration processing unit 34 searches for any group, other than the group selected in S1007, that contains the data item selected in S1008. If the data integration processing unit 34 determines that such a group exists, the process proceeds to S1010; if not, it proceeds to S1011.
In S1010, the data integration processing unit 34 integrates the group detected in S1009 and the group selected in S1007.

In S1011, the data integration processing unit 34 checks whether the processing of S1008 to S1010 has been performed on every data item in the group selected in S1007. If so, the process proceeds to S1012; if not, it returns to S1008.

In S1012, the data integration processing unit 34 checks whether the processing of S1007 to S1011 has been performed on all the groups obtained by the data grouping processing unit 33. If so, the process proceeds to S1013; if not, it returns to S1007. Performing S1007 to S1012 produces a grouping result such as the one shown in FIG. 5B.
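As far as the flowchart describes it, S1007 to S1012 keep merging groups that share a data item until each data item belongs to exactly one group, so groups linked through shared items end up combined. A minimal sketch, assuming groups are Python sets of data IDs; `integrate_groups` is a hypothetical name.

```python
from typing import Iterable, List, Set


def integrate_groups(groups: Iterable[Set[str]]) -> List[Set[str]]:
    """Merge any two groups that share a data item (S1009-S1010), repeating
    until no data item appears in more than one group (S1011-S1012)."""
    merged = [set(g) for g in groups]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if merged[i] & merged[j]:
                    merged[i] |= merged.pop(j)  # fold group j into group i
                    changed = True
                    break
            if changed:
                break  # indices shifted after pop; rescan from the start
    return merged
```

The rescan-until-stable loop mirrors the flowchart's structure. A union-find structure would produce the same partition more efficiently on large inputs.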
In S1013, the data reintegration processing unit 35 selects N groups from all the groups obtained by the data integration processing unit 34, in descending order of the number of data items they contain.

In S1014, the data reintegration processing unit 35 selects, from among the groups obtained by the data integration processing unit 34 that were not selected in S1013, the group containing the largest number of data items.
In S1015, the data reintegration processing unit 35 compares the numbers of data items in the N groups selected in S1013. In S1016, based on the comparison in S1015, the data reintegration processing unit 35 selects the group with the fewest data items and integrates the group obtained in S1014 into it.

In S1017, the data reintegration processing unit 35 determines whether the processing of S1014 to S1016 has been performed on all the groups obtained by the data integration processing unit 34. If it has, the process proceeds to S1018; if not, it returns to S1014. Performing S1013 to S1017 produces a grouping result such as the one shown in FIG. 5C.
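S1013 to S1017 amount to a greedy balancing step. The N largest groups seed the N database systems, and each remaining group, largest first, is folded into whichever seed is currently smallest. This is the same idea as the longest-processing-time-first (LPT) scheduling heuristic. A minimal sketch, assuming disjoint groups of data IDs; the name `reintegrate` is hypothetical.

```python
from typing import Iterable, List, Set


def reintegrate(groups: Iterable[Set[int]], n: int) -> List[Set[int]]:
    """Reduce disjoint groups to n groups, one per database system."""
    # S1013: sort by size, largest first, and keep the n largest as the targets
    ordered = sorted((set(g) for g in groups), key=len, reverse=True)
    targets, rest = ordered[:n], ordered[n:]
    # S1014: take the remaining groups from largest to smallest
    for g in rest:
        # S1015-S1016: fold it into whichever target is currently smallest
        min(targets, key=len).update(g)
    return targets
```

With groups of sizes 5, 3, 2, 1 and 1 and N = 2, the two targets end up with 6 data items each.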

In S1018, the connection information analysis unit 36 obtains a decision tree that classifies each data item into the reintegrated group, obtained in S1013 to S1017, to which it belongs. This produces a decision tree such as the one shown in FIG. 6.

In S1019, the connection information analysis unit 36 updates the connection information management table 42 based on the decision tree obtained in S1018. This produces a connection information management table such as the one shown in FIG. 8B. In S1020, the data rearrangement processing unit 37 rearranges the data in the database systems 4 and 5 based on the connection information management table obtained in S1019.
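The text does not give the form of the decision tree in FIG. 6 or of the connection information management table in FIG. 8B. Purely as an illustration, assuming a binary tree that tests one record attribute per node and whose leaves name a database system, S1018 to S1020 could fit together as below: `route` classifies a record, and `plan_relocation` lists the records whose current database differs from the one the updated tree assigns. `Leaf`, `Node`, the attribute `key`, and the names "db4"/"db5" (standing for database systems 4 and 5) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    database: str  # hypothetical database-system name, e.g. "db4"


@dataclass
class Node:
    attribute: str  # record field tested at this node (hypothetical)
    threshold: str  # go left if record[attribute] <= threshold, else right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def route(tree: Tree, record: Dict[str, str]) -> str:
    """Walk the tree until a leaf names the target database system."""
    while isinstance(tree, Node):
        tree = tree.left if record[tree.attribute] <= tree.threshold else tree.right
    return tree.database


def plan_relocation(
    tree: Tree, records: Dict[str, Dict[str, str]], current: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List (record_id, from_db, to_db) for every record that must move."""
    moves = []
    for rid, rec in records.items():
        target = route(tree, rec)
        if current[rid] != target:
            moves.append((rid, current[rid], target))
    return moves
```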
As described above, performing the processing from S1001 to S1020 yields a data arrangement in which data items that tend to be used at the same time are collected in one database system.
[Second Embodiment]
A second embodiment according to the present invention will be described. For the configuration, only the differences from the first embodiment are described below.

In the first embodiment, the grouping process performed by the data grouping processing unit 33 can produce groups whose numbers of data items are highly imbalanced, as shown in FIG. 11A, for example. If the integration process and the reintegration process (with the division number set to 2) are then performed in order, the finally created groups are also highly imbalanced in their numbers of data items, as shown in FIG. 11B.

A specific example of this situation follows. Suppose that the first embodiment of the present invention is applied to a system with the restriction that the data a client 1 can operate on is limited to data associated with the organization (tenant) to which that client 1 belongs. To enforce the restriction that "only the data of the tenant to which the user belongs can be operated on," the request reception unit 21 could automatically append a query string such as "tenantID eq 'XXX'". Here, XXX is an ID (tenant ID) indicating the tenant to which the client 1 belongs; the request reception unit 21 determines this ID from the authentication information of the client 1 and adds it to the query. By having the system side append the tenant ID automatically and forcibly in this way, the data that the client 1 can operate on is limited to the data of its own tenant.
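The automatic tenant restriction can be sketched as a function that appends the tenant condition on the server side. The "eq" syntax resembles an OData-style filter expression, but the actual query language is not specified. The user-to-tenant table stands in for deriving the tenant ID from the client's authentication information, and it and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical stand-in for resolving a tenant ID from the client's credentials
TENANT_BY_USER = {"alice": "XXX", "bob": "YYY"}


def scoped_filter(user: str, client_filter: Optional[str]) -> str:
    """Return the client's filter with the tenant condition forcibly appended."""
    tenant_id = TENANT_BY_USER[user]  # derived server-side, never taken from the request
    # Double any single quote so the ID cannot break out of the string literal
    clause = "tenantID eq '{}'".format(tenant_id.replace("'", "''"))
    if not client_filter:
        return clause
    return f"({client_filter}) and {clause}"
```

Because every query then carries the `tenantID` clause, that clause is also present in every entry of the query history used for grouping.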

When the request reception unit 21 appends such a query condition, a condition on "tenantID" is always recorded in the query history table 41. If the grouping process is performed on a query history table 41 in which "tenantID" always appears, data with the same tenant ID tend to be grouped together. If the number of data items held differs greatly from tenant to tenant, the groups created by this grouping also differ greatly in the number of data items they contain.

In general, one factor that strongly affects the performance of operations such as search and deletion is the total number of data items held in the database system: the more data items there are, the longer search and deletion take. Consequently, if the numbers of data items held by the individual database systems are imbalanced, performance can degrade markedly depending on which database system is being operated on, which leads to a poor user experience. It is generally considered important to provide users with consistent performance under all circumstances. For these reasons, in a distributed database system it is important that data be distributed evenly across the database systems, without imbalance in the number of data items each one holds.

Therefore, the second embodiment aims to correct the difference in the number of data items held by each database system while still collecting data items that tend to be used at the same time into one database system.
In the specific example above, the numbers of data items in the groups produced by the grouping process are compared. If there is a significant difference, the condition common to the data in that group (the tenant ID, in the example above) is deleted, and the grouping process is performed again. In this way, even when a condition that is always added to requests causes a difference in the number of data items, relocation that corrects this difference becomes possible.

[Process of Batch Server 3]
FIG. 12 is a flowchart for explaining a control method of the server apparatus according to the present embodiment. This example shows the processing performed when the Batch server 3 executes data rearrangement. Each step is realized by the CPU of the Batch server 3 executing a control program (the modules shown in FIG. 2) stored in the storage device. In the following description, the modules shown in FIG. 2 are treated as the entities that perform each step. It is also assumed that the threshold α has been input to the data management system by some means before this flow is executed.

In the present embodiment, the correction of the imbalance in the number of data items is realized by the processing from S1201 to S1205 in FIG. 12. This processing is performed by the data grouping processing unit 33 between S1006 and S1007 shown in FIG. 10. The processing of S1001 to S1006 and S1007 to S1020 is the same as described above, so its description is omitted.
In S1201, the data grouping processing unit 33 calculates the total number of data items in all the database systems; this total is denoted by the variable A. In S1202, the data grouping processing unit 33 calculates A/N, the number of data items each database system would hold if the data were distributed evenly across N database systems.

In S1203, the data grouping processing unit 33 determines whether any of the groups obtained in S1005 to S1006 contains at least α × A/N data items, that is, the value obtained in S1202 multiplied by the threshold α. The purpose of this check is to determine whether any group's size deviates greatly from the evenly distributed state, in which each group holds A/N data items. If the data grouping processing unit 33 determines that such a group exists, the process proceeds to S1204; otherwise, it proceeds to S1007.
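S1201 to S1203 reduce to one comparison per group. A minimal sketch, assuming groups are sets of data IDs; the function name is hypothetical. The total A is passed in rather than summed from the groups, because the groups from S1005 to S1006 may overlap and summing their sizes would double-count.

```python
from typing import List, Set


def oversized_groups(
    groups: List[Set[str]], total_items: int, num_databases: int, alpha: float
) -> List[Set[str]]:
    """Return the groups holding at least alpha * A / N data items."""
    even_share = total_items / num_databases  # S1202: A / N
    limit = alpha * even_share  # S1203: value compared with each group's size
    return [g for g in groups if len(g) >= limit]
```

With A = 10, N = 2 and α = 1.5, the limit is 1.5 × 5 = 7.5, so a group of 8 data items is flagged and the flow continues to S1204.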

In S1204, the data grouping processing unit 33 determines, from the query history group acquired in S1001, whether there is a query condition issued in common for all the data items in the group found in S1203. If the data grouping processing unit 33 determines that such a common condition exists, the process proceeds to S1205; otherwise, it proceeds to S1007.
In S1205, the data grouping processing unit 33 deletes the condition obtained in S1204 from the query history group acquired in S1001, and the process proceeds to S1002.
As described above, performing the processing of S1201 to S1205 makes it possible to correct the imbalance in the number of data items.
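S1204 and S1205 can be sketched under two assumptions: each query in the history is stored as a set of condition strings, and `matched[i]` is the set of data items that query `i` returned in S1002. Reading "a query issued in common for all the data items" as a condition that, for every item in the group, appears in at least one query touching that item is also an interpretation, and the function names are hypothetical.

```python
from typing import FrozenSet, List, Set


def common_conditions(
    group: Set[str], history: List[FrozenSet[str]], matched: List[Set[str]]
) -> Set[str]:
    """S1204: conditions shared by the queries touching every item of the group."""
    per_item = []
    for item in group:
        conds: Set[str] = set()
        for query, hits in zip(history, matched):
            if item in hits:
                conds |= query
        per_item.append(conds)
    return set.intersection(*per_item) if per_item else set()


def strip_conditions(
    history: List[FrozenSet[str]], conditions: Set[str]
) -> List[FrozenSet[str]]:
    """S1205: remove the common conditions from every query in the history.
    Grouping is then redone from S1002, where each stripped query is re-evaluated."""
    return [q - conditions for q in history]
```

In the tenant example, the automatically appended `tenantID eq 'XXX'` condition is the one found common to the oversized group, so it is the one removed before regrouping.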

Each process of the present invention can also be realized by executing software (a program), acquired via a network or various storage media, on a processing device (CPU, processor) of a personal computer or the like.

The present invention is not limited to the above embodiments; various modifications (including organic combinations of the embodiments) are possible based on the spirit of the present invention, and these are not excluded from the scope of the present invention.

101 Computer
105 Print device
302 Print management device
304 Device information management device

Claims

A data management system including a first server device that receives, from a client device, processing requests for data groups managed by a plurality of database systems and executes data processing on the plurality of database systems, and a second server device that operates on the data groups of each database system, wherein
the first server device comprises:
first analysis means for analyzing conditions set in the processing requests for the plurality of database systems received from the client device;
acquisition means for acquiring, from any one of the database systems, information identifying the location of data corresponding to the conditions analyzed by the first analysis means; and
execution means for executing a data processing operation on the data group of each database system based on the identifying information acquired by the acquisition means;
one of the database systems comprises:
first management means for acquiring, from the first server device, a history of the conditions analyzed by the first analysis means and managing the condition history using a storage device; and
second management means for managing, in the storage device, the information identifying the location of data corresponding to the conditions analyzed by the first analysis means; and
the second server device comprises:
second analysis means for acquiring the condition history managed by the one database system and analyzing, using the information identifying the location, whether a plurality of data items to be processed simultaneously under each condition are distributed across the database systems;
update means for updating the information identifying the location based on the analysis result of the second analysis means when a plurality of data items to be processed simultaneously under each condition are distributed across the database systems; and
relocation means for performing relocation processing, using the information identifying the location updated by the update means, so that the data group to be operated on simultaneously under each condition is arranged in one of the database systems.

The data management system according to claim 1, wherein the second analysis means includes:
calculation means for obtaining, using the information identifying the location acquired from the one database system, a correlation coefficient for each data item operated on simultaneously under each condition;
grouping means for grouping the data items based on the correlation coefficients calculated by the calculation means; and
integration means for integrating the groups associated with data selected from a specific group into one group.

The data management system according to claim 1, wherein the second server device further comprises:
first determination means for determining whether the data group rearranged by the relocation means is biased toward any one database system;
second determination means for determining, when it is determined that the data group is biased toward one database system, whether a condition common to the grouped data group exists; and
deletion means for deleting the common condition from the condition history when it is determined that a condition common to the grouped data group exists.

A server device that receives, from a client device, processing requests for data groups managed by a plurality of database systems and executes data processing on the plurality of database systems, the server device comprising:
first analysis means for analyzing conditions set in the processing requests for the plurality of database systems received from the client device;
acquisition means for acquiring, from any one of the database systems, information identifying the location of data corresponding to the conditions analyzed by the first analysis means; and
execution means for executing a data processing operation on the data group of each database system based on the identifying information acquired by the acquisition means.

A server device that operates on the data groups of a plurality of database systems that store and manage predetermined data groups, the server device comprising:
second analysis means for acquiring condition history information managed by any one of the database systems and analyzing, using information identifying the location of data, whether a plurality of data items to be processed simultaneously under each condition are distributed across the database systems;
update means for updating the information identifying the location based on the analysis result of the second analysis means when a plurality of data items to be processed simultaneously under each condition are distributed across the database systems; and
relocation means for performing relocation processing, using the information identifying the location updated by the update means, so that the data group to be operated on simultaneously under each condition is arranged in one of the databases.

A control method for a server device that receives, from a client device, processing requests for data groups managed by a plurality of database systems and executes data processing on the plurality of database systems, the method comprising:
a first analysis step of analyzing conditions set in the processing requests for the plurality of database systems received from the client device;
an acquisition step of acquiring, from any one of the database systems, information identifying the location of data corresponding to the conditions analyzed in the first analysis step; and
an execution step of executing a data processing operation on the data group of each database system based on the identifying information acquired in the acquisition step.

A control method for a server device that operates on the data groups of a plurality of database systems that store and manage predetermined data groups, the method comprising:
a second analysis step of acquiring condition history information managed by any one of the database systems and analyzing, using information identifying the location of data, whether a plurality of data items to be processed simultaneously under each condition are distributed across the database systems;
an update step of updating the information identifying the location based on the analysis result of the second analysis step when a plurality of data items to be processed simultaneously under each condition are distributed across the database systems; and
a relocation step of performing relocation processing, using the information identifying the location updated in the update step, so that the data group to be operated on simultaneously under each condition is arranged in one of the database systems.

A program for causing a computer to function as each means of the server device according to claim 4 or 5.