JP2007310592A

JP2007310592A - Data processing system

Info

Publication number: JP2007310592A
Application number: JP2006138114A
Authority: JP
Inventors: Yuzo Ishida; Toshiyuki Koyama; Masaaki Wakai; Michiharu Ibuka; Yoshifumi Morikawa
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2006-05-17
Filing date: 2006-05-17
Publication date: 2007-11-29
Anticipated expiration: 2026-05-17
Also published as: JP4920303B2

Abstract

PROBLEM TO BE SOLVED: To increase search speed by reducing disk I/O in a DB server.
SOLUTION: A data processing system 10 has a DB server 14 and an AP server 12. The DB server 14 has a database management system 28 and a database 30 storing a table. The AP server 12 has a memory 26, a data processing unit 22 that commands the DB server 14 to read data, and a data compression unit 24 that converts the data transmitted from the DB server 14 into tree-structured data and stores it in the memory 26. The data processing unit 22 executes search processing on the tree-structured data.
COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a data processing system, and more particularly to a technique for increasing search speed in a data processing system that includes an AP server and a DB server.

As client-server systems have evolved, so-called three-tier client-server systems have become widespread in order to meet the demand for larger-scale information processing. In addition to the client that displays data, such a system has an AP server that processes data and a DB server that stores data.
To improve processing speed, the load is also distributed by arranging a plurality of AP servers in parallel.
JP 2005-165610 A

An AP server can be built from an inexpensive PC server, so it is relatively easy to raise processing speed by increasing the number of installed servers. A DB server, however, must keep its data synchronized, so it cannot be moved to distributed processing as easily as an AP server.
Database system vendors have, of course, used a variety of techniques to speed up the DB server itself in both software and hardware, and have achieved a certain degree of success, but it cannot be denied that system prices rise accordingly.
Moreover, as long as the scale of the databases handled by client-server systems continues to grow, disk I/O (data read/write) speed is expected to become a barrier sooner or later, and a time will come when improving DB server performance can no longer keep up.

The present invention was devised to solve the above problems of conventional data management systems. Its object is to improve search speed not by relying on higher performance of the DB server itself, but by reproducing the tables of the DB server in the memory of the AP server.

To achieve the above object, the data processing system according to claim 1 is a data processing system including a DB server and an AP server. The DB server includes a database management system and a database storing a table. The AP server includes a memory, data reading means that issues an SQL statement to the DB server to command reading of the data stored in the table, data compression means that converts the data transmitted from the DB server into tree-structured data and stores it in the memory, and means for executing search processing on the tree-structured data.

The data processing system according to claim 2 is the system of claim 1 in which the data compression means executes two processes. In the first process, for at least the highest-level data item in the records transmitted from the DB server, when values are duplicated across records, the data of one record is kept, the data of the other records is deleted, and the lower-level data that was subordinate to the deleted data is associated with the kept data. In the second process, the remaining highest-level data items are joined under a single root to generate tree-structured data.

The data processing system according to claim 3 is the system of claim 2 in which the data reading means specifies one or more data items in the SQL statement and thereby reads the table from the DB server divided into groups, and the data compression means executes the first process group by group.

In the data processing systems according to claims 1 and 2, the table stored in the database of the DB server is reproduced, in compressed form as tree-structured data, in the memory of the AP server, which can be accessed faster than a disk. The AP server can therefore search the required data at high speed without accessing the DB server, and thus without generating disk I/O, which yields a dramatic improvement in processing speed.
In addition, duplicated data-item values are removed in the course of converting the tabular data into tree-structured data, and the total amount of data is compressed. As a result, the required data can be held efficiently even in the relatively small memory of an AP server.

In the data processing system according to claim 3, when the AP server retrieves records from the DB server, it does not receive the entire table at once. Instead, it specifies values of particular data items to receive the records divided into groups, and receives the records of the next group only after compression of the current group is complete. Even a relatively large table can therefore be stored in the memory of the AP server.

FIG. 1 is an overall configuration diagram of a data processing system 10 according to the present invention. The system 10 includes a plurality of AP servers 12, a DB server 14, and a load balancer (load distribution device) 16.
The load balancer 16 and each AP server 12, and each AP server 12 and the DB server 14, are connected by a network.
A large number of client terminals 20 are connected to the AP servers 12 via the load balancer 16 and a network such as an intranet 18 or the Internet.

Each AP server 12 includes a data processing unit 22, a data compression unit 24, and a memory 26.
An OS and an application program dedicated to this system are set up on the hard disk (not shown) of each AP server 12. The data processing unit 22 and the data compression unit 24 are realized by the CPU of the AP server 12 operating according to these programs.

The DB server 14 includes a database management system (DBMS) 28 and a database 30 in which various tables for business processing are stored.
The database management system 28 manages the database 30 and performs input/output and updating of the data stored in the database 30, as well as predetermined calculations.

FIG. 2 shows an example of a table stored in the database 30. The table has the data items date, store Cd (store code), and product Cd (product code).
A conceptual hierarchy exists among these data items: the date is the highest-level item, the store Cd is the middle-level item, and the product Cd is the lowest-level item.

The load balancer 16 distributes the requests transmitted from the client terminals 20 among the AP servers 12 according to the load on each AP server 12.
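The text says only that requests are distributed according to the load on each AP server and does not name a policy. As a minimal sketch, assuming a least-load policy and hypothetical server names and load figures, the choice of a target server could look like this:

```python
def pick_server(loads):
    """Return the name of the AP server with the lowest current load.

    loads: dict mapping a (hypothetical) server name to its current load.
    Least-load selection is an assumption; the source names no policy.
    """
    if not loads:
        raise ValueError("no AP servers available")
    return min(loads, key=loads.get)


# Hypothetical load figures for three AP servers.
loads = {"ap1": 12, "ap2": 4, "ap3": 9}
target = pick_server(loads)
loads[target] += 1  # account for the request just dispatched
```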

Each client terminal 20 is a computer such as a PC, on which a Web browser program and a dedicated application program are set up in addition to the OS.

The processing procedure in the system 10 is described below with reference to the flowcharts of FIGS. 3 and 4.
First, when the data processing unit 22 of the AP server 12 starts up (S10), it issues an SQL statement to the DB server 14 to request data extraction (S12).
At this point the data processing unit 22 generates and sends to the DB server 14 an SQL statement commanding that, for example, the records stored in the table of FIG. 2 be transmitted in units of groups identified by date × store code, sorted in ascending order by date, store code, and product code.
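The group-wise read in S12 can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database as a stand-in for the DB server 14, and the table name `sales`, the column names, the product codes, and the `list_groups` helper are all assumptions. The source does not say how the AP server learns which groups exist.

```python
import sqlite3

# Hypothetical stand-in for the table of FIG. 2 (names and values are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, store_cd TEXT, product_cd TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("20060315", "601", "A002"),
        ("20060315", "600", "A003"),
        ("20060315", "600", "A001"),
        ("20060317", "601", "B001"),
    ],
)


def list_groups(conn):
    """Return the distinct (date, store code) pairs in ascending order."""
    sql = ("SELECT DISTINCT sale_date, store_cd FROM sales "
           "ORDER BY sale_date, store_cd")
    return conn.execute(sql).fetchall()


def fetch_group(conn, sale_date, store_cd):
    """Read one date x store-code group, sorted in ascending order."""
    sql = ("SELECT sale_date, store_cd, product_cd FROM sales "
           "WHERE sale_date = ? AND store_cd = ? "
           "ORDER BY sale_date, store_cd, product_cd")
    return conn.execute(sql, (sale_date, store_cd)).fetchall()


groups = list_groups(conn)
first_group = fetch_group(conn, *groups[0])
```

Each call to `fetch_group` corresponds to one pass through S12, so the AP server holds the raw rows of only one group at a time.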

When the database management system 28 of the DB server 14 transmits the corresponding records in units of date × store code groups, the data processing unit 22 passes them to the data compression unit 24 (S14).
The data compression unit 24 checks whether the dates and store codes of the records contain duplicate values. If they do, it keeps one date and one store code, removes the other copies, and then stores the data in the memory 26 (S16).
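A minimal sketch of the per-group compression in S16, assuming each record arrives as a `(date, store_cd, product_cd)` tuple. The tuple layout, the function name, and the product codes are hypothetical and are not taken from the source.

```python
def compress_group(records):
    """Collapse one date x store-code group into a single compressed entry.

    records: list of (date, store_cd, product_cd) tuples that all share
    the same date and store code, as delivered for one group.
    """
    if not records:
        raise ValueError("a group must contain at least one record")
    date, store_cd, _ = records[0]
    # Verify the precondition rather than silently mixing groups.
    for d, s, _ in records:
        if (d, s) != (date, store_cd):
            raise ValueError("records belong to more than one group")
    # Keep the date and store code once; keep every product code as an array.
    return (date, store_cd, [p for _, _, p in records])


group_a = [
    ("20060315", "600", "A001"),
    ("20060315", "600", "A002"),
    ("20060315", "600", "A003"),
]
compressed = compress_group(group_a)
```

The date and store code survive once per group, while all product codes are retained, so no record content is lost.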

When data compression (deletion of duplicate data) and storage in the memory 26 are complete for one group, the data processing unit 22 issues to the DB server 14 an SQL statement commanding extraction of the records belonging to the next group (date × store code) (S18, S12), and the data compression unit 24 compresses the data and stores it in the memory 26 (S14, S16).

The data processing unit 22 and the data compression unit 24 of the AP server 12 repeat steps S12 to S16 until processing is complete for all groups in the target table (S18).
FIG. 5 shows an image of the data stored in the memory 26. Item (a) corresponds to the data of the group for store code 600 on March 15, 2006. As shown, the date (20060315) and store code (600) are kept only for the first record and are deleted from the other records. At this point, the product codes of the records whose date and store code were deleted are associated, as an array together with the product code of the first record, with the date and store code that were kept.
Item (g) corresponds to the data of the group for store code 601 on March 17, 2006.

Next, the data compression unit 24 gathers the groups that share the same date under a single date, and joins each date under a single root (duplicate dates are deleted at this point).
As a result, as shown in FIG. 6, the records that were stored in the database 30 of the DB server 14 are reproduced as tree-structured data in the memory 26 of the AP server 12 (S20).
In the table of FIG. 2, every record carries its own date and store code. In the tree-structured data shown in FIG. 6, by contrast, the date, which is the highest-level data item, is consolidated with no duplication at all, and the store code below it is consolidated so that no duplicates exist within the same date. A large reduction in data volume is thereby achieved.
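The joining step S20 can be sketched as below. The nested-dictionary representation (root, then date, then store code, then a list of product codes) is an assumption chosen for illustration; the source describes the tree only conceptually.

```python
def build_tree(compressed_groups):
    """Join compressed groups under one root: root -> date -> store -> products.

    compressed_groups: iterable of (date, store_cd, [product_cd, ...]) entries,
    one per date x store-code group.
    """
    root = {}
    for date, store_cd, products in compressed_groups:
        # setdefault creates the date node only once, so duplicate dates vanish.
        stores = root.setdefault(date, {})
        stores.setdefault(store_cd, []).extend(products)
    return root


# Hypothetical compressed groups in the style of FIG. 5.
groups = [
    ("20060315", "600", ["A001", "A002"]),
    ("20060315", "601", ["A001"]),
    ("20060317", "601", ["B001", "B002"]),
]
tree = build_tree(groups)
```

In this sketch the date `20060315` appears once as a key even though two groups carry it, which mirrors the removal of duplicate dates described above.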

When the AP server 12 receives a search request from a client terminal 20 via the load balancer 16 (S22 in FIG. 4), the data processing unit 22 extracts the data matching the search conditions from the tree-structured data formed in the memory 26 (S24) and sends the result to the client terminal 20 (S26).
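A search over the in-memory tree (S24) might look like the following sketch. It assumes a nested-dictionary tree (date, then store code, then a list of product codes) and a hypothetical `search` function with optional equality filters; the source does not specify the form of the search conditions.

```python
def search(tree, date=None, store_cd=None, product_cd=None):
    """Return matching (date, store_cd, product_cd) rows; None means 'any'."""
    results = []
    dates = [date] if date is not None else sorted(tree)
    for d in dates:
        stores = tree.get(d, {})
        store_keys = [store_cd] if store_cd is not None else sorted(stores)
        for s in store_keys:
            for p in stores.get(s, []):
                if product_cd is None or p == product_cd:
                    results.append((d, s, p))
    return results


# Hypothetical tree held in memory on the AP server.
tree = {
    "20060315": {"600": ["A001", "A002"], "601": ["A001"]},
    "20060317": {"601": ["B001", "B002"]},
}
hits = search(tree, date="20060315", product_cd="A001")
```

When a date or store code is given, the lookup goes straight to the matching branch instead of scanning every record, and no request is sent to the DB server.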

In this data processing system 10, the table stored in the database 30 of the DB server 14 is reproduced, in compressed form as tree-structured data, in the memory 26 of the AP server 12, which can be accessed faster than a disk. The data processing unit 22 of the AP server can therefore search the required data at high speed without accessing the DB server 14, and thus without generating disk I/O.

In addition, duplicated data-item values are removed in the course of converting the tabular data into tree-structured data, and the total amount of data is compressed. As a result, the required data can be held efficiently even in the relatively small memory 26 of the AP server 12 (generally about 2 GB).
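The size of the reduction can be illustrated by counting stored field values. This sketch counts values only, not bytes, since actual memory use depends on the representation, and the sample records are hypothetical.

```python
def count_values(records):
    """Compare the number of stored field values in the flat table and the tree.

    records: list of distinct (date, store_cd, product_cd) tuples.
    Returns (flat_count, tree_count).
    """
    flat = 3 * len(records)                    # every row stores all three fields
    dates = {d for d, _, _ in records}         # one node per distinct date
    groups = {(d, s) for d, s, _ in records}   # one node per date x store pair
    tree = len(dates) + len(groups) + len(records)  # all product codes are kept
    return flat, tree


# Hypothetical records: two dates, three date x store groups, five rows.
records = [
    ("20060315", "600", "A001"),
    ("20060315", "600", "A002"),
    ("20060315", "600", "A003"),
    ("20060315", "601", "A001"),
    ("20060317", "601", "B001"),
]
flat_count, tree_count = count_values(records)
```

For this sample the flat table holds 15 values and the tree holds 10. The saving grows as more records share the same date and store code.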

Furthermore, when records are retrieved from the DB server 14, the entire table is not received at once. Instead, values of the date and store code data items are specified so that the records are received divided into groups, and the records of the next group are received only after compression of the current group is complete. Even a relatively large table can therefore be stored in the memory 26 of the AP server 12.
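The memory benefit of group-wise loading can be sketched as follows. For simplicity this illustration simulates the arrival of groups with `itertools.groupby` over an already sorted record stream rather than issuing one SQL statement per group as the system does. The function names and the sample data are assumptions.

```python
from itertools import groupby


def iter_groups(sorted_records):
    """Yield one date x store-code group at a time from sorted records."""
    for key, rows in groupby(sorted_records, key=lambda r: (r[0], r[1])):
        yield key, list(rows)


def load_table(sorted_records):
    """Compress each group as it arrives, so only one raw group is held at once.

    Returns the tree and the size of the largest raw group seen, which bounds
    the number of uncompressed rows held at any moment.
    """
    tree = {}
    largest_raw_group = 0
    for (date, store_cd), rows in iter_groups(sorted_records):
        largest_raw_group = max(largest_raw_group, len(rows))
        tree.setdefault(date, {})[store_cd] = [r[2] for r in rows]
    return tree, largest_raw_group


# Hypothetical records, already sorted by date, store code, and product code.
records = [
    ("20060315", "600", "A001"),
    ("20060315", "600", "A002"),
    ("20060315", "600", "A003"),
    ("20060315", "601", "A001"),
    ("20060317", "601", "B001"),
]
tree, peak = load_table(iter(records))
```

Here `peak` is the largest number of uncompressed rows held at once, which stays at the size of the biggest group regardless of how many groups the table contains.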

FIG. 1 is an overall configuration diagram of a data processing system according to the present invention.
FIG. 2 is a diagram showing an example of a table stored in the database of the DB server.
FIG. 3 is a flowchart showing a processing procedure in this system.
FIG. 4 is a flowchart showing a processing procedure in this system.
FIG. 5 is an image diagram showing the group data stored in the memory.
FIG. 6 is an image diagram showing the tree data structure formed in the memory.

Explanation of symbols

10 Data processing system
12 AP server
14 DB server
16 Load balancer
18 Intranet
20 Client terminal
22 Data processing unit
24 Data compression unit
26 Memory
28 Database management system
30 Database

Claims

A data processing system comprising a DB server and an AP server, wherein
the DB server comprises a database management system and a database storing a table, and
the AP server comprises: a memory; data reading means for issuing an SQL statement to the DB server to command reading of the data stored in the table; data compression means for converting the data transmitted from the DB server into tree-structured data and storing it in the memory; and means for executing search processing on the tree-structured data.

The data processing system according to claim 1, wherein the data compression means executes:
a first process in which, for at least the highest-level data item in the records transmitted from the DB server, when values are duplicated across records, the data of one record is kept, the data of the other records is deleted, and the lower-level data that was subordinate to the deleted data is associated with the kept data; and
a second process in which the remaining highest-level data items are joined under a single root to generate tree-structured data.

The data processing system according to claim 2, wherein
the data reading means executes a process of reading the table from the DB server divided into groups by specifying one or more data items in the SQL statement, and
the data compression means executes the first process in units of groups.