JPH10269248A

JPH10269248A - Method for random extraction of data in data base processing system, and data base processing system based upon the same

Info

Publication number: JPH10269248A
Application number: JP10026493A
Authority: JP
Inventors: Kazutomo Ushijima; 一智牛嶋; Shinji Fujiwara; 真二藤原; Kazuo Masai; 一夫正井; Yori Takahashi; ヨリ高橋; Itaru Nishizawa; 格西澤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-01-24
Filing date: 1998-01-23
Publication date: 1998-10-09

Abstract

PROBLEM TO BE SOLVED: To improve the throughput of random extraction processing in a database processing system. SOLUTION: A query issuing process 2 allows queries that include random extraction to be issued, and a query conversion process 8 changes the order in which the random extraction and the other query operations are applied, taking the extraction unit of the random extraction into account. In addition, a record management process 4 reduces random access to the secondary storage device. Because the query conversion takes the extraction unit into account, it also applies to queries that include aggregation, so a wider range of queries can be made more efficient. Reducing random access to the secondary storage device improves efficiency further.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a query processing method for relational databases, and in particular to a random extraction processing method for efficiently executing queries that include random extraction on large-scale databases.

[0002]

[Prior Art] In recent years, with the spread of in-house information processing systems, a wide variety of business data such as transaction and customer information has come to be accumulated in databases, and the range of information available through databases is expanding rapidly. Demand is therefore growing for data mining, which analyzes the large-scale data accumulated in a database and extracts its features and regularities in order to expand business opportunities and improve operational efficiency. In data mining, analyzing and extracting the features and regularities of large-scale data from various viewpoints generally requires issuing queries repeatedly while varying the combination of data items and the condition settings. However, as the size of the data accumulated in databases grows, the time required for a single query increases, and it is becoming difficult to extract features and regularities efficiently.

[0003] One technique for improving query responsiveness is the data cube method disclosed on pp. 205-216 of the proceedings of the "ACM SIGMOD International Conference on Management of Data (SIGMOD'96)" (published by ACM Press). In this method, queries that are expected to be issued are processed in advance, before any query is accepted; when a query that has already been processed is issued, the stored result is returned without actually executing the query. This approach, however, requires a large amount of storage to hold the precomputed results, and because the range of queries that can be covered by advance processing is limited, a query whose result has not been prepared still takes a long time to process. On the other hand, when computing feature values or extracting regularities from large-scale data, it is often enough to capture the tendencies and characteristics of the data, and exact query results are frequently unnecessary. An effective way to cut query processing time substantially is therefore to introduce random extraction into query processing and to estimate the feature values and regularities from the randomly extracted data, which reduces the amount of data to be processed and shortens the response time.

[0004] When executing a query that includes random extraction, it is important not only to reduce the amount of data by the random extraction itself, but also to convert the query, before execution, into an equivalent query that yields the same result and executes more efficiently, which can shorten the execution time considerably. Specifically, applying the random extraction as early as possible in the query reduces the amount of data handled by the subsequent operations and hence the total work of the query. The logical structure of the data targeted by a query is generally a table (20), as shown in FIG. 2. The horizontal direction of the table is called a record (21) and the vertical direction is called a column (22). The same column of every record stores data of the same format. The set of records obtained by applying a database operation to a table, which is itself a set of records, is again a table. The database operations applied to a table are condition evaluation, projection, join, and grouped aggregation (classification and aggregation).

[0005] Each of these operations is described below. Condition evaluation in a database processing system specifies one or more condition evaluation columns and conditions on the values of those columns, extracts from the target table the records for which the specified conditions hold on the specified columns, and forms a table from them. Projection specifies one or more projection columns, extracts only the specified columns from each record of the target table, and forms a table from the result. Join specifies one or more join columns common to the two target tables, joins every record of one table with all records of the other table that have the same values in the join columns, and forms a table from the newly generated records. Grouped aggregation specifies one or more grouping columns and one or more aggregation target columns, classifies the records of the target table into groups of records with identical values in the grouping columns, computes for each group a statistic such as the sum or the average of the values of the aggregation target columns, and outputs the result for each group as one record.
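For illustration only, the four operations described above can be sketched in Python on tables held as lists of dictionaries. The function names, the column names, and the sample data are assumptions made for this sketch and do not appear in the specification.

```python
from collections import defaultdict


def condition_evaluation(table, predicate):
    """Keep the records for which the specified condition holds."""
    return [rec for rec in table if predicate(rec)]


def projection(table, columns):
    """Keep only the specified columns of every record."""
    return [{c: rec[c] for c in columns} for rec in table]


def join(left, right, join_columns):
    """Join each left record with every right record that has equal join-column values."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[c] for c in join_columns)].append(r)
    return [{**l, **r}
            for l in left
            for r in index.get(tuple(l[c] for c in join_columns), [])]


def grouped_aggregation(table, group_columns, target, statistic):
    """Group records by the grouping columns and output one record per group."""
    groups = defaultdict(list)
    for rec in table:
        groups[tuple(rec[c] for c in group_columns)].append(rec[target])
    return [{**dict(zip(group_columns, key)), target: statistic(values)}
            for key, values in groups.items()]


# Hypothetical data, not taken from the figures.
orders = [{"order_no": 1, "customer_no": 10},
          {"order_no": 2, "customer_no": 10},
          {"order_no": 3, "customer_no": 20}]
items = [{"order_no": 1, "unit_price": 100},
         {"order_no": 1, "unit_price": 50},
         {"order_no": 2, "unit_price": 30},
         {"order_no": 3, "unit_price": 70}]

joined = join(orders, items, ["order_no"])
totals = grouped_aggregation(joined, ["customer_no"], "unit_price", sum)
```

Each function takes a table and returns a table, which mirrors the statement that the result of applying a database operation to a table is again a table.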

[0006] The present invention introduces into the database the random extraction process defined below, and describes a method for converting queries that include it. Random extraction in the database processing system of the present invention is a process that selects records at random from a table, which is a set of records, and forms a table from them. The set of records taken out by a single extraction operation is called an extraction unit, and within one random extraction process every extraction unit is guaranteed to have the same extraction probability.

[0007] A query issued to a database is thus composed by applying the database operations described above, combined in an appropriate order, to the tables being queried. In optimizing a query that includes random extraction, it is therefore important to transform the query only to the extent that the randomness of the extraction is preserved, and to apply the random extraction as early as possible in the query so that the amount of data handled by the subsequent operations, and hence the processing time, is reduced. A conventional conversion method for queries that include random extraction is the query optimization method disclosed on pp. 160-169 of the proceedings of the "International Conference on Very Large Data Bases (VLDB'86)" (published by Morgan Kaufmann Publishers, Inc.). For queries that combine random extraction with basic database operations such as condition evaluation, projection, and join, it discloses a query conversion that changes the order in which the operations are applied while preserving the randomness of the extraction.

[0008]

[Problems to Be Solved by the Invention] The first problem with the conventional method is that its query conversion cannot be applied when the query whose random extraction is to be optimized includes grouped aggregation, so it is of only limited use as a query optimization method for data mining applications. The difficulty in optimizing random extraction when converting a query that includes grouped aggregation is that the extraction unit of the random extraction is not handled properly. For example, suppose sales detail records are to be classified by customer in order to examine each customer's purchase pattern. If records are simply extracted at random at the level of individual sales details and the per-customer purchase patterns are analyzed afterwards, no customer's purchase history is complete, so the purchase patterns cannot be analyzed efficiently. The per-customer analysis should extract each customer's purchase history as the extraction unit, but the random extraction was performed without regard to this.

The second problem is that, in record reading, when random extraction is applied to records stored in a storage device such as a magnetic disk device, the storage locations of the records to be extracted are random, so random access to the storage device occurs and the random extraction takes longer.

The third problem is that, in query issuing, there has been no mechanism for specifying the query processing time or the accuracy of the query result, so a user cannot easily issue a query that reads part of the data by random extraction and estimates the query result in the way the user wants.

[0009] The first object, which is the main object of the present invention, is to solve the first problem above by providing a query conversion method for random extraction that, in order to execute queries that include random extraction efficiently, optimizes the query through a conversion that takes the extraction unit into account, and that is therefore also applicable to queries that include grouped aggregation. The second object of the present invention is to provide an efficient record storage and reading method that does not cause random access to the storage device when records are randomly extracted from it. The third object of the present invention is to provide a query issuing method in which the execution time of the query or the accuracy of the query result can be specified.

[0010]

[Means for Solving the Problems] To achieve the first to third objects above, the present invention has the following means (1) to (3), respectively.

(1) A query conversion process is provided that introduces the concept of a random extraction column for queries that include random extraction and, by converting the query with the extraction unit taken into account, converts a query that includes grouped aggregation into a query that executes more efficiently. Here, the random extraction columns are one or more columns specified on the table being extracted from in order to designate the extraction unit of the random extraction. In random extraction using random extraction columns, a single extraction operation assigns a randomly chosen value to each random extraction column, extracts from the table all records that hold the assigned values in those columns, and takes them as the extraction result. The present invention converts queries so that the set of records whose random extraction column values are equal to one another is treated as one extraction unit. In the earlier example, specifying the customer number as the random extraction column and extracting records with the same customer number as a unit yields complete customer purchase histories.

In what follows, a random extraction for which no random extraction column SC is specified is defined to extract record by record, independently of the column values of the records, and a random extraction whose random extraction column SC is NULL (the empty set) is defined to extract all records. In the random extraction of the present invention, a sample grouping column SGC may be specified in addition to the random extraction column SC. When SGC is specified, the records of the table are divided into groups according to the value of the sample grouping column SGC, and within each group the set of records with equal values of the random extraction column SC is extracted as one extraction unit. A random extraction for which a sample grouping column is specified is defined to guarantee that the extraction probabilities of the extraction units are equal within each group.
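The definition of random extraction with a random extraction column SC and a sample grouping column SGC can be illustrated with a minimal Python sketch. The specification does not say how the extraction probability is chosen, so the `rate` and `seed` parameters, the Bernoulli selection of units, and all names and data here are assumptions of this sketch.

```python
import random
from collections import defaultdict


def sample(table, rate, sc=None, sgc=(), seed=0):
    """Sketch of S(SC, SGC, T): keep each extraction unit with probability `rate`.

    sc=None  : SC unspecified, every record is its own extraction unit.
    sc=()    : SC is NULL (the empty set), all records are extracted.
    sc=(...) : records with equal SC values form one extraction unit.
    sgc      : units are formed separately within each group of equal SGC values.
    """
    if sc is not None and len(sc) == 0:
        return list(table)
    rng = random.Random(seed)
    units = defaultdict(list)
    for i, rec in enumerate(table):
        group_key = tuple(rec[c] for c in sgc)
        unit_key = (i,) if sc is None else tuple(rec[c] for c in sc)
        units[(group_key, unit_key)].append(rec)
    result = []
    for records in units.values():
        if rng.random() < rate:   # the same probability for every unit
            result.extend(records)
    return result


# Hypothetical sales details: 20 customers with 3 records each.
sales = [{"customer_no": c, "item": i} for c in range(1, 21) for i in range(3)]

# With customer_no as the random extraction column, every customer that is
# picked comes with a complete purchase history.
picked = sample(sales, 0.5, sc=("customer_no",), seed=1)
```

Because whole units are kept or dropped together, any customer present in `picked` has all three of its records, which is the property the customer purchase history example relies on.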

[0011] (2) A record storing process and a record reading process are provided that, when storing records subject to random extraction in the record storage device and reading them from it, apply a hash function to the random extraction column of each record and store and read the record according to the hash value, thereby reducing random access to the record storage device.
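One possible reading of means (2) is sketched below in Python; it is not the storage layout of the embodiment, which this passage does not describe. The sketch assumes that records are placed in buckets chosen by a hash of the random extraction column values, so that extraction can read a few whole buckets sequentially instead of seeking to scattered records. The bucket count, the use of `zlib.crc32` as the hash function, and all names are assumptions.

```python
import random
import zlib


def bucket_index(record, sc, n_buckets):
    """Stable hash of the random extraction column values, reduced to a bucket number."""
    key = repr(tuple(record[c] for c in sc)).encode("utf-8")
    return zlib.crc32(key) % n_buckets


def store(table, sc, n_buckets):
    """Record storing: place each record in the bucket chosen by its hash value."""
    buckets = [[] for _ in range(n_buckets)]
    for record in table:
        buckets[bucket_index(record, sc, n_buckets)].append(record)
    return buckets


def read_sample(buckets, n_chosen, seed=0):
    """Record reading: pick buckets at random and scan each one sequentially."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(buckets)), n_chosen))
    return [record for b in chosen for record in buckets[b]]


# Hypothetical sales details: 40 customers with 3 records each.
sales = [{"customer_no": c, "item": i} for c in range(1, 41) for i in range(3)]
buckets = store(sales, ("customer_no",), n_buckets=8)
subset = read_sample(buckets, n_chosen=2)
```

Records with equal random extraction column values always hash to the same bucket, so every extraction unit read this way is complete. Whether the units are chosen with equal probability depends on how uniformly the hash spreads the values, which this sketch does not verify.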

[0012] (3) A query result evaluation process is provided in which the time allowed for query processing and the accuracy of the estimated query result are specified when the query is issued, and the number of randomly extracted records is adjusted according to the size of the database and the complexity of the query, thereby realizing query processing with an arbitrary response time and accuracy.
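The idea of means (3) can be illustrated with a small Python sketch that keeps reading randomly ordered values until either a target accuracy or a time limit is reached. The specification does not define the accuracy measure at this point, so the use of the standard error of the mean, the batch-wise check, and the names `estimate_mean`, `target_stderr`, and `time_limit_s` are assumptions of this sketch.

```python
import math
import random
import time


def standard_error(n, m2):
    """Standard error of the mean from a running sum of squared deviations."""
    return math.sqrt(m2 / (n - 1) / n) if n >= 2 else math.inf


def estimate_mean(values, target_stderr, time_limit_s, batch=100, seed=0):
    """Read values in random order until the accuracy or time criterion is met."""
    order = list(values)
    random.Random(seed).shuffle(order)
    deadline = time.monotonic() + time_limit_s
    n, mean, m2 = 0, 0.0, 0.0
    for x in order:
        # Welford's running update of the mean and the squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        # Evaluate the stopping criteria once per batch of records.
        if n % batch == 0:
            if standard_error(n, m2) <= target_stderr or time.monotonic() >= deadline:
                break
    return mean, standard_error(n, m2), n


# Hypothetical order amounts.
values = [float(v) for v in range(1000)]
estimate, stderr, records_read = estimate_mean(values, target_stderr=20.0, time_limit_s=1.0)
```

A looser accuracy target or a shorter time limit stops the loop after fewer records, which is the trade-off between response time and accuracy that the user is meant to control.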

[0013] Further variations of the present invention, and the configurations for realizing them, are described in the embodiments.

[0014]

[Embodiments of the Invention] FIG. 1 shows one embodiment of a database processing system according to the present invention. The configuration of this embodiment is described first with reference to FIG. 1. The random extraction processing method of this embodiment is made up of: a query issuing process 2, which comprises a query issuing process 5 that generates a query statement according to input from a terminal device 1 and a query result display process 6 that displays the query result and its evaluation result on the terminal device; a query execution management process 3, which comprises an execution procedure generation process 7 that generates executable intermediate code and a query result evaluation criterion from the query statement, a query conversion process 8 that converts the intermediate code into more efficient intermediate code, a query execution process 9 that executes the query according to the intermediate code, and a query result evaluation process 10 that evaluates the query result against the query result evaluation criterion; and a record management process 4, which comprises a record reading process 11 that reads records as data and a record storing process 13 that stores records in a record storage device 12. If this series of processes is implemented as a program and recorded on a recording medium, the present invention can be used at any location.

[0015] The operation of this embodiment is described below with reference to FIG. 1. First, the query issuing process 5 generates a query statement according to input from the terminal device 1. The execution procedure generation process 7 refers to the query statement generated by the query issuing process 5 and generates a query execution procedure and a query result evaluation criterion. The query conversion process 8 then converts the query execution procedure generated by the execution procedure generation process 7 into a more efficient query execution procedure. Next, following the query execution procedure converted by the query conversion process 8, the query execution process 9 issues record read requests to the record reading process 11 and generates the query result by processing the records read. The query execution process 9 continues executing the query until instructed otherwise by the query result evaluation process 10. In response to a record read request from the query execution process 9, the record reading process 11 reads the records that the record storing process 13 has stored in the record storage area 12 and passes them to the query execution process 9. The query result evaluation process 10 evaluates the query result generated by the query execution process 9 against the query result evaluation criterion generated by the execution procedure generation process 7, sends the query result and the evaluation result to the query result display process 6, and decides whether query execution should be stopped; if so, it instructs the query execution process 9 to stop. Finally, the query result display process 6 receives the query result and its evaluation result generated by the query result evaluation process 10 and displays them on the terminal device 1.

[0016] The query issuing process, execution procedure generation process, query conversion process, record reading process, and query result evaluation process are described in detail below using a concrete query example. As the example, consider a database consisting of three tables, as shown in FIG. 3. The customer table 31 has four columns: customer number, customer classification, name, and address. The customer number is the key column of the customer table: it has a unique value for each record, and its value uniquely identifies a record in the table. The order table 32 has four columns: order number, customer number, priority, and order date. The order number is the key column of the order table: it has a unique value for each record, and its value uniquely identifies a record in the table. The customer number of the order table is a foreign key referring to the key column customer number of the customer table, and the range of values of the customer number column in the customer table coincides with the range of values of the customer number column in the order table. The product table 33 has four columns: order number, product name, means of transport, and unit price. The order number of the product table is a foreign key referring to the key column order number of the order table, and the range of values of the order number column in the order table coincides with the range of values of the order number column in the product table. In what follows, the customer number column of the customer table, for example, is written as customer table.customer number.

[0017] The query issuing process 5 in this embodiment passes a query statement written in a database query language such as SQL to the execution procedure generation process 7. As a concrete example, consider the following query statement, written in SQL form, against the database above.

1: SELECT customer classification, priority, means of transport, AVG(RANDOM(order amount))
2: FROM SELECT order number, customer classification, priority, means of transport, SUM(product table.unit price) AS order amount
3:      FROM customer table, order table, product table
4:      WHERE customer table.customer number = order table.customer number
5:      AND order table.order number = product table.order number
6:      GROUP BY order table.order number, customer table.customer classification, order table.priority, product table.means of transport
7: GROUP BY customer classification, priority, means of transport;

The number at the start of each line is a line number added for explanation and is not part of the query statement. The keyword RANDOM in line 1 specifies that the average order amount is to be estimated by random extraction. When the keyword RANDOM is specified immediately before the keyword SELECT, it specifies that random extraction is used to extract the records.

[0018] This example query joins the records of the customer table, order table, and product table whose values of customer table.customer number and order table.customer number, and of order table.order number and product table.order number, are equal (lines 3-5); groups the records by the values of the four columns order table.order number, customer table.customer classification, order table.priority, and product table.means of transport (line 6); computes the sum of product table.unit price for each group (line 2); groups the resulting records by the values of the three columns customer classification, priority, and means of transport (line 7); and finally estimates, by random extraction, the average order amount for each group (line 1).
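As a rough check of what the example query computes, the following Python sketch runs the same joins and the two levels of grouping in SQLite. The RANDOM keyword is the extension proposed in this specification and is not standard SQL, so it is omitted here and the exact average is computed instead of an estimate. The English table and column names and the sample rows are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, customer_class TEXT);
CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer_no INTEGER, priority TEXT);
CREATE TABLE items (order_no INTEGER, transport TEXT, unit_price INTEGER);
INSERT INTO customers VALUES (1, 'retail'), (2, 'retail');
INSERT INTO orders VALUES (10, 1, 'high'), (11, 2, 'high');
INSERT INTO items VALUES (10, 'air', 100), (10, 'air', 50), (11, 'air', 50);
""")

query = """
SELECT customer_class, priority, transport, AVG(order_amount)
FROM (SELECT o.order_no AS order_no,
             c.customer_class AS customer_class,
             o.priority AS priority,
             i.transport AS transport,
             SUM(i.unit_price) AS order_amount
      FROM customers c, orders o, items i
      WHERE c.customer_no = o.customer_no
        AND o.order_no = i.order_no
      GROUP BY o.order_no, c.customer_class, o.priority, i.transport) AS per_order
GROUP BY customer_class, priority, transport
"""
rows = conn.execute(query).fetchall()
```

With this data the inner query yields order amounts of 150 and 50, and the outer query averages them within the single (retail, high, air) group.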

[0019] FIG. 12 shows the result of taking the relevant columns from the customer table, order table, and product table shown in FIG. 3 and joining the records whose values of customer table.customer number and order table.customer number, and of order table.order number and product table.order number, are equal. In this processing the join column of the first join is {customer number} and the join column of the second join is {order number}.

[0020] FIG. 13 shows the result of grouping the records of the result shown in FIG. 12 by the values of the four columns order table.order number, customer table.customer classification, order table.priority, and product table.means of transport, and computing the sum of product table.unit price for each group. In this processing the grouping columns are {order number, customer classification, priority, means of transport} and the aggregation target column is {unit price}.

[0021] FIG. 14 shows the result of grouping the result shown in FIG. 13 by the values of the three columns customer classification, priority, and means of transport (line 7) and computing the average order amount for each group. In this processing the grouping columns are {customer classification, priority, means of transport} and the aggregation target column is {order amount}.

[0022] The execution procedure generation process 7 in this embodiment converts the query statement issued by the query issuing process 5 into intermediate code that the query execution process 9 can interpret and execute. The intermediate code into which a query statement is converted generally depends on the database processing system; here the database operations are converted into the following intermediate code. Condition evaluation on table T with condition evaluation columns CC is written C(CC,T); projection on table T with projection columns PC is written P(PC,T); a join of tables S and T with join columns JC is written J(JC,S,T); aggregation on table T with aggregation columns AG and grouping columns GC is written A(AG,GC,T); and random extraction on table T with random extraction columns SC and sample grouping columns SGC is written S(SC,SGC,T).

【００２３】The example query above is then converted into the following intermediate code:
    A(order amount, {customer category, priority, transport mode},
      S(unspecified, {customer category, priority, transport mode},
        A(product table.unit price, {order number, customer category, priority, transport mode},
          J({order number}, product table,
            J({customer number}, customer table, order table)))))
In this query the random sampling process is introduced in order to estimate the result of a grouped aggregation process, so the random sampling column SC is left unspecified and the sample grouping column SGC is set to the grouping columns {customer category, priority, transport mode} of the grouped aggregation process to be estimated. When the query result is simply to be sampled at random, the random sampling column SC is left unspecified and the sample grouping column SGC is {} (the empty set). Drawn as a diagram, the intermediate code of the example query forms a tree structure, as shown in FIG. 4. When the query is executed, the processes are applied to each record in order, from the process at the leaf of the intermediate code toward the process at the root.
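The tree structure and the leaf-to-root application order can be sketched as follows. This is an illustrative model only, not the system's actual intermediate code: the class names `Table`, `J`, `A`, `S`, the helper `leaf_to_root`, and the use of `None` to stand for an unspecified SC are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class J:  # join J(JC, S, T)
    jc: frozenset
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class A:  # aggregation A(AG, GC, T)
    ag: str
    gc: frozenset
    child: "Node"


@dataclass(frozen=True)
class S:  # random sampling S(SC, SGC, T); sc=None models "unspecified"
    sc: Optional[frozenset]
    sgc: frozenset
    child: "Node"


Node = Union[Table, J, A, S]


def leaf_to_root(node):
    """Return node labels in post-order: leaves first, root last."""
    if isinstance(node, Table):
        return [node.name]
    if isinstance(node, J):
        return leaf_to_root(node.left) + leaf_to_root(node.right) + ["J"]
    return leaf_to_root(node.child) + [type(node).__name__]


group3 = frozenset({"customer category", "priority", "transport mode"})
group4 = group3 | {"order number"}

# The example query's intermediate code, written as nested nodes.
plan = A("order amount", group3,
         S(None, group3,
           A("unit price", group4,
             J(frozenset({"order number"}), Table("product"),
               J(frozenset({"customer number"}),
                 Table("customer"), Table("order"))))))

print(leaf_to_root(plan))
```

The post-order listing places the two joins before the inner aggregation, the sampling node after it, and the outer aggregation last, which matches the leaf-to-root order described above.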

【００２４】The execution procedure generation process 7 in this embodiment also generates a query result evaluation criterion from the query statement issued by the query issuing process 5 and passes it to the query result evaluation process 10. For example, when a query processing time is specified at the time the query is issued, the specified processing time is passed to the query result evaluation process; when a precision for the aggregation result is specified for a query that includes an aggregation process, the specified precision is passed to the query result evaluation process. One way to specify the query processing time or the precision of the aggregation result when issuing a query is to add a keyword for the time or precision to the query statement. For example, the query statement
    SELECT customer_category, SUM(RANDOM(unit_price)) AS order_amount
    FROM customer_table, order_table, product_table
    WHERE customer_table.customer_number = order_table.customer_number
      AND order_table.order_number = product_table.order_number
    WITH IN 2 MINUTES;
specifies that the total product unit price for each customer category is to be estimated as far as can be determined within two minutes. Likewise, the query statement
    SELECT customer_category, SUM(RANDOM(unit_price)) AS order_amount
    FROM customer_table, order_table, product_table
    WHERE customer_table.customer_number = order_table.customer_number
      AND order_table.order_number = product_table.order_number
    WITH 0.99 PRECISION;
specifies that the total product unit price for each customer category is to be estimated with a precision of 99%. When only the time or precision keyword is given in the two query statements above, the query issuing process can also determine the insertion position of the random sampling process automatically and supply the RANDOM keyword, which reduces the effort of issuing query statements from the terminal device.
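As a rough illustration of how a trailing time or precision keyword might be turned into an evaluation criterion, the sketch below recognizes the two clause forms used in the example queries. The function name `parse_criterion`, the returned tuple shapes, and the regular expressions are assumptions for this example; the source does not describe the actual parser.

```python
import re


def parse_criterion(query: str):
    """Return ('time', seconds), ('precision', p), or None if no clause is present."""
    m = re.search(r"WITH\s+IN\s+(\d+)\s+MINUTES\s*;?\s*$", query, re.IGNORECASE)
    if m:
        return ("time", int(m.group(1)) * 60)
    m = re.search(r"WITH\s+(0?\.\d+|1(?:\.0+)?)\s+PRECISION\s*;?\s*$",
                  query, re.IGNORECASE)
    if m:
        return ("precision", float(m.group(1)))
    return None


timed = parse_criterion(
    "SELECT customer_category, SUM(RANDOM(unit_price)) AS order_amount "
    "FROM customer_table WITH IN 2 MINUTES;")
precise = parse_criterion(
    "SELECT customer_category, SUM(RANDOM(unit_price)) AS order_amount "
    "FROM customer_table WITH 0.99 PRECISION;")
print(timed, precise)
```

The resulting criterion is what would be handed to the query result evaluation process; a query without either clause yields `None` in this sketch.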

【００２５】The query conversion process 8 in this embodiment exchanges the application order of the random sampling process introduced into the intermediate code generated by the execution procedure generation process 7 and the query process applied immediately before it, converting the intermediate code into a form that executes more efficiently. For a query that includes a random sampling process, the query conversion process transforms the query in a way that uses the random sampling column to preserve the sampling unit, so that the query becomes more efficient to execute while the randomness of the random sampling process is preserved.

【００２６】First, the application of query conversion to the intermediate code of the example query is described. As shown in FIG. 5, the random sampling column of the random sampling process inserted in the intermediate code is changed to the grouping columns {order number, customer category, priority, transport mode} of the immediately preceding grouped aggregation process-1, and the application order of the random sampling process and the grouped aggregation process is exchanged. With this change, as shown in FIG. 15, values are assigned at random to each random sampling column and applied to the table before grouped aggregation process-1, so that, for example, all records satisfying {order number: order 1, customer category: construction, priority: high, transport mode: truck} are extracted. In the table after the grouped aggregation process, the grouping columns are unique, so the grouping column values correspond one-to-one with the individual records. Random sampling by record on the table after the grouped aggregation process is therefore equivalent to random sampling before the grouped aggregation process with the grouping column values specified at random. Consequently, exchanging the application order in this way does not lose the randomness of the random sampling process.
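The equivalence argued above can be checked on a small table. The sketch below is a minimal illustration under stated assumptions: the toy rows, the helper names `aggregate` and `sample_keys`, and the fixed seed are all invented for this example, and only two of the four grouping columns are modeled to keep it short.

```python
import random
from collections import defaultdict

rows = [  # (order number, customer category, unit price)
    ("order1", "construction", 100),
    ("order1", "construction", 250),
    ("order2", "retail", 80),
    ("order3", "retail", 40),
    ("order3", "retail", 60),
]


def aggregate(rows):
    """Sum the unit price per (order number, customer category) group."""
    totals = defaultdict(int)
    for order, category, price in rows:
        totals[(order, category)] += price
    return dict(totals)


def sample_keys(keys, k, seed):
    """Pick k grouping-column values at random; sorting makes the draw reproducible."""
    return set(random.Random(seed).sample(sorted(keys), k))


keys = {(order, category) for order, category, _ in rows}
chosen = sample_keys(keys, 2, seed=0)

# Aggregate first, then sample records of the aggregated table.
after = {k: v for k, v in aggregate(rows).items() if k in chosen}
# Sample by grouping-column value first, then aggregate the surviving rows.
before = aggregate([r for r in rows if (r[0], r[1]) in chosen])

print(after == before)
```

Because each aggregated record corresponds to exactly one grouping-column value, both plans return the same groups with the same totals, while the second plan aggregates fewer rows.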

【００２７】Next, as shown in FIG. 6, the random sampling column of the random sampling process is changed to the difference obtained by removing the sample grouping columns from the random sampling columns before the conversion, that is, to {order number}, and the random sampling process is applied with this column. With this change, as shown in FIG. 16, a value is assigned at random to the random sampling column and applied to the table before grouped aggregation process-1, so that, for example, all records having {order number: order 1} are extracted. If the table before random sampling is divided into groups according to the values of the sample grouping columns {customer category, priority, transport mode}, the values of the sample grouping columns are equal within each group. Even if values are assigned to each of the random sampling columns {order number, customer category, priority, transport mode} to extract records, only {order number} is effective in selecting records. Consequently, the transformation described above does not lose the randomness of the random sampling process.

【００２８】Further, in order to exchange the application order of the following join process-2 and the random sampling process, the random sampling process is distributed to each of the tables before the join, as shown in FIG. 7. With this change, all records having the value {order number: order 1} assigned to the random sampling column are extracted from the table before the aggregation process is applied. The set of records obtained by extracting the records having the specified random sampling column value from the table after the join corresponds one-to-one with the set of records obtained by extracting the records having the specified random sampling column value from the tables before the join, so exchanging the application order does not lose the randomness of the random sampling process.

【００２９】In the subsequent join process-1 as well, the random sampling process is distributed to each of the tables before the join and the application order is exchanged, as shown in FIG. 8. With this change, all records having the value {order number: order 1} assigned to the random sampling column are extracted from the table before the aggregation process is applied. This exchange of application order is possible for the same reason as above. However, the customer table does not contain {order number} as a column, so the random sampling column of the random sampling process distributed to the customer table is NULL, and all records of the customer table are extracted. Through the query conversion described above, the intermediate code of the example query is converted as shown in FIG. 9.
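The effect of distributing the sampling over the joins can be illustrated with three toy tables. This is a sketch under assumptions: the table contents, the helper `join_all`, and the fixed set `sampled_orders` (standing in for the randomly assigned order-number values) are made up for the example.

```python
customers = [("c1", "construction"), ("c2", "retail")]          # (customer number, category)
orders = [("o1", "c1"), ("o2", "c2"), ("o3", "c2")]              # (order number, customer number)
products = [("o1", 100), ("o1", 250), ("o2", 80), ("o3", 40)]    # (order number, unit price)


def join_all(customers, orders, products):
    """Join customer and order on customer number, then product on order number."""
    category = dict(customers)
    return sorted(
        (o, category[c], price)
        for o, c in orders if c in category
        for po, price in products if po == o
    )


sampled_orders = {"o1", "o3"}  # stands in for the randomly assigned SC values

# Plan 1: join everything, then keep rows whose order number was sampled.
late = [r for r in join_all(customers, orders, products) if r[0] in sampled_orders]

# Plan 2: sample the order and product tables first. The customer table has no
# order number column, so its sampling column is NULL and it is read in full.
early = join_all(
    customers,
    [r for r in orders if r[0] in sampled_orders],
    [r for r in products if r[0] in sampled_orders],
)

print(late == early)
```

Both plans yield the same joined rows, but the second one feeds fewer records into each join, which is the efficiency gain the conversion aims for.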

【００３０】The following summarizes how the processing order is exchanged between each kind of query process and the random sampling process. Random sampling and condition evaluation are exchanged simply by swapping their application order; that is, S(SC,SGC,C(CC,T)) ≡ C(CC,S(SC,SGC,T)) holds. Here ≡ indicates that the operations on both sides are equivalent as random sampling processes. To show that the two sides are equivalent as random sampling processes, it suffices to show either of the following pairs of conditions: (i) the sampling units are the same on both sides, and (ii) the sampling probability of each sampling unit is preserved on both sides; or (i) the sampling units are the same on both sides, and (ii) in the converted query, the sampling units of each random sampling process are sampled independently of one another with equal probability. The equivalence of the two sides above follows from these two points. (i) Let sgc be the value of the sample grouping column SGC of the sample group into which a record is classified, and let sc be the value assigned to the random sampling column SC. The records extracted in one sampling operation are the records of table T that satisfy SC=sc, SGC=sgc, and the condition specified on the condition evaluation column CC, so the sampling units on both sides are equal. (ii) The combinations of SC and SGC values correspond one-to-one with the sampling units before and after the query conversion, so if the value of SC is determined at random for each SGC value, the sampling probability of each sampling unit is preserved by the query conversion.
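The exchange rule for condition evaluation can be demonstrated with a sampling operator whose decision depends only on a unit's (SGC, SC) values. The sketch below is one possible realization, not the method of the source: the hash-based selector `unit_selected`, the 0.5 rate, the seed string, and the toy rows are all assumptions for illustration.

```python
import hashlib


def unit_selected(key, rate, seed="s0"):
    """Deterministic pseudo-random choice that depends only on the unit's key."""
    digest = hashlib.sha256(f"{seed}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < rate


def S(sc, sgc, rows, rate=0.5):
    """Keep whole sampling units: rows sharing the same (SGC, SC) values."""
    cols = sorted(set(sc) | set(sgc))
    return [r for r in rows if unit_selected(tuple(r[c] for c in cols), rate)]


def C(pred, rows):
    """Condition evaluation: keep the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]


rows = [{"x": i % 4, "y": i % 2, "v": i} for i in range(20)]


def pred(r):
    return r["v"] >= 5


lhs = S({"x"}, {"y"}, C(pred, rows))   # S(SC, SGC, C(CC, T))
rhs = C(pred, S({"x"}, {"y"}, rows))   # C(CC, S(SC, SGC, T))
print(lhs == rhs)
```

Since whether a unit is kept does not depend on which other rows are present, filtering before or after sampling selects the same records, which mirrors point (i) of the argument above.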

【００３１】When SC is unspecified, random sampling is performed by record. Condition evaluation leaves the records before and after the process in one-to-one correspondence, so the sampling units are equal in the queries before and after the conversion, and for the records of table T that satisfy the condition the sampling probability is preserved across the conversion. When SC=NULL, the random sampling process extracts all records, so the operations on both sides coincide. Random sampling and projection are exchanged simply by swapping their application order; that is, S(SC,SGC,P(PC,T)) ≡ P(PC,S(SC,SGC,T)) holds. For the random sampling operation on the left side to be possible, however, SC⊆PC is required.

【００３２】The equivalence of the two sides above follows from the following two points. (i) Let sgc be the value of the sample grouping column SGC of the sample group into which a record is classified, and let sc be the value assigned to the random sampling column SC. The records extracted in one sampling operation are the records of table T that satisfy SC=sc and SGC=sgc, so the sampling units on both sides are equal. (ii) The combinations of SC and SGC values correspond one-to-one with the sampling units, so if the value of SC is determined at random for each SGC value, the sampling probability of each sampling unit is preserved by the query conversion.

【００３３】When SC is unspecified, projection leaves the records before and after the process in one-to-one correspondence, so the sampling units are equal in the queries before and after the conversion and the sampling probability of each record is also preserved. When SC=NULL, the random sampling process extracts all records, so the operations on both sides coincide. Random sampling and grouped aggregation are exchanged simply by swapping their application order; that is, S(SC,SGC,A(AC,GC,T)) ≡ A(AC,GC,S(SC,SGC,T)) holds. The equivalence of the two sides follows from the following two points. (i) Let sgc be the value of the sample grouping column SGC of the sample group into which a record is classified, sc the value assigned to the random sampling column SC, and gc the value of the grouping column GC of the group into which the record is classified by the grouped aggregation process. The records extracted in one sampling operation are the records of table T that satisfy SC=sc, SGC=sgc, and GC=gc, so the sampling units on both sides are equal. (ii) The combinations of SC, SGC, and GC values correspond one-to-one with the sampling units, so if the value of SC is determined at random for each SGC value, the sampling probability of each sampling unit is preserved by the query conversion.

【００３４】When SC is unspecified, the grouping columns of the grouped aggregation process are assigned to SC and the query is then converted. When there is no grouping, the result of the aggregation process is a single record, so no random sampling is performed and all records are extracted. When SC=NULL, the random sampling process extracts all records, so the operations on both sides coincide. Random sampling and join are exchanged by restricting the random sampling column SC and the sample grouping column SGC to the columns contained in each table and swapping the application order; that is, S(SC,SGC,J(JC,S,T)) ≡ J(JC,S(SC/S,SGC/S,S),S(SC/T,SGC/T,T)) holds.

【００３５】The equivalence of the two sides follows from the following two points. (i) Let sgc be the value of the sample grouping column SGC of the sample group into which a record is classified, and let sc be the value assigned to the random sampling column SC. The records extracted by the right side are those that satisfy SC/S=sc/S and SGC/S=sgc/S in table S, satisfy SC/T=sc/T and SGC/T=sgc/T in table T, and have equal JC values, so they are also extracted by the left side. Records not extracted by the right side fail one of these conditions and are therefore not extracted by the left side either. Hence the sampling units on both sides are equal. (ii) The combinations of SC and SGC values correspond one-to-one with the sampling units, so if the value of SC is determined at random for each SGC value, the sampling probability of each sampling unit is preserved by the query conversion.

【００３６】When SC is unspecified, let KC be the key column of J(JC,S,T) and take S(unspecified,SGC,J(JC,S,T)) ≡ S(KC,SGC,J(JC,S,T)). When J(JC,S,T) has no key column, the sampling is distributed with a weighting that depends on the attribute values of the join. That is, with |T.xi| denoting the occurrence frequency of value xi of attribute X in table T, records of table S that have attribute value xi are sampled with probability |T.xi|/|T.x|max. The conversion S(unspecified,SGC,J(JC,S,T)) ≡ S(unspecified,SGC,J(JC,Select(|T.xi|/|T.x|max,S(unspecified,S)),T)) leaves the sampling unit unchanged. The sampling operations for the individual sampling units are independent, and their sampling probabilities are all equal:
    1/|S| * |T.xi|/|T.x|max * 1/|T.xi| = 1/(|S| * |T.x|max)
When SC=NULL, the random sampling process extracts all records, so the operations on both sides coincide. Furthermore, when the sample grouping column SGC of a random sampling process is contained in its random sampling column SC (SC⊇SGC), the random sampling column can be replaced by the difference obtained by removing the sample grouping column from the random sampling column before the conversion. For example, as shown in FIG. 17, with SC={X,Y} and SGC={Y}, the sampling unit of this random sampling process is the set of records having equal values of columns X and Y. Whether the records of the table are first grouped by the value of column Y and then randomly sampled on the values of columns X and Y, or first randomly sampled on the value of column X (the random sampling column SC minus the sample grouping column SGC) and then grouped by the value of column Y, the sampling units and sampling probabilities within each group are the same. That is, S(SC,SGC,T) ≡ S(SC-SGC,SGC,T) holds.
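The probability identity for the weighted case can be verified exactly on toy data. The sketch below reads the three factors as: choose a record of S uniformly, keep it with weight |T.xi|/|T.x|max, then choose one of its matching records in T uniformly. That reading, the sample values, and the helper name `pair_probability` are assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction

S_rows = ["a", "a", "b", "c"]        # join attribute X values of table S
T_rows = ["a", "b", "b", "b", "d"]   # join attribute X values of table T

freq = Counter(T_rows)               # |T.xi| for each value xi
max_freq = max(freq.values())        # |T.x|max


def pair_probability(x):
    """Probability of drawing one specific joined pair (s, t) with s.X = t.X = x."""
    pick_s = Fraction(1, len(S_rows))      # choose s uniformly from S
    accept = Fraction(freq[x], max_freq)   # keep s with weight |T.xi| / |T.x|max
    pick_t = Fraction(1, freq[x])          # choose one matching t uniformly
    return pick_s * accept * pick_t


# Only values that actually occur in T can produce a joined pair.
probs = {x: pair_probability(x) for x in set(S_rows) if freq[x] > 0}
print(probs)
```

Every joined pair ends up with probability 1/(|S| * |T.x|max), here 1/12, regardless of how often its join value occurs in T, which is the uniformity the conversion relies on.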

【００３７】The equivalence of the two sides follows from the following two points. (i) For the left side, let r be one of the records extracted in one sampling operation, sc the value of the random sampling column SC of r, and sgc the value of its sample grouping column SGC. Then SGC=sgc and SC-SGC=sc-sgc hold, so with the same assignment of values to SC and SGC the record r is also extracted by the right side. Conversely, for the right side, let r be one of the records extracted in one sampling operation, sc-sgc the value of the random sampling column SC-SGC of r, and sgc the value of its sample grouping column SGC. Then SGC=sgc and SC=sc hold, so with the same assignment of values to SC and SGC the record r is also extracted by the left side. Since the same records are extracted for the same assignment of values to SC and SGC, the sampling units are equal. (ii) The combinations of SC and SGC values correspond one-to-one with the sampling units, so if the value of SC is determined at random for each SGC value, the sampling probability of each sampling unit is preserved by the query conversion.
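The claim that S(SC,SGC,T) and S(SC-SGC,SGC,T) have the same sampling units can be checked by partitioning a small table both ways. The helper `units` and the toy rows are assumptions for this illustration, with SC={X,Y} and SGC={Y} chosen to match the example of FIG. 17.

```python
from collections import defaultdict


def units(rows, sc, sgc):
    """Partition row indices into sampling units keyed by (SGC values, SC values)."""
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        key = (tuple(r[c] for c in sorted(sgc)), tuple(r[c] for c in sorted(sc)))
        groups[key].append(i)
    return sorted(groups.values())


rows = [{"X": x, "Y": y}
        for x, y in [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (2, "b")]]
sc, sgc = {"X", "Y"}, {"Y"}

full = units(rows, sc, sgc)            # units of S(SC, SGC, T)
reduced = units(rows, sc - sgc, sgc)   # units of S(SC-SGC, SGC, T)
print(full == reduced)
```

Within a sample group the value of Y is fixed, so adding Y to the sampling key cannot split a unit any further; the two partitions coincide.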

【００３８】The record read process 11 in this embodiment reads records stored in the record storage device 12 in response to record read requests issued by the query execution process 9. FIG. 11 shows how records are stored in this embodiment. When records are stored in the record storage device 12, the record storage process 13 is used: one or more columns of the records are designated in advance as partition columns BC, the hash function 111 is applied to the designated partition columns, and the records are classified according to the resulting values into groups called buckets 112 and stored. As the hash function 111 used for hash partitioning, a partitioned hash function such as the one disclosed in "PRINCIPLES OF DATABASE AND KNOWLEDGE-BASE SYSTEMS", J. D. Ullman, Computer Science Press, pp. 358-360, can be used, which makes it possible to read records with a hash value specified for each partition column. When no partition column is designated, all records are stored in a single bucket. Blocks 113, which are contiguous areas in the record storage device, are allocated to each bucket as needed, and the records are stored in those blocks. Reading records that have the same hash value therefore causes no random access to the record storage device.
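The bucket layout described above can be sketched as follows. This is a simplified model, not the storage format of the source: `zlib.crc32` stands in for the hash function 111, Python lists stand in for a bucket's contiguous blocks, and the names `column_hash`, `bucket_id`, `store`, and `read_bucket` are invented for the example.

```python
import zlib
from collections import defaultdict


def column_hash(value, n):
    """Stable per-column hash (zlib.crc32 is an illustrative choice)."""
    return zlib.crc32(str(value).encode()) % n


def bucket_id(record, bc, n):
    """Partitioned hash: one hash component per partition column."""
    return tuple(column_hash(record[c], n) for c in bc)


def store(records, bc, n=4):
    """Classify records into buckets; with no partition column, use one bucket."""
    buckets = defaultdict(list)  # each list stands in for a bucket's blocks
    for r in records:
        buckets[bucket_id(r, bc, n) if bc else ()].append(r)
    return buckets


def read_bucket(buckets, key):
    """Read every record of one bucket in a single sequential pass."""
    return list(buckets.get(key, []))


records = [{"order": f"o{i}", "cust": f"c{i % 3}"} for i in range(12)]
buckets = store(records, ["order"])
first_key = next(iter(buckets))
print(first_key, len(read_bucket(buckets, first_key)))
```

Because the bucket key has one component per partition column, a reader can fix the hash value of any subset of those columns and scan only the matching buckets.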

【００３９】When records are read from the record storage device 12 in this embodiment, one of the following four methods is used, depending on the database process that immediately follows the record read process.
Method 1: When the immediately following database process is not a random sampling process, or the random sampling column of the immediately following random sampling process is NULL, a normal read is performed. That is, all records in all buckets are read.
Method 2: When the random sampling column of the immediately following random sampling process is unspecified, random sampling by record is performed. That is, for each read request, one of the records stored in the record storage device 12 is chosen at random and read.
Method 3: When the random sampling column SC of the immediately following random sampling process and the partition column BC of the records have no column in common, records are read by applying a hash function. That is, as shown in FIG. 10, a hash value is specified at random for each random sampling column when the records are read, the hash function 101 is applied to the random sampling column SC of the records stored in the record storage device, and only the records having the specified hash value are extracted. This method is also used when no partition column is designated.
Method 4: When the random sampling column of the immediately following random sampling process and the partition column of the records have columns in common, records are read using the bucket partitioning. That is, as shown in FIG. 11, a hash value is specified at random for each random sampling column when the records are read. For the common part SC∩BC of the random sampling column SC and the partition column BC of the records stored in the record storage device 12, the records stored in the blocks 113 of the buckets 112 having the specified hash value are read. The hash function 114 is then applied to the difference SC-BC between the random sampling column SC and the partition column BC of the records that were read, and only the records having the specified hash value are extracted.
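The choice among the four read methods can be expressed as a small decision function. The sketch below only encodes the selection rules stated above; the function names, the `UNSPECIFIED` sentinel, and the use of `None` for a NULL sampling column are assumptions for this example.

```python
UNSPECIFIED = "unspecified"  # sentinel: a sampling process with no sampling column


def choose_read_method(next_is_sampling, sc, bc):
    """Return the read method number (1 to 4).

    sc: None models NULL, UNSPECIFIED models an unspecified column,
        otherwise a set of random sampling columns.
    bc: set of partition columns (empty when none is designated).
    """
    if not next_is_sampling or sc is None:
        return 1                      # normal read of all buckets
    if sc == UNSPECIFIED:
        return 2                      # random sampling by record
    if not (set(sc) & set(bc)):
        return 3                      # hash filter on SC while scanning
    return 4                          # read matching buckets, then hash filter


def split_columns(sc, bc):
    """For method 4: columns resolved by bucket choice, and by hash filtering."""
    return set(sc) & set(bc), set(sc) - set(bc)


print(choose_read_method(True, {"order", "priority"}, {"order"}))
print(split_columns({"order", "priority"}, {"order"}))
```

In method 4 the bucket lookup handles SC∩BC without touching other buckets, and only the remaining columns SC-BC need the per-record hash test.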

【００４０】The record storage device 12 in this embodiment stores records organized in table form. When records are partitioned into buckets for storage, the records assigned to each bucket by the hash value of their partition columns are allocated to and stored in blocks, which are contiguous areas on the record storage device. Record access thereby becomes sequential access, which improves the efficiency of reading a bucket.

[0041] The query result evaluation process 10 in this embodiment evaluates the query results of the query execution process 9 against the query evaluation criterion generated by the execution procedure generation process 7, and controls execution of the query processing. When the criterion specifies a time, the query result evaluation process compares the query execution time with the specified time each time it receives a query result from the query execution process, and instructs the query execution process to stop the query processing if the execution time has exceeded the specified time. When the criterion specifies an accuracy, the query result evaluation process calculates the accuracy of the estimated value of the query result each time it receives a query result from the query execution process; if the accuracy of the estimate exceeds the specified accuracy, it instructs the query execution process to stop the query processing and returns the estimated value and accuracy at that point.
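The stop-on-time and stop-on-accuracy control described above can be sketched in Python. This is a hedged illustration, not the patent's procedure: the function name `run_with_evaluation`, the batch-wise delivery of partial results, the use of the relative standard error of a running mean as the accuracy measure, and the omission of any finite population correction are all assumptions.

```python
import math
import time


def run_with_evaluation(batches, time_limit=None, target_rel_error=None,
                        clock=time.monotonic):
    """Consume partial query results batch by batch and stop early when the
    time limit is exceeded or the estimate is accurate enough.
    Returns (estimated mean, standard error, stop reason)."""
    start = clock()
    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and sum of squares

    def std_error():
        return math.sqrt(m2 / (n - 1) / n) if n > 1 else math.inf

    for batch in batches:
        for x in batch:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        if time_limit is not None and clock() - start > time_limit:
            return mean, std_error(), "time"
        if target_rel_error is not None and n > 1 and mean != 0:
            if std_error() / abs(mean) <= target_rel_error:
                return mean, std_error(), "accuracy"
    return mean, std_error(), "complete"
```

The evaluation runs once per received batch, mirroring the check made each time a query result arrives from the query execution process.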

[0042] The accuracy of the estimated value can be calculated by the estimation methods for random sampling and cluster sampling disclosed in "Dictionary of Statistics" (統計学辞典) by Kei Takeuchi, Toyo Keizai Shinposha, pp. 243-247 and 252-254. In the above embodiment, a database processing system can also be configured using a query processing scheme that provides only the query issue process or only the query conversion process.
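For orientation, the standard textbook estimators of a total and its variance under simple random sampling and one-stage cluster sampling (集落抽出法) are written out below. They are given as a general illustration and are not claimed to be the exact formulas of the cited reference; the symbols N, n, M, m, y_i, and t_j are notation introduced here.

```latex
% Simple random sampling without replacement: n of N records, values y_1, ..., y_n
\begin{align*}
\hat{T}_{\mathrm{srs}} &= \frac{N}{n}\sum_{i=1}^{n} y_i, &
\widehat{\operatorname{Var}}\bigl(\hat{T}_{\mathrm{srs}}\bigr)
  &= N^{2}\left(1-\frac{n}{N}\right)\frac{s_y^{2}}{n}, &
s_y^{2} &= \frac{1}{n-1}\sum_{i=1}^{n}\bigl(y_i-\bar{y}\bigr)^{2}
\end{align*}

% One-stage cluster sampling: m of M buckets, bucket totals t_1, ..., t_m
\begin{align*}
\hat{T}_{\mathrm{cl}} &= \frac{M}{m}\sum_{j=1}^{m} t_j, &
\widehat{\operatorname{Var}}\bigl(\hat{T}_{\mathrm{cl}}\bigr)
  &= M^{2}\left(1-\frac{m}{M}\right)\frac{s_t^{2}}{m}, &
s_t^{2} &= \frac{1}{m-1}\sum_{j=1}^{m}\bigl(t_j-\bar{t}\bigr)^{2}
\end{align*}
```

The cluster form needs the between-bucket variance of the bucket totals, a statistic that the simple random sampling form does not require.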

【００４３】[0043]

[Effects of the Invention] In the query conversion process according to the present invention, when the application order of a random extraction process and another query process is exchanged, the query is converted with the extraction unit of the random extraction process taken into account. Compared with conventional query conversion that ignores the extraction unit, the conversion can therefore also be applied to queries that include aggregation, improving the efficiency of a wider range of queries. In addition, by specifying the time required for query processing or the accuracy of the estimated query result when issuing a query, and adjusting the amount of randomly extracted records according to the size of the database and the complexity of the query, a query result with the desired response time or accuracy can be obtained easily. Furthermore, when records are randomly extracted from a table in bucket units and aggregated, and the number of records per bucket to be aggregated is sufficiently large, the accuracy of the aggregation result can be approximated by the accuracy formula for simple random extraction, so no new statistics arising from record division are required.
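Why preserving the extraction unit lets random extraction be moved ahead of an aggregation can be seen in a small Python sketch. It assumes the random extraction column equals the grouping column and that extraction is done by hash value; the function names and the SHA-256 hash are illustrative assumptions, not part of the patent text.

```python
import hashlib
from collections import defaultdict


def hash_value(value, num_values):
    """Deterministically map one column value to 0..num_values-1 (assumed hash)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_values


def group_sum(records, group_col, value_col):
    """Classification and aggregation: sum value_col for each group_col value."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[group_col]] += rec[value_col]
    return dict(totals)


def aggregate_then_extract(records, group_col, value_col, num_values, chosen):
    """Original order: aggregate the whole table, then sample the groups."""
    totals = group_sum(records, group_col, value_col)
    return {g: t for g, t in totals.items() if hash_value(g, num_values) == chosen}


def extract_then_aggregate(records, group_col, value_col, num_values, chosen):
    """Converted order: sample records by group key first, then aggregate."""
    kept = [r for r in records if hash_value(r[group_col], num_values) == chosen]
    return group_sum(kept, group_col, value_col)
```

Both orders return the same groups with the same totals, because extraction by the grouping column keeps or drops each group as a whole; the converted order simply aggregates fewer records.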

[Brief description of the drawings]

FIG. 1 is a schematic diagram of an embodiment of a database processing system according to the present invention.

FIG. 2 is an explanatory diagram of the structure of a table.

FIG. 3 is an explanatory diagram of an example database.

FIG. 4 is an explanatory diagram of the intermediate code of the query example before conversion.

FIG. 5 is an explanatory diagram of the exchange of application order between the random extraction process and classification and aggregation process-1 in the query example.

FIG. 6 is an explanatory diagram of the random extraction column conversion of the random extraction process in the query example.

FIG. 7 is an explanatory diagram of the exchange of application order between the random extraction process and join process-2 in the query example.

FIG. 8 is an explanatory diagram of the exchange of application order between the random extraction process and join process-1 in the query example.

FIG. 9 is an explanatory diagram of the intermediate code of the query example after conversion.

FIG. 10 is an explanatory diagram of the process of reading records from the record storage device.

FIG. 11 is an explanatory diagram of the process of storing records in the record storage device.

FIG. 12 is an explanatory diagram of an example join processing result in the query example.

FIG. 13 is an explanatory diagram of the processing result of aggregation process-1 in the query example.

FIG. 14 is an explanatory diagram of the processing result of aggregation process-2 in the query example.

FIG. 15 is an explanatory diagram of the records extracted by the query conversion between the random extraction process and classification and aggregation process-1.

FIG. 16 is an explanatory diagram of the records extracted by conversion of the random extraction column of the random extraction process.

FIG. 17 is an explanatory diagram of the method of converting the random extraction column of the random extraction process.

[Explanation of symbols]

1 Terminal device; 2 Query issue process; 3 Query execution management process; 4 Record management process; 5 Query issue process; 6 Query result display process; 7 Execution procedure generation process; 8 Query conversion process; 9 Query execution process; 10 Query result evaluation process; 11 Record read process; 12 Record storage device; 13 Record storage process; 20 Table; 21 Record; 22 Column; 31 Customer table; 32 Order table; 33 Product table

Continued from the front page: (72) Inventor Yori Takahashi, c/o Software Development Division, Hitachi, Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa; (72) Inventor Itaru Nishizawa, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

Claims


1. A random extraction processing method in a database processing system for extracting desired data from a database, comprising: (1) a query issue management process for issuing a query to the database; (2) a query execution management process for managing execution of the issued query; and (3) a data management process for storing data in the database and managing the stored data, wherein the query execution management process of (2) includes (2-1) a process of inserting a random extraction process into the query issued by the query issue management process, and (2-2) a query conversion process of converting the query into a query from which data can be extracted more efficiently while preserving the extraction unit of the inserted random extraction process.

2. The random extraction processing method in the database processing system according to claim 1, wherein, when the query processing executed in the query execution management process of (2) includes (A) a classification and aggregation process that, for a table in which records as data to be processed are tabulated, divides the records of the table into groups according to the values of one or more grouping columns specified in the query issued in (1) and aggregates, for each group, the values of one or more aggregation columns specified in that query, followed by (B) a random extraction process that randomly specifies a value for each of one or more random extraction columns specified in the query and extracts from the aggregation result all records having the specified values, the query conversion process of (2-2) includes a process of exchanging the application order of (A) and (B) so that process (B) is applied to the records of the table and process (A) is applied afterwards.

3. The random extraction processing method in the database processing system according to claim 2, wherein, in the query conversion process of (2-2), when process (B) is a record-unit random extraction process that specifies no random extraction column, the query is first converted into a query that uses the grouping columns of the classification and aggregation process (A) as the random extraction columns, and the conversion is then performed.

4. The random extraction processing method in the database processing system according to claim 1, wherein, when the query processing executed in the query execution management process of (2) includes (A) a grouping process that, for a table in which records as data to be processed are tabulated, divides the records of the table into groups according to the values of one or more grouping columns specified in the query issued in (1), followed by (B) a random extraction process that, for each group, randomly specifies a value for each of one or more random extraction columns specified in that query and extracts from the grouping result all records having the specified values, followed by (C) an aggregation process that, for each group, aggregates the values of one or more columns specified in that query over the random extraction result, the query conversion process of (2-2) includes a process of changing processes (A), (B), and (C) into (A1) a process that, for the table to be processed, designates the grouping columns specified in the query as sample grouping columns, divides the records of the table into groups according to the values of the sample grouping columns, and applies, for each group, a random extraction process that randomly specifies a value for each of the random extraction columns and extracts all records having the specified values, followed by (C1) a process that applies the classification and aggregation process of the query before conversion to the result of that random extraction process.

5. The random extraction processing method in the database processing system according to claim 1, wherein, when the query processing executed in the query execution management process of (2) includes (A) a join process that, for two tables to be processed, combines into one record the records of the respective tables that have equal values in one or more join columns specified in the query issued in (1), followed by (B) a random extraction process that randomly specifies a value for each of one or more random extraction columns specified in that query and extracts from the join result all records having the specified values, the query conversion process of (2-2) includes a process of changing processes (A) and (B) into (A2) a process that applies, to each of the tables to be processed, a random extraction process whose random extraction columns are limited to the columns commonly included in each of the tables, followed by a process that applies the join process on the join columns specified in the query before conversion to the extraction results from the tables.

6. The random extraction processing method in the database processing system according to claim 1, wherein, when the query processing executed in the query execution management process of (2) includes (A) a process that, for the table to be processed, divides the records of the table into groups according to the values of one or more sample grouping columns specified in the query issued in (1) and applies, for each group, a random extraction process that randomly specifies a value for each of one or more random extraction columns specified in that query and extracts all records having the specified values, followed by (B) a classification and aggregation process that divides the records of the table into groups according to the values of one or more grouping columns specified in that query and aggregates, for each group, the values of one or more aggregation columns specified in that query, the query conversion process of (2-2) includes a process of changing processes (A) and (B) into (A1) a random extraction process that is applied to the table to be processed and uses, as the new random extraction columns, the difference obtained by removing the sample grouping columns from the random extraction columns, followed by (B1) the process of (B).

7. The random extraction processing method in the database processing system described above, wherein, when the query processing executed in the query execution management process of (2) includes (A) a condition evaluation process that, for the table to be processed, extracts records of the table according to the values of one or more condition evaluation columns specified in the query issued in (1), followed by (B) a random extraction process that randomly specifies a value for each of one or more random extraction columns specified in that query and extracts from the result of the condition evaluation process all records having the specified values, the query conversion process of (2-2) includes a process of changing processes (A) and (B) into (A1) a process that applies, to the table to be processed, the random extraction process using the one or more random extraction columns, followed by (B1) a process that applies, to the extraction result, the condition evaluation process using the one or more condition evaluation columns specified in the query issued in (1).

8. The random extraction processing method in the database processing system according to claim 1, wherein, when the query processing executed in the query execution management process of (2) includes (A) a projection process that, for the table to be processed, extracts one or more projection columns specified in the query issued in (1), followed by (B) a random extraction process that randomly specifies a value for each of one or more random extraction columns specified in that query and extracts from the result of the projection process all records having the specified values, the query conversion process of (2-2) includes a process of changing processes (A) and (B) into (A1) a process that applies, to the table to be processed, the random extraction process using the random extraction columns specified in the query, followed by (B1) a process that applies, to the extraction result, the projection process using the projection columns specified in the query.

9. A random extraction processing method in a database processing system for extracting desired data from a database, comprising: (1) a query issue management process for issuing a query to the database; (2) a query execution management process for managing execution of the issued query; and (3) a data management process for storing data in the database and managing the stored data, wherein
the query issue management process of (1) comprises (1-1) a query issue process of generating a query statement according to input from a terminal device, and (1-2) a query result display process of displaying on the terminal device a query result and an evaluation result for that query result;
the query execution management process of (2) comprises (2-1) an execution procedure generation process of generating, from the query statement issued by the query issue process of (1-1), a query execution procedure into which a random extraction process is inserted and a query result evaluation criterion for evaluating the query result obtained by random extraction, (2-2) a query conversion process of converting the query execution procedure generated in (2-1) into a query from which data can be extracted more efficiently while preserving the extraction unit of the inserted random extraction process, (2-3) a query execution process of executing the query according to the query execution procedure converted in (2-2) and issuing a data read request to the data management process of (3), and (2-4) a query result evaluation process of evaluating the result of the query execution process of (2-3) according to the query result evaluation criterion generated in (2-1), passing the query result and the evaluation result to the query issue management process of (1), and controlling the query execution process of (2-3) according to the query result and the evaluation result; and
the data management process of (3) comprises (3-1) a data storage process of storing data in the database, and (3-2) a data read process of reading data according to the data read request issued by the query execution process of (2-3).

10. A random extraction processing method in a database processing system for extracting desired data from a database, comprising: (1) a query issue management process for issuing a query to the database; (2) a query execution management process for managing execution of the issued query; and (3) a data management process for storing data in the database and managing the data, wherein, when the query processing executed in the query execution management process of (2) includes a random extraction process that randomly specifies a value for each of one or more random extraction columns specified in the query issued in (1) and extracts all records having the specified values from a table in which records as data are tabulated, the data management process of (3) applies a hash function to the random extraction columns when extracting records from the table, and outputs, as the random extraction result, all records whose hash values match hash values determined at random in advance.

11. The random extraction processing method in the database processing system according to claim 10, wherein the data management process of (3) divides the records of a table into buckets, which are mutually exclusive sets of records, according to the hash value of the random extraction columns specified in the query, allocates each bucket to one or more blocks that are contiguous areas on an external record storage device and stores it there, and, when extracting records from a table in which records as data are tabulated, treats all records in a bucket as one random extraction unit so that the records in the bucket are read by sequential access to the record storage device.

12. A random extraction processing method in a database processing system for extracting desired data from a database, comprising: (1) a query issue management process for issuing a query to the database; (2) a query execution management process for managing execution of the issued query; and (3) a data management process for storing data in the database and managing the stored data, wherein the query issue management process issues a query statement that specifies introduction of a random extraction process at an appropriate position in the query processing.

13. The random extraction processing method in the database processing system according to claim 12, wherein the query issue management process of (1) specifies, in addition to the introduction of the random extraction process into the query processing, the time required for the query processing, and the query execution management process of (2) adjusts the amount of randomly extracted records according to the specified query execution time, thereby guaranteeing that the query processing completes within the time specified in the query.

14. The random extraction processing method in the database processing system according to claim 12, wherein the query issue management process of (1) specifies, in addition to the introduction of the random extraction process into the query processing, the accuracy of the estimated value of the aggregation result, and the query execution management process of (2) adjusts the number of randomly extracted records according to the specified accuracy, thereby returning an estimated value of the aggregation result with the accuracy specified when the query was issued.

15. A database processing system for extracting desired data from a database according to input information from a terminal device, comprising: (1) query issue management means for issuing a query to the database; (2) query execution management means for managing execution of the issued query; and (3) data management means for storing data in the database and managing the stored data, wherein
the query issue management means of (1) comprises (1-1) query issue means for generating a query statement according to input from the terminal device, and (1-2) query result display means for displaying on the terminal device a query result and an evaluation result for that query result;
the query execution management means of (2) comprises (2-1) execution procedure generation means for generating, from the query statement issued by the query issue means of (1-1), a query execution procedure into which a random extraction process is inserted and a query result evaluation criterion for evaluating the query result obtained by random extraction, (2-2) query conversion means for converting the query execution procedure generated by the execution procedure generation means of (2-1) into a query from which data can be extracted more efficiently while preserving the extraction unit of the inserted random extraction process, (2-3) query execution means for executing the query according to the query execution procedure converted by the query conversion means of (2-2) and issuing a data read request to the data management means of (3), and (2-4) query result evaluation means for evaluating the result of the query execution means of (2-3) according to the query result evaluation criterion generated in (2-1), passing the query result and the evaluation result to the query issue means, and controlling the query execution means of (2-3) according to the query result and the evaluation result; and
the data management means of (3) comprises (3-1) data storage means for storing data in the database, and (3-2) data reading means for reading data from the database according to the data read request issued by the query execution means of (2-3).

16. A recording medium on which a program for executing the method according to claim 9 is recorded.