JPH1078968A

JPH1078968A - Statistic data base system

Info

Publication number: JPH1078968A
Application number: JP8232202A
Authority: JP
Inventors: Gengo Suzuki; 源吾鈴木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-09-02
Filing date: 1996-09-02
Publication date: 1998-03-24

Abstract

PROBLEM TO BE SOLVED: To provide a statistical database system that takes a query statement from a user, determines the table requiring the least aggregation, and can aggregate the information specified by the query statement even when that information is not stored in the database in that exact form. SOLUTION: A syntax analysis unit 11 analyzes the syntax of a query statement sent from an application program, and the system determines whether the result of the analyzed query can be obtained from a table. An optimality calculation unit 14 calculates an optimality indicating which table of which database is best to search. A query statement matched to the destination table chosen on the basis of the calculated optimality is then generated and sent to the database 10 where the data is actually stored, and the retrieval result is returned to the user's application program.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a statistical database system that aggregates, from a database, the information a user specifies with a query statement in an application program and outputs it to the user's application program as a search result. More specifically, it relates to a statistical database system that can perform this aggregation even when the information specified by the query statement is not stored in the database in that exact form.

[0002]

[Description of the Related Art] Managing a population or monitoring the volume of traffic flowing through a network requires the management of statistical data. When statistical data is managed, more highly aggregated data (for telephony, the volume of calls flowing between Tokyo and Osaka) must be derived by aggregating the underlying primitive data (for telephony, the information about a single call), which is laborious. With a large amount of primitive data, this aggregation also consumes a great deal of computing resources. Therefore, rather than aggregating every time a user queries the data, the data is often aggregated in advance and saved to secondary storage, and queries are issued against the saved, highly aggregated data.

[0003] The data warehouse is an approach in which information within an enterprise is consolidated into a shared database, and data derived from it by statistical processing and the like is used for management decisions. In a data warehouse, statistical data about the same subject tends to accumulate at many different aggregation levels. In population management, for example, data by country, by prefecture, and by city coexist. Because a data warehouse is used by people other than those who designed the data, such users struggle to decide which data will yield the information they need most efficiently.

[0004] Database management systems based on a variety of models have been developed so far (the hierarchical model, the network model, the relational model, and the object-oriented model). With all of them, however, retrieving data requires writing out explicitly the entire procedure for obtaining the desired data.

[0005] As a way of accessing data without regard to its structure, access interfaces based on the universal relation have been studied. In this approach, the user specifies only the conditions and the desired attributes, and the system joins the tables that actually hold the data to derive the desired result. Although this approach joins tables, it does not aggregate data automatically.

[0006] Multidimensional databases are a technique that does aggregate automatically. With them, however, the user must know the data structure: the user has to judge which of several tables is the most efficient to use and write that choice explicitly into the program.

[0007]

[Problems to Be Solved by the Invention] If the procedure for retrieving data must be written explicitly, it is difficult for a user to obtain the desired information from a large volume of data whose structure the user does not know well.

[0008] In addition, when information about some subject (for example, population) is managed at several aggregation levels (for example, by prefecture and by city), the user must personally judge which information yields a correct and efficient answer. Furthermore, when the retrieval logic is coded into the application program, the application must be modified to access new data whenever data is added that can be searched faster than the data used until then.

[0009] The present invention has been made in view of the above. Its object is to provide a statistical database system that takes a query statement from a user, determines the table that requires the least aggregation, and can aggregate the information specified by the query statement even when that information is not stored in the database in that exact form.

[0010]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 comprises: a syntax analysis unit that analyzes the syntax of a query statement sent from an application program; a derivability determination unit that determines whether the result of the query analyzed by the syntax analysis unit can be obtained from a table; an optimality calculation unit that calculates an optimality indicating which table of which database is best to search; a query conversion unit that generates a query statement matched to the destination table chosen on the basis of the optimality calculated by the optimality calculation unit; a database access unit that sends the converted query statement to the database in which the data is actually stored, has it searched, and receives the search result; a search result processing unit that reshapes the database-dependent query result to fit the result expected by the original query; a database registration processing unit that registers the databases to be searched by the system and creates metadata, that is, data describing the format and content of the data they contain; a metadata storage unit that stores the table structures of the databases to be searched; a database management system that manages the data to be searched; and a database that stores the data to be searched.

[0011] In the invention according to claim 1, the syntax of a query statement sent from an application program is analyzed; it is determined whether the result of the analyzed query can be derived from a table; an optimality indicating which table of which database is best to search is calculated; a query statement matched to the destination table chosen on the basis of the calculated optimality is generated; the generated query statement is sent to the database in which the data is actually stored and executed there; and the search result is passed to the user's application program.

[0012] The invention according to claim 2 is the invention according to claim 1 in which the optimality calculation unit calculates, as the optimality, an aggregation degree, which expresses how little aggregation is needed for the derivation, and a selectivity, which estimates the fraction of the table's records that are selected; gives the aggregation degree priority within the optimality; and thereby selects the table to search so that the amount of aggregation computation is minimized.

[0013] In the invention according to claim 2, the aggregation degree, which expresses how little aggregation is needed for the derivation, and the selectivity, which estimates the fraction of the table's records that are selected, are calculated as the optimality; the aggregation degree is given priority over the selectivity; and the table to search is selected so that the amount of aggregation computation is minimized.

[0014] The invention according to claim 3 is the invention according to claim 1 in which the derivability determination unit determines derivability on the basis of a first condition, that the requested attributes are contained in the table, and a second condition, that the values appearing in the condition clause of the query are contained in the table's list of categories.

[0015] In the invention according to claim 3, derivability is determined by checking that the requested attributes are contained in the table and that the values appearing in the condition clause of the query are contained in the table's list of categories.

[0016]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0017] FIG. 1 is a block diagram showing the configuration of a statistical database system according to one embodiment of the present invention. The system shown in FIG. 1 takes a query statement specified by the user as input, determines the table that requires the least aggregation, and aggregates automatically when aggregation is needed. It comprises: a syntax analysis unit 11 that analyzes the syntax of the query statement sent from the application program 23 by the interactive query processing unit 25 provided on the user side; a query processing unit 21 made up of a control unit 12 that controls query processing as a whole, a derivability determination unit 13 that determines whether the result of the query statement analyzed by the syntax analysis unit 11 can be obtained from a table stored in the database, an optimality calculation unit 14 that calculates an optimality indicating which table of which database is best to search, and a query conversion unit 15 that generates a query statement matched to the destination table chosen on the basis of the optimality calculated by the optimality calculation unit 14; a database access unit 16 that sends the query statement converted by the query conversion unit 15 to the database in which the data is actually stored, has it searched, and receives the search result; a search result processing unit 17 that reshapes the database-dependent query result to fit the result expected by the original query; a database registration processing unit 18 that registers the databases to be searched by the system and creates metadata, that is, data describing the format and content of the data they contain; a metadata storage unit 19 that stores the table structures of the databases to be searched; a database management system 20 that manages the data to be searched; and a database 10 that stores the data to be searched.

[0018] A statistical database system configured in this way is generally used by being called from an application program written by a user who wants to use the system. The application program may be, for example, a question-answering system with an interactive interface, or an ordinary data-processing application.

[0019] The statistical database system receives a query statement sent from the application program 23 by the interactive query processing unit 25 provided on the user side, for example as an argument of a function call. The grammar of the query statement is that of SQL, the query language of relational databases, with the from clause omitted. Specifically, a query has the form

select (list of attributes) where (conditional expression)

The conditional expression follows the same grammar as SQL (for example, attribute = "value").
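
As a rough illustration of this grammar, the following Python sketch splits a from-less query into its attribute list and its conditions. The function name `parse_query`, the use of a regular expression, and the restriction to equality conditions joined by `and` are assumptions made for this example; the text does not describe how the syntax analysis unit 11 is implemented.

```python
import re


def parse_query(text):
    """Split 'select <attrs> [where <cond>]' into (attributes, conditions).

    Hypothetical helper: only equality tests joined by 'and' are handled.
    """
    m = re.fullmatch(
        r"\s*select\s+(.+?)(?:\s+where\s+(.+?))?\s*",
        text,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if m is None:
        raise ValueError("syntax error: expected 'select ... [where ...]'")
    attrs = [a.strip() for a in m.group(1).split(",")]
    conditions = {}
    if m.group(2):
        for part in re.split(r"\s+and\s+", m.group(2), flags=re.IGNORECASE):
            cm = re.fullmatch(r"\s*(\w+)\s*=\s*['\"](.*?)['\"]\s*", part)
            if cm is None:
                raise ValueError(f"syntax error in condition: {part!r}")
            conditions[cm.group(1)] = cm.group(2)
    return attrs, conditions


attrs, conds = parse_query("select gender, population where country = 'Japan'")
```

A malformed query raises `ValueError`, which loosely mirrors the error value that the system sets for the application when parsing fails.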

[0020] From this query, the query processing unit 21 determines which data should be accessed and first converts the query into SQL, the retrieval language for relational databases. If necessary, the result is then converted into the query format of the database management system 20 that manages the database 10 in which the data is actually stored. Many methods have been proposed for converting SQL into other languages, and one of them is used. From here on, therefore, the database 10 that actually stores the data is assumed to be a relational database. The converted query is passed by the database access unit 16 to the database management system 20, which returns a search result. Where necessary, the search result is reshaped to fit the original query and returned to the user's application program 23.

[0021] The data stored in the database 10 is described next. Because the database 10 is assumed to be a relational database, it manages data in tabular form (tables). A table has several attributes, which are of two kinds: attributes corresponding to the items by which statistical values are classified, and attributes that hold statistical values. The former are called classification attributes and the latter statistical attributes. A value taken by a classification attribute is called a category. FIG. 2 shows examples of tables stored in the database 10. In the world population table of FIG. 2(a), "country" and "gender" are classification attributes and "population" is a statistical attribute. "Japan", a value of country, is an example of a category.

[0022] So that aggregation can be performed automatically, the data in a table must satisfy the constraint that categories do not overlap semantically. The reason is to prevent overlapping subjects from being counted more than once during aggregation. In the example of FIG. 2, the attribute "country" has categories such as "Japan" and "USA"; for the purposes of population management nobody is both Japanese and American, so the countries do not overlap with respect to the statistical attribute "population". By contrast, suppose there were a classification attribute "region" taking values such as "Japan" and "Kanagawa Prefecture". A person can be both Japanese and a resident of Kanagawa Prefecture, so these categories overlap semantically. Aggregating over them would count such a person twice, which is inappropriate. Tables of this kind are not permitted.

[0023] To select at run time the table that gives the shortest query time, the system stores information about the tables and the values they hold (metadata) in the metadata storage unit 19. FIG. 3(a) shows the metadata management format together with an example.

[0024] For each table actually stored in the database 10, the metadata management table records the classification attributes and statistical attributes of that table and, for each classification attribute, pairs it with the list of values (categories) the attribute holds. In the example of FIG. 2 there are two tables, world population and Japanese population. The world population table has the classification attributes country and gender, and all of their values are recorded; it also has one statistical attribute, population. The Japanese population table is recorded with the classification attributes country, gender, and prefecture, and for country only the value Japan is recorded. The Japanese population table itself has no country attribute, but the metadata management table is made to contain a country attribute with the single value Japan. This is interpreted as the database having a country attribute that is omitted because it always takes the value Japan. A classification attribute with only one category therefore has no corresponding attribute in the actual database. Whether to register country as a classification attribute in the metadata for the Japanese population table in this example is left to the judgment of the administrator of the statistical database system.

[0025] As shown in FIG. 3(b), the category management table records, for each classification attribute, the list of values it can take. In the example of FIG. 2, the classification attribute country can take 100 categories as values: Japan, USA, and so on.

[0026] As shown in FIG. 3(c), the aggregate function management table records, for each statistical attribute, the expression used to derive it when aggregation is performed. This is needed because the derivation method differs from one statistical attribute to another. For population, a simple sum suffices; for average age, it is not correct simply to average the averages, and the average must instead be weighted by population. The aggregate function management table therefore holds pairs of a statistical attribute and its calculation expression.
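
The difference between the two kinds of derivation can be checked with a small calculation. The sketch below uses made-up figures for two prefectures (illustrative values, not data from the patent), and the helper names `total_population` and `weighted_average_age` are hypothetical.

```python
def total_population(rows):
    """Population aggregates by a plain sum."""
    return sum(r["population"] for r in rows)


def weighted_average_age(rows):
    """Average age aggregates as sum(age * population) / sum(population)."""
    weighted = sum(r["average_age"] * r["population"] for r in rows)
    return weighted / total_population(rows)


# Illustrative figures only.
rows = [
    {"prefecture": "A", "population": 100, "average_age": 30.0},
    {"prefecture": "B", "population": 300, "average_age": 50.0},
]

naive = sum(r["average_age"] for r in rows) / len(rows)  # ignores group sizes
correct = weighted_average_age(rows)                     # weighted by population
```

With these figures the unweighted mean of the two averages is 40.0, while the population-weighted value is 45.0, because the larger prefecture contributes three times as many people.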

[0027] The details of query processing are described next.

[0028] The query statement is parsed by the syntax analysis unit 11. If the syntax is wrong, an error value is set so that the application can refer to it. If there is no error, the parse result is passed to the query processing unit 21.

[0029] Processing in the query processing unit 21 is described with reference to the flowchart of FIG. 4. First, so that the following processing can be applied to every table, one table t is taken up (step S11). It is then determined whether no table t remains, or whether the maximum optimality Omax calculated in the processing below has reached its largest possible value (step S12). All parameters are initialized in these steps (steps S11 and S12).

[0030] Next, it is determined whether the query result can be derived from the table (step S13). If it cannot, processing returns to step S12 and repeats; if it can, the optimality Ot of table t is calculated (step S14). It is then checked whether the calculated optimality Ot is greater than Omax, the maximum optimality calculated so far (step S15). If it is not greater, processing returns to step S11 and repeats for the next table. If it is greater, Omax is set to Ot (step S16), and processing returns to step S11 and repeats for the next table. This is repeated for all tables, and when no table t remains to be processed, processing proceeds to step S17.

[0031] In step S17, it is checked whether the result is underivable from every table t. If so, a response is returned stating that the database does not contain the data (step S18), and processing ends. Otherwise, that is, if at least one table is derivable, a query against the table with the greatest optimality calculated as described above is generated and executed on the database (step S19), and processing ends.
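
The loop of steps S11 to S19 can be sketched in Python as follows. This is a minimal reading of the flowchart under stated assumptions: `choose_table`, `is_derivable`, `optimality`, and `upper_bound` are hypothetical names, the two callables stand in for the derivability determination unit 13 and the optimality calculation unit 14, and an optimality is assumed to be any value that supports `>` comparison.

```python
def choose_table(tables, is_derivable, optimality, upper_bound=None):
    """Return the derivable table with the greatest optimality, or None."""
    best_table, best_score = None, None
    for t in tables:                                  # S11: take up the next table
        # S12: stop early once Omax has reached its largest possible value.
        if upper_bound is not None and best_score is not None \
                and best_score >= upper_bound:
            break
        if not is_derivable(t):                       # S13: table cannot answer
            continue
        score = optimality(t)                         # S14: compute Ot
        if best_score is None or score > best_score:  # S15: compare with Omax
            best_table, best_score = t, score         # S16: Omax is set to Ot
    # S17: None means no table was derivable (S18); otherwise query it (S19).
    return best_table


derivable = {"world_population": True, "japanese_population": True, "other": False}
scores = {
    "world_population": (1.0, 0.01),
    "japanese_population": (0.5, 1.0),
    "other": (1.0, 1.0),
}
chosen = choose_table(list(derivable), derivable.get, scores.get)
```

Returning `None` leaves it to the caller to produce the "data not in the database" response of step S18.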

[0032] Derivability is determined by the following two conditions.

[0033] (Condition 1) The requested attributes are contained in the table.

[0034] (Condition 2) The values appearing in the condition clause of the query are contained in the table's list of categories.

[0035] Condition 1 is easily checked by comparing the list of attributes in the metadata management table with the list of requested attributes. To check Condition 2, the condition clause of the query is first applied to the category management table to obtain the list of category values being requested. It then suffices to test whether that list is contained in the list of category values in the metadata management table.
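
The two checks can be sketched as set operations. In the Python example below, the layout of `METADATA`, the identifier names, and the function `is_derivable` are assumptions for illustration; the category lists are abbreviated (two countries instead of 100), and the `average_age` attribute on the Japanese population table is taken from the later FIG. 6 example. For simplicity, the conditions are passed as already-resolved sets of category values rather than being evaluated against a category management table.

```python
# Hypothetical in-memory stand-in for the metadata management table.
METADATA = {
    "world_population": {
        "classification": {
            "country": {"Japan", "USA"},
            "gender": {"male", "female"},
        },
        "statistical": {"population"},
    },
    "japanese_population": {
        "classification": {
            "country": {"Japan"},
            "gender": {"male", "female"},
            "prefecture": {"Tokyo", "Kanagawa"},
        },
        "statistical": {"population", "average_age"},
    },
}


def is_derivable(table_meta, selected, conditions):
    """Apply Condition 1 and Condition 2 to one table's metadata."""
    classification = table_meta["classification"]
    known = set(classification) | table_meta["statistical"]
    # Condition 1: every requested attribute is contained in the table.
    if not set(selected) <= known:
        return False
    # Condition 2: every value in the condition clause is one of the
    # categories recorded for that attribute.
    for attr, values in conditions.items():
        if attr not in classification or not set(values) <= classification[attr]:
            return False
    return True


query_attrs = ["gender", "population"]
query_conds = {"country": {"Japan"}}
result = {name: is_derivable(meta, query_attrs, query_conds)
          for name, meta in METADATA.items()}
```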

[0036] The optimality is calculated for each table that passes the derivability test. The optimality can be defined as the pair of an aggregation degree and a selectivity.

[0037] The aggregation degree is a number expressing how little aggregation is needed for the derivation. Let s be the number of classification attributes selected in the query, c the number of classification attributes specified in the query conditions, and a the number of classification attributes of the table. Then

(aggregation degree) = s / (a − c)

When the number of classification attributes selected in the query matches the number of classification attributes of the table not fixed by the conditions (s = a − c), no aggregation is needed at all, and the aggregation degree takes its maximum value of 1. The more the table's classification attributes outnumber those selected in the query, the more aggregation is needed, and the lower the aggregation degree.
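
A direct transcription of this formula into Python is shown below as a small sketch. The function name `aggregation_degree` and the guard against a non-positive denominator are assumptions; the text does not say how the case a = c is handled.

```python
def aggregation_degree(s, a, c):
    """Return s / (a - c): 1 means no aggregation is needed."""
    free = a - c
    if free <= 0:
        # Assumption: treat a table with no unfixed classification
        # attribute as an error rather than guessing a value.
        raise ValueError("a - c must be positive")
    return s / free


one_of_one = aggregation_degree(s=1, a=2, c=1)   # all unfixed attributes selected
one_of_two = aggregation_degree(s=1, a=2, c=0)   # one of two attributes selected
```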

[0038] The selectivity estimates the fraction of the table's records that are selected. For a single attribute, let C be the number of categories specified in the query condition and A the total number of categories the table contains for that classification attribute. Then

(selectivity for one attribute) = C / A

For the query as a whole, let s1, s2, ..., sn be the selectivities of the attributes in and conditions and t1, t2, ..., tm those of the attributes in or conditions. Then

(selectivity) = (1 − Π(1 − ti)) Π si

(each Π denotes a product, over the m values ti and the n values si respectively).
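
The selectivity formula can be sketched in Python as follows. The function names are hypothetical. One point is an assumption rather than something the formula states: when the query has no or conditions, the or factor is left out instead of being evaluated as 1 − 1 = 0, which is how the worked example later in the description treats a query with a single condition.

```python
import math


def attribute_selectivity(num_specified, num_in_table):
    """C / A for one classification attribute."""
    return num_specified / num_in_table


def query_selectivity(and_sel, or_sel=()):
    """(1 - prod(1 - t_i)) * prod(s_i) over the or and the and conditions."""
    and_part = math.prod(and_sel)
    if not or_sel:
        # Assumption: with no or conditions the or factor is omitted.
        return and_part
    or_part = 1 - math.prod(1 - t for t in or_sel)
    return or_part * and_part


single = query_selectivity([attribute_selectivity(1, 100)])
mixed = query_selectivity([0.5], or_sel=[0.5, 0.5])
```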

[0039] Aggregation consumes far more computing resources than selection, so the aggregation degree takes priority in the optimality. That is, when two optimalities (a1, b1) and (a2, b2) are compared, where ai is an aggregation degree and bi a selectivity, (a1, b1) > (a2, b2) holds unconditionally if a1 > a2. If a1 = a2, then (a1, b1) > (a2, b2) holds if b1 > b2.
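
This ordering is a lexicographic comparison of the pair, which is how Python compares tuples. The short sketch below relies on that built-in behaviour; the function name `is_better` and the sample values are illustrative.

```python
def is_better(o1, o2):
    """True if optimality o1 = (a1, b1) ranks above o2 = (a2, b2)."""
    # Tuples compare element by element: the aggregation degree decides
    # first, and the selectivity only breaks ties.
    return o1 > o2


higher_aggregation = is_better((1.0, 0.01), (0.5, 1.0))
tie_broken_by_selectivity = is_better((0.5, 1.0), (0.5, 0.2))
best = max([(0.5, 1.0), (1.0, 0.01), (0.5, 0.2)])
```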

[0040] Once the table with the highest optimality (call it t) has been selected, a query against that table is generated. The query is as follows.
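
The query template itself does not appear in this text. Judging only from the worked examples given later in the description, one possible reading of the generation step is sketched below in Python. Everything here is an assumption: the function `build_sql`, the `AGGREGATES` mapping standing in for the aggregate function management table, the underscore identifiers, and the rule that conditions on attributes the table does not physically store are dropped.

```python
# Hypothetical stand-in for the aggregate function management table.
AGGREGATES = {
    "population": "sum(population)",
    "average_age": "sum(average_age * population) / sum(population)",
}


def build_sql(table, class_columns, selected, conditions):
    """Build SQL for `table`, whose physical classification columns are given."""
    group_cols = [a for a in selected if a in class_columns]
    # Conditions on attributes that exist only in the metadata (single
    # category, no physical column) are dropped.
    where = {a: v for a, v in conditions.items() if a in class_columns}
    free_cols = [c for c in class_columns if c not in where]
    needs_grouping = set(group_cols) != set(free_cols)
    items = [a if (a in class_columns or not needs_grouping) else AGGREGATES[a]
             for a in selected]
    sql = f"select {', '.join(items)} from {table}"
    if where:
        # Illustration only: real code should use bound parameters.
        sql += " where " + " and ".join(f"{a} = '{v}'" for a, v in where.items())
    if needs_grouping and group_cols:
        sql += " group by " + ", ".join(group_cols)
    return sql


q_world = build_sql("world_population", ["country", "gender"],
                    ["gender", "population"], {"country": "Japan"})
q_japan = build_sql("japanese_population", ["gender", "prefecture"],
                    ["gender", "population"], {"country": "Japan"})
```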

[0041] Next, an example of a query using the statistical database system of this embodiment is described with reference to FIG. 5. Assume that the database actually stores two tables, the world population table and the Japanese population table shown in FIGS. 2(a) and 2(b).

[0042] Suppose the application program passes the query

select gender, population where country = 'Japan'

meaning "find the population of Japan by gender". Derivability is determined first.

[0043] (Condition 1) The requested attributes are contained in the table.

[0044] Both tables clearly satisfy this condition.

[0045] (Condition 2) The values appearing in the condition clause of the query are contained in the table's list of categories.

[0046] The condition clause of the query is country = 'Japan'. Looking this up in the metadata management table shows that the world population table manages all countries and the Japanese population table manages the country 'Japan', so both tables satisfy the condition.

[0047] Next, the optimality is calculated.

[0048]
    (aggregation degree for world population) = s / (a - c) = 1 / (2 - 1) = 1
    (aggregation degree for Japanese population) = 1 / (2 - 0) = 0.5
    (selectivity for world population) = C / A = 1 / 100 = 0.01
    (selectivity for Japanese population) = 1 / 1 = 1

Because the aggregation degree takes priority over the selectivity, the world population table is selected as the search target. The attributes selected by the query are equal to the attributes of t, so no aggregate function needs to be applied and no group by clause is required. The query statement finally generated is

    select gender, population from world population where country = 'Japan'
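The arithmetic above can be reproduced with a short Python sketch. The parameters `s`, `a`, `c` and the record counts are simply the numbers used in the example. Using the selectivity only as a tie-breaker, with a higher value preferred, is an assumption inferred from the example rather than something this passage states.

```python
from fractions import Fraction


def aggregation_degree(s, a, c):
    # s / (a - c), using the symbols of the worked example
    return Fraction(s, a - c)


def selectivity(selected_records, total_records):
    # C / A: share of the table's records that the query selects
    return Fraction(selected_records, total_records)


# (aggregation degree, selectivity) for each candidate table
candidates = {
    "world_population": (aggregation_degree(1, 2, 1), selectivity(1, 100)),
    "japanese_population": (aggregation_degree(1, 2, 0), selectivity(1, 1)),
}

# Tuples compare element by element, so the aggregation degree dominates
# and the selectivity only breaks ties (assumed: higher is better).
best = max(candidates, key=candidates.get)
print(best)
```

Exact fractions are used instead of floats so that equal aggregation degrees compare as equal and the tie-break behaves predictably.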

[0049] Alternatively, if the database held only the Japanese population table, the attributes selected by the query would be fewer than the attributes of t, so grouping is required. The aggregate function registered in the aggregate function management table is applied, and the generated query is

    select gender, sum(population) from Japanese population group by gender
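A minimal Python sketch of this query generation step follows. The function `build_query`, its parameters, and the column name `age_group` are hypothetical. The rule "group when fewer attributes are selected than remain after removing the attributes fixed by the condition clause" is one reading of the two examples, not a definitive statement of the patent's algorithm.

```python
def build_query(table, table_attrs, selected, measure, conditions, aggregate="sum"):
    """Generate the query text for the chosen table (illustrative only)."""
    # Keep only the conditions that refer to columns of this table.
    usable = {k: v for k, v in conditions.items() if k in table_attrs}
    # Attributes of the table that the condition clause does not fix.
    free_attrs = [a for a in table_attrs if a not in usable]
    # Grouping is needed when the query selects fewer attributes than remain.
    needs_grouping = len(selected) < len(free_attrs)

    measure_expr = f"{aggregate}({measure})" if needs_grouping else measure
    sql = f"select {', '.join(selected)}, {measure_expr} from {table}"
    if usable:
        # Plain string formatting is for illustration; real code should
        # pass values as bound parameters.
        sql += " where " + " and ".join(f"{k} = '{v}'" for k, v in usable.items())
    if needs_grouping:
        sql += " group by " + ", ".join(selected)
    return sql


cond = {"country": "Japan"}
print(build_query("world_population", ["country", "gender"], ["gender"], "population", cond))
print(build_query("japanese_population", ["gender", "age_group"], ["gender"], "population", cond))
```

With these inputs the first call yields a plain select with a where clause, and the second yields the grouped `sum(population)` form.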

[0050] Next, another example is described with reference to FIG. 6. In this case, the query asks for the average age of the Japanese population by gender.

[0051] Suppose the query is

    select gender, average age where country = 'Japan'

In the derivability determination, the world population table, which has no average age attribute, is excluded as a query target. Since the attributes selected by the query are fewer than the attributes of t, grouping is required. The aggregate function registered in the aggregate function management table is applied, and the query finally generated is

[Equation 1]
    select gender, sum(average age × population) / sum(population) from Japanese population group by gender
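The population-weighted average in Equation 1 can be tried out with Python's built-in `sqlite3` module. The table layout, the `age_group` column, and all figures below are invented for illustration and do not come from the patent's figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table japanese_population "
    "(gender text, age_group text, population integer, average_age real)"
)
# Made-up rows: (gender, age_group, population, average_age)
conn.executemany(
    "insert into japanese_population values (?, ?, ?, ?)",
    [
        ("male", "young", 100, 20.0),
        ("male", "old", 300, 60.0),
        ("female", "young", 300, 30.0),
        ("female", "old", 100, 70.0),
    ],
)

# Weighted average per gender: each row's average age is weighted by the
# population it describes, as in Equation 1.
rows = conn.execute(
    "select gender, sum(average_age * population) / sum(population) "
    "from japanese_population group by gender order by gender"
).fetchall()
print(rows)  # [('female', 40.0), ('male', 50.0)]
```

An unweighted `avg(average_age)` would give 50.0 and 40.0 for female and male here, which shows why the aggregate function registered for an average-type attribute has to weight by population.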

[0052] As described above, in this embodiment, even if the information that the user specifies with a query statement in an application program is not stored in that exact form, it can be derived by aggregation and the desired information can be passed to the user's application program as the search result. When data is retrieved, the table to be searched can be selected so that the amount of aggregation computation is minimized. Furthermore, the database can be accessed with search statements written in a language that has the form of SQL, the query language of relational database systems, with the from clause omitted. In addition, aggregation can be performed automatically provided that the tables stored in the database satisfy certain constraints.
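As an illustration of the from-less query form, the following Python sketch splits such a statement into the requested attributes and the conditions. The function `parse_query` is hypothetical and supports only equality conditions joined by `and`; the actual grammar handled by the syntax analysis unit is not specified here.

```python
import re


def parse_query(text):
    """Parse 'select <attrs> [where <a> = '<v>' and ...]' with no from clause."""
    m = re.fullmatch(r"\s*select\s+(.+?)(?:\s+where\s+(.+))?\s*", text,
                     flags=re.IGNORECASE)
    if m is None:
        raise ValueError("not a from-less select statement")
    attrs = [a.strip() for a in m.group(1).split(",")]
    conditions = {}
    if m.group(2):
        for part in re.split(r"\s+and\s+", m.group(2), flags=re.IGNORECASE):
            name, value = part.split("=", 1)
            conditions[name.strip()] = value.strip().strip("'")
    return attrs, conditions


print(parse_query("select gender, population where country = 'Japan'"))
```

The resulting attribute list and condition dictionary are the inputs that a derivability check and a query generator would need.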

[0053]

[Effects of the Invention] As described above, according to the present invention, the syntax of a query statement sent from an application program is analyzed, and a derivability determination is made as to whether the result of the analyzed query can be obtained from a table. An optimality indicating which table of which database is best to search is calculated, a query statement matched to the target table determined from the calculated optimality is generated, and the generated query statement is sent to the database in which the data is actually stored. The search result is then passed to the user's application program. Consequently, the user can access the database without being aware of its structure, and the optimal search can be selected. Moreover, even when data with a higher aggregation degree is added in order to increase access speed, the application program does not need to be changed.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a statistical database system according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of data stored in the database of the statistical database system of FIG. 1.

FIG. 3 is a diagram showing a method of managing metadata in the statistical database system of FIG. 1, together with an example.

FIG. 4 is a flowchart showing query processing in the statistical database system of FIG. 1.

FIG. 5 is a diagram showing an example of a query using the statistical database system of FIG. 1.

FIG. 6 is a diagram showing an example of a query using the statistical database system of FIG. 1.

[Explanation of symbols]

10 Database
11 Syntax analysis unit
12 Control unit
13 Derivability determination unit
14 Optimality calculation unit
15 Query conversion unit
16 Database access unit
17 Search result processing unit
18 Database registration processing unit
19 Metadata storage unit
20 Database management system
23 Application program
25 Interactive query processing unit

Claims

[Claims]

1. A statistical database system comprising: a syntax analysis unit that analyzes the syntax of a query statement sent from an application program; a derivability determination unit that determines whether the result of the query analyzed by the syntax analysis unit can be obtained from a table; an optimality calculation unit that calculates an optimality indicating which table of which database is best to search; a query conversion unit that generates a query statement matched to the target table determined from the optimality calculated by the optimality calculation unit; a database access unit that sends the converted query statement to the database in which the data is actually stored and receives the search result; a search result processing unit that shapes the database-dependent query result into a form matching the result of the original query; a database registration processing unit that registers a database and creates metadata, which is data representing the format and contents of the data it contains; a metadata storage unit that stores the structure of the database tables to be searched; a database management system that manages the database; and a database that stores the data to be searched.

2. The statistical database system according to claim 1, wherein the optimality calculation unit calculates, as the optimality, an aggregation degree indicating how few aggregations are needed for derivation and a selectivity that measures the proportion of the whole table occupied by the selected records, and, giving priority to the aggregation degree, selects the table to be searched so that the amount of aggregation computation is minimized.

3. The statistical database system according to claim 1, wherein the derivability determination unit determines derivability based on a first condition that the requested attributes are included in the table and a second condition that the values contained in the condition clause of the query are included in the category list of the table.