JPS6258332A

JPS6258332A - Optimizing system for partial inquiry of decentralized data base control system

Info

Publication number: JPS6258332A
Application number: JP60198762A
Authority: JP
Inventors: Teruo Nakada; 中田　輝生; Masayoshi Tezuka; 手塚　正義
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1985-09-09
Filing date: 1985-09-09
Publication date: 1987-03-14

Abstract

PURPOSE: To simplify cost calculation by deciding, for each join operation, whether to omit it for the time being, perform it as a semi-join, or perform it as a join, calculating the cost only for the partial query under consideration and choosing the minimum-cost option. CONSTITUTION: When a command to join tables A, B and C is given, three cases are first examined for two of the tables, A and B: (1) the join is omitted for the time being and performed later, (2) a semi-join is performed first and the join is performed later, and (3) the join is performed from the outset. The minimum-cost case among the three is selected. If the cost calculation shows case (2) to be best, the semi-join result A⋉B and table C are examined next, and case (1) is judged best. Finally, the join result (A⋉B)⋈C and table B are examined, and case (1) is again judged preferable. Each step is then carried out according to its decision.

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to an optimization method that is carried out, in a distributed database management system, at the same time as a query command entered by a user is decomposed into partial queries to be processed on the individual computers.

[Conventional technology]

In a distributed database management system, several computers each hold a database and are interconnected by communication lines, and the databases are used together to perform integrated data processing. When such a system executes a query command given by a user, it decomposes the command into partial queries that can each be executed on an individual computer and then executes them. The decomposition is chosen so as to minimize the amount of data transferred and the amount of processing inside the computers; this is what is meant by optimization.

The main operations performed by a distributed database management system are joins and semi-joins, and the work of the individual computers likewise consists mainly of data transmission, join processing and semi-join processing. A database may contain, for example, the employee table of Fig. 6(a) and the family table of Fig. 6(b). A join is the operation of matching (combining) such tables to create a new, larger table like the one shown in Fig. 7. Fig. 7 is keyed on the employee number of Fig. 6 and lists each employee's name and workplace (held in the employee table) together with the names and relationships of the employee's family members (held in the family table). The command that performs this join is "join the employee table and the family table on employee number." Note that employees who appear in Fig. 6(a) but not in Fig. 6(b) (that is, employees without family members) are dropped from the join result (Fig. 7).
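The join just described can be sketched in a few lines of Python. The rows are hypothetical stand-ins for Figs. 6(a) and 6(b), whose contents are not reproduced in the text, and the code is an ordinary in-memory equi-join on employee number, not the patent's implementation.

```python
# Hypothetical rows standing in for Fig. 6(a) and Fig. 6(b).
EMPLOYEES = [  # employee table: (employee_no, name, workplace)
    (1, "Sato", "Tokyo"),
    (2, "Suzuki", "Osaka"),
    (3, "Tanaka", "Nagoya"),
]
FAMILY = [  # family table: (employee_no, family_name, relationship)
    (1, "Hanako", "wife"),
    (1, "Taro", "son"),
    (3, "Yoko", "daughter"),
]


def join(employees, family):
    """Equi-join on employee_no, producing rows shaped like Fig. 7."""
    # Index the family rows by employee number (a simple hash join).
    by_emp = {}
    for emp_no, fam_name, relation in family:
        by_emp.setdefault(emp_no, []).append((fam_name, relation))
    result = []
    for emp_no, name, workplace in employees:
        # An employee with no family rows contributes no output rows.
        for fam_name, relation in by_emp.get(emp_no, []):
            result.append((emp_no, name, workplace, fam_name, relation))
    return result


if __name__ == "__main__":
    for row in join(EMPLOYEES, FAMILY):
        print(row)
```

Employee 2 (Suzuki) has no family rows and therefore does not appear in the output, matching the observation about Fig. 7.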

Such a join is useful when retrieving data on employees who have family members. A semi-join is an operation that extracts from the join result only the data belonging to the columns of one of the original tables; Fig. 8 shows an example.

In a semi-join, only the field of interest (the join column) is sent in order to perform the join. Fig. 8 shows the group of columns of Fig. 6(a) (the columns of the employee table) extracted from the join result of Fig. 7.
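A minimal sketch of the semi-join, assuming the same hypothetical rows as before: the family side ships only its join column, and the employee side keeps the matching employee rows with employee-table columns only. Function names are illustrative, not from the patent.

```python
# Hypothetical rows standing in for Fig. 6(a) and Fig. 6(b).
EMPLOYEES = [(1, "Sato", "Tokyo"), (2, "Suzuki", "Osaka"), (3, "Tanaka", "Nagoya")]
FAMILY = [(1, "Hanako", "wife"), (1, "Taro", "son"), (3, "Yoko", "daughter")]


def join_keys(family):
    """What the family side sends: only the join column (employee_no)."""
    return {emp_no for emp_no, _, _ in family}


def semi_join(employees, keys):
    """employees semi-join family: employee rows with a matching number.

    Only employee-table columns are kept, and each employee appears once,
    which is the Fig. 8 style of result.
    """
    return [row for row in employees if row[0] in keys]


if __name__ == "__main__":
    print(semi_join(EMPLOYEES, join_keys(FAMILY)))
```

Only the set of employee numbers crosses between the two sides, which is why a semi-join can be cheaper to ship than a full table.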

The result differs from Fig. 6(a) in that employees without family members are missing. A join is carried out when the user enters a command for it, whereas a semi-join is not specified by the user: the system performs it internally while executing the user's command.

[Problem that the invention seeks to solve]

When a user requests a join (specifying which tables are to be joined on which columns), the system decides which joins or semi-joins to perform, and in what order, on the basis of minimum cost. Conventionally, this decision always considered the schedule as a whole. For example, given a schedule that semi-joins ×× with ××, then semi-joins ×× with ××, and finally joins ×× with ×× to obtain the answer, the system would calculate the cost of that entire schedule, calculate the cost of each of the other schedules obtained by changing the combinations (there are generally several), compare them, and adopt and execute the best (lowest-cost) one. This approach, however, requires a large amount of cost calculation and is very cumbersome.
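To see why whole-schedule costing grows quickly, here is a simplified, hypothetical model of the conventional search: a schedule is a left-deep join order plus one of the three cases (defer, semi-join first, join at once) per join step, and every schedule is costed as a whole. The patent does not define schedules this precisely; the model, the function names and the caller-supplied `schedule_cost` are assumptions.

```python
from itertools import permutations, product

CASES = (1, 2, 3)  # (1) defer the join, (2) semi-join first, (3) join at once


def conventional_schedules(tables):
    """Yield every (order, cases) schedule in this simplified model.

    Orders differing only in the first two tables count as the same schedule,
    so two tables give exactly the three cases.
    """
    seen = set()
    for order in permutations(tables):
        key = (frozenset(order[:2]), order[2:])
        if key in seen:
            continue
        seen.add(key)
        for cases in product(CASES, repeat=len(tables) - 1):
            yield order, cases


def cheapest(tables, schedule_cost):
    """Cost every schedule as a whole and keep the cheapest one."""
    return min(conventional_schedules(tables), key=lambda s: schedule_cost(*s))


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, sum(1 for _ in conventional_schedules("ABCD"[:n])))
```

Under this model, two tables give 3 schedules, three tables give 27 and four tables give 324, each of which must be costed end to end.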

The present invention improves on this point and aims to simplify the cost calculation.

[Means for solving problems]

The present invention is a partial-query optimization method used, in a distributed database management system built on a plurality of computers and a communication network connecting them, when a query entered by a user that includes joins of a plurality of databases is decomposed into groups of partial queries executed on the individual computers. For two computers, each holding a database or a processing result, the partial query to be executed is treated as one of the following:
(1) omit the join for the time being and perform it later;
(2) process it as a semi-join for the time being and perform the join later;
(3) perform the join from the outset.
The cost of each is calculated and the lowest-cost option is executed. Then, between the computer holding that processing result and a computer holding another database, the costs of cases (1), (2) and (3) are calculated again and the lowest-cost option is executed. The same processing is repeated until the answer to the user's query is obtained.
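The step-by-step procedure above can be sketched as a greedy loop that only ever compares the three cases for the current pair of operands. The cost model `step_cost(left, right, case)` is a hypothetical caller-supplied function, and the label `(left*right)` is an illustrative stand-in for "the processing result" that becomes the next operand.

```python
from typing import Callable, List, Tuple

CASES = (1, 2, 3)  # (1) defer the join, (2) semi-join first, (3) join at once


def greedy_plan(
    tables: List[str],
    step_cost: Callable[[str, str, int], float],
) -> List[Tuple[str, str, int, float]]:
    """Decide each two-operand step locally; whole schedules are never costed."""
    plan = []
    left = tables[0]
    for right in tables[1:]:
        # Cost only the three options for this pair and keep the cheapest.
        costs = {case: step_cost(left, right, case) for case in CASES}
        case = min(costs, key=costs.get)
        plan.append((left, right, case, costs[case]))
        # The processing result becomes one operand of the next step.
        left = f"({left}*{right})"
    return plan


if __name__ == "__main__":
    flat = lambda l, r, c: {1: 5.0, 2: 3.0, 3: 4.0}[c]
    for step in greedy_plan(["A", "B", "C"], flat):
        print(step)
```

With n tables this evaluates 3(n - 1) step costs, compared with the growing number of complete schedules in the conventional search.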

[Operation]

As shown in Fig. 1, when table A and table B are to be joined (as requested by a command), there are three processing methods. Assume that table A is on computer A, table B is on computer B, and the join result is to be obtained on computer C.

Method (1), shown in Fig. 2, omits the immediate join (the join or semi-join between computers A and B) and sends tables A and B unchanged from computers A and B to computer C, where they are joined. The amount of data transferred is $V(A) + V(B)$, where $V(\cdot)$ denotes an amount of data, and the internal processing amount is the work of the join $A \bowtie B$. All of these are subject to the cost calculation.

Method (2), shown in Fig. 3, performs the immediate join as a semi-join and then sends the tables to computer C, where they are joined again. The amount of data transferred is $V(A) + V(a) + V(B \ltimes A)$, and the internal processing amount is the sum of the work of computing $B \ltimes A$ and the work of computing $A \bowtie (B \ltimes A)$. All of these are subject to the cost calculation. Here $\ltimes$ denotes a semi-join, $\bowtie$ denotes a join, and $a$ denotes the column of interest for the join. The amount of data sent from computer A to computer B is $V(a)$, the amount sent from computer A to computer C is $V(A)$, and what computer B sends to computer C is the semi-join result, whose amount is $V(B \ltimes A)$.

To explain case (2) with the example of Figs. 6 to 8, suppose the family table is on computer A and the employee table is on computer B. The desired result is the table of employees with families in Fig. 7, from which employees without families are excluded. Computer A therefore sends the employee numbers (the column of interest) to computer B; this is the information on who has family members, and the amount transmitted is $V(a)$. Computer B extracts only the rows (employee number, name, workplace) of the employees whose numbers arrived from computer A (this is the semi-join result) and sends them to computer C. Computer C, which also receives the family table from computer A, matches the two to obtain the join result of Fig. 7.
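The case (2) walkthrough can be simulated with three hypothetical sites. This sketch assumes that $V(\cdot)$ is measured as the number of field values shipped (the patent does not fix a unit) and reuses made-up rows in place of Fig. 6.

```python
# Hypothetical contents: family table on computer A, employee table on
# computer B; the answer is assembled on computer C.
FAMILY_AT_A = [(1, "Hanako", "wife"), (1, "Taro", "son"), (3, "Yoko", "daughter")]
EMPLOYEES_AT_B = [(1, "Sato", "Tokyo"), (2, "Suzuki", "Osaka"), (3, "Tanaka", "Nagoya")]


def volume(rows):
    """Stand-in for V(.): the number of field values shipped (an assumption)."""
    return sum(len(row) for row in rows)


def case2(family, employees):
    """Simulate case (2): semi-join at computer B, final join at computer C."""
    transfers = {}
    # A -> B: only the column of interest (employee numbers), i.e. V(a).
    keys = {row[0] for row in family}
    transfers["A->B"] = len(keys)
    # At B: the semi-join keeps employees whose number arrived from A.
    reduced = [row for row in employees if row[0] in keys]
    # B -> C: the semi-join result, i.e. V(B semi-join A).
    transfers["B->C"] = volume(reduced)
    # A -> C: the family table as it is, i.e. V(A).
    transfers["A->C"] = volume(family)
    # At C: match the two on employee number to get a Fig. 7-style result.
    result = [emp + fam[1:] for emp in reduced for fam in family if emp[0] == fam[0]]
    return result, transfers


if __name__ == "__main__":
    rows, shipped = case2(FAMILY_AT_A, EMPLOYEES_AT_B)
    print(shipped, sum(shipped.values()))
```

With these rows, case (2) ships 2 + 6 + 9 = 17 values, while shipping both tables whole as in case (1) would ship 9 + 9 = 18. The saving grows as more employees lack family rows.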

Method (3), shown in Fig. 4, performs the join immediately from the outset. The amount of data transferred is $V(A) + V(A \bowtie B)$, and the internal processing amount is the work of computing $A \bowtie B$. All of these are subject to the cost calculation.
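The three cost formulas can be put side by side as a small function that picks the cheapest case. The patent does not say how transfer and processing amounts are combined, so the weight `w` and all field names of the hypothetical `PairEstimates` record are assumptions, and the estimates themselves must come from the caller.

```python
from dataclasses import dataclass


@dataclass
class PairEstimates:
    """Caller-supplied estimates for joining A (on computer A) with B (on B).

    Field names are hypothetical; v_* are data volumes V(.), p_* processing amounts.
    """
    v_A: float            # V(A)
    v_B: float            # V(B)
    v_a: float            # V(a): join column of A sent to computer B
    v_B_semi_A: float     # V(B semi-join A)
    v_A_join_B: float     # V(A join B)
    p_A_join_B: float     # work of A join B
    p_B_semi_A: float     # work of B semi-join A
    p_A_join_semi: float  # work of A join (B semi-join A)


def case_costs(e: PairEstimates, w: float = 1.0) -> dict:
    """Costs of cases (1)-(3); w weights transfer against processing (assumed)."""
    return {
        1: w * (e.v_A + e.v_B) + e.p_A_join_B,
        2: w * (e.v_A + e.v_a + e.v_B_semi_A) + e.p_B_semi_A + e.p_A_join_semi,
        3: w * (e.v_A + e.v_A_join_B) + e.p_A_join_B,
    }


def choose_case(e: PairEstimates, w: float = 1.0) -> int:
    """Return the case number with the lowest estimated cost."""
    costs = case_costs(e, w)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    est = PairEstimates(9, 9, 2, 6, 15, 9, 3, 6)
    print(case_costs(est), choose_case(est))
```

With the made-up estimates in the usage line, the costs are 27, 26 and 33, so case (2) is chosen; different estimates or a different `w` can change the choice.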

In the present invention, since there are three conceivable ways of processing a join of two tables, namely those of Figs. 2, 3 and 4, the cost of each is calculated and the one to select is determined by which has the minimum cost. The selected method is then executed, and the costs of cases (1), (2) and (3) are again calculated between its result and the next table; the minimum-cost case is selected and executed. The same operation is repeated until the answer to the input command is obtained. The distinguishing feature is that the cost calculation is limited to the join currently under consideration. In the conventional method, by contrast, the cost calculation was performed for every combination to find the one with the minimum cost, which was then adopted; especially when there were many tables, this made the cost calculation complicated and time-consuming.

[Example]

The method of the present invention is effective when there are three or more tables (with two tables, the method of the present invention and the conventional method coincide). Fig. 5 shows an example with three tables. When a command requesting the join of tables A, B and C is entered, the system first examines, for two of the tables, the three cases above, namely (1) omit the join for the time being and join later (Fig. 2), (2) process as a semi-join for the time being and join later (Fig. 3), and (3) perform the join from the outset (Fig. 4), and selects the one with the lowest cost.

Fig. 5 takes up tables A and B first (A and C, or B and C, could equally be taken up; since any choice works the same way, any two tables may be used), and the cost calculation is assumed to have judged case (2) to be best. Next, the semi-join result $A \ltimes B$ and table C are examined, and case (1) is judged best. Finally, the join result $(A \ltimes B) \bowtie C$ and table B are examined, and case (1) is again judged best, so the processing is carried out accordingly.

The amount of data transferred in this case is

$V(A) + V(b) + V(A \ltimes B) + V(C) + V((A \ltimes B) \bowtie C) + V(B)$,

and the internal processing amount of the computers is the sum of the semi-join and join computations performed at the three steps. In the present invention, however, these totals need not be computed: it is only necessary to consider, at each step up to the final join of A, B and C, which of cases (1), (2) and (3) is best for the join between two tables.

In the conventional method, by contrast, the whole schedule is considered, so the total of the above data transfer amount and internal processing amount (strictly, this also includes the processing needed for data input/output) is computed. Because tables and partial results can be combined in many different ways, the overall cost was calculated for every combination to determine which one was optimal.

[Effect of the Invention]

As explained above, in the present invention, when a query entered by a user into a distributed database management system built on a plurality of computers and a communication network connecting them is decomposed into groups of partial queries executed on the individual computers, the optimization decides, join by join, whether to omit the join for the time being, perform it as a semi-join, or perform it as a join. Instead of calculating the overall cost, the processing method is determined by a cost calculation that looks only at the parts involved in the one join under consideration. This has the advantage that the cost calculation is simple and the processing method can be determined quickly.

[Brief explanation of the drawing]

Figs. 1 to 5 are explanatory diagrams of the present invention, Fig. 6 shows example tables, Fig. 7 is an explanatory diagram of a join, and Fig. 8 is an explanatory diagram of a semi-join.

Claims

[Claims] A partial-query optimization method used, in a distributed database management system built on a plurality of computers and a communication network connecting them, when a query entered by a user that includes joins of a plurality of databases is decomposed into groups of partial queries executed on the individual computers, the method being characterized in that: for two computers, each holding a database or a processing result, the partial query executed on the computers is treated as (1) omitting the join for the time being and joining later, (2) processing it as a semi-join for the time being and joining later, or (3) performing the join from the outset; the cost of each is calculated and the lowest-cost option is executed; between the computer holding that processing result and a computer holding another database, the costs of cases (1), (2) and (3) are calculated and the lowest-cost option is executed; and the same processing is repeated until the answer to the user's query is obtained.