JP6758252B2

JP6758252B2 - Histogram generation method, histogram generator and histogram generation program

Info

Publication number: JP6758252B2
Application number: JP2017110924A
Authority: JP
Inventors: 和広斉藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-06-05
Filing date: 2017-06-05
Publication date: 2020-09-23
Anticipated expiration: 2037-06-05
Also published as: JP2018206074A

Description

The present invention relates to a histogram generation method, a histogram generation device, and a histogram generation program for generating a histogram that shows the distribution of data contained in a table stored in a database.

In a database system, the processing for a query statement can run faster if the processing cost of that statement is used when creating an execution plan, for example when optimizing a query statement that operates on the database. A known conventional method estimates the processing cost using a histogram that shows the distribution of data contained in a table stored in the database (see, for example, Patent Document 1).

Patent Document 1: JP-A-2016-095561

A histogram contains one or more buckets for each of the columns in a table. Each bucket holds two counts for its data interval: the number of data items and the number of distinct values those items take. The database system can apply the operations of a query statement to the histogram to create a post-operation histogram (an intermediate histogram). It then uses that histogram to estimate the size of the query's intermediate result.
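The sketch below shows how such buckets can support size estimation. It assumes that values are spread uniformly within each bucket, a common simplification that this text does not itself specify. `Bucket`, `estimate_equal`, and `estimate_range` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bucket:
    low: float    # lower end of the bucket's data interval
    high: float   # upper end of the bucket's data interval
    records: int  # number of data items in the bucket
    domains: int  # number of distinct values in the bucket


def estimate_equal(buckets: list[Bucket], value: float) -> float:
    """Estimated rows for `col = value`, assuming each distinct value in a bucket is equally frequent."""
    for b in buckets:
        if b.low <= value <= b.high:
            return b.records / b.domains if b.domains else 0.0
    return 0.0


def estimate_range(buckets: list[Bucket], lo: float, hi: float) -> float:
    """Estimated rows for `lo <= col <= hi`, interpolating linearly inside each bucket."""
    total = 0.0
    for b in buckets:
        overlap_lo, overlap_hi = max(lo, b.low), min(hi, b.high)
        if overlap_lo > overlap_hi:
            continue  # no overlap with this bucket
        width = b.high - b.low
        fraction = 1.0 if width == 0 else (overlap_hi - overlap_lo) / width
        total += b.records * fraction
    return total
```

For example, with `[Bucket(0, 9, 100, 10), Bucket(10, 19, 50, 5)]`, an equality predicate on any value in either bucket is estimated at 10 rows. A range covering both buckets is estimated at 150 rows. The continuous interpolation is deliberately simple; it ignores off-by-one effects for integer columns.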

However, when a column undergoes type conversion in a query statement, the histogram no longer corresponds to the converted data. The intermediate-result size estimated from the histogram then deviates from the size actually produced during execution. If query processing continues on such a deviating estimate, the database system may run out of available resources and the processing may never complete.
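A small illustration of the mismatch: the same range condition selects different rows depending on whether the column is compared as numbers or as character strings. Python's `str` comparison (code-point order) stands in here for a binary string collation. This is an assumption, and actual DBMS collations may order strings differently.

```python
def rows_matching_numeric(values, lo, hi):
    """Rows satisfying lo <= v <= hi under numeric ordering."""
    return sorted(v for v in values if lo <= v <= hi)


def rows_matching_string(values, lo, hi):
    """Rows satisfying lo <= CAST(v AS string) <= hi under lexicographic ordering."""
    return sorted(v for v in values if lo <= str(v) <= hi)


values = [4, 9, 10, 25, 99, 100, 250, 500]
print(rows_matching_numeric(values, 10, 99))     # [10, 25, 99]
print(rows_matching_string(values, "10", "99"))  # [4, 9, 10, 25, 99, 100, 250, 500]
```

A histogram built on the numeric values would estimate 3 matching rows, while the string comparison after the cast actually matches all 8.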

The present invention has been made in view of these points. Its object is to provide a histogram generation method, a histogram generation device, and a histogram generation program that can generate a histogram corresponding to the data after type conversion.

A histogram generation method according to a first aspect of the present invention includes the following steps executed by a computer:
- An identification step of identifying, in a query statement executed in a database system, a column that undergoes type conversion to a different kind of type.
- A generation step of generating an intermediate result histogram for the column's type-converted data in the query statement's intermediate result. The step starts from a histogram stored in advance in a storage unit, which shows the frequency distribution of the identified column's data before type conversion, and produces a histogram with a coarser-grained frequency distribution than that stored histogram.

In the identification step, the computer may identify a column that the query statement converts from a numeric type to a character string type. In the generation step, the identified column's pre-conversion data may include values with different numbers of digits. In that case, the computer may generate a bucket for the data interval spanning the pre-conversion minimum and maximum of all the column's data. The bucket contains information indicating the number of data items in that interval and the number of distinct values they take, and the computer may generate the intermediate result histogram from it.

In the identification step, the computer may identify a column that the query statement converts from a numeric type to a character string type. In the generation step, when the identified column's pre-conversion data include both positive and negative values, the computer may generate two buckets: a first bucket covering all the data with positive values, and a second bucket covering all the data with negative values.

In the generation step, the computer may set the minimum and maximum of the first bucket and of the second bucket to the minimum and maximum values that the data in that bucket can take.

In the generation step, the computer may set the maximum of the first bucket to the maximum of the data in the intermediate result histogram. It may also set the minimum of the second bucket to the minimum of the data in the intermediate result histogram.

In the generation step, when the maximum of the data in the histogram is positive, the computer may do the following. It sets the maximum of the first bucket (one or more buckets covering the data with positive values) to the maximum of the data in the histogram. It sets the minimum of the second bucket (one or more buckets covering the data with negative values) to the minimum of the data in the histogram.

In the generation step, when the maximum of the data in the histogram is negative, the computer may swap the two values. The data's maximum becomes the minimum of the data in the histogram, and the data's minimum becomes the maximum of the data in the histogram.

In the generation step, the identified column's data may include both positive and negative values. In that case, the computer may estimate how many data items fall in the first bucket (one or more buckets covering positive values) and in the second bucket (one or more buckets covering negative values). The estimate uses the ratio between the column's pre-conversion maximum and pre-conversion minimum, together with the total number of the column's data items. The computer may then generate the first and second buckets with information indicating those estimated counts.

In the generation step, the computer may further generate the column's histogram as it stands after the query statement is executed. To do so, it uses the generated intermediate result histogram to estimate how many data items remain after the query's filter condition on the column is applied. It then updates the number of data items in each bucket of the histogram stored in the storage unit, based on that estimate and on the number of data items before filtering.
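The claim states only that each bucket is updated from the estimated count and the pre-filter count. The sketch below assumes the simplest reading of that update: every bucket's record count is scaled by the same factor, estimated rows divided by rows before filtering. It also assumes that a bucket cannot keep more distinct values than rows, so the domain count is capped at the new record count. `scale_after_filter` is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bucket:
    low: int
    high: int
    records: int  # number of data items
    domains: int  # number of distinct values


def scale_after_filter(buckets: list[Bucket], estimated_rows: float, rows_before: int) -> list[Bucket]:
    """Scale each bucket's record count by estimated_rows / rows_before (assumed proportional update)."""
    if rows_before == 0:
        return list(buckets)
    ratio = estimated_rows / rows_before
    updated = []
    for b in buckets:
        new_records = round(b.records * ratio)
        # Assumption: distinct values cannot exceed rows, so cap the domain count.
        updated.append(replace(b, records=new_records, domains=min(b.domains, new_records)))
    return updated
```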

A histogram generation device according to a second aspect of the present invention includes:
- An identification unit that identifies, in a query statement executed in a database system, a column that undergoes type conversion to a different kind of type.
- A generation unit that generates an intermediate result histogram for the column's type-converted data in the query statement's intermediate result. It starts from a histogram stored in advance in a storage unit, which shows the frequency distribution of the identified column's data before type conversion, and produces a histogram with a coarser-grained frequency distribution than that stored histogram.

A histogram generation program according to a third aspect of the present invention causes a computer to function as:
- An identification unit that identifies, in a query statement executed in a database system, a column that undergoes type conversion to a different kind of type.
- A generation unit that generates an intermediate result histogram for the column's type-converted data in the query statement's intermediate result. It starts from a histogram stored in advance in a storage unit, which shows the frequency distribution of the identified column's data before type conversion, and produces a histogram with a coarser-grained frequency distribution than that stored histogram.

The present invention makes it possible to generate a histogram that corresponds to the data after type conversion.

FIG. 1 is a diagram showing an outline of the database system according to the present embodiment.
FIG. 2 is a diagram showing the configuration of the database system according to the present embodiment.
FIG. 3 is a diagram showing an example of a histogram according to the present embodiment.
FIG. 4 is a flowchart showing the flow of processing in the database system according to the present embodiment.
FIG. 5 is a flowchart showing the flow of the digit-difference handling process according to the present embodiment.
FIG. 6 is a flowchart showing the flow of the negative-value handling process according to the present embodiment.
FIG. 7 is a flowchart showing the flow of the type restoration process according to the present embodiment.

[Overview of database system 1]
FIG. 1 is a diagram showing an outline of the database system 1 according to the present embodiment. The database system 1 executes query statements received from a user terminal 2. A query statement is a character string for operating the database, written in SQL (Structured Query Language). In the following description, a query statement is referred to as a query.

The database system 1 consists of one or more computers that can communicate with one another, and it functions as a histogram generation device. It is connected to the user terminal 2 via a communication network such as a LAN or the Internet.

The database system 1 acquires a query from the user terminal 2 ((1) in FIG. 1). It analyzes the query to identify the column that undergoes type conversion ((2) in FIG. 1). It then generates buckets for the converted column from that column's pre-conversion histogram ((3) in FIG. 1). A bucket is part of the histogram that shows the frequency distribution of a column's data. It holds the number of data items in a data interval and the number of distinct values those items take.

From the generated buckets, the database system 1 generates an intermediate result histogram, i.e., a histogram corresponding to the query's intermediate result ((4) in FIG. 1). Specifically, it starts from the identified column's histogram before type conversion and produces, as the intermediate result histogram, a histogram whose frequency distribution is coarser-grained than that histogram. In this way, the database system 1 can generate a histogram that corresponds to the data after type conversion.

The database system 1 calculates the query's processing cost from the intermediate result histogram ((5) in FIG. 1) and executes the query based on that cost ((6) in FIG. 1). Because the intermediate result histogram reflects the processing the query actually performs, the database system 1 can calculate the processing cost accurately.
The configuration of the database system 1 is described below.

[Configuration example of database system 1]
FIG. 2 is a diagram showing the configuration of the database system 1 according to the present embodiment.
The database system 1 includes a storage unit 11 and a control unit 12.

The storage unit 11 consists of, for example, a ROM (Read Only Memory) and a RAM (Random Access Memory). It stores the programs that operate the database system 1. These include a database management program that causes the control unit 12 to function as the following units, described later: an acquisition unit 121, an identification unit 122, a generation unit 123, a cost calculation unit 124, and an execution unit 125. The database management program may consist of two parts:
- a histogram generation program, which causes the control unit 12 to function as the acquisition unit 121, the identification unit 122, and the generation unit 123;
- a query execution program, which causes the control unit 12 to function as the cost calculation unit 124 and the execution unit 125.

The storage unit 11 also stores a database 111. The database 111 stores one or more tables, each containing one or more columns. Each column has a histogram showing the frequency distribution of its data.

FIG. 3 is a diagram showing an example of a histogram according to the present embodiment, namely the histogram for column "A" of a given table. As shown in FIG. 3, the histogram records the minimum and maximum of the column's data and contains one or more buckets. In this example there are two buckets, "A1" and "A2".

Each bucket contains a bound, a record count, and a domain count:
- The bound is the range of values the data can take.
- The record count is the number of data items.
- The domain count is the number of distinct values.
FIG. 3 draws data inside the buckets only to explain the bound, record count, and domain count; a bucket does not actually store those data.
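The bucket layout just described (bound, record count, domain count) can be sketched as a small data structure. The text does not say how bucket boundaries are chosen. The partitioning used here, which gives each bucket about the same number of distinct values, is an assumption made only for illustration. `Bucket`, `Histogram`, and `build_histogram` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Bucket:
    low: int      # bound: smallest value in the bucket
    high: int     # bound: largest value in the bucket
    records: int  # record count: number of data items
    domains: int  # domain count: number of distinct values


@dataclass
class Histogram:
    minimum: int           # smallest value in the column
    maximum: int           # largest value in the column
    buckets: list[Bucket]


def build_histogram(values: list[int], n_buckets: int) -> Histogram:
    """Build a histogram over a non-empty column, giving each bucket about the same number of distinct values."""
    counts = sorted(Counter(values).items())   # (value, frequency) pairs in value order
    per_bucket = -(-len(counts) // n_buckets)  # ceiling division
    buckets = []
    for start in range(0, len(counts), per_bucket):
        chunk = counts[start:start + per_bucket]
        buckets.append(Bucket(low=chunk[0][0], high=chunk[-1][0],
                              records=sum(freq for _, freq in chunk),
                              domains=len(chunk)))
    return Histogram(counts[0][0], counts[-1][0], buckets)
```

Grouping by distinct value first means that no single value is split across two buckets. As a result, the domain counts of the buckets can be summed safely.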

The control unit 12 is, for example, a CPU (Central Processing Unit). It controls the functions of the database system 1 by executing the programs stored in the storage unit 11. By executing the database management program, it functions as the acquisition unit 121, the identification unit 122, the generation unit 123, the cost calculation unit 124, and the execution unit 125.

The acquisition unit 121 acquires a query against the database 111 from the user terminal 2.
The identification unit 122 analyzes the query acquired by the acquisition unit 121 and identifies any column that undergoes type conversion to a different kind of type. Specifically, it locates the substring in the query that corresponds to a cast operation (an operation expressing type conversion), which identifies the converted column. A "different kind of type" here means a type under which the data sort in a different order. For example, the identification unit 122 identifies a column converted from a numeric type to a character string type.
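A rough sketch of locating cast operations in the query string. It assumes explicit `CAST(column AS type)` syntax and a hypothetical `column_types` mapping from column name to declared type. A real implementation would more likely work on the parsed query tree. This regular expression misses, for example, PostgreSQL-style `::` casts and implicit conversions.

```python
import re

# Assumed type-name sets; real systems have many more spellings.
NUMERIC_TYPES = {"INT", "INTEGER", "SMALLINT", "BIGINT", "DECIMAL", "NUMERIC", "REAL", "FLOAT", "DOUBLE"}
STRING_TYPES = {"CHAR", "VARCHAR", "TEXT"}

# Matches CAST(<column> AS <type name>), ignoring any length such as VARCHAR(10).
CAST_PATTERN = re.compile(r"CAST\s*\(\s*([A-Za-z_][\w.]*)\s+AS\s+([A-Za-z]+)", re.IGNORECASE)


def find_numeric_to_string_casts(query: str, column_types: dict) -> list:
    """Return the columns that the query casts from a numeric type to a string type."""
    found = []
    for column, target in CAST_PATTERN.findall(query):
        source = column_types.get(column.split(".")[-1], "").upper()  # strip a table qualifier
        if source in NUMERIC_TYPES and target.upper() in STRING_TYPES:
            found.append(column)
    return found
```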

The generation unit 123 handles each column that undergoes type conversion to a different kind of type. It takes the pre-conversion histogram stored for that column in the storage unit 11. From it, the unit generates a histogram whose frequency distribution is coarser-grained than that histogram, and uses it as the column's intermediate result histogram within the query's intermediate result.

Specifically, the generation unit 123 first generates, as the intermediate result histogram, a copy of the histogram before type conversion. It then updates the buckets of the intermediate result histogram by executing two processes:
- the digit-difference handling process, performed when the column's data include values with different numbers of digits;
- the negative-value handling process, performed when the column's data include negative values.
Both processes are described in detail below.

[Digit-difference handling process]
The generation unit 123 determines whether the identified column's data include values with different numbers of digits. For example, it bases this judgment on the maximum and minimum recorded in the histogram generated as the intermediate result histogram.
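One way to make this judgment from the minimum and maximum alone, assuming integer data, is to compare the digit counts of the smallest and largest magnitudes the interval can contain. `has_mixed_digit_counts` is a hypothetical helper. It is conservative: it reports what the interval can contain, not what the stored data actually contain.

```python
def digit_count(n: int) -> int:
    """Number of decimal digits in |n| (0 counts as one digit)."""
    return len(str(abs(n)))


def has_mixed_digit_counts(minimum: int, maximum: int) -> bool:
    """True if integers in [minimum, maximum] can have different numbers of digits."""
    if minimum <= 0 <= maximum:
        smallest_abs = 0  # the interval spans zero
    else:
        smallest_abs = min(abs(minimum), abs(maximum))
    largest_abs = max(abs(minimum), abs(maximum))
    return digit_count(smallest_abs) != digit_count(largest_abs)
```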

If the generation unit 123 determines that such data are included, it executes the digit-difference handling process and changes the buckets of the histogram generated as the intermediate result histogram.

The generation unit 123 generates a bucket for the data interval spanning the pre-conversion minimum and maximum of all the identified column's data. The bucket contains information indicating the number of data items in that interval and the number of distinct values they take. The unit then deletes the buckets previously contained in the histogram generated as the intermediate result histogram. The histogram becomes coarser-grained as a result, but it now corresponds to the data after type conversion.
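The bucket replacement just described can be sketched as follows, under stated assumptions. The single new bucket's bound spans the pre-conversion minimum and maximum. Its record and domain counts are taken as the totals of the old buckets; summing domain counts assumes the old buckets covered disjoint value ranges. `collapse_to_single_bucket` is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bucket:
    low: int
    high: int
    records: int  # number of data items
    domains: int  # number of distinct values


def collapse_to_single_bucket(buckets: list[Bucket], pre_min: int, pre_max: int) -> list[Bucket]:
    """Replace all buckets with one bucket spanning the pre-conversion [pre_min, pre_max]."""
    merged = Bucket(low=pre_min, high=pre_max,
                    records=sum(b.records for b in buckets),
                    domains=sum(b.domains for b in buckets))  # assumes disjoint bucket ranges
    return [merged]
```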

When the identified column's data include both positive and negative values, the generation unit 123 generates two buckets: a first bucket covering all the data with positive values, and a second bucket covering all the data with negative values.

For example, the generation unit 123 decides whether the identified column's data include both positive and negative values from the maximum and minimum recorded in the histogram generated as the intermediate result histogram.

If the generation unit 123 determines that both positive and negative values are included, it generates the first bucket and the second bucket. For example, the histogram generated as the intermediate result histogram may contain several buckets. In that case, the generation unit 123 merges the buckets covering positive values into the first bucket and the buckets covering negative values into the second bucket. Any bucket that contains both positive and negative data is split between the first bucket and the second bucket.
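A sketch of this merge-and-split step, under assumptions not fixed by the text:
- Integer data, with zero counted on the positive side.
- A bucket that straddles zero is divided in the ratio of its maximum to the magnitude of its minimum. This follows the ratio-based estimate that this description gives for the mixed-sign case.
- Counts are rounded so that the two halves still sum to the original totals.

All function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bucket:
    low: int
    high: int
    records: int  # number of data items
    domains: int  # number of distinct values


def _split_straddling(b: Bucket) -> tuple[Bucket, Bucket]:
    """Split a bucket with low < 0 <= high in the ratio high : |low| (integer data assumed)."""
    share = b.high / (b.high - b.low)  # fraction assigned to the non-negative side
    pos_records = round(b.records * share)
    pos_domains = round(b.domains * share)
    positive = Bucket(0, b.high, pos_records, pos_domains)
    negative = Bucket(b.low, -1, b.records - pos_records, b.domains - pos_domains)
    return positive, negative


def _merge(parts: list[Bucket]) -> Bucket:
    """Aggregate buckets: widest bound, summed record and domain counts."""
    return Bucket(min(p.low for p in parts), max(p.high for p in parts),
                  sum(p.records for p in parts), sum(p.domains for p in parts))


def split_by_sign(buckets: list[Bucket]) -> tuple[Bucket, Bucket]:
    """Return (first bucket, second bucket); call only when both signs are present."""
    positives, negatives = [], []
    for b in buckets:
        if b.low >= 0:
            positives.append(b)
        elif b.high < 0:
            negatives.append(b)
        else:
            pos, neg = _split_straddling(b)
            positives.append(pos)
            negatives.append(neg)
    return _merge(positives), _merge(negatives)
```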

The generation unit 123 then updates the bound, record count, and domain count of the first and second buckets. For example, it sets each bucket's minimum and maximum to the minimum and maximum values that the bucket's data can take, which defines the bucket's bound. Consider positive data (numeric values) with a pre-conversion minimum of 4 and maximum of 500. After conversion from a numeric type to a character string type, the first bucket has a minimum of 10 and a maximum of 99, because in character string order "10" precedes, and "99" follows, every other value from 4 to 500. Likewise, if the negative data before conversion have a minimum of -500 and a maximum of -4, the second bucket after conversion has a minimum of -99 and a maximum of -10.
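The positive-side bounds in this example follow from character string ordering. The sketch below finds them by brute force, assuming non-negative integer data and code-point string ordering (as Python's `str` comparison does; a DBMS collation may differ). `negative_bounds` reproduces the figures given above for the negative case by applying the same rule to magnitudes. That reading is an interpretation of the example, not something the text states, and both function names are hypothetical.

```python
def string_bounds(low: int, high: int) -> tuple[str, str]:
    """Lexicographic min and max of str(n) over integers n in [low, high], with 0 <= low <= high."""
    candidates = [str(n) for n in range(low, high + 1)]  # brute force, for illustration only
    return min(candidates), max(candidates)


def negative_bounds(low: int, high: int) -> tuple[int, int]:
    """Apply string_bounds to the magnitudes of [low, high] (low <= high < 0) and negate the result."""
    lo_s, hi_s = string_bounds(-high, -low)
    return -int(hi_s), -int(lo_s)


print(string_bounds(4, 500))      # ('10', '99')
print(negative_bounds(-500, -4))  # (-99, -10)
```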

The generation unit 123 also sets the maximum of the first bucket (covering all the data with positive values) as the maximum of the data in the intermediate result histogram. It sets the minimum of the second bucket (covering all the data with negative values) as the minimum of the data in the intermediate result histogram. For example, suppose the positive pre-conversion data have a minimum of 4 and a maximum of 500, and the negative pre-conversion data have a minimum of -500 and a maximum of -4. The data in the intermediate result histogram then have a minimum of -99 and a maximum of 99.

When merging the buckets that cover positive values, the generation unit 123 sets the first bucket's record count and domain count to the totals of those buckets' record counts and domain counts. Likewise, when merging the buckets that cover negative values, it sets the second bucket's record count and domain count to the corresponding totals.

When both positive and negative values are included, the generation unit 123 may instead estimate the domain counts of the first and second buckets. The estimate uses the ratio between the data's pre-conversion maximum and pre-conversion minimum, together with the total domain count (number of data items) of the buckets before the update. The unit then generates first and second buckets containing information indicating the estimated domain counts.

For example, suppose a bucket before the update contains 100 data items, with a maximum of 9 and a minimum of -3. The ratio of the maximum to the magnitude of the minimum is then 3:1. Estimating the domain counts of the first and second buckets generated from this bucket by that ratio gives 75 for the first bucket and 25 for the second. In this way, the generation unit 123 can estimate the domain counts of the first and second buckets with little computation.
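Written as a formula, the estimate in this example splits the bucket's count N in proportion to x_max and |x_min|, the bucket's maximum and the magnitude of its minimum. This notation is introduced here for clarity and does not appear in the original.

```latex
\[
N_1 = N \cdot \frac{x_{\max}}{x_{\max} + |x_{\min}|} = 100 \cdot \frac{9}{9 + 3} = 75,
\qquad
N_2 = N \cdot \frac{|x_{\min}|}{x_{\max} + |x_{\min}|} = 100 \cdot \frac{3}{9 + 3} = 25
\]
```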

[Negative value handling process]
The generation unit 123 determines whether the data corresponding to the identified column include data indicating negative values. If it determines that they do, the generation unit 123 executes the negative value handling process and modifies the buckets of the histogram that has been generated as the intermediate result histogram.

First, when the generation unit 123 determines that there is a bucket containing both positive-valued and negative-valued data, it splits that bucket into a first bucket having positive values and a second bucket having negative values.

Next, the generation unit 123 determines whether the maximum value of the data in the already generated intermediate result histogram is positive. If it is, the generation unit 123 sets the maximum value in the one or more first buckets corresponding to positive-valued data as the maximum value of the data in the intermediate result histogram. Likewise, if the minimum value of the data in the intermediate result histogram is negative, the generation unit 123 sets the maximum value in the one or more second buckets corresponding to negative-valued data as the minimum value of the data in the intermediate result histogram.

If, instead, the maximum value of the data in the intermediate result histogram is negative, the generation unit 123 sets that maximum value as the minimum value of the data in the histogram and sets the minimum value of the data as the maximum value of the data in the histogram; that is, it swaps the two.

The generation unit 123 also swaps the minimum and maximum values of the second bucket. For example, if execution of the digit difference handling process has left the second bucket with a minimum value of -99 and a maximum value of -10, the generation unit 123 swaps them so that the second bucket has a minimum value of -10 and a maximum value of -99.
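
The swap follows from how negative numbers compare once they are character strings. The check below uses Python's built-in string comparison, which orders by code point; treating the database's string type as comparing the same way is an assumption, since real collations can differ.

```python
values = [-99, -10]
as_text = [str(v) for v in values]              # type conversion: number -> string

numeric_min, numeric_max = min(values), max(values)
text_min, text_max = min(as_text), max(as_text)

# After conversion the bounds are swapped relative to the numeric order.
print(numeric_min, numeric_max)  # -99 -10
print(text_min, text_max)        # -10 -99

# '-' sorts before every digit, so all negatives precede all positives,
# and values with different digit counts no longer sort numerically.
print(sorted(["5", "-3", "12", "-40"]))  # ['-3', '-40', '12', '5']
```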

In addition, when the intermediate result histogram has not been updated by the digit difference handling process, so that it is still identical to the histogram before type conversion, and there are multiple second buckets, the generation unit 123 reverses the order of those second buckets.

[Estimating the number of domains after a filter condition is applied]
Based on the generated intermediate result histogram, the generation unit 123 estimates the number of domains (the number of data) that remain after filtering by the filter condition in the query that applies to the type-converted column.

A column that is type-converted in a query is held again as data of its original, pre-conversion type once the filtering performed during query execution has finished. The histogram of that data after filtering therefore needs to correspond to the histogram for the pre-conversion type, that is, the histogram stored in advance in the storage unit 11.

The generation unit 123 therefore updates the number of data for each of the buckets in the pre-conversion histogram for that column, stored in the storage unit 11, based on the number of data estimated from the intermediate result histogram and the number of data before filtering. In this way it additionally generates a histogram of the column as it stands after filtering has finished.
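
One plausible reading of this update is to scale every bucket of the stored pre-conversion histogram by the selectivity implied by the two counts. The sketch below assumes uniform scaling across buckets and rounds to whole records; the patent does not give a per-bucket formula, and `apply_selectivity` is a hypothetical name.

```python
def apply_selectivity(bucket_counts, estimated_after, count_before):
    """Scale the per-bucket record counts of the pre-conversion histogram.

    bucket_counts:   record counts per bucket of the stored histogram
    estimated_after: data count estimated from the intermediate result histogram
    count_before:    data count before the filter condition was applied
    """
    if count_before == 0:
        return [0 for _ in bucket_counts]
    selectivity = estimated_after / count_before
    return [round(c * selectivity) for c in bucket_counts]


print(apply_selectivity([40, 40, 20], estimated_after=25, count_before=100))  # [10, 10, 5]
```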

The cost calculation unit 124 calculates the processing cost of the query based on the intermediate result histogram generated by the generation unit 123. For example, the cost calculation unit 124 uses that histogram to estimate the size of the intermediate data produced when the query is executed.
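
To illustrate how a histogram yields an intermediate-size estimate, the sketch below applies the common assumption that values are spread uniformly within each bucket and estimates the rows matching a range predicate. The patent does not state which estimator the cost calculation unit 124 uses; the function and its `(lo, hi, records)` bucket tuples are hypothetical.

```python
def estimate_range_rows(buckets, low, high):
    """Estimate the number of rows with low <= value <= high.

    buckets: list of (lo, hi, records) tuples with lo <= hi.
    Assumes values are uniformly distributed inside each bucket.
    """
    total = 0.0
    for lo, hi, records in buckets:
        overlap_lo, overlap_hi = max(lo, low), min(hi, high)
        if overlap_lo > overlap_hi:
            continue  # bucket does not intersect the predicate range
        if hi == lo:
            total += records  # single-value bucket lying inside the range
        else:
            total += records * (overlap_hi - overlap_lo) / (hi - lo)
    return total


print(estimate_range_rows([(0, 10, 100), (10, 20, 50)], 5, 15))  # 75.0
```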

The execution unit 125 executes the query based on the processing cost calculated by the cost calculation unit 124. Specifically, once histograms have been created and costs estimated for all operations in the query, the execution unit 125 optimizes the execution plan, including query optimization, based on the estimated processing costs. It then executes the query according to the optimized execution plan.

[Process flow in database system 1]
Next, the process flow in the database system 1 is described. FIG. 4 is a flowchart showing the process flow in the database system 1 according to the present embodiment.

First, a query against the database 111 is acquired from the user terminal 2 (S10).
Next, the identification unit 122 identifies, based on the acquired query, the column on which type conversion is performed (S20).

Next, the generation unit 123 generates, as the intermediate result histogram, a histogram identical to the histogram before type conversion (S30).
Next, the generation unit 123 determines whether the data in the identified column include values with different numbers of digits (S40). If they do, processing proceeds to S50, where the digit difference handling process, described in detail later, is executed. If they do not, processing proceeds to S60.
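
For an integer column, the S40 check can be sketched from the column's minimum and maximum values alone. This assumes digits are counted without the sign and that the value range is contiguous; `has_mixed_digit_counts` is a hypothetical name, not an API from the source.

```python
def digit_count(n: int) -> int:
    """Number of decimal digits, ignoring the sign."""
    return len(str(abs(n)))


def has_mixed_digit_counts(lo: int, hi: int) -> bool:
    """True if the integers in [lo, hi] do not all have the same number of digits."""
    if lo <= 0 <= hi:
        # The range contains 0 (one digit), so any multi-digit bound means a mix.
        return max(digit_count(lo), digit_count(hi)) > 1
    # Same-sign range: magnitudes change monotonically between the two bounds.
    return digit_count(lo) != digit_count(hi)


print(has_mixed_digit_counts(5, 120))    # True
print(has_mixed_digit_counts(-99, -10))  # False
```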

Next, the generation unit 123 determines whether the data in the identified column include negative values (S60). If they do, processing proceeds to S70, where the negative value handling process, described in detail later, is executed. If they do not, processing proceeds to S80.

Next, the generation unit 123 corrects the column type (S80). It then determines whether the types of all columns identified by the identification unit 122 have been corrected (S90). If they have, processing proceeds to S100; otherwise, processing returns to S30.

Next, the cost calculation unit 124 calculates the processing cost of the query based on the intermediate result histogram generated by the generation unit 123 (S100).
Next, the generation unit 123 executes the type return process (S110), described in detail later.
Finally, the execution unit 125 executes the query based on the processing cost calculated by the cost calculation unit 124 (S120).
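
The control flow of FIG. 4 can be summarized in code. Every method on `steps` is a hypothetical stand-in for the corresponding unit's work; only the ordering and branching (S20 to S120, with the S90 loop expressed as a `for` loop over the identified columns) are taken from the text. S10 is represented by the `query` argument.

```python
def process_query(query, steps):
    """Control flow of FIG. 4; each step is supplied by a hypothetical `steps` object."""
    columns = steps.identify_cast_columns(query)            # S20
    for column in columns:                                   # S90 loops back to S30
        hist = steps.copy_original_histogram(column)         # S30
        if steps.has_mixed_digit_counts(column):             # S40
            hist = steps.handle_digit_difference(hist)       # S50
        if steps.has_negative_values(column):                # S60
            hist = steps.handle_negative_values(hist)        # S70
        steps.correct_column_type(column, hist)              # S80
    cost = steps.calculate_cost(query)                       # S100
    steps.type_return(query)                                 # S110
    return steps.execute(query, cost)                        # S120
```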

[Process flow in the digit difference handling process]
Next, the flow of the digit difference handling process is described. FIG. 5 is a flowchart showing the process flow in the digit difference handling process according to the present embodiment.

First, the generation unit 123 determines whether, in the histogram for the column identified by the identification unit 122, there is a bucket covering both positive and negative values, that is, a bucket containing both positive-valued and negative-valued data (S51).

If there is such a bucket, processing proceeds to S52, where the generation unit 123 splits it into a first bucket for the positive-valued data and a second bucket for the negative-valued data. If there is none, processing proceeds to S53.

Next, the generation unit 123 aggregates the positive-valued data into the first bucket (S53) and the negative-valued data into the second bucket (S54).
Next, the generation unit 123 updates the maximum and minimum values of the column (S55). Specifically, it sets the maximum value of the first bucket as the maximum value of the column and the maximum value of the second bucket as the minimum value of the column.

[Process flow in the negative value handling process]
Next, the flow of the negative value handling process is described. FIG. 6 is a flowchart showing the process flow in the negative value handling process according to the present embodiment.

First, the generation unit 123 determines whether, in the histogram for the column identified by the identification unit 122, there is a bucket covering both positive and negative values, that is, a bucket containing both positive-valued and negative-valued data (S71).

If there is such a bucket, processing proceeds to S72, where the generation unit 123 splits it into a first bucket for the positive-valued data and a second bucket for the negative-valued data. If there is none, processing proceeds to S73.

Next, the generation unit 123 determines whether positive-valued data exist among the data for the column identified by the identification unit 122 (S73). If they do, processing proceeds to S74, where the generation unit 123 sets the maximum value of the negative-valued data before type conversion as the minimum value of the column data. If they do not, processing proceeds to S75, where it swaps the maximum and minimum values of the data for the column.

Next, the generation unit 123 swaps the maximum and minimum values of the second bucket corresponding to the negative-valued data (S76).
Then, if there are multiple second buckets, the generation unit 123 reverses their order (S77).
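
Steps S73 to S77 can be sketched as one function operating on buckets that have already been split by sign (S71 and S72). The parameter names are hypothetical, the second buckets are assumed to be `(min, max)` pairs in ascending numeric order, and S76 is assumed to apply to every second bucket.

```python
def handle_negative_values(col_min, col_max, neg_max_before, second_buckets, has_positive):
    """Sketch of S73-S77 for a column whose buckets are already split by sign.

    col_min, col_max: current min/max of the column in the intermediate histogram
    neg_max_before:   largest negative value before type conversion
    second_buckets:   [(min, max), ...] for negative-valued data, ascending order
    Returns the updated (col_min, col_max, second_buckets).
    """
    if has_positive:                              # S73 -> S74
        col_min = neg_max_before
    else:                                         # S73 -> S75
        col_min, col_max = col_max, col_min
    swapped = [(hi, lo) for lo, hi in second_buckets]   # S76: swap min and max
    if len(swapped) > 1:                          # S77: reverse bucket order
        swapped.reverse()
    return col_min, col_max, swapped


print(handle_negative_values(-99, -10, -10, [(-99, -50), (-49, -10)], has_positive=False))
# (-10, -99, [(-10, -49), (-50, -99)])
```

With this all-negative input, the second buckets come out as "-10" to "-49" followed by "-50" to "-99", which is how these equal-width strings sort.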

[Process flow in the type return process]
Next, the flow of the type return process is described. FIG. 7 is a flowchart showing the process flow in the type return process according to the present embodiment.

First, the generation unit 123 determines whether a conditional expression in the query includes a type conversion (S111). If it does not, processing proceeds to S112, where the generation unit 123 updates the histogram by applying the operation corresponding to the conditional expression to the histogram before type conversion.

If the conditional expression does include a type conversion, the generation unit 123 performs the type conversion on the target column (S113).
Next, the generation unit 123 executes the operation indicated by the conditional expression on the converted column (S114).

Next, the generation unit 123 calculates the selectivity of the data for the column from the number of data before and after the operation (S115), and updates the histogram before type conversion based on that selectivity (S116).

Next, the generation unit 123 determines whether any unprocessed conditional expression remains (S117). If one does, processing returns to S111; otherwise, the processing of this flowchart ends.
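
The loop of FIG. 7 can be sketched as follows. The representation of conditions as `(involves_cast, fn)` pairs is hypothetical, and so is the choice to apply the selectivity by scaling every bucket of the pre-conversion histogram uniformly.

```python
def type_return(histogram, conditions):
    """Sketch of the type return process (S111-S117).

    histogram:  per-bucket record counts of the pre-conversion histogram
    conditions: list of (involves_cast, fn) pairs (hypothetical representation)
      - involves_cast False: fn(histogram) applies the condition directly (S112)
      - involves_cast True:  fn() evaluates the condition on the converted
        column and returns (count_before, count_after) (S113-S114)
    """
    for involves_cast, fn in conditions:         # repeat while conditions remain (S117)
        if not involves_cast:                     # S111 -> S112
            histogram = fn(histogram)
            continue
        count_before, count_after = fn()          # S113-S114
        selectivity = count_after / count_before if count_before else 0.0  # S115
        histogram = [c * selectivity for c in histogram]                    # S116
    return histogram
```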

[Effects of the present embodiment]
As described above, for a column in the acquired query that undergoes type conversion to a different type, the database system 1 according to the present embodiment uses the histogram stored in advance in the storage unit 11, which shows the frequency distribution of the data before type conversion, to generate a histogram with a coarser frequency distribution as the intermediate result histogram for that column in the query's intermediate result. Although the granularity is coarse, by converting the maximum and minimum values of the histogram and its buckets in accordance with the type conversion, the database system 1 generates an intermediate result histogram corresponding to the converted data and can calculate the processing cost of the query accurately from it.

In addition, after calculating the processing cost of the query from the intermediate result histogram, the database system 1 executes the type return process and further generates the histogram of the column after the filter condition has been applied. This allows the database system 1 to estimate more accurately the processing cost of operations that occur after filtering.

Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in those embodiments. It will be apparent to those skilled in the art that various changes or improvements can be made to them. For example, the embodiments described above may be combined. In particular, the specific form of distribution and integration of the devices is not limited to that illustrated; all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various additions or functional loads.

1 ... database system, 11 ... storage unit, 12 ... control unit, 121 ... acquisition unit, 122 ... identification unit, 123 ... generation unit, 124 ... cost calculation unit, 125 ... execution unit, 2 ... user terminal

Claims

A histogram generation method in which a computer executes:
a specific step of identifying, in a query statement executed in a database system, a column that undergoes type conversion to a different type; and
a generation step of generating, based on a histogram that is stored in advance in a storage unit and shows the frequency distribution of the data before the type conversion corresponding to the identified column, a histogram whose frequency distribution is coarser than that histogram, as an intermediate result histogram of the data after the type conversion corresponding to the column, which is included in the intermediate result of the query statement.

In the specific step, the computer identifies, in the query statement, the column on which type conversion from a type corresponding to numeric values to a type corresponding to character strings is performed;
in the generation step, when the data before the type conversion corresponding to the identified column include data having different numbers of digits, the computer generates a bucket containing information indicating the number of data and the number of values indicated by the data in a data interval including the maximum value and the minimum value, before the type conversion, of all the data corresponding to the column, and generates the intermediate result histogram based on the generated bucket,
The histogram generation method according to claim 1.

In the specific step, the computer identifies, in the query statement, the column on which type conversion from a type corresponding to numeric values to a type corresponding to character strings is performed;
in the generation step, when the data before the type conversion corresponding to the identified column include data indicating positive values and data indicating negative values, the computer generates a first bucket corresponding to all data having positive values and a second bucket corresponding to all data having negative values,
The histogram generation method according to claim 1 or 2.

In the generation step, the computer sets the maximum and minimum values of each of the first bucket and the second bucket to the maximum and minimum values of the data contained in that bucket,
The histogram generation method according to claim 3.

In the generation step, the computer sets the maximum value of the first bucket as the maximum value of the data included in the intermediate result histogram, and sets the minimum value of the second bucket as the minimum value of the data included in the intermediate result histogram,
The histogram generation method according to claim 4.

In the generation step, when the maximum value of the data included in the histogram is positive, the computer sets the maximum value in the first bucket, which is one or more buckets corresponding to data having positive values, as the maximum value of the data included in the histogram, and sets the minimum value in the second bucket, which is one or more buckets corresponding to data having negative values, as the minimum value of the data included in the histogram,
The histogram generation method according to any one of claims 1 to 5.

In the generation step, when the maximum value of the data included in the histogram is negative, the computer sets that maximum value as the minimum value of the data included in the histogram and sets the minimum value of the data as the maximum value of the data included in the histogram,
The histogram generation method according to any one of claims 1 to 6.

In the generation step, when the data corresponding to the identified column include data indicating positive values and data indicating negative values, the computer estimates the number of data contained in each of the first bucket, which is one or more buckets corresponding to data having positive values, and the second bucket, which is one or more buckets corresponding to data having negative values, based on the ratio of the maximum value before the type conversion to the minimum value before the type conversion in the data corresponding to the column and on the number of all data corresponding to the column, and generates the first bucket and the second bucket including information indicating the estimated numbers of data,
The histogram generation method according to any one of claims 3 to 7.

In the generation step, the computer estimates, based on the generated intermediate result histogram, the number of data after narrowing down by the narrowing condition corresponding to the column included in the query statement, and further generates a histogram of the column after the query statement is executed by updating the number of data corresponding to each of the plurality of buckets included in the histogram stored in the storage unit, based on the estimated number of data and the number of data before narrowing down by the narrowing condition,
The histogram generation method according to any one of claims 1 to 8.

A histogram generation device comprising:
an identification unit that identifies, in a query statement executed in a database system, a column that undergoes type conversion to a different type; and
a generation unit that generates, based on a histogram that is stored in advance in a storage unit and shows the frequency distribution of the data before the type conversion corresponding to the identified column, a histogram whose frequency distribution is coarser than that histogram, as an intermediate result histogram of the data after the type conversion corresponding to the column, which is included in the intermediate result of the query statement.

A histogram generation program causing a computer to function as:
an identification unit that identifies, in a query statement executed in a database system, a column that undergoes type conversion to a different type; and
a generation unit that generates, based on a histogram that is stored in advance in a storage unit and shows the frequency distribution of the data before the type conversion corresponding to the identified column, a histogram whose frequency distribution is coarser than that histogram, as an intermediate result histogram of the data after the type conversion corresponding to the column, which is included in the intermediate result of the query statement.