WO2015155846A1

WO2015155846A1 - System for storage device to execute database hash join process

Info

Publication number: WO2015155846A1
Application number: PCT/JP2014/060238
Authority: WO
Inventors: 渡辺　聡; 能毅黒川; 芳孝辻本
Original assignee: 株式会社日立製作所
Priority date: 2014-04-09
Filing date: 2014-04-09
Publication date: 2015-10-15
Also published as: JP6158430B2; JPWO2015155846A1

Abstract

In order to suppress a decrease in processing performance in the event of memory overflow in a server device during a database hash join process, according to the present invention, a storage device executes a process of selecting data as the object of a join, a hash value calculating process, a process of selecting data from a hash table, a selection rate calculating process, and a process of storing a part of the hash table in a temporary area, so as to suppress a decrease in database performance.

Description

[Name of invention determined by ISA based on Rule 37.2] System in which storage device executes database hash join processing

The present invention relates to a storage apparatus and system having a function of assisting database processing.

A database management system (DBMS: DataBase Management System) for managing a database is software for efficiently executing data accumulation and analysis. A relational database management system (RDBMS: Relational DataBase Management System), which is a type of DBMS, manages data using sets of data (hereinafter referred to as relations or tables). When the RDBMS performs data analysis, it joins relations. A relation join is an operation that creates a new relation by combining a plurality of relations into one according to a join condition.

Hash join is widely used as a method for joining relations. In a hash join, a hash table for one relation (hereinafter referred to as the outer table) is created first. Next, a hash table for the other relation (hereinafter referred to as the inner table) is created. The outer table and the inner table are then joined by collating the hash table of the outer table data with the hash table of the inner table data. Hash join is described in, for example, Patent Document 1.
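As an illustration only, the hash join described above can be sketched in Python as follows. The function names, the dictionary-based row layout, the sample rows, and the use of Python's built-in hash() are assumptions made for this sketch and do not appear in the original disclosure.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Group rows by the hash value of their join column."""
    table = defaultdict(list)
    for row in rows:
        table[hash(row[key])].append(row)
    return table


def hash_join(outer_rows, inner_rows, outer_key, inner_key):
    """Join two relations by collating their hash tables."""
    outer_ht = build_hash_table(outer_rows, outer_key)
    inner_ht = build_hash_table(inner_rows, inner_key)
    result = []
    for hash_value, outer_bucket in outer_ht.items():
        for outer_row in outer_bucket:
            for inner_row in inner_ht.get(hash_value, []):
                # Compare the actual keys to guard against hash collisions.
                if outer_row[outer_key] == inner_row[inner_key]:
                    result.append({**outer_row, **inner_row})
    return result


# Hypothetical sample data shaped like the PART and LINEITEM tables.
parts = [
    {"P_PARTKEY": 1, "P_NAME": "bolt"},
    {"P_PARTKEY": 2, "P_NAME": "nut"},
]
lineitems = [
    {"L_PARTKEY": 1, "L_LINENUMBER": 10},
    {"L_PARTKEY": 1, "L_LINENUMBER": 11},
    {"L_PARTKEY": 3, "L_LINENUMBER": 12},
]
joined = hash_join(parts, lineitems, "P_PARTKEY", "L_PARTKEY")
```

In this sketch both hash tables are held in memory at once, which is exactly the situation that can exceed the memory of the server device when the relations are large.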

In a hash join, a situation called memory overflow of the server device may occur. Memory overflow occurs when the hash table of the outer table does not fit in the memory of the server device. When the memory of the server device overflows, the hash table of the outer table is divided into small sections called buckets and written to the storage device. Next, the hash table of the inner table is also divided into buckets and written to the storage device. Then, an outer table bucket and the corresponding inner table bucket are read from the storage device into the memory of the server device, and the join process is executed. For example, Japanese Patent Application Laid-Open No. 2003-259542 describes a memory overflow of the server device.

Patent Document 1: Japanese Patent Laid-Open No. 10-326215
Patent Document 2: Japanese Patent Laid-Open No. 10-269248

In a data analysis process using a hash table, when a memory overflow occurs in the server device, the hash table must be divided into small sections called buckets and written to the storage device, and then read back from the storage device. Therefore, the performance of the data analysis process is lower than when no memory overflow occurs.

Therefore, the purpose is to prevent the occurrence of memory overflow in the server device and improve the performance of data analysis processing. In particular, the purpose is to improve the processing performance of the RDBMS by preventing the memory overflow of the server device caused by hash join in the RDBMS, and by reducing the consumption of the CPU resources of the server device and of the network resources between the server device and the storage device.

According to one aspect of the present invention, a system is provided that includes a server device that performs table join processing based on hash values, and a storage device that is connected to the server device via a network. In the system, the server device includes a first processing unit that executes the table join processing and generates a command including first information designating the data to be joined in a table, second information designating the data in the table for which hash values are to be calculated, and third information designating hash values, and a first input/output unit that transmits the command to the storage device via the network. The storage device includes a storage unit in which a plurality of tables are stored, a second input/output unit that receives the command via the network, and a second processing unit that selects, based on the first information, the data to be joined from among the data constituting the table, calculates hash values of the data designated by the second information from among the data constituting the table, and selects the hash values designated by the third information. The second input/output unit transmits to the server device, via the network, a hash table composed of the hash values selected based on the third information and the data selected based on the first information corresponding to those hash values.

More preferably, the storage device has a memory in which a temporary area is secured, and the second processing unit stores in the temporary area a hash table composed of the hash values not designated by the third information and the data selected based on the first information corresponding to those hash values.

More preferably, the command includes fourth information indicating whether or not the temporary area is required, and the second processing unit uses the fourth information to determine whether or not to store in the temporary area a hash table composed of the hash values not designated by the third information and the data selected based on the first information corresponding to those hash values.

More preferably, the second processing unit calculates a selection rate, which is the ratio of the data to be joined among the data constituting the table, the second input/output unit transmits the selection rate to the server device, and the first processing unit determines whether or not to use the temporary area based on the selection rate.

According to another aspect of the present invention, a storage apparatus is provided that is connected via a network to a server apparatus that executes table join processing based on hash values. The storage apparatus stores a plurality of tables and includes an input/output unit that receives from the server apparatus, via the network, a command including first information designating the data to be joined in a table, second information designating the data in the table for which hash values are to be calculated, and third information designating hash values, and a processing unit that selects, based on the first information, the data to be joined from among the data constituting the table, calculates hash values of the data designated by the second information from among the data constituting the table, and selects the hash values designated by the third information. The input/output unit transmits to the server apparatus, via the network, a hash table composed of the hash values selected based on the third information and the data selected based on the first information corresponding to those hash values.

According to the present invention, it is possible to prevent the occurrence of memory overflow and improve the performance of data analysis processing.

FIG. 1 is a diagram showing an example of the system configuration of the present invention.
FIG. 2 is a diagram showing an example of another system configuration of the present invention.
FIG. 3 is a diagram showing an example of the functions of the DBMS.
FIG. 4 is a diagram illustrating table area management information.
FIG. 5 is a diagram illustrating table structure information.
FIG. 6 is a diagram illustrating database statistical information.
FIG. 7 is a diagram showing an example of an SQL statement received from the terminal device of a database user.
FIG. 8 is a diagram showing an example of an SQL execution result.
FIG. 9 is a diagram illustrating an execution plan created by the execution plan creation unit.
FIG. 10 is a diagram showing an example of the communication flow between the server apparatus and the storage apparatus in the present invention.
FIG. 11 is a diagram showing an example of an outer table read command.
FIG. 12 is a diagram showing an example of the operation flow for determining the join process.
FIG. 13 is a diagram showing an example of the processing cost model used to determine the join method.
FIG. 14 is a diagram showing an example of the communication flow between the server apparatus and the storage apparatus in the present invention.
FIG. 15 is a diagram showing an example of an outer table first bucket read command.
FIG. 16 is a diagram showing an example of the command used when the storage apparatus executes data selection processing.
FIG. 17 is a diagram showing an example of the command used when the storage apparatus executes hash table creation processing.
FIG. 18 is a diagram showing an example of the command used when the storage apparatus executes processing for selecting data from a hash table and processing for storing a part of the hash table in a temporary area.
FIG. 19 is a diagram illustrating the flow of processing performed by the storage apparatus when a command is received.
FIG. 20 is a diagram illustrating the details of the database assist unit.
FIG. 21 is a diagram showing an example of the operation flow in the database assist unit upon receiving the command shown in FIG. 16.
FIG. 22 is a diagram showing an example of the operation flow in the database assist unit upon receiving the command shown in FIG.
FIG. 23 is a diagram showing an example of the operation flow in the database assist unit upon receiving the command shown in FIG.
FIG. 24 is a diagram showing an example of an outer table hash table.

In the present invention, a database assist unit that assists database processing is provided in the storage device to improve database processing efficiency. In the present invention, the storage device may be either a device separate from the server device or a storage medium connected inside the server device. The database assist unit of the present invention may be implemented with a CPU, with a hardware circuit, or with both a CPU and a hardware circuit.

The database assist unit of the present invention has a function of extracting data necessary for data analysis from the relation, calculating a hash value of the extracted data, and transmitting data having the hash value within a specific range to the server device.

In addition, the database assist unit of the present invention has a function of extracting data necessary for data analysis from the relation, calculating a hash value of the extracted data, transmitting data whose hash value falls within a specified range to the server device, and storing data whose hash value does not fall within the specified range in a temporary area of the storage device.

Further, the database assist unit of the present invention has a function of calculating a selection rate, which is the ratio of the data necessary for data analysis to all data included in the relation, and transmitting the selection rate to the server device.

In the present invention, a join method determination unit is provided that determines the method for executing the join process by using the functions of the storage device. The join method determination unit determines the method for executing the join process based on the selection rate received from the database assist unit.

The join method determination unit acquires information related to the relations to be hash joined from the execution plan creation unit and from the database statistical information included in the database. The join method determination unit determines the command to be transmitted to the storage device based on the selection rate received from the database assist unit and the information received from the database.

The join method determination unit has a cost model for calculating the processing cost. The join method determination unit uses the cost model to select the method that minimizes the processing cost and determines the command to be transmitted to the storage apparatus.

FIG. 1 is a diagram illustrating a system configuration of the present invention. As shown in FIG. 1, a storage apparatus 102 is installed inside the server apparatus 101, and the CPU 103 of the server apparatus 101, the memory 104 of the server apparatus, and the storage apparatus 102 are connected by an internal network 111 of the server apparatus 101. A DBMS 105 is stored in the memory 104. The DBMS 105 is a program executed by the CPU 103. Inside the storage apparatus 102, a CPU 106 of the storage apparatus 102, a memory 108 of the storage apparatus 102, a database assist unit 107, and a flash memory 110 are installed and connected by an internal network 111. A storage control program 109 is stored in the memory 108. The storage control program 109 is executed by the CPU 106. In the system configuration of FIG. 1, the flash memory 110 is assumed as the information storage medium, but other information storage media may be used. The flash memory 110 stores a plurality of tables.

FIG. 2 is a diagram illustrating another system configuration. As shown in FIG. 2, the present invention can also be implemented in a configuration in which the server apparatus 101 and the storage apparatus 102 are connected by an external network 201 via input/output interfaces 202 and 203.

FIGS. 1 and 2 show configurations in which the database assist unit 107 is implemented as hardware. Alternatively, the database assist unit 107 may be implemented as a program, stored in the memory 108, and executed by the CPU 106.

FIG. 3 is a diagram illustrating details of the DBMS 105. The DBMS 105 has a user transmission/reception unit 301 that transmits and receives data such as a query (a processing request for the database) transmitted from the terminal device to the server apparatus 101 including the DBMS 105 and the execution result based on the query, an SQL analysis unit 302 that analyzes the SQL statement used to describe the query, an execution plan creation unit 303 that determines the execution method of the SQL statement, an execution control unit 304 that controls execution of the SQL statement, a join method determination unit 305 that determines the join method based on the selection rate received from the storage apparatus 102 and the processing cost model, table area management information 306 that stores table storage area information, table structure information 307 that stores table structure information, database statistical information 308 that stores database statistical information, database performance information 309 that stores the performance information of the database, and a storage transmission/reception unit 310 that transmits and receives data to and from the storage apparatus 102.

FIG. 4 is a diagram illustrating the table area management information 306. The table area management information 306 stores a table area name 401, a table area start address 402, and a table area end address 403. The start address 402 and the end address 403 are addresses used for data input / output with the storage apparatus 102, and may be called logical block addresses.

FIG. 5 is a diagram illustrating the table structure information 307. The table structure information 307 stores a column name 501 and a data format 502 for each table area name 401. Here, the column name 501 is the name of the column constituting the table, and the data format 502 is the data type of the column.

FIG. 6 is a diagram illustrating the database statistical information 308. The database statistical information 308 stores the number of rows 601, the average row length 602, and the index presence/absence 603 for each table area name. Here, the number of rows 601 is the number of rows stored in the table area identified by the table area name 401, the average row length 602 is the average number of bytes per row stored in that table area, and the index presence/absence 603 is information indicating whether or not an index has been created for the corresponding table.

The database performance information 309 describes the memory capacity that can be used by the DBMS 105, the read performance of the storage apparatus 102, and the write performance of the storage apparatus 102.

FIG. 7 shows an example of an SQL statement received by the user transmission/reception unit 301 from the terminal device of the database user. The SQL statement 701 illustrated in FIG. 7 is an instruction to join the PART table and the LINEITEM table and to extract the P_NAME column of the PART table and the L_LINENUMBER column of the LINEITEM table.

FIG. 8 shows an example of the SQL execution result 801 transmitted by the user transmission / reception unit 301. In the SQL execution result 801, the result of executing the processing in the database according to the instruction of the SQL statement 701 is described.

The SQL analysis unit 302 inspects the grammar of the SQL statement 701 received from the user transmission / reception unit 301 and transmits the SQL statement 701 to the execution plan creation unit 303.

The execution plan creation unit 303 refers to the database statistical information 308 and determines an execution method of the received SQL statement 701.

FIG. 9 is a diagram illustrating an execution plan 901 created by the execution plan creation unit 303. The execution plan 901 is composed of a plurality of steps 902. By executing each step 902 in order, the processing specified in the SQL statement 701 is executed.

In step 1 of the execution plan 901, processing for reading the PART table and creating a hash table is specified. The selection condition at this time is “P_SIZE = 3”, the join column is “P_PARTKEY”, and the extracted column is “P_NAME”. In step 2 of the execution plan 901, processing for reading the LINEITEM table and creating a hash table is specified. In this case, the selection condition is “L_SUPPKEY”, and the extracted column is “L_LINENUMBER”. Step 3 of the execution plan 901 specifies that the join method determination unit 305 performs processing for determining the join method. In step 4 of the execution plan 901, processing for joining the PART table and the LINEITEM table is specified. The join condition at this time is “P_PARTKEY = L_PARTKEY”, and the extracted columns are P_NAME and L_LINENUMBER.

FIG. 10 is a flowchart showing the communication performed between the server apparatus 101 and the storage apparatus 102 in order to execute steps 1 and 2 of the execution plan 901. The outer table read command 1001 includes the address where the outer table is stored, a selection condition, a specification of the column for which a hash value is calculated, and a column extraction condition. The command 1001 and other commands are generated by the execution control unit 304. FIG. 11 is a diagram illustrating the outer table read command 1001. The start address 1101 and the end address 1102 are determined by looking up, in the table area management information 306, the addresses where the PART table is stored. The selection condition 1103 represents the selection condition “P_SIZE = 3” of the outer table, and indicates that the 123rd to 131st bytes from the top of the row data are compared with “3”. The column 1104 for calculating a hash value indicates that a hash value is calculated from the 0th to the 7th byte of the row data. The extraction condition 1105 indicates that the 8th to 63rd bytes of the row data are extracted.
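The byte-range fields of the outer table read command 1001 can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the class and function names, the inclusive interpretation of the byte ranges, the space-padded ASCII encoding of the compared field, the example addresses, and the 132-byte sample row layout are all hypothetical and are not taken from FIG. 11.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRange:
    first: int  # offset of the first byte (assumed inclusive)
    last: int   # offset of the last byte (assumed inclusive)

    def cut(self, row: bytes) -> bytes:
        return row[self.first:self.last + 1]


@dataclass(frozen=True)
class OuterTableReadCommand:
    start_address: int          # 1101
    end_address: int            # 1102
    selection_range: ByteRange  # 1103: bytes compared with selection_value
    selection_value: bytes      # 1103: value to compare against
    hash_range: ByteRange       # 1104: bytes from which the hash is calculated
    extract_range: ByteRange    # 1105: bytes extracted from the row


def apply_to_row(cmd: OuterTableReadCommand, row: bytes):
    """Return (join-key bytes, extracted bytes) if the row is selected, else None."""
    if cmd.selection_range.cut(row).strip() != cmd.selection_value:
        return None
    return cmd.hash_range.cut(row), cmd.extract_range.cut(row)


def make_row(partkey: int, name: str, size: str) -> bytes:
    """Build a hypothetical 132-byte row: key (0-7), name (8-63), filler, size (123-131)."""
    return (partkey.to_bytes(8, "big")
            + name.encode("ascii").ljust(56)
            + bytes(59)
            + size.encode("ascii").ljust(9))


# Addresses are placeholders; the byte ranges follow the description of FIG. 11.
cmd = OuterTableReadCommand(0x1000, 0x1FFF, ByteRange(123, 131), b"3",
                            ByteRange(0, 7), ByteRange(8, 63))
selected = apply_to_row(cmd, make_row(42, "bolt", "3"))
rejected = apply_to_row(cmd, make_row(7, "nut", "5"))
```

Expressing the conditions as byte offsets means the storage side does not need to know column names or data types, only where in each row to look.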

The storage apparatus 102 that has received the command executes data selection 1002 and hash value calculation 1003, and then executes transmission 1004 of the selection rate and of the outer table hash table, which is composed of the hash values and the data extracted under the extraction condition 1105 corresponding to each hash value. The data received by the transmission 1004 is stored in the memory 104. In the example of the outer table hash table in FIG. 24, the hash value calculated from P_PARTKEY is stored as the join column hash value, and the value of P_NAME of the corresponding row is stored as the extracted data. The selection rate is the ratio of the rows (or data) to be joined among all rows (or data) included in the table, that is, the ratio of the rows that meet the selection condition 1103 among all rows included in the table. The operation of the storage apparatus 102 will be described later.

The inner table read command 1211 includes the address where the inner table is stored, a selection condition, a specification of the column for which a hash value is calculated, and a column extraction condition. The storage apparatus 102 that has received the command executes data selection 1212 and hash value calculation 1213, and executes transmission 1214 of the inner table hash table and the selection rate. The data received by the transmission 1214 is stored in the memory 104. The operation of the storage apparatus 102 will be described later.

The server apparatus 101 that has received the inner table hash table and the selection rate from the storage apparatus 102 executes the determination 1215 of the join process.

FIG. 12 is a flowchart showing the operation of the join method determination unit 305 when it executes the determination 1215 of the join process. In the determination 1202, the join method determination unit 305 determines whether or not a memory overflow of the server device occurs. This is determined by comparing the memory capacity usable by the database, described in the database performance information 309, with the size of the hash table received in the transmission 1004 of the outer table hash table and the selection rate. If it is determined that a memory overflow occurs, the join method determination unit 305 calculates the number of buckets in processing 1203. The number of buckets indicates how many times larger the hash table is than the memory capacity usable by the DBMS 105. The number of buckets X is calculated as X = ⌈H / M⌉, where M is the memory capacity usable by the DBMS 105 and H is the size of the outer table hash table. Here, ⌈ ⌉ denotes rounding up to an integer. When it is determined that no memory overflow occurs, the processing of step 3 ends, and the join processing 1413 is performed using the received outer table hash table and inner table hash table.
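The overflow check and the bucket count can be illustrated with a short Python sketch. The function names and the byte units are assumptions for illustration; only the comparison of H with M and the rounding up of H / M come from the description above.

```python
def memory_overflows(hash_table_size: int, usable_memory: int) -> bool:
    """Determination 1202: overflow occurs when the hash table does not fit in memory."""
    return hash_table_size > usable_memory


def bucket_count(hash_table_size: int, usable_memory: int) -> int:
    """Processing 1203: X = ceil(H / M), computed with integer arithmetic."""
    if usable_memory <= 0:
        raise ValueError("usable_memory must be positive")
    return -(-hash_table_size // usable_memory)


# Hypothetical sizes in bytes: a 10 GiB hash table and 4 GiB of usable memory.
H = 10 * 2**30
M = 4 * 2**30
overflow = memory_overflows(H, M)
X = bucket_count(H, M)
```

With these example sizes the hash table is 2.5 times the usable memory, so it is rounded up to three buckets.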

In the join method determination 1204, the join method determination unit 305 determines the join method using the processing cost model. FIG. 13 is an example of the processing cost model 1301 used by the join method determination unit 305. The processing cost model 1301 describes a join method 1302, a Read amount 1303, and a Write amount 1304. The join method 1302 indicates, for each of the outer table and the inner table, whether or not a temporary area is used. The Read amount 1303 is the amount of data read from the storage apparatus when the corresponding method is selected. The Write amount 1304 is the amount of data written to the storage apparatus when the corresponding method is selected.

In FIG. 13, A represents the size of the outer table, and B represents the size of the inner table. These can be determined by referring to the table area management information 306. X is the number of buckets. α is the selection rate of the outer table received in the transmission 1004 of the outer table hash table and the selection rate. β is the selection rate of the inner table received in the transmission 1214 of the inner table hash table and the selection rate.

The join method determination unit 305 selects the method that minimizes the processing cost by referring to the processing cost model 1301 of FIG. 13 and to the read performance and the write performance of the storage apparatus 102 described in the database performance information 309. For example, if the ratio between the read performance and the write performance is 1:10, the value obtained by multiplying the Read amount of the processing cost model 1301 by 1 and the value obtained by multiplying the Write amount of the processing cost model 1301 by 10 are added together to calculate the processing cost of each join method.
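The weighted cost comparison can be sketched in Python as follows. This is a minimal illustration: the method names and the Read and Write amounts are placeholder numbers, not the formulas of FIG. 13, and the function names are hypothetical.

```python
def processing_cost(read_amount, write_amount, read_weight=1.0, write_weight=10.0):
    """Weighted sum of the Read amount and the Write amount (1:10 weighting by default)."""
    return read_weight * read_amount + write_weight * write_amount


def choose_join_method(methods, read_weight=1.0, write_weight=10.0):
    """Return the name of the method with the lowest processing cost.

    `methods` maps a method name to a (read_amount, write_amount) pair.
    """
    return min(
        methods,
        key=lambda name: processing_cost(*methods[name],
                                         read_weight=read_weight,
                                         write_weight=write_weight),
    )


# Placeholder amounts (arbitrary units), chosen only to exercise the comparison.
candidate_methods = {
    "no temporary area": (400.0, 0.0),
    "temporary area for both tables": (150.0, 30.0),
}
best_when_writes_are_costly = choose_join_method(candidate_methods)
best_when_writes_are_cheap = choose_join_method(candidate_methods, write_weight=1.0)
```

The example shows why the weighting matters: with the 1:10 weighting the method that writes nothing costs 400 against 450, while with equal weights the temporary area method costs 180 and becomes the cheaper choice.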

FIG. 14 is a flowchart showing the communication performed between the server apparatus 101 and the storage apparatus 102 in order to execute step 4 of the execution plan 901. The server apparatus 101 transmits an outer table first bucket read command 1401 to the storage apparatus 102.

FIG. 15 is a diagram exemplifying details of the outer table first bucket read command 1401. The outer table first bucket read command describes an outer table start address 1501, an outer table end address 1502, an outer table selection condition 1503, a column 1504 for calculating a hash value, an extraction condition 1505, temporary area storage presence/absence 1506, a temporary area start address 1507, a temporary area end address 1508, a hash value lower limit 1509, and a hash value upper limit 1510.

When the join method determination unit 305 selects use of the temporary area, “present” is written in the temporary area storage presence/absence 1506. When the join method determination unit 305 selects no use of the temporary area, “absent” is written in the temporary area storage presence/absence 1506.

The storage apparatus 102 that has received the outer table first bucket read command 1401 performs data selection 1402, hash value calculation 1403, and hash value selection 1404, and, if the temporary area storage presence/absence 1506 indicates “present”, the operation 1405 of storing the hash table in the temporary area. In the hash value selection 1404, the hash table of the data whose hash values fall within the range given by the hash value lower limit 1509 and the hash value upper limit 1510 described in the outer table first bucket read command 1401 is transmitted to the server apparatus 101 (1406). The hash table of the data not included in the hash value range is stored in the temporary area if the temporary area storage presence/absence 1506 indicates “present”, and is not stored in the temporary area if it indicates “absent”.
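The handling of the hash value range in the first bucket read can be sketched in Python as follows. The function name, the representation of the hash table as a list of (hash value, data) pairs, and the inclusive treatment of both limits are assumptions made for this illustration.

```python
def first_bucket_read(hash_table, lower_limit, upper_limit, store_in_temp_area):
    """Split a hash table into the part sent to the server and the part kept locally.

    hash_table: list of (hash_value, extracted_data) pairs.
    Returns (entries_for_server, entries_for_temp_area).
    """
    for_server = []
    for_temp_area = []
    for hash_value, data in hash_table:
        if lower_limit <= hash_value <= upper_limit:
            for_server.append((hash_value, data))      # transmission 1406
        elif store_in_temp_area:
            for_temp_area.append((hash_value, data))   # operation 1405
        # Otherwise the out-of-range entry is discarded and must be
        # regenerated from the table on a later read.
    return for_server, for_temp_area


# Hypothetical hash table with small integer hash values.
table = [(3, "bolt"), (12, "nut"), (7, "washer"), (25, "gear")]
sent, kept = first_bucket_read(table, 0, 9, store_in_temp_area=True)
sent_only, kept_none = first_bucket_read(table, 0, 9, store_in_temp_area=False)
```

Keeping the out-of-range entries lets later bucket reads start from the already selected and hashed data rather than from the full table, at the price of the extra writes that the cost model accounts for.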

The server apparatus 101 transmits an inner table first bucket read command 1407 to the storage apparatus 102. Details of the inner table first bucket read command 1407 are the same as those of the outer table first bucket read command 1401. If the join method determination unit 305 selects use of the temporary area, “present” is written in the temporary area storage presence/absence 1506. If the join method determination unit 305 selects no use of the temporary area, “absent” is written in the temporary area storage presence/absence 1506.

The server apparatus 101 performs a join process 1413 using the outer table hash table and the inner table hash table. A join process using a hash table is described in Patent Document 2 and the like.

Thereafter, the server apparatus 101 repeats the reading of the outer table and the inner table as many times as the number of buckets. However, when temporary area storage was specified in the temporary area storage presence/absence 1506 in the first reading of a table, the addresses of the temporary area are specified as the start address 1501 and the end address 1502 of the read command. In this case, there is no need to create a hash table from the outer table or the inner table again, and the necessary information can be acquired from the hash table stored in the temporary area.

The storage apparatus 102 generally executes data read and write processing. Data read is a process of transmitting data stored in the flash memory 110 to the server apparatus 101. Data write is a process of storing data received from the server apparatus 101 in the flash memory 110.

In addition to data read and write processing, the storage apparatus 102 of the present invention has functions for executing data selection processing, hash table creation processing, processing for selecting data from a hash table, and processing for storing a part of the hash table in a temporary area.

Examples of commands that the server apparatus 101 transmits to the storage apparatus 102 in order to execute these functions are shown in FIG. 16, FIG. 17, and FIG. 18. FIG. 16 shows an example of a command used when the storage apparatus 102 executes data selection processing. As shown in FIG. 16, the command includes an operation code 1601, a start address 1602 of the selection target data, an end address 1603 of the selection target data, a selection condition 1604, a column extraction condition 1605 for extracting a column, and a result storage memory address 1606, which is the memory address of the server apparatus 101 for storing the result.

FIG. 17 shows an example of a command used when the storage apparatus 102 executes hash table creation processing. As shown in FIG. 17, the command includes an operation code 1701, a start address 1702 of the selection target data, an end address 1703 of the selection target data, a selection condition 1704, a hash value calculation column 1705 designating the column for which a hash value is calculated, a column extraction condition 1706 for extracting a column, and a result storage memory address 1706, which is the memory address of the server apparatus 101 for storing the result.

FIG. 18 is an example of a command used when the storage apparatus 102 executes processing for selecting data from the hash table and processing for storing a part of the hash table in the temporary area. As shown in FIG. 18, the command includes an operation code 1812, a start address 1801 of the selection target data, an end address 1802 of the selection target data, a selection condition 1803, a hash value calculation column 1804 designating the column for which a hash value is calculated, a column extraction condition 1805 for extracting a column, temporary area storage presence/absence 1806 specifying whether or not to store data in the temporary area, a temporary area start address 1807, a temporary area end address 1808, a hash value lower limit 1809 specifying the lower limit of the hash values to be transmitted to the server apparatus 101, a hash value upper limit 1810 specifying the upper limit of the hash values to be transmitted to the server apparatus 101, and a result storage memory address 181, which is the memory address of the server apparatus 101 for storing the result.

The storage apparatus 102 that has received the command from the server apparatus 101 executes processing according to the content of the command. When the command is a read process, the storage control program 109 reads the data specified by the command from the flash memory 110 and transmits it to the server apparatus 101. If the command is a write process, the storage control program 109 stores the data received from the server in the flash memory 110.

FIG. 19 is a diagram illustrating a flow of processing executed by the storage control program 109 when the storage apparatus 102 receives a command from the server apparatus 101. In the determination 1902, the storage control program 109 determines whether the command is a read process or a write process. When the command is Read processing or Write processing, the storage control program 109 executes Read processing or Write processing in processing 1903. If the command is not Read processing or Write processing, the storage control program 109 transmits the command to the database assist unit 107 in processing 1904. In processing 1905, the storage control program 109 transmits data specified in the command to the database assist unit 107. In process 1906, the storage control program 109 transmits a process start instruction to the database assist unit 107.
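The dispatch performed by the storage control program 109 can be pictured with the following Python sketch. It is a simplified illustration only: the dictionary-based command format, the opcode strings, the dictionary used as a stand-in for the flash memory 110, and the stub class for the database assist unit 107 are all assumptions made for this example.

```python
class DatabaseAssistStub:
    """Hypothetical stand-in for the database assist unit 107; it only records calls."""

    def __init__(self):
        self.events = []

    def receive_command(self, command):
        self.events.append(("command", command["opcode"]))

    def receive_data(self, blocks):
        self.events.append(("data", len(blocks)))

    def start(self):
        self.events.append(("start",))


def handle_command(command, flash, assist):
    """Determination 1902 followed by processing 1903, or by 1904 to 1906."""
    opcode = command["opcode"]
    if opcode == "READ":                                  # processing 1903
        return flash[command["address"]]
    if opcode == "WRITE":                                 # processing 1903
        flash[command["address"]] = command["data"]
        return None
    assist.receive_command(command)                       # processing 1904
    blocks = [flash[a]
              for a in range(command["start"], command["end"] + 1)
              if a in flash]
    assist.receive_data(blocks)                           # processing 1905
    assist.start()                                        # processing 1906
    return None


flash = {0: b"a", 1: b"b"}
assist = DatabaseAssistStub()
read_result = handle_command({"opcode": "READ", "address": 1}, flash, assist)
handle_command({"opcode": "WRITE", "address": 2, "data": b"c"}, flash, assist)
handle_command({"opcode": "SELECT", "start": 0, "end": 2}, flash, assist)
```

Ordinary reads and writes never reach the assist unit in this sketch, so the existing storage path is unchanged for commands that do not need database assistance.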

FIG. 20 is a diagram illustrating details of the database assist unit 107. The database assist unit 107 includes an input data storage memory 2001 that stores data received from the storage control program 109, a command storage memory 2002 that stores commands received from the storage control program 109, a row data selection unit 2003 that selects data according to the selection condition described in the command, a column data selection unit 2005 (referred to below as the column extraction unit 2005) that extracts column data from the data in accordance with the column extraction condition described in the command, a selection rate calculation unit 2004 that calculates the ratio of data selected by the row data selection unit 2003, a hash value calculation unit 2006 that calculates a hash value of the column specified in the command, a hash value selection unit 2007 that selects hash values in the range described in the command, a server transmission data storage memory 2008 that stores data to be transmitted to the server apparatus 101, a temporary area storage memory 2009 in which the temporary area is secured, and a processing end notification unit 2010 that notifies the end of processing.

FIG. 21 is a flowchart showing the operation when the database assist unit 107 receives the command shown in FIG. 16 from the storage control program 109. In process 2102, the row data selection unit 2003 reads data from the input data storage memory 2001. In processing 2103, the row data selection unit 2003 transmits data selected from the data read in processing 2102 to the column extraction unit 2005 according to the selection condition. In processing 2104, the column extraction unit 2005 extracts column data according to the column extraction condition and stores it in the server transmission data storage memory 2008. In processing 2105, the row data selection unit 2003 transmits the number of read data and the number of selected data to the selection rate calculation unit 2004. In processing 2106, the selection rate calculation unit 2004 calculates the selection rate and stores it in the server transmission data storage memory 2008. Here, the selection rate is a value obtained by dividing the number of selected data by the number of read data. In process 2107, the process end notification unit 2010 notifies the storage control program 109 of the process end. The storage control program 109 that has received the notification of the completion of processing transmits the data stored in the server transmission data storage memory 2008 to the server apparatus 101.
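A minimal sketch of the FIG. 21 flow, assuming rows are modelled as dictionaries and the selection condition as a Python predicate (both are assumptions of this example, as are the function and variable names):

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, int]


def select_and_project(
    rows: Sequence[Row],
    predicate: Callable[[Row], bool],
    columns: Sequence[str],
) -> Tuple[List[Row], float]:
    """Apply row selection, column extraction, and selection rate calculation."""
    selected = [row for row in rows if predicate(row)]                # processing 2103
    projected = [{c: row[c] for c in columns} for row in selected]    # processing 2104
    # Selection rate = number of selected rows / number of rows read (processing 2106).
    # Returning 0.0 for empty input is an assumption; the text does not cover that case.
    rate = len(selected) / len(rows) if rows else 0.0
    return projected, rate
```

For example, if two of four rows satisfy the condition, the selection rate is 0.5, and only the extracted columns of those two rows are placed in the data returned to the server apparatus 101.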

FIG. 22 is a flowchart showing the operation when the database assist unit 107 receives the command shown in FIG. 17 from the storage control program 109. In processing 2202, the row data selection unit 2003 reads data from the input data storage memory 2001. In processing 2203, the row data selection unit 2003 transmits data selected from the data read in processing 2202 to the column extraction unit 2005 according to the selection condition. In processing 2204, the column extraction unit 2005 extracts column data according to the column extraction condition and transmits the column data to the hash value calculation unit 2006. In processing 2205, the row data selection unit 2003 transmits the number of read data and the number of selected data to the selection rate calculation unit 2004. In processing 2206, the selection rate calculation unit 2004 calculates the selection rate and stores it in the server transmission data storage memory 2008. In processing 2207, the hash value calculation unit 2006 calculates a hash value in accordance with the designation of the column for calculating the hash value, and stores the calculated hash value and column data in the server transmission data storage memory 2008. Here, the calculation of the hash value is an operation for dividing the column data by a preset value and calculating a remainder. In process 2208, the process end notification unit 2010 notifies the storage control program 109 of the process end. The storage control program 109 that has received the notification of the completion of processing transmits the data stored in the server transmission data storage memory 2008 to the server apparatus 101.
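The hash calculation described above, a remainder after division by a preset value, can be sketched as follows. Integer column values and the function names are assumptions of this example; the description does not state the data type or the preset value.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]


def hash_value(column_value: int, divisor: int) -> int:
    """Remainder of the column value divided by a preset divisor."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    # For a positive divisor, Python's % always returns a value in [0, divisor).
    return column_value % divisor


def attach_hashes(rows: Sequence[Row], hash_column: str, divisor: int) -> List[Tuple[int, Row]]:
    """Pair each row with the hash value of its designated column (processing 2207)."""
    return [(hash_value(row[hash_column], divisor), row) for row in rows]
```

With a divisor of 4, a column value of 7 yields hash value 3 and a column value of 10 yields hash value 2, so the divisor also fixes the number of distinct hash values.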

FIG. 23 is a flowchart showing the operation when the database assist unit 107 receives the command shown in FIG. 18 from the storage control program 109. In processing 2302, the row data selection unit 2003 reads data from the input data storage memory 2001. In processing 2303, the row data selection unit 2003 transmits data selected from the data read in processing 2302 to the column extraction unit 2005 according to the selection condition. In processing 2304, the column extraction unit 2005 extracts column data according to the column extraction condition and transmits the column data to the hash value calculation unit 2006. In processing 2305, the row data selection unit 2003 transmits the number of read data and the number of selected data to the selection rate calculation unit 2004. In processing 2306, the selection rate calculation unit 2004 calculates the selection rate and stores it in the server transmission data storage memory 2008. In processing 2307, the hash value calculation unit 2006 calculates a hash value in accordance with the designation of the column for calculating the hash value, and transmits the calculated hash value and column data to the hash value selection unit 2007. Here, the calculation of the hash value is an operation for dividing the column data by a preset value and calculating a remainder. In processing 2308, the hash value selection unit 2007, according to the upper and lower limits of the hash value, stores the data to be transmitted to the server apparatus 101 and its hash values in the server transmission data storage memory 2008, and stores the data not to be transmitted to the server apparatus 101 and its hash values in the temporary area storage memory 2009. In processing 2309, the process end notification unit 2010 notifies the storage control program 109 of the process end.
The storage control program 109 that has received the notification of the completion of processing transmits the data stored in the server transmission data storage memory 2008 to the server apparatus 101. When the temporary area storage presence/absence 1806 of the command received from the server apparatus 101 specifies that data is to be stored in the temporary area, the storage control program 109 that has received the processing completion notification stores the data held in the temporary area storage memory 2009 in the range of addresses designated by the temporary area start address 1807 and the temporary area end address 1808.
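The split performed according to the hash value limits can be sketched as below. Two points are assumptions of this example rather than statements of the description: the limits are treated as inclusive, and rows outside the range are kept only when temporary area storage is enabled in the command. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]
HashedRow = Tuple[int, Row]


def split_by_hash_range(
    hashed_rows: Sequence[HashedRow],
    lower: int,
    upper: int,
    store_in_temporary_area: bool,
) -> Tuple[List[HashedRow], List[HashedRow]]:
    """Divide hashed rows into a server-bound part and a temporary-area part."""
    to_server: List[HashedRow] = []
    to_temporary: List[HashedRow] = []
    for h, row in hashed_rows:
        if lower <= h <= upper:            # within hash value limits 1809 to 1810 (assumed inclusive)
            to_server.append((h, row))
        elif store_in_temporary_area:      # temporary area storage presence/absence 1806
            to_temporary.append((h, row))
    return to_server, to_temporary
```

Only the first list corresponds to what is returned to the server apparatus 101, which is how the amount of hash table data held in server memory is bounded by the chosen hash value range.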

101 ... Server device 102 ... Storage device 103 ... CPU
104 ... Memory 105 ... Database 106 ... CPU
107 ... Database assist unit 108 ... Memory 109 ... Storage control program 110 ... Flash memory 111 ... Internal network

Claims

A system comprising a server device that executes table join processing based on hash values, and a storage device connected to the server device via a network, wherein
the server device includes:
a first processing unit that executes the table join processing and generates a command including first information designating data to be joined in a table, second information designating data in the table for which a hash value is to be calculated, and third information designating hash values; and
a first input/output unit that transmits the command to the storage device via the network,
the storage device includes:
a storage unit that stores a plurality of tables;
a second input/output unit that receives the command via the network; and
a second processing unit that, based on the first information, selects the data to be joined from the data constituting the table, calculates the hash value of the data designated by the second information among the data constituting the table, and selects the hash values designated by the third information, and
the second input/output unit transmits, to the server device via the network, a hash table composed of the hash values selected based on the third information and the data that corresponds to those hash values and was selected based on the first information.
The system of claim 1, wherein
the storage device has a memory in which a temporary area is secured, and
the second processing unit generates a temporary table including the hash values not designated by the third information and the data that corresponds to those hash values and was selected based on the first information, and stores the temporary table in the temporary area.
The system of claim 2, wherein
the command includes fourth information related to the necessity of using the temporary area, and
the second processing unit determines, based on the fourth information, whether or not to store in the temporary area the temporary table composed of the hash values not designated by the third information and the data that corresponds to those hash values and was selected based on the first information.
The system according to claim 3, wherein
the second processing unit calculates a selection rate that is the ratio of the data to be joined to the data constituting the table,
the second input/output unit transmits the selection rate to the server device, and
the first processing unit determines whether to use the temporary area based on the selection rate.
A storage device connected via a network to a server device that performs table join processing based on hash values, the storage device comprising:
a storage unit that stores a plurality of tables;
an input/output unit that receives, from the server device via the network, a command including first information designating data to be joined in a table, second information designating data in the table for which a hash value is to be calculated, and third information designating hash values; and
a processing unit that, based on the first information, selects the data to be joined from the data constituting the table, calculates the hash value of the data designated by the second information among the data constituting the table, and selects the hash values designated by the third information, wherein
the input/output unit transmits, to the server device via the network, a hash table composed of the hash values selected based on the third information and the data that corresponds to those hash values and was selected based on the first information.
The storage device according to claim 5, further comprising
a memory in which a temporary area is secured, wherein
the processing unit stores, in the temporary area, a hash table composed of the hash values not designated by the third information and the data that corresponds to those hash values and was selected based on the first information.
The storage device according to claim 6, wherein
the command includes fourth information related to the necessity of using the temporary area, and
the processing unit determines, based on the fourth information, whether or not to store in the temporary area the hash table composed of the hash values not designated by the third information and the data that corresponds to those hash values and was selected based on the first information.
The storage device according to claim 7, wherein
the processing unit calculates a selection rate that is the ratio of the data to be joined to the data constituting the table, and
the input/output unit transmits the selection rate to the server device.
A database management method for a server device that executes table join processing based on hash values and a storage device, wherein
the server device:
executes the table join processing;
generates a command including first information designating data to be joined in a table, second information designating data in the table for which a hash value is to be calculated, and third information designating hash values; and
transmits the command to the storage device via a network, and
the storage device:
receives the command;
selects, based on the first information, the data to be joined from the data constituting the table stored in the storage device;
calculates the hash value of the data designated by the second information among the data constituting the table;
selects the hash values designated by the third information; and
transmits, to the server device, a hash table composed of the hash values selected based on the third information and the data that corresponds to those hash values and was selected based on the first information.
The database management method according to claim 9, wherein
the storage device stores, in a temporary area, a hash table composed of the hash values not designated by the third information and the data that corresponds to those hash values and was selected based on the first information.
The database management method according to claim 10, wherein
the command includes fourth information related to the necessity of using the temporary area, and
the storage device determines, based on the fourth information, whether or not to store in the temporary area the hash table composed of the hash values not designated by the third information and the data that corresponds to those hash values and was selected based on the first information.
The database management method according to claim 11, wherein
the storage device:
calculates a selection rate that is the ratio of the data to be joined to the data constituting the table; and
transmits the selection rate to the server device, and
the server device determines whether or not to use the temporary area based on the selection rate.
A server device that performs table join processing based on hash values, the server device comprising:
a first processing unit that executes the table join processing and generates a command including first information designating data to be joined in a table, second information designating data in the table for which a hash value is to be calculated, and third information designating hash values;
a storage device; and
a memory that stores data received from the storage device, wherein
the storage device includes:
a storage unit that stores a plurality of tables;
a second processing unit that, based on the first information included in the command, selects the data to be joined from the data constituting the table, calculates the hash value of the data designated by the second information among the data constituting the table, and selects the hash values designated by the third information; and
a third processing unit that transmits, to the memory, a hash table composed of the hash values selected based on the third information and the data that corresponds to those hash values and was selected based on the first information.