WO2015190007A1 - Information processing device, computer system, and data processing method therefor - Google Patents

Information processing device, computer system, and data processing method therefor Download PDF

Info

Publication number
WO2015190007A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
update
database
hierarchy
unit
Prior art date
Application number
PCT/JP2014/077235
Other languages
French (fr)
Japanese (ja)
Inventor
義文 藤川
本村 哲朗
忠幸 松村
渡辺 聡
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to PCT/JP2014/065671 priority Critical patent/WO2015189970A1/en
Priority to JPPCT/JP2014/065671 priority
Application filed by 株式会社日立製作所 filed Critical 株式会社日立製作所
Publication of WO2015190007A1 publication Critical patent/WO2015190007A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06QDATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Exchange, e.g. stocks, commodities, derivatives or currency exchange

Abstract

Conventionally, consideration has been given only to speeding up the heap structure itself, so in continuous session processing, for example, there has been no sufficient mechanism for operating contract determination processing and heap processing in parallel. The present invention provides a highest priority data determination signal from a database update unit to a database update determination unit, outputs notice that the highest priority data has been determined at the moment it is determined during database update processing, and carries out database update determination processing using that highest priority data. This makes it possible to operate the database update unit and the database update determination unit in parallel and improve processing speed.

Description

Information processing apparatus, computer system, and data processing method thereof

The present invention relates to an information processing apparatus that performs sorting processing at high speed, a computer system including the information processing apparatus, and a data processing method in the apparatus or system.

In stock trading, while the market is open, contract processing is performed by a method called zaraba processing. In this process, an order that has not yet been filled is called a board. Boards are prioritized: among sell orders, the one with the lower price has priority, and at the same price the earlier order has priority; among buy orders, the one with the higher price has priority, and at the same price the earlier order has priority. When there is a new sell order, its price is compared with the price of the highest priority board among the buy boards; if the sell price is the lower, the contract is executed and that highest priority board is deleted. If not, the sell order is stored as a board. Similarly, when there is a new buy order, its price is compared with the price of the highest priority board among the sell boards; if the buy price is the higher, the contract is executed and that highest priority board is deleted. If not, the buy order is stored as a board.

As described above, in zaraba processing, the boards are always kept sorted in priority order, and while the highest priority information is referenced, the highest priority board is deleted or a new board is added.

In recent years, automated orders placed by machines, known as algorithmic trading, have become common, and the number of orders processed per unit time has increased dramatically. It is therefore necessary to speed up zaraba processing, and to speed up contract determination, the board sorting process must be made faster.

As a method for speeding up the sorting process, there is the heap structure, a kind of binary tree, shown in Non-Patent Document 1. Based on this method, the apparatuses described in Patent Document 1 and Patent Document 2 execute addition and deletion of data at high speed.

As shown in FIG. 7, the heap structure has one top priority data node in the first layer. Each node is associated with at most two nodes in the next lower hierarchy. The second hierarchy has up to two nodes, and the third hierarchy has up to four nodes. Each node is assigned an address in order from 1. As shown in FIG. 8, when the address of the upper node 81 is A, the addresses of the lower nodes 82 and 83 are (2A) and (2A + 1). The upper node always has higher priority than the lower two nodes. There is no ordering between the two lower nodes.
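The addressing rule above can be sketched in a few lines. This is an illustrative model, not the patent's hardware implementation, and the helper names are invented:

```python
# 1-indexed heap addressing as described above: the node at address A has
# its two lower nodes at addresses 2A and 2A+1, and layer k can hold up
# to 2**(k-1) nodes.

def children(a):
    """Addresses of the two lower nodes of the node at address a."""
    return 2 * a, 2 * a + 1

def parent(a):
    """Address of the upper node (address 1 is the top-priority node)."""
    return a // 2

def layer(a):
    """1-based hierarchy (layer) that address a belongs to."""
    return a.bit_length()
```

For example, `children(1)` gives the two second-layer addresses (2, 3), matching the relationship of FIG. 8.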

In Non-Patent Document 1 and Patent Document 2, the valid nodes always occupy consecutive addresses starting from 1. In contrast, Patent Document 1 does not always use nodes with consecutive addresses: since the numbers of valid nodes in the left and right subtrees below a node may differ, the difference between these numbers is stored and managed in each node.

In Patent Document 2, to speed up the per-layer processing that Non-Patent Document 1 performs sequentially, each node stores which of its two lower nodes has the higher priority.

Japanese Patent No. 3905221 Japanese Patent No. 4391464

J. W. J. Williams, "ALGORITHM 232 HEAPSORT", Communications of the ACM, Volume 7, Number 6, pp. 347-348, June 1964.

Patent Document 1 and Patent Document 2 describe only speeding up the heap structure, treating node addition and deletion as independent events. In zaraba processing, by contrast, a contract determination is first made from the existing highest priority data and the new input data, and only then is the heap operation, deletion or addition, determined. Because of this difference, there is a problem that the mechanism for operating the contract determination process and the heap process in parallel is insufficient.

In order to solve the above problems, the present invention provides:
In an information processing apparatus including a database that manages data in order of priority,
A database update unit for performing update of data addition or deletion in the database;
A database update determination unit that instructs the database update unit to update the data to the database;
From the database update unit to the database update determination unit, a confirmation notification of the highest priority data and the highest priority data itself are output,
A database update instruction and update data are output from the database update determination unit to the database update unit,
The database update determination unit
When receiving the confirmation notification of the highest priority data and the highest priority data, an update determination process is executed,
The database update unit
Updating the database based on the database update instruction and the update data;
When the highest priority data is confirmed, the confirmation notification of the highest priority data is output,
Data updating other than the highest priority data is sequentially executed.

Since the database update unit and the database update determination unit can operate in parallel, the processing speed of the entire system is improved several to ten times as compared with the case where the database update unit and the database update determination unit do not operate in parallel.

FIG. 1 is a diagram showing a stream data processing apparatus according to the first embodiment. FIG. 2 is a diagram showing an example of a system according to the third embodiment. FIG. 3 is a diagram showing a zaraba processing apparatus of the third embodiment that can process multiple brands. FIG. 4 is a diagram showing a zaraba processing apparatus of the third embodiment that processes one brand. FIG. 5 is a diagram showing the database update unit and database of the first embodiment. FIG. 6 is a diagram showing the database update unit and database of the second embodiment. FIG. 7 is a diagram showing a heap structure. FIG. 8 is a diagram showing the relationship between nodes of a heap structure. FIG. 9 is a diagram showing the processing of a pipeline stage that performs addition to a heap structure. FIG. 10 is a diagram showing the addresses of the series of nodes referred to when performing addition processing. FIG. 11 is a diagram showing the addresses of the series of nodes referred to when adding the 13th node. FIG. 12 is a diagram showing the start of deletion processing when no addition processing is in the pipeline. FIG. 13 is a diagram showing the processing of a pipeline stage that performs deletion processing. FIG. 14 is a diagram showing the start of deletion processing when addition processing is in the pipeline. FIG. 15 is a diagram showing the hardware configuration of a computer system that performs import processing. FIG. 16 is a diagram showing the logical configuration of the computer system.

FIG. 1 shows a stream data processing apparatus according to the first embodiment. The database 15 is a device that stores data managed in order of priority. The database update unit 14 is a device that updates the database 15 according to instructions from the output calculation and database update determination unit 12. The data receiving unit 11 is a device that receives data. The output calculation and database update determination unit 12 is a unit that determines database operations using the data received from the data receiving unit 11 and the highest priority data received from the database update unit 14. The result output unit 13 is a device that outputs the result calculated by the output calculation and database update determination unit 12. The highest priority data determination signal 16 and the highest priority data signal 17 are connected from the database update unit 14 to the output calculation and database update determination unit 12. The data update instruction signal 18 and the additional update data signal 19 are connected from the output calculation and database update determination unit 12 to the database update unit 14.

The database operations are deletion of the highest priority data, addition of new data, and change of the highest priority data. Here, a change of the highest priority data updates its attached information without changing its priority, and does not alter the database as a whole. Since it merely changes one node and the method is obvious, its description is omitted below.

The database 15 need not be a heap structure, as long as it is configured so that the highest priority data comes first and the second and subsequent priority data follow. When updating, the database update unit 14 first finalizes the highest priority data and then updates the second and subsequent priority data. The heap structure is the lightest-weight option for this; a structure arranged linearly in priority order is also conceivable. The description below uses a heap structure.

When the highest priority data has been confirmed, that is, when the operation in the first layer update unit 1401 (FIG. 5) is completed, the database update unit 14 outputs the highest priority data confirmation signal 16 and at the same time outputs the highest priority data on the highest priority data signal 17. On receiving the highest priority data confirmation signal 16, the output calculation and database update determination unit 12 uses the data on the highest priority data signal 17 and the data from the data receiving unit 11 to calculate the data to be output and the database operation, and outputs the data update instruction signal 18 and the additional update data signal 19.

On receiving the data update instruction signal 18 and the additional update data signal 19, the database update unit 14 first determines the highest priority data and outputs the highest priority data determination signal 16 and the highest priority data signal 17. It then updates the data after the highest priority data sequentially. When this update processing is pipelined, an update triggered by a previous update instruction signal and an update triggered by a new update instruction signal may be performed simultaneously.

Next, a specific embodiment of the database update unit will be described with reference to FIG. 5. The database 15 is divided, for each layer of the heap structure, into storage elements from the first layer storage element 1501 to the nth layer storage element 1505. Correspondingly, the database update unit 14 is divided into update units for each layer, from the first layer update unit 1401 to the nth layer update unit 1405, together with an all-layer control unit 1400 that controls all layers. A pipeline is thus formed across the layers, in order from the first layer. The update unit of each layer includes an operation mode register 14011, an additional data register 14012, an operation target address register 14013, and a final storage address register 14014. The all-layer control unit 1400 includes a total valid node number register 14001 and a stored node number register 14002. The total valid node number register 14001 manages the total number of node data already stored in the database plus node data not yet stored that exists in the first layer update unit 1401 through the nth layer update unit 1405. The stored node number register 14002 manages the largest address among the nodes stored in the database 15.

At the time of an addition operation, the total valid node number register 14001 is incremented by 1, the operation mode register 14011 of the first layer update unit 1401 is set to the addition mode, the data to be added is stored in the additional data register 14012, the operation target address register 14013 is set to 1, the value of the total valid node number register 14001 is stored in the final storage address register 14014, and the pipeline is started.

The operation in each layer at the time of addition will be described with reference to FIG. 9. If the value of the operation target address register 913 is equal to the value of the final storage address register 914, the data of the additional data register 912 is stored in the target node 920 indicated by the operation target address register 913, the stored node number register 14002 is incremented by 1, and the pipeline operation is terminated. If the value of the operation target address register 913 is smaller than the value of the final storage address register 914, the data of the target node 920 indicated by the operation target address register 913 is compared with the data of the additional data register 912; the data with the higher priority is stored in the target node 920, and the data with the lower priority is stored in the additional data register 912. Then the operation target address register 913 is updated as shown below, the registers 911 to 914 are passed to the registers of the next stage, and the pipeline advances. As shown in FIG. 10, the operation target address register 913 is updated sequentially for each layer. Here, "[X]" is the Gaussian symbol indicating the largest integer not exceeding X. FIG. 11 shows the values of the operation target address register 913 in each layer, taking as an example the case where the value of the final storage address register 914 is 13.
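The per-layer address sequence and compare-and-carry behavior described above can be modeled in software. The following is a sequential sketch, not the pipelined hardware; names are invented, and a smaller value is taken to mean higher priority:

```python
def insertion_path(final_addr):
    """Addresses visited layer by layer (top first) when adding a node
    whose final storage address is final_addr; the layer-k address is
    [final_addr / 2**(depth - k)], with [X] the Gaussian symbol."""
    depth = final_addr.bit_length()
    return [final_addr >> (depth - k) for k in range(1, depth + 1)]

def add_node(heap, value):
    """Sequential model of the layer-by-layer addition: walk the path to
    the final storage address, keep the higher-priority (smaller) value
    at each visited node, and carry the lower-priority value downward.
    heap is a 1-indexed list (heap[0] is unused)."""
    heap.append(value)               # reserve the final storage address
    final = len(heap) - 1
    carry = value
    for addr in insertion_path(final)[:-1]:
        if carry < heap[addr]:       # carried value has higher priority
            heap[addr], carry = carry, heap[addr]
    heap[final] = carry              # the lower-priority survivor lands here
```

For a final storage address of 13, `insertion_path(13)` yields [1, 3, 6, 13], the sequence of FIG. 11.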

The operation at the start of a delete operation is divided into two cases, depending on the state of the pipeline when the delete pipeline starts. If no layer in the pipeline is performing an addition operation, the first layer data is deleted (invalidated) as in Non-Patent Document 1, and the value of the node with the largest address among the stored valid nodes is used as the data of the first layer. That is, as shown in FIG. 12, the data 1506 indicated by the stored node number register 14002 is read and set in the additional data register 14012 of the first layer. The operation mode register 14011 is set to the delete mode, and the operation target address register 14013 is set to 1. Then the values of the stored node number register 14002 and the total valid node number register 14001 are each decreased by one, and the operations of the following pipeline layers are performed.

The operation in each layer at the time of deletion will be described with reference to FIG. 13. The values of the two child nodes 930 and 931 derived from the value of the operation target address register 913 are compared, and the one with the higher priority is then compared with the value of the additional data register 912. The data with the highest priority as a result is stored in the target node 920 indicated by the operation target address register 913. If the value stored in the target node 920 is that of the additional data register 912, the deletion pipeline is terminated at this point.

If the value stored in the target node 920 came from either child node 930 or child node 931, the value of the operation target address register 913 is updated to the address of that child node, and the operation moves to the next layer.
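As a software model of the per-layer deletion above (invented names, smaller value = higher priority, no pipelining), the whole operation amounts to the following sift-down:

```python
def delete_top(heap):
    """Delete and return the highest priority data.  The value of the
    node with the largest address replaces the top, then at each layer
    it is compared with the higher-priority child and pushed down until
    it wins.  heap is a 1-indexed list (heap[0] is unused)."""
    top = heap[1]
    carry = heap.pop()               # node with the largest address
    if len(heap) > 1:
        addr = 1
        while True:
            left, right = 2 * addr, 2 * addr + 1
            if left >= len(heap):    # no child nodes in the next layer
                break
            child = left             # pick the higher-priority child
            if right < len(heap) and heap[right] < heap[left]:
                child = right
            if heap[child] < carry:  # child wins: pull it up, descend
                heap[addr] = heap[child]
                addr = child
            else:
                break                # carried value wins: stop here
        heap[addr] = carry
    return top
```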

The operation at the start of a delete operation when some layer in the pipeline is performing an addition operation will be described with reference to FIG. 14. Among the layers performing addition operations, the additional data 14062 of the shallowest layer is set in the additional data register 14012 of the first layer instead of the final stored data, and the pipeline operation of the layer that was performing the addition is stopped. The operation mode register 14011 is set to the delete mode, and the operation target address register 14013 is set to 1. Then the value of the total valid node number register 14001 is decreased by one; the stored node number register 14002 is not changed. The per-layer deletion operations described above are then performed.

In this embodiment, when the node of the first hierarchy is determined, the highest priority data determination signal and the highest priority data signal are output from the database update unit 14 to the output calculation and database update determination unit 12, and the data update instruction signal and the additional update data signal are output from the output calculation and database update determination unit 12 to the database update unit 14. The database update unit 14 and the output calculation and database update determination unit 12 can therefore process in parallel, increasing the processing speed of the entire system.

In addition, the above-described pipeline operation allows successive heap structure update operations to overlap without sacrificing speed. As a result, the heap structure database can be updated at a high processing rate.

Patent Document 1 and Patent Document 2 also require an additional storage element at each node for their speedup. Since the number of boards in zaraba processing can range from tens of thousands to several million, the cost of these additional storage elements is a problem. In contrast, the present embodiment provides storage elements per heap hierarchy, so the number of storage elements can be reduced compared with the conventional techniques that provide one per node.

In this embodiment, to speed up heap structure processing without requiring additional storage elements, a pipeline is formed for each layer of the heap structure. When the highest priority data is deleted and no addition processing is on the pipeline, the data of the valid node with the largest address is extracted, as in Non-Patent Document 1 and Patent Document 2, and compared and exchanged sequentially with the nodes of the second and lower layers. When the highest priority data is deleted while addition processing is on the pipeline, that addition is interrupted, and its data to be added is used in place of the data of the node with the largest address, being compared and exchanged in order with the nodes of the second and lower layers.

FIG. 6 is obtained by replacing the storage elements of the lower layers in FIG. 5 with a cache 621 and a double-data-rate SDRAM (hereinafter, DDR SDRAM) 623. In this case, updates of the lower layers are slightly delayed, but the configuration is effective when handling an enormous number of nodes that cannot all be mounted in a Field Programmable Gate Array (hereinafter, FPGA). Depending on the timing of add and delete operations, an add operation may be canceled midway, which reduces accesses to the DDR SDRAM 623. Processing can thereby continue without sacrificing processing speed.

FIG. 2 is a diagram showing the entire zaraba processing system using the present invention. The present invention is implemented in the FPGA 26 of FIG. 2. Order data and execution results are input and output through the NIC 25. Certain specific brands with a large number of orders are processed using the FPGA 26 and the DDR SDRAM 27, while the other brands are processed using the CPU 21, the DDR SDRAM 22, the I/F 23, and the Storage 24.

FIG. 3 shows a functional block diagram of this zaraba processing unit. An order input through the NIC 25 is received by the new order receiving unit 30, which uses the brand-specific process sorting unit 31 and the specific brand register 32 to sort orders by brand to their processing destinations. The specific brands are processed by the specific brand contract processing systems 100, 101, and 102, and the other brands are processed by the other brand contract processing system 103. The processing results are collected in the contract result output unit 33 and output through the NIC 25.

The other brand contract processing system 103 in FIG. 3 is processed using elements 20 to 24 of FIG. 2, while 100, 101, and 102 in FIG. 3 are implemented in the FPGA 26 of FIG. 2.

FIG. 4 shows the internal structure of the specific brand 1 contract processing system 100. The specific brand order receiving unit 110 in FIG. 4 corresponds to the data receiving unit 11 in FIG. 1. The specific brand execution determination and board information update determination unit 120 in FIG. 4 corresponds to the output calculation and database update determination unit 12 in FIG. 1. The specific brand execution result output unit 130 in FIG. 4 corresponds to the result output unit 13 in FIG. 1. The selling board information update unit 140 and the selling board information database 150 in FIG. 4 correspond to the database update unit 14 and the database 15 in FIG. 1, and likewise the buying board information update unit 141 and the buying board information database 151 in FIG. 4 correspond to the database update unit 14 and the database 15 in FIG. 1.

That is, the specific brand 1 contract processing system 100 has two databases. When a new order is received, the specific brand execution determination and board information update determination unit 120 selectively uses the two databases depending on whether the new order is a sell order or a buy order.

If the new order is a sell order, the contract is determined using the highest priority data of the buying board information. If the contract is made, the highest priority board is deleted from the buying board information database 151; if not, the new order is added as a board to the selling board information database 150.

Conversely, if the new order is a buy order, the contract is determined using the highest priority data of the selling board information. If the contract is made, the highest priority board is deleted from the selling board information database 150; if not, the new order is added as a board to the buying board information database 151.
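The two-database usage described above can be illustrated with a small software model using two priority queues. This is a sketch, not the patent's FPGA logic; it assumes execution also occurs when prices are equal, and encodes the price-then-arrival priority of zaraba processing in the heap key:

```python
import heapq

def process_order(side, price, seq, sell_board, buy_board):
    """side is 'sell' or 'buy'; seq is the arrival order, used as a
    tiebreaker so the earlier order has priority at the same price.
    sell_board keeps the lowest price first; buy_board keeps the
    highest price first (prices negated).  Returns 'executed' or
    'stored'."""
    if side == 'sell':
        if buy_board and -buy_board[0][0] >= price:   # best bid covers ask
            heapq.heappop(buy_board)                  # delete top board
            return 'executed'
        heapq.heappush(sell_board, (price, seq))      # store as a board
        return 'stored'
    else:
        if sell_board and sell_board[0][0] <= price:  # best ask is covered
            heapq.heappop(sell_board)                 # delete top board
            return 'executed'
        heapq.heappush(buy_board, (-price, seq))      # store as a board
        return 'stored'
```

A sell order at 100 with no boards present is stored; a later buy order at 101 then executes against it, deleting the highest priority sell board.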

The signal lines in FIG. 4 between the specific brand execution determination and board information update determination unit 120 and the selling board information update unit 140 and buying board information update unit 141 consist of two sets of the signals between the output calculation and database update determination unit 12 and the database update unit 14 in FIG. 1: the highest priority data determination signal 16, the highest priority data signal 17, the data update instruction signal 18, and the additional update data signal 19. These signals allow the contract determination and the database updates to operate in parallel, enabling high-speed contract determination processing.

The high-speed sorting processing by the FPGA described in the first to third embodiments contributes to the speeding up of the DBMS (Database Management System). In this embodiment, data import from a row-oriented DBMS to a column-oriented DBMS will be described as an example.

<Hardware configuration of computer system>

FIG. 15 is a diagram showing the hardware configuration of a computer system that performs import processing. The computer system includes an import source computer and an import destination computer, each connected to the network 91501 via its NIC 25. The components of both computers are the same as in FIG. 2. Note that the FPGA 26 and the DDR SDRAM 27 may exist in only one of the computers, and the DDR SDRAM may be replaced with another semiconductor memory. In this embodiment, when the distinction does not matter, the DDR SDRAM is simply called "memory".

For example, the FPGA 26 and the memory 27 may be mounted at the following positions.
    • An add-on card inserted into the computer using PCI-Express or the like.
    • A flash memory card inserted into the computer using PCI-Express or the like.
    • A controller inside the Storage 24.
    • An HDD or SSD, if the Storage 24 includes a plurality of HDDs or SSDs.
    • The motherboard, if the computer is a PC.
    • A blade, if the computer is a blade server.
    • The NIC 25, the I/F 23, or the CPU 21.

<Logical configuration of computer system>

FIG. 16 is a diagram showing a logical configuration of the computer system.

<<< Import source >>>

First, the import source will be described.

The computer system includes a row-oriented DBMS 16110, and the DBMS 16110 performs DBMS processing by accessing the storage resource 16120. The entity of the storage resource 16120 is the memory 22 or the Storage 24 of the import source computer. The row-oriented DBMS 16110 is generated when the CPU 21 of the import source computer executes a row-oriented DBMS program file (not shown). The storage resource 16120 stores a table 16121 managed by the row-oriented DBMS.

<< Export file >>

The export file 16130 includes export data 16131 in which the row-oriented DBMS 16110 writes all or part of the contents of the table 16121. In this figure, it is shown that the export data 16131 includes a plurality of records (four records this time) having the birthplace, age, and unique ID as attribute values. A feature of the export data 16131 written by the row-oriented DBMS 16110 is that all attribute values included in the record are stored together. In other words, it can be said that this is a data storage format in which storage address continuity between attribute values in a record is given priority over storage address continuity of a predetermined attribute value between records.

Storage address continuity here means that a plurality of values are stored at consecutive addresses, or, if the attribute values carry management information such as IDs, that a plurality of values including that management information are stored at consecutive addresses. When storage address continuity is maintained in this way, transfer efficiency from the Storage 24 improves, and even transfers from the memory 22 are efficient because burst transfer becomes possible.

Note that the export file 16130 only needs to be temporarily stored in at least the import source computer or the import destination computer; the storage location is not limited, and a storage format other than a file may be used.

<<< Import destination >>>

Next, the import destination will be described. The computer system includes a column-oriented DBMS 16210, which performs DBMS processing by accessing the storage resource 16220. The entity of the storage resource 16220 is the memory 22 or the Storage 24 of the import destination computer. The column-oriented DBMS 16210 is generated when the CPU 21 of the import destination computer executes a column-oriented DBMS program file (not shown). The storage resource 16220 stores tables 16221 to 16222 managed by the column-oriented DBMS 16210.

<<< Import File and Column Oriented DBMS >>>

The import file 16230 includes import data 16231, which is the data that the column-oriented DBMS 16210 adds to the table 16222. A feature of how the column-oriented DBMS 16210 uses the storage resource 16220 is that the records of a table are decomposed by attribute, and the attribute values are stored together for each attribute. In other words, storage address continuity of a predetermined attribute value across records is given priority over storage address continuity between the attribute values within a record. Merits of this usage include improved query processing speed for queries that touch only some of a record's attributes, and good deduplication and compression efficiency.

Because the column-oriented DBMS 16210 has the above-described characteristics, the import data 16231 is likewise stored so that address proximity (or continuity) of a predetermined attribute value across records takes priority over address proximity (or continuity) between the attribute values within a record. In the example of FIG. 16, since the records of the table have "BornPlace" and "Age" as attributes, the attribute values are stored per attribute in the form of dictionary data such as 16231B and 16231C. As shown in FIG. 16, deduplication or data compression may already be performed at the import data stage.

Dictionary data for a predetermined attribute (or one piece of dictionary data covering a plurality of attributes) is data in which the attribute values corresponding to that attribute are sorted and duplicate values are excluded, and a unique ID (referred to as a DID) is assigned to each attribute value. Thus, in the example of FIG. 16, Age = 22 appears three times in the table but only once in the "Age" dictionary data after deduplication; similarly, "BornPlace" Tokyo is deduplicated.

The table data 16231A is data representing the relationship between the attribute values decomposed in this way and the records. Note that in FIG. 16, for simplicity, the attribute values of the export data 16131 table are expressed in a format where each is replaced by its DID, but this format is not mandatory.

By putting the import file 16230 in the above data format, the column-oriented DBMS 16210 can import the data at a lower load than when the export file 16130 is imported directly. Note that the import file 16230 only needs to be held temporarily in at least one of the import source computer and the import destination computer, and may be held in any storage form other than a file.

<<< Conversion from export data to import data >>>

The conversion from the export data 16131 to the import data 16231 is performed by the conversion program 16300. The conversion program 16300 is a program whose executable file is stored in the storage resource of the import source computer or the import destination computer and which is executed by the CPU 21. The conversion program 16300 performs the conversion in cooperation with the FPGA 26, as follows.

(Step-01) The conversion program 16300 reads the export data 16131 from the export file 16130.

(Step-02) The conversion program 16300 decomposes the export data for each attribute.

(Step-03) The conversion program 16300 transmits the decomposed export data to the FPGA 26. As a result, the FPGA 26 starts the sort process while using the memory 27.

(Step-04) The conversion program 16300 receives the sorted export data from the FPGA 26.

(Step-05) The conversion program 16300 removes duplicate attribute values from the sorted export data, assigns DIDs, and stores the result in the import data 16231 as dictionary data 16231B or 16231C.

(Step-06) The conversion program 16300 refers to the dictionary data 16231B and 16231C, converts the attribute value of the export data into DID, and stores it in the import data 16231 as table data 16231A.

(Step-07) The conversion program 16300 stores the import data 16231 in the import file 16230.

When the decomposition yields a plurality of pieces of export data, Step-03 and Step-04 may be performed for each piece sequentially or in parallel. The same applies to Step-05 through Step-07. Further, Step-05 and Step-06 may be performed by the FPGA 26 instead of the conversion program 16300.
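Under the assumption that the FPGA sort can be stood in for by a software sort, Step-02 through Step-06 can be sketched end to end (file I/O of Step-01 and Step-07 is omitted; all names are illustrative):

```python
def convert_export_to_import(export_rows, attributes, fpga_sort=sorted):
    """Sketch of Step-02 through Step-06. `fpga_sort` stands in for the
    sort offloaded to the FPGA 26 in Step-03/Step-04."""
    import_data = {"dictionaries": {}, "table": []}
    # Step-02: decompose the export data per attribute.
    columns = {attr: [row[attr] for row in export_rows] for attr in attributes}
    for attr, values in columns.items():
        # Step-03/Step-04: sort the decomposed data (done by the FPGA
        # in the text; a plain sort here).
        sorted_values = fpga_sort(values)
        # Step-05: remove duplicates and assign DIDs -> dictionary data.
        dictionary = {}
        for value in sorted_values:
            if value not in dictionary:
                dictionary[value] = len(dictionary)
        import_data["dictionaries"][attr] = dictionary
    # Step-06: replace attribute values with DIDs -> table data.
    for row in export_rows:
        import_data["table"].append(
            {attr: import_data["dictionaries"][attr][row[attr]]
             for attr in attributes})
    return import_data  # Step-07 would write this to the import file
```

Because each attribute's column is processed independently inside the loop, the parallel execution of Step-03 to Step-07 per attribute mentioned above follows naturally.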

<<< FPGA configuration >>>

The above-described processing by the FPGA could also be realized by a non-reconfigurable ASIC. However, it is preferable to realize it with a reconfigurable FPGA for one or more of the following reasons.
(Reason 1) The data representation format of the attribute values included in the export data is not fixed. For example, one computer system may import attribute values represented as 4-byte integers, while another may export attribute values represented as 1-byte integers. Furthermore, attribute values represented as character strings may be exported, as shown in FIG. 16. With an ASIC, logic may have to be prepared for every such format, which makes gate utilization poor, and when there are many data representation formats the ASIC may not be able to handle all of them. Implementing with a reconfigurable FPGA eliminates these problems. Examples of the logic inside the FPGA 26 that changes in such cases are the value-comparison logic used in the sort process and the data length used when storing to the memory 27.
(Reason 2) The data structure of the tables in the storage resources 16120 and 16220 is unique to each DBMS vendor. If realized with an ASIC, the ASIC could therefore be used only for conversion between specific vendors. This problem becomes conspicuous when the FPGA 26 performs at least part of the processing of Step-05 and Step-06. Implementing with a reconfigurable FPGA eliminates these problems. Further, as shown in FIG. 16, the FPGA may be configured based on input FPGA definition information 16310 (information on the data structure, in the storage resource 16120, of the table created by the row-oriented DBMS vendor) and output FPGA definition information 16320 (information on the data structure, in the storage resource 16220, of the table created by the column-oriented DBMS vendor). In this way, unnecessary internal structure information is not disclosed between DBMS vendors.

<<< Dynamic generation of configuration file >>>

Based on the above, the program for configuring the FPGA (FPGA configuration program) performs the following processing.
(Step-A) The FPGA configuration program identifies the data representation format of the attributes of the export data.
(Step-B) The FPGA configuration program selects a partial configuration file that realizes the sort logic corresponding to the identified data representation.
(Step-C) The FPGA configuration program reads the input FPGA definition information 16310 and generates a partial configuration that realizes the logic for interpreting the export data.
(Step-D) The FPGA configuration program reads the output FPGA definition information 16320 and generates a partial configuration that realizes the logic for generating the import data from the data in the memory 22.
(Step-E) The FPGA configuration program merges the products of Step-B to Step-D to generate a configuration file.
(Step-F) The FPGA configuration program configures the FPGA with the generated configuration file.

Note that the FPGA configuration program is typically stored in the storage 24 of the import source computer system or the import destination computer system and executed by the CPU 21. However, some processing may be performed by other computers.
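A minimal software sketch of Step-A through Step-F follows; the partial-configuration library, its keys, and the merge format are all hypothetical, since real FPGA configuration flows are vendor-specific:

```python
# Hypothetical library of partial configuration files realizing the
# sort logic for each data representation format (Step-B candidates).
SORT_LOGIC_LIBRARY = {
    "int32": "sort_int32.partial",
    "int8": "sort_int8.partial",
    "string": "sort_string.partial",
}

def build_configuration(data_representation, input_definition, output_definition):
    # Step-A is assumed done by the caller, which passes the identified
    # data representation format of the export data's attributes.
    # Step-B: select the sort-logic partial configuration.
    sort_partial = SORT_LOGIC_LIBRARY[data_representation]
    # Step-C: partial configuration interpreting the export data,
    # derived from the input FPGA definition information (16310).
    input_partial = f"parse[{input_definition}]"
    # Step-D: partial configuration generating the import data,
    # derived from the output FPGA definition information (16320).
    output_partial = f"generate[{output_definition}]"
    # Step-E: merge the partial configurations into one file.
    return "\n".join([input_partial, sort_partial, output_partial])

# Step-F would then configure the FPGA with the generated file using a
# vendor-specific programming tool.
```

Because only the sort-logic entry changes with the data representation, a new attribute format requires adding one partial configuration rather than re-fabricating hardware, which is the point of Reason 1 above.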

<Variation>

With the above configuration and processing, import processing between DBMSs can be performed at high speed. The following can be considered as variations.
The import source computer and the import destination computer may be the same computer.
The export data 16131 may be transmitted from the row-oriented DBMS 16110 to the conversion program 16300 without generating the export file 16130. The same applies to the import data 16231.
When a single DBMS is both row-oriented and column-oriented, the above processing may be used for the conversion inside that DBMS.
Conversion processing from the column-oriented DBMS to the row-oriented DBMS may be performed by the FPGA 26.

Example 4 has been described above.

Although several embodiments have been described above, these are merely examples for explaining the present invention, and the scope of the present invention is not limited to these embodiments. The present invention can be implemented in various other forms.

DESCRIPTION OF SYMBOLS
10 ... Stream data processor
11 ... Data reception unit
12 ... Output calculation and database update determination unit
13 ... Result output unit
14 ... Database update unit
15 ... Database
16 ... Top priority data determination signal
17 ... Top priority data signal
18 ... Data update instruction signal
19 ... Additional update data signal
110 ... Specific brand order receiving unit
120 ... Specific brand execution determination and board information update determination unit
130 ... Specific brand execution result output unit
140 ... Selling board information update unit
141 ... Buying board information update unit
150 ... Selling board information database
151 ... Buying board information database

Claims (17)

  1. In an information processing apparatus including a database that manages data in order of priority,
    A database update unit for performing update of data addition or deletion in the database;
    A database update determination unit that instructs the database update unit to update the data to the database;
    A determination notification of the highest priority data and the highest priority data are output from the database update unit to the database update determination unit,
    A database update instruction and update data are output from the database update determination unit to the database update unit,
    The database update determination unit,
    upon receiving the determination notification of the highest priority data and the highest priority data, executes determination of the type of database operation to be instructed by the update instruction and calculation of the update data to be output,
    The database update unit
    updates the database based on the database update instruction and the update data,
    outputs the determination notification of the highest priority data when the highest priority data is determined, and
    An information processing apparatus that sequentially executes data updates other than the highest priority data.
  2. The information processing apparatus according to claim 1,
    The database has a hierarchical structure of heap structure,
    The database update unit includes a plurality of hierarchy update units corresponding to the hierarchy,
    A storage element is provided corresponding to the hierarchy of the database,
    An information processing apparatus that updates data from a higher hierarchy to a lower hierarchy by a pipeline operation based on the update instruction and update data.
  3. The information processing apparatus according to claim 2,
    In the case of an addition update, the hierarchy update unit
    uses the update data as additional data,
    compares the data of the target node in the highest hierarchy with the additional data,
    stores the additional data as the data of the target node when the priority of the additional data is higher, and
    uses the additional data to update the data of a node in a lower hierarchy when the priority of the data of the target node is higher.
  4. The information processing apparatus according to claim 2,
    In a deletion update, when no addition operation is executed in any hierarchy, the hierarchy update unit
    deletes the data of the highest hierarchy,
    An information processing apparatus using, as the data of the highest hierarchy, the data of the node having the highest address stored in the database.
  5. The information processing apparatus according to claim 2,
    In a deletion update, when an addition update is being performed in any hierarchy, the hierarchy update unit
    uses, as the additional data of the highest hierarchy, the additional data of the highest-priority hierarchy among the hierarchies executing addition,
    An information processing apparatus characterized in that the pipeline operation of the hierarchy executing the addition is stopped.
  6. The information processing apparatus according to claim 2,
    The storage element corresponding to the upper layer is configured with an FPGA,
    An information processing apparatus characterized in that a storage element corresponding to a lower layer is composed of a cache and a DDR SDRAM.
  7. The information processing apparatus according to claim 1,
    The database update unit comprises a selling board information update unit and a buying board information update unit,
    The database comprises a selling board information database and a buying board information database,
    A plurality of sets of the determination notification of the highest priority data and the highest priority data, and of the database update instruction and the update data, are provided,
    An information processing apparatus, wherein the data is input / output between the database update determination unit and the database update unit.
  8. In a data processing method of an information processing apparatus including a database that manages data in order of priority,
    The information processing apparatus includes:
    A database update unit for performing update of data addition or deletion in the database;
    A database update determination unit that instructs the database update unit to update the data to the database;
    A determination notification of the highest priority data and the highest priority data are output from the database update unit to the database update determination unit,
    A database update instruction and update data are output from the database update determination unit to the database update unit,
    The database update determination unit
    executes an update determination process upon receiving the determination notification of the highest priority data and the highest priority data,
    The database update unit
    updates the database based on the database update instruction and the update data,
    outputs the determination notification of the highest priority data when the highest priority data is determined, and
    A data processing method characterized by sequentially executing data updates other than the highest priority data.
  9. The data processing method according to claim 8, wherein
    The database has a hierarchical structure of heap structure,
    The database update unit includes a plurality of hierarchy update units corresponding to the hierarchy,
    A storage element is provided corresponding to the hierarchy of the database,
    A data processing method characterized in that, based on the update instruction and update data, data is updated from a higher hierarchy to a lower hierarchy by a pipeline operation.
  10. The data processing method according to claim 9, wherein
    In the case of an addition update, the hierarchy update unit
    uses the update data as additional data,
    compares the data of the target node in the highest hierarchy with the additional data,
    stores the additional data as the data of the target node when the priority of the additional data is higher, and
    uses the additional data to update the data of a node in a lower hierarchy when the priority of the data of the target node is higher.
  11. The data processing method according to claim 9, wherein
    In a deletion update, when no addition operation is executed in any hierarchy, the hierarchy update unit
    deletes the data of the highest hierarchy,
    A data processing method comprising using, as the data of the highest hierarchy, the data of the node having the highest address stored in the database.
  12. The data processing method according to claim 9, wherein
    In a deletion update, when an addition update is being performed in any hierarchy, the hierarchy update unit
    uses, as the additional data of the highest hierarchy, the additional data of the highest-priority hierarchy among the hierarchies executing addition,
    A data processing method, wherein the pipeline operation of the hierarchy executing the addition is stopped.
  13. The information processing apparatus according to claim 1,
    The information processing apparatus is included in a computer system that executes a row-oriented DBMS and a column-oriented DBMS, and performs a sorting process in a conversion process at the time of data import between the row-oriented DBMS and the column-oriented DBMS.
    Information processing device.
  14. One or more computers each including a CPU, a storage resource, a memory, and an FPGA connected to the memory;
    The CPU in at least one of the one or more computers is:
    (1) A row-oriented DBMS process is executed by accessing a row-oriented table stored in the storage resource,
    (2) executing a column-oriented DBMS process by accessing a column-oriented table stored in the storage resource;
    (3) Configure the FPGA to implement at least sort logic;
    (4) Using the sort logic of the FPGA, the export data from the row-oriented DBMS is converted into the import data to the column-oriented DBMS,
    (5) importing the import data into the column-oriented DBMS;
    Computer system.
  15. The computer system according to claim 14, wherein
    The export data stores a record including a record ID and an attribute value corresponding to one or more attributes,
    In the processing of (4), the FPGA or the CPU further:
    (4a) Select an attribute value corresponding to a predetermined attribute from the export data, and send the selected attribute value to the FPGA, thereby causing the FPGA to sort the attribute value.
    Computer system.
  16. The computer system according to claim 14, wherein
    In the process of (3):
    (3a) Select a configuration suitable for the data representation format of the attribute included in the export data, and configure the selected configuration in the FPGA.
    Computer system.
  17. The computer system according to claim 14, wherein
    In the process of (3):
    (3b) Generate a configuration based on input definition information describing the data structure in the storage resource of the row-oriented DBMS and output definition information describing the data structure in the storage resource of the column-oriented DBMS, and configure the FPGA with the generated configuration,
    Computer system.
PCT/JP2014/077235 2014-06-13 2014-10-10 Information processing device, computer system, and data processing method therefor WO2015190007A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2014/065671 WO2015189970A1 (en) 2014-06-13 2014-06-13 Information processing device and data processing method therefor
JPPCT/JP2014/065671 2014-06-13

Publications (1)

Publication Number Publication Date
WO2015190007A1 true WO2015190007A1 (en) 2015-12-17

Family

ID=54833098

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2014/065671 WO2015189970A1 (en) 2014-06-13 2014-06-13 Information processing device and data processing method therefor
PCT/JP2014/077235 WO2015190007A1 (en) 2014-06-13 2014-10-10 Information processing device, computer system, and data processing method therefor

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/065671 WO2015189970A1 (en) 2014-06-13 2014-06-13 Information processing device and data processing method therefor

Country Status (1)

Country Link
WO (2) WO2015189970A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017212525A1 (en) * 2016-06-06 2017-12-14 株式会社日立製作所 Computer and database processing method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08235217A (en) * 1995-02-24 1996-09-13 Pioneer Electron Corp Data retrieval and output device and karaoke device
JP2002007707A (en) * 2000-06-22 2002-01-11 Keio Gijuku Transaction system
JP2006221346A (en) * 2005-02-09 2006-08-24 Toyo Securities Co Ltd Transaction support system, transaction support method, transaction support program and recording medium
JP2010539616A (en) * 2007-09-21 2010-12-16 ハッソ−プラトナー−インスティテュート フュア ソフトバレシステムテヒニク ゲゼルシャフト ミット ベシュレンクテル ハフツング Non-overlapping ETL-less system and method for reporting OLTP data
JP2013246835A (en) * 2012-05-29 2013-12-09 Sap Ag System and method for generating in-memory model from data warehouse model


Also Published As

Publication number Publication date
WO2015189970A1 (en) 2015-12-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14894474

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14894474

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: JP