WO2018083751A1 - Intelligent storage subsystem - Google Patents

Intelligent storage subsystem

Info

Publication number
WO2018083751A1
Authority
WO
WIPO (PCT)
Prior art keywords
execution unit
relational database
data
read
column
Prior art date
Application number
PCT/JP2016/082558
Other languages
English (en)
Japanese (ja)
Inventor
浩平 海外
Original Assignee
浩平 海外
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浩平 海外
Priority to PCT/JP2016/082558
Publication of WO2018083751A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures

Definitions

  • The present invention relates to a storage subsystem that improves the performance of an RDBMS, and more particularly to a storage subsystem that improves performance by offloading database processing and converting data from a row format to a column format.
  • An RDBMS (relational database management system) stores data in tables consisting of rows and columns.
  • RDBMS data layouts are broadly divided into row-oriented and column-oriented types.
  • The row-oriented type is the more common method and stores data in units of rows.
  • It is efficient for processing that reads and writes entire rows, as in online transaction processing, but inefficient for analytical processing that extracts only a few columns from long rows.
  • The column-oriented type has the opposite characteristics. For this reason, many RDBMSs either adopt one of the two layouts and narrow their range of applications, or reorganize row-oriented data into column-oriented data and store it redundantly (for example, Patent Document 1).
  • The former approach degrades performance for the workloads that the chosen layout handles poorly.
  • The latter approach suffers from a time lag caused by data reorganization and from increased storage requirements due to the redundant data.
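  • The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration: the table contents and the helper name to_columns are hypothetical and do not come from the patent text.

```python
# Illustrative only: the same small table in row-oriented and column-oriented form.
rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 41),
]
column_names = ("id", "name", "age")


def to_columns(rows, column_names):
    """Reorganize row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


columns = to_columns(rows, column_names)

# Transactional access: a whole row is one contiguous item in the row layout.
whole_row = rows[1]

# Analytical access: a single column is one contiguous item in the column layout.
ages = columns["age"]
```

  • In the row layout, reading only the age values still touches every row; in the column layout, the same values sit together, which is why analytical scans favor it.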
  • The present invention solves the above problem by providing a storage system that includes a DB offload processing execution unit and a storage device and performs relational database read processing, in which the DB offload processing execution unit, based on a received command including information on columns of the relational database, converts the blocks read from the storage device into a format consisting only of the data of those columns and transmits the converted blocks.
  • The present invention also solves the above problem by providing a storage system that includes a DB offload processing execution unit and a storage device and performs relational database read processing, in which the DB offload processing execution unit, based on a received command including information on an index of the relational database, reads from the storage device and transmits only the blocks in the range specified by the index.
  • The present invention also solves the above problem by providing the storage system according to paragraph 0006 or paragraph 0007, in which the transmission destination is the main memory of a graphics processing unit.
  • The present invention also solves the above problem by providing a device driver program executed on a host computer connected to a storage system that includes a DB offload processing execution unit and a storage device and performs relational database read processing. The program includes an instruction for transmitting, to the DB offload processing execution unit, a command including information on the columns to be read in the relational database, and an instruction for receiving, from the DB offload processing execution unit, data converted into a column format based on the information on the columns to be read.
  • The present invention also solves the above problem by providing a device driver program executed on a host computer connected to such a storage system. The program includes an instruction for transmitting, to the DB offload processing execution unit, a command including information on an index of the relational database, and an instruction for receiving, from the DB offload processing execution unit, data consisting only of the blocks in the range specified by the index.
  • The device driver program may further include an instruction for writing the received blocks directly to the main memory of a graphics processing unit.
  • The present invention also solves the above problem by providing a computer-implemented method for acquiring data from a storage system that includes a DB offload processing execution unit and a storage device and performs relational database read processing. The method includes the steps of sending a command including information on the columns to be read in the relational database, converting data read from the storage device into column-format data based on the information on the columns to be read, and transmitting the column-format data.
  • The present invention also solves the above problem by providing a computer-implemented method for acquiring data from such a storage system. The method includes the steps of sending a command including information on an index of the relational database, reading only the blocks in the range specified by the index, and transmitting the read blocks.
  • The present invention also solves the above problem by providing the method described in paragraph 0012 or paragraph 0013, further including a step of writing the transmitted blocks directly into the main memory of a graphics processing unit.
  • According to the present invention, RDBMS access can be accelerated without storing redundant data.
  • FIG. 1 shows the overall structure of a first embodiment of the storage subsystem according to the present invention.
  • The host computer (101) is a general-purpose server machine or a server machine dedicated to storage processing, called NAS (Network-Attached Storage), and runs an RDBMS (relational database management system) (102), driver software (103), an operating system (not shown), and application programs (not shown).
  • The driver software (103) may include a file system function.
  • A DMA buffer (105) is provided in the main memory (104) of the host computer (101) to temporarily hold data used for storage input/output.
  • The storage subsystem (121) is typically a storage device using SSDs (solid-state drives). It may be a device independent of the host computer (101) or a device integrated with it.
  • The storage subsystem (121) has a command/data transmission/reception unit (122) and an SSD control unit (123) that controls storage reads and writes according to commands transmitted by the driver software (103).
  • A DB processing offload execution unit (124) is also provided.
  • The DB processing offload execution unit (124) is a component of the SSD control unit (123). It performs the database access optimizations of the present invention and thereby improves the processing efficiency of the RDBMS (102).
  • The command/data transmission/reception unit (122), the SSD control unit (123), and the DB processing offload execution unit (124) may be realized by a dedicated processor running a program generally called microcode or firmware.
  • The DB processing offload execution unit (124) is preferably realized by dedicated hardware such as an FPGA (Field-Programmable Gate Array) to maximize speed.
  • The SSD storage block (125) is a means for storing data in units of blocks. It is not limited to SSD; any random-access nonvolatile memory technology, such as flash memory or DRAM with power backup, may be used. It may also be a conventional HDD (hard disk drive).
  • The host computer (101) and the storage subsystem (121) may be connected via a storage network such as Fibre Channel or Gigabit Ethernet (Ethernet is a registered trademark), but they are preferably connected by a high-speed I/O bus (130) within the same enclosure.
  • FIG. 2 shows the flow of data and commands in the first embodiment of the storage subsystem according to the present invention.
  • The driver software (103) converts a database access request from the RDBMS (102) into a command for accessing the SSD storage block (125) and passes it to the command/data transmission/reception unit (122).
  • This SSD access command is extended from the standard specification and includes information about which columns the database access request of the RDBMS (102) targets (details are described later).
  • The SSD control unit (123) reads the SSD storage block (125) based on the SSD access command passed from the command/data transmission/reception unit (122).
  • The read data is passed to the DB processing offload execution unit (124), which extracts only the required columns based on the database column information included in the SSD access command.
  • The command/data transmission/reception unit (122) receives data consisting only of the columns required by the RDBMS (102) from the DB processing offload execution unit (124) and transmits it to the driver software (103).
  • The driver software (103) places the data in the DMA buffer (105) and completes the database access request of the RDBMS (102). (When the host computer (101) and the storage subsystem (121) are connected via a high-speed I/O bus, this transfer is preferably performed directly by the DMA mechanism without going through the driver software (103).)
  • Database access requests from the RDBMS (102) often need only a few columns from very long rows. In such cases it is inefficient to send entire rows from the storage subsystem (121) to the host computer (101) over the high-speed I/O bus (130). As computers become faster, moving data between subsystems is increasingly the bottleneck of the whole process. Sending only the necessary columns removes this bottleneck.
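  • The size of the saving can be estimated with simple arithmetic. The sketch below assumes fixed-width columns; the column names, widths, and row count are hypothetical values chosen for illustration, not figures from the patent.

```python
def transfer_bytes(num_rows, column_widths, wanted):
    """Return (bytes sent as whole rows, bytes sent as only the wanted columns).

    Assumes every column has a fixed width in bytes, which real tables
    often do not; variable-length columns would need per-row sizes.
    """
    full = num_rows * sum(column_widths.values())
    projected = num_rows * sum(column_widths[name] for name in wanted)
    return full, projected


# Hypothetical table: four columns, one of them wide.
widths = {"col_w": 8, "col_x": 4, "col_y": 100, "col_z": 8}
full, projected = transfer_bytes(1_000_000, widths, ["col_x", "col_z"])
reduction = full / projected
```

  • Under these assumed widths, one million rows occupy 120,000,000 bytes as whole rows but only 12,000,000 bytes when just col_x and col_z are sent, a tenfold reduction in bus traffic.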
  • FIG. 3 shows an example of a command (extended SSD command) transmitted from the driver software (103) to the command/data transmission/reception unit (122).
  • This command is preferably based on an industry-standard specification such as NVMe (Non-Volatile Memory Express).
  • A standard command contains only information on the blocks to be read or written and nothing about database columns. The extended command adds information (metadata) about the columns to be read, which makes conversion from the row format to the column format possible on the storage subsystem (121) side.
  • On receiving the extended command, the SSD control unit (123) reads data from the SSD storage block (125) designated as the read source and passes it to the DB processing offload execution unit (124).
  • The DB processing offload execution unit (124) unpacks each column contained in the read blocks based on the requested column numbers and the table definition information included in the extended SSD command. (The column information may instead be stored as metadata in the SSD storage block (125) rather than in the command body, with the extended SSD command carrying pointer information, such as a storage block number, to that metadata.) Because this processing is independent for each block, it is easy to parallelize and benefits readily from parallel hardware. The DB processing offload execution unit (124) then converts the data into a column format and places it in the DMA buffer (105) via the command/data transmission/reception unit (122).
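  • One way to picture such an extended command is as a standard block-read header followed by a list of requested column numbers. The sketch below is a hypothetical encoding for illustration only: the opcode value, field widths, and the name ExtendedReadCommand are assumptions, and the layout is neither the NVMe command format nor the format shown in FIG. 3.

```python
import struct
from dataclasses import dataclass

EXT_READ_OPCODE = 0xC0  # hypothetical opcode, not taken from NVMe or the patent

# Hypothetical header: opcode (1 byte), start block (8), block count (4), column count (2).
HEADER = struct.Struct("<BQIH")


@dataclass(frozen=True)
class ExtendedReadCommand:
    start_block: int
    block_count: int
    columns: tuple  # numbers of the columns the host wants returned

    def pack(self) -> bytes:
        """Serialize the block range followed by the requested column numbers."""
        header = HEADER.pack(
            EXT_READ_OPCODE, self.start_block, self.block_count, len(self.columns)
        )
        return header + struct.pack(f"<{len(self.columns)}H", *self.columns)

    @classmethod
    def unpack(cls, raw: bytes) -> "ExtendedReadCommand":
        """Parse a command produced by pack(); reject any other opcode."""
        opcode, start, count, ncols = HEADER.unpack_from(raw, 0)
        if opcode != EXT_READ_OPCODE:
            raise ValueError("not an extended read command")
        cols = struct.unpack_from(f"<{ncols}H", raw, HEADER.size)
        return cls(start, count, tuple(cols))


cmd = ExtendedReadCommand(start_block=5, block_count=3, columns=(1, 3))
raw = cmd.pack()
decoded = ExtendedReadCommand.unpack(raw)
```

  • Keeping the block range in the same position as in an ordinary read means the column list is a pure extension: a device that receives it has everything a normal read needs, plus the metadata required for the row-to-column conversion.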
  • FIG. 4 illustrates the above with an example.
  • FIG. 4 is equivalent to FIG. 1, although some elements are omitted for simplicity.
  • SQL_SCAN, defined as an extended SSD command, is transmitted from the host computer (101) to the storage subsystem (121).
  • In addition to the same information as a command for a normal SSD, this extended SSD command includes information on the columns to be read; in this example, the columns col_x and col_z.
  • The SSD control unit (123) reads the blocks specified in the extended SSD command (in this example, block 5, block 6, and block 7) from the SSD storage block (125) and passes them to the DB processing offload execution unit (124).
  • The DB processing offload execution unit (124) (implemented in hardware in this example) extracts only the data for col_x and col_z from the read blocks based on the column information specified in the extended SSD command and converts it to column-format data (arrays). This column-format data is written back to the DMA buffer (105) of the host computer (101).
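  • The example can be mimicked in software as follows. This is a sketch, not the hardware implementation the text describes: the table definition, the row contents, and the function name sql_scan are hypothetical, while the block numbers and the column names col_x and col_z follow the example.

```python
# Hypothetical table definition and storage contents (row format, keyed by block number).
TABLE_COLUMNS = ("col_w", "col_x", "col_y", "col_z")
STORAGE = {
    5: [(1, 10, "a", 1.5), (2, 20, "b", 2.5)],
    6: [(3, 30, "c", 3.5)],
    7: [(4, 40, "d", 4.5), (5, 50, "e", 5.5)],
}


def sql_scan(storage, blocks, wanted):
    """Read the given blocks and return one array per requested column."""
    positions = [TABLE_COLUMNS.index(name) for name in wanted]
    out = {name: [] for name in wanted}
    for block_no in blocks:  # each block is handled independently
        for row in storage[block_no]:
            for name, pos in zip(wanted, positions):
                out[name].append(row[pos])
    return out


# What would be written back to the host's DMA buffer in this sketch.
dma_buffer = sql_scan(STORAGE, [5, 6, 7], ["col_x", "col_z"])
```

  • The loop over blocks carries no state from one block to the next, which reflects the earlier remark that the conversion parallelizes easily.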
  • FIG. 5 shows a second embodiment of the present invention.
  • As in the first embodiment, the blocks read according to the extended SSD command are converted into the column format by the DB processing offload execution unit (124).
  • In the second embodiment, the converted data is transferred directly to a DMA buffer in the GPU internal memory (502), which is a component of the GPU (Graphics Processing Unit) (501) provided in the host computer (101). (In this example, the host computer (101) and the storage subsystem (121) are connected via an I/O bus, not a storage network.)
  • The GPU (501) can be used not only for graphics processing but also as a general-purpose parallel processor, so data-parallel processing can be executed more efficiently. For example, data that has undergone parallel processing such as sorting on the GPU (501) is transferred to the DMA buffer (105) in the main memory (104), completing the database access processing by the RDBMS (102).
  • The third embodiment of the present invention is described below.
  • In this embodiment, the DB processing offload execution unit (124) performs index processing in addition to the row-to-column conversion of the first and second embodiments.
  • Based on the index information, the DB processing offload execution unit (124) identifies blocks that do not match the search conditions and does not send them to the host computer (101), which reduces the amount of transferred data and improves processing efficiency.
  • BRIN (Block Range Index) is a type of index that groups a number of data blocks and holds the maximum and minimum values of each group as a summary.
  • FIG. 6 shows an example of BRIN. Suppose the BRIN set on a certain column (for example, col_X) shows that, across all records in block 0 through block 2, the minimum value of the column is 100 and the maximum value is 654. For a query whose search condition on col_X lies entirely above that maximum, such as col_X > 1000, there is then no need to read blocks 0 through 2 and send them to the host computer (101).
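  • The pruning logic can be sketched as follows. The block contents, the grouping of three blocks per range, and the function names are hypothetical; the first range is chosen so that its summary matches the minimum of 100 and maximum of 654 in the example, and the search condition is assumed to be of the form col_X >= 1000.

```python
def build_brin(blocks, blocks_per_range):
    """Summarize each run of blocks as (first_block, last_block, min, max).

    Each block is given as the list of indexed-column values it holds;
    every block is assumed to be non-empty.
    """
    summary = []
    for start in range(0, len(blocks), blocks_per_range):
        end = min(start + blocks_per_range, len(blocks))
        values = [v for block in blocks[start:end] for v in block]
        summary.append((start, end - 1, min(values), max(values)))
    return summary


def ranges_to_read(summary, lo, hi):
    """Keep only the block ranges whose [min, max] overlaps the interval [lo, hi]."""
    return [
        (first, last)
        for first, last, vmin, vmax in summary
        if vmax >= lo and vmin <= hi
    ]


# Hypothetical col_X values per block; blocks 0 to 2 span 100 to 654.
blocks = [[100, 250], [300, 654], [120, 400], [900, 1200], [1001, 1500], [950, 2000]]
brin = build_brin(blocks, 3)
to_read = ranges_to_read(brin, 1000, float("inf"))
```

  • A range that survives the overlap test may still contain rows that fail the condition, so the summary only rules blocks out; it never confirms a match.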
  • FIG. 7 shows an example of an extended SSD command used in the third embodiment of the present invention.
  • The command includes the number of the block containing the index, the number of blocks, the number of the column on which the index (search key) is set, and the minimum and maximum values of the search key.
  • These values can be set by the driver software (103) based on information obtained from the RDBMS (102).
  • The SSD control unit (123) reads the index data from the blocks that contain the index and passes it to the DB processing offload execution unit (124).
  • The DB processing offload execution unit (124) determines whether each block specified as a read target falls within a block range that matches the search condition.
  • The DB processing offload execution unit (124) instructs the SSD control unit (123) to read blocks other than block 2 and block 5.
  • In the subsequent processing, as in the first embodiment, it is desirable to read only the necessary columns of those blocks, convert them to the column format, and return them to the host computer (101).
  • The data composed of the read blocks may be written directly to a DMA buffer in the GPU main memory by DMA.
  • Because index processing is also routine processing, it is desirable to accelerate it in hardware, for example with an FPGA.
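  • Putting the two steps together, the third embodiment can be sketched as a single routine that first skips blocks the index rules out and then returns only the requested columns of the remaining blocks. Everything here is an illustrative assumption: the IndexedScan fields loosely mirror the command contents listed above, the index is simplified to one (min, max) pair per block rather than per block range, and the data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedScan:
    target_blocks: tuple  # blocks the host asked to read
    key_min: float        # lower bound of the search key
    key_max: float        # upper bound of the search key
    columns: tuple        # positions of the columns to return


def run_indexed_scan(cmd, index, storage):
    """Return (blocks actually read, one array per requested column).

    index maps a block number to the (min, max) of the key column in that block.
    """
    out = [[] for _ in cmd.columns]
    blocks_read = []
    for block_no in cmd.target_blocks:
        vmin, vmax = index[block_no]
        if vmax < cmd.key_min or vmin > cmd.key_max:
            continue  # no row in this block can match, so it is never read
        blocks_read.append(block_no)
        for row in storage[block_no]:
            for array, pos in zip(out, cmd.columns):
                array.append(row[pos])
    return blocks_read, out


# Hypothetical rows of the form (key, payload, other).
storage = {
    0: [(100, "p", 7), (654, "q", 8)],
    1: [(700, "r", 9), (990, "s", 10)],
    2: [(1000, "a", 1), (1500, "b", 2)],
    3: [(50, "c", 3), (1200, "d", 4)],
}
index = {0: (100, 654), 1: (700, 990), 2: (1000, 1500), 3: (50, 1200)}
cmd = IndexedScan((0, 1, 2, 3), 1000, float("inf"), (0, 2))
blocks_read, arrays = run_indexed_scan(cmd, index, storage)
```

  • Rows inside a surviving block are returned without being filtered (block 3 still yields the key 50), so in this sketch the host remains responsible for applying the search condition to individual rows.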

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The problem addressed by the present invention is to provide an SSD-based storage subsystem that can efficiently process access to a relational database management system (RDBMS) in either row format or column format while minimizing the storage of duplicate data. The solution according to the invention is to extend the SSD command transmitted to the storage subsystem so that it includes information relating to an RDBMS column. Based on the column information, conversion from row format to column format is performed dynamically on the storage subsystem side. In addition, unnecessary blocks are not read, based on the index, which yields better efficiency.
PCT/JP2016/082558 2016-11-02 2016-11-02 Intelligent storage subsystem WO2018083751A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/082558 WO2018083751A1 (fr) 2016-11-02 2016-11-02 Intelligent storage subsystem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/082558 WO2018083751A1 (fr) 2016-11-02 2016-11-02 Intelligent storage subsystem

Publications (1)

Publication Number Publication Date
WO2018083751A1 true WO2018083751A1 (fr) 2018-05-11

Family

ID=62076835

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/082558 WO2018083751A1 (fr) 2016-11-02 2016-11-02 Intelligent storage subsystem

Country Status (1)

Country Link
WO (1) WO2018083751A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015105043A1 (fr) * 2014-01-08 2015-07-16 日本電気株式会社 Calculation system, database management device, and calculation method
WO2015166540A1 (fr) * 2014-04-28 2015-11-05 株式会社日立製作所 Storage apparatus, data processing method therefor, and storage system

Similar Documents

Publication Publication Date Title
US10176092B2 (en) System and method for executing data processing tasks using resilient distributed datasets (RDDs) in a storage device
US8819335B1 (en) System and method for executing map-reduce tasks in a storage device
TWI664541B Method and system for autonomous memory search
US20170286507A1 (en) Database search system and database search method
WO2015166540A1 (fr) Storage apparatus, data processing method therefor, and storage system
KR102610636B1 Parallel compute offload to a database accelerator
US10810174B2 (en) Database management system, database server, and database management method
US11042328B2 (en) Storage apparatus and method for autonomous space compaction
CN104765574A Cloud data storage method
US10482087B2 (en) Storage system and method of operating the same
CN112346647A Data storage method, apparatus, device, and medium
US11194522B2 (en) Networked shuffle storage
US11797506B2 (en) Database management systems for managing data with data confidence
US10437478B1 (en) Replication based on a differential multiple write command
WO2018083751A1 (fr) Intelligent storage subsystem
US10860577B2 (en) Search processing system and method for processing search requests involving data transfer amount unknown to host
US20180329756A1 (en) Distributed processing system, distributed processing method, and storage medium
WO2019008715A1 (fr) Data loading program, data loading method, and data loading device
WO2016135874A1 (fr) Computer and database management method
US20230297575A1 (en) Storage system and data cache method
US20190065559A1 (en) Computer system and database management method
US20240111743A1 (en) Efficient evaluation of queries across multiple columnar storage tiers
WO2016181640A1 (fr) Calculation apparatus, method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16920554

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16920554

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP