WO2024032526A1 - Data retrieval processing method and system - Google Patents

Data retrieval processing method and system

Info

Publication number
WO2024032526A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
ssd
processor
ssd controller
controller
Prior art date
Application number
PCT/CN2023/111396
Other languages
English (en)
Chinese (zh)
Inventor
张先国
Original Assignee
阿里巴巴(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司
Publication of WO2024032526A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management

Definitions

  • This application relates to the field of data processing, specifically, to a data retrieval processing method and system.
  • Structured Query Language (SQL)
  • SQL statements are widely used in database systems.
  • Database systems run on computing devices (such as servers).
  • The architecture of computing devices is basically the same: almost all of them include a central processing unit (CPU for short), a hard disk, and memory.
  • Databases use tables to store data, and these tables are stored on the hard disk. When it is necessary to operate on the data in the database, all the data in the table will first be copied from the hard disk to the memory, and then the data will be filtered and processed from the data saved in the memory.
  • Embodiments of the present application provide a data retrieval processing method and system to at least solve the problem of relatively low efficiency caused by the need to copy a large amount of data to memory and then perform filtering in the data retrieval process in the prior art.
  • A data retrieval processing method is provided, including: an SSD controller receives a data processing command from a processor, wherein the SSD controller is set on the SSD, the SSD is connected to the processor through the SSD controller, and the SSD controller is used to process data in the SSD; the SSD controller obtains filtering parameters carried in the data processing command, wherein the filtering parameters are used to instruct the SSD controller to filter out data that meets the filtering parameters; the SSD controller searches the database table stored in the SSD according to the filtering parameters, and sends the retrieved data to the processor.
  • an SSD controller is also provided, the SSD controller is used to execute one or more computer instructions, wherein the one or more computer instructions are executed by the SSD controller to implement the above method steps.
  • A data retrieval processing method is provided, including: the processor sends a data processing command to an SSD controller, wherein the SSD controller is set on the SSD, the SSD is connected to the processor through the SSD controller, and the SSD controller is used to process data in the SSD; the data processing command carries filtering parameters, and the filtering parameters are used to instruct the SSD controller to filter out data that meets the filtering parameters; the processor receives the data sent by the SSD controller, where the data is retrieved by the SSD controller from a database table saved in the SSD according to the filtering parameters.
  • A processor is also provided. The processor is configured to execute one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the steps of the above method.
  • a data retrieval processing system including: the above-mentioned processor and the above-mentioned SSD controller.
  • an SSD controller is used to receive data processing commands from the processor, wherein the SSD controller is provided on the SSD, and the SSD is connected to the processor through the SSD controller.
  • the SSD controller is used to process the data in the SSD; the SSD controller obtains the filtering parameters carried in the data processing command, where the filtering parameters are used to instruct the SSD controller to filter out data that meets the filtering parameters; the SSD controller searches the database table saved in the SSD according to the filtering parameters and sends the retrieved data to the processor.
  • Figure 1 is a data flow diagram based on retrieval using SQL statements in related technologies
  • FIG. 2 is a flowchart 1 of a data retrieval processing method according to an embodiment of the present application
  • Figure 3 is a flow chart 2 of a data retrieval processing method according to an embodiment of the present application.
  • Figure 4 is an interaction flow chart between a processor and an SSD controller according to an embodiment of the present application
  • Figure 5 is a schematic diagram of a data retrieval processing system according to an embodiment of the present application.
  • Figure 6 is a schematic diagram of a software architecture according to an embodiment of the present application.
  • Take the SQL statement SELECT name, country FROM Websites as an example. This statement selects the name column and the country column from a table named Websites.
  • The CPU sends a read command to the hard disk; the read command is used to read the Websites table from the hard disk into the memory. Then, the CPU filters the data of the Websites table in the memory to obtain the name column and the country column.
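  • The related-art flow described above can be illustrated with a minimal sketch. The table rows and the helper names read_table_into_memory and select_columns are hypothetical and exist only for this illustration; the point is that every row and every column is copied into memory before the CPU keeps only name and country.

```python
# Hypothetical contents of the Websites table as stored on the hard disk.
DISK_TABLE = [
    {"id": 1, "name": "Google", "url": "https://www.google.com", "country": "USA"},
    {"id": 2, "name": "Taobao", "url": "https://www.taobao.com", "country": "CN"},
    {"id": 3, "name": "Wikipedia", "url": "https://www.wikipedia.org", "country": "USA"},
]


def read_table_into_memory(disk_table):
    """Related-art step: copy every row and every column from disk to memory."""
    return [dict(row) for row in disk_table]


def select_columns(rows, columns):
    """CPU-side projection, equivalent to SELECT name, country FROM Websites."""
    return [{column: row[column] for column in columns} for row in rows]


in_memory = read_table_into_memory(DISK_TABLE)
result = select_columns(in_memory, ["name", "country"])
```

  • In this sketch the id and url columns are copied into memory even though the query never uses them, which is the inefficiency the application sets out to reduce.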
  • A solid-state disk or solid-state drive (SSD for short) is a hard drive made from an array of solid-state electronic memory chips.
  • Figure 1 is a data flow diagram of retrieval using SQL statements in the related art. It shows the retrieval scheme in the related art, in which data goes through reading (read), then copying (copy), and then searching.
  • The CPU can connect to different SSDs through a high-speed serial bus (peripheral component interconnect express, PCIe for short). Each SSD is divided into multiple storage blocks (blocks), and each SSD is configured with an SSD controller (Controller).
  • The CPU sends a read command to the SSD controller through PCIe, which is step 1 shown in Figure 1; the SSD controller searches for the storage block where the data table is located and reads the data table from that block.
  • The CPU searches the data table saved in the memory for the query results corresponding to the SQL statement.
  • The CPU initiates read instructions to multiple SSDs; the SSDs read multiple blocks; the data is then copied from the multiple blocks to the memory, and this copy process causes a bottleneck because of hardware limits; finally, the CPU retrieves data in the memory, and this retrieval is also subject to memory constraints and creates a bottleneck.
  • the execution efficiency of SQL statements needs to be improved.
  • The process of filtering data can be partially or entirely moved to the SSD controller, so that the SSD controller performs data filtering and sends the filtered data to the processor. This reduces the amount of data copied from the hard disk to the memory, which to a certain extent overcomes the hardware bottleneck of copying from the hard disk to the memory and improves data retrieval efficiency.
  • Figure 2 is a flowchart 1 of the data retrieval processing method according to the embodiment of the present application. The method steps involved in Figure 2 will be described below.
  • Step S202, the SSD controller receives the data processing command from the processor, wherein the SSD controller is provided on the solid state drive (SSD), the SSD is connected to the processor through the SSD controller, and the SSD controller is used to process the data in the SSD;
  • Step S204 The SSD controller obtains the filtering parameters carried in the data processing command, where the filtering parameters are used to instruct the SSD controller to filter out data that meets the filtering parameters;
  • the filtering parameters involved in this step are used to filter out the required data from the database table.
  • The filtering parameters can be carried in the data processing command according to the format agreed between the SSD controller and the processor, and the SSD controller can then identify and use the filtering parameters from the data processing command according to the agreed format.
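  • The source does not specify the agreed format. As a minimal sketch, assume a hypothetical 12-byte little-endian layout (column index, operator code, one padding byte, 64-bit operand) that the processor packs and the SSD controller unpacks. The layout, the operator codes, and the function names encode_filter and decode_filter are all assumptions made for illustration.

```python
import struct

# HYPOTHETICAL agreed format: uint16 column index, uint8 operator code,
# one padding byte, int64 operand; little-endian, 12 bytes in total.
FILTER_FORMAT = "<HBxq"
OPERATORS = {"=": 0, ">": 1, "<": 2}
OPERATOR_BY_CODE = {code: symbol for symbol, code in OPERATORS.items()}


def encode_filter(column_index, operator, operand):
    """Processor side: pack one filtering parameter into the agreed format."""
    return struct.pack(FILTER_FORMAT, column_index, OPERATORS[operator], operand)


def decode_filter(payload):
    """SSD controller side: recover the filtering parameter from the payload."""
    column_index, operator_code, operand = struct.unpack(FILTER_FORMAT, payload)
    return column_index, OPERATOR_BY_CODE[operator_code], operand


payload = encode_filter(2, ">", 10)
decoded = decode_filter(payload)
```

  • Because both sides share FILTER_FORMAT, the controller can recover exactly what the processor sent without any further negotiation.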
  • SQL as a database operating language is widely used to filter data from database tables. Therefore, in database software, data can be obtained from database tables by executing SQL statements. The executed SQL statements can carry parameters, which are used to filter data from the database table.
  • The parameters carried in the SQL statement can be converted into filter parameters and carried in the data processing command sent to the SSD controller. This is equivalent to using hardware such as the SSD controller to accelerate the execution of SQL statements; that is, the filtering parameters are obtained according to the parameters carried in the SQL statement, where the parameters carried in the SQL statement are used to filter data from the database table.
  • the parameters used to retrieve the data can also be converted into filter parameters and carried in the data processing command sent to the SSD controller.
  • Step S206 The SSD controller searches the database table stored in the SSD according to the filtering parameters, and sends the retrieved data to the processor.
  • the processor sends the data processing command to the SSD controller.
  • the data processing command carries filtering parameters for data filtering.
  • The SSD controller can perform the data filtering work: it filters the data in the database table according to the filtering parameters and then sends the filtered data to the processor.
  • In the related art, the SSD controller provides the entire data table to the processor through the memory.
  • In the above steps, the SSD controller filters the data in the data table before providing it to the processor, which reduces the amount of data copied from the SSD to the memory. Filtering in the SSD controller therefore relieves the bottleneck of copying from SSD to memory and improves the efficiency of data retrieval.
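  • The reduction in copied data can be illustrated with a small simulation. SimulatedSsdController, its two methods, and the sample table are hypothetical; the class only counts how many rows leave the "SSD" in the related-art flow compared with controller-side filtering.

```python
class SimulatedSsdController:
    """Toy stand-in for an SSD controller that counts the rows it sends out."""

    def __init__(self, table):
        self._table = table
        self.rows_sent = 0

    def read_all(self):
        """Related art: hand the whole table to the host."""
        self.rows_sent += len(self._table)
        return list(self._table)

    def read_filtered(self, predicate):
        """Approach described above: filter first, send only matching rows."""
        matched = [row for row in self._table if predicate(row)]
        self.rows_sent += len(matched)
        return matched


def wanted(row):
    return row["num1"] > 17


table = [{"id": i, "num1": i % 20} for i in range(1000)]

baseline = SimulatedSsdController(table)
host_result = [row for row in baseline.read_all() if wanted(row)]

offloaded = SimulatedSsdController(table)
device_result = offloaded.read_filtered(wanted)
```

  • Both paths return the same rows, but in this toy data set the offloaded path transfers 100 rows instead of 1000. The actual saving depends on how selective the filtering parameters are.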
  • step S206 the SSD controller searches the database table saved in the SSD according to the filtering parameter, and sends the retrieved data to the processor.
  • The SSD controller can also have its own cache. The SSD controller can therefore first cache the data retrieved according to the filtering parameters in its own cache, and then send the cached data to the processor. For example, double data rate synchronous dynamic random access memory (DDR) can be set up in the SSD controller as a cache. Since the cache is inside the SSD controller, the SSD controller processes data in the cache faster, so data retrieval efficiency can be further improved by using the cache built into the SSD controller.
  • DDR double data rate synchronous dynamic random access memory
  • an existing method of sending commands by the CPU and SSD controller can be improved by adding data processing commands that can carry filtering parameters. This processing method can make the most of the existing command system, making it easier to implement. This is illustrated below with an example.
  • the SSD controller can be connected to the processor through PCIe.
  • The existing commands that can be transmitted through PCIe can be improved by adding commands that can carry filtering parameters to the existing command system.
  • Non-Volatile Memory Host Controller Interface Specification (NVMe) commands can be used as the data processing commands in the above steps.
  • NVMe Non-Volatile Memory Host Controller Interface Specification
  • PCIe PCI Express
  • NVMe command is the way of information exchange between NVMe host (such as CPU) and NVMe controller (such as SSD controller).
  • NVMe Command is the basic unit for communication between CPU and SSD controller.
  • the application's I/O request must also be converted into NVMe Command.
  • the format of NVMe Command is defined in NVMe Spec and occupies 64 bytes.
  • NVMe Command is divided into two categories: management command (Admin Command) and IO command (IO Command). Among them, Admin Command is mainly used for configuration, while IO Command is used for data transmission.
  • the SSD controller receives a data processing command from the processor through the high-speed serial bus PCIe, where the data processing command is a non-volatile memory host controller interface specification NVMe command.
  • the existing NVMe commands can be improved by adding filtering parameters to the commands. This optional method is relatively easy to implement.
  • the read command (read) in the NVMe command can be improved and filtering parameters can be added to the read command.
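  • As a rough sketch of what such an extended read command might look like, the following packs a 64-byte NVMe submission entry for Read (opcode 02h), with the starting LBA in command dwords 10 and 11 and the zero-based block count in the low 16 bits of dword 12. The source does not say where the filtering parameter is carried: placing a filter_tag in dword 13 is purely an assumption for illustration, and the NVMe specification assigns that dword its own meaning for Read.

```python
import struct

NVME_COMMAND_SIZE = 64
OPCODE_READ = 0x02  # Read opcode in the NVM command set


def build_read_command(command_id, namespace_id, start_lba, block_count, filter_tag=0):
    """Pack a 64-byte read command; filter_tag placement is HYPOTHETICAL."""
    cdw10 = start_lba & 0xFFFFFFFF          # starting LBA, lower 32 bits
    cdw11 = (start_lba >> 32) & 0xFFFFFFFF  # starting LBA, upper 32 bits
    cdw12 = (block_count - 1) & 0xFFFF      # number of logical blocks, zero-based
    cdw13 = filter_tag                      # ASSUMPTION: not defined by the source
    return struct.pack(
        "<BBHI8sQ16s6I",
        OPCODE_READ,    # byte 0: opcode
        0,              # byte 1: flags, left zero in this sketch
        command_id,     # bytes 2-3: command identifier
        namespace_id,   # dword 1: namespace identifier
        b"\x00" * 8,    # dwords 2-3: left zero in this sketch
        0,              # metadata pointer
        b"\x00" * 16,   # data pointer, left zero in this sketch
        cdw10, cdw11, cdw12, cdw13, 0, 0,
    )


command = build_read_command(
    command_id=7, namespace_id=1, start_lba=0x100000002, block_count=8, filter_tag=1
)
```

  • A real driver would fill in the data pointer and submit the entry through a submission queue; this sketch only shows that an extra parameter can ride along inside the fixed 64-byte command.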
  • the read command is first explained below.
  • the Read command reads the data block specified by the command from the media and copies it to the provided data buffer. The following is the official description of the NVMe read command.
  • Optional reftag when used with protection information.
  • Here, filter represents the filtering parameter: --block-filter count indicates statistical counting of data that meets the filtering parameter, and --block-filter-search indicates searching for data that meets the filtering parameter.
  • the SSD controller can retrieve data that satisfies the filtering parameters and send the retrieved data to the processor.
  • the SSD controller can choose to use a stream to send the data.
  • The step of the SSD controller sending the retrieved data to the processor may include: the SSD controller sends the retrieved data to the processor through a pre-created data input stream, wherein the data input stream is used to transmit data in the SSD in the form of a data stream; after the SSD controller determines that the retrieved data has been sent, the data input stream is closed.
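  • A minimal sketch of this stream-based hand-off follows. DataInputStream here is a hypothetical in-memory class written for this illustration (it is not the Hadoop or Java class of a similar name), and send_filtered and chunk_size are assumed names; the sketch only shows the order of events: write the matching data in chunks, then close the stream once everything has been sent.

```python
class DataInputStream:
    """Hypothetical in-memory stream between the SSD controller and the processor."""

    def __init__(self):
        self._chunks = []
        self.closed = False

    def write(self, chunk):
        if self.closed:
            raise ValueError("stream is closed")
        self._chunks.append(chunk)

    def close(self):
        self.closed = True

    def read_all(self):
        """Processor side: concatenate every chunk that was sent."""
        return [item for chunk in self._chunks for item in chunk]


def send_filtered(table, predicate, stream, chunk_size=2):
    """Controller side: send matching data in chunks, then close the stream."""
    matched = [item for item in table if predicate(item)]
    for start in range(0, len(matched), chunk_size):
        stream.write(matched[start:start + chunk_size])
    stream.close()


stream = DataInputStream()
send_filtered(list(range(10)), lambda value: value % 2 == 0, stream)
received = stream.read_all()
```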
  • the SSD controller can perform data filtering work, filter the data in the database table according to the filtering parameters, and then send the filtered data to the processor.
  • The above description is from the perspective of the SSD controller. The following description is from the perspective of the processor (CPU).
  • Figure 3 is a flow chart 2 of a data retrieval processing method according to an embodiment of the present application. The steps involved in Figure 3 will be described below.
  • Step S302, the processor sends the data processing command to the SSD controller, wherein the SSD controller is set on the SSD, the SSD is connected to the processor through the SSD controller, and the SSD controller is used to process the data in the SSD; the data processing command carries filtering parameters, and the filtering parameters are used to instruct the SSD controller to filter out data that meets the filtering parameters;
  • the filtering parameters involved in this step are used to filter out the required data from the database table.
  • the filtering parameter can be carried in the data processing command according to the format agreed between the SSD controller and the processor, and then the SSD controller can identify and use the filtering parameter from the data processing command according to the agreed format.
  • SQL as a database operation language is widely used to filter data from database tables. Therefore, at the software level, SQL language can be used to obtain data from the database table.
  • The processor can obtain the parameters carried in the SQL statement, where the parameters carried in the SQL statement are used to filter data from the database table; the processor generates the filtering parameters according to the parameters carried in the SQL statement;
  • the processor sends the data processing command carrying the filtering parameters to the SSD controller.
  • the parameters used to retrieve the data can also be converted into filter parameters and carried in the data processing commands sent to the SSD controller.
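  • How the processor might derive filtering parameters from an SQL statement can be sketched as follows. This is a deliberately simple illustration that only handles WHERE conditions of the form column-operator-value joined by AND; the function name where_to_filters and the tuple representation of a filtering parameter are assumptions, and a real database would use its own SQL parser.

```python
import re

# One condition of the form: column operator value
CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|<>|=|>|<|LIKE)\s*(.+?)\s*$", re.IGNORECASE)


def where_to_filters(sql):
    """Turn a simple WHERE clause into (column, operator, operand) tuples."""
    match = re.search(
        r"\bWHERE\b(.*?)(?:\bGROUP BY\b|\bORDER BY\b|$)",
        sql,
        re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return []
    filters = []
    for part in re.split(r"\bAND\b", match.group(1), flags=re.IGNORECASE):
        condition = CONDITION.match(part)
        if condition is None:
            raise ValueError(f"unsupported condition: {part!r}")
        column, operator, operand = condition.groups()
        # Operands are kept as strings; surrounding single quotes are removed.
        filters.append((column, operator.upper(), operand.strip("'")))
    return filters


filters = where_to_filters(
    "SELECT num2 FROM tbl WHERE str1 LIKE 'a%' AND num1 > 10 GROUP BY num2"
)
```

  • The resulting tuples are what would then be encoded into the data processing command in whatever format the processor and the SSD controller have agreed on.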
  • Step S304 The processor receives data sent by the SSD controller, where the data is retrieved by the SSD controller from a database table saved in the SSD according to the filtering parameters.
  • the processor sends the data processing command carrying the filtering parameters to the SSD controller.
  • the SSD controller After retrieving data according to the filtering parameters in the data processing command, the SSD controller sends the retrieved data to the CPU.
  • The above steps no longer need to copy the entire data table to the memory; that is, copying from SSD to memory is reduced, so retrieval is no longer restricted by the hardware bottleneck of copying from SSD to memory, and the efficiency of data retrieval improves.
  • Figure 4 is an interaction flow chart between the processor and the SSD controller according to an embodiment of the present application.
  • Figure 4 combines the steps in Figure 2 and Figure 3.
  • the interaction process includes the following steps:
  • Step S402 The processor sends a data processing command to the SSD controller, where the data processing command carries filtering parameters.
  • Step S404 The SSD controller receives a data processing command from the processor.
  • Step S406 The SSD controller obtains the filtering parameters carried in the data processing command.
  • Step S408 The SSD controller searches the database table stored in the SSD according to the filtering parameters, and sends the retrieved data to the processor.
  • Step S410 The processor receives data sent by the SSD controller.
  • the SSD can store data in the form of data blocks, and the SSD connected to the CPU may be more than just one SSD.
  • The processor sending the data processing command to the SSD controller may include the following steps: the processor determines the storage block where the database table is located, wherein an SSD includes multiple storage blocks; the processor determines, according to the storage block where the database table is located, the SSD where the storage block is located; the processor sends the data processing command to the SSD controller on the SSD where the storage block is located.
  • the main consideration is the processing when the SSD is used as a distributed storage system.
  • the database tables may not be stored in the same SSD, but distributed in different SSDs.
  • the CPU needs to obtain each SSD where the database table is located, and then distribute the data processing commands to the SSD controller of each SSD.
  • The processor receiving the data sent by the SSD controller may include the following steps: the processor receives the data sent by the different SSD controllers, wherein the SSD controller of each SSD in which a storage block of the database table is located sends the retrieved data to the processor; the processor integrates the data sent from the different SSD controllers.
  • the processor can process data stored in the distributed file storage system by sending data processing commands to different SSD controllers and integrating data from different SSD controllers.
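  • The dispatch-and-merge flow above can be sketched as follows. The placement tables TABLE_BLOCKS, BLOCK_TO_SSD, and BLOCK_DATA and the functions controller_search and distributed_search are hypothetical; in a real system the block locations would come from the file system's metadata, and each search would run inside the SSD controller.

```python
from collections import defaultdict

# HYPOTHETICAL placement metadata: table -> storage blocks -> SSD.
TABLE_BLOCKS = {"tbl": ["b1", "b2", "b3"]}
BLOCK_TO_SSD = {"b1": "ssd0", "b2": "ssd1", "b3": "ssd0"}
BLOCK_DATA = {
    "b1": [{"num1": 5}, {"num1": 12}],
    "b2": [{"num1": 30}],
    "b3": [{"num1": 1}, {"num1": 11}],
}


def controller_search(ssd_id, block_ids, predicate):
    """Stand-in for one SSD controller filtering its own storage blocks."""
    return [row for block_id in block_ids for row in BLOCK_DATA[block_id] if predicate(row)]


def distributed_search(table, predicate):
    """Processor side: locate blocks, dispatch per SSD, and merge the results."""
    blocks_by_ssd = defaultdict(list)
    for block_id in TABLE_BLOCKS[table]:
        blocks_by_ssd[BLOCK_TO_SSD[block_id]].append(block_id)
    merged = []
    for ssd_id, block_ids in blocks_by_ssd.items():
        merged.extend(controller_search(ssd_id, block_ids, predicate))
    return merged


rows = distributed_search("tbl", lambda row: row["num1"] > 10)
```

  • The loop dispatches to one SSD at a time for simplicity; since each controller works on its own blocks, the commands could equally be issued concurrently.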
  • the following uses HDFS as an example for explanation.
  • HDFS Hadoop Distributed File System
  • POSIX Portable Operating System Interface
  • HDFS Metadata node
  • DataNode data node
  • NameNode is the management node, and its main functions include: (1) managing the name space of HDFS; (2) configuring copy strategies; (3) managing data block (Block) mapping information; (4) processing client read and write requests.
  • HDFS has a fault-tolerant system, so for a piece of data, there may be multiple data copies. These copies are stored in data nodes, and each data node stores multiple data blocks. The main function of the DataNode is therefore to perform the actual operations in response to issued commands, such as storing the actual data blocks and performing read/write operations on data blocks.
  • File reading in the HDFS system is performed using streams, which is explained below.
  • FileSystem uses remote procedure call (RPC) to call the metadata node, and uses getBlockLocations() to obtain the data block information of the file.
  • RPC remote procedure call
  • the metadata node returns the address of the data node that saves the data block.
  • FileSystem returns FSDataInputStream, which is used to read data.
  • FSDataInputStream is a data input stream object. Reading starts by calling the read() function of FSDataInputStream (also referred to as a read command).
  • FSDataInputStream connects to the nearest data node that saves the first data block of this file, and the data is read from that data node. When this data block has been read, FSDataInputStream closes the connection with this data node and then connects to the nearest data node holding the next data block of this file. When all the data has been read, the close function of FSDataInputStream is called.
  • The filter parameters can be carried in the read command. That is, the processor sending the data processing command to the SSD controller may include the following steps: the processor creates a data input stream and sends a read command in the data input stream to the SSD controller as the data processing command, wherein the read command carries the filter parameters, and the data input stream is used to transmit data in the SSD in the form of a data stream. The processor receiving the data sent by the SSD controller includes: the processor receives the data sent by the SSD controller through the data input stream.
  • reading distributed files can include the following steps:
  • Step 3 improves the read() method.
  • Step 1, call the DistributedFileSystem.open() method to open the distributed file.
  • Step 2 addressing the request.
  • DistributedFileSystem calls the NameNode using RPC.
  • NameNode returns the address of the DataNode where the copy is stored.
  • DistributedFileSystem returns an input stream object (FSDataInputStream), which encapsulates the input stream DFSInputStream.
  • Step 3 DataNode processing.
  • the filter parameter (or can also be called the completion filter interface) is passed in from the parameter of the read function.
  • the filter parameter is used to complete the data filtering of the SQL statement.
  • Step 4 get data from DataNode.
  • the data is read from the DataNode by calling the read() method in a loop.
  • Step 5 read additional DataNode until completed.
  • the input stream DFSInputStream closes the connection with the DataNode and looks for the next DataNode.
  • Step 6 After completing reading, call the input stream FSDataInputStream.close() to close the connection.
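  • The six steps above can be sketched with a small simulation. The classes SimulatedNameNode, SimulatedDataNode, and FilteringInputStream and the function open_file are hypothetical stand-ins written for this illustration; they are not Hadoop APIs, and the filter-accepting read() models the improvement proposed in Step 3 rather than any existing HDFS method.

```python
class SimulatedDataNode:
    """Holds the records of one data block and filters them locally."""

    def __init__(self, records):
        self.records = records

    def read(self, record_filter):
        return [record for record in self.records if record_filter(record)]


class SimulatedNameNode:
    """Maps a file path to the data nodes that hold its blocks (Step 2)."""

    def __init__(self, block_locations):
        self._block_locations = block_locations

    def get_block_locations(self, path):
        return self._block_locations[path]


class FilteringInputStream:
    """Input stream whose read() accepts a filter (Step 3, hypothetical)."""

    def __init__(self, data_nodes):
        self._data_nodes = list(data_nodes)
        self.closed = False

    def read(self, record_filter):
        if self.closed:
            raise ValueError("stream is closed")
        results = []
        for node in self._data_nodes:  # Steps 4 and 5: one data node after another
            results.extend(node.read(record_filter))
        return results

    def close(self):  # Step 6
        self.closed = True


def open_file(name_node, path):
    """Step 1: open the file and obtain an input stream over its data nodes."""
    return FilteringInputStream(name_node.get_block_locations(path))


name_node = SimulatedNameNode({
    "/warehouse/tbl": [
        SimulatedDataNode(["apple", "banana"]),
        SimulatedDataNode(["avocado", "cherry"]),
    ]
})
stream = open_file(name_node, "/warehouse/tbl")
rows = stream.read(lambda record: record.startswith("a"))
stream.close()
```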
  • the filter parameters can be passed in through the read function, so that this optional implementation can be applied in a distributed file system. Moreover, after using the data input stream, the read data can be verified through the data input stream to ensure the correctness of the data reading.
  • filter parameters can be passed through the FSDataInputStream.read() method.
  • filtering parameters can also be passed through improvements to existing functions.
  • the CPU can send NVMe commands (Command) to the SSD.
  • The processor sending the data processing command to the SSD controller may include the following steps: the processor converts the read command in the data input stream into a Non-Volatile Memory Host Controller Interface Specification (NVMe) command, where the NVMe command carries the filtering parameter; the processor sends the NVMe command to the SSD controller through PCIe.
  • the existing NVMe commands can be improved by adding filtering parameters to the commands. This optional method is relatively easy to implement.
  • the read command (read) in the NVMe command can be improved and filtering parameters can be added to the read command.
  • the following is an example of the added filtering parameters.
  • Here, filter represents the filtering parameter: --block-filter count indicates statistical counting of data that meets the filtering parameter, and --block-filter-search indicates searching for data that meets the filtering parameter.
  • FIG. 5 is a schematic diagram of a data retrieval processing system according to an embodiment of the present application. The following describes a data retrieval processing system in an optional implementation with reference to FIG. 5 .
  • SSD is the SSD used to store data.
  • The SSD controller in the SSD is equivalent to a CPU, and it controls the work of the SSD. Therefore, in Figure 5, the SSD controller is shown as (CPU+Ctrl). To distinguish it easily from the CPU connected to the SSD through PCIe, it is still called the SSD controller (or SSD intelligent controller, or Controller) in the following description.
  • the SSD uses distributed storage, so the retrieval performed by the system shown in Figure 5 is also called distributed retrieval. Since in Figure 5 it is no longer necessary to copy the entire database table to the memory, this saves the IO operation from the SSD to the memory, so this method is also called a zero IO data retrieval method.
  • the CPU issues a query command (ie, data processing command) based on the SQL statement.
  • the data processing command is used to retrieve data that meets the filtering parameters (for example, retrieve data including predetermined keywords, etc.), which carries the filtering parameters.
  • the data processing command will be received by the SSD controller, namely 1 shown in Figure 5. It should be noted that if the data to be retrieved is distributed in different SSDs, the CPU will send data processing commands to different SSD controllers.
  • the SSD intelligent controller reads data within the data block range into the built-in DDR of the SSD controller, and then the SSD Controller retrieves the data based on the filtering parameters. That is, 2 shown in Figure 5, multiple different SSD controllers will perform the same step. Each SSD controller will perform a retrieval operation, that is, 3 shown in Figure 5. Then, the SSD controller will send the retrieved data to the CPU through PCIe, and the CPU will merge the data retrieval results from different SSD controllers. That is 4 shown in Figure 5.
  • FIG. 6 is a schematic diagram of the software architecture according to an embodiment of the present application. The following describes the software architecture in this optional implementation with reference to FIG. 6 .
  • Figure 6 involves the application layer (i.e. APP), the operating system layer (i.e. OS), and the SSD.
  • the operating system layer involves the file system (FileSystem) and the SSD driver layer (SSD Driver).
  • The application layer can include various applications that can use SQL statements, such as Spark SQL, Hive, HBase, and OLAP.
  • Spark is a fast, versatile and scalable big data analysis engine, which includes Spark SQL.
  • Hive is a data warehouse tool based on Hadoop, used for data extraction, transformation, and loading; it is a mechanism that can store, query, and analyze large-scale data stored in Hadoop.
  • HBase is a distributed, column-oriented database.
  • OLAP On-Line Analytical Processing
  • a SQL plug-in is used between the application layer and the OS layer to convert the parameters in the SQL statement into filter parameters.
  • the file systems used by the operating system layer can include many types, such as HDFS, Ext4, etc.
  • ext4, the fourth extended file system, is a file system under the Linux system. The parameters of functions in the file system can be improved to pass the filter parameters to the SSD Driver.
  • This method is called the FS plugin in this optional implementation.
  • the following uses HDFS as an example for explanation.
  • the data is read from the DataNode by calling the read() method in a loop.
  • the input stream DFSInputStream closes the connection with the DataNode and looks for the next DataNode. After finishing reading, call the input stream FSDataInputStream.close() to close the connection.
  • the SSD Driver can pass filtering parameters to the SSD controller through NVMe commands, which is called an NVMe plugin in this optional implementation.
  • the filtering parameters can be used to instruct the SSD controller to filter out data that meets the filtering parameters.
  • The filtering parameters can also indicate other functions at the same time, such as: encrypting or decrypting the data when filtering out data that meets the filtering parameters; compressing the data when filtering out data that meets the filtering parameters; converting the type or format of the filtered data; performing calculations such as summing on the filtered data; performing size comparisons on the filtered data; performing partial character matching on the filtered data; and using a dictionary to replace some characters in the filtered data.
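  • A minimal sketch of how a controller might apply such an additional function to already-filtered data follows. The operation names "sum", "max", and "compress", the comma-separated encoding, and the function apply_post_processing are assumptions for illustration; the source lists the kinds of functions but does not define how they are selected or encoded.

```python
import zlib


def apply_post_processing(values, operation):
    """Apply one extra function, named by the filtering parameters, to filtered data."""
    if operation == "sum":
        return sum(values)
    if operation == "max":
        return max(values)
    if operation == "compress":
        # Serialize as comma-separated text before compressing (assumed encoding).
        return zlib.compress(",".join(str(value) for value in values).encode("utf-8"))
    raise ValueError(f"unsupported operation: {operation}")


filtered = [value for value in [3, 12, 25, 7, 40] if value > 10]
total = apply_post_processing(filtered, "sum")
packed = apply_post_processing(filtered, "compress")
```

  • Performing the aggregation on the controller means only a single number, rather than every matching row, has to cross PCIe for a query that just needs the total.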
  • the SSD controller After receiving the NVMe command from the processor in the SSD, the SSD controller (i.e. Controller Logic in Figure 5, which can also be referred to as Logic for short) performs a retrieval based on the filtering parameters.
  • the SQL statement is first obtained, such as select num2 from tbl where str1 LIKE'a%'and num1>10GROUP BY num2.
  • the str1 field matches the pattern a% (that is, it begins with "a"), and the value of the num1 field is greater than 10; the data of the num2 column that meets these conditions is returned, and the result set is grouped.
  • the filtering parameters (also called command parameters) to HDFS.
  • HDFS passes in the filter instructions and parameters through FSDataInputStream.read(filter, arg1, arg2).
  • the filter in the read function indicates that the filter parameters are passed in, and the specific content of the filter parameters is reflected in arg1 and arg2.
  • the NVMe Read function carries the following filtering parameters: --block-filter count (for group statistics), so that the filtering parameters can be passed to the SSD controller, and the SSD controller performs data retrieval (also known as scanning), filtering, aggregation, etc. based on the filtering parameters.
  • the hardware mechanism of CPU+SSD controller (Logic)+storage is adopted.
  • Hardware acceleration is achieved through Logic, without the need for external chips, and the cost is lower.
  • Extending the SQL, file system, and NVMe driver software interface layers achieves transparent transmission of application logic to the hardware and enables accelerated computing.
  • an electronic device including a memory and a processor.
  • a computer program is stored in the memory, and the processor is configured to run the computer program to perform the method in the above embodiment.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media that can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), magnetic tape cassettes, tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • Computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • These computer programs may also be loaded onto a computer or other programmable data processing device such that a series of operating steps are performed on the computer or other programmable device to produce a computer-implemented process, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram; different steps can be implemented by different modules.
  • the device is called a data retrieval and processing device and is located in an SSD controller.
  • the device includes: a first receiving module for receiving data processing commands from the processor, wherein the SSD controller is disposed on the SSD, the SSD is connected to the processor through the SSD controller, and the SSD controller is used to process the data in the SSD; a first acquisition module for acquiring the filtering parameters carried in the data processing command, wherein the filtering parameters are used to instruct the SSD controller to filter out data that conforms to the filtering parameters; and a first sending module for retrieving from the database table saved in the SSD according to the filtering parameters and sending the retrieved data to the processor.
  • the system or device is used to implement the functions of the method in the above embodiments.
  • Each module in the system or device corresponds to a step in the method. Those that have been described in the method will not be described again here.
  • the first sending module is configured to cache the data retrieved according to the filtering parameters in the cache of the SSD controller, and to send the data cached in the SSD controller to the processor.
  • the first receiving module is configured to receive a data processing command from the processor through the PCIe high-speed serial bus, where the data processing command is an NVMe (Non-Volatile Memory Host Controller Interface Specification) command.
  • the first sending module is used to send the retrieved data to the processor through a pre-created data input stream, wherein the data input stream is used to transmit the data in the SSD in the form of a data stream, and the data input stream is closed after the SSD controller finishes sending the retrieved data.
  • the filtering parameters are obtained according to the parameters carried in the SQL statement, where the parameters carried in the SQL statement are used to filter data from the database table.
  • another data retrieval processing device is located in the processor, and the device includes: a second sending module for sending data processing commands to the SSD controller, wherein the SSD controller is arranged on the SSD, the SSD is connected to the processor through the SSD controller, the SSD controller is used to process data in the SSD, the data processing command carries filtering parameters, and the filtering parameters are used to instruct the SSD controller to filter out data that conforms to the filtering parameters; and a second receiving module for receiving data sent by the SSD controller, wherein the data is retrieved by the SSD controller from the database table saved in the SSD according to the filtering parameters.
  • the system or device is used to implement the functions of the method in the above embodiments.
  • Each module in the system or device corresponds to a step in the method. Those that have been described in the method will not be described again here.
  • the second sending module is used to obtain the storage block where the database table is located, wherein one SSD includes multiple storage blocks; to determine, from that storage block, the SSD on which it is located; and to send the data processing command to the SSD controller on that SSD.
  • the second receiving module is used to receive data sent from different SSD controllers, where the storage blocks where the database table is located are distributed across multiple SSDs and the SSD controller of each SSD sends the retrieved data to the processor; and to integrate the data sent from the different SSD controllers.
  • the second sending module is used to obtain the parameters carried in the SQL statement, where the parameters carried in the SQL statement are used to filter data from the database table; to generate the filtering parameters according to the parameters carried in the SQL statement; and to send the data processing command carrying the filtering parameters to the SSD controller.
  • the second sending module sending the data processing command to the SSD controller includes: the processor creates a data input stream and sends the read command in the data input stream to the SSD controller as the data processing command, wherein the read command carries the filtering parameters, and the data input stream is used to transmit data in the SSD in the form of a data stream; the second receiving module is configured to receive the data sent by the SSD controller through the data input stream.
  • the second sending module is used to convert the read command in the data input stream into an NVMe (Non-Volatile Memory Host Controller Interface Specification) command, wherein the NVMe command carries the filtering parameters; and to send the NVMe command to the SSD controller via PCIe.
  • the above embodiment solves the problem of relatively low efficiency caused by the need to copy a large amount of data to the memory and then filter it during the data retrieval process in the prior art.
  • the above embodiment filters the data through the SSD controller, reducing the amount of data copied from the hard disk to memory and removing the bottleneck of copying from hard disk to memory, thereby improving the efficiency of data retrieval.
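
The flow described above (the processor derives filtering parameters from the SQL statement, sends them to the SSD controller of each SSD holding the table's storage blocks, and integrates the returned data) can be sketched in Python as a pure simulation. This is a minimal illustration under stated assumptions, not the implementation of this application: the names FilterParams, SimulatedSSDController and host_query are hypothetical, the controller is modeled as an in-memory object rather than real NVMe hardware, and the predicate is hard-wired to the example query select num2 from tbl where str1 LIKE 'a%' and num1 > 10 GROUP BY num2.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class FilterParams:
    """Hypothetical stand-in for the filtering parameters carried in a command."""
    like_prefix: str    # derived from: str1 LIKE 'a%'
    num1_above: int     # derived from: num1 > 10
    return_column: str  # derived from: select num2 ... GROUP BY num2


class SimulatedSSDController:
    """In-memory model of an SSD controller (assumption: not a real NVMe device)."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)  # rows of the table stored on this SSD

    def read(self, params: FilterParams) -> List[Any]:
        # Filtering runs "on the device": only matching values are returned,
        # so non-matching rows are never copied to host memory.
        return [
            row[params.return_column]
            for row in self._rows
            if str(row["str1"]).startswith(params.like_prefix)
            and row["num1"] > params.num1_above
        ]


def host_query(controllers: Iterable[SimulatedSSDController],
               params: FilterParams) -> List[Any]:
    """Send the same parameters to every controller and integrate the results."""
    groups = set()
    for controller in controllers:
        groups.update(controller.read(params))
    # GROUP BY on the selected column leaves one row per distinct value.
    return sorted(groups)


# Example: the table's storage blocks are spread over two simulated SSDs.
params = FilterParams(like_prefix="a", num1_above=10, return_column="num2")
ssd_a = SimulatedSSDController([
    {"str1": "apple", "num1": 12, "num2": 1},
    {"str1": "banana", "num1": 50, "num2": 2},
    {"str1": "avocado", "num1": 5, "num2": 3},
])
ssd_b = SimulatedSSDController([
    {"str1": "almond", "num1": 11, "num2": 1},
    {"str1": "apricot", "num1": 99, "num2": 4},
])
result = host_query([ssd_a, ssd_b], params)
print(result)
```

In this sketch the host receives only the num2 values of matching rows from each simulated controller, which mirrors the stated goal of reducing the data copied into memory; the final deduplication on the host is one possible way to integrate per-SSD results and is an assumption of the example.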

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a data retrieval processing method and system. The method comprises the following steps: an SSD controller receives a data processing command from a processor; the SSD controller acquires a filtering parameter carried in the data processing command, the filtering parameter being used to instruct the SSD controller to filter out data that conforms to the filtering parameter; and the SSD controller performs, according to the filtering parameter, a retrieval in a database table stored in an SSD, and then sends the retrieved data to the processor. The present application solves the prior-art problem of low efficiency caused by the need to copy a large amount of data into memory before filtering it during data retrieval. According to the present application, the data is filtered by the SSD controller, so that copying of data from an SSD to memory is reduced and the difficulties of copying from an SSD to memory are eliminated, thereby improving the efficiency of data retrieval.
PCT/CN2023/111396 2022-08-10 2023-08-07 Data retrieval processing method and system WO2024032526A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210956509.8 2022-08-10
CN202210956509.8A CN115438066A (zh) 2022-08-10 2022-08-10 Data retrieval processing method and system

Publications (1)

Publication Number Publication Date
WO2024032526A1 true WO2024032526A1 (fr) 2024-02-15

Family

ID=84242455

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/111396 WO2024032526A1 (fr) 2022-08-10 2023-08-07 Data retrieval processing method and system

Country Status (2)

Country Link
CN (1) CN115438066A (fr)
WO (1) WO2024032526A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438066A (zh) * 2022-08-10 2022-12-06 阿里巴巴(中国)有限公司 Data retrieval processing method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104571946A (zh) * 2014-11-28 2015-04-29 中国科学院上海微系统与信息技术研究所 Memory device supporting fast query by logic circuit and access method thereof
CN105426119A (zh) * 2015-10-28 2016-03-23 上海新储集成电路有限公司 Storage device and data processing method
US20210081135A1 (en) * 2019-09-13 2021-03-18 Toshiba Memory Corporation Ssd supporting low latency operation
CN115438066A (zh) * 2022-08-10 2022-12-06 阿里巴巴(中国)有限公司 Data retrieval processing method and system

Also Published As

Publication number Publication date
CN115438066A (zh) 2022-12-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23851747

Country of ref document: EP

Kind code of ref document: A1