WO2022257575A1 - Data processing method, apparatus, and device - Google Patents

Data processing method, apparatus, and device

Info

Publication number
WO2022257575A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data set
acceleration device
processor
format
Prior art date
Application number
PCT/CN2022/084919
Other languages
English (en)
French (fr)
Inventor
王俊捷
阙鸣健
郑渊悦
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/24569 Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Definitions

  • the present application relates to the field of storage technologies, and in particular, to a data processing method, device, and equipment.
  • the database can store data at row granularity and at column granularity.
  • Data stored on a row basis preserves the original form of the data to a certain extent, which is convenient for operations such as inserting, deleting, querying, and modifying data, and is better suited to on-line transaction processing (OLTP) business scenarios.
  • Data stored on the basis of columns arranges and stores the data of the same field together, which is convenient for subsequent analysis of the data, and is more suitable for on-line analytical processing (OLAP) business scenarios.
  • the present application provides a data processing method, device and equipment to accelerate format conversion and reduce CPU consumption.
  • the embodiment of the present application provides a data processing method, which can be applied to a device including an acceleration device and a processor.
  • the processor and the acceleration device can be connected through PCIe, and interact through PCIe.
  • the processor may send a data processing request to the acceleration device; the data processing request is used to request format conversion of a first data set that includes a plurality of data in the database.
  • the acceleration device may obtain the first data set according to the data processing request.
  • the acceleration device may perform format conversion on the first data set, and convert the first data set stored in the first manner into the second data set stored in the second manner.
  • the acceleration device may also store the second data set in the target storage space. Wherein, the second data set includes at least one data, and the second manner is different from the first manner.
  • data sets can be converted, that is, the device can support two different data storage formats, both row storage and column storage. This makes the device suitable for both OLTP and OLAP business scenarios.
  • the processor no longer performs the conversion operation, but the acceleration device performs the conversion operation, which can greatly reduce the occupation of the processor, ensure the data processing efficiency of the processor, and improve the format conversion efficiency at the same time.
  • the first manner and the second manner are each either row storage or column storage.
  • Row storage indicates that data is stored in the database on a row basis.
  • Column storage indicates that data is stored in the database on a column basis.
  • the acceleration device can convert a row-stored first data set into a column-stored second data set, ensuring that the second data set can be used in OLAP business scenarios; it can also convert a column-stored first data set into a row-stored second data set, ensuring that the second data set can be used in OLTP business scenarios.
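  • The two conversion directions described above can be sketched as follows. This is a minimal illustration only: the names `rows_to_columns` and `columns_to_rows`, the dictionary-based layout, and the sample fields are assumptions made for the example and are not taken from the application, which performs the conversion on the acceleration device.

```python
from typing import Any

Row = dict[str, Any]            # one tuple (record) of the relation
Columns = dict[str, list[Any]]  # field name -> all data under that field


def rows_to_columns(rows: list[Row], fields: list[str]) -> Columns:
    """Convert a row-stored data set into a column-stored data set."""
    return {f: [row[f] for row in rows] for f in fields}


def columns_to_rows(columns: Columns) -> list[Row]:
    """Convert a column-stored data set back into a row-stored data set."""
    fields = list(columns)
    n = len(columns[fields[0]]) if fields else 0
    return [{f: columns[f][i] for f in fields} for i in range(n)]


# Hypothetical first data set stored on a row basis.
first_data_set = [
    {"id": 1, "name": "a", "price": 10},
    {"id": 2, "name": "b", "price": 20},
]
second_data_set = rows_to_columns(first_data_set, ["id", "name", "price"])
```

  • In this sketch the column-stored result groups all data under the same field together, which is what makes a later scan over a single field cheap.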
  • when the first manner is row storage and the second manner is column storage, the acceleration device may use different conversion methods for different types of fields when performing format conversion.
  • the conversion of fixed-length fields and of variable-length fields is used as an example below:
  • the acceleration device can obtain each data under the fixed-length field in the first data set, and arrange each data continuously to generate the second data set.
  • the second data set also includes null value indication information, which indicates whether each data under the fixed-length field is null or non-null.
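  • A possible shape of the fixed-length conversion is sketched below. The application does not specify an encoding, so the 4-byte little-endian integers, the bitmap in which a set bit means null, and the names `convert_fixed_length` and `is_null` are all assumptions chosen for illustration.

```python
import struct
from typing import Optional


def convert_fixed_length(values: list[Optional[int]]) -> tuple[bytes, bytes]:
    """Pack 4-byte integers contiguously and build a null bitmap.

    Bit i of the bitmap is 1 when value i is null (an assumed convention).
    Null slots are filled with 0 so that every slot keeps the same width.
    """
    data = b"".join(struct.pack("<i", 0 if v is None else v) for v in values)
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is None:
            bitmap[i // 8] |= 1 << (i % 8)
    return data, bytes(bitmap)


def is_null(bitmap: bytes, i: int) -> bool:
    """Read the null value indication for the i-th data."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)
```

  • Because every slot has the same width, the i-th data can be located by multiplying i by the field length, and the bitmap is consulted only to decide whether that slot holds a real value.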
  • the acceleration device obtains each data under the variable-length field in the first data set and arranges the data contiguously to generate the second data set; the second data set also includes position indication information, which indicates the position of each data under the variable-length field in the second data set.
  • the acceleration device performs format conversion in different ways for different fields, so that the converted second data set clearly and accurately records the data together with the null value indication information or position indication information of the data, which ensures the effectiveness of the format conversion.
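  • The variable-length case can be sketched in the same spirit. Here the position indication information is modelled as a list of start offsets with one extra trailing entry; this layout and the names `convert_variable_length` and `get_value` are assumptions for illustration, not the encoding used in the application.

```python
def convert_variable_length(values: list[bytes]) -> tuple[bytes, list[int]]:
    """Concatenate variable-length data and record where each one starts.

    offsets has len(values) + 1 entries; data i occupies
    data[offsets[i]:offsets[i + 1]] (an assumed convention).
    """
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    return b"".join(values), offsets


def get_value(data: bytes, offsets: list[int], i: int) -> bytes:
    """Recover the i-th data using the position indication information."""
    return data[offsets[i]:offsets[i + 1]]
```

  • Storing one extra offset avoids a separate length array: the length of each data is the difference between two adjacent offsets.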
  • the acceleration device can not only realize the conversion of the storage mode, but also realize the conversion of the data format, and convert the data format required for storing data into the data format required for data calculation.
  • the acceleration device can perform data format conversion on the data in the first data set to generate a second data set, wherein the data format of the data in the first data set is the data format required for storing data, and the data format of the data in the second data set is the data format required by the processor for data calculation.
  • converting the first data set into the data format required by the processor for data calculation can ensure that the processor can conveniently and quickly obtain the data required for data calculation when performing subsequent data calculation, and improve the efficiency of data calculation.
  • the acceleration device may implement part or all of the following operations when performing data format conversion:
  • the acceleration device obtains the data description information of decimal type data and uses the data description information as a part of the second data set.
  • the data description information includes: sign, precision, and scale.
  • the acceleration device can perform a padding or borrow operation on the decimal type data according to the precision and the scale.
  • the acceleration device can use the data description information as a part of the second data set, so that the processor can subsequently obtain the data description information.
  • the acceleration device adjusts the decimal type data according to the data description information, so that the subsequent processor can perform data calculation on the data.
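  • One way to read the decimal handling above is sketched below. The `DecimalValue` record, the `convert_decimal` function, and the choice to represent the adjusted value as an integer scaled by 10 to the power of the scale are assumptions for illustration; the application only states that sign, precision, and scale accompany the data and that the data is adjusted according to them.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class DecimalValue:
    sign: int        # 0 for non-negative, 1 for negative
    precision: int   # total number of digits allowed
    scale: int       # number of digits after the decimal point
    unscaled: int    # absolute value multiplied by 10**scale


def convert_decimal(text: str, precision: int, scale: int) -> DecimalValue:
    """Attach description information and pad the value to the target scale."""
    d = Decimal(text)
    sign = 1 if d.is_signed() else 0
    # Multiplying by 10**scale pads with zeros when the stored value has
    # fewer fractional digits than the target scale.
    shifted = abs(d) * (10 ** scale)
    if shifted != shifted.to_integral_value():
        raise ValueError("value has more fractional digits than the scale")
    unscaled = int(shifted)
    if len(str(unscaled)) > precision:
        raise ValueError("value does not fit in the given precision")
    return DecimalValue(sign, precision, scale, unscaled)
```

  • With the value held as a plain integer plus its description information, a processor in this sketch could add or compare two decimals of the same scale using ordinary integer arithmetic.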
  • when the data in the first data set is date type data, the acceleration device can, when performing data format conversion, decompose the date type data into multiple sub-data; each sub-data represents one of year, month, and day, and the multiple sub-data are arranged contiguously in the second data set.
  • the acceleration device splits the data of the date type into sub-data representing the year, month, and day, so that the processor can separately call the sub-data of the year, month, and day for data calculation.
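  • The date decomposition can be sketched as follows. The stored representation (a count of days since 1970-01-01), the packed layout (2-byte year, 1-byte month, 1-byte day), and the name `convert_date` are assumptions made for the example; the application does not state how dates are stored before conversion.

```python
import struct
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumed storage epoch, not taken from the source


def convert_date(days_since_epoch: int) -> bytes:
    """Split a stored date into year, month, and day sub-data.

    The three sub-data are packed contiguously: year (2 bytes),
    month (1 byte), day (1 byte), little-endian, no padding.
    """
    d = EPOCH + timedelta(days=days_since_epoch)
    return struct.pack("<HBB", d.year, d.month, d.day)


def read_year(packed: bytes) -> int:
    """Read only the year sub-data, without touching month or day."""
    return struct.unpack_from("<H", packed, 0)[0]
```

  • Keeping the sub-data separate lets a calculation such as grouping by year read one sub-data directly instead of decoding the whole date each time.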
  • the acceleration device is at least one of SOC, FPGA, GPU, ASIC, AI chip or DPU.
  • the acceleration device can be realized in various and flexible manners, and is suitable for different scenarios.
  • the embodiment of the present application also provides an acceleration device, which has the function of realizing the behavior in the method example of the first aspect above, and the beneficial effects can be referred to the description of the first aspect and will not be repeated here.
  • the functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the structure of the acceleration device includes a request acquisition module, a data acquisition module, and a format conversion module. These modules can perform the corresponding functions in the method example of the first aspect above. For details, refer to the detailed description in the method example , which will not be described here.
  • the embodiment of the present application also provides an acceleration device, which has the function of implementing the behavior in the method example of the first aspect above, and the beneficial effects can be referred to the description of the first aspect, and will not be repeated here.
  • the structure of the device includes a processor, and optionally, a memory and a communication interface.
  • the processor is configured to support the acceleration device to execute corresponding functions in the method of the first aspect above.
  • the memory is coupled to the processor and holds the computer program instructions and data (such as the first data set or the second data set) necessary for the acceleration device.
  • the structure of the acceleration device also includes a communication interface for communicating with other devices, such as receiving data processing requests.
  • the embodiment of the present application further provides a computing device, the computing device includes an acceleration device and a processor, and the processor is configured to send a data processing request to the acceleration device.
  • the acceleration device has the function of implementing the behaviors in the method example of the first aspect above, and the beneficial effects can be referred to the description of the first aspect and will not be repeated here.
  • the present application also provides a computer-readable storage medium in which instructions are stored; when the instructions are run on a computer, the computer executes the method described in the above first aspect and each possible implementation manner of the first aspect.
  • the present application further provides a computer program product including instructions, which, when run on a computer, cause the computer to execute the method described in the above first aspect and each possible implementation manner of the first aspect.
  • the present application also provides a computer chip; the chip is connected to a memory and is used to read and execute a software program stored in the memory, so as to implement the method described in the above first aspect and each possible implementation manner of the first aspect.
  • FIG. 1 is a schematic structural diagram of a system provided by the present application.
  • FIG. 2 is a schematic structural diagram of a management device provided by the present application.
  • FIGS. 3A-3B are schematic structural diagrams of a storage system provided by the present application.
  • Fig. 4 is a schematic diagram of a data processing method provided by the present application.
  • FIG. 5 is a schematic diagram of a first data set provided by the present application.
  • FIG. 6 is a schematic diagram of a row of data in a first data set provided by the present application.
  • FIG. 7 is a schematic diagram of a method for converting fixed-length fields in the first data set provided by the present application.
  • FIG. 8 is a schematic diagram of a method for converting variable-length fields in the first data set provided by the present application.
  • FIG. 9 is a schematic diagram of a first data set and a second data set provided by the present application.
  • Fig. 10 is a schematic structural diagram of an acceleration device provided by the present application.
  • a database can be understood as a form of storing a collection of data.
  • the data in the database can be organized, described and stored according to a specific data model.
  • a relational database is a type of database, and a relational database refers to a database that uses a relational model to establish data relationships and stores data based on the above data relationships.
  • the relational model can be understood as a two-dimensional tabular model.
  • a relational database can be understood as a data organization composed of two-dimensional tables and the links between two-dimensional tables.
  • a relation can be understood as a two-dimensional table.
  • Each relation has a relation name, which is the table name of the two-dimensional table.
  • a two-dimensional table includes tuples, and each tuple can be understood as a row in a two-dimensional table, and a tuple can also be called a record.
  • An attribute refers to a column in a two-dimensional table, which can also be called a field, and each data in a column can be called each data under the field.
  • a fixed-length field is one that has a fixed length.
  • the length of a fixed-length field is usually recorded in the header of a two-dimensional table.
  • a variable-length field means that each piece of data in the field has a different length, and the length of the variable-length field is not fixed.
  • Character type field, integer (int) type field, decimal type field, and date type field.
  • the field may include a character field, an integer field, a decimal field, and a date field.
  • a character field means that the data in the field is a character.
  • An integer field means that the data in the field is an integer.
  • a decimal field means that the data in the field is an exact value, which can be accurate to several digits after the decimal.
  • a date type field indicates that the data in the field indicates a date.
  • Character fields and date fields are fixed-length fields. In different relations, decimal fields and integer fields can be fixed-length fields or variable-length fields.
  • FIG. 1 is a schematic diagram of the architecture of the management system provided by the embodiment of the present application; the system includes a client 200 and a management device 100.
  • the client 200 is deployed on the user side.
  • the user can initiate a data request to the management device 100 through the client 200.
  • for example, the user can initiate, through the client 200, a data request for data in the database to the management device 100, such as a data read request that requests to read data or a data write request that requests to write data.
  • a user may initiate a data request to the management device 100 through the client 200 to request data in a certain column or multiple columns in the database, for example, a data request to read data in the first column.
  • the client 200 may be a software program deployed on a user's local computing device (for example, a server, a computer, a laptop, or a mobile terminal) or on a dedicated computing device (for example, an offload card with computing capability).
  • the software program can be a browser, agent or document analysis software.
  • the user can connect to the management device 100 through the software program, for example by establishing a network connection, such as an Ethernet or a wireless network (for example, Wi-Fi or 5th generation (5G) communication technology), between the computing device where the software program is located and the management device 100 for information exchange.
  • the management device 100 includes a bus 110, a processor 120, an acceleration device 130, a memory 140, a communication interface 150, and an external storage 160.
  • the processor 120, the acceleration device 130, the memory 140, and the communication interface 150 communicate through the bus 110.
  • the bus 110 may be a bus based on peripheral component interconnect express (PCIe).
  • the processor 120 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), an artificial intelligence (AI) chip, a system on chip (SoC), a complex programmable logic device (CPLD), a graphics processing unit (GPU), or the like.
  • The memory 140 may include volatile memory, such as RAM or DRAM, may include non-volatile memory, such as storage class memory (SCM), or may include a combination of volatile memory and non-volatile memory.
  • the memory 140 may also include an operating system and other software modules required for running processes.
  • the operating system can be LINUX™, UNIX™, WINDOWS™, and so on.
  • the data in the database can also be stored in the internal memory 140; for example, the data stored in the internal memory 140 can include data recently written to the database. When the amount of data in the internal memory 140 reaches a certain threshold, the processor 120 can store the data in the internal memory 140 to the external memory 160 for persistent storage.
  • the data read from the external storage 160 can be stored in the internal memory 140 first, or it can be said that the data stored in the internal memory 140 can also include the data read from the external storage 160 .
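  • The threshold behaviour described above can be sketched as follows. The class `TieredStore`, the row-count threshold, and the use of Python lists to stand in for the internal memory 140 and the external storage 160 are assumptions for illustration only.

```python
class TieredStore:
    """Recently written rows stay in memory until a threshold is reached."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold      # hypothetical row-count threshold
        self.memory: list[dict] = []    # stands in for the internal memory 140
        self.external: list[dict] = []  # stands in for the external storage 160

    def write(self, row: dict) -> None:
        """Record a newly written row and flush when the threshold is hit."""
        self.memory.append(row)
        if len(self.memory) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Persist everything held in memory and empty the memory tier."""
        self.external.extend(self.memory)
        self.memory.clear()
```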
  • The external memory 160, which can also be referred to as auxiliary memory, can be non-volatile memory, such as a read-only memory (ROM), a hard disk drive (HDD), or a solid state drive (SSD).
  • the external memory 160 can be used to permanently store data.
  • the storage method of the data in the database stored in the internal memory 140 and the storage method of the data in the database stored in the external memory 160 may be the same or different.
  • the data in the database stored in the internal memory 140 and the external storage 160 may be stored in a row-based or column-based storage manner.
  • for example, the data in the database stored in the memory 140 is stored in a row-based manner, and the data in the database stored in the external memory 160 is stored in a column-based manner.
  • for another example, the data in the database stored in the memory 140 is stored in a column-based manner, and the data in the database stored in the external memory 160 is stored in a row-based manner.
  • Scenario 1: the storage manner of the data in the database stored in the internal memory 140 (also called main memory) differs from the storage manner of the data in the database stored in the external storage 160 (also called external memory).
  • in this case, when data is migrated between the internal memory 140 and the external storage 160, format conversion of the data to be migrated is required.
  • the data format conversion can be performed by the acceleration device 130, that is, the acceleration device 130 can execute the data processing method provided in the embodiment of the present application, which is a scenario to which the data processing method provided in the embodiment of the present application applies.
  • migrating the data in the external storage 160 to the internal memory 140, or migrating the data in the internal memory 140 to the external storage 160, may be led by the processor 120. That is, when the data in the external storage 160 needs to be migrated to the internal memory 140, the processor 120 can send an instruction to the external storage 160 to obtain the data to be migrated. After obtaining the data to be migrated from the external storage 160, the processor 120 can store the data to be migrated in the memory 140. The processor 120 can also initiate a data processing request to the acceleration device 130 to request the acceleration device 130 to perform format conversion on the data to be migrated, and the acceleration device 130 can execute the data processing method provided in the embodiment of this application.
  • when the data in the internal memory 140 needs to be migrated to the external storage 160, the processor 120 can obtain the data to be migrated from the internal memory 140 and then issue an instruction to the external storage 160 to instruct the external storage 160 to store the data to be migrated. The processor 120 may also initiate a data processing request to the acceleration device 130 to request the acceleration device 130 to perform format conversion on the data to be migrated, and the acceleration device 130 may execute the data processing method provided in the embodiment of the present application.
  • Scenario 2: the storage format of the data in the database in the management device 100 is a row-based storage format, and the data request initiated by the client 200 is used to request data of some columns.
  • the communication interface 150 in the management device 100 receives the data request and sends the data request to the processor 120; the processor 120 can first determine the location where the data requested by the data request is stored.
  • if the requested data is stored in the memory 140, the processor 120 can read the requested data from the memory 140. However, because the data in the database is stored in a row-based storage format, the processor 120 may initiate a data processing request to the acceleration device 130 to request the acceleration device 130 to perform format conversion on the requested data, and the acceleration device 130 may execute the data processing method provided in the embodiment of the present application to convert the requested data into a column-based storage format. After the acceleration device 130 converts the storage format of the requested data, the processor 120 reads the converted data from the memory 140.
  • if the requested data is stored in the external storage 160, the processor 120 may move the data requested by the data request from the external storage 160 to the internal memory 140.
  • the processor 120 may then initiate a data processing request to the acceleration device 130 to request the acceleration device 130 to perform format conversion on the requested data, and the acceleration device 130 may execute the data processing method provided in the embodiment of the present application to convert the requested data into a column-based storage format.
  • the processor 120 then reads the converted data from the memory 140.
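  • The request flow of Scenario 2 can be sketched as follows. The class `ManagementDevice`, the method names, and the dictionary-based tiers are hypothetical; the private method stands in for the data processing request that the processor 120 would send to the acceleration device 130, and runs on the host here only to keep the example self-contained.

```python
from typing import Any

Row = dict[str, Any]


class ManagementDevice:
    """Toy model: tables are row-stored, requests ask for some columns."""

    def __init__(self, memory: dict[str, list[Row]],
                 external: dict[str, list[Row]]) -> None:
        self.memory = memory      # table name -> rows held in memory
        self.external = external  # table name -> rows in external storage

    def _accelerator_convert(self, rows: list[Row],
                             fields: list[str]) -> dict[str, list[Any]]:
        # Stand-in for the acceleration device: row storage -> column storage.
        return {f: [r[f] for r in rows] for f in fields}

    def read_columns(self, table: str,
                     fields: list[str]) -> dict[str, list[Any]]:
        # Step 1: determine where the requested data is stored.
        if table not in self.memory:
            # Step 2: move it from external storage into memory first.
            self.memory[table] = self.external[table]
        # Step 3: request format conversion, then return the converted data.
        return self._accelerator_convert(self.memory[table], fields)
```

  • Only the requested fields are converted in this sketch, which mirrors a request for data of some columns rather than the whole table.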
  • Scenario 3: the storage format of the data in the database in the management device 100 is a column-based storage format, and the data request initiated by the client 200 is used to request data of some rows.
  • Scenario 3 is similar to Scenario 2, except that in Scenario 2 data stored in rows is converted to data stored in columns, whereas in Scenario 3 data stored in columns is converted to data stored in rows.
  • the acceleration device 130 includes a processor 131 and a communication interface 132, and the processor 131 and the communication interface 132 are connected through a bus.
  • the processor 131 can interact with other components (such as the processor 120) in the management device 100 through the communication interface 132, for example to receive a data processing request.
  • the processor 131 is similar to the processor 120, and the processor 131 may be a CPU, ASIC, FPGA, AI chip, SoC, CPLD, or GPU.
  • the processor 131 in the acceleration device 130 may be deployed in the management device 100 as a coprocessor of the processor 120 , and cooperate with the processor 120 to perform operations.
  • a memory 133 can be separately provided in the acceleration device 130; the memory 133 can store computer program instructions, and can also be used as a cache to store data before format conversion (such as the first data set in the embodiment of the present application) or to store the format-converted data (such as the second data set in the embodiment of this application).
  • the processor 120 and the processor 131 in the acceleration device 130 may share the memory 140, that is, the memory 140 may have all or part of the functions of the memory 133.
  • in this case, the acceleration device 130 does not need to be separately provided with the memory 133.
  • the processor 131 can call the computer program instructions (for example, when the processor 131 is a CPU, an AI chip, or a GPU) to execute the data processing method provided in the embodiment of the present application.
  • the processor 131 can also run computer program instructions or hardware circuit processing logic programmed on the processor 131 itself (for example, when the processor 131 is an ASIC, an FPGA, an SoC, or a CPLD) to execute the data processing method provided by the embodiment of the present application.
  • the management device 100 can be used to manage databases; for example, the management device 100 can be a node in a centralized storage system or a distributed storage system and can manage the databases in the centralized storage system or the distributed storage system.
  • FIG. 3A shows a storage system 300 provided in the embodiment of the present application.
  • the storage system is a centralized storage system, which is characterized by a unified entry point through which all data from external devices must pass; this entry point is the engine of the centralized storage system.
  • the engine is the core component of the centralized storage system, where many advanced functions of the storage system are implemented.
  • multiple engines can be deployed.
  • one engine 310 is taken as an example here.
  • the embodiment of the present application does not limit the number of engines.
  • FIG. 3A illustrates that the engine 310 includes two controllers as an example.
  • controller 0 and controller 1 back each other up.
  • when controller 0 fails, controller 1 can take over the services of controller 0.
  • when controller 1 fails, controller 0 can take over the services of controller 1, which prevents a hardware failure from making the entire storage system 300 unavailable.
  • Controller 0 is capable of receiving data requests and processing the data requests.
  • for example, when the data request is a data read request, the controller 0 can read data from the local internal memory or the hard disk 320 according to the data request. When the processor 120 in the controller 0 determines that format conversion is required, the processor 120 in the controller 0 may initiate a data processing request to the acceleration device 130 in the controller 0, and trigger the acceleration device 130 in the controller 0 to execute the data processing method provided in the embodiment of the present application. Controller 0 may also feed back a data read response carrying the read data.
  • for another example, when the data request is a data write request, the controller 0 can write data to the local memory or the hard disk 320 according to the data write request. When the processor 120 in the controller 0 determines that format conversion is required, the processor 120 in the controller 0 may initiate a data processing request to the acceleration device 130 in the controller 0, and trigger the acceleration device 130 in the controller 0 to execute the data processing method provided in the embodiment of the present application. Controller 0 may also feed back a data write response to indicate that the data has been successfully written.
  • the management device 100 may be the controller 1 or the controller 0 in the engine 310 in the system shown in FIG. 3A .
  • for the structure of the controller 1 or the controller 0, reference may be made to the structure of the management device 100 shown in FIG. 2, which will not be repeated here.
  • Figure 3A shows a centralized storage system with separate disk control.
  • the engine 310 may not have a hard disk slot; in that case the hard disk 320 needs to be placed in a hard disk enclosure, and the back-end interface 116 communicates with the hard disk enclosure.
  • the back-end interface 116 exists in the engine 310 in the form of an adapter card, and one engine 310 can use two or more back-end interfaces 116 to connect multiple hard disk enclosures at the same time.
  • the adapter card can also be integrated on the motherboard, in which case the adapter card can communicate with the processor 120 through the PCIe bus.
  • the engine 310 may also have a hard disk slot, with the hard disk 320 inserted directly into the hard disk slot.
  • FIG. 3B it is a schematic diagram of another storage system architecture provided by the embodiment of the present application.
  • the storage system in FIG. 3B is a distributed storage system, and the storage system 300 includes a computing node cluster and a storage node cluster.
  • the computing node cluster includes one or more computing nodes 330 (two computing nodes 330 are shown in FIG. 3B , but not limited to two computing nodes 330 ), and each computing node 330 can communicate with each other.
  • the computing node 330 is a computing device, such as a server, a desktop computer, or a controller of a storage array.
  • the management device 100 may be the computing node 330 in the system shown in FIG. 3B .
  • For the structure of the computing node 330, reference may be made to the structure of the management device 100 shown in FIG. 2, which is not repeated here.
  • The computing node 330 may receive a data request and process the data request. For example, when the data request is a data read request, the computing node 330 can read data from its local memory or from a storage node 340 in the storage node cluster according to the data request. When the processor 120 in the computing node 330 determines that format conversion is required, the processor 120 can initiate a data processing request to the acceleration device 130 in the computing node 330, triggering the acceleration device 130 to execute the data processing method provided by the embodiments of the present application. The computing node 330 may also feed back a data read response carrying the read data.
  • The computing node 330 may write data to its local memory or to a storage node 340 in the storage node cluster according to the data write request. When the processor 120 in the computing node 330 determines that format conversion is required, the processor 120 may initiate a data processing request to the acceleration device 130 in the computing node 330, triggering the acceleration device 130 to execute the data processing method provided in the embodiments of the present application.
  • Computing node 330 may also feed back a data write response to indicate that the data has been successfully written.
  • Any computing node 330 can access any storage node 340 in the storage node cluster through the network.
  • the storage node cluster includes a plurality of storage nodes 340 (three storage nodes 340 are shown in FIG. 3B , but are not limited to three storage nodes 340 ).
  • a storage node 340 may include one or more hard disks.
  • The storage node 340 is mainly used to store data, such as the data in a database. According to instructions initiated by the computing node 330, the storage node 340 stores data locally, or reads data locally and feeds it back to the computing node 330.
  • the centralized storage system and the distributed storage system mentioned above are only examples, and the data processing method provided in the embodiment of the present application is also applicable to other centralized storage systems and distributed storage systems.
  • the following describes the data processing method provided by the embodiment of the present application by taking the system and the management device 100 mentioned in FIG. 1 or FIG. 2 as an example with reference to FIG. 4 .
  • the method can be applied to the management device 100, including:
  • Step 401 The processor 120 sends a data processing request to the acceleration device 130 when determining that data format conversion is required.
  • the data processing request is used to request the acceleration device 130 to perform format conversion on the first data set in the database.
  • the first data set includes at least one piece of data, such as data in a fixed-length field, data in a variable-length field, or both data in a fixed-length field and data in a variable-length field.
  • the situations where the processor 120 determines that format conversion is required include the following two situations.
  • In the first situation, the storage manner of the database data in the internal memory 140 differs from the storage manner of the database data in the external memory 160. When the processor 120 needs to migrate data from the external memory 160 to the internal memory 140, or from the internal memory 140 to the external memory 160, the processor 120 determines that format conversion needs to be performed on the data to be migrated.
  • In this case, the first data set is the data to be migrated.
  • In the second situation, the processor 120 receives a data request from the client 200 requesting data in the database, and the storage format required for the requested data is inconsistent with the storage format of the data in the management device 100.
  • For example, the data request requests the data of some columns, while the database in the management device 100 stores its data in a row-based storage format.
  • For another example, the data request requests the data of some rows, while the database in the management device 100 stores its data in a column-based storage format.
  • The processor 120 determines that format conversion is required for the requested data. In this case, the first data set is the requested data.
  • the processor 120 will send a data processing request to the acceleration device 130 to request format conversion for the first data set.
  • Step 402 After the acceleration device 130 receives the data processing request, the acceleration device 130 may acquire the first data set first.
  • the processor 120 may send the address of the data to be migrated in the memory 140 to the acceleration device 130, and the acceleration device 130 may obtain the first data set from the memory 140 according to the address.
  • the processor 120 may also notify the acceleration device 130 of the address in the memory 140 where information related to the data is stored, and the address may be a continuous address segment or a set of discontinuous multiple address segments.
  • the relevant information of the data can indicate the information of the two-dimensional table, such as the information recorded in the header of the two-dimensional table, for example, the type of each field in the two-dimensional table (fixed-length field or variable-length field), the length of the fixed-length field, and whether the field can be empty.
  • the acceleration device 130 may read information related to the data from the memory 140 according to the address.
  • If the requested data is stored in the memory 140, the processor 120 can send the address of the requested data in the memory 140 to the acceleration device 130, and the acceleration device 130 can fetch the first data set from the memory 140 according to the address. If the requested data is stored in the external memory 160, the processor 120 can migrate the requested data from the external memory 160 to the internal memory 140 and cache it there. The processor 120 can then send the cache address of the requested data in the memory 140 to the acceleration device 130, and the acceleration device 130 can obtain the first data set from the memory 140 according to the cache address.
  • the processor 120 may also notify the acceleration device 130 of an address in the memory 140 where information related to the data is stored.
  • the acceleration device 130 may read information related to the data from the memory 140 according to the address.
  • Step 403 The acceleration device 130 performs format conversion on the first data set, and converts the first data set stored in the first manner into a second data set stored in the second manner.
  • the acceleration device 130 may convert the first data set stored in rows into the second data set stored in columns.
  • the acceleration device 130 may convert the first data set stored in columns into the second data set stored in rows.
  • Step 404 The acceleration device 130 stores the second data set in the target storage space.
  • Converting a row-stored data set into a column-stored data set is the inverse of converting a column-stored data set into a row-stored data set. Here, converting a row-stored first data set into a column-stored second data set is taken as an example to illustrate the format conversion.
  • The process of converting a column-stored data set into a row-stored data set can be obtained by reversing this process, and is not repeated here.
  • For different types of fields, the acceleration device 130 can adopt different operations when performing format conversion. The operations performed by the acceleration device 130 during format conversion are described below for each type of field:
  • FIG. 5 is a schematic diagram of a visualized two-dimensional table provided by an embodiment of the present application. The two-dimensional table has multiple columns of fields; FIG. 5 takes N columns as an example. Each column is a field, and a field can be a fixed-length field or a variable-length field.
  • The storage format of each row is shown in FIG. 6. The data of each row includes two parts: one part is the field description information, and the other part is the data in each field.
  • the field description information includes variable-length field length information, null value (null) information, and control information.
  • the variable-length field length information indicates the length of each variable-length field existing in the row.
  • Null value (null) information indicates whether each field in the row is a null value.
  • The control information indicates information for implementing concurrency control inside the database, such as information about the concurrent processing of insert, delete, query, and update operations in the database.
  • FIG. 6 is only an exemplary display of a storage format for storing data in rows.
  • the embodiments of the present application are also applicable to other storage formats that store data in rows.
  • When the acceleration device 130 performs format conversion for a fixed-length field, it can obtain each data under the fixed-length field and arrange the data continuously, converting the data into column storage.
  • the acceleration device 130 may read the null value (null) information corresponding to the fixed-length field in each row of data, and determine whether each data under the fixed-length field is a null value. For a piece of data under the fixed-length field, if the null value information indicates a non-null value, the data is actually recorded during format conversion. If the null value information indicates a null value, during format conversion, the data is recorded with 0 bytes to indicate that the data is a null value. In the generated second data set, each data under the fixed-length field is arranged continuously. That is to say, each data under the fixed-length field is written next to each other in sequence when stored.
  • the second data set may further include null value indication information for indicating whether each data in the fixed-length field is a null value.
  • FIG. 7 is a schematic diagram of format conversion for fixed-length fields provided by an embodiment of the present application.
  • the fixed-length field representing salary is converted into column-stored data through the acceleration device 130.
  • The acceleration device 130 additionally generates a column of null flag fields. The null flag field includes multiple null flag values; each null flag value corresponds to one data in the fixed-length field and describes whether the corresponding data is a null value. For example, in FIG. 7, 0 represents a non-null value and 1 represents a null value.
  • In the second data set, each data in the fixed-length field is arranged continuously, that is, the storage addresses of the data are continuous. The null flag values in the null flag field are likewise arranged continuously in the second data set.
  • the present application does not limit the ordering of each data in the fixed-length field and each null flag value in the null flag field in the second data set. For example, each data in the fixed-length field can be sorted first, and each null flag value in the null value flag field can be sorted last. For another example, each data in the fixed-length field can be sorted last, and each null flag value in the null value flag field can be sorted first.
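  • The fixed-length conversion described above can be illustrated with a minimal Python sketch. It is not the implementation of the acceleration device 130: the function name `fixed_field_to_column`, the 4-byte little-endian integer encoding, the sample salary values, and the choice to represent a null value as a zero-filled slot of the field width are all assumptions made for illustration. The data column is placed before the null flag column here, which is one of the orderings the text allows.

```python
from typing import List, Optional, Tuple


def fixed_field_to_column(values: List[Optional[int]], width: int = 4) -> Tuple[bytes, bytes]:
    """Pack one fixed-length field into a contiguous data column plus a null-flag column."""
    data = bytearray()
    null_flags = bytearray()
    for value in values:
        if value is None:
            # Assumption: a null value occupies a zero-filled slot so every slot keeps the same width.
            data += bytes(width)
            null_flags.append(1)  # 1 marks a null value, as in FIG. 7
        else:
            data += value.to_bytes(width, "little", signed=True)
            null_flags.append(0)  # 0 marks a non-null value
    return bytes(data), bytes(null_flags)


# Hypothetical salary values taken from three rows of the first data set.
salaries = [5000, None, 7200]
column, flags = fixed_field_to_column(salaries)
```

  • Because every slot has the same width, the i-th value can be located at byte offset `i * width` without any additional position information, which is why only the null flags are needed for a fixed-length field.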
  • When the acceleration device 130 performs format conversion for a variable-length field, it can obtain each data under the variable-length field and arrange the data continuously, converting the data into column storage. Since the length of each data in the variable-length field is not fixed, the acceleration device 130 may also add corresponding description information to describe the length of each data in the variable-length field or the position of each data in the second data set.
  • When the acceleration device 130 performs format conversion for a variable-length field, it can read the description information in each row of data to obtain the variable-length field length information and the corresponding null value (null) information, and determine the real length of each data under the variable-length field and whether the data is a null value. For a piece of data under the variable-length field, if the null value information indicates a non-null value, the data is actually recorded during format conversion. If the null value information indicates a null value, during format conversion, the data is recorded with 0 bytes to indicate that the data is a null value. In the generated second data set, each data under the variable-length field is arranged continuously. That is to say, each data under the variable-length field is written next to each other in order when being stored, and the storage addresses of the data are continuous.
  • the acceleration device 130 may also generate position information when performing format conversion for the variable-length field.
  • the position information is used to indicate the position of each data under the variable-length field in the second data set.
  • the embodiment of the present application does not limit the manner in which the position information indicates the position of each data under the variable-length field in the second data set.
  • For example, the position information can be the offset of a piece of data from the first data under the variable-length field (such as from the first byte of the first data). The position information can also be the offset of a piece of data from the previous data (such as from its last byte); in this case, the offset can also be understood as the length of the data.
  • FIG. 8 is a schematic diagram of format conversion for variable-length fields provided by an embodiment of the present application.
  • the variable-length field representing the name is converted into column-stored data through the acceleration device 130 .
  • The acceleration device 130 additionally generates a column of offset fields. The offset field includes a plurality of offset values; each offset value corresponds to one piece of data in the variable-length field and describes the offset of the corresponding data from the previous data.
  • The offset value of the first data, TOM, under the variable-length field is 3. Since there is no previous data, the offset is measured from the first byte of the data and is 3 bytes.
  • The offset value of the second data, brand, under the variable-length field is 5, indicating that its offset from the last byte of the previous data (that is, the first data) is 5 bytes.
  • In the second data set, each data in the variable-length field is arranged continuously, that is, the storage addresses of the data are continuous. The offset values in the offset field are likewise arranged continuously in the second data set.
  • the present application does not limit the ordering of each data in the variable-length field and each offset value in the offset field in the second data set. For example, each data in the variable-length field may be sorted first, and each offset value in the offset field may be sorted last. For another example, each data in the variable-length field may be sorted last, and each offset value in the offset field may be sorted first.
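  • The variable-length conversion can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `var_field_to_column` and `read_value`, the UTF-8 encoding, and the use of a Python list for the offset column are hypothetical. The offset of each data is taken relative to the end of the previous data, so it equals the data length, matching the TOM (3) and brand (5) example; a null value contributes 0 bytes.

```python
from typing import List, Optional, Tuple


def var_field_to_column(values: List[Optional[str]]) -> Tuple[bytes, List[int]]:
    """Pack one variable-length field into contiguous bytes plus an offset column."""
    data = bytearray()
    offsets: List[int] = []
    for value in values:
        raw = b"" if value is None else value.encode("utf-8")  # a null value is recorded with 0 bytes
        data += raw
        offsets.append(len(raw))  # offset from the previous data, i.e. the data length
    return bytes(data), offsets


def read_value(data: bytes, offsets: List[int], index: int) -> str:
    """Locate one value by summing the offsets of all preceding values."""
    start = sum(offsets[:index])
    return data[start:start + offsets[index]].decode("utf-8")


names, name_offsets = var_field_to_column(["TOM", "brand", None])
```

  • If the position information instead recorded the offset from the first byte of the first data, `read_value` could jump directly to a value without summing; the text leaves this choice open.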
  • Suppose the formats of the first data set and the second data set are visualized.
  • Going from the first data set to the second data set is then equivalent to a "transposition": the characters or values in the data set are unchanged, and a row in the first data set becomes a column in the second data set.
  • Visualization here refers to the data layout that the storage manner of the data set presents when drawn out.
  • each data in each row of the first data set is stored in a continuous arrangement.
  • the data in each row in the second data set is stored in a continuous arrangement.
  • the acceleration device 130 may also implement data format conversion. Specific to some data types in the data set, the data format required for data calculation and the data format required for storing data will be different.
  • the acceleration device 130 may also perform data format conversion on the data in the fields, converting the data format required for storing data into the data format required for data calculation.
  • The data format of the first data set before format conversion in FIG. 8 is the data format required for storing the data, and the data format of the second data set is the data format required for data calculation.
  • The data type in a decimal field is decimal, which can be accurate to several digits after the decimal point; how many digits after the decimal point depends on the data itself.
  • some data description information of the decimal data may be required, or for the convenience of data calculation, the data length of the decimal data itself and the number of digits after the decimal point are required to meet the corresponding data description information.
  • the acceleration device 130 may also perform part or all of the following two operations:
  • the acceleration device 130 obtains the data description information corresponding to each data in such a decimal field, and uses it as a part of the second data set.
  • each data has corresponding data description information.
  • the data description information is used to describe the attributes of the corresponding data itself, such as indicating the sign (sign), precision (precision), and scale (scale) of the data.
  • The data description information can be stored together with the data in the first data set, or can be stored separately from the data in the first data set.
  • The sign indicates whether the sign before the decimal-type data is positive or negative.
  • The precision indicates the overall length of the decimal-type data. The scale indicates the number of digits after the decimal point of the decimal-type data.
  • Taking the data -3.01456 as an example, the data description information of the data may indicate that the sign of the data is a minus sign, the overall length of the data is 6 digits, and the number of digits after the decimal point is 5.
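  • As an illustration of the three attributes, the following Python sketch derives sign, precision, and scale from the textual form of a finite decimal value using the standard `decimal` module. The function name `describe_decimal` and the dictionary layout are hypothetical; how a real database records the data description information is not specified here.

```python
from decimal import Decimal
from typing import Dict, Union


def describe_decimal(text: str) -> Dict[str, Union[str, int]]:
    """Derive sign, precision (total digits) and scale (digits after the point) of a finite decimal."""
    sign, digits, exponent = Decimal(text).as_tuple()
    return {
        "sign": "-" if sign else "+",
        "precision": len(digits),
        "scale": max(0, -exponent),
    }


description = describe_decimal("-3.01456")
```

  • For -3.01456 this gives a minus sign, a precision of 6, and a scale of 5, which agrees with the example in the text.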
  • the processor 120 needs to obtain the data description information so as to perform calculation or processing on the data.
  • the embodiment of the present application does not limit the manner in which the processor 120 acquires the data description information, and the acquisition manner of the data description information is related to the storage manner of the data description information in the database. In different databases, the data description information can be acquired in different ways.
  • the acceleration device 130 may also acquire data description information corresponding to each data in the decimal field, and use the data description information corresponding to each data as a part of the second data set.
  • the data description information corresponding to each data may also be arranged continuously in the second data set.
  • The acceleration device 130 performs a complement (zero-padding) operation on each data in this type of decimal field.
  • The data in this type of decimal field in the first data set may be data from which zeros have been removed.
  • The acceleration device 130 may pad the data in this type of decimal field, that is, fill in zeros, according to the data description information (such as precision and scale).
  • The acceleration device 130 can pad zeros after the data so that the number of digits after the decimal point in the padded data meets the scale requirement; it can also pad zeros before the data so that the overall number of digits of the padded data meets the precision requirement.
  • the original data in the decimal field is 0012.456123000.
  • When storing the data, the processor 120 can remove the two meaningless zeros at the head and the three meaningless zeros at the tail, so that the data stored in the first data set becomes 12.456123.
  • After the acceleration device 130 acquires the data 12.456123, it can pad the data 12.456123.
  • the precision in the data description information indicates that the overall length of the data is 13.
  • the acceleration device 130 may pad two zeros at the head of the data 12.456123 and three zeros at the end.
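  • The zero-padding example can be reproduced with a short Python sketch. The function name `pad_decimal` and the string representation are assumptions for illustration. A scale of 9 is also assumed, since the original value 0012.456123000 has 9 digits after the decimal point; the text states only the precision of 13.

```python
def pad_decimal(text: str, precision: int, scale: int) -> str:
    """Pad zeros so the value has `scale` fractional digits and `precision` digits in total."""
    sign = "-" if text.startswith("-") else ""
    body = text.lstrip("+-")
    int_part, _, frac_part = body.partition(".")
    frac_part = frac_part.ljust(scale, "0")            # zeros appended after the data
    int_part = int_part.rjust(precision - scale, "0")  # zeros prepended before the data
    return f"{sign}{int_part}.{frac_part}" if scale > 0 else f"{sign}{int_part}"


padded = pad_decimal("12.456123", precision=13, scale=9)
```

  • Here two zeros are added at the head and three at the tail, restoring 0012.456123000.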
  • The acceleration device 130 performs a digit-removal (zero-trimming) operation on each data in this type of decimal field.
  • The acceleration device 130 can remove zeros after the data so that the number of digits after the decimal point in the trimmed data meets the scale requirement; it can also remove zeros before the data so that the overall number of digits of the trimmed data meets the precision requirement.
  • The data in this type of decimal field in the first data set is data that has been padded with zeros.
  • The acceleration device 130 can trim the data in this type of decimal field, for example by removing meaningless zeros or values, according to the data description information (such as precision and scale).
  • the original data in the decimal field is 12.456123.
  • To ensure that the length of the data in the decimal field is 13, the processor 120 may, when storing the data, pad three meaningless zeros at the head of the data and two meaningless zeros at the tail, so that the data stored in the first data set becomes 00012.45612300.
  • After the acceleration device 130 acquires the data 00012.45612300, it can trim the data 00012.45612300.
  • the precision in the data description information indicates that the overall length of the data is 8.
  • the acceleration device 130 may remove the three zeros at the head and the two zeros at the tail of the data 00012.45612300.
  • The acceleration device 130 may also directly remove the zeros at the head or tail of the data in the first data set without relying on the data description information.
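  • The direct variant, which removes leading and trailing zeros without consulting the data description information, can be sketched in Python as below. The function name `trim_decimal` and the string representation are hypothetical choices for illustration.

```python
def trim_decimal(text: str) -> str:
    """Remove meaningless zeros at the head of the integer part and the tail of the fraction."""
    sign = "-" if text.startswith("-") else ""
    body = text.lstrip("+-")
    int_part, _, frac_part = body.partition(".")
    int_part = int_part.lstrip("0") or "0"  # keep a single zero when the integer part is all zeros
    frac_part = frac_part.rstrip("0")
    return sign + int_part + ("." + frac_part if frac_part else "")


trimmed = trim_decimal("00012.45612300")
```

  • The result 12.456123 has 8 digits in total, which matches the precision of 8 given in the example.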
  • the data in the date-type field represents a date, but when storing the data in the date-type field, the data will be stored as numerical data. For example, for the date June 2, 2021, when storing, it will be stored as the value 20210602. However, when the processor 120 processes the data in this type of date field, it still needs to clearly identify the data representing the year, month, and day in the data.
  • When the acceleration device 130 converts the data format of the data in the date field, it can convert the numeric data into date data.
  • the acceleration device 130 may decompose the numerical data into multiple sub-data, and one sub-data is used to represent a year, a month, or a day.
  • the acceleration device 130 may split the numerical data into three numerical sub-data of 2021, 06, and 02.
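  • The decomposition of a numerically stored date can be shown with a small Python sketch. The function name `split_date` is hypothetical, and the YYYYMMDD integer layout is taken from the 20210602 example; other stored layouts would need a different split.

```python
from typing import Tuple


def split_date(value: int) -> Tuple[int, int, int]:
    """Split a YYYYMMDD integer into (year, month, day) sub-data."""
    year, rest = divmod(value, 10000)
    month, day = divmod(rest, 100)
    return year, month, day


year, month, day = split_date(20210602)
```

  • The three sub-data can then be written next to each other in the second data set, so the processor 120 can read the year, month, and day without parsing the stored number again.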
  • The acceleration device 130 may store the second data set in the target storage space that the processor 120 has applied for on behalf of the second data set.
  • The processor 120 may also carry the address of the target storage space in the data processing request, so that after the acceleration device 130 acquires the second data set, it can store the second data set in the target storage space.
  • The target storage space can be storage space in the internal memory 140 (such as the situation where the data in the external memory 160 needs to be migrated to the internal memory 140 in scenario one, or in the second and third scenarios). The target storage space can also be storage space in the external memory 160 (such as the situation where the data in the internal memory 140 needs to be migrated to the external memory 160 in scenario one).
  • the steps of the acceleration device 130 performing format conversion on the first data set and storing the second data set in the target storage space may be executed synchronously.
  • the acceleration device 130 may convert the format of the first data set, and store the converted data (in this case, the converted data is actually part of the data in the second data set) in the target storage space.
  • the acceleration device 130 may also perform format conversion on the first data set, and then perform step 404 after acquiring the entire second data set.
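  • The two orderings of conversion and storage described above can be contrasted with a minimal Python sketch. The function names, the in-memory list standing in for the target storage space, and the sample rows are all hypothetical; the sketch only shows that both orderings produce the same second data set.

```python
from typing import List, Sequence


def convert_and_store_streaming(rows: Sequence[Sequence], target: List[list]) -> None:
    """Store each converted value as soon as it is produced, so conversion and storage overlap."""
    for row in rows:
        for col_index, value in enumerate(row):
            target[col_index].append(value)


def convert_then_store(rows: Sequence[Sequence], target: List[list]) -> None:
    """Build the entire second data set first, then store it in a separate step."""
    second_data_set = [list(column) for column in zip(*rows)]
    for col_index, column in enumerate(second_data_set):
        target[col_index].extend(column)


rows = [("TOM", 5000), ("brand", 7200)]
streamed: List[list] = [[], []]
batched: List[list] = [[], []]
convert_and_store_streaming(rows, streamed)
convert_then_store(rows, batched)
```

  • The streaming form avoids holding the whole second data set in a temporary buffer, while the batched form keeps the store step separate from the conversion.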
  • the data processing method provided in the embodiment of the present application is also applicable to other data sets constructed with two-dimensional tables.
  • the acceleration device 130 may also be deployed in a smart hard disk (SSD) to implement the function of format conversion.
  • the acceleration device 130 may be deployed in the controller of the smart hard disk.
  • The acceleration device 130 can convert the format of the first data set according to business requirements, generate a second data set, and store the second data set in the storage space of the smart hard disk.
  • The acceleration device 130 reads the first data set from the storage space of the smart hard disk, performs format conversion on the first data set according to business requirements to generate a second data set, and feeds back the second data set.
  • the acceleration device 130 may also be deployed in the management device 100 for managing big data, so as to realize format conversion of data in a big data scenario.
  • The embodiments of the present application also provide an acceleration device, which is used to execute the method performed by the acceleration device in the method embodiment shown in FIG. 4 above. For related features, reference may be made to the foregoing method embodiments, which are not repeated here.
  • the acceleration device 1000 includes a request acquisition module 1001 , a data acquisition module 1002 , and a format conversion module 1003 .
  • the request acquiring module 1001 is configured to acquire a data processing request of the processor, the data processing request is used to realize the format conversion of the first data set in the database, and the first data set includes at least one piece of data.
  • the data acquisition module 1002 is configured to acquire a first data set according to a data processing request, and the first data set is stored in a first manner.
  • The format conversion module 1003 is configured to convert the format of the first data set according to a second manner to obtain a second data set, and to store the second data set in the target storage space. The second data set is stored in the second manner and includes at least one piece of data, and the second manner is different from the first manner.
  • The device 1000 in the embodiments of the present application can be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the device 1000 and its modules can also be software modules.
  • the first manner may be row storage, and the second manner may be column storage; or the first manner may be column storage, and the second manner may be row storage.
  • Row storage indicates that data is stored on the basis of rows in the database, and column storage indicates that data is stored on the basis of columns in the database.
  • When the first manner is row storage and the second manner is column storage, for a fixed-length field in the first data set, the format conversion module 1003 can obtain each data under the fixed-length field in the first data set and arrange the data continuously to generate the second data set. The second data set also includes null value indication information, which indicates whether each data under the fixed-length field is a null value or a non-null value.
  • When the first manner is row storage and the second manner is column storage, for a variable-length field in the first data set, the format conversion module 1003 can obtain each data under the variable-length field in the first data set and arrange the data continuously to generate the second data set. The second data set also includes position indication information, which indicates the position of each data under the variable-length field in the second data set.
  • the format conversion module 1003 may also perform data format conversion on the data in the first data set to generate a second data set, wherein the data format of the data in the first data set is the data format required for storing data , the data format of the second data set is the data format required by the processor for data calculation.
  • For decimal-type data, the format conversion module 1003 may, when performing data format conversion, obtain the data description information of the decimal-type data and use the data description information as a part of the second data set, where the data description information includes sign, precision, and scale. The format conversion module 1003 may also perform a zero-padding or zero-trimming operation on the decimal-type data according to the precision and scale.
  • For date-type data, the format conversion module 1003 may decompose the date-type data into multiple sub-data, each sub-data representing one of year, month, and day, and the multiple sub-data are arranged continuously in the second data set.
  • The device 1000 according to the embodiments of the present application may correspond to the implementation of the method described in the embodiments of the present application, and the above-mentioned and other operations and/or functions of each unit in the device 1000 respectively implement the corresponding flow of the method in FIG. 4. For the sake of brevity, details are not repeated here.
  • The present application also provides an acceleration device 130 as shown in FIG. 2. The acceleration device 130 is used to implement the corresponding flow of the method described in FIG. 4 above. For the sake of brevity, details are not repeated here.
  • the present application also provides a management device, the management device includes an acceleration device 130, and the acceleration device 130 is used to implement the corresponding flow of the method described in FIG. 4 above.
  • each functional module in the embodiment of the present application may be integrated into one processing module, each module may exist separately physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product includes one or more computer program instructions. When the computer program instructions are loaded or executed on the computer, the processes or functions according to the embodiments of the present invention will be generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer program instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (eg, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (eg, infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implements the function specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

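Provided are a data processing method, apparatus, and device. The data processing method includes: a processor (120) sends a data processing request to an acceleration device (130), where the data processing request is used to implement format conversion of a first data set in a database, the first data set including multiple pieces of data; after obtaining the data processing request, the acceleration device (130) obtains the first data set according to the request, converts the first data set stored in a first manner into a second data set stored in a second manner, and stores the second data set in a target storage space, where the second manner differs from the first manner. Because the acceleration device (130) can convert data sets, a data set can serve both OLTP and OLAP service scenarios. The conversion is performed by the acceleration device (130) rather than by the processor (120), which substantially reduces the load on the processor (120), preserves the data processing efficiency of the processor (120), and also improves format conversion efficiency.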

Description

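Data processing method, apparatus, and device
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202110653902.5, filed with the China Patent Office on June 11, 2021 and entitled "Data processing method, apparatus, and device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of storage technologies, and in particular to a data processing method, apparatus, and device.
Background
A database can typically store data at row granularity or at column granularity. Row-oriented storage preserves the original form of the data to some extent, which makes it convenient to insert, delete, query, and update data, and is better suited to on-line transaction processing (OLTP) scenarios. Column-oriented storage places the data of the same field together, which facilitates subsequent analysis, and is better suited to on-line analytical processing (OLAP) scenarios.
To handle both OLTP and OLAP scenarios, both storage formats need to be supported. For example, data is stored row by row when it is kept in storage such as a hard disk. When processing such as data analysis is required, the data is moved from the hard disk or other storage into memory and stored column by column in memory. Moving the data from storage into memory therefore requires format conversion. At present this conversion is mainly performed by the central processing unit (CPU) of the device. Because the conversion involves a large amount of data copying and data processing, it occupies considerable CPU resources and places a heavy load on the CPU.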
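Summary
This application provides a data processing method, apparatus, and device, to speed up format conversion and reduce CPU consumption.
In a first aspect, an embodiment of this application provides a data processing method that can be applied to a device including an acceleration device and a processor. The processor and the acceleration device may be connected through PCIe and interact over PCIe. In this method, the processor may send a data processing request to the acceleration device; the data processing request is used to implement format conversion of a first data set in a database, the first data set including multiple pieces of data. After obtaining the data processing request, the acceleration device may obtain the first data set according to the request. The acceleration device may perform format conversion on the first data set, converting the first data set stored in a first manner into a second data set stored in a second manner. The acceleration device may also store the second data set in a target storage space. The second data set includes at least one piece of data, and the second manner differs from the first manner.
With this method, the device can convert data sets, that is, the device supports two different data storage formats, both row storage and column storage, so the device suits both OLTP and OLAP scenarios. Inside the device, the conversion is performed by the acceleration device rather than by the processor, which substantially reduces the load on the processor, preserves the data processing efficiency of the processor, and also improves format conversion efficiency.
In a possible implementation, the first manner and the second manner are each row storage or column storage. Row storage indicates that data is stored in the database row by row, and column storage indicates that data is stored in the database column by column.
With this method, the acceleration device can convert a row-stored first data set into a column-stored second data set so that the second data set can be used in OLAP scenarios, and can also convert a column-stored first data set into a row-stored second data set so that the second data set can be used in OLTP scenarios.
In a possible implementation, when the first manner is row storage and the second manner is column storage, the acceleration device may use different conversion approaches for different types of fields during format conversion. The conversion of fixed-length fields and variable-length fields is described below as an example:
1. Format conversion of a fixed-length field.
The acceleration device may obtain each piece of data under a fixed-length field in the first data set and arrange the data contiguously to generate the second data set. The second data set further includes null indication information, which indicates whether a piece of data under the fixed-length field is null or non-null.
2. Format conversion of a variable-length field.
The acceleration device obtains each piece of data under a variable-length field in the first data set and arranges the data contiguously to generate the second data set. The second data set further includes position indication information, which indicates the position in the second data set of each piece of data under the variable-length field.
With this method, the acceleration device converts different fields in different ways, so that the converted second data set clearly and accurately records the data together with its null indication information or position indication information, which ensures that the format conversion is valid.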
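In a possible implementation, in addition to converting the storage manner, the acceleration device can also convert the data format, converting the data format required for storing data into the data format required for data calculation. The acceleration device may perform data format conversion on the data in the first data set to generate the second data set, where the data format of the data in the first data set is the format required for storing data, and the data format of the second data set is the format required by the processor for data calculation.
With this method, converting the first data set into the data format required by the processor for data calculation ensures that the processor can conveniently and quickly obtain the data it needs in subsequent calculation, which improves data calculation efficiency.
In a possible implementation, for data in the first data set whose data type is decimal, the acceleration device may perform some or all of the following operations during data format conversion:
Operation 1. The acceleration device obtains the data description information of the decimal-type data and uses the data description information as part of the second data set. The data description information includes sign, precision, and scale.
Operation 2. The acceleration device may perform a padding operation or a stripping operation on the decimal-type data according to the precision and the scale.
With this method, the acceleration device can make the data description information part of the second data set, so that the processor can obtain it later. The acceleration device adjusts the decimal-type data according to the data description information, so that the processor can later perform data calculation on the data.
In a possible implementation, for data in the first data set whose data type is date, the acceleration device may, during data format conversion, decompose the date-type data into multiple sub-data, where one sub-data represents one of year, month, and day, and the multiple sub-data are arranged contiguously in the second data set.
With this method, the acceleration device splits date-type data into sub-data that separately represent the year, month, and day, so that the processor can use the year, month, or day sub-data individually in data calculation.
In a possible implementation, the acceleration device is at least one of an SoC, an FPGA, a GPU, an ASIC, an AI chip, or a DPU.
With this method, the acceleration device can be implemented in diverse and flexible ways that suit different scenarios.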
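In a second aspect, an embodiment of this application further provides an acceleration device that has the function of implementing the behavior in the method examples of the first aspect. For the beneficial effects, refer to the description of the first aspect; details are not repeated here. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function. In a possible design, the structure of the acceleration device includes a request obtaining module, a data obtaining module, and a format conversion module. These modules can perform the corresponding functions in the method examples of the first aspect; for details, refer to the detailed description in the method examples, which is not repeated here.
In a third aspect, an embodiment of this application further provides an acceleration device that has the function of implementing the behavior in the method examples of the first aspect. For the beneficial effects, refer to the description of the first aspect; details are not repeated here. The structure of the device includes a processor and, optionally, a memory and a communication interface. The processor is configured to support the acceleration device in performing the corresponding functions in the method of the first aspect. The memory is coupled to the processor and stores the computer program instructions and data (such as the first data set or the second data set) necessary for the acceleration device. The communication interface is used to communicate with other devices, for example to receive a data processing request.
In a fourth aspect, an embodiment of this application further provides a computing device that includes an acceleration device and a processor, where the processor is configured to send a data processing request to the acceleration device. The acceleration device has the function of implementing the behavior in the method examples of the first aspect. For the beneficial effects, refer to the description of the first aspect; details are not repeated here.
In a fifth aspect, this application further provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to perform the method described in the first aspect and in each possible implementation of the first aspect.
In a sixth aspect, this application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method described in the first aspect and in each possible implementation of the first aspect.
In a seventh aspect, this application further provides a computer chip that is connected to a memory and is configured to read and execute a software program stored in the memory, so as to perform the method described in the first aspect and in each possible implementation of the first aspect.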
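Brief Description of the Drawings
FIG. 1 is a schematic architecture diagram of a system provided in this application;
FIG. 2 is a schematic structural diagram of a management device provided in this application;
FIG. 3A and FIG. 3B are schematic structural diagrams of a storage system provided in this application;
FIG. 4 is a schematic diagram of a data processing method provided in this application;
FIG. 5 is a schematic diagram of a first data set provided in this application;
FIG. 6 is a schematic diagram of one row of data in a first data set provided in this application;
FIG. 7 is a schematic diagram of a method for converting a fixed-length field in a first data set provided in this application;
FIG. 8 is a schematic diagram of a method for converting a variable-length field in a first data set provided in this application;
FIG. 9 is a schematic diagram of a first data set and a second data set provided in this application;
FIG. 10 is a schematic structural diagram of an acceleration device provided in this application.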
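Detailed Description
Before the data processing method provided in the embodiments of this application is described, the concepts involved in the embodiments are explained:
(1) Relational database.
A database can be understood as a form of storing a collection of data. The data in a database can be organized, described, and stored according to a specific data model.
A relational database is one kind of database. It uses the relational model to establish relationships among data and stores data based on those relationships. The relational model can be understood as a two-dimensional table model, and a relational database can be understood as a data organization made up of two-dimensional tables and the links between them.
In the relational model, a relation can be understood as a two-dimensional table. Each relation has a relation name, which is the name of the table. A table contains multiple tuples; each tuple can be understood as one row of the table and may also be called a record. An attribute is one column of the table and may also be called a field; the individual data in a column may be called the data under that field.
(2) Fixed-length field and variable-length field.
Classified by length, fields can be divided into fixed-length fields and variable-length fields. A fixed-length field is a field whose length is fixed; its length is usually recorded in the header of the two-dimensional table. A variable-length field is a field whose individual data differ in length, so the length of the field is not fixed.
(3) Character field, integer (int) field, decimal field, and date field.
Classified by the data type of the data in the field, fields may include character fields, integer fields, decimal fields, and date fields.
A character field is a field whose data are characters. An integer field is a field whose data are integers. A decimal field is a field whose data are exact numeric values that can be precise to a number of places after the decimal point. A date field is a field whose data indicate dates.
Character fields and date fields are fixed-length fields. Decimal fields and integer fields may be fixed-length fields or variable-length fields, depending on the relation.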
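FIG. 1 is a schematic architecture diagram of a management system provided in an embodiment of this application. The system includes a client 200 and a management device 100.
The client 200 is deployed on the user side. A user can use the client 200 to send a data request to the management device 100, for example a data request for data in the database, such as a data read request for reading data or a data write request for writing data. In an OLAP scenario, the user can use the client 200 to send the management device 100 a data request for the data of one or more columns in the database, for example a request to read the data in the first column.
The embodiments of this application do not limit the specific form of the client 200. For example, the client 200 may be a software program deployed on the user's local computing device (for example, a server, a computer, a laptop, or a mobile terminal) or on a dedicated computing device (for example, an offload card with computing capability). The software program may be a browser, an agent, or file analysis software. The user can connect to the management device 100 through the software program; for example, a network connection is established between the computing device on which the software program runs and the management device 100 over Ethernet or a wireless network (such as WIFI or 5th generation (5G) communication technology) to exchange information.
FIG. 2 is a schematic structural diagram of the management device 100. The management device 100 includes a bus 110, a processor 120, an acceleration device 130, a memory 140, a communication interface 150, and an external storage 160. The processor 120, the acceleration device 130, the memory 140, and the communication interface 150 communicate through the bus 110. The bus 110 may be a line based on peripheral component interconnect express (PCIe).
The processor 120 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), an artificial intelligence (AI) chip, a system on chip (SoC), a complex programmable logic device (CPLD), a graphics processing unit (GPU), or the like.
The memory 140 may include volatile memory, such as RAM or DRAM, may include non-volatile memory, such as storage class memory (SCM), or may include a combination of volatile and non-volatile memory.
The memory 140 may also contain an operating system and other software modules required by running processes. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like. The memory 140 may also store data of the database. For example, the data stored in the memory 140 may include the data most recently written to the database; when the amount of data in the memory 140 reaches a certain threshold, the processor 120 may store the data in the memory 140 to the external storage 160 for persistent storage. When data of the database needs to be read, the data read from the external storage 160 may first be stored in the memory 140; in other words, the data stored in the memory 140 may also include data read from the external storage 160.
The external storage 160, which may also be called auxiliary storage, may be non-volatile memory, such as read-only memory (ROM), a hard disk drive (HDD), or a solid state disk (SSD). The external storage 160 can be used to store data permanently.
The storage manner of the database data stored in the memory 140 and that of the database data stored in the external storage 160 may be the same or different. For example, the database data in both the memory 140 and the external storage 160 may be stored row by row or column by column. As another example, the database data in the memory 140 is stored row by row while the database data in the external storage 160 is stored column by column. As yet another example, the database data in the memory 140 is stored column by column while the database data in the external storage 160 is stored row by row.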
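For ease of description, the following embodiments take the case in which the acceleration device 130 performs the data processing method provided in the embodiments of this application as an example. Several scenarios in which the acceleration device 130 performs the method are described below.
Scenario 1: The storage manner of the database data stored in the memory 140 (which may also be called main memory) differs from that of the database data stored in the external storage 160 (which may also be called storage).
Because the two storage manners differ, the data to be migrated needs format conversion when data in the external storage 160 is migrated to the memory 140 or when data in the memory 140 is migrated to the external storage 160. The format conversion can be performed by the acceleration device 130, that is, the acceleration device 130 can perform the data processing method provided in the embodiments of this application. This is one scenario to which the method applies.
In this scenario, the migration in either direction may be driven by the processor 120. That is, when data in the external storage 160 needs to be migrated to the memory 140, the processor 120 may issue an instruction to the external storage 160 to obtain the data to be migrated. After obtaining the data from the external storage 160, the processor 120 may store it in the memory 140. The processor 120 may also send a data processing request to the acceleration device 130 to request format conversion of the data to be migrated, and the acceleration device 130 may perform the data processing method provided in the embodiments of this application.
When data in the memory 140 needs to be migrated to the external storage 160, the processor 120 may obtain the data to be migrated from the memory 140 and then issue an instruction to the external storage 160 to store that data. The processor 120 may also send a data processing request to the acceleration device 130 to request format conversion of the data to be migrated, and the acceleration device 130 may perform the data processing method provided in the embodiments of this application.
Scenario 2: The database data is stored in the management device 100 in a row-oriented format, and the data request sent by the client 200 requests the data of some columns.
In this scenario, when a data request from the client 200 reaches the management device 100, the communication interface 150 of the management device 100 receives the request and sends it to the processor 120. The processor 120 may first determine where the requested data is stored.
If the requested data is stored in the memory 140, the processor 120 could read it from the memory 140, but the database data is stored in a row-oriented format. The processor 120 may therefore send a data processing request to the acceleration device 130 to request format conversion of the requested data, and the acceleration device 130 may perform the data processing method provided in the embodiments of this application to convert the requested data into a column-oriented format. After the acceleration device 130 has converted the storage format of the requested data, the processor 120 reads the converted data from the memory 140.
If the requested data is stored in the external storage 160, the processor 120 may move the requested data from the external storage 160 to the memory 140. The processor 120 may send a data processing request to the acceleration device 130 to request format conversion of the requested data, and the acceleration device 130 may perform the data processing method provided in the embodiments of this application to convert the requested data into a column-oriented format. After the acceleration device 130 has converted the storage format of the requested data, the processor 120 reads the converted data from the memory 140.
Scenario 3: The database data is stored in the management device 100 in a column-oriented format, and the data request sent by the client 200 requests the data of some rows.
Scenario 3 is similar to Scenario 2. The difference is that Scenario 2 converts row-stored data into column-stored data, whereas Scenario 3 converts column-stored data into row-stored data. For the specific implementation, refer to the foregoing description; details are not repeated here.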
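In terms of hardware, the acceleration device 130 includes a processor 131 and a communication interface 132, which are connected through a bus. The processor 131 can interact with other components of the management device 100 (such as the processor 120) through the communication interface 132, for example to receive a data processing request.
The processor 131 is similar to the processor 120 and may be a CPU, an ASIC, an FPGA, an AI chip, an SoC, a CPLD, a GPU, or the like. The processor 131 in the acceleration device 130 may be deployed in the management device 100 as a coprocessor of the processor 120 and cooperate with the processor 120 in performing operations.
It should be noted that a separate memory 133 may be provided in the acceleration device 130. The memory 133 may store computer program instructions, may serve as a cache for the data before format conversion (such as the first data set in the embodiments of this application), and may also store the data after format conversion (such as the second data set in the embodiments of this application). In a possible case, the processor 120 and the processor 131 in the acceleration device 130 may share the memory 140, that is, the memory 140 can provide all or some of the functions of the memory 133. When the memory 140 provides all the functions of the memory 133, the acceleration device 130 no longer needs a separate memory 133.
The processor 131 may perform the data processing method provided in the embodiments of this application by invoking computer program instructions stored in the memory 133 or in the memory 140 (the latter when the acceleration device 130 has no memory 133, or when the memory 133 is not used to store computer program instructions and only serves as a cache); this applies, for example, when the processor 131 is a CPU, an AI chip, or a GPU. The processor 131 may also perform the method by running computer program instructions burned into the processor 131 or the processing logic of its hardware circuits; this applies, for example, when the processor 131 is an ASIC, an FPGA, an SoC, or a CPLD.
In the embodiments of this application, the management device 100 can be used to manage a database. For example, the management device 100 may be a node in a centralized storage system or a distributed storage system and can manage the database in that system.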
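FIG. 3A shows a storage system 300 provided in an embodiment of this application. The storage system is a centralized storage system, which is characterized by a unified entry point through which all data from external devices must pass. This entry point is the engine of the centralized storage system. The engine is the most central component of a centralized storage system, and many advanced functions of the storage system are implemented in it. To ensure engine reliability, multiple engines may be deployed. The system architecture shown in FIG. 3A takes an engine 310 as an example; the embodiments of this application do not limit the number of engines.
The engine 310 contains one or more controllers; FIG. 3A takes an engine 310 with two controllers as an example. A mirror channel exists between controller 0 and controller 1, and the two controllers back each other up: when controller 0 fails, controller 1 can take over its services, and when controller 1 fails, controller 0 can take over its services, which prevents a hardware failure from making the entire storage system 300 unavailable. When four controllers are deployed in the engine 310, a mirror channel exists between any two controllers, so any two controllers back each other up. Controller 0 can receive and process data requests. For example, when the data request is a data read request, controller 0 may read data from its local memory or from the hard disk 320 according to the request. If the processor 120 in controller 0 determines that format conversion is needed, it may send a data processing request to the acceleration device 130 in controller 0, triggering the acceleration device 130 to perform the data processing method provided in the embodiments of this application. Controller 0 may also return a data read response carrying the data that was read. As another example, when the data request is a data write request, controller 0 may write data to its local memory or to the hard disk 320 according to the request. If the processor 120 in controller 0 determines that format conversion is needed, it may send a data processing request to the acceleration device 130 in controller 0, triggering the acceleration device 130 to perform the data processing method provided in the embodiments of this application. Controller 0 may also return a data write response indicating that the data was written successfully.
In the embodiments of this application, the management device 100 may be controller 1 or controller 0 in the engine 310 of the system shown in FIG. 3A. For the structure of controller 1 or controller 0, refer to the structure of the management device 100 shown in FIG. 2; details are not repeated here.
FIG. 3A shows a centralized storage system in which disks and controllers are separated. In this system, the engine 310 may have no hard disk slots; the hard disks 320 are placed in a disk enclosure, and a back-end interface 116 communicates with the disk enclosure. The back-end interface 116 exists in the engine 310 in the form of an adapter card, and one engine 310 can use two or more back-end interfaces 116 at the same time to connect multiple disk enclosures. Alternatively, the adapter card may be integrated on the motherboard, in which case it can communicate with the processor 120 over the PCIe bus. In this system, the engine 310 may also have hard disk slots, with the hard disks 320 inserted directly into the slots.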
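FIG. 3B is a schematic architecture diagram of another storage system provided in an embodiment of this application. The storage system in FIG. 3B is a distributed storage system; the storage system 300 includes a compute node cluster and a storage node cluster. The compute node cluster includes one or more compute nodes 330 (FIG. 3B shows two compute nodes 330, but the number is not limited to two), and the compute nodes 330 can communicate with one another. A compute node 330 is a computing device, such as a server, a desktop computer, or a controller of a storage array.
In the embodiments of this application, the management device 100 may be a compute node 330 in the system shown in FIG. 3B. For the structure of the compute node 330, refer to the structure of the management device 100 shown in FIG. 2; details are not repeated here.
A compute node 330 can receive and process data requests. For example, when the data request is a data read request, the compute node 330 may read data from its local memory or from a storage node 340 in the storage node cluster according to the request. If the processor 120 in the compute node 330 determines that format conversion is needed, it may send a data processing request to the acceleration device 130 in the compute node 330, triggering the acceleration device 130 to perform the data processing method provided in the embodiments of this application. The compute node 330 may also return a data read response carrying the data that was read. As another example, when the data request is a data write request, the compute node 330 may write data to its local memory or to a storage node 340 in the storage node cluster according to the request. The processor 120 in the compute node 330 may send a data processing request to the acceleration device 130 in the compute node 330, triggering the acceleration device 130 to perform the data processing method provided in the embodiments of this application. The compute node 330 may also return a data write response indicating that the data was written successfully.
Any compute node 330 can access any storage node 340 in the storage node cluster over the network. The storage node cluster includes multiple storage nodes 340 (FIG. 3B shows three storage nodes 340, but the number is not limited to three). A storage node 340 may include one or more hard disks and is mainly used to store data, such as the data of the database. According to instructions from a compute node 330, it stores data locally or reads data locally and returns the data to the compute node.
The centralized and distributed storage systems mentioned above are only examples; the data processing method provided in the embodiments of this application also applies to other centralized and distributed storage systems.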
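The data processing method provided in the embodiments of this application is described below with reference to FIG. 4, taking the system and the management device 100 of FIG. 1 or FIG. 2 as an example. The method can be applied to the management device 100 and includes:
Step 401: When the processor 120 determines that data needs format conversion, it sends a data processing request to the acceleration device 130. The data processing request asks the acceleration device 130 to perform format conversion on a first data set in the database. The first data set includes at least one piece of data; for example, it may include data in fixed-length fields, data in variable-length fields, or both.
As described above, the processor 120 determines that format conversion is needed in the following two cases.
Case 1: The storage manner of the database data in the memory 140 differs from that of the database data in the external storage 160. When the processor 120 needs to migrate data from the external storage 160 to the memory 140, or from the memory 140 to the external storage 160, it determines that the data to be migrated needs format conversion. In this case, the first data set is the data to be migrated.
Case 2: The processor 120 receives a data request from the client 200 for data in the database, and the storage format required for the requested data is inconsistent with the format in which the data is stored in the management device 100. For example, the data request asks for the data of some columns, while the database data is stored in the management device 100 in a row-oriented format. As another example, the data request asks for the data of some rows, while the database data is stored in the management device 100 in a column-oriented format. The processor 120 then determines that the requested data needs format conversion. In this case, the first data set is the requested data.
In either case, after determining that the first data set needs format conversion, the processor 120 sends a data processing request to the acceleration device 130 to request format conversion of the first data set.
Step 402: After receiving the data processing request, the acceleration device 130 may first obtain the first data set.
For Case 1 in step 401, the processor 120 may send the acceleration device 130 the address in the memory 140 of the data to be migrated, and the acceleration device 130 may obtain the first data set from the memory 140 according to that address. The processor 120 may also notify the acceleration device 130 of the address in the memory 140 at which information related to the data is stored; this address may be one contiguous address range or a set of several non-contiguous address ranges. The information related to the data may describe the two-dimensional table; for example, it may be the information recorded in the table header, such as the type of each field in the table (fixed-length or variable-length), the length of each fixed-length field, and whether each field is allowed to be null. The acceleration device 130 may read the information related to the data from the memory 140 according to that address.
For Case 2 in step 401, if the requested data is already stored in the memory 140, the processor 120 may send the acceleration device 130 the address of the requested data in the memory 140, and the acceleration device 130 may obtain the first data set from the memory 140 according to that address. If the requested data is stored in the external storage 160, the processor 120 may migrate the requested data from the external storage 160 to the memory 140 and cache it there; the processor 120 may then send the acceleration device 130 the cache address of the requested data in the memory 140, and the acceleration device 130 may obtain the first data set from the memory 140 according to the cache address. The processor 120 may also notify the acceleration device 130 of the address in the memory 140 at which information related to the data is stored, and the acceleration device 130 may read that information from the memory 140 according to the address.
Step 403: The acceleration device 130 performs format conversion on the first data set, converting the first data set stored in the first manner into a second data set stored in the second manner. For example, the acceleration device 130 may convert a row-stored first data set into a column-stored second data set. As another example, the acceleration device 130 may convert a column-stored first data set into a row-stored second data set.
Step 404: The acceleration device 130 stores the second data set in the target storage space.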
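Steps 401 to 404 can be sketched as a small Python model. This is an illustrative sketch only, not the implementation described in this application: `DataProcessingRequest`, its `source_address` and `target_address` fields, the `Accelerator` class, and the dictionary that stands in for memory 140 are all hypothetical names, and the conversion is reduced to a plain row-to-column transpose.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataProcessingRequest:
    """Hypothetical request sent by the processor in step 401."""
    source_address: int   # where the first data set lives (assumed field)
    target_address: int   # target storage space for the result (assumed field)


class Accelerator:
    """Toy stand-in for the acceleration device; memory is a plain dict."""

    def __init__(self, memory: Dict[int, list]):
        self.memory = memory

    def handle(self, request: DataProcessingRequest) -> None:
        # Step 402: obtain the first data set using the address in the request.
        first_data_set = self.memory[request.source_address]
        # Step 403: convert row storage to column storage (a transpose).
        second_data_set: List[list] = [list(col) for col in zip(*first_data_set)]
        # Step 404: store the second data set in the target storage space.
        self.memory[request.target_address] = second_data_set


# Step 401: the processor decides conversion is needed and sends the request.
memory = {0x1000: [(1, "TOM", 5000), (2, "brand", 7200)]}
Accelerator(memory).handle(DataProcessingRequest(0x1000, 0x2000))
```

In this model the processor only builds the request; all copying and rearranging happens inside `Accelerator.handle`, which mirrors the division of work the steps describe.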
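Converting a row-stored data set into a column-stored data set and converting a column-stored data set into a row-stored data set are inverse processes. The format conversion is described here using the conversion of a row-stored first data set into a column-stored second data set as an example. The conversion of a column-stored data set into a row-stored data set can be obtained by reversing the row-to-column process and is not described again here.
In a row-stored first data set, the acceleration device 130 may perform different operations for different types of fields during format conversion. The operations performed by the acceleration device 130 for different fields are described below:
FIG. 5 is a schematic diagram of a concrete two-dimensional table provided in an embodiment of this application. The table has multiple columns; FIG. 5 takes N columns as an example. Each column is one field, and a field may be a fixed-length field or a variable-length field.
When the table is stored row by row, each row is stored in the format shown in FIG. 6. The data of each row consists of two parts: field description information and the data in each field.
The field description information includes variable-length field length information, null information, and control information. The variable-length field length information indicates the length of each variable-length field present in the row. The null information indicates whether each field in the row is null. The control information is information used inside the database for concurrency control, such as information related to the concurrent processing of insert, delete, query, and update operations on the database.
It should be noted that FIG. 6 only shows, by way of example, one storage format for row storage. The embodiments of this application apply equally to other row-oriented storage formats.
Next, the way the acceleration device converts data is explained further, taking fixed-length fields and variable-length fields as examples.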
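1) Format conversion of a fixed-length field.
When converting a fixed-length field, the acceleration device 130 may obtain each piece of data under the field and arrange the data contiguously, which converts the field to column storage.
For example, when converting a fixed-length field, the acceleration device 130 may read the null information corresponding to that field in each row and determine whether each piece of data under the field is null. For a piece of data under the field, if the null information indicates non-null, the data is recorded as is during format conversion. If the null information indicates null, the data is recorded with 0 bytes during format conversion to indicate that it is null. In the generated second data set, the data under the fixed-length field is arranged contiguously; that is, the data is written one item immediately after another, in order.
To indicate more directly whether each piece of data in the fixed-length field is null, the second data set may also include null indication information, which indicates whether each piece of data in the field is null.
FIG. 7 is a schematic diagram of format conversion of a fixed-length field provided in an embodiment of this application.
In the row-stored first data set (table A), the fixed-length field representing salary is converted into column-stored data by the acceleration device 130. The acceleration device 130 also generates an additional null flag field. The null flag field contains multiple null flag values, each corresponding to one piece of data in the fixed-length field and describing whether that data is null. For example, in FIG. 7, 0 denotes non-null and 1 denotes null.
When the first data set is converted into the second data set, the data in the fixed-length field is arranged contiguously in the second data set, that is, the storage addresses of the data are contiguous. The null flag values in the null flag field are also arranged contiguously in the second data set. This application does not limit the order of the fixed-length field data and the null flag values in the second data set. For example, the data of the fixed-length field may come first and the null flag values after it, or the null flag values may come first and the data of the fixed-length field after them.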
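The fixed-length conversion above can be illustrated with a short Python sketch. It is written under assumptions that the text does not fix: the field holds signed integers of a known byte width, values are serialized little-endian, and a null value is recorded as zero-valued bytes of the same width so that every slot keeps a constant stride. The function name `convert_fixed_field` is hypothetical.

```python
from typing import List, Optional, Tuple


def convert_fixed_field(values: List[Optional[int]], width: int) -> Tuple[bytes, bytes]:
    """Pack one fixed-length field into a contiguous column plus null flags.

    Returns (column_bytes, null_flags), where null_flags holds one byte per
    value: 0 for non-null and 1 for null, matching the convention of FIG. 7.
    """
    column = bytearray()
    null_flags = bytearray()
    for value in values:
        if value is None:
            column += bytes(width)  # assumed: null slot is zero-filled
            null_flags.append(1)
        else:
            column += value.to_bytes(width, "little", signed=True)
            null_flags.append(0)
    return bytes(column), bytes(null_flags)


# Salary values taken from three rows; the second row is null.
salary_column, salary_nulls = convert_fixed_field([5000, None, 7200], width=4)
```

Because every slot has the same width, the value at row `i` starts at byte `i * width`, and the separate flag array is what distinguishes a real zero from a null.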
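2) Format conversion of a variable-length field.
When converting a variable-length field, the acceleration device 130 may obtain each piece of data under the field and arrange the data contiguously, which converts the field to column storage. Because the length of each piece of data in a variable-length field is not fixed, the acceleration device 130 may also add description information that describes the length of each piece of data in the field or its position in the second data set.
For example, when converting a variable-length field, the acceleration device 130 may read, from the description information in each row, the length of the variable-length field and the corresponding null information, and determine the actual length of each piece of data under the field and whether the data is null. For a piece of data under the field, if the null information indicates non-null, the data is recorded as is during format conversion. If the null information indicates null, the data is recorded with 0 bytes during format conversion to indicate that it is null. In the generated second data set, the data under the variable-length field is arranged contiguously; that is, the data is written one item immediately after another, in order, and the storage addresses of the data are contiguous.
Because the pieces of data in the variable-length field differ in length, the acceleration device 130 may also generate position information when converting the field. The position information indicates the position in the second data set of each piece of data under the field. The embodiments of this application do not limit how the position information indicates these positions. For example, for any piece of data in the variable-length field, the position information may be its offset from the first piece of data under the field (for example, from the first byte of the first piece of data); it may also be the offset of the data from the preceding piece of data (from its last byte), in which case the offset can also be understood as the length of the data.
FIG. 8 is a schematic diagram of format conversion of a variable-length field provided in an embodiment of this application.
In the row-stored first data set (table A), the variable-length field representing name is converted into column-stored data by the acceleration device 130. The acceleration device 130 also generates an additional offset field. The offset field contains multiple offset values, each corresponding to one piece of data in the variable-length field and describing the offset of that data from the preceding piece of data. For example, in FIG. 8, the first piece of data under the field, TOM, has an offset value of 3; because it has no preceding data, the offset is measured from its own first byte and is 3 bytes. The second piece of data under the field, brand, has an offset value of 5, meaning an offset of 5 bytes from the last byte of the preceding (first) piece of data.
When the first data set is converted into the second data set, the data in the variable-length field is arranged contiguously in the second data set, that is, the storage addresses of the data are contiguous. The offset values in the offset field are also arranged contiguously in the second data set. This application does not limit the order of the variable-length field data and the offset values in the second data set. For example, the data of the variable-length field may come first and the offset values after it, or the offset values may come first and the data of the variable-length field after them.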
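A minimal Python sketch of the variable-length conversion follows, using the per-item-length form of the offset shown in FIG. 8 (TOM has offset 3, brand has offset 5). The function names, the UTF-8 encoding, and the choice to give a null value length 0 are assumptions made for illustration; the text leaves these details open.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def convert_variable_field(values: List[Optional[str]]) -> Tuple[bytes, List[int]]:
    """Concatenate the values of one variable-length field and record offsets.

    Each offset is the distance from the end of the previous value, which is
    the same as the length of the value in bytes.
    """
    data = bytearray()
    offsets: List[int] = []
    for value in values:
        encoded = b"" if value is None else value.encode("utf-8")
        data += encoded                # values are written back to back
        offsets.append(len(encoded))
    return bytes(data), offsets


def read_value(data: bytes, offsets: List[int], index: int) -> str:
    """Recover one value by turning per-item offsets into start positions."""
    starts = [0] + list(accumulate(offsets))
    return data[starts[index]:starts[index + 1]].decode("utf-8")


name_column, name_offsets = convert_variable_field(["TOM", "brand", None])
```

The running sum in `read_value` is the other form of position information mentioned above, the offset of each value from the first byte of the column. A length of 0 cannot by itself distinguish a null from an empty string, so a real layout would likely keep null flags as well.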
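FIG. 9 shows the concrete formats of the first data set and the second data set after the row-stored first data set is converted into the column-stored second data set. As FIG. 9 shows, going from the first data set to the second data set amounts to a transpose: the characters and values in the data set are unchanged, and a column of the first data set becomes a row of the second data set. "Concrete" here refers to the way the data is composed as the storage manner of the data set can depict it. In actual storage, the data in each row of the first data set is stored contiguously, and the data in each row of the second data set is stored contiguously.
The operations performed by the acceleration device 130 when converting different fields have been described above. As a possible implementation, in addition to the format conversion explained above, the acceleration device can also convert the data format. For some data types in the data set, the data format needed for data calculation differs from the data format needed for storing the data. When performing format conversion on the first data set, the acceleration device 130 may also perform data format conversion on the data in the fields, converting the data format required for storage into the data format required for data calculation. For example, in FIG. 8 above, the data format of the first data set before conversion is the format required for storing the data, and the data format of the second data set is the format required for data calculation.
For example, in a decimal field the data type is decimal, which can be precise to a number of places after the decimal point; how many places depends on the data itself. Data calculation may need some data description information about the decimal data, and, to simplify calculation, may also require that the overall length of the decimal data and its number of digits after the decimal point match the corresponding data description information. The acceleration device 130 may perform some or all of the following operations:
Operation 1: The acceleration device 130 obtains the data description information corresponding to each piece of data in such a decimal field and makes it part of the second data set.
Each piece of data in such a decimal field has corresponding data description information. The data description information describes attributes of the data itself, such as its sign, precision, and scale. The data description information may be stored together with the data in the first data set or separately from it.
The sign indicates whether the decimal data is positive or negative. The precision indicates the overall length of the decimal data. The scale indicates the number of digits after the decimal point.
For example, for the data -3.01456 in a decimal field, the data description information may indicate that the sign of -3.01456 is negative, that its overall length is 6 digits, and that it has 5 digits after the decimal point.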
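The sign, precision, and scale in the example above can be derived with Python's standard `decimal` module. This is only a sketch of what the data description information contains; the function name `describe_decimal` and the dictionary layout are assumptions, and in a real database this information would normally come from the table metadata rather than be recomputed from each value.

```python
from decimal import Decimal
from typing import Dict, Union


def describe_decimal(text: str) -> Dict[str, Union[str, int]]:
    """Return sign, precision (total digits) and scale (digits after the point)."""
    sign, digits, exponent = Decimal(text).as_tuple()
    return {
        "sign": "-" if sign else "+",
        "precision": len(digits),       # overall number of digits
        "scale": max(-exponent, 0),     # digits after the decimal point
    }


description = describe_decimal("-3.01456")
```

For -3.01456 this yields a negative sign, a precision of 6, and a scale of 5, matching the values given in the text.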
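When processing data in a decimal field, the processor 120 usually needs the data description information in order to calculate or process the data. The embodiments of this application do not limit how the processor 120 obtains the data description information; the way it is obtained depends on how the database stores it, and different databases may obtain it in different ways.
When converting the decimal field, the acceleration device 130 may also obtain the data description information corresponding to each piece of data in the field and make it part of the second data set. The data description information corresponding to the individual data may also be arranged contiguously in the second data set.
Operation 2: The acceleration device 130 performs a padding operation on each piece of data in such a decimal field.
In some storage scenarios, to save storage space, meaningless zeros at the head or tail of the data are removed before the data in such a decimal field is stored. "Meaningless" here means having no effect on the numeric value of the data.
Therefore, the data in such a decimal field in the first data set may be data from which zeros have been removed. To restore the original data, the acceleration device 130 may pad the data, that is, add zeros, according to the data description information (such as the precision and the scale).
The acceleration device 130 may add zeros after the data so that the number of digits after the decimal point meets the scale requirement; it may also add zeros before the data so that the overall number of digits meets the precision requirement.
For example, suppose an original piece of data in a decimal field is 0012.456123000. When storing it, the processor 120 may remove the two meaningless zeros at the head and the three meaningless zeros at the tail, so that the data becomes 12.456123 in the stored first data set. After obtaining 12.456123, the acceleration device 130 may pad it: when the scale in the data description information indicates 9 digits after the decimal point and the precision indicates an overall length of 13, the acceleration device 130 may add two zeros at the head of 12.456123 and three zeros at the tail.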
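The padding example above can be reproduced with a few lines of Python. The sketch works on the textual form of the number purely for readability; the function name `pad_decimal` is hypothetical, and an actual device would more likely operate on a packed binary representation.

```python
def pad_decimal(text: str, precision: int, scale: int) -> str:
    """Add zeros so the value has `scale` fractional digits and `precision` digits in total."""
    sign = ""
    if text[0] in "+-":
        sign, text = text[0], text[1:]
    int_part, _, frac_part = text.partition(".")
    frac_part = frac_part.ljust(scale, "0")            # zeros after the data
    int_part = int_part.rjust(precision - scale, "0")  # zeros before the data
    return f"{sign}{int_part}.{frac_part}" if scale > 0 else f"{sign}{int_part}"


padded = pad_decimal("12.456123", precision=13, scale=9)
```

With a precision of 13 and a scale of 9, the stored value 12.456123 is restored to 0012.456123000, as in the text.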
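Operation 3: The acceleration device 130 performs a stripping operation on each piece of data in such a decimal field.
In other storage scenarios, to keep the storage space occupied by each piece of data in such a decimal field roughly equal, meaningless zeros are added at the head or tail of the data before it is stored. "Meaningless" here means having no effect on the numeric value of the data. Zeros are only an example; in practice, other values may be used as padding.
The acceleration device 130 may remove zeros after the data so that the number of digits after the decimal point meets the scale requirement; it may also remove zeros before the data so that the overall number of digits meets the precision requirement.
Therefore, the data in such a decimal field in the first data set is data to which zeros have been added. To restore the original data, the acceleration device 130 may strip the data according to the data description information (such as the precision and the scale), for example by removing the meaningless zeros or other values.
For example, suppose an original piece of data in a decimal field is 12.456123. When storing it, the processor 120 may, to make every piece of data in the field 13 digits long, add three meaningless zeros at the head and two meaningless zeros at the tail, so that the data becomes 00012.45612300 in the stored first data set. After obtaining 00012.45612300, the acceleration device 130 may strip it: when the scale in the data description information indicates 6 digits after the decimal point and the precision indicates an overall length of 8, the acceleration device 130 may remove the three zeros at the head and the two zeros at the tail of 00012.45612300.
It should be noted that if meaningless zeros were added at the head or tail of the original data when it was stored, the acceleration device 130 may also ignore the data description information and directly remove the zeros at the head or tail of the data in the first data set.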
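The stripping example can be sketched the same way. The function name `strip_decimal` is hypothetical and the sketch again works on text for readability; it removes only zeros, and only until the precision and scale from the data description information are met.

```python
def strip_decimal(text: str, precision: int, scale: int) -> str:
    """Remove padding zeros until the value has `scale` fractional digits
    and `precision` digits in total."""
    sign = ""
    if text[0] in "+-":
        sign, text = text[0], text[1:]
    int_part, _, frac_part = text.partition(".")
    # Drop trailing zeros beyond the required number of fractional digits.
    while len(frac_part) > scale and frac_part.endswith("0"):
        frac_part = frac_part[:-1]
    # Drop leading zeros beyond the required number of integer digits.
    int_digits = precision - scale
    while len(int_part) > int_digits and int_part.startswith("0"):
        int_part = int_part[1:]
    return f"{sign}{int_part}.{frac_part}" if frac_part else f"{sign}{int_part}"


stripped = strip_decimal("00012.45612300", precision=8, scale=6)
```

Stopping at the declared precision and scale, rather than removing every zero, keeps significant zeros such as the trailing zero of a value declared with more fractional digits than it strictly needs.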
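As another example, the data in a date field represents a date, but when it is stored it is stored as numeric data. For example, the date June 2, 2021 is stored as the number 20210602. When processing data in such a date field, however, the processor 120 still needs to identify the parts of the data that represent the year, the month, and the day.
Therefore, when performing data format conversion on the data in a date field, the acceleration device 130 may convert the numeric data into date-type data. For a numeric value in a date field of the first data set, the acceleration device 130 may decompose the value into multiple sub-data, each representing the year, the month, or the day.
Taking the numeric value 20210602 again, the acceleration device 130 may split it into three numeric sub-data: 2021, 06, and 02.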
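The date decomposition can be sketched with integer division, assuming the stored value always has the YYYYMMDD form used in the example. The function names `split_date` and `split_date_column` are hypothetical; the column variant shows one possible way to keep the year, month, and day sub-data in separate contiguous lists.

```python
from typing import List, Tuple


def split_date(value: int) -> Tuple[int, int, int]:
    """Split a YYYYMMDD integer into (year, month, day)."""
    year, rest = divmod(value, 10000)
    month, day = divmod(rest, 100)
    return year, month, day


def split_date_column(values: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a whole date column into three columns of sub-data."""
    years, months, days = [], [], []
    for value in values:
        year, month, day = split_date(value)
        years.append(year)
        months.append(month)
        days.append(day)
    return years, months, days


parts = split_date(20210602)
```

After the split, a calculation that only needs the year or the month can read that sub-data directly instead of decoding the packed number each time.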
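After generating the second data set, the acceleration device 130 may store the second data set in the target storage space, which the processor 120 has requested for the second data set. When sending the data processing request, the processor 120 may also carry the address of the target storage space in the request, so that the acceleration device 130 can store the second data set in the target storage space after obtaining it. The target storage space may be storage space in the memory 140 (for example, in Scenario 1 when data in the external storage 160 is migrated to the memory 140, or in Scenario 2 or Scenario 3), or it may be storage space in the external storage 160 (for example, in Scenario 1 when data in the memory 140 is migrated to the external storage 160).
It should be noted that the acceleration device 130 may perform the format conversion of the first data set and the storing of the second data set in the target storage space at the same time. The acceleration device 130 may convert the first data set while storing the converted data in the target storage space (in which case the converted data is actually part of the second data set). Alternatively, the acceleration device 130 may convert the entire first data set to obtain the complete second data set and then perform step 404.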
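Converting and storing at the same time can be sketched with a Python generator that hands over each converted chunk as soon as it is ready. The chunking by row count, the function names `convert_in_chunks` and `store`, and the dictionary used as the target storage space are all assumptions made for illustration; the text does not say how the work is divided.

```python
from typing import Dict, Iterable, Iterator, List


def convert_in_chunks(rows: Iterable[dict], fields: List[str],
                      chunk_size: int) -> Iterator[Dict[str, list]]:
    """Yield column-oriented chunks, each covering up to `chunk_size` rows."""
    chunk: Dict[str, list] = {name: [] for name in fields}
    count = 0
    for row in rows:
        for name in fields:
            chunk[name].append(row[name])
        count += 1
        if count == chunk_size:
            yield chunk                      # part of the second data set
            chunk = {name: [] for name in fields}
            count = 0
    if count:
        yield chunk                          # final, possibly shorter chunk


def store(chunks: Iterable[Dict[str, list]], target: Dict[str, list]) -> None:
    """Append each converted chunk to the target storage space as it arrives."""
    for chunk in chunks:
        for name, values in chunk.items():
            target.setdefault(name, []).extend(values)


rows = [{"id": 1, "name": "TOM"}, {"id": 2, "name": "brand"}, {"id": 3, "name": "amy"}]
target_space: Dict[str, list] = {}
store(convert_in_chunks(rows, ["id", "name"], chunk_size=2), target_space)
```

Because `store` consumes the generator lazily, the first chunk is written before the last rows have been converted; collecting all chunks first and then writing corresponds to the alternative of finishing the conversion before step 404.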
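As another possible implementation, the data processing method provided in the embodiments of this application also applies to other data collections built from two-dimensional tables. For example, the acceleration device 130 may be deployed in a smart drive (SSD) to provide the format conversion function, and may be deployed in the controller of the smart drive. When the smart drive receives a data storage instruction indicating that a first data set needs to be stored, the acceleration device 130 may, according to service requirements, perform format conversion on the first data set to generate a second data set and store the second data set in the storage space of the smart drive. When the smart drive receives a data read instruction indicating that a first data set needs to be read, the acceleration device 130 reads the first data set from the storage space of the smart drive, performs format conversion on it according to service requirements to generate a second data set, and returns the second data set.
As another possible implementation, in a big data scenario, the acceleration device 130 may also be deployed in a management device 100 that manages big data, to convert the format of data in that scenario.
Based on the same inventive concept as the method embodiments, an embodiment of this application further provides an acceleration device that is used to perform the method performed by the acceleration device in the method embodiment shown in FIG. 4. For related features, refer to the method embodiment above; details are not repeated here. As shown in FIG. 10, the acceleration device 1000 includes a request obtaining module 1001, a data obtaining module 1002, and a format conversion module 1003.
The request obtaining module 1001 is configured to obtain a data processing request from a processor, where the data processing request is used to implement format conversion of a first data set in a database, and the first data set includes at least one piece of data.
The data obtaining module 1002 is configured to obtain the first data set according to the data processing request, where the first data set is stored in a first manner.
The format conversion module 1003 is configured to perform format conversion on the first data set according to a second manner to obtain a second data set and to store the second data set in a target storage space, where the second data set is stored in the second manner, the second data set includes at least one piece of data, and the second manner differs from the first manner.
It should be understood that the device 1000 in the embodiments of this application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the method shown in FIG. 4 is implemented by software, the device 1000 and its modules may also be software modules.
In a possible implementation, the first manner may be row storage and the second manner column storage, or the first manner may be column storage and the second manner row storage. Row storage indicates that data is stored in the database row by row, and column storage indicates that data is stored in the database column by column.
In a possible implementation, when the first manner is row storage and the second manner is column storage, for a fixed-length field in the first data set, the format conversion module 1003 may obtain each piece of data under the fixed-length field in the first data set and arrange the data contiguously to generate the second data set. The second data set further includes null indication information, which indicates whether a piece of data under the fixed-length field is null or non-null.
In a possible implementation, when the first manner is row storage and the second manner is column storage, for a variable-length field in the first data set, the format conversion module 1003 may obtain each piece of data under the variable-length field in the first data set and arrange the data contiguously to generate the second data set. The second data set further includes position indication information, which indicates the position in the second data set of each piece of data under the variable-length field.
In a possible implementation, the format conversion module 1003 may also perform data format conversion on the data in the first data set to generate the second data set, where the data format of the data in the first data set is the format required for storing data, and the data format of the second data set is the format required by the processor for data calculation.
In a possible implementation, when the first data set includes data whose data type is decimal, the format conversion module 1003 may, during data format conversion, obtain the data description information of the decimal-type data and use it as part of the second data set, where the data description information includes sign, precision, and scale; it may also perform a padding operation or a stripping operation on the decimal-type data according to the precision and the scale.
In a possible implementation, when the first data set includes data whose data type is date, the format conversion module 1003 may, during data format conversion, decompose the date-type data into multiple sub-data, where one sub-data represents one of year, month, and day, and the multiple sub-data are arranged contiguously in the second data set.
The device 1000 according to the embodiments of this application may correspond to performing the method described in the embodiments of this application, and the above and other operations and/or functions of the units in the device 1000 respectively implement the corresponding flows of the method in FIG. 4. For brevity, details are not repeated here.
As a possible embodiment, this application further provides an acceleration device 130 as shown in FIG. 2, which is used to implement the corresponding flows of the method shown in FIG. 4. For brevity, details are not repeated here.
As another possible embodiment, this application further provides a management device that includes an acceleration device 130, which is used to implement the corresponding flows of the method shown in FIG. 4. For brevity, details are not repeated here.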
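It should be noted that the division into modules in the embodiments of this application is illustrative and is only a division by logical function; other divisions are possible in an actual implementation. The functional modules in the embodiments of this application may be integrated into one processing module, may each exist physically on their own, or two or more of them may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer program instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or a data center, that contains one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Obviously, those skilled in the art can make various changes and variations to this application without departing from its scope. If these modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is intended to include them as well.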

Claims (17)

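  1. A data processing method, characterized in that the method comprises:
    an acceleration device obtaining a data processing request from a processor, wherein the acceleration device and the processor are disposed in a first device, the processor and the acceleration device are connected through peripheral component interconnect express (PCIe), the data processing request is used to implement format conversion of a first data set in a database, and the first data set includes at least one piece of data;
    the acceleration device obtaining the first data set according to the data processing request, the first data set being stored in a first manner;
    the acceleration device performing format conversion on the first data set according to a second manner to obtain a second data set, and storing the second data set in a target storage space, wherein the second data set is stored in the second manner, the second data set includes at least one piece of data, and the second manner differs from the first manner.
  2. The method according to claim 1, characterized in that the first manner and the second manner are each row storage or column storage, the row storage indicating that data is stored in the database row by row, and the column storage indicating that data is stored in the database column by column.
  3. The method according to claim 2, characterized in that, when the first manner is row storage and the second manner is column storage, the acceleration device converting the first data set according to the second format to obtain the second data set comprises:
    the acceleration device obtaining each piece of data under a fixed-length field in the first data set and arranging the data contiguously to generate the second data set, wherein the second data set further includes null indication information, and the null indication information is used to indicate whether data under the fixed-length field is null or non-null.
  4. The method according to claim 2, characterized in that, when the first manner is row storage and the second manner is column storage, the acceleration device converting the first data set according to the second format to obtain the second data set comprises:
    the acceleration device obtaining each piece of data of a variable-length field in the first data set and arranging the data contiguously to generate the second data set, wherein the second data set further includes position indication information, and the position indication information is used to indicate the position in the second data set of each piece of data under the variable-length field.
  5. The method according to claim 3 or 4, characterized in that the method further comprises:
    the acceleration device performing data format conversion on the data in the first data set to generate the second data set, wherein the data format of the data in the first data set is the data format required for storing data, and the data format of the second data set is the data format required by the processor for data calculation.
  6. The method according to claim 5, characterized in that the first data set includes data whose data type is decimal, and the method further comprises:
    the acceleration device obtaining data description information of the decimal-type data and using the data description information as part of the second data, the data description information including sign, precision, and scale;
    the acceleration device performing a padding operation or a stripping operation on the decimal-type data according to the precision and the scale.
  7. The method according to claim 5, characterized in that the first data set includes data whose data type is date, and the method further comprises:
    the acceleration device decomposing the date-type data to obtain multiple sub-data, one of the sub-data representing one of year, month, and day, and the multiple sub-data being arranged contiguously in the second data.
  8. The method according to any one of claims 1 to 7, characterized in that the acceleration device is at least one of a system on chip (SoC), a field programmable gate array (FPGA), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), an artificial intelligence (AI) chip, or a data processing unit (DPU).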
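  9. An acceleration device, characterized in that the acceleration device and a processor are deployed in a first device, the processor and the acceleration device are connected through peripheral component interconnect express (PCIe), and the acceleration device comprises a request obtaining module, a data obtaining module, and a format conversion module:
    the request obtaining module is configured to obtain a data processing request from the processor, wherein the data processing request is used to implement format conversion of a first data set in a database, and the first data set includes at least one piece of data;
    the data obtaining module is configured to obtain the first data set according to the data processing request, the first data set being stored in a first manner;
    the format conversion module is configured to perform format conversion on the first data set according to a second manner to obtain a second data set, and to store the second data set in a target storage space, wherein the second data set is stored in the second manner, the second data set includes at least one piece of data, and the second manner differs from the first manner.
  10. The device according to claim 9, characterized in that the first manner and the second manner are each row storage or column storage, the row storage indicating that data is stored in the database row by row, and the column storage indicating that data is stored in the database column by column.
  11. The device according to claim 10, characterized in that, when the first manner is row storage and the second manner is column storage, the format conversion module, in converting the first data set according to the second format to obtain the second data set, is specifically configured to:
    obtain each piece of data under a fixed-length field in the first data set and arrange the data contiguously to generate the second data set, wherein the second data set further includes null indication information, and the null indication information is used to indicate whether data under the fixed-length field is null or non-null.
  12. The device according to claim 10, characterized in that, when the first manner is row storage and the second manner is column storage, the format conversion module, in converting the first data set according to the second format to obtain the second data set, is specifically configured to:
    obtain each piece of data of a variable-length field in the first data set and arrange the data contiguously to generate the second data set, wherein the second data set further includes position indication information, and the position indication information is used to indicate the position in the second data set of each piece of data under the variable-length field.
  13. The device according to claim 11 or 12, characterized in that the format conversion module is further configured to:
    perform data format conversion on the data in the first data set to generate the second data set, wherein the data format of the data in the first data set is the data format required for storing data, and the data format of the second data set is the data format required by the processor for data calculation.
  14. The device according to claim 13, characterized in that the first data set includes data whose data type is decimal, and the format conversion module is specifically configured to:
    obtain data description information of the decimal-type data and use the data description information as part of the second data, the data description information including sign, precision, and scale;
    perform a padding operation or a stripping operation on the decimal-type data according to the precision and the scale.
  15. The device according to claim 13, characterized in that the first data includes data whose data type is date, and the format conversion module is specifically configured to:
    decompose the date-type data to obtain multiple sub-data, one of the sub-data representing one of year, month, and day, and the multiple sub-data being arranged contiguously in the second data.
  16. An acceleration device, characterized in that the acceleration device comprises a processor, and the processor is configured to perform the method according to any one of claims 1 to 8.
  17. A computing device, characterized in that the computing device comprises an acceleration device and a processor;
    the processor is configured to send a data processing request to the acceleration device, the data processing request being used to implement format conversion of a first data set in a database;
    the acceleration device is configured to perform the method according to any one of claims 1 to 8.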
PCT/CN2022/084919 2021-06-11 2022-04-01 一种数据处理方法、装置以及设备 WO2022257575A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110653902.5 2021-06-11
CN202110653902.5A CN115470235A (zh) 2021-06-11 2021-06-11 一种数据处理方法、装置以及设备

Publications (1)

Publication Number Publication Date
WO2022257575A1 true WO2022257575A1 (zh) 2022-12-15

Family

ID=84363328

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084919 WO2022257575A1 (zh) 2021-06-11 2022-04-01 一种数据处理方法、装置以及设备

Country Status (2)

Country Link
CN (1) CN115470235A (zh)
WO (1) WO2022257575A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117234706A (zh) * 2023-08-30 2023-12-15 中科驭数(北京)科技有限公司 Numeric数据类型转换方法、装置和加速卡

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345518A (zh) * 2013-07-11 2013-10-09 清华大学 基于数据块的自适应数据存储管理方法及系统
WO2016194401A1 (ja) * 2015-06-05 2016-12-08 株式会社日立製作所 計算機、データベース処理方法、及び集積回路
CN105378716B (zh) * 2014-03-18 2019-03-26 华为技术有限公司 一种数据存储格式的转换方法及装置
CN110990402A (zh) * 2019-11-26 2020-04-10 中科驭数(北京)科技有限公司 由行存储到列存储的格式转化方法、查询方法及装置

Also Published As

Publication number Publication date
CN115470235A (zh) 2022-12-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22819180

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE