WO2021120952A1 - Column-oriented storage method, apparatus, device, and computer-readable storage medium - Google Patents

Column-oriented storage method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2021120952A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
data
column
stored
metadata
Prior art date
Application number
PCT/CN2020/129253
Inventor
黄启军
黄铭毅
李诗琦
刘玉德
陈天健
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021120952A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • This application relates to the field of data processing technology, and in particular to a column-oriented storage method, apparatus, device, and computer-readable storage medium.
  • In row-based storage, data is stored with the row as the basic logical storage unit, and the data in a row is stored contiguously in the storage medium.
  • Column-based storage is defined relative to row-based storage.
  • In column-based storage, data is stored with the column as the basic logical storage unit, and the data in a column is stored contiguously in the storage medium.
  • Columnar storage addresses the problem that row-based storage has to read redundant data during calculation, which results in low I/O performance.
  • However, existing columnar storage supports only the file storage structure and cannot support other storage structures such as key-value stores and queues, which severely limits its applications.
  • The main purpose of this application is to provide a column-oriented storage method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem that existing columnar storage supports only the file storage structure, which severely limits its applications.
  • the column-oriented storage method includes:
  • the present application also provides a column-oriented storage device, the column-oriented storage device includes:
  • An obtaining module which is used to obtain the storage structure type in the columnar storage requirement when the columnar storage requirement of the data to be stored is detected;
  • a generating module configured to generate metadata of the data to be stored according to the storage structure type, wherein the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
  • the storage module is used for columnar storage of the data to be stored according to the metadata.
  • the present application also provides a column-oriented storage device.
  • the column-oriented storage device includes a memory, a processor, and a column-oriented storage program that is stored on the memory and can run on the processor.
  • when the column-oriented storage program is executed by the processor, the steps of the column-oriented storage method described above are implemented.
  • the present application also provides a computer-readable storage medium on which a column-oriented storage program is stored; when the column-oriented storage program is executed by a processor, the steps of the column-oriented storage method described above are implemented.
  • In this application, in addition to the description information of the columnar storage format, the metadata also includes additional record information corresponding to the storage structure type, so that columnar storage can support different storage structures and support sequential or random reads and writes of columnar storage data at the storage batch, column, and cell level under those structures. This expands the application range of columnar storage, lets the application side implement more functions based on the other storage structures supported by the storage system, and in turn lets column-based storage bring its advantages over row-based storage to more application scenarios.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution of an embodiment of the present application
  • FIG. 3 is a schematic diagram of columnar storage of a file structure involved in an embodiment of this application.
  • FIG. 4 is a schematic diagram of columnar storage of a key value structure involved in an embodiment of this application.
  • FIG. 5 is a schematic functional block diagram of a preferred embodiment of a column-oriented storage device according to the present application.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution of an embodiment of the present application.
  • FIG. 1 can be a schematic structural diagram of a hardware operating environment of a column-oriented storage device.
  • the column-oriented storage device in the embodiment of the present application may be a PC, or a terminal device with a display function, such as a smart phone, a smart TV, a tablet computer, and a portable computer.
  • the column-oriented storage device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • the column-oriented storage device may also include a camera, RF (radio frequency) circuits, sensors, audio circuits, Wi-Fi modules, etc.
  • the structure shown in FIG. 1 does not constitute a limitation on the column-oriented storage device, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • a memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a column-oriented storage program.
  • the network interface 1004 is mainly used to connect to the back-end server and communicate with the back-end server;
  • the user interface 1003 is mainly used to connect to the client (user side) and communicate with the client.
  • the processor 1001 may be used to call the column-oriented storage program stored in the memory 1005 and perform the following operations:
  • the additional record information includes the start offset of each storage batch or each column in the description information
  • the step of performing columnar storage of the data to be stored according to the metadata includes:
  • the additional record information includes the key name corresponding to each storage batch or each column in the description information
  • the step of performing columnar storage of the data to be stored according to the metadata includes:
  • the additional record information includes the data length of each storage batch or each column in the description information
  • the step of performing columnar storage of the data to be stored according to the metadata includes:
  • the data to be stored is stored in columnar form according to the queue structure, wherein the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or memory.
  • the step of performing columnar storage of the data to be stored according to the metadata includes:
  • the processor 1001 may be further configured to call a column-oriented storage program stored in the memory 1005 to perform the following operations:
  • the data to be calculated is read from the columnar storage according to the read address, the data to be calculated is used as the parameter of the read calculation anonymous function, and the calculation logic of the read calculation anonymous function is executed to perform calculation processing on the data to be calculated and obtain the read result.
  • the processor 1001 may be further configured to call a column-oriented storage program stored in the memory 1005 to perform the following operations:
  • the cell data is sequentially read from the cells to be read corresponding to the column to be read to obtain virtual row data composed of each of the cell data.
  • the first embodiment of the present application provides a column-oriented storage method. It should be noted that although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one here.
  • the execution body of each embodiment of the column-oriented storage method of this application may be a terminal device such as a PC, smart phone, smart TV, tablet computer, or portable computer. For ease of description, the following embodiments are described with the storage system as the execution body.
  • the column-oriented storage method includes:
  • Step S10 when a columnar storage requirement of the data to be stored is detected, obtain the storage structure type in the columnar storage requirement;
  • the storage system can be a system program installed in a terminal device; it runs on top of an operating system and can exchange data with other application programs.
  • the application side can deliver various data storage requirements to the storage system, where the application side can be an application or a developer.
  • When the storage system detects a columnar storage requirement for the data to be stored, it obtains the storage structure type in the columnar storage requirement.
  • the columnar storage requirement may be transmitted by the application side; it includes the storage structure type, which indicates the storage structure according to which the data to be stored is to be stored in columnar form.
  • the storage structure types include, but are not limited to, file structures (or other similar block device storage structures), key-value structures, queue structures (or other similar streaming storage structures), and distributed structures.
  • Step S20 generating metadata of the data to be stored according to the storage structure type, wherein the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
  • the storage system generates metadata of the data to be stored according to the acquired storage structure type.
  • the metadata includes the description information of the columnar storage format, for example information describing how the data to be stored is divided into several storage batches, the columns each storage batch contains, the data type of each column, and column nesting.
  • the storage system can organize the data to be stored according to the preset columnar storage format or the columnar storage format requirements transmitted by the application party.
  • The columnar storage format can be as follows: each column has multiple cells, and each column can have sub-columns; every column (and every sub-column) has the same number of cells (i.e., the same number of rows); a column stores cells of the same type contiguously, and the data type can be a basic type such as int/float/double/long or a composite type such as array/list/dict; columns support nesting of one or more sub-columns to arbitrary depth, and columns and sub-columns are used in exactly the same way; a storage batch includes multiple columns, and the columns have the same structure in every storage batch.
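  • The format described above can be illustrated with a minimal sketch. It assumes the data to be stored arrives as row tuples; the names `to_batches` and `batch_size` are hypothetical and do not come from the application, and sub-column nesting is left out for brevity.

```python
from typing import Any, Dict, List, Sequence


def to_batches(rows: Sequence[Sequence[Any]], column_names: Sequence[str],
               batch_size: int) -> List[Dict[str, List[Any]]]:
    """Split row data into storage batches; each batch maps column name -> cells."""
    batches = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Transpose the chunk so that the cells of one column sit together.
        batch = {name: [row[i] for row in chunk]
                 for i, name in enumerate(column_names)}
        batches.append(batch)
    return batches


rows = [(1, 0.5), (2, 1.5), (3, 2.5)]
batches = to_batches(rows, ["id", "score"], batch_size=2)
```

  • Every batch carries the same column structure, and within a batch every column holds the same number of cells, which is the invariant the format relies on.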
  • the metadata generated by the storage system according to the acquired storage structure type also includes additional record information corresponding to the storage structure type, that is, different storage structure types have different additional record information.
  • the additional record information corresponding to the key value structure may include the key names of each storage batch and each column.
  • the additional record information is used to support sequential or random reads and writes of columnar storage data at the storage batch, column, and cell level under different storage structures, so that the storage system supports not only the file structure but also other storage structures. This expands the application scope of columnar storage and allows applications to implement more functions based on the other storage structures supported by the storage system, so that columnar storage can bring its advantages over row-based storage to more application scenarios.
  • Step S30 Columnar storage is performed on the data to be stored according to the metadata.
  • the data to be stored can be stored in a column based on the metadata. Specifically, the storage system divides the data to be stored into multiple storage batches or multiple columns according to the description information in the metadata, and stores multiple storage batches or multiple columns as corresponding storage structures according to additional record information. For example, when the storage structure type is a key-value structure, the additional record information includes the key name (Key) corresponding to each storage batch or column, and the storage system stores each storage batch or column in the value (Value) corresponding to the key name.
  • the metadata can be stored in the same storage structure as the data to be stored, or stored separately elsewhere, such as in a relational database. The metadata can be stored after the data to be stored, or before it. In the latter case, if some information in the metadata changes according to the actual storage situation once the data has been stored, the metadata can be updated. For example, when storing according to the file structure, the starting offset of a storage batch or column may be updated.
  • After storage, the data to be stored becomes columnar storage data held in external storage in the columnar storage format. Because the metadata corresponding to the columnar storage data contains, in addition to the description information of each storage batch and each column, the additional record information corresponding to the storage structure, the storage batches or columns can be indexed by the additional record information when the columnar storage data is read or written. For example, when storing according to the key-value structure, the additional record information records the key name corresponding to each storage batch or column, so the storage system can read and write the data in the corresponding Value by that key name; this enables sequential or random reads and writes under different storage structures.
  • In this embodiment, when a columnar storage requirement for the data to be stored is detected, the storage structure type in the requirement is obtained; the metadata of the data to be stored is generated according to the storage structure type; and the data to be stored is stored in columnar form according to the metadata. In addition to the description information of the columnar storage format, the metadata also includes additional record information corresponding to the storage structure type, so that columnar storage can support different storage structures and support sequential or random reads and writes of columnar storage data at the storage batch, column, and cell level under those structures.
  • the second embodiment of the present application provides a column-oriented storage method.
  • the additional record information includes the start offset of each storage batch or each column in the description information
  • the step S30 includes:
  • Step S301: store the data to be stored in columnar form according to the file structure and the metadata, wherein the metadata is stored in the file header, or the metadata is associated with the data to be stored through the file name and then stored in a relational database or memory.
  • the additional record information in the metadata generated by the storage system includes the start offset of each storage batch or each column.
  • the starting offset of a storage batch can be the offset of the batch's starting address relative to the starting address of the entire file, and the starting offset of a column can be the offset of the column's starting address relative to the starting address of its storage batch. For example, if storage batch 1 is stored at the top of the file, its starting offset is 0; if storage batch 2 is stored immediately after storage batch 1 and storage batch 1 occupies 100 address units, the starting offset of storage batch 2 is 100. If storage batch 1 has 4 columns and each column occupies 25 address units, the starting offset of the first column is 0, the starting offset of the second column is 25, and so on.
  • It should be noted that the feature of the file structure is that data is stored in blocks, and each block is stored contiguously in the storage space.
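  • The offset arithmetic in this example can be checked with a short sketch. The helper name `start_offsets` is hypothetical, and the length of storage batch 2 is an assumed value, since the text only gives the size of storage batch 1.

```python
def start_offsets(lengths):
    """Start offset of each item when items are laid out back to back."""
    offsets, position = [], 0
    for length in lengths:
        offsets.append(position)
        position += length
    return offsets


batch_lengths = [100, 100]          # batch 2 length is assumed for illustration
column_lengths = [25, 25, 25, 25]   # four columns in storage batch 1

batch_offsets = start_offsets(batch_lengths)    # relative to the start of the file
column_offsets = start_offsets(column_lengths)  # relative to the start of the batch
```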
  • the storage system stores the data to be stored in a column format according to the file structure according to the metadata. Specifically, the storage system continuously stores each storage batch or column of data to be stored in the form of data blocks according to the characteristics of the file structure.
  • the storage system can store the metadata in the file header. FIG. 3 is a schematic diagram of columnar storage with a file structure: the metadata is in the file header, and storage batch 1 and storage batch 2 are stored adjacently.
  • the storage system can also store the metadata elsewhere, such as in a relational database or in memory, and associate the metadata with the columnar storage data through the file name, so that when reading or writing data, the corresponding columnar storage data in external storage can be located through the metadata.
  • Here, a relational database is a database used to store relational data such as metadata. If the metadata is kept in memory, there is no need to copy it into memory when reading or writing columnar storage data; the in-memory metadata is used directly for indexing, which improves read and write efficiency.
  • When reading or writing columnar storage data of the file structure, the storage system first obtains the metadata and then, according to the starting offsets of the storage batches or columns in the metadata, performs sequential or random reads and writes on the storage batches or columns. For example, when the first column in the second storage batch needs to be read, the storage system looks up in the metadata the starting offset of the second storage batch and the starting offset of the first column within that batch, calculates the absolute starting address of that column from these offsets and the starting storage address of the columnar storage data, and uses the operating system's address-based read operation to read the first column of the second storage batch from external storage.
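  • A random read by offset under the file structure might look like the following sketch. Everything about the byte layout is an assumption made for illustration (a 4-byte header length, a JSON metadata header, 64-bit integer cells, batch offsets measured from the start of the data region); the function names `write_file` and `read_column` are hypothetical.

```python
import io
import json
import struct


def write_file(batches):
    """batches: list of batches, each a list of columns, each a list of ints."""
    data = bytearray()
    meta = {"batches": []}
    for batch in batches:
        batch_start = len(data)
        batch_meta = {"offset": batch_start, "columns": []}
        for cells in batch:
            # Column offset is recorded relative to the start of its batch.
            batch_meta["columns"].append(
                {"offset": len(data) - batch_start, "count": len(cells)})
            data += struct.pack(f"<{len(cells)}q", *cells)
        meta["batches"].append(batch_meta)
    header = json.dumps(meta).encode("utf-8")
    # Assumed layout: 4-byte header length, metadata header, column data.
    return struct.pack("<I", len(header)) + header + bytes(data)


def read_column(f, batch_idx, col_idx):
    """Seek straight to one column using the offsets kept in the header."""
    f.seek(0)
    (header_len,) = struct.unpack("<I", f.read(4))
    meta = json.loads(f.read(header_len))
    data_start = 4 + header_len
    batch = meta["batches"][batch_idx]
    col = batch["columns"][col_idx]
    f.seek(data_start + batch["offset"] + col["offset"])
    return list(struct.unpack(f"<{col['count']}q", f.read(8 * col["count"])))


blob = write_file([[[1, 2, 3], [10, 20, 30]], [[4, 5], [40, 50]]])
first_col_of_batch_2 = read_column(io.BytesIO(blob), 1, 0)
```

  • Only the header and the requested column are read; the other columns are skipped by the seek, which is the benefit the starting offsets provide.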
  • the additional record information includes the key name corresponding to each storage batch or each column in the description information, and the step S30 includes:
  • Step S302: store the data to be stored in columnar form according to the key-value structure and the metadata, wherein the metadata is stored as a key-value pair, or the metadata is associated with the data to be stored through a key name and then stored in a relational database or memory.
  • the additional record information in the metadata generated by the storage system includes the key name corresponding to each storage batch or each column.
  • the key-value structure is characterized in that data is stored in the form of key-value pairs, a key name corresponds to a value, and each key-value pair is not continuously stored in the storage space.
  • the storage system stores the data to be stored in a key-value structure according to the metadata. Specifically, the storage system stores each storage batch or column of the data to be stored in the form of key-value pairs according to the characteristics of the key-value structure.
  • FIG. 4 is a schematic diagram of columnar storage with a key-value structure.
  • the storage system divides the data to be stored into storage batch 1 and storage batch 2, each including two columns. The additional record information in the metadata is: storage batch 1, column 1 corresponds to K1; storage batch 1, column 2 corresponds to K2; storage batch 2, column 1 corresponds to K3; and storage batch 2, column 2 corresponds to K4, where K1 to K4 are key names. The storage system stores the data of each column in the values V1 to V4 corresponding to the key names.
  • the storage system can store the metadata in the form of a key-value pair, such as storing the metadata in the Value corresponding to K0.
  • the storage system can also store the metadata elsewhere, such as in a relational database or in memory, and associate the metadata with the columnar storage data through the key name, so that when reading or writing data, the corresponding columnar storage data in external storage can be located through the metadata.
  • When reading or writing columnar storage data of the key-value structure, the storage system first obtains the metadata and then, according to the key name recorded in the metadata for a storage batch or column, reads the data from the value corresponding to that key name, which realizes sequential or random reads and writes of the storage batches or columns. For example, when the first column in the second storage batch needs to be read, the storage system finds in the metadata that the key name of that column is K3 and reads the column's data from the value V3 corresponding to K3.
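  • The K0 to K4 example can be sketched with a plain Python dict standing in for the key-value store. The JSON encoding of values and the function names `store_kv` and `read_column` are assumptions for illustration only.

```python
import json


def store_kv(kv, batches):
    """Store each column under its own key and the metadata under 'K0'."""
    key_names = {}
    next_key = 1
    for batch_no, batch in enumerate(batches, start=1):
        for col_no, cells in enumerate(batch, start=1):
            key = f"K{next_key}"
            kv[key] = json.dumps(cells)
            # Additional record information: (batch, column) -> key name.
            key_names[f"{batch_no}-{col_no}"] = key
            next_key += 1
    kv["K0"] = json.dumps({"key_names": key_names})


def read_column(kv, batch_no, col_no):
    """Look up the key name in the metadata, then fetch the value."""
    meta = json.loads(kv["K0"])
    key = meta["key_names"][f"{batch_no}-{col_no}"]
    return key, json.loads(kv[key])


kv = {}
store_kv(kv, [[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
key, cells = read_column(kv, 2, 1)
```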
  • the additional record information includes the data length of each storage batch or each column in the description information
  • the step S30 includes:
  • Step S303: store the data to be stored in columnar form according to the queue structure and the metadata, wherein the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or memory.
  • the additional record information in the metadata generated by the storage system includes the data length of each storage batch or each column.
  • the characteristic of the queue structure is that the data is stored in the form of a queue.
  • the queue contains multiple elements. The elements are not stored contiguously in the storage space. The elements in the queue can only be read and written in sequence; random access to a designated element is not supported.
  • the storage system stores the data to be stored in a queue structure according to the metadata. Specifically, the storage system stores each storage batch or column of data to be stored in the form of queue elements according to the characteristics of the queue structure.
  • the storage system can store metadata as elements of the queue, such as the first element.
  • the storage system can also store the metadata elsewhere, such as in a relational database or in memory, and associate the metadata with the columnar storage data through the queue name, so that when reading or writing data, the corresponding columnar storage data in external storage can be located through the metadata.
  • When reading or writing columnar storage data of the queue structure, the storage system first obtains the metadata and then, according to the data length of each storage batch or column in the metadata, performs sequential reads and writes of the storage batches or columns. Because of the characteristics of the queue structure, random reads and writes of a designated storage batch or designated column are not supported for columnar storage data stored in the queue structure. For example, when the columnar storage data needs to be read, the storage system obtains the metadata. If the metadata is stored as the first element of the queue, the pointer of the first element points to the starting address of the next element. If, for example, the metadata records a data length of 100 address units for the first column of the first storage batch, the storage system starts from the starting address pointed to by the pointer and reads 100 address units of data, which is the first column of the first storage batch; and so on, the storage system sequentially reads each storage batch and column of the columnar storage data.
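  • Sequential reading under the queue structure can be sketched as follows, with `collections.deque` standing in for the queue. Treating each column as one queue element, encoding cells as 64-bit integers, and the names `enqueue_columns` and `read_all` are all assumptions made for illustration.

```python
import json
import struct
from collections import deque


def enqueue_columns(queue, batches):
    """Put the metadata first, then one element per column."""
    payloads, lengths = [], []
    for batch in batches:
        for cells in batch:
            payload = struct.pack(f"<{len(cells)}q", *cells)
            payloads.append(payload)
            lengths.append(len(payload))  # data length kept in the metadata
    queue.append(json.dumps({"lengths": lengths}).encode("utf-8"))
    queue.extend(payloads)


def read_all(queue):
    """Consume the queue front to back; there is no random access."""
    meta = json.loads(queue.popleft())
    columns = []
    for length in meta["lengths"]:
        payload = queue.popleft()
        if len(payload) != length:
            raise ValueError("element length does not match the metadata")
        columns.append(list(struct.unpack(f"<{length // 8}q", payload)))
    return columns


q = deque()
enqueue_columns(q, [[[1, 2, 3], [4, 5, 6]], [[7], [8]]])
columns = read_all(q)
```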
  • the additional record information in the metadata generated by the storage system includes the storage information of each storage batch or each column on each distributed machine, that is, which storage batches are recorded on which distributed storage machine.
  • the third embodiment of the present application provides a column-oriented storage method.
  • the step S30 includes:
  • Step S304 receiving the write calculation anonymous function passed by the application
  • the storage system can provide an anonymous function interface, and the application party will transfer the anonymous function written according to the anonymous function interface to the storage system.
  • the anonymous function implements various calculation and processing logics.
  • the storage system can call and execute the anonymous function. It realizes the calculation while writing or the calculation while reading without copying the data to the memory for calculation, which improves the efficiency of reading and writing and reduces the occupation of memory resources, thereby improving the storage performance of the storage system.
  • the application side can pass in the write calculation anonymous function, which can be used to perform calculation processing on the data to be stored and then return the calculation result for writing to the columnar storage, for example to calculate the average value of the data to be stored and write that average into the columnar storage.
  • various different calculation logics can be implemented by anonymous functions.
  • Step S305: traverse the metadata and process each column in the description information in turn, wherein the column index of the current column is used as the parameter of the write calculation anonymous function, the calculation logic of the write calculation anonymous function is executed to perform calculation processing on the data to be stored, and the return value of the write calculation anonymous function is written into the current column according to the column index.
  • the storage system traverses the metadata and processes each column in the description information in turn. For the current column, the storage system takes the column index of the current column as the parameter of the write calculation anonymous function and then executes the calculation logic of the function on the data to be stored. For example, if the return value of the write calculation anonymous function is a calculated average, the storage system writes that average into the current column according to the column index in the parameter.
  • the storage system can stop calling the write calculation anonymous function according to an additionally set total count or other conditions. It should be noted that the write calculation anonymous function can have multiple return values.
  • the storage system processes each column in turn and thereby completes the calculation while the data to be stored is being written.
  • the application side can also pass in an iterator, and the storage system calls the iterator to obtain one or more data items and writes them into the columnar storage until the iterator is exhausted.
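  • A minimal sketch of calculation while writing, assuming a Python lambda plays the role of the write calculation anonymous function. The names `write_with_compute`, `pending`, and the shape of `metadata` are hypothetical; an in-memory dict stands in for the columnar storage.

```python
from statistics import mean


def write_with_compute(metadata, write_fn):
    """Traverse the columns in the metadata and store what write_fn returns."""
    storage = {}
    for column_index, column_name in enumerate(metadata["columns"]):
        result = write_fn(column_index)
        # The function may return a single value or several values.
        if isinstance(result, (list, tuple)):
            storage[column_name] = list(result)
        else:
            storage[column_name] = [result]
    return storage


pending = [[1, 2, 3], [10, 20, 30]]   # data to be stored, one list per column
metadata = {"columns": ["a", "b"]}
stored = write_with_compute(metadata, lambda i: mean(pending[i]))
```

  • The storage side only knows the column index and the return value; the calculation logic itself stays with the caller, which is what lets different logic be swapped in without changing the storage code.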
  • the column-oriented storage method further includes:
  • Step S40 receiving the read calculation anonymous function and the read address passed in by the application
  • the application side can pass in the read calculation anonymous function and the read address.
  • the read calculation anonymous function may be used to read the data pointed to by the address for calculation processing, and then return the calculation result, for example, to calculate the average value of each cell data in a specified column in the columnar storage data.
  • Step S50: read the data to be calculated from the columnar storage according to the read address, use the data to be calculated as the parameter of the read calculation anonymous function, and execute the calculation logic of the read calculation anonymous function to perform calculation processing on the data to be calculated and obtain the read result.
  • the storage system reads the data to be calculated from the columnar storage according to the read address, uses the data to be calculated as the parameter of the read calculation anonymous function, executes the calculation logic of the function on the data to be calculated, and takes the calculation result as the read result.
  • For example, if the read calculation anonymous function calculates the average value of the cell data in each column of the columnar storage data, the storage system reads the data of one column, executes the read calculation anonymous function to calculate the average of that column's data, returns the average, and discards the column's data; it then reads the next column, calculates and returns its average, and so on until all columns have been read.
  • In this embodiment, the columnar storage data is calculated while it is read or written, so there is no need to copy the data into memory for calculation. This improves read and write efficiency and reduces the occupation of memory resources, thereby improving the storage performance of the storage system.
  • the fourth embodiment of the present application provides a column-oriented storage method.
  • the column-oriented storage method further includes:
  • Step S60: receiving the virtual row read requirement for the target columnar storage data passed in by the application side;
  • the application side can pass a virtual row read requirement for the target columnar storage data to the storage system, and the virtual row read requirement can specify the row index to be read.
  • Step S70: determining, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in the columns to be read;
  • the storage system determines, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in those columns. Specifically, the storage system can determine from the row index which columns the requested row involves; for example, if a row contains data for 5 column attributes, those 5 columns are the columns to be read. It then determines from the row index the position of each of the row's cells within its column; for example, if the row index is 2, the cells to be read are the second cell in each of the 5 columns.
  • Step S80: reading cell data in turn from the cells to be read corresponding to the columns to be read, and obtaining virtual row data composed of the cell data.
  • the storage system reads cell data in turn from the cells to be read corresponding to the columns to be read and obtains the virtual row data composed of that cell data, which completes the virtual row read. For example, the storage system reads the second cell of each of the 5 columns in turn, skipping the other cells, and thus reads 5 cells of data. Because the metadata records the information of each storage batch and each column, the read address of each cell to be read can be calculated from the metadata. This enables cell-level reads and therefore row-wise reads, avoiding the waste of IO resources that occurs when whole columns must be read to obtain a single row.
  • the column-oriented storage apparatus includes:
  • the obtaining module 10 is configured to obtain the storage structure type in the columnar storage requirement when the columnar storage requirement of the data to be stored is detected;
  • the generating module 20 is configured to generate metadata of the data to be stored according to the storage structure type, where the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
  • the storage module 30 is configured to perform columnar storage of the data to be stored according to the metadata.
  • when the storage structure type is a file structure, the additional record information includes the start offset of each storage batch or each column in the description information, and the storage module 30 includes:
  • the first storage unit, configured to perform columnar storage of the data to be stored in a file structure according to the metadata, wherein the metadata is stored in the file header, or the metadata is associated with the data to be stored through the file name and then stored in a relational database or memory.
  • when the storage structure type is a key-value structure, the additional record information includes the key name corresponding to each storage batch or each column in the description information, and the storage module 30 includes:
  • the second storage unit, configured to perform columnar storage of the data to be stored in a key-value structure according to the metadata, wherein the metadata is stored as a key-value pair, or the metadata is associated with the data to be stored through a key name and then stored in a relational database or memory.
  • when the storage structure type is a queue structure, the additional record information includes the data length of each storage batch or each column in the description information, and the storage module 30 includes:
  • the third storage unit, configured to perform columnar storage of the data to be stored in a queue structure according to the metadata, wherein the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or memory.
  • the storage module 30 includes:
  • the receiving unit, configured to receive the write calculation anonymous function passed in by the application side;
  • the traversal unit, configured to traverse the metadata and process each column in the description information in turn, wherein the column index of the current column is used as the parameter of the write calculation anonymous function, the calculation logic of the write calculation anonymous function is executed to perform calculation processing on the data to be stored, and the return value of the write calculation anonymous function is written into the current column according to the column index.
  • the column-oriented storage apparatus further includes:
  • the first receiving module, configured to receive the read calculation anonymous function and the read address passed in by the application side;
  • the execution module, configured to read the data to be calculated from the columnar storage according to the read address, use the data to be calculated as the parameter of the read calculation anonymous function, and execute the calculation logic of the read calculation anonymous function to perform calculation processing on the data to be calculated to obtain a read result.
  • the column-oriented storage apparatus further includes:
  • the second receiving module, configured to receive the virtual row read requirement for the target columnar storage data passed in by the application side;
  • the determining module, configured to determine, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in the columns to be read;
  • the reading module, configured to read cell data in turn from the cells to be read corresponding to the columns to be read, and obtain virtual row data composed of the cell data.
  • an embodiment of the present application also proposes a computer-readable storage medium that stores a column-oriented storage program, and the column-oriented storage program, when executed by a processor, implements the steps of the column-oriented storage method described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

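A column-oriented storage method, apparatus, device, and computer-readable storage medium. The method includes: when a columnar storage requirement for data to be stored is detected, obtaining the storage structure type in the columnar storage requirement (S10); generating metadata of the data to be stored according to the storage structure type, where the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type (S20); and performing columnar storage of the data to be stored according to the metadata (S30).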

Description

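Column-oriented storage method, apparatus, device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201911326804.X, filed on December 20, 2019 and entitled "Column-oriented storage method, apparatus, device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing technology, and in particular to a column-oriented storage method, apparatus, device, and computer-readable storage medium.
Background
In row-oriented storage, data is stored with the row as the basic logical storage unit, and the data of one row is stored contiguously on the storage medium. Column-oriented storage is defined by contrast: data is stored with the column as the basic logical storage unit, and the data of one column is stored contiguously on the storage medium. Columnar storage solves the problems of row-oriented storage, which must read superfluous data during computation and therefore has poor IO performance. However, existing columnar storage supports only the file storage structure and cannot support other storage structures such as key-value stores and queues, which severely limits where columnar storage can be applied.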
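Technical Solution
The main purpose of this application is to provide a column-oriented storage method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem that existing columnar storage supports only the file storage structure, which severely limits where columnar storage can be applied.
To achieve the above purpose, this application provides a column-oriented storage method, which includes:
when a columnar storage requirement for data to be stored is detected, obtaining the storage structure type in the columnar storage requirement;
generating metadata of the data to be stored according to the storage structure type, where the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
performing columnar storage of the data to be stored according to the metadata.
In addition, to achieve the above purpose, this application further provides a column-oriented storage apparatus, which includes:
an obtaining module, configured to obtain the storage structure type in the columnar storage requirement when a columnar storage requirement for data to be stored is detected;
a generating module, configured to generate metadata of the data to be stored according to the storage structure type, where the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
a storage module, configured to perform columnar storage of the data to be stored according to the metadata.
In addition, to achieve the above purpose, this application further provides a column-oriented storage device, which includes a memory, a processor, and a column-oriented storage program that is stored in the memory and can run on the processor; when executed by the processor, the column-oriented storage program implements the steps of the column-oriented storage method described above.
In addition, to achieve the above purpose, this application further provides a computer-readable storage medium on which a column-oriented storage program is stored; when executed by a processor, the column-oriented storage program implements the steps of the column-oriented storage method described above.
In this application, when a columnar storage requirement for data to be stored is detected, the storage structure type in the columnar storage requirement is obtained; metadata of the data to be stored is generated according to the storage structure type; and the data to be stored is stored in columnar form according to the metadata. In addition to the description information describing the columnar storage format, the metadata includes additional record information corresponding to the storage structure type, so that columnar storage can support different storage structures and support sequential or random reads and writes at the storage batch, column, and cell level of the columnar storage data under those structures. This widens the range of applications of columnar storage: the application side can implement more functions on top of the other storage structures supported by the storage system, so that columnar storage can deliver its advantages over row-oriented storage in more application scenarios.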
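Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of the first embodiment of the column-oriented storage method of this application;
FIG. 3 is a schematic diagram of columnar storage in a file structure involved in the embodiments of this application;
FIG. 4 is a schematic diagram of columnar storage in a key-value structure involved in the embodiments of this application;
FIG. 5 is a functional module diagram of a preferred embodiment of the column-oriented storage apparatus of this application.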
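Embodiments of the Invention
It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
An embodiment of this application provides a column-oriented storage device. Refer to FIG. 1, which is a schematic structural diagram of the hardware operating environment involved in the embodiments of this application.
It should be noted that FIG. 1 may serve as the schematic structural diagram of the hardware operating environment of the column-oriented storage device. The column-oriented storage device of the embodiments of this application may be a PC, or a terminal device with a display function such as a smartphone, smart TV, tablet computer, or portable computer.
As shown in FIG. 1, the column-oriented storage device may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage apparatus independent of the processor 1001.
In an embodiment, the column-oriented storage device may further include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and so on. Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the column-oriented storage device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a column-oriented storage program.
In the column-oriented storage device shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 may be used to call the column-oriented storage program stored in the memory 1005 and perform the following operations:
when a columnar storage requirement for data to be stored is detected, obtaining the storage structure type in the columnar storage requirement;
generating metadata of the data to be stored according to the storage structure type, where the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
performing columnar storage of the data to be stored according to the metadata.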
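Further, when the storage structure type is a file structure, the additional record information includes the start offset of each storage batch or each column in the description information,
and the step of performing columnar storage of the data to be stored according to the metadata includes:
performing columnar storage of the data to be stored in a file structure according to the metadata, where the metadata is stored in the file header, or the metadata is associated with the data to be stored through the file name and then stored in a relational database or in memory.
Further, when the storage structure type is a key-value structure, the additional record information includes the key name corresponding to each storage batch or each column in the description information,
and the step of performing columnar storage of the data to be stored according to the metadata includes:
performing columnar storage of the data to be stored in a key-value structure according to the metadata, where the metadata is stored as a key-value pair, or the metadata is associated with the data to be stored through a key name and then stored in a relational database or in memory.
Further, when the storage structure type is a queue structure, the additional record information includes the data length of each storage batch or each column in the description information,
and the step of performing columnar storage of the data to be stored according to the metadata includes:
performing columnar storage of the data to be stored in a queue structure according to the metadata, where the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or in memory.
Further, the step of performing columnar storage of the data to be stored according to the metadata includes:
receiving a write calculation anonymous function passed in by the application side;
traversing the metadata and processing each column in the description information in turn, where the column index of the current column is used as the parameter of the write calculation anonymous function, the calculation logic of the write calculation anonymous function is executed to perform calculation processing on the data to be stored, and the return value of the write calculation anonymous function is written into the current column according to the column index.
Further, after the step of performing columnar storage of the data to be stored according to the metadata, the processor 1001 may also be used to call the column-oriented storage program stored in the memory 1005 and perform the following operations:
receiving a read calculation anonymous function and a read address passed in by the application side;
reading the data to be calculated from the columnar storage according to the read address, using the data to be calculated as the parameter of the read calculation anonymous function, and executing the calculation logic of the read calculation anonymous function to perform calculation processing on the data to be calculated to obtain a read result.
Further, after the step of performing columnar storage of the data to be stored according to the metadata, the processor 1001 may also be used to call the column-oriented storage program stored in the memory 1005 and perform the following operations:
receiving a virtual row read requirement for target columnar storage data passed in by the application side;
determining, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in the columns to be read;
reading cell data in turn from the cells to be read corresponding to the columns to be read, and obtaining virtual row data composed of the cell data.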
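Based on the above hardware structure, the embodiments of the column-oriented storage method of this application are proposed.
Referring to FIG. 2, the first embodiment of the column-oriented storage method of this application provides a column-oriented storage method. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order. The executing entity of each embodiment of the column-oriented storage method may be a terminal device such as a PC, smartphone, smart TV, tablet computer, or portable computer; for ease of description, the following embodiments are described with the storage system as the executing entity. The column-oriented storage method includes:
Step S10: when a columnar storage requirement for data to be stored is detected, obtaining the storage structure type in the columnar storage requirement;
The storage system may be a system program installed on the terminal device that runs on top of the operating system and can exchange data with other applications. The application side can pass various data storage requirements to the storage system, where the application side may be an application program or a developer.
When the storage system detects a columnar storage requirement for data to be stored, it obtains the storage structure type in the columnar storage requirement. The columnar storage requirement may be passed in by the application side and includes the storage structure type, which indicates the storage structure in which the data to be stored should be stored in columnar form. Specifically, storage structure types include but are not limited to the file structure (or other similar block-device storage structures), the key-value structure, the queue structure (or other similar streaming storage structures), and the distributed structure.
Step S20: generating metadata of the data to be stored according to the storage structure type, where the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
The storage system generates the metadata of the data to be stored according to the obtained storage structure type. Specifically, the metadata includes description information of the columnar storage format, for example information describing how many storage batches the data to be stored is divided into, how many columns each storage batch contains, and the data type or nesting of each column. The storage system can organize the data to be stored according to a preset columnar storage format or a columnar storage format requirement passed in by the application side. The columnar storage format may be as follows: each column has multiple cells and may have sub-columns; every column (and every sub-column) has the same number of cells (that is, the same number of rows); a column stores cells of the same type contiguously, where the data type may be a basic type such as int/float/double/long or a composite type such as array/list/dict; a column supports unlimited nesting of one or more sub-columns, and columns and sub-columns are used in exactly the same way; a storage batch includes multiple columns, and the column structure is the same in every storage batch.
To support different storage structures, the metadata that the storage system generates according to the obtained storage structure type also includes additional record information corresponding to that storage structure type; that is, different storage structure types have different additional record information. For example, the additional record information for the key-value structure may include the key names of each storage batch and each column. The additional record information supports sequential or random reads and writes at the storage batch, column, and cell level of the columnar storage data under different storage structures, so that the storage system can support not only the file structure but also other storage structures. This widens the range of applications of columnar storage: the application side can implement more functions on top of the other storage structures supported by the storage system, so that columnar storage can deliver its advantages over row-oriented storage in more application scenarios.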
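The metadata described in step S20 can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the names `ColumnDesc`, `Metadata`, `generate_metadata`, the `extra` field, the `"file"`/`"kv"`/`"queue"` type strings, and the uniform `col_size` are all hypothetical and do not come from the application, which does not prescribe a concrete layout.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnDesc:
    name: str
    dtype: str                                        # e.g. "int", "float", "list"
    sub_columns: list = field(default_factory=list)   # nested ColumnDesc objects


@dataclass
class Metadata:
    structure_type: str                               # "file", "kv" or "queue" (assumed labels)
    num_batches: int
    columns: list                                     # layout shared by every storage batch
    extra: dict = field(default_factory=dict)         # structure-specific record information


def generate_metadata(structure_type, num_batches, columns, col_size=0):
    """Build metadata; col_size is an assumed uniform size of one column."""
    meta = Metadata(structure_type, num_batches, columns)
    slots = [(b, c) for b in range(num_batches) for c in range(len(columns))]
    if structure_type == "file":
        # Batch offsets are relative to the file start, column offsets to the batch start.
        batch_size = col_size * len(columns)
        meta.extra = {
            "batch_offsets": [b * batch_size for b in range(num_batches)],
            "column_offsets": [c * col_size for c in range(len(columns))],
        }
    elif structure_type == "kv":
        # One key name per (batch, column) slot.
        meta.extra = {"keys": {slot: f"K{i + 1}" for i, slot in enumerate(slots)}}
    elif structure_type == "queue":
        # One data length per (batch, column) slot.
        meta.extra = {"lengths": {slot: col_size for slot in slots}}
    else:
        raise ValueError(f"unsupported storage structure type: {structure_type}")
    return meta
```

The point of the sketch is that the description information (`num_batches`, `columns`) stays the same across storage structures, while only the `extra` part changes with the storage structure type.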
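Step S30: performing columnar storage of the data to be stored according to the metadata.
After generating the metadata of the data to be stored, the storage system can perform columnar storage of the data to be stored according to the metadata. Specifically, the storage system divides the data to be stored into multiple storage batches or multiple columns according to the description information in the metadata, and stores those storage batches or columns in the corresponding storage structure according to the additional record information. For example, when the storage structure type is the key-value structure, the additional record information includes the key name (Key) corresponding to each storage batch or column, and the storage system stores each storage batch or column in the value (Value) corresponding to its key name.
It should be noted that the metadata may be stored in the same storage structure as the data to be stored, or stored separately elsewhere, for example in a relational database. The metadata may be stored after the data to be stored has been stored, or it may be stored first and the data to be stored afterwards. In the latter case, if some information in the metadata changes according to the actual storage result, the metadata can be updated again; for example, when storing in a file structure, the start offsets of storage batches or columns may need to be updated.
After columnar storage, the data to be stored becomes columnar storage data held in external storage in the columnar storage format. Because the metadata corresponding to the columnar storage data contains not only the description information of each storage batch and each column but also the additional record information corresponding to the storage structure, storage batches or columns can be indexed by the additional record information when the columnar storage data is read or written. For example, when storing in a key-value structure, the additional record information records the key name corresponding to each storage batch or column, so the storage system can read and write the data in the corresponding Value according to that key name. In this way, data at the storage batch, column, and cell level of the columnar storage data can be read and written sequentially or randomly under different storage structures.
In this embodiment, when a columnar storage requirement for data to be stored is detected, the storage structure type in the columnar storage requirement is obtained; metadata of the data to be stored is generated according to the storage structure type; and the data to be stored is stored in columnar form according to the metadata. In addition to the description information describing the columnar storage format, the metadata includes additional record information corresponding to the storage structure type, so that columnar storage can support different storage structures and support sequential or random reads and writes at the storage batch, column, and cell level of the columnar storage data under those structures. This widens the range of applications of columnar storage: the application side can implement more functions on top of the other storage structures supported by the storage system, so that columnar storage can deliver its advantages over row-oriented storage in more application scenarios.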
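Further, based on the first embodiment above, the second embodiment of this application provides a column-oriented storage method. In this embodiment, when the storage structure type is a file structure, the additional record information includes the start offset of each storage batch or each column in the description information, and step S30 includes:
Step S301: performing columnar storage of the data to be stored in a file structure according to the metadata, where the metadata is stored in the file header, or the metadata is associated with the data to be stored through the file name and then stored in a relational database or in memory.
In this embodiment, when the storage structure type obtained by the storage system is the file structure, the additional record information in the generated metadata includes the start offset of each storage batch or each column. The start offset of a storage batch may be the offset of the batch's start address relative to the start address of the whole file, and the start offset of a column may be the offset of the column's start address relative to the start address of the storage batch that contains it. For example, if storage batch 1 is stored at the very beginning of the file, its start offset is 0; if storage batch 2 is stored after storage batch 1 and storage batch 1 occupies 100 address units, the start offset of storage batch 2 is 100. If storage batch 1 has 4 columns and each column occupies 25 address units, the start offset of the first column is 0, that of the second column is 25, and so on. It should be noted that the file structure is characterized by data stored in blocks, with the blocks stored contiguously in the storage space.
The storage system performs columnar storage of the data to be stored in a file structure according to the metadata. Specifically, following the characteristics of the file structure, the storage system stores the storage batches or columns of the data to be stored contiguously as data blocks. The storage system may store the metadata in the file header. FIG. 3 is a schematic diagram of columnar storage in a file structure, in which the metadata is in the file header and storage batch 1 and storage batch 2 are stored adjacently. The storage system may also store the metadata elsewhere, for example in a relational database or in memory, and associate the metadata with the columnar storage data through the file name, so that the columnar storage data in external storage can be located through the metadata when data is read or written. Here, a relational database is a database used to store relational data such as metadata. If the metadata is kept in memory, read and write operations on the columnar storage data do not need to copy the metadata into memory first and can index directly with the in-memory metadata, which improves read and write efficiency.
When reading or writing columnar storage data in a file structure, the storage system first obtains the metadata and then reads or writes the storage batches or columns of the columnar storage data sequentially or randomly according to their start offsets in the metadata. For example, to read the first column of the second storage batch, the storage system looks up the start offsets of the second storage batch and of its first column in the metadata, calculates the absolute start address of that column from these offsets and the start storage address of the columnar storage data, and uses the operating system's read-by-address operation to read the data of the first column of the second storage batch from external storage.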
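The offset arithmetic in the file-structure example can be sketched in Python as follows. This is a minimal illustration, not the application's format: the JSON header with a 4-byte length prefix, the 8-byte little-endian integer cells, the extra `column_lengths` field, and the function names `write_columnar_file` and `read_column` are all assumptions introduced here.

```python
import io
import json
import struct


def write_columnar_file(buf, batches):
    """batches: list of storage batches, each a list of columns, each a list of ints."""
    meta = {"batch_offsets": [], "column_offsets": [], "column_lengths": []}
    body = bytearray()
    for batch in batches:
        batch_start = len(body)
        meta["batch_offsets"].append(batch_start)       # relative to the data region
        col_offsets, col_lengths = [], []
        for column in batch:
            col_offsets.append(len(body) - batch_start)  # relative to the batch start
            encoded = struct.pack(f"<{len(column)}q", *column)
            col_lengths.append(len(encoded))
            body += encoded
        meta["column_offsets"].append(col_offsets)
        meta["column_lengths"].append(col_lengths)
    header = json.dumps(meta).encode("utf-8")
    buf.write(struct.pack("<I", len(header)))            # metadata lives in the file header
    buf.write(header)
    buf.write(body)


def read_column(buf, batch_idx, col_idx):
    """Random read of one column using only the offsets stored in the header."""
    buf.seek(0)
    (header_len,) = struct.unpack("<I", buf.read(4))
    meta = json.loads(buf.read(header_len))
    data_start = 4 + header_len
    absolute = (data_start
                + meta["batch_offsets"][batch_idx]
                + meta["column_offsets"][batch_idx][col_idx])
    length = meta["column_lengths"][batch_idx][col_idx]
    buf.seek(absolute)                                   # jump straight to the column
    return list(struct.unpack(f"<{length // 8}q", buf.read(length)))
```

For instance, after writing two batches of two columns each into an `io.BytesIO` buffer, `read_column(buf, 1, 0)` seeks directly to the first column of the second batch without touching any other column.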
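Further, when the storage structure type is a key-value structure, the additional record information includes the key name corresponding to each storage batch or each column in the description information, and step S30 includes:
Step S302: performing columnar storage of the data to be stored in a key-value structure according to the metadata, where the metadata is stored as a key-value pair, or the metadata is associated with the data to be stored through a key name and then stored in a relational database or in memory.
When the storage structure type obtained by the storage system is the key-value structure, the additional record information in the generated metadata includes the key name corresponding to each storage batch or each column. The key-value structure is characterized by data stored as key-value pairs, with one key name corresponding to one value, and the key-value pairs are not stored contiguously in the storage space.
The storage system performs columnar storage of the data to be stored in a key-value structure according to the metadata. Specifically, following the characteristics of the key-value structure, the storage system stores the storage batches or columns of the data to be stored as key-value pairs. For example, FIG. 4 is a schematic diagram of columnar storage in a key-value structure. According to the description information in the metadata, the storage system divides the data to be stored into storage batch 1 and storage batch 2, each with two columns. The additional record information in the metadata maps storage batch 1, column 1 to K1; storage batch 1, column 2 to K2; storage batch 2, column 1 to K3; and storage batch 2, column 2 to K4, where K1 to K4 are key names. The storage system stores the data of each column in the values V1 to V4 corresponding to those key names.
The storage system may store the metadata as one key-value pair, for example in the Value corresponding to K0. The storage system may also store the metadata elsewhere, for example in a relational database or in memory, and associate the metadata with the columnar storage data through the key name, so that the columnar storage data in external storage can be located through the metadata when data is read or written.
When reading or writing columnar storage data in a key-value structure, the storage system first obtains the metadata and then, according to the key name of a storage batch or column in the metadata, reads or writes the data in the value corresponding to that key name, achieving sequential or random reads and writes of the storage batches or columns. For example, to read the first column of the second storage batch, the storage system finds in the metadata that the key name for the first column of the second storage batch is K3 and reads that column's data from the value V3 corresponding to K3.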
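The K0 to K4 layout described for FIG. 4 can be mimicked with a plain Python dict standing in for the key-value store. This is a sketch under assumptions: the dict, the `"batch:column"` lookup strings, and the helper names `store_kv` and `read_kv_column` are hypothetical; a real key-value store would have its own client API.

```python
def store_kv(store, batches):
    """Store each column under its own key and the metadata under K0."""
    meta = {"keys": {}}
    counter = 1
    for b, batch in enumerate(batches):
        for c, column in enumerate(batch):
            key = f"K{counter}"
            store[key] = list(column)            # the column data is the value
            meta["keys"][f"{b}:{c}"] = key       # additional record info: slot -> key name
            counter += 1
    store["K0"] = meta                           # metadata stored as one key-value pair
    return meta


def read_kv_column(store, batch_idx, col_idx):
    """Random read: look up the key name in the metadata, then fetch its value."""
    meta = store["K0"]
    return store[meta["keys"][f"{batch_idx}:{col_idx}"]]
```

With two batches of two columns each, the first column of the second batch ends up under K3, matching the example in the text (indices in the sketch are zero-based, so that column is `read_kv_column(store, 1, 0)`).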
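Further, when the storage structure type is a queue structure, the additional record information includes the data length of each storage batch or each column in the description information, and step S30 includes:
Step S303: performing columnar storage of the data to be stored in a queue structure according to the metadata, where the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or in memory.
When the storage structure type obtained by the storage system is the queue structure, the additional record information in the generated metadata includes the data length of each storage batch or each column. The queue structure is characterized by data stored as a queue of multiple elements; the elements are not stored contiguously in the storage space, and they can only be read and written sequentially, with no support for randomly accessing a specified element.
The storage system performs columnar storage of the data to be stored in a queue structure according to the metadata. Specifically, following the characteristics of the queue structure, the storage system stores the storage batches or columns of the data to be stored as queue elements. The storage system may store the metadata as an element of the queue, for example as the first element. The storage system may also store the metadata elsewhere, for example in a relational database or in memory, and associate the metadata with the columnar storage data through the queue name, so that the columnar storage data in external storage can be located through the metadata when data is read or written.
When reading or writing columnar storage data in a queue structure, the storage system first obtains the metadata and then reads or writes the storage batches or columns sequentially according to their data lengths in the metadata. Because of the characteristics of the queue structure, random reads and writes of a specified storage batch or column are not supported. For example, to read the columnar storage data, the storage system obtains the metadata. If the metadata is stored as the first element of the queue, the pointer of the first element points to the start address of the next element. The metadata records that the next element is the first column of the first storage batch with a data length of 100 address units, so the storage system reads 100 address units of data starting from the address the pointer points to, which yields the first column of the first storage batch. Continuing in the same way, the storage system reads out each storage batch and column of the columnar storage data in order.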
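A sequential read driven by data lengths can be sketched with `collections.deque`. The sketch makes its own assumptions: the metadata is the first queue element, every cell is pushed as a separate element so that the recorded lengths are what delimit the columns, and the names `store_queue` and `read_all` are hypothetical.

```python
from collections import deque


def store_queue(batches):
    """Push metadata first, then every cell of every column in order."""
    q = deque()
    lengths = [[len(column) for column in batch] for batch in batches]
    q.append({"lengths": lengths})               # metadata as the first queue element
    for batch in batches:
        for column in batch:
            q.extend(column)                      # cells are flattened into the stream
    return q


def read_all(q):
    """Consume the queue front to back; lengths tell where each column ends."""
    meta = q.popleft()
    result = []
    for batch_lengths in meta["lengths"]:
        batch = []
        for n in batch_lengths:
            batch.append([q.popleft() for _ in range(n)])
        result.append(batch)
    return result
```

Because the reader only ever pops from the front, there is no way to jump to a specific column, which mirrors the restriction to sequential access stated above.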
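Further, when the storage structure type is distributed storage, the additional record information in the generated metadata includes the storage information of each storage batch or each column on each distributed machine, that is, it records which storage batches are stored on which distributed machine. When columnar storage data stored in a distributed storage structure is read or written, the distributed machine holding each storage batch or column can therefore be found from the metadata, and the read or write is performed on that machine.
Further, based on the first and second embodiments above, the third embodiment of this application provides a column-oriented storage method. In this embodiment, step S30 includes:
Step S304: receiving a write calculation anonymous function passed in by the application side;
In this embodiment, the storage system can provide an anonymous function interface. The application side passes an anonymous function written against this interface into the storage system, with the calculation logic implemented inside the anonymous function. The storage system can call and execute the anonymous function to perform calculation at write time or at read time without copying the data into memory for calculation, which improves read and write efficiency, reduces memory usage, and thereby improves the storage performance of the storage system.
Specifically, the application side can pass in a write calculation anonymous function, which may perform calculation processing on the data to be stored and return the result to be written into the columnar storage, for example calculating the average of the data items in the data to be stored and writing that average into the columnar storage. Depending on the application scenario, the anonymous function can implement many different kinds of calculation logic.
Step S305: traversing the metadata and processing each column in the description information in turn, where the column index of the current column is used as the parameter of the write calculation anonymous function, the calculation logic of the write calculation anonymous function is executed to perform calculation processing on the data to be stored, and the return value of the write calculation anonymous function is written into the current column according to the column index.
The storage system traverses the metadata and processes each column in the description information in turn. For the current column, the storage system uses the column index of the current column as the parameter of the write calculation anonymous function and then executes the function's calculation logic to perform calculation processing on the data to be stored. In the averaging example, the return value of the write calculation anonymous function is the calculated average, and the storage system writes that average into the current column according to the column index in the parameter. The storage system can stop calling the write calculation anonymous function based on a separately configured total count or other conditions. It should be noted that the write calculation anonymous function may return multiple values. The storage system processes the columns in turn, completing the write-time calculation on the data to be stored.
The application side can also pass in an iterator; the storage system calls the iterator to obtain one or more data items and writes them into the columnar storage until the iterator is exhausted.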
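Steps S304 and S305 can be sketched in Python, with a lambda playing the role of the write calculation anonymous function. The sketch assumes an in-memory list of lists as the columnar store; the names `write_with_compute` and `write_from_iterator` are hypothetical and the application does not define this interface.

```python
def write_with_compute(num_columns, compute):
    """Call compute(column_index) for each column and write what it returns."""
    columns = [[] for _ in range(num_columns)]
    for col_index in range(num_columns):
        result = compute(col_index)               # column index is the only parameter
        # The function may return one value or several.
        values = result if isinstance(result, (list, tuple)) else [result]
        columns[col_index].extend(values)
    return columns


def write_from_iterator(column, iterator):
    """Alternative entry point: drain an iterator into one column."""
    for value in iterator:
        column.append(value)
    return column


# Usage: the application side closes over its raw data and returns per-column averages.
raw = {0: [1, 2, 3], 1: [10, 20]}
averages = write_with_compute(2, lambda i: sum(raw[i]) / len(raw[i]))
```

In this sketch the storage side never sees `raw` directly; it only receives the value the function returns for each column index, which is the division of responsibilities the text describes.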
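Further, the column-oriented storage method also includes:
Step S40: receiving a read calculation anonymous function and a read address passed in by the application side;
Further, for columnar storage data, the application side can pass in a read calculation anonymous function and a read address. The read calculation anonymous function may perform calculation processing on the data that the read address points to and return the result, for example calculating the average of the cell data in a specified column of the columnar storage data.
Step S50: reading the data to be calculated from the columnar storage according to the read address, using the data to be calculated as the parameter of the read calculation anonymous function, and executing the calculation logic of the read calculation anonymous function to perform calculation processing on the data to be calculated to obtain a read result.
The storage system reads the data to be calculated from the columnar storage according to the read address, uses it as the parameter of the read calculation anonymous function, and executes the function's calculation logic to perform calculation processing on it; the calculation result is the read result. For example, if the read calculation anonymous function calculates the average of the cell data in each column of the columnar storage data, the storage system reads the data of one column, executes the read calculation anonymous function to calculate that column's average, returns the average, and discards the column data; it then reads the next column, calculates and returns its average, and continues until all columns have been read.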
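The read-compute-discard loop of steps S40 and S50 maps naturally onto a Python generator. This is a sketch under assumptions: a dict keyed by column name stands in for the columnar store and its read addresses, and `read_with_compute` is a hypothetical name.

```python
def read_with_compute(store, addresses, compute):
    """Yield compute(column) for each address, holding one column at a time."""
    for address in addresses:
        data = store[address]      # read the data the address points to
        yield compute(data)        # only the result is handed back to the caller
        del data                   # drop the reference before reading the next column


# Usage: per-column averages; the caller never receives the raw columns.
store = {"a": [1, 2, 3], "b": [10, 20]}
results = list(read_with_compute(store, ["a", "b"], lambda col: sum(col) / len(col)))
```

Using a generator means the next column is read only after the previous result has been consumed, which matches the one-column-at-a-time behaviour in the example.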
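In this embodiment, by receiving the anonymous function passed in by the application side and executing its calculation logic, calculation is performed on the columnar storage data at read time or write time, so the data does not need to be copied into memory. This improves read and write efficiency, reduces memory usage, and thereby improves the storage performance of the storage system.
Further, based on the first, second, and third embodiments above, the fourth embodiment of this application provides a column-oriented storage method. In this embodiment, the column-oriented storage method also includes:
Step S60: receiving a virtual row read requirement for target columnar storage data passed in by the application side;
Current columnar storage can only read data column by column. To read one row of data, the data of multiple columns must all be read into memory and then filtered in memory.
In this embodiment, the application side can pass a virtual row read requirement for the target columnar storage data to the storage system, and the virtual row read requirement can specify the row index to be read.
Step S70: determining, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in the columns to be read;
The storage system determines, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in those columns. Specifically, the storage system can determine from the row index which columns the requested row involves; for example, if a row contains data for 5 column attributes, those 5 columns are the columns to be read. It then determines from the row index the position of each of the row's cells within its column; for example, if the row index is 2, the cells to be read are the 2nd cell in each of the 5 columns.
Step S80: reading cell data in turn from the cells to be read corresponding to the columns to be read, and obtaining virtual row data composed of the cell data.
The storage system reads cell data in turn from the cells to be read corresponding to the columns to be read and obtains the virtual row data composed of that cell data, which completes the virtual row read. For example, the storage system reads the 2nd cell of each of the 5 columns in turn, skipping the other cells, and thus reads 5 cells of data. Because the metadata records the information of each storage batch and each column, the read address of each cell to be read can be calculated from the metadata. This enables cell-level reads and therefore row-wise reads, avoiding the waste of IO resources that occurs when whole columns must be read to obtain a single row.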
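The cell address calculation behind a virtual row read can be sketched as follows. The sketch assumes fixed-width 8-byte integer cells so that a cell address is simply the column offset plus the row position times the cell size; the application does not state a cell encoding, and the names `build_columns` and `read_virtual_row` are hypothetical. The row index is 1-based, as in the example above.

```python
import io
import struct

CELL = struct.Struct("<q")   # assumed fixed-width cell: one 8-byte signed integer


def build_columns(columns):
    """Write columns back to back and record each column's start offset."""
    buf = io.BytesIO()
    offsets = []
    for column in columns:
        offsets.append(buf.tell())
        for value in column:
            buf.write(CELL.pack(value))
    return buf, {"column_offsets": offsets, "cell_size": CELL.size}


def read_virtual_row(buf, meta, row_index):
    """Read exactly one cell per column; row_index is 1-based."""
    row = []
    for column_offset in meta["column_offsets"]:
        buf.seek(column_offset + (row_index - 1) * meta["cell_size"])
        (value,) = CELL.unpack(buf.read(meta["cell_size"]))
        row.append(value)
    return row
```

For 5 columns, a call with `row_index=2` issues 5 small reads of 8 bytes each instead of reading 5 whole columns, which is the IO saving the text describes. With variable-width cells, the metadata would need per-cell offsets or lengths instead.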
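In this embodiment, the predicate pushdown capability of columnar storage is enhanced to enable random reads of row data, which widens the range of applications of columnar storage and allows it to serve as infrastructure for big data and artificial intelligence algorithms.
In addition, an embodiment of this application further proposes a column-oriented storage apparatus. Referring to FIG. 5, the column-oriented storage apparatus includes:
an obtaining module 10, configured to obtain the storage structure type in the columnar storage requirement when a columnar storage requirement for data to be stored is detected;
a generating module 20, configured to generate metadata of the data to be stored according to the storage structure type, where the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
a storage module 30, configured to perform columnar storage of the data to be stored according to the metadata.
Further, when the storage structure type is a file structure, the additional record information includes the start offset of each storage batch or each column in the description information, and the storage module 30 includes:
a first storage unit, configured to perform columnar storage of the data to be stored in a file structure according to the metadata, where the metadata is stored in the file header, or the metadata is associated with the data to be stored through the file name and then stored in a relational database or in memory.
Further, when the storage structure type is a key-value structure, the additional record information includes the key name corresponding to each storage batch or each column in the description information, and the storage module 30 includes:
a second storage unit, configured to perform columnar storage of the data to be stored in a key-value structure according to the metadata, where the metadata is stored as a key-value pair, or the metadata is associated with the data to be stored through a key name and then stored in a relational database or in memory.
Further, when the storage structure type is a queue structure, the additional record information includes the data length of each storage batch or each column in the description information, and the storage module 30 includes:
a third storage unit, configured to perform columnar storage of the data to be stored in a queue structure according to the metadata, where the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or in memory.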
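Further, the storage module 30 includes:
a receiving unit, configured to receive a write calculation anonymous function passed in by the application side;
a traversal unit, configured to traverse the metadata and process each column in the description information in turn, where the column index of the current column is used as the parameter of the write calculation anonymous function, the calculation logic of the write calculation anonymous function is executed to perform calculation processing on the data to be stored, and the return value of the write calculation anonymous function is written into the current column according to the column index.
Further, the column-oriented storage apparatus also includes:
a first receiving module, configured to receive a read calculation anonymous function and a read address passed in by the application side;
an execution module, configured to read the data to be calculated from the columnar storage according to the read address, use the data to be calculated as the parameter of the read calculation anonymous function, and execute the calculation logic of the read calculation anonymous function to perform calculation processing on the data to be calculated to obtain a read result.
Further, the column-oriented storage apparatus also includes:
a second receiving module, configured to receive a virtual row read requirement for target columnar storage data passed in by the application side;
a determining module, configured to determine, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in the columns to be read;
a reading module, configured to read cell data in turn from the cells to be read corresponding to the columns to be read, and obtain virtual row data composed of the cell data.
The extended content of the specific implementation of the column-oriented storage apparatus of this application is basically the same as the embodiments of the column-oriented storage method above and is not repeated here.
In addition, an embodiment of this application further proposes a computer-readable storage medium on which a column-oriented storage program is stored; when executed by a processor, the column-oriented storage program implements the steps of the column-oriented storage method described above.
The extended content of the specific implementation of the column-oriented storage device and the computer-readable storage medium of this application is basically the same as the embodiments of the column-oriented storage method above and is not repeated here.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or system. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes that element.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.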

Claims (20)

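  1. A column-oriented storage method, wherein the column-oriented storage method includes:
    when a columnar storage requirement for data to be stored is detected, obtaining the storage structure type in the columnar storage requirement;
    generating metadata of the data to be stored according to the storage structure type, wherein the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
    performing columnar storage of the data to be stored according to the metadata.
  2. The column-oriented storage method according to claim 1, wherein when the storage structure type is a file structure, the additional record information includes the start offset of each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a file structure according to the metadata, wherein the metadata is stored in the file header, or the metadata is associated with the data to be stored through the file name and then stored in a relational database or in memory.
  3. The column-oriented storage method according to claim 1, wherein when the storage structure type is a key-value structure, the additional record information includes the key name corresponding to each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a key-value structure according to the metadata, wherein the metadata is stored as a key-value pair, or the metadata is associated with the data to be stored through a key name and then stored in a relational database or in memory.
  4. The column-oriented storage method according to claim 1, wherein when the storage structure type is a queue structure, the additional record information includes the data length of each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a queue structure according to the metadata, wherein the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or in memory.
  5. The column-oriented storage method according to claim 1, wherein the step of performing columnar storage of the data to be stored according to the metadata includes:
    receiving a write calculation anonymous function passed in by the application side;
    traversing the metadata and processing each column in the description information in turn, wherein the column index of the current column is used as the parameter of the write calculation anonymous function, the calculation logic of the write calculation anonymous function is executed to perform calculation processing on the data to be stored, and the return value of the write calculation anonymous function is written into the current column according to the column index.
  6. The column-oriented storage method according to claim 1, wherein after the step of performing columnar storage of the data to be stored according to the metadata, the method further includes:
    receiving a read calculation anonymous function and a read address passed in by the application side;
    reading the data to be calculated from the columnar storage according to the read address, using the data to be calculated as the parameter of the read calculation anonymous function, and executing the calculation logic of the read calculation anonymous function to perform calculation processing on the data to be calculated to obtain a read result.
  7. The column-oriented storage method according to any one of claims 1 to 6, wherein after the step of performing columnar storage of the data to be stored according to the metadata, the method further includes:
    receiving a virtual row read requirement for target columnar storage data passed in by the application side;
    determining, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in the columns to be read;
    reading cell data in turn from the cells to be read corresponding to the columns to be read, and obtaining virtual row data composed of the cell data.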
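  8. A column-oriented storage apparatus, wherein the column-oriented storage apparatus includes:
    an obtaining module, configured to obtain the storage structure type in the columnar storage requirement when a columnar storage requirement for data to be stored is detected;
    a generating module, configured to generate metadata of the data to be stored according to the storage structure type, wherein the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
    a storage module, configured to perform columnar storage of the data to be stored according to the metadata.
  9. A column-oriented storage device, wherein the column-oriented storage device includes a memory, a processor, and a column-oriented storage program that is stored in the memory and can run on the processor, and the column-oriented storage program, when executed by the processor, implements the following steps:
    when a columnar storage requirement for data to be stored is detected, obtaining the storage structure type in the columnar storage requirement;
    generating metadata of the data to be stored according to the storage structure type, wherein the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
    performing columnar storage of the data to be stored according to the metadata.
  10. The column-oriented storage device according to claim 9, wherein when the storage structure type is a file structure, the additional record information includes the start offset of each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a file structure according to the metadata, wherein the metadata is stored in the file header, or the metadata is associated with the data to be stored through the file name and then stored in a relational database or in memory.
  11. The column-oriented storage device according to claim 9, wherein when the storage structure type is a key-value structure, the additional record information includes the key name corresponding to each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a key-value structure according to the metadata, wherein the metadata is stored as a key-value pair, or the metadata is associated with the data to be stored through a key name and then stored in a relational database or in memory.
  12. The column-oriented storage device according to claim 9, wherein when the storage structure type is a queue structure, the additional record information includes the data length of each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a queue structure according to the metadata, wherein the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or in memory.
  13. The column-oriented storage device according to claim 9, wherein the step of performing columnar storage of the data to be stored according to the metadata includes:
    receiving a write calculation anonymous function passed in by the application side;
    traversing the metadata and processing each column in the description information in turn, wherein the column index of the current column is used as the parameter of the write calculation anonymous function, the calculation logic of the write calculation anonymous function is executed to perform calculation processing on the data to be stored, and the return value of the write calculation anonymous function is written into the current column according to the column index.
  14. The column-oriented storage device according to claim 9, wherein after the step of performing columnar storage of the data to be stored according to the metadata, the column-oriented storage program, when executed by the processor, further implements the following steps:
    receiving a read calculation anonymous function and a read address passed in by the application side;
    reading the data to be calculated from the columnar storage according to the read address, using the data to be calculated as the parameter of the read calculation anonymous function, and executing the calculation logic of the read calculation anonymous function to perform calculation processing on the data to be calculated to obtain a read result.
  15. The column-oriented storage device according to any one of claims 9 to 14, wherein after the step of performing columnar storage of the data to be stored according to the metadata, the column-oriented storage program, when executed by the processor, further implements the following steps:
    receiving a virtual row read requirement for target columnar storage data passed in by the application side;
    determining, according to the virtual row read requirement, the columns to be read in the target columnar storage data and the cells to be read in the columns to be read;
    reading cell data in turn from the cells to be read corresponding to the columns to be read, and obtaining virtual row data composed of the cell data.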
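  16. A computer-readable storage medium, wherein a column-oriented storage program is stored on the computer-readable storage medium, and the column-oriented storage program, when executed by a processor, implements the following steps:
    when a columnar storage requirement for data to be stored is detected, obtaining the storage structure type in the columnar storage requirement;
    generating metadata of the data to be stored according to the storage structure type, wherein the metadata includes description information of a columnar storage format and additional record information corresponding to the storage structure type;
    performing columnar storage of the data to be stored according to the metadata.
  17. The computer-readable storage medium according to claim 16, wherein when the storage structure type is a file structure, the additional record information includes the start offset of each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a file structure according to the metadata, wherein the metadata is stored in the file header, or the metadata is associated with the data to be stored through the file name and then stored in a relational database or in memory.
  18. The computer-readable storage medium according to claim 16, wherein when the storage structure type is a key-value structure, the additional record information includes the key name corresponding to each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a key-value structure according to the metadata, wherein the metadata is stored as a key-value pair, or the metadata is associated with the data to be stored through a key name and then stored in a relational database or in memory.
  19. The computer-readable storage medium according to claim 16, wherein when the storage structure type is a queue structure, the additional record information includes the data length of each storage batch or each column in the description information,
    and the step of performing columnar storage of the data to be stored according to the metadata includes:
    performing columnar storage of the data to be stored in a queue structure according to the metadata, wherein the metadata is stored as an element of the queue, or the metadata is associated with the data to be stored through the queue name and then stored in a relational database or in memory.
  20. The computer-readable storage medium according to claim 16, wherein the step of performing columnar storage of the data to be stored according to the metadata includes:
    receiving a write calculation anonymous function passed in by the application side;
    traversing the metadata and processing each column in the description information in turn, wherein the column index of the current column is used as the parameter of the write calculation anonymous function, the calculation logic of the write calculation anonymous function is executed to perform calculation processing on the data to be stored, and the return value of the write calculation anonymous function is written into the current column according to the column index.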
PCT/CN2020/129253 2019-12-20 2020-11-17 面向列的存储方法、装置、设备及计算机可读存储介质 WO2021120952A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911326804.X 2019-12-20
CN201911326804.XA CN110968585B (zh) 2019-12-20 2019-12-20 面向列的存储方法、装置、设备及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2021120952A1 true WO2021120952A1 (zh) 2021-06-24

Family

ID=70035690

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129253 WO2021120952A1 (zh) 2019-12-20 2020-11-17 面向列的存储方法、装置、设备及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN110968585B (zh)
WO (1) WO2021120952A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968585B (zh) * 2019-12-20 2023-11-03 深圳前海微众银行股份有限公司 面向列的存储方法、装置、设备及计算机可读存储介质
CN111752955A (zh) * 2020-06-29 2020-10-09 深圳前海微众银行股份有限公司 数据处理方法、装置、设备及计算机可读存储介质
CN111984651A (zh) * 2020-08-21 2020-11-24 苏州浪潮智能科技有限公司 一种基于持久性内存的列式存储方法、装置及设备
CN112445801B (zh) * 2020-11-27 2024-06-14 杭州海康威视数字技术股份有限公司 数据表的元信息管理方法、装置及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102890721A (zh) * 2012-10-16 2013-01-23 苏州迈科网络安全技术股份有限公司 基于列存储技术的数据库建立方法及系统
US20160275201A1 (en) * 2015-03-18 2016-09-22 Adp, Llc Database structure for distributed key-value pair, document and graph models
CN109542889A (zh) * 2018-10-11 2019-03-29 平安科技(深圳)有限公司 流式数据列存储方法、装置、设备和存储介质
US20190188289A1 (en) * 2017-12-18 2019-06-20 Yahoo Japan Corporation Data management device, data management method, and non-transitory computer readable storage medium
CN110968585A (zh) * 2019-12-20 2020-04-07 深圳前海微众银行股份有限公司 面向列的存储方法、装置、设备及计算机可读存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104516912B (zh) * 2013-09-29 2018-06-26 中国移动通信集团黑龙江有限公司 一种动态的数据存储方法及装置
CN104866497B (zh) * 2014-02-24 2018-06-15 华为技术有限公司 分布式文件系统列式存储的元数据更新方法、装置、主机
CN104035956A (zh) * 2014-04-11 2014-09-10 江苏瑞中数据股份有限公司 一种基于分布式列存储的时间序列数据存储方法
US10901977B2 (en) * 2018-05-14 2021-01-26 Sap Se Database independent detection of data changes

Also Published As

Publication number Publication date
CN110968585A (zh) 2020-04-07
CN110968585B (zh) 2023-11-03

Similar Documents

Publication Publication Date Title
WO2021120952A1 (zh) 面向列的存储方法、装置、设备及计算机可读存储介质
US20180239800A1 (en) Data query method and apparatus
CN111061758B (zh) 数据存储方法、装置及存储介质
WO2017161540A1 (zh) 数据查询的方法、数据对象的存储方法和数据系统
US20140101132A1 (en) Swapping expected and candidate affinities in a query plan cache
US20230251796A1 (en) Method and apparatus for reading and writing data
CN111079917A (zh) 张量数据分块存取的方法及装置
TWI570559B (zh) 快閃記憶體及其存取方法
WO2021258512A1 (zh) 数据的聚合处理装置、方法和存储介质
CN112905587B (zh) 数据库的数据管理方法、装置及电子设备
CN107423425B (zh) 一种对k/v格式的数据快速存储和查询方法
CN115470156A (zh) 基于rdma的内存使用方法、系统、电子设备和存储介质
CN114089921A (zh) 电力系统数据存储方法、装置、计算机设备和存储介质
CN111625600B (zh) 数据存储的处理方法、系统、计算机设备及存储介质
CN111090397B (zh) 一种数据重删方法、系统、设备及计算机可读存储介质
CN110019538B (zh) 一种数据表切换方法及装置
CN113190549B (zh) 多维表数据调取方法、装置、服务器及存储介质
CN110837499A (zh) 数据访问处理方法、装置、电子设备和存储介质
CN110990394B (zh) 分布式面向列数据库表的行数统计方法、装置和存储介质
WO2015058628A1 (zh) 文件的访问方法及装置
US20240211483A1 (en) Log data query method, electronic device, and storage medium
US11526495B2 (en) Method and apparatus for processing write-ahead log
CN117041980B (zh) 一种网元管理方法、装置、存储介质及电子设备
US11609894B2 (en) Data storage system conflict management
US20240220334A1 (en) Data processing method in distributed system, and related system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20901715

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20901715

Country of ref document: EP

Kind code of ref document: A1