CN118132600A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN118132600A
Authority
CN
China
Prior art keywords
operator
data
block
storage module
temporary storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311793334.4A
Other languages
Chinese (zh)
Inventor
贺佐交
肖意
刘彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Oceanbase Technology Co Ltd
Original Assignee
Beijing Oceanbase Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Oceanbase Technology Co Ltd
Priority to CN202311793334.4A
Publication of CN118132600A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present disclosure provide a data processing method and apparatus, an electronic device, and a storage medium. The method includes: in response to an operator generating an intermediate result while executing an SQL statement, controlling the operator to call a temporary storage module through a corresponding format interface, so that the intermediate result is stored through the temporary storage module in the format corresponding to the operator. The temporary storage module is configured with a plurality of format interfaces, and different format interfaces cause operators that call the temporary storage module to store data in different formats. In other words, an operator can call the temporary storage module through different format interfaces to customize different data formats, so that the intermediate result is stored in the format corresponding to the operator. This meets the format requirements of different operators and improves the storage efficiency of intermediate results.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
One or more embodiments of the present disclosure relate to the field of database technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of the internet and informatization, the volume of generated data is growing explosively, and the demands placed on databases and their management are growing with it. When an SQL (Structured Query Language) statement is executed on a database, the contents of the statement can be distributed to different operators for execution. While processing data, operators often generate intermediate results that need to be buffered temporarily. For example, the SORT operator must sort all of its data before producing any output, so the input data it receives has to be buffered as an intermediate result until output begins.
In the related art, the temporary storage component of a database only supports storing data in one specific format, so an operator that stores intermediate results through the temporary storage module can only store them in that format. As a result, operators that work with other formats access their intermediate results less efficiently.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a data processing method and apparatus, an electronic device, and a storage medium.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
According to a first aspect of one or more embodiments of the present specification, there is provided a data processing method, the method comprising:
in response to an operator generating an intermediate result while executing an SQL statement, controlling the operator to call a temporary storage module through a corresponding format interface, so as to store the intermediate result through the temporary storage module in a format corresponding to the operator;
wherein the temporary storage module is configured with a plurality of format interfaces, and different format interfaces are used to cause operators calling the temporary storage module to store data in different formats.
In one possible embodiment of the present disclosure, controlling the operator to store the intermediate result through the temporary storage module in a format corresponding to the operator includes:
controlling the operator to store the intermediate result, through the temporary storage module, in a hierarchical storage form in the format corresponding to the operator;
wherein the hierarchical storage form includes: storing data into data blocks, constructing index information for each data block, and storing the index information of each data block into an index block.
In a possible embodiment of the present specification, the hierarchical storage form further includes: index information is built for each index block, and the index information of each index block is stored.
In a possible embodiment of the present specification, the data block includes at least one of the following information: block type, block number, amount of data, original length of data, and data writing location.
In a possible embodiment of the present specification, the index information includes at least one of the following information: the number of the indexed block, the category of the indexed block, the storage medium (memory or disk) on which the indexed block resides, and the compressed length of the indexed block.
In a possible embodiment of the present specification, the index block includes at least one of the following information: block category, index information amount, index information.
In one possible embodiment of the present disclosure, controlling the operator to store the intermediate result, through the temporary storage module, in a hierarchical storage form in the format corresponding to the operator includes:
controlling the operator to write the intermediate result, in the format corresponding to the operator, into a current data block in memory, wherein the current data block includes the data block that the operator is currently reading or writing;
in response to the current data block being full, controlling the operator to allocate a new data block through the temporary storage module and to write the intermediate result into the new data block in the format corresponding to the operator.
In a possible embodiment of the present specification, controlling the operator to allocate a new data block through the temporary storage module includes:
in response to the space occupied in memory by the operator data blocks reaching the space threshold corresponding to the operator, controlling the operator, through the temporary storage module, to dump the operator data blocks to disk, construct index information for the operator data blocks, and allocate a new data block in memory;
in response to the space occupied in memory by the operator data blocks not reaching the space threshold corresponding to the operator, controlling the operator to allocate a new data block in memory through the temporary storage module;
wherein the operator data blocks include the data blocks in memory that store intermediate results generated by the operator.
In a possible embodiment of the present specification, the method further comprises:
controlling the operator to read target data from the current data block in memory;
in response to the current data block not containing the target data, controlling the operator to determine, through the temporary storage module and by means of the index block, the data block in which the target data is located, and controlling the operator to read the target data from that data block.
In a possible embodiment of the present specification, the method further comprises:
in response to the data block containing the target data being on disk, loading that data block into memory.
In a possible embodiment of the present specification, the method further comprises:
in response to the target data being distributed across a plurality of data blocks, controlling the operator, through the temporary storage module, to keep the plurality of data blocks in memory while the target data is being read.
In one possible embodiment of the present disclosure, controlling the operator to store the intermediate result through the temporary storage module in a format corresponding to the operator includes:
controlling the operator to compress and store the intermediate result, through the temporary storage module, in the format corresponding to the operator.
In a possible embodiment of the present specification, the method further comprises:
controlling the temporary storage module to determine and update, in real time, the storage state of the intermediate results of the operator.
According to a second aspect of one or more embodiments of the present specification, there is provided a data processing apparatus, the apparatus comprising:
a storage module, configured to: in response to an operator generating an intermediate result while executing an SQL statement, control the operator to call a temporary storage module through a corresponding format interface, so as to store the intermediate result through the temporary storage module in a format corresponding to the operator;
the temporary storage module is configured with a plurality of format interfaces, and different format interfaces are used for enabling operators calling the temporary storage module to store data according to different formats.
In a possible embodiment of the present specification, the storage module is configured to:
control the operator to store the intermediate result, through the temporary storage module, in a hierarchical storage form in the format corresponding to the operator;
wherein the hierarchical storage form includes: storing data into data blocks, constructing index information for each data block, and storing the index information of each data block into an index block.
In a possible embodiment of the present specification, the hierarchical storage form further includes: index information is built for each index block, and the index information of each index block is stored.
In a possible embodiment of the present specification, the data block includes at least one of the following information: block type, block number, amount of data, original length of data, and data writing location.
In a possible embodiment of the present specification, the index information includes at least one of the following information: the number of the indexed block, the category of the indexed block, the storage medium (memory or disk) on which the indexed block resides, and the compressed length of the indexed block.
In a possible embodiment of the present specification, the index block includes at least one of the following information: block category, index information amount, index information.
In one possible embodiment of the present disclosure, when controlling the operator to store the intermediate result, through the temporary storage module, in a hierarchical storage form in the format corresponding to the operator, the storage module is configured to:
control the operator to write the intermediate result, in the format corresponding to the operator, into a current data block in memory, wherein the current data block includes the data block that the operator is currently reading or writing;
in response to the current data block being full, control the operator to allocate a new data block through the temporary storage module and to write the intermediate result into the new data block in the format corresponding to the operator.
In a possible embodiment of the present specification, when controlling the operator to allocate a new data block through the temporary storage module, the storage module is configured to:
in response to the space occupied in memory by the operator data blocks reaching the space threshold corresponding to the operator, control the operator, through the temporary storage module, to dump the operator data blocks to disk, construct index information for the operator data blocks, and allocate a new data block in memory;
in response to the space occupied in memory by the operator data blocks not reaching the space threshold corresponding to the operator, control the operator to allocate a new data block in memory through the temporary storage module;
wherein the operator data blocks include the data blocks in memory that store intermediate results generated by the operator.
In a possible embodiment of the present specification, the apparatus further comprises a reading module for:
controlling the operator to read target data from the current data block in memory;
in response to the current data block not containing the target data, controlling the operator to determine, through the temporary storage module and by means of the index block, the data block in which the target data is located, and controlling the operator to read the target data from that data block.
In a possible embodiment of the present specification, the apparatus further includes a loading module configured to:
in response to the data block containing the target data being on disk, load that data block into memory.
In a possible embodiment of the present specification, the apparatus further includes a cross-block reading module for:
in response to the target data being distributed across a plurality of data blocks, controlling the operator, through the temporary storage module, to keep the plurality of data blocks in memory while the target data is being read.
In a possible embodiment of the present specification, the storage module is configured to:
control the operator to compress and store the intermediate result, through the temporary storage module, in the format corresponding to the operator.
In a possible embodiment of the present specification, the apparatus further includes a recording module configured to:
control the temporary storage module to determine and update, in real time, the storage state of the intermediate results of the operator.
According to a third aspect of one or more embodiments of the present specification, a database management system is presented, the system comprising a temporary storage module configured with a plurality of format interfaces, different format interfaces for causing operators invoking the temporary storage module to store data in different formats.
In one possible embodiment of the present specification, the temporary storage module is configured to store data in a hierarchical storage form;
wherein the hierarchical storage form includes: storing data into data blocks, constructing index information for each data block, and storing the index information of each data block into an index block.
According to a fourth aspect of one or more embodiments of the present specification, there is provided an electronic device comprising:
a processor;
A memory for storing processor-executable instructions;
Wherein the processor implements the method of the first aspect by executing the executable instructions.
According to a fifth aspect of one or more embodiments of the present description, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the specification can comprise the following beneficial effects:
According to the data processing method provided by the embodiments of the present specification, in response to an operator generating an intermediate result while executing an SQL statement, the operator is controlled to call a temporary storage module through a corresponding format interface, so that the intermediate result is stored through the temporary storage module in the format corresponding to the operator. The temporary storage module is configured with a plurality of format interfaces, and different format interfaces cause operators calling the temporary storage module to store data in different formats. In other words, an operator can call the temporary storage module through different format interfaces to customize different data formats, so that the intermediate result is stored in the format corresponding to the operator. This meets the format requirements of different operators and improves the storage efficiency of intermediate results.
Drawings
Fig. 1 is a flow chart of a data processing method according to an exemplary embodiment.
Fig. 2 is a schematic diagram of a temporary storage module provided in an exemplary embodiment.
Fig. 3 is a schematic diagram of a hierarchical structure formed by a hierarchical storage form provided in an exemplary embodiment.
Fig. 4 is a flow chart of data writing provided by an exemplary embodiment.
Fig. 5 is a flow chart of data reading provided by an exemplary embodiment.
Fig. 6 is a schematic diagram of an apparatus according to an exemplary embodiment.
Fig. 7 is a block diagram of a data processing apparatus provided in yet another exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
With the rapid development of the internet and informatization, the volume of generated data is growing explosively, and the demands placed on databases and their management are growing with it. When an SQL statement is executed on a database, the contents of the statement can be distributed to different operators for execution. While processing data, operators often generate intermediate results that need to be buffered temporarily. For example, the SORT operator must sort all of its data before producing any output, so the input data it receives has to be buffered as an intermediate result until output begins.
In the related art, the temporary storage component of a database only supports storing data in one specific format, so an operator that stores intermediate results through the temporary storage module can only store them in that format. As a result, operators that work with other formats access their intermediate results less efficiently.
Based on this, in the first aspect, at least one embodiment of the present disclosure provides a data processing method that supports the format requirements of different operators: when an operator executing an SQL statement generates intermediate results that need temporary buffering, the intermediate results are stored in the format corresponding to that operator. This improves the efficiency with which operators store and read intermediate results. It also avoids damage to intermediate results caused by format conversion, which improves their stability and security.
For example, the method may be applied to, and performed by, a database management system.
Referring to fig. 1, a flow of the data processing method is shown, which includes step S101.
In step S101, in response to an operator generating an intermediate result while executing an SQL statement, the operator is controlled to call a temporary storage module (TempStore) through a corresponding format interface, so that the intermediate result is stored through the temporary storage module in the format corresponding to the operator.
When executing an SQL statement, the database management system can split the statement and distribute the resulting parts to different operators for execution. Step S101 takes place during execution, after an operator has been assigned either the whole SQL statement or a part of it. The intermediate result may be input data, a preliminary processing result of the input data, or the like. Intermediate results are stored so that, once the operator has obtained all of its input data, it can process the intermediate results and produce output.
The database management system is configured with a temporary storage module that interacts with storage media such as memory and disk. By calling the temporary storage module, an operator can store intermediate results in the storage media, read them back, and so on. The temporary storage module is not bound to any data format: it writes the data supplied by the calling operator to the storage medium, and returns to the calling operator the data that the operator asks to read. The temporary storage module is configured with a plurality of format interfaces, and different format interfaces cause operators calling the temporary storage module to store data in different formats. An operator can thus customize different temporary data formats through different format interfaces: format customization is completed when the operator calls the temporary storage module through a given format interface, after which the operator writes data to the temporary storage module in that format. For example, the temporary storage module may be configured with a row-store format interface, a column-store format interface, and the like. After an operator calls the temporary storage module through the row-store format interface, it can write intermediate results in row-store format to the storage medium through the temporary storage module; after an operator calls the temporary storage module through the column-store format interface, it can write intermediate results in column-store format to the storage medium through the temporary storage module.
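The idea of one format-agnostic module exposed through several format interfaces can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class name `TempStore` is borrowed from the text, but the methods `open_row_store`, `open_column_store`, and `append` are hypothetical and are not taken from the specification, and in-memory lists stand in for the real storage medium.

```python
from typing import Any, List, Optional, Sequence


class TempStore:
    """Hypothetical temporary storage module that is not bound to one format."""

    def __init__(self) -> None:
        self.fmt: Optional[str] = None   # format fixed by the format interface used
        self.rows: List[tuple] = []      # backing data for the row-store format
        self.columns: List[list] = []    # backing data for the column-store format

    @classmethod
    def open_row_store(cls) -> "TempStore":
        """Format interface 1 (assumed name): the caller stores whole rows."""
        store = cls()
        store.fmt = "row"
        return store

    @classmethod
    def open_column_store(cls, column_count: int) -> "TempStore":
        """Format interface 2 (assumed name): the caller stores per-column lists."""
        store = cls()
        store.fmt = "column"
        store.columns = [[] for _ in range(column_count)]
        return store

    def append(self, row: Sequence[Any]) -> None:
        """Write one record in whichever format was customized at open time."""
        if self.fmt == "row":
            self.rows.append(tuple(row))
        elif self.fmt == "column":
            if len(row) != len(self.columns):
                raise ValueError("row width does not match column count")
            for column, value in zip(self.columns, row):
                column.append(value)
        else:
            raise RuntimeError("the store was not opened through a format interface")
```

In this sketch, a sort-like operator would call `TempStore.open_row_store()` and an operator that prefers columnar data would call `TempStore.open_column_store(n)`; both then use the same `append` call, and the layout follows from the interface chosen when the store was opened.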
Referring to fig. 2, a schematic diagram of the temporary storage module (TempStore) is shown. Any format can be customized for the temporary storage module, such as Row Store, Column Store, Random Access Row Store (random-access store), Encoding Row Store (encoding-compressed store), and the like.
Preferably, referring to fig. 2, the temporary storage module may be configured with a memory control function. The storage medium includes memory and disk. When an operator stores an intermediate result through the temporary storage module, the intermediate result can be written to memory first; when the space that the operator's intermediate results occupy in memory reaches the space threshold corresponding to the operator, those intermediate results are dumped to disk.
Preferably, referring to fig. 2, the temporary storage module may be configured with a data compression function, which can meet the space-utilization requirements of scenarios such as index scenarios. That is, when the database management system controls the operator to store the intermediate result through the temporary storage module in the format corresponding to the operator, it can control the operator to compress and store the intermediate result through the temporary storage module in that format. It should be appreciated that the intermediate result may be compressed by the temporary storage module either when the operator writes it to memory or when it is dumped to disk. When an operator generates many intermediate results, a database management system configured with such a temporary storage module can reduce space occupation through compression, which improves the utilization of space in the storage medium and allows the operator's intermediate results to be stored smoothly.
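As a rough illustration of compress-on-store, the following Python sketch compresses a block payload and remembers its length before compression. The specification does not name a compression algorithm; `zlib` is used here only as a stand-in, and the names `CompressedBlock`, `compress_block`, and `decompress_block` are hypothetical. The field name `raw_size` mirrors the original-length field of the data block header described with fig. 3.

```python
import zlib
from dataclasses import dataclass


@dataclass
class CompressedBlock:
    raw_size: int   # length of the data before compression
    payload: bytes  # compressed bytes actually kept in memory or on disk


def compress_block(data: bytes) -> CompressedBlock:
    """Compress a block payload, recording its original length."""
    return CompressedBlock(raw_size=len(data), payload=zlib.compress(data))


def decompress_block(block: CompressedBlock) -> bytes:
    """Restore the payload and check it against the recorded original length."""
    data = zlib.decompress(block.payload)
    if len(data) != block.raw_size:
        raise ValueError("decompressed length does not match raw_size")
    return data
```

Recording the original length next to the compressed payload lets a reader size its output buffer and detect truncated blocks, which is one plausible reason for a header to carry such a field.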
Preferably, referring to fig. 2, the temporary storage module may be configured with a monitoring and diagnosis function. That is, the database management system can control the temporary storage module to determine and update, in real time, the storage state of the intermediate results of each operator. The storage state may include the space threshold corresponding to the operator, the space actually occupied in memory by the intermediate results the operator has generated, the space actually occupied on disk by those intermediate results, and the like. Based on the storage state of each operator's intermediate results, the database management system can also determine the execution status of SQL statements, memory usage, and the like, and take corrective action when an abnormal condition arises.
Preferably, referring to fig. 2, the temporary storage module may be configured with a data IO function, through which operators write and read data.
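The per-operator storage state described above can be pictured as a small bookkeeping structure. The sketch below is only an assumption about how such tracking might look: `StorageState`, `StorageMonitor`, and their methods are hypothetical names, and sizes are plain byte counts supplied by the caller.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StorageState:
    space_threshold: int   # memory budget assigned to the operator, in bytes
    memory_bytes: int = 0  # space its intermediate results occupy in memory
    disk_bytes: int = 0    # space its intermediate results occupy on disk


class StorageMonitor:
    """Hypothetical monitor that keeps one StorageState per operator."""

    def __init__(self) -> None:
        self.states: Dict[str, StorageState] = {}

    def register(self, operator_id: str, space_threshold: int) -> None:
        self.states[operator_id] = StorageState(space_threshold)

    def on_write(self, operator_id: str, nbytes: int) -> None:
        """Update the state when an operator writes nbytes to memory."""
        self.states[operator_id].memory_bytes += nbytes

    def on_dump(self, operator_id: str, nbytes: int) -> None:
        """Update the state when nbytes move from memory to disk."""
        state = self.states[operator_id]
        state.memory_bytes -= nbytes
        state.disk_bytes += nbytes

    def at_or_over_threshold(self) -> List[str]:
        """Operators whose in-memory usage has reached their threshold."""
        return [op for op, s in self.states.items()
                if s.memory_bytes >= s.space_threshold]
```

A diagnosis step could poll `at_or_over_threshold()` to decide which operators need their intermediate results dumped, though the specification leaves the corrective action itself open.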
According to the data processing method provided by the embodiments of the present specification, in response to an operator generating an intermediate result while executing an SQL statement, the operator is controlled to call a temporary storage module through a corresponding format interface, so that the intermediate result is stored through the temporary storage module in the format corresponding to the operator. The temporary storage module is configured with a plurality of format interfaces, and different format interfaces cause operators calling the temporary storage module to store data in different formats. In other words, an operator can call the temporary storage module through different format interfaces to customize different data formats, so that the intermediate result is stored in the format corresponding to the operator. This meets the format requirements of different operators and the requirements of different scenarios, and improves the storage efficiency of intermediate results.
With continued reference to fig. 2, the temporary storage module in some embodiments of the present disclosure also has a data block management function and an inter-block random access function, both of which can be implemented using a hierarchical storage form. That is, controlling the operator to store the intermediate result through the temporary storage module in the format corresponding to the operator may include: controlling the operator to store the intermediate result, through the temporary storage module, in a hierarchical storage form in the format corresponding to the operator.
Here the hierarchical storage form includes: storing data into data blocks, constructing index information for each data block, and storing the index information of each data block into an index block. That is, when data is stored, it is written into a plurality of data blocks in sequence; when one data block is full, index information is built for that data block and subsequent data is stored in the next data block. When more than one data block holds stored data, the index information of those data blocks can be stored into one index block.
Further, the hierarchical storage form may further include: index information is built for each index block, and the index information of each index block is stored. It should be appreciated that the index information for each index block may also be stored into another index block.
In practice, data blocks and index blocks are both macro blocks, in memory or on disk; they are simply named differently according to what they store.
Referring to fig. 3, a hierarchical structure of data blocks and index blocks formed by the hierarchical storage form is shown, where Block is a data block, Index Block is an index block, and Block Index is index information. The hierarchical structure is a multi-layer tree. The root node can be an index block in which the index information of at least one index block is stored, and the end nodes are a plurality of data blocks. At least one layer of index block nodes exists between each end node and the root node; the parent of an index block node is either another index block node or the root node, and its children are either index block nodes or end nodes. In fig. 3, there is one layer of index block nodes between the root node and the end nodes. It should be appreciated that the root node may be stored in memory while the other nodes may be stored on disk.
With continued reference to fig. 3, the data block includes at least one of the following information: block category magic, block number block_id, data amount cnt, original data length raw_size, and data writing position payload. The block header (Block Header) of the data block consists of the block category magic, the block number block_id, the data amount cnt, and the original data length raw_size. The block category magic is used to verify data and to distinguish different kinds of blocks. The block numbers block_id of different data blocks increase in write order. The data amount cnt may be the number of data items, such as the number of rows, and the sum of a data block's block number block_id and data amount cnt is the block number block_id of the next data block. The original data length raw_size is the length, before compression, of the data written in the data block. The data itself is written at the data writing position payload.
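The header fields and the numbering rule (the next block's block_id equals the current block_id plus cnt) can be sketched as a small Python data class. The field names follow fig. 3 as described above; the constant `DATA_BLOCK_MAGIC`, its value, and the methods `append_row`, `next_block_id`, and `contains` are assumptions for illustration. One reading of the numbering rule, adopted here, is that a block's block_id doubles as the sequence number of its first row, provided the first block starts at 0.

```python
from dataclasses import dataclass

DATA_BLOCK_MAGIC = 0xDA7A  # hypothetical value; the specification gives none


@dataclass
class DataBlock:
    magic: int            # block category, used for checking and distinguishing blocks
    block_id: int         # block number, increasing in write order
    cnt: int = 0          # number of data items (for example rows) in the block
    raw_size: int = 0     # length of the written data before compression
    payload: bytes = b""  # the written data itself

    def append_row(self, row: bytes) -> None:
        """Write one row and keep the header fields consistent."""
        self.payload += row
        self.cnt += 1
        self.raw_size += len(row)

    def next_block_id(self) -> int:
        """Numbering rule from the text: next block_id = block_id + cnt."""
        return self.block_id + self.cnt

    def contains(self, row_id: int) -> bool:
        """Assumed reading: rows block_id .. block_id + cnt - 1 live in this block."""
        return self.block_id <= row_id < self.block_id + self.cnt
```

Under this reading, locating a row by its sequence number only requires comparing it with block_id values, without scanning any payload.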
With continued reference to fig. 3, the index information includes at least one of the following information: the block number block_id of the indexed block, the block category is_idx_block of the indexed block, the hardware on_disk on which the indexed block resides, the disk offset or memory pointer offset/point, and the compressed length of the indexed block. The indexed block category is_idx_block indicates whether the indexed block is a data block or an index block; the hardware on which the indexed block resides may be a disk or a memory.
With continued reference to fig. 3, the index block includes at least one of the following information: block type magic, index information amount cnt, and index information. The block type magic is used for checking data and distinguishing different blocks; the index information amount cnt may be the number of Block Index entries within the index block.
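The index information and index block described above, together with the one-layer hierarchy of fig. 3, can be sketched as follows. This is an illustrative sketch only: the field name `length`, the constant `INDEX_MAGIC`, the helper `build_root`, the use of the first covered block number as the block_id of an index block, and the use of a list position as the offset are all assumptions that the specification does not state.

```python
from dataclasses import dataclass, field

INDEX_MAGIC = 0x1D8  # hypothetical magic value for index blocks


@dataclass
class BlockIndex:
    block_id: int        # block number of the indexed block
    is_idx_block: bool   # True if the indexed block is itself an index block
    on_disk: bool        # True if the indexed block resides on disk
    offset: int          # disk offset (a memory pointer when in memory)
    length: int          # compressed length of the indexed block


@dataclass
class IndexBlock:
    magic: int
    entries: list[BlockIndex] = field(default_factory=list)

    @property
    def cnt(self) -> int:
        """Index information amount: number of Block Index entries."""
        return len(self.entries)


def build_root(data_indexes: list[BlockIndex],
               fanout: int) -> tuple[IndexBlock, list[IndexBlock]]:
    """Group data-block indexes into index blocks, then index those in a root."""
    middle = [IndexBlock(INDEX_MAGIC, data_indexes[i:i + fanout])
              for i in range(0, len(data_indexes), fanout)]
    root = IndexBlock(INDEX_MAGIC)
    for pos, blk in enumerate(middle):
        # Assumption: an index block is identified by the first block it covers,
        # and `offset` is its position in `middle`; length 0 is a placeholder.
        root.entries.append(BlockIndex(blk.entries[0].block_id, True, True, pos, 0))
    return root, middle
```

With five data blocks and a fanout of 2, this yields three index block nodes under one root, which matches the shape of the tree in fig. 3.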
Illustratively, the controlling the operator to store the intermediate result in the hierarchical storage form according to the format corresponding to the operator through the temporary storage module may include:
First, the operator is controlled to write the intermediate result into a current data block in a memory according to the format corresponding to the operator, wherein the current data block is the data block that the operator is currently reading or writing.
Then, in response to the current data block being full, the operator is controlled to allocate a new data block through the temporary storage module and to write the intermediate result into the new data block according to the format corresponding to the operator. For example, in response to the space occupied by the operator data blocks in the memory reaching the space threshold corresponding to the operator, the operator is controlled to dump the operator data blocks to a disk through the temporary storage module, construct index information for the operator data blocks, and allocate a new data block in the memory; in response to the space occupied by the operator data blocks in the memory not reaching the space threshold corresponding to the operator, the operator is controlled to allocate a new data block in the memory through the temporary storage module. The operator data blocks are the data blocks in the memory that store the intermediate results generated by the operator.
Referring to fig. 4, the above exemplary process is described in detail below, taking a row storage format as an example.
First, the size of the memory space required for the write, namely the memory space required by each row of data, is calculated according to the row storage format.
Then, it is determined whether the current Block (i.e., the current data block) can be written: if the memory space of the current Block is larger than the memory space required by the row of data, the current Block can be written; otherwise, the current Block cannot be written.
Then, if the current Block can be written, the data is written to the current Block.
Then, if the current Block cannot be written, it is determined whether the data needs to be flushed to disk: if the space occupied by the data blocks in the memory that store the intermediate results generated by the operator reaches the space threshold of the operator, a flush to disk is required; otherwise, no flush to disk is required.
Then, if no flush to disk is required, a new Block is allocated in the memory, and the data is written to the new Block.
Finally, if a flush to disk is required, the data blocks in the memory that store the intermediate results generated by the operator are dumped to the disk, an Index is constructed, a new Block is allocated in the memory, and the data is written to the new Block.
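The write flow of fig. 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: the class names `Block` and `SpillingStore`, a fixed `block_size`, measuring occupied space as the allocated capacity of the in-memory blocks, and a dictionary standing in for the disk are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: int
    capacity: int
    rows: list[bytes] = field(default_factory=list)
    used: int = 0

    def can_write(self, need: int) -> bool:
        # Follows the text: writable only if the space left is larger than needed.
        return self.capacity - self.used > need


class SpillingStore:
    def __init__(self, block_size: int, mem_threshold: int):
        self.block_size = block_size
        self.mem_threshold = mem_threshold        # space threshold of the operator
        self.rows_written = 0
        self.in_memory = [Block(0, block_size)]   # operator data blocks in memory
        self.on_disk: dict[int, Block] = {}       # stand-in for the disk
        self.index: list[int] = []                # block numbers of dumped blocks

    def write_row(self, row: bytes) -> None:
        need = len(row)                           # space required by this row
        if need >= self.block_size:
            raise ValueError("row does not fit into an empty block")
        if not self.in_memory[-1].can_write(need):
            occupied = len(self.in_memory) * self.block_size
            if occupied >= self.mem_threshold:    # threshold reached: flush to disk
                for blk in self.in_memory:
                    self.on_disk[blk.block_id] = blk
                    self.index.append(blk.block_id)   # construct index information
                self.in_memory.clear()
            # In both branches a new Block is allocated in memory.
            self.in_memory.append(Block(self.rows_written, self.block_size))
        current = self.in_memory[-1]
        current.rows.append(row)
        current.used += need
        self.rows_written += 1
```

With `block_size=8` and `mem_threshold=16`, writing five 3-byte rows fills two blocks, dumps both to the stand-in disk on the fifth write, and leaves one fresh block in memory.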
For further example, after the operator is controlled to store the intermediate result in a hierarchical storage form according to a format corresponding to the operator through the temporary storage module, the intermediate result may be read randomly in the following manner:
First, the operator is controlled to read target data in the current data block in the memory.
Then, in response to the current data block not containing the target data, the operator is controlled to determine, through the temporary storage module, the data block in which the target data is located by means of the index block, and to read the target data from that data block.
It should be appreciated that, in response to the data block in which the target data is located being on the disk, that data block may be loaded into the memory to facilitate data reading.
It should be appreciated that, in response to the target data being distributed across a plurality of data blocks, the operator may be controlled to keep the plurality of data blocks in the memory through the temporary storage module while the target data is being read. For example, the life cycle of the data Blocks being read is controlled through the age and the holder, so that when a task reads across Blocks, the memory does not release the related data blocks prematurely, and normal execution of the task is ensured.
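One way to picture the age and holder mechanism is the sketch below. The specification does not define either term, so this is purely an assumed reading: `age` is treated as a last-access counter used to choose which block to release, and a holder is treated as a pin that prevents release while a cross-Block read is in progress. The class `BlockCache` and its methods are hypothetical.

```python
from contextlib import contextmanager


class BlockCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks: dict[int, str] = {}   # block_id -> payload
        self.age: dict[int, int] = {}      # block_id -> tick of last load
        self.pins: dict[int, int] = {}     # block_id -> number of active holders
        self.tick = 0

    def load(self, block_id: int, payload: str) -> None:
        # Release the oldest blocks that no holder is pinning.
        while len(self.blocks) >= self.capacity and block_id not in self.blocks:
            victims = [b for b in self.blocks if self.pins.get(b, 0) == 0]
            if not victims:
                break  # every block is held; grow temporarily instead of releasing
            oldest = min(victims, key=self.age.__getitem__)
            del self.blocks[oldest]
            del self.age[oldest]
        self.tick += 1
        self.blocks[block_id] = payload
        self.age[block_id] = self.tick

    @contextmanager
    def hold(self, block_ids: list[int]):
        """Pin the given blocks for the duration of a cross-Block read."""
        for b in block_ids:
            self.pins[b] = self.pins.get(b, 0) + 1
        try:
            yield [self.blocks[b] for b in block_ids]
        finally:
            for b in block_ids:
                self.pins[b] -= 1
```

While a task holds blocks 0 and 1, loading a further block does not release them; once the holder is dropped they become eligible for release again.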
Referring to fig. 5, the above exemplary process is described in detail below, taking a row storage format as an example.
First, it is determined whether the row to be read is in the current Block (i.e., the current data block), and if so, the data is read.
Then, if the row to be read is not in the current Block, the index is searched to determine whether the Block in which the row is located is in the memory, and if so, that Block is searched to read the data.
Finally, if the Block in which the row is located is not in the memory, the Block is loaded from the disk into the memory, and the Block is then searched in the memory to read the data.
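The random read flow of fig. 5 can be sketched as follows. This is an illustrative sketch with hypothetical names (`BlockReader`, `read_row`); dictionaries stand in for the memory and the disk, the sorted list of block numbers stands in for the index, and block_id is assumed to equal the number of the first row in the block. Making the located Block the new current Block is also an assumption.

```python
from bisect import bisect_right


class BlockReader:
    def __init__(self, memory: dict[int, list[bytes]],
                 disk: dict[int, list[bytes]], current_id: int):
        self.memory = memory          # block_id -> rows, blocks held in memory
        self.disk = disk              # block_id -> rows, stand-in for the disk
        self.current_id = current_id  # block the operator is currently reading
        self.loads = 0                # number of disk-to-memory loads

    def _index_lookup(self, row_no: int) -> int:
        starts = sorted(set(self.memory) | set(self.disk))
        pos = bisect_right(starts, row_no) - 1
        if pos < 0:
            raise IndexError(f"row {row_no} not found")
        return starts[pos]

    def read_row(self, row_no: int) -> bytes:
        # Step 1: is the row in the current Block?
        cur = self.memory.get(self.current_id)
        if cur is not None and self.current_id <= row_no < self.current_id + len(cur):
            return cur[row_no - self.current_id]
        # Step 2: search the index for the Block that holds the row.
        block_id = self._index_lookup(row_no)
        # Step 3: load the Block from disk if it is not in memory.
        if block_id not in self.memory:
            self.memory[block_id] = self.disk[block_id]
            self.loads += 1
        block = self.memory[block_id]
        if row_no - block_id >= len(block):
            raise IndexError(f"row {row_no} not found")
        self.current_id = block_id    # assumption: the found Block becomes current
        return block[row_no - block_id]
```

A read that hits the current Block costs no index search; a read of a row in a dumped Block costs one index search and one load, after which neighbouring rows are served from memory.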
In this embodiment, the temporary storage module can support both sequential access and random access through the hierarchical storage form, so that it can meet the random access requirements of operators such as window functions; this broadens the application range of the temporary storage module and improves the management performance of a database management system equipped with the temporary storage module.
Fig. 6 is a schematic block diagram of a device provided in an exemplary embodiment. Referring to fig. 6, at the hardware level, the device includes a processor 602, an internal bus 604, a network interface 606, a memory 608, and a non-volatile memory 610, and may also include hardware required for other tasks. One or more embodiments of the present specification may be implemented in a software-based manner, for example by the processor 602 reading a corresponding computer program from the non-volatile memory 610 into the memory 608 and then running it. Of course, in addition to a software implementation, one or more embodiments of the present specification do not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, but may also be hardware or a logic device.
Referring to fig. 7, the data processing apparatus may be applied to the device shown in fig. 6 to implement the technical solution of the present specification. The device comprises:
The storage module 701 is configured to respond to an intermediate result generated in the process of executing an SQL statement by an operator, and control the operator to call a temporary storage module through a corresponding format interface, so that the intermediate result is stored by the temporary storage module according to a format corresponding to the operator;
the temporary storage module is configured with a plurality of format interfaces, and different format interfaces are used for enabling operators calling the temporary storage module to store data according to different formats.
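The idea of a temporary storage module with several format interfaces can be illustrated with the sketch below. It is only a sketch under assumptions: the names `RowFormat`, `ColumnFormat` and `FormatStore`, the choice of a row format and a column format as the two interfaces, the operator names, and rows made of two 64-bit integers are all hypothetical and not taken from the specification.

```python
import struct

Row = tuple[int, int]


class RowFormat:
    """Stores each row contiguously: a0 b0 a1 b1 ..."""

    def encode(self, rows: list[Row]) -> bytes:
        return b"".join(struct.pack("<qq", a, b) for a, b in rows)

    def decode(self, payload: bytes) -> list[Row]:
        return [struct.unpack_from("<qq", payload, off)
                for off in range(0, len(payload), 16)]


class ColumnFormat:
    """Stores each column contiguously: a0 a1 ... b0 b1 ..."""

    def encode(self, rows: list[Row]) -> bytes:
        col_a = [a for a, _ in rows]
        col_b = [b for _, b in rows]
        return struct.pack(f"<{2 * len(rows)}q", *col_a, *col_b)

    def decode(self, payload: bytes) -> list[Row]:
        n = len(payload) // 16
        vals = struct.unpack(f"<{2 * n}q", payload)
        return list(zip(vals[:n], vals[n:]))


class FormatStore:
    """Temporary store exposing one interface per storage format."""

    def __init__(self):
        self.formats = {"row": RowFormat(), "column": ColumnFormat()}
        self.blobs: dict[str, tuple[str, bytes]] = {}

    def store(self, operator: str, fmt: str, rows: list[Row]) -> None:
        self.blobs[operator] = (fmt, self.formats[fmt].encode(rows))

    def load(self, operator: str) -> list[Row]:
        fmt, payload = self.blobs[operator]
        return self.formats[fmt].decode(payload)
```

Two operators can then store the same intermediate result through different interfaces, and each reads back its own rows although the stored bytes differ.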
In a possible embodiment of the present specification, the storage module is configured to:
control the operator to store, through the temporary storage module, the intermediate result in a hierarchical storage form according to the format corresponding to the operator;
wherein the hierarchical storage form comprises: storing data into data blocks, constructing index information for each data block, and storing the index information of each data block into an index block.
In a possible embodiment of the present specification, the hierarchical storage form further includes: index information is built for each index block, and the index information of each index block is stored.
In a possible embodiment of the present specification, the data block includes at least one of the following information: block type, block number, amount of data, original length of data, and data writing location.
In a possible embodiment of the present specification, the index information includes at least one of the following information: the index block number, the index block category, the hardware of the index block and the compressed length of the index block.
In a possible embodiment of the present specification, the index block includes at least one of the following information: block category, index information amount, index information.
In a possible embodiment of the present specification, when controlling the operator to store, through the temporary storage module, the intermediate result in a hierarchical storage form according to the format corresponding to the operator, the storage module is configured to:
control the operator to write the intermediate result into a current data block in a memory according to the format corresponding to the operator, wherein the current data block is the data block that the operator is currently reading or writing;
in response to the current data block being full, control the operator to allocate a new data block through the temporary storage module and to write the intermediate result into the new data block according to the format corresponding to the operator.
In a possible embodiment of the present specification, when controlling the operator to allocate a new data block through the temporary storage module, the storage module is configured to:
in response to the space occupied by the operator data blocks in the memory reaching the space threshold corresponding to the operator, control the operator to dump the operator data blocks to a disk through the temporary storage module, construct index information for the operator data blocks, and allocate a new data block in the memory;
in response to the space occupied by the operator data blocks in the memory not reaching the space threshold corresponding to the operator, control the operator to allocate a new data block in the memory through the temporary storage module;
wherein the operator data blocks are the data blocks in the memory that store the intermediate results generated by the operator.
In a possible embodiment of the present specification, the apparatus further comprises a reading module for:
control the operator to read target data in the current data block in the memory;
in response to the current data block not containing the target data, control the operator to determine, through the temporary storage module, the data block in which the target data is located by means of the index block, and to read the target data from that data block.
In a possible embodiment of the present specification, the apparatus further includes a loading module configured to:
in response to the data block in which the target data is located being on the disk, load that data block into the memory.
In a possible embodiment of the present specification, the apparatus further includes a cross-block reading module configured to:
in response to the target data being distributed across a plurality of data blocks, control the operator to keep the plurality of data blocks in the memory through the temporary storage module while the target data is being read.
In a possible embodiment of the present specification, the storage module is configured to:
control the operator to compress and store the intermediate result, through the temporary storage module, according to the format corresponding to the operator.
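Compressed storage ties in with two fields described earlier: the data original length raw_size kept in the block header and the compressed length kept in the index information. The sketch below shows one way these could be recorded. The specification does not name a compression algorithm; zlib is used here only as a stand-in, and the function names and the dictionary layout are hypothetical.

```python
import zlib


def compress_block(payload: bytes) -> dict:
    """Compress a block payload and record both lengths."""
    compressed = zlib.compress(payload)
    return {
        "raw_size": len(payload),    # length before compression
        "length": len(compressed),   # compressed length, as kept in the index
        "data": compressed,
    }


def decompress_block(block: dict) -> bytes:
    """Restore the payload and check it against the recorded raw_size."""
    payload = zlib.decompress(block["data"])
    if len(payload) != block["raw_size"]:
        raise ValueError("decompressed length does not match raw_size")
    return payload
```

Recording raw_size lets a reader validate the restored data, while the compressed length tells it how many bytes to fetch from the disk offset stored in the index information.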
In a possible embodiment of the present specification, the apparatus further includes a recording module configured to:
control the temporary storage module to determine and update, in real time, the storage state of the intermediate result of the operator.
One or more embodiments of the present specification also provide a database management system that includes a temporary storage module configured with a plurality of format interfaces, different format interfaces for causing operators calling the temporary storage module to store data in different formats.
In one possible embodiment of the present specification, the temporary storage module is configured to store data in a hierarchical storage form;
wherein the hierarchical storage form comprises: storing data into data blocks, constructing index information for each data block, and storing the index information of each data block into an index block.
The database management system may be applied to the electronic device shown in fig. 6, or to other devices.
Further details and functions of the temporary storage module in the database management system have been described in detail in the data processing method of the first aspect and are not repeated here.
One or more embodiments of the present specification also provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first aspect.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present specification. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
The foregoing description of the preferred embodiments is merely intended to illustrate the embodiments of the present invention and is not intended to limit them to the particular embodiments described.

Claims (18)

1. A method of data processing, the method comprising:
in response to an intermediate result generated in the process of executing an SQL statement by an operator, controlling the operator to call a temporary storage module through a corresponding format interface, so as to store the intermediate result through the temporary storage module according to a format corresponding to the operator;
the temporary storage module is configured with a plurality of format interfaces, and different format interfaces are used for enabling operators calling the temporary storage module to store data according to different formats.
2. The data processing method according to claim 1, wherein the controlling the operator to store the intermediate result in the format corresponding to the operator through the temporary storage module includes:
controlling the operator to store, through the temporary storage module, the intermediate result in a hierarchical storage form according to the format corresponding to the operator;
wherein the hierarchical storage form comprises: storing data into data blocks, constructing index information for each data block, and storing the index information of each data block into an index block.
3. The data processing method of claim 2, the hierarchical storage form further comprising: index information is built for each index block, and the index information of each index block is stored.
4. The data processing method of claim 2, the data block comprising at least one of the following information: block type, block number, amount of data, original length of data, and data writing location.
5. The data processing method of claim 2, the index information comprising at least one of: the index block number, the index block category, the hardware of the index block and the compressed length of the index block.
6. The data processing method of claim 2, the index block comprising at least one of the following information: block category, index information amount, index information.
7. The data processing method according to any one of claims 2 to 6, wherein the controlling the operator to store, by the temporary storage module, the intermediate result in a hierarchical storage form in a format corresponding to the operator, includes:
controlling the operator to write the intermediate result into a current data block in a memory according to the format corresponding to the operator, wherein the current data block is the data block that the operator is currently reading or writing;
in response to the current data block being full, controlling the operator to allocate a new data block through the temporary storage module and to write the intermediate result into the new data block according to the format corresponding to the operator.
8. The data processing method of claim 7, the controlling the operator to allocate new data blocks through the temporary storage module, comprising:
in response to the space occupied by the operator data blocks in the memory reaching the space threshold corresponding to the operator, controlling the operator to dump the operator data blocks to a disk through the temporary storage module, construct index information for the operator data blocks, and allocate a new data block in the memory;
in response to the space occupied by the operator data blocks in the memory not reaching the space threshold corresponding to the operator, controlling the operator to allocate a new data block in the memory through the temporary storage module;
wherein the operator data blocks are the data blocks in the memory that store the intermediate results generated by the operator.
9. The data processing method of claim 7, the method further comprising:
controlling the operator to read target data in the current data block in the memory;
in response to the current data block not containing the target data, controlling the operator to determine, through the temporary storage module, the data block in which the target data is located by means of the index block, and to read the target data from that data block.
10. The data processing method of claim 9, the method further comprising:
in response to the data block in which the target data is located being on the disk, loading that data block into the memory.
11. The data processing method of claim 9, the method further comprising:
in response to the target data being distributed across a plurality of data blocks, controlling the operator to keep the plurality of data blocks in the memory through the temporary storage module while the target data is being read.
12. The data processing method according to claim 1, wherein the controlling the operator to store the intermediate result in the format corresponding to the operator through the temporary storage module includes:
controlling the operator to compress and store the intermediate result, through the temporary storage module, according to the format corresponding to the operator.
13. The data processing method of claim 1, the method further comprising:
controlling the temporary storage module to determine and update, in real time, the storage state of the intermediate result of the operator.
14. A data processing apparatus, the apparatus comprising:
a storage module, configured to, in response to an intermediate result generated in the process of executing an SQL statement by an operator, control the operator to call a temporary storage module through a corresponding format interface, so as to store the intermediate result through the temporary storage module according to a format corresponding to the operator;
the temporary storage module is configured with a plurality of format interfaces, and different format interfaces are used for enabling operators calling the temporary storage module to store data according to different formats.
15. A database management system, the system comprising a temporary storage module configured with a plurality of format interfaces, different format interfaces for causing operators calling the temporary storage module to store data in different formats.
16. The database management system of claim 15, the temporary storage module being configured to store data in a hierarchical storage form;
wherein the hierarchical storage form comprises: storing data into data blocks, constructing index information for each data block, and storing the index information of each data block into an index block.
17. An electronic device, comprising:
a processor;
A memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of any one of claims 1-13 by executing the executable instructions.
18. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1-13.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311793334.4A CN118132600A (en) 2023-12-22 2023-12-22 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311793334.4A CN118132600A (en) 2023-12-22 2023-12-22 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118132600A true CN118132600A (en) 2024-06-04

Family

ID=91244746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311793334.4A Pending CN118132600A (en) 2023-12-22 2023-12-22 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118132600A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination