CN111459884A - Data processing method and device, computer equipment and storage medium

Info

Publication number: CN111459884A
Application number: CN202010270580.1A
Authority: CN
Inventors: 陈勇华
Original assignee: Guangzhou Lizhi Network Technology Co ltd
Current assignee: Guangzhou Lizhi Network Technology Co ltd
Priority date: 2019-03-26
Filing date: 2019-03-26
Publication date: 2020-07-28
Anticipated expiration: 2039-03-26
Also published as: CN109977078A; CN111459885B; CN111459885A; CN109977078B; CN111459884B

Abstract

The embodiment of the invention provides a data processing method, a data processing device, computer equipment and a storage medium, wherein a file is divided into at least two logical partitions, the at least two logical partitions are independently mapped to a memory, the at least two logical partitions are provided with index files, and the method comprises the following steps: receiving data to be stored; traversing the at least two logic partitions to search for a free block which can store the data, wherein the free block is an area which does not store records and has a continuous offset range in the logic partitions; storing the data into the free block to generate a new record; recording index information between the record and an offset range occupied by the record in the free block in the index file; updating the offset range of the free block map to an offset range not occupied by the record. The operations among the logical partitions are not affected mutually, and the local records of the large files can be flexibly processed.

Description

Data processing method and device, computer equipment and storage medium

This application is a divisional application of the Chinese patent application filed on March 26, 2019, with application number 201910230878.7 and entitled 'a data processing method, a device, computer equipment and a storage medium'.

Technical Field

The present invention relates to the field of database technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.

Background

In the current internet industry, the volume of data of all kinds keeps growing. Whether the data is stored in databases such as MySQL (a relational database management system) or Oracle (a relational database management system), or in big data products such as HDFS (Hadoop distributed file system) or Elasticsearch (a Lucene-based search server), the amount of stored data of a given type grows over time, and a single file often becomes very large, reaching tens of GB (gigabytes), hundreds of GB, or even several TB (terabytes).

At present, a file is usually read and written by loading the whole file into memory. For oversized files of tens of GB, hundreds of GB, or even TB scale, however, the memory size is a limiting factor: the whole file is difficult to load into memory, and even if it can be loaded, it occupies too many resources, so operation efficiency is low.

Disclosure of Invention

The embodiment of the invention discloses a data processing method and device, computer equipment and a storage medium, aiming at solving the problem of low operation efficiency caused by large volume of a file.

In a first aspect, an embodiment of the present invention provides a data processing method, where a file is divided into at least two logical partitions, the at least two logical partitions are independently mapped to a memory, and the at least two logical partitions have index files, the method including:

receiving data to be stored;

traversing the at least two logical partitions to search for a free block which can store the data, wherein the free block is an area in a logical partition which does not store records and has a continuous offset range;

storing the data into the free block to generate a new record;

recording index information between the record and an offset range occupied by the record in the free block in the index file;

updating the offset range mapped by the free block to the offset range not occupied by the record.

Optionally, said traversing said at least two logical partitions to find a free block that can store said data comprises:

determining the largest free block in the logical partition as a reference block;

if the length of the data is smaller than or equal to the size of the reference block, determining that the logical partition has a free block capable of storing the data;

determining a free block meeting a preset storage condition in the logical partition, wherein the storage condition is that the size of the free block is larger than the length of the data, and the difference between the size of the free block and the length of the data is minimum;

and if the length of the data is larger than the size of the reference block, determining that the logical partition does not have a free block capable of storing the data.

Optionally, the index file is a B+ tree data structure, where the B+ tree data structure includes leaf nodes and non-leaf nodes, the non-leaf nodes are used to store reference information of the leaf nodes, and the leaf nodes are used to store the index information of the record.

Optionally, the method further comprises:

receiving an update operation acting on a record;

updating the record according to the updating operation to determine an original record and a new record, wherein the original record is a record before updating, and the new record is an updated record;

if the length of the new record is less than or equal to the length of the original record, storing the new record in an offset range occupied by the original record;

recording index information between the new record and an offset range occupied by the new record in the offset range of the original record in the index file;

adding a new free block, wherein the free block is mapped to the offset range not occupied by the new record;

if the length of the new record is larger than that of the original record, traversing the at least two logical partitions to search for a free block capable of storing the new record;

storing the new record into the free block;

recording index information between the new record and an offset range occupied by the new record in the free block in the index file;

updating the offset range mapped by the free block to the offset range not occupied by the new record;

deleting the index information between the original record and the offset range occupied by the original record in the index file;

designating the offset range occupied by the original record as a new free block.

Optionally, said traversing said at least two logical partitions to find a free block that can store said new record comprises:

determining the largest free block in the logical partition as a reference block;

if the length of the new record is smaller than or equal to the size of the reference block, determining that the logical partition has a free block capable of storing the new record;

determining a free block meeting a preset storage condition in the logical partition, wherein the storage condition is that the size of the free block is larger than the length of the new record, and the difference between the size of the free block and the length of the new record is minimum;

and if the length of the new record is larger than the size of the reference block, determining that the logical partition does not have a free block capable of storing the data.

Optionally, the method further comprises:

receiving a delete operation acting on a record;

according to the deleting operation, deleting the index information between the record and the offset range occupied by the record in the index file;

designating the offset range as a new free block.

Optionally, the method further comprises:

counting storage characteristic values of free blocks in the logical partitions;

if the storage characteristic value meets a preset expansion condition, expanding the offset range of the file;

adding a logical partition in the expanded offset range;

the storage characteristic value comprises a total value of the size of the free blocks and/or the number of characteristic free blocks, and the size of the characteristic free blocks is larger than a preset first threshold value;

the expansion condition includes that the total value is smaller than a preset second threshold value and/or the number is smaller than a preset third threshold value.
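As a non-authoritative illustration of the expansion condition described above, the following Python sketch computes the two storage characteristic values from a list of free block sizes. The function name `needs_expansion` and its parameter names are assumptions made for this example only, and the "and/or" of the text is read here as a logical "or".

```python
from typing import List


def needs_expansion(
    free_block_sizes: List[int],
    first_threshold: int,
    second_threshold: int,
    third_threshold: int,
) -> bool:
    """Decide whether the file should be expanded (hypothetical helper)."""
    # Storage characteristic value 1: total size of all free blocks.
    total_size = sum(free_block_sizes)
    # Storage characteristic value 2: number of "characteristic" free blocks,
    # i.e. free blocks whose size is larger than the first threshold.
    characteristic_count = sum(
        1 for size in free_block_sizes if size > first_threshold
    )
    # Assumption: either condition alone is enough to trigger an expansion.
    return total_size < second_threshold or characteristic_count < third_threshold
```

With this reading, a partition set whose free space is plentiful in total but fragmented into small blocks still triggers an expansion, because few blocks exceed the first threshold.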

Optionally, the method further comprises:

caching the records in the logical partition into a memory;

writing the record in the memory into the logical partition so that the record occupies a continuous offset range in the logical partition;

recording, in the index file, index information between the record and the offset range occupied by the record in the logical partition;

designating the offset range in the logical partition where no record is stored as a new free block.
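The defragmentation steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: a `bytearray` stands in for a memory-mapped logical partition, a dictionary from record identifier to a half-open offset range `(start, end)` stands in for the index file, and the function name `compact` is hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open offset range [start, end)


def compact(partition: bytearray, index: Dict[str, Range]) -> List[Range]:
    """Rewrite all records contiguously and return the resulting free blocks."""
    # Step 1: cache the records in memory, ordered by their current offset.
    cached = [
        (record_id, bytes(partition[start:end]))
        for record_id, (start, end) in sorted(index.items(), key=lambda kv: kv[1][0])
    ]
    # Step 2: write the cached records back so they occupy a continuous range.
    cursor = 0
    for record_id, data in cached:
        partition[cursor:cursor + len(data)] = data
        # Step 3: record the new offset range of the record in the index.
        index[record_id] = (cursor, cursor + len(data))
        cursor += len(data)
    # Step 4: the range that stores no record becomes one new free block.
    if cursor < len(partition):
        return [(cursor, len(partition))]
    return []
```

Bytes left behind in the returned free block are stale data; they are treated as invalid rather than erased.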

Optionally, the method further comprises:

determining a first partition and a second partition, wherein the first partition is the logical partition into which records are to be migrated, and the second partition is the logical partition out of which records are to be migrated;

reading all records in the second partition;

writing the record into a free block of the first partition;

updating, in the index file, the index information between the record and the offset range occupied by the record in the free block;

determining a target offset range to generate a new free block, wherein the target offset range comprises an offset range occupied by the record in the second partition;

if no record is stored in the logical partition, cancelling the mapping of the logical partition to the memory;

and narrowing the offset range corresponding to the logical partition in the file to cancel the logical partition.

Optionally, the method further comprises:

receiving a query operation;

determining, in the index file, the offset range of the record to be queried according to the query operation;

reading the record in the offset range.

In a second aspect, an embodiment of the present invention further provides an apparatus for processing data, where a file is divided into at least two logical partitions, the at least two logical partitions are independently mapped to a memory, and the at least two logical partitions have index files, the apparatus including:

the data receiving module is used for receiving data to be stored;

the first free block searching module is used for traversing the at least two logical partitions to search for a free block which can store the data, wherein the free block is an area in a logical partition which does not store records and has a continuous offset range;

the data storage module is used for storing the data into the free block to generate a new record;

a first index information recording module, configured to record, in the index file, index information between the record and an offset range occupied by the record in the free block;

a first free block updating module, configured to update the offset range mapped by the free block to the offset range not occupied by the record.

Optionally, the first free block searching module includes:

a first reference block determination submodule, configured to determine a largest free block in the logical partition, as a reference block;

a first logical partition block determination sub-module, configured to determine that there is a free block in the logical partition, where the free block can store the data, if the length of the data is smaller than or equal to the size of the reference block;

a first storage condition determining submodule, configured to determine, in the logical partition, a free block that meets a preset storage condition, where the storage condition is that a size of the free block is larger than a length of the data, and a difference between the size of the free block and the length of the data is minimum;

and the second logical partition block determining submodule is used for determining that the logical partition does not have a free block which can store the data if the length of the data is larger than the size of the reference block.

Optionally, the apparatus further comprises:

an update operation receiving module for receiving an update operation applied to a record;

a record updating module, configured to update the record according to the updating operation to determine an original record and a new record, where the original record is a record before updating, and the new record is an updated record;

the first new record storage module is used for storing the new record in an offset range occupied by the original record if the length of the new record is less than or equal to the length of the original record;

a second index information recording module, configured to record, in the index file, index information between the new record and an offset range occupied by the new record in the offset range of the original record;

a free block adding module for adding a new free block, wherein the free block is mapped to the offset range not occupied by the new record;

a second free block searching module, configured to traverse the at least two logical partitions to search for a free block that can store the new record if the length of the new record is greater than the length of the original record;

a second new record storage module, configured to store the new record into the free block;

a third index information recording module, configured to record, in the index file, index information between the new record and an offset range occupied by the new record in the free block;

a second free block updating module, configured to update the offset range mapped by the free block to an offset range not occupied by the new record;

a first index information deleting module, configured to delete, in the index file, index information between the original record and an offset range occupied by the original record;

and the first free block generation module is used for designating the offset range occupied by the original record as a new free block.

Optionally, the second free block searching module includes:

a second reference block determination submodule, configured to determine a largest free block in the logical partition, as a reference block;

a third logical partition block determination submodule, configured to determine that there is a free block in the logical partition, where the free block is capable of storing the new record, if the length of the new record is smaller than or equal to the size of the reference block;

a second storage condition determining submodule, configured to determine, in the logical partition, a free block that meets a preset storage condition, where the storage condition is that a size of the free block is larger than a length of the new record, and a difference between the size of the free block and the length of the new record is minimum;

and the fourth logical partition block determination submodule is used for determining that the logical partition does not have a free block which can store the data if the length of the new record is larger than the size of the reference block.

Optionally, the apparatus further comprises:

a deletion operation receiving module for receiving a deletion operation acting on a record;

the second index information deleting module is used for deleting the index information between the record and the offset range occupied by the record in the index file according to the deleting operation;

and the second free block generation module is used for designating the offset range as a new free block.

Optionally, the apparatus further comprises:

the storage characteristic value counting module is used for counting the storage characteristic values of the free blocks in the logical partitions;

the file expansion module is used for expanding the offset range of the file if the storage characteristic value meets a preset expansion condition;

a logical partition adding module for adding a logical partition to the offset range.

Optionally, the apparatus further comprises:

the record caching module is used for caching the records in the logical partition into a memory;

a record rewriting module, configured to write a record in the memory into the logical partition, so that the record occupies a continuous offset range in the logical partition;

a fourth index information recording module, configured to record, in the index file, the index information between the record and the offset range occupied by the record in the logical partition;

and a third free block generation module, configured to designate the offset range in the logical partition where no record is stored as a new free block.

Optionally, the apparatus further comprises:

a migration partition determining module, configured to determine a first partition and a second partition, where the first partition is the logical partition into which records are to be migrated, and the second partition is the logical partition out of which records are to be migrated;

the record migration module is used for reading all records in the second partition;

a record immigration module, configured to write the record into a free block of the first partition;

the index information updating module is used for updating, in the index file, the index information between the record and the offset range occupied by the record in the free block;

a fourth free block generation module, configured to determine a target offset range to generate a new free block, where the target offset range includes an offset range occupied by the record in the second partition;

the memory mapping canceling module is used for canceling the mapping of the logical partition to the memory if the logical partition does not store the record;

and the logical partition canceling module is used for reducing the offset range corresponding to the logical partition in the file so as to cancel the logical partition.

Optionally, the apparatus further comprises:

the query operation receiving module is used for receiving query operation;

the offset range query module is used for determining the offset range expressed by the record to be queried in the index file according to the query operation;

a record reading module to read the record in the offset range.

In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the data processing method when executing the program.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the data processing method.

In the embodiment of the invention, the file is divided into at least two logical partitions, and each logical partition is independently mapped to the memory. The records in a logical partition can therefore be operated on independently, without loading the whole file into the memory, which reduces resource occupation; operations on different logical partitions do not affect one another, and local records of a large file can be processed flexibly.

In addition, free blocks are maintained in the logical partitions, so a free block can be flexibly selected when adding data, operations such as erasing data can be reduced, and the storage efficiency of the data is improved.

Drawings

Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;

fig. 2 is a schematic diagram illustrating mapping of a logical partition to a memory according to an embodiment of the present invention;

fig. 3 is a diagram illustrating an example of a B+ tree data structure according to an embodiment of the present invention;

fig. 4 is a flowchart of a data processing method according to a second embodiment of the present invention;

fig. 5 is a flowchart of a data processing method according to a third embodiment of the present invention;

fig. 6 is a flowchart of a data processing method according to a fourth embodiment of the present invention;

fig. 7 is a flowchart of a data processing method according to a fifth embodiment of the present invention;

fig. 8A to 8E are exemplary diagrams of a partition defragmentation and partition compression according to a fifth embodiment of the present invention;

fig. 9 is a flowchart of a data processing method according to a sixth embodiment of the present invention;

fig. 10 is a schematic structural diagram of a data processing apparatus according to a seventh embodiment of the present invention;

fig. 11 is a schematic structural diagram of a computer device according to an eighth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention, where the method is applicable to a case where records are added in a file, and the method may be executed by a data processing apparatus, where the data processing apparatus may be configured in a computer device, such as a server, a workstation, or the like, and a file for storing records related to a business, such as a website log, is stored in a database of the computer device.

As shown in fig. 2, the file may be divided into at least two logical partitions according to size and the like; the data in different logical partitions do not affect one another, and each logical partition has its own offset range.

For example, logical partition 1 has an offset range of 0-1073741824, i.e., 0-1 GByte, and logical partition 2 has an offset range of 1073741824-2147483648, i.e., 1 GByte-2 GByte.

The at least two logical partitions are independently mapped to the memory by means of a Memory Mapped File (MMF), so that the records in the logical partitions can be operated on.

A memory-mapped file is a mapping from a file to a process address space. When an application operates on data in the mapped region, it can treat the data like a local array and does not need to issue I/O (input/output) operations on the file; the system is responsible for synchronizing the mapped region to the file, so efficiency is high.

In the embodiment of the invention, after the logical partitions are mapped to the memory, the records in each logical partition can be operated on independently, without loading the whole file into the memory. Operations on different logical partitions, including adding, deleting, modifying, and querying records and defragmentation, do not affect one another.

Moreover, dividing the file into logical partitions of suitable size and number makes defragmentation more efficient: several logical partitions can be defragmented simultaneously without affecting one another, so concurrency is high.

Generally, the size of a single memory mapping area is limited. Dividing the file into a plurality of logical partitions and mapping them separately makes record operations more efficient and makes processing local records of a large file more flexible.
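The independent mapping of logical partitions can be illustrated with Python's standard `mmap` module. This is only a sketch under stated assumptions: the partition size is set to `mmap.ALLOCATIONGRANULARITY` (mapping offsets must be a multiple of it) instead of the 1 GByte used in the example above, the file is a temporary file, and the names `map_partition` and `demo` are hypothetical.

```python
import mmap
import os
import tempfile

# Assumption: a small partition size for illustration only.
PARTITION_SIZE = mmap.ALLOCATIONGRANULARITY
PARTITION_COUNT = 2


def map_partition(fileno: int, number: int) -> mmap.mmap:
    """Map only the offset range of one logical partition (0-based number)."""
    return mmap.mmap(
        fileno,
        PARTITION_SIZE,
        access=mmap.ACCESS_WRITE,
        offset=number * PARTITION_SIZE,
    )


def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    try:
        # Pre-size the file so that both logical partitions exist.
        os.ftruncate(fd, PARTITION_SIZE * PARTITION_COUNT)
        part1 = map_partition(fd, 0)
        part2 = map_partition(fd, 1)
        # Operate on the second partition like a local array.
        part2[0:5] = b"hello"
        part2.flush()
        # The first partition is unaffected by the write.
        untouched = part1[0:5]
        part1.close()
        part2.close()
        # Read the record back through ordinary file I/O at its file offset.
        os.lseek(fd, PARTITION_SIZE, os.SEEK_SET)
        return untouched + os.read(fd, 5)
    finally:
        os.close(fd)
        os.remove(path)
```

Each mapping covers only its own offset range, so neither partition requires the whole file to be resident in memory.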

As shown in fig. 1, the method specifically includes the following steps:

S101, receiving data to be stored.

The application program generates new data according to business requirements, such as generating a new website log, and writes the new data into a file of the database to store the new data in the file.

S102, traversing the at least two logical partitions to search for a free block capable of storing the data.

In the embodiment of the present invention, the free block is an area in the logical partition where no record is stored and the offset range is continuous, and the free block can be used for storing a new record.

Note that "no record is stored" may mean either that no data has been written to the area, or that data has been written but has since been invalidated.

Optionally, a free block linked list may be set for each logical partition, and the offset range of the free block in the logical partition is recorded in the free block linked list.

For example, in one logical partition, records are stored in the offset amount ranges of 0-1024, 3072-.

After the data to be stored is received, the free block linked list of each logical partition may be checked, partition by partition, to determine whether there is a free block whose size is greater than or equal to the length of the data.

If so, the free block is pre-allocated for storing the data, at which point the state of the free block is modified to "pre-allocated".

In a preferred embodiment of the present invention, S102 comprises the steps of:

S1021, determining the largest free block in the logical partition as a reference block.

S1022, if the length of the data is smaller than or equal to the size of the reference block, determining that the logical partition has a free block capable of storing the data.

In the free block linked list, the free blocks may be sorted by size as the service requires; because the free blocks in the list are then ordered, the reference block can be determined directly from the ordering.

For example, if the free blocks in the free block linked list are sorted from small to large according to size, the last free block in the free block linked list is the reference block.

When checking logical partitions, the length of the data is compared with the size of the reference block.

If the length of the data is smaller than or equal to the size of the reference block, it is determined that the logical partition has a free block capable of storing the data, and a suitable free block is then looked up in the free block linked list.

S1023, determining a free block meeting a preset storage condition in the logical partition.

Wherein the storage condition is that the size of the free block is larger than the length of the data, and the difference between the size of the free block and the length of the data is minimum.

For example, assume that the length of the data is 70 and that the sizes of the free blocks in the free block linked list of a certain logical partition are [10,150,75,30,90,120,45,100]. The reference block has size 150, which is not smaller than 70, so a free block with a size greater than or equal to 70 can be allocated from this logical partition to store the data.

If the free blocks are sorted by size, the free block of size 75 is selected, since it leaves the smallest remainder (5), and the storage space is utilized more fully.

S1024, if the length of the data is larger than the size of the reference block, determining that the logical partition does not have a free block capable of storing the data.

If the length of the data is larger than the size of the reference block, it is determined that the logical partition does not have a free block capable of storing the data, and the next logical partition is checked.
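Steps S1021 to S1024 can be sketched in Python as follows. This is a minimal sketch rather than the implementation of the embodiment: each logical partition is represented only by the list of its free block sizes, the function name `find_free_block` is hypothetical, and a free block whose size equals the length of the data is accepted, following the "smaller than or equal to" comparison in S1022.

```python
from typing import Dict, List, Optional, Tuple


def find_free_block(
    partitions: Dict[int, List[int]], length: int
) -> Optional[Tuple[int, int]]:
    """Return (partition number, free block size) of the best fit, or None.

    `partitions` maps a partition number to the sizes of its free blocks.
    """
    for number, sizes in partitions.items():
        if not sizes:
            continue
        # Keep the free blocks ordered by size, smallest first.
        ordered = sorted(sizes)
        # S1021: the largest free block is the reference block.
        reference = ordered[-1]
        # S1024: the data does not fit anywhere here; check the next partition.
        if length > reference:
            continue
        # S1022/S1023: the first block that is large enough in ascending
        # order leaves the smallest difference to the data length.
        for size in ordered:
            if size >= length:
                return number, size
    return None
```

For the free block sizes of the example above, a data length of 70 selects the block of size 75; comparing against the reference block first lets a partition be skipped with a single comparison.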

S103, storing the data into the free block to generate a new record.

After the appropriate free block is found, the data is written into the offset range indicated by the free block as a new record in the file.

Generally, the data is written from the initial offset of the offset range indicated by the free block, so that the data is continuous with the adjacent record in the offset range, the number of free blocks is reduced, and the utilization rate of the storage space is improved.

In addition, if appropriate free blocks are not searched in all the logical partitions, the data can be stored at the end of the file, an expansion request is generated, the logical partitions are requested to be expanded, and the appropriate free blocks are searched again.

S104, recording index information between the record and the offset range occupied by the record in the free block in the index file.

In the embodiment of the present invention, at least two logical partitions have index files, and after data is stored, index information in the index files, including the numbers of the logical partitions, the offset ranges, the lengths of records, and the like, may be updated.

Optionally, the index file is a B+ tree data structure, and the B+ tree data structure includes leaf nodes (all nodes in the third layer shown in fig. 3) and non-leaf nodes (the root node [10] in the first layer and the nodes [5] and [14,19] in the second layer shown in fig. 3).

It should be noted that the numbers in fig. 3 represent unique identifiers of the records, such as the IDs of the records.

The non-leaf nodes store the reference information of the leaf nodes and do not store record data, so a single non-leaf node can hold more reference information. The B+ tree data structure is therefore shallower, fewer layers are traversed when index information in the leaf nodes is queried, and query efficiency is higher.

When a record of a file is read, the records following it are also read into a disk block for the application to operate on, based on the principle of spatial locality. Because the non-leaf nodes of the B+ tree data structure do not store record data and are small, one disk block can hold more non-leaf nodes. When index information is queried, the corresponding information can be found through fewer disk blocks, which reduces the number of I/O read-write operations.

The leaf nodes store the index information of the records. Every query for index information follows a path of the same length from the root node to a leaf node, so query costs are very close to one another and query efficiency is stable.

Each leaf node may record reference information (also called a pointer) to the next leaf node, so that all leaf nodes together form an ordered linked list. For a range query, this linked list is traversed horizontally, so query efficiency is higher.

The leaf-node linked list of the B+ tree is particularly well suited to querying the index information of a given logical partition during partition defragmentation.
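The horizontal traversal of the leaf-node linked list can be sketched as follows. Only the leaf level is modeled, not a complete B+ tree; the class names `IndexInfo` and `LeafNode`, the function `records_in_partition`, and all record identifiers and offsets in `build_example` are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IndexInfo:
    partition: int  # number of the logical partition
    offset: int     # start of the offset range occupied by the record
    length: int     # length of the record


@dataclass
class LeafNode:
    # Maps a record identifier to its index information.
    entries: Dict[int, IndexInfo] = field(default_factory=dict)
    # Reference (pointer) to the next leaf node in the ordered linked list.
    next_leaf: Optional["LeafNode"] = None


def records_in_partition(first_leaf: Optional[LeafNode], partition: int) -> List[int]:
    """Walk the leaf chain and collect the records stored in one partition."""
    found: List[int] = []
    leaf = first_leaf
    while leaf is not None:
        for record_id in sorted(leaf.entries):
            if leaf.entries[record_id].partition == partition:
                found.append(record_id)
        leaf = leaf.next_leaf
    return found


def build_example() -> LeafNode:
    """Build three chained leaf nodes with made-up index information."""
    third = LeafNode({19: IndexInfo(1, 4096, 100)})
    second = LeafNode({10: IndexInfo(2, 0, 50), 14: IndexInfo(1, 2048, 80)}, third)
    return LeafNode({3: IndexInfo(1, 0, 64), 5: IndexInfo(2, 512, 32)}, second)
```

The scan never visits a non-leaf node, which is why collecting all records of one partition before defragmentation costs a single pass over the leaf chain.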

S105, updating the offset range mapped by the free block to the offset range not occupied by the record.

In particular implementations, the allocation of new data to free blocks may be confirmed, thereby generating records.

In one case, if the length of the record is equal to the size of the free block, the free block is deleted in the free block linked list.

In another case, if the length of the record is smaller than the size of the free block, determining an offset range which is not occupied by the record in the offset range mapped by the original free block, and updating the offset range which is not occupied by the record into the free block linked list.
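The two cases of S105 can be sketched as follows, assuming the free block linked list is modeled as a plain Python list of half-open offset ranges `(start, end)` and that the record is written from the initial offset of the free block. The function name `allocate` is hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open offset range [start, end)


def allocate(free_blocks: List[Range], block: Range, length: int) -> Range:
    """Place a record of `length` at the start of `block`; update the list."""
    start, end = block
    if length > end - start:
        raise ValueError("record does not fit into the free block")
    # The original free block no longer exists in its old form.
    free_blocks.remove(block)
    record = (start, start + length)
    # Case 2: the record is shorter than the block, so the range not
    # occupied by the record remains as a (smaller) free block.
    if record[1] < end:
        free_blocks.append((record[1], end))
    # Case 1 (equal length) falls through: the free block is simply deleted.
    return record
```

The returned range is what would be recorded as index information for the new record in S104.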

Example two

Fig. 4 is a flowchart of a data processing method according to a second embodiment of the present invention, which is based on the foregoing embodiment and further adds a record updating operation. The method specifically comprises the following steps:

S401, receiving an update operation applied to a record.

The application program updates a record stored in the file according to service requirements, which triggers an update operation on that record.

S402, updating the record according to the updating operation to determine the original record and the new record.

In the embodiment of the invention, the corresponding record is updated in response to the updating operation, and at the moment, the original record and the new record are determined.

Wherein the original record is a record before updating, and the new record is an updated record.

S403, if the length of the new record is less than or equal to the length of the original record, storing the new record in an offset range occupied by the original record.

S404, recording index information between the new record and an offset range occupied by the new record in the offset range of the original record in the index file.

S405, adding a new free block.

In a specific implementation, the length of the new record after updating can be judged, and the length of the new record is compared with the length of the original record.

If the length of the new record is less than or equal to the length of the original record, the new record can be written directly on the original storage location (offset range expressed by the original record).

Typically, the new record is written starting from the beginning offset of the offset range indicated by the original record.

In the index file, index information of the new record is updated, including the number of the logical partition, the offset range, the length of the record, and the like.

Further, if the length of the new record is smaller than the length of the original record, then after the index information of the new record is updated in the index file, a new free block is generated and mapped to the part of the original offset range that the new record does not occupy. The new free block can then be added into the free block linked list.

It should be noted that, if the offset range of the original record is contiguous with another free block, so that the new free block is contiguous with that free block, the two may be merged and the free block linked list updated accordingly.
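The in-place update path (S403 to S405), including the merge with a neighbouring free block, might look like the sketch below. It assumes offset ranges are half-open `(start, end)` tuples, the index is a dictionary, and the free block linked list is a sorted Python list; `add_free_block` and `update_in_place` are hypothetical helper names.

```python
def add_free_block(free_list, start, end):
    """Return a sorted free list with [start, end) added and neighbours merged."""
    merged = []
    for s, e in sorted(free_list + [(start, end)]):
        if merged and s <= merged[-1][1]:
            # Touching or overlapping the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def update_in_place(index, free_list, key, old_range, new_length):
    """Store a new record that is no longer than the original at the original start."""
    old_start, old_end = old_range
    if new_length > old_end - old_start:
        raise ValueError("new record is longer than the original")
    new_end = old_start + new_length
    index[key] = (old_start, new_end)   # S404: refresh the index information
    if new_end < old_end:               # S405: the leftover becomes a free block
        free_list = add_free_block(free_list, new_end, old_end)
    return free_list
```

For instance, shrinking a record at `(100, 200)` to 60 bytes next to a free block `(200, 260)` leaves one merged free block `(160, 260)`.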

S406, if the length of the new record is larger than that of the original record, traversing the at least two logic partitions to search for a free block capable of storing the new record.

If the length of the new record is greater than the length of the original record, the new record cannot be written at the original storage location (the offset range indicated by the original record). In this case, the free block linked list of each logical partition can be checked, one partition at a time, to determine whether a free block whose size is greater than or equal to the length of the new record exists.

If so, the free block is pre-allocated for storing the new record, at which point the state of the free block is modified to "pre-allocated".

In a preferred embodiment of the present invention, S406 includes the steps of:

S4061, determining the largest free block in the logical partition as a reference block.

S4062, if the length of the new record is smaller than or equal to the size of the reference block, determining that there is a free block in the logical partition, where the new record can be stored.

S4063, determining a free block meeting preset storage conditions in the logic partition.

Wherein the storage condition is that the size of the free block is larger than the length of the new record, and the difference between the size of the free block and the length of the new record is minimum.

S4064, if the length of the new record is larger than the size of the reference block, determining that the logical partition does not have a free block capable of storing the new record.

In the embodiment of the present invention, S4061-S4064 are applied in substantially the same way as S1021-S1024, so they are described only briefly here; for the relevant details, refer to the description of S1021-S1024 in the foregoing method embodiment.
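Steps S4061 to S4064 amount to a best-fit search guarded by a quick check against the largest free block. The sketch below is one possible reading, assuming each partition's free blocks are `(start, end)` tuples held in a dictionary keyed by partition number. It also assumes that an exact fit is acceptable, in line with S4062; `find_free_block` is a hypothetical name.

```python
def find_free_block(partitions, length):
    """Return (partition_number, (start, end)) of the best-fitting free block, or None.

    `partitions` maps a partition number to its list of free (start, end) ranges.
    """
    for number, free_list in partitions.items():
        if not free_list:
            continue
        # S4061: the largest free block serves as the reference block.
        reference = max(end - start for start, end in free_list)
        if length > reference:
            # S4064: nothing in this partition can hold the record.
            continue
        # S4062/S4063: pick the block that fits with the least leftover space.
        candidates = [(s, e) for s, e in free_list if e - s >= length]
        best = min(candidates, key=lambda b: (b[1] - b[0]) - length)
        return number, best
    return None
```

Comparing against the reference block first lets a partition be skipped with a single comparison once its largest free block is known.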

S407, storing the new record into the free block.

After the appropriate free block is queried, a new record is written into the offset range indicated by the free block, thereby implementing the update.

Generally, the new record is written from the initial offset of the offset range indicated by the free block, so that the new record and the adjacent record are continuous in the offset range, the number of free blocks is reduced, and the utilization rate of the storage space is improved.

In addition, if no suitable free block is found in any of the logical partitions, the new record can be stored at the end of the file: an expansion request is generated to request that the logical partitions be expanded, and a suitable free block is then searched for again.

S408, recording index information between the new record and the offset range occupied by the new record in the free block in the index file.

After the new record is stored, its index information in the index file may be updated, including the number of the logical partition, the offset range, the length of the new record, and the like.

S409, updating the offset range mapped by the free block to the offset range not occupied by the new record.

In particular implementations, the allocation of new records to free blocks may be confirmed.

In one case, if the length of the new record is equal to the size of the free block, the free block is deleted in the free block linked list.

In another case, if the length of the new record is smaller than the size of the free block, the offset range not occupied by the new record within the offset range originally mapped by the free block is determined, and the free block in the free block linked list is updated to that unoccupied offset range.

S410, deleting the index information between the original record and the offset range occupied by the original record in the index file.

S411, determining the offset range occupied by the original record to generate a new free block.

For the original record, its index information may be deleted from the index file.

It should be noted that the original record is not erased but becomes invalid; the offset range it occupied in the file becomes a new free block, which is added into the free block linked list.

When other records are later operated on (e.g., added or updated), this offset range may be allocated to them and the invalid record overwritten, so that the storage space is reused.

In the embodiment of the invention, the original record or the idle block can be flexibly selected to store the new record, the operations of erasing the record and the like can be reduced, and the update efficiency of the record is improved.

Example three

Fig. 5 is a flowchart of a data processing method according to a third embodiment of the present invention, and the present embodiment further adds a deletion operation of a record based on the foregoing embodiments. The method specifically comprises the following steps:

S501, receiving a delete operation applied to a record.

The application program deletes a record stored in the file according to service requirements, which triggers a delete operation on that record.

S502, according to the deleting operation, deleting the index information between the record and the offset range occupied by the record in the index file.

S503, determining the offset range to generate a new free block.

In response to the delete operation, the index information of the record may be deleted from the index file.

It should be noted that the record is not erased but becomes invalid; the offset range it occupied in the file becomes a new free block, which is added into the free block linked list.
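The delete path (S502 and S503) can be sketched in a few lines. The sketch assumes the index maps a record key to a `(partition, start, end)` tuple and that each partition's free block linked list is a sorted Python list; these representations and the name `delete_record` are illustrative assumptions.

```python
def delete_record(index, free_lists, key):
    """Drop the index entry for `key` and turn its offset range into a free block."""
    partition, start, end = index.pop(key)   # S502: delete the index information
    # S503: the bytes stay in the file; only the range is marked reusable.
    free_lists.setdefault(partition, []).append((start, end))
    free_lists[partition].sort()
    return partition, (start, end)
```

Because no record data is rewritten, the cost of a delete is limited to the index and free list updates.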

Example four

Fig. 6 is a flowchart of a data processing method according to a fourth embodiment of the present invention, and the present embodiment further adds partition expansion operations based on the foregoing embodiments. The method specifically comprises the following steps:

S601, counting the storage characteristic values of the free blocks in the logical partitions.

In a specific implementation, the free block linked list of each logical partition may be queried at a certain interval or when an expansion request is received, and the storage characteristic values of the free blocks in the logical partitions are counted, where the storage characteristic values are used to represent characteristics of record storage.

In one example, the stored characteristic value comprises a total value of the size of the free blocks and/or a number of characteristic free blocks, wherein the size of the characteristic free blocks is greater than a preset first threshold.

S602, if the storage characteristic value meets a preset expansion condition, expanding an offset range for the file.

S603, adding a logical partition to the offset range.

By applying the embodiment of the invention, the expansion condition can be set aiming at the storage characteristic value in advance and is used for expressing the condition of expanding the logical partition.

In one example, the expansion condition includes that the total value is less than a preset second threshold value and/or that the number is less than a preset third threshold value.

The storage characteristic value is compared with the expansion condition. If the storage characteristic value meets the expansion condition, a file operation function is called to expand the offset range of the file, the expanded offset range is set as a new logical partition, and the new logical partition is mapped to the memory as a memory-mapped file.
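A rough sketch of S601 to S603 follows. It assumes the expansion condition is the "or" combination of the two example thresholds, and that every logical partition has the same size, chosen as `mmap.ALLOCATIONGRANULARITY` so that each partition offset is a valid mapping offset. `needs_expansion`, `add_partition`, and `PARTITION_SIZE` are hypothetical names; `os.ftruncate` stands in for the unspecified file operation function.

```python
import mmap
import os

# Assumption: fixed-size partitions aligned to the platform's mapping granularity.
PARTITION_SIZE = mmap.ALLOCATIONGRANULARITY


def needs_expansion(free_list, first_threshold, second_threshold, third_threshold):
    """S601/S602: evaluate the storage characteristic values of one partition."""
    sizes = [end - start for start, end in free_list]
    total = sum(sizes)
    # Characteristic free blocks are those larger than the first threshold.
    characteristic = sum(1 for size in sizes if size > first_threshold)
    return total < second_threshold or characteristic < third_threshold


def add_partition(fd):
    """S602/S603: grow the file by one partition and map only the new range."""
    old_size = os.fstat(fd).st_size
    os.ftruncate(fd, old_size + PARTITION_SIZE)
    return mmap.mmap(fd, PARTITION_SIZE, offset=old_size)
```

Mapping only the newly added range keeps the new partition independent of the mappings that already exist for the other partitions.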

Example five

Fig. 7 is a flowchart of a data processing method according to a fifth embodiment of the present invention. This embodiment is based on the foregoing embodiments and further adds a partition defragmentation operation and a partition compression operation. While a file is in use, add, update, and delete operations leave a large number of scattered free blocks (i.e., fragments) in a logical partition, which makes it difficult to store large records. The partition defragmentation operation merges the scattered free blocks into one large free block, and the partition compression operation then migrates records between logical partitions, which further saves storage space. The method specifically comprises the following steps:

S701, caching the records in the logical partition into a memory.

S702, writing the record in the memory into the logical partition so that the record occupies a continuous offset range in the logical partition.

S703, recording, in the index file, index information between the record and the offset range occupied by the record in the logical partition.

S704, determining the offset range in the logical partition where the record is not stored to generate a new free block.

When a defragmentation operation is carried out within a single logical partition, the valid records are migrated toward the start offset of the logical partition and stored compactly together, so that their offset ranges are contiguous and no free block remains between any two records. The scattered free blocks are migrated toward the end offset of the logical partition and merged into one complete free block.

After the defragmentation is completed, the free block of the logical partition can store more new records, thereby improving the utilization rate of the storage space.

In particular implementations, an exclusive lock may be added to a logical partition to be defragmented to prevent the business layer from operating on records in the logical partition.

The index file (for example, the leaf nodes of the B+ tree data structure) is traversed to query the index information of all records under the logical partition.

The records are read from the logical partition according to the index information and cached in the memory.

The records cached in the memory are written into the logical partition in batches and in order, typically starting from the start offset of the partition.

The index information of the records is updated in batches, and the records cached in the memory are deleted.

When all the records have been written, the record migration is complete, and a complete free block is formed between the end offset of the last record and the end offset of the logical partition.

The elements in the free block linked list of the logical partition are cleared, and this free block is added into the free block linked list.

At this point, the defragmentation operation of the logical partition is completed, and the exclusive lock of the logical partition is released.
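The defragmentation of a single logical partition (S701 to S704) can be sketched as below. The partition is modelled as a `bytearray` instead of a memory-mapped region, the index as a dictionary of `(start, end)` ranges, and the exclusive lock is omitted; `defragment` is a hypothetical name.

```python
def defragment(partition, index):
    """Compact all live records to the start of `partition` (a bytearray).

    `index` maps a record key to its (start, end) range and is updated in place.
    Returns the single free block (start, end) left at the end of the partition.
    """
    # S701: cache every live record in memory, in offset order.
    cached = [(key, bytes(partition[s:e]))
              for key, (s, e) in sorted(index.items(), key=lambda kv: kv[1][0])]
    cursor = 0
    for key, data in cached:
        # S702: rewrite the record so that it directly follows the previous one.
        partition[cursor:cursor + len(data)] = data
        # S703: record the new offset range in the index.
        index[key] = (cursor, cursor + len(data))
        cursor += len(data)
    # S704: everything after the last record is one free block.
    return (cursor, len(partition))
```

The returned range would replace the whole content of the partition's free block linked list.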

In one example, as shown in FIG. 8A, a file has 3 logical partitions, and one of the logical partitions contains multiple records (record 1, record 2, record 3, record 4, ...) and two free blocks (free block 1, free block 2).

During defragmentation, as indicated by the arrows, the records are migrated in sequence toward the start offset of the logical partition, and the free blocks are migrated toward the end offset of the logical partition and merged into one complete free block.

As shown in fig. 8B, 3 logical partitions (logical partition 1, logical partition 2, logical partition 3) in the file are defragmented, and each logical partition forms a storage area for storing records with a continuous offset range, and a complete free block.

S705, determining the first partition and the second partition.

By applying the embodiment of the present invention, a compression condition may be preset. For example, for two adjacent logical partitions, the total length of the records in the subsequent logical partition is smaller than a threshold, where the threshold is a specified proportion of the size of the free block in the previous logical partition (the proportion may be configured according to service requirements).

If the compression condition is satisfied, the first partition and the second partition may be selected.

The first partition is the logical partition into which records are to be migrated, and the second partition is the logical partition from which records are to be migrated.

Generally, to improve the efficiency of compression, records in the subsequent logical partition may be sequentially migrated to free blocks in the previous logical partition.

Thus, for a logical partition, in some cases, the logical partition may be a first partition, and in some cases, the logical partition may be a second partition.

For example, as shown in fig. 8B, when a record in logical partition 2 is migrated to logical partition 1, logical partition 2 is the second partition, and logical partition 1 is the first partition, and when a record in logical partition 3 is migrated to logical partition 2, logical partition 3 is the second partition, and logical partition 2 is the first partition.

S706, reading all records in the second partition.

S707, writing the records into a free block of the first partition.

S708, updating, in the index file, the index information between the records and the offset ranges occupied by the records in the free block.

S709, determining a target offset range to generate a new free block, wherein the target offset range comprises an offset range occupied by the record in the second partition.

In the embodiment of the invention, the size of the free block of the first partition is larger than the total length of all records in the second partition, so all records in the second partition can be read out according to the index information and written into the free block of the first partition. The index information of the records and the free block linked lists of the first partition and the second partition are then updated.

S710, if the record is not stored in the logical partition, canceling the mapping of the logical partition to the memory.

S711, reducing the offset range corresponding to the logical partition in the file to cancel the logical partition.

According to S705-S709, all logical partitions in the file are traversed until the migration of records in all logical partitions is completed.

At this time, the logical partition in the file may be detected, and if the logical partition does not store a record, that is, the entire logical partition is a complete free block, at this time, the memory mapping of the logical partition may be cancelled, and the file operation function may be called to compress the offset range of the file, thereby cancelling the logical partition and releasing the disk space.

Generally, if the records in the logical partitions of the file are migrated in order, the logical partitions that store no records are located at the end of the file.

In one example, as shown in FIG. 8C, the total length of the records in logical partition 2 is less than the specified proportion of the size of the free block in logical partition 1, so the records in logical partition 2 may be migrated into the free block in logical partition 1, as indicated by the arrows.

As shown in FIG. 8D, the total length of the records in logical partition 3 is less than the specified proportion of the size of the free block in logical partition 2, so the records in logical partition 3 may be migrated into the free block in logical partition 2, as indicated by the arrows.

At this point, logical partition 3 stores no records and is one complete free block.

As shown in fig. 8E, the memory mapping of logical partition 3 is canceled, and the size of the file is compressed, so that logical partition 3 is canceled, and logical partition 1 and logical partition 2 are reserved.
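The migration between two partitions (S705 to S709) can be sketched as follows, under the assumption that both partitions have already been defragmented, so that the first partition has exactly one free block. Partitions are modelled as `bytearray` objects, and the default `proportion` of 0.8 is an arbitrary illustrative value; `migrate_records` is a hypothetical name.

```python
def migrate_records(first, first_free, second, second_index, proportion=0.8):
    """Move every record of `second` into the free block of `first` if allowed.

    Returns (new index entries for the first partition, remaining free block),
    or None when the compression condition is not met.
    """
    free_start, free_end = first_free
    total = sum(end - start for start, end in second_index.values())
    # S705: compression condition, assumed to be total < proportion * free size.
    if total >= proportion * (free_end - free_start):
        return None
    moved = {}
    cursor = free_start
    for key, (start, end) in sorted(second_index.items(), key=lambda kv: kv[1][0]):
        data = bytes(second[start:end])             # S706: read from the second partition
        first[cursor:cursor + len(data)] = data     # S707: write into the first partition
        moved[key] = (cursor, cursor + len(data))   # S708: new index information
        cursor += len(data)
    second_index.clear()                            # S709: the second partition is now all free
    return moved, (cursor, free_end)
```

After such a migration, the emptied partition could be unmapped and the file shortened (for example with `os.ftruncate` in Python), corresponding to S710 and S711; the sketch leaves that part out.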

Example six

Fig. 9 is a flowchart of a data processing method according to a sixth embodiment of the present invention, and the present embodiment further increases query operations of records based on the foregoing embodiments. The method specifically comprises the following steps:

S901, receiving a query operation.

The application program queries a record stored in the file according to service requirements, which triggers a query operation on that record.

S902, according to the query operation, determining, in the index file, the offset range of the record to be queried.

S903, reading the record in the offset range.

In the embodiment of the present invention, in response to the query operation, the index information of the record is queried in the index file, so as to determine the logical partition in which the record is located and the offset range of the logical partition.

Within the offset range in the logical partition, the record is read.
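The query path (S902 and S903) reduces to an index lookup followed by a slice of the owning partition. The sketch below assumes the index maps a key to a `(partition number, start, end)` tuple and that each partition is a bytes-like object such as a memory-mapped region; `query` is a hypothetical name.

```python
def query(index, partitions, key):
    """Look up `key` in the index and read its bytes from the owning partition."""
    entry = index.get(key)
    if entry is None:
        return None                                  # no index information: no such record
    number, start, end = entry                       # S902: partition number and offset range
    return bytes(partitions[number][start:end])      # S903: read the record
```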

Example seven

Fig. 10 is a schematic structural diagram of a data processing apparatus according to a seventh embodiment of the present invention, where a file is partitioned into at least two logical partitions, the at least two logical partitions are independently mapped to a memory, and the at least two logical partitions have index files, where the apparatus may specifically include the following modules:

a data receiving module 1001, configured to receive data to be stored;

a first free block searching module 1002, configured to traverse the at least two logical partitions to search for a free block that can store the data, where the free block is an area that stores no record and has a continuous offset range in the logical partition;

a data storage module 1003, configured to store the data into the free block to generate a new record;

a first index information recording module 1004, configured to record, in the index file, index information between the record and an offset range occupied by the record in the free block;

a first free block updating module 1005, configured to update the offset range of the free block map to an offset range not occupied by the record.

Optionally, the first free block searching module 1002 includes:

Optionally, the method further comprises:

Optionally, the second free block searching module includes:

Optionally, the method further comprises:

the query operation receiving module is used for receiving query operation;

a record reading module to read the record in the offset range.

The data processing apparatus provided in the embodiment of the present invention can implement each process in the method embodiments of fig. 1 to 9, and is not described here again to avoid repetition.

Example eight

Fig. 11 is a schematic structural diagram of a computer device according to an eighth embodiment of the present invention. FIG. 11 illustrates a block diagram of an exemplary computer device 1100 suitable for use in implementing embodiments of the invention. The computer device 1100 shown in fig. 11 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.

As shown in fig. 11, computer device 1100 is in the form of a general purpose computing device. The components of computer device 1100 may include, but are not limited to: one or more processors or processing units 160, a system memory 280, and a bus 180 that couples the various system components (including the system memory 280 and the processing unit 160).

Bus 180 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer device 1100 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 1100 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 280 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)300 and/or cache memory 320. The computer device 1100 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 11, and commonly referred to as a "hard drive"). Although not shown in FIG. 11, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 180 by one or more data media interfaces. Memory 280 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 400 having a set (at least one) of program modules 420 may be stored, for example, in memory 280, such program modules 420 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 420 generally perform the functions and/or methodologies of embodiments of the invention as described herein.

Computer device 1100 may also communicate with one or more external devices 140 (e.g., keyboard, pointing device, display 240, etc.), with one or more devices that enable a user to interact with the computer device 1100, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 1100 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 220. Moreover, computer device 1100 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via network adapter 200. As shown, network adapter 200 communicates with the other modules of computer device 1100 via bus 180. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with computer device 1100, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processing unit 160 executes various functional applications and data processing, for example, implementing a data processing method provided by an embodiment of the present invention, by executing a program stored in the system memory 280.

Example nine

The ninth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the data processing method.

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for processing data, wherein a file is partitioned into at least two logical partitions, the at least two logical partitions are independently mapped to a memory, the at least two logical partitions have an index file, the method comprising:

receiving data to be stored;

storing the data into the free block to generate a new record;

updating the offset range mapped by the free block to the offset range not occupied by the record;

counting storage characteristic values of free blocks in the logic partitions;

adding a logic partition to the offset range;

2. The method of claim 1, wherein traversing the at least two logical partitions for free blocks that can store the data comprises:

determining the largest free block in the logic partition as a reference block;

3. The method of claim 1, wherein the index file is a B+ tree data structure, wherein the B+ tree data structure comprises leaf nodes and non-leaf nodes, wherein the non-leaf nodes are configured to store reference information of the leaf nodes, and wherein the leaf nodes are configured to store index information of the records.

4. The method according to any one of claims 1-3, further comprising:

receiving an update operation acting on a record;

storing the new record into the free block;

5. The method of claim 4, wherein traversing the at least two logical partitions for free blocks that can store the new record comprises:

determining the largest free block in the logic partition as a reference block;

6. The method according to any one of claims 1-3, further comprising:

receiving a delete operation acting on a record;

determining the offset range to generate a new free block.

7. The method according to any one of claims 1-3, further comprising:

receiving a query operation;

reading the record in the offset range.

8. An apparatus for processing data, wherein a file is partitioned into at least two logical partitions, the at least two logical partitions are independently mapped to a memory, the at least two logical partitions have an index file, the apparatus comprising:

the data receiving module is used for receiving data to be stored;

a first free block updating module, configured to update the offset range mapped by the free block to an offset range not occupied by the record;

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements a method of processing data according to any of claims 1-7 when executing the program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method for processing data according to any one of claims 1 to 7.