CN111459884B

CN111459884B - Data processing method and device, computer equipment and storage medium

Info

Publication number: CN111459884B
Application number: CN202010270580.1A
Authority: CN
Inventors: 陈勇华
Original assignee: Guangzhou Lizhi Network Technology Co ltd
Current assignee: Guangzhou Lizhi Network Technology Co ltd
Priority date: 2019-03-26
Filing date: 2019-03-26
Publication date: 2023-05-16
Anticipated expiration: 2039-03-26
Also published as: CN109977078B; CN109977078A; CN111459884A; CN111459885A; CN111459885B

Abstract

Embodiments of the invention provide a data processing method and device, computer equipment, and a storage medium. A file is divided into at least two logical partitions, the at least two logical partitions are independently mapped to memory, and the at least two logical partitions have index files. The method comprises: receiving data to be stored; traversing the at least two logical partitions to find a free block capable of storing the data, wherein a free block is an area of a logical partition that stores no record and has a continuous offset range; storing the data into the free block to generate a new record; recording, in the index file, index information between the record and the offset range occupied by the record in the free block; and updating the offset range mapped by the free block to the offset range not occupied by the record. Operations on different logical partitions do not affect one another, and local records of a large file can be processed flexibly.

Description

Data processing method and device, computer equipment and storage medium

The present patent application is a divisional application of the Chinese patent application filed on March 26, 2019, with application number 201910230878.7 and entitled 'a data processing method, a device, computer equipment and a storage medium'.

Technical Field

The present invention relates to the field of database technologies, and in particular, to a data processing method, apparatus, computer device, and storage medium.

Background

In today's internet industry, ever more data of every kind is being stored, whether in databases such as MySQL (a relational database management system) and Oracle (a relational database management system) or in big-data products such as HDFS (Hadoop Distributed File System) and Elasticsearch (a Lucene-based search server). Over time, the stored volume of a given type of data keeps growing, and a single file often becomes very large, reaching tens of GB (gigabytes), hundreds of GB, or even several TB (terabytes).

At present, a file is usually read and written by loading the whole file into memory. For ultra-large files of tens of GB, hundreds of GB, or even TB scale, however, the size of the memory makes it difficult to load the whole file, and even when the whole file can be loaded it occupies excessive resources and operation efficiency is low.

Disclosure of Invention

The embodiment of the invention discloses a data processing method, a data processing device, computer equipment and a storage medium, which are used for solving the problem of low operation efficiency caused by large file size.

In a first aspect, an embodiment of the present invention provides a data processing method, where a file is divided into at least two logical partitions, the at least two logical partitions are independently mapped to memory, and the at least two logical partitions have index files, and the method includes:

receiving data to be stored;

traversing the at least two logical partitions to find a free block capable of storing the data, wherein the free block is an area of a logical partition that stores no record and has a continuous offset range;

storing the data into the free block to generate a new record;

recording, in the index file, index information between the record and the offset range occupied by the record in the free block;

and updating the offset range mapped by the free block to the offset range not occupied by the record.

Optionally, said traversing said at least two logical partitions to find a free block in which said data may be stored comprises:

determining the largest idle block in the logical partition as a reference block;

if the length of the data is smaller than or equal to the size of the reference block, determining that the logical partition has an idle block capable of storing the data;

determining an idle block meeting a preset storage condition in the logical partition, wherein the storage condition is that the size of the idle block is larger than the length of the data, and the difference between the size of the idle block and the length of the data is minimum;

and if the length of the data is larger than the size of the reference block, determining that the logical partition does not have a free block capable of storing the data.

Optionally, the index file is a B+ tree data structure, and the B+ tree data structure includes leaf nodes and non-leaf nodes, where the non-leaf nodes are used to store reference information of the leaf nodes, and the leaf nodes are used to store the index information of the records.

Optionally, the method further comprises:

receiving an update operation for a record;

updating the record according to the updating operation to determine an original record and a new record, wherein the original record is a record before updating, and the new record is a record after updating;

if the length of the new record is smaller than or equal to the length of the original record, storing the new record in the offset range occupied by the original record;

recording index information between the new record and an offset range occupied by the new record in the offset range of the original record in the index file;

adding a free block, wherein the free block is mapped to an offset range which is not occupied by the new record;

if the length of the new record is greater than that of the original record, traversing the at least two logical partitions to find a free block in which the new record can be stored;

storing the new record into the free block;

recording index information between the new record and an offset range occupied by the new record in the idle block in the index file;

updating the offset range of the free block mapping to an offset range not occupied by the new record;

deleting index information between the original record and the offset range occupied by the original record in the index file;

and determining the offset range occupied by the original record to generate a new idle block.

Optionally, said traversing said at least two logical partitions to find a free block in which said new record may be stored comprises:

if the length of the new record is smaller than or equal to the size of the reference block, determining that the logical partition has an idle block capable of storing the new record;

determining an idle block meeting a preset storage condition in the logical partition, wherein the storage condition is that the size of the idle block is larger than the length of the new record, and the difference between the size of the idle block and the length of the new record is minimum;

and if the length of the new record is larger than the size of the reference block, determining that the logical partition does not have a free block capable of storing the new record.

Optionally, the method further comprises:

receiving a delete operation on a record;

according to the deleting operation, deleting index information between the record and the offset range occupied by the record in the index file;

determining the offset range to be a new free block.
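The delete steps above can be sketched as follows. This is only an illustration under stated assumptions: the index is modelled as a plain dictionary rather than an index file, offset ranges are half-open `(start, end)` tuples, and the name `delete_record` is hypothetical.

```python
def delete_record(index, free_list, record_id):
    """Drop the record's index entry and turn its offset range into a free block.

    index:     {record_id: (start, end)}, a stand-in for the index file
    free_list: list of (start, end) free blocks of the logical partition
    """
    start, end = index.pop(record_id)   # delete the index information
    free_list.append((start, end))      # the released range is a new free block
    return (start, end)
```

Note that the stored bytes are not erased in this sketch; the range simply becomes available for later records.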

Optionally, the method further comprises:

counting the storage characteristic values of the idle blocks in the logical partition;

if the stored characteristic value accords with a preset expansion condition, expanding an offset range for the file;

adding a logical partition to the offset range;

the storage characteristic value comprises a total value of the size of the idle blocks and/or the number of the characteristic idle blocks, and the size of the characteristic idle blocks is larger than a preset first threshold value;

the extension condition includes the total value being smaller than a preset second threshold value and/or the number being smaller than a preset third threshold value.
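A minimal sketch of the expansion check described above is given below. The function name `needs_expansion` and the parameter names are hypothetical, and the sketch picks the "or" reading of the "and/or" wording as one possible choice.

```python
def needs_expansion(free_sizes, first_threshold, second_threshold, third_threshold):
    """Decide whether the file should be extended with a new logical partition.

    free_sizes: sizes of all free blocks in the logical partitions.
    """
    total = sum(free_sizes)
    # "Characteristic" free blocks are those larger than the first threshold.
    characteristic = sum(1 for size in free_sizes if size > first_threshold)
    # Assumption: either condition alone triggers the expansion.
    return total < second_threshold or characteristic < third_threshold
```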

Optionally, the method further comprises:

caching the records in the logic partition into a memory;

writing the record in the memory into the logic partition so that the record occupies a continuous offset range in the logic partition;

recording the record and index information between the offset ranges occupied by the record in the logical partition in the index file;

determining the offset ranges in the logical partition that store no record to be new free blocks.
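The defragmentation steps above can be illustrated with the sketch below. It assumes the logical partition is available as a `bytearray` (standing in for the mapped region) and the index as a dictionary; the name `defragment` is hypothetical.

```python
def defragment(partition, index):
    """Pack all records of one logical partition into a continuous range.

    partition: bytearray holding the partition contents
    index:     {record_id: (start, end)} with half-open offset ranges
    Returns the new free block list of the partition.
    """
    # Cache the records in memory, ordered by their current start offset.
    ordered = sorted(index.items(), key=lambda item: item[1])
    cached = [(rid, bytes(partition[start:end])) for rid, (start, end) in ordered]
    cursor = 0
    for rid, payload in cached:
        # Write the record back so that it directly follows the previous one.
        partition[cursor:cursor + len(payload)] = payload
        index[rid] = (cursor, cursor + len(payload))
        cursor += len(payload)
    # Everything after the last record is one free block.
    return [(cursor, len(partition))] if cursor < len(partition) else []
```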

Optionally, the method further comprises:

determining a first partition and a second partition, wherein the first partition is the logical partition into which records are to be migrated, and the second partition is the logical partition from which records are to be migrated;

reading all records in the second partition;

writing the record into a free block of the first partition;

updating the record and index information between the offset range occupied by the record in the idle block in the index file;

determining a target offset range to generate a new idle block, wherein the target offset range comprises an offset range occupied by the record in the second partition;

if no record is stored in a logical partition, canceling the mapping of that logical partition to memory;

and shrinking the offset range of the file that corresponds to the logical partition, so as to remove the logical partition.
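As a rough sketch of the migration steps above, the following moves every record of the second partition into free blocks of the first partition. The data layout (dictionaries keyed by partition number, `bytearray` partitions, index entries of the form `(partition, start, end)`) and the name `migrate` are assumptions for illustration. A first-fit choice is used for brevity, and the first partition is assumed to have enough free space.

```python
def migrate(partitions, index, free_lists, first, second):
    """Move all records of partition `second` into free blocks of `first`.

    Returns True when `second` no longer stores any record, meaning it could
    then be unmapped and its offset range removed from the file.
    """
    for rid, (number, start, end) in list(index.items()):
        if number != second:
            continue
        length = end - start
        # First free block of the first partition that can hold the record.
        block = next(b for b in free_lists[first] if b[1] - b[0] >= length)
        free_lists[first].remove(block)
        new_start = block[0]
        partitions[first][new_start:new_start + length] = partitions[second][start:end]
        index[rid] = (first, new_start, new_start + length)
        if new_start + length < block[1]:
            # Keep the unoccupied remainder as a free block.
            free_lists[first].append((new_start + length, block[1]))
        # The range the record used to occupy becomes a free block.
        free_lists[second].append((start, end))
    return all(number != second for number, _, _ in index.values())
```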

Optionally, the method further comprises:

receiving a query operation;

determining, in the index file, the offset range of the record to be queried according to the query operation;

reading the record from the offset range.
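The query steps above amount to an index lookup followed by a read of the returned offset range. A minimal sketch, assuming a dictionary index with entries of the form `(partition, start, end)` and partitions held as byte buffers (the name `query` is hypothetical):

```python
def query(partitions, index, record_id):
    """Return the bytes of a record, or None if the index has no entry for it."""
    entry = index.get(record_id)
    if entry is None:
        return None
    number, start, end = entry
    # Only the logical partition holding the record is touched.
    return bytes(partitions[number][start:end])
```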

In a second aspect, an embodiment of the present invention further provides a data processing apparatus, where a file is divided into at least two logical partitions, the at least two logical partitions are independently mapped to memory, and the at least two logical partitions have index files, and the apparatus includes:

the data receiving module is used for receiving the data to be stored;

the first idle block searching module is used for traversing the at least two logical partitions to search idle blocks capable of storing the data, wherein the idle blocks are areas which are not stored with records in the logical partitions and have continuous offset ranges;

a data storage module for storing the data into the free blocks to generate new records;

a first index information recording module, configured to record index information between the record and an offset range occupied by the record in the free block in the index file;

and the first idle block updating module is used for updating the offset range of the idle block mapping into the offset range which is not occupied by the record.

Optionally, the first idle block searching module includes:

a first reference block determining submodule, configured to determine a largest idle block in the logical partition as a reference block;

a first logical partition block determining submodule, configured to determine that an idle block capable of storing the data is present in the logical partition if the length of the data is less than or equal to the size of the reference block;

a first storage condition determining submodule, configured to determine an idle block in the logical partition, where the idle block meets a preset storage condition, where the storage condition is that a size of the idle block is greater than a length of the data, and a difference between the size of the idle block and the length of the data is minimum;

and the second logical partition block determining submodule is used for determining that the logical partition does not have a free block capable of storing the data if the length of the data is larger than the size of the reference block.

Optionally, the apparatus further comprises:

the updating operation receiving module is used for receiving the updating operation acted on a record;

the record updating module is used for updating the record according to the updating operation so as to determine an original record and a new record, wherein the original record is a record before updating, and the new record is a record after updating;

the first new record storage module is used for storing the new record in the offset range occupied by the original record if the length of the new record is smaller than or equal to the length of the original record;

a second index information recording module, configured to record index information between the new record and an offset range occupied by the new record in an offset range of the original record in the index file;

the free block newly-adding module is used for newly adding a free block, and the free block is mapped to an offset range which is not occupied by the new record;

the second idle block searching module is used for traversing the at least two logical partitions to search the idle blocks capable of storing the new record if the length of the new record is greater than that of the original record;

the second new record storage module is used for storing the new record into the idle block;

a third index information recording module, configured to record index information between the new record and an offset range occupied by the new record in the free block in the index file;

a second idle block updating module, configured to update an offset range mapped by the idle block to an offset range not occupied by the new record;

the first index information deleting module is used for deleting index information between the original record and the offset range occupied by the original record in the index file;

and the first idle block generation module is used for determining the offset range occupied by the original record to generate a new idle block.

Optionally, the second idle block search module includes:

a second reference block determining sub-module, configured to determine a largest idle block in the logical partition as a reference block;

a third logical partition block determining submodule, configured to determine that an idle block capable of storing the new record is present in the logical partition if the length of the new record is less than or equal to the size of the reference block;

a second storage condition determining submodule, configured to determine, in the logical partition, an idle block that meets a preset storage condition, where the storage condition is that a size of the idle block is greater than a length of the new record, and a difference between the size of the idle block and the length of the new record is minimum;

and the fourth logical partition block determining submodule is used for determining that the logical partition does not have a free block capable of storing the new record if the length of the new record is larger than the size of the reference block.

Optionally, the apparatus further comprises:

the deleting operation receiving module is used for receiving deleting operation acted on a record;

the second index information deleting module is used for deleting index information between the record and the offset range occupied by the record in the index file according to the deleting operation;

and the second idle block generating module is used for determining the offset range to generate a new idle block.

Optionally, the apparatus further comprises:

the storage characteristic value statistics module is used for counting the storage characteristic values of the idle blocks in the logical partition;

the file expansion module is used for expanding the offset range of the file if the storage characteristic value accords with a preset expansion condition;

a logical partition newly-adding module, configured to newly-add a logical partition to the offset range.

Optionally, the apparatus further comprises:

the record caching module is used for caching the records in the logic partition into a memory;

the record rewriting module is used for writing the record in the memory into the logic partition so that the record occupies a continuous offset range in the logic partition;

a fourth index information recording module, configured to record, in the index file, the record, and index information between the offset ranges occupied by the record in the logical partition;

and the third idle block generating module is used for determining that the offset range of the record which is not stored in the logic partition generates a new idle block.

Optionally, the apparatus further comprises:

the migration partition determining module is used for determining a first partition and a second partition, wherein the first partition is the logical partition into which records are to be migrated, and the second partition is the logical partition from which records are to be migrated;

the record migration module is used for reading all records in the second partition;

the record migration module is used for writing the record into the idle block of the first partition;

an index information updating module, configured to update, in the index file, index information between the record and an offset range occupied by the record in the free block;

a fourth idle block generating module, configured to determine a target offset range to generate a new idle block, where the target offset range includes an offset range occupied by the record in the second partition;

the memory mapping cancellation module is used for canceling the mapping of the logical partition to the memory if the record is not stored in the logical partition;

and the logical partition canceling module is used for narrowing the offset range corresponding to the logical partition in the file so as to cancel the logical partition.

Optionally, the apparatus further comprises:

the query operation receiving module is used for receiving a query operation;

the offset range query module is used for determining an offset range expressed by a record to be queried in the index file according to the query operation;

and the record reading module is used for reading the record in the offset range.

In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for processing data when executing the program.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of processing data.

In the embodiment of the invention, the file is divided into at least two logical partitions and each logical partition is independently mapped to memory. Records in a logical partition can be operated on independently, without first loading the whole file into memory, which reduces resource usage; operations on different logical partitions do not affect one another, and local records of a large file can be processed flexibly.

In addition, because free blocks are maintained in the logical partitions, a free block can be flexibly selected for adding data, operations such as data erasure can be reduced, and data storage efficiency is improved.

Drawings

FIG. 1 is a flowchart of a method for processing data according to a first embodiment of the present invention;

FIG. 2 is a diagram illustrating mapping of logical partitions to memory according to a first embodiment of the present invention;

FIG. 3 is an exemplary diagram of a B+ tree data structure according to a first embodiment of the present invention;

FIG. 4 is a flowchart of a data processing method according to a second embodiment of the present invention;

FIG. 5 is a flowchart of a data processing method according to a third embodiment of the present invention;

FIG. 6 is a flowchart of a data processing method according to a fourth embodiment of the present invention;

FIG. 7 is a flowchart of a data processing method according to a fifth embodiment of the present invention;

FIGS. 8A-8E are exemplary diagrams of a partition defragmentation and a partition compression according to a fifth embodiment of the invention;

FIG. 9 is a flowchart of a data processing method according to a sixth embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a data processing device according to a seventh embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a computer device according to an eighth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

FIG. 1 is a flowchart of a data processing method according to a first embodiment of the present invention. The method is applicable to adding records to a file and may be performed by a data processing apparatus, which may be configured in a computer device such as a server or a workstation. The file is stored in a database of the computer device and is used to store service-related records, such as website logs.

As shown in FIG. 2, the file may be divided by size into at least two logical partitions. Data in different logical partitions do not affect each other, and each logical partition has its own offset range.

For example, logical partition 1 has an offset in the range of 0-1073741824, i.e., 0-1GByte, and logical partition 2 has an offset in the range of 1073741824-2147483648, i.e., 1GByte-2GByte.
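As a minimal sketch of this size-based division, the following computes the offset range of each logical partition. The 1 GB partition size matches the example above; the function names and the half-open `(start, end)` convention are assumptions made for illustration.

```python
PARTITION_SIZE = 1 << 30  # 1073741824 bytes, as in the example above


def partition_ranges(file_size, partition_size=PARTITION_SIZE):
    """Return the half-open (start, end) offset range of every logical partition."""
    ranges = []
    start = 0
    while start < file_size:
        end = min(start + partition_size, file_size)
        ranges.append((start, end))
        start = end
    return ranges


def partition_of(offset, partition_size=PARTITION_SIZE):
    """Return the 1-based number of the logical partition containing `offset`."""
    return offset // partition_size + 1
```

For a 2 GB file, `partition_ranges` yields the two ranges given in the example, and an offset of exactly 1073741824 belongs to logical partition 2.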

At least two logical partitions are independently mapped to memory using memory mapped files (Memory Mapped File, MMF) to manipulate records in the logical partitions.

A memory-mapped file is a mapping from a file into a process's address space. When an application operates on data in the mapped region, it does so much as it would on a local array, without issuing explicit I/O (input/output) operations on the file; the system is responsible for synchronizing the mapped records back to the file, so efficiency is high.

In the embodiment of the invention, once a logical partition is mapped to memory, the records in that logical partition can be operated on independently, without first loading the whole file into memory. Operations on different logical partitions, including adding, deleting, modifying, querying, and defragmenting records, do not affect one another.

In addition, dividing the file into logical partitions of suitable size and number makes defragmentation more efficient: several logical partitions can be defragmented at the same time without affecting one another, so concurrency is high.

In general, the size of a single memory-mapped region is limited. Dividing the file into a plurality of logical partitions and mapping each of them separately therefore not only makes operating on records more efficient but also makes processing local records of a large file more flexible.
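To illustrate mapping a single logical partition rather than the whole file, the sketch below uses Python's standard `mmap` module. The partition size, the file name `records.dat`, and the helper names are assumptions chosen for the demonstration; a real deployment would use far larger partitions.

```python
import mmap
import os
import tempfile

# Hypothetical demo partition size. The mapping offset must be a multiple of
# mmap.ALLOCATIONGRANULARITY, so the partition size is chosen accordingly.
PARTITION_SIZE = mmap.ALLOCATIONGRANULARITY * 4


def map_partition(fileobj, number, partition_size=PARTITION_SIZE):
    """Map only logical partition `number` (0-based) of the file into memory."""
    return mmap.mmap(fileobj.fileno(), partition_size,
                     access=mmap.ACCESS_WRITE,
                     offset=number * partition_size)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.dat")
        with open(path, "wb") as f:
            f.truncate(2 * PARTITION_SIZE)   # a file with two logical partitions
        with open(path, "r+b") as f:
            part = map_partition(f, 1)       # map the second partition only
            part[0:5] = b"hello"             # write as if into a local array
            part.flush()                     # ask the system to sync to the file
            part.close()
        with open(path, "rb") as f:
            f.seek(PARTITION_SIZE)           # start of the second partition
            return f.read(5)
```

Offsets inside the mapping are relative to the start of the partition, so offset 0 of the mapped region corresponds to file offset `PARTITION_SIZE` here.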

As shown in fig. 1, the method specifically includes the following steps:

S101, receiving data to be stored.

The application generates new data according to the business requirements, such as generating a new website log, and writes the new data into a file of the database to store the new data in the file.

S102, traversing the at least two logical partitions to find a free block capable of storing the data.

In the embodiment of the invention, the free block is an area which does not store records in the logic partition and has continuous offset range, and can be used for storing new records.

Note that "not storing a record" may mean either that no data is stored in the area, or that data is stored there but is no longer valid.

Optionally, a free block linked list may be set for each logical partition, in which the offset ranges of the free blocks in the logical partition are recorded.

For example, in a logical partition, the offset ranges of 0-1024, 3072-7168, 8192-10534, etc. store records, and at this time, there are free blocks in the offset ranges of 1024-3072, 7168-8192, etc.
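The relationship between occupied ranges and free blocks in this example can be sketched as follows. Ranges are treated as half-open `(start, end)` pairs, and the function name `free_blocks` and the `partition_end` parameter are assumptions for illustration.

```python
def free_blocks(occupied, partition_end):
    """Return the gaps left between occupied offset ranges of one partition."""
    gaps = []
    cursor = 0
    for start, end in sorted(occupied):
        if start > cursor:
            gaps.append((cursor, start))   # unoccupied range before this record
        cursor = max(cursor, end)
    if cursor < partition_end:
        gaps.append((cursor, partition_end))
    return gaps
```

With the occupied ranges from the example and a partition ending at 10534, the result is the two free blocks 1024-3072 and 7168-8192.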

After receiving the data to be stored, the linked list of free blocks may be checked logical partition by logical partition to determine if there are free blocks of a size greater than or equal to the length of the data.

If so, the free block is pre-allocated for storing the data, and the state of the free block is changed to "pre-allocated".

In a preferred embodiment of the present invention, S102 includes the steps of:

S1021, determining the largest idle block in the logical partition as a reference block.

S1022, if the length of the data is smaller than or equal to the size of the reference block, determining that the logical partition has an idle block capable of storing the data.

In the free block linked list, the free blocks may be sorted by size according to business requirements. Because the free blocks in the free block linked list are sorted, the reference block can be determined from the ordering.

For example, if the free blocks in the free block list are ordered from small to large in size, the last free block in the free block list is the reference block.

When checking a logical partition, the length of the data is compared with the size of the reference block.

If the length of the data is smaller than or equal to the size of the reference block, determining that the logical partition has free blocks capable of storing the data, and continuing to search for proper free blocks from the free block linked list.

S1023, determining idle blocks meeting preset storage conditions in the logical partition.

The storage condition is that the size of the free block is larger than the length of the data, and the difference between the size of the free block and the length of the data is minimum.

For example, assume that the length of the data is 70 and that the sizes of the free blocks in the free block linked list of a certain logical partition are [10, 150, 75, 30, 90, 120, 45, 100]. A free block with a size greater than or equal to 70 is allocated from the logical partition to store the data.

If the list is sorted, the free block of size 75 is selected, so that the storage space is utilized more fully.

S1024, if the length of the data is larger than the size of the reference block, determining that the logical partition does not have a free block capable of storing the data.

If the length of the data is greater than the size of the reference block, it is determined that the logical partition has no free block capable of storing the data, and the next logical partition is checked.
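Steps S1021 to S1024 can be sketched as a best-fit search over the logical partitions. In this illustration each partition is reduced to a list of free block sizes, the name `find_free_block` is hypothetical, and a block whose size equals the data length is accepted, following the "greater than or equal to" wording of the example above.

```python
def find_free_block(partitions, length):
    """Find the best-fitting free block for `length` bytes.

    partitions: list of free-block size lists, one list per logical partition.
    Returns (partition_index, block_size), or None if no partition can hold it.
    """
    for number, sizes in enumerate(partitions):
        if not sizes:
            continue
        reference = max(sizes)        # S1021: the largest free block
        if length > reference:        # S1024: nothing fits, check next partition
            continue
        # S1022/S1023: a block fits; choose the one leaving the least remainder.
        best = min(size for size in sizes if size >= length)
        return number, best
    return None
```

For data of length 70 and the sizes from the example, the block of size 75 is chosen. If the sizes are kept sorted, the reference block is simply the last element and the best fit can be found with a binary search instead of a scan.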

S103, storing the data into the idle block to generate a new record.

After a suitable free block is found, the data is written into the offset range represented by the free block and becomes a new record in the file.

In general, the data is written from the initial offset of the offset range indicated by the free block, so that the data and the adjacent record are continuous in the offset range, the number of free blocks is reduced, and the utilization rate of the storage space is improved.

In addition, if no suitable free block is found in any of the logical partitions, the data may be stored at the end of the file and an expansion request generated, requesting that the logical partitions be expanded, after which a suitable free block is searched for again.

S104, recording index information between the record and the offset range occupied by the record in the idle block in the index file.

In the embodiment of the invention, the at least two logical partitions have index files. After the data is stored, the index information in the index file can be updated; the index information includes the number of the logical partition, the offset range, the length of the record, and the like.

Optionally, the index file is a B+ tree data structure, which includes leaf nodes (all nodes of the third layer shown in FIG. 3) and non-leaf nodes (the root node [10] of the first layer and the nodes [5] and [14,19] of the second layer shown in FIG. 3).

It should be noted that the numbers in fig. 3 represent unique identifiers of records, such as IDs of records.

The non-leaf nodes are used for storing the reference information of the leaf nodes, and not storing record data, so that a single non-leaf node can store more reference information of the leaf nodes, the height of the B+ tree data structure is lower, and when index information in the leaf nodes is queried, the number of layers is fewer, and the query efficiency is higher.

When a record of the file is read, based on the principle of spatial locality, the records following it are also read in as part of the same disk block, ready for the application to use. Because the non-leaf nodes of the B+ tree data structure do not store record data, they are smaller, so one disk block can hold more non-leaf nodes. When index information is queried, the required information can be located by reading fewer disk blocks, which reduces the number of I/O operations.

The leaf nodes are used for storing the recorded index information, and the path length from the root node to the leaf nodes is the same when the index information is queried each time, so that the efficiency of each query is very close, and the stability of the query efficiency is ensured.

Each leaf node may record the reference information (also called a pointer) of the next leaf node, and all the leaf nodes are combined to form an ordered linked list. When the range is queried, the linked list is traversed transversely, and the query efficiency is higher.
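The lookup and range-query behaviour described above can be sketched with a small, statically built B+ tree. The non-leaf keys [10], [5], and [14,19] follow FIG. 3; the leaf contents, the index information layout `(partition, start, end)`, and all class and function names are hypothetical. Insertion and node splitting are omitted.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, entries):
        self.entries = entries   # {record_id: index information}
        self.next = None         # reference to the next leaf node


class Inner:
    def __init__(self, keys, children):
        self.keys = keys         # separator keys only, no record data
        self.children = children


def find_leaf(node, key):
    """Descend from the root to the leaf that may contain `key`."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    return find_leaf(root, key).entries.get(key)


def range_scan(root, low, high):
    """Yield (record_id, index information) for low <= record_id <= high."""
    leaf = find_leaf(root, low)
    while leaf is not None:
        for key in sorted(leaf.entries):
            if key > high:
                return
            if key >= low:
                yield key, leaf.entries[key]
        leaf = leaf.next         # traverse the leaf linked list sideways


def build_example():
    # Hypothetical record IDs; index information is (partition, start, end).
    groups = [[1, 3], [5, 8], [10, 12], [14, 17], [19, 25]]
    leaves = [Leaf({k: (1, k * 100, k * 100 + 50) for k in g}) for g in groups]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Inner([10], [Inner([5], leaves[:2]), Inner([14, 19], leaves[2:])])
```

Every lookup passes through exactly two non-leaf nodes before reaching a leaf, which reflects the equal path length noted above, and a range query descends once and then follows the `next` references.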

During partition defragmentation, the leaf-node linked list of the B+ tree is particularly well suited to querying the index information under a given logical partition.

S105, updating the offset range of the free block mapping into an offset range which is not occupied by the record.

In a particular implementation, allocation of new data to the free blocks may be confirmed, thereby generating a record.

In one case, if the length of the record is equal to the size of the free block, the free block is deleted in the free block list.

In another case, if the length of the record is smaller than the size of the free block, determining an unoccupied offset range in the offset range mapped by the original free block, and updating the unoccupied offset range into the free block linked list.
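The two cases of S105 can be sketched as follows, assuming half-open `(start, end)` offset ranges, a Python list standing in for the free block linked list, and a hypothetical function name `allocate`. The record is written from the start offset of the free block, as described for S103.

```python
def allocate(free_list, block, length):
    """Place a record of `length` bytes at the start of `block`.

    Removes the block from the free list, re-adds any unoccupied remainder,
    and returns the offset range now occupied by the record.
    """
    start, end = block
    if length > end - start:
        raise ValueError("record does not fit in the free block")
    free_list.remove(block)
    if start + length < end:
        # Record is smaller than the block: keep the unoccupied range free.
        free_list.append((start + length, end))
        free_list.sort(key=lambda b: b[1] - b[0])   # keep ordered by size
    return (start, start + length)
```

When the record exactly fills the block, nothing is re-added, which corresponds to deleting the free block from the linked list.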

Example two

Fig. 4 is a flowchart of a data processing method according to a second embodiment of the present invention, where the present embodiment is based on the foregoing embodiment, and further adds a record updating operation. The method specifically comprises the following steps:

S401, receiving an update operation acting on a record.

The application program updates a record stored in the file according to service requirements, thereby triggering an update operation for the record.

S402, updating the record according to the updating operation so as to determine an original record and a new record.

In the embodiment of the invention, corresponding records are updated in response to the updating operation, and at this time, the original records and the new records are determined.

The original record is a record before update, and the new record is a record after update.

S403, if the length of the new record is smaller than or equal to the length of the original record, storing the new record in the offset range occupied by the original record.

S404, recording index information between the new record and an offset range occupied by the new record in the offset range of the original record in the index file.

S405, adding a free block.

In a specific implementation, the length of the updated new record may be determined, and the length of the new record may be compared with the length of the original record.

If the length of the new record is less than or equal to the length of the original record, the new record can be directly written in the original storage position (the offset range expressed by the original record).

Typically, the new record is written starting from the start offset of the offset range represented by the original record.

In the index file, index information of the record is updated, including the number of the logical partition, the offset range, the length of the record, and the like.

Further, if the length of the new record is smaller than the length of the original record, a new free block is generated after the index information of the new record is updated in the index file, the free block is mapped to the offset range not occupied by the new record, and at this time, the new free block can be added into the free block linked list.

It should be noted that if the original record is continuous with other idle blocks in the offset range, so that the new idle block is continuous with other idle blocks in the offset range, the new idle block may be combined with other idle blocks, and the idle block linked list may be updated accordingly.
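The in-place update of S403-S405, including the merge of adjacent free blocks, can be sketched as below. The sketch assumes a single logical partition held in a `bytearray`, half-open `(start, end)` offset ranges, and a Python list in place of the free block linked list; `release` and `update_in_place` are hypothetical names.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open offset range [start, end), an assumed model


def release(free_list: List[Range], start: int, end: int) -> None:
    """Add [start, end) to the free list and merge blocks that touch each other."""
    if start >= end:
        return  # nothing was freed (new record has the same length)
    free_list.append((start, end))
    free_list.sort()
    merged = [free_list[0]]
    for s, e in free_list[1:]:
        prev_start, prev_end = merged[-1]
        if s <= prev_end:                         # contiguous offset ranges
            merged[-1] = (prev_start, max(prev_end, e))
        else:
            merged.append((s, e))
    free_list[:] = merged


def update_in_place(buf: bytearray, original: Range, new: bytes,
                    free_list: List[Range]) -> Range:
    """Overwrite the original record from its start offset and free the tail."""
    start, end = original
    if len(new) > end - start:
        raise ValueError("new record is longer than the original record")
    buf[start:start + len(new)] = new
    release(free_list, start + len(new), end)
    return (start, start + len(new))              # range to store in the index file
```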

And S406, if the length of the new record is greater than that of the original record, traversing the at least two logical partitions to search for a free block capable of storing the new record.

If the length of the new record is greater than that of the original record, the new record cannot be written in the original storage position (the offset range expressed by the original record). In this case, the free block linked lists can be checked one logical partition at a time to judge whether there is a free block whose size is greater than or equal to the length of the new record.

If so, the free block is pre-allocated for storing the new record, at which point the state of the free block is modified to "pre-allocation".

In a preferred embodiment of the present invention, S406 includes the steps of:

S4061, determining the largest idle block in the logical partition as a reference block.

S4062, if the length of the new record is smaller than or equal to the size of the reference block, determining that the logical partition has a free block capable of storing the new record.

S4063, determining an idle block meeting preset storage conditions in the logical partition.

Wherein the storage condition is that the size of the free block is larger than the length of the new record and the difference between the size of the free block and the length of the new record is minimal.

S4064, if the length of the new record is greater than the size of the reference block, determining that the logical partition does not have a free block capable of storing the new record.

In the embodiment of the present invention, S4061-S4064 are applied in substantially the same way as S1021-S1024, so they are described only briefly here; for the relevant details, refer to the description of S1021-S1024 in the foregoing method embodiment.
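The per-partition search of S4061-S4064 can be sketched as follows. The sketch assumes that each logical partition's free blocks are kept as a list of half-open `(start, end)` ranges keyed by partition number, and it treats an exact fit as acceptable, consistent with the "greater than or equal" wording used for the free block check; `find_free_block` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # half-open offset range [start, end), an assumed model


def find_free_block(partitions: Dict[int, List[Range]],
                    length: int) -> Optional[Tuple[int, Range]]:
    """Return (partition number, free block) for the tightest fit, or None."""
    for pid, free_list in partitions.items():
        if not free_list:
            continue
        reference = max(e - s for s, e in free_list)   # S4061: largest free block
        if length > reference:                          # S4064: nothing fits here
            continue
        # S4062/S4063: a block fits; pick the one with the smallest leftover space
        fitting = [b for b in free_list if b[1] - b[0] >= length]
        return pid, min(fitting, key=lambda b: b[1] - b[0])
    return None
```

Comparing against the reference block first lets a partition be skipped with one comparison when even its largest free block is too small.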

S407, storing the new record into the idle block.

After a suitable free block is queried, a new record is written into the offset range represented by the free block, thereby realizing updating.

In general, the new record starts writing from the initial offset of the offset range indicated by the free block, so that the new record and the adjacent record are continuous in the offset range, the number of free blocks is reduced, and the utilization rate of the storage space is improved.

In addition, if no suitable free block is found in all the logical partitions, the new record may be stored at the end of the file to generate an expansion request requesting to expand the logical partition and re-find the suitable free block.

S408, recording index information between the new record and the offset range occupied by the new record in the idle block in the index file.

After storing the new record, index information in the index file may be updated, including the number of the logical partition, the offset range, the length of the new record, and so on.

S409, updating the offset range of the free block mapping to the offset range not occupied by the new record.

In a particular implementation, allocation of a new record to a free block may be confirmed.

In one case, if the length of the new record is equal to the size of the free block, the free block is deleted from the free block linked list.

In another case, if the length of the new record is smaller than the size of the free block, determining an offset range which is not occupied by the new record in the offset range mapped by the original free block, and updating the offset range which is not occupied by the new record into the free block linked list.

S410, deleting index information between the original record and the offset range occupied by the original record in the index file.

S411, determining the offset range occupied by the original record to generate a new idle block.

For an original record, the index information of the original record may be deleted in the index file.

It should be noted that the original record is not erased, but is in a disabled state, and the offset range of the original record in the file becomes a new free block, and the free block is added to the free block linked list.

When other records are manipulated (e.g., added, updated), the offset range may be allocated to cover the record, thereby reusing storage space.

In the embodiment of the invention, the original record or the new record can be flexibly selected to be stored in the idle block, the operations such as erasing the record can be reduced, and the update efficiency of the record can be improved.
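The relocation path of S406-S411 can be sketched end to end as below. The sketch assumes an in-memory index of the form `{key: (partition, start, length)}`, one `bytearray` per logical partition, and per-partition lists of half-open free ranges; for brevity it takes the first free block that fits rather than the best fit, and it omits merging of adjacent free blocks. The name `relocate` is hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open offset range [start, end), an assumed model


def relocate(key: str, new: bytes,
             index: Dict[str, Tuple[int, int, int]],
             free: Dict[int, List[Range]],
             bufs: Dict[int, bytearray]) -> bool:
    """Move an updated record that no longer fits in its original offset range."""
    old_pid, old_start, old_len = index[key]
    for pid, blocks in free.items():
        for i, (s, e) in enumerate(blocks):
            if e - s < len(new):
                continue
            bufs[pid][s:s + len(new)] = new           # S407: store the new record
            index[key] = (pid, s, len(new))           # S408 and S410: replace index info
            if e - s == len(new):
                del blocks[i]                         # S409: exact fit
            else:
                blocks[i] = (s + len(new), e)         # S409: shrink the free block
            # S411: the original bytes are not erased; their range simply becomes free
            free[old_pid].append((old_start, old_start + old_len))
            return True
    return False  # no free block found; the caller may request an expansion
```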

Example three

Fig. 5 is a flowchart of a data processing method according to a third embodiment of the present invention, where the record deletion operation is further added based on the foregoing embodiment. The method specifically comprises the following steps:

S501, receiving a deleting operation acting on a record.

The application program deletes the record stored in the file according to the service requirement, so as to trigger the deletion operation for the record.

S502, according to the deleting operation, deleting index information between the record and the offset range occupied by the record in the index file.

S503, determining the offset range to generate a new idle block.

In response to the delete operation, the index information of the record may be deleted from the index file.

It should be noted that the record is not erased but is placed in a disabled state; the offset range of the record in the file becomes a new free block, and the free block is added to the free block linked list.
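A minimal sketch of S502-S503 follows, under the same illustrative assumptions as before: an in-memory index `{key: (partition, start, length)}` and per-partition lists of half-open free ranges. The name `delete_record` is hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open offset range [start, end), an assumed model


def delete_record(key: str,
                  index: Dict[str, Tuple[int, int, int]],
                  free: Dict[int, List[Range]]) -> Tuple[int, Range]:
    """Drop the index entry and turn the record's offset range into a free block."""
    pid, start, length = index.pop(key)                 # S502: delete index information
    block = (start, start + length)
    free.setdefault(pid, []).append(block)              # S503: new free block
    return pid, block                                   # the stored bytes are left untouched
```

Since only the index and the free list change, the cost of a delete does not depend on the length of the record.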

Example four

Fig. 6 is a flowchart of a data processing method according to a fourth embodiment of the present invention, where the partition expansion operation is further added based on the foregoing embodiment. The method specifically comprises the following steps:

S601, counting storage characteristic values of idle blocks in the logical partition.

In a specific implementation, at intervals or when an expansion request is received, a linked list of idle blocks of each logical partition is queried, and a storage characteristic value of the idle blocks in the logical partition is counted, wherein the storage characteristic value is used for representing the characteristics of record storage.

In one example, the storage characteristic value includes a total value of the sizes of the free blocks and/or a number of characteristic free blocks, wherein the size of a characteristic free block is greater than a preset first threshold.

S602, if the storage characteristic value meets a preset expansion condition, expanding an offset range for the file.

S603, adding a logic partition to the offset range.

By applying the embodiment of the invention, the expansion condition can be preset for the storage characteristic value and used for representing the condition of expanding the logic partition.

In one example, the extension condition includes the total value being less than a preset second threshold and/or the number being less than a preset third threshold.

The storage characteristic value is compared with the expansion condition. If the storage characteristic value meets the expansion condition, a file operation function is called to expand the offset range of the file, the expanded offset range is set as a new logical partition, and the new logical partition is mapped by means of a memory-mapped file.
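The expansion check and the file extension can be sketched as follows. This is only one possible reading: the "and/or" of the expansion condition is implemented as a logical "or", the partition size is assumed to equal `mmap.ALLOCATIONGRANULARITY` so that every partition offset is a valid mapping offset, and `needs_expansion`, `add_partition` and `PART_SIZE` are hypothetical names. `os.ftruncate` stands in for the unspecified file operation function.

```python
import mmap
import os
from typing import List, Tuple

PART_SIZE = mmap.ALLOCATIONGRANULARITY  # assumed partition size; keeps offsets aligned


def needs_expansion(free_lists: List[List[Tuple[int, int]]],
                    first: int, second: int, third: int) -> bool:
    """S601/S602: evaluate the storage characteristic values against thresholds."""
    sizes = [e - s for blocks in free_lists for s, e in blocks]
    total = sum(sizes)                              # total size of all free blocks
    big = sum(1 for n in sizes if n > first)        # number of characteristic free blocks
    return total < second or big < third


def add_partition(fd: int, maps: List[mmap.mmap]) -> Tuple[int, int]:
    """S602/S603: grow the file by one partition and map the new range on its own."""
    old_size = os.fstat(fd).st_size
    os.ftruncate(fd, old_size + PART_SIZE)          # expand the offset range of the file
    maps.append(mmap.mmap(fd, PART_SIZE, offset=old_size))
    return old_size, old_size + PART_SIZE
```

Mapping each partition separately means that adding a partition leaves the existing mappings untouched.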

Example five

Fig. 7 is a flowchart of a data processing method according to a fifth embodiment of the present invention, where the present embodiment is based on the foregoing embodiment, and further adds a partition defragmentation operation and a partition compression operation. During use of a file, addition, update and deletion operations leave a large number of scattered free blocks (i.e. fragments) in a logical partition, which hinders the storage of larger records. The partition defragmentation operation consolidates the scattered free blocks into one large free block, and the partition compression operation migrates records between logical partitions, further saving storage space. The method specifically comprises the following steps:

S701, caching the record in the logic partition into a memory.

S702, writing the record in the memory into the logic partition, so that the record occupies a continuous offset range in the logic partition.

S703, recording the record and index information between the offset ranges occupied by the record in the logical partition in the index file.

S704, determining the offset range in the logical partition in which no record is stored, to generate a new idle block.

When a defragmentation operation is carried out in a single logical partition, the valid records are migrated toward the initial offset of the logical partition and stored compactly together, so that their offset ranges are continuous and no idle block remains between any two records, while the scattered idle blocks are migrated toward the cut-off offset of the logical partition and combined into one complete idle block.

After the defragmentation is completed, the free blocks of the logical partition can store more new records, thereby improving the utilization rate of the storage space.

In a particular implementation, exclusive locks may be added to the logical partition to be defragmented to prevent the business layer from operating on records in the logical partition.

The index file (such as the leaf nodes of the B+ tree data structure) is traversed to query the index information of all records under the logical partition.

The records are read from the logical partition according to the index information and cached in the memory.

Records cached in memory are written back to the logical partition in sequential batches, typically starting from the partition's starting offset.

The index information of the records is updated in batches, and the records cached in the memory are deleted.

After all writing of the record is completed, the record migration is completed, and a complete idle block is formed between the cut-off offset of the last record and the cut-off offset of the logic partition.

The elements in the idle block linked list of the logical partition are cleared, and the complete idle block is added to the idle block linked list.

So far, the defragmentation operation of the logical partition is completed, and the exclusive lock of the logical partition is released.

In one example, as shown in FIG. 8A, there are 3 logical partitions in a certain file. A certain logical partition holds multiple records (record 1, record 2, record 3, record 4, ...) and two free blocks (free block 1, free block 2). During defragmentation, the records are migrated in sequence toward the initial offset of the logical partition, and the free blocks are migrated toward the cut-off offset of the logical partition, as indicated by the arrows, and combined into one complete free block.

As shown in fig. 8B, 3 logical partitions (logical partition 1, logical partition 2, logical partition 3) in the file are defragmented, and each logical partition forms a storage area for storing records with a continuous offset range, and a complete free block.
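The defragmentation of a single logical partition (S701-S704) can be sketched as below. The sketch assumes the partition is a `bytearray`, the index for that partition is a dictionary `{key: (start, length)}`, and the exclusive lock and batched write-back are omitted; `defragment` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def defragment(buf: bytearray, index: Dict[str, Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Compact all records toward offset 0 and return the new free list."""
    # S701: cache the valid records in memory, ordered by their current start offset
    cached = [(key, bytes(buf[start:start + length]))
              for key, (start, length) in sorted(index.items(), key=lambda kv: kv[1][0])]
    cursor = 0
    for key, data in cached:
        buf[cursor:cursor + len(data)] = data      # S702: write back contiguously
        index[key] = (cursor, len(data))           # S703: record the new offset range
        cursor += len(data)
    # S704: everything after the last record is one complete free block
    return [(cursor, len(buf))] if cursor < len(buf) else []
```

For a 16-byte partition holding records at offsets 0, 4 and 10, the call leaves the records packed in the first 7 bytes and returns the single free block `(7, 16)`.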

S705, determining the first partition and the second partition.

By applying the embodiment of the invention, a compression condition can be preset. For example, for two adjacent logical partitions, the condition is that the total length of the records in the latter logical partition is smaller than a threshold, where the threshold is a specified proportion of the size of the idle block in the former logical partition (the proportion can be configured according to service requirements).

If the compression condition is satisfied, the first partition and the second partition may be selected.

The first partition is the logical partition into which records are to be migrated, and the second partition is the logical partition from which records are to be migrated.

Typically, to increase the compression efficiency, records in a subsequent logical partition may be sequentially migrated to free blocks in a previous logical partition.

Thus, for a logical partition, in some cases the logical partition may be a first partition, and in some cases the logical partition may be a second partition.

For example, as shown in fig. 8B, when a record in logical partition 2 is migrated to logical partition 1, logical partition 2 is a second partition, logical partition 1 is a first partition, and when a record in logical partition 3 is migrated to logical partition 2, logical partition 3 is a second partition, and logical partition 2 is a first partition.

S706, reading all records in the second partition.

S707, writing the record into the idle block of the first partition.

S708, updating, in the index file, the index information between the record and the offset range occupied by the record in the free block.

S709, determining a target offset range to generate a new idle block, wherein the target offset range comprises an offset range occupied by the record in the second partition.

In the embodiment of the invention, the size of the idle block of the first partition is larger than the total length of all records in the second partition, so that all records of the second partition can be read out according to the index information and then written into the idle block of the first partition, and then the index information of the records and the idle block linked lists of the first partition and the second partition are updated.
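The compression condition and the migration of S706-S709 can be sketched as follows. The sketch assumes both partitions have already been defragmented, so each has a single trailing free block, and it models a partition as a dictionary with the hypothetical fields `buf`, `index` (`{key: (start, length)}`) and `free` (one half-open range); record keys are assumed to be unique across partitions.

```python
from typing import Any, Dict


def can_compress(first_free_size: int, second_total: int, ratio: float) -> bool:
    """Compression condition: records of the latter partition fit within the quota."""
    return second_total < ratio * first_free_size


def migrate(first: Dict[str, Any], second: Dict[str, Any]) -> None:
    """Move every record of the second partition into the first partition's free block."""
    cursor, end = first["free"]
    need = sum(length for _, length in second["index"].values())
    if need > end - cursor:
        raise ValueError("free block of the first partition is too small")
    for key, (start, length) in sorted(second["index"].items(), key=lambda kv: kv[1][0]):
        first["buf"][cursor:cursor + length] = second["buf"][start:start + length]  # S706-S707
        first["index"][key] = (cursor, length)                                       # S708
        cursor += length
    second["index"].clear()
    first["free"] = (cursor, end)                  # remaining free space in the first partition
    second["free"] = (0, len(second["buf"]))       # S709: the second partition is now all free
```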

S710, if the logical partition does not store the record, canceling the mapping of the logical partition to the memory.

S711, reducing the offset range corresponding to the logical partition in the file so as to cancel the logical partition.

S705-S709 are repeated over all the logical partitions in the file until the migration of records in all the logical partitions is completed.

The logical partitions in the file may then be checked. If a logical partition stores no record, that is, the entire logical partition is one complete free block, the memory mapping of the logical partition may be canceled, and a file operation function may be called to shrink the offset range of the file, thereby canceling the logical partition and freeing disk space.

If the logical partitions in the file are migrated in order, a logical partition that stores no record is typically located at the end of the file.

In one example, as shown in FIG. 8C, where the record (total length) in logical partition 2 is less than the specified proportion of free blocks (size) in logical partition 1, the record in logical partition 2 may be migrated into the free blocks in logical partition 1 as indicated by the arrow.

As shown in fig. 8D, if the record (total length) in the logical partition 3 is smaller than the specified proportion of the free blocks (size) in the logical partition 2, the record in the logical partition 3 may be migrated into the free block in the logical partition 2 as indicated by the arrow.

At this point, logical partition 3 stores no record and is one complete free block.

As shown in fig. 8E, the memory mapping of the logical partition 3 is canceled, and the size of the file is compressed, so that the logical partition 3 is canceled, and the logical partition 1 and the logical partition 2 are reserved.
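Canceling empty partitions at the end of the file (S710-S711) can be sketched as below. The sketch assumes one `mmap` object per logical partition, a parallel list holding the number of records in each partition, and a fixed partition size; `os.ftruncate` stands in for the unspecified file operation function, and `drop_trailing_empty` is a hypothetical name. Whether a minimum number of partitions must be kept is not addressed here.

```python
import mmap
import os
from typing import List


def drop_trailing_empty(fd: int, maps: List[mmap.mmap],
                        record_counts: List[int], part_size: int) -> int:
    """Unmap and cut off every record-free partition at the end of the file."""
    while maps and record_counts[-1] == 0:
        maps.pop().close()                         # S710: cancel the memory mapping
        record_counts.pop()
        os.ftruncate(fd, len(maps) * part_size)    # S711: shrink the file's offset range
    return len(maps)                               # number of partitions that remain
```

Only trailing partitions are removed, which matches the observation that in-order migration leaves the empty partitions at the end of the file.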

Example six

Fig. 9 is a flowchart of a data processing method according to a sixth embodiment of the present invention, where the present embodiment is based on the foregoing embodiment, and further adds a record query operation. The method specifically comprises the following steps:

S901, receiving a query operation.

The application program queries a record stored in the file according to service requirements, thereby triggering a query operation for the record.

S902, according to the query operation, determining an offset range expressed by the record to be queried in the index file.

S903, reading the record in the offset range.

In the embodiment of the invention, in response to the query operation, the index information of the record is queried in the index file, so that the logical partition and the offset range thereof are determined.

The record is read out within the offset range in the logical partition.
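The query path of S902-S903 can be sketched as follows, assuming an in-memory index `{key: (partition, start, length)}` in place of the index file and one byte buffer per logical partition; `query` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


def query(key: str,
          index: Dict[str, Tuple[int, int, int]],
          partitions: Dict[int, bytearray]) -> Optional[bytes]:
    """Look up the record's partition and offset range, then read exactly that range."""
    entry = index.get(key)                 # S902: index information of the record
    if entry is None:
        return None                        # no index information: record absent or deleted
    pid, start, length = entry
    return bytes(partitions[pid][start:start + length])   # S903: read the offset range
```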

Example seven

Fig. 10 is a schematic structural diagram of a data processing apparatus according to a seventh embodiment of the present invention, where a file is divided into at least two logical partitions, the at least two logical partitions are mapped to a memory independently, and the at least two logical partitions have index files, and the apparatus specifically may include the following modules:

a data receiving module 1001, configured to receive data to be stored;

a first idle block searching module 1002, configured to traverse the at least two logical partitions to search an idle block capable of storing the data, where the idle block is an area in the logical partition where no record is stored and an offset range is continuous;

a data storage module 1003, configured to store the data into the free block to generate a new record;

a first index information recording module 1004, configured to record index information between the record and an offset range occupied by the record in the free block in the index file;

a first idle block update module 1005 is configured to update the offset range mapped by the idle block to an offset range not occupied by the record.

Optionally, the first idle block search module 1002 includes:

Optionally, the method further comprises:

Optionally, the second idle block search module includes:

Optionally, the method further comprises:

the query operation receiving module is used for receiving a query operation;

The data processing device provided in the embodiment of the present invention can implement each process in the method embodiments of fig. 1 to 9, and in order to avoid repetition, a description is omitted here.

Example eight

Fig. 11 is a schematic structural diagram of a computer device according to an eighth embodiment of the present invention. Fig. 11 illustrates a block diagram of an exemplary computer device 1100 suitable for use in implementing embodiments of the invention. The computer device 1100 shown in fig. 11 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in FIG. 11, the computer device 1100 is in the form of a general purpose computing device. Components of computer device 1100 may include, but are not limited to: one or more processors or processing units 160, a system memory 280, a bus 180 that connects the various system components, including the system memory 280 and the processing units 160.

Bus 180 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer device 1100 typically includes a variety of computer system readable media. Such media can be any available media that can be accessed by computer device 1100 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 280 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 300 and/or cache memory 320. The computer device 1100 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 340 may be used to read from or write to a non-removable, non-volatile magnetic media (not shown in FIG. 11, commonly referred to as a "hard disk drive"). Although not shown in fig. 11, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 180 through one or more data medium interfaces. Memory 280 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.

A program/utility 400 having a set (at least one) of program modules 420 may be stored, for example, in memory 280, such program modules 420 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 420 generally perform the functions and/or methods in the embodiments described herein.

The computer device 1100 may also communicate with one or more external devices 140 (e.g., keyboard, pointing device, display 240, etc.), one or more devices that enable a user to interact with the computer device 1100, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 1100 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 220. Moreover, the computer device 1100 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through the network adapter 200. As shown, network adapter 200 communicates with other modules of computer device 1100 via bus 180. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 1100, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processing unit 160 executes various functional applications and data processing by running programs stored in the system memory 280, for example, implementing the data processing method provided by the embodiment of the present invention.

Example nine

The ninth embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method.

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. A method for processing data, wherein a file is divided into at least two logical partitions, the at least two logical partitions being independently mapped to a memory, the at least two logical partitions having index files, the method comprising:

receiving data to be stored;

storing the data into the free block to generate a new record;

updating the offset range of the free block map to an offset range not occupied by the record;

adding a logical partition to the offset range;

the expansion condition comprises that the total value is smaller than a preset second threshold value and/or the number is smaller than a preset third threshold value;

the method further comprises the steps of:

receiving an update operation for a record;

storing the new record into the free block;

2. The method of claim 1, wherein traversing the at least two logical partitions to find a free block in which the data may be stored comprises:

3. The method of claim 1, wherein the index file is a B+ tree data structure comprising leaf nodes and non-leaf nodes, the non-leaf nodes being used to store reference information for leaf nodes, the leaf nodes being used to store the index information of the records.

4. A method according to any one of claims 1 to 3, wherein said traversing said at least two logical partitions to find free blocks in which said new record can be stored comprises:

5. A method according to any one of claims 1-3, further comprising:

receiving a delete operation on a record;

determining the offset range to generate a new free block.

6. A method according to any one of claims 1-3, further comprising:

receiving a query operation;

reading the record in the offset range.

7. A data processing apparatus, wherein a file is divided into at least two logical partitions, the at least two logical partitions being independently mapped to a memory, the at least two logical partitions having index files, the apparatus comprising:

the data receiving module is used for receiving the data to be stored;

the first idle block updating module is used for updating the offset range mapped by the idle block into the offset range not occupied by the record;

the apparatus further comprises:

8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of processing data according to any of claims 1-6 when executing the program.

9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a method of processing data according to any of claims 1-6.