CN115016728A - Data processing method and device

Info

Publication number: CN115016728A
Application number: CN202210482550.6A
Authority: CN
Inventors: 吴昊; 吴忠杰; 刘昌�
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-05-05
Filing date: 2022-05-05
Publication date: 2022-09-06

Abstract

An embodiment of the present specification provides a data processing method and apparatus. The data processing method includes: receiving a data write request submitted for a target file, wherein the data write request comprises data to be written; querying a free data block according to a free data block linked list stored in a file allocation table, and writing the data to be written into the free data block; establishing a mapping relation between the target file and the free data block based on the write result, and updating a file information storage table according to the mapping relation; and creating metadata corresponding to the write result of the data to be written, writing the metadata into a shared memory, and generating a data write log of the shared memory.

Description

Data processing method and device

Technical Field

The embodiment of the specification relates to the technical field of computers, in particular to a data processing method.

Background

As data volumes grow, large amounts of data must be stored on magnetic disks, and disk capacity keeps increasing to reduce storage cost. A magnetic disk, however, is a mechanical device whose performance is limited by its mechanics and does not increase with capacity. As a result, a key index of storage systems that use magnetic disks, namely performance per TB, keeps decreasing. An effective method is therefore needed to solve this problem.

Disclosure of Invention

In view of this, the embodiments of the present specification provide a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical deficiencies of the prior art.

According to a first aspect of embodiments herein, there is provided a data processing method including:

receiving a data write request submitted for a target file, wherein the data write request comprises data to be written;

querying a free data block according to a free data block linked list stored in a file allocation table, and writing the data to be written into the free data block;

establishing a mapping relation between the target file and the free data block based on a writing result, and updating a file information storage table according to the mapping relation;

and creating metadata corresponding to the writing result of the data to be written, writing the metadata into a shared memory, and generating a data writing log of the shared memory.

Optionally, the data processing method further includes:

and establishing a data block linked list corresponding to the target file according to the writing result.

Optionally, the data processing method further includes:

receiving a data reading request submitted for a target file;

determining a first data block of the target file according to the identification information of the target file contained in the data reading request and the mapping relation stored in the file information storage table;

querying a second data block of the target file according to the first data block and the data block linked list;

and reading the data to be read of the target file contained in the first data block and the second data block and returning.

Optionally, the querying a free data block according to a free data block linked list stored in a file allocation table, and writing the data to be written into the free data block includes:

determining the number of target free data blocks to be allocated according to the data volume of the data to be written;

querying free data blocks according to the free data block linked list stored in the file allocation table, and determining that number of free data blocks at the head of the free data block linked list as the target free data blocks, according to the sequential relation of the storage spaces of at least two free data blocks in the free data block linked list;

and allocating the target free data blocks to the data to be written, and writing the data to be written into the target free data blocks.

Optionally, the data processing method further includes:

receiving a data deletion request submitted for a target file, wherein the data deletion request comprises data to be deleted;

determining a target data block for storing the data to be deleted, deleting the data to be deleted in the target data block, and releasing the target data block;

and adding the target data block to the tail part of the target data block group.

Optionally, the data processing method further includes:

receiving a data deletion request submitted for a target file, wherein the data deletion request comprises first type data to be deleted;

determining a target data block for storing the first type data to be deleted, and converting the first type data to be deleted in the target data block into second type data.

Optionally, the data processing method further includes:

determining the data volume stored in the target data block, and determining the storage space utilization rate of the target data block according to the data volume;

and determining the data recovery speed of the second type data in the target data block group according to the storage space utilization rate, and performing recovery processing on the second type data according to the data recovery speed and the writing sequence of the second type data.

Optionally, the data processing method further includes:

determining whether a time difference between the current time and the conversion time of the second type of data is greater than or equal to a preset time difference threshold;

if so, determining the data recovery speed of the second type data in the target data block, and performing recovery processing on the second type data according to the data recovery speed and the writing sequence of the second type data.

Optionally, the data processing method further includes:

performing erasure processing on data in the target data block when it is determined that the data included in the target data block is the third type of data;

and under the condition that the erasing is finished, releasing the target data block, and adding the target data block to the tail part of the target data block group.

Optionally, before receiving the data write request submitted for the target file, the method further includes:

creating a user-mode file system in a storage system;

correspondingly, the receiving a data write request submitted for a target file includes:

and receiving, through the user-mode file system, a data write request submitted for a target file.

According to a second aspect of embodiments herein, there is provided a data processing apparatus comprising:

a receiving module configured to receive a data write request submitted for a target file, wherein the data write request comprises data to be written;

a query module configured to query a free data block according to a free data block linked list stored in a file allocation table, and write the data to be written into the free data block;

an updating module configured to establish a mapping relation between the target file and the free data block based on a writing result, and update a file information storage table according to the mapping relation;

and a creating module configured to create metadata corresponding to the writing result of the data to be written, write the metadata into a shared memory, and generate a data write log of the shared memory.

According to a third aspect of embodiments herein, there is provided a computing device comprising:

a memory and a processor;

the memory is used for storing computer executable instructions, and the processor is used for executing the computer executable instructions to realize the steps of any one of the data processing methods.

According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any one of the data processing methods.

According to a fifth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above-mentioned data processing method.

In an embodiment of the present specification, a data write request submitted for a target file is received, where the data write request includes data to be written, a free data block is queried according to a free data block chain table stored in a file allocation table, the data to be written is written into the free data block, a mapping relationship between the target file and the free data block is established based on a write result, a file information storage table is updated according to the mapping relationship, metadata corresponding to a write result of the data to be written is created, the metadata is written into a shared memory, and a data write log of the shared memory is generated.

In the data processing method provided in the embodiments of the present specification, space is managed through the file allocation table and the data to be written is processed through a simplified log mechanism based on the shared memory. This reduces the internal overhead of the file system as much as possible, so that the file system offers performance close to that of a bare disk, which helps improve its data read-write performance.

Drawings

FIG. 1 is a flow chart of a data processing method provided by an embodiment of the present description;

FIG. 2 is a flowchart illustrating a processing procedure of a data processing method according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present specification;

FIG. 4 is a block diagram of a computing device according to an embodiment of the present disclosure.

Detailed Description

In the following description, numerous specific details are set forth to provide a thorough understanding of the present specification. The specification can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from its spirit; the specification is therefore not limited to the specific embodiments disclosed below.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of the present specification, a first may also be referred to as a second and, similarly, a second may also be referred to as a first. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination".

First, the terms used in one or more embodiments of the present specification are explained.

Magnetic disk: a hard disk drive (HDD), a storage device that stores data using magnetic recording.

chunk: a file in the file system; files are called chunks in this scheme.

block: the basic unit in which the file system allocates disk space.

In the present specification, a data processing method is provided, and the present specification relates to a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.

Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present specification, which specifically includes the following steps.

Step 102, receiving a data write request submitted for a target file, wherein the data write request comprises data to be written.

Specifically, in the embodiments of the present description, a data write request submitted by a user for a target file may be received first, so as to perform a data write operation based on data to be written carried in the data write request.

In addition, embodiments of the present description may provide a user-mode file system, and may utilize the user-mode file system to receive data write requests submitted for a target file. From the storage system perspective, the storage system actually sends a data write request to the user-mode file system, and the user-mode file system operates in the user-mode environment.

A current storage system usually sends data write requests to a local file system. Because the local file system resides in the operating system kernel, each request involves a switch from user mode to kernel mode, accompanied by memory copy operations. In user mode, the CPU can only access memory and is not allowed to access peripheral devices, and a program can be preempted so that CPU resources can be acquired by other programs. In kernel mode, the CPU can access all data in memory as well as peripheral devices such as hard disks and network cards, and can switch from one program to another. User programs usually run in user mode but may need operations that are only available in kernel mode, such as reading data from a hard disk or obtaining keyboard input. In that case, the user program must request the operating system to perform the operation on its behalf.

However, the switch from user mode to kernel mode and the subsequent kernel-mode operations involve a large number of I/O operations, which greatly reduces the efficiency with which the storage system reads data. Embodiments of the present specification may therefore use a user-mode file system to receive the data write request for the target file.

Before a data write request submitted for a target file is received through the user-mode file system, the user-mode file system may be created in the storage system; the embodiments of the present specification do not limit how it is created. With the user-mode file system, operations such as writing or storing data do not involve a switch from user mode to kernel mode.

Further, the user-mode file system provided in the embodiments of the present specification covers deployment, data layout management, a data allocation algorithm, I/O management, a recycle bin mechanism, a data rescue mechanism, and the like. Externally, it provides file system services through a disk handle called the disk id; internally, it accesses the disk through the disk symbol. It exposes file system access interfaces such as create file, open file, close file, appended write buffer, appended write vector, read buffer, read vector, seal file, and delete file, and receives data write requests through these interfaces. It does not provide a directory access interface; all files are stored at the same level in the file system.

Step 104, querying a free data block according to the free data block linked list stored in the file allocation table, and writing the data to be written into the free data block.

Specifically, the purpose of the file system is to organize and manage the files on a disk, so the target file in the embodiments of the present specification is any file on the disk. Because the minimum storage unit of the file system is a data block, a large target file needs multiple data blocks to store its content. Therefore, after receiving a data write request submitted for the target file, the embodiments of the present specification may first query the free data blocks, allocate target free data blocks to the data to be written, and write the data to be written into the target free data blocks.

In specific implementation, an idle data block may be queried according to an idle data block chain table stored in a file allocation table, and then the data to be written is written into the idle data block, which may specifically be implemented in the following manner:

Specifically, since the minimum storage unit of the file system is a data block, a file occupies at least one data block even if it contains only 1 byte. Thus, when the user-mode file system stores the target file and the file is large, a first data block and one or more second data blocks may be needed to store the file content.

Based on this, when the user-mode file system receives a data write request for the target file, new data may need to be written in the target file, that is, an idle data block needs to be allocated to the target file, and data is written in the idle data block.

In practical application, in the embodiments of the present description, a disk may be divided into a superblock area (superblock area), a file information storage table area (chunk-table area), a file allocation table area (fat-table area), and a data area.

Because a disk can be used for storing files, but the disk needs to be formatted into a file system of a certain format first to store the files, after the disk is formatted into a file system of a corresponding format, disk formatting information can be stored in a superblock area, wherein the disk formatting information can include information such as file system version, data block size, and the like.

In addition, the file system may tag each data block with a block identifier. The fat-table area is used to manage the blocks of the data area. Blocks in the same target data block group can be managed as a free data block linked list; that is, the file allocation table stores the order of the free data blocks within the group. Free data blocks can therefore be queried through the free data block linked list stored in the file allocation table, allocated to the data to be written according to the query result, and then written with that data.

Specifically, a data block is a block.

Since the minimum storage unit of a file system is the block (generally, 1 block = 4 KB = 8 sectors), a file occupies at least one data block even if it contains only 1 byte. Therefore, when the file system writes the data to be written, it needs to allocate a corresponding number of free data blocks according to the size of that data. Specifically, according to the sequential relation of the storage spaces of at least two free data blocks in the free data block linked list of the file allocation table, that number of free data blocks at the head of the linked list are determined as the target free data blocks, the target free data blocks are allocated to the data to be written, and the data is written into them.

For example, suppose the file system has 10 data blocks, block1, block2, ..., block10, each 4 KB in size, all placed in the same group and all free. The block identifiers can then be stored in the file allocation table as a linked list: fat-table[1] = 2, fat-table[2] = 3, ..., fat-table[9] = 10, and these array entries jointly form the data block linked list. Because 1 denotes the first free data block and 2 denotes the second, fat-table[1] = 2 means that, within the target data block group, the data block following the first data block is the free data block with block identifier 2, and so on. The free data blocks recorded in the file allocation table can thus be queried through the free data block linked list.

Based on this, if the data to be written is determined to be 8 KB, two free data blocks need to be allocated to it, so block1 and block2 in the group may be allocated to the data to be written, and the data is written into block1 and block2.
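The allocation in this example can be sketched as follows. This is a minimal illustration rather than the implementation described in the specification: the fat-table is modeled as a Python dictionary, and the names `blocks_needed`, `allocate_from_head`, and the end-of-list marker `END` are hypothetical (the source does not say how the end of the free list is marked).

```python
import math

BLOCK_SIZE = 4 * 1024  # 4 KB per block, as in the example
END = 0  # hypothetical end-of-list marker; not specified in the source


def blocks_needed(data_len: int) -> int:
    """A write occupies at least one block, even for a single byte."""
    return max(1, math.ceil(data_len / BLOCK_SIZE))


def allocate_from_head(fat: dict, head: int, count: int):
    """Walk `count` blocks from the head of the free list.

    Returns (allocated block ids, new head of the free list).
    """
    allocated = []
    current = head
    for _ in range(count):
        if current == END:
            raise RuntimeError("not enough free blocks")
        allocated.append(current)
        current = fat[current]  # follow the link to the next free block
    return allocated, current


# Free list block1 -> block2 -> ... -> block10: fat[1] = 2, ..., fat[9] = 10.
fat_table = {i: i + 1 for i in range(1, 10)}
fat_table[10] = END

blocks, new_head = allocate_from_head(fat_table, head=1, count=blocks_needed(8 * 1024))
print(blocks, new_head)  # [1, 2] 3
```

Taking blocks from the head in list order means that, when the list follows the physical order of the blocks, an 8 KB write lands on two adjacent blocks.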

The embodiment of the specification can store the free data blocks with continuous storage space in the file system in the same data block group (block group), so that when the free data blocks are allocated for the data to be written for data writing, the continuity of the file writing result can be effectively improved.

In practical applications, the block size may be determined at formatting time. For example, mke2fs -b 4096 /dev/sda6 specifies a block size of 4096 bytes when formatting /dev/sda6. During formatting, a system administrator can choose the data block size according to the characteristics of the target files to be stored on the disk. If most of the files are small files of a few KB, choosing 1 KB data blocks improves the utilization of the disk storage space. If most files are large, a larger data block should be chosen, which improves data read-write efficiency and reduces the number of index nodes required. The size can be determined according to actual requirements and is not limited herein.

Step 106, establishing a mapping relation between the target file and the free data block based on the writing result, and updating a file information storage table according to the mapping relation.

Specifically, as described above, in the embodiments of the present specification, a disk may be divided into a superblock area (superblock area), a file information storage table area (chunk-table area), a file allocation table area (fat-table area), and a data area.

The chunk-table area can be used for storing metadata information of the file, including mapping relation from the file to a first data block of the file, file creation time and the like; therefore, after the data to be written is written into the free data block, a mapping relation between the target file and the first free data block in the written free data block can be created, and the mapping relation is stored in the chunk-table area. When the data volume of the data to be written is large and at least two idle data blocks need to be allocated to the data to be written, the data to be written can be written into a first idle data block allocated to the data to be written, and after the idle data block is fully written, other data to be written are continuously written into a next idle data block.

In addition, the file system may tag each data block with a block identifier. In the embodiments of the present specification, after free data blocks are allocated to the data to be written and the data is written into them, a data block linked list corresponding to the target file may be established according to the write result. Specifically, the linked list is built from the order in which the data was written into the free data blocks and from the block identifier of each of those blocks, and it is stored in the file allocation table. When further data is written into a second free data block, the data block linked list may be updated according to the block identifier of that second block.

For example, suppose the block identifier of the first data block of the target file is 2, that of the second data block is 7, that of the third data block is 100, and that of the last data block is 2000. The array entries stored in the fat-table may then be fat-table[2] = 7, fat-table[7] = 100, fat-table[100] = 0xfffffff, and these entries jointly form the data block linked list. Because 2 denotes the first data block, fat-table[2] = 7 means that, within the target data block group, the data block following the first data block is the one with block identifier 7, and so on. All the data blocks corresponding to the target file can thus be queried through the linked list.
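The lookup this example describes can be sketched as below, using only the three fat-table entries listed in the example. It assumes that 0xfffffff marks the end of a file's chain, which is a reading of the example rather than something the specification states; the file name "chunk-a" and the function `file_blocks` are hypothetical.

```python
END_OF_CHAIN = 0xFFFFFFF  # assumed terminator, taken from the example values


def file_blocks(chunk_table: dict, fat: dict, file_id: str) -> list:
    """Return every block of a file, in order, starting from its first block."""
    blocks = []
    current = chunk_table[file_id]  # chunk-table: file -> first data block
    while current != END_OF_CHAIN:
        blocks.append(current)
        current = fat[current]  # fat-table: block -> next block of the file
    return blocks


chunk_table = {"chunk-a": 2}
fat_table = {2: 7, 7: 100, 100: END_OF_CHAIN}
chain = file_blocks(chunk_table, fat_table, "chunk-a")
print(chain)  # [2, 7, 100]
```

Only the first block is stored per file in the chunk-table; the remaining blocks are found by following fat-table links, which keeps the per-file metadata small.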

Step 108, creating metadata corresponding to the writing result of the data to be written, writing the metadata into a shared memory, and generating a data writing log of the shared memory.

Specifically, metadata is data used to manage data; it indicates information such as data storage locations, historical data, resource lookup, and file records.

When data is written or read through the kernel, the operating system handles the related metadata operations. In the embodiments of the present specification, however, writing data involves no switch from user mode to kernel mode, so the user-mode file system itself must construct metadata for the write result of the data to be written, write the metadata into the shared memory, and then generate a data write log of the shared memory. This amounts to writing the updates to the fat-table and the chunk-table into the shared memory. After the system is restarted, these updates can be read back from the shared memory, the data to be written can be restored into local memory according to them, and the data can then be flushed to the hard disk.

In practical applications, during buffered writes in a file system, the reliability of the written data cannot be strongly guaranteed until the user calls sync, so written data may be lost to some extent.

In the embodiments of the present specification, a simplified log mechanism based on shared memory writes the metadata corresponding to the data write result of a data block into the shared memory in buffered-write scenarios. This provides buffered-write performance close to that of a bare disk and ensures that the data in the shared memory is neither lost nor corrupted after a process restart, which helps guarantee high reliability of the data write result.
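A shared-memory metadata log of this kind might look like the sketch below. The specification does not give a record format, so the layout here (a record count followed by fixed-size records of table id, index, and new value) is purely an assumption, as are the names `append_record` and `replay`. The sketch uses Python's standard `multiprocessing.shared_memory` module and re-attaches to the segment by name to stand in for a restarted process.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: a 4-byte record count, then (table, index, value) records.
HEADER = struct.Struct("<I")
RECORD = struct.Struct("<III")
FAT_TABLE, CHUNK_TABLE = 0, 1  # assumed table identifiers


def append_record(shm, table, index, value):
    """Write one metadata update, then publish it by bumping the record count."""
    (count,) = HEADER.unpack_from(shm.buf, 0)
    offset = HEADER.size + count * RECORD.size
    if offset + RECORD.size > shm.size:
        raise RuntimeError("log segment is full")
    RECORD.pack_into(shm.buf, offset, table, index, value)
    HEADER.pack_into(shm.buf, 0, count + 1)


def replay(shm):
    """Read back every logged update in the order it was written."""
    (count,) = HEADER.unpack_from(shm.buf, 0)
    return [RECORD.unpack_from(shm.buf, HEADER.size + i * RECORD.size)
            for i in range(count)]


log = shared_memory.SharedMemory(create=True, size=4096)
try:
    HEADER.pack_into(log.buf, 0, 0)        # start with an empty log
    append_record(log, FAT_TABLE, 2, 7)    # fat-table[2] = 7
    append_record(log, CHUNK_TABLE, 1, 2)  # file 1 -> first block 2
    # A restarted process would re-attach to the same segment by name.
    reattached = shared_memory.SharedMemory(name=log.name)
    recovered = replay(reattached)
    reattached.close()
finally:
    log.close()
    log.unlink()

print(recovered)  # [(0, 2, 7), (1, 1, 2)]
```

Writing the record before incrementing the count means a reader never sees a count that covers a record that has not been written yet, which is one simple way to keep a log like this consistent across an interrupted append.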

In addition, the user-mode file system can also receive a data deletion request submitted for the target file, wherein the data deletion request comprises data to be deleted.

Specifically, after receiving a data deletion request submitted for a target file, a user-mode file system may determine a target data block storing data to be deleted, delete the data to be deleted in the target data block, release the target data block after completing deleting the data to be deleted in the target data block, and add the target data block to the tail of the target data block group.

Continuing the above example, 10 data blocks are created for the target file. After block1 and block2 in the group are allocated to the data to be written and the data is written into them, the group contains the 8 data blocks block3, block4, ..., block10, linked in that order. After the data stored in block1 is deleted, block1 becomes a free data block and can be put back into the group. The group then contains block3, block4, ..., block10, and block1, linked in that order.

In the embodiments of the present specification, the data blocks corresponding to the target file are placed in the target data block group, and the free data blocks in the group are managed as a linked list. The most recently released data block is placed at the tail of the linked list, while data blocks are allocated from the head. This ensures that the data in a just-released data block is not overwritten right away, which improves the reliability of data writing and allows higher write performance.
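The head-allocate, tail-release discipline is a first-in, first-out queue. A minimal sketch, modeling the free list of one group with `collections.deque` instead of fat-table links (the function names are hypothetical):

```python
from collections import deque

# Free blocks of the group after block1 and block2 were allocated,
# in list order with the head on the left.
free_list = deque(range(3, 11))


def allocate(free):
    """Take the block at the head of the free list."""
    return free.popleft()


def release(free, block_id):
    """Return a freed block to the tail, so it is reused as late as possible."""
    free.append(block_id)


release(free_list, 1)
after_release = list(free_list)
next_block = allocate(free_list)
print(after_release)  # [3, 4, 5, 6, 7, 8, 9, 10, 1]
print(next_block)     # 3
```

Because block1 rejoins at the tail, every other free block in the group is handed out before block1 is written again.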

In specific implementation, the user-mode file system may further provide a recycle bin mechanism, which may be implemented in the following manner:

Further, the data volume stored in the target data block can be determined, and the storage space utilization rate of the target data block set is determined according to the data volume;

Specifically, the first type of data may be valid data and the second type of data may be recycle bin data. And after the user deletes the valid data, the valid data is converted into recycle bin data.

The user-mode file system in the embodiments of the present specification provides a recycle bin function, which helps recover part of the data in emergencies such as accidental deletion by a user or software bugs, and thereby improves the disaster tolerance of the system. When the user does not need to restore the recycle bin data, the system can reclaim it to keep the different partitions of the hard disk available.

In this case, the valid data to be deleted in the data block may be converted into recycle bin data, and a data water level driven policy is then used to determine whether to reclaim the recycle bin data.

The data water level consists of two parts, valid data and recycle bin data. The data volume stored in the target data block is therefore determined, the storage space utilization rate of the target data block is determined from that data volume, and whether to trigger reclamation of the recycle bin data is decided from the ratio of the sum of the valid data and the recycle bin data in the target data block to the capacity of the target data block.

Accordingly, the amount of recycle bin data and the amount of valid data stored in each data block of the target data block group can be determined. For each data block, the two amounts are added to obtain a first summation result. The first summation results of all data blocks are then added to obtain the total amount of valid data and recycle bin data stored in the target data block group, namely a second summation result. Finally, the ratio of the second summation result to the space capacity of the target data block group gives the storage space utilization rate of the target data block group.

The storage space utilization rate is the ratio of the amount of data stored in the target data block group to the space capacity of the target data block group. The larger the storage space utilization rate, the more data is stored in the data blocks of the target data block group and the less storage space remains available in them; in this case, some data blocks need to be released in time for data storage. Therefore, when the storage space utilization rate is determined to be greater than or equal to the preset utilization rate threshold, the data recovery speed of the recycle bin data in the target data block group can be set to fast recovery, and the recycle bin data can then be recycled according to that recovery speed and the write order of the recycle bin data in each data block of the target data block group, that is, converted into invalid data.

In the embodiments of this specification, when the data water level of the file system is high, the second type of data is recycled and deleted in the order in which it was written, which helps ensure the reliability and recoverability of the data in the data block.
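The utilization calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Block` fields, the threshold value of 0.8, the speed labels "fast" and "normal", and the assumption that the group capacity equals the sum of the block capacities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    capacity: int       # bytes the block can hold (assumed field)
    valid_bytes: int    # first-type (valid) data stored in the block
    recycle_bytes: int  # second-type (recycle bin) data stored in the block


def group_utilization(blocks):
    """Ratio of stored data (valid + recycle bin) to the group's capacity."""
    # First summation result: valid + recycle bin data of each block.
    first_sums = [b.valid_bytes + b.recycle_bytes for b in blocks]
    # Second summation result: total stored data in the whole group.
    second_sum = sum(first_sums)
    # Assumption: group capacity is the sum of its blocks' capacities.
    total_capacity = sum(b.capacity for b in blocks)
    return second_sum / total_capacity


def recycle_speed(utilization, threshold=0.8):
    """Pick fast recycling at or above the threshold (labels are hypothetical)."""
    return "fast" if utilization >= threshold else "normal"


group = [Block(100, 60, 30), Block(100, 50, 20)]
util = group_utilization(group)
speed = recycle_speed(util)
```

Here the two first summation results are 90 and 70, so the second summation result is 160 against a capacity of 200, and the group reaches the assumed threshold.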

Alternatively, whether the time difference between the current time and the conversion time of the second type of data is greater than or equal to a preset time difference threshold value or not can also be determined;

and if so, determining the data recovery speed of the second type data in the target data block, and performing recovery processing on the second type data according to the data recovery speed and the writing sequence of the second type data.

Specifically, in addition to deciding whether to trigger recycling of the second type of data through the data water level policy of the foregoing implementation, whether to recycle the second type of data may also be decided from its generation time, that is, the conversion time of the second type of data.

Specifically, the time difference between the current time and the conversion time of the second type of data is compared with the preset time difference threshold. If the comparison shows that the time elapsed since the conversion time is greater than or equal to the preset time difference threshold, the second type of data can be recycled.

In the embodiment of the present description, after the first type data is converted into the second type data, the second type data is recovered and deleted according to the writing sequence of the second type data after waiting for a certain time, which is beneficial to ensuring the reliability and recoverability of the data in the data block.
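The time-based check can be sketched in a similar way. The sketch assumes that recycle bin entries are kept in a queue in write order and that conversion times do not decrease along that queue; the function name, the tuple layout, and the numeric values are hypothetical.

```python
from collections import deque


def recycle_expired(recycle_queue, now, min_age):
    """Recycle entries from the oldest end while their age is >= min_age.

    recycle_queue holds (block_id, converted_at) tuples in write order.
    """
    recycled = []
    while recycle_queue and now - recycle_queue[0][1] >= min_age:
        block_id, _converted_at = recycle_queue.popleft()
        recycled.append(block_id)  # stands in for converting to invalid data
    return recycled


queue = deque([("b1", 100.0), ("b2", 250.0), ("b3", 900.0)])
done = recycle_expired(queue, now=1000.0, min_age=600.0)
```

With these values the entries for b1 and b2 have waited at least the assumed threshold and are recycled, while b3 stays in the recycle bin and remains restorable.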

In specific implementation, under the condition that the data contained in the target data block is determined to be the third type data, erasing the data in the target data block;

Specifically, the third type of data is invalid data; recycling recycle bin data means converting it into invalid data. When all the data contained in the target data block is determined to be invalid data, the data in the target data block is erased. Once the erasure is complete, the target data block is released and added to the tail of the target data block group, that is, placed after the last data block in the target data block group.
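A minimal sketch of this release step is given below. It assumes each block tracks the state of the data pieces it holds and that the group keeps its released blocks in a queue; the dictionary layout and the names `release_if_invalid` and `free_list` are hypothetical.

```python
from collections import deque

VALID, RECYCLE, INVALID = "valid", "recycle", "invalid"


def release_if_invalid(block, free_list):
    """Erase and release a block only when all of its data is invalid."""
    if block["states"] and all(s == INVALID for s in block["states"]):
        block["states"].clear()        # stands in for erasing the block
        free_list.append(block["id"])  # place it after the last block in the group
        return True
    return False


free_list = deque(["b7", "b9"])
block = {"id": "b3", "states": [INVALID, INVALID]}
released = release_if_invalid(block, free_list)
```

Appending to the tail means that a freshly released block is the last candidate for reuse, which is what keeps recently deleted data from being overwritten quickly.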

In addition, after the data to be written has been written into the idle data block, the mapping relation between the target file and the idle data block has been established, and the file information storage table has been updated according to the mapping relation, a data reading request submitted for the target file can be received;

Specifically, as mentioned above, after the data to be written is written into the free data block, a mapping relationship between the target file and the first free data block needs to be established based on the writing result, and the file information storage table is updated according to the mapping relationship; in addition, a data block linked list corresponding to the target file needs to be established according to the writing result, and the information stored in the data block linked list is used for representing the data writing sequence of each idle data block when the data to be written of the target file is written into the idle data block, and simultaneously representing which data blocks the file content corresponding to the target file is stored in.

Based on this, after a data reading request for the target file is received, a first data block having a mapping relationship with the target file is queried in the file information storage table according to the identification information of the target file, and then a second data block associated with the first data block is queried in the data block linked list according to the block identification of the first data block, so as to read data in the first data block and the second data block, obtain data to be read of the target file, and return the data.
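The read path described above can be sketched as a walk over a linked list. In this illustration the file information storage table is a dictionary from file name to first block, the data block linked list is a dictionary from block to next block, and `END` marks the last block; all of these representations are assumptions made for the example.

```python
END = -1  # hypothetical end-of-chain marker


def read_file(name, file_info, fat, blocks):
    """Follow the block chain of a file and concatenate the block contents."""
    block_id = file_info[name]    # first data block mapped to the file
    data = bytearray()
    while block_id != END:
        data += blocks[block_id]  # read the current block
        block_id = fat[block_id]  # move to the associated next block
    return bytes(data)


file_info = {"a.log": 4}
fat = {4: 7, 7: 2, 2: END}
blocks = {4: b"hel", 7: b"lo ", 2: b"fs"}
content = read_file("a.log", file_info, fat, blocks)
```

Because the chain records the write order, the blocks do not need to be contiguous or ascending on disk for the file content to be reassembled correctly.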

A traditional file management system does not provide a recycle bin function at the space allocation level, so the recycle bin has to be implemented by the software that uses the file system. Its space allocation also ignores how recently a data block was released, so released data blocks are easily reused; if data is deleted by mistake, it is difficult to rescue, and the rescue takes a long time, measured in days. The user-mode file system in the embodiments of this specification provides a recycle bin mechanism: data is converted into recycle bin data after being deleted, and the recycle bin data is deleted in data write order after waiting for a certain time or when the space water level is high, which ensures the reliability and recoverability of the data.

In addition, the embodiments of this specification further provide a data rescue tool, with which data deleted by mistake can be rescued from the recycle bin or by scanning the disk.

Compared with a kernel-mode file system, the user-mode file system is easier to operate and maintain and has a complete tool chain. The user-mode file system is closely integrated with the block server (chunk server): the extent mechanism is replaced by a chunk-table + fat-table + shared memory journal mechanism, which provides buffer write performance close to bare disk performance, and the stable bandwidth can be improved from 62 MB/s to 125 MB/s-200 MB/s. In addition, the embodiments of this specification provide a recycle bin mechanism, so that users of the file system do not need to implement a recycle bin themselves, and higher data reliability is provided. Furthermore, with the space allocation policy based on the group + fat-table linked list, performance is ensured through the group mechanism and the release time of data blocks is taken into account during allocation: data blocks released earlier are allocated first wherever possible, so that mistakenly deleted data is not overwritten. This provides hour-level data rescue capability, improves data rescue performance, and further improves data reliability.

In an embodiment of the present description, a data write request submitted for a target file is received through a user-mode file system, where the data write request includes data to be written, a first data block of the target file is determined according to a mapping relationship between the target file and a data block stored in a file information storage table, a second data block of the target file is queried according to the first data block and a data block linked list stored in a file allocation table, and when it is determined that an idle data block exists in the first data block and/or the second data block, the data to be written is written into the idle data block, metadata corresponding to a write result of the data to be written is created, the metadata is written into a shared memory, and a data write log of the shared memory is generated.

In the user-mode file system provided in the embodiment of the present specification, the file allocation table is used to perform space management and process data to be written based on a simplified log mechanism of the shared memory, so that internal loss of the file system is reduced as much as possible, and the file system provides a higher-performance service close to a bare disk to the outside, thereby facilitating improvement of data read-write performance of the file system.
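The metadata logging step can be illustrated with a small append-only journal. The specification does not disclose the record layout, so everything below is an assumption: a plain `bytearray` stands in for the shared memory region, and each hypothetical record holds a file id, a block id, and the number of bytes written.

```python
import struct

# Hypothetical record layout: file id, block id, bytes written.
RECORD = struct.Struct("<QQI")


class Journal:
    """Append-only write log kept in a fixed-size memory region."""

    def __init__(self, size):
        self.buf = bytearray(size)  # stands in for a shared memory region
        self.tail = 0               # offset of the next free record slot

    def append(self, file_id, block_id, length):
        if self.tail + RECORD.size > len(self.buf):
            raise BufferError("journal full")
        RECORD.pack_into(self.buf, self.tail, file_id, block_id, length)
        self.tail += RECORD.size

    def replay(self):
        """Return all logged records in the order they were written."""
        return [RECORD.unpack_from(self.buf, off)
                for off in range(0, self.tail, RECORD.size)]


journal = Journal(size=64)
journal.append(file_id=1, block_id=4, length=4096)
journal.append(file_id=1, block_id=7, length=512)
entries = journal.replay()
```

Keeping the log as fixed-size records in memory avoids an extra disk write per metadata update, which is one plausible reading of the "simplified log mechanism" named in the text.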

The following will further describe the data processing method with reference to fig. 2 by taking an application of the data processing method provided in this specification in an actual scene as an example. Fig. 2 shows a processing procedure flowchart of a data processing method provided in an embodiment of the present specification, which specifically includes the following steps.

Step 202, receiving, through a user-mode file system, a data writing request submitted for a target file, wherein the data writing request comprises data to be written.

Step 204, determining the number of target idle data blocks to be allocated according to the data volume of the data to be written.

Step 206, querying idle data blocks according to an idle data block chain table stored in a file allocation table, and, according to the order of the storage spaces of at least two idle data blocks in the idle data block chain table, determining that number of idle data blocks at the head of the idle data block chain table as the target idle data blocks.

Step 208, allocating the target idle data blocks to the data to be written, and writing the data to be written into the target idle data blocks.

Step 210, establishing a mapping relation between the target file and the first idle data block among the target idle data blocks based on the writing result, and updating a file information storage table according to the mapping relation.

Step 212, establishing a data block linked list corresponding to the target file according to the data writing order of the target idle data blocks in the writing result.

Step 214, receiving a data deletion request submitted for the target file, wherein the data deletion request indicates valid data to be deleted.

Step 216, determining a target data block that stores the valid data to be deleted, and converting the valid data to be deleted in the target data block into recycle bin data.

Step 218, determining the amount of data stored in the target data block, and determining the storage space utilization rate of the target data block according to the amount of data.

Step 220, determining the data recovery speed of the recycle bin data in the target data block according to the storage space utilization rate, and recycling the recycle bin data according to the data recovery speed and the write order of the recycle bin data.

Specifically, recycling the recycle bin data means converting the recycle bin data into invalid data.

Step 222, erasing the data in the target data block when the data contained in the target data block is determined to be invalid data.

Step 224, releasing the target data block once the erasure is complete, and adding the target data block to the tail of the target data block group.
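The write half of this flow (steps 204 to 212) can be sketched as follows. The block size, the dictionary-based tables, the `END` marker, and the choice to allocate at least one block for empty data are assumptions made for illustration only.

```python
import math
from collections import deque

END = -1  # hypothetical end-of-chain marker


def write_file(name, data, block_size, free_list, fat, file_info, blocks):
    """Allocate blocks from the head of the free list and chain them."""
    # Step 204: number of blocks needed for the data volume.
    count = max(1, math.ceil(len(data) / block_size))
    if count > len(free_list):
        raise RuntimeError("not enough idle data blocks")
    # Step 206: take that many blocks from the head of the free list.
    targets = [free_list.popleft() for _ in range(count)]
    # Step 208: write the data into the target blocks in order.
    for i, block_id in enumerate(targets):
        blocks[block_id] = data[i * block_size:(i + 1) * block_size]
    # Step 210: map the file to its first block.
    file_info[name] = targets[0]
    # Step 212: record the write order as a linked list.
    for cur, nxt in zip(targets, targets[1:] + [END]):
        fat[cur] = nxt
    return targets


free_list = deque([5, 3, 8, 1])
fat, file_info, blocks = {}, {}, {}
used = write_file("f", b"abcdefghij", 4, free_list, fat, file_info, blocks)
```

Taking blocks from the head while released blocks are appended to the tail (step 224) gives first-released, first-reused behavior, so the most recently freed blocks are the last to be overwritten.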

In the user-mode file system provided in the embodiment of the present specification, a file allocation table is used to perform space management and process data to be written based on a simplified log mechanism of a shared memory, so that internal loss of the file system is reduced as much as possible, and the file system provides a higher-performance service close to a bare disk to the outside, thereby facilitating improvement of data read-write performance of the file system.

Corresponding to the above method embodiment, this specification further provides a data processing apparatus embodiment, and fig. 3 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of this specification. As shown in fig. 3, the apparatus includes:

a receiving module 302, configured to receive a data write request submitted for a target file, where the data write request includes data to be written;

a query module 304, configured to query an idle data block according to an idle data block chain table stored in a file allocation table, and write the data to be written into the idle data block;

an updating module 306 configured to establish a mapping relationship between the target file and the free data block based on the writing result, and update a file information storage table according to the mapping relationship;

a creating module 308 configured to create metadata corresponding to a writing result of the data to be written, write the metadata into a shared memory, and generate a data writing log of the shared memory.

Optionally, the data processing apparatus further includes an establishing module configured to:

Optionally, the data processing apparatus further includes a reading module configured to:

receiving a data reading request submitted for a target file;

Optionally, the query module 304 is further configured to:

Optionally, the data processing apparatus further includes a first processing module configured to:

Optionally, the data processing apparatus further includes a data transcoding module configured to:

Optionally, the data processing apparatus further includes a data recovery module configured to:

Optionally, the data recovery module is further configured to:

determining whether a time difference between the current time and the conversion time of the second type of data is greater than or equal to a preset time difference threshold value;

Optionally, the data processing apparatus further includes a second processing module configured to:

Optionally, the data processing apparatus further includes a third processing module configured to:

creating a user-mode file system in a storage system;

The above is a schematic solution of the data processing apparatus of this embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept; for details not described in the technical solution of the data processing apparatus, reference may be made to the description of the technical solution of the data processing method.

FIG. 4 illustrates a block diagram of a computing device 400 provided in accordance with one embodiment of the present description. The components of the computing device 400 include, but are not limited to, a memory 410 and a processor 420. Processor 420 is coupled to memory 410 via bus 430 and database 450 is used to store data.

Computing device 400 also includes access device 440, which enables computing device 400 to communicate via one or more networks 460. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 440 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 400, as well as other components not shown in FIG. 4, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 4 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 400 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 400 may also be a mobile or stationary server.

Wherein the processor 420 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the data processing method described above.

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.

An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor implement the steps of the data processing method described above.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.

An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method.

The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the data processing method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be added to or reduced as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals or telecommunications signals.

It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of combined acts, but those skilled in the art should understand that the embodiments are not limited by the described order of acts, since according to the embodiments some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the embodiments of this specification.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to help explain the specification. The alternative embodiments do not set out every detail exhaustively, nor do they limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the content of the embodiments of this specification. These embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, so that others skilled in the art can best understand and use this specification. This specification is limited only by the claims and their full scope and equivalents.

Claims

1. A method of data processing, comprising:

establishing a mapping relation between the target file and the idle data block based on a writing result, and updating a file information storage table according to the mapping relation;

2. The data processing method of claim 1, further comprising:

3. The data processing method of claim 2, further comprising:

receiving a data reading request submitted for a target file;

4. The data processing method according to claim 1, wherein the querying a free data block according to a free data block chain table stored in a file allocation table, and writing the data to be written into the free data block comprises:

5. The data processing method of claim 1, further comprising:

6. The data processing method of claim 1, further comprising:

7. The data processing method of claim 6, further comprising:

8. The data processing method of claim 6, further comprising:

9. The data processing method of claim 7 or 8, further comprising:

10. The data processing method of claim 1, before receiving the data write request submitted for the target file, further comprising:

creating a user-mode file system in a storage system;

11. A data processing apparatus comprising:

12. A computing device, comprising:

a memory and a processor;

the memory is for storing computer-executable instructions and the processor is for executing the computer-executable instructions, which when executed by the processor implement the steps of the data processing method of any one of claims 1 to 10.

13. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 10.