CN114936188A

CN114936188A - Data processing method and device, electronic equipment and storage medium

Info

Publication number: CN114936188A
Application number: CN202210601204.5A
Authority: CN
Inventors: 余涛
Original assignee: Chongqing Unisinsight Technology Co Ltd
Current assignee: Chongqing Unisinsight Technology Co Ltd
Priority date: 2022-05-30
Filing date: 2022-05-30
Publication date: 2022-08-23

Abstract

The application provides a data processing method, a data processing device, electronic equipment and a storage medium, and relates to the technical field of data processing. The method comprises the following steps: acquiring a file creation request, wherein the file creation request comprises the size of the target file and the name of the target file; determining the number of objects into which the target file is divided and the number of data blocks contained in each object according to the size of the target file, a preset erasure ratio and a preset data block size; creating the data blocks of each object on the selected disks according to the preset erasure ratio, and generating file creation information; and generating first-class metadata and second-class metadata of the target file according to the file creation information, storing the first-class metadata in a system disk of the file system and storing the second-class metadata in a data disk of the file system. The method can reduce the amount of data accessed and improve data access efficiency.

Description

Data processing method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.

Background

With the advent of the big data era, massive amounts of data need to be stored. Erasure codes (EC) are a commonly used data protection method that achieves high data reliability with relatively little data redundancy. Mass data storage generates a large amount of metadata, which mainly describes the attribute information of the data. Because erasure coding requires data to be split into segments before it is stored, erasure-coded storage produces even more metadata. Since the metadata is usually accessed first whenever data is accessed, improving metadata access performance becomes all the more important.

At present, metadata access is commonly accelerated mainly through hardware acceleration and by spreading the access load across multiple servers.

However, these approaches increase hardware consumption to some extent, which makes data processing costly.

Disclosure of Invention

An object of the present application is to provide a data processing method, apparatus, electronic device and storage medium, so as to improve metadata access efficiency, thereby improving data processing efficiency.

In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:

In a first aspect, an embodiment of the present application provides a data processing method, which is applied to a file system based on erasure-coded storage, and the method includes:

acquiring a file creation request, wherein the file creation request comprises: the size of the target file and the name of the target file;

determining the number of objects into which the target file is divided and the number of data blocks contained in each object according to the size of the target file, a preset erasure ratio and a preset data block size;

creating, on each selected disk, one data block of each object according to the preset erasure ratio, and generating file creation information, wherein the file creation information comprises: the file name, file size, file storage path, file identifier and file creation time of the target file, the number of objects contained in the file, and information of each data block in each object contained in the file, wherein the data blocks in each object comprise valid data blocks and redundant data blocks, and the data blocks of the same object are distributed on different selected disks of the file system;

generating first-type metadata and second-type metadata of the target file respectively according to the file creation information, storing the first-type metadata in a system disk of the file system, and storing the second-type metadata in a data disk of the file system; wherein the first-type metadata comprises basic information of the target file and an association relationship between the first-type metadata and the second-type metadata, the second-type metadata comprises data block information of the target file, and the access frequency of the first-type metadata is greater than that of the second-type metadata.

Optionally, the generating the first type metadata and the second type metadata of the target file according to the file creation information respectively includes:

generating basic information of the target file according to the file name, the file size, the file creating time, the file storage path and the number of objects contained in the file, and generating an association relationship between the first type of metadata and the second type of metadata according to the file identification and information of each data block in each object contained in the file, wherein the association relationship is used for representing a mapping relationship between the file identification of the target file and each data block in each object contained in the file;

obtaining the first-type metadata of the target file according to the basic information of the target file and the association relationship between the first-type metadata and the second-type metadata;

and generating data block information of the target file according to the information of each data block in each object contained in the file, and taking the data block information as the second type metadata.
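The split described above can be illustrated with a minimal sketch, assuming the file creation information is held in memory as simple records. The class names, field names and the block-identifier format `file_id/object/block` are hypothetical and chosen only for illustration; the first-type metadata keeps the basic information plus the file-to-block association, while the full per-block records become the second-type metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BlockInfo:
    object_index: int    # object the block belongs to
    block_index: int     # position of the block inside its object
    disk_id: str         # disk on which the block was created
    is_redundant: bool   # True for a redundant block, False for a valid block
    used_bytes: int = 0  # capacity occupied in the block (updated on write)


@dataclass
class FileCreationInfo:
    name: str
    size: int
    path: str
    file_id: str
    created_at: float
    object_count: int
    blocks: List[BlockInfo]


def split_metadata(info: FileCreationInfo) -> Tuple[Dict, List[BlockInfo]]:
    """Split creation info into (first-type, second-type) metadata."""
    # Basic information: name, size, creation time, storage path, object count.
    basic = {
        "name": info.name,
        "size": info.size,
        "created_at": info.created_at,
        "path": info.path,
        "object_count": info.object_count,
    }
    # Association: file identifier -> identifiers of every block of the file.
    association = {
        "file_id": info.file_id,
        "blocks": [f"{info.file_id}/{b.object_index}/{b.block_index}" for b in info.blocks],
    }
    first_type = {"basic": basic, "association": association}
    # Second-type metadata: the full per-block records, kept on the data disk.
    second_type = list(info.blocks)
    return first_type, second_type
```

Keeping only identifiers in the association means a metadata-only request (for example, querying a file's size) can be served from the first-type metadata without touching the data disk.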

Optionally, after the first-type metadata and the second-type metadata of the target file are generated according to the file creation information, the first-type metadata is stored in the system disk of the file system and the second-type metadata is stored in the data disk of the file system, the method further includes:

obtaining a file write request for the target file, wherein the file write request comprises: the name of the target file and the file data of the target file;

determining, according to the name of the target file, whether the target file exists among the created files;

if the target file exists, determining redundant data contained in each object of the target file according to the file data of the target file, wherein the redundant data is used for recovering the file data of the target file;

and writing the redundant data contained in each object of the target file into the redundant data blocks of that object, writing the file data of the target file into the valid data blocks of each object, and updating the first-type metadata and the second-type metadata corresponding to the target file.
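The write step can be sketched as follows, assuming an erasure ratio of k:1 (such as the 4:1 ratio used in the embodiment). The patent does not name the erasure code; with a single redundant block, byte-wise XOR parity is used here as the simplest valid stand-in, whereas ratios with more than one redundant block would need a code such as Reed-Solomon. All function names are hypothetical.

```python
from typing import List


def split_into_object_blocks(data: bytes, k: int, block_size: int) -> List[List[bytes]]:
    """Cut file data into objects, each holding k zero-padded valid blocks."""
    obj_bytes = k * block_size
    objects = []
    for start in range(0, max(len(data), 1), obj_bytes):
        chunk = data[start:start + obj_bytes].ljust(obj_bytes, b"\0")
        objects.append([chunk[i * block_size:(i + 1) * block_size] for i in range(k)])
    return objects


def xor_parity(blocks: List[bytes]) -> bytes:
    """Redundant data for one object: byte-wise XOR of all its valid blocks."""
    parity = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            parity[i] ^= b
    return bytes(parity)


def encode_file(data: bytes, k: int, block_size: int) -> List[List[bytes]]:
    """Return, per object, k valid blocks followed by one redundant block."""
    return [blocks + [xor_parity(blocks)]
            for blocks in split_into_object_blocks(data, k, block_size)]
```

Each inner list would then be written block by block to the disks chosen for that object, after which the metadata is updated.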

Optionally, after writing the redundant data contained in each object of the target file into the redundant data blocks of that object and writing the file data of the target file into the valid data blocks of each object, the method further includes:

obtaining a file read request for the target file, wherein the file read request comprises: the name of the target file, the size of the target file, read offset information and read length information;

querying the first-type metadata of each file in the system disk according to the name of the target file and the read offset information, and determining the information of each data block corresponding to the target file;

and reading the target file from the corresponding disks according to the information of each data block corresponding to the target file, the read offset information and the read length information.

Optionally, querying the first-type metadata of each file in the system disk according to the name of the target file and the read offset information to determine the information of each data block corresponding to the target file includes:

determining the file identifier of the target file according to the name of the target file;

querying, according to the file identifier of the target file, the association relationship between the first-type metadata and the second-type metadata of each file, and determining the second-type metadata corresponding to the target file;

querying the second-type metadata to obtain the information of the disk to which each data block of the target file belongs;

and determining, according to the read offset information, the information of the disk to which each target data block of the target file belongs from the information of each data block corresponding to the target file.
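The lookup chain above can be sketched with plain dictionaries standing in for the key-value store on the system disk and the block records on the data disk. This is a minimal illustration under stated assumptions: the three dictionaries and the function name are hypothetical, the association is assumed to list only the valid blocks in file order, and every valid block is assumed to hold exactly `block_size` bytes of file data.

```python
from typing import Dict, List, Tuple


def locate_target_blocks(
    name: str,
    offset: int,
    length: int,
    name_index: Dict[str, str],         # first-type: file name -> file identifier
    association: Dict[str, List[str]],  # first-type: file identifier -> valid block ids in file order
    block_info: Dict[str, str],         # second-type: block id -> disk id
    block_size: int,
) -> List[Tuple[str, str]]:
    """Return (block id, disk id) for every valid block touched by [offset, offset + length)."""
    file_id = name_index[name]       # step 1: name -> file identifier
    block_ids = association[file_id]  # step 2: identifier -> this file's blocks
    if length <= 0:
        return []
    # Step 3: the read offset selects the first and last target data blocks.
    first = offset // block_size
    last = (offset + length - 1) // block_size
    # Step 4: second-type metadata gives the disk of each target block.
    return [(bid, block_info[bid]) for bid in block_ids[first:last + 1]]
```

Only the blocks actually covered by the requested range are resolved, so a small read consults the second-type metadata for just one or two blocks.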

Optionally, the reading the target file from the corresponding disk according to the information of each corresponding data block of the target file and according to the read offset information and the read length information includes:

reading, according to the information of the disk to which each target data block of the target file belongs and according to the read offset information and the read length information, the file data stored in each target data block from the disk to which that target data block belongs;

and combining the read file data stored in each target data block to obtain the target file.
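Reading and combining can be sketched as walking the requested byte range block by block. In this minimal sketch, an in-memory list of valid blocks (in file order) stands in for the per-disk reads; the function name is hypothetical.

```python
from typing import List


def read_range(valid_blocks: List[bytes], offset: int, length: int, block_size: int) -> bytes:
    """Read the touched segment of each target block and concatenate the segments."""
    out = bytearray()
    pos = offset
    end = offset + length
    while pos < end:
        idx, inner = divmod(pos, block_size)        # target block and offset inside it
        take = min(block_size - inner, end - pos)   # bytes to read from this block
        out += valid_blocks[idx][inner:inner + take]
        pos += take
    return bytes(out)
```

For example, with 4-byte blocks `[b"abcd", b"efgh", b"ijkl"]`, a read at offset 2 with length 5 takes `b"cd"` from the first block and `b"efg"` from the second.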

Optionally, the reading the file data stored in each target data block from the disk to which each target data block belongs respectively includes:

if reading from a target disk fails and the data stored in the target data block on that disk is valid data, performing erasure correction calculation to recover the valid data in the target data block on the target disk, wherein the target disk is any one of the disks to which the target data blocks belong.

Optionally, the performing erasure correction computation to recover valid data in the target data block on the target disk includes:

calculating the valid data in the target data block on the target disk from the valid data and the redundant data read from the other data blocks of the same object on the disks other than the target disk.
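For the single-redundant-block case (a k:1 erasure ratio), the recovery calculation can be sketched with XOR parity: the missing block equals the XOR of all surviving valid and redundant blocks of the same object. This is an illustrative stand-in, since the patent does not name its erasure code; the function name is hypothetical, and a failed disk is represented by `None` in place of its block.

```python
from typing import List, Optional


def recover_block(object_blocks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single unreadable block (None) of an object protected by XOR parity."""
    missing = [i for i, b in enumerate(object_blocks) if b is None]
    if not missing:
        return [b for b in object_blocks if b is not None]
    if len(missing) > 1:
        raise ValueError("single-parity XOR can rebuild only one missing block")
    size = len(next(b for b in object_blocks if b is not None))
    rebuilt = bytearray(size)
    # XOR of every surviving block (valid and redundant) yields the lost block.
    for blk in object_blocks:
        if blk is not None:
            for i, byte in enumerate(blk):
                rebuilt[i] ^= byte
    result = list(object_blocks)
    result[missing[0]] = bytes(rebuilt)
    return result
```

Losing two blocks of one object exceeds what a single redundant block can tolerate, which is why the sketch raises in that case.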

Optionally, before creating a data block of each object on each selected disk according to the preset erasure ratio, the method includes:

determining the number of the disks to be selected according to the preset erasure ratio;

and selecting that number of disks from the plurality of disks in the file system according to the disk weight of each disk, wherein the disk weight of each disk is determined according to the capacity of the disk and the total disk capacity of the system.
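One possible reading of the weighted selection is sketched below, assuming the weight of a disk is its capacity divided by the total capacity and that disks are drawn at random in proportion to weight, without replacement, so that each object's k + m blocks land on distinct disks. The patent does not specify the selection rule beyond "according to the disk weight"; the function names and the random strategy are assumptions.

```python
import random
from typing import Dict, List, Optional


def disk_weights(capacities: Dict[str, int]) -> Dict[str, float]:
    """Weight of each disk = its capacity / total capacity of all disks."""
    total = sum(capacities.values())
    return {disk: cap / total for disk, cap in capacities.items()}


def select_disks(capacities: Dict[str, int], k: int, m: int,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Pick k + m distinct disks, favouring larger disks (weighted, no replacement)."""
    n = k + m  # number of disks = data blocks per object for a k:m erasure ratio
    if n > len(capacities):
        raise ValueError("not enough disks for the erasure ratio")
    rng = rng or random.Random()
    remaining = disk_weights(capacities)
    chosen = []
    for _ in range(n):
        disks = list(remaining)
        pick = rng.choices(disks, weights=[remaining[d] for d in disks], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # a disk holds at most one block of the same object
    return chosen
```

Weighting by capacity tends to fill disks in proportion to their size; a deterministic variant (always taking the highest-weight disks) would be an equally plausible interpretation.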

Optionally, the first type of metadata is stored in the form of key-value pairs.
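A minimal sketch of how first-type metadata might be laid out as key-value pairs follows. The key scheme (`name/...`, `file/.../basic`, `file/.../blocks`) and the JSON encoding are hypothetical choices for illustration, with a Python dict standing in for the key-value database on the system disk.

```python
import json
from typing import Dict, List


def first_type_kv_pairs(name: str, file_id: str, basic: Dict,
                        block_ids: List[str]) -> Dict[str, str]:
    """Hypothetical key scheme for first-type metadata in a key-value database."""
    return {
        f"name/{name}": file_id,                                # name -> file identifier
        f"file/{file_id}/basic": json.dumps(basic, sort_keys=True),  # basic information
        f"file/{file_id}/blocks": json.dumps(block_ids),        # association to blocks
    }
```

Separate keys let a request that only needs basic information fetch `file/<id>/basic` without deserializing the block association.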

In a second aspect, an embodiment of the present application further provides a data processing apparatus, which is applied to a file system based on erasure-coded storage and includes an obtaining module, a determining module and a generating module;

the obtaining module is configured to obtain a file creation request, where the file creation request includes: the size of the target file, the name of the target file;

the determining module is configured to determine, according to the size of the target file, a preset erasure ratio, and a size of a preset data block, the number of objects into which the target file is divided and the number of data blocks included in each object;

the generating module is configured to respectively create a data block of each object on each selected disk according to the preset erasure ratio, and generate file creation information, where the file creation information includes: the file name, the file size, the file storage path, the file identification, the file creation time, the number of objects contained in the file, and information of each data block in each object contained in the file, wherein the data block in each object comprises an effective data block and a redundant data block, and each data block under the same object is distributed on different disks of the selected file system;

the generating module is further configured to generate first-type metadata and second-type metadata of the target file according to the file creation information, store the first-type metadata in a system disk of the file system, and store the second-type metadata in a data disk of the file system; the first-type metadata includes basic information of the target file and an association relationship between the first-type metadata and the second-type metadata, the second-type metadata includes data block information of the target file, and the access frequency of the first-type metadata is greater than that of the second-type metadata.

Optionally, the generating module is specifically configured to generate basic information of the target file according to the file name, the file size, the file creation time, the file storage path, and the number of objects included in the file, and generate an association relationship between the first type of metadata and the second type of metadata according to the file identifier and information of each data block in each object included in the file, where the association relationship is used to represent a mapping relationship between the file identifier of the target file and each data block in each object included in the file;

Optionally, the apparatus further comprises: a write module;

the obtaining module is further configured to obtain a file write request for the target file, where the file write request includes: name of the target file, file data of the target file;

the determining module is further configured to determine, according to the name of the target file, whether the target file exists among the created files;

the determining module is further configured to determine, if the target file exists, redundant data included in each object of the target file according to the file data of the target file, where the redundant data is used to recover the file data of the target file;

the writing module is configured to write redundant data included in each object of the target file into a redundant data block of each object, write file data of the target file into an effective data block of each object, and update first-type metadata and second-type metadata corresponding to the target file.

Optionally, the apparatus further comprises: a reading module;

the obtaining module is further configured to obtain a file read request for the target file, where the file read request includes: the name of the target file, the size of the target file, read offset information and read length information;

the determining module is further configured to query the first-type metadata of each file in the system disk according to the name of the target file and the read offset information, and determine the information of each data block corresponding to the target file;

the reading module is configured to read the target file from the corresponding disks according to the information of each data block corresponding to the target file, the read offset information and the read length information.

Optionally, the determining module is specifically configured to determine a file identifier of the target file according to the name of the target file;

Optionally, the reading module is specifically configured to read, according to the information of the disk to which each target data block of the target file belongs and according to the read offset information and the read length information, the file data stored in each target data block from the disk to which that target data block belongs;

Optionally, the reading module is specifically configured to perform erasure correction calculation to recover the valid data in a target data block on a target disk if reading from the target disk fails and the data stored in the target data block on that disk is valid data, where the target disk is any one of the disks to which the target data blocks belong.

Optionally, the reading module is specifically configured to calculate the valid data in the target data block on the target disk from the valid data and the redundant data read from the other data blocks of the same object on the disks other than the target disk.

Optionally, the determining module is further configured to determine the number of disks to be selected according to the preset erasure ratio;

and select that number of disks from the plurality of disks in the file system according to the disk weight of each disk, wherein the disk weight of each disk is determined according to the capacity of the disk and the total disk capacity of the system.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the storage medium via the bus and executes the machine-readable instructions to perform the steps of the method provided in the first aspect.

In a fourth aspect, an embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, performs the steps of the method provided in the first aspect.

The beneficial effects of the present application are as follows:

The present application provides a data processing method and apparatus, an electronic device and a storage medium. The method comprises: acquiring a file creation request, wherein the file creation request comprises the size and the name of the target file; determining the number of objects into which the target file is divided and the number of data blocks contained in each object according to the size of the target file, a preset erasure ratio and a preset data block size; creating the data blocks of each object on the selected disks according to the preset erasure ratio and generating file creation information; and generating first-type metadata and second-type metadata of the target file according to the file creation information, storing the first-type metadata in a system disk of the file system and storing the second-type metadata in a data disk of the file system. The method classifies the metadata of a file by access frequency into first-type and second-type metadata, stores the first-type metadata in the system disk and stores the second-type metadata in the data disk. Because the items in the metadata are not accessed uniformly during data access, this classification allows metadata to be accessed according to actual access needs. Storing the frequently accessed first-type metadata separately from the less frequently accessed second-type metadata avoids unnecessary metadata access, reduces the amount of data accessed to a certain extent and improves data access efficiency; it also reduces the number of accesses to the data disk, relieving pressure on the data disk.

In addition, based on the classified storage of the first type metadata and the second type metadata, the data volume of the metadata needing to be updated when the metadata information is modified can be reduced, and the reading and writing performance of the metadata is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

Fig. 1 is a schematic structural diagram of a file system based on erasure-coded storage according to an embodiment of the present application;

Fig. 2 is a schematic diagram of disk space partitioning according to an embodiment of the present application;

Fig. 3 is a first schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 4 is a second schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 5 is a third schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 6 is a schematic diagram of separated data storage based on erasure-coded storage according to an embodiment of the present application;

Fig. 7 is a fourth schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 8 is a fifth schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 9 is a sixth schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 10 is a seventh schematic flowchart of a data processing method according to an embodiment of the present application;

Fig. 11 is a block diagram of a data processing system according to an embodiment of the present application;

Fig. 12 is a schematic diagram of data block reading according to an embodiment of the present application;

Fig. 13 is another schematic diagram of data block reading according to an embodiment of the present application;

Fig. 14 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

Fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. It should be understood that the drawings in the present application are for illustration and description only and are not intended to limit the protection scope of the present application, and that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. The operations of the flowcharts may be performed out of order, and steps that have no logical dependency on each other may be performed in reverse order or simultaneously. In addition, under the guidance of the present disclosure, one skilled in the art may add one or more other operations to a flowchart or remove one or more operations from it.

In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.

Before the method of the present application is introduced, the concept of metadata is explained. Metadata is mainly information describing the attributes of data. For a file, the metadata records not only the name, type, access permission, size and storage path of the file data, but also information such as the storage location and source of the file. Because the metadata of a file is usually accessed first whenever data is accessed, metadata is accessed very frequently during data access, far more often than the file data itself.

When a file is read, its metadata must be accessed before its data: the size, access permission, storage location and other information of the file data are first obtained by querying with the file name and similar information, and the specific file data is then read from the storage disk according to the location information stored in the metadata. However, many file access requests do not need the actual content of a file, only its basic information. Such requests need to access only the metadata, and even then not every item of metadata is needed on every access. Fetching all of a file's metadata on every access therefore makes metadata access inefficient.

For data stored using erasure coding, the metadata of a file can therefore be separated: hot metadata can be stored in the system disk through a key-value database, and cold metadata can be stored in the data disk. Depending on the access requirement, either only the hot metadata or both the hot and cold metadata are accessed, which avoids reading both on every access and improves metadata access efficiency. Here, hot metadata refers to metadata that is accessed more frequently than cold metadata.

Fig. 1 is a schematic structural diagram of a file system based on erasure-coded storage according to an embodiment of the present application. Fig. 1 shows an exemplary storage structure of one piece of file data in the file system; all file data in the file system has the same storage structure. As shown in Fig. 1, the file data may be divided into a plurality of data segments, each data segment may be stored as an object, and each object may include a plurality of data blocks, namely valid data blocks and redundant data blocks. A data block that stores file data serves as a valid data block, and a data block that stores redundant data serves as a redundant data block; the redundant data can be used to recover damaged valid data. The number of data blocks in each object may be determined according to the preset erasure ratio, and the data blocks of each object are distributed on different disks of the file system.

Fig. 2 is a schematic diagram of disk space partitioning according to an embodiment of the present application. As shown in Fig. 2, the space of any disk in Fig. 1 may include a reserved area, a super block and a plurality of block groups, and each block group may include an index area and a data area. Each index area may include a main index area and a backup index area that hold the same information. Taking the main index area as an example, it may include a plurality of sub index units, each sub index unit may include 64 file nodes, and each file node stores the metadata information of one data block, which may include: the usage state of the data block, the file name, the file creation time, the file modification time, the file sequence number, the capacity occupied in the current data block, and the like.

The data area may include 4096 data blocks, each of which stores specific file data and has a size of 64 MB; each data block and its corresponding metadata are stored in the same block group.

The super block stores information such as the number of index units, the number of block groups and the data block size.
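The layout figures above imply some simple sizes, sketched below as a quick check. This assumes "64M" means 64 MiB and that every data block in a block group has exactly one file node (the description states that each file node stores the metadata of one data block); the constant and function names are hypothetical.

```python
MIB = 1024 * 1024

DATA_BLOCK_SIZE = 64 * MIB          # size of one data block
BLOCKS_PER_GROUP = 4096             # data blocks in one block group's data area
FILE_NODES_PER_SUB_INDEX_UNIT = 64  # file nodes in one sub index unit


def block_group_data_bytes() -> int:
    """Capacity of the data area of one block group."""
    return BLOCKS_PER_GROUP * DATA_BLOCK_SIZE


def sub_index_units_per_group() -> int:
    """Sub index units needed in one index area, assuming one file node per data block."""
    # Ceiling division, in case the block count is not a multiple of 64.
    return -(-BLOCKS_PER_GROUP // FILE_NODES_PER_SUB_INDEX_UNIT)
```

Under these assumptions one block group holds 256 GiB of file data, and its main index area (and likewise its backup index area) needs 64 sub index units.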

Fig. 3 is a first schematic flowchart of a data processing method according to an embodiment of the present application; the execution subject of the method may be an electronic device or a processing device such as a processor. The method may be applied to the file system based on erasure correction storage shown in fig. 1, and as shown in fig. 3, the method may include:

S301, acquiring a file creation request, wherein the file creation request comprises: the size of the target file and the name of the target file.

The premise of reading and writing a file is that the file is created first: data can be written into the created file, and after the data has been written, the file can be read.

Optionally, a file may be created according to the file creation request; the newly created file is an empty file that does not yet contain any actual data.

The file creation request may include: the size of the target file, the name of the target file, where the target file may refer to any file that is currently to be created.

S302, determining the number of objects into which the target file is divided and the number of data blocks contained in each object according to the size of the target file, a preset erasure ratio and the size of a preset data block.

Here, the erasure ratio is determined by the erasure coding strategy: once a strategy is selected, the erasure ratio is fixed. An erasure coding strategy encodes the original data with an erasure code algorithm to obtain redundant data, and stores the file data together with the redundant data to achieve fault tolerance.

As explained above, the preset data block size may be 64 MB. Assume that the size of the target file is 1 GB (1024 MB) and the preset erasure ratio is 4:1. From the size of the target file and the preset data block size, the target file corresponds to 1024/64 = 16 data blocks. The erasure ratio of 4:1 means that one object includes 5 data blocks, of which 4 are valid data blocks and 1 is a redundant data block. Since one object provides 4 valid data blocks for storing file data, a target file corresponding to 16 data blocks is divided into 16/4 = 4 objects, and the data of the target file can be stored in these objects.

Based on the analysis, the number of objects into which the target file is divided and the number of data blocks included in each object can be determined, and when the size of the target file and the preset erasure ratio are changed, the number of objects into which the target file is divided and the number of data blocks included in each object can still be determined according to the method.
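The calculation in the example can be written out as a small sketch, assuming an erasure ratio k:m (k valid blocks and m redundant blocks per object) and rounding up whenever the file does not fill the last block or object. The function name is hypothetical; integer ceiling division is used to avoid floating-point rounding.

```python
from typing import Tuple


def plan_objects(file_size: int, k: int, m: int, block_size: int) -> Tuple[int, int]:
    """Return (number of objects, data blocks per object) for a k:m erasure ratio."""
    valid_blocks = -(-file_size // block_size)  # ceil(file_size / block_size)
    objects = -(-valid_blocks // k)             # ceil(valid_blocks / k)
    return objects, k + m
```

With sizes in MB, `plan_objects(1024, 4, 1, 64)` reproduces the example (16 valid blocks, hence 4 objects of 5 blocks each), while a 1100 MB file would need 18 valid blocks and therefore 5 objects.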

S303, creating a data block of each object on each selected disk according to the preset erasure ratio, and generating file creation information.

The file creation information may include: the file name of the target file, the file size, the file storage path, the file identifier, the file creation time, the number of objects contained in the file, and the information of each data block in each object contained in the file. The data blocks in each object include valid data blocks and redundant data blocks, and the data blocks under the same object are distributed on different selected disks of the file system.

In some embodiments, the number of disks used for creating the data blocks may also be determined from the preset erasure ratio. As described above, the preset erasure ratio determines the number of data blocks contained in one object, and since the data blocks of the same object are distributed on different disks of the file system, the number of disks to be selected equals the number of data blocks in one object.

Optionally, one data block of an object may be created on each of the selected disks, thereby completing the creation of the target file and generating the file creation information.

S304, generating first-class metadata and second-class metadata of the target file according to the file creation information and storing them, where the first-class metadata is stored in a system disk of the file system and the second-class metadata is stored in a data disk of the file system.

The first type of metadata includes the basic information of the target file and the association relationship between the first type of metadata and the second type of metadata; the second type of metadata includes the data block information of the target file. The access frequency of the first type of metadata is higher than that of the second type of metadata.

The first type of metadata may correspond to the high-heat (frequently accessed) metadata mentioned above, and the second type of metadata to the low-heat metadata.

In some embodiments, the first type and second type of metadata of the target file may be generated from the generated file creation information: the first type of metadata from some of the information in the file creation information, and the second type of metadata from the rest. The generated first type and second type of metadata are then stored.

Optionally, in this embodiment, the generated first type of metadata may be stored in the system disk of the file system. From a computer perspective, the system disk is the disk on which the operating system is installed, commonly the C drive. The second type of metadata may be stored in a data disk of the file system, that is, a disk other than the system disk that is used for storing data, commonly the D drive, E drive and so on.

It should be noted that the first type of metadata here also includes the association relationship between the first type of metadata and the second type of metadata. In some scenarios only the basic information of the target file needs to be accessed, and this can be done by accessing the basic information of the target file in the first type of metadata alone. In other scenarios the file data of the target file needs to be accessed; in that case the basic information of the target file in the first type of metadata and the association relationship between the two types of metadata are accessed to determine the information of the data blocks of the target file, and the file data of the target file is then read from the disks according to that data block information. Through the established association relationship between the first type and the second type of metadata, the amount of metadata that must be fetched when accessing metadata information can be reduced.

Because the information in the metadata is not accessed uniformly during data access, classifying the metadata and storing the frequently accessed first type of metadata separately from the infrequently accessed second type of metadata allows metadata to be accessed according to actual need. This reduces unnecessary metadata access, reduces the amount of data accessed to a certain extent, and improves data access efficiency. Moreover, since the first type and second type of metadata are stored on the system disk and the data disk respectively, the number of accesses to the data disk during data access is reduced, relieving pressure on the data disk. In addition, storing the two types of metadata separately reduces the amount of metadata that must be updated when metadata information is modified, which improves metadata read and write performance.

In summary, the data processing method provided in this embodiment includes: acquiring a file creation request, where the file creation request includes the size of the target file and the name of the target file; determining, according to the size of the target file, a preset erasure ratio and a preset data block size, the number of objects into which the target file is divided and the number of data blocks contained in each object; creating a data block of each object on each selected disk according to the preset erasure ratio, and generating file creation information; and generating and storing, according to the file creation information, first-class metadata and second-class metadata of the target file, where the first-class metadata is stored in a system disk of the file system and the second-class metadata is stored in a data disk of the file system. The method classifies the metadata of a file by access frequency into first-type and second-type metadata. Because the information in the metadata is not accessed uniformly during data access, storing the frequently accessed first-type metadata separately from the infrequently accessed second-type metadata allows metadata to be accessed on demand, which reduces unnecessary metadata access, reduces the amount of data accessed to a certain extent, and improves data access efficiency. Storing the first-type metadata on the system disk and the second-type metadata on the data disk also reduces the number of accesses to the data disk during data access, relieving pressure on the data disk.

Fig. 4 is a second schematic flowchart of a data processing method according to an embodiment of the present application. Optionally, in step S304, generating the first type and second type of metadata of the target file according to the file creation information and storing them respectively may include:

S401, generating the basic information of the target file according to the file name, the file size, the file creation time, the file storage path and the number of objects contained in the file, and generating the association relationship between the first type of metadata and the second type of metadata according to the file identifier and the information of each data block in each object contained in the file, where the association relationship represents the mapping between the file identifier of the target file and each data block in each object contained in the file.

Optionally, the first type of metadata includes the basic information of the target file, which may include basic data such as the name of the target file, the file identifier, the file size, the file creation time, the modification time and the storage path. The association relationship between the first type and the second type of metadata can be constructed from the file identifier of the target file in the first type of metadata and the information of each data block contained in the target file, and it can be used to query the information of each data block of the target file by its file identifier.

S402, obtaining the first type of metadata of the target file according to the basic information of the target file and the association relationship between the first type of metadata and the second type of metadata.

Therefore, the first type metadata of the target file can be obtained by combining the obtained basic information of the target file and the association relationship between the first type metadata and the second type metadata.

S403, generating the data block information of the target file according to the information of each data block in each object contained in the file, and taking the data block information as the second type of metadata.

The information of each data block in each object included in the file may refer to storage location information of each data block, that is, a disk location corresponding to each data block, and the second type metadata may include information of all data blocks of the target file.

According to the information of each data block, each data block can be located on its corresponding disk and the stored file data read from it.

In addition, the second type of metadata may further include: data block size, data block name, etc.

It should be noted that in this embodiment the first type of metadata is stored in the form of key-value pairs, that is, it is stored in the system disk of the file system through a key-value database.

A key-value database is a non-relational database in which each record is a key-value pair consisting of two elements, a key and a value, both of which are variable-length byte sequences. Keys and values may be binary data or text strings, and each key in the database must be unique. Key-value databases can provide functions such as persistence and data synchronization and offer high concurrency, high scalability and high reliability, which makes them an effective way of storing metadata.

The association relationship between the first type and the second type of metadata may be represented in the form of tables, mainly with the following table structures. The file information table stores the entries of the first type of metadata of a file, including the file path, file creation time, modification time, file size, number of objects included in the file, file status, POOL to which the file belongs, and file ID (file identifier), as shown in Table 1:

Table 1: file information table

All entries of this table use a key of the form "FI@<file ID>", where FI stands for File Info, the file ID is the unique identifier of the file, and the two parts are joined with '@'. The file status is represented by 0, 1 or 2: '0' means the file has not been written since creation, '1' means the file is being written, and '2' means the file has been closed after writing completed. The Pool ID is the number of the storage resource pool to which the file belongs. The number of objects indicates how many OBJ objects the file occupies.

The data block information table records the information of all data blocks (Blocks) contained in each file. Its key is expressed as "FB@<file ID>", where the file ID identifies the file to which the blocks in the table belong, and the Block description information stores the basic information of each Block, as shown in Table 2.

Table 2: data block information table

The description information of a Block is a character string of a specified format, in which the 5th, 10th and 12th characters are connecting symbols; the first 4 characters are '0000' and are reserved; the 6th to 9th characters are a 4-digit number recording the number of the disk on which the block resides; the 11th character records the state of the block with a value from 0 to 5, representing the initialization, normal, writing, damaged, missing and offline states respectively; and the 13th character represents the capacity state of the block. In the data block information table, multiple block descriptions are separated by ';'.

The Block description information format table records the specific format of the description information of each data Block and the meaning of each character position in it.

Table 3: block description information format
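The character layout described above can be decoded as in the following minimal sketch. The connecting symbol is not named in the text, so the `-` used here is an assumption, as are the hypothetical names `parse_block_desc`, `parse_block_list` and `BlockDesc`; the capacity state is kept as a raw character because its codes are not listed.

```python
from dataclasses import dataclass
from typing import List

# Block states, codes 0-5, as listed for the 11th character.
BLOCK_STATES = {0: "initialized", 1: "normal", 2: "writing",
                3: "damaged", 4: "missing", 5: "offline"}

SEP = "-"  # assumption: the text does not name the connecting symbol


@dataclass
class BlockDesc:
    disk_no: int
    state: str
    capacity_state: str


def parse_block_desc(desc: str) -> BlockDesc:
    # 1-based positions: 1-4 reserved "0000"; 5, 10, 12 separators;
    # 6-9 disk number; 11 block state; 13 capacity state.
    if len(desc) != 13 or desc[4] != SEP or desc[9] != SEP or desc[11] != SEP:
        raise ValueError(f"malformed block description: {desc!r}")
    if desc[0:4] != "0000":
        raise ValueError("reserved field must be '0000'")
    return BlockDesc(disk_no=int(desc[5:9]),
                     state=BLOCK_STATES[int(desc[10])],
                     capacity_state=desc[12])


def parse_block_list(value: str) -> List[BlockDesc]:
    # Several block descriptions in one table value are separated by ';'.
    return [parse_block_desc(d) for d in value.split(";") if d]


blocks = parse_block_list("0000-0003-1-0;0000-0012-3-1")
print(blocks[0].disk_no, blocks[0].state)  # 3 normal
```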

The file index table is used to index files and quickly find the information of a specified file. Its key is expressed as "DIMF@<file directory>:<file name>", and the value is the file ID.

Table 4: file index table

Optionally, after the target file is successfully created, the data block information table may be generated from the file creation information and the file information table from the metadata of the target file; the database is then called to write the file index table, the data block information table and the file information table into the key-value database.
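As a rough illustration of how the three tables might be keyed and written at creation time, the sketch below uses a plain Python `dict` as a stand-in for the key-value database. The JSON value encoding, the use of `uuid4` for the file ID and the function name `create_file_metadata` are assumptions, not taken from the text; only the key prefixes `FI@`, `FB@` and `DIMF@`, the status code 0 for a newly created file, and the ';'-joined block list follow the description.

```python
import json
import posixpath
import uuid


def create_file_metadata(kv: dict, directory: str, name: str, size: int,
                         pool_id: int, block_descs: list, num_objects: int,
                         ctime: int) -> str:
    """Write index, block and file-info entries for a new file; return its ID."""
    index_key = f"DIMF@{directory}:{name}"      # file index table key
    if index_key in kv:
        raise FileExistsError(f"{directory}:{name}")
    file_id = uuid.uuid4().hex                  # assumption: any unique ID scheme
    # Data block information table (second type of metadata): ';'-joined descriptions.
    kv[f"FB@{file_id}"] = ";".join(block_descs)
    # File information table (first type of metadata); JSON encoding is an assumption.
    kv[f"FI@{file_id}"] = json.dumps({
        "path": posixpath.join(directory, name),
        "ctime": ctime, "mtime": ctime, "size": size,
        "objects": num_objects,
        "status": 0,                            # 0: created, not yet written
        "pool_id": pool_id, "file_id": file_id,
    })
    # File index table: directory and name -> file ID.
    kv[index_key] = file_id
    return file_id
```

A real deployment would issue these three writes through the key-value database's own API, ideally in one batch so that a crash cannot leave an index entry without its file information.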

Fig. 5 is a third schematic flowchart of a data processing method according to an embodiment of the present application. Optionally, after step S304 (generating the first type and second type of metadata of the target file according to the file creation information, storing the first type of metadata in the system disk of the file system, and storing the second type of metadata in the data disk of the file system), the method of the present application may further include:

S501, acquiring a file write request for the target file, where the file write request includes: the name of the target file and the file data of the target file.

Based on the created target file, this embodiment writes data into the target file according to the acquired file write request. The file write request may include the name of the target file and the file data of the target file, where the file data is the specific data to be written into the created target file. In other words, the created target file is an empty file, and the file data is the specific content to be written into that empty file.

It should be noted that the target file may be any file, and for convenience of understanding, the target file to be written and the target file created as described above may be considered as one file.

S502, determining whether the target file exists in the created files according to the name of the target file.

By the time the target file is written, a plurality of files may already have been created in the file system. These created files can be searched by the name of the target file to determine whether the target file exists, that is, whether the target file has been created in the file system.

Optionally, the obtained name of the target file may be compared with file names of files already created in the file system to determine whether the target file exists.

S503, if the target file exists, determining, according to the file data of the target file, the redundant data contained in each object of the target file, where the redundant data is used to recover the file data of the target file.

If a file already created in the file system has the same name as the target file, it is confirmed that the target file exists.

In a file system based on erasure-coded storage, as described above, the data blocks under one object of a file include valid data blocks for storing the valid data of the file and redundant data blocks for storing redundant data.

When the valid data stored in any valid data block under an object is damaged, erasure calculation can be performed according to the valid data in the rest valid data blocks under the object and the redundant data in the redundant data block, so that the damaged valid data can be recovered.

Here, the redundant data may be calculated using an erasure calculation function. Erasure calculation is an existing, well-known technique and is not described in detail here. Optionally, the redundant data of an object may be calculated from the valid data under that object.
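The text does not name the erasure code. For the 4:1 ratio used in the examples (a single redundant block per object), one common choice is byte-wise XOR parity, as in RAID 5; the sketch below shows only that technique, as an assumption, and a general N:M ratio with M > 1 would need a code such as Reed-Solomon instead. `xor_parity` is a hypothetical name.

```python
def xor_parity(valid_blocks: list) -> bytes:
    """Redundant block for an N:1 object: byte-wise XOR of all valid blocks."""
    length = max(len(b) for b in valid_blocks)
    parity = bytearray(length)
    for block in valid_blocks:
        # Shorter blocks are treated as if zero-padded to the common length.
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


# Four valid blocks of one object under a 4:1 ratio.
p = xor_parity([b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"])
print(p)  # b'\x00\x08'
```

Because XOR is its own inverse, any single lost block of the object can later be rebuilt by XOR-ing the other four blocks, which is what the read-side recovery below relies on.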

S504, writing the redundant data contained in each object of the target file into the redundant data block of that object, writing the file data of the target file into the valid data blocks of each object, and updating the first type and second type of metadata corresponding to the target file.

Optionally, the terms valid data block and redundant data block only distinguish the type of data stored in a data block; both are ordinary data blocks under an object. Any data block can be regarded as a valid data block once it stores valid data, or as a redundant data block once it stores redundant data; neither term refers to a specific, fixed data block under an object.

After the file data of the target file is written, the first type metadata and the second type metadata of the target file, which are generated when the target file is created, can be adaptively updated.

This update may include the following. The file size in the first type of metadata may be updated: because writing part of the data may fail while the file data is being written, or the user may write only part of the data, the size of the written file may differ from the size given at creation, so the file size in the first type of metadata of the target file can be updated to the actual written size. The state of the data blocks in the second type of metadata may also be updated. The state of each data block may be the initialization, normal, writing, damaged, missing or offline state, and it can be updated according to the current actual state of the data block. Different states are recorded with different numeric identifiers, so the state of a data block is updated by changing its identifier.
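Updating a block state amounts to rewriting one character of a block description. A minimal sketch, assuming the 13-character description layout given for Table 3 (state code at the 11th character, descriptions joined by ';'); the function names `set_block_state` and `update_block_states` are hypothetical.

```python
# State codes: 0 initialized, 1 normal, 2 writing, 3 damaged, 4 missing, 5 offline.
VALID_STATES = range(0, 6)


def set_block_state(desc: str, state_code: int) -> str:
    """Return a copy of a 13-character block description with a new state code."""
    if state_code not in VALID_STATES:
        raise ValueError(f"unknown block state {state_code}")
    if len(desc) != 13:
        raise ValueError(f"malformed block description: {desc!r}")
    # The 11th character (1-based) is index 10.
    return desc[:10] + str(state_code) + desc[11:]


def update_block_states(value: str, new_states: dict) -> str:
    """Apply {block position: state code} to a ';'-separated block list."""
    descs = value.split(";")
    for pos, code in new_states.items():
        descs[pos] = set_block_state(descs[pos], code)
    return ";".join(descs)


# After a successful write, move block 0 from "writing" (2) to "normal" (1).
print(update_block_states("0000-0001-2-0;0000-0002-2-0", {0: 1}))
```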

Fig. 6 is a schematic diagram of separated data storage based on erasure-coded storage according to an embodiment of the present application. As shown in Fig. 6, the first type of metadata of a file may be stored as copies in the system disk of the file system through the database, the second type of metadata may be stored as copies in the data disk of the file system through the database, and the association relationship between the first type and the second type of metadata is also stored as copies in the system disk through the database. Storing copies improves data security and prevents data loss.

The valid data of the file and the redundant data obtained by erasure calculation are stored in the data disk, and the first type and the second type of metadata of the file are stored separately from each other.

Fig. 7 is a fourth schematic flowchart of a data processing method according to an embodiment of the present application. Optionally, after step S504 (writing the redundant data contained in each object of the target file into the redundant data block of each object and writing the file data of the target file into the valid data blocks of each object), the method of the present application may further include:

S701, acquiring a file read request for the target file, where the file read request includes: the name of the target file, the size of the target file, read offset information and read length information.

After the target file is successfully written, the file can be further read, and the file data of the target file can be read from each data block of the target file according to the obtained reading request of the target file.

The read offset information may refer to an offset of data to be read in a data block, and the read length information may refer to a length of the data to be read.

S702, querying the first type of metadata of each file in the system disk according to the name of the target file and the read offset information, and determining the information of each data block corresponding to the target file.

In one implementation, the state of the target file can be looked up in the database by the name of the target file to determine whether the target file exists. If it exists, the identifier of the target file can be obtained from its name; the data block information table is then queried, and the information of each data block corresponding to the target file is determined from the association relationship between the identifier of the target file and each data block.

S703, reading the target file from the corresponding disks according to the information of each data block corresponding to the target file, the read offset information and the read length information.

Based on the determined data block information of the target file, the file data of the target file can be read from the disk corresponding to each data block according to the read offset information and the read length information.

Fig. 8 is a fifth flowchart illustrating a data processing method according to an embodiment of the present application; optionally, in step S702, querying the first type of metadata of each file in the system disk according to the name of the target file and the read offset information, and determining information of each data block corresponding to the target file may include:

S801, determining the file identifier of the target file according to the name of the target file.

Assuming that the file name of the target file is a, the key used to look up its file identifier in the file index table can be expressed as "DIMF@<file path>:a", and the value stored under this key is the file ID.

S802, querying, according to the file identifier of the target file, the association relationship between the first type and second type of metadata of each file, and determining the second type of metadata corresponding to the target file.

Here, the data block information table is queried, and the information of the data blocks corresponding to the target file, that is, the second type of metadata corresponding to the target file, is determined from the association relationship between the data blocks in the data block information table and the identifier of each file.

S803, querying the second type of metadata to obtain the information of the disk to which each data block of the target file belongs.

Optionally, the disk number recorded in the description information of each data Block (in the format given by the Block description information format table) can be looked up to determine the disk to which each data Block of the target file belongs.
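Steps S801 to S803 chain two key-value lookups. The following minimal sketch assumes the key formats of the file index table ("DIMF@<directory>:<name>") and the data block information table ("FB@<file ID>"), with a `dict` standing in for the database; `lookup_block_disks` is a hypothetical name.

```python
def lookup_block_disks(kv: dict, directory: str, name: str) -> list:
    """Resolve a file name to the disk number of each of its data blocks."""
    # S801: file index table maps directory and name to the file ID.
    file_id = kv.get(f"DIMF@{directory}:{name}")
    if file_id is None:
        raise FileNotFoundError(f"{directory}:{name}")
    # S802: the data block information table holds the file's blocks.
    value = kv[f"FB@{file_id}"]
    # S803: characters 6-9 (1-based) of each description are the disk number.
    return [int(desc[5:9]) for desc in value.split(";") if desc]


kv = {"DIMF@/data:a": "f1", "FB@f1": "0000-0001-1-0;0000-0002-1-0"}
print(lookup_block_disks(kv, "/data", "a"))  # [1, 2]
```

Only the small first-type entries and the block list of this one file are touched; the metadata of other files is never read, which is the access saving the method aims for.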

S804, according to the read offset information, determining the information of the disk to which the target data block corresponding to the target file belongs from the information of each data block corresponding to the target file.

In some embodiments, the file data currently to be read may be only part of all the file data of the target file. In that case, the target data blocks to be read can be determined from the data blocks of the target file according to the read offset information and the length to be read in each data block, and the information of the disks to which those target data blocks belong is thereby determined.
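Determining the target data blocks from the read offset depends on how file data is laid out across blocks, which the text does not spell out. Assuming the simplest layout, in which the k-th block-sized chunk of the file is stored in the k-th valid data block and valid block k belongs to object k // N at position k % N, a read range can be split as follows (`map_read` is a hypothetical name).

```python
def map_read(offset: int, length: int, block_size: int, n_valid: int) -> list:
    """Split a file-level read [offset, offset + length) into per-block segments.

    Each segment is (object index, valid block position in object,
    offset inside the block, bytes to read).
    """
    segments = []
    pos, end = offset, offset + length
    while pos < end:
        k = pos // block_size                     # global index of the valid block
        in_block = pos % block_size               # offset inside that block
        take = min(block_size - in_block, end - pos)
        segments.append((k // n_valid, k % n_valid, in_block, take))
        pos += take
    return segments


# Toy sizes: 64-byte blocks, 4 valid blocks per object (4:1 ratio).
print(map_read(60, 10, 64, 4))  # [(0, 0, 60, 4), (0, 1, 0, 6)]
```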

Fig. 9 is a sixth schematic flowchart of a data processing method according to an embodiment of the present application; optionally, in step S703, reading the target file from the corresponding disk according to the read offset information and the read length information according to the information of each corresponding data block of the target file, which may include:

S901, reading, according to the information of the disk to which each target data block of the target file belongs and according to the read offset information and the read length information, the file data stored in each target data block from the disk to which it belongs.

In some embodiments, for the determined target data blocks of the target file, partial file data may be read from each target data block, where the file data may be read in each target data block according to the read offset information and the read length information.

S902, combining the file data read from the target data blocks to obtain the target file.

Generally, only part of the file data to be read is obtained from any one target data block, and the complete file data to be read is obtained by combining the parts read from all the target data blocks.

Optionally, in step S901, reading the file data stored in each target data block from the disk to which it belongs may include: if reading from a target disk fails and the data stored in the target data block on that disk is valid data, performing erasure calculation to recover the valid data in that target data block, where the target disk is any one of the disks corresponding to the target data blocks.

In an implementation manner, if, during reading of the target file data, a disk where an effective data block storing effective data is located is damaged, resulting in failure in reading of the part of effective data, then the part of effective data may be recovered by performing erasure correction calculation, so as to ensure integrity of the finally read target file data.

Optionally, performing erasure calculation to recover the valid data in the target data block on the target disk may include: calculating the valid data in the target data block on the target disk from the valid data and the redundant data read from the target data blocks on the disks other than the target disk.

For example, suppose the file data of the target file to be read is distributed across the data blocks under object 1 of the target file. Taking an erasure ratio of 4:1 as an example, object 1 includes valid data block 1, valid data block 2, valid data block 3, valid data block 4 and redundant data block 5, located on disk 1, disk 2, disk 3, disk 4 and disk 5 respectively. If disk 4 is damaged, reading from disk 4 fails, that is, the valid data stored in valid data block 4 cannot be read. The valid data of valid data block 4 is then calculated by erasure calculation from the valid data read from valid data blocks 1, 2 and 3 and the redundant data read from redundant data block 5.
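For the single-redundant-block case in this example, and assuming the redundant block is the byte-wise XOR of the valid blocks (the text does not specify the code), the missing block is simply the XOR of the four surviving blocks. A minimal sketch with a hypothetical helper `xor_blocks` and toy block contents:

```python
def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("blocks must have equal length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Object 1 under a 4:1 ratio: valid blocks 1-4 and redundant block 5 (parity).
valid = [b"\x10\x20", b"\x01\x02", b"\x0f\x0f", b"\xaa\x55"]
parity = xor_blocks(valid)

# Disk 4 fails: rebuild valid block 4 from valid blocks 1-3 and redundant block 5.
recovered = xor_blocks([valid[0], valid[1], valid[2], parity])
assert recovered == valid[3]
```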

Fig. 10 is a seventh flowchart illustrating a data processing method according to an embodiment of the present application; optionally, in step S303, before creating a data block of each object on each selected disk according to the preset erasure ratio, the method may further include:

S110, determining the number of disks to be selected according to the preset erasure ratio.

The preset erasure ratio can be determined according to the selected erasure scheme. If the selected scheme is N + M, the preset erasure ratio is N:M, which represents the ratio of valid data blocks to redundant data blocks. Since the data blocks of an object are distributed on different disks, the number of disks to be selected is N + M.

S111, selecting that number of disks from the plurality of disks in the file system according to the disk weight of each disk, where the disk weight of each disk is determined according to the capacity of that disk and the total disk capacity of the system.

To keep disk capacity balanced, the weight of each disk can be determined by the proportion of its capacity in the total disk capacity, and disks with a higher weight are preferentially selected. Specifically, disks may be selected with the following algorithm:

A random number is taken and divided by the total weight of the disks, and the remainder is used as the first-round result. The first-round result is compared with the weight range of each disk; if it falls within the weight range of a certain disk, that disk is determined as a disk to be selected, it is removed from the candidate disks, and the weights of the remaining disks are recalculated.

The procedure is then repeated: a random number is divided by the total weight of the remaining disks to obtain the second-round result, which is compared with the weight ranges of the remaining disks; if it falls within the weight range of a certain disk, that disk is determined as a disk to be selected. This continues until all disks to be selected are determined.
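The selection loop can be sketched as weighted random sampling without replacement. In this minimal version, each disk's weight is taken to be its capacity as a positive integer (an assumption; any weight proportional to capacity gives the same probabilities), and the running sum of weights defines each disk's range. `select_disks` is a hypothetical name. Taking a large random number modulo the total weight, as described, introduces a negligible bias; `rng.randrange(total)` would avoid it.

```python
import random


def select_disks(capacities: dict, count: int, rng=None) -> list:
    """Pick `count` distinct disks; disks with more capacity are more likely."""
    rng = rng or random.Random()
    remaining = dict(capacities)          # disk id -> weight (positive int capacity)
    chosen = []
    for _ in range(count):                # count = N + M for an N:M ratio
        if not remaining:
            raise ValueError("not enough disks for the erasure ratio")
        total = sum(remaining.values())   # weights of the remaining disks, recalculated
        r = rng.randrange(2**31) % total  # random number modulo the total weight
        upper = 0
        for disk, weight in remaining.items():
            upper += weight               # this disk owns [upper - weight, upper)
            if r < upper:
                chosen.append(disk)
                del remaining[disk]       # safe: we stop iterating immediately
                break
    return chosen


# 4:1 ratio -> 5 distinct disks out of 6.
print(select_disks({"d1": 100, "d2": 200, "d3": 300,
                    "d4": 400, "d5": 500, "d6": 600}, 5, random.Random(0)))
```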

In an implementation manner, the electronic device for executing the method may be divided into a plurality of functional modules, and the functional modules are processed interactively to execute the method.

FIG. 11 is a block diagram illustrating an architecture of a data processing system according to an embodiment of the present application, the data processing system including: a File Manager (FM), a DataBase module (DataBase, DB), an object management module (OBJ Manager, OM), an Erasure Correction calculation module (EC), and a Disk Data management module (DDM); the functions of the modules may be as follows:

File management module: after receiving a file creation request from a client, the FM generates a unique file ID for the file and notifies the OM module to allocate free space for it. The allocated space consists of a plurality of OBJs (objects), and each OBJ consists of a plurality of BLKs (data blocks). After the OM reports that creation succeeded, the FM writes the file information (the first type of metadata) and the information returned by the OM (the association relationship between the first type and the second type of metadata) into the database, which is responsible for storing this information in the system disk of the file system. When a file is read, the FM parses the request issued by the client and, according to the file name, finds the first type of metadata of the corresponding file and the associated second type of metadata, thereby obtaining the data block information of the file to be read.

Object management module: the OM manages OBJs. On receiving a file creation request from the FM, it obtains the disk information of the server, allocates OBJs according to the current erasure ratio and the file size, selects suitable disks for the BLKs in each OBJ, creates the BLKs on the selected disks, and writes the BLK metadata. Once creation is complete, the OM returns the consolidated information to the FM. When data is read, the OM uses the block file information issued by the FM to find the disk holding each data block to be accessed, and accesses the BLK data. The OM is also responsible for updating the database when data recovery changes the relationship between a file and its block files.

Database module: stores the disk information of the server, the metadata of files, and the association between files and data blocks. Dual writes, implemented by the database's own functions, protect the metadata, and caching speeds up metadata access.

Erasure calculation module: responsible for computing the data being read and written. When a file is read or written, it performs erasure calculation on the data passed from the FM to generate check (redundant) data or to restore correct data.

Disk data management module: manages the data on the disks, locates the disk corresponding to the BLK information passed from the OM, and reads or writes the BLK file.

In the file creation process:

a) On receiving a file creation request from a client, the FM first queries the database to check whether the file already exists. If it does, the FM reports that the file exists; if it does not, the FM generates a file index table and a file identifier;

b) The OM obtains information on all disks from the database and, for an erasure ratio of N+M, randomly selects N+M disks on which to create BLKs. To keep disk capacity balanced, each disk is assigned a weight according to its capacity, and disks with higher weights are preferred;

c) After selecting enough disks, the OM creates a BLK file on each of them through the DDM;

d) the DDM creates a BLK file and writes BLK file information (namely BLK metadata) in the disk partition;

e) After the OM returns that the file was created successfully, the FM generates a data block information table from the information of the created file and a file information table from the file metadata. It then calls the database to write the file index table, the data block information table, and the file information table into the database at the same time.

In the file reading process:

a) The client requests to read a file. After receiving the read request, the FM queries the database for information such as the state, existence, and size of the file, and calculates which OBJ is to be read;

b) Using the received OBJ information, read offset, and read length, the OM calls the DDM to read the BLK file on each disk;

c) The DDM locates the BLK to be read on the disk through the on-disk INode (index node) information, reads the data at the given offset and length, and returns it to the OM.

d) After receiving the results, the OM passes them to the EC. Based on the results from the OM, the EC determines whether erasure calculation is needed: if not, it returns the data directly to the FM; if so, it performs the calculation and then returns the result.
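The EC decision in step d) can be sketched as follows. The application does not name a specific erasure code; this sketch assumes a single redundant block (M = 1) computed as a bytewise XOR, and data striped round-robin across the valid BLKs in fixed-size strips. The function names and the `strip_size` parameter are hypothetical.

```python
from functools import reduce

def xor_bytes(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def ec_assemble(chunks, n_data, strip_size):
    """Rebuild the requested file data from the chunks the OM returned.

    chunks lists the data read from each BLK of one OBJ, valid BLKs first and
    the single redundant BLK last; None marks a failed read.
    """
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise OSError("more BLKs lost than one redundant block can repair")
    data = list(chunks[:n_data])
    if missing and missing[0] < n_data:
        # A valid-data BLK failed: XOR of all surviving chunks restores it.
        data[missing[0]] = xor_bytes([c for c in chunks if c is not None])
    # If only the redundant BLK failed, no erasure calculation is needed.
    out = bytearray()
    for start in range(0, len(data[0]), strip_size):
        for chunk in data:                      # interleave strips round-robin
            out += chunk[start:start + strip_size]
    return bytes(out)
```

Under these assumptions, the case in Example 1 below (only the redundant BLK on disk 5 failed) takes the direct-combination path, while the case in Example 2 (the 4th BLK failed) rebuilds the missing chunk first.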

The process of the present application will be illustrated by specific examples as follows:

Example 1 is a case where erasure calculation is not required:

The erasure ratio is 4:1. The file system has 5 disks, numbered 1, 2, 3, 4, and 5, of which disk 5 is damaged. The client needs to read a 900M file whose path is /storagecli/file.

At this time, the file index table in the database is:

Key: DIMF@storage:file
Value (file ID): 123456

The file information table is

The data block information table is:

For the first read, the client needs data at an offset of 16MB with a length of 1M (1024K). After receiving the request, the FM queries the database and obtains the file ID 123456 and the file size 900M; the file state is normal, so the file can be read. Since the 16MB offset lies in the first OBJ, the FM looks up the BLK files under the first OBJ in the FB table. The BLK information is 0000-0001-01-1; 0000-; 0000-; 0000-; 0000-; the FM also calculates that the 16MB offset corresponds to an offset of 4MB within each BLK (16 divided by 4), and that 256 KB must be read from each block.
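The offset arithmetic in this example can be reproduced with the sketch below. It assumes, hypothetically, that each OBJ holds 256MB of file data (so that offset 257MB falls in the second OBJ, as in Example 2), that data is striped across the 4 valid BLKs in 32KB strips, and that reads are stripe-aligned and do not cross an OBJ boundary. None of these sizes is stated in the application, and `plan_read` is a hypothetical name.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB

@dataclass
class ReadPlan:
    obj_index: int       # 0-based index of the OBJ holding the range
    offset_in_blk: int   # byte offset inside each valid BLK
    length_per_blk: int  # bytes to read from each valid BLK
    first_strip: int     # 1-based number of the first strip read in each BLK
    strip_count: int     # strips read from each BLK

def plan_read(offset, length, n_data, obj_capacity, strip_size):
    """Map a file-level (offset, length) read onto the valid BLKs of one OBJ."""
    stripe = n_data * strip_size                     # one strip from every valid BLK
    if offset % stripe or length % stripe:
        raise ValueError("sketch handles stripe-aligned reads only")
    obj_index, offset_in_obj = divmod(offset, obj_capacity)
    if offset_in_obj + length > obj_capacity:
        raise ValueError("sketch does not split reads across OBJs")
    offset_in_blk = offset_in_obj // n_data          # e.g. 16MB / 4 = 4MB
    length_per_blk = length // n_data                # e.g. 1MB / 4 = 256KB
    return ReadPlan(obj_index, offset_in_blk, length_per_blk,
                    offset_in_blk // strip_size + 1,
                    length_per_blk // strip_size)
```

Under these assumptions, the first read maps to OBJ 0 with a 4MB offset and 256KB per BLK, and a read at offset 257MB maps to OBJ 1, starting at the 9th strip and reading 8 strips per BLK, which matches Example 2.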

Fig. 12 is a schematic diagram illustrating a data block reading according to an embodiment of the present application.

The FM sends the BLK information to be read, together with the read offset and read length, to the OM. The OM obtains the information of each BLK, finds the disk corresponding to each BLK, and sends the BLK name, the disk, and the read offset and read length to the DDM (disk data management module). Within the specified disk partition, the DDM finds the corresponding BLK file name through the file index (Fnode), obtains the offset of the BLK on the disk, reads the corresponding data according to that offset and the read length, and returns it to the OM.

The OM returns the read data to the EC module. Because the DDM failed to read disk 5, the data returned by the OM shows a read length of 0 and a read failure for the 5th BLK. On analysis, the EC module finds that the erasure ratio is 4:1 and that the valid data lies on the first 4 BLKs, so the data can be combined directly without erasure calculation. After combining the data, the EC returns it to the FM, the FM returns it to the client, and the data read is complete.

Example 2 is a case where erasure correction calculation needs to be performed:

The client reads data for the second time, with a read offset of 257MB and a read length of 1M. After receiving the request, the FM calculates that the requested data lies in the second OBJ. Proceeding as in the first read, the FM finds that the BLKs under the second OBJ are 0000-; 0000-; 0000-; 0000-; 0000-0001-01-1; within each BLK the read starts at the 9th strip, and 8 strips need to be read.

Fig. 13 is a schematic diagram of another data block reading according to an embodiment of the present application.

As in the first read, the FM sends the BLKs and the information to be read to the OM, and the OM forwards them to the DDM. This time, because the 4th BLK of the second OBJ is on disk No. 5, the DDM fails to read the 4th BLK. The OM returns the read data to the EC; from the OM's result, the EC finds that the fourth strip is a data strip whose read failed, so erasure calculation is needed to recover the data. After the calculation, the recovered data is returned to the FM, the FM returns it to the client, and the data read is complete.

Embodiment 3 is directed to a case where only basic information of a file is read, and specific data of the file is not read:

The client scans which files are stored on the server and obtains their information. It sends the FM a request for the file information under the specified directory /storage/filegroup1. After receiving the request, the FM queries the database: from the directory /storage/filegroup1 it composes the key DIMF@storage/filegroup1 and scans the files under that directory. Having obtained the file IDs of these files, it composes the key FI@FileID from each file ID and scans the file information table. The queried information is returned to the client.

In this process, the client only obtains the file information and does not need the content of each file. Therefore, after receiving the request, the FM only has to query the specified directory or file information in the database (cache) before responding to the client; it does not need to query the BLK information of the files or the distribution of the BLKs on the disks.
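Because the first type of metadata is stored as key-value pairs (see claim 10), the directory scan above amounts to a prefix scan over sorted keys. The sketch below uses an in-memory sorted list in place of the database. The key layout `DIMF@<directory>:<file name>` is inferred from the keys in Examples 1 and 3 and is an assumption, as are all class and function names.

```python
import bisect

class KVMetadataStore:
    """Toy ordered key-value store standing in for the metadata database."""

    def __init__(self):
        self._keys = []      # kept sorted so a prefix scan is a range scan
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan_prefix(self, prefix):
        # Keys sharing a prefix are contiguous in sorted order.
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]

def index_key(path):
    """Hypothetical layout: /storage/file -> DIMF@storage:file."""
    directory, _, name = path.strip("/").rpartition("/")
    return f"DIMF@{directory}:{name}"

def list_directory(store, directory):
    """Return basic file info for every file in `directory`; no BLK lookups."""
    prefix = f"DIMF@{directory.strip('/')}:"
    return [store.get(f"FI@{file_id}") for _, file_id in store.scan_prefix(prefix)]
```

Appending ':' to the prefix keeps a scan of /storage/filegroup1 from also matching /storage/filegroup10; whether the real key format guards against this is not stated in the application.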

Embodiment 4 is directed to a case where, when a data block is damaged, a first type of metadata and a second type of metadata are updated:

When a BLK is damaged, data recovery is needed, and the recovered BLK must be rewritten to disk. If the BLK is recovered on the disk it originally belonged to, it may afterwards no longer be in its original block group, but the number of the disk on which it is located does not change. In this case only the second type of metadata needs to be changed; the first type of metadata and the association between the two types of metadata do not, so no modification in the database is needed and the file information obtained by the client is unaffected. Similarly, if the information of a file is modified, for example the file is renamed, only the first type of metadata needs to be updated and the association between the first type and the second type of metadata does not, which effectively reduces the number of writes to the data disks.
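The two update cases can be illustrated with a toy model in which dictionaries stand in for the system disk (first type of metadata) and the data disk (second type of metadata). The field names, including `block_group`, and the returned write counts are hypothetical; the sketch only shows which store each update touches.

```python
def rename_file(type1, file_id, new_name):
    """Renaming touches only type-1 metadata on the system disk.

    Returns the number of data-disk writes performed (always 0 here).
    """
    type1[file_id]["name"] = new_name
    return 0

def recover_blk_in_place(type2, blk_name, new_block_group):
    """A BLK rebuilt on its own disk keeps its name and disk number.

    Only its type-2 record changes; the file -> BLK association held in the
    type-1 metadata stays valid, so the database is not modified.
    Returns the number of data-disk writes performed.
    """
    type2[blk_name]["block_group"] = new_block_group   # disk number unchanged
    return 1
```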

In summary, the data processing method provided by the present application includes: acquiring a file creation request, where the file creation request includes the size and the name of the target file; determining, according to the size of the target file, the preset erasure ratio, and the preset data block size, the number of objects into which the target file is divided and the number of data blocks contained in each object; creating the data blocks of each object on the selected disks according to the preset erasure ratio and generating file creation information; and generating, according to the file creation information, first-type metadata and second-type metadata of the target file, storing the first-type metadata on a system disk of the file system and the second-type metadata on a data disk of the file system. The method classifies the metadata of a file by access frequency into first-type and second-type metadata, storing the frequently accessed first type on the system disk and the less frequently accessed second type on a data disk. Because the fields of the metadata are not accessed uniformly during data access, metadata can be accessed according to actual need. Storing the two types separately avoids unnecessary metadata access, reduces the amount of data accessed to a certain extent, improves data access efficiency, and reduces the number of accesses to the data disks, relieving pressure on them.

The following describes the apparatus, device, and storage medium for performing the method provided in the present application. For their specific implementation and technical effects, refer to the description above; they are not repeated below.

Fig. 14 is a schematic diagram of a data processing apparatus according to an embodiment of the present application; the functions implemented by the apparatus correspond to the steps executed by the foregoing method. The apparatus may be understood as the electronic device, the server, or a processor of the server, or as a component independent of the server or processor that implements the functions of the present application under the control of the server. As shown in Fig. 14, the apparatus may include: an obtaining module 140, a determining module 141, and a generating module 142;

an obtaining module 140, configured to obtain a file creation request, where the file creation request includes: the size of the target file, the name of the target file;

a determining module 141, configured to determine, according to the size of the target file, a preset erasure ratio, and a preset size of a data block, the number of objects into which the target file is divided and the number of data blocks included in each object;

a generating module 142, configured to create the data blocks of each object on the selected disks according to the preset erasure ratio and to generate file creation information, where the file creation information includes: the file name, file size, file storage path, file identifier, and file creation time, the number of objects contained in the file, and the information of each data block in each object contained in the file; the data blocks in each object include valid data blocks and redundant data blocks, and the data blocks under the same object are distributed on different disks of the selected file system;

the generating module 142 is further configured to generate first-type metadata and second-type metadata of the target file according to the file creation information, store the first-type metadata on a system disk of the file system, and store the second-type metadata on a data disk of the file system; the first type of metadata includes the basic information of the target file and the association between the first type and the second type of metadata; the second type of metadata includes the data block information of the target file; and the access frequency of the first type of metadata is greater than that of the second type of metadata.
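The calculation performed by the determining module 141 can be sketched as a ceiling division. The application does not give the formula; this sketch assumes each object stores N × (data block size) bytes of file data in its N valid blocks and adds M redundant blocks. The function name `object_layout` and the 64MB block size used below are hypothetical.

```python
MB = 1024 * 1024

def object_layout(file_size, n_data, n_parity, blk_size):
    """Return (number of objects, data blocks per object) for an N:M ratio.

    Assumes each object holds n_data * blk_size bytes of file data in its
    n_data valid blocks, plus n_parity redundant blocks.
    """
    if file_size <= 0 or n_data <= 0 or n_parity < 0 or blk_size <= 0:
        raise ValueError("sizes and counts must be positive")
    capacity = n_data * blk_size              # file bytes held by one object
    objects = -(-file_size // capacity)       # ceiling division
    return objects, n_data + n_parity
```

For example, `object_layout(900 * MB, 4, 1, 64 * MB)` returns `(4, 5)`: under these assumptions a 900M file needs four objects of five blocks each.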

Optionally, the generating module 142 is specifically configured to generate basic information of the target file according to the file name, the file size, the file creation time, the file storage path, and the number of objects included in the file, and generate an association relationship between the first type of metadata and the second type of metadata according to the file identifier and information of each data block in each object included in the file, where the association relationship is used to represent a mapping relationship between the file identifier of the target file and each data block in each object included in the file;

and generating data block information of the target file according to the information of each data block in each object contained in the file, and taking the data block information as second-type metadata.
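A minimal sketch of the split performed by the generating module 142, using Python dataclasses as stand-ins for the file creation information. The field names and the choice to keep disk numbers only in the second type of metadata are illustrative assumptions, not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BlockInfo:
    name: str          # BLK name, e.g. "0000-0001-01-1"
    disk: int          # number of the disk holding this BLK
    redundant: bool    # True for a redundant (parity) BLK

@dataclass
class FileCreationInfo:
    file_id: int
    name: str
    size: int
    path: str
    created_at: str
    objects: List[List[BlockInfo]] = field(default_factory=list)

def split_metadata(info):
    """Split file creation info into (type-1, type-2) metadata dicts."""
    type1 = {
        "basic": {                              # frequently accessed
            "name": info.name,
            "size": info.size,
            "path": info.path,
            "created_at": info.created_at,
            "object_count": len(info.objects),
        },
        # association: file ID -> BLK names, grouped per object
        "association": {
            info.file_id: [[blk.name for blk in obj] for obj in info.objects]
        },
    }
    type2 = {                                   # needed only for data access
        blk.name: {"disk": blk.disk, "redundant": blk.redundant}
        for obj in info.objects
        for blk in obj
    }
    return type1, type2
```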

Optionally, the apparatus further comprises: a write module;

the obtaining module 140 is further configured to obtain a file write request for the target file, where the file write request includes: name of the target file, file data of the target file;

a determining module 141, configured to determine whether a target file exists in the created file according to the name of the target file;

the determining module 141 is further configured to determine, if the target file exists, redundant data included in each object of the target file according to the file data of the target file, where the redundant data is used to recover the file data of the target file;

and the writing module is used for writing the redundant data contained in each object of the target file into the redundant data block of each object, writing the file data of the target file into the effective data block of each object respectively, and updating the first type metadata and the second type metadata corresponding to the target file.
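The redundant-data calculation on write can be sketched as below. It assumes one redundant block (M = 1) formed as the bytewise XOR of the valid blocks, with data striped in fixed-size strips and zero-padded to a full stripe; systems with M > 1 would instead use a code such as Reed-Solomon. The name `encode_object` is hypothetical.

```python
from functools import reduce

def encode_object(data, n_data, strip_size):
    """Split one object's data into n_data valid chunks plus one XOR parity chunk.

    Data is striped round-robin in strip_size units and zero-padded to a
    whole number of stripes (valid only for a single redundant block).
    """
    stripe = n_data * strip_size
    padded = data + bytes(-len(data) % stripe)    # pad to a full stripe
    chunks = [bytearray() for _ in range(n_data)]
    for offset in range(0, len(padded), stripe):
        for i in range(n_data):
            start = offset + i * strip_size
            chunks[i] += padded[start:start + strip_size]
    # Redundant block: XOR across the valid blocks, column by column.
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
    return [bytes(c) for c in chunks] + [parity]
```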

Optionally, the apparatus further comprises: a reading module;

the obtaining module 140 is further configured to obtain a file read request for the target file, where the file read request includes: the name of the target file, the size of the target file, read offset information, and read length information;

the determining module 141 is further configured to query the first type of metadata of each file in the system disk according to the name of the target file and the read offset information, and determine information of each data block corresponding to the target file;

Optionally, the determining module 141 is specifically configured to determine the file identifier of the target file according to the name of the target file;

and according to the read offset information, determining the information of the disk to which the target data block corresponding to the target file belongs from the information of each data block corresponding to the target file.

Optionally, the reading module is specifically configured to, according to information of a disk to which a target data block corresponding to the target file belongs, read file data stored in each target data block from the disk to which each target data block belongs, according to the read offset information and the read length information;

and combining to obtain the target file according to the file data stored in each read target data block.

Optionally, the reading module is specifically configured to: if reading of a target disk fails and the data stored in the target data block on that disk is valid data, perform erasure calculation to recover the valid data in that target data block, where the target disk is any one of the disks corresponding to the target data blocks.

Optionally, the reading module is specifically configured to calculate the valid data in the target data block on the target disk from the valid data and the redundant data read from the target data blocks on the disks other than the target disk.

Optionally, the determining module 141 is further configured to determine the number of disks to be selected according to a preset erasure correction ratio;

and to select that number of disks from the disks in the file system according to the disk weight of each disk, where the disk weight of each disk is determined by its capacity and the total capacity of the disks in the system.

The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.

The above modules may be one or more integrated circuits configured to implement the above method, for example one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented by a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented as a system-on-a-chip (SOC).

The modules may be connected or in communication with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, etc., or any combination thereof. The wireless connection may comprise a connection over a LAN, WAN, bluetooth, ZigBee, NFC, or the like, or any combination thereof. Two or more modules may be combined into a single module, and any one module may be divided into two or more units. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application.

Fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device may be a computing device with a data processing function.

The apparatus may include: a processor 801 and a memory 802.

The memory 802 is used for storing programs, and the processor 801 calls the programs stored in the memory 802 to execute the above-mentioned method embodiments. The specific implementation and technical effects are similar, and are not described herein again.

The memory 802 has stored therein program code that, when executed by the processor 801, causes the processor 801 to perform various steps in a method according to various exemplary embodiments of the present application described in the "exemplary methods" section above in this specification.

The Processor 801 may be a general-purpose Processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware components, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.

Memory 802, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 802 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.

Optionally, the present application also provides a program product, such as a computer readable storage medium, comprising a program which, when being executed by a processor, is adapted to carry out the above-mentioned method embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to perform some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

Claims

1. A data processing method is applied to a file system based on erasure storage, and the method comprises the following steps:

acquiring a file creation request, wherein the file creation request comprises: the size of the target file, the name of the target file;

respectively generating first type metadata and second type metadata of the target file according to the file creation information, storing the first type metadata into a system disk of the file system, and storing the second type metadata into a data disk of the file system; the first type of metadata includes: basic information of the target file and an association between the first type metadata and the second type metadata; the second type metadata includes: data block information of the target file; and the access frequency of the first type of metadata is greater than that of the second type of metadata.

2. The method according to claim 1, wherein the generating the first type metadata and the second type metadata of the target file respectively according to the file creation information comprises:

3. The method according to claim 1, wherein after the first type metadata and the second type metadata of the target file are respectively generated according to the file creation information, the first type metadata is stored in a system disk of the file system, and the second type metadata is stored in a data disk of the file system, the method comprises:

4. The method according to claim 3, wherein after writing the redundant data included in each object of the target file into the redundant data block of each object and writing the file data of the target file into the valid data block of each object, respectively, the method comprises:

5. The method of claim 4, wherein the querying the first type of metadata of each file in a system disk according to the name of the target file and the read offset information, and determining information of each data block corresponding to the target file comprises:

determining the file identifier of the target file according to the name of the target file;

6. The method according to claim 4, wherein the reading the target file from the corresponding disk according to the read offset information and the read length information according to the information of each corresponding data block of the target file comprises:

according to the information of the disk to which the target data block corresponding to the target file belongs, and according to the reading offset information and the reading length information, respectively reading the file data stored in each target data block from the disk to which each target data block belongs;

7. The method according to claim 6, wherein the reading the file data stored in each target data block from the disk to which each target data block belongs comprises:

if reading of the target disk fails and the data stored in the target data block on the target disk is valid data, performing erasure correction calculation to recover the valid data in the target data block on the target disk, wherein the target disk is any one of the disks corresponding to the target data blocks.

8. The method of claim 7, wherein performing the erasure correction computation to recover valid data in the target data block on the target disk comprises:

9. The method according to claim 1, wherein before creating a data block of each object on each selected disk according to the preset erasure ratio, the method comprises:

10. The method according to claim 1, wherein the first type of metadata is stored in the form of key-value pairs.

11. A data processing apparatus, applied to a file system based on erasure storage, the apparatus comprising: an acquisition module, a determination module, and a generation module;

the generating module is configured to respectively create a data block of each object on each selected disk according to the preset erasure ratio, and generate file creation information, where the file creation information includes: the file name, file size, file storage path, file identifier, and file creation time of the target file, the number of objects contained in the file, and information of each data block in each object contained in the file, wherein the data blocks in each object comprise valid data blocks and redundant data blocks, and the data blocks under the same object are distributed on different disks of the selected file system;

the generating module is configured to generate first-type metadata and second-type metadata of the target file according to the file creation information, store the first-type metadata in a system disk of the file system, and store the second-type metadata in a data disk of the file system; the first type of metadata includes: basic information of the target file and an association between the first type metadata and the second type metadata; the second type metadata includes: data block information of the target file; and the access frequency of the first type of metadata is greater than that of the second type of metadata.

12. An electronic device, comprising: a processor, a storage medium, and a bus, the storage medium storing program instructions executable by the processor, wherein when the electronic device is running, the processor and the storage medium communicate via the bus, and the processor executes the program instructions to perform the steps of the data processing method according to any one of claims 1 to 10.

13. A computer-readable storage medium, characterized in that a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the data processing method according to any one of claims 1 to 10 are performed.