WO2018077092A1

WO2018077092A1 - Saving method applied to distributed file system, apparatus and distributed file system

Info

Publication number: WO2018077092A1
Application number: PCT/CN2017/106690
Authority: WO
Inventors: 柴军红; 尹丹; 汪雷
Original assignee: 中兴通讯股份有限公司
Priority date: 2016-10-31
Filing date: 2017-10-18
Publication date: 2018-05-03
Also published as: CN108021562A; CN108021562B

Abstract

A saving method applied to a distributed file system, together with a saving apparatus, a system and a computer storage medium. A save snapshot period, a current bitmap and a snapshot bitmap are preset. Change records of metadata are received in real time, and the corresponding current bitmap is updated in real time according to each change record (S11); when it is determined that the save snapshot period has been reached (S13), replacement is performed between the updated current bitmap and the snapshot bitmap to obtain a new current bitmap and a new snapshot bitmap (S15); and saving is performed according to the new snapshot bitmap while new change records of the metadata are recorded using the new current bitmap (S17). In other words, saving at regular intervals lengthens the save period, so that the same record or file block needs to be saved only once within a period, and data is written sequentially in batches in descending order of priority, thereby increasing the degree of aggregation of files, reducing the amount of saved IO data, and ensuring completeness and accessibility of data.

Description

Storage method, device and distributed file system applied to distributed file system

Technical field

The present disclosure relates to computer storage technology, and more particularly to a storage method for a distributed file system, a storage device, a distributed file system having the storage device, and a computer storage medium thereof.

Background

In the era of digital information, the term "big data" is increasingly used to describe the vast amounts of data generated by the information explosion. According to an Internet Data Center (IDC) survey, the data generated worldwide in 2011 amounted to 1.8 ZB (1 ZB = 1024 EB, 1 EB = 1024 PB, 1 PB = 1024 TB, 1 TB = 1024 GB), an increase of more than 1 ZB over the same period in 2010. By 2020, the world will produce 44 times as much data as today, a growth rate equivalent to more than 200 GB of data per person per year worldwide.

With data growing this rapidly, massive data storage technology has become the technical basis that supports the growth. On the one hand, this growth severely tests the storage, computation and extraction of information data; on the other hand, it places more stringent requirements on disaster recovery, backup and archiving of information data. Distributed storage technology emerged in response. Existing research on distributed file systems is mainly divided into metadata storage and actual data storage. Metadata requests account for more than 50% of all requests in a file system, so metadata management has become an important research direction for distributed file systems.

Many current distributed file systems use caching to make metadata access and storage efficient. Any user operation on a data object, such as adding, deleting or renaming, triggers a metadata save operation. When operations are frequent and the changed metadata records are widely scattered, writing the data from the in-memory mirror buffer to the corresponding save file requires locating each record's position by its record number, which produces a large number of random read and write IO operations on the metadata disk. Because the records written for metadata are so widely scattered across the table, this greatly increases the number of internal interactions in the metadata management system and therefore the random read and write IO on the metadata disk; the metadata disk becomes busy, the save takes a long time, and metadata may be lost. Moreover, existing saving methods sort by record number and traverse the data to be saved from the smallest number to the largest each time. When the amount of data to be saved is too large, data positioned near the end may not be saved for several hours, resulting in data loss, which affects the system's access performance and data integrity.

Summary of the invention

The technical problem to be solved by the embodiments of the present invention is to provide a saving method applied to a distributed file system, and a corresponding saving device and system. A save snapshot period is preset, and a current bitmap and a snapshot bitmap corresponding to the file system table are maintained in memory to record whether each record has been modified. Snapshotting through the snapshot bitmap lengthens the save period, so that the same record or file block needs to be saved only once within one period, and data is saved in batches in descending order of priority, thereby increasing the degree of file aggregation, reducing the amount of saved IO data, and ensuring data integrity and accessibility.

In order to solve the above technical problem, an embodiment of the present invention provides a saving method applied to a distributed file system. A save snapshot period is preset, and, for the file system table, a current bitmap representing the change records of metadata in the current save snapshot period and a snapshot bitmap representing the change records of metadata in the previous save snapshot period are created in advance. The saving method includes the steps of:

Receiving a change record of the metadata in real time, and updating the corresponding current bitmap according to the change record;

Determining whether the save snapshot period has currently been reached, and if so, performing replacement between the updated current bitmap and the snapshot bitmap to obtain a new current bitmap and a new snapshot bitmap, saving according to the new snapshot bitmap obtained after the replacement, and starting to record new change records of the metadata with the new current bitmap.

When the saving is performed, it follows the save priority sequence of the data segments corresponding to the new snapshot bitmap.

Calculating the save priority sequence of the data segments includes the following steps:

Calculating the data aggregation degree of each data segment according to the new snapshot bitmap obtained after the replacement;

Determining whether the data aggregation degree of each data segment is greater than or equal to a preset data aggregation degree threshold, thereby obtaining a plurality of first data segments whose data aggregation degree is greater than or equal to the preset data aggregation degree threshold and a plurality of second data segments whose data aggregation degree is less than the preset data aggregation degree threshold;

Arranging the first data segments according to a preset rule, thereby obtaining the corresponding save priority sequence.

According to an exemplary embodiment, the records corresponding to each second data segment are extracted and written into a log file; and each second data segment is checked periodically according to a preset check cycle until its data aggregation degree is greater than or equal to the preset data aggregation degree threshold, whereupon the second data segment is added to the corresponding save priority sequence according to the preset rule.

The preset rule refers to arranging the data segments in descending order of data aggregation degree; and/or two save snapshot periods constitute one check cycle.

Correspondingly, an embodiment of the present invention further provides a disk storage device applied to a distributed file system, including:

a processing module, configured to preset a save snapshot period, and to create in advance, for the file system table, a current bitmap representing the change records of metadata in the current save snapshot period and a snapshot bitmap representing the change records of metadata in the previous save snapshot period;

a data access module, configured to receive, in real time, change records of metadata input by the user;

an update module, configured to update the current bitmap in real time according to the change records of the metadata received by the data access module;

a save module, configured to, when the save snapshot period is reached, perform replacement between the snapshot bitmap and the updated current bitmap to obtain a new current bitmap and a new snapshot bitmap, to save according to the new snapshot bitmap, and at the same time to trigger the update module to update the new current bitmap according to new change records of the metadata.

The save module includes:

a determining unit, configured to determine whether the save snapshot period has currently been reached;

a replacement unit, configured to perform replacement between the updated current bitmap and the snapshot bitmap when the determining unit determines that the save snapshot period has been reached, to obtain a new current bitmap and a new snapshot bitmap;

a priority sorting unit, configured to calculate a save priority sequence of the data segments according to the new snapshot bitmap obtained after the replacement;

a save thread unit, configured to save the corresponding data segments according to the save priority sequence.

The processing module is further configured to preset a data aggregation degree threshold, and the priority sorting unit includes:

a data aggregation degree calculation subunit, configured to calculate the data aggregation degree of each corresponding data segment according to the new snapshot bitmap obtained after the replacement;

a comparison subunit, configured to compare the data aggregation degree of each data segment with the preset data aggregation degree threshold, to obtain a plurality of first data segments whose data aggregation degree is greater than or equal to the preset data aggregation degree threshold and a plurality of second data segments whose data aggregation degree is less than the preset data aggregation degree threshold;

a sorting subunit, configured to arrange, according to the comparison result of the comparison subunit and a preset rule, the first data segments whose data aggregation degree is greater than or equal to the preset data aggregation degree threshold, to obtain the corresponding save priority sequence.

According to an exemplary embodiment, the priority sorting unit further includes:

a record writing subunit, configured to extract the records corresponding to each second data segment and write them into a log file; the comparison subunit is further configured to check each second data segment periodically according to a preset check cycle until the data aggregation degree of the second data segment is greater than or equal to the preset data aggregation degree threshold, whereupon the sorting subunit is triggered to add the second data segment to the corresponding priority queue according to the preset rule.

Based on the foregoing storage device, the embodiment of the present invention further provides a distributed file system, which includes any of the above-mentioned storage devices, and the storage method thereof is the same as the above-described storage method.

An embodiment of the present invention further provides a computer storage medium storing one or more programs executable by a computer; when executed by the computer, the one or more programs cause the computer to execute the above saving method applied to a distributed file system.

Embodiments of the present invention have the following beneficial effects:

The saving method and saving device provided by the embodiments of the present invention preset a save snapshot period and maintain, in memory, a current bitmap and a snapshot bitmap corresponding to the file system table to record whether metadata has been modified. Snapshotting through the snapshot bitmap lengthens the save period, so that the same record or file block needs to be saved only once within one period, and data is written to disk in batches in descending order of priority, thereby increasing the degree of file aggregation, reducing the amount of saved IO data, and ensuring data integrity and accessibility.

Brief description of the drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without any inventive effort.

FIG. 1 is a schematic diagram showing the basic architecture of the distributed file system on which the present invention is based;

FIG. 2 is a flowchart of an embodiment of the saving method applied to a distributed file system according to the present invention;

FIG. 3 is a schematic diagram showing the replacement of the current bitmap and the snapshot bitmap in step S15 of FIG. 2;

FIG. 4 is a flowchart of an embodiment of step S17 of FIG. 2;

FIG. 5 is a sequence diagram of an embodiment of writing a file based on the saving method of FIG. 2;

FIG. 6 is a functional block diagram of an embodiment of a disk storage device applied to a distributed file system according to the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

The present invention is applied to a distributed file system DFS, whose basic architecture is shown in FIG. 1. When a user writes a file through the file access client FAC, that is, when metadata is changed, the full path of the file is first sent to the directory tree server DTS to obtain the globally unique identifier FILEID and the file location register FLR corresponding to the file. Next, the FAC sends a write file request to the FLR to obtain the location information of the data block copies of the file (a file is usually divided into several data blocks of the same size, for example 64M each, called CHUNKs). Finally, the FAC establishes a connection with the data storage server and passes the data blocks to the data storage server, which writes them to disk.

The metadata of the DFS is organized as follows: the directory tree server DTS manages the file namespace, allocates the globally unique identifier FILEID, and allocates the FLR; the file location register FLR manages file attributes (such as FILEID, file size, file type, access rights, uid and gid) and the storage location of the file contents.

Building on the distributed file system described above, the present invention presets a save snapshot period and sets two bitmaps for the file system table, namely a current bitmap and a snapshot bitmap, which respectively represent the change records of the metadata in the current save snapshot period and the change records of the metadata in the previous save snapshot period. Only when the save snapshot period is reached are the two bitmaps replaced with each other, after which saving is performed according to the snapshot bitmap obtained by the replacement. Saving is thus performed at timed intervals, that is, the same file or data block is saved only once within one save snapshot period, which avoids the data loss caused in the existing approach by having to save every time a record changes. In addition, the present invention saves according to a priority sequence, writing in batches in descending order of priority, thereby increasing the degree of file aggregation, reducing the amount of saved IO data, and ensuring data integrity and accessibility.

Embodiment 1

FIG. 2 is a flowchart of an embodiment of the saving method applied to a distributed file system according to the present invention. To avoid performing a save every time a change record occurs, this embodiment presets the save snapshot period so that saving is performed at timed intervals, and two bitmaps, the current bitmap and the snapshot bitmap, are created separately for the file system table. In this embodiment, the saving method includes the following steps:

S11: Receive a change record of the metadata in real time, and update the corresponding current bitmap according to the change record in real time.

In this embodiment, the metadata change record refers to N change records generated when operations such as adding metadata, modifying existing metadata, or deleting existing metadata are performed.

In this embodiment, the save snapshot period is preset so that saving is performed only when a save snapshot period is reached. Saving is thus performed in a timed, batched manner, which avoids the problem of the existing approach, in which a save is required every time a record changes.

In this embodiment, the current bitmap cur_bit represents the change records of the metadata in the current save snapshot period, and the snapshot bitmap snap_bit represents the records changed in the previous save period. In an exemplary embodiment, the size of the current bitmap and of the snapshot bitmap is proportional to the table capacity of the file system table, and 0 and 1 are used to record whether the corresponding record has been modified. That is, when metadata is changed, the current bitmap is traversed and the corresponding positions in it are set in turn according to the change records, as shown in FIG. In addition, a mirror cache must be created for the table when the table is created. When the system is powered on, the mirror cache is initialized to 0, and the corresponding mirror positions are also initialized to 0.
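The bit-marking behaviour described for step S11 can be illustrated with a minimal Python sketch. The class name `ChangeBitmap`, its methods and the table capacity of 64 are hypothetical and chosen only for illustration; the sketch also indexes a record's bit directly by record number rather than traversing the bitmap, which is an assumption and not something the description prescribes.

```python
class ChangeBitmap:
    """One bit per record slot of the file system table (hypothetical helper)."""

    def __init__(self, table_capacity: int) -> None:
        self.capacity = table_capacity
        # All bits start at 0, mirroring the power-on initialization.
        self.bits = bytearray((table_capacity + 7) // 8)

    def mark(self, record_no: int) -> None:
        """Set the bit of a changed record; repeated changes cost nothing extra."""
        if not 0 <= record_no < self.capacity:
            raise IndexError(record_no)
        self.bits[record_no >> 3] |= 1 << (record_no & 7)

    def is_set(self, record_no: int) -> bool:
        return bool(self.bits[record_no >> 3] & (1 << (record_no & 7)))

    def count(self) -> int:
        """Number of records marked as changed."""
        return sum(bin(byte).count("1") for byte in self.bits)


cur_bit = ChangeBitmap(table_capacity=64)
for record_no in (3, 3, 17, 40):  # record 3 changes twice within the same period
    cur_bit.mark(record_no)
```

Because record 3 is marked twice but occupies a single bit, it will be saved once for the period, which is the effect the embodiment relies on.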

S13. Determine whether the save snapshot period is currently reached. If yes, execute step S15. Otherwise, execute step S11.

In this embodiment, the preset save snapshot period is implemented with a timer. When the timer expires, that is, when the save snapshot period is reached, a message such as a pulse signal is issued to trigger the save thread. Whether the save snapshot period has been reached can therefore be determined directly from the message fed back by the timer.

S15. Replace the snapshot bitmap and the updated current bitmap to obtain a new current bitmap and a new snapshot bitmap.

In this embodiment, the current bitmap represents the change records of the metadata in the current save snapshot period, and the snapshot bitmap represents the change records of the metadata in the previous save snapshot period. When the save snapshot period is reached, the current bitmap and the snapshot bitmap can therefore be replaced with each other directly, as shown in FIG. 3. That is, the snapshot bitmap is cleared and becomes the new current bitmap, so that recording of new change records of the metadata restarts immediately, and the current bitmap becomes the new snapshot bitmap, which serves as the basis for saving: until the next save snapshot period is reached, the records of the file system table marked in the new snapshot bitmap are saved.
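The replacement in step S15 can be sketched as follows. This is a minimal illustration that assumes the bitmaps are plain Python lists of 0 and 1; the class `BitmapPair` and the method names `record_change` and `rotate` are hypothetical, and the sketch ignores any locking a real multi-threaded implementation would need.

```python
class BitmapPair:
    """Holds cur_bit and snap_bit for one file system table (hypothetical helper)."""

    def __init__(self, table_capacity: int) -> None:
        self.cur_bit = [0] * table_capacity
        self.snap_bit = [0] * table_capacity

    def record_change(self, record_no: int) -> None:
        self.cur_bit[record_no] = 1

    def rotate(self) -> list:
        """Run when the save snapshot period is reached.

        The old snapshot bitmap is cleared and becomes the new current bitmap;
        the old current bitmap becomes the new snapshot bitmap and is returned
        as the basis for saving.
        """
        old_snapshot = self.snap_bit
        for i in range(len(old_snapshot)):
            old_snapshot[i] = 0
        self.snap_bit = self.cur_bit
        self.cur_bit = old_snapshot
        return self.snap_bit


pair = BitmapPair(8)
pair.record_change(2)
pair.record_change(5)
to_save = pair.rotate()   # bits 2 and 5 are now frozen for saving
pair.record_change(7)     # new changes land in the fresh current bitmap
```

Swapping references instead of copying keeps the replacement cheap, so recording of new changes can resume immediately while the save proceeds from the frozen snapshot bitmap.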

S17: Save according to the new snapshot bitmap obtained after the replacement while recording new change records of the metadata with the new current bitmap, and execute step S13.

In order to reduce the amount of saved IO data, in this embodiment data segments are saved in order of priority, that is, the amount of saved data is reduced by increasing the degree of file aggregation. As shown in FIG. 4, step S17 includes the steps of:

S171. Calculate a data aggregation degree of each data segment according to a new snapshot bitmap obtained after the replacement.

In this embodiment, when data is saved according to the snapshot bitmap, it is saved in units of 16K each time; such a unit is called a data segment DATA. The ratio of the data segment size to the file system table record length TupleLen is the maximum number of records MaxTupleNumber that the segment can store. The data aggregation degree of a data segment is the ratio of the number of changed records in the segment (that is, bitmap bits set to 1) to MaxTupleNumber, multiplied by 100. The data aggregation degree DP of the data segment serves as the parameter for prioritizing the data segments.
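As a worked illustration of the data aggregation degree, the following Python sketch computes DP = (changed records in the segment / MaxTupleNumber) × 100 for each data segment. The function names, the use of integer division for MaxTupleNumber, the interpretation of 16K as 16 × 1024 bytes, and the record length of 4096 bytes in the example are all assumptions made for illustration.

```python
SEGMENT_SIZE = 16 * 1024  # one data segment DATA, assuming 16K means 16 * 1024 bytes


def max_tuple_number(tuple_len: int) -> int:
    """Maximum number of records one data segment can hold (assumed integer division)."""
    return SEGMENT_SIZE // tuple_len


def aggregation_degrees(snap_bit: list, tuple_len: int) -> list:
    """Return the data aggregation degree DP of each data segment.

    snap_bit holds one 0/1 entry per record; consecutive runs of
    MaxTupleNumber entries belong to the same data segment.
    """
    per_segment = max_tuple_number(tuple_len)
    degrees = []
    for start in range(0, len(snap_bit), per_segment):
        changed = sum(snap_bit[start:start + per_segment])
        degrees.append(changed / per_segment * 100)
    return degrees


# With TupleLen = 4096 bytes, each segment holds 4 records.
snap_bit = [1, 1, 1, 0,   0, 1, 0, 0,   0, 0, 0, 0]
degrees = aggregation_degrees(snap_bit, tuple_len=4096)
```

In this example the three segments have 3, 1 and 0 changed records out of 4, giving aggregation degrees of 75, 25 and 0.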

S173: Compare the data aggregation degree of each data segment with the preset data aggregation degree threshold to obtain a plurality of first data segments whose data aggregation degree is greater than or equal to the preset data aggregation degree threshold and a plurality of second data segments whose data aggregation degree is less than the preset data aggregation degree threshold.

S175: Arrange the first data segments according to a preset rule to obtain the corresponding save priority sequence, and execute step S179.

In this embodiment, the data aggregation degree threshold is set in advance, for example to 30. When the data aggregation degrees of a plurality of data segments are determined to be greater than or equal to the preset threshold, these first data segments are arranged in a certain order to obtain a priority queue of the first data segments.

In this embodiment, the preset rule refers to arranging the first data segments, whose data aggregation degree is greater than or equal to the preset threshold, in descending order of data aggregation degree. Of course, it is also conceivable to arrange them in ascending order or according to other rules.
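Steps S173 and S175 can be sketched in Python as a partition followed by a descending sort. The function name `build_save_priority_sequence` is hypothetical, the threshold default of 30 is the example value given above, and skipping segments with no changed records at all is an assumption of this sketch rather than something the description states.

```python
def build_save_priority_sequence(degrees: list, threshold: float = 30):
    """Split segments by aggregation degree and order the first data segments.

    Returns (priority_sequence, second_segments), both as lists of segment
    numbers. Segments whose degree is 0 are skipped (assumption: nothing to save).
    """
    first, second = [], []
    for segment_no, dp in enumerate(degrees):
        if dp == 0:
            continue
        if dp >= threshold:
            first.append((segment_no, dp))
        else:
            second.append(segment_no)
    # Preset rule of this embodiment: descending order of aggregation degree.
    first.sort(key=lambda item: item[1], reverse=True)
    return [segment_no for segment_no, _ in first], second


priority_sequence, second_segments = build_save_priority_sequence(
    [75.0, 25.0, 0.0, 50.0, 30.0]
)
```

Here segments 0, 3 and 4 meet the threshold and are ordered by degree (75, 50, 30), while segment 1 falls below it and is left for the deferred handling of step S177.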

S177. Extract the records corresponding to the second data segments and write them into the log file, and check each second data segment periodically according to a preset check cycle until its data aggregation degree is greater than or equal to the preset data aggregation degree threshold, whereupon the second data segment is added to the corresponding save priority sequence according to the preset rule, and execute step S179.

A second data segment whose data aggregation degree was less than the preset threshold in the previous save snapshot period may reach or exceed the preset threshold after a further save snapshot period. In this embodiment, the second data segments are checked once every two save snapshot periods to determine whether their data aggregation degree has reached the preset threshold, that is, every two save snapshot periods are set as one check cycle. Of course, three or more save snapshot periods can also be set as one check cycle according to actual conditions.

In this embodiment, after a check cycle has elapsed, a check may find that the data aggregation degree of a second data segment is now greater than or equal to the preset data aggregation degree threshold. That is, a data segment classified as a second data segment in the original save snapshot period has increased in data aggregation degree over time and is classified as a first data segment in the second save snapshot period. The segment is therefore added directly, according to its data aggregation degree, to the corresponding save priority sequence (here, the save priority sequence constructed from the data aggregation degrees of the first data segments in the second save snapshot period).
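The deferred handling of second data segments can be sketched as below. This is only an illustration under stated assumptions: the class `DeferredSegments` and its methods are hypothetical, the caller is assumed to supply the latest aggregation degree of each segment at every save snapshot period, and writing the extracted records to the log file is omitted.

```python
class DeferredSegments:
    """Tracks second data segments until they reach the threshold (hypothetical)."""

    def __init__(self, threshold: float = 30, periods_per_check: int = 2) -> None:
        self.threshold = threshold
        self.periods_per_check = periods_per_check  # two save periods = one check cycle
        self.pending = {}       # segment number -> latest aggregation degree
        self.periods_seen = 0

    def defer(self, segment_no: int, degree: float) -> None:
        self.pending[segment_no] = degree

    def on_save_period(self, latest_degrees: dict) -> list:
        """Call once per save snapshot period.

        Returns the segments promoted to the save priority sequence, in
        descending order of aggregation degree; empty unless a check cycle
        has just elapsed.
        """
        self.periods_seen += 1
        for segment_no in self.pending:
            self.pending[segment_no] = latest_degrees.get(
                segment_no, self.pending[segment_no]
            )
        if self.periods_seen % self.periods_per_check != 0:
            return []
        promoted = sorted(
            (s for s, dp in self.pending.items() if dp >= self.threshold),
            key=lambda s: self.pending[s],
            reverse=True,
        )
        for segment_no in promoted:
            del self.pending[segment_no]
        return promoted


deferred = DeferredSegments()
deferred.defer(1, 25.0)
deferred.defer(6, 10.0)
after_first_period = deferred.on_save_period({1: 35.0, 6: 10.0})
after_second_period = deferred.on_save_period({1: 40.0, 6: 20.0})
```

Segment 1 already exceeds the threshold after the first period but is only promoted once the two-period check cycle has elapsed; segment 6 stays deferred.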

S179: Save the disk according to the save priority sequence, and execute step S11.

In this embodiment, since a new current bitmap (that is, the cleared snapshot bitmap) is obtained after the replacement, change records of the metadata continue to be recorded in real time through the new current bitmap while saving is in progress. When the next save snapshot period arrives, the new current bitmap is in turn replaced and another new current bitmap is obtained, and the cycle repeats.

In this embodiment, the change records of the metadata are recorded in the current bitmap. When the save snapshot period is reached, the current bitmap immediately replaces the snapshot bitmap, and a new current bitmap is obtained to record new change records of the metadata; meanwhile, until the next save snapshot period arrives, saving proceeds directly according to the new snapshot bitmap obtained by the replacement. Taking a snapshot of the current bitmap thus lengthens the save period, so that the same record or file block needs to be saved only once within one period, and data is written to disk in batches in descending order of priority, thereby increasing the degree of file aggregation, reducing the amount of saved IO data, and ensuring data integrity and accessibility.

Embodiment 2

As can be seen from the above embodiments, the modification of the metadata includes an addition, such as writing a file/metadata. Therefore, the method of saving the file when the file is written will be described in detail below with reference to the drawings and the exemplary embodiments.

FIG. 5 is a sequence diagram of an embodiment of writing a file based on the saving method of Embodiment 1. In this embodiment, writing a file in the distributed file system includes the following steps:

S21. The file access client FAC sends a write file request to the directory tree server DTS.

In this embodiment, the user sends a write file request to the DTS through the FAC, and the write file request carries the full path of the file object to be written.

S22: The DTS determines whether the file exists. If not, the DTS generates a new file identifier FILEID, allocates an available FLR to it, generates a dictionary table record to store the file name, generates a file FILEID record to store the FILEID, FLRID and other information, and then returns a success message to the file access client FAC; if the file already exists, the DTS returns an error to the FAC.

In this embodiment, the DTS searches the namespace to determine whether the file exists.

S23. After receiving the message, the FAC sends a create file message to the corresponding file location register FLR.

S24. The FLR determines whether the file already exists. If so, it reports that the file already exists. If not, it creates a FILE record storing the FILEID, the creation time and other information, and returns a create file success response to the FAC.

In this embodiment, the FLR traverses the file by FILEID to determine whether it exists.

S25: The FAC receives the create file response and sends a file block request to the FLR using the FILEID.

S26: The FLR selects the destination disk of the file block according to the storage rule, generates the corresponding file block record, and returns to the FAC the disk information for creating the file block.

S27. The FAC creates a file block on the FAS according to the returned disk information, and writes the file content.

S28: The FAS writes the file in the timed batch manner and, after writing, returns the write result and the file block size information to the FAC.

In this embodiment, the FAS writing the file in the timed batch manner means that, as in Embodiment 1, the file content is received in real time and the corresponding preset current bitmap is updated in real time; the current bitmap and the snapshot bitmap are then periodically replaced with each other, and saving is performed according to the snapshot bitmap obtained after the replacement until the entire file content has been written. That is, the file content is written periodically and in batches, and the writing follows the priority sequence of the data segments.

S29: The FAC reports the write result and the file block size information to the FLR.

S210: The FLR records the reported content in the file block record and replies to the FAC.

In this embodiment, when the FAC receives the reply returned by the FLR, it indicates that the writing of the file is completed, and the user is sent a file completion response.

Embodiment 3

Corresponding to the above-described storage method, the present invention also provides a distributed file system, which will be described in detail below with reference to the accompanying drawings and exemplary embodiments.

Referring to FIG. 6, a disk storage device for a distributed file system according to the present invention, the disk storage device includes:

The processing module 61 is configured to preset a save snapshot period, and pre-create a current bitmap for indicating a change record of the metadata in the current save snapshot period for the file system table, and used to represent the metadata in the last save snapshot period. Snapshot bitmap of the change record;

The data access module 62 is configured to receive, in real time, change records of the metadata input by the user;

The update module 63 is configured to update the current bitmap in real time according to the change record received by the data access module 62. In this embodiment, the current bitmap refers to the preset current bitmap in the initial state after the system is powered on. Or, after the system is powered on, the new current bitmap obtained after the replacement;

The save disk module 64 is configured to replace the snapshot bitmap representing the change record of the last save cycle metadata with the current bitmap updated by the update module 63 in real time to obtain a new current bitmap when the save snapshot period arrives. And the snapshot bitmap is saved according to the new snapshot bitmap obtained after the replacement; at the same time, the trigger update module updates the new current bitmap according to the new change record of the metadata.

Referring to FIG. 6, in the embodiment, the save module 64 is saved according to the save priority sequence of each data segment when the save module 64 is saved. The save module 64 includes:

The determining unit 641 is configured to determine whether the save snapshot period is currently reached. In an exemplary embodiment, a timer is set by the processing module 61 to perform timing, so that when the timing reaches a preset duration, a trigger signal is sent to the The save module 64 triggers a save operation or the like. Therefore, whether the save snapshot period is reached can be determined by determining whether the trigger signal sent by the processing module 61 is received.

The replacement unit 642 is configured to replace the updated current bitmap with the snapshot bitmap when the determining unit 641 determines that the save snapshot period is currently reached, to obtain a new current bitmap and a new snapshot bitmap. In this embodiment, After the save snapshot period is reached, that is, the determination unit 641 receives the trigger signal sent by the processing module 61, the determining unit 641 sends a trigger signal to the replacement unit 642, so that the replacement unit 642 will indicate the current save snapshot period. The current bitmap of the change record of the metadata and the snapshot bitmap representing the change record of the metadata in the previous save snapshot period (or preset in the initial state when the system is powered on) are replaced, thereby obtaining a new current bitmap and The new snapshot bitmap, as shown in Figure 3, will empty the original snapshot bitmap as the new current bitmap, and use the original current bitmap as the new snapshot bitmap;

The priority sorting unit 644 is configured to calculate a save priority sequence of the data segments according to the new snapshot bitmap obtained after the replacement. In this embodiment, the priority sorting unit 644 includes:

a data aggregation degree calculation subunit, configured to calculate the data aggregation degree of each corresponding data segment according to the new snapshot bitmap obtained after the replacement;

a comparison subunit, configured to compare the data aggregation degree of each data segment with the data aggregation degree threshold preset by the processing module 61, so as to obtain a plurality of first data segments whose data aggregation degree is greater than or equal to the preset data aggregation degree threshold and a plurality of second data segments whose data aggregation degree is less than the preset data aggregation degree threshold; and further configured to check each second data segment at the check cycle preset by the processing module 61 until the data aggregation degree of the second data segment is greater than or equal to the preset data aggregation degree threshold, and then to generate a trigger signal that triggers the sorting subunit described below to place the second data segment in the corresponding priority queue according to a preset rule;

a sorting subunit, configured to arrange the first data segments according to the preset rule, based on the comparison result of the comparison subunit, to obtain the corresponding save priority sequence; and

a write-record subunit, configured to extract, based on the comparison result of the comparison subunit, the records corresponding to each second data segment and write them into a log file;

The save thread unit 643 is configured to save the corresponding data segments according to the calculated save priority sequence.

In this embodiment, the preset rule refers to arranging the first data segments, whose data aggregation degree is greater than or equal to the preset threshold, in descending order of data aggregation degree; of course, arranging them in ascending order or according to other rules is also possible.
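The following sketch shows one way the save priority sequence could be derived from a snapshot bitmap. It assumes, purely for illustration, that the data aggregation degree of a segment is the fraction of its records marked as changed, that segments have a fixed size, and that segments without any change are skipped; the function names and parameters are hypothetical and are not taken from this passage.

```python
from typing import List, Tuple


def aggregation_degree(snapshot: List[int], start: int, end: int) -> float:
    """Assumed metric: fraction of changed records within segment [start, end)."""
    return sum(snapshot[start:end]) / (end - start)


def build_save_sequence(
    snapshot: List[int], segment_size: int, threshold: float
) -> Tuple[List[int], List[int]]:
    """Return (first segments in save priority order, deferred second segments)."""
    first: List[Tuple[float, int]] = []
    second: List[int] = []
    num_segments = (len(snapshot) + segment_size - 1) // segment_size
    for seg in range(num_segments):
        start = seg * segment_size
        end = min(start + segment_size, len(snapshot))
        degree = aggregation_degree(snapshot, start, end)
        if degree == 0:
            continue  # assumption: an unchanged segment needs no save and no log entry
        if degree >= threshold:
            first.append((degree, seg))  # first data segment: saved in this period
        else:
            second.append(seg)  # second data segment: logged and re-checked later
    # Preset rule of this embodiment: descending data aggregation degree.
    first.sort(key=lambda item: -item[0])
    return [seg for _, seg in first], second


snapshot_bitmap = [1, 1, 1, 1,  1, 0, 0, 0,  1, 1, 1, 0,  0, 0, 0, 0]
save_order, deferred = build_save_sequence(snapshot_bitmap, segment_size=4, threshold=0.5)
```

With these inputs, segments 0 and 2 meet the threshold and are ordered by degree, segment 1 is deferred, and segment 3 is untouched. In the embodiment, the records of a deferred segment would be written to the log file and the segment re-checked at the preset check cycle.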

In this embodiment, the processing module sets up a current bitmap to record the change record of the metadata in the current period and a snapshot bitmap to record the change record of the metadata in the last save snapshot period. Once the save snapshot period is reached, the current bitmap is immediately exchanged with the snapshot bitmap, yielding a new current bitmap that records new change records of the metadata; at the same time, a batch save can be performed directly from the new snapshot bitmap obtained by the replacement, before the next save snapshot period arrives. By taking a snapshot of the current bitmap, the save cycle is extended, so that the same record or file block only needs to be saved once within one cycle. The save is written sequentially in batch mode, in order of priority from high to low, which increases the degree of file aggregation, reduces the amount of data written to disk, and ensures data integrity and accessibility.
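The reduction in writes can be seen in a small sketch. The helper `records_to_save` and the sample change log are hypothetical and only illustrate that repeated changes to one record within a period collapse into a single set bit.

```python
from typing import List


def records_to_save(change_log: List[int], num_records: int) -> List[int]:
    """Fold a period's change records into a bitmap and list the records to save."""
    current = [0] * num_records
    for record_index in change_log:
        current[record_index] = 1  # setting an already-set bit changes nothing
    return [i for i, bit in enumerate(current) if bit]


# Six change records arrive during one save snapshot period, touching three records.
changes = [3, 3, 7, 3, 7, 1]
to_save = records_to_save(changes, num_records=8)
```

Here six change records result in three record saves, and the saves come out in ascending record order, which suits sequential writing.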

Embodiment 4

The present invention also provides a distributed file system based on the storage method and the disk storage device applied to the distributed file system described above. The distributed file system includes the disk storage device of Embodiment 3; its save method and principles are the same as those in Embodiments 1, 2 and 3 above and are not described again here.

The above is only the preferred embodiment of the present invention, and the scope of the present invention is not limited thereto. Those skilled in the art can understand all or part of the process of implementing the above embodiments, and equivalent changes made according to the claims of the present invention still fall within the scope of the invention.

Industrial applicability

The technical solution provided by the embodiment of the present invention can be applied to the technical field of computer storage. In the save method and the save device provided by the technical solution of the embodiment of the present invention, a save snapshot period is preset, and a current bitmap and a snapshot bitmap corresponding to the file system table are maintained in memory to determine whether the metadata has been modified. By taking a snapshot of the current bitmap, the save cycle is extended, so that the same record or file block only needs to be saved once within one cycle. When saving, the data is written to disk in batch mode in order of priority from high to low, thereby increasing the degree of file aggregation, reducing the amount of data written to disk, and ensuring data integrity and accessibility.

Claims

A storage method for a distributed file system, wherein a save snapshot period is preset, and a current bitmap for indicating the change record of metadata in the current save snapshot period and a snapshot bitmap for indicating the change record of the metadata in the last save snapshot period are pre-created for the file system table, the save method comprising the steps of:

Receiving a change record of the metadata in real time, and updating the corresponding current bitmap according to the change record;

Determining whether the save snapshot period is currently reached, and if so, exchanging the snapshot bitmap and the updated current bitmap to obtain a new current bitmap and a new snapshot bitmap, saving to disk according to the new snapshot bitmap, and at the same time restarting the recording of new change records of the metadata with the new current bitmap.
The save method according to claim 1, wherein the saving is performed according to a save priority sequence of the data segments corresponding to the new snapshot bitmap.
The save method according to claim 2, wherein calculating the save priority sequence of the data segments comprises the steps of:

Calculating the data aggregation degree of each data segment according to the new snapshot bitmap obtained after the replacement;

Comparing the data aggregation degree of each data segment with a preset data aggregation degree threshold, to obtain a plurality of first data segments whose data aggregation degree is greater than or equal to the preset data aggregation degree threshold and a plurality of second data segments whose data aggregation degree is less than the preset data aggregation degree threshold;

Each of the first data segments is arranged according to a preset rule, thereby obtaining a corresponding save priority sequence.
The save method according to claim 3, wherein calculating the save priority sequence of the data segments further comprises the steps of:

Extracting the records corresponding to each second data segment and writing them into a log file; and checking each second data segment at a preset check cycle until the data aggregation degree of the second data segment is greater than or equal to the preset data aggregation degree threshold, whereupon the second data segment is placed in the corresponding save priority sequence according to the preset rule.
The save method according to claim 3 or 4, wherein the preset rule refers to arranging the data segments in descending order of data aggregation degree; and/or two save snapshot periods constitute one check cycle.
A disk storage device for a distributed file system, comprising:

a processing module, configured to preset a save snapshot period, and to pre-create, for the file system table, a current bitmap for indicating the change record of metadata in the current save snapshot period and a snapshot bitmap for indicating the change record of metadata in the last save snapshot period;

a data access module, configured to receive a change record of metadata input by the user in real time;

an update module, configured to update the current bitmap in real time according to the change record of the metadata received by the data access module;

a save module, configured to: when the save snapshot period arrives, exchange the snapshot bitmap and the updated current bitmap to obtain a new current bitmap and a new snapshot bitmap, save according to the new snapshot bitmap, and trigger the update module to update the new current bitmap according to new change records of the metadata.
The disk storage device of claim 6, wherein the save module comprises:

a determining unit, configured to determine whether the save snapshot period is currently reached;

a replacement unit, configured to exchange the updated current bitmap and the snapshot bitmap to obtain a new current bitmap and a new snapshot bitmap when the determining unit determines that the save snapshot period is currently reached;

a priority sorting unit, configured to calculate a save priority sequence of each data segment according to a new snapshot bitmap obtained after the replacement;

a save thread unit, configured to save the corresponding data segments according to the save priority sequence.
The disk storage device of claim 7, wherein the processing module is further configured to preset a data aggregation degree threshold, and the priority sorting unit comprises:

a data aggregation degree calculation subunit, configured to calculate a data aggregation degree of each corresponding data segment according to the new snapshot bitmap obtained after the replacement;

a comparison subunit, configured to compare the data aggregation degree of each data segment with the preset data aggregation degree threshold, to obtain a plurality of first data segments whose data aggregation degree is greater than or equal to the preset data aggregation degree threshold and a plurality of second data segments whose data aggregation degree is less than the preset data aggregation degree threshold;

a sorting subunit, configured to arrange the first data segments according to a preset rule, based on the comparison result of the comparison subunit, to obtain a corresponding save priority sequence.
The disk storage device of claim 8, wherein the priority sorting unit further comprises:

a write-record subunit, configured to extract, based on the comparison result of the comparison subunit, the records corresponding to each of the second data segments and write them into a log file;

wherein the comparison subunit is further configured to check each of the second data segments at a preset check cycle until the data aggregation degree of the second data segment is greater than or equal to the preset data aggregation degree threshold, and then to trigger the sorting subunit to place the second data segment in the corresponding priority queue according to the preset rule.
A distributed file system comprising the disk storage device according to any one of claims 6 to 9.
A computer storage medium having stored therein one or more programs executable by a computer, the one or more programs, when executed by the computer, causing the computer to perform the storage method for a distributed file system according to any one of claims 1 to 5.