WO2019149261A1

WO2019149261A1 - File storage method for distributed file system and distributed file system

Info

Publication number: WO2019149261A1
Application number: PCT/CN2019/074332
Authority: WO
Inventors: 李凯; 林健
Original assignee: 中兴通讯股份有限公司
Priority date: 2018-02-01
Filing date: 2019-02-01
Publication date: 2019-08-08
Also published as: CN110109886A; CN110109886B

Abstract

Disclosed in the present invention are a file storage method for a distributed file system, the distributed file system, and a computer-readable storage medium, the method comprising: dividing a single disk into a plurality of storage areas; acquiring file data to be written; acquiring a corresponding storage area according to a current policy type; and writing the file data to be written into the corresponding storage area. According to the present invention, files are stored according to the current policy as far as possible during the storage phase, so that the generation of small disk fragments is reduced, the aggregation of file storage is implemented, the management and allocation of disk space are optimized, and the read-write performance of the disk is improved.

Description

File storage method of distributed file system and distributed file system

The present application claims priority to Chinese Patent Application No. CN201810103081.6, filed on February 1, 2018.

Technical field

The present invention relates to the field of cloud storage, and in particular, to a file storage method of a distributed file system and a distributed file system.

Background technique

Cloud storage is a system that integrates a large number of different types of storage devices in a network through cluster applications, grid technologies, distributed file systems, etc., and provides data storage and service access functions externally. Cloud storage systems have good scalability, fault tolerance, and internal implementations that are transparent to users. The distributed file system shields the differences between the underlying file systems, provides a unified access interface and resource management, and provides powerful support for cloud storage.

When a file is stored, it is cut into many fragments that are stored on the disk. The more fragments there are, the more often the disk's head arm has to seek back and forth, and the lower the efficiency of reading and writing files. The longer a disk is used, the more fragmentation it accumulates, which seriously degrades its read and write performance. In addition, as usage time increases, more and more small pieces of disk space accumulate that cannot be allocated effectively, wasting disk storage space.

Therefore, how to store files so as to improve read and write performance remains an urgent problem to be solved.

Summary of the invention

In view of this, an object of the present invention is to provide a file storage method for a distributed file system, a distributed file system, and a computer-readable storage medium, so as to optimize the management and allocation of disk space and improve disk read and write performance.

The technical solution adopted by the present invention to solve the above technical problems is as follows:

According to an aspect of the present invention, a file storage method of a distributed file system is provided, including:

Obtaining file data to be written;

Obtaining a corresponding storage area according to the current policy type; the policy type includes a file policy according to a file size to be written, a directory policy according to a directory to which the file to be written belongs, or a user policy according to a user to which the file to be written belongs;

Writing the file data to be written to the corresponding storage area.

In a possible design, when the current policy type is a file policy and the file data to be written is new data that has not been stored, the obtaining the corresponding storage area according to the current policy type includes:

Determining whether there is a storage area to which no file data has been written; if not, then

Obtaining a corresponding storage area according to the file size of the file data to be written.

In a possible design, the obtaining the corresponding storage area according to the file size of the file data to be written includes:

Determining whether the file size of the file data to be written is greater than a preset threshold;

If yes, determining that the file data to be written is a large file, and using a storage area in which large files have been stored as the storage area corresponding to the file data to be written;

If not, determining that the file data to be written is a small file, and using a storage area in which small files have been stored as the storage area corresponding to the file data to be written.

In a possible design, when the file data to be written is a file fragment of a file that has been written, the obtaining the corresponding storage area according to the current policy type includes:

Determining whether the storage area in which the file has been written is full; if not, then

The storage area in which the file has been written is used as a storage area corresponding to the file data to be written.

In a possible design, if the current policy type is a directory policy, the obtaining the corresponding storage area according to the current policy type includes:

Obtaining a directory name of the file data to be written;

Determining whether, among the storage areas in the using state, there is a storage area whose identifier list includes the directory name, and if so,

Using the storage area whose identifier list includes the directory name as the storage area corresponding to the file data to be written.

In a possible design, if the current policy type is a user policy, the obtaining the corresponding storage area according to the current policy type includes:

Obtaining a username of the file data to be written;

Determining whether, among the storage areas in the using state, there is a storage area whose identifier list includes the user name, and if so,

Using the storage area whose identifier list includes the user name as the storage area corresponding to the file data to be written.

In a possible design, after the file data is written to the storage area and the attribute information of the storage area is updated in a database, the method includes:

Obtaining the storage areas whose state is using;

Determining whether the storage space used by a storage area whose state is using is smaller than an aggregation threshold;

If yes, recording that storage area as a sub-aggregation storage area;

Aggregating at least two of the sub-aggregation storage areas into one of them to obtain an aggregate storage area.

In a possible design, after the at least two sub-aggregation storage areas are aggregated into one of them to obtain the aggregate storage area, the method includes:

Updating the file list information, directory identifier, or user identifier of the aggregate storage area.

According to another aspect of the present invention, a distributed file system is provided, comprising: a memory, a processor, and a computer program stored on the memory and operable on the processor, where the steps of the file storage method of the distributed file system provided by the embodiments of the present invention are implemented when the computer program is executed by the processor.

According to another aspect of the present invention, a computer-readable storage medium is provided, on which a file storage program is stored, where the steps of the file storage method of the distributed file system provided by the embodiments of the present invention are implemented when the file storage program is executed by a processor.

The file storage method of the distributed file system, the distributed file system, and the computer-readable storage medium in the embodiments of the present invention store files according to the current policy as far as possible during the storage stage, thereby reducing the generation of small disk fragments, realizing the aggregation of file storage, optimizing the management and allocation of disk space, and improving disk read and write performance.

DRAWINGS

FIG. 1 is a schematic flowchart of a file storage method of a distributed file system according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;

FIG. 3 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;

FIG. 4 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;

FIG. 5 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;

FIG. 6 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;

FIG. 7 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;

FIG. 8 is a schematic diagram of an aggregation storage area according to an embodiment of the present invention.

The implementation, functional features, and advantages of the present invention will be further described in conjunction with the embodiments.

Detailed description

The present invention will be further described in detail below with reference to the accompanying drawings and embodiments, in order to make the present invention clearer. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in FIG. 1 , the present invention provides a file storage method for a distributed file system, including the following steps:

101, start.

Divide a single disk into multiple storage areas. More specifically, the disk is divided into a plurality of storage areas for storing file data, which together serve as a storage pool. Generally, the storage space size of a storage area is configurable, and the default is 1G. Thus, a current mainstream single disk will be divided into thousands of storage areas.

102. Acquire file data to be written.

103. Acquire a corresponding storage area according to the current policy type.

Current policy types can include file policies based on file size, directory policies based on directory names, or user policies based on user names. The file policy means that files are classified and stored in the storage pool according to the size; the directory policy means that all files in the same directory are stored in the same storage area; the user policy means that all files of the same user are stored in the same storage area.

104. Write the to-be-written file data into the corresponding storage area.

105, the end.
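
The division of a disk into storage areas described above can be illustrated with a minimal Python sketch. The function name `divide_disk`, the reading of the "1G" default as 1 GiB, the numbering of areas from 1, and the 4 TiB disk size are all assumptions made for illustration and are not taken from the embodiment.

```python
GIB = 1 << 30  # assumption: the "1G" default area size is read as 1 GiB


def divide_disk(disk_bytes: int, area_bytes: int = GIB) -> list[tuple[int, int]]:
    """Return (serial_number, byte_offset) for every whole storage area."""
    count = disk_bytes // area_bytes  # a trailing partial area is ignored in this sketch
    return [(i + 1, i * area_bytes) for i in range(count)]


pool = divide_disk(4 * 1024 * GIB)  # a hypothetical 4 TiB disk
print(len(pool))  # 4096 storage areas
```

With these assumed sizes a single disk yields 4096 storage areas, which matches the order of magnitude ("thousands") stated above.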

Generally, after the file data is written to the storage area, the attribute information of the storage area also needs to be updated in a database.

The attribute information of the storage area in the present invention may include, but is not limited to, a storage area serial number, a storage area address, a storage area write pointer, a written file list, a storage area size file identifier, a storage area policy type identifier, a directory identifier of the storage area, a user identifier of the storage area, and a storage area status identifier.

The storage area serial number is used to identify and distinguish different storage areas. The storage area address is used to record the disk location where this storage area is located. The storage area write pointer is used to record the current write location of this storage area. The written file list is used to record the list of files written to this storage area. The storage area size file identifier is used to identify whether the storage area stores large files or small files. The storage area policy type identifier is used to identify the type of policy used by this storage area. The directory identifier of the storage area is used to identify the directory to which the storage area belongs, and is used under the directory policy. The user identifier of the storage area is used to identify the user to which the storage area belongs, and is used under the user policy. The storage area status identifier is used to identify the storage status of the storage area.

The storage area states are of three types: free, using, and full. The free state means that the storage area is empty and no data has been written yet; the using state means that data has been written to the storage area and it is in use; the full state means that the storage area is full and can no longer be requested for writing data.

Generally, the attributes of the storage area are stored in the database; the database records are updated when the storage area attributes change, and the attribute information is obtained from the database when data is read from or written to the storage area.
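
The attribute record and the three states described above might be modelled as in the following Python sketch. The class and field names (`StorageArea`, `record_write`, and so on) are hypothetical, the database is replaced by an in-memory object, and the rule that an area becomes full exactly when no space remains is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    FREE = "free"
    USING = "using"
    FULL = "full"


@dataclass
class StorageArea:
    serial: int                        # storage area serial number
    address: int                       # disk location of the area
    size: int                          # capacity in bytes
    write_pointer: int = 0             # current write position
    files: list[str] = field(default_factory=list)   # written file list
    size_class: Optional[str] = None   # "large" or "small" (file policy)
    policy: Optional[str] = None       # "file", "directory" or "user"
    directories: set[str] = field(default_factory=set)
    users: set[str] = field(default_factory=set)
    state: State = State.FREE

    def remaining(self) -> int:
        return self.size - self.write_pointer

    def record_write(self, file_name: str, nbytes: int) -> None:
        """Advance the write pointer and update the file list and state."""
        if nbytes > self.remaining():
            raise ValueError("not enough space in storage area")
        self.write_pointer += nbytes
        if file_name not in self.files:
            self.files.append(file_name)
        self.state = State.FULL if self.remaining() == 0 else State.USING
```

In a real system each call to `record_write` would correspond to an update of the database record for the storage area.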

On the basis of the embodiment corresponding to FIG. 1, when the current policy type is a file policy, step 103, that is, obtaining the corresponding storage area according to the current policy type, includes the following steps, as shown in FIG. 2:

201, start.

Because large files and small files are stored in different storage areas under the file policy, two queues can first be created for the storage areas in the storage pool: a large-file storage area queue and a small-file storage area queue, used for writing large files and small files respectively.

202. Query whether the file data to be written is already stored; if yes, go to step 203; if no, go to step 205.

203. Determine whether the storage space of the storage area that is already stored is sufficient; if yes, go to step 204; if no, go to step 205.

204. The storage area that has been stored is used as a corresponding storage area of the file data to be written; and the process proceeds to step 210.

For example, a disk (a storage pool) contains 1000 storage areas. The first time, data of 500 different files is stored in the first 500 storage areas, and the storage area size file identifiers are updated. When data of 500 different files is written a second time, it must first be checked whether each file is already stored. In this way, it is guaranteed that the same file is stored only in the same storage area.

205. Query whether there is a storage area in which the file data is not written, that is, a storage area whose state is free; if yes, go to step 206; if no, go to step 207.

206. The storage area of the unwritten file data is used as a storage area corresponding to the file data to be written; and the process proceeds to step 210.

For example, a disk (a storage pool) contains 1000 storage areas. The first 500 storage areas already store file data, and the file data to be written either has not been stored or its existing storage area is full. Storage then starts from the storage area following the last one requested, that is, from the 501st storage area.

207. Acquire a file size of the file data to be written and determine whether it is greater than a preset threshold; if yes, go to step 208; if no, go to step 209.

If there is no storage area to which file data has not been written, all the storage areas of the storage pool have been evenly filled with different files. In this case, subsequent new file data is stored starting again from the first storage area at the head of the storage pool, according to the size file identifiers.

208. Determine that the file data to be written is a large file, and use a storage area in which large files are stored as the storage area corresponding to the file data to be written; then proceed to step 210.

209. Determine that the file data to be written is a small file, and use a storage area in which small files are stored as the storage area corresponding to the file data to be written; then proceed to step 210.

210, the end.

In this embodiment, if a disk contains 1000 storage areas and data of 500 different files is stored the first time, the data is stored in the first 500 storage areas and information such as the storage area size file identifier is updated. When data of 500 different files is written a second time, it is first checked whether each file is already stored, to ensure that the same file is stored only in the same storage area (unless that area is full). Any new file data in the second write is stored starting from the storage area following the last one requested (that is, from the 501st storage area), and so on in turn, until all the storage areas of the storage pool are evenly filled with different files. After that, subsequent new file data is stored starting again from the first storage area at the head of the storage pool, according to the size file identifiers. Different files are written to different storage areas as far as possible, to achieve the greatest possible file aggregation.

If the file size is less than the preset threshold, the file is judged to be a small file; otherwise it is a large file. Large files and small files are stored separately in different storage areas; that is, the content of any storage area is either all large files or all small files. Storage areas are distinguished by the storage area size file identifier. Storing small files separately from large files realizes classification and aggregation by file size, which reduces file fragmentation more effectively, reduces the number of back-and-forth seeks of the disk head, and improves the data throughput of the system.

It can be seen that when the file data to be written is new data that has not been stored, it is preferentially stored in a storage area to which no file data has been written, that is, a storage area whose state is free. If there is no storage area in the free state, the corresponding storage area is obtained according to the file size of the file data to be written.
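
The selection order of steps 202 to 209 can be sketched in Python as follows. The names `Area` and `select_area_file_policy` are hypothetical; deriving the state from the used space, requiring the size-class fallback area to have enough room, and returning `None` when nothing fits are assumptions that the embodiment does not spell out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Area:
    serial: int
    size: int
    used: int = 0
    files: set[str] = field(default_factory=set)
    size_class: Optional[str] = None  # "large" or "small"

    @property
    def state(self) -> str:
        if self.used == 0:
            return "free"
        return "full" if self.used >= self.size else "using"


def select_area_file_policy(pool, name, nbytes, file_size, threshold):
    """Pick a storage area for nbytes of file `name` under the file policy."""
    # Steps 202-204: reuse the area that already holds this file if it has room.
    for area in pool:
        if name in area.files and area.size - area.used >= nbytes:
            return area
    # Steps 205-206: otherwise prefer an area that has never been written.
    for area in pool:
        if area.state == "free":
            return area
    # Steps 207-209: no free area, so fall back to an area of the same size class.
    wanted = "large" if file_size > threshold else "small"
    for area in pool:
        if area.size_class == wanted and area.size - area.used >= nbytes:
            return area
    return None  # assumption: the caller handles a pool with no suitable area
```

The caller is expected to record the write afterwards (used space, file list, and size class), which stands in for the database update of the storage area attributes.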

On the basis of the embodiment corresponding to FIG. 1, when the current policy type is a file policy, step 103, that is, obtaining the corresponding storage area according to the current policy type, includes the following steps, as shown in FIG. 3:

301, start.

Two queues are created for the storage areas in the storage pool, a large-file storage area queue and a small-file storage area queue, which are used for writing large files and small files respectively.

302. Determine whether the file name exists in a file list of a storage area whose status is using, and if yes, go to step 304; if no, go to step 303.

In a specific implementation, the database may be queried according to the file name of the file data to be written. When the file name is not in the file list of any storage area whose state is using, either the file data to be written is being written for the first time, or its corresponding storage area is full (state full) and a new storage area needs to be requested.

303. Determine the storage area size file identifier type corresponding to the file data to be written; then proceed to step 305.

In a specific implementation, it is determined whether the size of the file to be written is greater than a preset threshold. If yes, the size file identifier of the corresponding storage area is large file; otherwise, it is small file.

304. Determine the storage area corresponding to the file data to be written; then proceed to step 305.

More specifically, if the file name of the file data to be written exists in the file list of a storage area whose state is using, that storage area is the storage area corresponding to the file data to be written; otherwise, a storage area is requested from those whose state is free, and if the request succeeds, the size file identifier of this storage area is updated in the database. In another embodiment of the present invention, the address of the storage area may also be obtained from the head of the corresponding storage area queue according to the storage area size file identifier type, after which the storage area is placed at the end of the queue.

305. The corresponding storage area reports the size of the data to be written to the database; then proceed to step 306.

The database updates the storage area write pointer in time so as to determine whether the remaining space in the storage area is sufficient for the next data write, and updates information such as the storage area status identifier at the same time. The storage area then begins to write data.

306. Determine whether the current data is all written. If yes, proceed to step 307; if no, return to step 302.

307, the end.
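
The alternative mentioned in this embodiment, taking the storage area from the head of the matching queue and then placing it at the end of the queue, amounts to a round-robin rotation. A minimal Python sketch follows; the function name `pick_area` and the use of serial numbers as queue elements are assumptions, and both queues are assumed to be non-empty.

```python
from collections import deque


def pick_area(large_q: deque, small_q: deque, file_size: int, threshold: int) -> int:
    """Return the serial number at the head of the matching queue, then move it to the tail."""
    queue = large_q if file_size > threshold else small_q
    serial = queue[0]
    queue.rotate(-1)  # the head element becomes the tail element
    return serial


large_queue = deque([1, 4, 7])   # hypothetical serial numbers of large-file areas
small_queue = deque([2, 3])      # hypothetical serial numbers of small-file areas
first = pick_area(large_queue, small_queue, file_size=200, threshold=100)
```

After this call `first` is 1 and the large-file queue holds 4, 7, 1, so successive large files are spread over the large-file storage areas in turn.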

On the basis of the embodiment corresponding to FIG. 1, when the current policy type is a directory policy, step 103, that is, obtaining the corresponding storage area according to the current policy type, includes the following steps, as shown in FIG. 4:

401, start.

402. Obtain a directory name of the file data to be written.

403. Determine whether, among the storage areas in the using state, there is a storage area whose identifier list includes the directory name; if yes, proceed to step 404; if no, proceed to step 405.

In the present invention, a storage area in the use state is a storage area whose state is using. That is, the check is whether the directory name is already in the directory identifier list of a storage area whose state is using. If such a storage area exists, the directory has a writable storage area. If not, the directory has no writable storage area, for one of two reasons: either this is the first time data is written for this directory, or data was previously written for this directory but the corresponding storage area is full (state full).

404. The storage area that includes the directory name in the identifier list is used as the storage area corresponding to the file data to be written. After completing step 404, the process proceeds to step 406.

405. Use a storage area whose state is free as the storage area corresponding to the file data to be written; then proceed to step 406.

That is, a storage area is requested from those whose state is free, and if the request succeeds, the directory identifier of this storage area is updated in the database.

406, the end.

It can be seen that the directory policy stores the file fragments of all files in the same directory together in a single storage area or a limited number of storage areas, realizing batch aggregation of files, which is especially suitable for continuous access to the files in a single directory.
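
Steps 403 to 405 can be sketched in Python as a two-pass lookup. The dictionary layout, the name `select_area_by_directory`, and returning `None` when neither a matching nor a free area exists are assumptions for illustration; the embodiment does not say what happens in that last case.

```python
def select_area_by_directory(pool: list[dict], directory: str):
    """Pick a storage area for a file in `directory` under the directory policy."""
    # Steps 403-404: an area in the using state that already lists this directory.
    for area in pool:
        if area["state"] == "using" and directory in area["directories"]:
            return area
    # Step 405: otherwise request a free area and tag it with the directory.
    for area in pool:
        if area["state"] == "free":
            area["state"] = "using"
            area["directories"].add(directory)
            return area
    return None  # assumption: no free area is left


pool = [{"serial": i, "state": "free", "directories": set()} for i in (1, 2)]
first = select_area_by_directory(pool, "/logs")   # takes area 1
again = select_area_by_directory(pool, "/logs")   # reuses area 1
other = select_area_by_directory(pool, "/img")    # takes area 2
```

Files from "/logs" keep landing in area 1 until it is full, which is the batch aggregation effect described above.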

On the basis of the embodiment corresponding to FIG. 4, after step 405 and before step 406, the method further includes:

The corresponding storage area reports the size of the data to be written to the database, and the database updates the storage area write pointer in time so as to determine whether the remaining space of the storage area is sufficient for the next data write, and updates information such as the storage area status identifier at the same time. The storage area then begins to write data.

It is then judged whether all of the current data has been written. If there is still data to be written, step 402 is performed again; if not, the process proceeds to step 406.

On the basis of the embodiment corresponding to FIG. 1, when the current policy type is a user policy, step 103, that is, obtaining the corresponding storage area according to the current policy type, includes the following steps, as shown in FIG. 5:

501, start.

502. Acquire a username of the file data to be written.

503. Determine whether, among the storage areas in the using state, there is a storage area whose identifier list includes the user name; if yes, go to step 504; if no, go to step 505.

That is, the check is whether the user name is already in the user identifier list of a storage area whose state is using. If such a storage area exists, the user has a writable storage area. If not, the user has no writable storage area, for one of two reasons: either this is the first time the user writes data, or the user has written to the disk before but the corresponding storage area is full (state full).

504. The storage area that includes the user name in the identifier list is used as the storage area corresponding to the file data to be written. After completing step 504, the process proceeds to step 506.

505. Use a storage area whose state is free as the storage area corresponding to the file data to be written; then proceed to step 506.

That is, a storage area is requested from those whose state is free, and if the request succeeds, the user identifier of this storage area is updated in the database.

506, the end.

In this embodiment, files are classified into different storage areas according to the user to which they belong. That is, a storage area stores only the file data of the user to which it belongs.

In a specific implementation, the user identifier of the storage area is determined as follows: when a new user A requests to write a file, a storage area whose state is free is requested to store the file data; if the request succeeds, the user of the storage area is marked as A, the state of the storage area is set to using, and this information is stored in the database. When user A subsequently writes to the disk, the database is first queried for a storage area whose user identifier is A and whose state is using. If one is found, that storage area is selected for writing; if not, a new storage area is requested.

On the basis of the embodiment corresponding to FIG. 5, after step 505 and before step 506, the method further includes:

It is judged whether all of the current data has been written. If there is still data to be written, step 502 is performed again; if not, the process proceeds to step 506.
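
The loop back to step 502 means that a single write may span several storage areas when the user's current area fills up. The Python sketch below illustrates this under the user policy; the name `write_for_user`, the dictionary layout, the assumption that every area has a positive size, and raising `RuntimeError` when no area is available are all hypothetical choices, not part of the embodiment.

```python
def write_for_user(pool: list[dict], user: str, nbytes: int) -> list[tuple[int, int]]:
    """Place nbytes for `user`; return [(serial, bytes_written), ...] in write order."""
    placed = []
    while nbytes > 0:  # repeat steps 502-505 until all data is written
        # Step 503: an area in the using state that already belongs to this user.
        area = next((a for a in pool
                     if a["state"] == "using" and user in a["users"]), None)
        if area is None:
            # Step 505: request a free area and mark it with the user.
            area = next((a for a in pool if a["state"] == "free"), None)
            if area is None:
                raise RuntimeError("no storage area available")  # assumption
            area["users"].add(user)
            area["state"] = "using"
        chunk = min(nbytes, area["size"] - area["used"])
        area["used"] += chunk
        if area["used"] == area["size"]:
            area["state"] = "full"  # a full area is no longer offered for writing
        placed.append((area["serial"], chunk))
        nbytes -= chunk
    return placed


pool = [{"serial": i, "size": 100, "used": 0, "state": "free", "users": set()}
        for i in (1, 2, 3)]
first = write_for_user(pool, "A", 150)  # fills area 1, continues in area 2
```

Here user A's 150 bytes fill storage area 1 and spill 50 bytes into storage area 2; a later write by user A continues in storage area 2, while another user starts in storage area 3.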

In an embodiment of the present invention, when a file first requests to be written to the disk, a storage area is requested. If the state of the selected storage area is free, the state is changed to using. The storage area state, the file name, the file size, the size file identifier calculated from the threshold, the amount of data written to the disk this time, and the like are then reported to the database, and the database updates its saved information after receiving them.

When the file subsequently requests to be written again, the database is queried by file name to determine whether the file name already exists in the file list of some storage area. If it does, the file has previously been written to that storage area, so that storage area is selected to continue writing, and the storage area write pointer and other information are then updated in the database.

In order to reduce the dispersion of the file fragments, on the basis of any of the foregoing embodiments, after the end step, the storage area aggregation may also be performed, as shown in FIG. 6, including:

601, start.

602. Obtain a storage area whose status is using.

603. Determine whether the storage space used by the storage area whose state is using is smaller than the aggregation threshold; if yes, go to step 604; if no, go to step 606.

The aggregation threshold includes a storage area threshold and a retention time threshold. If the usage rate of a storage area E does not reach the storage area threshold, and there has been no data change within the retention time threshold T, storage area E is determined to be below the aggregation threshold.

604. Record the storage area whose state is using as a sub-aggregation storage area; then proceed to step 605.

605. Aggregate at least two of the sub-aggregation storage areas into one of them to obtain an aggregation storage area; then proceed to step 606.

If the data in storage area E is aggregated into another storage area F of the storage pool, the state of storage area E is set to free after the migration.

606, the end.

Before the end step, the file list information, directory identifier, or user identifier of the aggregate storage area can be updated.
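
The two-part aggregation threshold of step 603 can be written as a single predicate, sketched below in Python. The function and parameter names are hypothetical, the usage rate is assumed to be the used fraction of the area, and "no data change within the retention time threshold T" is read as the idle time being at least T seconds.

```python
def below_aggregation_threshold(used: int, size: int, last_change: float, now: float,
                                usage_threshold: float, retention_seconds: float) -> bool:
    """True if the area is sparsely used AND has been unchanged for the retention time."""
    usage_rate = used / size
    idle_seconds = now - last_change
    return usage_rate < usage_threshold and idle_seconds >= retention_seconds


# A hypothetical area E: 10% used, last changed one hour ago.
eligible = below_aggregation_threshold(used=10, size=100, last_change=0.0, now=3600.0,
                                       usage_threshold=0.5, retention_seconds=3600.0)
```

Both conditions must hold, so a sparsely used storage area that is still being written to is not selected for aggregation.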

The storage area aggregation shown in FIG. 6 applies to storage areas in the storage pool whose state is using. A storage area whose state is full is already full and does not need to be aggregated again, and a storage area whose state is free has not yet stored data and has no data to aggregate. The basic method of storage area aggregation is to take the storage areas whose state is using one by one from the bottom of the storage pool toward the top, and for each of them to examine the storage areas one by one from the top of the storage pool toward the bottom; storage areas that meet the conditions are aggregated.

It should be noted that, in this embodiment, the storage areas in the storage pool are numbered sequentially from the top of the storage pool, and the number is recorded as the storage area serial number, which is used to identify different storage areas. The serial numbers are assigned when the storage pool is initialized, and the information is updated in the database. Two linked lists are created for the storage areas in the storage pool whose state is using: one sorted upward from the bottom of the storage pool, recorded as linked list L1, and one sorted downward from the top of the storage pool, recorded as linked list L2.

Take the storage area aggregation of the file policy as an example. As shown in Figure 7, it includes:

701, start.

702. Acquire a storage area whose state is using from the bottom of the storage pool, that is, obtain the first element of the linked list L1.

703. Determine whether the storage space used by the storage area is smaller than the aggregation threshold; if yes, go to step 705; if no, go to step 704.

704. Acquire the next element of the linked list L1, that is, the next storage area whose state is using. If such an element exists, go to step 703; if not, there is no storage area left to aggregate, and the process goes to step 711.

705. Record the storage area as A, its storage area serial number as SN1, and its used space size as K1; then proceed to step 706.

706. Acquire a storage area whose state is using from the top of the storage pool, that is, obtain the first element of the linked list L2.

707. Determine whether the serial number of the storage area is greater than or equal to SN1. If yes, there is no storage area that can be used to aggregate storage area A; go to step 704. If the serial number is smaller than SN1, go to step 708.

708. Determine whether the remaining space of the storage area is greater than K1. If not, the remaining space is insufficient to hold the data in storage area A; go to step 709. If yes, record the storage area as B and go to step 710.

709. Acquire the next element of the linked list L2, that is, the next storage area whose state is using, and perform step 707.

710. Migrate the data in storage area A to storage area B, update the file list, storage area write pointer and the like of storage area B, then delete the data of storage area A, set the state of storage area A to free, and update the changed information in the database. Go to step 704.

711, the end.
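
Steps 702 to 710 can be condensed into the Python sketch below. It is an illustration under stated assumptions rather than the claimed implementation: the pool is a list of dictionaries ordered by ascending serial number (top of the pool first), the linked list L2 is replaced by a live walk over that list, the aggregation threshold is passed in as a hypothetical predicate `is_below_threshold`, and the database updates are replaced by in-memory changes.

```python
def aggregate(pool: list[dict], is_below_threshold) -> list[tuple[int, int]]:
    """Migrate sparse areas near the bottom of the pool into areas nearer the top.

    Returns the migrations performed as (source_serial, destination_serial).
    """
    moves = []
    # L1: areas in the using state, walked from the bottom of the pool upward.
    l1 = [a for a in reversed(pool) if a["state"] == "using"]
    for a in l1:                                   # steps 702 and 704
        if not is_below_threshold(a):              # step 703
            continue
        k1 = a["used"]                             # step 705 (SN1 is a["serial"])
        for b in pool:                             # steps 706 and 709: top downward
            if b["serial"] >= a["serial"]:         # step 707: no destination above A
                break
            if b["state"] == "using" and b["size"] - b["used"] > k1:  # step 708
                b["files"] += a["files"]           # step 710: migrate A into B
                b["used"] += k1
                a.update(files=[], used=0, state="free")
                moves.append((a["serial"], b["serial"]))
                break
    return moves


pool = [{"serial": s, "size": 100, "used": 10, "state": "using", "files": [f]}
        for s, f in [(1, "data1"), (2, "data2"), (3, "data3"), (4, "datak")]]
moves = aggregate(pool, lambda area: area["used"] < 50)
```

On this four-area input, which uses the same data names as the example of FIG. 8 with storage area K taken as area 4, the sketch migrates areas 4, 3, and 2 into area 1 in that order, so area 1 ends up holding data1, datak, data3, and data2, and the other three areas return to the free state.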

Taking FIG. 8 as an example, there are M storage areas, with data1 in storage area 1, data2 in storage area 2, data3 in storage area 3, and datak in storage area K. Storage areas 1, 2, 3, and K are all sub-aggregation storage areas that can be aggregated. As shown in FIG. 8, the resulting aggregation storage area is storage area 1, in which data1, datak, data3, and data2 are stored in sequence.

The aggregation steps of the directory policy and the user policy are basically the same as the storage area aggregation of the file policy. The difference is that after the data in the storage area A is migrated to the storage area B, the directory identifiers or user identifiers of the storage area B need to be updated. That is, the directory identifier or user identifier of the storage area A is added to the storage area B, so that the storage area B corresponds to two or more directory identifiers or user identifiers, and the changed information is updated to the database.
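The identifier update for the directory policy and the user policy can be sketched as follows. This is a minimal Python illustration: `PolicyArea` and `migrate_with_identifiers` are hypothetical names, clearing the identifier list of the storage area A when it is freed is an assumption, and the database update is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PolicyArea:
    serial: int
    state: str = "using"        # "free" or "using"
    files: List[str] = field(default_factory=list)
    identifiers: List[str] = field(default_factory=list)  # directory or user IDs


def migrate_with_identifiers(a: PolicyArea, b: PolicyArea) -> None:
    """Move A's files into B and add A's identifiers to B's identifier list."""
    b.files.extend(a.files)
    for ident in a.identifiers:
        if ident not in b.identifiers:   # keep each identifier only once
            b.identifiers.append(ident)
    # Assumption: a freed area keeps no files and no identifiers.
    a.files, a.identifiers, a.state = [], [], "free"


area_a = PolicyArea(3, files=["report.txt"], identifiers=["/logs"])
area_b = PolicyArea(1, files=["notes.txt"], identifiers=["/home"])
migrate_with_identifiers(area_a, area_b)
print(area_b.identifiers)  # ['/home', '/logs']
```

After the migration, a lookup by either identifier resolves to the storage area B, which is what allows later writes for the directory or user of the original storage area A to continue there.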

It can be seen that, in this embodiment, storage area aggregation is implemented by calculating the used space size of the storage area A and searching, from the head of the storage pool, for a storage area whose state is using and whose remaining storage space is larger than the used space of the storage area A. That storage area, B, is used as the aggregation destination. After the aggregation is completed, information such as the file name list of the destination storage area B is updated. For the directory policy storage mode, the directory identifier of the storage area A needs to be added to the directory identifiers of the storage area B, so that file operations in the directory corresponding to the original storage area A can continue on the storage area B. Similarly, for the user policy storage mode, the user identifier of the storage area A needs to be added to the user identifiers of the storage area B, so that file operations of the user corresponding to the original storage area A can continue on the storage area B.

In addition, an embodiment of the present invention further provides a distributed file system, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the file storage method of the distributed file system provided by the embodiments of the present invention.

It should be noted that the foregoing distributed file system embodiment and the method embodiment belong to the same concept. The specific implementation process is described in the method embodiment, and the technical features of the method embodiment are applicable to the distributed file system embodiment; details are not repeated here.

In addition, an embodiment of the present invention further provides a computer-readable storage medium on which a file storage program is stored, where the file storage program, when executed by a processor, implements the steps of the file storage method of the distributed file system provided by the embodiments of the present invention.

It should be noted that the embodiment of the file storage program on the above computer-readable storage medium belongs to the same concept as the method embodiment. The specific implementation process is described in the method embodiment, and the technical features of the method embodiment are applicable to the computer-readable storage medium embodiment; details are not repeated here.

The invention provides a file storage method for a distributed file system, a distributed file system, and a computer-readable storage medium. In the storage stage, files are stored according to the current policy as far as possible, which reduces the generation of small disk fragments, realizes the aggregation of file storage, optimizes the management and allocation of disk space, and improves disk read and write performance.

By setting the data of the same file to be stored in the same storage area, the invention effectively reduces the dispersion of file fragment distribution, realizes a certain degree of file aggregation, and improves the throughput of file access.

The invention stores large files and small files in different storage areas, which realizes the classification and aggregation of large and small files, reduces file fragmentation more effectively, reduces the back-and-forth seeking of the disk head, and improves the data throughput of the system.

The invention uniformly stores file fragments of all files in the same directory into a single or a limited number of storage areas, and implements batch aggregation of files, which is especially suitable for continuous access of files in a single directory.

The invention uniformly stores the file fragments of all files of the same user into a single storage area or a limited number of storage areas, which realizes batch aggregation of files and provides higher disk IO and data throughput when a user performs a large number of disk accesses within a period of time.
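The three storage policies summarized above can be illustrated with a minimal Python sketch of storage area selection. The names `Area`, `select_area`, `kind`, and `size_threshold` are hypothetical. The sketch only covers the case where a matching storage area in the using state already exists, and it returns `None` otherwise, since the handling of that case is not restated in this passage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Area:
    serial: int
    state: str = "free"                  # "free" or "using"
    kind: Optional[str] = None           # "large" or "small" under the file policy
    identifiers: List[str] = field(default_factory=list)  # directory or user names


def select_area(pool: List[Area], policy: str, *, size: int = 0,
                directory: str = "", user: str = "",
                size_threshold: int = 0) -> Optional[Area]:
    """Return the first area in the using state that matches the current policy."""
    in_use = [a for a in pool if a.state == "using"]
    if policy == "file":
        # Large and small files go to separate storage areas.
        wanted = "large" if size > size_threshold else "small"
        return next((a for a in in_use if a.kind == wanted), None)
    if policy == "directory":
        return next((a for a in in_use if directory in a.identifiers), None)
    if policy == "user":
        return next((a for a in in_use if user in a.identifiers), None)
    raise ValueError(f"unknown policy: {policy}")


pool = [
    Area(1, "using", kind="large", identifiers=["/video"]),
    Area(2, "using", kind="small", identifiers=["/docs", "alice"]),
    Area(3),
]
chosen = select_area(pool, "file", size=10, size_threshold=64)
print(chosen.serial)  # 2
```

Keeping the policy as a parameter of one function reflects that only one policy type is current at a time; a real system could equally use separate handlers per policy.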

When the amount of data written to the disk within a period of time T is small, the distribution of file fragments is relatively scattered. By means of aggregation, the file fragments of two or more storage areas in the storage pool can be aggregated into one storage area, which effectively reduces the dispersion of file fragments and optimizes the space management and reallocation of the disk.

The preferred embodiments of the present invention have been described above with reference to the drawings, and are not intended to limit the scope of the invention. A person skilled in the art can implement the invention in various variants without departing from the scope and spirit of the invention. For example, the features of one embodiment can be used in another embodiment to obtain a further embodiment. Any modifications, equivalent substitutions and improvements made within the technical concept of the invention are intended to be included within the scope of the invention.

Claims

A file storage method for a distributed file system, comprising:

obtaining file data to be written;

obtaining a corresponding storage area according to the current policy type, where the policy type includes a file policy based on the size of the file to be written, a directory policy based on the directory to which the file to be written belongs, or a user policy based on the user to which the file to be written belongs; and

writing the file data to be written to the corresponding storage area.
The file storage method of the distributed file system according to claim 1, wherein, when the current policy type is a file policy and the file data to be written is new data that has not been stored, obtaining the corresponding storage area according to the current policy type includes:

determining whether there is a storage area in which no file data has been written; and if not,

obtaining a corresponding storage area according to the file size of the file data to be written.
The file storage method of the distributed file system according to claim 2, wherein obtaining the corresponding storage area according to the file size of the file data to be written comprises:

determining whether the file size of the file data to be written is greater than a preset threshold;

if yes, determining that the file data to be written is a large file, and using a storage area in which a large file has been stored as the storage area corresponding to the file data to be written; and

if not, determining that the file data to be written is a small file, and using a storage area in which a small file has been stored as the storage area corresponding to the file data to be written.
The file storage method of the distributed file system according to claim 1, wherein, when the file data to be written is a file fragment of a file that has been written, obtaining the corresponding storage area according to the current policy type includes:

determining whether the storage area in which the file has been written is full; and if not,

using the storage area in which the file has been written as the storage area corresponding to the file data to be written.
The file storage method of the distributed file system according to claim 1, wherein, when the current policy type is a directory policy, obtaining the corresponding storage area according to the current policy type includes:

obtaining a directory name of the file data to be written;

determining, among the storage areas in the using state, whether there is a storage area whose identifier list includes the directory name; and if so,

using the storage area whose identifier list includes the directory name as the storage area corresponding to the file data to be written.
The file storage method of the distributed file system according to claim 1, wherein, when the current policy type is a user policy, obtaining the corresponding storage area according to the current policy type includes:

obtaining a user name of the file data to be written;

determining, among the storage areas in the using state, whether there is a storage area whose identifier list includes the user name; and if so,

using the storage area whose identifier list includes the user name as the storage area corresponding to the file data to be written.
The file storage method of the distributed file system according to claim 1, wherein, after the file data is written to the storage area and the attribute information of the storage area is updated in a database, the method further includes:

obtaining a storage area whose state is using;

determining whether the storage space used by the storage area whose state is using is smaller than an aggregation threshold;

if yes, recording the storage area whose state is using as a sub-aggregation storage area; and

aggregating at least two of the sub-aggregation storage areas into one of the sub-aggregation storage areas to obtain an aggregated storage area.
The file storage method of the distributed file system according to claim 1, wherein aggregating the sub-aggregation storage areas into one of the sub-aggregation storage areas to obtain an aggregated storage area comprises:

updating the file list information, directory identifier, or user identifier of the aggregated storage area.
A distributed file system, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program being executed by the processor to implement the steps of the file storage method of the distributed file system according to any one of claims 1 to 8.
A computer-readable storage medium, wherein the computer-readable storage medium stores a file storage program, the file storage program being executed by a processor to implement the steps of the file storage method of the distributed file system according to any one of claims 1 to 8.