WO2019149261A1 - File storage method for distributed file system and distributed file system - Google Patents


Info

Publication number
WO2019149261A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
storage area
written
storage
data
Application number
PCT/CN2019/074332
Other languages
French (fr)
Chinese (zh)
Inventor
李凯
林健
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Publication of WO2019149261A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Definitions

  • The present invention relates to the field of cloud storage, and in particular to a file storage method for a distributed file system and to a distributed file system.
  • Cloud storage is a system that integrates a large number of different types of storage devices in a network through cluster applications, grid technologies, distributed file systems, etc., and provides data storage and service access functions externally. Cloud storage systems offer good scalability and fault tolerance, and their internal implementation is transparent to users. The distributed file system shields the differences between the underlying file systems, provides a unified access interface and resource management, and provides strong support for cloud storage.
  • An object of the present invention is to provide a file storage method for a distributed file system, a distributed file system, and a computer-readable storage medium, so as to optimize the management and allocation of disk space and improve disk read and write performance.
  • a file storage method of a distributed file system including:
  • the policy type includes a file policy according to a file size to be written, a directory policy according to a directory to which the file to be written belongs, or a user policy according to a user to which the file to be written belongs;
  • the acquiring the corresponding storage area according to the current policy type includes:
  • the obtaining the corresponding storage area according to the file size of the file data to be written includes:
  • the file data to be written is a small file, and a storage area in which the small file has been stored is used as a storage area corresponding to the file data to be written.
  • the obtaining the corresponding storage area according to the current policy type includes:
  • the storage area in which the file has been written is used as a storage area corresponding to the file data to be written.
  • the obtaining the corresponding storage area according to the current policy type includes:
  • a storage area including the directory name in the identifier list is used as a storage area corresponding to the file data to be written.
  • the obtaining the corresponding storage area according to the current policy type includes:
  • a storage area including the user name in the identifier list is used as a storage area corresponding to the file data to be written.
  • the method includes:
  • the storage area whose state is using is recorded as a sub-aggregation storage area
  • the method includes:
  • a distributed file system comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the steps of the file storage method of the distributed file system provided by the embodiment of the present invention are implemented when the computer program is executed by the processor.
  • a computer-readable storage medium on which a file storage program is stored, wherein the steps of the file storage method of the distributed file system provided by the embodiment of the present invention are implemented when the file storage program is executed by a processor.
  • The file storage method of the distributed file system, the distributed file system, and the computer-readable storage medium in the embodiments of the present invention store files of each size class contiguously, as far as possible, during the storage stage. This reduces the generation of small disk fragments, aggregates file storage, optimizes the management and allocation of disk space, and improves disk read and write performance.
  • FIG. 1 is a schematic flowchart of a file storage method of a distributed file system according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an aggregation storage area according to an embodiment of the present invention.
  • The present invention provides a file storage method for a distributed file system, including the following steps:
  • The disk is divided into a plurality of storage areas for storing file data; together they form a storage pool.
  • The storage space size of a storage area is configurable, and the default is 1G.
  • With this default, a current mainstream single disk is divided into thousands of storage areas.
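The division of a disk into fixed-size storage areas can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `divide_disk`, the reading of "1G" as 1 GiB, and the 4 TiB disk size are all assumptions made for the example.

```python
GIB = 1024 ** 3  # assumption: the "1G" default is read as 1 GiB


def divide_disk(disk_bytes: int, area_bytes: int = GIB) -> list[tuple[int, int]]:
    """Return (serial_number, start_offset) for each whole storage area on the disk."""
    count = disk_bytes // area_bytes  # any trailing partial area is ignored
    return [(sn, sn * area_bytes) for sn in range(count)]


# A hypothetical 4 TiB disk yields 4096 storage areas of 1 GiB each.
areas = divide_disk(4 * 1024 * GIB)
print(len(areas))  # 4096
```

Under these assumptions a multi-terabyte disk does produce "thousands" of storage areas, which matches the order of magnitude stated above.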
  • Current policy types can include file policies based on file size, directory policies based on directory names, or user policies based on user names.
  • The file policy means that files are classified by size and stored in the storage pool accordingly.
  • The directory policy means that all files in the same directory are stored in the same storage area.
  • The user policy means that all files of the same user are stored in the same storage area.
  • The attribute information of the storage area in the present invention may include, but is not limited to, a storage area serial number, a storage area address, a storage area write pointer, a written file list, a storage area size file identifier, a storage area policy type identifier, a storage area directory identifier, a storage area user identifier, and a storage area status identifier.
  • The storage area serial number is used to identify and distinguish different storage areas.
  • The storage area address is used to record the disk location of this storage area.
  • The storage area write pointer is used to record the current write location in this storage area.
  • The written file list is used to record the files written to this storage area.
  • The storage area size file identifier is used to identify whether the storage area stores large files or small files.
  • The storage area policy type identifier is used to identify the policy type used by this storage area.
  • The storage area directory identifier is used to identify the directory to which the storage area belongs; it is used under the directory policy.
  • The storage area user identifier is used to identify the user to which the storage area belongs; it is used under the user policy.
  • The storage area status identifier is used to identify the storage state of the storage area.
  • A storage area is in one of three states: free, using, or full.
  • The free state means that the storage area is empty and no data has been written to it yet.
  • The using state means that data has been written to the storage area and it is in use.
  • The full state means that the storage area is full and can no longer accept write requests.
  • The attributes of the storage area are stored in a database. The database records are updated when the storage area attributes change, and the attribute information is retrieved when data is read from or written to the storage area.
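The attribute list and the three states above map naturally onto a small record type. The sketch below is one possible in-memory shape, assuming hypothetical field names (`serial_number`, `write_pointer`, `size_class`, and so on); the source stores these attributes in a database and does not prescribe a schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AreaState(Enum):
    FREE = "free"    # empty, nothing written yet
    USING = "using"  # holds data and still has room
    FULL = "full"    # no room left for further writes


@dataclass
class StorageArea:
    serial_number: int                  # distinguishes storage areas
    address: int                        # disk location of the area
    size: int                           # capacity in bytes (hypothetical field)
    write_pointer: int = 0              # current write position inside the area
    written_files: list[str] = field(default_factory=list)
    size_class: Optional[str] = None    # "large" or "small" (size file identifier)
    policy_type: Optional[str] = None   # "file", "directory" or "user"
    directory_ids: list[str] = field(default_factory=list)
    user_ids: list[str] = field(default_factory=list)
    state: AreaState = AreaState.FREE

    def remaining(self) -> int:
        return self.size - self.write_pointer

    def record_write(self, file_name: str, nbytes: int) -> None:
        """Advance the write pointer and update the file list and state."""
        if nbytes > self.remaining():
            raise ValueError("not enough space in this storage area")
        if file_name not in self.written_files:
            self.written_files.append(file_name)
        self.write_pointer += nbytes
        self.state = AreaState.FULL if self.remaining() == 0 else AreaState.USING
```

In a real system each call to `record_write` would be followed by a database update, as the text describes; that persistence step is left out here.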
  • Performing step 103, that is, obtaining the corresponding storage area according to the current policy type, includes the following, as shown in FIG. 2:
  • Step 202: Query whether the file data to be written is already stored; if yes, go to step 203; if no, go to step 205.
  • Step 203: Determine whether the storage area in which it is already stored has sufficient space; if yes, go to step 204; if no, go to step 205.
  • The storage area in which the file data is already stored is used as the storage area corresponding to the file data to be written, and the process proceeds to step 210.
  • For example, a disk (a storage pool) contains 1000 storage areas.
  • When 500 different files are stored for the first time, the data is stored in the first 500 storage areas, and the storage area size file identifiers are updated.
  • Step 205: Query whether there is a storage area to which no file data has been written, that is, a storage area whose state is free; if yes, go to step 206; if no, go to step 207.
  • The storage area to which no file data has been written is used as the storage area corresponding to the file data to be written, and the process proceeds to step 210.
  • For example, a disk (a storage pool) contains 1000 storage areas.
  • The first 500 storage areas already hold file data, and the file data to be written is either not yet stored or its existing storage area is full.
  • Storage then starts from the storage area after the one most recently allocated, that is, from the 501st storage area.
  • Step 207: Acquire the file size of the file data to be written and determine whether it is greater than a preset threshold; if yes, go to step 208; if no, go to step 209.
  • Once every storage area in the storage pool holds file data, storage starts again from the first storage area at the head of the storage pool, and subsequent new file data is stored according to the size file identifiers.
  • Step 208: Determine that the file data to be written is a large file, use a storage area in which large files are stored as the storage area corresponding to the file data to be written, and proceed to step 210.
  • Step 209: Determine that the file data to be written is a small file, use a storage area in which small files are stored as the storage area corresponding to the file data to be written, and proceed to step 210.
  • For example, when the data of 500 different files is stored for the first time, it is stored in the first 500 storage areas, and information such as the storage area size file identifier is updated.
  • When the data of 500 different files is written a second time, it is first checked whether each file is already stored, to ensure that the same file is stored only in the same storage area (unless that area is full). If the second write contains data of new files, storage starts from the storage area after the one most recently allocated (that is, from the 501st storage area) and proceeds in turn until every storage area of the storage pool holds different files. Storage then starts again from the first storage area at the head of the storage pool, and subsequent new file data is stored according to the size file identifiers. Different files are written to different storage areas as far as possible, to achieve the greatest possible file aggregation.
  • If the file size is less than the preset threshold, the file is judged to be a small file; otherwise it is a large file.
  • Large files and small files are stored separately in different storage areas; that is, any given storage area holds either only large files or only small files.
  • The two kinds of storage area are distinguished by the storage area size file identifier.
  • Storing small files separately from large files classifies and aggregates files by size, which reduces file fragmentation more effectively, reduces the number of back-and-forth seeks of the disk head, and improves the data throughput of the system.
  • A storage area to which no file data has been written, that is, a storage area whose state is free, is used preferentially. If there is no storage area in the free state, the corresponding storage area is obtained according to the file size of the file data to be written.
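The selection order described for FIG. 2 (reuse the file's existing area, else take a free area, else fall back to an area of the matching size class) can be sketched as below. This is an illustrative reading under stated assumptions: the `Area` type, the function names, and the space check in the final fallback are hypothetical, and "large" is taken to mean strictly greater than the threshold, following step 207.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Area:
    size: int
    used: int = 0
    files: set = field(default_factory=set)
    size_class: Optional[str] = None  # "large" or "small"; None while free

    @property
    def state(self) -> str:
        if self.used == 0:
            return "free"
        return "full" if self.used >= self.size else "using"


def classify(nbytes: int, threshold: int) -> str:
    # Step 207: strictly greater than the threshold counts as a large file.
    return "large" if nbytes > threshold else "small"


def select_area(pool, name, nbytes, threshold) -> Optional[Area]:
    # Steps 202-204: reuse the area that already holds this file if it has room.
    for area in pool:
        if name in area.files and area.size - area.used >= nbytes:
            return area
    # Steps 205-206: otherwise prefer an area in the free state.
    for area in pool:
        if area.state == "free":
            return area
    # Steps 207-209: otherwise pick an area of the matching size class.
    # Checking for enough room here is an assumption of this sketch.
    wanted = classify(nbytes, threshold)
    for area in pool:
        if area.size_class == wanted and area.size - area.used >= nbytes:
            return area
    return None


def write(pool, name, nbytes, threshold) -> Area:
    area = select_area(pool, name, nbytes, threshold)
    if area is None:
        raise RuntimeError("no storage area can take this write")
    if area.size_class is None:
        area.size_class = classify(nbytes, threshold)
    area.files.add(name)
    area.used += nbytes
    return area
```

With a two-area pool and a threshold of 100 bytes, a 10-byte file lands in the first area, a 600-byte file takes the second (still free) area, and later small files join the first area once no free area remains.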
  • When the current policy type is a file policy, performing step 103, that is, obtaining the corresponding storage area according to the current policy type, includes the following, as shown in FIG. 3:
  • Step 302: Determine whether the file name exists in the file list of a storage area whose state is using; if yes, go to step 304; if no, go to step 303.
  • The query may be performed in the database according to the file name of the file data to be written.
  • If the file name is not in the file list of any storage area whose state is using, either the file data to be written is being written for the first time or the corresponding storage area has filled up (its state is full), and a new storage area needs to be applied for.
  • Determine whether the size of the file to be written is greater than the preset threshold. If yes, the size file identifier of the corresponding storage area is large file; otherwise, it is small file.
  • the storage area whose state is using is used as the storage area corresponding to the file data to be written; otherwise, one is applied for from the storage areas whose state is free, and, if the application succeeds, the size file identifier of this storage area is updated in the database.
  • The address of the storage area may also be obtained from the head of the storage area queue that corresponds to the size file identifier type, after which the storage area is placed at the end of that queue.
  • The corresponding storage area reports the size of the data to be written to the database. The database promptly updates the storage area write pointer and determines whether the remaining space in the storage area is sufficient for the next data write; if it is not, information such as the storage area status identifier is updated at the same time. The storage area then begins to write data, and the process proceeds to step 306.
  • Step 306: Determine whether the current data has all been written. If yes, proceed to step 307; if no, return to step 302.
  • Performing step 103, that is, obtaining the corresponding storage area according to the current policy type, includes the following, as shown in FIG. 4:
  • Step 403: Determine whether, among the storage areas in the use state, there is a storage area whose identifier list includes the directory name; if yes, proceed to step 404; if no, proceed to step 405.
  • A storage area in the use state in the present invention is a storage area whose state is using; that is, the directory name is already in the directory identifier list of the storage area and the storage area state is using. If such a storage area exists, the directory has a writable storage area. If none exists, the directory has no writable storage area, for one of two reasons: either this is the first data write for the directory, or the directory was written to disk before but the corresponding storage area has filled up and its state is full.
  • Step 404: The storage area whose identifier list includes the directory name is used as the storage area corresponding to the file data to be written. After step 404 is completed, the process proceeds to step 406.
  • A storage area whose state is free is used as the storage area corresponding to the file data to be written, and the process proceeds to step 406.
  • The directory policy stores the file fragments of all files in the same directory together in a single storage area or a limited number of storage areas. This aggregates files in batches and is especially suitable for continuous access to the files of a single directory.
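The directory-policy lookup in steps 403 to 405 can be sketched as follows. The `Area` type and the function name are hypothetical, and tagging a newly claimed free area with the directory name is an assumption modeled on how the source describes the user policy; the same shape would apply to the user policy with a user identifier list in place of `directory_ids`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Area:
    serial_number: int
    state: str = "free"  # "free", "using" or "full"
    directory_ids: list[str] = field(default_factory=list)


def select_area_for_directory(pool: list[Area], directory: str) -> Optional[Area]:
    # Steps 403-404: reuse an area in the using state that already lists this directory.
    for area in pool:
        if area.state == "using" and directory in area.directory_ids:
            return area
    # Step 405: otherwise claim a free area and tag it with the directory (assumed).
    for area in pool:
        if area.state == "free":
            area.state = "using"
            area.directory_ids.append(directory)
            return area
    return None  # no writable area is available for this directory
```

Repeated writes to the same directory keep returning the same area until it fills up, which is what produces the per-directory aggregation described above.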
  • After step 405 and before step 406, the method further includes:
  • The corresponding storage area reports the size of the data to be written to the database. The database promptly updates the storage area write pointer and determines whether the remaining space of the storage area is sufficient for the next data write; if it is not, information such as the storage area status identifier is updated at the same time. The storage area then begins to write data.
  • It is judged whether the current data has all been written. If there is still data to be written, step 402 is re-executed; if not, the process proceeds to step 406.
  • Performing step 103, that is, obtaining the corresponding storage area according to the current policy type, includes the following, as shown in FIG. 5:
  • Step 503: Determine whether, among the storage areas in the use state, there is a storage area whose identifier list includes the user name; if yes, go to step 504; if no, go to step 505.
  • That is, the user name is already in the user identifier list of the storage area and the storage area state is using. If such a storage area exists, the user has a writable storage area. If none exists, the user has no writable storage area, for one of two reasons: either this is the first data write for the user, or the user wrote to disk before but the corresponding storage area has filled up and its state is full.
  • Step 504: The storage area whose identifier list includes the user name is used as the storage area corresponding to the file data to be written. After step 504 is completed, the process proceeds to step 506.
  • A storage area whose state is free is used as the storage area corresponding to the file data to be written, and the process proceeds to step 506.
  • The corresponding files are classified into different storage areas; that is, a storage area holds only the file data of the user to which it belongs.
  • The user identifier of a storage area is determined as follows: when a new user A applies to write a file, a storage area whose state is free is applied for to store the file data. If the application succeeds, the user identifier of the storage area is set to A, the state of the storage area is set to using, and both are stored in the database.
  • When user A writes to disk, the database is first queried for a storage area whose user identifier is A and whose state is using. If one is found, that storage area is selected for the write; if none is found, a new storage area is applied for.
  • After step 505 and before step 506, the method further includes:
  • The corresponding storage area reports the size of the data to be written to the database. The database promptly updates the storage area write pointer and determines whether the remaining space of the storage area is sufficient for the next data write; if it is not, information such as the storage area status identifier is updated at the same time. The storage area then begins to write data.
  • It is judged whether the current data has all been written. If there is still data to be written, step 502 is re-executed; if not, the process proceeds to step 506.
  • A storage area is applied for. If the state of the selected storage area is free, its state is changed to using. The storage area state, the file name, the file size, the size file identifier calculated from the threshold, the amount of data written to disk for the current file, and similar information are then reported to the database, which updates its saved records on receipt.
  • The database is queried by file name to check whether the file name already exists in the file list of some storage area. If it does, the file has been written to that storage area before; that storage area is selected, the write continues there, and information such as the storage area write pointer is then updated in the database.
  • Storage area aggregation may also be performed, as shown in FIG. 6, including:
  • Step 603: Determine whether the storage space used by a storage area whose state is using is smaller than the aggregation threshold; if yes, go to step 604; if no, go to step 606.
  • The aggregation threshold includes a storage area threshold and a retention time threshold. If the usage rate of a storage area E has not reached the storage area threshold and there has been no data change within the retention time threshold T, storage area E is determined to be below the aggregation threshold.
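The two-part aggregation threshold can be expressed as a single predicate. The sketch below is a hedged illustration: the function and parameter names are hypothetical, timestamps are assumed to be in seconds, and the concrete values (50% usage, one hour of retention) are made up for the example rather than taken from the source.

```python
def below_aggregation_threshold(
    used: int,
    size: int,
    last_change: float,
    now: float,
    usage_threshold: float,
    retention_seconds: float,
) -> bool:
    """True when an area is under-used and has been idle for the retention time."""
    usage_rate = used / size
    idle_time = now - last_change
    # Both conditions must hold: low usage AND no data change within T.
    return usage_rate < usage_threshold and idle_time >= retention_seconds


# An area that is 10% full and untouched for two hours qualifies
# under an assumed 50% usage threshold and 1-hour retention time.
print(below_aggregation_threshold(100, 1000, 0.0, 7200.0, 0.5, 3600.0))  # True
```

Requiring the idle period avoids migrating an area that is still being actively written, which is the apparent purpose of the retention time threshold.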
  • Step 604: Record the storage area whose state is using as a sub-aggregation storage area, and proceed to step 605.
  • Step 605: Aggregate at least two of the sub-aggregation storage areas into one of the sub-aggregation storage areas to obtain an aggregation storage area, and proceed to step 606.
  • After the migration, the state of storage area E is set to free.
  • The storage area aggregation shown in FIG. 6 applies to storage areas in the storage pool whose state is using. A storage area whose state is full is already full and does not need to be aggregated again, and a storage area whose state is free holds no data yet and has nothing to aggregate.
  • The basic method of storage area aggregation is to take the storage areas whose state is using one by one, from the bottom of the storage pool toward the top, and, for each of them, to check candidate storage areas one by one from the top of the storage pool toward the bottom. Eligible storage areas are aggregated.
  • The storage areas in the storage pool are numbered sequentially from the top of the storage pool; this number is recorded as the storage area serial number and is used to identify different storage areas. The serial numbers are assigned when the storage pool is initialized, and the information is written to the database. Two linked lists are created for the storage areas in the storage pool whose state is using: one sorted from the bottom of the storage pool upward, recorded as linked list L1, and one sorted from the top of the storage pool downward, recorded as linked list L2.
  • Step 703: Calculate whether the storage space used by the storage area is smaller than the aggregation threshold; if yes, go to step 705; if no, go to step 704.
  • Step 704: Acquire the next element of linked list L1, that is, the next storage area whose state is using. If there is one, go to step 703; if not, there is no storage area left to aggregate, and the process goes to step 711.
  • The storage area is recorded as A, its storage area serial number as SN1, and its used space size as K1; the process proceeds to step 706.
  • Step 707: Determine whether the serial number of the storage area is greater than or equal to SN1. If so, there is no storage area that can be used to aggregate storage area A; go to step 704. If the serial number of the storage area is smaller than SN1, go to step 708.
  • Step 708: Calculate whether the remaining space of the storage area is greater than K1. If not, the remaining space is insufficient to hold the data of storage area A, and the process goes to step 709; if yes, record the storage area as B and go to step 710.
  • The data in storage area A is migrated to storage area B, and information such as the file list and the storage area write pointer of storage area B is updated. The data of storage area A is then deleted, the state of storage area A is set to free, and the changed information is written to the database. Go to step 704.
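The two-list scan of FIG. 7 can be sketched as follows. This is a simplified reading, not the patent's implementation: Python lists stand in for the linked lists L1 and L2, the `Area` type and `aggregate` function are hypothetical, the aggregation threshold is reduced to a used-space limit, and database updates are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    serial_number: int  # numbered from the top of the pool
    size: int
    used: int = 0
    files: list[str] = field(default_factory=list)
    state: str = "free"


def aggregate(pool: list[Area], threshold: int) -> None:
    using = [a for a in pool if a.state == "using"]
    l1 = sorted(using, key=lambda a: a.serial_number, reverse=True)  # bottom to top
    l2 = sorted(using, key=lambda a: a.serial_number)                # top to bottom
    for a in l1:
        if a.used >= threshold:
            continue  # steps 703/704: not below the threshold, try the next area
        for b in l2:
            if b.serial_number >= a.serial_number:
                break  # step 707: no area above A can take its data
            if b.size - b.used > a.used:  # step 708: B has more free space than K1
                # Step 710: migrate A into B, then release A.
                b.files.extend(a.files)
                b.used += a.used
                a.files, a.used, a.state = [], 0, "free"
                break


# Four lightly used areas collapse into the first one.
pool = [Area(n, 100, 10, [f"data{n}"], "using") for n in (1, 2, 3, 4)]
aggregate(pool, threshold=50)
print(pool[0].files)  # ['data1', 'data4', 'data3', 'data2']
```

Because sources are taken from the bottom of the pool and destinations from the top, data drifts toward low serial numbers and the freed areas collect at the bottom of the pool.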
  • As shown in FIG. 8, there are M storage areas, with data1 in storage area 1, data2 in storage area 2, data3 in storage area 3, and datak in storage area K. Storage areas 1, 2, 3, and K are all sub-aggregation storage areas that can be aggregated, and the aggregation storage area is obtained by the method shown in FIG. 8.
  • The resulting aggregation storage area is storage area 1, in which data1, datak, data3, and data2 are stored in that order.
  • The aggregation steps for the directory policy and the user policy are basically the same as the storage area aggregation for the file policy.
  • The difference is that, after the data in storage area A is migrated to storage area B, the directory identifier or user identifier of storage area B needs to be updated; that is, the directory identifier or user identifier of storage area A is added to storage area B.
  • Storage area B then corresponds to two or more directory identifiers or user identifiers, and the changed information is written to the database.
  • The used space size of storage area A is calculated, and a storage area whose state is using and whose remaining space is larger than the used space of storage area A is searched for from the head of the storage pool.
  • That storage area B is used as the aggregation destination. After the aggregation is completed, information such as the file name list of the destination storage area B is updated.
  • In the directory policy storage mode, the directory identifier of storage area A needs to be added to the directory identifiers of storage area B, so that file operations in the directory corresponding to the original storage area A can be carried out in storage area B.
  • In the user policy storage mode, the user identifier of storage area A needs to be added to the user identifiers of storage area B, so that file operations of the user corresponding to the original storage area A can be carried out in storage area B.
  • An embodiment of the present invention further provides a distributed file system, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the steps of the file storage method of the distributed file system provided by the embodiment of the present invention are implemented when the computer program is executed by the processor.
  • An embodiment of the present invention further provides a computer-readable storage medium on which a file storage program is stored, wherein the steps of the file storage method of the distributed file system provided by the embodiment of the present invention are implemented when the file storage program is executed by a processor.
  • The file storage program embodiment on the above computer-readable storage medium belongs to the same concept as the method embodiment. For the specific implementation process, see the method embodiment; the technical features of the method embodiment apply correspondingly to the computer-readable storage medium embodiment and are not repeated here.
  • The invention provides a file storage method for a distributed file system, a distributed file system, and a computer-readable storage medium.
  • Files are stored according to the current policy as far as possible, which reduces the generation of small disk fragments, aggregates file storage, optimizes the management and allocation of disk space, and improves disk read and write performance.
  • The invention effectively reduces the dispersion of file fragments, achieves a degree of file aggregation, and improves the throughput of file access.
  • The invention stores large files and small files in different storage areas, which classifies and aggregates files by size, reduces file fragmentation more effectively, reduces the number of back-and-forth seeks of the disk head, and improves the data throughput of the system.
  • The invention stores the file fragments of all files in the same directory together in a single storage area or a limited number of storage areas. This aggregates files in batches and is especially suitable for continuous access to the files of a single directory.
  • The invention stores the file fragments of all files of the same user together in a single storage area or a limited number of storage areas. This aggregates files in batches and gives higher disk I/O and data throughput when a user performs a large number of disk accesses within a period of time.
  • When the distribution of file fragments is relatively scattered, the file fragments of two or more storage areas in the storage pool can be aggregated into one storage area. This effectively reduces the dispersion of file fragments and optimizes the space management and reallocation of the disk.

Abstract

Disclosed in the present invention are a file storage method for a distributed file system, the distributed file system, and a computer-readable storage medium, the method comprising: dividing a single disk into a plurality of storage areas; acquiring file data to be written; acquiring a corresponding storage area according to a current policy type; and writing the file data to be written into the corresponding storage area. According to the present invention, files are stored according to the current policy as far as possible during the storage phase, so that the generation of small disk fragments is reduced, file storage is aggregated, the management and allocation of disk space are optimized, and the read-write performance of the disk is improved.

Description

File storage method of distributed file system and distributed file system
The present application claims priority to Chinese patent application CN201810103081.6, entitled "File storage method of distributed file system and distributed file system" and filed on February 1, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of cloud storage, and in particular, to a file storage method of a distributed file system and a distributed file system.
Background
Cloud storage is a system that uses cluster applications, grid technologies, distributed file systems, and the like to integrate a large number of storage devices of different types in a network and to provide data storage and service access functions externally. Cloud storage systems offer good scalability and fault tolerance, and their internal implementation is transparent to users. The distributed file system shields the differences between the underlying file systems and provides a unified access interface and resource management, giving strong support to cloud storage.
When a file is stored, it is cut into many fragments on the disk. The more fragments there are, the more often the disk's mechanical arm must seek back and forth, and the lower the efficiency of reading and writing the file. The longer a disk is in use, the more fragments it accumulates, which seriously degrades its read and write performance. In addition, as usage time increases, small disk fragments accumulate and can no longer be allocated effectively, wasting disk storage space.
Therefore, how to store files so as to improve read and write performance remains an urgent problem to be solved.
Summary of the invention
In view of this, an object of the present invention is to provide a file storage method of a distributed file system, a distributed file system, and a computer-readable storage medium, so as to optimize the management and allocation of disk space and improve disk read and write performance.
The technical solution adopted by the present invention to solve the above technical problem is as follows:
According to one aspect of the present invention, a file storage method of a distributed file system is provided, including:
Acquiring file data to be written;
Acquiring a corresponding storage area according to the current policy type, where the policy type includes a file policy based on the size of the file to be written, a directory policy based on the directory to which the file to be written belongs, or a user policy based on the user to which the file to be written belongs;
Writing the file data to be written into the corresponding storage area.
In a possible design, when the current policy type is the file policy and the file data to be written is new data that has not yet been stored, acquiring the corresponding storage area according to the current policy type includes:
Determining whether there is a storage area into which no file data has been written; if not,
Acquiring the corresponding storage area according to the file size of the file data to be written.
In a possible design, acquiring the corresponding storage area according to the file size of the file data to be written includes:
Determining whether the file size of the file data to be written is greater than a preset threshold;
If so, determining that the file data to be written is a large file, and using a storage area that already stores large files as the storage area corresponding to the file data to be written;
If not, determining that the file data to be written is a small file, and using a storage area that already stores small files as the storage area corresponding to the file data to be written.
In a possible design, when the file data to be written is a file fragment of a file that has already been written, acquiring the corresponding storage area according to the current policy type includes:
Determining whether the storage area in which the already-written file is located is full; if not,
Using the storage area in which the already-written file is located as the storage area corresponding to the file data to be written.
In a possible design, when the current policy type is the directory policy, acquiring the corresponding storage area according to the current policy type includes:
Acquiring the directory name of the file data to be written;
Determining whether, among the storage areas in the in-use state, there is a storage area whose identifier list contains the directory name; if so,
Using the storage area whose identifier list contains the directory name as the storage area corresponding to the file data to be written.
In a possible design, when the current policy type is the user policy, acquiring the corresponding storage area according to the current policy type includes:
Acquiring the user name of the file data to be written;
Determining whether, among the storage areas in the in-use state, there is a storage area whose identifier list contains the user name; if so,
Using the storage area whose identifier list contains the user name as the storage area corresponding to the file data to be written.
In a possible design, after the file data is written to the storage area and the attribute information of the storage area is updated in a database, the method includes:
Acquiring the storage areas whose state is using;
Determining whether the storage space already used by a storage area whose state is using is less than an aggregation threshold;
If so, recording the storage area whose state is using as a sub-aggregation storage area;
Aggregating at least two of the sub-aggregation storage areas into one of them to obtain an aggregated storage area.
In a possible design, after the sub-aggregation storage areas are aggregated into one of them to obtain the aggregated storage area, the method includes:
Updating the file list information, directory identifier, or user identifier of the aggregated storage area.
According to another aspect of the present invention, a distributed file system is provided, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the file storage method of the distributed file system provided by the embodiments of the present invention.
According to another aspect of the present invention, a computer-readable storage medium is provided, on which a file storage program is stored, where the file storage program, when executed by a processor, implements the steps of the file storage method of the distributed file system provided by the embodiments of the present invention.
With the file storage method of the distributed file system, the distributed file system, and the computer-readable storage medium of the embodiments of the present invention, large files and small files are each stored contiguously in their own storage units as far as possible already at the storage stage. This reduces the generation of small disk fragments, achieves aggregation of file storage, optimizes the management and allocation of disk space, and improves disk read and write performance.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a file storage method of a distributed file system according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;
FIG. 3 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;
FIG. 4 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;
FIG. 5 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;
FIG. 6 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;
FIG. 7 is a schematic flowchart of a file storage method of a distributed file system according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of an aggregated storage area according to an embodiment of the present invention.
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
In order to make the technical problem to be solved, the technical solutions, and the beneficial effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it.
As shown in FIG. 1, the present invention provides a file storage method of a distributed file system, including the following steps:
101. Start.
A single disk is divided into multiple storage areas. More specifically, the disk is treated as a storage pool and divided into multiple storage areas for storing file data. Generally, the size of a storage area is configurable and defaults to 1 GB, so that a current mainstream single disk is divided into thousands of storage areas.
102. Acquire the file data to be written.
103. Acquire the corresponding storage area according to the current policy type.
The current policy type may be a file policy based on file size, a directory policy based on directory name, or a user policy based on user name. Under the file policy, files are classified and stored in the storage pool according to their size; under the directory policy, all files in the same directory are stored in the same storage area; under the user policy, all files of the same user are stored in the same storage area.
104. Write the file data to be written into the corresponding storage area.
105. End.
Generally, the file data also needs to be written to the storage area and the attribute information of the storage area updated in the database.
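The division of a disk into storage areas can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `partition_disk`, the reading of the default area size as 1 GiB, and the 4 TB example disk are hypothetical and are not taken from the embodiments.

```python
GIB = 1024 ** 3  # assumed reading of the default storage area size


def partition_disk(disk_bytes, area_bytes=GIB):
    """Return (serial_number, start_offset) for every whole storage area."""
    if area_bytes <= 0:
        raise ValueError("area_bytes must be positive")
    count = disk_bytes // area_bytes  # a trailing partial area is left unused
    return [(serial, serial * area_bytes) for serial in range(count)]


# A hypothetical 4 TB disk (4 * 10**12 bytes) split into 1 GiB areas.
areas = partition_disk(4 * 10 ** 12)
print(len(areas))  # 3725
```

With these assumed figures the disk yields a few thousand storage areas, which matches the order of magnitude described above.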
The attribute information of a storage area in the present invention may include, but is not limited to: a storage area serial number, a storage area address, a storage area write pointer, a list of written files, a storage area large/small-file identifier, a storage area policy type identifier, a directory identifier of the storage area, a user identifier of the storage area, and a storage area state identifier.
The storage area serial number identifies and distinguishes different storage areas. The storage area address records the location of the storage area on the disk. The storage area write pointer records the current write position of the storage area. The list of written files records the files written to the storage area. The storage area large/small-file identifier indicates whether the storage area stores large files or small files. The storage area policy type identifier indicates the policy type of the storage area. The directory identifier of the storage area identifies the directory to which the storage area belongs and is used by the directory policy. The user identifier of the storage area identifies the user to which the storage area belongs and is used by the user policy. The storage area state identifier indicates the storage state of the storage area.
A storage area has three states: free, using, and full. The free state means that the storage area is empty and no data has ever been written to it; the using state means that data has been written to the storage area and it is in use; the full state means that the storage area is completely filled and no further writes can be requested.
Generally, the attributes of a storage area are stored in the database. The database record is updated whenever an attribute changes, and these attributes are fetched from the database whenever data is read from or written to the storage area.
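The attribute record and the three states can be pictured as a small data structure. The following Python sketch is an assumption-laden illustration: the class and field names (`StorageArea`, `size_class`, `record_write`, and so on) are hypothetical, the database is left out, and an area is treated as full only when its write pointer reaches its size.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AreaState(Enum):
    FREE = "free"    # empty, never written
    USING = "using"  # holds data and can still accept writes
    FULL = "full"    # completely filled


@dataclass
class StorageArea:
    serial: int                        # storage area serial number
    address: int                       # location on the disk
    size: int                          # capacity in bytes
    write_pointer: int = 0             # current write position
    files: list = field(default_factory=list)  # list of written files
    size_class: Optional[str] = None   # "large" or "small" (file policy)
    policy: Optional[str] = None       # "file", "directory" or "user"
    directory: Optional[str] = None    # owner directory (directory policy)
    user: Optional[str] = None         # owner user (user policy)
    state: AreaState = AreaState.FREE

    def record_write(self, file_name, nbytes):
        """Advance the write pointer and refresh the state after a write."""
        if nbytes > self.size - self.write_pointer:
            raise ValueError("not enough space in this storage area")
        if file_name not in self.files:
            self.files.append(file_name)
        self.write_pointer += nbytes
        if self.write_pointer == self.size:
            self.state = AreaState.FULL
        else:
            self.state = AreaState.USING
```

In a real system each call to `record_write` would correspond to an update of the database record described above.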
On the basis of the embodiment corresponding to FIG. 1, when the current policy type is the file policy, performing step 103, that is, acquiring the corresponding storage area according to the current policy type, includes the following, as shown in FIG. 2:
201. Start.
Because large files and small files are stored in different storage areas under the file policy, two queues can first be created for the storage areas in the storage pool, a large-file storage area queue and a small-file storage area queue, used for writing large files and small files to disk respectively.
202. Query whether the file data to be written already has stored data; if so, go to step 203; if not, go to step 205.
203. Determine whether the storage area that holds the existing data has enough storage space; if so, go to step 204; if not, go to step 205.
204. Use the storage area that holds the existing data as the storage area corresponding to the file data to be written, and go to step 210.
For example, a disk (one storage pool) contains 1000 storage areas. On the first write, data of 500 different files is stored in the first 500 storage areas in turn, and information such as the large/small-file identifier of each storage area is updated. If data of 500 different files is written a second time, it is first queried whether each file already has stored data. In this way, the same file is guaranteed to be stored in only one storage area.
205. Query whether there is still a storage area into which no file data has been written, that is, a storage area whose state is free; if so, go to step 206; if not, go to step 207.
206. Use the storage area into which no file data has been written as the storage area corresponding to the file data to be written, and go to step 210.
For example, a disk (one storage pool) contains 1000 storage areas, of which the first 500 already store file data. If the file data to be written this time has no existing storage, or the storage area holding its existing data is full, storage starts from the area after the last storage area allocated previously, that is, from the 501st storage area.
207. Acquire the file size of the file data to be written and determine whether it is greater than the preset threshold; if so, go to step 208; if not, go to step 209.
If there is no storage area without file data, all storage areas in the storage pool evenly hold different files. In that case, subsequent newly written file data is stored starting again from the first storage area at the head of the storage pool, according to the large/small-file identifiers.
208. Determine that the file data to be written is a large file, use a storage area that already stores large files as the storage area corresponding to the file data to be written, and go to step 210.
209. Determine that the file data to be written is a small file, use a storage area that already stores small files as the storage area corresponding to the file data to be written, and go to step 210.
210. End.
In this embodiment, suppose a disk contains 1000 storage areas. On the first write, data of 500 different files is stored in the first 500 storage areas in turn, and information such as the large/small-file identifier of each storage area is updated. On a second write, say of another 500 different files, it is first queried whether each file already has stored data, so that the same file is stored in only one storage area (unless that area is full). If the second write contains new data with no existing storage, storage starts from the area after the last storage area allocated previously (that is, from the 501st storage area) and proceeds in order until all storage areas in the storage pool evenly hold different files. After that, subsequent newly written file data is stored starting again from the first storage area at the head of the storage pool, according to the large/small-file identifiers. Different files are written to different storage areas as far as possible, so as to aggregate each file as much as possible.
If the file size does not exceed the preset threshold, the file is determined to be a small file; otherwise it is a large file. Large files and small files are stored separately in different storage areas; that is, any given storage area contains either only large files or only small files. Storage areas are distinguished by their large/small-file identifier. Storing small files separately from large files achieves classified aggregation of files by size, which reduces file fragmentation more effectively, reduces the number of back-and-forth seeks of the disk head, and improves system data throughput.
It can be seen that when the file data to be written is new data with no existing storage, it is preferentially stored in a storage area into which no file data has been written, that is, a storage area whose state is free. If there is no storage area whose state is free, the corresponding storage area is acquired according to the file size of the file data to be written.
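The selection order of FIG. 2 (existing area first, then a free area, then an area of the matching size class) can be sketched in Python as follows. This is a minimal sketch, not the implementation of the embodiment: the names `Area`, `classify`, `select_area`, and `write` are hypothetical, the state is derived from the used space instead of being read from a database, and each write is assumed to be no larger than one storage area.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Area:
    size: int
    used: int = 0
    files: set = field(default_factory=set)
    size_class: Optional[str] = None  # "large" or "small" once allocated

    @property
    def state(self):
        if self.used == 0:
            return "free"
        return "full" if self.used >= self.size else "using"


def classify(file_size, threshold):
    """Step 207: greater than the threshold means large, otherwise small."""
    return "large" if file_size > threshold else "small"


def select_area(areas, file_name, file_size, chunk, threshold):
    # Steps 202-204: reuse the area that already holds this file if it has room.
    for area in areas:
        if file_name in area.files and area.size - area.used >= chunk:
            return area
    # Steps 205-206: otherwise prefer an area that has never been written.
    for area in areas:
        if area.state == "free":
            return area
    # Steps 207-209: otherwise pick an area of the matching size class.
    wanted = classify(file_size, threshold)
    for area in areas:
        if area.size_class == wanted and area.size - area.used >= chunk:
            return area
    return None  # no suitable area; the caller must handle this case


def write(area, file_name, file_size, chunk, threshold):
    """Record a write and tag a newly used area with its size class."""
    if area.size_class is None:
        area.size_class = classify(file_size, threshold)
    area.files.add(file_name)
    area.used += chunk
```

Scanning the pool from its head each time is a simplification; the embodiment describes continuing from the area after the last one allocated.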
On the basis of the embodiment corresponding to FIG. 1, when the current policy type is the file policy, performing step 103, that is, acquiring the corresponding storage area according to the current policy type, includes the following, as shown in FIG. 3:
301. Start.
Two queues are created for the storage areas in the storage pool, a large-file storage area queue and a small-file storage area queue, used for writing large files and small files to disk respectively.
302. Determine whether the file name exists in the file list of a storage area whose state is using; if so, go to step 304; if not, go to step 303.
In a specific implementation, the database can be queried by the file name of the file data to be written. If the file name is not in the file list of any storage area whose state is using, the file data to be written is either newly written or its storage area is already full (state full), and a new storage area must be applied for.
303. Determine the large/small-file identifier type of the storage area corresponding to the file data to be written, and go to step 305.
In a specific implementation, it can be determined whether the size of the file data to be written is greater than the preset threshold; if so, the corresponding large/small-file identifier type is large file; otherwise it is small file.
304. Determine the storage area corresponding to the file data to be written, and go to step 305.
More specifically, if the file name of the file data to be written exists in the file list of a storage area whose state is using, that storage area is the storage area corresponding to the file data to be written; otherwise, a storage area is applied for from those whose state is free, and if the application succeeds, the large/small-file identifier of that storage area is updated in the database. In another embodiment of the present invention, the address of a storage area may instead be taken from the head of the storage area queue matching the large/small-file identifier type, and that storage area is then moved to the tail of the queue.
305. The corresponding storage area reports the size of the data to be written this time to the database, and the flow goes to step 306.
The database promptly updates the write pointer of the storage area and determines whether the remaining space of the storage area is sufficient for the next write; if the storage area is full, information such as its state identifier is updated. Meanwhile, the storage area starts writing the data.
306. Determine whether all the data of this write has been written; if so, go to step 307; if not, return to step 302.
307. End.
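The loop of steps 302 to 306 can be sketched in Python as below. The sketch makes several assumptions that are not part of the embodiment: the database is replaced by plain dictionaries, the helper names `new_area` and `write_file` are hypothetical, the large/small-file queues are omitted for brevity, and an area is marked full only when its write pointer reaches its size.

```python
def new_area(size):
    """Create an in-memory record standing in for a database row."""
    return {"size": size, "write_ptr": 0, "files": set(), "state": "free"}


def write_file(areas, file_name, data_len, chunk_size):
    """Write data_len bytes in chunks; return a list of (area_index, nbytes)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    placements = []
    remaining = data_len
    while remaining > 0:  # step 306: repeat until all data is written
        # Step 302: look for a 'using' area whose file list has this file.
        idx = next((i for i, a in enumerate(areas)
                    if a["state"] == "using" and file_name in a["files"]), None)
        if idx is None:
            # Step 304: otherwise apply for an area whose state is free.
            idx = next((i for i, a in enumerate(areas)
                        if a["state"] == "free"), None)
            if idx is None:
                raise RuntimeError("no free storage area available")
        area = areas[idx]
        nbytes = min(chunk_size, remaining, area["size"] - area["write_ptr"])
        # Step 305: report the write size, update the pointer and the state.
        area["write_ptr"] += nbytes
        area["files"].add(file_name)
        area["state"] = "full" if area["write_ptr"] == area["size"] else "using"
        placements.append((idx, nbytes))
        remaining -= nbytes
    return placements


pool = [new_area(100), new_area(100)]
print(write_file(pool, "f", 150, 64))  # [(0, 64), (0, 36), (1, 50)]
```

In the example, the first area fills up after two chunks, so the remainder of the file spills into a newly allocated free area.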
On the basis of the embodiment corresponding to FIG. 1, when the current policy type is the directory policy, performing step 103, that is, acquiring the corresponding storage area according to the current policy type, includes the following, as shown in FIG. 4:
401. Start.
402. Acquire the directory name of the file data to be written.
403. Determine whether, among the storage areas in the in-use state, there is a storage area whose identifier list contains the directory name; if so, go to step 404; if not, go to step 405.
In the present invention, a storage area in the in-use state is a storage area whose state is using. That is, it is queried whether the directory name already exists in the directory identifier list of a storage area whose state is using. If so, the directory has a writable storage area. If not, the directory has no writable storage area, for one of two reasons: either this is the first time data of this directory is written to disk, or the directory has been written before but all of its storage areas are full (state full).
404. Use the storage area whose identifier list contains the directory name as the storage area corresponding to the file data to be written. After step 404 is completed, go to step 406.
405. Use a storage area whose state is free as the storage area corresponding to the file data to be written, and go to step 406.
That is, a storage area is applied for from those whose state is free, and if the application succeeds, the directory name identifier of that storage area is updated in the database.
406. End.
It can be seen that the directory policy stores the file fragments of all files in the same directory in a single storage area or a limited number of storage areas, achieving batch aggregation of files, which is especially suitable for sequential access to the files in a single directory.
On the basis of the embodiment corresponding to FIG. 4, after step 405 and before step 406, the method further includes:
The corresponding storage area reports the size of the data to be written this time to the database. The database promptly updates the write pointer of the storage area and determines whether the remaining space of the storage area is sufficient for the next write; if the storage area is full, information such as its state identifier is updated. Meanwhile, the storage area starts writing the data.
It is determined whether all the data of this write has been written. If data remains to be written, step 402 is performed again; if not, the flow goes to step 406.
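Steps 402 to 405 of the directory policy can be sketched as a lookup keyed by directory name. This Python sketch is illustrative only: the function name `select_area_for_directory` and the dictionary layout are hypothetical, paths are assumed to be POSIX-style, and a newly allocated area is marked using immediately instead of after the first write.

```python
import posixpath


def select_area_for_directory(areas, file_path):
    """Pick the storage area for a file according to its parent directory."""
    directory = posixpath.dirname(file_path)  # step 402
    # Steps 403-404: an area in the 'using' state that lists this directory.
    for area in areas:
        if area["state"] == "using" and directory in area["directories"]:
            return area
    # Step 405: otherwise apply for a free area and tag it with the directory.
    for area in areas:
        if area["state"] == "free":
            area["directories"].append(directory)
            area["state"] = "using"
            return area
    return None  # no writable area and no free area left
```

Files from the same directory therefore land in the same area until that area becomes full, at which point the next write for the directory triggers a new allocation.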
On the basis of the embodiment corresponding to FIG. 1, when the current policy type is the user policy, performing step 103, that is, acquiring the corresponding storage area according to the current policy type, includes the following, as shown in FIG. 5:
501. Start.
502. Acquire the user name of the file data to be written.
503. Determine whether, among the storage areas in the in-use state, there is a storage area whose identifier list contains the user name; if so, go to step 504; if not, go to step 505.
That is, it is queried whether the user name already exists in the user identifier list of a storage area whose state is using. If so, the user has a writable storage area. If not, the user has no writable storage area, for one of two reasons: either this is the first time the user writes data to disk, or the user has written before but all of the user's storage areas are full (state full).
504. Use the storage area whose identifier list contains the user name as the storage area corresponding to the file data to be written. After step 504 is completed, go to step 506.
505. Use a storage area whose state is free as the storage area corresponding to the file data to be written, and go to step 506.
That is, a storage area is applied for from those whose state is free, and if the application succeeds, the user identifier of that storage area is updated in the database.
506. End.
In this embodiment, files are classified by user and stored in different storage areas. That is, a storage area stores only the file data of the user to which it belongs.
In a specific implementation, the user identifier of a storage area is determined as follows: when a new user A requests to write a file to disk, a storage area whose state is free is applied for to store the file data. If the application succeeds, the user to which the storage area belongs is marked as A, the state of the storage area is set to using, and both are saved to the database. When user A subsequently writes to disk, the database is first queried for a storage area whose user identifier is A and whose state is using. If one is found, it is selected and written to; if not, a new storage area is applied for.
On the basis of the embodiment corresponding to FIG. 5, after step 505 and before step 506, the method further includes:
The corresponding storage area reports the size of the data to be written this time to the database. The database promptly updates the write pointer of the storage area and determines whether the remaining space of the storage area is sufficient for the next write; if the storage area is full, information such as its state identifier is updated. Meanwhile, the storage area starts writing the data.
It is determined whether all the data of this write has been written. If data remains to be written, step 502 is performed again; if not, the flow goes to step 506.
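The life cycle described for user A, including the return to step 502 when data remains, can be sketched as a small allocator. The class name `UserPolicyAllocator`, its methods, and the in-memory dictionaries that stand in for the database are all hypothetical; the sketch also assumes an area becomes full only when it is completely used.

```python
class UserPolicyAllocator:
    """Toy allocator in which every storage area belongs to at most one user."""

    def __init__(self, area_count, area_size):
        self.areas = [{"user": None, "state": "free", "used": 0, "size": area_size}
                      for _ in range(area_count)]

    def _find_using(self, user):
        # Step 503: an area owned by this user that is still writable.
        for index, area in enumerate(self.areas):
            if area["user"] == user and area["state"] == "using":
                return index
        return None

    def _apply_free(self, user):
        # Step 505: apply for a free area and record its owner.
        for index, area in enumerate(self.areas):
            if area["state"] == "free":
                area["user"] = user
                area["state"] = "using"
                return index
        raise RuntimeError("no free storage area available")

    def write(self, user, nbytes):
        """Write nbytes for user; return the indices of the areas written to."""
        touched = []
        remaining = nbytes
        while remaining > 0:  # repeat from step 502 while data remains
            index = self._find_using(user)
            if index is None:
                index = self._apply_free(user)
            area = self.areas[index]
            chunk = min(remaining, area["size"] - area["used"])
            area["used"] += chunk
            if area["used"] == area["size"]:
                area["state"] = "full"
            touched.append(index)
            remaining -= chunk
        return touched
```

For example, with three areas of 100 bytes, a user who writes 80 bytes and later 50 bytes fills the first area and continues in a newly allocated one, while areas owned by other users are never shared.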
In an embodiment of the present invention, when a file first requests to be written to disk, a storage area is applied for. If the state of the selected storage area is free, it is changed to using. Information such as the storage area state, the file name, the large/small-file identifier computed from the file size and the threshold, and the amount of data written this time is then reported to the database, which updates its stored information on receipt.
When a file subsequently requests to be written to disk, the database is queried by file name to check whether the file name already exists in the file list of some storage area. If it does, the file has previously written data to that storage area, so that storage area is selected to continue writing, and information such as the write pointer of the storage area is then updated in the database.
To reduce the dispersion of file fragments, storage area aggregation may additionally be performed after the end step of any of the foregoing embodiments. As shown in FIG. 6, it includes:
601. Start.
602. Obtain a storage area whose state is using.
603. Determine whether the storage space already used by the storage area whose state is using is less than the aggregation threshold; if so, proceed to step 604; if not, proceed to step 606.
The aggregation threshold includes a storage area threshold and a hold time threshold. If the usage rate of a storage area E has not reached the storage area threshold, and its data has not changed within the hold time threshold T, storage area E is determined to be below the aggregation threshold.
604. Record the storage area whose state is using as a sub-aggregation storage area, and proceed to step 605.
605. Aggregate at least two of the sub-aggregation storage areas into one of them to obtain an aggregated storage area, and proceed to step 606.
If the data in storage area E is aggregated into another storage area F of the same storage pool, storage area E must also be set to free after the migration.
606. End.
Before ending, the file list information, directory identifier, or user identifier of the aggregated storage area may be updated.
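The candidate test of steps 602 to 604 can be sketched as follows. The field names, the use of a usage fraction for the storage area threshold, and timestamps in seconds are assumptions for illustration; the source does not fix these representations.

```python
from dataclasses import dataclass


@dataclass
class Area:
    serial: int
    state: str          # "free", "using" or "full"
    used: int           # bytes already written
    capacity: int       # total bytes in the area
    last_change: float  # time of the most recent data change, in seconds


def below_aggregation_threshold(area, usage_threshold, hold_time, now):
    """True if usage is under the area threshold and the data has been idle for T."""
    usage = area.used / area.capacity
    idle = now - area.last_change
    return usage < usage_threshold and idle >= hold_time


def sub_aggregation_areas(areas, usage_threshold, hold_time, now):
    """Steps 602 to 604: collect the using areas that qualify for aggregation."""
    return [a for a in areas
            if a.state == "using"
            and below_aggregation_threshold(a, usage_threshold, hold_time, now)]
```

Requiring both conditions keeps areas that are still actively receiving data out of the aggregation pass, even when they are sparsely filled.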
The storage area aggregation shown in FIG. 6 operates on the storage areas in the storage pool whose state is using. A storage area whose state is full is already filled and needs no further aggregation, and a storage area whose state is free has never stored data, is empty, and likewise has nothing to aggregate. The basic method of storage area aggregation is to obtain the storage areas whose state is using one by one from the bottom of the storage pool toward the top, then examine candidates one by one from the top of the storage pool toward the bottom, and aggregate the storage areas that meet the conditions.
It should be noted that, in this embodiment, the storage areas in the storage pool are numbered sequentially from the top of the storage pool downward, and the number is recorded as the storage area serial number, which identifies the different storage areas. The serial numbers are assigned when the storage pool is initialized, and the information is updated to the database. Two linked lists are created for the storage areas in the storage pool whose state is using: one ordered from the bottom of the storage pool upward, denoted L1, and one ordered from the top of the storage pool downward, denoted L2.
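As a small illustration of the two orderings, the sketch below builds L1 and L2 from serial numbers. It uses plain Python lists of dictionaries in place of linked lists, and the `serial` and `state` keys are hypothetical names.

```python
def build_lists(areas):
    """Return (L1, L2) for the areas whose state is using."""
    using = [a for a in areas if a["state"] == "using"]
    # Serial numbers grow from the top of the pool downward, so ascending
    # order walks from the top to the bottom.
    l2 = sorted(using, key=lambda a: a["serial"])   # top of the pool first
    l1 = list(reversed(l2))                         # bottom of the pool first
    return l1, l2
```

Both lists hold the same areas; only the traversal direction differs.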
Taking storage area aggregation under the file policy as an example, as shown in FIG. 7, the process includes:
701. Start.
702. Starting from the bottom of the storage pool, obtain a storage area whose state is using, that is, obtain the first element of linked list L1.
703. Determine whether the storage space already used by this storage area is less than the aggregation threshold; if so, proceed to step 705; if not, proceed to step 704.
704. Obtain the next element of linked list L1, that is, move upward to the next storage area whose state is using. If one is obtained, execute step 703; if none is obtained, no storage area remains that needs aggregation, and the process proceeds to step 711.
705. Record this storage area as A, its storage area serial number as SN1, and its used space as K1; proceed to step 706.
706. Starting from the top of the storage pool, obtain a storage area whose state is using, that is, obtain the first element of linked list L2.
707. Determine whether the serial number of this storage area is greater than or equal to SN1. If so, no storage area remains into which storage area A can be aggregated, and step 704 is executed; if the serial number is less than SN1, step 708 is executed.
708. Determine whether the remaining space of this storage area is greater than K1. If not, the remaining space is insufficient to hold the data in storage area A, so this storage area is skipped and step 709 is executed; if so, record this storage area as B and execute step 710.
709. Obtain the next element of linked list L2, that is, move downward to the next storage area whose state is using, and execute step 707.
710. Migrate the data in storage area A to storage area B, and update the file list, storage area write pointer, and other information of storage area B; then delete the data in storage area A, set the state of storage area A to free, and update the changed information to the database. Execute step 704.
711. End.
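Steps 702 to 711 can be sketched in Python as follows. This is an illustrative reading rather than the patented implementation: the `Area` class and its fields are hypothetical, the aggregation threshold is reduced to a used-space limit (the hold time condition is omitted), and L2 is rebuilt for each source area so that areas already set to free are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    serial: int                 # numbered downward from the top of the pool
    capacity: int
    state: str = "free"         # "free", "using" or "full"
    data: list = field(default_factory=list)   # (name, size) pairs

    @property
    def used(self):
        return sum(size for _, size in self.data)


def aggregate(pool, threshold):
    # L1: areas in the using state, ordered from the bottom of the pool upward.
    l1 = sorted((a for a in pool if a.state == "using"),
                key=lambda a: a.serial, reverse=True)
    for a in l1:                                   # steps 702 and 704
        if a.used >= threshold:                    # step 703
            continue
        k1 = a.used                                # step 705
        # L2: using areas from the top; serial >= SN1 ends the search (707).
        l2 = sorted((b for b in pool
                     if b.state == "using" and b.serial < a.serial),
                    key=lambda b: b.serial)
        for b in l2:                               # steps 706 and 709
            if b.capacity - b.used > k1:           # step 708
                b.data.extend(a.data)              # step 710: migrate A into B
                a.data = []
                a.state = "free"
                break
```

Because sources are taken from the bottom of the pool and targets from the top, data drifts toward the low serial numbers and the freed areas collect at the bottom of the pool.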
Taking FIG. 8 as an example, there are M storage areas in total. Suppose storage area 1 holds data1, storage area 2 holds data2, storage area 3 holds data3, and storage area K holds datak, and that storage areas 1, 2, 3, and K are all sub-aggregation storage areas eligible for aggregation. Using the method shown in FIG. 8, the resulting aggregated storage area is storage area 1, which stores data1, datak, data3, and data2 in that order.
The aggregation steps for the directory policy and the user policy are essentially the same as those for storage area aggregation under the file policy. The difference is that, after the data in storage area A is migrated to storage area B, the directory identifier or user identifier of storage area B must additionally be updated: the directory identifier or user identifier of storage area A is added to storage area B, so that storage area B corresponds to two or more directory identifiers or user identifiers, and the changed information is updated to the database.
Thus, in this embodiment, storage area aggregation is implemented by calculating the used space of storage area A and then searching, from the head of the storage pool, for a storage area whose state is using and whose remaining space is greater than the used space of storage area A, which serves as the aggregation destination storage area B. After the aggregation is complete, the file name list and other information of destination storage area B are updated. Under the directory policy, the directory identifier of storage area A must be added to the directory identifiers of storage area B, so that file operations under the directory originally mapped to storage area A carry over to storage area B. Similarly, under the user policy, the user identifier of storage area A must be added to the user identifiers of storage area B, so that file operations of the user originally mapped to storage area A carry over to storage area B.
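The identifier update for the directory and user policies can be sketched as below. The dictionary layout (`ids`, `files`, `state`) and both function names are assumptions for illustration; an identifier here is either a directory name or a user name, depending on the policy in force.

```python
def merge_identifiers(source, target):
    """After migrating A (source) into B (target), let B answer for A's identifiers."""
    for ident in source["ids"]:
        if ident not in target["ids"]:
            target["ids"].append(ident)
    target["files"].extend(source["files"])
    # The source area is emptied and returned to the free state.
    source["ids"], source["files"], source["state"] = [], [], "free"


def area_for(areas, ident):
    """Find the using area whose identifier list contains `ident`, if any."""
    return next((a for a in areas
                 if a["state"] == "using" and ident in a["ids"]), None)
```

After the merge, a lookup by the old directory or user identifier resolves to the destination area, so later writes for that directory or user continue there.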
In addition, an embodiment of the present invention further provides a distributed file system, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the file storage method for a distributed file system provided by the embodiments of the present invention.
It should be noted that the foregoing distributed file system embodiment and the method embodiments belong to the same concept. For the specific implementation process, see the method embodiments; the technical features of the method embodiments apply correspondingly to the distributed file system embodiment and are not repeated here.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a file storage program, where the file storage program, when executed by a processor, implements the steps of the file storage method for a distributed file system provided by the embodiments of the present invention.
It should be noted that the foregoing embodiment of the file storage program on the computer-readable storage medium and the method embodiments belong to the same concept. For the specific implementation process, see the method embodiments; the technical features of the method embodiments apply correspondingly to the computer-readable storage medium embodiment and are not repeated here.
The present invention provides a file storage method for a distributed file system, a distributed file system, and a computer-readable storage medium. Files are stored according to the current policy as far as possible already at the storage stage, which reduces the generation of small disk fragments, achieves aggregation of file storage, optimizes the management and allocation of disk space, and improves disk read/write performance.
By storing the data of one file in the same storage area, the present invention effectively reduces the dispersion of file fragments, achieves a degree of file aggregation, and improves file access throughput.
The present invention stores large files and small files in different storage areas, achieving classified aggregation of large and small files. This reduces file fragmentation more effectively, reduces the number of back-and-forth seeks by the disk head, and improves system data throughput.
The present invention stores the file fragments of all files in the same directory together in a single storage area or a limited number of storage areas, achieving batch aggregation of files. This is especially suitable for sequential access to files in a single directory.
The present invention stores the file fragments of all files of the same user together in a single storage area or a limited number of storage areas, achieving batch aggregation of files. This gives higher disk I/O and data throughput when a user performs a large number of disk accesses within a period of time.
When the amount of data written to disk within a period of time T is small, the distribution of file fragments remains relatively dispersed. In that case, the present invention can aggregate the file fragments of two or more storage areas in the storage pool into one storage area, which effectively reduces the dispersion of file fragments and also optimizes disk space management and reallocation.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings; they do not limit the scope of the claims of the present invention. Those skilled in the art may implement the present invention in many variations without departing from its scope and spirit; for example, a feature of one embodiment may be used in another embodiment to obtain yet another embodiment. Any modification, equivalent substitution, or improvement made within the technical concept of the present invention shall fall within the scope of the claims of the present invention.

Claims (10)

  1. A file storage method for a distributed file system, comprising:
    obtaining file data to be written;
    obtaining a corresponding storage area according to a current policy type, wherein the policy type comprises a file policy based on the size of the file to be written, a directory policy based on the directory to which the file to be written belongs, or a user policy based on the user to which the file to be written belongs; and
    writing the file data to be written into the corresponding storage area.
  2. The file storage method for a distributed file system according to claim 1, wherein, in a case where the current policy type is the file policy and the file data to be written is new data that has not previously been stored, the obtaining a corresponding storage area according to a current policy type comprises:
    determining whether there is a storage area to which no file data has been written; and if not,
    obtaining the corresponding storage area according to the file size of the file data to be written.
  3. The file storage method for a distributed file system according to claim 2, wherein the obtaining the corresponding storage area according to the file size of the file data to be written comprises:
    determining whether the file size of the file data to be written is greater than a preset threshold;
    if so, determining that the file data to be written is a large file, and using a storage area in which large files are already stored as the storage area corresponding to the file data to be written; and
    if not, determining that the file data to be written is a small file, and using a storage area in which small files are already stored as the storage area corresponding to the file data to be written.
  4. The file storage method for a distributed file system according to claim 1, wherein, when the file data to be written is a file fragment of an already written file, the obtaining a corresponding storage area according to a current policy type comprises:
    determining whether the storage area in which the already written file is located is full; and if not,
    using the storage area in which the already written file is located as the storage area corresponding to the file data to be written.
  5. The file storage method for a distributed file system according to claim 1, wherein, in a case where the current policy type is the directory policy, the obtaining a corresponding storage area according to a current policy type comprises:
    obtaining a directory name of the file data to be written;
    determining whether, among the storage areas in the using state, there is a storage area whose identifier list contains the directory name; and if so,
    using the storage area whose identifier list contains the directory name as the storage area corresponding to the file data to be written.
  6. The file storage method for a distributed file system according to claim 1, wherein, in a case where the current policy type is the user policy, the obtaining a corresponding storage area according to a current policy type comprises:
    obtaining a user name of the file data to be written;
    determining whether, among the storage areas in the using state, there is a storage area whose identifier list contains the user name; and if so,
    using the storage area whose identifier list contains the user name as the storage area corresponding to the file data to be written.
  7. The file storage method for a distributed file system according to claim 1, wherein, after the writing of the file data on the storage area and the updating of attribute information of the storage area in a database, the method comprises:
    obtaining a storage area whose state is using;
    determining whether the storage space already used by the storage area whose state is using is less than an aggregation threshold;
    if so, recording the storage area whose state is using as a sub-aggregation storage area; and
    aggregating at least two of the sub-aggregation storage areas into one of the sub-aggregation storage areas to obtain an aggregated storage area.
  8. The file storage method for a distributed file system according to claim 1, wherein, after the aggregating of the sub-aggregation storage areas into one of the sub-aggregation storage areas to obtain an aggregated storage area, the method comprises:
    updating file list information, a directory identifier, or a user identifier of the aggregated storage area.
  9. A distributed file system, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the file storage method for a distributed file system according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing a file storage program, wherein the file storage program, when executed by a processor, implements the steps of the file storage method for a distributed file system according to any one of claims 1 to 8.
PCT/CN2019/074332 2018-02-01 2019-02-01 File storage method for distributed file system and distributed file system WO2019149261A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810103081.6 2018-02-01
CN201810103081.6A CN110109886B (en) 2018-02-01 2018-02-01 File storage method of distributed file system and distributed file system

Publications (1)

Publication Number Publication Date
WO2019149261A1 true WO2019149261A1 (en) 2019-08-08

Family

ID=67478607

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/074332 WO2019149261A1 (en) 2018-02-01 2019-02-01 File storage method for distributed file system and distributed file system

Country Status (2)

Country Link
CN (1) CN110109886B (en)
WO (1) WO2019149261A1 (en)

Also Published As

Publication number Publication date
CN110109886B (en) 2022-11-18
CN110109886A (en) 2019-08-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19746671

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/12/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 19746671

Country of ref document: EP

Kind code of ref document: A1