CN116303280A - File data hierarchical storage method and device - Google Patents

Info

Publication number
CN116303280A
Authority
CN
China
Prior art keywords
file
migration
data
read
data layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310149147.6A
Other languages
Chinese (zh)
Inventor
王卫伟
高利娟
杨佳东
孙涛
江云飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 52 Research Institute
Original Assignee
CETC 52 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 52 Research Institute filed Critical CETC 52 Research Institute
Priority to CN202310149147.6A priority Critical patent/CN116303280A/en
Publication of CN116303280A publication Critical patent/CN116303280A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/172: Caching, prefetching or hoarding of files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file data hierarchical storage method and device. The method recursively traverses the size and type attributes of all files and preferentially migrates files to a target layer of the file system according to those attributes. The migration score of each file is calculated from the file's access heat, its previous migration score, and an adjustment coefficient, where the adjustment coefficient is set according to the size and type attributes of the file. This scoring approach applies to a wider range of scenarios, makes file migration more reasonable and more accurate, and better improves the access performance of a hierarchical file storage system. When a file is migrated across file system layers, it keeps a unified service IO access path, so no link inode redirecting to the migrated file needs to be created. This reduces resource usage, reduces IO amplification during file reads and writes, and lowers system complexity.

Description

File data hierarchical storage method and device
Technical Field
The invention belongs to the field of data storage of file systems, and particularly relates to a hierarchical storage method and device for file data.
Background
In the field of data storage, hierarchical storage is a common technique: data is placed on storage media with different characteristics according to how hot or cold it is, that is, how frequently the service accesses it. This improves storage system performance while keeping cost in check. Storing service data of different heat on different media improves the cost-effectiveness of the whole storage system.
Hierarchical storage is used mainly in block storage systems and file storage systems. For file storage systems, the industry mostly relies on a management system that tiers data automatically: built-in policies identify how hot or cold the data is and move it to different storage media, with cold data placed on low-cost, low-speed media and hot data on high-cost, high-speed media, which improves the cost-effectiveness of the whole storage system. This approach is convenient, reduces the cost of classifying and tiering data, and improves storage efficiency. However, current tiering methods for file storage systems require the access heat of the data to be collected and analyzed in detail in advance. The logic is therefore complex and error-prone, and an unreasonable migration policy can easily cause excessive data migration, which consumes too many controller resources and affects service IO. For example, when only some data blocks of a large file are accessed, the whole file is migrated to a higher-performance storage medium, which wastes controller resources, and the migration itself can easily cause service IO fluctuation.
Patent CN111741107A discloses a tiering method and device based on a file storage system, and an electronic device. It determines file access heat solely from the file access time obtained by querying file attributes, and performs tiered data migration based on that heat. All files share the same migration policy; neither the differing access patterns of individual files nor attributes such as file size are taken into account. (For example, in database scenarios small files are the majority and are expected to be migrated to the hot data layer first, whereas in unstructured data scenarios such as images and video, large files dominate and large file objects are the main migration targets.) As a result, the efficiency of tiered migration cannot be fully realized across different kinds of file data. In addition, after a file is migrated across layers in the prior art, a soft link pointing to the original file system path must be created, and service IO continues to access the pre-migration path, so that the access path of the migrated file stays unchanged and external services are not affected. Creating these extra soft links can consume an excessive number of file system inodes, especially when many small files are migrated. This increases the complexity of file access, prevents the file system from making full use of its space, and introduces additional read-write amplification.
Disclosure of Invention
The invention aims to solve the problems described in the Background and provides a file data hierarchical storage method and device.
To achieve this purpose, the invention adopts the following technical scheme.
The invention provides a file data hierarchical storage method, which comprises:
recursively traversing the size and type attributes of all files, and preferentially migrating files to a file system target layer according to the file attribute types;
counting the read-write request information of the files in the most recent period, and calculating the access heat of each file from the read-write request information;
calculating the migration score of each file from the access heat, the previous migration score, and the adjustment coefficient;
sorting the files by migration score; and
migrating each file to the corresponding file system target layer according to the sorted migration scores.
Preferably, the file system target layer includes a hot data layer, a data layer, and a cold data layer.
Preferably, the read-write request information includes the number of reads and writes and the read-write data length range of each file.
When the access heat is calculated, the number of reads and writes and the read-write data length range of each file are combined by weighting.
Preferably, when the read-write data length range is calculated, the data offset address and length of each read-write operation on the file are recorded, and overlapping data is removed.
Preferably, the adjustment coefficient α is adjusted according to the attributes of the file to be migrated as set by the user:
when the attribute type of the file to be migrated is set to migrate to the hot data layer, α > 0;
when the attribute type of the file to be migrated is set to migrate to the cold data layer, α < 0.
Preferably, migrating each file to the corresponding file system target layer according to the sorted migration scores includes:
when the migration score of the file to be migrated is greater than the migration threshold of the hot data layer, migrating the file to the hot data layer;
when the migration score of the file to be migrated is less than the migration threshold of the data layer, migrating the file to the cold data layer;
when the migration score of the file to be migrated is between the migration threshold of the hot data layer and the migration threshold of the data layer, migrating the file to the data layer.
Preferably, the file data hierarchical storage device comprises an access module, a statistics module, a migration score calculation module, a sorting module, and a migration module, wherein:
the access module is used for accessing the file system;
the statistics module counts the read-write request information of the files in the most recent period and calculates the access heat of each file from the read-write request information;
the migration score calculation module calculates the migration score of each file from the access heat, the previous migration score, and the adjustment coefficient;
the sorting module sorts the files by migration score;
the migration module migrates each file to the file system of the hot data layer, the data layer, or the cold data layer according to the sorted migration scores.
Compared with the prior art, the invention has the beneficial effects that:
1. The method calculates the migration score of each file from the file access heat, the previous migration score, and the adjustment coefficient, where the access heat is a weighted combination of the number of reads and writes and the read-write data length range of each file, and the adjustment coefficient is set according to the size and type attributes of the file;
2. When a file is migrated across file system layers, it keeps a unified service IO access path, so no link inode redirecting to the migrated file needs to be created. This reduces resource usage, reduces IO amplification during file reads and writes, and lowers system complexity.
Drawings
FIG. 1 is a flow chart of a hierarchical storage method of file data according to the present invention;
FIG. 2 is a block diagram of a file system layer of the present invention;
FIG. 3 is a flow chart of the calculation of the migration score of the file according to the present invention.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It will be understood that when an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
In one embodiment, as shown in fig. 1-3, a method for hierarchically storing file data includes:
S1: Recursively traverse the size and type attributes of all files, and preferentially migrate files to a file system target layer according to the file attribute types.
Note that the file system target layer includes a hot data layer, a data layer, and a cold data layer. A relation between file attributes and the file system target layer is preset. For example, if the file type is one of ".mp4/.doc/.jpg" or the file size is greater than 10 MB, the file is preferentially migrated to the hot data layer; if the file type is one of ".mp4/.doc/.jpg" or the file size is less than 5 MB, the file is preferentially migrated to the cold data layer. Preferential migration does not guarantee migration: the actual migration is decided by the migration score described below, and the file attribute setting only affects the adjustment coefficient.
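The traversal in S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `FileAttr`, `AttrPolicy`, `scan_files`, and `preferred_layer` are hypothetical, and checking the hot rule before the cold rule is an assumption made here to resolve files that match both.

```python
import os
from dataclasses import dataclass
from typing import Iterator, Optional

MB = 1024 * 1024


@dataclass(frozen=True)
class FileAttr:
    path: str
    size: int  # size in bytes
    ext: str   # lower-cased extension such as ".mp4"


@dataclass(frozen=True)
class AttrPolicy:
    # Hypothetical user-set policy; the field names are illustrative only.
    hot_exts: frozenset = frozenset()
    hot_min_size: Optional[int] = None   # files larger than this prefer hot
    cold_exts: frozenset = frozenset()
    cold_max_size: Optional[int] = None  # files smaller than this prefer cold


def scan_files(root: str) -> Iterator[FileAttr]:
    """Recursively walk root and yield size and type for each regular file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):
                continue
            ext = os.path.splitext(name)[1].lower()
            yield FileAttr(full, os.path.getsize(full), ext)


def preferred_layer(attr: FileAttr, policy: AttrPolicy) -> Optional[str]:
    """Return "hot", "cold", or None; this is a preference, not a migration."""
    if attr.ext in policy.hot_exts or (
        policy.hot_min_size is not None and attr.size > policy.hot_min_size
    ):
        return "hot"  # assumption: the hot rule wins when both rules match
    if attr.ext in policy.cold_exts or (
        policy.cold_max_size is not None and attr.size < policy.cold_max_size
    ):
        return "cold"
    return None
```

The returned preference only feeds the adjustment coefficient; whether a file actually moves is still decided by its migration score.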
S2: Count the read-write request information of the files in the most recent period, and calculate the access heat of each file from the read-write request information.
Specifically, a uniform migration period is set as required.
The read-write request information includes the number of reads and writes and the read-write data length range of each file.
The access heat is a weighted combination of the number of reads and writes and the read-write data length range of each file. To calculate the read-write data length range, the data offset address and length of every read-write operation on the file are recorded and overlapping data is removed, which gives the total region of the file that was read or written during the statistics period. Finally, the ratio of this read-write range to the whole file size is calculated, and files with a high ratio are preferentially migrated to the hot data layer.
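The overlap removal and the coverage ratio described above can be sketched with a standard interval merge. The weights `w_count` and `w_range` and the cap `count_cap` used to normalize the operation count are assumptions for illustration; the source does not give their values or the exact weighting formula.

```python
from typing import List, Tuple


def covered_bytes(ops: List[Tuple[int, int]]) -> int:
    """Count distinct bytes touched by (offset, length) operations."""
    intervals = sorted((off, off + length) for off, length in ops if length > 0)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def access_heat(
    ops: List[Tuple[int, int]],
    file_size: int,
    w_count: float = 0.5,   # assumed weight of the read-write count
    w_range: float = 0.5,   # assumed weight of the coverage ratio
    count_cap: int = 100,   # assumed cap used to normalize the count
) -> float:
    """Weighted combination of operation count and covered fraction of the file."""
    if file_size <= 0:
        return 0.0
    count_term = min(len(ops), count_cap) / count_cap
    range_term = min(covered_bytes(ops) / file_size, 1.0)
    return w_count * count_term + w_range * range_term
```

With this sketch, three operations `(0, 100)`, `(50, 100)`, and `(300, 50)` cover 200 distinct bytes, so a 400-byte file has a coverage ratio of 0.5 even though the raw lengths sum to 250.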
S3: Calculate the migration score of each file from the access heat, the previous migration score, and the adjustment coefficient.
Specifically, the migration score of each file is obtained by weighting the access heat, the previous migration score (if any), and the adjustment coefficient. The default value of the adjustment coefficient is 0, and it changes according to the attributes of the file. As shown in FIG. 3, the access heat corresponds to the current-period migration score score', the previous migration score corresponds to score(n-1), and the migration score of each file corresponds to score.
The adjustment coefficient α is adjusted according to the attributes of the file to be migrated as set by the user:
when the attribute type of the file to be migrated is set to migrate to the hot data layer, α > 0, which increases the migration score of the file;
when the attribute type of the file to be migrated is set to migrate to the cold data layer, α < 0, which decreases the migration score of the file.
For example, if the user expects files of type ".mp4/.doc/.jpg", or files larger than a certain size, to be migrated to the hot data layer, the underlying system automatically sets α for those files to a value greater than 0, raising their overall migration score so that they are preferentially migrated to the hot data layer. Conversely, if the user expects files of type ".mp4/.doc/.jpg", or files smaller than a certain size, to be migrated to the cold data layer, then once the migration policy is set the underlying system automatically sets α for those files to a value less than 0, lowering their overall migration score so that, after sorting, it falls below the migration threshold of the data layer and the files are preferentially migrated to the cold data layer.
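One possible reading of S3 is a linear combination, sketched below. The source states only that the three inputs are weighted and that α is positive for hot-preferred files, negative for cold-preferred files, and 0 by default. The weights `w_heat` and `w_prev`, the magnitude of α, and adding α as a plain offset are all assumptions.

```python
from typing import Optional


def adjustment_coefficient(preferred: Optional[str], magnitude: float = 0.2) -> float:
    """Map a preferred layer to alpha; the magnitude 0.2 is an assumed value."""
    if preferred == "hot":
        return magnitude    # alpha > 0 raises the migration score
    if preferred == "cold":
        return -magnitude   # alpha < 0 lowers the migration score
    return 0.0              # default when no attribute policy matches


def migration_score(
    heat: float,
    prev_score: Optional[float] = None,
    alpha: float = 0.0,
    w_heat: float = 0.7,  # assumed weight of the current-period access heat
    w_prev: float = 0.3,  # assumed weight of the previous migration score
) -> float:
    """Combine current heat, the previous score (if any), and alpha."""
    if prev_score is None:
        # First period: there is no history to blend in.
        return heat + alpha
    return w_heat * heat + w_prev * prev_score + alpha
```

Blending in the previous score smooths the result over periods, so a single burst of access does not immediately move a file between layers.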
S4: Sort the files by migration score.
Specifically, the files are sorted by migration score from high to low, so that the migration module can then migrate each file to its target data layer in score order.
S5: Migrate each file to the corresponding file system target layer according to the sorted migration scores.
Specifically, the hot data layer, the data layer, and the cold data layer are arranged from top to bottom, and a migration threshold is set for the hot data layer and for the data layer respectively:
when the migration score of the file to be migrated is greater than the migration threshold of the hot data layer, the file is migrated to the hot data layer;
when the migration score of the file to be migrated is less than the migration threshold of the data layer, the file is migrated to the cold data layer;
when the migration score of the file to be migrated is between the migration threshold of the hot data layer and the migration threshold of the data layer, the file is migrated to the data layer.
Cross-layer migration is also allowed: as long as the migration score of a file satisfies the migration thresholds of the hot data layer and the data layer, the file can be migrated directly to the corresponding file system layer, for example directly from the cold data layer to the hot data layer.
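Steps S4 and S5 can be sketched together as a sort followed by a two-threshold classification. The function names, the layer labels "hot", "data", and "cold", and the treatment of a score exactly equal to a threshold (assigned to the data layer here) are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def target_layer(score: float, hot_threshold: float, data_threshold: float) -> str:
    """Classify a score using the hot-layer and data-layer migration thresholds."""
    if hot_threshold <= data_threshold:
        raise ValueError("hot threshold must be above the data-layer threshold")
    if score > hot_threshold:
        return "hot"
    if score < data_threshold:
        return "cold"
    return "data"  # between the two thresholds (boundaries included by assumption)


def plan_migrations(
    scores: Dict[str, float],
    current: Dict[str, str],
    hot_threshold: float,
    data_threshold: float,
) -> List[Tuple[str, str, str]]:
    """Return (path, source layer, destination layer) in descending score order."""
    plan = []
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for path, score in ordered:
        dst = target_layer(score, hot_threshold, data_threshold)
        src = current[path]
        if dst != src:
            # Cross-layer moves such as cold -> hot are allowed directly.
            plan.append((path, src, dst))
    return plan
```

Because the destination depends only on the score, a file on the cold data layer whose score exceeds the hot threshold is planned as a direct cold-to-hot move, with no intermediate stop on the data layer.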
As shown in fig. 2, hard disk types corresponding to the hot data layer, the data layer and the cold data layer of the file system are SSD, SAS and NL-SAS, respectively.
In another embodiment, the file data hierarchical storage device includes an access module, a statistics module, a migration score calculation module, a sorting module, and a migration module, wherein:
the access module is used for accessing the file system;
Specifically, the access module hides the different back-end file systems and exposes a unified file system access interface. After a custom file system interface callback function is registered with the operating system, whenever application IO accesses the file system, the kernel virtual file system invokes the callback method registered in S101 and forwards the IO to the access module, which receives the service IO and performs subsequent processing. The access module does not store file data; the actual file data is still recorded in the real xfs/ext4 file systems of each back-end layer.
The access module forms the absolute access path of each file from the virtual file system mount directory and the actual mount path of the back-end file system. It maintains the absolute path information of all files, so external callers can reach every back-end file system through the access module to read and write data. In addition, the initial file system layer for data writes can be set according to user requirements. For example, data may be written to the hot data layer first by default, or to the data layer or the cold data layer first.
If cross-layer migration occurs (for example, a file is migrated from the xfs file system of the data layer to the ext4 file system of the hot data layer), the actual back-end storage path of the file changes (for example, the path /mnt/xfs/file in the xfs file system becomes /mnt/ext4/file after migration to the ext4 file system). Through the access module, however, the virtual mount directory path of the file is unchanged (for example, seen from outside, the path of the file is /xfs/file both before and after migration), so service IO does not perceive the migration. The specific process is shown in FIG. 2.
After a file is migrated across layers, the access module is responsible for updating and maintaining its path information, and when the file is accessed the access module automatically forwards the IO to the file system layer the file was migrated to. With this method, a unified service IO access path is still maintained after migration across file system layers, and no link inode redirecting to the migrated file path needs to be created. This reduces resource usage, reduces read-write IO amplification, and lowers system complexity.
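The path indirection performed by the access module can be illustrated with a small user-space sketch. It is not the kernel virtual file system callback mechanism described above: the class `AccessModule`, its method names, and the in-memory table mapping each virtual path to a layer are hypothetical, and `shutil.move` merely stands in for the real migration.

```python
import os
import shutil
from typing import Dict


class AccessModule:
    """Keeps virtual paths stable while the back-end location changes."""

    def __init__(self, layer_mounts: Dict[str, str], default_layer: str):
        self._mounts = dict(layer_mounts)     # layer name -> back-end mount path
        self._default_layer = default_layer   # initial layer for new writes
        self._layer_of: Dict[str, str] = {}   # virtual path -> current layer

    def _join(self, layer: str, virtual_path: str) -> str:
        return os.path.join(self._mounts[layer], virtual_path.lstrip("/"))

    def backend_path(self, virtual_path: str) -> str:
        """Resolve the real storage path from the maintained path table."""
        return self._join(self._layer_of[virtual_path], virtual_path)

    def write(self, virtual_path: str, data: bytes) -> None:
        layer = self._layer_of.setdefault(virtual_path, self._default_layer)
        path = self._join(layer, virtual_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def read(self, virtual_path: str) -> bytes:
        with open(self.backend_path(virtual_path), "rb") as f:
            return f.read()

    def migrate(self, virtual_path: str, dst_layer: str) -> None:
        """Move the data and update the table; no soft link is left behind."""
        src = self.backend_path(virtual_path)
        dst = self._join(dst_layer, virtual_path)
        if src == dst:
            return
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
        self._layer_of[virtual_path] = dst_layer
```

Callers keep using the same virtual path before and after `migrate`; only the entry in the path table changes, which is why no redirecting link inode is needed on the source file system.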
the statistics module counts the read-write request information of the files in the most recent period and calculates the access heat of each file from the read-write request information;
the migration score calculation module calculates the migration score of each file from the access heat, the previous migration score, and the adjustment coefficient;
the sorting module sorts the files by migration score;
the migration module migrates each file to the file system of the hot data layer, the data layer, or the cold data layer according to the sorted migration scores.
The method calculates the migration score of each file from the file access heat, the previous migration score, and the adjustment coefficient, where the access heat is a weighted combination of the number of reads and writes and the read-write data length range of each file, and the adjustment coefficient is set according to the size and type attributes of the file. When a file is migrated across file system layers, it keeps a unified service IO access path, so no link inode redirecting to the migrated file needs to be created. This reduces resource usage, reduces IO amplification during file reads and writes, and lowers system complexity.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above-described embodiments are merely representative of the more specific and detailed embodiments described herein and are not to be construed as limiting the claims. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (7)

1. A file data hierarchical storage method is characterized in that: the file data hierarchical storage method comprises the following steps:
recursively traversing the size and type attributes of all files, and preferentially migrating the files to a file system target layer according to the types of the file attributes;
counting the read-write request information of the files in the last period of time, and calculating the access heat of each file according to the read-write request information;
comprehensively calculating the migration score of each file according to the access heat, the last migration score and the adjustment coefficient;
sorting the files according to the migration scores;
and migrating each file to a corresponding file system target layer according to the ordered migration scores.
2. The hierarchical file data storage method according to claim 1, wherein: the file system target layer includes a hot data layer, a data layer, and a cold data layer.
3. The hierarchical file data storage method according to claim 1, wherein: the read-write request information comprises read-write times and read-write data length ranges of each file;
and when the access heat is calculated, weighting calculation is carried out on the read-write times and the read-write data length range of each file.
4. A file data hierarchical storage method according to claim 3, wherein: and when the length range of the read-write data is calculated, counting the data offset address and the length of the read-write operation of each file, and removing overlapped data.
5. The hierarchical file data storage method according to claim 2, wherein: the adjustment coefficient α is adjusted according to the attributes of the file to be migrated as set by the user:
when the attribute type of the file to be migrated is set to migrate to the hot data layer, α > 0;
when the attribute type of the file to be migrated is set to migrate to the cold data layer, α < 0.
6. The hierarchical file data storage method according to claim 2, wherein: migrating each file to a corresponding file system target layer according to the ordered migration scores, including:
when the migration score of the file to be migrated is larger than the migration threshold of the hot data layer, migrating the file to the hot data layer;
when the migration score of the file to be migrated is smaller than the migration threshold of the data layer, migrating the file to the cold data layer;
and when the migration score of the file to be migrated is between the migration threshold of the hot data layer and the migration threshold of the data layer, migrating the file to the data layer.
7. A hierarchical storage device for file data, characterized in that: the file data hierarchical storage device comprises an access module, a statistics module, a migration score calculation module, a sorting module and a migration module, wherein:
the access module is used for accessing the file system;
the statistics module is used for counting the read-write request information of the files in the last period of time and calculating the access heat of each file according to the read-write request information;
the migration score calculation module comprehensively calculates the migration score of each file according to the access heat, the last migration score and the adjustment coefficient;
the sorting module sorts the files according to the migration scores;
and the migration module migrates each file to file systems of the hot data layer, the data layer and the cold data layer according to the ordered migration scores.
CN202310149147.6A 2023-02-22 2023-02-22 File data hierarchical storage method and device Pending CN116303280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310149147.6A CN116303280A (en) 2023-02-22 2023-02-22 File data hierarchical storage method and device

Publications (1)

Publication Number Publication Date
CN116303280A true CN116303280A (en) 2023-06-23

Family

ID=86828026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310149147.6A Pending CN116303280A (en) 2023-02-22 2023-02-22 File data hierarchical storage method and device

Country Status (1)

Country Link
CN (1) CN116303280A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination