CN111984196A - File migration method, device, equipment and readable storage medium - Google Patents

Info

Publication number
CN111984196A
Authority
CN
China
Prior art keywords
file
migration
invalid
small
aggregated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010850086.2A
Other languages
Chinese (zh)
Other versions
CN111984196B (en)
Inventor
孙业宽
孟祥瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010850086.2A (patent CN111984196B)
Publication of CN111984196A
Application granted
Publication of CN111984196B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0253 Garbage collection, i.e. reclamation of unreferenced memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file migration method, device, equipment, and readable storage medium. The method comprises: receiving a file migration task; judging whether the file migration task is an unfinished task; if so, determining invalid files from among the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migration small files obtained by aggregating the respective small files; and performing garbage collection on the invalid files. The method can reclaim invalid data generated during aggregation migration and can thereby improve disk utilization.

Description

File migration method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a file migration method, apparatus, device, and readable storage medium.
Background
Disks often read and write large files significantly faster than small files. To exploit this characteristic in scenarios that involve large numbers of small files, the small files are not written to disk directly; instead, they are aggregated into a large file that is then written to disk. This effectively reduces the number of small-file writes to disk, relieves data write pressure, improves the read hit rate, and shortens the read I/O path.
However, if the migration client hangs during the migration aggregation process, the process cannot run to completion, and invalid garbage data is produced. This garbage data occupies storage space and lowers storage utilization.
In summary, how to effectively clean up the disk after a failure in the small file migration process is a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The invention aims to provide a file migration method, a file migration device, file migration equipment and a readable storage medium, which can effectively recycle garbage data generated by aggregation migration faults.
To solve the above technical problem, the invention provides the following technical solutions:
A file migration method, comprising:
receiving a file migration task;
judging whether the file migration task is an unfinished task;
if so, determining invalid files from among the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, wherein the aggregated large file contains the migration small files obtained by aggregating the respective small files;
and performing garbage collection on the invalid files.
Preferably, determining an invalid file from the aggregated large file located in the destination storage pool and the small files located in the source storage pools corresponding to the file migration task includes:
acquiring the aggregation attribute of each small file;
and determining the invalid file from the aggregated large file and each small file by using the aggregated attribute.
Preferably, determining the invalid file from the aggregated large file and each of the small files by using the aggregated attribute includes:
judging whether the aggregation attribute changes;
if so, determining the corresponding small file as the invalid file;
if not, determining that the corresponding migration small file stored in the aggregation large file is an invalid file.
Preferably, performing garbage collection on the invalid file includes:
if the invalid file is the small file, deleting the small file in the source storage pool;
and if the invalid file is the migration small file, performing fragment recovery processing on the aggregated large file.
Preferably, performing fragment recovery processing on the aggregated large file comprises:
acquiring file header information of the aggregated large file;
determining the invalid data proportion in the aggregated large file by using the file header information;
and when the invalid data proportion is greater than a threshold, migrating the valid data in the aggregated large file to a target aggregated large file and deleting the aggregated large file.
Preferably, performing fragment recovery processing on the aggregated large file comprises:
acquiring the aggregation attributes corresponding to all files in the aggregated large file;
determining each file whose aggregation attribute is unchanged to be invalid data, and counting the invalid data proportion;
and when the invalid data proportion is greater than a threshold, migrating the valid data in the aggregated large file to a target aggregated large file and deleting the aggregated large file.
Preferably, the receiving a file migration task includes:
receiving the file migration task sent by a metadata server;
correspondingly, after garbage collection of the invalid file is completed, the method further comprises:
feeding back cleaning response data to the metadata server.
A file migration apparatus comprising:
the task receiving module is used for receiving a file migration task;
the judging module is used for judging whether the file migration task is an unfinished task;
the invalid file determining module is used for determining, if the judgment result is yes, invalid files from among the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, wherein the aggregated large file contains the migration small files obtained by aggregating the respective small files;
and the garbage collection processing module is used for performing garbage collection on the invalid files.
A file migration device comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the file migration method when executing the computer program.
A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the file migration method described above.
The method provided by the embodiment of the invention comprises: receiving a file migration task; judging whether the file migration task is an unfinished task; if so, determining invalid files from among the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migration small files obtained by aggregating the respective small files; and performing garbage collection on the invalid files.
It can be understood that when the file hierarchical migration client fails, the migration task it is processing cannot be completed. If a small file has already been written into the aggregated large file at that point, the same small file may exist both in the source storage pool and in the aggregated large file in the destination storage pool. Since aggregation migration is meant to leave a file in only one pool, one of the two copies is necessarily an invalid file. Based on this, after receiving a file migration task, the method first judges whether the task is an unfinished task; if it is, garbage data corresponding to the task exists. The invalid files are then determined from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the task, where the aggregated large file contains the migration small files obtained by aggregating the respective small files. Finally, the invalid files are reclaimed. In this way, invalid data generated during aggregation migration can be reclaimed and disk utilization can be improved.
Accordingly, embodiments of the present invention further provide a file migration apparatus, a device, and a readable storage medium corresponding to the file migration method, which have the above technical effects and are not described herein again.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating an implementation of a file migration method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a file migration apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file migration apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a file migration apparatus in an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For ease of understanding, the technical terms involved in the embodiments of the present invention are explained below:
Small file: a file of small size. For example, a file smaller than 1M may be defined as a small file, or a file smaller than 512K may be defined as a small file;
Aggregated large file: a large file obtained by aggregating small files in batches. For example, an aggregated large file may have an upper size limit of 512M;
Backend: the file hierarchical migration client, responsible for receiving migration requests sent by the MDS, executing data migration from the source storage pool to the destination storage pool, and executing the related migration aggregation process according to the aggregation conditions;
MDS: the metadata server, which maintains the metadata of all files in the file system, organizes the small files that are to be migrated and aggregated together, and sends them to the Backend;
Distributed file system: a cluster composed of multiple file storage node servers that stores files in blocks. Using the object as the basic unit, it supports storing one piece of data across multiple nodes; each node can obtain the complete data through inter-node communication, and when a node goes down, the complete data can be recovered according to the configured policy. It features high availability, high performance, and high scalability. Each node provides a metadata service (MDS) that handles the various access operations on metadata and balances service load;
File tiering: a file migration function. By configuring corresponding policies, files can be moved between different pools; by specifying an aggregation policy, files can be aggregated first during migration and then written to the new pool;
Small file aggregation: aggregating small files in batches into a large file. Disks usually read and write large files significantly faster than small files. To exploit this characteristic in scenarios that involve large numbers of small files, the small files are not written to disk directly; instead, they are merged into a large file that is then written to disk. This effectively reduces the number of small-file writes to disk, relieves data write pressure, improves the read hit rate, and shortens the read I/O path.
Referring to FIG. 1, which is a flowchart of a file migration method according to an embodiment of the present invention, the method includes the following steps:
S101, receiving a file migration task.
Specifically, after the file hierarchical migration client fails and recovers, the metadata server may send a file migration task to the file hierarchical migration client. That is, the file migration task may be a new task or a task that is not completed due to a failure.
In the embodiment of the invention, the file hierarchical migration client can receive the file migration task sent by the metadata server. Specifically, the metadata server may send the file migration task to the file hierarchical migration client when it is checked that the file hierarchical migration client after the failure recovery has an incomplete migration task.
That is to say, in the embodiment of the present invention, after the MDS sends a migration aggregation task, it regards the task as completed only once the whole migration aggregation process of the file has finished; until then, the task remains in the unfinished state. When the hierarchical migration aggregation operation is restarted, unfinished tasks are started again first, that is, the MDS sends the file migration task to the Backend.
S102, judging whether the file migration task is an unfinished task.
The file migration task may carry an identifier of the unfinished file migration task and the identifiers of the small files to be migrated, so that the file hierarchical migration client can determine the invalid files corresponding to the unfinished file migration task from this identification information.
Whether the file migration task is unfinished can be determined from the presence of the unfinished-task identifier. If it is unfinished, step S103 is performed; if not, the file migration task is processed normally, for example by performing step S105.
Alternatively, whether the file migration task is an unfinished task can be determined from a recorded file migration task record table.
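The check in S102 can be sketched as follows. This is only an illustration under stated assumptions: the `MigrationTask` class, its `unfinished_task_id` field, and the dictionary used as a task record table are hypothetical names, since the description does not specify a concrete data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MigrationTask:
    """Hypothetical shape of a file migration task sent by the MDS."""
    task_id: str
    small_file_ids: List[str] = field(default_factory=list)
    # Assumed field: present only when the MDS re-sends a task that never completed.
    unfinished_task_id: Optional[str] = None


def is_unfinished(task: MigrationTask,
                  record_table: Optional[Dict[str, str]] = None) -> bool:
    """Return True if the task should go through cleanup (S103) before migration."""
    # Option 1: the task itself carries an unfinished-task identifier.
    if task.unfinished_task_id is not None:
        return True
    # Option 2: consult a task record table (assumed: task_id -> status string).
    if record_table is not None:
        return record_table.get(task.task_id) == "unfinished"
    return False
```

A task for which `is_unfinished` returns `False` would proceed directly to normal migration (S105).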
S103, determining invalid files from among the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task.
The aggregated large file contains the migration small files obtained by aggregating the respective small files.
File aggregation migration means aggregating small files in batches into a large file and then migrating that file.
In order to facilitate understanding of the technical solutions provided by the embodiments of the present invention, the following description is made by taking an example of a small file aggregation process. The small file aggregation migration process comprises the following steps:
step 1, the MDS (metadata server) sends a migration aggregation task, and the Backend (file hierarchical migration client) receives the file migration task;
step 2, Backend reads the data of the small file from the source storage pool;
step 3, Backend writes the read data into an aggregation cache;
step 4, when the Backend aggregation cache meets the flush condition, that is, when it reaches a certain size, the cached small files are written to disk in the destination storage pool;
step 5, after the write completes, Backend sets the aggregation attribute of the file (the aggregated large file, the offset within the aggregated large file, and the data pool) and sends it to the MDS;
step 6, after the MDS finishes the update, it responds to Backend; once Backend receives confirmation that metadata such as the aggregation attribute was updated successfully, it deletes the data of the small file in the source storage pool;
step 7, Backend responds to the MDS after the deletion completes, indicating that the migration aggregation process of the file is finished.
If Backend hangs during the migration aggregation of a small file, the whole process cannot be completed and invalid garbage data is produced. Because the process consists of several steps, any of which may fail, the situations that produce garbage data include, but are not limited to, the following two cases:
case 1: the small file has been written into the aggregated large file, but its aggregation attribute has not been modified. The valid data is then the small file data in the source storage pool, and the small file data in the aggregated large file is garbage data that must be removed; otherwise it keeps occupying space and wastes system storage resources;
case 2: the small file has been written into the aggregated large file, and its aggregation attribute has been modified. The valid data is then the data in the aggregated large file, and the ordinary small file data in the source storage pool is garbage data that must be removed; otherwise it keeps occupying space and wastes system storage resources.
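To see how the failure point maps onto the two cases, the seven steps can be reduced to a toy state machine. This is a simplified sketch, not the patented implementation: the three boolean flags and the function names are assumptions, and the attribute update on the MDS is modeled as taking effect at step 5.

```python
from typing import Dict, Optional


def run_migration(crash_after_step: int) -> Dict[str, bool]:
    """Run the simplified pipeline and stop after the given step (1..7)."""
    state = {"in_source": True, "in_aggregate": False, "attr_set": False}
    # Only steps 4, 5 and 6 change where the data lives or which copy is valid.
    effects = [
        (4, lambda s: s.update(in_aggregate=True)),  # flush cache to destination pool
        (5, lambda s: s.update(attr_set=True)),      # aggregation attribute updated
        (6, lambda s: s.update(in_source=False)),    # small file deleted from source
    ]
    for step_no, apply_effect in effects:
        if step_no > crash_after_step:
            break
        apply_effect(state)
    return state


def garbage_location(state: Dict[str, bool]) -> Optional[str]:
    """Return where the garbage copy lives, or None if only one copy exists."""
    if state["in_source"] and state["in_aggregate"]:
        # Case 2 if the attribute was modified, case 1 otherwise.
        return "source pool" if state["attr_set"] else "aggregated large file"
    return None
```

Under these assumptions, a crash right after step 4 yields case 1 (garbage in the aggregated large file), a crash right after step 5 yields case 2 (garbage in the source pool), and a crash before step 4 or after step 6 leaves no duplicate copy.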
Based on this, in the embodiment of the present invention, invalid files can be determined from among the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task. An invalid file corresponds to invalid data. As the analysis of the small file aggregation migration process shows, the invalid file may be a small file in the source storage pool or a migration small file inside the aggregated large file in the destination storage pool.
Specifically, the process of determining an invalid file includes:
step one, acquiring the aggregation attribute of each small file.
And step two, determining invalid files from the aggregated large files and all the small files by using the aggregation attributes.
Specifically, the aggregation attribute of the small file may be obtained from the metadata server. I.e. the aggregated attribute may be stored as metadata. Of course, the aggregation attribute may also be used as the tag information of the small file, and the aggregation attribute of the small file may also be obtained by reading the corresponding tag information.
The aggregation attribute of the small file may specifically include an offset of the small file in the aggregated large file and a data pool where the small file is located.
Step two may specifically include:
step 2.1, judging whether the aggregation attribute has changed;
step 2.2, if so, determining the corresponding small file to be an invalid file;
step 2.3, if not, determining the corresponding migration small file stored in the aggregated large file to be an invalid file.
After the file hierarchical migration client obtains the aggregation attributes of the small files, it can judge which of the copies stored in the different locations is invalid. Specifically, during aggregation migration, the aggregation attribute of a small file changes once the small file has been written into the aggregated large file, so whether the small file is valid can be determined from whether its aggregation attribute has changed.
In the embodiment of the invention, to distinguish the small files before and after migration, the small file in the source storage pool before migration is still called the small file, and the small file inside the aggregated large file in the destination storage pool after migration is called the migration small file. Each migration small file in the destination storage pool corresponds to one small file in the source storage pool.
Specifically, when the aggregation attribute of a small file has changed, the small file in the source storage pool is determined to be invalid. When the aggregation attribute has not changed, the process of writing the small file into the aggregated large file may not have completed, so the data content of the corresponding migration small file in the aggregated large file may be incomplete; the migration small file can therefore be determined to be invalid.
In the embodiment of the present invention, validity is judged between the copies of the same small file stored in different locations. For example, for a small file a that exists both in the source storage pool (called a1) and in the aggregated large file in the destination storage pool (called a2), judging validity means judging which of a1 and a2 is valid. If a1 is valid, the small file is valid and a2 is invalid; if a2 is valid, the migration small file is valid and a1 is invalid. That is, whenever corresponding a1 and a2 both exist, exactly one of them is invalid.
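The decision rule of steps 2.1 to 2.3 can be written as a small function. This is a minimal sketch: the `AggregationAttr` tuple and the idea of comparing the attribute recorded before migration with the attribute currently held by the metadata server are assumptions made for illustration; the description only states that the attribute may include an offset and a data pool.

```python
from typing import NamedTuple, Optional


class AggregationAttr(NamedTuple):
    """Assumed representation: data pool plus offset in the aggregated large file."""
    pool: str
    offset: int


def invalid_copy(attr_before: Optional[AggregationAttr],
                 attr_now: Optional[AggregationAttr]) -> str:
    """Decide which copy of a small file is invalid.

    attr_before: attribute recorded when the migration task was issued
                 (None if the file had never been aggregated).
    attr_now:    attribute currently stored in the metadata.
    """
    if attr_now != attr_before:
        # Attribute changed: the aggregated copy (a2) is valid, so a1 is garbage.
        return "small_file"
    # Attribute unchanged: the source copy (a1) is valid, so a2 is garbage.
    return "migration_small_file"
```

For instance, a file whose attribute went from `None` to `AggregationAttr("dest_pool", 4096)` would have its source-pool copy classified as invalid.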
And S104, performing garbage collection on the invalid files.
In the embodiment of the invention, the invalid file is subjected to garbage collection, namely, the occupation of the invalid file on a disk is eliminated.
Specifically, if the same small file exists in both the source storage pool and the destination storage pool (the copy in the destination storage pool being called the migration small file), the copy in one of the two storage pools must be invalid. That is, the invalid file may be a small file in the source storage pool, or a migration small file stored in the aggregated large file in the destination storage pool.
Preferably, invalid files in different storage locations can be garbage-collected in different ways. The specific cases are:
case 1: if the invalid file is a small file, deleting the small file in the source storage pool;
case 2: if the invalid file is a migration small file, performing fragment recovery processing on the aggregated large file.
That is, if the small file in the source storage pool is invalid, the small file originally stored in the source storage pool can be deleted directly. If the invalid file is a migration small file in the aggregation large file in the destination storage pool, in order to avoid a large number of disk fragments caused by deleting the invalid migration small file, a fragment recovery processing mode may be adopted to remove the migration small file in the aggregation large file.
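A sketch of this two-way dispatch follows. The in-memory structures are stand-ins chosen for illustration: `source_pool` is assumed to be a dictionary of file contents, and `pending_fragments` is a hypothetical set that records which migration small files should be treated as invalid by a later fragment recovery pass.

```python
from typing import Dict, Set


def collect_garbage(invalid_kind: str, file_id: str,
                    source_pool: Dict[str, bytes],
                    pending_fragments: Set[str]) -> None:
    """Reclaim one invalid file according to where it is stored."""
    if invalid_kind == "small_file":
        # Case 1: the source copy is garbage and can be deleted directly.
        source_pool.pop(file_id, None)
    elif invalid_kind == "migration_small_file":
        # Case 2: do not punch a hole in the aggregated large file;
        # mark the entry so fragment recovery can handle it in bulk.
        pending_fragments.add(file_id)
    else:
        raise ValueError(f"unknown invalid file kind: {invalid_kind}")
```

Deferring case 2 to a bulk pass reflects the stated goal of avoiding a large number of disk fragments from individual deletions.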
It should be noted that, the way of performing the fragment recycling process on the aggregated large file includes, but is not limited to, the following two specific implementation manners:
Mode 1: fragment cleaning based on file header information. The specific implementation steps include:
step 2.1.1, acquiring file header information of the aggregated large file;
step 2.1.2, determining the invalid data proportion in the aggregated large file by using the file header information;
step 2.1.3, when the invalid data proportion is greater than a threshold, migrating the valid data in the aggregated large file to the target aggregated large file and deleting the aggregated large file.
Generally, a file aggregation migration task may be interrupted by a fault at any time of migration, and therefore, valid data corresponding to a small file to be migrated may be partially in a destination storage pool and partially in a source storage pool. Accordingly, a migrated small file in the aggregate large file will also have some valid data and some invalid data. In order to reduce the number of times of repeated migration, in the embodiment of the present invention, after the invalid portion in the aggregated large file reaches a certain threshold, fragment cleaning may be performed uniformly.
Specifically, the file header information can be used to determine the invalid data proportion in the aggregated large file; when that proportion is greater than the threshold, the valid data in the aggregated large file is migrated to the target aggregated large file, and the aggregated large file is deleted. The threshold can be set and adjusted according to the requirements of the actual application, for example to a specific value such as 80% or 75%. Because cleaning is triggered only when the invalid data proportion exceeds the threshold, a lower threshold triggers cleaning more often and reclaims garbage more thoroughly, at the cost of more data migrations; a higher threshold leaves more garbage fragments in the aggregated large file but causes fewer data migrations due to garbage collection.
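Mode 1 can be illustrated with a small in-memory example. The header layout below is an assumption (the description does not define the file header format): each `HeaderEntry` records a file identifier, its offset and length in the aggregated large file, and a validity flag, and the invalid data proportion is measured in bytes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HeaderEntry:
    """Assumed header record for one migration small file."""
    file_id: str
    offset: int
    length: int
    valid: bool


def invalid_ratio(header: List[HeaderEntry]) -> float:
    """Fraction of bytes in the aggregated large file that are invalid."""
    total = sum(e.length for e in header)
    if total == 0:
        return 0.0
    return sum(e.length for e in header if not e.valid) / total


def compact(data: bytes, header: List[HeaderEntry],
            threshold: float = 0.75) -> Optional[Tuple[bytes, List[HeaderEntry]]]:
    """Return (new_data, new_header) for the target aggregated large file,
    or None when the invalid ratio does not exceed the threshold."""
    if invalid_ratio(header) <= threshold:
        return None
    new_data = bytearray()
    new_header: List[HeaderEntry] = []
    for e in header:
        if e.valid:
            # Copy valid bytes and record their new offset in the target file.
            new_header.append(HeaderEntry(e.file_id, len(new_data), e.length, True))
            new_data += data[e.offset:e.offset + e.length]
    return bytes(new_data), new_header
```

When `compact` returns a result, the caller would write it out as the target aggregated large file and then delete the old one; when it returns `None`, the old file is left untouched.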
Mode 2: fragment cleaning is carried out based on the aggregation attribute of the small files, and the specific implementation steps comprise:
step 2.2.1, acquiring aggregation attributes corresponding to all files in the aggregated large file;
step 2.2.2, determining the file with unchanged aggregation attribute as invalid data, and counting the proportion of the invalid data;
step 2.2.3, under the condition that the invalid data proportion is larger than the threshold value, migrating the valid data in the aggregated large file to the target aggregated large file and deleting the aggregated large file.
As can be seen from the above, the aggregation attribute indicates whether a given small file is valid in the source storage pool or in the destination storage pool, so the invalid data in the aggregated large file can be determined based on the aggregation attribute and the invalid data proportion can be obtained by counting. After the invalid data proportion is obtained, the subsequent processing may refer to Mode 1 above.
In particular, when the invalid data proportion is less than or equal to the threshold, fragment cleaning of the aggregated large file need not be performed.
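The attribute-based check of Mode 2 might look like the sketch below. How an aggregation attribute is stored and compared is not described here, so the `attribute_changed` callback is a hypothetical stand-in for that lookup, and counting the proportion per file (rather than per byte) is likewise an assumption.

```python
from typing import Callable, Iterable, List, Tuple


def classify_members(names: Iterable[str],
                     attribute_changed: Callable[[str], bool]
                     ) -> Tuple[List[str], List[str]]:
    """Split the files in an aggregated large file into (valid, invalid).

    A file whose aggregation attribute is unchanged is treated as invalid
    data inside the aggregated large file (step 2.2.2).
    """
    valid: List[str] = []
    invalid: List[str] = []
    for name in names:
        if attribute_changed(name):
            valid.append(name)
        else:
            invalid.append(name)
    return valid, invalid


def should_clean(names: Iterable[str],
                 attribute_changed: Callable[[str], bool],
                 threshold: float) -> bool:
    """True only when the invalid proportion is strictly above the threshold."""
    members = list(names)
    if not members:
        return False
    _, invalid = classify_members(members, attribute_changed)
    return len(invalid) / len(members) > threshold
```

The strict comparison mirrors the text: a proportion equal to the threshold does not trigger cleaning.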
Preferably, after the data cleaning is completed, the normal file migration process can be resumed. Specifically, in the case of completing garbage collection of the invalid file, the clearing response data is fed back to the metadata server. Therefore, the metadata server can confirm that the data cleaning work is completed and can issue a new file aggregation migration task.
S105, executing the file migration task.
That is, the corresponding small files are read from the source storage pool, a plurality of the small files are aggregated to obtain an aggregated large file, and the aggregated large file is migrated to the destination storage pool.
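As a rough illustration of S105, the sketch below models both storage pools as in-memory dictionaries and aggregates by simple concatenation. The function names, the `(offset, length)` index, and the dictionary-based pools are assumptions for the example, not details taken from the embodiment.

```python
from typing import Dict, Iterable, Tuple

Index = Dict[str, Tuple[int, int]]  # file name -> (offset, length) in the payload


def aggregate(small_files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one payload and record where each one lands."""
    payload = bytearray()
    index: Index = {}
    for name, data in small_files.items():
        index[name] = (len(payload), len(data))
        payload += data
    return bytes(payload), index


def migrate(task_files: Iterable[str],
            source_pool: Dict[str, bytes],
            dest_pool: Dict[str, bytes],
            large_name: str) -> Index:
    """Read the task's small files from the source pool and write one aggregated file."""
    selected = {name: source_pool[name] for name in task_files}
    payload, index = aggregate(selected)
    dest_pool[large_name] = payload
    return index


def read_member(dest_pool: Dict[str, bytes], large_name: str,
                index: Index, name: str) -> bytes:
    """Recover one migrated small file from the aggregated large file."""
    offset, length = index[name]
    return dest_pool[large_name][offset:offset + length]
```

The sketch deliberately stops after the write to the destination pool; updating aggregation attributes and removing the source copies are separate steps that it does not model.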
By applying the method provided by the embodiment of the present invention, a file migration task is received; whether the file migration task is an unfinished task is judged; if so, invalid files are determined from the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migrated small files obtained by aggregating the respective small files; and garbage collection is performed on the invalid files.
It can be understood that when a file hierarchical migration client fails, the migration task it is processing cannot be completed. If a small file has already been written into the aggregated large file at that point, the same small file may exist both in the source storage pool and in the aggregated large file in the destination storage pool. In terms of the aggregation migration process, the same file then exists in both the source storage pool and the destination storage pool, so one of the two copies is necessarily an invalid file. Based on this, in the method, after the file migration task is received, it is first judged whether the file migration task is an unfinished task; if it is, garbage data corresponding to the file migration task will exist. In this case, the invalid files are determined from the aggregated large file in the destination storage pool corresponding to the file migration task and from the small files located in the source storage pool, where the aggregated large file contains the migrated small files obtained by aggregating the respective small files. The invalid files are then reclaimed. In this way, invalid data generated during aggregation migration can be reclaimed and disk utilization can be improved.
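The overall recovery flow for an unfinished task can be sketched as below. The `MigrationTask` and `Pools` types are hypothetical, the pools are plain in-memory collections, and skipping files that exist in only one pool is an assumption added so that the example only acts on genuinely duplicated files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MigrationTask:
    task_id: str
    small_files: List[str]
    unfinished: bool = False  # True when an earlier attempt was interrupted


@dataclass
class Pools:
    source: Dict[str, bytes] = field(default_factory=dict)     # small files in the source pool
    migrated: Set[str] = field(default_factory=set)            # copies inside the aggregated large file
    attribute_changed: Set[str] = field(default_factory=set)   # files whose aggregation attribute changed
    invalid_migrated: Set[str] = field(default_factory=set)    # migrated copies awaiting fragment cleaning


def collect_garbage(task: MigrationTask, pools: Pools) -> Dict[str, List[str]]:
    """Find and reclaim invalid files left behind by an unfinished task."""
    report: Dict[str, List[str]] = {"source_deleted": [], "migrated_invalid": []}
    if not task.unfinished:
        return report  # a fresh task has produced no garbage yet
    for name in task.small_files:
        if name not in pools.migrated or name not in pools.source:
            continue  # assumption: only files present in both pools are duplicated
        if name in pools.attribute_changed:
            # Attribute changed: the small file in the source pool is the invalid copy.
            del pools.source[name]
            report["source_deleted"].append(name)
        else:
            # Attribute unchanged: the migrated copy in the large file is invalid.
            pools.invalid_migrated.add(name)
            report["migrated_invalid"].append(name)
    return report
```

Invalid migrated copies are only marked here; physically removing them is left to the threshold-driven fragment cleaning of the aggregated large file.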
Corresponding to the above method embodiment, the embodiment of the present invention further provides a file migration apparatus, and the file migration apparatus described below and the file migration method described above may be referred to correspondingly.
Referring to fig. 2, the apparatus includes the following modules:
a task receiving module 101, configured to receive a file migration task;
a judging module 102, configured to judge whether the file migration task is an unfinished task;
an invalid file determining module 103, configured to determine, if the judgment result is yes, invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pool corresponding to the file migration task; the aggregated large file contains the migrated small files obtained by aggregating the respective small files;
a garbage collection processing module 104, configured to perform garbage collection on the invalid files.
By applying the apparatus provided by the embodiment of the present invention, a file migration task is received; whether the file migration task is an unfinished task is judged; if so, invalid files are determined from the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migrated small files obtained by aggregating the respective small files; and garbage collection is performed on the invalid files.
In a specific embodiment of the present invention, the invalid file determining module 103 is specifically configured to obtain an aggregation attribute of each small file, and to determine the invalid files from the aggregated large file and the small files by using the aggregation attributes.
In a specific embodiment of the present invention, the invalid file determining module 103 is specifically configured to judge whether the aggregation attribute has changed; if so, determine that the corresponding small file is an invalid file; and if not, determine that the corresponding migrated small file stored in the aggregated large file is an invalid file.
In a specific embodiment of the present invention, the garbage collection processing module 104 is specifically configured to delete the small file from the source storage pool if the invalid file is a small file, and to perform fragment recycling processing on the aggregated large file if the invalid file is a migrated small file.
In a specific embodiment of the present invention, the garbage collection processing module 104 is specifically configured to obtain file header information of the aggregated large file; determine the invalid data proportion in the aggregated large file by using the file header information; and, when the invalid data proportion is greater than the threshold, migrate the valid data in the aggregated large file to the target aggregated large file and delete the aggregated large file.
In a specific embodiment of the present invention, the garbage collection processing module 104 is specifically configured to obtain the aggregation attributes corresponding to the files in the aggregated large file; determine files whose aggregation attribute is unchanged as invalid data and count the invalid data proportion; and, when the invalid data proportion is greater than the threshold, migrate the valid data in the aggregated large file to the target aggregated large file and delete the aggregated large file.
In a specific embodiment of the present invention, the task receiving module 101 is specifically configured to receive a file migration task sent by a metadata server;
correspondingly, the apparatus further comprises: a clearing feedback module, configured to feed back clearing response data to the metadata server when garbage collection of the invalid files is completed.
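One possible way to wire the modules together is sketched below. The class name, the callback-based composition, and the fields of `ClearResponse` are assumptions for illustration; the text does not define the format of the clearing response data or how the modules call each other.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    task_id: str
    unfinished: bool


@dataclass
class ClearResponse:
    """Hypothetical shape of the clearing response data sent to the metadata server."""
    task_id: str
    cleaned: List[str]


class FileMigrationApparatus:
    def __init__(self,
                 determine_invalid: Callable[[Task], List[str]],
                 collect: Callable[[List[str]], None],
                 send_response: Callable[[ClearResponse], None]) -> None:
        self._determine_invalid = determine_invalid  # invalid file determining module 103
        self._collect = collect                      # garbage collection processing module 104
        self._send_response = send_response          # clearing feedback module

    def receive(self, task: Task) -> bool:
        """Task receiving module 101: return True if garbage collection ran."""
        if not self._is_unfinished(task):
            return False
        invalid = self._determine_invalid(task)
        self._collect(invalid)
        # Feedback is sent only after garbage collection has completed.
        self._send_response(ClearResponse(task.task_id, list(invalid)))
        return True

    @staticmethod
    def _is_unfinished(task: Task) -> bool:
        """Judging module 102."""
        return task.unfinished
```

Passing the modules in as callables keeps each one replaceable, for example swapping a header-based fragment cleaner for an attribute-based one without touching the control flow.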
Corresponding to the above method embodiment, the embodiment of the present invention further provides a file migration device, and a file migration device described below and a file migration method described above may be referred to in a corresponding manner.
Referring to fig. 3, the file migration device includes:
a memory 332 for storing a computer program;
a processor 322, configured to implement the steps of the file migration method of the above-described method embodiments when executing the computer program.
Specifically, referring to fig. 4, which is a schematic diagram of a specific structure of the file migration device provided in this embodiment, the file migration device may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 322 and a memory 332, where the memory 332 stores one or more computer applications 342 or data 344. The memory 332 may be transient storage or persistent storage. The program stored in the memory 332 may include one or more modules (not shown), each of which may include a series of instruction operations for the data processing device. Further, the central processing unit 322 may be configured to communicate with the memory 332 and to execute, on the file migration device 301, the series of instruction operations in the memory 332.
The file migration device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341.
The steps in the file migration method described above may be implemented by the structure of the file migration device.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a readable storage medium, and a readable storage medium described below and a file migration method described above may be referred to in correspondence.
A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the file migration method of the above-mentioned method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.
Those of skill in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims (10)

1. A method for file migration, comprising:
receiving a file migration task;
judging whether the file migration task is an unfinished task;
if so, determining invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pool corresponding to the file migration task; the aggregated large file contains migrated small files respectively corresponding to the small files after aggregation;
and performing garbage collection on the invalid file.
2. The file migration method according to claim 1, wherein determining invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pools corresponding to the file migration task comprises:
acquiring the aggregation attribute of each small file;
and determining the invalid file from the aggregated large file and each small file by using the aggregated attribute.
3. The file migration method according to claim 2, wherein determining the invalid file from the aggregated large file and each of the small files by using the aggregated attribute comprises:
judging whether the aggregation attribute changes;
if so, determining the corresponding small file as the invalid file;
if not, determining that the corresponding migrated small file stored in the aggregated large file is the invalid file.
4. The file migration method according to claim 1, wherein garbage-collecting the invalid file comprises:
if the invalid file is the small file, deleting the small file in the source storage pool;
and if the invalid file is the migrated small file, performing fragment recycling processing on the aggregated large file.
5. The file migration method according to claim 4, wherein performing fragment recycling processing on the aggregated large file comprises:
acquiring file header information of the aggregated large file;
determining the invalid data proportion in the aggregated large file by using the file header information;
and under the condition that the invalid data proportion is larger than a threshold value, migrating the valid data in the aggregated large file to a target aggregated large file and deleting the aggregated large file.
6. The file migration method according to claim 4, wherein performing fragment recycling processing on the aggregated large file comprises:
acquiring aggregation attributes corresponding to the files in the aggregated large file;
determining the file with unchanged aggregation attribute as invalid data, and counting the proportion of the invalid data;
and under the condition that the invalid data proportion is larger than a threshold value, migrating the valid data in the aggregated large file to a target aggregated large file and deleting the aggregated large file.
7. The file migration method according to any one of claims 1 to 6, wherein the receiving of the file migration task includes:
receiving the file migration task sent by a metadata server;
correspondingly, under the condition that the garbage collection of the invalid file is completed, the method further comprises the following steps:
and feeding back clearing response data to the metadata server.
8. A file migration apparatus, comprising:
a task receiving module, configured to receive a file migration task;
a judging module, configured to judge whether the file migration task is an unfinished task;
an invalid file determining module, configured to determine, if the judgment result is yes, invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pool corresponding to the file migration task; the aggregated large file contains migrated small files respectively corresponding to the small files after aggregation;
and a garbage collection processing module, configured to perform garbage collection on the invalid files.
9. A file migration device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the file migration method according to any one of claims 1 to 7 when executing said computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the file migration method according to any one of claims 1 to 7.
CN202010850086.2A 2020-08-21 2020-08-21 File migration method, device, equipment and readable storage medium Active CN111984196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010850086.2A CN111984196B (en) 2020-08-21 2020-08-21 File migration method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010850086.2A CN111984196B (en) 2020-08-21 2020-08-21 File migration method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111984196A true CN111984196A (en) 2020-11-24
CN111984196B CN111984196B (en) 2022-08-19

Family

ID=73443100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010850086.2A Active CN111984196B (en) 2020-08-21 2020-08-21 File migration method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111984196B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631991A (en) * 2020-12-26 2021-04-09 中国农业银行股份有限公司 File migration method and device
CN113704027A (en) * 2021-10-29 2021-11-26 苏州浪潮智能科技有限公司 File aggregation compatible method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107643880A (en) * 2017-09-27 2018-01-30 郑州云海信息技术有限公司 The method and device of file data migration based on distributed file system
CN109471836A (en) * 2018-11-01 2019-03-15 浪潮电子信息产业股份有限公司 Data migration method, device and system
CN111176571A (en) * 2019-12-27 2020-05-19 浪潮电子信息产业股份有限公司 Method, device, equipment and medium for managing local object

Also Published As

Publication number Publication date
CN111984196B (en) 2022-08-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant