CN115904263B - Data migration method, system, equipment and computer readable storage medium

Info

Publication number: CN115904263B
Application number: CN202310225922.1A
Authority: CN
Inventors: 孙业宽; 王永海
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2023-03-10
Filing date: 2023-03-10
Publication date: 2023-05-23
Anticipated expiration: 2043-03-10
Also published as: CN115904263A

Abstract

The application discloses a data migration method, system, device, and computer readable storage medium, applied to the technical field of storage, which address the technical problem of improving the flexibility of data migration while ensuring the data recovery speed. The method comprises: receiving an anti-aggregation task; for any one file pointed to by the anti-aggregation task, acquiring metadata of the file, reading the file in the second storage medium through the metadata, and writing the read file into the first storage medium in the form of anti-migration aggregation; and after any file is written into the first storage medium, updating the metadata of the file and deleting the file stored in the second storage medium. The second storage medium stores file data in the form of aggregate files, the first storage medium stores file data in the form of non-aggregate files, and the data access speed of the first storage medium is higher than that of the second storage medium. With this scheme, the flexibility of data migration is improved while the data recovery speed is ensured.

Description

Data migration method, system, equipment and computer readable storage medium

Technical Field

The present invention relates to the field of storage technologies, and in particular, to a data migration method, system, device, and computer readable storage medium.

Background

In scenarios with massive numbers of small files, a storage system is configured with a small amount of high-speed storage media, such as an ordinary SSD (Solid State Disk) or an NVMe (Non-Volatile Memory Express) SSD, and file classification allows high system performance to be achieved at lower hardware cost.

File classification is based on data life cycle management theory. It requires that data flow between the various storage media, that is, data migration, without the user being aware of it. Data migration can support various user policies, and data meeting the migration requirements is migrated transparently without affecting normal services in the system.

At present, a commonly adopted storage scheme is data cold and hot layering, namely a high-speed storage medium is adopted to store hot data, a low-speed storage medium is adopted to store cold data, and the storage modes of the data in the high-speed storage medium and the low-speed storage medium are the same.

A small number of existing schemes use migration aggregation, in which the data of a plurality of small files is actually stored in one new large file, and the aggregated large file is operated on directly during data reading and writing, so that the data recovery speed is improved. The small files after aggregation may be called aggregated small files in the large file. When migration aggregation is performed based on file classification, small files are migrated from a high-speed storage medium and written into an aggregated large file on a low-speed storage medium, so that small-file aggregation is realized during migration. However, current migration aggregation schemes only migrate small files from a high-speed storage medium to a low-speed storage medium, so their application scenarios are narrow and their flexibility is low.

In summary, how to improve the flexibility of data migration while ensuring the data recovery speed is a technical problem that needs to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a data migration method, a system, equipment and a computer readable storage medium, so that the flexibility of data migration is improved under the condition of ensuring the data recovery speed.

In order to solve the technical problems, the invention provides the following technical scheme:

A method of data migration, comprising:

receiving an anti-aggregation task;

for any 1 file pointed by the anti-aggregation task, acquiring metadata of the file, reading the file in a second storage medium through the metadata, and writing the read file into a first storage medium in an anti-migration aggregation mode;

after writing any one of the files to the first storage medium, updating metadata of the file, and deleting the file stored in the second storage medium;

the second storage medium stores file data in the form of aggregate files, the first storage medium stores file data in the form of non-aggregate files, and the data access speed of the first storage medium is higher than that of the second storage medium.

Preferably, any 1 file pointed by the anti-aggregation task is a file meeting a preset first grading rule;

the first grading rule includes:

when the access times of the file in the first duration is greater than or equal to a preset first threshold, the first grading rule is established.

Preferably, the first grading rule further includes:

the first grading rule is established when the file name of the file matches a first database.

Preferably, the first grading rule further includes:

and when the file size of the file is smaller than a preset first numerical value, the first grading rule is established.

Preferably, the first grading rule further includes:

when the file is a file specified by a first user instruction, the first grading rule is established.

Preferably, after writing any one of the files to the first storage medium, updating metadata of the file includes:

after any one of the files is written to the first storage medium, the location information in the metadata of the file is updated and the aggregate attribute is purged.

Preferably, the method further comprises:

before deleting the file stored in the second storage medium, when a data migration process is in error, the file stored in the second storage medium is retained, and corresponding migration data is deleted.

Preferably, the method further comprises:

receiving a first re-aggregation task;

for any 1 file pointed by the first re-aggregation task, acquiring metadata of the file, reading the file in a second storage medium through the metadata, and writing the file into an aggregation cache of the second storage medium;

each time the aggregation cache is full, writing each file in the aggregation cache into a third storage medium, re-aggregated in the form of aggregate files;

updating metadata of any file written into the third storage medium by the aggregation cache, and deleting the file stored in the second storage medium;

the third storage medium stores file data in the form of aggregate files, and the data access speed of the second storage medium is higher than that of the third storage medium.

Preferably, any 1 file pointed by the first re-aggregation task is a file meeting a preset third grading rule;

the third grading rule includes:

when the access times of the file in the first time period are larger than or equal to a preset third threshold value and smaller than a preset second threshold value, the third grading rule is established;

wherein the second threshold is greater than the third threshold.

Preferably, for any file written to the third storage medium by the aggregation cache, updating metadata of the file includes:

and updating the position information in the metadata of any file written into the third storage medium by the aggregation cache, and updating the aggregation attribute.

Preferably, the method further comprises:

the hierarchical migration service in the first storage medium receives a migration aggregation task;

for any 1 file pointed by the migration aggregation task, the hierarchical migration service in the first storage medium acquires metadata of the file, reads the file in the first storage medium through the metadata, and writes the read file into the second storage medium in a migration aggregation mode;

after any one of the files is written to the second storage medium, the hierarchical migration service in the first storage medium updates metadata of the file and deletes the file stored in the first storage medium.

Preferably, any 1 file pointed by the migration aggregation task is a file meeting a preset second grading rule;

the second grading rule includes:

when the access times of the file in the first duration are larger than or equal to a preset second threshold value and smaller than a preset first threshold value, the second grading rule is established;

wherein the second threshold is less than the first threshold.

Preferably, the method further comprises:

the hierarchical migration service in the third storage medium receives the second re-aggregate task;

for any 1 file pointed by the second re-aggregation task, the hierarchical migration service in the third storage medium acquires metadata of the file, reads the file in the third storage medium through the metadata and writes the file into an aggregation cache of the third storage medium;

each time the aggregation cache is full, the hierarchical migration service in the third storage medium writes each file in the aggregation cache into a fourth storage medium in the form of aggregated file migration aggregation again;

updating metadata of any file written into the fourth storage medium by the aggregation cache, and deleting the file stored in the third storage medium;

the data access speed of the third storage medium is higher than that of the fourth storage medium.

Preferably, any 1 file pointed by the second re-aggregation task is a file meeting a preset fourth grading rule;

the fourth grading rule includes:

and when the access times of the file in the first time period are smaller than a preset third threshold value, the fourth grading rule is established.

Preferably, the method further comprises:

the hierarchical migration service in the fourth storage medium receives a third re-aggregate task;

for any 1 file pointed by the third re-aggregation task, the hierarchical migration service in the fourth storage medium acquires metadata of the file, reads the file in the fourth storage medium through the metadata and writes the file into an aggregation cache of the fourth storage medium;

each time the aggregation cache is full, the hierarchical migration service in the fourth storage medium writes each file in the aggregation cache into the third storage medium in the form of aggregated file migration aggregation again;

and updating metadata of any file written into the third storage medium by the aggregation cache, and deleting the file stored in the fourth storage medium.

Preferably, the method further comprises:

the hierarchical migration service in the third storage medium receives a fourth re-aggregation task;

for any 1 file pointed by the fourth re-aggregation task, the hierarchical migration service in the third storage medium acquires metadata of the file, reads the file in the third storage medium through the metadata and writes the file into an aggregation cache of the third storage medium;

each time the aggregation cache is full, the hierarchical migration service in the third storage medium writes each file in the aggregation cache into the second storage medium in the form of aggregated file migration aggregation again;

and updating metadata of any file written into the second storage medium by the aggregation cache, and deleting the file stored in the third storage medium.

Preferably, the first storage medium is a storage medium composed of an SSD, the second storage medium is a storage medium composed of an SSD and an HDD, the third storage medium is a storage medium composed of an HDD, and the fourth storage medium is a storage medium composed of a tape library and/or an optical disk.
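As an illustration only, the four grading rules above can be read as partitioning files by their access count within the first duration. The sketch below assumes hypothetical threshold values and hypothetical names (`Thresholds`, `target_tier`) that do not appear in the original; only the ordering first threshold > second threshold > third threshold is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; the text only requires first > second > third.
    first: int = 100
    second: int = 20
    third: int = 5


def target_tier(access_count: int, t: Thresholds = Thresholds()) -> int:
    """Return the storage medium (1 to 4) whose grading rule the count satisfies."""
    if not (t.first > t.second > t.third):
        raise ValueError("thresholds must satisfy first > second > third")
    if access_count >= t.first:
        return 1  # first grading rule: non-aggregated, fastest medium
    if access_count >= t.second:
        return 2  # second grading rule: count in [second, first)
    if access_count >= t.third:
        return 3  # third grading rule: count in [third, second)
    return 4      # fourth grading rule: count below the third threshold
```

Under these assumed thresholds, a file accessed 19 times in the first duration would be a candidate for the third storage medium.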

A method of data migration, comprising:

generating an anti-aggregation task and sending the anti-aggregation task to the hierarchical migration service;

for any 1 file pointed by the anti-aggregation task, sending metadata of the file to the hierarchical migration service, so that the hierarchical migration service reads the file in a second storage medium through the metadata, and writing the read file into a first storage medium in an anti-migration aggregation mode;

updating metadata of the files after the hierarchical migration service writes any one of the files to the first storage medium, and deleting the files stored in the second storage medium by the hierarchical migration service after the updating is completed;

A data migration system, comprising:

the task receiving module is used for receiving the anti-aggregation task;

the anti-migration aggregation module is used for acquiring metadata of any 1 file pointed by the anti-aggregation task, reading the file in the second storage medium through the metadata, and writing the read file into the first storage medium in an anti-migration aggregation mode;

an update deletion module, configured to update metadata of the files after any one of the files is written to the first storage medium, and delete the files stored in the second storage medium;

A data migration apparatus comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the data migration method as described above.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the data migration method as described above.

By applying the technical scheme provided by the embodiments of the invention, the data access speed of the first storage medium is higher than that of the second storage medium. The first storage medium is thus a high-speed storage medium that provides high-speed data access, while the second storage medium is, by comparison, a low-speed storage medium that allows a large amount of data to be stored at controlled cost. The second storage medium stores file data in the form of aggregate files, which also helps ensure the data recovery speed. In addition, the scheme of the present application supports migrating data from the second storage medium to the first storage medium. Specifically, the hierarchical migration service of the second storage medium may receive the anti-aggregation task. Any one file pointed to by the anti-aggregation task needs to be pulled up to the first storage medium, so the metadata of the file is acquired and the file in the second storage medium is read through the metadata. Since the first storage medium does not aggregate files while the second storage medium does, the read file is written to the first storage medium in the form of anti-migration aggregation. After any file is written to the first storage medium, the metadata of the file is updated and the file stored in the second storage medium is deleted. The present application thus implements reverse aggregation of files: data read from the second storage medium, which aggregates files, is stored in the first storage medium, which does not. The scheme therefore improves the flexibility of data migration while ensuring the data recovery speed.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a data migration method according to the present invention;

FIG. 2 is a schematic diagram of an anti-migration aggregation process in one embodiment;

FIG. 3 is a schematic diagram of a migration re-aggregation process in one embodiment;

FIG. 4 is a schematic diagram of a hierarchical structure of a storage medium in one embodiment;

FIG. 5 is a schematic diagram of a data migration system according to the present invention;

FIG. 6 is a schematic diagram of a data migration apparatus according to the present invention;

FIG. 7 is a schematic diagram of a computer readable storage medium according to the present invention.

Detailed Description

The core of the invention is to provide a data migration method, which improves the flexibility of data migration under the condition of guaranteeing the data recovery speed.

In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a data migration method according to the present invention, where the data migration method may include the following steps:

Step S101: an anti-aggregation task is received.

Specifically, the storage system in the scheme of the application may be a distributed file system, that is, a cluster formed by a plurality of file storage node servers. Such a system supports storing one data block on a plurality of nodes, obtaining the complete data by reading from a plurality of nodes simultaneously during data access, improving file performance through concurrent access, and linear expansion. When a node goes down, the complete data can be recovered according to a configured policy. The system therefore features high availability, high performance, and high scalability.

The operations of steps S101 to S103 may be performed by the hierarchical migration service of the second storage medium.

Specifically, in practical applications, an MDS (Meta Data Server) of the second storage medium may generally send an anti-aggregation task to the hierarchical migration service of the second storage medium, where the anti-aggregation task may carry one or more files, indicating that these files need to be stored in the higher-speed first storage medium.

For example, FIG. 2 is a schematic diagram of an anti-migration aggregation process in one embodiment, where in FIG. 2, the MDS sends an anti-aggregation task containing bulk files to a hierarchical migration service of a second storage medium.

It can be understood that the specific way in which a file comes to be added to the anti-aggregation task can be set and adjusted according to actual needs; that is, the specific rules of the file classification policy can be set and adjusted according to actual needs. For example, in one specific embodiment of the present invention, any one file pointed to by the anti-aggregation task is a file meeting a preset first grading rule, and the first grading rule may include, for example: when the number of accesses to the file within the first duration is greater than or equal to a preset first threshold, the first grading rule is established.

It can be understood that the more often a file is accessed within the first duration, the more the file should support high-speed reading and writing, so as to ensure high-performance execution of services, improve product competitiveness, and improve user satisfaction. Therefore, in this embodiment, when the number of accesses to the file within the first duration is greater than or equal to the preset first threshold, the first grading rule is determined to be established, and the file is added to the anti-aggregation task so that it is subsequently migrated from the low-speed second storage medium to the high-speed first storage medium.

Step S102: for any one file pointed to by the anti-aggregation task, metadata of the file is acquired, the file in the second storage medium is read through the metadata, and the read file is written into the first storage medium in the form of anti-migration aggregation.

Since the anti-aggregation task may include a batch of files, that is, the anti-aggregation task may point to multiple files, the processing may be performed file by file.

For any 1 file to which the anti-aggregation task is directed, the hierarchical migration service of the second storage medium needs to acquire metadata of the file. Since metadata of a file is typically managed by the MDS, in practical applications, the hierarchical migration service of the second storage medium may obtain metadata of the file from the MDS for any 1 file after traversing the file of the anti-aggregate task.

It should be noted that, in the second storage medium, file data is stored in the form of aggregate files, that is, the data of a plurality of small files is actually stored in one large file. For example, the data of a large number of small files is stored, after aggregation, in one large file of 512M, where 512M is the size of a single large file in this example. The advantage is that the large file is operated on directly during data reading and writing. If massive numbers of small files are not aggregated, the data recovery time after a failure is seriously affected; after aggregation, data is recovered in large blocks, so the data recovery speed is higher. The small files after aggregation may be referred to as aggregated small files in the large file. Thus, "acquisition of small file metadata" in fig. 2 is the process of acquiring metadata of a file in step S102. In other words, the small files in fig. 2 are the same concept as the files described in steps S101 to S103, and the small files in the second storage medium may also be referred to as aggregated small files in the large file. Likewise, the small file in fig. 3 in the later embodiment is the same concept as the file described in steps S101 to S103.

After the metadata of the file is obtained, the file in the second storage medium may be read through the metadata, for example, through information of an aggregate large file in the metadata, offset of the file in the aggregate large file, and the like, and data of an aggregate small file is read from the aggregate large file.
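A minimal sketch of this read path is given below, assuming the metadata records the aggregate large file, the offset of the small file within it, and a length. The `length` field, the `FileMetadata` type, and the `open_aggregate` callback are hypothetical names introduced for illustration; the text itself names only the aggregate large file information and the offset.

```python
import io
from dataclasses import dataclass


@dataclass
class FileMetadata:
    # Hypothetical fields modelling "information of an aggregate large file"
    # and "offset of the file in the aggregate large file"; length is assumed.
    aggregate_path: str
    offset: int
    length: int


def read_aggregated_file(meta: FileMetadata, open_aggregate) -> bytes:
    """Read one small file's bytes out of its aggregate large file."""
    with open_aggregate(meta.aggregate_path) as f:
        f.seek(meta.offset)
        data = f.read(meta.length)
    if len(data) != meta.length:
        raise IOError("aggregate file is shorter than the metadata claims")
    return data


# Usage with an in-memory stand-in for an aggregate large file.
blob = b"alpha" + b"bravo!" + b"charlie"
meta = FileMetadata(aggregate_path="agg-0001", offset=5, length=6)
result = read_aggregated_file(meta, lambda path: io.BytesIO(blob))
```

Passing the opener as a callback keeps the sketch independent of any particular storage backend.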

After the file in the second storage medium is read by the metadata, since the file data is stored in the first storage medium in the form of a non-aggregate file, the read file is written in the first storage medium in the form of anti-migration aggregation, that is, the read file is written in the first storage medium, and the file is stored in the first storage medium in the form of a non-aggregate file.

In practical applications, when a file in the second storage medium is read, as described in fig. 2, the hierarchical migration service of the second storage medium may send a read request and may receive a response indicating that the reading is completed. Likewise, when writing a file to the first storage medium, a write data request may be sent and a response indicating that the write is complete may be received.

Step S103: after any one of the files is written to the first storage medium, metadata of the file is updated, and the file stored in the second storage medium is deleted.

After writing any file to the first storage medium, the metadata of the file needs to be updated, typically the location information can be updated.

For example, in one embodiment of the present invention, after writing any one of the files to the first storage medium as described in step S103, updating metadata of the file may specifically include:

after any file is written to the first storage medium, the location information in the metadata of the file is updated and the aggregate attributes are purged.

In this embodiment, after any one of the files is written to the first storage medium, the storage location of the file has changed, so the location information in the metadata of the file is updated. This embodiment also takes into account that file data is stored in the second storage medium in the form of aggregate files but in the first storage medium in the form of non-aggregate files, so the aggregate attributes also need to be cleared to avoid errors.
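The metadata update can be pictured roughly as follows. The field names (`medium`, `path`, `aggregate_path`, `aggregate_offset`) and the function `after_anti_aggregation` are assumptions made for this sketch; the original does not specify the metadata layout.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FileMetadata:
    name: str
    medium: str                       # storage medium currently holding the data
    path: str                         # location information within that medium
    aggregate_path: Optional[str]     # aggregate attribute; None when not aggregated
    aggregate_offset: Optional[int]   # aggregate attribute; None when not aggregated


def after_anti_aggregation(meta: FileMetadata, new_path: str) -> FileMetadata:
    """Return metadata with the location updated and aggregate attributes cleared."""
    return replace(
        meta,
        medium="first",
        path=new_path,
        aggregate_path=None,
        aggregate_offset=None,
    )


old = FileMetadata("a.txt", "second", "agg-0001", "agg-0001", 4096)
new = after_anti_aggregation(old, "/tier1/a.txt")
```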

In practical applications, when updating metadata of a file, as described in fig. 2, the hierarchical migration service of the second storage medium may transmit a request to update metadata of the file, and may receive a response indicating that the metadata update is completed. Also, when deleting a file stored in the second storage medium, a deletion request may be transmitted, and a response indicating that deletion is completed may be received.

In addition, in practical applications, once every file pointed to by the anti-aggregation task has been migrated to the first storage medium, as shown in fig. 2, the hierarchical migration service of the second storage medium may reply to the MDS that the anti-aggregation task is completed.

In the scheme of the application, the data access speed of the first storage medium is higher than that of the second storage medium, that is, the performance of the first storage medium is higher than that of the second storage medium, and of course, the cost is higher. For example, in practical applications, the first storage medium of the present application may be an SSD or an NVMe SSD, so that high-speed access can be realized for random reading and writing of small files. The second storage medium may be a storage medium composed of an SSD and an HDD, that is, the second storage medium is a hybrid flash memory, which has a lower cost than the first storage medium, so that a storage space configured by the second storage medium may be larger than a storage space configured by the first storage medium, so as to implement mass storage of data.

It should be noted that when hard disks of different performances are included in the storage medium, for example, in the above example, the second storage medium is typically a storage medium composed of an SSD and an HDD, and an average value of access speeds of these hard disks may be used as the data access speed of the second storage medium.

In addition, since the performance of the first storage medium is higher than that of the second storage medium, it is usually the case that both the data reading speed and the data writing speed of the first storage medium are higher than those of the second storage medium. Similarly, the data access speed of the second storage medium is higher than that of the third storage medium described later, and the data access speed of the third storage medium is higher than that of the fourth storage medium. In some cases, taking the first storage medium as an example, the average of a hard disk's data reading speed and data writing speed may be taken as the data access speed of that hard disk, and the average of the data access speeds of the hard disks in the first storage medium may then be taken as the data access speed of the first storage medium. The same applies to the second, third, and fourth storage media.

As described above, a file may be added to the anti-aggregation task under a first grading rule based on the number of accesses to the file within the first duration. This embodiment considers that, in addition to the number of accesses, whether a file needs to be stored in the first storage medium may also be determined based on its file name.
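As a worked example of the averaging described above, the following sketch computes a per-disk access speed as the mean of read and write speed, and a per-medium access speed as the mean over its disks. The function names and the throughput figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean


def disk_access_speed(read_mb_s: float, write_mb_s: float) -> float:
    """Average of one disk's read and write speed."""
    return mean([read_mb_s, write_mb_s])


def medium_access_speed(disks) -> float:
    """Average access speed over all disks, given (read, write) pairs."""
    return mean(disk_access_speed(r, w) for r, w in disks)


# Hypothetical figures in MB/s, for illustration only.
ssd = (500.0, 400.0)   # per-disk access speed: 450.0
hdd = (150.0, 130.0)   # per-disk access speed: 140.0
all_ssd_speed = medium_access_speed([ssd])
hybrid_speed = medium_access_speed([ssd, hdd])
```

With these figures the all-SSD medium averages 450.0 MB/s and the hybrid SSD plus HDD medium averages 295.0 MB/s, which matches the ordering between the first and second storage media.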

That is, in a specific embodiment of the present invention, the first grading rule may further include:

the first grading rule is established when the file name of the file matches the first database.

In this embodiment, a first database needs to be built in advance. If the file name of the file matches the first database, this indicates that the file needs to be stored in the first storage medium, so the first grading rule is established and the file may be added to the anti-aggregation task to migrate from the second storage medium to the first storage medium.

In a specific embodiment of the present invention, the first grading rule may further include:

when the file size of the file is smaller than a preset first numerical value, a first grading rule is established.

In this embodiment, when the file size of the file is smaller than the preset first value, it may be determined that the first grading rule is established, so that the file may be added to the anti-aggregation task to migrate from the second storage medium to the high-performance first storage medium.

In a specific embodiment of the present invention, the first grading rule may further include:

when the file is a file specified by the first user instruction, the first grading rule is established.

In this embodiment, the storage location of the file may also be specified by the user, which further improves the flexibility of the scheme. That is, when the file is a file specified by the first user instruction, this indicates that the user needs the file to be stored in the high-performance first storage medium, so it may be determined that the first grading rule is established.

The foregoing embodiments describe various designs of the first grading rule. It can be understood that, in practical applications, these designs may be combined according to actual needs, and further specific forms of the first grading rule may also be designed as needed to further ensure the flexibility of the scheme, none of which affects the implementation of the present invention.
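One possible way to combine the four designs is sketched below. Combining them with a logical OR is an assumption of this sketch, as are the class name `FirstGradingRule`, the default values, and the use of glob patterns as a stand-in for the "first database"; the original leaves the combination open.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class FirstGradingRule:
    # All defaults are hypothetical illustration values.
    access_threshold: int = 100            # preset first threshold
    name_patterns: tuple = ("*.idx",)      # stand-in for the first database
    size_limit: int = 64 * 1024            # preset first numerical value, in bytes
    user_specified: set = field(default_factory=set)  # files named by user instruction

    def holds(self, name: str, access_count: int, size: int) -> bool:
        """True when any one of the four designs is satisfied (OR is assumed)."""
        return (
            access_count >= self.access_threshold
            or any(fnmatchcase(name, p) for p in self.name_patterns)
            or size < self.size_limit
            or name in self.user_specified
        )


rule = FirstGradingRule(user_specified={"report.dat"})
```

A deployment that wanted a stricter policy could instead require several conditions at once; the text permits either choice.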

In one embodiment of the present invention, the method may further include:

before deleting the files stored in the second storage medium, when the data migration process is in error, the files stored in the second storage medium are retained, and the corresponding migration data is deleted.

In this embodiment, it is considered that an error may occur at any point while steps S101 to S103 are executed. Therefore, if the data migration process goes wrong before the file stored in the second storage medium is deleted, the file stored in the second storage medium is retained and the corresponding migration data is deleted, so as to ensure data security. Of course, if the error occurs before any migration data has been written to the new location, no migration data needs to be deleted, because none exists in that case.
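The retain-and-clean-up behaviour can be sketched as follows, using plain dictionaries as stand-ins for the two storage media and for the MDS. The function name `anti_aggregate` and the metadata record shape are hypothetical, and the read step is simplified to a dictionary lookup.

```python
def anti_aggregate(name, second, first, metadata) -> bool:
    """Move one file from the second medium to the first; roll back on error.

    `second`, `first`, and `metadata` are dict-like stand-ins (hypothetical)
    for the second storage medium, the first storage medium, and the MDS.
    """
    written = False
    try:
        data = second[name]          # read from the second medium (simplified)
        first[name] = data           # write as a non-aggregated file
        written = True
        metadata[name] = {"medium": "first", "aggregated": False}
    except Exception:
        if written:
            first.pop(name, None)    # delete the migration data already written
        return False                 # the copy in `second` is retained
    del second[name]                 # delete the source only after success
    return True


class FailingMDS(dict):
    """Test double whose metadata updates always fail."""

    def __setitem__(self, key, value):
        raise RuntimeError("metadata server unavailable")
```

If the write to the first medium itself fails, `written` is still false, so nothing is cleaned up, which corresponds to the case where no migration data exists yet.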

Similarly, in the following embodiments, whether the migration aggregation process of the file or the migration re-aggregation process of the file is performed, if an error occurs, the original file may be retained, and the corresponding migration data may be deleted, which will not be repeated in the following embodiments.

In one embodiment of the present invention, the method may further include:

receiving a first re-aggregation task;

for any 1 file pointed by the first re-aggregation task, acquiring metadata of the file, reading the file in the second storage medium through the metadata, and writing the file into an aggregation cache of the second storage medium;

each time the aggregation cache is full, writing each file in the aggregation cache into a third storage medium by migration re-aggregation, in the form of an aggregate file;

updating metadata of the files for any file written into the third storage medium from the aggregation cache, and deleting the files stored in the second storage medium;

the data access speed of the second storage medium is higher than that of the third storage medium.

The foregoing embodiments provide a high-performance first storage medium and a lower-performance second storage medium. This embodiment further provides a third storage medium whose performance is lower than that of the second storage medium, that is, the data access speed of the second storage medium is higher than that of the third storage medium. The third storage medium has a lower cost, so a larger storage space can be provided to store a large amount of data.

For example, in one embodiment of the present invention, the first storage medium is a storage medium constituted by SSDs, that is, the first storage medium may be constituted by a plurality of SSDs. The second storage medium is a storage medium composed of SSDs and HDDs (Hard Disk Drives), i.e., the second storage medium may be composed of a certain number of SSDs and a certain number of HDDs. The third storage medium is a storage medium constituted by HDDs, that is, the third storage medium may be constituted by a plurality of HDDs.

In this embodiment, in order to ensure the data recovery speed, the third storage medium stores file data in the form of aggregate files. In addition, because the first storage medium is high-performance, storing file data there without aggregation does not unduly affect the data recovery speed; therefore, in the present application, the first storage medium stores file data in the form of non-aggregate files.

In such an embodiment, the first re-aggregate task may be received by a hierarchical migration service of the second storage medium. Since the first re-aggregation task may include a batch of files, i.e., the first re-aggregation task may point to a plurality of files, file-by-file processing may be performed.

For any 1 file pointed to by the first re-aggregation task, the hierarchical migration service of the second storage medium needs to acquire the metadata of the file. Since file metadata is generally managed by the MDS, in practical applications the hierarchical migration service of the second storage medium may traverse the files of the first re-aggregation task and, for each file, acquire its metadata from the MDS.

After the files in the second storage medium are read, the files need to be written into an aggregation cache of the second storage medium so as to facilitate subsequent re-aggregation. The re-aggregation described in the present application refers to storing file data in the form of an aggregate file in the storage medium before and after migration, where the file data is stored in the form of an aggregate file in both the second storage medium and the third storage medium.

In practical applications, when a file in the second storage medium is read, as described in fig. 3, the hierarchical migration service of the second storage medium may send a read request and may receive a response indicating that the reading is completed. Likewise, when writing a file to the third storage medium, a write data request may be sent and a response indicating that the write is complete may be received.

The size of the aggregate cache may be set as desired, for example, in one scenario the aggregate large file is 512M and the aggregate cache is set to 4M.

Each time the aggregate cache is full, each file in the aggregate cache needs to be written into the third storage medium by migration re-aggregation, i.e., all the files in the aggregate cache are written into the third storage medium, where they are stored in the form of aggregate files.

After writing all files in the aggregate cache to the third storage medium, metadata of the file needs to be updated for any file written to the third storage medium by the aggregate cache, and typically, location information may be updated and the file stored in the second storage medium may be deleted.
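The buffering behavior described above can be sketched as follows. This is an illustrative model only: the class name `AggregationCache`, the flush callback, and the policy of flushing early when the next file would not fit are assumptions, as is draining the remainder at the end of a task. The third storage medium is modeled as a list of batches, each batch standing in for data written in aggregated form.

```python
class AggregationCache:
    """Buffers small files and flushes them as one batch when full (sketch)."""

    def __init__(self, capacity, flush):
        self.capacity = capacity      # cache size in bytes (4M in the example above)
        self.flush = flush            # callback that writes one batch to the target
        self.pending = []             # (name, data) pairs awaiting the next flush
        self.used = 0                 # bytes currently buffered

    def add(self, name, data):
        # Assumption: flush first if this file would overflow the cache.
        if self.pending and self.used + len(data) > self.capacity:
            self.drain()
        self.pending.append((name, data))
        self.used += len(data)
        if self.used >= self.capacity:    # cache is full: write everything out
            self.drain()

    def drain(self):
        if self.pending:
            self.flush(self.pending)
            self.pending, self.used = [], 0


third_medium = []                         # each entry is one aggregated batch
cache = AggregationCache(capacity=8,
                         flush=lambda batch: third_medium.append(list(batch)))
for name, data in [("f1", b"aaaa"), ("f2", b"bbbb"), ("f3", b"cc")]:
    cache.add(name, data)
cache.drain()                             # assumption: flush the remainder at task end
```

With an 8-byte cache, the first two 4-byte files fill the cache and are flushed together, and the third file is flushed by the final drain.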

In a specific embodiment of the present invention, for any file written to the third storage medium from the aggregation cache, updating metadata of the file may specifically include:

for any file written into the third storage medium from the aggregation cache, updating the location information in the metadata of the file and updating the aggregation attribute.

In this embodiment, for any file written to the third storage medium from the aggregate cache, the storage location of the file has changed, so the location information in the metadata of the file needs to be updated. In addition, because file data is stored in the form of aggregate files in both the second storage medium and the third storage medium, the aggregate file that holds the data changes with the migration, so the aggregation attribute also needs to be updated to avoid errors.
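A metadata record of this kind might be modeled as below. The field names (`medium`, `aggregate_id`, `offset`, `length`) and the identifiers in the usage lines are hypothetical; the original text only states that location information and an aggregation attribute exist, not how they are laid out. The second helper reflects the anti-aggregation case, where the aggregation attribute is cleared rather than updated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AggregateRef:
    aggregate_id: str   # which aggregate file holds the data (hypothetical field)
    offset: int         # byte offset of the file inside the aggregate
    length: int         # byte length of the file


@dataclass
class FileMeta:
    name: str
    medium: str                        # location information, e.g. "second"
    aggregate: Optional[AggregateRef]  # None means a non-aggregate file


def update_after_reaggregation(meta, new_medium, new_ref):
    """Re-aggregation: both the location and the aggregation attribute change."""
    meta.medium = new_medium
    meta.aggregate = new_ref
    return meta


def update_after_anti_aggregation(meta, new_medium):
    """Anti-aggregation: the location changes and the attribute is cleared."""
    meta.medium = new_medium
    meta.aggregate = None
    return meta


meta = FileMeta("a.txt", "second", AggregateRef("agg-2-0001", 0, 5))
update_after_reaggregation(meta, "third", AggregateRef("agg-3-0007", 4096, 5))
hot = FileMeta("b.txt", "second", AggregateRef("agg-2-0001", 5, 3))
update_after_anti_aggregation(hot, "first")
```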

In practical applications, when updating metadata of a file, as described in fig. 3, the hierarchical migration service of the second storage medium may send a request for updating metadata of the file, and may receive a response sent by the MDS indicating that the metadata update is completed. Also, when deleting a file stored in the second storage medium, a deletion request may be transmitted, and a response indicating that deletion is completed may be received.

In addition, in practical applications, once every file pointed to by the first re-aggregation task has been migrated to the third storage medium, as shown in fig. 3, the hierarchical migration service of the second storage medium may report to the MDS that the first re-aggregation task is completed.

The specific manner in which files are added to the first re-aggregation task can be set and adjusted according to actual needs, that is, the specific rules of the file classification strategy can be set and adjusted according to actual needs. For example, in one specific embodiment of the present invention, any 1 file pointed to by the first re-aggregation task is a file satisfying a preset third classification rule.

The third classification rule includes: when the number of accesses to the file within the first duration is greater than or equal to a preset third threshold and smaller than a preset second threshold, the third classification rule is satisfied; wherein the second threshold is greater than the third threshold.

It can be understood that when the number of accesses to a file within the first duration is low, the file should be moved down to a lower-performance storage medium, so that it does not occupy the storage space of the high-performance storage medium, which improves product competitiveness.

The first threshold value described in the above embodiment is denoted as a, the number of accesses of the file in the first duration is denoted as X, and in the above embodiment, when X is equal to or greater than a, it is determined that the first classification rule is established and the file is added to the anti-aggregation task, so that the file is migrated from the low-speed second storage medium to the high-speed first storage medium.

In this embodiment, when b > X ≥ c, it is determined that the third classification rule is satisfied, and the file is added to the first re-aggregation task so as to migrate the file from the low-speed second storage medium to the lower-speed third storage medium. Here b is the second threshold and c is the third threshold; b should be greater than c, and it will be appreciated that b is less than a, i.e., the second threshold is less than the first threshold described previously.
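As a small illustration, the third classification rule can be written as a predicate on the access count. The function name and the choice to raise an error when the thresholds are mis-ordered are assumptions made for this sketch, and the threshold values in the usage lines are arbitrary.

```python
def third_rule_holds(x, b, c):
    """Return True when b > x >= c, i.e. the third classification rule.

    x is the access count within the first duration, b the second threshold,
    c the third threshold.
    """
    if not b > c:
        raise ValueError("the second threshold b must exceed the third threshold c")
    return c <= x < b


at_lower_bound = third_rule_holds(5, b=10, c=5)    # x == c is included
at_upper_bound = third_rule_holds(10, b=10, c=5)   # x == b is excluded
below_range = third_rule_holds(4, b=10, c=5)
```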

It will be further appreciated that, as described above, whether the first classification rule is satisfied may be determined based on the file name, the file size, or a user instruction, in addition to the access frequency of the file. Similarly, in practical applications, whether the third classification rule is satisfied, as well as whether the second classification rule and the fourth classification rule described in other embodiments are satisfied, may be determined based on one or more of these factors. Since the principle is the same as described above, the description is not repeated here.

In one embodiment of the present invention, the method may further include:

aiming at any 1 file pointed by a migration aggregation task, a hierarchical migration service in a first storage medium acquires metadata of the file, reads the file in the first storage medium through the metadata, and writes the read file into a second storage medium in a migration aggregation mode;

after any file is written to the second storage medium, the hierarchical migration service in the first storage medium updates the metadata of the file and deletes the file stored in the first storage medium.

In this embodiment, a process is described in which a hierarchical migration service in a first storage medium performs migration aggregation of files. Specifically, the hierarchical migration service in the first storage medium may receive a migration aggregation task sent by the MDS in the first storage medium.

For any 1 file pointed by the migration aggregation task, the hierarchical migration service in the first storage medium can acquire metadata of the file, and then the file in the first storage medium is read through the metadata.

Since the file data is stored in the second storage medium in the form of an aggregate file, the file data is stored in the first storage medium in the form of a non-aggregate file, the read file is written in the second storage medium in the form of migration aggregation, i.e., the file read in the first storage medium is written in the second storage medium, and after writing, the file is stored in the second storage medium in the form of an aggregate file.

After writing any file to the second storage medium, the hierarchical migration service in the first storage medium needs to update the metadata of the file via the MDS and delete the original file in the first storage medium.
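The order of operations in migration aggregation (read, write in aggregated form, update metadata, delete the original) can be sketched as below. The model is deliberately simple and rests on assumptions: the first storage medium is a dictionary, a single aggregate file in the second storage medium is a `bytearray` that files are appended to, and the metadata layout is hypothetical.

```python
def migrate_and_aggregate(name, first_medium, aggregate, metadata):
    """Move one non-aggregate file into an aggregate file (illustrative only)."""
    data = first_medium[name]              # read the non-aggregate file
    offset = len(aggregate)                # append position inside the aggregate
    aggregate.extend(data)                 # write in aggregated form
    metadata[name] = {                     # update location and aggregation info
        "medium": "second",
        "offset": offset,
        "length": len(data),
    }
    del first_medium[name]                 # delete the original last


first_medium = {"a": b"abc", "b": b"de"}
aggregate = bytearray()                    # stands in for one aggregate file
metadata = {}
for name in list(first_medium):
    migrate_and_aggregate(name, first_medium, aggregate, metadata)

# A file is read back from the aggregate through its metadata.
m = metadata["b"]
recovered = bytes(aggregate[m["offset"]:m["offset"] + m["length"]])
```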

The specific manner in which files are added to the migration aggregation task can be set and adjusted according to actual needs, that is, the specific rules of the file classification strategy can be set and adjusted according to actual needs. For example, in one specific embodiment of the invention, any 1 file pointed to by the migration aggregation task is a file satisfying a preset second classification rule; the second classification rule includes: when the number of accesses to the file within the first duration is greater than or equal to a preset second threshold and smaller than a preset first threshold, the second classification rule is satisfied.

As described above, when the number of accesses to a file within the first duration is low, the file needs to be moved down to a lower-performance storage medium, and when the number of accesses is high, the file needs to be moved up to a higher-performance storage medium, which improves product competitiveness.

In this embodiment, when a > X ≥ b, it is determined that the second classification rule is satisfied, and the file is added to the migration aggregation task so as to migrate the file from the high-speed first storage medium to the low-speed second storage medium. Here a is the first threshold and b is the second threshold; a should be greater than b, i.e., the second threshold is less than the first threshold.

It will be further appreciated that, in the foregoing embodiments, whether the first classification rule is satisfied may be determined based on the file name, the file size, or a user instruction, in addition to the access frequency of the file. Similarly, in practical applications, whether the second classification rule is satisfied may be determined based on one or more of these factors. Since the principle is the same as described above, the description is not repeated here.

In one embodiment of the present invention, the method may further include:

aiming at any 1 file pointed by the second re-aggregation task, the hierarchical migration service in the third storage medium acquires metadata of the files, reads the files in the third storage medium through the metadata and writes the files into an aggregation cache of the third storage medium;

each time the aggregation cache is full, the hierarchical migration service in the third storage medium writes each file in the aggregation cache into the fourth storage medium in the form of aggregation file migration aggregation again;

updating metadata of the files for any file written into the fourth storage medium from the aggregation cache, and deleting the files stored in the third storage medium;

In this embodiment, a fourth storage medium of lower speed is also provided, i.e. the data access speed of the third storage medium is higher than the data access speed of the fourth storage medium. File data is stored in the fourth storage medium in the form of an aggregate file, so that the speed of data recovery is ensured.

In this embodiment, a process of migrating files from the third storage medium to the fourth storage medium is described, and since the third storage medium and the fourth storage medium each store file data in the form of an aggregate file, the process is also a migration and re-aggregation process. In other words, the principle of this embodiment is the same as that of the above embodiment in which a file is migrated from the second storage medium to the third storage medium, and thus a description of the process of this embodiment will not be repeated.

The fourth storage medium has lower cost, can design larger storage space and realizes the storage of a large amount of data. For example, in the embodiment of fig. 4, the first storage medium is a storage medium composed of an SSD, the second storage medium is a storage medium composed of an SSD and an HDD, the third storage medium is a storage medium composed of an HDD, and the fourth storage medium is a storage medium composed of a tape library and/or an optical disk. And in fig. 4, different access frequencies of data are represented by hot data, warm data, cold data, and ice data.

In practical applications, the fourth storage medium usually stores only backup data that is not accessed, i.e., data that is accessed very infrequently or not at all. For example, in one embodiment of the present invention, any 1 file pointed to by the second re-aggregation task is a file that satisfies a preset fourth classification rule;

the fourth classification rule includes: when the number of accesses to the file within the first duration is smaller than a preset third threshold, the fourth classification rule is satisfied.

In this embodiment, when c > X, it is determined that the fourth classification rule is satisfied, and the file is added to the second re-aggregation task to migrate the file from the third storage medium to the lowest speed fourth storage medium. c is a third threshold.
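Taken together, the four access-count rules can be sketched as a single selection function. This is an interpretation rather than the stated method: it assumes each rule is evaluated only for files currently on the source medium of the corresponding task, and it returns `None` for combinations the text does not cover. The function name and the threshold values in the usage lines are hypothetical.

```python
def select_task(current_medium, x, a, b, c):
    """Pick the migration task for a file from its access count x (sketch).

    a, b, c are the first, second and third thresholds, with a > b > c.
    """
    if not a > b > c:
        raise ValueError("thresholds must satisfy a > b > c")
    if current_medium == "second" and x >= a:
        return "anti-aggregation"            # first rule: second -> first
    if current_medium == "first" and b <= x < a:
        return "migration aggregation"       # second rule: first -> second
    if current_medium == "second" and c <= x < b:
        return "first re-aggregation"        # third rule: second -> third
    if current_medium == "third" and x < c:
        return "second re-aggregation"       # fourth rule: third -> fourth
    return None                              # no rule in the text applies


A, B, C = 100, 10, 1
hot_in_second = select_task("second", 150, A, B, C)
warm_in_first = select_task("first", 50, A, B, C)
cold_in_second = select_task("second", 5, A, B, C)
ice_in_third = select_task("third", 0, A, B, C)
hot_in_first = select_task("first", 150, A, B, C)
```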

In one embodiment of the present invention, the method may further include:

the hierarchical migration service in the fourth storage medium receives the third re-aggregation task;

aiming at any 1 file pointed by the third re-aggregation task, the hierarchical migration service in the fourth storage medium acquires metadata of the files, reads the files in the fourth storage medium through the metadata and writes the files into an aggregation cache of the fourth storage medium;

each time the aggregation cache is full, the hierarchical migration service in the fourth storage medium writes each file in the aggregation cache into the third storage medium in the form of aggregation file migration aggregation again;

And updating metadata of the files for any file written in the third storage medium from the aggregation cache, and deleting the files stored in the fourth storage medium.

In this embodiment, a process of migrating files from the fourth storage medium to the third storage medium is described. Since the third storage medium and the fourth storage medium both store file data in the form of aggregate files, the process is also a migration re-aggregation process. In other words, the principle of this embodiment is the same as that of the above embodiments describing migration of a file from the second storage medium to the third storage medium and from the third storage medium to the fourth storage medium, except that the migration direction is different; therefore, the migration process of this embodiment is not described again. The difference in migration direction is that the preceding 2 embodiments describe downward migration, i.e., migrating files into a lower-speed storage medium, whereas this embodiment describes upward migration, i.e., migrating files into a higher-speed storage medium.

Because this embodiment also supports the migration re-aggregation process during upward migration, lifecycle management of aggregated small files becomes more comprehensive.

In one embodiment of the present invention, the method may further include:

aiming at any 1 file pointed by the fourth re-aggregation task, the hierarchical migration service in the third storage medium acquires metadata of the files, reads the files in the third storage medium through the metadata and writes the files into an aggregation cache of the third storage medium;

each time the aggregation cache is full, the hierarchical migration service in the third storage medium writes each file in the aggregation cache into the second storage medium in the form of aggregation file migration aggregation again;

and updating metadata of the files for any file written in the second storage medium from the aggregation cache, and deleting the files stored in the third storage medium.

In this embodiment, a process of migrating files from the third storage medium to the second storage medium is described. Since the third storage medium and the second storage medium both store file data in the form of aggregate files, the process is also a migration re-aggregation process. In other words, the principle of this embodiment is the same as that of the above embodiment in which a file is migrated from the fourth storage medium to the third storage medium; the migration direction is likewise upward, i.e., the file is migrated into a higher-speed storage medium.

By combining this embodiment with the previous embodiments, full life cycle management of aggregated small files can be realized on the basis of 4 grades of storage media. This full life cycle management covers the migration aggregation process, in which non-aggregated files become aggregated small files; the anti-aggregation process, in which aggregated small files become non-aggregated files; and the migration re-aggregation process, in which aggregated small files remain aggregated after migration.
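The three kinds of process follow directly from whether the source and target media store aggregate files. The sketch below encodes that relationship; the table `AGGREGATED` and the function `process_kind` are hypothetical names introduced for illustration.

```python
# Whether each medium stores file data as aggregate files, per the embodiments.
AGGREGATED = {"first": False, "second": True, "third": True, "fourth": True}


def process_kind(src, dst):
    """Name the process implied by a migration from `src` to `dst`."""
    src_agg, dst_agg = AGGREGATED[src], AGGREGATED[dst]
    if not src_agg and dst_agg:
        return "migration aggregation"       # non-aggregate -> aggregate
    if src_agg and not dst_agg:
        return "anti-aggregation"            # aggregate -> non-aggregate
    if src_agg and dst_agg:
        return "migration re-aggregation"    # aggregate on both sides
    raise ValueError("no aggregation is involved between these media")


# The six hops between adjacent grades described in the embodiments.
hops = [("first", "second"), ("second", "first"),
        ("second", "third"), ("third", "second"),
        ("third", "fourth"), ("fourth", "third")]
kinds = [process_kind(s, d) for s, d in hops]
```

Only the hops touching the first storage medium change the storage form; every other hop is a re-aggregation in one direction or the other.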

In the above embodiments, the operations of steps S101 to S103 may be performed by the hierarchical migration service of the second storage medium. Corresponding to the above method embodiments, an embodiment of the present invention further provides a data migration method applied to the metadata service; the two descriptions may be referred to in correspondence with each other. The method comprises the following steps performed by the metadata service of the second storage medium:

step one: generating an anti-aggregation task and sending the anti-aggregation task to the hierarchical migration service;

step two: for any 1 file pointed by the anti-aggregation task, sending metadata of the file to the hierarchical migration service, so that the hierarchical migration service reads the file in the second storage medium through the metadata, and writing the read file into the first storage medium in an anti-migration aggregation mode;

step three: updating metadata of the files after the hierarchical migration service writes any one of the files into the first storage medium, and deleting the files stored in the second storage medium by the hierarchical migration service after the updating is completed;

Since the principle is consistent with the above, the description is not repeated here.
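The division of work between the metadata service and the hierarchical migration service in steps one to three can be sketched as two cooperating objects. Everything here is a simplified assumption: the class and method names are hypothetical, calls are synchronous, and the storage media are dictionaries. The point illustrated is the ordering, in which the source copy is deleted only after the metadata update has completed.

```python
class MetadataService:
    def __init__(self, metadata, events):
        self.metadata = metadata      # name -> {"medium": ..., "aggregated": ...}
        self.events = events          # shared log of what happened, in order

    def run_anti_aggregation(self, names, service):
        self.events.append("task sent")                 # step one
        service.handle(self, names)

    def get_metadata(self, name):                       # step two
        return dict(self.metadata[name])

    def update_metadata(self, name):                    # step three
        self.metadata[name] = {"medium": "first", "aggregated": False}
        self.events.append(f"metadata updated: {name}")


class MigrationService:
    def __init__(self, second, first, events):
        self.second, self.first, self.events = second, first, events

    def handle(self, mds, names):
        for name in names:
            mds.get_metadata(name)                      # locate the file
            self.first[name] = self.second[name]        # write to the first medium
            mds.update_metadata(name)                   # wait for the update...
            del self.second[name]                       # ...then delete the source
            self.events.append(f"source deleted: {name}")


events = []
second, first = {"a": b"x"}, {}
mds = MetadataService({"a": {"medium": "second", "aggregated": True}}, events)
mds.run_anti_aggregation(["a"], MigrationService(second, first, events))
```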

In a specific embodiment of the present invention, any 1 file pointed by the anti-aggregation task is a file satisfying a preset first classification rule;

the first classification rule includes:

when the number of accesses to the file within the first duration is greater than or equal to a preset first threshold, the first classification rule is satisfied.

In a specific embodiment of the present invention, the first classification rule further includes:

In one embodiment of the present invention, after the hierarchical migration service writes any one of the files to the first storage medium, updating metadata of the file includes:

after the hierarchical migration service writes any file to the first storage medium, the location information in the metadata of the file is updated and the aggregate attributes are purged.

In one embodiment of the present invention, the method further comprises:

before the files stored in the second storage medium are deleted, when an error occurs in the data migration process, the hierarchical migration service retains the files stored in the second storage medium and deletes the corresponding migration data.

In one embodiment of the present invention, the method further comprises:

transmitting the first re-aggregation task to the hierarchical migration service;

for any 1 file pointed by the first re-aggregation task, sending metadata of the file to the hierarchical migration service, so that the hierarchical migration service reads the file in the second storage medium through the metadata and writes the file into an aggregation cache of the second storage medium;

each time the aggregation cache is fully written, the hierarchical migration service writes each file in the aggregation cache into a third storage medium in the form of re-migration aggregation of the aggregation files;

updating metadata of the files for any file written into the third storage medium from the aggregation cache, the files stored in the second storage medium being deleted by the hierarchical migration service;

In a specific embodiment of the present invention, any 1 file pointed by the first re-aggregation task is a file that satisfies a preset third classification rule;

the third classification rule includes:

when the number of accesses to the file within the first duration is greater than or equal to a preset third threshold and smaller than a preset second threshold, the third classification rule is satisfied;

wherein the second threshold is greater than the third threshold.

In one embodiment of the present invention, updating metadata of a file for any file written to a third storage medium from an aggregate cache includes:

In one embodiment of the present invention, the method may further include:

the metadata service in the first storage medium sends a migration aggregation task to the hierarchical migration service in the first storage medium;

for any 1 file pointed by a migration aggregation task, sending metadata of the file to a hierarchical migration service in a first storage medium, so that the hierarchical migration service in the first storage medium reads the file in the first storage medium through the metadata, and writing the read file in a second storage medium in a migration aggregation mode;

After the hierarchical migration service in the first storage medium writes any file to the second storage medium, the metadata of the file is updated, and the file stored in the first storage medium is deleted by the hierarchical migration service in the first storage medium.

In a specific embodiment of the present invention, any 1 file pointed by the migration aggregation task is a file satisfying a preset second classification rule;

the second classification rule includes:

when the access times of the file in the first duration is greater than or equal to a preset second threshold value and smaller than a preset first threshold value, a second classification rule is established;

wherein the second threshold is less than the first threshold.

In one embodiment of the present invention, the method further comprises:

the metadata service in the third storage medium sends the second re-aggregation task to the hierarchical migration service in the third storage medium;

for any 1 file pointed by the second re-aggregation task, sending metadata of the file to a hierarchical migration service in a third storage medium, so that the hierarchical migration service in the third storage medium reads the file in the third storage medium through the metadata and writes the file into an aggregation cache of the third storage medium;

updating metadata of the files for any file written into the fourth storage medium from the aggregation cache, the files stored in the third storage medium being deleted by the hierarchical migration service in the third storage medium;

In a specific embodiment of the present invention, any 1 file pointed by the second re-aggregation task is a file that satisfies a preset fourth classification rule;

the fourth classification rule includes:

In one embodiment of the present invention, the method further comprises:

the metadata service in the fourth storage medium sends the third re-aggregation task to the hierarchical migration service in the fourth storage medium;

for any 1 file pointed by the third re-aggregation task, sending metadata of the file to a hierarchical migration service in a fourth storage medium, so that the hierarchical migration service in the fourth storage medium reads the file in the fourth storage medium through the metadata and writes the file into an aggregation cache of the fourth storage medium;

and updating metadata of the files for any file written in the third storage medium by the aggregation cache, and deleting the files stored in the fourth storage medium by the hierarchical migration service in the fourth storage medium.

In one embodiment of the present invention, the method further comprises:

the metadata service in the third storage medium sends a fourth re-aggregation task to the hierarchical migration service in the third storage medium;

for any 1 file pointed by the fourth re-aggregation task, sending metadata of the file to a hierarchical migration service in a third storage medium, so that the hierarchical migration service in the third storage medium reads the file in the third storage medium through the metadata and writes the file into an aggregation cache of the third storage medium;

and updating metadata of the files for any file written in the second storage medium by the aggregation cache, and deleting the files stored in the third storage medium by the hierarchical migration service in the third storage medium.

Corresponding to the above method embodiment, the embodiment of the present invention further provides a data migration system, which can be referred to above in a mutually corresponding manner.

Referring to fig. 5, the data migration system may include:

a task receiving module 501, configured to receive an anti-aggregation task;

the anti-migration aggregation module 502 is configured to obtain metadata of the files for any 1 file pointed by the anti-aggregation task, read the files in the second storage medium through the metadata, and write the read files in the first storage medium in an anti-migration aggregation mode;

an update deletion module 503, configured to update metadata of a file after writing any file into a first storage medium, and delete a file stored in a second storage medium;

the first classification rule includes:

In one embodiment of the present invention, the update deletion module 503 updates metadata of a file after writing any file to a first storage medium, including:

In a specific embodiment of the present invention, the apparatus further includes an error reset module configured to:

In one embodiment of the present invention, the system further comprises a re-aggregation module configured for:

receiving a first re-aggregation task;

the third classification rule includes:

Wherein the second threshold is greater than the third threshold.

In one embodiment of the present invention, for any file written to the third storage medium from the aggregate cache, updating metadata of the file, and deleting the file stored in the second storage medium, including:

In one embodiment of the present invention, the hierarchical migration service in the first storage medium is further configured to:

The second classification rule includes:

wherein the second threshold is less than the first threshold.

In one embodiment of the present invention, the hierarchical migration service in the third storage medium is further configured to:

the fourth classification rule includes:

In one embodiment of the present invention, the hierarchical migration service in the fourth storage medium is further configured to:

In one embodiment of the present invention, the first storage medium is a storage medium composed of an SSD, the second storage medium is a storage medium composed of an SSD and an HDD, the third storage medium is a storage medium composed of an HDD, and the fourth storage medium is a storage medium composed of a tape library and/or an optical disk.

Corresponding to the above method and system embodiments, the embodiments of the present invention further provide a data migration device and a computer readable storage medium, which may be referred to above in correspondence with each other.

Referring to fig. 6, the data migration apparatus may include:

a memory 601 for storing a computer program;

a processor 602 for executing a computer program to implement the steps of the data migration method as described in any of the above.

Referring to fig. 7, a computer program 71 is stored on the computer readable storage medium 70, and the computer program 71, when executed by a processor, implements the steps of the data migration method as in any of the embodiments described above. The computer readable storage medium 70 described herein includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The principles and embodiments of the present invention have been described herein with reference to specific examples; the description of these examples is intended only to aid understanding of the technical solution of the present invention and its core ideas. It should be noted that those skilled in the art may make modifications and improvements to the present invention without departing from its spirit.

Claims

1. A method of data migration, comprising:

receiving an anti-aggregation task;

the second storage medium stores file data in the form of aggregate files, the first storage medium stores file data in the form of non-aggregate files, and the data access speed of the first storage medium is higher than that of the second storage medium;

any one file pointed to by the anti-aggregation task is a file meeting a preset first grading rule;

the first grading rule includes:

2. The data migration method of claim 1, wherein the first grading rule further comprises:

3. The data migration method of claim 1, wherein the first grading rule further comprises:

4. The data migration method of claim 1, wherein the first grading rule further comprises:

5. The data migration method of claim 1, wherein updating metadata of the file after writing any of the files to the first storage medium comprises:

6. The data migration method of claim 1, further comprising:

7. The data migration method of claim 1, further comprising:

receiving a first re-aggregation task;

8. The data migration method of claim 7, wherein any one file pointed to by the first re-aggregation task is a file satisfying a preset third grading rule;

the third grading rule includes:

wherein the second threshold is greater than the third threshold.

9. The data migration method of claim 7, wherein updating metadata of any file written to the third storage medium by the aggregate cache comprises:

10. The data migration method of claim 1, further comprising:

11. The data migration method according to claim 10, wherein any one file pointed to by the migration aggregation task is a file satisfying a preset second grading rule;

the second grading rule includes:

wherein the second threshold is less than the first threshold.

12. The data migration method of any one of claims 1 to 11, further comprising:

13. The data migration method of claim 12, wherein any one file pointed to by the second re-aggregation task is a file satisfying a preset fourth grading rule;

the fourth grading rule includes:

14. The data migration method of claim 12, further comprising:

15. The data migration method of claim 12, further comprising:

16. The data migration method according to claim 12, wherein the first storage medium is a storage medium constituted by an SSD, the second storage medium is a storage medium constituted by an SSD and an HDD, the third storage medium is a storage medium constituted by an HDD, and the fourth storage medium is a storage medium constituted by a tape library and/or an optical disk.

17. A method of data migration, comprising:

the first grading rule includes:

18. A data migration system, comprising:

the task receiving module is used for receiving the anti-aggregation task;

the first grading rule includes:

19. A data migration apparatus, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the data migration method of any one of claims 1 to 16.

20. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data migration method according to any one of claims 1 to 16.