Background
With the explosive growth of data, clusters that store massive amounts of data have emerged. The volume of data stored in such a cluster is large, in some cases reaching the petabyte (PB) level, and a large number of data accesses may occur at the same time. Whether the storage location of the data is reasonable therefore directly affects the access latency of the data.
Data migration refers to the movement of data between different storage media. To complete the data migration, the system needs to consume certain resources (including hardware resources, bandwidth, etc.).
In a cluster that stores massive data, data migration is needed in several situations, such as system upgrades and updates, backup/recovery of data, and dynamic adjustment of data in a hierarchical storage system. A system upgrade is a large-scale, one-time migration, and backup/recovery of data is driven by security considerations. These two types of migration typically occur infrequently and, in many cases, must be completed at a given time.
Data in a hierarchical storage system is dynamically adjusted in order to keep the data in reasonable locations. Hierarchical storage is based on the "80/20 rule" of data activity: roughly 20% of the data is active and 80% is inactive. Servers are divided into storage levels according to hardware performance, and the higher the storage level, the better the performance. The hierarchical storage system aims to keep active data on high-level storage. However, the activity of data changes over time, so even if the data placement is reasonable at a certain time, it becomes outdated after a period of time, and data migration is needed to restore a reasonable placement.
However, migration consumes resources. If migrations are too infrequent, the advantages of the hierarchical storage system cannot be fully realized; if they are too frequent, the system spends too many resources on internal consumption, which may reduce the quality of service it provides to other services. The timing of migration is therefore important: it must allow the advantages of the hierarchical storage system to be realized without excessive resources being used for internal consumption.
Currently, there are two methods for determining migration timing. One is the fixed period method: the hierarchical storage system performs data migration at fixed intervals, regardless of any other conditions. The other is the residual space monitoring method: the residual space of the primary storage is monitored, and migration is started when the residual space is insufficient.
However, both methods have disadvantages. With the fixed period method, if the period is set too long, the advantages of hierarchical storage are hard to realize; if it is set too short, system resources are frequently spent on internal consumption. The load of the cluster changes dynamically and different applications have different data access patterns, so no single period suits all application scenarios. In other words, the fixed period method has difficulty adapting to dynamic changes in system load.
The residual space monitoring method works when a large amount of data is being written to the cluster. In many cases, however, the cluster no longer receives writes but is read frequently, while the activity of the data keeps changing. In this case the residual space of the primary storage never becomes insufficient, so migration is never started; the hierarchical storage system cannot adjust the data placement and cannot perform its intended function.
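By way of illustration only, the two existing triggers can be sketched as follows. The function names, the byte counts, and the 20% free-space ratio are hypothetical and do not come from the original text; the sketch merely shows why a read-only workload never fires the residual space trigger.

```python
def fixed_period_due(now, last_migration, period):
    """Fixed period method: fire purely on elapsed time, ignoring load."""
    return now - last_migration >= period


def residual_space_due(used_bytes, capacity_bytes, min_free_ratio):
    """Residual space monitoring method: fire only when free space runs short."""
    free_ratio = (capacity_bytes - used_bytes) / capacity_bytes
    return free_ratio < min_free_ratio


# Read-only workload (assumed numbers): usage stays flat at 50%, so the
# residual space trigger never fires even if data activity keeps changing.
read_only_checks = [residual_space_due(500, 1000, 0.2) for _ in range(5)]
```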
Therefore, a reasonable method for determining migration timing is needed. On the one hand, the method should adapt to dynamic changes in system load; on the other hand, it should allow data to be stored on appropriate nodes, so that the system processes active data more efficiently, data access performance is improved, and the overall access latency is reduced.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining data migration timing, aiming to solve the problem that a proper data migration timing cannot currently be selected dynamically, so that the advantages of a hierarchical storage system can be realized without excessive resources being used for internal consumption.
To this end, the embodiment of the invention provides the following technical solution:
an apparatus for determining data migration timing, comprising:
a monitoring module, connected with the judging module and used for starting two threads that respectively monitor the space utilization rate of primary storage in the hierarchical storage system and the data migration period;
a judging module, connected with the monitoring module and the data migration module respectively, and used for periodically judging, through the two threads, whether the data migration condition is met;
a data migration module, used for performing the data migration;
and an adjusting module, connected with the data migration module and used for adjusting the data migration period.
The embodiment of the invention also provides a method for determining data migration timing, which comprises the following steps:
a: starting two threads that respectively monitor the space utilization rate of primary storage in the hierarchical storage system and the data migration period;
b: the two threads periodically judge whether the data migration condition is met; if not, they wait for the same interval and judge again, repeating until the data migration condition is met; if so, step c is executed;
c: performing the data migration;
d: adjusting the data migration period according to the migration condition;
e: after the data migration period is adjusted, the two threads re-read the parameters and continue with the next round of monitoring.
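By way of illustration only, steps a to e can be sketched with two polling threads as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `wait_for_trigger` and `run_rounds`, the reason strings "space" and "period", and the `migrate` and `adjust` callables are all hypothetical.

```python
import threading
import time


def wait_for_trigger(get_utilization, threshold, period, check_interval):
    """Steps a-b: two threads poll until one of them reports a trigger reason."""
    fired = threading.Event()
    reasons = []                      # holds the single reason that fired first
    lock = threading.Lock()
    start = time.monotonic()

    def poll(condition, reason):
        while not fired.is_set():
            if condition():
                with lock:
                    if not fired.is_set():   # only the first trigger is recorded
                        reasons.append(reason)
                        fired.set()
                return
            fired.wait(check_interval)       # wait the same interval, judge again

    threads = [
        threading.Thread(
            target=poll,
            args=(lambda: get_utilization() > threshold, "space")),
        threading.Thread(
            target=poll,
            args=(lambda: time.monotonic() - start >= period, "period")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return reasons[0]


def run_rounds(get_utilization, migrate, adjust,
               threshold, period, check_interval, rounds):
    """Steps a-e repeated for a fixed number of rounds (assumed for testing)."""
    history = []
    for _ in range(rounds):
        reason = wait_for_trigger(
            get_utilization, threshold, period, check_interval)  # steps a, b
        moved = migrate(reason)                                  # step c
        period = adjust(period, reason, moved)                   # step d
        history.append((reason, period))   # step e: next round uses new period
    return history
```

In this sketch each round restarts the two threads with the updated period, which is one simple way to model the re-reading of parameters in step e.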
Compared with the prior art, the embodiment of the invention has the following advantages:
The embodiment of the invention starts two threads that respectively monitor the space utilization rate of primary storage in the hierarchical storage system and the data migration period, periodically judges whether the data migration condition is met, and adjusts the data migration period according to which condition triggered the migration. The data migration timing selected in this way allows the space of primary storage in the hierarchical storage system to be fully utilized and allows the system to adjust data locations in time, so that the access performance of the system is optimized.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the described embodiments are only some, and not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can obtain from the embodiments of the present invention without inventive effort fall within the scope of the present invention.
Fig. 1 is a flowchart of a method for determining a data migration opportunity according to a first embodiment of the present invention, and for convenience of description, only the portions related to the embodiment of the present invention are shown.
As shown in fig. 1, the method comprises the steps of:
Step 101, starting two threads that respectively monitor the space utilization rate of primary storage in the hierarchical storage system and the data migration period.
Step 102, the two threads periodically judge whether the data migration condition is met; if not, they wait for the same interval and judge again, repeating until the data migration condition is met; if so, step 103 is executed.
Specifically, the data migration condition is met when the space utilization rate of the primary storage in the hierarchical storage system exceeds a set threshold, or when the data migration period has elapsed.
Specifically, when the space utilization rate of the primary storage in the hierarchical storage system is judged not to exceed the set threshold and the data migration period has not elapsed, data migration is not triggered, and the judgment is performed again after the same time interval. Migration is triggered when the space utilization rate of the primary storage in the hierarchical storage system is judged to exceed the set threshold or the data migration period has elapsed.
Step 103, executing the data migration.
Preferably, if migration is triggered, the system records which background thread triggered the migration and selects suitable migration objects to complete the migration.
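By way of illustration only, recording the triggering thread and selecting migration objects might look like the sketch below. The text does not specify how migration objects are selected; choosing the least-accessed items is purely an assumption made here, and `TriggerLog` and `pick_coldest` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class TriggerLog:
    """Remembers which background thread triggered each migration."""
    entries: list = field(default_factory=list)

    def record(self, reason):
        # The name of the calling thread identifies the monitor that fired.
        self.entries.append((threading.current_thread().name, reason))


def pick_coldest(access_counts, how_many):
    """Assumed policy: return the `how_many` least-accessed item names."""
    return sorted(access_counts, key=access_counts.get)[:how_many]
```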
Step 104, adjusting the data migration period according to the migration condition.
Specifically, when migration is triggered because the space utilization rate of the primary storage in the hierarchical storage system exceeds the set threshold, that is, because the residual space of the primary storage is insufficient, this indicates that a large amount of data is being written and that data locations need to be adjusted promptly. The data migration period is therefore shortened, and the shortened value is used as the new data migration period.
Specifically, when migration is triggered because the migration period has elapsed but no data is actually moved, this indicates that the system had no significant read-write activity during the period. The migration period is therefore lengthened, and the lengthened value is used as the new data migration period.
Specifically, when migration is triggered because the migration period has elapsed and data is moved normally, the original migration period is left unchanged.
Preferably, an upper limit value and a lower limit value are set for the data migration period. When the shortened data migration period would fall below the lower limit value, the lower limit value is used as the new data migration period; when the lengthened data migration period would exceed the upper limit value, the upper limit value is used as the new data migration period.
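By way of illustration only, the adjustment rules with upper and lower limits can be written as a single function. The text does not say by how much the period is shortened or lengthened; the `factor` of 2, the function name `adjust_period`, and the reason strings are assumptions.

```python
def adjust_period(period, reason, moved, lower, upper, factor=2.0):
    """Return the new data migration period, clamped to [lower, upper].

    reason: "space" if the utilization threshold triggered the migration,
            "period" if the migration period elapsed (assumed labels).
    moved:  number of items actually moved by the migration.
    """
    if reason == "space":
        # Heavy writes: shorten the period, but not below the lower limit.
        return max(lower, period / factor)
    if reason == "period" and moved == 0:
        # Period elapsed with nothing to move: lengthen, capped at the upper limit.
        return min(upper, period * factor)
    # Period elapsed with normal data movement: keep the period unchanged.
    return period
```

For example, with limits of 10 and 600 seconds, a 60-second period becomes 30 seconds after a space-triggered migration and 120 seconds after an idle period-triggered one.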
Step 105, after the data migration period is adjusted, the two threads re-read the parameters and continue with the next round of monitoring.
Preferably, after the data migration period is adjusted, the two threads re-read the updated data migration period parameter and judge again whether the data migration condition is met.
Based on the same concept, the second embodiment of the present invention further provides an apparatus for determining data migration timing. As shown in fig. 2, the apparatus includes a monitoring module 201, a judging module 202, a data migration module 203, and an adjusting module 204.
The monitoring module 201 is connected to the judging module 202, and is configured to start two threads that respectively monitor the space utilization rate of primary storage in the hierarchical storage system and the data migration period.
The judging module 202 is connected to the monitoring module 201 and the data migration module 203, respectively, and is configured to periodically judge, through the two threads, whether the data migration condition is met.
Specifically, the data migration condition is met when the space utilization rate of the primary storage in the hierarchical storage system exceeds a set threshold, or when the data migration period has elapsed.
Specifically, when the space utilization rate of the primary storage in the hierarchical storage system is judged not to exceed the set threshold and the data migration period has not elapsed, data migration is not triggered, and the judgment is performed again after the same time interval. Migration is triggered when the space utilization rate of the primary storage in the hierarchical storage system is judged to exceed the set threshold or the data migration period has elapsed.
The data migration module 203 is configured to perform the data migration.
Preferably, if migration is triggered, the system records which background thread triggered the migration and selects suitable migration objects to complete the migration.
The adjusting module 204 is connected to the data migration module 203, and is configured to adjust the data migration period.
Specifically, when migration is triggered because the space utilization rate of the primary storage in the hierarchical storage system exceeds the set threshold, that is, because the residual space of the primary storage is insufficient, this indicates that a large amount of data is being written and that data locations need to be adjusted promptly. The data migration period is therefore shortened, and the shortened value is used as the new data migration period.
Specifically, when migration is triggered because the migration period has elapsed but no data is actually moved, this indicates that the system had no significant read-write activity during the period. The migration period is therefore lengthened, and the lengthened value is used as the new data migration period.
Specifically, when migration is triggered because the migration period has elapsed and data is moved normally, the original migration period is left unchanged.
Preferably, an upper limit value and a lower limit value are set for the data migration period. When the shortened data migration period would fall below the lower limit value, the lower limit value is used as the new data migration period; when the lengthened data migration period would exceed the upper limit value, the upper limit value is used as the new data migration period.
Preferably, the adjusting module 204 is also connected to the monitoring module 201, and is configured so that, after the two threads re-read the parameters, the two threads are started again to respectively monitor the space utilization rate of any level of storage in the hierarchical storage system and the data migration period; that is, the flow returns to step 101 of the first embodiment.
Preferably, after the data migration period is adjusted, the two threads re-read the updated data migration period parameter and judge again whether the data migration condition is met.
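By way of illustration only, the connections between modules 201 to 204 can be sketched as plain objects wired together. The class names mirror the module names in the text, but every field, method, and callable here is hypothetical, and the two monitoring threads are reduced to a single `sample` call to keep the wiring visible.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class MonitoringModule:                      # module 201
    read_utilization: Callable[[], float]
    read_elapsed: Callable[[], float]

    def sample(self) -> Tuple[float, float]:
        return self.read_utilization(), self.read_elapsed()


@dataclass
class JudgingModule:                         # module 202
    threshold: float

    def judge(self, sample, period) -> Optional[str]:
        utilization, elapsed = sample
        if utilization > self.threshold:
            return "space"
        if elapsed >= period:
            return "period"
        return None                          # condition not met: judge again later


@dataclass
class DataMigrationModule:                   # module 203
    move: Callable[[str], int]               # returns the number of items moved

    def migrate(self, reason: str) -> int:
        return self.move(reason)


@dataclass
class AdjustingModule:                       # module 204
    rule: Callable[[float, str, int], float]

    def adjust(self, period, reason, moved) -> float:
        return self.rule(period, reason, moved)


@dataclass
class Apparatus:
    monitor: MonitoringModule
    judge: JudgingModule
    migrator: DataMigrationModule
    adjuster: AdjustingModule
    period: float

    def step(self) -> Optional[str]:
        """One judgment: 201 -> 202 -> 203 -> 204, then back to 201."""
        reason = self.judge.judge(self.monitor.sample(), self.period)
        if reason is None:
            return None
        moved = self.migrator.migrate(reason)
        self.period = self.adjuster.adjust(self.period, reason, moved)
        return reason
```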
The embodiment of the invention provides a method and a device for determining data migration timing, aiming to solve the problem that a proper data migration timing cannot currently be selected dynamically. The advantages of a hierarchical storage system can be realized without excessive resource consumption; the space of primary storage in the hierarchical storage system can be fully utilized; and the system can adjust data locations in time, so that the access performance of the system is optimized.
Those skilled in the art will appreciate that the modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be relocated, with corresponding changes, to one or more apparatuses different from that of the embodiment. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for enabling a terminal device (which may be a mobile phone, a personal computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.