Industrial real-time data hierarchical storage and migration method
Technical Field
The invention relates to the technical field of data access and storage processing, in particular to a hierarchical storage and migration method for industrial real-time data.
Background
As industrial systems grow in scale and automation and information technologies continue to develop, the massive data generated by industrial automation systems sharply increases the volume of concurrent access to distributed file systems, and the resulting file read-write pressure makes file I/O a system bottleneck that must be addressed. Meanwhile, many process-control applications place high demands on the real-time availability of data. Because different storage devices differ in performance and cost, and data access exhibits temporal and spatial locality, hierarchical storage is needed: frequently accessed data should be stored on high-performance devices, while data that has not been read or written recently should be placed on low-performance devices. In addition, data access heat changes over time, often following a periodic pattern, a considerable proportion of the data in a mass storage system is static, and high-performance storage capacity is limited, so data must be migrated between tiers on the basis of hierarchical storage.
With the rapid development of solid-state drives (SSDs) and their adoption in various fields, combining solid-state drives with other media for multi-level storage has become a focus of current and future storage research. Compared with traditional hard disks, solid-state drives have pronounced advantages and disadvantages: they can improve the performance and energy consumption of a system and can serve as the flash tier of a multi-level storage system, but they are expensive, so performance, cost and energy consumption must be weighed against one another.
The earliest tiered storage applications were used mainly in archive and backup environments where access was infrequent. Moreover, devices differ in how far apart their performance is; if device pairs with a large performance gap and pairs with a small gap use the same migration trigger condition, the scalability of the system suffers. One existing method realizes unified management of files across multi-level storage devices. It provides a metadata module, a metadata server module and a target data server module; the metadata server module contains a system management and file migration decision module, migration candidate files are acquired manually and divided into an upgrade queue and a downgrade queue, and a migration scheduling controller sends migration instructions to carry out the migration. The source data server and the target data server are each provided with a data service module and a migration execution module. The main defect of this technique is that it has no systematic, specific method for determining the file migration trigger point, and migration is driven by a manually specified migration proportion, so the factors that influence data value cannot be evaluated comprehensively and accurately.
Mass data hierarchical storage technology places data on different devices according to the performance of the storage devices and migrates the data at appropriate times. However, existing hierarchical storage methods do not fully exploit the available metrics in their tiering and migration policies (that is, in their data value determination methods). Since data placement and data migration policies directly determine the overall performance of a hierarchical storage system, a more complete migration policy and hierarchical data placement method is urgently needed.
Disclosure of Invention
The invention aims to provide a method for hierarchical storage and migration of industrial real-time data, so as to solve the problem that existing mass data hierarchical storage and migration techniques do not evaluate data value fully, which degrades data storage and migration performance.
In order to achieve the above object, the present invention provides a method for hierarchical storage and migration of industrial real-time data, which comprises two parts, namely, hierarchical storage of data and hierarchical migration of data, wherein the hierarchical storage of data comprises the following steps:
I: evaluating the value of the data;
II: placing or migrating the data into an appropriate hierarchy according to its value;
the data hierarchical migration comprises the following steps:
S1: monitoring the hierarchical storage system periodically, and when the storage capacity utilization of the high-priority storage device reaches a preset first threshold, triggering the data migration calculation and executing step S2;
S2: evaluating the value of each data object in the storage devices, and sorting the data objects by value;
S3: selecting, according to a preset proportion (the second threshold), the lowest-valued data objects stored in the high-priority storage device to form a migration queue, and migrating the data objects in the migration queue to the low-priority storage device;
S4: after step S3, selecting, according to a third threshold, the lowest-valued data objects remaining in the high-priority storage device and comparing their addresses with the data object addresses currently held in the cache; if the address of a data object is already stored in the cache, migrating that data object to the low-priority storage device, and otherwise storing the address of the data object in the cache; after all comparisons are complete, letting the number of data object addresses in the cache be $N_b$; if $N_b \le N_h$, stopping the migration operation; if $N_b > N_h$, sorting the data object addresses in the cache in descending order of the values of the corresponding data objects and removing the addresses with the largest values one by one until $N_h$ addresses remain, and then stopping the migration operation, where $N_h$ is a preset upper limit on the number of data object addresses in the cache;
S5: finding the maximum value among the data objects whose addresses are currently in the cache, forming a migration queue, in descending order of value, from all data objects in the low-priority storage device whose value is greater than that maximum, and migrating them to the high-priority storage device.
Preferably, in S2, a sliding window method is used to calculate a weighted average of the values computed at each time within the sliding window, specifically:
setting the window width to N, and letting the values of the data object computed at the most recent N times within the window be $V_1, V_2, \ldots, V_N$, the value of the current data object is calculated as follows:
Preferably, in S2, the value of a data object is calculated according to the following formula:
$$V = w_1 T + w_2 C + w_3 M + w_4 \mathit{CT} + \frac{w_5}{S}$$
where T is the time factor, C is the number-of-accessing-users factor, M is the value factor of the data objects associated with the data object, CT is the contrast factor of the different storage devices, S is the size factor of the data object, and $w_1$, $w_2$, $w_3$, $w_4$ and $w_5$ are the weights of the corresponding factors.
Preferably, the time factor T is obtained as follows:
acquiring all times at which the data object has been accessed since its creation: $t_1, t_2, \ldots, t_n$, where n is a positive integer;
calculating the intervals $T_1, T_2, \ldots, T_{n-1}$ between successive accesses:
$$T_i = t_{i+1} - t_i, \quad i = 1, 2, \ldots, n-1$$
and calculating T:
$$T = \sum_{i=1}^{n-1} \alpha_i T_i$$
where $\alpha_i$, $i = 1, 2, \ldots, n-1$, is a set of predetermined weights satisfying $\sum_{i=1}^{n-1} \alpha_i = 1$ and $\alpha_1 \le \alpha_2 \le \ldots \le \alpha_{n-1}$.
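As a minimal sketch, the time factor can be computed as below, assuming that T is the weighted sum of the access intervals that the description of the weights suggests. The function name `time_factor` and the default linearly increasing weights are illustrative assumptions and are not specified by the method.

```python
def time_factor(access_times, alphas=None):
    """Weighted average of the intervals between successive accesses.

    `alphas` must be non-decreasing and sum to 1, so that recent intervals
    count at least as much as older ones.
    """
    if len(access_times) < 2:
        raise ValueError("at least two access times are needed")
    # T_i = t_{i+1} - t_i for i = 1 .. n-1
    intervals = [later - earlier
                 for earlier, later in zip(access_times, access_times[1:])]
    if alphas is None:
        # Assumed default (not from the text): weights proportional to 1, 2, ..., n-1.
        total = sum(range(1, len(intervals) + 1))
        alphas = [i / total for i in range(1, len(intervals) + 1)]
    if len(alphas) != len(intervals):
        raise ValueError("need exactly one weight per interval")
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(a > b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("weights must be non-decreasing")
    return sum(a * t for a, t in zip(alphas, intervals))
```

For access times 0, 10, 14 and 16 the intervals are 10, 4 and 2; with weights 0.2, 0.3 and 0.5 the weighted sum is 4.2, which sits closer to the short recent intervals than the plain mean of about 5.3.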
Preferably, for any data object, its associated data objects are defined as follows:
given a time-length threshold $T_{th}$, if data object $obj_1$ is accessed at any time $t_0$ and data object $obj_2$ is also accessed within the time interval from $t_0$ to $t_0 + T_{th}$, then data objects $obj_1$ and $obj_2$ are considered to be associated.
Preferably, the value factor Q of the data objects associated with data object $obj_1$ is obtained as follows:
finding the set $\Phi(obj_1)$ of data objects associated with $obj_1$;
obtaining the values of all data objects in $\Phi(obj_1)$;
summing the values of all data objects associated with $obj_1$:
$$Q = \sum_{obj \in \Phi(obj_1)} V_{obj}$$
where $V_{obj}$ is the recorded value of data object obj.
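The association rule and the summation can be sketched as follows. The access log format (a list of `(time, object_id)` pairs), the function names and the `values` dictionary are assumptions made for illustration; the sketch also assumes the interval from $t_0$ to $t_0 + T_{th}$ is closed at both ends.

```python
def associated_objects(access_log, target, t_th):
    """Objects accessed within t_th after any access to `target`.

    `access_log` is an iterable of (time, object_id) pairs.
    """
    log = list(access_log)
    target_times = [t for t, obj in log if obj == target]
    return {
        obj
        for t, obj in log
        if obj != target and any(t0 <= t <= t0 + t_th for t0 in target_times)
    }


def associated_value_factor(access_log, target, t_th, values):
    """Sum of the recorded values of all objects associated with `target`."""
    related = associated_objects(access_log, target, t_th)
    return sum(values[obj] for obj in related)
```

With the log `[(0, "a"), (2, "b"), (9, "c"), (20, "a"), (21, "c"), (40, "d")]` and a threshold of 5, objects `b` and `c` are associated with `a`, while `d` is not.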
Preferably, the period from the creation of the data object to the current time is divided into m segments, and the contrast factor CT of the different storage devices is calculated according to the following formula:
where $FW_i$ and $FR_i$ denote the write and read frequencies of the data object during the i-th period, $\beta_i$ is the weight of the i-th period, with $\beta_1 < \beta_2 < \ldots < \beta_m$, $\delta_r$ is the read contrast between two different storage devices, and $\delta_w$ is the write contrast between two different storage devices.
Preferably, for different storage devices A and B, with read contrast $\delta_r$ and write contrast $\delta_w$:
where $R_A$ and $R_B$ denote the sustained read speeds on the two devices A and B of different performance, and $W_A$ and $W_B$ denote the corresponding sustained write speeds.
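The two formulas are not reproduced in the text above, so the sketch below rests on stated assumptions rather than on the method itself: it takes each contrast to be the ratio of the sustained speeds on devices A and B, and takes CT to be the $\beta$-weighted sum of the read and write frequencies scaled by their contrasts. Both forms, and both function names, are hypothetical.

```python
def contrast(speed_a, speed_b):
    """Assumed definition: ratio of the sustained speeds on devices A and B."""
    if speed_b <= 0:
        raise ValueError("speed on device B must be positive")
    return speed_a / speed_b


def contrast_factor(read_freqs, write_freqs, betas, delta_r, delta_w):
    """Assumed form: CT = sum_i beta_i * (delta_r * FR_i + delta_w * FW_i)."""
    if not (len(read_freqs) == len(write_freqs) == len(betas)):
        raise ValueError("need one read frequency, write frequency and weight per period")
    # The text requires beta_1 < beta_2 < ... < beta_m (recent periods weigh more).
    if any(a >= b for a, b in zip(betas, betas[1:])):
        raise ValueError("betas must be strictly increasing")
    return sum(
        beta * (delta_r * fr + delta_w * fw)
        for beta, fr, fw in zip(betas, read_freqs, write_freqs)
    )
```

Under these assumptions, a device pair with read speeds 500 and 100 and write speeds 400 and 100 has contrasts 5 and 4, so an object that has recently been read often contributes more to CT than one that was only active long ago.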
Preferably, the first threshold is 80%, the second threshold is 10%, and the third threshold is 10%.
The invention also provides a hierarchical storage and migration system for industrial real-time data, which comprises:
the hierarchical storage system, which comprises a plurality of storage devices with different priorities for storing data objects and further comprises a cache, wherein the cache stores the addresses of the lower-valued data objects in the high-priority storage device, its contents change dynamically as migration proceeds, and during high-to-low migration, if the address of a selected data object is already stored in the cache, that data object is migrated to the low-priority storage device;
the value judgment manager is used for acquiring the data objects in the hierarchical storage system in real time and calculating the value of the data objects;
the data placement plan manager is used for acquiring the value from the value judgment manager, selecting the data objects to be migrated according to the value result to form a migration queue, and forming a data placement plan and a migration strategy;
the migration engine controller is used for acquiring the data placement plan and the migration strategy, sending a migration command to an application server agent and registering the application server agent;
the application server agent is used for registering to the migration engine controller during initialization and receiving the migration command to forward to the corresponding data migration/back-migration module;
and the data migration/back-migration modules, which are respectively arranged between every two storage devices with different priorities and are used for performing data migration or back-migration according to the migration command and feeding back the migration result to the migration engine controller.
Preferably, the migration engine controller includes:
the data monitoring module is used for monitoring and recording the updating condition and the value change of the data object, feeding back the updating condition to the value judgment manager, and monitoring the I/O access condition of the system;
and the data management module is used for regularly inquiring the data placement plan manager so as to update data information, send the migration command and receive the migration result.
The industrial real-time data hierarchical storage and migration system and method provided by the invention realize the following technical effects:
(1) The invention adopts a hierarchical data storage architecture for industrial automation systems and provides a complete and reasonable system, in both physical and logical structure, for migration.
(2) The invention adopts a data value determination method tailored to the storage requirements of massive industrial real-time data. The method uses a value index function that introduces a set of weight parameters to quantify influencing factors including time, the number of accessing users, the degree of association with other data, the value of associated data, the I/O access contrast of different storage devices, and the size of the data object, and uses a sliding window method to evaluate data value dynamically and thoroughly, thereby improving the accuracy of data value determination.
(3) The invention adopts a dynamic data migration strategy: a cache region is added to the traditional data migration mechanism to serve as a pending region for some of the lower-valued data objects on the high-performance device before they are migrated. If a data object's value is still low at the second value determination, a migration event is triggered; meanwhile, the maximum value of the data objects in the cache region serves as the threshold for migrating data objects on the low-performance device upward. This mechanism allows the value of data objects on the high-performance device to be evaluated dynamically and effectively suppresses repeated migration of data objects between the high-performance and low-performance devices.
Drawings
FIG. 1 is a schematic diagram of an industrial real-time data hierarchical storage and migration system architecture according to the present invention;
FIG. 2 is a flowchart of a method for hierarchical storage and migration of industrial real-time data according to the present invention.
Detailed Description
To better illustrate the present invention, a preferred embodiment is described in detail with reference to the accompanying drawings, in which:
the industrial real-time data hierarchical storage and migration system provided by the invention is applied to current general-purpose industrial data storage systems, which store massive industrial real-time data; the data can be transmitted for storage over a SAN or an IP network.
Specifically, as shown in FIG. 1, the system for hierarchical storage and migration of industrial real-time data provided in this embodiment includes a hierarchical storage system 10, a value judgment manager 20, a data placement plan manager 30, a migration engine controller 40, an application server agent 50, and a data migration/back-migration module 60. The hierarchical storage system 10 includes a plurality of storage devices with different priorities; in this embodiment these are a primary device 11, a secondary device 12 and a tertiary device 13, where the primary device 11 has a higher priority than the secondary device 12, and the secondary device 12 has a higher priority than the tertiary device 13. In this embodiment, the data migration/back-migration module 60 includes a first migration/back-migration module 61 located between the primary device 11 and the secondary device 12, and a second migration/back-migration module 62 located between the secondary device 12 and the tertiary device 13.
The hierarchical storage system 10 further includes a cache, which in this embodiment is a high-performance buffer such as a cache memory. The cache stores the addresses of the lower-valued data objects in the high-priority storage device; its contents change dynamically as migration proceeds, and the data objects it references are in a pending state. During high-to-low data migration, if the address of a selected data object is already stored in the cache, that data object is migrated to the low-priority storage device.
In this embodiment, a high-priority storage device is a high-performance storage device, for example a solid-state drive, while a low-priority storage device is a storage device whose performance is low relative to the high-priority storage device, for example a SAS or SATA hard disk. Of course, a person skilled in the art can freely select the types of the high-priority and low-priority storage devices according to the data storage requirements, provided that the storage devices of different priorities differ in read-write performance in a way that affects the storage efficiency of different data objects. The method and system of the present invention can thus be applied to any situation that requires optimization based on such differences among storage devices and data objects.
When the industrial real-time data hierarchical storage and migration system operates, the value judgment manager 20 acquires the data objects in the hierarchical storage system in real time, calculates their values, and sends the value information to the data placement plan manager 30. After obtaining the value information, the data placement plan manager 30 analyzes and weighs the quality of the value evaluation according to the value results, selects the data objects to be migrated to form a migration queue, and forms a data placement plan and a migration strategy, which it provides to the migration engine controller 40. After acquiring the data placement plan and the migration strategy, the migration engine controller 40 sends migration commands to the application server agent 50 accordingly; it also registers the application server agent 50 at initialization. The application server agent 50 receives the migration commands from the migration engine controller 40 and forwards them to the corresponding data migration/back-migration module. The data migration/back-migration module 60 performs data migration or back-migration according to the migration command it receives and feeds the migration result back to the migration engine controller 40.
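The registration and command-forwarding path described above can be sketched as below. The class and method names, the dictionary-based command format and the module names are all hypothetical; the sketch only shows the direction of the calls and does not move any data.

```python
class MigrationModule:
    """Stands in for one migration/back-migration module between two tiers."""

    def __init__(self, name):
        self.name = name

    def execute(self, command):
        # A real module would move the data; this sketch only reports success.
        return {"module": self.name, "object": command["object"], "ok": True}


class ApplicationServerAgent:
    """Forwards each migration command to the module it is addressed to."""

    def __init__(self, modules):
        self.modules = {m.name: m for m in modules}

    def forward(self, command):
        return self.modules[command["module"]].execute(command)


class MigrationEngineController:
    """Registers agents, dispatches commands and collects the results."""

    def __init__(self):
        self.agents = []
        self.results = []

    def register(self, agent):
        self.agents.append(agent)

    def dispatch(self, plan):
        results = []
        for command in plan:
            for agent in self.agents:
                if command["module"] in agent.modules:
                    results.append(agent.forward(command))
                    break
        self.results.extend(results)  # results fed back to the controller
        return results
```

A controller with one registered agent that owns the modules `"tier1-tier2"` and `"tier2-tier3"` would route a command addressed to `"tier2-tier3"` through that agent and record the returned result.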
The migration engine controller 40 specifically includes:
the data monitoring module is used for monitoring and recording the updating condition and the value change of the data object, feeding back the updating condition to the value judgment manager, and monitoring the I/O access condition of the system;
and the data management module is used for regularly inquiring the data placement plan manager so as to update data information, send the migration command and receive the migration result.
The invention provides an industrial real-time data hierarchical storage and migration method, which comprises two parts of data hierarchical storage and data hierarchical migration, wherein the data hierarchical storage comprises the following steps:
I: evaluating the value of the data;
II: place or migrate the data into the appropriate hierarchy depending on its value.
Specifically, a person skilled in the art can store the data hierarchically according to the data value obtained in step I and a current general-purpose hierarchical storage architecture, for example by placing the data in the appropriate tier, such as the primary device 11, the secondary device 12 or the tertiary device 13, according to how well the data value matches storage devices of different priorities. This storage mode balances the performance and economy of the storage system while taking the value attribute of the data objects fully into account.
As shown in FIG. 2, the data migration method includes the following steps; data migration occurs between two storage devices of different performance. Without loss of generality, the following takes the migration process between the primary device 11 and the secondary device 12 as an example, where the primary device 11 is the high-priority storage device:
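One simple way to match a value to a tier is to compare it with ascending cut-off values, as sketched below. The text does not define how the matching degree is computed, so the cut-offs `(10.0, 50.0)`, the tier names and the function `choose_tier` are purely illustrative assumptions.

```python
from bisect import bisect_right


def choose_tier(value, cutoffs=(10.0, 50.0),
                tiers=("tertiary", "secondary", "primary")):
    """Pick a tier by value; `cutoffs` must be ascending, one fewer than `tiers`."""
    if len(tiers) != len(cutoffs) + 1:
        raise ValueError("need exactly one more tier than cut-off")
    # bisect_right counts how many cut-offs the value has reached or passed.
    return tiers[bisect_right(cutoffs, value)]
```

With these placeholder cut-offs, a value below 10 goes to the tertiary device, a value from 10 up to but not including 50 goes to the secondary device, and anything higher goes to the primary device.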
S1: monitoring the hierarchical storage system periodically, and when the storage capacity utilization of the high-priority storage device reaches a preset first threshold, triggering the data migration calculation and executing step S2;
S2: evaluating the value of each data object in the storage devices, and sorting the data objects by value;
S3: selecting, according to a preset proportion (the second threshold), the lowest-valued data objects stored in the high-priority storage device 11 to form a migration queue, and migrating the data objects in the migration queue to the low-priority storage device;
S4: after step S3, selecting, according to a third threshold, the lowest-valued data objects remaining in the high-priority storage device and comparing their addresses with the data object addresses currently held in the cache; if the address of a data object is already stored in the cache, migrating that data object to the low-priority storage device, and otherwise storing the address of the data object in the cache; after all comparisons are complete, letting the number of data object addresses in the cache be $N_b$; if $N_b \le N_h$, stopping the migration operation; if $N_b > N_h$, sorting the data object addresses in the cache in descending order of the values of the corresponding data objects and removing the addresses with the largest values one by one until $N_h$ addresses remain, and then stopping the migration operation, where $N_h$ is a preset upper limit on the number of data object addresses in the cache;
S5: finding the maximum value among the data objects whose addresses are currently in the cache, forming a migration queue, in descending order of value, from all data objects in the low-priority storage device whose value is greater than that maximum, and migrating them to the high-priority storage device.
That is, at each migration, after the data objects in the high-priority storage device are sorted by value, the lowest-valued data objects, amounting to the preset second-threshold percentage of the device's current data volume, are migrated directly to the low-priority storage device. The remaining data objects are then considered again: the lowest-valued of them, amounting to the third-threshold percentage of the device's current data volume, are examined. The criterion is whether the address of each such data object is already present in the cache, that is, whether the data object was placed in the cache during a previous migration. If so, the data object has fallen within the third-threshold range in at least two migrations, which indicates that its value is low, and it is moved from the high-priority storage device to the low-priority storage device. For a data object that is within the third-threshold range but has no corresponding address in the cache, its address is stored in the cache to serve as a reference for subsequent migrations. The highest-valued data object whose address is in the cache serves as the reference for migrating data objects stored in the low-priority storage device upward.
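Steps S2 to S5 can be sketched as a single function, under several simplifying assumptions that are not in the text: the two proportions are applied to object counts rather than data volume, demoted objects are removed from the cache, stale cache addresses are dropped at the start, and objects demoted in the current round are not eligible for promotion in the same round. The name `migration_round` and the dictionary representation of a device are illustrative.

```python
def migration_round(high, low, cache, second=0.10, third=0.10, n_h=2):
    """Run steps S2-S5 once.

    `high` and `low` map object address -> value; `cache` is a set of
    addresses of pending objects on `high`. All three are updated in place.
    Returns the lists of demoted and promoted addresses.
    """
    cache.intersection_update(high)  # assumption: drop addresses no longer on `high`

    # S2: rank the objects on the high-priority device, lowest value first.
    ranked = sorted(high, key=high.get)

    # S3: the lowest-valued `second` fraction is demoted directly.
    n_direct = int(len(ranked) * second)
    demoted = ranked[:n_direct]

    # S4: the next `third` fraction is checked against the cache.
    n_pending = int(len(ranked) * third)
    for addr in ranked[n_direct:n_direct + n_pending]:
        if addr in cache:
            demoted.append(addr)   # was pending in an earlier round too: demote
        else:
            cache.add(addr)        # first time in the range: mark as pending

    for addr in demoted:
        low[addr] = high.pop(addr)
        cache.discard(addr)        # assumption: demoted objects leave the cache

    # S4 (continued): keep at most n_h addresses, dropping the highest-valued.
    if len(cache) > n_h:
        keep = sorted(cache, key=high.get)[:n_h]
        cache.intersection_update(keep)

    # S5: promote low-tier objects worth more than the best pending object.
    promoted = []
    if cache:
        bar = max(high[a] for a in cache)
        promoted = sorted(
            (a for a, v in low.items() if v > bar and a not in demoted),
            key=low.get, reverse=True)
        for addr in promoted:
            high[addr] = low.pop(addr)
    return demoted, promoted
```

Because an object is demoted through the cache only after it has appeared in the third-threshold range in two rounds, an object whose value recovers between rounds stays on the high-priority device, which is the behaviour the pending region is meant to provide.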
Further, in S2, a sliding window method is used to calculate a weighted average of the values computed at each time within the sliding window, specifically:
setting the window width to N, and letting the values of the data object computed at the most recent N times within the window be $V_1, V_2, \ldots, V_N$, the value of the current data object is calculated as follows:
where $V_c$ is the final value of the data object, which is passed to the data placement plan manager and the subsequent data migration/back-migration module for reference and use during dynamic data migration.
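The weighted-average formula itself is not reproduced in the text above, so the following is only a sketch of one possible sliding-window computation. It assumes caller-supplied weights that sum to 1, with the last weight applied to the newest sample, and it renormalises the weights while the window is still filling; the class name and both of these choices are assumptions.

```python
from collections import deque


class SlidingWindowValue:
    """Keeps the last N value samples and returns their weighted average."""

    def __init__(self, weights):
        # `weights` are hypothetical; weights[-1] applies to the newest sample.
        if not weights or abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must be non-empty and sum to 1")
        self.weights = list(weights)
        self.samples = deque(maxlen=len(weights))  # oldest sample drops out

    def update(self, value):
        """Add the newest value and return the current windowed value."""
        self.samples.append(value)
        # Until the window is full, use the most recent weights, renormalised.
        active = self.weights[-len(self.samples):]
        weighted = sum(w * v for w, v in zip(active, self.samples))
        return weighted / sum(active)
```

Smoothing the value over the window in this way means a single unusually high or low evaluation does not by itself move an object across tiers.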
In S2, the value of a data object is calculated according to the following formula:
$$V = w_1 T + w_2 C + w_3 M + w_4 \mathit{CT} + \frac{w_5}{S}$$
where T is the time factor, C is the number-of-accessing-users factor, M is the value factor of the data objects associated with the data object, CT is the contrast factor of the different storage devices, S is the size factor of the data object, and $w_1$, $w_2$, $w_3$, $w_4$ and $w_5$ are the weights of the corresponding factors. Specifically:
1) The time factor T is obtained as follows:
first, all times at which the data object has been accessed since its creation are acquired: $t_1, t_2, \ldots, t_n$, where n is a positive integer;
then, the intervals $T_1, T_2, \ldots, T_{n-1}$ between successive accesses are calculated:
$$T_i = t_{i+1} - t_i, \quad i = 1, 2, \ldots, n-1$$
finally, the time factor T is calculated:
$$T = \sum_{i=1}^{n-1} \alpha_i T_i$$
where $\alpha_i$, $i = 1, 2, \ldots, n-1$, is a set of predetermined weights satisfying $\sum_{i=1}^{n-1} \alpha_i = 1$ and $\alpha_1 \le \alpha_2 \le \ldots \le \alpha_{n-1}$. Since the access-time characteristics of the data may change between creation and the current time, the most recent intervals are counted into the average with larger weights, so that the resulting T better reflects the current situation.
2) Analyzing the association factor between data objects requires finding all data objects associated with a given data object. For any data object, its associated data objects are defined as follows:
given a time-length threshold $T_{th}$, if data object $obj_1$ is accessed at any time $t_0$ and data object $obj_2$ is also accessed within the time interval from $t_0$ to $t_0 + T_{th}$, then data objects $obj_1$ and $obj_2$ are considered to be associated.
Then the value factor Q of the data objects associated with data object $obj_1$ is obtained as follows:
first, find the set $\Phi(obj_1)$ of data objects associated with $obj_1$;
second, obtain the values of all data objects in $\Phi(obj_1)$;
finally, sum the values of all data objects associated with $obj_1$:
$$Q = \sum_{obj \in \Phi(obj_1)} V_{obj}$$
where $V_{obj}$ is the value of data object obj as recorded in the migration engine controller.
3) The number-of-accessing-users factor C can be obtained directly from the migration engine controller.
4) The I/O access contrast of different storage devices is calculated as follows. Because read and write speeds differ even on the same device, the read contrast and the write contrast must be considered separately:
for different storage devices A and B, with read contrast $\delta_r$ and write contrast $\delta_w$:
where $R_A$ and $R_B$ denote the sustained read speeds on the two devices A and B of different performance, and $W_A$ and $W_B$ denote the corresponding sustained write speeds.
The I/O access contrast of different storage devices is also related to the I/O access frequency, and access frequencies closer to the current time are given greater weight in the calculation. Therefore, in this embodiment, the period from the creation of the data object to the current time is divided into m segments, and the contrast factor CT of the different storage devices is calculated according to the following formula:
where $FW_i$ and $FR_i$ denote the write and read frequencies of the data object during the i-th period, $\beta_i$ is the weight of the i-th period, with $\beta_1 < \beta_2 < \ldots < \beta_m$, $\delta_r$ is the read contrast between two different storage devices, and $\delta_w$ is the write contrast between two different storage devices.
5) The value of the data object size factor S may be obtained directly from the migration engine controller.
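Once the five factors are available, the value formula is a direct weighted combination, as the short sketch below shows. The `Factors` container, the function name `object_value` and any particular weights are illustrative; the sketch also assumes that the associated-value factor called Q in item 2) is the quantity written M in the formula.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    T: float    # time factor
    C: float    # number-of-accessing-users factor
    M: float    # value factor of associated data objects (assumed equal to Q)
    CT: float   # contrast factor of the storage devices
    S: float    # size factor of the data object, must be non-zero


def object_value(f, w):
    """V = w1*T + w2*C + w3*M + w4*CT + w5/S for weights w = (w1, ..., w5)."""
    w1, w2, w3, w4, w5 = w
    if f.S == 0:
        raise ValueError("size factor S must be non-zero")
    return w1 * f.T + w2 * f.C + w3 * f.M + w4 * f.CT + w5 / f.S
```

Because the size factor enters as $w_5 / S$, a larger object contributes less through this term than a smaller one with otherwise identical factors.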
Preferably, the first threshold value in this embodiment is 80%, the second threshold value is 10%, and the third threshold value is 10%. Of course, in other hierarchical storage systems, the magnitudes of the first threshold, the second threshold, and the third threshold may be set to other suitable values as needed.
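The trigger condition of step S1 with the 80% first threshold amounts to a single comparison, sketched below. The function name and the byte-based arguments are assumptions; the text only requires that the utilization of the high-priority device reach the threshold.

```python
def migration_triggered(used_bytes, capacity_bytes, first_threshold=0.80):
    """Return True when utilization has reached the first threshold (step S1)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return used_bytes / capacity_bytes >= first_threshold
```

For a device with a capacity of 1000 units, the migration calculation would start once 800 units are in use, and not at 799.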
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art should be able to make modifications or substitutions within the technical scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.