CN106775475A - Data hierarchy storage method and system

Info

Publication number: CN106775475A
Application number: CN201611173634.2A
Authority: CN
Inventors: 刘鹏; 孙红涛; 慕世勋
Original assignee: Hangzhou Star Technology Co Ltd
Current assignee: Hangzhou Star Technology Co Ltd
Priority date: 2016-12-16
Filing date: 2016-12-16
Publication date: 2017-05-31

Abstract

The present invention provides a hierarchical data storage method and system. The method includes: in response to a read operation by an external application, marking the data resources that are read and, after the read operation completes, calculating the mark value of each marked data resource; and assigning each data resource to the corresponding storage layer according to the size of its mark value.

Description

Data hierarchy storage method and system

Technical field

The present invention relates to the communications field, and more particularly to a hierarchical data storage method and system.

Background art

With the development of cloud computing and containerization, container technology has been widely applied in many fields. The core of container technology is the image, and image storage plays a critical role in the normal operation of services. Existing container image data storage technology uses "static" storage for disk media management: volumes are partitioned on a fixed disk array and data is then stored. As a result, many users find that after deploying tiering technology, overall IOPS (Input/Output Operations Per Second) performance remains comparable to that of a conventional architecture; that is, disk throughput is not substantially improved.

Meanwhile, existing container image data storage technology works from the bottom of the storage system to optimize the storage cost of application data, so that data requiring high access performance is stored on disk media with high read/write performance, while massive amounts of infrequently accessed data are stored on low-cost, high-capacity, low-speed disks. However, classifying data by storage performance type must be done manually; only after manual classification is the data stored on the corresponding disk media. The architecture of an existing container image data storage system is shown in Fig. 1. The reference numerals in Fig. 1 are: Internet 1000, administrator 2000, switch 3000, disk array group 4000. As Fig. 1 shows, when data is read or written, a person judges whether the data is cold data (data with a low access rate) or hot data (data with a high access rate), and data of different types is then stored on disks with different characteristics to make the user's next read easier. This speeds up data reads and ensures that certain service functions run efficiently.

Existing methods require data to be judged manually. When a large number of read and write operations occur, manual decision-making becomes difficult, puts pressure on the storage devices, and cannot respond in time. The factors considered when manually distinguishing cold data from hot data are relatively limited: the fact that data items influence one another is not taken into account, so the classification is inaccurate. When cold data needs to be read, the demand cannot be met quickly, so online services cannot run normally and the user experience is poor. When such problems occur, the administrator can only troubleshoot reactively by modifying the database or the storage devices. In serious cases the storage system crashes and must be restarted, which is unacceptable for a core system. In addition, manual operation requires substantial labor cost and wastes resources.

Furthermore, existing container image data storage technology forms a mapping between data volumes (LUNs) and disk array (RAID) groups. Disk performance affects the data access performance of the volumes in a RAID group and becomes a bottleneck. Moreover, once a volume is created, the RAID architecture of all data belonging to that volume is fixed and cannot be changed.

Summary of the invention

To overcome the inability of existing data storage technology to distinguish hot data from cold data intelligently and accurately, the present invention provides a hierarchical data storage method and system that automatically and accurately detect how hot or cold data is and automatically store the data in tiers accordingly.

To achieve the above object, the present invention provides a hierarchical data storage method, the method including:

in response to a read operation by an external application, marking the data resources that are read, and calculating the mark value of each marked data resource after the read operation completes;

assigning each data resource to the corresponding storage layer according to the size of its mark value.

In one embodiment of the invention, the mark value of a given data resource is calculated based on the mark values of the data resources read together with it within an adjacent time period and on the set of data read together with it.

In one embodiment of the invention, the mark value of a data resource is calculated using the following formula:

DR(i) = \sum_{j \in B(j)} \frac{DR(j)}{N(j)}

where DR(i) denotes the mark value of the i-th data resource, N(j) denotes the total number of times the j-th data resource has been read before the current time, and B(j) denotes the set of data read at the same time as i.

In one embodiment of the invention, after the mark values of the marked data resources are calculated, the calculated mark value of each data resource is stored in a data table.

In one embodiment of the invention, the storage layers include a performance drive and a capacity drive, and the mark value of a data resource stored on the performance drive is greater than the mark value of a data resource stored on the capacity drive.

In one embodiment of the invention, the hierarchical data storage method further includes:

in response to a write operation by an external application, initializing the mark value of the written data resource;

storing the mark value in the data table and then assigning the data resource to the corresponding storage layer.

In one embodiment of the invention, before the step of marking the data resources that are read in response to a read operation by an external application and calculating the mark value of each marked data resource after the read operation completes, the hierarchical data storage method further includes:

detecting an operation of the external application;

determining the type of the detected operation of the external application.

Correspondingly, the present invention also provides a hierarchical data storage system including a calculation module and an allocation module. The calculation module, in response to a read operation by an external application, marks the data resources that are read and calculates the mark value of each marked data resource after the read operation completes. The allocation module assigns each data resource to the corresponding storage layer according to the size of its mark value.

In one embodiment of the invention, the calculation module calculates the mark value of a given data resource based on the mark values of the data resources read together with it within an adjacent time period and on the set of data read together with it.

In one embodiment of the invention, the hierarchical data storage system further includes a detection module and a judgment module. The detection module detects operations of the external application. The judgment module determines the type of the operation detected by the detection module.

In summary, compared with the prior art, the hierarchical data storage method and system provided by the present invention have the following advantages:

By marking the data resources that are read and calculating mark values for the marked data resources, the invention dynamically analyzes in real time how hot or cold each data resource is and makes this judgment automatically. According to how hot or cold a data resource is, hot data is assigned to the high-speed performance drive and massive amounts of cold data are assigned to the low-speed capacity drive. This improves the hit rate of data access and the resource utilization of the storage system, copes well with large volumes of reads and writes, and ensures the normal operation of the service system. Furthermore, the relationships between data items are taken into account when the mark values of the marked data resources are calculated, which greatly improves the response speed of data access.

To make the above and other objects, features, and advantages of the present invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief description of the drawings

Fig. 1 is an architecture diagram of an existing container image data storage system.

Fig. 2 is a flowchart of the hierarchical data storage method provided by Embodiment 1 of the present invention.

Fig. 3 is an architecture diagram of the hierarchical data storage provided by the present invention.

Fig. 4 is a block diagram of the hierarchical data storage system provided by Embodiment 1 of the present invention.

Fig. 5 is a flowchart of the hierarchical data storage method provided by Embodiment 2 of the present invention.

Fig. 6 is a block diagram of the hierarchical data storage system provided by Embodiments 2 and 3 of the present invention.

Fig. 7 is a flowchart of the hierarchical data storage method provided by Embodiment 3 of the present invention.

Detailed description

Embodiment one

As a major optimization strategy in information technology, hierarchical storage has become a basic feature of mainstream storage products. Hierarchical storage makes it more flexible to meet storage demands, optimizes data management, and reduces TCO. In existing container image storage, how hot or cold a data resource is must be judged manually, and the data resource is then stored in a storage layer with matching characteristics according to the result. Manual judgment is not only too slow to keep up with read/write demand but is also prone to error. In view of this, the present invention provides a hierarchical data storage method and system that can automatically and accurately identify how hot or cold a data resource is.

The hierarchical data storage method provided by this embodiment includes: in response to a read operation by an external application, marking the data resources that are read and, after the read operation completes, calculating the mark value of each marked data resource (step S1); and assigning each data resource to the corresponding storage layer according to the size of its mark value (step S2).

The method starts at step S1. In this step the system associates a mark parameter, denoted DataRank (DR for short), with each data resource. When an external application reads a data resource, the system marks it by changing its DR value (the mark value): each time the resource is read, its DR value is accumulated, and this continues until the read operation completes. The DR value characterizes how frequently the data resource is read (that is, its importance). A larger DR value indicates that the data resource is more important in the current period and has been read more often; during this period the data resource is hot data. Conversely, if the DR value of a data resource is very small in the current period, the data resource is unimportant during this period and has been read only rarely; during this period it is cold data.
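The marking in step S1 can be sketched as a per-resource counter. This is only an illustration under stated assumptions: the class name `ReadMarker`, the resource identifiers, and the increment of 1 per read are hypothetical, since the text says only that the DR value is accumulated on each read.

```python
from collections import defaultdict


class ReadMarker:
    """Hypothetical helper that keeps one DR (DataRank) value per data resource."""

    def __init__(self):
        # resource id -> accumulated DR value; unseen resources start at 0
        self.dr = defaultdict(int)

    def mark_read(self, resource_id):
        # Assumption: each read adds 1 to the resource's DR value.
        self.dr[resource_id] += 1
        return self.dr[resource_id]


marker = ReadMarker()
for rid in ["img-a", "img-b", "img-a", "img-a"]:
    marker.mark_read(rid)
```

After this loop, `img-a` has been marked three times and `img-b` once, so `img-a` is the hotter of the two resources in this period.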

In practice, however, applications depend on one another, as do data items, so reading a single data item often does not satisfy an external application. An external application usually reads multiple data resources simultaneously or in succession to meet its needs. Judging how hot or cold a data resource is from its own DR value alone would therefore be biased. In data reading, data resources that are read together with important data are generally considered to be relatively important as well. On this basis, the DR value of a given data resource is calculated from the DR values of the data resources read together with it within an adjacent time period and from the set B(j) of data read together with it. The specific formula (formula one) is:

DR(i) = \sum_{j \in B(j)} \frac{DR(j)}{N(j)}

where N(j) denotes the total number of times the j-th data resource has been read before the current time.
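The formula recited in claim 3, in which DR(i) is the sum of DR(j)/N(j) over the data resources j read together with i, can be sketched as follows. The function name `data_rank`, the dictionary layout, and the choice to skip resources with no recorded reads are assumptions made for illustration, not details given in the text.

```python
def data_rank(co_read, dr, reads):
    """Sum DR(j) / N(j) over every resource j read together with the target.

    co_read: ids of the resources read together with the target resource
    dr:      id -> current DR value, DR(j)
    reads:   id -> number of reads so far, N(j)
    """
    # Assumption: resources with N(j) == 0 are skipped to avoid dividing by zero.
    return sum(dr[j] / reads[j] for j in co_read if reads[j] > 0)


dr = {"b": 4.0, "c": 3.0}
reads = {"b": 2, "c": 6}
dr_a = data_rank({"b", "c"}, dr, reads)  # 4.0 / 2 + 3.0 / 6
```

In this example the resource read together with `b` and `c` receives 2.0 from `b` and 0.5 from `c`, so a co-read resource with a high DR value and few reads contributes the most.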

To normalize the calculation result, a constant C is added on the basis of formula one, giving the following formula:

After the DR values of the data resources that were read have been calculated, step S2 is performed: each data resource is assigned to the corresponding storage layer according to its DR value. In this embodiment, the data resources are assigned to storage layers only after the DR values of all marked data resources have been calculated. However, the present invention is not limited in this respect. In other embodiments, a data resource may be assigned to its storage layer as soon as its DR value has been calculated, after which the DR value of the next data resource is calculated.

As shown in Fig. 3, the hierarchical data storage method provided by the present invention is based on the storage architecture given in Fig. 3. In this embodiment, the whole data storage layer is divided into three layers: a low-frequency layer at the bottom with the largest capacity, an intermediate-frequency layer above it with a smaller capacity, and a high-frequency layer above that with the smallest capacity. In terms of read speed, the high-frequency layer is faster than the intermediate-frequency layer, which is faster than the low-frequency layer. In this embodiment, one DR threshold is set: according to the DR values of the data resources that were read, data resources whose DR value is below the threshold (COLD data) are assigned to the low-frequency zone, and data resources whose DR value is greater than or equal to the threshold (HOT data) are assigned to the high-frequency zone. However, the present invention is not limited in this respect. In other embodiments, two DR thresholds may be set to assign data resources to the low-frequency zone, the intermediate-frequency zone (WARM data), and the high-frequency zone, or three or more DR thresholds may be set to divide the whole storage layer into four or more storage layers. In this embodiment, the storage layers include a performance drive and a capacity drive; the performance drive corresponds to the high-frequency layer of the storage architecture and the capacity drive corresponds to the low-frequency layer.
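The threshold rule above, with one threshold for two zones or two thresholds for three, can be sketched with a sorted threshold list. The function name `assign_tier` and the numeric thresholds are hypothetical; only the rule that a DR value greater than or equal to a threshold moves the resource to the hotter zone comes from the text.

```python
from bisect import bisect_right


def assign_tier(dr_value, thresholds, tiers):
    """Pick a tier for a DR value.

    thresholds: ascending list of DR thresholds
    tiers:      tier names from coldest to hottest, len(thresholds) + 1 entries
    """
    # bisect_right counts the thresholds that are <= dr_value, so a value
    # equal to a threshold lands in the hotter tier.
    return tiers[bisect_right(thresholds, dr_value)]


two_tiers = ["COLD", "HOT"]
three_tiers = ["COLD", "WARM", "HOT"]
tier_one_threshold = assign_tier(5, [5], two_tiers)
tier_two_thresholds = assign_tier(7, [5, 10], three_tiers)
```

Using a list of thresholds keeps the same function usable for the four-or-more-layer variants mentioned in the embodiment.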

Assigning data resources according to DR values allows data resources to move quickly between storage layers. Real-time dynamic analysis lets data resources that are read frequently move quickly from the low-frequency zone into the high-frequency zone, while data resources that are located in the high-frequency zone but are read infrequently move quickly into the low-frequency zone, which greatly improves data read speed.
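Movement between zones can be sketched as a comparison of each resource's current zone with the zone its latest DR value calls for. The function `plan_moves`, the zone labels, and the single-threshold setup are assumptions for illustration; the text does not describe how migrations are scheduled.

```python
def plan_moves(current_zone, dr, threshold):
    """Return {resource_id: (from_zone, to_zone)} for resources that must move."""
    moves = {}
    for rid, value in dr.items():
        target = "HOT" if value >= threshold else "COLD"
        source = current_zone.get(rid)
        if source != target:
            moves[rid] = (source, target)
    return moves


current_zone = {"a": "COLD", "b": "HOT", "c": "HOT"}
latest_dr = {"a": 9, "b": 1, "c": 7}
moves = plan_moves(current_zone, latest_dr, threshold=5)
```

Here `a` is promoted, `b` is demoted, and `c` stays where it is, so only two resources need to be moved.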

To simplify the calculation of DR values, in this embodiment the DR values are stored in a data table after the DR values of the marked data resources have been calculated. However, the present invention is not limited in this respect.
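One possible shape for such a data table is sketched below using an in-memory SQLite database. The table name `data_rank`, its columns, and the helper names are hypothetical; the text does not specify what kind of table is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_rank (resource_id TEXT PRIMARY KEY, dr REAL NOT NULL)"
)


def save_dr(conn, resource_id, dr_value):
    # Insert the DR value, or overwrite it if the resource already has a row.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO data_rank (resource_id, dr) VALUES (?, ?)",
            (resource_id, dr_value),
        )


def load_dr(conn, resource_id):
    row = conn.execute(
        "SELECT dr FROM data_rank WHERE resource_id = ?", (resource_id,)
    ).fetchone()
    return row[0] if row else None


save_dr(conn, "img-a", 1.0)
save_dr(conn, "img-a", 2.5)  # a later recalculation replaces the old value
```

Keeping one row per resource means a later DR calculation can look up DR(j) for every co-read resource without rescanning the read history.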

Corresponding to the above hierarchical data storage method, this embodiment also provides a hierarchical data storage system 300, which includes a calculation module 1 and an allocation module 2. The calculation module 1, in response to a read operation by an external application 100, marks the data resources that are read and calculates the mark value of each marked data resource after the read operation completes. The allocation module 2 assigns each data resource to the corresponding storage layer 200 according to the size of its mark value.

The calculation module 1 associates a mark parameter, denoted DataRank (DR for short), with each data resource. When an external application reads a data resource, the calculation module 1 marks the data by changing its DR value (the mark value): each time the resource is read, its DR value is accumulated until the read operation completes. The DR value characterizes how frequently the data resource is read (that is, its importance). A larger DR value indicates that the data resource is more important in the current period and has been read more often; during this period the data resource is hot data. Conversely, if the DR value of a data resource is very small in the current period, the data resource is unimportant during this period and has been read only rarely; during this period it is cold data.

The calculation of DR values takes into account that applications depend on one another, as do data items. Data resources that are read together with important data are generally considered to be relatively important as well. On this basis, the DR value of a given data resource is calculated from the DR values of the data resources read together with it within an adjacent time period and from the set B(j) of data read together with it. The specific formula is:

DR(i) = \sum_{j \in B(j)} \frac{DR(j)}{N(j)}

After the calculation module 1 has calculated the DR value of each marked data resource, the allocation module 2 assigns the marked data resources to different storage layers according to their DR values. A specific storage architecture is given in Fig. 3. The architecture divides the whole data storage layer into three zones: a high-frequency zone at the top, which has the fastest data read speed and the smallest capacity and stores HOT data; an intermediate-frequency zone in the middle, which stores WARM data; and a low-frequency zone at the bottom, which has the slowest data read speed and the largest capacity and stores COLD data. In this embodiment, one DR threshold is set to distinguish data resources: data resources whose DR value is greater than or equal to the DR threshold are assigned to the high-frequency zone, and data whose DR value is below the DR threshold is assigned to the low-frequency zone. However, the present invention is not limited in this respect. In other embodiments, two DR thresholds may be set to assign data to the low-frequency zone, the intermediate-frequency zone (WARM data), and the high-frequency zone, or three or more DR thresholds may be set to divide the whole storage layer into four or more storage layers.

The hierarchical data storage method and system provided by this embodiment not only identify automatically and accurately how hot or cold a data resource is, but also store data in tiers accordingly. Hot data is stored in the high-frequency zone and served directly to clients, which improves the hit rate, retrieval speed, and transfer speed of data access. Furthermore, the calculation of the DR value that characterizes how hot or cold data is takes the influence between data items fully into account: while the current data is being read, the influence of other data read in the adjacent time period is considered, which greatly improves the accuracy of the judgment. In addition, compared with a traditional distributed storage scheme, the system maintains IOPS while reducing investment cost, improving resource utilization, and lowering energy consumption.

Embodiment two

The hierarchical data storage method provided by this embodiment includes:

Step S10: detecting an operation of the external application.

Step S20: determining whether the operation of the external application is a read operation.

Step S30: if so, in response to the read operation by the external application, marking the data resources that are read, and calculating the mark value of each marked data resource after the read operation completes.

Step S40: assigning each data resource to the corresponding storage layer according to the size of its mark value.
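Steps S10 to S40 can be sketched as a single handler that ignores anything other than a read. The `Operation` type, the `"read"` string encoding, and the use of a plain read counter as the DR value are simplifying assumptions; the embodiment calculates DR values with the co-read formula of Embodiment 1.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    kind: str            # hypothetical encoding: "read" or any other string
    resource_ids: tuple  # ids of the data resources touched by the operation


def handle(op, dr, threshold):
    """Process one detected operation and return {resource_id: zone}."""
    # S20: only read operations are processed in this embodiment.
    if op.kind != "read":
        return {}
    # S30: mark every resource that was read (simplified to a counter).
    for rid in op.resource_ids:
        dr[rid] = dr.get(rid, 0) + 1
    # S40: assign each resource to a zone by comparing DR with the threshold.
    return {
        rid: "HOT" if dr[rid] >= threshold else "COLD"
        for rid in op.resource_ids
    }


dr_table = {"x": 2}
placement = handle(Operation("read", ("x", "y")), dr_table, threshold=3)
ignored = handle(Operation("write", ("z",)), dr_table, threshold=3)
```

The read pushes `x` to the threshold and places it in the hot zone, while the write is ignored and leaves the table untouched.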

This embodiment differs from Embodiment 1 and its variations in that, before the data resources that are read are marked, it is determined whether the operation of the external application is a read operation. The specific flowchart is shown in Fig. 5.

Correspondingly, this embodiment also provides a hierarchical data storage system 300, which includes a detection module 30, a judgment module 40, a calculation module 10, and an allocation module 20. The detection module 30 detects operations of the external application 100. The judgment module 40 determines the type of the operation of the external application 100 detected by the detection module. In this embodiment, the judgment module 40 only needs to determine whether the operation of the external application 100 is a read operation. If it is a read operation, the calculation module 10, in response to the read operation, marks the data resources that are read and calculates the mark values of all marked data resources after the read operation completes. The allocation module 20 assigns each data resource to the corresponding storage layer according to the size of its mark value. The specific calculation performed by the calculation module 10 and the specific allocation performed by the allocation module 20 are the same as in Embodiment 1.

To ensure the accuracy and validity of the DR values, in this embodiment the hierarchical data storage system further includes a maintenance module 50 that maintains the DR values. However, the present invention is not limited in this respect.

Embodiment three

This embodiment differs from Embodiment 1 and its variations in that the hierarchical data storage method provided by this embodiment includes additional steps. The specific flow is shown in Fig. 7 and includes:

Step S100: detecting an operation of the external application.

Step S200: determining whether the operation of the external application is a read operation or a write operation.

Step S300: when the operation of the external application is a read operation, in response to the read operation, marking the data resources that are read, and calculating the mark value of each marked data resource after the read operation completes.

Step S301: assigning each data resource to the corresponding storage layer according to the size of its mark value.

Step S400: when the operation of the external application is a write operation, in response to the write operation, initializing the mark value of the written data resource.

Step S401: storing the mark value in the data table and then assigning the data resource to the corresponding storage layer.

In this embodiment, when an external application writes a data resource, the system automatically associates a mark DR with the written data resource and initializes its DR value (the mark value) to 1; this is step S400. After initialization, step S401 is performed: the DR value is stored in the data table and the data resource is assigned to the corresponding storage layer according to the DR threshold. However, the present invention is not limited in this respect. In other embodiments, the DR value may be generated and initialized when the data resource is read for the first time, as in Embodiment 1.
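The write path of steps S400 and S401 can be sketched as follows. The initial value of 1 comes from the embodiment; the function name `on_write`, the dictionary used as the data table, and the threshold value are hypothetical.

```python
INITIAL_DR = 1  # initial DR value stated in this embodiment


def on_write(resource_id, table, threshold):
    """Initialize the DR of a newly written resource and pick its zone."""
    # S400: associate a DR value with the written resource and set it to 1.
    table[resource_id] = INITIAL_DR
    # S401: the value is now stored; assign the zone by the DR threshold.
    return "HOT" if table[resource_id] >= threshold else "COLD"


table = {}
zone = on_write("new-image", table, threshold=5)
```

With any threshold above 1, a freshly written resource starts in the cold zone and is promoted only once later reads raise its DR value.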

Likewise, when an external operation reads a data resource, the data resource is marked and its DR value is accumulated until the operation completes; this is step S300. After the read operation completes, the DR value of each data resource is calculated. In this embodiment, given the dependencies between data items, data read together with important data is generally considered important as well. Therefore, when the DR value of a data resource is calculated, the DR values of the other data resources read together with it within the time period, and the set of data resources read together with it, are taken into account.

Specifically, the following formula is used:

DR(i) = \sum_{j \in B(j)} \frac{DR(j)}{N(j)}

After the DR values of the data resources that were read have been calculated, step S301 is performed: each data resource is assigned to the corresponding storage layer according to its DR value. In this embodiment, the data resources are assigned to storage layers only after the DR values of all marked data resources have been calculated. However, the present invention is not limited in this respect. In other embodiments, a data resource may be assigned to its storage layer as soon as its DR value has been calculated, after which the DR value of the next data resource is calculated.

This embodiment differs from Embodiment 2 and its variations in how the operation of the external application is judged. Embodiment 2 only needs to determine whether the operation is a read operation, whereas this embodiment must determine exactly which type of operation it is, and the system responds differently to different types of operation.

Correspondingly, as shown in Fig. 6, the hierarchical data storage system 300 provided by this embodiment includes a detection module 30, a judgment module 40, a calculation module 10, and an allocation module 20. The block diagram is the same as in Embodiment 2, but the judgment module 40, the calculation module 10, and the allocation module 20 work differently, as follows:

The detection module 30 detects operations of the external application. The judgment module 40 determines the type of the operation detected by the detection module. In this embodiment, the judgment module 40 must determine whether the operation of the external application 100 is a read operation or a write operation. When the judgment module 40 determines that the operation is a read operation, the calculation module 10, in response to the read operation, marks the data resources that are read and calculates the mark value of each marked data resource after the read operation completes. The allocation module 20 assigns each data resource to the corresponding storage layer 200 according to the size of its mark value. When the judgment module 40 determines that the operation is a write operation, the calculation module 10 associates a DR with the written data and initializes the DR value to 1. The allocation module 20 then stores the DR value in the data table and assigns the data resource to the corresponding storage layer according to the DR threshold. However, the present invention is not limited in this respect. In other embodiments, the association of the DR and the generation of the DR value may be initialized when the data resource is read for the first time, as in Embodiment 1. The specific calculation performed by the calculation module 10 and the specific allocation performed by the allocation module 20 in this embodiment are the same as in Embodiment 1.

In summary, the hierarchical data storage method and system provided by the present invention mark the data resources that are read and calculate mark values for the marked data resources, so that how hot or cold each data resource is can be analyzed dynamically in real time and judged automatically. According to how hot or cold a data resource is, hot data is assigned to the high-speed performance drive and massive amounts of cold data are assigned to the low-speed capacity drive. This improves the hit rate of data access and the resource utilization of the storage system, copes well with large volumes of reads and writes, and ensures the normal operation of the service system. Furthermore, the relationships between data items are taken into account when the mark values of the marked data resources are calculated, which greatly improves the response speed of data access.

Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the present invention, so the scope of protection of the present invention shall be that defined by the claims.

Claims

1. A hierarchical data storage method, characterized by comprising:

in response to a read operation by an external application, marking the data resources that are read, and calculating the mark value of each marked data resource after the read operation completes;

assigning each data resource to the corresponding storage layer according to the size of its mark value.

2. The hierarchical data storage method according to claim 1, characterized in that the mark value of a given data resource is calculated based on the mark values of the data resources read together with it within an adjacent time period and on the set of data read together with it.

3. The hierarchical data storage method according to claim 2, characterized in that the mark value of a data resource is calculated using the following formula:

DR(i) = \sum_{j \in B(j)} \frac{DR(j)}{N(j)}

where DR(i) denotes the mark value of the i-th data resource, N(j) denotes the total number of times the j-th data resource has been read before the current time, and B(j) denotes the set of data read at the same time as i.

4. The hierarchical data storage method according to claim 1, characterized in that, after the mark values of the marked data resources are calculated, the calculated mark value of each data resource is stored in a data table.

5. The hierarchical data storage method according to claim 1, characterized in that the storage layers include a performance drive and a capacity drive, and the mark value of a data resource stored on the performance drive is greater than the mark value of a data resource stored on the capacity drive.

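6. The hierarchical data storage method according to claim 1, characterized in that the hierarchical data storage method further comprises:

in response to a write operation by an external application, initializing the mark value of the written data resource;

storing the mark value in a data table and then assigning the data resource to the corresponding storage layer.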

7. The hierarchical data storage method according to claim 1, characterized in that, before the step of marking the data resources that are read in response to a read operation by an external application and calculating the mark value of each marked data resource after the read operation completes, the hierarchical data storage method further comprises:

detecting an operation of the external application;

determining the type of the detected operation of the external application.

8. A hierarchical data storage system, characterized by comprising:

a calculation module that, in response to a read operation by an external application, marks the data resources that are read and calculates the mark value of each marked data resource after the read operation completes;

an allocation module that assigns each data resource to the corresponding storage layer according to the size of its mark value.

9. The hierarchical data storage system according to claim 8, characterized in that the calculation module calculates the mark value of a given data resource based on the mark values of the data resources read together with it within an adjacent time period and on the set of data read together with it.

10. The hierarchical data storage system according to claim 8, characterized in that the hierarchical data storage system further comprises:

a detection module that detects operations of the external application;

a judgment module that determines the type of the operation of the external application detected by the detection module.