Method, device and system for task backup in a cloud data center
Technical field
The invention belongs to the field of cloud computing system control, and in particular relates to a method, device and system for task backup in a cloud data center.
Background art
Cloud computing is an Internet-based mode of computing in which shared software and hardware resources and information are supplied to computers and other devices on demand. Compared with traditional software and computing models, cloud computing offers significant advantages such as loose coupling, on-demand provisioning, controllable cost, resource virtualization and heterogeneous collaboration, which make it better suited to present-day applications such as e-commerce, flexible manufacturing and the mobile Internet.
A cloud data center is a distributed computing system, formed by multiple heterogeneous servers connected together by a network, that hosts enterprise-level applications providing online cloud services. In a cloud data center, a large number of servers are centralized and managed uniformly, which ensures the stable power supply, suitable temperature and humidity control, and network bandwidth conditions that the servers need in order to run.
Like other software and hardware systems, the servers in a cloud data center are at risk of faults and failures. Because cloud computing systems are now increasingly used for high-load, high-complexity applications such as large-scale scientific computing, real-time finance, online transactions and streaming-media multicast, servers often run in an overloaded state, so faults and failures occur more frequently and cause larger losses. In addition, because the distribution of cloud task requests in time and place is irregular and subject to human contingency, the real-time load of a cloud system fluctuates dynamically, which in turn causes the reliability characteristics and the fault and failure risk of the servers in the data center to fluctuate over time, making preventive control and disaster avoidance difficult. Existing task backup techniques often struggle to track the dynamic reliability trend of each server in the data center, and suffer from the problems of "over-frequent backup" and "insufficient backup". To keep a server from faulting or failing in the near term, the management policy often backs up the tasks on servers identified as high-risk to other servers too frequently. These task migration and backup activities themselves incur high overhead, and a server identified as high-risk may not actually fail in the near future, yet it sits idle because its tasks have been moved away, which is wasteful. Conversely, if the likelihood of server faults and failures is underestimated, backup may be insufficient: when a server fault or failure arrives, many tasks have not yet been moved away, so the running tasks fail with it, which can eventually bring down the whole system.
Existing technical solutions mainly have the following deficiencies:
(1) Control at a fixed period is mostly used. Existing methods mostly preset a fixed interval at which periodic task backup is performed. However, because the system load varies dynamically, a fixed-interval control strategy often cannot respond promptly to sudden short-term changes in server reliability;
(2) A mechanism for quantitative trend prediction is lacking. Existing techniques do not adequately analyze, model and predict trends in the historical reliability data of the servers; most of them mechanically use the historical average or the most recent data as the basis for control decisions.
Against this background, how to dynamically track the reliability state of each server in a cloud data center, set reasonable task backup times, avoid the two extremes of over-frequent and insufficient backup, and ultimately improve the overall reliability of the cloud data center without substantially increasing system operating overhead has become both a focus and a difficulty of research.
Summary of the invention
In view of the above defects of the prior art, the technical problem to be solved by the invention is to provide a cloud data center task backup method that can improve the reliability of a cloud data center.
To achieve the above object, the present invention provides a method for cloud data center task backup, comprising the following steps:
Step 1: obtain information on the historical failure occurrence times of the servers of the cloud data center, including the times $ft_1, ft_2, \ldots, ft_k$ at which the most recent $k$ server failures occurred and the numbers $fn_1, fn_2, \ldots, fn_k$ of the servers on which these $k$ failures occurred, where $k$ is a positive integer;
Step 2: calculate the inter-failure times $if_i$ and the unit-time multi-step inter-failure time growth rate $zzl$:
$if_i = ft_{i+1} - ft_i$, $0 < i \le k-1$;
$zzl = \mathrm{mean}\{Z_{i,j} \mid i < j < k\}$;
Step 3: calculate the waiting time $dt$ until the next task backup:
[Formula not reproduced in this text.]
where $csz$ is the system default backup interval and $m$ is the number of servers in the data center;
Step 4: calculate the recent maximum number of failures of a single server $dgzs$, the recent minimum number of failures of a single server $xgzs$, the serial number $yxh$ of the backup task source server and the serial number $mdxh$ of the backup task destination server:
$dgzs = \max\{gzs_j \mid 0 < j \le m\}$;
$xgzs = \min\{gzs_j \mid 0 < j \le m\}$;
[Two formulas not reproduced in this text.]
where $gzs_j$ denotes the number of failures that occurred recently on the $j$-th server, calculated as:
[Formula not reproduced in this text.]
Step 5: perform task backup: if at least one of $yxh$ and $mdxh$ is 0, do nothing; if neither $yxh$ nor $mdxh$ is 0, back up the tasks currently executing on server number $yxh$ to server number $mdxh$; then wait for the time $dt$ and return to Step 1.
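The five steps above form a repeating control loop. The following Python sketch illustrates only that loop structure; it is not the patented implementation. All names (`run_backup_rounds`, `fetch_failures`, `compute_wait`, `choose_servers`, `back_up`, `sleep`) are hypothetical, and the calculations of Steps 2 to 4 are passed in as callables rather than implemented, because their formulas are not available in this text. The loop is also bounded by a `rounds` argument so that it terminates, whereas the method as described repeats indefinitely.

```python
from typing import Callable, List, Tuple


def run_backup_rounds(
    fetch_failures: Callable[[], Tuple[List[float], List[int]]],
    compute_wait: Callable[[List[float]], float],
    choose_servers: Callable[[List[int]], Tuple[int, int]],
    back_up: Callable[[int, int], None],
    sleep: Callable[[float], None],
    rounds: int,
) -> List[float]:
    """Run the Step 1 to Step 5 cycle `rounds` times and return the waits used."""
    waits: List[float] = []
    for _ in range(rounds):
        ft, fn = fetch_failures()        # Step 1: failure times and server numbers
        dt = compute_wait(ft)            # Steps 2-3: waiting time (supplied by caller)
        yxh, mdxh = choose_servers(fn)   # Step 4: source and destination (supplied by caller)
        if yxh != 0 and mdxh != 0:       # Step 5: back up only if both numbers are non-zero
            back_up(yxh, mdxh)
        sleep(dt)                        # wait dt, then return to Step 1
        waits.append(dt)
    return waits
```

Passing `sleep` as a parameter (for example `time.sleep` in production, a no-op in tests) keeps the sketch testable without real delays.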
Preferably, in Step 2, the abnormal points are determined according to the following steps:
calculate the average positive and negative fluctuation intensities, $bp$ and $bn$, of the inter-failure time sequence:
[Formula not reproduced in this text.]
where $\max\{\}$ is the operation of taking the maximum of a set and $\min\{\}$ is the operation of taking the minimum of a set.
When either of two compound conditions holds, each being the conjunction of two inequalities [conditions not reproduced in this text], the inter-failure time value $if_i$ is an abnormal point;
$xs$ is a preset coefficient, and $xs$ is a positive integer.
Another technical problem to be solved by the invention is to provide a cloud data center task backup device that can improve the reliability of a cloud data center.
To achieve the above object, the present invention provides a cloud data center task backup device, comprising a fault monitoring unit, a control decision module and a task backup module; the output of the fault monitoring unit is connected to the input of the control decision module, and the output of the control decision module is connected to the input of the task backup module;
the fault monitoring unit is used to obtain information on the historical failure occurrence times of the servers of the cloud data center;
the control decision module is used to analyze the risk of future failure of each server in the data center, calculate the waiting time until the next task backup, and calculate control decision reference values;
the task backup module is used to execute task backup between servers.
Preferably, the control decision module comprises a risk analysis unit, a control timing decision unit and a control quantity calculation unit;
the first output of the fault monitoring unit is connected to the input of the risk analysis unit; the second output of the fault monitoring unit is connected to the first input of the control timing decision unit; the third output of the fault monitoring unit is connected to the input of the control quantity calculation unit; the output of the risk analysis unit is connected to the second input of the control timing decision unit; the output of the control timing decision unit is connected to the first input of the task backup module, and the output of the control quantity calculation unit is connected to the second input of the task backup module;
the fault monitoring unit obtains information on the historical failure occurrence times of the servers of the cloud data center, including the times $ft_1, ft_2, \ldots, ft_k$ at which the most recent $k$ server failures occurred and the numbers $fn_1, fn_2, \ldots, fn_k$ of the servers on which these $k$ failures occurred, where $k$ is a positive integer;
the risk analysis unit calculates the inter-failure times $if_i$ and the unit-time multi-step inter-failure time growth rate $zzl$:
$if_i = ft_{i+1} - ft_i$, $0 < i \le k-1$;
$zzl = \mathrm{mean}\{Z_{i,j} \mid i < j < k\}$;
where $\mathrm{mean}\{\}$ is the operation of averaging a set;
the control timing decision unit calculates the waiting time $dt$ until the next task backup:
[Formula not reproduced in this text.]
where $csz$ is the system default backup interval and $m$ is the number of servers in the data center;
the control quantity calculation unit calculates the recent maximum number of failures of a single server $dgzs$, the recent minimum number of failures of a single server $xgzs$, the serial number $yxh$ of the backup task source server and the serial number $mdxh$ of the backup task destination server:
$dgzs = \max\{gzs_j \mid 0 < j \le m\}$;
$xgzs = \min\{gzs_j \mid 0 < j \le m\}$;
[Two formulas not reproduced in this text.]
where $gzs_j$ denotes the number of failures that occurred recently on the $j$-th server, calculated as:
[Formula not reproduced in this text.]
the task backup module performs task backup: if at least one of $yxh$ and $mdxh$ is 0, it does nothing; if neither $yxh$ nor $mdxh$ is 0, it backs up the tasks currently executing on server number $yxh$ to server number $mdxh$.
Preferably, the risk analysis unit determines the abnormal points by the following method:
calculate the average positive and negative fluctuation intensities, $bp$ and $bn$, of the inter-failure time sequence:
[Formula not reproduced in this text.]
where $\max\{\}$ is the operation of taking the maximum of a set and $\min\{\}$ is the operation of taking the minimum of a set.
When either of two compound conditions holds, each being the conjunction of two inequalities [conditions not reproduced in this text], the inter-failure time value $if_i$ is an abnormal point;
$xs$ is a preset coefficient, and $xs$ is a positive integer.
A further technical problem to be solved by the invention is to provide a cloud data center task backup system that can improve the reliability of a cloud data center.
To achieve the above object, the present invention provides a cloud data center task backup system, comprising a cloud data center server in which a cloud data center backup device is provided; the cloud data center backup device comprises a fault monitoring unit, a control decision module and a task backup module; the output of the fault monitoring unit is connected to the input of the control decision module, and the output of the control decision module is connected to the input of the task backup module;
the fault monitoring unit is used to obtain information on the historical failure occurrence times of the servers of the cloud data center;
the control decision module is used to analyze the risk of future failure of each server in the data center, calculate the waiting time until the next task backup, and calculate control decision reference values;
the task backup module is used to execute task backup between servers.
Preferably, the control decision module comprises a risk analysis unit, a control timing decision unit and a control quantity calculation unit;
the first output of the fault monitoring unit is connected to the input of the risk analysis unit; the second output of the fault monitoring unit is connected to the first input of the control timing decision unit; the third output of the fault monitoring unit is connected to the input of the control quantity calculation unit; the output of the risk analysis unit is connected to the second input of the control timing decision unit; the output of the control timing decision unit is connected to the first input of the task backup module, and the output of the control quantity calculation unit is connected to the second input of the task backup module;
the fault monitoring unit obtains information on the historical failure occurrence times of the servers of the cloud data center, including the times $ft_1, ft_2, \ldots, ft_k$ at which the most recent $k$ server failures occurred and the numbers $fn_1, fn_2, \ldots, fn_k$ of the servers on which these $k$ failures occurred, where $k$ is a positive integer;
the risk analysis unit calculates the inter-failure times $if_i$ and the unit-time multi-step inter-failure time growth rate $zzl$:
$if_i = ft_{i+1} - ft_i$, $0 < i \le k-1$;
$zzl = \mathrm{mean}\{Z_{i,j} \mid i < j < k\}$;
where $\mathrm{mean}\{\}$ is the operation of averaging a set;
the control timing decision unit calculates the waiting time $dt$ until the next task backup:
[Formula not reproduced in this text.]
where $csz$ is the system default backup interval and $m$ is the number of servers in the data center;
the control quantity calculation unit calculates the recent maximum number of failures of a single server $dgzs$, the recent minimum number of failures of a single server $xgzs$, the serial number $yxh$ of the backup task source server and the serial number $mdxh$ of the backup task destination server:
$dgzs = \max\{gzs_j \mid 0 < j \le m\}$;
$xgzs = \min\{gzs_j \mid 0 < j \le m\}$;
[Two formulas not reproduced in this text.]
where $gzs_j$ denotes the number of failures that occurred recently on the $j$-th server, calculated as:
[Formula not reproduced in this text.]
the task backup module performs task backup: if at least one of $yxh$ and $mdxh$ is 0, it does nothing; if neither $yxh$ nor $mdxh$ is 0, it backs up the tasks currently executing on server number $yxh$ to server number $mdxh$.
Preferably, the risk analysis unit determines the abnormal points by the following method:
calculate the average positive and negative fluctuation intensities, $bp$ and $bn$, of the inter-failure time sequence:
[Formula not reproduced in this text.]
where $\max\{\}$ is the operation of taking the maximum of a set and $\min\{\}$ is the operation of taking the minimum of a set.
When either of two compound conditions holds, each being the conjunction of two inequalities [conditions not reproduced in this text], the inter-failure time value $if_i$ is an abnormal point; $xs$ is a preset coefficient, and $xs$ is a positive integer.
The beneficial effects of the invention are as follows: the invention fully considers the dynamic fluctuation of system reliability and predicts a reasonable task backup frequency by tracking its trend; at the same time, the invention eliminates the influence of abnormal points in the historical reliability data, ensuring the accuracy of the trend prediction. Based on the reliability trend of each server in the cloud data center, the invention can assess the risk of a future overall system collapse, perform preventive task backup in advance, and set a variable control interval, avoiding the two extremes of "over-frequent backup" and "insufficient backup".
Description of the drawings
Fig. 1 is a flow diagram of one embodiment of the cloud data center task backup method of the invention.
Fig. 2 is a schematic diagram of one embodiment of the cloud data center task backup device of the invention.
Fig. 3 is a schematic diagram of one embodiment of the cloud data center task backup system of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments:
As shown in Fig. 1, a method for cloud data center task backup comprises the following steps:
Step 1: obtain information on the historical failure occurrence times of the servers of the cloud data center, including the times $ft_1, ft_2, \ldots, ft_k$ at which the most recent $k$ server failures occurred and the numbers $fn_1, fn_2, \ldots, fn_k$ of the servers on which these $k$ failures occurred, where $k$ is a positive integer. In this embodiment, $k$ is an arbitrary natural number between 10 and 100.
Step 2: calculate the inter-failure times $if_i$ and the unit-time multi-step inter-failure time growth rate $zzl$:
$if_i = ft_{i+1} - ft_i$, $0 < i \le k-1$;
$zzl = \mathrm{mean}\{Z_{i,j} \mid i < j < k\}$;
where $\mathrm{mean}\{\}$ is the operation of averaging a set.
Step 3: calculate the waiting time $dt$ until the next task backup:
[Formula not reproduced in this text.]
where $csz$ is the system default backup interval; in this embodiment, $csz$ takes any value between 0.1 and 1 second. $m$ is the number of servers in the data center. The intuitive meaning of the above formula is as follows: if the expected number of failures occurring within the time $csz$ is less than 0.7 times the number of servers in the entire data center, the system failure risk is considered small, and the next round of backup control still takes place after the scheduled waiting time $csz$; otherwise, the ratio of the expected number of failures occurring within the time $csz$ to $m$ is calculated using the maximum historical inter-failure time growth rate, and $csz$ divided by this ratio is used as the waiting time for the next round of backup control.
Step 4: calculate the recent maximum number of failures of a single server $dgzs$, the recent minimum number of failures of a single server $xgzs$, the serial number $yxh$ of the backup task source server and the serial number $mdxh$ of the backup task destination server:
$dgzs = \max\{gzs_j \mid 0 < j \le m\}$;
$xgzs = \min\{gzs_j \mid 0 < j \le m\}$;
[Two formulas not reproduced in this text.]
where $gzs_j$ denotes the number of failures that occurred recently on the $j$-th server, calculated as:
[Formula not reproduced in this text.]
Step 5: perform task backup: if at least one of $yxh$ and $mdxh$ is 0, do nothing; if neither $yxh$ nor $mdxh$ is 0, back up the tasks currently executing on server number $yxh$ to server number $mdxh$; then wait for the time $dt$ and return to Step 1.
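The parts of Steps 2 and 4 whose formulas survive in this text can be illustrated with a short Python sketch. It computes the inter-failure times from the failure times, and the maximum and minimum per-server failure counts. The function names are hypothetical. Because the formula for the per-server count is not available here, the sketch assumes that the count for server j is simply the number of times j appears among the k recorded server numbers; that is an assumption, not the source's definition. The growth rate, the waiting time and the source and destination numbers are not computed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def inter_failure_times(ft: List[float]) -> List[float]:
    """if_i = ft_{i+1} - ft_i for consecutive recorded failure times."""
    return [ft[i + 1] - ft[i] for i in range(len(ft) - 1)]


def recent_failure_counts(fn: List[int], m: int) -> Dict[int, int]:
    """ASSUMPTION: gzs_j is the number of occurrences of server j in fn.

    Servers are numbered 1..m; servers with no recorded failure get 0.
    """
    counts = Counter(fn)
    return {j: counts.get(j, 0) for j in range(1, m + 1)}


def failure_count_extremes(gzs: Dict[int, int]) -> Tuple[int, int]:
    """Return (dgzs, xgzs): the maximum and minimum of gzs_j over all servers."""
    values = list(gzs.values())
    return max(values), min(values)


if __name__ == "__main__":
    ft = [0.0, 2.0, 5.0, 9.0]   # made-up failure times
    fn = [1, 3, 3, 2]           # made-up server numbers, one per failure
    gzs = recent_failure_counts(fn, m=4)
    print(inter_failure_times(ft), gzs, failure_count_extremes(gzs))
```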
Because the operation of an actual cloud computing system is affected by many systematic factors (abnormal message delays, changes in connection bandwidth, computing resource conflicts, etc.) and non-systematic factors (accidental failures of the system, software or hardware, information loss, etc.), some recorded values in the above inter-failure time sequence deviate markedly from the overall pattern of variation. These abnormal points cannot be analyzed and assessed as ordinary data and are rejected. The abnormal points are determined according to the following steps:
calculate the average positive and negative fluctuation intensities, $bp$ and $bn$, of the inter-failure time sequence, where $bp$ is the average positive fluctuation intensity of the inter-failure time sequence and $bn$ is the average negative fluctuation intensity of the inter-failure time sequence:
[Formula not reproduced in this text.]
where $\max\{\}$ is the operation of taking the maximum of a set and $\min\{\}$ is the operation of taking the minimum of a set.
When either of two compound conditions holds, each being the conjunction of two inequalities [conditions not reproduced in this text], the inter-failure time value $if_i$ is an abnormal point;
$xs$ is a preset coefficient, and $xs$ is a positive integer. In this embodiment, $xs$ is 10.
As shown in Fig. 2, a cloud data center task backup device comprises a fault monitoring unit 3, a control decision module 4 and a task backup module 5; the output of the fault monitoring unit 3 is connected to the input of the control decision module 4, and the output of the control decision module 4 is connected to the input of the task backup module 5.
The fault monitoring unit 3 is used to obtain information on the historical failure occurrence times of the servers of the cloud data center.
The control decision module 4 is used to analyze the risk of future failure of each server in the data center, calculate the waiting time until the next task backup, and calculate control decision reference values.
The task backup module 5 is used to execute task backup between servers.
The control decision module 4 comprises a risk analysis unit 401, a control timing decision unit 402 and a control quantity calculation unit 403.
The first output of the fault monitoring unit 3 is connected to the input of the risk analysis unit 401; the second output of the fault monitoring unit 3 is connected to the first input of the control timing decision unit 402; the third output of the fault monitoring unit 3 is connected to the input of the control quantity calculation unit 403; the output of the risk analysis unit 401 is connected to the second input of the control timing decision unit 402; the output of the control timing decision unit 402 is connected to the first input of the task backup module 5, and the output of the control quantity calculation unit 403 is connected to the second input of the task backup module 5.
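The connections just described form a small directed graph, and one way to see the order in which the units can run is to sort that graph topologically. The following Python sketch is illustrative only: the node names are hypothetical labels derived from the reference numerals, and the use of the standard-library `graphlib` module (Python 3.9 or later) is a choice made for this example, not something the source prescribes.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each key is a unit; its value is the set of units whose outputs feed it.
CONNECTIONS: Dict[str, Set[str]] = {
    "risk_analysis_unit_401": {"fault_monitoring_unit_3"},
    "control_timing_decision_unit_402": {
        "fault_monitoring_unit_3",
        "risk_analysis_unit_401",
    },
    "control_quantity_calculation_unit_403": {"fault_monitoring_unit_3"},
    "task_backup_module_5": {
        "control_timing_decision_unit_402",
        "control_quantity_calculation_unit_403",
    },
}


def evaluation_order(connections: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every unit runs after all of its inputs."""
    return list(TopologicalSorter(connections).static_order())


if __name__ == "__main__":
    for unit in evaluation_order(CONNECTIONS):
        print(unit)
```

The fault monitoring unit always comes first and the task backup module last; the relative order of units 402 and 403 is not fixed by the connections, since neither feeds the other.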
The fault monitoring unit 3 obtains information on the historical failure occurrence times of the servers of the cloud data center, including the times $ft_1, ft_2, \ldots, ft_k$ at which the most recent $k$ server failures occurred and the numbers $fn_1, fn_2, \ldots, fn_k$ of the servers on which these $k$ failures occurred, and sends $ft_1, ft_2, \ldots, ft_k$ and $fn_1, fn_2, \ldots, fn_k$ to the risk analysis unit, the control timing decision unit and the control quantity calculation unit, where $k$ is a positive integer. In this embodiment, $k$ is an arbitrary natural number between 10 and 100.
The risk analysis unit 401 calculates the inter-failure times $if_i$ and the unit-time multi-step inter-failure time growth rate $zzl$:
$if_i = ft_{i+1} - ft_i$, $0 < i \le k-1$;
$zzl = \mathrm{mean}\{Z_{i,j} \mid i < j < k\}$;
where $\mathrm{mean}\{\}$ is the operation of averaging a set;
The risk analysis unit 401 then sends the $zzl$ value to the control timing decision unit.
The control timing decision unit 402 calculates the waiting time $dt$ until the next task backup:
[Formula not reproduced in this text.]
where $csz$ is the system default backup interval; in this embodiment, $csz$ takes any value between 0.1 and 1 second. $m$ is the number of servers in the data center. The control timing decision unit 402 sends the $dt$ value to the task backup module.
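The formula for the waiting time is not available in this text, but the method embodiment describes its intuitive meaning in words, and that verbal rule can be sketched in Python. This is a sketch of the description only, not of the formula itself. The function name and parameters are hypothetical, and how the two expected failure counts are derived from the growth rate is not recoverable here, so both are taken as inputs. The guard for a non-positive ratio is likewise an addition made for this example.

```python
def next_wait_time(
    csz: float,
    m: int,
    expected_failures: float,
    worst_case_failures: float,
    threshold: float = 0.7,
) -> float:
    """Waiting time until the next backup round, following the verbal rule.

    csz                 -- system default backup interval
    m                   -- number of servers in the data center
    expected_failures   -- expected number of failures within csz (input, assumed given)
    worst_case_failures -- expected number of failures within csz computed with the
                           maximum historical growth rate (input, assumed given)
    """
    if csz <= 0 or m <= 0:
        raise ValueError("csz and m must be positive")
    # Low risk: fewer expected failures than 0.7 * m, keep the default interval.
    if expected_failures < threshold * m:
        return csz
    # Otherwise shorten the interval: divide csz by the ratio of expected failures to m.
    ratio = worst_case_failures / m
    if ratio <= 0:
        return csz  # guard added for this sketch; not part of the described rule
    return csz / ratio


if __name__ == "__main__":
    print(next_wait_time(csz=1.0, m=10, expected_failures=5, worst_case_failures=8))
    print(next_wait_time(csz=1.0, m=10, expected_failures=8, worst_case_failures=20))
```

With the made-up inputs above, the first call keeps the default interval because 5 is below 0.7 × 10, and the second halves it because the worst-case ratio is 2.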
The control quantity calculation unit 403 calculates the recent maximum number of failures of a single server $dgzs$, the recent minimum number of failures of a single server $xgzs$, the serial number $yxh$ of the backup task source server and the serial number $mdxh$ of the backup task destination server.
$dgzs = \max\{gzs_j \mid 0 < j \le m\}$;
$xgzs = \min\{gzs_j \mid 0 < j \le m\}$;
[Two formulas not reproduced in this text.]
where $gzs_j$ denotes the number of failures that occurred recently on the $j$-th server, calculated as:
[Formula not reproduced in this text.]
The control quantity calculation unit 403 sends the $yxh$ and $mdxh$ values to the task backup module.
The task backup module 5 performs task backup: if at least one of $yxh$ and $mdxh$ is 0, it does nothing; if neither $yxh$ nor $mdxh$ is 0, it backs up the tasks currently executing on server number $yxh$ to server number $mdxh$.
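The decision made by the task backup module can be illustrated with a minimal Python sketch. The in-memory dictionaries standing in for running tasks and stored backups, and the function name `back_up_tasks`, are hypothetical; a real module would copy task state between servers over the network.

```python
from typing import Dict, List


def back_up_tasks(
    yxh: int,
    mdxh: int,
    running: Dict[int, List[str]],
    backups: Dict[int, List[str]],
) -> bool:
    """Back up the tasks running on server yxh to server mdxh.

    Returns False and does nothing if either serial number is 0;
    otherwise copies the task list and returns True.
    """
    if yxh == 0 or mdxh == 0:
        return False
    # Tasks stay on the source server; only copies are stored on the destination.
    backups.setdefault(mdxh, []).extend(running.get(yxh, []))
    return True


if __name__ == "__main__":
    running = {1: ["task-a", "task-b"], 2: []}
    backups: Dict[int, List[str]] = {}
    print(back_up_tasks(1, 2, running, backups), backups)
    print(back_up_tasks(0, 2, running, backups), backups)
```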
The risk analysis unit 401 determines the abnormal points by the following method:
calculate the average positive and negative fluctuation intensities, $bp$ and $bn$, of the inter-failure time sequence:
[Formula not reproduced in this text.]
where $\max\{\}$ is the operation of taking the maximum of a set and $\min\{\}$ is the operation of taking the minimum of a set.
When either of two compound conditions holds, each being the conjunction of two inequalities [conditions not reproduced in this text], the inter-failure time value $if_i$ is an abnormal point.
$xs$ is a preset coefficient, and $xs$ is a positive integer. In this embodiment, $xs$ is 10.
As shown in Fig. 3, a cloud data center task backup system comprises a cloud data center server 1, in which a cloud data center backup device 2 is provided; the cloud data center backup device 2 comprises a fault monitoring unit 3, a control decision module 4 and a task backup module 5; the output of the fault monitoring unit 3 is connected to the input of the control decision module 4, and the output of the control decision module 4 is connected to the input of the task backup module 5.
The fault monitoring unit 3 is used to obtain information on the historical failure occurrence times of the servers of the cloud data center.
The control decision module 4 is used to analyze the risk of future failure of each server in the data center, calculate the waiting time until the next task backup, and calculate control decision reference values.
The task backup module 5 is used to execute task backup between servers.
The control decision module 4 comprises a risk analysis unit 401, a control timing decision unit 402 and a control quantity calculation unit 403;
The first output of the fault monitoring unit 3 is connected to the input of the risk analysis unit 401; the second output of the fault monitoring unit 3 is connected to the first input of the control timing decision unit 402; the third output of the fault monitoring unit 3 is connected to the input of the control quantity calculation unit 403; the output of the risk analysis unit 401 is connected to the second input of the control timing decision unit 402; the output of the control timing decision unit 402 is connected to the first input of the task backup module 5, and the output of the control quantity calculation unit 403 is connected to the second input of the task backup module 5.
The fault monitoring unit 3 obtains information on the historical failure occurrence times of the servers of the cloud data center, including the times $ft_1, ft_2, \ldots, ft_k$ at which the most recent $k$ server failures occurred and the numbers $fn_1, fn_2, \ldots, fn_k$ of the servers on which these $k$ failures occurred, and sends $ft_1, ft_2, \ldots, ft_k$ and $fn_1, fn_2, \ldots, fn_k$ to the risk analysis unit, the control timing decision unit and the control quantity calculation unit, where $k$ is a positive integer. In this embodiment, $k$ is an arbitrary natural number between 10 and 100.
The risk analysis unit 401 calculates the inter-failure times $if_i$ and the unit-time multi-step inter-failure time growth rate $zzl$:
$if_i = ft_{i+1} - ft_i$, $0 < i \le k-1$;
$zzl = \mathrm{mean}\{Z_{i,j} \mid i < j < k\}$;
where $\mathrm{mean}\{\}$ is the operation of averaging a set;
The risk analysis unit 401 then sends the $zzl$ value to the control timing decision unit.
The control timing decision unit 402 calculates the waiting time $dt$ until the next task backup:
[Formula not reproduced in this text.]
where $csz$ is the system default backup interval; in this embodiment, $csz$ takes any value between 0.1 and 1 second. $m$ is the number of servers in the data center. The control timing decision unit 402 sends the $dt$ value to the task backup module.
The control quantity calculation unit 403 calculates the recent maximum number of failures of a single server $dgzs$, the recent minimum number of failures of a single server $xgzs$, the serial number $yxh$ of the backup task source server and the serial number $mdxh$ of the backup task destination server.
$dgzs = \max\{gzs_j \mid 0 < j \le m\}$;
$xgzs = \min\{gzs_j \mid 0 < j \le m\}$;
[Two formulas not reproduced in this text.]
where $gzs_j$ denotes the number of failures that occurred recently on the $j$-th server, calculated as:
[Formula not reproduced in this text.]
The control quantity calculation unit 403 sends the $yxh$ and $mdxh$ values to the task backup module.
The task backup module 5 performs task backup: if at least one of $yxh$ and $mdxh$ is 0, it does nothing; if neither $yxh$ nor $mdxh$ is 0, it backs up the tasks currently executing on server number $yxh$ to server number $mdxh$.
The risk analysis unit 401 determines the abnormal points by the following method:
calculate the average positive and negative fluctuation intensities, $bp$ and $bn$, of the inter-failure time sequence:
[Formula not reproduced in this text.]
where $\max\{\}$ is the operation of taking the maximum of a set and $\min\{\}$ is the operation of taking the minimum of a set.
When either of two compound conditions holds, each being the conjunction of two inequalities [conditions not reproduced in this text], the inter-failure time value $if_i$ is an abnormal point.
$xs$ is a preset coefficient, and $xs$ is a positive integer. In this embodiment, $xs$ is 10.
The cloud data center task backup device based on reliability trend analysis provided by the embodiments of the invention may be deployed on an existing server, or on a separately provided server dedicated to cloud data center task backup based on reliability trend analysis. To this end, the invention provides a server comprising the cloud data center task backup device based on reliability trend analysis provided by the embodiments of the invention. A person of ordinary skill in the art will understand that the process of cloud data center task backup based on reliability trend analysis in the above method embodiments may be carried out by hardware under the control of program instructions, and that the program, when executed, performs the corresponding steps of the above method.
The preferred embodiments of the invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning or limited experimentation under the concept of the invention shall fall within the scope of protection defined by the claims.