CN104767806B

CN104767806B - Method, device and system for task backup in a cloud data center

Info

Publication number: CN104767806B
Application number: CN201510147743.6A
Authority: CN
Inventors: 夏云霓; 周刚; 罗辛; 俞可; 朱庆生
Original assignee: Chongqing University
Current assignee: Quzhou Haiyi Technology Co ltd
Priority date: 2015-03-31
Filing date: 2015-03-31
Publication date: 2018-09-25
Anticipated expiration: 2035-03-31
Also published as: CN104767806A

Abstract

The invention discloses a method, device and system for task backup in a cloud data center, belonging to the field of cloud computing system control. The method first obtains the historical failure occurrence times of each server in the cloud data center; it then calculates the time between failures and the per-unit-time multistep growth rate of the time between failures, the waiting time before the next task backup, the recent maximum and minimum number of failures of a single server, and the serial numbers of the backup source server and the backup target server; task backup is then carried out. Based on the trend of each server's reliability in the cloud data center, the invention assesses the risk of a future system-wide collapse, performs preventive task backup in advance, and sets a variable control interval, thereby avoiding the two extremes of "over-frequent backup" and "insufficient backup".

Description

A method, device and system for task backup in a cloud data center

Technical field

The invention belongs to the field of cloud computing system control, and in particular relates to a method, device and system for task backup in a cloud data center.

Background art

Cloud computing is an Internet-based mode of computing in which shared software and hardware resources and information are provided to computers and other devices on demand. Compared with traditional software and computing models, cloud computing offers significant advantages such as loose coupling, on-demand provisioning, controllable cost, resource virtualization and heterogeneous collaboration, which make it better suited to today's applications such as e-commerce, flexible manufacturing and the mobile Internet.

A cloud data center is a distributed computing system, formed by multiple heterogeneous servers connected through a network, that hosts enterprise-level applications providing online cloud services. In a cloud data center, a large number of servers are centrally managed, which ensures the stable power supply, suitable temperature and humidity control, and network bandwidth that the servers need to run.

Like other software and hardware systems, the servers in a cloud data center are subject to the risk of faults and failures. Because cloud computing systems are now widely used for high-load, high-complexity applications such as large-scale scientific computing, real-time finance, online transactions and streaming-media multicast, servers often run in an overloaded state, so faults and failures occur more frequently and cause larger losses. In addition, because the distribution of cloud task requests in time and place is irregular and subject to human chance, the real-time load of a cloud system fluctuates dynamically, which in turn makes the reliability characteristics and failure risk of the servers in the data center fluctuate arbitrarily over time and makes preventive control and disaster avoidance difficult. Existing task backup techniques often fail to capture the dynamic reliability trend of each server in the data center and suffer from the problems of "over-frequent backup" and "insufficient backup". To guard against a server faulting or failing in the near future, the management strategy often backs up the tasks on servers deemed high-risk to other servers too frequently. This task migration and backup activity itself incurs high overhead, and a server deemed high-risk may in fact not fail in the near future, yet it sits idle because its tasks have been moved away, which is wasteful. Conversely, if the likelihood of server faults and failures is underestimated, backup may be insufficient: when a server fault or failure does occur, many tasks have not yet been moved away, the running tasks fail along with the server, and the whole system eventually collapses.

The existing technical solutions mainly have the following deficiencies:

(1) Fixed-period control is mostly used. Existing methods mostly preset a fixed time interval and perform task backup periodically. However, because the system load changes dynamically, a control strategy with a fixed interval often cannot respond quickly to sudden short-term changes in server reliability;

(2) A mechanism for quantitative trend prediction is lacking. Existing techniques do not adequately analyze, model and predict trends from historical server reliability data; instead, most of them mechanically use historical averages or the most recent data as the basis for control decisions.

Against this background, how to dynamically track the reliability state of each server in a cloud data center, set a reasonable time for task backup, avoid the two extremes of over-frequent and insufficient backup, and ultimately improve the overall reliability of the cloud data center without substantially increasing system operating overhead has become a hot and difficult research topic.

Summary of the invention

In view of the above defects of the prior art, the technical problem to be solved by the invention is to provide a cloud data center task backup method capable of improving the reliability of the cloud data center.

To achieve the above object, the invention provides a method for cloud data center task backup, comprising the following steps:

Step 1: obtain the historical failure occurrence information of each server in the cloud data center, including the times ft₁, ft₂, ..., ft_k at which the most recent k server failures occurred and the numbers fn₁, fn₂, ..., fn_k of the servers on which these k failures occurred; k is a positive integer;

Step 2: calculate the time between failures if_i and the per-unit-time multistep growth rate zzl of the time between failures:

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

Step 3: calculate the waiting time dt before the next task backup:

where csz is the system default backup interval and m is the number of servers in the data center;

Step 4: calculate the recent maximum number of failures of a single server dgzs, the recent minimum number of failures of a single server xgzs, the serial number yxh of the backup source server and the serial number mdxh of the backup target server;

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

where gzs_j denotes the number of failures that the j-th server has experienced recently;

Step 5: perform task backup: if at least one of yxh and mdxh is 0, do nothing; if neither yxh nor mdxh is 0, back up the tasks currently running on server yxh to server mdxh; then wait for the time dt and return to Step 1.
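The decision rule of Step 5 and the return to Step 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: `plan_round` and `migrate` are hypothetical callables standing in for Steps 1 to 4 and for the actual task migration, server numbers are assumed to start at 1 (so that 0 can mean "no server selected"), and the loop is bounded by `rounds` only so that the sketch terminates.

```python
import time
from typing import Callable, Optional, Tuple


def choose_backup_action(yxh: int, mdxh: int) -> Optional[Tuple[int, int]]:
    """Step 5 rule: return (source, target), or None when either number is 0."""
    if yxh == 0 or mdxh == 0:
        return None
    return (yxh, mdxh)


def run_rounds(
    plan_round: Callable[[], Tuple[float, int, int]],  # hypothetical: Steps 1-4
    migrate: Callable[[int, int], None],               # hypothetical: moves tasks
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run a bounded number of control rounds (the described method loops on)."""
    for _ in range(rounds):
        dt, yxh, mdxh = plan_round()
        action = choose_backup_action(yxh, mdxh)
        if action is not None:
            migrate(*action)  # back up running tasks from yxh to mdxh
        sleep(dt)             # wait dt, then start again from Step 1
```

Passing `sleep` as a parameter keeps the waiting behaviour replaceable, which makes the loop easy to exercise without real delays.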

Preferably, in Step 2, the abnormal points are determined by the following steps:

Calculate the average positive and negative fluctuation strengths, bp and bn, of the time-between-failures sequence:

where max{ } takes the maximum of a set and min{ } takes the minimum of a set.

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point;

xs is a coefficient given in advance, and xs is a positive integer.

Another technical problem to be solved by the invention is to provide a cloud data center task backup device capable of improving the reliability of the cloud data center.

To achieve the above object, the invention provides a cloud data center task backup device, comprising a fault monitoring unit, a control decision module and a task backup module; the output of the fault monitoring unit is connected to the input of the control decision module, and the output of the control decision module is connected to the input of the task backup module;

The fault monitoring unit is used to obtain the historical failure occurrence information of each server in the cloud data center;

The control decision module is used to analyze the risk of future failure of each server in the data center, calculate the waiting time before the next task backup, and calculate control decision reference values;

The task backup module is used to perform task backup between servers.

Preferably, the control decision module comprises a risk analysis unit, a control timing decision unit and a control quantity calculation unit;

The first output of the fault monitoring unit is connected to the input of the risk analysis unit; the second output of the fault monitoring unit is connected to the first input of the control timing decision unit; the third output of the fault monitoring unit is connected to the input of the control quantity calculation unit; the output of the risk analysis unit is connected to the second input of the control timing decision unit; the output of the control timing decision unit is connected to the first input of the task backup module, and the output of the control quantity calculation unit is connected to the second input of the task backup module;

The fault monitoring unit obtains the historical failure occurrence information of each server in the cloud data center, including the times ft₁, ft₂, ..., ft_k at which the most recent k server failures occurred and the numbers fn₁, fn₂, ..., fn_k of the servers on which these k failures occurred; k is a positive integer;

The risk analysis unit calculates the time between failures if_i and the per-unit-time multistep growth rate zzl of the time between failures:

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

where mean{ } takes the average of a set;

The control timing decision unit calculates the waiting time dt before the next task backup:

The control quantity calculation unit calculates the recent maximum number of failures of a single server dgzs, the recent minimum number of failures of a single server xgzs, the serial number yxh of the backup source server and the serial number mdxh of the backup target server;

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

The task backup module performs task backup: if at least one of yxh and mdxh is 0, no operation is performed; if neither yxh nor mdxh is 0, the tasks currently running on server yxh are backed up to server mdxh.

Preferably, the risk analysis unit determines abnormal points by the following method:

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point;

xs is a coefficient given in advance, and xs is a positive integer.

A further technical problem to be solved by the invention is to provide a cloud data center task backup system capable of improving the reliability of the cloud data center.

To achieve the above object, the invention provides a cloud data center task backup system, comprising a cloud data center server in which a cloud data center backup device is provided; the cloud data center backup device comprises a fault monitoring unit, a control decision module and a task backup module; the output of the fault monitoring unit is connected to the input of the control decision module, and the output of the control decision module is connected to the input of the task backup module;

The task backup module is used to perform task backup between servers.

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

where mean{ } takes the average of a set;

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point; xs is a coefficient given in advance, and xs is a positive integer.

The beneficial effects of the invention are as follows: the invention takes full account of the dynamic fluctuation of system reliability and predicts a reasonable task backup frequency by tracking its trend; at the same time, the invention eliminates the influence of abnormal points in the historical reliability data, which ensures the accuracy of the trend prediction. Based on the trend of each server's reliability in the cloud data center, the invention assesses the risk of a future system-wide collapse, performs preventive task backup in advance, and sets a variable control interval, thereby avoiding the two extremes of "over-frequent backup" and "insufficient backup".

Brief description of the drawings

Fig. 1 is a flow diagram of a specific embodiment of the cloud data center task backup method of the invention.

Fig. 2 is a schematic diagram of a specific embodiment of the cloud data center task backup device of the invention.

Fig. 3 is a schematic diagram of a specific embodiment of the cloud data center task backup system of the invention.

Detailed description of the embodiments

The invention is further described below with reference to the drawings and embodiments:

As shown in Fig. 1, a method for cloud data center task backup comprises the following steps:

Step 1: obtain the historical failure occurrence information of each server in the cloud data center, including the times ft₁, ft₂, ..., ft_k at which the most recent k server failures occurred and the numbers fn₁, fn₂, ..., fn_k of the servers on which these k failures occurred; k is a positive integer. In this embodiment, k is an arbitrary natural number between 10 and 100.

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

where mean{ } takes the average of a set.

Step 3: calculate the waiting time dt before the next task backup:

where csz is the system default backup interval; in this embodiment, csz is any value between 0.1 and 1 second. m is the number of servers in the data center. The intuitive meaning of the above formula is as follows: if the expected number of failures occurring within the time csz is less than 0.7 times the number of servers in the whole data center, the risk of system failure is considered small, and the next round of backup control still takes place after the predetermined waiting time csz; otherwise, the ratio of the expected number of failures within csz to m is computed using the largest time-between-failures growth rate in history, and csz divided by this ratio is used as the waiting time before the next round of backup control.
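The verbal rule above can be sketched in Python as follows. The dt formula itself is not reproduced in this text, so the sketch follows only the prose description: the two expected failure counts are taken as inputs (`expected_failures` and `worst_case_failures` are hypothetical parameter names), and how they are derived from zzl and from the largest historical growth rate is deliberately left out.

```python
def next_wait_time(
    csz: float,
    m: int,
    expected_failures: float,    # hypothetical input: expected failures within csz
    worst_case_failures: float,  # hypothetical input: same, using the largest
                                 # historical time-between-failures growth rate
    threshold: float = 0.7,      # the 0.7 factor quoted in the description
) -> float:
    """Return the waiting time dt before the next round of backup control."""
    if csz <= 0 or m <= 0:
        raise ValueError("csz and m must be positive")
    if expected_failures < threshold * m:
        # Low risk: keep the default interval.
        return csz
    ratio = worst_case_failures / m
    if ratio <= 0:
        raise ValueError("worst_case_failures must be positive in the high-risk case")
    # Higher risk: divide the default interval by the failures-to-servers ratio.
    return csz / ratio
```

For example, with csz = 1.0 and m = 10, an expected count of 5 keeps dt at 1.0, while an expected count of 8 with a worst-case count of 20 gives a ratio of 2 and therefore dt = 0.5. The description does not state a cap on dt, so the sketch applies none.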

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

Because the operation of a real cloud computing system is affected by many systematic factors (abnormal message delays, changes in connection bandwidth, contention for computing resources, etc.) and non-systematic factors (accidental failures of the system, software or hardware, information loss, etc.), some recorded values in the above time-between-failures sequence deviate significantly from the overall pattern. These abnormal points cannot be analyzed and assessed as ordinary data and should be rejected. The abnormal points are determined by the following steps:

Calculate the average positive and negative fluctuation strengths bp and bn of the time-between-failures sequence, where bp is the average positive fluctuation strength of the sequence and bn is its average negative fluctuation strength:

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point;

xs is a coefficient given in advance, and xs is a positive integer. In this embodiment, xs is 10.

As shown in Fig. 2, a cloud data center task backup device comprises a fault monitoring unit 3, a control decision module 4 and a task backup module 5; the output of the fault monitoring unit 3 is connected to the input of the control decision module 4, and the output of the control decision module 4 is connected to the input of the task backup module 5.

The fault monitoring unit 3 is used to obtain the historical failure occurrence information of each server in the cloud data center.

The control decision module 4 is used to analyze the risk that each server in the data center will fail in the future, calculate the waiting time before the next task backup, and calculate control decision reference values.

The task backup module 5 is used to perform task backup between servers.

The control decision module 4 comprises a risk analysis unit 401, a control timing decision unit 402 and a control quantity calculation unit 403.

The first output of the fault monitoring unit 3 is connected to the input of the risk analysis unit 401; the second output of the fault monitoring unit 3 is connected to the first input of the control timing decision unit 402; the third output of the fault monitoring unit 3 is connected to the input of the control quantity calculation unit 403; the output of the risk analysis unit 401 is connected to the second input of the control timing decision unit 402; the output of the control timing decision unit 402 is connected to the first input of the task backup module 5, and the output of the control quantity calculation unit 403 is connected to the second input of the task backup module 5.
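The connections just listed form a small directed data-flow graph. As an illustration only, the following Python sketch encodes them and derives one valid evaluation order with the standard-library `graphlib` module (Python 3.9 or later); the string labels are made up for this sketch and simply combine the unit names with their reference numerals.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key lists the units whose outputs it consumes (labels are illustrative).
INPUTS = {
    "risk_analysis_401": {"fault_monitoring_3"},
    "timing_decision_402": {"fault_monitoring_3", "risk_analysis_401"},
    "control_quantity_403": {"fault_monitoring_3"},
    "task_backup_5": {"timing_decision_402", "control_quantity_403"},
}


def evaluation_order(inputs):
    """Return one order in which every unit runs after all of its inputs."""
    return list(TopologicalSorter(inputs).static_order())


ORDER = evaluation_order(INPUTS)
```

In any such order the fault monitoring unit comes first and the task backup module last, which matches the described signal flow from monitoring through decision to execution.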

The fault monitoring unit 3 obtains the historical failure occurrence information of each server in the cloud data center, including the times ft₁, ft₂, ..., ft_k at which the most recent k server failures occurred and the numbers fn₁, fn₂, ..., fn_k of the servers on which these k failures occurred, and sends ft₁, ft₂, ..., ft_k and fn₁, fn₂, ..., fn_k to the risk analysis unit, the control timing decision unit and the control quantity calculation unit; k is a positive integer. In this embodiment, k is an arbitrary natural number between 10 and 100.
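One simple way to keep only the most recent k failure records is a bounded queue. The sketch below is an assumption about how such a monitoring unit might store its data, not something the text specifies; the class name `FailureLog` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class FailureLog:
    """Keeps the k most recent (failure time, server number) records."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be a positive integer")
        # deque(maxlen=k) silently discards the oldest record once full.
        self._records: Deque[Tuple[float, int]] = deque(maxlen=k)

    def record(self, ft: float, fn: int) -> None:
        """Store one failure: time ft on server number fn."""
        self._records.append((ft, fn))

    def times(self) -> List[float]:
        """Return ft_1 ... ft_k, oldest first."""
        return [ft for ft, _ in self._records]

    def servers(self) -> List[int]:
        """Return fn_1 ... fn_k, oldest first."""
        return [fn for _, fn in self._records]
```

With k = 3, recording four failures leaves only the last three in `times()` and `servers()`, which is the window the downstream units are described as consuming.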

The risk analysis unit 401 calculates the time between failures if_i and the per-unit-time multistep growth rate zzl of the time between failures:

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

where mean{ } takes the average of a set;

The risk analysis unit 401 then sends the zzl value to the control timing decision unit.
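The first formula, if_i = ft_{i+1} - ft_i, translates directly into code. The Python sketch below covers only that step; the definition of Z_{i,j} is not reproduced in this text, so zzl is not computed here. The function name is illustrative, and the chronological-order check is an added assumption.

```python
from typing import List, Sequence


def time_between_failures(ft: Sequence[float]) -> List[float]:
    """Return the gaps between consecutive failure times.

    Element i-1 of the result corresponds to if_i = ft_{i+1} - ft_i
    in the 1-based notation of the text, for 0 < i <= k-1.
    """
    gaps = [later - earlier for earlier, later in zip(ft, ft[1:])]
    if any(gap < 0 for gap in gaps):
        # Assumption: failure times are recorded in chronological order.
        raise ValueError("failure times must be in chronological order")
    return gaps
```

For failure times 1.0, 2.5, 4.0 and 7.0 the gaps are 1.5, 1.5 and 3.0, that is, k - 1 values for k recorded failures.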

The control timing decision unit 402 calculates the waiting time dt before the next task backup:

where csz is the system default backup interval; in this embodiment, csz is any value between 0.1 and 1 second. m is the number of servers in the data center. The control timing decision unit 402 sends the dt value to the task backup module.

The control quantity calculation unit 403 calculates the recent maximum number of failures of a single server dgzs, the recent minimum number of failures of a single server xgzs, the serial number yxh of the backup source server and the serial number mdxh of the backup target server.

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

The control quantity calculation unit 403 sends the yxh and mdxh values to the task backup module.
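The quantities dgzs and xgzs are the maximum and minimum of the per-server failure counts gzs_j. The formula for gzs_j is not reproduced in this text, so the Python sketch below assumes the simplest reading: gzs_j is the number of times server j appears among fn_1, ..., fn_k. The selection of yxh and mdxh is not sketched because its formula is likewise missing. Function names are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def fault_counts(fn: Sequence[int], m: int) -> List[int]:
    """Return [gzs_1, ..., gzs_m] for servers numbered 1..m.

    Assumption: gzs_j counts the occurrences of server j in the recent
    failure record fn_1 ... fn_k.
    """
    counts = Counter(fn)
    return [counts.get(j, 0) for j in range(1, m + 1)]


def extremes(gzs: Sequence[int]) -> Tuple[int, int]:
    """Return (dgzs, xgzs) = (max, min) of the per-server failure counts."""
    return max(gzs), min(gzs)
```

With fn = [2, 1, 2, 3, 2] and m = 4 servers, the counts are [1, 3, 1, 0], so dgzs is 3 and xgzs is 0.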

The task backup module 5 performs task backup: if at least one of yxh and mdxh is 0, no operation is performed; if neither yxh nor mdxh is 0, the tasks currently running on server yxh are backed up to server mdxh.

The risk analysis unit 401 determines abnormal points by the following method:

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point.

xs is a coefficient given in advance, and xs is a positive integer. In this embodiment, xs is 10.

As shown in Fig. 3, a cloud data center task backup system comprises a cloud data center server 1 in which a cloud data center backup device 2 is provided; the cloud data center backup device 2 comprises a fault monitoring unit 3, a control decision module 4 and a task backup module 5; the output of the fault monitoring unit 3 is connected to the input of the control decision module 4, and the output of the control decision module 4 is connected to the input of the task backup module 5.

The task backup module 5 is used to perform task backup between servers.

The control decision module 4 comprises a risk analysis unit 401, a control timing decision unit 402 and a control quantity calculation unit 403;

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

where mean{ } takes the average of a set;

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

The risk analysis unit 401 determines abnormal points by the following method:

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point.

The device for cloud data center task backup based on reliability trend analysis provided by the embodiments of the invention can be deployed on an existing server, or on a separately provided server dedicated to cloud data center task backup based on reliability trend analysis. To this end, the invention provides a server comprising the cloud data center task backup device based on reliability trend analysis provided by the embodiments of the invention. A person of ordinary skill in the art will understand that the process of cloud data center task backup based on reliability trend analysis in the above method embodiments can be carried out by hardware under the control of program instructions, and that the program, when executed, performs the corresponding steps of the above method.

The preferred embodiments of the invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art and under the concept of the invention shall fall within the scope of protection defined by the claims.

Claims

1. A method for cloud data center task backup, characterized by comprising the following steps:

Step 1: obtain the historical failure occurrence information of each server in the cloud data center, including the times ft₁, ft₂, ..., ft_k at which the most recent k server failures occurred and the numbers fn₁, fn₂, ..., fn_k of the servers on which these k failures occurred; k is a positive integer;

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

Step 3: calculate the waiting time dt before the next task backup:

Step 4: calculate the recent maximum number of failures of a single server dgzs, the recent minimum number of failures of a single server xgzs, the serial number yxh of the backup source server and the serial number mdxh of the backup target server;

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

Step 5: perform task backup: if at least one of yxh and mdxh is 0, do nothing; if neither yxh nor mdxh is 0, back up the tasks currently running on server yxh to server mdxh; then wait for the time dt and return to Step 1;

In Step 2, the abnormal points are determined by the following steps:

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point;

xs is a coefficient given in advance, and xs is a positive integer.

2. A cloud data center task backup device, characterized in that it comprises a fault monitoring unit (3), a control decision module (4) and a task backup module (5); the output of the fault monitoring unit (3) is connected to the input of the control decision module (4), and the output of the control decision module (4) is connected to the input of the task backup module (5);

The fault monitoring unit (3) is used to obtain the historical failure occurrence information of each server in the cloud data center;

The control decision module (4) is used to analyze the risk of future failure of each server in the data center, calculate the waiting time before the next task backup, and calculate control decision reference values;

The task backup module (5) is used to perform task backup between servers;

The control decision module (4) comprises a risk analysis unit (401), a control timing decision unit (402) and a control quantity calculation unit (403);

The first output of the fault monitoring unit (3) is connected to the input of the risk analysis unit (401); the second output of the fault monitoring unit (3) is connected to the first input of the control timing decision unit (402); the third output of the fault monitoring unit (3) is connected to the input of the control quantity calculation unit (403); the output of the risk analysis unit (401) is connected to the second input of the control timing decision unit (402); the output of the control timing decision unit (402) is connected to the first input of the task backup module (5), and the output of the control quantity calculation unit (403) is connected to the second input of the task backup module (5);

The fault monitoring unit (3) obtains the historical failure occurrence information of each server in the cloud data center, including the times ft₁, ft₂, ..., ft_k at which the most recent k server failures occurred and the numbers fn₁, fn₂, ..., fn_k of the servers on which these k failures occurred; k is a positive integer;

The risk analysis unit (401) calculates the time between failures if_i and the per-unit-time multistep growth rate zzl of the time between failures:

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

The control timing decision unit (402) calculates the waiting time dt before the next task backup:

The control quantity calculation unit (403) calculates the recent maximum number of failures of a single server dgzs, the recent minimum number of failures of a single server xgzs, the serial number yxh of the backup source server and the serial number mdxh of the backup target server;

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

The task backup module (5) performs task backup: if at least one of yxh and mdxh is 0, no operation is performed; if neither yxh nor mdxh is 0, the tasks currently running on server yxh are backed up to server mdxh;

The risk analysis unit (401) determines abnormal points by the following method:

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point;

xs is a coefficient given in advance, and xs is a positive integer.

3. A cloud data center task backup system, comprising a cloud data center server (1), characterized in that a cloud data center backup device (2) is provided in the cloud data center server (1); the cloud data center backup device (2) comprises a fault monitoring unit (3), a control decision module (4) and a task backup module (5); the output of the fault monitoring unit (3) is connected to the input of the control decision module (4), and the output of the control decision module (4) is connected to the input of the task backup module (5);

if_i = ft_{i+1} - ft_i, 0 < i ≤ k-1;

zzl = mean{ Z_{i,j} | i < j < k };

dgzs = max{ gzs_j | 0 < j ≤ m };

xgzs = min{ gzs_j | 0 < j ≤ m };

When (… and …) or (… and …), the time-between-failures value if_i is an abnormal point; xs is a coefficient given in advance, and xs is a positive integer.