CN108573303A

CN108573303A - Self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning

Info

Publication number: CN108573303A
Application number: CN201810375758.1A
Authority: CN
Inventors: 冯强; 吴其隆; 任羿; 孙博; 杨德真
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2018-04-25
Filing date: 2018-04-25
Publication date: 2018-09-25

Abstract

The invention discloses a self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning, which solves the problem of generating a recovery strategy for cluster maintenance of a complex network. The steps are as follows: 1. Establish the cluster maintenance state matrix of the complex network according to the local failure information. 2. Generate the complex network adjacency matrix from the initial cluster maintenance state. 3. Predict the prior maintenance state transition probability and the maintenance policy value of the cluster with a neural network model. 4. Traverse the maintenance policy solution space of the cluster with a Monte Carlo tree search algorithm and select the globally best maintenance action at the current time. 5. Update the complex network adjacency matrix according to the change in the cluster maintenance state. 6. Calculate and check the degree of recovery of the complex network from the cluster maintenance state and the adjacency matrix. 7. Train the neural network parameters on the reinforcement learning experience parameters. 8. Generate a complete maintenance recovery scheme from the series of best maintenance actions produced by the self-improvement process of the recovery strategy.

Description

Self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning

Technical field

The present invention provides a self-improvement recovery strategy (Self-improvement Recovery Strategy, SIRS) method for complex networks in a local failure state, based on improved reinforcement learning. In particular, it relates to a recovery strategy method that takes into account the characteristics of the units composing each network node and uses an improved reinforcement learning algorithm to solve, in a self-improving manner, the multi-node cluster maintenance problem of a complex network. The invention belongs to the field of maintainability engineering.

Background technology

A self-improvement recovery strategy (SIRS) addresses the situation in which, after a complex network suffers local failure, multiple nodes concentrated at the damaged location become unavailable, and the network must be rapidly restored to an overall available state by means of cluster maintenance. However, current research on cluster maintenance, both in China and abroad, generally does not consider the timing sequence of maintenance. As maintainability receives increasing attention, research on cluster maintenance policies for complex networks faces higher requirements: an efficient cluster maintenance policy method is needed that fully accounts for the timing sequence of cluster maintenance, the uncertainty of its benefit, and the NP-hard nature of the overall problem.

Based on a neural network prediction model of the maintenance state transition probability and the Monte Carlo tree search (Monte Carlo Tree Search, MCTS) algorithm, the present invention provides a novel self-improvement recovery strategy (SIRS) method based on improved reinforcement learning, which solves the problem of generating a cluster maintenance policy for a complex network in a local failure state.

Summary of the invention

The purpose of the present invention is to provide a novel self-improvement recovery strategy (SIRS) method for complex networks in a local failure state. It aims to address the shortcomings of conventional cluster maintenance policy methods, which do not fully consider the timing sequence of cluster maintenance, the uncertainty of its benefit, or the NP-hard nature of the overall problem.

The present invention proposes an SIRS method based on a neural network prediction model and the Monte Carlo tree search (MCTS) algorithm, which mainly comprises the following steps:

Step 1: Establish the cluster maintenance state matrix of the complex network based on the local failure.

The local failure recovery strategy of a complex network is treated as a multi-node cluster maintenance problem. First, the node set K={k₁,k₂,…,k_i,…,k_j,…,k_n} of the complex network is constructed (where n is the number of nodes). The composition of each node is then decomposed to establish its unit set U={u₁,u₂,…,u_m}. On this basis, an m × n "node-unit" matrix is established and, according to the local failure information, its elements are assigned values: "0" denotes a failed unit to be repaired in the local failure region and "1" denotes a normal unit. This forms the maintenance state matrix S.
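The construction of S can be illustrated with a minimal Python sketch. It assumes that rows index units and columns index nodes (matching the m × n "node-unit" matrix with m units and n nodes); the function name `build_state_matrix` and the list of failed units are hypothetical and chosen only for illustration.

```python
def build_state_matrix(num_units, num_nodes, failed):
    """Return an m x n 0-1 matrix: rows are units, columns are nodes.

    failed: iterable of 1-based (unit index, node index) pairs taken from
    the local failure information (hypothetical input format).
    """
    # Every unit starts as normal ("1").
    S = [[1] * num_nodes for _ in range(num_units)]
    # Failed units to be repaired are marked "0".
    for unit, node in failed:
        S[unit - 1][node - 1] = 0
    return S


# 10 nodes with 6 units each; three illustrative failed units.
S0 = build_state_matrix(6, 10, [(1, 5), (2, 5), (3, 6)])
```

Here `S0[0][4]` holds the state of unit u₁ in node k₅, which is marked as failed.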

Step 2: Generate the complex network adjacency matrix from the initial cluster maintenance state.

A complex network is abstracted as a graph G=(K, E) composed of the node set K={k₁,k₂,…,k_i,…,k_j,…,k_n} and a set E of connections (edges). An n × n adjacency matrix A describes the connections (edges) between the n nodes of the complex network; self-loops are not considered. When all units in the complex network are normal, the adjacency matrix is denoted A*.

The unit set U_i={u₁,u₂,…,u_m} of node k_i is divided into three classes of unit sets, referred to below as class A, class B and class C. For each class, the state in which all units of that class are failed units in the damaged region is identified. Based on this classification, and taking node k_i as an example, the mapping f_S→A from the elements of the maintenance state matrix S to the elements of the adjacency matrix A is defined as follows.

When all class A units of node k_i are failed, all edges associated with the node are disconnected. When all class B units of node k_i are failed, the edges directed from the node to the other nodes are disconnected. When all class C units of node k_i are failed, the edges directed from the other nodes to the node are disconnected. Given the initial maintenance state of the complex network, the mapping f_S→A generates the adjacency matrix A of the initial maintenance state.
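The three disconnection rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patent's exact formula: the assignment of unit rows to classes A, B and C (the `classes` argument) is hypothetical, the same partition is assumed for every node, and an empty class is assumed never to trigger a disconnection.

```python
def adjacency_from_state(S, A_star, classes):
    """Sketch of the mapping f_S->A.

    S: m x n 0-1 matrix (rows are units, columns are nodes).
    A_star: n x n adjacency matrix when all units are normal.
    classes: dict with keys "A", "B", "C" mapping to lists of 0-based
             unit row indices (hypothetical partition of the unit set).
    """
    n = len(A_star)

    def all_failed(node, cls):
        rows = classes[cls]
        return bool(rows) and all(S[u][node] == 0 for u in rows)

    A = [row[:] for row in A_star]
    for i in range(n):
        # Class A cuts every edge of the node; class B only outgoing edges;
        # class C only incoming edges.
        cut_out = all_failed(i, "A") or all_failed(i, "B")
        cut_in = all_failed(i, "A") or all_failed(i, "C")
        for j in range(n):
            if cut_out:
                A[i][j] = 0
            if cut_in:
                A[j][i] = 0
    return A


# Three fully connected nodes, three units per node, one unit per class.
A_star = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
classes = {"A": [0], "B": [1], "C": [2]}
S = [[1, 1, 1],
     [0, 1, 1],   # the class B unit of the first node has failed
     [1, 1, 1]]
A = adjacency_from_state(S, A_star, classes)
```

In this example only the outgoing edges of the first node are removed; edges pointing to it remain.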

Step 3: Predict the prior maintenance state transition probability of the cluster with a neural network model.

A squeeze-and-excitation residual network (Squeeze-and-Excitation Residual Networks, SE-ResNet) is designed to predict the prior maintenance state transition probability matrix p and the prior cluster maintenance policy value v of the "node-unit" cluster.

Neural network input feature map X: it includes the current "node-unit" cluster maintenance state S, the most recent historical cluster maintenance states from the maintenance policy iteration (for example, 7 steps of history), and the adjacency matrices A(S) and A* of the complex network nodes.

Neural network output: a prior cluster maintenance state transition probability p of the "node-unit" cluster and a prior cluster maintenance policy value v.

Neural network structure: it includes convolution modules, residual modules, squeeze-and-excitation (Squeeze-and-Excitation, SE) modules, ReLU function modules, and so on. The neural network is expressed as f_θ(X)=(p, v).

Step 4: Traverse the maintenance policy solution space of the cluster with the Monte Carlo tree search algorithm.

With the goals of raising the degree of performance recovery of the complex network "node-unit" cluster and shortening the recovery time, a self-improving iterative system for the maintenance policy is constructed. A reinforcement learning framework based on an improved weighted MCTS algorithm is designed to solve for the optimal maintenance policy.

The MCTS algorithm uses the maintenance prediction p of the SE-ResNet from Step 3 as a search weight. This avoids the combinatorial explosion that arises when the cluster maintenance policy solution space is searched globally and directly; a local search of the solution space guided by the prior probability p can likewise obtain the globally optimal maintenance policy. The tree search yields an improved maintenance state transition probability matrix π, the globally best maintenance action a is executed once, and the current "node-unit" cluster maintenance state S is transferred to the cluster maintenance state of the next time step. The MCTS algorithm is expressed as MCTS_θ(X, p, v)=(π, a).

Step 5: Update the complex network adjacency matrix according to the change in the cluster maintenance state.

After the self-improvement process of the recovery strategy executes the best maintenance action of a given time step, the cluster maintenance state is transferred to the next time step. Based on the change in the cluster maintenance state, the complex network adjacency matrix is updated according to the mapping f_S→A of Step 2.

Step 6: Calculate and check the degree of recovery of the complex network.

After one self-improvement operation of the recovery strategy (comprising Step 3, Step 4 and Step 5) has been executed, the degree of recovery of the complex network is calculated from the transferred "node-unit" cluster maintenance state S and its adjacency matrix A(S).

If the recovery requirement is not met, the method returns to Step 3 and continues to execute self-improvement operations.

If the cluster maintenance state S_T at time T meets the requirement, then one complete self-improvement process of the recovery strategy has been completed through T self-improvement operations, and Step 7 and Step 8 are then executed simultaneously.

Step 7: Train the neural network parameters on the reinforcement learning experience parameters.

A reward value z is calculated by a reward function to evaluate the self-improvement process of the recovery strategy. Using the reward value and the latest T groups of reinforcement learning experience parameters generated by the self-improvement process, the SE-ResNet trains the network parameters θ by gradient descent. The training objective is to minimize the error between the predicted evaluation value v and the reward value z at the end of self-improvement, and to maximize the similarity between the prior state transition probability p and the improved state transition probability π. This yields a new SE-ResNet for the next self-improvement process of the recovery strategy. Iteratively training the neural network provides a better search direction for the MCTS.

Step 8: Generate a complete maintenance recovery scheme from the self-improvement process of the recovery strategy.

The series of best maintenance actions {a¹,a²,...,a^T} stored during the self-improvement process of the recovery strategy generates a complete maintenance recovery scheme, which can be expressed as

Recovery=f_Rec(a¹,a²,...,a^T)=1×a¹+2×a²+…+T×a^T

The degree of recovery of the complex network is calculated and output from the final cluster maintenance state S_T and its adjacency matrix A(S_T).

Description of the drawings

Fig. 1 is a block diagram of the overall architecture of the method of the present invention.

Fig. 2 shows the SE-ResNet prediction model of the prior maintenance state transition probability in the present invention.

Fig. 3 shows the structure of the SE-Residual unit used in the prior maintenance state transition probability prediction model of the present invention.

Fig. 4 is a flow chart of the Monte Carlo tree search algorithm that traverses the maintenance policy solution space in the present invention.

Detailed description of the embodiments

To make the technical solution, features and advantages of the present invention easier to understand, a detailed description is given below with reference to the accompanying drawings.

The present invention provides a novel self-improvement recovery strategy (SIRS) method that can be applied to the cluster maintenance policy problem of complex networks in a local failure state. It overcomes the shortcomings of conventional methods, which do not fully consider the timing sequence of cluster maintenance, the uncertainty of its benefit, or the NP-hard nature of the overall problem.

The overall architecture of the present invention is shown in Fig. 1. The specific implementation steps are as follows:

When the self-improvement process of the recovery strategy has progressed to time t, the maintenance state of the "node-unit" cluster is expressed as an m × n 0-1 matrix S_t.

Each element of the matrix gives the maintenance state at time t of unit u_m in node k_n: a value of 1 indicates that the unit is normal, and a value of 0 indicates that the unit is a failed unit to be repaired in the local failure region.

Example: Suppose the object of analysis is a complex network containing 10 nodes, each of which contains 6 units. The maintenance state of the "node-unit" cluster at the initial time is then a 0-1 matrix with one entry for each unit of each node.

In this matrix, the entry for unit u₁ of node k₁ is 1, indicating that the unit is normal at the initial time, and the entry for unit u₁ of node k₅ is 0, indicating that the unit is a failed unit to be repaired in the local failure region.

The mapping f_S→A from the maintenance state matrix to the adjacency matrix works as follows. When all class A units of node k_i are failed, all edges associated with the node are disconnected. When all class B units of node k_i are failed, the edges directed from the node to the other nodes are disconnected. When all class C units of node k_i are failed, the edges directed from the other nodes to the node are disconnected.

Given the initial maintenance state of the complex network, the mapping f_S→A generates the adjacency matrix A of the initial maintenance state, an n × n matrix with elements x^ij.

The element x^ij (i, j=1,2,...,n; i≠j) represents the connection (edge) between node k_i and node k_j: x^ij=0 indicates that there is no edge between the two nodes (it has been destroyed or never existed), and x^ij=1 indicates that there is an edge directed from node k_i to node k_j. When all units in the complex network are normal, the adjacency matrix A* is generated in the same way.

Example: If node k_i of the complex network established in Step 1 is connected only to the nodes in the set {k_i-2,k_i-1,k_i+1,k_i+2}, the adjacency matrix A* follows directly from these connections.

Suppose the unit set U_i={u₁,u₂,…,u₆} of node k_i is divided into three classes of unit sets. The mapping f_S→A then gives the adjacency matrix of the complex network for the initial-time maintenance state of Step 1.

(1) Neural network input:

The neural network input feature map X includes the "node-unit" cluster maintenance state S_t at time t, the most recent historical cluster maintenance states from the maintenance policy iteration, and the adjacency matrices A(S_t) and A* of the complex network nodes. Taking 7 steps of historical cluster maintenance states as an example, the input feature map X at time t is expressed as

X_t=[S_t,S_t-1,...,S_t-7,A(S_t),A*]
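One possible way to assemble X_t is sketched below in Python. The text does not say how missing history is handled at early time steps, nor how the m × n state planes and the n × n adjacency planes are combined into one tensor, so this sketch makes two assumptions: missing history is padded with all-zero planes, and X_t is kept as a plain list of planes. The names `make_history` and `build_features` are hypothetical.

```python
from collections import deque

HISTORY_STEPS = 7  # number of past states kept in addition to S_t


def make_history(num_units, num_nodes):
    """Fixed-length buffer for S_t, S_{t-1}, ..., S_{t-7} (newest first)."""
    blank = [[0] * num_nodes for _ in range(num_units)]
    return deque([blank] * (HISTORY_STEPS + 1), maxlen=HISTORY_STEPS + 1)


def build_features(history, A_current, A_star):
    """Return X_t = [S_t, S_{t-1}, ..., S_{t-7}, A(S_t), A*] as a list of planes."""
    return list(history) + [A_current, A_star]


history = make_history(2, 3)
history.appendleft([[1, 0, 1], [1, 1, 1]])  # S_0: one failed unit
history.appendleft([[1, 1, 1], [1, 1, 1]])  # S_1: the unit has been repaired
A_star = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = build_features(history, A_star, A_star)
```

Because the buffer has a fixed maximum length, pushing a new state on the left automatically drops the oldest one on the right.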

(2) Neural network output:

The output includes a prior cluster maintenance state transition probability p of the "node-unit" cluster and a prior cluster maintenance policy value v.

A) The prior cluster maintenance state transition probability of the "node-unit" cluster at time t is denoted p_t and is an m × n matrix. Each element of the matrix gives the probability that a maintenance action is executed at time t on unit m of node n.

B) The prior cluster maintenance policy value v_t is a normalized parameter: the predicted evaluation of how far the cluster maintenance state at time t meets the required degree of recovery.

(3) Neural network structure:

The selected SE-ResNet neural network structure includes convolution modules, residual modules, squeeze-and-excitation (Squeeze-and-Excitation, SE) modules, ReLU function modules, and so on.

Example: The deep neural network designed here is shown in Fig. 2. The input feature map X_t is processed by a deep SE-Residual tower, which consists of a single convolution module followed by an intermediate-layer module made of multiple stacked SE-Residual units:

A) Single convolution module:

1. A convolutional layer of 256 filters of size 3 × 3, with stride 1;

2. A ReLU function layer;

B) Intermediate-layer module: the intermediate layers of the deep neural network are built by stacking SE-Residual units (for example, 19 stacked SE-Residual layers). The structure of the SE-Residual unit is shown in Fig. 3 and comprises the following:

1. Residual module: contains a convolutional layer made of c filters, which outputs a feature map of size w × h × c, where c is the depth of the feature map (for example, 256 filters);

2. Squeeze module: consists of a global average pooling layer;

3. Excitation module: a bottleneck structure of two fully connected layers joined by a ReLU function; the dimensionality reduction ratio r of the first fully connected layer is usually set to 16;

4. Normalization module: a Sigmoid function produces normalized weights between 0 and 1;

5. Reweight module: applies the normalized weights to each channel of the feature map;

Note: In Fig. 3, when the SE module is embedded in the residual module, the SE module is connected to the channels of the feature map output by the convolutional layer, so that the feature map output by the convolutional layer on the residual branch is recalibrated before the addition operation of the residual module.
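The squeeze, excitation, normalization and reweight modules of one SE-Residual unit can be illustrated with a small pure-Python sketch. It is only an illustration of the data flow: the fully connected layers are assumed to have no bias terms, the weight matrices `W1` and `W2` are hypothetical inputs, and the convolution and the residual addition are left out.

```python
import math


def se_block(feature_map, W1, W2):
    """Channel recalibration of one SE unit.

    feature_map: list of c channels, each an h x w list of lists.
    W1: (c // r) x c weights of the first fully connected layer.
    W2: c x (c // r) weights of the second fully connected layer.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: bottleneck of two fully connected layers joined by ReLU.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in W1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    # Normalization: Sigmoid maps each logit to a weight between 0 and 1.
    scale = [1.0 / (1.0 + math.exp(-s)) for s in logits]
    # Reweight: multiply every channel by its normalized weight.
    out = [[[v * s for v in row] for row in ch]
           for ch, s in zip(feature_map, scale)]
    return out, scale


# Two channels of size 2 x 2 with reduction ratio r = 2 (toy sizes).
fmap = [[[1.0, 1.0], [1.0, 1.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
out, scale = se_block(fmap, W1=[[1.0, -1.0]], W2=[[1.0], [1.0]])
```

With these toy weights the hidden activation is zero, so both channel weights equal 0.5 and each channel is halved.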

The output of the deep SE-Residual tower is fed to two parts, a policy module and a value module:

C) Policy module:

1. A convolutional layer of 2 filters of size 1 × 1, with stride 1;

2. A ReLU function layer;

3. A fully connected output layer: outputs a feature map of size m × n, corresponding to the logits of p_t for the "node-unit" cluster;

D) Value module:

1. A convolutional layer of 1 filter of size 1 × 1, with stride 1;

2. A linear fully connected layer of size 256;

3. A ReLU function layer;

4. A linear fully connected layer;

5. A tanh output layer: outputs a scalar value in the interval [-1, 1].

As in Step 3, the deep SE-ResNet prediction model is expressed in the form f_θ(X)=(p, v). Here θ_i denotes the network parameters of the deep SE-ResNet prediction model in the i-th self-improvement process, and the initial network parameters θ₀ are obtained by random initialization.

With the goals of raising the degree of performance recovery of the complex network "node-unit" cluster and shortening the recovery time, a self-improving iterative system for the maintenance policy is constructed. A reinforcement learning framework based on an improved weighted MCTS algorithm is designed to solve for the optimal maintenance recovery policy.

The MCTS algorithm uses the maintenance prediction p_t of the SE-ResNet from Step 3 as a search weight. This avoids the combinatorial explosion that arises when the cluster maintenance policy solution space is searched globally and directly; a local search of the solution space guided by the prior probability p_t can likewise obtain the globally optimal maintenance policy. The tree search yields an improved maintenance state transition probability matrix π_t, the globally best maintenance action a^t is executed once, and the current "node-unit" cluster maintenance state S is transferred to the cluster maintenance state of the next time step. The MCTS algorithm is expressed in the form MCTS_θ(X, p, v)=(π, a).

The cluster maintenance state S serves as a tree node of the MCTS search tree. The branches (S, a) of a tree node correspond to all maintenance actions a ∈ Action(S) available at the next step, and each branch (S, a) stores a set of statistics:

Data(S, a)={N(S, a), W(S, a), Q(S, a), P(S, a)}

where N(S, a) is the number of times the branch has been visited; W(S, a) is the total action value; Q(S, a) is the mean action value; and P(S, a) is the prior probability of selecting branch (S, a).

Given the maintenance state input feature map X_t, and taking the prior parameters (p_t,v_t) obtained from the SE-ResNet as input, the solution space search based on the MCTS algorithm is executed as shown in Fig. 4. The search process mainly comprises 4 steps:

(1) Selection

First, the maintenance state S_t at time t is selected as the root node of the search tree, denoted S⁰. The MCTS search starts from the root node and ends when, at step L, it reaches a leaf node S^L at the end of the search tree. At each step l (1≤l<L), a maintenance action a_l is selected according to the statistics stored on each branch of the current node S^l.

The selection uses an intermediate variable U(S^l, a), computed with an improved PUCT algorithm, in which c_puct is a constant that determines the degree of exploration of the MCTS search. This search control strategy initially tends to select actions a with a high prior probability and a low visit count, but as the search proceeds it increasingly tends to select actions with a high action value.
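The selection behaviour described above can be sketched in Python. The selection formulas themselves are not reproduced in this text, so the sketch assumes the commonly used PUCT form, a_l = argmax_a [Q(S^l, a) + U(S^l, a)] with U(S^l, a) = c_puct · P(S^l, a) · √(∑_b N(S^l, b)) / (1 + N(S^l, a)); the "improved" variant of the patent may differ. The branch keys and the value of `c_puct` are hypothetical.

```python
import math


def select_action(branches, c_puct=1.0):
    """Pick the branch maximizing Q + U (assumed standard PUCT form).

    branches: dict mapping an action to its statistics
              {"N": visit count, "W": total value, "Q": mean value, "P": prior}.
    """
    total_visits = sum(stats["N"] for stats in branches.values())

    def score(action):
        stats = branches[action]
        # U is large for a high prior and a low visit count.
        u = c_puct * stats["P"] * math.sqrt(total_visits) / (1 + stats["N"])
        return stats["Q"] + u

    return max(branches, key=score)


# Actions are written as (unit, node) pairs for illustration only.
branches = {
    (2, 2): {"N": 0, "W": 0.0, "Q": 0.0, "P": 0.7},   # unvisited, high prior
    (1, 5): {"N": 10, "W": 6.0, "Q": 0.6, "P": 0.3},  # visited, good value
}
first_choice = select_action(branches)
```

Early in the search the unvisited, high-prior branch wins; once a branch has been visited many times without a good mean value, the branch with the higher Q takes over.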

(2) Expansion and evaluation

The leaf node S^L is added to a queue, A(S^L) is generated by the mapping f_S→A, and the input feature map X^L of the cluster maintenance state corresponding to the leaf node is obtained. X^L is fed to the neural network to obtain the statistics to be stored on the branches (S^L, a) expanded from the leaf node. This operation can be expressed as

f_θ(X^L)=(p_a,v)

The search thread remains locked until the above operation completes. When the leaf node S^L is then expanded, the statistics stored on each of its branches (S^L, a) are initialized as

Data(S^L, a)={N(S^L, a)=0, W(S^L, a)=0, Q(S^L, a)=0, P(S^L, a)=p_a}

(3) Backup

The statistics are propagated back from the leaf node to the root node along all branches visited by the search thread, and the values stored on the search tree branches are updated. During backup, the visit count stored on branch (S^l,a_l) is incremented by one:

N(S^l,a_l)=N(S^l,a_l)+1

At the same time, the total action value and the mean action value of the branch (S^l,a_l) are also updated once; the total action value is updated as

W(S^l,a_l)=W(S^l,a_l)+v
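A minimal Python sketch of the backup step follows. The updates of N and W are the ones stated above. The update of the mean action value is not written out in this text, so the sketch assumes Q = W / N, which is consistent with Q being described as the average action value. The function name `backup` and the dictionary layout are hypothetical.

```python
def backup(path, v):
    """Propagate the leaf evaluation v back along the visited branches.

    path: list of branch statistics dicts from root to leaf, each with
          keys "N", "W", "Q" (and "P", which is left unchanged).
    """
    for stats in reversed(path):          # leaf first, root last
        stats["N"] += 1                   # N(S, a) = N(S, a) + 1
        stats["W"] += v                   # W(S, a) = W(S, a) + v
        stats["Q"] = stats["W"] / stats["N"]  # assumed: Q = W / N


root_branch = {"N": 0, "W": 0.0, "Q": 0.0, "P": 0.6}
leaf_branch = {"N": 0, "W": 0.0, "Q": 0.0, "P": 0.4}
backup([root_branch, leaf_branch], 0.5)
backup([root_branch], 0.25)
```

After these two simulations the root branch has been visited twice and the leaf branch once.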

(4) Execution

The three operations above are iterated. After 1000 tree searches have been completed, the best maintenance action a^t at time t is selected according to the improved system maintenance state transition probability matrix π_t, and the cluster maintenance state S_t is transferred to S_t+1. An element π of π_t can be expressed as

π(a|X_t)=N(X_t,a)^(1/τ)/∑_b N(X_t,b)^(1/τ)

where τ is a temperature parameter that controls the degree of exploration.
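The formula for π can be evaluated directly from the visit counts at the root. The short Python sketch below does this; the action labels and the visit counts are made-up illustration values, and the function name `improved_policy` is hypothetical.

```python
def improved_policy(visit_counts, tau=1.0):
    """Compute pi(a) = N(a)^(1/tau) / sum_b N(b)^(1/tau).

    visit_counts: dict mapping each root action to its visit count N.
    tau: temperature; smaller values concentrate pi on the most visited action.
    """
    powered = {a: n ** (1.0 / tau) for a, n in visit_counts.items()}
    total = sum(powered.values())
    return {a: p / total for a, p in powered.items()}


# Visit counts after 1000 tree searches (illustrative numbers).
counts = {"u2@k2": 600, "u1@k5": 300, "u3@k6": 100}
pi_t = improved_policy(counts, tau=1.0)
pi_sharp = improved_policy(counts, tau=0.5)
best_action = max(pi_t, key=pi_t.get)
```

With τ=1 the probabilities are proportional to the visit counts; lowering τ shifts more probability onto the most visited action.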

The search tree is reused in subsequent self-improvement operations. Each time a best maintenance action a^t is executed, the child node reached becomes the root node of the new search tree; all branches below that node are retained, and the remaining branches of the previous root node are discarded.

By performing 1000 search traversals of the maintenance policy solution space with the MCTS algorithm, the globally best maintenance action a^t at time t and the improved maintenance state transition probability matrix π_t are finally obtained, in the form MCTS_θ(X, p, v)=(π, a) given in Step 4.

Example: After one group (X_t,p_t,v_t) has been obtained through Step 1, Step 2 and Step 3, the MCTS algorithm searches and traverses the maintenance policy solution space and obtains the improved m × n maintenance state transition probability matrix π_t.

The unit with the largest maintenance state transition probability in this matrix is selected for maintenance, which gives the globally best maintenance action a^t at time t.

In this example, a^t is a maintenance action executed at time t on unit u₂ of node k₂. Once it is completed, the cluster maintenance state at time t is transferred to time t+1.

After the self-improvement process of the recovery strategy executes the best maintenance action of time t-1, the cluster maintenance state is transferred to time t. Based on the change in the cluster maintenance state, the complex network adjacency matrix is updated according to the mapping f_S→A of Step 2, giving the adjacency matrix A(S_t).

Each element of this matrix represents the connection (edge) between node k_i and node k_j at time t: a value of 0 indicates that there is no edge between the two nodes (it has been destroyed or never existed), and a value of 1 indicates that there is an edge directed from node k_i to node k_j.

Step 6：Calculate and examine the recovery extent of complex network.

A reward value z is calculated by a reward function to evaluate the self-improvement process of the recovery strategy. Using the reward value and the latest reinforcement learning experience parameters generated by the self-improvement process, the SE-ResNet trains the network parameters θ by gradient descent. The training objective is to minimize the error between the predicted evaluation value v and the reward value z at the end of self-improvement, and to maximize the similarity between the prior state transition probability p and the improved state transition probability π. The loss function can be expressed as

Loss=(z-v)²-π^T log p+c||θ||²

After the network parameters have been trained, a new SE-ResNet is obtained for the next self-improvement process of the recovery strategy. Iteratively training the neural network provides a better search direction for the MCTS.
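The loss function can be evaluated for a single training sample with the Python sketch below. It only computes the value of the loss and does not perform gradient descent. The sketch assumes that p and π are flattened to lists of equal length, that every entry of p with a nonzero π entry is strictly positive, and that the network parameters θ are given as a flat list; the value of the regularization constant `c` is hypothetical.

```python
import math


def sirs_loss(z, v, pi, p, theta, c=1e-4):
    """Loss = (z - v)^2 - pi^T log p + c * ||theta||^2 for one sample."""
    value_loss = (z - v) ** 2
    # Cross-entropy between the improved policy pi and the prior p;
    # terms with pi_a == 0 contribute nothing and are skipped.
    policy_loss = -sum(pi_a * math.log(p_a)
                       for pi_a, p_a in zip(pi, p) if pi_a > 0)
    l2_penalty = c * sum(w * w for w in theta)
    return value_loss + policy_loss + l2_penalty


loss = sirs_loss(z=1.0, v=0.5, pi=[0.5, 0.5], p=[0.5, 0.5], theta=[1.0], c=0.1)
```

The first term pulls the predicted value v toward the reward z, the second pulls the prior p toward the search result π, and the third is an L2 penalty on the parameters.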

Recovery=f_Rec(a¹,a²,...,a^T)=1 × a¹+2×a²+…+T×a^T

Example: Suppose that, starting from the 10 × 6 "node-unit" cluster maintenance state S₀ at the initial time in Step 1, the self-improvement process of the recovery strategy executes 5 maintenance actions {a¹,a²,a³,a⁴,a⁵} in total, from which the maintenance recovery scheme is generated.

The scheme specifies that, following the maintenance order, maintenance actions are executed on the following units in turn: unit u₂ of node k₇, unit u₆ of node k₃, unit u₆ of node k₇, unit u₅ of node k₂, and unit u₆ of node k₁₀.
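The aggregation Recovery=1×a¹+2×a²+…+T×a^T can be illustrated in Python for this example. The sketch assumes that each action a^t is a 0-1 matrix of the same shape as the maintenance state matrix, with a single 1 at the repaired unit, and that rows index units and columns index nodes; under that assumption, each nonzero entry of the result is the position of that repair in the maintenance order. The function name `recovery_scheme` is hypothetical.

```python
def recovery_scheme(actions, num_units, num_nodes):
    """Sum t * a^t over the stored actions.

    actions: list of 1-based (unit index, node index) pairs in execution
             order; each pair stands for a one-hot action matrix a^t.
    """
    recovery = [[0] * num_nodes for _ in range(num_units)]
    for t, (unit, node) in enumerate(actions, start=1):
        recovery[unit - 1][node - 1] += t
    return recovery


# The five actions of the example: (unit, node) in maintenance order.
actions = [(2, 7), (6, 3), (6, 7), (5, 2), (6, 10)]
scheme = recovery_scheme(actions, num_units=6, num_nodes=10)
```

Reading the nonzero entries of `scheme` in increasing order recovers the maintenance sequence listed above.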

Claims

1. A self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning, characterized in that it comprises the following steps:

The first step: establish the cluster maintenance state matrix of the complex network based on the local failure: the 0-1 maintenance state matrix of the complex network "node-unit" cluster is established according to the failure information.

The second step: generate the complex network adjacency matrix from the initial cluster maintenance state: taking into account the mapping between the maintenance state matrix and the adjacency matrix, the complex network adjacency matrix is generated from the initial cluster maintenance state.

The third step: predict the prior maintenance state transition probability of the cluster with a neural network model: an SE-ResNet neural network is designed to predict the prior maintenance state transition probability and the prior maintenance policy value of the "node-unit" cluster.

The fourth step: traverse the maintenance policy solution space of the cluster with the Monte Carlo tree search algorithm: the maintenance policy solution space is traversed to obtain the improved maintenance state transition probability matrix, from which the globally best maintenance action at the current time is selected.

The fifth step: update the complex network adjacency matrix according to the change in the cluster maintenance state.

The sixth step: calculate and check the degree of recovery of the complex network: the degree of recovery is calculated and checked from the cluster maintenance state and the adjacency matrix of the complex network.

The seventh step: train the neural network parameters on the reinforcement learning experience parameters: using the latest group of reinforcement learning experience parameters generated by the self-improvement process of the recovery strategy, the neural network parameters are trained by gradient descent.

The eighth step: generate a complete maintenance recovery scheme from the self-improvement process of the recovery strategy: the series of best maintenance actions stored during the self-improvement process generates a complete maintenance recovery scheme.

Through the above steps, a self-improvement recovery strategy method based on improved reinforcement learning is provided, which can solve the recovery strategy problem of cluster maintenance for a complex network in a local failure state.

2. The self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning according to claim 1, characterized in that: in "establish the cluster maintenance state matrix of the complex network based on the local failure" of the first step, the recovery problem of the complex network in a local failure state is treated as a multi-node cluster maintenance problem, and the 0-1 maintenance state matrix of the complex network "node-unit" cluster is established according to the failure information.

First, the node set K={k₁,k₂,…,k_i,…,k_j,…,k_n} of the complex network is constructed (where n is the number of nodes). The composition of each node is decomposed to establish its unit set U={u₁,u₂,…,u_i,…,u_j,…,u_m}. On this basis, an m × n "node-unit" matrix is established, and its elements are assigned "0" or "1" according to the failure information, forming the maintenance state matrix S.

3. The self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning according to claim 1, characterized in that: in "generate the complex network adjacency matrix from the initial cluster maintenance state" of the second step, a complex network is abstracted as a graph G=(K, E) composed of the node set K={k₁,k₂,…,k_i,…,k_j,…,k_n} and a set E of connections (edges). An n × n adjacency matrix A describes the connections (edges) between the n nodes of the complex network; self-loops are not considered. When all units in the complex network are normal, the adjacency matrix is denoted A*.

The unit set U_i={u₁,u₂,…,u_m} of node k_i is divided into three classes of unit sets, referred to as class A, class B and class C. For each class, the state in which all units of that class are failed units in the damaged region is identified. Based on this classification, and taking node k_i as an example, the mapping f_S→A from the elements of the maintenance state matrix S to the elements of the adjacency matrix A is defined as follows.

When all class A units of node k_i are failed, all edges associated with the node are disconnected. When all class B units of node k_i are failed, the edges directed from the node to the other nodes are disconnected. When all class C units of node k_i are failed, the edges directed from the other nodes to the node are disconnected. Given the initial maintenance state of the complex network, the mapping f_S→A generates the adjacency matrix A of the initial maintenance state.

4. The self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning according to claim 1, characterized in that: in "predict the prior maintenance state transition probability of the cluster with a neural network model" of the third step, a squeeze-and-excitation residual network (Squeeze-and-Excitation Residual Networks, SE-ResNet) is designed to predict the prior maintenance state transition probability matrix p and the prior cluster maintenance policy value v of the "node-unit" cluster.

Neural network input feature map X: it includes the current "node-unit" cluster maintenance state S, the most recent historical cluster maintenance states from the maintenance policy iteration (for example, 7 steps of history), and the complex network adjacency matrices A(S) and A*.

Neural network output: a prior cluster maintenance state transition probability p of the "node-unit" cluster and a prior cluster maintenance policy value v.

5. The self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning according to claim 1, characterized in that: in "traverse the maintenance policy solution space of the cluster with the Monte Carlo tree search algorithm" of the fourth step, with the goals of raising the degree of performance recovery of the complex network "node-unit" cluster and shortening the recovery time, a self-improving iterative system for the maintenance policy is constructed. A reinforcement learning framework based on an improved weighted MCTS algorithm is designed to solve for the optimal maintenance policy.

The MCTS algorithm uses the maintenance prediction p of the SE-ResNet from the third step as a search weight. This avoids the combinatorial explosion that arises when the cluster maintenance policy solution space is searched globally and directly; a local search of the solution space guided by the prior probability p can likewise obtain the globally optimal maintenance policy. The tree search yields an improved maintenance state transition probability matrix π, the globally best maintenance action a is executed once, and the current "node-unit" cluster maintenance state S is transferred to the cluster maintenance state of the next time step. The MCTS algorithm is expressed as MCTS_θ(X, p, v)=(π, a).

6. The self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning according to claim 1, characterized in that: in "update the complex network adjacency matrix according to the change in the cluster maintenance state" of the fifth step, after the self-improvement process of the recovery strategy executes the best maintenance action of a given time step, the cluster maintenance state is transferred to the next time step; based on the change in the cluster maintenance state, the complex network adjacency matrix is updated according to the mapping f_S→A of the second step.

7. The self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning according to claim 1, characterized in that: in "calculate and check the degree of recovery of the complex network" of the sixth step, after one self-improvement operation of the recovery strategy (comprising the third step, the fourth step and the fifth step) has been executed, the degree of recovery of the complex network is calculated from the transferred "node-unit" cluster maintenance state S and its adjacency matrix A(S).

If the recovery requirement is not met, the method returns to the third step and continues to execute self-improvement operations.

If the cluster maintenance state meets the recovery requirement after the T-th self-improvement operation, then one complete self-improvement process of the recovery strategy has been completed through T self-improvement operations, and the seventh step and the eighth step are then executed simultaneously.

8. The self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning according to claim 1, characterized in that: in "train the neural network parameters on the reinforcement learning experience parameters" of the seventh step, a reward value z is calculated by a reward function to evaluate the self-improvement process of the recovery strategy. Using the reward value and the latest T groups of reinforcement learning experience parameters generated by the self-improvement process, the SE-ResNet trains the network parameters θ by gradient descent, with the objective of minimizing the error between the predicted evaluation value v and the reward value z at the end of self-improvement and maximizing the similarity between the prior state transition probability p and the improved state transition probability π. This yields a new SE-ResNet for the next self-improvement process of the recovery strategy. Iteratively training the neural network provides a better search direction for the MCTS.

9. The self-improvement recovery strategy method for local failure of complex networks based on improved reinforcement learning according to claim 1, characterized in that: in "generate a complete maintenance recovery scheme from the self-improvement process of the recovery strategy" of the eighth step, the series of best maintenance actions {a¹,a²,...,a^T} stored during the self-improvement process of the recovery strategy generates a complete maintenance recovery scheme, which can be expressed as

Recovery=f_Rec(a¹,a²,...,a^T)=1×a¹+2×a²+…+T×a^T