CN110740054A - data center virtualization network fault diagnosis method based on reinforcement learning - Google Patents

data center virtualization network fault diagnosis method based on reinforcement learning

Info

Publication number
CN110740054A
CN110740054A (application CN201910644115.7A)
Authority
CN
China
Prior art keywords
fault
network
action
fault diagnosis
information
Prior art date
Legal status
Granted
Application number
CN201910644115.7A
Other languages
Chinese (zh)
Other versions
CN110740054B (en)
Inventor
东方
沈典
张欢欢
王士琦
罗军舟
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN201910644115.7A
Publication of CN110740054A
Application granted
Publication of CN110740054B
Legal status: Active

Classifications

    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network


Abstract

The invention discloses a data center virtualization network fault diagnosis method based on reinforcement learning, which comprises the following steps: 1) initializing a network fault diagnosis model; 2) training a Q table with a reinforcement learning algorithm according to a set fault diagnosis target, where the Q table records the accumulated discounted reward obtained by taking each action under each fault; 3) when a fault occurs, mapping the collected network state information to a network state in the Q table, querying the Q table by that state, and selecting the action with the maximum reward value as the fault diagnosis result; 4) further optimizing the network state space with an information gain method, which reduces the memory cost of the model and improves the diagnosis precision.

Description

data center virtualization network fault diagnosis method based on reinforcement learning
Technical Field
The invention belongs to the field of data center networks and reinforcement learning, and particularly relates to a diagnosis method that uses a reinforcement learning algorithm to diagnose data center virtualized network faults.
Background
Data centers connect servers into large-scale clusters through the network and provide mass storage capacity and powerful computing capacity to upper-layer applications through on-demand allocation and elastic scaling.
However, network failures cause problems such as extended task completion time, slow response and unavailable services for applications, which degrades user experience and reduces data center availability. Taking a cache system as an example: the cache system places recently or frequently accessed content in memory, and when a request arrives, reading the cache first speeds up the application response and relieves the access pressure on the servers; a network failure (such as the network becoming unreachable) causes user requests to be sent directly to the servers, the server load rises sharply, the response to some user requests slows down, and the service may even become unavailable. Because the data center network is heterogeneous and its communication is complex, a large number of network failures occur. A Microsoft research team found that an average data center suffers 5.2 device failures and 40.8 link failures per day, that locating and diagnosing each failure takes about 5 minutes, and that the longest failure diagnosis time reaches weeks; such failures cause substantial economic losses and drive up operating costs, so rapid and accurate network fault diagnosis is a key problem for data center operation.
Network fault diagnosis mainly comprises the core steps of information acquisition, model construction and fault identification. Common information acquisition methods fall into two classes: acquisition based on network devices and acquisition based on terminals. Network-device-based acquisition implements the collection function on switches, using techniques such as packet coloring, packet sampling and packet mirroring.
The existing network fault diagnosis research work mainly targets traditional networks. With the development of virtualization technology, virtualized networking has become the mainstream way to build data center networks in recent years. A virtual network is an abstraction of the traditional network: a user customizes a private network on a shared physical network by building virtual devices, virtual links and virtual machines, thereby enabling communication between virtual machines under a specific network topology. The virtualized network resides in the servers and mainly consists of software-defined network devices, virtual links, virtual machines and the like. It achieves connectivity between virtual machines by introducing a large number of virtual devices: the TAP device acts as a virtual network card and connects the virtual machine to external devices, and Open vSwitch (OVS) acts as a virtual bridge, providing packet forwarding, flow control and other functions. The virtualized network is characterized by frequently changing network state and by virtual devices that share server resources through configuration parameters. To make full use of resources, a cloud data center migrates virtual machines frequently; research shows that a large data center may migrate thousands of virtual machines per second, and frequent migration causes wide-ranging changes in the network state of the data center, which may trigger faults. Meanwhile, the virtual network relies on a large number of virtual devices to connect virtual machines; these devices are deployed in the servers, share server resources, and allocate the available resources through globally configured parameters such as flow table priorities, routing and forwarding rules and queue lengths, so reasonable parameter configuration and tuning are required for high-quality communication in the virtual network. The use of virtualized networks therefore poses new problems and challenges for traditional network fault diagnosis, mainly in two aspects:
the conventional data center network has the characteristics of stable network communication and relatively few temporary faults, and the data acquisition scale or data quality loss caused by reducing the information acquisition granularity has low influence on the network fault diagnosis precision, so the conventional research work mainly uses a sampling acquisition or information periodic uploading method to reduce the acquisition cost.
On the other hand, virtual devices share server resources through configuration parameters, so a single network fault degrades the performance of several devices at once, the boundaries between different fault signatures become blurred, and a large number of faults with similar characteristics appear; for example, CPU contention and memory-bandwidth contention both make the network card and the TUN device drop packets. In existing research on data center network fault diagnosis, modeling methods based on packet paths diagnose faults such as unreachability and loops effectively, but their information acquisition cost is high and their diagnosis range is limited. Classification models built with machine learning methods suffer from incomplete training data and low data quality, and they sacrifice accuracy during training to avoid overfitting, so the precision of fault diagnosis models trained in this way is difficult to improve and faults with similar characteristics are hard to diagnose accurately. Moreover, although such a fault diagnosis model can identify the fault type, it has difficulty attributing the fault to its cause on a specific virtual device, which makes the diagnosis result hard to act on.
Existing fault diagnosis methods are therefore severely limited when applied to fault diagnosis of virtualized networks: their information acquisition overhead and fault diagnosis precision cannot meet the low-overhead, high-precision diagnosis requirements of the data center virtualized network.
Disclosure of Invention
The invention aims to provide a data center virtualization network fault diagnosis method based on reinforcement learning that overcomes the problems, pointed out in the background art, of high information acquisition overhead and low fault diagnosis precision that arise when existing fault diagnosis methods are applied to virtualized networks.
In order to achieve the above purpose, the solution of the invention is:
A data center virtualization network fault diagnosis method based on reinforcement learning comprises the following steps:
step 1, initializing a network fault diagnosis model;
step 2, training a Q table by adopting a reinforcement learning algorithm according to a set fault diagnosis target, wherein the Q table records the accumulated discount reward value obtained by taking each action under each fault;
step 3, when a fault occurs, mapping the network state information to the network state in the Q table, inquiring the Q table according to the network state, and selecting an action as a fault diagnosis result according to the principle of maximum reward value;
step 4, the information gain method is used to further optimize the network state space.
The specific process of the step 1 is as follows:
step 11, representing the virtualized network environment with a 1028-dimensional vector composed of server operating environment information, virtual device parameter configuration information and virtual machine network information;
step 12, dividing each dimension of the data with an equal-width interval radius r = <r_1, r_2, …, r_d> to construct the network state space set, where r_d denotes the division interval of the d-th dimension and d is the number of virtualized network features;
step 13, setting the action set: the executable action set contains 21 instructions, and each instruction represents the solution to one fault;
step 14, selecting action a_t with an ε-greedy exploration strategy during action selection, balancing the training time and the fault diagnosis precision of the model;
step 15, updating the training memory in round (episode) mode: the immediate reward is R if the action resolves the fault and 0 otherwise; after each fault is resolved, the system updates the Q table with the following formula:
Q_n(S_n, a_n) ← Q_n(S_n, a_n) + α [ R + γ max_a Q(S_{n+1}, a) − Q_n(S_n, a_n) ]
where γ ∈ (0,1) is the discount rate, α ∈ (0,1) is the learning rate, R denotes the immediate reward obtained after selecting action a_n in state S_n, Q_n denotes the value recorded in the Q table for action a_n in state S_n, and Q_n(S_n, a_n) denotes the accumulated reward value after selecting action a_n in state S_n.
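The following Python sketch illustrates steps 11 to 15 under stated assumptions: the discretization helper, the ε-greedy selector and the episode update mirror the description above, while the constants ALPHA, GAMMA, EPSILON, R and the function names are illustrative choices, not values or interfaces fixed by the patent.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON, R = 0.1, 0.9, 0.1, 1.0    # illustrative hyper-parameters
N_ACTIONS = 21                                    # step 13: 21 repair instructions

def discretize(features, radii):
    """Step 12: map a d-dimensional feature vector to a discrete network state
    by dividing each dimension with an equal-width interval r_d."""
    return tuple(int(x // r) for x, r in zip(features, radii))

Q = defaultdict(lambda: [0.0] * N_ACTIONS)        # Q table: state -> value per action

def select_action(state):
    """Step 14: epsilon-greedy exploration."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    values = Q[state]
    return values.index(max(values))

def update_episode(trajectory):
    """Step 15: round (episode) update after the fault is resolved.
    `trajectory` is the list of (state, action) pairs of one episode;
    only the final action receives the immediate reward R."""
    for i, (s, a) in enumerate(trajectory):
        if i == len(trajectory) - 1:
            target = R                                   # fault resolved
        else:
            next_state = trajectory[i + 1][0]
            target = 0.0 + GAMMA * max(Q[next_state])    # intermediate reward is 0
        Q[s][a] += ALPHA * (target - Q[s][a])
```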
The step 2 of training the Q table by using the reinforcement learning algorithm specifically comprises the following steps:
step 21, injecting a fault into the virtualized network by using a fault injection mode;
step 22, identifying network abnormality by using a network fault perception model and sending a diagnosis request to a fault diagnosis server;
step 23, the fault diagnosis server maps the multi-dimensional information in the diagnosis request to a state space in the Q table after preprocessing;
step 24, selecting an action by using an ε-greedy exploration strategy and transmitting the action to a fault server;
step 25, the fault server executes the issued action, judges whether the fault is solved by using a fault perception model, and feeds back a perception result to the fault diagnosis server;
step 26, if the fault is solved, updating the Q table, and turning to step 27; if the fault is not resolved, repeating steps 22-26;
step 27, inject the next fault, and repeat steps 21 through 27 until the Q table converges.
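A hedged sketch of the training loop in steps 21 to 27, reusing the Q-table helpers from the previous sketch; inject_fault, collect_features, execute_action and fault_resolved stand in for the fault injection tool, the collection agent, the action executor and the fault perception model, and are assumptions rather than interfaces defined by the patent.

```python
def train(n_faults, radii, inject_fault, collect_features,
          execute_action, fault_resolved):
    """Steps 21-27: inject faults one at a time and diagnose each until it is
    resolved; a convergence check on the Q table is omitted for brevity."""
    for _ in range(n_faults):                    # step 27: move on to the next fault
        inject_fault()                           # step 21: fault injection tool
        trajectory = []
        while True:
            features = collect_features()        # step 22: anomaly triggers a request
            state = discretize(features, radii)  # step 23: map to a Q-table state
            action = select_action(state)        # step 24: epsilon-greedy choice
            execute_action(action)               # step 25: fault server runs the command
            trajectory.append((state, action))
            if fault_resolved():                 # step 25: perception model re-checks
                break
        update_episode(trajectory)               # step 26: episode update of the Q table
```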
In step 3, the classification model trained with a decision tree algorithm is used to identify network faults; it is deployed on all information acquisition servers, and the information acquired in real time is fed into the model to identify the network fault.
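As an illustration only, such a decision-tree perception model could be trained with scikit-learn; the patent does not name a library, and the feature matrix X and labels y below are placeholders for locally collected data.

```python
from sklearn.tree import DecisionTreeClassifier

def build_perception_model(X, y):
    """X: n_samples x d matrix of collected metrics; y: 0 = normal, 1 = faulty."""
    clf = DecisionTreeClassifier(max_depth=8)  # a shallow tree keeps the edge-side cost low
    clf.fit(X, y)
    return clf

def is_faulty(model, sample):
    """Runs on each collection server; only faulty samples trigger a diagnosis request."""
    return bool(model.predict([sample])[0])
```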
The specific steps of the step 4 are as follows:
step 41, setting a memory use constraint condition L;
step 42, for the optimal action in any network state, the Q value after the Q table converges is (n denotes the number of iterations):
Q = R(1 − (1 − α)^n) ≤ R
for a non-optimal action in any network state, the Q value after the Q table converges is:
Q = γR(1 − (1 − α)^n) ≤ γR
step 43, for a network state containing multiple faults, the Q values of the different actions in the Q table are:
Q_S = <R, R, …, R, γR, γR, …, γR>
and the number of entries equal to R in Q_S identifies a multi-fault state;
step 44, during Q table training, counting all the training samples X = (x_1, x_2, …, x_d) received by the Q table, where x_d denotes the value of the d-th attribute; assuming that X is divided into state S and that action a_t is selected, the data X divided into state S and the action a_t selected under it form the sample data T = (X, a_t), with a_t as the category label of the new sample;
step 45, setting the division boundary condition using the fault ratio and the number of samples in state S:
M ≤ Q
or
max_i (c_i / M) ≥ θ
where c_i denotes the number of samples X corresponding to each action a_t, Q is the minimum number of samples required for splitting, θ is the fault ratio threshold, and M, the number of samples divided into state S, is:
M = Σ_{i=1}^{m} c_i
where m is the number of faults obtained according to step 43;
step 46, for a state that does not satisfy the boundary condition of step 44, using the information gain to compute the gain of each feature in state S, selecting the feature with the maximum information gain to split the state into two new states, and retraining the model;
and step 47, repeating the steps 43 to 46, and constructing an optimal network state space under the condition that the memory constraint condition is met.
In step 45, the calculation formula of the information gain is as follows:
Ent(D) = − Σ_{k=1}^{γ} p_k log₂ p_k
Gain(D, d) = Ent(D) − Σ_{v=1}^{V} (|D^v| / |D|) Ent(D^v)
where Ent(D) denotes the information entropy of sample set D, p_k is the proportion of samples of the k-th class in D (k = 1, 2, …, γ, with γ the number of classes), and Gain(D, d) denotes the information gain obtained when attribute d is used to divide D. The value range of d is handled by the bisection method: for the value range f(X) of an attribute X, sort its values from small to large as (x_1, x_2, …, x_n); the candidate partition nodes constructed by the dichotomy are then
T = { (x_i + x_{i+1}) / 2 | 1 ≤ i ≤ n − 1 }
Dividing D by attribute d produces V branches, where the v-th branch contains all samples of D whose value of attribute d is D^v, denoted D^v.
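A small sketch of the information-gain computation defined above: the entropy of the class labels in a state, the bisection candidate split points, and the gain of splitting one continuous feature; the function names are illustrative, not part of the patent.

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def candidate_splits(values):
    """Bisection method: midpoints of consecutive distinct sorted values."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

def best_split(feature_values, labels):
    """Return (gain, threshold) of the best binary split of one feature."""
    base, n = entropy(labels), len(labels)
    best = (0.0, None)
    for t in candidate_splits(feature_values):
        left = [l for x, l in zip(feature_values, labels) if x <= t]
        right = [l for x, l in zip(feature_values, labels) if x > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[0]:
            best = (gain, t)
    return best
```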
By adopting the above scheme, the invention addresses the large information acquisition overhead and low fault diagnosis precision of existing virtual network fault diagnosis methods, which arise because the virtual network state changes frequently and the virtual devices share server resources through parameter configuration. The core logic includes: constructing the fault diagnosis system framework, acquiring system information, and sensing and diagnosing faults. First, a Q table is built with a reinforcement learning algorithm according to the set network fault diagnosis target; then the data center virtualized network is monitored for faults with the information acquisition module and the fault perception model, the fault information is mapped to a network state in the Q table, and an action is selected according to the maximum reward principle to diagnose the network fault.
Compared with the prior art, the invention has the following advantages:
(1) the invention deploys a fault perception model on the edge servers, filtering normal network state information during information acquisition and reducing the information acquisition overhead;
(2) the invention uses a reinforcement learning algorithm to establish the relationship between data center virtualized network faults and their solutions, and can effectively identify the large number of faults with similar characteristics in the virtualized network;
(3) the invention further optimizes the network state space in the Q table with an information gain method, which effectively reduces the memory overhead of the model and improves the fault diagnosis precision.
Drawings
FIG. 1 is a schematic diagram of a reinforcement learning-based virtualized network fault diagnosis model training process according to the present invention;
FIG. 2 is a schematic diagram of a reinforcement learning based virtualization network fault diagnosis framework of the present invention;
FIG. 3 is a schematic diagram of a fault diagnosis model module of the present invention;
FIG. 4 is a flow chart of the fault diagnosis model training and use of the present invention.
Detailed Description
The technical solution and the advantages of the present invention will be described in detail with reference to the accompanying drawings.
The invention provides a data center virtualization network fault diagnosis method based on reinforcement learning, which comprises four parts: construction of the reinforcement-learning-based fault diagnosis framework, information acquisition, fault perception, and fault diagnosis. The method specifically proceeds as follows:
In the training process of the reinforcement-learning-based virtualized network fault diagnosis model, shown in fig. 1, the invention applies a reinforcement learning algorithm to data center network fault diagnosis and achieves low-overhead, high-precision diagnosis of virtualized network faults. During training, an action is selected according to the policy in each state, the action is then verified against the environment, and its quality is measured by the reward value fed back. If the states in the reinforcement learning model are defined as nodes and the action sets as edges, the training process is as shown in fig. 1. Nodes a, b, c, d in the figure represent states, b is the terminal state, and A1, A2 represent action sets; the node relations, edge directions and weights in the graph are determined by executing actions and observing the feedback. When the values in the <state, action> table no longer change, or change only slightly, model training is complete. When the model is used, the action with the maximum expected benefit recorded in the table is selected greedily each time; for example, action A1 is selected in state a. The goal of the training process is to learn the table on the right side of fig. 1, where the vertical axis lists the state space S = {a, b, c, d}, the horizontal axis lists the action set A = {A1, A2}, and each entry represents the expected benefit of selecting action A in state S.
To address the low diagnosis precision for faults with similar characteristics, the ideal network state division is one in which each state corresponds to a single fault. In the four states {a, b, c, d}, {a, c, d} are fault states and b is the normal state; once the normal state is reached, the current round of diagnosis ends. A fault in the virtualized network may degrade performance in several places (i.e., have multiple possible causes): for example, network performance drops when CPU contention occurs, when the network card cache queue is too small, when the CPU processes packets in the network card queue slowly, or when the application inside the virtual machine responds slowly. To identify these numerous fault causes, multiple actions may actually need to be executed. Formally:
∀ S_E ∃ A_k : S_E —A_k→ S_H
where S_H denotes the normal state, S_E denotes a fault state, and A_k denotes an action execution sequence; that is, for any fault state there exists an action sequence A_k that brings the network from the fault state back to the normal state.
Based on the above analysis, the present invention designs a schematic diagram of a reinforcement learning-based virtualized network fault diagnosis framework as shown in fig. 2, and the main steps of the framework are as follows:
Step A1, the information acquisition module and the fault perception module are deployed on the servers of the data center, and the fault diagnosis module is deployed on the fault diagnosis server.
Step A2, a fault injection tool is used to inject various types of faults into all servers.
Step A3, the fault perception model identifies the fault and sends a fault diagnosis request to the fault diagnosis server.
Step A4, the fault diagnosis server maps the fault information to the network state space of the Q table, selects a suitable action from the action set using the ε-greedy exploration strategy, and sends the action to the fault server.
Step A5, the fault server receives the issued action and executes it.
Step A6, the network state information of the next period is collected, the fault perception model performs fault perception, and the perception result is sent to the fault diagnosis server.
Step A7, the fault diagnosis server updates the record in the Q table according to the feedback result.
Step A8, steps A1 through A7 are repeated until the Q table converges.
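A hedged sketch of the diagnosis-server side of steps A3-A4, reusing Q, discretize and select_action from the earlier sketch; the dictionary-based request format is an assumption for illustration only.

```python
def diagnose(request, radii, explore=False):
    """Map a diagnosis request to a Q-table state and pick an action:
    epsilon-greedy while training (step A4), greedy max-reward selection
    when the trained model is used online (step 3)."""
    state = discretize(request["features"], radii)
    if explore:
        action = select_action(state)         # epsilon-greedy during training
    else:
        values = Q[state]
        action = values.index(max(values))    # maximum-reward principle online
    return {"server": request["server"], "action": action}
```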
According to the analysis of fig. 1 and fig. 2, the precision of the network state space division in the Q table directly determines the precision of fault diagnosis. Fine-grained division improves diagnosis precision but increases the memory overhead of the model; coarse-grained division reduces the memory overhead, but some of the divided network states then contain several network faults, which lowers diagnosis precision. Analysis of virtualized network states shows that most of the network state space corresponds to normal states that do not need to be stored, and that if the network states containing several faults can be divided further in stages, the precision of the state space division can be improved effectively. The main steps for improving the precision of network state space division are therefore as follows:
Step B1, identifying multi-fault states. The values of the actions after the Q table converges are computed from the update formula
Q_n(S_n, a_n) ← Q_n(S_n, a_n) + α [ R + γ max_a Q(S_{n+1}, a) − Q_n(S_n, a_n) ]
For the optimal action in any network state, the Q value after the Q table converges is
Q = R(1 − (1 − α)^n) ≤ R
For a non-optimal action in any network state, the Q value after the Q table converges is
Q = γR(1 − (1 − α)^n) ≤ γR
Therefore, for a network state containing multiple faults, the Q values of the different actions in the Q table are
Q_S = <R, R, …, R, γR, γR, …, γR>
By counting the number of entries in Q_S greater than γR, a multi-fault network state can be identified.
Step B2, setting the splitting condition. For any state S_t ∈ S, when the Q table receives a sample X = (x_1, x_2, …, x_d), where x_d denotes the value of the d-th attribute, suppose X is divided into state S_t and action a_t is selected; the data X divided into S_t and the action a_t selected under it then form the sample data T = (X, a_t), with a_t as the category label of the new sample. Suppose the faults occurring in S_t correspond to the sample counts c_1, c_2, …, c_m. Treating the samples of each fault in S_t as equally likely, the probability of each fault in S_t can be represented by its share of the fault samples, i.e.
p_i = c_i / M
where the number of samples divided into state S_t is defined as
M = Σ_{i=1}^{m} c_i
State S_t does not need to be split when either of the following conditions holds:
M ≤ Q
or
max_i (c_i / M) ≥ θ
where Q is the minimum number of samples required for splitting and θ is the fault split ratio. That is, if the total number of samples in S_t is small, the state occurs with low probability; and if the ratio of some fault exceeds θ, a single fault dominates the state; in both cases state S_t does not need to be split.
In step B3, information entropy is used to measure sample purity and information gain is used to decide how to split an attribute. Suppose S_t = (D_1, D_2, …, D_d), where D_k is the value interval of the k-th feature. Borrowing the treatment of continuous values in decision tree construction, the dichotomy is used to discretize continuous values before computing the information gain; the information gain of each of the d features of S_t = (D_1, D_2, …, D_d) with respect to the different classes is computed according to the formula in step 47, and suppose the maximum information gain is obtained at value z of the k-th feature D_k. State S_t is then split on its k-th attribute at the value z into the two states
S_t1 = (D_1, …, D_k ∩ (−∞, z], …, D_d)
and
S_t2 = (D_1, …, D_k ∩ (z, +∞), …, D_d)
the two states are added to the Q-table and the model continues to be trained.
In order to implement the reinforcement-learning-based virtualized network fault diagnosis model, the invention implements the fault diagnosis model modules shown in fig. 3. The fault diagnosis server mainly deploys: an automatic fault injection module, a diagnosis process rollback module, a model training module and a communication agent module. The modules deployed on the edge server mainly comprise: an information acquisition module, a fault perception module, an action instruction execution module and a communication agent module.
(1) Action execution module
The reinforcement learning module issues an action to the fault server according to the policy, and the fault server executes the action to verify the diagnosis result. However, not every action designed here is a complete instruction. An action such as `sudo ethtool -G eth0 rx 1024` can be executed directly after being sent to the fault server, but several instructions require the fault server to complete them. Take the cpulimit instruction as an example: in `cpulimit -p PID -l limit`, PID is a process number and limit is the CPU usage cap (set to 10 here), so the whole instruction limits the CPU utilization of process PID to 10%. When the fault diagnosis module issues this instruction, the fault server must first obtain the current system process list, find the PID of the process using the most CPU, splice that PID into the cpulimit instruction, and execute it. The process information is obtained by executing `top -n 1 -b`; the action execution module implemented in this section runs `top -n 1 -b`, parses the output of the top instruction to find the process with the highest CPU utilization, splices its PID into the cpulimit instruction, and executes the resulting command.
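A hedged sketch of the two-stage action described above: run `top -b -n 1`, take the PID with the highest CPU usage, and splice it into a cpulimit command. The column positions assume the default top batch layout and may need adjusting on other systems.

```python
import subprocess

def pid_with_highest_cpu():
    """Run `top -b -n 1` and return the PID of the process with the highest %CPU."""
    out = subprocess.run(["top", "-b", "-n", "1"],
                         capture_output=True, text=True).stdout
    lines = out.splitlines()
    # the process table starts after the header line that begins with "PID"
    start = next(i for i, l in enumerate(lines) if l.lstrip().startswith("PID")) + 1
    rows = [l.split() for l in lines[start:] if l.strip()]
    rows.sort(key=lambda r: float(r[8].replace(",", ".")), reverse=True)  # col 9 = %CPU
    return rows[0][0]                                                     # col 1 = PID

def limit_busiest_process(limit=10):
    """Splice the busiest PID into the cpulimit instruction and launch it."""
    pid = pid_with_highest_cpu()
    # cpulimit keeps running to enforce the cap, so start it without blocking
    return subprocess.Popen(["cpulimit", "-p", pid, "-l", str(limit)])
```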
(2) Fault automation injection module and fault diagnosis process rollback module
In order to accelerate model training, once the action selected by the fault diagnosis module resolves the fault, all previously executed actions must be rolled back immediately and a new fault injected; the timed tasks provided by Linux cannot achieve such fine-grained fault injection. Following the theoretical part of chapter four, the round (episode) update of the Q value is used, so the complete diagnosis process is memorized during training and the Q values of all operations are updated after the diagnosis ends. All executed actions can therefore be recovered by examining the diagnosis record, rolled back, and a new fault selected for injection. Specifically, assume the command sequence executed in a finished diagnosis round is: 1. increase the network card receive queue buffer; 2. limit the CPU usage of a process with cpulimit. The network card buffer adjustment can be rolled back directly to the original parameter. The CPU limit requires restoring the process execution state of the Linux server: in this embodiment the real operating environment is simulated with stress-ng and MBW, and the process with the highest CPU utilization found when the cpulimit instruction was executed is necessarily a process simulated by stress-ng. Therefore, during instruction recovery this embodiment directly kills all relevant processes, re-executes the environment simulation instruction to restore the original running state, and then executes a new fault injection instruction. The implementation of this module increases the fault injection rate and shortens the model training time.
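A sketch of the rollback and re-injection flow under stated assumptions: the original ring-buffer size (512) and the stress-ng arguments are illustrative, and inject_next_fault is a placeholder for the fault injection module.

```python
import subprocess

def rollback_and_reinject(inject_next_fault, rx_default=512):
    """Undo the executed action sequence, restore the simulated environment,
    then inject the next fault for the following training round."""
    # action 1 rollback: restore the NIC receive ring buffer to its old size
    subprocess.run(["ethtool", "-G", "eth0", "rx", str(rx_default)])
    # action 2 rollback: remove the CPU cap and the simulated workload it targeted
    subprocess.run(["pkill", "-f", "cpulimit"])
    subprocess.run(["pkill", "-f", "stress-ng"])
    # re-run the environment simulation so the server returns to its baseline state
    subprocess.Popen(["stress-ng", "--cpu", "2"])
    # inject the next fault
    inject_next_fault()
```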
(3) Network state space division and training module
The specific implementation process is as follows: first check whether the memory limit is satisfied; if the limit is exceeded, training ends. Otherwise, check whether any network state in the Q table contains several faults; if so, evaluate whether the state division condition is met, and if the state needs further division, add it to an Error queue. If the Error queue is empty, model training ends; if it is not empty, compute the information gain of each feature according to the network state division algorithm, split the state on the feature with the largest information gain into two states, add them to the Q table, and delete the pre-split state from the Q table.
(4) The overall process of model training is shown in fig. 4, and the main steps are as follows:
step C1, training a Q table;
step C2, checking whether the network state in the Q table meets the memory use constraint;
step C3, if not, the model training is finished;
step C4, if the memory use constraint is satisfied, traversing all the states to check whether splitting is needed;
step C5, if the splitting is not needed, the model training is finished;
step C6, if splitting is needed, splitting the network state by using an information gain mode, and adding the split network state into a network state space;
and C7, repeating the steps C1 to C6, solving the optimal network state space division method meeting the memory use constraint, and improving the model precision.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (6)

1. A data center virtualization network fault diagnosis method based on reinforcement learning, characterized by comprising the following steps:
step 1, initializing a network fault diagnosis model;
step 2, training a Q table by adopting a reinforcement learning algorithm according to a set fault diagnosis target, wherein the Q table records the accumulated discount reward value obtained by taking each action under each fault;
step 3, when a fault occurs, mapping the network state information to the network state in the Q table, inquiring the Q table according to the network state, and selecting an action as a fault diagnosis result according to the principle of maximum reward value;
step 4, the information gain method is used to further optimize the network state space.
2. The reinforcement learning-based data center virtualization network fault diagnosis method of claim 1, wherein: the specific process of the step 1 is as follows:
step 11, representing the virtualized network environment with a 1028-dimensional vector composed of server operating environment information, virtual device parameter configuration information and virtual machine network information;
step 12, dividing each dimension of the data with an equal-width interval radius r = <r_1, r_2, …, r_d> to construct the network state space set, where r_d denotes the division interval of the d-th dimension and d is the number of virtualized network features;
step 13, setting the action set: the executable action set contains 21 instructions, and each instruction represents the solution to one fault;
step 14, selecting action a_t with an ε-greedy exploration strategy during action selection, balancing the training time and the fault diagnosis precision of the model;
step 15, updating the training memory in round (episode) mode: the immediate reward is R if the action resolves the fault and 0 otherwise; after each fault is resolved, the system updates the Q table with the following formula:
Q_n(S_n, a_n) ← Q_n(S_n, a_n) + α [ R + γ max_a Q(S_{n+1}, a) − Q_n(S_n, a_n) ]
where γ ∈ (0,1) is the discount rate, α ∈ (0,1) is the learning rate, R denotes the immediate reward obtained after selecting action a_n in state S_n, Q_n denotes the value recorded in the Q table for action a_n in state S_n, and Q_n(S_n, a_n) denotes the accumulated reward value after selecting action a_n in state S_n.
3. The reinforcement learning-based data center virtualization network fault diagnosis method of claim 1, wherein: the step 2 of training the Q table by using the reinforcement learning algorithm specifically comprises the following steps:
step 21, injecting a fault into the virtualized network by using a fault injection mode;
step 22, identifying network abnormality by using a network fault perception model and sending a diagnosis request to a fault diagnosis server;
step 23, the fault diagnosis server maps the multi-dimensional information in the diagnosis request to a state space in the Q table after preprocessing;
step 24, selecting an action by using an ε-greedy exploration strategy and transmitting the action to a fault server;
step 25, the fault server executes the issued action, judges whether the fault is solved by using a fault perception model, and feeds back a perception result to the fault diagnosis server;
step 26, if the fault is solved, updating the Q table, and turning to step 27; if the fault is not resolved, repeating steps 22-26;
step 27, inject the next fault, and repeat steps 21 through 27 until the Q table converges.
4. The reinforcement learning-based data center virtualization network fault diagnosis method of claim 1, wherein: in step 3, a classification model trained with a decision tree algorithm is used to identify network faults; the model is deployed on all information acquisition servers, and the information acquired in real time is fed into the model to identify the network fault.
5. The reinforcement learning-based data center virtualization network fault diagnosis method of claim 1, wherein: the specific steps of the step 4 are as follows:
step 41, setting a memory use constraint condition L;
step 42, for the optimal action in any network state, the Q value after the Q table converges is (n denotes the number of iterations):
Q = R(1 − (1 − α)^n) ≤ R
for a non-optimal action in any network state, the Q value after the Q table converges is:
Q = γR(1 − (1 − α)^n) ≤ γR
step 43, for a network state containing multiple faults, the Q values of the different actions in the Q table are:
Q_S = <R, R, …, R, γR, γR, …, γR>
and the number of entries equal to R in Q_S identifies a multi-fault state;
step 44, during Q table training, counting all the training samples X = (x_1, x_2, …, x_d) received by the Q table, where x_d denotes the value of the d-th attribute; assuming that X is divided into state S and that action a_t is selected, the data X divided into state S and the action a_t selected under it form the sample data T = (X, a_t), with a_t as the category label of the new sample;
step 45, setting the division boundary condition using the fault ratio and the number of samples in state S:
M ≤ Q
or
max_i (c_i / M) ≥ θ
wherein c_i denotes the number of samples X corresponding to each action a_t, Q is the minimum number of samples required for splitting, θ is the fault ratio threshold, and M, the number of samples divided into state S, is:
M = Σ_{i=1}^{m} c_i
where m is the number of faults obtained according to step 43;
step 46, for a state that does not satisfy the boundary condition of step 44, using the information gain to compute the gain of each feature in state S, selecting the feature with the maximum information gain to split the state into two new states, and retraining the model;
and step 47, repeating the steps 43 to 46, and constructing an optimal network state space under the condition that the memory constraint condition is met.
6. The reinforcement learning-based data center virtualization network fault diagnosis method of claim 5, wherein: in step 45, the calculation formula of the information gain is as follows:
Ent(D) = − Σ_{k=1}^{γ} p_k log₂ p_k
Gain(D, d) = Ent(D) − Σ_{v=1}^{V} (|D^v| / |D|) Ent(D^v)
wherein Ent(D) denotes the information entropy of sample set D, p_k is the proportion of samples of the k-th class in D (k = 1, 2, …, γ, with γ the number of classes), and Gain(D, d) denotes the information gain obtained when attribute d is used to divide the sample set D; the value range of d is constructed by the bisection method: for the value range f(X) of some attribute X, sort its values from small to large as (x_1, x_2, …, x_n), and the candidate partition nodes constructed by the dichotomy are:
T = { (x_i + x_{i+1}) / 2 | 1 ≤ i ≤ n − 1 }
dividing D by attribute d produces V branches, wherein the v-th branch contains all samples of D whose value of attribute d is D^v, denoted D^v.
CN201910644115.7A 2019-07-17 2019-07-17 Data center virtualization network fault diagnosis method based on reinforcement learning Active CN110740054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910644115.7A CN110740054B (en) 2019-07-17 2019-07-17 Data center virtualization network fault diagnosis method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910644115.7A CN110740054B (en) 2019-07-17 2019-07-17 Data center virtualization network fault diagnosis method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN110740054A true CN110740054A (en) 2020-01-31
CN110740054B CN110740054B (en) 2022-04-01

Family

ID=69237784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910644115.7A Active CN110740054B (en) 2019-07-17 2019-07-17 Data center virtualization network fault diagnosis method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110740054B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106411749A (en) * 2016-10-12 2017-02-15 国网江苏省电力公司苏州供电公司 Path selection method for software defined network based on Q learning
CN106603293A (en) * 2016-12-20 2017-04-26 南京邮电大学 Network fault diagnosis method based on deep learning in virtual network environment
CN108092804A (en) * 2017-12-08 2018-05-29 国网安徽省电力有限公司信息通信分公司 Power telecom network maximization of utility resource allocation policy generation method based on Q-learning
CN110011876A (en) * 2019-04-19 2019-07-12 福州大学 A kind of network measure method of the Sketch based on intensified learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANGHUANHUAN: "Virtual Network Fault Diagnosis Mechanism", 《PROCEEDINGS OF THE 2017 IEEE 21ST INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN》 *
卞辉: "基于Q学习的无线传感网络自愈算法", 《电子设计工程》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801272A (en) * 2021-01-27 2021-05-14 北京航空航天大学 Fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning
CN113114354A (en) * 2021-04-16 2021-07-13 河南工业大学 Method for simultaneously positioning optical switch structure switch and optical link fault in optical data center
TWI821666B (en) * 2021-05-13 2023-11-11 中華電信股份有限公司 Service management system and adaption method of service information process
CN114884836A (en) * 2022-04-28 2022-08-09 济南浪潮数据技术有限公司 High-availability method, device and medium for virtual machine
CN115865617A (en) * 2022-11-17 2023-03-28 广州鲁邦通智能科技有限公司 VPN remote diagnosis and maintenance system
CN115865617B (en) * 2022-11-17 2023-10-03 广州鲁邦通智能科技有限公司 VPN remote diagnosis and maintenance system
CN116339134A (en) * 2022-12-30 2023-06-27 华能国际电力股份有限公司德州电厂 Frequency modulation optimization control system of large-disturbance thermal power generating unit
CN116339134B (en) * 2022-12-30 2023-10-24 华能国际电力股份有限公司德州电厂 Frequency modulation optimization control system of large-disturbance thermal power generating unit
CN116390138A (en) * 2023-04-25 2023-07-04 中南大学 Fault diagnosis method based on digital twin network and related equipment
CN116390138B (en) * 2023-04-25 2024-03-08 中南大学 Fault diagnosis method based on digital twin network and related equipment

Also Published As

Publication number Publication date
CN110740054B (en) 2022-04-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant