CN112801272A - Fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning - Google Patents


Info

Publication number
CN112801272A
CN112801272A
Authority
CN
China
Prior art keywords
fault diagnosis
learning
diagnosis model
self
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110109341.2A
Other languages
Chinese (zh)
Inventor
丁宇
王超
马剑
杨帆
吕琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202110109341.2A priority Critical patent/CN112801272A/en
Publication of CN112801272A publication Critical patent/CN112801272A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, which comprises the following steps: configuring N fault diagnosis model self-learning agents that simultaneously run their respective actor-critic reinforcement learning algorithms on CPU threads; configuring N fault diagnosis model self-learning interactive environments, each of which interacts with its corresponding self-learning agent; configuring a global network for synchronizing global network parameters to the N self-learning agents; and learning the network parameters of the fault diagnosis model through multiple runs among each self-learning agent, each interactive environment, and the global network.

Description

Fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning
Technical Field
The invention relates to the technical field of fault diagnosis, in particular to a fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning.
Background
Fault diagnosis is a technical means for detecting, isolating, and identifying faults. With the rapid development of science and technology, the structures of various mechanical equipment are increasingly complex and the monitored parameters tend to be massive and multidimensional, which places higher demands on fault diagnosis methods. Because deep learning has a strong ability to approximate high-dimensional nonlinear functions, a series of fault diagnosis methods combined with deep learning have been widely applied in industrial engineering. Fault diagnosis methods based on deep learning can integrate fault feature extraction and fault mode classification, allowing the algorithm to autonomously extract useful fault features from data and identify fault modes, with a marked performance improvement over traditional data-driven fault diagnosis methods. The cost, however, is that researchers must tune the many parameters and hyperparameters of the deep neural network to obtain the best diagnostic performance, which consumes substantial time and computing resources and depends on expert experience.
Model self-learning refers to the process of automatically searching for more suitable model parameters through multiple rounds of iteration and is one of the important ways to solve the above problems. The AC (Actor-Critic) framework in reinforcement learning combines the advantages of value-function-based methods and policy-gradient methods, has strong advantages in problems such as continuous-space search and single-step actions, and is suitable for the fault diagnosis model self-learning problem, but it still suffers from high resource consumption and difficult convergence. Distributed asynchronous parallel reinforcement learning can greatly improve the learning performance and speed of the model, but a multi-device parallel mode consumes excessive resources.
Disclosure of Invention
The invention aims to provide a fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, in order to solve the technical problems of low self-learning efficiency and excessive resource consumption of fault diagnosis models in multi-parameter, high-dimensional spaces.
The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning of the invention comprises the following steps:
configuring N fault diagnosis model self-learning agents that simultaneously run their respective actor-critic reinforcement learning algorithms on CPU threads;
configuring N fault diagnosis model self-learning interactive environments, wherein each interactive environment interacts with its corresponding self-learning agent;
configuring a global network for synchronizing global network parameters to the N self-learning agents;
and learning the network parameters of the fault diagnosis model through multiple runs among each self-learning agent, each interactive environment, and the global network.
Each run among the self-learning agents, the interactive environments, and the global network comprises the following steps:
after each self-learning agent receives the global network parameters synchronized from the global network, it updates its own network parameters with the global parameters and runs the actor-critic reinforcement learning algorithm with the updated parameters, so that each interactive environment interacts with its corresponding self-learning agent to obtain one set of fault diagnosis model network parameters;
each interactive environment asynchronously uploads to the global network the network parameters of its corresponding self-learning agent after that agent has run the actor-critic reinforcement learning algorithm;
the global network selects the optimal network parameters from the uploaded agent network parameters and synchronizes the selected optimal parameters to each self-learning agent again, so that each agent updates its network parameters and each interactive environment interacts with its corresponding agent to obtain the next set of fault diagnosis model network parameters.
Preferably, each interactive environment diagnoses the fault diagnosis model network parameters obtained in each run and progressively uploads the network parameters of the self-learning agents, together with the diagnosis results, to the global network, so that the global network can select the optimal network parameters from the agents' network parameters after each agent has run the actor-critic reinforcement learning algorithm.
Preferably, each fault diagnosis model self-learning interactive environment comprises:
an agent interaction module for receiving and parsing the action a output by the self-learning agent and generating a set of fault diagnosis model network parameters according to the action a;
and a result evaluation module for diagnosing the fault diagnosis model network parameters generated by the agent interaction module according to the action a, progressively uploading the network parameters of the self-learning agent, together with the diagnosis result, to the global network, and generating the state S transmitted to the self-learning agent.
Preferably, diagnosing the fault diagnosis model network parameters generated by the agent interaction module according to the action a includes:
loading the network parameters of the fault diagnosis model into the fault diagnosis model,
and diagnosing the fault diagnosis model loaded with the network parameters of the fault diagnosis model by using the training data and the verification data so as to determine the fault diagnosis accuracy.
Preferably, each fault diagnosis model self-learning interactive environment further comprises a judging module for judging whether the parameters of the fault diagnosis model are complete.
Preferably, when the fault diagnosis model parameters are judged to be incomplete, the incomplete fault diagnosis model parameters are discarded.
Preferably, the fault diagnosis model self-learning interactive environment is software running on a CPU thread.
Preferably, each fault diagnosis model self-learning agent comprises an actor policy network and a critic value network;
the actor policy network processes the state S from the paired self-learning interactive environment according to a Gaussian policy and determines an action a for modifying the current fault diagnosis model architecture;
and the critic value network processes the state S from the self-learning interactive environment, outputs a value function, evaluates with it the action a determined by the actor network, and outputs the evaluated action a to the paired interactive environment, so that the interactive environment generates the corresponding fault diagnosis model network parameters according to the evaluated action a.
Preferably, the agent evaluation module generates a reward value based on the diagnosis result and sends the reward value to the critic value network.
Preferably, the actor network and the critic network are both constructed using recurrent neural networks.
The invention has the following beneficial effects: 1) a large amount of independent and identically distributed training data is obtained through asynchronous, parallel agents, and the effect of multiple parallel computers is achieved with only a single CPU, improving the self-learning efficiency of the fault diagnosis model; 2) through interaction with the agent, the interactive environment gives instant rewards and state-transition results and guides the agent's learning direction, so that an agent capable of generating the optimal fault diagnosis model is obtained; 3) the method is applicable to self-learning of fault diagnosis models for different types of industrial equipment.
Drawings
FIG. 1 is a flow chart of the operation of the fault diagnosis model self-learning interactive environment of the present invention;
FIG. 2 is a single thread agent learning flow diagram of the present invention;
FIG. 3 is an overall architecture diagram of a fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning according to the present invention;
FIG. 4 is a functional block diagram of a self-learning interactive environment of the fault diagnosis model of FIG. 1;
FIG. 5 is a schematic diagram of the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning of the present invention;
FIG. 6 is a schematic representation of the cumulative reward change during the self-learning process of the gearbox fault diagnosis model of the present invention;
FIG. 7 is a schematic illustration of the diagnostic effect of the gearbox fault diagnosis model obtained in the agent's 1st self-learning round;
FIG. 8 is a schematic illustration of the diagnostic effect of the gearbox fault diagnosis model obtained in the agent's 100th self-learning round;
FIG. 9 is a cumulative-reward comparison chart for gearbox fault diagnosis model self-learning;
FIG. 10 is a schematic diagram of the cumulative reward change during the self-learning process of the hydraulic pump fault diagnosis model of the present invention;
FIG. 11 is a schematic illustration of the diagnostic effect of the hydraulic pump fault diagnosis model obtained in the agent's 1st self-learning round;
FIG. 12 is a schematic illustration of the diagnostic effect of the hydraulic pump fault diagnosis model obtained in the agent's 100th self-learning round;
FIG. 13 is a cumulative-reward comparison chart for hydraulic pump fault diagnosis model self-learning.
Detailed Description
Reinforcement-learning-based fault diagnosis model self-learning uses states, actions, and rewards to define the interaction process between the agent and the interactive environment, an indispensable basic module of the whole self-learning process. Based on the typical self-learning framework built for the fault diagnosis model, the states, actions, and reward function are designed in combination with the self-learning objective of the fault diagnosis model.
In reinforcement-learning-based fault diagnosis model self-learning, the state comprises a complete description of the fault diagnosis model architecture; it represents the environmental information that the self-learning agent can perceive together with the changes caused by its own selected actions. The state information is the basis on which the self-learning agent makes decisions and evaluates long-term benefit, and the quality of the state design directly determines whether the self-learning algorithm converges, how fast it converges, and its final performance.
Before designing the state, the current task is first analyzed. In the invention, the final purpose of self-learning is to obtain a fault diagnosis model with higher diagnostic accuracy; therefore the state design should contain as many of the key parameters of the model to be learned as possible. Based on research into typical fault diagnosis model architectures, the key parameters selected for self-learning are (i) the number of fully connected layers and (ii) the number of nodes in each layer; the remaining key parameters are determined in the design of the fault diagnosis model self-learning interactive environment.
The main information represented by the state comprises the number of layers of the fault diagnosis model and the number of nodes in each layer. In neural network design the number of layers and the per-layer node counts are usually designed in sequence; to simulate this process, the state is represented as a one-dimensional array whose first element is the number of network layers N_layer of the current fault diagnosis model, while the subsequent elements represent the node counts from the first layer to the last. When the state is updated, the data at each position are updated in turn, simulating the design process from the number of network layers to the per-layer node counts. Since the length of the state array must be fixed, the array length L_state is determined by the maximum number of network layers; the invention sets L_state = 9, i.e., the maximum number of network layers is 8.
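As an illustration of this encoding, the following sketch builds the 9-element state array; the array length and element meanings follow the text, while the helper names are illustrative and not from the patent:

```python
import numpy as np

L_STATE = 9        # fixed state-array length: 1 layer-count slot + 8 node-count slots
MAX_LAYERS = 8     # maximum number of network layers

def initial_state():
    """All-zero vector marking the initial (empty) model architecture."""
    return np.zeros(L_STATE, dtype=np.float32)

def write_slot(state, step_idx, value):
    """Step 0 stores the layer count N_layer; steps 1..8 store per-layer node counts."""
    state[step_idx] = value
    return state
```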
Actions are generated by the interaction of the model self-learning agent with the environment; the state is changed through actions, and the set of all actions is called the action space. An action is an operation the self-learning agent can carry out. In the fault diagnosis model self-learning task, the purpose of the agent is to design the number of layers of the fault diagnosis model and the number of nodes in each layer; the action space is therefore the value space of these two key parameters, and an action of the agent sets either the number of layers or the node count of a certain layer of the fault diagnosis model.
It should be noted that not all actions are valid actions. For example, when the number of layers of the fault diagnosis model is determined to be 3, designing the number of network nodes at layer 4 is an invalid action. Thus, the present invention provides that the action is ineffective when the number of layers corresponding to the number of network nodes being designed exceeds the number of network layers currently being designed.
The reward function is an important link in the self-learning process: it realizes the communication between the algorithm and the task objective, and determines whether the self-learning agent can learn to construct a fault diagnosis model with good performance. In the invention, the agent is regarded as having completed one task when it obtains a complete fault diagnosis model structure, at which point the reward is fed back to the agent. Because a fault diagnosis model with an incomplete structure cannot be trained, the reward value fed back to the agent is set to -1 whenever a complete fault diagnosis model architecture has not yet been obtained, which guides the self-learning agent to continue generating the model. After the whole fault diagnosis model has been designed, a positive reward is fed back to the agent according to the model's fault diagnosis performance; different performances correspond to different reward values, so the agent moves in the direction of obtaining more reward. Because the variation in fault diagnosis accuracy is small, and a small difference gives the agent no clear learning guidance, the reward is set to 1000 times the diagnostic accuracy, which better guides the training of the self-learning agent. The reward function finally adopted by the invention is as follows:
reward = -1,               if terminal = False
reward = accuracy × 1000,  if terminal = True
wherein terminal is used to indicate whether the current fault diagnosis model result is complete.
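A minimal sketch of this reward function, assuming accuracy is the diagnostic accuracy in [0, 1] returned by the result evaluation module:

```python
def reward(terminal: bool, accuracy: float = 0.0) -> float:
    if not terminal:              # fault diagnosis model architecture still incomplete
        return -1.0
    return accuracy * 1000.0      # scaled so small accuracy gaps still guide learning
```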
In the asynchronous-parallel-reinforcement-learning-based self-learning process of the fault diagnosis model there are two interacting objects: the agent and the interactive environment. The agent perceives the state of the interactive environment and, learning from the feedback rewards, selects suitable actions so as to maximize the cumulative reward. The interactive environment, in turn, gives instant rewards and state-transition results through its interaction with the agent to guide the agent's learning direction, and is one of the cores of the fault diagnosis model self-learning framework.
The model self-learning interactive environment mainly comprises an agent interaction module and an agent learning-result evaluation part: the agent interaction module realizes communication and parsing with the agent, and the result evaluation module builds the fault diagnosis model based on the model state and feeds back the diagnosis result.
(1) Agent interaction module
The agent interaction module provides the agent with an initial model-state sequence, gives the next model-state sequence according to the model action input by the agent, and stores the current state step.
1) Model state initialization
According to the research on the model state, the model state space is a 9 x 1-dimensional vector whose elements represent the number of model layers and the node count of each layer. The model-state initialization function feeds back to the agent the initial state sequence s = [0, 0, 0, 0, 0, 0, 0, 0, 0] to indicate that the current state is the initial state, and resets the state step to 0.
2) Model state transitions
According to the research on model actions, a model action selects either the number of layers or a node count: when the state step is 0, the action selects the number of layers, and when the state step is greater than 0, it selects a node count. In order to reduce the gradient values during self-learning and prevent gradient explosion, the model action is regulated to the interval (-2, 2), so after obtaining the model action input by the agent the interactive environment first decodes it according to the following formula:
action_real = lim_down + ((action + 2) / 4) × (lim_up - lim_down)

where action is the model action selected by the self-learning agent, action_real is the actual action used to construct the model state after decoding, lim_up is the current upper limit of the action value (number of layers or nodes), and lim_down is the current lower limit of the action value (number of layers or nodes).
After the decoded model action is obtained, the value at the position indicated by the state step in the model-state vector is replaced. When the model's state step is less than the number of model layers, the fault diagnosis model is not yet completely designed: the updated model state is fed back to the agent, the state indicator terminal returns False, and the instant reward is -1 according to the reward function. When the model's state step exceeds the number of model layers, the design of the fault diagnosis model is finished: the state indicator terminal returns True, and the instant reward is accuracy × 1000 according to the reward function.
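The state-transition step can be sketched as follows; decode_action follows the linear decoding formula reconstructed above, and the exact terminal test (state step at or past the layer count) is an assumption consistent with the description:

```python
def decode_action(action, lim_down, lim_up):
    """Map an action from the regulated interval (-2, 2) to the valid integer range."""
    action = max(-2.0, min(2.0, action))
    return round(lim_down + (action + 2.0) / 4.0 * (lim_up - lim_down))

def env_step(state, step_idx, action, lim_down, lim_up):
    """Write one decoded design choice into the state and report completion."""
    state[step_idx] = decode_action(action, lim_down, lim_up)
    n_layers = int(state[0])            # slot 0 holds the layer count
    terminal = step_idx >= n_layers     # node counts for all layers have been placed
    return state, terminal
```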
The agent learning-result evaluation module receives the complete structural parameters of the fault diagnosis model, constructs the corresponding model, trains it efficiently with the training and validation data, and finally uses the test set to verify the model's fault diagnosis accuracy.
The core of the result evaluation module is a dynamic neural network construction module, which can efficiently and dynamically construct a neural network according to different fault diagnosis model structural parameters. The input layer's dimension is determined by the training-data dimension; the middle layers' count and per-layer node counts are determined by the input model-state sequence and differ for each fault diagnosis model. The output layer is a Softmax layer used to realize fault mode identification.
For a training set {(x^(1), y^(1)), …, (x^(m), y^(m))} with y^(i) ∈ {1, 2, …, n}, there are n classes in total. For each input x there is a probability for each class, i.e., P(y = j | x; θ). The cost function of Softmax is defined as follows, using the indicator function 1{expression} whose value is 1 when the expression is true.
J(θ) = -(1/m) · Σ_{i=1}^{m} Σ_{j=1}^{n} 1{y^(i) = j} · log( exp(θ_j^T x^(i)) / Σ_{l=1}^{n} exp(θ_l^T x^(i)) )
All designed fault diagnosis models adopt the Adaptive Moment Estimation (Adam) optimization algorithm to minimize the Softmax cost function.
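A sketch of the dynamic-construction idea in TensorFlow/Keras (the text's software stack names TensorFlow); details not stated in the text, such as the hidden-layer activation, are assumptions:

```python
import tensorflow as tf

def build_diagnosis_model(state, input_dim, n_classes):
    """Build a fully connected diagnosis network from the model-state vector."""
    n_layers = int(state[0])
    model = tf.keras.Sequential([tf.keras.Input(shape=(input_dim,))])
    for i in range(1, n_layers + 1):                       # middle layers from the state
        model.add(tf.keras.layers.Dense(int(state[i]), activation="relu"))
    model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
    # Adam minimizes the Softmax (cross-entropy) cost function J(theta) above.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```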
The operation flow of the finally designed fault diagnosis model self-learning interactive environment is shown in FIG. 1. The whole environment interacts with the self-learning agent to generate data for the agent to learn from, finally yielding an agent capable of generating the optimal fault diagnosis model.
Combining the research on reinforcement learning methods, the design of the fault diagnosis model self-learning agent adopts the AC framework and mainly comprises an Actor network and a Critic network.
1) Actor network
The Actor network aims to learn and optimize the agent's strategy for generating fault diagnosis models, so that model performance becomes better and better. The Actor network therefore takes as input the state S of the fault diagnosis model architecture and outputs the Gaussian policy

π_θ(a | s) = N(a; μ_θ(s), σ_θ(s)^2)
The Actor determines an action to modify the current fault diagnosis model architecture according to the Gaussian policy; that is, based on the current architecture state it decides the node count of the network layer to be added, or completes the current network architecture design. Because the fault diagnosis network architecture is constructed layer by layer, the node choice of a later layer is correlated with the nodes of the previous layers; to better represent this correlation and obtain a better action distribution, a recurrent neural network (RNN) is used to construct the Actor network.
The structural information of the fault diagnosis neural network designed at the previous moment and the current state information are fed jointly into the RNN for learning, so that the result of the previous design step is taken into account when designing the network structure, realizing the temporal correlation of network architecture design. The input dimension of the Actor is the same as the dimension of the model-state vector S, and the output is the mean μ_θ(s) and standard deviation σ_θ(s) describing the action distribution; the fault-diagnosis-network structural design action given by the Actor network in the current state is obtained by sampling. The specific network architecture of the Actor is shown in Table 1.
TABLE 1 Actor network architecture
2) Critic network
The goal of the Critic network is to learn a value function and to use it to evaluate the quality of the design parameters of the fault diagnosis model, thereby guiding the Actor to design a better fault diagnosis model. The Critic network likewise takes as input the state representation S of the fault diagnosis model and outputs the state value function V(S). Like the Actor network, in order to learn the timing information in the fault diagnosis model state sequence as much as possible, the Critic network also adopts an RNN structure; the specific architecture is shown in Table 2.
TABLE 2 Critic network architecture
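Since Tables 1 and 2 are reproduced only as images in the original publication, the sketch below merely illustrates the general shape described in the text: an RNN trunk with a Gaussian-policy head (mean and standard deviation) for the Actor and a scalar value head for the Critic. The hidden size and RNN cell type are assumptions:

```python
import tensorflow as tf

STATE_DIM = 9   # dimension of the model-state vector S

def build_actor(hidden=64):
    s = tf.keras.layers.Input(shape=(None, STATE_DIM))           # sequence of states
    h = tf.keras.layers.SimpleRNN(hidden)(s)
    mu = 2.0 * tf.keras.layers.Dense(1, activation="tanh")(h)    # mean, in the regulated interval (-2, 2)
    sigma = tf.keras.layers.Dense(1, activation="softplus")(h)   # standard deviation, kept positive
    return tf.keras.Model(s, [mu, sigma])

def build_critic(hidden=64):
    s = tf.keras.layers.Input(shape=(None, STATE_DIM))
    h = tf.keras.layers.SimpleRNN(hidden)(s)
    v = tf.keras.layers.Dense(1)(h)                              # state value V(s)
    return tf.keras.Model(s, v)
```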
After the fault diagnosis model self-learning agent based on the Actor-Critic structure is established, the learning strategy of the agent must be determined. In this reinforcement learning method the Actor in the agent learns based on the policy gradient; according to generalized advantage estimation theory, the generalized policy gradient can be written as:

g = E[ Σ_t Ψ_t ∇_θ log π_θ(a_t | s_t) ]

where Ψ_t relates to the return function and evaluates the quality of the current action. The state value function V(s) is parameterized as V(s; w) by the Critic network, which is used to parameterize Ψ_t, and the Actor network guides the update of the policy-function parameters θ with the value obtained from the Critic network. The objective of the Critic update is to make the value-function estimate more accurate, so the loss function of this part is constructed as:

L(w) = E[ (r_t + γV(s_{t+1}; w) - V(s_t; w))^2 ]

The update rule of w is

w ← w - α_w ∇_w L(w)

and the update rule of θ is

θ ← θ + α_θ Ψ_t ∇_θ log π_θ(a_t | s_t)

where Ψ_t = r_t + γV(s_{t+1}; w) - V(s_t; w) is the advantage of the action value estimated by temporal difference.
The fault diagnosis model self-learning agent learning process is shown in table 3.
TABLE 3 Fault diagnosis model self-learning agent learning process
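Since Table 3 is reproduced only as an image, the following sketch illustrates one update step implementing the rules above (TD-error critic loss, advantage-weighted policy gradient); the discount factor, learning rates, and plain gradient steps are assumed values:

```python
import tensorflow as tf

GAMMA = 0.9  # discount factor (assumed value)

def train_step(actor, critic, s, a, r, s_next, alpha_w=1e-3, alpha_theta=1e-4):
    with tf.GradientTape(persistent=True) as tape:
        psi = r + GAMMA * tf.stop_gradient(critic(s_next)) - critic(s)  # TD residual Psi_t
        critic_loss = tf.reduce_mean(tf.square(psi))                    # L(w)
        mu, sigma = actor(s)
        # log pi_theta(a|s) for the Gaussian policy, written out explicitly
        log_pi = (-0.5 * tf.square((a - mu) / sigma)
                  - tf.math.log(sigma) - 0.5 * tf.math.log(2.0 * 3.14159265))
        actor_loss = -tf.reduce_mean(tf.stop_gradient(psi) * log_pi)    # ascent on the policy gradient
    for model, loss, lr in ((critic, critic_loss, alpha_w),
                            (actor, actor_loss, alpha_theta)):
        grads = tape.gradient(loss, model.trainable_variables)
        for g, var in zip(grads, model.trainable_variables):
            var.assign_sub(lr * g)                                      # plain gradient step
    del tape
```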
On the basis of the fault diagnosis model self-learning method based on Actor-Critic reinforcement learning, and in order to further improve the convergence speed and stability of the algorithm, the invention provides a fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning.
Asynchronous means that data are generated at different times; parallel means that multiple self-learning agents run simultaneously. Based on asynchronous parallelism, multiple Actor-Critic reinforcement learning algorithms can run simultaneously on the threads of a CPU. The reinforcement learning process on each thread is independent and identically distributed, and the data sampled simultaneously from multiple threads break the correlation between the data obtained by a single Actor-Critic algorithm, greatly improving the convergence of the algorithm. Meanwhile, multi-thread parallel training greatly improves the computational efficiency of the asynchronous-parallel self-learning method.
Since multiple threads are trained simultaneously, and the final aim of the invention is to obtain an AC network with excellent performance, agent training based on asynchronous parallelism first establishes a Global Actor network and a Global Critic network and initializes their parameters. The global network parameters are then synchronized to the Actor and Critic networks in each thread. The Actor-Critic algorithm in each thread runs independently, computes update gradients, and asynchronously updates them to the global network; the global network stores the latest global network parameters, and when the global parameters converge, the learning and training of each thread's agent stop. The interaction flow between the self-learning agent and the environment within a single thread is shown in FIG. 2.
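The thread structure can be sketched as follows; Net is a toy parameter container standing in for the Actor-Critic pair, and the gradient computation is a placeholder for the per-thread learning step described above:

```python
import threading
import numpy as np

class Net:
    """Toy stand-in for the (global or per-thread) Actor-Critic parameters."""
    def __init__(self, dim=4):
        self.w = np.zeros(dim)
    def get_weights(self): return self.w.copy()
    def set_weights(self, w): self.w = w.copy()
    def apply_gradients(self, g, lr=0.01): self.w -= lr * g

global_net, lock = Net(), threading.Lock()

def worker(worker_id, episodes=100):
    local = Net()
    rng = np.random.default_rng(worker_id)
    for _ in range(episodes):
        local.set_weights(global_net.get_weights())   # synchronize from the global network
        grad = rng.normal(size=local.w.shape)         # placeholder for the local A-C update gradient
        with lock:                                    # asynchronous update of the global network
            global_net.apply_gradients(grad)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
```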
FIG. 3 shows the overall architecture of the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, which comprises: N fault diagnosis model self-learning agents that simultaneously run their respective actor-critic reinforcement learning algorithms on CPU threads; N fault diagnosis model self-learning interactive environments, wherein each interactive environment interacts with its corresponding self-learning agent; and a global network for synchronizing the global network parameters to the N self-learning agents.
Each fault diagnosis model self-learning agent comprises an Actor network (actor policy network) and a Critic network (critic value network).
The network parameters of the fault diagnosis model are learned through multiple runs among each self-learning agent, each interactive environment, and the global network.
FIG. 4 shows the functional structure of the fault diagnosis model self-learning interactive environment of the invention, which comprises an agent interaction module, a module for judging whether the fault diagnosis model structural parameters are complete, and an agent evaluation module.
For the self-learning agent and interactive environment of a single thread, each time the self-learning agent gives an action, the agent interaction module of the interactive environment judges whether the current state indicator terminal signal is True or False. When the terminal signal is False, the reward function returns an instant reward value of -1 and the agent interaction module gives the next model-state sequence; when the terminal signal is True, the agent evaluation module in the interactive environment runs the agent learning-result evaluation process, returns the instant reward, and reinitializes the fault diagnosis model state sequence s. On this basis, each independent fault diagnosis self-learning agent updates the parameters of its actor-critic algorithm.
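A minimal sketch of this single-thread interaction loop (agent.act and env.step / env.reset are illustrative interfaces, not from the patent):

```python
def run_episode(agent, env):
    s, trajectory = env.reset(), []
    terminal = False
    while not terminal:
        a = agent.act(s)                    # sample an action from the Gaussian policy
        s_next, r, terminal = env.step(a)   # r = -1 until the model architecture is complete
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory                       # data for the actor-critic parameter update
```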
FIG. 5 shows a fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning. The method shown in fig. 5 includes:
configuring N fault diagnosis model self-learning agents that simultaneously run their respective actor-critic reinforcement learning algorithms on CPU threads;
configuring N fault diagnosis model self-learning interactive environments, wherein each interactive environment interacts with its corresponding self-learning agent;
configuring a global network for synchronizing global network parameters to the N self-learning agents;
and learning the network parameters of the fault diagnosis model through multiple runs among each self-learning agent, each interactive environment, and the global network.
Each run among the self-learning agents, the interactive environments, and the global network comprises the following steps:
after each self-learning agent receives the global network parameters synchronized from the global network, it updates its own network parameters with the global parameters and runs the actor-critic reinforcement learning algorithm with the updated parameters, so that each interactive environment interacts with its corresponding self-learning agent to obtain one set of fault diagnosis model network parameters;
each interactive environment asynchronously uploads to the global network the network parameters of its corresponding self-learning agent after that agent has run the actor-critic reinforcement learning algorithm;
the global network selects the optimal network parameters from the uploaded agent network parameters and synchronizes the selected optimal parameters to each self-learning agent again, so that each agent updates its network parameters and each interactive environment interacts with its corresponding agent to obtain the next set of fault diagnosis model network parameters.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, each interactive environment diagnoses the fault diagnosis model network parameters obtained in each run and progressively uploads the network parameters of the self-learning agents, together with the diagnosis results, to the global network, so that the global network can select the optimal network parameters from the agents' network parameters after each agent has run the actor-critic reinforcement learning algorithm.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, each fault diagnosis model self-learning interactive environment comprises:
an agent interaction module for receiving and parsing the action a output by the self-learning agent and generating a set of fault diagnosis model network parameters according to the action a;
and a result evaluation module for diagnosing the fault diagnosis model network parameters generated by the agent interaction module according to the action a, progressively uploading the network parameters of the self-learning agent, together with the diagnosis result, to the global network, and generating the state S transmitted to the self-learning agent.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, diagnosing the fault diagnosis model network parameters generated by the agent interaction module according to the action a comprises:
loading the network parameters of the fault diagnosis model into the fault diagnosis model,
and diagnosing the fault diagnosis model loaded with the network parameters of the fault diagnosis model by using the training data and the verification data so as to determine the fault diagnosis accuracy.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, each fault diagnosis model self-learning interactive environment further comprises a judging module for judging whether the parameters of the fault diagnosis model are complete or not.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, when the fault diagnosis model parameters are judged to be incomplete, the incomplete fault diagnosis model parameters are discarded.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, the fault diagnosis model self-learning interactive environment is software running on a CPU thread.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, each fault diagnosis model self-learning agent comprises an actor policy network and a critic value network;
the actor policy network processes the state S from the paired self-learning interactive environment according to a Gaussian policy and determines an action a for modifying the current fault diagnosis model architecture;
and the critic value network processes the state S from the self-learning interactive environment, outputs a value function, evaluates with it the action a determined by the actor network, and outputs the evaluated action a to the paired interactive environment, so that the interactive environment generates the corresponding fault diagnosis model network parameters according to the evaluated action a.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, the agent evaluation module generates a reward value according to the diagnosis result and sends the reward value to the critic value network.
In the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, the actor network and the critic network are both constructed using recurrent neural networks.
The following describes the practice of the present invention in detail with reference to specific examples.
The method provided by the invention relies mainly on multi-thread operation; the multi-threaded agents acquire interactive data simultaneously during operation, which requires a large system memory for data storage. Therefore, a CPU with more threads and a higher clock frequency, combined with a large memory, is selected for fault diagnosis model self-learning. The hardware configuration of the platform used for the experiments is shown in Table 4.
TABLE 4 System platform hardware configuration
The software configuration of the system platform mainly comprises the TensorFlow algorithm library used to build the self-learning agent and the libraries used for data reading and multi-thread parallelism; the software configuration is shown in Table 5.
TABLE 5 System platform software configuration
In the invention, both fault diagnosis tasks perform fault diagnosis model self-learning with the asynchronous-parallel-reinforcement-learning-based method, and the whole framework comprises a global AC network and multiple thread AC networks. The global AC network only updates and synchronizes network parameters and is not trained; the AC network of each thread has the same structure as the global network, is trained asynchronously in parallel, generates different network parameters, and updates its parameters to the global network at fixed intervals. The specific AC network configuration parameters are shown in Tables 6 and 7.
An Actor network:
TABLE 6 Actor network architecture
Critic network:
TABLE 7 Critic network architecture
The input of both networks is the state S, which represents the currently constructed fault diagnosis network architecture; this architecture is used to realize the two embodiments of gearbox fault diagnosis and hydraulic pump fault diagnosis. The output of the Actor network contains two parts, the mean and variance of the action-policy distribution, from which the action a is then generated by random sampling. The output of the Critic network is the value evaluation of the current state S, which is used to construct and update the policy gradients of the actor and critic networks.
The remaining design information of the self-learning architecture, apart from the agent parameter configuration, is shown in Table 8.
TABLE 8 self-learning test parameter settings
The number of threads is allocated according to the computing resources: the CPU used has 4 cores and 8 threads, so 8 thread AC networks are constructed for asynchronous parallel training.
Each thread's AC network is trained with backpropagation. In the process of interacting with the environment, each episode generates one complete fault diagnosis model and comprises 4-9 steps, which determine the number of layers of the fault diagnosis network and the node count of each layer; the total number of episodes is set to 100. Although the training data of the two fault diagnosis tasks differ, the self-learning architectures adopted are the same and the design parameters are shared.
Example 1: gear box fault diagnosis model self-learning based on asynchronous parallel reinforcement learning
The gearbox data used in the tests come from a power-transmission fault-prediction test rig manufactured by SpectraQuest, USA. The test rig consists of a driving motor, a planetary gearbox under test, a parallel gearbox under test, two load parallel gearboxes, a load motor, a driver, and several matching sensors.
In this case study, the fault injection location is the planetary gearbox, and the injected fault modes comprise four modes: gear crack, wear, missing tooth, and broken tooth.
The operating condition for data acquisition was a gear rotating speed of 20 Hz with a load of 0 Nm (no load). The acceleration sensor was installed on the outer end cover of the planetary gearbox input shaft through a threaded connection, and the gearbox vibration signal was sampled for 40 seconds at 12.8 kHz; there are therefore 512000 sample points for each fault condition (including the normal condition). Considering that the signal acquired over one rotation of the gear can reflect its current state, the original signals were cut into samples of 1280 data points each, yielding 3000 samples, which facilitates training of the diagnostic model. Details of the gearbox data set are shown in Table 9.
TABLE 9 gearbox data set details
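The cutting described above can be sketched as follows; a non-overlapping cut is assumed (512000 / 1280 = 400 segments per condition), and the 3000-sample total reported in the text may reflect a different grouping across conditions:

```python
import numpy as np

def segment(signal, sample_len=1280):
    """Cut a 1-D vibration record into consecutive samples of sample_len points."""
    n = len(signal) // sample_len
    return signal[: n * sample_len].reshape(n, sample_len)

raw = np.random.randn(512000)   # stand-in for one condition's 40 s, 12.8 kHz record
samples = segment(raw)          # shape (400, 1280)
```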
During model self-learning, the change of the cumulative reward value directly reflects the agent's learning effect. FIG. 6 shows the variation of the cumulative reward during gearbox fault diagnosis model self-learning, as a cumulative-reward interval chart obtained from 5 repeated training runs. It can be seen that the cumulative reward rises steadily and then remains stable throughout the model self-learning process. Fluctuations in the reward value caused by random exploration of new network structures make the reward dip around round 50, but it quickly returns to near the optimal cumulative reward. Overall, the algorithm does not fall into a local optimum and converges quickly, obtaining a better fault diagnosis model structure.
In order to visually display the self-learning effect of the fault diagnosis model, the data obtained before the Softmax layer of the fault diagnosis network generated by the agent are used, and the fault diagnosis effect is visualized with two common dimensionality-reduction methods: t-distributed stochastic neighbor embedding (t-SNE) and principal component analysis (PCA).
FIG. 7 shows the diagnostic effect of the fault diagnosis model randomly obtained by the agent at the initial stage of model self-learning; the left side shows the two- and three-dimensional t-SNE visualization results and the right side the two- and three-dimensional PCA results, with data points of different colors representing different fault modes. It can be seen that the fault diagnosis model randomly obtained at the initial stage of self-learning has a poor diagnostic effect and cannot identify the individual fault modes.
FIG. 8 shows the diagnostic effect of the fault diagnosis model designed and constructed by the agent after 100 rounds of learning. It can be seen that after 100 rounds the agent can distinguish the different fault modes well, achieving a high fault diagnosis accuracy.
The above experiments demonstrate the effectiveness of the method. In order to further verify its advantages over other methods, random search and Bayesian optimization search were also applied to fault diagnosis model self-learning and compared.
To avoid the randomness of individual experiments, each method was repeated 5 times under the same conditions. The comparison results are shown in FIG. 9, where (a)-(e) respectively show how the self-learning cumulative reward of each method changes over time in the 5 repeated experiments. It can be seen that in all tests the Bayesian optimization method falls far behind random search and the asynchronous-parallel fault diagnosis model self-learning method of the invention, both in the time required to obtain the optimal fault diagnosis model and in stability. Comparing random search with the proposed method, the time needed to obtain the optimal fault diagnosis model is clearly less than for random-search-based self-learning, and the fault diagnosis model obtained by the proposed method has better diagnostic performance. Graph (f) reflects the stability of each method and the performance ceiling of the finally obtained fault diagnosis model; the proposed method shows strong advantages in both stability and self-learning effect.
Furthermore, the method of the invention consumes less overall time than the random search (RS) and Bayesian optimization (BO) algorithms, and thus has a significant advantage in the time dimension.
In conclusion, in the gearbox fault diagnosis model self-learning embodiment, the asynchronous-parallel-reinforcement-learning-based fault diagnosis method of the invention achieves clearly improved self-learning speed and effect compared with the other methods.
Example 2: hydraulic pump fault diagnosis model self-learning based on asynchronous parallel reinforcement learning
The plunger-type hydraulic pump test equipment has an acceleration sensor mounted on the end face of the pump to collect vibration signals, with a sampling frequency of 1024 Hz. During the test, the motor speed was set to 5280 rpm; seven pumps were used in total, and the collected data include the normal condition, slipper and swashplate wear faults, and port plate wear faults. Two groups of 1024 points were collected for each pump, for a total of 14 records. For evaluating the fault diagnosis effect, each group of samples was recombined with a sliding window of length 512, i.e., a single sample is 512 points long, and the data of each pump form a data set containing 1536 fault samples through the sliding window, as shown in Table 10.
TABLE 10 plunger Hydraulic Pump data composition
The change in cumulative reward value is again used to evaluate the agent's self-learning effect, as for the gearbox fault diagnosis model. FIG. 10 shows the cumulative reward variation during hydraulic pump fault diagnosis model self-learning, as the cumulative-reward interval obtained from 5 repeated training runs of 100 rounds each. As in the gearbox self-learning process, the cumulative reward rises steadily and remains stable throughout learning; fluctuations caused by random exploration appear around rounds 30 and 60 but quickly return to near the optimal cumulative reward. Overall, hydraulic pump fault diagnosis model self-learning based on asynchronous parallel reinforcement learning completes the target task.
In order to further evaluate the self-learning effect of the hydraulic pump fault diagnosis model, the data obtained before the Softmax layer of the fault diagnosis network generated by the agent are likewise used, and the fault diagnosis effect is visualized with two common dimensionality-reduction methods: t-distributed stochastic neighbor embedding (t-SNE) and principal component analysis (PCA).
FIGS. 11 and 12 show the diagnostic effect of the fault diagnosis model obtained by the self-learning agent at different learning stages; the left side shows the two- and three-dimensional t-SNE visualization results and the right side the two- and three-dimensional PCA results. It can be seen that at the beginning of learning the obtained fault diagnosis model cannot be used for fault diagnosis at all, with all fault modes mixed together, whereas after 100 rounds of learning the model obtained by the agent diagnoses faults well, with the different fault categories clearly separable.
For the hydraulic pump fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, random search and Bayesian optimization were likewise adopted for comparison tests. Each method was repeated 5 times under the same conditions, and the comparison results are shown in FIG. 13.
From graphs (a)-(e) above it can be seen that, although the proposed method still consumes slightly less time than random search, the speed improvement is less pronounced than in the gearbox self-learning embodiment. This is because the hydraulic pump data are relatively less complex and place lower demands on the fault diagnosis model, so a random-search-based method can also achieve a good result. However, the detailed time-consumption comparison shows that the invention still has a clear advantage.
Throughout the hydraulic pump fault diagnosis model self-learning process, the fault diagnosis model obtained works well and the self-learning time consumption is the lowest. This verifies that the asynchronous-parallel-reinforcement-learning-based fault diagnosis model self-learning method has good universality and can be applied quickly and conveniently to fault diagnosis tasks on a variety of different data.
In summary, the present invention can achieve the following technical effects:
1. In terms of the accuracy and stability of fault diagnosis model self-learning, the asynchronous-parallel-reinforcement-learning-based fault diagnosis self-learning method of the invention is superior to traditional model self-learning methods such as Bayesian optimization and random search. This is because the proposed method realizes automatic optimization of the various hyperparameters of the fault diagnosis model through environment-agent interaction and maximization of the cumulative reward function, improving the model's accuracy; meanwhile, the learning mechanism of asynchronous parallel reinforcement learning makes the self-learned fault diagnosis model more stable, realizing accurate and stable diagnosis of multiple fault modes.
2. The asynchronous-parallel-reinforcement-learning-based fault diagnosis self-learning method of the invention consumes less learning time than traditional model self-learning methods such as Bayesian optimization and random search. The invention adopts an asynchronous parallel approach in which multiple Actor-Critic reinforcement learning algorithms run simultaneously on the threads of a CPU; the reinforcement learning process on each thread is independent and identically distributed, and the data sampled simultaneously from multiple threads break the correlation between the data obtained by a single Actor-Critic algorithm, greatly improving the convergence of the algorithm. Meanwhile, multi-thread parallel training greatly improves the computational efficiency of the asynchronous-parallel self-learning method, so its self-learning efficiency exceeds that of traditional model self-learning methods.
3. The invention aims to provide a universal self-learning procedure and method for fault diagnosis models. Traditional fault diagnosis model self-learning methods have certain object and data limitations and cannot meet the current demand for universally applicable fault diagnosis model self-learning. Therefore, the invention provides the fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning; by constructing a universal fault diagnosis model self-learning framework, consisting mainly of the actor network structure, the critic network structure and other related parameters in the algorithm training and optimization process, the method can be applied to the self-learning of fault diagnosis models for different equipment and offers strong application convenience.
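As a hedged illustration of the asynchronous parallel mechanism summarized in effects 1 and 2, the sketch below shows several worker threads that each pull parameters from a shared global network, run one stand-in actor-critic rollout that maximizes a discounted cumulative reward, and push updates back asynchronously. All names, the discount factor and the dummy update rule are assumptions for illustration only, not the invention's implementation.

```python
# Minimal sketch, assuming a dummy rollout and update rule; not the
# invention's code. Shows asynchronous parallel workers sharing one
# global parameter store, as in an A3C-style scheme.
import random
import threading

GAMMA = 0.9  # discount factor (illustrative assumption)

def discounted_return(rewards, gamma=GAMMA):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

class GlobalNet:
    """Shared parameter store; asynchronous pushes are serialized by a lock."""
    def __init__(self):
        self.params = {"w": 0.0}
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return dict(self.params)

    def push(self, delta):
        with self.lock:
            self.params["w"] += delta

def worker(gnet, episodes=50):
    for _ in range(episodes):
        local = gnet.pull()                            # sync from global network
        rewards = [random.random() for _ in range(5)]  # stand-in rollout
        g = discounted_return(rewards)
        gnet.push(0.01 * (g - local["w"]))             # stand-in parameter update

gnet = GlobalNet()
threads = [threading.Thread(target=worker, args=(gnet,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(gnet.params)
```

Because each thread samples its own episodes, the experience pooled at the global network is decorrelated, which is the source of the convergence and efficiency gains stated above.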
Although the present invention has been described in detail hereinabove, the present invention is not limited thereto, and various modifications can be made by those skilled in the art in light of the principle of the present invention. Thus, modifications made in accordance with the principles of the present invention should be understood to fall within the scope of the present invention.

Claims (10)

1. A fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning, comprising the following steps:
configuring N fault diagnosis model self-learning agents for simultaneously running respective actor-critic reinforcement learning algorithms on multiple CPU threads;
configuring N fault diagnosis model self-learning interactive environments, wherein each fault diagnosis model self-learning interactive environment interacts with a corresponding fault diagnosis model self-learning agent;
configuring a global network for synchronizing global network parameters to the N fault diagnosis model self-learning agents;
and learning the network parameters of the fault diagnosis model through multiple rounds of operation among the fault diagnosis model self-learning agents, the fault diagnosis model self-learning interactive environments and the global network;
wherein each round of operation among a fault diagnosis model self-learning agent, its fault diagnosis model self-learning interactive environment and the global network comprises the following steps:
after each fault diagnosis model self-learning agent receives the global network parameters synchronized from the global network, updating its own network parameters with the global network parameters and running the actor-critic reinforcement learning algorithm with the updated network parameters, so that each fault diagnosis model self-learning interactive environment interacts with its corresponding fault diagnosis model self-learning agent to obtain one set of fault diagnosis model network parameters;
each fault diagnosis model self-learning interactive environment asynchronously uploading, to the global network, the network parameters obtained after its corresponding fault diagnosis model self-learning agent runs the actor-critic reinforcement learning algorithm;
and the global network selecting the optimal network parameters from the network parameters uploaded after the fault diagnosis model self-learning agents run the actor-critic reinforcement learning algorithm, and synchronizing the selected optimal network parameters to each fault diagnosis model self-learning agent again so that each agent updates its network parameters, whereby each fault diagnosis model self-learning interactive environment interacts with its corresponding agent to obtain the next set of fault diagnosis model network parameters.
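The selection-and-resynchronization step of claim 1 can be pictured with a minimal sketch: environments asynchronously upload (parameters, diagnosis score) pairs, and the global network keeps the best-scoring parameters to synchronize back to every agent for the next round. The class and method names are assumptions for illustration.

```python
# Minimal sketch, assuming accuracy is the selection criterion; names are
# hypothetical, not the invention's API.
import threading

class GlobalParameterStore:
    """Keeps the best-scoring fault diagnosis model parameters seen so far."""
    def __init__(self):
        self.best_params = None
        self.best_score = float("-inf")
        self.lock = threading.Lock()

    def upload(self, params, score):
        """Asynchronous upload from one self-learning interactive environment."""
        with self.lock:
            if score > self.best_score:
                self.best_params, self.best_score = params, score

    def synchronize(self):
        """Optimal parameters pushed back to every agent for the next round."""
        with self.lock:
            return self.best_params

store = GlobalParameterStore()
store.upload({"learning_rate": 1e-3, "n_layers": 3}, score=0.92)
store.upload({"learning_rate": 5e-4, "n_layers": 4}, score=0.95)
assert store.synchronize()["n_layers"] == 4  # best-scoring parameters win
```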
2. The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning according to claim 1, wherein each fault diagnosis model self-learning interactive environment diagnoses the fault diagnosis model network parameters obtained each time and successively uploads the network parameters together with the diagnosis results to the global network, so that the global network selects the optimal network parameters from the network parameters obtained after each fault diagnosis model self-learning agent runs the actor-critic reinforcement learning algorithm.
3. The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning according to claim 2, wherein each fault diagnosis model self-learning interactive environment comprises:
an agent interaction module for receiving and parsing an action a output by the fault diagnosis model self-learning agent and generating fault diagnosis model network parameters according to the action a;
and a result evaluation module for diagnosing the fault diagnosis model network parameters generated by the agent interaction module according to the action a, successively uploading the network parameters together with the diagnosis results to the global network, and generating a state S transmitted to the fault diagnosis model self-learning agent.
4. The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning of claim 3, wherein the diagnosis of the fault diagnosis model network parameters generated by the agent interaction module according to the action a comprises:
loading the fault diagnosis model network parameters into the fault diagnosis model,
and evaluating the fault diagnosis model loaded with the network parameters by using training data and validation data so as to determine the fault diagnosis accuracy.
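A minimal sketch of the interactive environment described in claims 3 and 4 may look as follows: the agent interaction module parses the action a into fault diagnosis model parameters, and the result evaluation module evaluates the resulting model to produce an accuracy, a reward and the next state S. The hyperparameter names and the dummy accuracy computation are assumptions for illustration; in the real method the evaluation trains the diagnosis model on training data and measures accuracy on validation data.

```python
# Minimal sketch, assuming a two-dimensional action and a dummy accuracy
# score; names are hypothetical, not the invention's API.
def parse_action(action):
    """Agent interaction module: map a continuous action a to model parameters."""
    n_layers = max(int(2 + action[0]), 1)
    n_units = max(int(16 + 8 * action[1]), 4)
    return {"n_layers": n_layers, "n_units": n_units}

def evaluate(model_params, train_data, val_data):
    """Result evaluation module: in the real method this loads the parameters
    into the fault diagnosis model, trains on train_data and measures accuracy
    on val_data; a dummy score stands in here so the sketch runs."""
    return min(0.5 + 0.01 * model_params["n_units"] / model_params["n_layers"], 0.99)

def env_step(action, train_data=None, val_data=None):
    params = parse_action(action)
    accuracy = evaluate(params, train_data, val_data)
    reward = accuracy                                            # reward from diagnosis result
    state = [params["n_layers"], params["n_units"], accuracy]    # next state S
    return state, reward

print(env_step([1.0, 2.0]))  # -> next state S and reward for one action
```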
5. The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning according to claim 2 or 3, wherein each fault diagnosis model self-learning interactive environment further comprises a judging module for judging whether the parameters of the fault diagnosis model are complete.
6. The asynchronous parallel reinforcement learning-based fault diagnosis model self-learning method according to claim 2 or 3, wherein when the fault diagnosis model parameters are judged to be incomplete, the incomplete fault diagnosis model parameters are discarded.
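A hedged sketch of the completeness check in claims 5 and 6: parameter sets lacking any required field are judged incomplete and discarded. The required field names are assumptions for illustration.

```python
# Minimal sketch, assuming these three required fields; not the invention's
# actual parameter schema.
REQUIRED_FIELDS = ("n_layers", "n_units", "learning_rate")

def is_complete(params):
    """Judging module: a parameter set must contain every required field."""
    return all(params.get(k) is not None for k in REQUIRED_FIELDS)

def keep_complete(candidates):
    """Incomplete fault diagnosis model parameters are discarded."""
    return [p for p in candidates if is_complete(p)]

print(keep_complete([
    {"n_layers": 3, "n_units": 16, "learning_rate": 1e-3},  # kept
    {"n_layers": 2},                                         # discarded
]))
```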
7. The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning according to claim 2 or 3, wherein the fault diagnosis model self-learning interactive environment is software running on a CPU thread.
8. The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning according to claim 1, 2 or 4, wherein each fault diagnosis model self-learning agent comprises an actor policy network and a critic value network;
the actor policy network processes a state S from the matched fault diagnosis model self-learning interactive environment according to a Gaussian policy and determines an action a for modifying the current fault diagnosis model architecture;
and the critic value network processes the state S from the fault diagnosis model self-learning interactive environment, outputs a value function, evaluates the action a determined by the actor policy network by using the value function, and outputs the evaluated action a to the matched fault diagnosis model self-learning interactive environment, so that the fault diagnosis model self-learning interactive environment generates the corresponding fault diagnosis model network parameters according to the evaluated action a.
9. The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning according to claim 8, wherein the result evaluation module generates a reward value according to the diagnosis result and sends the reward value to the critic value network.
10. The fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning according to claim 8, wherein the actor policy network and the critic value network are both constructed using recurrent neural networks.
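Claims 8 to 10 together can be sketched as an actor-critic pair built on recurrent layers: the actor outputs the mean and standard deviation of a Gaussian policy over architecture-modifying actions, and the critic outputs a state value. Layer sizes, the action dimension and the use of GRUs (one common recurrent layer) are assumptions for illustration, not the structure fixed by the claims.

```python
# Minimal sketch, assuming GRU recurrent layers and illustrative dimensions;
# not the invention's actual network definition.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim=8, hidden_dim=32, action_dim=4):
        super().__init__()
        self.actor_rnn = nn.GRU(state_dim, hidden_dim, batch_first=True)
        self.mu = nn.Linear(hidden_dim, action_dim)         # Gaussian mean
        self.log_sigma = nn.Linear(hidden_dim, action_dim)  # Gaussian log-std
        self.critic_rnn = nn.GRU(state_dim, hidden_dim, batch_first=True)
        self.value = nn.Linear(hidden_dim, 1)               # state value

    def forward(self, state_seq):
        h_a, _ = self.actor_rnn(state_seq)
        mu = self.mu(h_a[:, -1])
        sigma = self.log_sigma(h_a[:, -1]).exp()
        dist = torch.distributions.Normal(mu, sigma)        # Gaussian policy
        action = dist.sample()                              # action a
        h_c, _ = self.critic_rnn(state_seq)
        value = self.value(h_c[:, -1])
        return action, dist.log_prob(action).sum(-1), value

# Example: one state sequence of length 5 with 8 features per step.
net = ActorCritic()
a, logp, v = net(torch.randn(1, 5, 8))
```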
CN202110109341.2A 2021-01-27 2021-01-27 Fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning Pending CN112801272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110109341.2A CN112801272A (en) 2021-01-27 2021-01-27 Fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning

Publications (1)

Publication Number Publication Date
CN112801272A true CN112801272A (en) 2021-05-14

Family

ID=75812075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110109341.2A Pending CN112801272A (en) 2021-01-27 2021-01-27 Fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning

Country Status (1)

Country Link
CN (1) CN112801272A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020230137A1 (en) * 2019-05-16 2020-11-19 B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University System and method for automated multi-objective policy implementation, using reinforcement learning
US20210004647A1 (en) * 2019-07-06 2021-01-07 Elmira Amirloo Abolfathi Method and system for training reinforcement learning agent using adversarial sampling
CN110740054A (en) * 2019-07-17 2020-01-31 东南大学 data center virtualization network fault diagnosis method based on reinforcement learning
CN110750096A (en) * 2019-10-09 2020-02-04 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in static environment
CN111625992A (en) * 2020-05-21 2020-09-04 中国地质大学(武汉) Mechanical fault prediction method based on self-optimization deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VOLODYMYR MNIH ET AL.: "Asynchronous Methods for Deep Reinforcement Learning", Proceedings of the 33rd International Conference on Machine Learning *
JIANG QIANG ET AL.: "Research on Fault Diagnosis Methods Based on Deep Learning", Computer Simulation *

Similar Documents

Publication Publication Date Title
Li et al. The emerging graph neural networks for intelligent fault diagnostics and prognostics: A guideline and a benchmark study
Li et al. Fault diagnostics between different type of components: A transfer learning approach
CN109635677B (en) Compound fault diagnosis method and device based on multi-label classification convolutional neural network
Fan et al. A novel machine learning method based approach for Li-ion battery prognostic and health management
CN112131760A (en) CBAM model-based prediction method for residual life of aircraft engine
CN114048600A (en) Digital twin-driven multi-model fusion industrial system anomaly detection method
CN114048769A (en) Multi-source multi-domain information entropy fusion and model self-optimization method for bearing fault diagnosis
Li et al. One-shot neural architecture search for fault diagnosis using vibration signals
CN115422814B (en) Digital twin-driven closed-loop optimization design method for complex electromechanical product
CN111898686A (en) Bearing fault identification method based on gated cyclic unit network
CN112347923A (en) Roadside end pedestrian track prediction algorithm based on confrontation generation network
CN110757510A (en) Method and system for predicting remaining life of robot
Zhou et al. Automated model generation for machinery fault diagnosis based on reinforcement learning and neural architecture search
Nasser et al. A hybrid of convolutional neural network and long short-term memory network approach to predictive maintenance
CN112763967A (en) BiGRU-based intelligent electric meter metering module fault prediction and diagnosis method
CN116223020A (en) Intelligent fault diagnosis method for gearbox based on Li-MDCAN
CN113255890A (en) Reinforced learning intelligent agent training method based on PPO algorithm
CN116028876A (en) Rolling bearing fault diagnosis method based on transfer learning
Remadna et al. An overview on the deep learning based prognostic
CN112801272A (en) Fault diagnosis model self-learning method based on asynchronous parallel reinforcement learning
CN114492150A (en) Power distribution network typical service scene early warning method based on digital twin
CN116662743A (en) Engine residual life prediction method based on multi-mode deep learning
CN113095466A (en) Algorithm of satisfiability model theoretical solver based on meta-learning model
Guo et al. Remaining Useful Life Prediction of Mechanical Equipment Based on Temporal Convolutional Network and Asymmetric Loss Function
Alam et al. Remaining useful life estimation using event data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination