CN113988177A - Water quality sensor abnormal data detection and fault diagnosis method - Google Patents

Water quality sensor abnormal data detection and fault diagnosis method

Info

Publication number
CN113988177A
Authority
CN
China
Prior art keywords
data
hyperplane
network
action
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111255726.6A
Other languages
Chinese (zh)
Inventor
蔡倩倩
朱雅璐
孟伟
麦达明
鲁仁全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202111255726.6A
Publication of CN113988177A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a water quality sensor abnormal data detection and fault diagnosis method, which comprises the following steps: collecting N unlabeled sample water quality sensor data sets and preprocessing them; constructing a deep reinforcement learning network comprising an Alex convolutional neural network and an Actor_Critic network; establishing the environment with which the agent in reinforcement learning interacts, and setting the action and the obtained return of each interaction between the agent and the environment; inputting sample data into the deep reinforcement learning network and performing iterative training until the total return value converges stably, then extracting the network model parameters and saving the optimal model; inputting the unlabeled sample sensor data set to be detected into the model to generate a plurality of hyperplanes; dividing the data set into positive and negative regions of different degrees; detecting data points appearing in the negative region of a hyperplane of lower accuracy and regarding them as abnormal data; and recording the corresponding sensor when data points appear in the negative region of a hyperplane of lower accuracy multiple times, and judging that the sensor may have failed.

Description

Water quality sensor abnormal data detection and fault diagnosis method
Technical Field
The invention relates to the field of abnormal data detection of water quality sensors, in particular to a water quality sensor abnormal data detection and fault diagnosis method based on reinforcement learning.
Background
Environmental protection depends on water resource protection, and water resource protection depends on water pollution control. In water pollution prevention and treatment, water quality sensors are mainly used to detect the important indices that reflect the degree of water pollution, so that the pollution level can be monitored. The key to water pollution detection is to ensure the accuracy and validity of the sensor data; therefore, detecting abnormal data in the raw data collected by the sensors is particularly important. Sensor abnormal data refers to data that are inconsistent with the majority of the data in a data set or deviate from normal behavior patterns. A common detection approach uses probability statistical models: probability gives the distribution of the population from which sample properties are deduced, and statistics verifies the hypothesis about the population distribution from the samples. Machine learning methods, among which clustering and support vector machine methods are common, are characterized by strong generalization capability of the model, but they usually require data samples with historical abnormal data labels. However, most abnormal data in actual production are unlabeled. Therefore, there is a need for an abnormal data detection method for unlabeled data samples.
The convolutional neural network is a feedforward neural network mostly applied in the field of image processing; its convolutional layers mainly extract the characteristics of the input data and abstract the implicit relations in the raw data through convolution kernels, while the pooling layers mainly perform down-sampling, i.e. reduce the dimensionality of the feature data and reduce overfitting. Reinforcement learning is a field of machine learning in which an agent performs actions in an environment, obtains corresponding rewards and punishments, and gradually iterates and optimizes under this stimulus to achieve the maximum benefit, i.e. it learns habitual behaviors that pursue the maximum expected benefit; the approach has broad applicability.
Disclosure of Invention
The invention aims to provide a water quality sensor abnormal data detection and fault diagnosis method, which utilizes a convolutional neural network to extract data difference characteristics of unlabeled sample data, constructs an environment based on probability density as a standard, and classifies the sample data by reinforcement learning so as to achieve the purposes of water quality sensor abnormal data detection and fault diagnosis.
In order to realize the task, the invention adopts the following technical scheme:
a water quality sensor abnormal data detection and fault diagnosis method comprises the following steps:
step 1, collecting N unlabeled sample water quality sensor data sets {D_1, D_2, ..., D_N} and preprocessing them, each data set D_i comprising detection data of m time segments, D_i = [V_1, V_2, ..., V_m]; wherein V_i is a multidimensional data point and represents the water quality condition detected by the water quality sensor in a certain time period;
the data preprocessing process comprises the following steps:
for a single data set D_i, randomly extracting r² vectors and synthesizing n r×r-dimensional tensors, one for each of the n single-index dimensions contained, which are used as the initial state S of a single sampling episode of deep reinforcement learning;
step 2, constructing a deep reinforcement learning network
Step 2.1, the deep reinforcement learning network comprises an Alex convolutional neural network and an Actor _ Critic network, the Alex convolutional neural network is used for extracting data difference characteristics between data points, and the Actor _ Critic network is used as a decision and evaluation network;
the Alex convolutional neural network comprises five convolutional layers from front to back, each convolutional layer is provided with a convolution kernel, an activation function ReLU and a pooling layer are arranged between adjacent convolutional layers, and a smoothing layer Flatten is connected behind the last pooling layer to realize the transition from the convolutional layers to the fully-connected layers; an adaptive average pooling layer is connected between the convolutional neural network and the Actor_Critic network;
the Actor_Critic network comprises a decision network Actor and an evaluation network Critic; the decision network Actor comprises two output layers which respectively output the parameter mean value μ and the parameter standard deviation σ, forming the Gaussian probability distribution used to generate the action and increasing the exploration capability of the action; wherein a is the output action, expressed as the weight W and the deviation b of the generated hyperplane; the evaluation network Critic comprises two hidden layers and an output layer and outputs the evaluation value function of state S; the larger the evaluation value function, the more optimal the state S;
step 2.2, the decision network Actor outputs n+1 mean values μ and standard deviations σ, where n is the dimension of a multidimensional data point V_i, i.e. the number of measurement indices of the sensor data; n of the Gaussian probability distributions generate the weight W = [W_1, W_2, ..., W_n] of the hyperplane U, and one Gaussian probability distribution generates the deviation b of the hyperplane U, with U = [W_1, W_2, ..., W_n]·V_i + b; the hyperplane U divides the data set into a positive region, a negative region and a hyperplane region, wherein the positive region contains normal data points, the negative region contains abnormal data points, and the hyperplane region contains the points on the hyperplane;
step 3, establishing an environment for interaction of the agent in reinforcement learning according to the probability density characteristics of the multidimensional data points, and setting the action and the obtained return of each interaction of the agent and the environment; the action is a hyperplane for classifying the multi-dimensional data points each time, and the obtained return is a value for measuring the quality of the classification effect of the hyperplane generated at this time;
step 4, inputting sample data into the deep reinforcement learning network and performing iterative training until the total return value converges stably, then extracting the network model parameters and saving the optimal model;
step 5, inputting a data set of the unlabeled sample sensor to be detected into a model to generate a plurality of hyperplanes;
according to different hyperplanes, dividing a label-free sample sensor data set into positive and negative areas with different degrees: the positive area is normal data, and the negative area is abnormal data; detecting data points appearing in a negative area of the hyperplane with low accuracy, and regarding the data points as abnormal data; and recording the corresponding sensor when the data point appears in the negative area of the hyperplane with lower accuracy for multiple times, and judging that the sensor possibly fails.
Furthermore, in the Alex convolutional neural network, a convolution kernel of dimension 5×n is arranged in each convolutional layer; the convolutional layer output dimension formula is
o = (w - k + 2p) / s + 1,
where w is the input size r (the input being n tensors of r×r dimension), k is the convolution kernel dimension 5, p is the padding 0, and s is the step size 1; the pooling layer output dimension formula is
o = (w - k) / s + 1,
where w is the convolutional layer output dimension, k is the pooling window dimension 3, and s is the step size 1.
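For illustration only, these two output-size formulas can be checked with a short Python sketch; the function names and the example value r = 40 are assumptions of the sketch, not part of the invention:

def conv_out(w: int, k: int = 5, p: int = 0, s: int = 1) -> int:
    """Convolution output size: o = (w - k + 2p) / s + 1."""
    return (w - k + 2 * p) // s + 1

def pool_out(w: int, k: int = 3, s: int = 1) -> int:
    """Pooling output size: o = (w - k) / s + 1."""
    return (w - k) // s + 1

# Five conv+pool stages as described: each stage shrinks the r x r map by 6,
# so after five stages the spatial size is r - 30 and the flattened
# dimension is (r - 30) ** 2.
r = 40  # assumed example window size
w = r
for _ in range(5):
    w = pool_out(conv_out(w))
print(w, (r - 30) ** 2 == w * w)  # prints: 10 True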
Further, the establishing an environment for interaction of the agent in reinforcement learning according to the probability density characteristic of the multidimensional data points, and setting actions and obtained rewards of each interaction of the agent and the environment, includes:
step 3.1, a Gaussian function is adopted as the window function of the kernel density estimation method, so that sample points closer to the center of the sample region receive a larger counting weight; the probability density estimate is therefore
p_i = (1/m) Σ_{j=1}^{m} (1/(h√(2π))^n) exp(-||V_i - V_j||² / (2h²)),   (Equation 1)
where h is the window width and V̄ = (1/m) Σ_{j=1}^{m} V_j represents the mean value;
the Mahalanobis distance between multidimensional data points is
d_ij = √((V_i - V_j)^T S^(-1) (V_i - V_j)),
where S is the covariance matrix
S = (1/m) Σ_{j=1}^{m} (V_j - V̄)(V_j - V̄)^T;
the probability density estimate of Equation 1 can then be written as
p_i = (1/m) Σ_{j=1}^{m} (1/(h√(2π))^n) exp(-d_ij² / (2h²)),   (Equation 2)
where d_ij is the Mahalanobis distance between the multidimensional data points V_i and V_j;
step 3.2, introducing a segmentation function:
f(a) = 1 if a > 0, f(a) = 0 if a = 0, f(a) = -1 if a < 0;
that is, when the function input a is larger than 0 the output is 1; when the input equals 0 the output is 0; when the input is less than 0 the output is -1; for each data point vector V_i (i = 1, 2, 3, ..., m) of the selected data set D_i = [V_1, V_2, ..., V_m], calculate f([W_1, W_2, ..., W_n]·V_i + b); when its value is 1, the multidimensional data point V_i is stored in the positive region F+ and its label is set to 1; when the value is -1, it is stored in the negative region F- and the label is set to -1; when the value is 0, the data point lies on the hyperplane and is stored in the hyperplane region F; these calculations are performed by the environment when the agent, i.e. the network, interacts with the environment;
step 3.3, setting the reward of the positive region
||R_1|| = Σ_{i=1}^{m} (p_i - ζ),
i.e. the sum of the relative probability densities of m multidimensional data points randomly drawn from the positive region, where p_i is given by Equation 2 and ζ is a set density constant; when the probability density of a data point is greater than ζ a reward is obtained, otherwise a penalty;
step 3.4, setting the penalty of the negative region
||R_2|| = K_p Σ_{i=1}^{k} p_i,
i.e. the amplified sum of the probability densities of k data points randomly drawn from the negative region, where p_i is obtained from Equation 2 and K_p is a magnification factor;
step 3.5, the distance between two successive hyperplanes is
D = |W·x_last + b| / ||W||,
where x_last is a point on the previous hyperplane, i.e. it satisfies W_last·x_last + b_last = 0, W is the weight of the current hyperplane, W_last is the weight of the previous hyperplane, b_last is the deviation of the previous hyperplane, and f denotes the segmentation function;
step 3.6, the number of data points lying on the hyperplane is d, i.e. d is the length of the hyperplane region;
step 3.7, setting the penalty of the hyperplane ||R_3|| = D + d; the smaller the transition of the hyperplane (i.e. the smaller D and d), the smaller the penalty;
step 3.8, setting the return of a single action
Reward = ||R_1|| + ||R_2|| + ||R_3||   (Equation 3)
Further, the iterative training on the input sample data until the total return value converges stably, with extraction and saving of the network model parameters, includes:
step 4.1, after preprocessing, the N collected unlabeled sample sensor data sets {D_1, D_2, ..., D_N} randomly and cyclically enter network training, wherein each cycle is one episode;
step 4.2, during a single loop iteration, the data preprocessing of the data set generates the initial state S of a single round; the initial state S is input into the network, which generates an action, i.e. a hyperplane U, dividing the data set into the positive region F+, the negative region F- and the hyperplane region; C data points are randomly selected to calculate the Reward; from the data points of the positive region F+, r² data points are randomly picked and preprocessed to generate the next state S', which is input into the whole network, and the Actor network finally outputs a new hyperplane U', new positive and negative regions and a new Reward; actions are obtained from states and new states from actions continuously until the maximum number of steps max_ep_step of the single round is reached;
step 4.3, storing the state, the action and the return of each step, and training the network after the sampling of one round is finished;
step 4.4, obtaining the actual value function of each step by the Monte Carlo method, G_t = R_t + γR_{t+1} + γ²R_{t+2} + ... + γ^(max_ep_step-t) R_{max_ep_step}, where γ is the discount factor and R_t is the return obtained after the t-th action, given by Equation 3;
step 4.5, updating the weights of the whole network according to the optimization of the action strategy and the fitting of the evaluation network to the actual value function;
step 4.6, calculating the total return R_all of the new round generated by the updated network and continuing training according to step 4.2;
step 4.7, cyclically sampling and training at random over the N unlabeled sample sensor data sets {D_1, D_2, ..., D_N};
step 4.8, when the iterative training reaches stable convergence of the total return R_all, saving the network model; according to the gradient descent principle the total return is then maximal, i.e. the model is optimal.
Further, in step 4.5, the loss function of the action strategy optimization is L_a = E(log π_θ(a|s) · V(s, a)), where E denotes expectation, π_θ(a|s) is the action strategy probability distribution generated at each step and V(s, a) is the output evaluation value of the evaluation network Critic; the loss function of the value function fitting is L_c = (G_t - V(s, a))².
Further, in step 5 the single-round maximum of max_ep_step hyperplanes {U_1, U_2, ..., U_max_ep_step} is generated; owing to the exploratory nature of reinforcement learning, the accuracies of the hyperplanes satisfy U_1 < U_2 < ... < U_max_ep_step; the hyperplanes of lower accuracy are determined by setting an accuracy threshold.
Compared with the prior art, the invention has the following technical characteristics:
1. The method takes the probability characteristic of the sample data as the standard, utilizes the Alex convolutional neural network to extract the data difference characteristics between data points, and uses the Actor_Critic network as the decision and evaluation network; by iterating the network weights and optimizing the action strategy it generates an optimal group of classification hyperplanes and classifies the data into normal data, abnormal data, secondary abnormal data and the like. This solves the problem of label-free training on data collected in actual engineering, realizes a more accurate division of the data, and achieves the purposes of monitoring abnormal data of the water quality sensor and diagnosing whether the sensor fails.
2. The method is used for training a label-free sample, and optimizing the model based on probability density distribution of data points, so that detection data can be effectively classified, and further, detection and fault diagnosis of abnormal data are realized; the invention can realize universal applicability and generalization under the condition of sufficient training samples.
Drawings
FIG. 1 is a schematic illustration of pre-processing a water quality sensor data set;
FIG. 2 is a network architecture diagram of the method of the present invention;
fig. 3 is a schematic flow chart of network training in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, specific technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to the attached drawings, the invention provides a water quality sensor abnormal data detection and fault diagnosis method, which comprises the following steps:
Step 1, collecting N unlabeled sample water quality sensor data sets {D_1, D_2, ..., D_N} and preprocessing them, as shown in FIG. 1; each data set D_i (i = 1, 2, ..., N) contains detection data of m time segments, i.e. D_i = [V_1, V_2, ..., V_m], and V_i (i = 1, 2, ..., m) is a multidimensional data point representing the water quality condition detected by the water quality sensor in a certain time period, with V_i = [x_1, x_2, ..., x_n]^T, where x_i (i = 1, 2, ..., n) is the detection data of a single index in that time period; the detection data of the single indices are unlabeled samples.
The data preprocessing process comprises the following steps:
For a single data set D_i, r² vectors V_j = [x_1, x_2, ..., x_n]^T are randomly extracted and synthesized into n r×r-dimensional tensors, which contain the n single-index dimensions and serve as the initial state S of a single sampling episode of deep reinforcement learning, see FIG. 1; the data of the same index in different time periods are thereby synthesized into an r×r matrix, which facilitates the extraction of data difference characteristics.
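A minimal numpy sketch of this preprocessing step is given below; the function name build_initial_state and the shapes assumed for D_i are illustrative assumptions, not a prescribed implementation:

import numpy as np

def build_initial_state(D: np.ndarray, r: int, rng=None) -> np.ndarray:
    """D has shape (m, n): m time segments, n measurement indices.
    Randomly draw r*r data points and stack them, per index, into n r x r
    matrices, giving the initial state S of shape (n, r, r)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = D.shape
    idx = rng.choice(m, size=r * r, replace=(r * r > m))
    V = D[idx]                   # (r*r, n) randomly drawn multidimensional points
    return V.T.reshape(n, r, r)  # one r x r matrix per index dimension

# usage: D_i = np.random.rand(500, 6); S = build_initial_state(D_i, r=8)  -> shape (6, 8, 8)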
Step 2, constructing a deep reinforcement learning network
Step 2.1, the deep reinforcement learning network comprises an Alex convolutional neural network and an Actor _ Critic network, the Alex convolutional neural network is used for extracting data difference characteristics between each data point, and the Actor _ Critic network is used as a decision and evaluation network, wherein:
The Alex convolutional neural network designed in this scheme comprises five convolutional layers from front to back, as shown in FIG. 2; each convolutional layer is provided with a convolution kernel of dimension 5×n; an activation function ReLU and a pooling layer are arranged between adjacent convolutional layers, and a smoothing layer Flatten is connected behind the last pooling layer to turn the multidimensional data into one dimension and realize the transition from the convolutional layers to the fully-connected layer. The convolutional layer output dimension formula is
o = (w - k + 2p) / s + 1,
where w is the input size r (the input being n tensors of r×r dimension), k is the convolution kernel dimension 5, p is the padding 0, and s is the step size 1; the pooling layer output dimension formula is
o = (w - k) / s + 1,
where w is the convolutional layer output dimension, k is the pooling window dimension 3, and s is the step size 1. The n r×r-dimensional tensors input to the Alex convolutional network finally yield the output dimension of the smoothing layer of the Alex convolutional neural network, O_dim = (r - 30)². An AdaptiveAvgPool layer, i.e. an adaptive average pooling layer, is connected between the convolutional neural network and the Actor_Critic network, preserving the integrity of the information.
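A hedged PyTorch sketch of such a five-stage feature extractor is given below; the single output channel per convolutional layer and the module name are assumptions of the sketch (the patent fixes only the 5-wide kernels, 3-wide pooling windows, stride 1 and zero padding), and the adaptive average pooling layer placed before the Actor_Critic network is omitted here:

import torch
import torch.nn as nn

class AlexLikeExtractor(nn.Module):
    """Five conv(k=5, s=1, p=0) + ReLU + pool(k=3, s=1) stages, then Flatten.
    Each stage shrinks an r x r map to (r-6) x (r-6); after five stages (r-30) x (r-30)."""
    def __init__(self, n_indices: int, channels: int = 1):
        super().__init__()
        layers, in_ch = [], n_indices
        for _ in range(5):
            layers += [nn.Conv2d(in_ch, channels, kernel_size=5, stride=1, padding=0),
                       nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=1)]
            in_ch = channels
        layers.append(nn.Flatten())
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, n_indices, r, r)
        return self.net(x)         # (batch, (r-30)**2 * channels)

# usage: AlexLikeExtractor(n_indices=6)(torch.rand(1, 6, 40, 40)).shape -> (1, 100)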
The Actor_Critic network comprises a decision network Actor and an evaluation network Critic. The decision network Actor comprises two output layers which output the parameters μ and σ respectively, forming the Gaussian probability distribution used to generate the action:
π(a|s) = (1/(σ√(2π))) exp(-(a - μ)² / (2σ²)),
which increases the exploration capability of the action; here a represents the output action, in the present invention the weight W and the deviation b of the generated hyperplane. The evaluation network Critic comprises two hidden layers and an output layer and outputs the evaluation value function V_π(S) of the state S; the larger V_π(S), the more optimal this state S.
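A minimal PyTorch sketch of the Actor and Critic heads described above; the hidden-layer size and the Softplus used to keep σ positive are assumptions not fixed by the patent:

import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, feat_dim: int, n_indices: int, hidden: int = 128):
        super().__init__()
        out_dim = n_indices + 1                      # n weights W plus one deviation b
        self.mu = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, out_dim))
        self.sigma = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, out_dim), nn.Softplus())
        self.critic = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))  # evaluation value of state S

    def forward(self, feat):
        mu, sigma = self.mu(feat), self.sigma(feat) + 1e-5
        dist = torch.distributions.Normal(mu, sigma)   # Gaussian policy over [W, b]
        action = dist.sample()                         # hyperplane parameters
        return action, dist.log_prob(action).sum(-1), self.critic(feat)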
Step 2.2, the data set of step 1 is preprocessed to generate the initial state S of a single episode, where the dimension of S is n×r×r.
The decision network Actor generates a hyperplane U (U = [W_1, W_2, ..., W_n]·V_i + b; a hyperplane is an (n-1)-dimensional subspace of an n-dimensional linear space, which divides the linear space into two disjoint parts, and is used in the present invention to separate abnormal from non-abnormal data points in the multidimensional space). The hyperplane generated by the Actor_Critic network partitions the data set into a positive region, a negative region and a hyperplane region, wherein the positive region contains the normal data points, the negative region contains the abnormal data points, and the hyperplane region contains the points on the hyperplane.
The decision network Actor outputs n+1 Gaussian probability distributions, i.e. n+1 mean values μ and standard deviations σ, where n is the dimension of a multidimensional data point V_i, i.e. the number of measurement indices of the sensor data; n of the Gaussian probability distributions generate the weight W = [W_1, W_2, ..., W_n] of the hyperplane U, and one Gaussian probability distribution generates the deviation b of the hyperplane U, i.e. U = [W_1, W_2, ..., W_n]·V_i + b, where V_i denotes the water quality condition detected by the water quality sensor in a certain time period, and the hyperplane is given by [W_1, W_2, ..., W_n]·V_i + b = 0.
Step 3, according to the probability density characteristic of the multidimensional data points, i.e. V_i in the invention, the environment with which the agent in reinforcement learning interacts is established, and the action and the obtained return of each interaction between the agent and the environment are set. In the invention, the action refers to the hyperplane used to classify the data points at each step, and the obtained return refers to a value measuring the quality of the classification effect of the hyperplane generated at this step. The environment is relatively independent of the network: the network outputs actions to interact with the environment and outputs the evaluation of the action interaction.
Step 3.1, since the data points V_i are unlabeled and the distribution form of the data is unknown, a nonparametric estimation method is adopted, namely the kernel density estimation method, i.e. the Parzen window density estimation method. A Gaussian function is adopted as the window function of the kernel density estimation, i.e. sample points closer to the center of the sample region receive a larger counting weight, so the probability density estimation formula is
p_i = (1/m) Σ_{j=1}^{m} (1/(h√(2π))^n) exp(-||V_i - V_j||² / (2h²)),   (Equation 1)
where h is the window width and V̄ = (1/m) Σ_{j=1}^{m} V_j indicates the mean value.
The dimension of the sensor detection data increases with the number of indices, presenting a high-dimensional state, and very large correlations exist between the data of different dimensions. The Mahalanobis distance rotates the variables according to the principal components — each principal direction is the direction of an eigenvector and the variance in each direction corresponds to the eigenvalue — so that the dimensions become independent, after which the Euclidean distance is used for the calculation. The Mahalanobis distance therefore retains more of the relations between the dimensions than the Euclidean distance.
Therefore, as used in the present invention, the Mahalanobis distance between the multidimensional sample data points is
d_ij = √((V_i - V_j)^T S^(-1) (V_i - V_j)),
where S is the covariance matrix
S = (1/m) Σ_{j=1}^{m} (V_j - V̄)(V_j - V̄)^T.
The probability density estimate of Equation 1 can then be written as
p_i = (1/m) Σ_{j=1}^{m} (1/(h√(2π))^n) exp(-d_ij² / (2h²)),   (Equation 2)
where d_ij is the Mahalanobis distance between the multidimensional data points V_i and V_j and V̄ is the mean value.
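A numpy sketch of the Mahalanobis-distance kernel density estimate of Equations 1 and 2; the window width h, the regularisation added to the covariance matrix and the exact normalisation constant are assumptions, since only densities relative to the threshold ζ are used later:

import numpy as np

def relative_density(V: np.ndarray, h: float = 1.0) -> np.ndarray:
    """V: (m, n) data points. Returns p_i for every point, a Gaussian-window
    Parzen estimate in which the distances are Mahalanobis distances (Equation 2)."""
    m, n = V.shape
    S = np.cov(V, rowvar=False) + 1e-6 * np.eye(n)       # covariance matrix (centered on the mean)
    S_inv = np.linalg.inv(S)
    diff = V[:, None, :] - V[None, :, :]                 # (m, m, n) pairwise differences
    d2 = np.einsum('ijk,kl,ijl->ij', diff, S_inv, diff)  # squared Mahalanobis distances
    kernel = np.exp(-d2 / (2.0 * h * h))
    norm = (h * np.sqrt(2.0 * np.pi)) ** n
    return kernel.sum(axis=1) / (m * norm)               # p_i for each data point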
Step 3.2, a segmentation function is introduced:
f(a) = 1 if a > 0, f(a) = 0 if a = 0, f(a) = -1 if a < 0;
when the function input is greater than 0 the output is 1; when it equals 0 the output is 0; when it is less than 0 the output is -1. For each data point vector V_i (i = 1, 2, 3, ..., m) of the selected data set D_i = [V_1, V_2, ..., V_m], f([W_1, W_2, ..., W_n]·V_i + b) is calculated; when its value is 1, the multidimensional data point V_i is stored in the positive region F+ and the label is set to 1; when the value is -1, it is stored in the negative region F- and the label is set to -1; when the value is 0, the data point lies on the hyperplane and is stored in the hyperplane region F. These calculations are performed by the environment when the agent, i.e. the network, interacts with the environment.
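A small numpy sketch of this partitioning step, using the sign function as the segmentation function f; the helper name partition is illustrative:

import numpy as np

def partition(V: np.ndarray, W: np.ndarray, b: float):
    """Split data points V (m, n) by the hyperplane W . V_i + b using f = sign."""
    side = np.sign(V @ W + b)        # 1 -> positive region, -1 -> negative, 0 -> on the hyperplane
    return V[side > 0], V[side < 0], V[side == 0]   # F+, F-, F (hyperplane region)

# usage: F_pos, F_neg, F_on = partition(V, W, b); labels are +1, -1 and 0 respectively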
Step 3.3, the reward of the positive region is set as
||R_1|| = Σ_{i=1}^{m} (p_i - ζ),
i.e. the sum of the relative probability densities of m multidimensional data points randomly drawn from the positive region, where p_i is given by Equation 2 and ζ is a set density constant; when the probability density of a data point is greater than ζ a reward is obtained, otherwise a penalty. The denser the data points placed in the correct (positive) region, the larger ||R_1|| and the larger the reward.
Step 3.4, the penalty of the negative region is set as
||R_2|| = K_p Σ_{i=1}^{k} p_i,
i.e. the amplified sum of the probability densities of k data points randomly drawn from the negative region, where p_i is obtained from Equation 2 and the magnification factor K_p constrains the degree of density of the negative region. The sparser the data points in the negative region, the smaller ||R_2|| and the smaller the penalty.
Step 3.5, the distance between two successive hyperplanes is
D = |W·x_last + b| / ||W||,
where x_last is a point on the previous hyperplane, i.e. it satisfies W_last·x_last + b_last = 0, W is the weight of the current hyperplane, W_last is the weight of the previous hyperplane, b_last is the deviation of the previous hyperplane, and f denotes the segmentation function. This term prevents model instability caused by jumps of the hyperplane; the smaller D, the better the hyperplane.
Step 3.6, within the same region, the sparser the data points, the smaller the density between them; thus the fewer the data points in the hyperplane region F, the better. Let the number of data points on the hyperplane be d, i.e. d is the length of the hyperplane region.
Step 3.7, the penalty of the hyperplane is set as ||R_3|| = D + d; the smaller the transition of the hyperplane (i.e. the smaller D and d), the smaller the penalty.
Step 3.8, the return of a single action is set as
Reward = ||R_1|| + ||R_2|| + ||R_3||   (Equation 3)
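A hedged numpy sketch of the single-step return: R_1 rewards dense points in the positive region, R_2 penalises density in the negative region (amplified by K_p), and R_3 penalises hyperplane jumps. Because Equation 3 is written with magnitudes, the sign convention used below (reward minus penalties), as well as the constants ζ and K_p, are assumptions of the sketch:

import numpy as np

def step_reward(p_pos, p_neg, W, b, W_last, b_last, n_on_plane,
                zeta: float = 0.1, K_p: float = 2.0) -> float:
    """p_pos / p_neg: relative densities (Equation 2) of points sampled from the
    positive / negative regions; W, b and W_last, b_last are the current and
    previous hyperplane parameters; n_on_plane is d, the hyperplane-region length."""
    R1 = np.sum(p_pos - zeta)                            # positive-region reward
    R2 = K_p * np.sum(p_neg)                             # negative-region penalty
    x_last = -b_last * W_last / np.dot(W_last, W_last)   # a point on the previous hyperplane
    D = abs(np.dot(W, x_last) + b) / np.linalg.norm(W)   # distance between successive hyperplanes
    R3 = D + n_on_plane                                  # hyperplane penalty
    return float(R1 - R2 - R3)                           # assumed sign convention for Equation 3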
Step 4, for the input sample data, iterative training is performed until the total return value converges stably, and the network model parameters are then extracted and the model saved.
Step 4.1, after preprocessing, the N collected unlabeled sample sensor data sets {D_1, D_2, ..., D_N} randomly and cyclically enter network training, so that the network has better robustness. Each cycle is one episode, i.e. one round, which is a Markov chain.
Step 4.2, during a single loop iteration, the data preprocessing of the data set generates the initial state S of a single round. The initial state S is input into the network, which generates an action, i.e. a hyperplane U, dividing the data set into the positive region F+, the negative region F- and the hyperplane region. C data points are randomly selected to calculate the Reward. From the data points of the positive region F+, r² data points are randomly picked and preprocessed to generate the next state S', which is input into the whole network, and the Actor network finally outputs a new hyperplane U', new positive and negative regions and a new Reward. Actions are continuously obtained from states and new states from actions until the maximum number of steps max_ep_step of the single round is reached.
Step 4.3, the state, action and return of each step are stored, and the network is trained after the sampling of one round is finished.
Step 4.4, the actual value function of each step is obtained by the Monte Carlo method: G_t = R_t + γR_{t+1} + γ²R_{t+2} + ... + γ^(max_ep_step-t) R_{max_ep_step}, where γ is the discount factor and R_t is the return obtained after the t-th action, given by Equation 3.
Step 4.5, the weights of the whole network are updated according to the optimization of the action strategy and the fitting of the evaluation network to the actual value function. The loss function of the action strategy optimization is L_a = E(log π_θ(a|s) · V(s, a)), where π_θ(a|s) is the action strategy probability distribution generated at each step and V(s, a) is the output evaluation value of the evaluation network Critic. The loss function of the value function fitting is L_c = (G_t - V(s, a))².
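A short PyTorch sketch of the Monte Carlo returns G_t and the two loss functions of steps 4.4 and 4.5; the minus sign on the actor loss, needed so that gradient descent maximises the expected return, and the detaching of V(s, a) in that term are assumptions of the sketch:

import torch

def monte_carlo_returns(rewards, gamma: float = 0.9):
    """G_t = R_t + gamma*R_{t+1} + ... computed backwards over one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return torch.tensor(list(reversed(out)))

def actor_critic_losses(log_probs, values, returns):
    """L_a = E[log pi_theta(a|s) * V(s,a)]  (negated here for gradient descent),
       L_c = (G_t - V(s,a))^2."""
    critic_loss = ((returns - values) ** 2).mean()
    actor_loss = -(log_probs * values.detach()).mean()
    return actor_loss, critic_loss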
Step 4.6, the total return R_all of the new round generated by the updated network is calculated, and training continues according to step 4.2.
Step 4.7, the N unlabeled sample sensor data sets {D_1, D_2, ..., D_N} are cyclically sampled and trained at random.
Step 4.8, when the iterative training reaches stable convergence of the total return R_all, the model is saved; according to the gradient descent principle the total return is then maximal, i.e. the model is optimal.
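The outer loop of steps 4.2-4.6 can be summarised by the following hedged Python sketch; the env object with reset() and step() methods (wrapping the partitioning and reward sketches above), and the assumption that states are already delivered as (1, n, r, r) float tensors, are illustrative, not part of the claimed method:

import torch

def run_episode(env, extractor, agent, optimizer, gamma=0.9, max_ep_step=20):
    """One episode of steps 4.2-4.6. `extractor` and `agent` are the
    AlexLikeExtractor and ActorCritic sketches; monte_carlo_returns and
    actor_critic_losses are the sketches from step 4.4-4.5."""
    S = env.reset()                                          # initial state of the episode
    log_probs, values, rewards = [], [], []
    for _ in range(max_ep_step):                             # single-round maximum steps
        action, log_prob, value = agent(extractor(S))
        S, reward = env.step(action)                         # new hyperplane -> new regions, reward
        log_probs.append(log_prob); values.append(value); rewards.append(reward)
    returns = monte_carlo_returns(rewards, gamma)            # step 4.4
    actor_loss, critic_loss = actor_critic_losses(
        torch.cat(log_probs), torch.cat(values).squeeze(-1), returns)
    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()                    # step 4.5: update the whole network
    optimizer.step()
    return float(sum(rewards))                               # total return R_all of this episode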
Step 5, the unlabeled sample sensor data set to be detected is input into the model, and a single round generates max_ep_step hyperplanes {U_1, U_2, ..., U_max_ep_step}; owing to the exploratory nature of reinforcement learning, the accuracies of the hyperplanes satisfy U_1 < U_2 < ... < U_max_ep_step.
According to the different hyperplanes, the unlabeled data samples are divided into positive and negative regions of different degrees: the positive region is normal data and the negative region is abnormal data; data points appearing in the negative region of a hyperplane of lower accuracy are detected and regarded as abnormal data, where the hyperplanes of lower accuracy can be determined by setting an accuracy threshold.
When data points appear in the negative region of a hyperplane of lower accuracy multiple times, the corresponding sensor is recorded and judged to be possibly faulty.
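A numpy sketch of this final detection and diagnosis step; the accuracy threshold, the vote count min_votes and the heuristic mapping from repeatedly flagged data points to a suspect sensor index are assumptions used only for illustration:

import numpy as np

def detect_and_diagnose(V, hyperplanes, accuracies, acc_threshold=0.8, min_votes=3):
    """V: (m, n) points to check; hyperplanes: list of (W, b); accuracies: accuracy of
    each hyperplane. A point falling into the negative region of a lower-accuracy
    hyperplane is flagged as abnormal; points flagged many times point to a possibly
    faulty sensor (here: the index with the largest deviation from the mean)."""
    votes = np.zeros(len(V), dtype=int)
    for (W, b), acc in zip(hyperplanes, accuracies):
        if acc < acc_threshold:                      # hyperplane of lower accuracy
            votes += (V @ W + b < 0).astype(int)     # negative-region membership
    abnormal = votes > 0
    suspect_sensor = None
    repeated = votes >= min_votes
    if repeated.any():
        dev = np.abs(V[repeated] - V.mean(axis=0)).mean(axis=0)
        suspect_sensor = int(np.argmax(dev))         # index of the possibly faulty sensor
    return abnormal, suspect_sensor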
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (6)

1. A water quality sensor abnormal data detection and fault diagnosis method is characterized by comprising the following steps:
step 1, collecting N unlabeled sample water quality sensor data sets {D_1, D_2, ..., D_N} and preprocessing them, each data set D_i comprising detection data of m time segments, D_i = [V_1, V_2, ..., V_m], wherein V_i is a multidimensional data point and represents the water quality condition detected by the water quality sensor in a certain time period;
the data preprocessing process comprises the following steps:
for a single data set D_i, randomly extracting r² vectors and synthesizing n r×r-dimensional tensors, one for each of the n single-index dimensions contained, which are used as the initial state S of a single sampling episode of deep reinforcement learning;
step 2, constructing a deep reinforcement learning network
Step 2.1, the deep reinforcement learning network comprises an Alex convolutional neural network and an Actor _ Critic network, the Alex convolutional neural network is used for extracting data difference characteristics between data points, and the Actor _ Critic network is used as a decision and evaluation network;
the Alex convolutional neural network comprises five convolutional layers from front to back, each convolutional layer is provided with a convolution kernel, an activation function ReLU and a pooling layer are arranged between adjacent convolutional layers, and a smoothing layer Flatten is connected behind the last pooling layer to realize the transition from the convolutional layers to the fully-connected layers; an adaptive average pooling layer is connected between the convolutional neural network and the Actor_Critic network;
the Actor_Critic network comprises a decision network Actor and an evaluation network Critic; the decision network Actor comprises two output layers which respectively output the parameter mean value μ and the parameter standard deviation σ, forming the Gaussian probability distribution used to generate the action and increasing the exploration capability of the action; wherein a is the output action, expressed as the weight W and the deviation b of the generated hyperplane; the evaluation network Critic comprises two hidden layers and an output layer and outputs the evaluation value function of state S; the larger the evaluation value function, the more optimal the state S;
step 2.2, the decision network Actor outputs n+1 mean values μ and standard deviations σ, where n is the dimension of a multidimensional data point V_i, i.e. the number of measurement indices of the sensor data; n of the Gaussian probability distributions generate the weight W = [W_1, W_2, ..., W_n] of the hyperplane U, and one Gaussian probability distribution generates the deviation b of the hyperplane U, with U = [W_1, W_2, ..., W_n]·V_i + b; the hyperplane U divides the data set into a positive region, a negative region and a hyperplane region, wherein the positive region contains normal data points, the negative region contains abnormal data points, and the hyperplane region contains the points on the hyperplane;
step 3, establishing an environment for interaction of the agent in reinforcement learning according to the probability density characteristics of the multidimensional data points, and setting the action and the obtained return of each interaction of the agent and the environment; the action is a hyperplane for classifying the multi-dimensional data points each time, and the obtained return is a value for measuring the quality of the classification effect of the hyperplane generated at this time;
step 4, inputting sample data into the deep reinforcement learning network and performing iterative training until the total return value converges stably, then extracting the network model parameters and saving the optimal model;
step 5, inputting a data set of the unlabeled sample sensor to be detected into a model to generate a plurality of hyperplanes;
according to different hyperplanes, dividing a label-free sample sensor data set into positive and negative areas with different degrees: the positive area is normal data, and the negative area is abnormal data; detecting data points appearing in a negative area of the hyperplane with low accuracy, and regarding the data points as abnormal data; and recording the corresponding sensor when the data point appears in the negative area of the hyperplane with lower accuracy for multiple times, and judging that the sensor possibly fails.
2. The method for detecting the abnormal data and diagnosing the faults of the water quality sensor according to claim 1, wherein in the Alex convolutional neural network a convolution kernel of dimension 5×n is arranged in each convolutional layer; the convolutional layer output dimension formula is
o = (w - k + 2p) / s + 1,
where w is the input size r (the input being n tensors of r×r dimension), k is the convolution kernel dimension 5, p is the padding 0, and s is the step size 1; the pooling layer output dimension formula is
o = (w - k) / s + 1,
where w is the convolutional layer output dimension, k is the pooling window dimension 3, and s is the step size 1.
3. The method for detecting abnormal data and diagnosing faults of a water quality sensor according to claim 1, wherein the establishing of an environment for interaction of an agent in reinforcement learning according to probability density characteristics of multidimensional data points, and the setting of actions and obtained returns of each interaction of the agent and the environment comprises:
step 3.1, a Gaussian function is adopted as the window function of the kernel density estimation method, so that sample points closer to the center of the sample region receive a larger counting weight; the probability density estimate is therefore
p_i = (1/m) Σ_{j=1}^{m} (1/(h√(2π))^n) exp(-||V_i - V_j||² / (2h²)),   (Equation 1)
where h is the window width and V̄ = (1/m) Σ_{j=1}^{m} V_j represents the mean value;
the Mahalanobis distance between multidimensional data points is
d_ij = √((V_i - V_j)^T S^(-1) (V_i - V_j)),
where S is the covariance matrix
S = (1/m) Σ_{j=1}^{m} (V_j - V̄)(V_j - V̄)^T;
the probability density estimate of Equation 1 can then be written as
p_i = (1/m) Σ_{j=1}^{m} (1/(h√(2π))^n) exp(-d_ij² / (2h²)),   (Equation 2)
where d_ij is the Mahalanobis distance between the multidimensional data points V_i and V_j;
step 3.2, introducing a segmentation function:
f(a) = 1 if a > 0, f(a) = 0 if a = 0, f(a) = -1 if a < 0;
that is, when the function input a is larger than 0 the output is 1; when the input equals 0 the output is 0; when the input is less than 0 the output is -1; for each data point vector V_i (i = 1, 2, 3, ..., m) of the selected data set D_i = [V_1, V_2, ..., V_m], calculate f([W_1, W_2, ..., W_n]·V_i + b); when its value is 1, the multidimensional data point V_i is stored in the positive region F+ and its label is set to 1; when the value is -1, it is stored in the negative region F- and the label is set to -1; when the value is 0, the data point lies on the hyperplane and is stored in the hyperplane region F; these calculations are performed by the environment when the agent, i.e. the network, interacts with the environment;
step 3.3, setting the reward of the positive region
||R_1|| = Σ_{i=1}^{m} (p_i - ζ),
i.e. the sum of the relative probability densities of m multidimensional data points randomly drawn from the positive region, where p_i is given by Equation 2 and ζ is a set density constant; when the probability density of a data point is greater than ζ a reward is obtained, otherwise a penalty;
step 3.4, setting the penalty of the negative region
||R_2|| = K_p Σ_{i=1}^{k} p_i,
i.e. the amplified sum of the probability densities of k data points randomly drawn from the negative region, where p_i is obtained from Equation 2 and K_p is a magnification factor;
step 3.5, the distance between two successive hyperplanes is
D = |W·x_last + b| / ||W||,
where x_last is a point on the previous hyperplane, i.e. it satisfies W_last·x_last + b_last = 0, W is the weight of the current hyperplane, W_last is the weight of the previous hyperplane, b_last is the deviation of the previous hyperplane, and f denotes the segmentation function;
step 3.6, the number of data points lying on the hyperplane is d, i.e. d is the length of the hyperplane region;
step 3.7, setting the penalty of the hyperplane ||R_3|| = D + d; the smaller the transition of the hyperplane (i.e. the smaller D and d), the smaller the penalty;
step 3.8, setting the return of a single action
Reward = ||R_1|| + ||R_2|| + ||R_3||   (Equation 3).
4. The method for detecting abnormal data and diagnosing faults of a water quality sensor according to claim 1, wherein the iterative training on the input sample data until the total return value converges stably, with extraction and saving of the network model parameters, comprises the following steps:
step 4.1, after preprocessing, the N collected unlabeled sample sensor data sets {D_1, D_2, ..., D_N} randomly and cyclically enter network training, wherein each cycle is one episode;
step 4.2, during a single loop iteration, the data preprocessing of the data set generates the initial state S of a single round; the initial state S is input into the network, which generates an action, i.e. a hyperplane U, dividing the data set into the positive region F+, the negative region F- and the hyperplane region; C data points are randomly selected to calculate the Reward; from the data points of the positive region F+, r² data points are randomly picked and preprocessed to generate the next state S', which is input into the whole network, and the Actor network finally outputs a new hyperplane U', new positive and negative regions and a new Reward; actions are obtained from states and new states from actions continuously until the maximum number of steps max_ep_step of the single round is reached;
step 4.3, storing the state, the action and the return of each step, and training the network after the sampling of one round is finished;
step 4.4, obtaining the actual value function of each step by the Monte Carlo method, G_t = R_t + γR_{t+1} + γ²R_{t+2} + ... + γ^(max_ep_step-t) R_{max_ep_step}, where γ is the discount factor and R_t is the return obtained after the t-th action, given by Equation 3;
step 4.5, according to the optimization of the action strategy, the fitting of the evaluation network to the actual value function is carried out, and the weight of the whole network is updated;
step 4.6, calculating the total return R_all of the new round generated by the updated network and continuing training according to step 4.2;
step 4.7, cyclically sampling and training at random over the N unlabeled sample sensor data sets {D_1, D_2, ..., D_N};
step 4.8, when the iterative training reaches stable convergence of the total return R_all, saving the network model; according to the gradient descent principle the total return is then maximal, i.e. the model is optimal.
5. The method for detecting abnormal data and diagnosing faults of a water quality sensor according to claim 4, wherein in step 4.5 the loss function of the action strategy optimization is L_a = E(log π_θ(a|s) · V(s, a)), where E denotes expectation, π_θ(a|s) is the action strategy probability distribution generated at each step and V(s, a) is the output evaluation value of the evaluation network Critic; the loss function of the value function fitting is L_c = (G_t - V(s, a))².
6. The method as claimed in claim 1, wherein the single-round maximum of max_ep_step hyperplanes {U_1, U_2, ..., U_max_ep_step} is generated in step 5; owing to the exploratory nature of reinforcement learning, the accuracies of the hyperplanes satisfy U_1 < U_2 < ... < U_max_ep_step; the hyperplanes of lower accuracy are determined by setting an accuracy threshold.
CN202111255726.6A 2021-10-27 2021-10-27 Water quality sensor abnormal data detection and fault diagnosis method Pending CN113988177A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111255726.6A CN113988177A (en) 2021-10-27 2021-10-27 Water quality sensor abnormal data detection and fault diagnosis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111255726.6A CN113988177A (en) 2021-10-27 2021-10-27 Water quality sensor abnormal data detection and fault diagnosis method

Publications (1)

Publication Number Publication Date
CN113988177A true CN113988177A (en) 2022-01-28

Family

ID=79742547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111255726.6A Pending CN113988177A (en) 2021-10-27 2021-10-27 Water quality sensor abnormal data detection and fault diagnosis method

Country Status (1)

Country Link
CN (1) CN113988177A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166966A (en) * 2023-04-18 2023-05-26 南京哈卢信息科技有限公司 Water quality degradation event detection method based on multi-mode data fusion
CN116336400A (en) * 2023-05-30 2023-06-27 克拉玛依市百事达技术开发有限公司 Baseline detection method for oil and gas gathering and transportation pipeline
CN116336400B (en) * 2023-05-30 2023-08-04 克拉玛依市百事达技术开发有限公司 Baseline detection method for oil and gas gathering and transportation pipeline
CN116451142A (en) * 2023-06-09 2023-07-18 山东云泷水务环境科技有限公司 Water quality sensor fault detection method based on machine learning algorithm

Similar Documents

Publication Publication Date Title
KR102005628B1 (en) Method and system for pre-processing machine learning data
CN113988177A (en) Water quality sensor abnormal data detection and fault diagnosis method
Isa et al. Using the self organizing map for clustering of text documents
CN111553127B (en) Multi-label text data feature selection method and device
CN110009030B (en) Sewage treatment fault diagnosis method based on stacking meta-learning strategy
CN110363230B (en) Stacking integrated sewage treatment fault diagnosis method based on weighted base classifier
CN110940523B (en) Unsupervised domain adaptive fault diagnosis method
CN111353373A (en) Correlation alignment domain adaptive fault diagnosis method
CN111859010B (en) Semi-supervised audio event identification method based on depth mutual information maximization
CN113486578A (en) Method for predicting residual life of equipment in industrial process
CN110766060B (en) Time series similarity calculation method, system and medium based on deep learning
CN110956309A (en) Flow activity prediction method based on CRF and LSTM
CN110826611A (en) Stacking sewage treatment fault diagnosis method based on weighted integration of multiple meta-classifiers
CN112560596A (en) Radar interference category identification method and system
CN115051864B (en) PCA-MF-WNN-based network security situation element extraction method and system
CN114565021A (en) Financial asset pricing method, system and storage medium based on quantum circulation neural network
CN113179276B (en) Intelligent intrusion detection method and system based on explicit and implicit feature learning
CN114003900A (en) Network intrusion detection method, device and system for secondary system of transformer substation
Zhou et al. Credit card fraud identification based on principal component analysis and improved AdaBoost algorithm
Aljundi et al. Continual novelty detection
Isa et al. Text Document Pre-Processing Using the Bayes Formula for Classification Based on the Vector Space Model.
CN111107082A (en) Immune intrusion detection method based on deep belief network
CN116383747A (en) Anomaly detection method for generating countermeasure network based on multi-time scale depth convolution
CN116304941A (en) Ocean data quality control method and device based on multi-model combination
CN115564155A (en) Distributed wind turbine generator power prediction method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination