CN115099321B - Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application - Google Patents

Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application

Info

Publication number
CN115099321B
CN115099321B
Authority
CN
China
Prior art keywords
data
sequence
time
attention
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210687441.8A
Other languages
Chinese (zh)
Other versions
CN115099321A (en)
Inventor
叶柯
周奕希
孔佳玉
曹瀚洋
姜沁琬
李宛欣
韩伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210687441.8A priority Critical patent/CN115099321B/en
Publication of CN115099321A publication Critical patent/CN115099321A/en
Application granted granted Critical
Publication of CN115099321B publication Critical patent/CN115099321B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00Water conservation; Efficient water supply; Efficient water use
    • Y02A20/20Controlling water pollution; Waste water treatment

Abstract

The invention belongs to the field of pollution discharge abnormality monitoring and provides a bidirectional autoregressive unsupervised pretraining fine-tuning pollution discharge abnormality monitoring method comprising the following steps: a multi-channel acquisition and transmission module periodically acquires data at the pollution source discharge port and preprocesses the original multidimensional time series samples; the preprocessed multidimensional time series samples are resampled; a model comprising three parts, data resampling enhancement, an encoder and a decoder, is constructed and pre-trained; small-sample fine-tuning and sequence-point classification are performed on the pre-trained model; and pollution discharge abnormality monitoring is carried out with the model. The method incorporates a network with strong generalization capability to fully extract the more abstract semantic features in the multidimensional time series, so that the model achieves faster inference speed and higher precision.

Description

Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application
Technical Field
The invention relates to the field of pollution discharge abnormality monitoring, in particular to a bidirectional autoregressive non-supervision pretraining fine-tuning pollution discharge abnormality monitoring method and application.
Background
The outline of the fourteenth five-year plan for national economic and social development of the People's Republic of China and its long-range objectives through 2035 calls for: "deepening the battle of pollution prevention and control, building a sound environmental governance system, and promoting precise, scientific, law-based and systematic pollution control". Pollution-discharging enterprises are fixed pollution sources, and their discharge is one of the main sources of environmental pollution in China; the supervision of fixed pollution sources is therefore a major issue for pollution control in China. At present, however, monitoring data are still rendered abnormal or invalid by enterprises stealthily discharging, maliciously tampering with monitoring equipment parameters, damaging on-line monitoring facilities, or operating and maintaining equipment in a non-standard or untimely manner, which pollutes the environment and places higher demands on supervision. Domestically and abroad, the hardware for monitoring water-quality emissions is already quite complete: various water-quality sensors, floating water-quality monitoring stations, water-quality monitoring terminals and the like have been developed, ensuring that data on the relevant water-discharge indexes are collected accurately, efficiently and in large quantities. In the processing of pollutant data, however, much room for improvement remains, and the data cannot yet be used to judge reasonably and efficiently whether an enterprise discharges lawfully.
Traditional time-series feature extraction methods fall mainly into two categories: (1) statistics-based decisions, such as the 3-sigma principle and confidence principles, which comprehensively evaluate the data of all dimensions of the sequence and update and judge outliers in real time by means of statistics; (2) conventional machine learning with manually constructed features, which requires researchers to have a clear understanding of the meaning of the data in order to perform appropriate feature engineering and ensure the robustness of the model, so that in many cases good results are difficult to achieve. Learning and extracting multidimensional time-series features is a necessary precondition for the outlier detection task.
Disclosure of Invention
Aiming at the shortcomings of traditional methods and the characteristics of multidimensional time series, an improved deep learning network is used to extract the features of the multidimensional time series, a network with strong generalization capability is incorporated to fully extract the more abstract semantic features in the multidimensional time series, and parameters are optimized while a complex model is constructed, so that the model achieves faster inference speed and higher precision.
The invention provides a bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method, which comprises the following steps:
s1: collecting data
The multi-channel acquisition and transmission module periodically acquires the following data of the pollution source discharge port: pressure, flow rate, temperature, density; obtaining an original multidimensional time sequence sample, and transmitting the original multidimensional time sequence sample to a data processing module;
s2: preprocessing an original multidimensional time series sample;
using a Mask-based method to perform denoising and interpolation operation on the data obtained by the data processing module;
s3: resampling the preprocessed multidimensional time series sample;
performing a scaling transformation on a given time series, resampling in a sliding-window manner, extracting original samples at set time intervals, and extracting multidimensional time-series features at set scales;
s4: constructing a model comprising three parts of data resampling enhancement, an encoder and a decoder and pre-training;
s5: performing small-sample fine-tuning and sequence-point classification on the pre-trained model.
After pre-training, the model has learned the characteristics of the multidimensional time series, namely its hierarchical and multi-scale features. These are universal features that capture the intrinsic relations among all parts of the whole multidimensional time series. The pre-trained model is used as the baseline (Baseline) for the downstream task and is fine-tuned on the downstream anomaly detection task. The input sequence is taken as the primary sequence and the sequence output by the model is recorded as the reconstructed sequence, so that the primary sequence can be encoded and decoded by the pre-trained model into a reconstructed sequence that is free of noise information yet carries the high-dimensional features of the primary sequence. The original sequence is compared with the multidimensional time series obtained after reconstruction, the dynamic time warping index at each corresponding point is calculated as an anomaly index, and abnormal and normal conditions are then classified by a clustering method;
firstly, randomly extracting 10% of continuous sequences from a data set formed by a long string of multidimensional time sequences to serve as samples, repeating the training process in the S4, and performing Fine tuning of small samples in a downstream task;
the specific steps of fine tuning the small sample are as follows: input is the original multidimensional time series X 'after pretreatment' in ∈R batch _size×input_window×in_dim Wherein batch_size is the training batch size, input_window represents the sequence length, i.e. the sliding window width in the preprocessing process, and in_dim represents the dimension of the sample;
reconstructing a sequence by an encoder and a decoderThe loss of each time point is obtained, and the sequence points are classified by a K-means clustering method, and the method specifically comprises the following steps:
(1) Firstly setting parameters K, wherein the meaning of K is to aggregate data into several categories, and K=2 is taken here;
(2) Randomly selecting two points from the data to form a clustering initial center point;
(3) Calculating the distances from all other points to the two points, finding out the center point nearest to each data point, and dividing the point into clusters represented by the center point; then all points are divided into two clusters;
(4) Re-calculating the mass centers of the two clusters to be used as the center point of the next cluster;
(5) Repeating the processes of steps (3)-(4) and clustering again, iterating this procedure;
(6) Stopping when the attribution category of all the sample points is not changed after re-clustering;
finally, two classes of samples are obtained, and the class with the smaller number of samples is the abnormal one;
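As an illustration, steps (1)-(6) of the K-means classification above can be sketched in a few lines of Python; `kmeans_two_clusters` is a hypothetical name, and for determinism this sketch seeds the two initial centers with the minimum and maximum loss instead of a random draw:

```python
import numpy as np

def kmeans_two_clusters(losses, max_iter=100):
    """Steps (1)-(6): cluster 1-D per-point losses into K=2 groups and
    flag the smaller cluster as anomalous. For determinism the initial
    centers are the min and max loss (the patent picks two random points)."""
    x = np.asarray(losses, dtype=float)
    centers = np.array([x.min(), x.max()])        # (1)-(2) K=2 initial centers
    labels = None
    for _ in range(max_iter):
        # (3) assign every point to its nearest center
        new_labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # (6) attributions unchanged
        labels = new_labels
        # (4)-(5) recompute each cluster centroid and repeat
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    # the cluster with fewer samples is taken as the anomaly class
    anomaly = np.argmin(np.bincount(labels, minlength=2))
    return labels == anomaly

# losses: mostly small reconstruction errors plus two spikes
flags = kmeans_two_clusters([0.1, 0.12, 0.09, 0.11, 5.0, 4.8, 0.1])
```

The two spike losses fall into the smaller cluster and are flagged as abnormal.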
In practice, enterprises privately modify the uploaded data in order to appear to discharge within the standard: when a received emission-data index shows an ascending trend over a certain period but drops suddenly and sharply after some time point, a spike-shaped data change is produced that would not occur within such a short period. It is therefore reasonable to infer that, upon finding the pollution discharge index trending upward and exceeding the standard, the monitored party may have externally interfered with the detection equipment or tampered with the uploaded data. The corresponding time point is marked as abnormal data, the company and any company sharing the discharge port are marked as abnormal, and finally all abnormal points are found by the clustering method.
S6: and (5) monitoring pollution discharge abnormality by using the model obtained in the step (S5).
Preferably, in step S1, the collection of data by the multi-channel acquisition and transmission module comprises: initialization, reading primary serial-port interrupt data, and packaging and transmitting the data.
the initializing step comprises the following steps: initializing and configuring an ESP8266 chip running environment;
the step of reading the primary serial-port interrupt data specifically comprises: setting the data type for temporarily storing serial-port interrupt data to the unsigned int type of a 32-bit computer, with a data length of 16 bits, updating the data in real time on serial-port interrupt, and performing secondary filtering in the fixed-time interrupt;
the packaging and transmission of the data specifically comprise: the ESP8266 chip splits the sixteen bits of data into the first eight bits and the last eight bits, uses 0x03, 0x03 as the packet header and 0x03, 0x03 as the packet tail, packs the data, and finally transmits the data in character form over the network to the data processing module in wireless hotspot mode.
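A minimal sketch of the packing scheme just described, assuming the framing exactly as stated (0x03, 0x03 header and tail around the high and low bytes of a 16-bit reading); `frame_reading` is an illustrative name, not part of the invention's firmware:

```python
def frame_reading(value_16bit):
    """Split a 16-bit sensor reading into high/low bytes and wrap it
    with the 0x03 0x03 packet header and 0x03 0x03 packet tail."""
    if not 0 <= value_16bit <= 0xFFFF:
        raise ValueError("reading must fit in 16 bits")
    high = (value_16bit >> 8) & 0xFF   # first eight bits
    low = value_16bit & 0xFF           # last eight bits
    return bytes([0x03, 0x03, high, low, 0x03, 0x03])

packet = frame_reading(0x1A2B)
```

Note that because header and tail share the value 0x03, a real receiver would also need length or timing information to delimit packets; the sketch only mirrors the framing as described.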
Preferably, in step S2, the mask-based method specifically comprises: taking a complete time series as a sample, randomly masking 10% of it, and restoring the time series with a Kalman filtering algorithm.
Preferably, the step S4 specifically includes the following substeps:
s4.1: using a noise function to destroy the sequence:
destroying the time series obtained in step S3 with any one or a combination of the five noise functions Token Masking, Token Deletion, Text Infilling, Sentence Permutation and Document Rotation;
s4.2: constructing an encoder network skeleton part:
selecting a self-attention layer and an MLP network as backbone networks, and iterating 12 times to form an encoder;
the multidimensional time series is standardized, the influence of dimension is eliminated,
adding position information for multidimensional time series using position coding PE
pos refers to positions of different time points, 2i and 2i+1 respectively correspond to different dimension indexes of a certain time point, and odd dimensions are favorable
With sin sine coding, even dimension with cos cosine coding, d model Refers to the total dimension of the data, here preventing overflow by an index of 10000 being too large;
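The sinusoidal positional encoding above can be sketched as follows (a standard implementation assuming an even d_model; `positional_encoding` is an illustrative name):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding: PE[pos, 2i] = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)). Assumes d_model is even."""
    pos = np.arange(seq_len)[:, None]       # positions of the time points
    i = np.arange(0, d_model, 2)[None, :]   # paired dimension indexes 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)             # 2i dimensions: sine
    pe[:, 1::2] = np.cos(angle)             # 2i+1 dimensions: cosine
    return pe

pe = positional_encoding(seq_len=64, d_model=8)
```

The encoding is simply added to the standardized series before the first self-attention layer, so every time point carries a unique, smoothly varying position signature.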
Three matrices Q, K and V are generated by three linear layers; each K is accessed with Q, and after scaling, softmax (conversion to exponents with base e followed by normalization) is applied:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
The normalized values are used as the weights of V, yielding the attention value for the subsequent MLP layers and the decoder to reconstruct sequences.
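A minimal NumPy sketch of the scaled dot-product attention just described (illustrative only, not the patented implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V with a numerically stable softmax."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # each query accesses every key
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the base-e exponent
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # normalize to attention weights
    return weights @ V, weights                   # weights applied to V

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to one, so the output is a convex combination of the value vectors.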
a multi-headed mechanism is introduced to accommodate higher dimensional time series information:
MultiHead(Q, K, V) = [head_1, ..., head_h] W_O
where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
and h is the number of attention heads; when using multi-head attention the input dimension must be divisible by the number of heads, the input dimension is divided into h groups, and each group of features has its own attention weights;
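The multi-head mechanism can be illustrated as below, splitting the input dimension into h groups as described; the weight-matrix shapes and names are simplifying assumptions of this sketch:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """[head_1, ..., head_h] W_O with head_i = Attention on the i-th feature
    group; the model dimension must be divisible by the head count h."""
    n, d_model = X.shape
    assert d_model % h == 0, "input dimension must be divisible by h"
    d_head = d_model // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):                   # each group gets its own attention
        sl = slice(i * d_head, (i + 1) * d_head)
        q, k, v = Q[:, sl], K[:, sl], V[:, sl]
        s = q @ k.T / np.sqrt(d_head)
        w = np.exp(s - s.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        heads.append(w @ v)
    return np.concatenate(heads, axis=-1) @ Wo  # concatenate and project

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))                      # 6 time points, d_model = 8
W = [rng.normal(size=(8, 8)) for _ in range(4)]  # Wq, Wk, Wv, Wo
out = multi_head_attention(X, *W, h=2)
```

With d_model = 8 the sketch works for h ∈ {1, 2, 4, 8}, matching the divisibility requirement.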
s4.3: constructing a decoder part:
taking the multi-head self-attention layer and the MLP layer as the network skeleton, performing attention aggregation with a cross multi-head attention aggregation operation (Cross Attention) on the hidden-state result of the last encoder layer after one stacking pass, and then stacking self-attention and MLP layers several more times;
K and V obtained by the self-attention mechanism in the encoder and Q obtained by training in the decoder are taken for the aggregation calculation, and a normalization layer using the Layer Normalization method is added after each MLP layer. Let H be the number of hidden-layer nodes in a layer and l the layer index of the MLP; the Layer Normalization statistics μ^l and σ^l are
μ^l = (1/H) Σ_{i=1}^{H} a_i^l
σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l − μ^l)^2 )
where a^l is the input multidimensional time series. The statistics μ^l and σ^l are independent of the number of samples and depend only on the number of hidden nodes,
so as long as the number of hidden nodes is sufficient, the Layer Normalization statistics are guaranteed to be sufficiently representative. The data output after the l-th MLP layer is a^l, where i indexes the dimension, denoted in_dim; through μ^l and σ^l the normalized value
ā_i^l = (a_i^l − μ^l) / sqrt((σ^l)^2 + ε)
is obtained, where ε = 1e-5 prevents division by zero;
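A short sketch of the Layer Normalization computation above, with ε = 1e-5 as stated (`layer_norm` is an illustrative name):

```python
import numpy as np

def layer_norm(a, eps=1e-5):
    """Layer Normalization over the hidden dimension:
    mu = mean(a), sigma^2 = var(a), output = (a - mu) / sqrt(sigma^2 + eps).
    The statistics depend only on the H hidden nodes, not on the batch size."""
    mu = a.mean(axis=-1, keepdims=True)
    sigma2 = a.var(axis=-1, keepdims=True)
    return (a - mu) / np.sqrt(sigma2 + eps)  # eps prevents division by zero

x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x)
```

Because the mean and variance are taken per sample over the hidden dimension, the result is well defined even for a batch of one, unlike batch normalization.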
performing restoration reconstruction on the destroyed time sequence in an autoregressive mode; the restoration reconstruction degree is evaluated through a dynamic time warping index;
The dynamic time warping index is computed as follows: after aligning the two time series, a difference matrix is calculated, and the goal is to find a path from (0, 0) to (n, n) in the matrix that minimizes the cumulative Euclidean distance of the elements on the path; the minimizing path is the warping path, i.e. the dynamic time warping index, used to represent the similarity of the two time series:
An n × n matrix is constructed whose (i-th, j-th) element is the Euclidean distance d(q_i, c_j) between points q_i and c_j. The warping path defines a mapping between the sequences Q and C, denoted P, a set of consecutive matrix elements; the t-th element of P is defined as p_t = d(q_i, c_j)_t, where P = p_1, p_2, ..., p_T and n ≤ T ≤ 2n − 1. It is obtained essentially by a dynamic-programming method:
d_ij = d(x_i, y_j)
D(i, j) = d_ij + min{ D(i−1, j), D(i, j−1), D(i−1, j−1) }
where D(i−1, j) is the sub-sequence distance when x_{i−1} and y_j are matched, D(i, j−1) the distance when x_i and y_{j−1} are matched, and D(i−1, j−1) the distance when x_{i−1} and y_{j−1} are matched;
in a multivariate time series, x_i and y_j are vectors of dimension in_dim, the elements of x_i being the variable values at time i and the elements of y_j the variable values at time j; d(x_i, y_j) is the distance between x_i and y_j when time i and time j are aligned. The distance d(x_i, y_j) between the vectors x_i and y_j may be calculated as a Euclidean distance or a Mahalanobis distance;
Euclidean distance: d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} − y_{j,k})^2 )
For a multivariate point x = (x_1, x_2, x_3, ..., x_p)^T with mean μ = (μ_1, μ_2, μ_3, ..., μ_p)^T and covariance matrix S, the Mahalanobis distance is
D_M(x) = sqrt( (x − μ)^T S^{−1} (x − μ) )
It differs from the Euclidean distance in that it accounts for the relations between the various features and is scale-independent (Scale-independent), i.e. independent of the measurement scale.
S4.4: the pre-training process is as follows:
resampling the collected data to serve as the original sequence, then corrupting the original sequence, feeding it into the model for forward inference (Forward), and processing it through the encoder and decoder to obtain a new sequence, recorded as the reconstructed sequence of the original sequence;
the similarity between the original sequence and the reconstructed sequence is measured with the dynamic time warping index, denoted DTW, and the loss function is constructed as Loss = DTW(original sequence, reconstructed sequence);
after the loss between the sequences before and after reconstruction is calculated, back-propagation is performed and the model parameters are updated so that the loss becomes smaller and smaller until it converges and no longer decreases. In this process the model gradually learns the features in the multidimensional time series and extracts them automatically; this is the pre-training process, and it finally yields a pre-trained model with sequence feature-extraction capability for later use in downstream tasks.
The invention also provides an application of the bidirectional autoregressive unsupervised pretraining fine-tuning pollution discharge abnormality monitoring method on a visual large-screen management platform. The hardware part of the platform comprises the multi-channel acquisition and transmission module and the data processing module; the multi-channel acquisition and transmission module comprises a pressure sensor, a flow-rate meter, a thermometer and a density detector, the sensors being installed at the pollution-source discharge port to be monitored. The front end of the data processing module is developed on the React framework, interacts with the back end to obtain the analyzed pollution discharge data, and introduces DataV and AntV to visualize the abnormal pollution discharge data; the back end of the data processing module is developed on Golang and comprises the Gin and GORM frameworks. The data processing module is containerized by writing a Dockerfile to generate a Docker image, enabling multi-platform migration. The visual large-screen management platform comprises three views: government, enterprise and individual.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description of the drawings is provided below, and some specific examples of the present invention will be described in detail below by way of example and not by way of limitation with reference to the accompanying drawings. It will be appreciated by those skilled in the art that the drawings are not necessarily drawn to scale. In the accompanying drawings:
FIG. 1 is a functional flow chart of the present invention;
fig. 2 is a table of a specified format uploaded by a user.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The flow chart of the bidirectional autoregressive non-supervision pre-training fine-tuning type pollution discharge abnormality monitoring method is shown in fig. 1, and the specific implementation steps of the invention are as follows:
s1.1: the hardware device is selected.
Data sensors such as a pressure sensor, a flow-rate meter, a thermometer and a density detector are selected, and data are collected near the discharge port as indexes for pollution detection. The ESP8266 is selected as the main-control and wireless transmission module to realize the main-control and wireless-transmission functions. It integrates the industry-leading ultra-low-power 32-bit Tensilica L106 micro MCU in a small-size package, provides a 16-bit reduced instruction mode, supports main frequencies of 80 MHz and 160 MHz, supports an RTOS, integrates Wi-Fi MAC/BB/RF/PA/LNA, and has an on-board antenna. The module supports the standard IEEE 802.11 b/g/n protocol and a complete TCP/IP protocol stack.
S1.2: the data acquisition program mainly comprises modules for initializing, reading primary serial port interrupt data, packaging and transmitting the data and the like.
Initialization mainly performs initial configuration of the MCU running environment. The data type for temporarily storing serial-port interrupt data is set to the unsigned int type of a 32-bit computer, with a data length of 16 bits; the data are updated in real time on serial-port interrupt, and secondary filtering is performed in the fixed-time interrupt. The MCU splits the sixteen bits of data read into the first eight bits and the last eight bits, uses 0x03, 0x03 as the packet header and 0x03, 0x03 as the packet tail, packs the data, and finally sends the data in character form. For data uploading, the ESP8266 module connects to a computer in hotspot mode to realize wireless data transmission, so that the data enter the network and are uploaded to the cloud data center.
S2: preprocessing an original multidimensional time series sample;
an interpolation algorithm is selected using the mask-based method, and denoising and interpolation are performed on the data obtained by the data processing module;
A complete time series is taken as a sample, 10% of it is randomly masked, and the series is restored with different interpolation algorithms such as linear interpolation, quadratic interpolation, moving average and exponential average; if the difference between the restored time series and the original series is small, the interpolation scheme adapts well to the characteristics of the series. Kalman filtering is finally selected as the interpolation algorithm.
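The mask-based comparison of interpolation algorithms can be illustrated as below; linear interpolation stands in for the candidate restorers (the invention ultimately selects Kalman filtering), and all names are illustrative:

```python
import numpy as np

def mask_and_score(series, restore_fn, mask_frac=0.10, seed=0):
    """Randomly hide mask_frac of the points, restore them with restore_fn,
    and return the mean absolute error on the hidden points only."""
    rng = np.random.default_rng(seed)
    x = np.asarray(series, dtype=float)
    hidden = rng.random(len(x)) < mask_frac
    hidden[[0, -1]] = False            # keep the endpoints as anchors
    if not hidden.any():
        return 0.0
    masked = x.copy()
    masked[hidden] = np.nan
    restored = restore_fn(masked)
    return float(np.abs(restored[hidden] - x[hidden]).mean())

def linear_interp(masked):
    """Stand-in restorer: linear interpolation over the NaN gaps."""
    idx = np.arange(len(masked))
    known = ~np.isnan(masked)
    return np.interp(idx, idx[known], masked[known])

t = np.linspace(0.0, 6.28, 200)
err = mask_and_score(np.sin(t), linear_interp)  # small error on a smooth series
```

The restorer with the smallest score on held-out masked points is the one that best fits the characteristics of the series, which is the selection criterion described above.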
S3: resampling the preprocessed multidimensional time series sample;
and performing telescopic transformation on a certain time sequence, resampling in a sliding window mode, extracting original samples at different time intervals, and extracting multi-dimensional time sequence features of different scales.
A sliding window of fixed length is determined and then moved rightward from the beginning of the sequence with a certain step length; the sequence region covered after each move is a small sample, so the original long time series is divided into several sub-sequences that form a new data set for training the model. With a preset sliding window of size input_window, an input sample sequence of total length seq_len and a multidimensional time series of dimension in_dim, the input sample is a two-dimensional tensor X_in ∈ R^(seq_len × in_dim); through multi-scale stretching transformation and sliding-window resampling, the three-dimensional sample tensor X'_in ∈ R^(batch_size × input_window × in_dim) is obtained.
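The sliding-window resampling from X_in ∈ R^(seq_len × in_dim) to X'_in ∈ R^(batch_size × input_window × in_dim) can be sketched as (`sliding_window_resample` is an illustrative name):

```python
import numpy as np

def sliding_window_resample(X, input_window, step):
    """Slide a fixed-length window over X (seq_len x in_dim) with the given
    step and stack the covered sub-sequences into a
    (batch_size x input_window x in_dim) tensor."""
    seq_len = X.shape[0]
    starts = range(0, seq_len - input_window + 1, step)
    return np.stack([X[s:s + input_window] for s in starts])

X = np.arange(20.0).reshape(10, 2)          # seq_len = 10, in_dim = 2
batch = sliding_window_resample(X, input_window=4, step=2)
```

With step < input_window the windows overlap, so every region of the long series appears in several training samples; batch_size here is determined by seq_len, input_window and the step.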
S4: constructing a model comprising three parts of data resampling enhancement, an encoder and a decoder and pre-training;
s4.1: the sequence is corrupted using a noise function.
The five noise-function methods Token Masking, Token Deletion, Text Infilling, Sentence Permutation and Document Rotation are used to corrupt the time series input during pre-training, destroying the time series obtained in step S3. This increases the difficulty of the sequence-reconstruction task in the pre-training stage, so that the model learns to extract the features in the multidimensional time series better.
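Time-series analogues of three of the five noise functions can be sketched as follows; these are illustrative adaptations of the text-oriented operations, not the patent's exact corruption scheme:

```python
import numpy as np

def token_masking(x, frac=0.15, rng=None):
    """Zero out a random fraction of time points (Token Masking analogue)."""
    if rng is None:
        rng = np.random.default_rng(0)
    y = x.copy()
    y[rng.random(len(y)) < frac] = 0.0
    return y

def token_deletion(x, frac=0.15, rng=None):
    """Drop a random fraction of time points entirely (Token Deletion analogue)."""
    if rng is None:
        rng = np.random.default_rng(1)
    return x[rng.random(len(x)) >= frac]

def document_rotation(x, rng=None):
    """Rotate the series so it starts at a random time point (Document Rotation analogue)."""
    if rng is None:
        rng = np.random.default_rng(2)
    k = int(rng.integers(len(x)))
    return np.roll(x, -k, axis=0)

x = np.arange(10.0)
```

Reconstructing the clean series from such corrupted inputs is what forces the encoder-decoder to learn the sequence's internal structure rather than copy its input.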
S4.2: an encoder network backbone portion is constructed.
The self-attention layer and the MLP network are selected as the backbone, iterated 12 times to form the encoder. The multidimensional time series is first standardized to eliminate the influence of dimension, and positional encoding (Position Encoding) is used to add position information to the multidimensional time series, preventing the loss of position information caused by the parallelism of the subsequent attention mechanism.
Here pos refers to the positions of different time points, 2i and 2i+1 correspond to different dimension indexes of a time point, the 2i dimensions using sin sine encoding and the 2i+1 dimensions using cos cosine encoding; d_model refers to the total dimension of the data, and the base 10000 prevents the argument from growing too large. Three matrices Q (query), K (key) and V (value) are then generated by three linear layers, Q is used to access each K, and after scaling, softmax (first conversion to exponents with base e, then normalization) is applied: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The result is used as the weight of V, yielding the attention value for the subsequent MLP layers and the decoder to reconstruct sequences.
a multi-headed mechanism is introduced to accommodate higher dimensional time series information:
MultiHead(Q, K, V) = [head_1, ..., head_h] W_O
where h is the number of attention heads; when using multiple heads the input dimension must be divisible by the number of heads, the input dimension is divided into h groups, and each group of features has its own attention weights.
S4.3: a decoder section is constructed.
Likewise, multi-head self-attention layers and MLP layers are used as the network skeleton; after one stacking pass, a cross multi-head attention aggregation operation (Cross Attention) performs attention aggregation with the hidden-state result of the last encoder layer, followed by repeated stacking of self-attention and MLP layers. K (key) and V (value) obtained by the self-attention mechanism in the encoder and Q (query) obtained by training in the decoder are taken for the aggregation calculation, and a normalization layer using Layer Normalization is added after each MLP layer. Let H be the number of hidden-layer nodes in a layer and l the layer index of the MLP; the Layer Normalization statistics are
μ^l = (1/H) Σ_{i=1}^{H} a_i^l
σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l − μ^l)^2 )
The computation of these statistics is independent of the number of samples and depends only on the number of hidden nodes, so as long as the number of hidden nodes is sufficient, the normalized statistics of LN are guaranteed to be sufficiently representative. The data output after the l-th MLP layer is a^l, with i indexing the dimension, denoted in_dim; through μ^l and σ^l the normalized value ā_i^l = (a_i^l − μ^l) / sqrt((σ^l)^2 + ε) is obtained, where ε = 1e-5 prevents division by zero. Finally, the corrupted time series is restored and reconstructed in an autoregressive manner, and the degree of restoration is evaluated by the dynamic time warping index: after aligning two time series, a difference matrix is calculated, and the goal is to find a path from (0, 0) to (n, n) in the matrix that minimizes the cumulative Euclidean distance of the elements on the path; such a path is called the warping path, i.e. the dynamic time warping index, used to represent the similarity of the two time series:
An n × n matrix is first constructed whose (i-th, j-th) element is the Euclidean distance d(q_i, c_j) between points q_i and c_j. The warping path defines the mapping between the sequences Q and C, denoted P, a set of consecutive matrix elements; the t-th element of P is defined as p_t = d(q_i, c_j)_t, where P = p_1, p_2, ..., p_T and n ≤ T ≤ 2n − 1. It is obtained essentially by a dynamic-programming method:
d ij =d(x i ,y j )
D(i,j)=d ij +min{D(i-1,j),D(i,j-1),D(i-1,j-1)}
where D(i−1, j) is the sub-sequence distance when x_{i−1} and y_j are matched, D(i, j−1) the distance when x_i and y_{j−1} are matched, and D(i−1, j−1) the distance when x_{i−1} and y_{j−1} are matched. In a multivariate time series, x_i and y_j are vectors of dimension in_dim, the elements of x_i being the variable values at time i and the elements of y_j the variable values at time j; d(x_i, y_j) is the distance between x_i at time i and y_j at time j when aligned, calculated here by the Euclidean distance.
Euclidean distance: d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} − y_{j,k})^2 )
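The recurrence D(i, j) = d_ij + min{D(i−1, j), D(i, j−1), D(i−1, j−1)} can be sketched as a short dynamic-programming routine; this is an illustrative implementation assuming Euclidean point-wise distance, not the patent's own code:

```python
import numpy as np

def dtw(Q, C):
    """Dynamic time warping distance between two sequences of
    in_dim-dimensional vectors, filled by the recurrence
    D(i,j) = d_ij + min{D(i-1,j), D(i,j-1), D(i-1,j-1)}."""
    n, m = len(Q), len(C)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d_ij = np.linalg.norm(Q[i - 1] - C[j - 1])  # Euclidean d(q_i, c_j)
            D[i, j] = d_ij + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]  # cumulative cost of the optimal warping path
```

Identical sequences give a distance of 0; the larger the value, the less similar the two series.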
s4.4: pre-training process
The acquired data are resampled to serve as the original sequence, which is then subjected to sequence destruction, fed into the model for forward inference (Forward), and processed by the encoder and decoder to obtain a new sequence, recorded as the reconstructed sequence of the original sequence.
The similarity between the original sequence and the reconstructed sequence is then measured with the dynamic time warping index, denoted DTW, and the loss function is constructed from it, Loss = DTW(X, X̂), where X is the original sequence and X̂ the reconstructed sequence:
After the loss between the sequences before and after reconstruction is calculated, back-propagation is performed and the model parameters are updated so that the loss decreases until it converges and no longer falls. In this process the model gradually learns the characteristics of the multidimensional time series and extracts features automatically; this constitutes the pre-training process, and finally a pre-trained model with sequence-extraction capability is obtained for later use in downstream tasks.
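A minimal sketch of one such pre-training cycle (corrupt, reconstruct, compute loss, back-propagate) is shown below. The shapes, the span mask, and the tiny Transformer are our placeholder assumptions, and MSE stands in for the DTW loss, since plain DTW is not differentiable (a soft-DTW variant would be used in practice):

```python
import torch
import torch.nn as nn

# Assumed toy dimensions: 4 variables, window of 16 points, batch of 8.
in_dim, window, batch = 4, 16, 8
model = nn.Transformer(d_model=in_dim, nhead=2, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=32,
                       batch_first=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(batch, window, in_dim)            # resampled original sequence
corrupted = x.clone()
corrupted[:, window // 2: window // 2 + 3] = 0.0  # Text-Infilling-style span mask

for _ in range(3):                                # a few forward/backward steps
    recon = model(corrupted, x)                   # encoder-decoder reconstruction
    loss = nn.functional.mse_loss(recon, x)       # stand-in for DTW(x, recon)
    opt.zero_grad()
    loss.backward()
    opt.step()
```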
S5: and performing small sample refinement and sequence point classification on the model after the pre-training.
With a small number of samples, the training process is repeated, and Fine tuning (Fine tuning) of the small samples is done in the downstream task.
The data set is one long multidimensional time series. A continuous sequence amounting to 10% of the data set is randomly extracted as samples, the training process is repeated, and small-sample fine tuning (Fine tuning) is performed in the downstream task: the input is the preprocessed multidimensional time series X′_in ∈ R^{batch_size×input_window×in_dim}; the sequence is reconstructed by the encoder and decoder to obtain X̂; the dynamic time warping index representing the similarity of the two time series is calculated to obtain the per-point Loss; finally, K-means clustering is performed on the Loss values to obtain the abnormal points:
(1) Firstly setting a parameter K, which specifies how many classes the data are aggregated into (here K = 2);
(2) Randomly selecting two points from the data to form a clustering initial center point;
(3) The distances from all other points to these two points are calculated, then the center point closest to each data point is found, and the point is divided into clusters represented by this center point. Then all points are divided into two clusters;
(4) Re-calculating the mass centers of the two clusters to be used as the center point of the next cluster;
(5) Repeating steps (3)-(4) to re-cluster, iterating this process;
(6) Stopping when the attribution category of all the sample points is not changed after re-clustering;
Finally, two classes of samples are obtained, and the class with fewer samples is the abnormal one.
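Steps (1)-(6) can be sketched on one-dimensional per-point losses as follows. For determinism, this sketch initializes the two centers at the minimum and maximum loss instead of the random choice in step (2), and the sample loss values are illustrative:

```python
import numpy as np

def kmeans_two_class(losses, iters=50):
    """K-means with K=2 on per-point reconstruction losses; the minority
    cluster is flagged as anomalous, following steps (1)-(6)."""
    x = np.asarray(losses, dtype=float)
    centers = np.array([x.min(), x.max()])  # (2) deterministic stand-in for random init
    for _ in range(iters):
        # (3) assign each point to the nearest center
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # (4) recompute the centroid of each cluster
        new = np.array([x[labels == k].mean() if np.any(labels == k) else centers[k]
                        for k in range(2)])
        if np.allclose(new, centers):       # (6) stop when assignments are stable
            break
        centers = new                       # (5) repeat with the new centers
    minority = np.argmin(np.bincount(labels, minlength=2))
    return labels == minority               # True marks an abnormal point

losses = [0.1, 0.12, 0.09, 0.11, 2.5, 0.1, 2.7]
anomalies = kmeans_two_class(losses)
```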
In practice, enterprises may privately modify the uploaded data so that their discharges do not appear to exceed the standard: when the received emission data show a rising trend over a period and then drop suddenly and sharply after a certain time point, a spike-shaped change appears that would not normally occur within such a short period. It is then reasonable to suspect that, on finding the pollution discharge index rising beyond the standard, the monitored party has externally interfered with the detection equipment or tampered with the uploaded data. The corresponding time points are marked as abnormal data, the company and any companies sharing the same discharge port are flagged as abnormal, and finally all abnormal points are found by the clustering method.
S6: and (5) monitoring pollution discharge abnormality by using the model obtained in the step (S5).
Furthermore, the model is integrated into a visual large-screen management platform, with the front end and back end built to form a management system offering three views: government, enterprise and individual.
The front end is developed on a lightweight framework and uses axios to interact with the back end to obtain the analyzed company pollution-discharge data; DataV and AntV are introduced to visualize the abnormal discharge data, so that the abnormal state of a company's discharge is displayed intuitively. The back end is developed in Golang, mainly using the lightweight Gin and GORM frameworks; compared with the native net/http package, Gin offers better performance and more extension functions. The persistence layer is written with the GORM framework to efficiently read, write and manage data in MySQL. In terms of system design, users are divided into three levels (government administrator, company personnel and visitor) with different permissions for each level: administrators (government-related managers) can modify a company's abnormal state, companies can appeal against anomalies, and visitors can submit reports. The Gin framework is combined with Python's Flask: Gin handles routing while Flask handles data processing and model execution, using data-processing libraries such as Numpy and Pandas together with models written in the PyTorch framework to compute the relevant indices and data, which are returned to the front end for visual display. The two programming languages complement each other organically, fully exploiting their respective strengths in different fields.
The local project and its dependencies are packaged by writing a Dockerfile, a Docker image is generated and uploaded to a repository, and the server side then pulls the image and creates a container to deploy the application service. Docker runs on many platforms, so an application running on one platform can easily be migrated to another without worrying that a change of runtime environment will prevent it from running normally.
The invention can reconstruct the sewage-discharge time-series data of a given port over a given period from a standard data set (namely, the data set uploaded after hardware acquisition), compare the differences before and after reconstruction, and cluster them to obtain the anomalies.
The user can also upload a table in a specified format, so that the user can be helped to analyze multidimensional time series anomalies autonomously:
As shown in FIG. 2, for a table uploaded by the user, anomaly analysis of the multidimensional time series can be performed after selecting the start and end times, returning the abnormal time points, the single-dimension anomaly comparison between the two periods (such as an increase in the number of abnormal points), and the multidimensional anomaly comparison between the two periods (that is, considering all pollutants together and outputting statistical anomaly information).
While the invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and substitutions can be made herein without departing from the scope of the invention as defined by the appended claims.

Claims (4)

1. A bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method,
the method is characterized by comprising the following steps of:
s1: collecting data:
the multi-channel acquisition and transmission module periodically acquires the following data of the pollution source discharge port: pressure, flow rate, temperature, density; obtaining an original multidimensional time sequence sample, and transmitting the original multidimensional time sequence sample to a data processing module;
s2: preprocessing an original multidimensional time series sample:
using a Mask-based method to perform denoising and interpolation operation on the data obtained by the data processing module;
s3: resampling the preprocessed multidimensional time series samples:
performing scaling transformation on the preprocessed multidimensional time series samples, resampling in a sliding-window manner, extracting original samples at set time intervals, and extracting multidimensional time series features at set scales;
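The sliding-window resampling described in S3 can be sketched as follows; this is an illustrative NumPy helper, and the window width and step values are assumptions:

```python
import numpy as np

def sliding_windows(x, window, step):
    """Resample a multidimensional series x (shape [T, in_dim]) into
    overlapping windows extracted at a fixed step, giving an array of
    shape [num_windows, window, in_dim]."""
    return np.stack([x[i:i + window]
                     for i in range(0, len(x) - window + 1, step)])

w = sliding_windows(np.arange(20).reshape(10, 2), window=4, step=2)
```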
s4: constructing a model comprising three parts, namely data resampling enhancement, an encoder and a decoder, and pre-training it;
s5: performing small sample refinement and sequence point classification on the model after pre-training:
firstly, randomly extracting a continuous sequence amounting to 10% of a data set formed by a long multidimensional time series as samples, repeating the training process in S4, and performing small-sample fine tuning (Fine tuning) in a downstream task;
the specific steps of fine tuning the small sample are as follows: the input is the preprocessed original multidimensional time series X′_in ∈ R^{batch_size×input_window×in_dim}, wherein batch_size is the training batch size, input_window represents the sequence length, i.e. the sliding-window width in the preprocessing process, and in_dim represents the dimension of the sample;
the sequence is reconstructed through the encoder and decoder to obtain X̂, the loss at each time point is obtained, and the sequence points are classified by the K-means clustering method, specifically comprising the following steps:
(1) Firstly setting parameters K, wherein the meaning of K is to aggregate data into several categories, and K=2 is taken here;
(2) Randomly selecting two points from the data to form a clustering initial center point;
(3) Calculating the distances from all other points to the two points, finding out the center point nearest to each data point, and dividing the point into clusters represented by the center point; then all points are divided into two clusters;
(4) Re-calculating the mass centers of the two clusters to be used as the center point of the next cluster;
(5) Repeating steps (3)-(4) to re-cluster, iterating this process;
(6) Stopping when the attribution category of all the sample points is not changed after re-clustering;
finally, two classes of samples are obtained, and the class with fewer samples is the abnormal one;
s6: monitoring pollution discharge abnormality by using the model obtained in the step S5;
the step S4 specifically includes the following steps:
s4.1: using a noise function to destroy the sequence:
destroying the time sequence obtained in the step S3 by using any one or a combination of the five noise functions Token Masking, Token Deletion, Text Infilling, Sentence Permutation and Document Rotation;
s4.2: constructing an encoder network skeleton part:
selecting a self-attention layer and an MLP network as backbone networks, and iterating 12 times to form an encoder;
the multidimensional time series is standardized to eliminate the influence of dimension, and position information is added to the multidimensional time series using the position coding PE:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
wherein pos refers to the position of a time point, 2i and 2i+1 correspond to different dimension indexes of that time point, the even dimensions using sin (sine) coding and the odd dimensions using cos (cosine) coding, and d_model refers to the total dimension of the data; dividing 2i by d_model in the exponent prevents the power of 10000 from growing too large;
three matrices Q, K and V are generated by three linear layers; each K is accessed with Q, and after scaling, softmax converts the scores into exponents with base e and normalizes them; the normalized values are used as the weights of V, so that the attention value
Attention(Q, K, V) = softmax(QK^T / √d_k) V
is calculated for the subsequent MLP layers and the decoder to reconstruct the sequence:
a multi-headed mechanism is introduced to accommodate higher dimensional time series information:
MultiHead(Q, K, V) = [head_1, …, head_h] W^O
where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
wherein h is the number of attention heads; when multi-head attention is used, the input dimension must be divisible by the number of attention heads, the input dimension is divided into h groups, and each group of features has its own attention parameters;
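The position coding and scaled dot-product attention described in S4.2 can be sketched as follows; this is a single-head NumPy illustration (function names are ours, and the multi-head split and linear layers are omitted):

```python
import numpy as np

def positional_encoding(length, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(...).
    Assumes an even d_model for simplicity."""
    pos = np.arange(length)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angle = pos / (10000 ** (i / d_model))
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angle)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)   # odd dimensions: cosine
    return pe

def attention(Q, K, V):
    """softmax(QK^T / sqrt(d_k)) V: scale, exponentiate base e, normalize,
    then use the normalized weights to aggregate V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)  # rows sum to 1
    return w @ V

pe = positional_encoding(10, 8)
out = attention(np.eye(2), np.eye(2), np.eye(2))
```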
s4.3: constructing a decoder part:
taking a multi-head self-attention layer and an MLP layer as the network skeleton, after one round of stacking, performing attention aggregation calculation between the cross multi-head attention aggregation operation (Cross Attention) and the hidden-state result of the last layer of the encoder, and then stacking several more self-attention and MLP layers;
taking the K and V obtained by the self-attention mechanism in the encoder and the Q obtained by training in the decoder for aggregation calculation, and adding a normalization layer after each MLP layer using the Layer Normalization method, wherein H is the number of hidden-layer nodes in one layer and l is the layer index of the MLP; the normalized statistics μ^l and σ^l of Layer Normalization are calculated as
μ^l = (1/H) Σ_{i=1}^{H} a_i^l, σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l − μ^l)^2 )
wherein a^l is the input multidimensional time series; the statistics μ^l and σ^l are independent of the number of samples and depend only on the number of hidden nodes, so as long as there are enough hidden nodes, the normalized statistics of Layer Normalization are guaranteed to be sufficiently representative; the data output after the l-th MLP layer is a_i^l, where i indexes the dimension, denoted in_dim; using μ^l and σ^l, the normalized value ā_i^l = (a_i^l − μ^l) / sqrt((σ^l)^2 + ε) is obtained, wherein ε is 1e-5 to prevent division by zero;
performing restoration reconstruction on the destroyed time sequence in an autoregressive mode; the restoration reconstruction degree is evaluated through a dynamic time warping index;
the dynamic time warping index is calculated as follows: after aligning the two time series, a difference matrix is calculated, and the goal is to find a path from (0, 0) to (n, n) in the matrix such that the cumulative Euclidean distance of the elements on the path is minimal; the minimal path is the warping path, and its cost is the dynamic time warping index, used to represent the similarity of the two time series:
constructing an n × n matrix in which the (i-th, j-th) element is the Euclidean distance d(q_i, c_j) between the points q_i and c_j; the warping path P defines the mapping between the time series Q and C as a set of consecutive matrix elements, the t-th element of P being defined as p_t = d(q_i, c_j)_t, wherein P = p_1, p_2, ..., p_T and n ≤ T ≤ 2n − 1, and the path is essentially obtained by dynamic programming:
d ij =d(x i ,y j )
D(i,j)=d ij +min{D(i-1,j),D(i,j-1),D(i-1,j-1)}
wherein D(i−1, j) represents the subsequence distance when x_{i−1} and y_j are matched, D(i, j−1) represents the subsequence distance when x_i and y_{j−1} are matched, and D(i−1, j−1) represents the subsequence distance when x_{i−1} and y_{j−1} are matched;
in a multivariate time series, x_i and y_j are both vectors of dimension in_dim, wherein the elements of x_i are the values of the variables at time i and the elements of y_j are the values of the variables at time j; d(x_i, y_j) is the distance when x_i at time i and y_j at time j are aligned; the distance d(x_i, y_j) between the vectors x_i and y_j may be calculated by the Euclidean distance or the Mahalanobis distance;
Euclidean distance: d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} − y_{j,k})^2 );
for a multivariate vector x = (x_1, x_2, x_3, …, x_p)^T with mean value μ = (μ_1, μ_2, μ_3, …, μ_p)^T and covariance matrix S, the Mahalanobis distance is D_M(x) = sqrt( (x − μ)^T S^{−1} (x − μ) );
it differs from the Euclidean distance in that it takes the correlations between the various characteristics into account and is scale-independent (Scale-independent), i.e. independent of the measurement scale;
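The Mahalanobis distance defined above can be sketched in a few lines; this is an illustrative helper (the function name is ours), assuming an invertible covariance matrix S. With S equal to the identity matrix it reduces to the Euclidean distance:

```python
import numpy as np

def mahalanobis(x, mu, S):
    """D_M(x) = sqrt((x - mu)^T S^{-1} (x - mu));
    scale-independent because S^{-1} rescales each correlated feature."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))

# With S = I this is the ordinary Euclidean distance.
d = mahalanobis(np.array([3.0, 4.0]), np.zeros(2), np.eye(2))
```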
s4.4: the pre-training process is as follows:
resampling the acquired data to serve as the original sequence, then performing sequence destruction on it, then feeding it into the model for forward inference (Forward), and processing it through the encoder and decoder to obtain a new sequence, recorded as the reconstructed sequence of the original sequence;
the similarity between the original sequence and the reconstructed sequence is measured using the dynamic time warping index, denoted DTW, and the loss function is constructed from it, Loss = DTW(X, X̂), where X is the original sequence and X̂ the reconstructed sequence:
after the loss between the sequences before and after reconstruction is calculated, back-propagation is performed and the model parameters are updated so that the loss decreases until it converges and no longer falls; in this process the model gradually learns the characteristics of the multidimensional time series and extracts features automatically, which constitutes the pre-training process, and finally a pre-trained model with sequence-extraction capability is obtained for later use in downstream tasks.
2. The method for monitoring the abnormal sewage disposal according to claim 1, wherein in the step S1,
the steps of the multichannel acquisition and transmission module for acquiring data comprise: initializing, reading primary serial port interrupt data, packaging and transmitting the data,
the initializing step comprises the following steps: initializing and configuring an ESP8266 chip running environment;
the step of reading the primary serial port interrupt data specifically comprises: setting the data type of the temporarily stored serial-port interrupt data to the unsigned int type of a 32-bit computer, with a data length of 16 bits, updating the data in real time during the serial-port interrupt, and performing secondary filtering in the fixed-time interrupt;
the data packaging and transmitting specifically comprises: the ESP8266 chip splits the sixteen bits of interrupt data into the first eight bits and the last eight bits, packs the data using 0x03 and 0x03 as the data frame header and 0x03 and 0x03 as the data frame trailer, and finally transmits the data in character form over the network to the data processing module in wireless hot-spot mode.
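The byte framing described in this claim can be sketched as follows; the helper name is ours, and the identical 0x03 header and trailer bytes are taken as stated in the claim:

```python
def pack_sample(value):
    """Split a 16-bit reading into its high and low bytes and frame it
    with the 0x03, 0x03 header and 0x03, 0x03 trailer from the claim."""
    hi, lo = (value >> 8) & 0xFF, value & 0xFF
    return bytes([0x03, 0x03, hi, lo, 0x03, 0x03])

frame = pack_sample(0x1234)  # 6-byte frame ready for transmission
```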
3. The bidirectional autoregressive non-supervision pretraining fine-tuning sewage anomaly monitoring method according to claim 2, wherein in the step S2, the Mask-based method specifically comprises: a complete time sequence is taken as a sample, 10% of the time sequence is randomly covered, and the time sequence is restored by using a Kalman filtering algorithm.
4. The application of the bidirectional autoregressive non-supervision pre-training fine-tuning sewage anomaly monitoring method to a visual large screen management platform according to claim 3,
the hardware part of the visual large screen management platform comprises: a multi-channel acquisition and transmission module, a data processing module,
the multi-channel acquisition and transmission module comprises: the pressure sensor, the flow velocity meter, the thermometer and the density detector are arranged at the pollution source discharge port to be monitored,
the front end of the data processing module is developed based on a compact framework, and uses axios to interact with the rear end to acquire the analyzed pollution discharge data, and introduces DataV and AntV to visualize the abnormal pollution discharge data;
the back end of the data processing module is developed based on Golang and comprises Gin and GORM frames;
coding a data processing module, namely generating a Docker mirror image by writing Docker file, and performing multi-platform migration operation;
the visual large screen management platform comprises three views of a government, an enterprise and a person.
CN202210687441.8A 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application Active CN115099321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210687441.8A CN115099321B (en) 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application


Publications (2)

Publication Number Publication Date
CN115099321A CN115099321A (en) 2022-09-23
CN115099321B true CN115099321B (en) 2023-08-04

Family

ID=83291933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210687441.8A Active CN115099321B (en) 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application

Country Status (1)

Country Link
CN (1) CN115099321B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115792158B (en) * 2022-12-07 2023-09-15 广东建研环境监测股份有限公司 Method and device for realizing dynamic water quality monitoring based on Internet of things
CN116821697A (en) * 2023-08-30 2023-09-29 聊城莱柯智能机器人有限公司 Mechanical equipment fault diagnosis method based on small sample learning

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102128794A (en) * 2011-01-31 2011-07-20 重庆大学 Manifold learning-based method for monitoring water quality by remote sensing
CN108801950A (en) * 2018-05-21 2018-11-13 东南大学 A kind of ultraviolet spectra abnormal water detection method based on sliding window Multiscale Principal Component Analysis
CN109858572A (en) * 2019-03-13 2019-06-07 中南大学 A kind of modified hierarchy clustering method for sewage abnormality detection
CN111401582A (en) * 2020-03-12 2020-07-10 中交疏浚技术装备国家工程研究中心有限公司 Abnormity identification method and monitoring platform for domestic sewage treatment facility
CN112529678A (en) * 2020-12-23 2021-03-19 华南理工大学 Financial index time sequence abnormity detection method based on self-supervision discriminant network
CN112765896A (en) * 2021-01-29 2021-05-07 湖南大学 LSTM-based water treatment time sequence data anomaly detection method
CN113361199A (en) * 2021-06-09 2021-09-07 成都之维安科技股份有限公司 Multi-dimensional pollutant emission intensity prediction method based on time series
WO2021247408A1 (en) * 2020-06-02 2021-12-09 Pangolin Llc Ai and data system to monitor pathogens in wastewater and methods of use
CN114090396A (en) * 2022-01-24 2022-02-25 华南理工大学 Cloud environment multi-index unsupervised anomaly detection and root cause analysis method
CN114358435A (en) * 2022-01-11 2022-04-15 北京工业大学 Pollution source-water quality prediction model weight influence calculation method of two-stage space-time attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181876A1 (en) * 2016-12-22 2018-06-28 Intel Corporation Unsupervised machine learning to manage aquatic resources




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant