CN115099321A - Bidirectional autoregression unsupervised pre-training fine-tuning type abnormal pollution discharge monitoring method and application - Google Patents


Info

Publication number
CN115099321A
Authority
CN
China
Prior art keywords
data, sequence, time, layer, attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210687441.8A
Other languages
Chinese (zh)
Other versions
CN115099321B (en)
Inventor
叶柯
周奕希
孔佳玉
曹瀚洋
姜沁琬
李宛欣
韩伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210687441.8A priority Critical patent/CN115099321B/en
Publication of CN115099321A publication Critical patent/CN115099321A/en
Application granted granted Critical
Publication of CN115099321B publication Critical patent/CN115099321B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • G06N3/088 — Non-supervised learning, e.g. competitive learning
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00 — Water conservation; Efficient water supply; Efficient water use
    • Y02A20/20 — Controlling water pollution; Waste water treatment

Abstract

The invention belongs to the field of abnormal pollution discharge monitoring and provides a bidirectional autoregression unsupervised pre-training fine-tuning type abnormal pollution discharge monitoring method, comprising the following steps: a multi-channel acquisition and transmission module periodically acquires data from a pollution-source discharge outlet and preprocesses the original multi-dimensional time-series samples; the preprocessed multidimensional time-series samples are resampled; a model comprising three parts — data resampling enhancement, an encoder, and a decoder — is constructed and pre-trained; small-sample fine-tuning and sequence-point classification are carried out on the pre-trained model; and the model monitors abnormal pollution discharge. The method fuses networks with strong generalization capability to fully extract abstract semantic features from the multidimensional time series, giving the model both faster inference and higher precision.

Description

Bidirectional autoregression unsupervised pre-training fine-tuning type abnormal pollution discharge monitoring method and application
Technical Field
The invention relates to the field of abnormal pollution discharge monitoring, in particular to a bidirectional autoregression unsupervised pre-training fine-tuning abnormal pollution discharge monitoring method and application.
Background
Pollution-discharging enterprises are fixed pollution sources, and their discharges are one of the main sources of environmental pollution in China; supervising fixed pollution sources is therefore central to pollution prevention and control. However, monitoring data still becomes abnormal or invalid because enterprises steal discharges, maliciously tamper with monitoring-equipment parameters, damage online monitoring facilities, or run operation and maintenance irregularly and late; the environment is polluted, and higher requirements are placed on supervision. In terms of hardware, water-quality discharge detection at home and abroad is already mature: water-quality sensors, floating monitoring stations, monitoring terminals, and similar devices collect large volumes of discharge-index data accurately and efficiently. In data processing, however, a large gap remains: the collected data are not yet used to judge reasonably and efficiently whether an enterprise discharges legally.
Traditional time-series feature extraction falls into two families: (1) statistics-based decisions, such as the 3-sigma rule and confidence intervals, which evaluate the data of every dimension of the sequence and rely on the statistics to detect outliers and update in real time; (2) manually constructed features with ordinary machine learning, which requires the researcher to understand the meaning of the data clearly enough to perform appropriate feature engineering and ensure model robustness, and which therefore often fails to achieve good results. Learning and extracting multi-dimensional time-series features is a necessary precondition for an anomaly-detection task.
Disclosure of Invention
Aiming at the shortcomings of traditional methods and the characteristics of multi-dimensional time series, the invention performs feature extraction on the multi-dimensional time series with an improved deep-learning network, fuses networks with strong generalization capability to fully extract abstract semantic features, and optimizes parameters while constructing a complex model, so that the model attains both faster inference and higher precision.
The invention provides a bidirectional autoregression unsupervised pre-training fine-tuning type abnormal pollution discharge monitoring method, comprising the following steps:
S1: collecting data
The multi-channel acquisition and transmission module periodically acquires the following data of the pollution source discharge port: pressure, flow rate, temperature, density; obtaining an original multi-dimensional time series sample, and transmitting the original multi-dimensional time series sample to a data processing module;
S2: preprocessing the original multi-dimensional time-series samples;
carrying out denoising and interpolation operation on the data obtained by the data processing module by using a Mask-based method;
S3: resampling the preprocessed multidimensional time-series samples;
performing telescopic transformation on a certain time sequence, performing resampling in a sliding window mode, extracting an original sample at a set time interval, and extracting multi-dimensional time sequence features with a set scale;
S4: constructing and pre-training a model comprising three parts: data resampling enhancement, an encoder, and a decoder;
S5: carrying out small-sample fine-tuning and sequence-point classification on the pre-trained model.
After pre-training, the model has learned layered, multi-scale characteristics of the multi-dimensional time series. These are universal features that capture the internal relations between local parts of the sequence and between the whole and its parts. The pre-trained model is used as the baseline of the downstream anomaly-detection task and fine-tuned on it. The input sequence is taken as the original sequence and the model output is recorded as the reconstruction sequence: the pre-trained model encodes and decodes the original sequence to reconstruct a sequence free of noise-point information while retaining the high-dimensional features of the original. The original sequence is then compared with the reconstructed multi-dimensional time series, the dynamic time warping index of each corresponding point is computed as an anomaly index, and a clustering method classifies points as abnormal or normal;
firstly, 10% of the continuous sequences are randomly extracted as samples from a data set consisting of long multidimensional time series, the training process of S4 is repeated, and small-sample fine-tuning is performed on the downstream task;
the fine adjustment of the small sample comprises the following specific steps: input is a preprocessed raw multidimensional time series X' in ∈ R batch _size×input_window×in_dim Wherein, batch _ size is the size of the training batch, input _ window represents the length of the sequence, i.e. the width of the sliding window in the preprocessing process, and in _ dim represents the dimension of the sample;
is obtained by reconstructing the sequence through an encoder and a decoder
Figure BDA0003700156600000021
Obtaining the loss of each time point, and classifying the sequence points by a K-means clustering method, wherein the method comprises the following specific steps:
(1) set the parameter K, the number of clusters into which the data are aggregated; here K = 2;
(2) randomly select two points from the data as the initial cluster centers;
(3) compute the distance from every other point to these two centers, find the nearest center for each data point, and assign the point to that center's cluster, so that all points fall into two clusters;
(4) recompute the centroids of the two clusters as the center points for the next round;
(5) repeat steps (3)–(4), clustering again and iterating continuously;
(6) stop when no sample point changes its cluster after re-clustering;
finally, two classes of samples are obtained, and the class with fewer samples is the abnormal one;
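The six clustering steps above can be sketched as a minimal one-dimensional two-means over per-point reconstruction losses (function name and toy data are illustrative):

```python
import numpy as np

def kmeans_two_clusters(losses, max_iter=100, seed=0):
    """Cluster per-point losses into K=2 groups (steps (1)-(6) above);
    returns a boolean mask marking the smaller (anomalous) cluster."""
    x = np.asarray(losses, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=2, replace=False)   # step (2): random initial centers
    labels = np.full(len(x), -1)                     # -1 forces at least one update round
    for _ in range(max_iter):
        # step (3): assign every point to the nearest of the two centers
        new_labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        if np.array_equal(new_labels, labels):
            break                                    # step (6): no point changed cluster
        labels = new_labels
        for k in (0, 1):                             # step (4): recompute centroids
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    anomaly_cluster = int(np.argmin(np.bincount(labels, minlength=2)))
    return labels == anomaly_cluster                 # smaller cluster = abnormal

# small reconstruction losses plus two obvious spikes
flags = kmeans_two_clusters([0.1, 0.2, 0.15, 5.0, 0.12, 4.8])
```

On this toy input the two spike losses end up in the smaller cluster and are flagged as anomalous.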
In practice, enterprises privately modify uploaded data to keep reported emissions within standards: when a received emission index rises over some period but drops suddenly and sharply after a certain time point, producing a spike-shaped change that has not occurred in the recent past, it is reasonable to infer that the monitored party, seeing the index rising toward the limit, externally interfered with the detection equipment or tampered with the uploaded data. The corresponding time point is marked as abnormal data, the company and any companies sharing the same sewage outlet are flagged as abnormal, and finally all abnormal points are found by the clustering method.
S6: and monitoring abnormal pollution discharge by using the model obtained in the step S5.
Preferably, in step S1, the acquisition of data by the multi-channel acquisition and transmission module comprises the steps of initialization, reading serial-port interrupt data once, and packing and sending the data;
the initialization step includes: performing initialization configuration on an ESP8266 chip operating environment;
the step of reading serial-port interrupt data once is specifically: the data type for temporarily storing serial-port interrupt data is set to the unsigned int type of a 32-bit computer, with a data length of 16 bits; the data are updated in real time on each serial-port interrupt, and secondary filtering is performed within a fixed-period interrupt;
the data packing and sending is specifically: the ESP8266 chip splits each 16-bit reading into its first eight bits and last eight bits, frames the data with 0x03 bytes as the frame header and trailer markers, and finally transmits the data in character form over the network to the data processing module, with the ESP8266 chip acting as a wireless hotspot.
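The byte-splitting and framing described above can be sketched as follows; this is a hypothetical Python model of the frame, since the patent names only the 0x03 markers and the exact firmware layout is an assumption:

```python
def pack_sample(value: int) -> bytes:
    """Split a 16-bit sensor reading into high/low bytes and wrap it in
    0x03 frame header/trailer markers (assumed frame layout)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("reading must fit in 16 bits")
    high, low = (value >> 8) & 0xFF, value & 0xFF
    return bytes([0x03, high, low, 0x03])

def unpack_sample(frame: bytes) -> int:
    """Inverse of pack_sample: check the markers and rejoin the two bytes."""
    if frame[0] != 0x03 or frame[-1] != 0x03:
        raise ValueError("missing frame markers")
    return (frame[1] << 8) | frame[2]
```

A round trip (`unpack_sample(pack_sample(v)) == v`) confirms that splitting into first/last eight bits loses no information.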
Preferably, in step S2, the Mask-based method is specifically: taking a complete time sequence as a sample, randomly covering 10% of it, and restoring it with a Kalman filtering algorithm.
Preferably, the step S4 specifically includes the following sub-steps:
s4.1: sequence destruction using a noise function:
destroying the time sequence obtained in step S3 by using any one or a combination of the five noise functions Token Masking, Token Deletion, Text Infilling, Sentence Permutation, and Document Rotation;
s4.2: constructing a network framework part of an encoder:
selecting a self-attention layer and an MLP network as backbone networks, and iterating for 12 times to form an encoder;
standardizing the multidimensional time series to eliminate the influence of dimension (units),
adding position information to the multidimensional time series with the positional encoding PE:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
where pos is the position of a time point, 2i and 2i+1 index the dimensions of that time point (sine encoding on dimensions 2i, cosine on dimensions 2i+1), and d_model is the total data dimension; dividing the exponent by d_model prevents the power of 10000 from growing too large and overflowing;
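The positional-encoding formula can be sketched as follows (assuming an even d_model; the function name is illustrative):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encoding: sin on dimensions 2i, cos on 2i+1.
    Assumes d_model is even."""
    pos = np.arange(seq_len)[:, None]             # time-point positions
    i = np.arange(0, d_model, 2)[None, :]         # paired dimension indices 2i
    angle = pos / np.power(10000.0, i / d_model)  # exponent scaled by d_model to avoid overflow
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe
```

The encoding is added element-wise to the standardized series before the attention layers.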
generating the three matrices Q, K, and V with three linear layers, and letting each Q attend to every K; after scaling, softmax converts the scores into exponentials with base e and normalizes them:
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
the normalized scores are used as weights on V, giving the attention value for the subsequent MLP layer and the decoder to perform sequence reconstruction:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
a multi-head mechanism is introduced to accommodate higher-dimensional time-series information:
MultiHead(Q, K, V) = [head_1, ..., head_h] W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
where h is the number of attention heads; when using multi-head attention the input dimension must be evenly divisible by the number of heads, the input dimension is split into h groups, and each group of features has its own attention head;
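The scaled dot-product and multi-head computations above can be sketched in NumPy; as a simplification, the per-head projections W_i and the output projection W^O are omitted, so this shows only the split-attend-concatenate mechanics:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.swapaxes(-1, -2) / np.sqrt(d_k))
    return weights @ V

def multi_head_attention(Q, K, V, h):
    """Split the model dimension into h groups; each head attends independently."""
    d_model = Q.shape[-1]
    assert d_model % h == 0, "input dimension must be divisible by the head count"
    def split(x):  # (T, d_model) -> (h, T, d_model // h)
        T = x.shape[0]
        return x.reshape(T, h, d_model // h).transpose(1, 0, 2)
    heads = scaled_dot_product_attention(split(Q), split(K), split(V))
    T = Q.shape[0]
    return heads.transpose(1, 0, 2).reshape(T, d_model)  # concatenate the h heads
```

The divisibility assertion mirrors the constraint stated above: d_model must be evenly divisible by h.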
s4.3: constructing a decoder part:
taking a multi-head self-attention layer and an MLP layer as the network skeleton; after one stack, cross multi-head attention aggregation (Cross Attention) performs attention aggregation with the hidden-state result of the last encoder layer, followed by multiple further stacks of the self-attention and MLP layers;
taking the K and V obtained by the self-attention mechanism in the encoder and the Q obtained by training in the decoder for the aggregation computation, and adding a normalization layer after each MLP layer using Layer Normalization; with H the number of hidden nodes in a layer and l the MLP layer index, the Layer Normalization statistics μ^l and σ^l are computed as
μ^l = (1/H) Σ_{i=1}^{H} a_i^l,  σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l − μ^l)² )
where a^l is the input multidimensional time series at layer l; the statistics μ^l and σ^l are independent of the number of samples and depend only on the number of hidden nodes,
so as long as there are enough hidden nodes, the Layer Normalization statistics are sufficiently representative; the data output after the l-th MLP layer is a_i^l, where i indexes the dimension (in_dim); passing through μ^l and σ^l gives the normalized value
ā_i^l = (a_i^l − μ^l) / (σ^l + ε)
where ε is taken as 1e−5 to prevent division by zero;
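A minimal sketch of these statistics, normalizing over the hidden dimension H independently of the batch size:

```python
import numpy as np

def layer_norm(a, eps=1e-5):
    """Layer Normalization: mu and sigma are computed per sample over the
    hidden dimension, so they do not depend on the number of samples."""
    mu = a.mean(axis=-1, keepdims=True)                          # mu^l over H hidden units
    sigma = np.sqrt(((a - mu) ** 2).mean(axis=-1, keepdims=True))
    return (a - mu) / (sigma + eps)                              # eps = 1e-5 avoids division by zero
```

After normalization each row has (approximately) zero mean and unit variance regardless of its original scale.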
performing recovery reconstruction on the damaged time sequence in an autoregressive mode; the restoration reconstruction degree is evaluated through a dynamic time warping index;
the dynamic time warping index is computed as follows: after aligning the two time series, a difference matrix is computed; the goal is to find a path from (0, 0) to (n, n) in the matrix that minimizes the accumulated Euclidean distance of the elements on the path; this minimal path is the warping path, which is the dynamic time warping index and characterizes the similarity of the two time series:
an n×n matrix is constructed whose (i, j) element is the Euclidean distance d(q_i, c_j) between the points q_i and c_j; the warping path defines the mapping between the time series Q and C, denoted P, a set of consecutive matrix elements whose t-th element is p_t = d(q_i, c_j)_t, where P = {p_1, p_2, ..., p_T} and n ≤ T ≤ 2n−1; it is obtained essentially by dynamic programming:
d ij =d(x i ,y j )
D(i,j)=d ij +min{D(i-1,j),D(i,j-1),D(i-1,j-1)}
where D(i−1, j) is the subsequence distance when matching x_{i−1} and y_j, D(i, j−1) when matching x_i and y_{j−1}, and D(i−1, j−1) when matching x_{i−1} and y_{j−1};
in a multivariate time series, x_i and y_j are both in_dim-dimensional vectors; the elements of x_i are the variable values at time i and the elements of y_j are the variable values at time j, so d(x_i, y_j) is the distance when x_i at time i is aligned with y_j at time j; the distance between the vectors x_i and y_j, d(x_i, y_j), can be computed with the Euclidean distance or the Mahalanobis distance;
Euclidean distance:
d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} − y_{j,k})² )
for a multivariate x = (x_1, x_2, x_3, ..., x_p)^T with mean μ = (μ_1, μ_2, μ_3, ..., μ_p)^T and covariance matrix S, the Mahalanobis distance is:
D_M(x) = sqrt( (x − μ)^T S^{−1} (x − μ) )
it differs from the Euclidean distance in that it accounts for the correlations between features and is scale-invariant, i.e. independent of the measurement scale.
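The dynamic-programming recurrence and both distance options can be sketched as follows (toy univariate sequences are used for illustration; the Mahalanobis branch expects the precomputed inverse covariance matrix):

```python
import numpy as np

def dtw_distance(Q, C, dist="euclidean", S_inv=None):
    """DTW via the recurrence D(i,j) = d_ij + min(D(i-1,j), D(i,j-1), D(i-1,j-1));
    returns the cumulative cost of the optimal warping path."""
    def d(x, y):
        diff = x - y
        if dist == "mahalanobis":                 # needs S_inv, the inverse covariance matrix
            return float(np.sqrt(diff @ S_inv @ diff))
        return float(np.sqrt(diff @ diff))        # Euclidean distance between aligned points
    n, m = len(Q), len(C)
    D = np.full((n + 1, m + 1), np.inf)           # border of inf enforces the path start
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = d(Q[i - 1], C[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because warping may repeat points, a sequence and a stretched copy of it have DTW distance zero even though they differ pointwise.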
S44: the pre-training process comprises the following steps:
resampling the acquired data as the original sequence, then destroying the sequence, passing it into the model for forward inference (Forward), and processing it with the encoder and decoder to obtain a new sequence, recorded as the reconstruction sequence of the original sequence;
measuring the similarity between the original sequence and the reconstructed sequence with the dynamic time warping index, denoted DTW, and constructing the loss function:
Loss = (1 / batch_size) Σ_{i=1}^{batch_size} DTW(X_i, X̂_i)
After the pre- and post-reconstruction loss is computed, back-propagation updates the model parameters and the loss gradually decreases until it converges and no longer falls; in this process the model gradually learns the features of the multi-dimensional time series and gains the ability to extract them automatically. This is the pre-training process, which finally yields a pre-trained model with sequence-feature-extraction capability for use by the downstream tasks.
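The loss construction can be sketched as a batch mean over a supplied per-pair DTW distance; averaging over the batch is an assumption about the exact aggregation, and the helper name is illustrative:

```python
import numpy as np

def reconstruction_loss(batch_orig, batch_recon, pair_distance):
    """Mean per-pair distance (e.g. DTW) between each original sequence
    and its reconstruction across the batch."""
    return float(np.mean([pair_distance(x, x_hat)
                          for x, x_hat in zip(batch_orig, batch_recon)]))

# toy usage with a simple absolute-difference distance standing in for DTW
loss = reconstruction_loss(
    [np.zeros((2, 1)), np.ones((2, 1))],
    [np.zeros((2, 1)), np.zeros((2, 1))],
    lambda a, b: float(np.abs(a - b).sum()),
)
```

In training, this scalar would be back-propagated to update the encoder and decoder parameters until it converges.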
The invention also provides an application of the bidirectional autoregression unsupervised pre-training fine-tuning type abnormal pollution discharge monitoring method on a visualized large-screen management platform. The hardware part of the platform comprises a multi-channel acquisition and transmission module and a data processing module; the multi-channel acquisition and transmission module includes a pressure sensor, a flow-rate meter, a thermometer, and a density detector, installed at the discharge outlet of the pollution source to be monitored. The front end of the data processing module is developed on the React framework, interacts with the back end via axios to obtain the analyzed pollution-discharge data, and introduces DataV and AntV to visualize abnormal discharge data; the back end is developed in Golang with the Gin and GORM frameworks. The data processing module is containerized: a Dockerfile is written to generate a Docker image for multi-platform migration and operation. The visualized large-screen management platform provides three perspectives: government, enterprise, and individual.
Drawings
To illustrate the embodiments of the present invention or the prior-art solutions more clearly, the attached drawings used in their description are briefly introduced below; some specific embodiments of the invention will be described in detail with reference to them, by way of example and not limitation. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a functional flow diagram of the present invention;
FIG. 2 is a table in the specified format uploaded by a user.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The flow chart of the bidirectional autoregression unsupervised pre-training fine-tuning type abnormal pollution discharge monitoring method is shown in FIG. 1; the specific implementation steps are as follows:
s1.1: and selecting a type of the hardware equipment.
Data sensors such as a pressure sensor, a flow-velocity meter, a thermometer, and a density detector are selected, and data are collected near the discharge outlet as indices for pollution detection. The ESP8266 is selected as the main-control and wireless-transmission module. It integrates the industry-leading Tensilica L106 ultra-low-power 32-bit micro MCU in a small package, with a 16-bit reduced-instruction mode, a main frequency of 80 MHz or 160 MHz, RTOS support, integrated Wi-Fi MAC/BB/RF/PA/LNA, and an onboard antenna. The module supports the standard IEEE 802.11 b/g/n protocol and a complete TCP/IP protocol stack.
S1.2: the data acquisition program mainly comprises modules of initializing, reading once serial port interrupt data, packaging and sending data and the like.
The initialization mainly performs initialization configuration of the MCU running environment. The data type for temporarily storing serial-port interrupt data is set to the unsigned int type of a 32-bit computer, with a data length of 16 bits; the data are updated in real time on each serial-port interrupt, and secondary filtering is performed within a fixed-period interrupt. The MCU splits each 16-bit reading into its first eight bits and last eight bits, frames the data with 0x03 bytes as the frame header and trailer markers, and finally sends the data in character form. For uploading, the ESP8266 module connects to a computer in hotspot mode to achieve wireless transmission, and the data are then passed into the network and uploaded to the cloud data center.
S2: preprocessing an original multi-dimensional time series sample;
selecting an interpolation algorithm by using a Mask-based method, and carrying out denoising and interpolation operations on the data obtained by the data processing module;
Taking a complete time sequence as a sample, 10% of it is randomly covered and then restored with different interpolation algorithms such as linear interpolation, quadratic interpolation, moving average, and exponential average; the smaller the difference between the restored sequence and the original, the better that interpolation scheme fits the characteristics of the sequence. Kalman filtering is finally selected as the interpolation algorithm.
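The mask-and-restore comparison used to select the interpolation algorithm can be sketched as follows; linear interpolation stands in for the candidate algorithms (Kalman filtering itself is not implemented here), and the function name is illustrative:

```python
import numpy as np

def mask_and_score(series, frac=0.10, seed=0):
    """Randomly cover `frac` of the points, restore them by linear
    interpolation, and return the mean absolute restoration error
    on the covered points (lower = better fit to the sequence)."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    n = len(series)
    # mask interior points only, so interpolation never has to extrapolate
    masked_idx = rng.choice(np.arange(1, n - 1),
                            size=max(1, int(frac * n)), replace=False)
    kept = np.setdiff1d(np.arange(n), masked_idx)
    restored = np.interp(masked_idx, kept, series[kept])  # stand-in for Kalman filtering
    return float(np.abs(restored - series[masked_idx]).mean())

# a smooth (linear) series is restored almost exactly by linear interpolation
err = mask_and_score(np.linspace(0.0, 1.0, 50))
```

Running the same scoring with each candidate interpolator and picking the lowest error implements the selection procedure described above.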
S3: resampling the preprocessed multidimensional time series samples;
and performing telescopic transformation on a certain time sequence, performing resampling in a sliding window mode, extracting original samples at different time intervals, and extracting multi-dimensional time sequence features of different scales.
A sliding window of fixed length is determined and then moved continuously to the right from the beginning of the sequence with a certain step length; the sequence region covered after each move is one small sample, so the original long time sequence is divided into several subsequences that serve as a new data set for training the model. With a preset sliding-window size input_window, total input-sequence length seq_len, and multidimensional time-series dimension in_dim, the input sample is the two-dimensional tensor X_in ∈ R^(seq_len×in_dim), and multi-scale stretching transformation plus sliding-window resampling yields the three-dimensional tensor X′_in ∈ R^(batch_size×input_window×in_dim).
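The sliding-window resampling and the resulting tensor shapes can be sketched as:

```python
import numpy as np

def sliding_window_resample(X, input_window, step=1):
    """Split X in R^(seq_len x in_dim) into overlapping windows,
    producing X' in R^(batch_size x input_window x in_dim)."""
    seq_len, in_dim = X.shape
    starts = range(0, seq_len - input_window + 1, step)   # left edge of each window
    return np.stack([X[s:s + input_window] for s in starts])

X = np.arange(20, dtype=float).reshape(10, 2)   # seq_len = 10, in_dim = 2
Xp = sliding_window_resample(X, input_window=4, step=2)
```

Here the ten-step sequence yields four overlapping windows, i.e. batch_size = 4 subsequences of shape (4, 2).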
S4: constructing a model comprising three parts of data resampling enhancement, an encoder and a decoder and pre-training;
s4.1: the sequence is destroyed using a noise function.
The five noise functions Token Masking, Token Deletion, Text Infilling, Sentence Permutation, and Document Rotation are used to destroy the time-series input during pre-training, i.e. the time sequence obtained in step S3. This increases the difficulty of the sequence-reconstruction task in the pre-training stage, so that the model learns to extract the features of the multi-dimensional time series better.
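Time-series analogues of three of these noise functions can be sketched as follows; this is an assumed adaptation of the BART-style text noise functions to numeric sequences, and the names, fractions, and mask value are illustrative:

```python
import numpy as np

def corrupt_series(x, method, rng=None, frac=0.15, mask_value=0.0):
    """Destroy a 1-D series with a chosen noise function:
    mask points, delete points, or rotate the whole sequence."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    if method == "token_masking":          # replace random points with a mask value
        idx = rng.choice(n, size=max(1, int(frac * n)), replace=False)
        x[idx] = mask_value
        return x
    if method == "token_deletion":         # remove random points entirely
        idx = rng.choice(n, size=max(1, int(frac * n)), replace=False)
        return np.delete(x, idx, axis=0)
    if method == "document_rotation":      # rotate so the sequence starts mid-stream
        k = int(rng.integers(1, n))
        return np.roll(x, k, axis=0)
    raise ValueError(f"unknown noise function: {method}")
```

Each corruption forces the encoder-decoder to reconstruct the missing or displaced structure rather than copy its input.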
S4.2: and constructing a network framework part of the encoder.
A self-attention layer and an MLP network are selected as the backbone and iterated 12 times to form the encoder. The multidimensional time series is first standardized to eliminate the influence of dimension (units), and position information is added with Positional Encoding (PE) to prevent the position information from being lost through the parallelism of the subsequent attention mechanism:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
pos refers to the position of a time point, 2i and 2i+1 index the dimensions of that time point (sine encoding on dimensions 2i, cosine on 2i+1), and d_model is the total data dimension; the exponent is divided by d_model to prevent the power of 10000 from overflowing. Three linear layers then generate the matrices Q (query), K (key), and V (value); each Q attends to every K, and after scaling, softmax converts the scores into exponentials with base e and normalizes them:
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
The normalized scores serve as weights on V, yielding the attention value for the subsequent MLP layer and the decoder to reconstruct the sequence:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
A multi-head mechanism is introduced to accommodate higher-dimensional time-series information:
MultiHead(Q, K, V) = [head_1, ..., head_h] W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
where h is the number of attention heads; when using multi-head attention the input dimension must be evenly divisible by the number of heads, the input dimension is split into h groups, and each group of features has its own attention head.
S4.3: the decoder portion is constructed.
Likewise, the multi-head self-attention layer and the MLP layer form the network skeleton; after one stack, cross multi-head attention aggregation (Cross Attention) performs attention aggregation with the hidden-state result of the last encoder layer, and the self-attention and MLP layers are then stacked several more times. The K (key) and V (value) obtained by the self-attention mechanism in the encoder and the Q (query) obtained by training in the decoder are aggregated, and a normalization layer is added after each MLP layer using Layer Normalization; with H the number of hidden nodes in a layer and l the MLP layer index, the Layer Normalization statistics μ^l and σ^l are computed as
μ^l = (1/H) Σ_{i=1}^{H} a_i^l,  σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l − μ^l)² )
The computation of these statistics is independent of the number of samples and depends only on the number of hidden nodes, so as long as there are enough hidden nodes, the Layer Normalization statistics are sufficiently representative; the data output after the l-th MLP layer is a_i^l, where i indexes the dimension (in_dim); passing through μ^l and σ^l gives the normalized value
ā_i^l = (a_i^l − μ^l) / (σ^l + ε)
where ε is taken as 1e−5 to prevent division by zero. Finally, the destroyed time sequence is restored and reconstructed in an autoregressive manner, and the degree of restoration is evaluated by the dynamic time warping index: after aligning two time series, a difference matrix is computed, and the goal is to find a path from (0, 0) to (n, n) in the matrix that minimizes the accumulated Euclidean distance of the elements on the path; this path is called the warping path, which is the dynamic time warping index and characterizes the similarity of the two time series:
First, an n×n matrix is constructed whose (i, j) element is the Euclidean distance d(q_i, c_j) between the points q_i and c_j. The warping path defines the mapping between the time series Q and C, denoted P, a set of consecutive matrix elements whose t-th element is p_t = d(q_i, c_j)_t, where P = {p_1, p_2, ..., p_T} and n ≤ T ≤ 2n−1; it is obtained essentially by dynamic programming:
d ij =d(x i ,y j )
D(i,j)=d ij +min{D(i-1,j),D(i,j-1),D(i-1,j-1)}
where D(i-1, j) represents the subsequence distance when x_{i-1} and y_j are matched, D(i, j-1) the subsequence distance when x_i and y_{j-1} are matched, and D(i-1, j-1) the subsequence distance when x_{i-1} and y_{j-1} are matched. In a multivariate time series, x_i and y_j are both in_dim-dimensional vectors; the elements of x_i are the variable values at time i and the elements of y_j are the variable values at time j, so d(x_i, y_j) is the distance when x_i at time i and y_j at time j are aligned. The distance d(x_i, y_j) between the vectors x_i and y_j is calculated as the Euclidean distance.
Euclidean distance:
d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} - y_{j,k})^2 )
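As an illustrative sketch only (not the patented implementation), the dynamic programming recursion above can be written out directly; the short sequences at the bottom are invented for the example:

```python
import math

def euclidean(x, y):
    # d(x_i, y_j): Euclidean distance between two in_dim-dimensional points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dtw(Q, C):
    # D(i, j) = d_ij + min{D(i-1, j), D(i, j-1), D(i-1, j-1)}
    n, m = len(Q), len(C)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d_ij = euclidean(Q[i - 1], C[j - 1])
            D[i][j] = d_ij + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # accumulated distance along the optimal warping path; 0 means identical
    return D[n][m]

q = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
print(dtw(q, q))  # 0.0
```

A smaller value indicates more similar sequences; in the pre-training step this accumulated distance is the quantity used as the reconstruction loss.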
s4.4: pre-training process
The acquired data are resampled to serve as the original sequence; the sequence is then corrupted and passed into the model for forward inference (Forward), where it is processed by the encoder and decoder to obtain a new sequence, recorded as the reconstruction sequence of the original sequence.
And then, measuring the similarity between the original sequence and the reconstructed sequence by using a dynamic time warping index, recording the similarity as DTW, and constructing a loss function:
Loss = DTW(X, X̂), where X is the original sequence and X̂ is the reconstructed sequence.
After the loss before and after reconstruction is calculated, back propagation is performed and the model parameters are updated so that the loss decreases until it converges and no longer falls. In this process the model gradually learns the features in the multi-dimensional time series and acquires the function of automatically extracting features. This is the pre-training process, which finally yields a pre-trained model with sequence feature-extraction capability for use by downstream tasks.
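One iteration of this pre-training loop can be sketched as follows. This is a minimal illustration under stated assumptions: reconstruct() is only a placeholder for the encoder-decoder forward pass, dtw_loss() stands in for the dynamic time warping index with a mean absolute difference, and the mask ratio and sequence values are hypothetical:

```python
import random

def corrupt(seq, mask_ratio=0.1, mask_value=0.0):
    # sequence destruction: randomly mask a fraction of the time points
    out = list(seq)
    k = max(1, int(len(seq) * mask_ratio))
    for i in random.sample(range(len(seq)), k):
        out[i] = mask_value
    return out

def reconstruct(corrupted):
    # placeholder for the encoder-decoder forward inference (Forward);
    # the real model restores the damaged sequence autoregressively
    return corrupted

def dtw_loss(original, reconstructed):
    # stand-in for the DTW similarity used to build the loss function
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)

random.seed(0)
original = [float(i) for i in range(10)]   # resampled "original sequence"
damaged = corrupt(original)                # sequence destruction
recon = reconstruct(damaged)               # forward pass -> reconstruction
loss = dtw_loss(original, recon)           # loss to back-propagate
print(loss >= 0.0)  # True
```

In the actual method, back propagation on this loss updates the encoder-decoder parameters until the loss converges.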
S5: and carrying out small sample fine tuning and sequence point classification on the pre-trained model.
With a small number of samples, the training process is repeated, and fine tuning (Fine-tuning) on the small sample is done in the downstream tasks.
The data set is a long multidimensional time series. First, 10% of continuous sequence is randomly extracted from the data set as samples, the training process is repeated, and fine tuning (Fine-tuning) on the small sample is carried out in the downstream task: the input is a preprocessed multidimensional time series X'_in ∈ R^{batch_size×input_window×in_dim}, which is reconstructed through the encoder and decoder to obtain
X̂_out = Decoder(Encoder(X'_in)), X̂_out ∈ R^{batch_size×input_window×in_dim}
The dynamic time warping index representing the similarity of the two time series is then calculated, obtaining the loss of each time point:
Loss_i = DTW(x'_i, x̂_i)
And finally, carrying out K-means clustering on the Loss to obtain an anomaly point:
(1) first, a parameter K is set, K meaning the number of classes into which the data are aggregated (here K is taken as 2);
(2) two points are randomly selected from the data as the initial cluster center points;
(3) the distances from all other points to these two center points are calculated, the nearest center point is found for each data point, and the point is assigned to the cluster represented by that center point; all points are thus divided into two clusters;
(4) the centroids of the two clusters are recalculated as the center points of the next round of clustering;
(5) steps (3)-(4) are repeated, clustering again and iterating continuously;
(6) the iteration stops when no sample point changes its cluster after re-clustering;
finally, two classes of samples are obtained, and the class with fewer samples is abnormal.
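Steps (1)-(6) can be sketched as follows. For determinism this sketch initializes the two centers at the minimum and maximum loss instead of the random selection in step (2); the loss values are invented for the example:

```python
def kmeans_1d(values, k=2, iters=100):
    # steps (1)-(2): K = 2; deterministic min/max init replaces random choice
    centers = [min(values), max(values)]
    assign = [0] * len(values)
    for _ in range(iters):
        # step (3): assign each point to its nearest center
        new_assign = [min(range(k), key=lambda c: abs(v - centers[c]))
                      for v in values]
        # step (6): stop when no point changes its cluster
        if new_assign == assign:
            break
        assign = new_assign
        # step (4): recompute the centroid of each cluster
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign

# per-time-point reconstruction losses: mostly small, a few large
losses = [0.1, 0.2, 0.15, 0.12, 5.0, 0.11, 4.8]
labels = kmeans_1d(losses)
counts = [labels.count(0), labels.count(1)]
anomaly_label = counts.index(min(counts))          # smaller cluster = abnormal
anomalies = [i for i, a in enumerate(labels) if a == anomaly_label]
print(anomalies)  # [4, 6]
```

Here the smaller cluster (the two large losses at indices 4 and 6) is reported as the anomaly class, matching the rule that the class with fewer samples is abnormal.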
In practice, to ensure that emissions do not appear to exceed the standard, enterprises may privately modify the uploaded data: when the received emission index rises over a certain period but drops suddenly and sharply after some time point, a spike-shaped data change is produced that did not occur in the recent past. It is then reasonable to infer that the monitored party, finding the emission index rising and about to exceed the standard, externally interfered with the detection equipment, or that the uploaded data were tampered with. The corresponding time point is marked as abnormal, the company and the companies sharing the same sewage outlet are marked as abnormal, and finally all abnormal points are found by the clustering method.
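The spike-shaped pattern described above (a sustained rise followed by a sudden sharp drop) can be sketched as a simple screening heuristic; the window length, drop threshold, and readings below are hypothetical values, not parameters from the method:

```python
def find_spikes(series, rise_len=3, drop_ratio=0.5):
    # flag time points where the index rose for `rise_len` consecutive steps
    # and then fell by more than `drop_ratio` of its value in a single step
    flagged = []
    for t in range(rise_len, len(series) - 1):
        rising = all(series[i] < series[i + 1] for i in range(t - rise_len, t))
        sudden_drop = series[t + 1] < series[t] * (1.0 - drop_ratio)
        if rising and sudden_drop:
            flagged.append(t)  # candidate tampering point
    return flagged

# emission index climbs toward the limit, then collapses abruptly
readings = [1.0, 1.2, 1.5, 1.9, 2.4, 0.6, 0.7]
print(find_spikes(readings))  # [4]
```

In the patented method such points are ultimately confirmed by the clustering step rather than by a fixed threshold; this heuristic only illustrates the shape being described.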
S6: and monitoring abnormal pollution discharge by using the model obtained in the step S5.
Furthermore, the model is integrated into a visual large-screen management platform, and the front and back ends are built to form a management system with three views: government, enterprise and individual.
The front end is developed based on the React framework; Axios is used to interact with the back end to obtain the analyzed company pollution discharge data, and DataV and AntV are introduced to visualize the abnormal discharge data, so that the abnormal state of a company's discharge is displayed intuitively. The back end is developed in Golang, mainly with the lightweight Gin and GORM frameworks. The persistence layer is written with GORM to read, write and manage MySQL data efficiently. In system design, users are divided into three levels (government managers, company personnel and visitors), each with different permissions: managers (government administrators) can modify a company's abnormal state, companies can file complaints about anomalies, visitors can submit reports, and so on. The Gin framework is combined with Python Flask: Gin handles routing, while Flask is responsible for data processing and model execution. Data-processing libraries such as NumPy and Pandas and a model written with the PyTorch framework produce the relevant indices and data, which are returned to the front end for visual display. The advantages of the two programming languages are complementary and organically combined, giving full play to their functional strengths in different fields.
A Dockerfile is written to package the local project and its dependencies into a Docker image, which is uploaded to a repository; the server side pulls the image and creates containers to deploy the application services. Docker runs on many platforms, so an application running on one platform can easily be migrated to another without worrying that a change of runtime environment will break it. The application services are deployed on an Alibaba Cloud server using Docker; operations that would otherwise be complicated are easily implemented with the support of Docker container technology.
The invention can reconstruct the pollution discharge time-series data of a certain outlet over a certain period from the standard data set (namely the data set uploaded after hardware acquisition), compare the differences before and after reconstruction, and perform clustering to obtain the anomalies.
The user can also upload a table in a specified format, helping the user autonomously analyze multi-dimensional time-series anomalies:
The table uploaded by the user is shown in Fig. 2. The user selects the start and end times in advance, the multi-dimensional time series is analyzed for anomalies, and the system returns the abnormal time points together with the single-dimension anomaly comparison between the two selected times (such as the increase in the number of abnormal points), as well as the multi-dimension anomaly comparison between the two selected times (that is, the comprehensive situation of all pollutants is considered and statistical anomaly information is output).
The above description is only a partial embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution that can easily be conceived by those skilled in the art within the technical scope disclosed by the present invention shall also fall within the scope of protection of the present invention.

Claims (5)

1. A bidirectional autoregressive unsupervised pre-training fine-tuning type abnormal pollution discharge monitoring method,
the method is characterized by comprising the following steps:
s1: collecting data:
the multi-channel acquisition and transmission module periodically acquires the following data of the pollution source discharge port: pressure, flow rate, temperature, density; obtaining an original multi-dimensional time series sample, and transmitting the original multi-dimensional time series sample to a data processing module;
s2: preprocessing an original multi-dimensional time series sample:
carrying out denoising and interpolation operation on the data obtained by the data processing module by using a Mask-based method;
s3: resampling the preprocessed multidimensional time series samples:
performing scaling transformation on a given time sequence, resampling in a sliding-window manner, extracting original samples at a set time interval, and extracting multi-dimensional time sequence features at a set scale;
s4: constructing a model comprising three parts of data resampling enhancement, an encoder and a decoder and pre-training;
s5: carrying out small sample fine tuning and sequence point classification on the pre-trained model:
firstly, 10% of continuous sequence is randomly extracted from a data set consisting of a long multidimensional time series as samples, the training process in S4 is repeated, and fine tuning of the small sample is performed in the downstream task;
the fine tuning of the small sample comprises the following specific steps: the input is a preprocessed original multidimensional time series X'_in ∈ R^{batch_size×input_window×in_dim}, wherein batch_size is the size of the training batch, input_window represents the length of the sequence, i.e. the width of the sliding window in the preprocessing process, and in_dim represents the dimension of the sample;
the sequence is reconstructed through the encoder and the decoder to obtain
X̂_out ∈ R^{batch_size×input_window×in_dim};
Obtaining the loss of each time point, and classifying the sequence points by a K-means clustering method, wherein the method comprises the following specific steps:
(1) firstly, a parameter K is set, K meaning the number of classes into which the data are aggregated, with K taken as 2;
(2) two points are randomly selected from the data as the initial cluster center points;
(3) the distances from all other points to the two center points are calculated, the nearest center point is found for each data point, and the point is assigned to the cluster represented by that center point; all points are thus divided into two clusters;
(4) the centroids of the two clusters are recalculated as the center points of the next round of clustering;
(5) the processes of steps (3)-(4) are repeated, clustering again and iterating continuously;
(6) the iteration stops when the attribution category of no sample point changes after re-clustering;
finally, two classes of samples are obtained, and the class with fewer samples is abnormal;
s6: and monitoring abnormal pollution discharge by using the model obtained in the step S5.
2. The bi-directional autoregressive unsupervised pre-trained fine tuned emissions anomaly monitoring method of claim 1, wherein in said step S1,
the steps of the multi-channel acquisition and transmission module for acquiring data comprise: initialization, reading serial-port interrupt data once, and packing and sending the data;
the initialization step includes: performing initialization configuration on an ESP8266 chip operating environment;
the step of reading the serial-port interrupt data once specifically comprises: setting the data type of the temporarily stored serial-port interrupt data to the unsigned int type of a 32-bit computer, with a data length of 16 bits; the data are updated in real time on each serial-port interrupt, and secondary filtering is performed during the fixed-time interrupt;
the specific steps of the data packing and sending are as follows: the ESP8266 chip splits the read sixteen-bit data into the first eight bits and the last eight bits, the data are packaged with 0x03 and 0x03 as the data header and 0x03 and 0x03 as the data tail, and finally the ESP8266 chip, working in wireless hotspot mode, transmits the data in character form to the data processing module through the network.
3. The method for monitoring abnormal pollution discharge in a bidirectional autoregressive unsupervised pre-training fine tuning manner as claimed in claim 2, wherein in the step S2, the method based on Mask is specifically as follows: and taking a complete time sequence as a sample, randomly covering 10% of the sample, and restoring the sample by using a Kalman filtering algorithm.
4. The method for bi-directional autoregressive unsupervised pre-trained fine-tuned emissions anomaly monitoring according to claim 3, wherein said step S4 comprises the following sub-steps:
s4.1: sequence destruction using noise function:
destroying the time sequence obtained in the step S3 by using any one or a combination of the five noise functions Token Masking, Token Deletion, Text Infilling, Sentence Permutation and Document Rotation;
s4.2: constructing a network framework part of an encoder:
selecting a self-attention layer and an MLP network as backbone networks, and iterating for 12 times to form an encoder;
standardizing the multidimensional time series to eliminate the influence of dimensional units,
adding position information to multidimensional time series by using position coding PE
pos refers to the position of a time point, and 2i and 2i+1 correspond to different dimension indices of that time point; even dimension indices are encoded with the sine and odd dimension indices with the cosine:
PE(pos, 2i) = sin( pos / 10000^{2i/d_model} )
PE(pos, 2i+1) = cos( pos / 10000^{2i/d_model} )
d_model refers to the total dimension of the data; dividing by d_model here prevents the exponent of 10000 from becoming too large and overflowing;
generating three matrices Q, K and V by using three linear layers; each K is queried with Q, and the scaled scores are converted by softmax into exponentials with base e and then normalized:
softmax(z)_i = e^{z_i} / Σ_j e^{z_j}, where z = QK^T / sqrt(d_k)
after normalization, the result is used as the weight on V, so that the Attention value is calculated for the subsequent MLP layer and for sequence reconstruction in the decoder:
Attention(Q, K, V) = softmax( QK^T / sqrt(d_k) ) V
a multi-head mechanism is introduced to accommodate higher dimensional time series information:
MultiHead(Q, K, V) = [head_1, ..., head_h] W^O
where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
h is the number of attention heads; when multi-head attention is used, the input dimension must be divisible by the number of attention heads; the input dimension is divided into h groups, and each group of features has its own attention coefficients;
s4.3: constructing a decoder part:
the multi-head self-attention layer and the MLP layer are used as the network framework; after one stacking pass, attention aggregation calculation is performed using the cross multi-head attention aggregation operation (Cross Attention) and the hidden-state result of the last encoder layer, and then the multi-head self-attention and MLP layers are stacked multiple times;
the K and V obtained by the attention mechanism in the encoder and the Q obtained by training in the decoder are subjected to aggregation calculation, and a normalization layer is added after each MLP layer by the Layer Normalization method, wherein H is the number of hidden-layer nodes in one layer and l is the number of MLP layers; the Layer Normalization statistics μ^l and σ^l can be calculated:
μ^l = (1/H) Σ_{i=1}^{H} a_i^l
σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l - μ^l)^2 )
wherein a^l is the input multidimensional time series; the calculation of the statistics μ^l and σ^l is independent of the number of samples and depends only on the number of hidden nodes,
so as long as the number of hidden-layer nodes is sufficient, the Layer Normalization statistics are guaranteed to be sufficiently representative, and the data output after the l-layer MLP is
a^l = (a_1^l, a_2^l, ..., a_H^l)
i represents the dimension, denoted in_dim; using μ^l and σ^l, the normalized value can be obtained:
ā_i^l = (a_i^l - μ^l) / sqrt((σ^l)^2 + ε)
wherein ε = 1e-5 prevents division by zero;
performing recovery reconstruction on the damaged time sequence in an autoregressive mode; the restoration reconstruction degree is evaluated through a dynamic time warping index;
the calculation mode of the dynamic time warping index is as follows: after aligning the two time series, a difference matrix is calculated, the goal being to find a path from (0, 0) to (n, n) in the matrix such that the accumulated Euclidean distance of the elements on the path is minimized; this minimal path is called a warping path, which is the dynamic time warping index and represents the similarity of the two time series:
constructing an n×n matrix whose (i-th, j-th) element is the Euclidean distance d(q_i, c_j) between the points q_i and c_j; the warping path defines the mapping between the time series Q and C and is denoted P, a set of contiguous matrix elements, the t-th element of P being defined as p_t = d(q_i, c_j)_t, where P = p_1, p_2, ..., p_T and n ≤ T ≤ 2n-1; it is obtained essentially by dynamic programming:
d_ij = d(x_i, y_j)
D(i, j) = d_ij + min{ D(i-1, j), D(i, j-1), D(i-1, j-1) }
wherein D(i-1, j) represents the subsequence distance when x_{i-1} and y_j are matched, D(i, j-1) the subsequence distance when x_i and y_{j-1} are matched, and D(i-1, j-1) the subsequence distance when x_{i-1} and y_{j-1} are matched;
in a multivariate time series, x_i and y_j are both in_dim-dimensional vectors; the elements of x_i are the variable values at time i and the elements of y_j are the variable values at time j, so d(x_i, y_j) is the distance when x_i at time i and y_j at time j are aligned; the distance d(x_i, y_j) between the vectors x_i and y_j can be calculated by the Euclidean distance or the Mahalanobis distance;
euclidean distance:
d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} - y_{j,k})^2 )
for a multivariate x = (x_1, x_2, x_3, ..., x_p)^T with mean μ = (μ_1, μ_2, μ_3, ..., μ_p)^T and covariance matrix S, the Mahalanobis distance is:
D_M(x) = sqrt( (x - μ)^T S^{-1} (x - μ) )
it differs from the Euclidean distance in that it takes the correlations between the various features into account and is scale-invariant, i.e. independent of the measurement scale;
s4.4: the pre-training process comprises the following steps:
resampling the acquired data to serve as an original sequence, then corrupting the sequence, passing it into the model for forward inference (Forward), and processing it through the encoder and decoder to obtain a new sequence, recorded as the reconstruction sequence of the original sequence;
and measuring the similarity between the original sequence and the reconstructed sequence by using a dynamic time warping index, recording the similarity as DTW, and constructing a loss function:
Loss = DTW(X, X̂)
after the loss before and after reconstruction is calculated, back propagation is performed and the model parameters are updated so that the loss becomes smaller and smaller until it converges and no longer decreases; in this process the model gradually learns the features in the multidimensional time series and acquires the function of automatically extracting features; this is the pre-training process, finally yielding a pre-trained model with sequence feature-extraction capability for use by downstream tasks.
5. The application of the bidirectional autoregressive unsupervised pre-trained fine-tuned abnormal pollution discharge monitoring method in the visual large-screen management platform according to claim 4,
the hardware part of the visualization large-screen management platform comprises: a multi-channel acquisition and transmission module, a data processing module,
the multichannel is gathered and the transmission module includes: a pressure sensor, a flow velocity meter, a thermometer and a density detector, wherein the sensors are arranged at a pollution source discharge port to be monitored,
the front end of the data processing module is developed based on the React framework; Axios is used to interact with the back end to obtain the analyzed pollution discharge data, and DataV and AntV are introduced to visualize the abnormal pollution discharge data;
the back end of the data processing module is developed based on Golang and comprises Gin and GORM frameworks;
the data processing module is containerized: a Docker image is generated by writing a Dockerfile, enabling multi-platform migration and operation;
the visual large-screen management platform comprises three perspectives of government, enterprise and individual.
CN202210687441.8A 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application Active CN115099321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210687441.8A CN115099321B (en) 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210687441.8A CN115099321B (en) 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application

Publications (2)

Publication Number Publication Date
CN115099321A true CN115099321A (en) 2022-09-23
CN115099321B CN115099321B (en) 2023-08-04

Family

ID=83291933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210687441.8A Active CN115099321B (en) 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application

Country Status (1)

Country Link
CN (1) CN115099321B (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102128794A (en) * 2011-01-31 2011-07-20 重庆大学 Manifold learning-based method for monitoring water quality by remote sensing
US20180181876A1 (en) * 2016-12-22 2018-06-28 Intel Corporation Unsupervised machine learning to manage aquatic resources
CN108801950A (en) * 2018-05-21 2018-11-13 东南大学 A kind of ultraviolet spectra abnormal water detection method based on sliding window Multiscale Principal Component Analysis
CN109858572A (en) * 2019-03-13 2019-06-07 中南大学 A kind of modified hierarchy clustering method for sewage abnormality detection
CN111401582A (en) * 2020-03-12 2020-07-10 中交疏浚技术装备国家工程研究中心有限公司 Abnormity identification method and monitoring platform for domestic sewage treatment facility
CN112529678A (en) * 2020-12-23 2021-03-19 华南理工大学 Financial index time sequence abnormity detection method based on self-supervision discriminant network
CN112765896A (en) * 2021-01-29 2021-05-07 湖南大学 LSTM-based water treatment time sequence data anomaly detection method
CN113361199A (en) * 2021-06-09 2021-09-07 成都之维安科技股份有限公司 Multi-dimensional pollutant emission intensity prediction method based on time series
WO2021247408A1 (en) * 2020-06-02 2021-12-09 Pangolin Llc Ai and data system to monitor pathogens in wastewater and methods of use
CN114090396A (en) * 2022-01-24 2022-02-25 华南理工大学 Cloud environment multi-index unsupervised anomaly detection and root cause analysis method
CN114358435A (en) * 2022-01-11 2022-04-15 北京工业大学 Pollution source-water quality prediction model weight influence calculation method of two-stage space-time attention mechanism

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115792158A (en) * 2022-12-07 2023-03-14 广东建研环境监测股份有限公司 Method and device for realizing dynamic monitoring of water quality based on Internet of things
CN115792158B (en) * 2022-12-07 2023-09-15 广东建研环境监测股份有限公司 Method and device for realizing dynamic water quality monitoring based on Internet of things
CN116821697A (en) * 2023-08-30 2023-09-29 聊城莱柯智能机器人有限公司 Mechanical equipment fault diagnosis method based on small sample learning

Also Published As

Publication number Publication date
CN115099321B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN115099321B (en) Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application
KR102101974B1 (en) Anomaly detection
US10073908B2 (en) Functional space-time trajectory clustering
US10810508B1 (en) Methods and apparatus for classifying and discovering historical and future operational states based on Boolean and numerical sensor data
CN106650297A (en) Non-domain knowledge satellite sub-system exception detection method
CN102682089A (en) Method for data dimensionality reduction by identifying random neighbourhood embedding analyses
CN113609488B (en) Vulnerability detection method and system based on self-supervised learning and multichannel hypergraph neural network
Fahy et al. Scarcity of labels in non-stationary data streams: A survey
CN114638960A (en) Model training method, image description generation method and device, equipment and medium
CN110704880A (en) Correlation method of engineering drawings
WO2022265292A1 (en) Method and device for detecting abnormal data
Peng et al. Image to LaTeX with graph neural network for mathematical formula recognition
CN114637847A (en) Model training method, text classification method and device, equipment and medium
CN114487673A (en) Power equipment fault detection model based on Transformer and electronic equipment
CN116776014B (en) Multi-source track data representation method and device
Takeishi et al. Knowledge-based regularization in generative modeling
CN114785606B (en) Log anomaly detection method based on pretrained LogXLnet model, electronic equipment and storage medium
CN116663499A (en) Intelligent industrial data processing method and system
Jayswal et al. Study and develop a convolutional neural network for MNIST handwritten digit classification
CN110650130B (en) Industrial control intrusion detection method based on multi-classification GoogLeNet-LSTM model
Li et al. Dynamic graph embedding‐based anomaly detection on internet of things time series
Kashyap et al. Quantum Convolutional Neural Network Architecture for Multi-Class Classification
Correia et al. Online Time-series Anomaly Detection: A Survey of Modern Model-based Approaches
Cheng et al. Finding dynamic co-evolving zones in spatial-temporal time series data
US20220035722A1 (en) Sparse Intent Clustering Through Deep Context Encoders

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant