CN115099321B - Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application - Google Patents

Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application

Info

Publication number
CN115099321B
CN115099321B
Authority
CN
China
Prior art keywords
data
sequence
time
attention
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210687441.8A
Other languages
Chinese (zh)
Other versions
CN115099321A (en)
Inventor
叶柯
周奕希
孔佳玉
曹瀚洋
姜沁琬
李宛欣
韩伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210687441.8A priority Critical patent/CN115099321B/en
Publication of CN115099321A publication Critical patent/CN115099321A/en
Application granted granted Critical
Publication of CN115099321B publication Critical patent/CN115099321B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00Water conservation; Efficient water supply; Efficient water use
    • Y02A20/20Controlling water pollution; Waste water treatment

Abstract

The invention belongs to the field of pollution discharge abnormality monitoring and provides a bidirectional autoregressive unsupervised pretraining fine-tuning pollution discharge abnormality monitoring method comprising the following steps: a multi-channel acquisition and transmission module periodically acquires data at the pollution source discharge port and preprocesses the original multidimensional time series samples; the preprocessed multidimensional time series samples are resampled; a model comprising three parts, data resampling enhancement, an encoder and a decoder, is constructed and pre-trained; small-sample fine-tuning and sequence-point classification are performed on the pre-trained model; and pollution discharge abnormality monitoring is carried out with the model. The method incorporates a network with strong generalization capability to fully extract the more abstract semantic features in the multidimensional time series, so that the model achieves faster inference speed and higher precision.

Description

Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application
Technical Field
The invention relates to the field of pollution discharge abnormality monitoring, in particular to a bidirectional autoregressive non-supervision pretraining fine-tuning pollution discharge abnormality monitoring method and application.
Background
The outline of the fourteenth five-year plan for national economic and social development of the People's Republic of China and its long-range objectives through 2035 calls for: "deepening the battle of pollution prevention and control, building a sound environmental governance system, and promoting precise, scientific, law-based and systematic pollution control". Pollution-discharging enterprises are fixed pollution sources, and their discharge is one of the main sources of environmental pollution in China; the supervision of fixed pollution sources is therefore a major issue for pollution control in China. At present, however, monitoring data are still rendered abnormal or invalid by enterprises stealthily discharging, maliciously tampering with monitoring equipment parameters, damaging on-line monitoring facilities, or operating and maintaining equipment in a non-standard or untimely manner, which pollutes the environment and places higher demands on supervision. Domestically and abroad, the hardware for monitoring water-quality emissions is already quite complete: various water-quality sensors, floating water-quality monitoring stations, water-quality monitoring terminals and the like have been developed, ensuring that data on the relevant water-discharge indexes are collected accurately, efficiently and in large quantities. In the processing of pollutant data, however, much room for improvement remains, and the data cannot yet be used to judge reasonably and efficiently whether an enterprise discharges lawfully.
Traditional time-series feature extraction methods fall mainly into two categories: (1) statistics-based decisions, such as the 3-sigma principle and confidence principles, which comprehensively evaluate the data of all dimensions of the sequence and update and judge outliers in real time by means of statistics; (2) conventional machine learning with manually constructed features, which requires researchers to have a clear understanding of the meaning of the data in order to perform appropriate feature engineering and ensure the robustness of the model, so that in many cases good results are difficult to achieve. Learning and extracting multidimensional time-series features is a necessary precondition for the outlier detection task.
Disclosure of Invention
Aiming at the shortcomings of traditional methods and the characteristics of multidimensional time series, an improved deep learning network is used to extract the features of the multidimensional time series, a network with strong generalization capability is incorporated to fully extract the more abstract semantic features in the multidimensional time series, and parameters are optimized while a complex model is constructed, so that the model achieves faster inference speed and higher precision.
The invention provides a bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method, which comprises the following steps:
s1: collecting data
The multi-channel acquisition and transmission module periodically acquires the following data of the pollution source discharge port: pressure, flow rate, temperature, density; obtaining an original multidimensional time sequence sample, and transmitting the original multidimensional time sequence sample to a data processing module;
s2: preprocessing an original multidimensional time series sample;
using a Mask-based method to perform denoising and interpolation operation on the data obtained by the data processing module;
s3: resampling the preprocessed multidimensional time series sample;
performing a scaling transformation on a given time series, resampling in a sliding-window manner, extracting original samples at set time intervals, and extracting multidimensional time-series features at set scales;
s4: constructing a model comprising three parts of data resampling enhancement, an encoder and a decoder and pre-training;
s5: performing small-sample fine-tuning and sequence-point classification on the pre-trained model.
After pre-training, the model has learned the characteristics of the multidimensional time series, namely its hierarchical and multi-scale features. These are universal features that capture the intrinsic relations among all parts of the whole multidimensional time series. The pre-trained model is used as the baseline (Baseline) for the downstream task and is fine-tuned on the downstream anomaly detection task. The input sequence is taken as the primary sequence and the sequence output by the model is recorded as the reconstructed sequence, so that the primary sequence can be encoded and decoded by the pre-trained model into a reconstructed sequence that is free of noise information yet carries the high-dimensional features of the primary sequence. The original sequence is compared with the multidimensional time series obtained after reconstruction, the dynamic time warping index at each corresponding point is calculated as an anomaly index, and abnormal and normal conditions are then classified by a clustering method;
firstly, randomly extracting 10% of continuous sequences from a data set formed by a long string of multidimensional time sequences to serve as samples, repeating the training process in the S4, and performing Fine tuning of small samples in a downstream task;
the specific steps of fine tuning the small sample are as follows: input is the original multidimensional time series X 'after pretreatment' in ∈R batch _size×input_window×in_dim Wherein batch_size is the training batch size, input_window represents the sequence length, i.e. the sliding window width in the preprocessing process, and in_dim represents the dimension of the sample;
reconstructing a sequence by an encoder and a decoderThe loss of each time point is obtained, and the sequence points are classified by a K-means clustering method, and the method specifically comprises the following steps:
(1) Firstly setting parameters K, wherein the meaning of K is to aggregate data into several categories, and K=2 is taken here;
(2) Randomly selecting two points from the data to form a clustering initial center point;
(3) Calculating the distances from all other points to the two points, finding out the center point nearest to each data point, and dividing the point into clusters represented by the center point; then all points are divided into two clusters;
(4) Re-calculating the mass centers of the two clusters to be used as the center point of the next cluster;
(5) Repeating the processes of steps (3)-(4) and clustering again, iterating this procedure;
(6) Stopping when the attribution category of all the sample points is not changed after re-clustering;
finally, two classes of samples are obtained, and the class with the smaller number of samples is the abnormal one;
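As an illustration, steps (1)-(6) of the K-means classification above can be sketched in a few lines of Python; `kmeans_two_clusters` is a hypothetical name, and for determinism this sketch seeds the two initial centers with the minimum and maximum loss instead of a random draw:

```python
import numpy as np

def kmeans_two_clusters(losses, max_iter=100):
    """Steps (1)-(6): cluster 1-D per-point losses into K=2 groups and
    flag the smaller cluster as anomalous. For determinism the initial
    centers are the min and max loss (the patent picks two random points)."""
    x = np.asarray(losses, dtype=float)
    centers = np.array([x.min(), x.max()])        # (1)-(2) K=2 initial centers
    labels = None
    for _ in range(max_iter):
        # (3) assign every point to its nearest center
        new_labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # (6) attributions unchanged
        labels = new_labels
        # (4)-(5) recompute each cluster centroid and repeat
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    # the cluster with fewer samples is taken as the anomaly class
    anomaly = np.argmin(np.bincount(labels, minlength=2))
    return labels == anomaly

# losses: mostly small reconstruction errors plus two spikes
flags = kmeans_two_clusters([0.1, 0.12, 0.09, 0.11, 5.0, 4.8, 0.1])
```

The two spike losses fall into the smaller cluster and are flagged as abnormal.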
In practice, enterprises privately modify the uploaded data in order to appear to discharge within the standard: when a received emission-data index shows an ascending trend over a certain period but drops suddenly and sharply after some time point, a spike-shaped data change is produced that would not occur within such a short period. It is therefore reasonable to infer that, upon finding the pollution discharge index trending upward and exceeding the standard, the monitored party may have externally interfered with the detection equipment or tampered with the uploaded data. The corresponding time point is marked as abnormal data, the company and any company sharing the discharge port are marked as abnormal, and finally all abnormal points are found by the clustering method.
S6: and (5) monitoring pollution discharge abnormality by using the model obtained in the step (S5).
Preferably, in step S1, the collection of data by the multi-channel acquisition and transmission module comprises: initialization, reading primary serial-port interrupt data, and packaging and transmitting the data.
the initializing step comprises the following steps: initializing and configuring an ESP8266 chip running environment;
the step of reading the primary serial-port interrupt data specifically comprises: setting the data type for temporarily storing serial-port interrupt data to the unsigned int type of a 32-bit computer, with a data length of 16 bits, updating the data in real time on serial-port interrupt, and performing secondary filtering in the fixed-time interrupt;
the packaging and transmission of the data specifically comprise: the ESP8266 chip splits the sixteen bits of data into the first eight bits and the last eight bits, uses 0x03, 0x03 as the packet header and 0x03, 0x03 as the packet tail, packs the data, and finally transmits the data in character form over the network to the data processing module in wireless hotspot mode.
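A minimal sketch of the packing scheme just described, assuming the framing exactly as stated (0x03, 0x03 header and tail around the high and low bytes of a 16-bit reading); `frame_reading` is an illustrative name, not part of the invention's firmware:

```python
def frame_reading(value_16bit):
    """Split a 16-bit sensor reading into high/low bytes and wrap it
    with the 0x03 0x03 packet header and 0x03 0x03 packet tail."""
    if not 0 <= value_16bit <= 0xFFFF:
        raise ValueError("reading must fit in 16 bits")
    high = (value_16bit >> 8) & 0xFF   # first eight bits
    low = value_16bit & 0xFF           # last eight bits
    return bytes([0x03, 0x03, high, low, 0x03, 0x03])

packet = frame_reading(0x1A2B)
```

Note that because header and tail share the value 0x03, a real receiver would also need length or timing information to delimit packets; the sketch only mirrors the framing as described.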
Preferably, in step S2, the mask-based method specifically comprises: taking a complete time series as a sample, randomly masking 10% of it, and restoring the time series with a Kalman filtering algorithm.
Preferably, the step S4 specifically includes the following substeps:
s4.1: using a noise function to destroy the sequence:
destroying the time series obtained in step S3 with any one or a combination of the five noise functions Token Masking, Token Deletion, Text Infilling, Sentence Permutation and Document Rotation;
s4.2: constructing an encoder network skeleton part:
selecting a self-attention layer and an MLP network as backbone networks, and iterating 12 times to form an encoder;
the multidimensional time series is standardized, the influence of dimension is eliminated,
adding position information for multidimensional time series using position coding PE
pos refers to positions of different time points, 2i and 2i+1 respectively correspond to different dimension indexes of a certain time point, and odd dimensions are favorable
With sin sine coding, even dimension with cos cosine coding, d model Refers to the total dimension of the data, here preventing overflow by an index of 10000 being too large;
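The sinusoidal positional encoding above can be sketched as follows (a standard implementation assuming an even d_model; `positional_encoding` is an illustrative name):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding: PE[pos, 2i] = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)). Assumes d_model is even."""
    pos = np.arange(seq_len)[:, None]       # positions of the time points
    i = np.arange(0, d_model, 2)[None, :]   # paired dimension indexes 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)             # 2i dimensions: sine
    pe[:, 1::2] = np.cos(angle)             # 2i+1 dimensions: cosine
    return pe

pe = positional_encoding(seq_len=64, d_model=8)
```

The encoding is simply added to the standardized series before the first self-attention layer, so every time point carries a unique, smoothly varying position signature.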
Three matrices Q, K and V are generated by three linear layers; each K is accessed with Q, and after scaling, softmax (conversion to exponents with base e followed by normalization) is applied:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
The normalized values are used as the weights of V, yielding the attention value for the subsequent MLP layers and the decoder to reconstruct sequences.
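A minimal NumPy sketch of the scaled dot-product attention just described (illustrative only, not the patented implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V with a numerically stable softmax."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # each query accesses every key
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the base-e exponent
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # normalize to attention weights
    return weights @ V, weights                   # weights applied to V

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to one, so the output is a convex combination of the value vectors.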
a multi-headed mechanism is introduced to accommodate higher dimensional time series information:
MultiHead(Q, K, V) = [head_1, ..., head_h] W_O
where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
and h is the number of attention heads; when using multi-head attention the input dimension must be divisible by the number of heads, the input dimension is divided into h groups, and each group of features has its own attention weights;
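The multi-head mechanism can be illustrated as below, splitting the input dimension into h groups as described; the weight-matrix shapes and names are simplifying assumptions of this sketch:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """[head_1, ..., head_h] W_O with head_i = Attention on the i-th feature
    group; the model dimension must be divisible by the head count h."""
    n, d_model = X.shape
    assert d_model % h == 0, "input dimension must be divisible by h"
    d_head = d_model // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):                   # each group gets its own attention
        sl = slice(i * d_head, (i + 1) * d_head)
        q, k, v = Q[:, sl], K[:, sl], V[:, sl]
        s = q @ k.T / np.sqrt(d_head)
        w = np.exp(s - s.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        heads.append(w @ v)
    return np.concatenate(heads, axis=-1) @ Wo  # concatenate and project

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))                      # 6 time points, d_model = 8
W = [rng.normal(size=(8, 8)) for _ in range(4)]  # Wq, Wk, Wv, Wo
out = multi_head_attention(X, *W, h=2)
```

With d_model = 8 the sketch works for h ∈ {1, 2, 4, 8}, matching the divisibility requirement.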
s4.3: constructing a decoder part:
taking the multi-head self-attention layer and the MLP layer as the network skeleton, performing attention aggregation with a cross multi-head attention aggregation operation (Cross Attention) on the hidden-state result of the last encoder layer after one stacking pass, and then stacking self-attention and MLP layers several more times;
K and V obtained by the self-attention mechanism in the encoder and Q obtained by training in the decoder are taken for the aggregation calculation, and a normalization layer using the Layer Normalization method is added after each MLP layer. Let H be the number of hidden-layer nodes in a layer and l the layer index of the MLP; the Layer Normalization statistics μ^l and σ^l are
μ^l = (1/H) Σ_{i=1}^{H} a_i^l
σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l − μ^l)^2 )
where a^l is the input multidimensional time series. The statistics μ^l and σ^l are independent of the number of samples and depend only on the number of hidden nodes,
so as long as the number of hidden nodes is sufficient, the Layer Normalization statistics are guaranteed to be sufficiently representative. The data output after the l-th MLP layer is a^l, where i indexes the dimension, denoted in_dim; through μ^l and σ^l the normalized value
ā_i^l = (a_i^l − μ^l) / sqrt((σ^l)^2 + ε)
is obtained, where ε = 1e-5 prevents division by zero;
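A short sketch of the Layer Normalization computation above, with ε = 1e-5 as stated (`layer_norm` is an illustrative name):

```python
import numpy as np

def layer_norm(a, eps=1e-5):
    """Layer Normalization over the hidden dimension:
    mu = mean(a), sigma^2 = var(a), output = (a - mu) / sqrt(sigma^2 + eps).
    The statistics depend only on the H hidden nodes, not on the batch size."""
    mu = a.mean(axis=-1, keepdims=True)
    sigma2 = a.var(axis=-1, keepdims=True)
    return (a - mu) / np.sqrt(sigma2 + eps)  # eps prevents division by zero

x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x)
```

Because the mean and variance are taken per sample over the hidden dimension, the result is well defined even for a batch of one, unlike batch normalization.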
performing restoration reconstruction on the destroyed time sequence in an autoregressive mode; the restoration reconstruction degree is evaluated through a dynamic time warping index;
The dynamic time warping index is computed as follows: after aligning the two time series, a difference matrix is calculated, and the goal is to find a path from (0, 0) to (n, n) in the matrix that minimizes the cumulative Euclidean distance of the elements on the path; the minimizing path is the warping path, i.e. the dynamic time warping index, used to represent the similarity of the two time series:
An n × n matrix is constructed whose (i-th, j-th) element is the Euclidean distance d(q_i, c_j) between points q_i and c_j. The warping path defines a mapping between the sequences Q and C, denoted P, a set of consecutive matrix elements; the t-th element of P is defined as p_t = d(q_i, c_j)_t, where P = p_1, p_2, ..., p_T and n ≤ T ≤ 2n − 1. It is obtained essentially by a dynamic-programming method:
d_ij = d(x_i, y_j)
D(i, j) = d_ij + min{ D(i−1, j), D(i, j−1), D(i−1, j−1) }
where D(i−1, j) is the sub-sequence distance when x_{i−1} and y_j are matched, D(i, j−1) the distance when x_i and y_{j−1} are matched, and D(i−1, j−1) the distance when x_{i−1} and y_{j−1} are matched;
in a multivariate time series, x_i and y_j are vectors of dimension in_dim, the elements of x_i being the variable values at time i and the elements of y_j the variable values at time j; d(x_i, y_j) is the distance between x_i and y_j when time i and time j are aligned. The distance d(x_i, y_j) between the vectors x_i and y_j may be calculated as a Euclidean distance or a Mahalanobis distance;
Euclidean distance: d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} − y_{j,k})^2 )
For a multivariate point x = (x_1, x_2, x_3, ..., x_p)^T with mean μ = (μ_1, μ_2, μ_3, ..., μ_p)^T and covariance matrix S, the Mahalanobis distance is
D_M(x) = sqrt( (x − μ)^T S^{−1} (x − μ) )
It differs from the Euclidean distance in that it accounts for the relations between the various features and is scale-independent (Scale-independent), i.e. independent of the measurement scale.
S4.4: the pre-training process is as follows:
resampling the collected data to serve as the original sequence, then corrupting the original sequence, feeding it into the model for forward inference (Forward), and processing it through the encoder and decoder to obtain a new sequence, recorded as the reconstructed sequence of the original sequence;
the similarity between the original sequence and the reconstructed sequence is measured with the dynamic time warping index, denoted DTW, and the loss function is constructed as Loss = DTW(original sequence, reconstructed sequence);
after the loss between the sequences before and after reconstruction is calculated, back-propagation is performed and the model parameters are updated so that the loss becomes smaller and smaller until it converges and no longer decreases. In this process the model gradually learns the features in the multidimensional time series and extracts them automatically; this is the pre-training process, and it finally yields a pre-trained model with sequence feature-extraction capability for later use in downstream tasks.
The invention also provides an application of the bidirectional autoregressive unsupervised pretraining fine-tuning pollution discharge abnormality monitoring method on a visual large-screen management platform. The hardware part of the platform comprises the multi-channel acquisition and transmission module and the data processing module; the multi-channel acquisition and transmission module comprises a pressure sensor, a flow-rate meter, a thermometer and a density detector, the sensors being installed at the pollution-source discharge port to be monitored. The front end of the data processing module is developed on the React framework, interacts with the back end to obtain the analyzed pollution discharge data, and introduces DataV and AntV to visualize the abnormal pollution discharge data; the back end of the data processing module is developed on Golang and comprises the Gin and GORM frameworks. The data processing module is containerized by writing a Dockerfile to generate a Docker image, enabling multi-platform migration. The visual large-screen management platform comprises three views: government, enterprise and individual.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description of the drawings is provided below, and some specific examples of the present invention will be described in detail below by way of example and not by way of limitation with reference to the accompanying drawings. It will be appreciated by those skilled in the art that the drawings are not necessarily drawn to scale. In the accompanying drawings:
FIG. 1 is a functional flow chart of the present invention;
fig. 2 is a table of a specified format uploaded by a user.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The flow chart of the bidirectional autoregressive non-supervision pre-training fine-tuning type pollution discharge abnormality monitoring method is shown in fig. 1, and the specific implementation steps of the invention are as follows:
s1.1: the hardware device is selected.
Data sensors such as a pressure sensor, a flow-rate meter, a thermometer and a density detector are selected, and data are collected near the discharge port as indexes for pollution detection. The ESP8266 is selected as the main-control and wireless transmission module to realize the main-control and wireless-transmission functions. It integrates the industry-leading ultra-low-power 32-bit Tensilica L106 micro MCU in a small-size package, provides a 16-bit reduced instruction mode, supports main frequencies of 80 MHz and 160 MHz, supports an RTOS, integrates Wi-Fi MAC/BB/RF/PA/LNA, and has an on-board antenna. The module supports the standard IEEE 802.11 b/g/n protocol and a complete TCP/IP protocol stack.
S1.2: the data acquisition program mainly comprises modules for initializing, reading primary serial port interrupt data, packaging and transmitting the data and the like.
Initialization mainly performs initial configuration of the MCU running environment. The data type for temporarily storing serial-port interrupt data is set to the unsigned int type of a 32-bit computer, with a data length of 16 bits; the data are updated in real time on serial-port interrupt, and secondary filtering is performed in the fixed-time interrupt. The MCU splits the sixteen bits of data read into the first eight bits and the last eight bits, uses 0x03, 0x03 as the packet header and 0x03, 0x03 as the packet tail, packs the data, and finally sends the data in character form. For data uploading, the ESP8266 module connects to a computer in hotspot mode to realize wireless data transmission, so that the data enter the network and are uploaded to the cloud data center.
S2: preprocessing an original multidimensional time series sample;
an interpolation algorithm is selected using the mask-based method, and denoising and interpolation are performed on the data obtained by the data processing module;
A complete time series is taken as a sample, 10% of it is randomly masked, and the series is restored with different interpolation algorithms such as linear interpolation, quadratic interpolation, moving average and exponential average; if the difference between the restored time series and the original series is small, the interpolation scheme adapts well to the characteristics of the series. Kalman filtering is finally selected as the interpolation algorithm.
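The mask-based comparison of interpolation algorithms can be illustrated as below; linear interpolation stands in for the candidate restorers (the invention ultimately selects Kalman filtering), and all names are illustrative:

```python
import numpy as np

def mask_and_score(series, restore_fn, mask_frac=0.10, seed=0):
    """Randomly hide mask_frac of the points, restore them with restore_fn,
    and return the mean absolute error on the hidden points only."""
    rng = np.random.default_rng(seed)
    x = np.asarray(series, dtype=float)
    hidden = rng.random(len(x)) < mask_frac
    hidden[[0, -1]] = False            # keep the endpoints as anchors
    if not hidden.any():
        return 0.0
    masked = x.copy()
    masked[hidden] = np.nan
    restored = restore_fn(masked)
    return float(np.abs(restored[hidden] - x[hidden]).mean())

def linear_interp(masked):
    """Stand-in restorer: linear interpolation over the NaN gaps."""
    idx = np.arange(len(masked))
    known = ~np.isnan(masked)
    return np.interp(idx, idx[known], masked[known])

t = np.linspace(0.0, 6.28, 200)
err = mask_and_score(np.sin(t), linear_interp)  # small error on a smooth series
```

The restorer with the smallest score on held-out masked points is the one that best fits the characteristics of the series, which is the selection criterion described above.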
S3: resampling the preprocessed multidimensional time series sample;
and performing telescopic transformation on a certain time sequence, resampling in a sliding window mode, extracting original samples at different time intervals, and extracting multi-dimensional time sequence features of different scales.
A sliding window of fixed length is determined and then moved rightward from the beginning of the sequence with a certain step length; the sequence region covered after each move is a small sample, so the original long time series is divided into several sub-sequences that form a new data set for training the model. With a preset sliding window of size input_window, an input sample sequence of total length seq_len and a multidimensional time series of dimension in_dim, the input sample is a two-dimensional tensor X_in ∈ R^(seq_len × in_dim); through multi-scale stretching transformation and sliding-window resampling, the three-dimensional sample tensor X'_in ∈ R^(batch_size × input_window × in_dim) is obtained.
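The sliding-window resampling from X_in ∈ R^(seq_len × in_dim) to X'_in ∈ R^(batch_size × input_window × in_dim) can be sketched as (`sliding_window_resample` is an illustrative name):

```python
import numpy as np

def sliding_window_resample(X, input_window, step):
    """Slide a fixed-length window over X (seq_len x in_dim) with the given
    step and stack the covered sub-sequences into a
    (batch_size x input_window x in_dim) tensor."""
    seq_len = X.shape[0]
    starts = range(0, seq_len - input_window + 1, step)
    return np.stack([X[s:s + input_window] for s in starts])

X = np.arange(20.0).reshape(10, 2)          # seq_len = 10, in_dim = 2
batch = sliding_window_resample(X, input_window=4, step=2)
```

With step < input_window the windows overlap, so every region of the long series appears in several training samples; batch_size here is determined by seq_len, input_window and the step.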
S4: constructing a model comprising three parts of data resampling enhancement, an encoder and a decoder and pre-training;
s4.1: the sequence is corrupted using a noise function.
The five noise-function methods Token Masking, Token Deletion, Text Infilling, Sentence Permutation and Document Rotation are used to corrupt the time series input during pre-training, destroying the time series obtained in step S3. This increases the difficulty of the sequence-reconstruction task in the pre-training stage, so that the model learns to extract the features in the multidimensional time series better.
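Time-series analogues of three of the five noise functions can be sketched as follows; these are illustrative adaptations of the text-oriented operations, not the patent's exact corruption scheme:

```python
import numpy as np

def token_masking(x, frac=0.15, rng=None):
    """Zero out a random fraction of time points (Token Masking analogue)."""
    if rng is None:
        rng = np.random.default_rng(0)
    y = x.copy()
    y[rng.random(len(y)) < frac] = 0.0
    return y

def token_deletion(x, frac=0.15, rng=None):
    """Drop a random fraction of time points entirely (Token Deletion analogue)."""
    if rng is None:
        rng = np.random.default_rng(1)
    return x[rng.random(len(x)) >= frac]

def document_rotation(x, rng=None):
    """Rotate the series so it starts at a random time point (Document Rotation analogue)."""
    if rng is None:
        rng = np.random.default_rng(2)
    k = int(rng.integers(len(x)))
    return np.roll(x, -k, axis=0)

x = np.arange(10.0)
```

Reconstructing the clean series from such corrupted inputs is what forces the encoder-decoder to learn the sequence's internal structure rather than copy its input.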
S4.2: an encoder network backbone portion is constructed.
The self-attention layer and the MLP network are selected as the backbone, iterated 12 times to form the encoder. The multidimensional time series is first standardized to eliminate the influence of dimension, and positional encoding (Position Encoding) is used to add position information to the multidimensional time series, preventing the loss of position information caused by the parallelism of the subsequent attention mechanism.
Here pos refers to the positions of different time points, 2i and 2i+1 correspond to different dimension indexes of a time point, the 2i dimensions using sin sine encoding and the 2i+1 dimensions using cos cosine encoding; d_model refers to the total dimension of the data, and the base 10000 prevents the argument from growing too large. Three matrices Q (query), K (key) and V (value) are then generated by three linear layers, Q is used to access each K, and after scaling, softmax (first conversion to exponents with base e, then normalization) is applied: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The result is used as the weight of V, yielding the attention value for the subsequent MLP layers and the decoder to reconstruct sequences.
a multi-headed mechanism is introduced to accommodate higher dimensional time series information:
MultiHead(Q, K, V) = [head_1, ..., head_h] W_O
where h is the number of attention heads; when using multiple heads the input dimension must be divisible by the number of heads, the input dimension is divided into h groups, and each group of features has its own attention weights.
S4.3: a decoder section is constructed.
Likewise, multi-head self-attention layers and MLP layers are used as the network skeleton; after one stacking pass, a cross multi-head attention aggregation operation (Cross Attention) performs attention aggregation with the hidden-state result of the last encoder layer, followed by repeated stacking of self-attention and MLP layers. K (key) and V (value) obtained by the self-attention mechanism in the encoder and Q (query) obtained by training in the decoder are taken for the aggregation calculation, and a normalization layer using Layer Normalization is added after each MLP layer. Let H be the number of hidden-layer nodes in a layer and l the layer index of the MLP; the Layer Normalization statistics are
μ^l = (1/H) Σ_{i=1}^{H} a_i^l
σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l − μ^l)^2 )
The computation of these statistics is independent of the number of samples and depends only on the number of hidden nodes, so as long as the number of hidden nodes is sufficient, the normalized statistics of LN are guaranteed to be sufficiently representative. The data output after the l-th MLP layer is a^l, with i indexing the dimension, denoted in_dim; through μ^l and σ^l the normalized value ā_i^l = (a_i^l − μ^l) / sqrt((σ^l)^2 + ε) is obtained, where ε = 1e-5 prevents division by zero. Finally, the corrupted time series is restored and reconstructed in an autoregressive manner, and the degree of restoration is evaluated by the dynamic time warping index: after aligning two time series, a difference matrix is calculated, and the goal is to find a path from (0, 0) to (n, n) in the matrix that minimizes the cumulative Euclidean distance of the elements on the path; such a path is called the warping path, i.e. the dynamic time warping index, used to represent the similarity of the two time series:
An n × n matrix is first constructed whose (i-th, j-th) element is the Euclidean distance d(q_i, c_j) between points q_i and c_j. The warping path defines the mapping between the sequences Q and C, denoted P, a set of consecutive matrix elements; the t-th element of P is defined as p_t = d(q_i, c_j)_t, where P = p_1, p_2, ..., p_T and n ≤ T ≤ 2n − 1. It is obtained essentially by a dynamic-programming method:
d ij =d(x i ,y j )
D(i,j)=d ij +min{D(i-1,j),D(i,j-1),D(i-1,j-1)}
where D(i−1, j) is the sub-sequence distance when x_{i−1} and y_j are matched, D(i, j−1) the distance when x_i and y_{j−1} are matched, and D(i−1, j−1) the distance when x_{i−1} and y_{j−1} are matched. In a multivariate time series, x_i and y_j are vectors of dimension in_dim, the elements of x_i being the variable values at time i and the elements of y_j the variable values at time j; d(x_i, y_j) is the distance between x_i at time i and y_j at time j when aligned, calculated here by the Euclidean distance.
Euclidean distance: d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} − y_{j,k})^2 )
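The recurrence D(i, j) = d_ij + min{D(i−1, j), D(i, j−1), D(i−1, j−1)} can be sketched as a short dynamic-programming routine; this is an illustrative implementation assuming Euclidean point-wise distance, not the patent's own code:

```python
import numpy as np

def dtw(Q, C):
    """Dynamic time warping distance between two sequences of
    in_dim-dimensional vectors, filled by the recurrence
    D(i,j) = d_ij + min{D(i-1,j), D(i,j-1), D(i-1,j-1)}."""
    n, m = len(Q), len(C)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d_ij = np.linalg.norm(Q[i - 1] - C[j - 1])  # Euclidean d(q_i, c_j)
            D[i, j] = d_ij + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]  # cumulative cost of the optimal warping path
```

Identical sequences give a distance of 0; the larger the value, the less similar the two series.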
s4.4: pre-training process
The acquired data are resampled to serve as the original sequence, which is then subjected to sequence destruction, fed into the model for forward inference (Forward), and processed by the encoder and decoder to obtain a new sequence, recorded as the reconstructed sequence of the original sequence.
The similarity between the original sequence and the reconstructed sequence is then measured with the dynamic time warping index, denoted DTW, and the loss function is constructed from it, Loss = DTW(X, X̂), where X is the original sequence and X̂ the reconstructed sequence:
After the loss between the sequences before and after reconstruction is calculated, back-propagation is performed and the model parameters are updated so that the loss decreases until it converges and no longer falls. In this process the model gradually learns the characteristics of the multidimensional time series and extracts features automatically; this constitutes the pre-training process, and finally a pre-trained model with sequence-extraction capability is obtained for later use in downstream tasks.
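A minimal sketch of one such pre-training cycle (corrupt, reconstruct, compute loss, back-propagate) is shown below. The shapes, the span mask, and the tiny Transformer are our placeholder assumptions, and MSE stands in for the DTW loss, since plain DTW is not differentiable (a soft-DTW variant would be used in practice):

```python
import torch
import torch.nn as nn

# Assumed toy dimensions: 4 variables, window of 16 points, batch of 8.
in_dim, window, batch = 4, 16, 8
model = nn.Transformer(d_model=in_dim, nhead=2, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=32,
                       batch_first=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(batch, window, in_dim)            # resampled original sequence
corrupted = x.clone()
corrupted[:, window // 2: window // 2 + 3] = 0.0  # Text-Infilling-style span mask

for _ in range(3):                                # a few forward/backward steps
    recon = model(corrupted, x)                   # encoder-decoder reconstruction
    loss = nn.functional.mse_loss(recon, x)       # stand-in for DTW(x, recon)
    opt.zero_grad()
    loss.backward()
    opt.step()
```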
S5: and performing small sample refinement and sequence point classification on the model after the pre-training.
With a small number of samples, the training process is repeated, and Fine tuning (Fine tuning) of the small samples is done in the downstream task.
The data set is one long multidimensional time series. A continuous sequence amounting to 10% of the data set is randomly extracted as samples, the training process is repeated, and small-sample fine tuning (Fine tuning) is performed in the downstream task: the input is the preprocessed multidimensional time series X′_in ∈ R^{batch_size×input_window×in_dim}; the sequence is reconstructed by the encoder and decoder to obtain X̂; the dynamic time warping index representing the similarity of the two time series is calculated to obtain the per-point Loss; finally, K-means clustering is performed on the Loss values to obtain the abnormal points:
(1) Firstly setting a parameter K, which specifies how many classes the data are aggregated into (here K = 2);
(2) Randomly selecting two points from the data to form a clustering initial center point;
(3) The distances from all other points to these two points are calculated, then the center point closest to each data point is found, and the point is divided into clusters represented by this center point. Then all points are divided into two clusters;
(4) Re-calculating the mass centers of the two clusters to be used as the center point of the next cluster;
(5) Repeating steps (3)-(4) to re-cluster, iterating this process;
(6) Stopping when the attribution category of all the sample points is not changed after re-clustering;
Finally, two classes of samples are obtained, and the class with fewer samples is the abnormal one.
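Steps (1)-(6) can be sketched on one-dimensional per-point losses as follows. For determinism, this sketch initializes the two centers at the minimum and maximum loss instead of the random choice in step (2), and the sample loss values are illustrative:

```python
import numpy as np

def kmeans_two_class(losses, iters=50):
    """K-means with K=2 on per-point reconstruction losses; the minority
    cluster is flagged as anomalous, following steps (1)-(6)."""
    x = np.asarray(losses, dtype=float)
    centers = np.array([x.min(), x.max()])  # (2) deterministic stand-in for random init
    for _ in range(iters):
        # (3) assign each point to the nearest center
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # (4) recompute the centroid of each cluster
        new = np.array([x[labels == k].mean() if np.any(labels == k) else centers[k]
                        for k in range(2)])
        if np.allclose(new, centers):       # (6) stop when assignments are stable
            break
        centers = new                       # (5) repeat with the new centers
    minority = np.argmin(np.bincount(labels, minlength=2))
    return labels == minority               # True marks an abnormal point

losses = [0.1, 0.12, 0.09, 0.11, 2.5, 0.1, 2.7]
anomalies = kmeans_two_class(losses)
```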
In practice, enterprises may privately modify the uploaded data so that their discharges do not appear to exceed the standard: when the received emission data show a rising trend over a period and then drop suddenly and sharply after a certain time point, a spike-shaped change appears that would not normally occur within such a short period. It is then reasonable to suspect that, on finding the pollution discharge index rising beyond the standard, the monitored party has externally interfered with the detection equipment or tampered with the uploaded data. The corresponding time points are marked as abnormal data, the company and any companies sharing the same discharge port are flagged as abnormal, and finally all abnormal points are found by the clustering method.
S6: and (5) monitoring pollution discharge abnormality by using the model obtained in the step (S5).
Furthermore, the model is integrated into a visual large-screen management platform, with the front end and back end built to form a management system offering three views: government, enterprise and individual.
The front end is developed on a lightweight framework and uses axios to interact with the back end to obtain the analyzed company pollution-discharge data; DataV and AntV are introduced to visualize the abnormal discharge data, so that the abnormal state of a company's discharge is displayed intuitively. The back end is developed in Golang, mainly using the lightweight Gin and GORM frameworks; compared with the native net/http package, Gin offers better performance and more extension functions. The persistence layer is written with the GORM framework to efficiently read, write and manage data in MySQL. In terms of system design, users are divided into three levels (government administrator, company personnel and visitor) with different permissions for each level: administrators (government-related managers) can modify a company's abnormal state, companies can appeal against anomalies, and visitors can submit reports. The Gin framework is combined with Python's Flask: Gin handles routing while Flask handles data processing and model execution, using data-processing libraries such as Numpy and Pandas together with models written in the PyTorch framework to compute the relevant indices and data, which are returned to the front end for visual display. The two programming languages complement each other organically, fully exploiting their respective strengths in different fields.
The local project and its dependencies are packaged by writing a Dockerfile, a Docker image is generated and uploaded to a repository, and the server side then pulls the image and creates a container to deploy the application service. Docker runs on many platforms, so an application running on one platform can easily be migrated to another without worrying that a change of runtime environment will prevent it from running normally.
The invention can reconstruct the sewage-discharge time-series data of a given port over a given period from a standard data set (namely, the data set uploaded after hardware acquisition), compare the differences before and after reconstruction, and cluster them to obtain the anomalies.
The user can also upload a table in a specified format, so that the user can be helped to analyze multidimensional time series anomalies autonomously:
As shown in FIG. 2, for a table uploaded by the user, anomaly analysis of the multidimensional time series can be performed after selecting the start and end times, returning the abnormal time points, the single-dimension anomaly comparison between the two periods (such as an increase in the number of abnormal points), and the multidimensional anomaly comparison between the two periods (that is, considering all pollutants together and outputting statistical anomaly information).
While the invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and substitutions can be made herein without departing from the scope of the invention as defined by the appended claims.

Claims (4)

1. A bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method,
the method is characterized by comprising the following steps of:
s1: collecting data:
the multi-channel acquisition and transmission module periodically acquires the following data of the pollution source discharge port: pressure, flow rate, temperature, density; obtaining an original multidimensional time sequence sample, and transmitting the original multidimensional time sequence sample to a data processing module;
s2: preprocessing an original multidimensional time series sample:
using a Mask-based method to perform denoising and interpolation operation on the data obtained by the data processing module;
s3: resampling the preprocessed multidimensional time series samples:
performing scaling transformation on the preprocessed multidimensional time series samples, resampling in a sliding-window manner, extracting original samples at set time intervals, and extracting multidimensional time series features at set scales;
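The sliding-window resampling described in S3 can be sketched as follows; this is an illustrative NumPy helper, and the window width and step values are assumptions:

```python
import numpy as np

def sliding_windows(x, window, step):
    """Resample a multidimensional series x (shape [T, in_dim]) into
    overlapping windows extracted at a fixed step, giving an array of
    shape [num_windows, window, in_dim]."""
    return np.stack([x[i:i + window]
                     for i in range(0, len(x) - window + 1, step)])

w = sliding_windows(np.arange(20).reshape(10, 2), window=4, step=2)
```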
s4: constructing a model comprising three parts, namely data resampling enhancement, an encoder and a decoder, and pre-training it;
s5: performing small sample refinement and sequence point classification on the model after pre-training:
firstly, randomly extracting a continuous sequence amounting to 10% of a data set formed by a long multidimensional time series as samples, repeating the training process in S4, and performing small-sample fine tuning (Fine tuning) in a downstream task;
the specific steps of fine tuning the small sample are as follows: the input is the preprocessed original multidimensional time series X′_in ∈ R^{batch_size×input_window×in_dim}, wherein batch_size is the training batch size, input_window represents the sequence length, i.e. the sliding-window width in the preprocessing process, and in_dim represents the dimension of the sample;
the sequence is reconstructed through the encoder and decoder to obtain X̂, the loss at each time point is obtained, and the sequence points are classified by the K-means clustering method, specifically comprising the following steps:
(1) Firstly setting parameters K, wherein the meaning of K is to aggregate data into several categories, and K=2 is taken here;
(2) Randomly selecting two points from the data to form a clustering initial center point;
(3) Calculating the distances from all other points to the two points, finding out the center point nearest to each data point, and dividing the point into clusters represented by the center point; then all points are divided into two clusters;
(4) Re-calculating the mass centers of the two clusters to be used as the center point of the next cluster;
(5) Repeating steps (3)-(4) to re-cluster, iterating this process;
(6) Stopping when the attribution category of all the sample points is not changed after re-clustering;
finally, two classes of samples are obtained, and the class with fewer samples is the abnormal one;
s6: monitoring pollution discharge abnormality by using the model obtained in the step S5;
the step S4 specifically includes the following steps:
s4.1: using a noise function to destroy the sequence:
destroying the time sequence obtained in the step S3 by using any one or a combination of the five noise functions Token Masking, Token Deletion, Text Infilling, Sentence Permutation and Document Rotation;
s4.2: constructing an encoder network skeleton part:
selecting a self-attention layer and an MLP network as backbone networks, and iterating 12 times to form an encoder;
the multidimensional time series is standardized to eliminate the influence of dimension, and position information is added to the multidimensional time series using the position coding PE:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
wherein pos refers to the position of a time point, 2i and 2i+1 correspond to different dimension indexes of that time point, the even dimensions using sin (sine) coding and the odd dimensions using cos (cosine) coding, and d_model refers to the total dimension of the data; dividing 2i by d_model in the exponent prevents the power of 10000 from growing too large;
three matrices Q, K and V are generated by three linear layers; each K is accessed with Q, and after scaling, softmax converts the scores into exponents with base e and normalizes them; the normalized values are used as the weights of V, so that the attention value
Attention(Q, K, V) = softmax(QK^T / √d_k) V
is calculated for the subsequent MLP layers and the decoder to reconstruct the sequence:
a multi-headed mechanism is introduced to accommodate higher dimensional time series information:
MultiHead(Q, K, V) = [head_1, …, head_h] W^O
where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
wherein h is the number of attention heads; when multi-head attention is used, the input dimension must be divisible by the number of attention heads, the input dimension is divided into h groups, and each group of features has its own attention parameters;
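The position coding and scaled dot-product attention described in S4.2 can be sketched as follows; this is a single-head NumPy illustration (function names are ours, and the multi-head split and linear layers are omitted):

```python
import numpy as np

def positional_encoding(length, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(...).
    Assumes an even d_model for simplicity."""
    pos = np.arange(length)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angle = pos / (10000 ** (i / d_model))
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angle)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)   # odd dimensions: cosine
    return pe

def attention(Q, K, V):
    """softmax(QK^T / sqrt(d_k)) V: scale, exponentiate base e, normalize,
    then use the normalized weights to aggregate V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)  # rows sum to 1
    return w @ V

pe = positional_encoding(10, 8)
out = attention(np.eye(2), np.eye(2), np.eye(2))
```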
s4.3: constructing a decoder part:
taking a multi-head self-attention layer and an MLP layer as the network skeleton, after one round of stacking, performing attention aggregation calculation between the cross multi-head attention aggregation operation (Cross Attention) and the hidden-state result of the last layer of the encoder, and then stacking several more self-attention and MLP layers;
taking the K and V obtained by the self-attention mechanism in the encoder and the Q obtained by training in the decoder for aggregation calculation, and adding a normalization layer after each MLP layer using the Layer Normalization method, wherein H is the number of hidden-layer nodes in one layer and l is the layer index of the MLP; the normalized statistics μ^l and σ^l of Layer Normalization are calculated as
μ^l = (1/H) Σ_{i=1}^{H} a_i^l, σ^l = sqrt( (1/H) Σ_{i=1}^{H} (a_i^l − μ^l)^2 )
wherein a^l is the input multidimensional time series; the statistics μ^l and σ^l are independent of the number of samples and depend only on the number of hidden nodes, so as long as there are enough hidden nodes, the normalized statistics of Layer Normalization are guaranteed to be sufficiently representative; the data output after the l-th MLP layer is a_i^l, where i indexes the dimension, denoted in_dim; using μ^l and σ^l, the normalized value ā_i^l = (a_i^l − μ^l) / sqrt((σ^l)^2 + ε) is obtained, wherein ε is 1e-5 to prevent division by zero;
performing restoration reconstruction on the destroyed time sequence in an autoregressive mode; the restoration reconstruction degree is evaluated through a dynamic time warping index;
the dynamic time warping index is calculated as follows: after aligning the two time series, a difference matrix is calculated, and the goal is to find a path from (0, 0) to (n, n) in the matrix such that the cumulative Euclidean distance of the elements on the path is minimal; the minimal path is the warping path, and its cost is the dynamic time warping index, used to represent the similarity of the two time series:
constructing an n × n matrix in which the (i-th, j-th) element is the Euclidean distance d(q_i, c_j) between the points q_i and c_j; the warping path P defines the mapping between the time series Q and C as a set of consecutive matrix elements, the t-th element of P being defined as p_t = d(q_i, c_j)_t, wherein P = p_1, p_2, ..., p_T and n ≤ T ≤ 2n − 1, and the path is essentially obtained by dynamic programming:
d ij =d(x i ,y j )
D(i,j)=d ij +min{D(i-1,j),D(i,j-1),D(i-1,j-1)}
wherein D(i−1, j) represents the subsequence distance when x_{i−1} and y_j are matched, D(i, j−1) represents the subsequence distance when x_i and y_{j−1} are matched, and D(i−1, j−1) represents the subsequence distance when x_{i−1} and y_{j−1} are matched;
in a multivariate time series, x_i and y_j are both vectors of dimension in_dim, wherein the elements of x_i are the values of the variables at time i and the elements of y_j are the values of the variables at time j; d(x_i, y_j) is the distance when x_i at time i and y_j at time j are aligned; the distance d(x_i, y_j) between the vectors x_i and y_j may be calculated by the Euclidean distance or the Mahalanobis distance;
Euclidean distance: d(x_i, y_j) = sqrt( Σ_{k=1}^{in_dim} (x_{i,k} − y_{j,k})^2 );
for a multivariate vector x = (x_1, x_2, x_3, …, x_p)^T with mean value μ = (μ_1, μ_2, μ_3, …, μ_p)^T and covariance matrix S, the Mahalanobis distance is D_M(x) = sqrt( (x − μ)^T S^{−1} (x − μ) );
it differs from the Euclidean distance in that it takes the correlations between the various characteristics into account and is scale-independent (Scale-independent), i.e. independent of the measurement scale;
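The Mahalanobis distance defined above can be sketched in a few lines; this is an illustrative helper (the function name is ours), assuming an invertible covariance matrix S. With S equal to the identity matrix it reduces to the Euclidean distance:

```python
import numpy as np

def mahalanobis(x, mu, S):
    """D_M(x) = sqrt((x - mu)^T S^{-1} (x - mu));
    scale-independent because S^{-1} rescales each correlated feature."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))

# With S = I this is the ordinary Euclidean distance.
d = mahalanobis(np.array([3.0, 4.0]), np.zeros(2), np.eye(2))
```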
s4.4: the pre-training process is as follows:
resampling the acquired data to serve as the original sequence, then performing sequence destruction on it, then feeding it into the model for forward inference (Forward), and processing it through the encoder and decoder to obtain a new sequence, recorded as the reconstructed sequence of the original sequence;
the similarity between the original sequence and the reconstructed sequence is measured using the dynamic time warping index, denoted DTW, and the loss function is constructed from it, Loss = DTW(X, X̂), where X is the original sequence and X̂ the reconstructed sequence:
after the loss between the sequences before and after reconstruction is calculated, back-propagation is performed and the model parameters are updated so that the loss decreases until it converges and no longer falls; in this process the model gradually learns the characteristics of the multidimensional time series and extracts features automatically, which constitutes the pre-training process, and finally a pre-trained model with sequence-extraction capability is obtained for later use in downstream tasks.
2. The method for monitoring the abnormal sewage disposal according to claim 1, wherein in the step S1,
the steps of the multichannel acquisition and transmission module for acquiring data comprise: initializing, reading primary serial port interrupt data, packaging and transmitting the data,
the initializing step comprises the following steps: initializing and configuring an ESP8266 chip running environment;
the step of reading the primary serial port interrupt data specifically comprises: setting the data type of the temporarily stored serial-port interrupt data to the unsigned int type of a 32-bit computer, with a data length of 16 bits, updating the data in real time during the serial-port interrupt, and performing secondary filtering in the fixed-time interrupt;
the data packaging and transmitting specifically comprises: the ESP8266 chip splits the sixteen bits of interrupt data into the first eight bits and the last eight bits, packs the data using 0x03 and 0x03 as the data frame header and 0x03 and 0x03 as the data frame trailer, and finally transmits the data in character form over the network to the data processing module in wireless hot-spot mode.
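The byte framing described in this claim can be sketched as follows; the helper name is ours, and the identical 0x03 header and trailer bytes are taken as stated in the claim:

```python
def pack_sample(value):
    """Split a 16-bit reading into its high and low bytes and frame it
    with the 0x03, 0x03 header and 0x03, 0x03 trailer from the claim."""
    hi, lo = (value >> 8) & 0xFF, value & 0xFF
    return bytes([0x03, 0x03, hi, lo, 0x03, 0x03])

frame = pack_sample(0x1234)  # 6-byte frame ready for transmission
```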
3. The bidirectional autoregressive non-supervision pretraining fine-tuning sewage anomaly monitoring method according to claim 2, wherein in the step S2, the Mask-based method specifically comprises: a complete time sequence is taken as a sample, 10% of the time sequence is randomly covered, and the time sequence is restored by using a Kalman filtering algorithm.
4. The application of the bidirectional autoregressive non-supervision pre-training fine-tuning sewage anomaly monitoring method to a visual large screen management platform according to claim 3,
the hardware part of the visual large screen management platform comprises: a multi-channel acquisition and transmission module, a data processing module,
the multi-channel acquisition and transmission module comprises: the pressure sensor, the flow velocity meter, the thermometer and the density detector are arranged at the pollution source discharge port to be monitored,
the front end of the data processing module is developed based on a compact framework, and uses axios to interact with the rear end to acquire the analyzed pollution discharge data, and introduces DataV and AntV to visualize the abnormal pollution discharge data;
the back end of the data processing module is developed based on Golang and comprises Gin and GORM frames;
coding a data processing module, namely generating a Docker mirror image by writing Docker file, and performing multi-platform migration operation;
the visual large screen management platform comprises three views of a government, an enterprise and a person.
CN202210687441.8A 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application Active CN115099321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210687441.8A CN115099321B (en) 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application


Publications (2)

Publication Number Publication Date
CN115099321A CN115099321A (en) 2022-09-23
CN115099321B true CN115099321B (en) 2023-08-04

Family

ID=83291933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210687441.8A Active CN115099321B (en) 2022-06-17 2022-06-17 Bidirectional autoregressive non-supervision pretraining fine-tuning type pollution discharge abnormality monitoring method and application

Country Status (1)

Country Link
CN (1) CN115099321B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115792158B (en) * 2022-12-07 2023-09-15 广东建研环境监测股份有限公司 Method and device for realizing dynamic water quality monitoring based on Internet of things
CN116821697A (en) * 2023-08-30 2023-09-29 聊城莱柯智能机器人有限公司 Mechanical equipment fault diagnosis method based on small sample learning

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102128794A (en) * 2011-01-31 2011-07-20 重庆大学 Manifold learning-based method for monitoring water quality by remote sensing
CN108801950A (en) * 2018-05-21 2018-11-13 东南大学 A kind of ultraviolet spectra abnormal water detection method based on sliding window Multiscale Principal Component Analysis
CN109858572A (en) * 2019-03-13 2019-06-07 中南大学 A kind of modified hierarchy clustering method for sewage abnormality detection
CN111401582A (en) * 2020-03-12 2020-07-10 中交疏浚技术装备国家工程研究中心有限公司 Abnormity identification method and monitoring platform for domestic sewage treatment facility
CN112529678A (en) * 2020-12-23 2021-03-19 华南理工大学 Financial index time sequence abnormity detection method based on self-supervision discriminant network
CN112765896A (en) * 2021-01-29 2021-05-07 湖南大学 LSTM-based water treatment time sequence data anomaly detection method
CN113361199A (en) * 2021-06-09 2021-09-07 成都之维安科技股份有限公司 Multi-dimensional pollutant emission intensity prediction method based on time series
WO2021247408A1 (en) * 2020-06-02 2021-12-09 Pangolin Llc Ai and data system to monitor pathogens in wastewater and methods of use
CN114090396A (en) * 2022-01-24 2022-02-25 华南理工大学 Cloud environment multi-index unsupervised anomaly detection and root cause analysis method
CN114358435A (en) * 2022-01-11 2022-04-15 北京工业大学 Pollution source-water quality prediction model weight influence calculation method of two-stage space-time attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181876A1 (en) * 2016-12-22 2018-06-28 Intel Corporation Unsupervised machine learning to manage aquatic resources




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant