CN116108885A

CN116108885A - An anomaly detection method for medium- and low-pressure gas pressure regulators based on an LSTM-AE-DT model

Info

Publication number: CN116108885A
Application number: CN202310008710.8A
Authority: CN
Inventors: 张艳; 王海超
Original assignee: Shanghai University of Electric Power
Current assignee: Shanghai University of Electric Power
Priority date: 2023-01-04
Filing date: 2023-01-04
Publication date: 2023-05-12

Abstract

The present invention relates to a method for detecting anomalies in medium- and low-pressure gas pressure regulators based on an LSTM-AE-DT model. The method comprises the following steps:

S1, collecting outlet pressure data of a medium- and low-pressure gas pressure regulator;

S2, preprocessing the outlet pressure data;

S3, establishing an LSTM-AE model, training it with the preprocessed training data, and saving the trained model;

S4, inputting the preprocessed test data into the trained model and outputting predicted values of the regulator outlet pressure;

S5, calculating the error between the predicted and actual outlet pressure values, and a threshold ε;

S6, determining the state of the regulator's time series data according to the threshold ε and outputting the corresponding operating status of the regulator.

Compared with the prior art, the invention combines a long short-term memory network with an autoencoder. The model learns the long-term dependencies in the data and then reconstructs the time series, which improves anomaly detection performance on gas pressure regulator time series data.

Description

A method for detecting anomalies in medium- and low-pressure gas pressure regulators based on an LSTM-AE-DT model

Technical Field

The present invention relates to a method for detecting anomalies in medium- and low-pressure gas pressure regulators, and in particular to such a method based on an LSTM-AE-DT model.

Background Art

With the rapid development of the gas industry, ensuring stable and safe gas transmission has long been a focus of research in the field. The gas pressure regulator is a key piece of equipment for delivering gas normally to downstream users. When it fails, downstream users may receive an insufficient or unstable gas supply. If the fault is not discovered and repaired in time, it can keep deteriorating and lead to accidents such as explosions and fires. The time series data recorded while a regulator operates reflect its state, and can therefore be used to detect whether the regulator is faulty. Research on anomaly detection for gas pressure regulator time series data therefore has considerable practical significance and value.

Gas pressure regulator anomaly detection is mainly used to determine whether a regulator is in a normal or an abnormal working state. Traditional approaches either rely on the experience of maintenance workers to judge anomalies directly or apply classical statistical methods. These approaches cannot give early warning of potential faults, and they have low accuracy and poor timeliness. Intelligent methods for detecting anomalies in gas pressure regulator time series data are therefore urgently needed.

In recent years, machine learning methods have multiplied and are applied ever more widely, and many researchers in the industry have proposed machine-learning-based anomaly detection techniques for gas pressure regulators. These techniques train support vector machines or other machine learning algorithms on the system's historical data under both normal and fault conditions. Because they are supervised, the data must be labeled manually, which consumes considerable time and labor.

Summary of the Invention

The purpose of the present invention is to overcome the above defects of the prior art by providing a method for detecting anomalies in medium- and low-pressure gas pressure regulators based on an LSTM-AE-DT model.

The purpose of the present invention can be achieved by the following technical solution:

A method for detecting anomalies in medium- and low-pressure gas pressure regulators based on an LSTM (long short-term memory network)-AE (autoencoder)-DT (dynamic threshold method) model, comprising the following steps:

S1. Collect outlet pressure data of a medium- and low-pressure gas pressure regulator;

S2. Preprocess the outlet pressure data to obtain training data and test data;

S3. Establish an LSTM-AE model, train it with the preprocessed training data, and save the trained model;

S4. Input the preprocessed test data into the trained model and output predicted values of the regulator outlet pressure;

S5. Calculate the reconstruction error between the predicted (reconstructed) outlet pressure and the true outlet pressure, and the threshold ε of this segment of time series data;

S6. Determine the state of the regulator's time series data according to the threshold ε, and output whether the regulator is in a normal or an abnormal operating state.

Further, in step S2, step S1 yields the original outlet pressure time series training data set X = (x_1, x_2, …, x_j, …, x_N), 1 ≤ j ≤ N. Each missing value is filled with the mean of the outlet pressure values recorded at the same time of day on the three preceding days. The data set is then processed with deviation standardization (min-max normalization) to obtain the standardized time series data set X′ = (x′_1, x′_2, …, x′_j, …, x′_N), x′_j ∈ [0, 1], using the formula:

$$x'_j = \frac{x_j - \min_{1 \le j \le N}\{x_j\}}{\max_{1 \le j \le N}\{x_j\} - \min_{1 \le j \le N}\{x_j\}}$$

where x_j is the j-th raw data point and x′_j is the j-th standardized data point. min_{1≤j≤N}{x_j} is the minimum of the training data set and max_{1≤j≤N}{x_j} is its maximum. Finally, X′ is framed with a sliding window of width l (1 ≤ l ≤ N) and step 1. This gives the framed data set Y = (y_1, y_2, …, y_{N−l+1}), where y_i = (x′_i, x′_{i+1}, …, x′_{i+l−1}), 1 ≤ i ≤ N − l + 1.
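A minimal Python sketch of the S2 preprocessing follows. It assumes NumPy, a 5-minute sampling interval (288 samples per day, as in the embodiment below) and missing readings stored as NaN. The function names are illustrative and not part of the patent. Test data would be scaled with the training minimum and maximum, so test values may fall slightly outside [0, 1].

```python
import numpy as np

SAMPLES_PER_DAY = 288  # assumed: 5-minute sampling interval (24 * 60 / 5)


def fill_missing(x, period=SAMPLES_PER_DAY, days=3):
    """Fill NaNs with the mean of the same time-of-day values on the previous `days` days.

    Only earlier values that are present (or were filled earlier in the loop) are
    averaged; a gap with no usable history is left as NaN.
    """
    x = np.asarray(x, dtype=float).copy()
    for j in np.flatnonzero(np.isnan(x)):
        past = [x[j - d * period] for d in range(1, days + 1) if j - d * period >= 0]
        past = [v for v in past if not np.isnan(v)]
        if past:
            x[j] = np.mean(past)
    return x


def min_max_scale(x, x_min, x_max):
    """Deviation standardization x' = (x - min) / (max - min), using training-set min/max."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)


def sliding_windows(x, width, step=1):
    """Frame a 1-D series into windows y_i = (x'_i, ..., x'_{i+l-1}); step 1 gives N - l + 1 windows."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - width + 1, step)
    return np.stack([x[s:s + width] for s in starts])
```

For example, with the embodiment's settings one would call `sliding_windows(min_max_scale(fill_missing(raw), lo, hi), width=60)`. Here `raw`, `lo` and `hi` are hypothetical names for the raw series and the training minimum and maximum.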

Further, in step S3, the LSTM network uses a forget gate to filter information and an input gate to retain and encode the necessary information. The hidden state is emitted through the output gate and passed to the next LSTM unit, so that features are extracted from the data.

Further, in step S3, LSTM networks are used as both the encoder and the decoder of the AE, and the encoder represents the high-dimensional input data sequence as a fixed-size vector.

Further, in step S4, the framed data y_i = (x′_i, x′_{i+1}, …, x′_{i+l−1}) are input into the LSTM-AE model, which outputs a reconstructed time series data set.

Further, in step S5, the reconstruction error e_j between the predicted and true values of the regulator outlet pressure is calculated as follows:

where x′_j is the j-th outlet pressure data point and n is the number of times x′_j is reconstructed; the k-th reconstruction of x′_j gives the corresponding reconstructed outlet pressure value. Applying this calculation to every x′_j yields the reconstruction error sequence e = (e_1, e_2, …, e_j, …, e_N). An exponentially weighted moving average is applied to e to obtain the smoothed error sequence e′ = (e′_1, e′_2, …, e′_j, …, e′_N). The threshold sequence ε is then calculated:

$$\varepsilon = \mu(e') + Z \cdot \sigma(e')$$

where μ(·) is the mean, σ(·) is the standard deviation, and Z is a manually set weight.

Calculate f(ε_t):

$$\Delta\mu(e') = \mu(e') - \mu(\{e \in e' \mid e < \varepsilon_t\})$$

$$\Delta\sigma(e') = \sigma(e') - \sigma(\{e \in e' \mid e < \varepsilon_t\})$$

$$e_a = \{e \in e' \mid e > \varepsilon_t\}$$

where E_seq is the set of contiguous sequences in e_a. The threshold ε_t that maximizes f(ε_t) is taken as the threshold ε, and anomaly detection on the time series data is performed with it.
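The sketch below illustrates the dynamic threshold search under stated assumptions. The text does not give the expression for f(ε_t). This sketch therefore assumes the objective of the nonparametric dynamic thresholding method of Hundman et al. (2018):

f(ε) = (Δμ/μ(e′) + Δσ/σ(e′)) / (|e_a| + |E_seq|²)

Δσ(e′) is taken as σ(e′) − σ({e ∈ e′ | e < ε_t}), by analogy with Δμ. The smoothing factor `alpha` is an assumed value. The default Z grid follows the embodiment's 1.25, 1.30, …, 2.0, read as steps of 0.05.

```python
import numpy as np


def ewma(e, alpha=0.3):
    """Exponentially weighted moving average e'_j = alpha*e_j + (1 - alpha)*e'_{j-1} (alpha assumed)."""
    e = np.asarray(e, dtype=float)
    out = np.empty_like(e)
    out[0] = e[0]
    for j in range(1, len(e)):
        out[j] = alpha * e[j] + (1.0 - alpha) * out[j - 1]
    return out


def count_runs(mask):
    """|E_seq|: number of contiguous runs of True values in a boolean mask."""
    m = np.asarray(mask, dtype=int)
    return int(np.sum(np.diff(np.concatenate(([0], m))) == 1))


def dynamic_threshold(e, z_grid=None, alpha=0.3):
    """Return (epsilon, e_smoothed), where epsilon = mu + Z*sigma maximizes the assumed f."""
    if z_grid is None:
        z_grid = np.arange(1.25, 2.0 + 1e-9, 0.05)  # 1.25, 1.30, ..., 2.0
    e_s = ewma(e, alpha)
    mu, sd = e_s.mean(), e_s.std()
    best_eps, best_f = mu + max(z_grid) * sd, -np.inf  # fallback: strictest candidate
    for z in z_grid:
        eps = mu + z * sd
        below = e_s[e_s < eps]
        above = e_s > eps            # e_a expressed as a boolean mask
        if not above.any():
            continue                 # no points above eps: f is undefined, skip
        d_mu = mu - below.mean()
        d_sd = sd - below.std()
        f = (d_mu / mu + d_sd / sd) / (above.sum() + count_runs(above) ** 2)
        if f > best_f:
            best_eps, best_f = eps, f
    return best_eps, e_s
```

Points whose smoothed error `e_s` exceeds the returned threshold would then be flagged as abnormal in S6. The |E_seq|² term penalizes thresholds that break the anomalies into many short, scattered runs.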

Further, in step S5, the LSTM-AE-DT model is optimized with the backpropagation algorithm. The network parameters are updated until the loss function L between the input and output reaches its minimum. The loss function L is:

where e_j is the reconstruction error between the predicted and true values of the regulator outlet pressure.

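Further, in step S6, precision, recall, and the F1 score are used as evaluation metrics, calculated as:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$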

where Precision is the precision and Recall is the recall. TP is the number of abnormal data points predicted as abnormal, FP is the number of normal data points predicted as abnormal, and FN is the number of abnormal data points predicted as normal.
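For reference, the three metrics can be computed from binary labels as in the following sketch. Here 1 marks an abnormal point, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels where 1 (truthy) means abnormal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # abnormal predicted as abnormal
    fp = sum(1 for t, p in pairs if not t and p)      # normal predicted as abnormal
    fn = sum(1 for t, p in pairs if t and not p)      # abnormal predicted as normal
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```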

Further, in step S3, the LSTM-AE model uses the memory capacity of the LSTM so that the data processed by the encoder retain the long-term dependencies of the time series, while the high-dimensional input vector is compressed into a low-dimensional vector.

Further, in step S5, the threshold ε of the time series data is calculated with a dynamic threshold method.

Compared with the prior art, the present invention has the following beneficial effects:

1. The invention combines a long short-term memory network with an autoencoder:

- The long short-term memory network is composed of multiple LSTM units.
- The autoencoder extracts features from non-stationary time series data well and suppresses the effect of noise.
- The combined model is trained without supervision, so the data need not be labeled during training.
- It learns the long-term dependencies of the time series and the correlations between data points, and uses them to reconstruct the regulator's time series data.

This saves time and reduces labor costs.

2. By using a dynamic threshold method, the invention takes the long memory of the time series into account and avoids information loss caused by overly long sequences. The threshold is adjusted dynamically as the time series changes, so it updates automatically during anomaly detection. This improves anomaly detection performance on gas pressure regulator time series data.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a schematic flow chart of the method of the present invention;

Fig. 2 is a schematic diagram of the structure of the LSTM-AE model of the present invention;

Fig. 3 is a schematic diagram of anomaly detection result curves for a medium- and low-pressure gas pressure regulator according to the present invention.

DETAILED DESCRIPTION

The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The embodiment is implemented on the basis of the technical solution of the invention and gives a detailed implementation and operating procedure. However, the scope of protection of the invention is not limited to the following embodiment.

As shown in Fig. 1, a method for detecting anomalies in medium- and low-pressure gas pressure regulators based on the LSTM-AE-DT model comprises the following steps:

In step S2, step S1 yields the original outlet pressure time series training data set X = (x_1, x_2, …, x_j, …, x_N), 1 ≤ j ≤ N. Each missing value is filled with the mean of the outlet pressure values recorded at the same time of day on the three preceding days. The data set is then processed with deviation standardization to obtain the standardized time series data set X′ = (x′_1, x′_2, …, x′_j, …, x′_N), x′_j ∈ [0, 1], using the formula:

$$x'_j = \frac{x_j - \min_{1 \le j \le N}\{x_j\}}{\max_{1 \le j \le N}\{x_j\} - \min_{1 \le j \le N}\{x_j\}}$$

In step S3, the LSTM network uses a forget gate to filter information and an input gate to retain and encode the necessary information. The hidden state is emitted through the output gate and passed to the next LSTM unit, so that features are extracted from the data. LSTM networks serve as the encoder and decoder of the AE, and the encoder represents the high-dimensional input data sequence as a fixed-size vector.

In step S4, the framed data y_i = (x′_i, x′_{i+1}, …, x′_{i+l−1}) are input into the LSTM-AE model, which outputs a reconstructed time series data set.

In step S5, the reconstruction error e_j between the predicted and true values of the regulator outlet pressure is calculated:

$$\varepsilon = \mu(e') + Z \cdot \sigma(e')$$

where μ(·) is the mean, σ(·) is the standard deviation, and Z is a manually set weight.

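Calculate f(ε_t):

$$\Delta\mu(e') = \mu(e') - \mu(\{e \in e' \mid e < \varepsilon_t\})$$

$$\Delta\sigma(e') = \sigma(e') - \sigma(\{e \in e' \mid e < \varepsilon_t\})$$

$$e_a = \{e \in e' \mid e > \varepsilon_t\}$$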

where E_seq is the set of contiguous sequences in e_a. The threshold ε_t that maximizes f(ε_t) is taken as the threshold ε, and anomaly detection on the time series data is performed with it.

The LSTM-AE-DT model is optimized with the backpropagation algorithm. The network parameters are updated until the loss function L between the input and output reaches its minimum. The loss function L is:

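In step S6, precision, recall, and the F1 score are used as evaluation metrics, calculated as:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$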

In step S3, the LSTM-AE model uses the memory capacity of the LSTM so that the data processed by the encoder retain the long-term dependencies of the time series, while the high-dimensional input vector is compressed into a low-dimensional vector.

In step S5, the threshold ε of the time series data is calculated with a dynamic threshold method.

Theoretical basis of the present invention:

(a) Long short-term memory network model

The long short-term memory network (LSTM) is a variant of the recurrent neural network (RNN). The hidden layer of an RNN receives information only from the current input and from the hidden layer at the previous time step, so it cannot retain information over long spans. To capture long-term dependencies, the LSTM model introduces a cell state and uses three gates (an input gate, a forget gate, and an output gate) to maintain and control information.

The hidden state h_{t−1} from the output gate at the previous time step and the current input x_t are read. The sigmoid activation function δ(·) is then used to compute the forget gate output f_t, which controls which information in the previous cell state C_{t−1} is forgotten. W_f and b_f are the weight and bias of the forget gate.

$$f_t = \delta(W_f[h_{t-1}, x_t] + b_f)$$

The input gate value i_t and the temporary (candidate) cell state C̃_t at the current time step are then calculated. Together they control which information from h_{t−1} and x_t is retained through the input gate. W_i and W_c are the weights of the input gate and the cell state, respectively, and b_i and b_c are the corresponding biases.

$$i_t = \delta(W_i[h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_c[h_{t-1}, x_t] + b_c)$$

Next, the current cell state C_t, the output gate value o_t and the hidden state h_t are calculated. W_o and b_o are the weight and bias of the output gate.

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

$$o_t = \delta(W_o[h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \odot \tanh(C_t)$$

In summary, the LSTM unit first uses the previous hidden state h_{t−1} and the current input x_t to compute the three gates and the temporary cell state C̃_t. It then combines the forget gate output f_t and the input gate value i_t to update the current cell state C_t. Finally, the output gate value o_t passes the internal information to the external hidden state h_t.

(b) Autoencoder model

The autoencoder, proposed in 1986, is a neural network model consisting of an encoder and a decoder, which perform the encoding and decoding processes, respectively. Its parameters are optimized with the backpropagation algorithm to minimize the loss between the input and the output; at that point the model is fully trained.

The input data X are encoded by a weighted sum. Adding the encoder bias and applying the activation function f_m(·) gives the code Y. R_m and b_m are the weight and bias of the encoder.

$$Y = f_m(R_m X + b_m)$$

The code Y is decoded in the same way, through the activation function f_n(·), to produce the output X̂. R_n and b_n are the weight and bias of the decoder.

$$\hat{X} = f_n(R_n Y + b_n)$$

The mean squared error is chosen as the loss function L and minimized by gradient descent. The backpropagation algorithm updates the model parameters θ step by step so that the loss keeps approaching its minimum, which yields the minimum loss E.
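To make the encode, decode and backpropagation cycle concrete, the sketch below trains a single-hidden-layer dense autoencoder with NumPy and hand-derived gradients. It rests on several assumptions:

- f_m = tanh and f_n = identity.
- Training uses plain full-batch gradient descent.
- The loss is the MSE averaged over all entries.
- Samples are stored as rows, so R_m X becomes `X @ R_m.T`.

It illustrates the basic autoencoder described here, not the patent's LSTM-AE. The function name and hyperparameters are hypothetical.

```python
import numpy as np


def train_autoencoder(X, hidden=3, lr=0.1, epochs=500, seed=0):
    """Dense autoencoder: Y = tanh(X R_m^T + b_m), X_hat = Y R_n^T + b_n.

    Minimizes the mean squared error by full-batch gradient descent with
    hand-derived backpropagation. Returns (params, loss_history).
    """
    rng = np.random.default_rng(seed)
    _, d = X.shape
    R_m = rng.normal(scale=0.1, size=(hidden, d)); b_m = np.zeros(hidden)
    R_n = rng.normal(scale=0.1, size=(d, hidden)); b_n = np.zeros(d)
    history = []
    for _ in range(epochs):
        Y = np.tanh(X @ R_m.T + b_m)            # encoder: code Y
        X_hat = Y @ R_n.T + b_n                 # decoder: reconstruction
        diff = X_hat - X
        history.append(np.mean(diff ** 2))      # MSE loss L
        g_out = 2.0 * diff / diff.size          # dL/dX_hat
        g_R_n = g_out.T @ Y                     # decoder gradients
        g_b_n = g_out.sum(axis=0)
        g_pre = (g_out @ R_n) * (1.0 - Y ** 2)  # backpropagate through tanh
        g_R_m = g_pre.T @ X                     # encoder gradients
        g_b_m = g_pre.sum(axis=0)
        R_m -= lr * g_R_m; b_m -= lr * g_b_m    # gradient descent update of theta
        R_n -= lr * g_R_n; b_n -= lr * g_b_n
    return (R_m, b_m, R_n, b_n), history
```

The returned loss history should decrease over the epochs; its last value corresponds to the minimum loss E reached by this particular run.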

This embodiment provides a method for detecting anomalies in medium- and low-pressure gas pressure regulators based on the LSTM-AE-DT model. The settings are as follows:

- During data preprocessing, the sliding window width is 60 and the step size is 1.
- For the dynamic threshold method, the weight Z is set to Z = (1.25, 1.30, …, 2.0), based on the characteristics of the regulator outlet pressure time series.
- The model is built with Python 3.7 and TensorFlow 2.2 on Windows 11.
- The experiments run on a machine with an Intel i5-12500H processor and 16 GB of memory.

The specific steps are as follows:

Step 1: Collect outlet pressure data of a medium- and low-pressure gas pressure regulator;

The specific process is as follows:

An outlet pressure data set collected by an energy company's SCADA system from June 25, 2021 to October 9, 2021 was selected. Data were sampled every 5 minutes, giving 30,528 records in total. The first 21,370 records form the training set and the last 9,158 records form the test set.
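A quick arithmetic check of the data set figures. The per-split window counts assume that each split is framed separately with l = 60 and step 1, as in the embodiment.

```latex
\begin{gathered}
\frac{24 \times 60\ \text{min}}{5\ \text{min}} = 288\ \text{samples per day}, \qquad \frac{30528}{288} = 106\ \text{days} \\
\frac{21370}{30528} \approx 0.700, \qquad \frac{9158}{30528} \approx 0.300 \\
21370 - 60 + 1 = 21311\ \text{training windows}, \qquad 9158 - 60 + 1 = 9099\ \text{test windows}
\end{gathered}
```

106 days is the difference between October 9 and June 25, 2021, and the train/test split is close to 70% / 30%.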

Step 2: Preprocess the outlet pressure data;

The specific process is as follows:

Assume the original outlet pressure time series training data set is X = (x_1, x_2, …, x_j, …, x_N), 1 ≤ j ≤ N. First, each missing value in the data obtained in Step 1 is filled with the mean of the outlet pressure values at the same time of day on the three preceding days. Then deviation standardization is applied to the raw time series obtained in Step 1. This yields the standardized data set X′ = (x′_1, x′_2, …, x′_j, …, x′_N), x′_j ∈ [0, 1], using the formula:

$$x'_j = \frac{x_j - \min_{1 \le j \le N}\{x_j\}}{\max_{1 \le j \le N}\{x_j\} - \min_{1 \le j \le N}\{x_j\}}$$

Step 3: Establish an LSTM-AE model, train it with the preprocessed training data, and save the trained model;

The specific process is as follows:

In the LSTM-AE model, the LSTM uses a forget gate to filter information and an input gate to retain and encode the necessary information. The hidden state is emitted through the output gate and passed to the next LSTM unit, which extracts features from the data. LSTM networks are built as the encoder and decoder of the AE, and the encoder represents the high-dimensional input data sequence as a fixed-size vector. Thanks to the memory capacity of the LSTM, the data processed by the encoder still retain the long-term dependencies of the time series, while the high-dimensional input vector is compressed into a low-dimensional vector. The structure of the LSTM-AE model is shown in Fig. 2.

The LSTM encoder implements the encoding process of the LSTM-AE model and learns the patterns in the time series features. Each window y_i of the preprocessed data Y is fed into the LSTM units in order. The inputs to a single LSTM unit are the current input x′_i, together with the hidden state h_{t−1} and cell state C_{t−1} produced by the LSTM unit at the previous time step. The first unit computes the current hidden state h_t and cell state C_t and passes them to the second unit, which decides whether to keep the information from the first unit. The information is passed on in sequence until the last LSTM unit. The information from all the data is output through the last LSTM unit as Z, and Z is repeated l times to obtain RV_Z1, RV_Z2, …, RV_Zl. The encoding process is shown by the LSTM encoder in Fig. 2.

The LSTM decoder implements the decoding process of the LSTM-AE model. The encoder passes RV_Z1, RV_Z2, …, RV_Zl to the decoder, where they serve as the inputs of the LSTM units. The decoder then reconstructs the output at time (N−l+1) and computes the hidden state at that time, thereby reconstructing the time series data at time (N−l+1). This continues in sequence until the last LSTM unit has been computed. The decoding process is shown by the LSTM decoder in Fig. 2.
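The following NumPy sketch traces the encoder, repeat and decoder data flow for a single window y_i, using untrained random weights. The layer sizes, the per-step linear output layer and the function names are assumptions for illustration. In TensorFlow this layout is commonly built from `LSTM`, `RepeatVector`, `LSTM(return_sequences=True)` and `TimeDistributed(Dense)` layers, although the patent does not list its exact layers.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _lstm_run(inputs, W, b, hidden):
    """Run one LSTM layer over a (T, D) sequence; W: (4, hidden, hidden + D) for gates f, i, c, o."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x_t in inputs:
        hx = np.concatenate([h, x_t])
        f = _sigmoid(W[0] @ hx + b[0])          # forget gate
        i = _sigmoid(W[1] @ hx + b[1])          # input gate
        c_tilde = np.tanh(W[2] @ hx + b[2])     # temporary cell state
        o = _sigmoid(W[3] @ hx + b[3])          # output gate
        c = f * c + i * c_tilde
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)                     # (T, hidden)


def init_layer(rng, hidden, in_dim):
    """Random (untrained) weights and zero biases for one LSTM layer."""
    return rng.normal(scale=0.3, size=(4, hidden, hidden + in_dim)), np.zeros((4, hidden))


def lstm_ae_forward(window, params, enc_hidden, dec_hidden):
    """window: (l,) standardized values y_i. Returns an (l,) reconstruction."""
    (We, be), (Wd, bd), (Wy, by) = params
    seq = window.reshape(-1, 1)                             # (l, 1): one value per step
    z = _lstm_run(seq, We, be, enc_hidden)[-1]              # last encoder state: code Z
    repeated = np.tile(z, (len(window), 1))                 # RV_Z1, ..., RV_Zl
    dec_states = _lstm_run(repeated, Wd, bd, dec_hidden)    # (l, dec_hidden)
    return dec_states @ Wy + by                             # per-step linear output
```

With trained weights, the difference between `window` and the returned reconstruction is what the reconstruction error e_j aggregates.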

The raw data are normalized by deviation standardization to obtain the outlet pressure data set X′ = (x′_1, x′_2, …, x′_j, …, x′_N), x′_j ∈ [0, 1], which is then framed. The framed data y_i = (x′_i, x′_{i+1}, …, x′_{i+l−1}), 1 ≤ i ≤ N − l + 1, are input into the LSTM-AE model, which outputs a reconstructed time series data set. The model is trained with a loss function computed from the reconstruction error. Here x′_j is the j-th outlet pressure data point, n is the number of times x′_j is reconstructed, and the k-th reconstruction of x′_j gives the corresponding reconstructed outlet pressure value. The reconstruction error e_j between the predicted and true values of the regulator outlet pressure is calculated:

The backpropagation algorithm is used to optimize the model and update the network parameters until the loss function between the input and output is minimized.

Step 4: Input the preprocessed test data into the trained model. With a sliding window of width l, each l input values are used to predict the (l+1)-th value, and the predicted regulator outlet pressure is output;

Step 5: Calculate the reconstruction error between the predicted and true values of the regulator outlet pressure, and use the dynamic threshold method to calculate the threshold for this segment of time series data.

The specific process is as follows:

Applying the calculation of e_j to every x′_j yields the reconstruction error sequence e = (e_1, e_2, …, e_j, …, e_N). An exponentially weighted moving average is applied to e to obtain the smoothed error sequence e′ = (e′_1, e′_2, …, e′_j, …, e′_N). The threshold sequence ε is then calculated:

$$\varepsilon = \mu(e') + Z \cdot \sigma(e')$$

where μ(·) is the mean, σ(·) is the standard deviation, and Z is a manually set weight. The threshold sequence ε is calculated from the smoothed error sequence e′ with the above formula.

Calculate f(ε_t):

$$\Delta\mu(e') = \mu(e') - \mu(\{e \in e' \mid e < \varepsilon_t\})$$

$$\Delta\sigma(e') = \sigma(e') - \sigma(\{e \in e' \mid e < \varepsilon_t\})$$

$$e_a = \{e \in e' \mid e > \varepsilon_t\}$$

Step 6: Determine the state of the regulator's time series data according to the threshold ε, and output whether the regulator is in a normal or an abnormal operating state.

To analyze and evaluate the anomaly detection results on the gas pressure regulator time series more comprehensively, precision, recall and the F1 score are used as evaluation metrics:

- Precision is the proportion of data predicted as abnormal that are actually abnormal.
- Recall is the proportion of actually abnormal data that are predicted as abnormal.
- The F1 score is the harmonic mean of precision and recall.

The three metrics are calculated as:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

The preferred embodiments of the present invention are described in detail above. A person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Any technical solution obtainable through logical analysis, reasoning or limited experimentation, on the basis of the prior art and according to the concept of the invention, therefore falls within the scope of protection defined by the claims.

Claims

1. The method for detecting the abnormality of the medium-low pressure gas pressure regulator based on the LSTM-AE-DT model is characterized by comprising the following steps of:

S1, collecting outlet pressure data of a medium- and low-pressure gas pressure regulator;

S2, preprocessing the outlet pressure data of the gas pressure regulator to obtain training data and test data;

S3, building an LSTM-AE model, inputting the preprocessed training data to train the model, and saving the trained model;

S4, inputting the preprocessed test data into the trained model, and outputting a predicted value of the outlet pressure of the gas pressure regulator;

S5, calculating the reconstruction error between the predicted value and the true value of the gas pressure regulator outlet pressure, and the threshold ε of the time series data;

S6, detecting the state of the time series data of the gas pressure regulator according to the threshold ε, and outputting the normal or abnormal operating state of the gas pressure regulator.

2. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein in step S2, for the raw outlet pressure time series training data set $X = (x_1, x_2, \ldots, x_j, \ldots, x_N)$ of the gas pressure regulator obtained in step S1, each missing value is filled with the mean of the outlet pressure values at the same time point on the three preceding days; the raw time series data set obtained in step S1 is then processed by deviation (min-max) normalization to obtain the normalized time series data set $X' = (x'_1, x'_2, \ldots, x'_j, \ldots, x'_N)$, $x'_j \in [0,1]$, according to the formula:

$$x'_j = \frac{x_j - \min_{1 \le j \le N}\{x_j\}}{\max_{1 \le j \le N}\{x_j\} - \min_{1 \le j \le N}\{x_j\}}$$

where $x_j$ is the j-th raw data point, $x'_j$ is the j-th normalized data point, $\min_{1 \le j \le N}\{x_j\}$ is the minimum value in the training data set, and $\max_{1 \le j \le N}\{x_j\}$ is the maximum value in the training data set. Finally, the data set $X'$ is framed by the sliding window method with window width $l$ ($1 \le l \le N$) and a sliding step of 1, giving the framed data set $Y = (y_1, y_2, \ldots, y_{N-l+1})$, where $y_i = (x'_i, x'_{i+1}, \ldots, x'_{i+l-1})$, $1 \le i \le N-l+1$.
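The preprocessing in claim 2 (gap filling, min-max normalization, sliding-window framing) can be sketched in NumPy as below. `samples_per_day` is a hypothetical parameter, since the text does not state the sampling interval, and the edge-case handling (skipping preceding-day values that are missing or fall before the start of the series, leaving a gap unfilled if nothing remains) is an assumption. The training-set minimum and maximum would presumably be reused when normalizing the test data.

```python
import numpy as np

def fill_missing(x, samples_per_day, days_back=3):
    """Replace each NaN with the mean of the same time point on the preceding days.

    samples_per_day is hypothetical (e.g. 24 for hourly data). Preceding-day values
    that are missing or fall before the series start are skipped; if none remain,
    the NaN is left in place. Gaps are filled in time order, so an earlier gap
    filled here can contribute to a later one.
    """
    x = np.asarray(x, dtype=float).copy()
    for j in np.flatnonzero(np.isnan(x)):
        prev = [x[j - d * samples_per_day] for d in range(1, days_back + 1)
                if j - d * samples_per_day >= 0]
        prev = [v for v in prev if not np.isnan(v)]
        if prev:
            x[j] = np.mean(prev)
    return x

def min_max_normalize(x, x_min, x_max):
    """Deviation (min-max) normalization using the training-set min and max."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)

def frame(x_norm, l):
    """Sliding windows of width l and step 1; result has shape (N - l + 1, l)."""
    if not 1 <= l <= len(x_norm):
        raise ValueError("window width must satisfy 1 <= l <= N")
    return np.lib.stride_tricks.sliding_window_view(x_norm, l).copy()

# Toy series: 4 days x 2 samples/day, one missing value on day 4
raw = np.array([2.0, 3.0, 2.5, 3.25, 3.0, 3.5, np.nan, 3.25])
filled = fill_missing(raw, samples_per_day=2)
print(filled[6])                  # 2.5, the mean of 2.0, 2.5 and 3.0
x_norm = min_max_normalize(filled, filled.min(), filled.max())
Y = frame(x_norm, l=3)
print(Y.shape)                    # (6, 3)
```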

3. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein in step S3, the LSTM network uses a forget gate to filter information and an input gate to retain and encode the necessary information, outputs a hidden state through an output gate, and passes the hidden state to the next LSTM unit for training, thereby extracting features from the data.

4. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein in step S3, the encoder and the decoder of the AE are both built from LSTM networks, and the encoder maps the high-dimensional input data sequence to a fixed-size vector.
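Claims 3 and 4 describe an LSTM whose gates filter, keep and expose information, with LSTMs serving as both the encoder and the decoder. The following is a minimal, untrained NumPy forward pass under common assumptions: the encoder's last hidden state is the fixed-size code, and the decoder receives that code at every time step ("repeat vector" style) followed by a linear read-out. Layer sizes, initialization and the class names `LSTMCell` and `LSTMAutoencoder` are hypothetical; training by backpropagation is omitted, and in practice a framework such as PyTorch or Keras would handle it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    """Single LSTM cell with random (untrained) weights."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One stacked matrix for the forget, input, candidate and output gates
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x_t, h, c):
        z = self.W @ np.concatenate([x_t, h]) + self.b
        H = self.hidden_dim
        f = sigmoid(z[:H])            # forget gate: what to drop from the cell state
        i = sigmoid(z[H:2 * H])       # input gate: which new information to keep
        g = np.tanh(z[2 * H:3 * H])   # candidate values (encoded new information)
        o = sigmoid(z[3 * H:])        # output gate: which part of the cell state to expose
        c = f * c + i * g
        h = o * np.tanh(c)            # hidden state passed to the next time step
        return h, c

    def run(self, xs):
        h, c = np.zeros(self.hidden_dim), np.zeros(self.hidden_dim)
        hs = []
        for x_t in xs:
            h, c = self.step(x_t, h, c)
            hs.append(h)
        return np.array(hs)

class LSTMAutoencoder:
    """Encoder LSTM -> fixed-size code -> decoder LSTM -> linear read-out."""
    def __init__(self, input_dim=1, code_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.encoder = LSTMCell(input_dim, code_dim, rng)
        self.decoder = LSTMCell(code_dim, code_dim, rng)
        self.W_out = rng.normal(0.0, 0.1, size=(input_dim, code_dim))
        self.b_out = np.zeros(input_dim)

    def encode(self, window):
        # The last hidden state summarizes the whole window in a fixed-size vector
        return self.encoder.run(window)[-1]

    def reconstruct(self, window):
        code = self.encode(window)
        repeated = np.repeat(code[None, :], len(window), axis=0)  # code fed at every step
        return self.decoder.run(repeated) @ self.W_out.T + self.b_out

window = np.linspace(0.2, 0.8, 30).reshape(-1, 1)   # one framed window y_i with l = 30
model = LSTMAutoencoder(input_dim=1, code_dim=8)
print(model.encode(window).shape)       # (8,)
print(model.reconstruct(window).shape)  # (30, 1)
```

The 30-value window is compressed into an 8-dimensional code, which is the "high-dimensional to low-dimensional" step the claims refer to; the reconstruction has the same shape as the input so that a per-point error can be computed.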

5. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein in step S4, the framed data $y_i = (x'_i, x'_{i+1}, \ldots, x'_{i+l-1})$ are input into the LSTM-AE model for training to obtain the reconstructed time series data set.
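Because the windows slide with a step of 1, every framed window is reconstructed separately and a point $x'_j$ is reconstructed once for each window containing it, which is presumably the count $n$ used in claim 6. The sketch below averages the per-reconstruction errors of each point and then applies the exponentially weighted moving average. The exact form of $e_j$ is not reproduced in the text, so the absolute difference is an assumption, as are the smoothing factor `alpha` and the layout `recon_windows[i, k]` (reconstruction of $x'_{i+k}$); the function names are illustrative.

```python
import numpy as np

def pointwise_reconstruction_error(x_norm, recon_windows):
    """Average error over all reconstructions of each point x'_j.

    recon_windows[i, k] is assumed to reconstruct x'_{i+k} (width l, step 1),
    so x'_j is reconstructed once per window containing it.
    The absolute difference is an assumed form of e_j.
    """
    x_norm = np.asarray(x_norm, dtype=float)
    n_windows, l = recon_windows.shape
    err_sum = np.zeros(len(x_norm))
    counts = np.zeros(len(x_norm))
    for i in range(n_windows):
        err_sum[i:i + l] += np.abs(x_norm[i:i + l] - recon_windows[i])
        counts[i:i + l] += 1
    return err_sum / counts

def ewma(e, alpha=0.3):
    """Exponentially weighted moving average: e'_j = alpha*e_j + (1-alpha)*e'_{j-1}."""
    e = np.asarray(e, dtype=float)
    smoothed = np.empty_like(e)
    smoothed[0] = e[0]
    for j in range(1, len(e)):
        smoothed[j] = alpha * e[j] + (1 - alpha) * smoothed[j - 1]
    return smoothed

# Four points, windows of width 2; window 1 misreconstructs x'_2 (1.0 -> 0.5)
x_norm = np.array([0.0, 0.5, 1.0, 0.5])
recon = np.array([[0.0, 0.5],
                  [0.5, 0.5],
                  [1.0, 0.5]])
e = pointwise_reconstruction_error(x_norm, recon)   # [0.0, 0.0, 0.25, 0.0]
e_smooth = ewma(e, alpha=0.5)                        # [0.0, 0.0, 0.125, 0.0625]
```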

6. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein in step S5, the reconstruction error $e_j$ between the predicted value and the true value of the gas pressure regulator outlet pressure is calculated:

where $x'_j$ is the j-th outlet pressure data point, $n$ is the number of times $x'_j$ is reconstructed, and $\hat{x}'^{(k)}_j$ is the k-th reconstructed value of $x'_j$. Computing $e_j$ in this way for every point gives the reconstruction error sequence $e = (e_1, e_2, \ldots, e_j, \ldots, e_N)$. An exponentially weighted moving average is applied to $e$ to obtain the smoothed error sequence $e' = (e'_1, e'_2, \ldots, e'_j, \ldots, e'_N)$, and the threshold sequence ε is then calculated:

$$\varepsilon = \mu(e') + Z \cdot \sigma(e')$$

where μ(·) is the mean, σ(·) is the standard deviation, and Z is a manually set weight;

calculating f(ε_t):

$$\Delta\mu(e') = \mu(e') - \mu(\{e \in e' \mid e < \varepsilon_t\})$$

$$\Delta\sigma(e') = \sigma(\{e \in e' \mid e < \varepsilon_t\})$$

$$e_a = \{e \in e' \mid e > \varepsilon_t\}$$

where $E_{seq}$ denotes the continuous sequences formed by the elements of $e_a$. The candidate ε_t at which f(ε_t) reaches its maximum is taken as the threshold ε, and anomaly detection on the time series data is performed based on ε.
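The text lists the ingredients of f(ε_t) (Δμ, Δσ, $e_a$, $E_{seq}$) but not its closed form. The sketch below assumes the objective of the nonparametric dynamic thresholding method of Hundman et al. (2018), which uses the same quantities: f(ε) = (Δμ/μ(e′) + Δσ/σ(e′)) / (|e_a| + |E_seq|²). Two further assumptions: Δσ is taken as σ(e′) − σ({e ∈ e′ | e < ε_t}), by analogy with Δμ (the Δσ line above shows only the second term), and the candidate weights Z form a hypothetical grid from 2 to 10. `dynamic_threshold`, `count_sequences` and `detect` are illustrative names.

```python
import numpy as np

def count_sequences(mask):
    """Number of contiguous runs of True values, i.e. |E_seq|."""
    m = np.asarray(mask, dtype=int)
    # A run starts wherever the mask steps from 0 to 1
    return int(np.sum(np.diff(np.concatenate(([0], m))) == 1))

def dynamic_threshold(e_s, z_values=None):
    """Return the candidate eps_t = mu(e') + z * sigma(e') that maximizes f(eps_t).

    Assumed objective (nonparametric dynamic thresholding, Hundman et al., 2018):
        f = (d_mu / mu + d_sigma / sigma) / (|e_a| + |E_seq|**2)
    with d_sigma = sigma(e') - sigma(e' below eps_t). The z grid is hypothetical.
    """
    e_s = np.asarray(e_s, dtype=float)
    z_values = np.arange(2.0, 10.5, 0.5) if z_values is None else np.asarray(z_values, dtype=float)
    mu, sigma = e_s.mean(), e_s.std()
    best_eps, best_f = None, -np.inf
    for z in z_values:
        eps = mu + z * sigma
        below = e_s[e_s < eps]
        above = e_s > eps                    # boolean mask of e_a
        n_above = int(above.sum())
        if n_above == 0 or below.size == 0:
            continue                         # f is undefined if nothing (or everything) is flagged
        d_mu = mu - below.mean()
        d_sigma = sigma - below.std()
        f = (d_mu / mu + d_sigma / sigma) / (n_above + count_sequences(above) ** 2)
        if f > best_f:                       # ties keep the smaller (earlier) candidate
            best_eps, best_f = eps, f
    if best_eps is None:                     # no candidate flags anything: use the highest one
        best_eps = mu + z_values.max() * sigma
    return best_eps

def detect(e_s, eps):
    """Step S6: 1 = abnormal (smoothed error above eps), 0 = normal."""
    return (np.asarray(e_s, dtype=float) > eps).astype(int)

# Flat smoothed error of 0.01 with a short burst of 0.5 at points 50-52
e_s = np.full(100, 0.01)
e_s[50:53] = 0.5
eps = dynamic_threshold(e_s)
print(np.flatnonzero(detect(e_s, eps)))  # [50 51 52]
```

The |E_seq|² term penalizes thresholds that split anomalies into many short fragments, so a single burst is preferred over scattered flags.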

7. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein in step S5, the LSTM-AE-DT model is optimized using the backpropagation algorithm, and the network parameters are updated until the loss function L between the input and the output reaches its minimum, the loss function L being:

where $e_j$ is the reconstruction error between the predicted value and the true value of the gas pressure regulator outlet pressure.

8. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein in step S6, precision, recall and the F1 score are used as evaluation metrics, calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where Precision is the precision and Recall is the recall; TP is the number of abnormal data points predicted as abnormal; FP is the number of normal data points predicted as abnormal; FN is the number of abnormal data points predicted as normal.

9. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein the LSTM-AE model in step S3 uses the memory capability of the LSTM so that the data processed by the encoder preserve the long-term dependencies of the time series data, while the high-dimensional input vector is compressed into a low-dimensional vector.

10. The method for detecting abnormalities of a medium- and low-pressure gas pressure regulator based on the LSTM-AE-DT model according to claim 1, wherein in step S5, a dynamic threshold method is used to calculate the threshold ε of the time series data.