CN111814849B

CN111814849B - A fault early warning method for key components of wind turbines based on DA-RNN

Info

Publication number: CN111814849B
Application number: CN202010573207.3A
Authority: CN
Inventors: 杨秦敏; 刘广仑; 鲍雨浓; 陈积明; 孙优贤
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2020-06-22
Filing date: 2020-06-22
Publication date: 2024-02-06
Anticipated expiration: 2040-06-22
Also published as: CN111814849A

Abstract

The invention discloses a fault early warning method for key components of wind turbines based on DA-RNN, a recurrent neural network with a dual attention mechanism. Using data sets collected by the Supervisory Control and Data Acquisition (SCADA) system while the wind turbine operates normally, the method designs a preprocessing procedure and selects the DA-RNN model to estimate variables in real time. Through multi-threshold setting and the design of discrimination criteria, it outputs a sequence of judgment results and gives the final warning result based on that sequence. The preprocessing procedure is designed for different types of noisy data, which provides a reliable data basis; the DA-RNN model comprehensively considers the influence of related variables and historical information and assigns them different weights, which ensures the accuracy of variable estimation; and the multi-threshold setting and discrimination criteria avoid a single 0-1 judgment, which makes the final warning result more robust. The method ultimately achieves fault early warning for key components, reduces turbine downtime, and saves operation and maintenance costs, and it has both theoretical and practical value.

Description

A fault early warning method for key components of wind turbines based on DA-RNN

Technical field

The invention relates to a DA-RNN-based fault early warning method for key components of wind turbines. Based on a data set of the normal operating state of the wind turbine, a standardized data preprocessing procedure is designed, DA-RNN is selected as the variable estimation model to estimate the target variable in real time, and, based on the real-time operating residuals, a fault early warning strategy combining multi-threshold setting with multiple discrimination criteria is designed to provide fault early warning for key components.

Background art

With global pollution and the increasing scarcity of traditional fossil energy, the development of clean energy has attracted widespread attention. Wind energy has developed rapidly thanks to being clean and pollution-free, and the wind power industry has become one of the new renewable energy industries being vigorously developed at home and abroad. At present, China's total installed wind turbine capacity ranks among the highest in the world. However, the rapid growth of the wind power market in recent years has also meant insufficient preparation during research and development, and the operation and maintenance costs of wind turbines remain high.

The high failure rate of wind turbines is the main cause of high operation and maintenance costs. A wind turbine is a complex system composed of multiple components and subsystems. Turbines usually operate in remote areas such as outlying plains, mountainous regions, and coasts, where the operating environment is harsh and changeable. The failure of a key component forces the whole turbine to shut down for repair and causes large economic losses. Identifying abnormalities in key components at an early stage, preventing early abnormalities from developing into catastrophic failures, and providing fault early warning for key components so that predictive maintenance can be carried out are therefore of great significance for reducing operation and maintenance costs and enabling intelligent operation and maintenance of wind farms. However, existing variable estimation models used for fault early warning struggle to take the influence of related variables and historical information into account together, and existing simple warning strategies struggle to guarantee the accuracy of the warning results. Selecting a more accurate variable estimation model and designing a robust warning strategy are therefore of great significance for accurate fault early warning.

Summary of the invention

The purpose of the present invention is to provide fault early warning for key components of wind turbines through accurate estimation of the target variable and a robust warning strategy, and to this end a DA-RNN-based fault early warning method for key components of wind turbines is proposed. The method selects a data set of the normal operating state of the wind turbine. It first designs a data preprocessing procedure that accounts for different types of noisy data. It then selects DA-RNN as the variable estimation model, which estimates the target variable in real time while considering the influence of related variables and historical information, ensuring the accuracy of the estimate. Finally, it designs a robust warning strategy that accounts for different requirements on the warning result and different anomaly characteristics, ensuring the flexibility and accuracy of the warning result. The method can be extended to every key component of a wind turbine that has a temperature measuring point, so it has practical value and strong extensibility.

The purpose of the present invention is achieved through the following technical solution: a DA-RNN-based fault early warning method for key components of wind turbines, comprising the following steps:

1) Select the wind turbine for which fault early warning is to be performed, and obtain N operating records under normal operating conditions from the turbine's SCADA system. The key components of a wind turbine include the gearbox, the generator, the pitch system, and so on. Take the temperature measured at the temperature measuring point of the component to be monitored as the target variable y, and all variables related to the temperature of that component as the related variables X, to construct the initial training set;

2) Offline training phase: design a data preprocessing procedure for the initial training set. The preprocessing steps comprise removal and imputation of isolated outliers, removal and imputation of data recorded during turbine shutdown for maintenance (identified from operation and maintenance records), and imputation of missing values. The preprocessed training set [X_train, y_train] is used as the input of the variable estimation model for model training;

3) Select the DA-RNN model, a recurrent neural network based on a dual attention mechanism, as the variable estimation model. Choose a sliding window of length W; the moments within the window are denoted t_w, w = T-W+1, T-W+2, ..., T, where T is the current moment. An input attention mechanism assigns, at each moment t_w, a weight to the influence of each related variable on the target variable, and the related variables reconstructed with these weights serve as the encoder input. The encoder consists of multiple LSTM units, each taking the reconstructed related variables at one moment in the window as input, and outputs the hidden vectors h. A temporal attention mechanism then assigns weights to the influence of the hidden vectors at the different historical moments in the window on the target variable at the current moment, which yields the semantic (context) vector c. A linear regression model combines the semantic vector with the historical values of the target variable to form the decoder input. The decoder also consists of multiple LSTM units and outputs the hidden state at moment t_{T-1}, the moment preceding the current one; a linear function then gives the estimate of the target variable at the current moment;

4) Subtract the model output (the estimate of the target variable) at the corresponding moment from the actual value of the target variable in the training set to obtain the training-set residual sequence, and compute its mean μ_train and standard deviation σ_train;

5) Set multiple thresholds based on the training-set residual sequence: the mean μ_train plus and minus k times the standard deviation σ_train serve as the upper and lower residual thresholds, with upper limit U_r(k) = μ_train + kσ_train and lower limit L_r(k) = μ_train - kσ_train. The higher the upper limit or the lower the lower limit, the fewer data points exceed the threshold in the online application phase; accordingly, the false alarm rate of the warning result falls and the missed alarm rate rises;

6) Online application phase: for a real-time operating data point d, use the DA-RNN model trained in the offline phase to obtain the residual r_d, defined as the actual measured value at d minus the model estimate;

7) Choose one value of k and determine the corresponding upper and lower thresholds. If r_d exceeds the thresholds for the current k, that is, it is greater than U_r(k) or less than L_r(k), compute count1(k), the number of consecutive data points before d that exceed the thresholds, and count2(k)%, the number of data points exceeding the thresholds within the day preceding d as a percentage of the total number of data points in one day. If r_d lies between L_r(k) and U_r(k), output 0 for both judgment results;

8) Set multiple discrimination criteria, namely a consecutive over-limit criterion combined with a percentage over-limit criterion. For the current k, set the threshold parameter S(k) of the consecutive over-limit criterion and the threshold parameter P(k)% of the percentage over-limit criterion. The judgment result of the k-S(k) strategy combination is 1 when count1(k) ≥ S(k) and 0 otherwise; the judgment result of the k-P(k) strategy combination is 1 when count2(k) ≥ P(k) and 0 otherwise;

9) Repeat steps 7) and 8) for all other values of k to obtain the judgment result sequence, and decide from that sequence whether to raise a final alarm for the real-time data point d.
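Steps 6) to 8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `judge_point`, the dictionary layout of the thresholds, counting the current point d as part of the consecutive run, and dividing by the nominal number of points per day are all choices made here for illustration.

```python
def judge_point(residuals, limits, S, P, points_per_day):
    """Return the 0/1 judgment sequence for the latest data point d.

    residuals      : residual history, oldest first, ending with r_d
    limits         : {k: (L_r(k), U_r(k))}
    S, P           : {k: threshold} for the two criteria (P in percent)
    points_per_day : samples per day (288 for 5-minute data)
    """
    flags = []
    for k, (lower, upper) in limits.items():
        over = [r < lower or r > upper for r in residuals]
        if not over[-1]:
            flags += [0, 0]  # step 7: r_d inside the band gives two zeros
            continue
        # count1: consecutive over-limit points ending at d (including d is an assumption)
        count1 = 0
        for flag in reversed(over):
            if not flag:
                break
            count1 += 1
        # count2: over-limit points in the latest day, as a percentage of a full day
        count2 = 100.0 * sum(over[-points_per_day:]) / points_per_day
        flags += [int(count1 >= S[k]), int(count2 >= P[k])]
    return flags


# Toy example with made-up numbers: one k value and four samples per "day".
limits = {2: (-1.0, 1.0)}
flags_alarm = judge_point([0.0, 2.0, 2.0, 2.0], limits, S={2: 3}, P={2: 50}, points_per_day=4)
flags_quiet = judge_point([2.0, 2.0, 2.0, 0.0], limits, S={2: 3}, P={2: 50}, points_per_day=4)
```

In the first toy call the last three residuals leave the band, so both criteria fire; in the second the current residual is inside the band, so both results are 0 regardless of the history.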

Further, in step 2), the preprocessing procedure for offline training comprises the following steps:

a) Isolated outliers are usually recording errors caused by sensor anomalies and are identified from the operating mechanism. The conditions are: for a temperature variable, a value greater than 150 degrees or less than 0 degrees; for a wind speed variable, a value greater than the cut-out wind speed or less than the cut-in wind speed of the turbine; for a power variable, a value greater than the rated power of the turbine or a negative value. A data point that meets the condition for its variable type is judged to be an isolated outlier and removed;

b) According to the operation and maintenance records, while the wind turbine is under maintenance it is shut down, and the SCADA system usually records 0 or a system default value. Such values do not represent the normal operating state of the turbine, so data recorded during shutdown for maintenance are removed;

c) To preserve the temporal continuity of the data, the removed data and the values missing from the SCADA system are imputed by mean imputation: the average of the variable over the hour preceding the imputation position is used as the imputed value at the current moment.
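Steps a) and c) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the default limits (cut-in 2 m/s, cut-out 25 m/s, rated power 1500 kW) and the 12 samples per hour are taken from the embodiment's turbine and 5-minute sampling interval, and letting already-imputed values count toward later averages is an assumption the text does not settle.

```python
from statistics import mean


def is_isolated_outlier(kind, value, cut_in=2.0, cut_out=25.0, rated_power=1500.0):
    """Rule-based check from step a); defaults are the embodiment's turbine values."""
    if kind == "temperature":
        return value > 150 or value < 0
    if kind == "wind_speed":
        return value > cut_out or value < cut_in
    if kind == "power":
        return value > rated_power or value < 0
    return False  # other variable types are not covered by the rules


def impute_with_previous_hour(values, points_per_hour=12):
    """Fill None entries with the mean of the available values in the preceding hour."""
    out = list(values)
    for i, v in enumerate(out):
        if v is None:
            # earlier gaps were already filled, so imputed values are reused here
            window = [x for x in out[max(0, i - points_per_hour):i] if x is not None]
            out[i] = mean(window) if window else None  # no history: leave the gap
    return out


raw = [60.0, 62.0, 200.0, None, 61.0]  # made-up bearing temperatures
removed = [None if v is None or is_isolated_outlier("temperature", v) else v for v in raw]
filled = impute_with_previous_hour(removed)
```

Records from maintenance shutdowns (step b) would be set to `None` in the same way before imputation, using the time intervals listed in the maintenance records.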

Further, in step 3), the model input is the related variables within the sliding window, where n is the number of related variables. The input attention mechanism takes as input the related variables X together with the encoder's hidden state output h and memory cell output s, and outputs the weight of the influence of the k-th related variable on the target variable, calculated as follows:

where v_en, W_en, U_en are parameters to be learned. At moment t_w, the related variables are reconstructed using the influence weights to obtain the reconstructed related variables at moment t_w, expressed as:
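As a hedged sketch of the two formulas referred to above, the input attention of the commonly published DA-RNN formulation can be written with the parameter names v_en, W_en, U_en as follows. The symbols e, α, x^k (the k-th related variable over the whole window) and x̃ are assumed notation, and the index t stands for a moment t_w in the sliding window; the original drawings may use different symbols.

```latex
\begin{aligned}
e_t^k &= v_{en}^{\top} \tanh\!\left( W_{en}\,[h_{t-1};\, s_{t-1}] + U_{en}\, x^k \right), \qquad k = 1, \dots, n \\
\alpha_t^k &= \frac{\exp(e_t^k)}{\sum_{j=1}^{n} \exp(e_t^j)} \\
\tilde{x}_t &= \left( \alpha_t^1 x_t^1,\ \alpha_t^2 x_t^2,\ \dots,\ \alpha_t^n x_t^n \right)^{\top}
\end{aligned}
```

The softmax normalization makes the n weights at each moment sum to 1, so each related variable is scaled by its relative importance at that moment.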

The reconstructed vector is the input of the encoder. The encoder consists of multiple LSTM units and outputs the hidden state h and the memory cell s. At moment t_w, the LSTM update is governed by the forget gate f_t, the input gate i_t, and the output gate o_t, with the following update rules:

where W_f, W_i, W_o, W_s, b_f, b_i, b_o, b_s are parameters to be learned, σ is the sigmoid function, and ⊙ denotes element-wise multiplication;
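For reference, the standard LSTM update written with the parameter names listed above is sketched below. It assumes that the gates act on the concatenation of the previous hidden state and the reconstructed input, denoted here by the assumed symbol x̃_t; the original formula may differ in layout.

```latex
\begin{aligned}
f_t &= \sigma\!\left( W_f\,[h_{t-1};\, \tilde{x}_t] + b_f \right) \\
i_t &= \sigma\!\left( W_i\,[h_{t-1};\, \tilde{x}_t] + b_i \right) \\
o_t &= \sigma\!\left( W_o\,[h_{t-1};\, \tilde{x}_t] + b_o \right) \\
s_t &= f_t \odot s_{t-1} + i_t \odot \tanh\!\left( W_s\,[h_{t-1};\, \tilde{x}_t] + b_s \right) \\
h_t &= o_t \odot \tanh(s_t)
\end{aligned}
```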

The temporal attention mechanism takes as input the encoder hidden states h together with the decoder's hidden state output h′ and memory cell output s′, and outputs the weight of the i-th hidden state, calculated as follows:

where v_de, W_de, U_de are parameters to be learned. The hidden states are combined using the weights β to obtain the semantic vector, expressed as:
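A hedged sketch of these two formulas, following the commonly published DA-RNN temporal attention and using the parameter names v_de, W_de, U_de, is given below. The score symbol l and the index ranges are assumed notation; h_i is the encoder hidden state at the i-th moment of the window.

```latex
\begin{aligned}
l_t^i &= v_{de}^{\top} \tanh\!\left( W_{de}\,[h'_{t-1};\, s'_{t-1}] + U_{de}\, h_i \right), \qquad i = T-W+1, \dots, T \\
\beta_t^i &= \frac{\exp(l_t^i)}{\sum_{j=T-W+1}^{T} \exp(l_t^j)} \\
c_t &= \sum_{i=T-W+1}^{T} \beta_t^i\, h_i
\end{aligned}
```

The semantic vector is thus a weighted sum of the encoder hidden states, with weights that sum to 1 over the window.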

A linear regression model then combines the semantic vector with the historical values of the target variable to give the decoder input, expressed as:

where the coefficients of the linear regression model are parameters to be learned;

The decoder likewise consists of LSTM units; the update rules for the forget gate f_t′, the input gate i_t′, and the output gate o_t′ are as follows:

where W′_f, W′_i, W′_o, W′_s, b′_f, b′_i, b′_o, b′_s are parameters to be learned;
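The decoder input and decoder update can be sketched as follows, again following the commonly published DA-RNN formulation. The names w̃ and b̃ for the linear regression coefficients and ỹ for the decoder input are assumptions, since the original symbols are not reproduced in this text; the gate equations mirror the encoder with primed parameters.

```latex
\begin{aligned}
\tilde{y}_{t-1} &= \tilde{w}^{\top} [y_{t-1};\, c_{t-1}] + \tilde{b} \\
f'_t &= \sigma\!\left( W'_f\,[h'_{t-1};\, \tilde{y}_{t-1}] + b'_f \right) \\
i'_t &= \sigma\!\left( W'_i\,[h'_{t-1};\, \tilde{y}_{t-1}] + b'_i \right) \\
o'_t &= \sigma\!\left( W'_o\,[h'_{t-1};\, \tilde{y}_{t-1}] + b'_o \right) \\
s'_t &= f'_t \odot s'_{t-1} + i'_t \odot \tanh\!\left( W'_s\,[h'_{t-1};\, \tilde{y}_{t-1}] + b'_s \right) \\
h'_t &= o'_t \odot \tanh(s'_t)
\end{aligned}
```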

Finally, the estimate of the target variable at the current moment T is obtained through a linear function, as follows:

where W_T and b_T are parameters to be learned.

Further, in step 5), the training-set residual sequence may be assumed to follow a normal distribution; based on the properties of the normal distribution, k takes the values [1.5, 2, 2.5, 3] in the multi-threshold setting.
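The threshold construction of steps 4) and 5) and the effect of the four k values can be sketched as follows. The function names and the toy data are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from math import erf, sqrt
from statistics import mean, pstdev

K_VALUES = (1.5, 2, 2.5, 3)


def residual_limits(actual, estimated, k_values=K_VALUES):
    """Return {k: (L_r(k), U_r(k))} from training-set residuals."""
    residuals = [a - e for a, e in zip(actual, estimated)]
    mu = mean(residuals)
    sigma = pstdev(residuals)  # population form: an assumption
    return {k: (mu - k * sigma, mu + k * sigma) for k in k_values}


def normal_exceedance(k):
    """P(|Z| > k) for a standard normal Z, via the error function."""
    return 1.0 - erf(k / sqrt(2.0))


# Made-up training values for illustration only.
actual = [50.0, 51.0, 49.0, 50.5, 49.5]
estimated = [50.0] * 5
limits = residual_limits(actual, estimated)
tail = {k: normal_exceedance(k) for k in K_VALUES}
```

Under the normality assumption, roughly 13%, 4.6%, 1.2%, and 0.3% of normal-state residuals fall outside the bands for k = 1.5, 2, 2.5, and 3, which illustrates why a larger k lowers the false alarm rate at the cost of more missed alarms.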

Further, in step 8), the discrimination criteria correspond to different anomaly characteristics: the consecutive over-limit criterion targets anomalies that appear as runs of consecutive abnormal values, and the percentage over-limit criterion targets anomalies that appear as large data fluctuations. The threshold parameter S_k of the consecutive over-limit criterion and the threshold parameter P_k% of the percentage over-limit criterion are set to S_k = P_k = C/k, where C is a constant; this form matches the embodiment values in Table 2, which correspond to C = 90.
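As a small worked check, the sketch below assumes the relation S_k = P_k = C/k. This relation and the value C = 90 are inferred from the embodiment's Table 2 values rather than stated explicitly in the text, and the function name is hypothetical.

```python
def criterion_thresholds(k_values, C):
    """Return {k: C / k}, used for both S_k (point count) and P_k (percent)."""
    return {k: C / k for k in k_values}


params = criterion_thresholds((1.5, 2, 2.5, 3), C=90)
```

With C = 90 this yields 60, 45, 36, and 30 for k = 1.5, 2, 2.5, and 3: a looser band (small k) is paired with a stricter count or percentage requirement, and a tighter band with a more lenient one.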

Further, in step 9), for the real-time data point d, each value of k outputs two 0-1 judgment results, so the final output is a 0-1 sequence whose length is twice the number of k values. An alarm is raised for the real-time data point when the number of 1s in the sequence is greater than the number of k values, that is, when more than half of the judgment results are 1; no alarm is raised when the number of 1s is less than or equal to the number of k values.
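The final decision rule amounts to a strict majority vote over the judgment sequence, which can be sketched as follows (the function name is illustrative):

```python
def final_alarm(flags):
    """flags: 0/1 sequence holding two judgment results per k value."""
    n_k = len(flags) // 2      # number of k values
    return sum(flags) > n_k    # alarm only if more than half of the results are 1


alarm = final_alarm([0, 1, 1, 1, 1, 1, 0, 1])  # six 1s out of eight
tie = final_alarm([1, 1, 1, 1, 0, 0, 0, 0])    # exactly half: no alarm
```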

Compared with the prior art, the present invention has the following innovative advantages and notable effects:

1) A standardized data preprocessing procedure is designed for the different types of noisy data present in the normal-operation data set, which ensures that the constructed training set accurately represents the normal operating state of the wind turbine and provides a data basis for accurate estimation of the target variable;

2) DA-RNN is selected as the variable estimation model; it comprehensively considers the influence of related variables and historical information, and determines their different weights in estimating the target variable through a two-stage attention mechanism, which ensures the reliability and accuracy of the model;

3) A robust fault early warning strategy is designed. The multi-threshold setting accommodates different requirements on the false alarm rate and the missed alarm rate; the two discrimination criteria address different anomaly characteristics for more comprehensive warning; and the final warning result is derived from the judgment result sequence, which takes the different strategy combinations into account and ensures the robustness and accuracy of the warning result;

4) The present invention is a temperature-based fault early warning method for key components; the procedure applies to every key component of a wind turbine that has a corresponding temperature measuring point, and is therefore extensible.

Description of the drawings

Figure 1 is a flow chart of the fault early warning method for key components of wind turbines according to the present invention;

Figure 2 shows the data of the selected target variable before preprocessing in an embodiment of the present invention;

Figure 3 shows the data of the selected target variable after preprocessing in an embodiment of the present invention;

Figure 4 is a schematic diagram of the encoder with the input attention mechanism in the DA-RNN model selected by the present invention;

Figure 5 is a schematic diagram of the decoder with the temporal attention mechanism in the DA-RNN model selected by the present invention;

Figure 6 shows the target variable estimation results of the variable estimation model in an embodiment of the present invention;

Figure 7 is a schematic diagram of the multi-threshold setting in an embodiment of the present invention;

Figure 8 shows the early warning results of an embodiment of the present invention.

Detailed description of the embodiments

To make the above objects, features, and advantages of the present invention easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Many specific details are set forth in the following description to provide a full understanding of the present invention. The present invention can, however, be implemented in ways other than those described here, and those skilled in the art can make similar generalizations without departing from its substance. The present invention is therefore not limited to the specific embodiments disclosed below.

As shown in Figure 1, the DA-RNN-based fault early warning method for key components of wind turbines proposed in this application comprises steps 1) to 9) set out above.

Further, in step 3), the encoder and the input attention mechanism operate as shown in Figure 4. The input is the related variables within the sliding window, where n is the number of related variables. The input attention mechanism takes as input the related variables X together with the encoder's hidden state output h and memory cell output s, and outputs the weight of the influence of the k-th related variable on the target variable, calculated as described above.

The reconstructed vector is the input of the encoder. The encoder consists of multiple LSTM units and outputs the hidden state h and the memory cell s. At moment t_w, the LSTM update is governed by the forget gate f_t, the input gate i_t, and the output gate o_t, with the update rules described above.

The decoder and the temporal attention mechanism operate as shown in Figure 5. The temporal attention mechanism takes as input the encoder hidden states h together with the decoder's hidden state output h′ and memory cell output s′, and outputs the weight of the i-th hidden state, calculated as described above.
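To make the model inputs concrete, the sliding-window samples fed to the encoder and decoder can be sketched as below. The function name and tuple layout are illustrative, and pairing each window with the W-1 preceding target values is an assumption; the text only states that historical target values are combined with the semantic vector.

```python
def sliding_windows(X, y, W=5):
    """Build (related-variable window, target history, current target) samples.

    X : list of rows, one row of n related variables per moment
    y : list of target values, aligned with X
    """
    samples = []
    for T in range(W - 1, len(y)):
        x_window = X[T - W + 1:T + 1]  # related variables at the W moments t_w
        y_history = y[T - W + 1:T]     # W-1 past target values (assumed layout)
        samples.append((x_window, y_history, y[T]))
    return samples


# Made-up data: 7 moments, 2 related variables.
X = [[i, 10 * i] for i in range(7)]
y = [float(i) for i in range(7)]
samples = sliding_windows(X, y, W=5)
```

Each sample pairs one window with the target value at its last moment T, which is the quantity the model estimates.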

An embodiment of this application is given below, and its implementation steps are described in detail with reference to Table 1, Table 2, and Figures 2-8.

This embodiment performs fault early warning for a wind turbine at a wind farm whose generator non-drive-end bearing assembly was damaged. The bearing damage occurred on 2017.04.17. Data collected by the turbine's SCADA system in 2016 and 2017 are used for fault early warning; the SCADA sampling interval is 5 min, and the data span 16 months, from 2016.01.01 00:00:00 to 2017.04.30 23:55:00. The temperature measured at the generator non-drive-end bearing temperature measuring point is the target variable, and all parameters that affect the target variable, such as other generator operating parameters and system parameters, are the related variables. The variables of the data set are listed in Table 1:

Table 1 Target variable and related variables of a wind turbine at a wind farm
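The size of this data set follows directly from the sampling interval and time range given above. The short sketch below computes the nominal counts, assuming no records are missing; the variable names are illustrative.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=5)                 # SCADA sampling interval
start = datetime(2016, 1, 1, 0, 0)
end = datetime(2017, 4, 30, 23, 55)

points_per_day = timedelta(days=1) // STEP  # samples in one day
nominal_records = (end - start) // STEP + 1  # both endpoints included
```

This gives 288 points per day, the denominator of the percentage over-limit criterion, and 139,968 nominal records over the 486 days covered (2016 is a leap year).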

本实施例中风电机组发电机非驱动端轴承组件故障预警方法的实施数据集即为上述风电机组16个月的运行数据，方法实施步骤具体如下：The implementation data set of the wind turbine generator non-driving end bearing component failure early warning method in this embodiment is the 16-month operating data of the above-mentioned wind turbine generator. The implementation steps of the method are as follows:

1)获取该风机SCADA系统中记录的正常运行状态下运行数据集，数据集包括目标变量发电机非驱动端轴承温度，以及所有相关变量，选取前12个月处于正常运行状态的数据，即2016.01.01 00:00:00至2016.12.31 23:55:00的数据构造初始训练集后4个月，即2017.01.01 00:00:00至2017.04.30 23:55:00的数据构造初始测试集 1) Obtain the operating data set under normal operating conditions recorded in the SCADA system of the wind turbine. The data set includes the target variable generator non-drive end bearing temperature and all related variables. Select the data in normal operating conditions in the previous 12 months, that is, 2016.01 .01 00:00:00 to 2016.12.31 23:55:00 to construct the initial training set Construct the initial test set using data from the last 4 months, that is, 2017.01.01 00:00:00 to 2017.04.30 23:55:00

2)对初始训练集中所有变量进行数据预处理，预处理步骤包括孤立异常点的剔除与插补，基于运维记录进行风机停机维护期间数据的剔除与插补，以及缺失值插补，在本实施例中，切入风速为2m/s，额定风速为14m/s，切出风速为25m/s，额定功率为1500kW，将测试集/>作为在线应用时的实时运行数据集，其中目标变量发电机非驱动端轴承温度预处理前的训练集加测试集数据如图2所示，缺失值与停机期间数据均以0值出现，预处理后的训练集加测试集数据如图3所示；2) For the initial training set All variables in the data are preprocessed. The preprocessing steps include the elimination and interpolation of isolated abnormal points, the elimination and interpolation of data during fan shutdown and maintenance based on operation and maintenance records, and the interpolation of missing values. In this embodiment, cut in The wind speed is 2m/s, the rated wind speed is 14m/s, the cut-out wind speed is 25m/s, and the rated power is 1500kW. The test set/> As a real-time operating data set for online applications, the training set plus test set data before preprocessing of the target variable generator non-drive end bearing temperature is shown in Figure 2. Missing values and data during shutdown all appear as 0 values. Preprocessing The final training set and test set data are shown in Figure 3;

3)选取DA-RNN模型作为变量估计模型，选取滑窗长度W＝5，滑窗内时刻为t_w,w＝T-4,T-3,...,T，通过设计输入注意力机制分配每时刻t_w相关变量对目标变量/>的影响权重/>重构相关变量/>作为编码器输入，编码器部分为多个LSTM单元，每个LSTM单元的输入为滑窗内一个时刻的重构相关变量，编码器部分输出为隐藏向量h；之后设计时间注意力机制分配滑窗内不同历史时刻的隐藏向量对当前时刻目标变量/>的影响权重/>得到语义向量c，使用线性回归模型整合语义向量与目标变量历史值作为解码器的输入，解码器部分为多个LSTM单元，输出为当前时刻上一时刻t_T-1的隐藏状态，再通过线性函数得到当前时刻目标变量的估计值/>通过训练集训练好模型之后输入测试集得到测试集的实时估计值，变量估计结果如图6所示，黑色圆圈为故障确认时刻对应目标变量的实际运行值；3) Select the DA-RNN model as the variable estimation model, select the sliding window length W = 5, and the time in the sliding window as t _w , w = T-4, T-3,..., T, and input the attention mechanism through design Assign relevant variables at each time t _w To the target variable/> The weight of influence/> Refactor related variables/> As the encoder input, the encoder part is composed of multiple LSTM units. The input of each LSTM unit is the reconstructed relevant variable at a moment in the sliding window, and the output of the encoder part is the hidden vector h; then the temporal attention mechanism is designed to allocate the sliding window Hidden vectors at different historical moments within the target variable at the current moment/> The weight of influence/> The semantic vector c is obtained, and a linear regression model is used to integrate the semantic vector and the historical value of the target variable as the input of the decoder. The decoder part is composed of multiple LSTM units, and the output is the hidden state of the previous moment t _T-1 at the current moment, and then through linear The function obtains the estimated value of the target variable at the current moment/> After training the model through the training set, input the test set to obtain the real-time estimated value of the test set. The variable estimation results are shown in Figure 6. The black circle is the actual operating value of the target variable corresponding to the fault confirmation time;

5)基于训练集残差序列进行多阈值设置，多阈值设置为训练集估计残差序列的均值μ_train加减k倍标准差σ_train，分别作为残差序列阈值上下限，其中上限U_r(k)＝μ_train+kσ_train，下限L_r(k)＝μ_train-kσ_train，其中k取值为[1.5,2,2.5,3]，图7中，黑色实线为多个阈值上限，黑色虚线为多个阈值下限；5) Multi-threshold setting is performed based on the training set residual sequence. The multi-threshold setting is the mean μ _train of the estimated residual sequence of the training set plus or minus k times the standard deviation σ _train , which are respectively used as the upper and lower limits of the residual sequence threshold, where the upper limit U _r ( k)=μ _train +kσ _train , the lower limit L _r (k)=μ _train -kσ _train , where the value of k is [1.5, 2, 2.5, 3]. In Figure 7, the black solid lines are the upper limits of multiple thresholds. The black dotted lines are the lower limits of multiple thresholds;

6)测试集作为在线应用阶段数据集，基于测试集中实时运行数据点d，通过离线阶段训练完成的DA-RNN模型，得到数据点d的实际测量值减去模型估计值的估计残差值r_d；6) The test set is used as a data set in the online application phase. Based on the real-time running of data point d in the test set, through the DA-RNN model trained in the offline phase, the estimated residual value r of the actual measured value of data point d minus the model estimated value is obtained. _d ;

8)进行多判别准则设置，多判别准则为连续超限判别准则结合百分比超限判别准则，在当前k值下设置连续超限判别准则阈值参数S(k)以及百分比超限判别准则阈值参数P(k)％，本实施例中参数设置为当满足条件count1(k)≥S(k)时该k-S(k)预警策略组合的判断结果记为1，不满足条件时该判断结果记0，满足条件count2(k)≥P(k)时该k-P(k)预警策略组合的判断结果记1，不满足条件时该判断结果记0；8) Set the multi-discrimination criterion. The multi-discrimination criterion is the continuous over-limit discrimination criterion combined with the percentage over-limit discrimination criterion. Under the current k value, set the threshold parameter S(k) of the continuous over-limit discrimination criterion and the threshold parameter P of the percentage over-limit discrimination criterion. (k)%, in this embodiment the parameters are set to When the condition count1(k)≥S(k) is met, the judgment result of the kS(k) early warning strategy combination is recorded as 1. When the condition is not met, the judgment result is recorded as 0. When the condition count2(k)≥P(k) is met, the judgment result is recorded as 1. The judgment result of this kP(k) early warning strategy combination is recorded as 1, and when the conditions are not met, the judgment result is recorded as 0;

9) Steps 7) and 8) are repeated for all other values of k. Each k yields two 0-1 judgment results, so a 0-1 sequence is finally output for data point d; in this embodiment the sequence has 8 elements. When the sequence contains more than four 1s, an alarm is given for data point d; when it contains four or fewer, no alarm is given. This is done for every data point in the test set. Figure 8 shows the final early warning result: black asterisks mark the data points that raised an alarm, and the black circle marks the data point at the fault confirmation time. The earliest alarm time is 2017.03.23 02:00:00, and the judgment results at that time are shown in the table below:

Table 2. Judgment results at the earliest alarm time

| Parameter selection | Percentage overrun | Continuous overrun |
| --- | --- | --- |
| k = 1.5, S = P = 60 | 0 | 1 |
| k = 2, S = P = 45 | 1 | 1 |
| k = 2.5, S = P = 36 | 1 | 1 |
| k = 3, S = P = 30 | 0 | 1 |

The judgment result sequence is [0, 1, 1, 1, 1, 1, 0, 1], which contains six 1s, so an alarm is given at this time, providing an early warning 24 days before the fault occurs.
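The arithmetic of this worked example can be checked with a short sketch. The names `TABLE_2` and `final_alarm` are hypothetical, and the ordering within each pair (percentage overrun first, continuous overrun second) is assumed from the column order of Table 2 and the quoted sequence.

```python
# Judgment results at the earliest alarm time, transcribed from Table 2.
# Each entry maps k -> (percentage-overrun result, continuous-overrun result).
TABLE_2 = {
    1.5: (0, 1),
    2.0: (1, 1),
    2.5: (1, 1),
    3.0: (0, 1),
}

def final_alarm(results_by_k):
    """Flatten the per-k result pairs and alarm when the count of 1s
    exceeds the number of k values (4 in this embodiment)."""
    sequence = [bit for k in sorted(results_by_k) for bit in results_by_k[k]]
    return sequence, sum(sequence) > len(results_by_k)

sequence, alarm = final_alarm(TABLE_2)
print(sequence, sum(sequence), alarm)
```

With the Table 2 values, the flattened sequence has six 1s out of eight, which is above the threshold of four, so the point is flagged.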

The fault early warning method for key wind turbine components of the present invention mainly comprises selection of the target variable and related variables, data preprocessing, training of the variable estimation model, acquisition of real-time operating residuals, and design of a robust early warning strategy. Figure 1 is the detailed flow of the key-component fault early warning of the present invention; Figures 2 and 3 show the selected target variable of the embodiment before and after preprocessing; Figure 4 is a schematic of the encoder of the selected DA-RNN model with the input attention mechanism; Figure 5 is a schematic of the decoder of the selected DA-RNN model with the temporal attention mechanism; Figure 6 shows the target variable estimation results of the variable estimation model in the embodiment; Figure 7 is a schematic of the multi-threshold setting in the embodiment; Figure 8 shows the early warning results of the embodiment. The results show that the present invention raises an accurate alarm before the fault occurs and that the results are effective and reliable.

The above embodiment is only an example of the present invention. Although the best embodiment and the drawings are disclosed for illustrative purposes, those skilled in the art will understand that various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to what is disclosed in the best embodiment and the drawings.

Claims

1. A DA-RNN-based fault early warning method for key components of a wind turbine, characterized by comprising the following steps:

1) Selecting the wind turbine to be monitored for fault early warning, acquiring N records of operating data stored in its SCADA (supervisory control and data acquisition) system while the turbine is in a normal operating state, selecting the temperature variable measured at the temperature measuring point of the component to be monitored as the target variable y, selecting the variables related to that component temperature as the related variables X, and constructing an initial training set;

2) In the offline training stage, designing a data preprocessing flow for the initial training set, the preprocessing comprising: removing and interpolating isolated abnormal points; removing and interpolating data recorded during turbine shutdown and maintenance, based on operation and maintenance records; and interpolating missing values; the preprocessed training set [X_train, y_train] is used as the input of the variable estimation model for model training;

3) Selecting the recurrent neural network DA-RNN model, which is based on a dual attention mechanism, as the variable estimation model; selecting a sliding window length W and denoting the times within the sliding window as t_w, w = T−W+1, T−W+2, …, T, where T is the current time; designing an input attention mechanism that assigns, at each time t_w, the influence weight of each related variable on the target variable, the related variables reconstructed with these weights serving as the encoder input; the encoder consists of a plurality of LSTM units, the input of each LSTM unit being the reconstructed related variables at one time within the sliding window, and the encoder output being the hidden vector h; then designing a temporal attention mechanism that assigns the influence weights of the hidden vectors at the different historical times within the sliding window on the target variable at the current time, yielding a semantic vector c; integrating the semantic vector and the historical values of the target variable with a linear regression model to form the decoder input; the decoder consists of a plurality of LSTM units, and its output at time t_{T−1}, the time before the current time, is passed through a linear function to obtain the estimated value of the target variable at the current time;

4) Subtracting the model output at the corresponding time (the estimated value of the target variable) from the actual value of the target variable in the training set to obtain the training-set estimation residual sequence, and computing the mean μ_train and the standard deviation σ_train of that sequence;

5) Setting multiple thresholds based on the training-set residual sequence, each threshold pair being the mean μ_train of the training-set estimation residual sequence plus or minus k times the standard deviation σ_train, used respectively as the upper and lower limits of the residual sequence, where the upper limit is U_r(k) = μ_train + kσ_train and the lower limit is L_r(k) = μ_train − kσ_train;

6) In the online application stage, for a real-time operating data point d, using the DA-RNN model trained in the offline stage to obtain the estimation residual r_d, namely the actual measured value of data point d minus the model estimate;

7) Selecting a value of k and determining the upper and lower thresholds; if r_d exceeds the thresholds for the current k, that is, it is greater than the upper limit U_r(k) or less than the lower limit L_r(k), calculating count1(k), the number of data points that consecutively exceed the threshold before data point d, and count2(k)%, the percentage of data points that exceed the threshold in the time range of the day before data point d; if r_d lies between the upper limit U_r(k) and the lower limit L_r(k), outputting 0 for both judgment results;

8) Setting multiple discrimination criteria, namely a continuous-overrun criterion combined with a percentage-overrun criterion; setting a threshold parameter S(k) for the continuous-overrun criterion and a threshold parameter P(k)% for the percentage-overrun criterion; recording the judgment result of the k-S(k) early warning strategy combination as 1 when the condition count1(k) ≥ S(k) is met and as 0 when it is not; recording the judgment result of the k-P(k) early warning strategy combination as 1 when the condition count2(k) ≥ P(k) is met and as 0 when it is not;

9) Repeating steps 7) and 8) for all other values of k to obtain a judgment result sequence, and deciding from that sequence whether a final alarm is given for the real-time data point d.
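Steps 5) to 9) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the look-back range is passed as a sample count (`window`) because the sampling rate is not given here, count1 is taken to exclude data point d itself, and the final rule (alarm when the 1s outnumber the k values) is taken from the embodiment and claim 6.

```python
import numpy as np

def judge_point(residuals, idx, window, mu, sigma, k, s_k, p_k):
    """Return (continuous_result, percentage_result) for the residual at idx.

    residuals : 1-D sequence of online residuals r_d, oldest first
    window    : number of samples in the look-back range (assumed parameter)
    s_k, p_k  : thresholds S(k) (a count) and P(k) (a percentage)
    """
    r = np.asarray(residuals, dtype=float)
    lower, upper = mu - k * sigma, mu + k * sigma
    exceeded = (r > upper) | (r < lower)
    if not exceeded[idx]:
        return 0, 0  # r_d lies inside the band: both results are 0
    # count1: run of consecutive exceedances immediately before idx
    count1 = 0
    j = idx - 1
    while j >= 0 and exceeded[j]:
        count1 += 1
        j -= 1
    # count2: percentage of exceedances in the look-back window before idx
    past = exceeded[max(0, idx - window):idx]
    count2 = 100.0 * past.mean() if past.size else 0.0
    return int(count1 >= s_k), int(count2 >= p_k)

def alarm_for_point(residuals, idx, window, mu, sigma, params):
    """params maps k -> (S(k), P(k)); alarm when the 1s outnumber the k values."""
    bits = []
    for k, (s_k, p_k) in sorted(params.items()):
        bits.extend(judge_point(residuals, idx, window, mu, sigma, k, s_k, p_k))
    return bits, sum(bits) > len(params)

# Toy data: four consecutive large residuals after two normal ones.
toy = [0, 0, 5, 5, 5, 5]
print(alarm_for_point(toy, 5, 4, 0.0, 1.0, {2.0: (3, 50), 3.0: (3, 50)}))
```

Combining several (k, criterion) pairs by vote means a single borderline exceedance at one threshold does not raise an alarm on its own.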

2. The DA-RNN-based fault early warning method for key components of a wind turbine according to claim 1, characterized in that in step 2) the preprocessing flow of offline training comprises the following steps:

a) Isolated abnormal points are usually recording errors caused by sensor anomalies and are identified from the operating mechanism, using the following conditions: for temperature variables, a value greater than 150 degrees or less than 0 degrees; for the wind speed variable, a value greater than the cut-out wind speed of the unit or less than the cut-in wind speed; for the power variable, a value greater than the rated power of the unit or a negative value; data meeting these conditions are judged to be isolated abnormal points and are removed;

b) During operation, maintenance and overhaul periods the wind turbine is shut down, and the data recorded by the SCADA system are usually 0 or a system default value, which cannot represent the normal operating state of the turbine; data from shutdown, maintenance and overhaul periods are therefore removed;

c) To preserve the time continuity of the data, the removed data and the values missing from the SCADA system are interpolated by mean interpolation, that is, the mean of the data in the 1 hour before the interpolation position of the variable is taken as the interpolated value at the current time.
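Steps a) and c) can be illustrated for a temperature variable with the sketch below. The function names are hypothetical, `samples_per_hour` is an assumed parameter (the SCADA sampling interval is not stated here), and letting already-filled values count toward later fills is one possible reading of the claim.

```python
import numpy as np

def mark_temperature_outliers(values, low=0.0, high=150.0):
    """Replace temperatures below `low` or above `high` with NaN (limits from step a)."""
    v = np.asarray(values, dtype=float).copy()
    v[(v < low) | (v > high)] = np.nan
    return v

def fill_with_trailing_mean(values, samples_per_hour):
    """Fill each NaN with the mean of the valid samples in the preceding hour.

    Values filled earlier count as data for later gaps; a gap with no valid
    sample in the preceding hour is left as NaN.
    """
    v = np.asarray(values, dtype=float).copy()
    for i in np.flatnonzero(np.isnan(v)):
        window = v[max(0, i - samples_per_hour):i]
        valid = window[~np.isnan(window)]
        if valid.size:
            v[i] = valid.mean()
    return v

temps = [60, 62, 400, 64, -5, 66]        # 400 and -5 are isolated abnormal points
cleaned = fill_with_trailing_mean(mark_temperature_outliers(temps), samples_per_hour=2)
print(cleaned)
```

Using only past samples for the fill keeps the procedure usable online, since no future values are needed.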

3. The DA-RNN-based fault early warning method for key components of a wind turbine according to claim 1, characterized in that in step 3) the model input is the related variables within a sliding window, where n is the number of related variables; the input attention mechanism takes as input the related variables X together with the hidden state output h and the memory cell output s of the encoder, and outputs the influence weight of the k-th related variable on the target variable, calculated as follows:

where v_en, W_en and U_en are parameters to be learned; at time t_w the related variables are reconstructed with the influence weights, and the reconstructed vector at time t_w is expressed as:

the reconstructed vector is the input of the encoder, which is composed of a plurality of LSTM units and outputs the hidden state h and the memory cell s; at time t_w the LSTM update is determined by the forget gate f_t, the input gate i_t and the output gate o_t, with the following update rule:

where W_f, W_i, W_o, W_s, b_f, b_i, b_o, b_s are parameters to be learned, σ is the sigmoid function, and the product operator in the update rule denotes multiplication of corresponding elements;

the temporal attention mechanism takes as input the encoder hidden states h together with the hidden state output h′ and the memory cell output s′ of the decoder, and outputs the weight of the i-th hidden state, calculated as follows:

where v_de, W_de and U_de are parameters to be learned; the hidden states are reconstructed with the weights β to obtain the semantic vector, expressed as follows:

a linear regression model is then used to integrate the semantic vector and the historical values of the target variable to obtain the decoder input, expressed as:

where the weight and bias of the linear regression are parameters to be learned;

the decoder is also composed of LSTM units; the update rule of its forget gate f′_t, input gate i′_t and output gate o′_t is as follows:

where W′_f, W′_i, W′_o, W′_s, b′_f, b′_i, b′_o, b′_s are parameters to be learned;

finally, the estimated value of the target variable at the current time T is obtained through a linear function, as follows:

where W_T and b_T are parameters to be learned.
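For orientation, the attention equations of the commonly published DA-RNN formulation are written below using the parameter names of this claim (v_en, W_en, U_en, v_de, W_de, U_de, h, s, h′, s′). The symbols e, α, l, x̃, ỹ, w̃, b̃, the time indices, and the use of x^k for the k-th related variable over the whole window are assumptions; the claim's own equations may differ in detail.

```latex
% Input attention at window time t (encoder hidden state h, memory cell s):
e_t^k = v_{en}^{\top} \tanh\!\left(W_{en}\,[h_{t-1};\, s_{t-1}] + U_{en}\, x^k\right), \qquad
\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{j=1}^{n} \exp(e_t^j)}

% Reconstructed related variables fed to the encoder:
\tilde{x}_t = \left(\alpha_t^1 x_t^1,\; \alpha_t^2 x_t^2,\; \dots,\; \alpha_t^n x_t^n\right)^{\top}

% Temporal attention over encoder hidden states h_i (decoder hidden state h', memory cell s'):
l_t^i = v_{de}^{\top} \tanh\!\left(W_{de}\,[h'_{t-1};\, s'_{t-1}] + U_{de}\, h_i\right), \qquad
\beta_t^i = \frac{\exp(l_t^i)}{\sum_{j} \exp(l_t^j)}, \qquad
c_t = \sum_{i} \beta_t^i\, h_i

% Decoder input: linear combination of the previous target value and the semantic vector:
\tilde{y}_{t-1} = \tilde{w}^{\top} [y_{t-1};\, c_{t-1}] + \tilde{b}
```

Both attention stages use a softmax, so the weights over the n related variables, and over the hidden states in the window, are non-negative and sum to 1.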

4. The DA-RNN-based fault early warning method for key components of a wind turbine according to claim 1, characterized in that in step 5) the training-set residual sequence is assumed to follow a normal distribution, and k in the multi-threshold setting takes the values [1.5, 2, 2.5, 3] based on the properties of the normal distribution.
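Under the normality assumption of this claim, each k corresponds to a known share of residuals expected to fall inside the band μ_train ± kσ_train. The sketch below computes those shares with the standard normal CDF; the function name `two_sided_coverage` is hypothetical.

```python
import math

def two_sided_coverage(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2.0))

for k in (1.5, 2.0, 2.5, 3.0):
    print(f"k={k}: {two_sided_coverage(k):.4f}")
# k=1.5: 0.8664
# k=2.0: 0.9545
# k=2.5: 0.9876
# k=3.0: 0.9973
```

Smaller k values flag more points from healthy operation, while larger k values react only to strong deviations.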

5. The DA-RNN-based fault early warning method for key components of a wind turbine according to claim 1, characterized in that in step 8) the multiple discrimination criteria correspond to different abnormal characteristics: the continuous-overrun criterion corresponds to abnormal characteristics with consecutive abnormal values, and the percentage-overrun criterion corresponds to abnormal conditions with large data fluctuation; the threshold parameter S_k of the continuous-overrun criterion and the threshold parameter P_k% of the percentage-overrun criterion are set by an expression in which C is a constant.
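The expression for S_k and P_k does not appear in this text. One reading consistent with the embodiment's Table 2 values is an inverse proportion, S_k = P_k = C / k with C = 90. This is an inference from those four values, not the claim's stated formula, and the sketch below only checks that consistency. The name `TABLE_2_PARAMS` is hypothetical.

```python
# (k, S = P) pairs as listed in Table 2 of the embodiment.
TABLE_2_PARAMS = {1.5: 60, 2.0: 45, 2.5: 36, 3.0: 30}

# If S_k = P_k = C / k held, k * S_k would be the same constant C for every row.
products = {k: k * value for k, value in TABLE_2_PARAMS.items()}
print(products)
```

Under this reading, a wider threshold band (larger k) is paired with a shorter required run and a lower required percentage, because exceedances of a wide band are rarer and more significant.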

6. The DA-RNN-based fault early warning method for key components of a wind turbine according to claim 1, characterized in that in step 9), for a real-time data point d, two 0-1 judgment results are output for each k value and a 0-1 sequence of dimension 2k is finally output, where k here denotes the number of k values used; when the number of 1s in the sequence is greater than k, an alarm is given for the real-time data point, and when the number of 1s is less than or equal to k, no alarm is given.