CN113127237A - Main fault identification method and system of wind generating set

Info

Publication number: CN113127237A
Application number: CN201911376358.3A
Authority: CN
Inventors: 李长俊; 张言军; 薛袭峰; 贺海涛; 刘芳
Original assignee: Jiangsu Jinfeng Software Technology Co ltd; Qinghai Green Energy Data Co ltd; Beijing Goldwind Smart Energy Service Co Ltd
Current assignee: Jiangsu Jinfeng Software Technology Co ltd; Qinghai Green Energy Data Co ltd; Beijing Goldwind Smart Energy Service Co Ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2021-07-16

Abstract

The disclosure provides a method and a system for identifying a main fault of a wind generating set. The main fault identification method comprises the following steps: acquiring a data sequence related to a main fault in a SCADA (supervisory control and data acquisition) system, wherein the data sequence comprises at least one piece of data, and the type of the at least one piece of data comprises at least one of dynamic fault data, remote signaling data and main fault displacement data; formatting the data sequence related to the main fault to obtain an intermediate data sequence, wherein each piece of intermediate data in the intermediate data sequence has a corresponding time identifier, a data type mark and a doubtful degree; obtaining at least one piece of result data based on the intermediate data sequence; and performing main fault analysis on the at least one piece of result data to identify the main fault, wherein the data type mark comprises a start data mark, an end data mark and an invalid data mark, and the doubtful degree represents the occurrence probability of the main fault. According to the present disclosure, main fault identification can be performed accurately.

Description

Main fault identification method and system of wind generating set

Technical Field

The disclosure relates to a wind power generation technology, in particular to a method and a system for identifying a main fault of a wind generating set.

Background

Wind power generation is a clean way of obtaining energy and is being applied ever more widely. During the operation of a wind generating set (which may be referred to as a wind turbine below), various faults may occur. These faults are sent by a main controller of the wind turbine (for example, a programmable logic controller, which may be referred to as a PLC) to the supervisory control and data acquisition (SCADA) system of the wind farm, and the SCADA system notifies emergency maintenance personnel to handle the fault. The main fault (also called the first fault) of the wind turbine is usually used as the identification of the fault occurrence and as the basis for emergency repair (e.g., generating a work order, dispatching a worker). The occurrence of the main fault may be accompanied by the change and generation of associated data (e.g., monitoring data), including real-time data (may be abbreviated as RData), main fault displacement data (may be abbreviated as CData), and dynamic fault data (may be abbreviated as FData).

The current main fault identification method takes the first received dynamic fault as the main fault. In actual operation, however, the main fault may be identified inaccurately (for example, its start and end cannot be determined accurately) or may not be identified at all. Causes include processing logic errors in the PLC of the wind turbine, network instability (depending on the access method, the network may include a public communication network and a wireless network), data acquisition errors, differences in the SCADA processing sequence, backlog of uploaded data, restarts of the programs on each node, and unqualified uploaded data. If the main fault cannot be identified or is identified inaccurately, statistics such as the number of failures are affected, and economic loss and equipment damage may result because emergency repair cannot be carried out in time according to the main fault, which in turn affects the performance assessment and stable operation of the wind farm.

Disclosure of Invention

The disclosure provides a method and a system for identifying a main fault of a wind generating set, which aim to solve the problem that existing main fault identification methods identify the main fault inaccurately or fail to identify it at all.

According to an exemplary embodiment of the present disclosure, a method for identifying a primary fault of a wind turbine generator system is provided, wherein the method for identifying the primary fault comprises: acquiring a data sequence related to a main fault in an SCADA system, wherein the data sequence comprises at least one piece of data, and the type of the at least one piece of data comprises at least one of dynamic fault data, remote signaling data and main fault displacement data; formatting a data sequence related to a main fault to obtain an intermediate data sequence, wherein each intermediate data in the intermediate data sequence has a corresponding time identifier, a data type mark and a doubtful degree; obtaining at least one piece of result data based on the intermediate data sequence; and performing primary fault analysis on at least one piece of result data to identify a primary fault, wherein the data type marks comprise a start data mark, an end data mark and an invalid data mark, and the doubt degree represents the occurrence probability of the primary fault.

Optionally, the step of performing a primary fault analysis on at least one piece of result data includes: and performing primary fault analysis on the at least one piece of result data by using a primary fault model to identify a primary fault, wherein the primary fault model defines a primary fault type, and a time window, a primary fault identification rule and an equipment type which correspond to the primary fault type.

Optionally, the step of performing a primary fault analysis on at least one piece of result data by using the primary fault model includes: and determining a corresponding main fault type based on the equipment type, and performing main fault identification analysis according to a time window corresponding to the main fault type and a main fault identification rule.
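As an illustration of how such a model might be organized, the following minimal sketch keys a small model library by equipment type and looks up the time window and rule parameters for a device. The class name `MainFaultModel`, the field names, the use of a single threshold as the "identification rule", and all values are assumptions made for this example; the disclosure does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MainFaultModel:
    """Hypothetical container for one main fault model entry."""
    fault_type: str         # main fault type defined by the model
    device_type: str        # equipment type the model applies to
    time_window_s: float    # time window for identification, in seconds
    doubt_threshold: float  # assumed stand-in for the identification rule


# Illustrative model library keyed by equipment type (values are made up).
MODEL_LIBRARY: Dict[str, MainFaultModel] = {
    "wind_turbine": MainFaultModel(
        "turbine_main_fault", "wind_turbine", 5.0, 1.0),
    "box_transformer": MainFaultModel(
        "transformer_main_fault", "box_transformer", 10.0, 1.5),
}


def select_model(device_type: str) -> Optional[MainFaultModel]:
    """Return the model for a device type, or None if none is defined."""
    return MODEL_LIBRARY.get(device_type)
```

Keying the library by equipment type mirrors the order described above: the equipment type selects the main fault type, which in turn supplies the time window and the rule.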

Optionally, the step of acquiring a data sequence related to a master failure in the SCADA system includes: the raw data of the SCADA system is preprocessed, and a data sequence related to the main fault is obtained.

Optionally, the preprocessing comprises filtering and converting. Filtering the raw data of the SCADA system comprises: searching for a data filtering rule corresponding to the type of the raw data; and filtering the raw data according to the found data filtering rule to retain the main fault data, remote signaling data and main fault displacement data related to the main fault. Converting the raw data of the SCADA system comprises: searching for a data conversion rule corresponding to the main fault data, the remote signaling data and the main fault displacement data; and converting the main fault data, the remote signaling data and the main fault displacement data into corresponding code data according to the found data conversion rule, so as to obtain the data sequence related to the main fault.

Optionally, the step of formatting the data sequence related to the primary failure to obtain an intermediate data sequence includes: and sequencing and uniformly formatting all the converted code data according to the time sequence, and assigning the doubtful degree of the formatted data according to the type of the code data.
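The formatting step can be sketched roughly as follows: code data is sorted by its time identifier, mapped to one uniform record type, and given a doubtful degree that depends only on its data type. The record layout, the dictionary keys, and the numeric doubtful degrees are assumptions for illustration; the disclosure does not give concrete values.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed doubtful degrees per code-data type (illustrative values only).
DOUBT_BY_TYPE: Dict[str, float] = {
    "dynamic_fault": 1.0,
    "displacement": 0.6,
    "remote_signaling": 0.3,
}


@dataclass
class IntermediateData:
    device_id: str
    data_type: str
    timestamp: float  # time identifier
    mark: str         # "start", "end" or "invalid"
    doubt: float      # doubtful degree assigned from the data type


def format_sequence(code_data: List[dict]) -> List[IntermediateData]:
    """Sort code data by time and convert it to uniform intermediate data."""
    ordered = sorted(code_data, key=lambda d: d["timestamp"])
    return [
        IntermediateData(
            device_id=d["device_id"],
            data_type=d["data_type"],
            timestamp=d["timestamp"],
            mark=d.get("mark", "invalid"),
            # Unknown types get a doubtful degree of 0.0 in this sketch.
            doubt=DOUBT_BY_TYPE.get(d["data_type"], 0.0),
        )
        for d in ordered
    ]
```

Sorting by the time identifier rather than by arrival order is what lets later steps tolerate delayed or reordered uploads.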

Optionally, the step of obtaining at least one piece of result data based on the intermediate data sequence includes: determining intermediate data with the earliest time identifier in the intermediate data sequence according to the time identifier, and screening out all intermediate data with the earliest time identifier and all intermediate data with the same type as the intermediate data; setting a data type mark corresponding to the screened intermediate data on a time axis; and in time sequence on a time axis, taking the first start data mark after the earliest start data mark or any one end data mark as a start mark of a piece of result data, and taking the first end data mark after the start mark as an end mark of the piece of result data.
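One way to read the pairing rule above is as a small state machine over the time axis: a start mark opens a piece of result data only if no piece is currently open, and the first end mark after it closes that piece. The sketch below implements this reading; the tuple layout and the handling of a trailing start mark without an end (reported with `None`) are assumptions, and the preceding screening by data type is omitted.

```python
from typing import List, Optional, Tuple

Mark = Tuple[float, str]  # (time identifier, "start" | "end" | "invalid")


def pair_marks(marks: List[Mark]) -> List[Tuple[float, Optional[float]]]:
    """Pair start and end marks into (start_time, end_time) result spans."""
    results: List[Tuple[float, Optional[float]]] = []
    open_start: Optional[float] = None
    for ts, kind in sorted(marks, key=lambda m: m[0]):
        if kind == "start" and open_start is None:
            # Earliest start, or first start after an end mark.
            open_start = ts
        elif kind == "end" and open_start is not None:
            # First end mark after the current start mark.
            results.append((open_start, ts))
            open_start = None
        # Repeated starts, unmatched ends and invalid marks are skipped.
    if open_start is not None:
        results.append((open_start, None))  # fault has not ended yet
    return results
```

For example, the marks start(1), start(2), end(3), end(4), start(6), end(7) yield two pieces of result data, spanning 1 to 3 and 6 to 7.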

Optionally, the step of identifying the primary failure includes: determining a corresponding time window based on the device type corresponding to the intermediate data having the earliest time identification; for any piece of result data, on a time axis, using the time represented by the earliest time mark as the starting time of a time window; determining all starting data markers present within the time window and calculating a sum of doubts corresponding to the determined starting data markers; determining all end data markers present within the time window and calculating a sum of doubts corresponding to the determined end data markers; regarding result data with the sum of the doubtful degrees corresponding to the determined starting data marks larger than a preset doubtful degree threshold value, taking intermediate data corresponding to the starting marks of the result data as main fault starting data; and regarding result data with the sum of the doubtful degrees corresponding to the determined end data mark larger than a preset doubtful degree threshold value, taking intermediate data corresponding to the end mark of the result data as main fault end data.
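The windowed summation described above can be sketched as follows. The window opens at the earliest time identifier, the doubtful degrees of start marks and end marks inside it are summed separately, and each sum is compared with a threshold. The tuple layout, the inclusive window bounds, and the function names are assumptions for illustration.

```python
from typing import List, Tuple

Record = Tuple[float, str, float]  # (time identifier, mark, doubtful degree)


def doubt_sums(records: List[Record], window_start: float,
               window_s: float) -> Tuple[float, float]:
    """Sum doubtful degrees of start and end marks inside the window."""
    window_end = window_start + window_s
    start_sum = 0.0
    end_sum = 0.0
    for ts, mark, doubt in records:
        if window_start <= ts <= window_end:  # inclusive bounds (assumed)
            if mark == "start":
                start_sum += doubt
            elif mark == "end":
                end_sum += doubt
    return start_sum, end_sum


def identify(records: List[Record], window_s: float,
             threshold: float) -> Tuple[bool, bool]:
    """Return (start_confirmed, end_confirmed) for one piece of result data."""
    if not records:
        return False, False
    window_start = min(ts for ts, _, _ in records)
    start_sum, end_sum = doubt_sums(records, window_start, window_s)
    return start_sum > threshold, end_sum > threshold
```

Because several weaker pieces of evidence can add up past the threshold, a main fault can still be confirmed in this scheme when the dynamic fault record itself is late or missing.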

According to another exemplary embodiment of the present disclosure, a main fault identification system of a wind generating set is provided, wherein the main fault identification system comprises: a data cleaning unit, used for acquiring a data sequence related to a main fault in the SCADA system, wherein the data sequence comprises at least one piece of data, and the type of the at least one piece of data comprises any one of dynamic fault data, remote signaling data and main fault displacement data; a data formatting unit, used for formatting the data sequence related to the main fault to obtain an intermediate data sequence, wherein each piece of intermediate data in the intermediate data sequence has a corresponding time identifier, a data type mark and a doubtful degree; a main fault identification unit, used for obtaining at least one piece of result data based on the intermediate data sequence and performing main fault analysis on the at least one piece of result data to identify the main fault; message middleware, used for caching the data sequence acquired by the data cleaning unit and forwarding it to the data formatting unit; a memory or cache, used for storing, in queue form, the intermediate data sequence obtained by the data formatting unit and the at least one piece of result data obtained by the main fault identification unit; and a database, used for backing up the data in the memory or cache, wherein the data type mark comprises a start data mark, an end data mark and an invalid data mark, and the doubtful degree represents the occurrence probability of the main fault.

Optionally, the master fault identifying unit performs master fault analysis on the at least one piece of result data by using a master fault model to identify a master fault, where the master fault model defines a master fault type, and a time window, a master fault identification rule, and a device type corresponding to the master fault type.

Optionally, the master fault identifying unit determines a corresponding master fault type based on the device type, and performs master fault identification analysis according to a time window corresponding to the master fault type and a master fault identification rule.

Optionally, the data obtaining unit preprocesses raw data of the SCADA system, and obtains a data sequence related to the main fault.

Optionally, the pre-processing comprises filtering and converting; wherein, filtering the raw data of the SCADA system comprises: searching a data filtering rule corresponding to the type of the original data; according to the searched data filtering rule, filtering the original data to reserve main fault data, remote signaling data and main fault displacement data related to the main fault; converting raw data of the SCADA system includes: searching a data conversion rule corresponding to the main fault data, the remote signaling data and the main fault deflection data; and converting the main fault data, the remote signaling data and the main fault deflection data into corresponding code data according to the searched data conversion rule so as to obtain a data sequence related to the main fault.

Optionally, the data formatting unit sorts all the converted code data in time order, formats them uniformly, and assigns a doubtful degree to the formatted data according to the type of the code data.

Optionally, the result data acquiring unit determines the intermediate data with the earliest time identifier in the intermediate data sequence according to the time identifier, and screens out all intermediate data including the intermediate data with the earliest time identifier and all intermediate data consistent with the type of the intermediate data; setting a data type mark corresponding to the screened intermediate data on a time axis; in time sequence on a time axis, the first start data mark after the earliest start data mark or any one end data mark is used as a start mark of one piece of result data, and the first end data mark after the start mark is used as an end mark of one piece of result data.

Optionally, the master failure identifying unit determines a corresponding time window based on the device type corresponding to the intermediate data having the earliest time identifier; regarding any piece of result data, on a time axis, taking the time represented by the earliest time identifier as the starting time of a time window; determining all starting data markers present within the time window and calculating a sum of doubts corresponding to the determined starting data markers; determining all end data markers present within the time window and calculating a sum of doubts corresponding to the determined end data markers; for result data with the sum of the doubtful degrees corresponding to the determined starting data marks larger than a preset doubtful degree threshold value, taking intermediate data corresponding to the starting marks of the result data as main fault starting data; and regarding result data with the sum of the doubtful degrees corresponding to the determined end data mark larger than a preset doubtful degree threshold value, taking intermediate data corresponding to the end mark of the result data as main fault end data.

According to another exemplary embodiment of the present disclosure, there is provided a primary fault identification system of a wind turbine generator set, the primary fault identification system including:

the SCADA system is used for acquiring original data from a PLC (programmable logic controller) of the wind generating set, and the original data comprises at least one of dynamic fault data, remote signaling data and main fault displacement data; a primary fault identification system for identifying a primary fault from raw data of the SCADA system; and the operation maintenance and repair system is used for sending out an emergency repair notice according to the identified main fault of the main fault identification system.

According to another exemplary embodiment of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the primary failure identification method as above.

In an exemplary embodiment of the present disclosure, a data sequence related to a main fault is formatted into an intermediate data sequence having a time identifier, a data type mark, and a doubtful degree, and main fault identification is performed based on result data obtained by analyzing the intermediate data sequence. The method takes all data sequences related to the main fault as the basis of main fault identification, and identifies the main fault based on corresponding identification rules according to the assigned doubtful degrees. The method can thereby address the problem that the existing method cannot accurately identify the main fault when data transmission is unstable.

Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.

Drawings

The above and other objects and features of exemplary embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate exemplary embodiments, wherein:

FIG. 1 shows a flowchart of a main fault identification method of a wind generating set according to an exemplary embodiment of the present disclosure;

FIG. 2 is a schematic diagram showing the connection relationship between the wind turbine PLC, the SCADA system, the primary fault identification system, and the operation and maintenance system;

FIG. 3 shows a schematic structural diagram of a primary fault identification system according to an exemplary embodiment of the present disclosure;

FIG. 4 shows a schematic structural diagram of a primary fault model according to an exemplary embodiment of the present disclosure;

FIG. 5 illustrates intermediate data and result data on a time axis according to an exemplary embodiment of the present disclosure;

FIG. 6 shows a schematic diagram of a time window according to an exemplary embodiment of the present disclosure;

FIG. 7 shows a flowchart of a primary fault identification method according to yet another example embodiment of the present disclosure;

FIG. 8 illustrates a flowchart of a method of determining an operation required to be performed when setting data on a time axis according to an exemplary embodiment of the present disclosure;

FIG. 9 shows a time axis corresponding to FIG. 8;

FIG. 10 illustrates a flowchart of an operation of obtaining a last piece of data according to an exemplary embodiment of the present disclosure;

FIG. 11 illustrates a flowchart of an operation of obtaining a next piece of data according to an exemplary embodiment of the present disclosure;

FIG. 12 illustrates a flowchart of an operation of obtaining end data according to an exemplary embodiment of the present disclosure;

FIG. 13 illustrates a flowchart of an operation of calculating a doubtful degree according to an exemplary embodiment of the present disclosure.

Detailed Description

Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present disclosure by referring to the figures.

In an exemplary embodiment of the present disclosure, a primary fault may represent the source fault that is uploaded when a fault is generated, typically the first generated dynamic fault (in the order in which the dynamic faults occur, also referred to as time-scale order). In the normal data generation order, the primary fault is typically generated first, followed by the generation or change of associated data. Inaccurate primary fault identification may result from a variety of causes. For example, if data is delayed in transmission or lost, a primary fault identification error may result. As another example, if only a single type of data is used for primary fault identification (e.g., only dynamic fault data), part of the information may be lost, so that the primary fault cannot be identified accurately or cannot be identified at all.

As described above, the generation of the primary fault is accompanied by the change and generation of associated data, including real-time data, primary fault displacement data, and dynamic fault data. Primary fault identification in the present disclosure may be performed using one or more of real-time data, primary fault displacement data, and dynamic fault data.

The displacement data may be understood as exception data, and the main fault displacement data may be understood as exception data associated with a dynamic fault or with a sub-fault caused by a dynamic fault. The displacement data differs between wind generating set models of different manufacturers. Real-time data, main fault displacement data and dynamic fault data may be obtained from the SCADA system. The real-time data includes remote signaling data and telemetry data, of which the data associated with the main fault is primarily remote signaling data. The data related to main fault identification are, in descending order of priority: dynamic fault data, main fault displacement data and real-time data. A complete fault process has only one piece of main fault data, which has a beginning and an end, but may have multiple pieces of real-time data, main fault displacement data and dynamic fault data. Since these three types of data are associated with the main fault data, the main fault identification system may receive at least one of the three types and identify the main fault according to the methods of the present disclosure, and may determine the beginning and end of the main fault.

The existing main fault identification system only relies on dynamic fault data to identify main faults, and does not consider main fault displacement data and real-time data. After a wind generating set fails, the existing primary fault identification system may not receive dynamic fault data, and thus does not identify and report a primary fault. Under the condition, the wind generating set cannot normally generate power under the influence of faults, the faults need to be found through external expressions such as fan stop and the like, or the faults are found through a monitoring device, for example, an electric energy meter displays that the fan stops generating power. Since the primary fault is not identified, no notification and maintenance is issued until the fault is discovered by other means, causing economic loss from the time the fault occurs to the time the fault is discovered. If a fault occurs but the maintenance is not carried out in time, the fault operation of the fan is caused, and therefore the service life of the fan can be reduced.

In addition, the order in which the existing master failure recognition system receives the dynamic failures may not be the time scale order (the order of generation time) of the dynamic failures. In this case, the existing primary fault identification system may report the first received dynamic fault as the primary fault, but the first received dynamic fault may not be the actual primary fault, i.e., the first actually occurring dynamic fault. The dynamic fault may point to a wrong fault source, which may lead to untimely emergency repair and increase difficulty and time of emergency repair.

In addition, the existing master failure recognition system may receive a delayed dynamic failure, that is, a packet of the dynamic failure may be delayed from reaching the existing master failure recognition system due to network instability, congestion, and the like. In such a case, the primary failure may not be identified in time, resulting in an untimely repair.

In view of the above, exemplary embodiments of the present disclosure provide a method, an apparatus and a system for identifying a main fault of a wind turbine generator system, which will be described in detail below with reference to the accompanying drawings.

Fig. 1 shows a flowchart of a primary fault identification method of a wind park according to an exemplary embodiment of the present disclosure, which may include steps 101 to 104, as shown in fig. 1.

In step 101, a data sequence related to a main fault in the SCADA system is obtained, where the data sequence includes at least one piece of data, and the type of the at least one piece of data includes any one of dynamic fault data, remote signaling data, and main fault displacement data. In step 102, the data sequence related to the main fault is formatted to obtain an intermediate data sequence, where each piece of intermediate data in the intermediate data sequence has a corresponding time identifier, a data type mark, and a doubtful degree, and the doubtful degree indicates the occurrence probability of the main fault. In step 103, at least one piece of result data is obtained based on the intermediate data sequence. In step 104, main fault analysis is performed on the at least one piece of result data to identify the main fault.

The above steps 101 to 104 may be performed by a master fault identification system, which may be communicatively connected to the SCADA system for obtaining a data sequence related to the master fault from the SCADA system. Meanwhile, the system can be in communication connection with the operation and maintenance system so as to send the identified main fault to the operation and maintenance system, and the operation and maintenance system can send out emergency repair notice, perform fault position analysis, perform fault reason analysis and the like according to the identified main fault. In addition, the SCADA system is in communication connection with the wind turbine generator set, and obtains a data sequence related to the main fault from a PLC (programmable logic controller) of the wind turbine generator set (simply referred to as a fan PLC).

Fig. 2 is a schematic diagram showing a connection relationship among the wind turbine PLC, the SCADA system, the master failure recognition system, and the operation and maintenance system, and an arrow in fig. 2 shows a data transmission direction. Specifically, the SCADA system transmits real-time data, main fault displacement data and dynamic fault data to the main fault identification system, and the main fault identification system transmits the main fault data identified by the main fault identification system to the operation and maintenance system.

Since the present application is primarily concerned with primary fault identification, the primary fault identification system of fig. 2 is described in detail below in conjunction with fig. 3.

Fig. 3 illustrates a schematic structure of a master failure recognition system according to an exemplary embodiment of the present disclosure, which may include a data cleaning unit, a data formatting unit, a master failure recognition unit, and a database, as illustrated in fig. 3.

Specifically, the data cleaning unit is used for acquiring a data sequence related to a main fault in the SCADA system, where the data sequence includes at least one piece of data, and the type of the at least one piece of data includes any one of dynamic fault data, remote signaling data, and main fault displacement data.

Preferably, to prevent a large amount of data from accumulating in the data formatting unit and being lost, message middleware may be provided to buffer the data sequence acquired by the data cleaning unit and forward it to the data formatting unit.

The data formatting unit formats the data sequence related to the main fault to obtain an intermediate data sequence, wherein each piece of intermediate data in the intermediate data sequence has a corresponding time identifier, a data type mark and a doubtful degree, the data type mark comprises a start data mark, an end data mark and an invalid data mark, and the doubtful degree represents the occurrence probability of the main fault. That is, the data formatting unit generates intermediate data from the preprocessed raw data by normalizing its format, as described in detail below. Such intermediate data (e.g., intermediate data M in fig. 3) is stored in queue form in a memory or a cache.

The main fault identification unit acquires an intermediate data sequence from a memory or a cache, acquires at least one piece of result data based on the intermediate data sequence, and performs main fault analysis on the at least one piece of result data to identify a main fault. The result data (e.g., result data R in fig. 3) may be stored in a queue in memory or in a cache.

The memory or cache may be capable of fast access to data to improve overall processing speed of the primary fault identification, but the capacity of the memory or cache is limited, which requires a database to store data in the memory or cache for a long time. Specifically, the memory or the cache may store data in a recent period of time, that is, the data has a time identifier, and may store data in a predetermined period of time before the current time point according to the time identifier, where data not in the predetermined period of time is to be deleted. To ensure that data is not lost, i.e., to preserve historical data, the data may be stored in a database (DB for short). In other words, data in memory or cache may be backed up by the database.
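The retention behaviour described above might be sketched as a time-bounded queue with a backing store. Here `RecentCache`, its method names, and the plain list standing in for the database are all hypothetical, and the sketch assumes that items are added in time order.

```python
from collections import deque
from typing import Deque, List, Tuple

Item = Tuple[float, str]  # (time identifier, payload)


class RecentCache:
    """Keeps recent items in memory and backs every item up to a store."""

    def __init__(self, retention_s: float) -> None:
        self.retention_s = retention_s
        self.queue: Deque[Item] = deque()
        self.database: List[Item] = []  # stand-in for the real database

    def put(self, item: Item) -> None:
        # Back up first so that history survives eviction from memory.
        self.database.append(item)
        self.queue.append(item)

    def evict(self, now: float) -> None:
        # Drop items older than the retention period before `now`.
        cutoff = now - self.retention_s
        while self.queue and self.queue[0][0] < cutoff:
            self.queue.popleft()
```

With a 10 s retention period, items stamped 0, 5 and 12 followed by `evict(now=12.0)` leave only the items stamped 5 and 12 in memory, while all three remain in the backing store.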

In addition, a model library (abbreviated as MDB) can be stored in the memory or the cache, and the DB can also back up the MDB in the memory or the cache. The MDB stores therein a master failure model, which will be described in detail below.

The memory or the cache is used for storing a proper amount of data, so that the query speed of the data can be effectively improved, the burden of a database can be reduced, and the real-time performance and the stability of the main fault identification are obviously improved. And loading related data into a memory or a cache when the main fault identification system is started, and updating the data in the operation process of the main fault identification system. The data storage conditions of the memory or cache are shown in table 1 below:

TABLE 1

The steps of acquiring a data sequence described with reference to fig. 1, and the operations of acquiring a data sequence performed by the data cleansing unit described in connection with fig. 3, may be implemented in a variety of implementations. One possible implementation is described in the exemplary embodiments of the present disclosure, but this is only for illustrative purposes and is not intended to limit the scope of the present disclosure, and other possible implementations may also be used to implement the relevant operations and steps.

Specifically, raw data of the SCADA system is preprocessed to obtain a data sequence related to the main fault. The raw data of the SCADA system is uncleaned data received directly from the wind generating set and stored, where cleaning may include both filtering and converting. Raw data is converted because the field meanings of the individual SCADA data are not uniform, which makes the data inconvenient to analyze.

As described above, raw data (SCADA data) includes dynamic fault data, real-time data, and main fault displacement data, where the real-time data includes remote signaling data and telemetry data. The main fault displacement data includes data associated with position signals. The dynamic fault data, the real-time data and the main fault displacement data can be filtered and converted in sequence.

The filtering of data is described first. Filtering relies on the main fault model (described in detail below), which defines the data cleansing rules used for filtering; when a data cleansing rule is needed, it can be looked up in the model in advance.

The filtering has two dimensions. Filtering in the first (longitudinal) dimension selects among the received raw data records: only the data required for main fault identification is retained and all other data is discarded. Filtering in the second dimension selects among the fields of each type of received data; for example, the fields retained after filtering include device ID, device type, time identification, and value. It is understood that different types of data may have the same or different fields.

Specifically, the real-time data may be filtered according to the data cleansing rules in the main fault model so that only the data needed for main fault identification is retained. This filtering includes deleting the telemetry data from the real-time data, retaining the remote signaling data that is related to the main fault, and deleting the remaining data. For example, if 400 pieces of real-time data, comprising 200 pieces of remote signaling data and 200 pieces of telemetry data, are received within 1 s, the 200 pieces of telemetry data are discarded first so that only the 200 pieces of remote signaling data remain; the 200 pieces of remote signaling data are then filtered according to the data cleansing rule in the main fault model, keeping only the data that the rule defines as relevant to main fault identification and discarding the rest. Filtering the main fault displacement data includes retaining the displacement data associated with the main fault and deleting other displacement data. Filtering the dynamic fault data includes retaining dynamic fault data of the types associated with the main fault and deleting dynamic fault data of other types. The retention and deletion in all of the above filtering are likewise governed by the data cleansing rules: each piece of data has a data type, and the rule defines which data types are to be retained and which are to be deleted.
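The two filtering dimensions can be illustrated with a minimal Python sketch. The rule layout, the category names ("telemetry", "remote_signaling", "displacement", "dynamic_fault"), the data type names, and the field names are all hypothetical and chosen only for illustration; the disclosure does not specify how a data cleansing rule is encoded.

```python
# Hypothetical cleansing rule; the category names, type names, and field
# names below are illustrative assumptions, not taken from the disclosure.
CLEANSING_RULE = {
    "retain_types": {
        "remote_signaling": {"fault"},
        "displacement": {"main_fault_shift"},
        "dynamic_fault": {"converter_fault"},
    },
    "retain_fields": ("device_id", "device_type", "time", "category", "value"),
}


def clean_records(records, rule):
    """Apply both filtering dimensions to a list of raw records (dicts)."""
    kept = []
    for record in records:
        # First dimension: telemetry data is always discarded.
        if record["category"] == "telemetry":
            continue
        # First dimension: keep only data types the rule marks as relevant.
        allowed = rule["retain_types"].get(record["category"], set())
        if record["data_type"] not in allowed:
            continue
        # Second dimension: keep only the fields named by the rule.
        kept.append({name: record[name] for name in rule["retain_fields"]})
    return kept
```

Keeping the retained types and retained fields in one rule object mirrors the description, in which both kinds of retention are defined by the data cleansing rule rather than hard-coded in the filter.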

In addition, the fields of the raw data can be filtered according to the data cleansing rule, so that only the key fields needed for main fault identification are retained and the other fields are deleted. For example, the data cleansing rule defines which fields are to be retained and which are to be deleted.

For example, in the raw data obtained after the field filtering described above, the fields of the remote signaling data include: device ID, device type, time identification (i.e., the time at which the data was generated), type (e.g., wind speed, fault, power generation), and value (status code); there may be multiple status codes, and typically the first status code corresponds to a state. The fields of the main fault displacement data include: device ID, device type, time identification, and value (e.g., a flag indicating that a fault has occurred or ended). The fields of the dynamic fault data include: device ID, device type, time identification, and value (e.g., a fault flag). As noted above, the "value" field does not have a uniform meaning and needs to be converted, for example into code data consisting of "0" and "1", where "0" indicates normal and "1" indicates abnormal; the first "1" marks the start of a fault and the last "0" marks the end of the fault.

Data cleaning provides the data basis needed for the subsequent main fault identification and discards data irrelevant to it, which reduces storage use and speeds up processing.

The formatting of data is described next. Formatting is applied to the raw data that has been cleaned, and yields the intermediate data. The attributes of the intermediate data obtained after formatting are shown in Table 2 below:

TABLE 2

No.	Attribute	Description
1	Device ID
2	Device type
3	Time identification	Timestamp; the time at which the data was generated
4	Type identification	Indicates the data source
5	Status identification	Fault flag
6	Main fault doubtful degree	From the main fault model
7	Data type mark	Start data S, end data E, invalid data I
8	Other attributes	Such as device identification, status identification, etc.

After formatting, the code data described above yields the "status identification", which indicates whether a fault has occurred. Combining the "status identification" with the "time identification" yields start data (marked "S"), end data (marked "E"), and invalid data (marked "I"), which are distinguished by the "data type mark". For example, let "0" indicate normal and "1" indicate abnormal, so that the first "1" is the start of a fault and the last "0" is the end of the fault, and let several pieces of code data form the code "100111100". The data corresponding to the first "1" is marked S, the data corresponding to the first "0" following that "1" is marked E, the next "1" following the E is marked as another S, and so on; the data corresponding to all remaining codes is marked I.
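The marking rule can be sketched in Python as follows. This is one reading of the rule, under two assumptions that the description does not state explicitly: the state before the first sample is treated as normal ("0"), and a mark is assigned purely from the transition between consecutive codes (0 to 1 gives S, 1 to 0 gives E, anything else gives I). The function name is hypothetical.

```python
def mark_data_types(codes: str) -> str:
    """Map a string of '0'/'1' code data to data type marks S, E and I."""
    marks = []
    previous = "0"  # assumption: the state before the first sample is normal
    for code in codes:
        if code == "1" and previous == "0":
            marks.append("S")  # normal -> abnormal: a fault starts
        elif code == "0" and previous == "1":
            marks.append("E")  # abnormal -> normal: the fault ends
        else:
            marks.append("I")  # no change of state: invalid data
        previous = code
    return "".join(marks)


print(mark_data_types("100111100"))  # SEISIIIEI
```

For the code "100111100" this gives S, E, I, S, I, I, I, E, I: two start marks, two end marks, and five invalid marks.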

To better understand the above data cleansing and formatting process, the primary fault model will be described in detail below in conjunction with FIG. 4.

Fig. 4 shows a schematic structural diagram of a main fault model according to an exemplary embodiment of the present disclosure. As shown in FIG. 4, the main fault model in the model library may define data cleansing rules, doubtful degrees for the different data types, time windows, and main fault determination rules. The main fault model has a tree structure and is stored in the DB; after starting, the main fault identification system loads the main fault model from the DB into the memory or the cache to serve as the model library, so that the model can be accessed quickly and efficiently. The main fault model organizes its data by main fault (e.g., main fault 1 to main fault n). Each main fault has a corresponding time window, main fault determination rule, and device type; each device type is subdivided by the data types of the device, and a data cleansing rule and a doubtful degree are defined for each data type.

The main fault model defines the data cleansing rules used to filter, convert, and format the received raw data. It also defines the doubtful degree corresponding to each data type and the doubtful degree threshold of the main fault, which are used to obtain the result data R from the intermediate data M. Specifically, the dynamic fault data, the remote signaling data, and the main fault displacement data have different doubtful degrees, and the main fault is identified from the doubtful degrees of the data, which avoids the shortcomings of simply treating the first received dynamic fault data as the main fault data.

The time window (extend) includes a start time window and an end time window. Every main fault has a start and an end, and there may be multiple pieces of start data and of end data associated with each start and each end; the present disclosure performs main fault identification on the data that falls within a time window. The start time window defines the time range in which the main fault starts, and the end time window defines the time range in which the main fault ends. Each is used to calculate the sum of the doubtful degrees of the intermediate data within its range, and that sum is then compared against the doubtful degree threshold. The start time window serves to identify and determine the start of the main fault, and the end time window serves to identify and determine its end.

It is understood that the above doubtful degrees, time windows, and data cleansing rules can be predetermined from historical data and experience, or can be obtained by model training or statistical analysis of data.

Once the intermediate data has been obtained, it must be analyzed further to obtain the result data on which the main fault analysis is performed. The specific process is as follows: the intermediate data with the earliest time identification in the intermediate data sequence is determined, and that intermediate data together with all intermediate data of the same type is screened out; the data type marks corresponding to the screened intermediate data are set on a time axis; and, in chronological order on the time axis, the earliest start data mark S, or the first start data mark after any end data mark E, is taken as the start mark of one piece of result data, and the first end data mark after that start mark is taken as the end mark of the same piece of result data. The generation of the result data is described in detail below in conjunction with Fig. 5.

Fig. 5 illustrates intermediate data and result data on a time axis according to an exemplary embodiment of the present disclosure. As shown in FIG. 5, S denotes data with a start data mark, E denotes data with an end data mark, R_S indicates the start of a piece of result data, and R_E indicates the end of a piece of result data.

Specifically, the intermediate data sequence may be analyzed to find the intermediate data with the earliest time identification and the intermediate data that follows it, and each piece of intermediate data is represented on the time axis by its data type mark, giving the time axis labeled "S", "E", and "I" in Fig. 5. The start mark and the end mark are then determined by the following rule: in chronological order on the time axis, the earliest start data mark, or the first start data mark after any end data mark, is taken as the start mark of one piece of result data, and the first end data mark after that start mark is taken as the end mark of the same piece of result data. Applying this rule to the time axis shown in Fig. 5 yields 4 pieces of result data R.

The result data serves as candidate main fault data and has a start time and an end time: the start time is given by the time identification of the intermediate data corresponding to the start mark, and the end time by the time identification of the intermediate data corresponding to the end mark. The attributes of the result data are shown in Table 3:
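The pairing rule for start and end marks can be sketched in Python as below. The function name and the (time, mark) tuple layout are hypothetical. The sketch also assumes that a start mark with no later end mark yields a piece of result data whose end is still open (None); the description of Fig. 5 does not spell this case out.

```python
from typing import List, Optional, Tuple


def build_result_data(
    marks: List[Tuple[float, str]]
) -> List[Tuple[float, Optional[float]]]:
    """Pair start and end marks on a time axis into (start, end) result data."""
    results = []
    start = None  # time of the currently open start mark, if any
    for time, mark in sorted(marks):
        if mark == "S" and start is None:
            # Earliest S, or first S after an E: opens a new piece of result data.
            start = time
        elif mark == "E" and start is not None:
            # First E after the start mark: closes the piece of result data.
            results.append((start, time))
            start = None
        # Any other S, E or I does not open or close result data.
    if start is not None:
        results.append((start, None))  # assumption: keep a still-open result
    return results


axis = [(0, "S"), (1, "S"), (2, "I"), (3, "E"), (4, "E"), (5, "S"), (6, "E"), (7, "S")]
print(build_result_data(axis))  # [(0, 3), (5, 6), (7, None)]
```

In this example the second S at time 1 and the second E at time 4 do not open or close anything, because a piece of result data is already open or already closed at those points.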

TABLE 3

No.	Attribute	Description
1	Start mark
2	End mark
3	Occurrence doubtful degree
4	End doubtful degree
5	Other attributes	Such as device ID, status identification, etc.

As shown in the above table, the result data R is characterized by a start mark and an end mark. The pieces of result data R divide the intermediate data on the whole time axis into multiple intervals, each containing multiple pieces of intermediate data. Dividing the data at this finer granularity mitigates the effect on main fault identification of data that is not received in time.

Once the result data has been determined, main fault analysis may be performed on it to identify a main fault, as described for step 104 of FIG. 1. Specifically, the main fault analysis is performed on the at least one piece of result data using the main fault model.

Specifically, the main fault analysis proceeds as follows: a corresponding time window is determined based on the device type of the intermediate data having the earliest time identification; for any piece of result data, the time indicated by the earliest time identification is taken as the start of the time window on the time axis; all start data marks within the time window are determined and the sum of their doubtful degrees is calculated; all end data marks within the time window are determined and the sum of their doubtful degrees is calculated; for result data whose sum over the start data marks is larger than a preset doubtful degree threshold, the intermediate data corresponding to the start mark of that result data is taken as the main fault start data; and for result data whose sum over the end data marks is larger than the preset doubtful degree threshold, the intermediate data corresponding to the end mark of that result data is taken as the main fault end data.

To more clearly illustrate the primary fault identification process, the time window is described below in conjunction with fig. 6.

Fig. 6 shows a schematic diagram of a time window according to an exemplary embodiment of the present disclosure. In FIG. 6, of the 4 pieces of data on the time axis having start data marks S₀, S₁, S₂, and S₃, 3 pieces (which may be called start data) lie within the start time window, and the data having end data marks E₀, E₁, and E₂ (which may be called end data) lie within the end time window. An occurrence doubtful degree and an end doubtful degree are calculated for each piece of candidate result data R. The occurrence doubtful degree of R is the sum of the doubtful degrees of all intermediate data having a start data mark S within the start time window, which begins at the time indicated by the earliest time identification in that piece of result data; the end doubtful degree of R is the sum of the doubtful degrees of all intermediate data having an end data mark E within the end time window, which begins at the time indicated by the earliest time identification in that piece of result data. The sum of the doubtful degrees over a period of time may exceed 100%. For example, a main fault is considered to have occurred when the sum is equal to or greater than 80%, in which case 80% is the predetermined doubtful degree threshold. The threshold for the occurrence doubtful degree and the threshold for the end doubtful degree may be the same or different. If the sum of the doubtful degrees of S₀, S₁, and S₂ reaches the 80% threshold, the intermediate data corresponding to S₀ is taken as the main fault start data; if the sum of the doubtful degrees of E₀, E₁, and E₂ reaches the 80% threshold, the intermediate data corresponding to E₀ is taken as the main fault end data.
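The window calculation can be sketched in Python as follows. The class and function names, the times, the window lengths, and the individual doubtful degrees are hypothetical values chosen for illustration; only the 80% threshold comes from the description. Doubtful degrees are held as integer percentages to avoid floating-point rounding, and the window bounds are assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Intermediate:
    time: float  # time identification, in seconds
    mark: str    # data type mark: "S", "E" or "I"
    doubt: int   # doubtful degree, in percent


def window_sum(data: List[Intermediate], mark: str,
               window_start: float, window_length: float) -> int:
    """Sum the doubtful degrees of data with the given mark inside the window."""
    window_end = window_start + window_length  # assumption: bounds are inclusive
    return sum(d.doubt for d in data
               if d.mark == mark and window_start <= d.time <= window_end)


THRESHOLD = 80  # percent, the example threshold from the description

axis = [
    Intermediate(0.0, "S", 50),   # S0
    Intermediate(2.0, "S", 30),   # S1
    Intermediate(4.0, "S", 20),   # S2
    Intermediate(9.0, "S", 50),   # S3, outside a 5 s start window
    Intermediate(3.0, "E", 20),   # E0
    Intermediate(4.0, "E", 30),   # E1
    Intermediate(5.0, "E", 30),   # E2
]

occurrence = window_sum(axis, "S", window_start=0.0, window_length=5.0)
end = window_sum(axis, "E", window_start=0.0, window_length=6.0)
print(occurrence, occurrence >= THRESHOLD)  # 100 True
print(end, end >= THRESHOLD)                # 80 True
```

With these values the start marks S0 to S2 sum to 100%, so the data at S0 would be taken as the main fault start data, and the end marks E0 to E2 sum to 80%, so the data at E0 would be taken as the main fault end data.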

Fig. 7 shows a flowchart of a main fault identification method according to yet another exemplary embodiment of the present disclosure. It differs from Fig. 1 in that the operations of this embodiment are described for a more specific processor, steps 303 to 309, 311, and 312 are newly added, and step 104 of Fig. 1 is not executed.

Referring to FIG. 7, in step 301 the processor receives raw data, as in step 101 of Fig. 1. In step 302, intermediate data M is generated from the raw data, as in step 102 of Fig. 1. In step 303, it is determined whether to enable the message middleware; if so, the message middleware is enabled in step 304. In step 305, it is determined whether to cache the data; if so, the intermediate data M is cached in step 306, that is, stored in the memory or the cache. In step 307, it is determined whether the data duplicates previous data; if not, it is determined in step 308 whether the data is valid data. If the data is not valid, the intermediate data M is stored in the DB in step 309. If the data is valid, the result data R is determined from the intermediate data M in step 310, as in step 103 of Fig. 1. In step 311, the intermediate data and the result data are stored in the DB and the memory or the cache is updated; the doubtful degree of the result data may optionally be set at this point. In step 312, it is determined whether the save succeeded; if so, the flow ends, and otherwise the flow returns to step 306 to store the data in the cache, after which step 307 is executed.

In an exemplary embodiment of the present disclosure, it is determined whether a piece of data duplicates previous data; such data is called duplicate data and is filtered out. Duplicate data includes: data whose attributes, such as the time identification, are the same as those of previous data; and data whose time identification and type identification are the same as those of previous data but whose other attributes differ (e.g., a different status identification).
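A minimal Python sketch of the duplicate check is given below. It assumes that the time identification and the type identification together are enough to decide whether a record duplicates earlier data, which covers both cases listed above; the field names "time" and "type_id" and the function name are hypothetical.

```python
def split_duplicates(records):
    """Separate records into first occurrences and duplicates of earlier data."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        # Assumption: (time identification, type identification) is the key.
        key = (record["time"], record["type_id"])
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates
```

A record with the same key but a different status identification is still classified as a duplicate, matching the second case in the description.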

As described above in the process of generating result data from intermediate data, the data type marks corresponding to the screened intermediate data are set on the time axis. If data has already been set on the time axis, then, depending on the current intermediate data, an update operation Upd, which changes data already set on the time axis without adding new data, and/or an insert operation Ins, which adds new data to the time axis, needs to be performed. The method of determining which operation (Upd and/or Ins) needs to be performed when setting data on the time axis can be understood with reference to Fig. 8, Fig. 9, and Table 4.

Fig. 8 illustrates a flowchart of a method of determining the operation that needs to be performed when setting data on the time axis according to an exemplary embodiment of the present disclosure, Fig. 9 illustrates the time axes corresponding to Fig. 8, and Table 4 illustrates how the result data is modified when a piece of intermediate data is generated. In this example, the previous valid data relative to the current data C is data L and the next valid data is data N; S is a start data mark and is read as intermediate data with a start data mark, and E is an end data mark and is read as intermediate data with an end data mark.

TABLE 4

A doubtful degree correction function may be used to correct the doubtful degree. If this function is enabled, the doubtful degree of the result data R may optionally be modified each time the process shown in Fig. 8 is executed, in which case the new doubtful degree of R is either 0 or the original doubtful degree of R plus the doubtful degree of C. Here, "time scale" is short for "time identification", and NULL denotes no data (null data). The original doubtful degree of the result data R is the sum of the doubtful degrees calculated in the corresponding time window. The screening conditions for serial numbers 6, 7, and 8 may be replaced with: the start time of the result data R is not later than the time scale of L, and the end time of the result data R has no time scale. The start time of R is the point in time corresponding to its start mark, and the end time of R is the point in time corresponding to its end mark.

As shown in table 4 above, the type of operation (Ins, Upd, or both Ins and Upd are executed) is determined according to the data type flag (S, E, or I) of the current data C, the previous valid data L, and the next valid data N.

Referring to Fig. 8, in step 401 it is determined whether duplicate data exists. If so, the doubtful degree is calculated in step 402 (for example, the doubtful degree of the data that duplicates the current data is taken as the doubtful degree of the current data); otherwise, the previous piece of data L is obtained in step 403. In step 404, it is determined whether the current data is start data S; if so, step 405 is executed, and otherwise step 406 is executed. In step 405, it is determined whether data L is empty or is end data E. If so, data N is obtained in step 407: Ins is executed when data N is empty, Upd is executed when data N is result data, and Ins is executed when data N is end data; the doubtful degree may then optionally be calculated in step 408. In step 406, it is likewise determined whether data L is empty or is end data E. If not, data N is obtained in step 409, and in step 410 Upd is executed when data N is empty, both Upd and Ins are executed when data N is start data S, and Upd is executed when data N is end data E; the doubtful degree may then optionally be calculated. If the determination in step 405 is negative or the determination in step 406 is positive, step 411 is executed to optionally calculate the doubtful degree, which can be done by the method shown in Fig. 13.

Fig. 9 shows, by means of time axes, operations similar to those in Table 4, which are not described again here. Table 4 refers to the following operations: obtaining the previous valid data, obtaining the next valid data, and obtaining the first valid data with an end data mark E after the next valid data N. These operations are described in detail below with reference to Figs. 10 to 12.

Fig. 10 illustrates a flowchart of an operation of acquiring the previous piece of data according to an exemplary embodiment of the present disclosure. As shown in Fig. 10, in step 501 it is determined whether the time identification of the current data is within the memory range (a predetermined time range). If so, the previous valid data L is obtained from the memory or the cache in step 503; otherwise, it is obtained from the DB in step 502. In step 504, it is determined whether data L was found in the memory or the cache; if so, the process ends, and otherwise step 502 is executed to obtain data L from the DB.
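The lookup with fallback can be sketched in Python as follows. The cache and the DB are represented by plain lists of dicts purely as stand-ins; the function name, the field names, the `now` reference used to test the memory range, and the reading of "valid data" as data whose mark is not I are all assumptions made for illustration.

```python
def get_previous_valid(current_time, cache, db, now, memory_range=20 * 60):
    """Return the latest valid record before current_time, or None."""

    def latest_before(records):
        # Assumption: "valid" means the data type mark is not "I".
        earlier = [r for r in records
                   if r["time"] < current_time and r["mark"] != "I"]
        return max(earlier, key=lambda r: r["time"], default=None)

    # Step 501: is the current data inside the memory range?
    if now - current_time <= memory_range:
        found = latest_before(cache)  # step 503: try the memory or cache first
        if found is not None:         # step 504: data L exists, done
            return found
    return latest_before(db)          # step 502: fall back to the DB
```

The DB is queried only when the current data lies outside the memory range or the cache holds no earlier valid data, which keeps the common case in memory.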

Fig. 11 illustrates a flowchart of an operation of acquiring the next piece of data according to an exemplary embodiment of the present disclosure. As shown in Fig. 11, in step 601 it is determined from the timestamp of data C whether data C is the latest data; if so, the next valid data N is set to empty in step 602. Otherwise, it is determined in step 603 whether the timestamp is within the memory range (e.g., within 20 minutes of the timestamp of the data); if so, the next valid data N is obtained from the memory or the cache in step 604, and otherwise it is obtained from the DB in step 605.

Fig. 12 illustrates a flowchart of an operation of acquiring end data according to an exemplary embodiment of the present disclosure. As shown in Fig. 12, in step 701 it is determined whether the timestamp of data N is within the memory range; if so, the first piece of end data E after data N is obtained from the memory or the cache in step 702, and otherwise it is obtained from the DB in step 703.

Fig. 13 illustrates a flowchart of an operation of calculating the doubtful degree according to an exemplary embodiment of the present disclosure. As shown in Fig. 13, in step 801 the corresponding time window is obtained from the main fault model in the model library; in step 802, the corresponding main fault is obtained; in step 803, it is determined whether the data lies before or within the time window; if so, the doubtful degree is calculated in step 804, and otherwise the process ends.

The main fault doubtful degree is calculated separately for the start and for the end of the main fault. By definition, each type of main fault has a start time window and an end time window. The values of the time windows come from the main fault model and are determined mainly from historical data and corrected using empirical data. The historical basis is the time span within which the start data and end data related to past main faults are concentrated; the empirical correction adjusts that span according to experience with the fault. The sum of the doubtful degrees of all intermediate data in the current time window is calculated, and if the sum reaches 80%, a main fault is determined.

For example, if the time window for the start data S is 5 seconds, the sum of the doubtful degrees of all data within that window (the 5 seconds following the time scale of the start data S) is calculated, and if the sum reaches 80% or more, the main fault is considered to have started. A similar calculation is made for the end of the main fault. Once a doubtful degree has been calculated, it is recalculated if newly arriving data lies before or within the time window; otherwise it is not recalculated.

In an exemplary embodiment of the present disclosure, the generation and termination of a primary fault is determined based on a primary fault suspicion threshold defined by a primary fault model.

The occurrence doubtful degree of R is the sum of the doubtful degrees of the relevant S data. The doubtful degrees of the data within the time window are added together and may exceed 100%; when the sum reaches 80%, a main fault is considered to have occurred.

The end doubtful degree of R is the sum of the doubtful degrees of the relevant E data. The sum of the doubtful degrees within the time window may exceed 100%; when the sum reaches 80%, the main fault is considered to have ended.

After the main fault is identified, an alarm is generated for display on the human-machine interface, and the main fault alarm is sent to the operation and maintenance system, which generates an emergency repair work order for dispatch. The complete main fault is also stored in the DB.

To identify main faults in a timely manner, data whose time scale falls within the most recent 20 minutes is kept in the memory or the cache, and at least the last piece of data is always kept. The 20 minutes is not a fixed value and can be adjusted through configuration. Data older than 20 minutes is stored in the DB and is loaded from there into the memory or the cache when needed.
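The retention policy can be sketched in Python as follows. The function name, the list-of-dicts cache, and the "time" field are hypothetical; the 20-minute default and the rule of always keeping at least the last piece of data come from the description.

```python
def retain_recent(cache, now, retention_seconds=20 * 60):
    """Keep records from the retention period, and always at least the latest one."""
    recent = [r for r in cache if now - r["time"] <= retention_seconds]
    if not recent and cache:
        # Nothing is recent enough: still keep the last piece of data.
        recent = [max(cache, key=lambda r: r["time"])]
    return recent
```

The retention period is a parameter rather than a constant, reflecting the statement that the 20 minutes can be adjusted through configuration.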

In an exemplary embodiment of the present disclosure, when a fault occurs (the first fault being the main fault), the main fault identification system receives a dynamic fault, and corresponding real-time data and displacement data (main fault displacement data) are also generated within a predetermined time range (e.g., within 5 s). Accordingly, the real-time data, the main fault displacement data, and the dynamic fault data are given weights (20%, 30%, and 50%, respectively), which are used to determine the occurrence of the main fault. The doubtful degree represents this weight. The doubtful degree of each type of intermediate data can be adjusted according to actual operating conditions.

Further, according to another exemplary embodiment of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the main fault identification method as described above.
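As a worked illustration of these weights, the short Python sketch below adds up the doubtful degrees of the data types that arrive within the predetermined time range and compares the sum with the 80% threshold used as an example elsewhere in the description. The dictionary keys and the function name are hypothetical, and the mapping of 20%, 30%, and 50% to the three data types follows the order in which they are listed.

```python
# Doubtful degree per data type, in percent (example weights from the text).
DOUBT_BY_TYPE = {"real_time": 20, "displacement": 30, "dynamic_fault": 50}
THRESHOLD = 80  # percent, example threshold


def main_fault_started(received_types):
    """Return True if the summed doubtful degrees reach the threshold."""
    total = sum(DOUBT_BY_TYPE[t] for t in received_types)
    return total >= THRESHOLD


print(main_fault_started(["dynamic_fault"]))                  # False (50)
print(main_fault_started(["dynamic_fault", "displacement"]))  # True (80)
print(main_fault_started(["real_time", "displacement"]))      # False (50)
```

With these weights a dynamic fault on its own is not enough to declare a main fault; it has to be corroborated by displacement data, or by both other data types, within the time range.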

The advantages and benefits of the present disclosure include:

(1) Main faults are not missed, which solves the problem of lost main faults. The operation and maintenance system can schedule emergency repairs according to the main faults, reducing economic loss and equipment damage at the wind farm. The wind farm's operating statistics can count these faults, making the statistical reports more accurate.

(2) Main faults are not falsely reported, which solves the problem of inaccurate main fault identification. This matters because the fault details can be pinpointed accurately, so that emergency repairs are better targeted and both the value of the main fault information and the repair efficiency improve.

(3) Main faults are reported promptly. The main fault can be reported in time to the SCADA system and to the operation, maintenance, and repair system, so that repairs can be scheduled sooner, the fault can be handled sooner, and economic loss is reduced.

(4) Main fault correlation analysis and determination require no manual involvement. This reduces the uncertainty and instability of manual involvement; the rules are organized into a library, which makes full use of the model. The model can also be continuously optimized and improved during operation, so that main faults are determined more accurately.

(5) The main fault identification method and system use the SCADA data of the wind farm and need no additional sensors or network equipment, which saves investment and makes deployment and implementation easy.

Having described embodiments according to the disclosed concept, features from the various embodiments can be combined without departing from the scope of the disclosure, and such combinations will also fall within the scope of the disclosure.

The computer readable storage medium is any data storage device that can store data which can be read by a computer system. Examples of computer-readable storage media include: read-only memory, random access memory, read-only optical disks, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the internet via wired or wireless transmission paths).

Further, it should be understood that various units according to exemplary embodiments of the present disclosure may be implemented as hardware components and/or software components. The individual units may be implemented, for example, using Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs), depending on the processing performed by the individual units as defined by the skilled person.

Furthermore, methods according to exemplary embodiments of the present disclosure may be implemented as computer code in a computer-readable storage medium. The computer code can be implemented by those skilled in the art from the description of the method above. The computer code when executed in a computer implements the above-described methods of the present disclosure.

Although a few exemplary embodiments of the present disclosure have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the claims and their equivalents.

Claims

1. A main fault identification method of a wind generating set is characterized by comprising the following steps:

acquiring a data sequence related to a main fault in an SCADA (supervisory control and data acquisition) system, wherein the data sequence comprises at least one piece of data, and the type of the at least one piece of data comprises at least one of dynamic fault data, remote signaling data and main fault displacement data;

formatting the data sequence related to the main fault to obtain an intermediate data sequence, wherein each piece of intermediate data in the intermediate data sequence has a corresponding time identifier, a data type mark and a doubtful degree;

obtaining at least one piece of result data based on the intermediate data sequence;

performing primary fault analysis on the at least one piece of result data to identify a primary fault,

the data type marks comprise a start data mark, an end data mark and an invalid data mark, and the doubtful degree represents the occurrence probability of the main fault.

2. The main fault identification method of claim 1, wherein the step of performing main fault analysis on the at least one piece of result data comprises:

performing main fault analysis on the at least one piece of result data using a main fault model to identify a main fault,

wherein the main fault model defines a main fault type, and a time window, a main fault identification rule and an equipment type corresponding to the main fault type.

3. The main fault identification method of claim 2, wherein the step of performing main fault analysis on the at least one piece of result data using a main fault model comprises:

determining a corresponding main fault type based on the equipment type, and performing main fault identification analysis according to the time window and the main fault identification rule corresponding to the main fault type.
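For illustration only, the main fault model of claims 2 and 3 can be pictured as a small lookup table keyed by equipment type. The sketch below is a hypothetical reading, not the claimed implementation: the class name `MainFaultModel`, the example fault types, the window lengths and the threshold rules are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class MainFaultModel:
    """One model entry: a main fault type plus its window, rule and equipment type."""
    fault_type: str
    equipment_type: str
    time_window_s: float
    # Hypothetical rule form: decides from a summed doubtful degree.
    rule: Callable[[float], bool]


# Example entries; every value here is made up for illustration.
MODELS: List[MainFaultModel] = [
    MainFaultModel("converter_main_fault", "converter", 60.0, lambda s: s > 0.8),
    MainFaultModel("pitch_main_fault", "pitch_system", 30.0, lambda s: s > 0.6),
]


def models_for(equipment_type: str, models: List[MainFaultModel] = MODELS) -> List[MainFaultModel]:
    """Return the model entries whose equipment type matches."""
    return [m for m in models if m.equipment_type == equipment_type]
```

Looking up `models_for("converter")` yields the entry whose time window and rule would then drive the identification analysis.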

4. The main fault identification method of claim 1, wherein the step of acquiring a data sequence related to a main fault in a SCADA system comprises:

preprocessing raw data of the SCADA system to obtain the data sequence related to the main fault.

5. The main fault identification method of claim 4, wherein the preprocessing comprises filtering and converting;

wherein filtering the raw data of the SCADA system comprises:

searching for a data filtering rule corresponding to the type of the raw data;

filtering the raw data according to the found data filtering rule, so as to retain the main fault data, remote signaling data and main fault displacement data related to the main fault;

and wherein converting the raw data of the SCADA system comprises:

searching for a data conversion rule corresponding to the main fault data, the remote signaling data and the main fault displacement data;

converting the main fault data, the remote signaling data and the main fault displacement data into corresponding code data according to the found data conversion rule, so as to obtain the data sequence related to the main fault.
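The filter-then-convert preprocessing of claim 5 might be sketched as two rule tables keyed by data type. This is a minimal sketch under stated assumptions: the record layout (`type`, `time`, `value`), the rule contents and the code prefixes `MF-`, `RS-` and `MD-` are hypothetical and do not come from the disclosure.

```python
# Data types retained by the filter, named after the three types in the claim.
KEEP_TYPES = {"main_fault", "remote_signaling", "main_fault_displacement"}

# Hypothetical filtering rules: keep a record only if it carries a value.
FILTER_RULES = {t: (lambda rec: rec.get("value") is not None) for t in KEEP_TYPES}

# Hypothetical conversion rules: map each record to a code string.
CONVERSION_RULES = {
    "main_fault": lambda rec: f"MF-{rec['value']}",
    "remote_signaling": lambda rec: f"RS-{rec['value']}",
    "main_fault_displacement": lambda rec: f"MD-{rec['value']}",
}


def preprocess(raw_records):
    """Filter raw records by type-specific rules, then convert them to code data."""
    sequence = []
    for rec in raw_records:
        rule = FILTER_RULES.get(rec["type"])
        if rule is None or not rule(rec):
            continue  # no rule for this type, or the rule rejects the record
        convert = CONVERSION_RULES[rec["type"]]
        sequence.append({"time": rec["time"], "type": rec["type"], "code": convert(rec)})
    return sequence
```

Records of other types (for example a wind speed sample) have no filtering rule and are dropped before conversion.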

6. The main fault identification method of claim 5, wherein the step of formatting the data sequence related to the main fault to obtain an intermediate data sequence comprises:

sorting all the converted code data in chronological order, formatting them uniformly, and assigning a doubtful degree to each piece of formatted data according to the type of the code data.
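As a rough illustration of the formatting step in claim 6, the sketch below sorts code data by time, puts every item into one record shape, and assigns a doubtful degree from a per-type table. The `IntermediateData` fields, the numeric values in `DOUBT_BY_TYPE` and the convention that an input item may carry a `mark` key are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntermediateData:
    time: float   # time identifier
    mark: str     # data type mark: "start", "end" or "invalid"
    doubt: float  # doubtful degree
    code: str     # converted code data


# Hypothetical doubtful degrees per code data type.
DOUBT_BY_TYPE = {
    "main_fault": 0.6,
    "remote_signaling": 0.3,
    "main_fault_displacement": 0.5,
}


def format_sequence(code_data):
    """Sort by time, normalize the mark, and assign the doubtful degree by type."""
    ordered = sorted(code_data, key=lambda d: d["time"])
    out = []
    for d in ordered:
        mark = d.get("mark")
        if mark not in ("start", "end"):
            mark = "invalid"  # anything unrecognized is treated as invalid data
        out.append(IntermediateData(d["time"], mark, DOUBT_BY_TYPE.get(d["type"], 0.0), d["code"]))
    return out
```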

7. The main fault identification method according to any one of claims 3 to 6, wherein the step of obtaining at least one piece of result data based on the intermediate data sequence comprises:

determining, according to the time identifiers, the intermediate data having the earliest time identifier in the intermediate data sequence, and selecting all intermediate data having the earliest time identifier and all intermediate data of the same type as the intermediate data having the earliest time identifier;

placing the data type marks corresponding to the selected intermediate data on a time axis;

in chronological order on the time axis, using the first start data mark after the earliest start data mark or any one end data mark as the start mark of one piece of result data, and using the first end data mark after the start mark as the end mark of the piece of result data.
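The pairing of start and end marks in claim 7 can be sketched as a single pass over the time axis. The sketch assumes one particular reading of the claim wording: a piece of result data opens at the earliest start data mark, or at the first start data mark that follows an end data mark, and closes at the next end data mark. That reading, the function name and the `(time, mark)` tuple layout are assumptions, not statements of the claimed method.

```python
def split_result_data(marks):
    """Pair start and end marks into result data.

    marks: iterable of (time, mark) with mark in {"start", "end", "invalid"}.
    Returns a list of (start_time, end_time) tuples.
    """
    results = []
    open_start = None  # time of the currently open start mark, if any
    for time, mark in sorted(marks, key=lambda m: m[0]):
        if mark == "start" and open_start is None:
            open_start = time                  # open a new piece of result data
        elif mark == "end" and open_start is not None:
            results.append((open_start, time))  # close it at the first end mark
            open_start = None
        # repeated starts, unmatched ends and invalid marks are skipped
    return results
```

Under this reading, a start mark with no later end mark produces no result data.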

8. The main fault identification method of claim 7, wherein the step of identifying a main fault comprises:

determining a corresponding time window based on the equipment type corresponding to the intermediate data having the earliest time identifier;

for any piece of result data, taking the time represented by the earliest time identifier on the time axis as the start time of the time window;

determining all start data marks present within the time window and calculating the sum of the doubtful degrees corresponding to the determined start data marks;

determining all end data marks present within the time window and calculating the sum of the doubtful degrees corresponding to the determined end data marks;

for result data for which the sum of the doubtful degrees corresponding to the determined start data marks is larger than a preset doubtful degree threshold, taking the intermediate data corresponding to the start mark of the result data as main fault start data;

for result data for which the sum of the doubtful degrees corresponding to the determined end data marks is larger than the preset doubtful degree threshold, taking the intermediate data corresponding to the end mark of the result data as main fault end data.
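The window-based scoring of claim 8 might look like the following sketch. It assumes that the time window is anchored at the earliest time identifier of the whole selected sequence, so the two sums are shared by every piece of result data; the claim could also be read as anchoring a window per piece of result data. The function name, tuple layouts and numeric values are hypothetical.

```python
def identify_main_fault(items, results, window_s, threshold):
    """Score start and end marks inside the time window.

    items:   list of (time, mark, doubt) for the selected intermediate data
    results: list of (start_time, end_time) pieces of result data
    Returns one dict per piece of result data; a value of None means the
    corresponding sum did not exceed the threshold.
    """
    if not items:
        return []
    t0 = min(t for t, _, _ in items)  # window opens at the earliest time identifier
    in_window = [(m, d) for t, m, d in items if t0 <= t <= t0 + window_s]
    start_sum = sum(d for m, d in in_window if m == "start")
    end_sum = sum(d for m, d in in_window if m == "end")
    identified = []
    for start_time, end_time in results:
        identified.append({
            "main_fault_start": start_time if start_sum > threshold else None,
            "main_fault_end": end_time if end_sum > threshold else None,
        })
    return identified
```

With a 60 second window and a threshold of 0.75, two start marks of doubtful degree 0.5 inside the window are enough to confirm the start, while a single end mark of 0.25 is not enough to confirm the end.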

9. A main fault identification system of a wind generating set, characterized in that the main fault identification system comprises:

a data cleaning unit, configured to acquire a data sequence related to a main fault in the SCADA system, wherein the data sequence comprises at least one piece of data, and the type of the at least one piece of data comprises any one of dynamic fault data, remote signaling data and main fault displacement data;

a data formatting unit, configured to format the data sequence related to the main fault to obtain an intermediate data sequence, wherein each piece of intermediate data in the intermediate data sequence has a corresponding time identifier, a data type mark and a doubtful degree;

a main fault identification unit, configured to obtain at least one piece of result data based on the intermediate data sequence, and perform main fault analysis on the at least one piece of result data to identify a main fault;

a memory or cache, configured to store the intermediate data sequence obtained by the data formatting unit in the form of a queue, and to store the at least one piece of result data obtained by the main fault identification unit in the form of a queue;

a database, configured to back up data in the memory or cache.
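As a loose illustration of the storage arrangement in claim 9, the sketch below keeps the intermediate data and the result data in two in-memory queues and backs both up to a database. The choice of `collections.deque` and SQLite, the class name `QueueStore` and the table layout are assumptions for this example; the claim does not name any particular queue or database technology.

```python
import json
import sqlite3
from collections import deque


class QueueStore:
    """Two FIFO queues held in memory, with a database backup."""

    def __init__(self, db_path=":memory:"):
        self.queues = {"intermediate": deque(), "result": deque()}
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS backup (queue TEXT, payload TEXT)")

    def push(self, name, item):
        self.queues[name].append(item)

    def pop(self, name):
        return self.queues[name].popleft()  # first in, first out

    def backup(self):
        """Replace the backup table with the current queue contents."""
        rows = [(name, json.dumps(item))
                for name, queue in self.queues.items() for item in queue]
        with self.db:  # commits on success, rolls back on error
            self.db.execute("DELETE FROM backup")
            self.db.executemany("INSERT INTO backup VALUES (?, ?)", rows)
        return len(rows)
```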

10. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the main fault identification method as claimed in any one of claims 1 to 8.