CN115293236A

CN115293236A - Hybrid clustering-based parallel fault diagnosis method and device for power equipment

Info

Publication number: CN115293236A
Application number: CN202210791423.4A
Authority: CN
Inventors: 刘少伟; 戴必翔; 秦昌嵩; 董贝; 经周
Original assignee: Nanjing SAC Automation Co Ltd
Current assignee: Nanjing SAC Automation Co Ltd
Priority date: 2022-07-07
Filing date: 2022-07-07
Publication date: 2022-11-04
Also published as: WO2024007580A1

Abstract

The invention provides a hybrid clustering-based parallel fault diagnosis method and device for power equipment, which can diagnose streaming data in parallel and in real time, so that monitored data are diagnosed promptly and power equipment faults are found in time. The method comprises the following steps: adaptively configuring the parallelism and the related number of processes of each component in the Storm platform according to historical power grid data; feeding real-time power grid data into a Spout source component of the Storm platform through the IRichSpout interface to form a data stream to be processed; encapsulating the data stream to be processed into a plurality of Tuples in time order, and generating a unique ID for each Tuple; receiving the Tuples with a PreBolt component, and preprocessing the data set in each Tuple by a standard-score method to obtain standardized samples; and processing the standardized samples with the fault diagnosis model to obtain a fault diagnosis result for the power equipment.

Description

Hybrid clustering-based parallel fault diagnosis method and device for power equipment

Technical Field

The invention belongs to the field of multivariate data monitoring and diagnosis in the power grid and power industry, and relates to a hybrid clustering-based parallel fault diagnosis method and device for power equipment.

Background

With the development of power systems, power equipment failures have a great impact on daily life, so continuous monitoring of equipment state is urgently needed. Continuing progress in sensor and communication technology has led to exponential growth of power grid data. These data are real-time, volatile and unbounded: they are streaming data that must be monitored continuously. The earlier platform Hadoop can process batch data, but its real-time performance is poor. Storm is an open-source distributed real-time computing framework that can process massive data streams quickly and makes up for Hadoop's weakness in real-time processing.

With the rise of Storm, some applications have appeared in the power industry. For example, a time-based sliding-window processing method has been implemented on Storm, with anomaly detection on power grid data streams achieved through threshold judgment. Alarm data from power grid equipment have also been processed rapidly, with the related data streams handled by a clustering algorithm.

The subtractive clustering algorithm and the K-means algorithm belong to machine learning algorithms which can be divided into supervised learning and unsupervised learning. In the real world, most samples are unlabeled, so unsupervised learning is more widely used than supervised learning. The K-means algorithm belongs to a typical unsupervised learning clustering algorithm, and the initial clustering center is randomly initialized, so that the accuracy of the clustering result is unstable.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provide a power equipment parallel fault diagnosis method based on hybrid clustering, which can complete parallel diagnosis of corresponding streaming data in real time, meet the fault diagnosis of monitoring data in real time and find the fault of power equipment in time.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, the invention provides a parallel fault diagnosis method for power equipment based on hybrid clustering, which comprises the following steps:

adaptively configuring the parallelism and the related process number of each component in the storm platform according to historical power grid data;

feeding the real-time power grid data into a Spout source component of the Storm platform through the IRichSpout interface to form a data stream to be processed;

encapsulating the data stream to be processed into a plurality of Tuple tuples according to the time sequence, and generating a unique ID for each Tuple;

receiving the Tuples with a PreBolt component, and preprocessing the data set in each Tuple by a standard-score method to obtain standardized samples;

and processing the standardized sample by using the fault diagnosis model to obtain a fault diagnosis result of the power equipment.

Further, the method for adaptively configuring the parallelism and the related process number of each component in the storm platform comprises the following steps:

simulating a real-time power grid data flow by using historical power grid data, wherein the flow of the historical power grid data is greater than the expected flow of the real-time power grid data;

calculating data throughput of each component in the storm platform under different parallelism degrees and different process numbers according to historical power grid data;

and in the case that the data throughput meets the expected throughput, the component parallelism and the process number with the lowest overhead are adaptively configured.
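The selection rule in the three steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Config` class, its field names, the overhead model (total executors) and the `simulated_throughput` function are all assumptions made for the example. In practice the throughput would be measured by replaying historical grid data through the topology.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Config:
    workers: int      # number of worker processes (hypothetical field name)
    parallelism: int  # executors per component (hypothetical field name)

    @property
    def overhead(self) -> int:
        # Assumed cost model: total executors across all workers.
        return self.workers * self.parallelism


def choose_config(
    candidates: Iterable[Config],
    measure_throughput: Callable[[Config], float],
    expected_throughput: float,
) -> Optional[Config]:
    """Return the cheapest candidate whose throughput meets the target, or None."""
    feasible = [c for c in candidates if measure_throughput(c) >= expected_throughput]
    return min(feasible, key=lambda c: c.overhead, default=None)


def simulated_throughput(cfg: Config) -> float:
    # Stand-in for a replay of historical data: throughput grows with the
    # number of executors, with diminishing returns (made-up model).
    return 4000.0 * (cfg.workers * cfg.parallelism) ** 0.8


candidates = [Config(w, p) for w in (1, 2, 4) for p in (1, 2, 4, 8)]
best = choose_config(candidates, simulated_throughput, expected_throughput=20000.0)
```

Filtering by throughput first and minimizing overhead second mirrors the order of the steps: a configuration is only considered once it meets the expected throughput.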

Further, preprocessing the data set in the Tuple by the standard-score method to obtain a standardized sample includes:

normalizing each value according to the following formula:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

In the above formula, $x'$ ($x' \in [0, 1]$) is the normalized data value; $x_{\min}$ is the minimum value of a given dimension of the tuple data; $x_{\max}$ is the maximum value of that dimension.

Further, the method for constructing the fault diagnosis model comprises the following steps:

deploying a subtractive clustering algorithm and a K-means clustering algorithm into an SCMBolt component and a K-means Bolt component respectively, connecting the SCMBolt component to the K-means Bolt component, and setting the parallelism of the components to obtain the fault diagnosis model.

Further, the processing of the standardized sample by using the fault diagnosis model to obtain a fault diagnosis result of the electrical equipment includes:

determining a better initial clustering center of the standardized sample through a subtractive clustering algorithm;

and taking the better initial clustering center obtained by the subtractive clustering as the initial clustering center of the K-means algorithm, and then clustering, thereby obtaining the fault diagnosis result for the sample data.

Further, determining a better initial clustering center of the standardized sample through the subtractive clustering algorithm comprises the following steps:

the SCMBolt component receives the tuple transmitted by the PreBolt component and performs subtractive clustering on the data in the tuple; the clustering centers are determined from density values, and each clustering center obtained is a point in the original data;

when the subtractive clustering algorithm finishes, the initial clustering centers are obtained; the initial clustering centers, the corresponding Id number and the standardized samples to be clustered that belong to that Id number are packaged into a tuple, and the tuple is transmitted to the downstream K-means Bolt component.

The subtractive clustering method comprises the following steps:

The dimension of the samples is M, the number of sample points is n, and the sample points are $(x_1, x_2, \ldots, x_n)$. All sample points are normalized into a hypercube. Each sample point is a candidate cluster center. The density index of the sample point $x_i$ is defined as

$$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{(r_a/2)^2}\right)$$

In the above formula, $r_a$ is a positive number. The value of $r_a$ defines a neighborhood radius of the point, and sample points outside this radius contribute little to the density index of the point.

After the density index of each sample point is calculated, the sample point with the highest density index is selected as the first cluster center; $x_{c1}$ is the selected point and $D_{c1}$ is its density index. Before the next cluster center is selected, the density index of each sample point $x_i$ is corrected by the following equation:

$$D_i = D_i - D_{c1} \exp\left(-\frac{\lVert x_i - x_{c1} \rVert^2}{(r_b/2)^2}\right)$$

In the above formula, $r_b$ is a positive number.

After the density indexes of all sample points are corrected, a new cluster center $x_{c2}$ is selected and the density indexes of all sample points are corrected again. This process is repeated until enough cluster centers have appeared, which yields the better initial cluster centers.

Further, taking the better initial clustering center obtained by the subtractive clustering as the initial clustering center of the K-means algorithm, and then clustering, wherein the clustering comprises the following steps:

the K-means Bolt component performs K-means clustering on the standardized samples to be clustered that are transmitted by the upstream SCMBolt component; the cluster centers transmitted by the upstream SCMBolt component are used as the initial cluster centers of the K-means clustering, the cluster centers are updated by iteration, and the clustering result is finally obtained.

Further, using the cluster centers transmitted by the upstream SCMBolt component as the initial cluster centers of the K-means clustering and updating the cluster centers by iteration comprises the following steps:

a) Take the cluster centers transmitted by the upstream SCMBolt component as the initial cluster centers of the K-means clustering.

b) Calculate the vector distance from every sample in the sample set to each cluster center, and assign each sample to the class of the cluster center at the minimum vector distance.

c) Update the cluster centers: calculate the mean of all sample data in each of the k classes and take that mean as the new cluster center of the class.

d) Repeat step b) and step c) until the newly obtained cluster centers no longer change, or the deviation between the newly obtained cluster centers and the previous cluster centers is smaller than a specified threshold, or the number of iterations reaches the specified limit. Clustering stops as soon as any one of these three conditions is met.
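Steps a) to d) can be sketched in a few lines of Python. This is an illustrative single-machine version, not the Storm component itself; the function name `kmeans_seeded`, the parameters `tol` and `max_iter`, and the handling of empty classes are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def kmeans_seeded(
    samples: Sequence[Sequence[float]],
    initial_centers: Sequence[Sequence[float]],
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Tuple[List[List[float]], List[int]]:
    """K-means that starts from supplied centers instead of random ones."""
    centers = [list(c) for c in initial_centers]  # step a)
    labels = [0] * len(samples)
    for _ in range(max_iter):  # stop condition: iteration limit
        # step b) assign every sample to its nearest center (Euclidean distance)
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(s, centers[k]))
            for s in samples
        ]
        # step c) move each center to the mean of its members
        new_centers = []
        for k, old in enumerate(centers):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: an empty class keeps its old center
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # stop condition: centers unchanged or below threshold
            break
    return centers, labels


# Usage with made-up two-dimensional samples and seeds taken from the data.
samples = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.8, 1.0]]
centers, labels = kmeans_seeded(samples, initial_centers=[[0.0, 0.0], [1.0, 1.0]])
```

A single `shift <= tol` check covers both "no change" (shift of zero) and "change below the threshold", so the three stopping conditions reduce to two tests in code.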

5) Saving and summarizing the calculation results.

Further, the calculation result storage and summarization includes:

the model diagnosis results are stored in a database by the DatabaseBolt component, which facilitates query and retrieval of the diagnosis results in the power industry and related industries; alternatively, the diagnosis results are stored in a data file by the FileBolt component, and the file can be flexibly copied and migrated.

In a second aspect, the present invention provides a parallel fault diagnosis apparatus for electrical devices based on hybrid clustering, including:

the platform deployment module is used for building a storm platform and deploying a machine learning network structure on the storm platform to obtain a fault diagnosis model;

the self-adaptive configuration module is used for self-adaptively configuring the parallelism and the related process number of each component in the storm platform according to historical power grid data;

the data access module is used for feeding the real-time power grid data into a Spout source component of the Storm platform through the IRichSpout interface to form a data stream to be processed;

the data encapsulation module is used for encapsulating the data stream to be processed into a plurality of Tuples in time order and generating a unique ID for each Tuple;

the preprocessing module is used for receiving the Tuples with the PreBolt component and preprocessing the data set in each Tuple by a standard-score method to obtain standardized samples;

and the fault diagnosis module is used for processing the standardized samples by using the fault diagnosis model to obtain a fault diagnosis result of the power equipment.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention provides a hybrid clustering structure based on the Storm platform. A subtractive clustering component is placed upstream in the Storm topology to determine the initial cluster centers. The algorithm clusters quickly, the cluster centers it yields are points in the original data, and the centers lie as far apart as possible. This largely prevents the subsequent K-means clustering algorithm from falling into a local optimum, reduces the number of K-means iterations, and improves the accuracy and efficiency of classification. Compared with the conventional K-means algorithm, the classification accuracy is higher, and the method can accurately classify the streaming data of power grid equipment.

2. The method is suitable for the streaming data of the power equipment, because the data are basically label-free data, and the algorithm is a clustering algorithm and can better process related sample data.

3. The method processes the streaming data of power grid equipment efficiently: the classification model is deployed on the Storm platform, and the efficiency of fault diagnosis is improved by adaptively configuring the number of processes and the parallelism of the source components and processing components.

4. The method can monitor the fault type of the power equipment, can ensure the safe operation of the power equipment, reduce the loss to the production and the life of residents, can discover various faults of the equipment as soon as possible, and avoids catastrophic accidents.

Drawings

FIG. 1 is a schematic diagram of the data processing process of the present invention.

Fig. 2 is a data access flow diagram.

FIG. 3 is a single-machine implementation flow of the hybrid clustering algorithm.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

The first embodiment is as follows:

This embodiment provides a hybrid clustering-based parallel fault diagnosis method for power equipment, which can diagnose streaming data in parallel and in real time and can accurately give the fault type of the sample data. The classification algorithm deployed on the Storm platform achieves high-throughput, low-latency processing of streaming data by setting the number of tasks, the number of cluster nodes, and the numbers of source components and processing components. Monitoring data can thus be diagnosed in real time, and power equipment faults can be found in time.

The invention provides a storm platform-based online parallel diagnosis method for power grid power equipment. The following problems are mainly solved:

(1) The monitoring of the power equipment fault type in the power grid power equipment monitoring field can ensure the safe operation of the power equipment, reduce the loss caused to the production and the life of residents, and timely carry out state monitoring and fault diagnosis on the power equipment, thereby finding various faults of the equipment as soon as possible and avoiding catastrophic accidents.

(2) In the big data of the power system, the monitoring data of various power equipment implies huge commercial value and social value, and more valuable things can be obtained by classifying and mining the high-value data through the method.

(3) When power equipment operates in an extremely severe environment, such as fog, freezing rain, storms or thunderstorms, monitored values go out of limits and the equipment frequently sends alarm data to the monitoring center, producing a surge ("blowout") of monitoring data there. Conventional platforms cannot receive and process data at the rate actually required, real-time performance is not achieved, and data are lost or overwritten. The online parallel fault diagnosis method based on the Storm platform can process such surge data in time.

The method of the invention comprises the following steps:

1) Data source data access

The Spout component serves as the source of the whole topology, and data access is implemented through the IRichSpout interface. The incoming power grid feature vector data form an uninterrupted data stream: the feature vectors are sent continuously to the Spout source component, forming the data stream to be processed. A Tuple is the unit of data passed between components, and each Tuple should encapsulate an appropriate amount of data. Here each Tuple encapsulates 1000 records, called a data set, and is then sent to the pending queue. To make the diagnosis results easier to process in later stages and to preserve the processing order of the tuples, every tuple sent (that is, the data set it carries) is marked with a unique Id that indicates the position of the tuple, or of its data set, in the data stream.
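The batching behaviour described above can be sketched in plain Python, independent of the Storm API. The function name `batch_stream` and the choice to use a running counter as the Id are assumptions for illustration; the text only requires that the Id be unique and reflect the position in the stream.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List, Sequence, Tuple

BATCH_SIZE = 1000  # records per tuple, as stated in the text


def batch_stream(
    records: Iterable[Sequence[float]], batch_size: int = BATCH_SIZE
) -> Iterator[Tuple[int, List[Sequence[float]]]]:
    """Yield (tuple_id, data_set) pairs; tuple_id is the batch position in the stream."""
    it = iter(records)
    for tuple_id in count():
        data_set = list(islice(it, batch_size))
        if len(data_set) < batch_size:
            # Assumption: an incomplete trailing batch stays buffered and is not emitted.
            return
        yield tuple_id, data_set


# Usage: 2500 synthetic five-dimensional feature vectors give two full tuples.
stream = ([float(i)] * 5 for i in range(2500))
pending_queue = list(batch_stream(stream))
```

Because the Id is the batch position, downstream results can be re-ordered or summarized per data set even when tuples are processed in parallel.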

2) Sample data standardization preprocessing

The downstream preprocessing component receives the tuples sent by the upstream Spout component. Each tuple encapsulates the samples to be diagnosed, 1000 feature vectors per tuple, and the preprocessing component preprocesses the feature vector data in each received tuple. Taking the oil chromatography data used for transformer fault diagnosis as an example, the contents of five gases, H₂, CH₄, C₂H₆, C₂H₂ and C₂H₄, are selected as input and preprocessed. The values of these data are spread over wide ranges, and data of the same type also differ greatly. To reduce the influence of these differences in magnitude, the input features are normalized according to the following formula before clustering. Normalization also reduces the number of iterations of the main clustering step in the diagnosis model and improves the clustering accuracy.

The formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

In the above formula, $x'$ ($x' \in [0, 1]$) is the normalized data value; $x_{\min}$ is the minimum value of a given dimension of the tuple data; $x_{\max}$ is the maximum value of that dimension. The normalized value of each feature dimension in the input feature vector set is used as an input sample of the diagnosis model; the DGA (dissolved gas analysis) input vector has the form $[x_1, x_2, x_3, x_4, x_5]^T$.
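As a worked illustration of this per-dimension scaling, the sketch below applies min-max normalization, $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$, to one small data set. The gas values are made up, and mapping a constant column to 0.0 is an assumption added to avoid division by zero; the source does not say how that case is handled.

```python
from typing import List, Sequence


def min_max_normalize(data_set: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature column of one tuple's data set to [0, 1]."""
    columns = list(zip(*data_set))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in data_set:
        normalized.append([
            # Assumption: a constant column (max == min) maps to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return normalized


# Illustrative (made-up) gas contents in uL/L: H2, CH4, C2H6, C2H2, C2H4
samples = [
    [10.0, 5.0, 2.0, 0.0, 1.0],
    [30.0, 15.0, 2.0, 4.0, 3.0],
    [20.0, 10.0, 2.0, 2.0, 2.0],
]
standardized = min_max_normalize(samples)
```

Note that the minimum and maximum are taken within one tuple's data set, as the text describes, so two tuples with different value ranges are scaled independently.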

After the preprocessing component has preprocessed the data of each tuple, the downstream component receives the tuples emitted by the PreBolt preprocessing component. The data in each tuple are the standardized form of the data set with the corresponding Id number that was passed into PreBolt, and are called the standardized samples to be classified. These samples take part in the subsequent fault diagnosis processing, and the results are finally summarized according to the Id numbers.

3) Better initial cluster center selection

After the downstream component receives a tuple sent by the PreBolt preprocessing component, it performs subtractive clustering on the standardized samples to be classified contained in the tuple, thereby obtaining better initial cluster centers.

The subtractive clustering process is packaged as one component, the SCMBolt component. The component receives the tuple transmitted by the upstream component, clusters the data in the tuple, and determines the cluster centers from density values. The algorithm clusters quickly, the cluster centers it yields are points in the original data, and the centers lie as far apart as possible, which largely prevents the subsequent clustering algorithm from falling into a local optimum. When the subtractive clustering algorithm finishes, the initial cluster centers are obtained; the initial cluster centers, the corresponding Id number and the standardized samples to be clustered that belong to that Id number are packaged into a tuple, which is transmitted to the downstream K-means Bolt component.

The theoretical basis of subtractive clustering is as follows:

Subtractive clustering (SCM) is a density-based clustering algorithm.

The dimension of the samples is M, the number of sample points is n, and the sample points are $(x_1, x_2, \ldots, x_n)$. All sample points are normalized into a hypercube. Each sample point is a candidate cluster center. The density index of the sample point $x_i$ is defined as

$$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{(r_a/2)^2}\right)$$

In the above formula, $r_a$ is a positive number. The value of $r_a$ defines a neighborhood radius of the point, and sample points outside this radius contribute little to the density index of the point.

After the density index of each sample point is calculated, the sample point with the highest density index is selected as the first cluster center; $x_{c1}$ is the selected point and $D_{c1}$ is its density index. Before the next cluster center is selected, the density index of each sample point $x_i$ is corrected by the following equation:

$$D_i = D_i - D_{c1} \exp\left(-\frac{\lVert x_i - x_{c1} \rVert^2}{(r_b/2)^2}\right)$$

In the above formula, $r_b$ is a positive number. The density indexes of sample points near the first cluster center $x_{c1}$ are reduced significantly, so these nearby points are unlikely to become new cluster centers. The constant $r_b$ defines the neighborhood within which the density index is significantly reduced. In general $r_b$ is greater than $r_a$, so that cluster centers lying very close to each other are avoided; typically $r_b = 1.5 r_a$.

After the density indexes of all sample points are corrected, a new cluster center $x_{c2}$ is selected and the density indexes of all sample points are corrected again. This process is repeated until enough cluster centers have appeared; the number of cluster centers can be determined automatically according to the conditions.
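The selection loop can be sketched as follows. The sketch assumes the commonly used form of the density index, $D_i = \sum_j \exp(-\lVert x_i - x_j \rVert^2 / (r_a/2)^2)$, with the matching subtraction term using $r_b = 1.5 r_a$. The function name, the default `r_a = 0.5` and the fixed `n_centers` stopping rule are assumptions made for the example; the text leaves the stopping condition open.

```python
import math
from typing import List, Sequence


def subtractive_clustering(
    points: Sequence[Sequence[float]],
    r_a: float = 0.5,
    n_centers: int = 2,
) -> List[List[float]]:
    """Pick n_centers initial centers from `points` by density (points assumed in [0, 1])."""
    r_b = 1.5 * r_a          # ratio given in the text
    alpha = 4.0 / r_a ** 2   # equals 1 / (r_a / 2)^2
    beta = 4.0 / r_b ** 2    # equals 1 / (r_b / 2)^2

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Density index of every point: sum of Gaussian-weighted squared distances.
    density = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points]

    centers = []
    for _ in range(min(n_centers, len(points))):
        c = max(range(len(points)), key=density.__getitem__)
        d_c = density[c]
        centers.append(list(points[c]))
        # Subtract the chosen center's influence so nearby points become unlikely picks.
        density = [
            d - d_c * math.exp(-beta * sq_dist(p, points[c]))
            for d, p in zip(density, points)
        ]
    return centers


# Usage: two well-separated groups of made-up, already normalized points.
points = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [1.0, 1.0], [0.9, 1.0], [1.0, 0.9], [0.9, 0.9],
]
centers = subtractive_clustering(points)
```

The first center falls in the denser group near (1, 1); after the subtraction step the remaining density near that group is small, so the second center is taken from the group near the origin. Both centers are actual data points, which is what makes them usable as K-means seeds.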

4) Standardized classification treatment of samples to be classified

The upstream SCMBolt component transmits tuples to the downstream K-means Bolt component. The K-means Bolt component is the main part of the fault diagnosis model and implements the hard-clustering K-means algorithm internally: it performs K-means clustering on the standardized samples to be clustered that are transmitted by the upstream SCMBolt component, using the cluster centers transmitted by the upstream component as the initial cluster centers. The algorithm updates the cluster centers by iteration and finally obtains the clustering result.

Combining the K-means Bolt component with the SCMBolt component gives a better overall clustering effect than a K-means Bolt component alone: the number of iterations of the main K-means algorithm is reduced and its robustness is enhanced.

The steps of the K-means original algorithm are as follows:

a) Randomly select k different samples from the N sample data as the initial cluster centers.

b) Calculate the vector distance from every sample in the sample set to each cluster center, and assign each sample to the class of the cluster center at the minimum vector distance. The K-means algorithm typically uses the Euclidean distance to classify samples. The formula is as follows:

$$d_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ik} - y_{jk})^2}$$

Here $d_{ij}$ is the Euclidean distance between point $x_i$ and point $y_j$; the coordinates of point $x_i$ are $(x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in})$ and the coordinates of point $y_j$ are $(y_{j1}, y_{j2}, y_{j3}, \ldots, y_{jn})$.

c) Update the cluster centers: calculate the mean of all sample data in each of the k classes and take that mean as the new cluster center of the class.

5) Computation result saving and summarization

The Storm framework is not responsible for saving calculation results. Saving and summarizing the results is done by implementing a Bolt: the results can be written directly to a data file or persisted to a database. Accordingly, the fault diagnosis model offers two ways of handling results, DatabaseBolt and FileBolt. The DatabaseBolt component stores the model diagnosis results in a database, which facilitates query and retrieval of the diagnosis results in the power industry and related industries; the FileBolt component stores the diagnosis results in a data file, which can be flexibly copied and migrated.
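The two result sinks can be illustrated with standard-library Python. This is a sketch of the idea only: the record layout `(tuple_id, sample_index, cluster_label)`, the table name `diagnosis`, and the use of CSV and SQLite are assumptions, since the source does not specify a file format or database.

```python
import csv
import io
import sqlite3
from typing import Iterable, Tuple

Result = Tuple[int, int, int]  # (tuple_id, sample_index, cluster_label), hypothetical layout


def save_to_file(results: Iterable[Result], handle) -> None:
    """FileBolt-style sink: write results as CSV rows to an open text handle."""
    csv.writer(handle).writerows(results)


def save_to_database(results: Iterable[Result], conn: sqlite3.Connection) -> None:
    """DatabaseBolt-style sink: persist results so they can be queried later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS diagnosis "
        "(tuple_id INTEGER, sample_index INTEGER, cluster_label INTEGER)"
    )
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?, ?)", results)
    conn.commit()


results = [(0, 0, 1), (0, 1, 0), (1, 0, 1)]

buffer = io.StringIO()  # stands in for a real data file
save_to_file(results, buffer)

conn = sqlite3.connect(":memory:")  # stands in for the production database
save_to_database(results, conn)

# Summarize by Id: number of samples assigned to each cluster within each tuple.
summary = conn.execute(
    "SELECT tuple_id, cluster_label, COUNT(*) FROM diagnosis "
    "GROUP BY tuple_id, cluster_label ORDER BY tuple_id, cluster_label"
).fetchall()
```

Keeping the tuple Id in every stored row is what allows the per-data-set summary at the end, in line with the Id-based summarization described earlier.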

The method has the following characteristics and functions:

(1) Compared with the conventional K-means algorithm, the classification accuracy of the algorithm is higher, and the method can accurately classify the streaming data of the power grid equipment.

(2) The method is suitable for the streaming data of the power equipment, because the data are basically label-free data, and the algorithm is a clustering algorithm and can better process related sample data.

(3) The method processes the streaming data of power grid equipment efficiently: the classification model is deployed on the Storm platform, and the efficiency of fault diagnosis is improved by adaptively configuring the number of processes and the parallelism of the source components and processing components.

The method can diagnose the online faults of the power data of the power grid equipment in real time, and the clustering algorithm is deployed on the storm platform, so that the high-efficiency classification processing of the streaming data of the power grid equipment is realized. In addition, the method can monitor the fault type of the power equipment, ensure the safe operation of the power equipment, reduce the loss to the production and life of residents, discover various faults of the equipment as soon as possible and avoid catastrophic accidents.

The following is a preferred embodiment of the invention, which comprises the online fault diagnosis of the power grid equipment using the method of the invention, and the characteristics, purposes and advantages thereof will be apparent from the description of the embodiments.

Taking transformer fault diagnosis as an example, transformer oil chromatography data are collected, and the contents (uL/L) of five gases dissolved in the oil, H₂, CH₄, C₂H₆, C₂H₄ and C₂H₂, are selected to form the feature vector. The feature vector data are sent continuously to the fault diagnosis model on the Storm platform, thereby realizing online diagnosis of the transformer.

Before data processing, a Storm cloud platform is built, consisting of one master node and a plurality of slave nodes. Five servers form a physical cluster and are connected by a gigabit switch.

Before the formal power grid streaming data are processed, the online power grid monitoring data stream is simulated with historical data, at a flow rate greater than that of the formal data. The optimal number of processes and the parallelism of the source component and the logic processing components are then configured adaptively through throughput calculation, so that the subsequent formal power grid streaming data can be processed as efficiently as possible.

After the adaptive configuration is completed, the formal power grid streaming data are processed. The first step is data access: the Spout source component is connected to an external data source. To prevent a skewed data set, the data collected are usually oil chromatography measurements taken before and after failures of transformers of the same type at several engineering sites. These data include normal data as well as fault data and are unlabeled samples. The data records are then read into a buffer; when the number of records meets the requirement for one tuple, that is, when it reaches 1000, the records are packaged into one tuple, which is sent to the pending queue for subsequent processing. A suitable parallelism for the source component improves processing efficiency here.

The next step is data preprocessing. The Spout component sends the tuple to the downstream preprocessing component PreBolt, which normalizes the received tuple. Normalization also reduces the number of iterations of the main clustering step in the diagnosis model and improves the clustering accuracy. After preprocessing the data, PreBolt assembles them into a new tuple and sends it to the downstream SCMBolt component.

The classification module as a whole comprises two components, SCMBolt and K-means Bolt. The SCMBolt component implements the subtractive clustering algorithm: it receives the tuple transmitted by the upstream component, clusters the data in the tuple, and determines the cluster centers from density values. The algorithm clusters quickly, the cluster centers it yields are points in the original data, and the centers lie as far apart as possible, which largely prevents the subsequent clustering algorithm from falling into a local optimum. Compared with a K-means Bolt component alone, combining the K-means Bolt component with the SCMBolt component reduces the number of iterations of the main K-means algorithm and enhances its robustness when diagnosing the feature vector data of power grid equipment.

Finally, the results are saved and summarized. The diagnosis results are written directly to a data file or persisted to a database; that is, depending on requirements, the fault diagnosis model handles results with either a DatabaseBolt or a FileBolt. The DatabaseBolt component stores the model diagnosis results in a database, which facilitates query and retrieval of the diagnosis results in the power industry and related industries; the FileBolt component stores the diagnosis results in a data file, which can be flexibly copied and migrated.

Example two:

This embodiment provides a hybrid clustering-based parallel fault diagnosis system for power equipment, comprising:

a preprocessing module, configured to receive the Tuple by using the PreBolt component and to preprocess the data set in the Tuple by the standard score method to obtain a standardized sample;

The system of the embodiment can be used for implementing the method described in the first embodiment.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the technical principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A hybrid clustering-based parallel fault diagnosis method for power equipment, characterized by comprising the following steps:

receiving the Tuple by using a PreBolt component, and preprocessing the data set in the Tuple by a standard score method to obtain a standardized sample;

2. The hybrid clustering-based parallel fault diagnosis method for power equipment according to claim 1, wherein adaptively configuring the parallelism and the related process number of each component in the Storm platform comprises:

adaptively selecting the component parallelism and process number that have the lowest overhead while the data throughput meets the expected throughput.
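By way of non-limiting illustration, the selection rule can be sketched as a filter followed by a minimum. The `Config` record, the throughput estimates, and the `overhead` measure (parallelism plus process count) are assumptions made for this sketch; the source does not define how overhead is measured.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    parallelism: int   # hypothetical: executors assigned to a component
    workers: int       # hypothetical: number of worker processes
    throughput: float  # hypothetical: tuples/s estimated from historical data


def overhead(cfg: Config) -> int:
    # Assumption: overhead is approximated by total executors plus processes.
    return cfg.parallelism + cfg.workers


def pick_config(candidates: Iterable[Config], expected: float) -> Optional[Config]:
    """Return the lowest-overhead candidate whose throughput meets `expected`."""
    feasible = [c for c in candidates if c.throughput >= expected]
    return min(feasible, key=overhead, default=None)
```

Returning `None` when no candidate meets the expected throughput leaves the fallback policy to the caller.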

3. The hybrid clustering-based power equipment parallel fault diagnosis method of claim 1, wherein preprocessing a data set in a Tuple by a standard score method to obtain a standardized sample comprises:

normalizing each data value according to the following formula:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

in the above formula, $x'$ ($x' \in [0, 1]$) is the normalized data value; $x$ is the data value before normalization; $x_{\min}$ is the minimum value of a given dimension of the original data; and $x_{\max}$ is the maximum value of that dimension.
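By way of non-limiting illustration, a per-dimension scaling of a data set into [0, 1] using each dimension's minimum and maximum can be sketched as follows. The function name and the handling of a constant dimension (mapped to 0.0) are assumptions of this sketch.

```python
from typing import List


def min_max_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Scale every dimension (column) of `samples` into [0, 1]."""
    if not samples:
        return []
    dims = len(samples[0])
    mins = [min(row[d] for row in samples) for d in range(dims)]
    maxs = [max(row[d] for row in samples) for d in range(dims)]
    normalized = []
    for row in samples:
        new_row = []
        for d in range(dims):
            span = maxs[d] - mins[d]
            # Assumption: a constant dimension maps to 0.0 to avoid dividing by zero.
            new_row.append((row[d] - mins[d]) / span if span else 0.0)
        normalized.append(new_row)
    return normalized
```

For example, the three samples `[1, 10]`, `[3, 10]`, `[5, 30]` become `[0, 0]`, `[0.5, 0]`, `[1, 1]`.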

4. The parallel fault diagnosis method for the power equipment based on the hybrid clustering is characterized in that the construction method of the fault diagnosis model comprises the following steps:

5. The parallel fault diagnosis method for the electric power equipment based on the hybrid clustering as claimed in claim 4, wherein the step of processing the standardized samples by using the fault diagnosis model to obtain the fault diagnosis result of the electric power equipment comprises the following steps:

6. The parallel fault diagnosis method for the power equipment based on the hybrid clustering is characterized in that the step of determining the optimal initial clustering center of the normalized samples through a subtractive clustering algorithm comprises the following steps:

the SCMBolt component receives the tuple transmitted by the PreBolt component, performs subtractive clustering on the data in the tuple, and determines the clustering centers according to the density values, each obtained clustering center being a point in the original data;

after the subtractive clustering algorithm is completed, the initial clustering centers are obtained; the initial clustering centers, the corresponding ID number, and the standardized samples to be clustered that correspond to the ID number are packaged into a tuple, and the tuple is transmitted to the downstream component K-means Bolt;

the subtractive clustering method comprises the following steps:

the dimension of each sample is M, the number of sample points is n, and the sample points are $(x_1, x_2, \ldots, x_n)$; when the dimension is high, all sample points are normalized into a hypercube; each sample point is a candidate clustering center; the density index of the sample point $x_i$ is defined as

$$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{(r_a/2)^2}\right)$$

in the above formula, $r_a$ is a positive number that defines the neighborhood radius of the point;

after the density index of each sample point is calculated, the sample point with the highest density index is selected as the first clustering center, where $x_{c1}$ is the selected point and $D_{c1}$ is its density index; before the next clustering center is selected, the density index of each sample point $x_i$ is corrected by the following formula:

$$D_i \leftarrow D_i - D_{c1} \exp\left(-\frac{\lVert x_i - x_{c1} \rVert^2}{(r_b/2)^2}\right)$$

in the above formula, $r_b$ is a positive number;
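By way of non-limiting illustration, the density-based center selection can be sketched as follows. The sketch assumes the commonly used Gaussian-kernel form of the density index, a fixed number of centers `k` as the stopping rule, and default radii `r_a = 0.5` and `r_b = 1.5 * r_a`; none of these choices is stated in the source.

```python
import math
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def subtractive_centers(points: List[List[float]], k: int,
                        r_a: float = 0.5, r_b: float = 0.75) -> List[List[float]]:
    """Pick up to k centers; every returned center is one of the input points."""
    alpha = 4.0 / r_a ** 2  # equals 1 / (r_a / 2)^2
    beta = 4.0 / r_b ** 2   # equals 1 / (r_b / 2)^2
    # Density index: points with many close neighbours score highest.
    density = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
               for p in points]
    centers = []
    for _ in range(min(k, len(points))):
        c = max(range(len(points)), key=density.__getitem__)
        d_c = density[c]
        centers.append(list(points[c]))
        # Correct every density so points near the chosen center are suppressed.
        density = [d - d_c * math.exp(-beta * sq_dist(p, points[c]))
                   for d, p in zip(density, points)]
    return centers
```

Because the density of points near a chosen center is reduced, the next center tends to come from a different dense region, which is what keeps the initial centers far apart.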

7. The parallel fault diagnosis method for the electric power equipment based on the hybrid clustering as claimed in claim 5, wherein the clustering is performed by using the optimal initial clustering center obtained by the subtractive clustering as the initial clustering center of the K-means algorithm, and comprises the following steps:

the K-means Bolt component performs K-means clustering on the standardized samples to be clustered that are transmitted by the upstream SCMBolt component; during clustering, the clustering centers transmitted by the upstream SCMBolt component are used as the initial clustering centers of the K-means clustering, the clustering centers are updated iteratively, and the clustering result is finally obtained.

8. The parallel fault diagnosis method for the power equipment based on the hybrid clustering as claimed in claim 7, wherein using the clustering centers transmitted by the upstream SCMBolt component as the initial clustering centers of the K-means clustering and updating the clustering centers iteratively comprises the following steps:

a) Taking the clustering centers transmitted by the upstream SCMBolt component as the initial clustering centers of the K-means clustering;

b) Calculating the vector distance from each sample in the standardized sample set to each clustering center, and assigning each sample to the class of the clustering center at the minimum vector distance;

c) Updating the clustering centers, namely calculating the mean of all sample data in each class and taking these means as the new clustering centers of the k classes;

d) Repeating step b) and step c) until the newly obtained clustering centers no longer change, or the deviation between the newly obtained clustering centers and those obtained in the previous iteration is less than a specified threshold, or the number of iterations reaches a specified limit; clustering stops when any one of these three conditions is met;

e) Saving and summarizing the calculation results.
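By way of non-limiting illustration, steps a) to d) can be sketched as a K-means loop seeded with externally supplied centers. The function name, the default threshold `tol`, the iteration cap `max_iter`, and the choice to keep the previous center for an empty class are assumptions of this sketch.

```python
import math
from typing import List, Tuple


def kmeans_seeded(samples: List[List[float]], init_centers: List[List[float]],
                  tol: float = 1e-6,
                  max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """K-means that starts from given centers instead of random ones."""
    centers = [list(c) for c in init_centers]  # step a)
    labels = [0] * len(samples)
    for _ in range(max_iter):                  # iteration cap from step d)
        # Step b): assign each sample to its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda j: math.dist(s, centers[j]))
                  for s in samples]
        # Step c): recompute each center as the mean of its class.
        new_centers = []
        for j, old in enumerate(centers):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: keep an empty class's center
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # Step d): stop when centers are unchanged or move less than tol.
        if shift < tol:
            break
    return centers, labels
```

Seeding with well-separated centers, such as those produced by subtractive clustering, typically lets the loop converge in fewer passes than random initialization.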

9. The parallel fault diagnosis method based on hybrid clustering of power equipment according to claim 8, wherein the saving and summarizing of calculation results comprises:

storing the model diagnosis results in a database through the DatabaseBolt component, thereby facilitating querying and retrieval of the diagnosis results in the power and related industries; or storing the diagnosis results in a data file through the FileBolt component, the file being flexibly copyable and migratable.

10. A parallel fault diagnosis device of power equipment based on hybrid clustering is characterized by comprising:

the data encapsulation module is used for encapsulating the data stream to be processed into a plurality of Tuples in time order and generating a unique ID for each Tuple;