CN106503717A - Feature extracting method based on the power transmission and transformation main equipment load curve of unsupervised model - Google Patents

Feature extracting method based on the power transmission and transformation main equipment load curve of unsupervised model

Info

Publication number
CN106503717A
Authority
CN
China
Prior art keywords
feature set
power transmission
load curve
main equipment
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610833862.1A
Other languages
Chinese (zh)
Inventor
庄池杰
张斌
胡军
段炼
尹立群
郭丽娟
张玉波
罗怿
曾嵘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Original Assignee
Tsinghua University
Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority to CN201610833862.1A
Publication of CN106503717A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A feature extraction method for load curves of power transmission and transformation main equipment based on an unsupervised model. The extraction steps are: performing feature extraction and selection on the original load curves to obtain a physical feature set; expanding the physical feature set to obtain an extended feature set; superposing the physical feature set and the extended feature set to obtain an attribute feature set; and classifying the attribute feature set. The advantage is that, when abnormal equipment load data is identified, the output of the unsupervised model can be used as the input of a supervised model. That is, information related to the local outlier factor is introduced into the attribute set, so that the attribute set contains information about the interrelations among the loads of the various devices. The load characteristic patterns of the devices are thereby characterized more comprehensively, which in turn improves the predictive performance of the classifier.

Description

Unsupervised model-based power transmission and transformation main equipment load curve feature extraction method
Technical Field
The invention relates to the field of big data mining for electric power systems, and in particular to a method for analyzing load curve characteristics of power transmission and transformation main equipment.
Background
As power systems become increasingly informatized and the volume of power big data grows rapidly, researching algorithms suited to mining power big data and establishing effective knowledge discovery models is of great significance for the innovation and development of smart grid business models.
In a power system, when abnormal load data of power transmission and transformation main equipment is to be identified, the acquired load curves are unlabeled samples; that is, the class (normal or abnormal) of every piece of equipment load data is unknown. In this case, outlier objects, i.e. abnormal patterns in the load data, are found by analyzing the relationships among the loads of the various devices. The whole process is unsupervised learning, and conventional detection of abnormal patterns in equipment load data generally takes this approach. However, the attribute set obtained from the equipment load curves through feature extraction, feature selection and dimension reduction reflects only the characteristic pattern of each device's own load data; it contains no information about the interrelations among the load data of the various devices. The load data of the devices is therefore not comprehensively characterized, and the predictive performance of the classifier is low.
Disclosure of Invention
The invention aims to solve the above problems and provides a feature extraction method for load curves of power transmission and transformation main equipment based on an unsupervised model. The specific design scheme is as follows:
A feature extraction method for load curves of power transmission and transformation main equipment based on an unsupervised model comprises the following extraction steps:
step 1, performing feature extraction and selection on the original load curves to obtain a physical feature set;
step 2, expanding the physical feature set to obtain an extended feature set;
step 3, superposing the physical feature set and the extended feature set to obtain an attribute feature set;
step 4, classifying the attribute feature set.
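As an illustrative, non-limiting sketch of step 3, the superposition can be read as a column-wise concatenation of the two feature sets, one row per device. This reading, the function name `superpose` and the sample values are assumptions made for illustration; the text above does not prescribe a particular data layout.

```python
from typing import List, Sequence


def superpose(physical: Sequence[Sequence[float]],
              extended: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate the physical and extended features of each device (assumed layout)."""
    if len(physical) != len(extended):
        raise ValueError("both feature sets must describe the same devices")
    return [list(p) + list(e) for p, e in zip(physical, extended)]


# Two devices, two physical features each, plus one extended feature each.
physical_set = [[0.2, 1.5], [0.9, 0.3]]
extended_set = [[1.1], [2.7]]
attribute_set = superpose(physical_set, extended_set)
```

The resulting `attribute_set` rows are what a classifier in step 4 would receive under this reading.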
In step 2, the physical feature set is expanded by means of an LOF (local outlier factor) algorithm and a composite line analysis algorithm, the composite line analysis algorithm comprising a principal component analysis algorithm and a factor analysis algorithm.
There are a plurality of physical feature sets, and the attribute feature set may be classified by decision tree classification, Bayesian classification or support vector machine classification.
The LOF algorithm yields a feature index computed from the Euclidean distance;
the extended feature set obtained by the LOF algorithm reflects the distribution of all objects in the physical feature set;
the extended feature set obtained by the LOF algorithm can be introduced directly into the physical feature set.
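By way of a hedged example, a local outlier factor based on Euclidean distance can be computed and appended to each row of the physical feature set as sketched below. The function name `lof_scores`, the neighbourhood size `k=2` and the sample points are illustrative assumptions; the sketch also assumes that no point has more than `k` exact duplicates, so the reachability sums are non-zero.

```python
import math


def lof_scores(points, k=2):
    """Local outlier factor of each point, using Euclidean distance."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, neigh = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neigh.append([j for j in order if dist[i][j] <= kd])
    lrd = []  # local reachability density of each point
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]


# Four clustered objects and one isolated object (hypothetical feature rows).
physical = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(physical, k=2)
# Introduce the LOF value directly into the physical feature set as a new column.
expanded = [list(row) + [s] for row, s in zip(physical, scores)]
```

In this toy data the clustered points receive a factor close to 1, while the isolated point receives a clearly larger factor.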
The composite line analysis algorithm comprises a principal component analysis algorithm and a factor analysis algorithm.
The calculation steps of the composite line analysis algorithm are as follows:
reducing the dimension of the load curve features in the physical feature set;
mapping the dimension-reduced curve features onto a two-dimensional plane;
calculating the local density $\rho_i$ of each point;
calculating, for each point, the distance $\delta_i$ to points of higher local density.
The local density $\rho_i$ is calculated as follows:
$$\rho_i = \sum_{j} \chi(d_{ij} - d_c)$$
wherein $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise, $d_{ij}$ is the distance between point $i$ and point $j$, and $d_c$ is the truncation distance; the truncation distance $d_c$ is a hyper-parameter.
The distance $\delta_i$ to points of higher local density is calculated as follows:
$$\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$$
For the point with the maximum global density, let $\delta_i = \max_{j} d_{ij}$. $\delta_i$ represents the closest distance from point $i$ to a point of higher density.
The local density $\rho_i$ is the number of points whose distance from point $i$ is smaller than the truncation distance $d_c$. $\rho_i$ reflects the density around point $i$, and the calculation is more sensitive to the relative values of $\rho_i$ than to the truncation distance $d_c$.
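The density and distance calculations described above can be sketched as follows. This is an illustrative sketch only: the function name `density_and_distance` and the sample points are hypothetical, and the rule used for the densest point (its largest distance to any other point) is an assumption borrowed from the usual density-peaks convention, since the corresponding formula is not legible in this text.

```python
import math


def density_and_distance(points, d_c):
    """Return (rho, delta) for points on a plane, given truncation distance d_c."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer to point i than d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # Assumed convention: a densest point takes its largest distance.
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


# Three nearby points and one remote point (hypothetical 2-D projections).
plane = [(0.0, 0.0), (0.2, 0.0), (-0.2, 0.0), (3.0, 0.0)]
rho, delta = density_and_distance(plane, d_c=0.3)
```

Because $d_c$ is a hyper-parameter, the same data can be re-run with several values of `d_c` to check how stable the relative ordering of `rho` is.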
The extended feature set comprises an LOF extended feature set, a principal component extended feature set and a factor extended feature set:
the principal component extended feature set is the extended feature set obtained from the physical feature set by the principal component analysis algorithm;
the factor extended feature set is the extended feature set obtained from the physical feature set by the factor analysis algorithm;
the LOF extended feature set is the extended feature set obtained by the LOF algorithm.
The feature extraction method for load curves of power transmission and transformation main equipment based on an unsupervised model, obtained by the above technical scheme, has the following beneficial effects:
when abnormal equipment load data is identified, the output of the unsupervised model can be used as the input of the supervised model. That is, information related to the local outlier factor is introduced into the attribute set, so that the attribute set contains information about the interrelations among the loads of the various devices. The load characteristic patterns of the devices are thereby characterized more comprehensively, which in turn improves the predictive performance of the classifier.
Drawings
Fig. 1 shows the receiver operating characteristic (ROC) curves, also called sensitivity curves, of the principal component extended feature set and the attribute feature set after classification by the support vector machine (SVM) algorithm in the feature extraction method for load curves of power transmission and transformation main equipment based on an unsupervised model;
Fig. 2 shows the ROC curves, also called sensitivity curves, of the factor extended feature set and the attribute feature set after classification by the SVM algorithm in the same method;
Fig. 3 is a comparison table of the area under the curve (AUC) values of the physical feature set and the attribute feature sets after classification by the SVM algorithm in the same method.
In the figures: attribute set a is the principal component extended feature set; attribute set b is the factor extended feature set; attribute set a+ is the attribute feature set obtained by superposing the principal component extended feature set and the physical feature set; and attribute set b+ is the attribute feature set obtained by superposing the factor extended feature set and the physical feature set.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
A feature extraction method for load curves of power transmission and transformation main equipment based on an unsupervised model comprises the following extraction steps:
step 1, performing feature extraction and selection on the original load curves to obtain a physical feature set;
step 2, expanding the physical feature set to obtain an extended feature set;
step 3, superposing the physical feature set and the extended feature set to obtain an attribute feature set;
step 4, classifying the attribute feature set.
In step 2, the physical feature set is expanded by means of an LOF (local outlier factor) algorithm and a composite line analysis algorithm, the composite line analysis algorithm comprising a principal component analysis algorithm and a factor analysis algorithm.
There are a plurality of physical feature sets, and the attribute feature set may be classified by decision tree classification, Bayesian classification or support vector machine classification.
The LOF algorithm yields a feature index computed from the Euclidean distance;
the extended feature set obtained by the LOF algorithm reflects the distribution of all objects in the physical feature set;
the extended feature set obtained by the LOF algorithm can be introduced directly into the physical feature set.
The composite line analysis algorithm comprises a principal component analysis algorithm and a factor analysis algorithm.
The calculation steps of the composite line analysis algorithm are as follows:
reducing the dimension of the load curve features in the physical feature set;
mapping the dimension-reduced curve features onto a two-dimensional plane;
calculating the local density $\rho_i$ of each point;
calculating, for each point, the distance $\delta_i$ to points of higher local density.
The local density $\rho_i$ is calculated as follows:
$$\rho_i = \sum_{j} \chi(d_{ij} - d_c)$$
wherein $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise, $d_{ij}$ is the distance between point $i$ and point $j$, and $d_c$ is the truncation distance; the truncation distance $d_c$ is a hyper-parameter.
The distance $\delta_i$ to points of higher local density is calculated as follows:
$$\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$$
For the point with the maximum global density, let $\delta_i = \max_{j} d_{ij}$. $\delta_i$ represents the closest distance from point $i$ to a point of higher density.
The local density $\rho_i$ is the number of points whose distance from point $i$ is smaller than the truncation distance $d_c$. $\rho_i$ reflects the density around point $i$, and the calculation is more sensitive to the relative values of $\rho_i$ than to the truncation distance $d_c$.
The extended feature set comprises an LOF extended feature set, a principal component extended feature set and a factor extended feature set:
the principal component extended feature set is the extended feature set obtained from the physical feature set by the principal component analysis algorithm;
the factor extended feature set is the extended feature set obtained from the physical feature set by the factor analysis algorithm;
the LOF extended feature set is the extended feature set obtained by the LOF algorithm.
Example 1
The data set used in this embodiment is the load data of 6200 pieces of power transmission and transformation main equipment over 18 months, sampled at 30-minute intervals. Because the study focuses on abnormalities in long-term equipment load data, one month (taken as 30 days) is used as the time unit: the original data set is processed and the monthly average load of each device is calculated to reflect the characteristic pattern of the data. There are 111600 load curves in total in this example. The load data of the 6200 devices comprises 6123 normal cases and 77 abnormal cases, an abnormal proportion of 1.24%. The input of the model is the original data set, and the output is the degree of abnormality of the equipment load data and a ranking by suspected probability.
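The figures in this example can be checked with a short, non-limiting sketch. It assumes that each month is represented by 30 days of 48 half-hourly samples and that each device-month counts as one load curve; the function name `monthly_average_load` and the demonstration series are hypothetical.

```python
SAMPLES_PER_DAY = 24 * 60 // 30                         # 48 samples at 30-minute spacing
DAYS_PER_MONTH = 30                                     # one month taken as 30 days
SAMPLES_PER_MONTH = SAMPLES_PER_DAY * DAYS_PER_MONTH    # 1440


def monthly_average_load(samples):
    """Average a device's half-hourly load samples month by month."""
    if len(samples) % SAMPLES_PER_MONTH != 0:
        raise ValueError("expected a whole number of 30-day months")
    return [sum(samples[s:s + SAMPLES_PER_MONTH]) / SAMPLES_PER_MONTH
            for s in range(0, len(samples), SAMPLES_PER_MONTH)]


n_devices, n_months = 6200, 18
n_curves = n_devices * n_months                     # one curve per device-month
abnormal_pct = round(100 * 77 / n_devices, 2)       # share of abnormal devices

# Hypothetical two-month series: constant load of 1.0, then 3.0.
demo = [1.0] * SAMPLES_PER_MONTH + [3.0] * SAMPLES_PER_MONTH
monthly = monthly_average_load(demo)
```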
According to the physical meaning of the equipment load curve, 14 physical features are extracted to form the physical feature set: the descending trend index trb; the differences of the average loads before and after 1, 3 and 6 months, davg_1, davg_3 and davg_6; the standard deviation sd of the whole monthly average load sequence; the standard deviations bsd_6 and esd_6 of the average load sequences before and after 6 months; the ratios of the average loads of the last 3, 6 and 9 months to the average load of all months (rates 3, 6 and 9); the slope slo of a linear fit to the average load data; the modulus dfou_6 of the Fourier coefficient difference sequence before and after 6 months; and the correlation coefficient cor between each equipment load sequence and the median load sequence of all equipment. In addition, a historical evaluation grade record of the equipment load data is added, with five discrete grades A, B, C, D and E from high to low. The 14 features are denoted V1-V14.
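A few of the listed features can be sketched for a single monthly average load sequence as follows. The names `sd` and `slo` follow the text; `ratio_last_k`, the use of the population standard deviation, and the month index 0..n-1 as the regressor are assumptions made for illustration, and the remaining features are not attempted here.

```python
import math
import statistics


def physical_features(monthly_avg, last_k=3):
    """Compute sd, slo and a last-k-month ratio for one device (illustrative)."""
    n = len(monthly_avg)
    mean_all = statistics.fmean(monthly_avg)
    sd = statistics.pstdev(monthly_avg)          # assumed: population std deviation
    xs = range(n)                                # month index used as regressor
    x_mean = (n - 1) / 2
    # Least-squares slope of the load against the month index.
    slo = (sum((x - x_mean) * (y - mean_all) for x, y in zip(xs, monthly_avg))
           / sum((x - x_mean) ** 2 for x in xs))
    ratio_last_k = statistics.fmean(monthly_avg[-last_k:]) / mean_all
    return {"sd": sd, "slo": slo, "ratio_last_k": ratio_last_k}


# Hypothetical device whose monthly average load falls steadily from 18 to 1.
loads = [float(18 - m) for m in range(18)]
features = physical_features(loads)
```

A steadily declining sequence such as this one gives a negative slope and a last-3-month ratio well below 1.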
Example 2
On the basis of Example 1:
principal component analysis and factor analysis are performed on the physical feature set to obtain attribute set a and attribute set b, each with 14 attributes corresponding to the physical features, together with the corresponding local density $\rho_i$ and distance $\delta_i$ to points of higher local density;
LOF calculation is performed on attribute set a and attribute set b to obtain LOF values;
attribute set a and attribute set b are each superposed with the physical feature set to obtain attribute set a+ and attribute set b+, and the sets are classified with the support vector machine (SVM) algorithm.
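For illustration only, the dimension reduction onto a two-dimensional plane can be sketched with a small principal component analysis. The sketch substitutes plain power iteration with deflation for a library eigen-solver, which is a choice made here and not stated in the text; the function names and the sample rows are hypothetical, and the sketch assumes the data has variance in at least two directions.

```python
import math
import random


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _top_eigvec(cov, iters=500):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(0)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iters):
        w = _matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project feature rows onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[c] for r in rows) / n for c in range(d)]
    centered = [[r[c] - mean[c] for c in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    comps = []
    for _ in range(2):
        v = _top_eigvec(cov)
        lam = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        comps.append(v)
        # Deflate: remove the component just found before searching for the next.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(x[c] * v[c] for c in range(d)) for v in comps] for x in centered]


# Hypothetical three-feature rows whose variance lies along the first two axes.
rows = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
plane = pca_2d(rows)
```

The rows of `plane` are two-dimensional points on which a local density and a distance to higher-density points could then be computed.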
Example 3
On the basis of Example 2, receiver operating characteristic calculation is performed on attribute set a and attribute set a+ to obtain ROC curves with the false positive rate (FPR, i.e. 1 - specificity) as the abscissa and the sensitivity (true positive rate, TPR) as the ordinate.
Fig. 1 shows the ROC curves of the principal component extended feature set and the attribute feature set after classification by the SVM algorithm. As the calculation results in Fig. 1 show, using the output of the unsupervised equipment load abnormal data identification model as the input of the supervised model, thereby expanding the original attribute set, effectively improves the performance of the classifier.
Example 4
On the basis of Example 2, receiver operating characteristic calculation is performed on attribute set b and attribute set b+ to obtain ROC curves with the false positive rate (FPR, i.e. 1 - specificity) as the abscissa and the sensitivity (true positive rate, TPR) as the ordinate.
Fig. 2 shows the ROC curves of the factor extended feature set and the attribute feature set after classification by the SVM algorithm. As the calculation results in Fig. 2 show, using the output of the unsupervised equipment load abnormal data identification model as the input of the supervised model, thereby expanding the original attribute set, effectively improves the performance of the classifier.
Example 5
On the basis of Examples 3 and 4, the area under the curve (AUC), i.e. the area enclosed by the ROC curve and the coordinate axes, is calculated. Fig. 3 is a comparison table of the AUC values of the physical feature set and the attribute feature sets after classification by the SVM algorithm. As the calculation results in Fig. 3 show, using the output of the unsupervised equipment load abnormal data identification model as the input of the supervised model, thereby expanding the original attribute set, effectively improves the performance of the classifier.
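A minimal sketch of the ROC and AUC calculation is given below, with the false positive rate on the abscissa and the sensitivity on the ordinate. The labels and classifier scores are invented for illustration and are not results from this example; the function names are likewise hypothetical.

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR) obtained by sweeping the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples sharing this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = abnormal) and classifier scores for six devices.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

Comparing `area` between a classifier trained on the physical feature set and one trained on an attribute feature set is the kind of comparison tabulated in Fig. 3.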
The technical solutions described above represent only preferred embodiments of the present invention. Modifications that those skilled in the art may make to parts of these solutions embody the principles of the present invention and fall within its scope of protection.

Claims (8)

1. A feature extraction method for load curves of power transmission and transformation main equipment based on an unsupervised model, characterized by comprising the following extraction steps:
step 1, performing feature extraction and selection on the original load curves to obtain a physical feature set;
step 2, expanding the physical feature set to obtain an extended feature set;
step 3, superposing the physical feature set and the extended feature set to obtain an attribute feature set;
step 4, classifying the attribute feature set.
2. The unsupervised model-based feature extraction method for load curves of power transmission and transformation main equipment according to claim 1, characterized in that in step 2 the physical feature set is expanded by means of an LOF algorithm and a composite line analysis algorithm, the composite line analysis algorithm comprising a principal component analysis algorithm and a factor analysis algorithm.
3. The unsupervised model-based feature extraction method for load curves of power transmission and transformation main equipment according to claim 1, characterized in that there are a plurality of physical feature sets, and the attribute feature set is classified by decision tree classification, Bayesian classification or support vector machine classification.
4. The unsupervised model-based feature extraction method for load curves of power transmission and transformation main equipment according to claim 2, characterized in that:
the LOF algorithm yields a feature index computed from the Euclidean distance;
the extended feature set obtained by the LOF algorithm reflects the distribution of all objects in the physical feature set;
the extended feature set obtained by the LOF algorithm can be introduced directly into the physical feature set;
the composite line analysis algorithm comprises a principal component analysis algorithm and a factor analysis algorithm.
5. The unsupervised model-based feature extraction method for load curves of power transmission and transformation main equipment according to claim 2, characterized in that the composite line analysis algorithm comprises the following calculation steps:
reducing the dimension of the load curve features in the physical feature set;
mapping the dimension-reduced curve features onto a two-dimensional plane;
calculating the local density $\rho_i$ of each point;
calculating, for each point, the distance $\delta_i$ to points of higher local density.
6. The unsupervised model-based feature extraction method for load curves of power transmission and transformation main equipment according to claim 5, characterized in that
the local density $\rho_i$ is calculated as follows:
$$\rho_i = \sum_{j} \chi(d_{ij} - d_c)$$
wherein $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise, $d_{ij}$ is the distance between point $i$ and point $j$, and $d_c$ is the truncation distance; the truncation distance $d_c$ is a hyper-parameter;
the distance $\delta_i$ to points of higher local density is calculated as follows:
$$\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$$
for the point with the maximum global density, let $\delta_i = \max_{j} d_{ij}$; $\delta_i$ represents the closest distance from point $i$ to a point of higher density.
7. The unsupervised model-based feature extraction method for load curves of power transmission and transformation main equipment according to claim 6, characterized in that the local density $\rho_i$ is the number of points whose distance from point $i$ is smaller than the truncation distance $d_c$; $\rho_i$ reflects the density around point $i$, and the calculation is more sensitive to the relative values of $\rho_i$ than to the truncation distance $d_c$.
8. The unsupervised model-based feature extraction method for load curves of power transmission and transformation main equipment, characterized in that the extended feature set comprises an LOF extended feature set, a principal component extended feature set and a factor extended feature set:
the principal component extended feature set is the extended feature set obtained from the physical feature set by the principal component analysis algorithm;
the factor extended feature set is the extended feature set obtained from the physical feature set by the factor analysis algorithm;
the LOF extended feature set is the extended feature set obtained by the LOF algorithm.
CN201610833862.1A 2016-09-19 2016-09-19 Feature extracting method based on the power transmission and transformation main equipment load curve of unsupervised model Pending CN106503717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610833862.1A CN106503717A (en) 2016-09-19 2016-09-19 Feature extracting method based on the power transmission and transformation main equipment load curve of unsupervised model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610833862.1A CN106503717A (en) 2016-09-19 2016-09-19 Feature extracting method based on the power transmission and transformation main equipment load curve of unsupervised model

Publications (1)

Publication Number Publication Date
CN106503717A true CN106503717A (en) 2017-03-15

Family

ID=58291432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610833862.1A Pending CN106503717A (en) 2016-09-19 2016-09-19 Feature extracting method based on the power transmission and transformation main equipment load curve of unsupervised model

Country Status (1)

Country Link
CN (1) CN106503717A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214840A (en) * 2017-06-30 2019-01-15 阿里巴巴集团控股有限公司 A kind of data dependence analysis method and device
CN110956281A (en) * 2019-10-29 2020-04-03 广东电网有限责任公司 Power equipment abnormity detection alarm system based on Log analysis

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509178A (en) * 2011-11-25 2012-06-20 江苏省电力公司淮安供电公司 Distribution network device status evaluating system
CN102999791A (en) * 2012-11-23 2013-03-27 广东电网公司电力科学研究院 Power load forecasting method based on customer segmentation in power industry
CN103838959A (en) * 2013-12-18 2014-06-04 国网上海市电力公司 Method for applying partial least squares regression to power distribution network harmonic source positioning and detecting
CN103872782A (en) * 2014-03-31 2014-06-18 国家电网公司 Electric energy quality data comprehensive service system
CN104252686A (en) * 2014-08-15 2014-12-31 国家电网公司 Determination method for power grid safety aggregative indicators

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
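Zhuang Chijie (庄池杰) et al.: "Anomaly detection for power consumption patterns of electricity users based on unsupervised learning" (基于无监督学习的电力用户异常用电模式检测), Proceedings of the CSEE (《中国电机工程学报》) *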

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170315