CN116304931B

CN116304931B - Electric power data mining method based on big data

Info

Publication number: CN116304931B
Application number: CN202310530075.XA
Authority: CN
Inventors: 李营; 李孟雷; 王修伦; 崔玉静
Original assignee: Shandong Yingwei Electronic Technology Co ltd
Current assignee: Shandong Yingwei Electronic Technology Co ltd
Priority date: 2023-05-12
Filing date: 2023-05-12
Publication date: 2023-08-04
Anticipated expiration: 2043-05-12
Also published as: CN116304931A

Abstract

The invention discloses a power data mining method based on big data for mining multiple types of power data, such as power load data, power generation data, power supply quality data, power consumer behavior data and weather data. The method combines recursive classification, data clustering, direction vector consistency analysis and frequency domain compression: interference data blocks are removed and the remaining valid data blocks are compressed in the frequency domain, which makes data mining more efficient. The method is efficient, accurate and widely applicable to data mining in power systems, and helps improve the safety, stability and reliability of the power system.

Description

Electric power data mining method based on big data

Technical Field

The invention relates to the technical field of data processing, in particular to an electric power data mining method based on big data.

Background

With the development of society, energy has become a global focus of attention. Electric power is one of the most important forms of energy and is vital to economic and social development. The production and use of electricity generate large amounts of data, such as power load data, power generation data, power quality data, power consumer behavior data and weather data. These data provide important reference information for the management and operation of the power industry, but they also bring the problems of huge data volume and low processing efficiency. For this reason, power data mining has become a research hotspot in the power field.

Currently, power data mining techniques have been widely used, with the most common being cluster analysis-based approaches. This approach typically breaks the power data into several data blocks, which are then clustered to find rules and associations in the data. There have been many patent documents that propose different power data mining methods.

For example, U.S. patent No. 8484001B2 discloses a method of power load prediction based on cluster analysis. The method obtains statistical characteristics of the power load by clustering historical power load data, and predicts future power load by utilizing the characteristics. The method can effectively improve the accuracy of power load prediction, but does not consider the classification problem of power data, is easily influenced by data noise and abnormal values, and has higher requirements on data preprocessing.

In addition, Chinese patent CN101812877B discloses a method for clustering power data. The method first performs cluster analysis on the power data and then maps the clustering result onto a two-dimensional plane for visual display. This makes the clustering result easy to display, but because the result is shown only on a two-dimensional plane, it is difficult to analyze and mine the data in depth.

Some patent documents also propose power data mining methods based on frequency domain analysis. For example, Chinese patent CN103431902B discloses a method for extracting power data features based on the wavelet transform. The method applies a wavelet transform to the power data and analyzes the wavelet coefficients to extract the features of the power data. It extracts the features effectively, but the complexity of wavelet analysis makes it computationally expensive and slow.

Although the above patent documents propose different power data mining methods, these methods still have problems. For example, methods based on wavelet analysis decompose the power data by wavelet packet decomposition, which has relatively poor signal decomposition capability and easily decomposes high-frequency components into low-frequency components, resulting in poor quality of the decomposed signals. Methods based on singular value decomposition have relatively poor noise immunity: noise degrades the quality of the cluster centers and, in turn, the accuracy of the data mining result.

In addition, existing power data mining methods often mine only a single type of power data, for example only power load data or only power supply quality data. They lack fusion analysis of different types of power data and therefore cannot comprehensively reflect the overall operating condition of the power system. Existing methods also suffer from problems that have not been effectively solved, such as unstable clustering results and a large amount of calculation.

Disclosure of Invention

The invention aims to provide a power data mining method based on big data, which adopts technologies such as recursive classification, singular value decomposition, direction vector consistency analysis, frequency domain compression and the like, can efficiently and accurately mine key information in power data, and improves the reliability and economy of a power system.

In order to solve the technical problems, the invention provides a power data mining method based on big data, which comprises the following steps:

Step S1: acquiring power data and recursively classifying the power data based on a tree structure, which specifically includes: performing a first classification on the acquired power data according to a set classification rule to obtain a plurality of first classification data, performing a second classification on the first classification data according to a set classification rule, and so on, until the classification data corresponding to each terminal node of the resulting classification tree contains only one data value, thereby obtaining the classification tree of the power data;

Step S2: acquiring the data values contained in all terminal nodes of the classification tree, performing data clustering to obtain a plurality of data blocks with different cluster centers, and calculating the direction vector of each data block;

Step S3: performing data consistency analysis based on the direction vectors, identifying each data block whose direction vector deviates from the consistency range as an interference data block, and removing the interference data blocks;

Step S4: for the remaining data blocks, performing frequency domain compression on each data block according to its calculated direction vector, and taking the frequency domain compressed data blocks as the data mining result.

Further, the obtained power data at least includes the following power data: power load data, power generation data, power quality data, power consumer behavior data, and weather data.

Further, in the step S1, the classification rules used in each classification are different; the classification rule at least comprises the following categories: time rules, numerical rules, type rules, data type rules, location rules; the time rule is defined as: a rule for classifying the power data according to the time of acquiring the power data; the numerical rule is defined as: a rule for classifying according to the difference of the numerical ranges of the power data; the type rule is defined as: a rule for classifying the power data according to the category to which the power data belongs; the data type rule is defined as: a rule for classifying the electric power data according to the type to which the data value of the electric power data belongs; the location rule is defined as: a rule for classifying the acquired power data according to different node positions of the acquired power data in a power system; when classifying the power data, the first classification is to classify the power data according to a set type rule, and the last classification is to classify the power data according to a set numerical rule.

Further, the step S2 specifically includes: first, acquiring the data values contained in all terminal nodes of the classification tree and taking them as the input data of the clustering algorithm; performing singular value decomposition on the input data by using the following formula to obtain the singular value decomposition matrix of the data:

$$X = U \Sigma V^{T}$$

where $X$ denotes the input data, $U$ and $V$ denote the two orthogonal matrices of the decomposition, and $\Sigma$ denotes a diagonal matrix whose diagonal elements are the singular values; truncating $\Sigma$ to obtain a new diagonal matrix $\Sigma_k$:

$$\Sigma_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k)$$

where $k$ denotes the number of singular values to be retained and $\sigma_i$ denotes the $i$-th singular value; substituting $\Sigma_k$ into the singular value decomposition, together with the corresponding first $k$ columns $U_k$ and $V_k$ of $U$ and $V$, to obtain the truncated singular value decomposition matrix:

$$X_k = U_k \Sigma_k V_k^{T}$$

where $X_k$ denotes the truncated singular value decomposition matrix; taking $X_k$ as input data and clustering it with the decomposition clustering algorithm to obtain a plurality of cluster centers; and calculating the direction vector of each cluster center, that is, treating each cluster center as a vector and normalizing it.

Further, the formula of the decomposition clustering algorithm is as follows:

$$J = \sum_{i=1}^{n} \min_{1 \le j \le K} \lVert x_i - c_j \rVert^{2}$$

where $n$ denotes the number of samples, $K$ denotes the number of cluster centers, $x_i$ denotes the $i$-th sample, and $c_j$ denotes the $j$-th cluster center; the formula for calculating the direction vector is as follows:

$$d_j = \frac{c_j}{\lVert c_j \rVert}$$

where $c_j$ denotes the $j$-th cluster center.
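The clustering and normalization steps can be sketched as follows. This is a minimal illustration only: it assumes that the "decomposition clustering algorithm" minimizes a sum of squared distances and substitutes a plain k-means (Lloyd) iteration for it. The helper names `farthest_first_init`, `kmeans` and `direction_vectors`, and the synthetic two-blob data, are hypothetical and not taken from the patent.

```python
import numpy as np

def farthest_first_init(X, k):
    """Deterministic initialisation: start at X[0], then add the farthest point each time."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers)

def kmeans(X, k, iters=100):
    """Plain Lloyd iteration (an assumed stand-in for the patent's clustering step)."""
    centers = farthest_first_init(X, k)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    # Recompute assignments and the sum-of-squared-distances objective for the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return centers, labels, float(d2.min(axis=1).sum())

def direction_vectors(centers):
    """Normalise each cluster center to unit length (zero vectors are left unchanged)."""
    norms = np.linalg.norm(centers, axis=1, keepdims=True)
    return centers / np.where(norms == 0, 1.0, norms)

# Synthetic data: two well-separated groups of 2-D samples.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(5.0, 0.0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.1, size=(20, 2)),
])
centers, labels, J = kmeans(X, k=2)
D = direction_vectors(centers)
```

Each row of `D` is a unit-length direction vector, so later angle computations depend only on direction and not on the magnitude of the cluster center.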

Further, the step S3 specifically includes: for the direction vector $d_i$ of each cluster center, calculating the included angle $\theta_{ij}$ between it and the direction vector $d_j$ of every other cluster center, where $i \neq j$, by using the following formula:

$$\theta_{ij} = \arccos\left(\frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert}\right)$$

where $d_i$ denotes the direction vector of the $i$-th cluster center, $\lVert \cdot \rVert$ denotes the modulus of a vector, and $\arccos$ denotes the inverse cosine function; for the direction vector $d_i$ of each cluster center, calculating its average included angle $\bar{\theta}_i$ with all other direction vectors:

$$\bar{\theta}_i = \frac{1}{K - 1} \sum_{j \neq i} \theta_{ij}$$

where $K$ denotes the number of cluster centers; according to a given consistency range radius $r$, calculating the lower limit $\theta_{\min}$ and the upper limit $\theta_{\max}$ of the consistency range:

(formula not reproduced in the source text)

where $r$ denotes the radius of the consistency range and $\pi$ denotes the circle constant; for each data block, calculating the included angle $\varphi_j$ between its direction vector $v$ and the direction vector $d_j$ of each cluster center:

$$\varphi_j = \arccos\left(\frac{v \cdot d_j}{\lVert v \rVert \, \lVert d_j \rVert}\right)$$

where $v$ denotes the direction vector of the data block and $d_j$ denotes the direction vector of the $j$-th cluster center; judging whether the direction vector of the data block is within the consistency range, namely:

$$\theta_{\min} \le \varphi_j \le \theta_{\max}$$

if the direction vector of the data block is within the consistency range, the data block is considered to meet the consistency requirement; otherwise, it is considered to deviate from the consistency range, is regarded as an interference data block, and is removed.
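The angle-based screening can be illustrated with a short sketch. Because the exact limits of the consistency range cannot be recovered from this text, the example uses an assumed, simplified rule: a direction vector is flagged as interference when its mean angle to all other direction vectors exceeds a threshold. The function names and the parameter `max_angle` are hypothetical.

```python
import numpy as np

def pairwise_angles(D):
    """Angles (radians) between all pairs of row vectors in D."""
    unit = D / np.linalg.norm(D, axis=1, keepdims=True)
    cos = np.clip(unit @ unit.T, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return np.arccos(cos)

def mean_angle_to_others(D):
    """Average angle between each vector and every other vector."""
    theta = pairwise_angles(D)
    np.fill_diagonal(theta, 0.0)  # exclude the angle of a vector with itself
    return theta.sum(axis=1) / (len(D) - 1)

def interference_mask(D, max_angle):
    """Assumed rule: True where the mean angle to the others exceeds max_angle."""
    return mean_angle_to_others(D) > max_angle

# Three vectors point roughly the same way; the last one points the opposite way.
D = np.array([[1.0, 0.0], [0.98, 0.2], [0.95, -0.3], [-1.0, 0.1]])
mask = interference_mask(D, max_angle=np.pi / 2)
kept = D[~mask]
```

With these inputs only the last vector is flagged, so `kept` holds the three mutually consistent directions.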

Further, the step S4 specifically includes: performing an FFT on each data block $x_i(t)$ to obtain its frequency domain representation $X_i(f)$, where $f$ denotes the frequency; dividing $X_i(f)$ into $M$ sub-bands $X_{i,m}(f)$, where $m = 1, 2, \ldots, M$; compressing each sub-band $X_{i,m}(f)$ as follows: taking the absolute value of $X_{i,m}(f)$ to obtain the amplitude spectrum $A_{i,m}(f)$, where $A_{i,m}(f) = \lvert X_{i,m}(f) \rvert$; dividing $A_{i,m}(f)$ into $P$ intervals $[a_p, b_p]$, where $p = 1, 2, \ldots, P$, $a_p$ denotes the left end point of the $p$-th interval and $b_p$ denotes the right end point of the $p$-th interval; for each interval $[a_p, b_p]$, calculating the average value $\mu_p$ of $A_{i,m}(f)$ in the interval; for each interval $[a_p, b_p]$, calculating the standard deviation $s_p$ of $A_{i,m}(f)$ in the interval; for each interval $[a_p, b_p]$, if $s_p > \varepsilon$, where $\varepsilon$ is a preset parameter representing the tolerance, dividing the interval into two sub-intervals $[a_p, c_p]$ and $[c_p, b_p]$, where $c_p$ is the midpoint of the interval $[a_p, b_p]$; and expressing the compressed sub-band $\tilde{X}_{i,m}(f)$ as the average value $\mu_q$ of the amplitude spectrum over each resulting interval or sub-interval $I_q$, namely:

$$\tilde{X}_{i,m}(f) = \mu_q \quad \text{for } f \in I_q$$

for each data block $x_i(t)$, combining all compressed sub-bands $\tilde{X}_{i,m}(f)$ to obtain the compressed frequency domain representation $\tilde{X}_i(f)$, where $f$ denotes the frequency.

Further, compressing each sub-band $X_{i,m}(f)$ according to the frequency domain compression algorithm specifically includes: taking the absolute value of $X_{i,m}(f)$ to obtain the amplitude spectrum $A_{i,m}(f)$, where $A_{i,m}(f) = \lvert X_{i,m}(f) \rvert$; dividing $A_{i,m}(f)$ into $P$ intervals $[a_p, b_p]$, where $p = 1, 2, \ldots, P$, $a_p$ denotes the left end point of the $p$-th interval and $b_p$ denotes the right end point of the $p$-th interval; for each interval $[a_p, b_p]$, calculating the average value $\mu_p$ of $A_{i,m}(f)$ over the $N_p$ frequency samples in the interval:

$$\mu_p = \frac{1}{N_p} \sum_{f \in [a_p, b_p]} A_{i,m}(f)$$

for each interval $[a_p, b_p]$, calculating the standard deviation $s_p$ of $A_{i,m}(f)$ in the interval:

$$s_p = \sqrt{\frac{1}{N_p} \sum_{f \in [a_p, b_p]} \left( A_{i,m}(f) - \mu_p \right)^{2}}$$

for each interval $[a_p, b_p]$, if $s_p > \varepsilon$, where $\varepsilon$ is a preset parameter representing the tolerance, dividing the interval into two sub-intervals $[a_p, c_p]$ and $[c_p, b_p]$, where $c_p$ is the midpoint of the interval $[a_p, b_p]$:

$$c_p = \frac{a_p + b_p}{2}$$

and expressing the compressed sub-band $\tilde{X}_{i,m}(f)$ as the average value $\mu_q$ of the amplitude spectrum over each resulting interval or sub-interval $I_q$, namely:

$$\tilde{X}_{i,m}(f) = \mu_q \quad \text{for } f \in I_q$$
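The sub-band compression described above can be sketched as follows. This is an illustrative reading of the text, not the patented implementation: intervals are taken as equal-width index ranges, an interval is split only once at its midpoint, and only magnitudes are kept (as the absolute-value step implies). The function names `compress_subband` and `compress_block` and the parameters `n_subbands`, `n_intervals` and `tol` are hypothetical.

```python
import numpy as np

def compress_subband(sub, n_intervals, tol):
    """Piecewise-constant approximation of the amplitude spectrum of one sub-band."""
    mag = np.abs(sub)  # amplitude spectrum
    edges = np.linspace(0, len(mag), n_intervals + 1).astype(int)
    out = np.empty_like(mag)
    for a, b in zip(edges[:-1], edges[1:]):
        if b <= a:  # empty interval (more intervals than frequency bins)
            continue
        if b - a >= 2 and mag[a:b].std() > tol:
            m = (a + b) // 2  # split once at the midpoint
            out[a:m] = mag[a:m].mean()
            out[m:b] = mag[m:b].mean()
        else:
            out[a:b] = mag[a:b].mean()
    return out

def compress_block(x, n_subbands, n_intervals, tol):
    """FFT a data block, compress every sub-band, and recombine the results."""
    spectrum = np.fft.fft(x)
    subbands = np.array_split(spectrum, n_subbands)
    return np.concatenate([compress_subband(s, n_intervals, tol) for s in subbands])

# Small hand-checkable sub-band: the second interval has a large spread.
demo = np.array([1, 1, 1, 1, 0, 0, 4, 4], dtype=complex)
fine = compress_subband(demo, n_intervals=2, tol=0.5)     # second interval is split
coarse = compress_subband(demo, n_intervals=2, tol=10.0)  # no interval is split

# Whole-block example on a synthetic sine wave.
t = np.arange(64)
x = np.sin(2 * np.pi * 4 * t / 64)
compressed = compress_block(x, n_subbands=4, n_intervals=4, tol=1.0)
```

In `fine` the spread of the second interval exceeds the tolerance, so its two halves keep their own means. In `coarse` the whole interval collapses to a single mean.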

Further, performing an IFFT on $\tilde{X}_i(f)$ to obtain the time domain representation $\tilde{x}_i(t)$ of the compressed data block includes: performing the IFFT on the compressed frequency domain representation $\tilde{X}_i(f)$ to obtain the time domain representation $\tilde{x}_i(t)$; letting the length of $\tilde{X}_i(f)$ be $N$, the length of the time domain representation $\tilde{x}_i(t)$ after the IFFT is also $N$, and the formula is as follows:

$$\tilde{x}_i(t) = \frac{1}{N} \sum_{f=0}^{N-1} \tilde{X}_i(f) \, e^{\mathrm{j} 2 \pi f t / N}$$

where $\mathrm{j}$ denotes the imaginary unit and $t$ denotes the time; $\tilde{x}_i(t)$ represents the compressed data block as a time domain signal of length $N$.
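As a quick check of the length-preserving property, the sketch below evaluates the standard inverse DFT sum directly and compares it with NumPy's `np.fft.ifft`. The helper name `inverse_dft` and the sample values are hypothetical.

```python
import numpy as np

def inverse_dft(X):
    """Direct evaluation of x(t) = (1/N) * sum_f X(f) * exp(j*2*pi*f*t/N)."""
    N = len(X)
    f = np.arange(N)
    t = f.reshape(-1, 1)  # one row per output time index
    return (X * np.exp(2j * np.pi * f * t / N)).sum(axis=1) / N

x = np.array([0.0, 1.0, 0.5, -0.5, -1.0, 0.25, 0.0, 2.0])
X = np.fft.fft(x)
x_back = np.fft.ifft(X)   # library inverse transform
x_manual = inverse_dft(X)  # same result from the explicit sum
```

Both results have the same length as the spectrum. Note that if the compressed spectrum holds amplitudes only, the phase of the original signal is no longer available, so the inverse transform will in general not reproduce the original waveform.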

Further, the number of sub-bands ranges from 120 to 200.

The big data based power data mining method has the following beneficial effects: it makes full use of big data technology and data mining algorithms to process and analyze the massive data in the power system effectively, thereby enabling comprehensive monitoring, prediction and optimization of the power system. The core technologies of the invention include recursive classification based on a tree structure, the decomposition clustering algorithm, direction vector analysis and the frequency domain compression algorithm.

First, the recursive classification based on a tree structure classifies and generalizes the power data effectively, so that the key features of the data are extracted. The power data are classified layer by layer under different classification rules until every terminal node of the classification tree contains only one data value. The classification rules used include time rules, numerical rules, type rules, data type rules and location rules, which allow the power data to be classified finely and comprehensively and provide a reliable basis for data mining and analysis.

Secondly, the decomposition clustering algorithm of the invention can compress and simplify the data and extract the main characteristics of the data. The algorithm decomposes data into a plurality of principal components through singular value decomposition of the data, and clusters the principal components through a decomposition clustering algorithm to obtain a plurality of clustering centers. For each cluster center, the invention calculates the direction vector of the cluster center, and can better reveal the characteristics and rules of the data.

Thirdly, the direction vector analysis of the invention can identify the interference factors in the data, and improve the accuracy and reliability of the data. By calculating the included angle between the direction vector of the clustering center and the direction vectors of different data blocks, the method and the device can determine whether the data blocks deviate from the consistency requirement, and identify the interference factors in the data, thereby improving the accuracy and the reliability of the data.

Fourth, the frequency domain compression algorithm of the invention compresses and simplifies the data effectively, reducing the time and resources consumed by data processing. By compressing the data in the frequency domain, the invention reduces the size and complexity of the data and improves the efficiency and precision of data processing.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of the big data based power data mining method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the partitioned data blocks of the big data based power data mining method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the recursive classification of power data based on a tree structure in the big data based power data mining method according to an embodiment of the present invention.

Detailed Description

The core of the invention is to provide a power data mining method based on big data.

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

Referring to Fig. 1, the big data based power data mining method performs steps S1 to S4. Steps S2 to S4 are described in more detail below.

Step S2: in this step, the data values contained in all terminal nodes of the classification tree are acquired and clustered to obtain a plurality of data blocks with different cluster centers. On this basis, the direction vector of each data block is also calculated. The direction vector refers to the trend of the data block in a certain dimension. The purpose of this step is to cluster the power data and calculate the direction vector of each clustered data block in preparation for the subsequent data consistency analysis and frequency domain compression.

Step S3: in this step, data consistency analysis is performed based on the direction vectors to find the data blocks that deviate from the consistency range; these are regarded as interference data blocks and are removed. Data consistency analysis means comparing the direction vectors of the data blocks to determine whether they lie within the consistency range. If a direction vector deviates from the consistency range, the data block may contain outliers or noise and needs to be removed.

Step S4: in this step, the remaining data blocks are compressed in the frequency domain, and the frequency domain compressed data blocks are taken as the data mining result. Frequency domain compression means converting a time domain signal into a frequency domain signal and compressing the frequency domain signal into a smaller data block. It effectively reduces the storage space of the data while retaining the main characteristics of the original signal. Each data block is compressed according to its calculated direction vector, and the compressed data blocks can be used as the data mining result for further analysis and application.

Example 2

On the basis of the above embodiment, the acquired power data includes at least the following power data: power load data, power generation data, power quality data, power consumer behavior data, and weather data.

Power load data: the load condition of each time point in the power system comprises information such as load, voltage, current and the like of each area.

Power generation data: the power generation information at each time point in the power system, including the generating capacity and generating power of the various types of generating sets.

Power quality data: the power supply quality information of each time point in the power system comprises information such as voltage fluctuation, current harmonic waves, power quality and the like.

Power consumer behavior data: the power consumer behavior information of each time point in the power system comprises information such as power consumption behavior, power consumption mode, power consumption period and the like of the power consumer.

Weather data: refers to weather data related to the operation of the power system, including temperature, humidity, wind speed, air pressure, etc.

The power data are very important data in the operation process of the power system, can reflect the operation state and quality of the power system, and provide basis for the power management department to formulate a power management and optimization strategy. In the method of this patent, mining and application of the power data is achieved by performing processes such as classification, clustering, data consistency analysis, and frequency domain compression on the power data.

Example 3

Based on the above embodiment, in step S1, the classification rule used in each classification is different; the classification rule at least comprises the following categories: time rules, numerical rules, type rules, data type rules, location rules; the time rule is defined as: a rule for classifying the power data according to the time of acquiring the power data; the numerical rule is defined as: a rule for classifying according to the difference of the numerical ranges of the power data; the type rule is defined as: a rule for classifying the power data according to the category to which the power data belongs; the data type rule is defined as: a rule for classifying the electric power data according to the type to which the data value of the electric power data belongs; the location rule is defined as: a rule for classifying the acquired power data according to different node positions of the acquired power data in a power system; when classifying the power data, the first classification is to classify the power data according to a set type rule, and the last classification is to classify the power data according to a set numerical rule.

Specifically, the time rule refers to a rule that classifies power data according to a difference in time at which the power data is acquired. For example, the power data may be categorized by different time periods, such as by hour, by day, by week, etc. This allows finer granularity of analysis of the power data as a function of time.

The numerical rule refers to a rule that classifies the power data according to a difference in numerical range. For example, the power data may be classified by a range of values, such as classifying the power load data into high load, medium load, low load, and the like. Therefore, the power data can be classified according to the size of the power data, and the running state of the power system can be reflected better.

The type rule is a rule that classifies the power data according to the category to which the power data belongs. For example, the power data may be divided by category into power load data, power generation data, power supply quality data, power consumer behavior data, and so on. In this way the power data are classified according to the aspect of the power system they describe, and the operating condition of the power system can be better understood.

The data type rule is a rule that classifies the power data according to a type to which a data value of the power data belongs. For example, the power data may be classified by different data types, such as dividing the power load data into different data types of instantaneous value, average value, maximum value, minimum value, and the like. The power data can be classified according to different attributes, and the characteristics of the power system can be better described.

The location rule refers to a rule that classifies the acquired power data according to the difference in the node position in the power system. For example, power data may be classified by different power system node locations, such as by classifying power load data by different regions. The power data can be classified according to the spatial distribution of the power data, and the operation condition of the power system can be better understood.

When classifying the power data, the first classification is to classify the power data according to a set type rule, and the last classification is to classify the power data according to a set numerical rule. Therefore, classification rules can be gradually refined in the classification process, and the characteristics of the power data can be better reflected.
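The layer-by-layer classification can be sketched as a small recursive function. The record layout, the rule order between the first and last level, and the band thresholds below are assumptions made for illustration; only the constraint that the first rule is a type rule and the last rule is a numerical rule comes from the text.

```python
from collections import defaultdict

# Hypothetical record layout: (category, location, hour, value)
records = [
    ("load", "north", 9, 120.0),
    ("load", "north", 9, 310.0),
    ("load", "south", 10, 95.0),
    ("generation", "north", 9, 500.0),
]

def value_band(record):
    """Numerical rule with assumed thresholds."""
    value = record[3]
    return "low" if value < 100 else "medium" if value < 300 else "high"

# Ordered rules: the type rule comes first and the numerical rule comes last.
rules = [
    ("type", lambda r: r[0]),
    ("location", lambda r: r[1]),
    ("time", lambda r: r[2]),
    ("numerical", value_band),
]

def build_tree(items, depth=0):
    """Split items with the rule for this depth until a node holds a single record."""
    if len(items) == 1 or depth == len(rules):
        return items  # terminal node
    name, key = rules[depth]
    groups = defaultdict(list)
    for record in items:
        groups[key(record)].append(record)
    return {(name, k): build_tree(g, depth + 1) for k, g in groups.items()}

def leaf_values(tree):
    """Collect the data values stored in all terminal nodes."""
    if isinstance(tree, list):
        return [record[3] for record in tree]
    values = []
    for child in tree.values():
        values.extend(leaf_values(child))
    return values

tree = build_tree(records)
values = leaf_values(tree)
```

In this sketch a node stops splitting as soon as it holds one record. If the rules run out before that happens, the terminal node keeps several records, so a real rule set would need enough levels to reach single values.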

Example 4

On the basis of the above embodiment, the step S2 specifically includes: first, acquiring the data values contained in all terminal nodes of the classification tree and taking them as the input data of the clustering algorithm; performing singular value decomposition on the input data by using the following formula to obtain the singular value decomposition matrix of the data:

$$X = U \Sigma V^{T}$$

Specifically, when singular value decomposition is performed on the input data, the input data is regarded as a matrix $X$ in which each row represents a data block and each column represents the data at one time point. The specific process is as follows:

The input data is normalized so that each row has a mean of 0 and a variance of 1.

Singular value decomposition is performed on the normalized data matrix $X$ to obtain three matrices $U$, $\Sigma$ and $V$.

From the result of the singular value decomposition, a low-rank approximation matrix $X_k$ of the data matrix $X$ can be calculated, where $k$ denotes the number of retained singular values. The specific calculation formula is $X_k = U_k \Sigma_k V_k^{T}$, where $U_k$ and $V_k$ denote the matrices formed by the first $k$ columns of $U$ and $V$, and $\Sigma_k$ denotes the leading $k \times k$ block of $\Sigma$.
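The three steps above (row normalization, decomposition, rank-$k$ reconstruction) can be sketched with NumPy. The matrix size, the random data and the choice `k = 3` are assumptions for illustration.

```python
import numpy as np

# Synthetic input: 6 data blocks (rows) observed at 8 time points (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))

# Step 1: normalise every row to zero mean and unit variance.
Xn = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Step 2: singular value decomposition; s holds the singular values in descending order.
U, s, Vt = np.linalg.svd(Xn, full_matrices=False)

# Step 3: keep the k largest singular values and rebuild the low-rank approximation.
k = 3
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Approximation error in the Frobenius norm.
error = np.linalg.norm(Xn - X_k)
```

The error equals the root of the sum of the squared discarded singular values, which gives a direct way to choose `k` for a target accuracy.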

Example 5

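Based on the above embodiment, the formula of the decomposition clustering algorithm is:

$$J = \sum_{i=1}^{n} \min_{1 \le j \le K} \lVert x_i - c_j \rVert^{2}$$

where $x_i$ denotes the $i$-th of the $n$ samples and $c_j$ denotes the $j$-th of the $K$ cluster centers.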

Example 6

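On the basis of the above embodiment, the step S3 specifically includes: for the direction vector $d_i$ of each cluster center, calculating the included angle $\theta_{ij}$ between it and the direction vector $d_j$ of every other cluster center, where $i \neq j$, by using the following formula:

$$\theta_{ij} = \arccos\left(\frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert}\right)$$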

Specifically, the main reason for using the direction vector to perform data consistency analysis is that it can reflect the trend of the data in the multidimensional space, thereby identifying the inconsistency of the data.

Specifically, the direction vector is composed of the principal components of the data, which represent the most important trend of change in the data. By calculating the principal component of the data block, the direction of change of the data block in the multidimensional space can be obtained, thereby judging whether the data block has a consistent change trend with other data blocks. If the change direction of the data block deviates from the consistency range, the data block can be judged to be an interference data block, and then the data block is removed.

Therefore, the data consistency analysis using the direction vector can help us identify those data blocks having inconsistencies, thereby improving the accuracy and efficiency of data mining.

Example 7

On the basis of the above embodiment, the step S4 specifically includes: performing an FFT on each data block $x_i(t)$ to obtain its frequency domain representation $X_i(f)$, where $f$ denotes the frequency; dividing $X_i(f)$ into $M$ sub-bands $X_{i,m}(f)$, where $m = 1, 2, \ldots, M$; compressing each sub-band $X_{i,m}(f)$ as follows: taking the absolute value of $X_{i,m}(f)$ to obtain the amplitude spectrum $A_{i,m}(f)$, where $A_{i,m}(f) = \lvert X_{i,m}(f) \rvert$; dividing $A_{i,m}(f)$ into $P$ intervals $[a_p, b_p]$, where $p = 1, 2, \ldots, P$, $a_p$ denotes the left end point of the $p$-th interval and $b_p$ denotes the right end point of the $p$-th interval; for each interval $[a_p, b_p]$, calculating the average value $\mu_p$ of $A_{i,m}(f)$ in the interval; for each interval $[a_p, b_p]$, calculating the standard deviation $s_p$ of $A_{i,m}(f)$ in the interval; for each interval $[a_p, b_p]$, if $s_p > \varepsilon$, where $\varepsilon$ is a preset parameter representing the tolerance, dividing the interval into two sub-intervals $[a_p, c_p]$ and $[c_p, b_p]$, where $c_p$ is the midpoint of the interval $[a_p, b_p]$; and expressing the compressed sub-band $\tilde{X}_{i,m}(f)$ as the average value $\mu_q$ of the amplitude spectrum over each resulting interval or sub-interval $I_q$, namely:

$$\tilde{X}_{i,m}(f) = \mu_q \quad \text{for } f \in I_q$$

Specifically, in practical applications the tolerance $\varepsilon$ needs to be set according to the specific problem and data set. In general, a smaller tolerance keeps the compressed data block closer to the original data but lowers the compression rate; a larger tolerance raises the compression rate but increases the difference between the compressed data block and the original data.

In practice, different values of $\varepsilon$ can be tried, and the best value determined by comparing the compression rate and the quality of the compressed data under the different settings.
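One way to carry out such a comparison is sketched below. It uses the number of constant segments as an assumed proxy for compressed size and the root-mean-square error against the original amplitude spectrum as an assumed quality measure. The names `segment_means` and `sweep`, the synthetic spectrum and the tolerance values are all hypothetical.

```python
import numpy as np

def segment_means(mag, n_intervals, tol):
    """Return (start, stop, mean) triples after the optional midpoint split."""
    edges = np.linspace(0, len(mag), n_intervals + 1).astype(int)
    segments = []
    for a, b in zip(edges[:-1], edges[1:]):
        if b <= a:
            continue
        if b - a >= 2 and mag[a:b].std() > tol:
            m = (a + b) // 2
            pieces = [(a, m), (m, b)]
        else:
            pieces = [(a, b)]
        segments += [(lo, hi, mag[lo:hi].mean()) for lo, hi in pieces]
    return segments

def sweep(mag, n_intervals, tolerances):
    """For each tolerance, report (tolerance, segment count, RMS error)."""
    rows = []
    for tol in tolerances:
        segs = segment_means(mag, n_intervals, tol)
        approx = np.concatenate([np.full(hi - lo, mean) for lo, hi, mean in segs])
        rmse = float(np.sqrt(np.mean((mag - approx) ** 2)))
        rows.append((tol, len(segs), rmse))
    return rows

# Synthetic amplitude spectrum of a random signal.
rng = np.random.default_rng(0)
mag = np.abs(np.fft.fft(rng.normal(size=256)))
rows = sweep(mag, n_intervals=16, tolerances=[0.0, 5.0, 1e9])
```

Raising the tolerance can only remove splits, so the segment count never increases and the error never decreases across `rows`. A suitable tolerance is the largest one whose error is still acceptable for the application.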

Example 8

On the basis of the above embodiment, compressing each sub-band $X_i(f)$ according to the frequency-domain compression algorithm specifically includes: taking the absolute value of $X_i(f)$ to obtain the amplitude spectrum $A_i(f)$, where $A_i(f) = |X_i(f)|$; dividing $A_i(f)$ into $K$ intervals $[l_k, r_k]$, where $k \in [1, K]$, $l_k$ denotes the left endpoint of the $k$-th interval, and $r_k$ denotes the right endpoint of the $k$-th interval; for each interval $[l_k, r_k]$, calculating the mean $\bar{A}_{i,k}$ of $A_i(f)$ over the interval:

$$\bar{A}_{i,k} = \frac{1}{r_k - l_k + 1} \sum_{f = l_k}^{r_k} A_i(f);$$

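For each interval $[l_k, r_k]$, calculating the standard deviation $s_{i,k}$ of $A_i(f)$ over the interval, where $\bar{A}_{i,k}$ denotes the mean of $A_i(f)$ over the interval:

$$s_{i,k} = \sqrt{\frac{1}{r_k - l_k + 1} \sum_{f = l_k}^{r_k} \left(A_i(f) - \bar{A}_{i,k}\right)^2}.$$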

Example 9

On the basis of the above embodiment, performing an IFFT on $X'_j(f)$ to obtain the time-domain representation $X'_j(t)$ of the compressed data block specifically includes: applying the IFFT to the compressed frequency-domain representation $X'_j(f)$ to obtain the time-domain representation $X'_j(t)$; if the length of $X'_j(f)$ is $N$, the length of the time-domain representation $X'_j(t)$ after the IFFT is also $N$, according to the following formula:

$$X'_j(t) = \frac{1}{N} \sum_{f = 0}^{N - 1} X'_j(f)\, e^{i 2 \pi f t / N}, \quad t = 0, 1, \ldots, N - 1;$$
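The length-preserving property of the IFFT can be checked numerically. The following is a minimal sketch that assumes NumPy and its convention of placing the $1/N$ factor on the inverse transform; the test signal and variable names are illustrative only. It also evaluates the inverse DFT sum directly and compares it with `np.fft.ifft`.

```python
import numpy as np

N = 64
t = np.arange(N)
# Illustrative time-domain data block made of two sinusoids.
block = np.sin(2 * np.pi * 5 * t / N) + 0.5 * np.cos(2 * np.pi * 12 * t / N)

spectrum = np.fft.fft(block)      # frequency-domain representation, length N
restored = np.fft.ifft(spectrum)  # time-domain representation, length N

# Direct evaluation of the inverse DFT sum for comparison.
f = np.arange(N)
kernel = np.exp(2j * np.pi * np.outer(t, f) / N)  # element [t, f] = e^{i 2 pi f t / N}
manual = kernel @ spectrum / N

print(len(spectrum), len(restored))
print(np.allclose(manual, restored))
```

The sketch applies the IFFT to an uncompressed spectrum; in the method, the input would instead be the compressed frequency-domain representation.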

The structure of the partitioned data blocks is shown in Fig. 2. In the data block, the direction vector is represented by the UE.

The tree-based recursive classification structure used to classify the power data is shown in Fig. 3.

Example 10

On the basis of the above embodiment, the number of sub-bands ranges from 120 to 200.
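As an illustration of choosing a sub-band count within this range, the sketch below splits the FFT of a data block into 160 contiguous sub-bands. The contiguous, near-equal-width split via `np.array_split` is an assumption, since the text does not specify how the sub-band boundaries are chosen; the helper name `split_into_subbands` and the random test block are hypothetical.

```python
import numpy as np


def split_into_subbands(spectrum, num_subbands):
    """Split a spectrum into contiguous sub-bands whose sizes differ by at most one bin."""
    return np.array_split(np.asarray(spectrum), num_subbands)


rng = np.random.default_rng(0)
block = rng.standard_normal(4096)  # illustrative data block
spectrum = np.fft.fft(block)

subbands = split_into_subbands(spectrum, num_subbands=160)
sizes = [len(band) for band in subbands]
rebuilt = np.concatenate(subbands)  # concatenating the sub-bands restores the spectrum
print(len(subbands), min(sizes), max(sizes))
```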

In this description, the embodiments are described progressively, each focusing on its differences from the others, so identical or similar parts of the embodiments may be understood by cross-reference. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.

Those skilled in the art will further appreciate that the illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The present invention has been described in detail above. Specific examples are used herein to explain the principles and embodiments of the invention, and the description of these examples is intended only to facilitate understanding of the method of the invention and its core ideas. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention, and such modifications and adaptations are intended to fall within the scope of the invention as defined in the following claims.

Claims

1. A method of power data mining based on big data, the method comprising:

step S4: for the remaining data blocks, performing frequency-domain compression on each data block according to its calculated direction vector, to obtain frequency-domain compressed data blocks as the data mining result;

in the step S1, the classification rules used in each classification are different; the classification rule at least comprises the following categories: time rules, numerical rules, type rules, data type rules, location rules; the time rule is defined as: a rule for classifying the power data according to the time of acquiring the power data; the numerical rule is defined as: a rule for classifying according to the difference of the numerical ranges of the power data; the type rule is defined as: a rule for classifying the power data according to the category to which the power data belongs; the data type rule is defined as: a rule for classifying the electric power data according to the type to which the data value of the electric power data belongs; the location rule is defined as: a rule for classifying the acquired power data according to different node positions of the acquired power data in a power system; when classifying the power data, the first classification is to classify the power data according to a set type rule, and the last classification is to classify the power data according to a set numerical rule.

2. The method of claim 1, wherein the acquired power data includes at least the following categories of power data: power load data, power generation data, power quality data, power consumer behavior data, and weather data.

3. The method according to claim 2, wherein step S2 specifically includes: first, acquiring the data values contained in all terminal nodes of the classification tree and using them as input data for the clustering algorithm; performing singular value decomposition on the input data using the following formula to obtain the singular value decomposition matrix of the data:

$$X = U \Sigma V^{T};$$

where $X$ denotes the input data, $U$ and $V$ denote the two orthogonal matrices of the decomposition, and $\Sigma$ denotes the diagonal matrix whose diagonal elements are the singular values; truncating $\Sigma$ to obtain a new diagonal matrix $\Sigma_k$:

$$\Sigma_k = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k);$$

where $k$ denotes the number of singular values retained and $\sigma_i$ denotes the $i$-th singular value; substituting $\Sigma_k$ into the singular value decomposition matrix to obtain the truncated singular value decomposition matrix:

$$X_k = U \Sigma_k V^{T};$$

where $X_k$ denotes the truncated singular value decomposition matrix; using $X_k$ as input data, clustering the input data with a decomposition clustering algorithm to obtain a plurality of cluster centers; and calculating the direction vector of each cluster center, that is, treating each cluster center as a vector and then normalizing the vector.

4. The method of claim 3, wherein the formulation of the decomposition-clustering algorithm is:

；

where $m$ denotes the number of samples, $k$ denotes the number of cluster centers, $x^{(i)}$ denotes the $i$-th sample, and $\mu_j$ denotes the $j$-th cluster center; the formula for calculating the direction vector is:

$$v_j = \frac{\mu_j}{\lVert \mu_j \rVert};$$

where $v_j$ denotes the direction vector of the $j$-th cluster center.

5. The method according to claim 4, wherein step S3 specifically includes: for the direction vector $v_j$ of each cluster center, calculating the angle $\theta_{ij}$ between two cluster centers using the following formula, where $i \neq j$:

$$\theta_{ij} = \cos^{-1}\left(\frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}\right);$$

where $v_i$ denotes the direction vector of the $i$-th cluster center, $\lVert \cdot \rVert$ denotes the modulus of a vector, and $\cos^{-1}$ denotes the inverse cosine function; for the direction vector $v_j$ of each cluster center, calculating its average angle with all other direction vectors:

；

where $k'$ denotes the number of cluster centers; calculating the lower and upper limits of the consistency range according to a given consistency range:

；

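where the radius of the consistency range is the given value and $\pi$ denotes the ratio of a circle's circumference to its diameter; for each data block, calculating the angle $\theta$ between its direction vector $v$ and the direction vector $v_j$ of each cluster center:

$$\theta = \cos^{-1}\left(\frac{v \cdot v_j}{\lVert v \rVert \, \lVert v_j \rVert}\right);$$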

where $v$ denotes the direction vector of the data block and $v_j$ denotes the direction vector of the $j$-th cluster center; and judging whether the direction vector of the data block is within the consistency range, namely:

$$L \le \theta \le U;$$

6. The method according to claim 5, wherein step S4 specifically includes: performing an FFT on each data block $X_j(t)$ to obtain its frequency-domain representation $X_j(f)$, where $f$ denotes frequency; dividing $X_j(f)$ into $N$ sub-bands $X_i(f)$, where $i \in [1, N]$; and compressing each sub-band $X_i(f)$ as follows: taking the absolute value of $X_i(f)$ to obtain the amplitude spectrum $A_i(f)$, where $A_i(f) = |X_i(f)|$; dividing $A_i(f)$ into $K$ intervals $[l_k, r_k]$, where $k \in [1, K]$, $l_k$ denotes the left endpoint of the $k$-th interval, and $r_k$ denotes the right endpoint of the $k$-th interval; for each interval $[l_k, r_k]$, calculating the mean $\bar{A}_{i,k}$ of $A_i(f)$ over the interval; for each interval $[l_k, r_k]$, calculating the standard deviation $s_{i,k}$ of $A_i(f)$ over the interval; for each interval $[l_k, r_k]$, if $s_{i,k}$ exceeds the tolerance, which is a preset parameter, dividing the interval into two sub-intervals $[l_k, m_k]$ and $[m_k + 1, r_k]$, where $m_k$ is the midpoint of the interval $[l_k, r_k]$; the compressed sub-band $X'_i(f)$ is expressed as the mean of each sub-interval, namely:

$$X'_i(f) = \bar{A}_{i,k}, \quad f \in [l_k, r_k];$$

for each data block $X_j(t)$, combining all compressed sub-bands $X'_i(f)$ to obtain the compressed frequency-domain representation $X'_j(f)$, where $f$ denotes frequency.

7. The method of claim 6, wherein compressing each sub-band $X_i(f)$ according to the frequency-domain compression algorithm specifically includes: taking the absolute value of $X_i(f)$ to obtain the amplitude spectrum $A_i(f)$, where $A_i(f) = |X_i(f)|$; dividing $A_i(f)$ into $K$ intervals $[l_k, r_k]$, where $k \in [1, K]$, $l_k$ denotes the left endpoint of the $k$-th interval, and $r_k$ denotes the right endpoint of the $k$-th interval; for each interval $[l_k, r_k]$, calculating the mean $\bar{A}_{i,k}$ of $A_i(f)$ over the interval:

$$\bar{A}_{i,k} = \frac{1}{r_k - l_k + 1} \sum_{f = l_k}^{r_k} A_i(f);$$

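for each interval $[l_k, r_k]$, calculating the standard deviation $s_{i,k}$ of $A_i(f)$ over the interval, where $\bar{A}_{i,k}$ denotes the mean of $A_i(f)$ over the interval:

$$s_{i,k} = \sqrt{\frac{1}{r_k - l_k + 1} \sum_{f = l_k}^{r_k} \left(A_i(f) - \bar{A}_{i,k}\right)^2};$$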

for each interval $[l_k, r_k]$, if the standard deviation $s_{i,k}$ exceeds the tolerance, which is a preset parameter, dividing the interval into two sub-intervals $[l_k, m_k]$ and $[m_k + 1, r_k]$, where $m_k$ is the midpoint of the interval $[l_k, r_k]$;

；

the compressed sub-band $X'_i(f)$ is expressed as the mean $\bar{A}_{i,k}$ of $A_i(f)$ over each sub-interval, namely:

$$X'_i(f) = \bar{A}_{i,k}, \quad f \in [l_k, r_k].$$

8. The method of claim 7, wherein performing an IFFT on $X'_j(f)$ to obtain the time-domain representation $X'_j(t)$ of the compressed data block specifically includes: applying the IFFT to the compressed frequency-domain representation $X'_j(f)$ to obtain the time-domain representation $X'_j(t)$; if the length of $X'_j(f)$ is $N$, the length of the time-domain representation $X'_j(t)$ after the IFFT is also $N$, according to the following formula:

$$X'_j(t) = \frac{1}{N} \sum_{f = 0}^{N - 1} X'_j(f)\, e^{i 2 \pi f t / N};$$

where $i$ denotes the imaginary unit and $t$ denotes time. $X'_j(t)$ denotes the compressed data block, which is a time-domain signal of length $N$.

9. The method of claim 8, wherein the number of sub-bands ranges from 120 to 200.