CN116304931B - Electric power data mining method based on big data - Google Patents

Publication number
CN116304931B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310530075.XA
Other languages
Chinese (zh)
Other versions
CN116304931A (en
Inventor
李营
李孟雷
王修伦
崔玉静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yingwei Electronic Technology Co ltd
Original Assignee
Shandong Yingwei Electronic Technology Co ltd
Application filed by Shandong Yingwei Electronic Technology Co ltd
Priority to CN202310530075.XA
Publication of CN116304931A
Application granted
Publication of CN116304931B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a power data mining method based on big data, which is used for mining multiple types of power data, such as power load data, power generation data, power supply quality data, power consumer behavior data and weather data. Through recursive classification, data clustering, direction vector consistency analysis and frequency domain compression, the method removes interference data blocks and compresses the remaining effective data blocks in the frequency domain, so that more efficient data mining is achieved. The method is efficient, accurate and broadly applicable; it can be widely applied to data mining in power systems and improves the safety, stability and reliability of the power system.

Description

Electric power data mining method based on big data
Technical Field
The invention relates to the technical field of data processing, in particular to an electric power data mining method based on big data.
Background
With the development of society, energy has become a global focus of attention. Electric power is one of the most important forms of energy and is vital to economic and social development. The production and use of electricity generate large amounts of data, such as power load data, power generation data, power quality data, power consumer behavior data and weather data. These data provide important reference information for the management and operation of the power industry, but they also bring the problems of huge data volume and low data processing efficiency. For this reason, power data mining has become one of the research hot spots in the power field.
Currently, power data mining techniques have been widely used, with the most common being cluster analysis-based approaches. This approach typically breaks the power data into several data blocks, which are then clustered to find rules and associations in the data. There have been many patent documents that propose different power data mining methods.
For example, U.S. patent No. 8484001B2 discloses a method of power load prediction based on cluster analysis. The method obtains statistical characteristics of the power load by clustering historical power load data, and predicts future power load by utilizing the characteristics. The method can effectively improve the accuracy of power load prediction, but does not consider the classification problem of power data, is easily influenced by data noise and abnormal values, and has higher requirements on data preprocessing.
In addition, Chinese patent CN101812877B discloses a method for clustering power data. The method first performs cluster analysis on the power data and then maps the clustering result onto a two-dimensional plane for visual display. The method displays the clustering result of the power data conveniently, but because the result is shown only on a two-dimensional plane, it is difficult to analyze and mine the data in depth.
Some patent documents also propose power data mining methods based on frequency domain analysis. For example, Chinese patent CN103431902B discloses a method for extracting power data features based on the wavelet transform. The method applies a wavelet transform to the power data and analyzes the wavelet coefficients to extract the features of the power data. The method extracts the features effectively, but the complexity of wavelet analysis leads to a large amount of calculation and low processing efficiency.
Although the above patent documents propose different power data mining methods, the power data mining methods proposed in the above patent documents still have some problems. For example, in the method based on wavelet analysis, the power data is decomposed by means of wavelet packet decomposition, but wavelet packet decomposition is relatively poor in the decomposing ability of signals, and high-frequency components are easily decomposed into low-frequency components, resulting in poor quality of the decomposed signals. In the method based on singular value decomposition, the anti-interference capability of the method on noise is relatively poor, and the method is easily influenced by the noise, so that the quality of a clustering center is reduced, and the accuracy of a data mining result is further influenced.
In addition, in the existing power data mining method, when analyzing the power data, only mining is often performed on single type of power data, for example, only analysis is performed on power load data or only analysis is performed on power supply quality data, fusion analysis on different types of power data is lacking, and the overall operation condition of a power system cannot be comprehensively reflected. In addition, the existing power data mining method also has some problems which cannot be effectively solved, such as unstable clustering results, large calculated amount and the like.
Disclosure of Invention
The invention aims to provide a power data mining method based on big data, which adopts technologies such as recursive classification, singular value decomposition, direction vector consistency analysis, frequency domain compression and the like, can efficiently and accurately mine key information in power data, and improves the reliability and economy of a power system.
In order to solve the technical problems, the invention provides a power data mining method based on big data, which comprises the following steps:
step S1: acquiring power data and recursively classifying the power data based on a tree structure, specifically: performing a first classification on the acquired power data according to a set classification rule to obtain a plurality of first classification data, performing a second classification on the first classification data according to a set classification rule, and so on, until the classification data corresponding to each terminal node of the resulting classification tree contains only one data value, thereby obtaining the classification tree of the power data;
step S2: obtaining data values contained in all the terminal nodes of the classification tree, carrying out data clustering processing to obtain a plurality of data blocks with different clustering centers, and calculating the direction vector of each data block;
step S3: carrying out data consistency analysis based on the direction vector to obtain a data block with the direction vector deviating from a consistency range as an interference data block; removing the interference data blocks;
step S4: for the remaining data blocks, performing frequency domain compression on each data block according to its calculated direction vector, to obtain frequency-domain-compressed data blocks as the data mining result.
Further, the obtained power data at least includes the following power data: power load data, power generation data, power quality data, power consumer behavior data, and weather data.
Further, in the step S1, the classification rules used in each classification are different; the classification rule at least comprises the following categories: time rules, numerical rules, type rules, data type rules, location rules; the time rule is defined as: a rule for classifying the power data according to the time of acquiring the power data; the numerical rule is defined as: a rule for classifying according to the difference of the numerical ranges of the power data; the type rule is defined as: a rule for classifying the power data according to the category to which the power data belongs; the data type rule is defined as: a rule for classifying the electric power data according to the type to which the data value of the electric power data belongs; the location rule is defined as: a rule for classifying the acquired power data according to different node positions of the acquired power data in a power system; when classifying the power data, the first classification is to classify the power data according to a set type rule, and the last classification is to classify the power data according to a set numerical rule.
Further, the step S2 specifically includes: first, acquiring the data values contained in all the terminal nodes of the classification tree and taking them as input data of a clustering algorithm; performing singular value decomposition on the input data using the following formula to obtain the singular value decomposition matrix of the data:
$$X = U \Sigma V^{T}$$
wherein $X$ represents the input data, $U$ and $V$ represent the two orthogonal matrices of the decomposition, and $\Sigma$ represents a diagonal matrix whose diagonal elements are the singular values; truncating $\Sigma$ to obtain a new diagonal matrix $\Sigma_k$:
$$\Sigma_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$$
wherein $k$ represents the number of singular values to be retained and $\sigma_i$ represents the $i$-th singular value; substituting $\Sigma_k$ into the singular value decomposition to obtain the truncated singular value decomposition matrix:
$$X_k = U \Sigma_k V^{T}$$
wherein $X_k$ represents the truncated singular value decomposition matrix; taking $X_k$ as input data, clustering the input data using a decomposition clustering algorithm to obtain a plurality of cluster centers; calculating the direction vector of each cluster center, that is, regarding each cluster center as a vector and then normalizing that vector.
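As a small worked illustration of the truncation step, consider a diagonal input whose singular value decomposition can be read off directly. The matrix, the symbols $X$, $U$, $\Sigma$, $V$, $X_1$ and the choice of keeping one singular value are assumptions made here for illustration only.

```latex
\[
X = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
  = U \Sigma V^{T},
\qquad U = V = I,
\qquad \Sigma = \mathrm{diag}(3, 1).
\]
Keeping only the largest singular value gives
\[
\Sigma_1 = \mathrm{diag}(3, 0),
\qquad
X_1 = U \Sigma_1 V^{T} = \begin{pmatrix} 3 & 0 \\ 0 & 0 \end{pmatrix},
\qquad
\lVert X - X_1 \rVert_F = 1 .
\]
```

The approximation error equals the discarded singular value, which is why dropping small singular values removes little of the signal.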
Further, in the formula of the decomposition clustering algorithm, $n$ represents the number of samples, $K$ represents the number of cluster centers, $x_i$ represents the $i$-th sample, and $c_j$ represents the $j$-th cluster center; the formula for calculating the direction vector is as follows:
$$d_j = \frac{c_j}{\lVert c_j \rVert}$$
wherein $c_j$ represents the $j$-th cluster center.
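The normalization of a cluster center into a direction vector can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name `direction_vector` and the sample centers are hypothetical, and the Euclidean norm is assumed.

```python
import math

def direction_vector(center):
    """Return center / ||center|| (Euclidean norm assumed).

    A zero vector has no direction, so it is returned unchanged.
    """
    norm = math.sqrt(sum(c * c for c in center))
    if norm == 0.0:
        return list(center)
    return [c / norm for c in center]

# Hypothetical cluster centers, used only for illustration.
centers = [[3.0, 4.0], [0.0, 2.0]]
directions = [direction_vector(c) for c in centers]
print(directions)  # [[0.6, 0.8], [0.0, 1.0]]
```

After normalization every direction vector has unit length, so later angle computations depend only on orientation and not on magnitude.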
Further, the step S3 specifically includes: for the direction vectors $d_i$ of the cluster centers, calculating the included angle $\theta_{ij}$ between each pair of cluster centers using the following formula, where $i \neq j$:
$$\theta_{ij} = \arccos\left(\frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert}\right)$$
wherein $d_i$ represents the direction vector of the $i$-th cluster center, $\lVert \cdot \rVert$ represents the modulus of a vector, and $\arccos$ represents the inverse cosine function; for the direction vector $d_i$ of each cluster center, calculating its average included angle $\bar{\theta}_i$ with all other direction vectors, wherein $K$ represents the number of cluster centers; according to a given consistency range, calculating the lower limit and the upper limit of the consistency range, wherein $r$ represents the radius of the consistency range and $\pi$ represents the circle constant; for each data block, calculating the included angle $\varphi_j$ between its direction vector $v$ and the direction vector $d_j$ of every cluster center:
$$\varphi_j = \arccos\left(\frac{v \cdot d_j}{\lVert v \rVert \, \lVert d_j \rVert}\right)$$
wherein $v$ represents the direction vector of the data block and $d_j$ represents the direction vector of the $j$-th cluster center; judging whether the direction vector of the data block is within the consistency range:
if the direction vector of the data block is within the consistency range, the data block is considered to meet the consistency requirement, otherwise, the data block is considered to deviate from the consistency range, is considered to be an interference data block and needs to be removed.
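The consistency check can be sketched as below. This is only an illustration under stated assumptions: the lower and upper limits are passed in as plain numbers (how they are derived from the consistency radius is not reproduced here), and a block is judged by the mean of its angles to all cluster-center directions, which is one possible reading of the test. The names `angle` and `split_blocks` and the sample vectors are hypothetical.

```python
import math

def angle(u, v):
    """Included angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)

def split_blocks(block_dirs, center_dirs, lower, upper):
    """Return (kept, interference) index lists.

    Assumption: a block is consistent when the mean of its angles to
    all cluster-center directions lies within [lower, upper].
    """
    kept, interference = [], []
    for idx, d in enumerate(block_dirs):
        mean_angle = sum(angle(d, c) for c in center_dirs) / len(center_dirs)
        if lower <= mean_angle <= upper:
            kept.append(idx)
        else:
            interference.append(idx)
    return kept, interference

center_dirs = [[1.0, 0.0], [0.0, 1.0]]
block_dirs = [[1.0, 1.0], [-1.0, 0.0]]
kept, interference = split_blocks(block_dirs, center_dirs, 0.0, math.pi / 2)
print(kept, interference)  # [0] [1]
```

The first block sits at 45 degrees to both centers and is kept; the second points away from the first center, so its mean angle of 135 degrees falls outside the assumed range and it is flagged as interference.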
Further, the step S4 specifically includes: for each data block $B_i$, performing an FFT to obtain its frequency domain representation $F_i(\omega)$, where $\omega$ represents the frequency; dividing $F_i(\omega)$ into $M$ sub-bands $F_{i,m}(\omega)$, where $m = 1, 2, \ldots, M$; compressing each sub-band $F_{i,m}(\omega)$ as follows: taking the absolute value of $F_{i,m}(\omega)$ to obtain its amplitude spectrum $A_{i,m}(\omega)$, where $A_{i,m}(\omega) = |F_{i,m}(\omega)|$; dividing $A_{i,m}(\omega)$ into $N$ intervals $[a_n, b_n]$, where $a_n$ represents the left end point of the $n$-th interval and $b_n$ represents the right end point of the $n$-th interval; for each interval $[a_n, b_n]$, calculating the average value $\mu_n$ of $A_{i,m}(\omega)$ in the interval; for each interval $[a_n, b_n]$, calculating the standard deviation $s_n$ of $A_{i,m}(\omega)$ within the interval; for each interval $[a_n, b_n]$, if $s_n > \varepsilon$, where $\varepsilon$ is a preset parameter representing the tolerance, dividing the interval into two sub-intervals $[a_n, c_n]$ and $[c_n, b_n]$, where $c_n$ is the midpoint of the interval $[a_n, b_n]$; expressing the compressed sub-band $\tilde{F}_{i,m}(\omega)$ as the average value of each sub-interval, namely $\tilde{F}_{i,m}(\omega) = \mu_n$ for $\omega$ in the $n$-th sub-interval;
for each data block $B_i$, combining all compressed sub-bands $\tilde{F}_{i,m}(\omega)$ to obtain the compressed frequency domain representation $\tilde{F}_i(\omega)$, where $\omega$ represents the frequency.
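The transform and sub-band split at the start of this step can be sketched as follows. A naive discrete Fourier transform stands in for the FFT (it computes the same values, only more slowly), and the sub-bands are assumed to be contiguous and of near-equal width; the source does not state how the band edges are chosen. The names `dft` and `split_subbands` and the sample block are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (same result as an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def split_subbands(spectrum, m):
    """Split a spectrum into m contiguous sub-bands of near-equal size."""
    n = len(spectrum)
    bounds = [i * n // m for i in range(m + 1)]
    return [spectrum[bounds[i]:bounds[i + 1]] for i in range(m)]

# Hypothetical data block of eight time samples.
block = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
spectrum = dft(block)
subbands = split_subbands(spectrum, 4)

# Concatenating the sub-bands restores the full spectrum.
recombined = [c for band in subbands for c in band]
```

Because the split is a plain partition, recombining the (compressed) sub-bands in order yields a spectrum of the original length.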
Further, each sub-band $F_{i,m}(\omega)$ is compressed according to the frequency domain compression algorithm, the specific process being: taking the absolute value of $F_{i,m}(\omega)$ to obtain its amplitude spectrum $A_{i,m}(\omega)$, where $A_{i,m}(\omega) = |F_{i,m}(\omega)|$; dividing $A_{i,m}(\omega)$ into $N$ intervals $[a_n, b_n]$, where $a_n$ represents the left end point of the $n$-th interval and $b_n$ represents the right end point of the $n$-th interval; for each interval $[a_n, b_n]$, calculating the average value $\mu_n$ of $A_{i,m}(\omega)$ in the interval;
for each interval $[a_n, b_n]$, calculating the standard deviation $s_n$ of $A_{i,m}(\omega)$ within the interval;
for each interval $[a_n, b_n]$, if $s_n > \varepsilon$, where $\varepsilon$ is a preset parameter representing the tolerance, dividing the interval into two sub-intervals $[a_n, c_n]$ and $[c_n, b_n]$, where $c_n$ is the midpoint of the interval $[a_n, b_n]$;
expressing the compressed sub-band $\tilde{F}_{i,m}(\omega)$ as the average value of each sub-interval, namely $\tilde{F}_{i,m}(\omega) = \mu_n$ for $\omega$ in the $n$-th sub-interval.
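The interval-averaging compression of one sub-band can be sketched as below. Several details are assumptions made for illustration: the split condition is taken to be "population standard deviation greater than the tolerance", each interval is split at most once, and intervals are index ranges over a list of amplitudes. The name `compress_amplitudes` and the sample data are hypothetical.

```python
import statistics

def compress_amplitudes(amps, n_intervals, tol):
    """Piecewise-constant summary of an amplitude spectrum.

    Returns (start, end, mean) triples over half-open index ranges.
    Assumption: an interval is halved once when its population
    standard deviation exceeds tol.
    """
    n = len(amps)
    bounds = [i * n // n_intervals for i in range(n_intervals + 1)]
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        segment = amps[lo:hi]
        if not segment:
            continue  # more intervals than samples
        if len(segment) > 1 and statistics.pstdev(segment) > tol:
            mid = (lo + hi) // 2
            pieces = [(lo, mid), (mid, hi)]
        else:
            pieces = [(lo, hi)]
        for a, b in pieces:
            out.append((a, b, statistics.fmean(amps[a:b])))
    return out

# Hypothetical amplitude spectrum: flat first half, a step in the second.
amps = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 8.0, 8.0]
compressed = compress_amplitudes(amps, 2, 0.5)
print(compressed)  # [(0, 4, 1.0), (4, 6, 0.0), (6, 8, 8.0)]
```

The flat interval is stored as a single mean, while the interval containing the step is halved so that each half is represented by its own mean; eight amplitudes are reduced to three values.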
Further, the compressed frequency domain representation $\tilde{F}_i(\omega)$ is transformed by IFFT to obtain the time domain representation $\tilde{x}_i(t)$ of the compressed data block, as follows: performing the IFFT on the compressed frequency domain representation $\tilde{F}_i(\omega)$ to obtain the time domain representation $\tilde{x}_i(t)$; letting the length of $\tilde{F}_i(\omega)$ be $L$, the time domain representation $\tilde{x}_i(t)$ after the IFFT also has length $L$, and the formula is as follows:
$$\tilde{x}_i(t) = \frac{1}{L} \sum_{\omega=0}^{L-1} \tilde{F}_i(\omega)\, e^{\mathrm{j} 2 \pi \omega t / L}$$
wherein $\mathrm{j}$ represents the imaginary unit and $t$ represents time. $\tilde{x}_i(t)$ represents the compressed data block as a time domain signal of length $L$.
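The inverse transform can be illustrated with a naive inverse DFT, which computes the same values as an IFFT. This is a sketch for checking the length and round-trip properties only; the function names and the sample signal are hypothetical, and a real implementation would normally call an FFT library.

```python
import cmath

def dft(x):
    """Naive forward discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Naive inverse DFT: x(t) = (1/n) * sum_k F(k) * exp(+2*pi*j*k*t/n)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Hypothetical time-domain signal.
signal = [0.0, 1.0, 0.0, -1.0]
restored = idft(dft(signal))
```

The output has the same length as the input spectrum, and without any compression in between the round trip reproduces the original signal up to floating-point error.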
Further, the number of sub-bands ranges from 120 to 200.
The power data mining method based on big data has the following beneficial effects: it makes full use of big data technology and data mining algorithms to process and analyze the mass data in the power system effectively, thereby realizing comprehensive monitoring, prediction and optimization of the power system. The core technology of the invention comprises recursive classification based on a tree structure, a decomposition clustering algorithm, direction vector analysis and a frequency domain compression algorithm.
Firstly, the recursive classification based on a tree structure can effectively classify and generalize the power data, so that key features of the data are extracted. The power data are classified layer by layer under different classification rules until every terminal node of the classification tree contains only one data value. In the classification process, the invention uses different classification rules, including time rules, numerical rules, type rules, data type rules and location rules, so that the power data can be classified finely and comprehensively, providing a reliable basis for data mining and analysis.
Secondly, the decomposition clustering algorithm of the invention can compress and simplify the data and extract the main characteristics of the data. The algorithm decomposes data into a plurality of principal components through singular value decomposition of the data, and clusters the principal components through a decomposition clustering algorithm to obtain a plurality of clustering centers. For each cluster center, the invention calculates the direction vector of the cluster center, and can better reveal the characteristics and rules of the data.
Thirdly, the direction vector analysis of the invention can identify the interference factors in the data, and improve the accuracy and reliability of the data. By calculating the included angle between the direction vector of the clustering center and the direction vectors of different data blocks, the method and the device can determine whether the data blocks deviate from the consistency requirement, and identify the interference factors in the data, thereby improving the accuracy and the reliability of the data.
Fourth, the frequency domain compression algorithm and the frequency domain compression method of the present invention can effectively compress and simplify data, thereby reducing the time and resource consumption of data processing. By compressing the data in the frequency domain, the invention can reduce the size and complexity of the data and improve the efficiency and precision of data processing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a big data based power data mining method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the partitioned data blocks of the big data based power data mining method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of recursively classifying power data based on a tree structure in the big data based power data mining method according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a power data mining method based on big data.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1, a big data based power data mining method performs the steps of:
Step S1: acquiring power data and recursively classifying the power data based on a tree structure, specifically: performing a first classification on the acquired power data according to a set classification rule to obtain a plurality of first classification data, performing a second classification on the first classification data according to a set classification rule, and so on, until the classification data corresponding to each terminal node of the resulting classification tree contains only one data value, thereby obtaining the classification tree of the power data;
step S2: obtaining data values contained in all the terminal nodes of the classification tree, carrying out data clustering processing to obtain a plurality of data blocks with different clustering centers, and calculating the direction vector of each data block;
In this step, the data values contained in all the terminal nodes of the classification tree are acquired and clustered to obtain a plurality of data blocks with different cluster centers. On this basis, the direction vector of each data block is calculated. The direction vector describes the trend of the data block in a certain dimension. The purpose of this step is to cluster the power data and calculate the direction vector of each clustered data block, in preparation for the subsequent data consistency analysis and frequency domain compression.
Step S3: carrying out data consistency analysis based on the direction vector to obtain a data block with the direction vector deviating from a consistency range as an interference data block; removing the interference data blocks;
In this step, data consistency analysis is performed based on the direction vectors, and data blocks whose direction vectors deviate from the consistency range are identified as interference data blocks and removed. Data consistency analysis refers to comparing the direction vectors of the data blocks to determine whether they lie within the consistency range. If a direction vector deviates from the consistency range, the data block may contain outliers or noise and needs to be removed.
Step S4: for the remaining data blocks, performing frequency domain compression on each data block according to its calculated direction vector, to obtain frequency-domain-compressed data blocks as the data mining result.
In this step, the remaining data blocks need to be frequency domain compressed to obtain frequency domain compressed data blocks as a result of data mining. Frequency domain compression refers to converting a time domain signal into a frequency domain signal and compressing the frequency domain signal into smaller data blocks. By frequency domain compression, the storage space of the data can be effectively reduced, and the main characteristics of the original signal are reserved. In this step, frequency domain compression is required for each data block according to the calculated direction vector of the data block. The compressed data blocks may be used as a result of data mining for further analysis and application.
Example 2
On the basis of the above embodiment, the acquired power data includes at least the following power data: power load data, power generation data, power quality data, power consumer behavior data, and weather data.
Power load data: the load condition of each time point in the power system comprises information such as load, voltage, current and the like of each area.
Power generation data: the power generation information at each time point in the power system, including information such as the power generation amount and generating power of the various types of generating sets.
Power quality data: the power supply quality information of each time point in the power system comprises information such as voltage fluctuation, current harmonic waves, power quality and the like.
Power consumer behavior data: the power consumer behavior information of each time point in the power system comprises information such as power consumption behavior, power consumption mode, power consumption period and the like of the power consumer.
Weather data: refers to weather data related to the operation of the power system, including temperature, humidity, wind speed, air pressure, etc.
The power data are very important data in the operation process of the power system, can reflect the operation state and quality of the power system, and provide basis for the power management department to formulate a power management and optimization strategy. In the method of this patent, mining and application of the power data is achieved by performing processes such as classification, clustering, data consistency analysis, and frequency domain compression on the power data.
Example 3
Based on the above embodiment, in step S1, the classification rule used in each classification is different; the classification rule at least comprises the following categories: time rules, numerical rules, type rules, data type rules, location rules; the time rule is defined as: a rule for classifying the power data according to the time of acquiring the power data; the numerical rule is defined as: a rule for classifying according to the difference of the numerical ranges of the power data; the type rule is defined as: a rule for classifying the power data according to the category to which the power data belongs; the data type rule is defined as: a rule for classifying the electric power data according to the type to which the data value of the electric power data belongs; the location rule is defined as: a rule for classifying the acquired power data according to different node positions of the acquired power data in a power system; when classifying the power data, the first classification is to classify the power data according to a set type rule, and the last classification is to classify the power data according to a set numerical rule.
Specifically, the time rule refers to a rule that classifies power data according to a difference in time at which the power data is acquired. For example, the power data may be categorized by different time periods, such as by hour, by day, by week, etc. This allows finer granularity of analysis of the power data as a function of time.
The numerical rule refers to a rule that classifies the power data according to a difference in numerical range. For example, the power data may be classified by a range of values, such as classifying the power load data into high load, medium load, low load, and the like. Therefore, the power data can be classified according to the size of the power data, and the running state of the power system can be reflected better.
The type rule is a rule that classifies the power data according to the category to which the power data belongs. For example, the power data may be divided into categories such as power load data, power generation data, power supply quality data and power consumer behavior data. In this way the power data are classified according to the aspect of the power system they describe, so that the operating condition of the power system can be better understood.
The data type rule is a rule that classifies the power data according to a type to which a data value of the power data belongs. For example, the power data may be classified by different data types, such as dividing the power load data into different data types of instantaneous value, average value, maximum value, minimum value, and the like. The power data can be classified according to different attributes, and the characteristics of the power system can be better described.
The location rule refers to a rule that classifies the acquired power data according to the difference in the node position in the power system. For example, power data may be classified by different power system node locations, such as by classifying power load data by different regions. The power data can be classified according to the spatial distribution of the power data, and the operation condition of the power system can be better understood.
When classifying the power data, the first classification is to classify the power data according to a set type rule, and the last classification is to classify the power data according to a set numerical rule. Therefore, classification rules can be gradually refined in the classification process, and the characteristics of the power data can be better reflected.
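The layer-by-layer classification described in this embodiment can be sketched as a recursive grouping. This is a minimal illustration under assumptions: records are plain dictionaries, one rule is applied per tree level, the type rule comes first and the numerical rule last, and recursion stops when a node holds a single record or the rules run out. The field names, the threshold of 30 and the function names are hypothetical.

```python
from collections import defaultdict

def build_tree(records, rules):
    """Recursively group records, applying one rule per tree level.

    A node becomes a leaf (a list of records) when it holds at most
    one record or when no rules remain.
    """
    if len(records) <= 1 or not rules:
        return records
    groups = defaultdict(list)
    for rec in records:
        groups[rules[0](rec)].append(rec)
    return {key: build_tree(group, rules[1:]) for key, group in groups.items()}

def leaves(tree):
    """Collect the records stored in all terminal nodes."""
    if isinstance(tree, list):
        return list(tree)
    out = []
    for child in tree.values():
        out.extend(leaves(child))
    return out

# Hypothetical records and rules, for illustration only.
records = [
    {"category": "load", "hour": 0, "value": 10.5},
    {"category": "load", "hour": 0, "value": 42.0},
    {"category": "load", "hour": 1, "value": 11.0},
    {"category": "weather", "hour": 0, "value": 18.2},
]
rules = [
    lambda r: r["category"],                          # type rule (first)
    lambda r: r["hour"],                              # time rule
    lambda r: "high" if r["value"] >= 30 else "low",  # numerical rule (last)
]
tree = build_tree(records, rules)
```

In this example every terminal node ends up holding exactly one record, and `leaves(tree)` returns the values that would be passed on to the clustering step.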
Example 4
On the basis of the above embodiment, the step S2 specifically includes: first, acquiring the data values contained in all the terminal nodes of the classification tree and taking them as input data of a clustering algorithm; performing singular value decomposition on the input data using the following formula to obtain the singular value decomposition matrix of the data:
$$X = U \Sigma V^{T}$$
wherein $X$ represents the input data, $U$ and $V$ represent the two orthogonal matrices of the decomposition, and $\Sigma$ represents a diagonal matrix whose diagonal elements are the singular values; truncating $\Sigma$ to obtain a new diagonal matrix $\Sigma_k$:
$$\Sigma_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$$
wherein $k$ represents the number of singular values to be retained and $\sigma_i$ represents the $i$-th singular value; substituting $\Sigma_k$ into the singular value decomposition to obtain the truncated singular value decomposition matrix:
$$X_k = U \Sigma_k V^{T}$$
wherein $X_k$ represents the truncated singular value decomposition matrix; taking $X_k$ as input data, clustering the input data using a decomposition clustering algorithm to obtain a plurality of cluster centers; calculating the direction vector of each cluster center, that is, regarding each cluster center as a vector and then normalizing that vector.
Specifically, singular value decomposition of the input data proceeds as follows. The input data are regarded as a matrix $X$, in which each row represents a data block and each column represents the data at one point in time. The specific process is:
The input data are normalized so that each row has mean 0 and variance 1.
Singular value decomposition is performed on the normalized data matrix $X$ to obtain the three matrices $U$, $\Sigma$ and $V$.
From the result of the singular value decomposition, a low-rank approximation matrix $X_k$ of the data matrix $X$ is calculated, where $k$ represents the number of singular values retained. The calculation formula is $X_k = U_k \Sigma_k V_k^{T}$, where $U_k$ and $V_k$ are the matrices formed by the first $k$ columns of $U$ and $V$, and $\Sigma_k$ is the corresponding leading $k \times k$ block of $\Sigma$.
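The three steps above (row normalization, decomposition, rank-$k$ truncation) can be sketched with NumPy as follows. This is a minimal illustration rather than the patented implementation: the function name `truncated_svd`, the handling of constant rows, and the random test matrix are assumptions.

```python
import numpy as np

def truncated_svd(X, k):
    """Row-normalize X, then return its rank-k approximation and factors."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # assumption: constant rows become all-zero rows
    Xn = (X - mean) / std                      # each row: mean 0, variance 1
    U, s, Vt = np.linalg.svd(Xn, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]   # keep the k largest singular values
    Xk = Uk @ np.diag(sk) @ Vtk                # X_k = U_k Sigma_k V_k^T
    return Xk, Uk, sk, Vtk

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))   # 6 data blocks, 8 time points (illustrative sizes)
Xk, Uk, sk, Vtk = truncated_svd(X, k=2)
```

`np.linalg.svd` returns the singular values in descending order, so slicing the first `k` entries keeps the dominant components.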
Example 5
Based on the above embodiment, the decomposition clustering algorithm is formulated in terms of the following quantities: $m$ represents the number of samples, $k$ represents the number of cluster centers, $x^{(i)}$ represents the $i$-th sample, and $\mu_j$ represents the $j$-th cluster center. The direction vector is calculated as:
$$v_j = \frac{\mu_j}{\lVert \mu_j \rVert}$$
where $v_j$ represents the direction vector of the $j$-th cluster center.
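The clustering and normalization steps can be sketched as below. Because the clustering formula itself is not reproduced in this text, the sketch assumes the standard k-means objective (sum of squared distances from each sample to its nearest center) solved by Lloyd iterations; the farthest-point initialization, the function names, and the toy data are likewise assumptions made for illustration.

```python
import numpy as np

def init_centers(X, k):
    """Farthest-point initialization (an assumed, deterministic choice)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_iter=20):
    """Plain Lloyd iterations; returns (centers, labels)."""
    centers = init_centers(X, k)
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels

def direction_vectors(centers):
    """Normalize each cluster center to unit length."""
    norms = np.linalg.norm(centers, axis=1, keepdims=True)
    return centers / np.where(norms == 0, 1.0, norms)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=(5, 0), scale=0.1, size=(20, 2)),
               rng.normal(loc=(0, 5), scale=0.1, size=(20, 2))])
centers, labels = kmeans(X, k=2)
V = direction_vectors(centers)
```

Each row of `V` is a unit vector pointing from the origin toward one cluster center, which is the quantity the later consistency analysis compares by angle.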
Example 6
On the basis of the above embodiment, step S3 specifically includes: for the direction vector $v_j$ of each cluster center, calculating the included angle $\theta_{ij}$ between two cluster centers using the following formula, where $i \neq j$:
$$\theta_{ij} = \cos^{-1}\left(\frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}\right)$$
where $v_i$ represents the direction vector of the $i$-th cluster center, $\lVert \cdot \rVert$ represents the modulus of a vector, and $\cos^{-1}$ represents the inverse cosine function; for the direction vector $v_j$ of each cluster center, calculating its average included angle with all other direction vectors, where $k'$ represents the number of cluster centers; according to a given consistency range, calculating the lower limit $L$ and the upper limit $U$ of the consistency range from the radius of the consistency range and the circle constant $\pi$; for each data block, calculating the included angle $\theta$ between its direction vector $v$ and the direction vectors $v_j$ of all cluster centers:
$$\theta = \cos^{-1}\left(\frac{v \cdot v_j}{\lVert v \rVert \, \lVert v_j \rVert}\right)$$
where $v$ represents the direction vector of the data block and $v_j$ represents the direction vector of the $j$-th cluster center; and judging whether the direction vector of the data block is within the consistency range, namely:
$$L \le \theta \le U$$
If the direction vector of the data block is within the consistency range, the data block is considered to meet the consistency requirement; otherwise, the data block is considered to deviate from the consistency range, is treated as an interference data block, and is removed.
Specifically, the main reason for using direction vectors in the data consistency analysis is that they reflect the trend of the data in multidimensional space, which makes inconsistent data identifiable.
The direction vector is composed of the principal components of the data, which represent the most important trend of change in the data. Calculating the principal components of a data block gives its direction of change in multidimensional space, from which it can be judged whether the data block follows the same trend as the other data blocks. If the direction of change of a data block deviates from the consistency range, the data block is judged to be an interference data block and is removed.
Data consistency analysis based on direction vectors therefore identifies inconsistent data blocks, which improves the accuracy and efficiency of data mining.
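A minimal sketch of the angle-based consistency check follows. Several points are assumptions, since the corresponding formulas are not reproduced in this text: the lower and upper limits are passed in directly rather than derived from a radius, the average angle is taken over the other $k'-1$ centers, and a data block is kept only if its angle to every cluster-center direction lies within the limits. All function names are hypothetical.

```python
import numpy as np

def angle(u, v):
    """Included angle in radians: arccos(u.v / (|u||v|))."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))  # clip guards rounding error

def mean_angles(V):
    """Average angle between each center direction and all the others."""
    k = len(V)
    return [sum(angle(V[j], V[i]) for i in range(k) if i != j) / (k - 1)
            for j in range(k)]

def is_consistent(v, V, lower, upper):
    """True if every angle between v and a center direction is in [lower, upper]."""
    return all(lower <= angle(v, vj) <= upper for vj in V)

V = np.array([[1.0, 0.0], [0.0, 1.0]])          # two center directions
blocks = np.array([[1.0, 1.0], [-1.0, 0.0]])    # candidate block directions
lower, upper = 0.0, np.pi / 2                   # assumed limits, not the patent's formula
kept = [b for b in blocks if is_consistent(b, V, lower, upper)]
```

Here the block pointing along (1, 1) lies between the two center directions and is kept, while the block pointing along (-1, 0) is opposite to one center and is treated as interference.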
Example 7
On the basis of the above embodiment, step S4 specifically includes: performing an FFT on each data block $X_j(t)$ to obtain its frequency-domain representation $X_j(f)$, where $f$ represents frequency; dividing $X_j(f)$ into $N$ sub-bands $X_i(f)$, where $i \in [1, N]$; and compressing each sub-band $X_i(f)$ according to the following steps: taking the absolute value of $X_i(f)$ to obtain its amplitude spectrum $A_i(f)$, where $A_i(f) = |X_i(f)|$; dividing $A_i(f)$ into $K$ intervals $[l_k, r_k]$, where $k \in [1, K]$, $l_k$ represents the left end point of the $k$-th interval and $r_k$ represents the right end point of the $k$-th interval; for each interval $[l_k, r_k]$, calculating the average value of $A_i(f)$ within the interval; for each interval $[l_k, r_k]$, calculating the standard deviation $s_{i,k}$ of $A_i(f)$ within the interval; for each interval $[l_k, r_k]$, if the standard deviation $s_{i,k}$ exceeds a preset parameter representing the tolerance, dividing the interval into two sub-intervals $[l_k, m_k]$ and $[m_k + 1, r_k]$, where $m_k$ is the midpoint of the interval $[l_k, r_k]$; and representing the compressed sub-band $X_i(f)$ by the average value of each sub-interval.
For each data block $X_j(t)$, all compressed sub-bands $X_i(f)$ are combined to obtain the compressed frequency-domain representation $X_j(f)$, where $f$ represents frequency.
Specifically, in practical applications the tolerance must be set for the specific problem and data set. In general, a smaller tolerance keeps the compressed data block closer to the original data but lowers the compression rate; a larger tolerance raises the compression rate but increases the difference between the compressed data block and the original data.
In practice, different tolerance values can be tried, and the best value determined by comparing the compression rate and the quality of the compressed data obtained under each setting.
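The compression steps and the tolerance trade-off can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes equal-width initial intervals, splits an interval at its midpoint whenever its standard deviation exceeds the tolerance and repeats the test on the halves, and operates on the amplitude spectrum only. The names `compress_band`, `compress_block`, `n_bands`, `n_intervals`, and `tol`, and the test signal, are hypothetical.

```python
import numpy as np

def compress_band(amplitudes, n_intervals, tol):
    """Piecewise-constant approximation of one sub-band's amplitude spectrum."""
    out = np.empty_like(amplitudes, dtype=float)
    # Initial K intervals as (start, stop) index pairs, stop exclusive.
    edges = np.linspace(0, len(amplitudes), n_intervals + 1).astype(int)
    stack = [(a, b) for a, b in zip(edges[:-1], edges[1:]) if b > a]
    n_segments = 0
    while stack:
        a, b = stack.pop()
        seg = amplitudes[a:b]
        if b - a > 1 and seg.std() > tol:
            mid = (a + b) // 2          # split at the midpoint and re-examine
            stack += [(a, mid), (mid, b)]
        else:
            out[a:b] = seg.mean()       # represent the interval by its mean
            n_segments += 1
    return out, n_segments

def compress_block(x, n_bands, n_intervals, tol):
    """FFT a block, compress each sub-band's amplitudes, and recombine."""
    A = np.abs(np.fft.fft(x))
    bands = np.array_split(A, n_bands)
    results = [compress_band(b, n_intervals, tol) for b in bands]
    compressed = np.concatenate([r[0] for r in results])
    return compressed, sum(r[1] for r in results)

t = np.arange(256)
x = np.sin(2 * np.pi * 8 * t / 256) + 0.5 * np.sin(2 * np.pi * 40 * t / 256)
for tol in (0.1, 1.0, 10.0):
    comp, n_seg = compress_block(x, n_bands=4, n_intervals=4, tol=tol)
    err = np.linalg.norm(comp - np.abs(np.fft.fft(x)))
    print(f"tol={tol}: segments={n_seg}, amplitude error={err:.3f}")
```

The number of stored segments stands in for the compression rate: a smaller tolerance can only add segments and reduce the amplitude error, which is the trade-off described above.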
Example 8
On the basis of the above embodiment, each sub-band $X_i(f)$ is compressed according to the frequency-domain compression algorithm by the following specific process: taking the absolute value of $X_i(f)$ to obtain its amplitude spectrum $A_i(f)$, where $A_i(f) = |X_i(f)|$; dividing $A_i(f)$ into $K$ intervals $[l_k, r_k]$, where $k \in [1, K]$, $l_k$ represents the left end point of the $k$-th interval and $r_k$ represents the right end point of the $k$-th interval; and, for each interval $[l_k, r_k]$, calculating the average value of $A_i(f)$ within the interval.
For each interval $[l_k, r_k]$, the standard deviation $s_{i,k}$ of $A_i(f)$ within the interval is calculated.
For each interval $[l_k, r_k]$, if the standard deviation $s_{i,k}$ exceeds a preset parameter representing the tolerance, the interval is divided into two sub-intervals $[l_k, m_k]$ and $[m_k + 1, r_k]$, where $m_k$ is the midpoint of the interval $[l_k, r_k]$.
The compressed sub-band $X_i(f)$ is represented by the average value of each sub-interval.
Example 9
On the basis of the above embodiment, performing an IFFT on $X_j(f)$ to obtain the time-domain representation $X_j(t)$ as the compressed data block includes: performing an IFFT on the compressed frequency-domain representation $X_j(f)$ to obtain the time-domain representation $X_j(t)$; if the length of $X_j(f)$ is $N$, the length of the time-domain representation $X_j(t)$ after the IFFT is also $N$, and the formula is as follows:
$$X_j(t) = \frac{1}{N} \sum_{f=0}^{N-1} X_j(f)\, e^{i 2 \pi f t / N}$$
where $i$ represents the imaginary unit and $t$ represents time. $X_j(t)$ represents the compressed data block as a time-domain signal of length $N$.
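As a check on the length-preserving property stated above, the following sketch evaluates the inverse transform directly and compares it with NumPy's `np.fft.ifft`. It assumes the standard inverse discrete Fourier transform convention with a $1/N$ factor; the helper name `idft` and the random test signal are illustrative.

```python
import numpy as np

def idft(X):
    """Direct inverse DFT: x[t] = (1/N) * sum_f X[f] * exp(i*2*pi*f*t/N)."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    f = np.arange(N)          # frequency index along columns
    t = f.reshape(-1, 1)      # time index along rows
    return (X * np.exp(2j * np.pi * f * t / N)).sum(axis=1) / N

rng = np.random.default_rng(0)
x = rng.normal(size=16)       # illustrative time-domain block of length N = 16
X = np.fft.fft(x)
x_rec = idft(X)               # length N, same as the spectrum
```

The direct sum costs on the order of $N^2$ operations and is shown only to make the formula concrete; in practice `np.fft.ifft` computes the same result with a fast algorithm.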
The structure of the partitioned data blocks is shown in Fig. 2. In the data block, the direction vector is represented by the UE.
The tree-based recursive classification structure used to classify the power data is shown in Fig. 3.
Example 10
Based on the above embodiment, the number of sub-bands ranges from 120 to 200.
In this description, the embodiments are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant details can be found in the description of the method.
Those skilled in the art will further appreciate that the illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The present invention has been described in detail above. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (9)

1. A method of power data mining based on big data, the method comprising:
step S1: acquiring power data, and performing recursive classification on the power data based on a tree structure, wherein the method specifically comprises the following steps: performing first classification on the acquired power data according to a set classification rule to obtain a plurality of first classification data, performing second classification on the first classification data according to the set classification rule, and the like until the classification data corresponding to the final node in the finally obtained classification tree contains only one data value, thereby obtaining the classification tree of the power data;
step S2: obtaining data values contained in all the terminal nodes of the classification tree, carrying out data clustering processing to obtain a plurality of data blocks with different clustering centers, and calculating the direction vector of each data block;
step S3: carrying out data consistency analysis based on the direction vector to obtain a data block with the direction vector deviating from a consistency range as an interference data block; removing the interference data blocks;
step S4: carrying out frequency domain compression on each data block according to the direction vector of the data block obtained by calculation on the rest data blocks to obtain frequency domain compressed data blocks as a data mining result;
in the step S1, the classification rules used in each classification are different; the classification rule at least comprises the following categories: time rules, numerical rules, type rules, data type rules, location rules; the time rule is defined as: a rule for classifying the power data according to the time of acquiring the power data; the numerical rule is defined as: a rule for classifying according to the difference of the numerical ranges of the power data; the type rule is defined as: a rule for classifying the power data according to the category to which the power data belongs; the data type rule is defined as: a rule for classifying the electric power data according to the type to which the data value of the electric power data belongs; the location rule is defined as: a rule for classifying the acquired power data according to different node positions of the acquired power data in a power system; when classifying the power data, the first classification is to classify the power data according to a set type rule, and the last classification is to classify the power data according to a set numerical rule.
2. The method of claim 1, wherein the acquired power data includes at least the following categories of power data: power load data, power generation data, power quality data, power consumer behavior data, and weather data.
3. The method according to claim 2, wherein the step S2 specifically includes: firstly, acquiring data values contained in all the terminal nodes of the classification tree, and taking the data values as input data of a clustering algorithm; performing singular value decomposition on the input data by using the following formula to obtain a singular value decomposition matrix of the data:
$$X = U \Sigma V^{T}$$
wherein $X$ represents the input data, $U$ and $V$ represent the two orthogonal matrices of the decomposition, $\Sigma$ represents a diagonal matrix, and the elements on the diagonal represent the singular values; truncating $\Sigma$ to obtain a new diagonal matrix $\Sigma_k$:
$$\Sigma_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$$
wherein $k$ represents the number of singular values to be retained and $\sigma_i$ represents the $i$-th singular value; substituting $\Sigma_k$ into the singular value decomposition matrix to obtain a truncated singular value decomposition matrix:
$$X_k = U \Sigma_k V^{T}$$
wherein $X_k$ represents the truncated singular value decomposition matrix; taking $X_k$ as input data, and clustering the input data by using a decomposition clustering algorithm to obtain a plurality of cluster centers; and calculating the direction vector of each cluster center, namely, regarding each cluster center as a vector and then normalizing the vector.
4. The method of claim 3, wherein the decomposition clustering algorithm is formulated in terms of the following quantities: $m$ represents the number of samples, $k$ represents the number of cluster centers, $x^{(i)}$ represents the $i$-th sample, and $\mu_j$ represents the $j$-th cluster center; and the formula for calculating the direction vector is:
$$v_j = \frac{\mu_j}{\lVert \mu_j \rVert}$$
wherein $v_j$ represents the direction vector of the $j$-th cluster center.
5. The method according to claim 4, wherein the step S3 specifically includes: for the direction vector $v_j$ of each cluster center, calculating the included angle $\theta_{ij}$ between two cluster centers by using the following formula, wherein $i \neq j$:
$$\theta_{ij} = \cos^{-1}\left(\frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}\right)$$
wherein $v_i$ represents the direction vector of the $i$-th cluster center, $\lVert \cdot \rVert$ represents the modulus of a vector, and $\cos^{-1}$ represents the inverse cosine function; for the direction vector $v_j$ of each cluster center, calculating its average included angle with all other direction vectors, wherein $k'$ represents the number of cluster centers; according to a given consistency range, calculating the lower limit $L$ and the upper limit $U$ of the consistency range from the radius of the consistency range and the circle constant $\pi$; for each data block, calculating the included angle $\theta$ between its direction vector $v$ and the direction vectors $v_j$ of all cluster centers:
$$\theta = \cos^{-1}\left(\frac{v \cdot v_j}{\lVert v \rVert \, \lVert v_j \rVert}\right)$$
wherein $v$ represents the direction vector of the data block and $v_j$ represents the direction vector of the $j$-th cluster center; and judging whether the direction vector of the data block is within the consistency range, namely:
$$L \le \theta \le U$$
if the direction vector of the data block is within the consistency range, the data block is considered to meet the consistency requirement; otherwise, the data block is considered to deviate from the consistency range, is regarded as an interference data block, and is removed.
6. The method according to claim 5, wherein the step S4 specifically includes: performing an FFT on each data block $X_j(t)$ to obtain its frequency-domain representation $X_j(f)$, wherein $f$ represents frequency; dividing $X_j(f)$ into $N$ sub-bands $X_i(f)$, wherein $i \in [1, N]$; and compressing each sub-band $X_i(f)$ according to the following steps: taking the absolute value of $X_i(f)$ to obtain its amplitude spectrum $A_i(f)$, wherein $A_i(f) = |X_i(f)|$; dividing $A_i(f)$ into $K$ intervals $[l_k, r_k]$, wherein $k \in [1, K]$, $l_k$ represents the left end point of the $k$-th interval and $r_k$ represents the right end point of the $k$-th interval; for each interval $[l_k, r_k]$, calculating the average value of $A_i(f)$ within the interval; for each interval $[l_k, r_k]$, calculating the standard deviation $s_{i,k}$ of $A_i(f)$ within the interval; for each interval $[l_k, r_k]$, if the standard deviation $s_{i,k}$ exceeds a preset parameter representing the tolerance, dividing the interval into two sub-intervals $[l_k, m_k]$ and $[m_k + 1, r_k]$, wherein $m_k$ is the midpoint of the interval $[l_k, r_k]$; and representing the compressed sub-band $X_i(f)$ by the average value of each sub-interval;
for each data block $X_j(t)$, combining all compressed sub-bands $X_i(f)$ to obtain a compressed frequency-domain representation $X_j(f)$, wherein $f$ represents frequency.
7. The method of claim 6, wherein each sub-band $X_i(f)$ is compressed according to the frequency-domain compression algorithm by the following specific process: taking the absolute value of $X_i(f)$ to obtain its amplitude spectrum $A_i(f)$, wherein $A_i(f) = |X_i(f)|$; dividing $A_i(f)$ into $K$ intervals $[l_k, r_k]$, wherein $k \in [1, K]$, $l_k$ represents the left end point of the $k$-th interval and $r_k$ represents the right end point of the $k$-th interval; for each interval $[l_k, r_k]$, calculating the average value of $A_i(f)$ within the interval;
for each interval $[l_k, r_k]$, calculating the standard deviation $s_{i,k}$ of $A_i(f)$ within the interval;
for each interval $[l_k, r_k]$, if the standard deviation $s_{i,k}$ exceeds a preset parameter representing the tolerance, dividing the interval into two sub-intervals $[l_k, m_k]$ and $[m_k + 1, r_k]$, wherein $m_k$ is the midpoint of the interval $[l_k, r_k]$;
and representing the compressed sub-band $X_i(f)$ by the average value of each sub-interval.
8. The method of claim 7, wherein performing an IFFT on $X_j(f)$ to obtain the time-domain representation $X_j(t)$ as the compressed data block includes: performing an IFFT on the compressed frequency-domain representation $X_j(f)$ to obtain the time-domain representation $X_j(t)$; if the length of $X_j(f)$ is $N$, the length of the time-domain representation $X_j(t)$ after the IFFT is also $N$, and the formula is as follows:
$$X_j(t) = \frac{1}{N} \sum_{f=0}^{N-1} X_j(f)\, e^{i 2 \pi f t / N}$$
wherein $i$ represents the imaginary unit and $t$ represents time; $X_j(t)$ represents the compressed data block as a time-domain signal of length $N$.
9. The method of claim 8, wherein the number of subbands has a range of values: 120-200.
CN202310530075.XA 2023-05-12 2023-05-12 Electric power data mining method based on big data Active CN116304931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310530075.XA CN116304931B (en) 2023-05-12 2023-05-12 Electric power data mining method based on big data

Publications (2)

Publication Number Publication Date
CN116304931A CN116304931A (en) 2023-06-23
CN116304931B true CN116304931B (en) 2023-08-04

Family

ID=86829085

Country Status (1)

Country Link
CN (1) CN116304931B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant