CN114880948A - Harmonic prediction modeling method and system based on random forest optimization algorithm - Google Patents

Harmonic prediction modeling method and system based on random forest optimization algorithm

Info

Publication number
CN114880948A
Authority
CN
China
Prior art keywords
harmonic
random forest
prediction
characteristic
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210620824.3A
Other languages
Chinese (zh)
Inventor
马兴
杨爽
陈咏涛
廖玉祥
张友强
董光德
匡红刚
付昂
朱小军
王瑞妙
易鹏飞
汪颖
周为
邹平
赵小娟
胡文曦
喻梦洁
周敬森
朱晟毅
方辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Chongqing Electric Power Co Ltd
State Grid Corp of China SGCC
State Grid Chongqing Electric Power Co Ltd
Original Assignee
Electric Power Research Institute of State Grid Chongqing Electric Power Co Ltd
State Grid Corp of China SGCC
State Grid Chongqing Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Chongqing Electric Power Co Ltd, State Grid Corp of China SGCC, and State Grid Chongqing Electric Power Co Ltd
Priority to CN202210620824.3A
Publication of CN114880948A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/40 Arrangements for reducing harmonics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Water Supply & Treatment (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Primary Health Care (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a harmonic prediction modeling method and a harmonic prediction modeling system based on a random forest optimization algorithm, wherein the method comprises the following steps of: acquiring power quality monitoring data, preprocessing the monitoring data, and constructing a harmonic voltage current feature set; sampling random forest training samples according to the harmonic voltage current feature set; establishing a regression decision tree according to the sampling sample; constructing a random forest model according to the decision tree; optimizing the random forest model; obtaining a random forest harmonic prediction model based on harmonic characteristic optimization after optimization; the invention realizes the prediction of harmonic amplitude; the method is beneficial to analyzing the interaction influence among various harmonic voltage and currents in industrial users, and provides theoretical support for preventing harmonic pollution, detecting the action relation among various harmonic voltage and currents and analyzing the influence factors of the power quality characteristics of the power system.

Description

Harmonic prediction modeling method and system based on random forest optimization algorithm
Technical Field
The invention relates to the field of harmonic prediction modeling, in particular to a harmonic prediction modeling method and system based on a random forest optimization algorithm.
Background
In recent years, with the rapid development of power electronics technology, power electronic devices have come into widespread use, and power systems increasingly exhibit a high degree of power-electronic penetration. Because most nonlinear loads are not subject to a uniform harmonic emission limit and have no harmonic filtering device installed, the harmonic current they generate is injected directly into the power grid, aggravating harmonic pollution in the grid. For household users, the power of a single load is small, so the harmonic current injected into the grid and its influence are negligible. For large-capacity industrial users, however, the nonlinear loads are numerous, widely distributed and strongly random, so the harmonic current is high in content and fluctuates randomly, and the harmonic problems it causes in the grid cannot be ignored. Measured data show that the current distortion produced by loads containing power electronic devices is severe, and the total harmonic distortion of the current of some loads can even exceed 100%. If the harmonics are not mitigated reasonably and effectively, they flow into the higher-level power system and cause a series of related power quality problems, which makes the design of power quality control schemes difficult. It is therefore necessary to model the harmonics accurately.
At present, harmonic prediction modeling has been studied extensively in China and abroad. Existing harmonic prediction models mainly include the constant current source model, models based on the crossed-frequency admittance matrix, the Norton equivalent model and simplified models based on the least squares method. As the types of harmonic sources have multiplied and topologies have become more complex, neural network modeling methods that do not consider the internal mechanism of harmonic generation have also appeared.
These methods still suffer from limited applicable scenarios, weak adaptability, complex modeling and slow computation.
Disclosure of Invention
The purpose of the invention is to provide, in view of the above problems, a harmonic prediction modeling method and system based on a random forest optimization algorithm that addresses the weak adaptability of existing harmonic prediction models as well as their modeling complexity and slow computation.
In view of the deficiencies of the prior art, the invention takes the harmonic characteristics of industrial users as its subject and provides a harmonic source modeling method based on a random forest optimization algorithm, so as to predict harmonic current accurately under steady-state conditions. First, the voltage, current and power data of each harmonic of an industrial user are acquired by a power quality monitoring device, and the monitoring data are preprocessed to obtain the effective value of each harmonic. Second, a harmonic voltage and current feature matrix is constructed, the harmonic data are sampled following the Bagging (bootstrap aggregating) idea, and decision tree data sets are built so that unknown harmonic voltage and current values can be estimated by regression. Finally, a random forest harmonic source model is established based on the Wrapper method (a wrapper-type feature selection method) and the whale optimization algorithm to predict a given harmonic current. On this basis, the influence of different harmonic voltages on the harmonic current can be evaluated and predicted accurately, which helps provide a theoretical basis for preventing harmonic pollution and for analyzing the factors that influence the power quality characteristics of the power system.
The technical scheme adopted by the invention is as follows:
A harmonic prediction modeling method based on a random forest optimization algorithm comprises the following steps: acquiring power quality monitoring data, preprocessing the monitoring data, and constructing a harmonic voltage and current feature set; sampling random forest training samples from the harmonic voltage and current feature set; establishing regression decision trees from the sampled data; constructing a random forest model from the decision trees; optimizing the random forest model; and obtaining, after optimization, a random forest harmonic prediction model based on harmonic feature optimization.
Further, the preprocessing of the data is to clean the data, and includes data deletion and abnormal data processing.
Further, the set of harmonic voltage and current characteristics is constructed by:
Figure BDA0003676656640000021
wherein N represents the number of sampling points in each column; $V_{hj}$ represents the j-th sampling point value of the h-th harmonic voltage; and $I_{hj}$ represents the j-th sampling point value of the h-th harmonic current.
Further, the sampling method of the random forest training samples comprises: randomly drawing k harmonic samples with replacement from the original harmonic feature sets X and Y, where k is the total number of data points in the harmonic feature set, to form a data set P; and repeating this T times to form the training set Q of the random forest model.
Further, the method for establishing the regression decision tree comprises: the harmonic voltage and current feature sets X and Y have k sampling points and M feature variables, where the M feature variables are the input harmonic features; k sampling points are drawn with replacement from the original harmonic feature set X by the Bootstrap sampling method, giving a harmonic sample set of k samples; and m feature variables (m ≤ M) are randomly selected from the M feature variables to form a harmonic feature variable subset, where m is a fixed value, and the optimal feature is selected for splitting by a least squares deviation function, thereby forming a decision tree.
Further, the method for constructing the random forest model comprises: constructing T decision trees; with M feature variables, randomly selecting $m_t$ features at each node of each decision tree; selecting, from the extracted variables, the variable with the strongest discriminating ability by calculating its information gain, the random forest algorithm finding the best split point of each tree among the $m_t$ selected features and taking the harmonic feature with the largest information gain as the current splitting feature; growing each tree to the maximum extent without any pruning; forming a random forest from the generated decision trees to predict the voltage and current values of a given harmonic, the output being the average of the outputs of the trees in the random forest; varying the number of trees in the random forest and selecting the setting with the best effect; and evaluating the performance of the model.
Further, the method for optimizing the random forest model comprises: sorting the harmonic features in descending order of importance in the random forest model; first selecting the single most important feature for prediction; then selecting the two most important features for prediction, and so on, and calculating the coefficient of determination of the prediction results for each number of features; and selecting the features that give the best coefficient of determination to construct a harmonic voltage and current feature set, thereby obtaining the optimized feature data.
Further, the method for obtaining the random forest harmonic prediction model based on harmonic feature optimization after optimization comprises the following steps: reconstructing a random forest model with optimized harmonic characteristics based on the optimized characteristic data; the optimization of the constructed harmonic prediction model can be completed by combining the screening of the harmonic model; and obtaining a random forest harmonic prediction model based on harmonic characteristic optimization.
A harmonic prediction modeling system based on a random forest optimization algorithm comprises: the acquisition module is used for acquiring power quality monitoring data, preprocessing the monitoring data and constructing a harmonic voltage current feature set; the processing module is used for sampling random forest training samples according to the harmonic voltage current feature set; establishing a regression decision tree according to the sampling sample; constructing a random forest model according to the decision tree; and the optimization module is used for optimizing the random forest model to obtain a random forest harmonic prediction model based on harmonic characteristic optimization.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. The method uses the random forest algorithm as its core and has high modeling efficiency; the decision trees need to be constructed only once and can then be reused, which overcomes the weak generalization ability and tendency to overfit of a single decision tree.
2. The invention can process harmonic voltage and current data of very high dimension without a prior feature selection step.
3. The method can identify which harmonic features are important through the random forest algorithm, and it is highly tolerant of noise and harmonic outliers.
4. The harmonic prediction result is determined by the average value of all tree outputs in the forest, so that the prediction precision can be improved.
5. The method realizes effective extraction of high-importance harmonic features, reduces harmonic feature data dimensionality, enables model training to be fast, is relatively simple to realize, and improves prediction precision and model efficiency.
6. The invention realizes the prediction of harmonic amplitude. The method is beneficial to analyzing the interaction influence among various harmonic voltage and currents in industrial users, and provides theoretical support for preventing harmonic pollution, detecting the action relation among various harmonic voltage and currents and analyzing the influence factors of the power quality characteristics of the power system.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a harmonic prediction modeling method.
Fig. 2 is a basic schematic diagram of a decision tree.
Fig. 3 is a schematic diagram of the coefficients of determination of the prediction results for different numbers of features.
FIG. 4 is a diagram of the harmonic prediction modeling system architecture.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification (including any accompanying claims, abstract) may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
Example 1
A harmonic prediction modeling method based on a random forest optimization algorithm is shown in figure 1 and comprises the following steps:
S1: Acquire power quality monitoring data, preprocess the monitoring data, and construct a harmonic voltage and current feature set.
The method for acquiring the power quality monitoring data is as follows: in this embodiment, 110 kV power quality monitoring data of an industrial user are obtained, and data sampled at 3 min intervals by the user's power quality monitoring system are selected for study. The data include the total harmonic voltage distortion rate (THDu), the individual harmonic voltage content rate (HRUh), the total harmonic current distortion rate (THDi), the individual harmonic current effective value (Ih), and so on. The individual harmonic voltage effective value (Vh) can be calculated as follows:
$$V_h = \mathrm{HRU}_h \times V_1$$
where $\mathrm{HRU}_h$ is the voltage content rate of the h-th harmonic and $V_1$ is the effective value of the fundamental voltage.
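As a minimal sketch of the relation above, the following Python function converts harmonic voltage content rates into effective values. It assumes that HRU_h is reported in percent; the function name and the sample readings are hypothetical and are not taken from the patent.

```python
def harmonic_voltage_rms(hru_percent, v1_rms):
    """Return V_h = HRU_h * V_1 for every harmonic order h.

    hru_percent: dict mapping harmonic order -> content rate in percent
                 (assumption: the monitor reports HRU_h as a percentage).
    v1_rms:      effective (RMS) value of the fundamental voltage.
    """
    return {h: (hru / 100.0) * v1_rms for h, hru in hru_percent.items()}


# Hypothetical readings: content rates in percent, fundamental voltage in kV.
example = harmonic_voltage_rms({3: 1.2, 5: 2.5, 7: 0.8}, v1_rms=63.5)
```

If the monitoring system reports HRU_h as a fraction rather than a percentage, the division by 100 should be dropped.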
The preprocessing cleans the data and comprises data deletion and abnormal data processing, specifically:
Data deletion covers two cases: handling of missing data and deletion of non-principal data. If more than 90% of a data set is missing, the data set is deleted; otherwise, the missing data are filled.
In this embodiment, missing data are filled by the "adjacent data average method": the L valid data points before the null value and the L valid data points after it are averaged, and the average replaces the null value. For industrial users, odd harmonics are the main harmonic components, so the odd harmonics are selected directly as harmonic data samples.
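The adjacent data average method can be sketched as follows. This is one illustrative reading of the description, assuming that missing values are represented by `None` and that fewer than L neighbours are used when a gap lies near the edge of the series; the function name `fill_missing` is hypothetical.

```python
def fill_missing(series, L=2):
    """Fill None entries with the mean of up to L valid values on each side."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest L valid points before and after the gap (original data only).
        before = [v for v in reversed(series[:i]) if v is not None][:L]
        after = [v for v in series[i + 1:] if v is not None][:L]
        neighbours = before + after
        if neighbours:  # leave the gap untouched if no valid neighbour exists
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical 5th-harmonic current samples with one missing reading.
print(fill_missing([1.0, 2.0, None, 4.0, 5.0], L=2))
```

Neighbours are taken from the original series only, so a filled value never feeds into the estimate for another gap.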
For abnormal data, this embodiment selects the CP95 value and relies mainly on manual inspection: the sampled points are sorted in descending order, the largest 5% are discarded, and the maximum of the remaining values is taken as the 95% probability value. Data that deviate significantly from the actual measurements are replaced by data filling.
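The CP95 rule described above can be sketched in a few lines of Python. The rounding of the 5% count (integer floor) is an assumption, since the text does not say how a fractional count is handled; the function name `cp95` is hypothetical.

```python
def cp95(values):
    """Return the 95% probability value: sort descending, drop the top 5%,
    and take the largest of what remains."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values, reverse=True)
    n_drop = len(ordered) * 5 // 100  # assumption: floor of 5% of the count
    return ordered[n_drop]


# With the integers 1..100, the five largest (96..100) are discarded.
print(cp95(list(range(1, 101))))
```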
The harmonic voltage and current feature sets are constructed as follows: after data processing, the harmonic voltage and current feature sets X and Y of the analysis object are constructed:
Figure BDA0003676656640000051
where N represents the number of sampling points in each column; $V_{hj}$ ($h = 1, 3, \dots, N$; $j = 1, 2, \dots, N$) represents the j-th sampling point value of the h-th harmonic voltage; and $I_{hj}$ ($h = 1, 3, \dots, N$; $j = 1, 2, \dots, N$) represents the j-th sampling point value of the h-th harmonic current.
S2: Sample the random forest training samples from the harmonic voltage and current feature set.
The random forest training samples are drawn as follows: k harmonic samples are drawn at random with replacement from the original harmonic feature sets X and Y, where k is the total number of data points in the harmonic feature set, to form a data set P:
Figure BDA0003676656640000053
where $x_{i,t}$ is a randomly drawn voltage or current value of a given harmonic in the feature sets X and Y, and $N^+$ denotes the set of positive integers.
This is repeated T times to form the training set Q of the random forest model:
$$Q = \{P_1, P_2, \dots, P_t, \dots \mid t \le T,\ t \in N^+\}$$
where the samples of each cycle are independent of one another and are used to construct T regression decision trees. The average of the decision tree outputs is taken as the final prediction, which improves the accuracy of the model. When the training subsets are generated, every training sample has, in theory, the same probability of being drawn on each draw. After k draws with replacement, the probability that a given harmonic sampling point is never drawn is
$$\left(1 - \frac{1}{k}\right)^{k}$$
As $k \to \infty$, this probability tends to $1/e \approx 0.368$; that is, about 36.8% of the samples are not drawn in a given round. These samples are called out-of-bag (OOB) data.
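The sampling step and the 36.8% out-of-bag figure can be checked numerically with a short sketch. The helper names, the seed and the value of k below are illustrative assumptions, not values from the patent.

```python
import random


def bootstrap_indices(k, rng):
    """Draw k sample indices from range(k) uniformly with replacement."""
    return [rng.randrange(k) for _ in range(k)]


def out_of_bag(k, drawn):
    """Indices that were never drawn in this bootstrap round."""
    drawn_set = set(drawn)
    return [i for i in range(k) if i not in drawn_set]


def prob_never_drawn(k):
    """Probability that one fixed sample is missed in k draws: (1 - 1/k)^k."""
    return (1.0 - 1.0 / k) ** k


rng = random.Random(0)          # fixed seed so the sketch is repeatable
k = 10_000                      # hypothetical number of sampling points
drawn = bootstrap_indices(k, rng)
oob = out_of_bag(k, drawn)
oob_fraction = len(oob) / k     # empirically close to 1/e for large k
```

Repeating `bootstrap_indices` T times gives the T data sets that make up the training set Q, each with its own out-of-bag set.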
S3: Establish a regression decision tree from the sampled data.
A decision tree is a tree structure (binary or non-binary). Each non-leaf node represents a test on a harmonic feature attribute, each branch represents the outcome of that test over a range of values, and each leaf node stores a final result. To make a decision, the tree tests the corresponding feature attributes of the harmonic sample starting from the root node and follows the branch selected by each attribute value until a leaf node is reached; the category stored in that leaf node is taken as the decision result. The principle of the decision tree is shown in fig. 2.
The specific implementation is as follows. The harmonic voltage and current feature sets X and Y have k sampling points and M feature variables, where the M feature variables are the input harmonic features. Using the Bootstrap sampling method, k sampling points are drawn with replacement from the original harmonic feature set X, giving a harmonic sample set of k samples. Then m feature variables (m ≤ M) are randomly selected from the M feature variables to form a harmonic feature variable subset, where m is a fixed value, and the optimal feature is selected for splitting by a least squares deviation function, forming a decision tree. When Bootstrap sampling is applied to the feature sets X and Y, some sample points may be drawn repeatedly; the probability that a given sample point is selected is about 63.2% (the complement of the 36.8% out-of-bag probability). A decision tree in effect partitions the feature space with hyperplanes: each split divides the current space into two parts, as can be visualized, for example, with three-dimensional data.
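A single split chosen by least squares deviation over a random feature subset might look like the following sketch. It assumes that the "least squares deviation function" is the sum of squared errors around the mean on each side of the split and that candidate thresholds are midpoints between adjacent observed values; all names and data are hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def random_feature_subset(M, m, rng):
    """Pick m of the M feature indices without replacement."""
    return rng.sample(range(M), m)


def best_split(rows, targets, feature_indices):
    """Return (cost, feature, threshold) minimising left SSE + right SSE."""
    best = None
    for f in feature_indices:
        levels = sorted({row[f] for row in rows})
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2.0  # midpoint between adjacent observed values
            left = [y for row, y in zip(rows, targets) if row[f] <= thr]
            right = [y for row, y in zip(rows, targets) if row[f] > thr]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr)
    return best


# Hypothetical rows: [harmonic voltage feature, second feature]; target: current.
rows = [[1.0, 5.0], [2.0, 3.0], [10.0, 4.0], [11.0, 6.0]]
targets = [1.0, 1.0, 5.0, 5.0]
subset = random_feature_subset(M=2, m=2, rng=random.Random(0))
cost, feature, threshold = best_split(rows, targets, sorted(subset))
```

Applying `best_split` recursively to the left and right subsets until a stopping rule is met yields an unpruned regression tree.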
S4: Construct a random forest model from the decision trees.
A random forest is a forest built in a random manner and composed of many decision trees. In the classification setting, once the forest is formed, each decision tree judges the class of every new input harmonic sample, and the class that receives the most votes is taken as the prediction. Random forest regression can be viewed as a strong predictor that integrates many weak predictors.
Each decision tree in the random forest is a binary tree generated by top-down recursive splitting, that is, the harmonic training set is partitioned successively starting from the root node. The root node contains all the harmonic training data, which are split into a left node and a right node according to the principle of minimum node impurity; each of these nodes contains a subset of the harmonic training data and is split further by the same rule until the stopping rule is met and growth stops.
The specific implementation process of the random forest is as follows:
(1) The original training set is the harmonic feature sets X and Y. Using the Bootstrap method described above, k sample points are drawn at random with replacement; this is repeated T times to obtain T new harmonic sample sets, from which T decision trees are constructed. The harmonic samples not drawn in each round form T sets of out-of-bag data;
(2) With M feature variables, $m_t$ features ($m_t \le M$) are randomly selected at each node of each tree, where $m_t$ is generally the square root or one third of the number of features M. The variable with the strongest discriminating ability is selected from the extracted variables by calculating its information gain, and the random forest algorithm finds the best split point of each tree among the $m_t$ selected features.
The known training data set is the harmonic voltage and current feature sets X and Y, and $|XY|$ denotes the number of samples. Let the characteristic variables be $M_k$, $k = 1, 2, \dots, K$; $|M_k|$ is the number of samples belonging to harmonic feature $M_k$, so that $\sum_k |M_k| = |XY|$. Let a harmonic feature A take n different values $\{a_1, a_2, \dots, a_n\}$; according to the value of feature A, X and Y are divided into n subsets $XY_1, XY_2, \dots, XY_n$, where $|XY_i|$ is the number of samples in $XY_i$ and $\sum_i |XY_i| = |XY|$. The empirical entropy of the harmonic feature sets X and Y is calculated as:
$$G(XY) = -\sum_{k=1}^{K} \frac{|M_k|}{|XY|} \log_2 \frac{|M_k|}{|XY|}$$
The empirical conditional entropy is calculated as:
$$G(XY \mid A) = \sum_{i=1}^{n} \frac{|XY_i|}{|XY|}\, G(XY_i) = -\sum_{i=1}^{n} \frac{|XY_i|}{|XY|} \sum_{k=1}^{K} \frac{|XY_{ik}|}{|XY_i|} \log_2 \frac{|XY_{ik}|}{|XY_i|}$$
where $XY_{ik}$ is the set of samples in subset $XY_i$ that belong to feature $M_k$, and $|XY_{ik}|$ is the number of samples in $XY_{ik}$.
The information gain of harmonic feature A is:
$$g(XY, A) = G(XY) - G(XY \mid A)$$
The harmonic feature with the largest information gain is selected as the current splitting feature.
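The information gain computation can be illustrated with a small sketch. Information gain is defined for discrete values, so the example assumes that harmonic amplitudes have already been binned into labels such as "hi" and "lo"; the base-2 logarithm, the function names and the data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy of a list of discrete labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def conditional_entropy(feature_values, labels):
    """Entropy of the labels after partitioning by the value of one feature."""
    total = len(labels)
    groups = {}
    for a, y in zip(feature_values, labels):
        groups.setdefault(a, []).append(y)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(feature_values, labels):
    """g(XY, A) = G(XY) - G(XY | A)."""
    return entropy(labels) - conditional_entropy(feature_values, labels)


# Hypothetical binned data: target current level and two candidate features.
current_level = ["hi", "hi", "lo", "lo"]
informative = ["x", "x", "y", "y"]      # separates the target perfectly
uninformative = ["x", "y", "x", "y"]    # carries no information about it
gain_a = information_gain(informative, current_level)
gain_b = information_gain(uninformative, current_level)
```

The feature with the larger gain (here `informative`) would be chosen as the splitting feature at that node.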
(3) Each tree grows to the maximum extent without any pruning.
(4) The generated decision trees form a random forest that predicts the required voltage and current values of a given harmonic; the output is the average of the outputs of the trees in the random forest.
The random forest algorithm is a combined model built on a group of decision trees. The average of the decision tree outputs is taken as the output of the random forest harmonic source model $\bar{h}(x)$, which gives the predicted value of the required harmonic voltage or current:
$$\bar{h}(x) = \frac{1}{T} \sum_{t=1}^{T} h(x, \theta_t)$$
where $\theta_t$ are independent and identically distributed random variables; x is the input harmonic vector; T is the number of decision trees; and $h(x, \theta_t)$ is the harmonic prediction of each decision tree based on x and $\theta_t$. For any new input vector x, the prediction function gives the predicted value $\bar{h}(x)$.
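The averaging step can be shown with a deliberately small sketch in which each "tree" is stood in for by an ordinary Python function. The toy trees and the input vector are hypothetical; a real forest would use trained regression trees in their place.

```python
def forest_predict(trees, x):
    """Average the outputs of all trees for input vector x."""
    outputs = [tree(x) for tree in trees]
    return sum(outputs) / len(outputs)


# Three toy "trees" that disagree slightly about the predicted harmonic current.
toy_trees = [
    lambda x: x[0],
    lambda x: x[0] + 2.0,
    lambda x: x[0] + 4.0,
]
prediction = forest_predict(toy_trees, [1.0])  # mean of 1.0, 3.0 and 5.0
```

Averaging over many trees reduces the variance of the individual predictions, which is why the ensemble is usually more stable than any single unpruned tree.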
(5) Vary the number of trees in the random forest and select the setting with the best effect.
(6) Evaluate the performance of the model: the trained random forest model is used to generate predictions of the harmonic voltage or current amplitude, and the mean absolute error (MAE) and mean squared error (MSE) of the predictions are calculated to assess the accuracy of the model.
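$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{Y}_i - Y_i \right|$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)^2$$
In the formula, $\hat{Y}_i$ and $Y_i$ are the harmonic value predicted by the model and the measured harmonic value, respectively, and n is the number of samples evaluated.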
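Both error measures follow their standard definitions and can be computed as in the sketch below; the measured and predicted values are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted values."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measured and predicted harmonic current amplitudes.
measured = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
mae = mean_absolute_error(measured, predicted)
mse = mean_squared_error(measured, predicted)
```

MSE penalizes large errors more heavily than MAE, so comparing the two gives a rough indication of whether a few samples dominate the error.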
S5: Optimize the random forest model and judge whether it is optimal.
For industrial users, odd harmonics are the dominant feature vectors. For the model, however, the large amount of harmonic data makes training slow and the analysis high-dimensional, and the number of harmonic features used also affects prediction performance. To reduce the dimension of the data analysis and improve the prediction performance of the model, the harmonic feature sets X and Y of the industrial user are screened with the Wrapper method. The basic process is as follows:
The harmonic features are sorted in descending order of their importance (feature_importance) in the random forest model. First, the single most important feature is used for prediction; then the two most important features are used, and so on, and the coefficient of determination $R^2$ of the prediction results is calculated for each number of features.
$$R^2 = \mathrm{SSR} / \mathrm{SST}$$
Figure BDA0003676656640000081
Figure BDA0003676656640000082
Figure BDA0003676656640000083
where SST is the total sum of squares, SSR is the regression sum of squares, and $\bar{Y}$ is the mean of the measured harmonic values. $R^2$ takes values in [0, 1]; the larger the coefficient of determination, the better the independent variables explain the dependent variable and the larger the share of the total variation that is attributable to the independent variables. The coefficients of determination $R^2$ of the prediction results for different numbers of features are shown in fig. 3.
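The screening loop described above can be sketched as follows. The sketch computes R² as SSR/SST as stated in the text (this coincides with the more common 1 − SSE/SST only for least-squares linear fits), and it takes the model-fitting step as a caller-supplied function. The names `select_top_features` and `score_with`, the importance values and the stand-in scores are all hypothetical.

```python
def r_squared(y_true, y_pred):
    """R^2 = SSR / SST, with both sums taken around the measured mean."""
    mean_true = sum(y_true) / len(y_true)
    sst = sum((y - mean_true) ** 2 for y in y_true)
    ssr = sum((p - mean_true) ** 2 for p in y_pred)
    return ssr / sst


def select_top_features(importances, score_with):
    """Rank features by importance, score the top-1, top-2, ... subsets,
    and return the best-scoring subset together with all scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    scores = {n: score_with(ranked[:n]) for n in range(1, len(ranked) + 1)}
    best_n = max(scores, key=scores.get)
    return ranked[:best_n], scores


# Hypothetical importances; in practice they come from the trained forest.
importances = {"V1": 0.30, "I1": 0.25, "V3": 0.20, "I7": 0.15, "I5": 0.10}
# Stand-in scorer: a real one would retrain the forest on the given features
# and return r_squared(measured, predicted) on held-out data.
fake_scores = {1: 0.80, 2: 0.90, 3: 0.95, 4: 0.94, 5: 0.93}
selected, scores = select_top_features(
    importances, lambda feats: fake_scores[len(feats)]
)
```

Passing the scorer in as a function keeps the loop independent of any particular random forest implementation.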
As can be seen from fig. 3, the model predicts best when 6 features are selected, at which point $R^2$ = 0.9876. As more features are selected, the prediction performance stabilizes, and no single additional feature changes it substantially. Accordingly, six features are selected: fundamental voltage, 3rd harmonic voltage, fundamental current, 7th harmonic current, 11th harmonic current and 13th harmonic current. The harmonic voltage and current feature set X' is constructed as follows:
Figure BDA0003676656640000085
where $V_{hj}$ (h = 1, 3; j = 1, 2, ..., N) denotes the j-th sampled value of the h-th harmonic voltage, and $I_{hj}$ (h = 1, 3, 7, 11; j = 1, 2, ..., N) denotes the j-th sampled value of the h-th harmonic current.
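The Wrapper screening that leads to X' can be sketched as a loop over the importance ranking: score the top 1, top 2, ... features and keep the smallest set with the best score. In this sketch the function name `select_top_k`, the feature labels, the importance values and the per-size scores are all hypothetical placeholders. In practice `score_subset` would retrain the random forest on the given features and return its R².

```python
def select_top_k(importances, score_subset):
    """Rank features by importance, score each top-k prefix, return the best prefix."""
    # Feature names in descending order of importance.
    ranked = sorted(importances, key=importances.get, reverse=True)
    scores = []
    for k in range(1, len(ranked) + 1):
        # Score the model that uses only the k most important features.
        scores.append((k, score_subset(ranked[:k])))
    # max() keeps the first maximum, so ties go to the smaller feature count.
    best_k, best_score = max(scores, key=lambda pair: pair[1])
    return ranked[:best_k], best_score, scores


# Hypothetical importances and scores, for illustration only.
importances = {"V1": 0.30, "I1": 0.25, "V3": 0.20, "I7": 0.15, "I5": 0.10}
toy_scores = {1: 0.80, 2: 0.90, 3: 0.95, 4: 0.97, 5: 0.96}


def score_subset(features):
    # Stand-in for "train on these features and return R^2".
    return toy_scores[len(features)]


selected, best_score, scores = select_top_k(importances, score_subset)
print(selected, best_score)
```

Passing the scoring step in as a callback keeps the ranking loop independent of the particular model and metric used.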
If the optimized feature data are not yet optimal, S2 to S4 are repeated; once the optimal feature data are obtained, the random forest model is rebuilt on the optimized harmonic features.
S6: Obtain the optimized random forest harmonic prediction model based on harmonic feature optimization.
The random forest model is rebuilt with the optimized harmonic features on the basis of the optimized feature data. Combined with the screening of the harmonic model, this completes the optimization of the constructed harmonic prediction model and yields the random forest harmonic prediction model based on harmonic feature optimization.
Example 2
A harmonic prediction modeling system based on a random forest optimization algorithm, as shown in fig. 4, includes:
an acquisition module, configured to acquire power quality monitoring data, preprocess the monitoring data and construct a harmonic voltage and current feature set.
Preprocessing the data means cleaning the data and comprises data deletion and abnormal data processing. The constructed harmonic voltage and current feature set is:
Figure BDA0003676656640000091
where N is the number of sampling points in each column; $V_{hj}$ denotes the j-th sampled value of the h-th harmonic voltage; and $I_{hj}$ denotes the j-th sampled value of the h-th harmonic current.
a processing module, configured to sample random forest training samples from the harmonic voltage and current feature set, build regression decision trees from the sampled data, and construct a random forest model from the decision trees.
The random forest training samples are drawn as follows: k harmonic samples are drawn at random, with replacement, from the original harmonic feature sets X and Y, where k is the total number of data points in the harmonic feature set, to form a data set P; this is repeated T times to form the training set Q of the random forest model.
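The sampling step can be sketched with the standard library alone. The symbols k, P, T and Q follow the text; the function name `bootstrap_training_sets`, the fixed seed and the sample values are assumptions made for illustration.

```python
import random


def bootstrap_training_sets(samples, n_trees, seed=0):
    """Build Q: n_trees data sets P, each drawn with replacement from samples."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k = len(samples)           # k: total number of data points
    training_sets = []
    for _ in range(n_trees):   # T repetitions, one data set per tree
        # Draw k indices with replacement, so a sample may appear several times.
        indices = [rng.randrange(k) for _ in range(k)]
        training_sets.append([samples[i] for i in indices])
    return training_sets


# Hypothetical rows: [fundamental voltage, 3rd harmonic voltage].
X = [[220.1, 3.2], [219.8, 3.5], [220.4, 3.1], [220.0, 3.3]]
Q = bootstrap_training_sets(X, n_trees=3)
for P in Q:
    print(P)
```

Each data set P has the same size as the original set, but some rows are repeated and others are left out, which is what makes the individual trees differ from one another.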
The regression decision tree is built as follows: the harmonic voltage and current feature sets X and Y contain k sampling points and M feature variables, where the M feature variables are the input harmonic features. Using the Bootstrap sampling method, k sampling points are drawn with replacement from the original harmonic feature set X, giving a harmonic sample set of k samples. Then m feature variables are randomly selected from the M feature variables to form a harmonic feature variable subset, where m is a fixed value, and the best feature is chosen for splitting by a least-squares deviation function, forming a decision tree.
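A single split of such a regression tree can be sketched as follows: m of the M features are picked at random, and among them the feature and threshold that minimise the summed squared deviation of the two child nodes are chosen. The function names, the threshold rule (split on observed values) and the data are assumptions. A full tree would apply this step recursively to each child node.

```python
import random


def sum_squared_deviation(values):
    """Sum of squared deviations of values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, m, rng):
    """Return (feature index, threshold, cost) of the least-squares split."""
    n_features = len(rows[0])                    # M feature variables
    candidates = rng.sample(range(n_features), m)  # random subset of m features
    best = None
    for f in candidates:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            cost = sum_squared_deviation(left) + sum_squared_deviation(right)
            if best is None or cost < best[2]:
                best = (f, threshold, cost)
    return best


# Hypothetical data: feature 0 separates the targets, feature 1 is constant.
rows = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
targets = [10.0, 10.0, 20.0, 20.0]
print(best_split(rows, targets, m=2, rng=random.Random(0)))
```

Here the constant feature can never produce a valid split, so the split on feature 0 at threshold 2.0, which separates the two target levels exactly, is returned.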
The random forest model is constructed as follows: T decision trees are built. Given M feature variables, $m_t$ variables are randomly selected at each node of each decision tree, and the one with the strongest capability is chosen by computing the information gain; that is, the random forest algorithm searches the $m_t$ selected features for the best split point of each tree and takes the harmonic feature with the largest information gain as the current splitting feature. Each tree is grown to its maximum extent without any pruning. The generated decision trees form a random forest that predicts the voltage and current values of a given harmonic, and the output is the mean of the outputs of the trees in the forest. The number of trees in the forest is varied and the best-performing setting is selected. Finally, the performance of the model is evaluated.
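The last two steps, averaging the tree outputs and choosing the number of trees, can be sketched as below. The trees here are stand-in functions rather than trained decision trees, and the function names and the score table are hypothetical. In practice `evaluate` would train a forest with the given tree count and return its score on held-out data.

```python
def forest_predict(trees, x):
    """Predict for one sample x as the mean of the outputs of all trees."""
    outputs = [tree(x) for tree in trees]
    return sum(outputs) / len(outputs)


def choose_tree_count(candidate_counts, evaluate):
    """Return the candidate tree count with the highest evaluation score."""
    return max(candidate_counts, key=evaluate)


# Stand-in "trees": each maps a feature vector to a predicted amplitude.
trees = [
    lambda x: x[0] + 0.5,
    lambda x: x[0] - 0.5,
    lambda x: x[0],
]
print(forest_predict(trees, [3.0]))

# Hypothetical validation scores per tree count, for illustration only.
hypothetical_scores = {10: 0.91, 50: 0.96, 100: 0.95}
print(choose_tree_count([10, 50, 100], hypothetical_scores.get))
```

Averaging lets the over- and under-estimates of individual trees cancel, which is the reason the text gives for the improved prediction accuracy.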
an optimization module, configured to optimize the random forest model to obtain a random forest harmonic prediction model based on harmonic feature optimization.
The random forest model is optimized as follows: the importance of each harmonic feature in the random forest model is sorted in descending order; first the single most important feature is used for prediction, then the two most important features, and so on, and the coefficient of determination of the prediction result is calculated for each number of features; the features that give the best coefficient of determination are then selected to construct the harmonic voltage and current feature set, yielding the optimized feature data.
The optimized random forest harmonic prediction model based on harmonic feature optimization is obtained as follows: the random forest model is rebuilt with the optimized harmonic features on the basis of the optimized feature data; combined with the screening of the harmonic model, this completes the optimization of the constructed harmonic prediction model and yields the random forest harmonic prediction model based on harmonic feature optimization.
The invention is built around the random forest algorithm, which gives high modeling efficiency: the decision trees need to be built only once and can then be reused, and the weak generalization and tendency to overfit of a single decision tree are overcome. The method performs well on the harmonic feature set and can handle very high-dimensional harmonic voltage and current data without prior feature selection; after training, the random forest algorithm can identify which harmonic features are important; and it is highly tolerant of noise and harmonic outliers. Because the harmonic prediction is the mean of the outputs of all trees in the forest, the prediction accuracy is improved.
The 24-hour monitoring data in the worked example are analyzed with the Wrapper method: after the importance of each harmonic feature is obtained, the prediction performance for different numbers of harmonic features is compared and the required harmonic features are determined. This effectively extracts the high-importance harmonic features and reduces the dimension of the harmonic feature data, so that the model trains quickly, is relatively simple to implement, and gains in both prediction accuracy and efficiency.
The invention realizes the prediction of harmonic amplitude. It helps analyze the interactions among the harmonic voltages and currents of industrial users, and provides theoretical support for preventing harmonic pollution, for detecting the relationships among the harmonic voltages and currents, and for analyzing the factors that influence the power quality characteristics of the power system.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed.

Claims (9)

1. A harmonic prediction modeling method based on a random forest optimization algorithm is characterized by comprising the following steps: acquiring power quality monitoring data, preprocessing the monitoring data, and constructing a harmonic voltage current feature set; sampling random forest training samples according to the harmonic voltage current feature set; establishing a regression decision tree according to the sampling sample; constructing a random forest model according to the decision tree; optimizing the random forest model; and obtaining a random forest harmonic prediction model based on harmonic characteristic optimization after optimization.
2. The harmonic prediction modeling method based on the random forest optimization algorithm as claimed in claim 1, wherein the preprocessing of the data is data cleaning, and comprises data deletion and abnormal data processing.
3. The harmonic prediction modeling method based on the random forest optimization algorithm as claimed in claim 1, wherein the set of constructed harmonic voltage current characteristics is:
Figure FDA0003676656630000011
wherein N represents the number of sampling points in each column; $V_{hj}$ represents the j-th sampled value of the h-th harmonic voltage; and $I_{hj}$ represents the j-th sampled value of the h-th harmonic current.
4. The harmonic prediction modeling method based on the random forest optimization algorithm as claimed in claim 1, wherein the random forest training samples are sampled as follows: k harmonic samples are drawn at random, with replacement, from the original harmonic feature sets X and Y, wherein k is the total number of data points in the harmonic feature set, to form a data set P; and this is repeated T times to form a training set Q of the random forest model.
5. The harmonic prediction modeling method based on the random forest optimization algorithm as claimed in claim 1, wherein the regression decision tree is established as follows: the harmonic voltage and current feature sets X and Y have k sampling points and M feature variables, wherein the M feature variables are the input harmonic features; k sampling points are drawn with replacement from the original harmonic feature set X by the Bootstrap sampling method, finally giving a harmonic sample set of k samples; and m feature variables are randomly selected from the M feature variables to form a harmonic feature variable subset, wherein m is a fixed value, and the optimal feature is selected for splitting by a least-squares deviation function to form a decision tree.
6. The harmonic prediction modeling method based on the random forest optimization algorithm as claimed in claim 1, wherein the method for constructing the random forest model comprises the following steps: constructing T decision trees; given M feature variables, randomly selecting $m_t$ variables at each node of each decision tree and choosing the one with the strongest capability by calculating the information gain, wherein the random forest algorithm finds the best split point of each tree among the $m_t$ selected features and selects the harmonic feature with the largest information gain as the current splitting feature; growing each tree to the maximum extent without any pruning; forming a random forest from the generated decision trees to predict the voltage and current values of a given harmonic, the output result being determined by the mean of the outputs of the trees in the random forest; varying the number of trees in the random forest and selecting the best-performing setting; and evaluating the performance of the model.
7. The harmonic prediction modeling method based on the random forest optimization algorithm as claimed in claim 1, wherein the method for optimizing the random forest model comprises the following steps: sorting the importance of each harmonic feature in the random forest model in descending order; first using the single most important feature for prediction, then the two most important features, and so on, and calculating the coefficient of determination of the prediction result for each number of features; and selecting the features that give the best coefficient of determination to construct a harmonic voltage and current feature set, obtaining optimized feature data.
8. The harmonic prediction modeling method based on the random forest optimization algorithm as claimed in claim 1, wherein the method for obtaining the harmonic feature optimization-based random forest harmonic prediction model after optimization is as follows: reconstructing a random forest model with optimized harmonic characteristics based on the optimized characteristic data; the optimization of the constructed harmonic prediction model can be completed by combining the screening of the harmonic model; and obtaining a random forest harmonic prediction model based on harmonic characteristic optimization.
9. A harmonic prediction modeling system based on a random forest optimization algorithm is characterized by comprising the following steps: the acquisition module is used for acquiring power quality monitoring data, preprocessing the monitoring data and constructing a harmonic voltage current feature set; the processing module is used for sampling random forest training samples according to the harmonic voltage current feature set; establishing a regression decision tree according to the sampling sample; constructing a random forest model according to the decision tree; and the optimization module is used for optimizing the random forest model to obtain a random forest harmonic prediction model based on harmonic characteristic optimization.
CN202210620824.3A 2022-06-02 2022-06-02 Harmonic prediction modeling method and system based on random forest optimization algorithm Pending CN114880948A (en)


Publications (1)

Publication Number Publication Date
CN114880948A true CN114880948A (en) 2022-08-09

Family

ID=82679158


Country Status (1)

Country Link
CN (1) CN114880948A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination