CN116245019A - Load prediction method, system, device and storage medium based on Bagging sampling and improved random forest algorithm - Google Patents

Info

Publication number
CN116245019A
Authority
CN
China
Prior art keywords
random forest
algorithm
bagging
training
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310077201.0A
Other languages
Chinese (zh)
Inventor
李亚飞
刘乙
钱科军
郑众
谢鹰
张显楚
宋杰
陈嘉栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
State Grid Electric Power Research Institute
Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
State Grid Electric Power Research Institute
Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd, NARI Nanjing Control System Co Ltd, State Grid Electric Power Research Institute, Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202310077201.0A
Publication of CN116245019A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003: Load forecast, e.g. methods or systems for forecasting future load demand
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00: Details relating to CAD techniques
    • G06F2111/08: Probabilistic or stochastic CAD
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00: Details relating to the application field
    • G06F2113/04: Power grid distribution networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Geometry (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Power Engineering (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a load prediction method, system, device and storage medium based on Bagging sampling and an improved random forest algorithm, belonging to the technical field of intelligent power grids and intelligent power consumption. The method comprises: acquiring a historical load value; and inputting the historical load value into a constructed random forest model to obtain a prediction result. CART decision trees are constructed on the basis of the Bagging sampling method, so that abnormal and redundant information is handled more accurately and comprehensively during data processing and only the effective data is passed to the next operation, which reduces the amount of computation. The random forest model built from this combination of algorithms inherits the advantages of each, making short-term load prediction for the power system more intelligent and effectively improving its accuracy.

Description

Load prediction method, system, device and storage medium based on Bagging sampling and improved random forest algorithm
Technical Field
The invention relates to a load prediction method, a system, a device and a storage medium based on Bagging sampling and an improved random forest algorithm, and belongs to the technical field of intelligent power grids and intelligent power utilization.
Background
Power load prediction is one of the indispensable tasks of an electric power department. Accurate prediction supports the efficient and safe operation of the power system: maintenance plans can be arranged safely, the starting and stopping of generator sets can be controlled efficiently and accurately, and avoidable faults and accidents are reduced. It also improves social and economic benefits while keeping generation costs as low as possible, and helps ensure the normal functioning of society.
Scholars and experts at home and abroad have carried out extensive research on the theory of short-term power load prediction, and many well-performing model algorithms have been applied in this field, so that short-term load prediction has entered a period of rapid development. Short-term load prediction methods are generally divided into two main types: traditional classical prediction methods and modern intelligent prediction methods. Traditional classical methods are simple in principle but highly limited; their precision is often low and their errors large. With the development of artificial intelligence, modern intelligent methods offer very strong data processing capability and greatly improve the accuracy of power load prediction, but this modelling power is often accompanied by a large amount of computation.
In the prior art, power system load is predicted using the traditional time series, regression analysis and trend extrapolation methods, as well as intelligent prediction methods such as artificial neural network algorithms, wavelet analysis and fuzzy theory. These approaches generally use a single algorithm, which entails a heavy workload and complex calculation, gives low prediction accuracy for power system load, and produces many kinds of misjudgment.
Disclosure of Invention
The invention aims to provide a load prediction method, a system, a device and a storage medium based on Bagging sampling and an improved random forest algorithm, which solve the problems of low prediction accuracy, large calculated amount and the like in the prior art.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:
In a first aspect, the present invention provides a load prediction method based on Bagging sampling and an improved random forest algorithm, including:
acquiring a historical load value;
inputting the historical load value into the constructed random forest model to obtain a prediction result;
the random forest model is constructed by the following method:
acquiring an original sample set, and randomly sampling the original sample set without replacement using the Bagging algorithm to generate a plurality of training sets;
training each training set to obtain a corresponding CART decision tree;
all CART decision trees are assembled together to form a random forest model.
With reference to the first aspect, further, the historical load values include the 96-point load values and environmental data of the day before and of the day seven days before the forecast day.
With reference to the first aspect, further, randomly sampling the original sample set without replacement using the Bagging algorithm to generate a plurality of training sets includes:
randomly drawing N training samples (d1, d2, ..., dN) from the original sample set using the Bootstrap method, and performing N cycles to obtain N training sets that are mutually uncorrelated.
With reference to the first aspect, further, the training each training set to obtain a corresponding CART decision tree includes:
dividing the training set into two subsets using the CART algorithm and continuing to divide recursively so that every generated non-leaf node has two branches, wherein nodes are split according to the minimum Gini index principle, and the Gini index of each node split is expressed as:
$$\mathrm{Gini}_{split}(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
where $D$ is the set before splitting, $D_1$ and $D_2$ are the two subsets after splitting, $\mathrm{Gini}(D_1)$ is the Gini index of $D_1$, $\mathrm{Gini}(D_2)$ is the Gini index of $D_2$, and $\mathrm{Gini}_{split}(D)$ is the Gini index of $D$ under this split.
With reference to the first aspect, further, in the constructed random forest model, the historical load values are tested and classified by the plurality of CART decision trees, and the final classification is obtained according to a preset proportion, thereby obtaining the prediction result.
With reference to the first aspect, further, constructing the random forest model also includes a parameter-setting step:
setting the feature evaluation criterion, the maximum number of weak learners, the maximum number of features, the maximum depth of the decision trees, the minimum number of samples required to split an internal node, and the minimum number of samples at a leaf node in the random forest model.
In a second aspect, the present invention further provides a load prediction system based on Bagging sampling and an improved random forest algorithm, including:
a data acquisition module, configured to acquire a historical load value;
a load prediction module, configured to input the historical load value into the constructed random forest model to obtain a prediction result;
a model building unit, configured to construct the random forest model by the following method:
acquiring an original sample set, and randomly sampling the original sample set without replacement using the Bagging algorithm to generate a plurality of training sets;
training each training set to obtain a corresponding CART decision tree;
all CART decision trees are assembled together to form a random forest model.
In a third aspect, the invention also provides a load prediction device based on Bagging sampling and improved random forest algorithm, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is operative according to the instructions to perform the steps of the method according to any one of the first aspects.
In a fourth aspect, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.
Compared with the prior art, the invention has the following beneficial effects:
The load prediction method, system, device and storage medium based on Bagging sampling and an improved random forest algorithm provided by the invention construct CART decision trees on the basis of the Bagging sampling method, so that abnormal and redundant information is handled more accurately and comprehensively during data processing and only the effective data is passed to the next operation, which reduces the amount of computation. A random forest algorithm is then applied: samples are repeatedly drawn from the training sample set of historical data to form new training sample sets, and a different CART decision tree is trained on each set. These trees together form a random forest, and the classification of a new input is determined by combining the results of all CART decision trees. Because the decision trees are mutually uncorrelated, the model is flexible, and the prediction model built from this combination of algorithms inherits the advantages of each, making short-term load prediction for the power system more intelligent and effectively improving its accuracy.
Drawings
FIG. 1 is a flow chart of a load prediction method based on Bagging sampling and an improved random forest algorithm provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a Bagging sampling method provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a Bagging sampling method provided by an embodiment of the present invention;
FIG. 4 is a block diagram of a random forest algorithm provided by an embodiment of the present invention;
FIG. 5 is an example diagram of short-term load prediction for a power system according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, and the following examples are only for more clearly illustrating the technical aspects of the present invention, and are not to be construed as limiting the scope of the present invention.
Example 1
As shown in fig. 1, the load prediction method based on Bagging sampling and improved random forest algorithm provided by the embodiment of the invention includes:
The invention discloses a short-term load prediction method for a power system. The inputs are the 96-point load values of the day before and of the day seven days before the forecast day, together with the weather conditions of the day, such as the maximum temperature, minimum temperature, average temperature, relative humidity and rainfall. On this basis, CART decision trees are constructed using the Bagging sampling method and a random forest algorithm is applied. The four steps shown in the figure are described below:
S1, generating training sets by random sampling without replacement using the Bagging algorithm. A schematic diagram of the Bagging sampling method is shown in FIG. 2 and its flow chart in FIG. 3. The specific method is as follows:
randomly drawing N training samples (d1, d2, ..., dN) from the original sample set using the Bootstrap method, and performing N cycles to obtain N training sets that are mutually uncorrelated;
each training set is used to train one model, so that n training sets yield n decision tree models;
for a classification problem with input variable x, the result is determined by a vote over the outputs of the n decision trees.
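The sampling and voting in S1 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation used in the patent. The function names `draw_training_sets` and `majority_vote` and the `size`, `replace` and `seed` parameters are hypothetical. The text describes sampling without replacement, whereas the Bootstrap method conventionally draws with replacement, so the sketch exposes that choice as a flag instead of fixing it.

```python
import random
from collections import Counter


def draw_training_sets(samples, n_sets, size, replace=True, seed=0):
    """Draw n_sets training sets of `size` samples each from `samples`.

    replace=True  -> conventional Bootstrap (duplicates allowed).
    replace=False -> sampling without replacement (size must not exceed
                     len(samples), otherwise random.sample raises ValueError).
    """
    rng = random.Random(seed)  # seeded so that the draw is reproducible
    training_sets = []
    for _ in range(n_sets):
        if replace:
            drawn = [rng.choice(samples) for _ in range(size)]
        else:
            drawn = rng.sample(samples, size)
        training_sets.append(drawn)
    return training_sets


def majority_vote(labels):
    """Return the label output by the most trees (first seen wins a tie)."""
    return Counter(labels).most_common(1)[0][0]
```

Each returned training set would then be passed to a tree learner, and `majority_vote` would be applied to the labels that the resulting trees output for one input x.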
S2, generating the meta decision tree classifiers of the random forest by constructing a CART decision tree from each training set. The specific steps are as follows:
the original sample set is divided into two subsets using the CART algorithm, so that every non-leaf node has two branches. Nodes are split according to the minimum Gini index rule. The Gini index of a probability distribution is given by:
$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$
where $K$ is the total number of classes in the data and $p_k$ is the probability that a sample in the node belongs to class $k$.
The Gini index of a sample set $D$ is calculated as:
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$$
where $C_k$ is the subset of samples in $D$ that belong to the $k$-th class.
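The Gini index of each node split is expressed as:
$$\mathrm{Gini}_{split}(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
where $D$ is the set before splitting, $D_1$ and $D_2$ are the two subsets after splitting, $\mathrm{Gini}(D_1)$ is the Gini index of $D_1$, $\mathrm{Gini}(D_2)$ is the Gini index of $D_2$, and $\mathrm{Gini}_{split}(D)$ is the Gini index of $D$ under this split.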
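The Gini calculations used for node splitting can be illustrated with a short sketch. It assumes the standard CART definitions: the Gini index of a set is one minus the sum of squared class proportions, and the Gini index of a split is the size-weighted sum of the Gini indices of the two subsets. The function names and the single-feature threshold scan in `best_threshold` are hypothetical simplifications for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini index of a set: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """Gini index of a binary split: size-weighted Gini of the two subsets."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Scan thresholds on one numeric feature and keep the minimum Gini split.

    Returns (threshold, gini_split), or None if the feature has one value.
    """
    best = None
    for t in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = gini_split(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

For example, with feature values `[1, 2, 3, 4]` and labels `["a", "a", "b", "b"]`, the threshold 2 separates the classes completely and gives a split Gini index of 0.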
The CART algorithm (a binary recursive partitioning method) can also handle data containing missing values or outliers, and the binary trees it generates are simple, easy to understand and highly accurate. It can handle both discrete and continuous variables and is widely applicable.
S3, setting parameters and constructing the random forest model and algorithm. A structure diagram of the random forest algorithm is shown in FIG. 4. The specific steps are as follows:
drawing N samples from the original samples using the Bootstrap method to form a training set, and repeating this N times to obtain N mutually uncorrelated training sets; training on the N training sets to obtain N CART decision trees; and setting aside the samples that were not drawn to form m sets of out-of-bag data;
at each node of each CART decision tree, randomly selecting m attributes as candidate splitting attributes, and splitting on the optimal attribute chosen according to the information content of the candidates, until splitting on the optimal feature reaches the leaf nodes;
assembling the trained CART decision trees into a random forest model. For classification, the outputs of the plurality of CART decision trees are combined according to a preset proportion to obtain the final class; for regression problems, the outputs are averaged to obtain the final prediction result.
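Three ingredients of S3 can be sketched in a few lines: setting aside out-of-bag samples, choosing m candidate attributes at a node, and averaging tree outputs for regression. This is an illustrative sketch under stated assumptions. The function names are hypothetical, the draw is made with replacement (the conventional Bootstrap, which is what leaves some samples out of the bag), and each "tree" is represented simply as a callable.

```python
import random
from statistics import mean


def bootstrap_with_oob(n_samples, rng):
    """Return (in-bag indices, out-of-bag indices) for one tree."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def candidate_features(n_features, m, rng):
    """Pick m distinct feature indices to consider when splitting one node."""
    return sorted(rng.sample(range(n_features), m))


def forest_predict(trees, x):
    """Regression forest output: the average of the individual tree outputs."""
    return mean(tree(x) for tree in trees)
```

The out-of-bag indices can be used to estimate the error of a tree on samples it was not trained on, without holding out a separate validation set.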
In addition, the parameters in S3 are set as follows:
six hyperparameters are chosen in the experiment: the feature evaluation criterion, the maximum number of weak learners, the maximum number of features, the maximum depth of the decision trees, the minimum number of samples required to split an internal node, and the minimum number of samples at a leaf node.
The feature evaluation criterion is the measure of split quality; the mean squared error, the mean absolute error and the like can generally be selected. The maximum number of weak learners is the total number of trees in the forest. The maximum number of features is the number of attributes considered when searching for the best split at a node. The maximum depth of the decision trees is the greatest depth to which each tree may be split. The minimum number of samples required to split an internal node is the smallest number of samples an internal node must contain before it can be split. The minimum number of samples at a leaf node is the smallest number of samples that a leaf node must contain.
The initial hyperparameter settings are: feature evaluation criterion, mean absolute error; maximum number of weak learners, 800; maximum depth of the decision trees, 60; minimum number of samples required to split an internal node, 4; minimum number of samples at a leaf node, 4.
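For readers who want to reproduce these settings, the initial values can be written as a configuration dictionary. The text does not name a library; the keys below are the argument names of scikit-learn's `RandomForestRegressor`, and mapping the patent's hyperparameters onto them is an assumption. The maximum number of features is omitted because the text gives no initial value for it.

```python
# Initial hyperparameter values stated in the text. The key names follow
# scikit-learn's RandomForestRegressor arguments (assumed mapping, since
# the text does not say which library was used).
initial_params = {
    "criterion": "absolute_error",  # feature evaluation criterion: mean absolute error
    "n_estimators": 800,            # maximum number of weak learners (trees)
    "max_depth": 60,                # maximum depth of each decision tree
    "min_samples_split": 4,         # minimum samples needed to split an internal node
    "min_samples_leaf": 4,          # minimum samples required at a leaf node
}
```

Under that assumption, the dictionary could be unpacked into the estimator constructor with `**initial_params`.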
S4, inputting the 96-point load values and obtaining the prediction result as the average of the predicted values of the individual trees. An example of short-term load prediction for the power system is shown in FIG. 5. The specific steps are as follows:
In the experiment, the inputs are the 96-point load values of the day before and of the day seven days before the forecast day, the weather data of the day (maximum temperature, minimum temperature, average temperature, relative humidity and rainfall), and a 1-dimensional date variable marking working day or weekend, giving 198 input dimensions in total (96 + 96 + 5 + 1). The output is the 96-dimensional vector of the 96 load points of the day to be predicted. A total of 250 days of data is used; when building the random forest model, 90% serves as the training set and 10% as the test set.
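The input layout can be checked with a small sketch that assembles one feature vector and splits the 250 days. The function names, the ordering of the weather values and the 0/1 encoding of the working-day flag are hypothetical, and the text does not say whether the 90%/10% split is chronological or random; a chronological split is assumed here.

```python
def build_features(load_prev_day, load_week_ago, weather, is_workday):
    """Concatenate one day's inputs into a single flat feature vector.

    weather is assumed to hold five values: maximum temperature, minimum
    temperature, average temperature, relative humidity and rainfall.
    """
    if len(load_prev_day) != 96 or len(load_week_ago) != 96:
        raise ValueError("each load curve must have 96 points")
    if len(weather) != 5:
        raise ValueError("expected 5 weather values")
    day_flag = [1.0 if is_workday else 0.0]
    return list(load_prev_day) + list(load_week_ago) + list(weather) + day_flag


def chronological_split(days, train_fraction=0.9):
    """Use the earliest days for training and the latest days for testing."""
    cut = round(len(days) * train_fraction)
    return days[:cut], days[cut:]
```

With 250 days of data, this split gives 225 training days and 25 test days, and each feature vector has 96 + 96 + 5 + 1 = 198 entries.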
The parameters of the estimator are then optimized by grid search with cross-validation to obtain the best learning algorithm. The candidate values of the parameters are combined in every possible way, and the resulting combinations form a grid. Each combination is then used for SVM (support vector machine) training, and its performance is evaluated by cross-validation. Once the fitting function has tried all parameter combinations, it returns a suitable classifier automatically tuned to the best combination. The optimal parameters found in this way are: feature evaluation criterion, mean absolute error; maximum number of weak learners, 1800; maximum depth of the decision trees, 20; minimum number of samples required to split an internal node, 2; minimum number of samples at a leaf node, 2. Finally, the prediction result is obtained by running the model.
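The grid search with cross-validation described above can be sketched generically. This is an illustrative, library-free version. The names `k_fold_indices`, `grid_search`, `toy_fit` and `mae` are hypothetical, and the toy model (a scaled mean predictor) is only a stand-in for the real estimator so that the sketch can run on its own.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(param_grid, fit, error, X, y, k=5):
    """Try every parameter combination; return (best_params, best_cv_error)."""
    names = sorted(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_errors = []
        for fold in k_fold_indices(len(X), k):
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            fold_errors.append(error(model, [X[i] for i in fold], [y[i] for i in fold]))
        score = mean(fold_errors)  # cross-validated error of this combination
        if best is None or score < best[1]:
            best = (params, score)
    return best


def toy_fit(X, y, shrink):
    """Stand-in model: predicts the training mean scaled by `shrink`."""
    level = mean(y) * shrink
    return lambda x: level


def mae(model, X, y):
    """Mean absolute error of `model` on the held-out fold."""
    return mean(abs(model(x) - target) for x, target in zip(X, y))
```

Calling `grid_search({"shrink": [0.5, 1.0]}, toy_fit, mae, X, y)` on data with a constant target selects `shrink = 1.0`, since that combination has zero cross-validated error. In practice `fit` would train the random forest and the grid would hold candidate values for the six hyperparameters.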
The main theoretical foundations of the random forest algorithm in step S3 are as follows:
(1) Margin function:
a random forest consists of a series of decision trees $h(x, \theta_k)$, and its margin function is:
$$K(X, Y) = \mathrm{av}_k\, I\big(h(X, \theta_k) = Y\big) - \max_{j \neq Y} \mathrm{av}_k\, I\big(h(X, \theta_k) = j\big)$$
where $X$ is the input vector, which may belong to at most $J$ different classes, and $j$ is one of these $J$ classes; $Y$ is the correct classification vector; $\mathrm{av}_k(\cdot)$ is the averaging function; and $I(\cdot)$ is the indicator function.
(2) Generalization error:
the generalization error of the forest is:
$$PE^{*} = P_{X,Y}\big(K(X, Y) < 0\big)$$
where $P_{X,Y}$ is the classification error rate function for a given input $X$.
K (X, Y) < 0 represents the misclassification of the test input X, and the generalization error represents the probability of misclassification of the input by the model. Namely, the generalization error can reflect the quality of the classification result of the random forest on the test sample. If the generalization error is smaller, the expected error of the mathematical model is smaller, and the classification result is better.
(3) Strength:
The classification performance of a random forest depends on that of its meta-classifiers; the strength of the random forest is the aggregate measure of their classification performance. If all meta-classifiers perform well, the random forest classifies well too. The strength of the random forest is:
$$S = E_{X,Y}\big(K(X, Y)\big)$$
where $E_{X,Y}$ is the expectation.
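The margin, generalization error and strength can be estimated empirically from the votes of the trees on a labelled test set. The sketch below assumes the usual definition of the margin K(X, Y): the share of trees voting for the correct class minus the largest share voting for any single wrong class. The function names are hypothetical, and each entry of `all_votes` is the list of class labels output by the trees for one test input.

```python
from statistics import mean


def margin(votes, true_label):
    """K(X, Y) for one input: correct vote share minus the largest wrong share."""
    n = len(votes)
    correct = sum(v == true_label for v in votes) / n
    wrong_labels = {v for v in votes if v != true_label}
    wrong = max((sum(v == j for v in votes) / n for j in wrong_labels), default=0.0)
    return correct - wrong


def generalization_error(all_votes, labels):
    """Empirical PE*: the fraction of test inputs with a negative margin."""
    return mean(1.0 if margin(v, y) < 0 else 0.0 for v, y in zip(all_votes, labels))


def strength(all_votes, labels):
    """Empirical strength S: the average margin over the test inputs."""
    return mean(margin(v, y) for v, y in zip(all_votes, labels))
```

For instance, four trees voting `["a", "a", "b", "c"]` on an input whose true class is `"a"` give a margin of 0.5 - 0.25 = 0.25, so that input is classified correctly.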
(4) Upper bound of the generalization error:
$$PE^{*} \leq \frac{\bar{\rho}\,(1 - s^2)}{s^2}$$
where $\bar{\rho}$ is the average correlation coefficient between the decision trees and $s$ is their average strength.
The smaller the upper bound of the generalization error, the better the generalization. The expression shows that the upper bound depends on two characteristics of the decision trees: the larger their average strength and the smaller their average correlation coefficient, the better the generalization capability of the random forest. The prediction accuracy of a random forest can therefore be improved by increasing the strength of the meta-classifier decision trees and reducing the average correlation coefficient between them.
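A short numerical check illustrates how the two quantities move the bound. It assumes the bound has the standard form from random forest theory, the average correlation multiplied by (1 - s^2) / s^2, where s is the average strength. The function name and the example values are hypothetical.

```python
def generalization_error_bound(mean_correlation, strength):
    """Upper bound on PE*: mean_correlation * (1 - s**2) / s**2 (assumed form)."""
    if strength <= 0:
        raise ValueError("the bound requires positive strength")
    return mean_correlation * (1 - strength ** 2) / strength ** 2


# Hypothetical values: a baseline forest, then lower correlation, then higher strength.
baseline = generalization_error_bound(0.3, 0.6)          # about 0.533
less_correlated = generalization_error_bound(0.1, 0.6)   # about 0.178
stronger = generalization_error_bound(0.3, 0.8)          # about 0.169
```

Both changes tighten the bound, which matches the conclusion that stronger and less correlated trees generalize better.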
Example 2
The embodiment of the invention provides a load prediction system based on Bagging sampling and an improved random forest algorithm, which comprises:
a data acquisition module, configured to acquire a historical load value;
a load prediction module, configured to input the historical load value into the constructed random forest model to obtain a prediction result;
a model building unit, configured to construct the random forest model by the following method:
acquiring an original sample set, and randomly sampling the original sample set without replacement using the Bagging algorithm to generate a plurality of training sets;
training each training set to obtain a corresponding CART decision tree;
all CART decision trees are assembled together to form a random forest model.
Example 3
The load prediction device based on Bagging sampling and improved random forest algorithm provided by the embodiment of the invention comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the steps of the method of:
acquiring a historical load value;
inputting the historical load value into the constructed random forest model to obtain a prediction result;
the random forest model is constructed by the following method:
acquiring an original sample set, and randomly sampling the original sample set without replacement using the Bagging algorithm to generate a plurality of training sets;
training each training set to obtain a corresponding CART decision tree;
all CART decision trees are assembled together to form a random forest model.
Example 4
The embodiment of the invention provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the method of:
acquiring a historical load value;
inputting the historical load value into the constructed random forest model to obtain a prediction result;
the random forest model is constructed by the following method:
acquiring an original sample set, and randomly sampling the original sample set without replacement using the Bagging algorithm to generate a plurality of training sets;
training each training set to obtain a corresponding CART decision tree;
all CART decision trees are assembled together to form a random forest model.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (9)

1. A load prediction method based on Bagging sampling and an improved random forest algorithm is characterized by comprising the following steps:
acquiring a historical load value;
inputting the historical load value into the constructed random forest model to obtain a prediction result;
the random forest model is constructed by the following method:
acquiring an original sample set, and randomly sampling the original sample set without replacement using the Bagging algorithm to generate a plurality of training sets;
training each training set to obtain a corresponding CART decision tree;
all CART decision trees are assembled together to form a random forest model.
2. The load prediction method based on Bagging sampling and an improved random forest algorithm according to claim 1, wherein the historical load values include the 96-point load values and environmental data for each of the day before and seven days before the predicted day.
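The input described in claim 2 can be pictured as one flat feature vector. The following is a minimal sketch, assuming the 96 points are evenly spaced over a day (one reading every 15 minutes) and that the load curves and environmental data are held in dictionaries keyed by an integer day index. The function name, the containers, and the `lags` parameter are hypothetical and do not come from the application.

```python
from typing import Dict, List, Sequence

# 96 evenly spaced points per day would mean one reading every 15 minutes.
POINTS_PER_DAY = 96


def build_feature_vector(
    loads: Dict[int, Sequence[float]],
    env: Dict[int, Sequence[float]],
    predicted_day: int,
    lags: Sequence[int] = (1, 7),
) -> List[float]:
    """Concatenate the load curve and environmental data of each lagged day.

    `loads` and `env` are hypothetical containers keyed by day index;
    `lags` lists how many days before the predicted day to look back.
    """
    features: List[float] = []
    for lag in lags:
        day = predicted_day - lag
        curve = loads[day]
        if len(curve) != POINTS_PER_DAY:
            raise ValueError(
                f"day {day}: expected {POINTS_PER_DAY} points, got {len(curve)}"
            )
        features.extend(curve)     # 96 load values of that day
        features.extend(env[day])  # environmental data of that day
    return features


# Illustrative data only: constant load curves and two environmental values per day.
loads = {d: [float(d)] * POINTS_PER_DAY for d in range(10)}
env = {d: [20.0 + d, 0.5] for d in range(10)}
x = build_feature_vector(loads, env, predicted_day=8)
```

With two environmental values per day, the vector has 2 × (96 + 2) entries: the block for seven days before comes first here only because of the order of `lags`.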
3. The load prediction method based on Bagging sampling and an improved random forest algorithm according to claim 1, wherein randomly sampling from the original sample set without replacement by using the Bagging algorithm to generate a plurality of training sets comprises:
randomly drawing n training samples (d1, d2, ..., dN) from the original sample set by using the bootstrap method, and performing N rounds to obtain N training sets, the training sets being mutually uncorrelated.
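The sampling step can be sketched as follows. Claim 1 speaks of sampling without replacement, whereas the bootstrap method named in claim 3 conventionally samples with replacement, so this sketch exposes both behaviours through a flag rather than settling on one reading. The function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_training_sets(
    samples: Sequence[T],
    n_sets: int,
    set_size: int,
    replace: bool = False,
    seed: int = 0,
) -> List[List[T]]:
    """Draw `n_sets` random training sets of `set_size` samples each.

    replace=False: each set is drawn without replacement (no duplicates in a set).
    replace=True:  each set is a conventional bootstrap resample.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    training_sets: List[List[T]] = []
    for _ in range(n_sets):
        if replace:
            drawn = rng.choices(samples, k=set_size)
        else:
            drawn = rng.sample(samples, set_size)
        training_sets.append(drawn)
    return training_sets


original = list(range(100))  # stand-in for the original sample set
sets_without = make_training_sets(original, n_sets=5, set_size=60)
sets_with = make_training_sets(original, n_sets=5, set_size=100, replace=True)
```

Each round draws independently from the same original set, which is what keeps the resulting training sets from depending on one another.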
4. The load prediction method based on Bagging sampling and an improved random forest algorithm according to claim 1, wherein training on each training set to obtain a corresponding CART decision tree comprises:
dividing the training set into two subsets by using the CART algorithm, and recursively continuing the division so that each generated non-leaf node has two branches, wherein nodes are split according to the minimum Gini index principle, and the Gini index of each node split is expressed as follows:
Figure FDA0004066462940000021
where D is the set before splitting, D1 and D2 are the two subsets after splitting, Gini(D1) is the Gini index of D1, Gini(D2) is the Gini index of D2, and Gini_split(D) is the Gini index of D after splitting.
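The formula image is not reproduced in this text. The sketch below assumes the standard CART weighted form, which is consistent with the symbols defined above: Gini_split(D) = |D1|/|D| · Gini(D1) + |D2|/|D| · Gini(D2), with Gini(D) = 1 − Σ p_k² over the class proportions p_k in D. The threshold search over midpoints of a single feature is an illustrative choice, and all function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini index of a label set: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Size-weighted Gini index of splitting D into D1 (left) and D2 (right)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(
    values: Sequence[float], labels: Sequence[Hashable]
) -> Tuple[float, float]:
    """Return (threshold, Gini_split) with the smallest Gini_split for one feature.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best = (float("nan"), float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = gini_split(left, right)
        if score < best[1]:
            best = (t, score)
    return best


threshold, score = best_threshold([1.0, 2.0, 3.0, 4.0], ["lo", "lo", "hi", "hi"])
```

A full CART tree would apply `best_threshold` to every candidate feature at a node, split on the feature with the smallest score, and recurse on both subsets until a stopping condition is met.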
5. The load prediction method based on Bagging sampling and an improved random forest algorithm according to claim 1, wherein, in the constructed random forest model, the historical load values are tested and classified by the plurality of CART decision trees, and the final classification is determined according to a preset proportion, so as to obtain the prediction result.
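The claim does not spell out how the preset proportion is applied. One possible reading, sketched here purely as an assumption, is that every tree votes for a class and the most-voted class is accepted only if its share of the votes reaches the preset proportion. The function name, the `min_share` parameter, and the `None` fallback are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def forest_predict(
    trees: Sequence[Callable[[object], Hashable]],
    x: object,
    min_share: float = 0.5,
) -> Optional[Hashable]:
    """Aggregate per-tree classifications by voting.

    Returns the most-voted class if its vote share is at least `min_share`
    (the assumed "preset proportion"), otherwise None.
    """
    if not trees:
        raise ValueError("the forest contains no trees")
    votes = Counter(tree(x) for tree in trees)
    label, count = votes.most_common(1)[0]
    return label if count / len(trees) >= min_share else None


# Stand-in trees: each is just a function returning a class label.
trees = [lambda x: "high", lambda x: "high", lambda x: "low"]
accepted = forest_predict(trees, x=None)                 # 2 of 3 votes, share 0.67
rejected = forest_predict(trees, x=None, min_share=0.8)  # share below 0.8
```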
6. The load prediction method based on Bagging sampling and an improved random forest algorithm according to claim 1, wherein the process of constructing the random forest model further comprises a parameter setting step:
setting, in the random forest model, the feature evaluation criterion, the maximum number of weak learners, the maximum number of features, the maximum depth of the decision trees, the minimum number of samples required to split an internal node, and the minimum number of samples of a leaf node.
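The six parameters listed in claim 6 can be collected in one configuration object. The application does not name a library. As an assumption, the field names below are borrowed from scikit-learn's `RandomForestClassifier`, whose constructor arguments line up with the list, and every value is a placeholder rather than a setting taken from the application.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class ForestParams:
    """Hypothetical container for the parameters named in claim 6."""

    criterion: str = "gini"                              # feature evaluation criterion
    n_estimators: int = 100                              # maximum number of weak learners
    max_features: Union[int, float, str, None] = "sqrt"  # maximum number of features
    max_depth: Optional[int] = None                      # maximum depth of each tree
    min_samples_split: int = 2                           # min samples to split an internal node
    min_samples_leaf: int = 1                            # min samples in a leaf node

    def __post_init__(self) -> None:
        # Basic sanity checks; the bounds are illustrative.
        if self.n_estimators < 1:
            raise ValueError("n_estimators must be at least 1")
        if self.max_depth is not None and self.max_depth < 1:
            raise ValueError("max_depth must be at least 1 when set")
        if self.min_samples_split < 2:
            raise ValueError("min_samples_split must be at least 2")
        if self.min_samples_leaf < 1:
            raise ValueError("min_samples_leaf must be at least 1")


params = ForestParams(n_estimators=200, max_depth=8)
kwargs = asdict(params)
# If scikit-learn were used, these keyword arguments could be passed as
# RandomForestClassifier(**kwargs); that choice of library is an assumption.
```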
7. A load prediction system based on Bagging sampling and an improved random forest algorithm, comprising:
a data acquisition module, configured to acquire a historical load value;
a load prediction module, configured to input the historical load value into a constructed random forest model to obtain a prediction result;
a model building unit, configured to construct the random forest model by the following method:
acquiring an original sample set, and randomly sampling from the original sample set without replacement by using a Bagging algorithm to generate a plurality of training sets;
training on each training set to obtain a corresponding CART decision tree;
assembling all CART decision trees to form the random forest model.
8. A load prediction device based on Bagging sampling and an improved random forest algorithm, characterized by comprising a processor and a storage medium;
the storage medium is configured to store instructions;
the processor is configured to operate according to the instructions to perform the steps of the method according to any one of claims 1 to 6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202310077201.0A 2023-01-30 2023-01-30 Load prediction method, system, device and storage medium based on Bagging sampling and improved random forest algorithm Pending CN116245019A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310077201.0A CN116245019A (en) 2023-01-30 2023-01-30 Load prediction method, system, device and storage medium based on Bagging sampling and improved random forest algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310077201.0A CN116245019A (en) 2023-01-30 2023-01-30 Load prediction method, system, device and storage medium based on Bagging sampling and improved random forest algorithm

Publications (1)

Publication Number Publication Date
CN116245019A true CN116245019A (en) 2023-06-09

Family

ID=86632481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310077201.0A Pending CN116245019A (en) 2023-01-30 2023-01-30 Load prediction method, system, device and storage medium based on Bagging sampling and improved random forest algorithm

Country Status (1)

Country Link
CN (1) CN116245019A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116631572A (en) * 2023-07-24 2023-08-22 中国人民解放军总医院 Acute myocardial infarction clinical decision support system and device based on artificial intelligence
CN116631572B (en) * 2023-07-24 2023-12-19 中国人民解放军总医院 Acute myocardial infarction clinical decision support system and device based on artificial intelligence
CN116720145A (en) * 2023-08-08 2023-09-08 山东神舟制冷设备有限公司 Wireless charging remaining time prediction method based on data processing
CN116720145B (en) * 2023-08-08 2023-10-27 山东神舟制冷设备有限公司 Wireless charging remaining time prediction method based on data processing

Similar Documents

Publication Publication Date Title
CN116245019A (en) Load prediction method, system, device and storage medium based on Bagging sampling and improved random forest algorithm
CN109886464B (en) Low-information-loss short-term wind speed prediction method based on optimized singular value decomposition generated feature set
CN106649479A (en) Probability graph-based transformer state association rule mining method
CN112987666A (en) Power plant unit operation optimization regulation and control method and system
CN111738477A (en) Deep feature combination-based power grid new energy consumption capability prediction method
CN112884236B (en) Short-term load prediction method and system based on VDM decomposition and LSTM improvement
CN114169434A (en) Load prediction method
CN112529053A (en) Short-term prediction method and system for time sequence data in server
CN110569883A (en) Air quality index prediction method based on Kohonen network clustering and Relieff feature selection
CN114841199A (en) Power distribution network fault diagnosis method, device, equipment and readable storage medium
CN113468796A (en) Voltage missing data identification method based on improved random forest algorithm
CN112184412A (en) Modeling method, device, medium and electronic equipment of credit rating card model
CN116799796A (en) Photovoltaic power generation power prediction method, device, equipment and medium
CN117556369B (en) Power theft detection method and system for dynamically generated residual error graph convolution neural network
CN110929835A (en) Novel silicon carbide-based aviation power converter fault diagnosis method and system
CN117132132A (en) Photovoltaic power generation power prediction method based on meteorological data
CN115600102B (en) Abnormal point detection method and device based on ship data, electronic equipment and medium
CN113111588B (en) NOx emission concentration prediction method and device for gas turbine
CN116054144A (en) Distribution network reconstruction method, system and storage medium for distributed photovoltaic access
CN116108963A (en) Electric power carbon emission prediction method and equipment based on integrated learning module
Tran et al. A new grid search algorithm based on XGBoost model for load forecasting
CN114139783A (en) Wind power short-term power prediction method and device based on nonlinear weighted combination
CN112465195A (en) Bus load prediction method and system considering high-proportion distributed photovoltaic access
CN117117858B (en) Wind turbine generator power prediction method, device and storage medium
CN117436911A (en) Bagging-based power transmission and transformation project cost prediction method and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination