CN114154679B - Spark-based PCFOA-KELM wind power prediction method and device

Info

Publication number
CN114154679B
Authority
CN
China
Legal status
Active
Application number
CN202111232892.4A
Other languages
Chinese (zh)
Other versions
CN114154679A (en)
Inventor
经正俊
齐刚
宋坤
张世磊
马驰源
吉书强
周安
倪晓锋
王文贵
管超
甘露平
Current Assignee
Nanjing Nanzi Huadun Digital Technology Co ltd
Original Assignee
Nanjing Huadun Power Information Security Evaluation Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a Spark-based PCFOA-KELM wind power prediction method and device. The method comprises: collecting meteorological index data affecting the wind power of a single wind turbine in a wind farm; preprocessing the meteorological index data on a Spark platform; inputting the preprocessed data into a pre-trained KELM model corresponding to that turbine, which outputs the turbine's predicted wind power; and summing the predicted wind power of all turbines in the wind farm to obtain the farm's total predicted power. The KELM model is trained on the Spark platform, using as training samples the meteorological indices that affect a single turbine's wind power together with the corresponding historical wind power data, and its parameters are optimized with the PCFOA algorithm. The method predicts wind power efficiently and accurately while also improving running speed.

Description

Spark-based PCFOA-KELM wind power prediction method and device
Technical Field
The invention relates to a Spark-based PCFOA (Parallel Chaotic Fruit Fly Optimization Algorithm)-KELM (Kernel Extreme Learning Machine) wind power prediction method and device, and belongs to the field of wind power generation.
Background
Wind power generation is affected by meteorological factors, and the output power of a wind power system fluctuates randomly and intermittently with season and geographic location. High-precision wind power prediction is a basis for evaluating the operating state of a wind farm and for planning, designing, and dispatching the power grid, and it is of great significance for ensuring the safe and stable operation of the grid.
Many universities, enterprises, and research institutions have begun research on wind power prediction, and the methods proposed so far fall roughly into two categories: physical methods and statistical methods.
The physical method is based on data such as numerical weather prediction (NWP) and terrain information: wind speed and other quantities are obtained through simulation models (such as micro-meteorological models and CFD models), and the prediction is then obtained by combining them with the actual power curve. This method suits wind farms with complex terrain and does not need a large amount of data, but the computation is very complex, and prediction accuracy is hard to guarantee because NWP resolution often falls short of what is required. The statistical method builds a prediction model from historical data using machine learning methods such as neural networks, support vector machines, and time series methods. It can adapt to the location of the wind farm and reduce errors adaptively, but it needs a large amount of historical data and suffers from problems such as slow training and optimization.
With the advent of the Industry 4.0 era, big data technology is increasingly used in smart power plants. Because wind power prediction involves large data volumes and complex data types, researchers have proposed applying big data technology to process large-scale data sets, thereby reducing training and prediction time and improving prediction accuracy.
Existing schemes for modeling on big data platforms include: processing and analyzing wind power prediction input data on a Hadoop platform and then building a combined model from a BP neural network and an SVM for modeling and prediction; and building a parallel SVM model on the Hadoop platform and performing SVM optimization and power prediction through parallel computation.
On the one hand, the traditional physical method has high algorithmic complexity and very complex computation, and its prediction accuracy depends largely on the quality of the weather forecast data, so that accuracy needs to be improved; the statistical method must process a large amount of historical data, its training and optimization time is too long, and it struggles to meet the timeliness required of predictions in practical applications. On the other hand, existing algorithms that apply big data technology to wind power prediction do not make full use of the technical advantages of the big data platform: the big data technology is applied only to data preprocessing or only to parallel model computation. In addition, traditional wind power prediction is performed for the whole plant; as wind farms expand, the meteorological and geographic conditions at the individual turbine locations differ, so modeling the wind farm as a whole is not accurate enough.
Disclosure of Invention
The invention aims to provide a Spark-based PCFOA-KELM wind power prediction method and device that predict wind power rapidly and accurately, improving running speed while guaranteeing prediction accuracy.
In order to achieve the above purpose, the invention adopts the following technical scheme:
In one aspect, a Spark-based PCFOA-KELM wind power prediction method comprises:
collecting meteorological index data affecting the wind power of a single wind turbine in a wind farm;
preprocessing the meteorological index data by using a Spark platform;
inputting the preprocessed meteorological index data into a pre-trained KELM model corresponding to the single wind turbine, wherein the KELM model outputs a predicted wind power value for that turbine;
adding the predicted wind power values of all wind turbines in the wind farm to obtain a total predicted power value for the wind farm;
wherein the KELM model is trained on the Spark platform, using as training samples the meteorological indices that affect the wind power of a single wind turbine in the wind farm together with the corresponding historical wind power data, and its parameters are optimized by using a PCFOA algorithm.
Further, the training method of the KELM model comprises the following steps:
acquiring historical meteorological index data affecting the wind power of a single wind turbine in the wind farm, and the corresponding historical wind power data;
preprocessing the acquired historical data by using a Spark platform;
establishing a feature vector from the preprocessed data, and dividing it into training samples and test samples;
training the KELM model with the training samples, and optimizing the regularization coefficient and kernel parameter of the KELM model by using a PCFOA algorithm to obtain an optimized KELM model;
and testing the prediction performance of the optimized KELM model with the test samples.
Further, preprocessing the historical data by using the Spark platform includes:
storing the historical data in an RDD to generate an RDD data set, dividing the RDD data set into a plurality of sub-data sets, preprocessing each sub-data set separately, and distributing the preprocessing tasks for the sub-data sets to a corresponding plurality of Executors to run in parallel.
Further, the preprocessing includes: data outlier processing, data missing value processing, data noise reduction and normalization processing.
Further, the data missing value processing includes:
for sporadic missing data, using the average of the values immediately before and after the missing point as the value of the missing point;
for a small amount of missing data, applying an interpolation formula over $P_n$, $P_{n+i}$, and $P_{n+j}$, which are the values of the index at times n, n+i, and n+j, respectively;
for a large amount of missing data, discarding the data for that period.
Further, the data noise reduction includes:
performing wavelet decomposition on the acquired index values by using a Haar wavelet basis function;
because the wavelet coefficients of the normal data signal are larger than those of the noise, separating the noise from the normal data by thresholding the coefficients against a set threshold, where $\lambda$ is the wavelet coefficient threshold, $\omega$ is the wavelet coefficient, $\omega_\lambda$ is the denoised wavelet coefficient, and sgn is the sign function;
and reconstructing the data after noise removal.
Further, the meteorological indices include wind speed, wind direction, air pressure, temperature, and humidity, and the normalization includes:
normalizing wind speed, air pressure, and humidity with reference to their maximum and minimum values according to the following formula:
$$value = \frac{value_t - value_{min}}{value_{max} - value_{min}}$$
where $value$ is the normalized result, with range 0 to 1, and $value_{max}$, $value_{min}$, and $value_t$ are the maximum, minimum, and current values within the index data set, respectively;
normalizing the temperature by a separate formula, in which $T$ is the normalized temperature value and $T_t$ is the current temperature value;
normalizing the wind direction to between 0 and 1 by combining its sine and cosine values.
Further, optimizing the regularization coefficient and kernel parameter of the KELM model by using the PCFOA algorithm comprises:
dividing a fruit fly population of size N into N independent parallel units through the RDD of the Spark platform, wherein each individual fruit fly calculates its taste concentration value according to a fitness function and searches locally from its current position for an updated position, and the taste concentration calculation and the local position search are executed in parallel.
Further, optimizing the regularization coefficient and kernel parameter of the KELM model by using the PCFOA algorithm comprises the following steps:
S1, initializing the fruit fly population size N, the maximum number of iterations max, the search step length $l$, the initial position $(x_0, y_0)$, and the chaotic parameter $\mu$;
S2, dividing the fruit fly population into N independent parallel computing units;
S3, starting iterative optimization, wherein in the first iteration of each parallel computing unit an individual fruit fly searches for food by smell over a random distance;
after the first iteration is completed, subsequent iterations use the optimal solution information and a two-dimensional Logistic chaotic map to obtain new chaotic step lengths $l_x$ and $l_y$, where $\mu \in (0, 2.28)$ and $x_n, y_n \in (0, 1)$;
S4, calculating the distance $d_i$ from the current position $(x_i, y_i)$ of each individual fruit fly to the origin, and taking its reciprocal as that fly's taste concentration judgment value $s_i$:
$$d_i = \sqrt{x_i^2 + y_i^2}$$
$$s_i = 1/d_i$$
S5, substituting the taste concentration judgment value $s_i$ into the taste concentration judgment function to calculate the taste concentration value of the individual fruit fly at its current position;
S6, finding the fruit fly with the optimal taste concentration value among all individuals, and recording its taste concentration value and the corresponding position;
S7, retaining the optimal taste concentration value $s$ and its coordinates $(x_{index}, y_{index})$, and flying by vision to the optimal position recorded in step S6 to update the position, forming a new fruit fly population center:
$$s_{best} = s$$
S8, repeating steps S2 to S6 and determining whether the optimal taste concentration value is better than the historical optimum; if it is, and the current iteration number is less than max, executing step S7.
In another aspect, a Spark-based PCFOA-KELM wind power prediction device includes:
an acquisition module configured to collect meteorological index data affecting the wind power of a single wind turbine in a wind farm;
a data preprocessing module configured to preprocess the meteorological index data by using a Spark platform;
a prediction module configured to input the preprocessed meteorological index data into a pre-trained KELM model corresponding to the single wind turbine, the KELM model outputting a predicted wind power value for that turbine;
a calculation module configured to add the predicted wind power values of all wind turbines in the wind farm to obtain a total predicted power value for the wind farm;
wherein the KELM model is trained on the Spark platform, using as training samples the meteorological indices that affect the wind power of a single wind turbine in the wind farm together with the corresponding historical wind power data, and its parameters are optimized by using a PCFOA algorithm.
Compared with the prior art, the invention has the beneficial technical effects that:
The invention makes full use of the powerful parallel data analysis and processing capability of the Spark big data platform: the acquired data are preprocessed efficiently on the Spark platform, and a parallel parameter optimization method, PCFOA, is designed, which speeds up both data preprocessing and optimization modeling. At the same time, KELM generalizes better than ELM, SVM, and other machine learning algorithms and computes faster at better or similar prediction accuracy; by optimizing the KELM parameters with PCFOA on the Spark big data platform, the invention effectively improves wind power prediction accuracy while also improving running speed.
Drawings
FIG. 1 is a flowchart of a Spark-based PCFOA-KELM wind power prediction method according to an embodiment of the present invention.
Detailed Description
The invention is further described below in connection with specific embodiments. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
As described above, for wind farm power prediction, the traditional physical method has high algorithmic complexity and very complex computation, and its prediction accuracy depends largely on the quality of the weather forecast data; the statistical method must process a large amount of historical data, its training and optimization time is too long, and it struggles to meet the timeliness required of predictions; and existing algorithms that apply big data technology to wind power prediction do not make full use of the technical advantages of the big data platform.
Therefore, the invention provides a wind power algorithm based on a kernel extreme learning machine (KELM) optimized by the parallel chaotic fruit fly optimization algorithm (PCFOA). Each wind turbine is modeled and predicted individually, and the sum of the predicted power of all turbines is taken as the total output of the wind farm, which improves prediction accuracy. The capabilities of the Spark platform are used in both data processing and analysis and parallel model computation, so that prediction accuracy is effectively improved while running speed is also improved.
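The per-turbine aggregation described above can be sketched in a few lines of Python. This is only an illustration: the `TurbineModel` type, the `predict_farm_power` helper, the turbine identifiers, and the toy lambda models are all hypothetical stand-ins, not the trained KELM models of the invention.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type: a trained per-turbine model maps a feature vector to power.
TurbineModel = Callable[[Sequence[float]], float]


def predict_farm_power(models: Dict[str, TurbineModel],
                       features: Dict[str, Sequence[float]]) -> float:
    """Sum the per-turbine predictions to obtain the wind farm total."""
    return sum(models[tid](features[tid]) for tid in models)


# Toy stand-ins for trained per-turbine models (assumed for illustration only).
models = {
    "T1": lambda f: 100.0 * f[0],
    "T2": lambda f: 80.0 * f[0],
}
features = {"T1": [0.5], "T2": [0.25]}
total = predict_farm_power(models, features)
```

Keeping one model per turbine means each model sees only the meteorological conditions at its own location, which is the motivation given above for not modeling the farm as a whole.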
In one embodiment, a Spark-based PCFOA-KELM wind power prediction method, as shown in FIG. 1, comprises:
Step S1, historical meteorological index data affecting the wind power of a single wind turbine in the wind farm, and the corresponding historical wind power data, are acquired;
Meteorological indices such as wind speed V, wind direction D, air pressure P, temperature T, and humidity H are taken as the factors influencing wind power.
Step S2, the acquired historical data are preprocessed on the Spark platform;
Because uncertain factors during actual measurement and collection may cause abnormal or missing data, the index data acquired in step S1 need to be preprocessed.
In this embodiment, data preprocessing is performed based on the Spark platform.
Spark is a big data parallel computing framework based on in-memory computation. It is built on a unified abstraction, the RDD (Resilient Distributed Dataset), which improves the real-time performance of data processing in a big data environment while ensuring high fault tolerance and high scalability, and it allows users to deploy Spark on a large amount of inexpensive hardware to form a cluster. An RDD is a fault-tolerant, parallel data structure that lets users explicitly persist data to disk or memory and control how the data are partitioned, giving very high performance for big data analysis and processing.
The original data are stored in an RDD; following the MapReduce principle, once an action is triggered the computation on the RDD is decomposed into multiple tasks with the same logic, which are distributed to multiple Executors for parallel execution.
In this embodiment, the historical data obtained in step S1 are stored in an RDD to generate an RDD data set; the RDD data set is divided into a plurality of sub-data sets, each sub-data set is preprocessed separately, and the preprocessing tasks for the sub-data sets are distributed to a corresponding plurality of Executors to run in parallel.
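The partition-then-process pattern can be illustrated without a cluster. The following is a minimal sketch in which a standard-library thread pool stands in for Spark Executors; the function names, the partition count, and the wind speed range are assumptions made for illustration (the range values of Table 1 are not given in this text).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical valid range for wind speed in m/s (not taken from Table 1).
WIND_SPEED_RANGE = (0.0, 40.0)


def split_into_partitions(records, num_partitions):
    """Divide the data set into roughly equal sub-data sets."""
    size = max(1, -(-len(records) // num_partitions))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def preprocess_partition(partition):
    """Same logic for every partition: drop out-of-range wind speeds."""
    low, high = WIND_SPEED_RANGE
    return [v for v in partition if low <= v <= high]


def preprocess_parallel(records, num_partitions=4):
    partitions = split_into_partitions(records, num_partitions)
    # Worker threads stand in for Spark Executors in this sketch.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        cleaned = list(pool.map(preprocess_partition, partitions))
    # Concatenate the partition results in their original order.
    return [v for part in cleaned for v in part]


wind_speeds = [3.2, 5.1, -1.0, 7.8, 99.0, 6.4, 4.9, 12.3]
cleaned = preprocess_parallel(wind_speeds, num_partitions=4)
```

On an actual Spark deployment the same shape would presumably be expressed with an RDD created by `SparkContext.parallelize` and a per-partition function passed to `RDD.mapPartitions`; that mapping is a suggestion, not something stated in the source.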
Data preprocessing includes data outlier processing, data missing value processing, data noise reduction, and normalization, as follows:
(1) Data outlier handling
Each acquired index is checked against the reasonable ranges of the measured values listed in Table 1, and data outside these ranges are eliminated.
Table 1. Reasonable ranges of the parameters
(2) Data missing value handling
When index data are missing, the following cases are handled separately to preserve the continuity and authenticity of the data:
a) For sporadic missing data, the average of the values immediately before and after the missing point is used as the value of the missing point;
b) For a small amount of missing data, interpolation is used according to formula (1), where $P_n$, $P_{n+i}$, and $P_{n+j}$ are the values of the index at times n, n+i, and n+j, respectively;
c) For a large amount of missing data, for example caused by turbine maintenance or planned power limiting, the data for that period are discarded as invalid.
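The three cases above can be sketched as a single gap-filling routine. This is a hedged illustration, not the method of formula (1): it assumes linear interpolation between the known values on either side of a gap, and the `small_gap_max` cutoff separating "small" from "large" gaps is a hypothetical parameter. Missing samples are represented as `None`.

```python
def fill_missing(series, small_gap_max=5):
    """Fill None gaps by length: a single point gets the neighbour mean,
    gaps up to small_gap_max get linear interpolation (an assumption),
    and longer gaps stay None so the period can be discarded."""
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # first index after the gap
        gap = end - start
        if start == 0 or end == n:
            continue  # no known neighbour on one side: leave unfilled
        left, right = out[start - 1], out[end]
        if gap == 1:
            out[start] = (left + right) / 2.0
        elif gap <= small_gap_max:
            span = gap + 1
            for k in range(1, gap + 1):
                out[start - 1 + k] = left + (right - left) * k / span
    return out


power = [10.0, None, 14.0, None, None, None, 22.0]
filled = fill_missing(power)
long_gap = fill_missing([1.0] + [None] * 6 + [2.0])
```

In this toy series the single missing point becomes the mean of its neighbours and the three-point gap is filled on a straight line, while the six-point gap in `long_gap` is left as `None` for later removal.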
(3) Data denoising process
Owing to environmental and other factors, the acquired index data contain considerable noise and must be denoised.
In this embodiment, denoising uses the wavelet threshold method: after wavelet decomposition of each index, the wavelet coefficients of the original signal are larger than those of the noise, so a suitable threshold is selected to separate signal from noise, and the signal is then reconstructed to achieve denoising. The specific steps are as follows:
a) Select the Haar wavelet basis function and perform a 3-level decomposition of the acquired index values;
b) Threshold each detail coefficient according to formula (2) to separate the noise from the normal data, where $\lambda$ is the wavelet coefficient threshold, $\omega$ is the wavelet coefficient, $\omega_\lambda$ is the denoised wavelet coefficient, and sgn is the sign function;
c) Reconstruct the denoised data to obtain the processed data.
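Steps a) to c) can be sketched in plain Python as follows. Formula (2) is not reproduced in this text, so the sketch assumes the standard soft-threshold function, which uses exactly the symbols defined above ($\lambda$, $\omega$, $\omega_\lambda$, sgn); the function names and the requirement that the signal length be divisible by $2^3$ are assumptions of this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def soft_threshold(w, lam):
    """Assumed form of formula (2): sgn(w) * (|w| - lam) if |w| >= lam, else 0."""
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - lam, w)


def haar_denoise(signal, lam, levels=3):
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):                      # a) 3-level decomposition
        approx, detail = haar_step(approx)
        details.append(detail)
    details = [[soft_threshold(w, lam) for w in d] for d in details]  # b)
    for detail in reversed(details):             # c) reconstruction
        approx = haar_inverse_step(approx, detail)
    return approx


noisy = [5.1, 4.9] * 4                # constant level 5.0 with small ripple
denoised = haar_denoise(noisy, lam=0.5)
unchanged = haar_denoise(noisy, lam=0.0)
```

With a threshold of 0.5 the small alternating ripple falls below the threshold and is removed, while a threshold of 0 reproduces the input, which is a quick check that decomposition and reconstruction are inverses.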
(4) Data normalization
Indices such as wind speed, air pressure, and humidity are normalized with reference to their maximum and minimum values according to formula (3):
$$value = \frac{value_t - value_{min}}{value_{max} - value_{min}} \quad (3)$$
where $value$ is the normalized result, with range 0 to 1, and $value_{max}$, $value_{min}$, and $value_t$ are the maximum, minimum, and current values within the index data set, respectively.
Temperature is normalized according to formula (4), where $T$ is the normalized temperature value and $T_t$ is the current temperature value.
Wind direction lies between 0 and 360 degrees and can be normalized to between 0 and 1 by combining its sine and cosine values.
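A short sketch of the min-max and wind direction normalization follows. The min-max function mirrors formula (3). The wind direction encoding is an assumption: the text only says sine and cosine are combined, so this sketch shifts each of them from [-1, 1] into [0, 1]. The temperature formula (4) is not reproduced in this text and is therefore left out.

```python
import math


def min_max(value_t, value_min, value_max):
    """Min-max normalization of one value into [0, 1], as in formula (3)."""
    return (value_t - value_min) / (value_max - value_min)


def wind_direction_features(deg):
    """Assumed encoding: map sin and cos of the direction from [-1, 1] to [0, 1]."""
    rad = math.radians(deg)
    return (math.sin(rad) + 1.0) / 2.0, (math.cos(rad) + 1.0) / 2.0


speeds = [2.0, 6.0, 10.0]
normalized = [min_max(v, min(speeds), max(speeds)) for v in speeds]
north = wind_direction_features(0.0)
almost_north = wind_direction_features(359.0)
```

Using both sine and cosine keeps directions that are physically close, such as 359 and 0 degrees, close in feature space, which a direct division by 360 would not.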
Step S3, a feature vector is established from the preprocessed data and divided into training samples and test samples;
Step S4, the KELM model is trained with the training samples, and the regularization coefficient and kernel parameter of the KELM model are optimized with the PCFOA algorithm to obtain an optimized KELM model;
The fruit fly optimization algorithm (FOA) is a search-based global optimization evolutionary algorithm derived from the foraging behavior of fruit flies. FOA has few parameters to tune, is simple to operate, and is easy to apply in practice.
Building on FOA, this embodiment designs a parallel chaotic fruit fly optimization algorithm, PCFOA, on the Spark platform. The process by which individual fruit flies search for the position of optimal taste concentration is parallelized: a fruit fly population of size N is divided into N independent parallel units, each individual fruit fly calculates its taste concentration value according to a fitness function and searches locally from its current position for an updated position, and the Spark framework executes these steps in parallel.
The kernel extreme learning machine (KELM) derives from extreme learning machine (ELM) theory. ELM is a training method for single-hidden-layer feedforward neural networks: the output weights of the network are obtained analytically in a single computation step, so learning is fast. Its regression function and the connection weights between the hidden layer and the output layer are shown in formula (14):
$$f(x) = h(x)\beta = h(x) H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T \quad (14)$$
where $x$ is the sample input, $f(x)$ is the network output, $h(x)$ and $H$ are the randomly mapped hidden-layer feature mapping matrices, $\beta$ is the hidden-layer-to-output-layer connection weight obtained from generalized inverse matrix theory, $I$ is a diagonal (identity) matrix, $C$ is the penalty coefficient, and $T$ is the vector of sample target values.
Compared with ELM, KELM is better at solving regression prediction problems, generalizes better, and predicts faster at equal prediction accuracy. KELM can compute the value of the output function knowing only the form of the kernel function, and the initial weights and biases of the hidden layer need not be set when solving for the output. The output can be expressed as formula (15):
$$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_m) \end{bmatrix}^{T} \left( \frac{I}{C} + \Omega_{ELM} \right)^{-1} T \quad (15)$$
where $x_1, \dots, x_m$ are the training inputs; $C$ is the regularization coefficient, which balances the output weights against the training error; $\Omega_{ELM} = HH^{T}$ is the kernel matrix, with entries $K(x_i, x_j)$, by which all input samples are mapped from the input space to the high-dimensional hidden-layer feature space; $K(x_i, x_j)$ is the kernel function, which may be a Gaussian kernel, a polynomial kernel, a linear kernel, and so on; $I$ is the identity matrix; and $T$ is the desired output.
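The closed-form KELM solution can be illustrated with a small pure-Python sketch. It assumes a Gaussian kernel of the form $K(a, b) = \exp(-\gamma \lVert a - b \rVert^2)$; the name `gamma` for the kernel parameter, the helper names, and the toy cubic training data are all assumptions made for this illustration. Training solves $(I/C + \Omega)\alpha = T$ and prediction evaluates $\sum_i K(x, x_i)\alpha_i$.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian kernel; gamma is the kernel parameter (name assumed here)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def kelm_fit(X, T, C, gamma):
    """Return alpha = (I/C + Omega)^-1 T, with Omega[i][j] = K(x_i, x_j)."""
    n = len(X)
    A = [[rbf_kernel(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, T)


def kelm_predict(x, X, alpha, gamma):
    """Evaluate f(x) = sum_i K(x, x_i) * alpha_i."""
    return sum(rbf_kernel(x, xi, gamma) * a for xi, a in zip(X, alpha))


X_train = [[0.0], [0.25], [0.5], [0.75], [1.0]]
T_train = [x[0] ** 3 for x in X_train]   # toy cubic "power curve"
alpha = kelm_fit(X_train, T_train, C=1e6, gamma=10.0)
pred = kelm_predict([0.5], X_train, alpha, gamma=10.0)
```

With a large $C$ the regularization term $I/C$ is small, so the model nearly interpolates the training targets; a smaller $C$ trades training error for smoother output. The pair $(C, \gamma)$ is what a parameter search such as PCFOA would tune.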
And optimizing a regularization coefficient C and a nuclear parameter lambda of the KELM by using the PCFOA to obtain an optimized KELM model, wherein the method comprises the following specific steps of:
S301, initializing the Drosophila population size N, the maximum number of iterations max, the search step length l, the initial position (x_0, y_0) and the chaotic parameter μ;
S302, dividing the Drosophila population into N independent parallel computing units;
S303, starting the iterative optimization; in the first iteration, each parallel computing unit calculates the random distance over which the Drosophila individual searches for food by smell, as shown in formula (5):
After the first iteration is completed, subsequent iterations use the optimal solution information and two-dimensional Logistic chaotic mapping to obtain new chaotic step lengths l_x and l_y, as shown in formulas (6) and (7):
where μ ∈ (0, 2.28) and x_n, y_n ∈ (0, 1);
S304, since the position of the food is initially unknown, first calculating the distance d_i between the current position of the Drosophila individual and the origin, and taking its reciprocal as the taste concentration determination value s_i of the individual, as shown in formulas (8) and (9):
s_i = 1/d_i (9)
S305, substituting the taste concentration determination value s_i into the taste concentration determination function to calculate the taste concentration value s_current of the Drosophila individual at its current position:
s_current = Function(s_i) (10)
S306, finding, among all Drosophila individuals, the one with the optimal taste concentration value, namely the individual with the highest or the lowest concentration, and recording its taste concentration value and the corresponding position:
[s, index] = min(s_i) || max(s_i) (11)
S307, retaining the optimal taste concentration value s and its coordinates (x_index, y_index), and flying by vision to the optimal position recorded in step S306 to update the position, forming a new Drosophila population center:
s_best = s (13)
S308, repeating steps S302 to S306 and judging whether the optimal taste concentration value is better than the historical optimal taste concentration value; if it is better and the current iteration number is less than max, executing step S307.
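The search loop of steps S301 to S308 can be sketched as follows. Formulas (5) to (8) and (12) are not reproduced in this text, so the sketch assumes the forms of the classic fruit fly optimization algorithm: a uniform random step around the swarm center and the Euclidean distance to the origin. The chaotic step update of formulas (6) and (7) is not implemented, and a sequential loop stands in for the N parallel computing units. The function name `foa_minimize` and its parameters are hypothetical.

```python
import math
import random


def foa_minimize(fitness, n_flies=20, max_iter=100, step=1.0, seed=0):
    """Minimise fitness(s) over the smell value s = 1/d (basic FOA sketch)."""
    rng = random.Random(seed)
    # S301: initial swarm center (x_0, y_0); the range is an assumption.
    cx, cy = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    best_value, best_s = float("inf"), None
    for _ in range(max_iter):
        candidates = []
        # S302/S303: each individual takes a random step from the center.
        # The patent runs these units in parallel; here they run in sequence.
        for _ in range(n_flies):
            x = cx + step * rng.uniform(-1.0, 1.0)
            y = cy + step * rng.uniform(-1.0, 1.0)
            # S304: distance to the origin and its reciprocal s_i = 1/d_i.
            d = max(math.hypot(x, y), 1e-12)
            s = 1.0 / d
            # S305: taste concentration value from the fitness function.
            candidates.append((fitness(s), s, x, y))
        # S306: individual with the optimal (here: lowest) concentration.
        value, s, x, y = min(candidates, key=lambda c: c[0])
        # S307/S308: move the center only if the historical optimum improves.
        if value < best_value:
            best_value, best_s = value, s
            cx, cy = x, y
    return best_s, best_value
```

For KELM tuning, `fitness` would train and score a model for the candidate parameters (for example by validation error); how a two-dimensional position is mapped to the pair (C, λ) is not stated in this section and is left open here.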
S5, testing the optimized KELM model with the test samples to evaluate its prediction performance.
Next, the wind power of a single fan is predicted with the trained KELM model, specifically:
S6, collecting meteorological index data affecting the wind power of a single fan in the wind power plant;
The meteorological index data include wind speed V, wind direction D, barometric pressure P, temperature T and humidity H.
S7, preprocessing the meteorological index data by using the Spark platform;
The preprocessing steps are the same as in step S2.
S8, inputting the preprocessed meteorological index data into the pre-trained KELM model corresponding to the single fan, the KELM model outputting the wind power predicted value of the single fan;
S9, adding the wind power predicted values of all fans in the wind power plant to obtain the total power predicted value of the wind power plant.
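Steps S8 and S9 amount to evaluating one model per fan and summing the results. A minimal sketch, in which the fan identifiers, the dictionary layout and the function name `predict_plant_power` are hypothetical and each model is represented by a plain callable:

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]          # preprocessed V, D, P, T, H values
Model = Callable[[Features], float]  # stands in for a trained per-fan KELM


def predict_plant_power(models: Dict[str, Model],
                        features: Dict[str, Features]) -> float:
    """Sum the per-fan predictions to get the plant-level prediction."""
    per_fan = {fan_id: model(features[fan_id])
               for fan_id, model in models.items()}
    return sum(per_fan.values())
```

Keeping one model per fan means a fan can be retrained or taken out of service without touching the others.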
With this embodiment, the Spark-based PCFOA-KELM wind power prediction method preprocesses the sample data efficiently on the Spark big data platform and uses the parallel Drosophila optimization algorithm PCFOA, which increases the speed of wind power modeling and prediction; the kernel extreme learning machine KELM is used to model and predict the wind power, and PCFOA is used to optimize the parameters of the KELM model, which improves the wind power prediction accuracy.
The invention makes full use of the parallel data analysis and processing capability of the Spark big data platform: the acquired data are preprocessed efficiently on the Spark platform, and a parallel parameter optimization method, PCFOA, is designed, which speeds up both data preprocessing and optimization modeling. KELM generalizes better than machine learning algorithms such as ELM and SVM and computes faster at better or comparable prediction accuracy, and optimizing its parameters with PCFOA improves the wind power prediction accuracy. As a result, wind power is predicted efficiently and accurately on the Spark big data platform.
In another embodiment, a Spark-based PCFOA-KELM wind power prediction apparatus includes:
the acquisition module is configured to acquire meteorological index data affecting the wind power of a single fan in the wind power plant;
the data preprocessing module is configured to preprocess the weather index data by using a Spark platform;
the prediction module is configured to input the preprocessed meteorological index data into a pre-trained KELM model corresponding to the single fan, and the KELM model outputs a wind power predicted value of the single fan;
the calculation module is configured to add the wind power predicted values of all fans in the wind power plant to obtain a total power predicted value of the wind power plant;
the KELM model is obtained by training the KELM model by using meteorological indexes influencing the wind power of a single fan in a wind power plant and corresponding wind power historical data as training samples based on a Spark platform and performing parameter optimization on the KELM model by using a PCFOA algorithm.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (7)

1. A Spark-based PCFOA-KELM wind power prediction method is characterized by comprising the following steps:
collecting meteorological index data affecting the wind power of a single fan in a wind power plant;
preprocessing the meteorological index data by using a Spark platform;
inputting the preprocessed meteorological index data into a pre-trained KELM model corresponding to the single fan, wherein the KELM model outputs a wind power predicted value of the single fan;
adding the wind power predicted values of all fans in the wind power plant to obtain a total power predicted value of the wind power plant;
the KELM model is obtained by training the KELM model by taking meteorological indexes influencing the wind power of a single fan in a wind power plant and corresponding wind power historical data as training samples based on a Spark platform and performing parameter optimization on the KELM model by using a PCFOA algorithm;
the training method of the KELM model comprises the following steps:
acquiring weather index historical data and corresponding wind power historical data which influence the wind power of a single wind turbine in a wind power plant;
preprocessing the acquired historical data by using a Spark platform;
establishing a feature vector according to the preprocessed data, and dividing the feature vector into a training sample and a test sample;
training the KELM model with the training samples, and optimizing the regularization coefficient and kernel parameter of the KELM model by using the PCFOA algorithm to obtain an optimized KELM model;
testing the prediction effect of the optimized KELM model by adopting a test sample;
the optimizing of the regularization coefficient and kernel parameter of the KELM model by using the PCFOA algorithm comprises:
dividing a Drosophila population of size N into N independent parallel units through the RDD of the Spark platform, each Drosophila individual calculating its taste concentration value according to the fitness function and locally searching for an updated position from its current position, wherein the taste concentration calculation and the local search are performed in parallel;
the optimizing of the regularization coefficient and kernel parameter of the KELM model by using the PCFOA algorithm comprises the following steps:
S1, initializing the Drosophila population size N, the maximum number of iterations max, the search step length l, the initial position (x_0, y_0) and the chaotic parameter μ;
S2, dividing the Drosophila population into N independent parallel computing units;
S3, starting the iterative optimization; in the first iteration, for each parallel computing unit, the random distance over which the Drosophila individual searches for food by smell is shown as follows:
after the first iteration is completed, subsequent iterations use the optimal solution information and two-dimensional Logistic chaotic mapping to obtain new chaotic step lengths l_x and l_y, calculated as follows:
where μ ∈ (0, 2.28) and x_n, y_n ∈ (0, 1);
S4, calculating the distance d_i from the current position of the Drosophila individual to the origin, and taking its reciprocal as the taste concentration determination value s_i of the individual, calculated as follows:
s_i = 1/d_i
S5, substituting the taste concentration determination value s_i into the taste concentration determination function to calculate the taste concentration value of the Drosophila individual at its current position;
S6, finding, among all Drosophila individuals, the one with the optimal taste concentration value, and recording its taste concentration value and the corresponding position;
S7, retaining the optimal taste concentration value s and its coordinates (x_index, y_index), and flying by vision to the optimal position recorded in step S6 to update the position, forming a new Drosophila population center:
s_best = s
S8, repeating steps S2 to S6 and judging whether the optimal taste concentration value is better than the historical optimal taste concentration value; if it is better and the current iteration number is less than max, executing step S7.
2. The Spark-based PCFOA-KELM wind power prediction method according to claim 1, wherein the preprocessing of the historical data by using a Spark platform comprises:
storing the historical data into an RDD to generate an RDD data set, dividing the RDD data set into a plurality of sub-data sets, performing data preprocessing on each sub-data set, and distributing the preprocessing tasks for the plurality of sub-data sets to a plurality of corresponding executors for parallel operation.
3. The Spark-based PCFOA-KELM wind power prediction method according to claim 1, wherein the preprocessing comprises: data outlier processing, data missing value processing, data noise reduction and normalization processing.
4. The Spark-based PCFOA-KELM wind power prediction method according to claim 3, characterized in that the data outlier processing comprises:
for sporadic missing data, using the average of the data values immediately before and after the missing point as the value of the missing point;
for a small amount of missing data, processing by interpolation, the specific interpolation formula being as follows:
where P_n, P_{n+i} and P_{n+j} are the values of the index at times n, n+i and n+j, respectively;
for a large amount of missing data, discarding the data of that time period.
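The interpolation formula is not reproduced in this text. As an illustration only, the sketch below assumes linear interpolation, P_{n+i} = P_n + (i/j)(P_{n+j} - P_n), which for a single missing point reduces to the average of its two neighbours. The cutoff `max_gap` separating a "small" from a "large" gap and the function name `fill_missing` are hypothetical.

```python
def fill_missing(series, max_gap=3):
    """Fill None gaps of up to max_gap points by linear interpolation.

    Longer gaps, and gaps touching either end of the series, are left as
    None so that the caller can discard that time period.
    """
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap, or n
        if start == 0 or end == n or end - start > max_gap:
            continue  # no valid neighbour on one side, or gap too large
        left, right = out[start - 1], out[end]
        span = end - (start - 1)  # j in the assumed formula
        for k in range(start, end):
            out[k] = left + (k - (start - 1)) / span * (right - left)
    return out
```

With this assumption the sporadic case and the small-gap case share one code path, since a gap of length one interpolates to the neighbour average.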
5. The Spark-based PCFOA-KELM wind power prediction method according to claim 3, characterized in that the data noise reduction comprises:
performing wavelet decomposition on the acquired index values by using a Haar wavelet basis function;
on the basis that the wavelet coefficients of the normal data signal are larger than the wavelet coefficients of the noise, separating the noise from the normal data with a set threshold according to the following formula:
where λ is the wavelet coefficient threshold, ω is the wavelet coefficient, ω_λ is the denoised wavelet coefficient, and sgn is the sign function;
reconstructing the data after noise removal.
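The thresholding formula is not reproduced in this text. Because it is defined in terms of λ, ω, ω_λ and the sign function, the sketch below assumes the usual soft-threshold rule, ω_λ = sgn(ω)(|ω| - λ) when |ω| ≥ λ and 0 otherwise, applied to the detail coefficients of a single-level Haar decomposition. The number of decomposition levels, the even-length input requirement and all function names are assumptions.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_decompose(signal):
    """Single-level Haar transform of an even-length signal."""
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * _S for a, b in pairs]
    detail = [(a - b) * _S for a, b in pairs]
    return approx, detail


def soft_threshold(w, lam):
    """Assumed rule: sgn(w) * (|w| - lam) if |w| >= lam, else 0."""
    return math.copysign(abs(w) - lam, w) if abs(w) >= lam else 0.0


def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def denoise(signal, lam):
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    approx, detail = haar_decompose(signal)
    detail = [soft_threshold(w, lam) for w in detail]
    return haar_reconstruct(approx, detail)
```

Small detail coefficients are treated as noise and set to zero, while larger ones are shrunk toward zero by λ, so the choice of threshold controls how much of the signal is smoothed.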
6. The Spark-based PCFOA-KELM wind power prediction method according to claim 3, wherein the meteorological indexes include wind speed, wind direction, air pressure, temperature and humidity, and the normalization processing comprises:
processing the wind speed, air pressure and humidity respectively, according to their maximum and minimum values, by the following formula:
where value is the normalized result, with a range of 0 to 1, and value_max, value_min and value_t are the maximum value, the minimum value and the current value in the index data set, respectively;
processing the temperature according to the following formula:
where T is the normalized temperature value and T_t is the current temperature value;
normalizing the wind direction to between 0 and 1 using a combination of sine and cosine values.
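The normalization formulas are not reproduced in this text. The sketch below assumes standard min-max scaling, value = (value_t - value_min) / (value_max - value_min), which matches the symbols and the 0 to 1 range stated above. For wind direction it assumes one possible sine and cosine combination, ((sin θ + 1)/2, (cos θ + 1)/2); the exact combination used in the patent is not stated. The temperature formula is not given here and is not sketched. Function names are hypothetical.

```python
import math


def min_max(value_t, value_min, value_max):
    """Assumed min-max scaling of one reading into the range 0 to 1."""
    if value_max == value_min:
        raise ValueError("value_max and value_min must differ")
    return (value_t - value_min) / (value_max - value_min)


def wind_direction_features(degrees):
    """Map a direction in degrees to two values in [0, 1] (an assumption)."""
    theta = math.radians(degrees)
    return (math.sin(theta) + 1.0) / 2.0, (math.cos(theta) + 1.0) / 2.0
```

Encoding direction through sine and cosine keeps 359 degrees and 1 degree close together, which a plain min-max scaling of the angle would not.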
7. A Spark-based PCFOA-KELM wind power prediction device, characterized by comprising:
the acquisition module is configured to acquire meteorological index data affecting the wind power of a single fan in the wind power plant;
the data preprocessing module is configured to preprocess the weather index data by using a Spark platform;
the prediction module is configured to input the preprocessed meteorological index data into a pre-trained KELM model corresponding to the single fan, and the KELM model outputs a wind power predicted value of the single fan;
the calculation module is configured to add the wind power predicted values of all fans in the wind power plant to obtain a total power predicted value of the wind power plant;
the KELM model is obtained by training the KELM model by taking meteorological indexes influencing the wind power of a single fan in a wind power plant and corresponding wind power historical data as training samples based on a Spark platform and performing parameter optimization on the KELM model by using a PCFOA algorithm;
the training method of the KELM model comprises the following steps:
acquiring weather index historical data and corresponding wind power historical data which influence the wind power of a single wind turbine in a wind power plant;
preprocessing the acquired historical data by using a Spark platform;
establishing a feature vector according to the preprocessed data, and dividing the feature vector into a training sample and a test sample;
training the KELM model with the training samples, and optimizing the regularization coefficient and kernel parameter of the KELM model by using the PCFOA algorithm to obtain an optimized KELM model;
testing the prediction effect of the optimized KELM model by adopting a test sample;
the optimizing of the regularization coefficient and kernel parameter of the KELM model by using the PCFOA algorithm comprises:
dividing a Drosophila population of size N into N independent parallel units through the RDD of the Spark platform, each Drosophila individual calculating its taste concentration value according to the fitness function and locally searching for an updated position from its current position, wherein the taste concentration calculation and the local search are performed in parallel;
the optimizing of the regularization coefficient and kernel parameter of the KELM model by using the PCFOA algorithm comprises the following steps:
S1, initializing the Drosophila population size N, the maximum number of iterations max, the search step length l, the initial position (x_0, y_0) and the chaotic parameter μ;
S2, dividing the Drosophila population into N independent parallel computing units;
S3, starting the iterative optimization; in the first iteration, for each parallel computing unit, the random distance over which the Drosophila individual searches for food by smell is shown as follows:
after the first iteration is completed, subsequent iterations use the optimal solution information and two-dimensional Logistic chaotic mapping to obtain new chaotic step lengths l_x and l_y, calculated as follows:
where μ ∈ (0, 2.28) and x_n, y_n ∈ (0, 1);
S4, calculating the distance d_i from the current position of the Drosophila individual to the origin, and taking its reciprocal as the taste concentration determination value s_i of the individual, calculated as follows:
s_i = 1/d_i
S5, substituting the taste concentration determination value s_i into the taste concentration determination function to calculate the taste concentration value of the Drosophila individual at its current position;
S6, finding, among all Drosophila individuals, the one with the optimal taste concentration value, and recording its taste concentration value and the corresponding position;
S7, retaining the optimal taste concentration value s and its coordinates (x_index, y_index), and flying by vision to the optimal position recorded in step S6 to update the position, forming a new Drosophila population center:
s_best = s
S8, repeating steps S2 to S6 and judging whether the optimal taste concentration value is better than the historical optimal taste concentration value; if it is better and the current iteration number is less than max, executing step S7.
CN202111232892.4A 2021-10-22 2021-10-22 Spark-based PCFOA-KELM wind power prediction method and device Active CN114154679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111232892.4A CN114154679B (en) 2021-10-22 2021-10-22 Spark-based PCFOA-KELM wind power prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111232892.4A CN114154679B (en) 2021-10-22 2021-10-22 Spark-based PCFOA-KELM wind power prediction method and device

Publications (2)

Publication Number Publication Date
CN114154679A CN114154679A (en) 2022-03-08
CN114154679B true CN114154679B (en) 2024-01-26

Family

ID=80458567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111232892.4A Active CN114154679B (en) 2021-10-22 2021-10-22 Spark-based PCFOA-KELM wind power prediction method and device

Country Status (1)

Country Link
CN (1) CN114154679B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570250A (en) * 2016-11-02 2017-04-19 华北电力大学(保定) Power big data oriented microgrid short-period load prediction method
CN109934330A (en) * 2019-03-04 2019-06-25 温州大学 The method of prediction model is constructed based on the drosophila optimization algorithm of diversified population
CN111639695A (en) * 2020-05-26 2020-09-08 温州大学 Method and system for classifying data based on improved drosophila optimization algorithm
CN113466615A (en) * 2021-06-17 2021-10-01 三峡大学 Drosophila optimization algorithm-based post-fault wave recording data synchronization method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
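Title
"Design and Implementation of a Wind Farm Wind Power Prediction System Based on BP Neural Network"; Wu Jiawen; China Master's Theses Full-text Database, Engineering Science and Technology II; pp. 13-15, 36-40 *
"Research on Wind Power Prediction Based on PSO-KELM"; Zhao Peng et al.; Electrical Measurement & Instrumentation; 57(11); pp. 24-29 *
"Research on a Cloud-Computing-Based Short-Term Wind Power Prediction Method for Wind Farms"; Zhao Lei; China Master's Theses Full-text Database, Engineering Science and Technology II; pp. 26-45 *
Xu Guogen et al. Optimization Methods and Their MATLAB Implementation. Beihang University Press, 2018, pp. 398-399. *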

Also Published As

Publication number Publication date
CN114154679A (en) 2022-03-08

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: No. 38, New Model Road, Gulou District, Nanjing City, Jiangsu Province, 210000

Patentee after: Nanjing Nanzi Huadun Digital Technology Co.,Ltd.

Country or region after: China

Address before: No.39 Shuige Road, Jiangning District, Nanjing City, Jiangsu Province, 211100

Patentee before: NANJING HUADUN POWER INFORMATION SECURITY EVALUATION CO.,LTD.

Country or region before: China