CN112258251B

CN112258251B - Grey correlation-based integrated learning prediction method and system for electric vehicle battery replacement demand

Info

Publication number: CN112258251B
Application number: CN202011294838.8A
Authority: CN
Inventors: 张玉利; 于浩洁; 梁熙栋; 张倩
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2022-12-27
Anticipated expiration: 2040-11-18
Also published as: CN112258251A

Abstract

The invention discloses a grey correlation-based integrated learning prediction method and system for electric vehicle battery replacement demand. The method comprises the following steps: constructing a data set, preprocessing it, and dividing the preprocessed data set into a training set and a test set; selecting k base learners, each of which trains on and predicts samples of the training set by cross-validation; for each input sample in the test set, selecting an optimal similar day training set through grey correlation analysis; establishing a prediction deviation minimization optimization model from the prediction results of each base learner on the optimal similar day training set, with an L1 norm with a regularization coefficient as the regularization term; solving the optimization model to obtain the weight coefficient of each base learner and thereby the integrated predictor; and obtaining the integrated learning prediction result from the integrated predictor. The method can effectively reduce prediction deviation, predicts data with high random fluctuation better, and is better suited to data sets obtained in practice.

Description

Grey correlation-based integrated learning prediction method and system for electric vehicle battery replacement demand

Technical Field

The invention relates to the technical field of machine learning, in particular to an integrated learning prediction method and system for an electric vehicle battery replacement requirement based on grey correlation.

Background

Automobiles are an everyday mode of travel, but traditional fuel-powered vehicles cause serious environmental problems, such as air and water pollution and global warming. Electric vehicles reduce the use of traditional fossil energy and thus pollutant emissions, which helps protect the environment. The battery replacement mode of electric vehicles shortens the time spent on charging and improves convenience for users. For example, in 2017 Beijing Automotive Group Co., Ltd. (BAIC) announced the "Optimus Prime Plan", which aims to promote the integrated development of new energy and electric vehicles through a battery replacement model. The BAIC plan calls for 3,000 photovoltaic-storage battery replacement stations to be built by the end of 2022.

Although the battery replacement mode of electric vehicles has many advantages over the charging mode, it is currently far less popular. The main reason is that infrastructure such as battery replacement stations is still underdeveloped in China, so electric vehicle users often cannot find a battery replacement station in time. In addition, poor management and operation of batteries at battery replacement stations has become a barrier to the development of the battery replacement mode. Because station operators do not know how the number of customers or the battery demand will change over a short period in the future, the supply of batteries is often insufficient or batteries queue for charging, so electric vehicle batteries cannot be replaced in time, which greatly lowers user satisfaction, especially for time-sensitive users.

To improve the service level and battery charging efficiency, the operator of a battery replacement station needs to accurately predict the battery replacement demand of electric vehicles. There are three main types of prediction method: Monte Carlo-based simulation analysis, time series analysis, and machine learning. Machine learning methods achieve high prediction accuracy and are widely applied in many fields.

A single type of machine learning method (commonly called a base learner) carries an inherent bias from its design, and its prediction accuracy is low on data sets to which it is poorly suited. To overcome the shortcomings of a single predictor, integrated prediction methods have gradually emerged, in which an integrated predictor is built by combining several base learners to improve prediction accuracy. Current integrated prediction methods include voting, bagging, boosting, and stacking. However, data sets obtained in practice are not uniformly distributed, and the battery replacement demand of electric vehicles in particular fluctuates strongly and is highly uncertain.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides an integrated learning prediction method and system for the battery replacement requirement of the electric vehicle based on grey correlation, which adopt an integrated prediction method similar to a stacking method to improve the generalization of a model and the accuracy of a prediction result.

The invention discloses an integrated learning prediction method for an electric vehicle battery replacement requirement based on grey correlation, which comprises the following steps:

constructing a data set and preprocessing the data set, and dividing the preprocessed data set into a training set and a testing set;

selecting k base learners, and training and predicting samples of a training set by each base learner in a cross validation mode;

for each input sample in the test set, selecting the best similar day training set through grey correlation analysis;

establishing a prediction deviation minimization optimization model according to the prediction result of each base learner in the training set of the optimal similar days, and adopting an L1 norm with a weight coefficient as a regular term;

solving the obtained weight coefficient of each base learner based on the optimization model to obtain an integrated predictor, and obtaining an integrated learning prediction result based on the integrated predictor; wherein the output of the integrated predictor is a linear weighted combination of the outputs of the basis learners.

As a further refinement of the invention, the data set is $T = \{(x_1, Y_1), \ldots, (x_n, Y_n)\}$,

where $T$ is the data set; $x_i$ is the $i$-th sample, $i = 1, \ldots, n$, and $n$ is the number of samples;

$x_i^{(j)}$ is the $j$-th feature of sample $i$, $j = 1, \ldots, m$, and $m$ is the number of features; $Y_i$ is the label of sample $i$, namely the battery replacement demand of the electric vehicles, $i = 1, \ldots, n$.

As a further improvement of the invention, said features comprise:

$x^{(1)}$ is the day of the week, coded from 1 to 7;

$x^{(2)}$ indicates whether the day falls on a weekend: 1 if it does, otherwise 0;

$x^{(3)}$ is the weather, divided into sunny, cloudy, and rainy or snowy days, coded 1, 2 and 3 respectively;

$x^{(4)}$ is the highest air temperature of the day;

$x^{(5)}$ is the lowest air temperature of the day;

$x^{(6)}$ is the battery replacement demand of all electric vehicles on the same day of the previous week;

$x^{(7)}$ is the battery replacement demand of all electric vehicles on the day before the predicted day;

$x^{(8)}$ is the driving mileage of all electric vehicles on the day before the predicted day;

$x^{(9)}$ to $x^{(13)}$ are the proportions, among all electric vehicles, of vehicles whose remaining battery capacity (SOC) at the end of driving on the day before the predicted day lies in the intervals [0, 20%], [20%, 40%], [40%, 60%], [60%, 80%] and [80%, 100%] respectively.

As a further improvement of the present invention, the preprocessing of the data set comprises: first standardizing the data, and then performing dimensionality reduction by principal component analysis (PCA);

the training set comprises a training subset and a verification set, wherein the training subset is 70% of the total data volume, and the verification set and the test set are each 15% of the total data volume;

the base learners include K-nearest neighbors (KNN), support vector regression (SVR), gradient boosting regression trees (GBRT), random forests (RF), and ridge regression (RR).

As a further improvement of the present invention, the training and predicting samples of the training set by each base learner in a cross-validation manner includes:

adopting six-fold cross validation;

dividing the training set evenly into 6 parts, denoted T1, T2, T3, T4, T5 and T6, taking 5 parts as the training subset to train a base learner, and taking the remaining part as the verification set on which the base learner predicts;

after repeated training and prediction, obtaining the prediction result of each base learner on the training set; wherein $f_r(\cdot)$ is the $r$-th base learner, $r = 1, 2, 3, 4, 5$, and the prediction results $f_r(x_i)$ of the 5 base learners on the training set are obtained by cross-validation;

All base learners are trained on the entire training subset.

As a further refinement of the present invention, grey correlation analysis is used to select the $R$ days $(i_1, i_2, \ldots, i_R)$ in the training set that are most relevant to the $i$-th sample, which form the similar-day training set $T_i$;

in the formula,

$f_k(x_{i_R})$ is the prediction result of the $k$-th base learner for the $R$-th of the days most relevant to sample $i$.

As a further improvement of the invention, the grey correlation analysis comprises:

first, the grey correlation coefficient $\xi_{0i}$ is calculated, and then the grey correlation degree $\gamma_{0i}$. For an input test set sample $x_0$ and a sample $x_i$ in the training set, the grey correlation coefficient is calculated as follows:

$$\xi_{0i}(c) = \frac{\min_i \min_c \left| x_0^{(c)} - x_i^{(c)} \right| + \rho \max_i \max_c \left| x_0^{(c)} - x_i^{(c)} \right|}{\left| x_0^{(c)} - x_i^{(c)} \right| + \rho \max_i \max_c \left| x_0^{(c)} - x_i^{(c)} \right|}$$

in which $\xi_{0i}(c)$ is the grey correlation coefficient of the test set sample $x_0$ and the training set sample $x_i$ on the $c$-th feature, and $\rho \in [0, 1]$;

after the grey correlation coefficients between each feature of the input test set sample and all samples in the training set have been calculated, the grey correlation degree between the input test set sample and each sample in the training set is calculated by the formula

$$\gamma_{0i} = \frac{1}{m} \sum_{c=1}^{m} \xi_{0i}(c)$$

that is, the average of the grey correlation coefficients over the $m$ features. The larger the calculated grey correlation degree, the higher the correlation; according to the calculated grey correlation degree, the $R$ most relevant samples are selected from the training set as the similar-day training set.

As a further improvement of the invention, the optimization model is as follows:

This model is equivalent to the following linear program:

s.t. $\alpha \ge 0$

In the formula, $E[\cdot]$ denotes the expectation of the expression in brackets, which in the discrete case is the average; $Y$ is a random variable, namely the actual demand value of a sample in the training set, and $Y_i$ is the actual value of the $i$-th sample; $f = (f_1, \ldots, f_k)$ are the predicted values of the $k$ base learners for the sample corresponding to $Y$; $\|\alpha\|_1 = |\alpha_1| + \ldots + |\alpha_k|$; $w$ is a weight coefficient; $k$ is the number of base learners; $z$ and $v_i$ are introduced intermediate variables; solving the optimization model yields the decision variable $\alpha = (\alpha_1, \ldots, \alpha_k)$.

As a further improvement of the present invention, for a test set sample $x_i$, the final prediction result is $F(x_i) = \alpha_1 f_1(x_i) + \ldots + \alpha_k f_k(x_i)$.

The invention also discloses a prediction system for implementing the integrated learning prediction method, comprising:

the construction module, which is used for constructing a data set, preprocessing the data set, and dividing the preprocessed data set into a training set and a testing set;

the training module, which is used for selecting k base learners, each base learner training on and predicting samples of the training set by cross-validation;

the analysis module, which is used for selecting the optimal similar day training set of each input sample in the test set through grey correlation analysis;

the establishing module, which is used for establishing a prediction deviation minimization optimization model according to the prediction result of each base learner on the optimal similar day training set, with an L1 norm with a weight coefficient as the regularization term;

the prediction module, which is used for solving the optimization model to obtain the weight coefficient of each base learner and thereby the integrated predictor, and for obtaining the integrated learning prediction result from the integrated predictor; wherein the output of the integrated predictor is a linear weighted combination of the outputs of the base learners.

Compared with the prior art, the invention has the beneficial effects that:

the prediction accuracy and the generalization of the integrated predictor are considered, the prediction deviation can be effectively reduced compared with a base learner with the best prediction effect, the prediction method has a better prediction effect on data with high random fluctuation, can be more suitable for data sets obtained in practice, can be applied to other data sets, and has higher practicability.

Drawings

Fig. 1 is a flowchart of an integrated learning prediction method for an electric vehicle battery replacement demand based on gray correlation according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

The invention provides an integrated learning prediction method and system for an electric vehicle battery replacement demand based on grey correlation, belonging to the technical field of machine learning; the method involves a two-layered structure, namely a plurality of base learners and an integrated predictor, which is a weighted combination of the plurality of base learners. In order to improve the prediction accuracy, the optimal similar day training set of each input prediction sample is selected based on gray correlation analysis, and then the most relevant training set is provided for solving the weight of the integrated predictor so as to improve the prediction accuracy of the model. In order to improve the generalization of the integrated predictor, the invention establishes an optimization model with a weighted L1 norm regular term, the optimization model is equivalent to a linear programming problem, and the weight coefficient is solved through the optimization model. The invention considers the prediction accuracy and generalization of the integrated predictor, can effectively reduce the prediction deviation compared with the base learner with the best prediction effect, has better prediction effect on data with high random fluctuation, can better adapt to the data set acquired in practice, can be applied to other data sets, and has stronger practicability.

The invention is described in further detail below with reference to the following drawings:

as shown in fig. 1, the invention provides an integrated learning prediction method for an electric vehicle battery replacement demand based on gray correlation, which includes:

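step 1, selecting features, constructing a data set, preprocessing the data set, and dividing the preprocessed data set into a training set and a testing set; wherein:

for the collected electric vehicle record data, the features include:

$x^{(1)}$ is the day of the week, coded from 1 to 7;

$x^{(2)}$ indicates whether the day falls on a weekend: 1 if it does, otherwise 0;

$x^{(3)}$ is the weather, divided into sunny, cloudy, and rainy or snowy days, coded 1, 2 and 3 respectively;

$x^{(4)}$ is the highest air temperature of the day;

$x^{(5)}$ is the lowest air temperature of the day;

$x^{(6)}$ is the battery replacement demand of all electric vehicles on the same day of the previous week;

$x^{(7)}$ is the battery replacement demand of all electric vehicles on the day before the predicted day;

$x^{(8)}$ is the driving mileage of all electric vehicles on the day before the predicted day;

$x^{(9)}$ to $x^{(13)}$ are the proportions, among all electric vehicles, of vehicles whose remaining battery capacity (SOC) at the end of driving on the day before the predicted day lies in the intervals [0, 20%], [20%, 40%], [40%, 60%], [60%, 80%] and [80%, 100%] respectively;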

$Y$ is the electric vehicle battery replacement demand on the corresponding date, namely the label of the data set; if there are $n$ samples and each sample has $m$ features, the final data set is $T = \{(x_1, Y_1), \ldots, (x_n, Y_n)\}$,

where $T$ is the data set; $x_i$ is the $i$-th sample, $i = 1, \ldots, n$, and $n$ is the number of samples;

$x_i^{(j)}$ is the $j$-th feature of sample $i$, $j = 1, \ldots, m$, and $m$ is the number of features; $Y_i$ is the label of sample $i$, namely the battery replacement demand of the electric vehicles, $i = 1, \ldots, n$;

the preprocessing of the data set comprises: first standardizing the data using the formula

and then performing dimensionality reduction by PCA, which reduces the standardized data set from 13 features to 12 features.

The training set comprises a training subset and a verification set, wherein the training subset is 70% of the total data volume, and the verification set and the test set are each 15% of the total data volume.

Further, to facilitate data extraction, dates (year-month-day) may be added to the dataset, but not entered as features into the model.
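Step 1 can be sketched in Python roughly as follows. This is a minimal illustration, not the embodiment's code: it assumes the standardization is the usual z-score (the formula itself is not reproduced in this text), that the split is made in row order, and the function name `preprocess_and_split` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def preprocess_and_split(X, y, n_components=12):
    """Standardize, reduce with PCA, then split 70% / 15% / 15% in row order."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: z-score standardization (zero mean, unit variance per feature).
    X_std = StandardScaler().fit_transform(X)
    # PCA from 13 features down to 12, as stated in the embodiment.
    X_red = PCA(n_components=n_components).fit_transform(X_std)
    n = len(y)
    n_sub = int(round(0.70 * n))  # training subset
    n_val = int(round(0.15 * n))  # verification set; the rest is the test set
    train_sub = (X_red[:n_sub], y[:n_sub])
    valid = (X_red[n_sub:n_sub + n_val], y[n_sub:n_sub + n_val])
    test = (X_red[n_sub + n_val:], y[n_sub + n_val:])
    return train_sub, valid, test


# Synthetic stand-in data: 200 days with 13 features (not real station records).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 13))
y_demo = rng.normal(loc=100.0, scale=10.0, size=200)
train_sub, valid, test = preprocess_and_split(X_demo, y_demo)
```

The sketch preprocesses the whole data set before splitting, following the order given in step 1; a date column, if present, would be kept outside `X`.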

Step 2, selecting k base learners, each of which trains on and predicts samples of the training set by cross-validation; wherein:

the number of base learners is 5, and the base learners comprise K-nearest neighbors (KNN), support vector regression (SVR), gradient boosting regression trees (GBRT), random forests (RF) and ridge regression (RR);

six-fold cross-validation is adopted; letting $f_r(\cdot)$ be the $r$-th base learner ($r = 1, 2, 3, 4, 5$), the prediction results $f_r(x_i)$ of the 5 base learners on the training set are obtained by cross-validation.

Specifically, the method comprises the following steps:

six-fold cross-validation is adopted: the training set is divided evenly into 6 parts, denoted T1, T2, T3, T4, T5 and T6; 5 parts are taken as the training subset to train a base learner, and the remaining part is taken as the verification set on which the base learner predicts; after repeated training and prediction, the prediction result of each base learner on the training set is obtained and recorded; all base learners are trained on the entire training subset.

Since the integrated predictor is built by combining the base learners, improving the prediction accuracy of the base learners also improves the accuracy of the integrated prediction. For this reason, all base learner hyperparameters are left at the default values of the sklearn package in Python during training and prediction.
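The six-fold procedure of step 2 can be sketched with scikit-learn as below. This is an illustrative sketch under assumptions: the helper names are hypothetical, the folds are taken in row order without shuffling, and `random_state=0` is set only to make the example reproducible (the embodiment itself uses the package defaults).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR


def make_base_learners():
    """KNN, SVR, GBRT, RF and RR with default hyperparameters."""
    return [
        KNeighborsRegressor(),
        SVR(),
        GradientBoostingRegressor(random_state=0),
        RandomForestRegressor(random_state=0),
        Ridge(),
    ]


def out_of_fold_predictions(X_train, y_train, n_splits=6):
    """Return an (n_samples, 5) matrix whose column r holds f_r(x_i)."""
    cv = KFold(n_splits=n_splits)  # six consecutive parts T1..T6
    columns = [
        cross_val_predict(model, X_train, y_train, cv=cv)
        for model in make_base_learners()
    ]
    return np.column_stack(columns)


def fit_base_learners(X_train, y_train):
    """Refit every base learner on all of the given training data."""
    return [model.fit(X_train, y_train) for model in make_base_learners()]


# Synthetic demonstration data (hypothetical, for shape checking only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = X_demo @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=60)
oof_demo = out_of_fold_predictions(X_demo, y_demo)
fitted_demo = fit_base_learners(X_demo, y_demo)
```

Each training sample is predicted by models that never saw it during fitting, so the columns of `oof_demo` can serve as the base-learner predictions used later when the combination weights are solved.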

Step 3, selecting the optimal similar day training set of each input sample in the test set through grey correlation analysis; wherein:

grey correlation analysis is used to select the $R$ days $(i_1, i_2, \ldots, i_R)$ in the training set that are most relevant to the $i$-th sample, which form the similar-day training set $T_i$;

in the formula,

$f_k(x_{i_R})$ is the prediction result of the $k$-th base learner for the $R$-th of the days most relevant to sample $i$;

the grey correlation analysis comprises:

first, the grey correlation coefficient $\xi_{0i}$ is calculated, and then the grey correlation degree $\gamma_{0i}$. For an input test set sample $x_0$ and a sample $x_i$ in the training set, the grey correlation coefficient on the $c$-th feature is calculated as follows, with $\rho \in [0, 1]$:

$$\xi_{0i}(c) = \frac{\min_i \min_c \left| x_0^{(c)} - x_i^{(c)} \right| + \rho \max_i \max_c \left| x_0^{(c)} - x_i^{(c)} \right|}{\left| x_0^{(c)} - x_i^{(c)} \right| + \rho \max_i \max_c \left| x_0^{(c)} - x_i^{(c)} \right|}$$

after the grey correlation coefficients between each feature of the input test set sample and all samples in the training set have been calculated, the grey correlation degree between the input test set sample and each sample in the training set is calculated by the formula

$$\gamma_{0i} = \frac{1}{m} \sum_{c=1}^{m} \xi_{0i}(c)$$

that is, the average of the grey correlation coefficients over the $m$ features. The larger the calculated grey correlation degree, the higher the correlation; according to the calculated grey correlation degree, the $R$ most relevant samples are selected from the training set as the similar-day training set, where $R$ is typically 75% of the total number of training set samples.
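The similar-day selection of step 3 can be sketched as follows. The sketch assumes the standard form of the grey correlation coefficient with a two-level minimum and maximum over samples and features; the value `rho=0.5` is a common choice and an assumption here (the text only requires that ρ lie in [0, 1]), and the function names are hypothetical.

```python
import numpy as np


def grey_relational_degree(x0, X_train, rho=0.5):
    """Grey correlation degree between one test sample x0 and each training row."""
    x0 = np.asarray(x0, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    diff = np.abs(X_train - x0)            # |x_0(c) - x_i(c)| for every i and c
    d_min, d_max = diff.min(), diff.max()  # two-level min and max over i and c
    if d_max == 0.0:                       # every training row equals x0
        return np.ones(len(X_train))
    xi = (d_min + rho * d_max) / (diff + rho * d_max)  # coefficients xi_0i(c)
    return xi.mean(axis=1)                 # average over features: gamma_0i


def select_similar_days(x0, X_train, ratio=0.75, rho=0.5):
    """Indices of the R most related training samples, most related first."""
    gamma = grey_relational_degree(x0, X_train, rho)
    R = max(1, int(round(ratio * len(X_train))))
    return np.argsort(-gamma, kind="stable")[:R]


# Tiny hand-checkable example with four training days and two features.
X_demo = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0]])
x0_demo = np.array([0.0, 0.0])
gamma_demo = grey_relational_degree(x0_demo, X_demo)
similar_demo = select_similar_days(x0_demo, X_demo)
```

With `ratio=0.75` the sketch keeps 3 of the 4 demonstration days and drops the most distant one. Because the features are compared directly, the calculation is meant to run on the standardized, PCA-reduced features rather than on raw values.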

Step 4, establishing a prediction deviation minimization optimization model according to the prediction result of each base learner in the training set of the optimal similar day, and adopting an L1 norm with a weight coefficient as a regular term;

wherein, the optimization model is as follows:

This model is equivalent to the following linear program:

s.t. $\alpha \ge 0$

In the formula, $E[\cdot]$ denotes the expectation of the expression in brackets, which in the discrete case is the average; $Y$ is a random variable, namely the actual demand value of a sample in the training set, and $Y_i$ is the actual value of the $i$-th sample; $f = (f_1, \ldots, f_k)$ are the predicted values of the $k$ base learners for the sample corresponding to $Y$; $\|\alpha\|_1 = |\alpha_1| + \ldots + |\alpha_k|$; $w$ is a weight coefficient; $k$ is the number of base learners; $z$ and $v_i$ are introduced intermediate variables; solving the optimization model yields the decision variable $\alpha = (\alpha_1, \ldots, \alpha_k)$.
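Because the model formulas are not reproduced in this text, the following Python sketch shows only one linear program that is consistent with the description, and should be read as an assumption rather than as the patented formulation. It assumes the prediction deviation is the mean absolute deviation, so that the problem is: minimize $\frac{1}{R}\sum_p v_p + w z$ subject to $v_p \ge \pm\,(Y_p - \sum_q \alpha_q f_q(x_p))$, $z \ge \sum_q \alpha_q$ and $\alpha \ge 0$. The function name and the demonstration numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def solve_ensemble_weights(F, Y, w=0.1):
    """Solve for alpha >= 0 given F (R x k base predictions) and Y (R actual values)."""
    F = np.asarray(F, dtype=float)
    Y = np.asarray(Y, dtype=float)
    R, k = F.shape
    # Decision vector: [alpha_1..alpha_k, v_1..v_R, z].
    c = np.concatenate([np.zeros(k), np.full(R, 1.0 / R), [w]])
    eye = np.eye(R)
    zero_col = np.zeros((R, 1))
    A_ub = np.vstack([
        np.hstack([-F, -eye, zero_col]),   # Y_p - F_p.alpha <= v_p
        np.hstack([F, -eye, zero_col]),    # F_p.alpha - Y_p <= v_p
        np.concatenate([np.ones(k), np.zeros(R), [-1.0]])[None, :],  # sum(alpha) <= z
    ])
    b_ub = np.concatenate([-Y, Y, [0.0]])
    # bounds=(0, None) keeps alpha, v and z non-negative.
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]


# Demonstration: the first base learner is exact, the second is noisy.
Y_demo = np.array([100.0, 120.0, 90.0, 110.0])
F_demo = np.column_stack([Y_demo, Y_demo + np.array([5.0, -3.0, 4.0, -6.0])])
alpha_demo = solve_ensemble_weights(F_demo, Y_demo, w=0.01)
prediction_demo = F_demo @ alpha_demo  # linear weighted combination of outputs
```

In the demonstration the solver puts essentially all the weight on the exact learner, which is the behaviour one would expect from a deviation-minimizing combination with a small L1 penalty. In use, `F` and `Y` would be restricted to the rows of the similar-day training set selected for the sample being predicted.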

The best values of $w$ and $R$ can be searched for in the training set by cross-validation: the training set is divided evenly into 6 parts, five parts are used for training and the remaining part is used as the validation set for prediction. Each part yields one evaluation index, so 6 evaluation indexes are obtained, and their average is taken as the final prediction performance of the parameter at a given value. Through repeated iteration, the parameter value with the best evaluation index is found and used for prediction. The evaluation indexes in this example are preferably the mean absolute error (MAE) and the symmetric mean absolute percentage error (SMAPE).
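The two evaluation indexes can be computed as in the short sketch below. SMAPE has several variants in common use; this sketch assumes the one that divides by the mean of the absolute actual and predicted values and reports a percentage, which may differ from the variant used in the embodiment. The function names are hypothetical.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def smape(y_true, y_pred):
    """SMAPE in percent, with (|y| + |y_hat|) / 2 as the denominator."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Terms where both values are zero contribute 0 instead of dividing by zero.
    ratio = np.divide(np.abs(y_true - y_pred), denom,
                      out=np.zeros_like(denom), where=denom > 0)
    return float(100.0 * np.mean(ratio))


mae_demo = mae([100.0, 200.0], [110.0, 190.0])
smape_demo = smape([100.0, 100.0], [100.0, 300.0])
```

For a candidate parameter value, the index would be computed once per validation fold and the six results averaged, as described above.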

Step 5, solving the optimization model to obtain the weight coefficient of each base learner and thereby the integrated predictor, and obtaining the integrated learning prediction result from the integrated predictor; wherein the output of the integrated predictor is a linear weighted combination of the outputs of the base learners, i.e. for a test set sample $x_i$, the final prediction result is $F(x_i) = \alpha_1 f_1(x_i) + \ldots + \alpha_k f_k(x_i)$.

The invention provides a prediction system for implementing the integrated learning prediction method, comprising:

a construction module for implementing the step 1;

a training module for implementing the step 2;

the analysis module is used for realizing the step 3;

the establishing module is used for realizing the step 4;

and the prediction module is used for realizing the step 5.

Compared with the prior art, the invention has the beneficial effects that:

the invention considers the prediction accuracy and generalization of the integrated predictor, can effectively reduce the prediction deviation compared with the base learner with the best prediction effect, has better prediction effect on data with high random fluctuation, can better adapt to the data set acquired in practice, can be applied to other data sets, and has stronger practicability.

The present invention has been described in terms of the preferred embodiment, and it is not intended to be limited to the embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An electric vehicle battery replacement demand integrated learning prediction method based on grey correlation is characterized by comprising the following steps:

establishing a prediction deviation minimization optimization model according to the prediction result of each base learner in the training set of the optimal similar day, and adopting an L1 norm with a regularization coefficient as a regularization term; wherein, the optimization model is as follows:

the model is equivalent to the following linear programming:

s.t. $\alpha_q \ge 0$, $q = 1, \ldots, k$

In the formula, $E[\cdot]$ denotes the expectation of the expression in brackets, which in the discrete case is the average; $Y$ is a random variable, namely the actual demand value of a sample in the optimal similar day training set, and $Y_p$ is the actual value of the $p$-th sample in the optimal similar day training set; $f = (f_1, \ldots, f_k)$ are the predicted values of the $k$ base learners for the sample corresponding to $Y$; $\|\alpha\|_1 = |\alpha_1| + \ldots + |\alpha_k|$; $w$ is a regularization coefficient; $k$ is the number of base learners; $z$ and $v_p$ are introduced intermediate variables; solving the optimization model yields the decision variable $\alpha = (\alpha_1, \ldots, \alpha_k)$, wherein $\alpha_1, \ldots, \alpha_k$ are the weight coefficients of the base learners; $R$ is the number of samples in the optimal similar day training set;

2. The ensemble learning prediction method of claim 1, wherein the data set is $T = \{(x_1, Y_1), \ldots, (x_n, Y_n)\}$,

where $T$ is the data set; $x_i$ is the $i$-th sample, $i = 1, \ldots, n$, and $n$ is the number of samples;

$x_i^{(j)}$ is the $j$-th feature of sample $i$, $j = 1, \ldots, m$, and $m$ is the number of features; $Y_i$ is the label of sample $i$, namely the battery replacement demand of the electric vehicles, $i = 1, \ldots, n$.

3. The ensemble learning prediction method of claim 2, wherein the features include:

$x^{(1)}$ is the day of the week, coded from 1 to 7;

$x^{(2)}$ indicates whether the day falls on a weekend: 1 if it does, otherwise 0;

$x^{(3)}$ is the weather, divided into sunny, cloudy, and rainy or snowy days, wherein a sunny day is coded as 1, a cloudy day as 2, and a rainy or snowy day as 3;

$x^{(4)}$ is the highest air temperature of the day;

$x^{(5)}$ is the lowest air temperature of the day;

$x^{(6)}$ is the battery replacement demand of all electric vehicles on the same day of the previous week;

$x^{(7)}$ is the battery replacement demand of all electric vehicles on the day before the predicted day;

$x^{(8)}$ is the driving mileage of all electric vehicles on the day before the predicted day;

$x^{(9)}$ to $x^{(13)}$ are the proportions, among all electric vehicles, of vehicles whose remaining battery capacity (SOC) at the end of driving on the day before the predicted day lies in the intervals [0, 20%], [20%, 40%], [40%, 60%], [60%, 80%] and [80%, 100%] respectively.

4. The ensemble learning prediction method of claim 2, wherein the preprocessing of the data set comprises: firstly, standardizing data, and then performing dimensionality reduction treatment by adopting PCA (principal component analysis);

the basis learner comprises K neighbors, a support vector machine, a gradient boosting regression tree, a random forest and a ridge regression.

5. The ensemble learning prediction method of claim 4, wherein said using cross-validation to train and predict samples of the training set for each base learner comprises:

adopting six-fold cross validation;

dividing the training set evenly into 6 parts, denoted T1, T2, T3, T4, T5 and T6, taking 5 parts as the training subset to train a base learner, and taking the remaining part as the verification set on which the base learner predicts;

after repeated training and prediction, obtaining the prediction result of each base learner on the training set; wherein $f_r(\cdot)$ is the $r$-th base learner, $r = 1, 2, 3, 4, 5$, and the prediction results $f_r(x_i)$ of the 5 base learners on the training set are obtained by cross-validation;

All base learners are trained on the entire training subset.

6. The ensemble learning prediction method of claim 5, wherein grey correlation analysis is used to select the $R$ days in the training set that are most correlated with the $i$-th sample, denoted $i_1, i_2, \ldots, i_R$, thereby obtaining the similar-day training set $T_i$;

in the formula,

$f_k(x_{i_R})$ is the prediction of the $k$-th base learner for the $R$-th day most correlated with the $i$-th sample.

7. The ensemble learning prediction method of claim 6, wherein the grey correlation analysis comprises:

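firstly, calculating the grey correlation coefficient $\xi_{0i}$, and then calculating the grey correlation degree $\gamma_{0i}$, specifically:

for an input test set sample $x_0$ and a sample $x_i$ in the training set, the grey correlation coefficient on the $c$-th feature is calculated as follows:

$$\xi_{0i}(c) = \frac{\min_i \min_c \left| x_0^{(c)} - x_i^{(c)} \right| + \rho \max_i \max_c \left| x_0^{(c)} - x_i^{(c)} \right|}{\left| x_0^{(c)} - x_i^{(c)} \right| + \rho \max_i \max_c \left| x_0^{(c)} - x_i^{(c)} \right|}, \quad \rho \in [0, 1];$$

after the grey correlation coefficients between each feature of the input test set sample and all samples in the training set have been calculated, the grey correlation degree between the input test set sample and each sample in the training set is calculated by the formula

$$\gamma_{0i} = \frac{1}{m} \sum_{c=1}^{m} \xi_{0i}(c)$$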

8. The ensemble learning prediction method of claim 7, wherein, for a test set sample $x_i$, the final prediction result is $F(x_i) = \alpha_1 f_1(x_i) + \ldots + \alpha_k f_k(x_i)$.

9. A prediction system for implementing the ensemble learning prediction method according to any one of claims 1 to 8, comprising:

the establishing module, which is used for establishing a prediction deviation minimization optimization model according to the prediction result of each base learner on the optimal similar day training set, with an L1 norm with a regularization coefficient as the regularization term;