CN115526398A - Attention mechanism-based charging load prediction method for charging station - Google Patents

Attention mechanism-based charging load prediction method for charging station

Info

Publication number
CN115526398A
Authority
CN
China
Prior art keywords
data
charging
prediction
module
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211188648.7A
Other languages
Chinese (zh)
Inventor
易媛
鞠晨
汤晓栋
吴小东
奚培锋
王海杰
刘春雨
沈怀川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lianruike Energy Technology Co ltd
Shanghai Electrical Apparatus Research Institute Group Co Ltd
Original Assignee
Shanghai Lianruike Energy Technology Co ltd
Shanghai Electrical Apparatus Research Institute Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lianruike Energy Technology Co ltd, Shanghai Electrical Apparatus Research Institute Group Co Ltd filed Critical Shanghai Lianruike Energy Technology Co ltd
Priority to CN202211188648.7A priority Critical patent/CN115526398A/en
Publication of CN115526398A publication Critical patent/CN115526398A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003 Load forecast, e.g. methods or systems for forecasting future load demand
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Power Engineering (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a charging station charging load prediction method based on an attention mechanism, comprising a prediction model that comprises a feature embedding module, a historical data encoding module and a fusion prediction module. Feature data of a charging station for a time period t and the corresponding charging load value y_t are obtained from historical data; the prediction model calculates a predicted charging load value, and a converged prediction model is obtained through training on the historical data so as to predict the charging load in a future period. With this prediction method, the charging load change curves of charging stations in a given future time period are predicted from the stations' historical charging load data, which better provides data support for the construction planning and daily operation of charging stations and further provides a quantitative analysis basis for load prediction of the power distribution network.

Description

Attention mechanism-based charging load prediction method for charging station
Technical Field
The application relates to a charging load prediction method for a charging station based on an attention mechanism, involving techniques such as general data processing and data prediction or optimization, and belongs to the technical field of data mining, utilization and prediction.
Background
Because the charging load has obvious spatio-temporal characteristics, the charging load of electric vehicles behaves completely differently at different times and places, and the charging load curves of different charging stations differ. Although traditional classical autoregressive time-series prediction models such as AR, MA, ARMA and ARIMA are highly targeted and robust, their learning freedom is low and their support for introducing covariates is limited; because the data distributions of different items differ greatly, these methods cannot accurately capture such differences, so their generalization performance is poor. Compared with autoregressive models, shallow machine learning methods based on feature engineering, such as regression analysis, gradient boosting decision trees and support vector machines, offer greater freedom, but they depend on complex feature engineering and business experience, their extrapolation is more complex, and their ability to predict missing data is limited.
Prediction methods based on deep learning have gradually become a research hotspot in the field of data prediction because of their high learning freedom and low dependence on manual feature engineering. For example, the autoregressive recurrent neural network DeepAR proposed by David Salinas et al. combines autoregression with recurrent neural networks, introduces covariates and supports the prediction of missing items through similar expressions among similar items, but it also suffers from the drawbacks of recurrent neural networks and has difficulty capturing long-term periodic and seasonal information. In recent years the attention mechanism has excelled at capturing long-term dependencies, so researchers have fused it into deep learning models to further improve the accuracy of data prediction. However, for prediction problems in real application scenarios a general method may not obtain a good result, and to solve prediction problems in actual complex scenarios several methods are often combined.
Disclosure of Invention
The purpose of the application is to predict, with a novel prediction method, the charging load change curve of each charging station in a given future time period from the stations' historical charging load data, so as to better provide data support for the construction planning and daily operation of charging stations and further provide a quantitative analysis basis for load prediction of the power distribution network.
In order to achieve the above purpose, the technical scheme of the application is to provide a charging load prediction method for a charging station based on an attention mechanism, which is characterized by comprising a prediction model, wherein the prediction model comprises a feature embedding module, a historical data coding module and a fusion prediction module;
the prediction model training steps are as follows:
acquiring characteristic data of a charging station time period t and a charging load value y corresponding to the time period t t All feature data constituting a feature vector r t (ii) a Extracting a feature vector r t Middle time and weather characteristic phaseThe off fraction is denoted g t
Acquiring historical data of a charging station in a period t of the past s days, including
Figure BDA0003866334370000021
And
Figure BDA0003866334370000022
respectively representing the date, time and weather characteristics of a time period t in the past s days and the load value in the corresponding time period;
the input vector of the historical data encoding module is u t And g t
The historical data encoding module is provided with a network basic module FA which is sequentially provided with a linear layer, a BatchNorm layer, a Tanh layer and a Dropout layer from input to output;
inputting the vector X into a network basic module FA to obtain FA (X);
The historical data encoding module processes the vector u_t with residual connections:
K_t = FA(u_t + FA(u_t)),
and g_t is processed in the same way to obtain: Q_t = FA(g_t + FA(g_t));
K_t and Q_t are multiplied and then normalized to obtain the influence factor α_t of the feature data corresponding to time period t in the historical data on the prediction y_t:
α_t = softmax(K_t × Q_t)
p_t is weighted by α_t to obtain the output vector V_t of the historical data encoding module:
V_t = α_t^T · p_t
The input vector of the fusion prediction module is O_t, O_t = [r_t, V_t];
The output of the fusion prediction module is the predicted charging load value ŷ_t, obtained by passing O_t through the module's FA blocks and a final linear layer with weight matrix W_p and offset b_p, which are weights updated through learning and training of the prediction model;
A mean square error loss function L_MSE is set:
L_MSE = (1/n)·Σ_t (ŷ_t - y_t)²,
the average squared difference between predicted and actual load values over the n training samples;
Error back propagation is performed and the prediction model is updated by minimizing the loss with stochastic gradient descent and momentum, until the prediction model converges, giving a converged prediction model;
the prediction steps of the prediction model are as follows:
Denote the time period to be predicted as t', and acquire the feature data of time period t', where the weather-related feature data are obtained through the weather forecast; substituting these into the converged prediction model yields ŷ_{t'}, i.e. the predicted charging load value for time period t'.
The feature data comprise x_t = [x_{t0}, x_{t1}, ..., x_{t,d-1}] and h_t = [h_{t0}, h_{t1}, ..., h_{t,k-1}], where x_t is the continuous feature vector composed of the d continuous feature data corresponding to time period t and h_t is the categorical feature vector composed of the k categorical feature data corresponding to time period t; the categorical feature vector h_t is input to the feature embedding module and converted into a continuous feature vector h'_t, and x_t and h'_t are combined to obtain the feature vector r_t = [h'_t | x_t].
Specifically, the feature embedding module processes the input categorical feature vector as follows: each feature is one-hot encoded and then multiplied by a learnable weight matrix, converting it into a continuous feature vector of dimension r, where r is a hyper-parameter of the model determined through prior knowledge and manual tuning.
Specifically, the input-output relation of the linear layer is:
f(X) = XW^T + b,
where X represents the input vector of the layer, and W and b respectively represent a weight matrix and an offset, which are weights updated through learning and training of the prediction model;
the input-output relation of the BatchNorm layer is:
BN(X) = γ·(X - μ)/√(σ² + ε) + β,
where μ represents the expectation of the input vector X, σ is the standard deviation, ε is a positive number close to 0, and γ and β respectively represent affine transformation parameter vectors, which are weights updated through learning and training of the prediction model;
the input-output relation of the Tanh layer is:
Tanh(x) = (e^x - e^{-x})/(e^x + e^{-x}).
preferably, the feature data input to the prediction model is normalized by the following formula:
Figure BDA0003866334370000041
x' represents raw data, μ represents a mean value of the raw data, S represents a standard deviation of the raw data, and x is data after normalization processing.
Preferably, the charging load value is a charging load value after a logarithmic processing, and the logarithmic processing formula is:
y=log(y′+1)
where y' represents the original data and y represents the data after the logarithmic processing.
Specifically, the softmax function is:
softmax(x_i) = e^{x_i} / Σ_j e^{x_j}.
preferably, the characteristic data at least includes charging station information, location information to which the charging station belongs, weather information, and time information. Preferably, the weather information includes a current time temperature and a current time humidity. Further, the time information includes the hour and the minute of the day where the current time is located.
Drawings
FIG. 1 is a flow diagram of a feature embedding process provided in an embodiment;
fig. 2 is a site charging load curve before the logarithm processing provided in the embodiment;
fig. 3 is a log-processed site charging load curve provided in the embodiment;
FIG. 4 is a schematic diagram of the overall structure of a prediction model provided in the embodiment;
fig. 5 is a comparison of site a charging load prediction results provided in the example;
fig. 6 is a comparison of site B charging load prediction results provided in the embodiments;
fig. 7 is a comparison of the site C charging load prediction results provided in the example;
fig. 8 is a comparison of the site D charging load prediction results provided in the example.
Detailed Description
In order to make the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Examples
This embodiment provides a charging station charging load prediction method based on an attention mechanism, used to predict the charging load of an electric vehicle charging station for a time period t; the time period t can be divided at minute granularity, and the time division scale can be extended according to the practical application.
1. Model training stage
The training stage of the model is mainly divided into three parts: 1. data acquisition and feature extraction; 2. data preprocessing; 3. model training.
1. Data collection and feature extraction
In this electric vehicle charging station load prediction method, the operator of the charging equipment pushes data through the data transmission protocol in Table 1, and our side receives the data and stores it in a database.
TABLE 1 Charging process detail data of the charging equipment interface
Parameter name | Field | Required | Parameter type | Remarks
Charging order number | StartChargeSeq | Yes | String | Order number in the docking platform system
Charging device interface code | ConnectorID | Yes | String | Unique connector number under the platform
Vehicle identification code | Vin | No | String | Vehicle identification number or frame number
Phase A current | CurrentA | Yes | Float | Unit: A; default: 0; DC output value for DC charging
Phase A voltage | VoltageA | Yes | Float | Unit: V; default: 0; DC output value for DC charging
Remaining battery capacity percentage | SOC | Yes | Float | Default: 0
Charging start time | StartTime | Yes | String | Format "yyyy-MM-dd HH:mm:ss"
Sampling time | EndTime | Yes | String | Format "yyyy-MM-dd HH:mm:ss"
Cumulative charged energy | TotalElect | Yes | Float | Unit: kWh; two decimal places
Accumulated electricity fee | ElecMoney | Yes | Float | Unit: yuan; two decimal places
Accumulated service fee | SeviceMoney | Yes | Float | Unit: yuan; two decimal places
Accumulated total amount | TotalMoney | Yes | Float | Unit: yuan; two decimal places
The power P can be obtained by multiplying the phase A current by the phase A voltage (the sum of the powers drawn by a station from the power system at a given moment is called the charging load of the station):
P = CurrentA × VoltageA    (1)
The data are then aggregated by minute. Suppose k vehicles are charging at a station within a given minute and the operator pushes records d_1, d_2, ..., d_k for the charging behaviour of these k vehicles within that minute; the charging load of the station in that minute is calculated according to equation (2):
P = Σ_{i=1}^{k} P_i,    (2)
where P_i is the power obtained from record d_i by equation (1).
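As a minimal sketch of this aggregation (assuming the pushed records of Table 1 are available as rows of a pandas DataFrame; the function name and data layout below are illustrative and not part of the original disclosure):

```python
import pandas as pd

def station_load_per_minute(records: pd.DataFrame) -> pd.Series:
    """Aggregate pushed charging records into a per-minute station charging
    load, following equations (1)-(2)."""
    df = records.copy()
    # Equation (1): instantaneous power of each charging record.
    df["P"] = df["CurrentA"] * df["VoltageA"]
    # Truncate the sampling time (EndTime, "yyyy-MM-dd HH:mm:ss") to the minute.
    df["minute"] = pd.to_datetime(df["EndTime"]).dt.floor("min")
    # Equation (2): sum the powers of the k vehicles charging in the same minute.
    return df.groupby("minute")["P"].sum()
```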
meanwhile, the popularity of the charging station is related to the charging station and the geographical position of the charging station, and the charging behavior of the electric vehicle is influenced by weather and time, so that characteristic data coexistence tables such as charging station information, location information to which the charging station belongs, weather information, time information and the like are extracted from the database.
TABLE 2 Features of the model
Feature | Feature type | Description
frequency | Continuous | Charging frequency of the site in the previous month
conn_num | Continuous | Number of charging interfaces at the site
dc_conn_num | Continuous | Number of DC interfaces at the site
ac_conn_num | Continuous | Number of AC interfaces at the site
equipment_amount | Continuous | Number of devices at the site
ac_equipment_amount | Continuous | Number of AC devices at the site
dc_equipment_amount | Continuous | Number of DC devices at the site
st_equipment_power | Continuous | Total power of the site
dc_equipment_power | Continuous | Total power of DC equipment at the site
single_power | Continuous | Power of a single charging pile at the site
ac_dedicated | Continuous | Number of dedicated AC devices in the administrative district where the site is located
dc_dedicated | Continuous | Number of dedicated DC devices in the administrative district where the site is located
ac_public | Continuous | Number of public AC devices in the administrative district where the site is located
dc_public | Continuous | Number of public DC devices in the administrative district where the site is located
density | Continuous | Site distribution density of the administrative district where the site is located
equipment_power | Continuous | Total power of charging equipment in the administrative district where the site is located
temperature | Continuous | Temperature at the current time
humidity | Continuous | Humidity at the current time
power | Continuous | Charging load of the site at the current time
district | Categorical | Administrative district
hour | Categorical | Hour
minute | Categorical | Minute
station_type | Categorical | Site type
park_fee_type | Categorical | Parking fee type of the site
construction | Categorical | Construction location of the site
demonstration | Categorical | Whether the site is a charging demonstration station
city_area | Categorical | Ring area to which the site belongs
equipment_structure | Categorical | Equipment composition structure of the site
connector_structure | Categorical | Connector structure of the site
scale | Categorical | Scale of the operator to which the site belongs
Specifically, there are four types of feature data: charging station information, information on the district where the charging station is located, weather information and time information. The charging frequency of the station in the previous month, the numbers of charging interfaces, DC interfaces, AC interfaces, devices, DC devices and AC devices at the station, the total power of the station, the total power of the station's DC equipment, the power of a single pile, the station type, the parking fee type, the construction location, whether the station is a charging demonstration station, the equipment composition structure, the connector structure and the scale of the operator to which the station belongs all belong to charging station information. The numbers of dedicated AC devices, dedicated DC devices, public AC devices and public DC devices in the administrative district where the station is located, the site distribution density of that district, the total power of its charging equipment, the administrative district itself and the ring area to which the station belongs constitute the district information of the charging station. The temperature and humidity at the current time belong to the weather information, and the hour and minute of the time period belong to the time information.
2. Data pre-processing
As can be seen from Table 2, the data set used in this method mainly contains categorical variables and continuous variables. For the categorical features, the categorical variables are converted into continuous variables by embedding (hereinafter referred to as Embedding): each feature is first one-hot encoded and then multiplied by a learnable weight matrix to convert it into a continuous feature vector of dimension r, where r is a hyper-parameter of the model determined through prior knowledge and manual tuning. The feature embedding process is shown in Fig. 1.
Because the stochastic gradient descent algorithm is used for optimization during model training, gradient explosion and vanishing gradients may occur when the input data fluctuate over a large range, preventing the model from converging. To prevent this from happening during model training, the continuous features of the data set are standardized before training so that they follow a standard normal distribution, reducing the possibility that an outlier causes the model to fall into the linear region of the non-linear activation function. The standardization of the continuous features is shown in equation (3), where x' represents the input data, μ represents the mean of the input data and S represents the standard deviation of the input data:
x = (x' - μ)/S    (3)
In addition, in order to make the distribution of the model's label data more stable, reduce the computational load of the model and avoid introducing new calculation errors due to the difference between the training-set and test-set data distributions, the charging load data y' of the station are logarithmically transformed, as shown in equation (4):
y = log(y' + 1)    (4)
Specifically, a comparison of the charging load data before and after the logarithmic transform is shown in Fig. 2 and Fig. 3, where Fig. 2 is the station charging load curve before the transform and Fig. 3 the curve after it.
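A minimal preprocessing sketch of equations (3) and (4) (NumPy is assumed; in practice the mean and standard deviation would be computed on the training set only):

```python
import numpy as np

def preprocess(x_cont: np.ndarray, y_raw: np.ndarray):
    """Standardize continuous features (eq. (3)) and log-transform the
    charging load labels (eq. (4))."""
    mu = x_cont.mean(axis=0)
    s = x_cont.std(axis=0)
    x_std = (x_cont - mu) / s        # eq. (3): x = (x' - mu) / S
    y_log = np.log(y_raw + 1.0)      # eq. (4): y = log(y' + 1)
    return x_std, y_log, (mu, s)
```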
For the experimental data, a total of 959040 sample instances covering 22 days were randomly selected in this example. The test-set samples are the data of the last day in the data set, while the training set and validation set are drawn from the data of the first 21 days; the validation set is separated from the training data to facilitate selection of a better model across multiple training runs. The final split is shown in Table 3.
TABLE 3 Experimental data set partitioning results
(table reproduced as an image in the original publication)
3. Model training
The overall structure of the model is shown in Fig. 4. This embodiment explains the model structure in two parts: the model input and output, and the composition of each module.
3-1 Model input and output
The input of the prediction model is divided into two parts. If the content currently being predicted is the load value of time period t, the first part of the model input, shown on the left side of Fig. 4, is x_t = [x_{t0}, x_{t1}, ..., x_{t,d-1}], x_t ∈ R^{1×d}, and h_t = [h_{t0}, h_{t1}, ..., h_{t,k-1}], h_t ∈ R^{1×k}, respectively the continuous feature vector and the categorical feature vector corresponding to time period t; the second part is the input of the historical data encoding module on the right side, u_t ∈ R^{s×c} and p_t ∈ R^{s×1}. In our experiments the historical data step length is s = 7, i.e. the feature data and charging load values corresponding to time period t in the past 7 days are taken as input to the historical data encoding module.
Correspondingly, the output of the prediction model is the predicted load value ŷ_t corresponding to time period t of the current day.
3-2 Composition of each module
The model mainly comprises three modules: 1) the feature embedding module; 2) the historical data encoding module; 3) the fusion prediction module.
In the feature embedding module the continuous features and the categorical features are input separately. The categorical features h_t are fed into an Embedding layer that converts them into a continuous feature vector h'_t ∈ R^{1×m}, where m = k × embedding_size; in this embodiment embedding_size = 3, so the embedded categorical features h'_t ∈ R^{1×3k} are combined with the continuous features x_t to obtain a new vector r_t = [h'_t | x_t], r_t ∈ R^{1×e}, e = d + 3k.
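A sketch of the feature embedding module under these conventions (PyTorch assumed; using one nn.Embedding per categorical feature is equivalent to one-hot encoding followed by a learnable weight matrix, and the class name and arguments are illustrative):

```python
import torch
import torch.nn as nn

class FeatureEmbedding(nn.Module):
    """Embed k categorical features (embedding_size = 3 each) and concatenate
    them with the d continuous features, giving r_t = [h'_t | x_t]."""
    def __init__(self, cardinalities, d_continuous, embedding_size=3):
        super().__init__()
        # One learnable embedding table per categorical feature.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(c, embedding_size) for c in cardinalities]
        )
        self.out_dim = d_continuous + embedding_size * len(cardinalities)  # e = d + 3k

    def forward(self, h_t, x_t):
        # h_t: (batch, k) integer category indices; x_t: (batch, d) continuous features.
        h_emb = torch.cat(
            [emb(h_t[:, i]) for i, emb in enumerate(self.embeddings)], dim=-1
        )
        return torch.cat([h_emb, x_t], dim=-1)  # r_t with shape (batch, e)
```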
The input of the historical data encoding module consists of the model input data u_t and p_t together with g_t ∈ R^{1×(j-i)}, output by the feature embedding module; g_t is the part of the feature vector r_t related to time and weather, obtained by vector slicing, and i and j can be defined according to the feature arrangement as in equation (5):
g_t = r_t[:, i:j]    (5)
As can be seen in Fig. 4, the historical data encoding module first feeds u_t into the network basic module FA, whose internal structure is shown in the upper right corner of Fig. 4. Linear denotes a linear layer, defined as in equation (6), where W and b respectively represent a learnable weight matrix and offset and X = [X_0, X_1, ..., X_i, ...] is the input of the layer; to enable the layer-skipping additions (i.e. residual connections) in the historical data encoding module of this embodiment, the output dimension of the module's linear layers is set equal to the input dimension.
f(X) = XW^T + b    (6)
The BatchNorm layer in the FA block is defined as in equation (7), where μ represents the expectation of the input data X, σ is the standard deviation, ε is a value close to 0 introduced to avoid calculation errors when the standard deviation σ = 0, and γ and β represent learnable affine transformation parameter vectors whose dimension equals that of the layer's input data.
BN(X) = γ·(X - μ)/√(σ² + ε) + β    (7)
Tanh is the activation function, as shown in equation (8):
Tanh(x) = (e^x - e^{-x})/(e^x + e^{-x})    (8)
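A sketch of the FA basic block in PyTorch (the dropout probability is an assumption; it is not specified in the text):

```python
import torch.nn as nn

class FA(nn.Module):
    """Network basic block FA: Linear -> BatchNorm -> Tanh -> Dropout
    (equations (6)-(8)). The linear layer keeps the input dimension so the
    block can be used inside summation-style residual connections."""
    def __init__(self, dim, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),   # eq. (6): f(X) = X W^T + b
            nn.BatchNorm1d(dim),   # eq. (7)
            nn.Tanh(),             # eq. (8)
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)
```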
to u in historical data coding module t Is performed in a residual concatenation manner, which can be expressed as formula (9), where
Figure BDA0003866334370000093
K t ∈R s×z Z is a hyper-parameter of the model, which can be adjusted manually according to a priori knowledge, in this embodiment z =16.
K=FA(u t +FA(u t )) (9)
The other input of the module, g_t, is processed in the same residual connection manner to obtain Q_t ∈ R^{z×1}:
Q_t = FA(g_t + FA(g_t))    (10)
Then K_t and Q_t are multiplied and normalized to obtain the influence factor α_t ∈ R^{s×1} of time period t in the historical data on the prediction for time period t of the current day, calculated as in equation (11), where the softmax function is defined in equation (12):
α_t = softmax(K_t × Q_t)    (11)
softmax(x_i) = e^{x_i} / Σ_j e^{x_j}    (12)
Finally p_t is weighted by α_t to obtain the output vector V_t ∈ R^{1×1} of the historical data encoding module, as shown in equation (13):
V_t = α_t^T · p_t    (13)
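A sketch of the historical data encoding module built from the FA block above (the projections of u_t and g_t to the common dimension z, and the batched shapes, are assumptions made here so that the matrix shapes line up; BatchNorm expects a batch size greater than 1 during training):

```python
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Historical data encoding module, eqs. (9)-(13):
    K_t = FA(u_t + FA(u_t)), Q_t = FA(g_t + FA(g_t)),
    alpha_t = softmax(K_t x Q_t), V_t = alpha_t^T p_t."""
    def __init__(self, c, g_dim, z=16, dropout=0.1):
        super().__init__()
        self.u_proj = nn.Linear(c, z)      # assumed projection of u_t rows to R^z
        self.g_proj = nn.Linear(g_dim, z)  # assumed projection of g_t to R^z
        self.fa_u_in, self.fa_u_out = FA(z, dropout), FA(z, dropout)
        self.fa_g_in, self.fa_g_out = FA(z, dropout), FA(z, dropout)

    def forward(self, u_t, g_t, p_t):
        # u_t: (B, s, c) history features, g_t: (B, g_dim) time/weather features
        # of the target period, p_t: (B, s, 1) history load values.
        B, s, _ = u_t.shape
        u = self.u_proj(u_t).reshape(B * s, -1)
        K = self.fa_u_out(u + self.fa_u_in(u)).reshape(B, s, -1)  # eq. (9)
        g = self.g_proj(g_t)
        Q = self.fa_g_out(g + self.fa_g_in(g)).unsqueeze(-1)      # eq. (10), (B, z, 1)
        alpha = torch.softmax(torch.bmm(K, Q), dim=1)             # eqs. (11)-(12), (B, s, 1)
        V = torch.bmm(alpha.transpose(1, 2), p_t)                 # eq. (13), (B, 1, 1)
        return V.squeeze(-1)                                      # V_t, shape (B, 1)
```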
input data O of fusion prediction module t Obtained by combining the output of the feature embedding module and the output of the historical data encoding module, i.e. O t =[r t ,V t ],O t ∈R 1×(e+1)
The fusion prediction module also adopts a residual error connection mode, and is different from the historical data coding module in that the output dimension and the input dimension of a linear layer in the network basic module FA are different, so that a characteristic combination mode rather than a summation mode is adopted in the residual error connection, the last part of the module, namely a Predict block in a figure 4, consists of 2 FA blocks and one linear layer, namely the last layer of the whole model is the linear layer, the output dimension of the linear layer is 1, and the output content is a logarithmic load value obtained by model calculation
Figure BDA0003866334370000104
The modular calculation process can be expressed as equation (14), where W p ∈R c×1 ,b p ∈R 1×1 And c denotes the dimension of the last FA block output vector.
Figure BDA0003866334370000111
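A sketch of the fusion prediction module / Predict block (the hidden width and the exact wiring of the concatenation-style residual are assumptions; only the two FA blocks and the final one-dimensional linear layer are given by the text):

```python
import torch
import torch.nn as nn

def fa_block(in_dim, out_dim, dropout=0.1):
    """FA block variant whose linear layer changes dimension, as used in the
    fusion prediction module."""
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim),
                         nn.Tanh(), nn.Dropout(dropout))

class FusionPredictor(nn.Module):
    """Predict block: two FA blocks with a concatenation-style residual
    connection, then a final linear layer producing the logarithmic load
    value, eq. (14)."""
    def __init__(self, in_dim, hidden=64, dropout=0.1):
        super().__init__()
        self.fa1 = fa_block(in_dim, hidden, dropout)
        self.fa2 = fa_block(in_dim + hidden, hidden, dropout)
        self.out = nn.Linear(hidden, 1)  # W_p in R^{c x 1}, b_p in R^{1 x 1}

    def forward(self, o_t):
        h1 = self.fa1(o_t)
        h2 = self.fa2(torch.cat([o_t, h1], dim=-1))  # concatenation residual
        return self.out(h2)                          # predicted log load value
```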
Finally, regarding the choice of the model loss function: the task in this embodiment is a regression task whose objective is to fit a load curve, so the Mean Squared Error (MSE) loss commonly used in regression tasks is adopted, defined in equation (15). Error back propagation and updating of the network weights are carried out by minimizing the loss with Stochastic Gradient Descent (SGD) plus Momentum; the initial learning rate of model training is set to 0.12 and the learning-rate adjustment strategy is ReduceLROnPlateau.
L_MSE = (1/n)·Σ_t (ŷ_t - y_t)²    (15)
where y_t is the actual charging load value at time t obtained from the historical data and n is the number of training samples.
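A training-loop sketch matching this description (MSE loss, SGD with momentum, initial learning rate 0.12, ReduceLROnPlateau); the momentum value, epoch count and data-loader interface are assumptions:

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=50):
    criterion = nn.MSELoss()                        # eq. (15)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.12, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    for _ in range(epochs):
        model.train()
        for inputs, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(*inputs), y)
            loss.backward()                         # error back propagation
            optimizer.step()                        # SGD + Momentum weight update
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(*inputs), y).item()
                           for inputs, y in val_loader) / len(val_loader)
        scheduler.step(val_loss)                    # reduce LR when validation loss plateaus
```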
2. Model prediction phase
Similarly, the model prediction stage consists of several steps: 1. data preparation; 2. model prediction; 3. data restoration.
1. Data preparation
Data acquisition and storage were already configured in the model training stage, so it is only necessary to ensure that the data acquisition and storage program runs normally and these operations do not need to be repeated in the prediction stage. Features are processed in the same way as in the model training stage, and the weather-related features are future weather prediction results obtained through a weather forecast interface.
2. Model prediction
In the model prediction process, two cases are distinguished: 1) predicting the station's load values for the next day; 2) predicting the station's load values for the current day.
In the first case, we predict at one time the charging load values of a station for each of the 1440 minutes of the day; in this case the current day is not yet over, so the historical data fed to the model's historical data encoding module are shifted forward by one day. In the second case, the prediction time range may be selected, and the load values of time periods that have already occurred do not need to be predicted by the model; that is, the prediction result is corrected with the real data before the current time period, which effectively reduces the inference time of the model. If we need to predict the charging load curve of a station for the current day, each station should correspond to 1440 prediction samples; if the current time is the t-th minute of the day, the time periods [0, t-1] have already occurred and the station's charging load values in those periods should be replaced with the real values, so at inference the model actually predicts only the load values in the periods [t, 1439], and accordingly only the feature vectors of the periods [t, 1439] are input to the model, as sketched below.
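A sketch of the second (current-day) case, as referenced above (the prediction function and data layout are illustrative assumptions):

```python
import numpy as np

def predict_today(predict_fn, real_loads, minute_features, t_now):
    """Assemble the 1440-minute load curve of the current day: minutes
    [0, t_now-1] use the real measured loads, minutes [t_now, 1439] are
    inferred by the model."""
    curve = np.empty(1440)
    curve[:t_now] = real_loads[:t_now]            # already-occurred periods: real values
    for minute in range(t_now, 1440):             # only [t, 1439] is predicted
        curve[minute] = predict_fn(minute_features[minute])
    return curve
```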
3. Data restoration
In order to make the distribution of the model labels more stable, reduce the computational load of the model and avoid introducing new calculation errors due to the difference between the training-set and test-set data distributions, the station charging load data were logarithmically transformed before being input to the model. The real load values need not be restored in the training stage, but they must be restored in the prediction stage; the data restoration is shown in equation (16), where y' represents the output value of the model and y represents the restored charging load:
y = e^{y'} - 1    (16)
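A one-line restoration sketch for equation (16), inverting the log transform of equation (4) (NumPy assumed):

```python
import numpy as np

def restore_load(y_pred_log: np.ndarray) -> np.ndarray:
    """Invert the logarithmic transform: y = e^{y'} - 1 (eq. (16))."""
    return np.expm1(y_pred_log)
```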
Based on the above charging load prediction method, and in order to verify the prediction accuracy and the improvement over the prior art, R² (R-squared) is selected as the evaluation index in this embodiment. R², also called the goodness of fit, is commonly used to measure how well predicted results fit the real situation; it is generally defined as in equations (17)-(20), where SST (Total Sum of Squares) measures the sample variance, SSE (Sum of Squares of Error) is the residual sum of squares and SSR (Sum of Squares of the Regression) is the regression sum of squares.
SST = Σ_i (y_i - ȳ)²    (17)
SSE = Σ_i (y_i - ŷ_i)²    (18)
SSR = Σ_i (ŷ_i - ȳ)²    (19)
R² = 1 - SSE/SST    (20)
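A sketch of the R² evaluation of equations (17)-(20) (NumPy assumed):

```python
import numpy as np

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination R^2 = 1 - SSE/SST."""
    sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares (SST)
    sse = np.sum((y_true - y_pred) ** 2)         # residual sum of squares (SSE)
    return 1.0 - sse / sst
```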
As shown in Table 4, the experimental results show that the prediction model provided in this embodiment achieves a good effect on predicting the charging load of electric vehicle charging stations: the average R² across the stations in the test set is 0.72, with a maximum of 0.95, so the model can be used to provide data support for the construction planning and daily operation of charging stations. The reference models obtained by removing or replacing modules of the proposed model are numbered 3-6 in Table 4: MLP removes both the residual connections and the historical data encoding module, MLP_RES removes the historical data encoding module, MLP_ATT removes the residual connections, and MLP_RES_GRU replaces the attention encoding layer in the historical data encoding module with a GRU. From the experimental results in Table 4, LightGBM, XGBoost and MLP_RES have no historical data processing module and fit the data poorly, and the models with modules removed or replaced perform slightly worse than the proposed method, which also shows that each module of the proposed model is effective.
TABLE 4 R² of the control and experimental models on the test set
(table reproduced as an image in the original publication)
Using the above charging load prediction method, prediction verification was carried out on four randomly selected stations; comparisons of the real charging loads of the four stations with the prediction results are shown in Fig. 5 to Fig. 8.

Claims (10)

1. A charging station charging load prediction method based on an attention mechanism is characterized by comprising a prediction model, wherein the prediction model comprises a feature embedding module, a historical data coding module and a fusion prediction module;
the prediction model training steps are as follows:
acquiring characteristic data of a charging station time period t and a charging load value y corresponding to the time period t t All feature data constituting a feature vector r t (ii) a Extracting a feature vector r t The part related to the middle time and the weather characteristics is recorded as g t
Acquiring historical data of a charging station in a period t of the past s days, including
Figure FDA0003866334360000011
And
Figure FDA0003866334360000012
respectively representing the date, time and weather characteristics of a time period t in the past s days and the load value in the corresponding time period;
the input vector of the historical data encoding module is u t And g t
The historical data encoding module is provided with a network basic module FA which is sequentially provided with a linear layer, a BatchNorm layer, a Tanh layer and a Dropout layer from input to output;
inputting the vector X into a network basic module FA to obtain FA (X);
the historical data encoding module processes the vector u_t with residual connections:
K_t = FA(u_t + FA(u_t)),
and g_t is processed in the same way to obtain: Q_t = FA(g_t + FA(g_t));
K_t and Q_t are multiplied and then normalized to obtain the influence factor α_t of the feature data corresponding to time period t in the historical data on the prediction y_t:
α_t = softmax(K_t × Q_t)
p_t is weighted by α_t to obtain the output vector V_t of the historical data encoding module:
V_t = α_t^T · p_t;
the input vector of the fusion prediction module is O_t, O_t = [r_t, V_t];
the output of the fusion prediction module is the predicted charging load value ŷ_t, obtained by passing O_t through the module's FA blocks and a final linear layer with weight matrix W_p and offset b_p, which are weights updated through learning and training of the prediction model;
a mean square error loss function L_MSE is set:
L_MSE = (1/n)·Σ_t (ŷ_t - y_t)²;
performing error back propagation and updating the prediction model by minimizing the loss with stochastic gradient descent and momentum until the prediction model converges, obtaining a converged prediction model;
the prediction steps of the prediction model are as follows:
denoting the time period to be predicted as t', acquiring the feature data of time period t', wherein the weather-related feature data are obtained through the weather forecast, and substituting these into the converged prediction model to calculate ŷ_{t'}, i.e. the predicted charging load value for time period t'.
2. The attention-mechanism-based charging station charging load prediction method as claimed in claim 1, characterized in that the feature data comprise x_t = [x_{t0}, x_{t1}, ..., x_{t,d-1}] and h_t = [h_{t0}, h_{t1}, ..., h_{t,k-1}], where x_t is the continuous feature vector composed of the d continuous feature data corresponding to time period t and h_t is the categorical feature vector composed of the k categorical feature data corresponding to time period t; the categorical feature vector h_t is input to the feature embedding module and converted into a continuous feature vector h'_t, and the continuous feature vector x_t and the continuous feature vector h'_t are combined to obtain the feature vector r_t = [h'_t | x_t].
3. The method of claim 2, wherein the feature embedding module processes the input class-type feature vector as follows: one-hot coding is carried out on each feature, and then a learnable weight matrix is multiplied to convert the feature into a continuous feature vector with dimension r, wherein r is a hyper-parameter of the model and is determined through priori knowledge and manual debugging.
4. The method of claim 1, wherein the input-output relation of the linear layer is:
f(X) = XW^T + b,
where X represents the input vector of the layer, and W and b respectively represent a weight matrix and an offset, which are weights updated through learning and training of the prediction model;
the input-output relation of the BatchNorm layer is:
BN(X) = γ·(X - μ)/√(σ² + ε) + β,
where μ represents the expectation of the input vector X, σ is the standard deviation, ε is a positive number close to 0, and γ and β respectively represent affine transformation parameter vectors, which are weights updated through learning and training of the prediction model;
the input-output relation of the Tanh layer is:
Tanh(x) = (e^x - e^{-x})/(e^x + e^{-x}).
5. The method as claimed in claim 1, wherein the feature data input into the prediction model are normalized by the following formula:
x = (x' - μ)/S,
where x' represents the raw data, μ represents the mean of the raw data, S represents the standard deviation of the raw data, and x is the normalized data.
6. The method of claim 1, wherein the charging load value is a logarithmized charging load value, and the logarithmized charging load value is represented by the following formula:
y=log(y′+1)
where y' represents the original data and y represents the data after the logarithmic process.
7. The attention-mechanism-based charging station charging load prediction method of claim 1, wherein the softmax function is:
softmax(x_i) = e^{x_i} / Σ_j e^{x_j}.
8. the method as claimed in claim 1, wherein the characteristic data at least includes charging station information, location information of charging station, weather information, and time information.
9. The method of claim 8, wherein the weather information comprises a current temperature and a current humidity.
10. The method of claim 8, wherein the time information comprises the hour and minute of the day at the current time.
CN202211188648.7A 2022-09-27 2022-09-27 Attention mechanism-based charging load prediction method for charging station Pending CN115526398A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211188648.7A CN115526398A (en) 2022-09-27 2022-09-27 Attention mechanism-based charging load prediction method for charging station

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211188648.7A CN115526398A (en) 2022-09-27 2022-09-27 Attention mechanism-based charging load prediction method for charging station

Publications (1)

Publication Number Publication Date
CN115526398A true CN115526398A (en) 2022-12-27

Family

ID=84699633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211188648.7A Pending CN115526398A (en) 2022-09-27 2022-09-27 Attention mechanism-based charging load prediction method for charging station

Country Status (1)

Country Link
CN (1) CN115526398A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117310277A (en) * 2023-09-25 2023-12-29 国网四川省电力公司营销服务中心 Electric energy metering method, system, equipment and medium for off-board charger of electric automobile
CN117310277B (en) * 2023-09-25 2024-06-04 国网四川省电力公司营销服务中心 Electric energy metering method, system, equipment and medium for off-board charger of electric automobile


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination