CN110406526A

CN110406526A - Energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming

Info

Publication number: CN110406526A
Application number: CN201910717298.0A
Authority: CN
Inventors: 张冰战; 倪尧尧; 吴俊成; 邱明明
Original assignee: Hefei Polytechnic University
Current assignee: Hefei University of Technology; Hefei Polytechnic University
Priority date: 2019-08-05
Filing date: 2019-08-05
Publication date: 2019-11-05

Abstract

The invention discloses an energy management method for parallel hybrid electric vehicles (PHEVs) based on adaptive dynamic programming (ADP). It is intended to optimize energy management in the hybrid drive mode, one of the several operating modes of a PHEV, and comprises: obtaining the total demand torque of the vehicle from the driver's command and the vehicle speed, and obtaining the battery SOC and engine torque at the current moment from the current state of the vehicle; combining the acquired vehicle information to establish an ADP-based energy management model of the parallel hybrid electric vehicle; and solving that model. In ADP, the execution network approximates the optimal control strategy and the evaluation network approximates the optimal performance index function. Together the two networks act as an agent that learns intelligently, responds online to dynamic changes of the system, and automatically adjusts the parameters of the network structure. Solving the established energy management model yields the optimal torque allocated to the motor. The method can distribute the demand torque between the engine and the motor reasonably and optimally in real time.

Description

Energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming

Technical field

The invention belongs to the technical field of energy management for parallel hybrid electric vehicles, and relates to an energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming.

Background art

The energy management strategy is the key to distributing the vehicle's energy demand between the engine and the motor; the fuel economy and power performance of a PHEV are closely tied to the strategy used. The strategy is the link that lets the fuel power system and the electric drive system work well together. To maximize energy efficiency and reduce exhaust emissions, an efficient energy management strategy is needed to distribute the demanded power between the engine and the motor. The core of energy management is to control the energy conversion between the desired targets and the vehicle performance actually obtained. The desired performance indices mainly include lower fuel consumption, fewer harmful emissions, greater comfort, and longer battery life. The purpose of the control strategy is to improve fuel economy, save energy and reduce emissions, and ensure that the system has good performance indices. The control strategy determines when, under what load, and how the internal combustion engine and the motor are used.

The drive system of a PHEV connects the engine and the motor in parallel, so both the engine and the drive motor can deliver mechanical power directly to the driving wheels. The motor can act both as a motor and as a generator. The two parallel drive paths of a PHEV add driving power and thus improve the power performance of the hybrid vehicle. In a parallel hybrid electric vehicle, apart from friction losses, power transmitted from the engine to the wheels undergoes no mechanical-electrical-mechanical energy conversion, so the energy conversion efficiency is high.

Summary of the invention

(1) Technical problem to be solved

The object of the invention is to provide an energy management method for parallel hybrid electric vehicles (Parallel Hybrid Electric Vehicle, PHEV) based on adaptive dynamic programming (Adaptive Dynamic Programming, ADP). The method applies a novel intelligent algorithm, adaptive dynamic programming, to the energy management control of a parallel hybrid electric vehicle, and can improve the fuel economy of the PHEV while keeping the battery SOC changing smoothly and working in its high-efficiency region.

(2) Technical solution

An energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming, characterized by comprising the following steps:

Step 1: obtain the total demand torque T of the vehicle from the driver's command, such as the travel of the accelerator pedal or brake pedal, and the vehicle speed; obtain the battery state of charge SOC and the engine torque T_e at the current moment from the current state of the vehicle;

Step 2: based on the adaptive dynamic programming method, establish an energy management model of the parallel hybrid electric vehicle with the goal of improving the fuel economy of the PHEV while keeping the battery SOC changing smoothly and working in its high-efficiency region;

Step 3: solve the energy management model through intelligent online learning of the execution network and the evaluation network in the adaptive dynamic programming method to obtain the optimal demand torque allocated to the motor, and then obtain the optimal demand torque allocated to the engine from the total demand torque. The detailed process is as follows:

Step 3.1: initialize the weights of the evaluation network and the execution network;

Step 3.2: input the battery pack SOC, the vehicle demand torque T, and the current engine torque T_e acquired at each sampling instant into the execution network, whose output is the motor demand torque T_{m_req};

Step 3.3: take the battery pack SOC, the vehicle demand torque T, and the current engine torque T_e acquired at each sampling instant, together with the motor demand torque T_{m_req} output by the execution network, as the input of the evaluation network to obtain the approximation Ĵ of the cost function J;

Step 3.4: update the weights of the evaluation network according to its weight update rule, so that the error E_c between its output Ĵ and the cost function J keeps approaching 0;

Step 3.5: update the weights of the execution network according to its weight update rule;

Step 3.6: update and record the weights of the execution network and the evaluation network;

Steps 3.1 to 3.6 are repeated until Ĵ is approximately equal to the cost function J, which completes the optimal control and outputs T_{m_req}.

In the above energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming, in step 3 the cost function J is the objective function used to find the optimal solution and is the key to running the adaptive dynamic programming method. It is defined as:

J(k) = Σ_{i=k}^{∞} γ^{i−k} U(i)

where γ is the discount factor, 0 < γ ≤ 1, and U is the utility function. The purpose of adaptive dynamic programming is to select a control sequence u(i), i = k, k+1, ..., that minimizes the cost function J so defined;

The quadratic utility function U, defined in combination with the SOC, is:

U(k) = x(k) A x(k)^T + ε(SOC − τ)²;

where A is an identity matrix of the dimension required by the matrix operation in this formula; x is the state variable input to the system; ε is a discount factor; τ is the lower limit of SOC fluctuation chosen for the vehicle.
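
As an illustration only, the utility and cost terms can be evaluated numerically as in the following Python sketch. It assumes A is the identity matrix, so x A x^T reduces to the sum of squared state components, and it assumes the usual discounted-sum form of the cost function, truncated to a finite horizon. The function names and argument names are hypothetical and not taken from the patent.

```python
def utility(x, soc, eps, tau):
    """U(k) = x A x^T + eps * (SOC - tau)^2, with A assumed to be the identity."""
    quadratic = sum(xi * xi for xi in x)  # x I x^T
    return quadratic + eps * (soc - tau) ** 2


def discounted_cost(utilities, gamma):
    """Finite-horizon truncation of J(k) = sum over i >= k of gamma^(i-k) * U(i).

    utilities[0] is U(k), utilities[1] is U(k+1), and so on.
    """
    return sum(gamma ** i * u for i, u in enumerate(utilities))
```

A larger ε penalizes SOC excursions away from τ more heavily relative to the quadratic state term, which is one way to read the stated goal of keeping the SOC changing smoothly.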

In the above energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming, the optimization objective function of the evaluation network is:

E_c(k) = ½ e_c²(k)

where e_c(k) denotes the prediction error of the evaluation network;

The output Ĵ of the evaluation network is obtained by minimizing the following error over time:

‖E_c‖ = Σ_k E_c(k) = ½ Σ_k [Ĵ(k) − U(k) − γĴ(k+1)]²

When E_c(k) = 0 for all k, the above formula implies

Ĵ(k) = U(k) + γĴ(k+1) = U(k) + γ[U(k+1) + γĴ(k+2)] = ... = Σ_{i=k}^{∞} γ^{i−k} U(i)

Clearly Ĵ(k) = J(k). Therefore, minimizing the error function defined above yields a trained neural network whose output Ĵ is an estimate of the defined cost function J.
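
The argument that a zero prediction error makes the network output equal to the cost can be checked numerically. The Python sketch below assumes the residual has the form e_c(k) = Ĵ(k) − U(k) − γĴ(k+1) and a finite horizon with Ĵ = 0 after the last step; both are assumptions made for illustration, and the function names are hypothetical.

```python
def cost_by_recursion(utilities, gamma):
    """Apply J(k) = U(k) + gamma * J(k+1) backward over a finite horizon."""
    costs = [0.0] * len(utilities)
    j_next = 0.0  # assumed terminal value after the last step
    for k in range(len(utilities) - 1, -1, -1):
        j_next = utilities[k] + gamma * j_next
        costs[k] = j_next
    return costs


def prediction_error(j_hat_k, u_k, j_hat_next, gamma):
    """Residual e_c(k) = J_hat(k) - U(k) - gamma * J_hat(k+1)."""
    return j_hat_k - u_k - gamma * j_hat_next


costs = cost_by_recursion([1.0, 2.0, 3.0], gamma=0.9)
# The residual vanishes at every step when J_hat equals the recursive cost.
residuals = [prediction_error(costs[k], u, costs[k + 1] if k + 1 < 3 else 0.0, 0.9)
             for k, u in enumerate([1.0, 2.0, 3.0])]
```

Values that satisfy the recursion at every step coincide with the discounted sum of utilities, which is why driving the residual to zero trains the evaluation network toward J.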

In the above energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming, the weights of the evaluation network are updated by gradient descent:

W_c(k+1) = W_c(k) + ΔW_c(k),

where

ΔW_c(k) = l_c(k) [−∂E_c(k)/∂W_c(k)]

Here k denotes the sampling period, W_c the weights of the evaluation network, E_c(k) the optimization objective function of the evaluation network, and l_c(k) the learning rate of the evaluation network.

In the above energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming, the weight update rule of the execution network uses the control signal u(k) with the goal of minimizing Ĵ. The specific steps are:

The battery pack SOC, the vehicle demand torque T, and the current engine torque T_e acquired at each sampling instant are input into the execution network, whose output is the motor demand torque T_{m_req}. The weight update model of the execution network is:

W_a(k+1) = W_a(k) + ΔW_a(k),

where

ΔW_a(k) = l_a(k) [−(∂Ĵ(k)/∂u(k)) · (∂u(k)/∂W_a(k))]

Here W_a denotes the weights of the execution network, u(k) the control variable, and l_a(k) the learning rate of the execution network.

In the above energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming, the execution network and the evaluation network both use a BP neural network, that is, a three-layer feedforward network, and network training consists of a forward calculation and a backward error propagation process.

(3) Beneficial effects

The energy management control method for parallel hybrid electric vehicles based on adaptive dynamic programming provided by the invention overcomes the shortcomings of rule-based control strategies such as electric assist and logic threshold, and of optimization-based control strategies such as conventional dynamic programming (DP). Using the adaptive dynamic programming method, it takes the total vehicle demand torque T, the battery state of charge SOC, and the current engine torque T_e as inputs, and establishes an energy management model of the parallel hybrid electric vehicle with the goal of improving the fuel economy of the PHEV while keeping the battery SOC changing smoothly and working in its high-efficiency region. The combination of the execution network and the evaluation network in adaptive dynamic programming acts as an agent that learns intelligently, responds online to dynamic changes of the system, and automatically adjusts the parameters of the network structure. Solving the energy management model established by the invention yields the optimal torque allocated to the motor and ensures that the PHEV runs efficiently under different driving cycles.

Brief description of the drawings

Fig. 1 is a schematic diagram of the adaptive dynamic programming of the invention.

Fig. 2 is a structural diagram of the energy management control strategy for parallel hybrid electric vehicles based on adaptive dynamic programming of the invention.

Fig. 3 is a structural diagram of the evaluation network in the adaptive dynamic programming of the invention.

Fig. 4 is a structural diagram of the execution network in the adaptive dynamic programming of the invention.

Fig. 5 is a flow chart of the adaptive dynamic programming algorithm (ADHDP) of the invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings.

The idea of adaptive dynamic programming (ADP) is to use function approximation structures to approximate the performance index function and the control strategy in the dynamic programming equation so that the principle of optimality is satisfied, thereby obtaining the optimal control and the optimal performance index function. The core of the algorithm is the choice of the control network, the evaluation network, and the model network; the cost function is minimized by iterating the network weights. ADP represents an important direction of development in artificial intelligence and control. It combines dynamic programming and reinforcement learning and has the advantages of both: the optimization capability of dynamic programming and the self-learning capability of reinforcement learning. As shown in Fig. 1, ADP mainly consists of three parts: the dynamic system, the execution (Action) function, and the evaluation (Critic) function. Each part can be replaced by a neural network. The invention uses a BP neural network, a three-layer feedforward neural network that learns the characteristics of the object under study by training on samples and thereby acquires the ability to associate and predict. The dynamic system can be modeled by a neural network, the execution network approximates the optimal control strategy, and the evaluation network approximates the optimal performance index function. The execution and evaluation networks together act as an agent (Agent): after the control/execution (Action) acts on the dynamic system (or controlled object), the reward/penalty (Reward/Penalty) produced at different stages influences the evaluation function. Function approximation structures or neural networks are then used to approximate the execution function and the evaluation function. The execution function is updated on the basis of the estimated evaluation function, that is, so that the evaluation function is minimized. The parameters of the evaluation function are updated according to Bellman's principle of optimality, which both reduces the forward computation time and allows an online response to dynamic changes of an unknown system, so that certain parameters of the network structure are adjusted automatically.

With reference to Fig. 2 to Fig. 5, the energy management control method for parallel hybrid electric vehicles based on adaptive dynamic programming provided by the invention comprises the following steps:

Step 1: obtain the total demand torque T of the vehicle from the driver's command (the travel of the accelerator pedal or brake pedal) and the vehicle speed; obtain the battery state of charge SOC and the engine torque T_e at the current moment from the current state of the vehicle;

Step 2: based on the adaptive dynamic programming method, establish an energy management model of the parallel hybrid electric vehicle with the goal of improving the fuel economy of the PHEV while keeping the battery SOC changing smoothly and working in its high-efficiency region.

In this embodiment, the adaptive dynamic programming method combines the theory of BP neural networks, dynamic programming, and reinforcement learning. It derives from the forward-in-time dynamic programming method: it continually generates a general solution of forward dynamic programming through iterative relations among the optimal policy, the cost, or their derivatives. Its goal is to overcome the "curse of dimensionality" and to guarantee convergence over time to an approximately optimal solution. ADP has the following advantages:

(1) Adaptive dynamic programming does not always rely on an accurate mathematical model of the controlled object, and the controller can "learn" the control online.

(2) Adaptive dynamic programming avoids the "curse of dimensionality" problem of dynamic programming (DP) through successive approximation.

(3) Adaptive dynamic programming does not require a precisely defined system performance index.

(4) Adaptive dynamic programming opens a new way to solve nonlinear system control problems.

The invention uses one category of adaptive dynamic programming, action-dependent heuristic dynamic programming (Action Dependent Heuristic Dynamic Programming, ADHDP), which is the action-dependent form of heuristic dynamic programming (HDP). ADHDP needs no model network and contains only an execution network and an evaluation network. A suitable neural network structure can be chosen for the two networks according to the requirements of the actual system; the invention uses BP (Back Propagation) neural networks. The input of the execution network is the state variable x of the system and its output is the current control variable u(k). It generates an optimal or suboptimal control sequence u(i), i = k, k+1, ..., that minimizes the defined performance index function J (that is, the cost). The cost function is the objective function used to find the optimal solution; it is the key to running the adaptive dynamic programming method and, in a sense, determines the quality of the result. In ADHDP, the state variables and the control variable are both inputs of the evaluation network, whose output Ĵ is an approximation of the cost function.

Step 3: solve the energy management model through intelligent online learning of the execution network and the evaluation network in the adaptive dynamic programming method to obtain the optimal demand torque allocated to the motor, and then obtain the optimal demand torque allocated to the engine from the total demand torque. The detailed process is as follows:

(1) Initialize the weights of the evaluation network and the execution network;

(2) Input the battery pack SOC, the vehicle demand torque T, and the current engine torque T_e acquired at each sampling instant into the execution network, whose output is the motor demand torque T_{m_req};

(3) Take the battery pack SOC, the vehicle demand torque T, and the current engine torque T_e acquired at each sampling instant, together with the motor demand torque T_{m_req} output by the execution network, as the input of the evaluation network to obtain the approximation Ĵ of the cost function;

(4) Update the weights of the evaluation network according to its weight update rule, so that its output Ĵ is approximately equal to the cost function J;

(5) Update the weights of the execution network according to its weight update rule;

(6) Update and record the weights of the execution network and the evaluation network;

Steps (2) to (6) are repeated until Ĵ is approximately equal to the cost function J, which completes the optimal control and outputs T_{m_req}.
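
Once T_{m_req} is available, the last part of step 3 derives the engine share from the total demand torque. The Python sketch below assumes the engine simply receives the remainder T − T_{m_req}; the motor torque limits used for clamping are a hypothetical safeguard that the patent does not describe, and the function name is likewise illustrative.

```python
def split_torque(t_total, t_m_req, t_m_min, t_m_max):
    """Return (motor torque, engine torque) for one sampling instant.

    t_total: total demand torque T
    t_m_req: motor demand torque output by the execution network
    t_m_min, t_m_max: assumed motor torque limits (hypothetical)
    """
    t_motor = max(t_m_min, min(t_m_max, t_m_req))  # keep the request feasible
    t_engine = t_total - t_motor                   # remainder goes to the engine
    return t_motor, t_engine
```

A negative motor torque in this sketch corresponds to the motor acting as a generator, in which case the engine torque exceeds the total demand.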

In step 3, the intelligent online learning of the evaluation network and the execution network proceeds as follows:

1) Online training of the evaluation network

The evaluation network is a three-layer BP neural network with 4 input neurons, 25 hidden-layer neurons, and 1 output neuron. Its hidden layer uses the bipolar sigmoid function and its output layer uses the linear function purelin. Training the evaluation network has two parts: a forward calculation, and an error back-propagation process that updates the weight matrices of the evaluation network.

The input vector of the evaluation network at stage k is defined as inputC(k); for ease of description, specific values are not substituted here.

inputC(k) = [u₁(k), …, u_m(k), x₁(k), …, x_n(k)]

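The forward calculation of the evaluation network is

c_h1j(k) = Σ_i inputC_i(k) · W_c1ij(k)

c_h2j(k) = (1 − e^{−c_h1j(k)}) / (1 + e^{−c_h1j(k)})

Ĵ(k) = Σ_j c_h2j(k) · W_c2j(k)

where c_h1j(k) is the input of the j-th hidden-layer node of the evaluation network and c_h2j(k) is the output of the j-th hidden-layer node of the evaluation network.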
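
The forward calculation can be sketched in plain Python as follows. The sketch assumes a network without bias terms, a bipolar sigmoid written as tanh(z/2), and a linear output layer; the weight values and the normalized input values are made up for illustration, and the function names are hypothetical.

```python
import math


def bipolar_sigmoid(z):
    """Bipolar sigmoid (1 - e^-z) / (1 + e^-z), which equals tanh(z / 2)."""
    return math.tanh(z / 2.0)


def critic_forward(input_c, w_c1, w_c2):
    """Forward pass of a three-layer evaluation network.

    input_c: inputs [u, x1, ..., xn]
    w_c1:    input-to-hidden weights, indexed w_c1[i][j]
    w_c2:    hidden-to-output weights, indexed w_c2[j]
    Returns hidden inputs c_h1, hidden outputs c_h2, and the output J_hat.
    """
    n_hidden = len(w_c2)
    c_h1 = [sum(input_c[i] * w_c1[i][j] for i in range(len(input_c)))
            for j in range(n_hidden)]
    c_h2 = [bipolar_sigmoid(z) for z in c_h1]
    j_hat = sum(c_h2[j] * w_c2[j] for j in range(n_hidden))  # linear output
    return c_h1, c_h2, j_hat


# Sizes from the text: 4 inputs, 25 hidden neurons, 1 output (values are made up).
w_c1 = [[0.01 * (i + j) for j in range(25)] for i in range(4)]
w_c2 = [0.1] * 25
c_h1, c_h2, j_hat = critic_forward([0.2, 0.6, 0.5, 0.3], w_c1, w_c2)
```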

The evaluation network is trained by gradient descent, minimizing the error defined by:

E_c(k) = ½ e_c²(k)

The weight update of the evaluation network is derived as follows.

① W_c2 (weight matrix from hidden layer to output layer).

ΔW_c2(k) = l_c(k) [−∂E_c(k)/∂W_c2(k)]

W_c2(k+1) = W_c2(k) + ΔW_c2(k)

② W_c1 (weight matrix from input layer to hidden layer).

ΔW_c1(k) = l_c(k) [−∂E_c(k)/∂W_c1(k)]

W_c1(k+1) = W_c1(k) + ΔW_c1(k)
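
A single gradient-descent step on both weight matrices might look like the following Python sketch. It assumes the error e_c = Ĵ − target, where the target stands in for U(k) + γĴ(k+1) and is treated as a constant during the step, and it assumes a bias-free network with bipolar sigmoid hidden units. All names and numbers are illustrative rather than taken from the patent.

```python
import math


def critic_output(inp, w_c1, w_c2):
    """Return hidden outputs and J_hat for a bias-free three-layer network."""
    c_h2 = [math.tanh(sum(inp[i] * w_c1[i][j] for i in range(len(inp))) / 2.0)
            for j in range(len(w_c2))]
    return c_h2, sum(h * w for h, w in zip(c_h2, w_c2))


def critic_update(inp, target, w_c1, w_c2, lr):
    """One gradient-descent step on E_c = 0.5 * e_c^2 with e_c = J_hat - target."""
    c_h2, j_hat = critic_output(inp, w_c1, w_c2)
    e_c = j_hat - target
    # dE_c/dW_c2[j] = e_c * c_h2[j]
    grad_w2 = [e_c * h for h in c_h2]
    # dE_c/dW_c1[i][j] = e_c * W_c2[j] * 0.5 * (1 - c_h2[j]^2) * inp[i]
    grad_w1 = [[e_c * w_c2[j] * 0.5 * (1.0 - c_h2[j] ** 2) * inp[i]
                for j in range(len(w_c2))] for i in range(len(inp))]
    new_w2 = [w - lr * g for w, g in zip(w_c2, grad_w2)]
    new_w1 = [[w_c1[i][j] - lr * grad_w1[i][j] for j in range(len(w_c2))]
              for i in range(len(inp))]
    return new_w1, new_w2, e_c


inp = [0.5, 0.2, -0.1, 0.3]
w_c1 = [[0.1] * 3 for _ in range(4)]
w_c2 = [0.2, -0.1, 0.3]
target = 1.0  # stands in for U(k) + gamma * J_hat(k+1)
new_w1, new_w2, e_before = critic_update(inp, target, w_c1, w_c2, lr=0.1)
e_after = critic_output(inp, new_w1, new_w2)[1] - target
```

The factor 0.5 · (1 − c_h2²) is the derivative of the bipolar sigmoid expressed through its own output, which avoids recomputing the exponential during back-propagation.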

2) Online training of the execution network

The execution network is likewise a three-layer BP neural network with 20 hidden-layer neurons; the hidden neurons use the sigmoid function and the output layer uses the linear function purelin. The input of the execution network is the three state variables: battery SOC, vehicle demand torque T, and current engine torque T_e. The execution network plays an important role in the ADP algorithm: the control law at each step is determined by the computation of the execution network.

Training of the execution network again consists of a forward calculation and a backward error propagation process. Again for ease of description, specific values are not substituted. The forward calculation is

a_h1j(k) = Σ_i x_i(k) · W_a1ij(k)

a_h2j(k) = sigmoid(a_h1j(k))

u(k) = Σ_j a_h2j(k) · W_a2j(k)

where a_h1j(k) is the input of the j-th hidden-layer node of the execution network and a_h2j(k) is the output of the j-th hidden-layer node of the execution network.

The execution network is trained with the goal of minimizing Ĵ, again by gradient descent.

The weight update of the execution network is derived as follows.

① W_a2 (weight matrix from hidden layer to output layer)

ΔW_a2(k) = l_a(k) [−(∂Ĵ(k)/∂u(k)) · (∂u(k)/∂W_a2(k))]

W_a2(k+1) = W_a2(k) + ΔW_a2(k)

② W_a1 (weight matrix from input layer to hidden layer)

ΔW_a1ij(k) = l_a(k) [−(∂Ĵ(k)/∂u(k)) · W_a2j(k) · (∂a_h2j(k)/∂a_h1j(k)) · x_i(k)]

Here W_a2j := W_a2(j,:); W_a2(j,:) is the usual MATLAB matrix notation and denotes the j-th row of the matrix W_a2.

W_a1(k+1) = W_a1(k) + ΔW_a1(k)
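
The chain rule through the evaluation network can be sketched in Python as below. The sketch assumes a single control output, bias-free networks, bipolar sigmoid hidden units in both networks, and a critic input ordered as [u, x₁, ..., x_n]; the patent only says the execution network uses a sigmoid function, so the bipolar form here is an assumption, and all names and weight values are hypothetical.

```python
import math


def bipolar_sigmoid(z):
    return math.tanh(z / 2.0)


def actor_forward(x, w_a1, w_a2):
    """Return hidden outputs a_h2 and the control u of the execution network."""
    a_h2 = [bipolar_sigmoid(sum(x[i] * w_a1[i][j] for i in range(len(x))))
            for j in range(len(w_a2))]
    return a_h2, sum(h * w for h, w in zip(a_h2, w_a2))


def critic_value_and_du(x, u, w_c1, w_c2):
    """Return J_hat and dJ_hat/du; the critic input is [u, x1, ..., xn]."""
    inp = [u] + list(x)
    c_h2 = [bipolar_sigmoid(sum(inp[i] * w_c1[i][j] for i in range(len(inp))))
            for j in range(len(w_c2))]
    j_hat = sum(h * w for h, w in zip(c_h2, w_c2))
    dj_du = sum(w_c2[j] * 0.5 * (1.0 - c_h2[j] ** 2) * w_c1[0][j]
                for j in range(len(w_c2)))
    return j_hat, dj_du


def actor_update(x, w_a1, w_a2, w_c1, w_c2, lr):
    """One gradient-descent step that lowers J_hat through the control u."""
    a_h2, u = actor_forward(x, w_a1, w_a2)
    _, dj_du = critic_value_and_du(x, u, w_c1, w_c2)
    # du/dW_a2[j] = a_h2[j]
    new_w_a2 = [w_a2[j] - lr * dj_du * a_h2[j] for j in range(len(w_a2))]
    # du/dW_a1[i][j] = W_a2[j] * 0.5 * (1 - a_h2[j]^2) * x[i]
    new_w_a1 = [[w_a1[i][j] - lr * dj_du * w_a2[j] * 0.5 * (1.0 - a_h2[j] ** 2) * x[i]
                 for j in range(len(w_a2))] for i in range(len(x))]
    return new_w_a1, new_w_a2


x = [0.5, 0.2, -0.1]
w_a1 = [[0.1, 0.1] for _ in range(3)]
w_a2 = [0.3, -0.2]
w_c1 = [[0.5, -0.4], [0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
w_c2 = [0.6, 0.3]
_, u_old = actor_forward(x, w_a1, w_a2)
j_old, _ = critic_value_and_du(x, u_old, w_c1, w_c2)
w_a1_new, w_a2_new = actor_update(x, w_a1, w_a2, w_c1, w_c2, lr=0.1)
_, u_new = actor_forward(x, w_a1_new, w_a2_new)
j_new, _ = critic_value_and_du(x, u_new, w_c1, w_c2)
```

The evaluation network weights stay fixed during this step; only the execution network moves, in the direction that the evaluation network predicts will reduce the cost.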

3) Selection of the network parameters

In running ADHDP, the relevant parameters mainly include the number of hidden-layer nodes of the evaluation and execution networks, the learning rates, and the discount factor. There is currently no definite method for determining the number of hidden-layer nodes. In a neural network the number of hidden-layer neurons generally reflects the nonlinear mapping capability: in theory, the more hidden-layer nodes, the better the nonlinear mapping capability. However, as the number of neurons grows the learning speed falls, while with too few neurons the approximation capability weakens. The number of hidden-layer neurons therefore has to be chosen by experiment for the actual situation. The invention uses 20 hidden-layer neurons for the execution network and 25 for the evaluation network.

The learning rate is a number greater than 0 and less than 1. A larger learning rate gives faster learning, but too large a rate causes oscillation, while too small a rate makes learning too slow and training too long. The learning rate is therefore usually reduced over time from a larger initial value to a smaller value, which speeds up learning while avoiding oscillation. The discount factor is usually a positive number no greater than 1. There is currently no definite method for determining its value either; it can only be determined from test results. In general, the smaller the discount factor the more easily the test succeeds, and the larger the discount factor the better the control effect.
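
One simple way to realize a learning rate that falls from a larger initial value to a smaller floor is geometric decay, sketched below in Python. The patent does not specify a schedule, so the decay form, the function name, and the numbers are assumptions chosen only for illustration.

```python
def decayed_learning_rate(step, lr_init, lr_min, decay):
    """Geometric decay from lr_init toward a floor lr_min.

    step:  number of updates performed so far (0, 1, 2, ...)
    decay: factor in (0, 1) applied once per step
    """
    return max(lr_min, lr_init * decay ** step)


# Example schedule: starts at 0.3 and never drops below 0.01.
schedule = [decayed_learning_rate(s, 0.3, 0.01, 0.5) for s in range(12)]
```

The same function could serve both l_c(k) and l_a(k), each with its own initial value and floor.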

4) Algorithm flow chart

Fig. 5 shows the flow chart of the ADHDP algorithm used by the invention, in which a "parallel" training scheme trains the execution network and the evaluation network at the same time to train the control strategy of the control system.

Claims

1. An energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming, characterized by comprising the following steps:

Step 1: obtain the total demand torque T of the vehicle from the driver's command, such as the travel of the accelerator pedal or brake pedal, and the vehicle speed; obtain the battery state of charge SOC and the engine torque T_e at the current moment from the current state of the vehicle;

Step 2: based on the adaptive dynamic programming method, establish an energy management model of the parallel hybrid electric vehicle with the goal of improving the fuel economy of the PHEV while keeping the battery SOC changing smoothly and working in its high-efficiency region;

Step 3: solve the energy management model through intelligent online learning of the execution network and the evaluation network in the adaptive dynamic programming method to obtain the optimal demand torque allocated to the motor, and then obtain the optimal demand torque allocated to the engine from the total demand torque; the detailed process is as follows:

Step 3.1: initialize the weights of the evaluation network and the execution network;

Step 3.2: input the battery pack SOC, the vehicle demand torque T, and the current engine torque T_e acquired at each sampling instant into the execution network, whose output is the motor demand torque T_{m_req};

Step 3.3: take the battery pack SOC, the vehicle demand torque T, and the current engine torque T_e acquired at each sampling instant, together with the motor demand torque T_{m_req} output by the execution network, as the input of the evaluation network to obtain the approximation Ĵ of the cost function J;

Step 3.4: update the weights of the evaluation network according to its weight update rule, so that the error E_c between its output Ĵ and the cost function J keeps approaching 0;

Step 3.5: update the weights of the execution network according to its weight update rule;

Step 3.6: update and record the weights of the execution network and the evaluation network;

Steps 3.1 to 3.6 are repeated until Ĵ is approximately equal to the cost function J, which completes the optimal control and outputs T_{m_req}.

2. The energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming according to claim 1, characterized in that: in step 3, the cost function J is the objective function used to find the optimal solution and is the key to running the adaptive dynamic programming method; it is defined as:

J(k) = Σ_{i=k}^{∞} γ^{i−k} U(i)

where γ is the discount factor, 0 < γ ≤ 1, and U is the utility function; the quadratic utility function U, defined in combination with the SOC, is:

U(k) = x(k) A x(k)^T + ε(SOC − τ)²;

where A is an identity matrix of the dimension required by the matrix operation in this formula; x is the state variable input to the system; ε is a discount factor; τ is the lower limit of SOC fluctuation chosen for the vehicle.

3. The energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming according to claim 2, characterized in that the optimization objective function of the evaluation network is:

E_c(k) = ½ e_c²(k)

where e_c(k) denotes the prediction error of the evaluation network;

the output Ĵ of the evaluation network is obtained by minimizing the following error over time:

‖E_c‖ = Σ_k E_c(k) = ½ Σ_k [Ĵ(k) − U(k) − γĴ(k+1)]²

When E_c(k) = 0 for all k, the above formula implies

Ĵ(k) = U(k) + γĴ(k+1) = U(k) + γ[U(k+1) + γĴ(k+2)] = ... = Σ_{i=k}^{∞} γ^{i−k} U(i)

Clearly Ĵ(k) = J(k). Therefore, minimizing the error function defined above yields a trained neural network whose output Ĵ is an estimate of the defined cost function J.

4. The energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming according to claim 3, characterized in that the weights of the evaluation network are updated by gradient descent:

W_c(k+1) = W_c(k) + ΔW_c(k),

where

ΔW_c(k) = l_c(k) [−∂E_c(k)/∂W_c(k)]

Here k denotes the sampling period, W_c the weights of the evaluation network, E_c(k) the optimization objective function of the evaluation network, and l_c(k) the learning rate of the evaluation network.

5. The energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming according to claim 4, characterized in that the weight update rule of the execution network uses the control signal u(k) with the goal of minimizing Ĵ; the specific steps are:

The battery pack SOC, the vehicle demand torque T, and the current engine torque T_e acquired at each sampling instant are input into the execution network, whose output is the motor demand torque T_{m_req}, and the weight update model of the execution network is:

W_a(k+1) = W_a(k) + ΔW_a(k),

where

ΔW_a(k) = l_a(k) [−(∂Ĵ(k)/∂u(k)) · (∂u(k)/∂W_a(k))]

Here W_a denotes the weights of the execution network, u(k) the control variable, and l_a(k) the learning rate of the execution network.

6. The energy management method for parallel hybrid electric vehicles based on adaptive dynamic programming according to claim 5, characterized in that the execution network and the evaluation network both use a BP neural network, that is, a three-layer feedforward network, and network training consists of a forward calculation and a backward error propagation process.