CN115503559A

CN115503559A - Learning type collaborative energy management method for fuel cell automobile considering air conditioning system

Info

Publication number: CN115503559A
Application number: CN202211385462.0A
Authority: CN
Inventors: 唐小林; 邓磊; 甘炯鹏; 朱和龙; 胡晓松; 李佳承
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2022-11-07
Filing date: 2022-11-07
Publication date: 2022-12-23
Anticipated expiration: 2042-11-07
Also published as: CN115503559B

Abstract

The invention relates to a learning-based collaborative energy management method for a fuel cell vehicle that takes the air conditioning system into account, and belongs to the field of new energy vehicles. The method comprises the following steps: S1: acquiring vehicle state parameter information, fuel cell parameter information, power battery parameter information and air conditioning system parameter information of the fuel cell vehicle; S2: establishing a collaborative energy management model of the fuel cell vehicle; S3: establishing a collaborative energy management optimization control strategy for the fuel cell vehicle that considers the air conditioning system, solving the multi-objective optimization problem of hydrogen consumption economy and cabin temperature comfort with the soft actor-critic (SAC) algorithm, and, while optimizing the energy flow, controlling changes in the cooling/heating capacity of the air conditioner to keep the cabin temperature within a comfort interval. The invention effectively resolves the trade-off between hydrogen consumption and cabin temperature comfort, and optimizes both the hydrogen economy and the cabin temperature comfort of the fuel cell vehicle.

Description

Learning type collaborative energy management method for fuel cell automobile considering air conditioning system

Technical Field

The invention belongs to the field of new energy vehicles, and relates to a learning-based collaborative energy management method for a fuel cell vehicle considering an air conditioning system.

Background

Faced with increasingly severe environmental pollution, fossil fuel shortages and related problems, automobile manufacturers are striving to develop new energy vehicles. With the development of fuel cell technology, fuel cell vehicles offer zero emissions, low energy consumption and long driving range, and are regarded as one of the important research directions for the sustainable development of future vehicles. The energy management strategy is the core control technology of the multi-power-source system of a fuel cell vehicle, and its performance directly determines the economy of the whole vehicle. In current research, energy management methods fall into three categories: rule-based, optimization-based and learning-based strategies. Rule-based and optimization-based methods, however, cannot achieve real-time capability and optimality at the same time; conventional deep reinforcement learning algorithms can achieve both real-time performance and optimality of energy flow optimization, but they have shortcomings with respect to training data and hyperparameter settings. The soft actor-critic (SAC) algorithm offers a way to address these problems.

On the other hand, the air conditioning system is an indispensable auxiliary device of a fuel cell vehicle and provides a comfortable environment for the occupants. Its use, however, inevitably increases the energy consumption of the vehicle and thus affects the economy of the whole vehicle. In current research on fuel cell vehicle energy management, the energy consumption of the air conditioning system is generally treated as a fixed value or ignored. In practice, as the driving environment changes, the heat exchanged between the cabin and its surroundings changes, and so does the power drawn by the air conditioning system.

Therefore, a new energy management method for a fuel cell vehicle is needed to coordinate and control the air conditioning system and the power source components, and to optimize the energy flow in the vehicle while considering the energy consumption variation of the air conditioning system.

Disclosure of Invention

In view of the above, the present invention provides a learning-based collaborative energy management method for a fuel cell vehicle that considers the air conditioning system. By applying the soft actor-critic (SAC) algorithm, the method coordinates control of the air conditioning system and the power source components of the fuel cell vehicle, optimizing the energy flow of the whole vehicle while ensuring cabin comfort and thereby reducing the overall energy consumption of the fuel cell vehicle.

To achieve the above purpose, the invention provides the following technical solution:

A learning-based collaborative energy management method for a fuel cell vehicle considering an air conditioning system specifically comprises the following steps:

S1: acquiring vehicle state parameter information, fuel cell parameter information, power battery parameter information and air conditioning system parameter information of a fuel cell vehicle;

S2: establishing a collaborative energy management model of the fuel cell vehicle, comprising a longitudinal vehicle dynamics model, a fuel cell model, a power battery model, a motor model, an air conditioning system model and a cabin thermal load model;

S3: establishing a collaborative energy management optimization control strategy for the fuel cell vehicle that considers the air conditioning system, solving the multi-objective optimization problem of hydrogen consumption economy and cabin temperature comfort with the SAC algorithm, and, while optimizing the energy flow, controlling changes in the cooling/heating capacity of the air conditioner to keep the cabin temperature within a comfort interval; here SAC denotes the soft actor-critic algorithm.

Further, in step S1, the vehicle state parameter information includes: vehicle speed, cabin thermal load parameters, motor operating efficiency and drivetrain characteristic parameters; the fuel cell parameter information includes: the power, efficiency and hydrogen consumption of the fuel cell; the power battery parameter information includes: the state of charge, internal resistance and open-circuit voltage of the power battery; the air conditioning system parameter information includes: the cooling/heating capacity of the air conditioning system and the corresponding power.

Further, in step S2, the longitudinal dynamics model of the whole vehicle is established as:

$$P_{drive} = (F_{air} + F_f + F_i + m_0 a) \cdot v$$

$$P_{dem} = P_b + P_{fc} \cdot \eta_{DC/DC}$$

where m_0 is the mass of the whole vehicle; v is the vehicle speed; a is the vehicle acceleration; F_air is the air resistance; F_f is the rolling resistance; F_i is the grade resistance; η_m, η_DC/AC, η_DC/DC and η_motor are the transmission efficiency, the DC/AC converter efficiency, the DC/DC converter efficiency and the motor efficiency, respectively; and P_drive, P_dem, P_b and P_fc are the driving power at the wheels, the demanded power, the battery output power and the fuel cell output power, respectively.
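The power balance above can be illustrated with a short Python sketch. The patent gives only the two balance equations; the resistance formulas used here (aerodynamic drag 0.5·ρ·C_d·A·v², rolling resistance m_0·g·f·cos α, grade resistance m_0·g·sin α), the assumption that P_dem equals P_drive divided by η_m·η_motor·η_DC/AC when driving (and multiplied by it when regenerating), and every numerical value are illustrative assumptions, not the inventors' parameters.

```python
import math

# Hypothetical vehicle parameters (not taken from the patent)
M0 = 1800.0                        # vehicle mass m0 [kg]
G = 9.81                           # gravitational acceleration [m/s^2]
RHO_AIR = 1.206                    # air density [kg/m^3]
CD_A = 0.30 * 2.5                  # drag coefficient x frontal area [m^2]
F_ROLL = 0.010                     # rolling resistance coefficient [-]
ETA_M, ETA_MOTOR, ETA_DCAC, ETA_DCDC = 0.95, 0.92, 0.96, 0.95

def drive_power(v: float, a: float, grade: float = 0.0) -> float:
    """P_drive = (F_air + F_f + F_i + m0*a) * v at the wheels [W]."""
    f_air = 0.5 * RHO_AIR * CD_A * v ** 2       # aerodynamic drag (assumed form)
    f_f = M0 * G * F_ROLL * math.cos(grade)     # rolling resistance (assumed form)
    f_i = M0 * G * math.sin(grade)              # grade resistance (assumed form)
    return (f_air + f_f + f_i + M0 * a) * v

def demand_power(p_drive: float) -> float:
    """Electrical demand P_dem under an assumed wheel-to-bus efficiency chain."""
    eta = ETA_M * ETA_MOTOR * ETA_DCAC
    # Traction draws more than the wheels need; regeneration returns less.
    return p_drive / eta if p_drive >= 0 else p_drive * eta

def battery_power(p_dem: float, p_fc: float) -> float:
    """Rearranged P_dem = P_b + P_fc * eta_DC/DC, solved for the battery share P_b."""
    return p_dem - p_fc * ETA_DCDC

p_drive = drive_power(v=15.0, a=0.5)
p_dem = demand_power(p_drive)
print(round(p_drive), round(p_dem), round(battery_power(p_dem, 10_000.0)))
```

Because battery_power only rearranges the second balance equation, once the agent fixes P_fc the battery covers whatever demand remains.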

Further, in step S2, the fuel cell model is established as follows:

$$\eta_{fc} = f_\eta(P_{fc})$$

where f_η(·) and the corresponding hydrogen consumption function are the fitting functions of fuel cell efficiency and hydrogen consumption, respectively; both the efficiency and the hydrogen consumption can be calculated by interpolation.
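A minimal sketch of the map-based fuel cell model, assuming a hypothetical efficiency table (placeholder values) and linear interpolation with numpy.interp. Instead of a second fitted table, hydrogen consumption is derived here from the interpolated efficiency and the lower heating value of hydrogen (about 120 kJ/g); that substitution is an assumption of this sketch.

```python
import numpy as np

# Hypothetical fuel cell map (placeholder values, not from the patent)
P_FC_GRID = np.array([0.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0])   # net power [kW]
ETA_GRID = np.array([0.0, 0.52, 0.55, 0.53, 0.50, 0.47, 0.44])   # efficiency [-]
LHV_H2 = 120.0  # lower heating value of hydrogen [kJ/g]

def fc_efficiency(p_fc_kw: float) -> float:
    """eta_fc = f_eta(P_fc) by linear interpolation; clamps outside the grid."""
    return float(np.interp(p_fc_kw, P_FC_GRID, ETA_GRID))

def h2_flow(p_fc_kw: float) -> float:
    """Hydrogen mass flow [g/s] = P_fc [kJ/s] / (eta_fc * LHV [kJ/g])."""
    eta = fc_efficiency(p_fc_kw)
    return 0.0 if eta <= 0.0 else p_fc_kw / (eta * LHV_H2)
```

Idle consumption at P_fc = 0 is ignored here; a measured hydrogen map would capture it.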

Further, in step S2, the power battery model established is:

where I_L is the power battery current; V_oc is the open-circuit voltage of the power battery; R_in is the equivalent internal resistance of the power battery; SOC_0 is the initial SOC; Q_t is the maximum capacity of the power battery; t_0 is the initial time; and t_f is the final time.
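The battery equations are not reproduced in the source. The variables listed (I_L, V_oc, R_in, SOC_0, Q_t, t_0, t_f) match the widely used internal-resistance (Rint) model, so the sketch below assumes that form: I_L = (V_oc − sqrt(V_oc² − 4·R_in·P_b)) / (2·R_in) and SOC(t) = SOC_0 − (1/Q_t)·∫ I_L dt from t_0, integrated with forward Euler. Pack parameters are hypothetical, and V_oc is held constant although in practice it varies with SOC.

```python
import math

# Hypothetical pack parameters (assumed, not from the patent)
V_OC = 300.0          # open-circuit voltage [V], held constant here
R_IN = 0.1            # equivalent internal resistance [ohm]
Q_T = 40.0 * 3600.0   # maximum capacity [A*s] (40 Ah)

def battery_current(p_b: float) -> float:
    """Solve P_b = V_oc*I - R_in*I^2 for I_L (physical root; negative when charging)."""
    disc = V_OC ** 2 - 4.0 * R_IN * p_b
    if disc < 0:
        raise ValueError("requested power exceeds what the pack can deliver")
    return (V_OC - math.sqrt(disc)) / (2.0 * R_IN)

def step_soc(soc: float, p_b: float, dt: float = 1.0) -> float:
    """Forward-Euler step of SOC(t) = SOC0 - (1/Q_t) * integral of I_L dt."""
    return soc - battery_current(p_b) * dt / Q_T
```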

Further, in step S2, the established motor model is:

$$\eta_m = f_m(\omega_m, T_m)$$

where ω_m and T_m are the rotational speed and torque of the motor, respectively; P_m is the motor output power; and f_m(·) is the fitting function of motor efficiency, from which the operating efficiency of the motor is obtained by interpolation.
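A sketch of the motor efficiency lookup η_m = f_m(ω_m, T_m) by bilinear interpolation, using numpy only. The efficiency grid is a hypothetical placeholder, negative (regenerative) torque is assumed to share the motoring map via |T_m|, and P_m = T_m·ω_m is the standard mechanical-power relation, used because the source's equation for P_m is not reproduced.

```python
import numpy as np

# Hypothetical motor efficiency map: rows = speed [rad/s], columns = torque [N*m]
OMEGA_GRID = np.array([0.0, 200.0, 400.0, 600.0])
TORQUE_GRID = np.array([0.0, 100.0, 200.0])
ETA_MAP = np.array([
    [0.70, 0.80, 0.78],
    [0.80, 0.92, 0.90],
    [0.82, 0.94, 0.91],
    [0.80, 0.91, 0.88],
])

def motor_efficiency(omega: float, torque: float) -> float:
    """eta_m = f_m(omega_m, T_m), bilinear interpolation with inputs clipped to the grid."""
    w = float(np.clip(omega, OMEGA_GRID[0], OMEGA_GRID[-1]))
    t = float(np.clip(abs(torque), TORQUE_GRID[0], TORQUE_GRID[-1]))
    # Index of the lower grid cell corner, kept inside the last valid cell
    i = min(int(np.searchsorted(OMEGA_GRID, w, side="right")) - 1, len(OMEGA_GRID) - 2)
    j = min(int(np.searchsorted(TORQUE_GRID, t, side="right")) - 1, len(TORQUE_GRID) - 2)
    fw = (w - OMEGA_GRID[i]) / (OMEGA_GRID[i + 1] - OMEGA_GRID[i])
    ft = (t - TORQUE_GRID[j]) / (TORQUE_GRID[j + 1] - TORQUE_GRID[j])
    return float(
        (1 - fw) * (1 - ft) * ETA_MAP[i, j] + fw * (1 - ft) * ETA_MAP[i + 1, j]
        + (1 - fw) * ft * ETA_MAP[i, j + 1] + fw * ft * ETA_MAP[i + 1, j + 1]
    )

def motor_output_power(omega: float, torque: float) -> float:
    """Mechanical output power P_m = T_m * omega_m [W]."""
    return torque * omega
```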

Further, in step S2, the air conditioning system model is established as follows:

where Q_ac is the cooling or heating capacity of the air conditioning system; P_ac is the corresponding power consumption of the air conditioning system; and η_cop is the coefficient of performance of the air conditioning system.

Further, in step S2, the built cabin thermal load model is:

$$Q_c = \sum K F (T_{out} - T_{in})$$

$$Q_h = 145 + 116n$$

$$Q_n = m_e \xi C_{p,air} (T_{out} - T_{in})$$

where Q_c, Q_r, Q_h and Q_n are the heat conduction load, the radiant heat load, the heat generated by the occupants (empirically, about 145 W from the driver and about 116 W from each passenger) and the ventilation heat load, respectively; K is the heat transfer coefficient; F is the heat transfer area of each enclosure surface; T_out is the ambient temperature; T_in is the cabin air temperature; η is the transmittance; I is the solar intensity; A_i is the area of the windshield, the left and right side windows and the rear window; θ_i is the solar incidence angle; β is the shading coefficient; n is the number of passengers in the vehicle; m_e is the mass of air passing through the evaporator; ξ is the air recirculation coefficient; Cp_air is the specific heat capacity of the cabin air; and ρ_air and V_air are the air density in the cabin and the cabin volume, respectively.
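The air conditioning model and the cabin heat balance can be sketched together. Two relations are assumptions because the source's equations are missing: P_ac = |Q_ac| / η_cop, which follows from the definition of the coefficient of performance, and a lumped energy balance ρ_air·V_air·Cp_air·dT_in/dt = Q_c + Q_r + Q_h + Q_n − Q_ac, suggested by the listed symbols ρ_air and V_air, with Q_ac > 0 meaning cooling and Q_ac < 0 heating. The radiant load Q_r is passed in as a number because its formula is not reproduced. All parameter values are placeholders, and lumping only the cabin air (no seats or trim) makes the temperature respond faster than a real cabin would.

```python
from dataclasses import dataclass

@dataclass
class CabinParams:
    """Hypothetical cabin and air conditioning parameters (placeholders)."""
    kf_sum: float = 60.0     # sum of K*F over all enclosure surfaces [W/K]
    m_e: float = 0.08        # air mass flow through the evaporator [kg/s]
    xi: float = 0.3          # air recirculation coefficient [-]
    cp_air: float = 1005.0   # specific heat of cabin air [J/(kg*K)]
    rho_air: float = 1.2     # cabin air density [kg/m^3]
    v_air: float = 3.0       # cabin air volume [m^3]
    cop: float = 2.5         # coefficient of performance eta_cop [-]

def heat_loads(p: CabinParams, t_out: float, t_in: float, q_r: float, n: int) -> float:
    """Q_c + Q_r + Q_h + Q_n [W]; positive values add heat to the cabin."""
    q_c = p.kf_sum * (t_out - t_in)                 # conduction through the body
    q_h = 145.0 + 116.0 * n                         # driver plus n passengers
    q_n = p.m_e * p.xi * p.cp_air * (t_out - t_in)  # ventilation
    return q_c + q_r + q_h + q_n

def ac_power(p: CabinParams, q_ac: float) -> float:
    """Assumed P_ac = |Q_ac| / eta_cop for both cooling and heating [W]."""
    return abs(q_ac) / p.cop

def step_cabin_temp(p: CabinParams, t_out: float, t_in: float, q_r: float,
                    n: int, q_ac: float, dt: float = 1.0) -> float:
    """Forward-Euler step of rho*V*cp*dT_in/dt = loads - Q_ac (Q_ac > 0 cools)."""
    c_cabin = p.rho_air * p.v_air * p.cp_air        # heat capacity of cabin air [J/K]
    return t_in + dt * (heat_loads(p, t_out, t_in, q_r, n) - q_ac) / c_cabin
```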

Further, in step S3, establishing a fuel cell vehicle cooperative energy management optimization control strategy considering an air conditioning system, specifically including the following steps:

S301: determining the state space: to reflect the key environmental information, the SOC of the power battery, the fuel cell output power P_fc, the vehicle speed v and the cooling/heating capacity Q_ac of the air conditioning system are set as state variables, forming the state space S:

$$S = \{SOC, P_{fc}, v, Q_{ac}\}$$

S302: determining the action space: in collaborative energy management that considers the air conditioning system, the power of the power sources is not only split, but the cooling/heating capacity of the air conditioning system is also adjusted to maintain the thermal comfort of the cabin. For this purpose, the change in fuel cell output power ΔP_fc and the change in air conditioning cooling/heating capacity ΔQ_ac are set as action variables, forming the action space A:

$$A = \{\Delta P_{fc}, \Delta Q_{ac}\}$$
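A sketch of how the state vector and the incremental actions could be wired to the vehicle model. The actuator limits, the per-step change limits and the scaling of raw actions from [−1, 1] are hypothetical choices; the incremental form simply bounds how far each set-point can move in one control step.

```python
import numpy as np

# Hypothetical actuator limits (not from the patent)
P_FC_MIN, P_FC_MAX = 0.0, 50.0   # fuel cell output [kW]
Q_AC_MIN, Q_AC_MAX = -4.0, 4.0   # AC capacity [kW]; > 0 cooling, < 0 heating
DP_MAX, DQ_MAX = 2.0, 0.5        # largest change per control step [kW]

def make_state(soc: float, p_fc: float, v: float, q_ac: float) -> np.ndarray:
    """State S = {SOC, P_fc, v, Q_ac} as a flat vector for the networks."""
    return np.array([soc, p_fc, v, q_ac], dtype=np.float64)

def apply_action(p_fc: float, q_ac: float, action) -> tuple:
    """Apply the increments (dP_fc, dQ_ac): raw actions in [-1, 1] are scaled,
    and the new set-points are clipped to the actuator limits."""
    a = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    new_p = float(np.clip(p_fc + a[0] * DP_MAX, P_FC_MIN, P_FC_MAX))
    new_q = float(np.clip(q_ac + a[1] * DQ_MAX, Q_AC_MIN, Q_AC_MAX))
    return new_p, new_q
```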

S303: establishing the reward function: to ensure cabin temperature comfort, the cabin temperature is to be held at about 24 °C, so the reward function also includes a term penalizing cabin temperature deviation. The reward function R is set as the negative weighted sum of three terms, namely hydrogen consumption, SOC deviation and cabin temperature deviation:

$$R = -\left(\zeta \cdot \mathrm{fuel}(t) + \psi \cdot (\mathrm{SOC}(t) - 0.7)^2 + \gamma \cdot (T_{in} - 24)^2\right)$$

where ζ, ψ and γ are the weight factors of the respective terms; adjusting them sets the trade-off between hydrogen consumption and cabin temperature comfort, which is how the multi-objective optimization problem is solved; fuel(t) is the hydrogen consumption at the current time; and SOC(t) is the state of charge of the power battery at the current time.
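The reward can be written directly as a function. The weight values below are placeholders (the text says ζ, ψ and γ are tuned but does not give them), and the temperature weight is named gamma_w so it is not confused with the discount factor γ used later.

```python
def reward(fuel_g: float, soc: float, t_in: float,
           zeta: float = 1.0, psi: float = 350.0, gamma_w: float = 0.5,
           soc_ref: float = 0.7, t_ref: float = 24.0) -> float:
    """R = -(zeta*fuel(t) + psi*(SOC(t) - 0.7)^2 + gamma*(T_in - 24)^2).

    fuel_g is the hydrogen used in the current step; the weights are
    hypothetical and set the hydrogen/SOC/comfort trade-off.
    """
    return -(zeta * fuel_g
             + psi * (soc - soc_ref) ** 2
             + gamma_w * (t_in - t_ref) ** 2)
```

Raising gamma_w makes the agent spend more hydrogen for comfort; raising psi keeps the SOC closer to 0.7.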

Further, in step S3, the multi-objective optimization problem of hydrogen consumption economy and cabin temperature comfort is solved with the SAC algorithm, specifically comprising the following steps:

S311: the multi-objective optimization problem in energy management is solved with the SAC algorithm, which introduces the entropy of the action distribution so that the output actions are more dispersed, improving the exploration ability, the ability to learn new tasks and the stability of the algorithm. The entropy is expressed as:

$$H(\pi(\cdot|s_t)) = \mathbb{E}_{a_t \sim \pi(\cdot|s_t)}\left[-\log \pi(a_t|s_t)\right]$$

where H is the entropy of the policy π(·|s_t).
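Entropy is the expectation of −log π over actions drawn from the policy. For a diagonal Gaussian policy (an assumption of this example, though it is the usual choice with SAC), a Monte Carlo estimate of that expectation can be checked against the closed form Σ ½·log(2πe·σ²):

```python
import numpy as np

def gaussian_log_prob(a, mu, sigma):
    """log pi(a|s) for a diagonal Gaussian, summed over the action dimensions."""
    a, mu, sigma = map(np.asarray, (a, mu, sigma))
    z = (a - mu) / sigma
    return np.sum(-0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi), axis=-1)

def entropy_mc(mu, sigma, n=200_000, seed=0):
    """Monte Carlo estimate of H = E_{a~pi}[-log pi(a|s)]."""
    rng = np.random.default_rng(seed)
    samples = mu + sigma * rng.standard_normal((n, len(mu)))
    return float(-np.mean(gaussian_log_prob(samples, mu, sigma)))

def entropy_exact(sigma):
    """Closed form for a diagonal Gaussian: sum of 0.5*log(2*pi*e*sigma^2)."""
    return float(np.sum(0.5 * np.log(2 * np.pi * np.e * np.asarray(sigma) ** 2)))
```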

S312: during solving, the actor network of the agent takes the state s_t as input and outputs the mean and variance of the Gaussian distribution of the action; the action a_t is then generated with the reparameterization trick, that is, as the mean plus the standard deviation multiplied by a noise signal τ_t sampled from the standard normal distribution. The mean and variance used here are those of the Gaussian distribution output by the actor network for state s_t.
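A sketch of the reparameterization step. The arrays mu and log_std stand in for the actor network's outputs for s_t; outputting a log standard deviation and clipping it to [−5, 2] are common implementation choices assumed here, not details from the source. Many SAC implementations also squash the sample with tanh to bound the action, which adds a correction to log π; the source does not mention it, so it is omitted.

```python
import numpy as np

def sample_action(mu, log_std, rng, log_std_min=-5.0, log_std_max=2.0):
    """Reparameterized sample a_t = mu + sigma * tau_t with tau_t ~ N(0, I).

    Because the randomness lives in tau_t, a_t is a deterministic,
    differentiable function of (mu, sigma); this is what lets gradients
    flow from the critic back into the actor.
    """
    sigma = np.exp(np.clip(log_std, log_std_min, log_std_max))
    tau = rng.standard_normal(np.shape(mu))
    return mu + sigma * tau, tau
```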

S313: after action a_t is executed, the vehicle environment returns a reward r_t to the agent and transitions to the next state s_{t+1}, producing the environment-agent interaction tuple (s_t, a_t, r_t, s_{t+1}), which is stored in the experience pool.
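The experience pool can be sketched as a fixed-capacity first-in, first-out buffer with uniform random sampling, using only the Python standard library. Capacity and batch size are hypothetical, and, like the tuple in the text, no episode-termination flag is stored.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "s a r s_next")

class ReplayBuffer:
    """Fixed-size experience pool of (s_t, a_t, r_t, s_{t+1}) tuples."""

    def __init__(self, capacity: int, seed: int = 0):
        self.buffer = deque(maxlen=capacity)  # oldest samples are dropped first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next) -> None:
        self.buffer.append(Transition(s, a, r, s_next))

    def sample(self, batch_size: int) -> list:
        """Uniform random mini-batch, drawn without replacement."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```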

S314: a mini-batch of experience samples is drawn at random from the experience pool. Maximizing over action-value estimates tends to overestimate them, and computing the target with the same network compounds the overestimation; to avoid both, two evaluation critic networks with parameters θ_1, θ_2 and two target critic networks with parameters θ'_1, θ'_2 are introduced, and the smaller of the two target critic outputs is taken as the target value. For a given state s_t and action a_t, the soft action-value function Q_soft(s_t, a_t) of the SAC algorithm is updated as follows:

where r is the reward obtained by the vehicle; γ is the discount factor; and α is the temperature coefficient.
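The update formula itself is not reproduced in the source. The sketch assumes the standard SAC soft Bellman target with clipped double-Q, consistent with the symbols r, γ and α named above: y = r + γ·(min_i Q_θ'_i(s_{t+1}, a_{t+1}) − α·log π(a_{t+1}|s_{t+1})), where a_{t+1} is sampled from the current policy. Terminal-state masking is omitted.

```python
import numpy as np

def soft_q_target(r, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Soft Bellman target used to train the evaluation critics.

    r         : rewards r_t, shape (B,)
    q1_next   : target critic 1 output Q_{theta'_1}(s_{t+1}, a_{t+1}), shape (B,)
    q2_next   : target critic 2 output Q_{theta'_2}(s_{t+1}, a_{t+1}), shape (B,)
    logp_next : log pi(a_{t+1} | s_{t+1}) for a_{t+1} sampled from the policy
    """
    r, q1_next, q2_next, logp_next = map(np.asarray, (r, q1_next, q2_next, logp_next))
    min_q = np.minimum(q1_next, q2_next)   # clipped double-Q curbs overestimation
    soft_v = min_q - alpha * logp_next     # soft state value includes the entropy bonus
    return r + gamma * soft_v
```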

S315: during network updating, the evaluation critic networks are updated by minimizing the loss function L(θ_i), defined as the mean squared error between the evaluation function of the evaluation critic network with parameters θ_i and the target value given by the evaluation function of the target critic network with parameters θ'_i, expressed as:
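A sketch of the critic loss as a mean squared error against a fixed target, for both evaluation critics. The source does not say how the target critic parameters θ'_i are refreshed; the Polyak averaging in soft_update (with a hypothetical rate tau = 0.005) is the usual SAC choice and is included as an assumption. This tau is a smoothing rate, unrelated to the noise τ_t.

```python
import numpy as np

def critic_losses(q1_pred, q2_pred, target):
    """L(theta_i) = mean((Q_{theta_i}(s_t, a_t) - y)^2) for i = 1, 2.

    The target y is treated as a constant (no gradient flows into it).
    """
    y = np.asarray(target, dtype=np.float64)
    return (float(np.mean((np.asarray(q1_pred) - y) ** 2)),
            float(np.mean((np.asarray(q2_pred) - y) ** 2)))

def soft_update(target_params, eval_params, tau=0.005):
    """Polyak averaging theta'_i <- tau*theta_i + (1 - tau)*theta'_i."""
    return [tau * e + (1.0 - tau) * t for t, e in zip(target_params, eval_params)]
```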

S316: the actor network parameters are updated by minimizing the KL divergence between the policy and the distribution induced by the soft action-value function; the smaller the KL divergence, the closer the output action distribution is to that target distribution and the better the policy converges. The objective function of the actor network is defined as:

where D_KL denotes the KL divergence; Z(s_t) is the partition function that normalizes the distribution; the expectation is taken over the vehicle state s_t at the current time and the executed action a_t; π(·|s_t) is the policy function in state s_t; and the policy function is parameterized by the actor network parameters.
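Because log Z(s_t) does not depend on the actor parameters, minimizing the KL divergence above is equivalent, up to the constant factor α, to minimizing E[α·log π(a_t|s_t) − Q(s_t, a_t)], the form used in standard SAC implementations. A numpy sketch of this Monte Carlo estimate follows; taking the smaller of the two critic values is a common choice assumed here.

```python
import numpy as np

def actor_loss(logp, q1, q2, alpha=0.2):
    """Monte Carlo estimate of J_pi = E[alpha * log pi(a_t|s_t) - Q(s_t, a_t)].

    a_t must be a fresh reparameterized sample from the current policy,
    logp its log-probability, and q1/q2 the evaluation critics at (s_t, a_t).
    """
    q = np.minimum(q1, q2)  # the smaller critic value (assumed choice)
    return float(np.mean(alpha * np.asarray(logp) - q))
```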

S317: the actor network parameters are updated by gradient descent, expressed as:

where the update uses the gradient with respect to the policy function parameters and the gradient with respect to the action a_t executed at the current time t.
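To show the gradient-descent update end to end without an autodiff library, the sketch below trains a one-dimensional, state-independent Gaussian policy against a fixed quadratic critic Q(a) = −(a − a*)². Both are toy assumptions. The gradients are the reparameterized ones: the critic term flows through a = μ + σ·τ, and the log π term contributes −α with respect to log σ. The analytic optimum, μ = a* and σ = sqrt(α/2), gives something to check convergence against.

```python
import numpy as np

def train_toy_actor(a_star=1.5, alpha=0.2, lr=0.05, steps=2000, batch=4096, seed=0):
    """Gradient descent on J = E[alpha*log pi(a) - Q(a)] with Q(a) = -(a - a_star)^2.

    With a = mu + sigma*tau (tau ~ N(0, 1)), the reparameterized gradients are
      dJ/dmu        = E[-dQ/da]
      dJ/dlog_sigma = -alpha + E[-dQ/da * sigma * tau]
    """
    rng = np.random.default_rng(seed)
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        sigma = np.exp(log_sigma)
        tau = rng.standard_normal(batch)
        a = mu + sigma * tau                  # reparameterized actions
        dq_da = -2.0 * (a - a_star)           # critic gradient w.r.t. the action
        grad_mu = np.mean(-dq_da)
        grad_log_sigma = -alpha + np.mean(-dq_da * sigma * tau)
        mu -= lr * grad_mu                    # gradient descent step on each parameter
        log_sigma -= lr * grad_log_sigma
    return mu, float(np.exp(log_sigma))
```

With the defaults, μ should settle near 1.5 and σ near sqrt(0.1) ≈ 0.32; in a real agent the same gradients would come from back-propagating through the actor and critic networks.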

S318: in the SAC algorithm, the temperature coefficient α strongly affects the training result, and its optimal value differs between reinforcement learning tasks and between stages of training. To tune α automatically, the following objective function is minimized, which yields an updated optimal temperature coefficient at each step:

where H_0 is a predefined threshold for the minimum policy entropy; the expectation is taken over actions a_t executed under the policy π_t; π_t(a_t|s_t) is the policy function; s_t is the state of the fuel cell vehicle at the current time t; and a_t is the action executed according to the policy function at the current time t.
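A sketch of one temperature update, assuming the standard SAC objective J(α) = E[−α·log π_t(a_t|s_t) − α·H_0] and optimizing log α so that α stays positive. Setting H_0 to −dim(A), which is −2 for the two actions here, is a common heuristic from the SAC literature rather than a value from the source; the learning rate is also hypothetical.

```python
import numpy as np

def update_temperature(log_alpha, logp, target_entropy=-2.0, lr=3e-4):
    """One gradient step on J(alpha) = E[-alpha*log pi(a_t|s_t) - alpha*H0].

    logp holds log pi_t(a_t|s_t) for actions sampled from the current policy.
    dJ/dalpha = -E[log pi] - H0 = (entropy estimate) - H0, so alpha grows when
    the policy is less random than H0 and shrinks when it is more random.
    """
    alpha = np.exp(log_alpha)
    grad_alpha = -np.mean(logp) - target_entropy
    grad_log_alpha = alpha * grad_alpha      # chain rule through alpha = exp(log_alpha)
    return log_alpha - lr * grad_log_alpha
```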

The invention has the beneficial effects that:

1) The invention designs an energy management strategy based on the soft actor-critic algorithm, which effectively overcomes the dependence of conventional deep reinforcement learning algorithms on training data and hyperparameter settings in fuel cell vehicle energy management, and helps improve the stability of control tasks in a continuous action space.

2) Since changes in air conditioning energy consumption are generally ignored when designing fuel cell vehicle energy management, the invention builds a collaborative energy management optimization control framework that considers the air conditioning system, with hydrogen consumption, SOC sustenance and cabin temperature comfort as optimization objectives, thereby achieving coordinated control of energy management and the air conditioning system.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof.

Drawings

For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a flow chart of a fuel cell vehicle collaborative energy management method of the present invention;

FIG. 2 is a schematic structural diagram of a multi-power-source system of a fuel cell vehicle;

FIG. 3 is a schematic diagram of a cabin thermal load model and an air conditioning system configuration;

FIG. 4 is a diagram of the collaborative energy management framework considering the air conditioning system, built with the SAC algorithm in the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.

Referring to FIGS. 1 to 4, the collaborative energy management optimization method for a fuel cell vehicle considering the air conditioning system is designed based on the soft actor-critic algorithm. Because changes in air conditioning energy consumption are generally ignored in fuel cell vehicle energy management, the main factors influencing cabin temperature comfort are analyzed, and an air conditioning system model and a cabin thermal load model are established. With hydrogen consumption, SOC sustenance and cabin temperature as optimization objectives, a collaborative energy management optimization control framework that considers the air conditioning system is built using the soft actor-critic algorithm, which is suited to control tasks in a continuous action space. This achieves coordinated control of energy management and the air conditioning system and optimizes the hydrogen economy and cabin temperature comfort of the fuel cell vehicle. As shown in FIG. 1, the collaborative energy management optimization method specifically includes the following steps:

S1: acquiring key parameter information of the fuel cell vehicle, as follows:

the vehicle state parameter information includes: vehicle speed, cabin thermal load parameters, motor operating efficiency and drivetrain characteristic parameters;

the fuel cell parameter information includes: the power, efficiency and hydrogen consumption of the fuel cell;

the power battery parameter information includes: the state of charge, internal resistance and open-circuit voltage of the power battery;

the air conditioning system parameter information includes: the cooling/heating capacity of the air conditioning system and the corresponding power.

S2: establishing the collaborative energy management model of the fuel cell vehicle, as shown in FIGS. 2 and 3; the specific steps are as follows:

S21: establishing the longitudinal dynamics model of the whole vehicle:

$$P_{drive} = (F_{air} + F_f + F_i + m_0 a) \cdot v$$

$$P_{dem} = P_b + P_{fc} \cdot \eta_{DC/DC}$$

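S22: establishing the fuel cell model:

$$\eta_{fc} = f_\eta(P_{fc})$$

where f_η(·) and the corresponding hydrogen consumption function are the fitting functions of fuel cell efficiency and hydrogen consumption, respectively.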

S23: establishing the power battery model:

where I_L is the power battery current; V_oc is the open-circuit voltage of the power battery; R_in is the equivalent internal resistance of the power battery; SOC_0 is the initial SOC; Q_t is the maximum capacity of the power battery; t_0 is the initial time; and t_f is the final time.

S24: establishing the motor model:

$$\eta_m = f_m(\omega_m, T_m)$$

S25: establishing an air conditioning system model:

S26: establishing the cabin thermal load model:

$$Q_c = \sum K F (T_{out} - T_{in})$$

$$Q_h = 145 + 116n$$

$$Q_n = m_e \xi C_{p,air} (T_{out} - T_{in})$$

where Q_c, Q_r, Q_h and Q_n are the heat conduction load, the radiant heat load, the heat generated by the occupants (empirically, about 145 W from the driver and about 116 W from each passenger) and the ventilation heat load, respectively; K is the heat transfer coefficient; F is the heat transfer area of each enclosure surface; T_out is the ambient temperature; T_in is the cabin air temperature; η is the transmittance; I is the solar intensity; A_i is the area of the windshield, the left and right side windows and the rear window; θ_i is the solar incidence angle; β is the shading coefficient; n is the number of passengers in the vehicle; m_e is the mass of air passing through the evaporator; ξ is the air recirculation coefficient; Cp_air is the specific heat capacity of the cabin air; and ρ_air and V_air are the air density in the cabin and the cabin volume, respectively.

S3: a collaborative energy management optimization control framework for the fuel cell vehicle considering the air conditioning system is established based on the SAC algorithm, and the multi-objective optimization problem of hydrogen consumption economy and cabin temperature comfort is solved. As shown in FIG. 4, the soft actor-critic algorithm is applied to achieve coordinated control of energy management and the air conditioning system and to optimize the hydrogen economy and cabin temperature comfort of the fuel cell vehicle, specifically:

S301: to reflect the key environmental information, the SOC of the power battery, the fuel cell output power P_fc, the vehicle speed v and the air conditioning cooling/heating capacity Q_ac are set as state variables, forming the state space:

$$S = \{SOC, P_{fc}, v, Q_{ac}\}$$

S302: since collaborative energy management that considers the air conditioning system not only splits the power of the power sources but also adjusts the cooling/heating capacity of the air conditioning system to maintain cabin thermal comfort, the change in fuel cell output power ΔP_fc and the change in air conditioning cooling/heating capacity ΔQ_ac are set as action variables, forming the action space:

$$A = \{\Delta P_{fc}, \Delta Q_{ac}\}$$

S303: to ensure cabin temperature comfort, the cabin temperature is to be held at about 24 °C, so the reward function also includes a term penalizing cabin temperature deviation. The reward function is set as the negative weighted sum of three terms, namely hydrogen consumption, SOC deviation and cabin temperature deviation:

$$R = -\left(\zeta \cdot \mathrm{fuel}(t) + \psi \cdot (\mathrm{SOC}(t) - 0.7)^2 + \gamma \cdot (T_{in} - 24)^2\right)$$

where ζ, ψ and γ are the weight factors of the respective terms; adjusting them sets the trade-off between hydrogen consumption and cabin temperature comfort, which is how the multi-objective optimization problem is solved; fuel(t) is the hydrogen consumption at the current time; and SOC(t) is the state of charge of the power battery at the current time.

S304: the multi-objective optimization problem in energy management is solved with the SAC algorithm, which introduces the entropy of the action distribution so that the output actions are more dispersed, improving the exploration ability, the ability to learn new tasks and the stability of the algorithm. The entropy is expressed as:

$$H(\pi(\cdot|s_t)) = \mathbb{E}_{a_t \sim \pi(\cdot|s_t)}\left[-\log \pi(a_t|s_t)\right]$$

where H is the entropy of the policy π(·|s_t).

S305: during solving, the actor network of the agent takes the state s_t as input and outputs the mean and variance of the Gaussian distribution of the action; the action a_t is then generated with the reparameterization trick, that is, as the mean plus the standard deviation multiplied by a noise signal τ_t sampled from the standard normal distribution. The mean and variance used here are those of the Gaussian distribution output by the actor network for state s_t.

S306: after action a_t is executed, the vehicle environment returns a reward r_t to the agent and transitions to the next state s_{t+1}, producing the environment-agent interaction tuple (s_t, a_t, r_t, s_{t+1}), which is stored in the experience pool.

S307: a mini-batch of experience samples is drawn at random from the experience pool. Maximizing over action-value estimates tends to overestimate them, and computing the target with the same network compounds the overestimation; to avoid both, two evaluation critic networks with parameters θ_1, θ_2 and two target critic networks with parameters θ'_1, θ'_2 are introduced, and the smaller of the two target critic outputs is taken as the target value. For a given state s_t and action a_t, the soft action-value function Q_soft(s_t, a_t) of the SAC algorithm is updated as follows:

where r is the reward obtained by the vehicle; γ is the discount factor; and α is the temperature coefficient.

S308: during network updating, the evaluation critic networks are updated by minimizing the loss function L(θ_i), defined as the mean squared error between the evaluation function of the evaluation critic network with parameters θ_i and the target value given by the evaluation function of the target critic network with parameters θ'_i, expressed as:

S309: the actor network parameters are updated by minimizing the KL divergence between the policy and the distribution induced by the soft action-value function; the smaller the KL divergence, the closer the output action distribution is to that target distribution and the better the policy converges. The objective function of the actor network is defined as:

where D_KL denotes the KL divergence; Z(s_t) is the partition function that normalizes the distribution; π(·|s_t) is the policy function in state s_t; and the policy function is parameterized by the actor network parameters.

S310: the actor network parameters are updated by gradient descent, expressed as:

where the update uses the gradient with respect to the policy function parameters and the gradient with respect to the action a_t executed at the current time t;

S311: in the SAC algorithm, the temperature coefficient α strongly affects the training result, and its optimal value differs between reinforcement learning tasks and between stages of training. To tune α automatically, the following objective function is minimized, which yields an updated optimal temperature coefficient at each step:

where H_0 is a predefined threshold for the minimum policy entropy; the expectation is taken over actions a_t executed under the policy π_t; π_t(a_t|s_t) is the policy function; s_t is the state of the fuel cell vehicle at the current time t; and a_t is the action executed according to the policy function at the current time t.

Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A learning-based collaborative energy management method for a fuel cell vehicle considering an air conditioning system, characterized by comprising the following steps:

S1: acquiring vehicle state parameter information, fuel cell parameter information, power battery parameter information and air conditioning system parameter information of a fuel cell vehicle;

S2: establishing a collaborative energy management model of the fuel cell vehicle, comprising a longitudinal vehicle dynamics model, a fuel cell model, a power battery model, a motor model, an air conditioning system model and a cabin thermal load model;

S3: establishing a collaborative energy management optimization control strategy for the fuel cell vehicle that considers the air conditioning system, solving the multi-objective optimization problem of hydrogen consumption economy and cabin temperature comfort with the SAC algorithm, and, while optimizing the energy flow, controlling changes in the air conditioning cooling/heating capacity to keep the cabin temperature within a comfort interval; here SAC denotes the soft actor-critic algorithm.

2. The learning-based collaborative energy management method for a fuel cell vehicle according to claim 1, wherein in step S1, the vehicle state parameter information includes: vehicle speed, cabin thermal load parameters, motor operating efficiency and drivetrain characteristic parameters; the fuel cell parameter information includes: the power, efficiency and hydrogen consumption of the fuel cell; the power battery parameter information includes: the state of charge, internal resistance and open-circuit voltage of the power battery; the air conditioning system parameter information includes: the cooling/heating capacity of the air conditioning system and the corresponding power.

3. The fuel cell automobile learning-type collaborative energy management method according to claim 1, wherein in step S2, the established whole-vehicle longitudinal dynamics model is:

$$P_{drive} = (F_{air} + F_f + F_i + m_0 a)\,v$$

$$P_{dem} = P_b + P_{fc}\,\eta_{DC/DC}$$

where $m_0$ is the total vehicle mass; $v$ is the vehicle speed; $a$ is the vehicle acceleration; $F_{air}$ is the aerodynamic drag; $F_f$ is the rolling resistance; $F_i$ is the acceleration resistance; $\eta_m$, $\eta_{DC/AC}$, $\eta_{DC/DC}$ and $\eta_{motor}$ are the transmission efficiency, DC/AC converter efficiency, DC/DC converter efficiency and motor efficiency, respectively; and $P_{drive}$, $P_{dem}$, $P_b$ and $P_{fc}$ are the driving power at the vehicle wheels, the demanded power, the battery output power and the fuel cell output power, respectively.
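The two relations of claim 3 can be evaluated directly. The following is a minimal Python sketch, assuming SI units (N, kg, m/s, m/s², W). The function names `wheel_drive_power` and `battery_power` are hypothetical. The resistance forces are passed in rather than modelled, and $P_{dem}$ is supplied directly, because the claim does not write out the force expressions or the link from $P_{drive}$ to $P_{dem}$ through the listed efficiencies. The second function rearranges $P_{dem} = P_b + P_{fc}\eta_{DC/DC}$ to give the power the battery must supply for a chosen fuel cell output.

```python
def wheel_drive_power(f_air, f_f, f_i, m0, a, v):
    """P_drive = (F_air + F_f + F_i + m0*a) * v, all quantities in SI units."""
    return (f_air + f_f + f_i + m0 * a) * v


def battery_power(p_dem, p_fc, eta_dcdc):
    """Power balance P_dem = P_b + P_fc * eta_DC/DC, solved for P_b."""
    return p_dem - p_fc * eta_dcdc


# Illustrative numbers only (not from the patent).
p_drive = wheel_drive_power(f_air=300.0, f_f=200.0, f_i=0.0, m0=1500.0, a=0.5, v=20.0)
p_b = battery_power(p_dem=30000.0, p_fc=20000.0, eta_dcdc=0.95)
```

A negative `p_b` means the fuel cell delivers more than the demand and the surplus charges the battery.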

4. The fuel cell vehicle learning-type collaborative energy management method according to claim 3, wherein in step S2, the fuel cell model is established as:

$$\eta_{fc} = f_\eta(P_{fc})$$

where $f_\eta(\cdot)$ and the hydrogen consumption function are the fitting functions of efficiency and hydrogen consumption, respectively, from which the efficiency and hydrogen consumption are obtained by interpolation.
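Claim 4 obtains the fuel cell efficiency and hydrogen consumption from fitted curves by interpolation. Below is a minimal sketch, assuming piecewise-linear interpolation over a table of operating points, with inputs outside the table clamped to its end values. `FuelCellMap` and `interp1d` are hypothetical names. The table values must come from test data for the actual stack, because the claim gives none. The hydrogen rate is returned in whatever unit the table uses.

```python
from bisect import bisect_right


def interp1d(x, xs, ys):
    """Piecewise-linear interpolation of ys over sorted xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # guarantees xs[i-1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


class FuelCellMap:
    """Lookup-table fuel cell model: eta_fc = f_eta(P_fc) and hydrogen rate = f_H2(P_fc)."""

    def __init__(self, power_pts, eff_pts, h2_pts):
        self.power_pts = power_pts  # fuel cell output power grid [W], ascending
        self.eff_pts = eff_pts      # measured efficiency at each grid point
        self.h2_pts = h2_pts        # measured hydrogen consumption rate at each grid point

    def efficiency(self, p_fc):
        return interp1d(p_fc, self.power_pts, self.eff_pts)

    def hydrogen_rate(self, p_fc):
        return interp1d(p_fc, self.power_pts, self.h2_pts)
```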

5. The fuel cell vehicle learning-type collaborative energy management method according to claim 3, wherein in step S2, the power battery model is established as:

where $I_L$ is the power battery current; $V_{oc}$ is the open-circuit voltage of the power battery; $R_{in}$ is the equivalent internal resistance of the power battery; $SOC_0$ is the initial SOC; $Q_t$ is the maximum capacity of the power battery; $t_0$ is the initial time; and $t_f$ is the final time.
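The listed symbols ($I_L$, $V_{oc}$, $R_{in}$, $SOC_0$, $Q_t$, $t_0$, $t_f$) match the widely used equivalent internal-resistance (Rint) battery model. In that model, $P_b = V_{oc} I_L - R_{in} I_L^2$ and $SOC(t) = SOC_0 - \frac{1}{Q_t}\int_{t_0}^{t} I_L\,d\tau$. The sketch below assumes that model, which may differ in detail from the patent's own formulation. The function names are hypothetical, $Q_t$ is taken in ampere-hours, and the integral is approximated by a rectangle rule over fixed time steps.

```python
import math


def battery_current(p_b, v_oc, r_in):
    """Current I_L [A] for output power p_b [W] under the Rint model P_b = V_oc*I - R_in*I^2.

    Takes the smaller root of the quadratic (the physically meaningful one);
    negative p_b (charging) gives a negative current.
    """
    disc = v_oc ** 2 - 4.0 * r_in * p_b
    if disc < 0:
        raise ValueError("requested power exceeds what the battery can deliver")
    return (v_oc - math.sqrt(disc)) / (2.0 * r_in)


def soc_after(soc0, currents, dt, q_t_ah):
    """Coulomb counting: SOC = SOC0 - integral(I_L dt) / Q_t, with steps of dt seconds."""
    charge_as = sum(i * dt for i in currents)  # ampere-seconds drawn from the battery
    return soc0 - charge_as / (q_t_ah * 3600.0)
```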

6. The fuel cell vehicle learning-type collaborative energy management method according to claim 3, wherein in step S2, the motor model is established as:

$$\eta_m = f_m(\omega_m, T_m)$$

where $\omega_m$ and $T_m$ are the motor speed and torque, respectively; $P_m$ is the motor output power; and $f_m(\cdot)$ is the fitting function of the motor operating efficiency, from which the operating efficiency is obtained by interpolation.
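Claim 6 reads the motor efficiency from a two-dimensional speed-torque map by interpolation. Below is a minimal sketch, assuming bilinear interpolation on a rectangular grid, with operating points outside the grid clamped to its edge. `motor_efficiency` and `_bracket` are hypothetical names, and the efficiency map must be supplied from motor test data.

```python
from bisect import bisect_right


def _bracket(grid, x):
    """Return index i and weight t in [0, 1] with grid[i] <= x <= grid[i+1]; x is clamped."""
    x = min(max(x, grid[0]), grid[-1])
    i = min(bisect_right(grid, x) - 1, len(grid) - 2)
    t = (x - grid[i]) / (grid[i + 1] - grid[i])
    return i, t


def motor_efficiency(omega, torque, speed_grid, torque_grid, eff_table):
    """Bilinear interpolation of eff_table[j][k] given at (speed_grid[j], torque_grid[k])."""
    j, tx = _bracket(speed_grid, omega)
    k, ty = _bracket(torque_grid, torque)
    e00 = eff_table[j][k]
    e01 = eff_table[j][k + 1]
    e10 = eff_table[j + 1][k]
    e11 = eff_table[j + 1][k + 1]
    # Weighted average of the four surrounding grid points.
    return (e00 * (1 - tx) * (1 - ty) + e10 * tx * (1 - ty)
            + e01 * (1 - tx) * ty + e11 * tx * ty)
```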

7. The fuel cell vehicle learning-type collaborative energy management method according to claim 1, wherein in step S2, the air conditioning system model is established as follows:

where $Q_{ac}$ is the cooling or heating capacity of the air conditioning system; $P_{ac}$ is the corresponding power consumption of the air conditioning system; and $\eta_{cop}$ is the coefficient of performance (COP) of the air conditioning system.
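By the usual definition of the coefficient of performance (delivered cooling or heating capacity divided by consumed power), the three quantities would be related by $P_{ac} = Q_{ac}/\eta_{cop}$. This relation is an assumption based on that standard definition. The sketch below computes the electrical power drawn for a given capacity. The function name is hypothetical, and the capacity's magnitude is used so that a signed heating/cooling convention also works.

```python
def ac_power(q_ac, eta_cop):
    """Electrical power [W] drawn for capacity q_ac [W], assuming COP = Q_ac / P_ac."""
    if eta_cop <= 0:
        raise ValueError("COP must be positive")
    return abs(q_ac) / eta_cop  # magnitude: heating and cooling both consume power
```

For example, 3 kW of cooling at a COP of 2.5 draws 1.2 kW of electrical power, which is then part of the load the fuel cell and battery must cover.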

8. The fuel cell vehicle learning-type collaborative energy management method according to claim 1, wherein in step S2, the vehicle cabin thermal load model is established as follows:

$$Q_c = \sum K F\,(T_{out} - T_{in})$$

$$Q_h = 145 + 116\,n$$

$$Q_n = m_e\,\xi\,Cp_{air}\,(T_{out} - T_{in})$$

where $Q_c$, $Q_r$, $Q_h$ and $Q_n$ are the heat conduction load, the radiant heat load, the heat generated by the occupants and the ventilation heat load, respectively; $K$ is the heat transfer coefficient; $F$ is the heat transfer area of the corresponding body panel; $T_{out}$ is the ambient temperature; $T_{in}$ is the cabin air temperature; $\eta$ is the glass transmittance; $I$ is the solar irradiance; $A_i$ are the areas of the windshield, the left and right side windows and the rear window; $\theta_i$ is the solar incidence angle; $\beta$ is the shading factor; $n$ is the number of occupants; $m_e$ is the mass of air passing through the evaporator; $\xi$ is the air recirculation coefficient; $Cp_{air}$ is the specific heat capacity of the cabin air; and $\rho_{air}$ and $V_{air}$ are the air density in the cabin and the cabin volume, respectively.
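The conduction, occupant and ventilation loads can be evaluated exactly as written. The sketch below does so and adds a lumped cabin-temperature update. Two parts of it are assumptions. The radiant load $Q_r$ is passed in, because its formula is not given in the text. The energy balance $\rho_{air} V_{air} Cp_{air}\,dT_{in}/dt = Q_c + Q_r + Q_h + Q_n - Q_{ac}$ is suggested by the listed symbols $\rho_{air}$ and $V_{air}$, and takes $Q_{ac}$ as positive for cooling. Function names are hypothetical and SI units are assumed.

```python
def conduction_load(surfaces, t_out, t_in):
    """Q_c = sum(K * F) * (T_out - T_in); surfaces is a list of (K [W/(m^2 K)], F [m^2])."""
    return sum(k * f for k, f in surfaces) * (t_out - t_in)


def occupant_load(n):
    """Q_h = 145 + 116 n as given in claim 8 (presumably in W), n = number of occupants."""
    return 145.0 + 116.0 * n


def ventilation_load(m_e, xi, cp_air, t_out, t_in):
    """Q_n = m_e * xi * Cp_air * (T_out - T_in)."""
    return m_e * xi * cp_air * (t_out - t_in)


def cabin_temp_step(t_in, q_load, q_ac, rho_air, v_air, cp_air, dt):
    """Explicit Euler step of the ASSUMED balance rho*V*Cp*dT/dt = Q_load - Q_ac.

    q_load = Q_c + Q_r + Q_h + Q_n (heat gains positive). Hypothetical sign
    convention: q_ac > 0 removes heat (cooling), q_ac < 0 adds heat (heating).
    """
    return t_in + dt * (q_load - q_ac) / (rho_air * v_air * cp_air)
```

An explicit Euler step is adequate here only when `dt` is small relative to the cabin's thermal time constant $\rho_{air} V_{air} Cp_{air}$ divided by the net heat transfer coefficient.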

9. The fuel cell vehicle learning-type collaborative energy management method according to claim 1, wherein in step S3, establishing the fuel cell vehicle collaborative energy management optimization control strategy considering the air conditioning system specifically comprises the following steps:

S301: determining the state space: the SOC of the power battery, the fuel cell output power $P_{fc}$, the vehicle speed $v$ and the air conditioning cooling/heating capacity $Q_{ac}$ are set as state variables to construct the state space $S$, represented as:

$$S = \{SOC, P_{fc}, v, Q_{ac}\}$$

S302: determining the action space: the variation of the fuel cell output power and the variation of the air conditioning cooling/heating capacity are set as action variables to construct the action space $A$, expressed as:

S303: establishing the reward function: the reward function $R$ is set as the negative weighted sum of three indicators, namely hydrogen consumption, SOC deviation and cabin temperature deviation, expressed as:

$$R = -\left(\zeta\,fuel(t) + \psi\,(SOC(t) - 0.7)^2 + \gamma\,(T_{in} - 24)^2\right)$$

where $\zeta$, $\psi$ and $\gamma$ are the weight factors of the respective optimization terms; adjusting them sets the trade-off between hydrogen consumption and cabin temperature comfort, which is how the multi-objective optimization problem is solved; $fuel(t)$ is the hydrogen consumption at the current time; $SOC(t)$ is the state of charge of the power battery at the current time; and 0.7 and 24 are the reference SOC and the reference cabin temperature (°C), respectively.
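Steps S301 to S303 together define the decision process the SAC agent interacts with. The sketch below is a minimal Python rendering under stated assumptions. The class and function names are hypothetical. The action is modelled as additive increments to $P_{fc}$ and $Q_{ac}$, following the "variation" wording of S302. The clipping limits are placeholders, because the claims state no action bounds. The reward follows the S303 formula term by term.

```python
from dataclasses import dataclass


@dataclass
class EMSState:
    """State S = {SOC, P_fc, v, Q_ac} of step S301."""
    soc: float   # power battery state of charge
    p_fc: float  # fuel cell output power [W]
    v: float     # vehicle speed [m/s]
    q_ac: float  # air conditioning cooling/heating capacity [W]

    def as_vector(self):
        return [self.soc, self.p_fc, self.v, self.q_ac]


def apply_action(state, d_p_fc, d_q_ac, p_fc_max, q_ac_max):
    """Add the two action variables of step S302 (changes in P_fc and Q_ac) and clip.

    p_fc_max and q_ac_max are hypothetical actuator limits, not values from the patent.
    SOC and v are then updated by the vehicle model, not by the action.
    """
    p_fc = min(max(state.p_fc + d_p_fc, 0.0), p_fc_max)
    q_ac = min(max(state.q_ac + d_q_ac, 0.0), q_ac_max)
    return p_fc, q_ac


def reward(fuel, soc, t_in, zeta, psi, gamma, soc_ref=0.7, t_ref=24.0):
    """R = -(zeta*fuel + psi*(SOC - 0.7)^2 + gamma*(T_in - 24)^2) from step S303."""
    return -(zeta * fuel + psi * (soc - soc_ref) ** 2 + gamma * (t_in - t_ref) ** 2)
```

The reward weight `gamma` here is a different quantity from the SAC discount factor, which the claims also write as $\gamma$ in step S314. Raising `gamma` relative to `zeta` favours cabin comfort over hydrogen economy.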

10. The fuel cell automobile learning-type collaborative energy management method according to claim 9, wherein in step S3, solving the multi-objective optimization problem covering hydrogen consumption economy and cabin temperature comfort with the SAC algorithm specifically comprises the following steps:

$$\mathcal{H}(\pi(\cdot|s_t)) = \mathbb{E}_{a_t\sim\pi(\cdot|s_t)}\left[-\log\pi(a_t|s_t)\right]$$

where $\mathcal{H}$ is the entropy of the policy $\pi(\cdot|s_t)$;
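The policy entropy $\mathcal{H}(\pi(\cdot|s_t))$ is, by definition, the expected value of $-\log\pi(a_t|s_t)$ over actions drawn from the policy. For a one-dimensional Gaussian policy with standard deviation $\sigma$, this expectation has the closed form $\tfrac12\ln(2\pi e\sigma^2)$. The sketch below checks that a Monte Carlo average of $-\log\pi$ agrees with the closed form. The Gaussian policy matches step S312, and the function names are hypothetical.

```python
import math
import random


def gaussian_log_prob(a, mu, sigma):
    """log N(a; mu, sigma^2) for a scalar action."""
    return -0.5 * ((a - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def entropy_mc(mu, sigma, n=200_000, seed=0):
    """Monte Carlo estimate of H = E_{a~pi}[-log pi(a|s)]."""
    rng = random.Random(seed)
    return sum(-gaussian_log_prob(rng.gauss(mu, sigma), mu, sigma) for _ in range(n)) / n


def entropy_exact(sigma):
    """Closed form for a 1-D Gaussian: 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
```

A wider policy (larger $\sigma$) has higher entropy, which is what the temperature coefficient $\alpha$ rewards during training.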

S312: during solving, the actor network of the agent takes the state $s_t$ as input and outputs the mean and variance of the Gaussian action distribution, and the action $a_t$ is generated with the reparameterization trick:

where the two function outputs are the mean and variance of the Gaussian distribution, respectively;
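Below is a minimal sketch of the reparameterization in step S312 for a diagonal Gaussian policy. It assumes the actor outputs a mean and a log standard deviation per action dimension, which is a common parameterization; the claim speaks of mean and variance. Drawing noise $\varepsilon \sim \mathcal{N}(0,1)$ independently of the network and forming $a = \mu + \sigma\varepsilon$ makes the action a differentiable function of the network outputs. Many SAC implementations also squash the action with tanh to respect actuator limits. That step is not stated in the claim and is omitted here. The function name is hypothetical.

```python
import math
import random


def sample_action(mus, log_stds, rng):
    """Reparameterized sample from a diagonal Gaussian policy.

    Returns the action vector and its log-probability log pi(a|s).
    """
    action, log_prob = [], 0.0
    for mu, log_std in zip(mus, log_stds):
        sigma = math.exp(log_std)
        eps = rng.gauss(0.0, 1.0)  # noise drawn independently of the network
        a = mu + sigma * eps       # differentiable in mu and sigma
        action.append(a)
        # log N(a; mu, sigma^2) = -eps^2/2 - log(sigma) - log(2*pi)/2
        log_prob += -0.5 * eps * eps - log_std - 0.5 * math.log(2 * math.pi)
    return action, log_prob
```

In the vehicle setting, the two action dimensions would be the changes in fuel cell power and air conditioning capacity.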

S313: after the action $a_t$ is executed, the vehicle environment feeds back a reward $r_t$ to the agent and transitions to the next state $s_{t+1}$, generating the environment-agent interaction data $(s_t, a_t, r_t, s_{t+1})$, which is stored in the experience pool;
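The experience pool of step S313 is a replay buffer, from which step S314 draws random mini-batches. Below is a minimal sketch, assuming a fixed capacity with first-in-first-out eviction and uniform sampling without replacement; the claim specifies neither. `ExperiencePool` is a hypothetical name.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-size FIFO replay buffer of (s_t, a_t, r_t, s_{t+1}) tuples."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest samples are dropped first
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Uniformly draw a mini-batch without replacement."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Random sampling breaks the temporal correlation between consecutive driving-cycle steps, which stabilizes the critic updates.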

S314: a mini-batch of experience samples is randomly drawn from the experience pool. Two evaluation critic networks with parameters $\theta_1, \theta_2$ and two target critic networks with parameters $\theta'_1, \theta'_2$ are introduced, and the smaller of the two target critic action-value outputs is used as the target value. For a given state $s_t$ and action $a_t$, the soft action-value function $Q_{soft}(s_t, a_t)$ of the SAC algorithm is updated as follows:

where $r$ is the reward obtained by the vehicle; $\gamma$ is the discount factor; and $\alpha$ is the temperature coefficient;

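The evaluation critic networks are then updated by minimizing the mean squared error between the evaluation critic output and the target value, expressed as: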

where the first function is the evaluation function of the evaluation critic network with parameters $\theta_i$, and the second is the evaluation function of the target critic network with parameters $\theta'_i$;
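Step S314's clipped double-Q target and the mean-squared-error critic loss can be sketched as follows. The sketch assumes the standard SAC form $y = r + \gamma\,(\min_i Q_{\theta'_i}(s_{t+1}, a_{t+1}) - \alpha \log\pi(a_{t+1}|s_{t+1}))$, with $a_{t+1}$ sampled from the current policy. The function names and the optional episode-termination flag are hypothetical additions. Some implementations also scale the loss by 1/2.

```python
def soft_target(r, gamma, alpha, q1_next, q2_next, log_pi_next, done=False):
    """y = r + gamma * (min(Q'_1, Q'_2) - alpha * log pi(a'|s')), with a' ~ pi(.|s').

    The `done` flag (no bootstrapping at episode end) is a common addition, not from the claim.
    """
    if done:
        return r
    return r + gamma * (min(q1_next, q2_next) - alpha * log_pi_next)


def critic_mse(q_values, targets):
    """Mean squared error between evaluation-critic outputs Q_theta_i(s, a) and the targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)
```

Taking the smaller of the two target estimates counters the overestimation bias of a single critic. The $-\alpha\log\pi$ term adds the entropy bonus that makes the value function "soft".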

S316: the actor network parameters are updated by minimizing the KL divergence; the objective function of the actor network is defined as:

where the expectation is taken over the current vehicle state $s_t$ and the executed action $a_t$; the policy function gives the policy when the current state is $s_t$; and its subscript denotes the parameters of the policy function;
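The sketch below illustrates the actor update of step S316 under the reparameterization of step S312. It minimizes $J_\pi = \mathbb{E}[\alpha\log\pi(a_t|s_t) - Q(s_t,a_t)]$, assumed here to be the standard SAC actor objective, by stochastic gradient descent on a one-dimensional toy problem. The critic is replaced by a hypothetical fixed function $Q(s,a) = -(a-2)^2$ so that the gradients can be written by hand. Under descent, the mean moves to the maximizer $a = 2$. The entropy term keeps the variance at $\sigma^2 = \alpha/2$ instead of letting it collapse to zero. The function name is hypothetical.

```python
import math
import random


def actor_step(mu, log_std, alpha, lr, rng, batch=256, a_star=2.0):
    """One gradient-descent step on J = E[alpha*log pi(a|s) - Q(s,a)] for a 1-D toy problem.

    Toy critic (hypothetical): Q(s, a) = -(a - a_star)^2. With a = mu + sigma*eps,
    log pi(a) = -eps^2/2 - log_std - const, so
      dJ/dmu      = E[2 (a - a_star)]
      dJ/dlog_std = E[-alpha + 2 (a - a_star) * sigma * eps]
    """
    sigma = math.exp(log_std)
    g_mu = g_ls = 0.0
    for _ in range(batch):
        eps = rng.gauss(0.0, 1.0)
        a = mu + sigma * eps
        g_mu += 2.0 * (a - a_star)
        g_ls += -alpha + 2.0 * (a - a_star) * sigma * eps
    return mu - lr * g_mu / batch, log_std - lr * g_ls / batch
```

Running about 1000 steps with `alpha=0.5` and `lr=0.05` from `mu=0`, `log_std=0` drives the mean close to 2 and the standard deviation close to 0.5. A real actor would use the minimum of the two critics and backpropagate through a neural network instead of using hand-derived gradients.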

where the gradient term is the gradient of the objective function with respect to the policy function parameters, used for gradient descent;

S318: the temperature coefficient is updated at each step by minimizing the objective function of the following optimization problem, which yields the optimal temperature coefficient for that step; the objective function is expressed as: