CN116424332B - Energy management strategy enhancement updating method for deep reinforcement learning type hybrid electric vehicle - Google Patents
- Publication number: CN116424332B (application CN202310378883.9A)
- Authority: CN (China)
- Prior art keywords: vehicle, speed, hybrid electric, energy management, driving
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- B60W — Conjoint control of vehicle sub-units of different type or different function; control systems specially adapted for hybrid vehicles; road vehicle drive control systems for purposes not related to the control of a particular sub-unit
- B60W20/00 — Control systems specially adapted for hybrid vehicles
- B60W40/00 — Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models
- B60W50/00 — Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W2050/0005 — Processor details or data handling, e.g. memory registers or chip architecture
- B60W2050/0031 — Mathematical model of the vehicle
- G06F30/20 — Design optimisation, verification or simulation
- G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06N3/04 — Neural networks; architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; learning methods
Abstract
The invention relates to a method for enhancing and updating the energy management strategy of a deep-reinforcement-learning hybrid electric vehicle, and belongs to the technical field of hybrid electric vehicles. The method comprises the following steps. S1: acquire historical speed data from different types of vehicles. S2: divide the acquired data into an initial stage, a reinforcement stage, and a final stage, then merge the data to generate the speed-state-transition feature matrix of each stage. S3: generate a state-sequence-based characteristic driving cycle from the speed-state-transition feature matrix and use it to train the energy management strategy of the deep-reinforcement-learning hybrid electric vehicle. S4: define the variable spaces and reward function required for strategy training, and use Matlab m-files as the data interface to realize joint simulation training. S5: complete the online, enhancement-update iterative training of the energy management strategy, download the latest strategy after training, and load it into the hybrid power system model for subsequent testing.
Description
Technical Field
The invention belongs to the technical field of hybrid electric vehicles, and relates to a method for reinforcement updating of a deep-reinforcement-learning hybrid electric vehicle energy management strategy based on state-sequence driving cycles.
Background
The global automobile industry is entering a period of new development opportunities: new energy sources, intelligent systems, and related technologies are bringing great changes to vehicle powertrains and their control. New energy vehicles are regarded as an important means of achieving the energy transition and easing the energy crisis. Mainstream and emerging automobile manufacturers alike now offer corresponding battery electric, hybrid electric, and fuel cell vehicles. Battery electric vehicles attract consumers with low charging costs and an environmentally friendly driving mode, and they meet most urban travel needs; however, the public remains concerned about driving range, charging infrastructure, and safety assurance. Although battery electric vehicles may eventually replace conventional fuel vehicles as the dominant vehicle type, their key technologies still need further improvement. Fuel cell vehicles use hydrogen instead of gasoline to generate electricity and drive the motors, and are regarded in China, the United States, Europe, and elsewhere as the main powertrain of future commercial vehicles. At present, hybrid electric vehicles have the most mature technology: they satisfy the requirements of driving range, convenient refueling, energy saving, and emission reduction, making them an ideal transitional product, and they have long held a large share of new energy vehicle sales.
In the technical roadmap of a hybrid electric vehicle, powertrain component selection and parameter matching are completed in the initial stage, with the solution determined by the vehicle's service environment and customer requirements. The energy management strategy is one of the core technologies for achieving energy saving and emission reduction and improving the fuel economy of a hybrid power system. Its basic principle is to distribute power flow rationally among multiple power sources while satisfying the demands and constraints of the powertrain, so as to reach the expected optimization objective. In addition, some research has begun to consider other important factors affecting powertrain operation, such as battery aging and motor heating, so that the energy management strategy is gradually becoming a control strategy that accounts for the whole-vehicle operating environment. In general, a reliable energy management strategy can be designed from the experience of researchers or experts, yielding a rule-based strategy, or obtained with optimization algorithms such as dynamic programming, Pontryagin's minimum principle, the equivalent consumption minimization strategy, and model predictive control, yielding an optimization-based strategy. However, both families of strategies have drawbacks in adaptability, computational efficiency, and optimization effect.
Disclosure of Invention
In view of the above, the invention aims to provide, for deep-reinforcement-learning-based hybrid electric vehicle energy management, a new training concept better matched to the principles of reinforcement learning algorithms. Using joint simulation between an agent model in a Python environment and a hybrid power system model in a Simulink environment, it proposes a reinforcement-update method for deep reinforcement learning control strategies based on state-sequence driving cycles (rather than time-sequence speed cycles), so that the finally trained control strategy achieves a better application effect.
In order to achieve the above purpose, the present invention provides the following technical solution.
The method for enhancing and updating the energy management strategy of a deep-reinforcement-learning hybrid electric vehicle specifically comprises the following steps:
S1: obtain historical speed data of different types of vehicles from diversified driving information sources, mainly covering simulation data from the autonomous driving simulator CARLA, the real-world driving data set DBNet, the racing video game Gran Turismo, and standard test cycles used to evaluate vehicle performance (HWFET, US06, WLTC, etc.);
S2: divide each set of acquired historical speed data into three stages (initial, reinforcement, and final), then merge the data to generate the speed-state-transition feature matrix of each stage;
S3: generate a state-sequence-based characteristic driving cycle from the speed-state-transition feature matrix built from the historical speed data, and use it to train the deep reinforcement learning energy management strategy;
S4: for the deep reinforcement learning energy management strategy, define the state space S, action space A, and reward function R required by the training process, and use Matlab m-files as the data interface to realize joint simulation training between the deep reinforcement learning agent in the Python environment and the parallel hybrid power system in the Simulink environment;
S5: complete the online enhancement-update iterative training of the deep reinforcement learning energy management strategy on a cloud server (e.g., a Tencent Cloud virtual machine), download the latest energy management strategy after training, and load it into the hybrid power system model for subsequent testing.
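Steps S1–S5 form an iterative outer loop: regenerate the driving cycle, retrain, and keep the best policy. A minimal Python sketch of that loop follows; the `make_cycle` and `train_episode` callables are hypothetical placeholders for the cycle generator of S3 and the joint-simulation training of S4, not part of the patent.

```python
def enhancement_update_loop(train_episode, make_cycle, n_iters=3):
    """Outer S5 loop: regenerate the state-sequence cycle, retrain, track the best reward.

    train_episode(cycle, policy) -> (policy, total_reward)
    make_cycle(i) -> driving cycle for iteration i
    Both callables are placeholders for the pipeline described in S1-S4.
    """
    policy, best, history = None, float("-inf"), []
    for i in range(n_iters):
        cycle = make_cycle(i)                          # S3: state-sequence driving cycle
        policy, total_reward = train_episode(cycle, policy)  # S4: joint-simulation training
        history.append(total_reward)
        best = max(best, total_reward)
    return policy, history, best
```

The returned history of total episode rewards is what the convergence check of step S5 would monitor.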
Further, in step S1, the acquired historical speed data of different vehicle types include:
(1) Autonomous driving data from virtual simulation, source: CARLA (an autonomous driving research simulator). Using the official vehicles and maps as the environment, the vehicle is driven through an area under autopilot control; the target vehicle's surroundings include other vehicles, pedestrians, and traffic management equipment, yielding simulated speed data that characterize autonomous driving control.
(2) Vehicle speed data from real human drivers, source: DBNet. The data set, published online by Shanghai Jiao Tong University and recorded by real drivers in urban areas, is downloaded to obtain real speed data that characterize human driving.
(3) Speed data from a racing video game, source: Gran Turismo. By running Gran Turismo Sport, a realistic driving simulator on the PlayStation platform, simulated speed data that fully characterize vehicles in a racing environment are obtained across different tracks, vehicles, and player driving styles.
(4) Standard test cycles used to evaluate vehicle performance: several standard speed cycles commonly used in vehicle testing, including HWFET, US06, and WLTC, are selected and merged, giving real speed data published by the authorities.
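The four sources above deliver speed traces at different sampling rates and formats, so merging them presupposes a common time base. A minimal preprocessing sketch follows; the 1 Hz grid, linear interpolation, and function names are illustrative assumptions, not part of the patent.

```python
from bisect import bisect_right

def resample_speed(t, v, dt=1.0):
    """Linearly interpolate a (time, speed) trace onto a uniform dt grid (1 Hz default)."""
    out, time = [], t[0]
    while time <= t[-1] + 1e-9:
        k = min(bisect_right(t, time), len(t) - 1)
        if k == 0:
            out.append(float(v[0]))
        else:
            t0, t1, v0, v1 = t[k - 1], t[k], v[k - 1], v[k]
            frac = 0.0 if t1 == t0 else (time - t0) / (t1 - t0)
            out.append(v0 + frac * (v1 - v0))
        time += dt
    return out

def merge_sources(traces, dt=1.0):
    """Resample each (t, v) source trace, clip negative speeds, and concatenate."""
    merged = []
    for t, v in traces:
        merged.extend(max(0.0, s) for s in resample_speed(t, v, dt))
    return merged
```

Each element of `traces` would be one of the four sources (CARLA, DBNet, Gran Turismo, standard cycles) reduced to a time/speed pair of lists.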
Further, in step S2, the speed-state-transition feature matrices of the different stages are generated as follows:
S21: each of the four classes of historical speed data is divided, by time, into three stages: initial, reinforcement, and final. Over time, when a driver enters an unfamiliar driving environment or repeatedly drives in a known environment, driving habits and driving style change; these changes are the main basis for the stage division.
S22: the four classes of historical speed data are merged stage by stage to form a complete speed cycle.
S23: the speed-transition feature matrices corresponding to the three stages (initial, reinforcement, final) are constructed; because each contains all four classes of vehicle speed data, they reflect more comprehensive driving characteristics.
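The speed-state-transition feature matrix of S21–S23 is, in essence, a Markov transition matrix estimated from the merged speed trace: bin the speeds, count one-step transitions between bins, and row-normalize. A compact sketch follows; the 1 m/s bin width and 40 m/s ceiling are illustrative assumptions.

```python
def transition_matrix(speeds, bin_width=1.0, v_max=40.0):
    """Estimate a Markov speed-transition matrix from a uniformly sampled speed trace.

    States are speed bins of width bin_width (m/s); entry [i][j] is the observed
    probability of moving from bin i to bin j in one time step.
    """
    n = int(v_max / bin_width) + 1
    idx = [min(max(int(s / bin_width), 0), n - 1) for s in speeds]
    counts = [[0.0] * n for _ in range(n)]
    for i, j in zip(idx, idx[1:]):
        counts[i][j] += 1.0
    # row-normalize; rows with no observed transitions stay all-zero
    return [
        [c / s for c in row] if (s := sum(row)) > 0 else row
        for row in counts
    ]
```

Building one such matrix per stage (initial, reinforcement, final) from the stage-merged traces yields the three feature matrices described above.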
Further, in step S3, the generated state-sequence-based characteristic driving cycle is used to train the deep reinforcement learning energy management strategy as follows:
S31: in the initial-stage speed-state-transition feature matrix, the historical speed data cover part of the state-transition matrix's range, and an envelope curve is fitted around this known range. The enclosed region embodies the vehicle's historical driving characteristics, i.e., the driving behavior experienced so far, from which one can judge whether the driver habitually drives at high speed, accelerates rapidly, decelerates rapidly, and so on.
S32: the boundary state-transition feature points of the known driving characteristic range are taken from the envelope, and a number of discrete state-transition feature points are generated randomly inside the envelope region.
S33: using acceleration change and vehicle speed change as indices, the boundary points of the envelope region are connected with the internal random points to jointly construct a speed trajectory generated from speed-transition feature points, i.e., a state-sequence driving cycle.
S34: when the vehicle enters a new driving environment, new driving habits, i.e., new speed-transition features, may appear, extending the previous envelope. The reinforcement-stage driving cycle is therefore regenerated, periodically or aperiodically, from the extended envelope region, while the driving cycle generated in the theoretical final stage would include all of the vehicle's speed-transition features.
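Steps S31–S33 amount to sampling a speed trajectory from the transition features while staying inside an acceleration envelope. The sketch below is one plausible realization under stated assumptions: the envelope is reduced to a simple per-step bound `a_max`, and unvisited states hold the current speed; neither detail is specified by the patent.

```python
import random

def generate_cycle(P, bin_width=1.0, n_steps=100, v0_bin=0, a_max=3.0, seed=0):
    """Sample a synthetic state-sequence driving cycle from transition matrix P.

    Each step draws the next speed bin from the current bin's transition
    probabilities, masking out jumps beyond the acceleration envelope a_max.
    """
    rng = random.Random(seed)
    jump = int(a_max / bin_width)
    bins = [v0_bin]
    for _ in range(n_steps - 1):
        b = bins[-1]
        lo, hi = max(0, b - jump), min(len(P) - 1, b + jump)
        probs = [P[b][j] if lo <= j <= hi else 0.0 for j in range(len(P))]
        total = sum(probs)
        if total == 0:               # unvisited state: hold the current speed
            bins.append(b)
            continue
        r, acc = rng.random() * total, 0.0
        for j, p in enumerate(probs):
            acc += p
            if p > 0 and r <= acc:   # inverse-CDF sampling over the masked row
                bins.append(j)
                break
    return [b * bin_width for b in bins]
```

Feeding the reinforcement-stage matrix into this generator would produce the periodically regenerated training cycles described in S34.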
Further, in step S4, the variable spaces and reward function required by the training process are defined as follows. To ensure a fair comparison with the hybrid electric vehicle energy management strategy based on the equivalent consumption minimization strategy published officially by MathWorks, fuel economy optimization is taken as the main objective and a deep Q-network suited to discrete control tasks is used as the main control algorithm; the state space S, action space A, and reward function R involved in the training process are defined as follows:
S = (T_wheel, SOC, Voc_batt, Gear_trans, ω_mot, Vel_car, Temp_env)
A = Throttle = [0, 0.1, 0.2, …, 0.9, 1]
wherein T_wheel is the wheel torque demand, SOC is the battery state of charge, Voc_batt is the battery open-circuit voltage, Gear_trans is the transmission gear, ω_mot is the motor speed, Vel_car is the longitudinal vehicle speed, and Temp_env is the ambient temperature, fixed at a constant 313 K; Throttle is the throttle command, discretized into 11 action points [0, 0.1, 0.2, …, 0.9, 1]; in the reward function R, α, β, and γ are weight coefficients, T_eng is the engine torque, n_eng is the engine speed, the engine's instantaneous fuel consumption and the brake-specific fuel consumption BSFC(s) form the fuel economy term, and SOC_target is the target battery state of charge.
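The explicit expression for R is described above only through its symbols: weights α, β, γ applied to fuel consumption and battery SOC tracking. One plausible shape consistent with that description can be sketched as follows; the exact form and the numeric weights are assumptions for illustration, not the patent's formula.

```python
def reward(fuel_rate, soc, soc_target=0.6, alpha=1.0, beta=50.0, gamma=0.0):
    """Hypothetical reward: penalize instantaneous fuel use and SOC deviation.

    fuel_rate -- engine instantaneous fuel consumption (e.g. g/s)
    soc       -- current battery state of charge (0..1)
    alpha, beta, gamma -- the weight coefficients named in the text;
                          the values here are illustrative assumptions.
    """
    return -(alpha * fuel_rate + beta * (soc - soc_target) ** 2 + gamma)
```

The sign convention makes the agent maximize reward by simultaneously minimizing fuel use and holding SOC near SOC_target, matching the stated fuel economy objective.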
Further, in step S4, the joint simulation is set up as follows: four m-files, for opening the model, transmitting data, continuing model execution, and closing the model, are written in the Matlab environment as the function files for joint-simulation data interaction. Through them, the control command DRL_action of the deep reinforcement learning agent in the Python environment is passed to the hybrid power system, and the state parameters of the hybrid power system after executing the command in the Simulink environment are returned to the agent. The state parameters include the brake-specific fuel consumption BSFC, instantaneous fuel consumption FuelFlw, battery state of charge BattSoc, battery voltage BattV, transmission gear, longitudinal speed xdot, motor speed MotSpd, motor torque MotTrq, engine speed EngSpd, engine torque EngTrq, ambient temperature Temp, wheel demand torque whtrq, and simulation time SimuTime.
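The four m-file roles can be wrapped on the Python side in a thin bridge class. In the sketch below, `engine` is any object exposing `call(name, *args)` — for instance an adapter over MATLAB Engine for Python — and the m-file names merely mirror the four roles described above; they are assumptions, not confirmed file names or signatures.

```python
class SimulinkBridge:
    """Minimal Python-side wrapper around the four joint-simulation m-files.

    The m-file names below mirror the four roles described in the patent
    (open model, transmit data, continue running, close model).
    """

    def __init__(self, engine):
        self.engine = engine

    def open_model(self):
        self.engine.call("open_model")

    def step(self, drl_action):
        # send the agent's control command, advance the model, read states back
        self.engine.call("transmit_data", drl_action)
        return self.engine.call("continue_model")

    def close(self):
        self.engine.call("close_model")
```

A training episode would then be: `open_model()`, a loop of `step(action)` calls returning BSFC, BattSoc, MotSpd, etc., and a final `close()`.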
Further, in step S5, training ends once the total cumulative reward has converged to a stable maximum.
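The stopping rule — end training once the total cumulative reward sits at a stable maximum — can be checked with a moving-average test over episode rewards; the window size and tolerance in this sketch are illustrative assumptions.

```python
def has_converged(episode_rewards, window=20, tol=1.0):
    """Stop training when the moving average of total episode reward is stable.

    Returns True once the last two window-sized averages differ by less than
    tol and the latest window is (within tol) the best average seen so far.
    """
    if len(episode_rewards) < 2 * window:
        return False
    recent = sum(episode_rewards[-window:]) / window
    previous = sum(episode_rewards[-2 * window:-window]) / window
    best = max(
        sum(episode_rewards[i:i + window]) / window
        for i in range(len(episode_rewards) - window + 1)
    )
    return abs(recent - previous) < tol and recent >= best - tol
```

Requiring the latest window to also be the best guards against declaring convergence during a temporary plateau below the eventual maximum.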
Beneficial effects of the invention: for hybrid electric vehicles and the corresponding deep reinforcement learning energy management strategy, the invention adopts a new training concept better matched to the principles of reinforcement learning algorithms, taking state-sequence speed cycles rather than time-sequence speed cycles as the data basis, so that the finally trained control strategy achieves a better application effect.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions, and advantages of the present invention more apparent, the invention is described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is an overall flow chart of a hybrid vehicle energy management strategy enhancement update method of the present invention;
FIG. 2 is an overall frame diagram of a hybrid vehicle energy management strategy enhancement update method of the present invention;
FIG. 3 is a graph of the diversified historical vehicle speed data, wherein (a) is CARLA-based driving data, (b) is DBNet-based driving data, (c) is Gran Turismo-based driving data, and (d) is driving data based on standard test cycles (HWFET, US06, WLTC);
FIG. 4 is the speed-transition feature matrix jointly constructed from the four classes of historical vehicle speed data;
FIG. 5 is a block diagram of the deep Q-network algorithm;
FIG. 6 is a schematic diagram of a joint simulation data interface.
Detailed Description
The following describes embodiments of the invention through specific examples, from which those skilled in the art can readily understand other advantages and effects of the invention. The invention may also be implemented or applied through other, different embodiments, and the details in this specification may be modified or varied without departing from the spirit of the invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention, and that the embodiments and their features may be combined with one another in the absence of conflict.
The drawings are schematic, provided for illustration only, and are not drawn from the physical product; they are not intended to limit the invention. To better illustrate the embodiments, certain elements of the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of the actual product; those skilled in the art will appreciate that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numbers in the drawings of the embodiments correspond to the same or similar components. In the description of the invention, terms indicating orientation or positional relationship, such as "upper", "lower", "left", "right", "front", and "rear", are based on the orientations shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; such terms are exemplary only and should not be construed as limiting the invention, their specific meaning being understood by those of ordinary skill in the art according to the circumstances.
Referring to FIGS. 1 to 6, the present invention provides a method for enhancing and updating the energy management strategy of a deep-reinforcement-learning hybrid electric vehicle based on state-sequence driving cycles; the flow is shown in FIG. 1 and the framework in FIG. 2. The method specifically comprises the following steps.
S1: obtain historical speed data of different types of vehicles from diversified driving information sources, mainly covering simulation data from the autonomous driving simulator CARLA, the real-world driving data set DBNet, the racing video game Gran Turismo, and standard test cycles used to evaluate vehicle performance (HWFET, US06, WLTC, etc.):
(1) Autonomous driving data from virtual simulation, source: CARLA. Using the official vehicles and maps, the vehicle is driven through an area under autopilot control; the target vehicle's surroundings include other vehicles, pedestrians, and traffic management equipment, yielding simulated speed data that characterize autonomous driving control, as shown in FIG. 3(a);
(2) Vehicle speed data from real human drivers, source: DBNet. The data set, published by Shanghai Jiao Tong University and recorded by real drivers in urban areas, is downloaded to obtain real speed data that characterize human driving, as shown in FIG. 3(b);
(3) Vehicle speed data from a racing video game, source: Gran Turismo. By running Gran Turismo Sport, a realistic driving simulator on the PlayStation platform, simulated speed data that fully characterize vehicles in a racing environment are obtained across different tracks, vehicles, and player driving styles, as shown in FIG. 3(c);
(4) Standard test cycles used to evaluate vehicle performance: several standard speed cycles commonly used in vehicle testing, including HWFET, US06, and WLTC, are selected and merged, giving real speed data published by the authorities, as shown in FIG. 3(d).
S2: divide each set of acquired historical speed data into three stages (initial, reinforcement, and final), then merge the data to jointly construct the corresponding speed-state-transition feature matrices; specifically:
S21: each of the four classes of historical speed data is divided, by time, into three stages: initial, reinforcement, and final. Over time, when a driver enters an unfamiliar driving environment or repeatedly drives in a known environment, driving habits and driving style change; these changes are the main basis for the stage division;
S22: the four classes of historical speed data are merged stage by stage to form a complete speed cycle;
S23: the speed-transition feature matrices corresponding to the three stages (initial, reinforcement, final) are constructed; because each contains all four classes of vehicle speed data, they reflect more comprehensive driving characteristics, as shown in FIG. 4.
S3: generating characteristic driving conditions based on state sequences from the speed-state-transition feature matrices built from vehicle historical speed data, and training the deep-reinforcement-learning energy management strategy of the hybrid electric vehicle; the method specifically comprises the following steps:
S31: in the initial stage, the historical vehicle speed data cover only part of the range of the speed-transition feature matrix, so an envelope curve is obtained for the known range. The area enclosed by this curve embodies the vehicle's historical driving characteristics, i.e. the driving behavior experienced so far, from which it can be judged whether the driver has habits such as high-speed driving, rapid acceleration or rapid deceleration;
S32: acquiring the boundary state-transition feature points of the known driving-characteristic range based on the envelope region, and randomly generating a number of discrete state-transition feature points inside the envelope region;
S33: connecting the boundary points of the envelope region with the interior random points, using acceleration change and vehicle-speed change as indexes, to jointly construct a speed trajectory generated from the speed-transition feature points, namely the state-sequence driving condition;
S34: when the vehicle enters a new driving environment, new driving habits, i.e. new speed-transition features, may appear, so the previous envelope is extended. Enhanced driving conditions are therefore generated, periodically or aperiodically, from the extended speed-transition feature envelope produced in the reinforcement stage, while the driving condition generated in the theoretical final stage would include all the speed-transition features of the vehicle.
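A hedged sketch of S31–S34: the envelope of observed (speed, acceleration) transition feature points bounds the feasible accelerations per speed bin, and a state-sequence driving condition is generated by connecting points inside that envelope. The bin counts, limits and random-walk policy below are illustrative assumptions, not the patent's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def envelope(points, n_bins=20, v_max=40.0):
    """Per-speed-bin acceleration bounds forming the envelope of the
    observed (speed, acceleration) transition feature points."""
    edges = np.linspace(0.0, v_max, n_bins + 1)
    lo, hi = np.full(n_bins, np.nan), np.full(n_bins, np.nan)
    for k in range(n_bins):
        mask = (points[:, 0] >= edges[k]) & (points[:, 0] < edges[k + 1])
        if mask.any():
            lo[k], hi[k] = points[mask, 1].min(), points[mask, 1].max()
    return edges, lo, hi

def generate_trajectory(edges, lo, hi, n_steps=200, dt=1.0):
    """Random-walk a speed trace whose accelerations stay inside the
    envelope, yielding a state-sequence driving condition."""
    v, trace = 0.0, [0.0]
    for _ in range(n_steps - 1):
        k = min(np.searchsorted(edges, v, side="right") - 1, len(lo) - 1)
        a_lo = lo[k] if np.isfinite(lo[k]) else -1.0  # fallback in empty bins
        a_hi = hi[k] if np.isfinite(hi[k]) else 1.0
        v = max(0.0, v + rng.uniform(a_lo, a_hi) * dt)
        trace.append(v)
    return np.array(trace)

# Synthetic historical speeds stand in for the real recorded data
speeds = np.abs(np.cumsum(rng.normal(0.0, 0.5, 500)))
points = np.column_stack([speeds[:-1], np.diff(speeds)])
edges, lo, hi = envelope(points)
trace = generate_trajectory(edges, lo, hi)
```

Extending the envelope in the reinforcement stage amounts to recomputing `lo`/`hi` over the enlarged point set, so newly observed driving habits widen the region the generator may sample from.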
S4: for the deep-reinforcement-learning hybrid electric vehicle energy management strategy, define the state space S, action space A and reward function R required by the training process, and set up the co-simulation interface environment and interaction scheme;
In order to guarantee fair comparison conditions with the hybrid-vehicle energy management strategy based on the equivalent consumption minimization strategy (ECMS) officially released by MathWorks, and taking the fuel economy of the hybrid electric vehicle as the main optimization goal, a deep Q-network suited to discrete control tasks is adopted as the main control algorithm. As shown in fig. 5, the state space S, action space A and reward function R involved in the training process are defined as follows:
S = (T_wheel, SOC, Voc_batt, Gear_trans, ω_mot, Vel_car, Temp_env)
A = Throttle = [0, 0.1, 0.2, …, 0.9, 1]
where T_wheel is the wheel torque demand, SOC is the battery state of charge, Voc_batt is the battery open-circuit voltage, Gear_trans is the transmission gear, ω_mot is the motor speed, Vel_car is the vehicle longitudinal speed, and Temp_env is the ambient temperature, fixed at 313 K; Throttle is the throttle opening, discretized into 11 action points [0, 0.1, 0.2, …, 0.9, 1]; α, β and γ are weight coefficients of the reward function R, T_eng is the engine torque, n_eng is the engine speed, the instantaneous fuel consumption of the engine and the brake-specific fuel consumption BSFC enter the reward, and SOC_target is the target battery state of charge.
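The discrete action space follows directly from the definition above. The reward below is only a hedged sketch combining the named ingredients (instantaneous fuel consumption, SOC deviation from SOC_target, BSFC) with assumed weights α, β, γ and an assumed SOC target; the patent's exact reward formula is not reproduced here.

```python
import numpy as np

# 11 discrete throttle action points, as defined in the action space A
THROTTLE_ACTIONS = np.round(np.arange(0.0, 1.01, 0.1), 1)
TEMP_ENV = 313.0   # ambient temperature fixed at 313 K
SOC_TARGET = 0.6   # assumed target state of charge (illustrative)

def reward(fuel_flow, soc, bsfc, alpha=1.0, beta=10.0, gamma=0.01):
    """Illustrative reward: penalize instantaneous fuel use, SOC deviation
    from the target, and high brake-specific fuel consumption.
    Weights alpha/beta/gamma are placeholder values, not the patent's."""
    return -(alpha * fuel_flow + beta * (soc - SOC_TARGET) ** 2 + gamma * bsfc)
```

With this shape, staying at the SOC target strictly dominates drifting away from it at equal fuel use, which is the qualitative behavior the weight coefficients are meant to tune.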
Then, four m-files, whose purposes are respectively opening the model, transmitting data, continuing to run the model, and closing the model, are written in the Matlab environment as the function files for co-simulation data interaction. As shown in fig. 6, the control command DRL_action of the deep-reinforcement-learning agent in the Python environment is thus transmitted to the hybrid power system, and the hybrid-system state parameters obtained after the command is executed in the Simulink environment are transmitted back to the agent. The state parameters include the brake-specific fuel consumption BSFC, instantaneous fuel consumption FuelFlw, battery state of charge BattSoc, battery voltage BattV, transmission gear, longitudinal running speed xdot, motor speed MotSpd, motor torque MotTrq, engine speed EngSpd, engine torque EngTrq, ambient temperature Temp, wheel demand torque WhlTrq, and simulation time SimuTime.
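The interaction loop can be sketched as follows. Since the real data path runs through the four Matlab m-files via the MATLAB Engine, `simulink_step` below is a stand-in stub (an assumption, so the loop runs without MATLAB) that returns the state-parameter names listed above.

```python
# State parameters returned by the Simulink hybrid-system model after
# each executed control command DRL_action.
STATE_KEYS = ["BSFC", "FuelFlw", "BattSoc", "BattV", "Gear", "xdot",
              "MotSpd", "MotTrq", "EngSpd", "EngTrq", "Temp", "WhlTrq",
              "SimuTime"]

def simulink_step(drl_action, t):
    """Stub for one co-simulation step. A real implementation would call
    the data-transfer / continue-run m-files through the MATLAB Engine."""
    state = {k: 0.0 for k in STATE_KEYS}
    state["Temp"] = 313.0       # ambient temperature held constant
    state["SimuTime"] = t
    return state

def run_episode(policy, n_steps=5):
    """Agent-environment loop: the policy maps the hybrid-system state
    to a throttle command, which is executed in (stubbed) Simulink."""
    history = []
    state = simulink_step(0.0, 0.0)      # open-model / initial step
    for t in range(1, n_steps + 1):
        drl_action = policy(state)       # throttle command from the agent
        state = simulink_step(drl_action, float(t))
        history.append(state)
    return history

hist = run_episode(lambda s: 0.5)        # constant-throttle placeholder policy
```

In the actual setup the episode would end with the close-model m-file, and `policy` would be the deep Q-network selecting one of the 11 discrete throttle points.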
S5: the online enhanced-update iterative training of the deep-reinforcement-learning control strategy is completed on a Tencent cloud virtual machine; after training ends, the latest control strategy is downloaded and loaded into the hybrid-power-system model for subsequent testing. The method specifically comprises the following steps:
S51: purchasing the use of the Tencent cloud virtual server shown in Table 1, and uploading the deep-reinforcement-learning training program of the hybrid-vehicle energy management strategy together with the state-sequence speed characteristic conditions of the corresponding stage;
TABLE 1 Tencent cloud Server configuration
S52: configuring the training environment, i.e. installing and setting up the Python/TensorFlow environment required by the deep reinforcement learning algorithm and the Matlab/Simulink environment required by the hybrid power system;
S53: iteratively updating the control strategy by trial and error in the cloud-server environment; once the total cumulative reward function is in a stable, maximal convergence state, downloading the neural-network parameter file of the energy management strategy to the local environment, and loading the latest energy management strategy into the hybrid-power-system model in a brand-new test environment for subsequent testing and verification.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.
Claims (5)
1. The method for enhancing and updating the energy management strategy of a deep reinforcement learning type hybrid electric vehicle, characterized by comprising the following steps:
S1: acquiring different types of vehicle historical speed data through diversified driving information sources;
S2: dividing each type of acquired vehicle historical speed data into three stages, then merging them to generate the speed-state-transition feature matrix of the corresponding stage; the method specifically comprises the following steps:
S21: based on the four types of vehicle historical speed data, dividing the data, with time as the criterion, into three stages, namely an initial stage, a reinforcement stage and a final stage;
S22: combining the four types of vehicle historical speed data according to the stages to form complete speed operating conditions;
S23: constructing the speed-transition feature matrices corresponding to the three stages, in which the four types of vehicle speed data reflect more comprehensive driving features;
S3: generating characteristic driving conditions based on state sequences from the speed-state-transition feature matrices built from vehicle historical speed data, and training the deep-reinforcement-learning energy management strategy of the hybrid electric vehicle;
S4: for the deep-reinforcement-learning hybrid electric vehicle energy management strategy, defining the state space S, action space A and reward function R required by the training process, and using Matlab m-files as the data interface to realize co-simulation training between the deep-reinforcement-learning agent in the Python environment and the parallel hybrid power system in the Simulink environment;
defining the variable spaces and the reward function required by the training process specifically comprises: the state space S, action space A and reward function R involved in the training process are defined as follows:
S = (T_wheel, SOC, Voc_batt, Gear_trans, ω_mot, Vel_car, Temp_env)
A = Throttle = [0, 0.1, 0.2, …, 0.9, 1]
where T_wheel is the wheel torque demand, SOC is the battery state of charge, Voc_batt is the battery open-circuit voltage, Gear_trans is the transmission gear, ω_mot is the motor speed, Vel_car is the vehicle longitudinal speed, and Temp_env is the ambient temperature; Throttle is the throttle opening, discretized into 11 action points [0, 0.1, 0.2, …, 0.9, 1]; α, β and γ are weight coefficients, T_eng is the engine torque, n_eng is the engine speed, the instantaneous fuel consumption of the engine and the brake-specific fuel consumption BSFC enter the reward, and SOC_target is the target battery state of charge;
S5: completing the online enhanced-update iterative training of the deep-reinforcement-learning hybrid electric vehicle energy management strategy on the cloud server; after training ends, downloading the latest hybrid electric vehicle energy management strategy and loading it into the hybrid-power-system model for subsequent testing.
2. The hybrid vehicle energy management strategy enhancement updating method according to claim 1, wherein the different types of vehicle historical speed data acquired in step S1 comprise:
(1) Automatic driving data based on virtual simulation, from CARLA: using the official vehicles and maps as the environment, the vehicle is driven through the area by the autonomous driving function; the environment of the target vehicle includes surrounding vehicles, pedestrians and traffic-management equipment, yielding simulated speed data representative of autonomous driving control characteristics;
(2) Vehicle speed data of real human drivers, from DBNet: a publicly released dataset of real drivers travelling in urban areas is downloaded from the internet, yielding real speed data characterizing human driving;
(3) Speed data based on racing video games, from Gran Turismo: by running Gran Turismo Sport, a realistic driving simulator on the PlayStation platform, simulated speed data fully characterizing vehicles in a racing environment are obtained across different tracks, vehicles and player driving styles;
(4) Standard operating conditions dedicated to vehicle performance testing: several standard speed cycles commonly used in the vehicle-testing field are selected and combined, yielding officially released real speed data.
3. The method for enhancing and updating the energy management strategy of the hybrid electric vehicle according to claim 1, wherein in step S3, the deep-reinforcement-learning hybrid electric vehicle energy management strategy is trained using the generated state-sequence characteristic conditions, specifically comprising the following steps:
S31: in the initial stage of the speed-state-transition feature matrix, the vehicle historical speed data cover part of the matrix range; an envelope curve is then obtained for the known range, and the area it encloses embodies the vehicle's historical driving characteristics;
S32: acquiring the boundary state-transition feature points of the known driving-characteristic range based on the envelope region, and randomly generating a number of discrete state-transition feature points inside the envelope region;
S33: connecting the boundary points of the envelope region with the interior random points, using acceleration change and vehicle-speed change as indexes, to jointly construct a speed trajectory generated from the speed-transition feature points, namely the state-sequence driving condition;
S34: when the vehicle enters a new driving environment, enhanced driving conditions are generated, periodically or aperiodically, from the extended speed-transition feature envelope produced in the reinforcement stage.
4. The method for enhancing and updating the energy management strategy of the hybrid electric vehicle according to claim 1, wherein in step S4, the co-simulation setting specifically comprises: writing, in the Matlab environment, four m-files whose purposes are respectively opening the model, transmitting data, continuing to run the model, and closing the model, as the function files for co-simulation data interaction; thereby, the control command DRL_action of the deep-reinforcement-learning agent in the Python environment is transmitted to the hybrid power system, and the hybrid-system state parameters obtained after the command is executed in the Simulink environment are transmitted back to the agent; the state parameters include the brake-specific fuel consumption BSFC, instantaneous fuel consumption FuelFlw, battery state of charge BattSoc, battery voltage BattV, transmission gear, longitudinal running speed xdot, motor speed MotSpd, motor torque MotTrq, engine speed EngSpd, engine torque EngTrq, ambient temperature Temp, wheel demand torque WhlTrq, and simulation time SimuTime.
5. The method of claim 1, wherein in step S5, training is completed when the total cumulative reward function is in a stable maximal convergence state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310378883.9A CN116424332B (en) | 2023-04-10 | 2023-04-10 | Energy management strategy enhancement updating method for deep reinforcement learning type hybrid electric vehicle |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116424332A CN116424332A (en) | 2023-07-14 |
CN116424332B true CN116424332B (en) | 2023-11-21 |
Family
ID=87084933
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117184095B (en) * | 2023-10-20 | 2024-05-14 | 燕山大学 | Hybrid electric vehicle system control method based on deep reinforcement learning |
CN117911670A (en) * | 2023-12-01 | 2024-04-19 | 重庆大学 | Algorithm for detecting small target by improving YOLOv5 |
CN118082891B (en) * | 2024-04-26 | 2024-06-18 | 广汽埃安新能源汽车股份有限公司 | Gear optimization method |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102717797A (en) * | 2012-06-14 | 2012-10-10 | 北京理工大学 | Energy management method and system of hybrid vehicle |
CN111845701A (en) * | 2020-08-05 | 2020-10-30 | 重庆大学 | HEV energy management method based on deep reinforcement learning in car following environment |
WO2021083785A1 (en) * | 2019-10-31 | 2021-05-06 | Psa Automobiles Sa | Method for training at least one algorithm for a control device of a motor vehicle, computer program product, and motor vehicle |
WO2021092639A1 (en) * | 2019-11-12 | 2021-05-20 | Avl List Gmbh | Method and system for analysing and/or optimizing a configuration of a vehicle type |
CN114103971A (en) * | 2021-11-23 | 2022-03-01 | 北京理工大学 | Energy-saving driving optimization method and device for fuel cell vehicle |
CN114802180A (en) * | 2022-05-19 | 2022-07-29 | 广西大学 | Mode prediction system and method for hybrid electric vehicle power system coordination control |
CN115470700A (en) * | 2022-09-01 | 2022-12-13 | 吉泰车辆技术(苏州)有限公司 | Hybrid vehicle energy management method based on reinforcement learning training network model |
CN115495997A (en) * | 2022-10-28 | 2022-12-20 | 东南大学 | New energy automobile ecological driving method based on heterogeneous multi-agent deep reinforcement learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7314831B2 (en) * | 2020-02-17 | 2023-07-26 | トヨタ自動車株式会社 | VEHICLE CONTROL DATA GENERATION METHOD, VEHICLE CONTROL DEVICE, VEHICLE CONTROL SYSTEM, AND VEHICLE LEARNING DEVICE |
EP4244770A1 (en) * | 2020-11-12 | 2023-09-20 | Umnai Limited | Architecture for explainable reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Han Shaojian; Zhang Fengqi; Ren Yanfei; Xi Junqiang. Predictive energy management of hybrid electric vehicles based on deep learning. China Journal of Highway and Transport. 2020, (08), p. 3 col. 1 line 15 – p. 7 col. 2 line 17. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116424332B (en) | Energy management strategy enhancement updating method for deep reinforcement learning type hybrid electric vehicle | |
Liessner et al. | Deep reinforcement learning for advanced energy management of hybrid electric vehicles. | |
Dextreit et al. | Game theory controller for hybrid electric vehicles | |
Phan et al. | Intelligent energy management system for conventional autonomous vehicles | |
Liu et al. | Formula-E race strategy development using artificial neural networks and Monte Carlo tree search | |
Wu et al. | Fast velocity trajectory planning and control algorithm of intelligent 4WD electric vehicle for energy saving using time‐based MPC | |
CN105539423A (en) | Hybrid vehicle torque distribution control method and system for protecting battery based on environment temperature | |
Nguyen et al. | Optimal drivetrain design methodology for enhancing dynamic and energy performances of dual-motor electric vehicles | |
Zhu et al. | A deep reinforcement learning framework for eco-driving in connected and automated hybrid electric vehicles | |
Zhu et al. | Energy management of hybrid electric vehicles via deep Q-networks | |
CN113554337B (en) | Plug-in hybrid electric vehicle energy management strategy construction method integrating traffic information | |
CN115495997B (en) | New energy automobile ecological driving method based on heterogeneous multi-agent deep reinforcement learning | |
CN115793445B (en) | Hybrid electric vehicle control method based on multi-agent deep reinforcement learning | |
CN115534929A (en) | Plug-in hybrid electric vehicle energy management method based on multi-information fusion | |
CN112498334B (en) | Robust energy management method and system for intelligent network-connected hybrid electric vehicle | |
Johri et al. | Self-learning neural controller for hybrid power management using neuro-dynamic programming | |
Li et al. | Distributed cooperative energy management system of connected hybrid electric vehicles with personalized non-stationary inference | |
CN112473151A (en) | Information providing device, information providing method, and storage medium | |
Xu et al. | Real-time energy optimization of HEVs under-connected environment: a benchmark problem and receding horizon-based solution | |
You et al. | Real-time energy management strategy based on predictive cruise control for hybrid electric vehicles | |
CN117698685B (en) | Dynamic scene-oriented hybrid electric vehicle self-adaptive energy management method | |
Sim et al. | A control algorithm of an idle stop and go system with traffic conditions for hybrid electric vehicles | |
Yadav et al. | Intelligent energy management strategies for hybrid electric transportation | |
CN117807714B (en) | Adaptive online lifting method for deep reinforcement learning type control strategy | |
Zlocki et al. | Methodology for quantification of fuel reduction potential for adaptive cruise control relevant driving strategies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||