CN116424332B - Energy management strategy enhancement updating method for deep reinforcement learning type hybrid electric vehicle - Google Patents
- Publication number: CN116424332B (application CN202310378883.9A)
- Authority: CN (China)
- Prior art keywords: vehicle, speed, hybrid electric, energy management, driving
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- B60W — Conjoint control of vehicle sub-units of different type or different function; control systems specially adapted for hybrid vehicles; road vehicle drive control systems for purposes not related to the control of a particular sub-unit
- B60W20/00 — Control systems specially adapted for hybrid vehicles
- B60W40/00 — Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models
- B60W50/00 — Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W2050/0005 — Processor details or data handling, e.g. memory registers or chip architecture
- B60W2050/0031 — Mathematical model of the vehicle
- G06F30/20 — Design optimisation, verification or simulation
- G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06N3/04 — Neural networks; architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; learning methods
Abstract
The invention relates to a method for enhancing and updating the energy management strategy of a deep-reinforcement-learning hybrid electric vehicle, and belongs to the technical field of hybrid electric vehicles. The method comprises the following steps. S1: acquire historical speed data from different types of vehicles. S2: divide the acquired data into an initial stage, a reinforcement stage, and a final stage, then merge the data to generate the speed-state-transition feature matrix of each stage. S3: generate a state-sequence-based characteristic driving cycle from the speed-state-transition feature matrix and use it to train the energy management strategy of the deep-reinforcement-learning hybrid electric vehicle. S4: define the variable spaces and reward function required for strategy training, and use Matlab m-files as the data interface to realize joint simulation training. S5: complete the online, enhancement-update iterative training of the energy management strategy, download the latest strategy after training, and load it into the hybrid power system model for subsequent testing.
Description
Technical Field
The invention belongs to the technical field of hybrid electric vehicles, and relates to a method for reinforcement updating of a deep-reinforcement-learning hybrid electric vehicle energy management strategy based on state-sequence driving cycles.
Background
The global automobile industry is entering a period of new development opportunities: new energy sources, intelligent systems, and related technologies are bringing great changes to vehicle powertrains and their control. New energy vehicles are regarded as an important means of achieving the energy transition and easing the energy crisis. Mainstream and emerging automobile manufacturers alike now offer corresponding battery electric, hybrid electric, and fuel cell vehicles. Battery electric vehicles attract consumers with low charging costs and an environmentally friendly driving mode, and they meet most urban travel needs; however, the public remains concerned about driving range, charging infrastructure, and safety assurance. Although battery electric vehicles may eventually replace conventional fuel vehicles as the dominant vehicle type, their key technologies still need further improvement. Fuel cell vehicles use hydrogen instead of gasoline to generate electricity and drive the motors, and are regarded in China, the United States, Europe, and elsewhere as the main powertrain of future commercial vehicles. At present, hybrid electric vehicles have the most mature technology: they satisfy the requirements of driving range, convenient refueling, energy saving, and emission reduction, making them an ideal transitional product, and they have long held a large share of new energy vehicle sales.
In the technical roadmap of a hybrid electric vehicle, powertrain component selection and parameter matching are completed in the initial stage, with the solution determined by the vehicle's service environment and customer requirements. The energy management strategy is one of the core technologies for achieving energy saving and emission reduction and improving the fuel economy of a hybrid power system. Its basic principle is to distribute power flow rationally among multiple power sources while satisfying the demands and constraints of the powertrain, so as to reach the expected optimization objective. In addition, some research has begun to consider other important factors affecting powertrain operation, such as battery aging and motor heating, so that the energy management strategy is gradually becoming a control strategy that accounts for the whole-vehicle operating environment. In general, a reliable energy management strategy can be designed from the experience of researchers or experts, yielding a rule-based strategy, or obtained with optimization algorithms such as dynamic programming, Pontryagin's minimum principle, the equivalent consumption minimization strategy, and model predictive control, yielding an optimization-based strategy. However, both families of strategies have drawbacks in adaptability, computational efficiency, and optimization effect.
Disclosure of Invention
In view of the above, the invention aims to provide, for deep-reinforcement-learning-based hybrid electric vehicle energy management, a new training concept better matched to the principles of reinforcement learning algorithms. Using joint simulation between an agent model in a Python environment and a hybrid power system model in a Simulink environment, it proposes a reinforcement-update method for deep reinforcement learning control strategies based on state-sequence driving cycles (rather than time-sequence speed cycles), so that the finally trained control strategy achieves a better application effect.
In order to achieve the above purpose, the present invention provides the following technical solution.
The method for enhancing and updating the energy management strategy of a deep-reinforcement-learning hybrid electric vehicle specifically comprises the following steps:
S1: obtain historical speed data of different types of vehicles from diversified driving information sources, mainly covering simulation data from the autonomous driving simulator CARLA, the real-world driving data set DBNet, the racing video game Gran Turismo, and standard test cycles used to evaluate vehicle performance (HWFET, US06, WLTC, etc.);
S2: divide each set of acquired historical speed data into three stages (initial, reinforcement, and final), then merge the data to generate the speed-state-transition feature matrix of each stage;
S3: generate a state-sequence-based characteristic driving cycle from the speed-state-transition feature matrix built from the historical speed data, and use it to train the deep reinforcement learning energy management strategy;
S4: for the deep reinforcement learning energy management strategy, define the state space S, action space A, and reward function R required by the training process, and use Matlab m-files as the data interface to realize joint simulation training between the deep reinforcement learning agent in the Python environment and the parallel hybrid power system in the Simulink environment;
S5: complete the online enhancement-update iterative training of the deep reinforcement learning energy management strategy on a cloud server (e.g., a Tencent Cloud virtual machine), download the latest energy management strategy after training, and load it into the hybrid power system model for subsequent testing.
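Steps S1–S5 form an iterative outer loop: regenerate the driving cycle, retrain, and keep the best policy. A minimal Python sketch of that loop follows; the `make_cycle` and `train_episode` callables are hypothetical placeholders for the cycle generator of S3 and the joint-simulation training of S4, not part of the patent.

```python
def enhancement_update_loop(train_episode, make_cycle, n_iters=3):
    """Outer S5 loop: regenerate the state-sequence cycle, retrain, track the best reward.

    train_episode(cycle, policy) -> (policy, total_reward)
    make_cycle(i) -> driving cycle for iteration i
    Both callables are placeholders for the pipeline described in S1-S4.
    """
    policy, best, history = None, float("-inf"), []
    for i in range(n_iters):
        cycle = make_cycle(i)                          # S3: state-sequence driving cycle
        policy, total_reward = train_episode(cycle, policy)  # S4: joint-simulation training
        history.append(total_reward)
        best = max(best, total_reward)
    return policy, history, best
```

The returned history of total episode rewards is what the convergence check of step S5 would monitor.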
Further, in step S1, the acquired historical speed data of different vehicle types include:
(1) Autonomous driving data from virtual simulation, source: CARLA (an autonomous driving research simulator). Using the official vehicles and maps as the environment, the vehicle is driven through an area under autopilot control; the target vehicle's surroundings include other vehicles, pedestrians, and traffic management equipment, yielding simulated speed data that characterize autonomous driving control.
(2) Vehicle speed data from real human drivers, source: DBNet. The data set, published online by Shanghai Jiao Tong University and recorded by real drivers in urban areas, is downloaded to obtain real speed data that characterize human driving.
(3) Speed data from a racing video game, source: Gran Turismo. By running Gran Turismo Sport, a realistic driving simulator on the PlayStation platform, simulated speed data that fully characterize vehicles in a racing environment are obtained across different tracks, vehicles, and player driving styles.
(4) Standard test cycles used to evaluate vehicle performance: several standard speed cycles commonly used in vehicle testing, including HWFET, US06, and WLTC, are selected and merged, giving real speed data published by the authorities.
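The four sources above deliver speed traces at different sampling rates and formats, so merging them presupposes a common time base. A minimal preprocessing sketch follows; the 1 Hz grid, linear interpolation, and function names are illustrative assumptions, not part of the patent.

```python
from bisect import bisect_right

def resample_speed(t, v, dt=1.0):
    """Linearly interpolate a (time, speed) trace onto a uniform dt grid (1 Hz default)."""
    out, time = [], t[0]
    while time <= t[-1] + 1e-9:
        k = min(bisect_right(t, time), len(t) - 1)
        if k == 0:
            out.append(float(v[0]))
        else:
            t0, t1, v0, v1 = t[k - 1], t[k], v[k - 1], v[k]
            frac = 0.0 if t1 == t0 else (time - t0) / (t1 - t0)
            out.append(v0 + frac * (v1 - v0))
        time += dt
    return out

def merge_sources(traces, dt=1.0):
    """Resample each (t, v) source trace, clip negative speeds, and concatenate."""
    merged = []
    for t, v in traces:
        merged.extend(max(0.0, s) for s in resample_speed(t, v, dt))
    return merged
```

Each element of `traces` would be one of the four sources (CARLA, DBNet, Gran Turismo, standard cycles) reduced to a time/speed pair of lists.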
Further, in step S2, the speed-state-transition feature matrices of the different stages are generated as follows:
S21: each of the four classes of historical speed data is divided, by time, into three stages: initial, reinforcement, and final. Over time, when a driver enters an unfamiliar driving environment or repeatedly drives in a known environment, driving habits and driving style change; these changes are the main basis for the stage division.
S22: the four classes of historical speed data are merged stage by stage to form a complete speed cycle.
S23: the speed-transition feature matrices corresponding to the three stages (initial, reinforcement, final) are constructed; because each contains all four classes of vehicle speed data, they reflect more comprehensive driving characteristics.
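The speed-state-transition feature matrix of S21–S23 is, in essence, a Markov transition matrix estimated from the merged speed trace: bin the speeds, count one-step transitions between bins, and row-normalize. A compact sketch follows; the 1 m/s bin width and 40 m/s ceiling are illustrative assumptions.

```python
def transition_matrix(speeds, bin_width=1.0, v_max=40.0):
    """Estimate a Markov speed-transition matrix from a uniformly sampled speed trace.

    States are speed bins of width bin_width (m/s); entry [i][j] is the observed
    probability of moving from bin i to bin j in one time step.
    """
    n = int(v_max / bin_width) + 1
    idx = [min(max(int(s / bin_width), 0), n - 1) for s in speeds]
    counts = [[0.0] * n for _ in range(n)]
    for i, j in zip(idx, idx[1:]):
        counts[i][j] += 1.0
    # row-normalize; rows with no observed transitions stay all-zero
    return [
        [c / s for c in row] if (s := sum(row)) > 0 else row
        for row in counts
    ]
```

Building one such matrix per stage (initial, reinforcement, final) from the stage-merged traces yields the three feature matrices described above.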
Further, in step S3, the generated state-sequence-based characteristic driving cycle is used to train the deep reinforcement learning energy management strategy as follows:
S31: in the initial-stage speed-state-transition feature matrix, the historical speed data cover part of the state-transition matrix's range, and an envelope curve is fitted around this known range. The enclosed region embodies the vehicle's historical driving characteristics, i.e., the driving behavior experienced so far, from which one can judge whether the driver habitually drives at high speed, accelerates rapidly, decelerates rapidly, and so on.
S32: the boundary state-transition feature points of the known driving characteristic range are taken from the envelope, and a number of discrete state-transition feature points are generated randomly inside the envelope region.
S33: using acceleration change and vehicle speed change as indices, the boundary points of the envelope region are connected with the internal random points to jointly construct a speed trajectory generated from speed-transition feature points, i.e., a state-sequence driving cycle.
S34: when the vehicle enters a new driving environment, new driving habits, i.e., new speed-transition features, may appear, extending the previous envelope. The reinforcement-stage driving cycle is therefore regenerated, periodically or aperiodically, from the extended envelope region, while the driving cycle generated in the theoretical final stage would include all of the vehicle's speed-transition features.
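Steps S31–S33 amount to sampling a speed trajectory from the transition features while staying inside an acceleration envelope. The sketch below is one plausible realization under stated assumptions: the envelope is reduced to a simple per-step bound `a_max`, and unvisited states hold the current speed; neither detail is specified by the patent.

```python
import random

def generate_cycle(P, bin_width=1.0, n_steps=100, v0_bin=0, a_max=3.0, seed=0):
    """Sample a synthetic state-sequence driving cycle from transition matrix P.

    Each step draws the next speed bin from the current bin's transition
    probabilities, masking out jumps beyond the acceleration envelope a_max.
    """
    rng = random.Random(seed)
    jump = int(a_max / bin_width)
    bins = [v0_bin]
    for _ in range(n_steps - 1):
        b = bins[-1]
        lo, hi = max(0, b - jump), min(len(P) - 1, b + jump)
        probs = [P[b][j] if lo <= j <= hi else 0.0 for j in range(len(P))]
        total = sum(probs)
        if total == 0:               # unvisited state: hold the current speed
            bins.append(b)
            continue
        r, acc = rng.random() * total, 0.0
        for j, p in enumerate(probs):
            acc += p
            if p > 0 and r <= acc:   # inverse-CDF sampling over the masked row
                bins.append(j)
                break
    return [b * bin_width for b in bins]
```

Feeding the reinforcement-stage matrix into this generator would produce the periodically regenerated training cycles described in S34.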
Further, in step S4, the variable spaces and reward function required by the training process are defined as follows. To ensure a fair comparison with the hybrid electric vehicle energy management strategy based on the equivalent consumption minimization strategy published officially by MathWorks, fuel economy optimization is taken as the main objective and a deep Q-network suited to discrete control tasks is used as the main control algorithm; the state space S, action space A, and reward function R involved in the training process are defined as follows:
S = (T_wheel, SOC, Voc_batt, Gear_trans, ω_mot, Vel_car, Temp_env)
A = Throttle = [0, 0.1, 0.2, …, 0.9, 1]
wherein T_wheel is the wheel torque demand, SOC is the battery state of charge, Voc_batt is the battery open-circuit voltage, Gear_trans is the transmission gear, ω_mot is the motor speed, Vel_car is the longitudinal vehicle speed, and Temp_env is the ambient temperature, fixed at a constant 313 K; Throttle is the throttle command, discretized into 11 action points [0, 0.1, 0.2, …, 0.9, 1]; in the reward function R, α, β, and γ are weight coefficients, T_eng is the engine torque, n_eng is the engine speed, the engine's instantaneous fuel consumption and the brake-specific fuel consumption BSFC(s) form the fuel economy term, and SOC_target is the target battery state of charge.
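The explicit expression for R is described above only through its symbols: weights α, β, γ applied to fuel consumption and battery SOC tracking. One plausible shape consistent with that description can be sketched as follows; the exact form and the numeric weights are assumptions for illustration, not the patent's formula.

```python
def reward(fuel_rate, soc, soc_target=0.6, alpha=1.0, beta=50.0, gamma=0.0):
    """Hypothetical reward: penalize instantaneous fuel use and SOC deviation.

    fuel_rate -- engine instantaneous fuel consumption (e.g. g/s)
    soc       -- current battery state of charge (0..1)
    alpha, beta, gamma -- the weight coefficients named in the text;
                          the values here are illustrative assumptions.
    """
    return -(alpha * fuel_rate + beta * (soc - soc_target) ** 2 + gamma)
```

The sign convention makes the agent maximize reward by simultaneously minimizing fuel use and holding SOC near SOC_target, matching the stated fuel economy objective.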
Further, in step S4, the joint simulation is set up as follows: four m-files, for opening the model, transmitting data, continuing model execution, and closing the model, are written in the Matlab environment as the function files for joint-simulation data interaction. Through them, the control command DRL_action of the deep reinforcement learning agent in the Python environment is passed to the hybrid power system, and the state parameters of the hybrid power system after executing the command in the Simulink environment are returned to the agent. The state parameters include the brake-specific fuel consumption BSFC, instantaneous fuel consumption FuelFlw, battery state of charge BattSoc, battery voltage BattV, transmission gear, longitudinal speed xdot, motor speed MotSpd, motor torque MotTrq, engine speed EngSpd, engine torque EngTrq, ambient temperature Temp, wheel demand torque whtrq, and simulation time SimuTime.
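The four m-file roles can be wrapped on the Python side in a thin bridge class. In the sketch below, `engine` is any object exposing `call(name, *args)` — for instance an adapter over MATLAB Engine for Python — and the m-file names merely mirror the four roles described above; they are assumptions, not confirmed file names or signatures.

```python
class SimulinkBridge:
    """Minimal Python-side wrapper around the four joint-simulation m-files.

    The m-file names below mirror the four roles described in the patent
    (open model, transmit data, continue running, close model).
    """

    def __init__(self, engine):
        self.engine = engine

    def open_model(self):
        self.engine.call("open_model")

    def step(self, drl_action):
        # send the agent's control command, advance the model, read states back
        self.engine.call("transmit_data", drl_action)
        return self.engine.call("continue_model")

    def close(self):
        self.engine.call("close_model")
```

A training episode would then be: `open_model()`, a loop of `step(action)` calls returning BSFC, BattSoc, MotSpd, etc., and a final `close()`.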
Further, in step S5, training ends once the total cumulative reward has converged to a stable maximum.
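The stopping rule — end training once the total cumulative reward sits at a stable maximum — can be checked with a moving-average test over episode rewards; the window size and tolerance in this sketch are illustrative assumptions.

```python
def has_converged(episode_rewards, window=20, tol=1.0):
    """Stop training when the moving average of total episode reward is stable.

    Returns True once the last two window-sized averages differ by less than
    tol and the latest window is (within tol) the best average seen so far.
    """
    if len(episode_rewards) < 2 * window:
        return False
    recent = sum(episode_rewards[-window:]) / window
    previous = sum(episode_rewards[-2 * window:-window]) / window
    best = max(
        sum(episode_rewards[i:i + window]) / window
        for i in range(len(episode_rewards) - window + 1)
    )
    return abs(recent - previous) < tol and recent >= best - tol
```

Requiring the latest window to also be the best guards against declaring convergence during a temporary plateau below the eventual maximum.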
Beneficial effects of the invention: for hybrid electric vehicles and the corresponding deep reinforcement learning energy management strategy, the invention adopts a new training concept better matched to the principles of reinforcement learning algorithms, taking state-sequence speed cycles rather than time-sequence speed cycles as the data basis, so that the finally trained control strategy achieves a better application effect.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions, and advantages of the present invention more apparent, the invention is described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is an overall flow chart of a hybrid vehicle energy management strategy enhancement update method of the present invention;
FIG. 2 is an overall frame diagram of a hybrid vehicle energy management strategy enhancement update method of the present invention;
FIG. 3 is a graph of the diversified historical vehicle speed data, wherein (a) is CARLA-based driving data, (b) is DBNet-based driving data, (c) is Gran Turismo-based driving data, and (d) is driving data based on standard test cycles (HWFET, US06, WLTC);
FIG. 4 is the speed-transition feature matrix jointly constructed from the four classes of historical vehicle speed data;
FIG. 5 is a block diagram of the deep Q-network algorithm;
FIG. 6 is a schematic diagram of a joint simulation data interface.
Detailed Description
The following describes embodiments of the invention through specific examples, from which those skilled in the art can readily understand other advantages and effects of the invention. The invention may also be implemented or applied through other, different embodiments, and the details in this specification may be modified or varied without departing from the spirit of the invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention, and that the embodiments and their features may be combined with one another in the absence of conflict.
The drawings are schematic, provided for illustration only, and are not drawn from the physical product; they are not intended to limit the invention. To better illustrate the embodiments, certain elements of the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of the actual product; those skilled in the art will appreciate that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numbers in the drawings of the embodiments correspond to the same or similar components. In the description of the invention, terms indicating orientation or positional relationship, such as "upper", "lower", "left", "right", "front", and "rear", are based on the orientations shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; such terms are exemplary only and should not be construed as limiting the invention, their specific meaning being understood by those of ordinary skill in the art according to the circumstances.
Referring to FIGS. 1 to 6, the present invention provides a method for enhancing and updating the energy management strategy of a deep-reinforcement-learning hybrid electric vehicle based on state-sequence driving cycles; the flow is shown in FIG. 1 and the framework in FIG. 2. The method specifically comprises the following steps.
S1: obtain historical speed data of different types of vehicles from diversified driving information sources, mainly covering simulation data from the autonomous driving simulator CARLA, the real-world driving data set DBNet, the racing video game Gran Turismo, and standard test cycles used to evaluate vehicle performance (HWFET, US06, WLTC, etc.):
(1) Autonomous driving data from virtual simulation, source: CARLA. Using the official vehicles and maps, the vehicle is driven through an area under autopilot control; the target vehicle's surroundings include other vehicles, pedestrians, and traffic management equipment, yielding simulated speed data that characterize autonomous driving control, as shown in FIG. 3(a);
(2) Vehicle speed data from real human drivers, source: DBNet. The data set, published by Shanghai Jiao Tong University and recorded by real drivers in urban areas, is downloaded to obtain real speed data that characterize human driving, as shown in FIG. 3(b);
(3) Vehicle speed data from a racing video game, source: Gran Turismo. By running Gran Turismo Sport, a realistic driving simulator on the PlayStation platform, simulated speed data that fully characterize vehicles in a racing environment are obtained across different tracks, vehicles, and player driving styles, as shown in FIG. 3(c);
(4) Standard test cycles used to evaluate vehicle performance: several standard speed cycles commonly used in vehicle testing, including HWFET, US06, and WLTC, are selected and merged, giving real speed data published by the authorities, as shown in FIG. 3(d).
S2: divide each set of acquired historical speed data into three stages (initial, reinforcement, and final), then merge the data to jointly construct the corresponding speed-state-transition feature matrices; specifically:
S21: each of the four classes of historical speed data is divided, by time, into three stages: initial, reinforcement, and final. Over time, when a driver enters an unfamiliar driving environment or repeatedly drives in a known environment, driving habits and driving style change; these changes are the main basis for the stage division;
S22: the four classes of historical speed data are merged stage by stage to form a complete speed cycle;
S23: the speed-transition feature matrices corresponding to the three stages (initial, reinforcement, final) are constructed; because each contains all four classes of vehicle speed data, they reflect more comprehensive driving characteristics, as shown in FIG. 4.
S3: generating characteristic driving conditions based on state sequences from the speed-state-transition feature matrices built from vehicle historical speed data, and training the deep-reinforcement-learning energy management strategy of the hybrid electric vehicle; the method specifically comprises the following steps:
S31: in the initial stage, the historical vehicle speed data cover only part of the range of the speed-transition feature matrix, so an envelope curve is obtained for the known range. The area enclosed by this curve embodies the vehicle's historical driving characteristics, i.e. the driving behavior experienced so far, from which it can be judged whether the driver has habits such as high-speed driving, rapid acceleration or rapid deceleration;
S32: acquiring the boundary state-transition feature points of the known driving-characteristic range based on the envelope region, and randomly generating a number of discrete state-transition feature points inside the envelope region;
S33: connecting the boundary points of the envelope region with the interior random points, using acceleration change and vehicle-speed change as indexes, to jointly construct a speed trajectory generated from the speed-transition feature points, namely the state-sequence driving condition;
S34: when the vehicle enters a new driving environment, new driving habits, i.e. new speed-transition features, may appear, so the previous envelope is extended. Enhanced driving conditions are therefore generated, periodically or aperiodically, from the extended speed-transition feature envelope produced in the reinforcement stage, while the driving condition generated in the theoretical final stage would include all the speed-transition features of the vehicle.
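A hedged sketch of S31–S34: the envelope of observed (speed, acceleration) transition feature points bounds the feasible accelerations per speed bin, and a state-sequence driving condition is generated by connecting points inside that envelope. The bin counts, limits and random-walk policy below are illustrative assumptions, not the patent's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def envelope(points, n_bins=20, v_max=40.0):
    """Per-speed-bin acceleration bounds forming the envelope of the
    observed (speed, acceleration) transition feature points."""
    edges = np.linspace(0.0, v_max, n_bins + 1)
    lo, hi = np.full(n_bins, np.nan), np.full(n_bins, np.nan)
    for k in range(n_bins):
        mask = (points[:, 0] >= edges[k]) & (points[:, 0] < edges[k + 1])
        if mask.any():
            lo[k], hi[k] = points[mask, 1].min(), points[mask, 1].max()
    return edges, lo, hi

def generate_trajectory(edges, lo, hi, n_steps=200, dt=1.0):
    """Random-walk a speed trace whose accelerations stay inside the
    envelope, yielding a state-sequence driving condition."""
    v, trace = 0.0, [0.0]
    for _ in range(n_steps - 1):
        k = min(np.searchsorted(edges, v, side="right") - 1, len(lo) - 1)
        a_lo = lo[k] if np.isfinite(lo[k]) else -1.0  # fallback in empty bins
        a_hi = hi[k] if np.isfinite(hi[k]) else 1.0
        v = max(0.0, v + rng.uniform(a_lo, a_hi) * dt)
        trace.append(v)
    return np.array(trace)

# Synthetic historical speeds stand in for the real recorded data
speeds = np.abs(np.cumsum(rng.normal(0.0, 0.5, 500)))
points = np.column_stack([speeds[:-1], np.diff(speeds)])
edges, lo, hi = envelope(points)
trace = generate_trajectory(edges, lo, hi)
```

Extending the envelope in the reinforcement stage amounts to recomputing `lo`/`hi` over the enlarged point set, so newly observed driving habits widen the region the generator may sample from.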
S4: for the deep-reinforcement-learning hybrid electric vehicle energy management strategy, define the state space S, action space A and reward function R required by the training process, and set up the co-simulation interface environment and interaction scheme;
In order to guarantee fair comparison conditions with the hybrid-vehicle energy management strategy based on the equivalent consumption minimization strategy (ECMS) officially released by MathWorks, and taking the fuel economy of the hybrid electric vehicle as the main optimization goal, a deep Q-network suited to discrete control tasks is adopted as the main control algorithm. As shown in fig. 5, the state space S, action space A and reward function R involved in the training process are defined as follows:
S = (T_wheel, SOC, Voc_batt, Gear_trans, ω_mot, Vel_car, Temp_env)
A = Throttle = [0, 0.1, 0.2, …, 0.9, 1]
where T_wheel is the wheel torque demand, SOC is the battery state of charge, Voc_batt is the battery open-circuit voltage, Gear_trans is the transmission gear, ω_mot is the motor speed, Vel_car is the vehicle longitudinal speed, and Temp_env is the ambient temperature, fixed at 313 K; Throttle is the throttle opening, discretized into 11 action points [0, 0.1, 0.2, …, 0.9, 1]; α, β and γ are weight coefficients of the reward function R, T_eng is the engine torque, n_eng is the engine speed, the instantaneous fuel consumption of the engine and the brake-specific fuel consumption BSFC enter the reward, and SOC_target is the target battery state of charge.
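The discrete action space follows directly from the definition above. The reward below is only a hedged sketch combining the named ingredients (instantaneous fuel consumption, SOC deviation from SOC_target, BSFC) with assumed weights α, β, γ and an assumed SOC target; the patent's exact reward formula is not reproduced here.

```python
import numpy as np

# 11 discrete throttle action points, as defined in the action space A
THROTTLE_ACTIONS = np.round(np.arange(0.0, 1.01, 0.1), 1)
TEMP_ENV = 313.0   # ambient temperature fixed at 313 K
SOC_TARGET = 0.6   # assumed target state of charge (illustrative)

def reward(fuel_flow, soc, bsfc, alpha=1.0, beta=10.0, gamma=0.01):
    """Illustrative reward: penalize instantaneous fuel use, SOC deviation
    from the target, and high brake-specific fuel consumption.
    Weights alpha/beta/gamma are placeholder values, not the patent's."""
    return -(alpha * fuel_flow + beta * (soc - SOC_TARGET) ** 2 + gamma * bsfc)
```

With this shape, staying at the SOC target strictly dominates drifting away from it at equal fuel use, which is the qualitative behavior the weight coefficients are meant to tune.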
Then, four m-files, whose purposes are respectively opening the model, transmitting data, continuing to run the model, and closing the model, are written in the Matlab environment as the function files for co-simulation data interaction. As shown in fig. 6, the control command DRL_action of the deep-reinforcement-learning agent in the Python environment is thus transmitted to the hybrid power system, and the hybrid-system state parameters obtained after the command is executed in the Simulink environment are transmitted back to the agent. The state parameters include the brake-specific fuel consumption BSFC, instantaneous fuel consumption FuelFlw, battery state of charge BattSoc, battery voltage BattV, transmission gear, longitudinal running speed xdot, motor speed MotSpd, motor torque MotTrq, engine speed EngSpd, engine torque EngTrq, ambient temperature Temp, wheel demand torque WhlTrq, and simulation time SimuTime.
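The interaction loop can be sketched as follows. Since the real data path runs through the four Matlab m-files via the MATLAB Engine, `simulink_step` below is a stand-in stub (an assumption, so the loop runs without MATLAB) that returns the state-parameter names listed above.

```python
# State parameters returned by the Simulink hybrid-system model after
# each executed control command DRL_action.
STATE_KEYS = ["BSFC", "FuelFlw", "BattSoc", "BattV", "Gear", "xdot",
              "MotSpd", "MotTrq", "EngSpd", "EngTrq", "Temp", "WhlTrq",
              "SimuTime"]

def simulink_step(drl_action, t):
    """Stub for one co-simulation step. A real implementation would call
    the data-transfer / continue-run m-files through the MATLAB Engine."""
    state = {k: 0.0 for k in STATE_KEYS}
    state["Temp"] = 313.0       # ambient temperature held constant
    state["SimuTime"] = t
    return state

def run_episode(policy, n_steps=5):
    """Agent-environment loop: the policy maps the hybrid-system state
    to a throttle command, which is executed in (stubbed) Simulink."""
    history = []
    state = simulink_step(0.0, 0.0)      # open-model / initial step
    for t in range(1, n_steps + 1):
        drl_action = policy(state)       # throttle command from the agent
        state = simulink_step(drl_action, float(t))
        history.append(state)
    return history

hist = run_episode(lambda s: 0.5)        # constant-throttle placeholder policy
```

In the actual setup the episode would end with the close-model m-file, and `policy` would be the deep Q-network selecting one of the 11 discrete throttle points.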
S5: the online enhanced-update iterative training of the deep-reinforcement-learning control strategy is completed on a Tencent cloud virtual machine; after training ends, the latest control strategy is downloaded and loaded into the hybrid-power-system model for subsequent testing. The method specifically comprises the following steps:
S51: purchasing the use of the Tencent cloud virtual server shown in Table 1, and uploading the deep-reinforcement-learning training program of the hybrid-vehicle energy management strategy together with the state-sequence speed characteristic conditions of the corresponding stage;
TABLE 1 Tencent cloud Server configuration
S52: configuring the training environment, i.e. installing and setting up the Python/TensorFlow environment required by the deep reinforcement learning algorithm and the Matlab/Simulink environment required by the hybrid power system;
S53: iteratively updating the control strategy by trial and error in the cloud-server environment; once the total cumulative reward function is in a stable, maximal convergence state, downloading the neural-network parameter file of the energy management strategy to the local environment, and loading the latest energy management strategy into the hybrid-power-system model in a brand-new test environment for subsequent testing and verification.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.
Claims (5)
1. The method for enhancing and updating the energy management strategy of a deep reinforcement learning type hybrid electric vehicle, characterized by comprising the following steps:
S1: acquiring different types of vehicle historical speed data through diversified driving information sources;
S2: dividing each type of acquired vehicle historical speed data into three stages, then merging them to generate the speed-state-transition feature matrix of the corresponding stage; the method specifically comprises the following steps:
S21: based on the four types of vehicle historical speed data, dividing the data, with time as the criterion, into three stages, namely an initial stage, a reinforcement stage and a final stage;
S22: combining the four types of vehicle historical speed data according to the stages to form complete speed operating conditions;
S23: constructing the speed-transition feature matrices corresponding to the three stages, in which the four types of vehicle speed data reflect more comprehensive driving features;
S3: generating characteristic driving conditions based on state sequences from the speed-state-transition feature matrices built from vehicle historical speed data, and training the deep-reinforcement-learning energy management strategy of the hybrid electric vehicle;
S4: for the deep-reinforcement-learning hybrid electric vehicle energy management strategy, defining the state space S, action space A and reward function R required by the training process, and using Matlab m-files as the data interface to realize co-simulation training between the deep-reinforcement-learning agent in the Python environment and the parallel hybrid power system in the Simulink environment;
defining the variable spaces and the reward function required by the training process specifically comprises: the state space S, action space A and reward function R involved in the training process are defined as follows:
S = (T_wheel, SOC, Voc_batt, Gear_trans, ω_mot, Vel_car, Temp_env)
A = Throttle = [0, 0.1, 0.2, …, 0.9, 1]
where T_wheel is the wheel torque demand, SOC is the battery state of charge, Voc_batt is the battery open-circuit voltage, Gear_trans is the transmission gear, ω_mot is the motor speed, Vel_car is the vehicle longitudinal speed, and Temp_env is the ambient temperature; Throttle is the throttle opening, discretized into 11 action points [0, 0.1, 0.2, …, 0.9, 1]; α, β and γ are weight coefficients, T_eng is the engine torque, n_eng is the engine speed, the instantaneous fuel consumption of the engine and the brake-specific fuel consumption BSFC enter the reward, and SOC_target is the target battery state of charge;
S5: completing the online enhanced-update iterative training of the deep-reinforcement-learning hybrid electric vehicle energy management strategy on the cloud server; after training ends, downloading the latest hybrid electric vehicle energy management strategy and loading it into the hybrid-power-system model for subsequent testing.
2. The hybrid vehicle energy management strategy enhancement updating method according to claim 1, wherein the different types of vehicle historical speed data acquired in step S1 comprise:
(1) Automatic driving data based on virtual simulation, from CARLA: using the official vehicles and maps as the environment, the vehicle is driven through the area by the autonomous driving function; the environment of the target vehicle includes surrounding vehicles, pedestrians and traffic-management equipment, yielding simulated speed data representative of autonomous driving control characteristics;
(2) Vehicle speed data of real human drivers, from DBNet: a publicly released dataset of real drivers travelling in urban areas is downloaded from the internet, yielding real speed data characterizing human driving;
(3) Speed data based on racing video games, from Gran Turismo: by running Gran Turismo Sport, a realistic driving simulator on the PlayStation platform, simulated speed data fully characterizing vehicles in a racing environment are obtained across different tracks, vehicles and player driving styles;
(4) Standard operating conditions dedicated to vehicle performance testing: several standard speed cycles commonly used in the vehicle-testing field are selected and combined, yielding officially released real speed data.
3. The method for enhancing and updating the energy management strategy of the hybrid electric vehicle according to claim 1, wherein in step S3, the deep-reinforcement-learning hybrid electric vehicle energy management strategy is trained using the generated state-sequence characteristic conditions, specifically comprising the following steps:
S31: in the initial stage of the speed-state-transition feature matrix, the vehicle historical speed data cover part of the matrix range; an envelope curve is then obtained for the known range, and the area it encloses embodies the vehicle's historical driving characteristics;
S32: acquiring the boundary state-transition feature points of the known driving-characteristic range based on the envelope region, and randomly generating a number of discrete state-transition feature points inside the envelope region;
S33: connecting the boundary points of the envelope region with the interior random points, using acceleration change and vehicle-speed change as indexes, to jointly construct a speed trajectory generated from the speed-transition feature points, namely the state-sequence driving condition;
S34: when the vehicle enters a new driving environment, enhanced driving conditions are generated, periodically or aperiodically, from the extended speed-transition feature envelope produced in the reinforcement stage.
4. The method for enhancing and updating the energy management strategy of the hybrid electric vehicle according to claim 1, wherein in step S4, the co-simulation setting specifically comprises: writing, in the Matlab environment, four m-files whose purposes are respectively opening the model, transmitting data, continuing to run the model, and closing the model, as the function files for co-simulation data interaction; thereby, the control command DRL_action of the deep-reinforcement-learning agent in the Python environment is transmitted to the hybrid power system, and the hybrid-system state parameters obtained after the command is executed in the Simulink environment are transmitted back to the agent; the state parameters include the brake-specific fuel consumption BSFC, instantaneous fuel consumption FuelFlw, battery state of charge BattSoc, battery voltage BattV, transmission gear, longitudinal running speed xdot, motor speed MotSpd, motor torque MotTrq, engine speed EngSpd, engine torque EngTrq, ambient temperature Temp, wheel demand torque WhlTrq, and simulation time SimuTime.
5. The method of claim 1, wherein in step S5, training is completed when the total cumulative reward function is in a stable maximal convergence state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310378883.9A CN116424332B (en) | 2023-04-10 | 2023-04-10 | Energy management strategy enhancement updating method for deep reinforcement learning type hybrid electric vehicle |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116424332A CN116424332A (en) | 2023-07-14 |
CN116424332B true CN116424332B (en) | 2023-11-21 |
Family
ID=87084933
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117184095B (en) * | 2023-10-20 | 2024-05-14 | 燕山大学 | Hybrid electric vehicle system control method based on deep reinforcement learning |
CN117911670A (en) * | 2023-12-01 | 2024-04-19 | 重庆大学 | Algorithm for detecting small target by improving YOLOv5 |
CN118082891B (en) * | 2024-04-26 | 2024-06-18 | 广汽埃安新能源汽车股份有限公司 | Gear optimization method |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102717797A (en) * | 2012-06-14 | 2012-10-10 | 北京理工大学 | Energy management method and system of hybrid vehicle |
CN111845701A (en) * | 2020-08-05 | 2020-10-30 | 重庆大学 | HEV energy management method based on deep reinforcement learning in car following environment |
WO2021083785A1 (en) * | 2019-10-31 | 2021-05-06 | Psa Automobiles Sa | Method for training at least one algorithm for a control device of a motor vehicle, computer program product, and motor vehicle |
WO2021092639A1 (en) * | 2019-11-12 | 2021-05-20 | Avl List Gmbh | Method and system for analysing and/or optimizing a configuration of a vehicle type |
CN114103971A (en) * | 2021-11-23 | 2022-03-01 | 北京理工大学 | Energy-saving driving optimization method and device for fuel cell vehicle |
CN114802180A (en) * | 2022-05-19 | 2022-07-29 | 广西大学 | Mode prediction system and method for hybrid electric vehicle power system coordination control |
CN115470700A (en) * | 2022-09-01 | 2022-12-13 | 吉泰车辆技术(苏州)有限公司 | Hybrid vehicle energy management method based on reinforcement learning training network model |
CN115495997A (en) * | 2022-10-28 | 2022-12-20 | 东南大学 | New energy automobile ecological driving method based on heterogeneous multi-agent deep reinforcement learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7314831B2 (en) * | 2020-02-17 | 2023-07-26 | トヨタ自動車株式会社 | VEHICLE CONTROL DATA GENERATION METHOD, VEHICLE CONTROL DEVICE, VEHICLE CONTROL SYSTEM, AND VEHICLE LEARNING DEVICE |
EP4244770A1 (en) * | 2020-11-12 | 2023-09-20 | Umnai Limited | Architecture for explainable reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Han Shaojian; Zhang Fengqi; Ren Yanfei; Xi Junqiang. Predictive energy management of hybrid electric vehicles based on deep learning. China Journal of Highway and Transport. 2020, (08), p. 3 col. 1 line 15 – p. 7 col. 2 line 17. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116424332B (en) | Energy management strategy enhancement updating method for deep reinforcement learning type hybrid electric vehicle | |
Liessner et al. | Deep reinforcement learning for advanced energy management of hybrid electric vehicles. | |
Dextreit et al. | Game theory controller for hybrid electric vehicles | |
Phan et al. | Intelligent energy management system for conventional autonomous vehicles | |
Liu et al. | Formula-E race strategy development using artificial neural networks and Monte Carlo tree search | |
Wu et al. | Fast velocity trajectory planning and control algorithm of intelligent 4WD electric vehicle for energy saving using time‐based MPC | |
CN105539423A (en) | Hybrid vehicle torque distribution control method and system for protecting battery based on environment temperature | |
Nguyen et al. | Optimal drivetrain design methodology for enhancing dynamic and energy performances of dual-motor electric vehicles | |
Zhu et al. | A deep reinforcement learning framework for eco-driving in connected and automated hybrid electric vehicles | |
Zhu et al. | Energy management of hybrid electric vehicles via deep Q-networks | |
CN113554337B (en) | Plug-in hybrid electric vehicle energy management strategy construction method integrating traffic information | |
CN115495997B (en) | New energy automobile ecological driving method based on heterogeneous multi-agent deep reinforcement learning | |
CN115793445B (en) | Hybrid electric vehicle control method based on multi-agent deep reinforcement learning | |
CN115534929A (en) | Plug-in hybrid electric vehicle energy management method based on multi-information fusion | |
CN112498334B (en) | Robust energy management method and system for intelligent network-connected hybrid electric vehicle | |
Johri et al. | Self-learning neural controller for hybrid power management using neuro-dynamic programming | |
Li et al. | Distributed cooperative energy management system of connected hybrid electric vehicles with personalized non-stationary inference | |
CN112473151A (en) | Information providing device, information providing method, and storage medium | |
Xu et al. | Real-time energy optimization of HEVs under-connected environment: a benchmark problem and receding horizon-based solution | |
You et al. | Real-time energy management strategy based on predictive cruise control for hybrid electric vehicles | |
CN117698685B (en) | Dynamic scene-oriented hybrid electric vehicle self-adaptive energy management method | |
Sim et al. | A control algorithm of an idle stop and go system with traffic conditions for hybrid electric vehicles | |
Yadav et al. | Intelligent energy management strategies for hybrid electric transportation | |
CN117807714B (en) | Adaptive online lifting method for deep reinforcement learning type control strategy | |
Zlocki et al. | Methodology for quantification of fuel reduction potential for adaptive cruise control relevant driving strategies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||