CN113715805A - Rule fusion deep reinforcement learning energy management method based on working condition identification - Google Patents

Info

Publication number
CN113715805A
Authority
CN
China
Prior art keywords
vehicle
port
battery
torque
working condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111177978.1A
Other languages
Chinese (zh)
Other versions
CN113715805B (en
Inventor
周小川
昌诚程
张自宇
栾众楷
赵万忠
周冠
文凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tianhang Intelligent Equipment Research Institute Co ltd
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing Tianhang Intelligent Equipment Research Institute Co ltd
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tianhang Intelligent Equipment Research Institute Co ltd, Nanjing University of Aeronautics and Astronautics filed Critical Nanjing Tianhang Intelligent Equipment Research Institute Co ltd
Priority to CN202111177978.1A priority Critical patent/CN113715805B/en
Publication of CN113715805A publication Critical patent/CN113715805A/en
Application granted granted Critical
Publication of CN113715805B publication Critical patent/CN113715805B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W20/00 Control systems specially adapted for hybrid vehicles
    • B60W20/20 Control strategies involving selection of hybrid configuration, e.g. selection between series or parallel configuration
    • B60W10/00 Conjoint control of vehicle sub-units of different type or different function
    • B60W10/04 Conjoint control of vehicle sub-units of different type or different function including control of propulsion units
    • B60W10/06 Conjoint control of vehicle sub-units of different type or different function including control of propulsion units including control of combustion engines
    • B60W10/08 Conjoint control of vehicle sub-units of different type or different function including control of propulsion units including control of electric propulsion units, e.g. motors or generators
    • B60W20/10 Controlling the power contribution of each of the prime movers to meet required power demand
    • B60W20/11 Controlling the power contribution of each of the prime movers to meet required power demand using model predictive control [MPC] strategies, i.e. control methods based on models predicting performance
    • B60W20/15 Control strategies specially adapted for achieving a particular effect
    • B60W2710/00 Output or target parameters relating to a particular sub-units
    • B60W2710/06 Combustion engines, Gas turbines
    • B60W2710/08 Electric propulsion units
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/60 Other road transportation technologies with climate change mitigation effect
    • Y02T10/62 Hybrid vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Automation & Control Theory (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

The invention discloses a rule fusion deep reinforcement learning energy management method based on working condition identification. Taking a plug-in hybrid electric vehicle as the object, a hybrid power system model is established with a parallel structure as the connection mode of the engine and the motor. A working condition library is built from 8 standard working conditions and divided into kinematic segments, and the vehicle working conditions are classified and identified from 9 representative parameters computed on those segments. The state, action, agent, and penalty function of a deep Q-learning algorithm are then designed, and the resulting rule-fused deep reinforcement learning algorithm is trained under three different training working conditions and used for energy distribution. Energy is thereby distributed and used efficiently, training produces fewer poor samples and is more efficient, and the overall performance of the hybrid power system is high.

Description

Rule fusion deep reinforcement learning energy management method based on working condition identification
Technical Field
The invention relates to the field of energy management of hybrid power systems, in particular to a rule fusion deep reinforcement learning energy management method based on working condition identification.
Background
The hybrid power system is a relatively mature drive configuration for the transition period from fuel vehicles to pure electric vehicles. As a comparatively new configuration, the plug-in hybrid power system has been widely adopted in recent years with the development of battery technology.
Current energy management strategies for hybrid vehicles fall roughly into three categories: rule-based, optimization-based, and learning-based. Rule-based strategies depend heavily on experimental results and experience, tend toward local optimization at the component level, and cannot achieve overall optimal control of the plug-in hybrid power system; moreover, the rules are usually designed for one specific working condition and adapt poorly to others. Optimization-based strategies can only solve for the optimum under a known working condition and do not transfer well to unknown conditions: global optimization is prone to the curse of dimensionality and has poor real-time performance, while instantaneous optimization depends strongly on the model and cannot guarantee optimal distribution over a long horizon. Learning-based strategies generally do not consider working condition adaptability: the algorithm is usually trained under a standard working condition, and when the working condition characteristics change, the strategy can distribute energy unreasonably and lower the operating efficiency of the hybrid power system. In addition, during training the intelligent algorithm leaves the entire action space to the machine for exploration and does not incorporate expert experience. As a result, training produces many poor samples and is inefficient, and the trained strategy may control poorly under certain conditions and degrade the overall performance of the hybrid power system.
To address these problems, the invention provides a rule fusion deep reinforcement learning energy management method based on working condition identification, which distributes the energy of the hybrid power system reasonably.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the shortcomings described in the background, a rule fusion deep reinforcement learning energy management method based on working condition identification. The method addresses the problems that the algorithm produces many poor samples during training, that training efficiency is low, that the trained energy management strategy controls poorly under certain conditions, and that the overall performance of the hybrid power system is low.
To solve the above technical problem, the invention adopts the following technical scheme:
A rule fusion deep reinforcement learning energy management method based on working condition identification, comprising the following steps:
Step 1, establishing a hybrid power system model;
Step 2, classifying and identifying working conditions;
Step 3, designing a rule-fused deep reinforcement learning energy management strategy.
Further, in step 1, a plug-in hybrid electric vehicle is taken as the object, and the hybrid power system model is established with a parallel structure as the connection mode of the engine and the motor. The plug-in hybrid power system comprises a fuel engine, a motor, a vehicle-mounted power battery, a fuel tank, a torque coupler, a clutch, and a 5-speed transmission. The fuel engine is connected with the torque coupler, the motor is directly connected with one end of the torque coupler, the output end of the torque coupler is connected with the clutch, and the other end of the clutch is connected with the 5-speed transmission, which transmits power to the front axle to drive the vehicle.
The power battery adopts a Rint equivalent circuit model:
Figure BDA0003296250310000021
where $I$ is the battery current, positive when discharging and negative when charging; $U_{ocv}$ is the open-circuit voltage of the battery, obtained from an open-circuit voltage test; $R$ is the internal resistance of the battery, which varies with the SOC and is obtained from a lookup table; $P_{bat}$ is the battery power: the battery is discharging when the motor torque $T_m$ is positive and charging when $T_m$ is negative; $n_m$ is the motor speed; $\eta_{bat-d}$ is the battery discharge efficiency; $\eta_{bat-c}$ is the battery charge efficiency; $\eta_m$ is the motor efficiency at the current speed and torque; SOC is the state of charge of the battery; $\Delta t$ is the sampling interval; $Q$ is the battery capacity.
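The Rint model equations are figure images that are not reproduced in this text. As an illustration only, the following Python sketch implements a common textbook form of the Rint model that is consistent with the symbols defined above. The exact expressions, the unit choices (torque in N·m, speed in rpm, capacity in A·h), and all function names are assumptions rather than the patent's own formulas.

```python
import math

def battery_power_w(t_m, n_m, eta_m, eta_dis, eta_chg):
    """Battery power P_bat in W from motor torque (N*m) and speed (rpm)."""
    p_mech = t_m * n_m * 2.0 * math.pi / 60.0  # mechanical motor power, W
    if t_m >= 0.0:
        # Discharging: the battery covers motor and discharge losses.
        return p_mech / (eta_m * eta_dis)
    # Charging: only part of the mechanical power reaches the battery.
    return p_mech * eta_m * eta_chg

def rint_current(p_bat, u_ocv, r):
    """Solve P_bat = U_ocv * I - I^2 * R for the smaller-magnitude root I."""
    disc = u_ocv ** 2 - 4.0 * r * p_bat
    if disc < 0.0:
        raise ValueError("requested power exceeds what the Rint model can deliver")
    return (u_ocv - math.sqrt(disc)) / (2.0 * r)

def soc_step(soc, current, dt, q_ah):
    """Coulomb counting: SOC falls while discharging (I > 0)."""
    return soc - current * dt / (q_ah * 3600.0)
```

For example, with `u_ocv = 300.0` V and `r = 0.1` ohm, a battery power of 2990 W corresponds to a current of about 10 A, since 300·10 − 10²·0.1 = 2990.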
Neglecting the vertical motion and handling stability of the vehicle, the longitudinal driving equation of the vehicle is:
Figure BDA0003296250310000022
where $T_{con}$ is the torque required under the current working condition; $i_g$ is the transmission ratio in the current gear; $i_0$ is the final drive ratio; $\eta_T$ is the total transmission efficiency; $r$ is the wheel radius; $m$ is the vehicle mass; $g$ is the gravitational acceleration; $f$ is the rolling resistance coefficient; $\theta$ is the road slope angle; $C_D$ is the air resistance coefficient; $A$ is the frontal area of the vehicle; $u$ is the vehicle speed; $\delta$ is the rotating mass conversion factor.
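The driving equation itself is a figure image that is not reproduced here. The sketch below uses the standard longitudinal force balance (rolling, grade, aerodynamic, and inertial resistance) with the symbols defined above. It assumes $u$ in km/h with the customary 21.15 aerodynamic constant and a traction (non-braking) case; these choices and the function name `required_torque` are assumptions, not the patent's formula.

```python
import math

def required_torque(u_kmh, accel, i_g, i_0, eta_t, r, m, f, theta, c_d, area,
                    delta, g=9.81):
    """Torque T_con (N*m) at the transmission input needed to follow the cycle.

    u_kmh: vehicle speed in km/h; accel: du/dt in m/s^2; theta: slope in rad.
    """
    f_roll = m * g * f * math.cos(theta)       # rolling resistance
    f_grade = m * g * math.sin(theta)          # grade resistance
    f_aero = c_d * area * u_kmh ** 2 / 21.15   # air resistance (u in km/h)
    f_inertia = delta * m * accel              # acceleration resistance
    f_total = f_roll + f_grade + f_aero + f_inertia
    # Traction case: driveline losses increase the torque that must be supplied.
    return f_total * r / (i_g * i_0 * eta_t)
```

At standstill on a flat road only the rolling resistance term remains, which gives a quick sanity check of the parameter values.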
The vehicle torque coupler adopts a three-port, two-degree-of-freedom mechanical configuration. Port 1 is a unidirectional power input, and ports 2 and 3 are bidirectional power inputs or outputs. Port 1 is connected with the engine crankshaft, port 2 with the motor output shaft, and port 3 with the clutch input end.
The relationship between the torque and the rotating speed at each port of the torque coupler is:
Figure BDA0003296250310000031
where $T_e$ is the engine torque; $n_e$ is the engine speed; $T_3$ is the coupler output torque; $n_3$ is the coupler output speed; $i_e$ is the transmission ratio at the connection between port 1 and the engine crankshaft, taken here as $i_e = 1$; $i_m$ is the transmission ratio at the connection between port 2 and the motor output shaft. Because the motor speed is generally high and must be reduced, the invention takes $i_m = 1.7368$.
According to the direction of energy flow between the engine and the motor in the torque coupler, there are 3 driving modes:
(1) Combined driving mode: ports 1 and 2 are power inputs and port 3 is the power output. The engine and the motor jointly drive the vehicle; the motor torque $T_m$ is positive and the battery is discharging.
(2) Pure electric driving mode: port 1 has no power input and port 2 is the power input, so the motor alone drives the vehicle. The motor torque $T_m$ is positive, the battery is discharging, and the engine is stopped. Because port 1 is a unidirectional power input, the engine is decoupled from the power system, which reduces mechanical loss.
(3) Motor charging mode: the motor operates as a generator and the motor torque $T_m$ is negative. Depending on the vehicle running state, this mode is divided into charging while driving and charging while parked. When charging while driving, the clutch is engaged, port 1 is the power input, and ports 2 and 3 are power outputs: the engine drives the vehicle and at the same time drives the generator, and the battery is charging. When charging while parked, port 1 is the power input, port 2 is the power output, port 3 has no power output, and the clutch is disengaged, which avoids the mechanical loss of the gearbox and front axle; the engine powers only the generator to charge the battery.
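The coupler relations and the three modes above can be sketched in a few lines of Python. The coupler equation is a figure image not reproduced in this text, so the torque-summing form used here (output torque as the ratio-weighted sum of the input torques, input speeds as ratio multiples of the output speed) is an assumption, as are the function names and mode labels. Only the values $i_e = 1$ and $i_m = 1.7368$ come from the text.

```python
I_E = 1.0      # ratio at port 1 (engine), value given in the text
I_M = 1.7368   # ratio at port 2 (motor), value given in the text

def coupler_output(t_e, t_m, n_3):
    """Return (T_3, n_e, n_m) for a torque-summing coupler (assumed form)."""
    t_3 = I_E * t_e + I_M * t_m   # torques add at the output port
    n_e = I_E * n_3               # engine speed tied to the output speed
    n_m = I_M * n_3               # motor spins faster than the output
    return t_3, n_e, n_m

def drive_mode(t_e, t_m, clutch_engaged=True):
    """Classify the operating point into the modes described in the text."""
    if t_m < 0.0:
        # The motor acts as a generator; the clutch state separates the two cases.
        return "charging_while_driving" if clutch_engaged else "charging_while_parked"
    if t_m > 0.0:
        return "combined" if t_e > 0.0 else "pure_electric"
    return "unlisted"  # e.g. engine-only operation, not named among the 3 modes
```

For instance, `drive_mode(100.0, -20.0)` falls in the charging-while-driving case because the motor torque is negative while the clutch is engaged.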
Further, in step 2, a kinematic segment of the vehicle working condition is the driving record from the start of one idling period to the start of the next. It consists of an idling process, in which the vehicle is stationary, and a driving process, which may contain several acceleration, constant-speed, and deceleration phases. To build comprehensive training working conditions for deep reinforcement learning, the invention selects 8 standard working conditions to establish a working condition library and divides the library into kinematic segments. The following 9 representative parameters are then computed for each segment to characterize it: average vehicle speed, average running vehicle speed, maximum vehicle speed, average acceleration, acceleration ratio, deceleration ratio, constant-speed ratio, maximum acceleration, and maximum deceleration.
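The text lists the 9 parameters but does not define them precisely. The following Python sketch shows one plausible way to compute them from a sampled speed trace. The acceleration threshold `a_thresh`, the reading of "average acceleration" as the mean over accelerating samples, and the reading of the three ratios as fractions of time are all assumptions made for illustration.

```python
def segment_features(v_kmh, dt=1.0, a_thresh=0.1):
    """Compute 9 descriptive features of one kinematic segment.

    v_kmh: speed samples in km/h; dt: sampling interval in s;
    a_thresh: |acceleration| (m/s^2) below which a sample counts as cruising.
    """
    v = [x / 3.6 for x in v_kmh]                                # m/s
    a = [(v[i + 1] - v[i]) / dt for i in range(len(v) - 1)]     # m/s^2
    running = [x for x in v if x > 0.0]
    acc = [x for x in a if x > a_thresh]
    dec = [x for x in a if x < -a_thresh]
    # Constant-speed samples: small acceleration while the vehicle is moving.
    cruise = sum(1 for i, x in enumerate(a)
                 if abs(x) <= a_thresh and v[i + 1] > 0.0)
    n = len(a)
    return {
        "mean_speed": sum(v) / len(v),
        "mean_running_speed": sum(running) / len(running) if running else 0.0,
        "max_speed": max(v),
        "mean_accel": sum(acc) / len(acc) if acc else 0.0,
        "accel_ratio": len(acc) / n,
        "decel_ratio": len(dec) / n,
        "cruise_ratio": cruise / n,
        "max_accel": max(a),
        "max_decel": min(a),
    }
```

Called on a short trace such as `[0, 0, 10, 20, 20, 10, 0]` (km/h at 1 s intervals), the function returns a dictionary whose three ratios leave the idle samples unassigned, so they need not sum to 1.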
The characteristic parameters of each kinematic segment describe the segment, but they are not independent and are correlated with one another. The invention therefore applies principal component analysis to reduce the dimension of the characteristic parameters while retaining as much of the working condition information as possible, which reduces the classification difficulty and improves reliability. The procedure is as follows:
(1) Standardize the data:
Figure BDA0003296250310000041
where $x_{ij}$ is the $j$-th characteristic parameter of the $i$-th kinematic segment; $\bar{x}_j$ is the sample mean; $s_j$ is the standard deviation; $i = 1, 2, 3, \ldots, n$; $j = 1, 2, 3, \ldots, m$.
(2) Calculate the covariance matrix $C$ of the standardized matrix $Z$:
Figure BDA0003296250310000043
(3) Perform the eigenvalue decomposition of the covariance matrix $C$:
$C = Q \Sigma Q^{-1}$ (6)
where $Q$ is the matrix formed by the eigenvectors and $\Sigma$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$.
(4) Calculate the contribution rates $p_1, p_2, \ldots, p_m$ of the eigenvectors and the cumulative contribution rate,
where
Figure BDA0003296250310000044
k=1,2,…,m。
cumulative contribution rate PjIs the accumulation of the first k principal component contribution rates.
Figure BDA0003296250310000045
(5) Take the eigenvectors corresponding to the retained principal components as the transformation matrix, and multiply the data matrix by it to perform the principal component mapping, which gives the dimension-reduced characteristic parameters of each kinematic segment.
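Steps (1) to (5) can be sketched with NumPy as follows. Several details are assumptions, because the corresponding equations are figure images not reproduced here: the sample standard deviation and the covariance use an $n-1$ divisor, and components are retained until the cumulative contribution rate reaches a hypothetical threshold `cum_threshold` (0.85 by default). The function name `pca_reduce` is likewise illustrative.

```python
import numpy as np

def pca_reduce(X, cum_threshold=0.85):
    """PCA on an (n segments) x (m features) matrix, following steps (1)-(5)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    # (1) Standardize each feature column (columns must not be constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # (2) Covariance matrix of the standardized data.
    C = (Z.T @ Z) / (n - 1)
    # (3) Eigen-decomposition; eigh suits symmetric matrices, ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Contribution rates p_k and cumulative contribution rates P_k.
    p = eigvals / eigvals.sum()
    P = np.cumsum(p)
    k = min(int(np.searchsorted(P, cum_threshold)) + 1, m)
    # (5) Project onto the first k eigenvectors.
    return Z @ eigvecs[:, :k], p, P
```

A call such as `scores, p, P = pca_reduce(features)` returns the reduced features together with both contribution-rate vectors, so the retained dimension can be inspected afterwards.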
Then, fuzzy C-means clustering is applied to the kinematic segments using the principal component results. The process is as follows:
(1) Set the number of clusters $n_c$ and the weighting index $b$;
(2) Initialize each cluster center $m_j$;
(3) Calculate the membership function of every sample under the current cluster centers:
Figure BDA0003296250310000051
where $\mu_j(x_i)$ is the membership function of the $i$-th sample with respect to the $j$-th class.
(4) Calculate each cluster center under the current membership function:
Figure BDA0003296250310000052
(5) Repeat steps (3) and (4) until the algorithm converges or the maximum number of iterations is reached.
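The loop in steps (1) to (5) can be sketched in plain Python using the standard fuzzy C-means update rules. The membership and center equations are figure images not reproduced in this text, so the standard formulas used below are assumed to match them. The convergence test on the largest center shift and the function name are illustrative choices, and the caller supplies the initial centers of step (2).

```python
def fuzzy_c_means(samples, centers, b=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy C-means; returns (centers, memberships)."""
    centers = [list(c) for c in centers]
    dim = len(samples[0])
    memberships = []
    for _ in range(max_iter):
        # Step (3): membership of each sample in each class.
        memberships = []
        for x in samples:
            inv = [max(sum((xi - ci) ** 2 for xi, ci in zip(x, c)), 1e-12)
                   ** (-1.0 / (b - 1.0)) for c in centers]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Step (4): centers as membership-weighted means of the samples.
        new_centers = []
        for j in range(len(centers)):
            weights = [u[j] ** b for u in memberships]
            wsum = sum(weights)
            new_centers.append([sum(w * x[d] for w, x in zip(weights, samples)) / wsum
                                for d in range(dim)])
        # Step (5): stop when no center coordinate moves by more than tol.
        shift = max(abs(n - o) for nc, oc in zip(new_centers, centers)
                    for n, o in zip(nc, oc))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Each row of the returned membership list sums to 1, and a hard class label for a segment can be taken as the index of its largest membership.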
To determine the number of clusters $n_c$, the function $L(n_c)$ is used as the evaluation index:
Figure BDA0003296250310000053
where the numerator is the sum of the inter-class distances and the denominator is the sum of the intra-class distances, so a larger $L(n_c)$ indicates a better classification.
According to the fuzzy clustering result, the kinematic segments of each class are merged to form 3 classes of kinematic segment libraries. A certain number of kinematic segments are then randomly drawn from the 3 libraries and randomly arranged to obtain 3 working conditions for training.
Finally, an LVQ neural network is trained on the 3 training working conditions to identify the working condition type, with the following specific steps:
(1) Combine training working conditions 1, 2, and 3. Using a sliding window algorithm, calculate the 9 corresponding characteristic parameters of the data in each window as the input of the LVQ neural network, and train with the working condition category in vector form as the label.
(2) If the window is too long, the window data may contain more than one type of working condition, which makes identification harder. If the window is too short, the working condition characteristic information is incomplete, which lowers the identification accuracy and in turn the fuel economy of the whole vehicle. Balancing the two, the method uses a window length of 35 s for rolling extraction of the working condition characteristic parameters.
(3) Train the LVQ neural network. The selected hyper-parameters are: 500 nodes in the LVQ competitive layer, a learning rate of 0.0005, the learning function Learnlv1, and 50 training iterations.
(4) Verify the accuracy of the LVQ neural network. Apply a sliding window of length 35 s to the verification working condition, feed the rolling-extracted characteristic parameters into the trained LVQ neural network, and apply an indexing operation to the output to obtain the identification result for the verification working condition.
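The classifier in steps (1) to (4) can be illustrated with a minimal LVQ1 sketch in plain Python. This is a simplified stand-in rather than the network described above: it uses one prototype per class instead of 500 competitive-layer nodes, a larger illustrative learning rate, and its own function names, all of which are assumptions. The update rule is the classic LVQ1 rule, which moves the winning prototype toward a sample of the same class and away from a sample of a different class.

```python
import random

def _sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def _nearest(prototypes, x):
    """Index of the prototype closest to x."""
    return min(range(len(prototypes)), key=lambda j: _sqdist(prototypes[j][0], x))

def train_lvq1(X, y, lr=0.05, epochs=50, seed=0):
    """Train LVQ1 with one prototype per class; returns [(weights, label), ...]."""
    rng = random.Random(seed)
    prototypes = []
    for label in sorted(set(y)):
        # Initialise each prototype at the first sample of its class.
        first = next(x for x, t in zip(X, y) if t == label)
        prototypes.append(([float(v) for v in first], label))
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            w, label = prototypes[_nearest(prototypes, X[i])]
            sign = 1.0 if label == y[i] else -1.0   # attract or repel the winner
            for d in range(len(w)):
                w[d] += sign * lr * (X[i][d] - w[d])
    return prototypes

def predict(prototypes, x):
    """Class label of the nearest prototype."""
    return prototypes[_nearest(prototypes, x)][1]
```

In the setting of the text, each row of `X` would hold the 9 characteristic parameters of one 35 s window and `y` the corresponding working condition category.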
Further, the design in step 3 covers the state, the action, the agent, and the penalty function. The state space consists of the required torque Tr, the battery SOC, and the current transmission ratio of the transmission. The action variables are the engine output torque Te and the gear shifting action Ag. The rule-fused agent builds on the way a rule-based algorithm distributes energy: the rules are fused into deep Q-learning to obtain a rule-fused deep Q-learning algorithm, which increases the number of effective samples in the sample pool. A plug-in hybrid electric vehicle generally keeps the battery SOC within a certain range to protect the battery cycle life and to retain a small reserve of electric energy for special conditions; the SOC is therefore used as a rule control quantity, with the efficient SOC working range set to 0.2 to 0.8. The torque of the power system is also used as a rule control quantity.
The penalty function is calculated as follows:
Figure BDA0003296250310000061
where $b$ is the fuel consumption rate, obtained from the engine universal characteristic map at the current engine torque and speed; $\rho$ is the fuel density; $g$ is the gravitational acceleration; $C_f$ is the price of fuel per liter; $C_e$ is the price of electrical energy per kWh; $\lambda_A$ is the weighting factor of the shift action value; $\lambda_{p1}$ is the penalty factor for a poor shift strategy; $\lambda_{p2}$ is the penalty factor for the SOC exceeding its upper or lower usage limit.
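The text does not spell out how the rules are fused into the deep Q-learning agent. One common way to realize such a fusion, shown here purely as an assumed illustration, is to let the rules mask out actions before epsilon-greedy selection, so that exploration produces fewer poor samples. In the sketch below only the SOC range of 0.2 to 0.8 and the ratio 1.7368 come from the text; the specific rules, the torque split relation, and the function names are hypothetical, and the gear shifting action is omitted for brevity.

```python
import random

SOC_MIN, SOC_MAX = 0.2, 0.8   # efficient SOC working range given in the text
I_M = 1.7368                  # motor-side coupler ratio given in the text

def allowed_actions(soc, t_req, engine_torques):
    """Indices of engine-torque actions permitted by two illustrative SOC rules."""
    allowed = []
    for k, t_e in enumerate(engine_torques):
        t_m = (t_req - t_e) / I_M        # motor torque implied by this action
        if soc <= SOC_MIN and t_m > 0.0:
            continue                     # do not discharge a depleted battery
        if soc >= SOC_MAX and t_m < 0.0:
            continue                     # do not charge a full battery
        allowed.append(k)
    # Fall back to the full action set if the rules exclude everything.
    return allowed or list(range(len(engine_torques)))

def select_action(q_values, allowed, epsilon, rng=random):
    """Epsilon-greedy selection restricted to the rule-approved actions."""
    if rng.random() < epsilon:
        return rng.choice(allowed)                    # explore within the rules
    return max(allowed, key=lambda k: q_values[k])    # exploit the best allowed Q
```

With a low SOC, for example, actions that would require positive motor torque are never sampled, so the replay pool is not filled with transitions that the SOC penalty would reject anyway.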
Compared with the prior art, the invention adopting the technical scheme has the following beneficial effects:
1. The designed rule-fused deep reinforcement learning algorithm is trained under three different training working conditions, yielding three deep neural networks, net1, net2, and net3, each suited to a different working condition category, for energy distribution of the hybrid power system.
2. In actual use, a sliding window algorithm first calculates the 9 corresponding characteristic parameters of the window data, which are fed into the trained LVQ neural network to obtain the current working condition category. The rule-fused deep reinforcement learning algorithm trained under the corresponding category then distributes the energy of the hybrid power system, so that energy is distributed and used efficiently.
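The online procedure in item 2 amounts to a small dispatch loop: buffer the most recent speed samples, classify the window, and call the policy trained for that category. The following Python sketch shows this structure. The class name, the callables passed in, the default category used before the first window is full, and the 1 Hz sampling (so that 35 samples span 35 s) are all assumptions for illustration.

```python
from collections import deque

class ConditionAwareController:
    """Selects one of several trained policies from a sliding speed window."""

    def __init__(self, extract_features, classify, policies,
                 window_len=35, default_category=0):
        self.extract_features = extract_features  # window -> feature vector
        self.classify = classify                  # features -> category key
        self.policies = policies                  # category key -> policy callable
        self.window = deque(maxlen=window_len)    # keeps the newest samples only
        self.category = default_category

    def step(self, speed, state):
        """Add one speed sample and return the action of the active policy."""
        self.window.append(speed)
        if len(self.window) == self.window.maxlen:
            # Re-identify the working condition once a full window is available.
            features = self.extract_features(list(self.window))
            self.category = self.classify(features)
        return self.policies[self.category](state)
```

In the setting of the text, `policies` would map the three working condition categories to net1, net2, and net3, and `classify` would wrap the trained LVQ neural network.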
Drawings
FIG. 1 is a block diagram of a plug-in hybrid powertrain system;
FIG. 2 is a battery Rint equivalent circuit model;
FIG. 3 is a flow chart of an energy management policy algorithm.
Detailed Description
The technical scheme of the invention is explained in further detail below with reference to the drawings.
The present invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows the structure of the plug-in hybrid power system, which consists of a fuel engine, an electric motor, a vehicle-mounted power battery, a fuel tank, a torque coupler, a clutch, and a 5-speed transmission. The fuel engine is connected with the torque coupler, the motor is directly connected with one end of the torque coupler, the output end of the torque coupler is connected with the clutch, and the other end of the clutch is connected with the 5-speed transmission, which transmits power to the front axle to drive the vehicle. Because the vehicle model includes a 5-speed transmission, the gear directly determines the torque required from the power system and thus affects the power reserve capacity of the vehicle; the torque of the power system is therefore used as a rule control quantity.
The engine of the plug-in hybrid electric vehicle is the power source both for driving the vehicle and for replenishing the battery charge, and is more important than the motor; the engine torque is therefore used as the first-level rule control quantity and the battery SOC as the second-level rule control quantity. Because the motor torque is larger and its power reserve capacity is stronger, the motor torque is used as the third-level rule control quantity.
From the battery Rint equivalent circuit model shown in Fig. 2, the following is obtained:
Figure BDA0003296250310000071
Figure BDA0003296250310000081
Figure BDA0003296250310000082
where $I$ is the battery current, positive when discharging and negative when charging; $U_{ocv}$ is the open-circuit voltage of the battery, obtained from an open-circuit voltage test; $R$ is the internal resistance of the battery, which varies with the SOC and is obtained from a lookup table; $P_{bat}$ is the battery power: the battery is discharging when the motor torque $T_m$ is positive and charging when $T_m$ is negative; $n_m$ is the motor speed; $\eta_{bat-d}$ is the battery discharge efficiency; $\eta_{bat-c}$ is the battery charge efficiency; $\eta_m$ is the motor efficiency at the current speed and torque; SOC is the state of charge of the battery; $\Delta t$ is the sampling interval; $Q$ is the battery capacity.
Fig. 3 shows the flow chart of the energy management strategy algorithm.
First, principal component analysis is used to reduce the dimension of the characteristic parameters of the kinematic segments in the working conditions, fuzzy clustering is used to classify the kinematic segments, and the working conditions are recombined according to the classification results to obtain low-speed, medium-speed, and high-speed training working conditions; an LVQ neural network is then trained to identify the working condition type. Next, rules are established with the engine torque, the SOC, and the motor torque as rule control variables and the driving mode as the output; the rules are fused into the deep reinforcement learning agent, and, together with the designed penalty function, the rule-fused deep reinforcement learning energy management strategy is trained under each of the three working conditions. In actual use, characteristic parameters are first extracted from the current working condition with a sliding window algorithm and fed into the trained LVQ neural network to obtain the current working condition category; the rule-fused deep reinforcement learning energy management strategy trained under the corresponding category is then selected to distribute the energy of the hybrid power system.
The embodiments above describe the objects, technical solutions, and advantages of the invention in further detail. It should be understood that they are only illustrative and do not limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within its protection scope.

Claims (4)

1. A rule fusion deep reinforcement learning energy management method based on working condition identification is characterized by comprising the following steps:
step 1, establishing a hybrid power system model;
step 2, classifying and identifying working conditions;
step 3, designing a rule-fused deep reinforcement learning energy management strategy.
2. The rule fusion deep reinforcement learning energy management method based on working condition identification, characterized in that in step 1 a plug-in hybrid electric vehicle is taken as the object, and a hybrid power system model is established with a parallel structure as the connection mode of the engine and the motor; the plug-in hybrid power system consists of a fuel engine, a motor, a vehicle-mounted power battery, a fuel tank, a torque coupler, a clutch, and a 5-speed transmission; the fuel engine is connected with the torque coupler, the motor is directly connected with one end of the torque coupler, the output end of the torque coupler is connected with the clutch, and the other end of the clutch is connected with the 5-speed transmission, which transmits power to the front axle to drive the vehicle;
the power battery adopts a Rint equivalent circuit model:
Figure FDA0003296250300000011
Figure FDA0003296250300000012
Figure FDA0003296250300000013
in the formula, I is the battery current, positive when discharging and negative when charging; U_ocv is the open-circuit voltage of the battery, which can be obtained from an open-circuit voltage test; R is the internal resistance of the battery, whose value changes with the SOC and can be obtained from a lookup table; P_bat is the battery power; when the motor torque T_m is positive the battery is in a discharging state, and when the motor torque T_m is negative the battery is in a charging state; n_m is the motor speed; η_bat-d is the discharging efficiency of the battery; η_bat-c is the charging efficiency of the battery; η_m is the motor efficiency at the current speed and torque; SOC is the state of charge of the battery; Δt is the sampling interval; Q is the battery capacity;
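As an illustration only, the battery model can be sketched with the commonly used Rint relations, which may differ in detail from the equations shown in the figures. The function names, the SI units (W, V, Ω, rpm, ampere-seconds) and the exact placement of the efficiency terms are assumptions made for this sketch.

```python
import math

def battery_power(t_m, n_m, eta_m, eta_dis, eta_chg):
    """Battery power in W; positive means discharging (assumed sign convention)."""
    p_mech = t_m * n_m * 2.0 * math.pi / 60.0  # motor shaft power, W (n_m in rpm)
    if t_m >= 0:
        # Driving: the battery must supply the shaft power plus the losses.
        return p_mech / (eta_m * eta_dis)
    # Generating: only part of the shaft power reaches the battery.
    return p_mech * eta_m * eta_chg

def battery_current(p_bat, u_ocv, r):
    """Solve P_bat = U_ocv * I - I^2 * R for the smaller root (Rint model)."""
    disc = u_ocv ** 2 - 4.0 * r * p_bat
    if disc < 0:
        raise ValueError("requested power exceeds what the battery can deliver")
    return (u_ocv - math.sqrt(disc)) / (2.0 * r)

def soc_step(soc, current, dt, q_as):
    """Ampere-hour counting; q_as is the capacity in ampere-seconds."""
    return soc - current * dt / q_as
```

With this sign convention a positive motor torque gives a positive current and a falling SOC, and a negative motor torque gives a negative current and a rising SOC.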
neglecting the vertical motion and the handling stability of the vehicle, the longitudinal driving equation of the vehicle is:
Figure FDA0003296250300000014
in the formula, T_con is the torque required by the current working condition; i_g is the transmission ratio of the transmission in the current gear; i_0 is the final drive ratio; η_T is the total transmission efficiency; r is the wheel radius; m is the mass of the whole vehicle; g is the gravitational acceleration; f is the rolling resistance coefficient; θ is the ramp angle; C_D is the air resistance coefficient; A is the frontal area of the vehicle; u is the vehicle speed; δ is the rotating mass conversion coefficient;
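A minimal sketch of the required-torque calculation follows, assuming the textbook longitudinal force balance (rolling, grade, aerodynamic and inertial resistance) referred to the transmission input. The function name, the SI units (speed in m/s) and the air density value are assumptions; the equation in the figure may be written with different unit constants.

```python
import math

def required_torque(v, a, i_g, i_0, eta_t, r, m, f, theta, c_d, area,
                    delta, g=9.81, rho_air=1.2258):
    """Torque required at the transmission input, in N*m.

    v is the vehicle speed in m/s and a the acceleration in m/s^2.
    rho_air (kg/m^3) is an assumed standard air density.
    """
    f_roll = m * g * f * math.cos(theta)           # rolling resistance
    f_grade = m * g * math.sin(theta)              # grade resistance
    f_aero = 0.5 * rho_air * c_d * area * v ** 2   # aerodynamic drag
    f_inertia = delta * m * a                      # acceleration resistance
    return (f_roll + f_grade + f_aero + f_inertia) * r / (i_g * i_0 * eta_t)
```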
the vehicle torque coupler adopts a three-port, two-degree-of-freedom mechanical configuration: port 1 is a unidirectional power input, and ports 2 and 3 are bidirectional power inputs or outputs; port 1 is connected to the engine crankshaft, port 2 to the motor output shaft, and port 3 to the clutch input end;
the relationship between the torque and the rotating speed of each port of the torque coupler is as follows:
T_3 = i_e T_e + i_m T_m
Figure FDA0003296250300000021
in the formula, T_e is the engine torque; n_e is the engine speed; T_3 is the coupler output torque; n_3 is the coupler output speed; i_e is the transmission ratio at the connection between port 1 and the engine crankshaft, taken here as i_e = 1; i_m is the transmission ratio at the connection between port 2 and the motor output shaft; because the motor speed is generally high and needs to be reduced, i_m is taken as 1.7368 in the invention;
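The coupler relations can be sketched as follows. The torque equation and the ratio values are taken from the text; the speed relation n_e = i_e n_3, n_m = i_m n_3 is an assumption consistent with a fixed-ratio torque coupler and with power conservation, and the function name is hypothetical.

```python
I_E = 1.0      # ratio at port 1 (engine crankshaft), from the text
I_M = 1.7368   # ratio at port 2 (motor output shaft), from the text

def coupler_output(t_e, t_m, n_3):
    """Return (T_3, n_e, n_m) for given engine/motor torques and output speed."""
    t_3 = I_E * t_e + I_M * t_m   # torque summation at port 3
    n_e = I_E * n_3               # assumed speed relation for port 1
    n_m = I_M * n_3               # assumed speed relation for port 2
    return t_3, n_e, n_m
```

Under this assumption the output power T_3 n_3 equals the sum of the input powers T_e n_e + T_m n_m.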
according to the direction of energy flow of the engine and the motor in the torque coupler, there are 3 driving modes:
(1) combined driving mode: in this mode, port 1 and port 2 are power input ends and port 3 is the power output end; the engine and the motor jointly provide power to drive the vehicle, the motor torque T_m is positive, and the battery is in a discharging state;
(2) pure electric driving mode: in this mode, port 1 has no power input and port 2 is the power input end; the motor drives the vehicle alone, the motor torque T_m is positive, the battery is in a discharging state, and the engine is stopped;
(3) motor charging mode: in this mode, the motor operates as a generator and the motor torque T_m is negative; depending on the vehicle running state, this mode is divided into charging while driving and charging while parked; when charging while driving, the clutch is engaged, port 1 is the power input end, port 2 and port 3 are power output ends, and the engine both drives the vehicle and drives the generator, so the battery is in a charging state; when charging while parked, port 1 is the power input end, port 2 is the power output end, port 3 has no power output, the clutch is disengaged to reduce the mechanical losses caused by the transmission and the front axle, and the engine only powers the generator to charge the battery.
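The mode definitions above can be expressed as a small classifier. This is a hypothetical helper, not part of the claimed method: the mode names are invented for the sketch, and any operating point outside the three listed modes (for example engine-only driving with zero motor torque) is reported as unclassified.

```python
def driving_mode(t_e, t_m, clutch_engaged):
    """Map engine torque, motor torque and clutch state to a driving mode."""
    if t_m < 0:
        # Motor acts as a generator; the clutch state separates the two sub-modes.
        return "charging_while_driving" if clutch_engaged else "charging_while_parked"
    if t_m > 0 and t_e > 0:
        return "combined_drive"
    if t_m > 0 and t_e == 0:
        return "pure_electric"
    return "unclassified"  # not one of the three modes described in the text
```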
3. The rule fusion deep reinforcement learning energy management method based on working condition identification according to claim 1, characterized in that a kinematic segment of the vehicle working condition in step 2 represents the driving state of the vehicle in the period from the start of one idling phase to the start of the next idling phase, and comprises an idling process, in which the vehicle is stationary, and a driving process, which comprises several acceleration, constant-speed and deceleration behaviors of the vehicle; to establish the deep reinforcement learning training working conditions comprehensively, 8 standard working conditions are selected to build a working condition library, the library is divided into kinematic segments, and the following 9 representative parameters are calculated for each segment to characterize it: average vehicle speed, average running vehicle speed, maximum vehicle speed, average acceleration, acceleration ratio, deceleration ratio, constant speed ratio, maximum acceleration and maximum deceleration;
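The segmentation and feature extraction can be sketched as below. The text names the 9 parameters but does not define them precisely, so several choices here are assumptions: a uniformly sampled speed trace, idle defined as exactly zero speed, an acceleration threshold `a_thr` of 0.1 (speed units per second), average acceleration taken over accelerating samples only, and ratios taken over all samples of the segment.

```python
import numpy as np

def split_segments(v):
    """Split a speed trace into segments running from one idle start to the next."""
    v = np.asarray(v, dtype=float)
    idle = v == 0
    # An idle phase starts where the vehicle is idle and the previous sample was not.
    starts = [i for i in range(len(v)) if idle[i] and (i == 0 or not idle[i - 1])]
    return [v[s:e] for s, e in zip(starts[:-1], starts[1:])]

def segment_features(seg, dt=1.0, a_thr=0.1):
    """Return the 9 characteristic parameters of one kinematic segment."""
    a = np.diff(seg) / dt
    moving = seg > 0
    acc, dec = a > a_thr, a < -a_thr
    cruise = (~acc) & (~dec) & moving[1:]
    return np.array([
        seg.mean(),                                   # average vehicle speed
        seg[moving].mean() if moving.any() else 0.0,  # average running speed
        seg.max(),                                    # maximum vehicle speed
        a[acc].mean() if acc.any() else 0.0,          # average acceleration
        acc.mean(),                                   # acceleration ratio
        dec.mean(),                                   # deceleration ratio
        cruise.mean(),                                # constant speed ratio
        a.max(),                                      # maximum acceleration
        a.min(),                                      # maximum deceleration
    ])
```

The same `segment_features` function can be applied to a fixed-length window of the speed trace instead of a kinematic segment.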
the characteristic parameters of each kinematic segment characterize that segment, but they are not independent of one another and are correlated to some degree; principal component analysis is therefore used to reduce the dimension of the characteristic parameters of the kinematic segments while covering the working condition characteristics as fully as possible, which reduces the classification difficulty and improves reliability; the specific implementation process is as follows:
(1) normalizing the data:
Figure FDA0003296250300000031
where x_ij is the j-th characteristic parameter of the i-th kinematic segment; x̄_j (Figure FDA0003296250300000032) is the sample mean; s_j is the standard deviation; i = 1, 2, 3, …, n; j = 1, 2, 3, …, m;
(2) calculating the covariance matrix C of the Z matrix
Figure FDA0003296250300000033
(3) performing eigenvalue decomposition of the covariance matrix C
C = QΣQ^{-1}
where Q is the matrix formed by the eigenvectors and Σ is a diagonal matrix whose diagonal elements are the eigenvalues λ_1, λ_2, …, λ_m
(4) calculating the contribution rate p_1, p_2, …, p_m of each eigenvector and the cumulative contribution rate;
where
Figure FDA0003296250300000034
the cumulative contribution rate P_j is the accumulated contribution rate of the first k principal components;
Figure FDA0003296250300000041
(5) taking the eigenvectors corresponding to the selected principal components as the transformation matrix, and multiplying the data matrix by the transformation matrix to map the data onto the principal components, obtaining the dimension-reduced characteristic parameters of the kinematic segments;
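Steps (1) to (5) can be sketched with NumPy as follows. The function name and the 0.85 cumulative-contribution threshold are assumptions made for the illustration; the text does not state which threshold is used to select the principal components.

```python
import numpy as np

def pca_reduce(x, threshold=0.85):
    """Reduce an (n segments x m parameters) matrix by principal component analysis."""
    # (1) standardize every column with its sample mean and standard deviation
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # (2) covariance matrix of the standardized data
    c = np.cov(z, rowvar=False)
    # (3) eigen-decomposition (eigh, since C is symmetric), sorted in descending order
    eigvals, eigvecs = np.linalg.eigh(c)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) individual and cumulative contribution rates
    contrib = eigvals / eigvals.sum()
    cum = np.cumsum(contrib)
    k = min(int(np.searchsorted(cum, threshold)) + 1, len(cum))
    # (5) project the data onto the first k eigenvectors
    return z @ eigvecs[:, :k], contrib, cum
```

The returned score matrix has one row per kinematic segment and one column per retained principal component.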
then, fuzzy C-means clustering is used to perform cluster analysis on the kinematic segments according to the obtained principal component results, with the following process:
(1) setting the number of clusters n_c and the weighting index b;
(2) initializing each cluster center m_j;
(3) calculating the membership functions of all samples under the current cluster centers:
Figure FDA0003296250300000042
where μ_j(x_i) is the membership of the i-th sample in the j-th class;
(4) calculating the cluster centers under the current membership functions:
Figure FDA0003296250300000043
(5) repeating steps (3) and (4) until the algorithm converges or the maximum number of iterations is reached;
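A minimal sketch of the iteration, assuming the standard fuzzy C-means update rules with Euclidean distance; the membership and center formulas in the figures may be written differently. The function name, the random initialization from data points, the convergence tolerance and the default values are assumptions.

```python
import numpy as np

def fuzzy_c_means(x, n_c=3, b=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centers, memberships) for data x of shape (n samples, n features)."""
    rng = np.random.default_rng(seed)
    # (2) initialize the centers from randomly chosen samples (assumed scheme)
    centers = x[rng.choice(len(x), size=n_c, replace=False)]
    for _ in range(max_iter):
        # (3) memberships under the current centers
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                     # avoid division by zero
        inv = d ** (-2.0 / (b - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1
        # (4) centers under the current memberships
        w = u ** b
        new_centers = (w.T @ x) / w.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        # (5) stop once the centers no longer move
        if shift < tol:
            break
    # u holds the memberships evaluated in the last iteration
    return centers, u
```

A hard class label for each kinematic segment can be taken as `u.argmax(axis=1)`.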
to determine the number of clusters n_c, the function L(n_c) is used as the evaluation index, with the formula:
Figure FDA0003296250300000044
in the formula, the numerator represents the sum of the inter-class distances and the denominator represents the sum of the intra-class distances, so the larger the value of L(n_c), the better the classification effect;
according to the fuzzy clustering result, the kinematic segments of the different classes form 3 kinematic segment libraries; a certain number of kinematic segments are then randomly extracted from each of the 3 libraries and randomly arranged to obtain 3 working conditions for training;
finally, the LVQ neural network is trained to identify the working condition type using the 3 training working conditions, with the following specific steps:
(1) combining training working conditions 1, 2 and 3, calculating the 9 corresponding characteristic parameters of the data in each window with a sliding window algorithm as the input of the LVQ neural network, and training with the vector form of the working condition category as the label;
(2) if the window is too long, the window data may contain more than one type of working condition, which increases the identification difficulty; if the window is too short, the working condition characteristic information is incomplete, which reduces the identification accuracy and hence the fuel economy of the whole vehicle; considering both, the working condition characteristic parameters are extracted in a rolling manner with a window length of 35 s;
(3) training the LVQ neural network with the following hyper-parameters: 500 nodes in the LVQ competitive layer, a learning rate of 0.0005, learning function type Learnlv1, and 50 training iterations;
(4) verifying the accuracy of the LVQ neural network: a sliding window operation with a length of 35 s is applied to the verification working condition, the characteristic parameters extracted in a rolling manner are used as the input of the trained LVQ neural network, and an indexing operation is applied to the output to obtain the identification result for the verification working condition.
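The window extraction and the LVQ training can be sketched as below. This is a plain NumPy version of the LVQ1 update rule, not the toolbox implementation referred to by Learnlv1. The defaults (500 prototypes, learning rate 0.0005, 50 passes, 35-sample window at an assumed 1 Hz sampling rate) mirror the text, while the function names, the even split of prototypes across classes and their initialization from random samples are assumptions.

```python
import numpy as np

def sliding_windows(v, length=35, step=1):
    """Return all windows of `length` samples taken from the speed trace v."""
    v = np.asarray(v, dtype=float)
    return [v[s:s + length] for s in range(0, len(v) - length + 1, step)]

def train_lvq1(x, y, n_proto=500, lr=0.0005, epochs=50, seed=0):
    """Train LVQ1 prototypes on features x (n x m) with integer labels y."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Assumed: equal number of prototypes per class, initialized from class samples.
    proto_y = np.repeat(classes, n_proto // len(classes))
    protos = np.array([x[rng.choice(np.flatnonzero(y == c))] for c in proto_y],
                      dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            j = np.argmin(np.linalg.norm(protos - x[i], axis=1))  # winning prototype
            # Move the winner toward the sample if the class matches, away otherwise.
            sign = 1.0 if proto_y[j] == y[i] else -1.0
            protos[j] += sign * lr * (x[i] - protos[j])
    return protos, proto_y

def predict_lvq(protos, proto_y, x):
    """Assign each sample the class of its nearest prototype."""
    d = np.linalg.norm(x[:, None, :] - protos[None, :, :], axis=2)
    return proto_y[d.argmin(axis=1)]
```

In use, each window would first be converted to its 9 characteristic parameters, and the resulting feature vectors would be passed to `train_lvq1` or `predict_lvq`.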
4. The rule fusion deep reinforcement learning energy management method based on working condition identification according to claim 1, characterized in that the design in step 3 comprises the state, the action, the agent and the penalty function; the state space is selected as the required torque T_r, the battery SOC and the current transmission ratio of the transmission; the action variables are selected as the engine output torque T_e and the gear shifting action A_g; the rule-fused agent design follows the idea of energy distribution by a rule algorithm: the rules are fused into the deep Q-learning agent to obtain a rule-fused deep Q-learning algorithm, which increases the number of effective samples in the sample pool; a plug-in hybrid electric vehicle generally keeps the battery SOC within a certain working range to ensure the battery cycle life and to keep a small reserve of electric energy for special conditions, so the SOC is used as a rule control quantity and the efficient SOC working range is set to 0.2 to 0.8; the torque of the power system is also used as a rule control quantity;
the penalty function is calculated as follows:
Figure FDA0003296250300000051
where b is the fuel consumption rate, which can be obtained from the engine universal characteristic map according to the current engine torque and speed; ρ is the fuel density; g is the gravitational acceleration; C_f is the price per liter of fuel; C_e is the price of electrical energy per kWh; λ_A is the shift action value weighting factor; λ_p1 is the penalty factor under a poor shift strategy; λ_p2 is the penalty factor for the SOC exceeding its upper or lower usage limit.
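One way to picture the fusion of rules into the agent is to let the rules restrict the set of candidate actions before the Q-values are compared. The sketch below is hypothetical and is not the rule set of the invention: it only uses the 0.2 to 0.8 SOC range and the torque coupler ratios stated in the text, derives the motor torque from a candidate engine torque, and forbids battery discharge at low SOC and battery charging at high SOC. All function names are invented for the illustration.

```python
import numpy as np

SOC_MIN, SOC_MAX = 0.2, 0.8   # efficient SOC working range from the text
I_E, I_M = 1.0, 1.7368        # torque coupler ratios from the text

def rule_mask(t_req, soc, engine_torques):
    """Boolean mask of engine-torque actions permitted by the assumed SOC rules."""
    t_m = (t_req - I_E * engine_torques) / I_M   # motor torque implied by each action
    allowed = np.ones(len(engine_torques), dtype=bool)
    if soc <= SOC_MIN:
        allowed &= t_m <= 0    # no further discharging at low SOC
    if soc >= SOC_MAX:
        allowed &= t_m >= 0    # no further charging at high SOC
    if not allowed.any():
        allowed[:] = True      # fall back to the full action set
    return allowed

def select_action(q_values, allowed, epsilon, rng):
    """Epsilon-greedy choice restricted to the rule-permitted actions."""
    candidates = np.flatnonzero(allowed)
    if rng.random() < epsilon:
        return int(rng.choice(candidates))
    return int(candidates[np.argmax(q_values[candidates])])
```

Restricting exploration to rule-permitted actions is one plausible reading of how rule fusion raises the share of effective samples in the sample pool.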
CN202111177978.1A 2021-10-09 2021-10-09 Rule fusion deep reinforcement learning energy management method based on working condition identification Active CN113715805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111177978.1A CN113715805B (en) 2021-10-09 2021-10-09 Rule fusion deep reinforcement learning energy management method based on working condition identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111177978.1A CN113715805B (en) 2021-10-09 2021-10-09 Rule fusion deep reinforcement learning energy management method based on working condition identification

Publications (2)

Publication Number Publication Date
CN113715805A true CN113715805A (en) 2021-11-30
CN113715805B CN113715805B (en) 2023-01-06

Family

ID=78685752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111177978.1A Active CN113715805B (en) 2021-10-09 2021-10-09 Rule fusion deep reinforcement learning energy management method based on working condition identification

Country Status (1)

Country Link
CN (1) CN113715805B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821775A (en) * 2023-08-29 2023-09-29 陕西重型汽车有限公司 Load estimation method based on machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090198396A1 (en) * 2008-02-04 2009-08-06 Fernando Rodriguez Adaptive control strategy and method for optimizing hybrid electric vehicles
CN104071161A (en) * 2014-04-29 2014-10-01 福州大学 Method for distinguishing working conditions and managing and controlling energy of plug-in hybrid electric vehicle
CN110929920A (en) * 2019-11-05 2020-03-27 中车戚墅堰机车有限公司 Hybrid power train energy management method based on working condition identification
CN112035949A (en) * 2020-08-14 2020-12-04 浙大宁波理工学院 Real-time fuzzy energy management method combined with Q reinforcement learning
CN112116156A (en) * 2020-09-18 2020-12-22 中南大学 Hybrid train energy management method and system based on deep reinforcement learning

Also Published As

Publication number Publication date
CN113715805B (en) 2023-01-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant