Hybrid power system control method based on road surface identification and deep reinforcement learning
Technical Field
The invention belongs to the technical field of intelligent control of new energy automobiles, and relates to a hybrid power system control method based on road surface identification and deep reinforcement learning.
Background
At present, the main development types of new energy automobiles comprise pure electric automobiles, plug-in hybrid electric automobiles and fuel cell automobiles. By comparison, hybrid electric vehicles offer good fuel economy, effective emission reduction, long endurance, mature control technology and low demands on battery performance, and are therefore a vehicle type well suited to future development. On the other hand, the key technical route of the intelligent automobile comprises, in sequence, environment perception technology, intelligent decision technology and control execution technology; this route, however, is not specific to any particular vehicle type, and macroscopically treats the automobile as a single controlled entity. On this basis, it is conceivable that if the environment perception technology of the intelligent automobile is organically integrated with the energy management strategy of the hybrid electric vehicle, a vehicle carrying a visual identification function can acquire environment information in real time and thereby perform more intelligent and reasonable system control.
Most existing system control methods for hybrid electric vehicles consider only road condition information or the steepness of the road gradient, and hardly consider the road surface type as an influencing factor.
Disclosure of Invention
In view of the above, the present invention provides a hybrid power system control method based on road surface identification and deep reinforcement learning. A convolutional network model based on computer vision completes online identification of the road surface type, and a deep reinforcement learning algorithm then performs economy-oriented energy management and safety-oriented brake control on a plurality of components in the hybrid power system, so as to achieve cooperative assurance of fuel economy and braking safety. The method is suitable for unmanned hybrid vehicles.
In order to achieve the purpose, the invention provides the following technical scheme:
a hybrid power system control method based on road surface identification and deep reinforcement learning specifically comprises the following steps:
s1: establishing a parallel hybrid power system with a P3 structure (namely, a motor is positioned between a transmission and a main reducer) and a driving environment model fusing various time-varying state information, and completing the construction of a training environment;
s2: building a VGG16 convolutional neural network for road surface identification, acquiring images of five typical road surfaces, and training the convolutional neural network on road surface type characteristic extraction;
s3: after the road surface type is identified on line, determining the optimal slip rate in the braking stage according to a slip rate-adhesion coefficient characteristic curve, and using the optimal slip rate as a reference value of a motor rotating speed fine-tuning strategy in a subsequent technical system;
s4: establishing a three-dimensional neural network suitable for multi-target control based on the Deep Q-Network (DQN) algorithm;
s5: defining a state variable space, an action variable space and a reward function of the three-dimensional neural network, and then performing iterative training on the three-dimensional neural network;
s6: extracting and storing the neural network synchronously fitting the three parameterized control strategies, and realizing the cooperative guarantee of the fuel economy and the braking safety of the hybrid electric vehicle; the three parameterized control strategies comprise a motor rotating speed fine adjustment strategy in a braking stage, an engine power control strategy and a mechanical continuously variable transmission gear shifting strategy.
Further, in step S1, the various time-varying state information includes: the longitudinal running speed, the gradient, the number of passengers, the driving pictures collected by the vehicle-mounted camera, and the like.
Further, in step S2, pictures of five typical road surfaces are collected, including multiple pictures each of dry asphalt, dry cobblestone, wet asphalt, wet cobblestone and snow-covered road surfaces; the surrounding environment is removed by batch cropping so that only the road surface portion is kept, ensuring that the pixel information input into the convolutional neural network is valid information.
Further, in step S3, determining the optimal slip rate in the braking phase specifically comprises: after the current road surface type is identified online, the optimal slip rate capable of fully utilizing the road surface adhesion condition is determined according to the slip rate-adhesion coefficient characteristic curve. The aim is that, when the vehicle is in a braking state, the working effect of an anti-lock braking system (ABS) can be achieved in the regenerative braking mode, without applying the friction brakes, by using motor braking and maintaining a certain slip rate through adjustment of the motor speed, which is directly related to the wheel speed. The slip rate is determined according to the following formula:

s = (v_veh − ω_wheel · r) / v_veh

where s is the slip rate, v_veh is the longitudinal speed of the vehicle, r is the wheel radius, and ω_wheel is the wheel rotational speed. The motor speed corresponding to the optimal slip rate is then used as the reference value of the motor speed fine-tuning strategy in the subsequent technical scheme.
Further, in step S4, establishing the three-dimensional neural network specifically comprises: establishing a deep Q-network (DQN) algorithm framework and defining hyper-parameters capable of maximally improving the calculation efficiency and the learning effect. The DQN algorithm framework comprises an environment module and an agent module. The environment module comprises the parallel hybrid power system established in step S1 and the driving environment model fusing various time-varying state information, and serves as the training environment for extracting the optimal control strategy. The agent module comprises the deep-reinforcement-learning DQN algorithm, and specifically comprises a target network module, an experience replay mechanism module and the like. The hyper-parameters include: the learning rate, the attenuation rate of the greedy coefficient, and the experience pool capacity.
Further, in step S5, the expression of the state variable space S is defined as:
S = {soc, vel, acc, ω_mg, i_CVT, P_eng, θ, Road_surface, N_people}
where soc is the battery state of charge, vel is the speed, acc is the acceleration, ω_mg is the motor speed, i_CVT is the transmission ratio of the continuously variable transmission, P_eng is the engine power, θ is the slope, Road_surface is the road surface type, and N_people is the number of passengers. Within this nine-dimensional state variable space, the battery state of charge, speed and acceleration belong to the vehicle system state; the motor speed, continuously variable transmission ratio and engine power belong to the control component state; and the slope, road surface type and number of passengers belong to the driving environment state.
Further, in step S5, the action variable space A is defined by the following expression:

A = {Δω_mg, Δi_CVT, ΔP_eng}

where Δω_mg is the change in motor speed, Δi_CVT is the change in the transmission ratio of the continuously variable transmission, and ΔP_eng is the change in engine power.
Further, in step S5, the reward function R is defined as a negatively weighted combination of penalty terms, comprising the deviation of the battery state of charge from its target value, the deviation of the motor speed from its reference value, the deviation of the continuously variable transmission ratio from its reference value, and the instantaneous fuel consumption, where α, β, γ, χ and ξ are the weight coefficients, t is the time, soc_target is the target state of charge, abs denotes the absolute value, ω_ref is the reference motor speed, i_CVT is the transmission ratio of the continuously variable transmission, i_ref is the reference transmission ratio, T_eng is the engine torque, n_eng is the engine speed, and η_eng is the engine efficiency, the last three determining the instantaneous fuel consumption.
Further, in step S6, extracting and storing the neural network that synchronously fits the three parameterized control strategies specifically comprises: after the total accumulated reward value converges stably (marking the end of training), the parameters of the constructed three-dimensional deep neural network are extracted and stored as a persistent model. Three parameterized control strategies are stored simultaneously in the model, namely the motor speed fine-tuning strategy in the braking phase, the engine power control strategy and the mechanical continuously variable transmission gear-shifting strategy, so that synchronous learning of three different types of control strategies is realized.
The invention has the beneficial effects that: by combining computer vision technology, an intelligent energy management strategy capable of synchronously ensuring fuel economy and braking safety is provided for the hybrid electric vehicle. Specifically, a convolutional network model based on computer vision completes online identification of the road surface type, and a deep reinforcement learning algorithm performs economy-oriented energy management and safety-oriented brake control on a plurality of components in the hybrid power system, thereby realizing cooperative guarantee of fuel economy and braking safety. The strategy is suitable for unmanned hybrid electric vehicles.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of a control strategy of the method of the present invention;
FIG. 2 is a parallel hybrid powertrain of the P3 configuration;
FIG. 3 is a driving environment modeling diagram;
FIG. 4 is a network architecture diagram of a VGG 16;
FIG. 5 is a graph of slip versus road adhesion coefficient for different road types;
FIG. 6 is a diagram of the DQN algorithm structure;
fig. 7 is a three-dimensional deep neural network structure.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to fig. 1 to 7, the present invention preferably discloses a hybrid power system control method based on road surface identification and deep reinforcement learning, which specifically includes the following steps:
s1: and establishing a parallel hybrid power system with a P3 structure and a driving environment model fusing various time-varying state information, wherein the driving environment model comprises longitudinal driving speed, gradient, passenger number, driving pictures acquired by a vehicle-mounted camera and the like, and completing the establishment of a training environment.
A parallel hybrid power system of the P3 structure shown in FIG. 2 is built (comprising an engine, a clutch, a hydraulic torque converter, a motor/generator, a lithium-ion power battery, a mechanical continuously variable transmission, a rear axle and the like), together with a driving environment model fusing various time-varying state information as shown in FIG. 3 (comprising the longitudinal driving speed, the gradient, the number of passengers, driving pictures collected by an on-board camera and the like). For the former, since the motor is installed between the mechanical continuously variable transmission and the final drive in the P3 configuration, the motor speed is directly related to the wheel speed, and the engine operating speed can be controlled by adjusting the real-time transmission ratio of the transmission. For the latter, the total running time of the driving environment model is 3602 seconds (namely the combination of a forward WLTC cycle and a reverse WLTC cycle); in addition to the WLTC speed trajectory and the real road gradient, the model is divided into 10 modules, each assigned its own driving pictures and number of passengers, so that an environment model containing various time-varying state information is built.
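As a minimal sketch of how such a time-varying environment model might be organized in code (all names, the dummy speed/grade values and the per-module passenger rule are illustrative assumptions, not the patent's implementation):

```python
from dataclasses import dataclass

@dataclass
class EnvState:
    """One time step of the driving environment (field names are illustrative)."""
    speed: float        # longitudinal speed from the WLTC trace, m/s (dummy here)
    grade: float        # road gradient, rad (dummy here)
    passengers: int     # passenger count assigned to the current module
    road_image: str     # placeholder for the on-board camera frame

def build_profile(total_time=3602, n_modules=10):
    """Split the 3602 s cycle (forward + reverse WLTC) into 10 modules,
    each carrying its own driving picture and passenger count."""
    module_len = total_time // n_modules
    profile = []
    for t in range(total_time):
        module = min(t // module_len, n_modules - 1)
        profile.append(EnvState(speed=0.0, grade=0.0,
                                passengers=1 + module % 5,
                                road_image=f"module_{module}.jpg"))
    return profile

profile = build_profile()
```

A real model would replace the dummy speed and grade fields with the recorded WLTC trajectory and measured gradient at each second.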
S2: a VGG convolutional network model for pavement identification is established, five typical pavement picture materials are collected, and training on pavement type feature extraction is conducted on a convolutional neural network.
A VGG16 convolutional network model as shown in FIG. 4 is established with the Python language and the PyTorch deep learning toolkit for online road surface type identification. Driving pictures are recorded by a vehicle-mounted camera, so that a large number of picture materials are collected as deep learning data sets for the five typical road surfaces; for example, more than 2000 pictures each of dry asphalt, dry cobblestone, wet asphalt, wet cobblestone and snow-covered road surfaces are collected. The surrounding environment is removed by batch cropping so that only the road surface portion is retained, ensuring that all pixel information input into the neural network is valid information. The VGG16 convolutional network is then trained for road surface type identification, with 90% of the material defined as the training set and the remaining 10% as the test set.
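The 90%/10% split described above can be sketched as follows (the class names, file-path scheme and 2000-images-per-class count follow the text; the shuffling seed is an arbitrary assumption):

```python
import random

def split_dataset(paths, train_frac=0.9, seed=0):
    """Shuffle image paths and split them into training/test subsets (90 % / 10 %)."""
    rng = random.Random(seed)
    shuffled = paths[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# five typical road surface classes from the text
CLASSES = ["dry_asphalt", "dry_cobble", "wet_asphalt", "wet_cobble", "snow"]

# e.g. 2000 images per class (file names are placeholders)
paths = [f"{c}/{i:04d}.jpg" for c in CLASSES for i in range(2000)]
train, test = split_dataset(paths)
```

With 10,000 images in total, this yields 9,000 training and 1,000 test samples.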
According to the conventional deep learning training scheme, the original picture is compressed to the size of the neural network input layer for feature extraction during training; after subsequent processing by a series of convolutional layers, pooling layers and fully connected layers, the classification layer finally outputs the membership class of the picture.
S3: after the road surface type is identified on line, the optimal slip rate in the braking stage is determined according to the slip rate-adhesion coefficient characteristic curve and is used as a reference value of a motor rotating speed fine-tuning strategy in a subsequent technical system.
After driving pictures from the vehicle-mounted camera are input into the trained VGG16 convolutional network to realize online identification of the road surface type, the optimal slip rate capable of fully exploiting the road adhesion condition is determined according to the slip rate-adhesion coefficient characteristic curve shown in FIG. 5. The aim is that, when the vehicle is in a braking state, the working effect of an anti-lock braking system (ABS) can be achieved in the regenerative braking mode, without applying the friction brakes, by using motor braking and maintaining a certain slip rate through control of the motor speed, which is directly related to the wheel speed. The slip rate is determined according to the following formula:

s = (v_veh − ω_wheel · r) / v_veh

where s is the slip rate, v_veh is the longitudinal speed of the vehicle, r is the wheel radius, and ω_wheel is the wheel rotational speed. The optimal slip rate and the corresponding optimal motor speed are then used as reference values of the motor speed fine-tuning strategy in the subsequent technical scheme.
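The slip-rate formula and the derivation of the reference wheel (and hence P3 motor-side) speed can be sketched directly (the example numbers, including the optimal slip value read off the curve, are illustrative assumptions):

```python
def slip_ratio(v_veh, omega_wheel, r):
    """Braking slip: s = (v_veh - omega_wheel * r) / v_veh."""
    return (v_veh - omega_wheel * r) / v_veh

def reference_wheel_speed(v_veh, r, s_opt):
    """Wheel speed that holds the optimal slip s_opt at vehicle speed v_veh;
    in the P3 layout this maps directly to a motor-speed reference."""
    return v_veh * (1.0 - s_opt) / r

# example: 20 m/s, 0.3 m wheel radius, optimal slip 0.2 (e.g. as read off
# the slip-adhesion curve for the identified surface; values illustrative)
w_ref = reference_wheel_speed(20.0, 0.3, 0.2)
```

Holding the wheel at `w_ref` reproduces exactly the target slip of 0.2 at 20 m/s.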
S4: establishing a three-dimensional neural network model suitable for multi-target control based on the deep Q-network (DQN) algorithm.
A DQN algorithm framework as shown in FIG. 6 is established with the Python language and the PyTorch deep learning toolkit. In FIG. 6, the solid line represents the reinforcement learning training loop of the agent, in which the ε-greedy algorithm selects a random action with probability ε and selects the best action known so far with probability 1 − ε; the greedy coefficient ε decays gradually as iterative training progresses. The dashed line indicates the computation flow of the loss function, and the dotted line indicates the gradient computation and back-propagation update process. The environment module comprises the hybrid power system model and the driving environment model established above (serving as the training environment for extracting the optimal control strategy), and the agent module comprises the deep-reinforcement-learning DQN algorithm (including the target network, the experience replay mechanism and the like). Meanwhile, hyper-parameters capable of maximally improving the calculation efficiency and the learning effect are defined (including the learning rate, the attenuation rate of the greedy coefficient, the experience pool capacity and the like). Because three controlled components exist in the parallel hybrid power system of the P3 structure, namely the engine, the motor and the mechanical continuously variable transmission, and in keeping with the purpose of learning a control strategy with the online network, the online networks of three DQN frameworks are combined in a three-dimensional manner, finally establishing the three-dimensional neural network shown in FIG. 7 as the basic framework for the subsequent multi-target control strategy of the hybrid power system.
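The ε-greedy selection and greedy-coefficient decay described above can be sketched as follows (the decay rate and floor value are illustrative assumptions, not the patent's hyper-parameters):

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action index,
    otherwise the action with the highest known Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay(epsilon, rate=0.995, floor=0.01):
    """Gradually attenuate the greedy coefficient as training proceeds."""
    return max(floor, epsilon * rate)

rng = random.Random(0)
eps = 1.0
for _ in range(1000):          # one decay step per training iteration
    eps = decay(eps)
```

After enough iterations the coefficient settles at its floor, so late training is almost entirely exploitation.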
In general, the three-dimensional neural network comprises three sides in its hidden layers, corresponding to the three control strategies to be learned. Each side has the same structure, namely three layers of neurons with 100 neurons per layer, and every neuron uses the same activation function.
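The three-sided structure can be sketched as three parallel branches over a shared nine-dimensional state input (the ReLU activation, weight initialization and the assumed 5 discrete outputs per branch are illustrative; only the 3 layers × 100 neurons per side follows the text):

```python
import numpy as np

def make_branch(in_dim, out_dim, hidden=100, layers=3, seed=0):
    """One side of the network: three hidden layers of 100 neurons each."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * layers + [out_dim]
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(branch, x):
    for i, (W, b) in enumerate(branch):
        x = x @ W + b
        if i < len(branch) - 1:
            x = np.maximum(x, 0.0)  # same activation on every hidden neuron
    return x

state = np.zeros(9)                                 # nine-dimensional state vector
branches = [make_branch(9, 5, seed=k) for k in range(3)]  # one side per strategy
q = [forward(br, state) for br in branches]         # Q-values per control strategy
```

Each branch emits its own Q-vector, so the three control strategies are fitted synchronously in one network.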
S5: after a state variable space, an action variable space and a reward function are defined, the three-dimensional neural network is trained.
S51: the state variable space S is defined as follows:
S = {soc, vel, acc, ω_mg, i_CVT, P_eng, θ, Road_surface, N_people}
where soc is the battery state of charge, vel is the speed, acc is the acceleration, ω_mg is the motor speed, i_CVT is the transmission ratio of the continuously variable transmission, P_eng is the engine power, θ is the slope, Road_surface is the road surface type, and N_people is the number of passengers. Within this nine-dimensional state variable space, the battery state of charge, speed and acceleration belong to the vehicle system state; the motor speed, continuously variable transmission ratio and engine power belong to the control component state; and the slope, road surface type and number of passengers belong to the driving environment state.
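Assembling the nine-dimensional state vector can be sketched in one line (the function name, argument order and example values are illustrative assumptions):

```python
def state_vector(soc, vel, acc, w_mg, i_cvt, p_eng, theta, road, n_people):
    """Assemble the nine-dimensional state S; `road` is assumed to be an
    integer class index (0-4) produced by the VGG16 surface classifier."""
    return [soc, vel, acc, w_mg, i_cvt, p_eng, theta, road, n_people]

# example: 60 % SOC, 15 m/s, gentle acceleration, dry asphalt (class 0), 2 passengers
s = state_vector(0.6, 15.0, 0.2, 200.0, 1.5, 20.0, 0.02, 0, 2)
```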
S52: the motion variable space A is defined as follows
Wherein, Δ ωmgIs the amount of change, Δ i, in the rotational speed of the motorCVTIs the amount of change in the transmission ratio of the transmission,ΔPengis the amount of change in engine power.
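For a DQN agent the continuous increments must be discretized; a sketch of one possible enumeration (the specific increment values and the three-steps-per-actuator grid are assumptions, not the patent's discretization):

```python
from itertools import product

# illustrative discrete increments for each actuator
D_W_MG  = [-50.0, 0.0, 50.0]    # motor-speed change, rpm
D_I_CVT = [-0.1, 0.0, 0.1]      # CVT transmission-ratio change
D_P_ENG = [-2.0, 0.0, 2.0]      # engine-power change, kW

# the joint action set is the Cartesian product of the three increment lists
ACTIONS = list(product(D_W_MG, D_I_CVT, D_P_ENG))
```

With three options per actuator this gives 27 joint actions, one of which leaves all three components unchanged.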
S53: reward function definition R is as follows
Wherein α, β, γ, χ and ξ are weight coefficients, t is time, soc
targetIs the target charge state, abs denotes the absolute value, ω
refIs a reference motor speed, i
refReference is made to the transmission ratio of the transmission,
is instantaneous oil consumption, T
engIs the engine torque, n
engIs the engine speed, η
engIs the engine efficiency.
After the definition of the state space, the action space and the reward function is completed, an iterative training process is started.
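The penalty structure of the reward described above can be sketched as follows; this is an illustrative form only — the weight values and the exact combination of terms are assumptions, not the patent's expression:

```python
def reward(soc, soc_target, w_mg, w_ref, i_cvt, i_ref, fuel_rate,
           alpha=1.0, beta=0.01, gamma=1.0, xi=1.0):
    """Hedged sketch of a negatively weighted penalty reward: deviations of
    SOC, motor speed and CVT ratio from their references, plus instantaneous
    fuel consumption (which the text derives from T_eng, n_eng and eta_eng)."""
    return -(alpha * abs(soc - soc_target)
             + beta * abs(w_mg - w_ref)
             + gamma * abs(i_cvt - i_ref)
             + xi * fuel_rate)
```

The reward is maximal (zero) when every quantity sits on its reference and no fuel is consumed, so the agent is driven toward both braking-safety and economy targets simultaneously.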
S6: and extracting and storing a neural network (comprising a motor rotating speed fine adjustment strategy, an engine power control strategy and a mechanical continuously variable transmission gear shifting strategy) synchronously fitting the three parameterized control strategies, and realizing the cooperative guarantee of fuel economy and braking safety.
Under the guidance of the reward function, the cumulative reward value of the training model gradually increases and then converges stably, which marks the end of the training process and the successful learning of the optimal control strategy. The parameters of the three-dimensional deep neural network are then extracted and stored as a persistent model for practical testing and application. Three parameterized control strategies are stored simultaneously in this model: the motor speed fine-tuning strategy in the braking phase, the engine power control strategy and the mechanical continuously variable transmission gear-shifting strategy. In the normal driving phase, specifically in the pure motor drive mode, the drive-charging mode and the hybrid drive mode, optimal or near-optimal fuel economy is obtained through the engine power control strategy and the gear-shifting strategy; in the braking phase, when the braking demand is within the motor's braking capacity, braking energy recovery and safe braking are achieved through the motor speed fine-tuning strategy. Synchronous learning of three different types of control strategies is thus realized, together with the cooperative guarantee of fuel economy and braking safety.
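Persisting the trained parameters so that all three strategies are stored in one model can be sketched with standard serialization (the dictionary layout and branch names are illustrative assumptions; a PyTorch implementation would use `torch.save` on the network's state dict instead):

```python
import os
import pickle
import tempfile

def save_policy(params, path):
    """Persist the trained three-dimensional network parameters as one model."""
    with open(path, "wb") as f:
        pickle.dump(params, f)

def load_policy(path):
    """Restore the persisted model for testing and deployment."""
    with open(path, "rb") as f:
        return pickle.load(f)

# placeholder parameters: one entry per parameterized control strategy
params = {"brake_branch": [1.0], "engine_branch": [2.0], "cvt_branch": [3.0]}
path = os.path.join(tempfile.gettempdir(), "policy.pkl")
save_policy(params, path)
restored = load_policy(path)
```

Because the three branches live in one serialized object, loading the single file recovers all three control strategies at once.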
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.