CN110716550A - Gear shifting strategy dynamic optimization method based on deep reinforcement learning - Google Patents

Gear shifting strategy dynamic optimization method based on deep reinforcement learning

Info

Publication number
CN110716550A
CN110716550A (Application No. CN201911076016.XA)
Authority
CN
China
Prior art keywords
network
gear shifting
shifting strategy
predicted
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911076016.XA
Other languages
Chinese (zh)
Other versions
CN110716550B (en)
Inventor
陈刚
袁靖
张介
顾爱博
周楠
王和荣
苏树华
陈守宝
王良模
王陶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority to CN201911076016.XA priority Critical patent/CN110716550B/en
Publication of CN110716550A publication Critical patent/CN110716550A/en
Application granted granted Critical
Publication of CN110716550B publication Critical patent/CN110716550B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process

Abstract

The invention belongs to the field of engineering machinery and vehicle engineering, and particularly relates to a gear shifting strategy dynamic optimization method based on deep reinforcement learning. The method comprises the following steps: (1): determining a gear shifting strategy state input variable and an action output variable; (2): determining a Markov decision process of a gear shifting strategy according to the state input variable and the action output variable; (3): establishing a reinforcement learning gear shifting strategy reward function according to a gear shifting strategy target; (4): solving a deep reinforcement learning gear shifting strategy according to a Markov decision process and a reward function; (5): putting the predicted Q network calculated in the step (4) into a gear shifting strategy controller, and selecting gears of the engineering machinery and the vehicle according to the gear shifting strategy controller in the driving process of the engineering machinery and the vehicle; (6): the predictive Q network is updated periodically during travel. According to the invention, the gear shifting strategy is updated by a deep reinforcement learning method, so that the dynamic optimization of the gear shifting strategy is realized.

Description

Gear shifting strategy dynamic optimization method based on deep reinforcement learning
Technical Field
The invention belongs to the field of engineering machinery and vehicle engineering, and particularly relates to a gear shifting strategy dynamic optimization method based on deep reinforcement learning.
Background
The gear shifting strategy is one of the core technologies of existing engineering machinery and vehicle control, and refers to the rule by which gears change with selected parameters while the engineering machinery and the vehicle are driving. Establishing a gear shifting strategy mainly comes down to choosing a solving method; the solving methods of the gear shifting strategy include the graphical method, the analytical method, genetic algorithms, dynamic programming and the like. Solving and optimizing the gear shifting strategy are the core directions of gear shifting strategy research, and the dynamic optimization of the gear shifting strategy in particular is one of its difficulties.
In "Correction of the optimal dynamic AMT gear shifting schedule based on variable load" (Li Hao, Control Engineering of China, Vol. 22, No. 1, pp. 50-54, January 2015), acceleration is introduced as a gear shifting parameter on the basis of a two-parameter gear shifting strategy, realizing dynamic three-parameter gear shifting that takes acceleration into account. The solving method is analytical: an acceleration-speed curve has to be fitted for every accelerator opening, so the solution is complex and computationally heavy; moreover, it can only be solved for a single performance index and cannot be dynamically optimized for the actual running condition.
"Performance Evaluation Application for Individualized Gearshift Schedule Optimization" (Yin X., May 2016) optimizes the gear shifting strategy with a genetic algorithm, improving its comprehensive performance and overcoming the analytical method's restriction to a single performance index, but it still cannot be dynamically optimized for the actual driving condition.
"Optimal gear shift strategies for fuel economy and driveability" (Viet Dac Ngo, Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, Vol. 227, No. 10, pp. 1398-1413, October 2013) solves the gear shifting strategy for a specific driving cycle by dynamic programming. The disadvantages are that solving the shift schedule by dynamic programming requires constructing a complex state diagram expressed in table form; the complexity of the state diagram depends on the degree of discretization in the dynamic programming algorithm, and an overly complex state diagram may converge slowly, or fail to converge, because of the Bellman curse of dimensionality. In addition, because the optimization is carried out for a specific driving cycle, dynamic optimization during driving is impossible.
Among existing patents, patent application No. 201710887558.X discloses a method for optimizing the shift schedule of an automobile with a dynamic programming algorithm; in its embodiment, economy-based and dynamics-based shift schedules are established respectively. Solving the shift schedule by dynamic programming requires constructing a complex state diagram whose complexity depends on the degree of discretization in the dynamic programming algorithm, and an overly complex state diagram may converge slowly, or fail to converge, because of the Bellman curse of dimensionality. It also cannot be dynamically optimized according to the actual running condition.
Patent application No. 201811306659.4 discloses a shift strategy correction method and system based on driving intention. The current gear shifting correction coefficient and compensation offset are updated according to the driver's driving process, the original gear shifting strategy is corrected, and dynamic updating of the gear shifting strategy is realized. However, the dynamic updating rule of the gear shifting strategy has to be established manually, the optimization effect is strongly affected by that manual design, and the method is not universal and can only be used for a single vehicle type, so its degree of intelligence is low.
In general, most existing gear shifting strategy solving or optimizing methods cannot perform dynamic optimization for the actual driving conditions and have poor self-adaptive capacity. Those gear shifting strategies that can be dynamically optimized require manually established dynamic updating rules, and their intelligence and universality are low.
Disclosure of Invention
The invention aims to provide a gear shifting strategy dynamic optimization method based on deep reinforcement learning. The method constructs a Markov decision process and a reward function of the gear shifting strategy, solves the gear shifting strategy by a deep reinforcement learning method, and puts the resulting predicted Q network into a gear shifting strategy controller to realize gear selection; at the same time, driving data are collected during daily driving and the gear shifting strategy is updated by the deep reinforcement learning method, realizing dynamic optimization of the gear shifting strategy.
The technical solution for realizing the purpose of the invention is as follows: a gear shifting strategy dynamic optimization method based on deep reinforcement learning comprises the following steps:
step (1): determining a gear shifting strategy state input variable and an action output variable;
step (2): determining a Markov decision process of a gear shifting strategy according to the state input variable and the action output variable in the step (1);
step (3): establishing a reinforcement learning gear shifting strategy reward function according to a gear shifting strategy target;
step (4): solving the deep reinforcement learning gear shifting strategy according to the Markov decision process in step (2) and the reward function in step (3); firstly, Markov chains are calculated through the Markov decision process and the reward function and stored in an experience pool, and the predicted Q network in the deep reinforcement learning gear shifting strategy is then updated according to the data in the experience pool;
step (5): putting the predicted Q network calculated in step (4) into a gear shifting strategy controller; during driving, the engineering machinery and the vehicle select gears according to the gear shifting strategy controller;
step (6): during driving, collecting the driving data of the engineering machinery and the vehicle and storing them in the experience pool, periodically updating the predicted Q network, and putting the updated predicted Q network back into the gear shifting strategy controller, so as to realize dynamic optimization of the gear shifting strategy.
Further, the state input variables in step (1) comprise the vehicle speed v, the acceleration, the accelerator opening α_t, the running gradient and the ground friction resistance coefficient; the action output variables comprise a gear operation or a shift operation, where a gear operation is an upshift, a downshift or holding the current gear, and a shift operation is the selected gear n_g.
Further, the Markov decision process of the gear shifting strategy in step (2) is expressed as a transfer function that maps the current state and the selected action to the state at the next moment; the transfer function has the form:
s_{t+1} = T(s_t, a_t)
where s_{t+1} is the state variable at the next moment, s_t is the current state variable and a_t is the selected action variable, with s ∈ S and a ∈ A, S being the set of state variables and A being the set of action variables.
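For illustration, the following minimal Python sketch (not part of the original disclosure) shows the interface implied by such a transfer function: a state built from the variables of step (1), an action taken from the gear set, and a function returning the next state. The names (State, transfer) and the simple kinematic update inside the function are assumptions; the actual T of the invention is the vehicle model given in the embodiment.

    from dataclasses import dataclass

    @dataclass
    class State:
        v: float       # vehicle speed
        dv: float      # acceleration
        alpha: float   # accelerator opening
        grade: float   # running gradient
        mu: float      # ground friction resistance coefficient

    def transfer(s: State, gear: int, dt: float = 0.1) -> State:
        """Placeholder for s_{t+1} = T(s_t, a_t): advance the state by one control step."""
        v_next = max(0.0, s.v + s.dv * dt)   # simple Euler step, illustration only
        return State(v=v_next, dv=s.dv, alpha=s.alpha, grade=s.grade, mu=s.mu)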
Further, the shift strategy reward function in step (3) is positively correlated with shift strategy objectives, including power, economy and comfort.
Further, the gear shifting strategy target is a dynamic gear shifting strategy, described as the engineering machinery and the vehicle reaching the highest speed in the shortest time t under the comfort constraint; the reward and punishment mechanism is as follows:
[reward and punishment mechanism equation, given in the original as an image]
where r is the reward calculated by the reward and punishment mechanism; r_t is the temporary reward, r_t = -0.001·||v_Tmax - v||; v_Tmax is the maximum vehicle speed at the current accelerator opening α_t; j is the impact degree of the engineering machinery and the vehicle; and j_max is the designed maximum allowable impact degree.
Further, the Markov chain in step (4) has the form:
<s_t, a_t, r_t, s_{t+1}>
where r_t is the temporary reward calculated from the reward target.
Further, the deep reinforcement learning method in step (4) includes two neural networks with the same structure but different parameters, which are called a predicted Q network and a target Q network, wherein the predicted Q network is used for calculating a Q value of each action in the current state, and the target Q network is used for updating the predicted Q network.
Further, when the Markov chain is established in step (4), the action variable a_t is selected by a greedy algorithm, which is expressed as:
[greedy action-selection rule, given in the original as an image]
where Q_p is the predicted Q network, θ_p are the predicted Q network parameters, and e is the greedy algorithm parameter;
in step (4), the Markov chain is saved into an experience pool, and the predicted Q network in the deep reinforcement learning gear shifting strategy is then updated according to the data in the experience pool; the predicted Q network is used to calculate the Q values of the gear set A in the driving state s_t, and the output of the predicted Q network is Q_p(s, A, θ_p).
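Because the greedy-selection formula appears in the original only as an image, the Python sketch below assumes the standard ε-greedy form: with probability 1 - e the gear with the largest predicted Q value is exploited, otherwise a random gear is explored, and the resulting Markov chains are kept in a bounded experience pool. The function name q_predict and the pool capacity are illustrative assumptions.

    import random
    from collections import deque

    experience_pool = deque(maxlen=10000)   # stores Markov chains <s_t, a_t, r_t, s_{t+1}>

    def epsilon_greedy(q_predict, state_vec, gears, e):
        """Assumed epsilon-greedy rule: exploit argmax_a Q_p(s, a, theta_p) with
        probability 1 - e, explore a random gear from A with probability e."""
        if random.random() < e:
            return random.choice(gears)
        q_values = q_predict(state_vec)              # one Q value per gear in A
        best = max(range(len(gears)), key=lambda i: q_values[i])
        return gears[best]

    def store_transition(s_t, a_t, r_t, s_next):
        """Save one Markov chain into the experience pool."""
        experience_pool.append((s_t, a_t, r_t, s_next))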
Further, in step (5), during driving the engineering machinery and the vehicle select a gear according to the gear shifting strategy controller, and the controller selects the appropriate gear a* according to the predicted Q network:
a*(s) = argmax_a [Q_p(s, a, θ_p) | a ∈ A]
where Q_p is the predicted Q network and θ_p are the predicted Q network parameters.
Further, the driving data collected in step (6) comprise: vehicle speed, accelerator opening, acceleration, running gradient and ground friction resistance coefficient;
two methods are available for updating the predicted Q network in step (6): the first is to reconstruct the transfer function of step (2) from the driving data of the engineering machinery and the vehicle and then update the predicted Q network according to steps (3) and (4); the second is to update the predicted Q network directly according to the predicted Q network updating method of step (4);
in the first method, the transfer function of step (2) is reconstructed from the collected driving data either by recalculating the parameters of the transfer function, giving a transfer function with the same structure but different parameters, or by fitting the transfer function with a neural network, linear fitting or a Fourier transform method;
in the second method, the collected driving data are used directly with the predicted Q network updating method of step (4), which is:
Q_p(s, a, θ_p) = Q_p(s, a, θ_p) + α(r + γ·max_a Q_t(s, a, θ_t) - Q_p(s, a, θ_p))²
where γ is the reward discount value; α is the neural network learning rate; Q_t is the target Q network and θ_t are the target Q network parameters.
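Although the patent does not prescribe a particular implementation, the update can be sketched as follows (an assumed PyTorch realization, for illustration only): the squared term corresponds to the squared TD error that is minimized by gradient descent with learning rate α, with the target Q network supplying the TD target.

    import torch
    import torch.nn.functional as F

    def update_predicted_q(q_p, q_t, optimizer, batch, gamma):
        """One update of the predicted Q network Q_p from a minibatch of Markov chains.
        The TD target r + gamma * max_a Q_t(s', a, theta_t) comes from the target
        network; the squared TD error is the quantity in the update formula above."""
        s, a, r, s_next = batch                      # tensors: states, gear indices, rewards, next states
        q_sa = q_p(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            td_target = r + gamma * q_t(s_next).max(dim=1).values
        loss = F.mse_loss(q_sa, td_target)           # squared TD error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()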
Compared with the prior art, the invention has the remarkable advantages that:
(1) by adopting the deep reinforcement learning method, the predicted Q network is updated through Markov chains built, from the Markov decision process and the reward function, during the driving of the engineering machinery and the vehicle; this covers both the solving and the dynamic optimization of the gear shifting strategy and gives the method strong self-adaptive capacity;
(2) with the deep reinforcement learning method, the solving and dynamic-optimization steps share a uniform algorithm that does not depend on the controlled object itself, so the method is applicable to different vehicle types such as passenger vehicles, engineering machinery and vehicles, special vehicles and electric vehicles; the transfer function can be fitted with a neural network, linear fitting or a Fourier transform method, again independently of the object to which the method is applied, so the method has strong universality;
(3) the gear shifting strategy is solved and dynamically optimized by the deep reinforcement learning method; the algorithm is not influenced by the controlled object and at the same time realizes dynamic optimization of the gear shifting strategy, so the method has strong intelligence;
(4) gears are selected with the predicted Q network of deep reinforcement learning, replacing the table form of traditional methods; because the neural network has strong fitting capacity and suits gear shifting strategies with high-dimensional state variables, the problem of the Bellman curse of dimensionality is avoided.
Drawings
FIG. 1 is a schematic diagram of a shift strategy dynamic optimization method based on deep reinforcement learning.
FIG. 2 is a flow chart for solving a deep reinforcement learning shift strategy according to the present invention.
FIG. 3 is a diagram of a neural network architecture model employed in the present invention.
FIG. 4 is a process diagram for dynamic optimization of the shift strategy of the present invention.
Detailed Description
The invention provides a gear shifting strategy dynamic optimization method based on deep reinforcement learning. The method constructs a Markov decision process and a reward function of the gear shifting strategy and then solves the gear shifting strategy by a deep reinforcement learning method. The predicted Q network obtained in this way is put into a gear shifting strategy controller to realize gear selection. At the same time, driving data are collected during daily driving and the gear shifting strategy is updated by the deep reinforcement learning method, realizing dynamic optimization of the gear shifting strategy.
A gear shifting strategy dynamic optimization method based on deep reinforcement learning comprises the following steps:
step one, determining a gear shifting strategy state variable and an action variable.
And step two, determining a Markov decision process of the gear shifting strategy according to the state input variable and the action output variable.
And step three, establishing a reinforcement learning gear shifting strategy reward function according to the gear shifting strategy optimization target.
And step four, solving a deep reinforcement learning gear shifting strategy according to the Markov decision process in the step two and the reward function in the step three. Firstly, a Markov chain is calculated through an established Markov decision process and a reward function, the Markov chain is stored in an experience pool, and then a prediction Q network in a deep reinforcement learning gear shifting strategy is updated according to data in the experience pool.
And step five, putting the predicted Q network calculated in the step four into a gear shifting strategy controller, and selecting gears by the engineering machinery and the vehicle according to the gear shifting strategy controller in the driving process of the engineering machinery and the vehicle.
And step six, in the driving process, collecting the driving data of the engineering machinery and the vehicle, storing the driving data into an experience pool, periodically updating the predicted Q network, and putting the predicted Q network into a gear shifting strategy controller after the updating is finished so as to realize dynamic optimization of the gear shifting strategy.
Further, in the step one, the shift strategy state variables are engineering machinery and vehicle running state variables or external environment variables. The action variable is a gear operation or a shift operation. The gear operation comprises an upshift, a downshift or a gear holding; the shift operation is the selected gear.
In the second step, the Markov decision process of the gear shifting strategy is expressed as a transfer function T that maps the current state and the selected action to the state at the next moment. The transfer function is of the form:
s_{t+1} = T(s_t, a_t)
where s_{t+1} is the state variable at the next moment, s_t is the current state variable and a_t is the selected action variable, with s ∈ S and a ∈ A; S is the set of state variables and A is the set of action variables. In the gear shifting strategy, the state variables are the running state variables of the engineering machinery and the vehicle or external environment variables, including vehicle speed, accelerator opening, acceleration, running gradient and ground friction resistance coefficient. The action variable is a gear operation or a shift operation.
In the third step, the established gear shifting strategy reward function is positively correlated with the gear shifting target.
In the third step, the shift target includes power, economy and comfort.
In the fourth step, a Markov chain is calculated through the established Markov decision process and the reward function. The Markov chain is of the form:
<s_t, a_t, r_t, s_{t+1}>
where r_t is the temporary reward calculated from the reward target.
In the fourth step, when the Markov chain is established, the action a_t is selected by a greedy algorithm, which is expressed as:
[greedy action-selection rule, given in the original as an image]
where Q_p is the predicted Q network, θ_p are the predicted Q network parameters, and e is the greedy algorithm parameter.
In the fourth step, the Markov chain is stored into an experience pool, and the predicted Q network in the deep reinforcement learning gear shifting strategy is then updated according to the data in the experience pool. The predicted Q network calculates the Q values of the gear set A in the driving state s_t; the output of the predicted Q network is Q_p(s, A, θ_p). The updating method of the predicted Q network is:
Q_p(s, a, θ_p) = Q_p(s, a, θ_p) + α(r + γ·max_a Q_t(s, a, θ_t) - Q_p(s, a, θ_p))²
where γ is the reward discount value; α is the neural network learning rate; Q_t is the target Q network and θ_t are the target Q network parameters.
In the fifth step, the engineering machinery and the vehicle select gears according to the gear shifting strategy controller during driving. The gear shift controller selects the appropriate gear a* based on the predicted Q network:
a*(s) = argmax_a [Q_p(s, a, θ_p) | a ∈ A]
where Q_p is the predicted Q network and θ_p are the predicted Q network parameters.
In the sixth step, the collecting the driving data includes: vehicle speed, accelerator opening, acceleration, travel grade and ground friction resistance coefficient.
In the sixth step, two methods are available for updating the predicted Q network. The first method is to reconstruct the transfer function of the second step from the driving data of the engineering machinery and the vehicle, and then update the predicted Q network according to the third step and the fourth step. The second method is to update the predicted Q network directly according to the predicted Q network updating method of the fourth step.
In the first method, the transfer function of the second step is reconstructed from the collected driving data of the engineering machinery and the vehicle; the reconstruction either recalculates the parameters of the transfer function, giving a transfer function with the same structure but different parameters, or fits the transfer function with a neural network, linear fitting, a Fourier transform method and the like.
In the second method, the collected driving data of the engineering machinery and the vehicle are used directly with the predicted Q network updating method of the fourth step, which is:
Q_p(s, a, θ_p) = Q_p(s, a, θ_p) + α(r + γ·max_a Q_t(s, a, θ_t) - Q_p(s, a, θ_p))²
in the sixth step, the dynamic optimization of the shift strategy is realized by predicting the update of the Q network in the deep reinforcement learning.
Examples
The invention provides a gear shifting strategy dynamic optimization method based on deep reinforcement learning. The method constructs a Markov decision process of the gear shifting strategy and then solves the gear shifting strategy by a deep reinforcement learning method. After solving is completed, the predicted Q network trained by deep reinforcement learning is put into a gear shifting strategy controller to realize gear selection. Then, during driving, the predicted Q network is updated from collected driving data of the engineering machinery and the vehicle, so as to realize dynamic optimization of the gear shifting strategy. Two updating methods are used for the predicted Q network: reconstructing the gear shifting strategy transfer function from the driving data of the engineering machinery and the vehicle and then updating the predicted Q network, or directly updating the predicted Q network by the deep reinforcement learning method. The principle of the gear shifting strategy dynamic optimization method based on deep reinforcement learning is shown in fig. 1, and the method comprises the following steps:
step one, determining a gear shifting strategy state variable and an action variable.
And step two, determining a Markov decision process of the gear shifting strategy according to the state input variable and the action output variable.
And step three, establishing a reinforcement learning gear shifting strategy reward function according to the gear shifting strategy optimization target.
And step four, solving a deep reinforcement learning gear shifting strategy according to the Markov decision process in the step two and the reward function in the step three. Firstly, a Markov chain is calculated through an established Markov decision process and a reward function, the Markov chain is stored in an experience pool, and then a prediction Q network in a deep reinforcement learning gear shifting strategy is updated according to data in the experience pool.
And step five, putting the predicted Q network calculated in the step four into a gear shifting strategy controller, and selecting gears by the engineering machinery and the vehicle according to the gear shifting strategy controller in the driving process of the engineering machinery and the vehicle.
And step six, in the driving process, collecting the driving data of the engineering machinery and the vehicle, storing the driving data into an experience pool, periodically updating the predicted Q network, and putting the predicted Q network into a gear shifting strategy controller after the updating is finished so as to realize dynamic optimization of the gear shifting strategy.
The technical solution of the present invention is described below with reference to the accompanying drawings and examples.
Step one, determining a gear shifting strategy state variable and an action variable. In the embodiment, the gear shifting strategy state variables are the vehicle speed v, the acceleration and the accelerator opening α_t; the action variable is the gear n_g.
Step two, determining the Markov decision process of the gear shifting strategy. In the embodiment, the Markov decision process is determined from the state variables (vehicle speed, acceleration, accelerator opening) and the action variable (gear). The Markov decision process state transfer function T is:
[state transfer function equation, given in the original as an image]
where T_e is the engine output torque; i_g is the transmission ratio corresponding to gear n_g; i_0 is the final drive ratio; η_t is the driveline efficiency; m is the total vehicle mass; β is the equivalent gradient resistance coefficient; C_d is the air resistance coefficient; A is the vehicle frontal area; F_b is the braking force; r is the effective rolling radius of the tire; and ρ is the air density.
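Since the embodiment's transfer function appears only as an image, the Python sketch below reconstructs a plausible longitudinal-dynamics form from the listed symbols: traction force from the engine torque through the gearbox and final drive, minus braking force, equivalent gradient resistance and aerodynamic drag, divided by the vehicle mass. The exact equation of the patent may differ (for example in how rotating masses are handled), so this is an assumption for illustration only.

    def acceleration(v, T_e, i_g, i_0, eta_t, F_b, m, beta, C_d, A, rho, r, g=9.81):
        """Assumed longitudinal dynamics built from the listed symbols: returns dv/dt."""
        F_drive = T_e * i_g * i_0 * eta_t / r      # traction force at the wheels
        F_grade = m * g * beta                     # equivalent gradient resistance
        F_air = 0.5 * rho * C_d * A * v ** 2       # aerodynamic drag
        return (F_drive - F_b - F_grade - F_air) / m

    def next_speed(v, dv_dt, dt=0.1):
        """Euler step of the speed over one control interval (dt is an assumed value)."""
        return max(0.0, v + dv_dt * dt)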
Step three, establishing the reinforcement learning gear shifting strategy reward function according to the gear shifting target. In the present embodiment, the learning objective is a dynamic gear shifting strategy, which requires the engineering machinery and the vehicle to reach the highest speed in the shortest time t under the comfort constraint. The reward and punishment mechanism is as follows:
[reward and punishment mechanism equation, given in the original as an image]
where r is the reward calculated by the reward and punishment mechanism; r_t is the temporary reward, r_t = -0.001·||v_Tmax - v||; v_Tmax is the maximum vehicle speed at the current accelerator opening α_t; j is the impact degree of the engineering machinery and the vehicle; and j_max is the designed maximum allowable impact degree.
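The reward and punishment mechanism itself is given in the original only as an image; the sketch below therefore only combines the pieces stated in the text: the temporary reward r_t = -0.001·||v_Tmax - v|| and a punishment when the impact degree j exceeds the allowable j_max. The branch structure and the penalty value are assumptions.

    def reward(v, v_Tmax, j, j_max, penalty=-1.0):
        """Assumed reward/punishment: pull the speed toward v_Tmax, punish excessive jerk."""
        r_t = -0.001 * abs(v_Tmax - v)     # temporary reward as stated in the text
        if abs(j) > j_max:                 # comfort constraint violated
            return r_t + penalty           # penalty value is an assumption
        return r_t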
And step four, solving a deep reinforcement learning gear shifting strategy according to the Markov decision process in the step two and the reward function in the step three. Firstly, a Markov chain is calculated through an established Markov decision process and a reward function, the Markov chain is stored in an experience pool, and then a prediction Q network in a deep reinforcement learning gear shifting strategy is updated according to data in the experience pool. The flow of step four is shown in fig. 2. The specific steps are as follows.
The first step: initialize the state variable and the action variable, and calculate the state at the next moment according to the established Markov decision process transfer function.
The second step: calculate the reward through the designed reward and punishment mechanism.
The third step: save the above state, action, reward and next-moment state into the experience pool in the form of a Markov chain.
The fourth step: take the state at the next moment as the current state, let the predicted Q network calculate the Q value of each action for the current state, and let the greedy algorithm determine the actually selected gear in the current state from these Q values; then return to the first step and repeat.
In the above steps, when the number of markov chains in the experience pool reaches a predetermined number, updating of the predictive Q network is started.
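A compact Python sketch of this four-step loop of fig. 2 follows (names such as select_action and the threshold batch_ready are assumptions; the transfer, reward and Q-network-update routines stand for the ones sketched elsewhere in this description):

    def collect_experience(select_action, transfer, reward_fn, s0, steps, pool, batch_ready=500):
        """Roll the Markov decision process forward, store Markov chains, and flag when
        the experience pool is full enough to start updating the predicted Q network."""
        s = s0
        for _ in range(steps):
            a = select_action(s)                     # greedy-algorithm gear choice
            s_next = transfer(s, a)                  # first step: next state from T
            r = reward_fn(s, a, s_next)              # second step: reward/punishment
            pool.append((s, a, r, s_next))           # third step: save <s_t, a_t, r_t, s_{t+1}>
            if len(pool) >= batch_ready:
                pass  # predetermined number reached: sample a minibatch and update Q_p here
            s = s_next                               # fourth step: next state becomes current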
The updating process of the predicted Q network is completed by the predicted Q network and the target Q network together, and the updating method of the predicted Q network comprises the following steps:
Q_p(s, a, θ_p) = Q_p(s, a, θ_p) + α(r + γ·max_a Q_t(s, a, θ_t) - Q_p(s, a, θ_p))²
where γ is the reward discount value; α is the neural network learning rate; Q_t is the target Q network and θ_t are the target Q network parameters.
During the updating of the predicted Q network, the parameters of the predicted Q network are periodically copied into the target Q network, which realizes the updating of the target Q network.
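Assuming the two networks are PyTorch modules, the periodic copy of the predicted Q network parameters θ_p into the target Q network θ_t can be sketched as:

    def sync_target_network(q_p, q_t):
        """Copy theta_p into theta_t; called every fixed number of updates (the interval is a design choice)."""
        q_t.load_state_dict(q_p.state_dict())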
The predicted Q network and the target Q network have the same neural network structure. In the present embodiment, the neural network structure model adopted by the predicted Q network and the target Q network is shown in fig. 3. The neural network structure model has five fully connected layers as intermediate layers and adopts the linear rectification function ReLU as the activation function. The output of a fully connected layer with ReLU activation is expressed as:
ReLU(x) = max(0, Wx + b)
where W is the weight matrix of the neural network; b is the bias vector of the neural network; x is the neural network input.
In the present embodiment, the neural network input is the state variable (vehicle speed, accelerator opening, acceleration), and the output layer gives the Q value corresponding to every gear n_g. The larger the Q value, the larger the maximum discounted cumulative reward that can be obtained by selecting the corresponding gear in the current state.
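A sketch of the described network structure (an assumed PyTorch implementation; the hidden-layer width is not stated in the patent and is chosen here only for illustration): three state inputs, five fully connected intermediate layers with ReLU activations, and one output Q value per gear.

    import torch.nn as nn

    class ShiftQNetwork(nn.Module):
        """Q network: input (vehicle speed, accelerator opening, acceleration),
        five fully connected hidden layers with ReLU, output one Q value per gear n_g."""
        def __init__(self, n_gears: int, hidden: int = 64):
            super().__init__()
            layers, width = [], 3                    # three state inputs
            for _ in range(5):                       # five fully connected intermediate layers
                layers += [nn.Linear(width, hidden), nn.ReLU()]
                width = hidden
            layers.append(nn.Linear(width, n_gears)) # Q value for every gear
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x)

The predicted Q network and the target Q network would then be two instances of such a structure holding separate parameters θ_p and θ_t.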
And step five, putting the predicted Q network calculated in the step four into a gear shifting strategy controller, and selecting gears by the engineering machinery and the vehicle according to the gear shifting strategy controller in the driving process of the engineering machinery and the vehicle.
In the fifth step, the engineering machinery and the vehicle select gears according to the gear shifting strategy controller. The concrete expression is:
a*(s) = argmax_a [Q_p(s, a, θ_p) | a ∈ A]
where Q_p is the predicted Q network and θ_p are the predicted Q network parameters.
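In deployed form, the controller's gear choice a*(s) can be sketched as follows (q_predict stands for the trained predicted Q network; the name is an assumption):

    def select_gear(q_predict, state_vec, gears):
        """Deterministic controller choice: the gear with the largest predicted Q value."""
        q_values = q_predict(state_vec)
        best = max(range(len(gears)), key=lambda i: q_values[i])
        return gears[best]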
And step six, in the driving process, collecting the driving data of the engineering machinery and the vehicle, storing the driving data into an experience pool, periodically updating the predicted Q network, and putting the predicted Q network into a gear shifting strategy controller after the updating is finished so as to realize dynamic optimization of the gear shifting strategy.
In step six, two methods are available for updating the predicted Q network. The first method is to reconstruct the transfer function of the second step from the driving data of the engineering machinery and the vehicle, and then update the predicted Q network according to the third step and the fourth step. The second method is to update the predicted Q network directly according to the predicted Q network updating method of the fourth step.
In the sixth step, the first method for updating the predicted Q network reconstructs the transfer function of the second step from the collected driving data of the engineering machinery and the vehicle; the reconstruction either recalculates the parameters of the transfer function, giving a transfer function with the same structure but different parameters, or fits the transfer function with a neural network, linear fitting, a Fourier transform method and the like. In this embodiment, the reconstructed transfer function is:
[reconstructed transfer function equation, given in the original as an image]
Depending on the reconstruction method, in this embodiment the parameters of the transfer function may be recalculated, or the transfer function may be fitted with a neural network, linear fitting or a Fourier transform. Whichever form of reconstruction is used, the reconstructed transfer function can be uniformly expressed as:
s_{t+1} = T_new(s_t, a_t, Θ)
where Θ is the set of transfer function parameters.
After the reconstruction is finished, the fourth step and the fifth step need to be carried out again to obtain a new prediction Q network.
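One concrete way to carry out such a reconstruction is sketched below: a linear least-squares fit of the next state from the logged driving data, standing in for the recalculated-parameter or neural-network/Fourier fits mentioned above. The linear form is only an illustrative assumption, not the patent's prescribed choice.

    import numpy as np

    def fit_transfer_function(states, actions, next_states):
        """Fit T_new(s_t, a_t, Theta) from logged driving data by linear least squares.
        states: (N, d_s) array, actions: (N, d_a) array, next_states: (N, d_s) array."""
        X = np.hstack([states, actions, np.ones((len(states), 1))])   # [s_t, a_t, 1]
        Theta, *_ = np.linalg.lstsq(X, next_states, rcond=None)

        def T_new(s, a):
            x = np.concatenate([np.atleast_1d(s), np.atleast_1d(a), [1.0]])
            return x @ Theta

        return T_new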
In the sixth step, the second method for updating the predicted Q network is to collect the driving data of the engineering machinery and the vehicle and then update the predicted Q network according to the updating method of the fourth step, realizing the dynamic optimization of the gear shifting strategy; the dynamic optimization process is shown in fig. 4. The specific process is as follows:
the first step is as follows: collecting engineering machinery and vehicle running data
The second step is that: the collected driving data of the engineering machinery and the vehicle are processed, and the processed data are expressed in a Markov chain form and can be expressed as follows:
<st,at,rt,st+1>
the third step: the updating of the predicted Q network is completed by the predicted Q network and the target Q network together, and the method comprises the following steps:
Qp(s,a,θp)=Qp(s,a,θp)+α(r+γmaxaQt(s,a,θt)-Qp(s,a,θp))2
in addition to the above embodiments, the present invention may have other embodiments. All technical solutions formed by adopting equivalent substitutions or equivalent transformations fall within the protection scope of the claims of the present invention.

Claims (10)

1. A gear shifting strategy dynamic optimization method based on deep reinforcement learning is characterized by comprising the following steps:
step (1): determining a gear shifting strategy state input variable and an action output variable;
step (2): determining a Markov decision process of a gear shifting strategy according to the state input variable and the action output variable in the step (1);
step (3): establishing a reinforcement learning gear shifting strategy reward function according to a gear shifting strategy target;
step (4): solving a deep reinforcement learning gear shifting strategy according to the Markov decision process in step (2) and the reward function in step (3); firstly, Markov chains are calculated through the Markov decision process and the reward function and stored in an experience pool, and the predicted Q network in the deep reinforcement learning gear shifting strategy is then updated according to the data in the experience pool;
step (5): putting the predicted Q network calculated in step (4) into a gear shifting strategy controller; during driving, the engineering machinery and the vehicle select gears according to the gear shifting strategy controller;
step (6): during driving, collecting the driving data of the engineering machinery and the vehicle and storing them in the experience pool, periodically updating the predicted Q network, and putting the updated predicted Q network back into the gear shifting strategy controller, so as to realize dynamic optimization of the gear shifting strategy.
2. The method of claim 1, wherein the state input variables in step (1) include the vehicle speed v, the acceleration, the accelerator opening α_t, the running gradient and the ground friction resistance coefficient; the action output variables include a gear operation or a shift operation, wherein a gear operation is upshifting, downshifting or holding the current gear, and a shift operation is the selected gear n_g.
3. The method of claim 2, wherein the Markov decision process of the gear shifting strategy in step (2) is expressed as a transfer function that maps the current state and the selected action to the state at the next moment, the transfer function being of the form:
s_{t+1} = T(s_t, a_t)
where s_{t+1} is the state variable at the next moment, s_t is the current state variable and a_t is the selected action variable, with s ∈ S and a ∈ A, S being the set of state variables and A being the set of action variables.
4. The method of claim 3, wherein the shift strategy reward function of step (3) is positively correlated with shift strategy objectives, including power, economy, and comfort.
5. The method according to claim 4, wherein the gear shifting strategy target is a dynamic gear shifting strategy, described as the engineering machinery and the vehicle reaching the highest speed in the shortest time t under the comfort constraint, and the reward and punishment mechanism is as follows:
[reward and punishment mechanism equation, given in the original as an image]
where r is the reward calculated by the reward and punishment mechanism; r_t is the temporary reward, r_t = -0.001·||v_Tmax - v||; v_Tmax is the maximum vehicle speed at the current accelerator opening α_t; j is the impact degree of the engineering machinery and the vehicle; and j_max is the designed maximum allowable impact degree.
6. The method of claim 4, wherein the Markov chain of step (4) is of the form:
<s_t, a_t, r_t, s_{t+1}>
where r_t is the temporary reward calculated from the reward target.
7. The method according to claim 6, wherein the deep reinforcement learning method in step (4) comprises two neural networks with the same structure but different parameters, namely a predicted Q network and a target Q network, wherein the predicted Q network is used for calculating Q values of actions in the current state, and the target Q network is used for updating the predicted Q network.
8. The method of claim 7, wherein, when the Markov chain is established in step (4), the action variable a_t is selected by a greedy algorithm, which is expressed as:
[greedy action-selection rule, given in the original as an image]
where Q_p is the predicted Q network, θ_p are the predicted Q network parameters, and e is the greedy algorithm parameter;
in step (4), the Markov chain is saved into an experience pool, and the predicted Q network in the deep reinforcement learning gear shifting strategy is then updated according to the data in the experience pool; the predicted Q network is used to calculate the Q values of the gear set A in the driving state s_t, and the output of the predicted Q network is Q_p(s, A, θ_p).
9. The method according to claim 8, wherein in step (5), during driving the engineering machinery and the vehicle select gears according to the gear shifting strategy controller, and the controller selects the appropriate gear a* according to the predicted Q network:
a*(s) = argmax_a [Q_p(s, a, θ_p) | a ∈ A]
where Q_p is the predicted Q network and θ_p are the predicted Q network parameters.
10. The method of claim 9, wherein the driving data collected in step (6) comprise: vehicle speed, accelerator opening, acceleration, running gradient and ground friction resistance coefficient;
two methods are available for updating the predicted Q network in step (6): the first is to reconstruct the transfer function of step (2) from the driving data of the engineering machinery and the vehicle and then update the predicted Q network according to steps (3) and (4); the second is to update the predicted Q network directly according to the predicted Q network updating method of step (4);
in the first method, the transfer function of step (2) is reconstructed from the collected driving data either by recalculating the parameters of the transfer function, giving a transfer function with the same structure but different parameters, or by fitting the transfer function with a neural network, linear fitting or a Fourier transform method;
in the second method, the collected driving data are used directly with the predicted Q network updating method of step (4), which is:
Q_p(s, a, θ_p) = Q_p(s, a, θ_p) + α(r + γ·max_a Q_t(s, a, θ_t) - Q_p(s, a, θ_p))²
where γ is the reward discount value; α is the neural network learning rate; Q_t is the target Q network and θ_t are the target Q network parameters.
CN201911076016.XA 2019-11-06 2019-11-06 Gear shifting strategy dynamic optimization method based on deep reinforcement learning Active CN110716550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911076016.XA CN110716550B (en) 2019-11-06 2019-11-06 Gear shifting strategy dynamic optimization method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911076016.XA CN110716550B (en) 2019-11-06 2019-11-06 Gear shifting strategy dynamic optimization method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN110716550A true CN110716550A (en) 2020-01-21
CN110716550B CN110716550B (en) 2022-07-22

Family

ID=69213797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911076016.XA Active CN110716550B (en) 2019-11-06 2019-11-06 Gear shifting strategy dynamic optimization method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN110716550B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111487863A (en) * 2020-04-14 2020-08-04 东南大学 Active suspension reinforcement learning control method based on deep Q neural network
CN111882030A (en) * 2020-06-29 2020-11-03 武汉钢铁有限公司 Ingot adding strategy method based on deep reinforcement learning
CN111965981A (en) * 2020-09-07 2020-11-20 厦门大学 Aeroengine reinforcement learning control method and system
CN112395690A (en) * 2020-11-24 2021-02-23 中国人民解放军海军航空大学 Reinforced learning-based shipboard aircraft surface guarantee flow optimization method
CN112861269A (en) * 2021-03-11 2021-05-28 合肥工业大学 Automobile longitudinal multi-state control method based on deep reinforcement learning preferential extraction
CN114662982A (en) * 2022-04-15 2022-06-24 四川大学 Urban power distribution network multi-stage dynamic reconstruction method based on machine learning
CN116069014A (en) * 2022-11-16 2023-05-05 北京理工大学 Vehicle automatic control method based on improved deep reinforcement learning


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020079149A1 (en) * 2000-12-21 2002-06-27 Kotre Stephen John Adaptive fuel strategy for a hybrid electric vehicle
US20180018757A1 (en) * 2016-07-13 2018-01-18 Kenji Suzuki Transforming projection data in tomography by means of machine learning
CN107797534A (en) * 2017-09-30 2018-03-13 安徽江淮汽车集团股份有限公司 A kind of pure electronic automated driving system
CN108407797A (en) * 2018-01-19 2018-08-17 洛阳中科龙网创新科技有限公司 A method of the realization agricultural machinery self shifter based on deep learning
CN110244701A (en) * 2018-03-08 2019-09-17 通用汽车环球科技运作有限责任公司 The method and apparatus of intensified learning for the autonomous vehicle based on the course sequence automatically generated
CN110136481A (en) * 2018-09-20 2019-08-16 初速度(苏州)科技有限公司 A kind of parking strategy based on deeply study
CN109325624A (en) * 2018-09-28 2019-02-12 国网福建省电力有限公司 A kind of monthly electric power demand forecasting method based on deep learning
CN109991856A (en) * 2019-04-25 2019-07-09 南京理工大学 A kind of integrated control method for coordinating of robot driver vehicle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lu Wei: "Design of a tractor-driving robot and research on the human-machine cooperation method", Journal of Nanjing University of Information Science & Technology *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111487863B (en) * 2020-04-14 2022-06-17 东南大学 Active suspension reinforcement learning control method based on deep Q neural network
CN111487863A (en) * 2020-04-14 2020-08-04 东南大学 Active suspension reinforcement learning control method based on deep Q neural network
CN111882030A (en) * 2020-06-29 2020-11-03 武汉钢铁有限公司 Ingot adding strategy method based on deep reinforcement learning
CN111882030B (en) * 2020-06-29 2023-12-05 武汉钢铁有限公司 Ingot adding strategy method based on deep reinforcement learning
CN111965981A (en) * 2020-09-07 2020-11-20 厦门大学 Aeroengine reinforcement learning control method and system
CN111965981B (en) * 2020-09-07 2022-02-22 厦门大学 Aeroengine reinforcement learning control method and system
CN112395690A (en) * 2020-11-24 2021-02-23 中国人民解放军海军航空大学 Reinforced learning-based shipboard aircraft surface guarantee flow optimization method
CN112861269B (en) * 2021-03-11 2022-08-30 合肥工业大学 Automobile longitudinal multi-state control method based on deep reinforcement learning preferential extraction
CN112861269A (en) * 2021-03-11 2021-05-28 合肥工业大学 Automobile longitudinal multi-state control method based on deep reinforcement learning preferential extraction
CN114662982A (en) * 2022-04-15 2022-06-24 四川大学 Urban power distribution network multi-stage dynamic reconstruction method based on machine learning
CN114662982B (en) * 2022-04-15 2023-07-14 四川大学 Multistage dynamic reconstruction method for urban power distribution network based on machine learning
CN116069014A (en) * 2022-11-16 2023-05-05 北京理工大学 Vehicle automatic control method based on improved deep reinforcement learning
CN116069014B (en) * 2022-11-16 2023-10-10 北京理工大学 Vehicle automatic control method based on improved deep reinforcement learning

Also Published As

Publication number Publication date
CN110716550B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN110716550B (en) Gear shifting strategy dynamic optimization method based on deep reinforcement learning
CN108087541B (en) Multi-performance comprehensive optimal gear decision system of automobile stepped automatic transmission
CN111731303B (en) HEV energy management method based on deep reinforcement learning A3C algorithm
CN112943914B (en) Vehicle gear shifting line determining method and device, computer equipment and storage medium
DE112011104930T5 (en) Control equipment for vehicle drive system
CN110550034A (en) two-gear AMT comprehensive gear shifting method for pure electric vehicle
CN110985566B (en) Vehicle starting control method and device, vehicle and storage medium
CN110792762B (en) Method for controlling prospective gear shifting of commercial vehicle in cruise mode
CN106080155A (en) A kind of optimization integrated system driving motor and automatic transmission and shift control method
You et al. Shift strategy of a new continuously variable transmission based wheel loader
CN109733406A (en) Policy control method is travelled based on the pure electric automobile of fuzzy control and Dynamic Programming
CN106114492A (en) New-energy automobile automatic transmission power gear-shifting control system and control method
Mashadi et al. An automatic gear-shifting strategy for manual transmissions
CN113104023B (en) Distributed MPC network-connected hybrid electric vehicle energy management system and method
DE102010011018A1 (en) Vehicular power transmission control device
CN115805840A (en) Energy consumption control method and system for range-extending type electric loader
Zhao et al. Fuzzy determination of target shifting time and torque control of shifting phase for dry dual clutch transmission
CN103206524A (en) Gear-shifting control method of automatic gear box
CN109849897B (en) Hybrid power energy management method considering dynamic efficiency of coupling transmission system
Zou et al. Research on shifting process control of automatic transmission
CN106347373A (en) Dynamic planning method based on battery SOC (state of charge) prediction
CN107869579B (en) Fuzzy logic-based gear shifting rule control method and device and vehicle
Yin et al. Multi-performance optimal gearshift schedule of stepped automatic transmissions adaptive to road slope
Yin et al. Shift quality improvement through integrated control of dual clutches pressure and engine speed for DCT
EP3225885B1 (en) Method and control device for selecting a gear of a gearbox in a drivetrain of a motor vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant