CN115730529B - PHET energy management strategy generation method and system based on working condition identification - Google Patents

PHET energy management strategy generation method and system based on working condition identification

Info

Publication number
CN115730529B
Authority
CN
China
Prior art keywords
vehicle
neural network
driving
network
working condition
Prior art date
Legal status
Active
Application number
CN202211627066.4A
Other languages
Chinese (zh)
Other versions
CN115730529A (en)
Inventor
王姝
赵轩
韩琪
谢鹏辉
张凯
Current Assignee
Changan University
Original Assignee
Changan University
Priority date
Filing date
Publication date
Application filed by Changan University filed Critical Changan University
Priority to CN202211627066.4A priority Critical patent/CN115730529B/en
Publication of CN115730529A publication Critical patent/CN115730529A/en
Application granted granted Critical
Publication of CN115730529B publication Critical patent/CN115730529B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

A PHET energy management strategy generation method and system based on working condition identification, wherein the method comprises the following steps: constructing typical driving conditions of the vehicle in different driving scenes; identifying the real-time driving condition of the vehicle; constructing a neural network based on the DDPG algorithm and performing deep reinforcement learning on a source domain of the neural network to complete its training, wherein the source domain consists of the typical driving conditions of the vehicle in the different driving scenes; and transferring the trained neural network from the source domain to a target domain by means of transfer learning to generate a PHET energy management strategy that conforms to the characteristics of the driving scene, wherein the target domain is the real-time driving condition of the vehicle. The invention addresses the problems of existing learning-based energy management strategies: they cannot guarantee real-time optimization of energy consumption, their control strategies respond poorly to brand-new, complex, and highly variable driving conditions, their real-time application performance is poor, and they do not balance the energy loss of the plug-in hybrid electric vehicle's power battery against its aging state.

Description

PHET energy management strategy generation method and system based on working condition identification
Technical Field
The invention belongs to the technical field of new energy automobile design, and particularly relates to a PHET energy management strategy generation method and system based on working condition identification.
Background
With the rapid development of the new energy vehicle industry, hybrid electric vehicles achieve better fuel economy and lower exhaust emissions than conventional internal combustion engine vehicles, and offer a longer driving range than pure electric vehicles. Compared with conventional hybrid electric vehicles, plug-in hybrid electric vehicles can draw electric energy from the power grid through an external charger and adapt better to a variety of driving environments, so they have received wide attention and research in the commercial vehicle field. At present, research on optimizing and improving the performance of the plug-in hybrid heavy truck (PHET) focuses mainly on energy management control strategies. With the development of artificial intelligence, learning-based energy management strategies, particularly Deep Reinforcement Learning (DRL) methods, have proved effective for real-time energy management. However, the computational complexity of deep reinforcement learning limits its real-time application. At the same time, the complex and changeable driving conditions of a PHET place high demands on the energy management strategy.
The deep reinforcement learning-based energy management strategies currently involved have the following disadvantages:
1) A control strategy trained by deep reinforcement learning may be optimal under one working condition but suboptimal under another, so real-time optimization of energy consumption cannot be guaranteed.
2) When facing a brand-new working condition, even for an energy management problem in the same field, deep reinforcement learning must explore again and requires long computation time, making it difficult to meet the real-time requirements of an energy management strategy.
3) Traditional working condition identification constructs typical representative driving conditions on the basis of standard driving cycles; however, PHET driving conditions are highly variable and driving behavior differs across operation scenes, so standard driving cycles are insufficient to reflect the driving characteristics of a heavy truck in a specific scene.
4) Some existing energy management strategies use transfer learning to accelerate deep reinforcement learning and thus improve training efficiency, but they neither consider nor accurately identify the influence of the vehicle's real-time driving condition on the formulation and application of the energy management strategy.
5) Most existing deep reinforcement learning energy management strategies only focus on using intelligent information to optimize fuel economy while guaranteeing drivability, and do not consider the balance between battery energy loss and battery aging.
Disclosure of Invention
The invention aims to provide a PHET energy management strategy generation method and system based on working condition identification, which can obtain a control strategy conforming to the characteristics of the driving scene and solve the problems of existing learning-based energy management strategies: they cannot guarantee real-time optimization of energy consumption, their control strategies respond poorly to brand-new, complex, and highly variable driving conditions, their real-time application performance is poor, and they do not balance the energy loss of the plug-in hybrid electric vehicle's power battery against its aging state.
In order to achieve the above purpose, the present invention has the following technical scheme:
a PHET energy management strategy generation method based on working condition identification comprises the following steps:
constructing typical driving conditions of the vehicle in different driving scenes;
identifying real-time driving conditions of the vehicle;
constructing a neural network based on a DDPG algorithm, and performing deep reinforcement learning on a source domain of the neural network to complete training of the neural network, wherein the source domain is a typical driving condition of a vehicle under different driving scenes;
and transferring the trained neural network from a source domain to a target domain by adopting transfer learning to generate a PHET energy management strategy conforming to the characteristics of a driving scene, wherein the target domain is the real-time driving working condition of the vehicle.
As a preferred scheme, constructing the typical driving conditions of the vehicle under different operation scenes specifically comprises the following steps:
acquiring driving condition data of the vehicle in different operation scenes through cloud big data or vehicle-mounted OBD;
preprocessing running condition data by adopting wavelet decomposition and reconstruction, and performing kinematic segmentation on the preprocessed data;
performing dimension reduction processing on characteristic parameters describing the characteristics of each kinematic segment by adopting a principal component analysis algorithm;
the method comprises the steps of classifying the kinematics segments by adopting an SVM and K-means mixed classification algorithm, and constructing typical driving conditions under different operation scenes by using a Markov chain and a Monte Carlo simulation method on the basis of classification completion.
As a preferable scheme, the learning vector quantization is selected as the working condition identifier when the real-time driving working condition of the vehicle is identified.
As a preferable scheme, the identifying the real-time driving condition of the vehicle specifically includes the following steps:
selecting characteristic parameters by calculating pearson correlation coefficients among classical characteristic parameters;
extracting and training corresponding characteristic parameters based on typical driving condition data of the vehicle in different driving scenes;
identifying the real-time driving condition of the vehicle by calculating the Pearson correlation coefficient among the characteristic parameters.
In the step of extracting and training the corresponding feature parameters, a sliding window mode is adopted to extract the parameters.
As a preferable scheme, when the real-time driving condition of the vehicle is identified by calculating the Pearson correlation coefficient among the characteristic parameters, 25 s is selected as the initial identification window, and in a rolling, cumulative manner the driving condition of the vehicle is judged from the accumulated historical working condition every 25 s.
As a preferable scheme, the DDPG algorithm is based to construct a neural network, and deep reinforcement learning is carried out on a source domain of the neural network, and training of the neural network is completed, wherein the training of the neural network comprises the design of a state space, an action space and a reward function of the deep reinforcement learning;
the design expression of the state space is as follows:
S={V,acc,SoC,SoH}
wherein V and acc are the vehicle speed and the vehicle acceleration respectively, soC is the battery charge state, and SoH is the battery health state;
the design expression of the motion space is as follows:
action = {P_eng | P_eng ∈ [0, 172 kW]}
wherein P_eng is the engine output power;
the design expression of the reward function is as follows:
J = {α[fuel(t) + elec(t)] + β[SoC(t) − SoC_ref]² + γ[SoH(t) − SoH_ref]}
where J is the objective function defined in the energy management problem, α is the fuel consumption weight, β is the battery charge maintenance weight, γ is the battery degradation cost weight, fuel is the fuel consumption, elec is the electric energy consumption, SoC_ref is the reference value of the battery SOC, and SoH_ref is the reference value of the battery state of health.
As a preferable scheme, the DDPG algorithm is used for constructing a neural network, deep reinforcement learning is carried out on a source domain of the neural network, and training of the neural network further comprises the steps of putting forward corresponding constraint on each part of the whole vehicle power assembly;
the constraint expression is as follows:
the DDPG algorithm is a deep reinforcement learning algorithm developed on the Actor-Critic architecture; the Actor-network μ(s|θ^μ) maps the input state observations to a deterministic action through a neural network, and the Critic-network Q(s|θ^Q) takes the action chosen by the Actor network and the current state observation as input to evaluate the quality of the current action;
a target Actor-network μ'(s|θ^μ') and a target Critic-network Q'(s|θ^Q') are introduced to estimate the Q-value:
y_t = r_t + γQ'(s_{t+1}, μ'(s_{t+1}|θ^μ') | θ^Q')
training Critic-network:
the objective of the DDPG algorithm is to minimize the expectation of the loss function by updating the network parameters, and the temporal difference (TD) error is computed as follows:
wherein L is the average loss and N is the fixed size of the mini-batch randomly selected from the experience replay buffer;
for the Actor-network μ(s|θ^μ), the purpose of action selection is to maximize the Q-value, so the update of the parameters θ^μ is solved numerically with a gradient method, and the derived chain rule is as follows:
in addition, the target networks μ' and Q' learn with time-lagged (soft) updates, with the following specific expressions:
wherein τ is the soft update factor, and θ and θ' are the original network and target network parameters, respectively;
on the premise of ensuring the fuel economy of the whole vehicle, the controller is enabled to search the optimal solution in a smaller action space.
As a preferred solution, transferring the trained neural network from the source domain to the target domain by transfer learning to generate a PHET energy management strategy conforming to the driving scene characteristics comprises: on the basis of a given source domain M_s and target domain M_t, obtaining through transfer learning the optimal strategy π* of the target domain M_t learned from the source domain M_s, thereby realizing the transfer from the source domain to the target domain; the source network and the target network use the same DDPG architecture.
A PHET energy management strategy generation system based on condition identification, comprising:
the typical working condition construction module is used for constructing typical driving working conditions of the vehicle in different operation scenes;
the real-time working condition identification module is used for identifying the real-time driving working condition of the vehicle;
The neural network training module is used for constructing a neural network based on a DDPG algorithm, and performing deep reinforcement learning on a source domain of the neural network to complete training of the neural network, wherein the source domain is a typical driving condition of a vehicle in different driving scenes;
the transfer learning module is used for transferring the trained neural network from a source domain to a target domain by adopting transfer learning to generate a PHET energy management strategy which accords with the driving scene characteristics, wherein the target domain is the real-time driving working condition of the vehicle.
Compared with the prior art, the invention has at least the following beneficial effects:
the control strategy conforming to the characteristics of the driving scene is obtained by combining the driving condition recognition technology and the depth deterministic strategy gradient (DDPG) algorithm based on the Transfer Learning (TL), so that the PHET comprehensive performance is improved, the system efficiency and the adaptability are improved, and the control strategy is obviously superior to the existing energy management strategy based on the depth reinforcement learning. According to the PHET energy management strategy generation method, transfer Learning (TL) is adopted to realize the transfer of the energy management strategy between a source domain (PHET typical driving working condition constructed based on data driving) and a target domain (PHET real-time driving working condition identified based on a neural network working condition identification algorithm), the convergence rate of energy management strategy training can be accelerated, the timeliness of an energy management control strategy is further effectively improved, and the adaptability of the PHET energy management strategy to various complex driving working conditions is improved.
Drawings
FIG. 1 is a schematic diagram of an overall energy management control framework for a plug-in hybrid vehicle;
FIG. 2 is a schematic view of the driving parameters after wavelet decomposition and reconstruction preprocessing;
FIG. 3 is a schematic flow diagram based on a Markov chain and Monte Carlo simulation;
FIG. 4 is a schematic diagram of three exemplary cycle conditions constructed: (a) urban construction; (b) mining; (c) coal;
FIG. 5 is a schematic diagram of a learning vectorized neural network;
FIG. 6 is a map diagram of an engine (including an optimal equivalent fuel consumption curve);
FIG. 7 is an engine output power distribution for three energy management strategies for three operating scenarios: (a) scene one; (b) scene two; (c) scenario three.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, one of ordinary skill in the art may also obtain other embodiments without undue burden.
As shown in FIG. 1, the overall PHET energy management control framework is divided into an upper layer and a lower layer, the upper layer is a working condition identification framework based on LVQ, and on the basis of the built PHET running working condition, the running scene of the vehicle is identified by collecting real-time data of the running of the vehicle. The lower layer is an energy management control framework based on deep transfer reinforcement learning of a DDPG algorithm, and based on the actual running scene of the automobile identified by the upper layer, the transfer learning technology applies the neural network which is fully trained through Deep Reinforcement Learning (DRL) in the corresponding scene to the actual working condition, so that the aim that the current neural network converges with fewer learning rounds under the target working condition is fulfilled, the energy management strategy can be generated quickly, and the optimal performance is ensured. The PHET energy management strategy generation method based on the working condition identification is realized by four steps of building a typical driving working condition, identifying the real-time driving working condition of a vehicle, pre-training and storing the representative working condition of each operation scene based on deep reinforcement learning of a DDPG algorithm, and realizing the transfer of a pre-trained neural network from a source domain to a target domain by adopting transfer learning.
Step 1: the construction of the typical driving working condition specifically comprises the following steps:
step 1.1: and acquiring driving condition data of the vehicle in different application scenes through cloud big data/vehicle-mounted OBD. The PHET studied in the invention has three main application scenarios: urban construction dregs transport vehicle, mining transport vehicle and coal transport vehicle.
Step 1.2: because the collected raw data often contain burrs, abrupt changes and similar artifacts, wavelet decomposition and reconstruction are used to smooth and denoise the raw data before the typical driving conditions are constructed. The data after wavelet decomposition and reconstruction preprocessing are shown in fig. 2. The preprocessed data are then divided into kinematic segments according to the following criteria: idle segment (vehicle speed < 2 km/h, −0.15 m/s² < acceleration < 0.15 m/s²), acceleration segment (acceleration ≥ 0.15 m/s²), deceleration segment (acceleration ≤ −0.15 m/s²), and cruise segment (vehicle speed ≥ 2 km/h, −0.15 m/s² < acceleration < 0.15 m/s²).
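For illustration, the segmentation rule above can be written as a short routine. The following is a minimal sketch rather than the patent's implementation: it assumes a preprocessed speed trace sampled at 1 Hz, and all function and variable names are illustrative.

```python
import numpy as np

def split_kinematic_segments(speed_kmh, dt=1.0):
    """Split a preprocessed speed trace (km/h, sampled every dt seconds) into
    idle / acceleration / deceleration / cruise segments using the thresholds
    above (2 km/h and +/-0.15 m/s^2)."""
    speed_ms = np.asarray(speed_kmh, dtype=float) / 3.6
    acc = np.gradient(speed_ms, dt)                     # m/s^2

    def label(v_kmh, a):
        if a >= 0.15:
            return "acceleration"
        if a <= -0.15:
            return "deceleration"
        return "idle" if v_kmh < 2.0 else "cruise"

    labels = [label(v, a) for v, a in zip(speed_kmh, acc)]

    # merge consecutive samples with the same label into segments
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start * dt, i * dt))
            start = i
    return segments

if __name__ == "__main__":
    demo = [0, 0, 5, 12, 20, 28, 30, 30, 30, 22, 10, 2, 0]   # km/h
    for kind, t0, t1 in split_kinematic_segments(demo):
        print(f"{kind:12s} {t0:5.0f}s - {t1:5.0f}s")
```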
Step 1.3: and adopting a Principal Component Analysis (PCA) algorithm to perform dimension reduction processing on characteristic parameters describing characteristics of each motion segment. The principal component analysis results of the characteristic parameters are shown in table 1.
TABLE 1
Principal component    Variance    Contribution rate (%)    Cumulative contribution rate (%)
1                      5.5762      55.762                   55.762
2                      2.0218      20.228                   75.990
3                      1.0112      10.112                   86.102
4                      0.7913      7.913                    94.015
5                      0.3975      3.975                    97.990
As can be seen from table 1, the first 3 principal components all have variances greater than 1, the fourth is close to 1 and the cumulative contribution of the first four principal components is greater than 90%, so the four characteristic parameters basically contain most of the information of the original variables. The first four principal components, namely maximum speed, minimum speed, average speed and standard deviation of speed, are chosen to characterize the kinematics of the driving segment.
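A minimal sketch of this dimension-reduction step, assuming scikit-learn is available; the placeholder data and the selection rule (variance greater than 1 or cumulative contribution above 90 %) follow the discussion above, and all names are illustrative rather than taken from the patent.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Each row is one kinematic segment described by feature columns such as
# maximum speed, minimum speed, average speed, speed standard deviation, etc.
rng = np.random.default_rng(0)
segment_features = rng.random((200, 10))          # placeholder data

X = StandardScaler().fit_transform(segment_features)
pca = PCA()
scores = pca.fit_transform(X)

var = pca.explained_variance_
cum = np.cumsum(pca.explained_variance_ratio_) * 100
n_keep = max(np.searchsorted(cum, 90.0) + 1, int(np.sum(var > 1.0)))
print(f"retained principal components: {n_keep}")
reduced = scores[:, :n_keep]                      # input to SVM / K-means classification
```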
Step 1.4: and classifying the motion fragments by adopting an SVM and K-means mixed classification algorithm. On the basis of classification, a Markov chain and Monte Carlo simulation method are utilized to construct representative circulation working conditions under three application scenes. The process flow based on Markov chain and Monte Carlo simulation is shown in FIG. 3. The vehicle speed-time relationship for the three typical representative cycling conditions constructed is shown in the graphs (a), (b), and (c) of fig. 4.
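The Markov chain / Monte Carlo synthesis can be sketched as follows, assuming the speed is discretized into fixed-width states and a first-order transition matrix is estimated from the classified traces; this is only an illustrative outline of the idea, not the patent's exact procedure.

```python
import numpy as np

def build_transition_matrix(speed_kmh, bin_width=2.0):
    """Estimate a first-order Markov transition matrix over discretized
    speed states from a measured speed trace (km/h, 1 Hz)."""
    states = (np.asarray(speed_kmh) // bin_width).astype(int)
    n = states.max() + 1
    T = np.zeros((n, n))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1
    return T / np.maximum(T.sum(axis=1, keepdims=True), 1)   # row-normalize

def monte_carlo_cycle(T, duration_s=1800, bin_width=2.0, seed=0):
    """Sample a synthetic driving cycle from the transition matrix."""
    rng = np.random.default_rng(seed)
    state, cycle = 0, []
    for _ in range(duration_s):
        cycle.append(state * bin_width)
        p = T[state]
        state = rng.choice(len(p), p=p) if p.sum() > 0 else 0
    return np.array(cycle)

# usage with a placeholder measured trace
measured = np.abs(np.cumsum(np.random.default_rng(1).normal(0, 1.5, 3600)))
typical_cycle = monte_carlo_cycle(build_transition_matrix(measured))
print(typical_cycle[:10])
```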
Step 2: the method for identifying the real-time driving working condition of the automobile specifically comprises the following steps:
the embodiment of the invention selects a data driving mode to perform modeling of working condition identification.
Because learning vector quantization (LVQ) offers high accuracy and strong practicality in working condition recognition, the invention selects learning vector quantization as the working condition identifier. The structure of the learning vector quantization neural network is shown in fig. 5. The LVQ neural network consists mainly of a competitive layer and a linear layer. In the competitive layer, the network classifies the input vector X by combining competitive learning and supervised learning; the process comprises two parts: selecting the best matching neuron, and adaptively updating the weight vector. In the linear layer, the classification result of the competitive layer is mapped to the target classes defined by the user. The driving scenarios of the PHET are divided into 3 categories: scene 1 represents an urban construction dregs transport vehicle, scene 2 represents a mining transport vehicle, and scene 3 represents a coal transport vehicle.
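A minimal LVQ1-style sketch of the competitive layer described above (winner selection plus supervised prototype update); the class and parameter names are illustrative, and the update rule is the textbook LVQ1 rule rather than necessarily the exact variant used in the patent.

```python
import numpy as np

class SimpleLVQ:
    """Minimal LVQ1 classifier: a competitive layer of prototype vectors and a
    linear mapping of the winning prototype to a user-defined class."""

    def __init__(self, prototypes_per_class=3, lr=0.05, epochs=30, seed=0):
        self.k, self.lr, self.epochs = prototypes_per_class, lr, epochs
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        classes = np.unique(y)
        # initialize prototypes from random samples of each class
        self.proto_X = np.vstack([
            X[self.rng.choice(np.where(y == c)[0], self.k)] for c in classes
        ]).astype(float)
        self.proto_y = np.repeat(classes, self.k)
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                w = np.argmin(np.linalg.norm(self.proto_X - xi, axis=1))   # winner
                sign = 1.0 if self.proto_y[w] == yi else -1.0              # supervised step
                self.proto_X[w] += sign * self.lr * (xi - self.proto_X[w])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.proto_X[None], axis=2)
        return self.proto_y[np.argmin(d, axis=1)]

# usage: SimpleLVQ().fit(train_features, train_scene_labels).predict(new_features)
```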
Step 2.1: and 10 characteristic parameters including maximum speed, average speed, speed standard deviation, average acceleration, average deceleration, acceleration standard deviation, acceleration proportion (acceleration time/total time), deceleration proportion (deceleration time/total time), constant speed proportion (constant speed time/total time) and idle speed proportion (idle speed time/total time) are selected by calculating the Pearson correlation coefficient among classical characteristic parameters.
Specific values of the characteristic parameters of each driving scene are shown in table 2.
TABLE 2
Characteristic parameter             Urban construction    Mining    Coal
Maximum vehicle speed (km/h)         64.87                 65.02     86.00
Average speed (km/h)                 18.78                 13.80     41.74
Standard deviation of speed          21.13                 15.70     27.97
Average acceleration (m/s²)          0.31                  0.38      0.30
Average deceleration (m/s²)          −0.42                 −0.45     −0.46
Standard deviation of acceleration   0.23                  0.27      0.28
Acceleration ratio (%)               0.18                  0.17      0.25
Deceleration ratio (%)               0.12                  0.15      0.17
Constant speed ratio (%)             0.30                  0.27      0.12
Idle speed ratio (%)                 0.39                  0.40      0.44
Step 2.2: and (3) extracting and training the corresponding characteristic parameters based on the 3 typical driving cycle representative working condition data established in the step (1). When determining the training set, the invention adopts a sliding window mode to extract parameters in order to increase the number of training samples. Because the duration of the synthesized representative working condition is limited, a single working condition is insufficient to provide enough training samples, the working conditions established in the previous step are repeated for 10 times in series to form the representative working condition, the window length is selected to be 1800s, and the speed interval in the window is extracted by taking 100s as the interval time and is input into the identifier for training.
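A sketch of the 10-feature extraction with an 1800 s window slid every 100 s, assuming a 1 Hz speed trace; the thresholds reuse the ±0.15 m/s² and 2 km/h values from step 1.2, and all names are illustrative rather than from the patent.

```python
import numpy as np

def window_features(speed_kmh, dt=1.0):
    """Compute the 10 statistical features listed above for one speed window."""
    v = np.asarray(speed_kmh, dtype=float)
    a = np.diff(v / 3.6) / dt                                   # m/s^2
    total = len(v) * dt
    return np.array([
        v.max(), v.mean(), v.std(),
        a[a > 0.15].mean() if np.any(a > 0.15) else 0.0,        # average acceleration
        a[a < -0.15].mean() if np.any(a < -0.15) else 0.0,      # average deceleration
        a.std(),
        np.sum(a >= 0.15) * dt / total,                         # acceleration ratio
        np.sum(a <= -0.15) * dt / total,                        # deceleration ratio
        np.sum((np.abs(a) < 0.15) & (v[1:] >= 2)) * dt / total, # constant-speed ratio
        np.sum((np.abs(a) < 0.15) & (v[1:] < 2)) * dt / total,  # idle ratio
    ])

def sliding_windows(cycle_kmh, window_s=1800, step_s=100, dt=1.0):
    """Extract training samples by sliding an 1800 s window every 100 s."""
    n, w, s = len(cycle_kmh), int(window_s / dt), int(step_s / dt)
    return np.array([window_features(cycle_kmh[i:i + w], dt)
                     for i in range(0, n - w + 1, s)])
```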
Step 2.3: the real-time driving condition of the vehicle is identified by calculating the Pearson correlation coefficient among the characteristic parameters. When recognizing the driving condition from real vehicle data, 25 s is selected as the initial identification window, and in a rolling, cumulative manner the vehicle driving condition is judged from the accumulated historical working condition every 25 s. Because the initially selected segment is short, it cannot reflect the characteristics of the whole driving cycle; as the accumulated duration grows, the working condition characteristics represent the vehicle's driving scene more and more closely. Combining this with the practical situation, the driving scene that the condition recognition module identifies most frequently over the accumulated historical inputs within the first 1000 s of driving is taken as the operating scene of the current trip.
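The rolling 25 s recognition can be sketched as below, reusing the window_features helper from the previous sketch and assuming each scene is represented by a reference feature vector such as the columns of Table 2; this correlation-voting scheme is an illustrative reading of the description, not the patent's exact logic.

```python
import numpy as np

def recognize_scene(history_kmh, scene_feature_table, every_s=25, dt=1.0):
    """Every 25 s, compute features of the accumulated history and vote for the
    scene whose typical feature vector has the highest Pearson correlation.
    `scene_feature_table` maps scene name -> reference feature vector."""
    votes, step = [], int(every_s / dt)
    for end in range(step, len(history_kmh) + 1, step):
        f = window_features(history_kmh[:end])        # accumulated history so far
        corr = {name: np.corrcoef(f, ref)[0, 1]
                for name, ref in scene_feature_table.items()}
        votes.append(max(corr, key=corr.get))
    # the scene recognized most often (e.g. within the first 1000 s) is kept
    names, counts = np.unique(votes, return_counts=True)
    return names[np.argmax(counts)], votes
```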
Step 3: the deep reinforcement learning based on the DDPG algorithm pre-trains and stores the representative working conditions of each operation scene, and specifically comprises the following steps:
a deep reinforcement learning method based on a DDPG algorithm is adopted to pretrain the source domain, namely 3 typical representative driving condition data constructed in the first part. The overall algorithm of DDPG is as follows:
step 3.1: the state, action and rewards of deep reinforcement learning are designed as follows:
In the design of the state space, the embodiment of the invention considers not only the energy consumption of the whole system but also the balance between system energy consumption and battery aging, so the battery state of charge (SOC) and state of health (SOH) are selected in addition to the vehicle motion variables. The entire state space is represented by the following formula (1):
S={V,acc,SoC,SoH} (1)
v and acc are the vehicle speed and the vehicle acceleration respectively, soC is the battery charge state, soH is the battery health state, and these variables are the key parameters representing the vehicle running state.
In designing the operating space, since the control strategy for overall vehicle energy management aims to continuously control the mechanical power of the vehicle, the engine output power is used as a control variable, specifically as shown in the following formula (2):
action = {P_eng | P_eng ∈ [0, 172 kW]}   (2)
in the design of a reward function (objective function) of an energy management problem, several optimization targets of whole vehicle energy consumption, power battery SOC and battery degradation cost are comprehensively considered, so that the reward function is determined, and the concrete expression is as shown in the following formula (3):
J = {α[fuel(t) + elec(t)] + β[SoC(t) − SoC_ref]² + γ[SoH(t) − SoH_ref]}   (3)
where J is the objective function defined in the energy management problem, α is the weight of fuel consumption, β is the weight of battery charge maintenance, γ is the weight of battery degradation cost, fuel is the fuel consumption, elec is the electric energy consumption, SoC_ref is the reference value of the battery SOC, and SoH_ref is the reference value of the battery state of health.
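A small sketch of the cost of formula (3) used as an RL reward (the agent maximizes the negative cost); the weights and reference values shown are placeholders, not the calibrated values of the patent.

```python
def reward(fuel_g, elec_kwh, soc, soh, soc_ref=0.3, soh_ref=1.0,
           alpha=1.0, beta=100.0, gamma=50.0):
    """Cost of formula (3): energy use, SOC charge-sustaining penalty and
    battery degradation penalty; the DDPG agent receives the negative cost.
    All weights and reference values here are illustrative placeholders."""
    cost = (alpha * (fuel_g + elec_kwh)
            + beta * (soc - soc_ref) ** 2
            + gamma * (soh - soh_ref))
    return -cost
```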
Step 3.2: corresponding constraint is put forward on each part of the whole vehicle power assembly, and the specific expression is as shown in the following formula (4):
because the DDPG algorithm is a deep reinforcement learning algorithm developed based on the Actor-Critic architecture, wherein the Actor-network μ (s|θ) μ ) The input is a state observance, which is then mapped to a deterministic behavior by a neural network. Critic-network Q (s|θ) Q ) And inputting the action taken by the Actor network and the observed quantity of the current state to evaluate the quality of the current action.
Step 3.4: to reduce bias due to the estimation of Q-value by a single Critic and the selection of actions by a single Actor network, a target Actor-network μ' (s|θ) is introduced μ' ) And target critical-network Q' (s|θ) Q' ) To estimate the Q-value, the specific expression is as follows (5):
y t =r t +γQ'(s t+1 ,μ'(s t+1μ' )|θ Q' ) (5)
step 3.5: critic-network is trained. Small batches of experience are randomly selected from the experience pool, the loss function is calculated and the Critic-network parameters are updated, the goal of DDPG is to minimize the desire for the loss function by updating the network parameters, and the calculated time difference (td) -error is shown in the following equation (6):
where L is the average loss, N is the fixed size of the mini-batch, randomly selecting from the empirical replay buffer, for the Actor-network μ (s|θ μ ) The purpose of the selection action is to maximize the Q-value and thus the parameter θ μ The update of (2) can be numerically solved by using a gradient method, and the derived chain rule is shown in the following formula (7):
in addition, the target networks μ 'and Q', using time lag update, can greatly improve learning stability, and the specific expression is as shown in the following formula (8):
where τ is a soft update factor and θ' are the original network and target network parameters, respectively.
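A minimal PyTorch-style sketch of the critic loss, actor objective, and soft target update corresponding to formulas (6)–(8); it assumes the actor and critic are torch modules with signatures actor(state) → action and critic(state, action) → Q, and all names are illustrative rather than taken from the patent.

```python
import torch

def soft_update(target_net, source_net, tau=0.005):
    """Time-lagged (soft) update of formula (8): theta' <- tau*theta + (1-tau)*theta'."""
    for tp, sp in zip(target_net.parameters(), source_net.parameters()):
        tp.data.copy_(tau * sp.data + (1.0 - tau) * tp.data)

def critic_loss(critic, target_critic, target_actor, batch, gamma=0.99):
    """Mean-squared TD error of formula (6) over a mini-batch sampled from the
    experience replay buffer; `batch` is a dict of tensors s, a, r, s_next."""
    with torch.no_grad():
        a_next = target_actor(batch["s_next"])
        y = batch["r"] + gamma * target_critic(batch["s_next"], a_next)   # formula (5)
    q = critic(batch["s"], batch["a"])
    return torch.nn.functional.mse_loss(q, y)

def actor_loss(actor, critic, batch):
    """Deterministic policy gradient objective of formula (7): maximize Q(s, mu(s))."""
    return -critic(batch["s"], actor(batch["s"])).mean()
```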
Step 3.6: on the premise of ensuring the fuel economy of the whole vehicle, in order to effectively reduce the dimension of the action space of deep reinforcement learning, the controller is enabled to search for an optimal solution in a smaller action space so as to accelerate the convergence rate of reinforcement learning, and as shown in fig. 6, an optimal fuel consumption rate curve is constructed in a Map diagram of the engine, and when the engine works, any engine power corresponds to a rotating speed torque pair on the curve.
Step 4: the transfer from the source domain to the target domain of the pre-trained neural network is realized by adopting transfer learning, and the method specifically comprises the following steps:
based on the transferability of the neural network, the transfer learning technology can apply the neural network which is fully trained in the corresponding scene to the actual working condition. The transmission of the energy management policy between the source domain and the target domain can be achieved by combining the depth migration (DTL) algorithm with the DDPG algorithm.
The source domain consists of the three typical representative driving cycles of step 1, and the target domain is the real-time driving condition of the vehicle identified by the condition identification module in step 2. Because the driving cycles of the source domain and the target domain share the same feature space and are interrelated, on the basis of a given source domain M_s and target domain M_t, transfer learning can obtain the optimal strategy μ* of the target domain M_t learned from the source domain M_s, i.e., transfer of source-domain knowledge to the related target domain is realized. At the same time, because most of the parameters in the neural network are the same and only the parameters of the output layer need to be retrained, the source network and the target network both use the same DDPG architecture.
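A sketch of the network transfer described above, assuming the source-domain actor and critic are PyTorch modules: the target-domain agent starts from a copy of the source weights, and only the output layer is reinitialized and left trainable. This is an illustrative reading of "only the parameters of the output layer need to be retrained", with hypothetical names throughout.

```python
import copy
import torch.nn as nn

def transfer_ddpg_network(source_net: nn.Module, retrain_last_layer=True):
    """Start the target-domain network from the fully trained source-domain
    weights (same DDPG architecture); optionally reinitialize and retrain only
    the final linear layer while keeping the earlier layers frozen."""
    target_net = copy.deepcopy(source_net)
    if retrain_last_layer:
        for p in target_net.parameters():
            p.requires_grad = False                    # freeze transferred layers
        last_linear = [m for m in target_net.modules()
                       if isinstance(m, nn.Linear)][-1]
        last_linear.reset_parameters()                 # retrain output layer only
        for p in last_linear.parameters():
            p.requires_grad = True
    return target_net

# usage: actor_target_domain = transfer_ddpg_network(actor_source_domain)
```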
As described above, the PHET energy management strategy generation method based on the working condition identification combines the driving working condition identification technology with the deep reinforcement learning method based on the transfer learning, so as to obtain the control strategy conforming to the driving scene characteristics.
As shown in diagrams (a), (b) and (c) of fig. 7, with the DP-based energy management strategy as the benchmark, the engine operation under the proposed control strategy is closer to that of the DP-based strategy, and the proportion of engine operating points in the high-power range is smaller than that of a deep reinforcement learning energy management strategy that does not consider driving condition identification. This indicates that the energy management strategy provided by the invention achieves good fuel economy and a better energy saving effect than a DRL energy management strategy without driving condition identification.
As shown in table 3, because battery energy consumption is considered together with the dynamic balance of battery health when determining the state space and reward function of the DDPG, the energy management strategy proposed by the invention is slightly less economical if fuel consumption alone is taken as the index; however, when the degradation cost of the power battery is also taken into account, the control strategy of the invention effectively reduces the overall operating cost compared with a DDPG algorithm that ignores battery health.
TABLE 3 Table 3
Finally, as shown in table 4, the transfer learning technology is adopted, so that the training period can be effectively shortened in the training of the neural network, the number of iterative steps of convergence is reduced by about 50%, the convergence speed is increased, the real-time utilization of the energy strategy based on the working condition identification provided by the invention is facilitated, and the efficiency of the implementation of the whole vehicle control is improved.
TABLE 4 Table 4
The PHET energy management strategy generation method based on the working condition identification has at least the following advantages:
(1) A two-layer energy management framework is provided. The upper layer adopts a working condition identification framework based on learning vector quantization (LVQ), and the lower layer adopts a deep transfer reinforcement learning control framework based on the deep deterministic policy gradient (DDPG) algorithm; combining the driving condition identification technology with the transfer learning (TL) based DDPG algorithm yields a control strategy that conforms to the driving scene characteristics. With the dynamic programming (DP) based energy management strategy as a benchmark, the energy management strategy combining driving condition identification and deep reinforcement learning is closer to the DP-based strategy in terms of engine operation than a deep reinforcement learning energy management strategy without driving condition identification, and the proportion of engine operating points in the high-power range is smaller. Meanwhile, by adopting the driving condition identification technology and considering the battery state of health (SOH), the battery state of charge (SOC) trajectory declines more slowly and fluctuates less throughout the process. The energy management strategy provided by the invention can adopt a power distribution strategy that reflects the characteristics of the actual operation scene, has an obvious effect on improving the comprehensive performance of the PHET and the efficiency and adaptability of the system, and is clearly superior to existing energy management strategies based on deep reinforcement learning.
(2) The method is based on data driving, firstly, collected historical motion data fragments of a vehicle are classified by using a SVM and K-means mixed classification algorithm, and then, a Markov chain and a Monte Carlo simulation method are utilized to construct a typical representative cycle driving condition reflecting the actual operation scene and driving behavior of a plug-in hybrid power truck (PHET). Because the constructed driving working condition data is derived from the real driving related data of the vehicle, the driving working condition constructed based on the data is used as a source domain of transfer learning, a more accurate evaluation basis can be provided for actual energy consumption of PHET, and the proposed energy management strategy has important practical significance.
(3) An actual running condition of the vehicle is accurately identified by adopting a Learning Vectorization (LVQ) neural network condition identification algorithm. In order to enhance the accuracy of the recognition working conditions, 10 characteristic parameters, namely maximum speed, average speed, speed standard deviation, average acceleration, average deceleration, acceleration standard deviation, acceleration proportion (acceleration time/total time), deceleration proportion (deceleration time/total time), constant speed proportion (constant speed time/total time) and idle speed proportion (idle speed time/total time), are finally selected by calculating the pearson correlation coefficient among classical characteristic parameters. Meanwhile, when the training set is determined, in order to improve the number of training samples, the invention adopts a sliding window mode to extract parameters. By the method, on the premise of ensuring usability, accuracy of real-time working condition identification is greatly improved, and guarantee is provided for the presentation of a target domain for later transfer learning.
(4) The lower control framework adopts the deep deterministic policy gradient (DDPG) algorithm. The DDPG algorithm of the invention improves training efficiency, the stability of the training process and the robustness of the model by adopting prioritized experience replay to remove randomness and dependence among samples. Meanwhile, extra noise is added to the output of the Actor network so that the DDPG algorithm explores better and corrects action selection; the influence of different action noises on algorithm performance is compared, and Soft-max action noise (SAN) is adopted.
(5) In determining the state space and reward function of the DDPG algorithm, compared with conventional optimization algorithms that only consider the energy consumption of the whole system, the invention also considers the balance between battery energy consumption and battery aging, and includes the battery state of health (SOH) in the state space. At the same time, when determining the objective function (reward function), the battery degradation cost is introduced into the optimization objective together with the fuel consumption term. In this way, the running state of the vehicle in each time period is represented more comprehensively and deeply, the comprehensive performance of the whole vehicle under the algorithm is further improved, and the battery degradation cost is reduced along with the fuel consumption cost and electric energy consumption cost of the whole vehicle.
(6) On the premise of ensuring the fuel economy, in order to effectively reduce the dimension of the DDPG algorithm action space, an optimal fuel consumption rate curve is constructed in an engine map, and when the engine is running, any engine power corresponds to a rotating speed torque pair on the curve. Therefore, the controller can search the optimal solution in a smaller action space, and the convergence speed of reinforcement learning is further increased.
(7) Transfer Learning (TL) is adopted to realize the transfer of energy management strategies between a source domain (PHET typical driving working condition constructed based on data driving) and a target domain (PHET real-time driving working condition identified based on LVQ neural network working condition identification algorithm), and the optimal strategy of the target domain is learned from the source domain on the basis that the source domain provides priori knowledge accessible to the target domain. Therefore, the convergence speed of energy management strategy training can be increased, the timeliness of the energy management control strategy is further effectively improved, and the adaptability of the energy management control strategy under the variable and complex driving working conditions is improved.
Another embodiment of the present invention further provides a PHET energy management policy generating system based on condition recognition, including:
the typical working condition construction module is used for constructing typical driving working conditions of the vehicle in different operation scenes;
The real-time working condition identification module is used for identifying the real-time driving working condition of the vehicle;
the neural network training module is used for constructing a neural network based on a DDPG algorithm, and performing deep reinforcement learning on a source domain of the neural network to complete training of the neural network, wherein the source domain is a typical driving condition of a vehicle in different driving scenes;
the transfer learning module is used for transferring the trained neural network from a source domain to a target domain by adopting transfer learning to generate a PHET energy management strategy which accords with the driving scene characteristics, wherein the target domain is the real-time driving working condition of the vehicle.
Another embodiment of the present invention also proposes a computer readable storage medium storing a computer program which, when executed by a processor, implements the PHET energy management policy generation method based on condition recognition of the present invention.
For example, the instructions stored in the memory may be partitioned into one or more modules/units that are stored in a computer readable storage medium and executed by the processor to perform the PHET energy management policy generation method of the invention based on condition identification. The one or more modules/units may be a series of computer readable instruction segments capable of performing a specified function, which describes the execution of the computer program in a server.
The electronic equipment can be a smart phone, a notebook computer, a palm computer, a cloud server and other computing equipment. The electronic device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the electronic device may also include more or fewer components, or may combine certain components, or different components, e.g., the electronic device may also include input and output devices, network access devices, buses, etc.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the server, such as a hard disk or a memory of the server. The memory may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the server. Further, the memory may also include both an internal storage unit and an external storage device of the server. The memory is used to store the computer readable instructions and other programs and data required by the server. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above module units is based on the same concept as the method embodiment, specific functions and technical effects thereof may be referred to in the method embodiment section, and details thereof are not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing device/terminal apparatus, recording medium, computer Memory, read-Only Memory (ROM), random access Memory (RAM, random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (6)

1. The PHET energy management strategy generation method based on the working condition identification is characterized by comprising the following steps of:
constructing typical driving conditions of the vehicle in different driving scenes;
identifying real-time driving conditions of the vehicle;
constructing a neural network based on a DDPG algorithm, and performing deep reinforcement learning on a source domain of the neural network to complete training of the neural network, wherein the source domain is a typical driving condition of a vehicle under different driving scenes;
Transferring the trained neural network from a source domain to a target domain by adopting transfer learning, and generating a PHET energy management strategy conforming to the characteristics of a driving scene, wherein the target domain is the real-time driving working condition of the vehicle;
the method for identifying the real-time driving condition of the vehicle specifically comprises the following steps of:
selecting characteristic parameters by calculating pearson correlation coefficients among classical characteristic parameters;
extracting and training corresponding characteristic parameters based on typical driving condition data of the vehicle in different driving scenes;
identifying the real-time driving condition of the vehicle by calculating the pearson correlation coefficient among the characteristic parameters;
the method comprises the steps of constructing a neural network based on a DDPG algorithm, performing deep reinforcement learning on a source domain of the neural network, and completing training of the neural network, wherein the training of the neural network comprises the steps of designing a state space, an action space and a reward function of the deep reinforcement learning;
the design expression of the state space is as follows:
S={V,acc,SoC,SoH}
wherein V and acc are the vehicle speed and the vehicle acceleration respectively, soC is the battery charge state, and SoH is the battery health state;
the design expression of the motion space is as follows:
action = {P_eng | P_eng ∈ [0, 172 kW]}
wherein P_eng is the engine output power;
the design expression of the bonus function is as follows:
J = {α[fuel(t) + elec(t)] + β[SoC(t) − SoC_ref]² + γ[SoH(t) − SoH_ref]}
where J is the objective function defined in the energy management problem, α is the fuel consumption weight, β is the battery charge maintenance weight, γ is the battery degradation cost weight, fuel is the fuel consumption, elec is the electric energy consumption, SoC_ref is the reference value of the battery SOC, and SoH_ref is the reference value of the battery state of health;
the method comprises the steps of constructing a neural network based on a DDPG algorithm, performing deep reinforcement learning on a source domain of the neural network, and completing training of the neural network, wherein the training of the neural network also comprises the steps of providing corresponding constraint for each part of the whole vehicle power assembly;
the constraint expression is as follows:
the DDPG algorithm is a deep reinforcement learning algorithm developed on the Actor-Critic architecture; the Actor-network μ(s|θ^μ) maps the input state observations to a deterministic action through a neural network, and the Critic-network Q(s|θ^Q) takes the action chosen by the Actor network and the current state observation as input to evaluate the quality of the current action;
a target Actor-network μ'(s|θ^μ') and a target Critic-network Q'(s|θ^Q') are introduced to estimate the Q-value:
y_t = r_t + γQ'(s_{t+1}, μ'(s_{t+1}|θ^μ') | θ^Q')
training Critic-network:
the objective of the DDPG algorithm is to minimize the expectation of the loss function by updating the network parameters, and the temporal difference (TD) error is computed as follows:
wherein L is average loss, N is the fixed size of mini-batch, randomly selected from the experience replay buffer;
for the Actor-network μ(s|θ^μ), the purpose of action selection is to maximize the Q-value, so the update of the parameters θ^μ is solved numerically with a gradient method, and the derived chain rule is as follows:
in addition, the target networks μ 'and Q' learn with time lag updates, and the specific expressions are as follows:
wherein τ is the soft update factor, and θ and θ' are the original network and target network parameters, respectively;
on the premise of ensuring the fuel economy of the whole vehicle, the controller is enabled to find the optimal solution in a smaller action space;
transferring the trained neural network from the source domain to the target domain by transfer learning to generate a PHET energy management strategy conforming to the driving scene characteristics comprises: on the basis of a given source domain M_s and target domain M_t, obtaining through transfer learning the optimal strategy π* of the target domain M_t learned from the source domain M_s, thereby realizing the transfer from the source domain to the target domain; the source network and the target network both use the same DDPG architecture.
2. The method for generating the PHET energy management strategy based on the working condition identification according to claim 1, wherein the construction of the typical driving working condition of the vehicle under different driving scenes comprises the following steps:
acquiring driving condition data of the vehicle in different operation scenes through cloud big data or vehicle-mounted OBD;
Preprocessing running condition data by adopting wavelet decomposition and reconstruction, and performing kinematic segmentation on the preprocessed data;
performing dimension reduction processing on characteristic parameters describing the characteristics of each kinematic segment by adopting a principal component analysis algorithm;
the method comprises the steps of classifying the kinematics segments by adopting an SVM and K-means mixed classification algorithm, and constructing typical driving conditions under different operation scenes by using a Markov chain and a Monte Carlo simulation method on the basis of classification completion.
3. The method for generating the PHET energy management strategy based on the working condition identification according to claim 1, wherein the learning vector quantization is selected as the working condition identifier when the real-time driving working condition of the vehicle is identified.
4. The method for generating a PHET energy management strategy based on condition recognition according to claim 1, wherein in the step of extracting and training the corresponding feature parameters, a sliding window is adopted to extract the parameters.
5. The method for generating the PHET energy management strategy based on the working condition identification according to claim 1, wherein when the real-time driving working condition of the vehicle is identified by calculating the pearson correlation coefficient among the characteristic parameters, 25s is selected as an initial identification view field, and the rolling superposition mode is adopted to judge the driving working condition of the vehicle for every 25s for the accumulated history working condition.
6. A PHET energy management strategy generation system based on condition identification, comprising:
the typical working condition construction module is used for constructing typical driving working conditions of the vehicle in different operation scenes;
the real-time working condition identification module is used for identifying the real-time driving working condition of the vehicle;
the neural network training module is used for constructing a neural network based on a DDPG algorithm, and performing deep reinforcement learning on a source domain of the neural network to complete training of the neural network, wherein the source domain is a typical driving condition of a vehicle in different driving scenes;
the transfer learning module is used for transferring the trained neural network from a source domain to a target domain by adopting transfer learning to generate a PHET energy management strategy which accords with the driving scene characteristics, wherein the target domain is the real-time driving working condition of the vehicle;
wherein the identifying of the real-time driving condition of the vehicle specifically comprises the following steps:
selecting characteristic parameters by calculating the Pearson correlation coefficients among classical characteristic parameters;
extracting and training the corresponding characteristic parameters based on typical driving condition data of the vehicle in different driving scenes;
identifying the real-time driving condition of the vehicle by calculating the Pearson correlation coefficients among the characteristic parameters;
the constructing of the neural network based on the DDPG algorithm and the performing of deep reinforcement learning on the source domain of the neural network to complete the training of the neural network comprises designing a state space, an action space and a reward function of the deep reinforcement learning;
the design expression of the state space is as follows:
S={V,acc,SoC,SoH}
wherein V and acc are the vehicle speed and the vehicle acceleration respectively, SoC is the battery state of charge, and SoH is the battery state of health;
the design expression of the action space is as follows:
action = {P_eng ∈ [0, 172 kW]}
wherein P_eng is the output power of the engine;
the design expression of the reward function is as follows:
J = α[fuel(t) + elec(t)] + β[SoC(t) − SoC_ref]^2 + γ[SoH(t) − SoH_ref]
wherein J is the objective function defined in the energy management, α is the weight of fuel consumption, β is the weight of battery charge sustaining, γ is the weight of battery degradation cost, fuel is the fuel consumption, elec is the electric energy consumption, SoC_ref is the reference value of the battery SoC, and SoH_ref is the reference value of the battery state of health;
the method comprises the steps of constructing a neural network based on a DDPG algorithm, performing deep reinforcement learning on a source domain of the neural network, and completing training of the neural network, wherein the training of the neural network also comprises the steps of providing corresponding constraint for each part of the whole vehicle power assembly;
the constraint expression is as follows:
the DDPG algorithm is a deep reinforcement learning algorithm developed on the basis of the Actor-Critic architecture; the Actor network μ(s|θ^μ) maps the input state observations to a deterministic action through a neural network, while the Critic network Q(s,a|θ^Q) takes the action chosen by the Actor network and the current state observation as inputs to evaluate the quality of the current action;
a target Actor network μ′(s|θ^{μ′}) and a target Critic network Q′(s,a|θ^{Q′}) are introduced to estimate the Q-value:
y_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′})
wherein γ here denotes the discount factor;
training the Critic network: the objective of the DDPG algorithm is to minimize the expectation of the loss function by updating the network parameters, and the temporal-difference (TD) error is calculated as follows:
L = (1/N) Σ_{i=1}^{N} [y_i − Q(s_i, a_i|θ^Q)]^2
wherein L is the average loss and N is the fixed size of the mini-batch randomly sampled from the experience replay buffer;
for the Actor network μ(s|θ^μ), the purpose of selecting the action is to maximize the Q-value, so the parameter θ^μ is solved numerically by a gradient method, with the derived chain rule as follows:
∇_{θ^μ}J ≈ (1/N) Σ_i ∇_a Q(s,a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}
in addition, the target networks μ′ and Q′ learn with time-lagged (soft) updates, and the specific expression is as follows:
θ′ ← τθ + (1 − τ)θ′
wherein τ is the soft update factor, and θ and θ′ are the original network and target network parameters respectively;
on the premise of ensuring the fuel economy of the whole vehicle, the controller is enabled to find the optimal solution in a smaller action space;
the transferring of the trained neural network from the source domain to the target domain by transfer learning to generate a PHET energy management strategy conforming to the driving scene characteristics comprises: on the basis of a given source domain M_s and a target domain M_t, obtaining through transfer learning the optimal strategy π* of the target domain M_t learned from the source domain M_s, thereby realizing the transfer from the source domain to the target domain, wherein the source network and the target network both use the same DDPG architecture (an illustrative update-step sketch follows this claim).
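The state, action, reward and update equations laid out in this claim map onto a standard DDPG training step. The sketch below, which reuses the illustrative Actor/Critic classes shown after claim 1 and assumes PyTorch, implements the TD target y_t, the mean-squared critic loss L, the deterministic policy gradient for the Actor, and the soft target update θ′ ← τθ + (1 − τ)θ′; the reward helper mirrors the objective J with illustrative weights and reference values, not the patented calibration.

```python
# Minimal DDPG update-step sketch (assumption): hyper-parameters, weights and
# reference values are illustrative.
import torch
import torch.nn.functional as F

def reward(fuel, elec, soc, soh, soc_ref=0.5, soh_ref=1.0,
           alpha=1.0, beta=10.0, gamma_w=1.0):
    """Negative of J = α[fuel + elec] + β[SoC − SoC_ref]^2 + γ[SoH − SoH_ref]."""
    return -(alpha * (fuel + elec) + beta * (soc - soc_ref) ** 2
             + gamma_w * (soh - soh_ref))

def ddpg_update(actor, critic, actor_t, critic_t, opt_a, opt_c, batch,
                gamma=0.99, tau=0.005):
    s, a, r, s_next = batch                        # mini-batch from the replay buffer

    # Critic: minimise the TD error against the target networks
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))  # y_t = r_t + γ·Q'(s', μ'(s'))
    critic_loss = F.mse_loss(critic(s, a), y)      # L = (1/N) Σ [y_i − Q(s_i, a_i)]^2
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor: ascend the Q-value of its own action (deterministic policy gradient)
    actor_loss = -critic(s, actor(s)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Soft (time-lagged) target-network updates: θ' ← τθ + (1 − τ)θ'
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```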
CN202211627066.4A 2022-12-16 2022-12-16 PHET energy management strategy generation method and system based on working condition identification Active CN115730529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211627066.4A CN115730529B (en) 2022-12-16 2022-12-16 PHET energy management strategy generation method and system based on working condition identification


Publications (2)

Publication Number Publication Date
CN115730529A CN115730529A (en) 2023-03-03
CN115730529B true CN115730529B (en) 2024-02-27

Family

ID=85301512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211627066.4A Active CN115730529B (en) 2022-12-16 2022-12-16 PHET energy management strategy generation method and system based on working condition identification

Country Status (1)

Country Link
CN (1) CN115730529B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7954579B2 (en) * 2008-02-04 2011-06-07 Illinois Institute Of Technology Adaptive control strategy and method for optimizing hybrid electric vehicles

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104071161A (en) * 2014-04-29 2014-10-01 福州大学 Method for distinguishing working conditions and managing and controlling energy of plug-in hybrid electric vehicle
CN108198425A (en) * 2018-02-10 2018-06-22 长安大学 A kind of construction method of Electric Vehicles Driving Cycle
CN113051667A (en) * 2021-03-29 2021-06-29 东南大学 Accelerated learning method for energy management strategy of hybrid electric vehicle
CN114969982A (en) * 2022-06-14 2022-08-30 南京航空航天大学 Fuel cell automobile deep reinforcement learning energy management method based on strategy migration
CN115150787A (en) * 2022-07-06 2022-10-04 四川大学 Deployment system and method of energy management strategy package based on deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Deterministic Policy Gradient Based Energy Management Strategy for Hybrid Electric Tracked Vehicle With Online Updating Mechanism; Z. Ma; IEEE Access; 7280-7292 *
Research on PHEV Control Strategy Based on LVQ Driving Condition Recognition; Yin Andong; Jiang Tao; Vehicle & Power Technology (02); 1-6 *
Adaptive Energy Management Control Strategy for Hybrid Electric Vehicles Based on LVQ Driving Condition Recognition; Deng Tao; China Mechanical Engineering; Vol. 27, No. 3; 1-6 *

Also Published As

Publication number Publication date
CN115730529A (en) 2023-03-03

Similar Documents

Publication Publication Date Title
CN110341690B (en) PHEV energy management method based on deterministic strategy gradient learning
Liu et al. Optimal power management based on Q-learning and neuro-dynamic programming for plug-in hybrid electric vehicles
Lian et al. Cross-type transfer for deep reinforcement learning based hybrid electric vehicle energy management
Qi et al. Development and evaluation of an evolutionary algorithm-based online energy management system for plug-in hybrid electric vehicles
Lin et al. An ensemble learning velocity prediction-based energy management strategy for a plug-in hybrid electric vehicle considering driving pattern adaptive reference SOC
Sun et al. High robustness energy management strategy of hybrid electric vehicle based on improved soft actor-critic deep reinforcement learning
Qi et al. Generalization ability of hybrid electric vehicle energy management strategy based on reinforcement learning method
CN113051667B (en) Accelerated learning method for energy management strategy of hybrid electric vehicle
Kong et al. A novel torque distribution strategy based on deep recurrent neural network for parallel hybrid electric vehicle
Zhang et al. Tackling SOC long-term dynamic for energy management of hybrid electric buses via adaptive policy optimization
Lin et al. A driving-style-oriented adaptive control strategy based PSO-fuzzy expert algorithm for a plug-in hybrid electric vehicle
Song et al. A power management strategy for parallel PHEV using deep Q-networks
Yang et al. Reinforcement learning-based real-time intelligent energy management for hybrid electric vehicles in a model predictive control framework
CN115107733A (en) Energy management method and system for hybrid electric vehicle
Fang et al. Online power management strategy for plug-in hybrid electric vehicles based on deep reinforcement learning and driving cycle reconstruction
Liessner et al. Safe deep reinforcement learning hybrid electric vehicle energy management
Yang et al. Real-time energy management for a hybrid electric vehicle based on heuristic search
Shao et al. Failure detection for motion prediction of autonomous driving: An uncertainty perspective
CN115730529B (en) PHET energy management strategy generation method and system based on working condition identification
CN117251705A (en) Daily natural gas load prediction method
CN117465301A (en) Fuel cell automobile real-time energy management method based on data driving
CN116968721A (en) Predictive energy management method, system and storage medium for hybrid electric vehicle
CN116796821A (en) Efficient neural network architecture searching method and device for 3D target detection algorithm
Zhang et al. SSIT: a sample selection-based incremental model training method for image recognition
CN115719478A (en) End-to-end automatic driving method for accelerated reinforcement learning independent of irrelevant information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant