CN112668239B

CN112668239B - Hybrid power truck fleet experience teaching method based on adversarial learning

Info

Publication number: CN112668239B
Application number: CN202011618869.4A
Authority: CN
Inventors: 衣丰艳; 申阳; 胡东海; 周稼铭; 王金波; 衣杰; 李伟; 鲁大钢; 林海
Original assignee: Shandong Jiaotong University
Current assignee: Shandong Jiaotong University
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2022-11-15
Anticipated expiration: 2040-12-30
Also published as: CN112668239A

Abstract

The invention discloses an experience teaching method for a hybrid truck fleet based on adversarial learning. The method has three steps. First, it extracts the original driving conditions from the controller of hybrid truck A and processes their parameters. Second, it uses the processed data and adversarial network learning to generate driving conditions and an SOC (state of charge) trajectory map, and judges how well they fit the real data. Third, it repeats this fitting-degree judgment in a training loop; the optimum is reached when the generated driving conditions and SOC trajectory map can no longer be distinguished from the real data. Through adversarial network learning, the vehicle controllers of hybrid trucks B and C learn the optimal SOC trajectories under different driving conditions. This improves the equivalent fuel economy of the fleet and so realizes experience teaching within the fleet.

Description

Hybrid power truck fleet experience teaching method based on adversarial learning

Technical Field

The invention relates to the technical field of truck platooning, and in particular to an experience teaching method for hybrid truck platoons based on adversarial learning.

Background

Platoon following by hybrid trucks is a leading application area of automated driving. By controlling the inter-vehicle distances and the running state of the platoon, it reduces aerodynamic drag and so improves the vehicles' equivalent fuel economy. Experience teaching methods based on artificial-intelligence learning let a new truck platoon quickly learn the SOC (state of charge) trajectory of a platoon that already has driving experience. However, existing experience teaching for platoons learns directly from the vehicle controller that holds this experience. Without that controller the new vehicles cannot learn at all, and a learning gap easily arises that delays transport and lowers transport efficiency. The field therefore urgently needs an AI-based experience transfer method that lets a new hybrid truck platoon learn the SOC trajectory of a fixed route in a short time, without access to an experienced platoon's vehicle controller.

Disclosure of Invention

This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and title of the application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.

The invention has been made in view of these problems in conventional hybrid truck fleet learning.

The technical problem solved by the invention is therefore twofold: the equivalent fuel economy of vehicles in a truck fleet is low, and during running control of the fleet it cannot be determined whether the control quantities are in their optimal state.

To solve these technical problems, the invention provides the following technical solution:
(1) extract the original driving conditions from the controller of hybrid truck A and process their parameters;
(2) use the processed data and adversarial network learning to generate driving conditions and an SOC trajectory map, and judge the degree of fit with the real data;
(3) repeat the fitting-degree judgment in a training loop; the optimal result is reached when the generated driving conditions and SOC trajectory map can no longer be distinguished from the real data.

In a preferred aspect of the adversarial-learning-based hybrid truck fleet experience teaching method of the invention, the parameter processing of the original driving conditions works as follows. The original driving conditions are analyzed in a collector, and specific characteristic parameters of the driving conditions are extracted and classified. The different classes of characteristic parameters are then fed, as random noise, into a generator G, i.e., the controller of hybrid truck B.

In a preferred aspect of the method of the invention, the adversarial network learning involves the controllers of hybrid trucks B and C. The controller of hybrid truck B combines driving conditions at random using the processing result. The controller of hybrid truck C generates driving conditions and an SOC trajectory map from that combination, and judges their degree of fit with the real driving conditions and trajectory. The controller of hybrid truck C reaches its optimum when the generated result can no longer be distinguished from the real result.

In a preferred aspect of the method of the invention, the driving conditions and the SOC trajectory map are generated as follows. The neural network of the multilayer perceptron that forms generator G maps the random noise to a new data distribution, yielding a random characteristic-parameter driving condition, denoted G(z). G(z) is fed to the controller of hybrid truck C, i.e., the discriminator D. That controller processes G(z) with its own algorithm to generate the driving conditions and the SOC trajectory map. The real driving conditions and trajectory are the real driving-condition and SOC trajectory map of hybrid truck A.
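Beyond calling it "a multilayer perceptron", the patent does not specify the generator's architecture. The NumPy forward pass below is a minimal sketch of how such a network maps an input vector z to a new distribution G(z). The layer sizes, the tanh activation and the Gaussian z (a stand-in for the classified condition parameters) are all hypothetical choices, and training is not shown. It is not a description of truck B's actual controller.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Randomly initialise (weight, bias) pairs for a fully connected network."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weight = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        bias = np.zeros(n_out)
        params.append((weight, bias))
    return params

def generator(z, params):
    """Forward pass G(z): tanh hidden layers, linear output layer."""
    h = z
    for i, (weight, bias) in enumerate(params):
        h = h @ weight + bias
        if i < len(params) - 1:
            h = np.tanh(h)
    return h

rng = np.random.default_rng(0)
# Hypothetical sizes: 8 input parameters -> two hidden layers -> 4 generated condition parameters.
params = init_mlp([8, 32, 32, 4], rng)
z = rng.normal(size=(16, 8))   # stand-in batch of 16 noise / parameter vectors
g_z = generator(z, params)
print(g_z.shape)  # (16, 4)
```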

In a preferred aspect of the method of the invention, the real driving conditions and trajectory are the real driving-condition and SOC trajectory maps of hybrid truck A.

In a preferred aspect of the method of the invention, the fitting-degree judgment works as follows. The real driving-condition and SOC trajectory map is taken as real data x drawn from the distribution p_data(x). The degree of fit between x and the random characteristic parameters G(z) is then judged according to:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

where p_data(x) is the real data distribution and p_z(z) is the random noise input.
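Given discriminator outputs, the value V(D,G) = E[log D(x)] + E[log(1 - D(G(z)))] can be estimated by sample averages. The sketch below assumes that D's outputs on a batch of real samples and a batch of generated samples are already available as probabilities. The function name `value_function` and the clipping constant are illustrative choices.

```python
import numpy as np

def value_function(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples x ~ p_data
    d_fake: discriminator outputs D(G(z)) on generated samples, z ~ p_z
    """
    # Clip to avoid log(0) for overconfident outputs.
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminator that outputs 0.5 everywhere cannot tell the two apart:
print(value_function([0.5, 0.5], [0.5, 0.5]))  # -log(4) ≈ -1.386
# A confident, correct discriminator pushes V toward its upper bound of 0:
print(value_function([0.99, 0.98], [0.01, 0.02]))
```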

In a preferred aspect of the method of the invention, the fitting-degree judgment further proceeds as follows.
After the judgment formula has evaluated the two inputs x and G(z), the discriminator D outputs a probability value. This value represents how well the SOC and driving-condition trajectory fit for driving.
A probability close to 1 for the real data indicates that the controller of hybrid truck C can find, with its own algorithm, an SOC trajectory suited to the road section described by the randomly combined condition parameters.
A probability close to 0 for the generated data indicates that it cannot. In that case the discriminator D feeds back to generator G the parameters of the generated data that need adjusting, and G adjusts them and regenerates.
The training loop continues until D can no longer distinguish the real data from the generated data. At that point generator G is judged to have reached its optimal state.

In a preferred aspect of the method of the invention, the real data distribution and the random noise input are handled as follows in the fitting-degree judgment. The discriminator D pushes its output on data drawn from the real distribution p_data(x) toward 1. Generator G produces samples from noise z ~ p_z(z); D judges these real or fake and pushes its output on them toward 0. Hence D aims to maximize V(D,G) while G aims to minimize V(D,G), which reduces the gap between the real data and the generated data.

In a preferred aspect of the method of the invention, the training loop works as follows. At each iteration, the discriminator D gives the optimal value for the current state. The adversarial network updates D several times and then updates generator G once, so that D first reaches its current optimum before G takes one step. Iterating this loop drives P_g toward P_data. For fixed G, the optimal discriminator that decides the final result is:

$$D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

From this formula, when P_g = P_data, D*_G(x) = 1/2 everywhere. The generator G and the discriminator D are then optimal, i.e., hybrid trucks B and C have mastered the optimal SOC trajectory under that driving condition.
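For a fixed generator, the standard GAN result is that the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). This holds because, at each point x, the function a log d + b log(1 - d) is maximized at d = a / (a + b). The sketch below checks this on two hypothetical histograms over a three-point support, and also shows that D* equals 1/2 once p_g = p_data. The histograms are made-up illustration values, not truck data.

```python
import numpy as np

def optimal_discriminator(p_data, p_g):
    """Pointwise D*(x) = p_data(x) / (p_data(x) + p_g(x)) for fixed G."""
    p_data = np.asarray(p_data, dtype=float)
    p_g = np.asarray(p_g, dtype=float)
    return p_data / (p_data + p_g)

def value_discrete(d, p_data, p_g):
    """V(D, G) for distributions on a finite support (exact expectations)."""
    return np.sum(p_data * np.log(d)) + np.sum(p_g * np.log(1.0 - d))

p_data = np.array([0.5, 0.3, 0.2])   # hypothetical real-condition histogram
p_g = np.array([0.2, 0.3, 0.5])      # hypothetical generated histogram
d_star = optimal_discriminator(p_data, p_g)
print(d_star)  # ≈ [0.714, 0.5, 0.286]

# No random perturbation of D* achieves a higher value of V.
rng = np.random.default_rng(1)
v_star = value_discrete(d_star, p_data, p_g)
perturbed = [np.clip(d_star + rng.normal(0.0, 0.05, size=3), 1e-6, 1 - 1e-6)
             for _ in range(1000)]
print(all(value_discrete(d, p_data, p_g) <= v_star + 1e-12 for d in perturbed))  # True

# When p_g == p_data, D* is 1/2 everywhere.
print(optimal_discriminator(p_data, p_data))  # [0.5 0.5 0.5]
```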

In a preferred aspect of the method of the invention, the training loop further uses a data similarity measure: the distance measure between probability distributions in adversarial learning is replaced with the Wasserstein distance:

$$W(P_{data}, P_g) = \inf_{\gamma \in \Pi(P_{data}, P_g)} \mathbb{E}_{(x,y)\sim\gamma}\left[\lVert x - y \rVert\right]$$

The Wasserstein distance reflects the distance between P_data and P_g even when the two distributions do not overlap. This makes adversarial training more stable.

Beneficial effects of the invention: in adversarial-learning-based platooning of hybrid vehicles, experience no longer has to be transferred directly from the vehicle controller that holds the optimal SOC trajectory for a road section; it is learned through the adversarial network instead. The driving-condition characteristic parameters and SOC trajectory of that controller are still collected, so the platoon vehicles trained by adversarial learning follow the optimal SOC trajectory. This improves equivalent fuel economy.

Drawings

To illustrate the technical solutions of the embodiments more clearly, the drawings used in describing them are briefly introduced below. These drawings show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:

FIG. 1 is a schematic flow chart of the adversarial-learning-based hybrid truck fleet experience teaching method according to the first embodiment of the invention;

FIG. 2 is a schematic diagram of the adversarial-learning-based hybrid truck fleet experience teaching method according to the first embodiment of the invention.

Detailed Description

To make the above objects, features and advantages of the invention clearer and easier to understand, specific embodiments are described in detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.

The following description sets out many specific details to give a thorough understanding of the invention. The invention can nevertheless be implemented in ways other than those described here, and a person skilled in the art can make similar generalizations without departing from its spirit. The invention is therefore not limited to the specific embodiments disclosed below.

Furthermore, the references herein to "one embodiment" or "an embodiment" refer to a particular feature, structure, or characteristic that may be included in at least one implementation of the present invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

The invention is described in detail with reference to the drawings. For convenience of illustration, cross-sectional views of device structures are not partially enlarged to the usual scale. The drawings are merely exemplary and should not be construed as limiting the protection scope of the invention. In addition, actual fabrication should include the three dimensions of length, width and depth.

In the description of the invention, orientation terms such as "upper", "lower", "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings. They are used only to simplify the description. They do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation, and so are not to be construed as limiting the invention. The terms "first", "second" and "third" are used for descriptive purposes only and do not indicate or imply relative importance.

Unless otherwise expressly specified or limited, the terms "mounted", "connected" and "coupled" are to be understood broadly. For example, a connection may be:
- fixed, detachable or integral;
- mechanical or electrical;
- direct, or indirect through an intermediate medium;
- an internal communication between two elements.
A person of ordinary skill in the art can understand the specific meanings of these terms in the invention according to the specific situation.

Example 1

Referring to FIGS. 1 and 2, a first embodiment of the invention provides an adversarial-learning-based experience teaching method for a hybrid truck fleet, comprising:

S1: Extract the original driving conditions from the controller of hybrid truck A and process their parameters. Specifically:

The original driving conditions are analyzed in a collector, and specific characteristic parameters of the driving conditions are extracted and classified. The different classes of characteristic parameters are then fed, as random noise, into generator G, i.e., the controller of hybrid truck B.
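The patent does not list which characteristic parameters the collector extracts or how they are classified. The sketch below is one plausible reading, assuming a 1 Hz speed trace. The feature set (mean speed, maximum speed, mean positive acceleration, idle ratio) and the speed thresholds that split segments into the urban / provincial / expressway classes used in Example 2 are all hypothetical, and the trace is made up.

```python
import numpy as np

def condition_features(speed_kmh, dt_s=1.0, idle_kmh=1.0):
    """Characteristic parameters of one driving-condition segment (hypothetical set)."""
    v = np.asarray(speed_kmh, dtype=float)
    a = np.diff(v / 3.6) / dt_s          # acceleration in m/s^2
    pos = a[a > 0]                       # accelerating phases only
    return {
        "mean_speed": float(v.mean()),
        "max_speed": float(v.max()),
        "mean_pos_accel": float(pos.mean()) if pos.size else 0.0,
        "idle_ratio": float(np.mean(v < idle_kmh)),
    }

def classify_condition(features, urban_max=40.0, provincial_max=70.0):
    """Assign a road-condition class from mean speed (illustrative thresholds, km/h)."""
    m = features["mean_speed"]
    if m < urban_max:
        return "urban"
    if m < provincial_max:
        return "provincial"
    return "expressway"

trace = [0, 3, 7, 11, 14, 14, 12, 8, 4, 0]   # made-up 1 Hz speed samples, km/h
f = condition_features(trace)
print(f, classify_condition(f))
```

In the method, vectors like `f`, grouped by class, would play the role of the noise input z fed to generator G.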

S2: Use the processed data and adversarial network learning to generate driving conditions and an SOC trajectory map, and judge the degree of fit with the real data. Specifically:

The adversarial network learning (a generative adversarial network, GAN) involves the controller of hybrid truck B and the controller of hybrid truck C, which act as the generator G and the discriminator D respectively; both G and D are in essence functions. The controller of hybrid truck B combines driving conditions at random using the processing result. The controller of hybrid truck C generates driving conditions and an SOC trajectory map from that combination, and judges their degree of fit with the real driving conditions and trajectory. The controller of hybrid truck C reaches its optimum when the generated result can no longer be distinguished from the real result.

Further, the driving conditions and the SOC trajectory map are generated as follows. The neural network of the multilayer perceptron that forms generator G maps the random noise to a new data distribution, yielding a random characteristic-parameter driving condition, denoted G(z). G(z) is fed to the controller of hybrid truck C, i.e., discriminator D, which processes it with its own algorithm to generate the driving conditions and the SOC trajectory map. The real driving conditions and trajectory are the real driving-condition and SOC trajectory map of hybrid truck A. These are taken as real data x from the distribution p_data(x), and the degree of fit between x and the random characteristic parameters G(z) is judged according to:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

where p_data(x) is the real data distribution and p_z(z) is the random noise input.

The formula applies a logarithm to D(x) and D(G(z)), which is intended to reduce the influence of unpredictable interference noise on the data distribution and the problem of data-distribution deviation, and it takes expected values. The GAN makes the final generated distribution p_g (the distribution of G(z)) agree with the real distribution p_data(x). The goal is data that fits the real data as closely as possible without being the real data itself, so that G generates data that is similar to, yet different from, the original data.

After the formula has evaluated the two inputs x and G(z), the discriminator D outputs a probability value. This value represents how well the SOC and driving-condition trajectory fit for driving.
A probability close to 1 for the real data indicates that the controller of hybrid truck C can find, with its own algorithm, an SOC trajectory suited to the road section described by the randomly combined condition parameters.
A probability close to 0 for the generated data indicates that the controller of hybrid truck C cannot yet generate such an SOC trajectory. In that case the discriminator D feeds back to generator G the parameters of the generated data that need adjusting, and G adjusts them and regenerates.
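In a standard GAN, D "feeds back" to G through the gradient of G's loss with respect to the generated sample. The sketch below assumes a toy one-dimensional logistic discriminator D(x) = sigmoid(w x + c) with hypothetical values of w and c. It computes the feedback d/dx log(1 - D(x)) = -w * sigmoid(w x + c) and takes one gradient-descent step on the generated samples. This illustrates the general mechanism, not the controllers' own algorithm.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def discriminator(x, w, c):
    """Toy logistic discriminator: probability that condition value x is real."""
    return sigmoid(w * x + c)

def generator_feedback(x_fake, w, c):
    """Gradient of log(1 - D(x)) with respect to the generated sample x.

    d/dx log(1 - sigmoid(w x + c)) = -w * sigmoid(w x + c)
    """
    return -w * discriminator(x_fake, w, c)

w, c = 2.0, -6.0                 # hypothetical D: larger x looks "more real"
x_fake = np.array([1.0, 2.0])    # hypothetical generated condition values
lr = 0.5
# G descends log(1 - D(G(z))), so samples move toward where D assigns higher "real" probability.
x_new = x_fake - lr * generator_feedback(x_fake, w, c)
print(discriminator(x_fake, w, c), discriminator(x_new, w, c))
```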

Under the fitting-degree judgment formula, D pushes its output on data from the real distribution p_data(x) toward 1. Generator G produces data from noise z ~ p_z(z); D judges these real or fake and pushes its output on them toward 0, i.e., toward a log-probability of 0 that the generated data are fake. Hence D aims to maximize V(D,G) while G aims to minimize V(D,G), which reduces the difference between the real data and the generated data.

S3: Repeat the fitting-degree judgment in a training loop. The optimal result is reached when the generated driving conditions and SOC trajectory map can no longer be distinguished from the real data. Specifically:

The generated data and the real data are trained in a continuous loop until D can no longer distinguish them; at that point G is judged to have reached its optimum. This continuous training is the minimax game between G and D. The performance of both improves over the iterations until D(G(z)) finally agrees with D(x), at which point G and D are optimal.

Furthermore, at each iteration of the loop, the discriminator D gives the optimal value for the current state. The adversarial network updates D several times and then updates G once, so that D first reaches its current optimum before G takes a step. Iterating this loop drives P_g toward P_data. For fixed G, the optimal discriminator that decides the final result is:

$$D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

When P_g = P_data, D*_G(x) = 1/2. The generator G and the discriminator D then reach their optimal state, i.e., hybrid trucks B and C have mastered the optimal SOC trajectory under that driving condition.
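The schedule described above (several discriminator updates, then one generator update, repeated until D cannot separate real from generated data) can be sketched as a plain loop. The update steps are passed in as callables because the patent does not define them. The stopping rule, which treats both mean outputs lying within `tol` of 1/2 as "indistinguishable", is a hypothetical reading of the patent's criterion, and `tol` and `k` are illustrative defaults.

```python
from typing import Callable, Tuple

def train_alternating(d_step: Callable[[], Tuple[float, float]],
                      g_step: Callable[[], None],
                      k: int = 5,
                      max_iters: int = 10_000,
                      tol: float = 0.02) -> int:
    """Alternate k discriminator updates with one generator update.

    d_step performs one update of D and returns (mean D(x), mean D(G(z)))
    on the current batch. Returns the number of generator updates performed.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    for it in range(max_iters):
        for _ in range(k):                       # k discriminator updates ...
            mean_d_real, mean_d_fake = d_step()
        if abs(mean_d_real - 0.5) < tol and abs(mean_d_fake - 0.5) < tol:
            return it                            # D can no longer separate the two
        g_step()                                 # ... then one generator update
    return max_iters

# Usage with stand-in steps: each generator update halves the gap D can exploit.
state = {"gap": 0.4, "d_calls": 0, "g_calls": 0}

def toy_d_step():
    state["d_calls"] += 1
    return 0.5 + state["gap"], 0.5 - state["gap"]

def toy_g_step():
    state["g_calls"] += 1
    state["gap"] *= 0.5

n_g_updates = train_alternating(toy_d_step, toy_g_step, k=3)
print(n_g_updates, state["d_calls"])  # 5 18
```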

Furthermore, the training loop adopts WGAN (Wasserstein GAN, an adversarial network based on the Wasserstein distance) as a data similarity measure. The distance measure between probability distributions in adversarial learning is replaced with the Wasserstein distance, and the adversarial learning algorithm is adjusted accordingly. This addresses three problems of the original method:
- instability in the training process, which requires carefully balancing how much the discriminator and the generator are trained;
- vanishing generator gradients;
- mode collapse late in training.
Substituting the optimal discriminator into the adversarial learning objective gives

$$C(G) = \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}}\left[\log \frac{p_{data}(x)}{p_{data}(x)+p_g(x)}\right] + \mathbb{E}_{x\sim p_g}\left[\log \frac{p_g(x)}{p_{data}(x)+p_g(x)}\right]$$

which simplifies to

$$C(G) = -\log 4 + 2\,\mathrm{JS}(P_{data} \,\|\, P_g)$$
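The identity C(G) = -log 4 + 2 JS(P_data || P_g), together with its weakness for non-overlapping distributions, can be checked numerically on small histograms. The sketch below uses made-up distributions on a finite support and natural logarithms. When the two supports are disjoint, JS is stuck at log 2 and C(G) at 0, no matter how far apart the distributions are.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) on a finite support; terms with p = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    """Jensen-Shannon divergence (natural log)."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def c_of_g(p_data, p_g):
    """C(G) = V(D*, G): the GAN value with the optimal discriminator plugged in."""
    s = p_data + p_g
    def term(p):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / s[mask])))
    return term(p_data) + term(p_g)

# Overlapping histograms: C(G) matches -log 4 + 2 JS.
p_data = np.array([0.5, 0.3, 0.2, 0.0])
p_g = np.array([0.1, 0.3, 0.4, 0.2])
print(c_of_g(p_data, p_g), -np.log(4) + 2 * js(p_data, p_g))

# Disjoint histograms: JS = log 2 and C(G) = 0, wherever p_g sits.
p_real = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0])
p_near = np.array([0.0, 0.0, 0.5, 0.5, 0.0, 0.0])
p_far = np.array([0.0, 0.0, 0.0, 0.0, 0.5, 0.5])
print(js(p_real, p_near), js(p_real, p_far))          # both log 2 ≈ 0.6931
print(c_of_g(p_real, p_near), c_of_g(p_real, p_far))  # 0.0 0.0
```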

The adversarial learning model therefore has to compute the JS divergence, which presupposes that P_data and P_g overlap. In high-dimensional spaces, however, the two distributions very likely do not overlap at all, or overlap only negligibly. The JS divergence is then the constant log 2, so C(G) is stuck at the constant -log 4 + 2 log 2 = 0. Its gradient with respect to the generator vanishes, no useful adjustment can be made, and the generated samples cannot be pushed steadily toward the real samples. The patent therefore replaces the JS divergence with the Wasserstein distance:

$$W(P_{data}, P_g) = \inf_{\gamma \in \Pi(P_{data}, P_g)} \mathbb{E}_{(x,y)\sim\gamma}\left[\lVert x - y \rVert\right]$$

where Π(P_data, P_g) is the set of all joint distributions whose marginals are P_data and P_g. Unlike the KL and JS divergences, the Wasserstein distance still reflects the distance between two distributions when they do not overlap. This addresses the instability of the original GAN training process and its lack of a meaningful training-progress indicator. As a result, the experience teaching that the vehicle controllers of hybrid trucks B and C receive from the adversarial network becomes more practical.
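The contrast between the two measures is easy to see in one dimension. There, the Wasserstein-1 distance equals the area between the two cumulative distribution functions. The sketch below places all real mass at grid point 0 and all generated mass at a point that moves further away. The grid and the point masses are made-up illustration values. JS stays at log 2 for every shift, while W1 grows with the shift and so still gives the generator a usable signal.

```python
import numpy as np

def w1_on_grid(p, q, dx=1.0):
    """Wasserstein-1 distance between histograms on the same evenly spaced 1-D grid.

    In one dimension W1 equals the area between the two cumulative distribution functions.
    """
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx)

def js(p, q):
    """Jensen-Shannon divergence (natural log); 0 * log 0 is treated as 0."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

n = 10
p_data = np.zeros(n)
p_data[0] = 1.0                      # all "real" mass at grid point 0
for shift in (2, 5, 8):
    p_g = np.zeros(n)
    p_g[shift] = 1.0                 # all generated mass at grid point `shift`
    print(shift, round(js(p_data, p_g), 4), w1_on_grid(p_data, p_g))
# 2 0.6931 2.0
# 5 0.6931 5.0
# 8 0.6931 8.0
```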

Finally, the cyclic training of generated data against real data can also be regarded as the experience teaching process itself. In each cycle, the result of the previous training round is passed on as experience to the next round and serves as its reference. This continues until the generated driving conditions can no longer be distinguished from the real data and the optimal result is reached.

Example 2

In the second embodiment of the invention, the method is compared with a conventional method for longitudinal following control of vehicles in a running truck platoon, in order to verify and explain its technical effect. The test results are compared to verify the actual effect of the method.

Six trucks of the same model were formed into two truck platoons, and a comparison experiment was carried out under three road conditions: expressway, provincial road and urban road. Under each road condition, each platoon drove 100 km, the two platoons using the two different control methods.

- The proposed method extracts the original driving conditions for parameter processing. It then judges the degree of fit between the real data and the driving conditions and SOC (state of charge) trajectory map generated by adversarial network learning, and finally reaches the optimal solution for vehicle control.
- The conventional control method is based on nonlinear model predictive control. Using the current road information and taking the constraints of the physical actuators into account, it controls the fleet and indirectly improves the overall equivalent fuel economy of the platoon. However, it cannot determine whether the control quantities it solves for are actually optimal.

The fleet was tested with both methods, using fuel economy as the evaluation criterion. The results are shown in the following table:

Table 1: Overall equivalent fuel consumption of the truck fleet over each 100 km run.

                               Expressway   Provincial road   Urban road
Proposed method                25.9 L       27.2 L            29.3 L
Conventional control method    26.8 L       27.3 L            29.5 L

Over 100 km under each of the three road conditions, the proposed method consumed less fuel than the conventional control method. The saving is largest on the expressway, where consumption falls by 0.9 L/100 km; on provincial and urban roads the savings are 0.1 L and 0.2 L per 100 km respectively. Compared with the conventional control method, experience teaching therefore improves the equivalent fuel economy of the platoon trucks and lets them learn the route with the optimal SOC trajectory, slightly improving on the conventional trajectory.
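The text states only the absolute savings. The short calculation below derives the absolute and relative savings directly from the Table 1 figures. It is plain arithmetic on the reported values, not an additional result.

```python
def fuel_savings(table):
    """Absolute (L per 100 km) and relative (%) savings of the proposed method per road type."""
    return {
        road: (conventional - proposed, 100.0 * (conventional - proposed) / conventional)
        for road, (proposed, conventional) in table.items()
    }

# Values from Table 1: (proposed method, conventional control method), litres per 100 km run.
table_1 = {
    "Expressway": (25.9, 26.8),
    "Provincial road": (27.2, 27.3),
    "Urban road": (29.3, 29.5),
}
for road, (saved, pct) in fuel_savings(table_1).items():
    print(f"{road}: {saved:.1f} L/100 km ({pct:.2f} %)")
# Expressway: 0.9 L/100 km (3.36 %)
# Provincial road: 0.1 L/100 km (0.37 %)
# Urban road: 0.2 L/100 km (0.68 %)
```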

The above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that the technical solutions can be modified or equivalently replaced without departing from their spirit and scope. All such modifications shall fall within the scope of the claims of the invention.

Claims

1. An adversarial-learning-based experience teaching method for a hybrid truck fleet, characterized by comprising the following steps:

extracting the original driving conditions from the controller of hybrid truck A and processing their parameters;

generating driving conditions and an SOC trajectory map using the processed data and adversarial network learning, and judging the degree of fit with the real data;

repeating the fitting-degree judgment in a training loop, the optimal result being reached when the generated driving conditions and SOC trajectory map can no longer be distinguished from the real data;

the parameter processing of the original driving conditions comprises:

analyzing the original driving conditions in a collector, extracting and classifying specific characteristic parameters of the driving conditions, and feeding the different classes of characteristic parameters, as random noise, into a generator G, i.e., the controller of hybrid truck B;

the adversarial network learning involves:

the controller of hybrid truck B and the controller of hybrid truck C, wherein the controller of hybrid truck B combines driving conditions at random using the processing result; the controller of hybrid truck C generates driving conditions and an SOC trajectory map from that combination and judges their degree of fit with the real driving conditions and trajectory; and the controller of hybrid truck C reaches its optimum when the generated result can no longer be distinguished from the real result;

generating the driving conditions and the SOC trajectory map comprises:

mapping the random noise to a new data distribution using the neural network of the multilayer perceptron that forms generator G to obtain a random characteristic-parameter driving condition, denoted G(z); feeding G(z) to the controller of hybrid truck C, i.e., a discriminator D; and having the controller of hybrid truck C process the random characteristic-parameter driving condition with its own algorithm to generate the driving conditions and an SOC (state of charge) trajectory map;

the real driving conditions and trajectory comprise:

the real driving conditions and trajectory are the real driving-condition and SOC trajectory map of hybrid truck A;

the fitting-degree judgment comprises taking the real driving-condition and SOC trajectory map as real data x from the distribution p_data(x), and judging the degree of fit between x and the random characteristic parameters G(z) according to:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

where p_data(x) is the real data distribution and p_z(z) is the random noise input.

2. The adversarial-learning-based experience teaching method for a hybrid truck fleet according to claim 1, wherein the fitting-degree judgment further comprises:

after the fitting-degree judgment formula has evaluated the two inputs x and G(z), the discriminator D outputs a probability value representing the degree of fit of the SOC (state of charge) and driving-condition trajectory suitable for driving; a probability close to 1 for the real data indicates that the controller of hybrid truck C can find, with its own algorithm, an SOC trajectory suited to the road section described by the randomly combined condition parameters; a probability close to 0 for the generated data indicates that the controller of hybrid truck C cannot yet generate such an SOC trajectory with its own algorithm, in which case the discriminator D feeds back to generator G the parameters of the generated data that need adjusting, and G adjusts them and regenerates; and the training loop is repeated until the discriminator D can no longer distinguish the real data from the generated data, at which point generator G is judged to have reached its optimal state.

3. The adversarial-learning-based experience teaching method for a hybrid truck fleet according to claim 2, wherein, with respect to the real data distribution and the random noise input:

in the fitting-degree judgment, the discriminator D pushes its output on data from the real distribution p_data(x) toward 1; generator G produces samples from noise z ~ p_z(z), which D judges real or fake, pushing its output on them toward 0; hence the discriminator D aims to maximize V(D,G) and the generator G aims to minimize V(D,G), reducing the difference between the real data and the generated data.

4. The adversarial-learning-based experience teaching method for a hybrid truck fleet according to claim 2 or 3, wherein the training loop comprises:

at each iteration of the loop, the discriminator D gives the optimal value for the current state; the adversarial network updates D several times and then updates generator G once, so that D first reaches its current optimum before G takes a step; the loop is iterated so that P_g = P_data, and, for fixed G, the optimal discriminator that decides the final result is:

$$D_G^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

From the above formula, when P_g = P_data, D*_G(x) = 1/2, which indicates that the generator G and the discriminator D are optimal, i.e., hybrid trucks B and C have mastered the optimal SOC trajectory under that driving condition.

5. The adversarial-learning-based experience teaching method for a hybrid truck fleet according to claim 4, wherein the training loop further comprises:

using a data similarity measure to replace the distance measure between probability distributions in adversarial learning with the Wasserstein distance:

$$W(P_{data}, P_g) = \inf_{\gamma \in \Pi(P_{data}, P_g)} \mathbb{E}_{(x,y)\sim\gamma}\left[\lVert x - y \rVert\right]$$

wherein the Wasserstein distance reflects the distance between P_data and P_g regardless of whether the two distributions overlap, which makes adversarial training more stable.