CN116091894B - Model training method, vehicle control method, device, equipment, vehicle and medium - Google Patents

Model training method, vehicle control method, device, equipment, vehicle and medium

Info

Publication number
CN116091894B
Authority
CN
China
Prior art keywords
prediction model
vehicle
predicted
model
target
Prior art date
Legal status
Active
Application number
CN202310207908.9A
Other languages
Chinese (zh)
Other versions
CN116091894A (en)
Inventor
Xiong Anbin (熊安斌)
Yang Kuiyuan (杨奎元)
Current Assignee
Xiaomi Automobile Technology Co Ltd
Original Assignee
Xiaomi Automobile Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiaomi Automobile Technology Co Ltd
Priority to CN202310207908.9A
Publication of CN116091894A
Application granted
Publication of CN116091894B
Legal status: Active
Anticipated expiration

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The present disclosure relates to a model training method, a vehicle control method, a device, equipment, a vehicle, and a medium. The method includes: acquiring a first surrounding environment image of a first sample vehicle; inputting the first surrounding environment image into an original perception prediction model to obtain first predicted driving information of the first sample vehicle; inputting the first predicted driving information into a reward model to obtain a first score for the first predicted driving information; and fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score to obtain a target perception prediction model. Fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.

Description

Model training method, vehicle control method, device, equipment, vehicle and medium
Technical Field
The present disclosure relates to the technical field of autonomous driving, and in particular to a model training method, a vehicle control method, a device, equipment, a vehicle, and a medium.
Background
With the rapid development of autonomous driving technology, low-speed operation of autonomous vehicles in fixed and relatively simple scenes has been achieved in China, but achieving full-scene operation of autonomous vehicles at arbitrary speeds remains a very challenging problem. Obstacle detection, trajectory prediction, and vehicle path planning are among the key technologies for realizing advanced autonomous driving: only when an autonomous vehicle can, like a human driver, accurately judge the intentions and future trajectories of surrounding targets (vehicles, pedestrians, etc.) can it reasonably plan its own motion trajectory and achieve safe and stable navigation.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a model training method, a vehicle control method, a device, an apparatus, a vehicle, and a medium.
According to a first aspect of embodiments of the present disclosure, there is provided a perception prediction model training method, including:
acquiring a first surrounding environment image of a first sample vehicle;
inputting the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted driving information of the first sample vehicle;
inputting the first predicted driving information into a pre-trained reward model to obtain a first score for the first predicted driving information;
and fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score to obtain a target perception prediction model.
Optionally, the reinforcement learning method is a proximal policy optimization algorithm;
the fine-tuning of the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score includes:
determining an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted driving information, and the first score;
and updating model parameters of the original perception prediction model by stochastic gradient descent according to the objective function.
Optionally, the reinforcement learning method is a proximal policy optimization algorithm;
after the step of fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score, the method further includes:
in response to a first training cutoff condition not being met, updating model parameters of the reward model, and repeating the steps from acquiring a first surrounding environment image of a first sample vehicle through fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score;
and obtaining the target perception prediction model in response to the first training cutoff condition being met.
Optionally, the reward model is trained by:
acquiring a second surrounding environment image of a second sample vehicle;
inputting the second surrounding environment image into the original perception prediction model to obtain a plurality of pieces of second predicted driving information of the second sample vehicle, wherein the pieces of second predicted driving information correspond to different prediction probabilities;
determining an actual quality ranking of the pieces of second predicted driving information;
inputting the pieces of second predicted driving information into a neural network to obtain a second score for each piece of second predicted driving information;
ranking the pieces of second predicted driving information according to the second scores to obtain a predicted quality ranking;
updating model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
in response to a second training cutoff condition not being met, repeating the steps from acquiring a second surrounding environment image of a second sample vehicle through updating the model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
and in response to the second training cutoff condition being met, determining the neural network obtained after the last model parameter update as the reward model.
Optionally, the first predicted driving information includes predicted surrounding-vehicle information of the first sample vehicle, predicted lane line information of the road ahead of the first sample vehicle, predicted trajectories of the predicted surrounding vehicles of the first sample vehicle, and a predicted driving route of the first sample vehicle;
the original perception prediction model is trained by:
acquiring a third surrounding environment image of a third sample vehicle and actual driving information of the third sample vehicle, wherein the actual driving information includes actual surrounding-vehicle information of the third sample vehicle, actual lane line information of the road ahead of the third sample vehicle, actual trajectories of the actual surrounding vehicles of the third sample vehicle, and an actual driving route of the third sample vehicle;
and performing model training with the third surrounding environment image as input to the original perception prediction model and the actual driving information as the target output of the original perception prediction model, to obtain the original perception prediction model.
According to a second aspect of embodiments of the present disclosure, there is provided a vehicle control method, including:
acquiring a current surrounding environment image of a target vehicle;
inputting the current surrounding environment image into a pre-trained target perception prediction model to obtain target driving information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method provided in the first aspect of the present disclosure;
and controlling the target vehicle to drive according to the target driving information.
Optionally, the method further includes:
updating parameters of the target perception prediction model according to a preset period, so that the target vehicle predicts driving information with the target perception prediction model obtained after the parameter update.
According to a third aspect of embodiments of the present disclosure, there is provided a perception prediction model training device, including:
a first acquisition module configured to acquire a first surrounding environment image of a first sample vehicle;
a first prediction module configured to input the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted driving information of the first sample vehicle;
a first scoring module configured to input the first predicted driving information into a pre-trained reward model to obtain a first score for the first predicted driving information;
and a fine-tuning module configured to fine-tune the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score to obtain a target perception prediction model.
According to a fourth aspect of embodiments of the present disclosure, there is provided a vehicle control apparatus, including:
a second acquisition module configured to acquire a current surrounding environment image of a target vehicle;
a second prediction module configured to input the current surrounding environment image into a pre-trained target perception prediction model to obtain target driving information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method provided in the first aspect of the present disclosure;
and a control module configured to control the target vehicle to drive according to the target driving information.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
execute the executable instructions to implement the steps of the perception prediction model training method provided in the first aspect of the present disclosure or the steps of the vehicle control method provided in the second aspect of the present disclosure.
According to a sixth aspect of embodiments of the present disclosure, there is provided a vehicle that includes, or is connected with, the electronic device provided in the fifth aspect of the present disclosure.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the perception prediction model training method provided in the first aspect of the present disclosure or the steps of the vehicle control method provided in the second aspect of the present disclosure.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flowchart illustrating a perception prediction model training method according to an exemplary embodiment.
FIG. 2 is a schematic diagram illustrating first predicted driving information according to an exemplary embodiment.
FIG. 3 is a flowchart illustrating a perception prediction model training method according to another exemplary embodiment.
FIG. 4 is a schematic diagram illustrating a reward model training process according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating a vehicle control method according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating a perception prediction model training device according to an exemplary embodiment.
FIG. 7 is a block diagram of a vehicle control apparatus according to an exemplary embodiment.
FIG. 8 is a functional block diagram of a vehicle according to an exemplary embodiment.
FIG. 9 is a block diagram illustrating an apparatus for perception prediction model training according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
It should be noted that all actions of acquiring signals, information, or data in the present application are performed in compliance with the applicable data protection laws and policies of the relevant jurisdiction and with authorization from the owner of the corresponding device.
FIG. 1 is a flowchart illustrating a perception prediction model training method according to an exemplary embodiment. As shown in FIG. 1, the method may include the following S101 to S104.
In S101, a first surrounding environment image of a first sample vehicle is acquired.
In the present disclosure, one set of road acquisition data may be randomly selected from a first road acquisition data set as the first surrounding environment image. The first road acquisition data set may include surrounding environment images acquired by the first sample vehicle through its own sensors (e.g., lidar, millimeter-wave radar, cameras, etc.) during historical driving, or surrounding environment images acquired by a plurality of sample vehicles through their own sensors during historical driving, the first sample vehicle being any one of the plurality of sample vehicles.
In S102, the first surrounding environment image is input into a pre-trained original perception prediction model to obtain first predicted driving information of the first sample vehicle.
In the present disclosure, as shown in FIG. 2, the first predicted driving information may include predicted surrounding-vehicle information of the first sample vehicle, predicted lane line information of the road ahead of the first sample vehicle, predicted trajectories of the predicted surrounding vehicles of the first sample vehicle, and a predicted driving route of the first sample vehicle. The predicted surrounding-vehicle information refers to the predicted information of the vehicles around the corresponding sample vehicle, the predicted lane line information refers to the predicted lane line information of the road ahead of the corresponding sample vehicle, the predicted trajectories of the predicted surrounding vehicles refer to the predicted trajectories of target surrounding vehicles, and the predicted driving route refers to the predicted driving route of the corresponding sample vehicle, wherein the target surrounding vehicles include the predicted surrounding vehicles of the corresponding sample vehicle.
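As a hedged illustration only (not part of the patent text), the first predicted driving information of FIG. 2 could be represented by a simple container such as the following; all field names and types are assumptions made for the sketch:

```python
# Hypothetical structure for the first predicted driving information shown in FIG. 2.
# Field names and types are illustrative assumptions, not the patent's definitions.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position in the ego vehicle's coordinate frame

@dataclass
class PredictedDrivingInfo:
    surrounding_vehicles: List[Dict]              # predicted surrounding-vehicle information
    lane_lines: List[List[Point]]                 # predicted lane lines of the road ahead
    surrounding_trajectories: List[List[Point]]   # predicted trajectories of the surrounding vehicles
    ego_route: List[Point]                        # predicted driving route of the sample vehicle
```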
In S103, the first predicted driving information is input into a pre-trained reward model to obtain a first score for the first predicted driving information.
In the present disclosure, the reward model may be a scoring model used to score the first predicted driving information.
In S104, the original perception prediction model is fine-tuned by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score, to obtain a target perception prediction model.
In the present disclosure, the original perception prediction model may be fine-tuned by reinforcement learning methods such as the Proximal Policy Optimization (PPO) algorithm, Trust Region Policy Optimization (TRPO), Markov Decision Process (MDP) based methods, Deep Deterministic Policy Gradient (DDPG), etc.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
When the first predicted driving information simultaneously includes the predicted surrounding-vehicle information of the first sample vehicle, the predicted lane line information of the road ahead of the first sample vehicle, the predicted trajectories of the predicted surrounding vehicles of the first sample vehicle, and the predicted driving route of the first sample vehicle, evaluating the first predicted driving information with the reward model yields a first score that reflects the overall autonomous driving performance, thereby ensuring the accuracy of the target perception prediction model.
The following describes in detail the embodiment of fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score in S104. Specifically, the reinforcement learning method may be the proximal policy optimization algorithm, i.e., PPO, in which case the original perception prediction model is essentially a policy network and the reward model is essentially a value network, and the original perception prediction model may be fine-tuned by the following steps [1] and [2]:
Step [1]: determining an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted driving information, and the first score.
For example, the objective function of the original perception prediction model may be determined from the first surrounding environment image, the first predicted driving information, and the first score by the following equation:

$$\mathrm{objective}(\phi)=\mathbb{E}_{(x,y)\sim D_{\pi_{\phi}^{\mathrm{RL}}}}\big[r(x,y)\big]-\beta\,D_{\mathrm{KL}}\!\left(\pi_{\phi}^{\mathrm{RL}}(\cdot\mid x)\,\big\|\,\pi^{\mathrm{orig}}(\cdot\mid x)\right)$$

where $\phi$ denotes the trainable model parameters of the original perception prediction model; $\mathrm{objective}(\phi)$ is the objective function of the original perception prediction model; $\pi^{\mathrm{orig}}$ is the original perception prediction model; $\pi_{\phi}^{\mathrm{RL}}$ is the model obtained after the model parameter update, i.e., the target perception prediction model; $x$ is the first surrounding environment image; $y$ is the first predicted driving information; $r(x,y)$ is the first score; $\mathbb{E}_{(x,y)\sim D_{\pi_{\phi}^{\mathrm{RL}}}}[r(x,y)]$ is the estimate of the accumulated future reward under the distribution induced by $\pi_{\phi}^{\mathrm{RL}}$, and its purpose is to encourage the original perception prediction model to produce high-scoring responses with a larger cumulative return; $\beta$ is a weight coefficient; and $D_{\mathrm{KL}}(\pi_{\phi}^{\mathrm{RL}}\,\|\,\pi^{\mathrm{orig}})$ is the difference, measured by the KL divergence, between the action distributions produced by the two models for the same input, and its purpose is to constrain the model obtained after the parameter update so that it does not deviate too far from the original perception prediction model.
The expectation term and the KL divergence term may be computed in a manner commonly used in the art, and the present disclosure is not limited in this regard.
Step [2]: and according to the objective function, updating model parameters of the original perception prediction model by adopting a random gradient descent method.
In addition, in order to improve the perception and prediction accuracy of the target perception prediction model, the original perception prediction model may undergo multiple parameter updates. Specifically, when the reinforcement learning method is the proximal policy optimization algorithm, as shown in FIG. 3, after S104 the method may further include the following S105.
In S105, it is determined whether a first training cutoff condition is met.
In one embodiment, the first training cutoff condition may be that the number of training iterations reaches a first preset number, which may be set according to the actual usage scenario.
In another embodiment, the first training cutoff condition may be that the target loss of the original perception prediction model is less than a first preset threshold, which may be set according to the actual usage scenario. When the target loss of the original perception prediction model is less than the first preset threshold, the perception and prediction accuracy of the original perception prediction model can be considered to meet the requirement, and accurate perception and prediction can be performed on complex environment images.
If the first training cutoff condition is not met, the following S106 is executed, and then S101 to S104 are repeated until the first training cutoff condition is met; if the first training cutoff condition is met, the original perception prediction model obtained after the last model parameter update may be determined as the target perception prediction model, that is, the following S107 is performed.
In S106, the model parameters of the reward model are updated.
Specifically, an advantage function may be computed based on the first score and the cumulative reward, and the parameters of the reward model may then be updated by back-propagating the mean squared error loss of the advantage function. The manner of determining the advantage function and adjusting the parameters of the reward model may follow practice commonly used in the art, and the present disclosure is not limited in this regard.
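A minimal, self-contained sketch of this S106 update, under the assumption that the reward model contains a trainable value head and that PyTorch is used (all names and shapes are illustrative):

```python
# Update the reward/value model by back-propagating the mean squared error of the
# advantage (observed cumulative reward minus the value estimate).
import torch
from torch import nn

value_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
value_opt = torch.optim.Adam(value_net.parameters(), lr=1e-4)

state_feat = torch.randn(8, 128)       # features of the first surrounding environment image
cumulative_reward = torch.randn(8, 1)  # observed cumulative reward (includes the first score)

advantage = cumulative_reward - value_net(state_feat)  # A = G - V(s)
value_loss = advantage.pow(2).mean()                   # MSE loss of the advantage
value_opt.zero_grad()
value_loss.backward()
value_opt.step()
```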
In S107, a target perception prediction model is obtained.
The following describes the training method of the original perception prediction model in detail. Specifically, the original perception prediction model may be trained by the following steps (a) and (b):
step (a): and acquiring a third surrounding environment image of the third sample vehicle and actual running information of the third sample vehicle.
In the present disclosure, one-way mining data may be randomly selected from the second-way mining data set as the third surrounding environment information. The second road acquisition data set may include a surrounding environment image acquired by a third sample vehicle through a self-sensor (e.g., a laser radar, a millimeter wave radar, a camera, etc.) during the history running, or a surrounding environment image acquired by a plurality of sample vehicles through a self-sensor during the history running, the third sample vehicle being any one of the plurality of sample vehicles.
The actual traveling information may include actual surrounding vehicle information of the third sample vehicle, actual lane line information of a road ahead of the third sample vehicle, an actual trajectory of the actual surrounding vehicle of the third sample vehicle, and an actual traveling route of the third sample vehicle. The labeling personnel can label the actual surrounding vehicle information of the third sample vehicle, the actual lane line information of the road in front of the third sample vehicle and the actual track of the actual surrounding vehicle of the third sample vehicle according to the collected surrounding environment data of the third sample vehicle in a first preset time period after the third surrounding environment image is collected, and determine the historical driving route of the third sample vehicle in a second preset time period after the third surrounding environment image is collected as the actual driving route of the third sample vehicle.
Step (b): and performing model training by taking the third surrounding environment image as the input of the original perception prediction model and taking the actual running information as the target output of the original perception prediction model so as to obtain the original perception prediction model.
Specifically, a third ambient environment image is input into the original perception prediction model, so that third prediction running information of a third sample vehicle can be obtained; then, according to the difference between the third predicted running information and the actual running information, updating model parameters of the original perception prediction model; and (c) acquiring new training data again, and continuing model training, namely returning to the step (a), until the third training cut-off condition is reached.
In one embodiment, the third training cutoff condition may be that the number of times of training reaches a second preset number of times, the second preset number of times may be set according to an actual usage scenario, when the number of times of training reaches the second preset number of times, it may be determined that the number of times of training is sufficient, and the original perception prediction model may learn sufficient effective features.
In another embodiment, the third training cutoff condition may be that the target loss of the original perceptual prediction model is less than a second preset threshold, which may be set according to the actual usage scenario. Under the condition that the target loss of the original perception prediction model is smaller than the second preset threshold, the perception prediction accuracy of the original perception prediction model can be considered to meet the requirement, and the environment image can be accurately perceived and predicted.
The third predicted traveling information includes predicted surrounding vehicle information of the third sample vehicle, predicted lane line information of a road ahead of the third sample vehicle, a predicted trajectory of the predicted surrounding vehicle of the third sample vehicle, and a predicted traveling route of the third sample vehicle.
For example, a loss may be determined based on a difference between the third predicted travel information and the actual travel information, such that back propagation may be performed based on the loss to adjust parameters of the original perceived prediction model. The manner in which the loss is determined and the parameters are adjusted may be operated in a manner commonly used in the art, and this disclosure is not limited thereto.
For example, the original perceptual prediction model may be a Pre-trained model (generated Pre-trained Transformers, GPT), e.g., GPT-4, GPT-3, etc.
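A hedged sketch of steps (a) and (b) as a standard supervised training loop; the dataset shapes, network, loss, and cutoff values are assumptions for illustration (the patent does not prescribe a specific architecture or loss):

```python
# Supervised pre-training of the original perception prediction model on
# (third surrounding environment image, actual driving information) pairs.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

images = torch.randn(256, 128)                 # image features (toy data)
actual_driving_info = torch.randn(256, 32)     # flattened actual driving information (toy data)
loader = DataLoader(TensorDataset(images, actual_driving_info), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 32))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()                       # one possible loss on the prediction/ground-truth difference

max_epochs, loss_threshold = 50, 1e-2          # second preset number / second preset threshold
for epoch in range(max_epochs):                # third training cutoff condition
    epoch_loss = 0.0
    for x, y in loader:
        pred = model(x)                        # third predicted driving information
        loss = criterion(pred, y)              # difference from the actual driving information
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * x.size(0)
    if epoch_loss / len(images) < loss_threshold:
        break
```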
The training method of the reward model is described in detail below. Specifically, the reward model may be trained by the following steps (1) to (8):
step (1): a second ambient image of a second sample vehicle is acquired.
In the present disclosure, one-way data may be randomly selected from the third-way data set as the second surrounding environment information. The second road acquisition data set may include a surrounding environment image acquired by a second sample vehicle through a self-sensor (e.g., a laser radar, a millimeter wave radar, a camera, etc.) during the history traveling, or a surrounding environment image acquired by a plurality of sample vehicles through a self-sensor during the history traveling, the second sample vehicle being any one of the plurality of sample vehicles.
The first, second and third data sets may be identical, partially identical or completely different, and the disclosure is not limited specifically. Preferably, the three are completely different, so that the perception prediction accuracy of the target perception prediction model can be improved.
Step (2): and inputting the second surrounding environment image into the original perception prediction model to obtain a plurality of second prediction driving information of the second sample vehicle.
In the present disclosure, a plurality of second predicted traveling information corresponds to different prediction probabilities, a second surrounding image is input into an original perceived prediction model, the original perceived prediction model may generate M second predicted traveling information and prediction probabilities of each second predicted traveling information according to the second surrounding image, and at this time, the original perceived prediction model may output N second predicted traveling information with the highest prediction probabilities, that is, the plurality of second predicted traveling information. Wherein M is greater than or equal to N.
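A small hedged sketch of the top-N selection in step (2), assuming the candidates are scored by a softmax over model logits (shapes and names are illustrative):

```python
# Keep the N pieces of second predicted driving information with the highest
# prediction probabilities out of M generated candidates.
import torch

M, N = 16, 4
logits = torch.randn(M)                     # model scores for M candidate pieces of driving information
probs = torch.softmax(logits, dim=0)        # prediction probability of each candidate
top_probs, top_idx = torch.topk(probs, N)   # indices of the N most probable candidates
```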
Step (3): and determining the actual good-bad ordering result of the second predicted running information.
In the method, labeling personnel can sort the quality of the second prediction running information from the angles of detection accuracy, track prediction and running route prediction rationality to obtain an actual quality sorting result.
Step (4): and inputting the plurality of second predicted running information into the neural network to obtain a second score of each piece of second predicted running information.
The neural network may be, for example, a three-layer fully connected neural network.
Step (5): and sequencing the plurality of second predicted traveling information according to each second score to obtain a predicted good and bad sequencing result.
For example, the plurality of second predicted traveling information may be ranked from large to small according to the second score, to obtain the predicted good-bad ranking result.
Step (6): and updating the model parameters of the neural network according to the actual good and bad sorting result and the predicted good and bad sorting result.
Specifically, the model parameters of the neural network can be updated according to the difference between the actual good and bad sorting result and the predicted good and bad sorting result.
For example, the penalty may be determined based on a difference between the actual good-bad ranking result and the predicted good-bad ranking result, such that back propagation may be performed based on the penalty to adjust the parameters of the neural network. The manner in which the loss is determined and the parameters are adjusted may be operated in a manner commonly used in the art, and this disclosure is not limited thereto.
Illustratively, as shown in FIG. 4, the second surrounding environment image is input into the original perception prediction model to obtain four pieces of second predicted driving information of the second sample vehicle, namely second predicted driving information A, B, C, and D. The annotators rank the quality of the four pieces and obtain the actual quality ranking "D, C, A, B". Meanwhile, the four pieces are input into the neural network, and the second scores of second predicted driving information A, B, C, and D are 25, 45, 87, and 78, respectively; ranking the four pieces from the largest second score to the smallest yields the predicted quality ranking "C, D, B, A". The model parameters of the neural network can then be updated according to the actual quality ranking "D, C, A, B" and the predicted quality ranking "C, D, B, A".
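The patent leaves the exact loss between the two rankings open; a pairwise ranking loss is one common way to train such a scoring network, sketched below under that assumption (shapes, names, and the ranking D > C > A > B from the example above are illustrative):

```python
# Steps (4)-(6): score the candidates with a three-layer fully connected network and
# update it so that candidates ranked higher by the annotators receive higher scores.
import torch
from torch import nn
import torch.nn.functional as F

reward_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                           nn.Linear(64, 64), nn.ReLU(),
                           nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-4)

candidates = torch.randn(4, 32)       # features of second predicted driving information A, B, C, D
actual_ranking = [3, 2, 0, 1]         # annotated actual quality ranking: D > C > A > B

scores = reward_net(candidates).squeeze(-1)   # second scores for A, B, C, D

# Pairwise ranking loss: every (better, worse) pair should satisfy score(better) > score(worse).
loss = torch.zeros(())
for i in range(len(actual_ranking)):
    for j in range(i + 1, len(actual_ranking)):
        better, worse = actual_ranking[i], actual_ranking[j]
        loss = loss - F.logsigmoid(scores[better] - scores[worse])
loss = loss / 6                                # number of ordered pairs among 4 candidates

optimizer.zero_grad()
loss.backward()
optimizer.step()
```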
Step (7): and judging whether the second training cut-off condition is met.
In one embodiment, the second training cutoff condition may be that the number of training times reaches a third preset number of times, the third preset number of times may be set according to an actual usage scenario, when the number of training times reaches the third preset number of times, it may be determined that the number of training times is sufficient, and the reward model may learn sufficient effective features.
In another embodiment, the second training cutoff condition may be that the target loss of the bonus model is less than a third preset threshold, which may be set according to an actual usage scenario. Under the condition that the target loss of the reward model is smaller than the third preset threshold, the scoring accuracy of the reward model can be considered to meet the requirement, and the predicted running information can be accurately evaluated.
If the second training cut-off condition is not met, repeating the steps (1) to (6) until the second training cut-off condition is met; if the second training cutoff condition is satisfied, the neural network obtained after the last model parameter update may be determined as the reward model, that is, the following step (8) is performed.
Step (8): and determining the neural network obtained after the last model parameter updating as a rewarding model.
FIG. 5 is a flowchart illustrating a vehicle control method according to an exemplary embodiment. As shown in FIG. 5, the vehicle control method may include the following S201 to S203.
In S201, a current surrounding environment image of a target vehicle is acquired.
In the present disclosure, the surrounding environment image may be acquired in real time by sensors (e.g., lidar, millimeter-wave radar, cameras, etc.) on the target vehicle.
In S202, the current surrounding environment image is input into a pre-trained target perception prediction model to obtain target driving information of the target vehicle.
In the present disclosure, the target perception prediction model is trained based on the perception prediction model training method provided by the present disclosure and described above.
In S203, the target vehicle is controlled to drive according to the target driving information.
For example, the target driving information may include a predicted driving route of the target vehicle, and the target vehicle may therefore be controlled to travel along the predicted driving route.
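A hedged end-to-end sketch of S201 to S203; the sensor, model, and controller interfaces are placeholders invented for illustration:

```python
# Acquire the current surrounding environment image, run the target perception
# prediction model, and control the vehicle along the predicted driving route.
import torch

def acquire_current_surrounding_image():
    """Placeholder for reading the target vehicle's sensors in real time."""
    return torch.randn(1, 128)

def control_vehicle_along(route):
    """Placeholder for the downstream controller that tracks the route."""
    print("following predicted driving route with", route.shape[1], "waypoints")

target_model = torch.nn.Linear(128, 20)                   # stand-in for the target perception prediction model

image = acquire_current_surrounding_image()               # S201
with torch.no_grad():
    target_driving_info = target_model(image)             # S202
predicted_route = target_driving_info.reshape(1, 10, 2)   # e.g., 10 (x, y) waypoints
control_vehicle_along(predicted_route)                    # S203
```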
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
In addition, the method may further include the following step:
updating the parameters of the target perception prediction model according to a preset period, so that the target vehicle predicts driving information with the target perception prediction model obtained after the parameter update.
Specifically, the target perception prediction model may be retrained in a manner similar to the perception prediction model training method described above to complete the parameter update of the target perception prediction model; the target vehicle then predicts driving information with the target perception prediction model obtained after the parameter update, and its driving is controlled accordingly.
In this embodiment, periodically updating the target perception prediction model can maintain its perception and prediction accuracy and thus ensure safe and stable driving of the target vehicle.
FIG. 6 is a block diagram illustrating a perception prediction model training device according to an exemplary embodiment. As shown in FIG. 6, the perception prediction model training device 300 includes:
a first acquisition module 301 configured to acquire a first surrounding environment image of a first sample vehicle;
a first prediction module 302 configured to input the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted driving information of the first sample vehicle;
a first scoring module 303 configured to input the first predicted driving information into a pre-trained reward model to obtain a first score for the first predicted driving information;
and a fine-tuning module 304 configured to fine-tune the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score, to obtain a target perception prediction model.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
Optionally, the reinforcement learning method is a proximal policy optimization algorithm;
the fine-tuning module 304 includes:
a determining sub-module configured to determine an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted driving information, and the first score;
and an updating sub-module configured to update the model parameters of the original perception prediction model by stochastic gradient descent according to the objective function.
Optionally, the reinforcement learning method is a proximal policy optimization algorithm;
the perception prediction model training device 300 further includes:
a first triggering module configured to, after the original perception prediction model is fine-tuned by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score, and in response to a first training cutoff condition not being met, update the model parameters of the reward model and trigger the first acquisition module 301 to acquire a first surrounding environment image of a first sample vehicle;
and a first determination module configured to obtain the target perception prediction model in response to the first training cutoff condition being met.
Optionally, the reward model is trained by a reward model construction device, and the reward model construction device may include:
a third acquisition module configured to acquire a second surrounding environment image of a second sample vehicle;
a third prediction module configured to input the second surrounding environment image into the original perception prediction model to obtain a plurality of pieces of second predicted driving information of the second sample vehicle, the pieces of second predicted driving information corresponding to different prediction probabilities;
a second determination module configured to determine the actual quality ranking of the pieces of second predicted driving information;
a second scoring module configured to input the pieces of second predicted driving information into a neural network to obtain a second score for each piece of second predicted driving information;
a ranking module configured to rank the pieces of second predicted driving information according to the second scores to obtain a predicted quality ranking;
a first updating module configured to update the model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
a second triggering module configured to trigger the third acquisition module to acquire a second surrounding environment image of a second sample vehicle in response to a second training cutoff condition not being met;
and a third determination module configured to determine, in response to the second training cutoff condition being met, the neural network obtained after the last model parameter update as the reward model.
Optionally, the first predicted driving information includes predicted surrounding-vehicle information of the first sample vehicle, predicted lane line information of the road ahead of the first sample vehicle, predicted trajectories of the predicted surrounding vehicles of the first sample vehicle, and a predicted driving route of the first sample vehicle;
the original perception prediction model is trained by an original perception prediction model construction device, and the original perception prediction model construction device includes:
a fourth acquisition module configured to acquire a third surrounding environment image of a third sample vehicle and actual driving information of the third sample vehicle, wherein the actual driving information includes actual surrounding-vehicle information of the third sample vehicle, actual lane line information of the road ahead of the third sample vehicle, actual trajectories of the actual surrounding vehicles of the third sample vehicle, and an actual driving route of the third sample vehicle;
and a training module configured to perform model training with the third surrounding environment image as input to the original perception prediction model and the actual driving information as its target output, to obtain the original perception prediction model.
Note that the reward model construction device may be independent of the perception prediction model training device 300 or integrated in it, and the original perception prediction model construction device may likewise be independent of the perception prediction model training device 300 or integrated in it; the present disclosure is not specifically limited in this respect.
The specific manner in which the modules of the above perception prediction model training device perform their operations has been described in detail in the embodiments of the perception prediction model training method and is not repeated here.
FIG. 7 is a block diagram of a vehicle control apparatus according to an exemplary embodiment. As shown in FIG. 7, the vehicle control apparatus 400 includes:
a second acquisition module 401 configured to acquire a current surrounding environment image of a target vehicle;
a second prediction module 402 configured to input the current surrounding environment image into a pre-trained target perception prediction model to obtain target driving information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method provided by the present disclosure and described above;
and a control module 403 configured to control the target vehicle to drive according to the target driving information.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
Optionally, the vehicle control apparatus 400 further includes:
a second updating module configured to update the parameters of the target perception prediction model according to a preset period, so that the target vehicle predicts driving information with the target perception prediction model obtained after the parameter update.
The specific manner in which the modules of the above vehicle control apparatus perform their operations has been described in detail in the embodiments of the vehicle control method and is not repeated here.
The present disclosure also provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the executable instructions to implement the steps of the above perception prediction model training method or the steps of the above vehicle control method provided by the present disclosure.
The present disclosure also provides a vehicle that includes, or is connected with, the electronic device provided by the present disclosure.
The present disclosure also provides a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the above perception prediction model training method or the steps of the above vehicle control method provided by the present disclosure.
The present disclosure also provides a chip including a processor and an interface; the processor is configured to read instructions to perform the above perception prediction model training method or the above vehicle control method provided by the present disclosure.
Fig. 8 is a functional block diagram schematic of a vehicle 600, according to an example embodiment. For example, vehicle 600 may be a hybrid vehicle, but may also be a non-hybrid vehicle, an electric vehicle, a fuel cell vehicle, or other type of vehicle. The vehicle 600 may be an autonomous vehicle.
Referring to fig. 8, a vehicle 600 may include various subsystems, such as an infotainment system 610, a perception system 620, a decision control system 630, a drive system 640, and a computing platform 650. Wherein the vehicle 600 may also include more or fewer subsystems, and each subsystem may include multiple components. In addition, interconnections between each subsystem and between each component of the vehicle 600 may be achieved by wired or wireless means.
In some embodiments, the infotainment system 610 may include a communication system, an entertainment system, a navigation system, and the like.
The perception system 620 may include several sensors for sensing information of the environment surrounding the vehicle 600. For example, the sensing system 620 may include a global positioning system (which may be a GPS system, a beidou system, or other positioning system), an inertial measurement unit (inertial measurement unit, IMU), a lidar, millimeter wave radar, an ultrasonic radar, and a camera device.
Decision control system 630 may include a computing system, a vehicle controller, a steering system, a throttle, and a braking system.
The drive system 640 may include components that provide powered motion for the vehicle 600. In one embodiment, the drive system 640 may include an engine, an energy source, a transmission, and wheels. The engine may be one of, or a combination of, an internal combustion engine, an electric motor, and an air compression engine. The engine converts energy provided by the energy source into mechanical energy.
Some or all of the functions of the vehicle 600 are controlled by the computing platform 650. The computing platform 650 may include at least one processor 651 and a first memory 652, the processor 651 may execute instructions 653 stored in the first memory 652.
The processor 651 may be any conventional processor, such as a commercially available CPU. The processor may also be, for example, a graphics processing unit (GPU), a field-programmable gate array (FPGA), a system on chip (SoC), an application-specific integrated circuit (ASIC), or a combination thereof.
The first memory 652 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
In addition to instructions 653, the first memory 652 may also store data such as road maps, route information, the position, direction, speed, etc. of the vehicle. The data stored by the first memory 652 may be used by the computing platform 650.
In embodiments of the present disclosure, the processor 651 may execute the instructions 653 to perform all or part of the steps of the perception prediction model training method described above, or all or part of the steps of the vehicle control method described above.
Fig. 9 is a block diagram of an apparatus 1900 for perception prediction model training, according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to Fig. 9, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a second memory 1932 for storing instructions, such as application programs, executable by the processing component 1922. The application programs stored in the second memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the perception prediction model training method described above.
The apparatus 1900 may further comprise a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output interface 1958. The apparatus 1900 may operate based on an operating system stored in the second memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In another exemplary embodiment, a computer program product is also provided. The computer program product comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above perception prediction model training method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A perception prediction model training method, comprising:
acquiring a first surrounding environment image of a first sample vehicle;
inputting the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted running information of the first sample vehicle;
inputting the first predicted running information into a pre-trained reward model to obtain a first score of the first predicted running information;
fine-tuning the original perception prediction model through a reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score, to obtain a target perception prediction model;
wherein the reinforcement learning method is a proximal policy optimization algorithm;
the fine-tuning of the original perception prediction model through the reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score comprises:
determining an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted running information and the first score;
and updating model parameters of the original perception prediction model by a stochastic gradient descent method according to the objective function.
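As a hedged illustration of the objective function and stochastic gradient descent update recited in claim 1, the sketch below implements a clipped proximal-policy-optimization surrogate over a toy Gaussian policy head. The unit standard deviation, the use of the reward-model score directly as the advantage, the clipping constant of 0.2, and all tensor sizes are simplifying assumptions rather than details taken from this disclosure.

```python
# Sketch of a PPO-style clipped objective plus one stochastic gradient descent step
# (Gaussian policy head, advantage = reward-model score, eps = 0.2 are assumptions).
import torch
import torch.nn as nn

IMG_DIM, INFO_DIM, EPS = 64, 4, 0.2

policy = nn.Linear(IMG_DIM, INFO_DIM)            # stand-in for the perception prediction model being fine-tuned
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)

def ppo_sgd_step(image, predicted_info, score, old_log_prob):
    """Compute the clipped surrogate objective and apply one SGD step."""
    dist = torch.distributions.Normal(policy(image), 1.0)    # fixed unit std for simplicity
    new_log_prob = dist.log_prob(predicted_info).sum(dim=-1)
    ratio = torch.exp(new_log_prob - old_log_prob)           # importance ratio w.r.t. the frozen old policy
    advantage = score                                        # simplification: use the reward-model score directly
    surrogate = torch.min(ratio * advantage,
                          torch.clamp(ratio, 1 - EPS, 1 + EPS) * advantage)
    loss = -surrogate.mean()                                 # maximize the clipped objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for one batch of training triples:
img = torch.randn(8, IMG_DIM)
with torch.no_grad():
    old_dist = torch.distributions.Normal(policy(img), 1.0)
    pred = old_dist.sample()
    old_lp = old_dist.log_prob(pred).sum(dim=-1)
score = torch.randn(8)
print(ppo_sgd_step(img, pred, score, old_lp))
```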
2. The method of claim 1, wherein after the step of fine-tuning the original perception prediction model through the reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score, the method further comprises:
in response to a first training cutoff condition not being met, updating model parameters of the reward model, and repeatedly executing the steps from acquiring a first surrounding environment image of a first sample vehicle to fine-tuning the original perception prediction model through the reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score;
and in response to the first training cutoff condition being met, obtaining the target perception prediction model.
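Purely for illustration, the control flow of claim 2 can be read as an outer loop that alternates reward-model updates and fine-tuning rounds until a cutoff is met. In the sketch below the cutoff criterion (a fixed iteration budget) and both helper bodies are placeholders; the actual work they would do corresponds to the sketches given under claims 1 and 3.

```python
# Sketch of the outer loop of claim 2: alternate reward-model updates and
# fine-tuning rounds until a first training cutoff condition is met.
MAX_ITERATIONS = 10   # assumed cutoff condition: a fixed iteration budget

def first_cutoff_met(iteration: int) -> bool:
    return iteration >= MAX_ITERATIONS

def update_reward_model_parameters() -> None:
    pass  # placeholder: e.g. further ranking updates as sketched under claim 3

def run_fine_tuning_round() -> None:
    pass  # placeholder: acquire image, predict, score, clipped-objective SGD step (see claim 1 sketch)

iteration = 0
run_fine_tuning_round()
while not first_cutoff_met(iteration):
    update_reward_model_parameters()
    run_fine_tuning_round()
    iteration += 1
# When the cutoff is met, the current model is taken as the target perception prediction model.
```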
3. The method of claim 1, wherein the reward model is trained by:
acquiring a second surrounding environment image of a second sample vehicle;
inputting the second surrounding environment image into the original perception prediction model to obtain a plurality of pieces of second predicted running information of the second sample vehicle, wherein the pieces of second predicted running information respectively correspond to different prediction probabilities;
determining an actual quality ranking of the plurality of pieces of second predicted running information;
inputting the plurality of pieces of second predicted running information into a neural network to obtain a second score of each piece of second predicted running information;
ranking the plurality of pieces of second predicted running information according to the second scores to obtain a predicted quality ranking;
updating model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
in response to a second training cutoff condition not being met, repeatedly executing the steps from acquiring a second surrounding environment image of a second sample vehicle to updating the model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
and in response to the second training cutoff condition being met, determining the neural network obtained after the last model parameter update as the reward model.
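One common way to turn a quality ranking into a training signal for a reward model is a pairwise ranking loss over pairs drawn from the actual ranking, as sketched below. The two-layer network, the -logsigmoid pairwise loss, and the tensor sizes are illustrative assumptions, not details taken from claim 3.

```python
# Sketch of a pairwise ranking loss for the reward model (Bradley-Terry style);
# network size and the pairwise -logsigmoid loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

INFO_DIM = 4
reward_net = nn.Sequential(nn.Linear(INFO_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(reward_net.parameters(), lr=1e-3)

def ranking_update(better_info: torch.Tensor, worse_info: torch.Tensor) -> float:
    """One update: the second predicted running information judged better should score higher."""
    s_better = reward_net(better_info)
    s_worse = reward_net(worse_info)
    loss = -F.logsigmoid(s_better - s_worse).mean()   # penalize pairs ranked in the wrong order
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: pairs derived from the actual quality ranking of several predictions.
better = torch.randn(16, INFO_DIM)
worse = torch.randn(16, INFO_DIM)
print(ranking_update(better, worse))
```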
4. The method according to claim 1, wherein the first predicted running information includes predicted surrounding vehicle information of the first sample vehicle, predicted lane line information of a road ahead of the first sample vehicle, a predicted trajectory of the predicted surrounding vehicle of the first sample vehicle, and a predicted running route of the first sample vehicle;
the original perception prediction model is trained by:
acquiring a third surrounding environment image of a third sample vehicle and actual running information of the third sample vehicle, wherein the actual running information includes actual surrounding vehicle information of the third sample vehicle, actual lane line information of a road ahead of the third sample vehicle, an actual trajectory of the actual surrounding vehicle of the third sample vehicle, and an actual running route of the third sample vehicle;
and performing model training by taking the third surrounding environment image as an input of the original perception prediction model and taking the actual running information as a target output of the original perception prediction model, so as to obtain the original perception prediction model.
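A minimal sketch of the supervised pre-training described in claim 4 is given below: the third surrounding environment image is the input and the actual running information is the regression target. The flattened image features, the MSE loss, and the Adam optimizer are assumptions made for illustration only.

```python
# Sketch of the supervised pre-training of the original perception prediction model
# (flattened image input, MSE regression loss, and Adam optimizer are assumptions).
import torch
import torch.nn as nn

IMG_DIM, INFO_DIM = 64, 4
model = nn.Sequential(nn.Linear(IMG_DIM, 128), nn.ReLU(), nn.Linear(128, INFO_DIM))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

def pretrain_step(third_surrounding_image, actual_running_info) -> float:
    """Image in, actual running information as the regression target."""
    predicted = model(third_surrounding_image)
    loss = criterion(predicted, actual_running_info)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch standing in for (third surrounding environment image, actual running information):
print(pretrain_step(torch.randn(32, IMG_DIM), torch.randn(32, INFO_DIM)))
```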
5. A vehicle control method characterized by comprising:
acquiring a current surrounding environment image of a target vehicle;
inputting the current surrounding environment image into a pre-trained target perception prediction model to obtain target running information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method according to any one of claims 1-4;
and controlling the target vehicle to run according to the target running information.
6. The method of claim 5, wherein the method further comprises:
and updating parameters of the target perception prediction model according to a preset period, so that the target vehicle predicts running information according to the target perception prediction model obtained after the parameters are updated.
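For illustration only, the sketch below shows what the control-side path of claims 5 and 6 could look like at inference time: the current surrounding environment image is fed to the target perception prediction model and the resulting target running information is handed to a controller. The capture_current_image and apply_control hooks, the tensor sizes, and the linear stand-in model are hypothetical; the periodic parameter refresh of claim 6 is sketched earlier in the description.

```python
# Sketch of the control-side inference path of claims 5 and 6;
# the camera and controller hooks are hypothetical placeholders.
import torch
import torch.nn as nn

IMG_DIM, INFO_DIM = 64, 4
target_model = nn.Linear(IMG_DIM, INFO_DIM)   # stand-in for the trained target perception prediction model
target_model.eval()

def capture_current_image() -> torch.Tensor:
    # Hypothetical camera hook; here just a random tensor.
    return torch.randn(1, IMG_DIM)

def apply_control(target_running_info: torch.Tensor) -> None:
    # Hypothetical controller interface (steering / throttle / braking downstream).
    print("controlling vehicle with", target_running_info.tolist())

def control_step() -> None:
    with torch.no_grad():
        image = capture_current_image()
        target_running_info = target_model(image)
    apply_control(target_running_info)

control_step()
```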
7. A perception prediction model training device, comprising:
a first acquisition module configured to acquire a first surrounding environment image of a first sample vehicle;
a first prediction module configured to input the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted running information of the first sample vehicle;
a first scoring module configured to input the first predicted running information into a pre-trained reward model to obtain a first score of the first predicted running information;
a fine-tuning module configured to fine-tune the original perception prediction model through a reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score, to obtain a target perception prediction model;
wherein the reinforcement learning method is a proximal policy optimization algorithm;
the fine-tuning module includes:
a determining sub-module configured to determine an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted running information and the first score;
and an updating sub-module configured to update model parameters of the original perception prediction model by a stochastic gradient descent method according to the objective function.
8. A vehicle control apparatus characterized by comprising:
a second acquisition module configured to acquire a current surrounding environment image of a target vehicle;
a second prediction module configured to input the current surrounding environment image into a pre-trained target perception prediction model to obtain target running information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method according to any one of claims 1-4;
and the control module is configured to control the target vehicle to run according to the target running information.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
execute the executable instructions to implement the steps of the method of any one of claims 1-6.
10. A vehicle comprising the electronic device of claim 9 or being connected to the electronic device of claim 9.
11. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310207908.9A CN116091894B (en) 2023-03-03 2023-03-03 Model training method, vehicle control method, device, equipment, vehicle and medium

Publications (2)

Publication Number Publication Date
CN116091894A CN116091894A (en) 2023-05-09
CN116091894B true CN116091894B (en) 2023-07-14

Family

ID=86187055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310207908.9A Active CN116091894B (en) 2023-03-03 2023-03-03 Model training method, vehicle control method, device, equipment, vehicle and medium

Country Status (1)

Country Link
CN (1) CN116091894B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910912B (en) * 2023-07-28 2024-04-30 小米汽车科技有限公司 Method, device, equipment and storage medium for generating three-dimensional model of vehicle
CN116758378B (en) * 2023-08-11 2023-11-14 小米汽车科技有限公司 Method for generating model, data processing method, related device, vehicle and medium
CN117928568A (en) * 2024-03-22 2024-04-26 腾讯科技(深圳)有限公司 Navigation method based on artificial intelligence, model training method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109991987A (en) * 2019-04-29 2019-07-09 北京智行者科技有限公司 Automatic Pilot decision-making technique and device
CN111731326A (en) * 2020-07-02 2020-10-02 知行汽车科技(苏州)有限公司 Obstacle avoidance strategy determination method and device and storage medium
CN113359771A (en) * 2021-07-06 2021-09-07 贵州大学 Intelligent automatic driving control method based on reinforcement learning
CN115303297A (en) * 2022-07-25 2022-11-08 武汉理工大学 Method and device for controlling end-to-end automatic driving under urban market scene based on attention mechanism and graph model reinforcement learning

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6917878B2 (en) * 2017-12-18 2021-08-11 日立Astemo株式会社 Mobile behavior prediction device
US20200033869A1 (en) * 2018-07-27 2020-01-30 GM Global Technology Operations LLC Systems, methods and controllers that implement autonomous driver agents and a policy server for serving policies to autonomous driver agents for controlling an autonomous vehicle
CN110119844B (en) * 2019-05-08 2021-02-12 中国科学院自动化研究所 Robot motion decision method, system and device introducing emotion regulation and control mechanism
US11467591B2 (en) * 2019-05-15 2022-10-11 Baidu Usa Llc Online agent using reinforcement learning to plan an open space trajectory for autonomous vehicles
CN111523643B (en) * 2020-04-10 2024-01-05 商汤集团有限公司 Track prediction method, device, equipment and storage medium
JP7258077B2 (en) * 2021-05-13 2023-04-14 三菱電機株式会社 Other vehicle behavior prediction device
CN114821544B (en) * 2022-06-29 2023-04-11 小米汽车科技有限公司 Perception information generation method and device, vehicle, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116091894A (en) 2023-05-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant