CN116091894B - Model training method, vehicle control method, device, equipment, vehicle and medium - Google Patents

Model training method, vehicle control method, device, equipment, vehicle and medium

Info

Publication number
CN116091894B
Authority
CN
China
Prior art keywords
prediction model
vehicle
predicted
model
target
Prior art date
Legal status
Active
Application number
CN202310207908.9A
Other languages
Chinese (zh)
Other versions
CN116091894A (en)
Inventor
Xiong Anbin (熊安斌)
Yang Kuiyuan (杨奎元)
Current Assignee
Xiaomi Automobile Technology Co Ltd
Original Assignee
Xiaomi Automobile Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiaomi Automobile Technology Co Ltd
Priority to CN202310207908.9A
Publication of CN116091894A
Application granted
Publication of CN116091894B
Legal status: Active
Anticipated expiration

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The present disclosure relates to a model training method, a vehicle control method, a device, equipment, a vehicle, and a medium. The method includes: acquiring a first surrounding environment image of a first sample vehicle; inputting the first surrounding environment image into an original perception prediction model to obtain first predicted driving information of the first sample vehicle; inputting the first predicted driving information into a reward model to obtain a first score for the first predicted driving information; and fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score to obtain a target perception prediction model. Fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.

Description

Model training method, vehicle control method, device, equipment, vehicle and medium
Technical Field
The present disclosure relates to the technical field of autonomous driving, and in particular to a model training method, a vehicle control method, a device, equipment, a vehicle, and a medium.
Background
With the rapid development of autonomous driving technology, low-speed operation of autonomous vehicles in fixed and relatively simple scenes has been achieved in China, but achieving full-scene operation of autonomous vehicles at arbitrary speeds remains a very challenging problem. Obstacle detection, trajectory prediction, and vehicle path planning are among the key technologies for realizing advanced autonomous driving: only when an autonomous vehicle can, like a human driver, accurately judge the intentions and future trajectories of surrounding targets (vehicles, pedestrians, etc.) can it reasonably plan its own motion trajectory and achieve safe and stable navigation.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a model training method, a vehicle control method, a device, an apparatus, a vehicle, and a medium.
According to a first aspect of embodiments of the present disclosure, there is provided a perception prediction model training method, including:
acquiring a first surrounding environment image of a first sample vehicle;
inputting the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted driving information of the first sample vehicle;
inputting the first predicted driving information into a pre-trained reward model to obtain a first score for the first predicted driving information;
and fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score to obtain a target perception prediction model.
Optionally, the reinforcement learning method is a proximal policy optimization algorithm;
the fine-tuning of the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score includes:
determining an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted driving information, and the first score;
and updating model parameters of the original perception prediction model by stochastic gradient descent according to the objective function.
Optionally, the reinforcement learning method is a proximal policy optimization algorithm;
after the step of fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score, the method further includes:
in response to a first training cutoff condition not being met, updating model parameters of the reward model, and repeating the steps from acquiring a first surrounding environment image of a first sample vehicle through fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score;
and obtaining the target perception prediction model in response to the first training cutoff condition being met.
Optionally, the reward model is trained by:
acquiring a second surrounding environment image of a second sample vehicle;
inputting the second surrounding environment image into the original perception prediction model to obtain a plurality of pieces of second predicted driving information of the second sample vehicle, wherein the pieces of second predicted driving information correspond to different prediction probabilities;
determining an actual quality ranking of the pieces of second predicted driving information;
inputting the pieces of second predicted driving information into a neural network to obtain a second score for each piece of second predicted driving information;
ranking the pieces of second predicted driving information according to the second scores to obtain a predicted quality ranking;
updating model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
in response to a second training cutoff condition not being met, repeating the steps from acquiring a second surrounding environment image of a second sample vehicle through updating the model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
and in response to the second training cutoff condition being met, determining the neural network obtained after the last model parameter update as the reward model.
Optionally, the first predicted driving information includes predicted surrounding-vehicle information of the first sample vehicle, predicted lane line information of the road ahead of the first sample vehicle, predicted trajectories of the predicted surrounding vehicles of the first sample vehicle, and a predicted driving route of the first sample vehicle;
the original perception prediction model is trained by:
acquiring a third surrounding environment image of a third sample vehicle and actual driving information of the third sample vehicle, wherein the actual driving information includes actual surrounding-vehicle information of the third sample vehicle, actual lane line information of the road ahead of the third sample vehicle, actual trajectories of the actual surrounding vehicles of the third sample vehicle, and an actual driving route of the third sample vehicle;
and performing model training with the third surrounding environment image as input to the original perception prediction model and the actual driving information as the target output of the original perception prediction model, to obtain the original perception prediction model.
According to a second aspect of embodiments of the present disclosure, there is provided a vehicle control method, including:
acquiring a current surrounding environment image of a target vehicle;
inputting the current surrounding environment image into a pre-trained target perception prediction model to obtain target driving information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method provided in the first aspect of the present disclosure;
and controlling the target vehicle to drive according to the target driving information.
Optionally, the method further includes:
updating parameters of the target perception prediction model according to a preset period, so that the target vehicle predicts driving information with the target perception prediction model obtained after the parameter update.
According to a third aspect of embodiments of the present disclosure, there is provided a perception prediction model training device, including:
a first acquisition module configured to acquire a first surrounding environment image of a first sample vehicle;
a first prediction module configured to input the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted driving information of the first sample vehicle;
a first scoring module configured to input the first predicted driving information into a pre-trained reward model to obtain a first score for the first predicted driving information;
and a fine-tuning module configured to fine-tune the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score to obtain a target perception prediction model.
According to a fourth aspect of embodiments of the present disclosure, there is provided a vehicle control apparatus, including:
a second acquisition module configured to acquire a current surrounding environment image of a target vehicle;
a second prediction module configured to input the current surrounding environment image into a pre-trained target perception prediction model to obtain target driving information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method provided in the first aspect of the present disclosure;
and a control module configured to control the target vehicle to drive according to the target driving information.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
execute the executable instructions to implement the steps of the perception prediction model training method provided in the first aspect of the present disclosure or the steps of the vehicle control method provided in the second aspect of the present disclosure.
According to a sixth aspect of embodiments of the present disclosure, there is provided a vehicle that includes, or is connected with, the electronic device provided in the fifth aspect of the present disclosure.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the perception prediction model training method provided in the first aspect of the present disclosure or the steps of the vehicle control method provided in the second aspect of the present disclosure.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flowchart illustrating a perception prediction model training method according to an exemplary embodiment.
FIG. 2 is a schematic diagram illustrating first predicted driving information according to an exemplary embodiment.
FIG. 3 is a flowchart illustrating a perception prediction model training method according to another exemplary embodiment.
FIG. 4 is a schematic diagram illustrating a reward model training process according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating a vehicle control method according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating a perception prediction model training device according to an exemplary embodiment.
FIG. 7 is a block diagram of a vehicle control apparatus according to an exemplary embodiment.
FIG. 8 is a functional block diagram of a vehicle according to an exemplary embodiment.
FIG. 9 is a block diagram illustrating an apparatus for perception prediction model training according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
It should be noted that all actions of acquiring signals, information, or data in the present application are performed in compliance with the applicable data protection laws and policies of the relevant jurisdiction and with authorization from the owner of the corresponding device.
FIG. 1 is a flowchart illustrating a perception prediction model training method according to an exemplary embodiment. As shown in FIG. 1, the method may include the following S101 to S104.
In S101, a first surrounding environment image of a first sample vehicle is acquired.
In the present disclosure, one set of road acquisition data may be randomly selected from a first road acquisition data set as the first surrounding environment image. The first road acquisition data set may include surrounding environment images acquired by the first sample vehicle through its own sensors (e.g., lidar, millimeter-wave radar, cameras, etc.) during historical driving, or surrounding environment images acquired by a plurality of sample vehicles through their own sensors during historical driving, the first sample vehicle being any one of the plurality of sample vehicles.
In S102, the first surrounding environment image is input into a pre-trained original perception prediction model to obtain first predicted driving information of the first sample vehicle.
In the present disclosure, as shown in FIG. 2, the first predicted driving information may include predicted surrounding-vehicle information of the first sample vehicle, predicted lane line information of the road ahead of the first sample vehicle, predicted trajectories of the predicted surrounding vehicles of the first sample vehicle, and a predicted driving route of the first sample vehicle. The predicted surrounding-vehicle information refers to the predicted information of the vehicles around the corresponding sample vehicle, the predicted lane line information refers to the predicted lane line information of the road ahead of the corresponding sample vehicle, the predicted trajectories of the predicted surrounding vehicles refer to the predicted trajectories of target surrounding vehicles, and the predicted driving route refers to the predicted driving route of the corresponding sample vehicle, wherein the target surrounding vehicles include the predicted surrounding vehicles of the corresponding sample vehicle.
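As a hedged illustration only (not part of the patent text), the first predicted driving information of FIG. 2 could be represented by a simple container such as the following; all field names and types are assumptions made for the sketch:

```python
# Hypothetical structure for the first predicted driving information shown in FIG. 2.
# Field names and types are illustrative assumptions, not the patent's definitions.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position in the ego vehicle's coordinate frame

@dataclass
class PredictedDrivingInfo:
    surrounding_vehicles: List[Dict]              # predicted surrounding-vehicle information
    lane_lines: List[List[Point]]                 # predicted lane lines of the road ahead
    surrounding_trajectories: List[List[Point]]   # predicted trajectories of the surrounding vehicles
    ego_route: List[Point]                        # predicted driving route of the sample vehicle
```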
In S103, the first predicted driving information is input into a pre-trained reward model to obtain a first score for the first predicted driving information.
In the present disclosure, the reward model may be a scoring model used to score the first predicted driving information.
In S104, the original perception prediction model is fine-tuned by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score, to obtain a target perception prediction model.
In the present disclosure, the original perception prediction model may be fine-tuned by reinforcement learning methods such as the Proximal Policy Optimization (PPO) algorithm, Trust Region Policy Optimization (TRPO), Markov Decision Process (MDP) based methods, Deep Deterministic Policy Gradient (DDPG), etc.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
When the first predicted driving information simultaneously includes the predicted surrounding-vehicle information of the first sample vehicle, the predicted lane line information of the road ahead of the first sample vehicle, the predicted trajectories of the predicted surrounding vehicles of the first sample vehicle, and the predicted driving route of the first sample vehicle, evaluating the first predicted driving information with the reward model yields a first score that reflects the overall autonomous driving performance, thereby ensuring the accuracy of the target perception prediction model.
The following describes in detail the embodiment of fine-tuning the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score in S104. Specifically, the reinforcement learning method may be the proximal policy optimization algorithm, i.e., PPO, in which case the original perception prediction model is essentially a policy network and the reward model is essentially a value network, and the original perception prediction model may be fine-tuned by the following steps [1] and [2]:
Step [1]: determining an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted driving information, and the first score.
For example, the objective function of the original perception prediction model may be determined from the first surrounding environment image, the first predicted driving information, and the first score by the following equation:

$$\mathrm{objective}(\phi)=\mathbb{E}_{(x,y)\sim D_{\pi_{\phi}^{\mathrm{RL}}}}\big[r(x,y)\big]-\beta\,D_{\mathrm{KL}}\!\left(\pi_{\phi}^{\mathrm{RL}}(\cdot\mid x)\,\big\|\,\pi^{\mathrm{orig}}(\cdot\mid x)\right)$$

where $\phi$ denotes the trainable model parameters of the original perception prediction model; $\mathrm{objective}(\phi)$ is the objective function of the original perception prediction model; $\pi^{\mathrm{orig}}$ is the original perception prediction model; $\pi_{\phi}^{\mathrm{RL}}$ is the model obtained after the model parameter update, i.e., the target perception prediction model; $x$ is the first surrounding environment image; $y$ is the first predicted driving information; $r(x,y)$ is the first score; $\mathbb{E}_{(x,y)\sim D_{\pi_{\phi}^{\mathrm{RL}}}}[r(x,y)]$ is the estimate of the accumulated future reward under the distribution induced by $\pi_{\phi}^{\mathrm{RL}}$, and its purpose is to encourage the original perception prediction model to produce high-scoring responses with a larger cumulative return; $\beta$ is a weight coefficient; and $D_{\mathrm{KL}}(\pi_{\phi}^{\mathrm{RL}}\,\|\,\pi^{\mathrm{orig}})$ is the difference, measured by the KL divergence, between the action distributions produced by the two models for the same input, and its purpose is to constrain the model obtained after the parameter update so that it does not deviate too far from the original perception prediction model.
The expectation term and the KL divergence term may be computed in a manner commonly used in the art, and the present disclosure is not limited in this regard.
Step [2]: and according to the objective function, updating model parameters of the original perception prediction model by adopting a random gradient descent method.
In addition, in order to improve the perception and prediction accuracy of the target perception prediction model, the original perception prediction model may undergo multiple parameter updates. Specifically, when the reinforcement learning method is the proximal policy optimization algorithm, as shown in FIG. 3, after S104 the method may further include the following S105.
In S105, it is determined whether a first training cutoff condition is met.
In one embodiment, the first training cutoff condition may be that the number of training iterations reaches a first preset number, which may be set according to the actual usage scenario.
In another embodiment, the first training cutoff condition may be that the target loss of the original perception prediction model is less than a first preset threshold, which may be set according to the actual usage scenario. When the target loss of the original perception prediction model is less than the first preset threshold, the perception and prediction accuracy of the original perception prediction model can be considered to meet the requirement, and accurate perception and prediction can be performed on complex environment images.
If the first training cutoff condition is not met, the following S106 is executed, and then S101 to S104 are repeated until the first training cutoff condition is met; if the first training cutoff condition is met, the original perception prediction model obtained after the last model parameter update may be determined as the target perception prediction model, that is, the following S107 is performed.
In S106, the model parameters of the reward model are updated.
Specifically, an advantage function may be computed based on the first score and the cumulative reward, and the parameters of the reward model may then be updated by back-propagating the mean squared error loss of the advantage function. The manner of determining the advantage function and adjusting the parameters of the reward model may follow practice commonly used in the art, and the present disclosure is not limited in this regard.
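A minimal, self-contained sketch of this S106 update, under the assumption that the reward model contains a trainable value head and that PyTorch is used (all names and shapes are illustrative):

```python
# Update the reward/value model by back-propagating the mean squared error of the
# advantage (observed cumulative reward minus the value estimate).
import torch
from torch import nn

value_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
value_opt = torch.optim.Adam(value_net.parameters(), lr=1e-4)

state_feat = torch.randn(8, 128)       # features of the first surrounding environment image
cumulative_reward = torch.randn(8, 1)  # observed cumulative reward (includes the first score)

advantage = cumulative_reward - value_net(state_feat)  # A = G - V(s)
value_loss = advantage.pow(2).mean()                   # MSE loss of the advantage
value_opt.zero_grad()
value_loss.backward()
value_opt.step()
```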
In S107, a target perception prediction model is obtained.
The following describes the training method of the original perception prediction model in detail. Specifically, the original perception prediction model may be trained by the following steps (a) and (b):
step (a): and acquiring a third surrounding environment image of the third sample vehicle and actual running information of the third sample vehicle.
In the present disclosure, one-way mining data may be randomly selected from the second-way mining data set as the third surrounding environment information. The second road acquisition data set may include a surrounding environment image acquired by a third sample vehicle through a self-sensor (e.g., a laser radar, a millimeter wave radar, a camera, etc.) during the history running, or a surrounding environment image acquired by a plurality of sample vehicles through a self-sensor during the history running, the third sample vehicle being any one of the plurality of sample vehicles.
The actual traveling information may include actual surrounding vehicle information of the third sample vehicle, actual lane line information of a road ahead of the third sample vehicle, an actual trajectory of the actual surrounding vehicle of the third sample vehicle, and an actual traveling route of the third sample vehicle. The labeling personnel can label the actual surrounding vehicle information of the third sample vehicle, the actual lane line information of the road in front of the third sample vehicle and the actual track of the actual surrounding vehicle of the third sample vehicle according to the collected surrounding environment data of the third sample vehicle in a first preset time period after the third surrounding environment image is collected, and determine the historical driving route of the third sample vehicle in a second preset time period after the third surrounding environment image is collected as the actual driving route of the third sample vehicle.
Step (b): and performing model training by taking the third surrounding environment image as the input of the original perception prediction model and taking the actual running information as the target output of the original perception prediction model so as to obtain the original perception prediction model.
Specifically, a third ambient environment image is input into the original perception prediction model, so that third prediction running information of a third sample vehicle can be obtained; then, according to the difference between the third predicted running information and the actual running information, updating model parameters of the original perception prediction model; and (c) acquiring new training data again, and continuing model training, namely returning to the step (a), until the third training cut-off condition is reached.
In one embodiment, the third training cutoff condition may be that the number of times of training reaches a second preset number of times, the second preset number of times may be set according to an actual usage scenario, when the number of times of training reaches the second preset number of times, it may be determined that the number of times of training is sufficient, and the original perception prediction model may learn sufficient effective features.
In another embodiment, the third training cutoff condition may be that the target loss of the original perceptual prediction model is less than a second preset threshold, which may be set according to the actual usage scenario. Under the condition that the target loss of the original perception prediction model is smaller than the second preset threshold, the perception prediction accuracy of the original perception prediction model can be considered to meet the requirement, and the environment image can be accurately perceived and predicted.
The third predicted traveling information includes predicted surrounding vehicle information of the third sample vehicle, predicted lane line information of a road ahead of the third sample vehicle, a predicted trajectory of the predicted surrounding vehicle of the third sample vehicle, and a predicted traveling route of the third sample vehicle.
For example, a loss may be determined based on a difference between the third predicted travel information and the actual travel information, such that back propagation may be performed based on the loss to adjust parameters of the original perceived prediction model. The manner in which the loss is determined and the parameters are adjusted may be operated in a manner commonly used in the art, and this disclosure is not limited thereto.
For example, the original perceptual prediction model may be a Pre-trained model (generated Pre-trained Transformers, GPT), e.g., GPT-4, GPT-3, etc.
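A hedged sketch of steps (a) and (b) as a standard supervised training loop; the dataset shapes, network, loss, and cutoff values are assumptions for illustration (the patent does not prescribe a specific architecture or loss):

```python
# Supervised pre-training of the original perception prediction model on
# (third surrounding environment image, actual driving information) pairs.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

images = torch.randn(256, 128)                 # image features (toy data)
actual_driving_info = torch.randn(256, 32)     # flattened actual driving information (toy data)
loader = DataLoader(TensorDataset(images, actual_driving_info), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 32))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()                       # one possible loss on the prediction/ground-truth difference

max_epochs, loss_threshold = 50, 1e-2          # second preset number / second preset threshold
for epoch in range(max_epochs):                # third training cutoff condition
    epoch_loss = 0.0
    for x, y in loader:
        pred = model(x)                        # third predicted driving information
        loss = criterion(pred, y)              # difference from the actual driving information
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * x.size(0)
    if epoch_loss / len(images) < loss_threshold:
        break
```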
The training method of the reward model is described in detail below. Specifically, the reward model may be trained by the following steps (1) to (8):
step (1): a second ambient image of a second sample vehicle is acquired.
In the present disclosure, one-way data may be randomly selected from the third-way data set as the second surrounding environment information. The second road acquisition data set may include a surrounding environment image acquired by a second sample vehicle through a self-sensor (e.g., a laser radar, a millimeter wave radar, a camera, etc.) during the history traveling, or a surrounding environment image acquired by a plurality of sample vehicles through a self-sensor during the history traveling, the second sample vehicle being any one of the plurality of sample vehicles.
The first, second and third data sets may be identical, partially identical or completely different, and the disclosure is not limited specifically. Preferably, the three are completely different, so that the perception prediction accuracy of the target perception prediction model can be improved.
Step (2): and inputting the second surrounding environment image into the original perception prediction model to obtain a plurality of second prediction driving information of the second sample vehicle.
In the present disclosure, a plurality of second predicted traveling information corresponds to different prediction probabilities, a second surrounding image is input into an original perceived prediction model, the original perceived prediction model may generate M second predicted traveling information and prediction probabilities of each second predicted traveling information according to the second surrounding image, and at this time, the original perceived prediction model may output N second predicted traveling information with the highest prediction probabilities, that is, the plurality of second predicted traveling information. Wherein M is greater than or equal to N.
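A small hedged sketch of the top-N selection in step (2), assuming the candidates are scored by a softmax over model logits (shapes and names are illustrative):

```python
# Keep the N pieces of second predicted driving information with the highest
# prediction probabilities out of M generated candidates.
import torch

M, N = 16, 4
logits = torch.randn(M)                     # model scores for M candidate pieces of driving information
probs = torch.softmax(logits, dim=0)        # prediction probability of each candidate
top_probs, top_idx = torch.topk(probs, N)   # indices of the N most probable candidates
```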
Step (3): and determining the actual good-bad ordering result of the second predicted running information.
In the method, labeling personnel can sort the quality of the second prediction running information from the angles of detection accuracy, track prediction and running route prediction rationality to obtain an actual quality sorting result.
Step (4): and inputting the plurality of second predicted running information into the neural network to obtain a second score of each piece of second predicted running information.
The neural network may be, for example, a three-layer fully connected neural network.
Step (5): and sequencing the plurality of second predicted traveling information according to each second score to obtain a predicted good and bad sequencing result.
For example, the plurality of second predicted traveling information may be ranked from large to small according to the second score, to obtain the predicted good-bad ranking result.
Step (6): and updating the model parameters of the neural network according to the actual good and bad sorting result and the predicted good and bad sorting result.
Specifically, the model parameters of the neural network can be updated according to the difference between the actual good and bad sorting result and the predicted good and bad sorting result.
For example, the penalty may be determined based on a difference between the actual good-bad ranking result and the predicted good-bad ranking result, such that back propagation may be performed based on the penalty to adjust the parameters of the neural network. The manner in which the loss is determined and the parameters are adjusted may be operated in a manner commonly used in the art, and this disclosure is not limited thereto.
Illustratively, as shown in FIG. 4, the second surrounding environment image is input into the original perception prediction model to obtain four pieces of second predicted driving information of the second sample vehicle, namely second predicted driving information A, B, C, and D. The annotators rank the quality of the four pieces and obtain the actual quality ranking "D, C, A, B". Meanwhile, the four pieces are input into the neural network, and the second scores of second predicted driving information A, B, C, and D are 25, 45, 87, and 78, respectively; ranking the four pieces from the largest second score to the smallest yields the predicted quality ranking "C, D, B, A". The model parameters of the neural network can then be updated according to the actual quality ranking "D, C, A, B" and the predicted quality ranking "C, D, B, A".
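The patent leaves the exact loss between the two rankings open; a pairwise ranking loss is one common way to train such a scoring network, sketched below under that assumption (shapes, names, and the ranking D > C > A > B from the example above are illustrative):

```python
# Steps (4)-(6): score the candidates with a three-layer fully connected network and
# update it so that candidates ranked higher by the annotators receive higher scores.
import torch
from torch import nn
import torch.nn.functional as F

reward_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                           nn.Linear(64, 64), nn.ReLU(),
                           nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-4)

candidates = torch.randn(4, 32)       # features of second predicted driving information A, B, C, D
actual_ranking = [3, 2, 0, 1]         # annotated actual quality ranking: D > C > A > B

scores = reward_net(candidates).squeeze(-1)   # second scores for A, B, C, D

# Pairwise ranking loss: every (better, worse) pair should satisfy score(better) > score(worse).
loss = torch.zeros(())
for i in range(len(actual_ranking)):
    for j in range(i + 1, len(actual_ranking)):
        better, worse = actual_ranking[i], actual_ranking[j]
        loss = loss - F.logsigmoid(scores[better] - scores[worse])
loss = loss / 6                                # number of ordered pairs among 4 candidates

optimizer.zero_grad()
loss.backward()
optimizer.step()
```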
Step (7): and judging whether the second training cut-off condition is met.
In one embodiment, the second training cutoff condition may be that the number of training times reaches a third preset number of times, the third preset number of times may be set according to an actual usage scenario, when the number of training times reaches the third preset number of times, it may be determined that the number of training times is sufficient, and the reward model may learn sufficient effective features.
In another embodiment, the second training cutoff condition may be that the target loss of the bonus model is less than a third preset threshold, which may be set according to an actual usage scenario. Under the condition that the target loss of the reward model is smaller than the third preset threshold, the scoring accuracy of the reward model can be considered to meet the requirement, and the predicted running information can be accurately evaluated.
If the second training cut-off condition is not met, repeating the steps (1) to (6) until the second training cut-off condition is met; if the second training cutoff condition is satisfied, the neural network obtained after the last model parameter update may be determined as the reward model, that is, the following step (8) is performed.
Step (8): and determining the neural network obtained after the last model parameter updating as a rewarding model.
FIG. 5 is a flowchart illustrating a vehicle control method according to an exemplary embodiment. As shown in FIG. 5, the vehicle control method may include the following S201 to S203.
In S201, a current surrounding environment image of a target vehicle is acquired.
In the present disclosure, the surrounding environment image may be acquired in real time by sensors (e.g., lidar, millimeter-wave radar, cameras, etc.) on the target vehicle.
In S202, the current surrounding environment image is input into a pre-trained target perception prediction model to obtain target driving information of the target vehicle.
In the present disclosure, the target perception prediction model is trained based on the perception prediction model training method provided by the present disclosure and described above.
In S203, the target vehicle is controlled to drive according to the target driving information.
For example, the target driving information may include a predicted driving route of the target vehicle, and the target vehicle may therefore be controlled to travel along the predicted driving route.
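A hedged end-to-end sketch of S201 to S203; the sensor, model, and controller interfaces are placeholders invented for illustration:

```python
# Acquire the current surrounding environment image, run the target perception
# prediction model, and control the vehicle along the predicted driving route.
import torch

def acquire_current_surrounding_image():
    """Placeholder for reading the target vehicle's sensors in real time."""
    return torch.randn(1, 128)

def control_vehicle_along(route):
    """Placeholder for the downstream controller that tracks the route."""
    print("following predicted driving route with", route.shape[1], "waypoints")

target_model = torch.nn.Linear(128, 20)                   # stand-in for the target perception prediction model

image = acquire_current_surrounding_image()               # S201
with torch.no_grad():
    target_driving_info = target_model(image)             # S202
predicted_route = target_driving_info.reshape(1, 10, 2)   # e.g., 10 (x, y) waypoints
control_vehicle_along(predicted_route)                    # S203
```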
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
In addition, the method may further include the following step:
updating the parameters of the target perception prediction model according to a preset period, so that the target vehicle predicts driving information with the target perception prediction model obtained after the parameter update.
Specifically, the target perception prediction model may be retrained in a manner similar to the perception prediction model training method described above to complete the parameter update of the target perception prediction model; the target vehicle then predicts driving information with the target perception prediction model obtained after the parameter update, and its driving is controlled accordingly.
In this embodiment, periodically updating the target perception prediction model can maintain its perception and prediction accuracy and thus ensure safe and stable driving of the target vehicle.
FIG. 6 is a block diagram illustrating a perception prediction model training device according to an exemplary embodiment. As shown in FIG. 6, the perception prediction model training device 300 includes:
a first acquisition module 301 configured to acquire a first surrounding environment image of a first sample vehicle;
a first prediction module 302 configured to input the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted driving information of the first sample vehicle;
a first scoring module 303 configured to input the first predicted driving information into a pre-trained reward model to obtain a first score for the first predicted driving information;
and a fine-tuning module 304 configured to fine-tune the original perception prediction model by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score, to obtain a target perception prediction model.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
Optionally, the reinforcement learning method is a proximal policy optimization algorithm;
the fine-tuning module 304 includes:
a determining sub-module configured to determine an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted driving information, and the first score;
and an updating sub-module configured to update the model parameters of the original perception prediction model by stochastic gradient descent according to the objective function.
Optionally, the reinforcement learning method is a proximal policy optimization algorithm;
the perception prediction model training device 300 further includes:
a first triggering module configured to, after the original perception prediction model is fine-tuned by a reinforcement learning method according to the first surrounding environment image, the first predicted driving information, and the first score, and in response to a first training cutoff condition not being met, update the model parameters of the reward model and trigger the first acquisition module 301 to acquire a first surrounding environment image of a first sample vehicle;
and a first determination module configured to obtain the target perception prediction model in response to the first training cutoff condition being met.
Optionally, the reward model is trained by a reward model construction device, and the reward model construction device may include:
a third acquisition module configured to acquire a second surrounding environment image of a second sample vehicle;
a third prediction module configured to input the second surrounding environment image into the original perception prediction model to obtain a plurality of pieces of second predicted driving information of the second sample vehicle, the pieces of second predicted driving information corresponding to different prediction probabilities;
a second determination module configured to determine the actual quality ranking of the pieces of second predicted driving information;
a second scoring module configured to input the pieces of second predicted driving information into a neural network to obtain a second score for each piece of second predicted driving information;
a ranking module configured to rank the pieces of second predicted driving information according to the second scores to obtain a predicted quality ranking;
a first updating module configured to update the model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
a second triggering module configured to trigger the third acquisition module to acquire a second surrounding environment image of a second sample vehicle in response to a second training cutoff condition not being met;
and a third determination module configured to determine, in response to the second training cutoff condition being met, the neural network obtained after the last model parameter update as the reward model.
Optionally, the first predicted driving information includes predicted surrounding-vehicle information of the first sample vehicle, predicted lane line information of the road ahead of the first sample vehicle, predicted trajectories of the predicted surrounding vehicles of the first sample vehicle, and a predicted driving route of the first sample vehicle;
the original perception prediction model is trained by an original perception prediction model construction device, and the original perception prediction model construction device includes:
a fourth acquisition module configured to acquire a third surrounding environment image of a third sample vehicle and actual driving information of the third sample vehicle, wherein the actual driving information includes actual surrounding-vehicle information of the third sample vehicle, actual lane line information of the road ahead of the third sample vehicle, actual trajectories of the actual surrounding vehicles of the third sample vehicle, and an actual driving route of the third sample vehicle;
and a training module configured to perform model training with the third surrounding environment image as input to the original perception prediction model and the actual driving information as its target output, to obtain the original perception prediction model.
Note that the reward model construction device may be independent of the perception prediction model training device 300 or integrated in it, and the original perception prediction model construction device may likewise be independent of the perception prediction model training device 300 or integrated in it; the present disclosure is not specifically limited in this respect.
The specific manner in which the modules of the above perception prediction model training device perform their operations has been described in detail in the embodiments of the perception prediction model training method and is not repeated here.
FIG. 7 is a block diagram of a vehicle control apparatus according to an exemplary embodiment. As shown in FIG. 7, the vehicle control apparatus 400 includes:
a second acquisition module 401 configured to acquire a current surrounding environment image of a target vehicle;
a second prediction module 402 configured to input the current surrounding environment image into a pre-trained target perception prediction model to obtain target driving information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method provided by the present disclosure and described above;
and a control module 403 configured to control the target vehicle to drive according to the target driving information.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: fine-tuning the original perception prediction model with a reinforcement learning method allows the model to be iteratively optimized, improving its perception and prediction capability across the full range of driving scenes, so that the target perception prediction model can produce optimal perception and prediction results in complex scenes and better match human expectations. This solves the problem of inaccurate perception and prediction in complex scenes and ensures safe and stable driving of the vehicle.
Optionally, the vehicle control apparatus 400 further includes:
a second updating module configured to update the parameters of the target perception prediction model according to a preset period, so that the target vehicle predicts driving information with the target perception prediction model obtained after the parameter update.
The specific manner in which the modules of the above vehicle control apparatus perform their operations has been described in detail in the embodiments of the vehicle control method and is not repeated here.
The present disclosure also provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the executable instructions to implement the steps of the above perception prediction model training method or the steps of the above vehicle control method provided by the present disclosure.
The present disclosure also provides a vehicle that includes, or is connected with, the electronic device provided by the present disclosure.
The present disclosure also provides a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the above perception prediction model training method or the steps of the above vehicle control method provided by the present disclosure.
The present disclosure also provides a chip including a processor and an interface; the processor is configured to read instructions to perform the above perception prediction model training method or the above vehicle control method provided by the present disclosure.
Fig. 8 is a functional block diagram schematic of a vehicle 600, according to an example embodiment. For example, vehicle 600 may be a hybrid vehicle, but may also be a non-hybrid vehicle, an electric vehicle, a fuel cell vehicle, or other type of vehicle. The vehicle 600 may be an autonomous vehicle.
Referring to fig. 8, a vehicle 600 may include various subsystems, such as an infotainment system 610, a perception system 620, a decision control system 630, a drive system 640, and a computing platform 650. Wherein the vehicle 600 may also include more or fewer subsystems, and each subsystem may include multiple components. In addition, interconnections between each subsystem and between each component of the vehicle 600 may be achieved by wired or wireless means.
In some embodiments, the infotainment system 610 may include a communication system, an entertainment system, a navigation system, and the like.
The perception system 620 may include several sensors for sensing information of the environment surrounding the vehicle 600. For example, the sensing system 620 may include a global positioning system (which may be a GPS system, a beidou system, or other positioning system), an inertial measurement unit (inertial measurement unit, IMU), a lidar, millimeter wave radar, an ultrasonic radar, and a camera device.
Decision control system 630 may include a computing system, a vehicle controller, a steering system, a throttle, and a braking system.
The drive system 640 may include components that provide powered motion for the vehicle 600. In one embodiment, the drive system 640 may include an engine, an energy source, a transmission, and wheels. The engine may be one of, or a combination of, an internal combustion engine, an electric motor, and an air compression engine. The engine converts energy provided by the energy source into mechanical energy.
Some or all of the functions of the vehicle 600 are controlled by the computing platform 650. The computing platform 650 may include at least one processor 651 and a first memory 652, the processor 651 may execute instructions 653 stored in the first memory 652.
The processor 651 may be any conventional processor, such as a commercially available CPU. The processor may also be, for example, a graphics processing unit (GPU), a field-programmable gate array (FPGA), a system on chip (SoC), an application-specific integrated circuit (ASIC), or a combination thereof.
The first memory 652 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
In addition to instructions 653, the first memory 652 may also store data such as road maps, route information, the position, direction, speed, etc. of the vehicle. The data stored by the first memory 652 may be used by the computing platform 650.
In embodiments of the present disclosure, the processor 651 may execute the instructions 653 to perform all or part of the steps of the perception prediction model training method described above, or all or part of the steps of the vehicle control method described above.
Fig. 9 is a block diagram of an apparatus 1900 for perception prediction model training, according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to Fig. 9, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a second memory 1932 for storing instructions, such as application programs, executable by the processing component 1922. The application programs stored in the second memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the perception prediction model training method described above.
The apparatus 1900 may further comprise a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output interface 1958. The apparatus 1900 may operate based on an operating system stored in the second memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In another exemplary embodiment, a computer program product is also provided. The computer program product comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above perception prediction model training method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A perception prediction model training method, comprising:
acquiring a first surrounding environment image of a first sample vehicle;
inputting the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted running information of the first sample vehicle;
inputting the first predicted running information into a pre-trained reward model to obtain a first score of the first predicted running information;
fine-tuning the original perception prediction model through a reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score, to obtain a target perception prediction model;
wherein the reinforcement learning method is a proximal policy optimization algorithm;
the fine-tuning of the original perception prediction model through the reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score comprises:
determining an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted running information and the first score;
and updating model parameters of the original perception prediction model by a stochastic gradient descent method according to the objective function.
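As a hedged illustration of the objective function and stochastic gradient descent update recited in claim 1, the sketch below implements a clipped proximal-policy-optimization surrogate over a toy Gaussian policy head. The unit standard deviation, the use of the reward-model score directly as the advantage, the clipping constant of 0.2, and all tensor sizes are simplifying assumptions rather than details taken from this disclosure.

```python
# Sketch of a PPO-style clipped objective plus one stochastic gradient descent step
# (Gaussian policy head, advantage = reward-model score, eps = 0.2 are assumptions).
import torch
import torch.nn as nn

IMG_DIM, INFO_DIM, EPS = 64, 4, 0.2

policy = nn.Linear(IMG_DIM, INFO_DIM)            # stand-in for the perception prediction model being fine-tuned
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)

def ppo_sgd_step(image, predicted_info, score, old_log_prob):
    """Compute the clipped surrogate objective and apply one SGD step."""
    dist = torch.distributions.Normal(policy(image), 1.0)    # fixed unit std for simplicity
    new_log_prob = dist.log_prob(predicted_info).sum(dim=-1)
    ratio = torch.exp(new_log_prob - old_log_prob)           # importance ratio w.r.t. the frozen old policy
    advantage = score                                        # simplification: use the reward-model score directly
    surrogate = torch.min(ratio * advantage,
                          torch.clamp(ratio, 1 - EPS, 1 + EPS) * advantage)
    loss = -surrogate.mean()                                 # maximize the clipped objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for one batch of training triples:
img = torch.randn(8, IMG_DIM)
with torch.no_grad():
    old_dist = torch.distributions.Normal(policy(img), 1.0)
    pred = old_dist.sample()
    old_lp = old_dist.log_prob(pred).sum(dim=-1)
score = torch.randn(8)
print(ppo_sgd_step(img, pred, score, old_lp))
```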
2. The method of claim 1, wherein after the step of fine-tuning the original perception prediction model through the reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score, the method further comprises:
in response to a first training cutoff condition not being met, updating model parameters of the reward model, and repeatedly executing the steps from acquiring a first surrounding environment image of a first sample vehicle to fine-tuning the original perception prediction model through the reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score;
and in response to the first training cutoff condition being met, obtaining the target perception prediction model.
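Purely for illustration, the control flow of claim 2 can be read as an outer loop that alternates reward-model updates and fine-tuning rounds until a cutoff is met. In the sketch below the cutoff criterion (a fixed iteration budget) and both helper bodies are placeholders; the actual work they would do corresponds to the sketches given under claims 1 and 3.

```python
# Sketch of the outer loop of claim 2: alternate reward-model updates and
# fine-tuning rounds until a first training cutoff condition is met.
MAX_ITERATIONS = 10   # assumed cutoff condition: a fixed iteration budget

def first_cutoff_met(iteration: int) -> bool:
    return iteration >= MAX_ITERATIONS

def update_reward_model_parameters() -> None:
    pass  # placeholder: e.g. further ranking updates as sketched under claim 3

def run_fine_tuning_round() -> None:
    pass  # placeholder: acquire image, predict, score, clipped-objective SGD step (see claim 1 sketch)

iteration = 0
run_fine_tuning_round()
while not first_cutoff_met(iteration):
    update_reward_model_parameters()
    run_fine_tuning_round()
    iteration += 1
# When the cutoff is met, the current model is taken as the target perception prediction model.
```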
3. The method of claim 1, wherein the reward model is trained by:
acquiring a second surrounding environment image of a second sample vehicle;
inputting the second surrounding environment image into the original perception prediction model to obtain a plurality of pieces of second predicted running information of the second sample vehicle, wherein the pieces of second predicted running information respectively correspond to different prediction probabilities;
determining an actual quality ranking of the plurality of pieces of second predicted running information;
inputting the plurality of pieces of second predicted running information into a neural network to obtain a second score of each piece of second predicted running information;
ranking the plurality of pieces of second predicted running information according to the second scores to obtain a predicted quality ranking;
updating model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
in response to a second training cutoff condition not being met, repeatedly executing the steps from acquiring a second surrounding environment image of a second sample vehicle to updating the model parameters of the neural network according to the actual quality ranking and the predicted quality ranking;
and in response to the second training cutoff condition being met, determining the neural network obtained after the last model parameter update as the reward model.
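One common way to turn a quality ranking into a training signal for a reward model is a pairwise ranking loss over pairs drawn from the actual ranking, as sketched below. The two-layer network, the -logsigmoid pairwise loss, and the tensor sizes are illustrative assumptions, not details taken from claim 3.

```python
# Sketch of a pairwise ranking loss for the reward model (Bradley-Terry style);
# network size and the pairwise -logsigmoid loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

INFO_DIM = 4
reward_net = nn.Sequential(nn.Linear(INFO_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(reward_net.parameters(), lr=1e-3)

def ranking_update(better_info: torch.Tensor, worse_info: torch.Tensor) -> float:
    """One update: the second predicted running information judged better should score higher."""
    s_better = reward_net(better_info)
    s_worse = reward_net(worse_info)
    loss = -F.logsigmoid(s_better - s_worse).mean()   # penalize pairs ranked in the wrong order
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: pairs derived from the actual quality ranking of several predictions.
better = torch.randn(16, INFO_DIM)
worse = torch.randn(16, INFO_DIM)
print(ranking_update(better, worse))
```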
4. The method according to claim 1, wherein the first predicted running information includes predicted surrounding vehicle information of the first sample vehicle, predicted lane line information of a road ahead of the first sample vehicle, a predicted trajectory of the predicted surrounding vehicle of the first sample vehicle, and a predicted running route of the first sample vehicle;
the original perception prediction model is trained by:
acquiring a third surrounding environment image of a third sample vehicle and actual running information of the third sample vehicle, wherein the actual running information includes actual surrounding vehicle information of the third sample vehicle, actual lane line information of a road ahead of the third sample vehicle, an actual trajectory of the actual surrounding vehicle of the third sample vehicle, and an actual running route of the third sample vehicle;
and performing model training by taking the third surrounding environment image as an input of the original perception prediction model and taking the actual running information as a target output of the original perception prediction model, so as to obtain the original perception prediction model.
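A minimal sketch of the supervised pre-training described in claim 4 is given below: the third surrounding environment image is the input and the actual running information is the regression target. The flattened image features, the MSE loss, and the Adam optimizer are assumptions made for illustration only.

```python
# Sketch of the supervised pre-training of the original perception prediction model
# (flattened image input, MSE regression loss, and Adam optimizer are assumptions).
import torch
import torch.nn as nn

IMG_DIM, INFO_DIM = 64, 4
model = nn.Sequential(nn.Linear(IMG_DIM, 128), nn.ReLU(), nn.Linear(128, INFO_DIM))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

def pretrain_step(third_surrounding_image, actual_running_info) -> float:
    """Image in, actual running information as the regression target."""
    predicted = model(third_surrounding_image)
    loss = criterion(predicted, actual_running_info)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch standing in for (third surrounding environment image, actual running information):
print(pretrain_step(torch.randn(32, IMG_DIM), torch.randn(32, INFO_DIM)))
```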
5. A vehicle control method characterized by comprising:
acquiring a current surrounding environment image of a target vehicle;
inputting the current surrounding environment image into a pre-trained target perception prediction model to obtain target running information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method according to any one of claims 1-4;
and controlling the target vehicle to run according to the target running information.
6. The method of claim 5, wherein the method further comprises:
and updating parameters of the target perception prediction model according to a preset period, so that the target vehicle predicts running information according to the target perception prediction model obtained after the parameters are updated.
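For illustration only, the sketch below shows what the control-side path of claims 5 and 6 could look like at inference time: the current surrounding environment image is fed to the target perception prediction model and the resulting target running information is handed to a controller. The capture_current_image and apply_control hooks, the tensor sizes, and the linear stand-in model are hypothetical; the periodic parameter refresh of claim 6 is sketched earlier in the description.

```python
# Sketch of the control-side inference path of claims 5 and 6;
# the camera and controller hooks are hypothetical placeholders.
import torch
import torch.nn as nn

IMG_DIM, INFO_DIM = 64, 4
target_model = nn.Linear(IMG_DIM, INFO_DIM)   # stand-in for the trained target perception prediction model
target_model.eval()

def capture_current_image() -> torch.Tensor:
    # Hypothetical camera hook; here just a random tensor.
    return torch.randn(1, IMG_DIM)

def apply_control(target_running_info: torch.Tensor) -> None:
    # Hypothetical controller interface (steering / throttle / braking downstream).
    print("controlling vehicle with", target_running_info.tolist())

def control_step() -> None:
    with torch.no_grad():
        image = capture_current_image()
        target_running_info = target_model(image)
    apply_control(target_running_info)

control_step()
```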
7. A perception prediction model training device, comprising:
a first acquisition module configured to acquire a first surrounding environment image of a first sample vehicle;
a first prediction module configured to input the first surrounding environment image into a pre-trained original perception prediction model to obtain first predicted running information of the first sample vehicle;
a first scoring module configured to input the first predicted running information into a pre-trained reward model to obtain a first score of the first predicted running information;
a fine-tuning module configured to fine-tune the original perception prediction model through a reinforcement learning method according to the first surrounding environment image, the first predicted running information and the first score, to obtain a target perception prediction model;
wherein the reinforcement learning method is a proximal policy optimization algorithm;
the fine-tuning module includes:
a determining sub-module configured to determine an objective function of the original perception prediction model according to the first surrounding environment image, the first predicted running information and the first score;
and an updating sub-module configured to update model parameters of the original perception prediction model by a stochastic gradient descent method according to the objective function.
8. A vehicle control apparatus characterized by comprising:
a second acquisition module configured to acquire a current surrounding environment image of a target vehicle;
a second prediction module configured to input the current surrounding environment image into a pre-trained target perception prediction model to obtain target running information of the target vehicle, wherein the target perception prediction model is trained based on the perception prediction model training method according to any one of claims 1-4;
and the control module is configured to control the target vehicle to run according to the target running information.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
execute the executable instructions to implement the steps of the method of any one of claims 1-6.
10. A vehicle comprising the electronic device of claim 9 or being connected to the electronic device of claim 9.
11. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310207908.9A CN116091894B (en) 2023-03-03 2023-03-03 Model training method, vehicle control method, device, equipment, vehicle and medium

Publications (2)

Publication Number Publication Date
CN116091894A CN116091894A (en) 2023-05-09
CN116091894B true CN116091894B (en) 2023-07-14

Family

ID=86187055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310207908.9A Active CN116091894B (en) 2023-03-03 2023-03-03 Model training method, vehicle control method, device, equipment, vehicle and medium

Country Status (1)

Country Link
CN (1) CN116091894B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910912B (en) * 2023-07-28 2024-04-30 小米汽车科技有限公司 Method, device, equipment and storage medium for generating three-dimensional model of vehicle
CN116758378B (en) * 2023-08-11 2023-11-14 小米汽车科技有限公司 Method for generating model, data processing method, related device, vehicle and medium
CN117928568A (en) * 2024-03-22 2024-04-26 腾讯科技(深圳)有限公司 Navigation method based on artificial intelligence, model training method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109991987A (en) * 2019-04-29 2019-07-09 北京智行者科技有限公司 Automatic Pilot decision-making technique and device
CN111731326A (en) * 2020-07-02 2020-10-02 知行汽车科技(苏州)有限公司 Obstacle avoidance strategy determination method and device and storage medium
CN113359771A (en) * 2021-07-06 2021-09-07 贵州大学 Intelligent automatic driving control method based on reinforcement learning
CN115303297A (en) * 2022-07-25 2022-11-08 武汉理工大学 Method and device for controlling end-to-end automatic driving under urban market scene based on attention mechanism and graph model reinforcement learning

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6917878B2 (en) * 2017-12-18 2021-08-11 日立Astemo株式会社 Mobile behavior prediction device
US20200033869A1 (en) * 2018-07-27 2020-01-30 GM Global Technology Operations LLC Systems, methods and controllers that implement autonomous driver agents and a policy server for serving policies to autonomous driver agents for controlling an autonomous vehicle
CN110119844B (en) * 2019-05-08 2021-02-12 中国科学院自动化研究所 Robot motion decision method, system and device introducing emotion regulation and control mechanism
US11467591B2 (en) * 2019-05-15 2022-10-11 Baidu Usa Llc Online agent using reinforcement learning to plan an open space trajectory for autonomous vehicles
CN111523643B (en) * 2020-04-10 2024-01-05 商汤集团有限公司 Track prediction method, device, equipment and storage medium
JP7258077B2 (en) * 2021-05-13 2023-04-14 三菱電機株式会社 Other vehicle behavior prediction device
CN114821544B (en) * 2022-06-29 2023-04-11 小米汽车科技有限公司 Perception information generation method and device, vehicle, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116091894A (en) 2023-05-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant