CN117636306A - Driving track determination method, model training method, driving track determination device, model training device, electronic equipment and medium


Info

Publication number
CN117636306A
Authority
CN
China
Prior art keywords
sample
target
evaluation value
track
candidate
Prior art date
Legal status (assumption, not a legal conclusion)
Pending
Application number
CN202311675223.3A
Other languages
Chinese (zh)
Inventor
刘姜江
谭资昌
叶晓青
王井东
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311675223.3A
Publication of CN117636306A

Classifications

    • G06V 20/58: Scenes; context or environment of the image exterior to a vehicle using sensors mounted on the vehicle; recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06N 3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning; using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The disclosure provides a driving track determination method, a model training method, corresponding devices, electronic equipment, and a medium, relates to the technical field of artificial intelligence, in particular to computer vision, deep learning, and large models, and can be applied to scenarios such as automatic driving, autonomous parking, the Internet of Things, and intelligent traffic. The specific implementation scheme is as follows: acquiring at least one group of object weights configured for target objects around the vehicle, the object weights representing the degree of influence of the target objects on the running process of the vehicle; adjusting the current running track of the vehicle according to an environment encoding vector and the at least one group of object weights to obtain at least one candidate running track of the vehicle, each candidate running track having a target evaluation value, the environment encoding vector being obtained by encoding the surrounding environment information of the vehicle; and determining, according to the at least one target evaluation value of the at least one candidate running track, the target candidate running track whose target evaluation value satisfies a preset condition as the target running track of the vehicle.

Description

Driving track determination method, model training method, driving track determination device, model training device, electronic equipment and medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning, and large models, can be applied to scenarios such as automatic driving, autonomous parking, the Internet of Things, and intelligent traffic, and particularly relates to a driving track determination method, a model training method, corresponding devices, electronic equipment, and a medium.
Background
With the rapid development of deep learning technology, research in automatic driving is commonly divided into perception, prediction, and decision making. The decision stage is one of the key technologies underlying the intelligence and autonomy of driverless vehicles: based on perception data and prior knowledge, it realizes functions such as path planning, motion control, and behavior decision for the driverless vehicle.
Disclosure of Invention
The disclosure provides a driving track determination method, a model training method, a device, electronic equipment and a medium.
According to an aspect of the present disclosure, there is provided a travel track determining method including: acquiring at least one group of object weights configured for a target object around a vehicle, the object weights representing the degree of influence of the target object on the running process of the vehicle; according to the environment coding vector and the at least one group of object weights, the current running track of the vehicle is adjusted to obtain at least one candidate running track of the vehicle, the candidate running track has a target evaluation value, and the environment coding vector is obtained by coding surrounding environment information of the vehicle; and determining a target candidate running track corresponding to the target evaluation value meeting a preset condition as the target running track of the vehicle according to at least one target evaluation value of the at least one candidate running track.
According to another aspect of the present disclosure, there is provided a training method of a deep learning model, including: inputting sample surrounding environment information of a sample vehicle into a first neural network of a deep learning model to obtain a sample environment coding vector of the sample vehicle; inputting the sample environment coding vector, the sample running track of the sample vehicle and at least one group of sample object weights configured for sample objects around the sample vehicle into a second neural network of the deep learning model to obtain at least one sample candidate running track of the sample vehicle, wherein the sample candidate running track has a sample evaluation value; determining a target sample candidate running track corresponding to the sample evaluation value meeting a preset condition as an optimized running track of the vehicle according to at least one sample evaluation value of the at least one sample candidate running track; and training the deep learning model according to the sample environment coding vector, the sample running track, the sample object weight, the at least one sample candidate running track and the optimized running track to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided a travel track determining apparatus including: an object weight acquisition module for acquiring at least one set of object weights configured for a target object around a vehicle, the object weights characterizing a degree of influence of the target object on a running process of the vehicle; the track adjustment module is used for adjusting the current running track of the vehicle according to the environment coding vector and the at least one group of object weights to obtain at least one candidate running track of the vehicle, wherein the candidate running track has a target evaluation value, and the environment coding vector is obtained by coding surrounding environment information of the vehicle; and a target travel track determining module, configured to determine, according to at least one target evaluation value of the at least one candidate travel track, a target candidate travel track corresponding to a target evaluation value that satisfies a preset condition, as a target travel track of the vehicle.
According to another aspect of the present disclosure, there is provided a training apparatus of a deep learning model, including: the first neural network module is used for inputting the sample surrounding environment information of the sample vehicle into the first neural network of the deep learning model to obtain a sample environment coding vector of the sample vehicle; a second neural network module, configured to input the sample environment encoding vector, a sample running track of the sample vehicle, and at least one set of sample object weights configured for sample objects around the sample vehicle into a second neural network of the deep learning model, to obtain at least one sample candidate running track of the sample vehicle, where the sample candidate running track has a sample evaluation value; an optimized running track determining module, configured to determine, according to at least one sample evaluation value of the at least one sample candidate running track, a target sample candidate running track corresponding to a sample evaluation value that meets a preset condition, as an optimized running track of the vehicle; and the first training module is used for training the deep learning model according to the sample environment coding vector, the sample running track, the sample object weight, the at least one sample candidate running track and the optimized running track to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the travel track determination method and the training method of the deep learning model of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform at least one of the travel track determination method and the training method of the deep learning model of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, the computer program, when executed by a processor, implementing at least one of the travel track determination method and the training method of the deep learning model of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which at least one of a travel trajectory determination method and a training method of a deep learning model may be applied, and a corresponding apparatus, according to an embodiment of the present disclosure;
fig. 2 schematically illustrates a flowchart of a travel track determination method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of a closed-loop multi-stage trajectory planning framework in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of a training process of a deep learning model according to an embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of a travel track determining device according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure; and
FIG. 8 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Regarding automatic driving decisions, a classical approach is to plan the track of the ego vehicle by numerical optimization after the future tracks of other objects in the environment have been predicted. Related methods also include: fitting the vehicle trajectories in a dataset based on open-loop training; and, based on rules, solving an analytic trajectory that satisfies requirements such as safety and comfort according to the predicted future state of the environment.
Methods based on open-loop training use a model to predict environmental information over a future period, and the model is trained to fit the future travel trajectories of the vehicle collected in a dataset. One class of methods performs a series of perception tasks, such as 3D object detection and semantic segmentation, to obtain spatio-temporal information about the surrounding environment. A natural idea is that if a model performs well on these perception tasks, it can make accurate, safe, and comfortable trajectory plans based on this information.
The inventors discovered, in implementing the concepts of the present disclosure, that methods based on open-loop training depend on how completely the training set models the real world and how similar it is to it. For example, if the dataset is built on a simulation platform, the data distribution of the information collected from it may differ from real-world scenes. If the dataset does not cover well the various rare scenarios that may be encountered in real applications, the model may fail to make satisfactory decisions for these situations in actual testing. In fact, there is always a data-distribution gap between open-loop training and closed-loop testing that is difficult to resolve thoroughly: in actual testing, the vehicle gradually deviates from the trajectory positions seen in the training set, at which point it acquires environment information the model may never have encountered.
Rule-based methods take the perception information of the environment at the current moment as input, complete the trajectory prediction of different objects in the environment over a future period, and derive the trajectory plan of the ego vehicle by an analytic method.
The inventors found, in implementing the concepts of the present disclosure, that rule-based trajectory planning methods derive the driving decision of the ego vehicle directly from the trajectory predictions of other surrounding objects; this is feasible on the premise that the predictions are accurate and unaffected by the input data distribution. But in real-world scenarios, the behavior of an agent-like object is not deterministic. When making decisions, human drivers often take into account the various responses that other vehicles or pedestrians in the scene may make in the near future, and modify their own driving paths based on this judgment. Rule-based methods have difficulty modeling such a multi-agent interaction process, and can only make decisions that are optimal in a simplistic sense according to a few human-defined heuristic evaluation criteria.
Some recent works exploit the end-to-end learning capability of neural networks and directly take the real vehicle trajectories collected in a scene as training targets, achieving good results on some datasets. However, these methods are not tested by actually driving the vehicle along the planned track; instead, the next frame is evaluated using the past vehicle positions and surroundings recorded in the dataset, a test mode called open-loop testing. In the real-world use of autopilot technology, by contrast, the vehicle must travel along the trajectory planned by the model, so the input of the next model iteration depends on the trajectory output by the previous one, a situation known as closed-loop testing. Open-loop and closed-loop testing differ in whether the effects produced by the model's output accumulate.
The inventors found, in implementing the disclosed concepts, that most imitation-learning-based methods that perform well in open-loop tests do not transfer smoothly to closed-loop tests, mainly because imitation-learning-based trajectory fitting places certain requirements on the input data distribution. This is a common trait of neural network models: when the data distribution of the input deviates significantly from the training set, unpredictable deviations may occur in the model's output.
In a deployed automatic driving system, closed-loop autonomous adjustment of trajectory planning across various driving environments is a necessary capability, not merely the ability to fit well, on some automatic driving datasets, a particular class of ego-vehicle trajectories collected in the past.
Fully considering the differences of previous methods between open-loop and closed-loop test scenarios, the present disclosure provides multi-stage trajectory planning based on closed-loop training, so as to better meet the scenario requirements of practical automatic driving applications.
The disclosure provides a driving track determination method, a model training method, a device, electronic equipment and a medium. The driving track determining method comprises the following steps: acquiring at least one group of object weights configured for target objects around the vehicle, wherein the object weights represent the influence degree of the target objects on the running process of the vehicle; according to the environment coding vector and at least one group of object weights, the current running track of the vehicle is adjusted to obtain at least one candidate running track of the vehicle, the candidate running track has a target evaluation value, and the environment coding vector is obtained by coding surrounding environment information of the vehicle; and determining a target candidate travel track corresponding to the target evaluation value meeting the preset condition as the target travel track of the vehicle according to at least one target evaluation value of the at least one candidate travel track.
Fig. 1 schematically illustrates an exemplary system architecture to which at least one of a travel track determination method and a training method of a deep learning model, and a corresponding apparatus, may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the content processing method and apparatus may be applied may include a terminal device, but the terminal device may implement the content processing method and apparatus provided by the embodiments of the present disclosure without interaction with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include an autonomous vehicle 101, a network 102, and a server 103. The network 102 serves as a medium for providing a communication link between the autonomous vehicle 101 and the server 103. Network 102 may include various connection types, such as wired and/or wireless communication links, and the like.
The autonomous vehicle 101 may interact with a server 103 through a network 102 to receive or transmit data or the like.
The autonomous vehicle 101 may be provided with a display screen for implementing a human-machine interface, and may further be provided with information acquisition devices such as various cameras, infrared scanning sensors, and/or laser radar (lidar) for acquiring information about the surrounding environment.
The server 103 may be a server providing various services, such as a background management server (merely an example) that supports navigation to a target location selected from content browsed by the user on the autonomous vehicle 101. The background management server may analyze and otherwise process received data such as user requests and feed the processing results (e.g., web pages, information, or data acquired or generated according to the user request) back to the autonomous vehicle 101. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server incorporating a blockchain.
It should be noted that at least one of the driving track determining method and the training method of the deep learning model provided in the embodiments of the present disclosure may be generally performed by the autonomous vehicle 101. Accordingly, at least one of the travel track determining device and the training device of the deep learning model provided in the embodiment of the present disclosure may also be provided in the autonomous vehicle 101.
Alternatively, at least one of the driving trajectory determination method and the training method of the deep learning model provided by the embodiments of the present disclosure may be generally executed by the server 103. Accordingly, at least one of the driving trajectory determining device and the training device of the deep learning model provided in the embodiments of the present disclosure may be generally provided in the server 103. At least one of the driving trajectory determination method and the training method of the deep learning model provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 103 and is capable of communicating with the autonomous vehicle 101 and/or the server 103. Accordingly, at least one of the driving trajectory determination device and the training device of the deep learning model provided in the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 103 and is capable of communicating with the autonomous vehicle 101 and/or the server 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flowchart of a travel track determination method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S230.
In operation S210, at least one set of object weights configured for a target object around the vehicle is acquired, the object weights characterizing a degree of influence of the target object on a running process of the vehicle.
In operation S220, the current driving track of the vehicle is adjusted according to the environmental encoding vector and at least one set of object weights to obtain at least one candidate driving track of the vehicle, the candidate driving track having a target evaluation value, the environmental encoding vector being obtained by encoding surrounding environmental information of the vehicle.
In operation S230, a target candidate travel track corresponding to the target evaluation value satisfying the preset condition is determined as a target travel track of the vehicle according to at least one target evaluation value of the at least one candidate travel track.
According to embodiments of the present disclosure, a target object may be any object capable of affecting the travel plan of the vehicle. For example, the target object may include agent objects such as vehicles, pedestrians, and other dynamic obstacles on the road, which may be referred to as target agents. In some embodiments, the target object may also include static objects such as road boundaries and other static obstacles on the road, which may be referred to as target static objects, and is not limited herein. During the running of the vehicle, any target agent, target static object, or other object in the environment that the vehicle can detect in real time may be determined as a target object.
According to the embodiment of the disclosure, after the target objects are detected, a weight can be randomly allocated to each target object to obtain the object weight. The object weight may characterize a degree of influence of the target object with respect to a driving course of the vehicle.
For example, to simulate human driving habits, the vehicle may detect one or more surrounding target agents and/or target static objects as target objects. According to the degree to which each target agent or target static object influences the driving behavior of the vehicle over the next few seconds, different degrees of importance are assigned to agent objects and static objects as their object weights, for example agent weights and static-object weights. While driving, the vehicle can deliberately avoid target agents and target static objects whose agent weights or static-object weights exceed a preset weight threshold, thereby improving safety.
According to embodiments of the present disclosure, the vehicle may be equipped with devices such as sensors for detecting the surroundings, and may collect the surrounding environment information of the vehicle in real time. The surrounding environment information may include, for example, at least one of: position, speed, and traveling-direction information of dynamic objects such as surrounding vehicles and pedestrians; position information of static objects in the travel area such as plants and zebra crossings; and state information of traffic lights, and is not limited thereto. By encoding the surrounding environment information, an environment encoding vector can be obtained. Environment encoding is an important step in trajectory planning: it converts the surrounding environment information into a format friendly to machine learning models. In this process, a network such as, but not limited to, PointNet may be used as the encoder; such networks have proven to perform well in a variety of tasks, especially the processing of point clouds or data in vectorized format.
For example, the encoding step can be written as equation (1):

E = PointNet(S)    (1)

Based on equation (1), the perceived surrounding environment information S is fed into the PointNet encoder and converted into a vectorized format, yielding the environment encoding vector E.
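As an illustration only, the following minimal PyTorch sketch shows what such a PointNet-style environment encoder could look like; the class name EnvEncoder, the layer sizes, and the input format (one feature row per environment element) are assumptions for illustration, not the patent's actual implementation.

```python
import torch
import torch.nn as nn

class EnvEncoder(nn.Module):
    """PointNet-style encoder: a per-element MLP followed by max pooling,
    mapping N vectorized environment elements to one encoding vector E."""
    def __init__(self, in_dim: int = 8, hidden: int = 64, out_dim: int = 128):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim), nn.ReLU(),
        )

    def forward(self, env: torch.Tensor) -> torch.Tensor:
        # env: (N, in_dim) vectorized elements (agents, lane points, lights, ...)
        features = self.point_mlp(env)      # (N, out_dim)
        return features.max(dim=0).values   # (out_dim,): permutation-invariant E

# Example: 20 environment elements, each described by 8 numbers
E = EnvEncoder()(torch.randn(20, 8))
```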
According to the embodiment of the present disclosure, the current travel track may represent a travel track preset for the vehicle, or may represent a travel track predicted by the vehicle according to surrounding environment information, and may not be limited thereto.
According to embodiments of the present disclosure, when the object weights comprise only a set of agent weights corresponding to agent objects, or a set of static-object weights corresponding to static objects, then after the current travel track of the vehicle has been determined, it may be adjusted, according to the agent weights configured for agent objects in the vehicle's surroundings or the static-object weights configured for static objects, so that those objects are avoided while traveling along the current track. This yields a candidate travel track with higher safety, which may be determined as the target travel track and followed.
According to an embodiment of the present disclosure, when the object weights comprise multiple sets of weights corresponding to at least one of agent objects and static objects, the current travel track may be adjusted according to each set of object weights, on the basis of the environment encoding vector, to obtain multiple candidate travel tracks, each having a target evaluation value. The candidate travel tracks satisfying preset conditions set on the target evaluation values can then be screened out, and the vehicle can travel based on the resulting target travel track.
According to an embodiment of the present disclosure, the preset condition may include at least one of: the first evaluation value is greater than a preset short-term threshold value, the second evaluation value is greater than a preset long-term threshold value, and the like, and may not be limited thereto.
According to embodiments of the present disclosure, the multiple sets of object weights can reflect multiple importance distributions over one or more target agents; adjusting the current travel track for each distribution, on the basis of the environment encoding vector, yields one candidate travel track. When the vehicle travels based on that candidate travel track under the corresponding distribution of the target agents, it can achieve higher safety, efficiency, and comfort than traveling based on the current travel track under the same distribution. Adjusting the current travel track for the various distributions of the target agents thus yields multiple candidate travel tracks that are safer, more efficient, and more comfortable under their corresponding distributions.
According to embodiments of the present disclosure, the current travel track can be adjusted according to the environment encoding vector and the object weights while taking target-object interactions into account, so that the vehicle can travel safely, efficiently, and comfortably in the current environment along the target travel track.
The method shown in fig. 2 is further described below in connection with the specific examples.
According to an embodiment of the present disclosure, before performing the above operation S210, a target object may be first determined, and the method may include: and determining an object with a distance smaller than or equal to a preset distance threshold value from the vehicle as a target object related to the vehicle at the current moment according to the current position information of the vehicle.
According to the embodiments of the present disclosure, only an object within a preset range in the vicinity of the vehicle may be detected as the target object. The preset range may be determined, for example, based on the current location information of the vehicle and the preset distance threshold described above. The preset distance threshold may be set by user according to the actual service requirement, which is not limited herein.
According to embodiments of the present disclosure, the current location information of the vehicle may be the same or different at different times. Accordingly, the target objects determined for the vehicle at different times may be the same or different, and are not limited herein.
According to the embodiment of the disclosure, the target object is obtained through screening according to the distance, so that the calculation amount can be reduced, and the calculation efficiency can be improved.
It should be noted that the method for determining the target object is only an exemplary embodiment, but is not limited thereto, and may include other methods known in the art. For example, when the target object is determined at any one time, a preset number of objects nearest to the vehicle may also be determined as the target object, and may not be limited thereto.
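As a hedged illustration of the two selection strategies above (distance threshold, or a preset number of nearest objects), the following NumPy sketch can be considered; the function name and the 50 m default threshold are assumptions, not values from the disclosure.

```python
import numpy as np

def select_target_objects(ego_pos, objects, positions, max_dist=50.0, k=None):
    """Select target objects within a distance threshold of the ego vehicle,
    or alternatively the k objects nearest to it."""
    d = np.linalg.norm(positions - ego_pos, axis=1)   # distance to each object
    if k is not None:
        idx = np.argsort(d)[:k]            # preset number of nearest objects
    else:
        idx = np.where(d <= max_dist)[0]   # objects within the threshold
    return [objects[i] for i in idx]
```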
According to an embodiment of the present disclosure, the number of target objects is, for example, a target number. Operation S210 may include: determining the target number of random weights according to a preset weight range; determining a set of weight vectors from the target number of random weights; and determining the object weights from at least one set of weight vectors.
According to the embodiment of the disclosure, the preset weight range can be set in a self-defined mode according to actual service requirements. The object weight may be represented using any random number within the preset weight range. For each target object, it may be assigned a plurality of object weights for simulating a plurality of different scenarios. For example, for the same target object, a lesser weight may be assigned thereto, which may indicate that the vehicle need not avoid the target object, a greater weight may be assigned thereto, and may indicate that the vehicle need remain a greater distance from the target object.
For example, the preset weight range may be (0, 1). The 5 objects closest to the vehicle may be determined as target objects. For the process of generating object weights, 10 sets of vectors may be assigned randomly by default, each set of vectors containing 5 random numbers in the (0, 1) range, i.e., each set of vectors contains 5 random weights. The 5 random weights may be respectively assigned to 5 target objects nearest to the vehicle as object weights thereof for representing the relative importance of the target objects with respect to the vehicle. The 10 sets of vectors may simulate the relative importance of the 5 target objects with respect to the vehicle in 10 scenarios.
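The weight-generation step in this example can be sketched directly; the use of NumPy and the fixed seed are illustrative assumptions, but the shapes mirror the example above: 10 groups of 5 random weights in (0, 1).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

num_objects = 5   # the 5 target objects nearest to the vehicle
num_groups = 10   # 10 simulated importance scenarios

# Each row is one set of object weights: 5 random numbers in (0, 1),
# one per target object, expressing its relative importance to the vehicle.
object_weights = rng.random((num_groups, num_objects))  # shape (10, 5)
```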
It should be noted that the weights assigned to the same object at different times may be the same or different. Adjusting the current driving trajectory of the vehicle according to the object weight that varies with time may result in human-like driving behavior.
Through the embodiments of the present disclosure, determining weight vectors from random weights provides a way of determining object weights that characterizes target objects of differing importance. Simulating one or more scenarios for each target object enriches the variety of target-object configurations relative to actual scenes, making the approach applicable to diverse scenarios. Moreover, one or more conditions can be considered across these scenarios, making the analysis more comprehensive and complete and improving the effectiveness of subsequent computation.
According to an embodiment of the present disclosure, before performing the above operation S220, the current driving trajectory may be first determined, and the method may include: and determining the current running track according to the environment coding vector and the target position information determined for the vehicle.
According to the embodiment of the disclosure, the target position information may be determined randomly according to the current position information of the vehicle, or may be determined by inference calculation according to the current position information, the current running speed, the current running direction, and other running parameter information of the vehicle in combination with a kinematic equation, and may not be limited thereto.
For example, an open-loop MLP (Multilayer Perceptron) may also be used to predict the current travel trajectory, implementing for example the mapping shown in equation (2):

f = h(p, E)    (2)

In equation (2), f denotes the current travel track, p denotes one possible target location of the vehicle, and E denotes the environment encoding vector, which contains rich information about the surroundings and provides the necessary input for generating the current travel track. h denotes an MLP that takes the target position information and the environment encoding vector as inputs and outputs the predicted current travel trajectory.
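A minimal sketch of such an MLP h is given below, assuming a fixed planning horizon of T = 30 (x, y) points; the class name OpenLoopPlanner and all dimensions are illustrative assumptions rather than the disclosure's actual configuration.

```python
import torch
import torch.nn as nn

class OpenLoopPlanner(nn.Module):
    """MLP h from equation (2): maps a target position p and the environment
    encoding vector E to a predicted trajectory f of T future (x, y) points."""
    def __init__(self, env_dim: int = 128, horizon: int = 30):
        super().__init__()
        self.horizon = horizon
        self.mlp = nn.Sequential(
            nn.Linear(env_dim + 2, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),
        )

    def forward(self, p: torch.Tensor, E: torch.Tensor) -> torch.Tensor:
        x = torch.cat([p, E], dim=-1)              # concatenate inputs
        return self.mlp(x).view(self.horizon, 2)   # trajectory as T x (x, y)

f = OpenLoopPlanner()(torch.zeros(2), torch.randn(128))
```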
By the above-described embodiments of the present disclosure, the current travel track is generated based on the environment-encoded vector, so that the process can exhibit good adaptability in various situations.
According to an embodiment of the present disclosure, the above-described target evaluation value includes at least one of: a first evaluation value and a second evaluation value. The first evaluation value may characterize a running condition evaluation result of running a first distance based on the candidate running track. The second evaluation value may characterize a running condition evaluation result of running a second distance based on the candidate running track. The second distance is greater than the first distance.
According to the embodiment of the disclosure, the first evaluation value and the second evaluation value may be obtained by simulating an actual driving environment and performing scoring determination, or may be obtained by training a network model output capable of realizing scoring, which is not limited herein. For the first evaluation value, a running condition of the vehicle when the vehicle runs for a short period of time or a short distance based on the corresponding candidate running track may be simulated in each distribution condition of the target object, and the first evaluation value may be determined according to the running condition. The first distance may be indicative of a distance traveled for a short period of time or a short distance. For the second evaluation value, a running condition when the vehicle runs on the corresponding candidate running track may be simulated under each distribution condition of the target object, and the second evaluation value may be determined according to the running condition. The second distance may represent a distance represented by the candidate travel track, or may represent any distance greater than the first distance, and may not be limited thereto. The driving situation may include at least one of: whether there is an actual collision or collision risk with the target object, whether the traveling comfort is appropriate, whether the traveling distance is appropriate, and the like, and may not be limited thereto. In this embodiment, the target object may include only the target agent, or may include the target agent and the target static object, which is not limited herein.
According to an embodiment of the present disclosure, the preset condition may further include that the first evaluation value has the highest value. In this case, operation S230 may include: determining the target first evaluation value with the highest value among the at least one first evaluation value determined for the at least one candidate travel track; and determining the candidate travel track corresponding to the target first evaluation value as the target candidate travel track.
According to an embodiment of the present disclosure, the preset condition may further include that the second evaluation value is highest in value, and in this case, the above-described operation S230 may include: and determining a target second evaluation value with the highest value according to the at least one second evaluation value determined for the at least one candidate driving track. And determining the candidate driving track corresponding to the target second evaluation value as the target candidate driving track.
According to an embodiment of the present disclosure, the preset condition may further include that the total evaluation value of the first evaluation value and the second evaluation value is highest, in which case the above-described operation S230 may include: the total evaluation value determined for the candidate travel track is determined based on the first evaluation value and the second evaluation value determined for the same candidate travel track. The target total evaluation value with the highest value is determined according to at least one total evaluation value determined for at least one candidate driving track. The candidate travel locus corresponding to the target total evaluation value is determined as a target candidate travel locus.
According to the embodiment of the disclosure, the required preset conditions can be selected according to actual service requirements, so that the target candidate driving track more suitable for the corresponding scene is determined. For example, in a scenario where only safety is pursued and efficiency is not pursued, a track with the highest value of the second evaluation value may be used as the target candidate travel track. In a scenario where safety and efficiency are pursued at the same time, a track having the highest total evaluation value may be used as the target candidate travel track. The selection criteria regarding the preset conditions may not be limited thereto.
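The three selection criteria above can be made concrete with a short sketch; the candidate tuple layout and the mode names are assumptions used only for illustration.

```python
def select_target_trajectory(candidates, mode="total"):
    """Pick the candidate travel track whose evaluation value is highest.
    Each candidate is a tuple (trajectory, first_eval, second_eval)."""
    if mode == "first":      # short-horizon evaluation only
        key = lambda c: c[1]
    elif mode == "second":   # long-horizon evaluation only (safety-oriented)
        key = lambda c: c[2]
    else:                    # total evaluation: balance safety and efficiency
        key = lambda c: c[1] + c[2]
    return max(candidates, key=key)[0]
```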
Through the embodiment of the disclosure, the target running track meeting the service requirement can be obtained through screening based on a relatively perfect standard, and the obtained target running track has the advantages of safety, high efficiency and good realization effect.
According to an embodiment of the present disclosure, the target object may include a target agent, and the surrounding environment information may include target-agent parameter information for that agent. The method may further include: adjusting the target-agent parameter information according to the current travel track or the target travel track, in combination with Monte Carlo tree search, to obtain updated target-agent parameter information; and determining updated surrounding environment information according to the updated target-agent parameter information.
According to an embodiment of the present disclosure, the target-agent parameter information may include at least one of: the traveling speed, traveling direction, position information, and the like of the target agent, and is not limited thereto. Monte Carlo Tree Search (MCTS) is an algorithmic framework widely used in gaming and decision making. In an application scenario of the present disclosure, the MCTS framework may simulate interactions between the vehicle and target agents. In a specific embodiment, by combining the MCTS framework with the method of adjusting target-agent parameter information according to the vehicle's travel track, the interaction between the vehicle and each target agent can be considered step by step, and the responses and behaviors of a target agent when facing different vehicle trajectories can be modeled explicitly. These responses and behaviors can serve as the updated surrounding environment information that the vehicle acquires after interacting with the corresponding target agent. By simulating different interaction scenarios in this way, the MCTS framework can predict the possible reactions of target agents and help the vehicle make better decisions.
It should be noted that, corresponding to the above-described process, the vehicle and the target agent may perform the above-described interaction at any one time. And at any next moment, updated surrounding environment information at the corresponding moment can be determined according to the reaction and the behavior of the target intelligent agent. The updated surrounding environment information may be applied to the aforementioned method as surrounding environment information for a process of determining the current travel track and the target travel track at the corresponding times.
With the above-described embodiments of the present disclosure, by using the Monte Carlo tree search framework, interactions of a vehicle with a target agent may be simulated, predicting a reaction of the target agent to a trajectory of the vehicle, so that the vehicle may predict and cope with various possible scenarios. By gradually considering different interaction situations based on the Monte Carlo tree search framework, the vehicle can also be assisted in making better decisions.
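The interaction simulation can be sketched at a very high level as below. This is a strongly simplified, rollout-only illustration of the idea, not the MCTS framework of the disclosure: the agent_model.react and state.score interfaces are hypothetical stand-ins for "agents react to the ego trajectory" and "evaluate the resulting situation".

```python
import random

def simulate_interactions(env_state, candidate_trajectories, agent_model,
                          n_simulations=100, horizon=3):
    """Rollout-style sketch: for each candidate ego trajectory, repeatedly
    sample agent reactions, propagate the environment a few steps ahead,
    and average the resulting scores."""
    scores = {i: [] for i in range(len(candidate_trajectories))}
    for _ in range(n_simulations):
        i = random.randrange(len(candidate_trajectories))
        state, total = env_state, 0.0
        for _ in range(horizon):
            # Agents react to the ego trajectory; their reactions become the
            # "updated target agent parameter information" / environment.
            state = agent_model.react(state, candidate_trajectories[i])
            total += state.score()   # e.g. a safety/comfort reward
        scores[i].append(total)
    # Choose the trajectory with the best average simulated outcome.
    return max(scores, key=lambda i: sum(scores[i]) / max(len(scores[i]), 1))
```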
According to the embodiment of the disclosure, based on the method, a unified multi-stage track planning framework can be constructed and can be used for automatic driving path planning tasks under a closed loop.
Fig. 3 schematically illustrates a schematic diagram of a closed-loop multi-stage trajectory planning framework in accordance with an embodiment of the present disclosure.
As shown in fig. 3, the closed-loop multi-stage trajectory planning framework 300 may first perform an environmental encoding process based on vehicle perceived ambient environment information 310. After the environmental encoding, a current travel track 320 may be planned for the vehicle based on the encoded environmental information. Then, the closed-loop multi-stage trajectory planning framework 300 may consider, according to the current driving trajectory 320, various feedback situations 330 that may be generated by the target agent in the current environment, that is, the target agent may react and behave according to the current driving trajectory 320, for example, may obtain the first environmental feedback 331, the second environmental feedback 332, the third environmental feedback 333, and the like, and may not be limited thereto. For each possible feedback situation, the current driving track 320 may be corrected according to the situations, to obtain a corrected track 340. The corrected trajectory 340 may include, for example, a first corrected trajectory 341, a second corrected trajectory 342, a third corrected trajectory 343, etc., corresponding to the aforementioned various feedback cases 330, and may not be limited thereto. After multiple corrections are made, the quality of each corrected trajectory may be evaluated, and the best trajectory with the highest quality, such as the second corrected trajectory 342, may be selected. The second modified trajectory 342 may be used as a decision signal for the next frame of the vehicle to control the vehicle to travel. And then, the perceived environmental information can be updated to obtain updated surrounding environmental information, and the closed loop feedback process can be restarted according to the updated surrounding environmental information. In each closed loop feedback process, the vehicle can be correspondingly adjusted according to the change of the environment.
According to embodiments of the present disclosure, the various feedback cases 330 may be determined in conjunction with the simulated case of MCTS by assigning multiple sets of agent weights to the target agents.
Through the above embodiments of the present disclosure, a closed-loop multi-stage trajectory planning framework is provided, comprising the steps of environment encoding, initial trajectory generation, multi-agent interaction, and trajectory updating. By constructing different behavior combinations of the vehicle and the target agents in a closed-loop simulator, multi-agent interaction can be modeled explicitly, anticipation of various future situations can be achieved from a single set of training data, and a data distribution closer to practical application scenarios can be derived. Using this framework, the vehicle can progressively consider the environment and the agents within it at different stages, generating higher-quality trajectories and achieving more comprehensive and flexible planning overall.
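One iteration of the closed-loop cycle of FIG. 3 can be sketched as follows; encoder, planner, optimizer_net, and evaluator are hypothetical callables standing in for the stages described above, and perception.goal is an assumed field, not the disclosure's API.

```python
def closed_loop_planning_step(perception, encoder, planner, optimizer_net,
                              weight_groups, evaluator):
    """Encode -> initial trajectory -> per-feedback correction -> evaluate
    -> select: one frame of the closed-loop multi-stage framework."""
    E = encoder(perception)                     # environment encoding
    current_traj = planner(perception.goal, E)  # initial (current) trajectory
    corrected = [
        optimizer_net(E, current_traj, w)       # one corrected trajectory
        for w in weight_groups                  # per simulated feedback case
    ]
    scores = [evaluator(E, traj) for traj in corrected]
    best = corrected[scores.index(max(scores))]
    return best   # decision signal controlling the vehicle in the next frame
```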
According to embodiments of the present disclosure, the driving track determination method can perform the relevant computations by means of a deep learning model obtained through training.
Fig. 4 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S410 to S440.
In operation S410, sample ambient environment information of the sample vehicle is input into the first neural network of the deep learning model to obtain a sample ambient code vector of the sample vehicle.
In operation S420, the sample environment encoding vector, the sample travel track of the sample vehicle, and at least one set of sample object weights configured for sample objects around the sample vehicle are input into a second neural network of the deep learning model, resulting in at least one sample candidate travel track of the sample vehicle, the sample candidate travel track having a sample evaluation value.
In operation S430, a target sample candidate travel track corresponding to a sample evaluation value satisfying a preset condition is determined as an optimized travel track of the vehicle according to at least one sample evaluation value of the at least one sample candidate travel track.
In operation S440, the deep learning model is trained according to the sample environment encoding vector, the sample travel trajectory, the sample object weight, the at least one sample candidate travel trajectory, and the optimized travel trajectory, resulting in a trained deep learning model.
According to embodiments of the present disclosure, the sample surrounding environment information may have technical features identical or similar to the aforementioned surrounding environment information. The sample environment encoding vector may have the same or similar technical features as the aforementioned environment encoding vector. The sample travel track may have the same or similar technical features as the aforementioned current travel track. The sample object may have technical features identical or similar to the aforementioned target object. The sample object weights may have the same or similar technical features as the aforementioned object weights. The sample candidate travel track may have the same or similar technical features as the aforementioned candidate travel track. The sample evaluation value may have the same or similar technical features as the aforementioned target evaluation value. The target sample candidate travel track may have the same or similar technical features as the aforementioned target candidate travel track. The optimized travel track may have the same or similar technical features as the aforementioned target travel track. These features are not described in detail again here.
According to an embodiment of the present disclosure, the first neural network, used for example to construct the open-loop planner, may comprise a PointNet encoder or another encoder, as well as the open-loop MLP or another MLP, and is not limited thereto. The first neural network may embed programs such as those of equations (1) and (2), and is not limited thereto.
According to an embodiment of the present disclosure, the training method may further include: and determining a first distance loss according to the distance between the target area and the end position of the sample running track, wherein the target area represents the area related to the end position. And training the deep learning model according to the first distance loss.
According to the embodiment of the disclosure, by restricting the end of the sample travel track to be as close to the target area as possible based on the first distance loss, the minimum distance between the track end and the target area polygon can be reduced, and the quality of the generated track can be improved.
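A possible form of this first distance loss, the minimum distance from the trajectory endpoint to the target-area polygon, is sketched below in NumPy; treating the target area as a closed polygon over its vertices is an assumption for illustration.

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Distance from point p to the segment from a to b."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / (np.dot(ab, ab) + 1e-9), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def first_distance_loss(trajectory, target_polygon):
    """Minimum distance between the trajectory endpoint and the target-area
    polygon; minimising it pulls the track end toward the target area."""
    end = np.asarray(trajectory)[-1]
    verts = np.asarray(target_polygon)
    return min(point_segment_dist(end, verts[i], verts[(i + 1) % len(verts)])
               for i in range(len(verts)))
```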
According to embodiments of the present disclosure, the second neural network is used, for example, to construct a neural network short-term trajectory optimizer, whose inputs may be the environmental encoding vector, the current travel trajectory output by the open-loop planner, and the object weights of the target objects in the scene that are related to the vehicle. The input information may be compressed into one dimension and concatenated together. In terms of the model structure, a 3-layer MLP structure may be employed, and may not be limited thereto. The second neural network may be embedded with a program as in equation (3), and may not be limited thereto.
In formula (3), Φ may represent the 3-layer MLP. f may represent the current travel track output by the open-loop planner. w may represent the object weights. d may represent the update value applied to the current travel track, which yields the target travel track described above. The value of d may be between 0 and 1, and may not be limited thereto.
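A sketch of this second neural network, assuming equation (3) has the form d = Φ(e, f, w) with the inputs flattened and concatenated as described above (the sigmoid used to keep d in (0, 1), and all sizes, are assumptions):

```python
import torch
import torch.nn as nn

class NeuralPlanner(nn.Module):
    """Hypothetical second neural network implementing d = Phi(e, f, w).

    Input sizes are illustrative; how d updates the trajectory is left
    abstract because the exact form of equation (3) is not given here.
    """
    def __init__(self, env_dim=256, traj_dim=40, num_objects=32, hidden=512):
        super().__init__()
        in_dim = env_dim + traj_dim + num_objects
        self.phi = nn.Sequential(                 # the 3-layer MLP structure
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, traj_dim))

    def forward(self, env_vec, current_traj, object_weights):
        # Flatten each input into one dimension and concatenate them.
        x = torch.cat([env_vec.flatten(1), current_traj.flatten(1),
                       object_weights.flatten(1)], dim=-1)
        d = torch.sigmoid(self.phi(x))            # update values in (0, 1)
        return d
```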
According to an embodiment of the present disclosure, the training method may further include: and determining a second distance loss according to the distance between the track point of the sample running track and the sample object. And determining the collision time loss according to the sample environment coding vector and the sample running track. And determining the acceleration loss according to the acceleration information represented by the sample running track. The deep learning model is trained based on at least one of the second distance loss, the collision time loss, and the acceleration loss.
According to embodiments of the present disclosure, considering some realistic constraints of short-term trajectory planning, one or more factors such as driving comfort, safe distance from obstacles, and driving distance along a navigation route may be introduced, and a corresponding loss function may be constructed to comprehensively optimize the quality of the trajectory.
For example, for safety, the minimum distance to each obstacle may be considered and the second distance loss constructed. Based on the second distance loss, a supervision mechanism can be designed to monitor the distances between the track points and the sample objects; by penalizing small distances, collisions are discouraged and the safety of the sample vehicle is enhanced.
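A hedged sketch of one such second distance loss, using a hinge on a safety margin (the margin value and hinge form are assumptions, not taken from the disclosure):

```python
import torch

def second_distance_loss(traj_points, object_positions, safety_margin=2.0):
    """Penalize trajectory points that come too close to sample objects.

    traj_points:      (T, 2) track points of the sample travel track.
    object_positions: (M, 2) positions of the sample objects.
    A hinge on (margin - distance) pushes the track away from obstacles;
    the 2.0 m margin is an illustrative assumption.
    """
    d = torch.cdist(traj_points, object_positions)  # (T, M) pairwise distances
    return torch.relu(safety_margin - d).mean()
```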
For example, a measure of collision time may be introduced by constructing a collision time loss, where the time to collision represents the time remaining before a collision would occur if the vehicle continued traveling at its current speed.
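Under a constant-velocity assumption, time to collision can be estimated per object and penalized when it falls below a threshold; the threshold and formulation below are illustrative assumptions:

```python
import torch

def time_to_collision_loss(rel_pos, rel_vel, ttc_threshold=3.0):
    """Hinge loss on a constant-velocity time-to-collision estimate.

    rel_pos: (M, 2) object position minus ego position, per object.
    rel_vel: (M, 2) object velocity minus ego velocity, per object.
    Only approaching objects (positive closing speed) contribute;
    the 3-second threshold is an illustrative assumption.
    """
    dist = rel_pos.norm(dim=-1)
    # Closing speed: rate at which the ego-object distance shrinks.
    closing = -(rel_pos * rel_vel).sum(dim=-1) / dist.clamp(min=1e-6)
    ttc = dist / closing.clamp(min=1e-6)
    # Non-approaching objects get infinite TTC and thus zero penalty.
    ttc = torch.where(closing > 0, ttc, torch.full_like(ttc, float("inf")))
    return torch.relu(ttc_threshold - ttc).mean()
```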
For example, for comfort, the acceleration loss can be constructed by taking into account information such as the longitudinal and lateral accelerations of the output trajectory, the acceleration change rate, and the angular velocity and angular acceleration of the heading angle. By limiting the ranges of these accelerations and penalizing their absolute values as constraints, the driving comfort of the track can be enhanced, the running stability of the sample vehicle improved, and conditions such as jolting, sharp turns, and rapid acceleration or deceleration reduced.
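A minimal comfort term can be obtained by finite differences over the trajectory; the sketch below penalizes acceleration and jerk magnitudes, assuming uniformly spaced waypoints at a fixed time step, and omits the lateral/longitudinal decomposition and the angular terms for brevity:

```python
import torch

def acceleration_loss(traj_points, dt=0.1):
    """Penalize acceleration and jerk magnitudes along a trajectory.

    traj_points: (T, 2) waypoints at a fixed time step dt (0.1 s assumed).
    Finite differences give velocity, acceleration, and jerk (acceleration
    change rate); penalizing their magnitudes smooths out jolts and
    abrupt speed changes.
    """
    vel = (traj_points[1:] - traj_points[:-1]) / dt
    acc = (vel[1:] - vel[:-1]) / dt
    jerk = (acc[1:] - acc[:-1]) / dt
    return acc.norm(dim=-1).mean() + jerk.norm(dim=-1).mean()
```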
For example, travel distances along a navigational route may also be considered, with short, efficient paths being prioritized to reduce travel time and consumption.
According to embodiments of the present disclosure, the various factors described above may be considered as different cost components. In some embodiments, one or more cost components may also be weighted as in equation (4) to obtain a total cost function, and a deep learning model may be trained based on the total cost function.
Cost_all = ∑_i w_i · Cost_i        Formula (4)
Wherein, Cost_all may represent the total cost function, Cost_i may represent the i-th cost component, and w_i may represent the weight configured for the i-th cost component. By constructing such a cost function, various factors such as driving comfort, safety distance, and path efficiency can be considered comprehensively, so that the generated track meets the requirements of practical applications.
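Rendered directly in code, formula (4) is a weighted sum over the configured cost components (the component names and weight values below are illustrative):

```python
# Formula (4): Cost_all = sum_i w_i * Cost_i, with illustrative components.
costs = {"second_distance": 0.8, "time_to_collision": 0.3, "acceleration": 0.1}
weights = {"second_distance": 1.0, "time_to_collision": 0.5, "acceleration": 0.2}
cost_all = sum(weights[k] * costs[k] for k in costs)
print(cost_all)  # 0.97
```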
With the above-described embodiments of the present disclosure, by converting sample environment information into a vectorized format, a deep learning model can be more effectively used. The method can construct a complete decision standard for the driving track determination method based on the deep learning model, and can be beneficial to realizing application in various scenes.
According to embodiments of the present disclosure, trajectories generated based on open-loop imitation learning and short-term loss constraints may fall into certain locally optimal solutions. For example, in pursuit of safety and comfort, vehicles tend to travel very slowly, and consequently fail to reach the predetermined destination on time. In order to enable the deep learning model to estimate the effect of driving for a period of time under a given planning strategy, such as whether the destination is reached efficiently and on time, a training paradigm following Monte Carlo tree search (MCTS) may be introduced.
According to an embodiment of the present disclosure, an MCTS setup may include: a representation function for encoding raw observed information, a dynamics function that models the response to the ego vehicle's actions, and a prediction module for assessing the long-term benefit of the current state. A closed-loop multi-stage trajectory planning framework may be used as the dynamics function. The portion comprising the encoder, the MLPs, and the neural-network short-term trajectory optimizer may be used as the representation function. A value network for predicting long-term benefits may be introduced as the prediction module.
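Structurally, the three components could be wired together as in the sketch below; every interface here is an assumption made for illustration rather than a detail taken from the disclosure:

```python
import torch.nn as nn

class MCTSPlanner(nn.Module):
    """Structural sketch of the three MCTS components described above."""
    def __init__(self, representation, dynamics, value_net):
        super().__init__()
        self.representation = representation  # encoder + MLPs + trajectory optimizer
        self.dynamics = dynamics              # closed-loop multi-stage framework
        self.prediction = value_net           # value network: long-term benefit

    def expand(self, raw_observation, action_trajectory):
        state = self.representation(raw_observation)           # encode raw observations
        next_state = self.dynamics(state, action_trajectory)   # respond to ego action
        long_term_value = self.prediction(next_state)          # assess long-term benefit
        return next_state, long_term_value
```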
According to an embodiment of the present disclosure, the deep learning model may further include a third neural network. The sample evaluation values may include a sample first evaluation value and a sample second evaluation value. The sample first evaluation value may characterize a travel situation evaluation result of traveling a first sample distance based on the sample candidate travel track. The sample second evaluation value may characterize a travel situation evaluation result of traveling a second sample distance based on the sample candidate travel track. The second sample distance is greater than the first sample distance. The above-described operation S440 may include: inputting the sample environment encoding vector, the sample object weight, and the at least one sample candidate running track into the third neural network of the deep learning model to obtain the sample second evaluation value corresponding to the sample candidate running track; and performing iterative training on the deep learning model according to at least one of the sample first evaluation value and the sample second evaluation value, the surrounding environment information, the sample object weight, and the optimized running track.
According to an embodiment of the present disclosure, the sample first evaluation value may have the same or similar technical features as the aforementioned first evaluation value. The first sample distance may have the same or similar technical characteristics as the aforementioned first distance. The sample second evaluation value may have the same or similar technical features as the aforementioned second evaluation value. The second sample distance may have the same or similar technical characteristics as the aforementioned second distance. And will not be described in detail herein.
According to embodiments of the present disclosure, the third neural network may employ a value network for predicting long-term benefits, and may not be limited thereto. The long-term benefit may characterize the sample second evaluation value of the entire driving process after the sample vehicle follows, for a period of time, the strategy generated by the current open-loop planner, neural planner, and value network.
According to embodiments of the present disclosure, combining the MCTS process described above with an autonomous driving task, the following model training process may be proposed.
According to an embodiment of the present disclosure, performing iterative training on the deep learning model according to the optimized running track, the surrounding environment information, the sample object weight, and at least one of the sample first evaluation value and the sample second evaluation value may include: determining, according to the i-th frame surrounding environment information, the i-th frame sample object weight, and the i-th frame optimized running track of the sample vehicle, the surrounding environment information at the end position of the i-th frame optimized running track as the (i+1)-th frame surrounding environment information; inputting the (i+1)-th frame surrounding environment information into the first neural network to obtain the (i+1)-th frame sample environment encoding vector; inputting the (i+1)-th frame sample environment encoding vector, the (i+1)-th frame sample running track of the sample vehicle, and the (i+1)-th frame sample object weight configured for the (i+1)-th frame sample objects around the sample vehicle into the second neural network to obtain the (i+1)-th frame sample candidate running track and the (i+1)-th frame sample first evaluation value corresponding to the (i+1)-th frame sample candidate running track; and inputting the (i+1)-th frame sample environment encoding vector and the (i+1)-th frame sample candidate running track into the third neural network to obtain the (i+1)-th frame sample second evaluation value corresponding to the (i+1)-th frame sample candidate running track.
In general, training is iterative, with the goal of gradually improving the capability of the planner. Each iteration begins with the collection of surrounding environment information, followed by the training of the open-loop planner, the neural planner, and the value network. The newly trained model is then used to further collect surrounding environment information in the next iteration, ensuring continued improvement of the planner's performance. In this process, the deep learning model can update the track through a neural network or an optimization algorithm, and the optimized track is applied to the running behavior of the sample vehicle to generate the next frame of environment information.
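The per-frame rollout that generates this next-frame environment information can be summarized as the following sketch; all module and function names are placeholders for the networks described above, and the simulator step stands in for applying the optimized track to the sample vehicle:

```python
def closed_loop_rollout(env_info, open_loop_planner, neural_planner, value_net,
                        simulator, sample_weights, select_fn, num_frames):
    """Illustrative closed-loop data collection for one training iteration."""
    frames = []
    for i in range(num_frames):
        env_vec, current_traj = open_loop_planner(env_info)  # frame-i encoding + track
        weights = sample_weights()                           # frame-i sample object weights
        candidates, short_term = neural_planner(env_vec, current_traj, weights)
        long_term = value_net(env_vec, weights, candidates)  # frame-i long-term benefits
        optimized = select_fn(candidates, short_term, long_term)
        frames.append((env_vec, weights, candidates, short_term, long_term, optimized))
        # Surroundings at the optimized track's end position become frame i+1.
        env_info = simulator.step(optimized)
    return frames
```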
Fig. 5 schematically illustrates a schematic diagram of a training process of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 5, the deep learning model 500 may include an open loop planner 520, a neural planner 530, and a value network 540. The open loop planner 520 may include an encoder 5201, an open loop MLP 5202.
Referring to fig. 5, at the beginning of each iteration, training data may be collected for training. Specifically, the training data may include: the current scene information 510 of each node, such as the surrounding environment information 511 of the vehicle, the target position information 512, the real-time changing object weights 513, and the first current travel track 514, and may not be limited thereto; and the evaluation value 550 after the entire drive is completed, such as the short-term benefit 532, the long-term benefit 541, and the total benefit 551, and may not be limited thereto. The surrounding environment information 511 may be input into the encoder 5201 and processed by the open-loop planner 520 and the neural planner 530 to generate multi-object-aware candidate travel trajectories, which are supervised by the short-term losses. The evaluation value 550 is used to supervise the value network.
In each iteration, the deep learning model may make decisions using MCTS, which may include, for example: in the environment encoding and initial track generation stage, the current scene information 510 may first be acquired. The surrounding environment information 511 in the current scene information 510 may then be encoded by the encoder 5201 in the open-loop planner 520 to obtain the environment encoding vector 521. The second current travel trajectory 522 may be obtained by performing trajectory prediction on the environment encoding vector 521 and the target position information 512 with the open-loop MLP 5202 in the open-loop planner 520. In the track optimization stage, the computed second current travel track 522 (or the predetermined first current travel track 514) and the object weights 513 are taken as inputs, and track optimization is performed by the neural planner 530 to generate the multi-object-aware candidate travel track 531. In this process, the neural planner 530 may also output the short-term benefit 532 of the candidate travel track 531. In the value network analysis stage, each multi-object-aware candidate travel track 531 may be evaluated by the value network 540 to obtain the long-term benefit 541 after the candidate travel track 531 interacts with the environment over an extended horizon. During training, the next trajectory to execute may be selected, using as probability weights any one of the short-term benefit 532, the long-term benefit 541, or the total benefit 551 determined from the short-term benefit 532 and the long-term benefit 541, over the candidate travel trajectories extending from the current travel track.
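One plausible reading of selecting the next trajectory "as a probability weight" is to sample candidates with probabilities given by a softmax over the chosen benefit; the softmax and temperature are assumptions:

```python
import torch

def select_trajectory(candidates, benefits, temperature=1.0):
    """Sample the next executed trajectory with benefit-derived probabilities.

    candidates: (K, T, 2) candidate travel trajectories.
    benefits:   (K,) short-term, long-term, or total benefit per candidate.
    """
    probs = torch.softmax(benefits / temperature, dim=0)
    idx = torch.multinomial(probs, num_samples=1).item()
    return candidates[idx]
```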
According to embodiments of the present disclosure, the overall structure of the value network 540 may be similar to that of the neural planner 530. The inputs to the value network 540 may be the environment encoding vector 521, the real-time changing object weights 513, and the next trajectory to execute output by the open-loop planner 520; the output is an evaluation value for that executed trajectory.
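Given that description, a value network mirroring the neural planner's 3-layer MLP, but with a scalar output, might look as follows (all sizes illustrative):

```python
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Hypothetical value network 540: same overall shape as the neural
    planner, but with a single scalar output (the long-term evaluation)."""
    def __init__(self, env_dim=256, traj_dim=40, num_objects=32, hidden=512):
        super().__init__()
        in_dim = env_dim + traj_dim + num_objects
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, env_vec, object_weights, executed_traj):
        x = torch.cat([env_vec.flatten(1), object_weights.flatten(1),
                       executed_traj.flatten(1)], dim=-1)
        return self.mlp(x).squeeze(-1)   # long-term benefit estimate
```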
According to an embodiment of the present disclosure, performing iterative training on the deep learning model according to the optimized running track, the surrounding environment information, the sample object weight, and at least one of the sample first evaluation value and the sample second evaluation value may further include: determining a gain loss according to at least one of the sample first evaluation value and the sample second evaluation value; and training the deep learning model according to the gain loss.
According to embodiments of the present disclosure, a closed loop simulator may be used to supervise the expected assessment of the entire driving process during the training phase.
The closed-loop feedback mechanism reflects the closed-loop testing of an autonomous vehicle in a real application scenario and is therefore closer to practical application. Environmental feedback after each output track can be simulated, ensuring a training and testing method that balances short-term and long-term benefits. Through MCTS, environments and contexts different from the training data may be generated. In practical applications, when the vehicle encounters an environment that deviates from the training data set, it can still adapt to unseen scenes, making the model more robust and more valuable. Since the model can collect feedback information after each simulated interaction and adjust the ego vehicle's track accordingly, this iterative process ensures that the ego vehicle can self-adjust in response to the expected behaviors of other objects. In this embodiment, the other objects may include only agent objects, or may include agent objects and other static objects. In the case where the other objects include other static objects, the expected behaviors of the other static objects may include changes in the position, shape, etc. of a static object due to natural disasters or the like, which is not limited herein.
With the above embodiments of the present disclosure, a complete decision framework can be formed by context encoding, multi-object interaction, and real-time updating of trajectories. The system has high self-adaptability and responsiveness, and can ensure that the vehicle can safely and efficiently run in a complex traffic environment.
Based on the above method, the interaction between the vehicle and the target objects can be actively considered when planning the current travel track during closed-loop training and testing; compared with purely memorizing the expert trajectories in the training data, the method moves beyond the imitation-learning training paradigm. In experimental verification, excellent performance can be exhibited, especially in complex scenarios requiring interaction with a large number of objects. The method of the present disclosure is more robust, can adapt to a variety of unseen traffic environments, and provides strong support for practical applications.
Fig. 6 schematically shows a block diagram of a travel track determining device according to an embodiment of the present disclosure.
As shown in fig. 6, the travel track determining apparatus 600 includes an object weight acquisition module 610, a track adjustment module 620, and a target travel track determining module 630.
The object weight obtaining module 610 is configured to obtain at least one set of object weights configured for a target object around the vehicle, where the object weights characterize a degree of influence of the target object on a driving process of the vehicle.
The track adjustment module 620 is configured to adjust a current running track of the vehicle according to an environmental encoding vector and at least one set of object weights, so as to obtain at least one candidate running track of the vehicle, where the candidate running track has a target evaluation value, and the environmental encoding vector is obtained by encoding surrounding environmental information of the vehicle.
The target driving track determining module 630 is configured to determine, as the target driving track of the vehicle, a target candidate driving track corresponding to the target evaluation value satisfying the preset condition according to at least one target evaluation value of the at least one candidate driving track.
According to an embodiment of the present disclosure, the target evaluation value includes at least one of: a first evaluation value and a second evaluation value. The first evaluation value characterizes a travel situation evaluation result of traveling a first distance based on the candidate travel track, and the second evaluation value characterizes a travel situation evaluation result of traveling a second distance based on the candidate travel track, the second distance being greater than the first distance.
According to an embodiment of the present disclosure, the target travel track determination module includes a target first evaluation value determination sub-module and a first target candidate travel track determination sub-module.
And the target first evaluation value determination submodule is used for determining a target first evaluation value with the highest value according to at least one first evaluation value determined for at least one candidate running track.
And the first target candidate running track determining submodule is used for determining the candidate running track corresponding to the target first evaluation value as the target candidate running track.
According to an embodiment of the present disclosure, the target travel track determination module includes a target second evaluation value determination sub-module and a second target candidate travel track determination sub-module.
And the target second evaluation value determination submodule is used for determining a target second evaluation value with the highest value according to at least one second evaluation value determined for at least one candidate running track.
And the second target candidate running track determining submodule is used for determining the candidate running track corresponding to the target second evaluation value as the target candidate running track.
According to an embodiment of the present disclosure, the target travel track determination module includes a total evaluation value determination sub-module, a target total evaluation value determination sub-module, and a third target candidate travel track determination sub-module.
And the total evaluation value determination submodule is used for determining the total evaluation value determined for the candidate running track according to the first evaluation value and the second evaluation value determined for the same candidate running track.
And the target total evaluation value determining sub-module is used for determining the target total evaluation value with the highest value according to the at least one total evaluation value determined for the at least one candidate driving track.
And a third target candidate travel track determination submodule for determining a candidate travel track corresponding to the target total evaluation value as a target candidate travel track.
According to an embodiment of the present disclosure, the driving trajectory determination device further includes a current driving trajectory determination module.
And the current running track determining module is used for determining the current running track according to the environment coding vector and the target position information determined for the vehicle.
According to an embodiment of the present disclosure, the number of target objects is a target number. The object weight acquisition module comprises a random weight determination submodule, a weight vector determination submodule and an object weight determination submodule.
The random weight determining sub-module is used for determining a target number of random weights according to a preset weight range.
The weight vector determination submodule is used for determining a group of weight vectors according to the random weights of the target number.
An object weight determination sub-module for determining an object weight from at least one set of weight vectors.
According to an embodiment of the present disclosure, the driving trajectory determination device further includes a target object determination module.
And the target object determining module is used for determining an object with a distance smaller than or equal to a preset distance threshold value from the vehicle as a target object related to the vehicle at the current moment according to the current position information of the vehicle.
According to an embodiment of the present disclosure, the target object includes a target agent, and the surrounding environment information includes target agent parameter information of the target agent. The driving track determining device also comprises a parameter adjusting module and a surrounding environment information updating module.
And the parameter adjustment module is used for adjusting the target agent parameter information according to the current running track or the target running track in combination with Monte Carlo tree search, so as to obtain updated target agent parameter information.
And the surrounding environment information updating module is used for determining updated surrounding environment information according to the updated target agent parameter information.
Fig. 7 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 7, the training apparatus 700 of the deep learning model includes a first neural network module 710, a second neural network module 720, an optimized driving trajectory determination module 730, and a first training module 740.
The first neural network module 710 is configured to input environmental information around a sample of the sample vehicle into a first neural network of the deep learning model, and obtain a sample environmental code vector of the sample vehicle.
The second neural network module 720 is configured to input the sample environment encoding vector, the sample travel track of the sample vehicle, and at least one set of sample object weights configured for sample objects around the sample vehicle into the second neural network of the deep learning model, to obtain at least one sample candidate travel track of the sample vehicle, where the sample candidate travel track has a sample evaluation value.
The optimized driving track determining module 730 is configured to determine, as an optimized driving track of the vehicle, a target sample candidate driving track corresponding to a sample evaluation value satisfying a preset condition according to at least one sample evaluation value of the at least one sample candidate driving track.
The first training module 740 is configured to train the deep learning model according to the sample environment encoding vector, the sample driving track, the sample object weight, the at least one sample candidate driving track, and the optimized driving track, and obtain a trained deep learning model.
According to an embodiment of the present disclosure, the deep learning model further includes a third neural network, and the sample evaluation value includes a sample first evaluation value representing a travel situation evaluation result of traveling a first sample distance based on the sample candidate travel track and a sample second evaluation value representing a travel situation evaluation result of traveling a second sample distance based on the sample candidate travel track, the second sample distance being greater than the first sample distance. The first training module includes a third neural network sub-module and an iterative training sub-module.
And the third neural network sub-module is used for inputting the sample environment coding vector, the sample object weight and at least one sample candidate running track into a third neural network of the deep learning model to obtain a sample second evaluation value corresponding to the sample candidate running track.
And the iterative training sub-module is used for carrying out iterative training on the deep learning model according to at least one of the first sample evaluation value and the second sample evaluation value, the surrounding environment information, the sample object weight and the optimized running track.
According to an embodiment of the present disclosure, the iterative training submodule includes an i+1st frame surrounding environment information determination unit, a first neural network unit, a second neural network unit, and a third neural network unit.
An i+1th frame surrounding environment information determination unit configured to determine surrounding environment information at an end position of the i frame optimized traveling locus as i+1th frame surrounding environment information, based on the i frame surrounding environment information of the sample vehicle, the i frame sample object weight, and the i frame optimized traveling locus.
The first neural network unit is used for inputting the environmental information around the (i+1) th frame into the first neural network to obtain the sample environmental coding vector of the (i+1) th frame.
The second neural network unit is used for inputting the (i+1) th frame sample environment coding vector, the (i+1) th frame sample running track of the sample vehicle and the (i+1) th frame sample object weight configured for the (i+1) th frame sample object around the sample vehicle into the second neural network to obtain the (i+1) th frame sample candidate running track and the (i+1) th frame sample first evaluation value corresponding to the (i+1) th frame sample candidate running track.
And the third neural network unit is used for inputting the (i+1) th frame sample environment coding vector and the (i+1) th frame sample candidate running track into the third neural network to obtain a (i+1) th frame sample second evaluation value corresponding to the (i+1) th frame sample candidate running track.
According to an embodiment of the present disclosure, the iterative training submodule includes a gain loss determination unit and a training unit.
And the gain loss determining unit is used for determining gain loss according to at least one of the first evaluation value of the sample and the second evaluation value of the sample.
And the training unit is used for training the deep learning model according to the gain loss.
According to an embodiment of the present disclosure, the training apparatus of the deep learning model further includes a first distance loss determination module and a second training module.
And the first distance loss determining module is used for determining the first distance loss according to the distance between the target area and the end position of the sample running track, and the target area represents the area related to the end position.
And the second training module is used for training the deep learning model according to the first distance loss.
According to an embodiment of the present disclosure, the training apparatus of the deep learning model further includes a second distance loss determination module, a collision time loss determination module, an acceleration loss determination module, and a third training module.
And the second distance loss determining module is used for determining the second distance loss according to the distance between the track point of the sample running track and the sample object.
And the collision time loss determining module is used for determining the collision time loss according to the sample environment coding vector and the sample running track.
And the acceleration loss determining module is used for determining the acceleration loss according to the acceleration information represented by the sample running track.
And the third training module is used for training the deep learning model according to at least one of the second distance loss, the collision time loss and the acceleration loss.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the travel track determination method and the training method of the deep learning model of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform at least one of the travel track determination method and the training method of the deep learning model of the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program stored on at least one of a readable storage medium and an electronic device, which when executed by a processor, implements at least one of a travel track determination method and a training method of a deep learning model of the present disclosure.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to an input/output (I/O) interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 801 performs the respective methods and processes described above, such as at least one of a travel track determination method and a training method of a deep learning model. For example, in some embodiments, at least one of the travel track determination method and the training method of the deep learning model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of at least one of the travel locus determination method and the training method of the deep learning model described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform at least one of a travel track determination method and a training method of the deep learning model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (33)

1. A travel track determination method, comprising:
acquiring at least one group of object weights configured for a target object around a vehicle, the object weights representing the degree of influence of the target object on the running process of the vehicle;
according to the environment coding vector and the at least one group of object weights, the current running track of the vehicle is adjusted to obtain at least one candidate running track of the vehicle, the candidate running track has a target evaluation value, and the environment coding vector is obtained by coding surrounding environment information of the vehicle; and
And determining a target candidate running track corresponding to the target evaluation value meeting the preset condition as the target running track of the vehicle according to at least one target evaluation value of the at least one candidate running track.
2. The method of claim 1, wherein the target evaluation value comprises at least one of: a first evaluation value, a second evaluation value; the first evaluation value represents a running condition evaluation result based on a first distance traveled by the candidate running track, and the second evaluation value represents a running condition evaluation result based on a second distance traveled by the candidate running track, the second distance being greater than the first distance.
3. The method of claim 2, wherein the determining, according to the at least one target evaluation value of the at least one candidate travel track, a target candidate travel track corresponding to a target evaluation value satisfying a preset condition includes:
determining a target first evaluation value with the highest value according to at least one first evaluation value determined for the at least one candidate driving track; and
and determining a candidate running track corresponding to the target first evaluation value as the target candidate running track.
4. The method of claim 2, wherein the determining, according to the at least one target evaluation value of the at least one candidate travel track, a target candidate travel track corresponding to a target evaluation value satisfying a preset condition includes:
determining a target second evaluation value with the highest value according to at least one second evaluation value determined for the at least one candidate driving track; and
and determining a candidate running track corresponding to the target second evaluation value as the target candidate running track.
5. The method of claim 2, wherein the determining, according to the at least one target evaluation value of the at least one candidate travel track, a target candidate travel track corresponding to a target evaluation value satisfying a preset condition includes:
determining a total evaluation value determined for the candidate travel track according to a first evaluation value and a second evaluation value determined for the same candidate travel track;
determining a target total evaluation value with the highest value according to at least one total evaluation value determined for the at least one candidate driving track; and
and determining a candidate running track corresponding to the target total evaluation value as the target candidate running track.
6. The method of any of claims 1-5, further comprising: before the current travel track of the vehicle is adjusted according to the context-encoding vector and the at least one set of object weights,
and determining the current running track according to the environment coding vector and the target position information determined for the vehicle.
7. The method of any of claims 1-6, wherein the number of target objects is a target number; the acquiring at least one set of object weights configured for a target object around a vehicle includes:
determining a target number of random weights according to a preset weight range;
determining a set of weight vectors according to the target number of random weights; and
the object weights are determined from at least one set of the weight vectors.
8. The method of any of claims 1-7, further comprising: prior to the acquiring of the at least one set of object weights configured for the target object around the vehicle,
and determining an object with a distance smaller than or equal to a preset distance threshold value from the vehicle as the target object related to the vehicle at the current moment according to the current position information of the vehicle.
9. The method of any of claims 1-8, wherein the target object comprises a target agent and the ambient environment information comprises target agent parameter information of the target agent; the method further comprises the steps of:
according to the current running track or the target running track, the Monte Carlo tree search is combined, and the target agent parameter information is adjusted to obtain updated target agent parameter information; and
and determining updated surrounding environment information according to the updated target agent parameter information.
10. A training method of a deep learning model, comprising:
inputting sample surrounding environment information of a sample vehicle into a first neural network of a deep learning model to obtain a sample environment coding vector of the sample vehicle;
inputting the sample environment coding vector, the sample running track of the sample vehicle and at least one group of sample object weights configured for sample objects around the sample vehicle into a second neural network of the deep learning model to obtain at least one sample candidate running track of the sample vehicle, wherein the sample candidate running track has a sample evaluation value;
Determining a target sample candidate running track corresponding to the sample evaluation value meeting a preset condition as an optimized running track of the vehicle according to at least one sample evaluation value of the at least one sample candidate running track; and
and training the deep learning model according to the sample environment coding vector, the sample running track, the sample object weight, the at least one sample candidate running track and the optimized running track to obtain a trained deep learning model.
11. The method of claim 10, wherein the deep learning model further comprises a third neural network, the sample evaluation values comprising a sample first evaluation value and a sample second evaluation value, the sample first evaluation value characterizing a travel situation evaluation result based on the sample candidate travel track traveling a first sample distance, the sample second evaluation value characterizing a travel situation evaluation result based on the sample candidate travel track traveling a second sample distance, the second sample distance being greater than the first sample distance;
the training the deep learning model according to the sample environment coding vector, the sample travel track, the sample object weight, the at least one sample candidate travel track and the optimized travel track, and the obtaining the trained deep learning model includes:
Inputting the sample environment coding vector, the sample object weight and the at least one sample candidate running track into a third neural network of the deep learning model to obtain the sample second evaluation value corresponding to the sample candidate running track; and
and performing iterative training on the deep learning model according to at least one of the first sample evaluation value and the second sample evaluation value, the surrounding environment information, the sample object weight and the optimized running track.
12. The method of claim 11, wherein the iteratively training the deep learning model based on at least one of the sample first evaluation value and the sample second evaluation value, the ambient information, the sample object weights, and the optimized travel trajectory comprises:
according to the i-frame surrounding environment information, the i-frame sample object weight and the i-frame optimized running track of the sample vehicle, surrounding environment information at the end position of the i-frame optimized running track is determined and used as i+1-frame surrounding environment information;
inputting the environmental information around the (i+1) th frame into the first neural network to obtain an (i+1) th frame sample environmental coding vector;
Inputting the i+1st frame sample environment coding vector, the i+1st frame sample running track of the sample vehicle and the i+1st frame sample object weight configured for the i+1st frame sample object around the sample vehicle into the second neural network to obtain an i+1st frame sample candidate running track and an i+1st frame sample first evaluation value corresponding to the i+1st frame sample candidate running track; and
and inputting the i+1st frame sample environment coding vector and the i+1st frame sample candidate running track into the third neural network to obtain an i+1st frame sample second evaluation value corresponding to the i+1st frame sample candidate running track.
13. The method of claim 11 or 12, wherein the iteratively training the deep learning model based on at least one of the sample first evaluation value and the sample second evaluation value, the surrounding information, the sample object weight, and the optimized driving trajectory comprises:
determining a loss of revenue according to at least one of the first sample evaluation value and the second sample evaluation value; and
and training the deep learning model according to the gain loss.
14. The method of any of claims 10-13, further comprising:
determining a first distance loss according to a distance between a target area and an end position of the sample travel track, wherein the target area represents an area related to the end position; and
and training the deep learning model according to the first distance loss.
15. The method of any of claims 10-14, further comprising:
determining a second distance loss according to the distance between the track point of the sample running track and the sample object;
determining collision time loss according to the sample environment coding vector and the sample running track;
determining acceleration loss according to the acceleration information represented by the sample running track; and
training the deep learning model according to at least one of the second distance loss, the collision time loss, and the acceleration loss.
16. A travel track determining device comprising:
an object weight acquisition module for acquiring at least one set of object weights configured for a target object around a vehicle, the object weights characterizing a degree of influence of the target object on a running process of the vehicle;
The track adjustment module is used for adjusting the current running track of the vehicle according to the environment coding vector and the at least one group of object weights to obtain at least one candidate running track of the vehicle, wherein the candidate running track has a target evaluation value, and the environment coding vector is obtained by coding surrounding environment information of the vehicle; and
and the target running track determining module is used for determining a target candidate running track corresponding to the target evaluation value meeting the preset condition according to at least one target evaluation value of the at least one candidate running track as the target running track of the vehicle.
17. The apparatus of claim 16, wherein the target evaluation value comprises at least one of: a first evaluation value, a second evaluation value; the first evaluation value represents a running condition evaluation result based on a first distance traveled by the candidate running track, and the second evaluation value represents a running condition evaluation result based on a second distance traveled by the candidate running track, the second distance being greater than the first distance.
18. The apparatus of claim 17, wherein the target travel trajectory determination module comprises:
A target first evaluation value determination submodule, configured to determine a target first evaluation value with a highest numerical value according to at least one first evaluation value determined for the at least one candidate travel track; and
and the first target candidate driving track determining submodule is used for determining the candidate driving track corresponding to the target first evaluation value as the target candidate driving track.
19. The apparatus of claim 17, wherein the target travel trajectory determination module comprises:
a target second evaluation value determination submodule, configured to determine a target second evaluation value with a highest numerical value according to at least one second evaluation value determined for the at least one candidate travel track; and
and the second target candidate driving track determining submodule is used for determining the candidate driving track corresponding to the second target evaluation value as the target candidate driving track.
20. The apparatus of claim 17, wherein the target travel trajectory determination module comprises:
a total evaluation value determination sub-module for determining a total evaluation value determined for the candidate travel track according to a first evaluation value and a second evaluation value determined for the same candidate travel track;
A target total evaluation value determination submodule for determining a target total evaluation value with the highest value according to at least one total evaluation value determined for the at least one candidate driving track; and
and a third target candidate travel track determination submodule, configured to determine a candidate travel track corresponding to the target total evaluation value as the target candidate travel track.
21. The apparatus of any of claims 16-20, further comprising:
and the current running track determining module is used for determining the current running track according to the environment coding vector and the target position information determined for the vehicle.
22. The apparatus of any of claims 16-21, wherein the number of target objects is a target number; the object weight acquisition module includes:
the random weight determining submodule is used for determining a target number of random weights according to a preset weight range;
the weight vector determining submodule is used for determining a group of weight vectors according to the random weights of the target number; and
an object weight determination sub-module for determining the object weight from at least one set of the weight vectors.
23. The apparatus of any of claims 16-22, further comprising:
And the target object determining module is used for determining an object with a distance smaller than or equal to a preset distance threshold value from the vehicle as the target object related to the vehicle at the current moment according to the current position information of the vehicle.
24. The apparatus of any of claims 16-23, wherein the target object comprises a target agent and the ambient environment information comprises target agent parameter information of the target agent; the apparatus further comprises:
the parameter adjustment module is used for adjusting the target agent parameter information according to the current running track or the target running track by combining Monte Carlo tree search to obtain updated target agent parameter information; and
and the surrounding environment information updating module is used for determining updated surrounding environment information according to the updated target agent parameter information.
25. A training device for a deep learning model, comprising:
the first neural network module is used for inputting the sample surrounding environment information of the sample vehicle into the first neural network of the deep learning model to obtain a sample environment coding vector of the sample vehicle;
a second neural network module, configured to input the sample environment encoding vector, a sample running track of the sample vehicle, and at least one set of sample object weights configured for sample objects around the sample vehicle into a second neural network of the deep learning model, to obtain at least one sample candidate running track of the sample vehicle, where the sample candidate running track has a sample evaluation value;
An optimized running track determining module, configured to determine, according to at least one sample evaluation value of the at least one sample candidate running track, a target sample candidate running track corresponding to a sample evaluation value that meets a preset condition, as an optimized running track of the vehicle; and
and the first training module is used for training the deep learning model according to the sample environment coding vector, the sample running track, the sample object weight, the at least one sample candidate running track and the optimized running track to obtain a trained deep learning model.
26. The apparatus of claim 25, wherein the deep learning model further comprises a third neural network, and the sample evaluation value comprises a sample first evaluation value and a sample second evaluation value, the sample first evaluation value characterizing a running situation evaluation result of traveling a first sample distance based on the sample candidate running track, and the sample second evaluation value characterizing a running situation evaluation result of traveling a second sample distance based on the sample candidate running track, the second sample distance being greater than the first sample distance;
the first training module comprises:
a third neural network submodule, configured to input the sample environment coding vector, the sample object weights, and the at least one sample candidate running track into the third neural network of the deep learning model to obtain the sample second evaluation value corresponding to the sample candidate running track; and
an iterative training submodule, configured to iteratively train the deep learning model according to at least one of the sample first evaluation value and the sample second evaluation value, the sample surrounding environment information, the sample object weights, and the optimized running track.
27. The apparatus of claim 26, wherein the iterative training submodule comprises:
an (i+1)-th frame surrounding environment information determination unit, configured to determine, according to i-th frame surrounding environment information of the sample vehicle, i-th frame sample object weights, and an i-th frame optimized running track, the surrounding environment information at the destination position of the i-th frame optimized running track as (i+1)-th frame surrounding environment information;
a first neural network unit, configured to input the (i+1)-th frame surrounding environment information into the first neural network to obtain an (i+1)-th frame sample environment coding vector;
a second neural network unit, configured to input the (i+1)-th frame sample environment coding vector, an (i+1)-th frame sample running track of the sample vehicle, and (i+1)-th frame sample object weights configured for (i+1)-th frame sample objects around the sample vehicle into the second neural network to obtain an (i+1)-th frame sample candidate running track and an (i+1)-th frame sample first evaluation value corresponding to the (i+1)-th frame sample candidate running track; and
a third neural network unit, configured to input the (i+1)-th frame sample environment coding vector and the (i+1)-th frame sample candidate running track into the third neural network to obtain an (i+1)-th frame sample second evaluation value corresponding to the (i+1)-th frame sample candidate running track.
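Claim 27 chains the three networks frame by frame: the optimized running track of frame i advances the scene, and the surroundings at its destination position become frame i+1's input. A sketch under the same naming assumptions as above, with the scene step supplied by the caller:

    def iterative_rollout(model, advance_scene, env_info, sample_tracks, object_weights, num_frames):
        records = []
        for i in range(num_frames):
            env_code = model.first_net(env_info)                      # frame-i coding vector
            candidates, eval_first = model.second_net(
                env_code, sample_tracks[i], object_weights[i])        # short-horizon (first) evaluation
            eval_second = model.third_net(env_code, candidates)       # long-horizon (second) evaluation
            optimized = candidates[eval_second.argmax()]
            records.append((eval_first, eval_second, optimized))
            # Surrounding information at the destination of frame i's optimized
            # track becomes frame i+1's input; advance_scene is a caller-supplied
            # (hypothetical) scene-step helper, not named in the claims.
            env_info = advance_scene(env_info, object_weights[i], optimized)
        return records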
28. The apparatus of claim 26 or 27, wherein the iterative training submodule comprises:
a gain loss determination unit, configured to determine a gain loss according to at least one of the sample first evaluation value and the sample second evaluation value; and
a training unit, configured to train the deep learning model according to the gain loss.
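Claim 28 leaves the form of the gain loss open. One plausible reading, offered purely as an assumption, rewards high predicted evaluation values by minimizing their negated weighted blend:

    import torch

    def gain_loss(eval_first, eval_second, alpha=0.5):
        # Higher evaluation values mean better running situations, so minimizing
        # the negated blend pushes the model toward higher-gain tracks.
        # Both the 0.5 blend weight and the formula itself are assumptions.
        return -(alpha * eval_first + (1.0 - alpha) * eval_second).mean()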
29. The apparatus of any of claims 25-28, further comprising:
a first distance loss determination module, configured to determine a first distance loss according to the distance between a target area and the end position of the sample running track, the target area representing an area related to the end position; and
a second training module, configured to train the deep learning model according to the first distance loss.
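A minimal sketch of claim 29's first distance loss, reducing the target area to a representative center point since the claim leaves the area's geometry open:

    import torch

    def first_distance_loss(sample_track, target_area_center):
        # sample_track: (T, 2) tensor of track points; the loss penalizes the
        # gap between the track's end position and the target area.
        end_position = sample_track[-1]
        return torch.linalg.norm(end_position - target_area_center)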
30. The apparatus of any of claims 25-29, further comprising:
a second distance loss determination module, configured to determine a second distance loss according to the distance between the track points of the sample running track and the sample objects;
a collision time loss determination module, configured to determine a collision time loss according to the sample environment coding vector and the sample running track;
an acceleration loss determination module, configured to determine an acceleration loss according to acceleration information characterized by the sample running track; and
a third training module, configured to train the deep learning model according to at least one of the second distance loss, the collision time loss, and the acceleration loss.
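Claim 30 names three losses without formulas. The sketch below uses common stand-ins, a clearance hinge for the second distance loss and finite-difference magnitudes for the acceleration loss, and omits the collision time term, which would need the motion content of the environment coding vector:

    import torch

    def safety_and_comfort_losses(sample_track, object_positions, dt, min_clearance=2.0):
        # sample_track: (T, 2) track points; object_positions: (N, 2); dt: step seconds.
        # Second distance loss: hinge penalty when any track point comes closer
        # to a sample object than the clearance margin (margin value assumed).
        dists = torch.cdist(sample_track, object_positions)          # (T, N)
        second_distance_loss = torch.relu(min_clearance - dists).mean()
        # Acceleration loss: finite-difference acceleration magnitude along the track.
        velocity = (sample_track[1:] - sample_track[:-1]) / dt
        acceleration = (velocity[1:] - velocity[:-1]) / dt
        acceleration_loss = acceleration.norm(dim=-1).mean()
        return second_distance_loss, acceleration_loss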
31. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-15.
32. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-15.
33. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, the computer program, when executed by a processor, implementing the method of any one of claims 1-15.
CN202311675223.3A 2023-12-07 2023-12-07 Driving track determination method, model training method, driving track determination device, model training device, electronic equipment and medium Pending CN117636306A (en)

Priority Applications (1)

Application Number: CN202311675223.3A (published as CN117636306A)
Priority Date: 2023-12-07
Filing Date: 2023-12-07
Title: Driving track determination method, model training method, driving track determination device, model training device, electronic equipment and medium

Publications (1)

Publication Number: CN117636306A
Publication Date: 2024-03-01

Family ID: 90016157

Country Status (1): CN, CN117636306A (en)

Similar Documents

Publication Publication Date Title
US11360477B2 (en) Trajectory generation using temporal logic and tree search
US20210124353A1 (en) Combined prediction and path planning for autonomous objects using neural networks
AU2019253703B2 (en) Improving the safety of reinforcement learning models
CN112099496B (en) Automatic driving training method, device, equipment and medium
US11625036B2 (en) User interface for presenting decisions
JP2022516383A (en) Autonomous vehicle planning
WO2019199880A1 (en) User interface for presenting decisions
EP3035314B1 (en) A traffic data fusion system and the related method for providing a traffic state for a network of roads
WO2019199876A1 (en) Dynamically controlling sensor behavior
US11474529B2 (en) System and method for motion planning of an autonomous driving machine
CN112085165A (en) Decision information generation method, device, equipment and storage medium
JP2022058566A (en) Autonomous traveling vehicle dynamic model evaluation package
CN114386599A (en) Method and device for training trajectory prediction model and trajectory planning
CN111738046B (en) Method and apparatus for calibrating a physics engine of a virtual world simulator for learning of a deep learning-based apparatus
CN117636306A (en) Driving track determination method, model training method, driving track determination device, model training device, electronic equipment and medium
CN117036966B (en) Learning method, device, equipment and storage medium for point feature in map
US20240199079A1 (en) Predicting the further development of a scenario with aggregation of latent representations
CN117302268A (en) Driving strategy determination method, model training method and automatic driving vehicle
CN116206438A (en) Method for training a system for predicting future development of a traffic scene and corresponding system
CN118579105A (en) Automatic driving movement planning method, equipment and medium under multi-blind area intersection
CN118551806A (en) Automatic driving model based on state node prediction, automatic driving method and device
CN112859849A (en) Crossing motion planning method and device of automatic driving equipment and electronic equipment
CN114954520A (en) Method and device for controlling unmanned vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination