CN118551806A

CN118551806A - Automatic driving model based on state node prediction, automatic driving method and device

Info

Publication number: CN118551806A
Application number: CN202410693771.7A
Authority: CN
Inventors: 王凡; 黄际洲
Original assignee: Apollo Intelligent Technology Beijing Co Ltd
Current assignee: Apollo Intelligent Technology Beijing Co Ltd
Priority date: 2024-05-30
Filing date: 2024-05-30
Publication date: 2024-08-27

Abstract

The disclosure provides an automatic driving model based on state node prediction, an automatic driving method and an automatic driving device, and particularly relates to the technical fields of automatic driving and artificial intelligence. The implementation scheme is that the automatic driving model comprises: an encoding layer configured to encode input information including at least perception information and a vehicle state, wherein the vehicle state includes a position coordinate and a velocity vector of the vehicle, to obtain encoded input information; a prediction layer configured to process the encoded input information to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine the predicted state of the vehicle to generate a control signal of the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between the probability distribution of the candidate state of the vehicle and the predicted probability distribution.

Description

Automatic driving model based on state node prediction, automatic driving method and device

Technical Field

The present disclosure relates to the field of computer technology, in particular to the fields of automatic driving and artificial intelligence, and more particularly to an automatic driving model based on state node prediction, an automatic driving method and apparatus, a training method and apparatus for the automatic driving model, an electronic device, a computer-readable storage medium, and a computer program product.

Background

Artificial intelligence is the discipline of making computers simulate certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.

In the automatic driving process, environmental information around the vehicle is analyzed through an automatic driving model to obtain driving strategy information for controlling the vehicle to continue to run.

The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.

Disclosure of Invention

The present disclosure provides an autopilot model based on state node prediction, an autopilot method and apparatus, a training method and apparatus for the autopilot model, an electronic device, a computer-readable storage medium, and a computer program product.

According to an aspect of the present disclosure, there is provided an automatic driving model including: an encoding layer configured to encode input information including at least perceptual information and a vehicle state, wherein the vehicle state includes a position coordinate and a velocity vector of a vehicle, to obtain encoded input information; a prediction layer configured to process the encoded input information to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine a predicted state of a vehicle to generate a control signal of the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between a probability distribution of a candidate state of the vehicle and the predicted probability distribution.

According to another aspect of the present disclosure, there is provided an automatic driving method implemented using an automatic driving model according to an embodiment of the present disclosure, including: encoding input information comprising at least perceptual information and a vehicle state, wherein the vehicle state comprises a position coordinate and a velocity vector of the vehicle, to obtain encoded input information; processing the encoded input information to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine a predicted state of a vehicle to generate a control signal for the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between a probability distribution of candidate states of the vehicle and the predicted probability distribution.

According to another aspect of the present disclosure, there is provided a method of training an autopilot model according to an embodiment of the present disclosure, comprising: determining a sample data set for training the autopilot model, wherein sample data in the sample data set comprises a plurality of sample inputs, each sample input comprising perception information and a vehicle state for a corresponding sample time instant, wherein the vehicle state comprises a position coordinate and a velocity vector of a vehicle, the sample data further comprising an annotated real vehicle state for each sample input, wherein the real vehicle state corresponds to a future time instant after a predetermined time period from the sample time instant; encoding a first sample input with an encoding layer to obtain an encoded first sample input; processing the encoded first sample input with a prediction layer to obtain a predicted probability distribution, over a predefined state range, of a vehicle state at a future time instant after the predetermined time period from the sample time instant corresponding to the first sample input; determining, from the sample data set, a probability distribution of a first real vehicle state for the first sample input; and training the autopilot model by adjusting parameters of the autopilot model, wherein the parameters are adjusted to reduce a difference between the predicted probability distribution and the probability distribution of the first real vehicle state.

According to another aspect of the present disclosure, there is provided an automatic driving apparatus implemented by an automatic driving model according to an embodiment of the present disclosure, including: an encoding unit configured to encode input information including at least perception information and a vehicle state, wherein the vehicle state includes a position coordinate and a velocity vector of a vehicle, to obtain encoded input information; a prediction unit configured to process the encoded input information to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine a predicted state of the vehicle to generate a control signal of the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between a probability distribution of a candidate state of the vehicle and the predicted probability distribution.

According to another aspect of the present disclosure, there is provided an apparatus for training an autopilot model according to an embodiment of the present disclosure, comprising: a sample data determination unit configured to determine a sample data set for training the autopilot model, wherein sample data in the sample data set comprises a plurality of sample inputs, each sample input comprising perception information and a vehicle state for a corresponding sample time instant, wherein the vehicle state comprises a position coordinate and a velocity vector of a vehicle, the sample data further comprising an annotated real vehicle state for each sample input, wherein the real vehicle state corresponds to a future time instant after a predetermined time period from the sample time instant; an encoding unit configured to encode a first sample input with an encoding layer to obtain an encoded first sample input; a prediction unit configured to process the encoded first sample input with a prediction layer to obtain a predicted probability distribution, over a predefined state range, of a vehicle state at a future time instant after the predetermined time period from the sample time instant corresponding to the first sample input; a real target determination unit configured to determine, from the sample data set, a probability distribution of a first real vehicle state for the first sample input; and a training unit configured to train the autopilot model by adjusting parameters of the autopilot model, wherein the parameters are adjusted to reduce a difference between the predicted probability distribution and the probability distribution of the first real vehicle state.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the above-described method.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the above method.

According to another aspect of the present disclosure, there is provided an autonomous vehicle including: one of an autopilot apparatus and an electronic device according to embodiments of the present disclosure.

According to one or more embodiments of the present disclosure, the autopilot capability of an autopilot model may be improved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.

FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, in accordance with an embodiment of the present disclosure;

FIG. 2 illustrates an exemplary block diagram of an autopilot model in accordance with an embodiment of the present disclosure;

FIG. 3 illustrates an exemplary flow chart of an autopilot method in accordance with an embodiment of the present disclosure;

FIG. 4 illustrates an exemplary flowchart of a method of training the autopilot model described in connection with FIG. 2, in accordance with an embodiment of the present disclosure;

FIG. 5 illustrates an exemplary block diagram of an autopilot apparatus in accordance with an embodiment of the present disclosure;

FIG. 6 illustrates an exemplary block diagram of an apparatus for training the autopilot model described in connection with FIG. 2, in accordance with an embodiment of the present disclosure;

FIG. 7 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another element. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.

The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes a motor vehicle 110, a server 120, and one or more communication networks 130 coupling the motor vehicle 110 to the server 120.

In an embodiment of the present disclosure, motor vehicle 110 may include a computing device in accordance with an embodiment of the present disclosure and/or be configured to perform a method in accordance with an embodiment of the present disclosure.

The server 120 may run one or more services or software applications that enable the method of autopilot. In some embodiments, server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user of motor vehicle 110 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.

The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.

The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.

In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from motor vehicle 110. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of motor vehicle 110.

Network 130 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, the one or more networks 130 may be a satellite communications network, a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (including, for example, Bluetooth and Wi-Fi), and/or any combination of these with other networks.

The system 100 may also include one or more databases 150. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 150 may be used to store information such as audio files and video files. Databases 150 may reside in various locations. For example, a database used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. Databases 150 may be of different types. In some embodiments, the database used by server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.

In some embodiments, one or more of databases 150 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.

Motor vehicle 110 may include sensors 111 for sensing the surrounding environment. The sensors 111 may include one or more of the following: visual cameras, infrared cameras, ultrasonic sensors, millimeter wave radar, and lidar (LiDAR). Different sensors may provide different detection accuracy and range. Cameras may be mounted at the front, at the rear, or at other locations on the vehicle. The visual cameras can capture conditions inside and outside the vehicle in real time and present them to the driver and/or passengers. In addition, by analyzing the images captured by the visual cameras, information such as traffic light indications, intersection conditions, and the running states of other vehicles can be acquired. The infrared cameras can capture objects under night-vision conditions. The ultrasonic sensors can be arranged around the vehicle and measure the distance between the vehicle and objects outside it, exploiting characteristics such as the strong directivity of ultrasonic waves. The millimeter wave radar may be installed at the front, at the rear, or at other locations on the vehicle and measures the distance between the vehicle and objects outside it using the characteristics of electromagnetic waves. The lidar may be mounted at the front, at the rear, or at other locations on the vehicle to detect object edges and shape information for object identification and tracking. Owing to the Doppler effect, the radar apparatus may also measure changes in the speed of the vehicle and of moving objects.

Motor vehicle 110 may also include a communication device 112. The communication device 112 may include a satellite positioning module capable of receiving satellite positioning signals (e.g., BeiDou, GPS, GLONASS, and Galileo) from satellites 141 and generating coordinates based on these signals. The communication device 112 may also include a module for communicating with the mobile communication base station 142, and the mobile communication network may implement any suitable communication technology, such as current or evolving wireless communication technologies like GSM/GPRS, CDMA, LTE, and 5G. The communication device 112 may also have a Vehicle-to-Everything (V2X) module configured to enable, for example, Vehicle-to-Vehicle (V2V) communication with other vehicles 143 and Vehicle-to-Infrastructure (V2I) communication with infrastructure 144. In addition, the communication device 112 may also have a module configured to communicate with a user terminal 145 (including but not limited to a smartphone, a tablet computer, or a wearable device such as a watch), for example, by using a wireless local area network of the IEEE 802.11 standard or Bluetooth. With the communication device 112, the motor vehicle 110 can also access the server 120 via the network 130.

Motor vehicle 110 may also include a control device 113. The control device 113 may include a processor, such as a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), or another special purpose processor, in communication with various types of computer-readable storage devices or media. The control device 113 may include an autopilot system for automatically controlling various actuators in the vehicle. The autopilot system is configured to control the powertrain, steering system, braking system, etc. (not shown) of motor vehicle 110 via a plurality of actuators in response to inputs from the sensors 111 or other input devices, so as to control acceleration, steering, and braking, respectively, with no or only limited human intervention. Part of the processing functions of the control device 113 may be implemented by cloud computing. For example, some of the processing may be performed using an onboard processor while other processing may be performed using cloud computing resources. The control device 113 may be configured to perform a method according to the present disclosure. Furthermore, the control device 113 may be implemented as one example of a computing device on the motor vehicle side (client) according to the present disclosure.

The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.

In the related art, the main automatic driving decision control modes include: (1) predicting steering wheel and throttle actions; (2) predicting a sequence of location points.

The main problems of these two control modes include: (1) Directly predicting steering wheel and throttle actions makes it difficult to control the vehicle position accurately, so precise control is hard to achieve when, for example, the vehicle is required to stop before a given stopping position. In addition, there is a physical delay in the generation of steering wheel and throttle signals. (2) Vehicle behavior decided by predicting a sequence of location points may not conform to vehicle dynamics. Furthermore, predicting location points alone cannot handle the multi-modal nature of automatic driving. For example, when there are in fact multiple possible driving routes, directly fitting the location points of the multiple routes may cause the model to find an intermediate route that is close to all possible routes rather than a prediction close to any one ideal route, and may even produce an unrealistic route, for example one that passes directly through an obstacle located between two routes.

To solve the above-described problems, the present disclosure provides a new automatic driving model.

Fig. 2 illustrates an exemplary block diagram of an autopilot model in accordance with an embodiment of the present disclosure.

As shown in fig. 2, the autopilot model 200 includes an encoding layer 210 and a prediction layer 220.

The encoding layer 210 is configured to encode input information including at least perceptual information and a vehicle state, where the vehicle state includes a position coordinate and a velocity vector of the vehicle, to obtain the encoded input information.

The prediction layer 220 is configured to process the encoded input information to obtain a predicted probability distribution of a vehicle state over a predefined state range at a future time after a predetermined time from a current time, wherein the predicted probability distribution is used to determine a predicted state of the vehicle to generate a control signal of the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between the probability distribution of candidate states of the vehicle and the predicted probability distribution.

With the above-described autonomous driving model provided by embodiments of the present disclosure, the probability distribution of the vehicle state is predicted over a predefined state range. When a multi-modal situation occurs, the probability predicted for infeasible position points will therefore be close to zero, which prevents the fitting result from approaching a route that is in fact infeasible. Further, by predicting the probability distribution over a state range that includes both vehicle position and speed, more precise control may be provided for the automated driving process.

The principles of the present disclosure will be described in detail below.

The encoding layer 210 may be configured to encode input information including at least perception information and a vehicle state, where the vehicle state includes position coordinates and a velocity vector of the vehicle, to obtain encoded input information. The perception information and the vehicle state referred to herein may include the perception information and the vehicle state at the current time, and may also include historical perception information and historical vehicle states for a predetermined period of time prior to the current time.

The perception information may include sensor inputs collected by at least one sensor mounted on the autonomous vehicle. The perception information for the vehicle surroundings may include at least one of: perception information of one or more cameras, perception information of one or more lidars, and perception information of one or more millimeter wave radars. The current perception information may include sensor inputs collected at the current time t.

In some embodiments, the perception information may be encoded with the encoding layer 210 to obtain an implicit representation corresponding to the perception information. The encoding layer may be implemented as a Transformer-based neural network layer or a CNN-based neural network layer. The present disclosure does not limit the specific implementation of the encoding layer. In some examples, the implicit representation referred to herein may be an implicit representation of the perception information in a bird's-eye-view (BEV) space. For example, the current perception information may be mapped to the BEV space, using a model such as BEVFormer or BEVFusion, to obtain a continuous BEV representation of the current perception information. The continuous BEV representation may then be discretized according to a pre-trained vocabulary to obtain a discrete spatial representation of the perception information.
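The discretization step above can be illustrated with a small sketch. The disclosure does not say how the pre-trained vocabulary is applied; the example below assumes, purely for illustration, that the vocabulary is a list of code vectors and that each continuous BEV feature is replaced by the index of its nearest code vector. The names `quantize`, `vocabulary`, and `bev_features` are hypothetical.

```python
def quantize(features, vocabulary):
    """Map each continuous feature vector to the index of its nearest vocabulary entry.

    Assumption: nearest-neighbour lookup under squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [
        min(range(len(vocabulary)), key=lambda k: sq_dist(f, vocabulary[k]))
        for f in features
    ]


# Toy 2-D "vocabulary" and three continuous BEV cell features (made-up values).
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
bev_features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]

tokens = quantize(bev_features, vocabulary)
print(tokens)  # one discrete token index per BEV cell
```

Each BEV cell is thereby reduced to a single integer token, which is one way to obtain a discrete spatial representation.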

While the principles of the present disclosure have been described above in terms of the representation of perceptual information in a BEV space, it will be appreciated that it is also possible to map the perceptual information under other spaces than the BEV space and determine discrete spatial representations of the corresponding perceptual information without departing from the principles of the present disclosure. Furthermore, any other suitable way other than a vocabulary may be utilized to determine the discrete spatial representation of the perceptual information.

The prediction layer 220 may be configured to process the encoded input information to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine a predicted state of the vehicle to generate a control signal of the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between the probability distribution of a candidate state of the vehicle and the predicted probability distribution. The predicted probability distribution may be continuous or discrete.

The prediction layer 220 may be implemented as a Transformer-based neural network layer or as a CNN-based neural network layer. The present disclosure does not limit the specific implementation of the prediction layer.

As previously described, the vehicle state includes both the position coordinates and the velocity vector of the vehicle. Accordingly, the predefined state range referred to in the embodiments of the present disclosure includes a predetermined position range and a predetermined speed range centered on the currently processed vehicle state, that is, a variation range of the vehicle state. Each vehicle state within the state range may be defined by a position point within the predetermined position range and a velocity point within the predetermined speed range. The predefined state range may comprise a plurality of discretized state points, each of which has a position parameter and a velocity parameter within the respective range. The predefined state range may be represented by a discretized grid, where each grid cell corresponds to a state point and the center of each grid cell corresponds to a respective position point and velocity point. In an example, each position point and velocity point may also include multiple components. For example, the predefined state range may include N grid cells, where N = N_x × N_y × N_vx × N_vy. Here N_x denotes the number of position points in the x-direction, N_y denotes the number of position points in the y-direction orthogonal to the x-direction, N_vx denotes the number of velocity points in the x-direction, and N_vy denotes the number of velocity points in the y-direction. The combinations of these position points and velocity points constitute the N state points within the predefined state range.
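The construction of the N state points can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the ranges, the bin counts, and the names `bin_centers` and `build_state_grid` are assumptions, and the grid cells are taken to be equally sized.

```python
from itertools import product


def bin_centers(center, half_width, n):
    """Centers of n equal bins covering [center - half_width, center + half_width]."""
    step = 2.0 * half_width / n
    return [center - half_width + (i + 0.5) * step for i in range(n)]


def build_state_grid(current_state, pos_half_range, vel_half_range,
                     n_x, n_y, n_vx, n_vy):
    """Enumerate all (x, y, vx, vy) state points around the current vehicle state."""
    x, y, vx, vy = current_state
    xs = bin_centers(x, pos_half_range, n_x)
    ys = bin_centers(y, pos_half_range, n_y)
    vxs = bin_centers(vx, vel_half_range, n_vx)
    vys = bin_centers(vy, vel_half_range, n_vy)
    # Every combination of a position point and a velocity point is one state point.
    return list(product(xs, ys, vxs, vys))


# Hypothetical current state: at the origin, moving at 5 m/s along x.
grid = build_state_grid((0.0, 0.0, 5.0, 0.0),
                        pos_half_range=10.0, vel_half_range=4.0,
                        n_x=5, n_y=5, n_vx=3, n_vy=3)
print(len(grid))  # N = N_x * N_y * N_vx * N_vy
```

With odd bin counts, the middle grid cell is centered on the current vehicle state, which matches the description of a range centered on the currently processed state.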

The prediction layer 220 may process the encoded input information, performing perception and inference on it through a trained neural network, and derive a predicted probability distribution over the predefined state range. In the case where the predefined state range includes N state points, the predicted probability distribution output by the prediction layer 220 includes a predicted probability for each state point within the state range, the predicted probability indicating the likelihood that the vehicle state at the future time has the position parameter and the velocity parameter of the respective state point.
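One common way to turn N raw network outputs into a probability per state point is a softmax; the disclosure does not name the normalization it uses, so the sketch below is an assumption, and the logits are made-up values standing in for the output of the prediction layer.

```python
import math


def softmax(logits):
    """Numerically stable softmax: one probability per state point, summing to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for N = 4 state points.
logits = [2.0, 0.5, -1.0, 2.0]
probs = softmax(logits)
print(probs)
```

State points with equal scores receive equal probability, so two equally plausible future states can both be represented in the output.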

It can be seen that in embodiments provided by the present disclosure, the autopilot model does not directly predict the position, speed or driving action of the vehicle at a future time, but rather indicates possible driving decision information by predicting the probability distribution of the vehicle state at the future time. If an obstacle exists in the middle of the lane, the vehicle can choose to bypass from the left side or the right side, and the weights of the two routes are similar. In this case, the result output by the prediction layer will indicate that the state points corresponding to the position points on the routes on the left and right sides of the obstacle have similar probabilities, and the probability of the corresponding state point of the position point on the middle of the two routes (i.e. the route passing through the obstacle) will be close to 0, so that the result will not be marked in the final output result, and thus the problem that the route result deviating from the possible route (i.e. passing through the obstacle) is obtained after fitting two different routes with similar probabilities in the related art can be overcome.
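The obstacle example can be made concrete with a small numerical sketch. The lateral offsets and probabilities below are invented for illustration only; they show why a single fitted value lands on the obstacle while the distribution keeps the two passable routes apart.

```python
# Hypothetical lateral offsets (meters) of five position points across the lane;
# the obstacle sits at offset 0.0.
lateral_offsets = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Illustrative predicted probabilities: two passable routes, blocked center.
probabilities = [0.05, 0.44, 0.02, 0.44, 0.05]

# Fitting a single value amounts to taking the expectation, which lands
# (up to rounding) on the obstacle at offset 0.0.
expected_offset = sum(o * p for o, p in zip(lateral_offsets, probabilities))

# Reading the distribution keeps the two modes separate.
best = max(probabilities)
modes = [o for o, p in zip(lateral_offsets, probabilities) if p == best]

print(expected_offset)
print(modes)  # [-1.0, 1.0]
```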

After obtaining the predicted probability distribution of the state points at the future time by using the prediction layer 220, the state points corresponding to the future time may be obtained by inference.

In some embodiments, the predicted state of the vehicle at the future time may be determined by minimizing the difference between the probability distribution of the candidate state of the vehicle and the predicted probability distribution. In an example, an initial candidate state may be determined. The initial candidate state may be randomly generated or may be a predetermined fixed initial state. A candidate state probability distribution for the initial candidate state within a predefined state range of the predicted state may be determined, and then a difference between the candidate state probability distribution and the predicted probability distribution is determined. Wherein the probability distribution of the initial candidate state over the predefined state range may be determined based on a probability distribution function determined by the predefined standard deviation.

In an example, a KL (Kullback-Leibler) distance may be used to represent the difference between the candidate state probability distribution and the predicted probability distribution, since the KL distance measures the difference between two probability distributions over the same event space. The initial candidate state may then be optimized by an optimization algorithm to minimize this difference. In an example, a gradient descent algorithm, a Gauss-Newton algorithm, or the like may be employed as the optimization algorithm. It is understood that metrics other than the KL distance may be used to describe the difference between the candidate state probability distribution and the predicted probability distribution without departing from the principles of the present disclosure.

In some examples, the objective of the optimization algorithm may be represented by equation (1):

wherein the symbols of equation (1) denote, respectively, the predictive probability distribution, the candidate state, and the probability distribution of the candidate state over the state range, and i, j, k, m denote the index of a state point within the state range, corresponding to the position parameters in the x and y directions and the velocity parameters in the x and y directions, respectively.
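Equation (1) is not reproduced in this text. One form consistent with the quantities listed above is sketched here under stated assumptions: the symbol \hat{p} for the predictive probability distribution, the symbol s for the candidate state, and the direction of the KL distance (from the candidate distribution to the predictive distribution) are all choices made for this sketch, while p(i, j, k, m | s) follows the notation used for formulas (2) to (4).

```latex
s^{*} = \arg\min_{s} \sum_{i,j,k,m} p(i,j,k,m \mid s)\,
        \log \frac{p(i,j,k,m \mid s)}{\hat{p}(i,j,k,m)}
```

With this direction, candidate probability mass placed on state points whose predicted probability is close to zero is penalized heavily, which matches the stated aim of keeping the result away from infeasible state points.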

The final predicted state can be obtained by adjusting the parameters of the candidate states to minimize the objective of the optimization algorithm.
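The optimization described above can be sketched as follows. This is an illustration under assumptions rather than the disclosed implementation: the Gaussian candidate distribution with a fixed standard deviation `sigma`, the direction of the KL distance, the finite-difference gradient, and the two-dimensional (position, velocity) grid used to keep the example small are all choices made here.

```python
import math

def state_distribution(state, points, sigma):
    """Normalized Gaussian weights of `state` over the discrete state points."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(pt, state)) / (2.0 * sigma ** 2))
        for pt in points
    ]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) over a shared discrete support."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))

def fit_candidate(predicted, points, init, sigma=1.0, lr=0.2, steps=100, h=1e-4):
    """Gradient descent on the candidate state using central-difference gradients."""
    state = list(init)
    for _ in range(steps):
        grad = []
        for d in range(len(state)):
            hi = state[:d] + [state[d] + h] + state[d + 1:]
            lo = state[:d] + [state[d] - h] + state[d + 1:]
            f_hi = kl_divergence(state_distribution(hi, points, sigma), predicted)
            f_lo = kl_divergence(state_distribution(lo, points, sigma), predicted)
            grad.append((f_hi - f_lo) / (2.0 * h))
        state = [s - lr * g for s, g in zip(state, grad)]
    return state

# Hypothetical 17 x 17 grid over (position, velocity), spacing 0.5.
axis = [-4.0 + 0.5 * n for n in range(17)]
points = [(x, v) for x in axis for v in axis]
# Stand-in for the output of the prediction layer: a single mode at (1.0, 0.5).
predicted = state_distribution((1.0, 0.5), points, 1.0)
result = fit_candidate(predicted, points, init=(0.0, 0.0))
print(result)
```

In this sketch the search starts from a fixed initial candidate; with a prediction that has two modes, the mode reached would tend to depend on that starting point.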

In some embodiments, the predicted state may include a position and a speed of the vehicle at a future time. The vehicle control signal can be reversely inferred according to the predicted position and the predicted speed of the vehicle corresponding to the predicted state and the current position and the current speed of the vehicle at the current moment by using a reverse dynamics model. The control signal may be used to control the vehicle from a current vehicle state to a predicted state at a future time. In an example, the control signals may be a steering wheel directional control signal and a throttle signal.
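A minimal sketch of the reverse inference is given below, assuming a point-mass vehicle under constant acceleration. This model and the function name are assumptions of the sketch; mapping the resulting acceleration to steering wheel and throttle signals depends on a vehicle model that is not described here.

```python
def infer_acceleration(current, predicted, horizon):
    """Constant acceleration that carries the current velocity to the predicted one.

    `current` and `predicted` are (x, y, vx, vy); `horizon` is the predetermined
    time in seconds. Also returns the position residual, i.e. how far the
    constant-acceleration rollout ends from the predicted position.
    """
    x0, y0, vx0, vy0 = current
    x1, y1, vx1, vy1 = predicted
    ax = (vx1 - vx0) / horizon
    ay = (vy1 - vy0) / horizon
    end_x = x0 + vx0 * horizon + 0.5 * ax * horizon ** 2
    end_y = y0 + vy0 * horizon + 0.5 * ay * horizon ** 2
    return (ax, ay), (x1 - end_x, y1 - end_y)

# Hypothetical states, chosen so that the rollout matches the prediction.
control, residual = infer_acceleration(
    current=(0.0, 0.0, 10.0, 0.0),
    predicted=(10.5, 0.25, 11.0, 0.5),
    horizon=1.0,
)
print(control, residual)  # (1.0, 0.5) (0.0, 0.0)
```

A nonzero residual would indicate that the predicted position and the predicted speed cannot both be reached under this simplified model, which is one source of the reasoning error that a reverse dynamics step can introduce.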

In some embodiments, the predicted state may include the position and speed of the vehicle at the future time expressed, by means of a forward dynamics model, in terms of the control signals applied to the vehicle from the current time to the future time. In this case, the position and speed of the vehicle indicated in the predicted state at the future time are represented as a function of the control signal. Thus, after the predicted state is determined by the optimization algorithm, the corresponding control signal may be read directly from this functional representation. With this approach, the reasoning process based on reverse dynamics can be omitted, thereby reducing the reasoning errors of the model.
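The forward-dynamics variant can be sketched by making the control signal the optimization variable. The point-mass model, the exhaustive search over a small candidate set, and the squared-distance objective standing in for the KL-based objective are all assumptions of this sketch.

```python
def forward_state(current, control, horizon):
    """Point-mass forward dynamics: state reached under constant acceleration."""
    x, y, vx, vy = current
    ax, ay = control
    return (
        x + vx * horizon + 0.5 * ax * horizon ** 2,
        y + vy * horizon + 0.5 * ay * horizon ** 2,
        vx + ax * horizon,
        vy + ay * horizon,
    )

def best_control(current, horizon, candidate_controls, objective):
    """Pick the control whose resulting state minimizes `objective(state)`."""
    return min(
        candidate_controls,
        key=lambda u: objective(forward_state(current, u, horizon)),
    )

# Hypothetical target state standing in for the mode of the predicted distribution.
target = (10.5, 0.25, 11.0, 0.5)

def objective(state):
    # Placeholder objective: squared distance to the target state.
    return sum((a - b) ** 2 for a, b in zip(state, target))

candidates = [(ax, ay) for ax in (-1.0, 0.0, 1.0, 2.0) for ay in (-0.5, 0.0, 0.5)]
chosen = best_control((0.0, 0.0, 10.0, 0.0), 1.0, candidates, objective)
print(chosen)  # (1.0, 0.5)
```

Because the state is computed from the control, the optimized result already contains the control signal and no separate reverse inference is needed.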

Fig. 3 illustrates an exemplary flow chart of an autopilot method in accordance with an embodiment of the present disclosure. The autopilot method shown in fig. 3 may be implemented using the autopilot model described in connection with fig. 2. The advantages of the autopilot model described above in connection with fig. 2 are equally applicable to the autopilot method 300 and are not described in detail herein.

In step S302, input information including at least perception information and a vehicle state, where the vehicle state includes a position coordinate and a speed vector of the vehicle, may be encoded to obtain encoded input information.

In step S304, the encoded input information may be processed to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine the predicted state of the vehicle to generate a control signal for the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between the probability distribution of the candidate state of the vehicle and the predicted probability distribution.

With the above-described automatic driving method provided by the embodiments of the present disclosure, the probability distribution of the vehicle state is predicted over a predefined state range, so that when a multimodal situation occurs, the probabilities of infeasible position points are predicted to be close to zero, which prevents the fitted result from approaching a route that is not actually feasible. Further, by predicting the probability distribution over a range of states including vehicle position and speed, more precise control may be provided for the automated driving process.

In some embodiments, the encoded input information is an implicit representation of the input information in the bird's eye view space.

In some embodiments, the predefined state range includes a predetermined position range and a predetermined speed range centered on the vehicle state at the current time.

In some embodiments, the predefined state range includes a plurality of state points that are discretized, each state point in the plurality of state points having a position parameter and a velocity parameter within the respective state range.

In some embodiments, the predictive probability distribution includes a predictive probability output for each of a plurality of state points, wherein the predictive probability indicates a likelihood that the predicted state at the future time is the corresponding position parameter and speed parameter for the vehicle state.

In some embodiments, the predicted state includes a vehicle position and speed of the vehicle at a future time, and the control signal of the vehicle is generated by: determining a predicted state by minimizing a difference between a probability distribution of a candidate state of the vehicle and the predicted probability distribution; based on the reverse dynamics model, a control signal for the vehicle from the current time instant to the future time instant is determined from the predicted position and the predicted speed of the vehicle at the future time instant and the current position and the current speed at the current time instant.

In some embodiments, the predicted state includes a vehicle position and speed of the vehicle at a future time instant based on a forward dynamics model represented by a control signal for the vehicle from the current time instant to the future time instant, the control signal of the vehicle being generated by: determining a predicted state by minimizing a difference between a probability distribution of a candidate state of the vehicle and the predicted probability distribution; a control signal for the vehicle from the current time to a future time is determined based on the predicted state.

An exemplary flowchart of a method of training the autopilot model described in connection with fig. 2 is shown in fig. 4, in accordance with an embodiment of the present disclosure.

In step S402, a sample data set for training an autopilot model may be determined, wherein sample data in the sample data set comprises a plurality of sample inputs, each sample input comprising perception information and a vehicle state for a corresponding sample time, wherein the vehicle state comprises a position coordinate and a velocity vector of the vehicle, and the sample data further comprises an annotated real vehicle state for each sample input, wherein the real vehicle state corresponds to a future time after a predetermined length of time from the sample time.

In step S404, the first sample input data may be encoded with an encoding layer to obtain an encoded first sample input. Wherein the encoded first sample input may be an implicit representation of the first sample input in the bird's eye view space. Wherein the coding layer may be a Transformer or CNN based neural network layer, as described in connection with fig. 2.

In step S406, the encoded first sample input may be processed with a prediction layer to obtain a predictive probability distribution over a predefined state range for a vehicle state at a future time after a predetermined time period from the sample time corresponding to the first sample input. The prediction layer may be a Transformer or CNN based neural network layer, as described in connection with fig. 2.

The predefined state range may include a predetermined position range and a predetermined speed range centered on the vehicle state at the sample time. In some embodiments, the predefined state range may include a plurality of discretized state points, each state point having a position parameter and a velocity parameter within the respective range. The predictive probability distribution includes a predictive probability output for each of the plurality of state points, wherein the predictive probability indicates a likelihood that the vehicle state at the future time has the position parameter and the speed parameter corresponding to that state point.

In step S408, a probability distribution for a first real vehicle state of a first sample input may be determined from the sample dataset.

A probability distribution of the first real vehicle state over a predefined state range may be determined based on a predefined probability distribution function. For example, a probability distribution of the first real vehicle state over a range of states may be calculated based on a gaussian distribution function. The standard deviation of the gaussian distribution function can be predetermined according to the actual situation.

The probability distribution of the first real vehicle state over the state range may be calculated based on formulas (2) - (4):

$$Z = \sum_{i,j,k,m} \pi(i, j, k, m \mid s_t) \qquad (4)$$

Here i, j, k, m denote the index numbers of a state point within the state range, x_i denotes the position parameter in the x direction of the state point (i, j, k, m), y_j denotes its position parameter in the y direction, v_k denotes its velocity parameter in the x direction, and v_m denotes its velocity parameter in the y direction. x_t, y_t, v_xt and v_yt are the position parameters and the velocity parameters in the x and y directions of the first real vehicle state s_t at the sample time t. p(i, j, k, m|s_t) represents the probability that the first real vehicle state is at the state point (i, j, k, m) within the state range.
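Formulas (2) and (3) are not reproduced in this text. The sketch below shows one way to compute a distribution of this kind, assuming an unnormalized Gaussian weight for each state point that is then divided by the normalizer Z of formula (4). The separate standard deviations `sigma_pos` and `sigma_vel` and the function name are assumptions of this sketch.

```python
import math
from itertools import product

def target_distribution(real_state, xs, ys, vxs, vys, sigma_pos, sigma_vel):
    """Probability of the real state s_t at every state point (i, j, k, m)."""
    x_t, y_t, vx_t, vy_t = real_state
    pi = {}
    for (i, x_i), (j, y_j), (k, v_k), (m, v_m) in product(
        enumerate(xs), enumerate(ys), enumerate(vxs), enumerate(vys)
    ):
        pos_term = ((x_i - x_t) ** 2 + (y_j - y_t) ** 2) / (2.0 * sigma_pos ** 2)
        vel_term = ((v_k - vx_t) ** 2 + (v_m - vy_t) ** 2) / (2.0 * sigma_vel ** 2)
        pi[(i, j, k, m)] = math.exp(-(pos_term + vel_term))  # assumed Gaussian weight
    z = sum(pi.values())  # normalizer Z, as in formula (4)
    return {idx: w / z for idx, w in pi.items()}

# Hypothetical 3 x 3 x 3 x 3 grid and real state, for illustration only.
p = target_distribution(
    real_state=(1.0, 1.0, 5.0, 0.0),
    xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0, 2.0],
    vxs=[4.0, 5.0, 6.0], vys=[-1.0, 0.0, 1.0],
    sigma_pos=0.5, sigma_vel=0.5,
)
peak = max(p, key=p.get)
print(peak)  # (1, 1, 1, 1)
```

A larger standard deviation spreads the annotated state over more neighboring state points, which softens the training target.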

In step S410, the autopilot model may be trained by adjusting parameters of the autopilot model, wherein the parameters are adjusted to reduce the difference in the predictive probability distribution and the probability distribution of the first real vehicle state. The prediction probability distribution output by the automatic driving model can be close to the marked real probability distribution through a supervised learning mode, so that the prediction capability of the automatic driving model is improved.
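The supervised adjustment can be illustrated with a toy example in which the raw scores of three state points stand in for the parameters of the model. The softmax output, the direction of the KL distance, and the learning rate are assumptions of this sketch.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_loss(target, predicted, eps=1e-12):
    """KL(target || predicted) over the state points."""
    return sum(t * math.log((t + eps) / (q + eps)) for t, q in zip(target, predicted))

def train_logits(target, logits, lr=0.5, steps=200):
    """Gradient descent on raw scores; d KL / d logit_n = predicted_n - target_n."""
    for _ in range(steps):
        predicted = softmax(logits)
        logits = [z - lr * (q - t) for z, q, t in zip(logits, predicted, target)]
    return logits

# Hypothetical annotated distribution over three state points.
target = [0.1, 0.6, 0.3]
initial_loss = kl_loss(target, softmax([0.0, 0.0, 0.0]))
trained = softmax(train_logits(target, [0.0, 0.0, 0.0]))
final_loss = kl_loss(target, trained)
print(initial_loss, final_loss)
```

In a full model the same gradient would be propagated back through the prediction layer and the encoding layer rather than applied to the scores directly.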

In some embodiments, the autopilot model may be trained further by means of reinforcement learning. For example, the predictive probability distribution output from the automatic driving model may be evaluated to obtain evaluation information of the predictive probability distribution. The prediction probability distribution can be processed by adopting a trained evaluation model or a manual evaluation mode to obtain a scoring result for the prediction result. Then, parameters of the autonomous driving model may be adjusted according to the evaluation information to reduce a difference in the predicted probability distribution and the probability distribution of the first real vehicle state.

The training target for reinforcement learning can be determined using equation (5):

Formula (5) involves the predictive probability of each state point (i, j, k, m), where i, j, k, m denote the index numbers of the state point within the state range; p(i, j, k, m|s_t) is the probability of the first real vehicle state s_t at the state point (i, j, k, m); and r_t is the evaluation information of the predictive probability distribution. In the training target shown in formula (5), the evaluation information r_t is configured as a weight of the KL distance, so that the higher the evaluation score, the greater the contribution of the corresponding KL distance to the training target.
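Formula (5) is not reproduced in this text. The weighting it describes can be sketched as below, assuming that the target is a sum over samples of the evaluation score multiplied by the KL distance; the direction of the KL distance and the summation over samples are assumptions of this sketch.

```python
import math

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) over the state points."""
    return sum(t * math.log((t + eps) / (q + eps)) for t, q in zip(target, predicted))

def weighted_objective(samples):
    """Sum of r_t * KL over (r_t, target, predicted) samples."""
    return sum(r * kl_divergence(t, q) for r, t, q in samples)

# Hypothetical distributions over two state points.
target = [0.5, 0.5]
predicted = [0.9, 0.1]
base = kl_divergence(target, predicted)

low_score = weighted_objective([(0.5, target, predicted)])
high_score = weighted_objective([(2.0, target, predicted)])
print(base, low_score, high_score)
```

A sample with a higher evaluation score therefore pulls the parameters more strongly toward its annotated distribution.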

Fig. 5 illustrates an exemplary block diagram of an automatic driving apparatus in accordance with an embodiment of the present disclosure. The automatic driving apparatus 500 may be implemented using the autopilot model described in connection with fig. 2.

As shown in fig. 5, the automatic driving apparatus 500 may include a coding unit 510 and a prediction unit 520.

The encoding unit 510 may be configured to encode input information including at least perception information and a vehicle state, where the vehicle state includes a position coordinate and a speed vector of the vehicle, to obtain encoded input information.

The prediction unit 520 may be configured to process the encoded input information to obtain a predicted probability distribution of the vehicle state over a predefined state range at a future time after a predetermined time from the current time, wherein the predicted probability distribution is used to determine the predicted state of the vehicle to generate the control signal of the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between the probability distribution of the candidate state of the vehicle and the predicted probability distribution.

It should be appreciated that the various modules or units of the apparatus 500 shown in fig. 5 may correspond to the various steps in the method 300 described with reference to fig. 3. Thus, the operations, features and advantages described above with respect to method 300 apply equally to apparatus 500 and the modules and units comprised thereof. For brevity, certain operations, features and advantages are not described in detail herein.

Fig. 6 illustrates an exemplary block diagram of an apparatus for training the autopilot model described in connection with fig. 2, in accordance with an embodiment of the present disclosure.

As shown in fig. 6, the training apparatus 600 may include a sample data determining unit 610, an encoding unit 620, a prediction unit 630, a real target determining unit 640, and a training unit 650.

The sample data determination unit 610 may be configured to determine a sample data set for training the automatic driving model, wherein the sample data in the sample data set comprises a plurality of sample inputs, each sample input comprising perception information and a vehicle state for a corresponding sample time, wherein the vehicle state comprises a position coordinate and a velocity vector of the vehicle, and the sample data further comprises an annotated real vehicle state for each sample input, wherein the real vehicle state corresponds to a future time after a predetermined time period from the sample time.

The encoding unit 620 may be configured to encode the first sample input data with an encoding layer to obtain an encoded first sample input.

The prediction unit 630 may be configured to process the encoded first sample input with a prediction layer to obtain a predicted probability distribution over a predefined state range for the vehicle state at a future time instant after a predetermined time period from the sample time instant to which the first sample input corresponds.

The real target determination unit 640 is configured to determine a probability distribution of the first real vehicle state for the first sample input from the sample dataset.

The training unit 650 may be configured to train the autopilot model by adjusting parameters of the autopilot model, wherein the parameters are adjusted to reduce the difference between the predictive probability distribution and the probability distribution of the first real vehicle state.

It should be appreciated that the various modules or units of the apparatus 600 shown in fig. 6 may correspond to the various steps in the method 400 described with reference to fig. 4. Thus, the operations, features and advantages described above with respect to method 400 apply equally to apparatus 600 and the modules and units comprised thereof. For brevity, certain operations, features and advantages are not described in detail herein.

Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of the various units discussed herein may be divided into multiple units and/or at least some of the functions of the multiple units may be combined into a single unit.

It should also be appreciated that various techniques may be described herein in the general context of software, hardware elements, or program modules. The various units described above with respect to figs. 5 and 6 may be implemented in hardware or in hardware in combination with software and/or firmware. For example, the units may be implemented as computer program code/instructions configured to be executed in one or more processors and stored in a computer-readable storage medium. Alternatively, these units may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the units 510-520, 610-650 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip including one or more components of a processor (e.g., a Central Processing Unit (CPU), microcontroller, microprocessor, Digital Signal Processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry, and may optionally execute received program code and/or include embedded firmware to perform functions.

According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform an autopilot method according to an embodiment of the present disclosure.

According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform an automatic driving method according to an embodiment of the present disclosure.

According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements an autopilot method according to embodiments of the present disclosure.

According to another aspect of the present disclosure, there is also provided an autonomous vehicle including an automatic driving apparatus according to an embodiment of the present disclosure and one of the above-described electronic apparatuses. In the technical scheme of the disclosure, the acquisition, storage, application and the like of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good customs.

According to embodiments of the present disclosure, there is also provided an electronic device, a readable storage medium and a computer program product.

Referring to fig. 7, a block diagram of an electronic device 700 that may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM702, and the RAM703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 708 may include, but is not limited to, magnetic disks and optical disks. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through computer networks, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as methods 300, 400. For example, in some embodiments, the methods 300, 400 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. One or more of the steps of the methods 300, 400 described above may be performed when a computer program is loaded into RAM 703 and executed by the computing unit 701. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the methods 300, 400 by any other suitable means (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.

Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims

1. An autopilot model comprising:

an encoding layer configured to encode input information including at least perception information and a vehicle state, wherein the vehicle state includes a position coordinate and a velocity vector of a vehicle, to obtain encoded input information; and

a prediction layer configured to process the encoded input information to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine a predicted state of the vehicle to generate a control signal of the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between a probability distribution of a candidate state of the vehicle and the predicted probability distribution.

2. The autopilot model of claim 1 wherein the encoded input information includes an implicit representation of the perception information in a bird's eye view space.

3. The autopilot model of claim 1 wherein the encoding layer and the prediction layer are Transformer-based neural network layers.

4. The autopilot model of claim 1 wherein the predefined state range includes a predetermined position range and a predetermined speed range centered on the vehicle state at the current time.

5. The autopilot model of claim 4 wherein the predefined state range includes a discretized plurality of state points, each state point of the plurality of state points having a respective position parameter and speed parameter within the state range.

6. The autopilot model of claim 5 wherein the predicted probability distribution includes a predicted probability output for each of the plurality of state points, wherein the predicted probability indicates a likelihood that the predicted state at the future time has the position parameter and the speed parameter corresponding to that state point.

7. The autopilot model of claim 1 wherein the predicted state includes a vehicle position and speed of the vehicle at the future time instant, control signals of the vehicle being generated by:

determining the predicted state by minimizing a difference between a probability distribution of a candidate state of the vehicle and the predicted probability distribution; and

determining, based on an inverse dynamics model, a control signal for the vehicle from the current time instant to the future time instant from the predicted position and the predicted speed of the vehicle at the future time instant and the current position and the current speed at the current time instant.

8. The autopilot model of claim 1 wherein the predicted state includes a position and a speed of the vehicle at the future time instant that are represented, based on a forward dynamics model, by control signals for the vehicle from the current time instant to the future time instant, the control signals for the vehicle being generated by:

determining, based on the predicted state, a control signal for the vehicle from the current time instant to the future time instant.

9. An autopilot method implemented using the autopilot model of any one of claims 1-8, comprising:

encoding input information comprising at least perception information and a vehicle state, wherein the vehicle state comprises a position coordinate and a velocity vector of a vehicle, to obtain encoded input information; and

processing the encoded input information to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine a predicted state of the vehicle to generate a control signal for the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between a probability distribution of a candidate state of the vehicle and the predicted probability distribution.

10. The autopilot method of claim 9 wherein the encoded input information is an implicit representation of the input information in a bird's eye view space.

11. The autopilot method of claim 9, wherein the predefined state range includes a predetermined position range and a predetermined speed range centered on the vehicle state at the current time.

12. The autopilot method of claim 11 wherein the predefined state range includes a discretized plurality of state points, each state point of the plurality of state points having a respective position parameter and speed parameter within the state range.

13. The autopilot method of claim 12, wherein the predicted probability distribution includes a predicted probability output for each of the plurality of state points, wherein the predicted probability indicates a likelihood that the predicted state at the future time instant has the position parameter and the speed parameter corresponding to that state point.

14. The autopilot method of claim 9, wherein the predicted state comprises a vehicle position and speed of the vehicle at the future time,

the control signal of the vehicle is generated by:

15. The autopilot method of claim 9, wherein the predicted state comprises a position and a speed of the vehicle at the future time instant that are represented, based on a forward dynamics model, by a control signal for the vehicle from the current time instant to the future time instant,

the control signal of the vehicle is generated by:

16. A method of training the autopilot model of any one of claims 1-8, comprising:

determining a sample data set for training an autopilot model, wherein sample data in the sample data set comprises a plurality of sample inputs, each sample input comprising perception information and a vehicle state for a corresponding sample time instant, wherein the vehicle state comprises a position coordinate and a velocity vector of a vehicle, the sample data further comprising an annotated real vehicle state for each sample input, wherein the real vehicle state corresponds to a future time instant after a predetermined length of time from the sample time instant;

encoding a first sample input of the plurality of sample inputs with an encoding layer to obtain an encoded first sample input;

processing the encoded first sample input with a prediction layer to obtain a predicted probability distribution over a predefined state range for a vehicle state at a future time instant after a predetermined time period from a sample time instant corresponding to the first sample input;

determining a probability distribution of a first real vehicle state for the first sample input from the sample data set; and

training the autopilot model by adjusting parameters of the autopilot model, wherein the parameters are adjusted to reduce a difference between the predicted probability distribution and the probability distribution of the first real vehicle state.

17. The method of claim 16, wherein the encoded first sample input is an implicit representation of the first sample input in a bird's eye view space.

18. The method of claim 16, wherein the encoding layer and the prediction layer are Transformer-based neural network layers.

19. The method of claim 16, wherein the predefined state range includes a predetermined position range and a predetermined speed range centered on the vehicle state at the sample time instant.

20. The method of claim 19, wherein the predefined state range includes a discretized plurality of state points, each state point of the plurality of state points having a respective position parameter and speed parameter within the state range.

21. The method of claim 20, wherein the predicted probability distribution includes a predicted probability output for each of the plurality of state points, wherein the predicted probability indicates a likelihood that the predicted state at the future time instant has the position parameter and the speed parameter corresponding to that state point.

22. The method of claim 16, wherein training the autopilot model by adjusting parameters of the autopilot model comprises:

determining evaluation information of the predicted probability distribution; and

adjusting the parameters of the autopilot model according to the evaluation information by means of reinforcement learning, so as to reduce the difference between the predicted probability distribution and the probability distribution of the first real vehicle state.

23. An autopilot device implemented using the autopilot model of any one of claims 1-8, comprising:

an encoding unit configured to encode input information including at least perception information and a vehicle state, wherein the vehicle state includes a position coordinate and a velocity vector of a vehicle, to obtain encoded input information; and

a prediction unit configured to process the encoded input information to obtain a predicted probability distribution of a vehicle state at a future time after a predetermined time from a current time over a predefined state range, wherein the predicted probability distribution is used to determine a predicted state of the vehicle to generate a control signal of the vehicle, wherein the predicted state of the vehicle is determined by minimizing a difference between a probability distribution of a candidate state of the vehicle and the predicted probability distribution.

24. An apparatus for training the autopilot model of any one of claims 1-8, comprising:

a sample data determination unit configured to determine a sample data set for training an autopilot model, wherein sample data in the sample data set comprises a plurality of sample inputs, each sample input comprising perception information and a vehicle state for a corresponding sample time instant, wherein the vehicle state comprises a position coordinate and a velocity vector of a vehicle, the sample data further comprising an annotated real vehicle state for each sample input, wherein the real vehicle state corresponds to a future time instant after a predetermined time period from the sample time instant;

an encoding unit configured to encode a first sample input of the plurality of sample inputs with an encoding layer to obtain an encoded first sample input;

a prediction unit configured to process the encoded first sample input with a prediction layer to obtain a predicted probability distribution over a predefined state range of vehicle states at a future time instant after a predetermined time period from a sample time instant corresponding to the first sample input;

a real target determination unit configured to determine a probability distribution of a first real vehicle state for the first sample input from the sample data set; and

a training unit configured to train the autopilot model by adjusting parameters of the autopilot model, wherein the parameters are adjusted to reduce a difference between the predicted probability distribution and the probability distribution of the first real vehicle state.

25. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor, wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 9-22.

26. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 9-22.

27. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 9-22.

28. An autonomous vehicle comprising:

one of the autopilot device of claim 23 and the electronic device of claim 25.