CN114283576A - Vehicle intention prediction method and related device - Google Patents

Vehicle intention prediction method and related device

Info

Publication number
CN114283576A
CN114283576A
Authority
CN
China
Prior art keywords
coordinate value
data
exit
driving
vehicle
Prior art date
Legal status
Granted
Application number
CN202011045331.9A
Other languages
Chinese (zh)
Other versions
CN114283576B (en)
Inventor
范时伟
李飞
李向旭
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202011045331.9A priority Critical patent/CN114283576B/en
Publication of CN114283576A publication Critical patent/CN114283576A/en
Application granted granted Critical
Publication of CN114283576B publication Critical patent/CN114283576B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

An embodiment of this application provides a vehicle intention prediction method and a related device. The method includes: obtaining an intention prediction model, which is used to predict the intention of a vehicle to travel toward each direction of an intersection; rasterizing and encoding target driving data to obtain a target image sample, where different types of data in the target driving data, and data corresponding to different exit directions, are each encoded into different data channels of one image sample; and inputting the target image sample into the intention prediction model to obtain the intention probability of a second vehicle driving toward each direction of the intersection. Adopting this embodiment can improve the accuracy of vehicle intention prediction under complex road conditions.

Description

Vehicle intention prediction method and related device
Technical Field
The invention relates to the technical field of the Internet of Vehicles, and in particular to a vehicle intention prediction method and a related device.
Background
With the development of intelligent driving technology, intelligent vehicles have become a key research target for many manufacturers. Intelligent driving comprises assisted driving and automatic driving, and its key enabling technologies include positioning, perception, prediction, and planning and control. Vehicle intention prediction means predicting the behavioral intention of a target vehicle from its state at the current and historical times; for example, in an on-road scenario the intention may include lane keeping, changing to the left lane, or changing to the right lane, while in an intersection scenario it may include going straight, turning left, turning right, or making a U-turn. On urban roads, accurate, reliable, real-time prediction of vehicle intention helps the ego vehicle anticipate the traffic conditions ahead and build a picture of the surrounding traffic situation; it supports judging the importance of surrounding vehicles and screening the key interacting target vehicles; and it allows the ego vehicle to plan its path in advance so that it can pass safely through complex intersection scenarios.
In recent years, driven by advances in sensor hardware and algorithms, the field of automatic driving has developed rapidly and research on vehicle trajectory prediction has grown accordingly. The main current prediction methods include:
(1) Rule-based trajectory prediction: such methods use a motion model, e.g. a constant-velocity or constant-acceleration model, to predict the short-term motion of the target vehicle from information such as its position and speed, and then predict the routes the target vehicle may follow over a longer horizon by combining high-definition map information such as the lane adjacent to the target vehicle, the subsequent lanes connected to that lane, and the turning information of those lanes. Because real roads are complex, this approach requires a large engineering effort, can hardly enumerate all situations, and depends heavily on perception precision and the accuracy of the high-definition map.
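The kinematic core of such a rule-based predictor can be sketched as a short function; the function name, horizon, and time step below are illustrative choices, not from the patent:

```python
def predict_constant_acceleration(x, y, vx, vy, ax, ay, horizon_s=3.0, dt=0.1):
    """Extrapolate a target vehicle's future positions with a
    constant-acceleration motion model (a rule-based baseline)."""
    trajectory = []
    steps = int(horizon_s / dt)
    for k in range(1, steps + 1):
        t = k * dt
        # Standard kinematics: p(t) = p0 + v0*t + 0.5*a*t^2
        px = x + vx * t + 0.5 * ax * t * t
        py = y + vy * t + 0.5 * ay * t * t
        trajectory.append((px, py))
    return trajectory

# A vehicle at the origin moving at 10 m/s along x, braking at 2 m/s^2:
path = predict_constant_acceleration(0.0, 0.0, 10.0, 0.0, -2.0, 0.0)
```

The predicted points would then be matched against adjacent lanes in the high-definition map to obtain candidate routes, which is exactly where the method's dependence on map accuracy enters.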
(2) Probability-and-statistics-based methods: such methods extract the driving state of the target vehicle, e.g. speed, acceleration, and angular velocity, and predict its intention, e.g. going straight or turning, with models such as a Hidden Markov Model (HMM) or a Dynamic Bayesian Network (DBN). These models have limited representational capability and therefore generalize poorly, and the turning intentions they output are impractical at some complex intersections.
(3) Machine-learning-based methods: such methods first extract a series of features of the target vehicle, e.g. vehicle shape, motion state, attributes of the surrounding lanes, and their correlations, and then classify the turning intention with methods such as a multi-layer perceptron (MLP) or the k-means clustering algorithm. These methods require manually designed features, have limited representational capability in complex scenarios, and the turning intentions they output are impractical at some complex intersections.
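A toy illustration of the feature-based classification idea, using a nearest-centroid rule of the kind underlying k-means-style classifiers; the features (mean yaw rate, lateral offset) and the centroid values are invented for illustration only:

```python
import math

# Each intent class is represented by a centroid in a hand-crafted
# feature space: (mean yaw rate in rad/s, lateral offset in m).
# These centroids are illustrative, not learned from real data.
CENTROIDS = {
    "straight":   (0.0,  0.0),
    "left_turn":  (0.3,  1.5),
    "right_turn": (-0.3, -1.5),
}

def classify_intent(yaw_rate, lateral_offset):
    """Assign the intent whose centroid is nearest to the feature
    vector (the assignment step of a k-means-style classifier)."""
    def dist(c):
        return math.hypot(yaw_rate - c[0], lateral_offset - c[1])
    return min(CENTROIDS, key=lambda k: dist(CENTROIDS[k]))

print(classify_intent(0.25, 1.2))  # a left-turning feature vector
```

The limitation criticized above is visible here: the hand-designed feature space fixes in advance what the classifier can represent, which breaks down at complex intersections.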
It can be seen that current prediction methods have difficulty accurately predicting the driving intention of a target vehicle at some complex intersections.
Disclosure of Invention
The embodiments of this application disclose a vehicle intention prediction method and a related device, which can improve the accuracy of vehicle intention prediction at complex intersections, generalize well, and apply to intersections with any number of exits.
In a first aspect, an embodiment of this application provides a training method for a vehicle intention prediction model, the method comprising:
acquiring a plurality of pieces of driving data, where each piece of driving data comprises first reference information and a first coordinate value of a track along which a first vehicle drove through an intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system of the second coordinate value of the virtual lane line and the origin of the coordinate system of the third coordinate value of the exit area are both the starting point of the track;
performing rasterization encoding on the plurality of pieces of driving data to obtain a plurality of image samples, where different types of data in one piece of driving data, and data corresponding to different exit directions, are each encoded into different data channels of one image sample;
training a vehicle intention prediction model with a plurality of sample data, where each sample datum comprises one image sample from the plurality of image samples and one piece of label information, the label information representing which of the exits the first vehicle in that image sample actually drove to.
It should be noted that a conventional image generally has three data channels, representing the primary colors red, green, and blue; conventional methods encode different information with different colors, and every coding color is in fact encoded on the three data channels simultaneously. The embodiments of this application have no concept of coding colors: instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping complex information; different pieces of information neither intersect nor interfere with one another, and the latent information of the different elements is easier to learn, so the prediction accuracy for the vehicle intention is higher. Moreover, because data channels are allocated by information type and by exit direction, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits and whether it is simple or complex, which is flexible; meanwhile, no features need to be designed by hand, deep logic is learned by a neural network, and all kinds of complex scenarios can be covered. In addition, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so that a driving-intention probability can be predicted for each exit direction; this amounts to a classification framework with a variable number of classes and is therefore applicable to intersections with different numbers of exits.
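The channel-separated encoding can be sketched as below; the grid size, channel names, and fill values are illustrative assumptions, not the patent's exact format:

```python
def _blank(h, w):
    """An all-zero single-channel grid of height h and width w."""
    return [[0.0] * w for _ in range(h)]

def rasterize(trajectory, exits, h=64, w=64):
    """Encode driving data into an image-like tensor in which each type
    of information and each exit direction occupies its own channel,
    instead of sharing the three RGB channels via color coding."""
    channels = {"trajectory": _blank(h, w)}
    # One dedicated channel per exit direction, so the channel count
    # grows or shrinks with the number of exits at the intersection.
    for name, (ex, ey) in exits.items():
        ch = _blank(h, w)
        ch[ey][ex] = 1.0  # mark the exit-area cell
        channels["exit_" + name] = ch
    n = len(trajectory)
    for k, (x, y) in enumerate(trajectory):
        # Later trajectory points get larger fill values, making
        # recency visible to the network as brightness.
        channels["trajectory"][y][x] = (k + 1) / n
    return channels

chs = rasterize([(10, 10), (12, 11), (14, 12)],
                {"north": (32, 0), "east": (63, 32)})
```

Because the channels are kept separate, the trajectory, lane-line, and exit-area information never overwrite one another in a shared color space.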
With reference to the first aspect, in a possible implementation manner of the first aspect, the rasterizing and encoding the multiple pieces of driving data to obtain multiple image samples includes:
determining a first scaling factor according to the specification of a reference image and the starting point of the track in first driving data, where the first driving data is any one of the plurality of pieces of driving data;
scaling the first coordinate value of the track in the first driving data and the first reference information by the first scaling factor to obtain a fourth coordinate value and second reference information;
generating an image sample corresponding to the first driving data, where the image sample comprises a first data channel, on which values are filled at the positions represented by the fourth coordinate value, and at least one second data channel, on which values are filled at the positions represented by the second reference information. In this way, by filling data at the positions given by the fourth coordinate value on the first data channel and by the second reference information on the second data channel, the information about the track, the virtual lane line, and the exit area is effectively encoded into the image; moreover, the scaling factor ensures that the data is embodied in the image completely, without overflow.
On the first data channel, the later the time to which a first coordinate value corresponds, the larger the value filled at the position represented by the fourth coordinate value obtained by scaling it; on the second data channel, the closer a piece of first reference information is to the exit area or to the center of the exit area, the larger the value filled at the position represented by the second reference information obtained by scaling it. It should be noted that the values filled into a data channel are set to change gradually in order to represent the distance between the corresponding coordinate and the track end point (generally close to an exit), the exit area, or the center of the exit area; that is, brightness or color depth expresses how close the vehicle is to the track end point, the exit area, or the center of the exit area. This enhances the recognizability of coordinates near the track end point, the exit area, or the center of the exit area, and therefore the accuracy of the finally obtained intention prediction model is higher.
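One possible gradual fill scheme for a second data channel is a linear fall-off with distance from the exit-area center; the linear form and the cutoff distance are illustrative assumptions, since the text only requires that values grow toward the exit:

```python
import math

def lane_gradient_value(px, py, cx, cy, max_dist):
    """Fill value for a virtual-lane-line point (px, py): the closer
    the point is to the exit-area center (cx, cy), the larger
    (brighter) the value, fading to zero at max_dist."""
    d = math.hypot(px - cx, py - cy)
    return max(0.0, 1.0 - d / max_dist)
```

Filling every lane-line point through such a function produces the brightness gradient described above, so the network can read distance-to-exit directly from pixel intensity.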
In addition, if the first reference information includes the second coordinate value and the third coordinate value, then the second reference information includes a fifth coordinate value and a sixth coordinate value, the fifth coordinate value obtained by scaling the second coordinate value by the first scaling factor and the sixth coordinate value by scaling the third coordinate value by the first scaling factor. The fifth and sixth coordinate values are used to fill data in different second data channels, and the second reference information obtained by scaling first reference information corresponding to different exit directions is likewise used to fill data in different second data channels. It can be understood that embodying the fifth coordinate value, related to the virtual lane line, and the sixth coordinate value, related to the exit area, in different data channels, and embodying the information of different exit directions in different data channels, avoids mutual interference between different pieces of information more effectively.
With reference to the first aspect, or any one of the foregoing possible implementations of the first aspect, in yet another possible implementation of the first aspect,
the driving data further includes a driving speed of the first vehicle on the track, the image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the driving speed;
and/or,
the driving data further includes a plurality of relative driving directions of the first vehicle on the track, and the image sample further includes a plurality of fourth data channels, where the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions, the relative driving directions being the driving directions relative to the different exit directions of the intersection.
In this way, the factors of the driving speed and/or the driving direction are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be more comprehensively learned, and the prediction accuracy of the driving intention is further improved.
With reference to the first aspect, or any one of the foregoing possible implementations of the first aspect, in a further possible implementation of the first aspect, the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 · max_{1 ≤ i ≤ N} |P, G_i|)

where scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| denotes the distance between the starting point P of the track in the first driving data and the i-th exit area G_i of the intersection; i is an integer from 1 to N, and N is the number of exit areas of the intersection.
In this calculation, the numerator takes the minimum of the image height and width, and the vehicle's range of movement is taken to cover as large an area of the intersection as possible; hence, after the first, second, and third coordinate values are scaled by a scale computed in this way, they essentially do not exceed the coordinate range of the image, which ensures that no information is lost in the rasterization encoding.
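The publication renders the formula only as an image, so the sketch below computes the factor on the reading that the numerator is min(h, w) and that the track start point sits at the image center; the factor 2 in the denominator and the representation of each exit area G_i by a single point are assumptions under that reading:

```python
import math

def first_scaling_factor(h, w, start, exit_areas):
    """scale = min(h, w) / (2 * max_i |P - G_i|): with the track start
    P at the image center, the farthest exit area G_i maps to a radius
    of min(h, w) / 2, so no scaled coordinate overflows the raster."""
    px, py = start
    max_dist = max(math.hypot(px - gx, py - gy) for gx, gy in exit_areas)
    return min(h, w) / (2.0 * max_dist)

# A 64x64 reference image; the farthest of three exits is 40 m away.
scale = first_scaling_factor(64, 64, (0.0, 0.0),
                             [(30.0, 0.0), (0.0, 40.0), (-20.0, 0.0)])
```

With this scale, the farthest exit lands exactly at the half-width of the image, which is the "as large an area as possible without overflow" behavior the text describes.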
With reference to the first aspect, or any one of the foregoing possible implementations of the first aspect, in yet another possible implementation of the first aspect, the method further includes:
sending the intent prediction model to a prediction device.
Sending the intention prediction model to the prediction device enables the prediction device to predict vehicle intentions; the prediction results can help a vehicle control its driving reasonably and improve driving safety.
In a second aspect, an embodiment of the present application provides a vehicle intention prediction method, including:
acquiring an intention prediction model, wherein the intention prediction model is used for predicting the intention of a vehicle to travel to each direction of an intersection;
acquiring target driving data, where the target driving data comprises first reference information and a first coordinate value of a track along which a second vehicle drove through the intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system of the second coordinate value of the virtual lane line and the origin of the coordinate system of the third coordinate value of the exit area are both the starting point of the track;
rasterizing and encoding the target driving data to obtain a target image sample, where different types of data in the target driving data, and data corresponding to different exit directions, are each encoded into different data channels of one image sample;
and inputting the target image sample into the intention prediction model to obtain the intention probability of the second vehicle driving towards each direction of the intersection.
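The last step can be sketched end to end as below; the softmax output layer and the stub standing in for the trained network are assumptions, since the patent does not fix a network architecture at this point:

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_intents(image_sample, model):
    """Run the encoded multi-channel sample through the intention
    prediction model; one probability per exit direction comes out."""
    return softmax(model(image_sample))

def stub_model(sample):
    # Stand-in for the trained network: returns one logit per exit
    # direction of this (three-exit) intersection.
    return [2.0, 0.5, 0.1]

probs = predict_intents({}, stub_model)
```

Because the model emits one score per exit channel, the same pipeline works unchanged for intersections with any number of exits.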
It should be noted that a conventional image generally has three data channels, representing the primary colors red, green, and blue; conventional methods encode different information with different colors, and every coding color is in fact encoded on the three data channels simultaneously. The embodiments of this application have no concept of coding colors: instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping complex information; different pieces of information neither intersect nor interfere with one another, and the latent information of the different elements is easier to learn, so the prediction accuracy for the vehicle intention is higher. Moreover, because data channels are allocated by information type and by exit direction, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits and whether it is simple or complex, which is flexible; meanwhile, no features need to be designed by hand, deep logic is learned by a neural network, and all kinds of complex scenarios can be covered. In addition, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so that a driving-intention probability can be predicted for each exit direction; this amounts to a classification framework with a variable number of classes and is therefore applicable to intersections with different numbers of exits.
With reference to the second aspect, in a possible implementation manner of the second aspect, the obtaining an intention prediction model includes:
receiving the intention prediction model sent by a model training device, where the intention prediction model is trained on a plurality of image samples obtained by rasterizing and encoding a plurality of pieces of driving data, and the data format of each piece of driving data is the same as that of the target driving data.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further possible implementation manner of the second aspect, the rasterizing and encoding the target driving data to obtain a target image sample includes:
determining a second scaling factor according to the specification of the reference image and the starting point of the track in the target driving data;
scaling the first coordinate value of the track in the target driving data and the first reference information by the second scaling factor to obtain a fourth coordinate value and second reference information;
generating a target image sample corresponding to the target driving data, where the target image sample comprises a first data channel, on which values are filled at the positions represented by the fourth coordinate value, and at least one second data channel, on which values are filled at the positions represented by the second reference information. In this way, by filling data at the positions given by the fourth coordinate value on the first data channel and by the second reference information on the second data channel, the information about the track, the virtual lane line, and the exit area is effectively encoded into the image; moreover, the scaling factor ensures that the data is embodied in the image completely, without overflow.
On the first data channel, the later the time to which a first coordinate value corresponds, the larger the value filled at the position represented by the fourth coordinate value obtained by scaling it; on the second data channel, the closer a piece of first reference information is to the exit area or to the center of the exit area, the larger the value filled at the position represented by the second reference information obtained by scaling it. It should be noted that the values filled into a data channel are set to change gradually in order to represent the distance between the corresponding coordinate and the track end point (generally close to an exit), the exit area, or the center of the exit area; that is, brightness or color depth expresses how close the vehicle is to the track end point, the exit area, or the center of the exit area. This enhances the recognizability of coordinates near the track end point, the exit area, or the center of the exit area, and therefore the accuracy of the finally obtained intention prediction model is higher.
In addition, if the first reference information includes the second coordinate value and the third coordinate value, then the second reference information includes a fifth coordinate value and a sixth coordinate value, the fifth coordinate value obtained by scaling the second coordinate value by the second scaling factor and the sixth coordinate value by scaling the third coordinate value by the second scaling factor. The fifth and sixth coordinate values are used to fill data in different second data channels, and the second reference information obtained by scaling first reference information corresponding to different exit directions is likewise used to fill data in different second data channels. It can be understood that embodying the fifth coordinate value, related to the virtual lane line, and the sixth coordinate value, related to the exit area, in different data channels, and embodying the information of different exit directions in different data channels, avoids mutual interference between different pieces of information more effectively.
With reference to the second aspect, or any one of the foregoing possible implementations of the second aspect, in yet another possible implementation of the second aspect:
the driving data further includes a driving speed of the second vehicle on the track, the target image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the driving speed;
and/or,
the driving data further includes a plurality of relative driving directions of the second vehicle on the track, and the target image sample further includes a plurality of fourth data channels, where the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions, the relative driving directions being the driving directions relative to the different exit directions of the intersection.
In this way, the factors of the driving speed and/or the driving direction are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be more comprehensively learned, and the prediction accuracy of the driving intention is further improved.
In a third aspect, an embodiment of the present application provides an image data processing method, including:
acquiring a plurality of pieces of driving data, wherein each piece of driving data comprises first reference information and a first coordinate value of a track when a first vehicle drives through an intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
performing rasterization encoding on the plurality of pieces of driving data to obtain a plurality of image samples, where different types of data in one piece of driving data, and data corresponding to different exit directions, are each encoded into different data channels of one image sample;
and sending a plurality of sample data to a model training device, where each sample datum comprises one image sample from the plurality of image samples and one piece of label information, the label information representing which of the various exits the first vehicle in that image sample actually intended to drive to.
It should be noted that a conventional image generally has three data channels, representing the primary colors red, green, and blue; conventional methods encode different information with different colors, and every coding color is in fact encoded on the three data channels simultaneously. The embodiments of this application have no concept of coding colors: instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping complex information; different pieces of information neither intersect nor interfere with one another, and when the information is subsequently used for model training, the latent information of the different elements is easier to learn, so the prediction accuracy for the vehicle intention is higher. Moreover, because data channels are allocated by information type and by exit direction, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits and whether it is simple or complex, which is flexible; meanwhile, no features need to be designed by hand, deep logic is learned by a neural network, and all kinds of complex scenarios can be covered. In addition, because the data channels are divided by exit direction, when the sample data are subsequently used for model training, the characteristics of each exit direction can be trained accurately, so that a driving-intention probability can be predicted for each exit direction; this amounts to a classification framework with a variable number of classes and is therefore applicable to intersections with different numbers of exits.
In a fourth aspect, an embodiment of the present application provides a training apparatus for a vehicle intention prediction model, where the apparatus includes:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of pieces of driving data, each piece of driving data comprises first reference information and a first coordinate value of a track when a first vehicle drives through a target intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
an encoding unit, configured to perform rasterization encoding on the plurality of pieces of driving data to obtain a plurality of image samples, where different types of data in one piece of driving data, and data corresponding to different exit directions, are each encoded into different data channels of one image sample;
and a training unit, configured to train a vehicle intention prediction model with a plurality of sample data, where each sample datum comprises one image sample from the plurality of image samples and one piece of label information, the label information representing which of the exits the first vehicle in that image sample actually drove to.
It should be noted that a conventional image generally has three data channels, representing the primary colors red, green, and blue; conventional methods encode different information with different colors, and every coding color is in fact encoded on the three data channels simultaneously. The embodiments of this application have no concept of coding colors: instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping complex information; different pieces of information neither intersect nor interfere with one another, and the latent information of the different elements is easier to learn, so the prediction accuracy for the vehicle intention is higher. Moreover, because data channels are allocated by information type and by exit direction, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits and whether it is simple or complex, which is flexible; meanwhile, no features need to be designed by hand, deep logic is learned by a neural network, and all kinds of complex scenarios can be covered. In addition, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so that a driving-intention probability can be predicted for each exit direction; this amounts to a classification framework with a variable number of classes and is therefore applicable to intersections with different numbers of exits.
With reference to the fourth aspect, in a possible implementation manner of the fourth aspect, in performing rasterization encoding on the multiple pieces of driving data to obtain multiple image samples, the encoding unit is specifically configured to:
determining a first scaling factor according to the specification of the reference image and the starting point of the track in the first traveling data, wherein the first traveling data is any one of the plurality of pieces of traveling data;
carrying out scaling processing on the first coordinate value of the track in the first driving data and on the first reference information according to the first scaling factor to obtain a fourth coordinate value and second reference information;
generating an image sample corresponding to the first driving data, wherein the image sample comprises a first data channel, on which a value is filled at the position represented by the fourth coordinate value, and at least one second data channel, on which a value is filled at the position represented by the second reference information; in this way, by filling values at the positions represented by the fourth coordinate value on the first data channel and by the second reference information on the second data channel, the relevant information of the track, of the virtual lane line, and of the exit area is effectively encoded into the image; moreover, the scaling factor ensures that the related data is fully reflected in the image without overflow.
On the first data channel, the later the time to which a first coordinate value corresponds, the larger the value filled at the position represented by the fourth coordinate value obtained by scaling that first coordinate value; on the second data channel, the closer the first reference information is to the exit area or to the central position of the exit area, the larger the value filled at the position represented by the second reference information obtained by scaling it. It should be noted that, by making the values filled into the data channels change gradually, the distance between a coordinate and the track end point (generally close to the exit), or between a coordinate and the exit area or the central position of the exit area, is expressed through brightness or color depth. This strengthens the recognition of coordinates near the track end point and near the exit area or its central position, so the accuracy of the finally obtained intention prediction model is higher.
In addition, if the first reference information includes the second coordinate value and a third coordinate value, the second reference information includes a fifth coordinate value and a sixth coordinate value, where the fifth coordinate value is obtained by scaling the second coordinate value by the first scaling factor and the sixth coordinate value is obtained by scaling the third coordinate value by the first scaling factor. The fifth coordinate value and the sixth coordinate value are used to fill values in different second data channels, and the second reference information obtained by scaling the first reference information corresponding to different exit directions is likewise used to fill values in different second data channels. It can be understood that the fifth coordinate value, which relates to the virtual lane line, and the sixth coordinate value, which relates to the exit area, are reflected in different data channels, and information for different exit directions is reflected in different data channels, so mutual interference between different pieces of information is avoided more effectively.
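The per-channel rasterization described above can be sketched as follows. This is a hypothetical, simplified illustration: the grid size, the value gradients, the Manhattan-distance normalization, and the assumption that exit indices run from 0 are all illustrative choices, not the patent's exact encoding.

```python
# Sketch: one trajectory channel, plus one lane-line channel and one
# exit-area channel per exit direction (exit indices assumed 0..n-1).
def rasterize(track, lane_lines, exit_areas, h=64, w=64):
    """track: list of (x, y) integer grid coords ordered by time.
    lane_lines / exit_areas: dicts mapping exit index -> list of (x, y)."""
    n_exits = len(exit_areas)
    channels = [[[0.0] * w for _ in range(h)] for _ in range(1 + 2 * n_exits)]

    # Trajectory channel: later points get larger values (brightness gradient).
    n = len(track)
    for t, (x, y) in enumerate(track):
        channels[0][y][x] = (t + 1) / n

    # Per-exit channels: points closer to the exit centre get larger values.
    for i in sorted(exit_areas):
        pts_exit = exit_areas[i]
        cx = sum(p[0] for p in pts_exit) / len(pts_exit)
        cy = sum(p[1] for p in pts_exit) / len(pts_exit)
        far = max(abs(cx) + abs(cy), 1.0)  # illustrative normalizer
        for ch_off, pts in ((1, lane_lines.get(i, [])), (2, pts_exit)):
            ch = channels[2 * i + ch_off]
            for (x, y) in pts:
                d = abs(x - cx) + abs(y - cy)
                ch[y][x] = max(ch[y][x], 1.0 - d / (far + d))
    return channels
```

The key property shown is that each information type and each exit direction lands in its own channel, and the fill values grow toward the track end point or exit centre, matching the gradient encoding described in the text.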
With reference to the fourth aspect, or any one of the foregoing possible implementations of the fourth aspect, in yet another possible implementation of the fourth aspect:
the driving data further includes a driving speed of the first vehicle on the track, the image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the driving speed;
and/or,
the driving data further includes a plurality of relative driving directions of the first vehicle on the track, and the image sample further includes a plurality of fourth data channels, where the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions; the relative driving directions are the driving directions relative to the different exit directions of the intersection.
In this way, the factors of the driving speed and/or the driving direction are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be more comprehensively learned, and the prediction accuracy of the driving intention is further improved.
With reference to the fourth aspect or any one of the foregoing possible implementations of the fourth aspect, in a further possible implementation of the fourth aspect, the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 × max_{1 ≤ i ≤ N} |P, G_i|)
wherein scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| represents the distance between the starting point P of the track in the first driving data and the i-th exit area G_i of the intersection;
wherein i is an integer between 1 and N, and N is the number of outlet areas of the intersection.
In this calculation, the numerator takes the smaller of the image height and the image width, and the movement range of the vehicle covers as large an area of the intersection as possible; therefore, after coordinate values such as the first, second, and third coordinate values are scaled by the scale computed in this way, they essentially do not exceed the coordinate range of the image, ensuring that no information is lost after rasterization encoding.
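The scaling factor can be sketched numerically. The formula itself appears only as an embedded image in the original filing, so the exact form used here, scale = min(h, w) / (2 × max_i |P, G_i|), is an assumption inferred from the surrounding description (smaller image dimension in the numerator, distance to the farthest exit bounding the vehicle's movement range):

```python
import math

# Illustrative computation of the scaling factor; the "2 * farthest" term is
# an assumption that the image spans the full range around the start point P.
def scaling_factor(h, w, start, exit_centers):
    """h, w: reference image size; start: trajectory start P as (x, y);
    exit_centers: centre coordinates G_i of each exit area."""
    farthest = max(math.dist(start, g) for g in exit_centers)
    return min(h, w) / (2 * farthest)
```

Scaling all first, second, and third coordinate values by this factor keeps them within the image bounds, which is the stated purpose of the factor.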
With reference to the fourth aspect or any one of the foregoing possible implementation manners of the fourth aspect, in a further possible implementation manner of the fourth aspect, the method further includes:
a transmitting unit for transmitting the intention prediction model to a prediction apparatus.
After the intention prediction model is sent to the prediction device, the prediction device predicts the intention of the vehicle, and the prediction result can help the vehicle control its driving reasonably and improve driving safety.
In a fifth aspect, an embodiment of the present application provides a vehicle intention prediction apparatus, including:
a first acquisition unit configured to acquire an intention prediction model for predicting an intention of a vehicle to travel in each direction of an intersection;
a second acquisition unit configured to acquire target driving data, wherein the target driving data includes first reference information and a first coordinate value of a track when a second vehicle drives through the intersection, and the first reference information includes a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting point of the track;
the encoding unit is used for carrying out rasterization encoding on the target driving data to obtain target image samples, wherein different types of data in the target driving data and data corresponding to different outlet directions are respectively encoded into different data channels in the image sample;
and the prediction unit is used for inputting the target image sample into the intention prediction model to obtain the intention probability of the second vehicle driving towards each direction of the intersection.
It should be noted that a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue; the conventional method encodes different information with different colors, and each coding color is actually encoded across all three data channels simultaneously. The embodiment of the application has no concept of coding colors: instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information, leaves no crossing or interference between different pieces of information, and makes the latent information of different elements easier to learn, so the prediction accuracy for the vehicle intention is higher. Moreover, because data channels are allocated according to information type and exit direction, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits or is simple or complex, which is flexible; meanwhile, no features need to be hand-designed, deep logic is learned by the neural network, and various complex scenes can be covered. In addition, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so the driving intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and can be applied to intersections with different numbers of exits.
With reference to the fifth aspect, in a possible implementation manner of the fifth aspect, in terms of obtaining a target image sample by performing rasterization encoding on the target driving data, the encoding unit is specifically configured to:
determining a second scaling factor according to the specification of the reference image and the starting point of the track in the target driving data;
carrying out scaling processing on the first coordinate value of the track in the target driving data and on the first reference information according to the second scaling factor to obtain a fourth coordinate value and second reference information;
generating a target image sample corresponding to the target driving data, wherein the target image sample comprises a first data channel, on which a value is filled at the position represented by the fourth coordinate value, and at least one second data channel, on which a value is filled at the position represented by the second reference information; in this way, by filling values at the positions represented by the fourth coordinate value on the first data channel and by the second reference information on the second data channel, the relevant information of the track, of the virtual lane line, and of the exit area is effectively encoded into the image; moreover, the scaling factor ensures that the related data is fully reflected in the image without overflow.
On the first data channel, the later the time to which a first coordinate value corresponds, the larger the value filled at the position represented by the fourth coordinate value obtained by scaling that first coordinate value; on the second data channel, the closer the first reference information is to the exit area or to the central position of the exit area, the larger the value filled at the position represented by the second reference information obtained by scaling it. It should be noted that, by making the values filled into the data channels change gradually, the distance between a coordinate and the track end point (generally close to the exit), or between a coordinate and the exit area or the central position of the exit area, is expressed through brightness or color depth. This strengthens the recognition of coordinates near the track end point and near the exit area or its central position, so the accuracy of the finally obtained intention prediction model is higher.
In addition, if the first reference information includes the second coordinate value and the third coordinate value, the second reference information includes a fifth coordinate value and a sixth coordinate value, where the fifth coordinate value is obtained by scaling the second coordinate value by the second scaling factor and the sixth coordinate value is obtained by scaling the third coordinate value by the second scaling factor. The fifth coordinate value and the sixth coordinate value are used to fill values in different second data channels, and the second reference information obtained by scaling the first reference information corresponding to different exit directions is likewise used to fill values in different second data channels. It can be understood that the fifth coordinate value, which relates to the virtual lane line, and the sixth coordinate value, which relates to the exit area, are reflected in different data channels, and information for different exit directions is reflected in different data channels, so mutual interference between different pieces of information is avoided more effectively.
With reference to the fifth aspect or any one of the foregoing possible implementations of the fifth aspect, in yet another possible implementation of the fifth aspect:
the driving data further includes a driving speed of the second vehicle on the track, the target image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the driving speed;
and/or,
the driving data further includes a plurality of relative driving directions of the second vehicle on the track, and the target image sample further includes a plurality of fourth data channels, where the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions; the relative driving directions are the driving directions relative to the different exit directions of the intersection.
In this way, the factors of the driving speed and/or the driving direction are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be more comprehensively learned, and the prediction accuracy of the driving intention is further improved.
In a sixth aspect, an embodiment of the present application provides an image data processing apparatus, including:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of pieces of driving data, each piece of driving data comprises first reference information and a first coordinate value of a track when a first vehicle drives through a target intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
the encoding unit is used for carrying out rasterization encoding on the plurality of pieces of driving data to obtain a plurality of image samples, wherein different types of data in one piece of driving data and data corresponding to different outlet directions are encoded into different data channels in one image sample respectively;
and the sending unit is used for sending a plurality of sample data to the model training device, wherein each sample data in the plurality of sample data comprises one image sample in the plurality of image samples and one piece of label information, and the label information represents which exit among the exits the first vehicle in the image sample actually travels to.
It should be noted that a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue; the conventional method encodes different information with different colors, and each coding color is actually encoded across all three data channels simultaneously. The embodiment of the application has no concept of coding colors: instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information and leaves no crossing or interference between different pieces of information, so that when the information is subsequently used for model training, the latent information of different elements is easier to learn and the prediction accuracy for the vehicle intention is higher. Moreover, because data channels are allocated according to information type and exit direction, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits or is simple or complex, which is flexible; meanwhile, no features need to be hand-designed, deep logic is learned by the neural network, and various complex scenes can be covered. In addition, because the data channels are divided by exit direction, when the sample data is subsequently used for model training, the characteristics of each exit direction can be trained accurately, so the driving intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and can be applied to intersections with different numbers of exits.
In a seventh aspect, an embodiment of the present application provides a model generating apparatus, where the model generating apparatus includes a processor and a memory, the memory stores a computer program, and the computer program, when run on the processor, implements the method described in the first aspect or any possible implementation manner of the first aspect.
In an eighth aspect, the present application provides a prediction device, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when run on the processor, implements the method described in the second aspect or any possible implementation manner of the second aspect. The prediction device may be a vehicle, or a virtual machine deployed in the cloud, or a server cluster formed by multiple servers.
In a ninth aspect, the present application provides a target device, where the target device includes a processor and a memory, the memory stores a computer program, and the computer program, when run on the processor, implements the method described in the third aspect or any possible implementation manner of the third aspect. The target device may be a vehicle, or a virtual machine deployed in the cloud, or one server, or a server cluster formed by multiple servers.
In a tenth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, which, when run on a processor, implements the method described in the first aspect or any one of the possible implementations of the first aspect, or implements the method described in the second aspect or any one of the possible implementations of the second aspect, or implements the method described in any one of the possible implementations of the third aspect.
By implementing the embodiments of the present application, the following benefits are obtained. A conventional image generally has 3 data channels, representing the three primary colors red, green, and blue; the conventional method encodes different information with different colors, and each coding color is actually encoded across all three data channels simultaneously. The embodiment of the application has no concept of coding colors: instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information, leaves no crossing or interference between different pieces of information, and makes the latent information of different elements easier to learn, so the prediction accuracy for the vehicle intention is higher. Moreover, because data channels are allocated according to information type and exit direction, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits or is simple or complex, which is flexible; meanwhile, no features need to be hand-designed, deep logic is learned by the neural network, and various complex scenes can be covered. In addition, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so the driving intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and can be applied to intersections with different numbers of exits.
Drawings
The drawings used in the embodiments of the present application are described below.
FIG. 1 is a scene schematic diagram of multi-modal intent prediction of a vehicle according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a neural network structure of a vehicle prediction model provided by an embodiment of the present application;
FIG. 3 is a schematic control flow chart of an automatic driving provided by an embodiment of the present application;
fig. 4 is a schematic processing flow diagram of driving data provided by an embodiment of the present application;
FIG. 5 is a schematic flow chart diagram illustrating a method for multi-modal vehicle intent prediction according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an algorithm framework for model training according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a prediction apparatus provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a target device provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a training apparatus for pre-storing a vehicle intention model according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a vehicle intention prediction apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
The inventor of the present application finds that there still exist some problems in the prior art when training a vehicle intention prediction model by a deep learning-based method, such as:
1) Low coding efficiency: the method adopts a non-fixed coordinate system, that is, the position and orientation of the target vehicle at the current moment are taken as the coordinate reference, so each frame of data must be rasterized and encoded separately, which is time-consuming.
2) Insufficient generalization performance: the method converts the target vehicle intention prediction problem into a twelve-class classification problem, where the twelve classes correspond to different sectors in the target vehicle coordinate system; in practical applications, the exit taken by the target vehicle does not necessarily fall within a given sector, and multiple exits may fall within the same sector area.
3) The encoding scheme is not conducive to CNN learning: the method encodes the target position and the lane line information from the high-precision map into the same image; the information amount is large, the structure is complex, and there are many crossing areas, so CNN learning is difficult, a more complex CNN model is generally needed, and higher computation costs are incurred.
4) The intention multi-modal problem cannot be solved: the multi-modal problem refers to the case where the intention is ambiguous in the intention prediction problem; it cannot be solved because the above method converts the multi-modal intention prediction problem into a fixed twelve-class classification problem.
In view of the above problems of the deep learning-based vehicle intention method, the inventors of the present application have proposed a vehicle multi-modal intention prediction method that can overcome the defects of low coding efficiency, insufficient generalization performance, unfavorable CNN learning of the coding scheme, and incapability of solving the intention multi-modal problem, and will now describe in detail the vehicle multi-modal intention prediction method.
Referring to fig. 1, fig. 1 is a schematic view of a scene for multi-modal vehicle intention prediction, where the scene includes a model training device 101 and a vehicle 102, and a communication connection is established between the vehicle 102 and the model training device 101, where the communication connection is not limited here, and may be a connection using a wireless communication technology or a connection combining a wireless communication technology and a wired communication technology, for example, the vehicle 102 is connected to a signal relay station (e.g., a base station) through the wireless communication technology, and the relay station is connected to the model training device 101 through the wired communication technology.
The wireless communication technology may be the second-generation mobile communication technology (2G), the third-generation mobile communication technology (3G), Long Term Evolution (LTE), the fourth-generation mobile communication technology (4G), the fifth-generation mobile communication technology (5G), Wireless Fidelity (Wi-Fi), Bluetooth, ZigBee, another existing communication technology, or a communication technology developed later, and so on.
The model training device 101 may be a device with high computing power, such as a server, a server cluster composed of multiple servers, or a cloud-based virtual machine. The model training apparatus 101 trains the travel data of the vehicle by a convolutional neural network, thereby obtaining an intention prediction model for predicting the travel intention of the vehicle.
The structure of the convolutional neural network may take various forms. Fig. 2 illustrates a convolutional neural network structure provided by some embodiments; its input is a tensor of shape channel number × image height × image width. In fig. 2, cov denotes a convolutional layer, 7 denotes the size of the convolution kernel, "norm relu" denotes that data passes through batch normalization and then through a relu activation, and pooling denotes a pooling layer. Flatten denotes unfolding the tensor into a one-dimensional vector, and fully connect denotes a fully connected layer. The convolutional neural network used in the present application may be assembled as needed from existing convolutional neural network modules in the deep learning field (for example, the modules in fig. 2, or convolution kernels of other sizes) and is not limited to the specific architecture in fig. 2.
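As a rough illustration of how such a stack transforms the input tensor, the sketch below tracks spatial dimensions through an assumed sequence of conv(7) and 2×2 pooling stages. The layer count, channel widths, stride, and padding here are illustrative assumptions, not the architecture of fig. 2:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution (also used for pooling).
    return (size + 2 * padding - kernel) // stride + 1

def feature_shape(h, w, c_in, stages=((7, 16), (7, 32))):
    """Track tensor shape through an assumed conv(7) -> 2x2 pool stack."""
    c = c_in
    for kernel, c_out in stages:
        h, w = conv_out(h, kernel), conv_out(w, kernel)  # cov + norm + relu
        h, w = h // 2, w // 2                            # 2x2 pooling
        c = c_out
    return c, h, w, c * h * w  # last entry: flattened vector length
```

For example, a 64×64 input with 5 data channels (one per information type and exit direction) passes through the two assumed stages and flattens to a one-dimensional vector fed to the fully connected layer.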
The vehicle 102 may be a vehicle equipped with an onboard device, such as an automobile, a bicycle, an electric vehicle, or the like. The driving data input to the model training device 101 for training may come from a plurality of pieces of driving data generated by one vehicle at different times, or from different pieces of driving data generated by a plurality of vehicles. After the model training device 101 trains the intention prediction model, when a piece of driving data generated by a certain vehicle is input to the model training device 101, the driving intention of that vehicle can be predicted.
In the embodiment of the application, after the vehicle 102 acquires the driving data, the driving data may be sent to the model training device 101, after the model training device 101 receives a certain amount of driving data, an intention prediction model is generated based on the certain amount of driving data, and then the intention prediction model is sent to the prediction device, wherein the prediction device may be a vehicle, a virtual machine deployed in the cloud, or a server cluster composed of a plurality of servers; for example, when the prediction device is a vehicle, the vehicle predicts the intention of the vehicle through the intention prediction model, and then performs subsequent control decision based on the intention of the vehicle. For another example, when the prediction device is a virtual machine deployed in the cloud, after the cloud virtual machine predicts the intention of the vehicle through the intention prediction model, the intention information of the vehicle may be sent to the vehicle, so that the vehicle may make subsequent control decisions.
It is understood that the vehicle 102 transmitting the travel data may be the same vehicle as the vehicle 102 making the control decision based on the intention information, or may be a different vehicle. The scene shown in fig. 1 is illustrated by taking 3 vehicles as an example, and the three vehicles may be vehicles which appear at the intersection at the same time or vehicles which appear at the intersection at different times.
In the embodiment of the present application, when the travel intention of the vehicle is predicted by the vehicle based on the intention prediction model, the travel intention of the other vehicle may be predicted specifically for the vehicle, or the travel intention of the own vehicle may be predicted for the vehicle (in this case, it may be considered that the own vehicle plans a route for the own vehicle). The method can be used in many scenes for timely and accurately predicting the driving intention of the vehicle, and by taking an automatic driving scene as an example, judgment can be made in a dangerous scene and emergency safety measures can be taken to guarantee the safety of the vehicle and avoid collision by predicting the driving intention of the vehicle.
As shown in fig. 3, in an automatic driving scenario, after obtaining a trained intent prediction model, a vehicle senses historical behavior information of another vehicle through a sensor (such as a camera, a radar, etc.), then information of a high-precision map of a current intersection area of the vehicle is fused with the sensed historical behavior information of the other vehicle, and then the fused data is input to the intent prediction model to predict driving intent of the other vehicle, so that a route is planned according to the driving intent of the other vehicle, which is beneficial to driving navigation and vehicle control.
As shown in fig. 4, the prediction stage involves extracting the information to be encoded, which mainly comes from two sources: on one hand, the collected historical behavior information generated during the driving of the vehicle, and on the other hand, road structure information about the intersection in a high-precision map. After the information is extracted, it is encoded; for example, information related to the vehicle travel track in the historical behavior information is encoded, and information such as the virtual lane lines and exit areas of the intersection in the high-precision map is encoded (which may be referred to as road structure encoding). Finally, the image obtained by encoding, that is, the target image sample, is input to the previously obtained intention prediction model to estimate the intention probability (that is, prediction).
For better understanding of the scheme of the embodiment of the present application, the method flow of the present application is described below with reference to fig. 5.
Referring to fig. 5, fig. 5 is a schematic flowchart of a vehicle multi-modal intention prediction method provided by an embodiment of the present application, where the method is applied to the scenario shown in fig. 1, and the method includes, but is not limited to, the following steps:
step S501: the model training apparatus acquires a plurality of pieces of travel data.
Each piece of driving data includes first reference information and a first coordinate value of a track along which a first vehicle drives through an intersection; that is, each of the plurality of pieces of driving data includes first reference information and a first coordinate value. Different pieces of driving data may include the same or different first reference information, and the same or different first coordinate values, because for any piece of driving data, the specific first reference information and first coordinate value it includes are determined by the driving behavior of the first vehicle described by that driving data, and the driving behaviors of different first vehicles often differ from one another.
The specific intersection is not limited, and may be any intersection where vehicles can travel. Optionally, the first vehicles described by the plurality of pieces of driving data may be the same vehicle or different vehicles, and the trajectories to which the first coordinate values included in the plurality of pieces of driving data belong may be from the same intersection or from different intersections.
First, for any piece of driving data, the first coordinate value of a track included therein usually includes a plurality of coordinate values, for example, the driving track is sampled, and the sampled coordinate value is the first coordinate value.
Optionally, the origin of the coordinate system where the first coordinate value of the one track is located is the starting point of the one track.
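As an illustration of the two points above — sampling a track into first coordinate values and taking the track starting point as the coordinate origin — a minimal Python sketch (the sampling step and function name are assumptions for illustration, not part of the embodiment):

```python
def encode_track(raw_points, step=2):
    """Sample a recorded track and express the sampled points (the
    "first coordinate values") in a coordinate system whose origin is
    the starting point of the track."""
    sampled = raw_points[::step]        # keep every `step`-th recorded point
    x0, y0 = raw_points[0]              # track starting point becomes the origin
    return [(x - x0, y - y0) for (x, y) in sampled]

# A short recorded track (metres, map frame) and its encoded form:
track = [(10.0, 20.0), (11.0, 20.5), (12.5, 21.0), (14.0, 22.0), (16.0, 23.5)]
coords = encode_track(track)            # [(0.0, 0.0), (2.5, 1.0), (6.0, 3.5)]
```

The first sampled point maps to the origin (0, 0), matching the optional origin choice above.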
Secondly, for any piece of driving data, the included first reference information specifically includes a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the one track.
For example, if the intersection has four exit directions, respectively expressed as exit direction 1, exit direction 2, exit direction 3, and exit direction 4, and the first reference information includes the second coordinate value, then the first reference information specifically includes, for the intersection, the second coordinate value of the virtual lane line leading to exit direction 1, the second coordinate value of the virtual lane line leading to exit direction 2, the second coordinate value of the virtual lane line leading to exit direction 3, and the second coordinate value of the virtual lane line leading to exit direction 4. In addition, for any one exit direction, the second coordinate values of the virtual lane line in that exit direction are obtained by sampling the virtual lane line, and therefore there are usually a plurality of them. Optionally, the second coordinate values may be aggregated according to the exit attribute, with the virtual lane lines of the same exit aggregated into one group to facilitate subsequent processing.
Similarly, if the intersection has four exit directions, respectively expressed as exit direction 1, exit direction 2, exit direction 3, and exit direction 4, and the first reference information includes the third coordinate value, then the first reference information specifically includes, for the intersection, the third coordinate value of the exit area in exit direction 1, the third coordinate value of the exit area in exit direction 2, the third coordinate value of the exit area in exit direction 3, and the third coordinate value of the exit area in exit direction 4. In addition, for any one exit direction, the third coordinate values of the exit area in that exit direction are obtained by sampling the exit area, and therefore there are usually a plurality of them. Optionally, the center position of the exit area is sampled as one third coordinate value, and some other positions deviating from the center position are further collected around it as additional third coordinate values.
Optionally, if the data channel is also used as a dimension of a coordinate system, the first coordinate value, the second coordinate value, and the third coordinate value may be coordinate values in different coordinate systems, respectively; if the data channel is not considered as a dimension of a coordinate system, the first coordinate value, the second coordinate value, and the third coordinate value may be coordinate values in the same coordinate system.
In the embodiment of the application, the initial source of much information of the intersection (such as the virtual lane line, the exit area, and the center position of the exit area) can be a high-precision map, so that the information is preset or selected by a corresponding algorithm, and the information can be regarded as some known information. However, in the present embodiment, the initial information in these high-precision maps is converted with reference to the travel data, for example, in the high-precision maps, the center positions of the virtual lane line, the exit area, and the high-precision maps have the original coordinate origin as a reference, but the present embodiment needs to convert the coordinate origin of the center positions of the virtual lane line, the exit area, and the exit area into the starting point of the one track, that is, to map the content in one coordinate system into another coordinate system; thereby, coordinate values of the virtual lane line, the exit area, and the center position of the exit area in the driving data in the embodiment of the present application are obtained.
Optionally, the exit area of a certain exit in the high-precision map may be determined by a center position and two edge positions. The two edge positions may be located on the two edges of the road at the exit; the two edge positions and the center position are collinear, and their connection line is perpendicular to the lane line of the exit. In addition, the distance from the connection line to the center of the intersection is less than or equal to a set value, for example 5 meters or 6 meters, which may be set according to actual needs. Then, an ellipse is drawn with the center position as its center and the connection line between the two edge positions as its major axis; the minor axis used for drawing the ellipse may be a value preset as needed, for example 3 meters, or may be a preset ratio of the major axis, such as 2/3 or 1/2 of the major axis. The elliptical area is the exit area of the exit.
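The ellipse construction described here can be sketched as follows; the boundary point count and the default minor-axis ratio are illustrative assumptions:

```python
import math

def exit_area_ellipse(center, edge_a, edge_b, minor_ratio=0.5, n=16):
    """Approximate an exit area as an ellipse whose major axis is the
    segment between the two road-edge positions and whose minor axis is
    a preset ratio of the major axis; returns n boundary points."""
    cx, cy = center
    ax = (edge_b[0] - edge_a[0]) / 2.0      # semi-major axis vector
    ay = (edge_b[1] - edge_a[1]) / 2.0
    a = math.hypot(ax, ay)                  # semi-major length
    b = a * minor_ratio                     # semi-minor length
    theta = math.atan2(ay, ax)              # orientation of the major axis
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        px, py = a * math.cos(t), b * math.sin(t)   # axis-aligned ellipse
        # rotate by theta, then translate to the centre position
        pts.append((cx + px * math.cos(theta) - py * math.sin(theta),
                    cy + px * math.sin(theta) + py * math.cos(theta)))
    return pts
```

For example, with the center at the origin and edge positions at (-4, 0) and (4, 0), the sketch yields an ellipse with semi-major axis 4 and semi-minor axis 2.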
Optionally, each piece of driving data may further include a driving speed and/or a driving orientation on one track when the first vehicle travels through the intersection, for example, the driving speed and/or the driving orientation at the above-mentioned first coordinate value.
Step S502: and the model training equipment performs rasterization coding on the multiple pieces of driving data to obtain multiple image samples.
Since the macroscopic world is continuous and is usually difficult to accurately represent by using a certain model, in practical calculation, continuous information in the macroscopic world can be discretized, for example, a continuous lane line needs to be represented by using a discrete point sequence. Rasterization encoding is a discrete representation of the macroscopic world. A map area is rasterized, for example, divided into a grid of 0.1m by 0.1m, and projected onto an image at a certain scaling, so that one pixel on the image can be used to represent one grid of the real world, and the pixel value can represent specific element information of the real world, such as lane lines, vehicles, and the like.
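The grid projection described above can be sketched as follows (the cell size follows the 0.1 m example; function names and the single fill value are illustrative assumptions):

```python
def world_to_pixel(x, y, cell=0.1, origin=(0.0, 0.0)):
    """Map a world coordinate (metres) to a grid cell index for a raster
    whose cells are `cell` x `cell` metres."""
    return (int((x - origin[0]) / cell), int((y - origin[1]) / cell))

def rasterize(points, h, w, cell=0.1, value=1.0):
    """Paint each world point into an h x w single-channel grid; the
    pixel value stands for the element present in that real-world cell."""
    grid = [[0.0] * w for _ in range(h)]
    for (x, y) in points:
        col, row = world_to_pixel(x, y, cell)
        if 0 <= row < h and 0 <= col < w:
            grid[row][col] = value
    return grid

# A point at (0.25 m, 0.15 m) lands in cell (col 2, row 1) of a 4x4 grid:
grid = rasterize([(0.25, 0.15)], 4, 4)
```

Different element types (lane lines, vehicles, etc.) would each be painted into their own channel rather than sharing one grid.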
In the embodiment, one piece of driving data is used for rasterization coding to obtain one image sample, so that a plurality of pieces of driving data can be rasterized to obtain a plurality of image samples. Taking a piece of driving data as an example, different types of data in a piece of driving data and data corresponding to different exit directions are respectively encoded into different data channels in the image sample.
For example, the coordinate value of the one track, the coordinate value of the virtual lane line, and the coordinate value of the exit area are three different types of data, and when the travel data further includes a travel speed and a relative travel direction, the travel speed and the relative travel direction belong to two different types of data. For another example, if there are 4 different exit directions at the intersection, the virtual lane lines of the four different exit directions correspond to four data channels (one exit direction corresponds to one data channel) respectively, regarding the virtual lane line alone. Further, since the relative orientation is relative to the exit direction, if there are 4 different exit directions, there are four different relative travel orientations, and it is necessary to correspond to four data channels, respectively.
Therefore, if a piece of travel data includes a first coordinate value, a second coordinate value corresponding to four exit directions, respectively, and a third coordinate value corresponding to four exit directions, the piece of travel data needs to be encoded on an image sample having 9 data channels, considering both the data type and the exit direction dimensions.
Similarly, if a piece of driving data includes a first coordinate value, a second coordinate value corresponding to four exit directions, a third coordinate value corresponding to four exit directions, a driving speed, and a relative driving direction corresponding to four exit directions, the piece of driving data needs to be encoded onto an image sample having 14 data channels, considering the two dimensions of the data type and the exit directions.
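The channel bookkeeping in the two examples above can be written out as a small sketch (the function name is illustrative):

```python
def channel_count(n_exits, has_speed=False, has_orientation=False):
    """Channels: 1 for the track (first coordinate values), one virtual-
    lane-line channel per exit, one exit-area channel per exit, optionally
    1 speed channel, and one relative-orientation channel per exit."""
    n = 1 + n_exits + n_exits
    if has_speed:
        n += 1
    if has_orientation:
        n += n_exits
    return n

channel_count(4)              # -> 9 data channels
channel_count(4, True, True)  # -> 14 data channels
```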
For convenience of understanding, in the following, taking one of the plurality of pieces of travel data as an example, it is described how to perform the rasterization encoding on the plurality of pieces of travel data, and therefore, the first travel data may be regarded as any one of the plurality of pieces of travel data, that is, the processing procedure of any one of the plurality of pieces of travel data is the same as the processing procedure of the first travel data described herein, and the processing procedure of the first travel data is as follows:
firstly, determining a first scaling factor according to the specification of a reference image and the starting point of a track in first driving data; optionally, the reference image specification may be regarded as a set desired image specification, and the reference image specification at least includes two parameters, namely, an image height (height) and an image width (width), and can represent the size of the reference image. It should be noted that, in the process of determining the first scaling factor, in addition to the specification of the reference image and the starting point of the track in the first driving data, other information may be used, for example, the third coordinate value of each exit area of the intersection. Optionally, the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 × max(|P, G_1|, ..., |P, G_N|))
wherein scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| represents the distance between the starting point P of the track in the first driving data and the i-th exit area G_i of the intersection;
wherein i is an integer between 1 and N, and N is the number of outlet areas of the intersection.
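Assuming the first scaling factor is chosen so that the farthest exit area still fits inside the reference image when the track start is near the image centre (one plausible reading of the symbols defined above; the embodiment's exact formula may differ), a Python sketch:

```python
import math

def first_scale_factor(h, w, start, exit_centers):
    """Scale so that the farthest exit area G_i, at distance |P, G_i|
    from the track start P, maps inside the h x w reference image.
    The min/max form here is an assumption for illustration."""
    d_max = max(math.dist(start, g) for g in exit_centers)
    return min(h, w) / (2.0 * d_max)
```

For a 224 x 224 reference image with the farthest exit area 100 m from the track start, this gives a scale of 1.12.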
Then, the first coordinate value of the track in the first running data and the first reference information are subjected to scaling processing according to the first scaling factor, and a fourth coordinate value and second reference information are obtained. Specifically, if the first reference information includes the second coordinate value and does not include the third coordinate value, the first coordinate value and the second coordinate value need to be scaled according to the first scaling factor; if the first reference information comprises the third coordinate value and does not comprise the second coordinate value, the first coordinate value and the third coordinate value need to be subjected to scaling processing according to the first scaling factor; if the first reference information includes the second coordinate value and the third coordinate value, the first coordinate value, the second coordinate value, and the third coordinate value need to be scaled according to the first scaling factor.
For example, if the calculated first scaling factor scale is 0.85, the first coordinate value is (54, 66), and the first reference information includes the second coordinate value (84, 26) and the third coordinate value (28, 96), then scaling the first coordinate value (54, 66) yields the fourth coordinate value (45.9, 56.1), scaling the second coordinate value (84, 26) yields the fifth coordinate value (71.4, 22.1), and scaling the third coordinate value (28, 96) yields the sixth coordinate value (23.8, 81.6).
Next, an image sample corresponding to the first travel data is generated, wherein the image sample includes a first data channel and at least one second data channel, a position represented by the fourth coordinate value on the first data channel is filled with a numerical value, and a position represented by the second reference information on the at least one second data channel is filled with a numerical value.
Optionally, on the first data channel, the later the time corresponding to a first coordinate value, the larger the value filled at the position represented by the fourth coordinate value obtained by scaling that first coordinate value. Since the one track is formed by a plurality of positions generated in sequence, different first coordinate values on the track have a temporal order, and a first coordinate value corresponding to a later time is closer to the end point of the track. Setting the values filled into the data channel to change gradually represents how far each coordinate value is from the track end point (generally close to the exit); that is, the vehicle track end point is emphasised (accentuated) by brightness or colour depth, because the track data at the exit, compared with the track data at the starting position, represents the vehicle's actual choice of exit and therefore carries greater weight/importance within the whole track data. With this arrangement, when the intention prediction model based on a convolutional neural network architecture is trained with such data, the model develops a stronger learning/recognition capability for image data near the track end point, so that the intention prediction model obtained on this basis is more accurate when used.
For example, if the track includes four first coordinate values, sequentially represented as first coordinate value 1, first coordinate value 2, first coordinate value 3, and first coordinate value 4, where first coordinate value 1 corresponds to the latest time, first coordinate value 2 the next latest, first coordinate value 3 the next, and first coordinate value 4 the earliest; the fourth coordinate value obtained by scaling first coordinate value 1 is represented as fourth coordinate value 1, the fourth coordinate value obtained by scaling first coordinate value 2 as fourth coordinate value 2, the fourth coordinate value obtained by scaling first coordinate value 3 as fourth coordinate value 3, and the fourth coordinate value obtained by scaling first coordinate value 4 as fourth coordinate value 4. Then, the value filled at fourth coordinate value 1 is the largest, the value filled at fourth coordinate value 2 the next largest, the value filled at fourth coordinate value 3 the next, and the value filled at fourth coordinate value 4 the smallest.
In addition, in the encoding process, a connection line can be obtained by connecting a plurality of fourth coordinate values, and then the connection line is drawn (i.e. encoded) on the first data channel of the image sample.
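Filling the first data channel with values that grow toward the track end can be sketched as follows (a minimal illustration; the linear ramp is one assumed choice of "gradually changing" values):

```python
def fill_track_channel(h, w, pixel_points):
    """pixel_points are (col, row) positions in time order; points later
    in the track (closer to the exit) receive larger fill values, so the
    track end point is emphasised on the channel."""
    chan = [[0.0] * w for _ in range(h)]
    n = len(pixel_points)
    for k, (col, row) in enumerate(pixel_points):
        if 0 <= row < h and 0 <= col < w:
            chan[row][col] = (k + 1) / n    # ramps from 1/n up to 1.0
    return chan

# Three track points in time order along a diagonal of a 3x3 channel:
c = fill_track_channel(3, 3, [(0, 0), (1, 1), (2, 2)])
```

The last point receives the full value 1.0, matching the emphasis on the track end point described above.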
Optionally, on the second data channel, the value filled at the position indicated by the second reference information obtained by scaling the first reference information closer to the exit area or the center position of the exit area is larger. Therefore, for the case where the first reference information includes the second coordinate value and the third coordinate value (the principle is the same for other cases), the scheme may be: on the second data channel, the position represented by the fifth coordinate value obtained by scaling the second coordinate value closer to the outlet area is filled with a larger numerical value, and the position represented by the sixth coordinate value obtained by scaling the third coordinate value closer to the center of the outlet area is filled with a larger numerical value.
For example, if there are four second coordinate values of the virtual lane line in a certain exit direction of the intersection, sequentially represented as second coordinate value 1, second coordinate value 2, second coordinate value 3, and second coordinate value 4, where second coordinate value 1 is closest to the exit area in that exit direction, second coordinate value 2 the next closest, second coordinate value 3 the next, and second coordinate value 4 the farthest from the exit area in that exit direction; the fifth coordinate value obtained by scaling second coordinate value 1 is represented as fifth coordinate value 1, the fifth coordinate value obtained by scaling second coordinate value 2 as fifth coordinate value 2, the fifth coordinate value obtained by scaling second coordinate value 3 as fifth coordinate value 3, and the fifth coordinate value obtained by scaling second coordinate value 4 as fifth coordinate value 4. Then, the value filled at fifth coordinate value 1 is the largest, the value filled at fifth coordinate value 2 the next largest, the value filled at fifth coordinate value 3 the next, and the value filled at fifth coordinate value 4 the smallest. The filling principle for the values obtained by scaling the second coordinate values of the virtual lane lines in the other exit directions of the intersection follows by analogy.
For another example, if the exit area in an exit direction of the intersection includes four third coordinate values, sequentially represented as third coordinate value 1, third coordinate value 2, third coordinate value 3, and third coordinate value 4, where third coordinate value 1 is closest to the center position of the exit area, third coordinate value 2 the next closest, third coordinate value 3 the next, and third coordinate value 4 the farthest from the center position of the exit area; the sixth coordinate value obtained by scaling third coordinate value 1 is represented as sixth coordinate value 1, the sixth coordinate value obtained by scaling third coordinate value 2 as sixth coordinate value 2, the sixth coordinate value obtained by scaling third coordinate value 3 as sixth coordinate value 3, and the sixth coordinate value obtained by scaling third coordinate value 4 as sixth coordinate value 4. Then, the value filled at sixth coordinate value 1 is the largest, the value filled at sixth coordinate value 2 the next largest, the value filled at sixth coordinate value 3 the next, and the value filled at sixth coordinate value 4 the smallest. The filling principle for the values obtained by scaling the third coordinate values of the exit areas in the other exit directions of the intersection follows by analogy.
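The closer-to-the-target-larger-value fill used for the second data channels can be sketched as follows (the helper name and the linear distance falloff are illustrative assumptions):

```python
import math

def fill_by_distance(h, w, pixel_points, target, max_dist):
    """Fill a second data channel: positions closer to `target` (the exit
    area, or the exit-area centre) receive larger values, falling off
    linearly with distance and clipped at zero."""
    chan = [[0.0] * w for _ in range(h)]
    for (col, row) in pixel_points:
        if 0 <= row < h and 0 <= col < w:
            d = math.dist((col, row), target)
            chan[row][col] = max(0.0, 1.0 - d / max_dist)
    return chan

# Two points: one at the target, one 5 cells away (falloff over 10 cells):
ch = fill_by_distance(5, 5, [(0, 0), (3, 4)], (0, 0), 10.0)
```

The point at the target is filled with 1.0 and the farther point with a smaller value, matching the gradient described above.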
Optionally, if the first reference information includes the second coordinate value and the third coordinate value, the fifth coordinate value and the sixth coordinate value obtained after scaling are respectively used in different second data channels to fill data.
Optionally, second reference information obtained by scaling the first reference information corresponding to different exit directions is respectively used for filling data in different second data channels.
For example, if the first reference information includes 4 second coordinate values in the outlet direction, which are respectively represented as a second coordinate value in the direction 1, a second coordinate value in the direction 2, a second coordinate value in the direction 3, and a second coordinate value in the direction 4, then a fifth coordinate value scaled by the second coordinate value in the direction 1, a fifth coordinate value scaled by the second coordinate value in the direction 2, a fifth coordinate value scaled by the second coordinate value in the direction 3, and a fifth coordinate value scaled by the second coordinate value in the direction 4 are respectively used for filling data in 4 different second data channels.
For another example, if the first reference information includes 4 third coordinate values in the exit direction, which are respectively represented as a third coordinate value in the direction 1, a third coordinate value in the direction 2, a third coordinate value in the direction 3, and a third coordinate value in the direction 4, then a sixth coordinate value scaled by the third coordinate value in the direction 1, a sixth coordinate value scaled by the third coordinate value in the direction 2, a sixth coordinate value scaled by the third coordinate value in the direction 3, and a sixth coordinate value scaled by the third coordinate value in the direction 4 are respectively used for filling data in 4 different second data channels.
In addition, in the encoding process, a connection line may be obtained by connecting a plurality of fifth coordinate values (corresponding to the same outlet direction), and then the connection line may be drawn (i.e., encoded) onto a second data channel of the image sample. It is also possible to connect a plurality of sixth coordinate values (corresponding to the same exit direction) to obtain a connection line, and then draw (i.e., encode) the connection line on the second data channel of the image sample.
Optionally, for any piece of driving data, the driving data further includes a driving speed of the first vehicle on the track, and the one image sample further includes a third data channel, and a position represented by the fourth coordinate value on the third data channel fills the driving speed. The travel speed mentioned here may be one or more, and when a plurality of the travel speeds are, for example, instantaneous speeds acquired at positions of the first coordinate values included in the travel data, specifically, a plurality of instantaneous speeds, that is, a plurality of travel speeds, may be acquired with a plurality of the first coordinate values. Then, at the time of filling, if there are 4 first coordinate values expressed as a first coordinate value 1, a first coordinate value 2, a first coordinate value 3, and a first coordinate value 4, respectively, wherein a fourth coordinate value obtained by scaling the first coordinate value 1 represents a fourth coordinate value 1, a fourth coordinate value obtained by scaling the first coordinate value 2 represents a fourth coordinate value 2, a fourth coordinate value obtained by scaling the first coordinate value 3 represents a fourth coordinate value 3, and a fourth coordinate value obtained by scaling the first coordinate value 4 represents a fourth coordinate value 4, then the position represented by the fourth coordinate value 1 on the third data channel is filled with the travel speed acquired at the first coordinate value 1, the position represented by the fourth coordinate value 2 on the third data channel is filled with the travel speed acquired at the first coordinate value 2, and the position represented by the fourth coordinate value 3 on the third data channel is filled with the travel speed acquired at the first coordinate value 3, the position on the third data channel represented by the fourth coordinate value 4 is filled with the travel speed acquired at the first coordinate value 4. 
The speed here may be an absolute value of the speed.
Optionally, for any piece of driving data, the driving data further includes a plurality of relative driving orientations of the first vehicle on the track, and the one image sample further includes a plurality of fourth data channels, and a position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving orientations, wherein the plurality of relative driving orientations are driving orientations relative to different exit directions of the intersection respectively. For example, the plurality of relative travel orientations may be specifically a plurality of instantaneous orientations acquired at the position of the first coordinate value included in the travel data, and thus, with a plurality of first coordinate values, a plurality of sets of a plurality of instantaneous orientations may be acquired. If there are 4 exit directions at the intersection, which respectively represent exit direction 1, exit direction 2, exit direction 3, and exit direction 4, then the relative driving directions collected at the first coordinate value are specifically 4, where 1 relative driving direction is a direction obtained with reference to exit direction 1, 1 relative driving direction is a direction obtained with reference to exit direction 2, 1 relative driving direction is a direction obtained with reference to exit direction 3, and 1 relative driving direction is a direction obtained with reference to exit direction 4. And a plurality of first coordinate values are arranged on one track, so that a plurality of groups of relative driving directions exist, and each first coordinate value corresponds to one group of relative driving directions.
In the following, it is described how to fill in values (the principle is the same for the other first coordinate values) by taking one of the first coordinate values as an example, if the first coordinate value is scaled to obtain a fourth coordinate value, then the position represented by the fourth coordinate value on the fourth data channel 1 fills in the relative driving orientation acquired at the first coordinate value with respect to the outlet direction 1, the position represented by the fourth coordinate value on the fourth data channel 2 fills in the relative driving orientation acquired at the first coordinate value with respect to the outlet direction 2, the position represented by the fourth coordinate value on the fourth data channel 3 fills in the relative driving orientation acquired at the first coordinate value with respect to the outlet direction 3, and the position represented by the fourth coordinate value on the fourth data channel 4 fills in the relative driving orientation acquired at the first coordinate value with respect to the outlet direction 4. The fourth data channel 1, the fourth data channel 2, the fourth data channel 3, and the fourth data channel 4 are four different fourth data channels, respectively.
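Computing the per-exit relative driving orientations at one track point can be sketched as follows (degrees and the wrap convention are illustrative assumptions):

```python
def relative_orientations(heading, exit_bearings):
    """Vehicle heading expressed relative to the bearing of each exit
    direction, wrapped to (-180, 180] degrees; one value per exit,
    i.e. one value per fourth data channel."""
    rel = []
    for bearing in exit_bearings:
        d = (heading - bearing) % 360.0
        if d > 180.0:
            d -= 360.0
        rel.append(d)
    return rel

# Heading 90 degrees against four exits at bearings 0, 90, 180, 270:
relative_orientations(90.0, [0.0, 90.0, 180.0, 270.0])  # -> [90.0, 0.0, -90.0, 180.0]
```

Each entry of the returned list would be filled at the scaled track position on its own fourth data channel.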
It should be noted that, for any piece of traveling data, it may include one or both of the traveling speed and the traveling direction, or neither of them.
From the above description, each of the obtained image samples is actually a tensor of shape (number of data channels, image height, image width).
It should be noted that steps S501 and S502 may also be performed by a target device specially configured to perform this data processing; the target device then sends the processed data to the model training device, so that the model training device does not need to process a large amount of raw data itself and only needs to train the intention prediction model based on the received data, that is, the plurality of image samples. Optionally, the target device may be deployed in the cloud, which has relatively high computing power and can efficiently complete the processing of a large amount of data.
Step S503: the model training device trains a vehicle intention prediction model through the plurality of sample data.
Wherein each sample data in the plurality of sample data includes one image sample among the plurality of image samples and one piece of label information, and the piece of label information is used to represent the real situation of the first vehicle in that image sample driving toward one of the exits of the intersection; therefore, the plurality of sample data can be obtained based on the plurality of image samples.
In the embodiment of the application, since one image sample is encoded by one piece of driving data, the first vehicle in one image sample specifically refers to the first vehicle involved in one piece of driving data corresponding to the one image sample, for example, the one piece of driving data describes information such as a driving track, a driving speed, a driving direction and the like of a certain vehicle at an intersection, and then the first vehicle in one image sample encoded by the one piece of driving data refers to the certain vehicle.
If the intersection has four exits, namely exit 1, exit 2, exit 3, and exit 4, the label information in a sample data includes four label values corresponding to the four exit directions, one label value per exit direction, represented as: label information = (exit 1 label value, exit 2 label value, exit 3 label value, exit 4 label value). It should be noted that whichever exit among the four is the real exit of the first vehicle involved in the image sample of the sample data, the label value of that exit is labeled as a first value, and the label values of the remaining non-real exits are labeled as a second value. Taking the first value as 1, the second value as 0, exit 2 as the real exit, and exits 1, 3, and 4 as non-real exits as an example, the label information in the sample data is (0, 1, 0, 0).
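The label construction described above amounts to a one-hot encoding; a minimal sketch follows, where the function name and the zero-based exit indexing are illustrative assumptions:

```python
def make_label(true_exit_index: int, num_exits: int = 4) -> list:
    # The real exit gets the first value (1); every non-real exit
    # gets the second value (0). Exits are indexed from 0 here.
    return [1 if i == true_exit_index else 0 for i in range(num_exits)]
```

For the example above, with exit 2 (index 1) as the real exit, `make_label(1)` yields the label values (0, 1, 0, 0) as a list.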
In the embodiment of the present application, the real exit of the first vehicle involved in an image sample is actually the exit indicated by the direction of the end point of the trajectory in the piece of driving data from which the image sample was encoded. Therefore, when the real exit and the non-real exits of the first vehicle involved in each image sample are known, the label information corresponding to each image sample can be determined, and a piece of sample data can then be determined based on the image sample and its corresponding label information.
In the embodiment of the present application, training on a plurality of sample data essentially learns the deep relationship between the information of each dimension of an image sample and the label information. Fig. 6 illustrates an algorithm framework of the training process, taking an intersection with three exit directions as an example. There is 1 data channel corresponding to the fourth coordinate values (related to the trajectory), 3 data channels corresponding to the 3 fifth coordinate values (related to the virtual lane lines) of the 3 exit directions, 3 data channels corresponding to the 3 sixth coordinate values (related to the exit areas) of the 3 exit directions, 3 data channels corresponding to the 3 relative driving orientations of the 3 exit directions, and 1 data channel corresponding to the driving speed, for a total of 11 data channels. An image sample comprising these 11 data channels is input to the neural network; specifically, the 11 data channels are combined by exit to obtain 3 sets of input tensors corresponding to the 3 exit directions, as follows:
the input tensor corresponding to the exit direction 1 includes a first data channel, a second data channel 11, a second data channel 12, a third data channel, and a fourth data channel, where the first data channel is a data channel filled with a fourth coordinate value (obtained by transforming the trajectory of the first vehicle), the second data channel 11 is a data channel filled with a fifth coordinate value (obtained by transforming the virtual lane line in the exit direction 1), the second data channel 12 is a data channel filled with a sixth coordinate value (obtained by transforming the exit area in the exit direction 1), the third data channel is a data channel filled with the traveling speed of the first vehicle during traveling, and the fourth data channel is a data channel filled with the relative traveling direction of the first vehicle with respect to the exit direction 1 during traveling.
The input tensor corresponding to the exit direction 2 includes a first data channel, a second data channel 21, a second data channel 22, a third data channel, and a fourth data channel, where the first data channel is a data channel filled with a fourth coordinate value (obtained by transforming the trajectory of the first vehicle), the second data channel 21 is a data channel filled with a fifth coordinate value (obtained by transforming the virtual lane line in the exit direction 2), the second data channel 22 is a data channel filled with a sixth coordinate value (obtained by transforming the exit area in the exit direction 2), the third data channel is a data channel filled with the travel speed of the first vehicle during travel, and the fourth data channel is a data channel filled with the relative travel direction of the first vehicle during travel with respect to the exit direction 2.
The input tensor corresponding to the exit direction 3 includes a first data channel, a second data channel 31, a second data channel 32, a third data channel, and a fourth data channel, where the first data channel is a data channel filled with a fourth coordinate value (obtained by transforming the trajectory of the first vehicle), the second data channel 31 is a data channel filled with a fifth coordinate value (obtained by transforming the virtual lane line in the exit direction 3), the second data channel 32 is a data channel filled with a sixth coordinate value (obtained by transforming the exit area in the exit direction 3), the third data channel is a data channel filled with the traveling speed of the first vehicle during traveling, and the fourth data channel is a data channel filled with the relative traveling direction of the first vehicle with respect to the exit direction 3 during traveling.
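The per-exit regrouping of the 11 data channels described above can be sketched as follows; the exact channel ordering within the sample is an assumption made for illustration, not fixed by the text:

```python
import numpy as np

def split_channels_by_exit(sample: np.ndarray, num_exits: int = 3) -> list:
    """Regroup an 11-channel image sample into one 5-channel input
    tensor per exit direction, as in the three tensors described above.

    Assumed channel order (illustrative):
      0                         -> track (fourth coordinate values)
      1 .. num_exits            -> virtual lane line per exit
      1+num_exits .. 1+2*ne     -> exit area per exit
      1+2*ne .. 1+3*ne          -> relative orientation per exit
      1+3*ne (last)             -> driving speed
    """
    ne = num_exits
    track = sample[0:1]
    lanes = sample[1:1 + ne]
    areas = sample[1 + ne:1 + 2 * ne]
    orients = sample[1 + 2 * ne:1 + 3 * ne]
    speed = sample[1 + 3 * ne:2 + 3 * ne]
    # Each exit's tensor: track, its lane line, its exit area,
    # the speed channel, and its relative orientation.
    return [
        np.concatenate([track, lanes[i:i + 1], areas[i:i + 1],
                        speed, orients[i:i + 1]])
        for i in range(ne)
    ]
```

Each of the three resulting tensors then carries exactly the five channels listed for its exit direction.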
Therefore, after the output obtained by passing each exit's input tensor through the convolutional neural network is subjected to softmax, the intention probability of each exit is predicted; a cross entropy loss function is constructed to represent the difference between the predicted intention probability and the real exit situation represented by the label information, and the convolutional neural network is trained based on the cross entropy loss function, thereby obtaining an intention prediction model with higher prediction accuracy. Subsequently, after an image sample is input into the intention prediction model, the probability of the vehicle driving toward each direction of the intersection, namely the driving intention, can be predicted.
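The softmax and cross entropy computation at the head of the network can be sketched numerically as below (plain NumPy standing in for the actual framework operations; the function names are illustrative):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: one logit per exit, produced by the
    # convolutional network, converted into intention probabilities.
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_loss(logits, label):
    # Cross entropy between the predicted probabilities and the one-hot
    # label information representing the real exit situation.
    probs = softmax(logits)
    label = np.asarray(label, dtype=float)
    return float(-(label * np.log(probs + 1e-12)).sum())
```

The loss is small when the network assigns high probability to the labeled real exit and large otherwise, which is what drives training.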
Optionally, the vehicle intention prediction model may continuously evolve; particularly when training of the intention prediction model is performed in the cloud, the model may be continuously evolved/updated as training data increases, and the updated intention prediction model (or only its parameters) may be sent to the prediction device (e.g., a vehicle) to update the model on the prediction device.
Step S504: the prediction device obtains an intent prediction model.
In this embodiment of the application, the prediction device may be a virtual machine deployed in a cloud, or one server, or a server cluster formed by multiple servers, or a vehicle, and the following description will be given by taking the prediction device as a vehicle as an example.
There are many ways for the vehicle to obtain the vehicle intention prediction model, for example, a wireless or wired communication connection is established between the vehicle and the model training device, so that the intention prediction model sent by the model training device can be received. As another example, the intent prediction model is copied from the model training device to the vehicle by manual means.
Alternatively, the on-board intent prediction model may be updated periodically or aperiodically. For example, updates are made every month, which is a regular update; for another example, the intention prediction model is updated once when a new intention prediction model is generated, which belongs to irregular updating.
Step S505: the prediction apparatus acquires target travel data.
The following description will be given taking the prediction apparatus as a vehicle as an example.
The target driving data includes first reference information and a first coordinate value of a track when the second vehicle drives through the intersection, which has been described above and will not be described herein again.
Optionally, the second vehicle described by the target driving data and the first vehicles described by the plurality of driving data may be the same vehicle or different vehicles, and their data may come from the same intersection or from different intersections.
First, the first coordinate values of the trajectory included in the target driving data are generally plural; for example, the driving trajectory is sampled, and the sampled coordinate values are the first coordinate values.
Optionally, the origin of the coordinate system where the first coordinate value of the one track is located is the starting point of the one track.
Secondly, the first reference information included in the target driving data specifically includes a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the one track.
For example, if the intersection has four exit directions, respectively expressed as exit direction 1, exit direction 2, exit direction 3, and exit direction 4, and the first reference information includes the second coordinate values, then the first reference information specifically includes, for the intersection, the second coordinate values of the virtual lane lines leading to exit directions 1 through 4. In addition, for any exit direction, the second coordinate values of the virtual lane line in that direction are obtained by sampling the virtual lane line, so the number of second coordinate values of the virtual lane line in an exit direction is usually plural.
Similarly, if the first reference information includes the third coordinate values, it specifically includes, for the intersection, the third coordinate values of the exit areas in exit directions 1 through 4. For any exit direction, the third coordinate values of the exit area in that direction are obtained by sampling the exit area, so their number is usually plural. Optionally, the center position of the exit area is sampled as one third coordinate value, and some other positions deviating from the center position are additionally collected around it as further third coordinate values.
In the embodiment of the application, the initial source of much of the intersection information (such as the virtual lane lines, the exit areas, and the center positions of the exit areas) can be a high-precision map, so this information is preset or selected by a corresponding algorithm and can be regarded as known information. However, in the present embodiment, the initial information in the high-precision map is converted with reference to the driving data: in the high-precision map, the virtual lane lines, the exit areas, and the center positions of the exit areas are referenced to the map's original coordinate origin, but in the present embodiment their coordinate origin needs to be converted into the starting point of the trajectory, that is, content in one coordinate system is mapped into another coordinate system; thereby the coordinate values of the virtual lane lines, the exit areas, and the center positions of the exit areas in the target driving data of the embodiment of the present application are obtained.
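The conversion from the map's original origin to the trajectory's starting point is a simple translation of coordinates; a minimal sketch, with hypothetical names:

```python
def to_track_frame(points, track_start):
    """Re-express map-frame (x, y) coordinates relative to the track's
    starting point, i.e. map content from the map coordinate system into
    the coordinate system whose origin is the start of the trajectory."""
    x0, y0 = track_start
    return [(x - x0, y - y0) for (x, y) in points]
```

Applied to the virtual lane lines, exit areas, and exit-area center positions, this yields the coordinate values used in the target driving data.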
Alternatively, the target travel data may further include a driving speed and/or a driving orientation on a trajectory when the second vehicle travels through the intersection, for example, the driving speed and/or the driving orientation at the above-described first coordinate value.
Step S506: and the prediction equipment performs rasterization coding on the target driving data to obtain a target image sample.
The following description will be given taking the prediction apparatus as a vehicle as an example.
The target driving data is rasterized and encoded to obtain a target image sample, where different types of data in the target driving data, and data corresponding to different exit directions, are respectively encoded into different data channels of the target image sample.
For example, the coordinate value of the one track, the coordinate value of the virtual lane line, and the coordinate value of the exit area are three different types of data, and when the target travel data further includes a travel speed and a relative travel direction, the travel speed and the relative travel direction belong to two different types of data. For another example, if there are 4 different exit directions at the intersection, the virtual lane lines of the four different exit directions correspond to four data channels (one exit direction corresponds to one data channel) respectively, regarding the virtual lane line alone. Further, since the relative orientation is relative to the exit direction, if there are 4 different exit directions, there are four different relative travel orientations, and it is necessary to correspond to four data channels, respectively.
Therefore, if the target driving data includes a first coordinate value, a second coordinate value corresponding to four outlet directions, and a third coordinate value corresponding to four outlet directions, the target driving data needs to be encoded on a target image sample having 9 data channels, considering both the data type and the outlet direction.
Similarly, if the target driving data includes first coordinate values, second coordinate values corresponding to four exit directions, third coordinate values corresponding to four exit directions, a driving speed, and relative driving orientations corresponding to four exit directions, then considering the two dimensions of data type and exit direction, the target driving data needs to be encoded onto a target image sample having 14 data channels.
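The channel-count arithmetic in the two cases above can be checked with a small helper (hypothetical, for illustration only):

```python
def channel_count(num_exits: int, with_speed_and_orientation: bool) -> int:
    """Total data channels: track (1) + lane lines (N) + exit areas (N),
    plus speed (1) and relative orientations (N) when present."""
    n = 1 + 2 * num_exits
    if with_speed_and_orientation:
        n += 1 + num_exits
    return n
```

With four exit directions this gives 9 channels without speed/orientation and 14 with them, matching the two cases above; with three exit directions and all data types it gives the 11 channels of the training example.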
For ease of understanding, the following details how the target driving data is rasterized to obtain a target image sample:
firstly, determining a second scaling factor according to the specification of a reference image and the starting point of a track in target driving data; optionally, the reference image specification may be regarded as a set desired image specification, and the reference image specification at least includes two parameters, namely, an image height (height) and an image width (width), and can represent the size of the reference image. It should be noted that, in the process of determining the second scaling factor, in addition to the specification of the reference image and the starting point of the trajectory in the target driving data, other information may also be used, for example, a third coordinate value of each exit area of the intersection, and optionally, the second scaling factor satisfies the following relationship:
scale = min(h, w) / (2 × max{1≤i≤N} |P0, Gi|)
where scale is the second scaling factor, h is the height of the reference image, w is the width of the reference image, and |P0, Gi| represents the distance between the starting point P0 of the trajectory in the target driving data and the i-th exit area Gi of the intersection, where i is an integer between 1 and N, and N is the number of exit areas of the intersection.
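One plausible reading of this relationship — choosing the scale so that even the farthest exit area fits within half the reference image's smaller side — can be sketched as follows; the exact form of the patent's formula is not fully legible in the text, so treat the expression as an assumption:

```python
import math

def second_scale_factor(h, w, track_start, exit_areas):
    """Scale factor so the farthest exit area (distance measured from the
    trajectory start P0) fits inside the h-by-w reference image.
    Assumed form: min(h, w) / (2 * max_i |P0, Gi|)."""
    d_max = max(math.dist(track_start, g) for g in exit_areas)
    return min(h, w) / (2.0 * d_max)
```

For a 100×100 reference image with the farthest exit area 50 units from the trajectory start, this gives a scale of 1.0.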
Then, according to the second scaling factor, the first coordinate values of the trajectory in the target driving data and the first reference information are scaled to obtain fourth coordinate values and second reference information. Specifically, if the first reference information includes the second coordinate values but not the third coordinate values, the first coordinate values and the second coordinate values need to be scaled according to the second scaling factor; if the first reference information includes the third coordinate values but not the second coordinate values, the first coordinate values and the third coordinate values need to be scaled according to the second scaling factor; and if the first reference information includes both the second coordinate values and the third coordinate values, the first coordinate values, the second coordinate values, and the third coordinate values all need to be scaled according to the second scaling factor.
For example, if the calculated second scaling factor scale is 0.85, the first coordinate value is (54, 66), and the first reference information includes the second coordinate value (84, 26) and the third coordinate value (28, 96), then scaling the first coordinate value (54, 66) gives the fourth coordinate value (45.9, 56.1), scaling the second coordinate value (84, 26) gives the fifth coordinate value (71.4, 22.1), and scaling the third coordinate value (28, 96) gives the sixth coordinate value (23.8, 81.6).
Next, the target image sample corresponding to the target driving data is generated, where the image sample includes a first data channel and at least one second data channel, the position represented by the fourth coordinate value on the first data channel is filled with a numerical value, and the position represented by the second reference information on the at least one second data channel is filled with a numerical value.
Optionally, on the first data channel, the later the time corresponding to a fourth coordinate value (obtained by scaling a first coordinate value), the larger the numerical value filled at the position it represents. Since the trajectory is formed by a plurality of positions generated in sequence, different first coordinate values on the trajectory have a temporal order, and a first coordinate value whose corresponding time is later is also a first coordinate value closer to the end point of the trajectory. The values filled in the data channel are set to change gradually so as to represent the distance between the corresponding coordinate values and the trajectory end point (generally close to the exit), that is, the distance is represented by brightness or color depth; this enhances the recognition of coordinates near the trajectory end point, so that the intention prediction model subsequently obtained on this basis has higher accuracy.
For example, if the trajectory includes four first coordinate values, sequentially represented as first coordinate value 1, first coordinate value 2, first coordinate value 3, and first coordinate value 4, where the time corresponding to first coordinate value 1 is the latest, that of first coordinate value 2 is the next latest, that of first coordinate value 3 follows, and the time corresponding to first coordinate value 4 is the earliest; and the fourth coordinate values obtained by scaling first coordinate values 1 to 4 are represented as fourth coordinate value 1 to fourth coordinate value 4, respectively. Then, the value filled at fourth coordinate value 1 is the largest, the value filled at fourth coordinate value 2 is the next largest, the value filled at fourth coordinate value 3 follows, and the value filled at fourth coordinate value 4 is the smallest.
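Filling the first data channel with time-graded values can be sketched as below; the linear grade over (0, 1] and the rounding of scaled coordinates to pixel indices are illustrative choices:

```python
import numpy as np

def fill_track_channel(shape, scaled_points):
    """Fill the first data channel: trajectory points later in time
    (closer to the trajectory end) get larger values, graded linearly.
    `scaled_points` lists the fourth coordinate values (x, y) ordered
    from earliest to latest."""
    channel = np.zeros(shape)
    n = len(scaled_points)
    for k, (x, y) in enumerate(scaled_points):
        # Row index is y, column index is x; later points -> larger value.
        channel[int(round(y)), int(round(x))] = (k + 1) / n
    return channel
```

The latest point thus receives the value 1, and earlier points receive progressively smaller values, which is the brightness gradation described above.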
Optionally, on the second data channel, the value filled at the position indicated by the second reference information obtained by scaling the first reference information closer to the exit area or the center position of the exit area is larger. Therefore, for the case where the first reference information includes the second coordinate value and the third coordinate value (the principle is the same for other cases), the scheme may be: on the second data channel, the position represented by the fifth coordinate value obtained by scaling the second coordinate value closer to the outlet area is filled with a larger numerical value, and the position represented by the sixth coordinate value obtained by scaling the third coordinate value closer to the center of the outlet area is filled with a larger numerical value.
For example, if the virtual lane line in a certain exit direction of the intersection has four second coordinate values, sequentially represented as second coordinate value 1, second coordinate value 2, second coordinate value 3, and second coordinate value 4, where second coordinate value 1 is closest to the exit area in that exit direction, second coordinate value 2 is the next closest, second coordinate value 3 follows, and second coordinate value 4 is farthest from the exit area in that exit direction; and the fifth coordinate values obtained by scaling second coordinate values 1 to 4 are represented as fifth coordinate value 1 to fifth coordinate value 4, respectively. Then, the value filled at fifth coordinate value 1 is the largest, the value filled at fifth coordinate value 2 is the next largest, the value filled at fifth coordinate value 3 follows, and the value filled at fifth coordinate value 4 is the smallest. The filling principle for the values obtained by scaling the second coordinate values of the virtual lane lines in the other exit directions of the intersection follows by analogy.
For another example, if the exit area in a certain exit direction of the intersection includes four third coordinate values, sequentially represented as third coordinate value 1, third coordinate value 2, third coordinate value 3, and third coordinate value 4, where third coordinate value 1 is closest to the center position of the exit area, third coordinate value 2 is the next closest, third coordinate value 3 follows, and third coordinate value 4 is farthest from the center position of the exit area; and the sixth coordinate values obtained by scaling third coordinate values 1 to 4 are represented as sixth coordinate value 1 to sixth coordinate value 4, respectively. Then, the value filled at sixth coordinate value 1 is the largest, the value filled at sixth coordinate value 2 is the next largest, the value filled at sixth coordinate value 3 follows, and the value filled at sixth coordinate value 4 is the smallest. The filling principle for the values obtained by scaling the third coordinate values of the exit areas in the other exit directions of the intersection follows by analogy.
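Filling a second data channel with distance-graded values (larger when closer to the exit area, or to its center position) can be sketched as follows; the 1/(1 + distance) grade is a hypothetical monotone choice used only to illustrate the ordering:

```python
import math
import numpy as np

def fill_reference_channel(shape, scaled_points, target):
    """Fill one second data channel: positions closer to `target` (the
    exit area for lane-line points, or the exit-area center for
    exit-area points) receive larger values."""
    channel = np.zeros(shape)
    for (x, y) in scaled_points:
        # Monotone decreasing in distance to the target.
        value = 1.0 / (1.0 + math.dist((x, y), target))
        channel[int(round(y)), int(round(x))] = value
    return channel
```

The same helper covers both the fifth coordinate values (lane line, target = exit area) and the sixth coordinate values (exit area, target = its center position), filled into separate channels.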
Optionally, if the first reference information includes the second coordinate value and the third coordinate value, the fifth coordinate value and the sixth coordinate value obtained after scaling are respectively used in different second data channels to fill data.
Optionally, second reference information obtained by scaling the first reference information corresponding to different exit directions is respectively used for filling data in different second data channels.
For example, if the first reference information includes second coordinate values in 4 exit directions, respectively represented as the second coordinate value in direction 1, the second coordinate value in direction 2, the second coordinate value in direction 3, and the second coordinate value in direction 4, then the fifth coordinate values scaled from the second coordinate values in directions 1, 2, 3, and 4 are respectively used to fill data in 4 different second data channels.
For another example, if the first reference information includes third coordinate values in 4 exit directions, respectively represented as the third coordinate value in direction 1, the third coordinate value in direction 2, the third coordinate value in direction 3, and the third coordinate value in direction 4, then the sixth coordinate values scaled from the third coordinate values in directions 1, 2, 3, and 4 are respectively used to fill data in 4 different second data channels.
Optionally, the target driving data further includes the driving speed of the second vehicle on the trajectory, and the target image sample further includes a third data channel, where the position represented by a fourth coordinate value on the third data channel is filled with the driving speed. The driving speeds mentioned here may be one or more; when there are more than one, they are, for example, instantaneous speeds collected at the positions of the first coordinate values included in the target driving data, so a plurality of first coordinate values yields a plurality of instantaneous speeds, that is, a plurality of driving speeds. Then, at filling time, if there are 4 first coordinate values, respectively expressed as first coordinate value 1 through first coordinate value 4, and the fourth coordinate values obtained by scaling them are represented as fourth coordinate value 1 through fourth coordinate value 4 respectively, then the position represented by fourth coordinate value 1 on the third data channel is filled with the driving speed collected at first coordinate value 1, the position represented by fourth coordinate value 2 is filled with the driving speed collected at first coordinate value 2, the position represented by fourth coordinate value 3 is filled with the driving speed collected at first coordinate value 3, and the position represented by fourth coordinate value 4 is filled with the driving speed collected at first coordinate value 4.
The speed here may be an absolute value of the speed.
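Filling the third data channel can be sketched as writing each collected instantaneous speed, as an absolute value, at the position of its scaled coordinate; the helper name and pixel rounding are illustrative:

```python
import numpy as np

def fill_speed_channel(shape, scaled_points, speeds):
    """Fill the third data channel: at each position given by a fourth
    coordinate value, store the instantaneous speed (absolute value)
    collected at the matching first coordinate value."""
    channel = np.zeros(shape)
    for (x, y), v in zip(scaled_points, speeds):
        channel[int(round(y)), int(round(x))] = abs(v)
    return channel
```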
Optionally, the target driving data further includes a plurality of relative driving orientations of the second vehicle on the trajectory, and the target image sample further includes a plurality of fourth data channels, where the position represented by a fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving orientations; the relative driving orientations are driving orientations relative to the different exit directions of the intersection. For example, the relative driving orientations may be instantaneous orientations collected at the positions of the first coordinate values included in the target driving data. If the intersection has 4 exit directions, respectively expressed as exit direction 1, exit direction 2, exit direction 3, and exit direction 4, then 4 relative driving orientations are collected at each first coordinate value: one obtained with reference to exit direction 1, one with reference to exit direction 2, one with reference to exit direction 3, and one with reference to exit direction 4. Since there are a plurality of first coordinate values on the trajectory, there are a plurality of groups of relative driving orientations, with each first coordinate value corresponding to one group.
In the following, it is described how to fill in values (the principle is the same for the other first coordinate values) by taking one of the first coordinate values as an example, if the first coordinate value is scaled to obtain a fourth coordinate value, then the position represented by the fourth coordinate value on the fourth data channel 1 fills in the relative driving orientation acquired at the first coordinate value with respect to the outlet direction 1, the position represented by the fourth coordinate value on the fourth data channel 2 fills in the relative driving orientation acquired at the first coordinate value with respect to the outlet direction 2, the position represented by the fourth coordinate value on the fourth data channel 3 fills in the relative driving orientation acquired at the first coordinate value with respect to the outlet direction 3, and the position represented by the fourth coordinate value on the fourth data channel 4 fills in the relative driving orientation acquired at the first coordinate value with respect to the outlet direction 4. The fourth data channel 1, the fourth data channel 2, the fourth data channel 3, and the fourth data channel 4 are four different fourth data channels, respectively.
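Filling the fourth data channels — one per exit direction — can be sketched as below; representing orientations as radians and wrapping the relative angle to [−π, π) are assumptions made for illustration:

```python
import math
import numpy as np

def fill_orientation_channels(shape, scaled_points, headings, exit_bearings):
    """One fourth data channel per exit direction: at each trajectory
    position, store the vehicle heading relative to that exit's bearing,
    wrapped to [-pi, pi). Angles are in radians."""
    channels = np.zeros((len(exit_bearings),) + tuple(shape))
    for c, bearing in enumerate(exit_bearings):
        for (x, y), heading in zip(scaled_points, headings):
            # Relative orientation of the vehicle with respect to exit c.
            rel = (heading - bearing + math.pi) % (2 * math.pi) - math.pi
            channels[c, int(round(y)), int(round(x))] = rel
    return channels
```

Each first coordinate value thus contributes one value to every fourth data channel, matching the per-exit filling described above.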
It should be noted that the target driving data may include one or both of the driving speed and the driving orientation, or neither of them.
From the above description, the obtained target image sample is actually a tensor whose dimensions are (number of data channels, image height, image width).
Step S507: the prediction device inputs the target image sample into the vehicle intention prediction model, and obtains the intention probability of the second vehicle in the target image sample to travel to each exit in the various exits.
The following description will be given taking the prediction apparatus as a vehicle as an example.
In the process of training the intention prediction model, the label information reflects the real situation of vehicles exiting from each exit of the intersection. Therefore, by inputting the target image sample into the intention prediction model, the situation of the second vehicle in the target image sample traveling to each of the exits, i.e., the probability of the intention to travel to each of the exits, can be predicted.
For example, after the target image sample is input to the intention prediction model, the output prediction information is (0.06,0.50,0.11,0.13), which indicates that the probability of the second vehicle in the target image sample traveling to exit 1 of the intersection is 0.06, the probability of traveling to exit 2 of the intersection is 0.50, the probability of traveling to exit 3 of the intersection is 0.11, and the probability of traveling to exit 4 of the intersection is 0.13.
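As a minimal sketch of how such an output vector might be consumed downstream (the probability values are the hypothetical ones from the example above, and the dictionary form is an illustrative assumption):

```python
# hypothetical per-exit intention probabilities output by the intention prediction model
probs = {"exit 1": 0.06, "exit 2": 0.50, "exit 3": 0.11, "exit 4": 0.13}

# the most likely travel intention is the exit with the highest probability
likely_exit = max(probs, key=probs.get)
```

Note that, as in the example, the per-exit probabilities need not sum to 1 when the intent is multi-modal.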
It can be seen that the present embodiments pertain to multi-modal intent prediction for multiple exits.
In the present application, the vehicle for predicting the probability of intention based on the intention prediction model may be the first vehicle, the second vehicle, or another vehicle.
The embodiment of the application can achieve the following effects:
the prediction precision is high: a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue, and conventional methods encode different information using different colors, so each color is in fact encoded across all three data channels simultaneously. The embodiment of the application has no concept of coding colors; instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information, eliminates crossing and interference between different pieces of information, and makes the latent information of the different elements easier to learn.
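The contrast between the two encodings can be sketched as follows; the channel layout (one trajectory channel plus one lane-line channel and one exit-area channel per exit direction) and the image size are illustrative assumptions:

```python
import numpy as np

H, W, N_EXITS = 64, 64, 4  # illustrative sizes

# conventional RGB-style encoding: every colored stroke writes to all three
# channels at once, so different elements inevitably overlap and interfere
rgb_image = np.zeros((3, H, W), dtype=np.float32)

# channel-per-element encoding: one channel for the trajectory, plus one
# lane-line channel and one exit-area channel per exit direction, so no two
# kinds of information ever share a channel
n_channels = 1 + 2 * N_EXITS
grid = np.zeros((n_channels, H, W), dtype=np.float32)
```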
The operation efficiency is high: because the first coordinate value on the track, the second coordinate value on the virtual lane line, and the third coordinate value on the exit area are all referenced to the vehicle's initial position at the intersection, after an image sample is generated from the driving data of a certain track, data generated as the vehicle continues driving can be supplemented directly into that image sample. There is no need to regenerate the image sample or rebuild the image coordinates, so an image sample can be generated once and used multiple times, reducing computation cost. In addition, the neural network model is simple, occupies little storage, and runs efficiently.
Good generalization performance: because the data channels are encoded according to different types of information and the information corresponding to different exit directions, a corresponding number of data channels can be set according to actual requirements, regardless of whether the intersection has many or few exits, or is simple or complex; this is flexible. Meanwhile, no hand-designed features are required: the neural network learns the deep logic and can cover various complex scenarios. In addition, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so the driving intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and is applicable to intersections with different numbers of exits.
The noise resistance is good: prediction is performed on encoded images, so the method is insensitive to vehicle position jitter and little affected by sensing precision.
Effectively solving the multi-modal problem: exit intention prediction is abstracted into a binary matching problem between the vehicle and each exit of the intersection; when the vehicle's intent is not apparent, the vehicle matches multiple exits, thereby handling the multi-modal case.
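A minimal sketch of this per-exit binary matching, assuming a hypothetical network head that emits one independent logit per exit (the logit values are invented for illustration):

```python
import math

def sigmoid(z):
    """Standard logistic function mapping a logit to a (0, 1) probability."""
    return 1.0 / (1.0 + math.exp(-z))

# each exit is scored independently as a binary "does the vehicle match this
# exit?" question, so several exits may score high at once when the intent
# is ambiguous
logits = [-1.2, 0.9, 1.1, -2.0]
probs = [sigmoid(z) for z in logits]
plausible_exits = [i + 1 for i, p in enumerate(probs) if p > 0.5]
```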
The external dependence condition is weak: the method does not strongly depend on accurate high-precision-map lane lines, and retains high prediction precision even if part of the information in the high-precision map is missing. In addition, since most of the position-related information in the present application is converted, using the vehicle's starting position at the intersection (i.e., the starting point of the trajectory) as the coordinate origin, into information that better reflects the vehicle's running state, performing intention prediction based on this information both increases accuracy and reduces the dependence on high-precision maps.
In addition, if the traveling speed and/or the traveling direction are also taken into the coding range, the steering and movement tendency of the vehicle can be learned more comprehensively, and the accuracy of prediction of the traveling intention is further improved.
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 7, fig. 7 is a model training device 70 according to an embodiment of the present disclosure, where the model training device 70 is a server, a server cluster composed of a plurality of servers, or a cloud virtual machine; the model training device 70 includes a processor 701, a memory 702, and a communication interface 703, and the processor 701, the memory 702, and the communication interface 703 are connected to each other through a bus.
The memory 702 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory 702 is used for storing related computer programs and data. The communication interface 703 is used for receiving and transmitting data.
The processor 701 may be one or more Central Processing Units (CPUs), and in the case that the processor 701 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 701 in the model training apparatus 70 is configured to read the computer program code stored in the memory 702, and perform the following operations:
acquiring a plurality of pieces of driving data, wherein each piece of driving data comprises first reference information and a first coordinate value of a track along which a first vehicle drives through an intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system in which the second coordinate value of the virtual lane line lies and the origin of the coordinate system in which the third coordinate value of the exit area lies are both the starting point of the track;
performing rasterization encoding on the plurality of pieces of driving data to obtain a plurality of image samples, wherein different types of data in one piece of driving data, and the data corresponding to different exit directions, are encoded into different data channels of one image sample;
training a vehicle intention prediction model with a plurality of sample data, wherein each of the plurality of sample data comprises one image sample of the plurality of image samples and one piece of label information, and the piece of label information is used for representing the real situation of the first vehicle in that image sample traveling to each of the exits.
It should be noted that a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue, and conventional methods encode different information using different colors, so each color is in fact encoded across all three data channels simultaneously. The embodiment of the application has no concept of coding colors; instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information, eliminates crossing and interference between different pieces of information, and makes the latent information of the different elements easier to learn, so the prediction precision for the vehicle intention is higher. Moreover, because the data channels are encoded according to different types of information and the information corresponding to different exit directions, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits, or is simple or complex; this is flexible. Meanwhile, no hand-designed features are required: the neural network learns the deep logic and can cover various complex scenarios. In addition, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so the driving intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and is applicable to intersections with different numbers of exits.
In an alternative, when the plurality of driving data are rasterized to obtain a plurality of image samples, the processor 701 is specifically configured to:
determining a first scaling factor according to the specification of the reference image and the starting point of the track in the first traveling data, wherein the first traveling data is any one of the plurality of pieces of traveling data;
carrying out scaling processing on the first coordinate value of the track in the first running data and the first reference information according to the first scaling factor to obtain a fourth coordinate value and second reference information;
generating an image sample corresponding to the first driving data, wherein the image sample comprises a first data channel and at least one second data channel; the position represented by the fourth coordinate value on the first data channel is filled with a value, and the position represented by the second reference information on the at least one second data channel is filled with a value. In this way, by filling data at the fourth coordinate value of the first data channel and filling values at the second reference information of the second data channel, the information related to the track, the virtual lane line, and the exit area is effectively encoded into the image; moreover, the scaling factor is introduced to ensure that the related data is fully embodied in the image without overflow.
On the first data channel, the later the time corresponding to the first coordinate value from which a fourth coordinate value is scaled, the larger the value filled at the position represented by that fourth coordinate value; on the second data channel, the closer the first reference information from which the second reference information is scaled is to the exit area or to the center of the exit area, the larger the value filled at the position represented by that second reference information. It should be noted that, by making the values filled into a data channel change gradually, the distance of a coordinate from the track end point (generally near an exit), or from the exit area or the center of the exit area, is expressed as brightness or color depth. This strengthens the recognition of coordinates near the track end point, the exit area, or the center of the exit area, so the accuracy of the finally obtained intention prediction model is higher.
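The gradually changing fill values on the first data channel can be sketched as follows; the linear ramp, the function name, and placing the track start at the image center are assumptions made for illustration:

```python
import numpy as np

def fill_trajectory_channel(points, h, w, scale):
    """First data channel: points later in time get larger fill values, so
    brightness encodes progress toward the track end point."""
    channel = np.zeros((h, w), dtype=np.float32)
    total = len(points)
    for t, (x, y) in enumerate(points):
        col = int(round(x * scale)) + w // 2
        row = int(round(y * scale)) + h // 2
        if 0 <= row < h and 0 <= col < w:
            channel[row, col] = (t + 1) / total  # later time -> larger value
    return channel
```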
In addition, if the first reference information includes the second coordinate value and a third coordinate value, the second reference information includes a fifth coordinate value and a sixth coordinate value, the fifth coordinate value is obtained by scaling the second coordinate value by the first scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the first scaling factor; wherein the fifth coordinate value and the sixth coordinate value are respectively used for filling data in different second data channels; second reference information obtained by scaling the first reference information corresponding to different exit directions is respectively used for filling data in different second data channels; it can be understood that the fifth coordinate value related to the virtual lane line and the sixth coordinate value related to the exit area are respectively reflected to different data channels, and information in different exit directions is respectively reflected to different data channels, so that mutual interference of different information is more effectively avoided.
In yet another alternative:
the driving data further includes a driving speed of the first vehicle on the track, the image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the driving speed;
and/or,
the driving data further includes a plurality of relative driving orientations of the first vehicle on the track, and the image sample further includes a plurality of fourth data channels; the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving orientations, where the relative driving orientations are driving orientations relative to the different exit directions of the intersection.
In this way, the factors of the driving speed and/or the driving direction are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be more comprehensively learned, and the prediction accuracy of the driving intention is further improved.
In yet another alternative, the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 · max_{1 ≤ i ≤ N} |P, G_i|)
wherein scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| represents the distance between the starting point P of the trajectory in the first driving data and the i-th exit area G_i of the intersection;
wherein i is an integer between 1 and N, and N is the number of outlet areas of the intersection.
In this calculation, the numerator takes the minimum of the image height and the image width, and the denominator covers the largest extent the vehicle can move within the intersection area. Therefore, after the first coordinate value, the second coordinate value, the third coordinate value, and so on are scaled by a scale computed in this way, the coordinate values essentially do not exceed the coordinate range of the image, ensuring that no information is lost after rasterization encoding.
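A sketch of this computation, assuming the scaling factor equals the smaller image side divided by twice the largest distance from the trajectory start to any exit area (so points on either side of the start both fit inside the image):

```python
import math

def scaling_factor(h, w, start, exit_centers):
    """scale = min(h, w) / (2 * max distance from the start point to an exit),
    so scaled coordinates essentially cannot leave the image."""
    d_max = max(math.dist(start, g) for g in exit_centers)
    return min(h, w) / (2.0 * d_max)
```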
In yet another alternative, the processor 701 is further configured to send the intention prediction model to a prediction device through the communication interface 703.
After the intention prediction model is sent to the prediction device, the prediction device can predict the intention of a vehicle; the prediction result can help the vehicle control driving reasonably and improve driving safety.
It should be noted that, the implementation of the respective operations may also refer to the corresponding description of the method embodiment shown in fig. 5.
Referring to fig. 8, fig. 8 is a prediction device 80 provided in the embodiment of the present application, where the prediction device 80 may be a vehicle, a virtual machine deployed in a cloud, a server, or a server cluster formed by multiple servers; the prediction device 80 comprises a processor 801, a memory 802 and a communication interface 803, said processor 801, memory 802 and communication interface 803 being interconnected by a bus.
The memory 802 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory 802 is used for storing related computer programs and data. The communication interface 803 is used to receive and transmit data.
The processor 801 may be one or more Central Processing Units (CPUs), and in the case where the processor 801 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 801 in the prediction device 80 is configured to read the computer program code stored in the memory 802, and perform the following operations:
acquiring an intention prediction model, wherein the intention prediction model is used for predicting the intention of a vehicle to travel to each direction of an intersection;
acquiring target driving data, wherein the target driving data comprises first reference information and a first coordinate value of a track along which a second vehicle drives through the intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system in which the second coordinate value of the virtual lane line lies and the origin of the coordinate system in which the third coordinate value of the exit area lies are both the starting point of the track;
rasterizing and encoding the target driving data to obtain a target image sample, wherein different types of data in the target driving data, and the data corresponding to different exit directions, are encoded into different data channels of one image sample;
and inputting the target image sample into the intention prediction model to obtain the intention probability of the second vehicle driving towards each direction of the intersection.
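The three steps can be sketched end to end as follows; the stand-in model (a uniform-probability callable) and the tensor sizes are assumptions, since the excerpt does not specify the network architecture:

```python
import numpy as np

def predict_intent(model, target_image_sample):
    """Run the intention prediction model on one rasterized target image sample.

    target_image_sample -- (C, H, W) tensor from rasterization encoding
    model               -- callable mapping a batch of samples to per-exit probabilities
    """
    batch = target_image_sample[np.newaxis, ...]  # add a batch dimension
    return model(batch)[0]                        # per-exit intention probabilities

# usage with a stand-in model that always predicts a uniform distribution
sample = np.zeros((9, 64, 64), dtype=np.float32)
uniform_model = lambda x: np.full((x.shape[0], 4), 0.25)
intent_probs = predict_intent(uniform_model, sample)
```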
It should be noted that a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue, and conventional methods encode different information using different colors, so each color is in fact encoded across all three data channels simultaneously. The embodiment of the application has no concept of coding colors; instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information, eliminates crossing and interference between different pieces of information, and makes the latent information of the different elements easier to learn, so the prediction precision for the vehicle intention is higher. Moreover, because the data channels are encoded according to different types of information and the information corresponding to different exit directions, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits, or is simple or complex; this is flexible. Meanwhile, no hand-designed features are required: the neural network learns the deep logic and can cover various complex scenarios. In addition, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so the driving intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and is applicable to intersections with different numbers of exits.
In an alternative, in terms of obtaining the intent prediction model, the processor 801 is specifically configured to:
an intention prediction model transmitted by a model training device is received through the communication interface 803, wherein the intention prediction model is trained based on a plurality of image samples obtained by rasterizing and encoding a plurality of pieces of travel data, and the data format of each piece of travel data is the same as that of the target travel data.
In yet another alternative, in terms of obtaining a target image sample by performing rasterization encoding on the target driving data, the processor 801 is specifically configured to:
determining a second scaling factor according to the specification of the reference image and the starting point of the track in the target driving data;
carrying out scaling processing on the first coordinate value of the track in the target driving data and the first reference information according to the second scaling factor to obtain a fourth coordinate value and second reference information;
generating a target image sample corresponding to the target driving data, wherein the target image sample comprises a first data channel and at least one second data channel; the position represented by the fourth coordinate value on the first data channel is filled with a value, and the position represented by the second reference information on the at least one second data channel is filled with a value. In this way, by filling data at the fourth coordinate value of the first data channel and filling values at the second reference information of the second data channel, the information related to the track, the virtual lane line, and the exit area is effectively encoded into the image; moreover, the scaling factor is introduced to ensure that the related data is fully embodied in the image without overflow.
On the first data channel, the later the time corresponding to the first coordinate value from which a fourth coordinate value is scaled, the larger the value filled at the position represented by that fourth coordinate value; on the second data channel, the closer the first reference information from which the second reference information is scaled is to the exit area or to the center of the exit area, the larger the value filled at the position represented by that second reference information. It should be noted that, by making the values filled into a data channel change gradually, the distance of a coordinate from the track end point (generally near an exit), or from the exit area or the center of the exit area, is expressed as brightness or color depth. This strengthens the recognition of coordinates near the track end point, the exit area, or the center of the exit area, so the accuracy of the finally obtained intention prediction model is higher.
In addition, if the first reference information includes the second coordinate value and the third coordinate value, the second reference information includes a fifth coordinate value and a sixth coordinate value, the fifth coordinate value is obtained by scaling the second coordinate value by the second scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the second scaling factor; wherein the fifth coordinate value and the sixth coordinate value are respectively used for filling data in different second data channels; second reference information obtained by scaling the first reference information corresponding to different exit directions is respectively used for filling data in different second data channels; it can be understood that the fifth coordinate value related to the virtual lane line and the sixth coordinate value related to the exit area are respectively reflected to different data channels, and information in different exit directions is respectively reflected to different data channels, so that mutual interference of different information is more effectively avoided.
In yet another alternative:
the target driving data further includes a driving speed of the second vehicle on the track, the target image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the driving speed;
and/or,
the target driving data further includes a plurality of relative driving orientations of the second vehicle on the track, and the target image sample further includes a plurality of fourth data channels; the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving orientations, where the relative driving orientations are driving orientations relative to the different exit directions of the intersection.
In this way, the factors of the driving speed and/or the driving direction are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be more comprehensively learned, and the prediction accuracy of the driving intention is further improved.
In yet another alternative, the second scaling factor satisfies the following relationship:
scale = min(h, w) / (2 · max_{1 ≤ i ≤ N} |P0, G_i|)
wherein scale is the second scaling factor, h is the height of the reference image, w is the width of the reference image, and |P0, G_i| represents the distance between the starting point P0 of the trajectory in the target driving data and the i-th exit area G_i of the intersection;
wherein i is an integer between 1 and N, and N is the number of outlet areas of the intersection.
In this calculation, the numerator takes the minimum of the image height and the image width, and the denominator covers the largest extent the vehicle can move within the intersection area. Therefore, after the first coordinate value, the second coordinate value, the third coordinate value, and so on are scaled by a scale computed in this way, the coordinate values essentially do not exceed the coordinate range of the image, ensuring that no information is lost after rasterization encoding.
It should be noted that, the implementation of the respective operations may also refer to the corresponding description of the method embodiment shown in fig. 5.
Referring to fig. 9, fig. 9 is a target device 90 for processing image data according to an embodiment of the present application, where the target device 90 is a server, or a server cluster composed of multiple servers, or a virtual machine deployed in a cloud, and the target device 90 includes a processor 901, a memory 902, and a communication interface 903, and the processor 901, the memory 902, and the communication interface 903 are connected to each other through a bus.
The memory 902 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory 902 is used for storing related computer programs and data. The communication interface 903 is used for receiving and transmitting data.
The processor 901 may be one or more Central Processing Units (CPUs), and in the case that the processor 901 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 901 in the target device 90 is configured to read the computer program code stored in the memory 902, and perform the following operations:
acquiring a plurality of pieces of driving data, wherein each piece of driving data comprises first reference information and a first coordinate value of a track along which a first vehicle drives through an intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system in which the second coordinate value of the virtual lane line lies and the origin of the coordinate system in which the third coordinate value of the exit area lies are both the starting point of the track;
performing rasterization encoding on the plurality of pieces of driving data to obtain a plurality of image samples, wherein different types of data in one piece of driving data, and the data corresponding to different exit directions, are encoded into different data channels of one image sample;
and sending a plurality of sample data to a model training device, wherein each of the plurality of sample data comprises one image sample of the plurality of image samples and one piece of label information, and the piece of label information is used for representing the real intention of the first vehicle in that image sample to drive to each of the exits.
It should be noted that a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue, and conventional methods encode different information using different colors, so each color is in fact encoded across all three data channels simultaneously. The embodiment of the application has no concept of coding colors; instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information and eliminates crossing and interference between different pieces of information, so when the information is subsequently used for model training, the latent information of the different elements is easier to learn and the prediction precision for the vehicle intention is higher. Moreover, because the data channels are encoded according to different types of information and the information corresponding to different exit directions, a corresponding number of data channels can be set according to actual needs, regardless of whether the intersection has many or few exits, or is simple or complex; this is flexible. Meanwhile, no hand-designed features are required: the neural network learns the deep logic and can cover various complex scenarios. In addition, since the data channels are divided by exit direction, when the sample data is subsequently used for model training, the characteristics of each exit direction can be trained accurately, so the driving intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and is applicable to intersections with different numbers of exits.
In an alternative, in rasterizing and encoding the plurality of pieces of driving data to obtain a plurality of image samples, the processor 901 is specifically configured to:
determining a first scaling factor according to a specification of a reference image and the starting point of the trajectory in first driving data, where the first driving data is any one of the plurality of pieces of driving data;

scaling the first coordinate value of the trajectory in the first driving data and the first reference information according to the first scaling factor to obtain a fourth coordinate value and second reference information;

generating an image sample corresponding to the first driving data, where the image sample includes a first data channel and at least one second data channel, a value is filled at the position represented by the fourth coordinate value on the first data channel, and a value is filled at the position represented by the second reference information on the at least one second data channel. In this way, by filling values at the fourth coordinate value on the first data channel and at the second reference information on the second data channel, the information about the trajectory, the virtual lane line, and the exit area is effectively encoded into the image; moreover, introducing the scaling factor ensures that the related data is fully embodied in the image without overflow.
On the first data channel, a larger value is filled at the position represented by a fourth coordinate value obtained by scaling a first coordinate value corresponding to a later time; on the second data channel, a larger value is filled at the position represented by second reference information obtained by scaling first reference information that is closer to the exit area or to the center of the exit area. It should be noted that, by making the values filled into the data channels change gradually, the distance between a coordinate value and the trajectory end point (which is generally close to an exit), or between a coordinate value and the exit area or the center of the exit area, is represented; that is, the distance of the vehicle from the trajectory end point, the exit area, or the center of the exit area is expressed through brightness or color depth. This enhances the recognizability of coordinates close to the trajectory end point, the exit area, or the center of the exit area, so the finally obtained intention prediction model has higher accuracy.
In addition, if the first reference information includes the second coordinate value and the third coordinate value, the second reference information includes a fifth coordinate value and a sixth coordinate value, where the fifth coordinate value is obtained by scaling the second coordinate value by the first scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the first scaling factor. The fifth coordinate value and the sixth coordinate value are used to fill values in different second data channels, and the second reference information obtained by scaling the first reference information corresponding to different exit directions is likewise used to fill values in different second data channels. It can be understood that the fifth coordinate value, which relates to the virtual lane line, and the sixth coordinate value, which relates to the exit area, are embodied in different data channels, and the information of different exit directions is embodied in different data channels, which more effectively avoids mutual interference between different pieces of information.
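As an illustrative sketch only (the function name, channel layout, and fill values below are assumptions for illustration, not the exact encoding of the embodiments), the per-channel rasterization described above can be outlined as follows: the trajectory occupies one channel with time-graded fill values, and each exit direction gets its own virtual-lane-line channel and exit-area channel:

```python
import numpy as np

def rasterize(traj, lane_lines, exit_areas, h=64, w=64):
    """Encode one piece of driving data into a (1 + 2*N, h, w) sample.

    traj       -- scaled trajectory points (x, y), ordered by time
    lane_lines -- per-exit lists of scaled virtual-lane-line points (N exits)
    exit_areas -- per-exit lists of scaled exit-area points (N exits)
    """
    n = len(exit_areas)
    img = np.zeros((1 + 2 * n, h, w), dtype=np.float32)
    # First data channel: later trajectory points get larger fill values.
    for t, (x, y) in enumerate(traj):
        xi, yi = int(x), int(y)
        if 0 <= yi < h and 0 <= xi < w:
            img[0, yi, xi] = (t + 1) / len(traj)
    # Second data channels: one lane-line channel and one exit-area
    # channel per exit direction, so different information never overlaps.
    for i in range(n):
        for x, y in lane_lines[i]:
            img[1 + i, int(y), int(x)] = 1.0
        for x, y in exit_areas[i]:
            img[1 + n + i, int(y), int(x)] = 1.0
    return img
```

A sample for an intersection with N exits thus has 1 + 2N channels, which is how the channel count can follow the number of exits at the intersection.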
In yet another alternative:
the driving data further includes a travel speed of the first vehicle on the trajectory, the image sample further includes a third data channel, and the travel speed is filled at the position represented by the fourth coordinate value on the third data channel;

and/or,

the driving data further includes a plurality of relative driving directions of the first vehicle on the trajectory, and the image sample further includes a plurality of fourth data channels, where information about one of the relative driving directions is filled at the position represented by the fourth coordinate value on each fourth data channel, and the plurality of relative driving directions are the driving directions relative to the different exit directions of the intersection.
In this way, the driving speed and/or driving direction factors are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be learned more comprehensively, further improving the prediction accuracy of the driving intention.
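Continuing the same illustrative sketch (the function name and channel ordering are assumptions), an optional third data channel would fill the travel speed at each scaled trajectory position:

```python
import numpy as np

def add_speed_channel(img, traj, speeds):
    # Append a third data channel: the travel speed is filled at the
    # position represented by each scaled trajectory coordinate.
    _, h, w = img.shape
    speed_ch = np.zeros((1, h, w), dtype=img.dtype)
    for (x, y), v in zip(traj, speeds):
        xi, yi = int(x), int(y)
        if 0 <= yi < h and 0 <= xi < w:
            speed_ch[0, yi, xi] = v
    return np.concatenate([img, speed_ch], axis=0)
```

The per-direction fourth data channels could be appended in the same way, one channel per relative driving direction.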
In yet another alternative, the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 × max(|P, G_1|, …, |P, G_N|))

where scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| represents the distance between the starting point P of the trajectory in the first driving data and the i-th exit area G_i of the intersection;

where i is an integer between 1 and N, and N is the number of exit areas of the intersection.
In this calculation, the numerator takes the minimum of the image height and the image width, and the denominator covers as large an area of vehicle movement within the intersection as possible. Therefore, after coordinate values such as the first coordinate value, the second coordinate value, and the third coordinate value are scaled by the scale computed in this way, they essentially do not exceed the coordinate range of the image, ensuring that no information is lost after the rasterization encoding.
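Under one plausible reading of the relationship above (the factor of 2 assumes the start point sits near the image center, so the farthest exit must fit within half the image span; the function name is illustrative), the first scaling factor can be computed as:

```python
import math

def first_scale_factor(h, w, start, exit_areas):
    # scale = min(h, w) / (2 * max_i |P, G_i|): the numerator is the
    # smaller image dimension; the denominator is twice the largest
    # distance from the trajectory start P to any exit area G_i.
    max_dist = max(math.dist(start, g) for g in exit_areas)
    return min(h, w) / (2.0 * max_dist)
```

For a 200 x 200 reference image with the farthest exit 50 m from the start, this gives scale = 200 / 100 = 2.0, so scaled coordinates stay within the image.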
It should be noted that, the implementation of the respective operations may also refer to the corresponding description of the method embodiment shown in fig. 5.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a training apparatus 100 for a vehicle intention prediction model according to an embodiment of the present application. The apparatus 100 may be the foregoing model training device, or a device or module in the foregoing model training device, and the apparatus 100 may include an obtaining unit 1001, an encoding unit 1002, and a training unit 1003, where each unit is described in detail as follows.
An obtaining unit 1001 configured to obtain a plurality of pieces of travel data, where each piece of travel data includes first reference information and a first coordinate value of a trajectory when a first vehicle travels through a target intersection, where the first reference information includes a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
an encoding unit 1002, configured to perform rasterization encoding on the multiple pieces of driving data to obtain multiple image samples, where different types of data in one piece of driving data and data corresponding to different exit directions are encoded into different data channels in the one image sample, respectively;
a training unit 1003, configured to train a vehicle intention prediction model using a plurality of sample data, where each sample data in the plurality of sample data includes one image sample of the plurality of image samples and one piece of label information, and the label information represents the real situation of the first vehicle in the image sample driving to one of the exits.
It should be noted that a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue, and conventional methods encode different pieces of information using different colors, so each encoding color is actually encoded on all three data channels simultaneously. The embodiments of the present application do not use the concept of encoding colors; instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information, so that different pieces of information neither intersect nor interfere with one another, and the latent information of the different elements is easier to learn, giving higher prediction precision for the vehicle intention. Moreover, because the data channels are allocated according to the types of information and the information corresponding to different exit directions, a corresponding number of data channels can be set as actually needed regardless of whether the intersection has many or few exits, or is simple or complex, which is flexible. Meanwhile, no features need to be designed by hand, because the deep logic is learned by the neural network, so various complex scenes can be covered. In addition, since the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so that the driving-intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and is applicable to intersections with different numbers of exits.
In an alternative scheme, when the plurality of pieces of driving data are subjected to rasterization coding to obtain a plurality of image samples, the coding unit 1002 is specifically configured to:
determining a first scaling factor according to a specification of a reference image and the starting point of the trajectory in first driving data, where the first driving data is any one of the plurality of pieces of driving data;

scaling the first coordinate value of the trajectory in the first driving data and the first reference information according to the first scaling factor to obtain a fourth coordinate value and second reference information;

generating an image sample corresponding to the first driving data, where the image sample includes a first data channel and at least one second data channel, a value is filled at the position represented by the fourth coordinate value on the first data channel, and a value is filled at the position represented by the second reference information on the at least one second data channel. In this way, by filling values at the fourth coordinate value on the first data channel and at the second reference information on the second data channel, the information about the trajectory, the virtual lane line, and the exit area is effectively encoded into the image; moreover, introducing the scaling factor ensures that the related data is fully embodied in the image without overflow.
On the first data channel, a larger value is filled at the position represented by a fourth coordinate value obtained by scaling a first coordinate value corresponding to a later time; on the second data channel, a larger value is filled at the position represented by second reference information obtained by scaling first reference information that is closer to the exit area or to the center of the exit area. It should be noted that, by making the values filled into the data channels change gradually, the distance between a coordinate value and the trajectory end point (which is generally close to an exit), or between a coordinate value and the exit area or the center of the exit area, is represented; that is, the distance of the vehicle from the trajectory end point, the exit area, or the center of the exit area is expressed through brightness or color depth. This enhances the recognizability of coordinates close to the trajectory end point, the exit area, or the center of the exit area, so the finally obtained intention prediction model has higher accuracy.
In addition, if the first reference information includes the second coordinate value and the third coordinate value, the second reference information includes a fifth coordinate value and a sixth coordinate value, where the fifth coordinate value is obtained by scaling the second coordinate value by the first scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the first scaling factor. The fifth coordinate value and the sixth coordinate value are used to fill values in different second data channels, and the second reference information obtained by scaling the first reference information corresponding to different exit directions is likewise used to fill values in different second data channels. It can be understood that the fifth coordinate value, which relates to the virtual lane line, and the sixth coordinate value, which relates to the exit area, are embodied in different data channels, and the information of different exit directions is embodied in different data channels, which more effectively avoids mutual interference between different pieces of information.
In yet another alternative:
the driving data further includes a travel speed of the first vehicle on the trajectory, the image sample further includes a third data channel, and the travel speed is filled at the position represented by the fourth coordinate value on the third data channel;

and/or,

the driving data further includes a plurality of relative driving directions of the first vehicle on the trajectory, and the image sample further includes a plurality of fourth data channels, where information about one of the relative driving directions is filled at the position represented by the fourth coordinate value on each fourth data channel, and the plurality of relative driving directions are the driving directions relative to the different exit directions of the intersection.
In this way, the driving speed and/or driving direction factors are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be learned more comprehensively, further improving the prediction accuracy of the driving intention.
In yet another alternative, the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 × max(|P, G_1|, …, |P, G_N|))

where scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| represents the distance between the starting point P of the trajectory in the first driving data and the i-th exit area G_i of the intersection;

where i is an integer between 1 and N, and N is the number of exit areas of the intersection.
In this calculation, the numerator takes the minimum of the image height and the image width, and the denominator covers as large an area of vehicle movement within the intersection as possible. Therefore, after coordinate values such as the first coordinate value, the second coordinate value, and the third coordinate value are scaled by the scale computed in this way, they essentially do not exceed the coordinate range of the image, ensuring that no information is lost after the rasterization encoding.
In yet another alternative, the apparatus further comprises:
a transmitting unit for transmitting the intention prediction model to a prediction apparatus.
Since the intention prediction model is sent to the prediction device, the prediction device can predict the intention of the vehicle, and the prediction result can help the vehicle control its driving reasonably and improve driving safety.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 5.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a vehicle intention prediction apparatus 110 provided in an embodiment of the present application, where the apparatus 110 may be the prediction device, or a device or a module in the prediction device, and the apparatus 110 may include a first obtaining unit 1101, a second obtaining unit 1102, an encoding unit 1103 and a prediction unit 1104, where details of each unit are described below.
A first acquisition unit 1101 configured to acquire an intention prediction model for predicting an intention of a vehicle to travel in each direction of an intersection;
a second obtaining unit 1102, configured to obtain target driving data, where the target driving data includes first reference information and a first coordinate value of a trajectory of a second vehicle driving through the intersection, and the first reference information includes a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system of the second coordinate value of the virtual lane line and the origin of the coordinate system of the third coordinate value of the exit area are both the starting point of the trajectory;
an encoding unit 1103, configured to perform rasterization encoding on the target driving data to obtain target image samples, where different types of data in the target driving data and data corresponding to different exit directions are encoded in different data channels in the image sample, respectively;
a prediction unit 1104, configured to input the target image sample to the intention prediction model, and obtain an intention probability of the second vehicle traveling in each direction of the intersection.
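As a minimal sketch of this last step (the softmax mapping is an assumption; the embodiments only state that a per-exit intention probability is produced), a model head that emits one logit per exit direction can be normalized into per-exit intention probabilities for any number N of exits:

```python
import numpy as np

def exit_intention_probabilities(logits):
    # One logit per exit direction; softmax works for any N, matching
    # the variable-size classification framework described above.
    z = np.asarray(logits, dtype=np.float64)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()
```

For example, logits [2.0, 1.0, 0.1] for a three-exit intersection yield probabilities that sum to 1, with the first exit direction most likely.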
It should be noted that a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue, and conventional methods encode different pieces of information using different colors, so each encoding color is actually encoded on all three data channels simultaneously. The embodiments of the present application do not use the concept of encoding colors; instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information, so that different pieces of information neither intersect nor interfere with one another, and the latent information of the different elements is easier to learn, giving higher prediction precision for the vehicle intention. Moreover, because the data channels are allocated according to the types of information and the information corresponding to different exit directions, a corresponding number of data channels can be set as actually needed regardless of whether the intersection has many or few exits, or is simple or complex, which is flexible. Meanwhile, no features need to be designed by hand, because the deep logic is learned by the neural network, so various complex scenes can be covered. In addition, since the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so that the driving-intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and is applicable to intersections with different numbers of exits.
In an optional aspect, in terms of obtaining a target image sample by performing rasterization encoding on the target driving data, the encoding unit 1103 is specifically configured to:
determining a second scaling factor according to a specification of a reference image and the starting point of the trajectory in the target driving data;

scaling the first coordinate value of the trajectory in the target driving data and the first reference information according to the second scaling factor to obtain a fourth coordinate value and second reference information;

generating a target image sample corresponding to the target driving data, where the target image sample includes a first data channel and at least one second data channel, a value is filled at the position represented by the fourth coordinate value on the first data channel, and a value is filled at the position represented by the second reference information on the at least one second data channel. In this way, by filling values at the fourth coordinate value on the first data channel and at the second reference information on the second data channel, the information about the trajectory, the virtual lane line, and the exit area is effectively encoded into the image; moreover, introducing the scaling factor ensures that the related data is fully embodied in the image without overflow.
On the first data channel, a larger value is filled at the position represented by a fourth coordinate value obtained by scaling a first coordinate value corresponding to a later time; on the second data channel, a larger value is filled at the position represented by second reference information obtained by scaling first reference information that is closer to the exit area or to the center of the exit area. It should be noted that, by making the values filled into the data channels change gradually, the distance between a coordinate value and the trajectory end point (which is generally close to an exit), or between a coordinate value and the exit area or the center of the exit area, is represented; that is, the distance of the vehicle from the trajectory end point, the exit area, or the center of the exit area is expressed through brightness or color depth. This enhances the recognizability of coordinates close to the trajectory end point, the exit area, or the center of the exit area, so the finally obtained intention prediction model has higher accuracy.
In addition, if the first reference information includes the second coordinate value and the third coordinate value, the second reference information includes a fifth coordinate value and a sixth coordinate value, where the fifth coordinate value is obtained by scaling the second coordinate value by the second scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the second scaling factor. The fifth coordinate value and the sixth coordinate value are used to fill values in different second data channels, and the second reference information obtained by scaling the first reference information corresponding to different exit directions is likewise used to fill values in different second data channels. It can be understood that the fifth coordinate value, which relates to the virtual lane line, and the sixth coordinate value, which relates to the exit area, are embodied in different data channels, and the information of different exit directions is embodied in different data channels, which more effectively avoids mutual interference between different pieces of information.
In yet another alternative: the travel data further includes a travel speed of the second vehicle on the one track, the target image sample further includes a third data channel, and a position represented by the fourth coordinate value on the third data channel fills the travel speed; and/or the driving data further comprises a plurality of relative driving directions of the first vehicle on the track, the image sample further comprises a plurality of fourth data channels, and the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one driving direction, wherein the plurality of relative driving directions are respectively driving directions relative to different exit directions of the intersection.
In this way, the driving speed and/or driving direction factors are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be learned more comprehensively, further improving the prediction accuracy of the driving intention.

In an alternative, the second scaling factor satisfies the following relationship:
scale = min(h, w) / (2 × max(|P0, G_1|, …, |P0, G_N|))

where scale is the second scaling factor, h is the height of the reference image, w is the width of the reference image, and |P0, G_i| represents the distance between the starting point P0 of the trajectory in the target driving data and the i-th exit area G_i of the intersection;

where i is an integer between 1 and N, and N is the number of exit areas of the intersection.
In this calculation, the numerator takes the minimum of the image height and the image width, and the denominator covers as large an area of vehicle movement within the intersection as possible. Therefore, after coordinate values such as the first coordinate value, the second coordinate value, and the third coordinate value are scaled by the scale computed in this way, they essentially do not exceed the coordinate range of the image, ensuring that no information is lost after the rasterization encoding.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 5.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an apparatus 120 for processing image data according to an embodiment of the present application, where the apparatus 120 may be the target device or a module in the target device, and the apparatus 120 may include an obtaining unit 1201, an encoding unit 1202, and a sending unit 1203, where details of each unit are described below.
An obtaining unit 1201, configured to obtain a plurality of pieces of driving data, where each piece of driving data includes first reference information and a first coordinate value of a trajectory of a first vehicle driving through a target intersection, and the first reference information includes a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system of the second coordinate value of the virtual lane line and the origin of the coordinate system of the third coordinate value of the exit area are both the starting point of the trajectory;
an encoding unit 1202, configured to perform rasterization encoding on the multiple pieces of travel data to obtain multiple image samples, where different types of data in one piece of travel data and data corresponding to different exit directions are encoded into different data channels in the one image sample, respectively;
a sending unit 1203, configured to send a plurality of sample data to a model training device, where each sample data in the plurality of sample data includes one image sample of the plurality of image samples and one piece of label information, and the label information represents the real situation of the intention of the first vehicle in the image sample to drive to one of the exits.
It should be noted that a conventional image generally has 3 data channels, representing the three primary colors red, green, and blue, and conventional methods encode different pieces of information using different colors, so each encoding color is actually encoded on all three data channels simultaneously. The embodiments of the present application do not use the concept of encoding colors; instead, different types of information, and the information corresponding to different exit directions, are encoded into separate data channels. This avoids overlapping of complex information, so that different pieces of information neither intersect nor interfere with one another, and the latent information of the different elements is easier to learn, giving higher prediction precision for the vehicle intention. Moreover, because the data channels are allocated according to the types of information and the information corresponding to different exit directions, a corresponding number of data channels can be set as actually needed regardless of whether the intersection has many or few exits, or is simple or complex, which is flexible. Meanwhile, no features need to be designed by hand, because the deep logic is learned by the neural network, so various complex scenes can be covered. In addition, since the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so that the driving-intention probability of each exit direction can be predicted; this is equivalent to a classification framework with a variable number of classes and is applicable to intersections with different numbers of exits.
In an alternative scheme, when the plurality of pieces of driving data are rasterized and encoded to obtain a plurality of image samples, the encoding unit 1202 is specifically configured to:
determining a first scaling factor according to a specification of a reference image and the starting point of the trajectory in first driving data, where the first driving data is any one of the plurality of pieces of driving data;

scaling the first coordinate value of the trajectory in the first driving data and the first reference information according to the first scaling factor to obtain a fourth coordinate value and second reference information;

generating an image sample corresponding to the first driving data, where the image sample includes a first data channel and at least one second data channel, a value is filled at the position represented by the fourth coordinate value on the first data channel, and a value is filled at the position represented by the second reference information on the at least one second data channel. In this way, by filling values at the fourth coordinate value on the first data channel and at the second reference information on the second data channel, the information about the trajectory, the virtual lane line, and the exit area is effectively encoded into the image; moreover, introducing the scaling factor ensures that the related data is fully embodied in the image without overflow.
On the first data channel, a larger value is filled at the position represented by a fourth coordinate value obtained by scaling a first coordinate value corresponding to a later time; on the second data channel, a larger value is filled at the position represented by second reference information obtained by scaling first reference information that is closer to the exit area or to the center of the exit area. It should be noted that, by making the values filled into the data channels change gradually, the distance between a coordinate value and the trajectory end point (which is generally close to an exit), or between a coordinate value and the exit area or the center of the exit area, is represented; that is, the distance of the vehicle from the trajectory end point, the exit area, or the center of the exit area is expressed through brightness or color depth. This enhances the recognizability of coordinates close to the trajectory end point, the exit area, or the center of the exit area, so the finally obtained intention prediction model has higher accuracy.
In addition, if the first reference information includes the second coordinate value and the third coordinate value, the second reference information includes a fifth coordinate value and a sixth coordinate value, where the fifth coordinate value is obtained by scaling the second coordinate value by the first scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the first scaling factor. The fifth coordinate value and the sixth coordinate value are used to fill values in different second data channels, and the second reference information obtained by scaling the first reference information corresponding to different exit directions is likewise used to fill values in different second data channels. It can be understood that the fifth coordinate value, which relates to the virtual lane line, and the sixth coordinate value, which relates to the exit area, are embodied in different data channels, and the information of different exit directions is embodied in different data channels, which more effectively avoids mutual interference between different pieces of information.
In yet another alternative:
the driving data further includes a travel speed of the first vehicle on the track, the image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the travel speed;
and/or,
the driving data further includes a plurality of relative driving directions of the first vehicle on the track, and the image sample further includes a plurality of fourth data channels, and the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions, wherein the plurality of relative driving directions are the driving directions relative to the different exit directions of the intersection, respectively.
In this way, the factors of driving speed and/or driving direction are introduced into the prediction of the intention prediction model, so that the steering and movement trends of the vehicle can be learned more comprehensively, further improving the accuracy of driving-intention prediction.
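For concreteness, appending a speed channel of the kind described might look like the sketch below; the (C, H, W) channel layout and stacking order are assumptions, not taken from the patent:

```python
import numpy as np

def add_speed_channel(channels, points, speeds, scale):
    """Stack an extra channel holding the driving speed at each trajectory cell.

    channels: existing (C, H, W) array of encoded data channels.
    points:   (x, y) trajectory coordinates relative to the trajectory start.
    speeds:   driving speed at each trajectory point (same length as points).
    """
    c, h, w = channels.shape
    speed_ch = np.zeros((h, w), dtype=channels.dtype)
    for (x, y), v in zip(points, speeds):
        # Same assumed convention: trajectory start maps to the image centre.
        row = int(round(y * scale)) + h // 2
        col = int(round(x * scale)) + w // 2
        if 0 <= row < h and 0 <= col < w:
            speed_ch[row, col] = v
    # Append as a new leading-axis channel, leaving existing channels intact.
    return np.concatenate([channels, speed_ch[None]], axis=0)
```

A relative-direction channel per exit could be filled analogously, with the angle to each exit written at the trajectory cells.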
In yet another alternative, the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 · max(|P, G_1|, …, |P, G_N|))
wherein scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| represents the distance between the starting point P of the trajectory in the first travel data and the i-th exit area G_i of the intersection;
wherein i is an integer between 1 and N, and N is the number of exit areas of the intersection.
In this calculation, the numerator takes the minimum of the image height and the image width, and the denominator bounds the movement range of the vehicle by the largest extent of the intersection area. After coordinate values such as the first, second, and third coordinate values are scaled by a scale computed in this way, they essentially do not exceed the coordinate range of the image, which ensures that no information is lost after rasterization encoding.
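A minimal sketch of this computation in code (assuming, as the surrounding description implies, that the denominator is twice the largest start-to-exit distance; function and parameter names are illustrative):

```python
import math

def first_scaling_factor(h, w, start, exit_points):
    """scale = min(h, w) / (2 * max_i |P, G_i|).

    start:       (x, y) starting point P of the trajectory.
    exit_points: reference points G_i of the intersection's exit areas.
    The farthest exit bounds the vehicle's movement range, so coordinates
    scaled by this factor stay inside the image after rasterization.
    """
    max_dist = max(math.dist(start, g) for g in exit_points)
    return min(h, w) / (2.0 * max_dist)
```

For a 100 x 100 reference image with the farthest exit 50 units from the start, the factor is 1.0, so that exit maps exactly to the image border when the start is at the centre.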
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 5.
The embodiment of the present application further provides a chip system, where the chip system includes at least one processor, a memory and an interface circuit, where the memory, the interface circuit and the at least one processor are interconnected by a line, and the memory stores a computer program; the method flow shown in fig. 5 is implemented when the computer program is executed by the processor.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed on a processor, the method flow shown in fig. 5 is implemented.
Embodiments of the present application also provide a computer program product, where when the computer program product runs on a processor, the method flow shown in fig. 5 is implemented.
In summary, the following effects can be achieved by adopting the embodiment of the application:
the prediction precision is high: the conventional image generally has 3 data channels, which respectively represent three primary colors of red, green and blue, and the conventional method encodes different information using different colors, each of which is actually encoded on three data channels simultaneously. The embodiment of the application has no concept of coding colors, but codes different types of information and information corresponding to different outlet directions to separate data channels, avoids overlapping of complex information, has no intercrossing and interference among different information, and is easier to learn the potential information of different elements.
High operation efficiency: because the first coordinate value on the trajectory, the second coordinate value on the virtual lane line and the third coordinate value on the exit area all take the initial position of the vehicle at the intersection as reference, once an image sample has been generated from the driving data of a trajectory, data generated as the vehicle continues to drive can be supplemented directly into that image sample. There is no need to regenerate the image sample or reconstruct the image coordinates: a sample is generated once and used many times, which reduces operation cost. In addition, the neural network model is simple, occupies little storage, and runs efficiently.
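The "generate once, use many times" property can be illustrated with a sketch like the one below (an assumption, not the patent's stated encoding: fill values here are raw time-step indices, so later points are still larger and earlier pixels never need rewriting):

```python
import numpy as np

def extend_trajectory_channel(channel, new_points, t_start, scale):
    """Draw newly observed trajectory points into an existing channel.

    Because every coordinate takes the trajectory start as its origin, new
    points land directly on the same grid; no resampling or regeneration of
    the image sample is needed. Returns the next free time index.
    """
    h, w = channel.shape
    t = t_start
    for x, y in new_points:
        row = int(round(y * scale)) + h // 2
        col = int(round(x * scale)) + w // 2
        if 0 <= row < h and 0 <= col < w:
            channel[row, col] = t  # later time -> larger value
        t += 1
    return t
```

Each new batch of perception output is written in place, and the returned index seeds the next update.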
Good generalization performance: because data channels are encoded per information type and per exit direction, a corresponding number of data channels can be configured to match the actual intersection, however many exits it has and however simple or complex it is, which is flexible. No hand-designed features are required; the deep logic is learned by the neural network, so a variety of complex scenes can be covered. Moreover, because the data channels are divided by exit direction, the characteristics of each exit direction can be trained accurately, so the driving-intention probability of each exit direction can be predicted. This is equivalent to a classification framework with a variable number of classes, and is therefore applicable to intersections with different numbers of exits.
Good noise resistance: prediction is performed on encoded images, so the method is insensitive to jitter in the vehicle position and is little affected by sensing precision.
Effective handling of the multi-modal problem: exit-intent prediction is abstracted into a binary matching problem, i.e., whether the vehicle matches each exit of the intersection. When the vehicle's intent is not yet apparent, it may match multiple exits, which resolves the multi-modal problem.
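One plausible reading of this binary-matching formulation is a multi-label output head with an independent sigmoid per exit, rather than a softmax over exits; the sketch below is an assumption about the output layer, not the patent's stated architecture:

```python
import numpy as np

def exit_match_probabilities(logits):
    """Per-exit sigmoid: each exit is an independent binary match.

    Unlike a softmax, the probabilities need not sum to 1, so an ambiguous
    vehicle can legitimately match several exits at once (multi-modality).
    """
    z = np.asarray(logits, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-z))
```

With two equally strong exit logits, both probabilities stay high instead of being forced to split mass, which is exactly the multi-modal behaviour described.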
Weak external dependence: the method does not strongly depend on accurate high-precision map lane lines, and retains high prediction precision even if part of the information in the high-precision map is missing. In addition, since most position-related information in the present application takes the starting position of the vehicle at the intersection (i.e., the starting point of the trajectory) as the coordinate origin, it is converted into information that better reflects the running state of the vehicle; performing intention prediction on this basis yields higher accuracy and further reduces the dependence on high-precision maps.
In addition, if the driving speed and/or the driving direction are also brought into the encoding, the steering and movement tendencies of the vehicle can be learned more comprehensively, further improving the accuracy of driving-intention prediction.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium includes various media that can store computer program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (20)

1. A training method of a vehicle intention prediction model is characterized by comprising the following steps:
acquiring a plurality of pieces of driving data, wherein each piece of driving data comprises first reference information and a first coordinate value of a track when a first vehicle drives through an intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
performing rasterization encoding on the multiple pieces of driving data to obtain multiple image samples, wherein different types of data in one piece of driving data and data corresponding to different exit directions are encoded into different data channels in one image sample respectively;
training a vehicle intention prediction model by using a plurality of sample data, wherein each sample data in the plurality of sample data comprises one image sample in the plurality of image samples and one piece of label information, and the one piece of label information is used for representing the real situation of the intention of the first vehicle in the one image sample to travel toward the respective exits.
2. The method of claim 1, wherein the rasterizing encoding the plurality of travel data to obtain a plurality of image samples comprises:
determining a first scaling factor according to the specification of the reference image and the starting point of the track in the first traveling data, wherein the first traveling data is any one of the plurality of pieces of traveling data;
carrying out scaling processing on the first coordinate value of the track in the first running data and the first reference information according to the first scaling factor to obtain a fourth coordinate value and second reference information;
generating an image sample corresponding to the first travel data, wherein the image sample comprises a first data channel and at least one second data channel, a value is filled at the position represented by the fourth coordinate value on the first data channel, and a value is filled at the position represented by the second reference information on the at least one second data channel; on the first data channel, the value filled at the position represented by the fourth coordinate value obtained by scaling the first coordinate value corresponding to a later time is larger; on the second data channel, the value filled at the position represented by the second reference information obtained by scaling the first reference information closer to the exit area or to the center position of the exit area is larger;
if the first reference information comprises the second coordinate value and a third coordinate value, the second reference information comprises a fifth coordinate value and a sixth coordinate value, the fifth coordinate value is obtained by scaling the second coordinate value by the first scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the first scaling factor; wherein the fifth coordinate value and the sixth coordinate value are respectively used for filling data in different second data channels;
and the second reference information obtained by scaling the first reference information corresponding to the different exit directions is respectively used for filling data in different second data channels.
3. The method according to claim 1 or 2, characterized in that:
the driving data further includes a travel speed of the first vehicle on the track, the image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the travel speed;
and/or,
the driving data further includes a plurality of relative driving directions of the first vehicle on the track, and the image sample further includes a plurality of fourth data channels, and the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions, wherein the plurality of relative driving directions are the driving directions relative to the different exit directions of the intersection, respectively.
4. The method according to any of claims 1-3, wherein the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 · max(|P, G_1|, …, |P, G_N|))
wherein scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| represents the distance between the starting point P of the trajectory in the first travel data and the i-th exit area G_i of the intersection;
wherein i is an integer between 1 and N, and N is the number of exit areas of the intersection.
5. The method according to any one of claims 1-4, further comprising:
sending the intent prediction model to a prediction device.
6. A vehicle intention prediction method characterized by comprising:
acquiring an intention prediction model, wherein the intention prediction model is used for predicting the intention of a vehicle to travel to each direction of an intersection;
acquiring target driving data, wherein the target driving data comprises first reference information and a first coordinate value of a track when a second vehicle drives through the intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
rasterizing and encoding the target driving data to obtain a target image sample, wherein different types of data in the target driving data and data corresponding to different exit directions are encoded into different data channels in the target image sample respectively;
and inputting the target image sample into the intention prediction model to obtain the intention probability of the second vehicle driving towards each direction of the intersection.
7. The method of claim 6, wherein obtaining the intent prediction model comprises:
receiving an intention prediction model transmitted by a model training device, wherein the intention prediction model is trained on a plurality of image samples obtained by rasterizing and coding a plurality of pieces of driving data, and the data format of each piece of driving data is the same as that of the target driving data.
8. The method according to claim 6 or 7, wherein the rasterizing encoding the target driving data to obtain target image samples comprises:
determining a second scaling factor according to the specification of the reference image and the starting point of the track in the target driving data;
carrying out scaling processing on the first coordinate value of the track in the target driving data and the first reference information according to the second scaling factor to obtain a fourth coordinate value and second reference information;
generating a target image sample corresponding to the target driving data, wherein the target image sample comprises a first data channel and at least one second data channel, a value is filled at the position represented by the fourth coordinate value on the first data channel, and a value is filled at the position represented by the second reference information on the at least one second data channel;
if the first reference information comprises the second coordinate value and the third coordinate value, the second reference information comprises a fifth coordinate value and a sixth coordinate value, the fifth coordinate value is obtained by scaling the second coordinate value by the second scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the second scaling factor; wherein the fifth coordinate value and the sixth coordinate value are respectively used for filling data in different second data channels;
the second reference information obtained by scaling the first reference information corresponding to different exit directions is respectively used for filling data in different second data channels;
on the first data channel, the value filled at the position represented by the fourth coordinate value obtained by scaling the first coordinate value corresponding to a later time is larger; and on the second data channel, the value filled at the position represented by the second reference information obtained by scaling the first reference information closer to the exit area or to the center position of the exit area is larger.
9. The method according to any one of claims 6-8, wherein:
the driving data further includes a travel speed of the second vehicle on the track, the target image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the travel speed;
and/or,
the driving data further includes a plurality of relative driving directions of the second vehicle on the track, and the image sample further includes a plurality of fourth data channels, and the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions, wherein the plurality of relative driving directions are the driving directions relative to the different exit directions of the intersection, respectively.
10. An image data processing method characterized by comprising:
acquiring a plurality of pieces of driving data, wherein each piece of driving data comprises first reference information and a first coordinate value of a track when a first vehicle drives through an intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
performing rasterization encoding on the multiple pieces of driving data to obtain a plurality of image samples, wherein different types of data in one piece of driving data and data corresponding to different exit directions are encoded into different data channels in one image sample respectively;
and sending a plurality of sample data to a model training device, wherein each sample data in the plurality of sample data comprises one image sample in the plurality of image samples and one piece of label information, and the one piece of label information is used for representing the real situation of the intention of the first vehicle in the one image sample to travel toward the respective exits.
11. A training apparatus for a vehicle intention prediction model, comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of pieces of driving data, each piece of driving data comprises first reference information and a first coordinate value of a track when a first vehicle drives through a target intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
the encoding unit is used for carrying out rasterization encoding on the plurality of pieces of driving data to obtain a plurality of image samples, wherein different types of data in one piece of driving data and data corresponding to different exit directions are encoded into different data channels in one image sample respectively;
and the training unit is used for training a vehicle intention prediction model through a plurality of sample data, wherein each sample data in the plurality of sample data comprises one image sample in the plurality of image samples and one piece of label information, and the one piece of label information is used for representing the real situation of the intention of the first vehicle in the one image sample to travel toward the respective exits.
12. The apparatus according to claim 11, wherein, in rasterizing the plurality of pieces of travel data to obtain a plurality of image samples, the encoding unit is specifically configured to:
determining a first scaling factor according to the specification of the reference image and the starting point of the track in the first traveling data, wherein the first traveling data is any one of the plurality of pieces of traveling data;
carrying out scaling processing on the first coordinate value of the track in the first running data and the first reference information according to the first scaling factor to obtain a fourth coordinate value and second reference information;
generating an image sample corresponding to the first travel data, wherein the image sample comprises a first data channel and at least one second data channel, a value is filled at the position represented by the fourth coordinate value on the first data channel, and a value is filled at the position represented by the second reference information on the at least one second data channel; on the first data channel, the value filled at the position represented by the fourth coordinate value obtained by scaling the first coordinate value corresponding to a later time is larger; on the second data channel, the value filled at the position represented by the second reference information obtained by scaling the first reference information closer to the exit area or to the center position of the exit area is larger;
if the first reference information comprises the second coordinate value and a third coordinate value, the second reference information comprises a fifth coordinate value and a sixth coordinate value, the fifth coordinate value is obtained by scaling the second coordinate value by the first scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the first scaling factor; wherein the fifth coordinate value and the sixth coordinate value are respectively used for filling data in different second data channels;
and the second reference information obtained by scaling the first reference information corresponding to the different exit directions is respectively used for filling data in different second data channels.
13. The apparatus according to claim 11 or 12, wherein:
the driving data further includes a travel speed of the first vehicle on the track, the image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the travel speed;
and/or,
the driving data further includes a plurality of relative driving directions of the first vehicle on the track, and the image sample further includes a plurality of fourth data channels, and the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions, wherein the plurality of relative driving directions are the driving directions relative to the different exit directions of the intersection, respectively.
14. The apparatus according to any of claims 11-13, wherein the first scaling factor satisfies the following relationship:
scale = min(h, w) / (2 · max(|P, G_1|, …, |P, G_N|))
wherein scale is the first scaling factor, h is the height of the reference image, w is the width of the reference image, and |P, G_i| represents the distance between the starting point P of the trajectory in the first travel data and the i-th exit area G_i of the intersection;
wherein i is an integer between 1 and N, and N is the number of exit areas of the intersection.
15. The apparatus of any one of claims 11-14, further comprising:
a transmitting unit for transmitting the intention prediction model to a prediction apparatus.
16. A vehicle intention prediction apparatus characterized by comprising:
a first acquisition unit configured to acquire an intention prediction model for predicting an intention of a vehicle to travel in each direction of an intersection;
a second acquisition unit configured to acquire target travel data, wherein the target travel data includes first reference information and a first coordinate value of a trajectory when a second vehicle travels through the intersection, and wherein the first reference information includes a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
the encoding unit is used for carrying out rasterization encoding on the target driving data to obtain a target image sample, wherein different types of data in the target driving data and data corresponding to different exit directions are respectively encoded into different data channels in the target image sample;
and the prediction unit is used for inputting the target image sample into the intention prediction model to obtain the intention probability of the second vehicle driving towards each direction of the intersection.
17. The apparatus according to claim 16, wherein, in rasterizing the target driving data to obtain target image samples, the encoding unit is specifically configured to:
determining a second scaling factor according to the specification of the reference image and the starting point of the track in the target driving data;
carrying out scaling processing on the first coordinate value of the track in the target driving data and the first reference information according to the second scaling factor to obtain a fourth coordinate value and second reference information;
generating a target image sample corresponding to the target driving data, wherein the target image sample comprises a first data channel and at least one second data channel, a value is filled at the position represented by the fourth coordinate value on the first data channel, and a value is filled at the position represented by the second reference information on the at least one second data channel;
on the first data channel, the value filled at the position represented by the fourth coordinate value obtained by scaling the first coordinate value corresponding to a later time is larger; on the second data channel, the value filled at the position represented by the second reference information obtained by scaling the first reference information closer to the exit area or to the center position of the exit area is larger;
if the first reference information comprises the second coordinate value and the third coordinate value, the second reference information comprises a fifth coordinate value and a sixth coordinate value, the fifth coordinate value is obtained by scaling the second coordinate value by the second scaling factor, and the sixth coordinate value is obtained by scaling the third coordinate value by the second scaling factor; wherein the fifth coordinate value and the sixth coordinate value are respectively used for filling data in different second data channels;
and the second reference information obtained by scaling the first reference information corresponding to different exit directions is respectively used for filling data in different second data channels.
18. The apparatus of claim 16 or 17, wherein:
the driving data further includes a travel speed of the second vehicle on the track, the target image sample further includes a third data channel, and the position represented by the fourth coordinate value on the third data channel is filled with the travel speed;
and/or,
the driving data further includes a plurality of relative driving directions of the second vehicle on the track, and the image sample further includes a plurality of fourth data channels, and the position represented by the fourth coordinate value on each fourth data channel is filled with information related to one of the relative driving directions, wherein the plurality of relative driving directions are the driving directions relative to the different exit directions of the intersection, respectively.
19. An image data processing apparatus characterized by comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of pieces of driving data, each piece of driving data comprises first reference information and a first coordinate value of a track when a first vehicle drives through a target intersection, and the first reference information comprises a second coordinate value of a virtual lane line in each exit direction of the intersection and/or a third coordinate value of an exit area in each exit direction of the intersection; the origin of the coordinate system where the second coordinate value of the virtual lane line is located and the origin of the coordinate system where the third coordinate value of the exit area is located are both the starting points of the track;
the encoding unit is used for carrying out rasterization encoding on the plurality of pieces of driving data to obtain a plurality of image samples, wherein different types of data in one piece of driving data and data corresponding to different exit directions are encoded into different data channels in one image sample respectively;
and the sending unit is used for sending a plurality of sample data to the model training device, wherein each sample data in the plurality of sample data comprises one image sample in the plurality of image samples and one piece of label information, and the one piece of label information is used for representing the real situation of the intention of the first vehicle in the one image sample to travel toward the respective exits.
20. A computer-readable storage medium, in which a computer program is stored which, when run on a processor, implements the method of any one of claims 1-10.
CN202011045331.9A 2020-09-28 2020-09-28 Vehicle intention prediction method and related device Active CN114283576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011045331.9A CN114283576B (en) 2020-09-28 2020-09-28 Vehicle intention prediction method and related device


Publications (2)

Publication Number Publication Date
CN114283576A true CN114283576A (en) 2022-04-05
CN114283576B CN114283576B (en) 2023-03-31

Family

ID=80868494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011045331.9A Active CN114283576B (en) 2020-09-28 2020-09-28 Vehicle intention prediction method and related device

Country Status (1)

Country Link
CN (1) CN114283576B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970819A (en) * 2022-05-26 2022-08-30 哈尔滨工业大学 Moving target searching and tracking method and system based on intention reasoning and deep reinforcement learning
CN115083162A (en) * 2022-06-15 2022-09-20 北京三快在线科技有限公司 Road condition prediction method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190049970A1 (en) * 2017-08-08 2019-02-14 Uber Technologies, Inc. Object Motion Prediction and Autonomous Vehicle Control
US20190129436A1 (en) * 2017-10-28 2019-05-02 TuSimple System and method for real world autonomous vehicle trajectory simulation
CN109829351A (en) * 2017-11-23 2019-05-31 华为技术有限公司 Detection method, device and the computer readable storage medium of lane information
CN110400490A (en) * 2019-08-08 2019-11-01 腾讯科技(深圳)有限公司 Trajectory predictions method and apparatus
CN111046919A (en) * 2019-11-21 2020-04-21 南京航空航天大学 Peripheral dynamic vehicle track prediction system and method integrating behavior intents
US20200269841A1 (en) * 2019-02-21 2020-08-27 Baidu Online Network Technology (Beijing) Co., Ltd. Information processing method and apparatus, and storage medium
CN111595352A (en) * 2020-05-14 2020-08-28 陕西重型汽车有限公司 Track prediction method based on environment perception and vehicle driving intention
CN111639591A (en) * 2020-05-28 2020-09-08 深圳地平线机器人科技有限公司 Trajectory prediction model generation method and device, readable storage medium and electronic equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIA Zhuoqun et al.: "EAVTP: An Environment-Adaptive Vehicle Trajectory Prediction Method", Journal of Chinese Computer Systems (《小型微型计算机系统》) *
LU Chuanwei et al.: "A Road Learning Extraction Method from Vehicle Trajectory Data", Acta Geodaetica et Cartographica Sinica (《测绘学报》) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970819A (en) * 2022-05-26 2022-08-30 哈尔滨工业大学 Moving target searching and tracking method and system based on intention reasoning and deep reinforcement learning
CN115083162A (en) * 2022-06-15 2022-09-20 北京三快在线科技有限公司 Road condition prediction method, device, equipment and storage medium
CN115083162B (en) * 2022-06-15 2023-11-17 北京三快在线科技有限公司 Road condition prediction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114283576B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
US20200302662A1 (en) System and Methods for Generating High Definition Maps Using Machine-Learned Models to Analyze Topology Data Gathered From Sensors
US20230127115A1 (en) Three-Dimensional Object Detection
US11794785B2 (en) Multi-task machine-learned models for object intention determination in autonomous driving
US11715012B2 (en) Feature compression and localization for autonomous devices
US11475675B2 (en) Systems and methods for identifying unknown instances
EP4152204A1 (en) Lane line detection method, and related apparatus
US20200379461A1 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
CN114845913A (en) Top-down scene prediction based on object motion
JP2022539245A (en) Top-down scene prediction based on action data
US20220214457A1 (en) Three-Dimensional Object Detection
EP3822852B1 (en) Method, apparatus, computer storage medium and program for training a trajectory planning model
US11891087B2 (en) Systems and methods for generating behavioral predictions in reaction to autonomous vehicle movement
US20220261601A1 (en) Multiple Stage Image Based Object Detection and Recognition
JPWO2021007106A5 (en)
CN114283576B (en) Vehicle intention prediction method and related device
US20220032452A1 (en) Systems and Methods for Sensor Data Packet Processing and Spatial Memory Updating for Robotic Platforms
CN114787739A (en) Smart body trajectory prediction using vectorized input
US11367289B1 (en) Machine learning-based framework for drivable surface annotation
CN115146873A (en) Vehicle track prediction method and system
CN116737857A (en) Road data processing method, related device and medium
CN115520223A (en) Vehicle track prediction method based on driver interaction behavior characteristics in internet-connected environment
CN114516336A (en) Vehicle track prediction method considering road constraint conditions
CN113888601B (en) Target trajectory prediction method, electronic device, and storage medium
US11886199B2 (en) Multi-scale driving environment prediction with hierarchical spatial temporal attention
US20240104934A1 (en) Training a codebook for trajectory determination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant