CN114581870A - Trajectory planning method, apparatus, device and computer-readable storage medium - Google Patents

Trajectory planning method, apparatus, device and computer-readable storage medium

Info

Publication number
CN114581870A
Authority
CN
China
Prior art keywords
moment
feature matrix
future
matrix corresponding
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210216630.7A
Other languages
Chinese (zh)
Inventor
陈立
胡胜超
李弘扬
李阳
严骏驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai AI Innovation Center
Original Assignee
Shanghai AI Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai AI Innovation Center
Priority to CN202210216630.7A
Publication of CN114581870A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Traffic Control Systems (AREA)

Abstract

The embodiment of the application discloses a trajectory planning method, apparatus, device and computer-readable storage medium. The method comprises the following steps: acquiring images of a plurality of different angles for a target vehicle and navigation instruction information at the current moment; extracting and fusing features of the images at the different angles, and determining a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment; performing network prediction according to the feature matrix corresponding to the current moment to determine a first feature matrix, and performing network prediction according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment to obtain a second feature matrix; fusing the first feature matrix and the second feature matrix to obtain a feature matrix corresponding to the future moment; performing object segmentation according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment to obtain an object segmentation result; and performing trajectory planning according to the object segmentation result and the navigation instruction information to obtain a driving trajectory at the future moment.

Description

Trajectory planning method, apparatus, device and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a trajectory planning method, apparatus, device, and computer-readable storage medium.
Background
With the rapid development of artificial intelligence, automatic driving has become possible. An automatic driving system predicts the trajectories of surrounding vehicles, pedestrians and the like from a large amount of image data, perceives dangers, and plans the driving trajectory of the autonomous vehicle in advance, which is of great significance in fields such as transportation.
In the prior art, most vehicle trajectory prediction schemes use laser radar (lidar) to detect point cloud data and combine it with a high-precision map, which provides rich traffic information such as lanes, sidewalks and traffic lights, so as to predict the driving trajectory of the vehicle.
However, constructing a high-precision map requires a large amount of manpower and material resources, and the map cannot be updated in real time. Because the vehicle trajectory planning schemes in the prior art depend on the point cloud data detected by the laser radar and on the high-precision map, trajectory planning efficiency is reduced.
Disclosure of Invention
The embodiments of the present application provide a trajectory planning method, apparatus, device and computer-readable storage medium. Information around the vehicle is extracted, future information is predicted from the current information, segmentation according to the current and future information yields various objects of different types around the vehicle, and trajectory planning is performed in combination with navigation instructions, so that an end-to-end prediction process is completed and trajectory planning efficiency is improved.
The technical scheme of the embodiment of the application is realized as follows:
In a first aspect, an embodiment of the present application provides a trajectory planning method, where the method includes: acquiring images of a plurality of different angles of a target vehicle and navigation instruction information at the current moment; extracting and fusing the features of the images at the different angles, and determining a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment; performing network prediction according to the feature matrix corresponding to the current moment to determine a first feature matrix representing uncertainty, and performing network prediction according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment to obtain a second feature matrix; fusing the first feature matrix and the second feature matrix to obtain a feature matrix corresponding to the future moment; performing object segmentation according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment to obtain an object segmentation result; and performing trajectory planning according to the object segmentation result and the navigation instruction information to obtain a driving trajectory at the future moment.
In a second aspect, an embodiment of the present application provides a trajectory planning apparatus, where the apparatus includes: an acquisition module, configured to acquire images of a plurality of different angles of a target vehicle and navigation instruction information at the current moment; a determining module, configured to extract and fuse the features of the images at the different angles, and determine a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment; a prediction module, configured to perform network prediction according to the feature matrix corresponding to the current moment to determine a first feature matrix representing uncertainty, perform network prediction according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment to obtain a second feature matrix, and fuse the first feature matrix and the second feature matrix to obtain a feature matrix corresponding to the future moment; a segmentation module, configured to perform object segmentation according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment to obtain an object segmentation result; and a planning module, configured to perform trajectory planning according to the object segmentation result and the navigation instruction information to obtain a driving trajectory at the future moment.
In a third aspect, an embodiment of the present application provides a trajectory planning device, where the device includes a memory for storing executable instructions, and a processor for implementing the trajectory planning method when executing the executable instructions stored in the memory.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which executable instructions are stored, and when the executable instructions are executed by a processor, the trajectory planning method is implemented.
The embodiment of the application provides a trajectory planning method, apparatus, device and computer-readable storage medium. According to the scheme provided by the embodiment of the application, images of a plurality of different angles for a target vehicle and navigation instruction information at the current moment are acquired; the images at different angles are subjected to feature extraction and fusion, and a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment are determined, where the feature matrix at each moment contains all the information around the target vehicle, improving the comprehensiveness of the information around the target vehicle expressed by the feature matrices. Network prediction is performed according to the feature matrix corresponding to the current moment to determine a first feature matrix representing uncertainty, and network prediction is performed according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment to obtain a second feature matrix; a feature matrix corresponding to the future moment is obtained by fusion based on the first feature matrix and the second feature matrix. Object segmentation is performed according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment to obtain an object segmentation result; the object segmentation result represents various objects of different classes around the target vehicle, so that these objects can be avoided when planning the trajectory and the target vehicle is prevented from colliding with them. Trajectory planning is performed according to the object segmentation result and the navigation instruction information to obtain a driving trajectory at the future moment; the navigation instruction information reflects the path to be driven by the target vehicle at the future moment, and trajectory planning is performed in combination with the objects around the target vehicle, completing an end-to-end prediction process and improving trajectory planning efficiency.
Drawings
Fig. 1 is an optional flowchart of a trajectory planning method according to an embodiment of the present application;
Fig. 2 is an optional flowchart of another trajectory planning method according to an embodiment of the present application;
Fig. 3 is an optional flowchart of another trajectory planning method according to an embodiment of the present application;
Fig. 4A is an optional schematic diagram of future prediction according to an embodiment of the present application;
Fig. 4B is an optional schematic diagram of future prediction according to an embodiment of the present application;
Fig. 4C is an optional schematic diagram of future prediction according to an embodiment of the present application;
Fig. 5 is an optional flowchart of another trajectory planning method according to an embodiment of the present application;
Fig. 6 is an optional flowchart of another trajectory planning method according to an embodiment of the present application;
Fig. 7 is an optional schematic diagram of a trajectory plan according to an embodiment of the present application;
Fig. 8 is an optional schematic diagram of another trajectory plan according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a trajectory planning apparatus according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a trajectory planning apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It should be understood that some of the embodiments described herein are only for explaining the technical solutions of the present application, and are not intended to limit the technical scope of the present application.
For the convenience of understanding of the present invention, prior to the description of the embodiments of the present application, the related art in the embodiments of the present application will be described.
In the field of automatic driving, the following two schemes, modular design and end-to-end design, are generally adopted in trajectory planning. Wherein the modular design comprises three parts: perception-prediction-planning control. Compared with a modular design, the end-to-end design can avoid the problems of cascading errors, information loss and the like, the planning control is taken as an ultimate problem of automatic driving, the end-to-end can directly focus on the planning control, and the experience of a terminal user is fully optimized. In the related technology, end-to-end perception decision integration is mainly established by relying on a laser radar and a high-precision map, the characteristics are extracted through a depth network by inputting point cloud data obtained through laser radar detection, the possibility of vehicles at each position in a bird's-eye view is judged, and meanwhile, a safe route which accords with traffic rules and does not collide with other vehicles is planned by adding road surface information given by the high-precision map. However, the laser radar is difficult to actually put into production due to the expensive cost, a large amount of manpower and material resources are needed for constructing the high-precision map, and once the road condition changes, the road condition information needs to be collected again, so that the accuracy of providing the road information by the high-precision map is reduced.
Based on this, the trajectory planning scheme provided by the embodiment of the application adopts a pure visual input method to construct the vehicle information and the road surface condition around the vehicle at the current moment in real time, and uses the vehicle information and the road surface condition for a final trajectory planning task. Meanwhile, a high-precision map does not need to be constructed in advance, the track planning efficiency is improved on the premise of ensuring the safety of the finally planned route, and the extra cost is reduced.
An embodiment of the present application provides a trajectory planning method, as shown in fig. 1, where fig. 1 is an optional flowchart of the trajectory planning method provided in the embodiment of the present application, and the trajectory planning method includes the following steps:
s101, acquiring images of a plurality of different angles of the target vehicle and navigation instruction information at the current moment.
In the embodiment of the application, the target vehicle is a vehicle whose trajectory is to be planned, and a plurality of acquisition devices are mounted on the target vehicle to acquire images of different viewing angles around the vehicle, thereby obtaining images of different angles for the target vehicle. Illustratively, the plurality of acquisition devices may be surround-view cameras.
In the embodiment of the application, the navigation instruction information is determined according to the position of the current target vehicle and the navigation route, wherein the navigation route can be generated by the departure place and the destination input by the user. The navigation instruction information at the current time represents a path or a direction in which the target vehicle is to travel at the next time, for example, turning left, going straight, turning right, and the like.
S102, extracting and fusing the features of the images at different angles, and determining a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment.
In the embodiment of the application, the images at different angles are two-dimensional images, the features of the two-dimensional images need to be extracted, and the features of the two-dimensional images are fused according to the position relationship between each image and the target vehicle, so that the three-dimensional image features are obtained. The three-dimensional image features can be represented in a vector form and represent the spatial position information of each object. In trajectory planning, it is necessary to avoid physical collision between the target vehicle and another object, and it is essential to consider positional overlap on a two-dimensional plane in a bird's eye view (top view). Flattening the three-dimensional image features in the form of a bird's eye view may also be understood as compressing the three-dimensional image features in height, thereby converting them into two-dimensional image features. The two-dimensional image features are different from the two-dimensional image features of the plurality of different angles, represent plane position information of each object, and are complete, rich and comprehensive. The two-dimensional image features of the plurality of different angles represent the spatial position information of the object in different visual angles, and are local and one-sided.
In the embodiment of the application, when the feature matrix corresponding to each moment is determined, the steps of feature extraction, fusion and flattening are performed on the image. Illustratively, images of a plurality of different angles around the target vehicle at the moment are acquired, and the two-dimensional image feature at the moment is obtained through the steps of feature extraction, fusion and flattening. And then fusing the two-dimensional image characteristics at the historical moment and the two-dimensional image characteristics at the current moment to obtain a characteristic matrix corresponding to the current moment and a characteristic matrix corresponding to the historical moment. The feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment can both represent the plane position information of each object under the bird's-eye view at the moment, and can be understood as the probability that each point is a vehicle, a pedestrian, a lane line, a driving area and the like.
S103, performing network prediction according to the feature matrix corresponding to the current moment, determining a first feature matrix representing uncertainty, and performing network prediction according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment to obtain a second feature matrix; and fusing to obtain a feature matrix corresponding to the future moment based on the first feature matrix and the second feature matrix.
The trajectory planning method provided by the embodiment of the application adopts a pure visual input method to construct the vehicle information and the road surface condition around the vehicle at the current moment in real time, and needs to predict the feature matrix at the future moment based on the image at the current moment and the image at the historical moment.
In the embodiment of the present application, the historical time represents a time before a preset time period from the current time, and the historical time may include one or more times before the current time. For example, when the history time includes one time, the history time is the previous time of the current time, the current time is time t, and the history time is time (t-1). When the historical time includes a plurality of times, the current time is time t, the historical time includes time (t-1), time (t-2), time (t-3), time (t-4), and the like, and the number of times included in the historical time is not limited in the embodiment of the application.
In the embodiment of the present application, the future time represents a time after a preset time period from the current time, and the future time may include one or more times. For example, when the future time includes one time, the future time is the next time of the current time, the current time is time t, and the future time is time (t + 1). When the future time includes a plurality of times, the current time is the time t, the future time includes the time (t +1), the time (t +2), the time (t +3), the time (t +4), and the like, and the embodiment of the present application does not limit the number of times included in the future time.
It should be noted that, in the embodiment of the present application, the difference of 1 between the current moment and the previous moment represents one time step, and the time step may be set as appropriate by a person skilled in the art according to the actual situation; for example, the step may be 0.01 s, 0.1 s, 1 s, or 5 s, which is not limited in this embodiment of the application.
In the embodiment of the present application, the future prediction may be performed by a preset time series model. The predetermined time series model may be understood as a machine learning model, and may be any suitable Neural Network (NN) model that can be used for future prediction of the feature matrix, including but not limited to: a Gated Recurrent Unit (GRU), a Recurrent Neural Network (RNN), a Long Short-Term Memory Neural Network (LSTM), and the like.
Illustratively, taking the time series model as LSTM as an example, three gate functions are introduced into LSTM: the input gate, the forgetting gate and the output gate are used for controlling the input value, the memory value and the output value. The input of the LSTM comprises a characteristic matrix corresponding to the current time (t), a hidden layer state of the previous time (t-1) and a characteristic prediction matrix corresponding to the current time (t) predicted by the previous time (t-1). And outputting a characteristic prediction matrix corresponding to the next time (t +1) predicted by the current time (t).
Illustratively, the time series model is a GRU, which may be understood as a variant of LSTM. The GRU includes two gate functions, a reset gate and an update gate. The updating door is used for controlling the degree of the state information of the previous moment brought into the current state, and the larger the value of the updating door is, the more the state information of the previous moment is brought; the reset gate is used to control the extent to which the state information at the previous time is ignored, and the smaller the value of the reset gate, the more it is ignored. Resetting the gate helps to capture short term dependencies in the time series; the update gate helps to capture long term dependencies in the time series. The input of the GRU comprises a feature matrix corresponding to the current time (t) and the hidden layer state of the previous time (t-1), the hidden layer state of the current time (t) is output, and then the hidden layer state of the current time (t) is decoded to obtain the feature matrix corresponding to the next time (t + 1).
In the embodiment of the application, after the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment are obtained, prediction is carried out in two paths, network prediction is carried out on one path according to the feature matrix corresponding to the current moment, uncertainty is introduced, and a first feature matrix is obtained. And the other path carries out network prediction according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment, and takes historical information into consideration to obtain a second feature matrix. And then, based on the first feature matrix and the second feature matrix, a feature matrix corresponding to the future time is obtained through fusion, so that the feature matrix corresponding to the future time takes the historical information into consideration, the stability is realized, the future uncertainty is introduced, and the accuracy of the feature matrix corresponding to the future time is improved.
And S104, carrying out object segmentation according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment to obtain an object segmentation result.
In the embodiment of the present application, after obtaining the feature matrix corresponding to the current time and the feature matrix corresponding to the future time, the following task processing may be performed, where the task includes, but is not limited to, Semantic segmentation (Semantic segmentation), instance segmentation (instance segmentation), Cost Function (Cost Function), and high-precision map.
For example, semantic segmentation refers to gathering image portions belonging to the same target together, and it is required to predict a label to which each pixel of an input image belongs. Semantic segmentation is used to obtain pixel points belonging to the same class of objects, for example, the types of objects in an image include cars, pedestrians, and signal lights, and the semantic segmentation is used to distinguish which pixel points belong to cars, which pixel points belong to pedestrians, and which pixel points belong to signal lights.
Illustratively, the example segmentation means that different examples are automatically framed out of the image by an Object detection (Object detection) method, and pixel-by-pixel labeling is performed in different example areas by a semantic segmentation method. The example segmentation is to distinguish different individuals of the same class on the basis of semantic segmentation, and is used for obtaining pixel points belonging to the same object, for example, one image includes two cars, and the example segmentation is used for distinguishing which pixel points belong to a first car and which pixel points belong to a second car.
For example, the cost function can be understood as a matrix for evaluating the quality of a track, and the better a track is, the lower the cost is. The cost matrix may be understood as a cost associated with each point constituting the trajectory in a coordinate system centered on the position of the target vehicle, the cost of the trajectory being the sum of the costs of the points at the corresponding positions. The cost function of the surroundings of the target vehicle can be automatically generated by a learning method.
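As a purely illustrative sketch (not part of the disclosed embodiments), the following Python snippet sums per-cell costs of a bird's-eye-view cost map along a candidate trajectory; the grid resolution, ego origin and candidate trajectory are assumed values used only to show how a trajectory's cost can be evaluated as the sum of the costs of its points.

```python
import numpy as np

def trajectory_cost(cost_map, trajectory, resolution=0.5, origin=(100, 100)):
    """Sum the per-cell costs of a learned bird's-eye-view cost map along a trajectory.

    cost_map   : (H, W) array, one cost per BEV cell centered on the ego vehicle
    trajectory : (N, 2) array of (x, y) points in meters, ego-centered
    resolution : meters per BEV cell (assumed value)
    origin     : (row, col) of the ego vehicle in the cost map (assumed value)
    """
    rows = (origin[0] - trajectory[:, 1] / resolution).astype(int)
    cols = (origin[1] + trajectory[:, 0] / resolution).astype(int)
    rows = np.clip(rows, 0, cost_map.shape[0] - 1)
    cols = np.clip(cols, 0, cost_map.shape[1] - 1)
    return float(cost_map[rows, cols].sum())

# The lowest-cost candidate trajectory would then be preferred when planning.
cost_map = np.random.rand(200, 200)
candidate = np.stack([np.linspace(0, 10, 20), np.zeros(20)], axis=1)  # straight ahead
print(trajectory_cost(cost_map, candidate))
```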
In the embodiment of the application, the object segmentation comprises semantic segmentation and instance segmentation. The description will be given by taking semantic segmentation as an example, and when performing semantic segmentation, the semantic segmentation can be realized through a semantic segmentation network to obtain a plurality of pixel sets belonging to different categories. The semantic segmentation network may be any type of classification network, and the embodiment of the present application does not limit the structure of the adopted classification network as long as it can perform example segmentation on pixel points, including but not limited to Full Convolution Networks (FCNs), AlexNet, VGG-Net, google lenet, U-Net, SegNet, deep lab, and PSPNet.
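For illustration only, the sketch below shows a minimal bird's-eye-view semantic-segmentation head; the channel count, number of classes and layer arrangement are assumptions and do not correspond to any specific network listed above.

```python
import torch
import torch.nn as nn

# Minimal BEV semantic-segmentation head (channel and class counts are assumptions):
# maps a per-moment feature matrix to per-cell class scores such as
# vehicle / pedestrian / lane line / drivable area / background.
segmentation_head = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 5, 1),
)
logits = segmentation_head(torch.randn(1, 64, 200, 200))
print(logits.softmax(dim=1).shape)  # per-cell class probabilities, (1, 5, 200, 200)
```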
In the embodiment of the application, when the object is segmented, the probability that each point at the current moment is a vehicle, a pedestrian, a lane line, a driving area and the like is considered, the probability that each point at the future moment is a vehicle, a pedestrian, a lane line, a driving area and the like is also considered, the accuracy of the object segmentation result is improved, and therefore a basis is provided for avoiding points where a target vehicle is likely to collide, and a safe route where the target vehicle does not collide with other objects is planned when a track is planned.
And S105, planning a track according to the object segmentation result and the navigation instruction information to obtain a driving track at a future moment.
In the embodiment of the present application, the navigation instruction information merely indicates the action or direction at the next moment and cannot provide the target vehicle with a specific driving trajectory, for example, which lane the vehicle travels in when turning right, or the type of curve the vehicle follows (S-shaped, C-shaped, Bézier curve), and the like. Therefore, the driving trajectory at the future moment is obtained by prediction according to the navigation instruction information and the object segmentation result; this driving trajectory can be understood as a trajectory in a real scene, a curve that better conforms to human driving habits, for example, a small-amplitude turning curve to avoid a vehicle or a pedestrian rather than a theoretically perfect curve, on the premise that the driving direction is not changed.
According to the scheme provided by the embodiment of the application, images of a plurality of different angles for a target vehicle and navigation instruction information at the current moment are acquired; the images at different angles are subjected to feature extraction and fusion, and a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment are determined, where the feature matrix at each moment contains all the information around the target vehicle, improving the comprehensiveness of the information around the target vehicle expressed by the feature matrices. Network prediction is performed according to the feature matrix corresponding to the current moment to determine a first feature matrix representing uncertainty, and network prediction is performed according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment to obtain a second feature matrix; a feature matrix corresponding to the future moment is obtained by fusion based on the first feature matrix and the second feature matrix. Object segmentation is performed according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment to obtain an object segmentation result; the object segmentation result represents various objects of different classes around the target vehicle, so that these objects can be avoided when planning the trajectory and the target vehicle is prevented from colliding with them. Trajectory planning is performed according to the object segmentation result and the navigation instruction information to obtain a driving trajectory at the future moment; the navigation instruction information reflects the path to be driven by the target vehicle at the future moment, and trajectory planning is performed in combination with the objects around the target vehicle, completing an end-to-end prediction process and improving trajectory planning efficiency.
In some embodiments, the above S102 may include the following steps. As shown in fig. 2, fig. 2 is an optional flowchart of another trajectory planning method provided in the embodiment of the present application.
And S1021, acquiring an internal reference matrix and an external reference matrix of an acquisition device, wherein the acquisition device is arranged on the target vehicle and is used for acquiring images at a plurality of different angles.
In the embodiment of the application, the process of camera imaging is the transformation of three-dimensional space coordinates to two-dimensional image coordinates, which is a projection process. A camera matrix (camera matrix) is a projection relationship that establishes a three-dimensional to two-dimensional relationship. The camera matrix comprises an internal reference matrix and an external reference matrix, wherein the internal reference matrix (internal matrix) is related to the cameras and comprises a focal length, a principal point coordinate position relative to an imaging plane, a coordinate axis inclination parameter and a distortion parameter, the internal reference matrix reflects the attributes of the cameras, and the internal reference matrix of each camera is different. The Extrinsic matrix (Extrinsic matrix) depends on the position of the camera in the world coordinate system and includes a rotation matrix and a translation matrix, which together describe how to convert points from the world coordinate system to the camera coordinate system. The rotation matrix describes the orientation of the coordinate axes of the world coordinate system relative to the coordinate axes of the camera coordinate system, and the translation matrix describes the position of the spatial origin under the camera coordinate system. The number of the acquisition devices can be multiple and the acquisition devices are arranged at different positions on the target vehicle so as to acquire images at different angles.
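The following sketch illustrates, with placeholder intrinsic and extrinsic values, how such a camera matrix projects a point from the world (ego) coordinate system to pixel coordinates; the focal lengths, principal point and pose are assumptions, not parameters of any actual acquisition device.

```python
import numpy as np

# Intrinsic matrix K (focal lengths fx, fy and principal point cx, cy are placeholders).
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

# Extrinsics: rotation R and translation t mapping world (ego) coordinates to camera coordinates.
R = np.eye(3)
t = np.array([0.0, 0.0, 0.0])

def project(point_world):
    """Project a 3-D point in the world/ego frame to pixel coordinates."""
    p_cam = R @ point_world + t   # world coordinate system -> camera coordinate system
    uvw = K @ p_cam               # camera coordinate system -> image plane
    return uvw[:2] / uvw[2]       # normalize by depth to obtain the pixel (u, v)

print(project(np.array([2.0, 0.5, 10.0])))  # a point 10 m in front of the camera
```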
And S1022, performing feature extraction on the images at different angles to obtain a plurality of image features.
In the embodiment of the application, when the image is subjected to feature extraction, the image can be extracted through a feature extraction network. The feature extraction network may be a backbone network of Convolutional Neural Networks (CNNs), including but not limited to ResNet50, ResNet101, ResNet152, and Res2Net, among others. The embodiments of the present application are not limited thereto.
In the embodiment of the application, the images at the various angles are respectively subjected to feature extraction to obtain a plurality of image features, and the plurality of image features represent image features corresponding to the images at the various angles.
And S1023, respectively carrying out depth prediction on the images of all the angles according to the plurality of image characteristics to obtain the image depth information of all the angles.
In the embodiment of the present application, the three-dimensional image feature is a 3D (three-dimensional) feature, which may be represented by a four-dimensional vector (x, y, z, C), where x, y, z represent the three-dimensional position coordinates of an object in space, and C represents a free vector, which may be set by a person skilled in the art according to the actual situation. x or y may represent depth, which refers to how far an object is from the acquisition device (e.g., the camera) in space; a greater object depth means the object is farther from the camera. z represents height.
In the embodiment of the present application, a Depth prediction model (Depth Predictor) is used to perform Depth prediction on an image at each angle according to image features (i.e., multiple image features) at each angle, so as to obtain image Depth information at each angle. The depth prediction model is used to estimate the depth distribution of the image. The depth prediction model may be understood as a machine learning model, and may be any suitable convolutional neural network including multiple convolutional layers, which can be used for depth prediction according to image features, and the specific structure of the convolutional neural network used in the embodiments of the present application is not limited.
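As a rough illustration of such a depth prediction model, the sketch below uses a small convolutional head that outputs a categorical distribution over depth bins for every pixel; the channel count, number of depth bins and layer structure are assumptions rather than the specific structure used in the embodiments.

```python
import torch
import torch.nn as nn

class DepthPredictor(nn.Module):
    """Toy depth head: predicts a categorical distribution over D depth bins per pixel.

    Channel count, number of bins and layer sizes are illustrative assumptions,
    not values from the patent.
    """
    def __init__(self, in_channels=256, depth_bins=48):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, depth_bins, 1),
        )

    def forward(self, image_features):       # (B, C, H, W) image features
        logits = self.head(image_features)   # (B, D, H, W)
        return logits.softmax(dim=1)         # depth distribution per pixel

depth = DepthPredictor()(torch.randn(1, 256, 28, 60))
print(depth.shape)  # torch.Size([1, 48, 28, 60])
```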
And S1024, carrying out coordinate conversion according to the image characteristics and the image depth information of each angle by taking the position of the target vehicle as a center through the internal reference matrix and the external reference matrix of the acquisition device to obtain the three-dimensional image characteristics.
In the embodiment of the present application, there are four coordinate systems in the camera, which are respectively a world coordinate system (world), a camera coordinate system (camera), an image coordinate system (image), and a pixel coordinate system (pixel), and the imaging process is a process of the world coordinate system-the camera coordinate system-the image coordinate system-the pixel coordinate system. The external reference matrix is a description (or a position posture) of the world coordinate system under the camera coordinate system, and can be understood as being used for the interconversion of the world coordinate system and the camera coordinate system, the internal reference matrix is used for the interconversion of the camera coordinate system to the image coordinate system, and the image is measured in pixels, so that the image coordinate system needs to be converted into a pixel coordinate system.
In the embodiment of the present application, the process from the acquired images from a plurality of different angles to the three-dimensional image features is the inverse process of the above imaging process, that is, the image (pixel level) in the pixel coordinate system is converted into the features in the world coordinate system according to the internal reference matrix and the external reference matrix.
It should be noted that, since the capturing devices are distributed at various angles of the target vehicle, that is, the images at different angles correspond to the camera coordinate system of the capturing devices. Therefore, in order to perform coordinate conversion, it is necessary to convert image features (i.e., a plurality of image features) of respective angles in different camera coordinate systems into the same coordinate system centered on the position of the target vehicle, or a coordinate system constructed with the center position in the vehicle body coordinate system as the origin of coordinates. And taking the image features of all angles under the same coordinate system and the corresponding image depth information as the three-dimensional image features.
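A minimal sketch of this inverse process, assuming the extrinsic matrix maps ego coordinates to camera coordinates, could look as follows; the function name and interfaces are illustrative only.

```python
import numpy as np

def lift_to_ego(K, R, t, pixels, depths):
    """Unproject pixels with predicted depths into the ego-centered coordinate system.

    K, R, t : intrinsics and extrinsics of one camera (extrinsics map ego -> camera)
    pixels  : (N, 2) pixel coordinates (u, v)
    depths  : (N,) predicted depth of each pixel, in meters
    """
    ones = np.ones((pixels.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T  # camera-frame viewing rays
    points_cam = rays * depths[:, None]                        # scale rays by predicted depth
    points_ego = (R.T @ (points_cam - t).T).T                  # camera frame -> ego frame
    return points_ego

K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
print(lift_to_ego(K, np.eye(3), np.zeros(3),
                  pixels=np.array([[640.0, 360.0]]), depths=np.array([10.0])))
```

Repeating this lifting for every camera and collecting the lifted features into one vehicle-centered grid yields the three-dimensional image features described above.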
In the embodiment of the application, the three-dimensional image features are obtained through feature extraction, depth prediction and coordinate conversion, the three-dimensional image features comprise three-dimensional position information of images around a target vehicle, and the three-dimensional image features have comprehensiveness and completeness. When the three-dimensional image characteristics are utilized for track planning, the target vehicle can be prevented from colliding with other objects, and the track planning efficiency is improved.
And S1025, adding the image features with different heights at the same position in the three-dimensional image feature to obtain a two-dimensional feature matrix at the current moment.
In an embodiment of the application, the three-dimensional image features represent stereoscopic position information. In the trajectory planning, the physical positional overlap between the target vehicle and the other object on the two-dimensional plane under the bird's eye view is considered. Therefore, the three-dimensional image features need to be highly compressed and thus converted into two-dimensional image features, i.e., a two-dimensional feature matrix at the current time. If the three-dimensional image features are 3D features and can be represented by four-dimensional vectors (x, y, z, C), the resulting two-dimensional feature matrix is 2D features and can be represented by three-dimensional vectors (x, y, C). When the three-dimensional image features are flattened to obtain the two-dimensional image features, the image features with different heights at the same position in the three-dimensional image features are added, so that the two-dimensional image features still retain all the features in the three-dimensional image features, can reflect the three-dimensional position information of the image around the target vehicle, and have comprehensiveness and integrity.
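A minimal illustration of this height-flattening step, with an assumed voxel grid size, is shown below.

```python
import torch

# Hypothetical voxel feature tensor: (batch, channels, X, Y, Z) in an ego-centered grid.
voxel_features = torch.randn(1, 64, 200, 200, 8)

# Summing the features of different heights at the same (x, y) position flattens the
# 3-D features into the 2-D bird's-eye-view feature matrix described above.
bev_features = voxel_features.sum(dim=-1)  # (1, 64, 200, 200)
print(bev_features.shape)
```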
And S1026, acquiring a two-dimensional historical characteristic matrix of the historical images at different angles.
In the embodiment of the application, in the running process of the target vehicle, the steps of feature extraction, fusion and flattening are carried out on the historical images of a plurality of different angles at each moment, and a two-dimensional historical feature matrix at the moment is obtained. The same processing steps as those of the images at a plurality of different angles at the current moment can be specifically referred to in S1021-S1025, and are not described herein again.
S1027, determining a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment according to the two-dimensional feature matrix and the two-dimensional historical feature matrix of the current moment.
In the embodiment of the application, because there are some relations between the two-dimensional image features at the historical time and the two-dimensional image features at the current time on the time features and the object features, the two-dimensional image features at the historical time and the two-dimensional image features at the current time need to be subjected to information mixing, and the comprehensiveness, integrity and accuracy of the fused features are improved.
In the embodiment of the application, the 3D features are obtained through feature extraction, depth prediction and coordinate conversion, and then the image features with different heights on the same position in the 3D features are added to obtain the feature matrix, all the features in the 3D features are reserved in the feature matrix, the three-dimensional position information of the image around the target vehicle can be reflected, and the accuracy of the feature matrix is improved.
In some embodiments, S1027 described above may be implemented in the following manner. Taking the position of the target vehicle at the current moment as a center, and unifying the two-dimensional historical feature matrix and the two-dimensional feature matrix at the current moment to obtain the two-dimensional feature matrix at the historical moment; and performing three-dimensional convolution processing on the two-dimensional feature matrix at the current moment and the two-dimensional feature matrix at the historical moment to obtain the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment.
In the embodiment of the application, images of a plurality of different angles around a target vehicle are obtained at both a historical moment and a current moment, and then a two-dimensional historical feature matrix of the historical moment and a two-dimensional feature matrix of the current moment are obtained through the steps of feature extraction, fusion and flattening. In the fusion step, the position of the target vehicle at the corresponding moment is taken as the center, and the positions of the target vehicle corresponding to the historical moment and the current moment are different along with the movement of the target vehicle. Therefore, it is also necessary to unify the coordinates of the two-dimensional historical feature matrix at the historical time and the two-dimensional feature matrix at the current time, and convert the coordinates corresponding to the two-dimensional historical feature matrix at the historical time with the position of the target vehicle at the current time as a center, so as to obtain the two-dimensional feature matrix at the historical time after the coordinates are unified.
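For illustration, the sketch below warps a historical bird's-eye-view feature map into the current ego-centered coordinate system, assuming the ego motion between the two moments is known as a planar translation and rotation; the parameterization and grid resolution are assumptions, not the coordinate-unification procedure of the embodiments.

```python
import math
import torch
import torch.nn.functional as F

def warp_to_current_frame(past_bev, dx, dy, dyaw, meters_per_cell=0.5):
    """Re-center a historical BEV feature map on the current ego position.

    past_bev : (B, C, H, W) BEV features centered on the past ego pose
    dx, dy   : ego translation between the past and current moment, in meters
    dyaw     : ego rotation between the two moments, in radians
    The affine parameters below are a simplified model of that motion.
    """
    B, C, H, W = past_bev.shape
    cos, sin = math.cos(dyaw), math.sin(dyaw)
    # Translation expressed in normalized grid coordinates ([-1, 1] spans the map).
    tx = dx / (meters_per_cell * W / 2)
    ty = dy / (meters_per_cell * H / 2)
    theta = torch.tensor([[cos, -sin, tx],
                          [sin,  cos, ty]], dtype=past_bev.dtype).unsqueeze(0).repeat(B, 1, 1)
    grid = F.affine_grid(theta, size=(B, C, H, W), align_corners=False)
    return F.grid_sample(past_bev, grid, align_corners=False)

warped = warp_to_current_frame(torch.randn(1, 64, 200, 200), dx=1.0, dy=0.0, dyaw=0.05)
print(warped.shape)
```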
In the embodiment of the application, the two-dimensional feature matrix at the historical moment and the two-dimensional feature matrix at the current moment are not independent because of their temporal relationship. After the two-dimensional feature matrix at the historical moment is obtained, three-dimensional convolution processing can be performed on the two-dimensional feature matrix at the current moment and the two-dimensional feature matrix at the historical moment by using a three-dimensional convolutional neural network (3D-CNN). The three-dimensional convolutional neural network is formed by 3D convolutions, adds a time dimension (consecutive frames), and can extract temporal features and spatial features simultaneously. By way of example, convolving 3 consecutive image frames (images acquired at 3 adjacent moments) with a three-dimensional convolution kernel can be understood as convolving the 3 consecutive frames with 3 different two-dimensional convolution kernels (the convolution kernels used in two-dimensional convolutional neural networks) and adding the convolution results; in this way the three-dimensional convolutional network extracts correlations across moments.
In the embodiment of the application, the two-dimensional feature matrix at the historical moment and the two-dimensional feature matrix at the current moment are input into a three-dimensional convolution neural network for three-dimensional convolution processing, and the information of the two-dimensional feature matrix at the historical moment and the information of the two-dimensional feature matrix at the current moment are mixed, so that the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment are output. The obtained feature matrix is the feature of the mixed information at each moment, and the feature of the time dimension is added on the basis of the respective two-dimensional feature, so that the comprehensiveness and the accuracy of the feature matrix are improved.
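A minimal sketch of such temporal mixing with a three-dimensional convolution, using assumed channel counts and grid sizes, is shown below.

```python
import torch
import torch.nn as nn

# Stack the historical and current BEV feature matrices along a time axis:
# (batch, channels, time, height, width). Sizes are illustrative.
bev_sequence = torch.randn(1, 64, 4, 200, 200)  # 3 historical frames + the current frame

# A 3-D convolution mixes information across time and space simultaneously,
# producing the per-moment feature matrices used by the later prediction steps.
temporal_mixer = nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1))
mixed = temporal_mixer(bev_sequence)
print(mixed.shape)  # torch.Size([1, 64, 4, 200, 200])
```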
In some embodiments, the step S103 may include the following steps, as shown in fig. 3, where fig. 3 is an optional flowchart of another trajectory planning method provided in this embodiment of the present application.
And S1031, utilizing a preset first time series model to conduct future prediction on the feature matrix corresponding to the current moment, and obtaining a first feature matrix corresponding to the future moment.
S1032, the feature matrix corresponding to the historical moment and the feature matrix corresponding to the current moment are subjected to future prediction by using a preset second time series model, and a second feature matrix corresponding to the future moment is obtained.
In the embodiment of the application, the future time is predicted through the feature matrix corresponding to the current time, historical information is not considered, and uncertainty is introduced, so that the predicted feature matrix of the future time can cope with the future uncertainty. And predicting the future time by the characteristic matrix corresponding to the historical time and the characteristic matrix corresponding to the current time, and considering historical information to ensure that the predicted characteristic matrix of the future time has stability.
And S1033, fusing the first feature matrix and the second feature matrix by using a preset hybrid prediction network to obtain a feature matrix corresponding to the future moment.
In the embodiment of the application, the hybrid prediction network is used for performing convolution processing on the first feature matrix and the second feature matrix corresponding to the future time respectively predicted by the two modes to obtain the feature matrix corresponding to the future time, so that the accuracy of the feature matrix at the future time is improved.
In the embodiment of the application, the hybrid prediction network can be understood as a two-dimensional convolutional neural network (2D-CNN) formed by 2D convolutions; compared with 3D convolution, in the field of image processing a two-dimensional convolutional neural network processes each image frame separately with a CNN and does not take time-dimension information into account.
In the embodiment of the application, the two-dimensional convolution processing is carried out on the feature matrix of the future time predicted by two different considered angles through the hybrid prediction network, not only is historical information considered, but also future uncertainty can be dealt with, and the obtained feature matrix has certain stability and robustness.
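The following sketch illustrates one possible form of such a hybrid prediction network, concatenating the two predicted feature matrices and fusing them with 2D convolutions; channel counts and layer choices are assumptions, not the disclosed structure.

```python
import torch
import torch.nn as nn

class HybridPredictionNetwork(nn.Module):
    """Toy fusion of the two predicted future feature matrices with 2-D convolutions."""
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, first_feature, second_feature):  # both (B, C, H, W)
        return self.fuse(torch.cat([first_feature, second_feature], dim=1))

fused = HybridPredictionNetwork()(torch.randn(1, 64, 200, 200), torch.randn(1, 64, 200, 200))
print(fused.shape)
```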
In some embodiments, S1031 may be implemented as follows. Processing a characteristic matrix corresponding to the current moment by using a preset first time sequence model to generate a future uncertain probability matrix; and carrying out future prediction on the future uncertain probability matrix and the feature matrix corresponding to the current moment to obtain a first feature matrix corresponding to the future moment.
In an embodiment of the present application, the first time series model may include a convolutional network, a first time series network, and a decoder. The feature matrix corresponding to the current moment is subjected to convolution processing through a convolution network, the obtained matrix can represent future uncertain probability, and for convenience of expression, the future uncertain probability matrix represents the matrix generated through the convolution processing. And taking the future uncertain probability matrix as the state input of the first time series network, and outputting a first feature matrix corresponding to the future time.
In some embodiments, the obtaining of the first feature matrix corresponding to the future time by performing the future prediction on the future uncertain probability matrix and the feature matrix corresponding to the current time may be implemented in the following manner. Predicting a characteristic matrix corresponding to the current moment by using a first time sequence model and a future uncertain probability matrix to obtain a first hidden layer state of the current moment; decoding a first hidden layer state at the current moment to obtain a first characteristic prediction matrix at the next moment of the current moment; continuously predicting the first feature prediction matrix at the next moment by using the first time series model and the future uncertain probability matrix until the feature matrix prediction at the last moment of the future moment is finished to obtain a plurality of first feature prediction matrixes at moments after the current moment; and taking a plurality of first feature prediction matrixes at the time points after the current time point as first feature matrixes corresponding to the future time points.
In this example, the future uncertain probability matrix and the feature matrix corresponding to the current time (t) are input into the GRU, so as to generate the hidden layer state at the current time (t), and the decoder is used to decode the hidden layer state pair at the current time (t) to obtain the first feature prediction matrix corresponding to the next time (t + 1). And inputting the future uncertain probability matrix and the first feature prediction matrix corresponding to the next moment (t +1) into the GRU, so as to generate the first feature prediction matrix corresponding to the next moment (t +2) of the next moment, and repeating the steps to obtain the first feature matrix corresponding to the future moment.
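For readability, the sketch below compresses this first prediction path into a toy model that works on flattened feature vectors with an ordinary GRU cell; the linear layer standing in for the convolution that produces the future-uncertainty matrix, the feature dimension and the prediction horizon are all assumptions.

```python
import torch
import torch.nn as nn

class UncertaintyFuturePredictor(nn.Module):
    """Toy version of the first prediction path (uncertainty-driven rollout).

    BEV features are flattened to vectors and an ordinary GRU cell is used for
    brevity; feature sizes and layer choices are assumptions.
    """
    def __init__(self, feat_dim=256):
        super().__init__()
        self.uncertainty = nn.Linear(feat_dim, feat_dim)  # stands in for the conv producing r_t
        self.gru = nn.GRUCell(input_size=feat_dim, hidden_size=feat_dim)
        self.decoder = nn.Linear(feat_dim, feat_dim)

    def forward(self, current_feature, horizon=4):
        r_t = torch.sigmoid(self.uncertainty(current_feature))  # future-uncertainty matrix
        feature, predictions = current_feature, []
        for _ in range(horizon):
            hidden = self.gru(feature, r_t)   # r_t enters as the GRU state at every step
            feature = self.decoder(hidden)    # decode the hidden state into the next-frame feature
            predictions.append(feature)
        return predictions                    # first feature prediction matrices for future frames

preds = UncertaintyFuturePredictor()(torch.randn(1, 256))
print(len(preds), preds[0].shape)
```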
In the embodiment of the application, when the prediction of the first feature matrix at the future time is carried out, the future uncertain probability matrix is introduced for any time at the future time without considering the historical information, so that the predicted feature matrix at the future time can cope with the future uncertainty.
In some embodiments, the historical time includes a plurality of times before the current time and the future time includes a plurality of times after the current time. The above S1032 may be implemented in the following manner. Predicting a feature matrix corresponding to the first moment in the historical moments by using the second time series model and the initial state to obtain a hidden layer state of the first moment in the historical moments; predicting a feature matrix corresponding to a second moment in the historical moments by using a second time series model and the hidden layer state of the first moment to obtain the hidden layer state of the second moment in the historical moments; the second moment is the next moment of the first moment; continuously predicting the characteristic matrix corresponding to the third moment based on the second time series model and the hidden layer state of the second moment until the characteristic matrix of the current moment is predicted, and obtaining the second hidden layer state of the current moment; decoding the second hidden layer state at the current moment to obtain a second characteristic prediction matrix at the next moment of the current moment; predicting a second characteristic prediction matrix at the next moment of the current moment based on a second time series model and a second hidden layer state at the current moment to obtain a second hidden layer state at the next moment; continuing to predict a second feature prediction matrix at the next moment of the next moment based on the second time series model and the second hidden layer state at the next moment until the feature matrix prediction at the last moment of the future moment is finished, and obtaining a plurality of second feature prediction matrixes at moments after the current moment; and taking a plurality of second characteristic prediction matrixes at the time moments after the current time moment as second characteristic matrixes corresponding to the future time moments.
For example, the initial state may be a preset state, or a feature matrix corresponding to the first time (i.e. time (t-5)) may be set as the initial state, which is not limited in this embodiment of the application. The second time series model may include a second time series network and a decoder. Taking historical moments including 5 moments as an example for explanation, inputting a feature matrix corresponding to the moment (t-5) as an initial state into a second time series network for prediction, and obtaining a hidden state of the moment (t-5); inputting the hidden layer state at the time (t-5) and the characteristic matrix corresponding to the time (t-4) into the second time series network for prediction to obtain the hidden layer state at the time (t-4); and analogizing in sequence to obtain a hidden layer state at the moment (t-1); the preheating process of the second time series network is realized, and the second time series network is iteratively updated by taking historical information into account. And then, inputting the hidden layer state at the time (t-1) and the feature matrix corresponding to the current time (namely the time t) into a second time series network for prediction to obtain the hidden layer state at the current time (namely the time t). And decoding the hidden layer state at the current moment by using a decoder to obtain a characteristic prediction matrix corresponding to the (t +1) moment. And inputting the hidden layer state at the current moment and the characteristic prediction matrix corresponding to the (t +1) moment obtained by decoding into a second time series network for prediction to obtain the hidden layer state at the (t +1) moment. And decoding the hidden layer state at the time (t +1) by using a decoder to obtain a characteristic prediction matrix corresponding to the time (t + 2). By analogy, a second feature matrix corresponding to the future time can be obtained. Similarly, the future time includes a plurality of times after the current time, for example, time (t +1), time (t +2), time (t +3), time (t +4), and time (t +5), and the embodiment of the present application does not limit the number of times included in the future time.
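Similarly, the second prediction path can be sketched as a warm-up over the historical feature matrices followed by an autoregressive rollout; again, flattened features, an ordinary GRU cell and the sizes used are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HistoryFuturePredictor(nn.Module):
    """Toy version of the second prediction path (history warm-up, then rollout).

    Flattened features and an ordinary GRU cell are used for brevity; sizes are assumptions.
    """
    def __init__(self, feat_dim=256):
        super().__init__()
        self.gru = nn.GRUCell(input_size=feat_dim, hidden_size=feat_dim)
        self.decoder = nn.Linear(feat_dim, feat_dim)

    def forward(self, history_features, current_feature, horizon=4):
        # Warm-up: iterate over the historical feature matrices to build up the hidden state.
        hidden = torch.zeros_like(current_feature)  # initial state (could also be the first frame)
        for feature in history_features:
            hidden = self.gru(feature, hidden)
        # Current frame, then autoregressive rollout into the future.
        hidden = self.gru(current_feature, hidden)
        predictions = []
        for _ in range(horizon):
            feature = self.decoder(hidden)          # decode the hidden state -> next-frame feature
            predictions.append(feature)
            hidden = self.gru(feature, hidden)      # feed the prediction back into the GRU
        return predictions                          # second feature prediction matrices

history = [torch.randn(1, 256) for _ in range(3)]
preds = HistoryFuturePredictor()(history, torch.randn(1, 256))
print(len(preds), preds[0].shape)
```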
In the embodiment of the present application, when the second feature matrix at a future moment is predicted, historical information is taken into consideration, and any future moment is correlated with the features of the preceding moment, so that the predicted feature matrix at the future moment is stable.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the application also provides a track planning method. As shown in fig. 4A, fig. 4A is an alternative diagram for predicting the future according to the embodiment of the present application.
In the embodiment of the present application, the description takes as an example that the feature matrices corresponding to the historical moments include the feature matrices of the historical 3 frames, and that the first time series network and the second time series network are GRUs; a hybrid prediction gate is used as the hybrid prediction network in fig. 4A. S_t^1, S_t^2, ..., S_t^t represent the feature matrices of the historical 3 frames, where S_t^t represents the feature matrix of the image frame at moment t, namely the feature matrix of the current frame. Based on the feature matrices of the historical 3 frames, prediction is carried out along two paths, namely the upper-path prediction and the lower-path prediction.
Illustratively, uncertainty is introduced in the upper-path prediction. As shown in fig. 4B, which is an alternative diagram for predicting the future provided by the embodiment of the present application, the feature matrix S_t^t of the current frame (namely the feature matrix corresponding to the current moment) is convolved to generate a future uncertainty probability matrix r_t, and the future uncertainty probability matrix r_t is used as the state input of the GRU and combined with the feature matrix S_t^t of the current frame to generate the feature matrix S_t^{t+1} of the future 1st image frame. It should be noted here that the future uncertainty probability matrix r_t and the feature matrix S_t^t of the current frame are input into the GRU, which outputs the hidden layer state h_t^t of the current frame; the hidden layer state h_t^t of the current frame is decoded to obtain the feature matrix S_t^{t+1} of the future 1st image frame. For ease of understanding the overall process, this detail is not shown in fig. 4B, which only illustrates the GRU generating the feature matrix S_t^{t+1} of the future 1st image frame. The future uncertainty probability matrix r_t and the feature matrix S_t^{t+1} of the future 1st image frame are then input into the GRU to predict the feature matrix S_t^{t+2} of the future 2nd image frame, and so on, until the first feature matrices of the future 4 frames are obtained. For ease of distinguishing the first feature matrices in fig. 4B from the second feature matrices in fig. 4C, the square-marked S_t^{t+1}, S_t^{t+2}, S_t^{t+3}, S_t^{t+4} in fig. 4A represent the first feature matrices of the future 4 frames.
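The following is a rough sketch, under assumed tensor shapes, of this upper-path prediction: the current bird's-eye-view feature matrix is convolved into a future uncertainty probability matrix r_t, which is then reused as the GRU state at every rollout step. The hand-rolled ConvGRUCell, the sigmoid applied to r_t and all channel sizes are assumptions; the application does not fix these details.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, 3, padding=1)  # update/reset gates
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        n = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * n

class UncertaintyRollout(nn.Module):
    def __init__(self, feat_ch):
        super().__init__()
        self.to_uncertainty = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)   # produces r_t (assumed shape)
        self.cell = ConvGRUCell(feat_ch, feat_ch)
        self.decoder = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)

    def forward(self, s_t, n_future=4):
        # s_t: (B, C, H, W) BEV feature matrix of the current frame
        r_t = torch.sigmoid(self.to_uncertainty(s_t))   # future uncertainty probability matrix
        feats, x = [], s_t
        for _ in range(n_future):
            h = self.cell(x, r_t)          # r_t is reused as the state input at every step
            x = self.decoder(h)            # first feature prediction matrix for the next frame
            feats.append(x)
        return feats                        # S_t^{t+1} ... S_t^{t+4}
```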
For example, the lower-path prediction predicts future change information from historical change information. As shown in fig. 4C, which is an alternative diagram for predicting the future provided by the embodiment of the present application, the feature matrix S_t^1 of the historical 1st image frame is set as the initial state h_t^0, the feature matrices of the historical 3 frames (S_t^1, S_t^2, ..., S_t^t) are used as inputs, and the hidden layer states of the image frames (h_t^1, h_t^2, ...) are used to warm up the network configuration parameters of the GRU. Illustratively, the feature matrix S_t^1 of the historical 1st image frame is set as the initial state h_t^0 and, together with the feature matrix S_t^1 of the historical 1st image frame, is input into the GRU, which outputs the hidden layer state h_t^1 of the historical 1st image frame. The hidden layer state h_t^1 of the historical 1st image frame and the feature matrix S_t^2 of the historical 2nd image frame are input into the GRU, which outputs the hidden layer state h_t^2 of the historical 2nd image frame, and so on, until the hidden layer state h_t^{t-1} of the historical (t-1)th image frame is output, completing the warm-up of the GRU. Then the hidden layer state h_t^{t-1} of the historical (t-1)th image frame and the feature matrix S_t^t of the current moment are input into the GRU, which outputs the hidden layer state h_t^t of the current moment; the decoder decodes the hidden layer state h_t^t of the current moment to obtain the feature matrix S_t^{t+1} corresponding to the next moment. The hidden layer state h_t^{t+1} of the next moment is predicted from the hidden layer state h_t^t of the current moment and the predicted feature matrix S_t^{t+1}, and the hidden layer state h_t^{t+1} of the next moment is decoded to obtain the feature matrix S_t^{t+2} of the following moment. In this way, the second feature matrices of the future 4 frames are also predicted; the circled S_t^{t+1}, S_t^{t+2}, S_t^{t+3}, S_t^{t+4} in fig. 4A represent the second feature matrices of the future 4 frames.
In the embodiment of the present application, a hybrid prediction gate is introduced to fuse the information of the two prediction paths. The hybrid prediction gate is formed by a 2D convolution; the feature matrices predicted by the two paths are passed through the hybrid prediction gate to generate the probabilities corresponding to the respective moments, and a weighted sum is taken to obtain the feature matrices of the future 4 frames, shown as the matrix-form S_t^{t+1}, S_t^{t+2}, S_t^{t+3}, S_t^{t+4} in fig. 4A. In this way, when the feature matrix corresponding to a future moment is predicted, historical information is taken into account and future uncertainty can be handled, so the prediction has a certain stability and robustness.
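As an illustration of the hybrid prediction gate, the sketch below uses a single 2D convolution over the concatenated upper-path and lower-path predictions to produce a mixing probability and takes the weighted sum; the layer layout and channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class HybridPredictionGate(nn.Module):
    def __init__(self, feat_ch):
        super().__init__()
        # Sees both predictions and outputs a single-channel mixing probability.
        self.gate = nn.Conv2d(2 * feat_ch, 1, kernel_size=3, padding=1)

    def forward(self, upper_feat, lower_feat):
        # upper_feat / lower_feat: (B, C, H, W) first / second feature matrices
        # predicted for the same future moment.
        w = torch.sigmoid(self.gate(torch.cat([upper_feat, lower_feat], dim=1)))
        return w * upper_feat + (1 - w) * lower_feat   # fused feature matrix

# Fusing all four future frames (upper_preds / lower_preds are lists of tensors):
# fused = [gate(u, l) for u, l in zip(upper_preds, lower_preds)]
```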
It should be noted that the network design for predicting the future provided in fig. 4A to 4C may be applied to an automatic driving scenario to complete the prediction process of the feature matrix corresponding to the future time; the method can also be used in video prediction scenes to complete the prediction of videos and other tasks needing time sequence prediction. The embodiment of the present application is not limited to the above prediction contents, and only the feature matrix corresponding to the predicted future time is taken as an example for explanation.
In some embodiments, the above S104 may include the following steps. As shown in fig. 5, fig. 5 is an alternative flowchart of another trajectory planning method provided in the embodiment of the present application.
S1041, performing fine granularity feature extraction on a feature matrix corresponding to the current moment and a feature matrix corresponding to the future moment by using a preset deep neural model to obtain image edge features; and the preset deep neural model is used for extracting fine granularity characteristics.
And S1042, carrying out object segmentation according to the image edge characteristics to obtain an object segmentation result.
In the embodiment of the application, after the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment are obtained, fine-granularity feature extraction is performed on them by the deep neural model so as to extract edge information and obtain the image edge features, which improves the accuracy and integrity of the image edge features.
In the embodiment of the present application, the deep neural model may be understood as a machine learning model, and may be any suitable convolutional neural network that can be used for performing feature extraction of fine granularity, for example, a ResNet deep network, a SuperPoint network, an AlexNet network, and the like.
In the embodiment of the application, after the image edge features are obtained, a convolution head can be allocated to each task for analysis, so that a probability matrix for the corresponding task is generated. Taking semantic segmentation as an example of object segmentation: after the image edge features are obtained, a two-dimensional convolution head is allocated to the semantic segmentation task, and the probabilities of the corresponding categories are finally generated through a normalized softmax function, so that the semantic segmentation result of the area around the target vehicle under the bird's-eye view is generated. Because the image edge features contain richer fine-granularity features, the accuracy of the semantic segmentation result is improved.
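A minimal sketch of such a per-task convolution head is given below: the fine-granularity (edge) features are mapped to per-class logits and normalized with softmax to give bird's-eye-view semantic probabilities. The two-layer head, the channel sizes and the class set in the comment are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticHead(nn.Module):
    def __init__(self, feat_ch: int, n_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, n_classes, 1),   # per-class logits
        )

    def forward(self, edge_feats):
        # edge_feats: (B, C, H, W) fine-granularity features for one moment
        logits = self.head(edge_feats)
        return F.softmax(logits, dim=1)          # per-pixel class probabilities

# e.g. classes could be {vehicle, pedestrian, lane line, drivable area, background}
```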
In some embodiments, the above S105 includes the following steps. As shown in fig. 6, fig. 6 is an alternative flowchart of another trajectory planning method provided in the embodiment of the present application.
S1051, obtaining dynamic state information of the target vehicle.
In the embodiment of the present application, the dynamic state information represents the driving state information of the target vehicle at the current moment, including but not limited to vehicle speed, acceleration, deceleration, safe distance and the like.
And S1052, generating a plurality of running tracks in a plurality of directions according to the dynamic state information.
In the embodiment of the present application, when the vehicle travels on a road, its behaviors include turning left, turning right, going straight and making a u-turn; a u-turn can be achieved by a continuous left turn or a continuous right turn, so the plurality of directions may include left turn, right turn and straight travel. Each direction may include a plurality of driving tracks, and a series of tracks (i.e. a plurality of driving tracks in each direction) are sampled according to the dynamic state information of the current target vehicle, as shown in fig. 7, which is an alternative schematic diagram of trajectory planning provided in the embodiment of the present application. Fig. 7 shows a plurality of driving tracks in a plurality of directions; since left and right turns are relatively complicated, more driving tracks are generated for them, and since straight travel is relatively simple, fewer driving tracks are generated for it.
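Purely as an illustration of sampling candidate driving tracks in several directions from the dynamic state information, the sketch below rolls out constant-curvature paths from the current speed; the curvature ranges, the per-direction sample counts and the simple kinematic update are assumptions, since the application does not fix a sampling rule.

```python
import numpy as np

def sample_trajectories(speed, horizon=4, dt=0.5,
                        curvatures=None, n_per_direction=(10, 3, 10)):
    """Return {'left': [...], 'straight': [...], 'right': [...]} driving tracks.
    More samples are drawn for left/right turns than for straight travel."""
    if curvatures is None:
        curvatures = {'left': np.linspace(0.02, 0.2, n_per_direction[0]),
                      'straight': np.linspace(-0.005, 0.005, n_per_direction[1]),
                      'right': -np.linspace(0.02, 0.2, n_per_direction[2])}
    trajs = {}
    for direction, ks in curvatures.items():
        trajs[direction] = []
        for k in ks:
            x, y, yaw, pts = 0.0, 0.0, 0.0, []
            for _ in range(horizon):
                x += speed * dt * np.cos(yaw)
                y += speed * dt * np.sin(yaw)
                yaw += speed * dt * k        # constant-curvature heading update
                pts.append((x, y))
            trajs[direction].append(np.array(pts))
    return trajs
```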
S1053, determining a cost matrix according to the object segmentation result; the cost matrix represents the probability of collision when the target vehicle runs through a preset position.
In the embodiment of the application, a cost matrix is constructed from the object segmentation result according to a preset matrix construction rule. The cost matrix represents the probability of collision when the target vehicle drives through a position, and can also be understood as the cost paid by the target vehicle for driving through that position. The cost matrix is used to evaluate track quality, and when trajectory planning is carried out, the track with the lower cost is selected. The preset matrix construction rule may include: if the position is occupied by an object such as a vehicle at that moment, the position is set to a cost within a first preset range, for example a higher cost; if the position is not occupied by an object such as a vehicle at that moment, the position is set to a cost within a second preset range, for example a lower cost; the costs in the first preset range are higher than the costs in the second preset range. In a coordinate system centered on the position of the target vehicle, the sum of the costs of the positions of the plurality of points constituting a track is taken as the cost of that track.
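The sketch below illustrates one way, under assumed grid conventions, of building the cost matrix from the object segmentation result and of taking the cost of a track as the sum of the costs of the cells its points pass through; HIGH_COST, LOW_COST, the grid resolution and the metre-to-cell mapping are placeholders.

```python
import numpy as np

HIGH_COST, LOW_COST = 100.0, 1.0        # stand-ins for the first / second preset cost ranges

def build_cost_matrix(occupancy):
    # occupancy: (H, W) boolean BEV map, True where a vehicle or other object
    # is predicted to occupy the cell at that future moment.
    return np.where(occupancy, HIGH_COST, LOW_COST)

def trajectory_cost(traj_xy, cost_matrix, resolution=0.5, ego_cell=(100, 100)):
    # traj_xy: (N, 2) points in metres in the coordinate system centered on the target vehicle.
    total = 0.0
    for x, y in traj_xy:
        row = int(ego_cell[0] - x / resolution)   # forward -> up in the grid (assumed layout)
        col = int(ego_cell[1] + y / resolution)
        if 0 <= row < cost_matrix.shape[0] and 0 <= col < cost_matrix.shape[1]:
            total += cost_matrix[row, col]
    return total
```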
And S1054, determining a driving track at a future moment in a plurality of driving tracks in a plurality of directions according to the cost matrix and the navigation instruction information.
In the embodiment of the present application, the navigation instruction information reflects the route on which the target vehicle will travel at a future moment. The cost matrix constructed from the object segmentation result reflects the cost the target vehicle pays for driving along a given driving track at a future moment. The cost matrix and the navigation instruction information are considered together, so that the driving track at the future moment is determined among the plurality of driving tracks in the plurality of directions, which improves the accuracy of the driving track at the future moment.
In some embodiments, the S1054 may include S1054a-S1054 c.
S1054a, determining a plurality of candidate driving tracks in the target direction among the plurality of driving tracks in the plurality of directions according to the navigation instruction information.
In the embodiment of the present application, the plurality of driving tracks in the plurality of directions are screened based on the navigation instruction information, which is shown as the high-level command in fig. 7. Taking the high-level command in fig. 7 indicating a right turn as an example, the plurality of driving tracks corresponding to the right turn are selected as the plurality of candidate driving tracks in the target direction.
And S1054b, scoring the candidate running tracks according to the cost matrix to obtain scoring results of the candidate running tracks.
S1054c, determining the driving track at the future moment among the plurality of candidate driving tracks according to the scoring results of the plurality of candidate driving tracks.
In the embodiment of the application, after the candidate driving tracks in the target direction are screened out, they are scored in combination with the cost matrix to obtain the scoring result of each candidate driving track; the scoring result reflects the probability that the target vehicle collides if it drives along that candidate driving track. The candidate driving tracks are then ranked according to the plurality of scoring results to determine the track that can be driven at the future moment (namely the driving track at the future moment), i.e. the optimal track is selected, which improves the trajectory planning efficiency.
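As a hedged illustration of this selection step, the sketch below keeps only the tracks of the commanded direction, scores each one against per-future-frame cost matrices, and returns the lowest-cost one; it reuses the hypothetical sample_trajectories and trajectory_cost helpers from the earlier sketches, which are assumptions rather than the application's own code.

```python
def plan(trajs_by_direction, cost_matrices, command):
    # trajs_by_direction: output of sample_trajectories(); command: e.g. 'right'
    # cost_matrices: one cost matrix per future frame, built by build_cost_matrix()
    candidates = trajs_by_direction[command]
    scores = []
    for traj in candidates:
        # Sum the cost of each waypoint against the cost matrix of its future frame.
        scores.append(sum(trajectory_cost(traj[i:i + 1], cm)
                          for i, cm in enumerate(cost_matrices)))
    best = min(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores
```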
In some embodiments, after the above S1054b, the trajectory planning method further includes the following steps. Acquiring a characteristic matrix corresponding to a forward looking angle contained in a characteristic matrix corresponding to the current moment; according to the navigation instruction information, determining a target direction network corresponding to a target direction from a plurality of preset direction networks; determining a target running track in the candidate running tracks according to the grading results of the candidate running tracks; and refining the target driving track by using the target direction network and the characteristic matrix corresponding to the forward looking angle to obtain the driving track at the future moment.
In the embodiment of the present application, the plurality of driving tracks in the plurality of directions and the plurality of candidate driving tracks in the target direction are idealized tracks generated from the dynamic state information, i.e. curves that follow a certain rule, for example Bezier curves. In a real driving scene, however, the driving track should conform to the driving habits of a human driver and need not be perfectly ideal. Therefore, after the plurality of scoring results are obtained, a target driving track is determined from the plurality of candidate driving tracks in the target direction according to the scoring results, and the target driving track is further refined by a time series model to obtain the final track (i.e. the driving track at the future moment).
In the embodiment of the application, the feature matrix corresponding to the current moment comprises a plurality of feature matrices corresponding to different angles, and the plurality of different angles includes the forward-looking angle of the target vehicle. The navigation instruction information reflects the direction information for the next moment, so a target direction network can be determined from a plurality of preset direction networks according to the navigation instruction information, the target direction network being consistent with the direction information of the navigation instruction information. For example, if the navigation instruction information indicates a right turn at the next moment, the target direction network is the direction network corresponding to the right turn. Compared with a scheme in which the driving track at the future moment is determined among the plurality of driving tracks in the plurality of directions directly from the cost matrix and the navigation instruction information without passing through the network, inputting the target driving track and the feature matrix corresponding to the forward-looking angle into the time series model for track optimization yields the driving track at the future moment and improves the richness of the driving track.
In the embodiment of the present application, a plurality of acquisition devices are installed on the target vehicle. Taking 6 cameras installed on the vehicle as an example, the acquisition devices are respectively installed right in front of the target vehicle, left in front of the target vehicle, right behind the target vehicle, and left behind the target vehicle. The forward-looking angle of the target vehicle can be understood as the image capturing angle of the camera installed right in front of the target vehicle. The feature matrix corresponding to the forward-looking angle of the target vehicle can be obtained as follows: acquire the forward-looking angle image and perform feature extraction on it to obtain the forward-looking angle image features; perform depth prediction on the forward-looking angle image features to obtain the image depth information of the forward-looking angle image; perform coordinate conversion, centered on the position of the target vehicle, on the forward-looking angle image features and the image depth information through the internal reference matrix and the external reference matrix of the acquisition device to obtain the forward-looking angle three-dimensional image features; and add up the image features at different heights at the same position in the forward-looking angle three-dimensional image features to obtain the feature matrix corresponding to the forward-looking angle.
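A rough, loop-based sketch of this front-view lifting procedure is given below: a depth value is predicted per pixel, pixels are back-projected into the ego-centred frame through the internal and external reference matrices, and features that fall on the same ground cell are summed across heights. The grid layout, resolution and the dense per-pixel loop are simplifying assumptions; practical implementations bin depths and use scatter operations.

```python
import numpy as np

def lift_front_view(feat, depth, K, T_cam_to_ego, bev_shape=(200, 200), res=0.5):
    # feat:  (C, H, W) front-view image features; depth: (H, W) predicted depth in metres
    # K: (3, 3) internal reference matrix; T_cam_to_ego: (4, 4) external reference matrix
    C, H, W = feat.shape
    bev = np.zeros((C, *bev_shape))
    K_inv = np.linalg.inv(K)
    for v in range(H):
        for u in range(W):
            p_cam = depth[v, u] * (K_inv @ np.array([u, v, 1.0]))   # 3D point in camera frame
            p_ego = (T_cam_to_ego @ np.append(p_cam, 1.0))[:3]      # ego-centred coordinates
            row = int(bev_shape[0] // 2 - p_ego[0] / res)           # assumed grid layout
            col = int(bev_shape[1] // 2 + p_ego[1] / res)
            if 0 <= row < bev_shape[0] and 0 <= col < bev_shape[1]:
                bev[:, row, col] += feat[:, v, u]                   # heights collapse by summation
    return bev
```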
Illustratively, taking the time series model as a GRU, fig. 7 shows three different branches: a left-turn branch, a right-turn branch and a straight branch; the explanation is given with the high-level command in fig. 7 indicating a right turn. Fig. 7 shows that, after the plurality of driving tracks in the plurality of directions and the cost matrix are obtained, in combination with the input high-level command (i.e. the navigation instruction information), only the plurality of candidate driving tracks of the right-turn branch are scored, and the target driving track is screened out of the plurality of candidate driving tracks of the right-turn branch. The target driving track is then refined through the GRU corresponding to the right-turn branch, so that it better conforms to driving habits, and the driving track at the future moment is obtained (shown as the ego-vehicle track in fig. 7). In this way, the method can fully accommodate the multi-modality of future tracks while retaining optimality, and the trajectory planning efficiency is improved.
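For illustration, the sketch below refines the selected coarse track with a GRU dedicated to the commanded direction, conditioned on a pooled front-view feature; compressing the feature into the initial hidden state and predicting residual offsets per waypoint are assumptions, not details fixed by the application.

```python
import torch
import torch.nn as nn

class DirectionRefiner(nn.Module):
    def __init__(self, feat_dim, hidden_dim=64):
        super().__init__()
        self.encode_feat = nn.Linear(feat_dim, hidden_dim)  # front-view feature -> initial state
        self.cell = nn.GRUCell(2, hidden_dim)                # consumes (x, y) waypoints
        self.out = nn.Linear(hidden_dim, 2)                  # refined (x, y) offsets

    def forward(self, coarse_traj, front_feat):
        # coarse_traj: (N, 2) target driving track; front_feat: (feat_dim,) pooled feature
        h = torch.tanh(self.encode_feat(front_feat)).unsqueeze(0)
        refined = []
        for pt in coarse_traj:                               # one GRU step per waypoint
            h = self.cell(pt.unsqueeze(0), h)
            refined.append(pt + self.out(h).squeeze(0))      # residual refinement (assumed)
        return torch.stack(refined)

# One refiner per direction, chosen by the navigation instruction, e.g.:
# refiners = {'left': DirectionRefiner(d), 'straight': DirectionRefiner(d), 'right': DirectionRefiner(d)}
```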
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the application also provides a trajectory planning method. As shown in fig. 8, fig. 8 is an alternative schematic diagram of another trajectory planning provided by the embodiment of the present application. The description takes as an example that the acquisition device is a surround-view camera, the feature matrices corresponding to the historical moments include the feature matrices of the historical (t-1) frames, and the time series model is a GRU. O_1, O_2, ..., O_t in fig. 8 represent different moments, where O_1, O_2, ..., O_{t-1} indicate the historical moments and O_t indicates the current moment; at each moment the surround-view camera captures images at 6 angles (shown in fig. 8 as the 6 images around the target vehicle). For the images at the 6 angles at each moment, 3D features (i.e. three-dimensional image features) are extracted: for example, the images acquired by the cameras are input into a backbone network, the corresponding image features are extracted, and the depth of the corresponding pixels is predicted. The image features are converted, through the internal reference matrix and the external reference matrix of the cameras, into a coordinate system centered on the target vehicle, thereby extracting the 3D features; two transformation matrices are involved in this conversion, and they can be constructed from the internal reference matrix and the external reference matrix. The 3D features are then flattened in the form of a bird's-eye view, i.e. projected onto the bird's-eye view, and the features at different heights at the same location are added up to reduce them to 2D features, shown as X_t^1, X_t^2, ..., X_t^t in fig. 8. Then, the feature matrices of the historical (t-1) frames are unified into a coordinate system centered on the position of the target vehicle in the current frame (frame t); the coordinate-unified feature matrices, centered on the position of the target vehicle in the current frame, are the X_t^1, X_t^2, ..., X_t^t shown in fig. 8. The feature matrices of the image frames with the unified coordinate system are mixed through 3D convolution to obtain the final feature matrices, which include S_t^1, S_t^2, ..., S_t^t. The above extraction of the 3D features, projection onto the bird's-eye view and mixing of the feature matrices corresponding to the historical moments may be performed by an encoder, which has the encoding functions of feature extraction, coordinate conversion, projection and 3D convolution on images.
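The last two encoder steps above (coordinate unification and 3D-convolution mixing) can be sketched as follows; the SE(2) warp via grid_sample, the normalization of the ego-motion offsets and the single Conv3d layer are assumptions made only for illustration.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def align_to_current(feat, dx, dy, dyaw, res=0.5):
    # feat: (B, C, H, W) BEV features of a past frame; (dx, dy, dyaw): ego motion
    # from that frame to the current frame, in metres / radians (assumed convention).
    B, C, H, W = feat.shape
    cos, sin = math.cos(dyaw), math.sin(dyaw)
    theta = torch.tensor([[cos, -sin, 2 * dx / (W * res)],
                          [sin,  cos, 2 * dy / (H * res)]]).unsqueeze(0).repeat(B, 1, 1)
    grid = F.affine_grid(theta, feat.shape, align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)

class TemporalMixer(nn.Module):
    def __init__(self, feat_ch):
        super().__init__()
        self.mix = nn.Conv3d(feat_ch, feat_ch, kernel_size=(3, 3, 3), padding=1)

    def forward(self, aligned_feats):
        # aligned_feats: list of (B, C, H, W) tensors, oldest first, including the current frame
        x = torch.stack(aligned_feats, dim=2)    # (B, C, T, H, W)
        return self.mix(x)                       # fused feature matrices S_t^1 ... S_t^t
```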
In the embodiment of the application, future prediction is performed based on S_t^1, S_t^2, ..., S_t^t to obtain the feature matrices at the future moments, shown as S_t^{t+1}, S_t^{t+2}, S_t^{t+3}, S_t^{t+4} in fig. 8; for the process of predicting the future, reference may be made to the method described in fig. 4A to 4C, which is not repeated here.
In the embodiment of the application, multi-task processing is performed according to the feature matrices corresponding to the future moments (S_t^{t+1}, S_t^{t+2}, S_t^{t+3}, S_t^{t+4}) and the feature matrix S_t^t corresponding to the current moment: a convolution head is allocated to each task to analyse the feature matrices and generate the probability matrix corresponding to that task. The multi-task processing is shown in fig. 8 by the semantic head, the instance head, the high-precision map head and the cost function head. The above task processing may be performed by a decoder, which may include a plurality of convolution heads for different tasks and has the function of decoding the feature matrices according to the task contents.
It should be noted that, in fig. 8, the encoder and the decoder may be obtained by iterative training on image training samples and the trajectories corresponding to the image training samples, where the encoder includes the first time series model, the second time series model, the hybrid prediction network and the depth prediction model, and the decoder includes the deep neural model.
In the embodiment of the application, the semantic head performs the semantic segmentation task on the feature matrices to obtain the semantic segmentation result; the semantic segmentation result and the high-level command are then input into a trajectory planner for trajectory planning, so as to obtain the trajectory planning result, namely the driving track at the future moment. For the trajectory planning process, reference may be made to the method described in fig. 7, which is not repeated here.
According to the trajectory planning method provided by the embodiment of the application, a safe and efficient driving track can be planned from the image input of the surround-view cameras alone, without relying on expensive lidar or a high-precision map; the method is an end-to-end prediction process and reduces cost. Feature extraction, depth prediction and coordinate conversion are performed on the images acquired by the surround-view cameras to generate, under the bird's-eye view, feature matrices representing the probability that each point is a vehicle, a pedestrian, a lane line or a drivable area; the feature matrices corresponding to the future moments are predicted from the feature matrix corresponding to the current moment and the feature matrices corresponding to the historical moments; and trajectory planning is carried out with the generated feature matrices corresponding to the future moments, which guarantees the safety of the planning result and improves the planning efficiency.
Based on the trajectory planning method of the embodiment of the present application, an embodiment of the present application further provides a trajectory planning device, as shown in fig. 9, fig. 9 is a schematic structural diagram of the trajectory planning device provided in the embodiment of the present application, and the trajectory planning device 90 includes:
an obtaining module 901, configured to obtain images of multiple different angles for a target vehicle and navigation instruction information at a current time;
a determining module 902, configured to perform feature extraction and fusion on the images at the multiple different angles, and determine a feature matrix corresponding to the current time and a feature matrix corresponding to the historical time;
a prediction module 903, configured to perform network prediction according to the feature matrix corresponding to the current time, determine a first feature matrix representing uncertainty, and perform network prediction according to the feature matrix corresponding to the current time and the feature matrix corresponding to the historical time to obtain a second feature matrix; fusing to obtain a feature matrix corresponding to the future moment based on the first feature matrix and the second feature matrix;
a segmentation module 904, configured to perform object segmentation according to the feature matrix corresponding to the current time and the feature matrix corresponding to the future time to obtain an object segmentation result;
and the planning module 905 is configured to perform trajectory planning according to the object segmentation result and the navigation instruction information to obtain a driving trajectory at a future time.
In some embodiments, the predicting module 903 is further configured to perform future prediction on the feature matrix corresponding to the current time by using a preset first time series model, so as to obtain a first feature matrix corresponding to a future time; utilizing a preset second time series model to carry out future prediction on the feature matrix corresponding to the historical moment and the feature matrix corresponding to the current moment to obtain a second feature matrix corresponding to the future moment; and fusing the first characteristic matrix and the second characteristic matrix by using a preset hybrid prediction network to obtain a characteristic matrix corresponding to the future moment.
In some embodiments, the predicting module 903 is further configured to process the feature matrix corresponding to the current time by using the preset first time series model, and generate a future uncertain probability matrix; and predicting the future uncertain probability matrix and the feature matrix corresponding to the current moment in the future to obtain a first feature matrix corresponding to the future moment.
In some embodiments, the predicting module 903 is further configured to predict, by using the first time series model and the future uncertain probability matrix, the feature matrix corresponding to the current time to obtain a first hidden layer state at the current time; decoding the first hidden layer state at the current moment to obtain a first characteristic prediction matrix at the next moment of the current moment; continuously predicting the first characteristic prediction matrix at the next moment by using the first time series model and the future uncertain probability matrix until the characteristic matrix prediction at the last moment of the future moment is finished, and obtaining a plurality of first characteristic prediction matrixes at moments after the current moment; and taking the first feature prediction matrixes of the plurality of moments after the current moment as the first feature matrixes corresponding to the future moments.
In some embodiments, the historical time includes a plurality of times prior to the current time, and the future time includes a plurality of times after the current time;
the predicting module 903 is further configured to predict a feature matrix corresponding to a first time in the historical times by using the second time series model and the initial state, so as to obtain a hidden layer state of the first time in the historical times; predicting a feature matrix corresponding to a second moment in the historical moments by using the second time series model and the hidden state of the first moment to obtain the hidden state of the second moment in the historical moments; the second moment is the next moment of the first moment; continuously predicting a feature matrix corresponding to a third moment based on the second time series model and the hidden layer state of the second moment until the feature matrix of the current moment is predicted, and obtaining a second hidden layer state of the current moment; decoding the second hidden layer state at the current moment to obtain a second characteristic prediction matrix at the next moment of the current moment; predicting a second characteristic prediction matrix at the next moment of the current moment based on the second time series model and the second hidden layer state at the current moment to obtain the second hidden layer state at the next moment; continuing to predict a second feature prediction matrix at the next moment of the next moment based on the second time series model and the second hidden layer state at the next moment until the feature matrix prediction at the last moment of the future moment is completed, and obtaining a plurality of second feature prediction matrices at moments after the current moment; and taking the plurality of second characteristic prediction matrixes at the time moments after the current time moment as second characteristic matrixes corresponding to the future time moments.
In some embodiments, the obtaining module 901 is further configured to obtain the dynamic state information of the target vehicle;
the planning module 905 is further configured to generate a plurality of driving tracks in a plurality of directions according to the dynamic state information; determining a cost matrix according to the object segmentation result; the cost matrix represents the probability of collision when the target vehicle runs through a preset position; and determining the driving track at the future moment in a plurality of driving tracks in the plurality of directions according to the cost matrix and the navigation indication information.
In some embodiments, the planning module 905 is further configured to determine, according to the navigation instruction information, a plurality of candidate driving trajectories of a target direction among a plurality of driving trajectories of the plurality of directions; scoring a plurality of candidate driving tracks according to the cost matrix to obtain scoring results of the plurality of candidate driving tracks; and determining the driving track at the future time in the plurality of candidate driving tracks according to the grading results of the plurality of candidate driving tracks.
In some embodiments, the planning module 905 is further configured to obtain a feature matrix corresponding to a forward-looking angle included in the feature matrix corresponding to the current time; according to the navigation instruction information, determining a target direction network corresponding to the target direction from a plurality of preset direction networks; determining a target running track in the candidate running tracks according to the grading results of the candidate running tracks; and refining the target driving track by utilizing the target direction network and the characteristic matrix corresponding to the forward looking angle to obtain the driving track at the future moment.
In some embodiments, the obtaining module 901 is further configured to obtain an internal reference matrix and an external reference matrix of a collecting device, where the collecting device is disposed on the target vehicle and is configured to collect the images at the plurality of different angles;
the determining module 902 is further configured to perform feature extraction on the images at the multiple different angles to obtain multiple image features; respectively carrying out depth prediction on the images of all angles according to the plurality of image characteristics to obtain image depth information of all angles; performing coordinate conversion according to the plurality of image characteristics and the image depth information of each angle by taking the position of the target vehicle as a center through an internal reference matrix and an external reference matrix of the acquisition device to obtain three-dimensional image characteristics; adding the image features with different heights at the same position in the three-dimensional image features to obtain a two-dimensional feature matrix at the current moment; acquiring two-dimensional historical feature matrixes of historical images at a plurality of different angles; and determining a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment according to the two-dimensional feature matrix of the current moment and the two-dimensional historical feature matrix.
In some embodiments, the determining module 902 is further configured to coordinate unify the two-dimensional historical feature matrix and the two-dimensional feature matrix at the current time by taking the position of the target vehicle at the current time as a center, so as to obtain the two-dimensional feature matrix at the historical time; and performing three-dimensional convolution processing on the two-dimensional feature matrix at the current moment and the two-dimensional feature matrix at the historical moment to obtain the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment.
In some embodiments, the segmentation module 904 is further configured to perform fine-granularity feature extraction on the feature matrix corresponding to the current time and the feature matrix corresponding to the future time by using a preset depth neural model, so as to obtain an image edge feature; the preset deep neural model is used for extracting fine granularity characteristics; and carrying out object segmentation according to the image edge characteristics to obtain the object segmentation result.
It should be noted that, when planning a trajectory, the trajectory planning apparatus provided in the above embodiment is only illustrated by dividing each program module, and in practical applications, the above processing may be distributed to different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the above-described processing. In addition, the embodiment of the trajectory planning device and the embodiment of the trajectory planning method provided by the above embodiments belong to the same concept, and specific implementation processes and beneficial effects thereof are described in detail in the embodiment of the method and are not described herein again. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.
In this embodiment of the application, fig. 10 is a schematic view of a composition structure of a trajectory planning device provided in this embodiment of the application, and as shown in fig. 10, the device 100 provided in this embodiment of the application may further include a processor 1001 and a memory 1002 storing executable instructions of the processor 1001, and in some embodiments, the trajectory planning device 100 may further include a communication interface 1003 and a bus 1004 for connecting the processor 1001, the memory 1002, and the communication interface 1003.
In the embodiment of the present Application, the Processor 1001 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic devices for implementing the above processor functions may be other devices, and the embodiments of the present application are not limited in particular.
In this embodiment, a bus 1004 is used to connect the communication interface 1003, the processor 1001, and the memory 1002, and to communicate among these devices.
In this embodiment, the processor 1001 is configured to execute the trajectory planning method described in any of the embodiments.
The memory 1002 of the trajectory planning device 100 may be coupled to the processor 1001, the memory 1002 being configured to store executable program code and data, the program code including computer operating instructions, and the memory 1002 may comprise a high-speed RAM memory and may further comprise a non-volatile memory, such as at least two disk memories. In practical applications, the Memory 1002 may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (Hard Disk Drive, HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor 1001.
In addition, each functional module in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on the understanding that the technical solutions of the present embodiment substantially or partially contribute to the prior art, or all or part of the technical solutions may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the method of the present embodiment. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present application provides a computer-readable storage medium, on which a program is stored, and the program, when executed by a processor, implements the trajectory planning method according to any of the embodiments above.
For example, the program instructions corresponding to a trajectory planning method in this embodiment may be stored in a storage medium such as an optical disc, a hard disc, or a usb disk, and when the program instructions corresponding to a trajectory planning method in the storage medium are read or executed by an electronic device, the trajectory planning method according to any of the above embodiments may be implemented.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart block or blocks and/or flowchart block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (14)

1. A trajectory planning method, characterized in that the method comprises:
acquiring images of a plurality of different angles of a target vehicle and navigation instruction information at the current moment;
extracting and fusing the features of the images at the different angles, and determining a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment;
performing network prediction according to the feature matrix corresponding to the current moment, determining a first feature matrix representing uncertainty, and performing network prediction according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment to obtain a second feature matrix; fusing to obtain a feature matrix corresponding to the future moment based on the first feature matrix and the second feature matrix;
carrying out object segmentation according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment to obtain an object segmentation result;
and planning a track according to the object segmentation result and the navigation indication information to obtain a driving track at a future moment.
2. The method according to claim 1, wherein the network prediction is performed according to the feature matrix corresponding to the current time, a first feature matrix representing uncertainty is determined, and the network prediction is performed according to the feature matrix corresponding to the current time and the feature matrix corresponding to the historical time to obtain a second feature matrix; based on the first feature matrix and the second feature matrix, fusing to obtain a feature matrix corresponding to a future moment, including:
utilizing a preset first time sequence model to carry out future prediction on the characteristic matrix corresponding to the current moment to obtain a first characteristic matrix corresponding to the future moment;
performing future prediction on the feature matrix corresponding to the historical moment and the feature matrix corresponding to the current moment by using a preset second time series model to obtain a second feature matrix corresponding to the future moment;
and fusing the first characteristic matrix and the second characteristic matrix by using a preset hybrid prediction network to obtain a characteristic matrix corresponding to the future moment.
3. The method according to claim 2, wherein the predicting the feature matrix corresponding to the current time in the future by using a preset first time series model to obtain a first feature matrix corresponding to a future time comprises:
processing the characteristic matrix corresponding to the current moment by using the preset first time sequence model to generate a future uncertain probability matrix;
and predicting the future uncertain probability matrix and the feature matrix corresponding to the current moment in the future to obtain a first feature matrix corresponding to the future moment.
4. The method according to claim 3, wherein the predicting the future uncertainty probability matrix and the feature matrix corresponding to the current time in the future to obtain the first feature matrix corresponding to the future time comprises:
predicting a feature matrix corresponding to the current moment by using the first time series model and the future uncertain probability matrix to obtain a first hidden layer state of the current moment;
decoding the first hidden layer state at the current moment to obtain a first characteristic prediction matrix at the next moment of the current moment;
continuously utilizing the first time series model and the future uncertain probability matrix to predict the first characteristic prediction matrix at the next moment until the characteristic matrix at the last moment of the future moment is predicted, and obtaining a plurality of first characteristic prediction matrixes at moments after the current moment;
and taking the first feature prediction matrixes of the plurality of moments after the current moment as the first feature matrixes corresponding to the future moments.
5. The method of claim 2, wherein the historical time includes a plurality of times prior to the current time, and wherein the future time includes a plurality of times after the current time;
the predicting the feature matrix corresponding to the historical time and the feature matrix corresponding to the current time in the future by using a preset second time series model to obtain a second feature matrix corresponding to the future time comprises the following steps:
predicting a feature matrix corresponding to the first time in the historical moments by using the second time series model and the initial state to obtain a hidden layer state of the first time in the historical moments;
predicting a feature matrix corresponding to a second moment in the historical moments by using the second time series model and the hidden state of the first moment to obtain the hidden state of the second moment in the historical moments; the second moment is the next moment of the first moment;
continuously predicting a feature matrix corresponding to a third moment based on the second time series model and the hidden layer state of the second moment until the feature matrix of the current moment is predicted, and obtaining a second hidden layer state of the current moment;
decoding the second hidden layer state at the current moment to obtain a second characteristic prediction matrix at the next moment of the current moment;
predicting a second characteristic prediction matrix at the next moment of the current moment based on the second time series model and the second hidden layer state at the current moment to obtain the second hidden layer state at the next moment;
continuing to predict a second feature prediction matrix at the next moment of the next moment based on the second time series model and the second hidden layer state at the next moment until the feature matrix prediction at the last moment of the future moment is completed, and obtaining a plurality of second feature prediction matrices at moments after the current moment;
and taking the plurality of second characteristic prediction matrixes at the time moments after the current time moment as second characteristic matrixes corresponding to the future time moments.
6. The method according to claim 1, wherein the performing trajectory planning according to the object segmentation result and the navigation instruction information to obtain a driving trajectory at a future time comprises:
acquiring dynamic state information of the target vehicle;
generating a plurality of driving tracks in a plurality of directions according to the dynamic state information;
determining a cost matrix according to the object segmentation result; the cost matrix represents the probability of collision when the target vehicle drives through a preset position;
and determining the driving track at the future moment in a plurality of driving tracks in the plurality of directions according to the cost matrix and the navigation indication information.
7. The method of claim 6, wherein determining the future time travel trajectory from the plurality of travel trajectories in the plurality of directions based on the cost matrix and the navigation instruction information comprises:
determining a plurality of candidate running tracks of a target direction in a plurality of running tracks of the plurality of directions according to the navigation instruction information;
scoring a plurality of candidate driving tracks according to the cost matrix to obtain scoring results of the plurality of candidate driving tracks;
and determining the driving track at the future time in the candidate driving tracks according to the grading results of the candidate driving tracks.
8. The method of claim 7, wherein after scoring a plurality of candidate driving trajectories according to the cost matrix and obtaining scoring results of the plurality of candidate driving trajectories, the method further comprises:
acquiring a characteristic matrix corresponding to a forward looking angle contained in the characteristic matrix corresponding to the current moment;
according to the navigation instruction information, determining a target direction network corresponding to the target direction from a plurality of preset direction networks;
determining a target running track in the candidate running tracks according to the grading results of the candidate running tracks;
and refining the target driving track by utilizing the target direction network and the characteristic matrix corresponding to the forward looking angle to obtain the driving track at the future moment.
9. The method according to any one of claims 1 to 8, wherein the extracting and fusing the features of the images from the plurality of different angles to determine a feature matrix corresponding to a current time and a feature matrix corresponding to a historical time includes:
acquiring an internal reference matrix and an external reference matrix of an acquisition device, wherein the acquisition device is arranged on the target vehicle and is used for acquiring the images at different angles;
performing feature extraction on the images at the different angles to obtain a plurality of image features;
respectively carrying out depth prediction on the images of all angles according to the plurality of image characteristics to obtain image depth information of all angles;
performing coordinate conversion according to the plurality of image characteristics and the image depth information of each angle by taking the position of the target vehicle as a center through an internal reference matrix and an external reference matrix of the acquisition device to obtain three-dimensional image characteristics;
adding the image features with different heights at the same position in the three-dimensional image features to obtain a two-dimensional feature matrix at the current moment;
acquiring two-dimensional historical feature matrixes of historical images at a plurality of different angles;
and determining a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment according to the two-dimensional feature matrix of the current moment and the two-dimensional historical feature matrix.
10. The method according to claim 9, wherein the determining the feature matrix corresponding to the current time and the feature matrix corresponding to the historical time according to the two-dimensional feature matrix at the current time and the two-dimensional historical feature matrix comprises:
taking the position of the target vehicle at the current moment as a center, and unifying the coordinates of the two-dimensional historical feature matrix and the two-dimensional feature matrix at the current moment to obtain the two-dimensional feature matrix at the historical moment;
and performing three-dimensional convolution processing on the two-dimensional feature matrix at the current moment and the two-dimensional feature matrix at the historical moment to obtain the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment.
11. The method according to any one of claims 1 to 8, wherein the performing object segmentation according to the feature matrix corresponding to the current time and the feature matrix corresponding to the future time to obtain an object segmentation result comprises:
performing fine-granularity feature extraction on the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment by using a preset deep neural model to obtain image edge features; the preset deep neural model is used for extracting fine granularity characteristics;
and carrying out object segmentation according to the image edge characteristics to obtain the object segmentation result.
12. A trajectory planning apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire images of a plurality of different angles for a target vehicle and navigation instruction information at the current moment;
a determining module, configured to perform feature extraction and fusion on the images of the different angles, and determine a feature matrix corresponding to the current moment and a feature matrix corresponding to the historical moment;
a prediction module, configured to perform network prediction according to the feature matrix corresponding to the current moment to determine a first feature matrix representing uncertainty, perform network prediction according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the historical moment to obtain a second feature matrix, and fuse the first feature matrix and the second feature matrix to obtain a feature matrix corresponding to the future moment;
a segmentation module, configured to perform object segmentation according to the feature matrix corresponding to the current moment and the feature matrix corresponding to the future moment to obtain an object segmentation result;
and a planning module, configured to perform trajectory planning according to the object segmentation result and the navigation instruction information to obtain a driving trajectory at a future moment.
13. A trajectory planning device, characterized in that the device comprises a memory and a processor, the memory storing a computer program executable on the processor, and the processor implementing the steps of the method according to any one of claims 1 to 11 when executing the program.
14. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 11.
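
The following is a minimal sketch, in PyTorch, of the camera-to-bird's-eye-view lifting described in claim 9: a per-pixel depth distribution is predicted for each camera, pixel features are back-projected into the ego-centred three-dimensional space through the inverse intrinsic matrix and the extrinsic matrix, and features falling at different heights of the same ground-plane cell are summed into a two-dimensional feature matrix. The class name, tensor shapes, depth-bin count and grid parameters are illustrative assumptions, not values taken from the application.

import torch
import torch.nn as nn


class CameraToBEV(nn.Module):
    def __init__(self, feat_channels=64, depth_bins=48):
        super().__init__()
        # Per-pixel categorical depth distribution ("depth prediction" step).
        self.depth_head = nn.Conv2d(feat_channels, depth_bins, kernel_size=1)

    def forward(self, feats, intrinsics, extrinsics, depth_values, bev_grid):
        # feats:        (N_cam, C, H, W)  image features from a shared backbone
        # intrinsics:   (N_cam, 3, 3)     camera intrinsic matrices
        # extrinsics:   (N_cam, 4, 4)     camera-to-ego transforms
        # depth_values: (D,)              metric depth of each depth bin
        # bev_grid:     dict with x_min/y_min, res, nx, ny of the ego-centred grid
        # returns:      (C, nx, ny)       two-dimensional feature matrix, current moment
        n_cam, c, h, w = feats.shape
        depth_prob = self.depth_head(feats).softmax(dim=1)             # (N, D, H, W)
        # Outer product: every pixel feature is spread over its depth bins.
        frustum = depth_prob.unsqueeze(2) * feats.unsqueeze(1)         # (N, D, C, H, W)

        # Per-pixel 3-D points in ego coordinates via K^-1 and the extrinsics.
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()     # (H, W, 3)
        cam_rays = torch.einsum("nij,hwj->nhwi", intrinsics.inverse(), pix)  # (N, H, W, 3)
        pts = cam_rays.unsqueeze(1) * depth_values.view(1, -1, 1, 1, 1)      # (N, D, H, W, 3)
        pts_h = torch.cat([pts, torch.ones_like(pts[..., :1])], dim=-1)
        pts_ego = torch.einsum("nij,ndhwj->ndhwi", extrinsics, pts_h)[..., :3]

        # "Splat": sum features that land in the same BEV cell, collapsing height.
        x_idx = ((pts_ego[..., 0] - bev_grid["x_min"]) / bev_grid["res"]).long()
        y_idx = ((pts_ego[..., 1] - bev_grid["y_min"]) / bev_grid["res"]).long()
        nx, ny = bev_grid["nx"], bev_grid["ny"]
        valid = ((x_idx >= 0) & (x_idx < nx) & (y_idx >= 0) & (y_idx < ny)).reshape(-1)

        bev = torch.zeros(c, nx * ny)
        flat_feat = frustum.permute(0, 1, 3, 4, 2).reshape(-1, c)       # (N*D*H*W, C)
        flat_idx = (x_idx * ny + y_idx).reshape(-1)
        bev.index_add_(1, flat_idx[valid], flat_feat[valid].t())
        return bev.view(c, nx, ny)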
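
Claim 10 can be read as a two-step temporal fusion: historical bird's-eye-view feature matrices are re-centred on the current position of the target vehicle, and the aligned sequence is processed with a three-dimensional convolution. The sketch below assumes PyTorch, a relative pose given as (dx, dy, dyaw) and a fixed grid resolution; these are assumptions made for illustration, not details disclosed in the application.

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_to_current(past_bev, rel_pose, grid_res):
    # past_bev: (C, X, Y) BEV features centred on the past ego position.
    # rel_pose: (dx, dy, dyaw) of the past pose expressed in the current ego
    #           frame (an assumption about how odometry is stored), dx/dy in metres.
    # grid_res: metres per BEV cell.
    dx, dy, dyaw = rel_pose
    cos, sin = math.cos(dyaw), math.sin(dyaw)
    c, nx, ny = past_bev.shape
    # 2x3 affine matrix in the normalised coordinates expected by affine_grid;
    # translations are converted from metres to [-1, 1] grid units.
    theta = torch.tensor([[cos, -sin, 2.0 * dx / (nx * grid_res)],
                          [sin,  cos, 2.0 * dy / (ny * grid_res)]]).unsqueeze(0)
    grid = F.affine_grid(theta, size=(1, c, nx, ny), align_corners=False)
    return F.grid_sample(past_bev.unsqueeze(0), grid, align_corners=False).squeeze(0)


class TemporalFusion(nn.Module):
    # Stacks the current BEV map with the warped historical maps and fuses
    # them with a three-dimensional convolution, as described in claim 10.
    def __init__(self, channels=64):
        super().__init__()
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, current_bev, past_bevs, rel_poses, grid_res=0.5):
        aligned = [warp_to_current(b, p, grid_res) for b, p in zip(past_bevs, rel_poses)]
        seq = torch.stack(aligned + [current_bev], dim=1)       # (C, T, X, Y)
        fused = self.conv3d(seq.unsqueeze(0)).squeeze(0)        # (C, T, X, Y)
        # Last temporal slice: feature matrix corresponding to the current moment;
        # earlier slices: feature matrices corresponding to the historical moments.
        return fused[:, -1], fused[:, :-1]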
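
For the object segmentation of claim 11, one plausible realisation concatenates the feature matrix of the current moment with the feature matrix of the future moment and passes the result through a small convolutional head that first extracts fine-grained (edge-level) features and then classifies each cell. The channel widths and the number of classes below are illustrative assumptions.

import torch
import torch.nn as nn


class BEVSegmentationHead(nn.Module):
    def __init__(self, in_channels=128, num_classes=4):
        super().__init__()
        # Fine-grained feature extraction ("image edge features").
        self.edge_branch = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        # Per-cell classification producing the object segmentation result.
        self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, current_feat, future_feat):
        # current_feat / future_feat: (B, C, X, Y) BEV feature matrices.
        x = torch.cat([current_feat, future_feat], dim=1)
        edges = self.edge_branch(x)
        return self.classifier(edges)            # (B, num_classes, X, Y) logits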
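
Claim 12 leaves the fusion of the first feature matrix (which represents uncertainty) and the second feature matrix unspecified. Shown purely as an assumption, and not as the patented design, one common way to realise such a step is to treat the first matrix as per-cell Gaussian parameters, draw a reparameterised sample, and merge it with the second matrix through a 1x1 convolution.

import torch
import torch.nn as nn


class FutureFeatureFusion(nn.Module):
    # Hypothetical fusion of the uncertainty-related first feature matrix with
    # the temporally predicted second feature matrix.
    def __init__(self, channels=64):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, first_feat, second_feat):
        # first_feat: (B, 2C, X, Y) holding per-cell mean and log-variance.
        mu, log_var = first_feat.chunk(2, dim=1)
        sample = mu + torch.randn_like(mu) * (0.5 * log_var).exp()    # reparameterised draw
        fused = self.merge(torch.cat([sample, second_feat], dim=1))   # (B, C, X, Y)
        return fused  # feature matrix corresponding to the future moment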
CN202210216630.7A 2022-03-07 2022-03-07 Trajectory planning method, apparatus, device and computer-readable storage medium Pending CN114581870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210216630.7A CN114581870A (en) 2022-03-07 2022-03-07 Trajectory planning method, apparatus, device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210216630.7A CN114581870A (en) 2022-03-07 2022-03-07 Trajectory planning method, apparatus, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114581870A true CN114581870A (en) 2022-06-03

Family

ID=81773845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210216630.7A Pending CN114581870A (en) 2022-03-07 2022-03-07 Trajectory planning method, apparatus, device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114581870A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035494A (en) * 2022-07-04 2022-09-09 小米汽车科技有限公司 Image processing method, image processing device, vehicle, storage medium and chip
CN115206122A (en) * 2022-07-26 2022-10-18 广州文远知行科技有限公司 Track display method and device, storage medium and computer equipment
CN115206122B (en) * 2022-07-26 2024-01-12 广州文远知行科技有限公司 Track display method and device, storage medium and computer equipment
CN116246235A (en) * 2023-01-06 2023-06-09 吉咖智能机器人有限公司 Target detection method and device based on traveling and parking integration, electronic equipment and medium
CN116543356A (en) * 2023-07-05 2023-08-04 青岛国际机场集团有限公司 Track determination method, track determination equipment and track determination medium
CN116543356B (en) * 2023-07-05 2023-10-27 青岛国际机场集团有限公司 Track determination method, track determination equipment and track determination medium

Similar Documents

Publication Publication Date Title
CN114581870A (en) Trajectory planning method, apparatus, device and computer-readable storage medium
US11551429B2 (en) Photorealistic image simulation with geometry-aware composition
Srikanth et al. Infer: Intermediate representations for future prediction
US20230316742A1 (en) Image processing method, apparatus and device, and computer-readable storage medium
CN111316286A (en) Trajectory prediction method and device, storage medium, driving system and vehicle
CN112930554A (en) Electronic device, system and method for determining a semantic grid of a vehicle environment
CN113705636B (en) Method and device for predicting track of automatic driving vehicle and electronic equipment
CN110998663B (en) Image generation method of simulation scene, electronic equipment and storage medium
CN114821507A (en) Multi-sensor fusion vehicle-road cooperative sensing method for automatic driving
CN115187964A (en) Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
Ouyang et al. A cgans-based scene reconstruction model using lidar point cloud
CN111814602A (en) Intelligent vehicle environment dynamic target detection method based on vision
CN114648551B (en) Trajectory prediction method and apparatus
CN114997307A (en) Trajectory prediction method, apparatus, device and storage medium
Petrovai et al. Semantic cameras for 360-degree environment perception in automated urban driving
Xu et al. FRNet: Frustum-Range Networks for Scalable LiDAR Segmentation
CN115705717A (en) Method and system for predicting characteristics of a plurality of objects in the vicinity of a vehicle
Schörner et al. Grid-based micro traffic prediction using fully convolutional networks
EP4226323A1 (en) Method for determining a motion model of an object in the surroundings of a motor vehicle, computer program product, computer-readable storage medium, as well as assistance system
Krueger et al. Recognition Beyond Perception: Environmental Model Completion by Reasoning for Occluded Vehicles
CN112947466A (en) Parallel planning method and equipment for automatic driving and storage medium
Mänttäri et al. Incorporating uncertainty in predicting vehicle maneuvers at intersections with complex interactions
JP2021196632A (en) Prediction device, prediction method, program and vehicle control system
Fennessy Autonomous vehicle end-to-end reinforcement learning model and the effects of image segmentation on model quality
CN117765226B (en) Track prediction method, track prediction device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination