CN114419098A - Moving target trajectory prediction method and device based on visual transformation - Google Patents

Moving target trajectory prediction method and device based on visual transformation

Info

Publication number
CN114419098A
CN114419098A (Application No. CN202210056064.8A)
Authority
CN
China
Prior art keywords: vehicle, moving, vehicle body, coordinate, moving target
Prior art date
Legal status: Pending
Application number
CN202210056064.8A
Other languages
Chinese (zh)
Inventor
李迅
黄志鹏
Current Assignee
Changsha Huilian Intelligent Technology Co ltd
Original Assignee
Changsha Huilian Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Changsha Huilian Intelligent Technology Co ltd filed Critical Changsha Huilian Intelligent Technology Co ltd
Priority to CN202210056064.8A
Publication of CN114419098A

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G01C21/1652 Dead reckoning by integrating acceleration or speed (inertial navigation) combined with non-inertial navigation instruments, with ranging devices, e.g. LIDAR or RADAR
    • G01C21/1656 Dead reckoning by integrating acceleration or speed (inertial navigation) combined with non-inertial navigation instruments, with passive imaging devices, e.g. cameras
    • G01S17/66 Tracking systems using electromagnetic waves other than radio waves
    • G01S17/86 Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G01S17/931 Lidar systems specially adapted for anti-collision purposes of land vehicles
    • G01S19/49 Determining position by combining or switching between position solutions derived from the satellite radio beacon positioning system and from an inertial position system, e.g. loosely-coupled
    • G06F18/24 Pattern recognition; classification techniques
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/10016 Image acquisition modality: video; image sequence
    • G06T2207/30256 Subject of image: lane; road marking

Abstract

The invention discloses a method and a device for predicting a moving target trajectory based on visual transformation, wherein the method comprises the following steps: S1, acquiring a road environment image collected in real time by vehicle-mounted image acquisition equipment, and detecting moving targets to obtain a 2D bounding box of each moving target; S2, performing target trajectory prediction and target tracking with the 2D bounding box of the moving target to obtain predicted coordinates of the moving target trajectory; and S3, converting the predicted coordinates of the moving target trajectory into a vehicle body coordinate system according to a coordinate transformation relation between image pixels and the vehicle body obtained by pre-calibration, so as to obtain the predicted trajectory coordinates of the moving target in the vehicle body coordinate system. The method realizes target trajectory prediction for vehicle-mounted environment perception with high prediction efficiency and accuracy.

Description

Moving target trajectory prediction method and device based on visual transformation
Technical Field
The invention relates to the technical field of vehicle environment perception, in particular to a moving target trajectory prediction method and device based on visual transformation.
Background
Unmanned vehicles achieve automatic driving on public roads through perception, prediction, planning and control. They typically rely on vehicle-mounted sensors such as cameras, laser radars, IMUs (inertial measurement units) and GNSS (global navigation satellite system) receivers to perceive and locate the surrounding environment and to obtain the vehicle's own positioning and navigation information, and then perform real-time decision planning based on map information and obstacle information. Environmental perception is an essential part of unmanned driving and the key to information interaction between the unmanned vehicle and the external environment; it allows the unmanned vehicle to better approximate the perception capability of a human driver and thus to understand its own driving situation and that of its surroundings. For the decision planning of the unmanned vehicle, accurately predicting the motion trajectories of obstacles is critical, and the accuracy of trajectory prediction directly affects the safety and reliability of the unmanned vehicle.
For the prediction of the motion trail of the unmanned vehicle, the following methods are generally adopted in the prior art:
1. For videos shot by the vehicle-mounted camera, a rule-based prediction algorithm or a machine-learning-based moving object prediction algorithm is adopted. However, such approaches either suffer from missed detections caused by instability of the detection algorithm, or easily fail to track a target when it is occluded. Their prediction accuracy is therefore limited, and they are not suitable for real-time moving target trajectory prediction in unmanned vehicles, where the requirement on prediction accuracy is high.
2. A moving target trajectory prediction method based on laser point cloud data, i.e. the moving target trajectory is predicted from laser point cloud data. However, the equipment required is expensive and demands a controlled working environment, the data resolution is low, and the type of the predicted target cannot be accurately distinguished. Inaccurate or wrong trajectory predictions easily occur when targets are close to each other or their paths intersect, and the computation time and accuracy of single-frame laser radar scanning are unstable, so this approach cannot meet the safety requirements (stability and small variance) of unmanned driving scenarios.
Patent application CN202110331661.2 discloses a method for predicting the trajectories of vehicles adjacent to an unmanned vehicle. It extracts the set of surrounding vehicles and road condition information from video data and point cloud data through LKDSCAN (Limit KDBSCAN), predicts the real-time behavior of each vehicle with a long short-term memory neural network (LSTM), and then predicts the vehicle trajectory through a BLSTM (behavior-based LSTM) that combines the LSTM-predicted vehicle behavior with the vehicle's historical behavior data. This scheme can only predict the trajectories of adjacent vehicles, whereas a vehicle may also encounter other moving targets such as people and animals while driving; it cannot predict the trajectories of moving targets other than adjacent vehicles, so its applicability is limited. In addition, it must process video data and point cloud data simultaneously, and its trajectory prediction depends on vehicle behavior prediction and historical behavior data, which leads to complex implementation, a large data processing load and low prediction efficiency.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides a moving target trajectory prediction method and device based on visual transformation that are simple to implement, fast, efficient and accurate, and that can predict in real time the trajectories of various surrounding moving targets while the vehicle is driving.
In order to solve the technical problems, the technical scheme provided by the invention is as follows:
a moving target track prediction method based on visual transformation is applied to vehicle-mounted environment perception and is characterized by comprising the following steps:
S1, acquiring a road environment image acquired by vehicle-mounted image acquisition equipment in real time, and detecting a moving target to obtain a 2D bounding box of the moving target;
S2, predicting a target trajectory by using the 2D bounding box of the moving target and tracking the target to obtain predicted coordinates of the moving target trajectory;
and S3, converting the predicted coordinates of the moving target trajectory into a vehicle body coordinate system according to a coordinate transformation relation between image pixels and the vehicle body obtained by pre-calibration, so as to obtain the predicted trajectory coordinates of the moving target in the vehicle body coordinate system.
Further, in step S1, the moving target is detected by inputting the road environment image acquired in real time into a target detection network model, which is trained by using the historical road environment image data set acquired by the vehicle-mounted image acquisition device.
Further, step S1 includes identifying the target type according to the detected motion trajectory, inertia characteristic and speed variation characteristic of the moving target, wherein the moving target is determined to be a vehicle when its motion trajectory tends to a straight line parallel to the lane line boundary and does not exceed the lane line boundary, its inertia characteristic is greater than a preset threshold, and its speed variation rate is less than a preset threshold.
Further, in step S2, performing target trajectory prediction on the 2D bounding box of the moving target by using a kalman filter; extracting the depth features of the 2D bounding box by using a pre-trained depth network model, then tracking the moving target by using the 2D bounding box and the extracted depth features, and matching and correcting the target recognition result in the tracking process by using the mass m and the speed v as the physical features of the moving target.
Further, in step S3, using the laser radar arranged on the vehicle as the reference coordinate system, the vehicle body and the laser radar, and the laser radar and the vehicle-mounted camera, are jointly calibrated to determine the transformation relations between the vehicle body coordinate system and the laser radar coordinate system and between the laser radar coordinate system and the vehicle-mounted camera. The joint calibration yields the external parameter matrix from the laser radar to the vehicle body and the external parameter matrix from the laser radar to the vehicle-mounted camera. The external parameter matrix from the laser radar to the vehicle body is inverted and then multiplied by the external parameter matrix from the laser radar to the vehicle-mounted camera to obtain the external parameter transformation matrix from the vehicle body to the vehicle-mounted camera, and the coordinate transformation relation between image pixels and the vehicle body is constructed from the internal parameter matrix of the vehicle-mounted camera and the external parameter transformation matrix from the vehicle body to the vehicle-mounted camera.
Further, the internal parameter matrix of the vehicle-mounted image acquisition equipment is

$$K=\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where fx and fy are the scale factors of the vehicle-mounted image acquisition equipment in the u-axis and v-axis directions of the pixel coordinate system, and cx and cy are the horizontal and vertical numbers of pixels between the image center pixel coordinate and the image origin pixel coordinate. The external parameter matrix between the vehicle body and the vehicle-mounted image acquisition equipment is

$$[R|t]=\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}$$

where r11 to r33 are the elements of the orthogonal rotation matrix and t1 to t3 are the elements of the translation matrix. The coordinate transformation relation between image pixels and the vehicle body is constructed as

$$w\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}=K\,[R|t]\begin{bmatrix} x_{camera} \\ y_{camera} \\ z_{camera} \\ 1 \end{bmatrix}$$

where u and v are the coordinates of the pixel coordinate point, w is a scale factor, and x_camera, y_camera, z_camera are the x-, y- and z-axis coordinates in the vehicle body coordinate system;

z_camera is set according to the distance h between the vehicle body and the ground, namely z_camera = -h, and the internal parameter matrix and the external parameter matrix are multiplied to obtain

$$w\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}=\begin{bmatrix} A_{00} & A_{01} & A_{02} & A_{03} \\ A_{10} & A_{11} & A_{12} & A_{13} \\ A_{20} & A_{21} & A_{22} & A_{23} \end{bmatrix}\begin{bmatrix} x \\ y \\ -h \\ 1 \end{bmatrix}$$

where A00 to A23 are the elements of the matrix A obtained by multiplying the internal parameter matrix and the external parameter matrix;

and the final coordinate transformation relation between image pixels and the vehicle body, obtained by eliminating the scale factor w, is

$$\begin{cases} (A_{00}-uA_{20})\,x+(A_{01}-uA_{21})\,y=u\,(A_{23}-A_{22}h)-(A_{03}-A_{02}h) \\ (A_{10}-vA_{20})\,x+(A_{11}-vA_{21})\,y=v\,(A_{23}-A_{22}h)-(A_{13}-A_{12}h) \end{cases}$$

where x and y are the x- and y-axis coordinates in the vehicle body coordinate system.
Further, in step S3, the coordinate of the center point of the bottom of the 2D bounding box obtained by predicting each trajectory is projected in a plan view by using the coordinate transformation relation, so as to obtain the predicted trajectory coordinate of the predicted moving object in the current frame, which corresponds to the vehicle body coordinate system, and complete the trajectory prediction of the moving object.
Further, after step S3, the method further includes computing the position deviation between the predicted trajectory coordinates of the moving target in two consecutive frames to obtain the moving speed of the moving target, and estimating the moving speed of the next frame according to a formula [formula image: the next-frame speed components (vx', vy') are obtained from the current-frame components (vx, vy) by constraining the ratio of vx to vy with the threshold k], where vx' and vy' are the x-axis and y-axis components of the next-frame moving speed, vx and vy are the x-axis and y-axis components of the current-frame moving speed, and k is a preset threshold.
A moving object trajectory prediction apparatus based on visual transformation, comprising:
the detection module is used for acquiring road environment images acquired by the vehicle-mounted image acquisition equipment in real time and detecting the moving target to obtain a 2D bounding box of the moving target;
the prediction and tracking module is used for predicting a target track by using the 2D bounding box of the moving target and tracking the target to obtain a predicted coordinate of the moving target track;
and the vision transformation module is used for transforming the predicted coordinates of the moving target track into a vehicle body coordinate system according to the coordinate transformation relation between the image pixels and the vehicle body obtained by pre-calibration so as to obtain the predicted coordinates of the moving target track in the vehicle body coordinate system.
A computer apparatus comprising a processor and a memory, wherein the memory is arranged to store a computer program and the processor is arranged to execute the computer program so as to perform the method as described above.
Compared with the prior art, the invention has the advantages that:
1. The invention acquires road environment images in real time from the vehicle-mounted camera of the unmanned vehicle, detects the 2D bounding box of every moving target on the road, and then performs target prediction and target tracking based on the 2D bounding box. The predicted coordinates of the moving target trajectory are projected according to the coordinate transformation relation to obtain the predicted trajectory coordinates in the vehicle body coordinate system. By combining target prediction, tracking and visual transformation, reliable and accurate moving target trajectory prediction is achieved: missed detections caused by unstable detection and tracking failures caused by occluded targets are avoided, and accurate trajectory prediction is still possible when targets are close to each other or their paths intersect. The method offers high prediction accuracy, high prediction speed and strong flexibility, and enables fast and accurate detection and trajectory prediction of various moving targets in the road environment.
2. The invention further takes full account of the inertia of object motion, the characteristics of the moving target and the differences in motion speed between different types of objects, and combines inertia with the characteristics of the moving target for target type identification, which effectively alleviates inaccurate identity confirmation of individual targets and frequent switching of tracking IDs.
3. The invention further constructs the coordinate transformation relation between pixels and the vehicle body through successive transformations among the three coordinate systems of the vehicle body, the vehicle-mounted radar and the vehicle-mounted camera, and realizes the transformation from the low-dimensional pixel coordinate system to the high-dimensional vehicle body coordinate system by fixing the height in the vehicle body coordinate system and setting up and solving equations.
4. The invention further performs joint calibration between the vehicle body and the laser radar and between the vehicle-mounted camera and the laser radar to obtain the coordinate transformation relation between pixels and the vehicle body, and then calculates the displacement of the projected coordinates of the bottom center point of the moving target's bounding box between adjacent frames from the top-view projection, enabling fast and accurate detection of the speed of various moving targets.
Drawings
Fig. 1 is a schematic flow chart of an implementation of the method for predicting a moving object trajectory based on visual transformation according to the present embodiment.
Fig. 2 is a detailed flowchart illustrating the implementation of the motion target trajectory prediction in the embodiment of the present invention.
Fig. 3 is a schematic diagram of the arrangement principle of the vehicle-mounted camera and the laser radar in the embodiment.
Fig. 4 is a schematic diagram of the principle of coordinate conversion in the present embodiment.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.
As shown in fig. 1, the steps of the method for predicting the trajectory of the moving object based on the visual transformation in the present embodiment include:
S1, acquiring a road environment image collected in real time by vehicle-mounted image acquisition equipment, and detecting moving targets to obtain a 2D (two-dimensional) bounding box of each moving target, wherein the 2D bounding box is the minimum detection rectangle of the detected moving target;
S2, predicting a target trajectory by using the 2D bounding box of the moving target and tracking the target to obtain predicted coordinates of the moving target trajectory;
and S3, converting the predicted coordinates of the moving target track into a vehicle body coordinate system according to the coordinate conversion relation between the image pixels and the vehicle body obtained through pre-calibration to obtain the predicted coordinates of the moving target track in the vehicle body coordinate system.
In this embodiment, road environment images are acquired in real time from the vehicle-mounted camera of the unmanned vehicle, the 2D bounding box of every moving target on the road is detected, and target prediction and target tracking are then performed based on the 2D bounding box. The predicted coordinates of the moving target trajectory are projected according to the coordinate transformation relation to obtain the predicted trajectory coordinates in the vehicle body coordinate system. Combining target prediction, tracking and visual transformation yields reliable and accurate moving target trajectory prediction: missed detections caused by unstable detection and tracking failures caused by occluded targets are avoided, and accurate trajectory prediction is still possible when targets are close to each other or their paths intersect. The method offers high prediction accuracy, high prediction speed and strong flexibility, and enables fast and accurate detection and trajectory prediction of various moving targets in the road environment. The predicted moving target trajectories can further be provided to the decision end of the unmanned vehicle to support its decision planning.
In this embodiment, the vehicle-mounted image capturing device in step S1 is specifically a vehicle-mounted camera, and the vehicle-mounted camera is specifically arranged on the top of the unmanned vehicle, as shown in fig. 3, and faces the ground to capture the road surrounding environment information in real time. The arrangement position of the vehicle-mounted camera and the type of the camera can be configured and selected according to actual requirements.
In this embodiment, in step S1, the road environment image obtained in real time is input into a target detection network model to detect moving targets; the target detection network model is trained on a historical road environment image dataset acquired by the vehicle-mounted image acquisition equipment. The historical road environment image dataset consists of images of various road surroundings collected in advance by the vehicle-mounted camera and contains the various moving targets that need to be identified, such as vehicles, pedestrians and animals. The moving targets in the images of the historical road environment image dataset are labeled and the model is then trained, after which the target detection network model can be used to detect and identify targets.
The detailed steps of the target detection network model construction comprise:
S101: recording video data of the surrounding road traffic environment with a vehicle-mounted camera on the vehicle, and dividing the video data into training set data TRAIN and test set data TEST.
The video data collected by the vehicle-mounted camera can be but is not limited to MP4 format, the video data is divided and stored in the form of pictures through a video processing program, and then the pictures are divided into a training set TRAIN and a TEST set TEST according to a preset proportion.
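As an illustration of this step, the following sketch (the paths, frame step and split ratio are hypothetical) extracts frames from an MP4 recording with OpenCV and splits them into TRAIN and TEST picture sets:

```python
import os
import random
import cv2  # OpenCV

def video_to_datasets(video_path, out_dir, train_ratio=0.8, step=5, seed=42):
    """Extract frames from an MP4 recording and split them into TRAIN/TEST picture sets."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:          # keep every `step`-th frame
            frames.append((idx, frame))
        idx += 1
    cap.release()

    random.Random(seed).shuffle(frames)
    n_train = int(len(frames) * train_ratio)
    for subset, items in (("TRAIN", frames[:n_train]), ("TEST", frames[n_train:])):
        subset_dir = os.path.join(out_dir, subset)
        os.makedirs(subset_dir, exist_ok=True)
        for i, img in items:
            cv2.imwrite(os.path.join(subset_dir, f"frame_{i:06d}.jpg"), img)

# Example (hypothetical paths): video_to_datasets("record.mp4", "dataset/")
```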
S102: and marking the pictures in the training set TRAIN, including the position information and the category vectors of all targets, and respectively making and forming a picture set pa for training and a picture set pb for verification.
The 2D bounding box is generated when the pictures in the training set TRAIN are labeled; the 2D bounding box is the minimum detection rectangle of the detected moving target, and its top-left corner coordinates (x, y) together with its width and height (w, h) constitute the position information of the moving target. The labeled pictures are then randomly divided into a picture set pa for training and a picture set pb for verification according to a certain proportion.
S103: and inputting the initial target detection network model by using the prepared picture set pa for training to obtain a trained target detection model M1.
The target detection network model may be a deep neural network model YOLOv5, or may be another type of neural network model.
S104: the method comprises the steps of collecting images of the vehicle-mounted camera in real time and issuing the images in the form of a topic TP1 through the ROS.
Further, the form of the topic TP1 can be customized, such as "/minus/camera/image", etc.
S105: and subscribing the topic TP1 in the step S104 by using the target detection model M1 obtained in the step S103, converting the topic into a picture, and detecting the converted picture to obtain a 2D bounding box and a category of the moving target in the picture.
The topic TP1 can be converted into a picture form by a tool library such as OpenCV.
In the embodiment, based on the road environment image collected by the vehicle-mounted camera, the target detection is performed by using the deep neural network, so that various moving targets can be quickly and accurately detected, the 2D bounding box of the target is obtained, and the category of each object can be identified.
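As an illustration of steps S104-S105, a detection call on one frame might look like the following sketch; the torch.hub YOLOv5 weights merely stand in for the model M1 trained in steps S101-S103, and the output format follows the YOLOv5 results API:

```python
import torch

# Load a YOLOv5 detector (the pretrained 'yolov5s' weights stand in for the
# model M1 trained on the TRAIN picture set; in practice the custom weights
# from step S103 would be loaded instead).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def detect_moving_targets(image_bgr):
    """Run detection on one road-environment image and return 2D bounding boxes.

    Returns a list of (x1, y1, x2, y2, confidence, class_name) tuples.
    """
    results = model(image_bgr[..., ::-1])  # YOLOv5 expects RGB input
    boxes = []
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        boxes.append((x1, y1, x2, y2, conf, model.names[int(cls)]))
    return boxes
```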
Inertia factors, characteristics of moving objects and motion speed characteristics of different types of objects are different among different moving objects, for example, the motion speed of a vehicle is different from that of a pedestrian, and the motion track of the vehicle is different from that of the pedestrian. In the embodiment, inertia factors of the motion of the object, characteristics of the moving object and differences of motion speeds of objects of different types are fully considered, and the inertia and the characteristics of the moving object are considered in a combined manner to realize the classification of the moving object.
A vehicle traveling along the lane line boundary has the following features: its motion trajectory is a straight line approximately parallel to the lane line boundary and does not exceed the lane line boundary, its inertia is large and its speed changes smoothly, whereas the speed of a pedestrian is far less predictable. This embodiment takes these characteristics into account and uses the motion trajectory of the moving target to assist in discriminating the target class: the moving target is determined to be a vehicle when its motion trajectory tends to a straight line parallel to the lane line boundary and does not exceed the lane line boundary, its inertia characteristic is greater than a preset threshold, and its rate of speed change is less than a preset threshold. Performing target class identification by combining inertia with the characteristics of the moving target allows the class of each object on the road to be identified efficiently and effectively alleviates inaccurate identity confirmation of individual targets and frequent switching of tracking IDs. A rough illustration of this rule is sketched below.
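The sketch below is not the patent's exact criterion: the lane_direction and lane_bounds inputs and all thresholds are hypothetical, and it only mirrors the qualitative rule stated above.

```python
import numpy as np

def looks_like_vehicle(track_xy, lane_direction, lane_bounds,
                       inertia, speed_rate, inertia_thresh=0.5, rate_thresh=0.2):
    """Heuristic class check: treat a target as a vehicle when its trajectory is
    close to a straight line parallel to the lane line, stays inside the lane
    boundary, its inertia exceeds a threshold and its speed change rate is small."""
    pts = np.asarray(track_xy, dtype=float)            # (N, 2) trajectory points
    if len(pts) < 3:
        return False
    # Parallelism test: overall trajectory direction vs. lane direction.
    d = pts[-1] - pts[0]
    d = d / (np.linalg.norm(d) + 1e-9)
    lane = np.asarray(lane_direction, dtype=float)
    lane = lane / (np.linalg.norm(lane) + 1e-9)
    parallel = abs(float(d @ lane)) > 0.95
    # Boundary test: lateral coordinate must stay within the lane bounds.
    lateral = pts @ np.array([-lane[1], lane[0]])
    inside = lateral.min() > lane_bounds[0] and lateral.max() < lane_bounds[1]
    return parallel and inside and inertia > inertia_thresh and speed_rate < rate_thresh
```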
In step S2 of this embodiment, a Kalman filter is used to predict the target trajectory from the 2D bounding box of the moving target. The X and Y coordinates of the bottom center point of the moving target's bounding box are tracked through Kalman filtering; the tracked point of the bounding box may also be a position other than the center point. A minimal sketch of such a filter is given below.
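The following constant-velocity Kalman filter over the bottom-center point is only a sketch; the state layout, noise values and time step are illustrative, and the embodiment's actual filter parameters are not given in the text.

```python
import numpy as np

class CenterPointKalman:
    """Constant-velocity Kalman filter over the (X, Y) bottom-center point
    of a target's 2D bounding box (state: x, y, vx, vy)."""

    def __init__(self, x0, y0, dt=0.1, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 10.0
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q      # process noise (illustrative value)
        self.R = np.eye(2) * r      # measurement noise (illustrative value)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]           # predicted (X, Y) of the bottom-center point

    def update(self, zx, zy):
        z = np.array([zx, zy])
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```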
In step S2, the depth features of the 2D bounding box are extracted with a pre-trained deep network model, and the moving target is then tracked using the 2D bounding box together with the extracted depth features. The deep network model extracts the depth features corresponding to the target in the 2D bounding box with a feature extraction network, which may be a network structure such as ResNet (residual network). The target tracking module performs target tracking on the detected 2D bounding box and the corresponding depth features extracted by the feature extraction network. In this embodiment, after the class and 2D bounding box of each object on the road have been detected by the deep learning detection network, the 2D bounding box and the features extracted by the network are fed to the target tracking module for target tracking, which effectively alleviates frequent switching of tracking labels.
In this embodiment, when the trajectory of a detected moving target is predicted with the Kalman filter algorithm, the trajectory of each target individual is obtained by a target tracking module that fuses the deep convolutional re-identification appearance model of the multi-target tracking algorithm DeepSORT with a cascade matching algorithm, and the trajectory information predicted by the Kalman filter is then converted into the world coordinate system through visual transformation. By combining the trajectory with the re-identification (reid) algorithm according to the inertia principle of the object and applying a strategy-based judgment, the influence of occlusion and appearance change is reduced, giving high reliability, good accuracy and stronger robustness. According to the law of inertia, inertia is related to the mass m and velocity v of the detected moving target: by Newton's first law, the larger the mass m of the detected moving target (i.e. the larger the 2D bounding box) and the larger the velocity v, the less easily its state of motion changes. In this embodiment, target identification is performed with the reid algorithm, the mass m and velocity v are taken as physical features of the moving target, and the reid result is matched and corrected accordingly, which effectively alleviates mismatches between consecutive frames caused by occlusion and appearance change of the same moving target.
In this embodiment, after the moving target has been detected and tracked based on the images shot by the vehicle-mounted camera, the predicted coordinates of the moving target trajectory are converted into the vehicle body coordinate system according to the pre-calibrated coordinate transformation relation between image pixels and the vehicle body, yielding the predicted trajectory coordinates of the moving target in the vehicle body coordinate system; the moving target trajectory prediction is thus realized by means of visual transformation.
In this embodiment, laser radars are also arranged on the unmanned vehicle; they may be arranged on the left, right, front and rear sides of the vehicle, as shown in fig. 3. The coordinate transformation relation between image pixels and the vehicle body is obtained by joint calibration of the vehicle body, the laser radar and the vehicle-mounted camera (vehicle-mounted image acquisition equipment): using the laser radar as the reference coordinate system, two joint calibrations are performed, between the vehicle body and the laser radar and between the laser radar and the vehicle-mounted camera, to determine the transformation relations between the vehicle body coordinate system and the laser radar coordinate system and between the laser radar coordinate system and the vehicle-mounted camera. The joint calibration yields the external parameter matrix from the laser radar to the vehicle body and the external parameter matrix from the laser radar to the vehicle-mounted camera, each consisting of a rotation matrix and a translation matrix. The external parameter matrix from the laser radar to the vehicle body is inverted and multiplied by the external parameter matrix from the laser radar to the vehicle-mounted camera to obtain the external parameter transformation matrix from the vehicle body to the vehicle-mounted camera, from which the projection transformation between vehicle body coordinates and image pixels is solved. The coordinate transformation relation between image pixels and the vehicle body is then constructed from the internal parameter matrix of the vehicle-mounted camera and the external parameter transformation matrix from the vehicle body to the vehicle-mounted camera computed from the two jointly calibrated external parameter matrices, so that coordinate transformation between image pixels and the vehicle body can be realized.
The conversion from the pixel coordinates to the body coordinates in this embodiment is specifically performed in two steps: firstly converting the coordinates of the vehicle body to the coordinates of the vehicle-mounted camera, and then converting the coordinates of the pixels to the coordinates of the vehicle-mounted camera, specifically:
First, the external parameter transformation matrix from the vehicle body to the vehicle-mounted camera is calculated. The external parameter matrix produced by the joint calibration describes the transformation from the vehicle-mounted camera to the laser radar, so it must be inverted to obtain the transformation from the laser radar to the vehicle-mounted camera. The external parameters from the vehicle body to the laser radar are obtained through tf monitoring, and the external parameter transformation matrix from the vehicle body to the laser radar is multiplied by the external parameter transformation matrix from the laser radar to the vehicle-mounted camera to obtain the external parameter transformation matrix from the vehicle body to the vehicle-mounted camera, as sketched below.
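A minimal numpy sketch of this chaining, assuming all transforms are expressed as 4x4 homogeneous matrices; the rotations and translations below are placeholders standing in for the joint-calibration result and the tf-monitored body-to-lidar transform:

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Placeholder values; in practice these come from joint calibration and tf.
T_cam_to_lidar = make_transform(np.eye(3), np.array([0.1, 0.0, -0.2]))   # calibration result
T_body_to_lidar = make_transform(np.eye(3), np.array([1.5, 0.0, 1.8]))   # from the tf tree

# Invert the calibration result to get lidar -> camera, then chain with body -> lidar;
# the product maps a point from the vehicle body frame into the camera frame.
T_lidar_to_cam = np.linalg.inv(T_cam_to_lidar)
T_body_to_cam = T_lidar_to_cam @ T_body_to_lidar
```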
Second, pixel coordinates are converted to the vehicle-mounted camera. The pixel coordinate system is two-dimensional while the vehicle-mounted camera coordinate system is three-dimensional; coordinate transformation is normally possible within the same dimension or from a higher dimension to a lower one, but not from a lower dimension to a higher one because information is lost. In this embodiment the transformation is achieved by fixing the value of one dimension and setting up and solving equations.
The coordinate transformation relation between image pixels and the vehicle body is constructed from the internal parameter matrix of the vehicle-mounted image acquisition equipment and the external parameter matrix between the vehicle body and the vehicle-mounted camera. The internal parameter matrix of the vehicle-mounted camera is

$$K=\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where fx and fy are the scale factors of the vehicle-mounted camera in the u-axis and v-axis directions of the pixel coordinate system, and cx and cy are the horizontal and vertical numbers of pixels between the image center pixel coordinate and the image origin pixel coordinate (i.e. the abscissa and ordinate of the camera principal point). The external parameter matrix between the vehicle body and the vehicle-mounted camera is

$$[R|t]=\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}$$

where r11 to r33 are the elements of the orthogonal rotation matrix and t1 to t3 are the elements of the translation matrix. The coordinate transformation relation between image pixels and the vehicle-mounted camera is thus established as

$$w\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}=K\,[R|t]\begin{bmatrix} x_{camera} \\ y_{camera} \\ z_{camera} \\ 1 \end{bmatrix}$$

where u and v are the coordinates of the pixel coordinate point, w is a scale factor, and x_camera, y_camera, z_camera are the x-, y- and z-axis coordinates in the vehicle body coordinate system.

The origin of the vehicle body coordinate system is at the center of the vehicle's rear axle, which lies at height h above the ground, so the plane on which the ground lies is the plane z = -h in the vehicle body coordinate system and z_camera is set accordingly, i.e. z_camera = -h. With z_camera = -h, multiplying the internal parameter matrix and the external parameter matrix gives

$$w\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}=\begin{bmatrix} A_{00} & A_{01} & A_{02} & A_{03} \\ A_{10} & A_{11} & A_{12} & A_{13} \\ A_{20} & A_{21} & A_{22} & A_{23} \end{bmatrix}\begin{bmatrix} x \\ y \\ -h \\ 1 \end{bmatrix}$$

where A00 to A23 are the elements of the matrix A obtained by multiplying the internal parameter matrix and the external parameter matrix.

Expanding the matrix product yields the equation system

$$\begin{cases} w\,u=A_{00}x+A_{01}y-A_{02}h+A_{03} \\ w\,v=A_{10}x+A_{11}y-A_{12}h+A_{13} \\ w=A_{20}x+A_{21}y-A_{22}h+A_{23} \end{cases}$$

and eliminating the scale factor w gives the final coordinate transformation relation between image pixels and the vehicle body:

$$\begin{cases} (A_{00}-uA_{20})\,x+(A_{01}-uA_{21})\,y=u\,(A_{23}-A_{22}h)-(A_{03}-A_{02}h) \\ (A_{10}-vA_{20})\,x+(A_{11}-vA_{21})\,y=v\,(A_{23}-A_{22}h)-(A_{13}-A_{12}h) \end{cases}$$

where x and y are the x- and y-axis coordinates in the vehicle body coordinate system.

Solving this system by cross multiplication gives the coordinates (x, y) in the vehicle body coordinate system corresponding to any pixel coordinate point (u, v).
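A minimal sketch of this back-projection, with the internal parameter matrix K, the body-to-camera external parameter matrix [R|t] and the ground height h supplied by the calibration described above (all arguments here are placeholders, and the ground plane z = -h assumption follows the derivation above):

```python
import numpy as np

def pixel_to_body(u, v, K, Rt_body_to_cam, h):
    """Recover the vehicle-body ground-plane coordinates (x, y) of a pixel (u, v),
    assuming the point lies on the ground plane z = -h of the body frame."""
    A = K @ Rt_body_to_cam                      # 3x4 projection matrix A = K [R|t]
    # Move the known z = -h column and the translation column to the right-hand
    # side, then eliminate the scale factor w to get a 2x2 linear system in (x, y).
    b = -h * A[:, 2] + A[:, 3]
    M = np.array([[A[0, 0] - u * A[2, 0], A[0, 1] - u * A[2, 1]],
                  [A[1, 0] - v * A[2, 0], A[1, 1] - v * A[2, 1]]])
    rhs = np.array([u * b[2] - b[0], v * b[2] - b[1]])
    x, y = np.linalg.solve(M, rhs)
    return x, y
```

For a detected 2D bounding box (x1, y1, x2, y2), the bottom-center pixel used in step S3 would be ((x1 + x2) / 2, y2).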
In step S3 of this embodiment, the coordinates of the bottom center point of each predicted 2D bounding box are top-view projected using the coordinate transformation relation above, yielding the predicted trajectory coordinates of the moving target in the current frame in the vehicle body coordinate system and completing the trajectory prediction of the moving target.
Conventional coordinate transformation is usually performed between two coordinate systems of the same dimensionality; for example, the transformation between the laser and the vehicle body gives the coordinates of the laser coordinate system in the vehicle body coordinate system. In this embodiment, the coordinate transformation relation between pixels and the vehicle body is finally constructed by chaining the transformations among the three coordinate systems of the vehicle body, the vehicle-mounted radar and the vehicle-mounted camera and then multiplying by the vehicle-mounted camera parameters. Because the pixel coordinate system is two-dimensional and the vehicle body coordinate system is three-dimensional, the transformation from the low dimension to the high dimension is realized by fixing the height in the vehicle body coordinate system and setting up and solving equations. The transformation principle is shown in fig. 4, where self-driving car denotes the unmanned vehicle, lslidar denotes the front solid-state laser radar, camera denotes the vehicle-mounted front camera, R1 and T1 denote the rotation and translation matrices from the laser radar to the vehicle body, R2 and T2 denote the rotation and translation matrices from the vehicle-mounted camera to the laser radar, and R3 and T3 are obtained by multiplying (R1, T1) and (R2, T2), finally giving the transformation relation between the vehicle body and the pixel coordinate system.
In this embodiment, step S3 is further followed by computing the position deviation between the trajectory prediction coordinates of the moving target in two consecutive frames to obtain the moving speed of the moving target. The vehicle body and the laser radar, and the vehicle-mounted camera and the laser radar, are jointly calibrated to obtain the coordinate transformation relation between pixels and the vehicle body; the X- and Y-direction displacements of the projected coordinates of the bottom center point of the moving target's bounding box between adjacent frames are then calculated from the top-view projection, and the X- and Y-direction speeds of the tracked moving target are computed, so that the speeds of various moving targets can be detected quickly and accurately.
Specifically, this embodiment computes the position deviation between the projected coordinates (u1, v1) of the bottom center point of the current frame's 2D bounding box and the projected coordinates (u2, v2) of the bottom center point of the previous frame's 2D bounding box, substitutes them into the transformation equation system above (equation (4)) to obtain the corresponding vehicle body coordinates (x1, y1) and (x2, y2), and finally computes the target's moving speed as vx = (x2 - x1)/Δt and vy = (y2 - y1)/Δt, where Δt is the time interval between the two frames.
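As an illustration of this speed computation (reusing the pixel_to_body sketch above; dt corresponds to the frame interval Δt and the calibration inputs are placeholders):

```python
def target_speed(prev_px, curr_px, K, Rt_body_to_cam, h, dt):
    """Estimate a target's speed in the vehicle body frame from the projected
    bottom-center points of its bounding box in two consecutive frames (dt apart)."""
    x1, y1 = pixel_to_body(prev_px[0], prev_px[1], K, Rt_body_to_cam, h)
    x2, y2 = pixel_to_body(curr_px[0], curr_px[1], K, Rt_body_to_cam, h)
    vx = (x2 - x1) / dt
    vy = (y2 - y1) / dt
    return vx, vy
```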
The speed of a vehicle observed in the previous frame can be decomposed relative to the lane line into a speed vx in the x-direction and a speed vy in the y-direction, where the vx direction is the normal direction of the lane line and the vy direction is its tangential direction, and this property can be used to constrain further speed estimation. A vehicle traveling with a large normal speed vx would move close to and beyond the lane line boundary within a short time, so the ratio of vx to vy must be smaller than a threshold k; conversely, if the ratio of vx to vy is larger than k, the vehicle would tend to exceed the lane line boundary in a short time. In this embodiment the speed of motion in the next frame is therefore updated according to the ratio of vx to vy, where the update formula is:
[Formula image: the next-frame speed components (vx', vy') are obtained from the current-frame components (vx, vy) by constraining the ratio of vx to vy with the threshold k.]
The value of k can be taken empirically; if it is taken as 0.09, the same update is calculated with k = 0.09.
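The update formula itself is only available as an image in the source, so the following clamp is just one plausible reading of the stated rule that the ratio of vx to vy must not exceed k:

```python
def constrain_speed(vx, vy, k=0.09):
    """Constrain the next-frame speed estimate so that the lateral (normal) component
    vx does not exceed k times the longitudinal (tangential) component vy.
    This is only one interpretation of the rule; the source formula is an image."""
    if abs(vy) > 1e-9 and abs(vx / vy) > k:
        vx = k * abs(vy) * (1.0 if vx >= 0 else -1.0)
    return vx, vy

# Example: constrain_speed(2.0, 10.0) returns (0.9, 10.0) with k = 0.09
```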
the embodiment collects road traffic environment information in real time through the vehicle-mounted camera, effectively detects the category of each object on the road and the 2D bounding box by adopting the depth learning detection network, further sends the characteristics extracted by the 2D bounding box and the depth network to the target tracking module for target tracking, can well solve the problem of frequent switching of tracking labels, further obtains reliable and accurate target track prediction coordinates through combined calibration between the vehicle-mounted camera and a laser radar and overhead projection from a camera to the ground, can realize track prediction of various moving targets based on visual transformation under any vehicle-mounted camera, improves the accuracy of the track prediction of the moving target of the unmanned vehicle, reduces the error of the track prediction, thereby providing accurate information for upper-layer decision planning, and greatly improving the safety and efficiency of unmanned driving, while also greatly reducing the cost and complexity of environmental sensing.
The method is applied to vehicle perception of an unmanned vehicle to predict a moving target track, and can be applied to ordinary vehicles with similar requirements according to actual requirements, such as an automatic vehicle braking function and the like.
The present embodiment of a device for predicting a trajectory of a moving object based on visual transformation includes:
the detection module is used for acquiring road environment images acquired by the vehicle-mounted image acquisition equipment in real time and detecting the moving target to obtain a 2D bounding box of the moving target;
the prediction and tracking module is used for predicting the target track by using the 2D bounding box of the moving target and tracking the target to obtain a predicted coordinate of the moving target track;
and the vision transformation module is used for transforming the predicted coordinates of the moving target track into a vehicle body coordinate system according to the coordinate transformation relation between the image pixels and the vehicle body obtained by calibration in advance to obtain the predicted coordinates of the moving target track in the vehicle body coordinate system.
The detection module specifically comprises a target detection network model, wherein the moving target is detected by inputting the road environment image acquired in real time into the target detection network model, and the target detection network model is obtained by training a historical road environment image data set acquired by vehicle-mounted image acquisition equipment.
The detection module of this embodiment further comprises a target class identification unit for identifying the target class according to the detected motion trajectory, inertia characteristic and speed variation characteristic of the moving target, wherein the moving target is determined to be a vehicle when its motion trajectory tends to a straight line parallel to the lane line boundary and does not exceed the lane line boundary, its inertia characteristic is greater than a preset threshold, and its speed change rate is less than a preset threshold.
The prediction and tracking module comprises a track prediction unit and a target tracking unit, wherein the track prediction unit uses a Kalman filter to predict the target track of the 2D bounding box of the moving target; and the target tracking unit extracts the depth features of the 2D bounding box by using a pre-trained depth network model, and then tracks the moving target by using the 2D bounding box and the extracted depth features.
In the vision transformation module of this embodiment, the vehicle body, the laser radar, and the vehicle-mounted camera are respectively calibrated in a combined manner, and a coordinate transformation relationship between the image pixels and the vehicle body is established and obtained by using the internal reference matrix of the vehicle-mounted image acquisition device and the external reference matrix between the vehicle body and the vehicle-mounted image acquisition device, where the coordinate transformation relationship is as described above.
The vehicle-mounted image acquisition equipment is a vehicle-mounted camera. The modules of the moving target trajectory prediction device may be arranged on the vehicle to process the images acquired by the vehicle-mounted camera in real time, or the data may be transmitted to a remote control end where the data processing is carried out, which reduces the number of modules on the vehicle; the specific arrangement can be determined according to actual requirements.
In this embodiment, the moving object trajectory prediction device based on visual transformation corresponds to the moving object trajectory prediction method based on visual transformation one to one, and details are not repeated here.
The present embodiment also includes a computer apparatus comprising a processor and a memory, the memory being configured to store a computer program, the processor being configured to execute the computer program, and the processor being configured to execute the computer program to perform the method as described above.
Those skilled in the art will appreciate that the above description of a computer apparatus is by way of example only and is not intended to be limiting of computer apparatus, and that the apparatus may include more or less components than those described, or some of the components may be combined, or different components may be included, such as input output devices, network access devices, buses, etc. The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like which is the control center for the computer device and which connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash memory Card (FlashCard), at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
The modules/units integrated in the computer device may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the embodiments of the visual-transformation-based moving target trajectory prediction method described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.
The foregoing is illustrative of the preferred embodiments of the invention and is not to be construed as limiting the invention in any way. Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Therefore, any simple modification, equivalent change, or improvement made to the above embodiments according to the technical spirit of the present invention, without departing from the content of the technical solution of the present invention, shall fall within the protection scope of the present invention.

Claims (10)

1. A moving target trajectory prediction method based on visual transformation, applied to vehicle-mounted environment perception, characterized by comprising the following steps:
s1, acquiring a road environment image acquired by vehicle-mounted image acquisition equipment in real time, and detecting a moving target to obtain a 2D bounding box of the moving target;
s2, predicting a target track by using the 2D bounding box of the moving target and tracking the target to obtain a predicted coordinate of the moving target track;
and S3, converting the predicted coordinates of the moving target trajectory into the vehicle body coordinate system according to the coordinate conversion relationship between image pixels and the vehicle body obtained by pre-calibration, to obtain the predicted coordinates of the moving target trajectory in the vehicle body coordinate system.
2. The method for predicting the trajectory of a moving target based on visual transformation as claimed in claim 1, wherein in step S1, the moving target is detected by inputting the road environment image acquired in real time into an object detection network model trained on a historical road environment image data set collected by the vehicle-mounted image acquisition device.
3. The method for predicting the trajectory of a moving target based on visual transformation according to claim 1, wherein step S1 further comprises identifying the target class according to the detected moving target trajectory and its inertia and speed-variation characteristics, wherein the moving target is determined to be a vehicle when its trajectory tends to a straight line parallel to the lane line boundary and does not cross the lane line boundary, its inertia characteristic is greater than a preset threshold, and its speed variation rate is less than a preset threshold.
4. The method for predicting the trajectory of a moving target based on visual transformation according to claim 1, wherein in step S2, a Kalman filter is used to predict the target trajectory from the 2D bounding box of the moving target; the deep features of the 2D bounding box are extracted by using a pre-trained deep network model, the moving target is then tracked by using the 2D bounding box and the extracted deep features, and the target recognition result is matched and corrected during tracking by using the mass m and the speed v as the physical features of the moving target.
5. The method for predicting the trajectory of a moving target based on visual transformation according to any one of claims 1 to 4, wherein in step S3, with the lidar arranged on the vehicle taken as the reference coordinate system, the vehicle body and the lidar as well as the lidar and the vehicle-mounted camera are jointly calibrated to determine the transformation relationships between the vehicle body coordinate system and the lidar coordinate system and between the lidar coordinate system and the vehicle-mounted camera; the extrinsic matrix from the lidar to the vehicle body and the extrinsic matrix from the lidar to the vehicle-mounted camera are obtained by the joint calibration; the extrinsic matrix from the lidar to the vehicle body is inverted and then multiplied by the extrinsic matrix from the lidar to the vehicle-mounted camera to obtain the extrinsic transformation matrix from the vehicle body to the vehicle-mounted camera; and the coordinate conversion relationship between image pixels and the vehicle body is constructed according to the extrinsic transformation matrix from the vehicle body to the vehicle-mounted camera and the intrinsic matrix of the vehicle-mounted camera.
6. The method for predicting the trajectory of a moving target based on visual transformation as claimed in claim 5, wherein the intrinsic matrix of the vehicle-mounted image acquisition equipment is

\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

wherein fx and fy are the scale factors of the vehicle-mounted image acquisition equipment along the u-axis and v-axis directions of the pixel coordinate system, respectively, and cx and cy are the horizontal and vertical pixel offsets between the image center pixel coordinate and the image origin pixel coordinate, respectively; the extrinsic matrix between the vehicle body and the vehicle-mounted image acquisition equipment is

\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
wherein r11~r33 are the elements of the orthogonal rotation matrix and t1~t3 are the elements of the translation vector, respectively; the coordinate conversion relationship between the image pixels and the vehicle body is established as follows:

w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} x_{camera} \\ y_{camera} \\ z_{camera} \\ 1 \end{bmatrix}

wherein u and v are the coordinates of the pixel point in the pixel coordinate system, w is a scale factor, and x_camera, y_camera and z_camera are the x-, y- and z-axis coordinates in the vehicle body coordinate system, respectively;
z_camera is configured according to the distance h between the vehicle body and the ground, namely z_camera = h, and the intrinsic matrix and the extrinsic matrix are multiplied to obtain:

w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} A_{00} & A_{01} & A_{02} & A_{03} \\ A_{10} & A_{11} & A_{12} & A_{13} \\ A_{20} & A_{21} & A_{22} & A_{23} \end{bmatrix} \begin{bmatrix} x \\ y \\ h \\ 1 \end{bmatrix}

wherein A00~A23 are the elements of the matrix A obtained by multiplying the intrinsic matrix and the extrinsic matrix;
and the final coordinate conversion relationship between the image pixels and the vehicle body is obtained as follows:
\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} A_{00} - u A_{20} & A_{01} - u A_{21} \\ A_{10} - v A_{20} & A_{11} - v A_{21} \end{bmatrix}^{-1} \begin{bmatrix} u (A_{22} h + A_{23}) - A_{02} h - A_{03} \\ v (A_{22} h + A_{23}) - A_{12} h - A_{13} \end{bmatrix}
wherein x and y are the x-axis and y-axis coordinates in the vehicle body coordinate system, respectively.
7. The method for predicting the trajectory of a moving target based on visual transformation according to any one of claims 1 to 4, wherein in step S3, the coordinates of the bottom center point of the 2D bounding box obtained for each trajectory prediction are projected onto the top-view plane by using the coordinate conversion relationship, so as to obtain the predicted trajectory coordinates of the moving target of the current frame in the vehicle body coordinate system, thereby completing the trajectory prediction of the moving target.
8. The method for predicting the trajectory of a moving target based on visual transformation according to any one of claims 1 to 4, wherein after step S3, the moving speed of the moving target is obtained according to the position deviation between the predicted trajectory coordinates of the moving target in the previous and current frames, and according to the formula
(formula for estimating the next-frame motion speed from vx, vy and the preset threshold k; the equation appears only as an image in the source and is not reproduced here)
the motion speed of the next frame is estimated, wherein vx' and vy' are the x-axis and y-axis velocity components of the next-frame motion speed, respectively, vx and vy are the x-axis and y-axis velocity components of the current-frame motion speed, respectively, and k is a preset threshold.
9. A moving target trajectory prediction apparatus based on visual transformation, comprising:
the detection module is used for acquiring road environment images acquired by the vehicle-mounted image acquisition equipment in real time and detecting the moving target to obtain a 2D bounding box of the moving target;
the prediction and tracking module is used for predicting a target trajectory by using the 2D bounding box of the moving target and tracking the target to obtain predicted coordinates of the moving target trajectory;
and the vision transformation module is used for converting the predicted coordinates of the moving target trajectory into the vehicle body coordinate system according to the coordinate transformation relationship between image pixels and the vehicle body obtained by pre-calibration, so as to obtain the predicted coordinates of the moving target trajectory in the vehicle body coordinate system.
10. A computer apparatus comprising a processor and a memory, the memory being configured to store a computer program, wherein the processor is configured to execute the computer program to perform the method according to any one of claims 1 to 8.
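As a reading aid following the claims (not part of the claimed subject matter), the sketch below illustrates a constant-velocity Kalman filter of the kind referred to in claim 4, together with a simple frame-to-frame speed estimate in the spirit of claim 8. The class name, noise parameters, frame interval, and sample measurements are all assumptions made for illustration, and the patent's threshold-k correction of the estimated speed is deliberately not reproduced.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over a 2D point (illustrative only)."""
    def __init__(self, dt=0.1):
        self.x = np.zeros(4)                      # state: [px, py, vx, vy]
        self.P = np.eye(4) * 10.0                 # state covariance
        self.F = np.eye(4)                        # constant-velocity transition
        self.F[0, 2] = dt
        self.F[1, 3] = dt
        self.H = np.zeros((2, 4))                 # observe position only
        self.H[0, 0] = 1.0
        self.H[1, 1] = 1.0
        self.Q = np.eye(4) * 0.01                 # process noise (assumed)
        self.R = np.eye(2) * 1.0                  # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                         # predicted position

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P


# Illustrative use over successive frames --------------------------------
dt = 0.1                                          # assumed frame interval (s)
kf = ConstantVelocityKF(dt)
measurements = [(3.0, 0.5), (3.4, 0.6), (3.8, 0.7)]   # bottom-center points already
                                                      # mapped to body coordinates (m)
prev_xy = None
for xy in measurements:
    pred_xy = kf.predict()                        # predicted trajectory coordinate
    kf.update(xy)
    if prev_xy is not None:
        # Motion speed from the position deviation of consecutive frames,
        # without the threshold-based correction described in claim 8.
        vx = (xy[0] - prev_xy[0]) / dt
        vy = (xy[1] - prev_xy[1]) / dt
        print(f"predicted={pred_xy}, speed=({vx:.2f}, {vy:.2f}) m/s")
    prev_xy = xy
```

In practice the measurements fed to such a filter would be the ground-plane coordinates obtained from the pixel-to-vehicle-body conversion sketched earlier in the description.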
CN202210056064.8A 2022-01-18 2022-01-18 Moving target trajectory prediction method and device based on visual transformation Pending CN114419098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210056064.8A CN114419098A (en) 2022-01-18 2022-01-18 Moving target trajectory prediction method and device based on visual transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210056064.8A CN114419098A (en) 2022-01-18 2022-01-18 Moving target trajectory prediction method and device based on visual transformation

Publications (1)

Publication Number Publication Date
CN114419098A true CN114419098A (en) 2022-04-29

Family

ID=81273672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210056064.8A Pending CN114419098A (en) 2022-01-18 2022-01-18 Moving target trajectory prediction method and device based on visual transformation

Country Status (1)

Country Link
CN (1) CN114419098A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100290A (en) * 2022-06-20 2022-09-23 苏州天准软件有限公司 Monocular vision positioning method, monocular vision positioning device, monocular vision positioning equipment and monocular vision positioning storage medium in traffic scene
CN115100290B (en) * 2022-06-20 2023-03-21 苏州天准软件有限公司 Monocular vision positioning method, monocular vision positioning device, monocular vision positioning equipment and monocular vision positioning storage medium in traffic scene
CN115049745A (en) * 2022-08-16 2022-09-13 江苏魔视智能科技有限公司 Calibration method, device, equipment and medium for roadside sensor
WO2023179086A1 (en) * 2022-08-26 2023-09-28 东莞理工学院 Lidar driving environment cognitive system based on visual area guidance
CN115631215A (en) * 2022-12-19 2023-01-20 中国人民解放军国防科技大学 Moving target monitoring method, system, electronic equipment and storage medium
CN117073543A (en) * 2023-10-17 2023-11-17 深圳华海达科技有限公司 Appearance measurement method, device and equipment of double-rotation flatness measuring machine
CN117073543B (en) * 2023-10-17 2023-12-15 深圳华海达科技有限公司 Appearance measurement method, device and equipment of double-rotation flatness measuring machine
CN117554949A (en) * 2024-01-08 2024-02-13 中国电子科技集团公司第十五研究所 Linkage type target relay tracking method and system
CN117554949B (en) * 2024-01-08 2024-03-29 中国电子科技集团公司第十五研究所 Linkage type target relay tracking method and system

Similar Documents

Publication Publication Date Title
CN111448478B (en) System and method for correcting high-definition maps based on obstacle detection
US10878288B2 (en) Database construction system for machine-learning
US9846812B2 (en) Image recognition system for a vehicle and corresponding method
CN114419098A (en) Moving target trajectory prediction method and device based on visual transformation
Shim et al. An autonomous driving system for unknown environments using a unified map
Gandhi et al. Vehicle surround capture: Survey of techniques and a novel omni-video-based approach for dynamic panoramic surround maps
US11670087B2 (en) Training data generating method for image processing, image processing method, and devices thereof
CN111932580A (en) Road 3D vehicle tracking method and system based on Kalman filtering and Hungary algorithm
Parra et al. Robust visual odometry for vehicle localization in urban environments
Dueholm et al. Trajectories and maneuvers of surrounding vehicles with panoramic camera arrays
CN110717445B (en) Front vehicle distance tracking system and method for automatic driving
EP3842751A1 (en) System and method of generating high-definition map based on camera
Nuss et al. Fusion of laser and monocular camera data in object grid maps for vehicle environment perception
US10984264B2 (en) Detection and validation of objects from sequential images of a camera
CN116310679A (en) Multi-sensor fusion target detection method, system, medium, equipment and terminal
US20230326168A1 (en) Perception system for autonomous vehicles
EP3555854B1 (en) A method of tracking objects in a scene
CN116524454A (en) Object tracking device, object tracking method, and storage medium
WO2022133986A1 (en) Accuracy estimation method and system
US11373389B2 (en) Partitioning images obtained from an autonomous vehicle camera
Nashashibi et al. Vehicle recognition and tracking using a generic multisensor and multialgorithm fusion approach
Huang et al. Rear obstacle warning for reverse driving using stereo vision techniques
Gruyer et al. Target-to-track collaborative association combining a laser scanner and a camera
Neto et al. Real-time collision risk estimation based on Pearson's correlation coefficient
Liu et al. The robust semantic slam system for texture-less underground parking lot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination