CN116503299A - Method, system and storage medium for fusing targets among vehicle-mounted multiple cameras - Google Patents
Method, system and storage medium for fusing targets among vehicle-mounted multiple cameras
- Publication number
- CN116503299A (application number CN202310483852.XA)
- Authority
- CN
- China
- Prior art keywords
- camera
- target
- target frame
- frame
- vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G06T5/80—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method, a system and a storage medium for fusing targets among vehicle-mounted multiple cameras, wherein the method comprises the following steps. Step 1: acquire the images shot by the vehicle-mounted cameras at the same moment, the internal reference matrix of each camera, the external reference matrix between adjacent cameras and the baseline distance between adjacent cameras, and perform target detection with a neural network to obtain all targets on each image. Step 2: determine the non-overlapping target frames. Step 3: uniformly sample the region inside each non-overlapping target frame to obtain a sampling point set, and reproject the sampling point set onto the adjacent camera image according to an epipolar constraint error minimization criterion to obtain a reprojected rectangular frame. Step 4: logically judge the current camera's original target frame, the reprojected rectangular frame and the adjacent camera's target frame, and fuse targets according to the judgment result. Step 5: traverse all cameras and repeat steps 2-4 to process the target frames under all cameras. The method can efficiently and accurately judge the uniqueness of targets among multiple cameras, and solves the problem of multi-camera fusion for occluded targets.
Description
Technical Field
The invention relates to the technical field of target fusion, in particular to a method, a system and a storage medium for fusing targets among vehicle-mounted multiple cameras.
Background
Vision, as a core sensing means of intelligent driving, can intuitively perceive dynamic and static targets during vehicle driving and ensure the safe operation of the vehicle. Because the field of view of a single camera is limited, multiple cameras generally need to be arranged to perceive the full 360-degree surroundings of the vehicle, and the images of adjacent cameras usually have overlapping areas. On the premise of ensuring full 360-degree perception, judging the uniqueness of a target in the overlapping area efficiently and accurately is also important for the planning and control of vehicle driving.
In the prior art, Chinese patent publication No. CN113077511B discloses a multi-camera target matching and tracking method and device for an automobile. That method projects image targets onto the vehicle coordinate system using a projection matrix between the camera coordinate system and the vehicle coordinate system, and matches targets across multiple video streams by calculating differences in three-dimensional position coordinates together with appearance similarity. As an industry consensus, a monocular image lacks depth information, so full-image depth cannot be recovered through a projection matrix to compute three-dimensional coordinates (x, y, z) in a world coordinate system; however, if the world coordinate z = 0 of a pixel is known (such a pixel is generally called a grounding point), its x and y values in the world coordinate system can be calculated through the projection matrix. That method treats the vehicle coordinate system as a world coordinate system lying on flat ground; its coordinate calculation essentially assumes that the bottom edge of the target frame lies on the ground plane and that the ground is horizontal, so that the x and y values of the grounding point at the bottom edge of the target frame can be obtained in the vehicle coordinate system. However, during actual driving the road surface is usually not a standard horizontal plane, and the bottom edge of a target frame is not necessarily located on the driving surface: for example, a traffic sign sits above the road, and a sidewalk is higher than the driving surface. Coordinates calculated in this way therefore have large errors.
To improve accuracy, that method further uses image similarity as a basis for judgment; however, when target frames occlude each other, the image-similarity approach is computationally expensive and still cannot accurately distinguish the occluded targets.
Therefore, that method has limitations in use. What vehicle driving requires is an efficient and accurate multi-camera target fusion method that adapts to different roads, does not restrict the position of the target in the image, and can handle targets that are partially occluded.
Disclosure of Invention
The invention aims to solve the problems in the prior art, and provides a method, a system and a storage medium for target fusion among vehicle-mounted multiple cameras, which can efficiently and accurately judge the uniqueness of targets among multiple cameras without restricting the driving conditions of the vehicle, and solve the problem of multi-camera fusion of occluded targets.
To solve the above technical problems, the technical scheme of the invention is as follows:
the invention provides a target fusion method among vehicle-mounted multiple cameras, which comprises the following steps:
step 1: acquiring images shot by the vehicle-mounted multi-camera at the same time, an internal reference matrix of each camera, an external reference matrix of an adjacent camera and a base distance between the adjacent cameras, and performing target detection by using a neural network to acquire all targets on each image;
step 2: selecting each camera to determine a non-overlapping target frame;
step 3: uniformly sampling the area in the non-overlapping target frame to obtain a sampling point set, and re-projecting the sampling point set onto an adjacent camera image according to an epipolar constraint error minimization criterion to obtain a re-projected rectangular frame;
step 4: performing logic judgment on an original target frame, a reprojected rectangular frame and an adjacent camera target frame of the current camera, and performing target fusion processing according to judgment results;
step 5: and traversing all cameras, and repeating the steps 2-4 to process the target frames under all cameras.
Further, the neural network includes: a convolutional neural network, a Transformer network.
Further, each camera is selected to determine a non-overlapping target frame, and the specific steps are as follows:
judging whether overlapping exists among original target frames under the currently selected camera, and if so, acquiring a non-overlapping area of each target frame as a non-overlapping target frame; if there is no overlap, the target frame itself is taken as a non-overlapping target frame.
Further, the sampling point set is reprojected onto an adjacent camera image according to an epipolar constraint error minimization criterion to obtain a reprojected rectangular frame, which comprises the following specific steps:
(a) Carrying out distortion correction on the sampling points to obtain undistorted coordinates Po;
(b) Using the internal reference matrix of each camera and the external reference matrix of the adjacent camera, obtaining the fundamental matrix F between the two cameras, and from F and the undistorted sampling point coordinate Po, calculating the epipolar line equation corresponding to the sampling point Po on the adjacent camera image:
le=FPo
(c) According to the epipolar line equation le, setting a distance threshold Td, and according to the point-to-line distance formula, obtaining all pixel points Pt_i (i = 0, 1, ..., n) on the adjacent image whose distance to the epipolar line is smaller than Td;
(d) Based on the sampling point Po and each pixel point Pt_i, carrying out three-dimensional reconstruction for Pt_i to obtain the three-dimensional coordinates Wt(Xt, Yt, Zt) of the corresponding point in the adjacent camera coordinate system;
(e) According to the external parameter matrices R and T between the current camera and the adjacent camera, converting the three-dimensional coordinates in the adjacent camera coordinate system into the three-dimensional coordinates Wo in the current camera coordinate system;
(f) According to the internal reference matrix of the current camera, projecting the three-dimensional coordinates Wo back onto the current camera image, and recording the reprojected points as Pr_i (i = 0, 1, ..., n);
(g) Calculating the distance between the sampling point Po and each reprojected point Pr_i; the reprojected point with the minimum distance determines the optimal reprojection point Pt;
(h) Determining, from the optimal reprojection points of all the sampling points, their circumscribed (bounding) rectangle as the reprojected rectangular frame of the current camera's non-overlapping target frame on the adjacent camera image.
Further, the expression of three-dimensional reconstruction is:
Zt = B*f/d
Xt = Zt*Pt_i_x/f
Yt = Zt*Pt_i_y/f
where B is the baseline distance between the current camera and the adjacent camera, f is the camera focal length, d is the disparity between the points Po and Pt_i, Pt_i_x is the pixel x coordinate of the point Pt_i, and Pt_i_y is its pixel y coordinate.
Further, the three-dimensional coordinate conversion formula is:
Wo=RWt+T
where Wo represents the three-dimensional coordinates in the current camera coordinate system and Wt the three-dimensional coordinates in the adjacent camera coordinate system.
Further, in step 4, the original target frame of the current camera, the reprojected rectangular frame and the target frame of the adjacent camera are logically judged, and target fusion processing is performed according to the judgment result, specifically:
calculating the intersection-over-union (IoU) of the reprojected rectangular frame and the adjacent camera target frame, and setting the IoU threshold as Tiou;
if the IoU of the adjacent camera target frame with the reprojected rectangular frame is larger than the threshold Tiou, the adjacent camera target frame and the current camera original target frame corresponding to the reprojected frame are judged to be the same target, and target fusion is carried out;
if the IoU of the adjacent camera target frame with the reprojected rectangular frame is smaller than or equal to the threshold Tiou, the adjacent camera target frame and the current camera original target frame corresponding to the reprojected rectangular frame are not the same target, and target fusion is not carried out.
A second aspect of the present invention provides a system for target fusion among vehicle-mounted multiple cameras, the system comprising a memory and a processor, wherein the memory stores a vehicle-mounted multi-camera target fusion program which, when executed by the processor, implements the following steps:
step 1: acquiring images shot by the vehicle-mounted multi-camera at the same time, an internal reference matrix of each camera, an external reference matrix of an adjacent camera and a base distance between the adjacent cameras, and performing target detection by using a neural network to acquire all targets on each image;
step 2: selecting each camera to determine a non-overlapping target frame;
step 3: uniformly sampling the region in the non-overlapping target frame to obtain a sampling point set, and re-projecting the sampling point set onto an adjacent camera image according to an epipolar constraint error minimization criterion to obtain a re-projected rectangular frame;
step 4: performing logic judgment on an original target frame, a reprojected rectangular frame and an adjacent camera target frame of the current camera, and performing target fusion processing according to judgment results;
step 5: and traversing all cameras, and repeating the steps 2-4 to process the target frames under all cameras.
Further, each camera is selected to determine a non-overlapping target frame, and the specific steps are as follows:
judging whether overlapping exists among original target frames under the currently selected camera, and if so, acquiring a non-overlapping area of each target frame as a non-overlapping target frame; if there is no overlap, the target frame itself is taken as a non-overlapping target frame.
A third aspect of the present invention provides a storage medium, the storage medium storing a vehicle-mounted multi-camera target fusion program which, when executed by a processor, implements the steps of the vehicle-mounted multi-camera target fusion method described above.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention obtains the target frame of the reprojection rectangular frame by utilizing the minimum rule according to the epipolar constraint error for the sampling point set, and can efficiently and accurately judge the uniqueness of the target among multiple cameras without limiting the running condition of the vehicle; and meanwhile, the problem of fusion among multiple cameras of the shielding target is solved by uniformly sampling the area in the non-overlapping target frame.
Drawings
Fig. 1 is a flowchart of a method for fusing targets between multiple cameras on a vehicle according to an embodiment of the invention.
FIG. 2 is a flow chart of determining non-overlapping target boxes according to an embodiment of the invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Example 1
As shown in fig. 1, the first aspect of the present invention provides a method for fusing targets between multiple cameras on a vehicle, comprising the following steps:
step 1: acquiring images shot by the vehicle-mounted multi-camera at the same time, an internal reference matrix of each camera, an external reference matrix of an adjacent camera and a base distance between the adjacent cameras, and performing target detection by using a neural network to acquire all targets on each image;
in the present invention, the reference matrix: the method is used for describing the relation between the three-dimensional world coordinates and the two-dimensional pixel coordinate system, wherein the three-dimensional coordinates can obtain a determined image pixel coordinate through an internal reference matrix, and the pixel coordinate can obtain a plurality of three-dimensional world coordinates of non-unique solutions through the internal reference matrix, which are parameters of each camera; external parameter matrix: three-dimensional coordinate points are converted from one coordinate system to another three-dimensional coordinate system, and pose transformation between different cameras is described. The internal reference matrix and the external reference matrix can be obtained through a camera calibration method, the method is more and universal, and the invention is not limited.
The invention uses a neural network to perform target detection; each target detected in an image corresponds to, and is enclosed by, a target frame. The invention is not limited to a specific neural network: a convolutional one-stage detection network such as the YOLO series may be used, as may a two-stage detection network such as Faster R-CNN and its optimized variants, or a Transformer network such as DETR and its variants.
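Purely as an illustrative sketch (the patent does not prescribe any programming interface), the detection stage of step 1 can be organized as below. The function `detect` is a hypothetical stand-in for the neural-network detector, and the (x1, y1, x2, y2) pixel-box convention is an assumption for illustration:

```python
def detect(image):
    """Hypothetical stand-in for a neural-network detector (e.g. a
    YOLO-style one-stage network or DETR); returns target frames as
    (x1, y1, x2, y2) pixel boxes. Two boxes are fabricated here."""
    return [(100, 120, 180, 260), (150, 130, 230, 270)]

def detect_all_cameras(images_by_camera):
    """Step 1: run detection on the synchronized frame of every camera,
    keeping the per-camera target frames for steps 2-5."""
    return {cam: detect(img) for cam, img in images_by_camera.items()}

# Two adjacent cameras captured at the same moment (image data elided).
targets = detect_all_cameras({"front": None, "front_left": None})
```

In a real system `detect` would wrap the chosen network's inference call; only the per-camera box lists matter for the later steps.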
Step 2: selecting each camera to determine a non-overlapping target frame;
fig. 2 shows a flow of determining non-overlapping target boxes.
It should be noted that, selecting each camera to determine a non-overlapping target frame includes the following specific steps:
judging whether overlapping exists among original target frames under the currently selected camera, and if so, acquiring a non-overlapping area of each target frame as a non-overlapping target frame;
if there is no overlap, the target frame itself is taken as a non-overlapping target frame.
It should be noted that, by determining the non-overlapping target frame and further determining the non-overlapping region, sampling is facilitated for the non-overlapping region to obtain the sampling point set.
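A minimal sketch of this step 2 logic follows. The heuristic of cutting away the smaller overlap slice along x or y is an assumption for illustration (the patent only requires obtaining a non-overlapping region), as is the (x1, y1, x2, y2) box convention:

```python
def intersect(a, b):
    """Intersection rectangle of two (x1, y1, x2, y2) boxes, or None."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def non_overlapping_frame(box, others):
    """Shrink `box` away from every overlapping box; if nothing
    overlaps, the target frame itself is returned unchanged."""
    x1, y1, x2, y2 = box
    for other in others:
        inter = intersect((x1, y1, x2, y2), other)
        if inter is None:
            continue
        ix1, iy1, ix2, iy2 = inter
        if ix2 - ix1 <= iy2 - iy1:      # cheaper to cut along x
            if other[0] > x1:           # overlap on the right edge
                x2 = min(x2, other[0])
            else:                       # overlap on the left edge
                x1 = max(x1, other[2])
        else:                           # cheaper to cut along y
            if other[1] > y1:
                y2 = min(y2, other[1])
            else:
                y1 = max(y1, other[3])
    return (x1, y1, x2, y2)
```

For a box occluded on its right edge, this yields the visible left portion, which is then uniformly sampled in step 3.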
Step 3: uniformly sampling the area in the non-overlapping target frame to obtain a sampling point set, and re-projecting the sampling point set onto an adjacent camera image according to an epipolar constraint error minimization criterion to obtain a re-projected rectangular frame;
it should be noted that, the polar constraint error minimization criterion can accurately establish a re-projection rectangular frame of the target frame under one camera image coordinate system under another camera image coordinate system.
The sampling point set is re-projected onto an adjacent camera image according to an epipolar constraint error minimization criterion to obtain a re-projected rectangular frame, and the specific steps are as follows:
(a) Carrying out distortion correction on the sampling points to obtain undistorted coordinates Po;
(b) Using the internal reference matrix of each camera and the external reference matrix of the adjacent camera, obtaining the fundamental matrix F between the two cameras, and from F and the undistorted sampling point coordinate Po, calculating the epipolar line equation corresponding to the sampling point Po on the adjacent camera image:
le=FPo
(c) According to the epipolar line equation le, setting a distance threshold Td, and according to the point-to-line distance formula, obtaining all pixel points Pt_i (i = 0, 1, ..., n) on the adjacent image whose distance to the epipolar line is smaller than Td;
(d) Based on the sampling point Po and each pixel point Pt_i, carrying out three-dimensional reconstruction for Pt_i to obtain the three-dimensional coordinates Wt(Xt, Yt, Zt) of the corresponding point in the adjacent camera coordinate system;
the expression for three-dimensional reconstruction is:
Zt=B*f/d
Xt=Z*Pt i _x/f
Yt=Z*Pt i _y/f
wherein B is the base distance between the current camera and the adjacent camera, f is the focal length of the camera, d is Po and Pt i Parallax of dot, pt i X is the point Pt i Pixel x coordinate, pt i Y is the point Pt i Is defined as the pixel y coordinate of (c).
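The three formulas above can be checked with a short sketch. It assumes a rectified camera pair (disparity purely along x) and pixel coordinates already expressed relative to the principal point; `triangulate` is an illustrative name, not from the patent:

```python
def triangulate(B, f, po, pt):
    """Depth from disparity: Zt = B*f/d, Xt = Zt*x/f, Yt = Zt*y/f.

    B: baseline distance, f: focal length in pixels, po/pt: matched
    pixel coordinates (relative to the principal point) in the current
    and adjacent image; d is the disparity between them."""
    d = po[0] - pt[0]
    if d == 0:
        raise ValueError("zero disparity: point at infinity")
    Zt = B * f / d
    Xt = Zt * pt[0] / f
    Yt = Zt * pt[1] / f
    return (Xt, Yt, Zt)
```

For example, with B = 0.5 m, f = 1000 px and a 20 px disparity, the point reconstructs at a depth of 25 m.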
(e) According to the external parameter matrices R and T between the current camera and the adjacent camera, converting the three-dimensional coordinates in the adjacent camera coordinate system into the three-dimensional coordinates Wo in the current camera coordinate system;
the three-dimensional coordinate conversion formula is:
Wo=RWt+T
where Wo represents the three-dimensional coordinates in the current camera coordinate system and Wt the three-dimensional coordinates in the adjacent camera coordinate system.
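The conversion Wo = RWt + T is a single affine map. As a minimal sketch (assuming R is a 3×3 rotation matrix and T a 3-vector; NumPy is an implementation choice, not a requirement of the patent):

```python
import numpy as np

def to_current_frame(R, T, Wt):
    """Wo = R @ Wt + T: map a 3-D point from the adjacent camera's
    coordinate system into the current camera's coordinate system."""
    return np.asarray(R, float) @ np.asarray(Wt, float) + np.asarray(T, float)
```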
(f) According to the internal reference matrix of the current camera, projecting the three-dimensional coordinates Wo back onto the current camera image, and recording the reprojected points as Pr_i (i = 0, 1, ..., n);
(g) Calculating the distance between the sampling point Po and each reprojected point Pr_i; the reprojected point with the minimum distance determines the optimal reprojection point Pt;
(h) Determining, from the optimal reprojection points of all the sampling points, their circumscribed (bounding) rectangle as the reprojected rectangular frame of the current camera's non-overlapping target frame on the adjacent camera image.
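The per-sampling-point search of steps (b), (c) and (g) can be sketched as follows. This is a minimal illustration assuming the fundamental matrix F is already known from calibration; `reproject` is a hypothetical callable wrapping steps (d)-(f), i.e. triangulation, frame conversion and projection back onto the current image:

```python
import numpy as np

def epipolar_line(F, po):
    """Step (b): le = F @ Po (homogeneous) gives the line a*x+b*y+c = 0
    on the adjacent image that the match of Po must lie on."""
    a, b, c = np.asarray(F, float) @ np.array([po[0], po[1], 1.0])
    return a, b, c

def line_distance(line, pt):
    """Step (c): point-to-line distance, compared against threshold Td."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) / np.hypot(a, b)

def best_match(F, po, candidates, Td, reproject):
    """Steps (c) and (g): keep candidate pixels within Td of the
    epipolar line, then pick the one whose reprojection Pr_i lands
    closest to the original sampling point Po."""
    line = epipolar_line(F, po)
    near = [p for p in candidates if line_distance(line, p) < Td]
    if not near:
        return None
    return min(near, key=lambda p: np.hypot(reproject(p)[0] - po[0],
                                            reproject(p)[1] - po[1]))
```

Running `best_match` over every sampling point and taking the bounding rectangle of the winners yields the reprojected rectangular frame of step (h).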
Step 4: performing logic judgment on an original target frame, a reprojected rectangular frame and an adjacent camera target frame of the current camera, and performing target fusion processing according to judgment results;
it should be noted that, by performing logic judgment on the original target frame, the reprojection rectangular frame and the adjacent camera target frame of the current camera, the target fusion processing is performed according to the judgment result, and the specific process is as follows:
calculating the intersection-over-union (IoU) of the reprojected rectangular frame and the adjacent camera target frame, and setting the IoU threshold as Tiou;
if the IoU of the adjacent camera target frame with the reprojected rectangular frame is larger than the threshold Tiou, the adjacent camera target frame and the current camera original target frame corresponding to the reprojected frame are judged to be the same target, and target fusion is carried out;
if the IoU of the adjacent camera target frame with the reprojected rectangular frame is smaller than or equal to the threshold Tiou, the adjacent camera target frame and the current camera original target frame corresponding to the reprojected rectangular frame are not the same target, and target fusion is not carried out.
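The IoU judgment above can be sketched as a minimal illustration; the (x1, y1, x2, y2) box convention and the function names are assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def same_targets(reproj_box, neighbor_boxes, Tiou):
    """Step 4: neighbor target frames whose IoU with the reprojected
    frame exceeds Tiou are judged to be the same physical target."""
    return [b for b in neighbor_boxes if iou(reproj_box, b) > Tiou]
```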
Step 5: and traversing all cameras, and repeating the steps 2-4 to process the target frames under all cameras.
A second aspect of the present invention provides a system for target fusion among vehicle-mounted multiple cameras, the system comprising a memory and a processor, wherein the memory stores a vehicle-mounted multi-camera target fusion program which, when executed by the processor, implements the following steps:
step 1: acquiring images shot by the vehicle-mounted multi-camera at the same time, an internal reference matrix of each camera, an external reference matrix of an adjacent camera and a base distance between the adjacent cameras, and performing target detection by using a neural network to acquire all targets on each image;
step 2: selecting each camera to determine a non-overlapping target frame;
step 3: uniformly sampling the area in the non-overlapping target frame to obtain a sampling point set, and re-projecting the sampling point set onto an adjacent camera image according to an epipolar constraint error minimization criterion to obtain a re-projected rectangular frame;
step 4: performing logic judgment on an original target frame, a reprojected rectangular frame and an adjacent camera target frame of the current camera, and performing target fusion processing according to judgment results;
step 5: and traversing all cameras, and repeating the steps 2-4 to process the target frames under all cameras.
Further, each camera is selected to determine a non-overlapping target frame, and the specific steps are as follows:
judging whether overlapping exists among original target frames under the currently selected camera, and if so, acquiring a non-overlapping area of each target frame as a non-overlapping target frame; if there is no overlap, the target frame itself is taken as a non-overlapping target frame.
A third aspect of the present invention provides a storage medium, the storage medium storing a vehicle-mounted multi-camera target fusion program which, when executed by a processor, implements the steps of the vehicle-mounted multi-camera target fusion method described above.
It is to be understood that the above examples of the present invention are provided by way of illustration only and do not limit the embodiments of the present invention. Other variations or modifications based on the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention is intended to be covered by the protection scope of the claims.
Claims (10)
1. A method for fusing targets among vehicle-mounted multiple cameras, characterized by comprising the following steps:
step 1: acquiring images shot by the vehicle-mounted multi-camera at the same time, an internal reference matrix of each camera, an external reference matrix of an adjacent camera and a base distance between the adjacent cameras, and performing target detection by using a neural network to acquire all targets on each image;
step 2: selecting each camera to determine a non-overlapping target frame;
step 3: uniformly sampling the region in the non-overlapping target frame to obtain a sampling point set, and re-projecting the sampling point set onto an adjacent camera image according to an epipolar constraint error minimization criterion to obtain a re-projected rectangular frame;
step 4: performing logic judgment on an original target frame, a reprojected rectangular frame and an adjacent camera target frame of the current camera, and performing target fusion processing according to judgment results;
step 5: and traversing all cameras, and repeating the steps 2-4 to process the target frames under all cameras.
2. The method for fusing targets among vehicle-mounted multiple cameras of claim 1, wherein the neural network comprises: a convolutional neural network, a Transformer network.
3. The method for merging targets among multiple cameras in a vehicle according to claim 1, wherein selecting each camera to determine a non-overlapping target frame comprises the following specific steps:
judging whether overlapping exists among original target frames under the currently selected camera, and if so, acquiring a non-overlapping area of each target frame as a non-overlapping target frame; if there is no overlap, the target frame itself is taken as a non-overlapping target frame.
4. The method for fusing targets among vehicle-mounted multiple cameras according to claim 1, wherein the sampling point set is reprojected onto the adjacent camera image according to an epipolar constraint error minimization criterion to obtain a reprojected rectangular frame, and the specific steps are as follows:
(a) Carrying out distortion correction on the sampling points to obtain undistorted coordinates Po;
(b) Acquiring the fundamental matrix F between the current camera and the adjacent camera by using the internal reference matrix of each camera and the external reference matrix of the adjacent camera, and, based on F and the undistorted sampling-point coordinate Po, calculating the epipolar line equation corresponding to the sampling point Po on the adjacent camera image:
le = F·Po
(c) According to the epipolar line equation le, setting a distance threshold Td and, using the point-to-line distance formula, obtaining all pixel points Pt_i (i = 0, 1, ..., n) on the adjacent camera image whose distance to the epipolar line is smaller than Td;
(d) Based on each pair of corresponding points (Po, Pt_i), performing three-dimensional reconstruction of the pixel point Pt_i to obtain its three-dimensional coordinates Wt = (Xt, Yt, Zt) in the adjacent camera coordinate system;
(e) According to the external reference matrices R and T between the current camera and the adjacent camera, converting the three-dimensional coordinate Wt in the adjacent camera coordinate system into the three-dimensional coordinate Wo in the current camera coordinate system;
(f) According to the internal reference matrix of the current camera, projecting the three-dimensional coordinates Wo back onto the current camera image to obtain reprojected image coordinates, recording the converted points as Pr_i (i = 0, 1, ..., n);
(g) Calculating the distance between the sampling point Po and each converted point Pr_i, and selecting the candidate with the minimum reprojection distance to determine the optimal reprojection point Pt;
(h) According to the optimal reprojection points of all the sampling points, determining their maximum circumscribed rectangle as the reprojected rectangular frame of the current camera's non-overlapping target frame on the adjacent camera image.
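The epipolar-line search of steps (b)–(c) can be sketched as follows. This is a hedged illustration, not the claimed implementation: it assumes an already-computed fundamental matrix F and an already-undistorted sampling point, and simply collects pixels whose point-to-line distance to le = F·Po is below Td.

```python
import numpy as np

def epipolar_candidates(F, po, pixels, td):
    """Pixels on the adjacent image close to the epipolar line le = F @ Po.

    F: 3x3 fundamental matrix between the current and adjacent cameras.
    po: undistorted (x, y) sampling-point coordinate in the current image.
    pixels: (n, 2) array of candidate pixel coordinates in the adjacent image.
    td: distance threshold Td in pixels.
    """
    # epipolar line coefficients (a, b, c): a*x + b*y + c = 0
    a, b, c = F @ np.array([po[0], po[1], 1.0])
    # point-to-line distance |a*x + b*y + c| / sqrt(a^2 + b^2)
    dist = np.abs(a * pixels[:, 0] + b * pixels[:, 1] + c) / np.hypot(a, b)
    return pixels[dist < td]
```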
5. The method for fusing targets among multiple cameras on a vehicle according to claim 4, wherein the expression of three-dimensional reconstruction is:
Zt = B*f/d
Xt = Zt*Pt_i_x/f
Yt = Zt*Pt_i_y/f
wherein B is the base distance between the current camera and the adjacent camera, f is the focal length of the camera, d is the disparity between the points Po and Pt_i, Pt_i_x is the pixel x coordinate of the point Pt_i, and Pt_i_y is the pixel y coordinate of the point Pt_i.
6. The method for fusing targets among multiple cameras in a vehicle according to claim 4, wherein the three-dimensional coordinate transformation formula is:
Wo=RWt+T
wherein Wo represents the three-dimensional coordinates in the current camera coordinate system, Wt the three-dimensional coordinates in the adjacent camera coordinate system, and R and T the external reference matrices between the two cameras.
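The claim-6 transformation Wo = R·Wt + T is a single rigid-body change of frame. A minimal sketch:

```python
import numpy as np

def to_current_frame(R, T, Wt):
    """Map a 3D point from the adjacent camera coordinate system into the
    current camera coordinate system via Wo = R @ Wt + T (claim 6).

    R: 3x3 rotation matrix; T: length-3 translation vector; Wt: 3D point.
    """
    return R @ np.asarray(Wt, dtype=float) + np.asarray(T, dtype=float)
```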
7. The method of claim 1, wherein in step 4 the logical judgment is performed on the original target frame of the current camera, the reprojected rectangular frame, and the adjacent camera target frame, and the target fusion process is performed according to the judgment result, specifically:
calculating the intersection ratio of the re-projection rectangular frame and the adjacent camera target frame, and setting the threshold value of the intersection ratio as Tiou;
if the intersection ratio of the adjacent camera target frame and the reprojected rectangular frame is larger than the threshold Tiou, judging that the adjacent camera target frame and the current camera original target frame corresponding to the reprojected frame are the same target, and carrying out target fusion;
if the intersection ratio of the adjacent camera target frame and the reprojection rectangular frame is smaller than or equal to the threshold Tiou, the current camera original target frame corresponding to the adjacent camera target frame and the reprojection rectangular frame is not the same target, and target fusion is not carried out.
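The intersection-ratio test of claim 7 can be sketched as follows. The default threshold value 0.5 is an illustrative assumption; the patent only names the threshold Tiou without fixing its value.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def same_target(reproj_box, neighbor_box, tiou=0.5):
    """Claim-7 decision: fuse only when the intersection ratio of the
    reprojected rectangular frame and the adjacent camera target frame
    exceeds Tiou (tiou=0.5 is an assumed, not claimed, value)."""
    return iou(reproj_box, neighbor_box) > tiou
```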
8. A vehicle-mounted multi-camera inter-target fusion system, the system comprising a memory and a processor, the memory containing a vehicle-mounted multi-camera inter-target fusion method program which, when executed by the processor, implements the steps of the vehicle-mounted multi-camera inter-target fusion method according to any one of claims 1-6:
step 1: acquiring images shot by the vehicle-mounted multi-camera at the same time, an internal reference matrix of each camera, an external reference matrix of an adjacent camera and a base distance between the adjacent cameras, and performing target detection by using a neural network to acquire all targets on each image;
step 2: selecting each camera to determine a non-overlapping target frame;
step 3: uniformly sampling the region in the non-overlapping target frame to obtain a sampling point set, and re-projecting the sampling point set onto an adjacent camera image according to an epipolar constraint error minimization criterion to obtain a re-projected rectangular frame;
step 4: performing logic judgment on an original target frame, a reprojected rectangular frame and an adjacent camera target frame of the current camera, and performing target fusion processing according to judgment results;
step 5: and traversing all cameras, and repeating the steps 2-4 to process the target frames under all cameras.
9. The system for merging targets among multiple cameras in a vehicle according to claim 7, wherein each camera is selected to determine a non-overlapping target frame, comprising the following steps:
judging whether overlapping exists among original target frames under the currently selected camera, and if so, acquiring a non-overlapping area of each target frame as a non-overlapping target frame; if there is no overlap, the target frame itself is taken as a non-overlapping target frame.
10. A storage medium, wherein the storage medium includes a vehicle-mounted multi-camera inter-target fusion method program, and the vehicle-mounted multi-camera inter-target fusion method program, when executed by a processor, implements the steps of a vehicle-mounted multi-camera inter-target fusion method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310483852.XA CN116503299A (en) | 2023-04-28 | 2023-04-28 | Method, system and storage medium for fusing targets among vehicle-mounted multiple cameras |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116503299A true CN116503299A (en) | 2023-07-28 |
Family
ID=87319868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310483852.XA Pending CN116503299A (en) | 2023-04-28 | 2023-04-28 | Method, system and storage medium for fusing targets among vehicle-mounted multiple cameras |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116503299A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111368706B (en) | Data fusion dynamic vehicle detection method based on millimeter wave radar and machine vision | |
JP3868876B2 (en) | Obstacle detection apparatus and method | |
KR102516326B1 (en) | Camera extrinsic parameters estimation from image lines | |
US11393126B2 (en) | Method and apparatus for calibrating the extrinsic parameter of an image sensor | |
CN109903341A (en) | Join dynamic self-calibration method outside a kind of vehicle-mounted vidicon | |
CN106127787B (en) | A kind of camera calibration method based on Inverse projection | |
CN103177439A (en) | Automatically calibration method based on black and white grid corner matching | |
CN111815710B (en) | Automatic calibration method for fish-eye camera | |
CN110827361B (en) | Camera group calibration method and device based on global calibration frame | |
CN111443704B (en) | Obstacle positioning method and device for automatic driving system | |
CN114283391A (en) | Automatic parking sensing method fusing panoramic image and laser radar | |
JP4344860B2 (en) | Road plan area and obstacle detection method using stereo image | |
CN112950696A (en) | Navigation map generation method and generation device and electronic equipment | |
CN111382591B (en) | Binocular camera ranging correction method and vehicle-mounted equipment | |
CN111860270B (en) | Obstacle detection method and device based on fisheye camera | |
CN113869422A (en) | Multi-camera target matching method, system, electronic device and readable storage medium | |
CN111046809B (en) | Obstacle detection method, device, equipment and computer readable storage medium | |
CN110738696B (en) | Driving blind area perspective video generation method and driving blind area view perspective system | |
CN116503299A (en) | Method, system and storage medium for fusing targets among vehicle-mounted multiple cameras | |
Geiger | Monocular road mosaicing for urban environments | |
Zhao et al. | Extrinsic calibration of a small fov lidar and a camera | |
CN115797405A (en) | Multi-lens self-adaptive tracking method based on vehicle wheel base | |
CN112884845B (en) | Indoor robot obstacle positioning method based on single camera | |
CN114926332A (en) | Unmanned aerial vehicle panoramic image splicing method based on unmanned aerial vehicle mother vehicle | |
Son et al. | Detection of nearby obstacles with monocular vision for earthmoving operations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||