CN112364793A - Target detection and fusion method based on long-focus and short-focus multi-camera vehicle environment - Google Patents
- Publication number: CN112364793A
- Application number: CN202011288888.5A
- Authority: CN (China)
- Prior art keywords: focus, camera, short, long, target
- Prior art date: 2020-11-17
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06V20/56 — Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06F18/25 — Pattern recognition; Fusion techniques
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06T2207/10004 — Still image; Photographic image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30252 — Vehicle exterior; Vicinity of vehicle
Abstract
The invention claims a target detection and fusion method for a vehicle environment based on long-focus and short-focus multi-camera vision. The method comprises the following steps: 1. Perform target detection with a convolutional neural network on the images acquired by the long-focus and short-focus binocular cameras to obtain the positions of the target frames in the images captured by the cameras of different focal lengths at the same moment. 2. According to the camera imaging principle and the intrinsic and extrinsic parameters K, R, T obtained by camera calibration, derive the mapping relationship f of a spatial target point P between the long-focus and short-focus camera pixel coordinate systems. 3. Using the mapping relationship f, obtain from each target frame position in the long-focus image the position of the corresponding target frame in the short-focus image, and fuse it with the targets detected in the original short-focus image, thereby accomplishing target detection at different distances. The invention overcomes the limitation that a single-focal-length camera cannot adapt to target detection at different distances, and improves target detection accuracy in the vehicle environment. The method is also simple to use, low-cost, and runs in real time.
Description
Technical Field
The invention belongs to the technical field of intelligent vehicle environment perception, and particularly relates to a target detection and fusion method for a vehicle environment using long-focus and short-focus multi-camera vision.
Background
In recent years, with the rapid development of fields such as artificial intelligence and machine vision, autonomous driving has become an important area of academic and industrial research. Environment perception is one of the key technologies in an autonomous driving system and its most fundamental module: it acts as the eyes of the vehicle, informing it about the surrounding environment. Target detection, localization, and motion state estimation are the most basic functions of the environment perception module.
With the wide application of deep learning and the great increase in the computing power of computing devices, deep-learning-based perception has become an important foundation of the environment perception module. Vision-based environment perception mainly provides pedestrian detection, obstacle detection, lane line detection, drivable area detection, traffic sign recognition, and similar functions, and can localize targets when combined with stereo vision. At present, researchers at home and abroad have focused mainly on improving the target detection performance of a single-focal-length camera. In a complex working environment, however, the information acquired by a single-focal-length camera is limited, and such a camera alone cannot correctly detect targets at different distances, so missed detections are common. Cameras with different focal lengths can compensate for each other's shortcomings and combine their advantages to detect targets in the vehicle environment accurately. For example, a short-focus camera has a wide field of view; distant targets appear small in its image and are difficult for deep learning to detect, whereas nearby targets appear large and are easy to detect. A long-focus camera has a narrow field of view; distant targets appear large and are easy to detect, but nearby targets may fall outside its field of view and not be captured at all. Therefore, combining the respective advantages of short-focus and long-focus camera images makes it possible to detect targets at different distances, detect targets in the vehicle environment more accurately, and effectively avoid missed detections.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing a target detection and fusion method for a vehicle environment based on long-focus and short-focus multi-camera vision. The technical scheme of the invention is as follows:
a target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment comprises the following steps:
Step 1: install a long-focus camera and a short-focus camera to form a binocular system, and calibrate the binocular system. Input the image sequences acquired by the long-focus and short-focus cameras into a deep-learning convolutional neural network, and obtain through target detection the positions of the target frames in the wide field of view and the narrow field of view of the binocular pair at the same moment;
Step 2: using the camera imaging principle, obtain the mapping relationship f of a spatial target point P between the long-focus and short-focus camera pixel coordinate systems from the target positions in the narrow field of view of the long-focus camera and the intrinsic and extrinsic parameters obtained by binocular calibration, and thereby obtain, for a target position p₁ in the narrow field of view of the long-focus camera, the corresponding target position p₂ in the wide field of view of the short-focus camera;
Step 3: fuse the target frames in the long-focus and short-focus images by comparing the target positions detected in the wide field of view of the short-focus camera with the positions, obtained in step 2, at which targets in the narrow field of view of the long-focus camera appear in the wide field of view of the short-focus camera.
Further, step 1 specifically comprises the following steps:
Step 2-1: set different focal lengths for the long-focus and short-focus cameras, mount the binocular camera system at the same height on top of the vehicle, and keep a certain baseline distance between the two cameras;
Step 2-2: calibrate with Zhang Zhengyou's calibration method to obtain the intrinsic parameters K and extrinsic parameters R, T of the binocular system, where K is the intrinsic parameter matrix containing the focal length, optical center, and other information of the camera, and R and T are the rotation matrix and translation vector of the long-focus camera relative to the short-focus camera, respectively.
Step 2-3: deep-learning target detection. Use the lightweight convolutional neural network YOLOv3-Tiny to detect targets in the images collected by the long-focus and short-focus binocular cameras at the same moment, which specifically comprises: dataset preparation, transfer learning, and network inference with target detection, yielding the positions of the target frames under the cameras with different focal lengths.
Further, step 2-1 sets the camera focal lengths and installs the camera system: two cameras with different focal lengths are used, the short-focus camera is placed on the left and the long-focus camera on the right, and the baseline length between the two cameras is b, forming a long-and-short-focus binocular vision system that is mounted at the front of the vehicle roof.
Further, step 2-2 calibrates the long-focus and short-focus binocular camera: a checkerboard calibration board is placed in front of the binocular camera, and the checkerboard must appear simultaneously in the fields of view of both the long-focus and short-focus cameras; the corner points of the checkerboard calibration board are captured by the binocular camera, and Zhang Zhengyou's calibration method is used to compute the intrinsic parameters K₁, K₂ of the cameras and the extrinsic parameters R and T between the binocular cameras.
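This calibration step can be prototyped with OpenCV's implementation of Zhang's method. The sketch below is illustrative only, not the patent's code: it assumes checkerboard corner detections have already been collected for frames in which the board is visible in both cameras at once, and the function and variable names are hypothetical.

```python
import numpy as np
import cv2

def calibrate_long_short_pair(obj_points, corners_long, corners_short, image_size):
    """obj_points: per-view (N, 3) board-corner coordinates in the board frame;
    corners_long / corners_short: per-view (N, 1, 2) pixel corners from each camera."""
    # Zhang's method per camera: intrinsics K1 (long focus) and K2 (short focus).
    _, K1, d1, _, _ = cv2.calibrateCamera(obj_points, corners_long, image_size, None, None)
    _, K2, d2, _, _ = cv2.calibrateCamera(obj_points, corners_short, image_size, None, None)
    # Stereo calibration: R, T map points from the long-focus camera frame to the
    # short-focus camera frame, i.e. the extrinsics between the binocular pair.
    _, K1, d1, K2, d2, R, T, _, _ = cv2.stereoCalibrate(
        obj_points, corners_long, corners_short, K1, d1, K2, d2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return K1, d1, K2, d2, R, T
```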
Further, the specific process of dataset preparation in step 2-3 is to merge a self-collected and labeled Chongqing urban traffic dataset with the open-source Pascal VOC 2012 dataset, and then apply data augmentation to the merged dataset to obtain more training samples;
the specific process of transfer learning is to load the merged dataset and train the YOLOv3-Tiny network starting from an existing pre-trained model;
network inference and target detection mean that, during normal operation of the intelligent vehicle, the YOLOv3-Tiny network loads the trained model weights and performs forward inference to complete the target detection task.
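For reference, YOLOv3-Tiny forward inference of this kind can be run with OpenCV's dnn module. The following is a minimal sketch under the assumption that standard Darknet cfg/weights files are available; the file names, input resolution, and confidence threshold are illustrative choices rather than values from the patent, and non-maximum suppression is omitted for brevity.

```python
import numpy as np
import cv2

def detect_targets(image, cfg="yolov3-tiny.cfg", weights="yolov3-tiny.weights", conf_th=0.5):
    """Return detections as (class_id, score, x_center, y_center, width, height) in pixels."""
    net = cv2.dnn.readNetFromDarknet(cfg, weights)
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes = []
    for out in net.forward(net.getUnconnectedOutLayersNames()):
        for det in out:                # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            cls = int(np.argmax(scores))
            score = float(scores[cls])
            if score > conf_th:
                boxes.append((cls, score, det[0] * w, det[1] * h, det[2] * w, det[3] * h))
    return boxes
```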
Further, in step 2, a corresponding relationship between the long focus camera pixel coordinate system and the short focus camera pixel coordinate system is established by a camera imaging principle, and can be calculated by the following formula:
s₁p₁ = K₁P,  s₂p₂ = K₂(RP + T)
where P represents a point in real space; p₁ and p₂ represent the pixel points corresponding to P in the long-focus and short-focus camera pixel coordinate systems, respectively; K₁ and K₂ represent the intrinsic parameters of the long-focus and short-focus cameras, respectively; R, T denote the extrinsic parameters between the long-focus and short-focus binocular cameras; and s₁, s₂ represent the depth information of the point P in the long-focus and short-focus camera coordinate systems, respectively.
When using homogeneous coordinates, the above equation is written as follows:
p₁ = K₁P,  p₂ = K₂(RP + T)
From the above formulas, substituting P = K₁⁻¹p₁, the mapping relationship f between p₁ and p₂ is obtained:
p₂ = K₂RK₁⁻¹p₁ + K₂T
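As a quick numerical illustration of the mapping f, the sketch below applies the formula above to a single pixel. K1, K2, R, T are assumed to come from the calibration step, the final division by the third homogeneous component converts the result back to an ordinary pixel coordinate, and the names are illustrative rather than taken from the patent.

```python
import numpy as np

def map_long_to_short(p1_xy, K1, K2, R, T):
    """Map a pixel (x, y) from the long-focus image into the short-focus image
    using p2 = K2 * R * K1^-1 * p1 + K2 * T, as in the formula above."""
    p1 = np.array([p1_xy[0], p1_xy[1], 1.0])            # homogeneous pixel coordinate
    p2 = K2 @ R @ np.linalg.inv(K1) @ p1 + K2 @ np.asarray(T).reshape(3)
    return p2[:2] / p2[2]                                # back to an (x, y) pixel coordinate
```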
further, the step 3 specifically includes the following steps:
Step 3-1: for the i-th target frame Bₗ detected in the long-focus camera image, with position (xₗ, yₗ, wₗ, hₗ), obtain from the mapping relationship f the position (x′ₛ, y′ₛ, w′ₛ, h′ₛ) of the corresponding target frame B′ₛ in the short-focus camera image, where xₗ, yₗ, wₗ, hₗ denote the horizontal and vertical coordinates of the target center and the width and height of the target frame, and x′ₛ, y′ₛ, w′ₛ, h′ₛ denote the horizontal and vertical coordinates of the mapped target center and the width and height of the mapped target frame.
Step 3-2, calculating the mapped target frame B's(x′s,y′s,w′s,h′s) Target frame B detected from short-focus cameras(xs,ys,ws,hs) The cross-over ratio between IOU when IOU>When the threshold value t is reached, the long and short focal cameras detect the target frame; otherwise, at least one camera does not detect the target, and the calculation formula of the IOU is as follows:
Step 3-3: when IOU > threshold t, the target has been detected by both the long-focus and short-focus cameras. Considering the deviation of the actual mapping result, the scaling ratios Δw, Δh and the offsets Δx, Δy between B′ₛ and Bₛ are calculated as follows:

Δw = wₛ / w′ₛ

Δh = hₛ / h′ₛ

Δx = xₛ − x′ₛ

Δy = yₛ − y′ₛ
Step 3-4: when IOU < threshold t, the short-focus camera has not detected the target. In this case B′ₛ is restored to Bₛ, i.e. the position of Bₛ in the short-focus image is computed, using the restoration formulas:

wₛ = w′ₛ · Δw

hₛ = h′ₛ · Δh

xₛ = x′ₛ + Δx

yₛ = y′ₛ + Δy
Step 3-5: repeat steps 3-1 to 3-4 for all target frames, and complete target fusion according to the target positions and classes in the long-focus and short-focus cameras.
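A minimal sketch of the matching and restoration in steps 3-1 to 3-5 is given below. It assumes the long-focus detections have already been mapped into the short-focus image with the mapping f; the threshold value and the use of an average Δ over matched pairs are illustrative choices, not values specified by the patent.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x_center, y_center, width, height) form."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    x1 = max(ax - aw / 2, bx - bw / 2)
    y1 = max(ay - ah / 2, by - bh / 2)
    x2 = min(ax + aw / 2, bx + bw / 2)
    y2 = min(ay + ah / 2, by + bh / 2)
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def fuse(mapped_long_boxes, short_boxes, t=0.5):
    """mapped_long_boxes: long-focus detections B'_s mapped into the short-focus image;
    short_boxes: detections B_s from the short-focus image. Returns the fused box list."""
    fused = list(short_boxes)
    deltas, unmatched = [], []
    for bm in mapped_long_boxes:
        ious = [iou(bm, bs) for bs in short_boxes]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] > t:
            bs = short_boxes[j]               # step 3-3: matched pair -> record Δx, Δy, Δw, Δh
            deltas.append((bs[0] - bm[0], bs[1] - bm[1], bs[2] / bm[2], bs[3] / bm[3]))
        else:
            unmatched.append(bm)              # step 3-4: target missed by the short-focus camera
    dx, dy, dw, dh = np.mean(deltas, axis=0) if deltas else (0.0, 0.0, 1.0, 1.0)
    for x, y, w, h in unmatched:              # restore B_s from B'_s with the Δ estimates
        fused.append((x + dx, y + dy, w * dw, h * dh))
    return fused
```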
The invention has the following advantages and beneficial effects:
the invention provides a target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment. In the field of unmanned driving, the target detection technology based on monocular vision is widely applied. The methods often have the problems of good detection effect of near targets and poor detection effect of far targets. This is because a single camera with a fixed focal length does not adapt well to the detection of objects at different positions, for example, short focus cameras have a wide field of view and distant objects have a small image, and are therefore difficult to detect by depth learning. The long-focus camera has narrow visual field, and a far target is imaged clearly, so that the detection is facilitated for deep learning, but a near target may not be in the long-focus visual field and therefore cannot be detected.
Therefore, the invention detects and fuses targets in the vehicle environment with a long-focus and short-focus multi-camera approach, which is an effective way to solve these problems. Its advantages are reflected in the following aspects:
(1) The invention uses a short-focus camera and a long-focus camera as sensors for target detection and fusion. Compared with target detection based on a single-focal-length camera, it is more accurate and performs better in practical applications. The method combines the advantages of the short-focus and long-focus cameras, compensates for the shortcomings of a single-focal-length camera, and improves the accuracy of target detection in the vehicle environment.
(2) Compared with a monocular camera, the long-focus/short-focus binocular pair obtains richer visual information in the vehicle environment and better accomplishes the detection of targets at different distances.
(3) On the basis of a self-made Chongqing traffic dataset, the method uses the lightweight convolutional neural network YOLOv3-Tiny to detect targets in the images. Compared with the standard YOLOv3 algorithm, detection is faster, can run in real time on embedded edge devices, and still achieves good detection accuracy.
(4) The IOU is commonly used in deep-learning target detection to measure the quality of a predicted target frame. The method innovatively adopts the IOU as the criterion for target matching, which greatly improves matching accuracy while keeping the computational complexity low, making it faster than traditional methods.
Drawings
FIG. 1 is a simplified flowchart of a target detection and fusion method based on a long and short-focus multi-camera vehicle environment according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the invention aims to provide a target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment. Through the independent camera of two different focus of installation at intelligent roof portion (keep certain baseline distance between), utilize target detection and fusion technique based on degree of depth learning, overcome the limitation of target detection task under the different distances, the effectual emergence of avoiding the target condition of louing examining, propose this technical scheme, as shown in fig. 1, include following step:
Step 1: install the long-focus and short-focus binocular camera and calibrate the binocular system. Input the image sequences acquired by the long-focus and short-focus cameras into the deep-learning convolutional neural network, and obtain through target detection the positions of the target frames in the wide field of view and the narrow field of view of the binocular pair at the same moment. The specific steps are as follows:
the short-focus camera and the long-focus camera are arranged on the top of a vehicle, the focal length of the cameras is set, a camera system is arranged, two cameras with different focal lengths are adopted, the short-focus camera is arranged on the left side, the long-focus camera is arranged on the right side, the length of a base line between the two cameras is b, a long-short-focus binocular vision system is formed, and the binocular vision system is arranged in front of the top of the vehicle.
(2) Calibrate the long-focus and short-focus binocular cameras: a checkerboard calibration board is placed in front of the binocular cameras, and the checkerboard must appear simultaneously in the fields of view of both cameras; the corner points of the checkerboard calibration board are captured by the binocular cameras, and Zhang Zhengyou's calibration method is used to compute the intrinsic parameters K of the cameras and the extrinsic parameters R and T between them, where K is the intrinsic parameter matrix containing the focal length, optical center, and other information of the camera, and R and T are the rotation matrix and translation vector of the long-focus camera relative to the short-focus camera, respectively.
(3) Deep-learning target detection: the lightweight convolutional neural network YOLOv3-Tiny is used to detect targets in the images collected by the long-focus and short-focus binocular cameras at the same moment, yielding the positions of the target frames under the cameras with different focal lengths.
Step 2: using the camera imaging principle, obtain the mapping relationship f of a spatial target point P between the long-focus and short-focus camera pixel coordinate systems from the target positions in the narrow field of view of the long-focus camera and the intrinsic and extrinsic parameters obtained by binocular calibration, and then obtain, for a target position p₁ in the narrow field of view of the long-focus camera, the corresponding target position p₂ in the wide field of view of the short-focus camera. The specific steps are as follows:
(1) According to the camera imaging principle, establish the relationship between the pixel coordinates p₁ and p₂ of a spatial point P in the long-focus and short-focus camera pixel coordinate systems, so that target positions not detected in the short-focus camera pixel coordinate system can be restored from the target positions in the long-focus camera pixel coordinate system.
(2) The corresponding relation between the long-focus camera pixel coordinate system and the short-focus camera pixel coordinate system is established by the camera imaging principle, and can be calculated by the following formula:
s₁p₁ = K₁P,  s₂p₂ = K₂(RP + T)
where P represents a point in real space; p₁ and p₂ represent the pixel points corresponding to P in the long-focus and short-focus camera pixel coordinate systems, respectively; K₁ and K₂ represent the intrinsic parameters of the long-focus and short-focus cameras, respectively; R, T denote the extrinsic parameters between the long-focus and short-focus binocular cameras; and s₁, s₂ represent the depth information of the point P in the long-focus and short-focus camera coordinate systems, respectively.
If homogeneous coordinates are used, the above equation can be written as follows:
p₁ = K₁P,  p₂ = K₂(RP + T)
(3) From the above formulas, substituting P = K₁⁻¹p₁, the mapping relationship f between p₁ and p₂ is obtained:

p₂ = K₂RK₁⁻¹p₁ + K₂T
and 3, analyzing the target position detected in the wide field of view of the short-focus camera and the target position corresponding to the target position in the narrow field of view of the long-focus camera in the wide field of view of the short-focus camera obtained in the step 2, and further performing fusion processing on the target frames in the long-focus and short-focus images. The method comprises the following specific steps:
(1) For the i-th target frame Bₗ detected in the long-focus camera image, with position (xₗ, yₗ, wₗ, hₗ), obtain from the mapping relationship f the position (x′ₛ, y′ₛ, w′ₛ, h′ₛ) of the corresponding target frame B′ₛ in the short-focus camera image, where xₗ, yₗ, wₗ, hₗ denote the horizontal and vertical coordinates of the target center and the width and height of the target frame, and x′ₛ, y′ₛ, w′ₛ, h′ₛ denote the horizontal and vertical coordinates of the mapped target center and the width and height of the mapped target frame.
(2) Calculate the intersection-over-union IOU between the mapped target frame B′ₛ(x′ₛ, y′ₛ, w′ₛ, h′ₛ) and a target frame Bₛ(xₛ, yₛ, wₛ, hₛ) detected by the short-focus camera. When IOU > threshold t, both the long-focus and short-focus cameras have detected the target frame; otherwise, at least one camera has not detected the target. The IOU is the ratio between the overlap area and the union area of the two frames:

IOU(B′ₛ, Bₛ) = area(B′ₛ ∩ Bₛ) / area(B′ₛ ∪ Bₛ)
(3) When IOU > threshold t, the target has been detected by both the long-focus and short-focus cameras. Considering the deviation of the actual mapping result, the scaling ratios Δw, Δh and the offsets Δx, Δy between B′ₛ and Bₛ are calculated as follows:

Δw = wₛ / w′ₛ

Δh = hₛ / h′ₛ

Δx = xₛ − x′ₛ

Δy = yₛ − y′ₛ
(4) When IOU < threshold t, the short-focus camera has not detected the target. In this case B′ₛ is restored to Bₛ, i.e. the position of Bₛ in the short-focus image is computed, using the restoration formulas:

wₛ = w′ₛ · Δw

hₛ = h′ₛ · Δh

xₛ = x′ₛ + Δx

yₛ = y′ₛ + Δy
(5) Repeat steps (1) to (4) for all target frames, and complete target fusion according to the target positions and classes in the long-focus and short-focus cameras.
The method illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.
Claims (7)
1. A target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment is characterized by comprising the following steps:
step 1: installing a long-focus camera and a short-focus camera to form a binocular system, and calibrating the binocular system; inputting the image sequences acquired by the long-focus and short-focus cameras into a deep-learning convolutional neural network, and obtaining through target detection the positions of the target frames in the wide field of view and the narrow field of view of the binocular pair at the same moment;
step 2: using the camera imaging principle, obtaining the mapping relationship f of a spatial target point P between the long-focus and short-focus camera pixel coordinate systems from the target positions in the narrow field of view of the long-focus camera and the intrinsic and extrinsic parameters obtained by binocular calibration, and thereby obtaining, for a target position p₁ in the narrow field of view of the long-focus camera, the corresponding target position p₂ in the wide field of view of the short-focus camera;
step 3: fusing the target frames in the long-focus and short-focus images by comparing the target positions detected in the wide field of view of the short-focus camera with the positions, obtained in step 2, at which targets in the narrow field of view of the long-focus camera appear in the wide field of view of the short-focus camera.
2. The target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment according to claim 1, wherein step 1 specifically comprises the following steps:
step 2-1: setting different focal lengths for the long-focus and short-focus cameras, mounting the binocular camera system at the same height on top of the vehicle, and keeping a certain baseline distance between the two cameras;
step 2-2: calibrating with Zhang Zhengyou's calibration method to obtain the intrinsic parameters K and extrinsic parameters R, T of the binocular system, where K is the intrinsic parameter matrix containing the focal length, optical center, and other information of the camera, and R and T are the rotation matrix and translation vector of the long-focus camera relative to the short-focus camera, respectively;
step 2-3: deep-learning target detection, in which the lightweight convolutional neural network YOLOv3-Tiny is used to detect targets in the images collected by the long-focus and short-focus binocular cameras at the same moment, specifically comprising: dataset preparation, transfer learning, and network inference with target detection, yielding the positions of the target frames under the cameras with different focal lengths.
3. The target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment according to claim 2, wherein step 2-1 sets the camera focal lengths and installs the camera system: two cameras with different focal lengths are used, the short-focus camera is placed on the left and the long-focus camera on the right, and the baseline length between the two cameras is b, forming a long-and-short-focus binocular vision system that is mounted at the front of the vehicle roof.
4. The target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment according to claim 2, wherein step 2-2 calibrates the long-focus and short-focus binocular camera: a checkerboard calibration board is placed in front of the binocular camera, and the checkerboard must appear simultaneously in the fields of view of both the long-focus and short-focus cameras; the corner points of the checkerboard calibration board are captured by the binocular camera, and Zhang Zhengyou's calibration method is used to compute the intrinsic parameters K₁, K₂ of the cameras and the extrinsic parameters R and T between the binocular cameras.
5. The target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment according to claim 2, wherein the specific process of dataset preparation in step 2-3 is to merge a self-collected and labeled Chongqing urban traffic dataset with the open-source Pascal VOC 2012 dataset, and then apply data augmentation to the merged dataset to obtain more training samples;
the specific process of transfer learning is to load the merged dataset and train the YOLOv3-Tiny network starting from an existing pre-trained model;
network inference and target detection mean that, during normal operation of the intelligent vehicle, the YOLOv3-Tiny network loads the trained model weights and performs forward inference to complete the target detection task.
6. The target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment according to any one of claims 1 to 5, wherein step 2 establishes the correspondence between the long-focus camera pixel coordinate system and the short-focus camera pixel coordinate system from the camera imaging principle, which is calculated by the following formula:
s₁p₁ = K₁P,  s₂p₂ = K₂(RP + T)
where P represents a point in real space; p₁ and p₂ represent the pixel points corresponding to P in the long-focus and short-focus camera pixel coordinate systems, respectively; K₁ and K₂ denote the intrinsic parameters of the long-focus and short-focus cameras, respectively; R, T denote the extrinsic parameters between the long-focus and short-focus binocular cameras; and s₁, s₂ represent the depth information of the point P in the long-focus and short-focus camera coordinate systems, respectively;
when using homogeneous coordinates, the above equation is written as follows:
p₁ = K₁P,  p₂ = K₂(RP + T)

from the above formulas, the mapping relationship f between p₁ and p₂ is obtained:

p₂ = K₂RK₁⁻¹p₁ + K₂T
7. The target detection and fusion method based on a long-focus and short-focus multi-camera vehicle environment according to claim 6, wherein step 3 specifically comprises the following steps:
step 3-1: for the i-th target frame Bₗ detected in the long-focus camera image, with position (xₗ, yₗ, wₗ, hₗ), obtaining from the mapping relationship f the position (x′ₛ, y′ₛ, w′ₛ, h′ₛ) of the corresponding target frame B′ₛ in the short-focus camera image, where xₗ, yₗ, wₗ, hₗ denote the horizontal and vertical coordinates of the target center and the width and height of the target frame, and x′ₛ, y′ₛ, w′ₛ, h′ₛ denote the horizontal and vertical coordinates of the mapped target center and the width and height of the mapped target frame;
step 3-2: calculating the intersection-over-union IOU between the mapped target frame B′ₛ(x′ₛ, y′ₛ, w′ₛ, h′ₛ) and a target frame Bₛ(xₛ, yₛ, wₛ, hₛ) detected by the short-focus camera; when IOU > threshold t, both the long-focus and short-focus cameras have detected the target frame; otherwise, at least one camera has not detected the target; the IOU is the ratio between the overlap area and the union area of the two frames:

IOU(B′ₛ, Bₛ) = area(B′ₛ ∩ Bₛ) / area(B′ₛ ∪ Bₛ)
step 3-3: when IOU > threshold t, the target has been detected by both the long-focus and short-focus cameras; considering the deviation of the actual mapping result, the scaling ratios Δw, Δh and the offsets Δx, Δy between B′ₛ and Bₛ are calculated as follows:

Δw = wₛ / w′ₛ

Δh = hₛ / h′ₛ

Δx = xₛ − x′ₛ

Δy = yₛ − y′ₛ
step 3-4: when IOU < threshold t, the short-focus camera has not detected the target; in this case B′ₛ is restored to Bₛ, i.e. the position of Bₛ in the short-focus image is computed, using the restoration formulas:

wₛ = w′ₛ · Δw

hₛ = h′ₛ · Δh

xₛ = x′ₛ + Δx

yₛ = y′ₛ + Δy
step 3-5: repeating steps 3-1 to 3-4 for all target frames, and completing target fusion according to the target positions and classes in the long-focus and short-focus cameras.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011288888.5A CN112364793A (en) | 2020-11-17 | 2020-11-17 | Target detection and fusion method based on long-focus and short-focus multi-camera vehicle environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011288888.5A CN112364793A (en) | 2020-11-17 | 2020-11-17 | Target detection and fusion method based on long-focus and short-focus multi-camera vehicle environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112364793A true CN112364793A (en) | 2021-02-12 |
Family
ID=74532438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011288888.5A Pending CN112364793A (en) | 2020-11-17 | 2020-11-17 | Target detection and fusion method based on long-focus and short-focus multi-camera vehicle environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364793A (en) |
- 2020-11-17: CN application CN202011288888.5A filed; patent CN112364793A (en), status Pending
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108171758A (en) * | 2018-01-16 | 2018-06-15 | 重庆邮电大学 | Polyphaser scaling method based on minimum time principle and transparent glass scaling board |
CN108257161A (en) * | 2018-01-16 | 2018-07-06 | 重庆邮电大学 | Vehicle environmental three-dimensionalreconstruction and movement estimation system and method based on polyphaser |
WO2019198381A1 (en) * | 2018-04-13 | 2019-10-17 | ソニー株式会社 | Information processing device, information processing method, and program |
CN109163657A (en) * | 2018-06-26 | 2019-01-08 | 浙江大学 | A kind of circular target position and posture detection method rebuild based on binocular vision 3 D |
CN109165629A (en) * | 2018-09-13 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | It is multifocal away from visual barrier cognitive method, device, equipment and storage medium |
US20200089976A1 (en) * | 2018-09-13 | 2020-03-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device of multi-focal sensing of an obstacle and non-volatile computer-readable storage medium |
CN109448054A (en) * | 2018-09-17 | 2019-03-08 | 深圳大学 | The target Locate step by step method of view-based access control model fusion, application, apparatus and system |
CN109815886A (en) * | 2019-01-21 | 2019-05-28 | 南京邮电大学 | A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3 |
CN110378210A (en) * | 2019-06-11 | 2019-10-25 | 江苏大学 | A kind of vehicle and car plate detection based on lightweight YOLOv3 and long short focus merge distance measuring method |
CN110532937A (en) * | 2019-08-26 | 2019-12-03 | 北京航空航天大学 | Method for distinguishing is known to targeting accuracy with before disaggregated model progress train based on identification model |
CN111210478A (en) * | 2019-12-31 | 2020-05-29 | 重庆邮电大学 | Method, medium and system for calibrating external parameters of common-view-free multi-camera system |
Non-Patent Citations (3)
Title |
---|
SAFWAN WSHAH: "Deep Learning for Model Parameter Calibration in Power Systems", 2020 IEEE International Conference on Power Systems Technology (POWERCON) *
李星辰 (Li Xingchen): "Multi-object tracking algorithm incorporating YOLO detection", Computer Engineering & Science *
贾祥 (Jia Xiang): "Research on 3D vehicle environment reconstruction methods based on binocular vision", China Master's Theses Full-text Database *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113223094A (en) * | 2021-05-24 | 2021-08-06 | 深圳市智像科技有限公司 | Binocular imaging system, control method and device thereof, and storage medium |
CN116778360A (en) * | 2023-06-09 | 2023-09-19 | 北京科技大学 | Ground target positioning method and device for flapping-wing flying robot |
CN116778360B (en) * | 2023-06-09 | 2024-03-19 | 北京科技大学 | Ground target positioning method and device for flapping-wing flying robot |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109308693B (en) | Single-binocular vision system for target detection and pose measurement constructed by one PTZ camera | |
CN111462135B (en) | Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation | |
CN110889829A (en) | Monocular distance measurement method based on fisheye lens | |
CN106529587B (en) | Vision course recognition methods based on object detection | |
CN105758426A (en) | Combined calibration method for multiple sensors of mobile robot | |
US20230063939A1 (en) | Electro-hydraulic varifocal lens-based method for tracking three-dimensional trajectory of object by using mobile robot | |
CN111091023B (en) | Vehicle detection method and device and electronic equipment | |
CN111738071B (en) | Inverse perspective transformation method based on motion change of monocular camera | |
CN114089329A (en) | Target detection method based on fusion of long and short focus cameras and millimeter wave radar | |
CN106920247A (en) | A kind of method for tracking target and device based on comparison network | |
CN114898314B (en) | Method, device, equipment and storage medium for detecting target of driving scene | |
CN110779491A (en) | Method, device and equipment for measuring distance of target on horizontal plane and storage medium | |
CN111260539A (en) | Fisheye pattern target identification method and system | |
CN112115913B (en) | Image processing method, device and equipment and storage medium | |
CN111932627B (en) | Marker drawing method and system | |
Cvišić et al. | Recalibrating the KITTI dataset camera setup for improved odometry accuracy | |
CN112364793A (en) | Target detection and fusion method based on long-focus and short-focus multi-camera vehicle environment | |
CN111967396A (en) | Processing method, device and equipment for obstacle detection and storage medium | |
CN114037762B (en) | Real-time high-precision positioning method based on registration of image and high-precision map | |
CN113379848A (en) | Target positioning method based on binocular PTZ camera | |
CN111239684A (en) | Binocular fast distance measurement method based on YoloV3 deep learning | |
CN111161305A (en) | Intelligent unmanned aerial vehicle identification tracking method and system | |
CN110992424A (en) | Positioning method and system based on binocular vision | |
CN104463240A (en) | Method and device for controlling list interface | |
CN111950370A (en) | Dynamic environment offline visual milemeter expansion method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210212 |