CN112802114B - Multi-vision sensor fusion device, method thereof and electronic equipment - Google Patents

Multi-vision sensor fusion device, method thereof and electronic equipment

Info

Publication number
CN112802114B
CN112802114B
Authority
CN
China
Prior art keywords
camera
image
eye
module
depth
Prior art date
Legal status
Active
Application number
CN201911103519.1A
Other languages
Chinese (zh)
Other versions
CN112802114A (en)
Inventor
孙佳睿
李楠
李健
金瑞
Current Assignee
Zhejiang Sunny Optical Intelligent Technology Co Ltd
Original Assignee
Zhejiang Sunny Optical Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Sunny Optical Intelligent Technology Co Ltd
Priority to CN201911103519.1A
Publication of CN112802114A
Application granted
Publication of CN112802114B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • H04N13/239Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A multi-vision sensor fusion device, a method thereof and an electronic device. The multi-vision sensor fusion device comprises a binocular camera, wherein the binocular camera comprises a left-eye camera and a right-eye camera which are arranged at intervals, the left-eye camera is used for acquiring image information of a scene to obtain an initial left-eye image, and the right-eye camera is used for synchronously acquiring the image information of the scene to obtain an initial right-eye image; a TOF camera, wherein the relative position between the TOF camera and the binocular camera is fixed, and the TOF camera is used for synchronously acquiring the depth information of the scene to obtain an initial depth image; and a multi-vision sensor fusion system, wherein the multi-vision sensor fusion system is respectively connected with the binocular camera and the TOF camera in a communication way and is used for processing the initial left-eye image and the initial right-eye image by taking the initial depth image as a reference so as to obtain a fused depth image.

Description

Multi-vision sensor fusion device, method thereof and electronic equipment
Technical Field
The invention relates to the technical field of vision sensors, in particular to a multi-vision sensor fusion device, a method thereof, and an electronic device.
Background
With the rapid development of artificial intelligence, the demand for high-precision vision sensors keeps increasing. Good depth information greatly facilitates the development of various vision-based algorithms and is readily extended to fields such as security, surveillance, machine vision and robotics, for example: object recognition and obstacle detection in autonomous driving; identification, sorting, unstacking and palletizing of scattered objects in industry; and shelf picking of goods in logistics scenarios.
Currently, the main techniques for acquiring depth information are binocular stereo matching (e.g., binocular cameras), TOF (Time of Flight) techniques (e.g., TOF cameras), and structured light techniques (e.g., structured light cameras). However, a TOF camera alone suffers from multipath interference, loses depth information in regions of low reflectivity, has low overall resolution and a sparse point cloud, and the depth information it acquires at long range is particularly inaccurate. A binocular camera does not share these drawbacks of the TOF camera, but it requires a large amount of computation and suffers from occlusion and from mismatching in repeated-texture and weak-texture regions, so the accuracy of the depth information it acquires is also limited. A structured light camera, depending on the coding method used, has drawbacks such as low resolution, long measurement time and poor reliability. That is, a single depth vision sensor cannot obtain high-accuracy depth information, while simply stacking multiple depth vision sensors does not effectively improve overall performance or yield high-accuracy depth information, but does greatly increase the structural complexity and the amount of computation of the system, so that the overall cost rises sharply.
Disclosure of Invention
An object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof and an electronic device, which can fuse the advantages of a TOF camera and a binocular camera, and overcome the respective disadvantages, thereby contributing to the improvement of the accuracy of the obtained depth image.
Another object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof, and an electronic device, wherein in an embodiment of the present invention, the multi-vision sensor fusion apparatus is capable of acquiring dense and highly accurate depth images at multiple distances.
Another object of the present invention is to provide a multi-vision sensor fusion device, a method thereof and an electronic device, wherein in an embodiment of the present invention, the multi-vision sensor fusion device combines the advantages of the TOF camera, the binocular camera and the RGB camera, so as to further improve the accuracy of the obtained depth image.
Another object of the present invention is to provide a multi-vision sensor fusion device, a method thereof and an electronic apparatus, wherein in an embodiment of the present invention, the multi-vision sensor fusion method can fuse a TOF camera and a binocular camera whose resolutions differ greatly, so as to facilitate popularization and use of the multi-vision sensor fusion method.
Another object of the present invention is to provide a multi-vision sensor fusion device, a method thereof and an electronic apparatus, wherein in an embodiment of the present invention, the multi-vision sensor fusion method adopts a three-dimensional calibration board for calibration, which is helpful for simplifying the calibration process of the multi-vision sensor fusion device and improving the robustness of the overall calibration algorithm.
Another object of the present invention is to provide a multi-vision sensor fusion device, a method thereof and an electronic device, wherein in an embodiment of the present invention, the multi-vision sensor fusion device can greatly reduce matching time of point pairs in binocular images, reduce calculation amount of the matching point pairs, and facilitate fast obtaining of high-precision depth images.
Another object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof and an electronic device, wherein expensive materials or complex structures are not required to be used in the present invention in order to achieve the above-mentioned objects. Accordingly, the present invention successfully and effectively provides a solution that not only provides a simple multi-vision sensor fusion apparatus and method thereof and electronic equipment, but also increases the practicality and reliability of the multi-vision sensor fusion apparatus and method thereof and electronic equipment.
To achieve at least one of the above or other objects and advantages, the present invention provides a multi-vision sensor fusion apparatus comprising:
A binocular camera, wherein the binocular camera comprises a left-eye camera and a right-eye camera which are arranged at intervals, wherein the left-eye camera is used for acquiring image information of a scene to obtain an initial left-eye image, and the right-eye camera is used for synchronously acquiring the image information of the scene to obtain an initial right-eye image;
a TOF camera, wherein the relative position between the TOF camera and the binocular camera is fixed, and the TOF camera is used for synchronously acquiring depth information of the scene to obtain an initial depth image; and
And the multi-vision sensor fusion system is respectively connected with the binocular camera and the TOF camera in a communication way and is used for processing the initial left-eye image and the initial right-eye image by taking the initial depth image as a reference so as to obtain a fused depth image.
In an embodiment of the invention, the multi-vision sensor fusion device further includes an RGB camera, wherein a relative position between the RGB camera and the TOF camera is fixed, and the RGB camera is used for synchronously acquiring color information of the scene to obtain an RGB image.
In an embodiment of the invention, the TOF camera is disposed between the left eye camera and the right eye camera of the binocular camera, and the optical center of the TOF camera is located on a line between the optical centers of the left eye camera and the right eye camera.
In an embodiment of the present invention, the RGB camera is disposed between the left-eye camera and the right-eye camera of the binocular camera, and the optical center of the RGB camera is located on a line between the optical centers of the left-eye camera and the right-eye camera.
In an embodiment of the invention, the RGB camera is disposed between the left eye camera and the right eye camera of the binocular camera, and a line between the optical centers of the RGB camera and the TOF camera is perpendicular to a line between the optical centers of the left eye camera and the right eye camera.
In one embodiment of the present invention, the multi-vision sensor fusion system includes an acquisition module and a fusion module communicatively connected to each other, wherein the acquisition module is communicatively connected to the binocular camera and the TOF camera, respectively, for acquiring the initial left eye image, the initial right eye image, and the initial depth image; the fusion module is used for performing point pair matching on the initial left-eye image and the initial right-eye image in a preset parallax range by taking the depth value of a pixel point in the initial depth image as a reference so as to obtain an initial binocular depth image, and the initial binocular depth image is used as the fused depth image.
In one embodiment of the present invention, the multi-vision sensor fusion system includes an acquisition module and a fusion module communicatively connected to each other, wherein the acquisition module is communicatively connected to the binocular camera and the TOF camera, respectively, for acquiring the initial left eye image, the initial right eye image, the initial depth image, and the RGB image; the fusion module is used for performing point pair matching on the initial left-eye image and the initial right-eye image in a preset parallax range by taking the depth value of a pixel point in the initial depth image as a reference so as to obtain an initial binocular depth image, and the initial binocular depth image is used as the fused depth image.
In an embodiment of the present invention, the multi-vision sensor fusion system further includes a preprocessing module, where the preprocessing module is communicatively disposed between the acquisition module and the fusion module, and is configured to perform preprocessing on the initial left-eye image, the initial right-eye image, and the initial depth image acquired by the acquisition module, so as to obtain a preprocessed left-eye image, a preprocessed right-eye image, and a preprocessed depth image, so that the fusion module is configured to perform point-to-point matching on the preprocessed left-eye image and the preprocessed right-eye image within the preset parallax range with a depth value of a pixel in the preprocessed depth image as a reference, so as to obtain the initial binocular depth image.
In an embodiment of the present invention, the preprocessing module includes a median filtering module, a downsampling module, and a depth filtering module, where the median filtering module is communicatively connected to the acquiring module, and is configured to perform optimized denoising on the initial left-eye image and the initial right-eye image through median filtering, so as to obtain a filtered left-eye image and a filtered right-eye image; the downsampling module is in communication connection with the median filtering module and is used for downsampling the filtered left-eye image and the filtered right-eye image respectively to obtain the preprocessed left-eye image and the preprocessed right-eye image; the depth filtering module is communicatively connected with the acquisition module and is used for filtering the initial depth image by referring to the RGB image through guiding filtering so as to obtain a filtered depth image, so that the filtered depth image is directly used as the preprocessed depth image.
In an embodiment of the invention, the preprocessing module further comprises an upsampling module, wherein the upsampling module is communicatively connected to the depth filtering module for upsampling the filtered depth image to obtain an upsampled depth image such that the upsampled depth image is used as the preprocessed depth image.
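As an illustrative, non-limiting sketch of the preprocessing described above, the following Python/OpenCV snippet performs median filtering and downsampling of the binocular images, and RGB-guided filtering and upsampling of the TOF depth image; the function name, kernel size, scale factors and guided-filter parameters are assumptions for illustration only, and cv2.ximgproc.guidedFilter requires the opencv-contrib package.

```python
import cv2
import numpy as np

def preprocess(left_raw, right_raw, depth_raw, rgb, scale=0.5):
    """Illustrative preprocessing chain; kernel sizes and scales are assumptions."""
    # Median filtering to denoise the binocular images
    left_f = cv2.medianBlur(left_raw, 3)
    right_f = cv2.medianBlur(right_raw, 3)
    # Downsample the binocular images toward the (much lower) TOF resolution
    left_pre = cv2.resize(left_f, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    right_pre = cv2.resize(right_f, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    # Guided filtering of the TOF depth image, using the RGB image as the guide
    guide = cv2.resize(rgb, (depth_raw.shape[1], depth_raw.shape[0]))
    depth_f = cv2.ximgproc.guidedFilter(guide, depth_raw.astype(np.float32), radius=4, eps=1e-2)
    # Upsample the filtered depth image to match the downsampled binocular resolution
    up_size = (left_pre.shape[1], left_pre.shape[0])
    depth_pre = cv2.resize(depth_f, up_size, interpolation=cv2.INTER_NEAREST)
    return left_pre, right_pre, depth_pre
```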
In an embodiment of the present invention, the fusion module includes a distortion and epipolar correction module and a binocular matching module that are communicatively connected to each other, where the distortion and epipolar correction module is communicatively connected to the preprocessing module, and is configured to perform distortion and epipolar rectification on the preprocessed left-eye image and the preprocessed right-eye image to obtain corrected binocular images; the binocular matching module is communicatively connected with the preprocessing module and is used for referencing the preprocessed depth image and performing binocular stereo matching on the corrected binocular images so as to obtain the initial binocular depth image.
In an embodiment of the present invention, the binocular matching module includes a conversion module and a cost calculation module that are communicatively connected to each other, where the conversion module is configured to convert the preprocessed depth image into a coordinate system of the left-eye camera according to a pose relationship between the TOF camera and the left-eye camera, so as to obtain a left-eye depth reference image, and the cost calculation module is configured to perform cost calculation on pixels having depth values in the left-eye depth reference image within the preset parallax range, so as to perform point-pair matching, thereby obtaining the initial binocular depth image.
In an embodiment of the present invention, the conversion module of the binocular matching module is only configured to convert pixels with depth values smaller than the detection distance of the TOF camera in the preprocessed depth image into the coordinate system of the left-eye camera, so that only depth information with depth values smaller than the detection distance in the preprocessed depth image is retained in the left-eye depth reference image.
In an embodiment of the present invention, the multi-vision sensor fusion system further includes a post-processing module, wherein the post-processing module is communicatively connected to the fusion module, and is configured to post-process the initial binocular depth image to obtain a final depth image, so that the final depth image is a dense depth image with high accuracy.
In one embodiment of the invention, the post-processing module includes a left-right consistency detection module and a mismatching point filling module that are communicatively connected to each other, wherein the left-right consistency detection module is used for performing left-right consistency detection on the initial binocular depth image so as to obtain mismatching points in the initial binocular depth image; the mismatching point filling module is used for filling the depth information of the mismatching points so as to eliminate the mismatching points.
In an embodiment of the present invention, the mismatching point filling module is further configured to fill depth information of a corresponding pixel point in the preprocessed depth image into the mismatching point when the depth of the mismatching point is smaller than the detection distance of the TOF camera; and when the depth of the mismatching point is not smaller than the detection distance of the TOF camera, filling the mismatching point by adopting a horizontal line searching method.
In an embodiment of the present invention, the post-processing module further includes a spurious point filtering module, where the spurious point filtering module is communicatively connected to the mismatching point filling module, and is configured to perform connected-domain judgment on the filled binocular depth image and remove regions with abrupt changes and small connected area, so as to obtain a binocular depth image with spurious points filtered out.
In an embodiment of the present invention, the post-processing module further includes a colorizing module, where the colorizing module is respectively communicatively connected to the spurious point filtering module and the acquisition module, and is configured to colorize, based on the RGB image, the binocular depth image with the spurious points filtered out, so as to obtain a color depth image.
In an embodiment of the invention, the multi-vision sensor fusion system further includes a calibration module, where the calibration module is configured to calibrate the binocular camera, the TOF camera, and the RGB camera by using a target unit, so as to obtain the pose relationship between the left-eye camera and the right-eye camera of the binocular camera, the pose relationship between the TOF camera and the left-eye camera, and the pose relationship between the RGB camera and the TOF camera, respectively.
In an embodiment of the invention, the multi-vision sensor fusion device further includes the target unit, wherein the target unit is a three-dimensional calibration plate, wherein the three-dimensional calibration plate includes a first calibration panel, a second calibration panel, and a third calibration panel, wherein the first calibration panel, the second calibration panel, and the third calibration panel are disposed edge to edge with each other, and calibration surfaces of the first calibration panel, the second calibration panel, and the third calibration panel are not coplanar, such that the three-dimensional calibration plate forms a three-vertical-surface calibration plate.
In an embodiment of the present invention, the first calibration panel, the second calibration panel and the third calibration panel have a common corner point, so as to be used as the vertex of the three-dimensional calibration plate.
In an embodiment of the present invention, the calibration patterns on the first calibration panel, the second calibration panel, and the third calibration panel are patterns having a circular mark array.
In an embodiment of the present invention, the circular mark on each calibration pattern is black and the background is white; the overlapping edges of the first calibration panel, the second calibration panel and the third calibration panel are black.
In an embodiment of the present invention, the calibration module includes a segmentation module, a classification module, a sorting module, and a calibration algorithm module that are sequentially and communicatively connected, where the segmentation module is configured to segment the whole calibration images obtained by synchronously capturing the three-dimensional calibration plate with the binocular camera, the TOF camera, and the RGB camera, so as to obtain three segmented calibration images from each whole calibration image, such that the segmented calibration images respectively correspond to the first calibration panel, the second calibration panel, and the third calibration panel of the three-dimensional calibration plate; the classifying module is used for classifying the circular marks in the segmented calibration images by surface, so that the circular marks on the same calibration surface are grouped into the same class; the sorting module is used for sorting the circle center coordinates of each class of the circular marks to obtain sorted circle center coordinate data; and the calibration algorithm module is used for performing calibration calculation on the sorted circle center coordinate data to obtain the required pose relationships.
According to another aspect of the present invention, the present invention also provides a multi-vision sensor fusion method, including the steps of:
S200: acquiring an initial left-eye image and an initial right-eye image which are obtained by acquiring image information of a scene through a left-eye camera and a right-eye camera of a binocular camera, and acquiring an initial depth image which is obtained by synchronously acquiring depth information of the scene through a TOF camera, wherein the relative positions of the TOF camera and the binocular camera are fixed;
s300: preprocessing the initial depth image, the initial left-eye image and the initial right-eye image respectively to obtain a preprocessed depth image, a preprocessed left-eye image and a preprocessed right-eye image; and
S400: and taking the preprocessed depth image as a reference, and carrying out fusion processing on the preprocessed left-eye image and the preprocessed right-eye image within a preset parallax range to obtain a fused depth image.
In an embodiment of the present invention, the step S200 further includes the step of: acquiring an RGB image obtained by an RGB camera synchronously acquiring color information of the scene, wherein the relative position between the RGB camera and the TOF camera is fixed.
In an embodiment of the present invention, the step S300 includes the steps of:
Performing optimized denoising on the initial left-eye image and the initial right-eye image respectively through median filtering to obtain a filtered left-eye image and a filtered right-eye image; and
Downsampling the filtered left-eye image and the filtered right-eye image, respectively, to obtain the preprocessed left-eye image and the preprocessed right-eye image.
In an embodiment of the present invention, the step S300 further includes the steps of:
Filtering the initial depth image through guided filtering with reference to the RGB image to obtain a filtered depth image; and
The filtered depth image is upsampled to obtain an upsampled depth image such that the upsampled depth image is used as the preprocessed depth image.
In an embodiment of the present invention, the step S400 includes the steps of:
S410: carrying out polar distortion correction on the preprocessed left-eye image and the preprocessed right-eye image to obtain corrected binocular images; and
S420: and (3) referring to the preprocessed depth image, performing binocular stereo matching on the corrected binocular image to obtain an initial binocular depth image.
In an embodiment of the present invention, the step S420 includes the steps of:
Converting the preprocessed depth image into a coordinate system of the left-eye camera according to the pose relation between the TOF camera and the left-eye camera so as to obtain a left-eye depth reference image; and
And carrying out cost calculation on pixel points with depth values in the left-eye depth reference image in the preset parallax range so as to carry out point-to-point matching, thereby obtaining the initial binocular depth image.
In an embodiment of the present invention, the multi-vision sensor fusion method further includes the steps of:
s500: the initial binocular depth image is post-processed to obtain a final depth image, such that the final depth image is a dense and highly accurate depth image.
In an embodiment of the present invention, the step S500 includes the steps of:
s510: performing left-right consistency detection on the initial binocular depth image to obtain mismatching points in the initial binocular depth image; and
S520: and filling the depth information of the mismatching points to remove the mismatching points, and obtaining a filled binocular depth image.
In an embodiment of the present invention, in the step S520, when the depth of the mismatching point is smaller than the detection distance of the TOF camera, the depth information of the corresponding pixel point in the preprocessed depth image is filled into the mismatching point; and when the depth of the mismatching point is not smaller than the detection distance of the TOF camera, filling the mismatching point by adopting a horizontal line searching method.
In an embodiment of the present invention, the step S500 further includes the steps of:
s530: and (3) through judging the connected domain of the filled binocular depth image, removing the region with larger change and smaller connected space, so as to obtain the binocular depth image with the stray points filtered.
In an embodiment of the present invention, the step S500 further includes the steps of:
s540: and on the basis of the RGB image, dyeing the binocular depth image with the stray points filtered to obtain a color depth image.
In an embodiment of the present invention, the multi-vision sensor fusion method further includes, before the step S200:
s100: and calibrating the binocular camera, the TOF camera and the RGB camera through a target plate unit to respectively obtain the pose relation between the left eye camera and the right eye camera of the binocular camera, the pose relation between the TOF camera and the left eye camera and the pose relation between the RGB camera and the TOF camera.
In an embodiment of the present invention, the step S100 includes the steps of:
dividing the whole calibration image obtained by synchronously shooting the three-dimensional calibration plate through the binocular camera, the TOF camera and the RGB camera respectively to obtain three divided calibration images respectively, so that the divided calibration images respectively correspond to a first calibration panel, a second calibration panel and a third calibration panel of the three-dimensional calibration plate;
Classifying the surfaces of the circular marks in the divided calibration images so that the circular marks on the same calibration surface are classified into the same class;
Sequencing the circle center coordinates of each type of the circular marks to obtain sequenced circle center coordinate data; and
And performing calibration calculation on the sequenced circle center coordinate data to obtain a required pose relationship.
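An illustrative sketch of the per-camera part of step S100 is given below; it assumes each segmented calibration image contains a regular circle-grid pattern whose 3D coordinates on the three-dimensional calibration plate are known, detects the circle centers with OpenCV, estimates each camera's pose relative to the target, and derives the relative pose between two cameras. The function names, grid_size and the ordering of obj_pts are assumptions; the detected centers must be sorted into the same order as the known 3D points.

```python
import cv2
import numpy as np

def camera_to_board_pose(img, obj_pts, grid_size, K, D):
    """Estimate one camera's pose relative to the stereo target from one
    segmented calibration image (obj_pts: Nx3 float32 board coordinates)."""
    found, centers = cv2.findCirclesGrid(img, grid_size, flags=cv2.CALIB_CB_SYMMETRIC_GRID)
    if not found:
        raise RuntimeError("circle grid not detected")
    # The order of 'centers' must match the order of the known 3D points obj_pts
    ok, rvec, tvec = cv2.solvePnP(obj_pts, centers, K, D)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec

def relative_pose(R_a, t_a, R_b, t_b):
    """Pose of camera B expressed in camera A, both measured against the same target."""
    R_ab = R_b @ R_a.T
    t_ab = t_b - R_ab @ t_a
    return R_ab, t_ab
```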
According to another aspect of the present invention, there is also provided an electronic apparatus including:
a processor for executing the instructions; and
A memory, wherein the memory is configured to hold machine-readable instructions which, when executed by the processor, implement some or all of the steps in any of the multi-vision sensor fusion methods described above.
Further objects and advantages of the present invention will become fully apparent from the following description and the accompanying drawings.
These and other objects, features and advantages of the present invention will become more fully apparent from the following detailed description, the accompanying drawings and the appended claims.
Drawings
FIG. 1 is a block diagram schematic of a multi-vision sensor fusion apparatus in accordance with an embodiment of the present invention.
Fig. 2 shows a schematic structural view of the multi-vision sensor fusion apparatus according to the above embodiment of the present invention.
Fig. 3 shows a block diagram of a multi-vision sensor fusion system of the multi-vision sensor fusion apparatus according to the above embodiment of the present invention.
Fig. 4 shows a schematic perspective view of a target unit of the multi-vision sensor fusion apparatus according to the above embodiment of the present invention.
Fig. 5 shows a schematic view of the operation of the multi-vision sensor fusion apparatus according to the above embodiment of the present invention.
Fig. 6 is a flow chart of a multi-vision sensor fusion method according to an embodiment of the present invention.
Fig. 7 shows a flow diagram of one of the steps of the multi-vision sensor fusion method according to the above-described embodiment of the present invention.
Fig. 8 shows a schematic flow chart of a second step of the multi-vision sensor fusion method according to the above embodiment of the present invention.
Fig. 9 shows a schematic flow chart of a third step of the multi-vision sensor fusion method according to the above embodiment of the present invention.
Fig. 10 shows a flow chart of a fourth step of the multi-vision sensor fusion method according to the above embodiment of the present invention.
Fig. 11 is a block diagram schematic of an electronic device according to an embodiment of the invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art. The basic principles of the invention defined in the following description may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
In the present invention, the terms "a" and "an" in the claims and specification should be understood as "one or more", i.e. in one embodiment the number of an element may be one, while in another embodiment the number of the element may be plural. The terms "a" and "an" are not to be construed as unique or singular, and the term "the" is not to be construed as limiting the number of the element, unless the disclosure of the present invention specifically indicates that there is only one such element.
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In the description of the present invention, unless explicitly stated or limited otherwise, the terms "connected" and "coupled" should be interpreted broadly, and may refer, for example, to a fixed connection, a detachable connection, or an integral connection; to a mechanical or electrical connection; or to a direct connection or an indirect connection through a medium. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
With the rapid development of science and technology, depth vision has become a primary means of acquiring depth information, and its accuracy directly affects the development and application of various vision-based algorithms, and in turn the development of technologies in fields such as security, surveillance and machine vision. At present, depth vision sensors mainly fall into three types, namely TOF sensors (also known as TOF cameras), binocular vision sensors (also known as binocular cameras) and structured light sensors (also known as structured light cameras); because their working principles differ, each has different advantages and disadvantages, and the accuracy of the depth image each obtains is therefore limited. For example, a TOF camera alone suffers from multipath interference, loses depth in regions of low reflectivity, and has low overall resolution and a sparse point cloud, so only sparse depth images can be obtained through the TOF camera; a binocular camera alone does not share the drawbacks of the TOF camera, but it involves a large amount of computation and suffers from occlusion and from mismatching in repeated-texture and weak-texture regions, so that although dense depth images can be obtained by the binocular camera, high-accuracy depth images cannot, and huge computing resources are required. Therefore, in order to solve the above problems, the present invention combines the advantages of the TOF camera and the binocular camera, overcomes their respective drawbacks, and provides a multi-vision sensor fusion apparatus, a method thereof and an electronic device capable of obtaining dense, high-accuracy depth images. It can be understood that the higher accuracy of the depth image mentioned in the present invention means that the accuracy of the depth image acquired by the multi-vision sensor fusion device of the present invention is higher than that of a depth image acquired by a TOF camera alone or a binocular camera alone; the density of the depth image mentioned in the present invention means that the density of pixel points in the depth image obtained by the multi-vision sensor fusion device is greater than the density of pixel points in a depth image obtained by a TOF camera alone.
Schematic device
Referring to fig. 1 to 5 of the drawings, a multi-vision sensor fusion apparatus according to an embodiment of the present invention is illustrated. Specifically, as shown in fig. 1 and 2, the multi-vision sensor fusion apparatus 1 includes a binocular camera 10, a TOF camera 20, and a multi-vision sensor fusion system 30. The binocular camera 10 includes a left-eye camera 11 and a right-eye camera 12 arranged at intervals, wherein the left-eye camera 11 is used for acquiring image information of a scene to obtain an initial left-eye image; wherein the right eye camera 12 is used to synchronously acquire image information of the scene to obtain an initial right eye image. The TOF camera 20 is fixed relative to the left eye camera 11 and the right eye camera 12 of the binocular camera 10, and is configured to synchronously acquire depth information of the scene to obtain an initial depth image. The multi-vision sensor fusion system 30 is communicatively connected to the binocular camera 10 and the TOF camera 20, respectively, for processing the initial left-eye image and the initial right-eye image with reference to the initial depth image to obtain a fused depth image.
It is noted that the present invention refers to the initial depth image obtained by the TOF camera 20 when processing the initial left-eye image and the initial right-eye image, so that the fused depth image is implemented as a dense and highly accurate depth image, that is, the density of pixels in the fused depth image is greater than the density of pixels in the initial depth image; and the accuracy of the fused depth image is higher than that obtained by a separate TOF camera or a separate binocular camera.
More specifically, in order to ensure that the field of view of the TOF camera 20 covers, as far as possible, the fields of view of both the left-eye camera 11 and the right-eye camera 12 of the binocular camera 10, in the above-described embodiment of the present invention, as shown in fig. 2, the TOF camera 20 is disposed between the left-eye camera 11 and the right-eye camera 12 of the binocular camera 10.
Preferably, as shown in fig. 2, the optical center C20 of the TOF camera 20 is located on the line between the optical center C11 of the left eye camera 11 and the optical center C12 of the right eye camera 12, so as to simplify the subsequent image processing process and reduce the calculation amount in calibration.
More preferably, as shown in fig. 2, the optical center C20 of the TOF camera 20 is located at the exact center of the optical center C11 of the left-eye camera 11 and the optical center C12 of the right-eye camera 12, that is, the distance between the optical center C20 of the TOF camera 20 and the optical center C11 of the left-eye camera 11 is equal to the distance between the optical center C20 of the TOF camera 20 and the optical center C12 of the right-eye camera 12, so as to further simplify the subsequent image processing procedure, reduce the calculation amount at the time of calibration, and prevent the occurrence of error accumulation.
In addition, the external rotation angle of the binocular camera 10 does not exceed 2°. That is, the angle between the optical axis of the right eye camera 12 of the binocular camera 10 and the optical axis of the left eye camera 11 of the binocular camera 10 is not more than 2°. Most preferably, the optical axis of the right eye camera 12 of the binocular camera 10 is parallel to the optical axis of the left eye camera 11 of the binocular camera 10.
According to the above embodiment of the present invention, as shown in fig. 3, the multi-vision sensor fusion system 30 includes an acquisition module 31 and a fusion module 32 that are communicatively connected to each other. The acquisition module 31 is communicatively connected with the binocular camera 10 and the TOF camera, respectively, for acquiring the initial left and right eye images from the binocular camera 10 and the initial depth image from the TOF camera 20. The fusion module 32 is configured to perform point-to-point matching on the initial left-eye image and the initial right-eye image in a preset parallax range with a depth value of a pixel point in the initial depth image as a reference, so as to obtain an initial binocular depth image.
It can be understood that, in the existing binocular image matching technology, no depth information is available as a reference, so when a certain pixel point on the left-eye image is matched, cost calculation has to be performed one by one on all candidate pixel points in the right-eye image to obtain the matching point pair, which makes the calculation amount of the whole matching process huge. In the multi-vision sensor fusion device 1 of the present invention, the depth value in the initial depth image obtained by the TOF camera 20 is used as a reference, so that when a certain pixel point on the left-eye image is matched, the reference parallax is calculated from the reference depth value of that pixel point, a required parallax range is preset around the reference parallax (for example, the upper and lower limits of the parallax range differ from the reference parallax by no more than 5 pixels), and cost calculation is then performed only on the pixel points of the right-eye image within this preset parallax range to obtain the matching point pair, instead of on all the pixel points in the right-eye image, which helps to greatly reduce the calculation amount of the matching process and improve the matching efficiency and the fusion precision.
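The idea can be sketched as follows (an illustrative Python snippet, not the specific cost function or parameters of the present invention): for each left-eye pixel that has a TOF reference depth Z, the reference parallax d = f·B/Z is computed from the focal length f and baseline B, and a simple SAD cost over a small window is evaluated only within ±5 pixels of that reference parallax; the SAD cost, window size and search range are assumptions for illustration.

```python
import numpy as np

def match_with_depth_prior(left, right, depth_ref, f, baseline, half_range=5, win=3):
    """Search only a narrow disparity band around the reference disparity
    d = f*B/Z for pixels that have a TOF reference depth."""
    h, w = left.shape
    disp = np.zeros((h, w), np.float32)
    pad = win // 2
    L = np.pad(left.astype(np.float32), pad, mode="edge")
    R = np.pad(right.astype(np.float32), pad, mode="edge")
    for v in range(h):
        for u in range(w):
            z = depth_ref[v, u]
            if z <= 0:
                continue                      # no reference depth: handled by normal matching
            d_ref = f * baseline / z          # reference disparity from the TOF depth
            best_d, best_c = 0.0, np.inf
            patch_l = L[v:v + win, u:u + win]
            for d in range(max(0, int(d_ref) - half_range), int(d_ref) + half_range + 1):
                if u - d < 0:
                    continue
                patch_r = R[v:v + win, u - d:u - d + win]
                c = np.abs(patch_l - patch_r).sum()   # SAD matching cost
                if c < best_c:
                    best_c, best_d = c, d
            disp[v, u] = best_d
    return disp
```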
It is worth mentioning that the resolutions of TOF cameras and binocular cameras on the market are typically very different, e.g. the resolution of a TOF camera is typically only 224 x 172 (i.e. fewer than 40,000 pixels), while the resolution of a binocular camera is often more than one million pixels, that is, more than five times the resolution of the TOF camera; as a result, the initial depth image obtained by the lower-resolution TOF camera cannot be directly used as a reference for fusing the initial left-eye image and the initial right-eye image obtained by the higher-resolution binocular camera. Thus, in the above embodiment of the present invention, as shown in fig. 3, the multi-vision sensor fusion system 30 of the multi-vision sensor fusion apparatus 1 further includes a preprocessing module 33, wherein the preprocessing module 33 is communicatively disposed between the acquisition module 31 and the fusion module 32, and is configured to preprocess the initial left-eye image, the initial right-eye image, and the initial depth image acquired by the acquisition module 31 to obtain a preprocessed left-eye image, a preprocessed right-eye image, and a preprocessed depth image, so that the resolution of the preprocessed depth image matches that of the preprocessed left-eye image and the preprocessed right-eye image, so as to facilitate the fusion processing by the fusion module 32.
Preferably, the resolution of the pre-processed left-eye image and the pre-processed right-eye image is equal to 2-3 times the resolution of the pre-processed depth image.
More specifically, as shown in fig. 3 and 5, the preprocessing module 33 of the multi-vision sensor fusion system 30 includes a median filtering module 331, a downsampling module 332, and a depth filtering module 333. The median filtering module 331 is communicatively connected to the acquisition module 31, and is configured to perform optimized denoising on the initial left-eye image and the initial right-eye image through median filtering, so as to obtain a filtered left-eye image and a filtered right-eye image and ensure the stability of the left-eye image and the right-eye image. The downsampling module 332 is communicatively connected to the median filtering module 331 and is configured to downsample the filtered left-eye image and the filtered right-eye image, respectively, to obtain a downsampled left-eye image and a downsampled right-eye image, such that the resolutions of the left-eye image and the right-eye image are reduced. The depth filtering module 333 is communicatively connected to the acquisition module 31 for filtering the initial depth image to obtain a filtered depth image. It will be appreciated that in one example of the invention, the downsampling module 332 is communicatively coupled with the fusion module 32 for transmitting the downsampled left eye image and the downsampled right eye image to the fusion module 32 as the preprocessed left eye image and the preprocessed right eye image; the depth filtering module 333 is communicably connected with the fusion module 32 for transmitting the filtered depth image to the fusion module 32 as the preprocessed depth image. In addition, the downsampling module 332 of the present invention may downsample the image by interpolation, e.g., interpolating 4 pixels into a single pixel, etc.
Notably, if the resolution of the initial left-eye image and the initial right-eye image is too high, then even after downsampling by the downsampling module 332 the resolution of the downsampled left-eye image and the downsampled right-eye image may still be much greater than the resolution of the filtered depth image (e.g., more than 3 times the resolution of the filtered depth image), so that the resolution of the preprocessed depth image cannot be matched to the resolution of the preprocessed left-eye image and the preprocessed right-eye image. Accordingly, as shown in fig. 1 and 5, the preprocessing module 33 of the multi-vision sensor fusion system 30 further comprises an upsampling module 334, wherein the upsampling module 334 is communicatively connected to the depth filtering module 333 for upsampling the filtered depth image to obtain an upsampled depth image, such that the resolution of the depth image is increased. The upsampling module 334 is further communicatively coupled to the fusion module 32 for transmitting the upsampled depth image to the fusion module 32 as the preprocessed depth image.
According to the above embodiment of the present invention, as shown in fig. 1 and 2, the multi-vision sensor fusion apparatus 1 further includes an RGB camera 40, wherein the relative position between the RGB camera 40 and the TOF camera 20 is fixed, so as to acquire color information of the scene to obtain an RGB image. The depth filtering module 333 of the preprocessing module 33 of the multi-vision sensor fusion system 30 is communicatively connected with the RGB camera 40 for optimizing the initial depth image by guided filtering with reference to the RGB image to obtain the filtered depth image. It will be appreciated that, due to the higher resolution of the RGB camera 40, the overall detail of the depth image can be optimized by referencing the RGB image to obtain a better optimized denoising effect.
Preferably, as shown in fig. 2, the RGB camera 40 is disposed between the left eye camera 11 and the right eye camera 12 of the binocular camera 10, and the RGB camera 40 and the TOF camera 20 are horizontally arranged side by side, that is, the optical center C40 of the RGB camera 40 and the optical center C20 of the TOF camera 20 are on the same horizontal line, which helps to ensure higher accuracy and avoid errors due to positional deviations. In other words, the optical centers C40 and C20 of the RGB camera 40 and the TOF camera 20 are both on the line connecting the optical center C11 of the left eye camera 11 and the optical center C12 of the right eye camera 12 of the binocular camera 10, which helps to reduce the calculation amount of the multi-vision sensor fusion device 1 in the calibration process.
Of course, in other examples of the present invention, if the space between the left eye camera 11 and the right eye camera 12 of the binocular camera 10 is too small for the TOF camera 20 and the RGB camera 40 to be placed side by side horizontally, the RGB camera 40 and the TOF camera 20 may be arranged side by side vertically, i.e., the optical center C40 of the RGB camera 40 and the optical center C20 of the TOF camera 20 lie on the same vertical line, so that the distance between the optical center C40 of the RGB camera 40 and the optical center C11 of the left eye camera 11 of the binocular camera 10 is equal to the distance between the optical center C40 of the RGB camera 40 and the optical center C12 of the right eye camera 12. In other words, the RGB camera is disposed between the left-eye camera and the right-eye camera of the binocular camera, and the line between the optical center C40 of the RGB camera 40 and the optical center C20 of the TOF camera 20 is perpendicular to the line between the optical center C11 of the left-eye camera 11 and the optical center C12 of the right-eye camera 12.
Notably, although the left-eye and right-eye images acquired by the left-eye camera 11 and the right-eye camera 12 of the binocular camera 10 may be implemented as grayscale or black-and-white images, in some examples of the present invention the left-eye and right-eye images acquired by the left-eye camera 11 and the right-eye camera 12 of the binocular camera 10 may also be implemented as RGB images, so that the multi-vision sensor fusion apparatus 1 can directly use the left-eye RGB image or the right-eye RGB image as the reference for the depth filtering module 333 of the preprocessing module 33 to optimize the initial depth image through guided filtering and obtain the filtered depth image, without additionally providing the RGB camera 40.
It should be noted that the difficulty of binocular matching lies in how the matching point pairs are calculated: the positional relationship between the left eye camera 11 and the right eye camera 12 is obtained by calculation, and from the properties of epipolar geometry it can only be determined that the corresponding point lies on the epipolar line, while the specific parallax cannot be determined, so that a traditional matching algorithm, whether global or semi-global, has a long overall matching time and a huge computational cost. Therefore, in order to solve this problem, the multi-vision sensor fusion apparatus 1 of the above-described embodiment of the present invention performs binocular matching (i.e., fusion processing) with the depth image obtained by the TOF camera 20 as a reference image, so as to reduce the matching difficulty.
Specifically, in the above-described embodiment of the present invention, as shown in fig. 3 and 5, the fusion module 32 of the multi-vision sensor fusion system 30 of the multi-vision sensor fusion apparatus 1 may include a distortion and epipolar correction module 321 and a binocular matching module 322, wherein the distortion and epipolar correction module 321 may be communicatively connected with the preprocessing module 33 for performing distortion and epipolar rectification on the preprocessed left-eye image and the preprocessed right-eye image to obtain corrected binocular images; the binocular matching module 322 is communicatively connected to the distortion and epipolar correction module 321 and the preprocessing module 33, respectively, and is configured to reference the preprocessed depth image and perform binocular stereo matching on the corrected binocular images, so as to obtain the initial binocular depth image as the fused depth image.
More specifically, as shown in fig. 3 and 5, the binocular matching module 322 of the fusion module 32 further includes a conversion module 3221 and a cost calculation module 3222, wherein the conversion module 3221 is configured to convert the preprocessed depth image into the coordinate system of the left-eye camera 11 according to the pose relationship between the TOF camera 20 and the left-eye camera 11, so as to obtain a left-eye depth reference image; the cost calculation module 3222 is configured to perform cost calculation on the pixel points with depth values in the left-eye depth reference image within the preset parallax range to perform point-pair matching, so as to obtain the initial binocular depth image, whereby the time required for matching can be greatly reduced while the drawbacks of the binocular camera are overcome. Of course, in other examples of the invention, the conversion module 3221 of the binocular matching module 322 may be configured to convert the preprocessed depth image into the coordinate system of the right-eye camera 12 according to the pose relationship between the TOF camera 20 and the right-eye camera 12 to obtain a right-eye depth reference image; the cost calculation module 3222 may then perform cost calculation on the pixel points with depth values in the right-eye depth reference image within the preset parallax range to perform point-pair matching, so as to obtain the initial binocular depth image.
It is noted that due to the limited detection distance (e.g., 1-3 m) of the TOF camera 20, the TOF camera 20 has better depth information only within the detection distance. Therefore, in the above embodiment of the present invention, the binocular matching module 322 of the fusion module 32 converts only the pixels with depth values smaller than the detection distance in the preprocessed depth image into the coordinate system of the left eye camera 11, so that only the depth information with depth values smaller than the detection distance in the preprocessed depth image is retained in the left eye depth reference image. In other words, the binocular matching module 322 refers to the depth image obtained by the TOF camera 20 when performing the point-to-point matching on the pixel points having the depth values smaller than the detection distance; and when the pixel points with the depth value not smaller than (greater than or equal to) the detection distance are subjected to point-to-point matching, the conventional binocular matching method is directly used without referring to the depth image obtained by the TOF camera 20, so as to ensure that the binocular depth image obtained at multiple distances has higher precision.
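An illustrative sketch of the conversion performed by the conversion module 3221 is given below: each TOF depth pixel within the detection distance is back-projected, transformed with the TOF-to-left-eye pose (R, t) and projected into the left-eye image to build the left-eye depth reference image. The intrinsic matrices K_tof and K_left, the 3 m detection distance and the function name are assumptions for illustration only.

```python
import numpy as np

def tof_depth_to_left(depth_tof, K_tof, K_left, R, t, left_shape, max_dist=3.0):
    """Build the left-eye depth reference image from the preprocessed TOF depth
    image; only pixels closer than the assumed TOF detection distance are converted."""
    h_t, w_t = depth_tof.shape
    ref = np.zeros(left_shape, np.float32)
    us, vs = np.meshgrid(np.arange(w_t), np.arange(h_t))
    z = depth_tof.astype(np.float32)
    valid = (z > 0) & (z < max_dist)              # keep only the reliable TOF range
    # Back-project valid TOF pixels to 3D points in the TOF camera frame
    x = (us - K_tof[0, 2]) * z / K_tof[0, 0]
    y = (vs - K_tof[1, 2]) * z / K_tof[1, 1]
    pts = np.stack([x, y, z], axis=-1)[valid]
    # Transform into the left-eye frame and project with the left-eye intrinsics
    pts_l = pts @ R.T + t.reshape(1, 3)
    u_l = np.round(pts_l[:, 0] / pts_l[:, 2] * K_left[0, 0] + K_left[0, 2]).astype(int)
    v_l = np.round(pts_l[:, 1] / pts_l[:, 2] * K_left[1, 1] + K_left[1, 2]).astype(int)
    inside = (u_l >= 0) & (u_l < left_shape[1]) & (v_l >= 0) & (v_l < left_shape[0])
    ref[v_l[inside], u_l[inside]] = pts_l[inside, 2]
    return ref
```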
It should be noted that, since the binocular camera 10 suffers from the occlusion problem, mismatching points may exist in the initial binocular depth image obtained by the binocular camera 10. In order to solve this problem, as shown in fig. 3 and 5, the multi-vision sensor fusion system 30 of the multi-vision sensor fusion device 1 according to the above embodiment of the present invention further includes a post-processing module 34, where the post-processing module 34 is communicatively connected to the fusion module 32, and is configured to post-process the fused depth image to obtain a dense and highly accurate depth image.
Specifically, as shown in fig. 3 and 5, the post-processing module 34 includes a left-right consistency detection module 341 and a mismatch point filling module 342 that are communicatively connected to each other, where the left-right consistency detection module 341 is configured to perform left-right consistency detection on the initial binocular depth image to obtain a mismatch point existing in the initial binocular depth image; the mismatching point filling module 342 is configured to perform filling processing on depth information of the mismatching point to reject the mismatching point.
Preferably, when the depth of the mismatching point is smaller than the detection distance of the TOF camera 20, the mismatching point filling module 342 fills the depth information of the corresponding pixel point in the preprocessed depth image into the mismatching point; when the depth of the mismatching point is not smaller than the detection distance of the TOF camera 20, the mismatching point filling module 342 does not fill the mismatching point to reject the mismatching point. Of course, in other examples of the present invention, the mismatching point filling module 342 may fill the mismatching point by using a horizontal line search method when the depth of the mismatching point is not less than the detection distance of the TOF camera 20.
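An illustrative sketch of the left-right consistency detection and the two filling strategies is given below; the consistency tolerance, the assumed detection distance and the particular horizontal-line search (taking the nearest valid disparity to the left on the same row) are assumptions for illustration, not the specific rules of the invention.

```python
import numpy as np

def lr_check_and_fill(disp_l, disp_r, depth_ref, f, baseline, max_dist=3.0, tol=1.0):
    """Left-right consistency detection followed by the two filling strategies."""
    h, w = disp_l.shape
    out = disp_l.copy()
    for v in range(h):
        for u in range(w):
            d = int(round(disp_l[v, u]))
            if d <= 0 or u - d < 0:
                continue
            # Consistent if the matched right pixel reports (nearly) the same disparity
            if abs(disp_l[v, u] - disp_r[v, u - d]) <= tol:
                continue
            z = depth_ref[v, u]
            if 0 < z < max_dist:
                out[v, u] = f * baseline / z      # fill from the TOF reference depth
            else:
                # Horizontal-line search: nearest valid disparity on the same row
                left_part = disp_l[v, :u][::-1]
                cand = left_part[left_part > 0]
                out[v, u] = cand[0] if cand.size else 0
    return out
```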
It should be noted that, when the mismatching point existing in the initial binocular depth image cannot correspond to the pixel point on the preprocessed depth image (i.e., the mismatching point existing in the initial binocular depth image does not have the depth information of the reference), the mismatching point filling module 342 may also perform filling matching through the depth information of the neighborhood. If there is no depth information in the larger connected domain, the mismatching point filling module 342 may perform filling matching on the parallax value outside the detection distance of the TOF camera 20 through a conversion formula, so as to obtain a filled binocular depth image, so that the binocular depth image has good depth information on the object outside the detection distance of the TOF camera 20, which is conducive to combining the advantages of the TOF camera and the binocular camera.
Further, as shown in fig. 3 and 5, the post-processing module 34 of the multi-vision sensor fusion system 30 further includes a spurious point filtering module 343, where the spurious point filtering module 343 is communicatively connected to the mismatching point filling module 342, and is configured to remove regions with large variation and small connected area by performing connected-domain judgment on the filled binocular depth image, so as to obtain a binocular depth image with the spurious points filtered out.
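One possible realization of such connected-domain filtering, sketched with OpenCV's connectedComponentsWithStats, is shown below; the minimum-area threshold is an assumed value, not one specified in this disclosure.

```python
import cv2
import numpy as np

def filter_spurious_points(disparity, min_area=200):
    """Drop small isolated regions of the disparity/depth map."""
    mask = (disparity > 0).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    cleaned = disparity.copy()
    for label in range(1, num):  # label 0 is the background
        if stats[label, cv2.CC_STAT_AREA] < min_area:
            cleaned[labels == label] = 0  # remove the small connected region
    return cleaned
```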
Still further, in the above embodiment of the present invention, as shown in fig. 3 and 5, the post-processing module 34 of the multi-vision sensor fusion system 30 further includes a staining module 344, wherein the staining module 344 is communicatively connected to the spurious point filtering module 343 and the obtaining module 31, respectively, and is configured to stain the binocular depth image with the spurious points filtered according to the RGB image, so as to fill color information into the binocular depth image to obtain a color depth image. In this way, the multi-vision sensor fusion apparatus 1 can obtain a depth image having good accuracy and texture information.
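The colorization ("staining") step could, for example, be realized by projecting each depth pixel into the RGB camera using the calibrated extrinsics and sampling the color there. The sketch below assumes intrinsic matrices K_depth and K_rgb and an extrinsic pose (R, t) from the depth frame to the RGB frame; it is illustrative only.

```python
import numpy as np

def colorize_depth(depth, rgb, K_depth, K_rgb, R, t):
    """Attach an RGB color to every valid depth pixel."""
    h, w = depth.shape
    color_of_depth = np.zeros((h, w, 3), rgb.dtype)
    fx, fy, cx, cy = K_depth[0, 0], K_depth[1, 1], K_depth[0, 2], K_depth[1, 2]
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z <= 0:
                continue
            # back-project to the depth-camera frame, then move to the RGB frame
            p = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
            q = R @ p + t
            if q[2] <= 0:
                continue
            uv = K_rgb @ (q / q[2])
            ur, vr = int(round(uv[0])), int(round(uv[1]))
            if 0 <= ur < rgb.shape[1] and 0 <= vr < rgb.shape[0]:
                color_of_depth[v, u] = rgb[vr, ur]  # sample the RGB color
    return color_of_depth
```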
According to the above embodiment of the present invention, the multi-vision sensor fusion system 30 generally uses the pose relationships (i.e., extrinsic parameters) between the various vision sensors in the multi-vision sensor fusion device 1 during preprocessing (such as guided filtering based on the RGB image), fusion (such as binocular matching based on the depth image) and/or post-processing (such as filling of mismatching points or dyeing based on the RGB image), for example: the pose relationship between the left eye camera 11 and the right eye camera 12 of the binocular camera 10; the pose relationship between the TOF camera 20 and the left eye camera 11 (or the right eye camera 12); the pose relationship between the RGB camera 40 and the TOF camera 20, and so on. Thus, in the above-described embodiments of the present invention, as shown in FIGS. 1-3, the multi-vision sensor fusion system 30 may further include a calibration module 35, wherein the calibration module 35 is communicatively connected to the acquisition module 31, and is configured to calibrate the binocular camera 10, the TOF camera 20 and the RGB camera 40 in the multi-vision sensor fusion apparatus 1 by means of a target unit 50, so as to obtain the pose relationship between the left-eye camera 11 and the right-eye camera 12, the pose relationship between the TOF camera 20 and the left-eye camera 11, and the pose relationship between the RGB camera 40 and the TOF camera 20, respectively.
Preferably, as shown in fig. 2 and 4, the target unit 50 is implemented as a stereoscopic calibration plate 51, wherein the stereoscopic calibration plate 51 comprises a first calibration panel 511, a second calibration panel 512 and a third calibration panel 513, wherein the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are arranged edge to edge with each other, and the calibration surfaces of the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are not coplanar, such that the stereoscopic calibration plate 51 forms a three-sided calibration plate. Thus, when the stereoscopic calibration plate 51 is properly placed in the common field of view of the binocular camera 10, the TOF camera 20 and the RGB camera 40 of the multi-vision sensor fusion device 1, so that each of these cameras can capture the calibration surfaces of the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513, the stereoscopic calibration plate 51 does not need to be moved when calibrating the plurality of vision sensors of the multi-vision sensor fusion device 1. The binocular camera 10, the TOF camera 20 and the RGB camera 40 can each acquire images of the three calibration surfaces in a single capture, so that the pose relationships between them can be obtained through the calibration module 35. It will be appreciated that, in other examples of the invention, the target unit 50 may also be implemented as a planar calibration plate, in which case three or more calibration images are obtained by moving the planar calibration plate when calibrating the plurality of vision sensors of the multi-vision sensor fusion device 1.
Preferably, as shown in fig. 4, the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 have a common corner point as the vertex of the stereoscopic calibration plate 51. In this way, when the stereoscopic calibration plate 51 is placed, it is only necessary to orient the vertex of the stereoscopic calibration plate 51 toward the TOF camera 20 of the multi-vision sensor fusion device 1, so as to ensure that the binocular camera 10, the TOF camera 20 and the RGB camera 40 can each acquire the three calibration surfaces in one shot.
More preferably, as shown in fig. 4, the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are perpendicular to each other, which helps to further simplify the calibration algorithm of the multi-vision sensor fusion apparatus 1.
It should be noted that, because the resolution of the TOF camera 20 is low, strict requirements are placed on how recognizable and how easily extractable the patterns on the calibration plate are. Therefore, in order to obtain an accurate spatial pose relationship, in the above embodiment of the present invention, as shown in fig. 4, the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 of the stereoscopic calibration plate 51 are all implemented as circular calibration plates; that is, the calibration patterns on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are patterns with arrays of circular marks, so that the stereoscopic calibration plate 51 is highly recognizable and its marks are easy to extract, allowing a more accurate pose relationship to be obtained through calibration.
Illustratively, as shown in fig. 4, the calibration patterns on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 may be black patterns on a white background; in other words, the circular marks on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are black, and the background is white. Of course, in other examples of the present invention, the calibration patterns on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 may be white patterns on a black background; in other words, the circular marks on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are white, and the background is black.
Further, as shown in fig. 4, the overlapping edges of the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are black, so that the boundaries of the three calibration surfaces can be distinguished, which facilitates the segmentation of the three calibration images and simplifies the subsequent calibration calculation.
According to the above embodiment of the present invention, as shown in fig. 3, the calibration module 35 of the multi-vision sensor fusion system 30 includes a segmentation module 351, a classification module 352, a sorting module 353 and a calibration algorithm module 354 which are sequentially and communicatively connected. The segmentation module 351 segments the whole calibration image, obtained by synchronously photographing the stereoscopic calibration plate 51 with the binocular camera 10, the TOF camera 20 and the RGB camera 40, into three segmented calibration images, such that the segmented calibration images correspond to the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 of the stereoscopic calibration plate 51, respectively. The classification module 352 is configured to classify the circular marks in the segmented calibration images according to the calibration plane on which they are located, so that the circular marks on the same calibration plane are grouped into the same class. The sorting module 353 is configured to sort the center coordinates of each class of the circular marks to obtain sorted center coordinate data. The calibration algorithm module 354 is configured to perform calibration calculation on the sorted center coordinate data to obtain the pose relationships between all the vision sensors in the multi-vision sensor fusion device 1. It will be appreciated that, when calibrating all of the vision sensors in the multi-vision sensor fusion device 1, the calibration images obtained by all of the vision sensors must remain frame-synchronized.
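By way of illustration only, a simplified calibration flow for one camera observing the three circle-grid panels might look as follows; cv2.findCirclesGrid and cv2.solvePnP are standard OpenCV calls, while the per-panel object points and grid size are assumptions about the physical target, not values from this disclosure.

```python
import cv2
import numpy as np

def pose_from_panels(panel_images, object_points_per_panel, grid_size, K, dist):
    """Estimate the target pose seen by one camera from its three segmented panel regions."""
    img_pts, obj_pts = [], []
    for img, obj in zip(panel_images, object_points_per_panel):
        # detect the circle-grid centers on this panel (returned in grid order)
        found, centers = cv2.findCirclesGrid(img, grid_size,
                                             flags=cv2.CALIB_CB_SYMMETRIC_GRID)
        if not found:
            continue
        img_pts.append(centers.reshape(-1, 2))
        obj_pts.append(obj.reshape(-1, 3))
    obj_all = np.vstack(obj_pts).astype(np.float32)
    img_all = np.vstack(img_pts).astype(np.float32)
    # single-view pose of the stereoscopic target in this camera's frame
    ok, rvec, tvec = cv2.solvePnP(obj_all, img_all, K, dist)
    return rvec, tvec

# The relative pose between two cameras that observe the same static target
# follows as T_cam2_cam1 = T_cam2_target @ inv(T_cam1_target).
```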
Illustratively, in an example of the present invention, the calibration algorithm module 354 may first calibrate the left eye camera 11 and the right eye camera 12 of the binocular camera 10, accurately obtaining the pose relationship between the left eye camera 11 and the right eye camera 12 through an optimized calibration algorithm; next, the calibration algorithm module 354 may calibrate the TOF camera 20 and the RGB camera 40 to obtain the pose relationship between the TOF camera 20 and the RGB camera 40; finally, the calibration algorithm module 354 may calibrate the left eye camera 11 and the TOF camera 20 to obtain the pose relationship between the left eye camera 11 and the TOF camera 20, so that the pose relationships between all the vision sensors in the multi-vision sensor fusion device 1 are obtained.
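The pairwise poses obtained in this way can be chained into any further pose that is needed. The sketch below illustrates such composition with 4x4 homogeneous transforms; the numeric translations and the chaining order are illustrative assumptions, not values from this disclosure.

```python
import numpy as np

def to_homogeneous(R, t):
    # pack a rotation matrix and translation vector into a 4x4 transform
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).ravel()
    return T

# pairwise poses measured during calibration (placeholder values)
T_left_from_tof = to_homogeneous(np.eye(3), [0.03, 0.0, 0.0])   # TOF frame -> left-eye frame
T_rgb_from_tof  = to_homogeneous(np.eye(3), [-0.02, 0.0, 0.0])  # TOF frame -> RGB frame

# a pose that was not measured directly, e.g. RGB frame -> left-eye frame,
# follows by composing through the TOF camera
T_left_from_rgb = T_left_from_tof @ np.linalg.inv(T_rgb_from_tof)
```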
Schematic method
Referring to fig. 6 to 10 of the drawings, a multi-vision sensor fusion method according to an embodiment of the present invention is illustrated. Specifically, as shown in fig. 6, the multi-vision sensor fusion method includes the steps of:
S200: acquiring an initial left-eye image and an initial right-eye image obtained by acquiring image information of a scene by a left-eye camera 11 and a right-eye camera 12 of a binocular camera 10, and acquiring an initial depth image obtained by synchronously acquiring depth information of the scene by a TOF camera 20, wherein the relative positions of the TOF camera 20 and the left-eye camera 11 and the right-eye camera 12 are fixed;
S300: preprocessing the initial depth image, the initial left-eye image and the initial right-eye image respectively to obtain a preprocessed depth image, a preprocessed left-eye image and a preprocessed right-eye image; and
S400: and taking the preprocessed depth image as a reference, and carrying out fusion processing on the preprocessed left-eye image and the preprocessed right-eye image within a preset parallax range to obtain a fused depth image.
It will be appreciated that in other examples of the present invention, the multi-vision sensor fusion method may not include the step S300, so that in the step S400, the initial left-eye image and the initial right-eye image are directly processed with reference to the initial depth image, so as to obtain the fused depth image.
Notably, in the above embodiment of the present invention, the step S200 of the multi-vision sensor fusion method may further include the step of: acquiring color information of the scene captured by an RGB camera 40 to obtain an RGB image, wherein the relative position between the RGB camera 40 and the TOF camera 20 is fixed.
In an example of the present invention, as shown in fig. 7, the step S300 of the multi-vision sensor fusion method may include the steps of:
S310: respectively carrying out optimized denoising on the initial left-eye image and the initial right-eye image through median filtering to obtain a filtered left-eye image and a filtered right-eye image; and
S320: and respectively downsampling the filtered left-eye image and the filtered right-eye image to obtain the preprocessed left-eye image and the preprocessed right-eye image.
Further, in the above example of the present invention, as shown in fig. 7, the step S300 of the multi-vision sensor fusion method may further include the steps of:
s330: optimizing the initial depth information by guiding filtering with reference to the RGB image to obtain a filtered depth image; and
S340: and upsampling the filtered depth image to obtain an upsampled depth image such that the upsampled depth image is used as the preprocessed depth image.
Preferably, the resolution of the preprocessed left-eye image and the preprocessed right-eye image is 2-3 times the resolution of the preprocessed depth image.
It is understood that the step S310 and the step S330 are not limited to a particular order; that is, the step S310 may be performed before or after the step S330, or simultaneously with the step S330.
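A minimal sketch of steps S310-S340 using OpenCV is given below (cv2.ximgproc.guidedFilter requires the opencv-contrib package); the kernel size, the scaling factors and the guided-filter parameters are illustrative assumptions.

```python
import cv2

def preprocess(left, right, tof_depth, rgb_guidance):
    # S310: median filtering to denoise the binocular pair
    left_f = cv2.medianBlur(left, 5)
    right_f = cv2.medianBlur(right, 5)
    # S320: downsample the binocular pair (here to half resolution)
    left_p = cv2.pyrDown(left_f)
    right_p = cv2.pyrDown(right_f)
    # S330: guided filtering of the TOF depth, guided by the resized RGB image
    guide = cv2.resize(rgb_guidance, (tof_depth.shape[1], tof_depth.shape[0]))
    depth_f = cv2.ximgproc.guidedFilter(guide, tof_depth, 8, 1e-2)
    # S340: upsample the filtered depth so that the binocular resolution stays
    # roughly 2-3 times the depth resolution, as suggested above
    depth_p = cv2.resize(depth_f, (left_p.shape[1] // 2, left_p.shape[0] // 2),
                         interpolation=cv2.INTER_LINEAR)
    return left_p, right_p, depth_p
```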
It should be noted that, in an example of the present invention, as shown in fig. 8, the step S400 of the multi-vision sensor fusion method may include the steps of:
S410: carrying out polar distortion correction on the preprocessed left-eye image and the preprocessed right-eye image to obtain corrected binocular images; and
S420: and performing binocular stereo matching on the corrected binocular image within the preset parallax range by referring to the preprocessed depth image to obtain an initial binocular depth image as the fused depth image.
Further, in this example of the present invention, as shown in fig. 8, the step S420 includes the steps of:
S421: converting the preprocessed depth image into a coordinate system of the left-eye camera 11 according to the pose relationship between the TOF camera 20 and the left-eye camera 11 so as to obtain a left-eye depth reference image; and
S422: and carrying out cost calculation on pixel points with depth values in the left-eye depth reference image in the preset parallax range so as to carry out point-to-point matching, so as to obtain the initial binocular depth image.
It is noted that, in the step S421, only the pixel points in the preprocessed depth image whose depth values are smaller than the detection distance of the TOF camera are converted into the coordinate system of the left-eye camera, so that only the depth information with depth values smaller than the detection distance is retained in the left-eye depth reference image.
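For illustration, step S421 could be realized by back-projecting each TOF depth sample and re-projecting it into the left-eye camera, keeping only samples inside the detection distance. The intrinsics K_tof and K_left and the extrinsics (R, t) below are assumed to come from the calibration of step S100; this is a sketch, not the claimed implementation.

```python
import numpy as np

def tof_depth_to_left_reference(tof_depth, K_tof, K_left, R, t,
                                left_shape, max_tof_range_m=3.0):
    """Build a sparse left-eye depth reference from the TOF depth map."""
    ref = np.zeros(left_shape, np.float32)
    h, w = tof_depth.shape
    fx, fy, cx, cy = K_tof[0, 0], K_tof[1, 1], K_tof[0, 2], K_tof[1, 2]
    for v in range(h):
        for u in range(w):
            z = tof_depth[v, u]
            if not (0 < z < max_tof_range_m):
                continue  # only keep depths inside the reliable TOF range
            p_tof = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
            p_left = R @ p_tof + t          # into the left-eye camera frame
            if p_left[2] <= 0:
                continue
            uv = K_left @ (p_left / p_left[2])
            ul, vl = int(round(uv[0])), int(round(uv[1]))
            if 0 <= ul < left_shape[1] and 0 <= vl < left_shape[0]:
                ref[vl, ul] = p_left[2]     # left-eye depth reference value
    return ref
```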
It should be noted that, in the above embodiment of the present invention, as shown in fig. 6, the multi-vision sensor fusion method further includes the steps of:
s500: and carrying out post-processing on the fused depth image to obtain a final depth image, so that the final depth image is a dense depth image with higher precision.
In an example of the present invention, as shown in fig. 9, the step S500 of the multi-vision sensor fusion method includes the steps of:
S510: Performing left-right consistency detection on the initial binocular depth image to obtain mismatching points in the initial binocular depth image;
S520: And filling the depth information of the mismatching points to remove the mismatching points, and obtaining a filled binocular depth image.
It is noted that, in the step S520 of the above example of the present invention: when the depth of the mismatching point is smaller than the detection distance of the TOF camera 20, filling the depth information of the corresponding pixel point in the preprocessed depth image into the mismatching point; when the depth of the mismatching point is not smaller than the detection distance of the TOF camera 20, filling the mismatching point by adopting a horizontal line searching method; and when the mismatching point cannot correspond to any pixel point on the preprocessed depth image, filling the mismatching point through the depth information of the neighborhood.
In the above example of the present invention, preferably, as shown in fig. 9, the step S500 of the multi-vision sensor fusion method may further include the steps of:
S530: and (3) through carrying out connected domain vending judgment on the filled binocular depth image, removing the region with larger change and smaller connected space, so as to obtain the binocular depth image with the stray points filtered.
More preferably, as shown in fig. 9, the step S500 of the multi-vision sensor fusion method may further include the steps of:
S540: and based on the RGB image, dyeing the binocular depth image with the stray points filtered so as to fill color information in the RGB image into the binocular depth image, so as to obtain the color depth image.
According to the above embodiment of the present invention, as shown in fig. 6, the multi-vision sensor fusion method further includes, before the step S200:
S100: the binocular camera 10, the TOF camera 20 and the RGB camera 40 are calibrated by a calibration unit 50 to obtain the pose relationship of the TOF camera 20, the RGB camera 40 and the left-eye camera 11 and the right-eye camera 12 of the binocular camera 10.
It is noted that, in the step S100 of the multi-vision sensor fusion method of the present invention, the target unit 50 is implemented as a stereoscopic calibration plate 51, wherein the stereoscopic calibration plate 51 includes a first calibration panel 511, a second calibration panel 512 and a third calibration panel 513, wherein the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are disposed edge to edge with each other, and the calibration surfaces of the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are not coplanar.
Preferably, the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 have a common corner point.
More preferably, the calibration patterns on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are patterns having a circular mark array.
Most preferably, the circular marks on the first, second and third calibration panels 511, 512 and 513 are black, and overlapping edges of the first, second and third calibration panels 511, 512 and 513 are black.
It should be noted that, in an example of the present invention, as shown in fig. 10, the step S100 of the multi-vision sensor fusion method includes the steps of:
S110: dividing the whole calibration image obtained by synchronously photographing the stereoscopic calibration plate 51 by the binocular camera 10, the TOF camera 20 and the RGB camera 40, respectively, to obtain three divided calibration images, respectively, such that the divided calibration images correspond to the first calibration panel 511 and the third calibration panel 511 of the stereoscopic calibration plate 51, respectively
A second calibration panel 512 and the third calibration panel 513;
S120: classifying the surfaces of the circular marks in the divided calibration images so that the circular marks on the same calibration surface are classified into the same class;
s130: sequencing the circle center coordinates of each type of the circular marks to obtain sequenced circle center coordinate data; and
S140: calibration calculation is performed based on the sequenced circle center coordinate data to obtain a pose relationship between the left eye camera 11 and the right eye camera 12, a pose relationship between the TOF camera 20 and the left eye camera 11, and a pose relationship between the RGB camera 40 and the TOF camera 20.
Schematic electronic device
Next, an electronic device according to an embodiment of the present invention is described with reference to fig. 11 (fig. 11 shows a block diagram of the electronic device according to an embodiment of the present invention). As shown in fig. 11, the electronic device 60 includes one or more processors 61 and memory 62.
The processor 61 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 60 to perform desired functions.
The memory 62 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer readable storage medium, which may be executed by the processor 61 to implement the methods of the various embodiments of the present invention described above and/or other desired functions.
In one example, as shown in fig. 11, the electronic device 60 may further include: an input device 63 and an output device 64, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
For example, the input device 63 may be, for example, a camera module or the like for capturing image data or video data.
The output device 64 may output various information including the classification result and the like to the outside. The output means 64 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device 60 that are relevant to the present invention are shown in fig. 11 for simplicity, components such as buses, input/output interfaces, etc. are omitted. In addition, the electronic device 60 may include any other suitable components depending on the particular application.
Illustrative computer program product
In addition to the methods and apparatus described above, embodiments of the invention may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in a method according to various embodiments of the invention described in the "exemplary methods" section of this specification.
The computer program product may write program code for performing the operations of embodiments of the present invention in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the C programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present invention may also be a computer readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform the steps of the method described above in the present specification.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present invention have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present invention are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be construed as necessarily possessed by the various embodiments of the invention. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the invention is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, equipment and systems referred to in the present invention are only illustrative examples and are not intended to require or imply that the connections, arrangements and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, equipment and systems may be connected, arranged and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to" and are used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatus, devices and methods of the present invention, the components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent aspects of the present invention.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples and embodiments of the invention may be modified or practiced without departing from the principles described.

Claims (37)

1. A multi-vision sensor fusion device, comprising:
A binocular camera, wherein the binocular camera comprises a left-eye camera and a right-eye camera which are arranged at intervals, wherein the left-eye camera is used for acquiring image information of a scene to obtain an initial left-eye image, and the right-eye camera is used for synchronously acquiring the image information of the scene to obtain an initial right-eye image;
a TOF camera, wherein the relative position between the TOF camera and the binocular camera is fixed, and the TOF camera is used for synchronously acquiring depth information of the scene to obtain an initial depth image; and
A multi-vision sensor fusion system, wherein the multi-vision sensor fusion system is respectively connected with the binocular camera and the TOF camera in a communication way and is used for processing the initial left-eye image and the initial right-eye image by taking the initial depth image as a reference so as to obtain a fused depth image;
The multi-vision sensor fusion system comprises an acquisition module and a fusion module which are mutually connected in a communication mode, wherein the acquisition module is respectively connected with the binocular camera and the TOF camera in a communication mode and is used for acquiring the initial left eye image, the initial right eye image and the initial depth image; the fusion module is used for calculating reference parallax by taking the depth value of a pixel point in the initial depth image as a reference, presetting a required parallax range by taking the reference parallax as a center, and further carrying out point pair matching on the initial left-eye image and the initial right-eye image in the preset parallax range to obtain an initial binocular depth image, so that the initial binocular depth image is used as the fused depth image.
2. The multi-vision sensor fusion apparatus of claim 1, further comprising an RGB camera, wherein a relative position between the RGB camera and the TOF camera is fixed for synchronously capturing color information of the scene to obtain RGB images.
3. The multi-vision sensor fusion device of claim 2, wherein the TOF camera is disposed between the left-eye camera and the right-eye camera of the binocular camera, and an optical center of the TOF camera is located on a line between the optical centers of the left-eye camera and the right-eye camera.
4. A multi-vision sensor fusion apparatus as defined in claim 3, wherein the RGB camera is disposed between the left-eye camera and the right-eye camera of the binocular camera, and an optical center of the RGB camera is located on a line between the optical centers of the left-eye camera and the right-eye camera.
5. A multi-vision sensor fusion apparatus as defined in claim 3, wherein the RGB camera is disposed between the left-eye camera and the right-eye camera of the binocular camera, and a line between the optical centers of the RGB camera and the TOF camera is perpendicular to a line between the optical centers of the left-eye camera and the right-eye camera.
6. The multi-vision sensor fusion apparatus of any one of claims 2-5, wherein the acquisition module is further configured to acquire the RGB image.
7. The multi-vision sensor fusion apparatus of claim 6, wherein the multi-vision sensor fusion system further comprises a preprocessing module, wherein the preprocessing module is communicatively disposed between the acquisition module and the fusion module, for preprocessing the initial left-eye image, the initial right-eye image, and the initial depth image acquired via the acquisition module to obtain a preprocessed left-eye image, a preprocessed right-eye image, and a preprocessed depth image, such that the fusion module is configured to perform point-to-point matching on the preprocessed left-eye image and the preprocessed right-eye image within the preset parallax range with depth values of pixels in the preprocessed depth image as references, to obtain the initial binocular depth image.
8. The multi-vision sensor fusion apparatus of claim 7, wherein the preprocessing module comprises a median filtering module, a downsampling module, and a depth filtering module, wherein the median filtering module is communicatively coupled to the acquisition module for optimally denoising the initial left-eye image and the initial right-eye image, respectively, by median filtering to obtain a filtered left-eye image and a filtered right-eye image; the downsampling module is in communication connection with the median filtering module and is used for downsampling the filtered left-eye image and the filtered right-eye image respectively to obtain the preprocessed left-eye image and the preprocessed right-eye image; the depth filtering module is communicatively connected with the acquisition module and is used for filtering the initial depth image by referring to the RGB image through guided filtering so as to obtain a filtered depth image, so that the filtered depth image is directly used as the preprocessed depth image.
9. The multi-vision sensor fusion apparatus of claim 8, wherein the preprocessing module further comprises an upsampling module, wherein the upsampling module is communicatively coupled to the depth filtering module for upsampling the filtered depth image to obtain an upsampled depth image such that the upsampled depth image is used as the preprocessed depth image.
10. The multi-vision sensor fusion apparatus of claim 9, wherein the fusion module comprises a epipolar line correction module and a binocular matching module communicatively coupled to each other, wherein the epipolar line correction module is communicatively coupled to the preprocessing module for epipolar line correction of the preprocessed left-eye image and the preprocessed right-eye image to obtain corrected binocular images; the binocular matching module is communicatively connected with the preprocessing module and is used for referencing the preprocessed depth image and performing binocular stereo matching on the corrected binocular image so as to obtain the initial binocular depth image.
11. The multi-vision sensor fusion device of claim 10, wherein the binocular matching module comprises a conversion module and a cost calculation module which are communicatively connected with each other, wherein the conversion module is configured to convert the preprocessed depth image into a coordinate system of the left-eye camera according to a pose relationship between the TOF camera and the left-eye camera so as to obtain a left-eye depth reference image, and the cost calculation module is configured to perform cost calculation on pixels with depth values in the left-eye depth reference image within the preset parallax range so as to perform point-to-point matching, thereby obtaining the initial binocular depth image.
12. The multi-vision sensor fusion apparatus of claim 11, wherein the conversion module of the binocular matching module is configured to convert only pixels in the preprocessed depth image having depth values less than the detection distance of the TOF camera into the coordinate system of the left-eye camera, so that only depth information in the preprocessed depth image having depth values less than the detection distance is retained in the left-eye depth reference image.
13. The multi-vision sensor fusion apparatus of claim 7, wherein the multi-vision sensor fusion system further comprises a post-processing module, wherein the post-processing module is communicatively coupled to the fusion module for post-processing the initial binocular depth image to obtain a final depth image.
14. The multi-vision sensor fusion apparatus of claim 13, wherein the post-processing module comprises a left-right consistency detection module and a mismatching point filling module communicatively coupled to each other, wherein the left-right consistency detection module is configured to perform a left-right consistency detection on the initial binocular depth image to obtain mismatching points present in the initial binocular depth image; the mismatching point filling module is used for filling the depth information of the mismatching points so as to eliminate the mismatching points.
15. The multi-vision sensor fusion apparatus of claim 14, wherein the mismatching point filling module is further configured to fill depth information of a corresponding pixel point in the preprocessed depth image to the mismatching point when a depth of the mismatching point is less than a detection distance of the TOF camera; and when the depth of the mismatching point is not smaller than the detection distance of the TOF camera, filling the mismatching point by adopting a horizontal line searching method.
16. The multi-vision sensor fusion device of claim 15, wherein the post-processing module further comprises a spurious point filtering module, wherein the spurious point filtering module is communicatively connected with the mismatching point filling module, and is configured to remove a region with a larger variation and a smaller connected space by performing connected domain judgment on the filled binocular depth image, so as to obtain the binocular depth image with spurious points filtered.
17. The multi-vision sensor fusion apparatus of claim 16, wherein the post-processing module further comprises a staining module, wherein the staining module is communicatively coupled to the spurious point filtering module and the acquisition module, respectively, for staining the spurious point filtered binocular depth image based on the RGB image to obtain a color depth image.
18. The multi-vision sensor fusion apparatus of any one of claims 2 to 5, wherein the multi-vision sensor fusion system further comprises a calibration module, wherein the calibration module is configured to calibrate the binocular camera, the TOF camera, and the RGB camera by a target unit to obtain pose relationships between the left-eye camera and the right-eye camera, the pose relationships between the TOF camera and the left-eye camera, and the pose relationships between the RGB camera and the TOF camera, respectively.
19. The multi-vision sensor fusion device of claim 18, wherein the target unit is implemented as a three-dimensional calibration plate, wherein the three-dimensional calibration plate comprises a first calibration panel, a second calibration panel, and a third calibration panel, wherein the first calibration panel, the second calibration panel, and the third calibration panel are disposed edge-to-edge with each other, and wherein the calibration surfaces of the first calibration panel, the second calibration panel, and the third calibration panel are non-coplanar such that the three-dimensional calibration plate forms a three-sided calibration plate.
20. The multi-vision sensor fusion apparatus of claim 19, wherein the first calibration panel, the second calibration panel, and the third calibration panel have a common corner point as a vertex of the three-dimensional calibration plate.
21. The multi-vision sensor fusion device of claim 20, wherein the calibration patterns on the first, second, and third calibration panels are patterns having a circular array of indicia.
22. The multi-vision sensor fusion device of claim 21, wherein circular markers in the circular marker array on the calibration pattern are black and background is white; the overlapping edges of the first calibration panel, the second calibration panel and the third calibration panel are black.
23. The multi-vision sensor fusion apparatus of claim 22, wherein the calibration module comprises a segmentation module, a classification module, a sorting module, and a calibration algorithm module, which are communicatively connected in sequence, wherein the segmentation module is configured to segment an entire calibration image obtained by synchronously capturing the stereoscopic calibration plate by the binocular camera, the TOF camera, and the RGB camera, respectively, to obtain three segmented calibration images, respectively, such that the segmented calibration images correspond to the first calibration panel, the second calibration panel, and the third calibration panel of the stereoscopic calibration plate, respectively; the classifying module is used for classifying the surfaces of the circular marks in the segmented calibration images so as to divide the circular marks on the same calibration surface into the same class; the sorting module is used for sorting the circle center coordinates of each type of the circular marks to obtain sorted circle center coordinate data; the calibration algorithm module is used for performing calibration calculation on the sequenced circle center coordinate data to obtain a required pose relation.
24. A multi-vision sensor fusion method, comprising the steps of:
S200: acquiring an initial left-eye image and an initial right-eye image which are obtained by acquiring image information of a scene through a left-eye camera and a right-eye camera of a binocular camera, and acquiring an initial depth image which is obtained by synchronously acquiring depth information of the scene through a TOF camera, wherein the relative positions of the TOF camera and the binocular camera are fixed;
s300: preprocessing the initial depth image, the initial left-eye image and the initial right-eye image respectively to obtain a preprocessed depth image, a preprocessed left-eye image and a preprocessed right-eye image; and
S400: and calculating a reference parallax by taking the preprocessed depth image as a reference, presetting a required parallax range by taking the reference parallax as a center, and further carrying out fusion processing on the preprocessed left-eye image and the preprocessed right-eye image within the preset parallax range to obtain a fused depth image.
25. The multi-vision sensor fusion method of claim 24, wherein the step S200 further comprises the steps of: color information of the scene acquired by an RGB camera is acquired to obtain an RGB image, wherein the relative position between the RGB camera and the TOF camera is fixed.
26. The multi-vision sensor fusion method of claim 25, wherein the step S300 includes the steps of:
Respectively carrying out optimized denoising on the initial left-eye image and the initial right-eye image through median filtering to obtain a filtered left-eye image and a filtered right-eye image; and
Downsampling the filtered left-eye image and the filtered right-eye image, respectively, to obtain the preprocessed left-eye image and the preprocessed right-eye image.
27. The multi-vision sensor fusion method of claim 26, wherein the step S300 further comprises the steps of:
Filtering the initial depth image by guided filtering with reference to the RGB image to obtain a filtered depth image; and
The filtered depth image is upsampled to obtain an upsampled depth image such that the upsampled depth image is used as the preprocessed depth image.
28. The multi-vision sensor fusion method of claim 27, wherein the step S400 includes the steps of:
S410: carrying out polar distortion correction on the preprocessed left-eye image and the preprocessed right-eye image to obtain corrected binocular images; and
S420: and (3) referring to the preprocessed depth image, performing binocular stereo matching on the corrected binocular image to obtain an initial binocular depth image.
29. The multi-vision sensor fusion method of claim 28, wherein the step S420 includes the steps of:
Converting the preprocessed depth image into a coordinate system of the left-eye camera according to the pose relation between the TOF camera and the left-eye camera so as to obtain a left-eye depth reference image; and
And carrying out cost calculation on pixel points with depth values in the left-eye depth reference image in the preset parallax range so as to carry out point-to-point matching, thereby obtaining the initial binocular depth image.
30. The multi-vision sensor fusion method of claim 29, further comprising the step of:
s500: and carrying out post-processing on the initial binocular depth image to obtain a final depth image.
31. The multi-vision sensor fusion method of claim 30, wherein the step S500 includes the steps of:
S510: Performing left-right consistency detection on the initial binocular depth image to obtain mismatching points in the initial binocular depth image; and
S520: and filling the depth information of the mismatching points to remove the mismatching points, and obtaining a filled binocular depth image.
32. The multi-vision sensor fusion method of claim 31, wherein in the step S520, when the depth of the mismatching point is smaller than the detection distance of the TOF camera, depth information of the corresponding pixel point in the preprocessed depth image is filled into the mismatching point; and when the depth of the mismatching point is not smaller than the detection distance of the TOF camera, filling the mismatching point by adopting a horizontal line searching method.
33. The multi-vision sensor fusion method of claim 32, wherein the step S500 further comprises the steps of:
s530: and (3) through judging the connected domain of the filled binocular depth image, removing the region with larger change and smaller connected space, so as to obtain the binocular depth image with the stray points filtered.
34. The multi-vision sensor fusion method of claim 33, wherein the step S500 further comprises the steps of:
s540: and on the basis of the RGB image, dyeing the binocular depth image with the stray points filtered to obtain a color depth image.
35. The multi-vision sensor fusion method of any one of claims 24 to 34, further comprising, prior to said step S200, the steps of:
s100: and calibrating the binocular camera, the TOF camera and the RGB camera through a target plate unit to respectively obtain the pose relation between the left eye camera and the right eye camera of the binocular camera, the pose relation between the TOF camera and the left eye camera and the pose relation between the RGB camera and the TOF camera.
36. The multi-vision sensor fusion method of claim 35, wherein the step S100 includes the steps of:
dividing the whole calibration image obtained by synchronously shooting the three-dimensional calibration plate through the binocular camera, the TOF camera and the RGB camera respectively to obtain three divided calibration images respectively, so that the divided calibration images respectively correspond to a first calibration panel, a second calibration panel and a third calibration panel of the three-dimensional calibration plate;
Classifying the surfaces of the circular marks in the divided calibration images so that the circular marks on the same calibration surface are classified into the same class;
sequencing the circle center coordinates of each type of the circular marks to obtain sequenced circle center coordinate data; and performing calibration calculation on the sequenced circle center coordinate data to obtain a required pose relationship.
37. An electronic device, comprising:
a processor for executing the instructions; and
A memory, wherein the memory is configured to hold machine readable instructions that are executed by the processor to implement some or all of the steps in the multi-vision sensor fusion method of any one of claims 24-36.
CN201911103519.1A 2019-11-13 2019-11-13 Multi-vision sensor fusion device, method thereof and electronic equipment Active CN112802114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911103519.1A CN112802114B (en) 2019-11-13 2019-11-13 Multi-vision sensor fusion device, method thereof and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911103519.1A CN112802114B (en) 2019-11-13 2019-11-13 Multi-vision sensor fusion device, method thereof and electronic equipment

Publications (2)

Publication Number Publication Date
CN112802114A CN112802114A (en) 2021-05-14
CN112802114B true CN112802114B (en) 2024-07-02

Family

ID=75803011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911103519.1A Active CN112802114B (en) 2019-11-13 2019-11-13 Multi-vision sensor fusion device, method thereof and electronic equipment

Country Status (1)

Country Link
CN (1) CN112802114B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115127449B (en) * 2022-07-04 2023-06-23 山东大学 Non-contact fish body measuring device and method assisting binocular vision
CN115965673B (en) * 2022-11-23 2023-09-12 中国建筑一局(集团)有限公司 Centralized multi-robot positioning method based on binocular vision
CN115963917B (en) * 2022-12-22 2024-04-16 北京百度网讯科技有限公司 Visual data processing apparatus and visual data processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615652A (en) * 2018-10-23 2019-04-12 西安交通大学 A kind of depth information acquisition method and device
CN110009672A (en) * 2019-03-29 2019-07-12 香港光云科技有限公司 Promote ToF depth image processing method, 3D rendering imaging method and electronic equipment
CN110148181A (en) * 2019-04-25 2019-08-20 青岛康特网络科技有限公司 A kind of general binocular solid matching process

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106772431B (en) * 2017-01-23 2019-09-20 杭州蓝芯科技有限公司 A kind of Depth Information Acquistion devices and methods therefor of combination TOF technology and binocular vision
CN108322724B (en) * 2018-02-06 2019-08-16 上海兴芯微电子科技有限公司 Image solid matching method and binocular vision equipment
CN108682026B (en) * 2018-03-22 2021-08-06 江大白 Binocular vision stereo matching method based on multi-matching element fusion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615652A (en) * 2018-10-23 2019-04-12 西安交通大学 A kind of depth information acquisition method and device
CN110009672A (en) * 2019-03-29 2019-07-12 香港光云科技有限公司 Promote ToF depth image processing method, 3D rendering imaging method and electronic equipment
CN110148181A (en) * 2019-04-25 2019-08-20 青岛康特网络科技有限公司 A kind of general binocular solid matching process

Also Published As

Publication number Publication date
CN112802114A (en) 2021-05-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant