CN112802114A - Multi-vision sensor fusion device and method and electronic equipment

Info

Publication number: CN112802114A
Application number: CN201911103519.1A
Authority: CN (China)
Legal status: Pending
Prior art keywords: camera, image, module, depth, binocular
Other languages: Chinese (zh)
Inventors: 孙佳睿, 李楠, 李健, 金瑞
Current assignee: Zhejiang Sunny Optical Intelligent Technology Co Ltd
Original assignee: Zhejiang Sunny Optical Intelligent Technology Co Ltd
Application filed by Zhejiang Sunny Optical Intelligent Technology Co Ltd
Priority to CN201911103519.1A
Publication of CN112802114A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • H04N13/239Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance

Abstract

A multi-vision sensor fusion device, a method thereof, and electronic equipment. The multi-vision sensor fusion device comprises a binocular camera, a TOF camera, and a multi-vision sensor fusion system. The binocular camera comprises a left-eye camera and a right-eye camera arranged at an interval, wherein the left-eye camera is used for acquiring image information of a scene to obtain an initial left-eye image, and the right-eye camera is used for synchronously acquiring the image information of the scene to obtain an initial right-eye image. The TOF camera is fixed in position relative to the binocular camera and is used for synchronously acquiring depth information of the scene to obtain an initial depth image. The multi-vision sensor fusion system is communicably connected to the binocular camera and the TOF camera, respectively, and is used for processing the initial left-eye image and the initial right-eye image with the initial depth image as a reference, so as to obtain a fused depth image.

Description

Multi-vision sensor fusion device and method and electronic equipment
Technical Field
The invention relates to the technical field of vision sensors, in particular to a multi-vision sensor fusion device and method and electronic equipment.
Background
With the rapid development of artificial intelligence, the demand for high-precision vision sensors is increasing. Good depth information provides better support for the development of various vision-based algorithms and facilitates wide application in fields such as security, surveillance, machine vision, and robotics, for example: object recognition and obstacle detection in autonomous driving; recognition, sorting, unstacking, and palletizing of scattered and stacked objects in industry; and shelf picking of objects in logistics scenes.
Currently, the main depth vision technologies for acquiring depth information include binocular stereo matching (e.g., a binocular camera), Time of Flight (TOF) technology (e.g., a TOF camera), and structured light technology (e.g., a structured light camera). However, a single TOF camera suffers from multipath interference, loses depth information in regions with low reflectivity, has low overall resolution and sparse point clouds, and the accuracy of its depth information is poor especially at long distances. A binocular camera does not have the defects of a TOF camera, but it requires a large amount of computation and suffers from occlusion and from matching errors in repeated-texture and weak-texture regions, so the accuracy of the depth information acquired by a binocular camera alone is low. In addition, a structured light camera has drawbacks such as low resolution, long measurement time, and poor reliability, depending on the encoding method. That is, no single depth vision sensor can obtain high-accuracy depth information by itself, while simply stacking multiple depth vision sensors cannot effectively improve overall performance or yield high-precision depth information, and instead greatly increases system structural complexity and computational load, which in turn greatly raises the overall cost.
Disclosure of Invention
An object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof, and an electronic device, which can combine the advantages of a TOF camera and a binocular camera, overcome their respective disadvantages, and help improve the accuracy of the obtained depth image.
Another object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof and an electronic device, wherein in an embodiment of the present invention, the multi-vision sensor fusion apparatus is capable of acquiring a dense and high-precision depth image at multiple distances.
Another object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof and an electronic device, wherein, in an embodiment of the present invention, the multi-vision sensor fusion apparatus can combine the advantages of a TOF camera, a binocular camera and an RGB camera, which helps to further improve the accuracy of the obtained depth image.
Another object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof and an electronic device, wherein in an embodiment of the present invention, the multi-vision sensor fusion method can fuse a TOF camera and a binocular camera with a large resolution difference, so as to facilitate popularization and use of the multi-vision sensor fusion method.
Another object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof and an electronic device, wherein in an embodiment of the present invention, the multi-vision sensor fusion method uses a stereo calibration board for calibration, which is helpful for simplifying a calibration process of the multi-vision sensor fusion apparatus and improving robustness of an overall calibration algorithm.
Another object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof and an electronic device, wherein in an embodiment of the present invention, the multi-vision sensor fusion apparatus can greatly reduce the matching time of point pairs in a binocular image, reduce the computation required for matching point pairs, and help obtain a high-precision depth image quickly.
Another object of the present invention is to provide a multi-vision sensor fusion apparatus, a method thereof and an electronic device, wherein no expensive materials or complicated structures are required in the present invention to achieve the above objects. The present invention therefore provides a solution that not only offers a simple multi-vision sensor fusion apparatus, method and electronic device, but also increases their practicality and reliability.
To achieve at least one of the above objects or other objects and advantages, the present invention provides a multi-vision sensor fusion apparatus, including:
the binocular camera comprises a left eye camera and a right eye camera which are arranged at intervals, wherein the left eye camera is used for acquiring image information of a scene to obtain an initial left eye image, and the right eye camera is used for synchronously acquiring the image information of the scene to obtain an initial right eye image;
the TOF camera is fixed in relative position with the binocular camera and is used for synchronously acquiring depth information of the scene to obtain an initial depth image; and
a multi-vision sensor fusion system, wherein the multi-vision sensor fusion system is communicably connected to the binocular camera and the TOF camera, respectively, and is configured to process the initial left eye image and the initial right eye image with reference to the initial depth image to obtain a fused depth image.
In an embodiment of the present invention, the multi-vision sensor fusion apparatus further includes an RGB camera, wherein a relative position between the RGB camera and the TOF camera is fixed, and the RGB camera is used for synchronously acquiring color information of the scene to obtain an RGB image.
In an embodiment of the present invention, the TOF camera is disposed between the left eye camera and the right eye camera of the binocular camera, and an optical center of the TOF camera is located on a line between the optical centers of the left eye camera and the right eye camera.
In an embodiment of the present invention, the RGB camera is disposed between the left eye camera and the right eye camera of the binocular camera, and the optical center of the RGB camera is located on the connection line between the optical centers of the left eye camera and the right eye camera.
In an embodiment of the present invention, the RGB camera is disposed between the left eye camera and the right eye camera of the binocular camera, and a connection line between the optical centers of the RGB camera and the TOF camera is perpendicular to a connection line between the optical centers of the left eye camera and the right eye camera.
In an embodiment of the present invention, the multi-vision sensor fusion system includes an acquisition module and a fusion module communicably connected to each other, wherein the acquisition module is communicably connected to the binocular camera and the TOF camera, respectively, for acquiring the initial left eye image, the initial right eye image and the initial depth image; the fusion module is used for performing point-to-point matching on the initial left eye image and the initial right eye image within a preset parallax range by taking the depth value of a pixel point in the initial depth image as a reference so as to obtain an initial binocular depth image, and the initial binocular depth image is used as the fused depth image.
In an embodiment of the present invention, the multi-vision sensor fusion system includes an acquisition module and a fusion module communicably connected to each other, wherein the acquisition module is communicably connected to the binocular camera and the TOF camera, respectively, for acquiring the initial left eye image, the initial right eye image, the initial depth image and the RGB image; the fusion module is used for performing point-to-point matching on the initial left eye image and the initial right eye image within a preset parallax range by taking the depth value of a pixel point in the initial depth image as a reference so as to obtain an initial binocular depth image, and the initial binocular depth image is used as the fused depth image.
In an embodiment of the invention, the multi-vision sensor fusion system further includes a pre-processing module, wherein the pre-processing module is communicably disposed between the acquiring module and the fusion module, and is configured to pre-process the initial left-eye image, the initial right-eye image, and the initial depth image acquired by the acquiring module to obtain a pre-processed left-eye image, a pre-processed right-eye image, and a pre-processed depth image, so that the fusion module is configured to perform point-to-point matching on the pre-processed left-eye image and the pre-processed right-eye image within the preset parallax range by using depth values of pixel points in the pre-processed depth image as references to obtain the initial binocular depth image.
In an embodiment of the present invention, the preprocessing module includes a median filtering module, a down-sampling module, and a depth filtering module, where the median filtering module is communicably connected to the obtaining module, and is configured to perform optimized denoising on the initial left eye image and the initial right eye image respectively through median filtering to obtain a filtered left eye image and a filtered right eye image; wherein the down-sampling module is communicably connected to the median filtering module and configured to down-sample the filtered left eye image and the filtered right eye image, respectively, to obtain the pre-processed left eye image and the pre-processed right eye image; the depth filtering module is communicably connected to the obtaining module, and configured to perform filtering processing on the initial depth image with reference to the RGB image through guided filtering to obtain a filtered depth image, so that the filtered depth image is directly used as the pre-processed depth image.
In an embodiment of the present invention, the preprocessing module further includes an upsampling module, where the upsampling module is communicably connected to the depth filtering module, and is configured to upsample the filtered depth image to obtain an upsampled depth image, so that the upsampled depth image is used as the preprocessed depth image.
In an embodiment of the present invention, the fusion module includes a distortion epipolar line correction module and a binocular matching module communicatively connected to each other, wherein the distortion epipolar line correction module is communicatively connected to the preprocessing module and is configured to perform epipolar line distortion correction on the preprocessed left eye image and the preprocessed right eye image to obtain a corrected binocular image; the binocular matching module is communicably connected with the preprocessing module and is used for referencing the preprocessed depth image and performing binocular stereo matching on the corrected binocular image to obtain the initial binocular depth image.
In an embodiment of the present invention, the binocular matching module includes a conversion module and a cost calculation module, which are communicably connected to each other, wherein the conversion module is configured to convert the preprocessed depth image into a coordinate system of the left eye camera according to a pose relationship between the TOF camera and the left eye camera to obtain a left eye depth reference image, and the cost calculation module is configured to perform cost calculation on pixel points having depth values in the left eye depth reference image within the preset parallax range to perform point-to-point matching, so as to obtain the initial binocular depth image.
In an embodiment of the present invention, the conversion module of the binocular matching module is only configured to convert a pixel point, whose depth value is smaller than the detection distance of the TOF camera, in the preprocessed depth image into a coordinate system of the left eye camera, so that only depth information, whose depth value is smaller than the detection distance, in the preprocessed depth image is retained in the left eye depth reference image.
In an embodiment of the present invention, the multi-vision sensor fusion system further includes a post-processing module, wherein the post-processing module is communicably connected to the fusion module, and is configured to perform post-processing on the initial binocular depth image to obtain a final depth image, so that the final depth image is a dense and highly accurate depth image.
In an embodiment of the present invention, the post-processing module includes a left and right consistency detection module and a mis-matching point filling module, which are communicably connected to each other, where the left and right consistency detection module is configured to perform left and right consistency detection on the initial binocular depth image to obtain mis-matching points existing in the initial binocular depth image; and the mismatching point filling module is used for filling the depth information of the mismatching points so as to eliminate the mismatching points.
In an embodiment of the present invention, the mis-matching point filling module is further configured to fill the depth information of the corresponding pixel point in the preprocessed depth image to the mis-matching point when the depth of the mis-matching point is smaller than the detection distance of the TOF camera; and when the depth of the mismatching point is not less than the detection distance of the TOF camera, filling the mismatching point by adopting a horizontal line searching method.
In an embodiment of the present invention, the post-processing module further includes a stray point filtering module, where the stray point filtering module is communicably connected to the mismatching point filling module, and is configured to perform connected-domain analysis on the filled binocular depth image and remove regions with large depth variation and small connected area, so as to obtain the binocular depth image with the stray points filtered out.
In an embodiment of the present invention, the post-processing module further includes a dyeing module, where the dyeing module is respectively communicably connected to the stray point filtering module and the acquiring module, and is configured to perform dyeing processing on the binocular depth image with the stray points filtered out based on the RGB image, so as to obtain a color depth image.
In an embodiment of the present invention, the multi-vision sensor fusion system further includes a calibration module, where the calibration module is configured to calibrate the binocular camera, the TOF camera and the RGB camera through a target unit, so as to obtain a pose relationship between the left eye camera and the right eye camera of the binocular camera, a pose relationship between the TOF camera and the left eye camera, and a pose relationship between the RGB camera and the TOF camera, respectively.
In an embodiment of the invention, the multi-vision sensor fusion device further includes the target unit, wherein the target unit is a stereo calibration board, wherein the stereo calibration board includes a first calibration panel, a second calibration panel and a third calibration panel, wherein the first calibration panel, the second calibration panel and the third calibration panel are arranged edge to edge with one another, and the calibration surfaces of the first calibration panel, the second calibration panel and the third calibration panel are not coplanar with one another, so that the stereo calibration board forms a three-dimensional calibration board.
In an embodiment of the invention, the first calibration panel, the second calibration panel and the third calibration panel have a common corner point, which serves as a vertex of the stereo calibration board.
In an embodiment of the present invention, the calibration patterns on the first calibration panel, the second calibration panel and the third calibration panel are patterns having arrays of circular marks.
In an embodiment of the present invention, the circular marks on each calibration pattern are black and the background is white; the overlapping junctions of the first calibration panel, the second calibration panel and the third calibration panel are black.
In an embodiment of the present invention, the calibration module includes a segmentation module, a classification module, a sorting module, and a calibration algorithm module, which are sequentially communicably connected, wherein the segmentation module is configured to segment each overall calibration image obtained by synchronously photographing the stereo calibration board with the binocular camera, the TOF camera, and the RGB camera, so as to obtain three segmented calibration images corresponding respectively to the first calibration panel, the second calibration panel, and the third calibration panel of the stereo calibration board; the classification module is configured to classify the surfaces on which the circular marks are located in the segmented calibration images, so that the circular marks on the same calibration surface are classified into the same class; the sorting module is configured to sort the circle center coordinates of each class of circular marks to obtain sorted circle center coordinate data; and the calibration algorithm module is configured to perform calibration calculation on the sorted circle center coordinate data to obtain the required pose relationships.
According to another aspect of the present invention, the present invention also provides a multi-vision sensor fusion method, comprising the steps of:
s200: acquiring image information of a scene acquired by a left eye camera and a right eye camera of a binocular camera to obtain an initial left eye image and an initial right eye image, and acquiring depth information of the scene synchronously acquired by a TOF camera to obtain an initial depth image, wherein the relative position between the TOF camera and the binocular camera is fixed;
s300: respectively preprocessing the initial depth image, the initial left eye image and the initial right eye image to obtain a preprocessed depth image, a preprocessed left eye image and a preprocessed right eye image; and
s400: and taking the preprocessed depth image as a reference, and fusing the preprocessed left eye image and the preprocessed right eye image in a preset parallax range to obtain a fused depth image.
In an embodiment of the present invention, the step S200 further includes the steps of: acquiring color information of the scene acquired by an RGB camera to obtain an RGB image, wherein the relative position between the RGB camera and the TOF camera is fixed.
In an embodiment of the present invention, the step S300 includes the steps of:
performing optimized denoising on the initial left eye image and the initial right eye image respectively through median filtering to obtain a filtered left eye image and a filtered right eye image; and
down-sampling the filtered left eye image and the filtered right eye image respectively to obtain the preprocessed left eye image and the preprocessed right eye image.
In an embodiment of the present invention, the step S300 further includes the steps of:
filtering the initial depth image through guided filtering with reference to the RGB image to obtain a filtered depth image; and
upsampling the filtered depth image to obtain an upsampled depth image, such that the upsampled depth image is taken as the preprocessed depth image.
In an embodiment of the present invention, the step S400 includes the steps of:
s410: performing epipolar distortion correction on the preprocessed left eye image and the preprocessed right eye image to obtain a corrected binocular image; and
s420: and referring to the preprocessed depth image, and performing binocular stereo matching on the corrected binocular image to obtain an initial binocular depth image.
In an embodiment of the present invention, the step S420 includes the steps of:
converting the preprocessed depth image into a coordinate system of the left eye camera according to the pose relationship between the TOF camera and the left eye camera to obtain a left eye depth reference image; and
performing cost calculation, within the preset parallax range, on the pixel points having depth values in the left eye depth reference image so as to perform point-pair matching, thereby obtaining the initial binocular depth image.
In an embodiment of the present invention, the multi-vision sensor fusion method further includes the steps of:
s500: and carrying out post-processing on the initial binocular depth image to obtain a final depth image, so that the final depth image is dense and has higher precision.
In an embodiment of the invention, the step S500 includes the steps of:
s510: carrying out left-right consistency detection on the initial binocular depth image to obtain mismatching points existing in the initial binocular depth image; and
s520: and filling the depth information of the mismatching points to eliminate the mismatching points and obtain a filled binocular depth image.
In an embodiment of the present invention, in the step S520, when the depth of the mismatch point is smaller than the detection distance of the TOF camera, the depth information of the corresponding pixel point in the preprocessed depth image is filled into the mismatch point; and when the depth of the mismatching point is not less than the detection distance of the TOF camera, filling the mismatching point by adopting a horizontal line searching method.
In an embodiment of the invention, the step S500 further includes the steps of:
s530: and through the judgment of the connected domain of the filled binocular depth image, the area with large change and small connected space is removed, so that the binocular depth image with the stray points filtered out is obtained.
In an embodiment of the invention, the step S500 further includes the steps of:
s540: based on the RGB image, dyeing the binocular depth image with the stray points filtered out is carried out to obtain a color depth image.
In an embodiment of the present invention, before the step S200, the multi-vision sensor fusion method further includes the steps of:
s100: the binocular camera, the TOF camera and the RGB camera are calibrated through a target unit, so that the pose relationship between the left eye camera and the right eye camera of the binocular camera, the pose relationship between the TOF camera and the left eye camera and the pose relationship between the RGB camera and the TOF camera are obtained respectively.
In an embodiment of the present invention, the step S100 includes the steps of:
segmenting each overall calibration image obtained by synchronously photographing a stereo calibration board with the binocular camera, the TOF camera and the RGB camera, so as to obtain three segmented calibration images corresponding respectively to a first calibration panel, a second calibration panel and a third calibration panel of the stereo calibration board;
classifying the surfaces on which the circular marks are located in the segmented calibration images, so that the circular marks on the same calibration surface are classified into the same class;
sorting the circle center coordinates of each class of circular marks to obtain sorted circle center coordinate data; and
performing calibration calculation on the sorted circle center coordinate data to obtain the required pose relationships.
According to another aspect of the present invention, the present invention also provides an electronic device comprising:
a processor for executing instructions; and
a memory, wherein the memory is configured to store machine-readable instructions which, when executed by the processor, perform some or all of the steps of any one of the multi-vision sensor fusion methods described above.
Further objects and advantages of the invention will be fully apparent from the ensuing description and drawings.
These and other objects, features and advantages of the present invention will become more fully apparent from the following detailed description, the accompanying drawings and the claims.
Drawings
Fig. 1 is a block diagram schematic diagram of a multi-vision sensor fusion apparatus according to an embodiment of the present invention.
Fig. 2 shows a schematic structural diagram of the multi-vision sensor fusion device according to the above-mentioned embodiment of the present invention.
Fig. 3 is a block diagram illustrating a multi-vision sensor fusion system of the multi-vision sensor fusion apparatus according to the above embodiment of the present invention.
Fig. 4 is a perspective view illustrating a calibration unit of the multi-vision sensor fusion device according to the above embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating an operation process of the multi-vision sensor fusion device according to the above-mentioned embodiment of the present invention.
Fig. 6 is a flow chart of a multi-vision sensor fusion method according to an embodiment of the invention.
Fig. 7 is a first partial flowchart of steps of the multi-vision sensor fusion method according to the above embodiment of the present invention.
Fig. 8 is a second partial flowchart of steps of the multi-vision sensor fusion method according to the above embodiment of the present invention.
Fig. 9 is a third partial flowchart of steps of the multi-vision sensor fusion method according to the above embodiment of the present invention.
Fig. 10 is a fourth partial flowchart of steps of the multi-vision sensor fusion method according to the above embodiment of the present invention.
FIG. 11 is a block diagram schematic of an electronic device according to an embodiment of the invention.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
In the present invention, the terms "a" and "an" in the claims and the description should be understood as meaning "one or more"; that is, an element may be one in number in one embodiment and more than one in number in another embodiment. The terms "a" and "an" should not be construed as limiting the element to a single one in number unless the present disclosure explicitly recites that the number of such elements is one.
In the description of the present invention, it is to be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless otherwise explicitly specified or limited, the term "connected" is to be interpreted broadly; for example, it may refer to a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; and it may be a direct connection or an indirect connection through an intermediate element. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art can combine the various embodiments or examples and the features of different embodiments or examples described in this specification without contradiction.
With the rapid development of science and technology, depth vision has become a main means of acquiring depth information, and its accuracy directly affects the development and application of various vision-based algorithms, and in turn the expansion of technologies in fields such as security, surveillance and machine vision. Currently, depth vision sensors are mainly classified into TOF sensors (also called TOF cameras), binocular vision sensors (also called binocular cameras) and structured light sensors (also called structured light cameras), which have different advantages and disadvantages due to their different working principles, so the accuracy of the depth image obtained by any single one of these sensors is not high. For example, the TOF camera suffers from multipath interference, loses depth in regions with low reflectivity, and has low overall resolution and sparse point clouds, so only a sparse depth image can be obtained with a TOF camera alone; a single binocular camera does not have the defects of the TOF camera, but its calculation amount is large and it suffers from occlusion and from matching errors in repeated-texture and weak-texture regions, so a dense depth image can be obtained with a binocular camera, but not a high-precision one, and huge computing resources are required. Therefore, in order to solve the above problems and combine the advantages of the TOF camera and the binocular camera so as to overcome their respective disadvantages, the present invention provides a multi-vision sensor fusion apparatus, a method thereof, and an electronic device, which can obtain a dense and highly accurate depth image. It can be understood that the higher precision of the depth image mentioned in the present invention means that the precision of the depth image obtained by the multi-vision sensor fusion device is higher than that of the depth image obtained by a single TOF camera or a single binocular camera; the density of the depth image mentioned in the present invention means that the density of pixel points in the depth image acquired by the multi-vision sensor fusion device is greater than that of pixel points in the depth image acquired by a single TOF camera.
Schematic device
Referring to fig. 1-5 of the drawings, a multi-vision sensor fusion apparatus in accordance with an embodiment of the present invention is illustrated. Specifically, as shown in fig. 1 and 2, the multi-vision sensor fusion apparatus 1 includes a binocular camera 10, a TOF camera 20, and a multi-vision sensor fusion system 30. The binocular camera 10 comprises a left eye camera 11 and a right eye camera 12 which are arranged at intervals, wherein the left eye camera 11 is used for acquiring image information of a scene to obtain an initial left eye image; wherein the right eye camera 12 is configured to synchronously acquire image information of the scene to obtain an initial right eye image. The TOF camera 20 and the left eye camera 11 and the right eye camera 12 of the binocular camera 10 are fixed in relative position, and are used for synchronously acquiring depth information of the scene to obtain an initial depth image. The multi-vision sensor fusion system 30 is communicably connected to the binocular camera 10 and the TOF camera 20, respectively, for processing the initial left eye image and the initial right eye image with reference to the initial depth image to obtain a fused depth image.
It is noted that, in the present invention, when the initial left eye image and the initial right eye image are processed, the initial depth image obtained by the TOF camera 20 is referred to, so that the fused depth image is implemented as a dense and high-precision depth image, that is, the density of the pixels in the fused depth image is greater than that in the initial depth image; and the accuracy of the fused depth image is higher than the accuracy of the depth image obtained by a TOF camera alone or a binocular camera alone.
More specifically, in order to ensure that the field of view of the TOF camera 20 covers as much as possible of the fields of view of the left eye camera 11 and the right eye camera 12 of the binocular camera 10, in the above-described embodiment of the present invention, as shown in fig. 2, the TOF camera 20 is disposed between the left eye camera 11 and the right eye camera 12 of the binocular camera 10.
Preferably, as shown in fig. 2, the optical center C20 of the TOF camera 20 is located on a connection line between the optical center C11 of the left eye camera 11 and the optical center C12 of the right eye camera 12, so as to simplify the subsequent image processing process and reduce the calculation amount during calibration.
More preferably, as shown in fig. 2, the optical center C20 of the TOF camera 20 is located at the midpoint between the optical center C11 of the left eye camera 11 and the optical center C12 of the right eye camera 12, that is, the distance between the optical center C20 of the TOF camera 20 and the optical center C11 of the left eye camera 11 is equal to the distance between the optical center C20 of the TOF camera 20 and the optical center C12 of the right eye camera 12, so as to further simplify the subsequent image processing, reduce the calculation amount during calibration, and prevent error accumulation.
Furthermore, the rotation angle of the extrinsic parameters of the binocular camera 10 does not exceed 2°, that is, the angle between the optical axis of the right eye camera 12 of the binocular camera 10 and the optical axis of the left eye camera 11 of the binocular camera 10 does not exceed 2°. Most preferably, the optical axis of the right eye camera 12 of the binocular camera 10 is parallel to the optical axis of the left eye camera 11 of the binocular camera 10.
According to the above-described embodiment of the present invention, as shown in fig. 3, the multi-vision sensor fusion system 30 includes an acquisition module 31 and a fusion module 32 communicably connected to each other. The acquisition module 31 is communicably connected to the binocular camera 10 and the TOF camera, respectively, for acquiring the initial left eye image and the initial right eye image from the binocular camera 10 and the initial depth image from the TOF camera 20. The fusion module 32 is configured to perform point-to-point matching on the initial left-eye image and the initial right-eye image within a preset parallax range by using depth values of pixel points in the initial depth image as a reference, so as to obtain an initial binocular depth image.
It can be understood that, in existing binocular image matching technology, since no depth information is available as a reference, when point-pair matching is performed for a pixel point in the left eye image, cost calculation must be performed one by one on all candidate pixel points in the right eye image before the matching point pair is obtained, so that the calculation amount of the whole matching process is huge. The multi-vision sensor fusion device 1 of the present invention uses the depth values in the initial depth image obtained by the TOF camera 20 as a reference, so that when point-pair matching is performed for a pixel point in the left eye image, a reference parallax is first calculated from the reference depth value of that pixel point, the required parallax range is preset with the reference parallax as its center (for example, the upper and lower limits of the parallax range differ from the reference parallax by no more than 5 pixels), and cost calculation is then performed only on the pixel points of the right eye image within the preset parallax range to obtain the matching point pair, without performing cost calculation on all pixel points in the right eye image. This helps to greatly reduce the calculation amount of the matching process and to improve the matching efficiency and the fusion accuracy.
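As an illustration of this depth-guided matching strategy, the following Python/NumPy sketch restricts the cost search to a small window around the disparity predicted from the TOF reference depth via d = f·B/Z. The function name, the SAD matching cost, the 3×3 window, the ±5-pixel search radius, and the assumption that the reference depth has already been aligned to the left view are illustrative choices, not details prescribed by the patent.

```python
import numpy as np

def fused_disparity(left, right, ref_depth, focal_px, baseline_m,
                    win=3, search_radius=5):
    """Match rectified left/right grayscale images, restricting the disparity
    search to a small window around the disparity predicted from the TOF
    reference depth (d = f * B / Z). ref_depth is assumed to be aligned to
    the left view and to have the same resolution as the binocular pair."""
    h, w = left.shape
    disparity = np.zeros((h, w), dtype=np.float32)
    pad = win // 2
    for y in range(pad, h - pad):
        for x in range(pad, w - pad):
            z = ref_depth[y, x]
            if z <= 0:            # no TOF reference for this pixel; skip here
                continue          # (a full search would be used instead)
            d_ref = int(round(focal_px * baseline_m / z))
            lo = max(0, d_ref - search_radius)
            hi = min(x - pad, d_ref + search_radius)
            patch_l = left[y - pad:y + pad + 1, x - pad:x + pad + 1].astype(np.int32)
            best_d, best_cost = 0, np.inf
            for d in range(lo, hi + 1):
                patch_r = right[y - pad:y + pad + 1,
                                x - d - pad:x - d + pad + 1].astype(np.int32)
                cost = np.abs(patch_l - patch_r).sum()   # SAD cost over the window
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity
```

The key point is that the inner loop visits at most 2·search_radius + 1 candidate disparities instead of the full disparity range, which is where the reduction in matching computation described above comes from.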
It is worth mentioning that TOF cameras and binocular cameras on the market usually differ greatly in resolution; for example, the resolution of a TOF camera is usually only 224 × 172 (i.e., fewer than 40,000 pixels), whereas the resolution of a binocular camera is often more than a million pixels, that is, more than five times the resolution of the TOF camera, so that the initial left eye image and the initial right eye image obtained by the higher-resolution binocular camera cannot be fused by directly referring to the initial depth image obtained by the lower-resolution TOF camera. Therefore, in the above embodiment of the present invention, as shown in fig. 3, the multi-vision sensor fusion system 30 of the multi-vision sensor fusion device 1 further includes a preprocessing module 33, wherein the preprocessing module 33 is communicably disposed between the acquisition module 31 and the fusion module 32, and is used for preprocessing the initial left eye image, the initial right eye image and the initial depth image acquired by the acquisition module 31 to obtain a preprocessed left eye image, a preprocessed right eye image and a preprocessed depth image, so that the resolution of the preprocessed depth image matches that of the preprocessed left eye image and the preprocessed right eye image, thereby facilitating the subsequent fusion processing by the fusion module 32.
Preferably, the resolution of the pre-processed left eye image and the pre-processed right eye image is equal to 2-3 times the resolution of the pre-processed depth image.
More specifically, as shown in fig. 3 and 5, the preprocessing module 33 of the multi-vision sensor fusion system 30 includes a median filtering module 331, a down-sampling module 332, and a depth filtering module 333. The median filtering module 331 is communicably connected to the obtaining module 31, and configured to perform optimized denoising on the initial left eye image and the initial right eye image through median filtering, so as to obtain a filtered left eye image and a filtered right eye image, and ensure stability of the left eye image and the right eye image. The down-sampling module 332 is communicably connected to the median filtering module 331, and is configured to down-sample the filtered left eye image and the filtered right eye image respectively to obtain a down-sampled left eye image and a down-sampled right eye image, so that the resolutions of the left eye image and the right eye image are reduced. The depth filtering module 333 is communicatively connected to the obtaining module 31, and is configured to perform a filtering process on the initial depth image to obtain a filtered depth image. It is understood that in an example of the present invention, the down-sampling module 332 is communicably connected to the fusion module 32 for taking the down-sampled left eye image and the down-sampled right eye image as the pre-processed left eye image and the pre-processed right eye image to be transmitted to the fusion module 32; the depth filtering module 333 may be communicatively coupled with the fusion module 32 for transmitting the filtered depth image as the pre-processed depth image to the fusion module 32. In addition, the downsampling module 332 of the present invention may downsample the image by interpolating, for example, interpolating 4 pixels into a single pixel.
It is noted that, if the resolution of the initial left eye image and the initial right eye image is too high, then even after down-sampling by the down-sampling module 332, the resolution of the down-sampled left eye image and the down-sampled right eye image may still be much greater than the resolution of the filtered depth image (e.g., more than 3 times the resolution of the filtered depth image), so that the resolution of the preprocessed depth image cannot match that of the preprocessed left eye image and the preprocessed right eye image. Therefore, as shown in fig. 1 and 5, the preprocessing module 33 of the multi-vision sensor fusion system 30 further includes an upsampling module 334, wherein the upsampling module 334 is communicatively connected to the depth filtering module 333 for upsampling the filtered depth image to obtain an upsampled depth image, so that the resolution of the depth image is improved. The upsampling module 334 is further communicatively connected with the fusion module 32 for transmitting the upsampled depth image as the preprocessed depth image to the fusion module 32.
According to the above embodiment of the present invention, as shown in fig. 1 and fig. 2, the multi-vision sensor fusion apparatus 1 further includes an RGB camera 40, wherein the relative position between the RGB camera 40 and the TOF camera 20 is fixed, and is used for acquiring the color information of the scene to obtain an RGB image. The depth filtering module 333 of the pre-processing module 33 of the multi-vision sensor fusion system 30 is communicatively connected to the RGB camera 40 for optimizing the initial depth image by guided filtering with reference to the RGB image to obtain the filtered depth image. It can be understood that, because the resolution of the RGB camera 40 is high, the overall details of the depth image can be optimized by referring to the RGB image, so as to obtain a better optimized denoising effect.
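The preprocessing chain described above (median filtering and down-sampling of the binocular pair, RGB-guided filtering and optional up-sampling of the TOF depth image) could be sketched with OpenCV roughly as follows. This is an assumed implementation, not the patent's: the kernel size, the guided-filter parameters, the 8-bit grayscale inputs, and the use of cv2.ximgproc.guidedFilter (which requires the opencv-contrib package) are illustrative choices.

```python
import cv2
import numpy as np

def preprocess(left_raw, right_raw, depth_raw, rgb, scale_to=None):
    """Rough sketch of the pre-processing chain: median filtering and
    down-sampling for the binocular pair (8-bit grayscale assumed),
    RGB-guided filtering plus optional up-sampling for the TOF depth."""
    # Median filtering to denoise the binocular images (3x3 kernel assumed).
    left_f = cv2.medianBlur(left_raw, 3)
    right_f = cv2.medianBlur(right_raw, 3)

    # Down-sample the high-resolution binocular images; one pyramid level
    # merges 2x2 pixel blocks, comparable to "4 pixels into one".
    left_ds = cv2.pyrDown(left_f)
    right_ds = cv2.pyrDown(right_f)

    # Guided filtering of the sparse TOF depth image, using the RGB image
    # resized to the depth resolution as the guide (radius/eps are assumed).
    guide = cv2.resize(rgb, (depth_raw.shape[1], depth_raw.shape[0]))
    depth_f = cv2.ximgproc.guidedFilter(guide, depth_raw.astype(np.float32), 4, 1e-2)

    # Optional up-sampling so the depth resolution roughly matches the
    # down-sampled binocular images (target size (w, h) is an assumption).
    if scale_to is not None:
        depth_f = cv2.resize(depth_f, scale_to, interpolation=cv2.INTER_LINEAR)
    return left_ds, right_ds, depth_f
```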
Preferably, as shown in fig. 2, the RGB camera 40 is disposed between the left eye camera 11 and the right eye camera 12 of the binocular camera 10, and the RGB camera 40 and the TOF camera 20 are horizontally arranged side by side, i.e., the optical center C40 of the RGB camera 40 and the optical center C20 of the TOF camera 20 are on the same horizontal line, which helps to ensure higher accuracy and avoid errors due to positional deviation. In other words, the optical center C40 of the RGB camera 40 and the optical center C20 of the TOF camera 20 are both located on the connecting line of the optical center C11 of the left eye camera 11 and the optical center C12 of the right eye camera 12 of the binocular camera 10, which helps to reduce the amount of calculation of the multi-vision sensor fusion device 1 during calibration.
Of course, in other examples of the present invention, if the space between the left eye camera 11 and the right eye camera 12 of the binocular camera 10 is small and the TOF camera 20 and the RGB camera 40 cannot be placed simultaneously, the RGB camera 40 and the TOF camera 20 may also be arranged vertically side by side, i.e., the optical center C40 of the RGB camera 40 and the optical center C20 of the TOF camera 20 are on the same vertical line, so that the distance between the optical center C40 of the RGB camera 40 and the optical center C11 of the left eye camera 11 of the binocular camera 10 is equal to the distance between the optical center C40 of the RGB camera 40 and the optical center C12 of the right eye camera 12. In other words, the RGB camera is disposed between the left eye camera and the right eye camera of the binocular camera, and a line between the optical center C40 of the RGB camera 40 and the optical center C20 of the TOF camera 20 is perpendicular to a line between the optical center C11 of the left eye camera 11 and the optical center C12 of the right eye camera 12.
It is noted that, although the left eye image and the right eye image acquired by the left eye camera 11 and the right eye camera 12 of the binocular camera 10 may be implemented as grayscale images or black and white images, in some examples of the present invention, the left eye image and the right eye image acquired by the left eye camera 11 and the right eye camera 12 of the binocular camera 10 may also be implemented as RGB images, so that the multi-vision sensor fusion apparatus 1 can optimize the initial depth image by guiding filtering with direct reference to the left eye RGB image or the right eye RGB image without additionally providing the RGB camera 40, so as to obtain the filtered depth image.
It is worth mentioning that the difficulty of binocular matching lies in how to compute the matching point pairs: even after the positional relationship between the left eye camera 11 and the right eye camera 12 has been obtained by calculation, epipolar geometry only determines that the corresponding point lies on the epipolar line, and the specific parallax cannot be determined directly, so that traditional matching algorithms, whether global or semi-global, have long overall matching times and huge computational overhead. Therefore, in order to solve this problem, the multi-vision sensor fusion apparatus 1 according to the above-described embodiment of the present invention performs binocular matching (i.e., the fusion processing) using the depth image obtained by the TOF camera 20 as a reference image, so as to reduce the difficulty of matching.
Specifically, in the above embodiment of the present invention, as shown in fig. 3 and 5, the fusion module 32 of the multi-vision sensor fusion system 30 of the multi-vision sensor fusion apparatus 1 may include a distortion epipolar line correction module 321 and a binocular matching module 322, wherein the distortion epipolar line correction module 321 may be communicably connected to the preprocessing module 33 for performing epipolar distortion correction on the preprocessed left eye image and the preprocessed right eye image to obtain a corrected binocular image; the binocular matching module 322 is respectively communicably connected to the distortion epipolar line correction module 321 and the preprocessing module 33, and is configured to refer to the preprocessed depth image, and perform binocular stereo matching on the corrected binocular image to obtain the initial binocular depth image, which is used as the fused depth image.
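The patent does not specify how the distortion epipolar line correction module 321 is implemented; a common way to perform the epipolar distortion correction it describes is OpenCV's stereo rectification, sketched below under the assumption that the binocular intrinsics, distortion coefficients and left-to-right pose are already known from calibration.

```python
import cv2

def rectify_pair(left, right, K_l, D_l, K_r, D_r, R, T):
    """Undistort and epipolar-rectify the binocular pair using the calibrated
    intrinsics (K, D) and the left-to-right extrinsic pose (R, T)."""
    size = (left.shape[1], left.shape[0])
    # Compute rectification transforms and new projection matrices.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K_l, D_l, K_r, D_r, size, R, T)
    map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, D_l, R1, P1, size, cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K_r, D_r, R2, P2, size, cv2.CV_32FC1)
    # Warp both images so that corresponding points lie on the same row.
    left_rect = cv2.remap(left, map_lx, map_ly, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right, map_rx, map_ry, cv2.INTER_LINEAR)
    return left_rect, right_rect, Q
```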
More specifically, as shown in fig. 3 and fig. 5, the binocular matching module 322 of the fusion module 32 further includes a conversion module 3221 and a cost calculation module 3222, wherein the conversion module 3221 is configured to convert the preprocessed depth image into the coordinate system of the left eye camera 11 according to the pose relationship between the TOF camera 20 and the left eye camera 11 to obtain a left eye depth reference image; the cost calculation module 3222 is configured to perform cost calculation, within the preset parallax range, on pixel points having depth values in the left eye depth reference image to perform point-to-point matching, so as to obtain an initial binocular depth image, thereby greatly reducing the time required for matching and overcoming the disadvantages of a binocular camera. Of course, in other examples of the present invention, the conversion module 3221 of the binocular matching module 322 may be configured to convert the preprocessed depth image into the coordinate system of the right eye camera 12 according to the pose relationship between the TOF camera 20 and the right eye camera 12 to obtain a right eye depth reference image; the cost calculation module 3222 may then perform cost calculation on pixel points having depth values in the right eye depth reference image within the preset parallax range to perform point-to-point matching, so as to obtain the initial binocular depth image.
Notably, due to the limited detection range of the TOF camera 20 (e.g., 1-3m), the TOF camera 20 has better depth information only within the detection range. Therefore, in the above embodiment of the present invention, the binocular matching module 322 of the fusion module 32 only converts the pixel points in the pre-processed depth image whose depth value is smaller than the detection distance into the coordinate system of the left eye camera 11, so that only the depth information in the pre-processed depth image whose depth value is smaller than the detection distance is retained in the left eye depth reference image. In other words, the binocular matching module 322 refers to the depth image obtained by the TOF camera 20 when performing point-to-point matching on the pixel points whose depth values are smaller than the detection distance; when the point-to-point matching is performed on the pixel points whose depth values are not less than (greater than or equal to) the detection distance, the depth image obtained by the TOF camera 20 is not referred to, and the conventional binocular matching method is directly used to ensure that the obtained binocular depth image at multiple distances has higher accuracy.
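A minimal sketch of the conversion step performed by the conversion module 3221, including the detection-distance gate described above: valid TOF depth pixels are back-projected, transformed with the TOF-to-left-eye pose, and re-projected into the left eye camera to form the left eye depth reference image. The function name, the parameter layout, and the 3 m default range are assumptions for illustration.

```python
import numpy as np

def tof_depth_to_left_view(depth_tof, K_tof, K_left, R, t, left_size,
                           max_range_m=3.0):
    """Re-project the (pre-processed) TOF depth image into the left eye camera
    view, keeping only pixels closer than the TOF detection distance."""
    h, w = depth_tof.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_tof.ravel()
    valid = (z > 0) & (z < max_range_m)               # detection-distance gate
    z = z[valid]
    # Back-project valid TOF pixels to 3D points in the TOF camera frame.
    pix = np.stack([u.ravel()[valid] * z, v.ravel()[valid] * z, z])
    pts_tof = np.linalg.inv(K_tof) @ pix
    # Transform with the TOF -> left eye pose and project into the left camera.
    pts_left = R @ pts_tof + t.reshape(3, 1)
    proj = K_left @ pts_left
    u_l = np.round(proj[0] / proj[2]).astype(int)
    v_l = np.round(proj[1] / proj[2]).astype(int)
    ref = np.zeros((left_size[1], left_size[0]), dtype=np.float32)
    inside = (u_l >= 0) & (u_l < left_size[0]) & (v_l >= 0) & (v_l < left_size[1])
    ref[v_l[inside], u_l[inside]] = pts_left[2, inside]  # left eye depth reference
    return ref
```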
It should be noted that, since the binocular camera 10 inherently has an occlusion problem, mismatching inevitably occurs when the binocular camera 10 performs binocular matching. In order to solve this problem, as shown in fig. 3 and 5, the multi-vision sensor fusion system 30 of the multi-vision sensor fusion device 1 according to the above embodiment of the present invention further includes a post-processing module 34, wherein the post-processing module 34 is communicably connected to the fusion module 32, and is configured to perform post-processing on the fused depth image to obtain a dense and high-precision depth image.
Specifically, as shown in fig. 3 and 5, the post-processing module 34 includes a left-right consistency detection module 341 and a mis-matching point filling module 342, which are communicably connected to each other, wherein the left-right consistency detection module 341 is configured to perform left-right consistency detection on the initial binocular depth image to obtain mis-matching points existing in the initial binocular depth image; the mismatch point filling module 342 is configured to perform filling processing on the depth information of the mismatch points to remove the mismatch points.
Preferably, when the depth of the mismatch point is smaller than the detection distance of the TOF camera 20, the mismatch point filling module 342 fills the depth information of the corresponding pixel point in the preprocessed depth image to the mismatch point; when the depth of the mismatch point is not less than the detection distance of the TOF camera 20, the mismatch point filling module 342 does not fill the mismatch point to remove the mismatch point. Of course, in other examples of the present invention, when the depth of the mismatch point is not less than the detection distance of the TOF camera 20, the mismatch point filling module 342 may fill the mismatch point by using a horizontal line search method.
It is to be noted that, when the mismatch point existing in the initial binocular depth image cannot correspond to a pixel point on the preprocessed depth image (i.e., the mismatch point existing in the initial binocular depth image does not have reference depth information), the mismatch point filling module 342 may further perform filling matching according to the depth information of the neighborhood. If there is no depth information in the larger connected domain, the mis-matching point filling module 342 may perform filling matching on the disparity value outside the detection distance of the TOF camera 20 through a conversion formula to obtain a filled binocular depth image, so as to ensure that the binocular depth image has good depth information on objects outside the detection distance of the TOF camera 20, which is beneficial to combining the advantages of the TOF camera and the binocular camera.
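The left-right consistency check and the filling policy described above might look roughly as follows in Python. The consistency tolerance, the detection-distance threshold and the simple leftward row search standing in for the "horizontal line searching method" are assumptions; the patent does not fix these details.

```python
import numpy as np

def lr_consistency_fill(disp_l, disp_r, depth_ref, focal_px, baseline_m,
                        max_range_m=3.0, tol=1.0):
    """Left-right consistency check followed by mismatch filling: fill from
    the TOF reference depth inside the detection range, otherwise search
    along the row for the nearest valid disparity."""
    h, w = disp_l.shape
    out = disp_l.copy()
    for y in range(h):
        for x in range(w):
            d = int(round(disp_l[y, x]))
            xr = x - d
            # A point is a mismatch if the right image maps back to a different disparity.
            if xr < 0 or xr >= w or abs(disp_r[y, xr] - disp_l[y, x]) > tol:
                z = depth_ref[y, x]
                if 0 < z < max_range_m:
                    out[y, x] = focal_px * baseline_m / z        # d = f * B / Z
                else:
                    # Stand-in for the "horizontal line searching method":
                    # take the nearest already-filled disparity to the left.
                    row = out[y, :x][::-1]
                    cand = row[row > 0]
                    out[y, x] = cand[0] if cand.size else 0
    return out
```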
Further, as shown in fig. 3 and 5, the post-processing module 34 of the multi-vision sensor fusion system 30 further includes a stray point filtering module 343, where the stray point filtering module 343 is communicably connected to the mismatch point filling module 342 and is configured to perform connected domain determination on the filled binocular depth image and remove regions whose depth varies sharply but whose connected area is small, so as to obtain the binocular depth image with the stray points filtered out.
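A connected-domain filter in the spirit of the stray point filtering module 343 could be sketched with OpenCV's connected-component analysis as below; the area and depth-spread thresholds are illustrative assumptions, since the embodiment does not give concrete values.

```python
import numpy as np
import cv2

def filter_stray_points(depth, min_area=200, max_internal_range=0.5):
    """Remove small connected regions whose depth varies abruptly.

    A region is dropped when it is both small (area < min_area pixels)
    and its internal depth spread exceeds max_internal_range (metres).
    """
    valid = (depth > 0).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(valid, connectivity=8)
    out = depth.copy()
    for i in range(1, num):           # label 0 is the background
        mask = labels == i
        area = stats[i, cv2.CC_STAT_AREA]
        spread = depth[mask].max() - depth[mask].min()
        if area < min_area and spread > max_internal_range:
            out[mask] = 0             # drop the stray region
    return out
```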
Further, in the above embodiment of the present invention, as shown in fig. 3 and fig. 5, the post-processing module 34 of the multi-vision sensor fusion system 30 further includes a dyeing module 344, wherein the dyeing module 344 is communicably connected to the stray point filtering module 343 and the acquiring module 31, respectively, and is configured to perform dyeing (i.e., colorization) processing on the binocular depth image with the stray points filtered out according to the RGB image, so as to fill color information into the binocular depth image and obtain a color depth image. In this way, the multi-vision sensor fusion apparatus 1 can obtain a depth image with both good accuracy and texture information.
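The dyeing (colorization) step amounts to looking up, for every valid depth pixel, the colour of the corresponding RGB pixel through the depth-to-RGB extrinsics. The following NumPy sketch shows one straightforward way to do this; the variable names and the pinhole-only model (no lens distortion) are simplifying assumptions.

```python
import numpy as np

def colorize_depth(depth, K_d, K_rgb, R, t, rgb):
    """Attach an RGB colour to every valid depth pixel.

    depth: depth map in the depth camera frame (metres).
    K_d, K_rgb: 3x3 intrinsics; R, t: extrinsics depth -> RGB camera.
    Returns an (H, W, 3) colour image aligned with the depth map.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    pix = np.stack([us.reshape(-1) * z, vs.reshape(-1) * z, z], axis=0)
    pts = np.linalg.inv(K_d) @ pix              # back-project to 3D
    pts_rgb = R @ pts + t.reshape(3, 1)         # into the RGB camera frame
    proj = K_rgb @ pts_rgb
    u = (proj[0] / np.maximum(proj[2], 1e-6)).round().astype(int)
    v = (proj[1] / np.maximum(proj[2], 1e-6)).round().astype(int)
    ok = (z > 0) & (u >= 0) & (u < rgb.shape[1]) & (v >= 0) & (v < rgb.shape[0])
    out = np.zeros((h * w, 3), dtype=rgb.dtype)
    out[ok] = rgb[v[ok], u[ok]]
    return out.reshape(h, w, 3)
```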
According to the above embodiment of the present invention, the multi-vision sensor fusion system 30 generally uses the pose relationships (i.e., extrinsic parameters) between the various vision sensors of the multi-vision sensor fusion apparatus 1 during the pre-processing (such as RGB-image-guided filtering), the fusion processing (such as depth-image-referenced binocular matching) and/or the post-processing (such as the filling of mismatch points or the RGB-image-based dyeing processing), for example: the pose relationship between the left eye camera 11 and the right eye camera 12 of the binocular camera 10; the pose relationship between the TOF camera 20 and the left eye camera 11 (or the right eye camera 12); the pose relationship between the RGB camera 40 and the TOF camera 20; and so on. Therefore, in the above embodiment of the present invention, as shown in fig. 1 to 3, the multi-vision sensor fusion system 30 may further include a calibration module 35, wherein the calibration module 35 is communicably connected to the acquisition module 31 and is configured to calibrate the binocular camera 10, the TOF camera 20 and the RGB camera 40 of the multi-vision sensor fusion apparatus 1 through a target unit 50, so as to obtain the pose relationship between the left eye camera 11 and the right eye camera 12, the pose relationship between the TOF camera 20 and the left eye camera 11, and the pose relationship between the RGB camera 40 and the TOF camera 20, respectively.
Preferably, as shown in fig. 2 and 4, the calibration board unit 50 is implemented as a stereo calibration plate 51, wherein the stereo calibration plate 51 comprises a first calibration panel 511, a second calibration panel 512 and a third calibration panel 513, wherein the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are arranged edge to edge with each other and their calibration surfaces are not coplanar with one another, so that the stereo calibration plate 51 forms a three-face calibration plate. In this way, when the stereo calibration plate 51 is properly positioned within the common field of view of the binocular camera 10, the TOF camera 20 and the RGB camera 40 of the multi-vision sensor fusion apparatus 1, so that each of these cameras can photograph the calibration faces of the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513, the plurality of visual sensors of the multi-vision sensor fusion device 1 can be calibrated without moving the stereo calibration plate 51: the binocular camera 10, the TOF camera 20 and the RGB camera 40 each acquire images of the three calibration faces as three calibration images in a single shot, after which calibration is completed by the calibration module 35 and the pose relationships among the binocular camera 10, the TOF camera 20 and the RGB camera 40 are obtained. It is understood that in other examples of the present invention, the calibration board unit 50 may be implemented as a planar calibration board, in which case three or more calibration images are obtained by moving the planar calibration board when calibrating the plurality of visual sensors of the multi-visual sensor fusion device 1.
Preferably, as shown in fig. 4, the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 have a common corner point as a vertex of the stereoscopic calibration plate 51. Thus, when the stereo calibration plate 51 is placed, only the vertex of the stereo calibration plate 51 is required to face the TOF camera 20 of the multi-vision sensor fusion device 1, so that the binocular camera 10, the TOF camera 20 and the RGB camera 40 can acquire three calibration images through one-time shooting.
More preferably, as shown in fig. 4, the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are perpendicular to each other, which helps to further simplify the calibration algorithm of the multi-vision sensor fusion device 1.
It should be noted that, because the resolution of the TOF camera 20 is low, the pattern on the calibration board must be highly recognizable and easy to extract for the TOF camera 20 to obtain a good spatial pose relationship. Therefore, in the above embodiment of the present invention, as shown in fig. 4, the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 of the stereo calibration plate 51 are all implemented as circular-mark calibration panels, that is, the calibration patterns on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are patterns with arrays of circular marks, so that the stereo calibration plate 51 is easy to recognize and its features are easy to extract, and a more accurate pose relationship can be obtained through calibration.
For example, as shown in fig. 4, the calibration patterns on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 may be black patterns on a white background; in other words, the circular marks on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are black and the background is white. Of course, in other examples of the present invention, the calibration patterns on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 may also be white patterns on a black background; in other words, the circular marks on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are white and the background is black.
Further, as shown in fig. 4, the overlapping edges of the first calibration panel 511, the second calibration panel 512, and the third calibration panel 513 are black to distinguish the boundaries of the three calibration images, so as to facilitate the segmentation of the three calibration images, which helps to simplify the subsequent calibration calculation.
According to the above embodiment of the present invention, as shown in fig. 3, the calibration module 35 of the multi-vision sensor fusion system 30 includes a segmentation module 351, a classification module 352, a sorting module 353 and a calibration algorithm module 354, which are communicably connected in sequence, wherein the segmentation module 351 segments the whole calibration images obtained by synchronously shooting the stereo calibration plate 51 with the binocular camera 10, the TOF camera 20 and the RGB camera 40, respectively, into three segmented calibration images each, so that the segmented calibration images respectively correspond to the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 of the stereo calibration plate 51; the classification module 352 is configured to classify the surfaces on which the circular marks are located in the segmented calibration images, so that the circular marks located on the same calibration surface are classified into the same class; the sorting module 353 is configured to sort the circle center coordinates of each class of circular marks to obtain sorted circle center coordinate data; the calibration algorithm module 354 is configured to perform calibration calculation on the sorted circle center coordinate data to obtain the pose relationships between all the visual sensors in the multi-vision sensor fusion device 1. It can be understood that, when calibrating all the visual sensors in the multi-vision sensor fusion device 1, it is necessary to ensure that the calibration images obtained by all the visual sensors remain frame-synchronized.
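For one segmented calibration image, the circle centres can be detected and returned in a fixed order with OpenCV's circle-grid detector, which effectively combines the classification and sorting steps described above; the grid size and the symmetric-grid flag are assumptions about the panel layout.

```python
import cv2

# Hypothetical 7x7 grid of circular marks on one calibration panel
PATTERN = (7, 7)

def detect_circle_centers(panel_image):
    """Detect and order circle centres on one segmented calibration panel.

    cv2.findCirclesGrid returns the centres in row-major order, which
    stands in for the classification/sorting described above.
    """
    gray = cv2.cvtColor(panel_image, cv2.COLOR_BGR2GRAY)
    found, centers = cv2.findCirclesGrid(
        gray, PATTERN, flags=cv2.CALIB_CB_SYMMETRIC_GRID)
    if not found:
        raise RuntimeError("circle grid not found on this panel")
    return centers.reshape(-1, 2)  # (N, 2) ordered circle centres
```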
For example, in an example of the present invention, the calibration algorithm module 354 may first calibrate the left eye camera 11 and the right eye camera 12 of the binocular camera 10 and accurately obtain the pose relationship between the left eye camera 11 and the right eye camera 12 through an optimized calibration algorithm; then, the calibration algorithm module 354 may calibrate the TOF camera 20 and the RGB camera 40 to obtain the pose relationship between the TOF camera 20 and the RGB camera 40; finally, the calibration algorithm module 354 may calibrate the left eye camera 11 and the TOF camera 20 to obtain the pose relationship between the left eye camera 11 and the TOF camera 20, so that the pose relationships between all the visual sensors in the multi-visual sensor fusion apparatus 1 are obtained.
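Once each camera's pose with respect to the same stereo calibration plate is known (for example from cv2.solvePnP on the ordered circle centres), the pairwise pose relationships can be chained as in the sketch below; the frame naming is an assumption made for illustration.

```python
import numpy as np

def relative_pose(R_a, t_a, R_b, t_b):
    """Pose of camera B expressed in camera A's frame, given each camera's
    pose of the same calibration plate (X_cam = R @ X_board + t)."""
    R_ab = R_a @ R_b.T
    t_ab = t_a - R_ab @ t_b
    return R_ab, t_ab

# Chaining example: from (left <- TOF) and (TOF <- RGB) one obtains
#   R_left_rgb = R_left_tof @ R_tof_rgb
#   t_left_rgb = R_left_tof @ t_tof_rgb + t_left_tof
```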
Illustrative method
Referring to fig. 6-10 of the drawings, a multi-vision sensor fusion method in accordance with an embodiment of the present invention is illustrated. Specifically, as shown in fig. 6, the multi-vision sensor fusion method includes the steps of:
s200: acquiring image information of a scene acquired by a left eye camera 11 and a right eye camera 12 of a binocular camera 10 to obtain an initial left eye image and an initial right eye image, and acquiring depth information of the scene synchronously acquired by a TOF camera 20 to obtain an initial depth image, wherein the relative positions between the TOF camera 20 and the left eye camera 11 and the right eye camera 12 are fixed;
s300: respectively preprocessing the initial depth image, the initial left eye image and the initial right eye image to obtain a preprocessed depth image, a preprocessed left eye image and a preprocessed right eye image; and
s400: and taking the preprocessed depth image as a reference, and fusing the preprocessed left eye image and the preprocessed right eye image in a preset parallax range to obtain a fused depth image.
It is understood that in other examples of the present invention, the multi-vision sensor fusion method may not include the step S300, so that in the step S400, the initial left eye image and the initial right eye image are processed directly with reference to the initial depth image to obtain the fused depth image.
It is noted that, in the above embodiment of the present invention, the step S200 of the multi-vision sensor fusion method may further include the step of: acquiring color information of the scene acquired by an RGB camera 40 to obtain an RGB image, wherein the relative position between the RGB camera 40 and the TOF camera 20 is fixed.
In an example of the present invention, as shown in fig. 7, the step S300 of the multi-vision sensor fusion method may include the steps of:
s310: respectively carrying out optimized denoising on the initial left eye image and the initial right eye image through median filtering to obtain a filtered left eye image and a filtered right eye image; and
s320: and respectively carrying out down-sampling on the filtered left eye image and the filtered right eye image to obtain the preprocessed left eye image and the preprocessed right eye image.
Further, in the above example of the present invention, as shown in fig. 7, the step S300 of the multi-vision sensor fusion method may further include the steps of:
s330: optimizing the initial depth information with reference to the RGB image through guided filtering to obtain a filtered depth image; and
s340: upsampling the filtered depth image to obtain an upsampled depth image such that the upsampled depth image is treated as the preprocessed depth image.
Preferably, the resolutions of the preprocessed left eye image and the preprocessed right eye image are both 2-3 times the resolution of the preprocessed depth image.
It is understood that there is no fixed execution order between the step S310 and the step S330, that is, the step S310 may be performed before or after the step S330, or the step S310 may be executed synchronously with the step S330.
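A compact sketch of the preprocessing chain S310-S340 using standard OpenCV calls is given below; the kernel size, scaling factors and guided-filter parameters are assumptions, and cv2.ximgproc.guidedFilter requires the opencv-contrib package.

```python
import cv2

def preprocess(left, right, tof_depth, rgb, scale=0.5):
    """Sketch of steps S310-S340; parameter values are illustrative."""
    # S310: median filtering to suppress sensor noise
    left_f = cv2.medianBlur(left, 3)
    right_f = cv2.medianBlur(right, 3)

    # S320: down-sample the filtered binocular images
    left_p = cv2.resize(left_f, None, fx=scale, fy=scale,
                        interpolation=cv2.INTER_AREA)
    right_p = cv2.resize(right_f, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_AREA)

    # S330: RGB-guided filtering of the TOF depth image
    guide = cv2.resize(rgb, (tof_depth.shape[1], tof_depth.shape[0]))
    depth_f = cv2.ximgproc.guidedFilter(guide, tof_depth, 8, 0.01)

    # S340: up-sample the filtered depth so that the binocular resolution
    # is roughly 2-3 times the depth resolution, as preferred above
    depth_p = cv2.resize(depth_f, (left_p.shape[1] // 2, left_p.shape[0] // 2),
                         interpolation=cv2.INTER_NEAREST)
    return left_p, right_p, depth_p
```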
It is worth mentioning that, in an example of the present invention, as shown in fig. 8, the step S400 of the multi-vision sensor fusion method may include the steps of:
s410: performing epipolar distortion correction on the preprocessed left eye image and the preprocessed right eye image to obtain a corrected binocular image; and
s420: and referring to the preprocessed depth image, and performing binocular stereo matching on the corrected binocular image in the preset parallax range to obtain an initial binocular depth image serving as the fused depth image.
Further, in this example of the present invention, as shown in fig. 8, the step S420 includes the steps of:
s421: converting the preprocessed depth image into a coordinate system of the left eye camera 11 according to the pose relationship between the TOF camera 20 and the left eye camera 11 to obtain a left eye depth reference image; and
s422: and carrying out cost calculation on pixel points with depth values in the left eye depth reference image in the preset parallax range so as to carry out point pair matching, so as to obtain the initial binocular depth image.
It should be noted that, in the step S421, only the pixel points in the preprocessed depth image whose depth values are smaller than the detection distance of the TOF camera 20 are converted into the coordinate system of the left eye camera 11, so that only the depth information whose depth value is smaller than the detection distance is retained in the left eye depth reference image.
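Step S421 can be illustrated by re-projecting the TOF depth into the left camera while discarding measurements beyond the detection range, as in the following sketch; the pinhole-only model and the variable names are assumptions.

```python
import numpy as np

def tof_to_left_reference(tof_depth, K_tof, K_left, R, t, out_shape,
                          max_range=3.0):
    """Build the left-eye depth reference image from TOF depth.

    Only points closer than max_range are kept, matching the note above.
    Overlapping projections simply overwrite each other for brevity.
    """
    h, w = tof_depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = tof_depth.reshape(-1)
    keep = (z > 0) & (z < max_range)
    pix = np.stack([us.reshape(-1)[keep] * z[keep],
                    vs.reshape(-1)[keep] * z[keep], z[keep]], axis=0)
    pts = np.linalg.inv(K_tof) @ pix            # back-project to 3D
    pts_l = R @ pts + t.reshape(3, 1)           # into the left camera frame
    proj = K_left @ pts_l
    u = (proj[0] / proj[2]).round().astype(int)
    v = (proj[1] / proj[2]).round().astype(int)
    ref = np.zeros(out_shape, dtype=np.float32)
    ok = (u >= 0) & (u < out_shape[1]) & (v >= 0) & (v < out_shape[0])
    ref[v[ok], u[ok]] = pts_l[2][ok]
    return ref
```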
It should be noted that, in the above embodiment of the present invention, as shown in fig. 6, the multi-vision sensor fusion method further includes the steps of:
s500: and carrying out post-processing on the fused depth image to obtain a final depth image, so that the final depth image is a dense depth image with higher precision.
In an example of the present invention, as shown in fig. 9, the step S500 of the multi-vision sensor fusion method includes the steps of:
s510: carrying out left-right consistency detection on the initial binocular depth image to obtain mismatching points existing in the initial binocular depth image;
s520: and filling the depth information of the mismatching points to eliminate the mismatching points and obtain a filled binocular depth image.
It is to be noted that, in the step S520 of the above example of the present invention: when the depth of the mismatch point is smaller than the detection distance of the TOF camera 20, filling the depth information of the corresponding pixel point in the preprocessed depth image into the mismatch point; when the depth of the mismatching point is not less than the detection distance of the TOF camera 20, filling the mismatching point by adopting a horizontal line searching method; and when the mismatching point cannot correspond to any pixel point on the preprocessed depth image, filling the mismatching point through the depth information of the neighborhood.
In the above example of the present invention, preferably, as shown in fig. 9, the step S500 of the multi-vision sensor fusion method may further include the steps of:
s530: and carrying out connected domain pernicity on the filled binocular depth image, and removing the region with large change and small connected space to obtain the binocular depth image with the stray points filtered out.
More preferably, as shown in fig. 9, the step S500 of the multi-vision sensor fusion method may further include the steps of:
s540: and based on the RGB image, dyeing the binocular depth image with the stray points filtered out so as to fill color information in the RGB image into the binocular depth image to obtain the color depth image.
According to the above embodiment of the present invention, as shown in fig. 6, the multi-vision sensor fusion method, before the step S200, further includes the steps of:
s100: calibrating the binocular camera 10, the TOF camera 20 and the RGB camera 40 by a target unit 50 to obtain the pose relationship among the TOF camera 20, the RGB camera 40 and the left eye camera 11 and the right eye camera 12 of the binocular camera 10.
It is noted that in the step S100 of the multi-vision sensor fusion method of the present invention, the calibration board unit 50 is implemented as a stereo calibration plate 51, wherein the stereo calibration plate 51 includes a first calibration panel 511, a second calibration panel 512 and a third calibration panel 513, wherein the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are arranged edge to edge with each other, and the calibration surfaces of the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are all not coplanar.
Preferably, the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 have a common corner point.
More preferably, the calibration patterns on the first, second and third calibration panels 511, 512, 513 are patterns having circular arrays of markers.
Most preferably, the circular signs on the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 are black, and the coinciding edges of the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 with each other are black.
It should be noted that, in an example of the present invention, as shown in fig. 10, the step S100 of the multi-vision sensor fusion method includes the steps of:
s110: respectively segmenting the whole calibration image obtained by synchronously shooting the stereoscopic calibration plate 51 by the binocular camera 10, the TOF camera 20 and the RGB camera 40 to respectively obtain three segmented calibration images, so that the segmented calibration images respectively correspond to the first calibration panel 511, the second calibration panel 512 and the third calibration panel 513 of the stereoscopic calibration plate 51;
s120: classifying the surfaces of the circular marks in the divided calibration images so as to enable the circular marks on the same calibration surface to be classified into the same class;
s130: sorting the circle center coordinates of each type of circular marks to obtain sorted circle center coordinate data; and
s140: and performing calibration calculation based on the sorted circle center coordinate data to obtain a pose relationship between the left eye camera 11 and the right eye camera 12, a pose relationship between the TOF camera 20 and the left eye camera 11, and a pose relationship between the RGB camera 40 and the TOF camera 20.
Illustrative electronic device
Next, an electronic apparatus according to an embodiment of the present invention is described with reference to fig. 11 (fig. 11 shows a block diagram of the electronic apparatus according to an embodiment of the present invention). As shown in fig. 11, the electronic device 60 includes one or more processors 61 and a memory 62.
The processor 61 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 60 to perform desired functions.
The memory 62 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 61 to implement the methods of the various embodiments of the invention described above and/or other desired functions.
In one example, as shown in fig. 11, the electronic device 60 may further include: an input device 63 and an output device 64, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 63 may be, for example, a camera module or the like for capturing image data or video data.
The output device 64 can output various information, including the processing results and the like, to the outside. The output devices 64 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for the sake of simplicity, only some of the components of the electronic device 60 relevant to the present invention are shown in fig. 11, and components such as a bus, an input/output interface, and the like are omitted. In addition, the electronic device 60 may include any other suitable components depending on the particular application.
Illustrative computer program product
In addition to the above-described methods and apparatus, embodiments of the present invention may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to various embodiments of the present invention described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations for embodiments of the present invention in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like and conventional procedural programming languages, such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, an embodiment of the present invention may also be a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform the steps of the above-described method of the present specification.
The computer readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present invention have been described above with reference to specific embodiments, but it should be noted that the advantages, effects, etc. mentioned in the present invention are only examples and are not limiting, and the advantages, effects, etc. must not be considered to be possessed by various embodiments of the present invention. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the invention is not limited to the specific details described above.
The block diagrams of devices, apparatuses, systems involved in the present invention are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, configurations, etc. must be made in the manner shown in the block diagrams. These devices, apparatuses, devices, systems may be connected, arranged, configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the apparatus, devices and methods of the present invention, the components or steps may be broken down and/or re-combined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are given by way of example only and are not limiting of the invention. The objects of the invention have been fully and effectively accomplished. The functional and structural principles of the present invention have been shown and described in the examples, and any variations or modifications of the embodiments of the present invention may be made without departing from the principles.

Claims (38)

1. A multi-vision sensor fusion device, comprising:
the binocular camera comprises a left eye camera and a right eye camera which are arranged at intervals, wherein the left eye camera is used for acquiring image information of a scene to obtain an initial left eye image, and the right eye camera is used for synchronously acquiring the image information of the scene to obtain an initial right eye image;
the TOF camera is fixed in relative position with the binocular camera and is used for synchronously acquiring depth information of the scene to obtain an initial depth image; and
a multi-vision sensor fusion system, wherein the multi-vision sensor fusion system is communicably connected to the binocular camera and the TOF camera, respectively, and is configured to process the initial left eye image and the initial right eye image with reference to the initial depth image to obtain a fused depth image.
2. The multi-vision sensor fusion apparatus of claim 1, further comprising an RGB camera, wherein the relative position between said RGB camera and said TOF camera is fixed for synchronously acquiring color information of the scene to obtain RGB images.
3. The multi-vision sensor fusion device of claim 2, wherein the TOF camera is disposed between the left eye camera and the right eye camera of the binocular camera, and an optical center of the TOF camera is located on a line between the optical center of the left eye camera and the optical center of the right eye camera.
4. The multi-vision sensor fusion device of claim 3, wherein the RGB camera is disposed between the left eye camera and the right eye camera of the binocular camera, and the optical center of the RGB camera is located on a line connecting the optical centers of the left eye camera and the right eye camera.
5. The multi-vision sensor fusion device of claim 3, wherein the RGB camera is disposed between the left eye camera and the right eye camera of the binocular camera, and a line between the optical centers of the RGB camera and the TOF camera is perpendicular to a line between the optical centers of the left eye camera and the right eye camera.
6. The multi-vision sensor fusion apparatus of claim 1, wherein the multi-vision sensor fusion system comprises an acquisition module and a fusion module communicably connected to each other, wherein the acquisition module is communicably connected to the binocular camera and the TOF camera, respectively, for acquiring the initial left eye image, the initial right eye image, and the initial depth image; the fusion module is used for performing point-to-point matching on the initial left eye image and the initial right eye image within a preset parallax range by taking the depth value of a pixel point in the initial depth image as a reference so as to obtain an initial binocular depth image, and the initial binocular depth image is used as the fused depth image.
7. The multi-vision sensor fusion apparatus of any one of claims 2 to 5 in which the multi-vision sensor fusion system comprises an acquisition module and a fusion module communicably connected to each other, wherein the acquisition module is communicably connected to the binocular camera and the TOF camera, respectively, for acquiring the initial left eye image, the initial right eye image, the initial depth image, and the RGB images; the fusion module is used for performing point-to-point matching on the initial left eye image and the initial right eye image within a preset parallax range by taking the depth value of a pixel point in the initial depth image as a reference so as to obtain an initial binocular depth image, and the initial binocular depth image is used as the fused depth image.
8. The multi-vision sensor fusion device of claim 7, wherein the multi-vision sensor fusion system further comprises a pre-processing module, wherein the pre-processing module is communicably disposed between the acquisition module and the fusion module, for pre-processing the initial left-eye image, the initial right-eye image, and the initial depth image acquired via the acquisition module to obtain a pre-processed left-eye image, a pre-processed right-eye image, and a pre-processed depth image, such that the fusion module is configured to perform point-to-point matching on the pre-processed left-eye image and the pre-processed right-eye image within the preset disparity range with reference to depth values of pixel points in the pre-processed depth image to obtain the initial binocular depth image.
9. The multi-vision sensor fusion device of claim 8, wherein the preprocessing module comprises a median filtering module, a down-sampling module, and a depth filtering module, wherein the median filtering module is communicatively connected to the obtaining module for performing optimized de-noising on the initial left eye image and the initial right eye image by median filtering to obtain a filtered left eye image and a filtered right eye image, respectively; wherein the down-sampling module is communicably connected to the median filtering module and configured to down-sample the filtered left eye image and the filtered right eye image, respectively, to obtain the pre-processed left eye image and the pre-processed right eye image; the depth filtering module is communicably connected to the obtaining module, and configured to perform filtering processing on the initial depth image with reference to the RGB image through guided filtering to obtain a filtered depth image, so that the filtered depth image is directly used as the pre-processed depth image.
10. The multi-vision sensor fusion device of claim 9, wherein the pre-processing module further includes an upsampling module, wherein the upsampling module is communicatively coupled with the depth filtering module for upsampling the filtered depth image to obtain an upsampled depth image, such that the upsampled depth image is treated as the pre-processed depth image.
11. The multi-vision sensor fusion apparatus of claim 10, wherein the fusion module comprises a distortion epipolar correction module and a binocular matching module communicatively connected to each other, wherein the distortion epipolar correction module is communicatively connected to the preprocessing module for epipolar distortion correction of the preprocessed left eye image and the preprocessed right eye image to obtain corrected binocular images; the binocular matching module is communicably connected with the preprocessing module and is used for referencing the preprocessed depth image and performing binocular stereo matching on the corrected binocular image to obtain the initial binocular depth image.
12. The multi-vision sensor fusion apparatus of claim 11, wherein the binocular matching module comprises a transformation module and a cost calculation module, which are communicably connected to each other, wherein the transformation module is configured to transform the preprocessed depth image into the coordinate system of the left eye camera according to the pose relationship between the TOF camera and the left eye camera to obtain a left eye depth reference image, and wherein the cost calculation module is configured to perform cost calculation on pixel points having depth values in the left eye depth reference image within the predetermined parallax range for performing point-to-point matching, so as to obtain the initial binocular depth image.
13. The multi-vision sensor fusion device of claim 12, wherein the conversion module of the binocular matching module is only used for converting pixel points in the pre-processed depth image with depth values smaller than the detection distance of the TOF camera into the coordinate system of the left eye camera, so that only depth information in the pre-processed depth image with depth values smaller than the detection distance is retained in the left eye depth reference image.
14. The multi-vision sensor fusion device of claim 7, wherein the multi-vision sensor fusion system further comprises a post-processing module, wherein the post-processing module is communicatively coupled to the fusion module for post-processing the initial binocular depth image to obtain a final depth image such that the final depth image is a dense and highly accurate depth image.
15. The multi-vision sensor fusion device of claim 14, wherein the post-processing module comprises a left-right consistency detection module and a mis-matching point filling module communicably connected to each other, wherein the left-right consistency detection module is configured to perform left-right consistency detection on the initial binocular depth image to obtain mis-matching points existing in the initial binocular depth image; and the mismatching point filling module is used for filling the depth information of the mismatching points so as to eliminate the mismatching points.
16. The multi-vision sensor fusion device of claim 15, wherein the mis-match point filling module is further configured to fill depth information of a corresponding pixel point in the pre-processed depth image to the mis-match point when the depth of the mis-match point is less than a detection distance of the TOF camera; and when the depth of the mismatching point is not less than the detection distance of the TOF camera, filling the mismatching point by adopting a horizontal line searching method.
17. The multi-vision sensor fusion device of claim 16, wherein the post-processing module further comprises a stray point filtering module, wherein the stray point filtering module is communicably connected to the mis-matching point filling module, and is configured to remove a region with a large change and a small connected space by performing connected domain determination on the filled binocular depth image to obtain the binocular depth image with the stray points filtered out.
18. The multi-vision sensor fusion device of claim 17, wherein the post-processing module further comprises a staining module, wherein the staining module is communicatively connected to the stray point filtering module and the acquiring module, respectively, and is configured to stain the binocular depth image with the stray points filtered out based on the RGB image to obtain a color depth image.
19. The multi-vision sensor fusion apparatus of any one of claims 2 to 5, wherein the multi-vision sensor fusion system further comprises a calibration module, wherein the calibration module is configured to calibrate the binocular camera, the TOF camera and the RGB camera through a target unit to obtain a pose relationship between the left eye camera and the right eye camera of the binocular camera, a pose relationship between the TOF camera and the left eye camera, and a pose relationship between the RGB camera and the TOF camera, respectively.
20. The multi-vision sensor fusion device of claim 19, further comprising the calibration unit, wherein the calibration unit is a stereo calibration plate, wherein the stereo calibration plate comprises a first calibration panel, a second calibration panel, and a third calibration panel, wherein the first calibration panel, the second calibration panel, and the third calibration panel are arranged edge-to-edge with respect to each other, and the calibration surfaces of the first calibration panel, the second calibration panel, and the third calibration panel are all not coplanar, such that the stereo calibration plate forms a three-sided calibration plate.
21. The multi-vision sensor fusion device of claim 20, wherein the first calibration panel, the second calibration panel, and the third calibration panel have a common corner point as a vertex of the stereo calibration plate.
22. The multi-vision sensor fusion device of claim 21 in which the calibration patterns on the first, second and third calibration panels are patterns having an array of circular markers.
23. The multi-vision sensor fusion device of claim 22, wherein the circular markings on each of the calibration patterns are black and the background is white; and the coinciding edges of the first calibration panel, the second calibration panel and the third calibration panel with each other are black.
24. The multi-vision sensor fusion device of claim 23, wherein the calibration module comprises a segmentation module, a classification module, a sequencing module and a calibration algorithm module, which are sequentially communicably connected, wherein the segmentation module is configured to segment the whole calibration image obtained by synchronously shooting the stereoscopic calibration plate with the binocular camera, the TOF camera and the RGB camera, respectively, to obtain three segmented calibration images, respectively, so that the segmented calibration images correspond to the first calibration panel, the second calibration panel and the third calibration panel of the stereoscopic calibration plate, respectively; the classification module is used for classifying the surfaces of the circular marks in the divided calibration images so as to classify the circular marks on the same calibration surface into the same class; the sorting module is used for sorting the circle center coordinates of each type of circular mark to obtain sorted circle center coordinate data; the calibration algorithm module is used for performing calibration calculation on the sorted circle center coordinate data to obtain a required pose relation.
25. A multi-vision sensor fusion method, comprising the steps of:
s200: acquiring image information of a scene acquired by a left eye camera and a right eye camera of a binocular camera to obtain an initial left eye image and an initial right eye image, and acquiring depth information of the scene synchronously acquired by a TOF camera to obtain an initial depth image, wherein the relative position between the TOF camera and the binocular camera is fixed;
s300: respectively preprocessing the initial depth image, the initial left eye image and the initial right eye image to obtain a preprocessed depth image, a preprocessed left eye image and a preprocessed right eye image; and
s400: and taking the preprocessed depth image as a reference, and fusing the preprocessed left eye image and the preprocessed right eye image in a preset parallax range to obtain a fused depth image.
26. The multi-vision sensor fusion method of claim 25, wherein the step S200 further comprises the steps of: acquiring color information of the scene acquired by an RGB camera to obtain an RGB image, wherein the relative position between the RGB camera and the TOF camera is fixed.
27. The multi-vision sensor fusion method of claim 26, wherein the step S300 includes the steps of:
respectively performing optimized denoising on the initial left eye image and the initial right eye image through median filtering to obtain a filtered left eye image and a filtered right eye image; and
and respectively carrying out down-sampling on the filtered left eye image and the filtered right eye image to obtain the preprocessed left eye image and the preprocessed right eye image.
28. The multi-vision sensor fusion method of claim 27, wherein the step S300 further comprises the steps of:
filtering the initial depth image by referring to the RGB image through guiding filtering to obtain a filtered depth image; and
upsampling the filtered depth image to obtain an upsampled depth image, such that the upsampled depth image is taken as the preprocessed depth image.
29. The multi-vision sensor fusion method of claim 28, wherein the step S400 includes the steps of:
s410: performing epipolar distortion correction on the preprocessed left eye image and the preprocessed right eye image to obtain a corrected binocular image; and
s420: and referring to the preprocessed depth image, and performing binocular stereo matching on the corrected binocular image to obtain an initial binocular depth image.
30. The multi-vision sensor fusion method of claim 29, wherein the step S420 includes the steps of:
converting the preprocessed depth image into a coordinate system of the left eye camera according to the pose relationship between the TOF camera and the left eye camera to obtain a left eye depth reference image; and
and carrying out cost calculation on pixel points with depth values in the left eye depth reference image in the preset parallax range to carry out point pair matching, thereby obtaining the initial binocular depth image.
31. The multi-vision sensor fusion method of claim 30, further comprising the steps of:
s500: and carrying out post-processing on the initial binocular depth image to obtain a final depth image, so that the final depth image is dense and has higher precision.
32. The multi-vision sensor fusion method of claim 31, wherein the step S500 includes the steps of:
s510: carrying out left-right consistency detection on the initial binocular depth image to obtain mismatching points existing in the initial binocular depth image; and
s520: and filling the depth information of the mismatching points to eliminate the mismatching points and obtain a filled binocular depth image.
33. The multi-vision sensor fusion method of claim 32, wherein in the step S520, when the depth of the mis-matching point is smaller than the detection distance of the TOF camera, the depth information of the corresponding pixel point in the preprocessed depth image is filled into the mis-matching point; and when the depth of the mismatching point is not less than the detection distance of the TOF camera, filling the mismatching point by adopting a horizontal line searching method.
34. The multi-vision sensor fusion method of claim 33, wherein the step S500 further comprises the steps of:
s530: and through the judgment of the connected domain of the filled binocular depth image, the area with large change and small connected space is removed, so that the binocular depth image with the stray points filtered out is obtained.
35. The multi-vision sensor fusion method of claim 34, wherein the step S500 further comprises the steps of:
s540: based on the RGB image, dyeing the binocular depth image with the stray points filtered out is carried out to obtain a color depth image.
36. The multi-vision sensor fusion method of any one of claims 25 to 35, further comprising, before said step S200, the step of:
s100: the binocular camera, the TOF camera and the RGB camera are calibrated through a target unit, so that the pose relationship between the left eye camera and the right eye camera of the binocular camera, the pose relationship between the TOF camera and the left eye camera and the pose relationship between the RGB camera and the TOF camera are obtained respectively.
37. The multi-vision sensor fusion method of claim 36, wherein the step S100 includes the steps of:
respectively segmenting the whole calibration image obtained by synchronously shooting a three-dimensional calibration plate through the binocular camera, the TOF camera and the RGB camera to respectively obtain three segmented calibration images, so that the segmented calibration images respectively correspond to a first calibration panel, a second calibration panel and a third calibration panel of the three-dimensional calibration plate;
classifying the surfaces where the circular marks are located in the divided calibration images so that the circular marks on the same calibration surface are classified into the same class;
sorting the circle center coordinates of each type of the circular marks to obtain sorted circle center coordinate data; and
and calibrating and calculating the sorted circle center coordinate data to obtain the required pose relationship.
38. An electronic device, comprising:
a processor for executing instructions; and
a memory, wherein the memory is configured to hold machine-readable instructions which, when executed by the processor, perform some or all of the steps in the multi-vision sensor fusion method of any of claims 25 to 37.
CN201911103519.1A 2019-11-13 2019-11-13 Multi-vision sensor fusion device and method and electronic equipment Pending CN112802114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911103519.1A CN112802114A (en) 2019-11-13 2019-11-13 Multi-vision sensor fusion device and method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911103519.1A CN112802114A (en) 2019-11-13 2019-11-13 Multi-vision sensor fusion device and method and electronic equipment

Publications (1)

Publication Number Publication Date
CN112802114A true CN112802114A (en) 2021-05-14

Family

ID=75803011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911103519.1A Pending CN112802114A (en) 2019-11-13 2019-11-13 Multi-vision sensor fusion device and method and electronic equipment

Country Status (1)

Country Link
CN (1) CN112802114A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106772431A (en) * 2017-01-23 2017-05-31 杭州蓝芯科技有限公司 A kind of Depth Information Acquistion devices and methods therefor of combination TOF technologies and binocular vision
CN108322724A (en) * 2018-02-06 2018-07-24 上海兴芯微电子科技有限公司 Image solid matching method and binocular vision equipment
CN108682026A (en) * 2018-03-22 2018-10-19 辽宁工业大学 A kind of binocular vision solid matching method based on the fusion of more Matching units
CN109615652A (en) * 2018-10-23 2019-04-12 西安交通大学 A kind of depth information acquisition method and device
CN110009672A (en) * 2019-03-29 2019-07-12 香港光云科技有限公司 Promote ToF depth image processing method, 3D rendering imaging method and electronic equipment
CN110148181A (en) * 2019-04-25 2019-08-20 青岛康特网络科技有限公司 A kind of general binocular solid matching process

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115127449A (en) * 2022-07-04 2022-09-30 山东大学 Non-contact fish body measuring device and method assisting binocular vision
CN115965673A (en) * 2022-11-23 2023-04-14 中国建筑一局(集团)有限公司 Centralized multi-robot positioning method based on binocular vision
CN115965673B (en) * 2022-11-23 2023-09-12 中国建筑一局(集团)有限公司 Centralized multi-robot positioning method based on binocular vision
CN115963917A (en) * 2022-12-22 2023-04-14 北京百度网讯科技有限公司 Visual data processing apparatus and visual data processing method
CN115963917B (en) * 2022-12-22 2024-04-16 北京百度网讯科技有限公司 Visual data processing apparatus and visual data processing method

Similar Documents

Publication Publication Date Title
CN109737874B (en) Object size measuring method and device based on three-dimensional vision technology
US11720766B2 (en) Systems and methods for text and barcode reading under perspective distortion
Ahmadabadian et al. A comparison of dense matching algorithms for scaled surface reconstruction using stereo camera rigs
US10719727B2 (en) Method and system for determining at least one property related to at least part of a real environment
US8326025B2 (en) Method for determining a depth map from images, device for determining a depth map
CN107392958B (en) Method and device for determining object volume based on binocular stereo camera
JP5580164B2 (en) Optical information processing apparatus, optical information processing method, optical information processing system, and optical information processing program
US8447099B2 (en) Forming 3D models using two images
US8452081B2 (en) Forming 3D models using multiple images
EP3273412B1 (en) Three-dimensional modelling method and device
US20150262346A1 (en) Image processing apparatus, image processing method, and image processing program
CN109211207B (en) Screw identification and positioning device based on machine vision
CN110766758B (en) Calibration method, device, system and storage device
CN112802114A (en) Multi-vision sensor fusion device and method and electronic equipment
JP5672112B2 (en) Stereo image calibration method, stereo image calibration apparatus, and computer program for stereo image calibration
WO2015108996A1 (en) Object tracking using occluding contours
CN113052066B (en) Multi-mode fusion method based on multi-view and image segmentation in three-dimensional target detection
WO2015179216A1 (en) Orthogonal and collaborative disparity decomposition
JP6172432B2 (en) Subject identification device, subject identification method, and subject identification program
CN112184811A (en) Monocular space structured light system structure calibration method and device
CN112348890A (en) Space positioning method and device and computer readable storage medium
CN110197104B (en) Distance measurement method and device based on vehicle
Kochi et al. 3D modeling of architecture by edge-matching and integrating the point clouds of laser scanner and those of digital camera
EP2866446B1 (en) Method and multi-camera portable device for producing stereo images
JP7298687B2 (en) Object recognition device and object recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination