CN110349196B

CN110349196B - Depth fusion method and device

Info

Publication number: CN110349196B
Application number: CN201910261747.5A
Authority: CN
Inventors: 郑朝钟; 魏震豪; 陈正旻; 王毓莹; 林亮均
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc
Priority date: 2018-04-03
Filing date: 2019-04-02
Publication date: 2024-03-29
Anticipated expiration: 2039-04-02
Also published as: TW202001802A; CN110349196A; TWI734092B

Abstract

The disclosure discloses a depth fusion method and a related device. The depth fusion method includes receiving a plurality of sensor signals of different types from a plurality of sensors; generating first depth-related information of the scene and second depth-related information of the scene based on the plurality of sensor signals; and fusing the first depth related information and the second depth related information to generate a fused depth map of the scene. The depth fusion method and the related device can reduce the cost.

Description

Depth fusion method and device

[ Cross-reference ]

The present disclosure claims priority from U.S. patent application 62/651,813 filed on 3 months 4 and 2019 and U.S. patent application 16/359,713 filed on 3 months 20, the disclosures of which are also incorporated herein in their entireties.

[ field of technology ]

The present disclosure relates generally to computer stereoscopic vision, and more particularly, to visual depth sensing through one or more of accurate and full-range depth fusion (full-range depth fusion) and sensing techniques.

[ background Art ]

Unless otherwise indicated herein, the approaches described in this section are not prior art to the scope of the claims in the list of sections and are not included in this section as prior art.

There are many techniques for range sensing and depth estimation to achieve computer stereo vision. For example, some prior art techniques include structured light, passive stereo matching (passive stereo), active stereo matching (active stereo), and time-of-flight (time-of-flight). However, none of the techniques may be combined with some of the techniques. Furthermore, in applications where Infrared (IR) projectors and IR cameras are used for depth sensing, the components used are often expensive. It is desirable to provide a solution that allows computer stereoscopic vision with stereoscopic matching by using off-the-shelf and relatively inexpensive components.

[ invention ]

According to a first aspect of the present invention, a depth fusion method is disclosed, comprising receiving a plurality of sensor signals of different types from a plurality of sensors; generating first depth-related information of a scene and second depth-related information of the scene based on the plurality of sensor signals; and fusing the first depth related information and the second depth related information to generate a fused depth map of the scene.

According to a second aspect of the present invention, a depth fusion device is disclosed, comprising a control circuit coupled to receive a plurality of sensor signals of different types from a plurality of sensors, such that during operation the control circuit performs operations comprising: generating first depth-related information of a scene and second depth-related information of the scene based on the plurality of sensor signals; and fusing the first depth related information and the second depth related information to generate a fused depth map of the scene.

These and other objects of the present invention will no doubt become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various figures and drawings.

[ description of the drawings ]

Fig. 1 illustrates an example scenario implemented according to the present disclosure.

Fig. 2 illustrates an example scenario of active stereo matching according to an implementation of the present disclosure.

Fig. 3 illustrates an example scenario of an implementation according to the present disclosure.

Fig. 4 illustrates an example scenario of an implementation according to this disclosure.

Fig. 5 illustrates an example scenario of an implementation according to the present disclosure.

Fig. 6 illustrates an example scenario of depth fusion according to an implementation of the present disclosure.

FIG. 7 illustrates an example fusion method according to an implementation of the disclosure.

FIG. 8 illustrates an example fusion method according to an implementation of the disclosure.

Fig. 9 illustrates an example apparatus according to an implementation of the disclosure.

Fig. 10 illustrates an example process according to an implementation of the disclosure.

Fig. 11 illustrates an example process according to an implementation of the disclosure.

Fig. 12 illustrates an example process according to an implementation of the disclosure.

[ detailed description ] of the invention

Certain terms are used throughout the following description and claims to refer to particular elements. As will be appreciated by those of ordinary skill in the art, electronic equipment manufacturers may refer to a component by a different name. The present disclosure is not intended to distinguish between components that differ in name but not function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "include, but are not limited to … …". Also, the term "coupled" is intended to mean either an indirect or direct electrical connection. Thus, if one device couples to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.

In various techniques for range sensing and depth estimation to achieve computer Stereo vision, such as structured light (structural-light), active dual lens Stereo matching (also known as Active Stereo matching), and time of flight (TOF), electromagnetic waves (e.g., laser or structured light) are emitted or otherwise projected into a scene, and Stereo matching may then be performed by utilizing projector-camera correspondence and/or camera-camera correspondence. Each depth estimation technique provides its own advantages. Under various proposed schemes according to the present disclosure, accurate and full range depth information may be provided by the proposed fusion method. More specifically, different device configurations of sensors (e.g., cameras) and projectors may be utilized such that the advantages of different depth estimation techniques may be fused together.

Fig. 1 illustrates an example scenario 100 implemented according to the present disclosure. Scene 100 illustrates a number of different device configurations of sensors and projectors. In portion (a) of the scene 100, a first example device configuration may involve two IR cameras, one IR projector, and one Red Green Blue (RGB) camera. In portion (B) of the scene 100, a second example device configuration may involve one IR camera, one IR projector, and one RGB camera (denoted herein as "RGB-IR camera") containing pixels capable of receiving pure IR light. In portion (C) of the scene 100, a third example device configuration may involve one RGB camera, one IR projector, and one RGB-IR camera. In part (D) of the scene 100, the fourth device configuration may involve two RGB cameras, one IR projector (or TOF projector) and one IR camera (or TOF camera).

In each exemplary device configuration, the physical distance between the two sensors/cameras is denoted as baseline. In each example device configuration, an RGB camera, which serves as an auxiliary RGB camera, may provide color information of a depth map to be generated. It may be necessary to calibrate a pair of cameras, a camera and a projector. For a pair of projector and camera, a structured light or TOF method can be applied, which generally has good accuracy. For a pair of cameras, a stereo algorithm (often good at integrity) may be applied to estimate depth. Under the proposed scheme, these results can be fused together to generate an accurate full range depth or depth map.

Notably, in the device configuration in the scene 100 and any variation thereof, each of the RGB camera and RGB-IR camera may be replaced by one of: a monochrome camera (monochrome camera) (denoted herein as "grayscale camera"), an RGB camera having a dual band-pass filtering (DB-dual band) capability with respect to visible light and IR light (denoted herein as "RGB-DB camera"), a monochrome camera (denoted herein as "grayscale IR (mono-IR) camera") containing pixels capable of receiving pure IR light, and a grayscale camera (denoted herein as "grayscale DB camera") having a dual band-pass filtering capability with respect to visible light and IR light. Further, each of the IR camera, RGB-IR camera, RGB-DB camera, gray scale IR camera, and gray scale DB camera may be interchangeably referred to as an Electromagnetic (EM) wave sensor, as each such camera is capable of sensing EM waves in the visible and/or invisible (e.g., IR) spectrum.

Active stereo matching IR features

Under the proposed scheme according to the present disclosure, structured IR light (structured IR light) (also referred to as patterned IR light (patterned IR light)) emitted or otherwise projected by an IR projector may meet one or more feature requirements (characteristic requirements). That is, one or more features of the patterned IR light may be used for active stereo matching by utilizing relatively inexpensive components (e.g., two cameras and one IR projector). Thus, cost savings can be achieved in computer stereoscopy through active stereo matching without resorting to relatively expensive components.

Fig. 2 illustrates an example scenario 200 of active stereo matching according to an implementation of the present disclosure. In the scene 200, active stereo matching may be performed using two cameras and one IR projector. Each of the two cameras may be an IR camera, an RGB-DB camera, a grayscale IR camera, or a grayscale DB camera.

In operation, the IR projector may emit or otherwise project patterned IR light toward a scene, and each of the two cameras may capture a respective image of the scene (e.g., a left camera capturing a left image of the scene and a right camera capturing a right image of the scene). As shown in fig. 2, active stereo matching may be performed on a given pixel or pixel block within a specified or predefined window in the left image and a corresponding pixel or pixel block within a specified or predefined window in the right image. The result of the active stereo matching may be used to generate a depth map.

Fig. 3 illustrates an example scenario 300 of an implementation according to this disclosure. The following description of the proposed scheme for IR features for active stereo matching is provided with reference to fig. 3.

Under the proposed solution according to the present disclosure, there may be no limitation or restriction on the shape of the IR pattern in the patterned IR light. That is, the IR pattern may be formed from a plurality of IR pixels formed, for example, but not limited to, as one or more points, one or more lines, one or more circles, one or more ellipses, one or more polygons, one or more stars, or a combination thereof. The IR pattern may vary from device to device (e.g., from one IR projector to another). In other words, each device may be different from the other devices. Examples of different IR modes are shown in part (a) of the scene 300.

Under the proposed scheme according to the present disclosure, the density of the IR pattern of the patterned IR light may be sufficiently high such that each pixel block may be distinguishable. The density can be expressed as (number or appearance of pixels per unit area (occurrence of pixels/unit area)), and the unit area can be expressed as (width x height pixels) ² ). Referring to portion (B) of scene 300, the density of the IR pattern of interest may be relative to the density of a specified or predetermined window within the captured IR image having a plurality of IR patterns (e.g., from an IR camera, RGB-DB camera, gray scale IR camera, or gray scale DB camera). Part (B) of the scene 300 also shows the search direction for active stereo matching.

Thus, under the proposed scheme, the pattern of patterned IR light may comprise a plurality of pixels having a density that meets the density requirement, such that (number of IR pixels/total number of pixels in a predefined window within the captured IR image)). Gtoreq.a first threshold. Here, a first threshold (or threshold 1) may be used to constrain the density of IR patterns in a given window of the IR image. Furthermore, the threshold value 1 may be determined by the quality of the output depth map. The value of threshold 1 may be, for example, 0.2 in 1/pixel (a unit of 1/pixel).

Under the proposed scheme according to the present disclosure, in the case where a given pattern is repeated a plurality of times (denoted herein as a "repeated pattern"), the repetition period of the repeated pattern along the search direction of the active stereo matching may be greater than the operation range of the active stereo matching. The operating range may be, for example, but is not limited to, a predefined window equivalent to the specified window shown in part (B) of the scene 300. For illustrative purposes, and not limitation, portion (C) of scene 300 shows an example in which the repetition period of the repeating pattern is less than the operating range.

Under the proposed solution according to the present disclosure, the ambiguity in the search range along the search direction of stereo matching may be relatively low. The cost function (cost function) of the target may be used to calculate the blur degree for each pixel or pixel. And the value associated with the second smallest cost value is the ambiguity value. The ambiguity value should be below threshold (e.g., 0.8).

The ambiguity for each pixel or pixel block may be calculated using a defined cost function. Fig. 4 illustrates an example scenario 400 of an implementation according to this disclosure. In scene 400, the ambiguity value for each pixel or each block of pixels within the search range along the search direction of the stereo matching is less than or equal to a second threshold (or threshold 2), which may be 0.8, for example. For example, in performing stereo matching of a left image and a right image, a cost function including a cost value (or matching cost value) using a difference between the left image of the left view and the right image of the right view is used to calculate the blur value. The cost function can be expressed mathematically as:

here the number of the elements is the number,may represent the brightness (or color response) of the current pixel within a block (patch) in the left image (denoted as "P _L ") and->May represent the luminance (or color response) of the current pixel within the reference block in the right image during stereo matching (denoted as "P _R "). Under the proposed scheme, when calculating the ambiguity value using the cost function, the ambiguity value may be calculated by dividing the minimum cost value from the cost function by the second minimum cost value from the cost function.

Under the proposed scheme according to the present disclosure, the tilt (tilt) or rotation (rotation) angle of the IR pattern can be utilized to reduce the repeatability of the repeated pattern along the stereo matching direction to follow the constraint of low ambiguity. The absolute value of the inclination or rotation angle may be greater than 0 ° and less than 90 °. Fig. 5 illustrates an example scenario 500 of an implementation according to this disclosure. In part (a) of the scene 500, the repeating direction of the repeating pattern coincides with or is parallel to the search direction of the stereo matching. In part (B) of the scene 500, the repetition direction of the repeated pattern is rotated with respect to the search direction of the stereo matching. In portion (C) of scene 500, the absolute value of the tilt/rotation angle may be greater than 0 ° and less than 90 °.

Depth fusion (depth fusion)

Under the proposed scheme according to the present disclosure, accurate and full-range depth information can be obtained by fusing depth information from different depth estimation techniques such as structured light, passive stereo matching (passive stereo), active stereo matching (active stereo), and TOF. Fig. 6 illustrates an example scenario 600 of depth fusion according to an implementation of the present disclosure. The scene 600 may involve an EM wave projector and two sensors. EM wave projectors may be used to emit or otherwise project patterns. The EM wave projector may be, for example, an IR projector or a TOF projector (e.g., a light (light detection and ranging, abbreviated as LiDAR) projector). The two sensors may be a pair of cameras or a camera plus a TOF sensor. As the camera, each sensor may be an RGB camera, an RGB-IR camera, an RGB-DB camera, a grayscale IR camera, or a grayscale DB camera.

In operation, the EM wave projector may emit or otherwise project a pattern, and a depth map and a confidence map (confidence map) may be obtained by a structured light method or a TOF method using the captured pattern from the first camera of the two sensors. Additionally, the depth map and the confidence map may be obtained by a stereo method (e.g., active stereo and/or passive stereo) using a captured pattern from a first camera and a captured pattern from a second camera of the two sensors. The depth map and confidence map from the structured light/TOF method and the depth map and confidence map from the stereoscopic method may then be fused together by depth fusion to generate a fused depth map (fused depth map). In the scene 600, depth fusion may be performed by using the fusion method 700 and the fusion method 800 described below.

Fig. 7 illustrates an example fusion method 700 according to an implementation of the disclosure. In fusion method 700, a depth map from one of the methods (e.g., structured light/TOF method) may be first remapped and then fused with depth maps from the other method (e.g., stereoscopic method) by considering the confidence map of the structured light/TOF method and the confidence map of the stereoscopic method to provide a fused result. Post-processing may then be performed on the fusion results to generate a fusion depth map. Because of the baseline differences between different methods, it is necessary to remap depth maps from one of the methods.

In fusion method 700, given a confidence map of the method, the confidence (initial peak ratio (peak ratio)) A) can be expressed as follows:

C _PKRN =second least cost value/least cost value

Here, the cost value may be generated by an algorithm, for example, taking the absolute difference (absolute difference) between the two captured images, which may be expressed as follows:

here, I represents image intensity, P _L Representing pixels (or blocks of pixels) in the left image, P _R Representing pixels (or blocks of pixels) in the right image. For post-processing, post-processing in the fusion approach may involve edge-aware filtering and segmentation. Furthermore, depth fusion can be expressed as follows:

D(p)＝argmax _d (Conf(stereo(p)),Conf(structured light(p)))

Here, p may represent each pixel in a given depth map, and Conf () may represent a confidence map.

Fig. 8 illustrates an example fusion method 800 according to an implementation of the disclosure. The fusion method 800 may be similar to the fusion method 700 with some differences. For example, in the fusion method 800, the fusion may be estimated at the cost volume (cost volume) stage. The cost volumes generated by each of the structured light method and the stereo method may be combined and optimized to obtain a more correct depth map. In fusion method 800, depth fusion may be represented as follows:

Cost(p,d)＝weight _Conf(stereo) x cost _stereo(p,d) +weight _{Conf(structured} _light) x cost _structured _light(p,d)

under the proposed scheme, the fusion method 700 and the fusion method 800 can be used independently for different situations and applications. For a typical implementation with two sensors and one projector, the depth quality may be significantly different when covering any one component, which is important for ambiguity.

Fig. 9 illustrates an example apparatus 900 according to an implementation of the disclosure. The device 900 can perform various functions to implement the visual depth sensing related processes, schemes, techniques, processes and methods described herein with accurate and full range depth fusion and sensing and IR pattern features for active stereo matching, including as described above with respect to Various processes, scenarios, schemes, solutions, concepts and technologies are described and processes 1000, 1100 and 1200 are described below.

The apparatus 900 may be part of an electronic device, portable or mobile device, a wearable device, a wireless communication device, or a computing device. For example, apparatus 900 may be implemented in a smart phone, a smart watch, a personal digital assistant, a digital camera, or a computing device such as a tablet, laptop, or notebook computer. Furthermore, the device 900 may also be part of a machine type device, which may be an IoT or NB-IoT device such as a non-mobile or fixed device, a home device, a wired communication device, or a computing device. For example, the apparatus 900 may be implemented in a smart thermostat, a smart refrigerator, a smart door lock, a wireless speaker, or a home control center. Alternatively, apparatus 900 may be implemented in the form of one or more collective circuit (IC) die, such as, but not limited to, one or more single-core processors, one or more multi-core processors, one or more reduced-instruction-set-computing (RISC) processors, or one or more complex instruction-set computing (CISC) processors.

Apparatus 900 may include at least some of those components shown in fig. 9, such as control circuitry 910, a plurality of sensors 920 (1) -920 (N), where N is a positive integer greater than 1, and at least one EM wave projector 930. The apparatus 900 may also include one or more other components (e.g., an internal power source, a display device, and/or a user interface device) that are not relevant to the proposed solution of the present disclosure, and thus, such components of the apparatus 900 are not shown in fig. 9, and are not described below for simplicity and brevity.

In one aspect, the control circuit 910 may be implemented in the form of an electronic circuit that includes various electronic components. Alternatively, the control circuitry 910 may be implemented in the form of one or more single-core processors, one or more multi-core processors, one or more RISC processors, or one or more CISC processors. That is, even though the singular term "processor" is used herein to refer to the control circuit 910, the control circuit 910 may include multiple processors in some implementations, and may include a single processor in other implementations according to the present disclosure. In another aspect, the apparatus 910 may be implemented in hardware (and optionally firmware) having electronic components including, for example, but not limited to, one or more transistors, one or more diodes, one or more capacitors, one or more resistors, one or more inductors, one or more memristors, and/or one or more varactors, configured and arranged to achieve a particular objective in accordance with the present disclosure. In other words, in at least some embodiments, the control circuit 910 is a special purpose machine specifically designed, arranged, and configured to perform specific tasks related to visual depth sensing, with accurate and full range depth fusion and sensing, and IR pattern features for active stereo matching in accordance with various embodiments of the present disclosure. In some implementations, the control circuitry 910 may include electronic circuitry with hardware components that implement one or more of the various proposed schemes according to the present disclosure. Alternatively, in addition to hardware components, control circuitry 910 may utilize software code and/or instructions in addition to hardware components to implement visual depth sensing with accurate and full range depth fusion and sensing as well as IR mode features for active stereo matching in accordance with various embodiments of the present disclosure.

Each of the plurality of sensors 920 (1) -920 (N) may be a camera or a TOF sensor. In the context of a camera, the corresponding sensor may be an IR camera, an RGB camera, a grayscale camera, an RGB-IR camera, a grayscale IR camera, an RGB-BD camera or a grayscale BD camera. The EM wave projector may be an IR projector or a TOF projector.

Under various proposed schemes according to the present disclosure, with respect to visual depth sensing with accurate and full range depth fusion and sensing, control circuit 910 may receive multiple sensor signals of different types from multiple sensors 920 (1) to 920 (N). In addition, the control circuit 910 may generate first depth-related information of the scene and second depth-related information of the scene based on the plurality of sensor signals. Further, the control circuit 910 may fuse the first depth related information and the second depth related information to generate a fused depth map of the scene.

In some embodiments, upon receiving multiple sensor signals of different types from multiple sensors 920 (1) -920 (N), control circuitry 910 may receive the multiple sensor signals from two or more of: RGB camera, gray-scale camera, IR camera, RGB-IR camera, gray-scale infrared camera, RGB-DB camera, gray-scale DB camera, and TOF sensor.

In some implementations, the control circuit 910 may perform a plurality of operations in generating the first depth related information and the second depth related information. For example, the control circuit 910 may generate a first depth map and a first confidence map based on at least a first sensor signal of the plurality of sensor signals 920 (1) -920 (N) of the first type. In addition, the control circuit 910 may generate a second depth map and a second confidence map based on at least a second sensor signal of the plurality of sensor signals 920 (1) -920 (N) of a second type different from the first type.

In some implementations, in generating the first depth map and the first confidence map, the control circuit 910 may generate the first depth map and the first confidence map using a structured light method or a TOF method. In some implementations, in generating the second depth map and the second confidence map, the control circuit 910 may generate the second depth map and the second confidence map using an active stereoscopic approach or a passive stereoscopic approach.

In some implementations, the control circuit 910 may perform several operations in fusing the first depth related information and the second depth related information to generate a fused depth map. For example, the control circuitry 910 may remap the first depth map with respect to the second depth map to generate a remapped first depth map. Further, the control circuitry 910 may fuse the remapped first depth map, second depth map, first confidence map, and second confidence map to provide a fused result. Further, the control circuit 910 may perform post-processing on the fusion result to generate a fusion depth map.

Alternatively, the control circuit 910 may perform other operations when fusing the first depth-related information and the second depth-related information to generate a fused depth map. For example, the control circuitry 910 may remap the first depth map with respect to the second depth map to generate a remapped first depth map. Additionally, the control circuitry 910 may estimate an amount of cost associated with generating the first depth map and the first confidence map. Further, the control circuitry 910 may fuse the remapped first depth map, second depth map, first confidence map, second confidence map, and cost amount to provide a fused result. Further, the control circuit 910 may perform post-processing on the fusion result to generate a fusion depth map. In addition, in generating the first depth map and the first confidence map, the control circuit 910 may generate the first depth map and the first confidence map using a structured light method or a TOF method.

In some implementations, in estimating the cost amount, the control circuit 910 may estimate the cost amount by calculating a combination of the weighted cost associated with the stereoscopic approach and the weighted cost associated with the structured light approach.

In some implementations, the control circuit 910 may perform several operations in fusing the first depth related information and the second depth related information to generate a fused depth map. For example, the control circuit 910 may determine whether to fuse the first depth-related information and the second depth-related information using the first fusion method or the second fusion method. Then, based on the result of the determination, the control circuit 910 may fuse the first depth-related information and the second depth-related information using the first fusion method or the second fusion method. The first fusion method may include: (a1) Remapping the first depth map relative to the second depth map to generate a remapped first depth map; (b1) Fusing the remapped first depth map, second depth map, first confidence map, and second confidence map to provide a fused result; (c1) And carrying out post-processing on the fusion result to generate a fusion depth map. The second fusion method may include: (a2) Remapping the first depth map relative to the second depth map to generate a remapped first depth map; (b2) Estimating a cost amount associated with generating the first depth map and the first confidence map; (c2) Fusing the remapped first depth map, second depth map, first confidence map, second confidence map, and cost amount to provide a fused result; (d2) And carrying out post-processing on the fusion result to generate a fusion depth map. In some embodiments, in the second fusion method, the first depth map and the first confidence map may be generated using a structured light method or a TOF method.

In some implementations, the control circuitry 910 may control the EM wave projector 930 to emit electromagnetic waves toward the scene. EM wave projector 930 may include an IR projector or a TOF projector.

In some embodiments, the control circuit 910 may calibrate a pair of the plurality of sensors 920 (1) -920 (N) or one of the plurality of sensors 920 (1) -920 (N) and the EM wave projector 930.

Under the proposed scheme according to the present disclosure, with respect to IR pattern features for active stereo matching, the control circuit 910 may control the EM wave projector 920 (e.g., an IR projector) to project patterned IR light. Further, the control circuit 910 may receive first data of a left image of a scene from a first camera (e.g., sensor 920 (1)) and second data of a right image of the scene from a second camera (e.g., sensor 920 (2)). Further, the control circuitry 910 may perform stereo matching (e.g., active stereo matching) of the left image and the right image to generate a depth map of the scene. The patterned IR light may meet one or more feature requirements.

In some implementations, the pattern of patterned IR light can include a plurality of IR pixels forming one or more points, one or more lines, one or more circles, one or more ellipses, one or more polygons, one or more star shapes, or a combination thereof.

In some implementations, the pattern of patterned IR light can include a plurality of pixels whose density meets a density requirement such that a number of IR pixels divided by a total number of pixels in a predefined window within the left image or the right image is greater than or equal to a first threshold.

In some implementations, the first threshold may be 0.2.

In some implementations, the patterned IR light can include multiple instances of the repeating pattern. In this case, the repetition period of the repeated pattern along the search direction of the stereo matching may be greater than the operation range of the stereo matching.

In some embodiments, the repeat direction of the repeat pattern may be inclined at an angle greater than 0 ° and less than 90 ° relative to the search direction of the stereo matching.

In some implementations, the ambiguity value for each pixel or each pixel block within the search range along the stereo matching direction may be less than or equal to the second threshold.

In some implementations, in performing stereo matching of the left and right images, the control circuit 910 may calculate the ambiguity values using a cost function of cost values of differences between the left and right images. In some implementations, the cost function may be expressed mathematically as follows:

here the number of the elements is the number, Can represent the brightness of the current pixel within the block in the left image, and +.>Can represent the brightness of the current pixel within the reference block in the right image during stereo matching。

In some implementations, when calculating the ambiguity value using the cost function, the control circuit 910 may calculate the ambiguity value by dividing the minimum cost value from the cost function by the second minimum cost value from the cost function.

In some implementations, the second threshold may be 0.8.

In some implementations, in performing stereo matching of the left and right images, the control circuit 910 may perform active stereo matching of the left and right images.

Under another proposed scheme according to the present disclosure, with respect to IR pattern features for active stereo matching, the control circuit 910 may control the EM wave projector 930 (e.g., an IR projector) to project patterned IR light. In addition, the control circuit 910 may receive first data of a left image of a scene from a first camera (e.g., sensor 920 (1)) and second data of a right image of the scene from a second camera (e.g., sensor 920 (2)). In addition, the control circuitry 910 may perform active stereo matching of the left and right images to generate a depth map of the scene. The patterned IR light may satisfy one or more of a plurality of feature requirements. In some implementations, the plurality of feature requirements can include: (1) Patterning a pattern of IR light comprising a plurality of pixels whose density meets a density requirement such that a number of IR pixels divided by a total number of pixels in a predefined window within the left image or the right image is greater than or equal to a first threshold; (2) Patterning the IR light, including instances of multiple repeating patterns, such that a repetition period of the repeating patterns along a search direction of the stereo matching is greater than an operational range of the stereo matching; (3) The repeating direction of the repeating pattern is rotated by an angle of more than 0 DEG and less than 90 DEG with respect to the search direction of the stereo matching.

here the number of the elements is the number,can represent the brightness of the current pixel within the block in the left image, and +.>The luminance of the current pixel within the reference block in the right image during stereo matching may be represented. Further, the first threshold may be 0.2 and the second threshold may be 0.8.

Fig. 10 illustrates an example process 1000 according to an implementation of the disclosure. Process 1000 (whether partial or complete) may be an example implementation of various processes, scenarios, concepts, solutions, concepts and technologies, or combinations thereof, with accurate and full range depth fusion and sensed visual depth sensing according to the present disclosure. Process 1000 may represent an aspect of an implementation of features of apparatus 900. Process 1000 may include one or more operations, actions, or functions as illustrated by one or more of blocks 1010, 1020, and 1030. Although shown as discrete blocks, the various blocks of process 1000 may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation. Further, the blocks of process 1000 may be performed in the order shown in fig. 10, or may be performed in a different order. Further, one or more blocks of process 1000 may be repeated one or more times. Process 1000 may be implemented by apparatus 900 or any variation thereof. For illustrative purposes only and not limitation, process 1000 is described below in the context of device 900. Process 1000 may begin at block 1010.

At 1010, process 1000 may involve control circuitry 910 receiving a plurality of different types of sensor signals from a plurality of sensors 920 (1) -920 (N). Process 1000 may proceed from 1010 to 1020.

At 1020, process 1000 may involve control circuitry 910 generating first depth-related information for a scene and second depth-related information for the scene based on the plurality of sensor signals. Process 1000 may proceed from 1020 to 1030.

At 1030, the process 1000 may involve the control circuitry 910 fusing the first depth-related information and the second depth-related information to generate a fused depth map of the scene.

In some implementations, upon receiving multiple sensor signals of different types from multiple sensors 920 (1) -920 (N), process 1000 may involve control circuitry 910 receiving multiple sensor signals from two or more of the following: RGB camera, gray-scale camera, IR camera, RGB-IR camera, gray-scale IR camera, RGB-DB camera, gray-scale DB camera, and TOF sensor.

In some implementations, the process 1000 may involve the control circuitry 910 performing a plurality of operations in generating the first depth related information and the second depth related information. For example, process 1000 may involve control circuitry 910 generating a first depth map and a first confidence map based on at least a first sensor signal of a plurality of sensor signals 920 (1) -920 (N) of a first type. Additionally, the process 1000 may involve the control circuit 910 generating a second depth map and a second confidence map based on at least a second sensor signal of the plurality of sensor signals 920 (1) -920 (N) of a second type different from the first type.

In some implementations, in generating the first depth map and the first confidence map, the process 1000 may involve the control circuit 910 generating the first depth map and the first confidence map using a structured light method or a TOF method. In some implementations, in generating the second depth map and the second confidence map, the process 1000 may involve the control circuit 910 generating the second depth map and the second confidence map using an active stereoscopic approach or a passive stereoscopic approach.

In some implementations, the process 1000 may involve the control circuitry 910 performing a plurality of operations in fusing the first depth-related information and the second depth-related information to generate a fused depth map. For example, process 1000 may involve control circuitry 910 remapping the first depth map relative to the second depth map to generate the remapped first depth map. Further, the process 1000 may involve the control circuitry 910 fusing the remapped first depth map, second depth map, first confidence map, and second confidence map to provide a fused result. Further, process 1000 may involve control circuitry 910 performing post-processing on the fusion results to generate a fusion depth map.

Alternatively, the process 1000 may involve the control circuitry 910 performing other operations when fusing the first depth-related information and the second depth-related information to generate a fused depth map. For example, process 1000 may involve control circuitry 910 remapping the first depth map relative to the second depth map to generate the remapped first depth map. Additionally, the process 1000 may involve the control circuitry 910 estimating an amount of cost associated with generating the first depth map and the first confidence map. Further, the process 1000 may involve the control circuitry 910 fusing the remapped first depth map, second depth map, first confidence map, second confidence map, and cost amount to provide a fused result. Further, process 1000 may involve control circuitry 910 performing post-processing on the fusion results to generate a fusion depth map. Additionally, in generating the first depth map and the first confidence map, the process 1000 may involve the control circuit 910 generating the first depth map and the first confidence map using a structured light method or a TOF method.

In some implementations, in estimating the cost amount, process 1000 may involve control circuitry 910 estimating the cost amount by calculating a combination of a weighted cost associated with the stereoscopic approach and a weighted cost associated with the structured light approach.

In some implementations, the process 1000 may involve the control circuitry 910 performing a plurality of operations in fusing the first depth-related information and the second depth-related information to generate a fused depth map. For example, process 1000 may involve control circuitry 910 determining whether to fuse the first depth-related information and the second depth-related information using a first fusion method or a second fusion method. Then, based on the result of the determination, the process 1000 may involve the control circuit 910 fusing the first depth-related information and the second depth-related information using the first fusion method or the second fusion method. The first fusion method may include: (a1) Remapping the first depth map relative to the second depth map to generate a remapped first depth map; (b1) Fusing the remapped first depth map, second depth map, first confidence map, and second confidence map to provide a fused result; (c1) And carrying out post-processing on the fusion result to generate a fusion depth map. The second fusion method may include: (a2) Remapping the first depth map relative to the second depth map to generate a remapped first depth map; (b2) Estimating a cost amount associated with generating the first depth map and the first confidence map; (c2) Fusing the remapped first depth map, second depth map, first confidence map, second confidence map, and cost amount to provide a fused result; (d2) And carrying out post-processing on the fusion result to generate a fusion depth map. In some embodiments, in the second fusion method, the first depth map and the first confidence map may be generated using a structured light method or a TOF method.

In some implementations, the process 1000 may further involve the control circuitry 910 controlling the electromagnetic wave projector (electromagnetic wave projector) to emit electromagnetic waves toward the scene. The electromagnetic wave projector may comprise an IR projector or a TOF projector.

In some implementations, the process 1000 may further include the control circuit 910 calibrating a pair of the plurality of sensors or one of the plurality of sensors plus the electromagnetic wave projector.

Fig. 11 illustrates an example process 1100 according to an implementation of the disclosure. Process 1100 may be an example implementation of various processes, scenarios, concepts, solutions, concepts and technologies, or combinations thereof, regarding part or all of the IR pattern features of active stereo matching according to the present disclosure. Process 1100 may represent an aspect of an implementation of the features of apparatus 900. Process 1100 may include one or more operations, acts, or functions, as illustrated by one or more of blocks 1110, 1120, and 1130. Although shown as discrete blocks, the various blocks of process 1100 may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation. Further, the blocks of process 1100 may be performed in the order shown in fig. 11, or may be performed in a different order. Further, one or more blocks of process 1100 may be repeated one or more times. Process 1100 may be implemented by apparatus 900 or any variation thereof. For illustrative purposes only and not limitation, process 1100 is described below in the context of device 900. Process 1100 may begin at block 1110.

At 1110, process 1100 can involve control circuitry 910 controlling an electromagnetic wave projector 930 (e.g., an IR projector) to project patterned IR light. Process 1100 may proceed from 1110 to 1120.

At 1120, the process 1100 may involve the control circuitry 910 receiving first data of a left image of a scene from a first camera (e.g., sensor 920 (1)) and receiving second data of a right image of the scene from a second camera (e.g., sensor 920 (2)). Process 1100 may proceed from 1120 to 1130.

At 1130, process 1100 may involve control circuitry 910 performing stereo matching (e.g., active stereo matching) of the left image and the right image to generate a depth map of the scene. The patterned IR light may meet one or more feature requirements.

In some implementations, the first threshold may be 0.2.

In some embodiments, the repeat direction of the repeat pattern may be rotated by an angle greater than 0 ° and less than 90 ° absolute with respect to the search direction of the stereo matching.

here the number of the elements is the number,can represent the brightness of the current pixel within the block in the left image, and +.>The luminance of the current pixel within the reference block in the right image during stereo matching may be represented.

In some implementations, in calculating the ambiguity values using the cost function, the process 1100 may involve the control circuit 910 calculating the ambiguity values by dividing the minimum cost value from the cost function by the second minimum cost value from the cost function.

In some implementations, the second threshold may be 0.8.

In some implementations, in performing stereo matching of left and right images, process 1100 may involve control circuit 910 performing active stereo matching of left and right images.

Fig. 12 illustrates an example process 1200 according to an implementation of the disclosure. Process 1200 may be an example implementation of various processes, scenarios, concepts, solutions, concepts and technologies, or combinations thereof, regarding part or all of the IR pattern features of active stereo matching according to the present disclosure. Process 1200 may represent an aspect of an implementation of features of apparatus 900. Process 1200 may include one or more operations, actions, or functions as illustrated by one or more of blocks 1210, 1220, and 1230. Although shown as discrete blocks, the various blocks of process 1200 may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation. Further, the blocks of process 1200 may be performed in the order shown in fig. 12, or may be performed in a different order. Further, one or more blocks of process 1200 may be repeated one or more times. Process 1200 may be implemented by apparatus 900 or any variation thereof. For illustrative purposes only and not limitation, process 1200 is described below in the context of device 900. Process 1200 may begin at block 1210.

At 1210, the process 1200 can involve the control circuitry 910 controlling the electromagnetic wave projector 930 (e.g., an IR projector) to project patterned IR light. Process 1200 may proceed from 1210 to 1220.

At 1220, the process 1200 may involve the control circuit 910 receiving first data of a left image of a scene from a first camera (e.g., sensor 920 (1)) and receiving second data of a right image of the scene from a second camera (e.g., sensor 920 (2)). Process 1200 may proceed from 1220 to 1230.

At 1230, process 1200 may involve control circuitry 910 performing active stereo matching of the left image and the right image to generate a depth map of the scene. The patterned IR light may satisfy one or more of a plurality of feature requirements.

In some implementations, the plurality of feature requirements can include: (1) The pattern of patterned IR light includes a plurality of pixels having a density that meets a density requirement such that a number of IR pixels divided by a total number of pixels in a predefined window within the left image or the right image is greater than or equal to a first threshold; (2) The patterned IR light includes multiple instances of repeating patterns such that a repetition period of the repeating patterns along a search direction of the stereo matching is greater than an operational range of the stereo matching; (3) The repeating direction of the repeating pattern is rotated by an angle of more than 0 DEG and less than 90 DEG with respect to the search direction of the stereo matching.

In some implementations, in performing stereo matching of left and right images, the process 1200 may involve the control circuit 910 calculating the ambiguity values using a cost function of cost values of differences between the left and right images. In some implementations, the cost function may be expressed mathematically as follows:

In some implementations, in calculating the ambiguity values using the cost function, the process 1200 may involve the control circuit 910 calculating the ambiguity values by dividing the minimum cost value from the cost function by the second minimum cost value from the cost function.

The subject matter described herein sometimes illustrates different components contained within, or connected with, other different components. It should be understood that: the architecture thus depicted is merely exemplary, and many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved. Thus, any two components herein combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Similarly, any two components so associated can also be viewed as being "operably connected," or "operably coupled," to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being "operably couplable," to each other to achieve the desired functionality. Specific examples of "operatively couplable" include, but are not limited to: physically couplable and/or physically interactable, and/or wirelessly interactable, and/or logically interactable.

Furthermore, with respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application.

It will be understood by those within the art that, in general, terms used herein, and especially those used in the appended claims (e.g., bodies in the appended claims) are generally intended as "open" terms (e.g., the term "comprising" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," the term "comprising" should be interpreted as "including but not limited to," etc.). Those skilled in the art will also understand that if a claim is intended to introduce a specific number of objects, such intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim scope recitation. However, the use of such phrases should not be construed to be: recitation of claim recitations of the indefinite articles "a" or "an" limits any claim range containing such recitation to inventions containing only one such recitation, even when the same claim range contains the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., "a" and/or "an") should typically be interpreted to mean "at least one" or "one or more"); the same applies to the introduction of the claim recitation by definite articles. In addition, even if the introduced claims recite a specific number of an object, those skilled in the art will recognize that: such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitation objects" without other modifiers typically means at least two recitation objects, or two or more recitation objects). Further, where a convention analogous to "at least one of A, B and C, etc." is used, such a construction in general is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to a system having a alone, B alone, C, A and B together alone, a and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, such a construction in general is intended in the sense such convention being understood by one having skill in the art (e.g., "a system having at least one of A, B or C" would include but not be limited to a system having a alone, B alone, C, A and B alone, a and C together, B and C together, and/or A, B and C together, etc.). Those skilled in the art will further appreciate that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "a or B" should be understood to encompass the possibilities of "a", "B", or "a and B".

Although several exemplary techniques have been described and illustrated herein using different methods, devices, and systems, those skilled in the art will appreciate that: various other modifications may be made, and equivalents substituted, without departing from claimed subject matter. In addition, many modifications may be made to adapt a particular situation to the teachings of the claimed subject matter without departing from the central concept described herein. It is intended, therefore, that the claimed subject matter not be limited to the particular examples disclosed, but that such claimed subject matter may also include all implementations falling within the scope of the appended claims, and equivalents thereof.

The foregoing description is only of the preferred embodiments of the invention, and all changes and modifications that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

1. A depth fusion method, comprising:

receiving a plurality of sensor signals of different types from a plurality of sensors;

generating first depth-related information of a scene and second depth-related information of the scene based on the plurality of sensor signals, comprising:

generating a first depth map and a first confidence map based on at least a first sensor signal of the plurality of sensor signals of a first type; and

Generating a second depth map and a second confidence map based on at least a second sensor signal of the plurality of sensor signals of a second type different from the first type; and

fusing the first depth-related information and the second depth-related information to generate a fused depth map of the scene, comprising:

remapping the first depth map relative to the second depth map to generate a remapped first depth map;

estimating a cost amount associated with generating the first depth map and the first confidence map;

fusing the remapped first depth map, the second depth map, the first confidence map, the second confidence map, and the cost amount to provide a fused result; and

post-processing the fusion result to generate the fusion depth map,

wherein generating the first depth map and the first confidence map comprises generating the first depth map and the first confidence map using a structured light method or a time of flight method.

2. The depth fusion method of claim 1, wherein receiving the plurality of types of different sensor signals from the plurality of sensors comprises receiving the plurality of sensor signals from two or more of a red-green-blue camera, a gray-scale camera, an infrared camera, a red-green-blue infrared camera, a gray-scale infrared camera, a red-green-blue camera with dual band-pass filtering, a gray-scale camera with dual band-pass filtering, and a time-of-flight sensor.

3. The depth fusion method of claim 1, wherein generating the second depth map and the second confidence map comprises generating the second depth map and the second confidence map using an active stereo method or a passive stereo method.

4. The depth fusion method of claim 1, wherein estimating the cost amount comprises estimating the cost amount by calculating a combination of a weighted cost associated with the stereo method and a weighted cost associated with the structured light method.

5. The depth fusion method of claim 1, further comprising:

controlling the electromagnetic wave projector to emit electromagnetic waves to the scene,

wherein the electromagnetic wave projector comprises an infrared projector or a time-of-flight projector.

6. The depth fusion method of claim 5, further comprising:

a pair of the plurality of sensors or one of the plurality of sensors plus the electromagnetic wave projector are calibrated.

7. A depth fusion device, comprising:

a control circuit coupled to receive a plurality of sensor signals of different types from a plurality of sensors such that during operation, the control circuit performs operations comprising:

post-processing the fusion result to generate the fusion depth map,

8. The depth fusion device of claim 7, further comprising:

the plurality of sensors includes two or more of a red, green, blue camera, a grayscale camera, an infrared camera, a red, green, blue camera with dual band pass filtering, a grayscale camera with dual band pass filtering, and a time of flight sensor.

9. The depth fusion apparatus of claim 7, wherein the control circuit generates the second depth map and the second confidence map using an active stereo method or a passive stereo method when generating the second depth map and the second confidence map.

10. The depth fusion apparatus of claim 7, wherein in estimating the cost amount, the control circuit estimates the cost amount by calculating a combination of a weighted cost associated with the stereoscopic approach and a weighted cost associated with the structured light approach.

11. The depth fusion apparatus of claim 7, further comprising:

an electromagnetic wave projector; and

the plurality of sensors may be configured to detect a plurality of sensors,

wherein during operation, the control circuit further controls the electromagnetic wave projector to emit electromagnetic waves toward the scene, wherein the electromagnetic wave projector comprises an infrared projector or a time-of-flight projector.

12. The depth fusion apparatus of claim 11, wherein during operation, the control circuit further calibrates a pair of the plurality of sensors or one of the plurality of sensors plus the electromagnetic wave projector.