CN113160298B - Depth truth value acquisition method, device and system and depth camera

Info

Publication number
CN113160298B
CN113160298B
Authority
CN
China
Prior art keywords
target, camera, depth, image, truth value
Prior art date
Legal status: Active
Application number
CN202110347156.7A
Other languages
Chinese (zh)
Other versions
CN113160298A (en)
Inventor
兰富洋
袁正刚
杨鹏
王兆民
黄源浩
肖振中
Current Assignee
Orbbec Inc
Original Assignee
Orbbec Inc
Priority date
Filing date
Publication date
Application filed by Orbbec Inc filed Critical Orbbec Inc
Priority to CN202110347156.7A
Publication of CN113160298A
Application granted
Publication of CN113160298B
Status: Active

Classifications

    • G06T 7/593: Depth or shape recovery from multiple images; from stereo images
    • G06T 7/514: Depth or shape recovery from specularities
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/85: Stereo camera calibration
    • G06T 2207/10012: Image acquisition modality; stereo images
    • G06T 2207/10048: Image acquisition modality; infrared image


Abstract

The application relates to the technical fields of computer vision and camera calibration, and provides a depth truth value acquisition method, device and system and a depth camera. The depth truth value acquisition method includes the following steps: acquiring the respective internal parameters of a binocular camera and a target camera, the external parameters of the binocular camera, and the external parameters between either camera of the binocular camera and the target camera; when the target camera is turned on, acquiring a target image of a target scene; when the binocular camera and the projection module are turned on and the projection module is rotated and/or translated, acquiring a plurality of pairs of left-eye and right-eye images of the target scene; calculating the parallaxes of the plurality of pairs of left-eye and right-eye images of the target scene and computing a fusion disparity map; and calculating depth information of the target scene from the fusion disparity map and obtaining the depth truth value of the target camera using the internal and external parameters of the target camera. Embodiments of the application can conveniently generate a high-precision depth map aligned pixel by pixel with the imaging device of the target equipment.

Description

Depth truth value acquisition method, device and system and depth camera
Technical Field
The invention belongs to the technical field of computer vision and camera calibration, and particularly relates to a depth truth value acquisition method, device and system and a depth camera.
Background
In order to obtain higher-quality three-dimensional information and overcome some problems that are difficult to solve with conventional three-dimensional measurement methods, more and more three-dimensional measurement approaches compute depth with deep learning algorithms. For example, three-dimensional information of a scene can be reconstructed from a single RGB image or grayscale image by deep learning, which alleviates the depth errors caused by multipath interference and scattering in indirect time-of-flight (iToF) measurement. However, deep learning requires a large amount of data and depth truth values (ground truth) as training samples, and the number of training samples and the accuracy of the depth truth values directly determine the final performance of the deep learning algorithm.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a depth truth value obtaining method, device, system and depth camera, which can solve one or more technical problems in the related art.
In a first aspect, an embodiment of the present application provides a depth truth value obtaining method, including:
acquiring the respective internal parameters of a binocular camera and a target camera, the external parameters of the binocular camera, and the external parameters between either camera of the binocular camera and the target camera;
when the target camera is turned on, acquiring a target image of a target scene; when the binocular camera and the projection module are turned on and the projection module is rotated and/or translated, acquiring a plurality of pairs of left-eye and right-eye images of the target scene;
calculating the parallaxes of the plurality of pairs of left-eye and right-eye images of the target scene, and calculating a fusion disparity map;
and calculating depth information of the target scene according to the fusion disparity map, and acquiring a depth truth value of the target camera by utilizing internal parameters and external parameters of the target camera.
The present embodiment advantageously generates a high-precision depth map that is aligned pixel by pixel with the imaging device in the target equipment. In practical applications, depth truth training samples can thus be conveniently and accurately provided for a deep learning algorithm.
As an implementation manner of the first aspect, the depth truth value obtaining method further includes:
and taking the target image and the depth truth value as a group of training samples, acquiring a plurality of groups of training samples under different target scenes, and training a neural network model by utilizing the plurality of groups of training samples to acquire a trained neural network model.
As another implementation manner of the first aspect, the calculating the parallaxes of the pairs of left-eye images and right-eye images of the target scene and calculating the fusion disparity map includes:
calculating parallax for each pair of left-eye images and right-eye images of the target scene to obtain a parallax map;
and fusing the multi-frame parallax images to obtain a fused parallax image.
As another implementation manner of the first aspect, the calculating depth information of the target scene according to the fused disparity map, and obtaining a depth truth value of the target camera by using internal parameters and external parameters of the target camera includes:
Calculating depth information of the target scene by using the fusion disparity map;
converting the depth information into point cloud data, and projecting the point cloud data onto an imaging plane of a target camera based on internal parameters and external parameters of the target camera;
and calculating a depth truth value acquired by the target camera according to the point cloud data projected to the imaging plane of the target camera.
As another implementation manner of the first aspect, the acquiring the respective internal parameters of the binocular camera and the target camera, the external parameters of the binocular camera, and the external parameters between either camera of the binocular camera and the target camera includes:
respectively acquiring left-eye calibration images and right-eye calibration images of the calibration plate in different postures with the binocular camera, and target calibration images with the target camera;
performing epipolar correction on the binocular camera by using the left-eye calibration images and the right-eye calibration images acquired by the binocular camera;
and calculating the external parameters between the target camera and either camera of the binocular camera by using the target calibration images acquired by the target camera.
As another implementation manner of the first aspect, the acquiring a target image of a target scene when the target camera is turned on, and acquiring a plurality of pairs of left-eye and right-eye images of the target scene when the binocular camera and the projection module are turned on and the projection module is rotated and/or translated, includes:
Only starting a target camera, and acquiring a target image of a target scene through the target camera;
and closing the target camera, opening the binocular camera and the projection module, and acquiring a plurality of pairs of left-eye images and right-eye images of the target scene through the binocular camera in the process of rotating and/or translating the projection module.
In a second aspect, an embodiment of the present application provides a depth truth value obtaining apparatus, including:
the calibration module is used for acquiring the respective internal parameters of the binocular camera and the target camera, the external parameters of the binocular camera, and the external parameters between either camera of the binocular camera and the target camera;
the image acquisition module is used for acquiring a target image of a target scene acquired by the target camera when the target camera is started; when a binocular camera and a projection module are started, and the projection module is rotated and/or translated, acquiring left-eye images and right-eye images of a plurality of pairs of target scenes acquired by the binocular camera;
the parallax calculation module is used for calculating parallaxes of a plurality of pairs of left-eye images and right-eye images of the target scene and calculating a fusion parallax image;
and the depth truth value calculation module is used for calculating the depth information of the target scene according to the fusion disparity map and acquiring the depth truth value of the target camera by utilizing the parameters of the target camera.
As an implementation manner of the second aspect, the method further includes:
and the training module is used for taking the target image and the depth truth value as a group of training samples, obtaining a plurality of groups of training samples under different target scenes, and training a neural network model by utilizing the plurality of groups of training samples to obtain a trained neural network model.
As another implementation manner of the second aspect, the parallax calculating module is specifically configured to:
calculating parallax for each pair of left-eye images and right-eye images of the target scene to obtain a parallax map;
and fusing the multi-frame parallax images to obtain a fused parallax image.
As another implementation manner of the second aspect, the depth truth value calculation module is specifically configured to:
calculating depth information of the target scene by using the fusion disparity map;
converting the depth information into point cloud data, and projecting the point cloud data onto an imaging plane of a target camera based on internal parameters and external parameters of the target camera;
and calculating a depth truth value acquired by the target camera according to the point cloud data projected to the imaging plane of the target camera.
As another implementation manner of the second aspect, the calibration module is specifically configured to:
respectively acquiring left-eye calibration images and right-eye calibration images of the calibration plate in different postures with the binocular camera, and target calibration images with the target camera;
performing epipolar correction on the binocular camera by using the left-eye calibration images and the right-eye calibration images acquired by the binocular camera;
and calculating the external parameters between the target camera and either camera of the binocular camera by using the target calibration images acquired by the target camera.
As another implementation manner of the second aspect, the image acquisition module is specifically configured to:
only starting a target camera, and acquiring a target image of a target scene through the target camera;
and closing the target camera, opening the binocular camera and the projection module, and acquiring a plurality of pairs of left-eye images and right-eye images of the target scene through the binocular camera in the process of rotating and/or translating the projection module.
As another implementation manner of the second aspect, the method further includes:
and the manipulation execution module is used for controlling the manipulation module to drive the projection module to rotate and/or translate.
In a third aspect, an embodiment of the present application provides a depth truth value acquisition system, including a binocular camera, a target camera, and a projection module, where the depth truth value acquisition system further includes a depth truth value acquisition device according to the second aspect or any implementation manner of the second aspect.
As an implementation manner of the third aspect, the depth truth value obtaining system further includes: and the manipulation module is used for driving the projection module to rotate and/or translate.
In a fourth aspect, an embodiment of the present application provides a depth camera, including an acquisition module and a processing module,
the acquisition module is used for acquiring a first image of the target area;
the processing module comprises a trained training unit, and is used for acquiring a depth true value of the target area by using the first image and the training unit;
the training unit is a neural network model trained by taking a target image acquired by the depth truth value acquisition method according to the first aspect or any implementation manner of the first aspect as a training sample in advance.
As an implementation manner of the fourth aspect, the depth camera may further include a projection module, the projection module is configured to project an infrared beam toward a target area, the acquisition module is configured to acquire the infrared beam reflected back through the target area and generate an infrared image, and the processing module is configured to acquire a depth truth value of the target area using the infrared image and the training unit.
It will be appreciated that the advantages of the second to fourth aspects may be found in the relevant description of the first aspect and are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a depth truth acquisition system according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a processing unit according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a depth truth obtaining system according to an embodiment of the present application;
FIG. 4 is a schematic view of a projection module according to an embodiment of the present disclosure before and after rotation;
FIG. 5 is a schematic view of a change in the speckle distribution of a scene object before and after rotation of a laser projection according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an implementation flow of a depth truth value obtaining method according to an embodiment of the present application;
fig. 7 is a flowchart illustrating a specific implementation of step S610 of a depth truth value obtaining method according to an embodiment of the present application;
FIG. 8 is a flowchart illustrating a specific implementation of step S620 of a depth truth value obtaining method according to an embodiment of the present application;
fig. 9 is a flowchart illustrating a specific implementation of step S630 of a depth truth value obtaining method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an image and corresponding depth truth values of a default output of a target device (i.e., a target camera) according to an embodiment of the present application;
FIG. 11 is a flowchart illustrating a specific implementation of step S640 of a depth truth value obtaining method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a depth truth obtaining apparatus according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a depth truth obtaining apparatus according to an embodiment of the present application;
fig. 14 is a schematic diagram of a depth camera according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
The term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Furthermore, in the description of the present application, the meaning of "plurality" is two or more. The term "first" and the like are used solely to distinguish one from another and should not be taken as indicating or implying a relative importance.
It will be further understood that the term "coupled" is to be interpreted broadly unless explicitly stated or defined otherwise; for example, it may be a fixed connection, a detachable connection, or an integral connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be an internal communication between two elements or an interaction relationship between two elements. The specific meanings of the above terms in this application can be understood by those of ordinary skill in the art according to the specific situation.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Fig. 1 is a schematic structural diagram of a depth truth value acquisition system according to an embodiment of the present application, where the depth truth value acquisition system includes a binocular camera 11, a target camera 12, a projection module 13, and a processing unit 14, where the binocular camera 11 includes a left-eye camera (or referred to as a left camera or a left camera) and a right-eye camera (or referred to as a right camera or a right camera), the left-eye camera captures a left-eye image, and the right-eye camera captures a right-eye image; the target camera 12 is used for acquiring a target image of a target scene; the projection module 13 is used for projecting the optical signal to the target scene; the processing unit 14 may be used for switching control, calibration, correction and subsequent processing of the binocular camera 11 and the target camera 12 and communicates with the binocular camera 11 and the target camera 12 via a wired and/or wireless network.
In one embodiment, the target camera 12 may include an imaging device with imaging capability, such as an iToF camera, a direct time-of-flight (dToF) camera, a color camera, or a black-and-white camera. The embodiment of the application does not limit the type of the target camera. The term "target camera" is not intended to refer to one particular camera or one particular type of camera; more generally, the term "target" as used herein is only for distinguishing between descriptions and is not intended to limit the present application. The target image acquired by the target camera may be a depth image, a two-dimensional (2D) image, or the like. The 2D image is, for example, a black-and-white image or a color image, and the color image may be, for example, an RGB image.
In one embodiment, the projection module 13 may include light sources such as an edge-emitting laser (edge emitting laser, EEL) and a vertical cavity surface emitting laser (vertical cavity surface emitting laser, VCSEL), and may also include a light source array or a projector including a plurality of light sources. The light beam emitted by the light source may be visible light, infrared light, ultraviolet light, etc. The light beam projected by the projection module may form a uniform, random, or specially designed intensity distribution projection pattern on the target scene.
In the embodiment shown in fig. 1, the processing unit 14 is a computer. In other embodiments, the processing unit 14 may include an electronic device such as a mobile phone, a tablet, a notebook, a netbook, a personal digital assistant (personal digital assistant, PDA), etc., and the embodiments of the present application do not limit the specific type of processing unit 14.
In some embodiments of the present application, as shown in FIG. 2, the processing unit may include one or more processors 20 (only one shown in FIG. 2), a memory 21, and a computer program 22 stored in the memory 21 and executable on the one or more processors 20, e.g., a program that obtains depth truth values. The one or more processors 20, when executing the computer program 22, may implement various steps in embodiments of the depth truth acquisition method described below. Alternatively, the one or more processors 20 may perform the functions of the modules/units of the various depth truth acquiring apparatus embodiments described below when executing the computer program 22, which are not limiting herein.
Those skilled in the art will appreciate that fig. 2 is merely an example of a processing unit and is not meant to limit the processing unit. The processing unit may comprise more or less components than illustrated, or may combine certain components, or different components, e.g., the processing unit may also comprise input-output devices, network access devices, buses, etc.
In one embodiment, the processor 20 may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In one embodiment, the memory 21 may be an internal storage unit of the processing unit, such as a hard disk or a memory of the processing unit. The memory 21 may also be an external storage device of the processing unit, such as a plug-in hard disk provided on the processing unit, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card), or the like. Further, the memory 21 may also include both an internal memory unit of the processing unit and an external memory device. The memory 21 is used for storing computer programs and other programs and data required by the processing unit. The memory 21 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application also provides another preferred embodiment of the processing unit 14, in which the processing unit comprises one or more processors for executing the following program modules stored in the memory: the device comprises a calibration module, an image acquisition module, a parallax calculation module and a depth truth value calculation module, wherein:
and the calibration module is used for acquiring the respective internal parameters of the binocular camera and the target camera, the external parameters of the binocular camera, and the external parameters between either camera of the binocular camera and the target camera.
The image acquisition module is used for acquiring a target image of a target scene acquired by the target camera when the target camera is started; when the binocular camera and the projection module are started, and the projection module is rotated and/or translated, left-eye images and right-eye images of a plurality of pairs of target scenes acquired by the binocular camera are acquired.
And the parallax calculating module is used for calculating parallaxes of the left eye images and the right eye images of the target scene and calculating a fusion parallax image.
The depth truth value calculation module is used for calculating the depth information of the target scene according to the fusion disparity map and acquiring the depth truth value of the target camera by using the parameters of the target camera.
In some embodiments, the processing unit may be independent of the binocular camera and the target camera, see the embodiment shown in fig. 1. In other embodiments, the processing unit may be integrated with at least one of the binocular camera and the target camera. That is, in other embodiments, one or more processors of the binocular camera and/or the target camera itself, when executing the computer program, may implement one or more steps of the depth truth value acquisition method embodiments described below. Alternatively, the binocular camera and/or the one or more processors of the target camera themselves, when executing the computer program, may implement the functionality of one or more of the modules/units of the various depth truth value acquisition apparatus embodiments described below.
In addition to the foregoing depth truth acquisition system embodiments, in other embodiments the depth truth acquisition system further includes a manipulation module. Taking an improvement of the embodiment shown in fig. 1 as an example, as shown in fig. 3, the depth truth value acquisition system further includes a manipulation module 15, which is connected to the projection module 13 and is used to control the rotation and/or translation of the projection module to scan the target scene. The manipulation module 15 communicates with the processing unit 14 via a wired and/or wireless network, and the processing unit 14 may be configured to control the manipulation module 15 to drive the projection module 13 in stepwise rotation and/or translation. The remaining parts of the embodiment shown in fig. 3 are the same as those of the embodiment shown in fig. 1 and are not described in detail here. In the embodiment shown in fig. 3, the manipulation module drives the projection module to rotate and/or translate, so that the rotating and/or moving projection module projects its pattern to scan the target scene.
In some implementations, the manipulation module may include a stepper motor and/or a drive motor connected to the projection module 13. When the projection module needs to rotate, the processing unit 14 can control the stepper motor to rotate, thereby driving the projection module connected to it to rotate step by step. When the projection module needs to translate, the processing unit 14 can control the drive motor to move, thereby driving the projection module connected to it to translate.
As an example and not by way of limitation, the rotation of the projection module may be as shown in fig. 4. In this example the projection module projects a laser pattern; when its optical axis rotates by a certain angle, the distribution of the speckles projected onto the scene object translates between before and after the rotation, as shown in fig. 5, so that the positions of the speckles in the left-eye and right-eye images acquired by the binocular camera change, which facilitates acquiring multiple pairs of left-eye and right-eye images of the target scene.
It should be noted that, in other implementations, the target scene may be scanned by manually rotating and/or manually translating the projection module (or laser). It should be appreciated that in these implementations, the depth truth acquisition system need not be provided with a manipulation module, e.g., the depth truth acquisition system may include the structures shown in fig. 1 or 3.
Fig. 6 is a schematic flow chart of an implementation of a depth truth value obtaining method according to an embodiment of the present invention, where the depth truth value obtaining method in the present embodiment may be executed by an electronic device, and the electronic device includes, but is not limited to, a computer, a mobile phone, a camera, etc. By way of example and not limitation, the depth truth acquisition method may be applied to the depth truth acquisition system shown in fig. 1 or 3, and the depth truth acquisition method may be performed by the processing units in the embodiments shown in fig. 1, 2, or 3. The depth truth value obtaining method in the present embodiment is suitable for a situation where a depth truth value of a scene needs to be obtained, and may include steps S610 to S640 shown in fig. 6.
S610, acquiring the respective internal parameters of the binocular camera and the target camera, the external parameters of the binocular camera, and the external parameters between either camera of the binocular camera and the target camera.
By calibrating the system, the internal parameters K_s of the binocular camera, the internal parameters K of the target camera, the external parameters of the binocular camera, and the external parameters R and T between either camera of the binocular camera (left or right) and the target camera are acquired, where R is a rotation matrix and T is a translation vector.
In some implementations, as shown in fig. 7, step S610 includes steps S611 to S613.
S611, respectively acquiring left-eye calibration images and right-eye calibration images of the calibration plate in different postures with the binocular camera, and target calibration images with the target camera.
Specifically, the binocular camera and the target camera are fixed in front of the calibration plate, and the left-eye calibration images, right-eye calibration images and target calibration images of the calibration plate in different postures are acquired with the binocular camera and the target camera, respectively. For a given posture of the calibration plate, the binocular camera and the target camera may acquire the left-eye calibration image, the right-eye calibration image and the target calibration image of that posture at the same time; the binocular camera and the target camera may also acquire the left-eye calibration image, the right-eye calibration image and the target calibration image of the calibration plate in a time sequence, which is not limited here.
In one embodiment, the attitude of the calibration plate may be changed by changing the position and/or angle of the calibration plate. The calibration plates with different postures can provide rich coordinate information for the image. It should be understood that the calibration plate may include dots, checkerboard, or coded patterns, etc., as this application is not limited in this regard.
S612, performing epipolar correction on the binocular camera by using the left-eye calibration images and the right-eye calibration images acquired by the binocular camera.
Epipolar correction is performed on the binocular camera using the left-eye and right-eye calibration images of different postures acquired by the binocular camera.
In some embodiments, a left-eye calibration image is selected as a reference image, and the internal and external parameters of the binocular camera are obtained according to the mapping relation between the coordinate information of the corresponding right-eye calibration image and the left-eye calibration image. In other embodiments, a right-eye calibration image is selected as the reference image, and the internal and external parameters of the binocular camera are obtained according to the mapping relation between the coordinate information of the corresponding left-eye calibration image and the right-eye calibration image. It should be understood that the left-eye and right-eye calibration images of the binocular camera are corresponding images acquired simultaneously by the left-eye and right-eye cameras.
Further, epipolar correction is performed on the binocular camera according to its internal and external parameters. Epipolar correction ensures that the optical axes of the left-eye and right-eye cameras of the binocular camera are parallel to each other, and that their imaging planes are parallel to the baseline of the binocular camera. It should be noted that the epipolar correction methods include, but are not limited to, the Bouguet algorithm, the Hartley algorithm, and the like.
S613, calculating the external parameters of any one of the target camera and the binocular camera by using the target calibration image acquired by the target camera.
In some embodiments, the target camera acquires a target calibration image, a left-eye calibration image acquired by the left-eye camera of the binocular camera is selected as a reference image, and the external parameters between the target camera and the left-eye camera are obtained according to the mapping relation between the target calibration image and the left-eye calibration image.
In other embodiments, a right-eye calibration image acquired by the right-eye camera of the binocular camera is selected as the reference image, and the external parameters between the target camera and the right-eye camera are obtained according to the mapping relation between the target calibration image and the right-eye calibration image.
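For illustration only, steps S611 to S613 could be implemented with OpenCV roughly as sketched below. The use of OpenCV, the chessboard calibration plate, and all function and variable names are assumptions of this sketch, not requirements of the embodiment; the extrinsics returned by the second stereo calibration map points from the left-eye camera frame into the target camera frame.

```python
import cv2
import numpy as np

def calibrate_system(left_imgs, right_imgs, target_imgs, pattern=(9, 6), square=0.02):
    """Sketch of S611-S613: intrinsics, binocular epipolar correction, target extrinsics.
    left_imgs / right_imgs / target_imgs: grayscale views of the same board postures
    (this sketch assumes the board is detected in every view); pattern and square
    describe an assumed chessboard (inner corners, square size in meters)."""
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    def corners_of(images):
        pts = []
        for img in images:
            ok, c = cv2.findChessboardCorners(img, pattern)
            assert ok, "sketch assumes the board is found in every view"
            pts.append(c)
        return pts

    pts_l, pts_r, pts_t = corners_of(left_imgs), corners_of(right_imgs), corners_of(target_imgs)
    obj_pts = [objp] * len(pts_l)
    size = left_imgs[0].shape[::-1]

    # S611: intrinsic parameters of the left-eye, right-eye and target cameras.
    _, K_l, d_l, _, _ = cv2.calibrateCamera(obj_pts, pts_l, size, None, None)
    _, K_r, d_r, _, _ = cv2.calibrateCamera(obj_pts, pts_r, size, None, None)
    _, K_t, d_t, _, _ = cv2.calibrateCamera(obj_pts, pts_t, target_imgs[0].shape[::-1], None, None)

    # S612: binocular extrinsics, then epipolar correction (Bouguet-style rectification maps).
    _, _, _, _, _, R_lr, T_lr, _, _ = cv2.stereoCalibrate(
        obj_pts, pts_l, pts_r, K_l, d_l, K_r, d_r, size, flags=cv2.CALIB_FIX_INTRINSIC)
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K_l, d_l, K_r, d_r, size, R_lr, T_lr)
    maps_l = cv2.initUndistortRectifyMap(K_l, d_l, R1, P1, size, cv2.CV_32FC1)
    maps_r = cv2.initUndistortRectifyMap(K_r, d_r, R2, P2, size, cv2.CV_32FC1)

    # S613: extrinsics R, T from the left-eye camera to the target camera.
    _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, pts_l, pts_t, K_l, d_l, K_t, d_t, size, flags=cv2.CALIB_FIX_INTRINSIC)
    return K_l, K_r, K_t, (maps_l, maps_r), P1, R, T
```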
S620, when the target camera is turned on, acquiring a target image of the target scene; when the binocular camera and the projection module are turned on and the projection module is rotated and/or translated, acquiring a plurality of pairs of left-eye and right-eye images of the target scene.
Specifically, when the target camera is turned on, a target image of the target scene is acquired by the target camera; at this time, the binocular camera and the projection module are not turned on. When the binocular camera and the projection module are turned on and the projection module is rotated and/or translated, a plurality of pairs of left-eye and right-eye images of the target scene are acquired by the binocular camera. The target scene is scanned by rotating and/or translating the projection module, during which the plurality of pairs of left-eye and right-eye images of the target scene are acquired.
In some implementations, the manipulation module rotates and/or translates the projection module, and the left-eye and right-eye images of the target scene are synchronously acquired with the binocular camera during the stepwise rotation and/or translation of the projection module. In other implementations, the projection module may be manually rotated and/or translated, and after each rotation and/or translation of the projection module, a left-eye image and a right-eye image of the target scene are synchronously acquired with the binocular camera.
In some implementations, as shown in fig. 8, step S620 includes steps S621 to S622.
S621, only the target camera is turned on, and the target image of the target scene is acquired by the target camera.
When only the target camera is turned on, the binocular camera and the projection module are turned off, and the target camera and the target scene remain relatively stationary.
S622, closing the target camera, opening the binocular camera and the projection module, and acquiring a plurality of pairs of left-eye images and right-eye images of the target scene through the binocular camera in the process of rotating and/or translating the projection module.
The target camera is turned off and the projection module and the binocular camera are turned on. The projection module is rotated step by step at a preset angle and/or translated step by step by a preset length so that it scans the target scene; each time the projection module is rotated and/or translated one step, the binocular camera is synchronously exposed once to acquire images of the target scene and obtain one pair of left-eye and right-eye images. Preferably, the projection module projects speckles onto the target scene, and the binocular camera acquires left-eye and right-eye speckle images of the target scene.
It should be appreciated that the preset angle may be chosen so that, for each rotation of the projection module, each speckle in the left-eye and right-eye speckle images acquired by the binocular camera shifts by at least the number of pixels occupied by one speckle. The number of rotations of the projection module may be designed according to the specific situation; in some embodiments, the projection module may be rotated N times, so that N pairs of left-eye and right-eye speckle images are acquired.
Similarly, the preset length may be chosen so that, for each translation of the projection module, each speckle in the left-eye and right-eye speckle images acquired by the binocular camera shifts by the number of pixels occupied by one speckle. The number of translations of the projection module may also be designed according to the specific situation; in some embodiments, the projection module may be translated N times, so that N pairs of left-eye and right-eye speckle images are acquired, where N may be any integer greater than or equal to 20, for example 30, 50, or even 100. The value of N is not limited in this application.
In other embodiments, further, to collect a sufficient number of training samples for deep learning, the target scene may be changed multiple times, and the above step S620 (or steps S621 to S622) may be repeated for each target scene, so that training samples in a plurality of different target scenes may be obtained.
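For illustration only, the acquisition flow of steps S621 to S622 and the repetition over target scenes could be organized as below. Every device interface in this sketch (set_target_camera, set_binocular_and_projector, rotate_projector_step, capture_target, capture_left_right) is hypothetical and stands in for whatever SDK drives the target camera, the binocular camera and the manipulation module in a concrete setup; num_scenes is likewise an assumed value.

```python
def acquire_scene(n_steps=30):
    """One target scene (S621/S622): one target image, then N left/right speckle pairs."""
    set_target_camera(on=True)
    set_binocular_and_projector(on=False)
    target_image = capture_target()              # S621: only the target camera is on
    set_target_camera(on=False)
    set_binocular_and_projector(on=True)
    pairs = []
    for _ in range(n_steps):                     # S622: step the projector, expose synchronously
        rotate_projector_step()                  # or a translation step of the preset length
        pairs.append(capture_left_right())       # one pair of left-eye / right-eye speckle images
    return target_image, pairs

num_scenes = 50                                  # assumed; repeat over scenes for enough samples
dataset = [acquire_scene() for _ in range(num_scenes)]
```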
S630, calculating parallaxes of the left eye images and the right eye images of the target scene, and calculating a fusion parallax map.
In some embodiments, as shown in fig. 9, S630 includes steps S631 to S632.
S631, parallax is calculated for each pair of left-eye and right-eye images of the target scene, and a parallax map is obtained.
The optical axes of the left-eye and right-eye cameras of the epipolar-corrected binocular camera are parallel to each other, and their imaging planes are parallel to the baseline. Therefore, when the position of a pixel in the left-eye image (or right-eye image) is known, the matching point only needs to be searched for along the same pixel row in the right-eye image (or left-eye image). The search range is thus reduced from two dimensions to one dimension, which increases the matching speed and improves the matching accuracy.
In some embodiments, after epipolar correction, the binocular camera acquires left and right eye speckle images, and parallax values can be obtained by performing parallax calculation according to coordinate information of pixels in the left eye speckle image and corresponding matched pixels in the right eye speckle image.
As a non-limiting example, using the left eye speckle image as a reference image, all pixels of the left eye speckle image are traversed to obtain a complete disparity map. As another non-limiting example, taking the right eye speckle image as a reference image, all pixels of the right eye speckle image are traversed to obtain a complete disparity map.
The projection module rotates once, a pair of left and right eye speckle images are acquired, and a frame of parallax image can be acquired according to the pair of left and right eye speckle images. The projection module rotates for a plurality of times, and a multi-frame parallax image can be obtained. For example, the projection module rotates N times, and N frame disparity maps can be obtained.
Similarly, the projection module translates once, a pair of left and right eye speckle images are acquired, and a frame of parallax image can be acquired according to the pair of left and right eye speckle images. For example, the projection module translates N times, and may obtain N frame disparity maps.
It should be understood that truth-value generation does not have strict real-time requirements, so a stereo matching method with good accuracy, such as semi-global matching (stereo processing by semi-global matching and mutual information, SGBM) or graph cuts (GC), may be used for the parallax calculation, which is not limited in this application.
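As an illustration, the per-pair disparity computation of S631 could use OpenCV's SGBM implementation as sketched below; the rectified 8-bit grayscale inputs and all parameter values are assumptions of this sketch, not prescribed by the embodiment.

```python
import cv2
import numpy as np

def compute_disparity(left_rect, right_rect, num_disp=128, block=7):
    """Disparity of one rectified left/right speckle image pair (S631)."""
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disp,          # must be a multiple of 16
        blockSize=block,
        P1=8 * block * block,             # smoothness penalties (illustrative values)
        P2=32 * block * block,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
        disp12MaxDiff=1)                  # built-in left-right consistency tolerance
    disp = sgbm.compute(left_rect, right_rect).astype(np.float32) / 16.0
    disp[disp <= 0] = 0                   # mark non-matched pixels as invalid
    return disp
```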
S632, fusing multi-frame disparity maps to obtain a fused disparity map.
In some embodiments, the multi-frame disparity maps may be fused using the mean method to obtain the fused disparity map. As a non-limiting example, assume that the disparities of a certain pixel in the N frames of disparity maps corresponding to N rotations of the projection module are d_1, d_2, …, d_N in turn; the fused disparity of that pixel is then

    d̄ = (d_1 + d_2 + … + d_N) / N

Further, if the disparity value of that pixel is an invalid value in some frames of the N-frame disparity maps, N is reduced accordingly when computing the fused disparity of the pixel, so that only valid values are averaged. Taking any one of the N frames of disparity maps as the reference image and traversing each pixel of the reference image yields the fused disparity map.
It should be noted that, in other embodiments, the parallax fusion may also use a median method or a bilateral weight method, which is not limited in this application.
It should be understood that, before the fused disparity map is obtained, the disparities may be optimized, and unreliable points in the disparity maps may be removed using a left-right consistency check.
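A minimal NumPy sketch of the mean fusion of S632, assuming invalid disparities are stored as 0 so that N is reduced per pixel exactly as described above:

```python
import numpy as np

def fuse_disparities(disp_maps):
    """Fuse N single-frame disparity maps into one map by per-pixel averaging (S632)."""
    stack = np.stack(disp_maps).astype(np.float32)   # shape: (N, H, W)
    valid = stack > 0                                # invalid disparities assumed stored as 0
    count = valid.sum(axis=0)                        # effective N for each pixel
    total = np.where(valid, stack, 0.0).sum(axis=0)
    fused = np.zeros_like(total)
    np.divide(total, count, out=fused, where=count > 0)
    return fused
```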
S640, calculating depth information of the target scene according to the fusion disparity map, and obtaining a depth truth value of the target camera by using parameters of the target camera.
And calculating the depth information of the target scene according to the fusion disparity map, and acquiring a depth truth value of the target scene under the view angle of the target camera by utilizing the parameters of the target camera. Thus, a depth truth value corresponding to the target image is obtained.
As a non-limiting example, fig. 10 is a schematic diagram of an image and a corresponding depth truth value output by the target device (i.e., the target camera) by default. As shown in fig. 10, the left graph in fig. 10 represents a two-dimensional image of a certain scene, such as an RGB image or a grayscale image, which is output by the target device by default. The right graph in fig. 10 represents the depth truth value corresponding to the two-dimensional image of the scene.
In some embodiments, as shown in fig. 11, step S640 may include steps S641 to S643.
S641, calculating depth information of the target scene using the fusion disparity map.
In some embodiments, assuming that the baseline distance between the left-eye and right-eye cameras of the binocular camera is b, the focal length of the binocular camera is f, and the disparity of a certain pixel in the fused disparity map is d, the depth of that pixel is calculated from the disparity as

    z = f · b / d
and traversing each pixel on the fusion disparity map to obtain the complete depth information of the target scene.
S642, converting the depth information into point cloud data, and projecting the point cloud data onto an imaging plane of the target camera based on the internal and external parameters of the target camera.
As a non-limiting example, the depth information can be converted into point cloud data as follows:

    x_s = (u − u_0) · dx · z / f′
    y_s = (v − v_0) · dy · z / f′
    z_s = z

where (x_s, y_s, z_s) are the three-dimensional coordinates of the point in the binocular camera coordinate system, z is the depth at each pixel, (u, v) are the pixel coordinates, (u_0, v_0) are the principal point coordinates of the image, dx and dy are the physical sizes of a binocular camera sensor pixel in the two directions, and f′ is the focal length (in millimeters).
The depth information acquired in step S641 is converted into point cloud data, and based on the internal and external parameters of the target camera acquired in step S610, i.e. K, R and T, the point cloud data is projected onto the imaging plane of the target camera, so as to acquire corresponding point cloud data under the view angle of the target camera.
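For illustration, the conversions of S641 and S642 can be sketched in NumPy as below. The sketch uses the pixel-unit focal lengths f_x = f′/dx and f_y = f′/dy of the rectified left-eye camera, which is equivalent to the formula above; K_s is assumed here to be its 3×3 intrinsic matrix, and the function names are illustrative.

```python
import numpy as np

def disparity_to_depth(fused_disp, focal_px, baseline_m):
    """S641: per-pixel depth z = f * b / d for valid (non-zero) disparities."""
    depth = np.zeros_like(fused_disp, dtype=np.float32)
    valid = fused_disp > 0
    depth[valid] = focal_px * baseline_m / fused_disp[valid]
    return depth

def depth_to_pointcloud(depth, K_s):
    """S642: back-project the depth map into 3D points in the binocular (left-eye) camera frame."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float32)
    fx, fy = K_s[0, 0], K_s[1, 1]              # fx = f'/dx, fy = f'/dy, in pixels
    u0, v0 = K_s[0, 2], K_s[1, 2]              # principal point
    x = (u - u0) * depth / fx
    y = (v - v0) * depth / fy
    return np.stack([x, y, depth], axis=-1)    # shape (H, W, 3)
```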
S643: and calculating a depth truth value acquired by the target camera according to the point cloud data projected to the imaging plane of the target camera.
And calculating the depth information of the target scene acquired by the target camera according to the point cloud data projected to the imaging plane of the target camera, and taking the depth information as a depth truth value of the target camera.
As a non-limiting example, the depth truth value acquired by the target camera can be calculated from the point cloud data projected onto the imaging plane of the target camera as

    z_c · [u, v, 1]^T = K · (R · [x_s, y_s, z_s]^T + T)

where K is the internal parameter matrix of the target camera, R and T are the external parameters between either camera of the binocular camera and the target camera, (u, v) are the pixel coordinates to which the point cloud data are mapped on the target camera in step S642, and z_c is the depth value of the point in the target camera coordinate system. The depth truth value corresponding to the target image of the target camera is obtained by interpolating z_c at the pixel coordinates.
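A sketch of S643 under the projection equation above: the points are transformed with R and T into the target camera frame, projected with K, and rasterized onto the target image grid keeping the nearest point per pixel. The interpolation of z_c mentioned above would follow and is omitted here; the function and variable names are illustrative.

```python
import numpy as np

def project_depth_truth(points, K, R, T, out_hw):
    """Project binocular-frame points onto the target camera plane to get its depth truth (S643)."""
    pts = points.reshape(-1, 3)
    pts = pts[pts[:, 2] > 0]                         # drop invalid (zero-depth) points
    cam = R @ pts.T + T.reshape(3, 1)                # into the target camera frame
    z_c = cam[2]
    uv = K @ cam
    u = np.round(uv[0] / z_c).astype(int)
    v = np.round(uv[1] / z_c).astype(int)
    h, w = out_hw
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (z_c > 0)
    depth_gt = np.zeros((h, w), dtype=np.float32)
    # z-buffer: keep the closest point that lands on each target pixel
    for uu, vv, zz in zip(u[inside], v[inside], z_c[inside]):
        if depth_gt[vv, uu] == 0 or zz < depth_gt[vv, uu]:
            depth_gt[vv, uu] = zz
    return depth_gt
```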
On the basis of the foregoing embodiments, the target image of the target scene captured by the target camera and the depth truth value corresponding to that target image are acquired through steps S610 to S640. Further, in other embodiments, the target image and the corresponding depth truth value may be used as one set of training data, and multiple sets of training data may be obtained for multiple different scenes; these sets of training data form the training samples. The neural network model is trained with the training samples, and its weight parameters are obtained to yield a trained neural network model. The trained neural network model can be deployed on an electronic device, such as a computer, a mobile phone or a camera. In application, the electronic device can obtain a depth image of an area or scene from an image of that area or scene captured by a camera, using the trained neural network model.
In the embodiment of the present application, a target camera is used to obtain a target image of a target scene, and the target image is used as an input of a neural network model, and a weight parameter of the neural network model is obtained by learning a depth truth value corresponding to the target image. It should be understood that, through steps S610 to S640, the target camera may obtain depth truth values of different scenes according to the binocular camera, obtain images of different scenes by using the target camera, learn the depth truth values of different scenes by using the neural network model, and update and iterate the weight parameters of the neural network model to ensure that the depth image with high precision can be obtained after the image obtained by the target camera is input into the neural network model.
The neural network model may be, for example, a convolutional neural network model, a fully-connected neural network model, etc., which is not limited in this application.
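For illustration only, a minimal PyTorch-style training loop over the (target image, depth truth value) sample pairs is sketched below; the network architecture, the L1 loss, the validity mask and the hyperparameters are assumptions of this sketch and not part of the embodiment.

```python
import torch
import torch.nn as nn

def train_depth_network(model, loader, epochs=20, lr=1e-4, device="cuda"):
    """Train a depth-prediction network on (target_image, depth_gt) pairs collected above."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.L1Loss()
    for _ in range(epochs):
        for image, depth_gt in loader:                # loader yields aligned image / truth tensors
            image, depth_gt = image.to(device), depth_gt.to(device)
            pred = model(image)
            mask = depth_gt > 0                       # supervise only pixels with a valid truth value
            loss = criterion(pred[mask], depth_gt[mask])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```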
In the related art, deep learning requires a large amount of training data (e.g., images) and a depth truth value (ground truth) as training samples, and the number of training samples and the accuracy of the depth truth value directly determine the final performance of the deep learning algorithm. According to the embodiment of the application, through a simple method and system, training samples required by the end-to-end deep learning neural network can be accurately and efficiently obtained.
Corresponding to the above depth truth value obtaining method, an embodiment of the present application further provides a depth truth value obtaining device. The depth truth value obtaining device is described in detail in the foregoing description of the method.
Referring to fig. 12, fig. 12 is a schematic block diagram of a depth truth acquiring apparatus according to an embodiment of the present invention. As an example, the depth truth value obtaining device is configured in the processing unit shown in fig. 1 or fig. 2, where the processing unit is connected to the binocular camera, the projection module, and the target camera.
The depth truth value acquisition device comprises: a calibration module 1201, an image acquisition module 1202, a parallax calculation module 1203, and a depth truth calculation module 1204.
The calibration module 1201 is configured to obtain the respective internal parameters of the binocular camera and the target camera, the external parameters of the binocular camera, and the external parameters between either camera of the binocular camera and the target camera.
An image acquisition module 1202, configured to acquire a target image of a target scene acquired by a target camera when the target camera is turned on; when the binocular camera and the projection module are started, and the projection module is rotated and/or translated, left-eye images and right-eye images of a plurality of pairs of target scenes acquired by the binocular camera are acquired.
The parallax calculation module 1203 is configured to calculate the parallaxes of the pairs of left-eye and right-eye images of the target scene and calculate a fusion disparity map.
The depth truth value calculation module 1204 is configured to calculate depth information of the target scene according to the fusion disparity map, and obtain a depth truth value of the target camera by using parameters of the target camera.
Alternatively, as another example, the depth truth value obtaining device is configured in the processing unit of the embodiment shown in fig. 3, where the processing unit is further connected to the manipulation module. As shown in fig. 13, the depth truth acquiring apparatus further includes a manipulation executing module 1205, where the manipulation executing module 1205 is configured to control the manipulation module to drive the projection module to rotate and/or translate.
Optionally, as shown in fig. 13, the depth truth obtaining apparatus further includes a training module 1206. The training module 1206 is configured to take the target image and the depth truth value as a set of training samples, obtain multiple sets of training samples in different target scenes, and train the neural network model by using the multiple sets of training samples to obtain a trained neural network model.
Fig. 14 is a schematic structural diagram of a depth camera according to the present invention. The depth camera includes an acquisition module and a processing module. The acquisition module is configured to acquire a first image of a target area. The processing module includes a training unit and is configured to obtain a depth truth value of the target area using the first image and the training unit. The training unit is a trained neural network model; more specifically, it is obtained by acquiring target images with the acquisition module and training the neural network model, according to the depth truth value acquisition method described above, with the target images as training samples.
In one embodiment, the depth camera may further include a projection module for projecting an infrared beam toward the target area, an acquisition module for acquiring the infrared beam reflected back through the target area and generating an infrared image, and a processing module for acquiring a depth truth value of the target area using the infrared image and the training unit.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The embodiment of the application also provides electronic equipment, which comprises: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, which when executed by the processor performs the steps of any of the depth truth value acquisition method embodiments described above.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the foregoing depth truth value acquisition method embodiments.
Embodiments of the present application also provide a computer program product that, when run on an electronic device, enables the electronic device to implement the steps of the depth truth value acquisition method embodiments described above.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other manners. For example, the apparatus/electronic device embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on.
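As a purely illustrative example of such a computer-program implementation, the following Python sketch walks through the core flow of the method described above: fusing multi-frame disparity maps, recovering a point cloud in the binocular-camera frame, projecting the cloud into the target camera via z_c · [u, v, 1]^T = K · (R · [x_s, y_s, z_s]^T + T), and interpolating z_c over the target camera's pixel grid to obtain a depth truth value map. NumPy/SciPy, the median-based fusion, and every function and parameter name are assumptions of this sketch, not features disclosed or claimed by the invention.

```python
# End-to-end sketch of the core flow (NumPy/SciPy assumed; every name is illustrative):
# fuse multi-frame disparity maps, back-project to a point cloud in the binocular-camera
# frame, project the cloud into the target camera via
#     z_c * [u, v, 1]^T = K * (R * [x_s, y_s, z_s]^T + T),
# then interpolate z_c over the target camera's pixel grid to get the depth truth value map.
import numpy as np
from scipy.interpolate import griddata


def fuse_disparity(disparity_maps):
    """Median over frames, ignoring invalid (<= 0) disparities."""
    stack = np.stack(disparity_maps).astype(np.float32)  # F x H x W
    stack[stack <= 0] = np.nan
    return np.nanmedian(stack, axis=0)                   # H x W fused disparity map


def disparity_to_point_cloud(disparity, fx, fy, cx, cy, baseline):
    """Depth z = fx * baseline / disparity for a rectified pair, then back-project."""
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = np.isfinite(disparity) & (disparity > 0)
    z = fx * baseline / disparity[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)                   # N x 3, binocular-camera frame


def depth_truth_for_target_camera(points_s, K, R, T, width, height):
    """Project the cloud into the target camera and interpolate per-pixel z_c."""
    points_c = points_s @ R.T + T.reshape(1, 3)          # N x 3 in target-camera frame
    z_c = points_c[:, 2]
    keep = z_c > 0                                       # points in front of the camera
    points_c, z_c = points_c[keep], z_c[keep]
    uv1 = (points_c @ K.T) / z_c[:, None]                # homogeneous pixel coords (u, v, 1)
    u, v = uv1[:, 0], uv1[:, 1]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    grid_u, grid_v = np.meshgrid(np.arange(width, dtype=float),
                                 np.arange(height, dtype=float))
    depth_truth = griddata((u[inside], v[inside]), z_c[inside],
                           (grid_u, grid_v), method="linear", fill_value=np.nan)
    return depth_truth                                   # height x width truth value map
```

In this sketch, pixels for which no depth truth value can be interpolated are left as NaN and would simply be excluded from any training samples built on top of the map.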
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. A depth truth value acquisition method, comprising:
acquiring respective internal parameters of a binocular camera and a target camera, external parameters of the binocular camera, and external parameters between either camera of the binocular camera and the target camera;
when the target camera is started, acquiring a target image of a target scene; when the binocular camera and the projection module are started and the projection module is rotated and/or translated, acquiring a plurality of pairs of left-eye images and right-eye images of the target scene;
calculating parallaxes of the plurality of pairs of left-eye images and right-eye images of the target scene, and calculating a fusion disparity map;
calculating depth information of the target scene according to the fusion disparity map, and acquiring a depth truth value of the target camera by using internal parameters and external parameters of the target camera;
wherein the calculating the depth information of the target scene according to the fusion disparity map and acquiring the depth truth value of the target camera by using the internal parameters and the external parameters of the target camera comprises:
calculating depth information of the target scene by using the fusion disparity map;
converting the depth information into point cloud data, and projecting the point cloud data onto an imaging plane of the target camera based on the internal parameters and the external parameters of the target camera;
calculating a depth truth value acquired by the target camera according to the point cloud data projected to the imaging plane of the target camera;
wherein the calculating the depth truth value acquired by the target camera according to the point cloud data projected to the imaging plane of the target camera comprises calculating according to the following formula:
z_c · [u, v, 1]^T = K · (R · [x_s, y_s, z_s]^T + T)
wherein K is the internal parameter matrix of the target camera; R and T are the external parameters between either camera of the binocular camera and the target camera; (u, v) are the pixel coordinates of the point cloud data after being mapped to the target camera; z_c is the depth value of the point cloud in the target camera coordinate system, and z_c is interpolated according to the pixel coordinates to obtain the depth truth value corresponding to the target image of the target camera; and (x_s, y_s, z_s) are the three-dimensional coordinates of the point cloud in the binocular camera coordinate system.
2. The depth truth value acquisition method according to claim 1, further comprising:
taking the target image and the depth truth value as a group of training samples, acquiring a plurality of groups of training samples under different target scenes, and training a neural network model by using the plurality of groups of training samples to obtain a trained neural network model.
3. The depth truth value acquisition method according to claim 1 or 2, wherein the calculating parallaxes of the plurality of pairs of left-eye images and right-eye images of the target scene and calculating the fusion disparity map comprises:
calculating a parallax for each pair of left-eye images and right-eye images of the target scene to obtain a disparity map; and
fusing the multiple frames of disparity maps to obtain the fusion disparity map.
4. A depth truth value acquisition apparatus, comprising:
the calibration module is used for acquiring respective internal parameters of the binocular camera and the target camera, external parameters of the binocular camera, and external parameters between either camera of the binocular camera and the target camera;
the image acquisition module is used for acquiring a target image of a target scene acquired by the target camera when the target camera is started, and acquiring a plurality of pairs of left-eye images and right-eye images of the target scene acquired by the binocular camera when the binocular camera and the projection module are started and the projection module is rotated and/or translated;
the parallax calculation module is used for calculating parallaxes of the plurality of pairs of left-eye images and right-eye images of the target scene and calculating a fusion disparity map;
the depth truth value calculation module is used for calculating depth information of the target scene according to the fusion disparity map and obtaining a depth truth value of the target camera by utilizing internal parameters and external parameters of the target camera;
wherein the calculating the depth information of the target scene according to the fusion disparity map and obtaining the depth truth value of the target camera by using the internal parameters and the external parameters of the target camera comprises:
calculating depth information of the target scene by using the fusion disparity map;
converting the depth information into point cloud data, and projecting the point cloud data onto an imaging plane of the target camera based on the internal parameters and the external parameters of the target camera;
calculating a depth truth value acquired by the target camera according to the point cloud data projected to the imaging plane of the target camera;
wherein the calculating the depth truth value acquired by the target camera according to the point cloud data projected to the imaging plane of the target camera comprises calculating according to the following formula:
z_c · [u, v, 1]^T = K · (R · [x_s, y_s, z_s]^T + T)
wherein K is the internal parameter matrix of the target camera; R and T are the external parameters between either camera of the binocular camera and the target camera; (u, v) are the pixel coordinates of the point cloud data after being mapped to the target camera; z_c is the depth value of the point cloud in the target camera coordinate system, and z_c is interpolated according to the pixel coordinates to obtain the depth truth value corresponding to the target image of the target camera; and (x_s, y_s, z_s) are the three-dimensional coordinates of the point cloud in the binocular camera coordinate system.
5. The depth truth value acquisition apparatus according to claim 4, further comprising:
a training module, which is used for taking the target image and the depth truth value as a group of training samples, obtaining a plurality of groups of training samples under different target scenes, and training a neural network model by using the plurality of groups of training samples to obtain a trained neural network model.
6. The depth truth value acquisition apparatus according to claim 4 or 5, further comprising:
a manipulation execution module, which is used for controlling the manipulation module to drive the projection module to rotate and/or translate.
7. A depth truth value acquisition system, comprising a binocular camera, a target camera and a projection module, the depth truth value acquisition system further comprising the depth truth value acquisition apparatus according to any one of claims 4 to 6.
8. The depth truth value acquisition system according to claim 7, further comprising: a manipulation module, which is used for driving the projection module to rotate and/or translate.
9. A depth camera is characterized by comprising an acquisition module and a processing module,
the acquisition module is used for acquiring a first image of the target area;
the processing module comprises a trained training unit and is used for acquiring a depth truth value of the target area by using the first image and the training unit;
the training unit is a neural network model trained in advance by taking, as a training sample, a target image acquired by the depth truth value acquisition method according to any one of claims 1 to 3.
10. The depth camera of claim 9, further comprising a projection module for projecting an infrared beam toward the target area, the acquisition module being used for collecting the infrared beam reflected back by the target area and generating an infrared image, and the processing module being used for acquiring a depth truth value of the target area by using the infrared image and the training unit.
CN202110347156.7A 2021-03-31 2021-03-31 Depth truth value acquisition method, device and system and depth camera Active CN113160298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110347156.7A CN113160298B (en) 2021-03-31 2021-03-31 Depth truth value acquisition method, device and system and depth camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110347156.7A CN113160298B (en) 2021-03-31 2021-03-31 Depth truth value acquisition method, device and system and depth camera

Publications (2)

Publication Number Publication Date
CN113160298A (en) 2021-07-23
CN113160298B (en) 2024-03-08

Family

ID=76886058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110347156.7A Active CN113160298B (en) 2021-03-31 2021-03-31 Depth truth value acquisition method, device and system and depth camera

Country Status (1)

Country Link
CN (1) CN113160298B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332187B (en) * 2022-03-09 2022-06-14 深圳安智杰科技有限公司 Monocular target ranging method and device
CN115294128B (en) * 2022-10-08 2022-12-02 四川大学 Monocular structure three-dimensional imaging method and device for digestive endoscopy
CN116740158B (en) * 2023-08-14 2023-12-05 小米汽车科技有限公司 Image depth determining method, device and storage medium
CN117201705B (en) * 2023-11-07 2024-02-02 天津云圣智能科技有限责任公司 Panoramic image acquisition method and device, electronic equipment and storage medium
CN117495698A (en) * 2024-01-02 2024-02-02 福建卓航特种设备有限公司 Flying object identification method, system, intelligent terminal and computer readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109741405A (en) * 2019-01-21 2019-05-10 同济大学 A kind of depth information acquisition system based on dual structure light RGB-D camera
KR20190081867A (en) * 2017-12-29 2019-07-09 재단법인 구미전자정보기술원 System and method for acquisition of safe vision based on 3d bpc imaging technology
CN110021043A (en) * 2019-02-28 2019-07-16 浙江大学 A kind of scene depth acquisition methods based on Stereo matching and confidence spread
CN110148179A (en) * 2019-04-19 2019-08-20 北京地平线机器人技术研发有限公司 A kind of training is used to estimate the neural net model method, device and medium of image parallactic figure
WO2019223382A1 (en) * 2018-05-22 2019-11-28 深圳市商汤科技有限公司 Method for estimating monocular depth, apparatus and device therefor, and storage medium
CN110969666A (en) * 2019-11-15 2020-04-07 北京中科慧眼科技有限公司 Binocular camera depth calibration method, device and system and storage medium
CN111028285A (en) * 2019-12-03 2020-04-17 浙江大学 Depth estimation method based on binocular vision and laser radar fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101453662B (en) * 2007-12-03 2012-04-04 华为技术有限公司 Stereo video communication terminal, system and method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190081867A (en) * 2017-12-29 2019-07-09 재단법인 구미전자정보기술원 System and method for acquisition of safe vision based on 3d bpc imaging technology
WO2019223382A1 (en) * 2018-05-22 2019-11-28 深圳市商汤科技有限公司 Method for estimating monocular depth, apparatus and device therefor, and storage medium
CN109741405A (en) * 2019-01-21 2019-05-10 同济大学 A kind of depth information acquisition system based on dual structure light RGB-D camera
CN110021043A (en) * 2019-02-28 2019-07-16 浙江大学 A kind of scene depth acquisition methods based on Stereo matching and confidence spread
CN110148179A (en) * 2019-04-19 2019-08-20 北京地平线机器人技术研发有限公司 A kind of training is used to estimate the neural net model method, device and medium of image parallactic figure
CN110969666A (en) * 2019-11-15 2020-04-07 北京中科慧眼科技有限公司 Binocular camera depth calibration method, device and system and storage medium
CN111028285A (en) * 2019-12-03 2020-04-17 浙江大学 Depth estimation method based on binocular vision and laser radar fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Design of Binocular Stereo Vision System with Parallel Optical Axes and Image 3D Reconstruction; Kai Ma et al.; 2019 China-Qatar International Workshop on Artificial Intelligence and Applications to Intelligent Manufacturing (AIAIM); full text *
Multi-spectral Pose Estimation Based on Depth Prediction; Chen Shenzhou; China Master's Theses Full-text Database (Information Science and Technology), No. 2; full text *
Monocular Colony Depth Extraction Algorithm Based on Transfer Learning; Deng Xiangzhou, Zhang Rongfu; Optical Instruments, No. 02; full text *

Also Published As

Publication number Publication date
CN113160298A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN113160298B (en) Depth truth value acquisition method, device and system and depth camera
CN110689581B (en) Structured light module calibration method, electronic device and computer readable storage medium
CN114998499B (en) Binocular three-dimensional reconstruction method and system based on line laser galvanometer scanning
EP3531066B1 (en) Three-dimensional scanning method including a plurality of lasers with different wavelengths, and scanner
CN104899870B (en) The depth estimation method being distributed based on light field data
US20200334840A1 (en) Three-Dimensional Scanning System and Scanning Method Thereof
EP3940641A1 (en) Method, apparatus, and device for obtaining disparity map, control system, and storage medium
CN109377551B (en) Three-dimensional face reconstruction method and device and storage medium thereof
CN110335211A (en) Bearing calibration, terminal device and the computer storage medium of depth image
CN104424662A (en) Stereo scanning device
WO2015199899A1 (en) Systems and methods for depth map extraction using a hybrid algorithm
CN107517346B (en) Photographing method and device based on structured light and mobile device
CN109974623B (en) Three-dimensional information acquisition method and device based on line laser and binocular vision
CN114746715A (en) Three-dimensional model generation method, information processing device, and program
CN107545586B (en) Depth obtaining method and system based on light field polar line plane image local part
CN113129430B (en) Underwater three-dimensional reconstruction method based on binocular structured light
WO2020119467A1 (en) High-precision dense depth image generation method and device
CN111950426A (en) Target detection method and device and delivery vehicle
WO2018001252A1 (en) Projection unit and photographing apparatus comprising same projection unit, processor, and imaging device
WO2018032841A1 (en) Method, device and system for drawing three-dimensional image
KR20120065067A (en) Device and method for 3-dimension world modeling using multi-sensor fusion
CN110335307A (en) Scaling method, device, computer storage medium and terminal device
Wilm et al. Accurate and simple calibration of DLP projector systems
US8340399B2 (en) Method for determining a depth map from images, device for determining a depth map
CN113160416B (en) Speckle imaging device and method for coal flow detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant