[ invention ]
The application provides a depth estimation model and method, a training system and an i-TOF depth camera, and aims to solve the problem that in the related art the depth estimation accuracy of an i-TOF depth engine is low in certain special scenes.
In order to solve the above technical problems in the related art, a first aspect of the present application provides a depth estimation model, which is applied to an i-TOF depth camera and is used for processing a raw phase map of a measured object acquired by the i-TOF depth camera to obtain a distance map, where the depth estimation model includes: a feature map extraction module, used for performing feature extraction on the raw phase map of the measured object to obtain a 1/4 resolution feature map and a 1/2 resolution feature map; a distance map generation module, used for performing a convolution operation on the 1/4 resolution feature map to obtain a 1/4 resolution distance map, and up-sampling the 1/4 resolution distance map to obtain a 1/2 resolution distance map; a first distance map optimization module, used for generating a first feedback feature according to the 1/2 resolution distance map and the 1/2 resolution feature map, performing iterative optimization on the 1/2 resolution distance map by using the first feedback feature to obtain an optimal 1/2 resolution distance map, and up-sampling the optimal 1/2 resolution distance map to obtain a full resolution distance map; and a second distance map optimization module, used for generating a second feedback feature according to the full resolution distance map and the raw phase map, and performing iterative optimization on the full resolution distance map by using the second feedback feature to obtain an optimal full resolution distance map.
A second aspect of an embodiment of the present application provides a depth estimation method, including: acquiring a raw phase map of a measured object, and performing feature extraction on the raw phase map to obtain a 1/4 resolution feature map and a 1/2 resolution feature map; convolving the 1/4 resolution feature map to obtain a 1/4 resolution distance map, and up-sampling the 1/4 resolution distance map to obtain a 1/2 resolution distance map; generating a first feedback feature according to the 1/2 resolution distance map and the 1/2 resolution feature map, performing iterative optimization on the 1/2 resolution distance map by using the first feedback feature to obtain an optimal 1/2 resolution distance map, and up-sampling the optimal 1/2 resolution distance map to obtain a full resolution distance map; and generating a second feedback feature according to the full resolution distance map and the raw phase map, and performing iterative optimization on the full resolution distance map by using the second feedback feature to obtain an optimal full resolution distance map.
A third aspect of the embodiments of the present application provides a training system for a depth estimation model, the training system comprising a calibration plate, an i-TOF depth camera, and a control and processor, wherein: the i-TOF depth camera is used for emitting line laser to the calibration plate, receiving the line laser reflected by the calibration plate and generating a corresponding raw phase map according to the line laser; and the control and processor is used for: processing the raw phase map to obtain a line laser intensity map, and generating three-dimensional coordinates of the calibration plate according to a line laser scanning principle and the line laser intensity map, wherein the coordinate value corresponding to a target coordinate axis in the three-dimensional coordinates is a depth truth value, the target coordinate axis is the coordinate axis pointing from the i-TOF depth camera to the calibration plate, and the depth truth value and the corresponding raw phase map together form a group of training data; and adjusting the pose of the calibration plate to obtain a plurality of groups of training data, and training the depth estimation model through the plurality of groups of training data, wherein the raw phase map in each group of training data is used as the input of the depth estimation model, and the depth truth value in each group of training data is used as the output of the depth estimation model.
A fourth aspect of the present application provides a training method for a depth estimation model, including: controlling an i-TOF depth camera to emit line laser to a calibration plate, receiving the line laser reflected by the calibration plate and generating a corresponding raw phase map according to the line laser; processing the raw phase map to obtain a line laser intensity map, and generating three-dimensional coordinates of the calibration plate according to a line laser scanning principle and the line laser intensity map, wherein the coordinate value corresponding to a target coordinate axis in the three-dimensional coordinates is a depth truth value, the target coordinate axis is the coordinate axis pointing from the i-TOF depth camera to the calibration plate, and the depth truth value and the corresponding raw phase map together form a group of training data; and adjusting the pose of the calibration plate to obtain a plurality of groups of training data, and training the depth estimation model through the plurality of groups of training data, wherein the raw phase map in each group of training data is used as the input of the depth estimation model, and the depth truth value in each group of training data is used as the output of the depth estimation model.
A fifth aspect of the embodiments of the present application provides a training system for another depth estimation model, including a calibration plate, a common imaging device, an i-TOF depth camera, and a control and processor, wherein: the common imaging device comprises a laser projector and an image sensor, the laser projector is used for emitting first line laser to the calibration plate, and the image sensor is used for receiving the first line laser reflected by the calibration plate and generating a corresponding measurement map according to the first line laser; the i-TOF depth camera is used for emitting second line laser to the calibration plate, receiving the second line laser reflected by the calibration plate and generating a corresponding raw phase map according to the second line laser; and the control and processor is used for: calculating the three-dimensional coordinates of the calibration plate according to the line laser scanning principle and the measurement map, wherein the coordinate value corresponding to a target coordinate axis in the three-dimensional coordinates is a depth truth value, the target coordinate axis is the coordinate axis pointing from the common imaging device to the calibration plate, and the depth truth value and the corresponding raw phase map together form a group of training data; and adjusting the pose of the calibration plate to obtain a plurality of groups of training data, and training the depth estimation model through the plurality of groups of training data, wherein the raw phase map in each group of training data is used as the input of the depth estimation model, and the depth truth value in each group of training data is used as the output of the depth estimation model.
A sixth aspect of the embodiments of the present application provides another training method for a depth estimation model, including: controlling a laser projector in a common imaging device to emit first line laser to a calibration plate, and controlling an image sensor in the common imaging device to receive the first line laser reflected by the calibration plate and generate a corresponding measurement map according to the first line laser; controlling an i-TOF depth camera to emit second line laser to the calibration plate and receive the second line laser reflected by the calibration plate so as to generate a corresponding raw phase map according to the second line laser; calculating the three-dimensional coordinates of the calibration plate according to the line laser scanning principle and the measurement map, wherein the coordinate value corresponding to a target coordinate axis in the three-dimensional coordinates is a depth truth value, the target coordinate axis is the coordinate axis pointing from the common imaging device to the calibration plate, and the depth truth value and the corresponding raw phase map together form a group of training data; and adjusting the pose of the calibration plate to obtain a plurality of groups of training data, and training the depth estimation model through the plurality of groups of training data, wherein the raw phase map in each group of training data is used as the input of the depth estimation model, and the depth truth value in each group of training data is used as the output of the depth estimation model.
A seventh aspect of embodiments of the present application provides an i-TOF depth camera, comprising: a transmitting end, used for transmitting a modulated light beam to a measured object; a receiving end, used for receiving the modulated light beam reflected by the measured object and generating a corresponding raw phase map according to the modulated light beam; and a processing end, used for performing depth estimation on the raw phase map by using the depth estimation model in the first aspect of the embodiments of the present application to obtain a corresponding optimal full resolution distance map, and generating a corresponding depth map according to the intrinsic parameters of the depth camera and the optimal full resolution distance map.
An eighth aspect of an embodiment of the present application provides another i-TOF depth camera, comprising: a transmitting end, used for transmitting modulated light beams with different modulation frequencies to a measured object; a receiving end comprising an image sensor, where the image sensor comprises at least one pixel and each pixel comprises at least three taps, the receiving end being used for receiving, by alternately collecting the taps, the modulated light beams with different modulation frequencies reflected by the measured object according to different demodulation frequencies, and generating raw phase maps under the different modulation and demodulation frequencies according to the modulated light beams, wherein the different demodulation frequencies correspond to the different modulation frequencies respectively; and a processing end, configured to perform depth estimation on the raw phase maps under the different modulation and demodulation frequencies by using the depth estimation model described in the first aspect of the embodiments of the present application to obtain a corresponding optimal full resolution distance map, and generate a corresponding depth map according to the intrinsic parameters of the depth camera and the optimal full resolution distance map.
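The step of generating a depth map from the intrinsic parameters and a distance map can be sketched as follows; the pinhole model and the parameter names (`fx`, `fy`, `cx`, `cy`) are conventional assumptions for illustration, not details taken from the application.

```python
import math

def distance_to_depth(dist, u, v, fx, fy, cx, cy):
    """Convert a radial distance (optical center to point) at pixel (u, v)
    into a z-depth, assuming a pinhole model with intrinsics fx, fy, cx, cy."""
    # Normalized ray direction components for this pixel.
    x = (u - cx) / fx
    y = (v - cy) / fy
    # The ray gains length sqrt(x^2 + y^2 + 1) per unit of depth,
    # so depth = radial distance / that length.
    return dist / math.sqrt(x * x + y * y + 1.0)
```

At the principal point the ray coincides with the optical axis, so depth equals the radial distance; off-axis pixels yield a strictly smaller depth.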
As can be seen from the above description, compared with the related art, the present application has the following beneficial effects: a depth estimation model is formed by a feature map extraction module, a distance map generation module, a first distance map optimization module and a second distance map optimization module, and depth estimation is performed on a measured object through the depth estimation model. In practical application, a raw phase map of the measured object is first acquired, and feature extraction is performed on the raw phase map by the feature map extraction module to obtain a 1/4 resolution feature map and a 1/2 resolution feature map; then the distance map generation module performs a convolution operation on the 1/4 resolution feature map to obtain a 1/4 resolution distance map, and up-samples the 1/4 resolution distance map to obtain a 1/2 resolution distance map; next, the first distance map optimization module generates a first feedback feature according to the 1/2 resolution distance map and the 1/2 resolution feature map, performs iterative optimization on the 1/2 resolution distance map by using the first feedback feature to obtain an optimal 1/2 resolution distance map, and up-samples the optimal 1/2 resolution distance map to obtain a full resolution distance map; finally, the second distance map optimization module generates a second feedback feature according to the full resolution distance map and the raw phase map, and performs iterative optimization on the full resolution distance map by using the second feedback feature to obtain an optimal full resolution distance map.
Therefore, the depth estimation model provided by the application is essentially an end-to-end deep learning model: the depth data of the measured object (namely, the optimal full resolution distance map) can be obtained directly from its original image data (namely, the raw phase map). A step-by-step optimization strategy is adopted in the depth estimation process, that is, the 1/2 resolution distance map is first iteratively optimized to obtain the optimal 1/2 resolution distance map, and then the full resolution distance map obtained by up-sampling the optimal 1/2 resolution distance map is iteratively optimized to obtain the optimal full resolution distance map. In this way, a depth estimation result equivalent to that of the traditional scheme can be obtained in conventional scenes, and a depth estimation result superior to that of the traditional scheme can be obtained in special scenes (such as inner right-angle regions that are prone to multipath errors), so that the accuracy of the depth estimation is guaranteed.
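The coarse-to-fine flow of the four modules can be sketched structurally as follows. The 1-D lists, the averaging/repetition resampling, and the fixed residual update are placeholders standing in for the learned convolutions and feedback features; only the control flow mirrors the description above.

```python
def downsample(x):
    # Halve resolution by averaging pairs (stand-in for a strided convolution).
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x):
    # Double resolution by repetition (stand-in for learned up-sampling).
    return [v for v in x for _ in range(2)]

def refine(dist, feedback, iters=3):
    # Iterative optimization: each pass applies a feedback-derived residual.
    for _ in range(iters):
        dist = [d + 0.5 * (f - d) for d, f in zip(dist, feedback)]
    return dist

def estimate_depth(raw_phase):
    feat_half = downsample(raw_phase)         # 1/2 resolution feature map
    feat_quarter = downsample(feat_half)      # 1/4 resolution feature map
    dist_quarter = feat_quarter               # stand-in for the conv yielding the 1/4 distance map
    dist_half = upsample(dist_quarter)        # 1/2 resolution distance map
    dist_half = refine(dist_half, feat_half)  # first distance map optimization module
    dist_full = upsample(dist_half)           # full resolution distance map
    return refine(dist_full, raw_phase)       # second distance map optimization module
```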
[ description of the drawings ]
In order to more clearly illustrate the related art or the technical solutions in the embodiments of the present application, the following description briefly introduces the drawings required for describing the related art or the embodiments of the present application. It is apparent that the drawings in the following description show only some embodiments of the present application, not all embodiments, and that other drawings may be obtained from these drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a schematic diagram of an i-TOF depth camera provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a training system for a depth estimation model according to an embodiment of the present application;
FIG. 3 is a flowchart of a training method of a depth estimation model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training system for another depth estimation model provided in an embodiment of the present application;
FIG. 5 is a flowchart of another training method of a depth estimation model according to an embodiment of the present application;
FIG. 6 is a first block diagram of a depth estimation model provided by an embodiment of the present application;
FIG. 7 is a second block diagram of a depth estimation model according to an embodiment of the present application;
FIG. 8 is an exemplary diagram of iterative optimization of a first distance map optimization module provided in an embodiment of the present application;
FIG. 9 is a third block diagram of a depth estimation model provided by an embodiment of the present application;
FIG. 10 is a flowchart of a depth estimation method according to an embodiment of the present application.
[ detailed description ]
For the purposes of making the objects, technical solutions and advantages of the present application more apparent and understandable, the present application will be clearly and completely described in the following description with reference to the embodiments of the present application and the corresponding drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. It should be understood that the following embodiments of the present application are described only for explaining the present application, and are not intended to limit the present application, that is, all other embodiments obtained by persons of ordinary skill in the art without making any inventive effort based on the embodiments of the present application are within the scope of protection of the present application. Furthermore, the technical features referred to in the embodiments of the present application described below may be combined with each other as long as they do not constitute a conflict with each other.
Fig. 1 is a schematic diagram of an i-TOF depth camera provided in an embodiment of the present application, where the i-TOF depth camera includes a transmitting end 101, a receiving end 102, and a processing end 103. The transmitting end 101 is configured to transmit a laser beam to a measured object, the receiving end 102 is configured to receive the laser beam reflected by the measured object and generate a corresponding raw phase map according to the laser beam, and the processing end 103 is configured to process the raw phase map generated by the receiving end 102 through its on-board i-TOF depth engine to obtain a corresponding depth map. In the related art, the i-TOF depth engine installed on the processing end 103 still has many drawbacks, for example: it cannot be accelerated on an NPU; its depth estimation scheme is not a deep learning scheme; and for some special scenes (such as inner right-angle regions that are prone to multipath errors), its depth estimation accuracy is low and its results are noticeably poor.
In order to solve these problems, the embodiment of the present application designs a depth estimation model, which is essentially an end-to-end deep learning model capable of obtaining the optimal full resolution distance map of a measured object directly from its original image data (i.e., the raw phase map). The depth estimation model can be deployed on the processing end 103 of the i-TOF depth camera during production and manufacturing, so that the processing end 103 can use the depth estimation model to perform depth estimation on the raw phase map generated by the receiving end 102 to obtain the optimal full resolution distance map of the measured object, and then generate the depth map of the measured object according to the obtained optimal full resolution distance map and the intrinsic parameters of the i-TOF depth camera.
In some embodiments, the transmitting end 101 includes a light source, a beam modulator, and a light source driver. The light source may be a light emitting diode (LED), an edge emitting laser (EEL), a vertical cavity surface emitting laser (VCSEL), or the like, or may be a light source array formed by a plurality of such light sources, and the light beam emitted by the light source may be visible light, infrared light, ultraviolet light, or the like; the light source emits the light beam under the control of the light source driver (which may in turn be controlled by the processing end 103). In one embodiment, the amplitude of the light beam emitted by the light source is modulated under control so that the light source emits a pulsed beam, a square wave beam, a sine wave beam, or the like; the beam modulator receives the beam from the light source and emits a modulated line laser beam toward the measured object. It will be appreciated that a portion of the processing end 103, or a sub-circuit that exists independently of the processing circuitry, such as a pulse signal generator, may be utilized to control the light source to emit the associated light beam.
In some embodiments, the receiving end 102 includes a TOF sensor comprising at least one pixel. Compared with a conventional image sensor used only for photographing, each pixel in the TOF sensor includes 3 or more taps (a tap stores and reads, or drains, the charge signal generated by the reflected light pulse under the control of the corresponding electrode). When each pixel includes multiple taps, the taps are switched sequentially within a single frame period T (or a single exposure time) to collect the electrons generated when the pixel receives the light signal reflected back by the object, forming the charge signals from which a raw phase map is obtained. In one embodiment, the receiving end 102 further includes a lens and an optical filter, where the lens is used to receive the light beam reflected by the measured object and focus it onto the pixels of the TOF sensor, and the optical filter is used to allow only reflected light in a preset wavelength band to enter the image sensor, so as to suppress stray-light interference. Alternatively, the lens may be a single focusing lens or a focusing lens group, without limitation.
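For a pixel whose taps are switched sequentially within one period, the collected charges sample the reflected modulation at fixed offsets, from which the phase can be recovered. The classic four-tap (0°/90°/180°/270°) formula below is a common illustration, not taken from the application; the application allows three or more taps, for which analogous weighted formulas exist.

```python
import math

def phase_from_taps(q0, q1, q2, q3):
    """Recover the modulation phase from four tap charges sampled at
    0, 90, 180 and 270 degrees of the demodulation period.
    Ambient offset and amplitude cancel in the two differences."""
    return math.atan2(q1 - q3, q0 - q2)
```

Simulating the taps for a known phase (here 0.7 rad, amplitude 1, ambient offset 2) recovers the same phase, since the offset cancels in each difference.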
In some embodiments, the processing end 103 may be a separate dedicated circuit, such as a dedicated SOC chip, FPGA chip, ASIC chip, or NPU chip comprising a CPU, memory, bus, and the like, or may comprise a general purpose processing circuit: for example, when the i-TOF depth camera is integrated into a smart terminal such as a cell phone, television, or computer, the processing circuit of the terminal may serve as at least a portion of the processing end 103.
In some embodiments, the transmitting end 101 transmits a modulated light beam with a single modulation frequency to the measured object; the receiving end 102, according to the single demodulation frequency corresponding to that modulation frequency, controls the taps to alternately collect the modulated light beam reflected by the measured object within a single frame period, and generates a corresponding raw phase map according to the modulated light beam; and the processing end 103 performs depth estimation on the raw phase map by using the depth estimation model to obtain the optimal full resolution distance map of the measured object, and generates the depth map of the measured object according to the intrinsic parameters of the i-TOF depth camera and the optimal full resolution distance map. It can be understood that these embodiments all belong to a single-frequency depth estimation scheme, that is, the modulated light beam emitted by the transmitting end 101 has a single modulation frequency, and the raw phase map generated by the receiving end 102 is also single-frequency image data.
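Under a single-frequency scheme, the measured phase shift maps to distance through the modulation frequency via the standard i-TOF relation; the sketch below also shows the unambiguous range c/(2f) this relation implies. The 100 MHz value in the usage note is purely illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def phase_to_distance(phase, f_mod):
    """Distance implied by a phase shift in [0, 2*pi) at modulation
    frequency f_mod: d = c * phase / (4 * pi * f_mod)."""
    return C * phase / (4.0 * math.pi * f_mod)

def unambiguous_range(f_mod):
    """Largest distance measurable before the phase wraps: c / (2 * f_mod)."""
    return C / (2.0 * f_mod)
```

At 100 MHz the unambiguous range is about 1.5 m; a phase shift of pi lands exactly at half that range.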
In other embodiments, the transmitting end 101 transmits modulated light beams with different modulation frequencies to the measured object; the receiving end 102, according to the different demodulation frequencies, controls the taps to alternately collect the modulated light beams with different modulation frequencies reflected by the measured object in different frame periods, and generates raw phase maps under the different modulation and demodulation frequencies according to the modulated light beams; and the processing end 103 performs depth estimation on the raw phase maps under the different modulation and demodulation frequencies by using the depth estimation model to obtain the corresponding optimal full resolution distance map, and generates the corresponding depth map according to the intrinsic parameters of the i-TOF depth camera and the optimal full resolution distance map. It can be understood that these embodiments belong to a multi-frequency depth estimation scheme, that is, the transmitting end 101 transmits modulated light beams with different modulation frequencies, and the receiving end 102 generates raw phase maps under different modulation and demodulation frequencies, where the different demodulation frequencies respectively correspond to the different modulation frequencies.
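A common motivation for multiple modulation frequencies (not spelled out in the application itself) is phase unwrapping: each frequency yields a distance wrapped to its own unambiguous range, and the true distance is the one consistent with all of them. A brute-force search over candidate distances sketches the idea:

```python
def unwrap_distance(wrapped, ranges, max_dist, step=0.005):
    """Find the distance whose wrapped value at each frequency best matches
    the measured single-frequency estimates.
    wrapped[i]: wrapped distance at frequency i; ranges[i]: its unambiguous range."""
    def circ_err(d, w, r):
        # Circular distance between d and w modulo the range r.
        e = abs((d - w) % r)
        return min(e, r - e)
    n = int(max_dist / step)
    return min((i * step for i in range(n + 1)),
               key=lambda d: sum(circ_err(d, w, r) for w, r in zip(wrapped, ranges)))
```

With unambiguous ranges of 1.5 m and 2.0 m, wrapped readings of 0.7 m and 1.7 m are jointly consistent only with a true distance of 3.7 m within the combined 6 m period.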
It should be noted that the above-described embodiments are only preferred implementations of the present application and are not intended as limitations; those skilled in the art may configure them flexibly according to the actual application scenario on the basis of the embodiments of the present application. In addition, the processing end 103 in the above embodiments may use a conventional processor or an NPU; when the processing end 103 uses an NPU, the i-TOF depth camera may not need to be provided with a separate memory, where such a memory would otherwise store the computer program and related data necessary in the depth estimation process.
As noted above, the depth estimation model designed in the embodiments of the present application is essentially an end-to-end deep learning model, which means that the model needs to be trained during the production and manufacturing of the depth camera before being deployed on the processing end 103, so that the processing end 103 can use it to perform depth estimation on the raw phase map generated by the receiving end 102. Therefore, the embodiments of the present application also provide two training systems with different structures and principles to complete the training of the depth estimation model; the two training systems are described below in terms of structure, principle, training method, and so on.
Fig. 2 is a schematic diagram of a training system of a depth estimation model according to an embodiment of the present application, where the training system (hereinafter referred to as the first training system) includes a calibration plate 201, an i-TOF depth camera 202, a driving device (not shown in the figure) and a control and processor 203, where the calibration plate 201 faces the i-TOF depth camera 202, the driving device is in driving connection with the calibration plate 201, and the control and processor 203 is electrically connected to the driving device and the i-TOF depth camera 202. Specifically, the i-TOF depth camera 202 is configured to emit line laser light to the calibration plate 201 and receive the line laser light reflected back by the calibration plate 201 to generate a corresponding raw phase map therefrom. The control and processor 203 is configured to: process the raw phase map to obtain a line laser intensity map, and generate the three-dimensional coordinates of the calibration plate 201 according to the line laser scanning principle and the line laser intensity map, where the coordinate value corresponding to a target coordinate axis in the three-dimensional coordinates is a depth truth value, the target coordinate axis is the coordinate axis pointing from the i-TOF depth camera 202 to the calibration plate 201 (namely, the direction of the line connecting the i-TOF depth camera 202 and the calibration plate 201), and the depth truth value and the corresponding raw phase map together form a group of training data; and adjust the pose of the calibration plate 201 by controlling the driving device to drive the calibration plate 201 to rotate and/or translate (preferably translate along the target coordinate axis) so as to obtain multiple groups of training data, and train the depth estimation model through the multiple groups of training data, where the raw phase map in each group of training data is used as the input of the depth estimation model, and the depth truth value in each group of training data is used as the output of the depth estimation model.
In the embodiment of the present application, the process by which the control and processor 203 generates the three-dimensional coordinates of the calibration plate 201 is based on the line laser scanning principle, which will be described by taking an i-TOF depth camera including a transmitting end and a receiving end as an example: a laser line emitted from the transmitting end to the calibration plate 201 forms a light plane, each light plane corresponds to a light plane equation, and the light plane equation may be obtained through calibration; the receiving end collects the laser line reflected by the calibration plate 201 to form a raw phase map on the imaging plane and transmits it to the control and processor 203; the control and processor 203 processes the raw phase map to obtain a line laser intensity map containing the laser line, computes the intersection of a target ray (the ray passing through the optical center of the receiving end and directed toward the calibration plate 201 from any pixel, denoted P, that receives the laser line) with the corresponding light plane to determine the three-dimensional coordinates of P, and then traverses all such pixels in the receiving end to obtain the three-dimensional coordinates of the calibration plate 201.
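The ray–plane intersection at the heart of this principle can be sketched as follows. The plane is represented as n·X + d = 0 and the target ray passes through the optical center along direction v, both assumed to be already expressed in the receiving end's camera frame.

```python
def ray_plane_intersection(n, d, v):
    """Intersect the ray X = t*v (through the optical center, t > 0)
    with the light plane n . X + d = 0; returns the 3-D point."""
    denom = sum(ni * vi for ni, vi in zip(n, v))
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the light plane")
    t = -d / denom  # substitute X = t*v into the plane equation and solve for t
    return [t * vi for vi in v]
```

For the plane z = 2 (n = (0, 0, 1), d = -2), the ray along the optical axis hits (0, 0, 2), while a tilted ray (0.5, 0, 1) hits (1, 0, 2).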
In the line laser scanning principle, the light plane equation corresponding to any light plane can be obtained through calibration. The calibration process of the light plane equation is described by taking an i-TOF depth camera comprising a transmitting end and a receiving end as an example: turn off the transmitting end and turn on only the receiving end to capture a calibration plate image without laser lines (hereinafter referred to as a first image); turn on the transmitting end and control the receiving end to capture a calibration plate image containing laser lines (hereinafter referred to as a second image), the first image and the corresponding second image together forming a group of light plane equation calibration images; adjust the pose of the calibration plate and repeat the steps of acquiring the first image and the second image under different poses, so as to obtain a plurality of groups of light plane equation calibration images; in each group of light plane equation calibration images, calculate the plane equation of the calibration plate by using the first image, calculate the ray equation corresponding to each pixel receiving the laser line (the equation of the ray passing through the optical center of the receiving end and directed toward the calibration plate from that pixel) by using the second image, and calculate the three-dimensional point coordinates on the light plane by the line-plane intersection method; and perform least squares plane fitting on all the three-dimensional point coordinates calculated from the groups of light plane equation calibration images, the fitting result being the light plane equation.
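The final least squares plane fit can be sketched for the non-vertical case z = a·x + b·y + c by solving the 3×3 normal equations directly; this parameterization is an illustrative simplification (a fully general fit would use the total-least-squares, smallest-eigenvector form instead).

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to 3-D points by
    accumulating the normal equations and solving with Gauss-Jordan elimination."""
    # Accumulate A^T A (3x3) and A^T z (right-hand column) for rows [x, y, 1].
    m = [[0.0] * 4 for _ in range(3)]
    for x, y, z in points:
        row = [x, y, 1.0]
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]
            m[i][3] += row[i] * z
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [a - f * b for a, b in zip(m[r], m[i])]
    return [m[i][3] / m[i][i] for i in range(3)]  # coefficients (a, b, c)
```

Feeding in points sampled exactly from a known plane recovers its coefficients, which is a quick sanity check before fitting noisy laser-line points.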
It should be noted that the i-TOF depth camera in the present embodiment may further include a processing end, but during training the processing end of the i-TOF depth camera may only be used to transmit the image data to the control and processor 203 in the training system, without processing the image data itself. In addition, the control and processor 203 in the training system is a host computer with higher computing power than the processing end of the i-TOF depth camera.
Fig. 3 is a flowchart of a training method of a depth estimation model according to an embodiment of the present application, where the training method is actually a method for training the depth estimation model by the first training system, and the training method includes the following steps 301 to 303.
Step 301, controlling a transmitting end in the i-TOF depth camera to emit line laser to the calibration plate, and controlling a receiving end in the i-TOF depth camera to receive the line laser reflected by the calibration plate and generate a corresponding raw phase map accordingly.
In this embodiment, when the first training system is used to train the depth estimation model, the control and processor 203 is required to control the transmitting end in the i-TOF depth camera 202 to transmit the line laser to the calibration plate 201, and control the receiving end in the i-TOF depth camera 202 to receive the line laser reflected by the calibration plate 201 and generate the corresponding rawphase map accordingly.
Step 302, processing the rawphase map to obtain a line laser intensity map, generating three-dimensional coordinates of the calibration plate according to the line laser scanning principle and the line laser intensity map to obtain a depth truth value, and taking the depth truth value and the corresponding rawphase map as a group of training data.
In this embodiment, after the rawphase map of the calibration board 201 is obtained, the control and processor 203 processes the rawphase map to obtain a line laser intensity map containing the line laser, and generates the three-dimensional coordinates of the calibration board 201 according to the line laser scanning principle and the line laser intensity map, where the coordinate value corresponding to a target coordinate axis in the three-dimensional coordinates is the depth truth value, the target coordinate axis being the coordinate axis pointing from the i-TOF depth camera 202 toward the calibration board 201; the depth truth value and the corresponding rawphase map together form a set of training data.
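The depth truth value computed from a laser-line pixel can be sketched by intersecting the camera ray through that pixel with the calibrated light plane; this is an illustrative sketch, with `pixel_dir` standing in for the ray direction derived from the pixel coordinate and the intrinsics:

```python
import numpy as np

def depth_from_ray_plane(pixel_dir, plane):
    """Intersect a camera ray with the calibrated plane to get a depth truth value.

    `pixel_dir` is the ray direction through the optical centre (computed from
    the pixel coordinate and the camera intrinsics; a hypothetical input here);
    `plane` is (a, b, c, d) with a*x + b*y + c*z + d = 0. The z component of
    the intersection point is taken as the depth truth value, z being the
    target coordinate axis pointing from the camera toward the plate.
    """
    n, d = np.asarray(plane[:3], float), float(plane[3])
    direction = np.asarray(pixel_dir, float)
    denom = n.dot(direction)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    t = -d / denom                # ray parameter of the intersection
    point = t * direction         # 3-D point on the plane
    return point[2]               # depth truth value along the target axis
```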
Step 303, adjusting the pose of the calibration plate to obtain a plurality of sets of training data, and training the depth estimation model through the plurality of sets of training data.
In the embodiment of the present application, after obtaining one set of training data, steps 301 and 302 are repeatedly performed under different poses of the calibration plate 201 to obtain additional sets of training data. Specifically, the control and processor 203 adjusts the pose of the calibration plate 201 by controlling the driving device to drive the calibration plate 201 to rotate and/or translate, and obtains different depth truth values and corresponding rawphase graphs through steps 301 and 302 under different poses, so as to finally obtain multiple sets of training data, and then trains the depth estimation model through the multiple sets of training data, wherein the rawphase graphs in each set of training data are all used as input of the depth estimation model, and the depth truth values in each set of training data are all used as output of the depth estimation model.
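The pairing of rawphase maps with depth truth values across calibration-plate poses can be sketched as follows; `capture_fn` is a hypothetical stand-in for the acquisition performed in steps 301 and 302, not part of the application itself:

```python
import numpy as np

def build_training_set(capture_fn, poses):
    """Collect (rawphase map, depth truth) pairs over calibration-plate poses.

    `capture_fn(pose)` abstracts steps 301-302: it returns the rawphase map
    and the depth truth value generated at one pose of the calibration plate.
    Each pair becomes one set of training data: the rawphase map is the model
    input and the depth truth value is the target output.
    """
    inputs, targets = [], []
    for pose in poses:
        rawphase, depth_truth = capture_fn(pose)
        inputs.append(np.asarray(rawphase))
        targets.append(np.asarray(depth_truth))
    return np.stack(inputs), np.stack(targets)
```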
Fig. 4 is a schematic diagram of a training system of another depth estimation model according to an embodiment of the present application, where the training system (hereinafter referred to as a second training system) includes, in addition to the calibration board 201 of the first training system, the i-TOF depth camera 202, a driving device (not shown in the figure), and the control and processor 203, a common imaging device 204 also disposed opposite to the calibration board 201, and the common imaging device 204 is electrically connected to the control and processor 203. Specifically, the common imaging device 204 includes a laser projector for emitting a first line laser toward the calibration plate 201, and an image sensor (e.g., a CMOS sensor) for receiving the first line laser reflected back by the calibration plate 201 and generating a corresponding measurement map therefrom. The i-TOF depth camera 202 functions the same as in the first training system: it emits a second line laser to the calibration plate 201, receives the second line laser reflected back by the calibration plate 201, and generates therefrom a corresponding rawphase map for transmission to the control and processor 203.
The control and processor 203 is configured to: calculate the three-dimensional coordinates of the calibration plate 201 according to the line laser scanning principle and the measurement map, wherein the coordinate value corresponding to the target coordinate axis in the three-dimensional coordinates is the depth truth value, the target coordinate axis being the coordinate axis pointing from the common imaging device 204 toward the calibration plate 201 (namely, the connecting-line direction between the common imaging device 204 and the calibration plate 201), and the depth truth value and the corresponding rawphase map together form a group of training data; and adjust the pose of the calibration plate 201 by controlling the driving device to drive the calibration plate 201 to rotate and/or translate (preferably translate along the target coordinate axis) so as to obtain multiple sets of training data, and then train the depth estimation model through the multiple sets of training data, wherein the rawphase map in each set of training data is used as an input of the depth estimation model, and the depth truth value in each set of training data is used as an output of the depth estimation model.
It will be appreciated that the second training system differs from the first training system in that the first training system includes only the i-TOF depth camera 202, while the second training system includes both the i-TOF depth camera 202 and the common imaging device 204. When the training system includes the i-TOF depth camera 202 and the common imaging device 204, the two need to be calibrated before training the depth estimation model, the calibration result being their relative transformation matrix. In this case, after the i-TOF depth camera 202 performs depth estimation on the rawphase map, the finally obtained depth map can be converted through the relative transformation matrix into the same coordinate system as the depth truth value, so as to facilitate comparison with the depth truth value and check the accuracy of the depth estimation.
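The coordinate conversion via the relative transformation matrix can be sketched as a standard homogeneous transform; this assumes the calibration yields a 4x4 rigid-body matrix `T`, which is the usual convention but not spelled out in the text:

```python
import numpy as np

def to_truth_frame(points_cam, T):
    """Map 3-D points from the i-TOF camera frame into the common imaging
    device's frame using the 4x4 relative transformation matrix T from
    extrinsic calibration, so estimated depth can be compared with the
    depth truth value in a single coordinate system.
    """
    pts = np.asarray(points_cam, float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    return (homo @ T.T)[:, :3]                       # back to Cartesian
```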
Fig. 5 is a flowchart of another training method of a depth estimation model according to an embodiment of the present application, where the training method is actually a method for training the depth estimation model by using the second training system, and the training method includes the following steps 501 to 504.
Step 501, controlling a laser projector in a common imaging device to emit first line laser to a calibration plate, and controlling an image sensor in the common imaging device to receive the first line laser reflected by the calibration plate and generate a corresponding measurement map according to the first line laser.
In this embodiment, when the second training system is used to train the depth estimation model, the control and processor 203 controls the laser projector in the common imaging device 204 to emit the first line laser to the calibration plate 201, and controls the image sensor in the common imaging device 204 to receive the first line laser reflected back by the calibration plate 201 and generate a corresponding measurement map according to the first line laser.
Step 502, controlling a transmitting end in the i-TOF depth camera to transmit the second line laser to the calibration plate, and controlling a receiving end in the i-TOF depth camera to receive the second line laser reflected by the calibration plate and generate a corresponding rawphase map according to the second line laser.
In this embodiment, not only the measurement map of the calibration board 201 but also the rawphase map of the calibration board 201 needs to be obtained, that is, the control and processor 203 controls the transmitting end in the i-TOF depth camera 202 to transmit the second line laser to the calibration board 201, and controls the receiving end in the i-TOF depth camera 202 to receive the second line laser reflected by the calibration board 201 and generate the corresponding rawphase map accordingly.
Step 503, calculating the three-dimensional coordinates of the calibration plate according to the line laser scanning principle and the measurement map to obtain a depth truth value, and taking the depth truth value and the corresponding rawphase map as a group of training data.
In this embodiment, after obtaining the measurement map and the rawphase map of the calibration board 201, the control and processor 203 calculates the three-dimensional coordinates of the calibration board 201 according to the line laser scanning principle and the measurement map, where the coordinate value corresponding to the target coordinate axis in the three-dimensional coordinates is the depth truth value, the target coordinate axis being the coordinate axis pointing from the common imaging device 204 toward the calibration board 201; the depth truth value and the corresponding rawphase map together form a set of training data.
Step 504, adjusting the pose of the calibration plate to obtain a plurality of sets of training data, and training the depth estimation model through the plurality of sets of training data.
In the embodiment of the present application, after obtaining a set of training data, steps 501 to 503 are further required to be repeatedly executed under different poses of the calibration plate 201 to obtain additional sets of training data. Specifically, the control and processor 203 adjusts the pose of the calibration plate 201 by controlling the driving device to drive the calibration plate 201 to rotate and/or translate, and obtains different depth truth values and corresponding rawphase graphs in different poses through steps 501-503, so as to finally obtain multiple sets of training data, and then trains the depth estimation model through the multiple sets of training data, wherein the rawphase graphs in each set of training data are all used as input of the depth estimation model, and the depth truth values in each set of training data are all used as output of the depth estimation model.
In view of the foregoing, the embodiment of the present application designs a depth estimation model that is essentially an end-to-end deep learning model capable of directly obtaining an optimal full resolution distance map of a measured object from the original image data (i.e., the rawphase map) of the measured object. During the production and manufacture of the depth camera, the depth estimation model is trained through the first training system or the second training system by using rawphase maps acquired by an i-TOF depth camera, and is then mounted on a processing end 103 of the i-TOF depth camera, so that after leaving the factory the i-TOF depth camera can directly use the depth estimation model to perform depth estimation on the generated rawphase map to obtain the optimal full resolution distance map of the measured object, and then generate the depth map of the measured object according to the obtained optimal full resolution distance map and the internal parameters of the depth camera. The depth estimation model is explained in detail below.
Fig. 6 is a first block diagram of a depth estimation model according to an embodiment of the present application, where the depth estimation model includes a feature map extracting module 610, a distance map generating module 620, a first distance map optimizing module 630, and a second distance map optimizing module 640, where the feature map extracting module 610 takes a rawphase map of a measured object as input and is connected to the distance map generating module 620, the distance map generating module 620 is connected to the first distance map optimizing module 630, the first distance map optimizing module 630 is connected to the second distance map optimizing module 640, and the second distance map optimizing module 640 takes an optimal full resolution distance map of the measured object as output. Specifically, the feature map extracting module 610 is configured to perform feature extraction on a rawphase map of a measured object to obtain a 1/4 resolution feature map and a 1/2 resolution feature map; the distance map generating module 620 is configured to perform convolution operation on the 1/4 resolution feature map to obtain a 1/4 resolution distance map, and up-sample the 1/4 resolution distance map to obtain a 1/2 resolution distance map; the first distance map optimizing module 630 is configured to generate a first feedback feature according to the 1/2 resolution distance map and the 1/2 resolution feature map, perform iterative optimization on the 1/2 resolution distance map by using the first feedback feature to obtain an optimal 1/2 resolution distance map, and perform upsampling on the optimal 1/2 resolution distance map to obtain a full resolution distance map; the second distance map optimization module 640 is configured to generate a second feedback feature according to the full-resolution distance map and the rawphase map, and perform iterative optimization on the full-resolution distance map by using the second feedback feature to obtain an optimal full-resolution distance map.
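The dataflow between the four modules can be sketched at the shape level as follows. Each learned stage (feature extraction, convolution, feedback refinement) is replaced here by a trivial placeholder; the sketch only demonstrates the resolutions each module consumes and produces, not the learned layers:

```python
import numpy as np

def downsample2(x):
    """2x average-pool stand-in for a learned strided-convolution layer."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour stand-in for the bilinear upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def depth_estimation_forward(rawphase):
    """Shape-level sketch of the model's dataflow (modules 610-640)."""
    feat_half = downsample2(rawphase)        # 610: 1/2 resolution feature map
    feat_quarter = downsample2(feat_half)    # 610: 1/4 resolution feature map
    dist_quarter = feat_quarter              # 620: conv -> 1/4 resolution distance map
    dist_half = upsample2(dist_quarter)      # 620: 1/2 resolution distance map
    dist_half = dist_half + 0 * feat_half    # 630: feedback refinement placeholder
    dist_full = upsample2(dist_half)         # 630: full resolution distance map
    dist_full = dist_full + 0 * rawphase     # 640: second refinement placeholder
    return dist_full                         # optimal full resolution distance map
```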
Step 1001, a rawphase map of the measured object is obtained, and feature extraction is performed on the rawphase map to obtain a 1/4 resolution feature map and a 1/2 resolution feature map.
In this embodiment of the present application, when the depth estimation model is used to perform depth estimation on the measured object, the feature map extracting module 610 needs to obtain a rawphase map of the measured object, and perform feature extraction on the obtained rawphase map to obtain a 1/4 resolution feature map and a 1/2 resolution feature map.
Step 1002, convolving the 1/4 resolution feature map to obtain a 1/4 resolution distance map, and upsampling the 1/4 resolution distance map to obtain a 1/2 resolution distance map.
In the embodiment of the present application, after obtaining the 1/4 resolution feature map and the 1/2 resolution feature map by the feature map extraction module 610, the distance map generation module 620 needs to perform convolution operation on the 1/4 resolution feature map to obtain a 1/4 resolution distance map, and upsample the 1/4 resolution distance map to obtain a 1/2 resolution distance map.
Step 1003, generating a first feedback feature according to the 1/2 resolution distance map and the 1/2 resolution feature map, performing iterative optimization on the 1/2 resolution distance map by using the first feedback feature to obtain an optimal 1/2 resolution distance map, and performing up-sampling on the optimal 1/2 resolution distance map to obtain a full resolution distance map.
In this embodiment of the present application, after obtaining a 1/2 resolution distance map through the distance map generating module 620, the first distance map optimizing module 630 needs to generate a first feedback feature according to the 1/2 resolution distance map and the 1/2 resolution feature map obtained before, and iteratively optimize the 1/2 resolution distance map by using the first feedback feature to obtain an optimal 1/2 resolution distance map, and upsample the optimal 1/2 resolution distance map to obtain a full resolution distance map.
In some embodiments, the process by which the first distance map optimizing module 630 iteratively optimizes the 1/2 resolution distance map to obtain the optimal 1/2 resolution distance map may be: performing difference comparison between the 1/2 resolution distance map and the 1/2 resolution feature map to obtain a first feedback feature; optimizing the 1/2 resolution distance map by using the first feedback feature to obtain a first-optimized 1/2 resolution distance map, and performing difference comparison between the first-optimized 1/2 resolution distance map and the 1/2 resolution feature map to obtain a second instance of the first feedback feature; optimizing the first-optimized 1/2 resolution distance map by using this second instance to obtain a second-optimized 1/2 resolution distance map, and performing difference comparison between the second-optimized 1/2 resolution distance map and the 1/2 resolution feature map to obtain a third instance of the first feedback feature; and so on, until the first feedback feature falls within a preset range or a preset number of iterations is reached, at which point the output 1/2 resolution distance map is the optimal 1/2 resolution distance map.
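The refinement loop just described can be sketched as follows. In the model the feedback feature comes from a learned difference comparison; here it is approximated by the raw difference between the reference map and the current distance map, purely to illustrate the two stop conditions:

```python
import numpy as np

def iterative_refine(dist_map, ref_map, step=0.5, tol=1e-3, max_iters=10):
    """Sketch of the feedback-driven iterative optimization.

    `ref_map` stands for the map compared against (the 1/2 resolution feature
    map in module 630, or the rawphase map in module 640). The loop exits
    when the feedback falls within a preset range (`tol`) or the preset
    iteration count (`max_iters`) is reached.
    """
    current = np.asarray(dist_map, float).copy()
    ref = np.asarray(ref_map, float)
    for _ in range(max_iters):
        feedback = ref - current            # difference comparison
        if np.max(np.abs(feedback)) < tol:  # feedback within preset range
            break
        current += step * feedback          # optimize using the feedback
    return current                          # the "optimal" distance map
```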
And step 1004, generating a second feedback feature according to the full-resolution distance map and the rawphase map, and performing iterative optimization on the full-resolution distance map by using the second feedback feature to obtain an optimal full-resolution distance map.
In this embodiment of the present application, after obtaining the full-resolution distance map of the measured object through the first distance map optimizing module 630, the second distance map optimizing module 640 needs to generate the second feedback feature according to the full-resolution distance map and the previously obtained rawphase map, and perform iterative optimization on the full-resolution distance map by using the second feedback feature to obtain the optimal full-resolution distance map.
In some embodiments, the process by which the second distance map optimization module 640 iteratively optimizes the full resolution distance map to obtain the optimal full resolution distance map may be: performing difference comparison between the full-resolution distance map and the rawphase map to obtain a first instance of the second feedback feature; optimizing the full-resolution distance map by using it to obtain a first-optimized full-resolution distance map, and performing difference comparison between the first-optimized full-resolution distance map and the rawphase map to obtain a second instance of the second feedback feature; optimizing the first-optimized full-resolution distance map by using this second instance to obtain a second-optimized full-resolution distance map, and performing difference comparison between the second-optimized full-resolution distance map and the rawphase map to obtain a third instance of the second feedback feature; and so on, until the second feedback feature falls within a preset range or a preset number of iterations is reached, at which point the output full-resolution distance map is the optimal full-resolution distance map.
Further, the rawphase map of the measured object obtained in step 1001 may be single-frequency or multi-frequency. When the rawphase map is a multi-frequency rawphase map, it needs to be normalized before feature extraction so as to keep the numerical ranges of the multi-frequency rawphase maps consistent. Thus, when the rawphase map is a multi-frequency rawphase map, step 1001 is further preceded by: normalizing the multi-frequency rawphase map to keep its numerical range consistent. Preferably, the normalization performed on the multi-frequency rawphase map normalizes it to the range 0 to 1.
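The 0-to-1 normalization can be sketched as a per-frequency min-max scaling; this assumes each frequency is scaled independently, which is one straightforward reading of "keeping the numerical ranges consistent":

```python
import numpy as np

def normalize_rawphase(frames):
    """Normalize each frequency's rawphase map to the 0-1 range.

    `frames` is a (F, H, W) stack of multi-frequency rawphase maps; each
    frequency is min-max normalized independently so that the numerical
    ranges of the frequencies stay consistent before feature extraction.
    """
    frames = np.asarray(frames, float)
    mins = frames.min(axis=(1, 2), keepdims=True)
    maxs = frames.max(axis=(1, 2), keepdims=True)
    span = np.where(maxs - mins == 0, 1.0, maxs - mins)  # avoid divide-by-zero
    return (frames - mins) / span
```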
It should be noted that the above-described embodiments are only preferred implementations of the examples of the present application and are not intended as the sole limitation of the description; those skilled in the art may flexibly adapt them according to the actual application scenario on the basis of the embodiments of the present application. In addition, the upsampling referred to in steps 1001 to 1004 may all be bilinear interpolation upsampling.
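A minimal sketch of the 2x bilinear interpolation upsampling mentioned above (align-corners style sampling; the model's actual upsampling layer may differ in boundary convention):

```python
import numpy as np

def bilinear_upsample2(img):
    """2x bilinear-interpolation upsampling of a single-channel map.

    Output sample positions are mapped linearly onto the input grid and
    each output pixel is interpolated from its four surrounding pixels.
    """
    img = np.asarray(img, float)
    h, w = img.shape
    out_h, out_w = 2 * h, 2 * w
    ys = np.linspace(0, h - 1, out_h)   # output rows mapped onto input grid
    xs = np.linspace(0, w - 1, out_w)   # output cols mapped onto input grid
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]             # vertical interpolation weights
    wx = (xs - x0)[None, :]             # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```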
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
In the above embodiments, the functions may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, fiber optic, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state drive, SSD), etc.
It should be noted that, in the present application, each embodiment is described in a progressive manner, each embodiment mainly describing its differences from the other embodiments, and identical and similar parts among the embodiments may be referred to each other. For the product class embodiments, the description is relatively brief because they are similar to the method class embodiments; relevant points can be found in the corresponding description of the method class embodiments.
It should also be noted that in the present application, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.